Don’t Spook the Cattle!

All my life, people have told me I’m smart. And like other smart people, I have a problem: I imagine that others want to know as much about stuff as I do. That assumption threatens to spill into my business writing, and into that of the smart people I’m lucky to work with.

Now I certainly don’t think of business stakeholders as cattle, but by saying too much, or saying the wrong thing, it’s pretty easy to start a stampede. People just plain get spooked.

I don’t have data to back it up, but it’s my impression and my experience that most people have a hard time pulling a complex or nuanced message out of dense text. And it’s this sort of message that I often find myself trying to convey.

The Wrong Way

A few years ago, I worked with a technical team resolving a 30+ hour database outage. It was awful: the company had ignored pleas from the DBAs to upgrade aging hardware, and the database server was a single point of failure, yet a number of production web sites depended on it. Hooboy…

As I communicated with management, what spewed out was probably something like:

“…so the restore took 16 hours, but once it was put back, we figured out that the databases hadn’t been backed up with the rest of the server. Also, they’ve pulled the only spare hard disk from the parts cabinet, and we’re running without any failover. Seagate’s not sure if they’ve got another disk that will work in the array – they haven’t made these in a few years. We may have to look for spares on eBay. The team THINKS they can restore the databases from an offsite snapshot, but if another disk fails, we’ll lose the five production sites described earlier…”

Why write like this? It’s a jumbled mess that’s bound to sow panic in people who can’t do a thing to get the problem solved a minute faster. The round of angry phone calls and “get in my office!” texts that will follow isn’t going to do the sleepy resolution team any favors either.

Communication in an Emergency

What management wants to hear is something along the lines of:

Prod Database Outage Status, 3PM Sunday March 31

Problem: Failing disk array brought down production database server on Friday night.

Status: Restore from tape backup has not yet brought the database back online. Problem replacing hard disk means database is not operating in fail-safe mode.

Current actions: Working with vendor and aftermarket to secure spares to repair disk array. Team retrieving offsite backup to attempt restore of databases.

Resolution timeline: If vendor can provide spares, the team will begin the rebuild by 6PM PDT this evening, and expects to be done by morning.

Next report at 6:00 PM EDT

Each report of this kind covers:

  • A concise and consistent short title for the problem.
  • What happened?
  • What’s the current status?
  • What are you doing about it?
  • When will things get better – or at least when will you report out again?
  • Maybe: what comes next?
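If your team sends these updates often, it may help to pin the structure down so nobody has to improvise at 3 AM. Here is a minimal sketch in Python, assuming nothing more than the sections in the sample report above. The `StatusReport` class and its field names are my own illustration, not a standard or an existing tool.

```python
from dataclasses import dataclass


@dataclass
class StatusReport:
    """One outage update. Field names are hypothetical, chosen to
    mirror the sections of the sample report."""
    title: str
    as_of: str
    problem: str
    status: str
    current_actions: str
    resolution_timeline: str
    next_report: str

    def render(self) -> str:
        # Emit the sections in a fixed order, separated by blank lines,
        # so every update looks the same to the reader.
        sections = [
            f"{self.title}, {self.as_of}",
            f"Problem: {self.problem}",
            f"Status: {self.status}",
            f"Current actions: {self.current_actions}",
            f"Resolution timeline: {self.resolution_timeline}",
            f"Next report at {self.next_report}",
        ]
        return "\n\n".join(sections)


report = StatusReport(
    title="Prod Database Outage Status",
    as_of="3PM Sunday March 31",
    problem="Failing disk array brought down production database server on Friday night.",
    status="Restore from tape backup has not yet brought the database back online.",
    current_actions="Working with vendor and aftermarket to secure spares to repair disk array.",
    resolution_timeline="If vendor can provide spares, the team expects to be done by morning.",
    next_report="6:00 PM EDT",
)
print(report.render())
```

Because every field is required, leaving one out raises an error when the report is built, which is a cheap reminder to say when the next update is coming.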

Transparency and clarity are as essential as brevity. You’re not lying, you’re not trying to hide anything, just trying to generate understanding without triggering panic.

What generates panic? Uncertainty about a bad outcome

Nobody panics because there “might be puppies and cake tomorrow”. But “if we lose another disk, all five sites go down” is a scary prospect, and it rests on no data about how likely another disk is to fail. So don’t speculate. Warn if you must, but don’t spin out a nightmare scenario in a communication if you have no idea whether it will occur.

Save speculation for face-to-face conversations with people you must influence. They’ll be better able to gauge your mood, and can ask questions and explore the scenario.

I’ll see your Strunk and White and raise you 140

The essential skills of writing are reading and editing – good writing is a matter of design and iteration.

I found my ability to write improved after I’d used Twitter for a while. The discipline of communicating a succinct message in just 140 characters focused me on editing. I try hard not to use shortcuts like “U” and “&”, but to write in English, and get the message down to its essentials. Twitter enforces concision.

Strunk and White, in “The Elements of Style”, said this about concise writing:

“A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts.”

When writing for consumption in an emergency, remember that your message must be:

  • Concise – Make the message “As simple as possible, but not simpler” (paraphrasing Einstein).
  • Accurate – Speak the truth you can verify. Say what you know. Don’t speculate on what might be.
  • Authoritative – Provide authority by attaching your name and the date of the message, and by naming your sources.
  • Neutral – Talk about the situation, not personalities. Don’t try to affix blame or settle scores. Be effective, first.
  • Transparent – Don’t soft-pedal or cover up. The Big Deal just gets bigger while it’s covered up.

Use the acronym CAANT to help you remember these rules.

When to spook

Believe it or not, there are times when Spooking the Cattle is the right course of action.

Remember the long-ignored database server above? Mission critical, single-point-of-failure, but running on hardware so old that replacement disks are no longer being manufactured? This system was positively dying for some pre-emptive spooking action.

It’s human nature not to want to pay to replace a system that seems to be operating well. The challenge for the admins of that system was making the cost of failure apparent to the business that must fund its replacement. Over time, several people had suggested we “just kick the plug out of the wall and see who screams loudest”. Don’t try that.

One way to make the pain and terror of a mission-critical system outage feel real to the people who will fund its replacement is to pre-announce a maintenance outage, or a disaster-preparedness exercise. You can describe the business consequences of the system going offline:

“…will be unable to process credit card transactions…”, “…corporate master data offline…”, “…unable to access patient records at this time…”, “…web site will be unable to serve traffic…”

Schedule the exercise as close to peak business hours as possible without making the prospect unbelievable.

Of course, you don’t actually have to take the system offline. If you’re lucky, when you broadly and loudly announce your planned outage, you’ll have something of a stampede on your hands, and a demand that the outage not occur. As the dust settles, you’ll be in a better position to make the case for replacement of an aging system the business relies on.

After all, in an emergency, the effects the business couldn’t tolerate in a planned outage would be forced upon it at a time it didn’t choose.

Good luck, and hang onto your hat, cowpoke!

Understanding, communicating, motivating – this is what I do.