How to deal with the IT Skinner box?
November 27, 2008 7:22 AM

I'm starting to have trouble with stress from being perennially "on call" in my IT position. I need some advice on how to deal with it.

I don't have a bad job. I work for a medium-sized business with a substantial number of employees. We're a two-man shop -- I'm the manager, and I also have a full-time sysadmin; we run around 20 physical servers, physical security and phones. My employers are nice, I like my co-workers, and I've been given the money and opportunity to improve the network so that it works well. Generally, I like to think that we run a rather good IT shop; the staff appreciate what we do. I've been doing this for several years.

My employers are casual, but (naturally) expect good work. They're loath to demand a particular service level agreement -- or even specify one when asked -- but it's clear that they want the systems up and available. Last year we hit four nines (99.99% uptime).
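For a sense of how tight that target is, here is a quick back-of-the-envelope calculation of the downtime budget each availability level allows. It is a sketch that assumes a 365-day year and counts every outage minute, with no maintenance windows excluded; the function name is just illustrative.

```python
# Rough downtime budget implied by an availability target.
# Assumes a 365-day year; leap years and excluded maintenance windows are ignored.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year allowed by `availability` (a fraction, 0 to 1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, target in [("three nines", 0.999), ("four nines", 0.9999)]:
    print(f"{label}: {downtime_budget_minutes(target):.1f} minutes/year")
```

This prints about 525.6 minutes a year for three nines and about 52.6 minutes for four nines. In other words, four nines leaves less than an hour of total outage for the whole year, and that is the bar a two-person shop cleared.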

I'm ultimately where the buck stops. My sysadmin, while very capable and a great troubleshooter, doesn't have the experience to fix all of what we do quickly. We maintain a very diverse number of systems running different apps on different platforms; I can't just hire "a linux guy" or "a windows guy" or "a so-and-so guy" who can deal with most of what we work on. If something with any of the systems goes wrong, I'm ultimately the one on the hook. If someone sets off the alarm, I'm on the call list.

Even though the vast majority of days are problem-free in reality, and even during those times I know my sysadmin is technically on-call with the monitoring system, I feel like I'm effectively on call 24/7, every day. I'm worried about visiting friends in a far (but drivable) city tomorrow for Thanksgiving, since my sysadmin has traveled by plane far away, and it'll be a nightmare for my family for me to have to head into the office if something goes down that I can't resolve remotely, or if there's a security problem at the building.

I'm starting to find my stress level increasing -- not because things are going wrong, but just out of doing contingency planning, considering all of the options to determine what I should be prepared for in case something does go down. I can't enjoy life anymore, and I feel like I can't "get away."

I feel like when I was a bit younger, I was able to deal with this much better emotionally -- or, at least, I worried about it less. Now, I'm slowly becoming a nervous wreck.

I'm sure others in small-business IT have experienced this before. What can I do to deal with the stress? What changes can I make -- short of adding experienced staff -- to make this easier to take?
posted by anonymous to Work & Money (32 answers total) 12 users marked this as a favorite
 
Oh man, I feel for you. I've been in that situation before -- roughly from 1995-2005, and now since 2007 I've been the on-call guy too, all in small departments like yours. And yeah, I've spent a lot of time in the past worrying about what happens if I go on vacation, or if I go outside of cellphone range, or my battery goes out, or I go to the movies and shut off the phone...

Here's my solution, which may not be helpful to you and I'm sorry if it isn't:

Fuck it.

Yes, you should be as available as you're able to be. But it's entirely unrealistic for *you* to have four nines of "uptime". We are people, we're not machines, we need downtime. Sleep. Recreation. Times where we don't have to worry if our systems rooms are on fire. And your employers, if they have any sense of compassion at all, will be cool with this. Either they'll be cool with this, or they'll see the need to hire additional folks, and if they're not cool with it, then they suck and you should think about your relationship with them.
posted by the dief at 7:49 AM on November 27, 2008 [17 favorites]


I was also on-call for about 10 years....the stress, the anxiety, that feeling of being "chained"....it slowly crept in.
My advice? Change jobs. Really. You wrote "I can't enjoy life anymore". From here, it only goes down. I had a flashback when I read that sentence....I remember how bad I felt at that time, and I am so pleased that I made the right choice....get out.
posted by theKik at 8:07 AM on November 27, 2008


Seconding theKik. That line of employment is closer to an enforced addiction than a healthy career. It will take time and changing a large part of your life. But it is time to get your life back.
posted by YoBananaBoy at 8:19 AM on November 27, 2008


I'm the manager, and I also have a full-time sysadmin; we run around 20 physical servers, physical security and phones.

Your employers are exploiting you, and you need to increase your headcount, period. Two people managing that level of technology is utterly insane. Unless those 20 servers are part of some farm serving a single app, I'm guessing your company is several hundred people. There are rules of thumb in IT for people vs. servers/seats, I'm sure someone here knows them.

You need to lobby hard hard hard to get more people who are specialized in certain areas to better cover everything. Start writing down all the stuff you're not getting done, or that only gets done because you're working overtime. Don't sugarcoat any of it. All that stuff translates into man-hours, which translates into additional headcount. And then there's forecasting for growth -- have you had a serious talk with your superiors about capacity planning?
posted by mkultra at 8:29 AM on November 27, 2008 [2 favorites]


Seconding the dief. In 7 years, I took maybe 3 weeks of vacation (never more than a day or two at a time, and never terribly far). Then I stopped caring. I went on a nice long (for me - 1 week) vacation. I checked email in the evening. There was a minor emergency, but they survived. There have to be limits.
posted by Cat Pie Hurts at 8:32 AM on November 27, 2008 [1 favorite]


Hire someone ASAP to help balance the workload, before you inevitably burn out.
posted by Blazecock Pileon at 8:36 AM on November 27, 2008


You need at least 3 people doing your job if you're expecting that kind of uptime, if they're each working or on call 80 hours a week with minimal overlap. You alone cannot do it. If the powers to can't or won't provide you with that, then you have two options: Say 'fuck it' and do the best you can while still living the life you need, or say 'fuck it' and get a new job. You need to do one of those two things, for the sake of your health, sanity and your life. It really is that simple. It's the old "Price, speed, or quality. Pick 2." dilemma.
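The headcount figure is plain arithmetic: a week has 168 hours, so with minimal overlap you divide that by the hours each person carries and round up. Two people at 80 hours cover only 160 of the 168. A minimal sketch, ignoring vacations, sick days, and handoff overlap, all of which would push the number higher (the function name is hypothetical):

```python
import math

HOURS_PER_WEEK = 7 * 24  # 168


def people_needed(hours_per_person: float) -> int:
    """Minimum headcount so that every hour of the week is covered by someone.

    Assumes no overlap between shifts and no time off; real rotations need more.
    """
    return math.ceil(HOURS_PER_WEEK / hours_per_person)


print(people_needed(80))  # 3: two people would leave 8 hours uncovered
print(people_needed(40))  # 5: ordinary 40-hour weeks need five people for 24/7
```
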

20 years from now, you won't be regretting the fact you didn't have 4 nines for 2008 for some company that's all but a distant memory, you'll be regretting that you didn't spend more time with your family, that you didn't see your friends, or that you had no time for a relationship.
posted by cgg at 8:38 AM on November 27, 2008 [3 favorites]


ack! that should be: "if the powers that be can't or won't..."
posted by cgg at 8:39 AM on November 27, 2008


I know that feeling. I know it very well. At one point, you start to carry a laptop everywhere because you never know when screamy, panicky people must have, well, whatever it is they want. Unless you're in the medical field or keeping airline traffic control systems up, nobody is going to lose an eye over it, but people who will just calmly sigh when the power goes out flip out like ninjas over service interruptions.

All of the above advice is good, but I suggest you treat your job like an application whose purpose is to give you money and a nice life. First, look at other servers (workplaces) and applications (jobs at the workplaces) and evaluate their suitability. I'm saying that you should have some job interviews lined up and maybe an offer or two. Only then can you rationally sit down, write what you'd like, and present it to your employers.

State that you are burning out. State all of the ridiculous things you do, with the cellphones, and the extra cellphone batteries, and maybe that backup pager for times when the cellphone coverage isn't so good. You've got that laptop in the trunk that you probably even paid for, with all kinds of stuff on it ready for bear. It's fine to make the analogy: you're a brain surgeon who has to have a bag of tools ready at all times, but you're not being paid like a brain surgeon. Even brain surgeons get to enjoy their vacations. People who start talking about "the nines" do not get to have just two people working IT.

If said employers begin histrionics, you can let them know that they can either have you on call some of the time, or have you on call not at all, because you are ready for the switchover (you have another job lined up). You should probably dump them if they're histrionic in any case, but that's up to you.
posted by adipocere at 8:48 AM on November 27, 2008


You need to enforce limits on your availability. Turn off the cell phone and pager at those times (weekends, nights, whatever zone you establish for yourself). You can do your job well without being the one to do everything all the time.
posted by zippy at 9:03 AM on November 27, 2008


What are your job duties? Are you contracted to be on call 24/7? Or are you just contracted to work *during office hours* to keep the server downtime to a minimum?

I know you feel an obligation to keep the servers up *all the time*, but you need to distance yourself from the situation. Work your 9-to-5 and during those times when you're not officially on-call, the company's uptime is not your responsibility. You can deal with it when you get back to work.

If your employers complain, remind them that you're not obliged to work overtime. Explain (nicely) that you have personal things to take care of. There's no need to go into any details. Say that you'll work overtime if you can, but if they need more then they need to hire more people.
posted by Xianny at 9:47 AM on November 27, 2008


"My sysadmin, while very capable and a great troubleshooter, doesn't have the experience to fix all of what we do quickly."

That "quickly" part really jumped out at me. I've been in your sysadmin's place. You know what helped me become a better sysadmin? My boss saying "I won't be available. I need you to handle whatever comes up." Was it stressful? Yes. Did I sometimes take longer to solve the problem than the company (and I!) would have liked? Yes - but you know what? There's never any "quickly enough" where the company is concerned. You said your sysadmin is a great troubleshooter, so let them do it. They'll get faster. You can always have some guidelines, like "If you can't fix it within x hours, call."

And as for you being concerned that something might fail on Thanksgiving? You have the whole weekend to get back there, and if they don't have the systems up on Friday, jesus, tough shit. Like everyone else said, you need to hire at least another person to take some of the load off - all the servers, clients, AND the phone system? Sheesh. And what do you mean you handle physical security, as well? Get someone like Sonitrol in there, hook their cam server into your network so you can monitor the cams remotely, and kick back.

Also - I'm not sure if Xianny's suggestion will go over very well if you're salaried.
posted by HopperFan at 10:09 AM on November 27, 2008 [2 favorites]


Being on call is disastrous for your mental health. The way to handle it is to set firm limits and make sure you are adequately compensated.
posted by ikkyu2 at 10:31 AM on November 27, 2008


You know what helped me become a better sysadmin? My boss saying "I won't be available. I need you to handle whatever comes up." Was it stressful? Yes. Did I sometimes take longer to solve the problem than the company (and I!) would have liked? Yes - but you know what? There's never any "quickly enough" where the company is concerned. You said your sysadmin is a great troubleshooter, so let them do it. They'll get faster. You can always have some guidelines, like "If you can't fix it within x hours, call."

This is good advice (I've been in IT for 38 years).
posted by davcoo at 11:49 AM on November 27, 2008


When faced with this same situation, we added layers of redundancy and ways to access systems remotely. For instance, we have managed APC power distribution units that we can use to remotely power-cycle systems that have become unresponsive. We have purchased a KVM unit (APC AP5616 I think) that is accessible via internet on any computer that can run Java apps. For those of us who are going to be away or possibly not have an internet connection, we have cellular modem PCMCIA cards that provide a decent ability to at least shell in to a prompt ... or in bigger cities with better cellular wireless, we can actually use the KVM via the cellular card.

The end result: Even if we're on call 24/7 and we all are out of town, we can still do pretty much anything except replace a physical machine that's failed -- and that's why we mostly use virtual machines that share storage and aren't dependent on a specific physical server.
posted by SpecialK at 11:57 AM on November 27, 2008 [1 favorite]


Your problem is very much your company's problem. You cannot be the only person who knows how to handle everything that comes up. If you are, not only are you wearing a ball and chain but your company is insecure, because even if you were to sacrifice yourself for them indefinitely (which you must not try to do), you cannot guarantee you won't get sick, have family emergencies, fall under a bus, etc.

Make them see that. Get somebody trained. At the very minimum, every essential task should be coverable by at least two people, and if not, your company's doin' it wrong. Explain it to them. If it doesn't take, start getting your resume out there.
posted by George_Spiggott at 12:08 PM on November 27, 2008


If you've started to have Pavlovian reactions to your cell phone or pager, it's time to make changes. Like it's been said upthread, you can't continue to maintain that level of service by yourself. Something has to give, and if it's not management, the choice is between uptime and your life.
posted by zamboni at 12:08 PM on November 27, 2008


Two thoughts from my own personal experience with a similar situation on a larger scale.

1. Document everything and make a habit of taking at least two weeks a year of vacation. Plan them well in advance so that everyone else knows to expect the situation, and make sure everyone else is comfortable with the situation before you leave. They won't deny that it's your right to take some vacation occasionally even though they may be nervous about it. This isn't about you, it's about what happens if you are unavailable for some reason outside your control when things go to crap.

2. The only systems you can maintain are the ones that require no maintenance. Every time something breaks that requires your intervention come up with a plan to ensure that a similar problem can't recur. If this means massive redesign of some systems, it's worth it - manual maintenance doesn't scale.
posted by sergent at 12:21 PM on November 27, 2008


If more staff is not an option, why not do a two-tiered support idea?

Every other week, the junior takes the pager. S/He can contact you for problems beyond his/her experience but handle the easy stuff, so you don't need to sweat it?
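A schedule like that is easy to make unambiguous, so nobody has to ask whose week it is. Below is a minimal sketch of an alternating weekly rotation; the start date, role names, and function name are all hypothetical. On the junior's weeks the manager is listed only as the escalation contact for problems beyond the junior's experience.

```python
import datetime
from typing import Optional, Tuple

# Hypothetical rotation: weeks are counted from a fixed Monday, and the pager
# alternates each week. Counting from a fixed date (rather than using ISO week
# numbers) avoids two "odd" weeks in a row when a year has 53 ISO weeks.
ROTATION_START = datetime.date(2008, 12, 1)  # a Monday
JUNIOR = "sysadmin"
MANAGER = "manager"


def on_call(day: datetime.date) -> Tuple[str, Optional[str]]:
    """Return (primary, escalation contact) for the given date."""
    weeks_elapsed = (day - ROTATION_START).days // 7
    if weeks_elapsed % 2 == 0:
        # Junior's week: handle the easy stuff, escalate the hard stuff.
        return JUNIOR, MANAGER
    # Manager's week: nobody further up the chain.
    return MANAGER, None


print(on_call(datetime.date(2008, 12, 10)))  # ('manager', None)
```

The point of the escalation slot is that on the junior's week the manager is a backstop, not the first call.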
posted by WinnipegDragon at 12:57 PM on November 27, 2008


I've been the emergency-backup guy in similar situations (despite my mediocre skill set in systems, it's sometimes been a best-of-weak-options thing) and I empathize with the weight of the world stress. I've also seen real systems folks bring this upon themselves. (Please don't be offended by what follows -- I have no idea if this is your situation or not, but even if you're the noblest person ever there's advice here too...)

I've seen even good systems people bring this upon themselves, mainly, by simply not documenting anything well, whether out of a sense of Wally-like job-ensurity or just plain laziness or bad priorities. After all, if you're the only one who can fix things you've got a job for life, right? Right?

The ironic punishment you get for that approach is that not only do your employers and coworkers end up hating you for the wizard's tower approach, you also get locked in yourself, because indeed, you're the only one who can fix things, so kiss your vacations and free time goodbye: it's just how you planned it. Quite the dramatic twist.

To avoid falling into that trap (and I'm not suggesting you did so at all, let alone deliberately, but some do), document the hell out of things. If you don't have a formal documentation system in your shop, start dedicating fully half (yes) of your time to building a process book or a wiki or big folder of RTFMs that you can add to slowly over time. Do it descriptively, not for morons, but the key info you'd want a competent replacement to know: where things are, what format they're in, how to reconfigure this, and so on. And whenever you FIX something, realize that your job is not complete until you've documented (diarized) what you have done and how, for others to follow when you're in Maui, or in general. Whenever you fix something AGAIN, try doing it by following the steps in your previous document... you'll probably find things that you described poorly or steps you skipped.

Repeat THAT for a couple years, and you will have the smoothest running systems shop on the planet, where any semi-competent person can fill in as needed by following the recipes of the master. Now you can take vacations, sleep in, even quit and find a new job without feeling guilty, all at your leisure.

Success isn't job security. Success is replacing yourself.
posted by rokusan at 1:03 PM on November 27, 2008 [2 favorites]


sergent's "intervention events signal a need for a long-term fix" is also great, and fits my philosophy that an alarm should go off in your head saying, "Wow, I really need to document this for next time."

Walking away the moment something is "working again" is a huge mistake I have seen hundreds of times.
posted by rokusan at 1:04 PM on November 27, 2008


What happens to your company when you get hit by a bus?

Seriously. If nothing else will get it through their (and your) head that this isn't sustainable, what provision has been made for you being in an accident? None, from the sounds of it.

Other than that, I echo what others have said: two people is not enough for making four nines sustainable. Your subordinate will never learn to fly on his/her own if you don't let them solo.
posted by rodgerd at 1:05 PM on November 27, 2008


Do you have disaster recovery for your IT systems? Mirrored servers, separate data centres, redundant links etc?

It's time to do disaster recovery planning for yourself - since it sounds like you are headed towards a disaster. Document what you do. Give your sysadmin a chance to do what you do, so you have the chance to enjoy time off work.

I've been in your position, and it was (for me) partly a problem of ego. I thought I could fix it better and faster than anyone else in the company, and of course the company exploited that, and I let them. Once I realised I was close to burning out, and I realised that the staff around me were very capable and in a lot of ways BETTER than I was, I documented and let go.

I also think you probably need more staff given what you support.

Good luck.
posted by Admira at 1:30 PM on November 27, 2008 [1 favorite]


You say you've got a good relationship with your employers; have you laid this out as clearly for them as you have for us? If they don't know that the issue is this serious, they can't address it.

Tell them what you've written here, clearly and in detail. There's been lots of good specific advice here, but what you're after is boundaries. Give your employers a chance to think about this and negotiate clear limits to your personal responsibilities.

You need to communicate your problem to them first though.
posted by bonehead at 2:16 PM on November 27, 2008 [1 favorite]


What happens to your company when you get hit by a bus?

This bears repeating. One of your goals should be to rearrange things so that you are no longer indispensable all the time.

In the late '90s I worked for a company where I was the only one who understood and felt confident with knowing what to do when their ecommerce web site had problems. I was called at all hours. I was called while at medical appointments. I was paged in the dentist's chair; luckily she wasn't drilling a tooth at the time.

You need to be able to arrange things so that you have some time, if not once a week at least a few times a month, that is sacrosanct where you cannot be disturbed. This is necessary for your mental health. It is also necessary for the development of your staff.

Hire at least one more person.

Educate them and empower them to make decisions in your absence.

Then -- and this is the hardest part -- learn to let go.
posted by Robert Angelo at 2:30 PM on November 27, 2008 [1 favorite]


The advice above is gold, I love threads like this. What you need to do, and I mean this weekend if you can, is to document, document, document. I mean everything: passwords, IP addresses, quirks, everything. Do this freeform, and then figure out how you want to organize it later. This will take some time, so get your sysadmin in on it. Go through everything, everything from setting up a phone to who to call for hardware failures. You have support contracts for all your mission-critical elements, right? I neglected this to make my costs look better and because "support sucks." Okay, if a vendor's support sucks (Cisco I can call up and feel confident they can work through anything; Microsoft, not so much), find an outfit that just does support. If it's a custom shop and they won't do support, find another shop that will support it. I've found experienced people who specialize in the craziest stuff, like Exchange 2007 installs. If it is so critical that I absolutely need to be there in person at 9PM on Christmas then I am doing it wrong (or getting grossly underpaid as I'm managing something very valuable, you get the picture?).

And nearly every system must have redundancy or must have the "do not call me outside office hours for this" sign on it. This works wonders. Don't be a corporate dick about these things, but do make sure you either track time or have the ability to charge the budget of whoever is calling you. Say, for example, sales is really pesky and has some line-of-business service they told you they rarely used, and they wanted to keep the budget down, so you didn't buy whatever you needed to make it redundant. Make sure you have a way to track the times they call you or that it breaks. Record when it happened, record how long it took you to resolve, record everything. Hey, sometimes it might even be your fault the system broke (you were upgrading component X that had nothing to do with Y, but well, actually it did). It doesn't matter. If they call you, record it, charge it. Don't make exceptions. Have quarterly meetings with your sysadmin and your CFO, go over usage, show him the time spent, the numbers. When people see numbers and graphs and reports, magically they become logical creatures. Oh, they actually use it more than they thought? Yeah, let's bump up the budget and get a second redundant component in. Give them options, managers love options. Make sure you let them know what option you think is the best. Execs also love shit like this. You run the risk of becoming like the Hapsburg empire, yes, so don't fall too much in love with MBA reports and numbers.
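The call log doesn't need special tooling to be useful at that quarterly meeting. Here is a minimal sketch, assuming you record the calling department, when the call started, and the minutes spent; the class and function names and the sample entries are made up for illustration. It rolls the log up into calls and total minutes per quarter and department.

```python
import datetime
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CallOut:
    """One after-hours call: which department called, when, and time spent."""
    department: str
    started: datetime.datetime
    minutes_spent: int


def quarter(ts: datetime.datetime) -> str:
    """Calendar quarter label, e.g. '2008-Q4'."""
    return f"{ts.year}-Q{(ts.month - 1) // 3 + 1}"


def summarize(calls: List[CallOut]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Map (quarter, department) -> (number of calls, total minutes spent)."""
    totals = defaultdict(lambda: [0, 0])
    for call in calls:
        bucket = totals[(quarter(call.started), call.department)]
        bucket[0] += 1
        bucket[1] += call.minutes_spent
    return {key: (count, minutes) for key, (count, minutes) in totals.items()}


# Hypothetical sample entries, purely for illustration.
calls = [
    CallOut("sales", datetime.datetime(2008, 10, 4, 22, 15), 90),
    CallOut("sales", datetime.datetime(2008, 11, 16, 6, 40), 45),
    CallOut("accounting", datetime.datetime(2008, 12, 24, 21, 0), 30),
]

for (q, dept), (count, minutes) in sorted(summarize(calls).items()):
    print(f"{q} {dept}: {count} calls, {minutes} minutes")
```

A table like this, one row per department per quarter, is usually enough to start the "should we pay for redundancy here?" conversation.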

Also, listen to Admira. And if you do a good job of this and revamp the IT department, you just hit independent consulting gold.
posted by geoff. at 2:36 PM on November 27, 2008 [1 favorite]


the reason it seemed easier when you were younger is that this stuff wears on you.

nthing everything above, and also: you need to document this for your employers, along with your plan for changing it. e.g., set milestones and dates (i will have all documentation written by X, all documentation in a wiki/KB by X, will write job descriptions by X, will start interviewing by Y, will have hired one backup by this date and then the second by this date). if you don't set the deadlines and publish them you will find reasons that it can't be done.

this also puts the company on notice and finally puts expectations in place. if you've been constantly available and then walk in tomorrow and say, "nope, sorry, done" they're going to freak, and rightly so. yes, they've been abusing your dedication, but you can't just have them go cold turkey.
posted by micawber at 2:58 PM on November 27, 2008


"I'm ultimately where the buck stops. My sysadmin, while very capable and a great troubleshooter, doesn't have the experience to fix all of what we do quickly. We maintain a very diverse number of systems running different apps on different platforms; I can't just hire 'a linux guy' or 'a windows guy' or 'a so-and-so guy' who can deal with most of what we work on. If something with any of the systems goes wrong, I'm ultimately the one on the hook. If someone sets off the alarm, I'm on the call list.

"Even though the vast majority of days are problem-free in reality, and even during those times I know my sysadmin is technically on-call with the monitoring system, I feel like I'm effectively on call 24/7, every day. I'm worried about visiting friends in a far (but drivable) city tomorrow for Thanksgiving..."


You're not that essential, the uptime isn't that important, and you need to learn to delegate.
posted by orthogonality at 3:08 PM on November 27, 2008 [1 favorite]


My employers are casual, but (naturally) expect good work. They're loath to demand a particular service level agreement -- or even specify one when asked -- but it's clear that they want the systems up and available.

You're ruining your life to perform a service that no one has actually asked of you? You don't have to prove anything to your employer, you just have to fulfill your contract. Get those SLAs and get an appropriate headcount to meet them.
posted by PueExMachina at 5:39 PM on November 27, 2008


In my first programming job (COBOL!) I was the new guy so I got the pager. As has been stated above, I learned the system faster than I ever would have otherwise.

At the same time the sound of the pager was traumatizing. That sound would cause a nervous twitch for years afterwards.

As long as you treat everything like an emergency then everything will be an emergency.
posted by trinity8-director at 6:07 PM on November 27, 2008 [1 favorite]


Think about this: if you're not single, you might be soon. Is this job worth it? I laid down the law when my husband was constantly checking his smartphone during family dinners. He's not a brain surgeon; no one is going to die if he reads an email 20 minutes after it was sent. Whoever said it was an addiction upthread was correct. He has been able to break this addiction by changing his habits and setting boundaries with his employer (with very strong encouragement from his wife). We did not get one work-related phone call on our last vacation, or I would have reached through the cell network and strangled the caller. He's still very responsible at work without letting it run his entire life.
posted by desjardins at 4:40 PM on November 28, 2008


You need more people. Period.

If your employer balks at this, simply explain to them what would happen to their system if you got hit by a bus. They'd be up a creek. Having all of the system knowledge in one head is a monumentally stupid move for them to make.

Just like all your hardware needs full redundancy, so too does the knowledge that keeps it running.
posted by jaded at 9:56 AM on November 29, 2008


This thread is closed to new comments.