I need a Windows failover backup solution.
September 30, 2005 12:16 PM   Subscribe

We are considering acquiring a failover backup for a Windows 2003 server at my work. Is there a good turnkey solution for failovers for under $2000-3000? Is there *any* solution available at that price? Firewall recommendation bonus question inside.

Ideally it'd mirror the main server, but I'm sure we could live with having it lag a day behind if there is a big price difference. The big thing is that it needs to be entirely automated and reliable. I don't want our server to eat it and then be all "oh, it's ok, we'll just crank up the failover" only to find that it's been idling for the past month (this has happened to me with a backup solution that was a bit iffy).

We're willing to spend upwards of $3000 if it's really necessary, because we basically figure this is business insurance. If we lose that server, we're hosed.

We'll also want to be able to have a failover solution for our linux box(es) in the future, so if there's some sort of software we could buy that would cover ALL of our machines, that'd be cool -- or if there's a free open source solution for the linux box so that we wouldn't need to buy anything, that'd be great too.
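For the Linux box, a free and battle-tested starting point is rsync on a cron schedule. A minimal sketch -- the hostnames, paths, and 2 AM schedule here are invented placeholders, not a recommendation from anyone in this thread:

```
# Crontab entry on the standby Linux box: pull a fresh copy of the
# primary's data every night at 2 AM over ssh.  --delete makes it a
# true mirror; leave it off if you want accidentally-deleted files
# to survive on the standby.
0 2 * * * rsync -az --delete -e ssh backup@primary:/srv/data/ /srv/data/
```

rsync only transfers the blocks that changed, so after the first run the nightly window is usually short. Note that a mirror faithfully copies deletions and corruption too, so it complements tape rather than replacing it.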

The bonus question: We're pretty set on purchasing a Sonicwall Pro 1260 enhanced firewall. Anyone have any experience? We don't really need all the features right now (we've just got two windows servers and a linux box) but we've got a full rack to fill up so there'll probably be more stuff in there. We don't necessarily need the enterprise features, but the VLAN stuff would be nice. Anyone want to suggest something else we should also be considering?
posted by fishfucker to Computers & Internet (20 answers total)
 
I don't have specific experience with that model, but I have a number of Sonicwall firewalls in my care that work like champs. Do you have a specific question about Sonicwalls in mind?
posted by Rothko at 12:35 PM on September 30, 2005


Wow, I am currently researching this too. My plan was to build an IDENTICAL box with only one hard drive, matching the one that's in the first server. Then if anything gets hosed we'd take out one of the hard drives (we run a RAID with two drives) and stick it in the spare clone server. It'd then be up and running in no time. This, however, is less than ideal and I was looking for the same setup as you.
posted by geoff. at 12:53 PM on September 30, 2005


Oh, and if both hard drives get hosed you are fubar, and will have to restore from a tape backup.
posted by geoff. at 12:54 PM on September 30, 2005


Yeah, we're looking to actually run a box in parallel. I figure we could do this for around $2k-3k (because our server is maybe a $1000 box on eBay now, and then $2k-3k for software), but that may be a lowball figure.

Ideally we'd be able to load balance to the other box too.

Well, I guess I should do some more googling. I'll post anything I find back in this thread, but I was hoping someone would have direct experience.

Rothko: no specific questions, really -- "[they] work like champs" was pretty much what I wanted to hear. Maybe any downsides you're aware of?
posted by fishfucker at 1:31 PM on September 30, 2005


Sounds like the Windows Server 2003 Cluster Service is exactly what you need to implement :)
posted by starscream at 1:37 PM on September 30, 2005


Oops, here's the link.
posted by starscream at 1:39 PM on September 30, 2005


What do you use the server for? Is it an AD server that holds everyone's files? Is it an app server? A DB server? The solution would be different in each case.

I don't agree that the Cluster Service is necessarily the best way to go. It's probably overkill unless you need it for load balancing.
posted by gus at 1:47 PM on September 30, 2005


In my experience, if you can get the Sonicwall working how you want it, it'll work well. But if you run into a problem, heaven help you. The software is quirky, the support will make you hate life itself, and they'll try to nickel and dime you to death the whole way. The support renewal requests that look like bills are particularly scummy. Oh, and "transparent mode" isn't.

So, yeah, I'm fed up with Sonicwall, and will try someone else next time, probably Netscreen.
posted by trevyn at 2:01 PM on September 30, 2005


Gus: The server is our primary web and mail server. It also contains our MySQL db. We aren't using AD because we're too small for it. It is not part of a Windows network. Yeah, load balancing isn't a primary concern -- it might be nice, but what we need is a solution that I can implement in the next 2 months, and I'm thinking that setting up a cluster would take me a hell of a lot more time than that, as it's probably beyond my IT skill set.

I guess basically we'd like to have a really GOOD backup solution by the end of the year, meaning that all of our data on server A would be copied at least nightly (preferably in real time, if this is possible) to server B, and server B would be ready to go into service in under an hour should anything happen to A. I'm not really familiar with enterprise-level IT stuff, so the sort of information available on the web tends to spin my head; however, I have the feeling that there should be a solution for this sort of problem.
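The "it's been idling for the past month" failure mode is worth designing against explicitly: whatever does the nightly copy, something independent should confirm the copy actually landed. A minimal sketch of that check in Python -- the 26-hour threshold and any paths you'd feed it are invented for illustration, not part of any product mentioned in this thread:

```python
import os
import time

def newest_mtime(path):
    """Return the most recent modification time of any file under path."""
    newest = 0.0
    for root, _dirs, files in os.walk(path):
        for name in files:
            mtime = os.path.getmtime(os.path.join(root, name))
            if mtime > newest:
                newest = mtime
    return newest

def backup_is_fresh(backup_root, max_age_hours=26):
    """True if the backup tree has been touched within the window.

    26 hours gives a nightly job a little slack; anything older
    means the copy silently stopped running and someone should look.
    """
    age = time.time() - newest_mtime(backup_root)
    return age < max_age_hours * 3600
```

Run something like it from a scheduled task on server B (the receiving side) and have it page or email someone when it returns False; the point is that the machine holding the copy vouches for its own freshness, rather than the primary reporting on itself.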

-------------------

After doing some research, it appears that maybe I am not using the right terms in my question:
"In a failover cluster, the nodes within the cluster share disk storage. The applications and data used on the cluster are stored on the shared disk so that each server can access the data in the event of a failure; however, both machines can never access data on the external storage array at the same time. There must be one owner for each physical disk used within the array. As you will see later, configuration of the cluster software entails assigning ownership of all applications on the cluster to one machine."
The above isn't exactly what we want, because all the data is stored on the external drive. Ideally we would have two machines, one of which cloned the other, and then if the first one had a hd suicide, we could just start the other one up. My concern with the failover cluster as described is that if the shared external drive eats it, we're still fucked. We care more about the data than the service availability, but we do want to have a server back up and running (with as up-to-the-moment data as possible) in under an hour.

Is the solution to this a failover with an unusually large external RAID array?
posted by fishfucker at 2:07 PM on September 30, 2005


/shameless plug

I work in the technical support department for NSI Software, the makers of Double-Take. It should do exactly what you want it to do. We mirror and replicate any data on a NTFS file system. The tricky part will be the failover. We can put the NetBIOS name and IP on the target server, but how your application responds to it is a whole 'nother ballgame that will require testing to validate that it works.

MS Clustering might work, but it is expensive, you have to know what you are doing, it requires AD - which you don't have, the applications must support clustering, and the hardware is expensive.

If you want more info about Double-Take, email me (look at my profile).

/shameless plug
posted by internal at 2:22 PM on September 30, 2005


internal:

that does sound like exactly what we need. I'll give you an email.
posted by fishfucker at 2:40 PM on September 30, 2005


I'm a big fan of overkill when it comes to mission-critical systems, and clustering is perfect for this: you can take an axe to one of the boxes and no one will notice. But you'd have to eat a lot of overhead for the AD. 'Course, you plan to expand quite a bit, so it may be better to start off with AD rather than having to migrate in a couple years' time anyway.
posted by Mitheral at 2:55 PM on September 30, 2005


I'm still not getting why RAID wouldn't work. If there is a HD suicide you simply pop out the dead drive, swap in a replacement, let the mirror rebuild, and off you go. It would take less than an hour (or even just reboot time, if you'd rather not rebuild the mirror until a more convenient time).

I have a Firebox which was a bitch to set up, but is fairly easy to maintain. I don't keep up with the "Live Subscription Service", which is the yearly tech support, for like $600/yr. It's been working fine, knock on wood. I actually have two running a VPN and both work great. The only thing I don't like is the interface; I was brought up on the Cisco command line, so that's what I was used to. I do hear from other IT workers that the GUI is great for those who don't have previous Cisco experience (and thus are forced to point and click instead of just using long commands).
posted by geoff. at 4:00 PM on September 30, 2005


Rothko: no specific questions, really -- "[they] work like champs" was pretty much what I wanted to hear. Maybe any downsides you're aware of?
posted by fishfucker at 1:31 PM PST on September 30


Their high cost? Their tech support is not very helpful beyond a reset-everything-and-set-it-up-again script, but if you know what you're doing, the Sonicwall boxes are largely airtight.
posted by Rothko at 4:03 PM on September 30, 2005


I'm still not getting why RAID wouldn't work. If there is a HD suicide you simply pop out the dead drive, swap in a replacement, let the mirror rebuild, and off you go. It would take less than an hour (or even just reboot time, if you'd rather not rebuild the mirror until a more convenient time).

We want to have a backup in case we have the unusual but possible situation of two of the drives going at the same time. We've also had some weirdness with the RAID array in that server, so I don't entirely trust it. It was a rush job to production, and so I haven't actually tested the hot-swappability of that beast. (Yeah, not wise, looking back, but that's where I'm at.)
posted by fishfucker at 5:02 PM on September 30, 2005


Also if your failure is in the RAID controller (like the one that HP replaced for me under warranty yesterday) your mirror set may not be usable or recoverable.
posted by Mitheral at 5:15 PM on September 30, 2005


Definitely, definitely get the RAID to where you trust it. What I do currently (instead of the extra layer of redundancy you're seeking) is test and swap the RAID periodically. I do it around low times in case something catastrophic happens (Christmas and 4th of July for me). I always replace one hard drive during the procedure so there's a six-month gap between purchase times.

I am watching the life of some servers that were purchased ~5 years ago, so I have a good idea of what end-of-life looks like. The hard drives are first to go -- these servers didn't see my anal swapping procedure, and both drives seemed to go within three months of each other. The video cards are the next to hit the fan, followed by the power supplies. By the time the power supplies stopped functioning (they were redundant, and failed within 2 months of each other) I decommissioned the server.

I have never had any problems whatsoever with a RAID controller failing. I keep some basic parts (fans, RAM, a cheap video card) around in case something happens. Luckily I've never had a CPU go out. I keep really good airflow and do total overkill on the fans, so the whole room stays nice and cool. It seems like the only things that really fail are the hard drives (because of their moving parts), the video cards (the fans are tiny and die young) and the power supply (for whatever reason).

I'm in a larger setup than you and they demand pretty much 100% uptime. I tell them to expect one day of downtime if anything crashes hard, and less than an hour for most hardware disturbances. If you need more reliability than this, you're going to need to put out a lot of dollars for "enterprise" solutions such as clustering.

I guess you need to do what I did (with management) and weigh the cost of working like a major enterprise against the cost of downtime (whether it be lost clients, lost productivity, etc.). For us at least, the cost of complete and absolute failover outweighs the hour or half-day a year we need to spend getting a server back online -- and we're a slightly bigger company, it seems. The perspective we took was "How much will we lose company-wide in the event of total server failure (back to backup)?", then "How much will it cost to do a complete failover system?" I don't know if you're setting the < $3000 price range or the upper brass are, but it seems like the wrong way to go about setting the price for something like this.
Sorry I'm not answering your question directly, but I literally spent this week going through this process and am sharing everything I learned.
posted by geoff. at 5:30 PM on September 30, 2005


MySQL does support replication, doesn't it? The product made by the company I work for can use MySQL replication to maintain a live backup of the database on a separate machine. Our front end can be failed over automatically or manually to the slave server on a moment's notice. Of course the application has to support detecting the failure and switching, or else you could do the switching manually through the DNS.
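For what it's worth, MySQL's built-in replication (available since 3.23) is roughly this much setup -- the server IDs, hostnames, and credentials below are invented for illustration:

```sql
-- my.cnf on the master needs:  log-bin  and  server-id=1
-- my.cnf on the slave needs:   server-id=2
-- Then, on the master, create a replication account:
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'slave.example.com'
    IDENTIFIED BY 'secret';
-- And on the slave, point it at the master and start replicating:
CHANGE MASTER TO
    MASTER_HOST='master.example.com',
    MASTER_USER='repl',
    MASTER_PASSWORD='secret';
START SLAVE;
```

Replication is asynchronous, so the slave can lag under load, and a bad DELETE replicates just as faithfully as good data -- it covers the dead-master case, not the oops case.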
posted by kindall at 5:45 PM on September 30, 2005


geoff: no, that's great information, thanks. The price tag isn't a hard upper limit. In fact, we just had a meeting about it and decided our cap is going to be around $10k (if we pay someone else to do it) or $6-7k if we do it ourselves.

thanks for all the help so far... i'll let you know how things go and what we decide.
posted by fishfucker at 6:01 PM on September 30, 2005


What Geoff said. And I didn't see it mentioned explicitly, so I'll have a go: Make sure your System and your Data are on different disk spindles. Mirror the system disks, and RAID 5 the data disks (RAID 1+0 if you have the money). Put the data on an external drive shelf if you can. That way, you have the option of banging the Data silo into a second machine without any fancy failover plumbing to bollix up.

I have to wonder -- what is the ROI on the 2-server Active/Passive system versus a lone machine built to the hilt? An n+1 power supply, dual proc, hot-swap memory, n+1 fans, with mirrored system disks and an external disk array on 2 separate controllers leaves few remaining failure modes that an Active/Passive pair can cover. You do get software corruption coverage (perhaps a 3-hour rebuild), and the (unlikely) loss of the system disk controller is mitigated. OTOH, an Active/Passive pair will cost more in software licences, will cost more in admin time, may cost more in power & cooling, and is twice as likely to suffer a hardware failure (MTBF is divided by the number of units). In a fire, you'll lose either solution...

FWIW, HP has a dandy "cluster in a box" product starting at $10K.

Re: firewalls - I'm a Cisco PIX man, from my first cigarette to my last dying day.
posted by Triode at 7:37 PM on October 1, 2005


This thread is closed to new comments.