August 04, 2008

Understanding High Availability

I've just finished a course on High Availability, really more of an overview of the different HA technologies on various platforms.
What I have noticed is that it is really, really hard to make people understand that you cannot plan high availability as a "one-night affair". Most organizations have their border routers under VRRP and their Oracle database running on an application cluster, yet they seldom have layer 2 redundancy (the "oh my god, a loop! kill it, kill it!" syndrome) or any redundancy on "less-important" systems.
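
To put a rough number on why the "less-important" systems matter, here is a minimal sketch of the usual availability arithmetic. The 99% figures are made-up illustrative values, not measurements, and the function names are my own. It also assumes components fail independently, which real networks rarely do.

```python
from math import prod


def redundant_availability(single, copies=2):
    """Availability of `copies` independent replicas when one is enough."""
    return 1 - (1 - single) ** copies


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


# Hypothetical figures: each individual box is up 99% of the time.
router_pair = redundant_availability(0.99)  # VRRP-style pair
db_cluster = redundant_availability(0.99)   # clustered database
single_switch = 0.99                        # the unprotected layer 2 device

chain = serial_availability([router_pair, db_cluster, single_switch])
print(f"router pair: {router_pair:.4f}")
print(f"whole chain: {chain:.4f}")
```

Under these assumptions the whole chain can never be better than the single switch, no matter how well the routers and the database are protected.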

As an old friend said, "if it's worth having it, it's worth having it all the time". With the virtualization techniques now available, there's really no excuse for not achieving HA on most of your infrastructure.

Need an easy-to-manage yet featureful HA firewall? Go for pfSense. Name almost any software, and an HA solution is there for free, or for the time you need to build it. If it's running on Linux, you have DRBD (150-160Mb on two bonded NICs), Heartbeat and many others. If it's under Windows, you have tons of choices - not to forget a scheduled VMware Converter run, which might not be HA, yet is far more than most organizations actually have.

One of our clients had a hardware failure last Friday, which resulted in a complete halt of business for the weekend. It's hard to tell how much damage was actually done, but does it make any sense to work in such a way when HA solutions are so cheap?

Yes, you need skills to do HA. But what we don't need anymore in our business is IT people without skills: we already have far too many.

PS: As you might or might not have noticed, this is the first post in a lifetime. Long story short, more posts will come from now on ;)