So last night I was made aware that the RC NUT website was down. OK, I checked the webserver .. no response, so I sent an e-mail to support and didn’t think much of it .. surely it would be up in 5-10 minutes.
30 minutes later and still no show. I started to get annoyed, told support to get a grip on things and, since it was very late, went to bed.
This morning I woke up early, about 6:00 AM, to check it out. The site was up, but with no DB connectivity .. I have my DB on a separate server (that’s the proper way an H-Sphere install is done) .. so I pinged the DB server .. no luck. I then checked the IP of the webserver and .. I’ll be damned, I was on a US server, not a UK server. I pinged the MySQL server by name and, sure enough, it was there, but also in the US.
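For what it’s worth, the check I did by hand here (does this name still point at the IP I think it does?) is easy to script. This is only a rough sketch: the hostnames and IPs are made-up placeholders from the documentation ranges, not my real servers, and `fake_resolver` is a stand-in so the example runs without touching DNS. Drop the `resolver=` argument to use the real lookup.

```python
import socket

def resolve_ips(hostname, resolver=socket.gethostbyname_ex):
    """Return the sorted IPv4 addresses a hostname currently resolves to."""
    _name, _aliases, addresses = resolver(hostname)
    return sorted(addresses)

def moved_hosts(expected, resolver=socket.gethostbyname_ex):
    """Return {hostname: current_ips} for hosts that no longer resolve to the expected IP."""
    moved = {}
    for hostname, expected_ip in expected.items():
        current = resolve_ips(hostname, resolver)
        if expected_ip not in current:
            moved[hostname] = current
    return moved

# Hypothetical stand-in for DNS so the example runs offline.
def fake_resolver(hostname):
    table = {
        "www.example.com": ["198.51.100.10"],  # web server has moved
        "db.example.com": ["192.0.2.20"],      # DB server is where we expect
    }
    return (hostname, [], table[hostname])

# Hypothetical "known good" addresses, e.g. copied from the config files.
expected = {
    "www.example.com": "192.0.2.10",
    "db.example.com": "192.0.2.20",
}

print(moved_hosts(expected, resolver=fake_resolver))
# → {'www.example.com': ['198.51.100.10']}
```

Anything that shows up in the result is a host whose hard-coded IP in a config file is probably stale.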
I changed my config files and removed the old US IP, and the site was fully up .. except the site was 13 days old !!
Now, for 99% of people out there, this is not a biggie, but for an online shop with sometimes dozens of orders in one day .. this is a catastrophe. Over 100 order records were lost and, since we do inventory by the website stock level, the inventory was messed up as well.
Upon further investigation, I found out that Relio’s (my webhost’s) server provider in the UK seems to have decided not to pay their bandwidth bill, and the bandwidth provider cut them off, in essence rendering the UK servers of companies like Relio .. dead. (Note: it was not Relio that did not pay their bill, but their server provider, UK Easily.)
So I called the bandwidth provider (UK Grid … all those UK names confused the hell out of me today), explained our situation to them and, after some negotiations, they agreed to open up traffic to the server for 3 hours to allow me (and others) to access and back up our data. That was commendable .. those guys were true stars in doing that. After I got the database back, I restored it and the website was up to date … gosh, was I relieved.
Why was my site 13 days old to start with? .. well .. simple: Relio backs up my data every day to another server, and every week or two they back up that server to another server in the US .. OK, at first glance, not a bad setup .. but this experience taught me it is not good enough. I need more.
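The arithmetic is what bit me: a daily copy only helps while the site holding it is reachable, and once it isn’t, you are as old as the off-site copy, which on a "week or two" schedule can be up to 14 days. Here is a minimal sketch of the kind of freshness check I have in mind. The copy names and timestamps are invented for illustration; in real life they would come from wherever the backup job records its last run.

```python
from datetime import datetime, timedelta

def stale_backups(last_backup_times, max_age, now=None):
    """Return the names of backup copies whose newest snapshot is older than max_age."""
    now = now or datetime.now()
    return sorted(
        name for name, taken in last_backup_times.items()
        if now - taken > max_age
    )

# Hypothetical timestamps: daily copy from last night, off-site copy 13 days old.
now = datetime(2024, 1, 14, 6, 0)
copies = {
    "uk-daily": now - timedelta(hours=8),
    "us-offsite": now - timedelta(days=13),
}

# For a shop taking orders every day, tolerate at most one day of loss.
print(stale_backups(copies, timedelta(days=1), now=now))
# → ['us-offsite']
```

Run against every copy, not just the nearest one, this would have flagged the US backup long before I needed it.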
So, since Relio’s UK servers were dead in the water .. and no access to the backup server was possible, they had to do an emergency recovery from the US backups.
I will give Relio credit: to restore every UK account on the US servers, move DNS, remap IPs, etc. in less than 6 hours .. that’s not bad, they did a good job. But it goes to show, no matter how good your first tier is .. somebody up the ladder is bound to mess it up !!
That’s the problem with the web hosting market these days: too many idiots working in it. We put our livelihoods in the hands of people who later turn around and mess it up for us .. and it does not matter how big or small the company is, you are always at risk.
What is the solution? Well, nothing below 50K of hardware and a few K a month in fees, so sometimes .. you just have to pay up and shut up.
That’s what I am doing next. First, I am going to hire a local hosting company (when I say local, I mean 3 streets over). I feel the threat of a 300lbs+ guy walking into the office and breaking your nose is a good incentive to make sure they don’t screw with you .. that can only happen if they are local. Second, I am going VPS (Virtuozzo) .. my idea is to find a setup where I can create a Virtuozzo failover cluster with a backup residing off-site. Third, I am setting up a mirror of all this on a VPS server in the US, with my DNS also residing off-site (easily re-routable), so if something happens to RIPE itself (the regional Internet registry for Europe) .. I can still be up. Fourth … don’t know fourth yet .. but it’s sure to pop up while we are doing this.
Overboard, overblown, over-budget? .. of course .. but peace of mind. I’ve been in this business for 10 years .. I have seen some huge companies working with a setup that would make anybody cringe. Promises of redundancy are usually all hogwash, and backups many times are simply not practical: yes, they will back up your data, they just can’t extract it back. Customers are at huge risk all the time of losing all their data, and for some, that’s instant bankruptcy. I have tried to give away control of how my data is stored and managed, but it’s proven once again: if I want it right, I have to do it myself.