So last night I was made aware that the RC NUT website was down. OK, I checked the webserver .. no response, so I sent an e-mail to support and didn't think much of it .. surely it would be up in 5-10 minutes.
Thirty minutes later it still wasn't back and I started to get annoyed. I told support to get a grip on things and, since it was very late, I went to bed.
This morning I woke up early, about 6:00 AM, to check on it. The site was up, but there was no DB connectivity .. I have my DB on a separate server (that's the proper way an H-Sphere install is done), so I pinged the DB server .. no luck. I then checked the IP of the webserver and .. I'll be damned, I was on a US server, not a UK server. I pinged the MySQL server by name, and sure enough, it was there, but also in the US.
I changed my config files to remove the old IP and the site was fully up .. except the site was 13 days old!!
Now, for 99% of people out there this is not a biggie, but for an online shop with sometimes dozens of orders in one day .. this is a catastrophe. Over 100 order records were lost, and since we track inventory by the website stock level, our inventory was messed up as well.
Upon further investigation, I found out that the UK server provider used by Relio (my webhost) seems to have decided not to pay their bandwidth bill, and the bandwidth provider cut them off, in essence rendering the UK servers of companies like Relio .. dead. (Note: it was not Relio that did not pay the bill, but their server provider, UK Easily.)
So I called the bandwidth provider (UK Grid … all those UK names confused the hell out of me today) and explained our situation. After some negotiation, they agreed to open up traffic to the server for 3 hours so that I (and others) could access and back up our data. That was commendable; those guys were true stars for doing that. Once I got the database back, I restored it and the website was up to date … gosh, was I relieved.
Why was my site 13 days old to start with? Well .. simple: Relio backs up my data every day to another server, and every week or two they back up that server to another server in the US. At first glance, not a bad setup .. but this experience taught me it's not good enough, I need more.
So, since Relio's UK servers were dead in the water and there was no access to the backup server, they had to do an emergency recovery from the US backups.
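To make the "13 days old" arithmetic concrete: with a daily copy plus a weekly-or-fortnightly offsite copy, the amount of data you lose depends on which backup tier you can still reach when things go wrong. Below is a minimal Python sketch of that calculation; the tier names (`daily_uk`, `offsite_us`) and the timestamps are hypothetical, not Relio's actual schedule.

```python
from datetime import datetime, timedelta


def data_loss_window(failure, last_backups, reachable):
    """Return how much data is lost when restoring from the newest backup
    in a tier that is still reachable at the time of the failure."""
    candidates = [
        ts for tier, ts in last_backups.items()
        if tier in reachable and ts <= failure
    ]
    if not candidates:
        raise ValueError("no reachable backup taken before the failure")
    return failure - max(candidates)


# Hypothetical schedule: nightly copy in the UK, last offsite copy 13 days earlier.
failure = datetime(2024, 1, 14, 23, 0)
last_backups = {
    "daily_uk": datetime(2024, 1, 14, 2, 0),
    "offsite_us": datetime(2024, 1, 1, 2, 0),
}

# UK data centre cut off: only the US tier is reachable.
print(data_loss_window(failure, last_backups, {"offsite_us"}))
# Both tiers reachable: only the last day is lost.
print(data_loss_window(failure, last_backups, {"daily_uk", "offsite_us"}))
```

The point of the sketch is that the worst case is set by the least frequent tier you can actually reach, which is why a daily backup sitting in the same dead data centre does not help.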
I will give Relio credit: restoring every UK account onto the US servers, moving DNS, remapping IPs, etc. in less than 6 hours is not bad. They did a good job, but it goes to show that no matter how good your first tier is .. somebody up the ladder is bound to mess it up!!
That's the problem with the web hosting market these days: too many idiots working in it. We put our livelihoods in the hands of people who later turn around and mess it up for us .. and it does not matter how big or small the company is, you are always at risk.
What is the solution? Well, nothing short of 50K of hardware and a few K a month in fees, so sometimes .. you just have to pay up and shut up.
That's what I am doing next. First, I am going to hire a local hosting company (and when I say local, I mean 3 streets over). I feel the threat of a 300lb+ guy walking into the office and breaking your nose is a good incentive for them not to screw with you, and that can only happen if they are local. Second, I am going VPS (Virtuozzo); my idea is to find a setup where I can create a Virtuozzo failover cluster with a backup residing off-site. Third, I am setting up a mirror of all this on a VPS server in the US, with my DNS also hosted off-site (easily re-routable), so if something happens to RIPE itself (the regional Internet registry for Europe) I can still be up. Fourth … don't know the fourth yet, but it's sure to pop up while we're doing this.
Overboard, overblown, over-budget? Of course .. but it's peace of mind. I've been in this business for 10 years and I have seen some huge companies working with setups that would make anybody cringe. Promises of redundancy are usually all hogwash, and backups are often simply not practical: yes, they will back up your data, they just can't restore it. Customers are at huge risk all the time of losing all their data, and for some, that's instant bankruptcy. I have tried to give away control over how my data is stored and managed, but it's been proven once again: if I want it done right, I have to do it myself.
If you are looking for a VPS in the UK you may want to check out http://www.clustered.net or http://www.blacknight.ie, as both of those guys offer VPS hosting. If you just need another shared host, I know http://www.unitedhosting.co.uk has been around forever and can probably help you.
Ross
-http://www.hostdisciple.com
Hi Ross
I was going to go VPS and local, but then I calmed down and got my thoughts back in order, so I checked with my good friends at http://www.hosting365.com and decided to go with their “cloud” servers instead, with an additional SAN partition.
Basically a single blade with an HP EVA-8000 storage partition (fully redundant) .. 100% SLA on storage and a 15-minute SLA on hardware .. you can't ask for more 🙂
For now I decided to have a second storage mount for backups, but soon they will be offering a “mirrored cloud” setup .. that will be interesting.
Plus I know them, I've been to the DC a few times, so I know I can count on these guys. The reason I was not going to go with them initially was that they are in Ireland, and I was afraid Google geo-indexing would not work for me, but we will get around that by registering a small block of IPs to our UK location .. that should solve the problem.
Anyway, the solution starts at 99€, so one can't complain; it's as cheap as a high-end VPS these days.
That cloud solution, is that just a web server that is part of an H-Sphere cluster?
It's a “blade” server with SAN storage. Basically the storage does not reside on the server but on a massive storage cluster, so if something happens to the blade, you can just replace it with another blade and keep going.
Cloud 2 will be live in 8 weeks' time (all going to plan) and we can start providing replication and failover to a second physical site… 🙂