| Otherwise Occupied |
Server downtime
gregh 2006-08-31 12:37 hosting_services system_administration Website
Maybe you noticed that my server has been unavailable lately? Since Saturday, it has been randomly going down, and I hadn't been able to figure out why. My logs bore no indications, I keep the system relatively up-to-date, and I don't run much on the server other than the webserver, ssh, and a tightly controlled proxy. I figured there might be some random hardware problem, or perhaps a denial of service targeting an unpatched hole in my kernel, which I don't update as often. Neither was particularly attractive to address.
The reason I try to avoid touching the kernel is that the server is a rented machine, hosted by EV1Servers (yes, the company that took so much heat for buying an IP license from the SCO Group a couple of years ago). I've not had an opportunity to test them with a tough problem, and leaving the system unbootable after a botched kernel update wasn't something I wanted to try. So, I've only patched kernels when it's been critical. The server also lives in Houston, TX, and is actually owned by the host, which is another reason troubleshooting potential hardware issues is difficult.
Of course, on-campus interviewing submissions had to be out Tuesday morning, so I didn't spend much time on it over the weekend. There's work and school during the week. I'm going out of town this weekend. In short, I wasn't sure how or when I was going to resolve this, and it's been driving me nuts.
And so, today it went down again, just as it had first on Saturday, then again on Monday morning, and Tuesday afternoon. EV1 provides a remote reboot capability, but it's never brought the server back up; I've always had to request a manual reboot. It never made much sense to me; after all, it's supposed to be power cycling the server. I just figured maybe it didn't really do what they claimed it did. Things started to get odd when the second of my requests received a response like this:
I reopened the ticket, noted that I had no outstanding trouble tickets, and said that I really needed the server rebooted. They complied. On Tuesday's request, no "other tickets" were mentioned. That subject did come up again today after I tried to reopen their closed ticket:
And so I contacted customer service:
And so, around 4 hours after it went down again:
And so, they had mistakenly tied my server (but not my account) to an abuse investigation. This apparently resulted in my server being repeatedly unplugged without any notification. From what I can tell of the reboot history of this machine since Saturday, others probably could have noticed something was amiss and said something about it, but no one did. Mistakes do happen, and I'm glad they cleared it up quickly once they discovered there was a problem. Right now, I'm just hopeful that this actually was my problem, and that it ends the recent string of server deaths I was seeing.