Otherwise Occupied
 


Server downtime
gregh  2006-08-31 12:37         

Maybe you noticed that my server has been unavailable lately?

Since Saturday, my server has been going down at random, and I hadn't been able to figure out why. My logs showed nothing unusual, I keep the system relatively up-to-date, and I don't run much on the server other than the web server, ssh, and a tightly controlled proxy. I figured there might be some random hardware problem, or perhaps an unpatched denial-of-service vulnerability in my kernel, which I don't update as often.

Neither was particularly attractive to address. The reason I try to avoid touching the kernel is that the server is a rented machine, hosted by EV1Servers (yes, the company that took so much heat for buying an IP license from the SCO Group a couple of years ago). I haven't yet had a chance to test their support with a tough problem, and leaving the system unable to boot wasn't something I wanted to try. So I've only patched kernels when it's been critical. The server also lives in Houston, TX, and is actually owned by the host, which makes troubleshooting potential hardware issues difficult as well.

Of course, on-campus interviewing submissions had to be out Tuesday morning, so I didn't spend much time on it over the weekend. There's work and school during the week. I'm going out of town this weekend. In short, I wasn't sure how or when I was going to resolve this, and it's been driving me nuts.

And so, today it went down again, just as it had first on Saturday, then on Monday morning, and again on Tuesday afternoon. EV1 provides a remote reboot capability, but it has never brought the server back up; I've always had to request a manual reboot. That never made much sense to me, since it's supposed to power cycle the server. I just figured maybe it didn't really do what they claimed it did.

Things started to get odd when the second of my requests received a response like this:

8/28/2006 5:15:18 PM
DataCenter
Dear Gregory Haverkamp,
We are closing this ticket now for ther seem to be another issue being addressed on this server at this time. Please feel free to look into the other open tickets on this server for further details of the status of your server.

I reopened the ticket, noted that I had no outstanding trouble tickets and that I really needed the server rebooted. They complied. On Tuesday's request, no "other tickets" were mentioned. That subject did come up again today after I tried to reopen their closed ticket:

8/31/2006 2:25:10 PM
DataCenter
Dear Gregory Haverkamp,
We are closing this ticket now for ther seem to be another issue being
addressed on this server at this time. Please feel free to look into
the other open tickets or customer service on this server for further details of the status
of your server.

Thank You
joyce o
Webhosting Server Support Technician
Ev1Servers

8/31/2006 2:42:46 PM
DataCenter
Dear Gregory Haverkamp,
Your server is under abuse investigations right now.It is for this reason that we are unable to reboot your server.We are sorry for any inconvience costed.Please refer to escalation # 1307120.Thanks you for your patience.

Thank You
joyce o
Webhosting Server Support Technician
Ev1Servers

And so I contacted customer service:

[16:26:06] Janelle W : unfortunately, I believe there has been a
mistake made on your account. The ticket number that they have
refrenced does pull up an abuse issue, but it is not on your account.
It is on someone else's account.
[16:26:40] Greg : Hehe. It's probably the same mistake they made
two days ago, too, when they initially told me they couldn't do it.
[16:27:01] Greg : Could this also be involved in my server appearing
to occasional drop off the network?
[16:27:09] Janelle W : let me check one more thing, and if I still
cant figure it out, I will forward you to technical support, since
they are the ones that e-mailed you
[16:30:13] Janelle W : I believe we might have figured out what the
problem is. can you hold on just one moment while I confirm it?
[16:30:26] Greg : Certainly.
[16:34:08] Janelle W : thank you so much for your patience, it will
be just one more moment
[16:40:15] Janelle W : Thank you so much for waiting. Our abuse
department is investigating this issue and someone will get back with
you either by trouble ticket, or e-mail

And so, about four hours after it originally went down again:

8/31/2006 3:53:58 PM
Abuse
Dear Customer,

Your server is online and responding to ping and ssh. After reviewing the situation, it appears that your server was unplugged incorrectly due to human error. We sincerely apologize for any inconvenience this issue has caused you. No AUP violation has occured on your server, and we will be notating your account to indicate that. Once again, we apologize for the error and will be reviewing the matter so that it does not reoccur.

And so, they had mistakenly tied my server (but not my account) to an abuse investigation. This apparently led them to unplug my server repeatedly without any notification. From what I can tell of this machine's reboot history since Saturday, someone there probably could have noticed something was amiss and said something about it, but no one did.

Mistakes do happen, and I'm glad they cleared it up quickly once they discovered there was a problem. Right now, I'm just hoping that this ends the recent string of server deaths I was seeing, and that this really was the cause.
