So, the Alagad web server is one of the first EC2 servers I configured: Windows 2003, a small instance, with an instance-store boot device. For new servers, I would *always* advise using an EBS volume for your boot device; it makes backing up and restoring far easier. Rebuilding this server is on my to-do list, but I digress…
On Friday, this server decided to stop responding. The EC2 console showed it as ‘Up’, but it was failing its status checks. The first thing I always do is try a reboot (have you tried turning it off and on again?). It appeared to reboot, but it kept failing the reachability checks and we still could not RDP to the box. Sometimes a reboot just takes time, but in this case we were still stuck several hours later. I submitted a ticket to Amazon support and was told:
Your instance is currently failing the System Reachability status check. Given that your root device type is the instance store, you will need to terminate the instance and launch a replacement.
If it is possible for your situation, I recommend launching the replacement using an EBS volume as your root device type. When an EBS-backed instance fail system status checks, you can often resolve failed System status checks by stopping and re-starting the instance rather than replacing it.
You can find more information here about
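The support advice boils down to a simple rule: if the host is impaired, an EBS-backed instance can be stopped and started, while an instance-store-backed one has to be replaced. Here is a minimal sketch of that rule in Python, assuming a boto3 EC2 client (`boto3.client("ec2")`) or any object with the same two `describe_*` methods. The action names (`"stop-start"`, `"replace"`, and so on) are hypothetical labels, not anything AWS returns.

```python
"""Sketch: pick a recovery action for an EC2 instance whose System
Reachability check is failing, following the support advice above.

Assumptions: `ec2` is boto3.client("ec2") or any object exposing the same
describe_instances / describe_instance_status methods; the returned action
names are made up for illustration.
"""


def choose_action(root_device_type: str, system_status: str) -> str:
    """Map RootDeviceType ("ebs" or "instance-store") and the SystemStatus
    value ("ok", "impaired", "initializing", ...) to a hypothetical action."""
    if system_status == "ok":
        return "healthy"     # host looks fine; look inside the OS instead
    if system_status != "impaired":
        return "wait"        # initializing, insufficient-data, not-applicable
    if root_device_type == "ebs":
        return "stop-start"  # EBS root survives a stop; a start usually lands on another host
    return "replace"         # instance-store root cannot be stopped, only rebooted or terminated


def recommend_recovery(ec2, instance_id: str) -> str:
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    root_type = reservations[0]["Instances"][0]["RootDeviceType"]

    # IncludeAllInstances=True also reports instances that are not running.
    statuses = ec2.describe_instance_status(
        InstanceIds=[instance_id], IncludeAllInstances=True
    )["InstanceStatuses"]
    system_status = statuses[0]["SystemStatus"]["Status"] if statuses else "unknown"
    return choose_action(root_type, system_status)
```

Keeping the decision in a pure function means the rule can be checked without touching AWS; only `recommend_recovery` needs credentials.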
This was, obviously, not what I wanted to hear! I dug into the storage attached to the instance, and sure enough, the S3 drive was failing a reliability check. I force-detached that volume from the instance, then used the console to make the volume available anyway. With the drive disconnected, I was able to reboot the instance (cheers!). Once Windows was back up and running and I had RDP access, I took a snapshot of the volume, attached a new volume created from that snapshot to the instance, and re-activated it in Windows Disk Management.
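The console steps above can be sketched with the boto3 EC2 client. This is a minimal illustration, not a record of exactly what was clicked: the volume ID, availability zone and device name (`xvdg`) are hypothetical placeholders, and mapping "make the volume available anyway" to `enable_volume_io` is an assumption. The final step of bringing the disk online inside Windows stays manual.

```python
"""Sketch of the storage recovery described above, using the boto3 EC2 client.

Assumptions (not from the original post): `ec2` is boto3.client("ec2") or any
object with the same methods; volume ID, availability zone and device name are
hypothetical placeholders you would look up in the console.
"""


def recover_impaired_volume(ec2, instance_id, volume_id, availability_zone, device="xvdg"):
    # 1. Force-detach the failing volume so it cannot block the reboot.
    ec2.detach_volume(VolumeId=volume_id, InstanceId=instance_id, Force=True)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # 2. Assumption: "make the volume available anyway" maps to re-enabling I/O,
    #    which EC2 disables when a volume's data may be inconsistent.
    ec2.enable_volume_io(VolumeId=volume_id)

    # 3. Reboot, then wait until the instance status check reports ok.
    ec2.reboot_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_status_ok").wait(InstanceIds=[instance_id])

    # 4. Snapshot the old volume (a snapshot itself cannot be attached).
    snapshot_id = ec2.create_snapshot(
        VolumeId=volume_id, Description=f"recovery copy of {volume_id}"
    )["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # 5. Create a volume from the snapshot in the instance's AZ and attach it.
    new_volume_id = ec2.create_volume(
        SnapshotId=snapshot_id, AvailabilityZone=availability_zone
    )["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])
    ec2.attach_volume(Device=device, InstanceId=instance_id, VolumeId=new_volume_id)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[new_volume_id])

    # 6. Bringing the disk online in Windows Disk Management is done by hand.
    return snapshot_id, new_volume_id
```

Passing the client in, rather than creating it inside the function, keeps the sequence easy to dry-run against a stub before pointing it at a real account.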
Does this mean the server is going to stay as it is? Not at all: it still needs to be rebuilt with an EBS root device, but at least I'm not forced to do the rebuild at 4:00 PM on a Friday!