Saving our Server

Wednesday, January 27th, 2010

About two weeks ago, our fair server (home to Chris’ IB History Topics, Helen’s new blog, my business websites, etc.) shut down for no apparent reason at 4am. Panic!

I spent the day combing through log files, searching for hints of an intruder. The shutdown scrambled several log files, so I decided to rebuild the system from scratch. Right in the middle of doing so, and coincidentally at 4pm, the computer shut down again. Argh!

Still assuming a South Korean hacker was coming for my lovely family photos and insightful rhetoric, I downgraded the software to a slightly earlier version, figuring that the bugs would be ironed out of that. Installing the earlier Linux operating system solved the problem. For two days. Then another 4am shutdown.

I got the bright idea that the computer might be overheating. There are sensors already in place inside most computer equipment these days to measure temperature and shut down if it gets too high. Accessing the sensors is easy with the MacOS, but a bit more complicated with Linux. I installed lm-sensors (a data extraction tool) and sensorsd (a separate tool to query lm-sensors and provide a log). The CPU is showing 67º Celsius (153º for you Fahrenheit people). Is that high? Querying Google, I find an engineering doc from Intel that rates the Celeron processor in the server at a maximum temperature of 67º. Hmm.
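For anyone who wants to peek at the same numbers without installing extra tools, here is a minimal sketch in Python. It assumes a Linux kernel that exposes temperature sensors as `temp*_input` files under `/sys/class/hwmon`, with values in millidegrees Celsius; some machines put these files in a different subdirectory, so the path may need adjusting. The function names `read_temperatures` and `celsius_to_fahrenheit` are made up for this example and are not part of lm-sensors.

```python
from pathlib import Path


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32


def read_temperatures(hwmon_root: str = "/sys/class/hwmon") -> dict:
    """Return {sensor file path: degrees Celsius} for each temp*_input file.

    Assumption: the kernel reports these values in millidegrees Celsius.
    """
    readings = {}
    for sensor_file in sorted(Path(hwmon_root).glob("hwmon*/temp*_input")):
        try:
            millidegrees = int(sensor_file.read_text().strip())
        except (OSError, ValueError):
            continue  # sensor not readable right now; skip it
        readings[str(sensor_file)] = millidegrees / 1000
    return readings


if __name__ == "__main__":
    for path, celsius in read_temperatures().items():
        fahrenheit = celsius_to_fahrenheit(celsius)
        print(f"{path}: {celsius:.1f} C ({fahrenheit:.1f} F)")
```

Run from cron every few minutes with the output appended to a file, something like this would give a crude temperature log to compare against the processor's rated maximum.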

I take the casing off the computer. I move the server down to the garage. Winter in Temecula means outside temperatures between 34º and 65º Fahrenheit, so the garage location should cool it down a bit. I also blow some compressed air around the insides of the computer and particularly around the fan connected to the CPU.

Temperatures go down for a while – 57º, 59º, 65º, 57º – but then spike again after a few days – 65º, 67º, 69º, 67º. What’s going on? Do I need to get a new computer?

Last night, about 9pm, I return home from taking Daniel to his periodic ultrasound exam. He fell asleep in the back of the Toyota before we got back. I check the server. 72º C, right at the edge of shutdown. I go to the garage with a can of compressed air and a flashlight. I tilt the computer over 45º and shine the flashlight directly into the spinning fan that sits on the CPU. What do my eyes behold but a solid mass of dust and lint, plugging every single cooling vent on the heatsink. The dust and lint were completely invisible when the fan wasn’t running, but under the flashlight with the running fan, they stand out like dog poop on a red carpet entrance. I take five minutes cleaning with compressed air and some tape to remove the larger particles.

This morning, after running for 12 hours, all temperatures are at 38º, with a 4am spike of 56º. Our server is safe again …