My Biggest Blunder as a System Admin
A good place to start is by stating that I consider myself a “good” system administrator: above average when it comes to Windows Server administration, and “average” when it comes to Linux server administration. Regardless of platform, I usually know enough to stay out of trouble and still fix the problem that is presented to me.
Prior to last October (2008), I had solved a wide array of problems, ranging from Apache malfunctions and complete server hard drive failures requiring data recovery to simply unblocking a person’s IP address from the firewall because they tried to log in to the server incorrectly too many times. I had never messed a server up so badly that I couldn’t undo what I had attempted to fix in the first place.
The biggest problem I had faced up until October 2008 was with an email server’s outgoing email queue. All email sent from this server was refused by other popular mail servers on the internet because the server was not configured correctly. I never did figure out that error; instead, I changed the software that manages the server from LXAdmin to cPanel. I still don’t understand why the server wouldn’t send, but cPanel fixed my mail problems along with many other problems that I had sort of fixed on LXAdmin without a long-term solution.
Now that I have blabbed on enough about my experience, let’s get to this blunder (I’m sure I’ve posted about it elsewhere, but I don’t recall putting it here). In October, Justin, a good friend who runs AmpHosted, came to me (this wasn’t the first time, in case you are wondering) asking about a tiny problem that he was somewhat unsure how to fix; he had the right idea and I confirmed it. He also asked me how to free up space on the Linux /var partition, since his was getting pretty full. I don’t remember exactly how the conversation went, but I know that there were a few possible solutions.
My first goal was to free up enough space so that the /var partition wouldn’t fill up and risk crashing the server. Server crashes can be costly, and Justin was in no mood to lose money as the president of a growing hosting company. So I began googling to figure out which log files were safe to delete. I knew that Linux has a lot of log files that cannot be deleted safely, and I was looking those up so I would know not to delete them.
My second goal was to have this partition expanded from free space on the other partitions so that the problem would have a more permanent solution (which did happen in November).
I then noticed that one of the MySQL directories was using most of the space. I quickly did a Google search and read that it was safe to delete a MySQL log directory. Unfortunately for me, I only saw what I wanted to see, and didn’t read the article thoroughly. Needless to say, I wiped out the /var/lib/mysql directory on his server, which freed up a lot of disk space on the /var partition and also wiped out the MySQL server’s data and all of the database files. On top of this, when I began looking for the backup files so I could restore the databases within an hour, I quickly found out that some of the backups were corrupt and others were nonexistent. After restoring most of the server, one client had lost a month of data and I felt horrible!
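Before deleting anything, it helps to see exactly which directories hold the space and what they actually are. The following is only a rough sketch in Python, not the commands used that day; the function names `dir_size` and `largest_subdirs` are made up for this illustration. It lists the largest immediate subdirectories of a path so each one can be identified before anything is removed.

```python
import os

def dir_size(path):
    """Total size in bytes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total

def largest_subdirs(parent, top=5):
    """Return (size, path) pairs for immediate subdirectories, largest first."""
    sizes = []
    for entry in os.scandir(parent):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((dir_size(entry.path), entry.path))
    sizes.sort(reverse=True)
    return sizes[:top]

if __name__ == "__main__":
    # Report only; this script never deletes anything.
    for size, path in largest_subdirs("/var"):
        print(f"{size / 1024**2:10.1f} MiB  {path}")
```

A report like this only says where the space went. Whether a directory holds disposable logs or live data (as /var/lib/mysql does) still has to be checked by reading the documentation carefully.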
Since then, my admin buddies still throw that blunder in my face. I’m not entirely sure why, because I still feel sorry for Justin. I have also started taking the time to read what is and isn’t safe to remove so I don’t accidentally do something that bad again. A mistake like that could have cost me my job or earned me a pay cut if I were working directly for a big-time hosting company, even if I had 10 to 15 years of experience.
Since then, when it comes to matthouse and my own hosting company, or when I’m helping Justin, I always make sure to double-check that I’m right before I proceed. I know that I’m slower now, but I have changed my procedures to make them safer, sacrificing speed in my repairs.
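One way to build that kind of double-check into a procedure is to refuse to touch known-critical paths and to move things aside instead of deleting them outright. This is a minimal sketch under stated assumptions, not my actual tooling: the `PROTECTED` list, the quarantine directory, and the function names `is_protected` and `set_aside` are all hypothetical.

```python
import os
import shutil
import time

# Hypothetical list of paths that must never be removed; adjust per server.
PROTECTED = ("/var/lib/mysql", "/etc", "/boot")

def is_protected(path, protected=PROTECTED):
    """True if path is a protected path, lies inside one, or contains one."""
    real = os.path.realpath(path)
    for p in protected:
        p_real = os.path.realpath(p)
        # If the common ancestor is one of the two paths, one contains the other.
        if os.path.commonpath([real, p_real]) in (real, p_real):
            return True
    return False

def set_aside(path, quarantine_dir, protected=PROTECTED):
    """Move path into quarantine_dir instead of deleting it; return the new location."""
    if is_protected(path, protected):
        raise PermissionError(f"refusing to touch protected path: {path}")
    os.makedirs(quarantine_dir, exist_ok=True)
    base = os.path.basename(os.path.normpath(path))
    dest = os.path.join(quarantine_dir, f"{base}.{int(time.time())}")
    shutil.move(path, dest)
    return dest
```

Moving a directory aside only frees space on the full partition if the quarantine directory lives on a different filesystem, but it keeps the operation reversible until the server has been confirmed healthy.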
I felt that I should write this post for two reasons: first, to make it known that I DO make mistakes and I’m not perfect; second, in the hope that anyone who reads it will make sure to check twice before doing an operation that is not reversible (at least not easily).
I will be taking a Linux-based system administration course next semester, taught by a FreeBSD pioneer. The final reason I wrote this was to put on record how good an admin I consider myself now, so that after that course I can re-rate myself and hopefully talk about a lot of my experiences in that class.
One final note: I’m still working on adminreference.com, and I will probably start posting more of the knowledge I’ve recently acquired in the near future, after this final week of college classes and work.
Tags: administration, blunder, delete, disk space, hosting, Linux, mysql, var, var/lib/mysql, windows
Posted in Hosting / Server Administration, Personal
I lmao at this mistake when Justin or John or You told me about it (can’t remember who). “Who needs those MySQL log files…”
Interesting, I’ll quote it on my site later.
Charlie