ThatJeffSmith

That Time I Accidentally Deleted All the Oracle Databases

Noooooooooooooooooooooooooooooooooooooooooooooooooooooooo!!!

My data hero, Karen Lopez aka @datachick, is hosting a blog meme for this Friday called “#FailFriday: I was young and didn’t know any better.”

I have made lots of mistakes, but this one still gets to me more than 10 years later.

In 2000 I was working for a small ISV in the library management systems space. We had customers all over the world, including Kuwait. Now most of our customers were librarians, not Oracle DBAs. So, they paid us to manage their systems for them – remotely.

Now, I don’t know if you remember an Internet where ftp’ing megabytes over long distances was a challenge. But in 2000, it definitely was a challenge.

My task at hand was pretty simple:

Upgrade the customer’s 6 Oracle databases from version 7.3.4 to version 8i, remotely over a telnet session, in Kuwait.

I was a cocky 23-year-old at the time. I had a college degree, an entire college class dedicated to database design, and almost a full year of experience under my belt! This new job was very intimidating at first. I was expected to be a DBA, a UNIX systems admin, an Apache and Perl/CGI person, and a jack of all trades for our own product.

I was pretty comfortable with UNIX, as my entire 4 years of college had used Solaris as the primary programming platform for my Computer Science classes. I had picked up Perl pretty quickly as it seemed much easier and more intuitive than C++ and Ada (I never did get Object Oriented programming, which pretty much explains why I’m not a developer), and I was getting more and more comfortable with Oracle. Heck, they had even tasked me with writing an operations manual for our Oracle customers.

So when they asked me to perform this upgrade, it was a big deal, but I had done it before several times with other customers.

The process to upgrade these servers went something like this:

  • Wake up early or stay up late to FTP the new Oracle RDBMS server software to the Kuwaiti servers
  • Export the data – or take a DMP (giggle)
  • Shut down the database
  • Take a full backup
  • Archive to tape
  • Install Oracle 8i
  • Create new database
  • Import the data from the old database to the new one
  • Delete the staging software and old database

Now I did all of these steps save the ‘archive to tape’ piece. That was taken care of by the customer as they could actually put the tape physically in the box and run a script. The rest was on me. I had managed to do this successfully for 5 of the 6 servers when I really stuck my foot in it (that’s slang for royally screwed up).

Hold on Jeff, why would you delete the staging software and old database right away?

Remember the time before time, when the internet was slow and storage was expensive? Also, this was a library – even though they were in Kuwait, they still had a limited IT budget. There was barely enough room to un-TAR the software for me to even install it, much less leave duplicate copies of the database lying around.

Jeff, one more thing, why didn’t you just upgrade the actual database?

I could have done that. But I wanted to build the things from scratch. Mostly I remember doing that because I thought it was more fun, and I could brag about it later…or I was just more comfortable doing it that way.

The Epic Massive Fail

I had just finished getting the last database upgraded and ready to go on the server. So the only thing left to do was to remove the old files. Here’s an awesome UNIX command that any experienced person has a huge amount of respect for:

rm -rf

And when I say 'respect', I mean like how you would respect the power and capabilities of a loaded firearm.

'rm' does what it sounds like. It removes or deletes files off the filesystem. The '-rf' part is a pair of flags, or options for the command. 'r' is for 'recursive', meaning it will walk down the entire directory tree. 'f' is for 'force', as in 'do not prompt me for each and every file that is to be deleted, just delete it all!'
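If you want to see those two flags at work without risking anything you care about, here is a minimal sketch. It builds a throwaway directory tree and then removes it. The directory and file names are made up for illustration, and it assumes your system has mktemp, which most do.

```shell
#!/bin/sh
# Build a throwaway tree under a temp directory so nothing real is at risk.
# All names below (old_db, control.ctl, users01.dbf) are made up for this demo.
scratch=$(mktemp -d)
mkdir -p "$scratch/old_db/datafiles"
touch "$scratch/old_db/control.ctl" "$scratch/old_db/datafiles/users01.dbf"

# Make one file read-only. A plain interactive 'rm' would stop and ask
# before deleting it; '-f' skips that question.
chmod a-w "$scratch/old_db/control.ctl"

# -r walks down the whole tree; -f deletes without prompting.
rm -rf "$scratch/old_db"

if [ -e "$scratch/old_db" ]; then
    result="still there"
else
    result="gone"
fi
echo "old_db is $result"

# Clean up the now-empty scratch directory.
rmdir "$scratch"
```

No questions asked, no confirmation, no undo. Everything under old_db disappears in one go, which is exactly why the directory you are standing in matters so much.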

Are you figuring out what I did wrong?

Noooooooooooooooooooooooooooooooooooooooooooooooooooooooo!!!

Yup, I issued this command in the WRONG DIRECTORY. I wiped out all the work I had just done. In the best-case scenario this would have meant the system would be down for maybe 8-12 hours instead of 4. We just had to get them to put that tape back in the server so I could restore the backup and start over.

But, I didn't do the tape backup, they did.

So I sent an email and asked them to do the recovery.

Oh, and I did the walk of shame to my boss and told them what had happened.

Four months later they found the tape and I was able to finish. I have no idea what they did in the meantime to let folks check out their books and manage their catalog. I doubt they closed the library, but they could have and it would have been mostly my fault.

To this day the first thing I do when entering a UNIX environment is change the prompt to show the full directory path. And the second thing I do is check the directory 5 or 10 times before I even think of issuing that command again.
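For the curious, here is roughly what those two habits can look like in a POSIX-style shell. Treat it as a sketch, not the exact thing I type: the prompt string is one common way to show the full path, and rm_rf_in is a hypothetical helper I made up for this example. It refuses to delete anything unless you are standing in the directory you said you expected to be in. The demo at the bottom runs in a temp directory with made-up names.

```shell
#!/bin/sh
# Habit 1: show the full working directory in the prompt.
# Single quotes matter: $PWD is expanded each time the prompt is drawn,
# not once when the variable is assigned.
PS1='$PWD $ '

# Habit 2 (hypothetical helper, not a standard command):
# delete "target" only if the current directory is "expected_dir".
rm_rf_in() {
    expected_dir=$1
    target=$2
    # Compare resolved physical paths; the cd runs in a subshell, so the
    # caller's directory is unchanged. A missing expected_dir yields an
    # empty string, which never matches, so the delete is refused.
    if [ "$(pwd -P)" != "$(cd "$expected_dir" 2>/dev/null && pwd -P)" ]; then
        echo "refusing: in $(pwd -P), expected $expected_dir" >&2
        return 1
    fi
    rm -rf -- "$target"
}

# Demo in a throwaway tree (staging, live, old_db, new_db are made-up names).
scratch=$(mktemp -d)
mkdir -p "$scratch/staging/old_db" "$scratch/live/new_db"

cd "$scratch/live"
rm_rf_in "$scratch/staging" new_db      # wrong directory: refused
wrong_dir_status=$?
[ -d "$scratch/live/new_db" ] && new_db_survived=yes || new_db_survived=no

cd "$scratch/staging"
rm_rf_in "$scratch/staging" old_db      # right directory: deleted
right_dir_status=$?
[ -e "$scratch/staging/old_db" ] && old_db_removed=no || old_db_removed=yes

# Clean up the scratch tree.
cd /
rm -rf "$scratch"
```

A wrapper like this does not replace looking at the prompt, but it does turn "I thought I was in staging" into an error message instead of an outage.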