That Time I Accidentally Deleted All the Oracle Databases

thatjeffsmith Database Stuff 8 Comments


My data hero, Karen Lopez aka @datachick, is hosting a blog meme for this Friday called “#FailFriday: I was young and didn’t know any better.”

I have made lots of mistakes, but this one still gets to me more than 10 years later.

In 2000 I was working for a small ISV in the library management systems space. We had customers all over the world, including Kuwait. Now most of our customers were librarians, not Oracle DBAs. So, they paid us to manage their systems for them – remotely.

Now, I don’t know if you remember an Internet where ftp’ing megabytes over long distances was a challenge. But in 2000, it definitely was a challenge.

My task at hand was pretty simple:

Upgrade the customer’s 6 Oracle databases from version 7.3.4 to version 8i, remotely over a telnet session, in Kuwait.

I was a cocky 23 year old at the time. I had a college degree, an entire college class dedicated to database design, and almost a full year of experience under my belt! This new job was very intimidating at first. I was expected to be a jack of all trades: DBA, UNIX systems admin, Apache and Perl/CGI developer, and expert on our own product.

I was pretty comfortable with UNIX, as my entire 4 years of college had used Solaris as the primary programming platform for my Computer Science classes. I had picked up Perl pretty quickly as it seemed much easier and more intuitive than C++ and Ada (I never did get Object Oriented programming, which pretty much explains why I’m not a developer), and I was getting more and more comfortable with Oracle. Heck, they had even tasked me with writing an operations manual for our Oracle customers.

So when they asked me to perform this upgrade, it was a big deal, but I had done it before several times with other customers.

The process to upgrade these servers went something like this:

  • Wake up early or stay up late to FTP the new Oracle RDBMS server software to the Kuwaiti servers
  • Export the data – or take a DMP (giggle)
  • Shut down the database
  • Take a full backup
  • Archive to tape
  • Install Oracle 8i
  • Create new database
  • Import the data from the old database to the new one
  • Delete the staging software and old database

Now I did all of these steps save the ‘archive to tape’ piece. That was taken care of by the customer as they could actually put the tape physically in the box and run a script. The rest was on me. I had managed to do this successfully for 5 of the 6 servers when I really stuck my foot in it (that’s slang for royally screwed up.)

Hold on Jeff, why would you delete the staging software and old database right away?

Remember the time before time, when the internet was slow and storage was expensive? Also, this was a library – even though they were in Kuwait, they still had a limited IT budget. There was barely enough room to un-TAR the software for me to even install it, much less leave duplicate copies of the database lying around.

Jeff, one more thing, why didn’t you just upgrade the actual database?

I could have done that. But I wanted to build the things from scratch. Mostly I remember doing that because I thought it was more fun, and I could brag about it later…or I was just more comfortable doing it that way.

The Epic Massive Fail

I had just finished getting the last database upgraded and ready to go on the server. So the only thing left to do was to remove the old files. Here’s an awesome UNIX command that any experienced person has a huge amount of respect for:

rm -rf

And when I say ‘respect’, I mean like how you would respect the power and capabilities of a loaded firearm.

‘rm’ does what it sounds like. It removes or deletes files off the filesystem. The ‘-rf’ part is a pair of flags, or options for the command. ‘r’ is for ‘recursive’, meaning it will walk down the entire directory tree. ‘f’ is for ‘force’, as in ‘do not prompt me for each and every file that is to be deleted, just delete it all!’
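If you want to see those two flags in action without risking anything, here is a minimal sketch that only touches a throwaway directory made by mktemp. The directory and file names (old_db, control.ctl, users01.dbf) are made up for illustration and have nothing to do with the real servers in this story.

```shell
# Work only inside a throwaway directory created by mktemp.
scratch=$(mktemp -d)
mkdir -p "$scratch/old_db/datafiles"
touch "$scratch/old_db/control.ctl" "$scratch/old_db/datafiles/users01.dbf"

# Write-protect one file: on a terminal, plain rm would stop and ask about it.
chmod a-w "$scratch/old_db/control.ctl"

# Without -r, rm refuses to remove a directory.
rm "$scratch/old_db" 2>/dev/null || echo "plain rm: refused, old_db is a directory"

# -r walks the whole tree, -f suppresses every prompt. No questions asked.
rm -rf "$scratch/old_db"
[ -e "$scratch/old_db" ] || echo "rm -rf: old_db and everything under it is gone"
```

Notice that nothing in that last command asks whether you are sure, or whether you are where you think you are.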

Are you figuring out what I did wrong?

Noooooooooooooooooooooooooooooooooooooooooooooooooooooooo!!!

Yup, I issued this command in the WRONG DIRECTORY. I wiped out all the work I had just done. In the best case scenario this would have meant the system would be down for maybe 8-12 hours instead of 4. We just had to get them to put that tape back in the server so I could restore the backup and start over.

But, I didn’t do the tape backup, they did.

So I sent an email and asked them to do the recovery.

Oh, and I did the walk of shame to my boss and told them what had happened.

Four months later they found the tape and I was able to finish. I have no idea what they did in the meantime to let folks check out their books and manage their catalog. I doubt they closed the library, but they could have and it would have been mostly my fault.

To this day the first thing I do when entering a UNIX environment is change the prompt to show the full directory path. And the second thing I do is check the directory 5 or 10 times before I even think of issuing that command again.
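For the curious, here is roughly what those two habits look like in a shell. This is a sketch, not what I ran back then: the PS1 line uses bash prompt escapes (other shells spell it differently), and rm_here is a hypothetical helper name I made up for this post. It refuses to delete anything unless the directory you are standing in is the one you say you expect.

```shell
# Habit 1: show user, host and the full working directory in the prompt (bash escapes).
PS1='\u@\h:\w\$ '

# Habit 2 (hypothetical helper): empty the current directory only if it is the expected one.
# usage: rm_here /expected/absolute/path
rm_here() {
    expected=$1
    if [ -z "$expected" ] || [ ! -d "$expected" ]; then
        echo "rm_here: '$expected' is not a directory, nothing deleted" >&2
        return 1
    fi
    # Resolve symlinks on both sides so the comparison is fair.
    expected_real=$(cd "$expected" && pwd -P) || return 1
    current_real=$(pwd -P)
    if [ "$current_real" != "$expected_real" ]; then
        echo "rm_here: you are in $current_real, not $expected_real, nothing deleted" >&2
        return 1
    fi
    # ./* keeps names that start with '-' from being read as options; dotfiles are left alone.
    rm -rf ./*
}
```

You would cd into the staging directory and then run rm_here with that same path typed out in full. If you are somewhere else, you get an error message instead of an empty server.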


Comments 8

  1. Hi Jeff

    Found this when searching for ‘prevent accidental oracle import’ which I just did ‘coz turns out someone has a default export dumpfile lying around in the default datapump directory … arrggghhh

    I know DBAs are supposed to know what they are doing when they run commands like these but I wish the next version of impdp had a default option where it prompts you for a Y or N before running the import 🙂

  2. the last time I had to actually use a command line was…a half a lifetime ago.

    part of me wants to really know unix and its derivatives – the same part that appreciates Oriental gardens, clean white shirts and Gregorian Chants. But though it annoys me, windows does make it less possible to really hurt myself and others…

    1. JeffS Post
      Author

      You’d be surprised how much damage you can do by clicking a button, Charles! That’s probably the #1 argument against GUIs – they make it too easy for the inexperienced to mess something up. That’s why I stress the proper amount of security setup so folks can only do what you want them to be able to do.

      Thanks for the comment, your Oriental garden comment really cracked me up 🙂

  3. Tape? What is this “tape” you speak of?

    I admit, one time I went to delete the test oracle home on the test box, clicked on wrong window, and deleted the production home on the production box. Funny, didn’t stop people from working, though it did stop new logins.

    It ain’t how you screw up, it’s how quick you fix it. 4 months, haha.

  4. Jeff, been there done that too, same command. Sigh!

    I was told to install SE oracle and remove the mistakenly installed EE versions. Everything was off of /opt/oracle/product. I installed both versions of SE and started to remove the EEs.

    cd /opt/oracle/product/9.2.0.6/
    rm -Rf *

    What could possibly go wrong?

    How about not spelling /opt/oracle/product/9.2.0.6/ correctly? “orcale” – oops. I was in /opt/oracle/product (or maybe /opt/oracle) at the time, so the “cd” failed but the “rm -Rf *” did not. I too walked the walk of shame to my boss and, in a crowded office full of DBAs, admitted to being a plank!

    Luckily we had backups.

    Cheers,
    Norm.

    1. Been there too.

      It’s a rite of passage.

      Show me a DBA who has never had to recover from tape and I’ll show you a COWARD. 🙂
