- Are You a Real DBA – I argue that you are indeed NOT if you have not went through the crash and recovery nightmare
- What Databases Should You Back Up – I argue that any database you depend on should be backed up, production or not.
The cynics out there are thinking
Jeff, you crashed your database on purpose, just to give you an excuse for another blog post!
1 – I am not that smart, and
B – It crashed 15 minutes before I started a customer call.
Actually, it crashed several hours before that, the night before to be precise. I just didn’t notice it until I happened to need the database.
A Real DBA Would Have Caught This Immediately in the Alert Log
A half-decent DBA would have checked the alert log as soon as they noticed the instance was hosed. A DBA intern would have asked “Hey, isn’t there some log somewhere we can check to see what happened?” Yes, of course there is. Here’s the documentation for those developers that are morbidly curious.
Too Smart for My Own Good?
When I tried to connect to the database, I was told there was nothing handling that service. Well thank you Mr. Listener. I just assumed my database had shut down for some reason, or maybe the virtual image had a hiccup (this is pretty much what actually happened).
Instead of checking the alert log, I logged in as SYS to SQLPLUS and did a shutdown..except it was already shutdown. And then I did a STARTUP and got the ORA-03113.
I had never seen this before, what the heck?
So I go to Google, which takes me to OTN
First, check for network problems and review the SQL*Net setup. Also, look in the alert.log file for any errors.
I skimmed right over Mr or Missus 845470 advice who is labeled as a ‘Newbie.’
I am an Oracle ACE, surely I know better than this ‘n00b’!
Ten minutes into the situation, and I am still overlooking the obvious. Instead, I think – “Hey wait, I have backups! I’ll just restore my database and figure out what happened later.”
Guess what? Didn’t work.
I was able to restore and recover the database.
RMAN>ALTER DATABASE OPEN
“Sorry Sucker! We gots a ORA-03113 for you!”
OK, that’s not what Oracle told me, but it’s close. At this point, I was frustrated enough to start over and just do what I should have done an hour earlier.
Hey, ya know, maybe I’ll check the Alert.log and see what it says.
Tip #27 – Have the Alert.Log directory bookmarked so you can find it
It took me about 0.3 seconds to figure out what had happened once I saw the log. The database had ran out of space. No longer able to archive the redo logs, the database hung. Around the same time, the Virtual Machine ran out of memory, guess why? That’s right, the entire virtual disk was full. I’m not exactly sure if the memory was the chicken or the egg, but it didn’t really matter.
I did end up needing to recovery the database to make up for the crash. Once I freed up disk space, the database opened almost immediately.
Here’s what I saw in the log:
ALTER DATABASE OPEN LGWR: STARTING ARCH PROCESSES Thu May 19 11:11:02 2011 ARC0 started with pid=20, OS id=2912 ARC0: Archival started LGWR: STARTING ARCH PROCESSES COMPLETE ARC0: STARTING ARCH PROCESSES Thu May 19 11:11:03 2011 ARC1 started with pid=21, OS id=2916 Thu May 19 11:11:03 2011 ARC2 started with pid=22, OS id=2920 ARC1: Archival started ARC2: Archival started ARC1: Becoming the 'no FAL' ARCH ARC1: Becoming the 'no SRL' ARCH ARC2: Becoming the heartbeat ARCH Thu May 19 11:11:03 2011 ARC3 started with pid=23, OS id=2924 Errors in file c:appadministratordiagrdbmsorclorcltraceorcl_arc2_2920.trc: ORA-19504: failed to create file "C:APPADMINISTRATORPRODUCT11.2.0DBHOMEDATABASEARCHIVEARC0000001100_0664818017.0001" ORA-27044: unable to write the header block of file OSD-04008: WriteFile() failure, unable to write to file O/S-Error: (OS 112) There is not enough space on the disk.
This Will Probably Never Happen To You
Most systems have more than 1 disk. Most systems have enough space that this is never an issue. Most systems have a process in place to warn when space is running low. Most DBAs actually check or read their Alert Logs on a regular basis.
However, if you ever come across a really mundane error message, before you jump in and start doing stuff, please just check the darn logs first.
I will give myself some credit (my ego demands constant feeding). I didn’t get flustered. I knew where to look when I wanted to look. And I knew how to fix the issue. Next time…there won’t be a next time.