ThatJeffSmith

ORA-03113: end-of-file on communication channel

Don't ignore the warning signs. This crash was completely avoidable.

Not pretty, for cars or databases.

My database crashed today. My ‘production’ database that is. The one I rely on for product demonstrations, that thing I do that Quest gives me money for. Pretty funny considering in the last few weeks my blogs have included such gems as:

The cynics out there are thinking
Jeff, you crashed your database on purpose, just to give you an excuse for another blog post!

1 – I am not that smart, and
B – It crashed 15 minutes before I started a customer call.

Actually, it crashed several hours before that, the night before to be precise. I just didn’t notice it until I happened to need the database.

A Real DBA Would Have Caught This Immediately in the Alert Log

A half-decent DBA would have checked the alert log as soon as they noticed the instance was hosed. A DBA intern would have asked “Hey, isn’t there some log somewhere we can check to see what happened?” Yes, of course there is. Here’s the documentation for those developers that are morbidly curious.

Too Smart for My Own Good?

When I tried to connect to the database, I was told there was nothing handling that service. Well thank you Mr. Listener. I just assumed my database had shut down for some reason, or maybe the virtual image had a hiccup (this is pretty much what actually happened).

Instead of checking the alert log, I logged in as SYS to SQLPLUS and did a shutdown..except it was already shutdown. And then I did a STARTUP and got the ORA-03113.

I had never seen this before, what the heck?

So I go to Google, which takes me to OTN

845470:
First, check for network problems and review the SQL*Net setup. Also, look in the alert.log file for any errors.

I skimmed right over Mr or Missus 845470 advice who is labeled as a ‘Newbie.’

I am an Oracle ACE, surely I know better than this ‘n00b’!

Ten minutes into the situation, and I am still overlooking the obvious. Instead, I think – “Hey wait, I have backups! I’ll just restore my database and figure out what happened later.”

Guess what? Didn’t work.

I was able to restore and recover the database.

RMAN>ALTER DATABASE OPEN
“Sorry Sucker! We gots a ORA-03113 for you!”

OK, that’s not what Oracle told me, but it’s close. At this point, I was frustrated enough to start over and just do what I should have done an hour earlier.

Don't ignore the warning signs. This crash was completely avoidable.

Hey, ya know, maybe I’ll check the Alert.log and see what it says.

Tip #27 – Have the Alert.Log directory bookmarked so you can find it :)

It took me about 0.3 seconds to figure out what had happened once I saw the log. The database had ran out of space. No longer able to archive the redo logs, the database hung. Around the same time, the Virtual Machine ran out of memory, guess why? That’s right, the entire virtual disk was full. I’m not exactly sure if the memory was the chicken or the egg, but it didn’t really matter.

I did end up needing to recovery the database to make up for the crash. Once I freed up disk space, the database opened almost immediately.

Here’s what I saw in the log:

ALTER DATABASE OPEN
LGWR: STARTING ARCH PROCESSES
Thu May 19 11:11:02 2011
ARC0 started with pid=20, OS id=2912 
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Thu May 19 11:11:03 2011
ARC1 started with pid=21, OS id=2916 
Thu May 19 11:11:03 2011
ARC2 started with pid=22, OS id=2920 
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thu May 19 11:11:03 2011
ARC3 started with pid=23, OS id=2924 
Errors in file c:appadministratordiagrdbmsorclorcltraceorcl_arc2_2920.trc:
ORA-19504: failed to create file "C:APPADMINISTRATORPRODUCT11.2.0DBHOMEDATABASEARCHIVEARC0000001100_0664818017.0001"
ORA-27044: unable to write the header block of file
OSD-04008: WriteFile() failure, unable to write to file
O/S-Error: (OS 112) There is not enough space on the disk.

This Will Probably Never Happen To You

Most systems have more than 1 disk. Most systems have enough space that this is never an issue. Most systems have a process in place to warn when space is running low. Most DBAs actually check or read their Alert Logs on a regular basis.

However, if you ever come across a really mundane error message, before you jump in and start doing stuff, please just check the darn logs first.

I will give myself some credit (my ego demands constant feeding). I didn’t get flustered. I knew where to look when I wanted to look. And I knew how to fix the issue. Next time…there won’t be a next time.