Friday, May 28, 2010

C++ crash handlers

Thanks to the new generation of programming languages such as Java and C#, most programmers no longer need to worry about memory management, pointers and so on... Most, but not everyone, and... not me :(.
In fact, if you have strict performance requirements or you're enhancing a 10-year-old program, you may need to spend entire days fighting the well-known C++ memory access violations. I'm in the second case, enhancing old and crappy code written more than 10 years ago, and I wanted to share some useful information that will hopefully help programmers get out of, or at least manage, wild memory errors!

The basic idea is that C++ is like a super-car: to drive a Ferrari you need to be a good driver, but if you're not such a good driver you can still drive a Ferrari, you just need to press the ESP button. :)
Since I'm not the Ayrton Senna of programming, and neither were the programmers who originally wrote the code, I decided to add an ESP to that C++ program: the so-called Crash Handler. Obviously I didn't invent the ESP for C++; someone already did it for me, and I got inspired by a great book any C++ programmer should read: Debugging Applications by John Robbins. The book and this article are focused on Microsoft Windows programming only.


Combining SEH and C++ exception handling, the Crash Handler is an exception filter that allows you to get control before the application crashes. Since you intercept the offending exception right before the crash, you can easily add the code needed to recover from it gracefully. Isn't that like an ESP?!?
In other words, what we're going to do is use the standard C++ exception handling mechanism to catch SEH exceptions too.
If you're not familiar with SEH, Structured Exception Handling is a language-independent exception handling mechanism provided by the operating system for errors that occur at the OS level. For example, when your application tries to write to a memory address it is not allowed to access, the OS raises a "bad write" exception (an access violation) and your program can catch it thanks to the SEH support (you would use the __try/__except construct to do that).
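
To make the __try/__except construct mentioned above concrete, here is a minimal sketch. The function names accessViolationFilter and writeThroughBadPointer are made up for illustration. __try/__except is an MSVC extension, so that part is guarded with _MSC_VER; the stand-in constants in the #else branch are only there so the filter logic compiles with other compilers (their values mirror the Windows headers).

```cpp
#if defined(_MSC_VER)
#include <windows.h>   // EXCEPTION_* constants, GetExceptionCode
#else
// Stand-ins for non-MSVC compilers; the values mirror the Windows headers.
const unsigned long EXCEPTION_ACCESS_VIOLATION = 0xC0000005UL;
const int EXCEPTION_EXECUTE_HANDLER = 1;
const int EXCEPTION_CONTINUE_SEARCH = 0;
#endif

// Filter (hypothetical name): handle access violations here and let every
// other structured exception keep looking for a handler further up.
int accessViolationFilter(unsigned long code)
{
    return code == EXCEPTION_ACCESS_VIOLATION ? EXCEPTION_EXECUTE_HANDLER
                                              : EXCEPTION_CONTINUE_SEARCH;
}

#if defined(_MSC_VER)
// Writes through a null pointer and reports whether SEH intercepted it.
bool writeThroughBadPointer()
{
    bool caught = false;
    volatile int* p = 0;          // deliberately invalid address
    __try {
        *p = 42;                  // raises EXCEPTION_ACCESS_VIOLATION
    }
    __except (accessViolationFilter(GetExceptionCode())) {
        caught = true;            // we get here instead of crashing
    }
    return caught;
}
#endif
```

The drawback of using __try/__except directly is that it is a separate, compiler-specific construct living next to your normal try/catch blocks, which is why the rest of this post translates SEH exceptions into C++ ones instead.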

So the first step is to "extend" the C++ exception mechanism to handle the SEH. We can easily do that using the C runtime library function _set_se_translator that lets you set a translator function that will be called when a structured exception occurs.

Let's implement the translate function in our CrashHandler dll project:
void NTException::translate(unsigned code, EXCEPTION_POINTERS* info)
{
    switch (code) {
    case EXCEPTION_ACCESS_VIOLATION:
        throw AccessViolation(info);
    default:
        throw NTException(info);
    }
}
Based on the code received, we throw one of 2 exceptions: a generic NTException and a more specific AccessViolation that also reports the exact memory address where the error occurred.
The NTException is implemented as follows:
NTException::NTException(const EXCEPTION_POINTERS* info)
    : mWhat("Win32 exception"),
      mWhere(info->ExceptionRecord->ExceptionAddress),
      mCode(info->ExceptionRecord->ExceptionCode)
{
    switch (info->ExceptionRecord->ExceptionCode) {
    case EXCEPTION_ACCESS_VIOLATION:
        mWhat = "Access violation";
        break;
    case EXCEPTION_FLT_DIVIDE_BY_ZERO:
    case EXCEPTION_INT_DIVIDE_BY_ZERO:
        mWhat = "Division by zero";
        break;
    }
}
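The more specific handler for access violations is implemented as follows:
AccessViolation::AccessViolation(const EXCEPTION_POINTERS* info)
    : NTException(info), mIsWrite(false), mBadAddress(0)
{
    // ExceptionInformation[0]: 0 = read, 1 = write
    mIsWrite = info->ExceptionRecord->ExceptionInformation[0] == 1;
    // ExceptionInformation[1]: the address that could not be accessed.
    // The cast target was garbled in the original listing; a const void*
    // address type is assumed here.
    mBadAddress = reinterpret_cast<const void*>(info->ExceptionRecord->ExceptionInformation[1]);
    // CPU context at the time of the fault (not used yet)
    CONTEXT *cstack = info->ContextRecord;
}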
As you can see, the EXCEPTION_POINTERS structure is the key to navigating the exception information.
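
The class declarations themselves are not shown in this post, so here is a hedged sketch of what a header consistent with the calls used here (what(), where(), code(), isWrite(), badAddress()) could look like. The Address typedef and the member layout are my assumptions, not necessarily what the downloadable DLL contains, and the constructor is condensed (the division-by-zero cases are left out). The stand-in structs in the #else branch exist only so the sketch compiles off Windows; on Windows the real definitions come from <windows.h>.

```cpp
#include <cstdint>
#include <exception>

#if defined(_WIN32)
#include <windows.h>
#else
// Stand-ins for non-Windows builds; only the fields used below are mirrored.
typedef std::uintptr_t ULONG_PTR;
struct EXCEPTION_RECORD {
    unsigned long ExceptionCode;
    void*         ExceptionAddress;
    ULONG_PTR     ExceptionInformation[15];
};
struct CONTEXT {};
struct EXCEPTION_POINTERS {
    EXCEPTION_RECORD* ExceptionRecord;
    CONTEXT*          ContextRecord;
};
const unsigned long EXCEPTION_ACCESS_VIOLATION = 0xC0000005UL;
#endif

// Generic SEH exception (sketch; condensed version of the constructor).
class NTException : public std::exception {
public:
    typedef const void* Address;   // assumed address type

    explicit NTException(const EXCEPTION_POINTERS* info)
        : mWhat("Win32 exception"),
          mWhere(info->ExceptionRecord->ExceptionAddress),
          mCode(info->ExceptionRecord->ExceptionCode)
    {
        if (mCode == EXCEPTION_ACCESS_VIOLATION) mWhat = "Access violation";
    }

    const char* what() const noexcept override { return mWhat; }
    Address  where() const { return mWhere; }  // address of the faulting instruction
    unsigned code()  const { return mCode; }   // raw SEH exception code

    // Declared only; the definition is the translate function of this post.
    static void translate(unsigned code, EXCEPTION_POINTERS* info);

private:
    const char* mWhat;
    Address     mWhere;
    unsigned    mCode;
};

// Access violation: adds the kind of access and the address being accessed.
class AccessViolation : public NTException {
public:
    explicit AccessViolation(const EXCEPTION_POINTERS* info)
        : NTException(info),
          mIsWrite(info->ExceptionRecord->ExceptionInformation[0] == 1),
          mBadAddress(reinterpret_cast<Address>(
              info->ExceptionRecord->ExceptionInformation[1]))
    {}

    bool    isWrite()    const { return mIsWrite; }
    Address badAddress() const { return mBadAddress; }

private:
    bool    mIsWrite;
    Address mBadAddress;
};
```

Deriving from std::exception means that an existing catch (const std::exception&) block will also see these errors, which may or may not be what you want in old code.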

To enable the CrashHandler we need to register it in the program main:
_set_se_translator(NTException::translate);
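
One detail worth knowing: according to the CRT documentation, _set_se_translator returns the previously installed translator, and translators are kept per thread, so registering in main covers only the main thread. The sketch below wraps that in a small RAII helper. ScopedSeTranslator is a hypothetical name, not part of the CRT or of the CrashHandler DLL, and the #else branch is only a stand-in that mimics the "install and return the previous one" contract so the helper compiles with other compilers.

```cpp
#if defined(_MSC_VER)
#include <eh.h>        // _set_se_translator, _se_translator_function
#include <windows.h>   // EXCEPTION_POINTERS
#else
// Stand-ins for non-MSVC compilers (illustration only, not the real CRT).
struct EXCEPTION_POINTERS;
typedef void (*_se_translator_function)(unsigned int, EXCEPTION_POINTERS*);
inline _se_translator_function _set_se_translator(_se_translator_function fn)
{
    static _se_translator_function current = 0;
    _se_translator_function previous = current;
    current = fn;
    return previous;
}
#endif

// Installs a translator for the current thread and restores the previous
// one when the object goes out of scope.
class ScopedSeTranslator {
public:
    explicit ScopedSeTranslator(_se_translator_function fn)
        : mPrevious(_set_se_translator(fn)) {}

    ~ScopedSeTranslator() { _set_se_translator(mPrevious); }

    // The translator that was active before this object was created.
    _se_translator_function previous() const { return mPrevious; }

private:
    ScopedSeTranslator(const ScopedSeTranslator&);             // non-copyable
    ScopedSeTranslator& operator=(const ScopedSeTranslator&);

    _se_translator_function mPrevious;
};
```

With a helper like this, each thread function would start with something like ScopedSeTranslator guard(NTException::translate); so that SEH exceptions raised on that thread are translated too.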

We can use this code to test our CrashHandler dll:
#include "CrashHandler.h"
#include <cstring>   // strlen
#include <eh.h>      // _set_se_translator
#include <iostream>

int main()
{
    // register the translator function
    _set_se_translator(NTException::translate);

    try
    {
        const char* a = "";
        int k = 10 / strlen(a); // division by 0!
        std::cout << "TEST FAILED" << std::endl;
    }
    catch (const AccessViolation& e)
    {
        // most specific handler first: it also knows the bad address
        std::cerr << "Error " << e.what() << " at " << std::hex << e.where()
                  << ": Bad " << (e.isWrite() ? "write" : "read")
                  << " on " << e.badAddress() << std::endl;
        std::cout << "TEST PASSED FOR ACCESS VIOLATION" << std::endl;
    }
    catch (const NTException& e)
    {
        // any other structured exception, e.g. the division by zero above
        std::cerr << "Error " << e.what() << " (code " << std::hex << e.code()
                  << ") at " << e.where() << std::endl;
        std::cout << "TEST PASSED FOR NTException" << std::endl;
    }
    return 0;
}
You just need to compile now, but first make sure that asynchronous exception handling is enabled at compile time. The default is the synchronous model (/EHsc). To enable asynchronous exception handling you need to explicitly add the /EHa switch to the compiler options. If that option is not enabled, your translate function will never be called and you won't catch the exceptions raised by the OS.
Note that the asynchronous model adds some overhead that affects performance, because the compiler has to track the lifetime of objects so that it can unwind the stack from any point in the code.

Let's wrap it all up in a MS VC++6 project! The workspace should include:
  • a CrashHandler project to create the Dynamic Link Library
  • a CHUnitTests project to test the new exception handler library
You can download the fully functional DLL with the source code here.
If you download and execute the code you'll get the following message:

As you can see, the Division by zero is intercepted at the address 00401602, so the test passed.

That is essentially all you need to write a robust application that is able to collect useful information and terminate gracefully in case of a crash.
We need 2 more things to get the whole picture:
  • to be able to read a MAP file to convert the crash address reported by the where() function of our CrashHandler into a location in the source code.
  • and to enhance the where function to return the line of the source code (and not the physical address) that caused the crash.
In fact, having the physical address is not so useful if you can't relate it to the source code. C#, Java and other languages report the code line where the exception is thrown and you can walk the stack trace to identify the problem. We'll achieve this result in the next posts.
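
Just to give a feel for the address-to-symbol step, here is a rough sketch of the lookup idea only: given a table of function start addresses, find the function that contains a crash address. It does not parse a real MAP file, and SymbolTable, symbolFor and the symbol names used with it are all made up for illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical symbol table: start address of each function -> its name.
typedef std::map<std::uintptr_t, std::string> SymbolTable;

// Returns the symbol with the greatest start address that is not above
// `address`, or an empty string if `address` precedes every known symbol.
std::string symbolFor(const SymbolTable& symbols, std::uintptr_t address)
{
    // upper_bound gives the first symbol starting strictly after `address`,
    // so the one just before it is the candidate that contains the address.
    SymbolTable::const_iterator it = symbols.upper_bound(address);
    if (it == symbols.begin())
        return std::string();
    --it;
    return it->second;
}
```

A table filled with entries such as 0x401000 for main and 0x401500 for some helper function would map a crash address like 0x401602 to the helper, because that is the last function starting at or before the address.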

In the next few days, I'll first explain how to read a MAP file to convert a physical address into a source code line, and then we'll create a function that, given the physical address, returns the related source code line.