Friday, May 28, 2010

C++ crash handlers

Thanks to the newer generation of programming languages such as Java and C#, most programmers no longer need to worry about memory management, pointers and so on... Most, but not everyone, and... not me :(.
In fact, if you have strict performance requirements or you're enhancing a 10-year-old program, you may need to spend entire days fighting the well-known C++ memory access violation issues. I'm in the second case, enhancing old and crappy code written more than 10 years ago, and I want to share some useful information that will hopefully help programmers get out of, or at least manage, wild memory errors!

The basic idea is that C++ is like a supercar: to drive a Ferrari you need to be a good driver, but even if you're not, you can still drive one; you just need to press the ESP button. :)
Since I'm not the Ayrton Senna of programming, and neither were the programmers who originally wrote the code, I decided to add an ESP to that C++ program: the so-called Crash Handler. Obviously I didn't invent the ESP for C++; someone already did it for me. I was inspired by a great book any C++ programmer should read: Debugging Applications by John Robbins. The book and this article focus on Microsoft Windows programming only.


Combining SEH and C++ exception handling, the Crash Handler is an exception filter that lets you take control before the application crashes. Since you intercept the offending exception right before the crash, you can add the code needed to recover from it gracefully. Isn't that like an ESP?!?
In other words, we're going to use the standard C++ exception handling mechanism to catch SEH exceptions too.
If you're not familiar with SEH, Structured Exception Handling is a language-independent exception handling mechanism provided by the operating system for errors that occur at the OS or hardware level. For example, when your application tries to write to a memory address it is not allowed to access, the OS raises an access violation exception and your program can catch it thanks to SEH support (you would use the __try/__except construct to do that).

So the first step is to "extend" the C++ exception mechanism to handle SEH exceptions. We can easily do that with the C runtime library function _set_se_translator, which lets you set a translator function that is called whenever a structured exception occurs.

Let's implement the translate function in our CrashHandler dll project:
void NTException::translate(unsigned code, EXCEPTION_POINTERS* info)
{
    switch (code) {
    case EXCEPTION_ACCESS_VIOLATION:
        throw AccessViolation(info);
    default:
        throw NTException(info);
    }
}
Based on the code received, we implemented 2 exceptions: a generic NTException and a more specific AccessViolation that also reports the exact memory address where the error occurred.
The NTException is implemented as follows:
NTException::NTException(const EXCEPTION_POINTERS* info)
    : mWhat("Win32 exception"), mWhere(info->ExceptionRecord->ExceptionAddress), mCode(info->ExceptionRecord->ExceptionCode)
{
    switch (info->ExceptionRecord->ExceptionCode) {
    case EXCEPTION_ACCESS_VIOLATION:
        mWhat = "Access violation";
        break;
    case EXCEPTION_FLT_DIVIDE_BY_ZERO:
    case EXCEPTION_INT_DIVIDE_BY_ZERO:
        mWhat = "Division by zero";
        break;
    }
}
The more specific exception for access violations is implemented as follows:
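AccessViolation::AccessViolation(const EXCEPTION_POINTERS* info)
    : NTException(info), mIsWrite(false), mBadAddress(0)
{
    // ExceptionInformation[0] is 1 when the faulting access was a write
    mIsWrite = info->ExceptionRecord->ExceptionInformation[0] == 1;
    // ExceptionInformation[1] holds the address that could not be accessed.
    // Address is assumed to be a pointer typedef declared in NTException.
    mBadAddress = reinterpret_cast<NTException::Address>(info->ExceptionRecord->ExceptionInformation[1]);
    CONTEXT *cstack = info->ContextRecord; // register context at the time of the fault (unused here)
}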
As you can see, EXCEPTION_POINTERS is the key structure for navigating the exception information.
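The constructor above treats every access that is not a write as a read. In the Windows EXCEPTION_RECORD documentation, the first element of ExceptionInformation is 0 for a read, 1 for a write and 8 for a data execution prevention (DEP) violation. The following is only a sketch of how that flag could be decoded; accessKind and describeAccessViolation are hypothetical names that are not part of the CrashHandler sources, and they take plain integers so the snippet does not depend on the Windows headers.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helpers, not part of the original CrashHandler project.
// 'flag' stands for ExceptionInformation[0] of an access violation record.
inline const char* accessKind(std::uintptr_t flag)
{
    switch (flag) {
    case 0:  return "read";
    case 1:  return "write";
    case 8:  return "execute";   // DEP violation
    default: return "unknown";
    }
}

// 'address' stands for ExceptionInformation[1], the inaccessible address.
inline std::string describeAccessViolation(std::uintptr_t flag, std::uintptr_t address)
{
    std::ostringstream out;
    out << "Bad " << accessKind(flag) << " on 0x" << std::hex << address;
    return out.str();
}
```

Inside the AccessViolation constructor, such a helper could be fed with info->ExceptionRecord->ExceptionInformation[0] and info->ExceptionRecord->ExceptionInformation[1].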

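To enable the CrashHandler we need to register the translator at the start of the program's main function (the translator is set per thread, so every thread that needs it must register it):
_set_se_translator(NTException::translate);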

We can use this code to test our CrashHandler DLL:
#include "CrashHandler.h"
#include <cstring>   // strlen
#include <eh.h>      // _set_se_translator
#include <iostream>
int main()
{
    // register the translator function
    _set_se_translator(NTException::translate);

    try
    {
        const char* a = "";
        int k = 10 / strlen(a); // division by 0!
        std::cout << "TEST FAILED" << std::endl;
    }
    catch (const AccessViolation& e)
    {
        std::cerr << "Error " << e.what() << " at " << std::hex << e.where()
                  << ": Bad " << (e.isWrite() ? "write" : "read")
                  << " on " << e.badAddress() << std::endl;
        std::cout << "TEST PASSED FOR ACCESS VIOLATION" << std::endl;
    }
    catch (const NTException& e)
    {
        std::cerr << "Error " << e.what() << " (code " << std::hex << e.code()
                  << ") at " << e.where() << std::endl;
        std::cout << "TEST PASSED FOR NTException" << std::endl;
    }
    return 0;
}
All that's left is to compile, but first make sure that asynchronous exception handling is enabled at compile time. The default is the synchronous model (/EHsc). To enable asynchronous exception handling you need to explicitly add the /EHa switch to the compiler options. If that option is not enabled, your translate function will never be called and you won't catch the exceptions raised by the OS.
Note that the asynchronous model adds some overhead, with an impact on performance, because the compiler has to track object lifetimes so that it can unwind from an exception raised at any point in the code.

Let's wrap it all up in an MS VC++ 6 project! The workspace should include:
  • a CrashHandler project to create the Dynamic Link Library
  • a CHUnitTests project to test the new exception handler library
You can download the fully functional DLL with the source code here.
If you download and execute the code you'll get the following message:

As you can see, the Division by zero is intercepted at the address 00401602, so the test passed.

That is essentially all you need to write a robust application that can collect useful information and terminate gracefully in case of a crash.
We need 2 more things to get the whole picture:
  • to be able to read a MAP file to convert the crash address reported by the where() method of our CrashHandler.
  • and to enhance the where function to return the line of source code (and not the physical address) that caused the crash.
In fact, having the physical address is not so useful if you can't relate it to the source code. C#, Java and other languages report the code line where the exception is thrown and you can walk the stack trace to identify the problem. We'll achieve this result in the next posts.

In the coming days, I'll first explain how to read a MAP file to convert a physical address into a source code line, and then we'll create a function that, given the physical address, returns the related source code line.






Tuesday, February 2, 2010

Overclocking INTEL i7 920 processor: why not?

You should not overclock your CPU because:
  • you void the warranty
  • there is a risk of damaging the CPU
But since I don't care about that, and since I decided I needed some more juice to run the CPU-hungry Flight Simulator, I went ahead and overclocked my wonderful CPU.

Basically, why should you overclock your CPU?
  • first because it is fun
  • second because modern CPUs have multiple cores but relatively low clock speeds, and very few applications take advantage of multiple cores
  • third because you want to learn something more about your computer's internals
To achieve good results we need the following ingredients:
  1. A CPU that runs below its potential and therefore has a good margin for improvement. If you don't want to spend thousands of euros on an Intel Extreme Edition, you can get an Intel i7 920 (Bloomfield), a great CPU at a fair price (~250 euros).
  2. A good motherboard that makes overclocking easy via the BIOS settings. My favourite is the ASUS P6T Deluxe v2, which costs roughly 300 euros. Expensive but worth the money.
  3. Good DRAM modules (they make the difference in terms of stability). I got 6GB (3x2GB) of Crucial Ballistix DDR3-1333MHz 1.65V for 180 euros.
  4. A cool cooler to keep your CPU temperature low: I got the ASUS Triton 88 for 50 euros.

Now that we have the hardware we can start thinking about the overclock: the default frequency of the Intel i7 920 CPU is 2.66GHz; we want to bring that to at least 3.4GHz, gaining roughly 28% of CPU speed.

With an overclocked CPU it's crucial to keep its temperature under control: download the free Core Temp utility (google it) to monitor the CPU temperature and set temperature warning limits. The CPU temperature should stay at around 65C at most, as per the Intel i7 920 specification.

The first test, before overclocking, is to verify that the ASUS Triton 88 is doing a good job cooling the CPU: with the PC idle (Windows XP loaded but no other program running) Core Temp reports 30C to 35C on each core. That is a great result, and it confirms that the CPU has room for improvement.

To overclock a processor, a few simple calculations are needed. We first calculate the BCLK (Base Clock) needed to achieve the desired speed. Since we would like a CPU speed of 3.4GHz, the needed BCLK is:

BCLK = Target Speed / CPU Ratio = 3400 / 20 = 170MHz

The CPU Ratio is fixed at 20 in the i7 920.
Next we need to calculate the DRAM multiplier, which depends on the DRAM frequency. We have 1333MHz DRAM, so the multiplier is:

Multiplier = DRAM Frequency / BCLK = 1333 / 170 = 7.84 ≈ 8

We need to choose the closest integer selectable in the BIOS settings; in this case it is 8.
The new DRAM frequency is:

New DRAM Frequency = BCLK * Multiplier = 170 * 8 = 1360MHz

The Uncore Frequency is:

UCLK = New DRAM Frequency * 2 = 1360 * 2 = 2720MHz

Finally, the QPI Link Data Rate should be the lowest selectable, in this case 6135MT/s; it can also be left on AUTO in the BIOS settings.
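For anyone who wants to redo the arithmetic for a different target speed, the calculations above can be sketched as a small C++ helper. OverclockPlan and planOverclock are hypothetical names made up for this note, and the code assumes that any integer DRAM multiplier can be selected, which a real BIOS may not allow.

```cpp
#include <cmath>

// Hypothetical helper for this note; all frequencies are in MHz.
struct OverclockPlan {
    double bclkMhz;        // base clock needed for the target speed
    int    dramMultiplier; // integer closest to rated DRAM frequency / BCLK
    double dramMhz;        // DRAM frequency resulting from that multiplier
    double uclkMhz;        // uncore frequency, twice the DRAM frequency
};

inline OverclockPlan planOverclock(double targetMhz, double cpuRatio, double ratedDramMhz)
{
    OverclockPlan plan;
    plan.bclkMhz = targetMhz / cpuRatio;
    // Assumption: every integer multiplier is selectable in the BIOS.
    plan.dramMultiplier = static_cast<int>(std::lround(ratedDramMhz / plan.bclkMhz));
    plan.dramMhz = plan.bclkMhz * plan.dramMultiplier;
    plan.uclkMhz = plan.dramMhz * 2.0;
    return plan;
}
```

Calling planOverclock(3400.0, 20.0, 1333.0) reproduces the numbers of this note: a BCLK of 170, a multiplier of 8, 1360MHz DRAM and a 2720MHz uncore.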

Now we need to put these values into the BIOS to make it happen! Enter the BIOS, select the AI Tweaker tab and set the values below (the DRAM and UCLK entries are the BIOS presets closest to the computed 1360 and 2720):
  • AI Overclocking Tuner   [Manual]
  • CPU Ratio Setting       [20.0]
  • BCLK Frequency          [170]
  • PCIE Frequency          [100]
  • DRAM Frequency          [DDR3-1363]
  • UCLK Frequency          [2726MHz]
  • QPI Link Data Rate      [6135MT/s]
  • leave everything else on [AUTO]
Reboot.
This should result in a stable 3.4GHz CPU at 50-55C.

The i7 920 can be overclocked up to 4GHz, and some overclockers report reaching 6GHz. To achieve these results you need to manually tune the CPU voltage, finding the lowest voltage at which the system stays 100% stable for at least an overnight run. That requires experience, and there is a real risk of damaging the CPU.

With a step-by-step approach, it was easy to reach 4GHz without any stability issues. Tip: I disabled Hyper-Threading in the advanced BIOS settings to lower the temperature a bit.

Important: dear reader, this is not a tutorial on how to overclock the CPU. This is a note I took to keep track of what I did on my PC. If you follow this note you may damage your CPU. If you want, do it, but do it at your own risk: in other words don't blame me if something goes wrong!

Friday, January 15, 2010

That algorithm is so hungry!!

Time to write a hungry algorithm in C#. OK, I did my best to reduce its complexity, but there was so much data to process that I couldn't do any better than the first attempt. Low DRAM prices helped a bit to cut costs, but that was not a complete solution to the OutOfMemoryException that, from time to time, ruined the game.



What can you do if a program hits an OutOfMemoryException 20% of the time you execute a piece of code?
If your algorithm requires many objects that occupy a lot of memory, you can take advantage of the MemoryFailPoint class in the System.Runtime namespace. The class lets you check for sufficient memory before starting your hungry piece of code.
To use the class you just need to instantiate an object, passing the amount of memory (in megabytes) that the algorithm you're about to execute would require.
try
{
   // check that about 2GB (2000 MB) of memory is available
   using (MemoryFailPoint mem = new MemoryFailPoint(2000))
   {
      // execute hungry code here
   } // dispose to release resources
}
catch (InsufficientMemoryException e)
{
   // gracefully recover in case of not enough memory
}
The constructor first checks whether there is enough space in the page file to satisfy the request. If the space is not available, a garbage collection is forced to try to free up some space. If that is still not enough, it tries to expand the paging file. If the file cannot grow enough, an InsufficientMemoryException (derived from OutOfMemoryException) is thrown. Otherwise, if there is enough space, the requested amount is recorded in a private static field defined within the class. At that point you can run your algorithm with a good chance of having enough memory, but no guarantee: the reservation does not mean the memory will be physically allocated to the algorithm. When your algorithm completes, make sure you call the Dispose() method to release the reserved resources.
The MemoryFailPoint class can help a lot in building a robust solution: it's not a guarantee, but it helps to gather as much memory as possible and provides an elegant way to recover gracefully from a memory issue (for example, if the exception is thrown you could decide to split the algorithm execution into two runs and then merge the results).