Tuesday, June 1, 2010

Converting a physical address into the source code line: how to read a MAP file

In the previous note, C++ crash handlers, I described how to obtain the physical address of a hard crash and terminate the program gracefully. In this note I'll recap the steps needed to convert the physical address of the crash into the related source code line.



The first step is to rebuild your source code so that the linker generates the MAP file. In MS VC++6 go to Project Settings, open the Link tab and enable the Generate mapfile option. In the Project Options box below, add the switches /MAPINFO:EXPORTS /MAPINFO:LINES. When you rebuild the program, the linker will write a .map file; in our example the file is CRUnitTests.map. We'll refer to this simple example to understand the logic behind the map file.

To convert the physical address into the source line we need to:
  1. open the map file with a text editor (I like VIM!)
  2. check if the map file is correct
  3. identify the source file and the function name where the crash occurred
  4. find the line number within the source file
1.
The map file contains a lot of information about the program. However, what we really need to note is:
- the Preferred load address
- the public function information section, with the columns Address, Publics by Value, Rva+Base, Lib:Object. The Rva+Base is the starting address of the function
- and the line information of each file. That section starts with something like "Line numbers for .cpp" and is generated only if the /MAPINFO switches are set.

2.
To validate the map file you need to check that the crash address (in our example 0x00401602) lies between the preferred load address (0x00400000) and the last Rva+Base address in the public function section (0x0047f324). Tip: to jump quickly to the end of the public function section, search for the string "entry point at". In our example the crash address 0x00401602 lies between 0x00400000 and 0x0047f324, so the map file is correct.

3.
To find the file where the crash occurred, scan down the Rva+Base column of the public function section until you find the first function address that is greater than the crash address. The preceding entry is the function that crashed. Our example is so small that the identification is very easy:

Address Publics by Value Rva+Base Lib:Object
0001:000005a0 _main 004015a0 f CRUnitTests.obj
0001:000008b0 ??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@P6AAAV01@AAV01@@Z@Z 004018b0 f i CRUnitTests.obj
 
The crash occurred in the CRUnitTests file, in the function main because our crash address is before address 0x004018b0.
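The scan described in step 3 can be sketched in a few lines of C++. This is only an illustration, not part of the CrashHandler dll: the PublicSymbol struct and the findFunction name are made up for the sketch, and it assumes the rows of the public function section have already been parsed into a vector sorted by Rva+Base, which is the order they have in the map file.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// One row of the "Publics by Value" section (hypothetical type for this sketch).
struct PublicSymbol {
    std::uint32_t rvaBase;  // value of the Rva+Base column
    std::string   name;     // value of the Publics by Value column
};

// Returns the last symbol whose Rva+Base is not greater than crashAddress,
// or nullptr when the address lies before the first symbol.
// Assumes 'symbols' is sorted by rvaBase.
const PublicSymbol* findFunction(const std::vector<PublicSymbol>& symbols,
                                 std::uint32_t crashAddress)
{
    // First entry whose start address is greater than the crash address...
    auto it = std::upper_bound(
        symbols.begin(), symbols.end(), crashAddress,
        [](std::uint32_t addr, const PublicSymbol& s) { return addr < s.rvaBase; });
    if (it == symbols.begin())
        return nullptr;
    // ...and the preceding entry is the function that contains the address.
    return &*(it - 1);
}
```

With the two rows of the example, looking up 0x00401602 lands on _main, because 0x004015a0 is the last start address that does not exceed it.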
 
4.
To find the line number where the crash occurred we need to calculate the target offset as follows:
 
target = crash address - preferred load address - 0x1000
 
In our example the target is:
0x00401602 - 0x00400000 - 0x1000 = 0x602
Now search the line numbers section for the crunittests.cpp file:
 
Line numbers for .\CRUnitTests.obj(c:\crashhandler\crunittests.cpp) segment .text
5 0001:000005a0 7 0001:000005d1 10 0001:000005df 11 0001:000005e6
12 0001:000005ed 13 0001:00000607 16 0001:00000625 20 0001:0000062a

22 0001:00000707 23 0001:00000725 27 0001:0000072b 28 0001:000007c2
30 0001:000007e0 33 0001:000007e6 34 0001:000007ef

Find the closest address that isn't over the calculated target (0x602). In this case the closest address is 000005ed, which maps to line 12 of the source file. If you check the source code, you should find that the division by 0 simulation is exactly at line 12.
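Step 4 can also be sketched in code. Again this is just a sketch under assumptions: LineEntry, toSectionOffset and findLine are hypothetical names, the line/offset pairs are assumed to be already parsed from the "Line numbers for" section, and the 0x1000 constant is the same offset used in the formula above.

```cpp
#include <cstdint>
#include <vector>

// One "line offset" pair of the line numbers section (hypothetical type).
struct LineEntry {
    int           line;    // source line number
    std::uint32_t offset;  // offset part of the 0001:xxxxxxxx address
};

// target = crash address - preferred load address - 0x1000
std::uint32_t toSectionOffset(std::uint32_t crashAddress,
                              std::uint32_t preferredLoadAddress)
{
    return crashAddress - preferredLoadAddress - 0x1000;
}

// Returns the line whose offset is the closest one that is not over 'target',
// or -1 when every entry starts after the target.
int findLine(const std::vector<LineEntry>& entries, std::uint32_t target)
{
    int bestLine = -1;
    std::uint32_t bestOffset = 0;
    for (const LineEntry& e : entries) {
        if (e.offset <= target && (bestLine == -1 || e.offset >= bestOffset)) {
            bestLine = e.line;
            bestOffset = e.offset;
        }
    }
    return bestLine;
}
```

Fed with the first two rows of the listing above and the target 0x602, findLine picks offset 0x5ed and returns line 12, the same result as the manual lookup.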

This procedure works but it's pretty tedious. The next step is to enhance the CrashHandler dll so that it directly returns the source line instead of the physical address. Don't miss the next post!

Friday, May 28, 2010

C++ crash handlers

Thanks to the new generation of programming languages such as Java and C# most programmers don't need to worry about memory management, pointers and so on anymore... Most but not everyone, and... not me :(.
In fact, whether you have strict performance requirements or you're enhancing a 10-year-old program, you may need to spend entire days fighting the well-known C++ memory access violation issues. I'm in the second case, enhancing old and crappy code written more than 10 years ago, and I wanted to share some useful information that will hopefully help programmers to get out of, or at least to manage, wild memory errors!

The basic idea is that C++ is like a super-car: to drive a Ferrari you need to be a good driver, but if you're not such a good driver you can still drive a Ferrari, you just need to press the ESP button. :)
Since I'm not the Ayrton Senna of programming, and neither were the programmers who originally wrote the code, I decided to add an ESP to that C++ program: the so-called Crash Handler. Obviously I didn't invent the ESP for C++, someone already did it for me, but I got inspired by a great book any C++ programmer should read: Debugging Applications by John Robbins. The book and this article are focused on Microsoft Windows programming only.


Combining SEH and C++ exception handling, the Crash Handler is an exception filter that allows you to get control before the application crashes. Since you intercept the offending exception right before the crash, you can easily add the code needed to recover from it gracefully. Isn't that like an ESP?!?
In other words, what we're going to do is to use the standard C++ exception handling mechanism to catch SEH exceptions too.
If you're not familiar with SEH: Structured Exception Handling is a language-independent exception handling mechanism provided by the operating system for errors that occur at the OS level. For example, when your application tries to write to a memory address it is not allowed to access, the OS raises an access violation ("bad write") exception and your program can catch it thanks to the SEH support (you would use the __try/__except construct to do that).

So the first step is to "extend" the C++ exception mechanism to handle the SEH. We can easily do that using the C runtime library function _set_se_translator that lets you set a translator function that will be called when a structured exception occurs.

Let's implement the translate function in our CrashHandler dll project:
void NTException::translate(unsigned code, EXCEPTION_POINTERS* info)
{
    // Map the structured exception code to a C++ exception
    switch (code) {
    case EXCEPTION_ACCESS_VIOLATION:
        throw AccessViolation(info);
    default:
        throw NTException(info);
    }
}
Based on the code received, we implemented 2 exceptions: a generic NTException and a more specific AccessViolation that returns the exact memory address where the error occurred.
The NTException is implemented as follows:
NTException::NTException(const EXCEPTION_POINTERS* info)
    : mWhat("Win32 exception"),
      mWhere(info->ExceptionRecord->ExceptionAddress),
      mCode(info->ExceptionRecord->ExceptionCode)
{
    // Use a more descriptive message for the codes we know about
    switch (info->ExceptionRecord->ExceptionCode) {
    case EXCEPTION_ACCESS_VIOLATION:
        mWhat = "Access violation";
        break;
    case EXCEPTION_FLT_DIVIDE_BY_ZERO:
    case EXCEPTION_INT_DIVIDE_BY_ZERO:
        mWhat = "Division by zero";
        break;
    }
}
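The more specific handler for access violations is implemented as follows:
AccessViolation::AccessViolation(const EXCEPTION_POINTERS* info)
    : NTException(info), mIsWrite(false), mBadAddress(0)
{
    // ExceptionInformation[0] is 1 for a write access
    mIsWrite = info->ExceptionRecord->ExceptionInformation[0] == 1;
    // ExceptionInformation[1] is the address that could not be accessed.
    // The cast target was garbled in the original listing; NTException::Address
    // is assumed here to be the pointer typedef declared in the class.
    mBadAddress = reinterpret_cast<NTException::Address>(info->ExceptionRecord->ExceptionInformation[1]);
    // Thread context at the time of the exception (not used yet)
    CONTEXT *cstack = info->ContextRecord;
}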
As you can see, the EXCEPTION_POINTERS structure is the key to navigating through the exception information.
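To make the meaning of the two ExceptionInformation slots explicit, here is a small portable sketch that decodes them without pulling in windows.h. AccessKind, AccessViolationInfo and decodeAccessViolation are hypothetical names used only for illustration. For an access violation the first slot is 0 for a read, 1 for a write and 8 for a DEP (execute) violation, and the second slot is the address that could not be accessed.

```cpp
#include <cstdint>

// Kind of access that triggered the violation (hypothetical enum).
enum class AccessKind { Read, Write, Execute, Unknown };

// Decoded form of the two ExceptionInformation slots (hypothetical type).
struct AccessViolationInfo {
    AccessKind     kind;
    std::uintptr_t badAddress;
};

// 'information' stands in for ExceptionRecord->ExceptionInformation.
AccessViolationInfo decodeAccessViolation(const std::uintptr_t information[2])
{
    AccessViolationInfo result;
    switch (information[0]) {
    case 0:  result.kind = AccessKind::Read;    break;
    case 1:  result.kind = AccessKind::Write;   break;
    case 8:  result.kind = AccessKind::Execute; break;
    default: result.kind = AccessKind::Unknown; break;
    }
    result.badAddress = information[1];
    return result;
}
```

The AccessViolation constructor above does the same thing in a simpler form: it only distinguishes a write from everything else.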

To enable the CrashHandler we need to register the translator function in the program main:
_set_se_translator(NTException::translate);

We can use this code to test our CrashHandler dll:
#include "CrashHandler.h"
#include <eh.h>       // _set_se_translator
#include <cstring>    // strlen
#include <iostream>

int main()
{
    // register the translator function
    _set_se_translator(NTException::translate);

    try
    {
        const char* a = "";
        int k = 10 / strlen(a); // division by 0!
        std::cout << "TEST FAILED" << std::endl;
    }
    catch (const AccessViolation& e)
    {
        // the more specific exception must be caught first
        std::cerr << "Error " << e.what() << " at " << std::hex << e.where()
                  << ": Bad " << (e.isWrite() ? "write" : "read")
                  << " on " << e.badAddress() << std::endl;
        std::cout << "TEST PASSED FOR ACCESS VIOLATION" << std::endl;
    }
    catch (const NTException& e)
    {
        std::cerr << "Error " << e.what() << " (code " << std::hex << e.code()
                  << ") at " << e.where() << std::endl;
        std::cout << "TEST PASSED FOR NTException" << std::endl;
    }
    return 0;
}
Now you just need to compile, but first make sure that asynchronous exception handling is enabled at compile time. The default is the synchronous model (/EHsc). To enable asynchronous exception handling you need to explicitly add the /EHa switch to the compiler options. If that option is not enabled, your translate function will never be called and you will not catch the exceptions raised by the OS.
Note that the asynchronous model adds some overhead with an impact on performance because the compiler has to track the lifetime of objects to be able to unwind the exceptions at any point in the code.

Let's wrap it all up in an MS VC++6 project! The workspace should include:
  • a CrashHandler project to create the Dynamic Link Library
  • a CRUnitTests project to test the new exception handler library
You can download the fully functional DLL with the source code here.
If you download and execute the code you'll get the following message:

As you can see, the Division by zero is intercepted at the address 00401602, so the test passed.

Those are the essentials you need to write a robust application that is able to collect useful information and terminate gracefully in case of a crash.
We need 2 more things to get the whole picture:
  • to be able to read a MAP file to convert the crash address reported in the where() result of our CrashHandler.
  • and to enhance the where function to return the line of the source code (and not the physical address) that caused the crash.
In fact, having the physical address is not so useful if you can't relate it to the source code. C#, Java and other languages report the code line where the exception is thrown and you can walk the stack trace to identify the problem. We'll achieve this result in the next posts.

In the coming days, I'll first explain how to read a MAP file to convert a physical address into a source code line, and then we'll create a function that, given the physical address, returns the related source code line.






Tuesday, February 2, 2010

Overclocking INTEL i7 920 processor: why not?

You should not overclock your CPU because:
  • you void the warranty
  • there is a risk of damaging the CPU
But since I don't care about that, and since I decided I needed some more juice to run the CPU-hungry Flight Simulator, I went ahead and overclocked my wonderful CPU.

Basically, why should you overclock your CPU?
  • first because it is fun
  • second because modern CPUs have multiple cores but slow absolute CPU speed and very few applications take advantage of multiple cores
  • third because you want to learn something more about your computer's internals
To achieve good results we need the following ingredients:
  1. A CPU that is underpowered and therefore has a good margin of improvement. If you don't want to spend thousands of euros on an Intel Extreme Edition, you can get an Intel i7 920 (Bloomfield), a great CPU at a fair price (~250 euros).
  2. A good motherboard that makes overclocking easy via the BIOS settings. My favourite is the ASUS P6T Deluxe v2, which costs roughly 300 euros. Expensive but worth the money.
  3. Good DRAM modules (they make the difference in terms of stability). I got 6GB (3x2GB) of Crucial Ballistix DDR3-1333MHz 1.65V for 180 euros.
  4. A cool cooler to keep your CPU temperature low: I got the ASUS Triton 88 for 50 euros.

Now that we have the hardware we can start thinking about the overclock: the default frequency of the Intel i7 920 CPU is 2.66GHz; we want to bring that to at least 3.4GHz, gaining ~30% of CPU speed.

With an overclocked CPU it's crucial to keep its temperature under control: download the free Core Temp utility (google it) to monitor the CPU temperature and set temperature warning limits. The CPU temperature should stay at or below roughly 65C as per the Intel i7 920 specification.

The first test, before overclocking,  is to verify that the ASUS Triton 88 is doing a good job cooling the CPU: with the pc in idle (Windows XP loaded but no other program running) Core Temp reports 30C to 35C on each core. That is a great result, and is the confirmation that the CPU has room for improvement.

To overclock a processor a few simple calculations are needed. We first calculate the BCLK (Base Clock) needed to achieve the desired speed. Since we would like a CPU speed of 3.4GHz, the needed BCLK is:

BCLK = Target Speed / CPU Ratio = 3400 / 20 = 170.

The CPU Ratio is fixed at 20 in the i7 920.
Next we calculate the DRAM multiplier, which depends on the DRAM frequency. We got the 1333MHz DRAM so the multiplier is:

Multiplier = DRAM Frequency / BCLK = 1333 / 170 = 7.8 =~ 8

We need to choose the closest selectable integer in the BIOS settings, in this case it is 8.
The new DRAM frequency is:

New DRAM Frequency = BCLK * Multiplier = 170 * 8 = 1360MHz

The Uncore Frequency is:

UCLK = New DRAM Frequency * 2 = 1360 * 2 = 2720MHz

Finally, the QPI Link Data Rate should be the lowest selectable, in this case 6135MT/s; it can also be left on AUTO in the BIOS settings.
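The calculations above are simple enough to put into a tiny C++ helper, handy if you want to try other target speeds. It is only a sketch of the arithmetic in this note: OverclockPlan and planOverclock are made-up names, and it assumes the BIOS lets you select the nearest integer multiplier, which may not be true on every board.

```cpp
#include <cmath>

// Result of the overclocking arithmetic (hypothetical type for this sketch).
struct OverclockPlan {
    int bclkMHz;         // base clock
    int dramMultiplier;  // closest integer DRAM multiplier
    int dramMHz;         // new DRAM frequency
    int uncoreMHz;       // uncore frequency (twice the DRAM frequency)
};

OverclockPlan planOverclock(int targetCpuMHz, int cpuRatio, int ratedDramMHz)
{
    OverclockPlan plan;
    // Base clock = target speed / CPU ratio
    plan.bclkMHz = targetCpuMHz / cpuRatio;
    // Multiplier = rated DRAM frequency / base clock, rounded to the nearest integer
    plan.dramMultiplier = static_cast<int>(
        std::lround(static_cast<double>(ratedDramMHz) / plan.bclkMHz));
    // New DRAM frequency = base clock * multiplier
    plan.dramMHz = plan.bclkMHz * plan.dramMultiplier;
    // Uncore frequency = new DRAM frequency * 2
    plan.uncoreMHz = plan.dramMHz * 2;
    return plan;
}
```

Calling planOverclock(3400, 20, 1333) reproduces the numbers of this note: a base clock of 170, a multiplier of 8, DRAM at 1360MHz and the uncore at 2720MHz.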

Now we need to put these values into the BIOS to make it happen! Enter the BIOS, select the AI Tweaker tab and set:
  • AI Overclocking Tuner   [Manual]
  • CPU Ratio Setting          [20.0]
  • BCLK Frequency          [170]
  • PCIE Frequency            [100]
  • DRAM Frequency         [DDR3-1363]
  • UCLK Frequency          [2726MHz]
  • QPI Link Data Rate       [6135MT/s]
  • leave everything else on [AUTO]
Reboot.
This should result in a stable 3.4GHz CPU at 50-55C.

The i7 920 can be overclocked up to 4GHz, and some overclockers report that they reached 6GHz. To achieve these results you need to tune the CPU voltage manually, finding the lowest voltage at which the system stays 100% stable for at least an overnight run. That requires experience, and there is a good chance of damaging the CPU.

With a step by step approach, it was easy to achieve 4Ghz without any stability issue. Tip: I disabled the Hyper Threading functionality in the advanced BIOS settings to lower the temperature a bit.

Important: dear reader, this is not a tutorial on how to overclock the CPU. This is a note I took to keep track of what I did on my PC. If you follow this note you may damage your CPU. If you want, do it, but do it at your own risk: in other words don't blame me if something goes wrong!

Friday, January 15, 2010

That algorithm is so hungry!!

Time to write a hungry algorithm in C#. OK, I did my best to reduce its complexity, but there was so much data to process that I couldn't do any better than the first attempt. Low DRAM prices helped a bit to cut down costs, but they were not the complete solution to the OutOfMemoryException that, from time to time, was ruining the game.



What can you do if a program hits an OutOfMemoryException 20% of the time you execute a piece of code?
If your algorithm requires many objects that occupy a lot of memory, you can take advantage of the MemoryFailPoint class that you find in the System.Runtime namespace. The class allows you to check for sufficient memory before starting your hungry piece of code.
To use the class you just need to instantiate an object, passing the amount of memory (in megabytes) that the algorithm you're going to execute is expected to require.
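using System;
using System.Runtime;   // MemoryFailPoint

try
{
   // check that roughly 2 GB are available (the argument is in megabytes)
   using (MemoryFailPoint mem = new MemoryFailPoint(2000))
   {
      // execute hungry code here
   } // leaving the using block disposes mem and releases the reservation
}
catch (InsufficientMemoryException e)
{
   // gracefully recover in case of not enough memory
}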
The constructor first checks if there is enough space in the page file to satisfy the request. If the space is not available, a garbage collection is forced to try to free up some space. If the space is still not sufficient, it tries to expand the paging file. If the file cannot grow enough, an InsufficientMemoryException (derived from OutOfMemoryException) is thrown. Otherwise, the requested amount is recorded as reserved in a private static field defined within the class. At that point you can run your algorithm with a good chance of having enough memory; it is only a good chance because the reservation does not guarantee that the memory will be physically allocated to the algorithm. When your algorithm completes, make sure you call the Dispose() method to release the reserved resources.
The MemoryFailPoint class can be a good help to create a robust solution: it's not a guarantee but it helps to gather as much memory as possible providing an elegant way to gracefully recover from a memory issue (for example if the exception is thrown you could decide to split the algorithm execution in two runs and then merge back the results).