User:Dhuseby: Difference between revisions
Revision as of 19:51, 29 July 2013
Bug 895156 Notes
The only sure way to catch all calls to dlopen from an executable and any libraries it is linked with is to use the LD_PRELOAD feature. LD_PRELOAD allows us to force the loading of a dynamically linked library (.so) before an executable is loaded. This allows us to override any symbols in dynamically linked libraries that the executable depends on.
For Bug 895156 we want to override dlopen and dlclose so that we can execute code to load the ARM EABI exception handling data and any DWARF data (see Bug 895162) for each library dynamically loaded at runtime. We will create a preload .so "shim" that exports our own dlopen and dlclose functions, which execute our custom code and then fall through to the default system dlopen and dlclose.
The shim code is extremely simple (gecko_profiler_preload.c):
#define _GNU_SOURCE /* needed for RTLD_NEXT on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

void * dlopen( const char * file, int mode )
{
    static void * (*sys_dlopen)( const char * file, int mode ) = NULL;

    /* look up the pointer to the system's dlopen function (once) */
    if ( sys_dlopen == NULL )
        sys_dlopen = (void*(*)(const char * file, int mode)) dlsym( RTLD_NEXT, "dlopen" );

    /* do custom code here to see if the exception handling and/or DWARF data has been
     * loaded already. if not, load it. */

    /* fall through to the system dlopen function */
    return (*sys_dlopen)( file, mode );
}

int dlclose( void * handle )
{
    static int (*sys_dlclose)( void * handle ) = NULL;

    /* look up the pointer to the system's dlclose function (once) */
    if ( sys_dlclose == NULL )
        sys_dlclose = (int(*)(void * handle)) dlsym( RTLD_NEXT, "dlclose" );

    /* do custom code here if needed. */

    /* fall through to the system dlclose function */
    return (*sys_dlclose)( handle );
}
Compiling this shim is easy too:
$ gcc -Wall -fPIC -shared -o gecko_profiler_preload.so gecko_profiler_preload.c -ldl
Then, to run gecko with this preload shim:
$ LD_PRELOAD=/path/to/gecko_profiler_preload.so ./gecko
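The "has the data been loaded already?" check that the dlopen comment leaves open could be sketched roughly as follows. This is only an illustration, not the code from the bug: the function name profiler_track_library, the fixed table sizes, and the choice to key on the path string are all assumptions, and the sketch is not thread-safe.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size registry of library paths seen so far.
 * The names and limits here are illustrative assumptions. */
#define MAX_TRACKED_LIBS 64
#define MAX_LIB_PATH     256

static char   tracked_libs[MAX_TRACKED_LIBS][MAX_LIB_PATH];
static size_t tracked_count = 0;

/* Returns 1 if `file` was newly recorded (the caller should then load its
 * unwind data), and 0 if it was already recorded, is NULL, is too long,
 * or the table is full. */
int profiler_track_library( const char * file )
{
    size_t i;

    if ( file == NULL || strlen( file ) >= MAX_LIB_PATH )
        return 0;

    /* linear scan: the number of loaded libraries is expected to be small */
    for ( i = 0; i < tracked_count; i++ )
        if ( strcmp( tracked_libs[i], file ) == 0 )
            return 0;

    if ( tracked_count == MAX_TRACKED_LIBS )
        return 0;

    /* length was checked above, so the copy fits including the terminator */
    strcpy( tracked_libs[tracked_count++], file );
    return 1;
}
```

The overriding dlopen would call this before falling through to the system dlopen and only load the exception handling or DWARF data when it returns 1. A real version would probably also need to remove entries in dlclose and guard the table with a lock.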
Bug 895162 Notes
The stack walking code on our ARM targets is non-trivial. Our current approach is to use the exception handling data. The only problem is that the exception handling data wasn't intended for use in stack unwinding from every possible program counter (PC) location. In particular, the stack unwinding breaks when the PC is in a function prologue/epilogue. This is because no exceptions can be thrown in a function prologue/epilogue so the stack unwinding information for those PC locations is invalid.
I looked into using DWARF debugging information to help in the prologue/epilogue special case. DWARF debugging information can contain Line Number Program data. According to the DWARF standard, the Line Number Program data contains several flags related to prologue/epilogue and given address locations. Unfortunately, the flags are prologue_end and epilogue_begin. Those flags are intended for debuggers to determine entry and exit point line numbers of functions for entry/exit breakpoints. Given a random PC, there is no easy way to detect if it is inside of a function prologue/epilogue.
That leads me back to the DWARF Call Frame Information. This is the data added to an executable when the -fasynchronous-unwind-tables flag is set in GCC. Currently we can't use this information because it is huge and we run out of memory on the B2G devices.
The only obvious way forward is to have GCC generate and output all of the DWARF data, including the Call Frame Information, and then write a new tool that strips the DWARF information out of the ELF file, condenses it down to just the data we need (a prologue/epilogue test for a given PC), and writes it back to the ELF file. The profiler would then have the data it needs to unwind stacks correctly from any PC, including those inside a prologue or epilogue.
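To make the "prologue/epilogue test for a given PC" concrete, here is a minimal sketch of what the condensed data and its lookup might look like. The record layout (func_range), the function name, and the assumption of exactly one prologue and one epilogue per function are all hypothetical simplifications; the real tool's format has not been designed yet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical condensed record: one per function, table sorted by start
 * address, ranges non-overlapping. Assumes a single epilogue per function. */
typedef struct {
    uint32_t start;          /* first address of the function */
    uint32_t prologue_end;   /* first address after the prologue */
    uint32_t epilogue_begin; /* first address of the epilogue */
    uint32_t end;            /* one past the last address of the function */
} func_range;

/* Returns 1 if pc is inside a prologue or epilogue, 0 if it is inside a
 * function body, and -1 if pc is not covered by any record in the table. */
int pc_in_prologue_or_epilogue( const func_range * table, size_t count, uint32_t pc )
{
    size_t lo = 0, hi = count;

    /* binary search for the record whose [start, end) range contains pc */
    while ( lo < hi ) {
        size_t mid = lo + ( hi - lo ) / 2;
        const func_range * f = &table[mid];

        if ( pc < f->start )
            hi = mid;
        else if ( pc >= f->end )
            lo = mid + 1;
        else
            return ( pc < f->prologue_end || pc >= f->epilogue_begin ) ? 1 : 0;
    }
    return -1;
}
```

With four 32-bit words per function, the table stays far smaller than full Call Frame Information, which is the point of condensing it. Functions with several return paths would need more than one epilogue range per record.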