Looking up Source-Code Lines from GDB/Python (and OpenMP complications)

With the GDB Python bindings, there is currently no direct way to translate a function symbol into its source file and line range. It is possible with the gdb command line, though, with some more juggling:

```
(gdb) disassemble [function_name|*address]
Dump of assembler code for function .omp_ptask.:
   0x00000000004024a0 <+0>:     push   %rbp
   0x00000000004024a1 <+1>:     mov    %rsp,%rbp
   0x00000000004024a4 <+4>:     sub    $0x20,%rsp
   0x00000000004024a8 <+8>:     mov    %edi,-0x8(%rbp)
   0x00000000004024ab <+11>:    mov    %rsi,-0x10(%rbp)
   0x00000000004024af <+15>:    mov    -0x8(%rbp),%edi
   0x00000000004024b2 <+18>:    mov    %edi,-0x14(%rbp)
   0x00000000004024b5 <+21>:    mov    -0x10(%rbp),%rsi
   0x00000000004024b9 <+25>:    mov    (%rsi),%rsi
   0x00000000004024bc <+28>:    mov    (%rsi),%rdi
   0x00000000004024bf <+31>:    callq  0x4009f0 <foo>
=> 0x00000000004024c4 <+36>:    mov    -0x4(%rbp),%eax
   0x00000000004024c7 <+39>:    add    $0x20,%rsp
   0x00000000004024cb <+43>:    pop    %rbp
   0x00000000004024cc <+44>:    retq
End of assembler dump.
```

With disassemble, we know (gdb tells us) where a function begins and ends ... in memory. In theory, we just have to parse the second and penultimate lines of gdb.execute("disassemble {addr}", to_string=True), which hold the first and last instructions. But in practice, compilers may reorder the binary instructions (for optimization), so the first and last addresses do not necessarily correspond to the first and last source lines. It's safer to iterate through all of them. Then, gdb.find_pc_line(pc) tells us the source-code line matching each PC. There we are:
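The PC-extraction step can be sketched on its own. This version assumes the usual layout of GDB's disassemble output: an optional `=>` marker, then a hex address at the start of each instruction line. `parse_disassembly_pcs` is a hypothetical helper, and the sample text is an excerpt of the dump shown earlier. Taking `min()` and `max()` of the result gives the address range whatever the instruction order. Each PC would still have to go through `gdb.find_pc_line()` to get a source line.

```python
import re

# An instruction line starts with an optional "=>" (current-PC marker)
# followed by a hexadecimal address; header and footer lines do not.
_INSN_RE = re.compile(r"^\s*(?:=>)?\s*(0x[0-9a-fA-F]+)\s")


def parse_disassembly_pcs(disa):
    """Return the instruction addresses found in `disassemble` output."""
    pcs = []
    for line in disa.splitlines():
        m = _INSN_RE.match(line)
        if m:
            pcs.append(int(m.group(1), 16))
    return pcs


SAMPLE = """Dump of assembler code for function .omp_ptask.:
   0x00000000004024a0 <+0>:\tpush   %rbp
   0x00000000004024bf <+31>:\tcallq  0x4009f0 <foo>
=> 0x00000000004024c4 <+36>:\tmov    -0x4(%rbp),%eax
   0x00000000004024cc <+44>:\tretq
End of assembler dump.
"""

pcs = parse_disassembly_pcs(SAMPLE)
low, high = min(pcs), max(pcs)  # address range, regardless of order
```

Only the leading address is captured, so call targets such as `0x4009f0 <foo>` in the operands are ignored.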

```python
import logging

import gdb

log = logging.getLogger(__name__)


def get_function_fname_and_lines(fct_symb):
    """Return (filename, first_line, last_line) for a function gdb.Symbol."""
    fct_addr = int(fct_symb.value().address)  # long() under Python 2
    disa = gdb.execute("disassemble {:#x}".format(fct_addr), to_string=True)

    filename = fct_symb.symtab.filename
    from_line = fct_symb.line
    to_line = 0

    for disa_line in disa.split("\n"):
        if "Dump of assembler code" in disa_line:
            continue  # skip the header line
        if "End of assembler dump." in disa_line:
            break  # we're at the end
        try:
            # parse the PC value, e.g.
            # => 0x004009c1 <+32>: jmpq 0x401464 <main._omp_fn.0+2755>
            pc = int(disa_line.replace("=>", "").split()[0], 16)
        except (IndexError, ValueError):
            log.warning("Could not parse disassembly line ...")
            log.warning(disa_line)
            continue

        sal = gdb.find_pc_line(pc)
        if sal.symtab is None or sal.line == 0:
            continue  # hum, nothing known about that PC

        # check for consistency that the PC is in the right file
        if sal.symtab.filename != filename:
            log.info("not the right file, inlined ?")
            continue

        # if the function symbol doesn't specify its line
        if fct_symb.line == 0:
            if from_line == 0 or sal.line < from_line:
                from_line = sal.line

        # PCs may not be in order
        if sal.line > to_line:
            to_line = sal.line

    return filename, from_line, to_line
```

which gives:

```
(gdb) pi print(get_function_fname_and_lines(gdb.lookup_symbol("main")[0]))
('minimal_omp_threads.c', 26, 76)
```

OpenMP complications

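I wrote that function as part of my work on OpenMP (OMP) debugging. In OMP, compilers perform "outlining", the reverse of inlining: the body of a task or parallel region is moved into a separate function, which is then handed to the OpenMP runtime: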

```
#pragma omp task
foo1(&i, &j, &k);
```

becomes with GCC/GOMP:

```
main._omp_fn.3 (...)
{
  foo1 (...);
}

GOMP_task (main._omp_fn.3, ...);
```

Everything is fine here, and my code works well. But Intel OpenMP and LLVM/Clang implement outlining differently: instead of naming the outlined functions something like <parent>._omp_fn.<id>, they name them all ... .omp_microtask.! Thanks guys: now gdb.lookup_symbol(".omp_microtask.") always returns the same symbol (presumably the first one), and so my source-code lookup function always returns the same lines.
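To make the problem concrete, here is a toy model of a symbol table holding two outlined functions with the same name. It is hypothetical and not GDB code. A lookup by name can only ever return one of them, whereas a lookup by address (ranges kept sorted and searched with `bisect`) singles out the right one. GDB's real lookup order is not modeled here: "first match wins" only mirrors the behavior observed above. The addresses and line numbers are made up.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncRange:
    """One outlined function in a toy symbol table (hypothetical model)."""
    name: str
    start: int  # first address
    end: int    # one past the last address
    first_line: int
    last_line: int


class ToySymbolTable:
    def __init__(self, funcs):
        self.funcs = sorted(funcs, key=lambda f: f.start)
        self.starts = [f.start for f in self.funcs]

    def lookup_by_name(self, name):
        # Name lookup: the first match wins, so homonyms are unreachable.
        for f in self.funcs:
            if f.name == name:
                return f
        return None

    def lookup_by_pc(self, pc):
        # Address lookup: the last function starting at or before pc,
        # provided pc is still inside its range.
        i = bisect.bisect_right(self.starts, pc) - 1
        if i >= 0 and pc < self.funcs[i].end:
            return self.funcs[i]
        return None


# Made-up layout: two outlined functions sharing the same name.
table = ToySymbolTable([
    FuncRange(".omp_microtask.", 0x1000, 0x1040, 10, 11),
    FuncRange(".omp_microtask.", 0x1040, 0x10a0, 20, 25),
])
```

With this layout, `table.lookup_by_name(".omp_microtask.")` always yields the function at 0x1000. `table.lookup_by_pc(0x1040)` yields the second one.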

We do have the address of the function:

```
(Pdb) print fn
0x402340 <.omp_ptask.>
```

but gdb.lookup_symbol cannot do the lookup by address...

So let's go back to the GDB/Python documentation and see how we can fix that:

Function: gdb.lookup_symbol (name [, block [, domain]])

This function searches for a symbol by name. The search scope can be restricted to the parameters defined in the optional domain and block arguments.

That block argument looks promising (it is more or less the equivalent of a C scope). But where should I get it from?

I remember that [gdb.Frame](https://sourceware.org/gdb/current/onlinedocs/gdb/Frames-In-Python.html#Frames-In-Python) has a block() method:

```
(gdb) pi gdb.lookup_symbol(".omp_microtask.", gdb.selected_frame().block())[0]
<gdb.Symbol object at 0x7fc96e0883c8>
(gdb) pi get_function_fname_and_lines(...)
('minimal_omp_threads.c', 38, 39)
```

but that doesn't work the way I wanted when called from the task allocator function. In that case, we are in the scope of the task allocator function, which is here equivalent to the global scope, so the lookup always resolves to the first task...

So how do we get the right block? Back to the documentation, this time the Blocks page:

Function: gdb.block_for_pc (pc)

Return the innermost gdb.Block containing the given pc value. If the block cannot be found for the pc value specified, the function will return None.

Interesting! Furthermore:

Variable: Block.function

The name of the block represented as a gdb.Symbol. If the block is not named, then this attribute holds None. This attribute is not writable.

For ordinary function blocks, the superblock is the static block. However, you should note that it is possible for a function block to have a superblock that is not the static block – for instance this happens for an inlined function.
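gdb.block_for_pc() returns the innermost block. A PC that falls inside a nested C scope ({ ... }) therefore yields a block whose function is None. A small helper can climb Block.superblock until it reaches a block that names a function. This sketch relies only on the function and superblock attributes described above. `enclosing_function` is a hypothetical name, and the SimpleNamespace objects stand in for real gdb.Block objects. As I understand GDB's block model, an inlined function's block also carries a function symbol, so the walk would stop at the inlined callee rather than its caller.

```python
from types import SimpleNamespace


def enclosing_function(block):
    """Return the symbol of the nearest block that names a function.

    Works on anything with `function` and `superblock` attributes, such as
    gdb.Block; returns None when only unnamed (static/global) blocks remain.
    """
    while block is not None:
        if block.function is not None:
            return block.function
        block = block.superblock
    return None


# Stand-ins for gdb.Block: global -> static -> function -> nested scope.
global_block = SimpleNamespace(function=None, superblock=None)
static_block = SimpleNamespace(function=None, superblock=global_block)
func_block = SimpleNamespace(function=".omp_ptask. symbol",
                             superblock=static_block)
nested_block = SimpleNamespace(function=None, superblock=func_block)

# Inside GDB (assumed usage): enclosing_function(gdb.block_for_pc(pc))
```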


```
(Pdb) gdb.block_for_pc(0x402340).function
<gdb.Symbol object at 0x7f824e346300>  (.omp_ptask.)
```

So the final code for Intel OpenMP looks like this:

```python
fct_addr = ...  # "0x402340"
fct_symb = gdb.block_for_pc(int(fct_addr, 16)).function
my_gdb.get_function_fname_and_lines(fct_symb)
```

and that works well :-)
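One possible refinement, not part of the original code: once we have a gdb.Block, the text parsing of disassemble could be skipped. Instead, we can walk the block's address range with gdb.Architecture.disassemble(), which returns one dict per instruction with an "addr" key. The sketch makes three assumptions:

- Block.end points one past the block's last byte.
- The end_pc argument of disassemble() is inclusive.
- The function occupies a single contiguous range.

The gdb objects are passed in as parameters so the logic can run outside GDB. `line_range_from_block`, `FakeArch` and `fake_find_pc_line` are hypothetical, as are all addresses and line numbers.

```python
from types import SimpleNamespace


def line_range_from_block(block, arch, find_pc_line):
    """Return (filename, first_line, last_line) for the code of `block`.

    `block` is assumed to be a function block (block.function is set),
    `arch` anything with a gdb.Architecture-like disassemble() method and
    `find_pc_line` a gdb.find_pc_line-like callable.
    """
    filename = block.function.symtab.filename
    lines = []
    # Assumption: Block.end is one past the last byte, while
    # disassemble(start_pc, end_pc) includes end_pc, hence the "- 1".
    for insn in arch.disassemble(block.start, block.end - 1):
        sal = find_pc_line(insn["addr"])
        if sal.symtab is None or sal.line == 0:
            continue  # no line information for this PC
        if sal.symtab.filename != filename:
            continue  # code from another file, e.g. inlined
        lines.append(sal.line)
    if not lines:
        return filename, 0, 0
    return filename, min(lines), max(lines)


# Stand-ins so the logic can run outside GDB (made-up addresses and lines).
class FakeArch:
    def __init__(self, addrs):
        self.addrs = addrs

    def disassemble(self, start_pc, end_pc):
        return [{"addr": a} for a in self.addrs if start_pc <= a <= end_pc]


# 0x1010 lies just past the block end and must not be counted.
_LINES = {0x1000: ("t.c", 12), 0x1004: ("t.c", 15),
          0x1008: ("inl.h", 3), 0x100c: ("t.c", 11),
          0x1010: ("t.c", 99)}


def fake_find_pc_line(pc):
    fname, line = _LINES.get(pc, (None, 0))
    symtab = SimpleNamespace(filename=fname) if fname else None
    return SimpleNamespace(symtab=symtab, line=line)


fake_block = SimpleNamespace(
    start=0x1000, end=0x1010,
    function=SimpleNamespace(symtab=SimpleNamespace(filename="t.c")))
result = line_range_from_block(fake_block, FakeArch(sorted(_LINES)),
                               fake_find_pc_line)
```

Inside GDB, the call would presumably be `line_range_from_block(gdb.block_for_pc(pc), gdb.selected_frame().architecture(), gdb.find_pc_line)`. This has not been checked against the Intel OpenMP binaries discussed here.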