Exact Perl location with B::Deparse (and Devel::Callsite)

Recently I have been working on this cool idea: using B::Deparse to help me figure out exactly where a program is stopped. This can be used in a backtrace, such as when a program crashes from Carp::confess, or in a debugger like Devel::Trepan.

To motivate the idea a little bit, suppose my program has either of these lines:

$x = $a/$b + $c/$d;
($y, $z) = ($e/$f, $g/$h);

I might want to know which division in the line is giving me an illegal division by zero. A while back, with the help of PerlMonks, I found that using the OP address was the only promising avenue. More recently, I re-discovered B::Deparse and realized it might be able to do the rest: give the context around a specific op-code location. Devel::Callsite can be used to get your current op-code address.

B::Deparse is one of those things like the venerable perl debugger:

It is a brute-force effort with a long history; many people have contributed to it, and it is one huge file.

It has been said that nothing can parse Perl other than Perl. Well, nothing can de-parse Perl's OPs other than B::Deparse. It understands the Perl interpreter and its intricacies very well.

But the most important feature I need is that B::Deparse has a way of doing its magic inside a running program. You can give it a subroutine reference at runtime and it will deparse that.
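
As a minimal sketch of that runtime use, the snippet below hands a code reference to B::Deparse's coderef2text method and prints the regenerated body. The sub name ratio_sum and its variables are made up for illustration; only B::Deparse->new and coderef2text come from the module itself.

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical sub standing in for "some code in the running program".
sub ratio_sum {
    my ($p, $q, $r, $s) = @_;
    return $p / $q + $r / $s;
}

# coderef2text() returns the body of the sub, braces included,
# regenerated from the compiled OP tree rather than from the source text.
my $deparser = B::Deparse->new;
my $body     = $deparser->coderef2text(\&ratio_sum);

print "sub ratio_sum $body\n";
```

The printed text also includes the pragmas in effect (here strict and warnings), since those are part of what the compiled code records.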

A useful side benefit in B::Deparse's output is that it will split up multi-statement lines into one line per statement.


$x = 1; $y *= 2; $z = $x + $y;

will appear as:

$x = 1;
$y *= 2;
$z = $x + $y;

All good so far. The first piece of bad news is that it doesn't show the OP addresses. But that is pretty easily remedied.

Initially I figured I'd handle this the way I did when I wanted to show fragments of disassembly code colorized using B::Concise: I'd just dump everything to a buffer internally and then run some sort of text filtering process to get the part I wanted.
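
A rough sketch of that buffer-then-filter approach, using B::Concise's documented compile and walk_output functions outside the O framework. The fib sub here is only a stand-in, and the grep is an arbitrary example filter, not what Devel::Trepan actually does.

```perl
use strict;
use warnings;
use B::Concise ();

# Stand-in sub to disassemble.
sub fib {
    my $x = shift;
    return 1 if $x <= 1;
    return fib($x - 1) + fib($x - 2);
}

# Render the OP tree of fib() in terse style into $buf instead of STDOUT.
my $walker = B::Concise::compile('-terse', \&fib);
B::Concise::walk_output(\my $buf);
$walker->();

# Crude text filtering: keep only the lines describing binary ops.
my @binops = grep { /^\s*BINOP\b/ } split /\n/, $buf;
print "$_\n" for @binops;
```

The terse style includes each op's address in parentheses, which is what makes matching against an address from Devel::Callsite possible at all.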

So I monkey-patched and extended B::Deparse so I could search for an op address and have it return the closest COP; I then show that statement. This was released in version 0.70 of Devel::Trepan.

This is a hack though. It isn't really what I wanted. While showing just the addresses at COP or statement boundaries helps with multiple statements per line, it isn't all that helpful otherwise. In the first example, with a division by zero inside a sum or inside a parallel assignment, there would just be two COP addresses, and that's really no better than giving a line number. I need to add information about the sub-parts inside a statement.

So the next idea was to extend B::Deparse to store a hash mapping addresses (numbers) to B::OPs. Better. But not good enough. I would still need to do the part that B::Deparse does best: deparsing.
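
To make the address-to-op hash concrete, here is a small sketch that uses only the core B module, without touching B::Deparse. The names index_ops, two_shifts and %op_at are invented for this example; it ignores the extra branches that some op classes carry, such as the replacement code of a substitution.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

# Stand-in sub whose OP tree gets indexed.
sub two_shifts { my $x = shift; my $y = shift; return $x + $y }

# Recursively record every op under $op, keyed by its address.
# $$op is the numeric address of the underlying C-level OP.
sub index_ops {
    my ($op, $by_addr) = @_;
    return unless $$op;                  # a B::NULL object has address 0
    $by_addr->{$$op} = $op;
    return unless $op->flags & OPf_KIDS; # leaf ops have no children
    for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
        index_ops($kid, $by_addr);
    }
}

my %op_at;
index_ops(svref_2object(\&two_shifts)->ROOT, \%op_at);

printf "0x%x %s\n", $_, $op_at{$_}->name for sort { $a <=> $b } keys %op_at;
```

Given an address, $op_at{$addr} is then a B::OP object, but as noted it still has to be turned back into Perl text.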

Also, I want to have a way to easily go up the OP tree to get larger and larger context. For example, suppose the code is:

$x = shift; $y = shift;

and I report you are stopped at "shift". I would probably want to say: Give me the full statement that the "shift" is part of. This means in the OP tree I would want the parent. Although there is a way to compile Perl storing parent pointers, Perl generally isn't built that way. Given an OP address, I'm not sure how we could easily find its parent other than starting from the top and traversing.

So my current tack is sort of an abstract OP tree which stores text fragments for that node in the tree. As it walks the tree top down it saves parent pointers to the nodes it creates.

You may ask, what's the difference between this and the OP tree other than the parent pointer?

Well, recall that B::Deparse has already abstracted the OP codes from a lower-level form into higher-level constructs. This is more true as we move up the first couple of levels of the tree. The Perl output is generic and dumb, but it is still at a slightly higher level than the sequence of OP instructions.

Saving more of the tree structure can improve deparsing itself.

Right now B::Deparse walks the tree and builds Perl code expressions and statements bottom up. The main thing passed down right now is operator precedence, to reduce extraneous parentheses. At each level in the OP tree, the only information passed up from the children is generally just the result string.

In my B::DeparseTree, in addition to the text fragments, I keep child information in a more structured way, and a parent pointer is saved and available during processing.
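
The following is only a sketch of the general shape, not B::DeparseTree's actual data structure. It walks the OP tree top down with the core B module and builds one hash per op holding a parent link and a list of children. The field names and build_tree are hypothetical, and the op name stands in for the deparsed text fragment a real node would carry.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

# Stand-in sub to walk.
sub two_shifts { my $x = shift; my $y = shift; return $x + $y }

# Build one node per op: { op, name, parent, kids }.
# The parent link is filled in on the way down the tree.
sub build_tree {
    my ($op, $parent, $by_addr) = @_;
    my $node = { op => $op, name => $op->name, parent => $parent, kids => [] };
    $by_addr->{$$op} = $node;
    if ($op->flags & OPf_KIDS) {
        for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
            push @{ $node->{kids} }, build_tree($kid, $node, $by_addr);
        }
    }
    return $node;
}

my %node_at;
my $root = build_tree(svref_2object(\&two_shifts)->ROOT, undef, \%node_at);

# Starting from one of the "shift" ops, climb parent links up to the root.
my ($node) = grep { $_->{name} eq 'shift' } values %node_at;
my @path;
for (; $node; $node = $node->{parent}) { push @path, $node->{name} }
print join(' <- ', @path), "\n";
```

With parent links in place, "give me the statement this shift is part of" becomes a short climb rather than a fresh search from the top. The parent and kids links form reference cycles, so a longer-lived version would probably weaken the parent link with Scalar::Util::weaken.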

I close with some observations in using this. My first test was with Fibonacci:

sub fib($) {
    my $x = shift;
    return 1 if $x <= 1;
    return fib($x-1) + fib($x-2);
}

If you deparse while stopped in a debugger at the line with my $x = shift, you get:

shift()
# which is inside..
my $x = shift()

So far so good. Stepping to the next stopping point, inside the line with return 1 if $x <= 1, you get:

$x
# which is inside...
$x <= 1

Still good. Things start to get interesting when I take another step, into return fib($x-1) + fib($x-2). Deparsing doesn't find anything. Here's why:

(trepanpl): s
-- main::(example/fib.pl:11 @0x221dce8)
return(fib($x-1) + fib($x-2))
(trepanpl): deparse
# Nothing
(trepanpl): disasm -terse
Subroutine main::fib
-------------------
main::fib:
UNOP (0x221dc40) leavesub [1]
    LISTOP (0x21f9608) lineseq
#9: my $x = shift;
        COP (0x21f9650) dbstate
        BINOP (0x21f96b0) sassign
            OP (0x21f96f8) shift
            OP (0x21f9730) padsv [1]
#10: return 1 if $x <= 1;
        COP (0x2227e98) dbstate
        UNOP (0x2227ef8) null
            LOGOP (0x2227f38) and
                BINOP (0x2227f80) le
                    OP (0x2228008) padsv [1]
                    SVOP (0x2227fc8) const IV (0x4d25160) 1
                LISTOP (0x2228040) return
                    OP (0x21f9590) pushmark
                    SVOP (0x21f95c8) const IV (0x4d25238) 1
#11: return(fib($x-1) + fib($x-2))
        COP (0x221dc88) dbstate
        LISTOP (0x221dd20) return
=>          OP (0x221dce8) pushmark
            BINOP (0x221dd68) add [6]
                UNOP (0x221dfb8) entersub [3]
                    UNOP (0x2227d00) null [149]
                        OP (0x2227cb0) pushmark
                        BINOP (0x2227d48) subtract [2]
                            OP (0x2227e10) padsv [1]
                            SVOP (0x2227d90) const IV (0x4d24f38) 1
                        UNOP (0x2227dd0) null [17]
                            SVOP (0x2227e50) gv GV (0x4d03b28) *fib
                UNOP (0x221ddb0) entersub [5]
                    UNOP (0x221de28) null [149]
                        OP (0x221ddf0) pushmark
                        BINOP (0x221de70) subtract [4]
                            OP (0x221df38) padsv [1]
                            SVOP (0x221deb8) const IV (0x4d24e30) 2
                        UNOP (0x221def8) null [17]
                            SVOP (0x221df78) gv GV (0x4d03b28) *fib

The next instruction to be executed is a pushmark, and B::Deparse skips that when it processes the LISTOP. My remedy here was to note in the structure the other ops underneath that are "skipped" or subsumed in the parent operation.

After fixing this the output is:

return (fib($x - 1) + fib($x - 2))
# part of...
sub fib($) {
    # line 9 'example/fib.pl'
    # ... rest of fib code

Stepping recursively into fib you get the last weirdness I encountered. Here is Devel::Trepan output so I can describe the situation better:

trepan.pl example/fib.pl
-- main::(example/fib.pl:14 @0x21798a8)
printf "fib(2)= %d, fib(3) = %d, fib(4) = %d\n", fib(2), fib(3), fib(4);
set auto eval is on.
(trepanpl): b 9 # first statement in fib
Breakpoint 1 set in example/fib.pl at line 9
(trepanpl): continue
xx main::(example/fib.pl:9 @0x217d268)
my $x = shift;
(trepanpl): continue # first recursive call
xx main::(example/fib.pl:9 @0x217d268)
my $x = shift;
(trepanpl): up
--> #1 0x39491f0 $ = main::fib(2) in file `example/fib.pl' at line 11
main::(example/fib.pl:11 @0x39491f0)
return(fib($x-1) + fib($x-2))
(trepanpl): deparse
fib($x - 2)
# part of...
fib($x - 1) + fib($x - 2)
(trepanpl):

I'm in fib($x-2)? No, I'm in the middle of evaluating fib($x-1)! What's going on?

The stopping location is really the point where I would continue. So fib($x-2) is where I would continue to after I return. To reinforce this, when I step into an invocation from fib($x-2) and do the same thing, I now see:

fib($x - 1) + fib($x - 2)
# part of
return (fib($x - 1) + fib($x - 2))

Which is saying I am stopped before the addition, just before the final return. A possible fix is to step back one instruction when the location is inside a call. I dunno.

In sum, this is all pretty powerful stuff. It's also a lot of work.