Python and DTrace in build 65


A significant portion of the Xen control tools are written in Python, in particular xend. It's been somewhat awkward to observe what the daemon is doing at times, necessitating an endless cycle of 'add printf; restart' cycles. A while ago I worked on adding DTrace support to the Python packages we ship in OpenSolaris, and these changes have now made it into the latest build, 65.

As is the case with the other providers people have worked on such as Ruby and Perl, there's two simple probes for function entry and function exit. arg0 contains the filename, arg1 the function name, and arg2 has the line number. So given this simple script to trace the functions called by a particular function invocation, restricted to a given module name:

#!/usr/sbin/dtrace -ZCs

#pragma D option quiet

python$target:::function-entry
    /copyinstr(arg1) == $2 && strstr(copyinstr(arg0), $1) != NULL/ {
        self->trace = 1;
}

python$target:::function-return
    /copyinstr(arg1) == $2 && strstr(copyinstr(arg0), $1) != NULL/ {
        self->trace = 0;
}

python$target:::function-entry,python$target:::function-return
    /self->trace && strstr(copyinstr(arg0), $3) != NULL/ {
        printf("%s %s (%s:%d)\n", probename == "function-entry" ? "->" : "<-",
            copyinstr(arg1), copyinstr(arg0), arg2);
}

We can run it as follows and get some useful results:

# ./pytrace.d \"hg.py\" \"clone\" \"mercurial\" -c 'hg clone /tmp/test.hg'
-> clone (build/proto/lib/python/mercurial/hg.py:65)
-> repository (build/proto/lib/python/mercurial/hg.py:54)
-> _lookup (build/proto/lib/python/mercurial/hg.py:31)
-> _local (build/proto/lib/python/mercurial/hg.py:16)
-> __getattribute__ (build/proto/lib/python/mercurial/demandload.py:56)
-> module (build/proto/lib/python/mercurial/demandload.py:53)
...

Of course, this being DTrace, we can tie all of this into general system activity as usual. I also added "ustack helper" support. This is significantly more tricky to implement, but enormously useful for following the path of Python code. For example, imagine we want to look at what's causing write()s in the clone operation above. As usual:

#!/usr/sbin/dtrace -Zs

syscall::write:entry /pid == $target/
{
        @[jstack(20)] = count();
}

END
{
        trunc(@, 2);
}

Note that we're using jstack() to make sure we have enough space allocated for the stack strings reported. Now as well as the C stack, we can see what Python functions are involved in the user stack trace:

# ./writes.d -c 'hg clone /tmp/test.hg'
...
              libc.so.1`_write+0x15
              libc.so.1`_fflush_u+0x36
              libc.so.1`fflush+0x43
              libpython2.4.so.1.0`file_flush+0x2a
              libpython2.4.so.1.0`call_function+0x32a
              libpython2.4.so.1.0`PyEval_EvalFrame+0xbdf
                [ build/proto/lib/python/mercurial/transaction.py:49 (add) ]
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x732
              libpython2.4.so.1.0`fast_function+0x112
              libpython2.4.so.1.0`call_function+0xda
              libpython2.4.so.1.0`PyEval_EvalFrame+0xbdf
                [ build/proto/lib/python/mercurial/revlog.py:1137 (addgroup) ]
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x732
              libpython2.4.so.1.0`fast_function+0x112
              libpython2.4.so.1.0`call_function+0xda
              libpython2.4.so.1.0`PyEval_EvalFrame+0xbdf
                [ build/proto/lib/python/mercurial/localrepo.py:1849 (addchangegroup) ]
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x732
              libpython2.4.so.1.0`fast_function+0x112
              libpython2.4.so.1.0`call_function+0xda
              libpython2.4.so.1.0`PyEval_EvalFrame+0xbdf
                [ build/proto/lib/python/mercurial/localrepo.py:1345 (pull) ]
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x732
              libpython2.4.so.1.0`fast_function+0x112
              148

              libc.so.1`_write+0x15
              libc.so.1`_fflush_u+0x36
              libc.so.1`fclose+0x6e
              libpython2.4.so.1.0`file_dealloc+0x36
              libpython2.4.so.1.0`frame_dealloc+0x65
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x75c
              libpython2.4.so.1.0`fast_function+0x112
              libpython2.4.so.1.0`call_function+0xda
              libpython2.4.so.1.0`PyEval_EvalFrame+0xbdf
                [ build/proto/lib/python/mercurial/localrepo.py:1849 (addchangegroup) ]
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x732
              libpython2.4.so.1.0`fast_function+0x112
              libpython2.4.so.1.0`call_function+0xda
              libpython2.4.so.1.0`PyEval_EvalFrame+0xbdf
                [ build/proto/lib/python/mercurial/localrepo.py:1345 (pull) ]
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x732
              libpython2.4.so.1.0`fast_function+0x112
              libpython2.4.so.1.0`call_function+0xda
              libpython2.4.so.1.0`PyEval_EvalFrame+0xbdf
                [ build/proto/lib/python/mercurial/localrepo.py:1957 (clone) ]
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x732
              libpython2.4.so.1.0`fast_function+0x112
              libpython2.4.so.1.0`call_function+0xda
              148

Creating a ustack helper

As anyone who's come across the Java dtrace helper source will know, creating a ustack helper is rather a black art.

When a ustack helper is present, it is called in-kernel for each entry in a stack when the ustack() action occurs (source). The D instructions in the helper action are executed such that the final string value is taken as the result of the helper. Typically for Java, there is no associated C function symbol for the PC value at that point in the stack, so the result of the helper is used directly in the stack trace. However, this is not true for Python, so that's why you see a different format above: the normal stack entry, plus the result of the helper in annotated form where it returned a result (in square brackets).

The helper is given two arguments: arg0 is the PC value of the stack entry, and arg1 is the frame pointer. The helper is expected to construct a meaningful string from just those values. In Python, the PyEval_EvalFrame function always has a PyFrameObject * as one of its arguments. By having the helper look at this pointer value and dig around the structures, we can find pointers to the strings containing the file name and function, as well as the line number. We can copy these strings in, and, using alloca() to give ourselves some scratch space, build up the annotation string you see above.

Debugging helpers isn't particularly easy, since it lives and runs in probe context. You can use mdb's DTrace debugging facilities to find out what happened, and some careful mapping between the failing D instructions and the helper source can pinpoint the problem. Using this method it was relatively easy to get a working helper for x86 32-bit. Both SPARC and x86 64-bit proved more troublesome though. The problems were both related to the need to find the PyFrameObject * given the frame pointer. On amd64, the function we needed to trace was passing the arguments in registers, as defined architecturally, so the argument wasn't accessible on the stack via the frame pointer. On SPARC, the pointer we need was stored in a register that was subsequently re-used as a scratch register. Both problems were solved, rather cheesily, by modifying the way the function was called.

Tags: