Generating assembly structure offset values with CTF

The Solaris kernel contains a fair amount of assembly, and this code often needs to access C structures. In particular, it needs to know the size of such structures and the byte offsets of their members. Since the assembler can't grok C, we need to provide constant values for it to use. The same applies to the C library and kmdb.

In the kernel, the header assym.h provides these values; for example:

#define T_STACK 0x4
#define T_SWAP  0x68
#define T_WCHAN 0x44

These values are the byte offsets of certain members within struct _kthread. For each type we want to reference from assembly, a template is provided in one of the offsets files. For the above, we can see in usr/src/uts/i86pc/ml/

_kthread        THREAD_SIZE
        t_pcb                   T_LABEL
        t_stk                   T_STACK
        t_lwpchan.lc_wchan      T_WCHAN
        t_flag                  T_FLAGS

This file lists structure names along with their members. Each member listed causes a define to be generated (the list need not be in order, nor complete). By default, the define's name is an uppercase version of the member name. As can be seen, this default can be overridden by giving an explicit #define name after the member. The THREAD_SIZE define corresponds to the size in bytes of the entire structure (it's also possible to generate a "shift" value, which is log2(size)).
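The naming rules above can be sketched in a few lines of C. This is only an illustration, not ctfstabs' actual code. The helpers define_name() and size_shift() are hypothetical, and we assume a shift value is only wanted when the structure size is a power of two.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: work out the #define name for one member entry.
 * An explicit name from the offsets file wins; otherwise the member
 * name is upper-cased. Returns 0 on success, -1 if out is too small.
 */
int
define_name(const char *member, const char *override, char *out,
    size_t outlen)
{
	size_t i, len;

	assert(member != NULL && out != NULL);

	if (override != NULL) {
		len = strlen(override);
		if (len >= outlen)
			return (-1);
		memcpy(out, override, len + 1);	/* copy the NUL too */
		return (0);
	}

	len = strlen(member);
	if (len >= outlen)
		return (-1);
	for (i = 0; i <= len; i++)	/* toupper('\0') is '\0' */
		out[i] = (char)toupper((unsigned char)member[i]);
	return (0);
}

/*
 * Hypothetical helper: the "shift" value for a structure size, i.e.
 * log2(size). Assumes this is only wanted for power-of-two sizes and
 * returns -1 otherwise.
 */
int
size_shift(size_t size)
{
	int shift = 0;

	if (size == 0 || (size & (size - 1)) != 0)
		return (-1);
	while (size > 1) {
		size >>= 1;
		shift++;
	}
	return (shift);
}
```

With this sketch, t_flag becomes T_FLAG. An entry such as t_lwpchan.lc_wchan really needs an explicit name, because its upper-cased form contains a dot and so isn't usable as a macro name.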

To produce a header with the offset and size values we need, a script first generates CTF data for the needed types and then uses this data to output the assym.h header. This is a Perl script called genoffsets, and the build invokes it with a command line akin to:

genoffsets -s ctfstabs -r ctfconvert cc < > assym.h

The hand-written offsets file serves as input to the script, which generates the header we need. The script takes the following steps:

  1. Two temporary files are generated from the input. One is a C file consisting of #includes and any other pre-processor directives. The other contains the meat of the offsets file.
  2. The C file containing all the includes is built with the compile line given (I have stripped the compiler options above for readability).
  3. ctfconvert is run on the built .o file.
  4. The pre-processor is run across the second file (the temporary offsets file).
  5. This pre-processed file is passed to ctfstabs along with the .o file.
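Step 1 amounts to a simple line classifier. The sketch below is written in C rather than genoffsets' Perl and is only an illustration of the split described above. The function split_input() and its buffer-based interface are hypothetical, and we assume any line whose first non-blank character is '#' counts as a pre-processor directive.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Append len bytes of s to buf (capacity cap); returns -1 on overflow. */
static int
append(char *buf, size_t cap, size_t *used, const char *s, size_t len)
{
	assert(*used < cap);
	if (*used + len >= cap)
		return (-1);
	memcpy(buf + *used, s, len);
	*used += len;
	buf[*used] = '\0';
	return (0);
}

/*
 * Hypothetical sketch of step 1: copy pre-processor directives (lines
 * whose first non-blank character is '#') into cbuf, the temporary C
 * file, and every other line into obuf, the temporary offsets file.
 * Returns 0 on success, -1 if either buffer is too small.
 */
int
split_input(const char *in, char *cbuf, size_t csz, char *obuf, size_t osz)
{
	size_t cused = 0, oused = 0;

	assert(csz > 0 && osz > 0);
	cbuf[0] = obuf[0] = '\0';

	while (*in != '\0') {
		const char *nl = strchr(in, '\n');
		size_t len = (nl != NULL) ? (size_t)(nl - in) + 1 : strlen(in);
		const char *p = in;
		int rc;

		/* Skip leading spaces and tabs to find the first real char. */
		while (p < in + len && isblank((unsigned char)*p))
			p++;

		if (p < in + len && *p == '#')
			rc = append(cbuf, csz, &cused, in, len);
		else
			rc = append(obuf, osz, &oused, in, len);
		if (rc != 0)
			return (-1);
		in += len;
	}
	return (0);
}
```

The C half can then be compiled and run through ctfconvert, while the remainder goes through the pre-processor before reaching ctfstabs (steps 2 to 5).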

ctfstabs reads the input offsets file and, for each entry, looks up the relevant value in the CTF data contained in the .o file passed to it. It has two output modes, or drivers (which I'll come to shortly); in this case we are using the genassym driver to output the C header. As you can see, this is a fairly simple process.
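The genassym mode boils down to "look up a value, print a #define". The sketch below is hypothetical and deliberately avoids libctf. Instead, a small table built with offsetof() on a made-up struct demo_thread stands in for the CTF data, since both describe the layout the compiler chose. The names member_info, lookup() and emit_define(), and the exact whitespace of the output line, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Made-up structure standing in for a kernel type such as _kthread. */
struct demo_thread {
	uint32_t	t_flag;
	uint32_t	t_pri;
	uint64_t	t_stk;
};

/* Stand-in for the CTF data: member names and their byte offsets. */
struct member_info {
	const char	*name;
	size_t		offset;
};

static const struct member_info demo_members[] = {
	{ "t_flag", offsetof(struct demo_thread, t_flag) },
	{ "t_pri", offsetof(struct demo_thread, t_pri) },
	{ "t_stk", offsetof(struct demo_thread, t_stk) },
};

/* Look up a member's offset; returns 0 and sets *off if found. */
int
lookup(const char *member, size_t *off)
{
	size_t i;

	for (i = 0; i < sizeof (demo_members) / sizeof (demo_members[0]); i++) {
		if (strcmp(demo_members[i].name, member) == 0) {
			*off = demo_members[i].offset;
			return (0);
		}
	}
	return (-1);
}

/* Format one assym.h-style line; returns -1 if out is too small. */
int
emit_define(char *out, size_t outlen, const char *name, size_t value)
{
	int n = snprintf(out, outlen, "#define %s\t0x%zx\n", name, value);

	assert(n >= 0);
	return ((size_t)n < outlen ? 0 : -1);
}
```

For this made-up layout, feeding lookup("t_stk", ...) into emit_define() with the name T_STACK gives a T_STACK define of 0x8. The real tool takes the offset from the CTF data in the .o file instead of from a table.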

A similar process is used to generate Forth debug files for use when debugging the kernel via the SPARC PROM. This uses a different offsets file format, better suited to generating the Forth debug macros, which is described in the forth driver.

To finish off the output header, the output of a small program called genassym (or, on SPARC, genconst) is appended. This program consists mostly of printfs of constants. Many of these don't actually need to be there, since they're simple constant defines and the assembly file could just include the right header, but others remain for reasons such as:

  • The macros that hide assembler syntax differences (such as _MUL) aren't implemented for the C compiler
  • The value is an enum type, which ctfstabs doesn't support
  • The constant is a complicated composed macro that the assembler can't grok

and other reasons. Whilst a lot of these could be cleaned up and removed from these files, it's probably not worth the development effort except as a gradual change.
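For comparison, the genassym/genconst side is ordinary C that prints values the compiler already knows. The sketch below is hypothetical: the enum demo_state and the macro DEMO_FLAGS are made up. It covers two cases from the list above that ctfstabs can't handle: an enum constant, and a composed macro the assembler couldn't evaluate. It writes to a caller-supplied FILE * rather than stdout so the output can be checked.

```c
#include <assert.h>
#include <stdio.h>

/* Made-up enum: ctfstabs can't emit enum values. */
enum demo_state { DEMO_FREE, DEMO_SLEEP, DEMO_RUN };

/* Made-up composed macro: the casts alone would defeat the assembler. */
#define	DEMO_SHIFT	4
#define	DEMO_FLAGS	((unsigned int)(1U << DEMO_SHIFT) | (unsigned int)DEMO_RUN)

/*
 * genassym-style output: the C compiler evaluates each constant and
 * we print it as a plain number the assembler can use.
 */
void
gen_constants(FILE *out)
{
	assert(out != NULL);
	(void) fprintf(out, "#define DEMO_STATE_RUN\t0x%x\n",
	    (unsigned int)DEMO_RUN);
	(void) fprintf(out, "#define DEMO_FLAGS\t0x%x\n", DEMO_FLAGS);
}
```

In a standalone tool, main() would presumably just call gen_constants(stdout), and the build would append that output to assym.h.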