Testing a C Library With Python


It’s still common for a systems library to be written in the default lingua franca, C, although Rust is encroaching, for good reasons.

However, when it comes to testing, things get tedious quickly: writing unit or component tests in C is a slow, bug-prone exercise. With libvfio-user, after fixing too many bugs that were due to the test rather than the test subject, I decided it would be worth looking at alternative approaches. The aim was to reduce the time it takes to develop unit/component tests.

Up until this point, we’d been using ctest, along with cmocka when we needed to mock out certain functions (such as socket handling). Leaving aside my strong feelings on these tools, this was rather unsatisfactory: libvfio-user effectively implements a (UNIX) socket server, but we weren’t actually testing round-trip interactions for the most part. In terms of code coverage, very little of use could be achieved via this unit-testing approach, while the “sample” client/server was too tedious to work with for testing purposes.

Python-based testing

After a quick proof of concept, it became clear that using Python would be a great choice to cover most of our testing needs. libvfio-user doesn’t ship with any client bindings, and, given that the main clients are qemu, cloud-hypervisor and SPDK, Python bindings would be of dubious utility.

As a result, we decided against “proper” Python bindings, auto-generated or otherwise, in favour of a small and simple approach. In particular, by using the terrible magic of ctypes, we could easily set up both client and server test cases that fully represent how the library works in real life.

So, instead of auto-generated bindings, we write - by hand - simple, thin layers of type wrappers:

class vfio_irq_info(Structure):
    _pack_ = 1
    _fields_ = [
        ("argsz", c.c_uint32),
        ("flags", c.c_uint32),
        ("index", c.c_uint32),
        ("count", c.c_uint32),
    ]
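
Because these wrappers are just fixed layouts over raw memory, converting between a structure and the bytes that go over the socket is trivial. A quick illustration (not from the test suite itself, just what ctypes gives you for free):

irq = vfio_irq_info(argsz=16, index=0, count=1)
data = bytes(irq)                             # structure -> wire format
back = vfio_irq_info.from_buffer_copy(data)   # wire format -> structure
assert back.count == 1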

… small harness routines for socket handling

def connect_client(ctx):
    sock = connect_sock()

    json = b'{ "capabilities": { "max_msg_fds": 8 } }'
    # struct vfio_user_version
    payload = struct.pack("HH%dsc" % len(json), LIBVFIO_USER_MAJOR,
                          LIBVFIO_USER_MINOR, json, b'\0')
    hdr = vfio_user_header(VFIO_USER_VERSION, size=len(payload))
    sock.send(hdr + payload)
    ...
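
The connect_sock() helper is elided above; a minimal sketch of what it might look like, assuming the server context was created against a well-known UNIX socket path (SOCK_PATH here is a stand-in name, not the project’s actual constant):

def connect_sock():
    # Assumes "import socket" at module level, and that SOCK_PATH matches
    # the path the server context was created with.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(SOCK_PATH)
    return sock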

… interacting with the library on the server side

def get_pci_header(ctx):
    ptr = lib.vfu_pci_get_config_space(ctx)
    return c.cast(ptr, c.POINTER(vfu_pci_hdr_t)).contents

… and so on. Writing this by hand might seem immensely tedious, but in practice, as it’s pretty much all boilerplate, it’s very quick to write and modify, and easily understandable; something that can rarely be said for any kind of auto-generated code.
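
The lib handle seen in get_pci_header() is simply the shared library loaded through ctypes. The plumbing might look something like this - the library name and the exact set of declared signatures are assumptions, not the project’s actual harness:

import ctypes as c

# Library name/path is illustrative; a real harness would point at the
# freshly built library.
lib = c.CDLL("libvfio-user.so", use_errno=True)

# Without explicit declarations, ctypes assumes int return types, which
# truncates pointers on 64-bit platforms; declare what we call directly.
lib.vfu_pci_get_config_space.argtypes = (c.c_void_p,)
lib.vfu_pci_get_config_space.restype = c.c_void_p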

Client/server interactions

Another observation was that, for the purposes of these tests, we don’t really need a client process and a server process: in fact, we don’t even need more than one thread of execution. If we make each test alternate between acting as the client and acting as the server, it becomes trivial to follow the control flow, and understanding logs, debugging, and so on becomes much easier. This is illustrated by the msg() helper:

def msg(ctx, sock, cmd, payload=bytearray(), expect_reply_errno=0, fds=None,
        rsp=True, expect_run_ctx_errno=None):
    """
    Round trip a request and reply to the server. vfu_run_ctx will be
    called once for the server to process the incoming message,
    @expect_run_ctx_errno checks the return value of vfu_run_ctx. If a
    response is not expected then @rsp must be set to False, otherwise this
    function will block indefinitely.
    """
    # FIXME: if expect_run_ctx_errno == errno.EBUSY, shouldn't that imply
    # that rsp == False?
    hdr = vfio_user_header(cmd, size=len(payload))

    if fds:
        sock.sendmsg([hdr + payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                                        struct.pack("I" * len(fds), *fds))])
    else:
        sock.send(hdr + payload)

    ret = vfu_run_ctx(ctx, expect_errno=expect_run_ctx_errno)
    if expect_run_ctx_errno is None:
        assert ret >= 0, os.strerror(c.get_errno())

    if not rsp:
        return

    return get_reply(sock, expect=expect_reply_errno)

We are operating as the client when we do the sendmsg(); the server then processes that message via vfu_run_ctx(), before we “become” the client again and receive the response via get_reply().
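
get_reply() isn’t shown here; a minimal sketch of what such a helper might look like, assuming the standard 16-byte vfio-user header (message ID, command, message size, flags, error) and simplifying the error checking - a real implementation would also look at the error bit in the flags field:

def get_reply(sock, expect=0):
    # Assumes "import struct" and "import os" as in the snippets above.
    buf = sock.recv(4096)
    # 16-byte vfio-user header: msg_id, cmd, msg_size, flags, error
    (msg_id, cmd, msg_size, flags, errno_val) = struct.unpack("HHIII", buf[:16])
    assert errno_val == expect, os.strerror(errno_val)
    # the reply payload follows the header
    return buf[16:]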

We can then implement an individual test like this:

def test_dma_region_too_big():
    global ctx, sock

    payload = vfio_user_dma_map(argsz=len(vfio_user_dma_map()),
        flags=(VFIO_USER_F_DMA_REGION_READ |
               VFIO_USER_F_DMA_REGION_WRITE),
        offset=0, addr=0x10000, size=MAX_DMA_SIZE + 4096)

    msg(ctx, sock, VFIO_USER_DMA_MAP, payload, expect_reply_errno=errno.ENOSPC)

which we can run via make pytest:

...
___________________________ test_dma_region_too_big ____________________________
----------------------------- Captured stdout call -----------------------------
DEBUG: quiescing device
DEBUG: device quiesced immediately
DEBUG: adding DMA region [0x10000, 0x80000011000) offset=0 flags=0x3
ERROR: DMA region size 8796093026304 > max 8796093022208
ERROR: failed to add DMA region [0x10000, 0x80000011000) offset=0 flags=0x3: No space left on device
ERROR: msg0x62: cmd 2 failed: No space left on device
...

This is many times easier to write and test than trying to do the same thing in C, whether as a separate client and server, or by attempting to use mocking. And we can be reasonably confident that the test is meaningful, as we are really executing all of the library’s message handling.
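
As an aside, the vfio_user_dma_map payload used in the test is just another of the hand-written wrappers; the field layout below follows the vfio-user DMA map message, but treat it as a sketch rather than a copy of the project’s definition. (The len(vfio_user_dma_map()) call in the test also suggests that Structure here is a thin local subclass of ctypes.Structure, adding a __len__ that returns sizeof.)

class vfio_user_dma_map(Structure):
    # sketch: layout per the vfio-user DMA map message
    _pack_ = 1
    _fields_ = [
        ("argsz", c.c_uint32),
        ("flags", c.c_uint32),
        ("offset", c.c_uint64),
        ("addr", c.c_uint64),
        ("size", c.c_uint64),
    ]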

Debugging/testing tools

With a little bit of tweaking, we can also use standard C-based tools like valgrind and gcov. Code coverage is simple: after defeating the mini-boss of cmake, we can run make gcov and get code-coverage results for all C code invoked via the Python tests - it just works!

Running Python tests with valgrind was a little harder: for leak detection, we need to make sure the tests clean up after themselves explicitly. But Python itself also has a lot of valgrind noise. Eventually we found that this valgrind invocation worked well:

	PYTHONMALLOC=malloc \
	valgrind \
	--suppressions=$(CURDIR)/test/py/valgrind.supp \
	--quiet \
	--track-origins=yes \
	--errors-for-leak-kinds=definite \
	--show-leak-kinds=definite \
	--leak-check=full \
	--error-exitcode=1 \
	$(PYTESTCMD)

We need to force Python to use the system allocator, and add a number of suppressions for internal Python valgrind complaints - I was unable to find a working standard suppression file for Python, so I had to construct one myself based upon the Python versions in our CI infrastructure.

Unfortunately, at least on our test systems, ASAN was completely incompatible, so we couldn’t directly run that for the Python tests.

Summary

The approach I’ve described here has worked really well for us: it no longer feels immensely tedious to add tests along with library changes, which can only help improve overall code quality. The tests are quick to run and modify, and for the most part it’s easy to understand what they are actually doing.

There have been a few occasions where ctypes has been difficult to work with - for me, the documentation is particularly sparse, and callbacks from the C library into Python are distinctly non-obvious - but we’ve so far always managed to battle through and twist it to our needs.

Doing things this way has a few other drawbacks: it’s not clear, for example, how we might test intermittent allocation failures, or other failure injection scenarios. It’s also not really suitable for any kind of performance or scalability testing.

I’m curious if others have taken a similar approach, and what their experiences might be.