Automatic start/stop of Xen domains

After answering a query, I said I'd write a blog entry describing what changes we've made to support clean shutdown and start of Xen domains.

Bernd refers to an older method of auto-starting Xen domains used on Linux. In fact, this method has been replaced with the configuration parameters on_xend_start and on_xend_stop. Setting these can ensure that a Xen domain is cleanly shut down when the host (dom0) is shut down, and started automatically as needed. For somewhat obvious reasons, we'd like to have the same semantics as used with zones, if not quite the same implementation (yet, at least).
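As a rough illustration, here is what a domain configuration using these two parameters might look like. xm configuration files use Python syntax. The domain name is made up, and the 'start' value for on_xend_start is my assumption about the accepted values, so check it against your Xen version.

```python
# Hypothetical xm domain configuration fragment (Python syntax).
# The name below is invented for illustration.
name = "guest1"

# Start this domain automatically (assumed values: 'ignore' or 'start').
on_xend_start = "start"

# Ask the guest to shut down cleanly when the host goes down.
on_xend_stop = "shutdown"
```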

When I started looking at this, I realised that the community solution had some problems:

Clean shutdown wasn't the default

It seems obvious that by default I'd like my operating systems to shut down cleanly. Only in unusual circumstances would I be happy with an OS being unceremoniously destroyed. We modified our Xen gate to default to on_xend_stop=shutdown.

Suspend on shutdown was dangerous

It is possible to specify on_xend_stop=suspend; this will save the running state to an image file and then destroy the domain (like xm save). However, there is no corresponding on_xend_start setting, nor any logic to ensure that the values match. This appears to be both useless and dangerous, since starting a new domain with old file-system state from a suspended domain could be problematic. We've disabled this functionality.

Actions are tied into xend

This was the biggest problem for us: as modelled, if somebody stops xend, then all the domains would be shut down. Similarly, if xend restarts for whatever reason (say, a hardware error), it would start domains again. We've modified this on Solaris. Instead of xend operating on these values, we introduce a new SMF service, system/xctl/domains, that auto-starts/stops domains as necessary. This service is pretty similar to system/zones. We've set up the dependencies such that a restart of the Xen daemons won't cause any running domains to be restarted. For this to work properly within the SMF framework, we also had to modify xend to wait for all domains to finish their state transitions.
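To make the intended semantics concrete, here is a small model of the decision such a service has to make when it starts or stops. This is not the code from our changes: the function name and the dictionary layout are invented for illustration, and the defaults assume the behaviour described above (clean shutdown unless told otherwise, no auto-start unless requested).

```python
# Illustrative model only: plan_actions() and the dict layout are hypothetical.
DEFAULTS = {"on_xend_start": "ignore", "on_xend_stop": "shutdown"}

def plan_actions(domains, event):
    """Return (action, domain name) pairs for a service 'start' or 'stop'."""
    if event == "start":
        key, wanted = "on_xend_start", "start"
    elif event == "stop":
        key, wanted = "on_xend_stop", "shutdown"
    else:
        raise ValueError("event must be 'start' or 'stop'")
    # Only domains whose setting (or default) asks for the action are touched.
    return [(wanted, d["name"]) for d in domains
            if d.get(key, DEFAULTS[key]) == wanted]

domains = [
    {"name": "web", "on_xend_start": "start"},     # auto-start, default stop
    {"name": "scratch", "on_xend_stop": "ignore"},  # left alone both ways
]
start_plan = plan_actions(domains, "start")
stop_plan = plan_actions(domains, "stop")
```

The point of keeping this logic in its own service is that it runs when the service itself starts or stops, not whenever the Xen daemons happen to restart.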

You can find our changes here. And yes, we still need to take system/xctl/domains to PSARC.

Clean shutdown implementation

You might be wondering how the dom0 even asks the guest domains to shut down cleanly. This is done via a xenstore entry, control/shutdown. The control tools write a string into this entry, which is being "watched" by the domain. The kernel then reads the value and responds appropriately (xen_shutdown()), triggering a user-space script via the sysevent framework. If nothing happens for a while, it's possible that the script couldn't run for whatever reason. In that case, we time out and force a "dirty" shutdown from within the kernel.
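The "try the clean path, then fall back after a timeout" pattern can be sketched in user-space Python. This is only a model of the idea, not the kernel code: the function and parameter names are hypothetical, and the callbacks stand in for the user-space script and the forced shutdown.

```python
import threading

def handle_shutdown_request(run_clean_script, force_shutdown, timeout_s=1.0):
    """Model of the guest side: attempt a clean shutdown, force it on timeout.

    run_clean_script: callable returning True if the clean path completed.
    force_shutdown:   callable invoked if the clean path did not finish in time.
    """
    done = threading.Event()

    def worker():
        # Stands in for the user-space script triggered by the kernel.
        if run_clean_script():
            done.set()

    threading.Thread(target=worker, daemon=True).start()

    # Wait for the clean path; give up after timeout_s seconds.
    if done.wait(timeout_s):
        return "clean"
    force_shutdown()
    return "forced"

forced_calls = []
clean_result = handle_shutdown_request(lambda: True, lambda: None, 0.5)
forced_result = handle_shutdown_request(
    lambda: False, lambda: forced_calls.append("dirty"), 0.05)
```

In the first call the script succeeds and the result is "clean"; in the second it never reports success, so the timeout fires and the forced path runs.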