Adding an external NIC to a Triton compute node

Oct 14, 2019

I found it a little bit non-obvious how to use NAPI to add an external NIC to a compute node so it can reach the external network rather than just the internal admin one.

We first need to tag the underlying physical NIC on the compute node with the external NIC tag. To do that, look up the MAC of the physical NIC:

computenode# dladm show-phys -m ixgbe0
LINK         SLOT     ADDRESS            INUSE CLIENT
ixgbe0       primary  e4:11:5b:97:83:49  yes  ixgbe0

then tell NAPI (from the headnode) that this NIC is going to provide the external tag:

sdc-napi /nics/e4:11:5b:97:83:49 -X PUT -d '{ "nic_tags_provided" : "external" }'
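If you want to script the lookup rather than copy the MAC by hand, a minimal sketch follows. The function name mac_from_show_phys is made up for illustration, and it assumes the column layout shown in the dladm output above.

```shell
# Print the ADDRESS column for one link, reading `dladm show-phys -m`
# output on stdin. mac_from_show_phys is a hypothetical helper name.
#   $1: link name, e.g. ixgbe0
mac_from_show_phys() {
    while read -r link slot addr rest; do
        if [ "$link" = "$1" ]; then
            echo "$addr"
            return 0
        fi
    done
    return 1
}

# Example (first line on the compute node, second on the headnode):
#   mac=$(dladm show-phys -m ixgbe0 | mac_from_show_phys ixgbe0)
#   sdc-napi "/nics/$mac" -X PUT -d '{ "nic_tags_provided" : "external" }'
```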

We now need to actually add the external VNIC in NAPI:

cn=*your compute node UUID from `sdc-server list`*
ip=*IP address to use on external network*
vlan_id=*vlan id if any*

owner=$(sdc-useradm get admin | json uuid)

sdc-napi /nics -X POST -d @- <<EOF
{
 "owner_uuid": "$owner",
 "belongs_to_type": "server",
 "belongs_to_uuid": "$cn",
 "cn_uuid": "$cn",
 "ip": "$ip",
 "vlan_id": "$vlan_id",
 "nic_tag": "external"
}
EOF
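For the "vlan id if any" case, here is one way the payload could be built so that an empty VLAN still produces valid JSON. This is only a sketch: nic_payload is a made-up helper, and it assumes NAPI accepts a numeric vlan_id and treats 0 as untagged.

```shell
# Print a NAPI NIC payload on stdout (nic_payload is a hypothetical helper).
#   $1: owner uuid   $2: compute node uuid   $3: IP address
#   $4: VLAN id, optional; assumed to default to 0 (untagged)
nic_payload() {
    cat <<EOF
{
  "owner_uuid": "$1",
  "belongs_to_type": "server",
  "belongs_to_uuid": "$2",
  "cn_uuid": "$2",
  "ip": "$3",
  "vlan_id": ${4:-0},
  "nic_tag": "external"
}
EOF
}

# Example:
#   nic_payload "$owner" "$cn" "$ip" "$vlan_id" | sdc-napi /nics -X POST -d @-
```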

After a while, we should find that the DHCPD server has updated the networking config file for the CN:

# cat /zones/$(vmadm list -Ho uuid alias=dhcpd0)/root/tftpboot/bootfs/e4115b978348/networking.json
{
  "nictags": [
    {
      "mtu": 1500,
      "name": "external",
      "uuid": "86b73953-488a-4041-bd7a-83aa51c4ca22"
    }
  ],
  ...
  "vnics": [
    {
      "belongs_to_type": "server",
      "nic_tag": "external",
      ...
    }
  ]
}

And on rebooting the CN, we can find our interface up, and reachable externally:

# ipadm show-addr external0/_a
ADDROBJ           TYPE     STATE        ADDR
external0/_a      static   ok 

Modifying boot files with SmartOS under Loader

Feb 12, 2019

With the advent of newboot in SmartOS/Triton, newly-installed systems will use loader as the bootloader, replacing grub. See RFD 156 for some technical background on the motivation of the switch.

It’s often the case that people want to make some modification to an /etc file in subsequent SmartOS boots. As we boot from ramdisk, we can’t just directly modify the files. As originally described on Keith’s blog, the way to get around this problem is to specify individual files that over-ride the defaults.

Obviously this has changed under loader.

NOTE: This is now documented on the SmartOS wiki at Modifying Boot Files. Please look there instead, as the below may not stay current.

Let’s presume we want to over-ride /etc/system to set kmem_flags. First, let’s take a copy of our file and edit it:

# sdc-usbkey mount
# mkdir -p /mnt/usbkey/bootfs/etc/ # or whatever
# cp /etc/system /mnt/usbkey/bootfs/etc/system    # or /mnt/usbkey/bootfs/dtrace.conf etc.
# echo "set kmem_flags=0xf" >>/mnt/usbkey/bootfs/etc/system

Now we want loader to prepare this file as a bootfs module. In grub, we used something like “module /bootfs/etc/system type=file name=etc/system”. For loader, it’s similar:

# cd /mnt/usbkey/boot
# grep etc_system loader.conf.local

The prefix (etc_system_) is fairly arbitrary, though it is often named after the module. For each file you want, specify a _load, _type, _name and _flags line. The _name entry is the path to the file for loader to use; the name= value in the _flags entry is the path, relative to /system/boot, at which the modified file will be available after booting.
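As a rough sketch of what those four lines could look like for the /etc/system example, the helper below prints them. bootfs_module_lines is a made-up name, and the exact variable suffixes and values are an assumption based on the description above, so check them against the SmartOS wiki page before relying on them.

```shell
# Print the four loader.conf.local lines for one bootfs file.
# bootfs_module_lines is a hypothetical helper; the suffixes are assumed.
#   $1: variable prefix (arbitrary)
#   $2: path loader reads the file from
#   $3: name the file gets under /system/boot
bootfs_module_lines() {
    printf '%s_load=YES\n' "$1"
    printf '%s_type=file\n' "$1"
    printf '%s_name=%s\n' "$1" "$2"
    printf '%s_flags="type=file name=%s"\n' "$1" "$3"
}

bootfs_module_lines etc_system /bootfs/etc/system etc/system
```

Redirecting that output with >>loader.conf.local from /mnt/usbkey/boot would append the entries.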

If this all worked OK, then we should see during boot something like:

Loading /os/20190207T125627Z/platform/i86pc/kernel/amd64/unix...
Loading /os/20190207T125627Z/platform/i86pc/amd64/boot_archive...
Loading /os/20190207T125627Z/platform/i86pc/amd64/boot_archive.hash...
Loading /bootfs/etc/system...
SunOS Release 5.11 Version joyent_20190207T125627Z 64-bit
Copyright (c) 2010-2019, Joyent Inc. All rights reserved.
WARNING: High-overhead kmem debugging features enabled (kmem_flags = 0xf)...

And we should find a copy of our modified file here:

# tail /system/boot/etc/system 
set kmem_flags=0xf

The kernel has a search path such that it will load from /system/boot prior to /. So the above is our active file, although /etc/system is still the standard shipped file.

My awesome download manager

Jan 30, 2019

Since Liferea in more recent versions requires a download manager (it does not attempt to deal with the constant “new” podcast downloads on broken RSS feeds), I tried a few different ones. None of them worked. The best of a bad bunch was uGet, but that still often got stuck in a busy loop, forgot where to download, failed to handle duplicates, etc.

I realised that in fact the best option was this marvellous piece of engineering:



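#!/bin/bash

readonly LOG_FILE="/var/tmp/download.log"
touch "$LOG_FILE"
# send all output, including errors, to the log
exec 1>>"$LOG_FILE"
exec 2>&1

#set -x

# skip URLs we have already fetched (exact, fixed-string match)
if grep -qxF -- "$1" ~/.downloaded; then
	echo "$(date): skipping $1"
	exit 0
fi

echo "$(date): downloading $1"

echo "$1" >>~/.downloaded

# my_download_dir must be set to the target directory
cd "${my_download_dir:?}" || exit 1

curl -RksSLJO "$1"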

Not exactly stunning but it works.

wodim under Ubuntu

Jan 24, 2019

wodim/k3b are unusable on Ubuntu out of the box: as well as making sure you’re in the cdrom group, you need to add this to /etc/security/limits.conf:

@cdrom - memlock unlimited

(Or some limit, I suppose, if you’re bothered.)
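A quick way to check both conditions from a fresh login session (limits.conf is only applied at login) might look like this. check_burn_setup is just an illustrative name.

```shell
# Report cdrom group membership and the current locked-memory limit.
# check_burn_setup is a hypothetical helper name.
check_burn_setup() {
    case " $(id -nG) " in
        *" cdrom "*) echo "group: cdrom ok" ;;
        *)           echo "group: not in cdrom" ;;
    esac
    echo "memlock: $(ulimit -l)"
}

check_burn_setup
```

With the limits.conf line above in effect, the memlock line should report unlimited.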

PCI pass-through support with bhyve and SmartOS

Oct 26, 2018

Some prompting on IRC led me to do this write-up on how to configure PCI passthrough for a bhyve instance running on SmartOS. Please be aware this isn’t necessarily fully supported or tested; it may work for you, it also may not.
Some of this is covered under RFD 114; the below is more of a HOWTO.

Global zone configuration

To allow a bhyve zone to access a PCI device, we need to prevent the global zone’s access to it, and make it available to bhyve zones. To do this, we need to make two overlay files available to the system via boot modules. Remember that the SmartOS root is an ephemeral ramdisk: as we need to change two files in /etc, we’ll have to modify our grub configuration:

Modify grub to include PPT config files

# mount our USB key (modify as needed, see diskinfo output)
mount -F pcfs -o foldcase /dev/dsk/c1t0d0p1 /mnt
vim /mnt/boot/grub/menu.lst

We want to modify the menu entry we’re booting to be something like this:

title my entry
    kernel$ /os/20181023T131405Z/platform/i86pc/kernel/amd64/unix ...
    module /os/20181023T131405Z/platform/i86pc/amd64/boot_archive type=rootfs name=ramdisk
    module /os/20181023T131405Z/platform/i86pc/amd64/boot_archive.hash type=hash name=ramdisk
    module /overlay/etc/ppt_aliases type=file name=etc/ppt_aliases
    module /overlay/etc/ppt_matches type=file name=etc/ppt_matches

Make sure to add the type entry on all module lines! Before we reboot, though, we need to actually populate these two files.

Setting ppt_matches

This file is a list of *all* devices that we might want to pass-through, in PCI ID form:

# cat /mnt/overlay/etc/ppt_matches

This file should contain the PCI ID of each type of device you want to pass through. (Please ignore all PCI specifics here; this is just for illustration.) Every device on the system that has these IDs will be listed (after a reboot) in pptadm list -a.

Setting ppt_aliases

The second file is used to actually reserve specific devices for pass-through, based on physical path. For example:

# cat /mnt/overlay/etc/ppt_aliases 
ppt "/[email protected],0/pci8086,[email protected]/[email protected]"
ppt "/[email protected],0/pci8086,[email protected]/pci1462,2291"

This binds the “ppt” driver to the given paths under /devices. On a reboot, the kernel will process this and attach ppt as needed. This driver stub makes sure that the host kernel won’t try to process the device itself.

Reboot the host

After we reboot, we should find our files are processed. They are visible under the path /system/boot - the existing /etc/ppt_matches will be over-ridden. The pptadm(1m) tool is a handy way of listing this configuration:

# pptadm list -a -o dev,vendor,device,path
/dev/ppt0  10de   a65    /[email protected],0/pci8086,[email protected]/[email protected]
/dev/ppt1  10de   be3    /[email protected],0/pci8086,[email protected]/pci1462,2291

We can see that two specific devices are now available for pass-through.

Zone configuration

Now we need to configure our VM to actually use this device. In the JSON for the VM, this looks something like this:

{
  "pci_devices": [
    {
       "path": "/devices/[email protected],0/pci8086,[email protected]/[email protected]",
       "pci_slot": "0:8:0"
    },
    {
       "path": "/devices/[email protected],0/pci8086,[email protected]/pci1462,2291",
       "pci_slot": "0:8:1"
    }
  ]
}

where path is the physical path, and pci_slot is what the guest will see (the usual bus:device:function triple). Passing the new JSON into vmadm update should allow the VM to boot with the new configuration.
You can check /zones/$uuid/logs/platform.log for any problems.
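As a sketch, applying the change and then looking at the log could be wrapped up like this. apply_pci_devices is a made-up helper, and the 20-line tail is an arbitrary choice.

```shell
# Apply a JSON update to a VM, then show the end of its platform log.
# apply_pci_devices is a hypothetical helper name.
#   $1: VM uuid   $2: file holding the JSON with "pci_devices"
apply_pci_devices() {
    vmadm update "$1" -f "$2" || return 1
    tail -n 20 "/zones/$1/logs/platform.log"
}

# Example:
#   apply_pci_devices "$uuid" /var/tmp/pci.json
```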

PCID support on Illumos

Feb 26, 2018

I joined Joyent at the start of the year while Meltdown was breaking news; it was certainly an “interesting” time to start a new job. Luckily by my first week, Alex and Robert had pretty much figured out how the changes should look and made good inroads on the implementation. So I began working with Alex on his KPTI trampoline code (mainly involving breaking it with my old friend KMDB). I also picked up the PCID work which I describe here.

As you can probably tell from Alex’s blog post, Meltdown is unusual for a security issue: aside from the usual operational pains of any security patch, the fix itself involved some pretty significant code changes to the low-level core of the kernel.

There’s also another potential impact, and that’s performance. While the actual overhead is heavily workload-dependent - and some of the reports out there seem pretty alarmist - having to switch page tables (i.e. reloading %cr3) on every kernel entry and exit has a non-trivial impact on system call cost. Nor can we keep the kernel state in the TLB. Previously, we would set PT_GLOBAL on kernel mappings so they’re not flushed across a %cr3 reload, but as the CPU would happily use these TLB entries to speculate into the kernel, we must flush them.

The good news is that there’s a CPU feature on reasonably recent Intel CPUs called Process Context IDs. This lets you load the lower bits of %cr3 with a small integer value. This ID is used as a tag in any TLB lookups or fills. This feature is somewhat similar to ASIDs seen on other architectures, with one notable difference. The PCID applies to TLB state implicitly, that is, there’s no way to say “load from memory using this ID” in ddi_copyin() and the like.

One way of using PCIDs is to associate an ID with a struct as: that is, each time we load a process’s address space into the HAT, we will use a specific PCID for it, and avoid having to flush the mappings for the previous processes. This isn’t really a viable option for Illumos, though: if nothing else we suspect that the additional shootdown flushes needed (since we’d maintain TLB entries even after switching away from a process’s struct as) would counteract any performance gain.

Instead we define two fixed PCID values. PCID_KERNEL, defined as 0 mainly to keep the boot process simple, is used for the kernel %cr3. Thus, all TLB loads while in the kernel will be tagged with this value. PCID_USER is used when in userspace. Now, when we switch %cr3 on kernel entry or exit, we can do a non-flushing load. This lets us keep both the kernel and the userspace mappings around across kernel/user transitions.
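To make the layout concrete, here is a small arithmetic sketch of how a PCID sits in the low 12 bits of a %cr3 value. It is illustration only, not kernel code: make_cr3 and cr3_pcid are made-up names, and PCID_USER=1 is an assumed value.

```shell
# With CR4.PCIDE set, the low 12 bits of %cr3 hold the PCID and the
# remaining bits hold the page-table base address.
PCID_KERNEL=0
PCID_USER=1              # assumed value, for illustration only
PCID_MASK=$((0xfff))

# Compose a %cr3 value from a 4K-aligned page-table address and a PCID.
make_cr3() {
    printf '0x%x\n' $(( ($1 & ~$PCID_MASK) | ($2 & $PCID_MASK) ))
}

# Extract the PCID tag from a %cr3 value.
cr3_pcid() {
    echo $(( $1 & $PCID_MASK ))
}

# On real hardware, setting bit 63 of the value written to %cr3 asks the
# CPU not to invalidate TLB entries for that PCID; that is the
# "non-flushing load", and it is not modelled here.
make_cr3 0x12345000 "$PCID_USER"
cr3_pcid 0x12345001
```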

When we do need to invalidate TLB entries, though, things are now slightly more complicated. We are by definition in the kernel (and hence using PCID_KERNEL), but we have to account for memory addresses below USERLIMIT. In this case, we have to flush both PCID_USER (for anything that ran in user mode) and PCID_KERNEL (for any accesses the kernel may have made such as with ddi_copyin()). hat_switch() is also a little more complicated. As the %cr3 load there is non-invalidating, we have to explicitly flush everything if we’re switching away from a non-kas HAT, to clear out now-stale user-space mappings. (Note that this has always been done eagerly on Illumos, even when switching to a kas HAT).

The INVPCID instruction is what enables us to flush PCID_USER while in the kernel. Unfortunately, support for INVPCID came quite some time after PCID itself. On systems that have PCID but lack INVPCID, we have to emulate, and the only way Intel gives us to do this is to load the ID into %cr3 before invalidating the TLB entries. We don’t want to “pollute” PCID_USER with any extraneous kernel mappings, so this means we need to switch to the user page tables when loading PCID_USER. But, remember, KPTI requires us not to have kernel text (or stack!) mapped into these page tables. So we have to first make sure we’re in the trampoline text before doing the invalidations: see tr_mmu_flush_user_range.

For those interested, Alex posted a draft webrev of the PCID changes.

Converting HTML mail via procmail

Nov 3, 2016

All the procmail recipes I found on a quick search failed to handle quoted-printable HTML encodings, regularly used everywhere. And those that had quoted-printable examples used tools no longer maintained - such as mimencode.

The solution is to use Perl directly:

:0
* ^Content-Type: text/html;
{
  :0 fbw
  * ^Content-Transfer-Encoding: *quoted-printable
  | perl -pe 'use MIME::QuotedPrint; $_=MIME::QuotedPrint::decode($_);'
  :0 fbw
  | lynx -dump -force_html -stdin
  :0 fhw
  | formail -i "Content-Type: text/plain; charset=us-ascii"
}

Ripping vinyl on Linux

Jan 10, 2014

I’ve been ripping a lot of stuff from vinyl to FLAC recently. Here’s how I do it.

I have an Alesis I/O 2, which works well and seems fairly decent quality.

The first, and most important, step is to stop trying to use Audacity. It’s incredibly broken and unreliable. Go get ocenaudio instead. It’s fairly new, but it works reliably.

After monitoring your levels, record the whole thing into ocenaudio.

First trim any obviously loud clicks such as when landing the needle. ocenaudio doesn’t seem to have a “draw sample” function yet, the only thing I miss from Audacity, but deleting just a few samples is usually fine.

Normalise everything.

Then select a whole track using Shift-arrows (and Control to go faster). Press Control-K to convert it into a region, and name it if you like.
You’ll see references to using zero-crossing finders to split tracks. This is always a bad idea - it’s simply not reliable enough, especially with an old crackly record, isopropyl’d or not.

Zoom all the way out again, make sure the number of tracks is right.

Then File->Export Audio From Regions, making sure that the “separate files” checkbox is set.

Now it’s tagging time: run “kid3 yourdirwithflacs”. First import from discogs, presuming it has the release (it usually will): File->Import From Discogs. Then click ‘Tag 2’ in the Format Up part, along with the format you need. Save all those, then use Tools->Rename Directory to rename the containing directory. You’re done.

Recording on Linux with Alesis io|2

May 31, 2012

A little note for myself: to get low-latency monitoring, and more importantly, record at the right rate, you need to set the Configuration-Profile to “Digital Stereo Input” in pavucontrol!

Update: you also need this in ~/.pulse/daemon.conf :


Another update: PA/ALSA often seems to forget the sensible default devices, and ocenaudio starts trying to record from the monitor devices. Solution seems to be to run pavucontrol, start ocenaudio recording, and change the drop down box to select io|2 Digital Stereo.

Old web content

Mar 15, 2011
I think it's important that everyone should endeavour to maintain existing web content, even if it's not currently relevant.