How Many LWPs are Enough LWPs?
Over Labor Day weekend I decided to re-install my local Triton. After getting everything back online, I noticed that any attempt to provision a zone with a delegated dataset would fail. The failed provision jobs had the error:
```
vmadm exited with code: 1: first of 2 errors: timed out waiting for /var/svc/provisioning to move for 9d591bec-774f-4afe-8dbc-e6c359849886
```
I found that the Triton Data Center private cloud documentation references this error message, and I tried its troubleshooting steps without success. It also suggested that one might need to consult the vmadm logs. Doing just that revealed this hint:
```
[2022-09-08T18:00:09.873Z] WARN: vmadmd/5401 on davao: (req_id=c4565909-175f-c4e6-d6f4-cf6648f61d7a, action=waitForProvisioning, vm=8d12c37b-7daf-4bba-b128-13bedd6e6bb0)
    svcs -xv output for 8d12c37b-7daf-4bba-b128-13bedd6e6bb0 at fail time:
    svc:/system/filesystem/local:default (local file system mounts)
      Zone: 8d12c37b-7daf-4bba-b128-13bedd6e6bb0
     Alias: delegate-test-7
     State: offline since September 8, 2022 at 05:55:58 PM UTC
    Reason: Start method is running.
       See: http://illumos.org/msg/SMF-8000-C4
       See: /zones/8d12c37b-7daf-4bba-b128-13bedd6e6bb0/root/var/svc/log/system-filesystem-local:default.log
    Impact: 13 dependent services are not running:
```
On the CN, the zone's root dataset had been installed, but the zone itself never finished coming up. `zfs list` revealed that the expected delegated dataset existed. I wanted to try booting the zone manually and poke around a bit further:
```
root@cn# zoneadm -z 649dc916-1a80-42ad-93c1-c2f8f0d9fdd6 boot
root@cn# zlogin 649dc916-1a80-42ad-93c1-c2f8f0d9fdd6
root@649dc916-1a80-42ad-93c1-c2f8f0d9fdd6# svcs
STATE          STIME    FMRI
... [snip unrelated services] ...
offline*       18:57:15 svc:/system/filesystem/local:default
```
The svcs(1) man page kindly reminded me that:
> An asterisk (*) is appended for instances in transition.
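That trailing asterisk makes stuck instances easy to filter for. A minimal sketch (the `in_transition` function name is mine, not part of SMF):

```shell
# Filter `svcs -a` output down to instances whose STATE column ends
# in '*', i.e. instances still in transition. Skips the header line.
in_transition() {
  awk 'NR > 1 && $1 ~ /\*$/'
}

# Usage on an illumos system:
#   svcs -a | in_transition
```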
It was clear that the service was never going to complete its transition, and the log file had nothing of use.
Inspecting the service's start method showed that it runs `zfs mount -va` at one point, and `ps` showed that a `zfs` command was indeed running:
```
root@649dc916-1a80-42ad-93c1-c2f8f0d9fdd6# ps aux
USER     PID %CPU %MEM    SZ   RSS TT      S    START  TIME COMMAND
root   21281  0.1  0.1  8804  7632 ?       S 18:57:08  0:08 /lib/svc/bin/svc.configd
root   21668  0.0  0.0  2216  1192 pts/2   O 18:58:51  0:00 ps aux
root   20955  0.0  0.0     0     0 ?       S 18:57:07  0:00 zsched
root   21258  0.0  0.0  2324  1440 ?       S 18:57:08  0:00 /sbin/init
root   21317  0.0  0.0  3964  2384 pts/2   S 18:57:09  0:00 /usr/bin/login -z global -f root
root   21318  0.0  0.0  3660  2548 pts/2   S 18:57:09  0:00 -bash
netadm 21336  0.0  0.0  3816  2512 ?       S 18:57:10  0:00 /lib/inet/ipmgmtd
root   21401  0.0  0.1  5556  3384 ?       S 18:57:12  0:00 /usr/sbin/nscd
root   21512  0.0  0.0  1948  1120 ?       S 18:57:14  0:00 /usr/lib/pfexecd
root   21592  0.0  0.1  4928  2980 ?       S 18:57:15  0:00 /sbin/sh /lib/svc/method/fs-local
root   21600  0.0  0.1 10092  6060 ?       S 18:57:15  0:00 /usr/sbin/zfs mount -va
root   21610  0.0  0.1  4928  2980 ?       S 18:57:15  0:00 /sbin/sh /lib/svc/method/fs-local
root   21617  0.0  0.1  8384  4356 ?       S 18:57:16  0:00 /usr/sbin/zfs mount -va
root   21620  0.0  0.0  1764  1180 console S 18:57:16  0:00 /etc/sulogin -
root   21621  0.0  0.0     0     0         Z           0:00
```
Curious what it was doing, I consulted pstack(1):
```
root@649dc916-1a80-42ad-93c1-c2f8f0d9fdd6# pstack 21617
... [snip lots of output] ...
-------------------- thread# 66 / lwp# 66 --------------------
 fffffc7fef040957 lwp_park (0, 0, 0)
 fffffc7fef039deb cond_wait_queue (4afeb8, 4afe60, 0) + 5b
 fffffc7fef03a23d cond_wait_common (4afeb8, 4afe60, 0) + 1dd
 fffffc7fef03a40c __cond_wait (4afeb8, 4afe60) + 6c
 fffffc7fef03a494 cond_wait (4afeb8, 4afe60) + 24
 fffffc7feeb29d90 zfs_taskq_thread (4afe40) + 160
 fffffc7fef0405fc _thrp_setup (fffffc7fef1b0240) + 6c
 fffffc7fef040910 _lwp_start ()
```
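With that many threads in the output, it helps to summarize rather than scroll. A hedged sketch (the `count_waiters` function name is mine) that tallies pstack thread headers against threads whose stacks show `cond_wait`:

```shell
# Summarize pstack output read from stdin: count thread headers and
# threads with a plain cond_wait frame somewhere in their stack.
count_waiters() {
  awk '
    /^-+ thread# / { threads++ }
    / cond_wait /  { waiters++ }
    END { printf "threads=%d cond_wait=%d\n", threads, waiters }
  '
}

# Usage on an illumos system:
#   pstack 21617 | count_waiters
```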
The process has a lot of LWPs, and they are all hanging out in `cond_wait`, parked under `zfs_taskq_thread` waiting for work.
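To put a number on "a lot", `ps` can report a per-process LWP count directly via the `nlwp` output column, which both illumos and Linux `ps` support; a sketch:

```shell
# Print pid, LWP count, and command name for every process,
# keeping the header line plus any zfs processes.
ps -eo pid,nlwp,comm | awk 'NR == 1 || $3 ~ /zfs/'
```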
While we're here, now is a good time to define a lightweight process. According to the Solaris Internals book, an LWP is "the kernel visible execution context for a thread". Every process has at least one LWP, and every additional thread in that process is backed by an LWP too.
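The one-LWP-per-thread relationship is easy to observe. On illumos the per-LWP entries live under `/proc/<pid>/lwp`; the Linux analogue, used in this portable sketch, is `/proc/<pid>/task`, and `ps -o nlwp` reports the same count on both systems:

```shell
# A single-threaded shell has exactly one thread, hence one LWP.
ls /proc/$$/task | wc -l   # thread (LWP) count for this shell
ps -o nlwp= -p $$          # ps reports the same number
```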
In Triton, the number of threads (LWPs) a zone can use is configurable per package. I had created some new packages during my re-install and set 400 LWPs for this particular package, as you can see from the prctl(1) output:
```
root@cn# prctl -i zone -n zone.max-lwps 649dc916-1a80-42ad-93c1-c2f8f0d9fdd6
zone: 14: 649dc916-1a80-42ad-93c1-c2f8f0d9fdd6
NAME            PRIVILEGE       VALUE    FLAG   ACTION          RECIPIENT
zone.max-lwps   usage             398
                privileged        400       -   deny                    -
                system          2.15G     max   deny                    -
```
Well, there's the problem! The zone is using 398 of the 400 allowed LWPs. When using delegated datasets, 400 LWPs is not enough for the `zfs mount` invocation that's needed. After a bit of testing, it seems necessary to have at least 625 LWPs in order to create a Joyent-branded zone with a delegated dataset; otherwise `svc:/system/filesystem/local:default` will hang forever, at least on the hardware that I'm using. It is worth noting that, without a delegated dataset, 250 LWPs will do (the lowest number I could create for a package), so I never encountered the issue until I went to use the delegated dataset feature.
Some open questions remain:

- At one point while debugging the issue, one of the `zfs mount` instances had over 500 LWPs. Why are so many LWPs needed for one process?
- Why are two instances of `/usr/sbin/zfs mount -va` running at once in the zone? Maybe those could be serialized to reduce the number of LWPs needed?