Yes, this was not fun. I took some notes, but I did not record everything.
Please read the notes if you plan on doing these foolish things.
I was cleaning up /usr/src on a host which does not hold src. I looked and found:
[dan@slocum:/usr] $ ls -l
total 122177
drwxr-xr-x   2 root  wheel        490 Aug 25 21:29 bin
drwxr-xr-x   2 root  wheel          2 Dec  4  2012 games
drwxr-xr-x   4 root  wheel          5 Oct  3  2014 home
drwxr-xr-x  55 root  wheel        313 Aug 17 18:23 include
drwxr-xr-x  41 root  wheel         41 Aug 18 14:16 jails
drwxr-xr-x   8 root  wheel        581 Aug 19 13:05 lib
drwxr-xr-x   5 root  wheel        597 Aug 19 13:05 lib32
drwxr-xr-x   6 root  wheel          6 Mar 13 18:22 libdata
drwxr-xr-x   8 root  wheel         66 Aug 17 18:24 libexec
drwxr-xr-x  19 root  wheel         21 Mar 13 18:48 local
drwxr-xr-x   3 root  wheel          3 Jan 16  2015 obj
drwxr-xr-x  70 root  wheel         88 Mar 27 17:06 ports
drwxr-xr-x   2 root  wheel        285 Aug 25 21:29 sbin
drwxr-xr-x  33 root  wheel         33 Aug 17 18:23 share
drwxr-xr-x  23 root  wheel         31 Aug 18 20:01 src
-rw-r--r--   1 root  wheel  124551336 Aug 12 15:44 src.txz
$ ls src
COPYRIGHT    Makefile           README    cddl     etc    include    libexec  sbin    sys    usr.bin
LOCKS        Makefile.inc1      UPDATING  contrib  games  kerberos5  release  secure  tests  usr.sbin
MAINTAINERS  ObsoleteFiles.inc  bin       crypto   gnu    lib        rescue   share   tools
$ suod rm -rf bin games homes include jail lib lib32 libdata libexec local ports sbin share
bash: suod: command not found
$ sudo rm -rf bin games homes include jail lib lib32 libdata libexec local ports sbin share
rm: lib32/libthr.so.3: Operation not permitted
rm: lib32/libc.so.7: Operation not permitted
rm: lib32/libcrypt.so.5: Operation not permitted
rm: lib32/librt.so.1: Operation not permitted
rm: lib32: Directory not empty
^C
It was there that I knew I’d done the wrong thing.
Trying snapshots
Let’s look for snapshots, find one from a few minutes ago, and rollback to that.
$ zfs list -t snapshot system/bootenv/default/usr
cannot open 'system/bootenv/default/usr': operation not applicable to datasets of this type
I was flustered and couldn’t figure out the correct command. Instead of fighting with zfs, I looked at the snapshots on disk, in the hidden .zfs/snapshot directory.
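For reference, that error is what older ZFS releases print when zfs list -t snapshot is handed a filesystem name without -r. Below is a minimal sketch of a listing that recurses from the filesystem into its snapshots, oldest first. The function name list_snapshots and the ZFS variable (there only so a different binary can be swapped in) are made up for illustration and are not part of the original notes.

```shell
# list_snapshots DATASET
# Lists the snapshots under DATASET, sorted by creation time (oldest first).
# -r makes zfs descend from the filesystem to its snapshots.
# ZFS is a hypothetical override for the zfs binary; it defaults to "zfs".
list_snapshots() {
    "${ZFS:-zfs}" list -r -t snapshot -o name,creation -s creation "$1"
}

# Usage (on a host with ZFS):
#   list_snapshots system/bootenv/default/usr
```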
$ cd .zfs/snapshot
[dan@slocum:/usr/.zfs/snapshot] $ ls -l
total 825
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-03-12-18:46:28
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-03-13-17:37:28
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_14.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_15.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_16.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_17.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_18.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_19.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_20.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_21.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_22.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_23.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_16.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_17.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_18.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_19.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_20.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_21.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_22.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_23.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_02.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_03.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_04.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_05.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_06.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_07.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_08.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_09.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_10.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_11.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_12.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_13.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_14.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_15.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_16.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_17.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_18.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_19.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_20.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_21.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_22.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_23.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 AfterFreeBSDUpdateTo10.1ButBeforeInstall
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 AfterFreeBSDUpdateTo10AndAfterInstall
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 BeforeFreeBSD-Update
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 BeforeFreeBSD-UpdateTo10.1
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 BeforeFreeBSDUpdateChrootTo10.1Recursive
drwxr-xr-x   2 root  wheel   2 Mar 12 17:41 BeforeTarOfUsr
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 BeforeTarOfVar
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 WithNewUSR
Initially, I thought I was good, but I was wrong. My most recent snapshot was from the first of the month, and I had upgraded the server to 10.2 since then.
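Spotting the newest dated entry in a listing that long is easy to get wrong by eye. Here is a minimal sketch that does it with shell builtins only, which matters on a box where /usr/bin has just been deleted. It assumes the dated snapshot names start with a four-digit year, as in the listing above, and relies on the shell expanding globs in sorted order. The function name newest_dated_snapshot is hypothetical.

```shell
# newest_dated_snapshot DIR
# DIR is a .zfs/snapshot directory. Prints the last entry whose name starts
# with YYYY-; glob results arrive sorted, so the last match is the newest date.
# Uses only shell builtins (no sort, tail, or basename from /usr/bin).
newest_dated_snapshot() {
    newest=
    for d in "$1"/[0-9][0-9][0-9][0-9]-*; do
        [ -d "$d" ] && newest=${d##*/}   # strip the leading path
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage:
#   newest_dated_snapshot /usr/.zfs/snapshot
```

Named snapshots such as BeforeTarOfUsr are skipped on purpose, because their names say nothing about when they were taken.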
Oh, it’s really messed up
I want to check what was mounted:
$ zfs mount | grep usr
bash: grep: command not found
Oh. /usr/bin/grep is gone. Probably so is all of /usr/bin/.
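The shell itself was still alive at this point, and its builtins do not depend on anything under /usr/bin. As a minimal sketch, a substring filter can be put together from read, case and printf alone. The function name match_lines is made up for illustration, and it matches plain substrings only, not regular expressions.

```shell
# match_lines TEXT
# Prints every line of stdin that contains TEXT as a literal substring.
# Built from shell builtins only, so it works without /usr/bin/grep.
match_lines() {
    while IFS= read -r line; do
        case $line in
            *"$1"*) printf '%s\n' "$line" ;;
        esac
    done
}

# Usage:
#   zfs mount | match_lines usr
```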
Let’s try root:
$ su
bash: su: command not found
Maybe I have root in a tmux session:
$ tmux attach-session
no sessions
I could not ssh in, nor could I log in at the console. All I had was this one ssh session, from which I could not su.
Booting from USB key
I rebooted the server from a USB key written from a copy of FreeBSD-10.2-RELEASE-amd64-memstick.img. Once booted, I started copying over the files and directories that were missing. I rebooted and was greeted by:
I spent time checking other directories etc and making sure everything was there. Still no good. Same problem.
mfsBSD
The work above was done from the console. This got tiring. It’s a small screen, not ssh, no copy/paste, etc.
I downloaded a copy of mfsBSD and burned it to a thumb drive. I could not boot it, no matter what I did. Eventually, I realized I was booting from an ISO, not an .img, and once I fixed that, booting worked just fine.
mfsBSD boots up with the network configured via DHCP, sshd running, and root login enabled. This allowed me to move to the couch, work from my laptop, and, more importantly, copy/paste.
Lots of wasted time
I spent a lot of time checking the install and verifying that files existed. Yes, I had /usr/libexec/getty and yes, it looked right:
# ldd /usr/libexec/getty
        libutil.so.9 => /lib/libutil.so.9 (0x800824000)
        libc.so.7 => /lib/libc.so.7 (0x800a36000)
#
We, and by we, I mean myself and others who offered help via IRC, spent time checking the mount points, verifying that I was updating the right file system (i.e. system/bootenv/10.1-RELEASE), and that that filesystem was mounted at run time. This screen shot confirms that last point:
Copying
For the record, when I copied files over, I first tried plain old cp. Later, when I settled in, I started doing tar piped to tar. I know I’ve done this before, but I searched and found a reference.
I was doing stuff like this:
# cd /usr/bin
# tar cf - . | (cd /mnt/usr/bin && tar xBf -)
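The same tar-to-tar idea can be wrapped in a small function so the source and destination are explicit on every call. This is a minimal sketch, not the exact commands used during the repair. The name copy_tree is hypothetical, and it uses -C to change directory instead of cd in a subshell, plus -p to preserve permissions on extraction.

```shell
# copy_tree SRC DST
# Copies everything under SRC into the existing directory DST by piping
# one tar into another. -C switches directory before archiving/extracting;
# -p keeps file permissions when extracting.
copy_tree() {
    tar cf - -C "$1" . | tar xpf - -C "$2"
}

# Usage (from the rescue environment, target pool mounted under /mnt):
#   copy_tree /usr/bin /mnt/usr/bin
```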
Mounting ZFS stuff
How was I mounting my ZFS data? With this command from within mfsBSD:
# zpool import -f -o altroot=/mnt system
This force-imports the pool named system and mounts its filesystems under an alternate root, /mnt, so the damaged /usr shows up as /mnt/usr instead of on top of the running mfsBSD.
The big hint
After many attempts and reboots, I noticed this while reading the scrollback on the console. To access the scrollback, press Scroll Lock on your keyboard, then Page Up / Page Down.
Which part, you say? The bit about libssh.so.5.
I rebooted back into mfsBSD and started copying (via tar) more files. Then I rebooted.
BANG!
sshd was running and I could log into the server.
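A problem like this can also be hunted down from the command line instead of the scrollback. This is a minimal sketch, assuming the boot message meant that libssh.so.5 could not be found (the screenshot is not reproduced here). It reads ldd output and prints every library that ldd marks as "not found". The function name missing_libs is made up for illustration.

```shell
# missing_libs
# Reads ldd output on stdin and prints the name of each shared library
# that ldd reports as "=> not found". Shell builtins only.
missing_libs() {
    while read -r name arrow rest; do
        case "$arrow $rest" in
            "=> not found"*) printf '%s\n' "$name" ;;
        esac
    done
}

# Usage:
#   ldd /usr/sbin/sshd | missing_libs
```

An empty result for a binary means ldd resolved every library it needs, which is the same check done by eye on getty earlier.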
Reinstall all packages
I reinstalled all my packages with this command:
# pkg upgrade -f
This grabbed the packages from my package server. Thanks to the folks behind pkg & poudriere; that saved me lots of time.
Please read these notes
When copying from your thumb drive to your OS, it is a very good idea to have matching versions of the binaries. In my case, I was booting from mfsBSD 10.2 amd64, which matched the version (10.2) and architecture (amd64) of the server I was trying to repair.
I missed some very basic clues by not sitting down at the console and reading ALL of the boot process messages (by paging through the scrollback).
I should not have copied from the FreeBSD install thumb drive, which is a reduced set of binaries.
I should have re-copied everything from the mfsBSD thumb drive, redoing what I had already done with the previous thumb drive.
I have reenabled zfSnap in my /etc/crontab so I have snapshots taken every three minutes, and kept for 24 hours.
Here is the full screenshot of what happened when I deleted the files…