Aug 29 2015
 

Yes, this was not fun. I took some notes, but I did not record everything.

Please read the notes if you plan on doing these foolish things.

I was cleaning up /usr/src on a host which does not hold src. I looked and found:

[dan@slocum:/usr] $ ls -l
total 122177
drwxr-xr-x   2 root  wheel        490 Aug 25 21:29 bin
drwxr-xr-x   2 root  wheel          2 Dec  4  2012 games
drwxr-xr-x   4 root  wheel          5 Oct  3  2014 home
drwxr-xr-x  55 root  wheel        313 Aug 17 18:23 include
drwxr-xr-x  41 root  wheel         41 Aug 18 14:16 jails
drwxr-xr-x   8 root  wheel        581 Aug 19 13:05 lib
drwxr-xr-x   5 root  wheel        597 Aug 19 13:05 lib32
drwxr-xr-x   6 root  wheel          6 Mar 13 18:22 libdata
drwxr-xr-x   8 root  wheel         66 Aug 17 18:24 libexec
drwxr-xr-x  19 root  wheel         21 Mar 13 18:48 local
drwxr-xr-x   3 root  wheel          3 Jan 16  2015 obj
drwxr-xr-x  70 root  wheel         88 Mar 27 17:06 ports
drwxr-xr-x   2 root  wheel        285 Aug 25 21:29 sbin
drwxr-xr-x  33 root  wheel         33 Aug 17 18:23 share
drwxr-xr-x  23 root  wheel         31 Aug 18 20:01 src
-rw-r--r--   1 root  wheel  124551336 Aug 12 15:44 src.txz


$ ls src
COPYRIGHT         Makefile          README            cddl              etc               include           libexec           sbin              sys               usr.bin
LOCKS             Makefile.inc1     UPDATING          contrib           games             kerberos5         release           secure            tests             usr.sbin
MAINTAINERS       ObsoleteFiles.inc bin               crypto            gnu               lib               rescue            share             tools


$ suod rm -rf bin games homes include jail lib lib32 libdata libexec local ports sbin share
bash: suod: command not found

$ sudo  rm -rf bin games homes include jail lib lib32 libdata libexec local ports sbin share
rm: lib32/libthr.so.3: Operation not permitted
rm: lib32/libc.so.7: Operation not permitted
rm: lib32/libcrypt.so.5: Operation not permitted
rm: lib32/librt.so.1: Operation not permitted
rm: lib32: Directory not empty
^C

It was there that I knew I’d done the wrong thing.
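The root of the problem was running rm -rf from a directory other than the intended one. One possible guard, sketched here under the assumption that you know which directory you mean to clean, is a small wrapper that refuses to delete anything unless the current directory matches. The name rm_in is made up for this example; it is not a standard tool.

```sh
# Hypothetical helper: remove the named entries only if the current
# directory really is the one we expect to be cleaning.
rm_in() {
    expected=$1
    shift
    # pwd -P resolves symlinks so both paths compare in their physical form.
    here=$(pwd -P)
    want=$(cd "$expected" 2>/dev/null && pwd -P) || {
        echo "rm_in: no such directory: $expected" >&2
        return 1
    }
    if [ "$here" != "$want" ]; then
        echo "rm_in: in $here, expected $want; nothing removed" >&2
        return 1
    fi
    rm -rf -- "$@"
}
```

Run as rm_in /usr/src bin games include from /usr, this would have printed a complaint and removed nothing.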

Trying snapshots

Let’s look for snapshots, find one from a few minutes ago, and roll back to that.

$ zfs list -t snapshot system/bootenv/default/usr
cannot open 'system/bootenv/default/usr': operation not applicable to datasets of this type

I was flustered, and couldn’t figure out the correct command. Instead of fighting with zfs, I went straight to the snapshot directory on disk.
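For the record, that error is what zfs list on ZFS versions of that era prints when a filesystem is named but only snapshots are requested. Adding -r is the usual way around it. A minimal sketch, assuming the dataset name from the failed command is the right one; list_snapshots is just an illustrative wrapper:

```sh
# List the snapshots of a dataset (and of anything beneath it),
# sorted by creation time so the newest appears last.
list_snapshots() {
    zfs list -r -t snapshot -o name,creation -s creation "$1"
}

# Usage:
#   list_snapshots system/bootenv/default/usr
```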

$ cd .zfs/snapshot
[dan@slocum:/usr/.zfs/snapshot] $ ls -l
total 825
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-03-12-18:46:28
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-03-13-17:37:28
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_14.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_15.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_16.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_17.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_18.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_19.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_20.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_21.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_22.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-06-01_23.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_16.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_17.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_18.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_19.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_20.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_21.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_22.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-07-01_23.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_02.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_03.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_04.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_05.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_06.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_07.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_08.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_09.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_10.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_11.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_12.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_13.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_14.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_15.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_16.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_17.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_18.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_19.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_20.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_21.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_22.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 2015-08-01_23.21.00--3m
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 AfterFreeBSDUpdateTo10.1ButBeforeInstall
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 AfterFreeBSDUpdateTo10AndAfterInstall
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 BeforeFreeBSD-Update
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 BeforeFreeBSD-UpdateTo10.1
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 BeforeFreeBSDUpdateChrootTo10.1Recursive
drwxr-xr-x   2 root  wheel   2 Mar 12 17:41 BeforeTarOfUsr
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 BeforeTarOfVar
drwxr-xr-x  17 root  wheel  17 Mar 12 17:45 WithNewUSR

Initially, I thought I was good, but I was wrong. My most recent snapshot was from the first of the month, and I had upgraded the server to 10.2 since then.
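The timestamped names in that listing sort chronologically as plain text, so the newest one can be picked out without any external tools. A rough sketch; newest_snapshot is a made-up name, and it deliberately skips the hand-named snapshots and the two older ones that use a different timestamp layout:

```sh
# Print the name of the newest snapshot directory whose name looks like
# YYYY-MM-DD_HH.MM.SS--TTL. Uses shell builtins only.
newest_snapshot() {
    snapdir=$1
    newest=
    for d in "$snapdir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_*; do
        [ -d "$d" ] || continue
        # Glob results arrive sorted, so the last match is the newest.
        newest=${d##*/}
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage:
#   newest_snapshot /usr/.zfs/snapshot
```

Keep in mind that a snapshot taken before the last upgrade holds the older binaries, which is exactly why this one was not the easy fix it first appeared to be.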

Oh, it’s really messed up

I want to check what was mounted:

$ zfs mount | grep usr
bash: grep: command not found

Oh. /usr/bin/grep is gone. Probably so is all of /usr/bin/.
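With /usr/bin gone, the shell's own builtins are about all that is left. A crude substitute for grep can be put together from read, case, and printf. This is only a sketch: bgrep is a made-up name, and it matches a fixed substring, not a regular expression.

```sh
# Hypothetical stand-in for grep using only shell builtins:
# print each line of stdin that contains the given substring.
bgrep() {
    while IFS= read -r line; do
        case $line in
            *"$1"*) printf '%s\n' "$line" ;;
        esac
    done
}

# Usage, assuming zfs itself still runs (it lives in /sbin, not /usr/bin):
#   zfs mount | bgrep usr
```

In the same spirit, echo /usr/bin/* stands in for ls when checking what survived.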

Let’s try root:

$ su
bash: su: command not found

Maybe I have root in a tmux session:

$ tmux attach-session
no sessions

I could not ssh in, nor could I log in at the console. All I had was this ssh session, from which I could not su.

Booting from USB key

I rebooted the server from a USB key burnt from a copy of FreeBSD-10.2-RELEASE-amd64-memstick.img. Once booted, I started copying over the files and directories which were missing. I rebooted and was greeted by:

[Screenshot: cannot exec getty]

I spent time checking other directories etc and making sure everything was there. Still no good. Same problem.
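Checking by eye that everything is there is slow and easy to get wrong. One way to make it mechanical, sketched here with a made-up function name, is to walk a known-good tree and report every file that has no counterpart in the tree being repaired. It can only find what the reference tree itself contains, so a cut-down install image will give an incomplete answer.

```sh
# List regular files and symlinks that exist under $1 but not under $2.
# Paths are printed relative to the tree roots.
missing_in() {
    src=$1
    dst=$2
    (cd "$src" && find . \( -type f -o -type l \)) | sort |
    while IFS= read -r f; do
        # -L also accepts symlinks whose target does not resolve from here
        [ -e "$dst/$f" ] || [ -L "$dst/$f" ] || printf '%s\n' "${f#./}"
    done
}

# Usage from a rescue environment, with the damaged system under /mnt:
#   missing_in /usr /mnt/usr
```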

mfsBSD

The work above was done from the console. This got tiring. It’s a small screen, not ssh, no copy/paste, etc.

I downloaded a copy of mfsBSD and burned it to a thumb drive. I could not boot it, no matter what I did. Eventually, I realized I was booting from an ISO, not an .img, and once I fixed that, booting worked just fine.

mfsBSD boots up with sshd running and the network configured via DHCP, and root login is enabled. This allowed me to move to the couch, work from my laptop, and, more importantly, allowed me to copy/paste.

Lots of wasted time

I spent a lot of time checking the install and verifying that files existed. Yes, I had /usr/libexec/getty and yes, it looked right:

# ldd /usr/libexec/getty
       libutil.so.9 => /lib/libutil.so.9 (0x800824000)
       libc.so.7 => /lib/libc.so.7 (0x800a36000)
#

We, and by we, I mean myself and others who offered help via IRC, spent time checking the mount points, verifying that I was updating the right file system (i.e. system/bootenv/10.1-RELEASE), and that that filesystem was mounted at run time. This screen shot confirms that last point:

[Screenshot: IMG_2156]

Copying

For the record, when I copied files over, I first tried plain old cp. Later, when I settled in, I started doing tar piped to tar. I know I’ve done this before, but I searched and found a reference.

I was doing stuff like this:

# cd /usr/bin
# tar cf - . | (cd /mnt/usr/bin && tar xBf -)
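The same tar-to-tar idea can be wrapped in a small function so the source and destination are stated once, side by side, which makes a mismatched pair easier to spot. This is a sketch, not what I ran at the time; copy_tree is a made-up name, and the p flag on extraction is there to keep permissions intact.

```sh
# Copy the contents of one directory into another with tar piped to tar,
# preserving permissions and symlinks. Both arguments are directories.
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" &&
    (cd "$src" && tar cf - .) | (cd "$dst" && tar xpf -)
}

# Usage:
#   copy_tree /usr/libexec /mnt/usr/libexec
```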

Mounting ZFS stuff

How was I mounting my ZFS data? With this command from within mfsBSD:

# zpool import -f -o altroot=/mnt system

This will force import the filesystems and mount them at an alternative point, /mnt.

The big hint

After many attempts and reboots, I noticed this while reading the scrollback on the console. To access the scrollback, press Scroll Lock on your keyboard, then Page Up / Page Down.

[Screenshot: ssh]

What part, you say? The bit about libssh.so.5.
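A missing shared library like this is something ldd reports directly, so filtering its output for unresolved entries would have found the problem much sooner than reading boot messages. A hedged sketch: unresolved_libs is a made-up name, and it assumes the usual "name => not found" shape of ldd output. For the answer to mean anything, ldd has to resolve against the repaired system's libraries, so run it on the booted host or inside a chroot of it, not against the rescue image's own libraries.

```sh
# Read ldd output on stdin and print the names of libraries the runtime
# linker could not resolve (lines of the form "libfoo.so.N => not found").
unresolved_libs() {
    while read -r name arrow rest; do
        case "$arrow $rest" in
            "=> not found"*) printf '%s\n' "$name" ;;
        esac
    done
}

# Usage:
#   for b in /usr/sbin/sshd /usr/libexec/getty; do
#       ldd "$b" 2>/dev/null | unresolved_libs
#   done
```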

I rebooted back into mfsBSD and started copying (via tar) more files. Then I rebooted.

BANG!

sshd was running and I could log into the server.

Reinstall all packages

I reinstalled all my packages with this command:

# pkg upgrade -f

This grabbed the packages from my package server (thanks to the folks behind pkg & poudriere, which saved me lots of time).

Please read these notes

When copying from your thumb drive to your OS, it is a very good idea to have matching versions of the binaries. In my case, I was booting from mfsBSD 10.2 amd64, which matched the version (10.2) and architecture (amd64) of the server I was trying to repair.

I missed some very basic clues by not sitting down at the console and reading ALL of the boot process messages (by paging through the scrollback).

I should not have copied from the FreeBSD install thumb drive, which is a reduced set of binaries.

I should have re-copied everything from the mfsBSD thumb drive, redoing what I had already done with the previous thumb drive.

I have reenabled zfSnap in my /etc/crontab so I have snapshots taken every three minutes, and kept for 24 hours.

Here is the full screenshot of what happened when I deleted the files…
