Yesterday, I upgraded a DigitalOcean droplet from FreeBSD 10.3 to FreeBSD 11.1 just before I headed to work. I’ve done such upgrades several times before. They all went well. This one did not. Several issues cascaded to prevent me from completing this task in a timely manner.
Let me describe the events as they unfolded.
The freebsd-update
From memory, because the system is not back online as I type this, the command I issued was freebsd-update -r 11.1-RELEASE upgrade. It was uneventful. It was slow, because this is an underpowered droplet.
I recall editing /etc/rc.subr during the merge phase. It was wonky. Wonky in that I thought: Oh, these are odd things to be changing.
I proceeded, eventually upgrading all the installed ports to 11.1 binaries.
The reboot and catch-22
After I rebooted, the box never came back.
I logged into the DigitalOcean droplet using the owner’s login. This is not my droplet. I’m helping someone else.
The login process required me to enter a 6-digit token, which was emailed to the owner.
Oh…
This droplet, the one which was offline, handles email for that domain. There is no other.
OK, let’s open a ticket.
Nope. You need to get logged in for that.
Nothing you can do.
After tweeting about this...
After I tweeted about this odd situation, someone suggested: roll out a new MX, adjust the DNS records, and get that authentication token.
Clever.
Fortunately, I did have access to the DNS supplier. I rolled up a new FreeBSD jail, configured Postfix as a secondary MX, then adjusted the DNS entries, and waited. I also dropped the TTL on the two MX records.
I didn’t wait long. Spam arrived.
Then the old, and probably expired, tokens arrived. I deleted them.
I tried logging in again. Another token arrived. I suspect it was also expired.
Eventually, I got a valid token and got logged in.
What did I see?
After gaining access to the console, I saw this.
OK, sounds easy to fix. Let’s go:
# vi /etc/rc.subr
-sh: vi: not found
#
Ahh, might be $PATH. Let’s try this:
# /usr/bin/vi /etc/rc.subr
-sh: /usr/bin/vi: not found
# ls /usr
# mount -a
# ls /usr
#
# zfs mount -a
# ls /usr
bin home lib libdata local ports share tests
games include include libexec obj sbin src
Oh. Phew.
Read-only file-system
I looked at the file, but I couldn’t figure out the issue in order to fix it. I discovered it contained 1393 lines, yet on other systems, this file was 2171 lines. This change was bigger than I wanted to edit by hand.
# cd /etc
# mv rc.subr rc.subr.borked
mv: renamed rc.subr to rc.subr.borked: Read-only file system
What?
Let’s review. mount -a did not get this mounted rw.
Getting help from other FreeBSD developers, I tried: mount -u -o rw
No errors, but / remained read-only.
It was at this point that IRC decided to debate the virtues of ZFS and having so many file systems… *sigh*
It was Colin Percival who came up with the winning command: mount -uw /
Working without a net
Given the file was so out of sync with what it should be (hundreds of missing lines), I decided to fetch the file from another location. The only problem: I had no network.
I tried service netif start, but it also uses /etc/rc.subr:
# service netif start
/etc/rc.subr: 1391: Syntax error: "fi" unexpected
#
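That same kind of syntax error can be surfaced without running any rc script at all, because sh -n parses a file and executes none of it. The following is a minimal sketch, assuming a POSIX sh; check_syntax is a hypothetical helper name, not something FreeBSD ships.

```sh
#!/bin/sh
# check_syntax is a hypothetical helper, not part of FreeBSD.
# sh -n reads and parses the script but executes none of it,
# so it is safe to point at a damaged /etc/rc.subr.
check_syntax() {
    if sh -n "$1" 2>/dev/null; then
        # Report the line count too; a suspiciously short file is a hint.
        echo "ok: $1 ($(wc -l < "$1" | tr -d ' ') lines)"
    else
        echo "syntax error in $1"
        return 1
    fi
}

# Example (read-only check, works even on a read-only file system):
#   check_syntax /etc/rc.subr
```

Because it only reads the file, this check works before the root file system has been remounted read-write.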
What next?
Let’s check /etc/rc.conf and configure the network manually. It’s just a bunch of ifconfig commands.
Nope.
On the way home from the pub
By this time, it was getting late, and it was time to meet friends at the pub.
Insert beer, pizza, chilli, and fries.
On the way home from that, I had a sudden thought: /etc/rc.conf.local
I remembered that I was having trouble getting /etc/rc.conf to persist between reboots as DigitalOcean was doing ‘magic’.
I was excited about this prospect, but when I looked for that file the next day, it did not exist. There was no /etc/rc.conf.local.
Back to DigitalOcean
I went back to the DigitalOcean console.
I clicked on the droplet name.
I clicked on Networking in the left-hand column.
There I found the IP address, gateway, and netmask. Everything I needed.
After that, I confirmed I could ping 8.8.8.8 (which is a Google DNS server).
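I did not record the exact commands, so the following is only a sketch of what manual configuration from the console might look like. Everything in it is an assumption: vtnet0 is a guess at the interface name, the addresses are placeholders from the documentation range, and the run and bring_up_network helpers are hypothetical names used to allow a dry run.

```sh
#!/bin/sh
# Placeholder values; substitute the address, netmask, and gateway
# shown on the droplet's Networking page.
IFACE="vtnet0"            # assumption: check ifconfig output for the real name
ADDR="203.0.113.10"       # placeholder (documentation address range)
MASK="255.255.255.0"      # placeholder
GATEWAY="203.0.113.1"     # placeholder

# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Assign the address to the interface, then add the default route.
bring_up_network() {
    run ifconfig "$IFACE" inet "$ADDR" netmask "$MASK" up
    run route add default "$GATEWAY"
}

# Preview first, then run for real as root:
#   DRY_RUN=1; bring_up_network
#   DRY_RUN=0; bring_up_network && ping -c 3 8.8.8.8
```

Nothing here survives a reboot, which is why the settings still need to go into a configuration file afterwards.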
To get the corrected /etc/rc.subr file, I decided to fetch it:
# cd /etc
# mv rc.subr rc.subr.borked
# fetch -o rc.subr https://svnweb.freebsd.org/base/release/11.1.0/etc/rc.subr?view=co
edit: Colin says I should check the sha256 hash of files I download from random internet sites before I install them.
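Colin's advice could be followed along these lines. This is a sketch, not what I ran: file_sha256 and verify_sha256 are hypothetical helper names, and the expected digest has to come from a source you already trust, such as a copy of the file on another FreeBSD 11.1 machine.

```sh
#!/bin/sh
# Print the SHA-256 digest of a file using whichever tool is installed.
file_sha256() {
    if command -v sha256 >/dev/null 2>&1; then
        sha256 -q "$1"                       # FreeBSD base system
    elif command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | cut -d ' ' -f 1     # GNU coreutils
    else
        echo "no sha256 tool found" >&2
        return 1
    fi
}

# Succeed only when the file's digest equals the expected value.
verify_sha256() {
    actual=$(file_sha256 "$1") || return 1
    [ "$actual" = "$2" ]
}

# Example: download to a temporary name and install only on a match.
# EXPECTED is a placeholder for a digest obtained from a trusted source.
#   verify_sha256 rc.subr.new "$EXPECTED" && mv rc.subr.new rc.subr
```

Downloading to a temporary name first means a bad file never replaces the one in /etc.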
To test my success, I tried starting sshd:
Success. We have net.
Now on to the final tasks.
Populate /etc/rc.conf.local
I added ifconfig, hostname, and defaultrouter directives to /etc/rc.conf.local and rebooted the server.
After reboot, I saw I forgot to disable DHCP in /etc/rc.conf. I logged in via the console again, altered the file, and rebooted.
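For illustration, the directives in /etc/rc.conf.local would look something like the sketch below. Every value is a placeholder and vtnet0 is an assumed interface name; the real address, netmask, and gateway come from the Networking page in the DigitalOcean console.

```sh
# /etc/rc.conf.local (sketch; every value below is a placeholder)
hostname="mail.example.org"
# vtnet0 is an assumed interface name; check ifconfig output for the real one.
ifconfig_vtnet0="inet 203.0.113.10 netmask 255.255.255.0"
defaultrouter="203.0.113.1"
```

The file uses plain sh variable assignments, the same syntax as /etc/rc.conf.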
Now the broken droplet is back online and mail has started to trickle in, both from third parties and from my temporary MX.
Revert the DNS change
After getting the real host back online, I reverted all the DNS changes I made.
I left the MX in place for a while, because I’m sure those DNS entries will be cached for much longer than my TTL settings.
Get other access
DigitalOcean allows for teams.
I created one for this account and added my personal account to the team. Now, if the droplet owner can’t get in, at least I have my own login.
He has been informed of this change.
I also suggested he change the email address of his DigitalOcean account to something not hosted on that droplet.
Thus endeth the lesson.
The diff
For interest, here is the diff of rc.subr before and after the fix. I am positive this was my fault, but I’m not sure how I deleted so many lines.