nagios03: drive recovery

After my zpool upgrade was blocked by gpart: /dev/da0p1: not enough space, I decided to create a new Azure VM, snapshot the now-faulty drive, attach that copy to the new host, and use zfs replication to copy the data to a new drive. Or something like that. The existing drive needs to be imported with a checkpoint rollback, then copied to a drive with different partition sizes.
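The replication step of that plan would look roughly like this. This is a sketch only, not commands I ran; the pool and snapshot names are placeholders:

```shell
# Sketch of the plan, untested here; pool/snapshot names are placeholders.
# Take a recursive snapshot of the imported copy of the old pool...
zfs snapshot -r oldzroot@migrate
# ...then send the entire hierarchy to the pool on the new drive.
zfs send -R oldzroot@migrate | zfs recv -F newzroot
```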

Here’s the new host:

dvl@nagios03-recovery:~ $ gpart show
=>      34  62984125  da0  GPT  (30G)
        34      2014       - free -  (1.0M)
      2048       348    1  freebsd-boot  (174K)
      2396     66584    2  efi  (33M)
     68980  62914560    3  freebsd-zfs  (30G)
  62983540       619       - free -  (310K)

=>      40  33554352  da1  GPT  (16G)
        40  29360064    1  freebsd-ufs  (14G)
  29360104   4194288    2  freebsd-swap  (2.0G)

dvl@nagios03-recovery:~ $ 

My first impression: why only 174K for the boot partition? Then I saw the efi partition. I’m not familiar with this layout. I’ve only seen one or the other before.

A copy of the faulty drive has been created: nagios03-copy-for-checkpoint-rewind

This is the drive being attached:

Feb 19 21:45:57 freebsd kernel: da2 at storvsc1 bus 0 scbus1 target 0 lun 0
Feb 19 21:45:57 freebsd kernel: da2: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
Feb 19 21:45:57 freebsd kernel: da2: 300.000MB/s transfers
Feb 19 21:45:57 freebsd kernel: da2: Command Queueing enabled
Feb 19 21:45:57 freebsd kernel: da2: 32768MB (67108864 512 byte sectors)
Feb 19 21:45:57 freebsd kernel: (da2:storvsc1:0:0:0): CACHE PAGE TOO SHORT data len 15 desc len 0
Feb 19 21:45:57 freebsd kernel: (da2:storvsc1:0:0:0): Mode page 8 missing, disabling SYNCHRONIZE CACHE

But:

dvl@nagios03-recovery:~ $ zpool import --rewind-to-checkpoint zroot newzroot
cannot import 'zroot': no such pool available

dvl@nagios03-recovery:~ $ gpart show da2
gpart: No such geom: da2.

dvl@nagios03-recovery:~ $ sudo diskinfo -v /dev/da2
/dev/da2
	512         	# sectorsize
	34359738368 	# mediasize in bytes (32G)
	67108864    	# mediasize in sectors
	4096        	# stripesize
	0           	# stripeoffset
	4177        	# Cylinders according to firmware.
	255         	# Heads according to firmware.
	63          	# Sectors according to firmware.
	Msft Virtual Disk	# Disk descr.
	            	# Disk ident.
	storvsc1    	# Attachment
	Yes         	# TRIM/UNMAP support
	Unknown     	# Rotation rate in RPM
	Not_Zoned   	# Zone Mode

NOTE: I later realized I need sudo on that import.

Let’s try creating the drive again.

Feb 19 21:55:23 freebsd kernel: da3 at storvsc1 bus 0 scbus1 target 0 lun 1
Feb 19 21:55:23 freebsd kernel: da3: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
Feb 19 21:55:23 freebsd kernel: da3: 300.000MB/s transfers
Feb 19 21:55:23 freebsd kernel: da3: Command Queueing enabled
Feb 19 21:55:23 freebsd kernel: da3: 32768MB (67108864 512 byte sectors)
Feb 19 21:55:23 freebsd kernel: (da3:storvsc1:0:0:1): CACHE PAGE TOO SHORT data len 15 desc len 0
Feb 19 21:55:23 freebsd kernel: (da3:storvsc1:0:0:1): Mode page 8 missing, disabling SYNCHRONIZE CACHE
Feb 19 21:55:23 freebsd kernel: GEOM: da3: the secondary GPT header is not in the last LBA.

And:

dvl@nagios03-recovery:~ $ gpart show da3
=>      34  62984125  da3  GPT  (32G) [CORRUPT]
        34      2014       - free -  (1.0M)
      2048       345    1  freebsd-boot  (173K)
      2393     66584    2  efi  (33M)
     68977  62914560    3  freebsd-zfs  (30G)
  62983537       622       - free -  (311K)

There. That’s better. Let’s try the import.

But first:

dvl@nagios03-recovery:~ $ sudo gpart recover da3
da3 recovered
dvl@nagios03-recovery:~ $ gpart show da3
=>      40  67108784  da3  GPT  (32G)
        40      2008       - free -  (1.0M)
      2048       345    1  freebsd-boot  (173K)
      2393     66584    2  efi  (33M)
     68977  62914560    3  freebsd-zfs  (30G)
  62983537   4125287       - free -  (2.0G)

dvl@nagios03-recovery:~ $ 

So now:

dvl@nagios03-recovery:~ $ zpool import --rewind-to-checkpoint zroot newzroot
cannot import 'zroot': no such pool available
dvl@nagios03-recovery:~ $ zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zroot  29.5G  3.26G  26.2G        -         -      -    11%  1.00x    ONLINE  -
dvl@nagios03-recovery:~ $ zpool import
cannot discover pools: permission denied

Ahh!

dvl@nagios03-recovery:~ $ sudo zpool import --rewind-to-checkpoint zroot newzroot
cannot import 'zroot': pool was previously in use from another system.
Last accessed by nagios03.unixathome.org (hostid=0) at Thu Feb 19 21:51:50 2026
The pool can be imported, use 'zpool import -f' to import the pool.

Good.

dvl@nagios03-recovery:~ $ sudo zpool import -f --rewind-to-checkpoint zroot newzroot
dvl@nagios03-recovery:~ $ sudo zpool import -f --rewind-to-checkpoint zroot newzroot
dvl@nagios03-recovery:~ $ zpool status newzroot
No such file or directory
dvl@nagios03-recovery:~ $ zpool list
No such file or directory
dvl@nagios03-recovery:~ $ zfs list
No such file or directory
dvl@nagios03-recovery:~ $ 

Well, that seems to have screwed everything up. I know why: the imported pool mounted its filesystems over the running system. I should have added -N (import the pool without mounting any file systems).
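For the record, the import I should have run is the same command with -N added:

```shell
# Same import as before, plus -N so none of the pool's datasets
# get mounted over the running system.
sudo zpool import -f -N --rewind-to-checkpoint zroot newzroot
```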

This is not the first time, nor the last time I have forgotten this detail.

I power cycled the VM. Which didn’t work. I detached the two extra data disks (attempt 1 and attempt 2).

And I can’t get logged in again. The console doesn’t help much.

Oh, you have to click Apply after removing the drives.

Now it works:

dvl@nagios03-recovery:~ $ sudo zpool import -N --rewind-to-checkpoint newzroot
cannot import 'newzroot': checkpoint does not exist
	Destroy and re-create the pool from
	a backup source.
dvl@nagios03-recovery:~ $ sudo zpool import -N newzroot
dvl@nagios03-recovery:~ $ zpool status
  pool: newzroot
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
	The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(7) for details.
config:

	NAME                                          STATE     READ WRITE CKSUM
	newzroot                                      ONLINE       0     0     0
	  gptid/4a28c004-1f4f-11ef-ae18-002590ec5bf2  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
config:

	NAME        STATE     READ WRITE CKSUM
	zroot       ONLINE       0     0     0
	  da0p3     ONLINE       0     0     0

errors: No known data errors
dvl@nagios03-recovery:~ $ 

So the copied drive is there, and I can access it if I need to.

However

To be fair, I can reconstruct this VM from Ansible. That may be less work than this recovery: create a new VM, install everything there.

I think that’s less work than trying to copy …. oh wait.

Now that I have the drive data, I can create another drive, partition it nicely, copy the data from this drive, profit.
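That might look roughly like the following. A sketch under assumptions: da4 is a hypothetical fresh disk, and the partition sizes are my guesses rather than a tested layout:

```shell
# Hypothetical: da4 is a fresh disk attached for the purpose.
# Partition it along the same lines as the recovery host's da0,
# but with room to spare.
sudo gpart create -s gpt da4
sudo gpart add -t freebsd-boot -s 512k da4
sudo gpart add -t efi          -s 260m da4
sudo gpart add -t freebsd-zfs          da4
# New pool on the big partition, then replicate the rescued datasets.
sudo zpool create -m none finalzroot da4p3
sudo zfs snapshot -r newzroot@copy
sudo zfs send -R newzroot@copy | sudo zfs recv -F finalzroot
```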

Perhaps tomorrow.

Actually (the above was written a few days ago): I think I’ll create a new drive and copy everything over from the old drive. Easy.

I’ll post that soon, but I’ve been distracted by PostgreSQL 18 upgrades.
