Today, I’m ready to add two recently obtained 12T spinning disks to r730-03. This host is the workhorse that houses all the main backups and database regression testing. It also hosts my newly created but not-yet-functional graylog jail.
I will be following a previous post about adding drives because I don’t want to have to remember these things. They occur infrequently enough that documenting them is a good idea.
In this post:
- FreeBSD 14
The existing zpool
This is the zpool I am going to expand:
[13:58 r730-03 dvl ~] % zpool list data01
NAME     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data01  21.8T  17.0T  4.86T        -         -    19%    77%  1.00x  ONLINE  -

[13:58 r730-03 dvl ~] % zpool list -v data01
NAME                    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data01                 21.8T  17.0T  4.86T        -         -    19%    77%  1.00x  ONLINE  -
  mirror-0             10.9T  8.46T  2.45T        -         -    20%  77.6%      -  ONLINE
    gpt/HGST_8CJVT8YE  10.9T      -      -        -         -      -      -      -  ONLINE
    gpt/SG_ZHZ16KEX    10.9T      -      -        -         -      -      -      -  ONLINE
  mirror-1             10.9T  8.50T  2.41T        -         -    19%  77.9%      -  ONLINE
    gpt/SG_ZHZ03BAT    10.9T      -      -        -         -      -      -      -  ONLINE
    gpt/HGST_8CJW1G4E  10.9T      -      -        -         -      -      -      -  ONLINE
As you can see, the two mirrors have balanced themselves out over time: the ALLOC column shows nearly equal values. Look at the previously mentioned post and you’ll see it started off rather differently.
Partitioning the new drives
ZFS can handle whole drives, but I don’t do that, mainly so that I can later add a drive of a different brand which may not be exactly identical in size. Partitions help with that.
Let’s look at the existing drives:
[14:07 r730-03 dvl ~] % gpart show -l
=>        40  937703008  da0  GPT  (447G)
          40       1024    1  gptboot0  (512K)
        1064        984       - free -  (492K)
        2048   67108864    2  swap0  (32G)
    67110912  870590464    3  zfs0  (415G)
   937701376       1672       - free -  (836K)

=>        40  937703008  da1  GPT  (447G)
          40       1024    1  gptboot1  (512K)
        1064        984       - free -  (492K)
        2048   67108864    2  swap1  (32G)
    67110912  870590464    3  zfs1  (415G)
   937701376       1672       - free -  (836K)

=>          40  23437770672  da5  GPT  (11T)
            40  23437770600    1  HGST_8CJW1G4E  (11T)
   23437770640           72       - free -  (36K)

=>          40  23437770672  da2  GPT  (11T)
            40  23437770600    1  SG_ZHZ16KEX  (11T)
   23437770640           72       - free -  (36K)

=>          40  23437770672  da3  GPT  (11T)
            40  23437770600    1  HGST_8CJVT8YE  (11T)
   23437770640           72       - free -  (36K)

=>          40  23437770672  da4  GPT  (11T)
            40  23437770600    1  SG_ZHZ03BAT  (11T)
   23437770640           72       - free -  (36K)

[14:07 r730-03 dvl ~] %
Take note of how the last four drives all have 36K of free space at the end.
Clearly, da0 and da1 are my boot drives. The others are my main zpool. Those are the ones I will use as examples.
From previous work, I already know the new drives are da7 and da8:
[14:09 r730-03 dvl ~] % grep da7 /var/log/messages
Jan 25 01:54:18 r730-03 kernel: da7 at mrsas0 bus 1 scbus1 target 6 lun 0
Jan 25 01:54:18 r730-03 kernel: da7: Fixed Direct Access SPC-4 SCSI device
Jan 25 01:54:18 r730-03 kernel: da7: Serial Number ZL2NJBT2
Jan 25 01:54:18 r730-03 kernel: da7: 150.000MB/s transfers
Jan 25 01:54:18 r730-03 kernel: da7: 11444224MB (23437770752 512 byte sectors)
[14:09 r730-03 dvl ~] % grep da8 /var/log/messages
Jan 25 01:54:29 r730-03 kernel: da8 at mrsas0 bus 1 scbus1 target 7 lun 0
Jan 25 01:54:29 r730-03 kernel: da8: Fixed Direct Access SPC-4 SCSI device
Jan 25 01:54:29 r730-03 kernel: da8: Serial Number 8CJR6GZE
Jan 25 01:54:29 r730-03 kernel: da8: 150.000MB/s transfers
Jan 25 01:54:29 r730-03 kernel: da8: 11444224MB (2929721344 4096 byte sectors)
[14:10 r730-03 dvl ~] %
First, I duplicate the partitioning from an existing drive in the zpool. This ensures all drives have the same partition sizes, and I don’t have to type it all in.
[14:10 r730-03 dvl ~] % gpart backup da5 | sudo gpart restore da7
[14:12 r730-03 dvl ~] % gpart backup da5 | sudo gpart restore da8
gpart: size '23437770600': Invalid argument
[14:12 r730-03 dvl ~] % gpart show da5 da7 da8
=>          40  23437770672  da5  GPT  (11T)
            40  23437770600    1  freebsd-zfs  (11T)
   23437770640           72       - free -  (36K)

=>          34  23437770685  da7  GPT  (11T)
            34            6       - free -  (3.0K)
            40  23437770600    1  freebsd-zfs  (11T)
   23437770640           79       - free -  (40K)

gpart: No such geom: da8.
[14:12 r730-03 dvl ~] %
Ouch. da8 did not take. I wonder if this is a capacity issue.
Also, look at da5 vs da7: they differ by 13 sectors. Fortunately, da7 is the larger of the two. If it were smaller, we might have hit problems. Similarly with da8; read on.
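Those sector counts come straight from the gpart output above, so the difference is easy to verify with shell arithmetic:

```shell
# Usable GPT sectors reported by gpart show (both drives use 512-byte sectors)
da5_sectors=23437770672
da7_sectors=23437770685

# da7 has 13 more usable sectors than da5
echo $(( da7_sectors - da5_sectors ))   # prints 13
```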
Let’s look at the drive details from /var/log/messages (you might also need /var/run/dmesg.boot, depending on how often your log files rotate):
da5: 11444224MB (23437770752 512 byte sectors)
da2: 11444224MB (23437770752 512 byte sectors)
da8: 11444224MB (2929721344 4096 byte sectors)
At first, I was confused by the greatly reduced number of sectors. Then I realized it’s a 4K sector drive.
This should still work. This is the only 4K drive in the system.
Let’s do some math to figure how many sectors I need to match the existing drives:
da7 partition size (in sectors) * da7 sector size / da8 sector size = da8 partition size (in sectors)
23437770600 * 512 / 4096 = 2929721325
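The same conversion can be done with shell arithmetic, which avoids fat-fingering an eleven-digit number on a calculator:

```shell
# Re-express da7's 512-byte sector count as 4096-byte sectors
echo $(( 23437770600 * 512 / 4096 ))   # prints 2929721325
```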
Here, I create the new partition scheme and a partition:
[14:20 r730-03 dvl ~] % sudo gpart create -s gpt da8
da8 created
[14:23 r730-03 dvl ~] % sudo gpart add -i 1 -t freebsd-zfs -a 4k -s 2929721325 da8
da8p1 added
[14:23 r730-03 dvl ~] % gpart show da8
=>          6  2929721333  da8  GPT  (11T)
            6  2929721325    1  freebsd-zfs  (11T)
   2929721331            8       - free -  (32K)

[14:23 r730-03 dvl ~] %
Notice how I have 32K left over at the end, versus 36K on the other drives: the usable space on this drive is 4K smaller, because GPT metadata occupies slightly more room when sectors are 4096 bytes. I suspect that if I were using whole drives, I would be in trouble now.
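To convince myself that the partition itself lost nothing, here is a quick back-of-the-envelope check using the numbers from the gpart output; only the trailing free space shrank:

```shell
# Partition sizes in bytes: identical on both drives
echo $(( 23437770600 * 512 ))   # da5: 12000138547200
echo $(( 2929721325 * 4096 ))   # da8: 12000138547200 (same)

# Trailing free space in bytes
echo $(( 72 * 512 ))            # da5: 36864 (36K)
echo $(( 8 * 4096 ))            # da8: 32768 (32K)
```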
Trendy labels
The latest trend, which I’ve been following for years, is adding a label to the partition based on the serial number of the drive. That helps you know you’ve pulled the right drive.
[14:59 r730-03 dvl ~] % sudo gpart modify -i 1 -l SG_ZL2NJBT2 da7
da7p1 modified
[14:59 r730-03 dvl ~] % sudo gpart modify -i 1 -l HGST_8CJR6GZE da8
da8p1 modified
[14:59 r730-03 dvl ~] % gpart show da7 da8
=>          34  23437770685  da7  GPT  (11T)
            34            6       - free -  (3.0K)
            40  23437770600    1  freebsd-zfs  (11T)
   23437770640           79       - free -  (40K)

=>          6  2929721333  da8  GPT  (11T)
            6  2929721325    1  freebsd-zfs  (11T)
   2929721331            8       - free -  (32K)
You can view the labels this way:
[14:59 r730-03 dvl ~] % gpart show -l da7 da8
=>          34  23437770685  da7  GPT  (11T)
            34            6       - free -  (3.0K)
            40  23437770600    1  SG_ZL2NJBT2  (11T)
   23437770640           79       - free -  (40K)

=>          6  2929721333  da8  GPT  (11T)
            6  2929721325    1  HGST_8CJR6GZE  (11T)
   2929721331            8       - free -  (32K)
Those labels also appear under /dev/gpt, and these are the devices I’ll use in my zpool command later.
[14:59 r730-03 dvl ~] % ls /dev/gpt
HGST_8CJR6GZE  HGST_8CJVT8YE  HGST_8CJW1G4E  SG_ZHZ03BAT  SG_ZHZ16KEX  SG_ZL2NJBT2  gptboot0  gptboot1
[14:57 r730-03 dvl ~] %
Any old labels?
Do we have any old labels? No. Good, otherwise it’s zpool labelclear for you!
[14:59 r730-03 dvl ~] % sudo zdb -l /dev/da7
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3
[15:02 r730-03 dvl ~] % sudo zdb -l /dev/da8
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3
Add them into the existing zpool
This is the command I always fear messing up.
[15:02 r730-03 dvl ~] % sudo zpool add data01 mirror /dev/gpt/SG_ZL2NJBT2 /dev/gpt/HGST_8CJR6GZE
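One way to take some of the fear out of that command: zpool add accepts a -n flag, which prints the configuration that would result without actually modifying the pool. A dry run first is cheap insurance against, say, accidentally adding the two drives as a stripe instead of a mirror:

```shell
# Dry run: show what would be added, without changing data01
zpool add -n data01 mirror /dev/gpt/SG_ZL2NJBT2 /dev/gpt/HGST_8CJR6GZE
```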
The new status shows three mirrors, which is expected:
[15:12 r730-03 dvl ~] % zpool status data01
  pool: data01
 state: ONLINE
  scan: scrub repaired 0B in 21:58:33 with 0 errors on Mon Jan 22 02:42:15 2024
config:

	NAME                   STATE     READ WRITE CKSUM
	data01                 ONLINE       0     0     0
	  mirror-0             ONLINE       0     0     0
	    gpt/HGST_8CJVT8YE  ONLINE       0     0     0
	    gpt/SG_ZHZ16KEX    ONLINE       0     0     0
	  mirror-1             ONLINE       0     0     0
	    gpt/SG_ZHZ03BAT    ONLINE       0     0     0
	    gpt/HGST_8CJW1G4E  ONLINE       0     0     0
	  mirror-2             ONLINE       0     0     0
	    gpt/SG_ZL2NJBT2    ONLINE       0     0     0
	    gpt/HGST_8CJR6GZE  ONLINE       0     0     0

errors: No known data errors
The new allocation values are also as expected. A small amount of data has already been written there:
[15:13 r730-03 dvl ~] % zpool list -v data01
NAME                    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data01                 32.7T  17.0T  15.8T        -         -    13%    51%  1.00x  ONLINE  -
  mirror-0             10.9T  8.46T  2.45T        -         -    20%  77.6%      -  ONLINE
    gpt/HGST_8CJVT8YE  10.9T      -      -        -         -      -      -      -  ONLINE
    gpt/SG_ZHZ16KEX    10.9T      -      -        -         -      -      -      -  ONLINE
  mirror-1             10.9T  8.50T  2.41T        -         -    19%  77.9%      -  ONLINE
    gpt/SG_ZHZ03BAT    10.9T      -      -        -         -      -      -      -  ONLINE
    gpt/HGST_8CJW1G4E  10.9T      -      -        -         -      -      -      -  ONLINE
  mirror-2             10.9T  2.46M  10.9T        -         -     0%  0.00%      -  ONLINE
    gpt/SG_ZL2NJBT2    10.9T      -      -        -         -      -      -      -  ONLINE
    gpt/HGST_8CJR6GZE  10.9T      -      -        -         -      -      -      -  ONLINE
[15:13 r730-03 dvl ~] %
Thank you for coming to my TED talk.
Oh wait, how is the alignment?
Here are all the partitions. All drives use 512-byte sectors, except for da8, which uses 4096-byte sectors.
[15:54 r730-03 dvl ~] % gpart show
=>        40  937703008  da0  GPT  (447G)
          40       1024    1  freebsd-boot  (512K)
        1064        984       - free -  (492K)
        2048   67108864    2  freebsd-swap  (32G)
    67110912  870590464    3  freebsd-zfs  (415G)
   937701376       1672       - free -  (836K)

=>        40  937703008  da1  GPT  (447G)
          40       1024    1  freebsd-boot  (512K)
        1064        984       - free -  (492K)
        2048   67108864    2  freebsd-swap  (32G)
    67110912  870590464    3  freebsd-zfs  (415G)
   937701376       1672       - free -  (836K)

=>          40  23437770672  da5  GPT  (11T)
            40  23437770600    1  freebsd-zfs  (11T)
   23437770640           72       - free -  (36K)

=>          40  23437770672  da2  GPT  (11T)
            40  23437770600    1  freebsd-zfs  (11T)
   23437770640           72       - free -  (36K)

=>          40  23437770672  da3  GPT  (11T)
            40  23437770600    1  freebsd-zfs  (11T)
   23437770640           72       - free -  (36K)

=>          40  23437770672  da4  GPT  (11T)
            40  23437770600    1  freebsd-zfs  (11T)
   23437770640           72       - free -  (36K)

=>          34  23437770685  da7  GPT  (11T)
            34            6       - free -  (3.0K)
            40  23437770600    1  freebsd-zfs  (11T)
   23437770640           79       - free -  (40K)

=>          6  2929721333  da8  GPT  (11T)
            6  2929721325    1  freebsd-zfs  (11T)
   2929721331            8       - free -  (32K)
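A quick arithmetic check of those starting offsets, taking 4096-byte physical sectors as the worst case to align for:

```shell
# Byte offset where each freebsd-zfs partition begins
echo $(( 40 * 512 ))           # 512-byte-sector drives: 20480 bytes
echo $(( 6 * 4096 ))           # da8: 24576 bytes

# Both offsets are multiples of 4096, so every partition is 4K-aligned
echo $(( 40 * 512 % 4096 ))    # prints 0
echo $(( 6 * 4096 % 4096 ))    # prints 0
```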
bsdimp said:
I’d be tempted to have all ZFS partitions be 1MB aligned…. but that’s more important for NMVE and SSDs (really, anything a power of two multiple than their underlying physical block size is good — NAND is in the 128-512k range these days so 1MB covers all the bases and gives good space at the start of the disk for sundry things. da2, da5, and da8 aren’t like this, but it’s spinning rust, so that doesn’t matter so much. 40 is 20k into the drive, and is a multiple of the GCM of the drive’s block sizes. Don’t know if the 512 drives are 512e4kn or not, but aligning to 4k is better. I’d have been tempted to still align to a power of 2 though (likely 1MB since it lets you convert to a boot drive should the need arise… or at least park a boot loader). An offset of 40 is fine. I don’t think that having it be a power of 2 would give a material speedup of ZFS on spinning rust.