This is a repeat of a benchmark I did yesterday. The drive is a TOSHIBA DT01ACA300 3TB HDD, a 7200 RPM SATA III drive. Tests were run with FreeBSD 9.1 on the hardware listed below. Tonight, we’re going to do the partitions slightly differently, and try ZFS.
The hardware
We are testing on the following hardware:
- motherboard – SUPERMICRO MBD-H8SGL-O ATX Server Motherboard (Supermicro link): $224.99
- CPU – AMD Opteron 6128 Magny-Cours 2.0GHz 8 x 512KB L2 Cache 12MB L3 Cache Socket G34 115W 8-Core Server: $284.99
- RAM – Kingston 8GB 240-Pin DDR3 SDRAM ECC Registered DDR3 1600 Server Memory: 4 x $64.99 = $259.96
- PSU – PC Power and Cooling Silencer MK III 600W power supply: $99.99
- SATA card – LSI Internal SATA/SAS 9211-8i 6Gb/s PCI-Express 2.0 RAID Controller Card, Kit (LSI page): $319.99
- HDD for ZFS – Seagate Barracuda ST2000DM001 2TB 7200 RPM 64MB: 8 x $109.99 = $879.92
The drive being tested does not hold the base OS; the system boots from separate drives.
The devices
The LSI card:
mps0: <LSI SAS2008> port 0x8000-0x80ff mem 0xfde3c000-0xfde3ffff,0xfde40000-0xfde7ffff irq 28 at device 0.0 on pci1
mps0: Firmware: 14.00.01.00, Driver: 14.00.00.01-fbsd
mps0: IOCCapabilities: 185c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,IR>
The drives:
da0 at mps0 bus 0 scbus0 target 4 lun 0
da0: <ATA TOSHIBA DT01ACA3 ABB0> Fixed Direct Access SCSI-6 device
da0: 600.000MB/s transfers
da0: Command Queueing enabled
da0: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)
dd to raw device
We are skipping the raw dd. From the previous benchmark, we got rates between 175 MB/s and 177 MB/s.
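For reference, the raw-device test is simply dd reading or writing the disk directly, with no filesystem involved. A minimal sketch of that kind of run (my reconstruction; the exact block sizes and counts from the previous benchmark are not repeated here):

# Read 10 GB straight off the raw device, bypassing any filesystem.
# Illustrative only -- not the exact parameters used in the previous benchmark.
dd if=/dev/da0 of=/dev/null bs=1m count=10000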
The diskinfo
Here’s diskinfo (as copied from the previous benchmark):
# diskinfo -tv /dev/da0
/dev/da0
        512             # sectorsize
        3000592982016   # mediasize in bytes (2.7T)
        5860533168      # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        364801          # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
        Z2T5TSSAS       # Disk ident.

Seek times:
        Full stroke:      250 iter in   6.302236 sec =   25.209 msec
        Half stroke:      250 iter in   4.395477 sec =   17.582 msec
        Quarter stroke:   500 iter in   7.240961 sec =   14.482 msec
        Short forward:    400 iter in   1.999334 sec =    4.998 msec
        Short backward:   400 iter in   2.338969 sec =    5.847 msec
        Seq outer:       2048 iter in   0.163964 sec =    0.080 msec
        Seq inner:       2048 iter in   0.172619 sec =    0.084 msec
Transfer rates:
        outside:       102400 kbytes in   0.580453 sec =   176414 kbytes/sec
        middle:        102400 kbytes in   0.652515 sec =   156931 kbytes/sec
        inside:        102400 kbytes in   1.098502 sec =    93218 kbytes/sec
phybs
Next, we run phybs (as copied from the previous benchmark):
# ./phybs -rw -l 1024 /dev/da0
     count    size  offset    step        msec     tps    kBps

    131072    1024       0    4096       59527    2201    2201
    131072    1024     512    4096       59320    2209    2209

     65536    2048       0    8192       31212    2099    4199
     65536    2048     512    8192       31780    2062    4124
     65536    2048    1024    8192       31896    2054    4109

     32768    4096       0   16384       11575    2830   11322
     32768    4096     512   16384       26017    1259    5037
     32768    4096    1024   16384       26197    1250    5003
     32768    4096    2048   16384       26188    1251    5004

     16384    8192       0   32768        9464    1731   13849
     16384    8192     512   32768       21142     774    6199
     16384    8192    1024   32768       23422     699    5595
     16384    8192    2048   32768       22764     719    5757
     16384    8192    4096   32768       10493    1561   12491
dd to the filesystem
First, we’ll do UFS. After partitioning and newfs’ing, we have:
# gpart show da0 da0s1
=>         34  5860533101  da0  GPT  (2.7T)
           34         966       - free -  (483k)
         1000  5860532128    1  freebsd  (2.7T)
   5860533128           7       - free -  (3.5k)

=>          0  4294967295  da0s1  BSD  (2.7T)
            0  4294967288      1  freebsd-ufs  (2T)
   4294967288           7         - free -  (3.5k)
Next, the dd:
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing32 bs=32k count=300000
300000+0 records in
300000+0 records out
9830400000 bytes transferred in 60.514341 secs (162447443 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing64 bs=64k count=300000
300000+0 records in
300000+0 records out
19660800000 bytes transferred in 122.158163 secs (160945446 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing128 bs=128k count=300000
300000+0 records in
300000+0 records out
39321600000 bytes transferred in 249.585626 secs (157547534 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing256 bs=256k count=300000
300000+0 records in
300000+0 records out
78643200000 bytes transferred in 528.035264 secs (148935507 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing512 bs=512k count=300000
300000+0 records in
300000+0 records out
157286400000 bytes transferred in 1232.900178 secs (127574319 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing1024 bs=1024k count=300000
300000+0 records in
300000+0 records out
314572800000 bytes transferred in 2167.745021 secs (145115222 bytes/sec)
That ranges from 121 to 154 MB/s.
bonnie++
And finally, a quick bonnie++:
$ bonnie++ -s 66000
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
heckler.unix 66000M   535  99 154324  30 55599  48  1035  98 156736  24 211.0   7
Latency             18874us     354ms   10425ms   70732us    1560ms     473ms
Version  1.97       ------Sequential Create------ --------Random Create--------
heckler.unixathome. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency             41615us      36us      45us   46619us      35us      45us
1.97,1.97,heckler.unixathome.org,1,1360307494,66000M,,535,99,154324,30,55599,48,1035,98,156736,24,211.0,7,16,,,,,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,18874us,354ms,10425ms,70732us,1560ms,473ms,41615us,36us,45us,46619us,35us,45us
And, for the record:
$ df -h /mnt
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/da0s1    2.7T    576G    1.9T    23%    /mnt
ashift
I recommend you read this post regarding ashift. Then you’ll see why I tried a benchmark with and without ashift=12.
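In short, ashift is the base-2 exponent of the smallest block ZFS will issue to a vdev: ashift=9 means 512-byte blocks, ashift=12 means 2^12 = 4096 bytes. This drive advertises 512-byte logical sectors but a 4096-byte stripesize (see the diskinfo output above), the usual signature of a 4k Advanced Format disk, which is why ashift=12 is interesting here. A quick way to see just those two fields (using the same diskinfo as above):

# sectorsize is the logical sector size; stripesize is the physical/preferred block size
diskinfo -v /dev/da0 | egrep 'sectorsize|stripesize'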
create the zpool
Next, we shall create the zpool and try benchmarking that.
We started with this:
$ gpart show
=>         34  5860533101  da0  GPT  (2.7T)
           34         966       - free -  (483k)
         1000  5860532128    1  freebsd  (2.7T)
   5860533128           7       - free -  (3.5k)

=>          0  4294967295  da0s1  BSD  (2.7T)
            0  4294967288      1  freebsd-ufs  (2T)
   4294967288           7         - free -  (3.5k)
To get this back to a starting point, I did:
# gpart delete -i 1 da0s1
da0s1a deleted
# gpart destroy da0s1
da0s1 destroyed
# gpart delete -i 1 da0
da0s1 deleted
# gpart destroy da0
da0 destroyed
Then:
# gpart create -s GPT da0
da0 created
# gpart add -b 1000 -a 4k -t freebsd-zfs -s 95G da0
da0p1 added
# gpart show da0
=>         34  5860533101  da0  GPT  (2.7T)
           34         966       - free -  (483k)
         1000   199229440    1  freebsd-zfs  (95G)
    199230440  5661302695       - free -  (2.7T)
# zpool create -m /mnt example /dev/da0p1
# zpool status
  pool: example
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        example     ONLINE       0     0     0
          da0p1     ONLINE       0     0     0

errors: No known data errors
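A pool created directly on da0p1 like this picks up the default ashift for a drive that reports 512-byte sectors. I did not capture it during this run, but the check is the same zdb one used later in this post, and it would presumably show ashift: 9 here, hence the heading below:

# Confirm the pool's ashift (not captured in the original run; presumably 9 at this point)
zdb | grep ashift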
Doing the dd – ashift != 12
$ cat ~/bin/ddFileSystem4k
#!/bin/sh

COUNTS="100 200 400 800 1600 3200"

for count in ${COUNTS}
do
  CMD="dd if=/dev/zero of=testing${count} bs=4k count=${count}k"
  echo '$' ${CMD}
  `${CMD}`
done

[dan@heckler:/mnt/dan] $ ~/bin/ddFileSystem4k
$ dd if=/dev/zero of=testing100 bs=4k count=100k
102400+0 records in
102400+0 records out
419430400 bytes transferred in 1.533860 secs (273447648 bytes/sec)
$ dd if=/dev/zero of=testing200 bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 3.018803 secs (277878627 bytes/sec)
$ dd if=/dev/zero of=testing400 bs=4k count=400k
409600+0 records in
409600+0 records out
1677721600 bytes transferred in 12.839801 secs (130665700 bytes/sec)
$ dd if=/dev/zero of=testing800 bs=4k count=800k
819200+0 records in
819200+0 records out
3355443200 bytes transferred in 26.395755 secs (127120561 bytes/sec)
$ dd if=/dev/zero of=testing1600 bs=4k count=1600k
1638400+0 records in
1638400+0 records out
6710886400 bytes transferred in 47.622508 secs (140918375 bytes/sec)
$ dd if=/dev/zero of=testing3200 bs=4k count=3200k
3276800+0 records in
3276800+0 records out
13421772800 bytes transferred in 100.156393 secs (134008149 bytes/sec)
[dan@heckler:/mnt/dan] $
Then I did some 300k count tests:
$ ~/bin/ddFileSystem
dd if=/dev/zero of=testing32 bs=32k count=300k
307200+0 records in
307200+0 records out
10066329600 bytes transferred in 95.411953 secs (105503863 bytes/sec)
dd if=/dev/zero of=testing64 bs=64k count=300k
^C298832+0 records in
298831+0 records out
I stopped that run because of the terrible throughput. Then I noticed why: the partition is only 95 GB. Oops, a bad paste on my part when creating it.
Let’s try that again:
[root@heckler /home/dan]# gpart delete -i 1 da0
da0p1 deleted
[root@heckler /home/dan]# gpart add -b 1000 -a 4k -t freebsd-zfs da0
da0p1 added
[root@heckler /home/dan]# gpart show da0
=>         34  5860533101  da0  GPT  (2.7T)
           34         966       - free -  (483k)
         1000  5860532128    1  freebsd-zfs  (2.7T)
   5860533128           7       - free -  (3.5k)
[root@heckler /home/dan]# zpool create -m /mnt example /dev/da0p1
[root@heckler /home/dan]# zpool status
  pool: example
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        example     ONLINE       0     0     0
          da0p1     ONLINE       0     0     0

errors: No known data errors
[root@heckler /home/dan]#
Now, where was I? Ahh yes, getting ashift=12. I found http://savagedlight.me/2012/07/15/freebsd-zfs-advanced-format/ and it had a simple enough example to be useful.
[root@heckler /home/dan]# gpart add -a 1m -t freebsd-zfs -l Bay1.1 da0
da0p1 added
[root@heckler /home/dan]# gnop create -S 4k gpt/Bay1.1
[root@heckler /home/dan]# zpool create -m /mnt example /dev/gpt/Bay1.1.nop
invalid vdev specification
use '-f' to override the following errors:
/dev/gpt/Bay1.1.nop is part of exported pool 'pool'
[root@heckler /home/dan]# zpool create -f -m /mnt example /dev/gpt/Bay1.1.nop
[root@heckler /home/dan]# gpart show da0
=>         34  5860533101  da0  GPT  (2.7T)
           34        2014       - free -  (1M)
         2048  5860530176    1  freebsd-zfs  (2.7T)
   5860532224         911       - free -  (455k)
[root@heckler /home/dan]# mount
/dev/mirror/gm0s1a on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)
/dev/mirror/gm0s1d on /var (ufs, local, journaled soft-updates)
/dev/mirror/gm0s1e on /tmp (ufs, local, journaled soft-updates)
/dev/mirror/gm0s1f on /usr (ufs, local, journaled soft-updates)
example on /mnt (zfs, local, nfsv4acls)
[root@heckler /home/dan]# zpool export example
[root@heckler /home/dan]# gnop destroy gpt/Bay1.1.nop
[root@heckler /home/dan]# zpool import -d /dev/gpt example
[root@heckler /home/dan]# mount
/dev/mirror/gm0s1a on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)
/dev/mirror/gm0s1d on /var (ufs, local, journaled soft-updates)
/dev/mirror/gm0s1e on /tmp (ufs, local, journaled soft-updates)
/dev/mirror/gm0s1f on /usr (ufs, local, journaled soft-updates)
example on /mnt (zfs, local, nfsv4acls)
[root@heckler /home/dan]# gpart show da0
=>         34  5860533101  da0  GPT  (2.7T)
           34        2014       - free -  (1M)
         2048  5860530176    1  freebsd-zfs  (2.7T)
   5860532224         911       - free -  (455k)
[root@heckler /home/dan]# zdb | grep ashift
            ashift: 12
[root@heckler /home/dan]#
Ahh, there’s the ashift we want.
dd with ashift=12, different order from the ufs test
I started off with a different set of tests than I used for ufs. Later on, I repeat the tests in the same order as the ufs run.
$ ~/bin/ddFileSystem
dd if=/dev/zero of=testing32 bs=32k count=300k
307200+0 records in
307200+0 records out
10066329600 bytes transferred in 64.243268 secs (156690808 bytes/sec)
dd if=/dev/zero of=testing64 bs=64k count=300k
307200+0 records in
307200+0 records out
20132659200 bytes transferred in 147.213138 secs (136758577 bytes/sec)
dd if=/dev/zero of=testing128 bs=128k count=300k
307200+0 records in
307200+0 records out
40265318400 bytes transferred in 232.357049 secs (173290712 bytes/sec)
dd if=/dev/zero of=testing256 bs=256k count=300k
307200+0 records in
307200+0 records out
80530636800 bytes transferred in 689.901064 secs (116727805 bytes/sec)
dd if=/dev/zero of=testing1024 bs=1024k count=300k
307200+0 records in
307200+0 records out
322122547200 bytes transferred in 2403.438818 secs (134025691 bytes/sec)
dd if=/dev/zero of=testing2048 bs=2048k count=300k
307200+0 records in
307200+0 records out
644245094400 bytes transferred in 5186.330284 secs (124219835 bytes/sec)
That’s 118-165 MB/s. Not very consistent.
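For reference, the ~/bin/ddFileSystem script used above is not listed anywhere in this post. Based on the ddFileSystem4k script shown earlier, it is presumably something along these lines (hypothetical reconstruction):

#!/bin/sh
# Hypothetical reconstruction of ~/bin/ddFileSystem -- fixed count, increasing block sizes.
BLOCKSIZES="32 64 128 256 1024 2048"

for bs in ${BLOCKSIZES}
do
  CMD="dd if=/dev/zero of=testing${bs} bs=${bs}k count=300k"
  echo ${CMD}
  ${CMD}
done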
Next, we have writing of 4k blocks.
$ ~/bin/ddFileSystem4k
$ dd if=/dev/zero of=testing100 bs=4k count=100k
102400+0 records in
102400+0 records out
419430400 bytes transferred in 1.716126 secs (244405369 bytes/sec)
$ dd if=/dev/zero of=testing200 bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 5.224607 secs (160559598 bytes/sec)
$ dd if=/dev/zero of=testing400 bs=4k count=400k
409600+0 records in
409600+0 records out
1677721600 bytes transferred in 11.608812 secs (144521385 bytes/sec)
$ dd if=/dev/zero of=testing800 bs=4k count=800k
819200+0 records in
819200+0 records out
3355443200 bytes transferred in 26.294773 secs (127608754 bytes/sec)
$ dd if=/dev/zero of=testing1600 bs=4k count=1600k
1638400+0 records in
1638400+0 records out
6710886400 bytes transferred in 54.679726 secs (122730798 bytes/sec)
$ dd if=/dev/zero of=testing3200 bs=4k count=3200k
3276800+0 records in
3276800+0 records out
13421772800 bytes transferred in 102.508754 secs (130932943 bytes/sec)
That’s between 117 and 233 MB/s: very wide-ranging results, not entirely consistent.
At this point the disk is about half full:
$ df -h /mnt
Filesystem    Size    Used   Avail Capacity  Mounted on
example       2.7T    1.0T    1.7T    39%    /mnt
dd in same order as ufs
Let’s try this again, in the same order as done with ufs.
[dan@heckler:/mnt/dan] $ df -h /mnt
Filesystem    Size    Used   Avail Capacity  Mounted on
example       2.7T    152k    2.7T     0%    /mnt
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing32 bs=32k count=300000
300000+0 records in
300000+0 records out
9830400000 bytes transferred in 74.562503 secs (131841068 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing64 bs=64k count=300000
300000+0 records in
300000+0 records out
19660800000 bytes transferred in 148.559020 secs (132343361 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing128 bs=128k count=300000
300000+0 records in
300000+0 records out
39321600000 bytes transferred in 291.913876 secs (134702744 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing256 bs=256k count=300000
300000+0 records in
300000+0 records out
78643200000 bytes transferred in 579.574777 secs (135691205 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing512 bs=512k count=300000
300000+0 records in
300000+0 records out
157286400000 bytes transferred in 1180.668523 secs (133218085 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing1024 bs=1024k count=300000
300000+0 records in
300000+0 records out
314572800000 bytes transferred in 2383.430611 secs (131983200 bytes/sec)
[dan@heckler:/mnt/dan] $
That’s a pretty consistent 125-129 MB/s.
Let’s try the smaller blocks (4K):
[dan@heckler:/mnt/dan/4k] $ cat ~/bin/ddFileSystem4k
#!/bin/sh

COUNTS="100 200 400 800 1600 3200"

for count in ${COUNTS}
do
  CMD="dd if=/dev/zero of=testing${count} bs=4k count=${count}k"
  echo '$' ${CMD}
  `${CMD}`
done

[dan@heckler:/mnt/dan/4k] $ ~/bin/ddFileSystem4k
$ dd if=/dev/zero of=testing100 bs=4k count=100k
102400+0 records in
102400+0 records out
419430400 bytes transferred in 1.734270 secs (241848372 bytes/sec)
$ dd if=/dev/zero of=testing200 bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 3.461010 secs (242374568 bytes/sec)
$ dd if=/dev/zero of=testing400 bs=4k count=400k
409600+0 records in
409600+0 records out
1677721600 bytes transferred in 6.906478 secs (242919997 bytes/sec)
$ dd if=/dev/zero of=testing800 bs=4k count=800k
819200+0 records in
819200+0 records out
3355443200 bytes transferred in 28.677230 secs (117007229 bytes/sec)
$ dd if=/dev/zero of=testing1600 bs=4k count=1600k
1638400+0 records in
1638400+0 records out
6710886400 bytes transferred in 73.088439 secs (91818713 bytes/sec)
$ dd if=/dev/zero of=testing3200 bs=4k count=3200k
3276800+0 records in
3276800+0 records out
13421772800 bytes transferred in 98.737248 secs (135934240 bytes/sec)
That varies from 87 to 231 MB/s.
$ df -h /mnt
Filesystem Size Used Avail Capacity Mounted on
example 2.7T 601G 2.1T 22% /mnt
Because of that very fast dd above, I ran a few more ‘dd if=/dev/zero of=testing200 bs=4k count=200k’ tests (each with a different of= filename). They varied:
[dan@heckler:/mnt/dan/4k/again] $ dd if=/dev/zero of=testing200 bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 6.763470 secs (124028170 bytes/sec)
[dan@heckler:/mnt/dan/4k/again] $ dd if=/dev/zero of=testing200a bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 4.068366 secs (206191082 bytes/sec)
[dan@heckler:/mnt/dan/4k/again] $ dd if=/dev/zero of=testing200b bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 4.151256 secs (202073971 bytes/sec)
[dan@heckler:/mnt/dan/4k/again] $ dd if=/dev/zero of=testing200c bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 4.220569 secs (198755386 bytes/sec)
That’s 118-196 MB/s.
bonnie++ (added at 2:55 pm)
Here’s the bonnie++ output.
$ bonnie++ -s 66000
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
heckler.unix 66000M    94  99 134458  41 68036  23   228  97 172483  25 183.6   8
Latency               166ms    1447ms    2835ms     347ms     376ms     566ms
Version  1.97       ------Sequential Create------ --------Random Create--------
heckler.unixathome. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 17348  97 +++++ +++ 17295  98 16733  94 +++++ +++ 16259  97
Latency             20662us     159us     209us   29731us      51us     283us
1.97,1.97,heckler.unixathome.org,1,1360402714,66000M,,94,99,134458,41,68036,23,228,97,172483,25,183.6,8,16,,,,,17348,97,+++++,+++,17295,98,16733,94,+++++,+++,16259,97,166ms,1447ms,2835ms,347ms,376ms,566ms,20662us,159us,209us,29731us,51us,283us
fio test
After first publishing this post, more than one person mentioned sysutils/fio. With Bruce Cran’s help, I figured out how to run it. You might find this HOWTO useful.
[dan@heckler:/mnt/dan] $ cat ~/bin/fio.test.1
[global]
size=320000k
bs=32k
direct=1

[testing]
rw=write

[dan@heckler:/mnt/dan] $ fio ~/bin/fio.test.1
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=32K-32K/32K-32K/32K-32K, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
testing: Laying out IO file(s) (1 file(s) / 312MB)

testing: (groupid=0, jobs=1): err= 0: pid=102759: Sat Feb 9 20:11:05 2013
  write: io=320000KB, bw=735632KB/s, iops=22988 , runt=   435msec
    clat (usec): min=15 , max=3500 , avg=41.47, stdev=54.93
     lat (usec): min=16 , max=3501 , avg=42.33, stdev=54.92
    clat percentiles (usec):
     |  1.00th=[   16],  5.00th=[   16], 10.00th=[   16], 20.00th=[   16],
     | 30.00th=[   16], 40.00th=[   17], 50.00th=[   17], 60.00th=[   17],
     | 70.00th=[   18], 80.00th=[  102], 90.00th=[  115], 95.00th=[  119],
     | 99.00th=[  151], 99.50th=[  159], 99.90th=[  187], 99.95th=[  199],
     | 99.99th=[  422]
    lat (usec) : 20=71.94%, 50=3.02%, 100=2.25%, 250=22.77%, 500=0.01%
    lat (msec) : 4=0.01%
  cpu          : usr=0.00%, sys=77.19%, ctx=2, majf=0, minf=18446744073709539480
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=10000/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=320000KB, aggrb=735632KB/s, minb=735632KB/s, maxb=735632KB/s, mint=435msec, maxt=435msec
[dan@heckler:/mnt/dan] $
Bruce Cran suggested adding end_fsync=1 to the tests (‘fsync file contents when a write stage has completed’). The last two tests used that parameter.
I’m running a longer fio test now. FYI, here is a sample gstat output captured during that longer test. Some lines have been removed from this output to hide the idle drives that are not part of the test.
dT: 0.051s  w: 0.050s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   10   1451      0      0    0.0   1451 185711    7.0  101.0| da0
    0      0      0      0    0.0      0      0    0.0    0.0| da0.nop
   10   1451      0      0    0.0   1451 185711    7.0  101.0| da0p1
   10   1451      0      0    0.0   1451 185711    7.0  101.0| gpt/Bay1.1
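The exact gstat invocation is not shown; judging by the dT: 0.051s header, the snapshot above was presumably captured with a short sampling interval and a filter on the providers of interest, something like:

# Hypothetical invocation: sample every 50 ms, show only the da0-related providers
gstat -I 50ms -f 'da0|Bay1.1'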
fio test longer, still no fsync
[dan@heckler:/mnt/dan] $ cat ~/bin/fio.test.2
[global]
size=320000M
bs=32k
direct=1

[testing]
rw=write

[dan@heckler:/mnt/dan] $ fio ~/bin/fio.test.2
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=32K-32K/32K-32K/32K-32K, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
testing: Laying out IO file(s) (1 file(s) / 320000MB)
Jobs: 1 (f=1): [W] [100.0% done] [0K/143.2M/0K /s] [0 /4581 /0 iops] [eta 00m:00s]:47s]
testing: (groupid=0, jobs=1): err= 0: pid=102243: Sat Feb 9 21:01:26 2013
  write: io=320000MB, bw=124184KB/s, iops=3880 , runt=2638662msec
    clat (usec): min=15 , max=5283.7K, avg=254.87, stdev=12180.56
     lat (usec): min=16 , max=5283.7K, avg=255.95, stdev=12180.59
    clat percentiles (usec):
     |  1.00th=[   16],  5.00th=[   16], 10.00th=[   16], 20.00th=[   17],
     | 30.00th=[   17], 40.00th=[   17], 50.00th=[   18], 60.00th=[   19],
     | 70.00th=[  127], 80.00th=[  161], 90.00th=[  940], 95.00th=[ 1004],
     | 99.00th=[ 1096], 99.50th=[ 1128], 99.90th=[ 1144], 99.95th=[ 1160],
     | 99.99th=[272384]
    bw (KB/s)  : min=   51, max=728384, per=100.00%, avg=157603.62, stdev=139338.78
    lat (usec) : 20=61.03%, 50=4.52%, 100=0.24%, 250=21.53%, 500=0.12%
    lat (usec) : 750=0.01%, 1000=7.36%
    lat (msec) : 2=5.17%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
    lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : >=2000=0.01%
  cpu          : usr=1.47%, sys=22.04%, ctx=1371595, majf=0, minf=18446744073709539485
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=10240000/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=320000MB, aggrb=124184KB/s, minb=124184KB/s, maxb=124184KB/s, mint=2638662msec, maxt=2638662msec
This test had a throughput of 121 MB/s, and took 49 minutes to run.
fio test with fsync enabled
[dan@heckler:/mnt/dan] $ cat ~/bin/fio.test.3
[global]
size=320000k
bs=32k
direct=1
end_fsync=1

[testing]
rw=write

[dan@heckler:/mnt/dan] $ fio ~/bin/fio.test.3
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=32K-32K/32K-32K/32K-32K, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/68462K/0K /s] [0 /2139 /0 iops] [eta 00m:00s]
testing: (groupid=0, jobs=1): err= 0: pid=102277: Sat Feb 9 21:03:57 2013
  write: io=320000KB, bw=70407KB/s, iops=2200 , runt=  4545msec
    clat (usec): min=21 , max=940825 , avg=451.98, stdev=11596.84
     lat (usec): min=22 , max=940826 , avg=452.86, stdev=11596.84
    clat percentiles (usec):
     |  1.00th=[   22],  5.00th=[   23], 10.00th=[   25], 20.00th=[   25],
     | 30.00th=[   26], 40.00th=[   26], 50.00th=[   27], 60.00th=[   28],
     | 70.00th=[   31], 80.00th=[  510], 90.00th=[  556], 95.00th=[  620],
     | 99.00th=[ 1832], 99.50th=[ 2256], 99.90th=[66048], 99.95th=[110080],
     | 99.99th=[585728]
    bw (KB/s)  : min=  219, max=165184, per=100.00%, avg=88346.00, stdev=70997.21
    lat (usec) : 50=74.92%, 100=0.06%, 250=0.01%, 500=3.46%, 750=17.58%
    lat (usec) : 1000=0.90%
    lat (msec) : 2=2.32%, 4=0.55%, 10=0.02%, 20=0.04%, 50=0.04%
    lat (msec) : 100=0.03%, 250=0.05%, 750=0.01%, 1000=0.01%
  cpu          : usr=0.46%, sys=7.88%, ctx=2490, majf=0, minf=18446744073709539485
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=10000/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=320000KB, aggrb=70407KB/s, minb=70407KB/s, maxb=70407KB/s, mint=4545msec, maxt=4545msec
[dan@heckler:/mnt/dan] $
This test had throughput of 69 MB/s and took 4.5 seconds.
fio test, bigger output
This test involves blocks of 4M, writing out 1.25GB.
[dan@heckler:/mnt/dan] $ cat ~/bin/fio.test.4
[global]
size=1280000k
bs=4096k
direct=1

[testing]
rw=write

[dan@heckler:/mnt/dan] $ fio ~/bin/fio.test.4
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/151.6M/0K /s] [0 /37 /0 iops] [eta 00m:00s]
testing: (groupid=0, jobs=1): err= 0: pid=102259: Sat Feb 9 21:07:56 2013
  write: io=1252.0MB, bw=193926KB/s, iops=47 , runt=  6611msec
    clat (msec): min=2 , max=897 , avg=20.85, stdev=107.66
     lat (msec): min=2 , max=897 , avg=21.11, stdev=107.67
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    3], 50.00th=[    3], 60.00th=[    4],
     | 70.00th=[    4], 80.00th=[    5], 90.00th=[   17], 95.00th=[   32],
     | 99.00th=[  758], 99.50th=[  873], 99.90th=[  898], 99.95th=[  898],
     | 99.99th=[  898]
    bw (KB/s)  : min=128343, max=361108, per=91.20%, avg=176865.50, stdev=90517.11
    lat (msec) : 4=77.00%, 10=12.14%, 20=0.96%, 50=7.67%, 250=0.32%
    lat (msec) : 750=0.64%, 1000=1.28%
  cpu          : usr=1.30%, sys=15.25%, ctx=934, majf=0, minf=18446744073709539485
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=313/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=1252.0MB, aggrb=193926KB/s, minb=193926KB/s, maxb=193926KB/s, mint=6611msec, maxt=6611msec
It had a throughput of 189 MB/s and took 6.6 seconds.
fio tests on ufs
I removed the zfs partition and created a ufs one in its place:
[root@heckler ~]# gpart delete -i 1 da0
da0p1 deleted
[root@heckler ~]# gpart add -b 2048 -a 4k -t freebsd da0
da0s1 added
[root@heckler ~]# gpart create -s BSD da0s1
da0s1 created
[root@heckler:/home/dan] # gpart add -t freebsd-ufs da0s1
da0s1a added
[root@heckler:/home/dan] # gpart show da0 da0s1
=>         34  5860533101  da0  GPT  (2.7T)
           34        2014       - free -  (1M)
         2048  5860531080    1  freebsd  (2.7T)
   5860533128           7       - free -  (3.5k)

=>          0  4294967295  da0s1  BSD  (2.7T)
            0  4294967288      1  freebsd-ufs  (2T)
   4294967288           7         - free -  (3.5k)
[root@heckler:/home/dan] #
Then I ran two of the tests from the previous sections again (fio.test.3, which uses end_fsync, and fio.test.4):
[dan@heckler:/mnt/dan] $ cat ~/bin/fio.test.3
[global]
size=320000k
bs=32k
direct=1
end_fsync=1

[testing]
rw=write

[dan@heckler:/mnt/dan] $ fio /home/dan/bin/fio.test.3
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=32K-32K/32K-32K/32K-32K, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
testing: Laying out IO file(s) (1 file(s) / 312MB)
Jobs: 1 (f=1): [W] [-.-% done] [0K/150.4M/0K /s] [0 /4809 /0 iops] [eta 00m:00s]
testing: (groupid=0, jobs=1): err= 0: pid=100837: Sat Feb 9 22:30:20 2013
  write: io=320000KB, bw=164948KB/s, iops=5154 , runt=  1940msec
    clat (usec): min=14 , max=169544 , avg=191.15, stdev=2368.41
     lat (usec): min=15 , max=169545 , avg=192.10, stdev=2368.41
    clat percentiles (usec):
     |  1.00th=[   15],  5.00th=[   16], 10.00th=[   17], 20.00th=[   18],
     | 30.00th=[   19], 40.00th=[   20], 50.00th=[   21], 60.00th=[   31],
     | 70.00th=[   34], 80.00th=[   44], 90.00th=[  636], 95.00th=[  652],
     | 99.00th=[ 1512], 99.50th=[ 1784], 99.90th=[ 1880], 99.95th=[ 1944],
     | 99.99th=[162816]
    bw (KB/s)  : min=149888, max=201600, per=100.00%, avg=170752.00, stdev=27263.40
    lat (usec) : 20=31.74%, 50=48.53%, 100=0.44%, 250=0.29%, 500=0.39%
    lat (usec) : 750=16.89%, 1000=0.01%
    lat (msec) : 2=1.66%, 10=0.03%, 250=0.02%
  cpu          : usr=1.13%, sys=8.92%, ctx=1909, majf=0, minf=18446744073709539480
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=10000/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=320000KB, aggrb=164948KB/s, minb=164948KB/s, maxb=164948KB/s, mint=1940msec, maxt=1940msec

[dan@heckler:/mnt/dan] $ cat ~/bin/fio.test.4
[global]
size=1280000k
bs=4096k
direct=1

[testing]
rw=write

[dan@heckler:/mnt/dan] $ fio /home/dan/bin/fio.test.4
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
testing: Laying out IO file(s) (1 file(s) / 1250MB)
Jobs: 1 (f=1): [W] [100.0% done] [0K/151.6M/0K /s] [0 /37 /0 iops] [eta 00m:00s]
testing: (groupid=0, jobs=1): err= 0: pid=101009: Sat Feb 9 22:30:35 2013
  write: io=1248.0MB, bw=156997KB/s, iops=38 , runt=  8140msec
    clat (msec): min=3 , max=173 , avg=25.83, stdev=22.95
     lat (msec): min=3 , max=173 , avg=26.08, stdev=22.95
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[    5], 10.00th=[    6], 20.00th=[   25],
     | 30.00th=[   25], 40.00th=[   26], 50.00th=[   26], 60.00th=[   26],
     | 70.00th=[   26], 80.00th=[   26], 90.00th=[   26], 95.00th=[   30],
     | 99.00th=[  169], 99.50th=[  174], 99.90th=[  174], 99.95th=[  174],
     | 99.99th=[  174]
    bw (KB/s)  : min=116806, max=204800, per=100.00%, avg=157713.20, stdev=24387.66
    lat (msec) : 4=0.96%, 10=12.18%, 20=2.24%, 50=82.37%, 250=2.24%
  cpu          : usr=0.80%, sys=17.55%, ctx=8364, majf=0, minf=18446744073709539485
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=312/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=1248.0MB, aggrb=156996KB/s, minb=156996KB/s, maxb=156996KB/s, mint=8140msec, maxt=8140msec
[dan@heckler:/mnt/dan] $
Those tests were 161 MB/s and 153 MB/s respectively.
fio tests with large files and bigger blocks
In this test, I went for a larger file and bigger blocks. I also turned direct I/O off, since fsync is ignored when direct I/O is specified.
[dan@heckler:/mnt/dan/testing] $ cat ~/bin/fio.test.8
[global]
size=2560000k
bs=2096k
direct=0
end_fsync=1

[testing]
rw=write

[dan@heckler:/mnt/dan/testing] $
[dan@heckler:/mnt/dan/testing] $ fio ~/bin/fio.test.8
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=2096K-2096K/2096K-2096K/2096K-2096K, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
testing: Laying out IO file(s) (1 file(s) / 2500MB)
Jobs: 1 (f=1): [W] [100.0% done] [0K/173.3M/0K /s] [0 /84 /0 iops] [eta 00m:00s]
testing: (groupid=0, jobs=1): err= 0: pid=102181: Sat Feb 9 23:37:00 2013
  write: io=2499.3MB, bw=177600KB/s, iops=84 , runt= 14410msec
    clat (msec): min=1 , max=903 , avg=11.70, stdev=82.92
     lat (msec): min=1 , max=904 , avg=11.80, stdev=82.93
    clat percentiles (usec):
     |  1.00th=[ 1160],  5.00th=[ 1224], 10.00th=[ 1256], 20.00th=[ 1288],
     | 30.00th=[ 1304], 40.00th=[ 1336], 50.00th=[ 1368], 60.00th=[ 1400],
     | 70.00th=[ 1464], 80.00th=[ 1752], 90.00th=[16768], 95.00th=[17024],
     | 99.00th=[643072], 99.50th=[798720], 99.90th=[880640], 99.95th=[905216],
     | 99.99th=[905216]
    bw (KB/s)  : min= 9274, max=944850, per=100.00%, avg=202417.86, stdev=217380.26
    lat (msec) : 2=86.49%, 4=0.90%, 10=0.74%, 20=10.65%, 50=0.16%
    lat (msec) : 750=0.16%, 1000=0.90%
  cpu          : usr=0.74%, sys=10.91%, ctx=2406, majf=0, minf=18446744073709539472
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=1221/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=2499.3MB, aggrb=177600KB/s, minb=177600KB/s, maxb=177600KB/s, mint=14410msec, maxt=14410msec
Conclusions
What would you conclude here?