Feb 9, 2013

This is a repeat of a benchmark I did yesterday. The drive is a TOSHIBA DT01ACA300 3TB HDD, a 7200 RPM SATA III drive. Tests were run with FreeBSD 9.1 on the hardware listed below. Tonight, we’re going to do the partitions slightly differently, and try ZFS.

The hardware

We are testing on the following hardware:

  1. motherboard – SUPERMICRO MBD-H8SGL-O ATX Server Motherboard (Supermicro link): $224.99
  2. CPU – AMD Opteron 6128 Magny-Cours 2.0GHz 8 x 512KB L2 Cache 12MB L3 Cache Socket G34 115W 8-Core Server : $284.99
  3. RAM – Kingston 8GB 240-Pin DDR3 SDRAM ECC Registered DDR3 1600 Server Memory : 4 x $64.99 = $259.96
  4. PSU – PC Power and Cooling Silencer MK III 600W power supply : $99.99
  5. SATA card – LSI Internal SATA/SAS 9211-8i 6Gb/s PCI-Express 2.0 RAID Controller Card, Kit (LSI page): $319.99
  6. HDD for ZFS – Seagate Barracuda ST2000DM001 2TB 7200 RPM 64MB : 8 x $109.99 = $879.92

The drive being tested is not the drive that holds the base OS.

The devices

The LSI card:

mps0: <LSI SAS2008> port 0x8000-0x80ff mem 0xfde3c000-0xfde3ffff,0xfde40000-0xfde7ffff irq 28 at device 0.0 on pci1
mps0: Firmware: 14.00.01.00, Driver: 14.00.00.01-fbsd
mps0: IOCCapabilities: 185c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,IR>

The drives:

da0 at mps0 bus 0 scbus0 target 4 lun 0
da0: <ATA TOSHIBA DT01ACA3 ABB0> Fixed Direct Access SCSI-6 device 
da0: 600.000MB/s transfers
da0: Command Queueing enabled
da0: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)

dd to raw device

We are skipping the raw dd. From the previous benchmark, we got rates between 175 MB/s and 177 MB/s.
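For reference, the raw-device test in the previous benchmark was a plain dd against the device node, along these lines (the block size and count here are assumptions, not the exact original invocation):

```shell
# Sequential read straight off the raw device, no filesystem involved.
# Run as root; /dev/da0 is the drive under test.
dd if=/dev/da0 of=/dev/null bs=1m count=10240
```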

The diskinfo

Here’s diskinfo (as copied from the previous benchmark):

# diskinfo -tv /dev/da0
/dev/da0
	512         	# sectorsize
	3000592982016	# mediasize in bytes (2.7T)
	5860533168  	# mediasize in sectors
	4096        	# stripesize
	0           	# stripeoffset
	364801      	# Cylinders according to firmware.
	255         	# Heads according to firmware.
	63          	# Sectors according to firmware.
	           Z2T5TSSAS	# Disk ident.

Seek times:
	Full stroke:	  250 iter in   6.302236 sec =   25.209 msec
	Half stroke:	  250 iter in   4.395477 sec =   17.582 msec
	Quarter stroke:	  500 iter in   7.240961 sec =   14.482 msec
	Short forward:	  400 iter in   1.999334 sec =    4.998 msec
	Short backward:	  400 iter in   2.338969 sec =    5.847 msec
	Seq outer:	 2048 iter in   0.163964 sec =    0.080 msec
	Seq inner:	 2048 iter in   0.172619 sec =    0.084 msec
Transfer rates:
	outside:       102400 kbytes in   0.580453 sec =   176414 kbytes/sec
	middle:        102400 kbytes in   0.652515 sec =   156931 kbytes/sec
	inside:        102400 kbytes in   1.098502 sec =    93218 kbytes/sec

phybs

Next, we run phybs (as copied from the previous benchmark):

# ./phybs -rw -l 1024 /dev/da0
   count    size  offset    step        msec     tps    kBps

  131072    1024       0    4096       59527    2201    2201
  131072    1024     512    4096       59320    2209    2209

   65536    2048       0    8192       31212    2099    4199
   65536    2048     512    8192       31780    2062    4124
   65536    2048    1024    8192       31896    2054    4109

   32768    4096       0   16384       11575    2830   11322
   32768    4096     512   16384       26017    1259    5037
   32768    4096    1024   16384       26197    1250    5003
   32768    4096    2048   16384       26188    1251    5004

   16384    8192       0   32768        9464    1731   13849
   16384    8192     512   32768       21142     774    6199
   16384    8192    1024   32768       23422     699    5595
   16384    8192    2048   32768       22764     719    5757
   16384    8192    4096   32768       10493    1561   12491

dd to the filesystem

First, we’ll do UFS. After partitioning and newfs’ing, we have:

# gpart show da0 da0s1
=>        34  5860533101  da0  GPT  (2.7T)
          34         966       - free -  (483k)
        1000  5860532128    1  freebsd  (2.7T)
  5860533128           7       - free -  (3.5k)

=>         0  4294967295  da0s1  BSD  (2.7T)
           0  4294967288      1  freebsd-ufs  (2T)
  4294967288           7         - free -  (3.5k)

Next, the dd:

[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing32 bs=32k count=300000
300000+0 records in
300000+0 records out
9830400000 bytes transferred in 60.514341 secs (162447443 bytes/sec)

[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing64 bs=64k count=300000
300000+0 records in
300000+0 records out
19660800000 bytes transferred in 122.158163 secs (160945446 bytes/sec)

[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing128 bs=128k count=300000
300000+0 records in
300000+0 records out
39321600000 bytes transferred in 249.585626 secs (157547534 bytes/sec)

[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing256 bs=256k count=300000
300000+0 records in
300000+0 records out
78643200000 bytes transferred in 528.035264 secs (148935507 bytes/sec)

[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing512 bs=512k count=300000
300000+0 records in
300000+0 records out
157286400000 bytes transferred in 1232.900178 secs (127574319 bytes/sec)

[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing1024 bs=1024k count=300000
300000+0 records in
300000+0 records out
314572800000 bytes transferred in 2167.745021 secs (145115222 bytes/sec)

That ranges from 121-154 MB/s.
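The MB/s figures are dd’s bytes/sec divided by 1048576 (binary megabytes). For example, for the 32k run:

```shell
# Convert dd's bytes/sec figure to MB/s (binary, i.e. MiB/s).
echo 162447443 | awk '{printf "%.1f\n", $1 / 1048576}'
```

That prints 154.9, the top of the range quoted above.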

bonnie++

And finally, a quick bonnie++:

$ bonnie++ -s 66000
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
heckler.unix 66000M   535  99 154324  30 55599  48  1035  98 156736  24 211.0   7
Latency             18874us     354ms   10425ms   70732us    1560ms     473ms
Version  1.97       ------Sequential Create------ --------Random Create--------
heckler.unixathome. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency             41615us      36us      45us   46619us      35us      45us
1.97,1.97,heckler.unixathome.org,1,1360307494,66000M,,535,99,154324,30,55599,48,1035,98,156736,24,211.0,7,16,,,,,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,18874us,354ms,10425ms,70732us,1560ms,473ms,41615us,36us,45us,46619us,35us,45us

And, for the record:

$ df -h /mnt
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/da0s1    2.7T    576G    1.9T    23%    /mnt

ashift

I recommend you read this post regarding ashift. Then you’ll see why I tried a benchmark with and without ashift=12.

create the zpool

Next, we shall create the zpool and try benchmarking that.

We started with this:

$ gpart show
=>        34  5860533101  da0  GPT  (2.7T)
          34         966       - free -  (483k)
        1000  5860532128    1  freebsd  (2.7T)
  5860533128           7       - free -  (3.5k)

=>         0  4294967295  da0s1  BSD  (2.7T)
           0  4294967288      1  freebsd-ufs  (2T)
  4294967288           7         - free -  (3.5k)

To get this back to a starting point, I did:

# gpart delete -i 1 da0s1
da0s1a deleted
# gpart destroy da0s1
da0s1 destroyed
# gpart delete -i 1 da0
da0s1 deleted
# gpart destroy da0
da0 destroyed

Then:

# gpart create -s GPT da0
da0 created
# gpart add -b 1000 -a 4k -t freebsd-zfs -s 95G da0 
da0p1 added
# gpart show da0
=>        34  5860533101  da0  GPT  (2.7T)
          34         966       - free -  (483k)
        1000   199229440    1  freebsd-zfs  (95G)
   199230440  5661302695       - free -  (2.7T)

# zpool create -m /mnt example /dev/da0p1
# zpool status
  pool: example
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	example     ONLINE       0     0     0
	  da0p1     ONLINE       0     0     0

errors: No known data errors

Doing the dd – ashift != 12

$ cat ~/bin/ddFileSystem4k
#!/bin/sh

COUNTS="100 200 400 800 1600 3200"

for count in ${COUNTS}
do
  CMD="dd if=/dev/zero of=testing${count} bs=4k count=${count}k"
  echo '$' ${CMD}
  `${CMD}`
done
[dan@heckler:/mnt/dan] $ ~/bin/ddFileSystem4k
$ dd if=/dev/zero of=testing100 bs=4k count=100k
102400+0 records in
102400+0 records out
419430400 bytes transferred in 1.533860 secs (273447648 bytes/sec)
$ dd if=/dev/zero of=testing200 bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 3.018803 secs (277878627 bytes/sec)
$ dd if=/dev/zero of=testing400 bs=4k count=400k
409600+0 records in
409600+0 records out
1677721600 bytes transferred in 12.839801 secs (130665700 bytes/sec)
$ dd if=/dev/zero of=testing800 bs=4k count=800k
819200+0 records in
819200+0 records out
3355443200 bytes transferred in 26.395755 secs (127120561 bytes/sec)
$ dd if=/dev/zero of=testing1600 bs=4k count=1600k
1638400+0 records in
1638400+0 records out
6710886400 bytes transferred in 47.622508 secs (140918375 bytes/sec)
$ dd if=/dev/zero of=testing3200 bs=4k count=3200k
3276800+0 records in
3276800+0 records out
13421772800 bytes transferred in 100.156393 secs (134008149 bytes/sec)
[dan@heckler:/mnt/dan] $ 

Then I did some 300k count tests:

$  ~/bin/ddFileSystem
dd if=/dev/zero of=testing32 bs=32k count=300k
307200+0 records in
307200+0 records out
10066329600 bytes transferred in 95.411953 secs (105503863 bytes/sec)
dd if=/dev/zero of=testing64 bs=64k count=300k
^C298832+0 records in
298831+0 records out

Which I stopped because of the terrible throughput. Then I noticed: this is just a 95G partition, left over from a bad paste. Oops.

Let’s try that again:

[root@heckler /home/dan]# gpart delete -i 1 da0
da0p1 deleted
[root@heckler /home/dan]# gpart add -b 1000 -a 4k -t freebsd-zfs da0
da0p1 added
[root@heckler /home/dan]# gpart show da0
=>        34  5860533101  da0  GPT  (2.7T)
          34         966       - free -  (483k)
        1000  5860532128    1  freebsd-zfs  (2.7T)
  5860533128           7       - free -  (3.5k)

[root@heckler /home/dan]# zpool create -m /mnt example /dev/da0p1
[root@heckler /home/dan]# zpool status
  pool: example
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	example     ONLINE       0     0     0
	  da0p1     ONLINE       0     0     0

errors: No known data errors
[root@heckler /home/dan]#

Now, where was I? Ahh yes, getting ashift=12. I found http://savagedlight.me/2012/07/15/freebsd-zfs-advanced-format/ and it had a simple enough example to be useful.

[root@heckler /home/dan]# gpart add -a 1m -t freebsd-zfs -l Bay1.1 da0 
da0p1 added

[root@heckler /home/dan]# gnop create -S 4k gpt/Bay1.1

[root@heckler /home/dan]# zpool create -m /mnt example /dev/gpt/Bay1.1.nop
invalid vdev specification
use '-f' to override the following errors:
/dev/gpt/Bay1.1.nop is part of exported pool 'pool'

[root@heckler /home/dan]# zpool create -f  -m /mnt example /dev/gpt/Bay1.1.nop
[root@heckler /home/dan]# gpart show da0
=>        34  5860533101  da0  GPT  (2.7T)
          34        2014       - free -  (1M)
        2048  5860530176    1  freebsd-zfs  (2.7T)
  5860532224         911       - free -  (455k)

[root@heckler /home/dan]# mount
/dev/mirror/gm0s1a on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)
/dev/mirror/gm0s1d on /var (ufs, local, journaled soft-updates)
/dev/mirror/gm0s1e on /tmp (ufs, local, journaled soft-updates)
/dev/mirror/gm0s1f on /usr (ufs, local, journaled soft-updates)
example on /mnt (zfs, local, nfsv4acls)

[root@heckler /home/dan]# zpool export example

[root@heckler /home/dan]# gnop destroy gpt/Bay1.1.nop

[root@heckler /home/dan]# zpool import -d /dev/gpt example

[root@heckler /home/dan]# mount
/dev/mirror/gm0s1a on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)
/dev/mirror/gm0s1d on /var (ufs, local, journaled soft-updates)
/dev/mirror/gm0s1e on /tmp (ufs, local, journaled soft-updates)
/dev/mirror/gm0s1f on /usr (ufs, local, journaled soft-updates)
example on /mnt (zfs, local, nfsv4acls)

[root@heckler /home/dan]# gpart show da0
=>        34  5860533101  da0  GPT  (2.7T)
          34        2014       - free -  (1M)
        2048  5860530176    1  freebsd-zfs  (2.7T)
  5860532224         911       - free -  (455k)

[root@heckler /home/dan]# zdb | grep ashift
            ashift: 12
[root@heckler /home/dan]#

Ahh, there’s the ashift we want.
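To recap, the gnop dance for forcing ashift=12 on a 4K drive boils down to these steps (a condensed sketch of the commands above, using the same label and pool names):

```shell
# 1. Create a 1M-aligned GPT partition with a label.
gpart add -a 1m -t freebsd-zfs -l Bay1.1 da0
# 2. Create a gnop device that reports a 4K sector size.
gnop create -S 4k gpt/Bay1.1
# 3. Build the pool on the .nop device so ZFS picks ashift=12.
zpool create -m /mnt example /dev/gpt/Bay1.1.nop
# 4. Swap the pool back onto the real label.
zpool export example
gnop destroy gpt/Bay1.1.nop
zpool import -d /dev/gpt example
# 5. Verify.
zdb | grep ashift    # should report ashift: 12
```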

dd with ashift, different order from the ufs test

I started off with the tests in a different order than the UFS run. Afterwards, I repeated the tests in the same order as UFS.

$  ~/bin/ddFileSystem
dd if=/dev/zero of=testing32 bs=32k count=300k
307200+0 records in
307200+0 records out
10066329600 bytes transferred in 64.243268 secs (156690808 bytes/sec)
dd if=/dev/zero of=testing64 bs=64k count=300k
307200+0 records in
307200+0 records out
20132659200 bytes transferred in 147.213138 secs (136758577 bytes/sec)
dd if=/dev/zero of=testing128 bs=128k count=300k
307200+0 records in
307200+0 records out
40265318400 bytes transferred in 232.357049 secs (173290712 bytes/sec)
dd if=/dev/zero of=testing256 bs=256k count=300k
307200+0 records in
307200+0 records out
80530636800 bytes transferred in 689.901064 secs (116727805 bytes/sec)
dd if=/dev/zero of=testing1024 bs=1024k count=300k
307200+0 records in
307200+0 records out
322122547200 bytes transferred in 2403.438818 secs (134025691 bytes/sec)
dd if=/dev/zero of=testing2048 bs=2048k count=300k
307200+0 records in
307200+0 records out
644245094400 bytes transferred in 5186.330284 secs (124219835 bytes/sec)

That’s 111-165 MB/s. Not very consistent.

Next, we have writing of 4k blocks.

$ ~/bin/ddFileSystem4k
$ dd if=/dev/zero of=testing100 bs=4k count=100k
102400+0 records in
102400+0 records out
419430400 bytes transferred in 1.716126 secs (244405369 bytes/sec)
$ dd if=/dev/zero of=testing200 bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 5.224607 secs (160559598 bytes/sec)
$ dd if=/dev/zero of=testing400 bs=4k count=400k
409600+0 records in
409600+0 records out
1677721600 bytes transferred in 11.608812 secs (144521385 bytes/sec)
$ dd if=/dev/zero of=testing800 bs=4k count=800k
819200+0 records in
819200+0 records out
3355443200 bytes transferred in 26.294773 secs (127608754 bytes/sec)
$ dd if=/dev/zero of=testing1600 bs=4k count=1600k
1638400+0 records in
1638400+0 records out
6710886400 bytes transferred in 54.679726 secs (122730798 bytes/sec)
$ dd if=/dev/zero of=testing3200 bs=4k count=3200k
3276800+0 records in
3276800+0 records out
13421772800 bytes transferred in 102.508754 secs (130932943 bytes/sec)

That’s 117-233 MB/s. Very wide-ranging results, not entirely consistent.

At this point the disk is over a third full:

$ df -h /mnt
Filesystem    Size    Used   Avail Capacity  Mounted on
example       2.7T    1.0T    1.7T    39%    /mnt

dd in same order as ufs

Let’s try this again, in the same order as done with ufs.

[dan@heckler:/mnt/dan] $ df -h /mnt
Filesystem    Size    Used   Avail Capacity  Mounted on
example       2.7T    152k    2.7T     0%    /mnt
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing32 bs=32k count=300000
300000+0 records in
300000+0 records out
9830400000 bytes transferred in 74.562503 secs (131841068 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing64 bs=64k count=300000
300000+0 records in
300000+0 records out
19660800000 bytes transferred in 148.559020 secs (132343361 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing128 bs=128k count=300000
300000+0 records in
300000+0 records out
39321600000 bytes transferred in 291.913876 secs (134702744 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing256 bs=256k count=300000
300000+0 records in
300000+0 records out
78643200000 bytes transferred in 579.574777 secs (135691205 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing512 bs=512k count=300000
300000+0 records in
300000+0 records out
157286400000 bytes transferred in 1180.668523 secs (133218085 bytes/sec)
[dan@heckler:/mnt/dan] $ dd if=/dev/zero of=testing1024 bs=1024k count=300000
300000+0 records in
300000+0 records out
314572800000 bytes transferred in 2383.430611 secs (131983200 bytes/sec)
[dan@heckler:/mnt/dan] $ 

That’s a pretty consistent 125-129 MB/s.

Let’s try the smaller blocks (4K):

[dan@heckler:/mnt/dan/4k] $ cat ~/bin/ddFileSystem4k
#!/bin/sh

COUNTS="100 200 400 800 1600 3200"

for count in ${COUNTS}
do
  CMD="dd if=/dev/zero of=testing${count} bs=4k count=${count}k"
  echo '$' ${CMD}
  `${CMD}`
done

[dan@heckler:/mnt/dan/4k] $ ~/bin/ddFileSystem4k
$ dd if=/dev/zero of=testing100 bs=4k count=100k
102400+0 records in
102400+0 records out
419430400 bytes transferred in 1.734270 secs (241848372 bytes/sec)

$ dd if=/dev/zero of=testing200 bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 3.461010 secs (242374568 bytes/sec)

$ dd if=/dev/zero of=testing400 bs=4k count=400k
409600+0 records in
409600+0 records out
1677721600 bytes transferred in 6.906478 secs (242919997 bytes/sec)

$ dd if=/dev/zero of=testing800 bs=4k count=800k
819200+0 records in
819200+0 records out
3355443200 bytes transferred in 28.677230 secs (117007229 bytes/sec)

$ dd if=/dev/zero of=testing1600 bs=4k count=1600k
1638400+0 records in
1638400+0 records out
6710886400 bytes transferred in 73.088439 secs (91818713 bytes/sec)

$ dd if=/dev/zero of=testing3200 bs=4k count=3200k
3276800+0 records in     
3276800+0 records out
13421772800 bytes transferred in 98.737248 secs (135934240 bytes/sec)

That varies from 87-231 MB/s.

$ df -h /mnt
Filesystem    Size    Used   Avail Capacity  Mounted on
example       2.7T    601G    2.1T    22%    /mnt

Because of that very fast dd above, I ran a few more ‘dd if=/dev/zero of=testing200 bs=4k count=200k’ tests (each with a different of= filename). They varied:

[dan@heckler:/mnt/dan/4k/again] $ dd if=/dev/zero of=testing200 bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 6.763470 secs (124028170 bytes/sec)

[dan@heckler:/mnt/dan/4k/again] $ dd if=/dev/zero of=testing200a bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 4.068366 secs (206191082 bytes/sec)

[dan@heckler:/mnt/dan/4k/again] $ dd if=/dev/zero of=testing200b bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 4.151256 secs (202073971 bytes/sec)

[dan@heckler:/mnt/dan/4k/again] $ dd if=/dev/zero of=testing200c bs=4k count=200k
204800+0 records in
204800+0 records out
838860800 bytes transferred in 4.220569 secs (198755386 bytes/sec)

That’s 118-196 MB/s.

bonnie++ (added at 2:55 pm)

Here’s the bonnie++ output.

$ bonnie++ -s 66000
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
heckler.unix 66000M    94  99 134458  41 68036  23   228  97 172483  25 183.6   8
Latency               166ms    1447ms    2835ms     347ms     376ms     566ms
Version  1.97       ------Sequential Create------ --------Random Create--------
heckler.unixathome. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 17348  97 +++++ +++ 17295  98 16733  94 +++++ +++ 16259  97
Latency             20662us     159us     209us   29731us      51us     283us
1.97,1.97,heckler.unixathome.org,1,1360402714,66000M,,94,99,134458,41,68036,23,228,97,172483,25,183.6,8,16,,,,,17348,97,+++++,+++,17295,98,16733,94,+++++,+++,16259,97,166ms,1447ms,2835ms,347ms,376ms,566ms,20662us,159us,209us,29731us,51us,283us

fio test

After first publishing this post, more than one person mentioned sysutils/fio. With Bruce Cran’s help, I figured out how to run it. You might find this HOWTO useful.
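If you want to follow along, fio installs the usual way (a sketch; the package tooling varies with your FreeBSD version):

```shell
# From the ports tree:
cd /usr/ports/sysutils/fio && make install clean
# Or as a binary package (on setups using the older pkg tools):
pkg_add -r fio
```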

[dan@heckler:/mnt/dan] $ cat ~/bin/fio.test.1
[global]
size=320000k
bs=32k
direct=1

[testing]
rw=write
[dan@heckler:/mnt/dan] $ fio ~/bin/fio.test.1
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=32K-32K/32K-32K/32K-32K, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
testing: Laying out IO file(s) (1 file(s) / 312MB)

testing: (groupid=0, jobs=1): err= 0: pid=102759: Sat Feb  9 20:11:05 2013
  write: io=320000KB, bw=735632KB/s, iops=22988 , runt=   435msec
    clat (usec): min=15 , max=3500 , avg=41.47, stdev=54.93
     lat (usec): min=16 , max=3501 , avg=42.33, stdev=54.92
    clat percentiles (usec):
     |  1.00th=[   16],  5.00th=[   16], 10.00th=[   16], 20.00th=[   16],
     | 30.00th=[   16], 40.00th=[   17], 50.00th=[   17], 60.00th=[   17],
     | 70.00th=[   18], 80.00th=[  102], 90.00th=[  115], 95.00th=[  119],
     | 99.00th=[  151], 99.50th=[  159], 99.90th=[  187], 99.95th=[  199],
     | 99.99th=[  422]
    lat (usec) : 20=71.94%, 50=3.02%, 100=2.25%, 250=22.77%, 500=0.01%
    lat (msec) : 4=0.01%
  cpu          : usr=0.00%, sys=77.19%, ctx=2, majf=0, minf=18446744073709539480
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=10000/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=320000KB, aggrb=735632KB/s, minb=735632KB/s, maxb=735632KB/s, mint=435msec, maxt=435msec
[dan@heckler:/mnt/dan] $ 

Bruce Cran suggested adding end_fsync=1 (‘fsync file contents when a write stage has completed’). The last two tests used that parameter.

I’m running a longer fio test now. FYI, here is a sample gstat output taken during that longer test. Some lines have been trimmed from the output to exclude idle drives that were not being tested.
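The snapshot came from gstat; something like the following, filtered to the device under test, produces it (the exact flags I used are an assumption):

```shell
# One batch-mode sample of GEOM I/O statistics, only rows matching da0.
gstat -b -f 'da0'
```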

dT: 0.051s  w: 0.050s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   10   1451      0      0    0.0   1451 185711    7.0  101.0| da0
    0      0      0      0    0.0      0      0    0.0    0.0| da0.nop
   10   1451      0      0    0.0   1451 185711    7.0  101.0| da0p1
   10   1451      0      0    0.0   1451 185711    7.0  101.0| gpt/Bay1.1

fio test longer, still no fsync

[dan@heckler:/mnt/dan] $ cat ~/bin/fio.test.2
[global]
size=320000M
bs=32k
direct=1

[testing]
rw=write
[dan@heckler:/mnt/dan] $ fio ~/bin/fio.test.2
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=32K-32K/32K-32K/32K-32K, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
testing: Laying out IO file(s) (1 file(s) / 320000MB)
Jobs: 1 (f=1): [W] [100.0% done] [0K/143.2M/0K /s] [0 /4581 /0  iops] [eta 00m:00s]
testing: (groupid=0, jobs=1): err= 0: pid=102243: Sat Feb  9 21:01:26 2013
  write: io=320000MB, bw=124184KB/s, iops=3880 , runt=2638662msec
    clat (usec): min=15 , max=5283.7K, avg=254.87, stdev=12180.56
     lat (usec): min=16 , max=5283.7K, avg=255.95, stdev=12180.59
    clat percentiles (usec):
     |  1.00th=[   16],  5.00th=[   16], 10.00th=[   16], 20.00th=[   17],
     | 30.00th=[   17], 40.00th=[   17], 50.00th=[   18], 60.00th=[   19],
     | 70.00th=[  127], 80.00th=[  161], 90.00th=[  940], 95.00th=[ 1004],
     | 99.00th=[ 1096], 99.50th=[ 1128], 99.90th=[ 1144], 99.95th=[ 1160],
     | 99.99th=[272384]
    bw (KB/s)  : min=   51, max=728384, per=100.00%, avg=157603.62, stdev=139338.78
    lat (usec) : 20=61.03%, 50=4.52%, 100=0.24%, 250=21.53%, 500=0.12%
    lat (usec) : 750=0.01%, 1000=7.36%
    lat (msec) : 2=5.17%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
    lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : >=2000=0.01%
  cpu          : usr=1.47%, sys=22.04%, ctx=1371595, majf=0, minf=18446744073709539485
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=10240000/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=320000MB, aggrb=124184KB/s, minb=124184KB/s, maxb=124184KB/s, mint=2638662msec, maxt=2638662msec

This test had a throughput of 121 MB/s, and took 49 minutes to run.

fio test with fsync enabled

[dan@heckler:/mnt/dan] $ cat ~/bin/fio.test.3
[global]
size=320000k
bs=32k
direct=1
end_fsync=1

[testing]
rw=write
[dan@heckler:/mnt/dan] $ fio ~/bin/fio.test.3
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=32K-32K/32K-32K/32K-32K, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/68462K/0K /s] [0 /2139 /0  iops] [eta 00m:00s]
testing: (groupid=0, jobs=1): err= 0: pid=102277: Sat Feb  9 21:03:57 2013
  write: io=320000KB, bw=70407KB/s, iops=2200 , runt=  4545msec
    clat (usec): min=21 , max=940825 , avg=451.98, stdev=11596.84
     lat (usec): min=22 , max=940826 , avg=452.86, stdev=11596.84
    clat percentiles (usec):
     |  1.00th=[   22],  5.00th=[   23], 10.00th=[   25], 20.00th=[   25],
     | 30.00th=[   26], 40.00th=[   26], 50.00th=[   27], 60.00th=[   28],
     | 70.00th=[   31], 80.00th=[  510], 90.00th=[  556], 95.00th=[  620],
     | 99.00th=[ 1832], 99.50th=[ 2256], 99.90th=[66048], 99.95th=[110080],
     | 99.99th=[585728]
    bw (KB/s)  : min=  219, max=165184, per=100.00%, avg=88346.00, stdev=70997.21
    lat (usec) : 50=74.92%, 100=0.06%, 250=0.01%, 500=3.46%, 750=17.58%
    lat (usec) : 1000=0.90%
    lat (msec) : 2=2.32%, 4=0.55%, 10=0.02%, 20=0.04%, 50=0.04%
    lat (msec) : 100=0.03%, 250=0.05%, 750=0.01%, 1000=0.01%
  cpu          : usr=0.46%, sys=7.88%, ctx=2490, majf=0, minf=18446744073709539485
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=10000/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=320000KB, aggrb=70407KB/s, minb=70407KB/s, maxb=70407KB/s, mint=4545msec, maxt=4545msec
[dan@heckler:/mnt/dan] $

This test had throughput of 69 MB/s and took 4.5 seconds.

fio test, bigger output

This test involves blocks of 4M, writing out 1.25GB.

[dan@heckler:/mnt/dan] $ cat ~/bin/fio.test.4
[global]
size=1280000k
bs=4096k
direct=1

[testing]
rw=write
[dan@heckler:/mnt/dan] $ fio ~/bin/fio.test.4
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/151.6M/0K /s] [0 /37 /0  iops] [eta 00m:00s]
testing: (groupid=0, jobs=1): err= 0: pid=102259: Sat Feb  9 21:07:56 2013
  write: io=1252.0MB, bw=193926KB/s, iops=47 , runt=  6611msec
    clat (msec): min=2 , max=897 , avg=20.85, stdev=107.66
     lat (msec): min=2 , max=897 , avg=21.11, stdev=107.67
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    3], 50.00th=[    3], 60.00th=[    4],
     | 70.00th=[    4], 80.00th=[    5], 90.00th=[   17], 95.00th=[   32],
     | 99.00th=[  758], 99.50th=[  873], 99.90th=[  898], 99.95th=[  898],
     | 99.99th=[  898]
    bw (KB/s)  : min=128343, max=361108, per=91.20%, avg=176865.50, stdev=90517.11
    lat (msec) : 4=77.00%, 10=12.14%, 20=0.96%, 50=7.67%, 250=0.32%
    lat (msec) : 750=0.64%, 1000=1.28%
  cpu          : usr=1.30%, sys=15.25%, ctx=934, majf=0, minf=18446744073709539485
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=313/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=1252.0MB, aggrb=193926KB/s, minb=193926KB/s, maxb=193926KB/s, mint=6611msec, maxt=6611msec

It had a throughput of 189 MB/s and took 6.6 seconds.

fio tests on ufs

I removed the ZFS partition and created a UFS one in its place:

[root@heckler ~]# gpart delete -i 1 da0
da0p1 deleted

[root@heckler ~]# gpart add -b 2048 -a 4k -t freebsd da0
da0s1 added

[root@heckler ~]# gpart create -s BSD da0s1
da0s1 created

[root@heckler:/home/dan] # gpart add -t freebsd-ufs da0s1
da0s1a added
[root@heckler:/home/dan] # gpart show da0 da0s1
=>        34  5860533101  da0  GPT  (2.7T)
          34        2014       - free -  (1M)
        2048  5860531080    1  freebsd  (2.7T)
  5860533128           7       - free -  (3.5k)

=>         0  4294967295  da0s1  BSD  (2.7T)
           0  4294967288      1  freebsd-ufs  (2T)
  4294967288           7         - free -  (3.5k)

[root@heckler:/home/dan] # 

Then I ran the last two tests from the previous section:

[dan@heckler:/mnt/dan] $ cat ~/bin/fio.test.3
[global]
size=320000k
bs=32k
direct=1
end_fsync=1

[testing]
rw=write

[dan@heckler:/mnt/dan] $ fio /home/dan/bin/fio.test.3
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=32K-32K/32K-32K/32K-32K, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
testing: Laying out IO file(s) (1 file(s) / 312MB)
Jobs: 1 (f=1): [W] [-.-% done] [0K/150.4M/0K /s] [0 /4809 /0  iops] [eta 00m:00s]
testing: (groupid=0, jobs=1): err= 0: pid=100837: Sat Feb  9 22:30:20 2013
  write: io=320000KB, bw=164948KB/s, iops=5154 , runt=  1940msec
    clat (usec): min=14 , max=169544 , avg=191.15, stdev=2368.41
     lat (usec): min=15 , max=169545 , avg=192.10, stdev=2368.41
    clat percentiles (usec):
     |  1.00th=[   15],  5.00th=[   16], 10.00th=[   17], 20.00th=[   18],
     | 30.00th=[   19], 40.00th=[   20], 50.00th=[   21], 60.00th=[   31],
     | 70.00th=[   34], 80.00th=[   44], 90.00th=[  636], 95.00th=[  652],
     | 99.00th=[ 1512], 99.50th=[ 1784], 99.90th=[ 1880], 99.95th=[ 1944],
     | 99.99th=[162816]
    bw (KB/s)  : min=149888, max=201600, per=100.00%, avg=170752.00, stdev=27263.40
    lat (usec) : 20=31.74%, 50=48.53%, 100=0.44%, 250=0.29%, 500=0.39%
    lat (usec) : 750=16.89%, 1000=0.01%
    lat (msec) : 2=1.66%, 10=0.03%, 250=0.02%
  cpu          : usr=1.13%, sys=8.92%, ctx=1909, majf=0, minf=18446744073709539480
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=10000/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=320000KB, aggrb=164948KB/s, minb=164948KB/s, maxb=164948KB/s, mint=1940msec, maxt=1940msec

[dan@heckler:/mnt/dan] $ cat ~/bin/fio.test.4
[global]
size=1280000k
bs=4096k
direct=1

[testing]
rw=write

[dan@heckler:/mnt/dan] $ fio /home/dan/bin/fio.test.4
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
testing: Laying out IO file(s) (1 file(s) / 1250MB)
Jobs: 1 (f=1): [W] [100.0% done] [0K/151.6M/0K /s] [0 /37 /0  iops] [eta 00m:00s]
testing: (groupid=0, jobs=1): err= 0: pid=101009: Sat Feb  9 22:30:35 2013
  write: io=1248.0MB, bw=156997KB/s, iops=38 , runt=  8140msec
    clat (msec): min=3 , max=173 , avg=25.83, stdev=22.95
     lat (msec): min=3 , max=173 , avg=26.08, stdev=22.95
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[    5], 10.00th=[    6], 20.00th=[   25],
     | 30.00th=[   25], 40.00th=[   26], 50.00th=[   26], 60.00th=[   26],
     | 70.00th=[   26], 80.00th=[   26], 90.00th=[   26], 95.00th=[   30],
     | 99.00th=[  169], 99.50th=[  174], 99.90th=[  174], 99.95th=[  174],
     | 99.99th=[  174]
    bw (KB/s)  : min=116806, max=204800, per=100.00%, avg=157713.20, stdev=24387.66
    lat (msec) : 4=0.96%, 10=12.18%, 20=2.24%, 50=82.37%, 250=2.24%
  cpu          : usr=0.80%, sys=17.55%, ctx=8364, majf=0, minf=18446744073709539485
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=312/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=1248.0MB, aggrb=156996KB/s, minb=156996KB/s, maxb=156996KB/s, mint=8140msec, maxt=8140msec
[dan@heckler:/mnt/dan] $ 

Those tests were 161 MB/s and 153 MB/s respectively.

fio tests with large files and bigger blocks

In this test, I went for a larger file and bigger blocks. I also turned direct I/O off, since fsync is ignored when direct I/O is specified.

[dan@heckler:/mnt/dan/testing] $ cat ~/bin/fio.test.8
[global]
size=2560000k
bs=2096k
direct=0
end_fsync=1

[testing]
rw=write
[dan@heckler:/mnt/dan/testing] $

[dan@heckler:/mnt/dan/testing] $ fio ~/bin/fio.test.8
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
testing: (g=0): rw=write, bs=2096K-2096K/2096K-2096K/2096K-2096K, ioengine=sync, iodepth=1
fio-2.0.13
Starting 1 process
testing: Laying out IO file(s) (1 file(s) / 2500MB)
Jobs: 1 (f=1): [W] [100.0% done] [0K/173.3M/0K /s] [0 /84 /0  iops] [eta 00m:00s]
testing: (groupid=0, jobs=1): err= 0: pid=102181: Sat Feb  9 23:37:00 2013
  write: io=2499.3MB, bw=177600KB/s, iops=84 , runt= 14410msec
    clat (msec): min=1 , max=903 , avg=11.70, stdev=82.92
     lat (msec): min=1 , max=904 , avg=11.80, stdev=82.93
    clat percentiles (usec):
     |  1.00th=[ 1160],  5.00th=[ 1224], 10.00th=[ 1256], 20.00th=[ 1288],
     | 30.00th=[ 1304], 40.00th=[ 1336], 50.00th=[ 1368], 60.00th=[ 1400],
     | 70.00th=[ 1464], 80.00th=[ 1752], 90.00th=[16768], 95.00th=[17024],
     | 99.00th=[643072], 99.50th=[798720], 99.90th=[880640], 99.95th=[905216],
     | 99.99th=[905216]
    bw (KB/s)  : min= 9274, max=944850, per=100.00%, avg=202417.86, stdev=217380.26
    lat (msec) : 2=86.49%, 4=0.90%, 10=0.74%, 20=10.65%, 50=0.16%
    lat (msec) : 750=0.16%, 1000=0.90%
  cpu          : usr=0.74%, sys=10.91%, ctx=2406, majf=0, minf=18446744073709539472
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=1221/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=2499.3MB, aggrb=177600KB/s, minb=177600KB/s, maxb=177600KB/s, mint=14410msec, maxt=14410msec

Conclusions

What would you conclude here?
