I think I’m now at the point where I have all the hardware assembled and it’s ready for testing. Now I’m trying to determine what set of fio tests to run. If you know of a set of standard tests, or just have some of your own, please let me know in the comments.
The hardware
The disks I have assembled are all available via the LSI 9211-8i. The basic hardware specs are:
- motherboard – SUPERMICRO MBD-H8SGL-O ATX Server Motherboard (Supermicro link)
- CPU – AMD Opteron 6128 Magny-Cours 2.0GHz 8 x 512KB L2 Cache 12MB L3 Cache Socket G34 115W 8-Core Server
- RAM – Kingston 8GB 240-Pin DDR3 SDRAM ECC Registered DDR3 1600 Server Memory
- SATA card – LSI Internal SATA/SAS 9211-8i 6Gb/s PCI-Express 2.0 RAID Controller Card, Kit (LSI page)
The OS
FreeBSD 9.1-STABLE is installed on the following gmirrored HDDs:
- Western Digital WD Blue WD2500AAKX 250GB 7200 RPM SATA 6.0Gb/s 3.5″ Internal Hard Drive -Bare Drive
- Seagate Barracuda ST250DM000 250GB 7200 RPM SATA 6.0Gb/s 3.5″ Internal Hard Drive -Bare Drive
These drives are connected to the motherboard SATA ports.
No tuning has been done on the OS.
Installed on this box is the following:
$ ls /var/db/pkg/
apr-1.4.6.1.4.1_3          gmake-3.82_1               pcre-8.32
aspell-0.60.6.1_2          help2man-1.41.1            perl-5.14.2_2
autoconf-2.69              joe-3.7_1,1                pkgconf-0.8.9
autoconf-wrapper-20101119  libgcrypt-1.5.0_1          portaudit-0.6.0
automake-1.12.6            libgpg-error-1.10          postfix-2.9.5,1
automake-wrapper-20101119  libiconv-1.14              postgresql-client-9.2.2_1
bacula-client-5.2.12       libtool-2.4.2              postgresql-contrib-9.2.2_1
bash-4.2.42                libxml2-2.7.8_5            postgresql-server-9.2.2_1
bison-2.5.1,1              libxslt-1.1.28             python27-2.7.3_6
bonnie++-1.97              m4-1.4.16_1,1              screen-4.0.3_14
db42-4.2.52_5              nagios-plugins-1.4.16,1    smartmontools-6.0
expat-2.0.1_2              neon29-0.29.6_4            sqlite3-3.7.14.1
fio-2.0.13                 nrpe-2.13_2                subversion-1.7.8
gdbm-1.9.1                 ossp-uuid-1.6.2_2          sudo-1.8.6.p5
gettext-0.18.1.1           p5-Locale-gettext-1.05_3
There is no real load on this machine. Any services provided are for testing, not for outside uses.
HDD available for testing
The following HDDs are available for testing purposes and are connected to the LSI card mentioned above.
Also available, but not in the server, are 3 additional 2 TB Seagate HDDs. The plan is to remove the 3 x 3TB HDDs and create a raidz2 array composed of 8 x 2TB Seagate drives for another test.
The plan so far
I plan to run tests on the following setups:
- A zpool raidz1 composed of the 4 x 2TB Seagate drives
- A zpool consisting of the single 3TB Toshiba drive
- A zpool consisting of the single 3TB WD Red drive
- A zpool consisting of the single 3TB Seagate drive
- A zpool consisting of the single 2TB Seagate drive
- UFS on 3TB Toshiba drive
- UFS on 3TB WD Red drive
- UFS on 3TB Seagate drive
- UFS on 2TB Seagate drive
- A zpool raidz2 composed of 8 x 2TB Seagate drives
I think we can also do some testing on zpool setups with and without the SSD being used as a ZIL/L2ARC.
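For reference, here is a rough sketch of what those pool setups might look like; the pool name and device names are placeholders, since the actual da/ada numbers depend on how the drives appear behind the LSI card and on how the SSD is partitioned:

# raidz1 over the 4 x 2TB Seagate drives (device names are placeholders)
zpool create tank raidz1 da0 da1 da2 da3
# later: raidz2 over the 8 x 2TB Seagate drives
# zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7
# re-run the tests with the SSD added as a separate ZIL (log) and L2ARC (cache),
# here assuming two partitions on the SSD
zpool add tank log ada1p1
zpool add tank cache ada1p2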
A couple of interesting reads
- Greg Smith, on watching an HDD die.
- Greg Smith, on testing HDD for PostgreSQL
- testing HDD copy
I may use some of that to devise some simple tests.
What set of tests to run?
One simple test I can think of is loading up a PostgreSQL database (e.g. psql example1 < freshports.org.sql). In this case, the raw uncompressed source file is about 8GB, and it creates a database which is about 33GB on disk. But I want to do some other tests which may be of use to others. I have fio set up and ready to go with a few simple tests (mostly just simple read/write). But if you have fio tests which you use, or know of other tests which I can grab and use, please let me know in the comments. Thanks.
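As a starting point, I am thinking of simple sequential and random fio runs along these lines; the target path, file size, block sizes, and runtime are just placeholders to be adjusted for each filesystem under test:

# /tank/fio.dat is a placeholder path on the filesystem under test
fio --name=seqwrite --filename=/tank/fio.dat --size=16g --bs=1m --rw=write --ioengine=posixaio --group_reporting
fio --name=seqread --filename=/tank/fio.dat --size=16g --bs=1m --rw=read --ioengine=posixaio --group_reporting
fio --name=randrw --filename=/tank/fio.dat --size=16g --bs=8k --rw=randrw --rwmixread=70 --iodepth=16 --runtime=300 --time_based --ioengine=posixaio --group_reporting

The 8k block size on the random job is meant to loosely match PostgreSQL's page size.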
Recently, I finished a file tree cloning tool for FreeBSD and Mac OS X. It is now available in the ports of FreeBSD:
/usr/ports/sysutils/clone
It can be used to clone the complete content behind one mount point to another one. On completion it reports the number of items copied, the total size of data transferred, the time and the data rate.
Example:
clone / /mnt
would clone the root file system to the file system mounted at /mnt. This would more or less drive the disks to their limits with respect to raw transfer rates. On my low-profile system I achieved 89 MByte/s transferring 2.3 TByte of data from one disk to another.
However, clone can also be used to clone one directory to another within the same file system, and I guess this would drive the disks crazy with respect to seek times.
Anyway, I would be interested to see how clone performs on a high-profile system like yours, and also to see a comparison with dump/restore.
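For such a comparison, the classic pipeline would be something like the following; the mount points are only examples, and dump/restore works on UFS file systems, not on ZFS datasets:

# UFS only; source and destination mount points are examples
dump -0Laf - /usr | (cd /mnt/usr && restore -rf -)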
Best regards
Rolf
I wonder if that will be limited by the source of the data, i.e. the slower gmirror upon which all the files are located.
Given the very fast SATA controller and fast processors, the physical speed of the hard disks may still be the bottleneck even in a gmirror, though I may be wrong. Hence, this must be tested.
Idea: put the OS onto an SSD, use that as the source. This would eliminate the source for the copy as a possible bottleneck.
Thoughts:
This would put the weight on the write performance of the respective disk.
Copying back would also give information about read performance.
In order to reduce the number of tests, perhaps it is sufficient to read/write from/to the same disk, e.g. to and from different partitions.
Perhaps it would be interesting to partition the drives under test into 3 equal parts, so you could also test whether there are huge differences in reading/writing to the inner, middle, and outer disk regions (a sketch of such a split follows below).
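A minimal sketch of that split, assuming a GPT scheme and a placeholder device name (da1); the partition size is taken as one third of the media size in sectors:

# da1 is a placeholder for the drive under test
# field 4 of diskinfo's default output is the media size in sectors
secs=$(diskinfo /dev/da1 | awk '{print int($4/3)}')
gpart create -s gpt da1
gpart add -t freebsd-ufs -s ${secs} da1
gpart add -t freebsd-ufs -s ${secs} da1
gpart add -t freebsd-ufs da1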
Won’t diskinfo give that information?
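If I recall correctly, diskinfo's transfer test already reads from the outer, middle, and inner zones of a drive, something like (device name is just an example):

diskinfo -v /dev/da1
diskinfo -ct /dev/da1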
FYI, I’ve got the SSD installed in this server now. It had to be upgraded to 9.1-STABLE to recognize it.
While running diskclone, I get some errors:
When running 8 concurrent instances of diskclone, one did not complete (shown above).
Not sure why we get these.
In the present incarnation of the clone tool, a read error on a file with error code 0 could only happen if the size of the file changed in the course of copying it. Perhaps I need to implement another error check, one not based on the copied block size.
Did you run the 8 concurrent instances on the same source and destination?
What also seems strange to me is that /mnt appears as a source path; that shouldn't happen if /mnt is a mount point for a different volume.
Same source, different destinations.
e.g.
OK, so far.
However, given this error message:
Read error on file /mnt/CLONE8-2/usr/share/man/man4/if_my.4.gz: No error: 0.
it seems that clone has a problem correctly telling mount points from directories. clone should never take anything from a different mount point as source. Somehow clone found /mnt/CLONE8-2 to have the same device ID as /ssd. What does stat tell?
stat -s /ssd
stat -s /mnt/CLONE8-2
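To compare just the device IDs, FreeBSD's stat can also print them directly with a format string, e.g.:

stat -f 'st_dev=%d %N' /ssd /mnt/CLONE8-2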
The st_dev values are different. For the moment, I fail to understand how clone came to take something from /mnt/CLONE8-2 as a source file. I need to investigate this.
I will also change the way it handles differences between the expected and actual file size. If there is no read error (errno == 0), then clone should simply accept this, i.e. copy the changed file, perhaps leaving a notice that the file has been changed by another process.
Is this some read confirmation which failed?
The thread which schedules files for cloning may have put up to 100 files in the queue ahead of the file that is currently being read/written. The items in the queue contain the stat record of the respective file, and for performance reasons the source file is not stat'ed again when it is about to be copied.
In any case, it takes some time to read/write all the files in the queue, and this may take long enough that a file changes between the time it was put into the queue and the time it is eventually read in for copying. The current version of clone took a file size difference as an error, even if the file could be read in without error.
I just changed this behaviour: clone now barks only on real read errors, i.e. those having an error code different from 0. The new version has already been submitted to the svn repository at Google Code. Tomorrow, I will submit a PR so that it gets into the ports.
I was thinking perhaps I had two identical destinations, but this seems to indicate otherwise.
I suspect CLONE8-2 is different because that’s the one which did not complete.
That makes sense.
Does the command df show exactly the same occupied size for the 7 completed clones?
What was the min. and max. transfer rate?
I never got any min/max, just the results at the end.
But:
$ sudo du -ch -d 0 CLONE8*
Password:
18G CLONE8-1
1.2G CLONE8-2
18G CLONE8-3
18G CLONE8-4
18G CLONE8-5
18G CLONE8-6
18G CLONE8-7
18G CLONE8-8
133G total
Yes, a single run gives only a single result at the end.
7 runs would give 7 individual results, and I was asking for the minimum, the average, and the maximum of those 7.
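A quick way to get those, assuming the per-run rates have been collected into a file (the filename is only an example), would be:

# rates.txt holds one MByte/s figure per clone run
awk 'NR==1{min=max=$1} {sum+=$1; if($1<min)min=$1; if($1>max)max=$1} END{printf "min=%s avg=%.1f max=%s\n", min, sum/NR, max}' rates.txt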
BTW, clone has been updated to version 1.0.1 in the ports of FreeBSD.
It should no longer report a non-error when it encounters a changed file, but simply copy it. Also, it should no longer block in the middle of cloning.
Many thanks for letting me know these issues.
Best regards
Rolf