* Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
@ 2008-06-07 14:22 Justin Piszcz
2008-06-07 15:54 ` David Lethe
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Justin Piszcz @ 2008-06-07 14:22 UTC (permalink / raw)
To: linux-kernel, linux-raid, xfs; +Cc: Alan Piszcz
First, the original benchmarks with six SATA drives, reformatted with right
justification and the same decimal precision throughout:
http://home.comcast.net/~jpiszcz/20080607/raid-benchmarks-decimal-fix-and-right-justified/disks.html
Now for the VelociRaptors! Ever wonder what kind of speed is possible with
3-, 4-, 5-, 6-, 7-, 8-, 9-, and 10-disk RAID5s? I ran a loop to find out;
each RAID5 disk set was benchmarked three times and the average taken of the
three runs.
In short? The 965 chipset no longer does justice to faster drives; a new
chipset and motherboard are needed. Reading from or writing to 4-5
VelociRaptors saturates the bus/965 chipset.
Here is a picture of the 12 VelociRaptors I tested with:
http://home.comcast.net/~jpiszcz/20080607/raid5-benchmarks-3to10-veliciraptors/raptors.jpg
Here are the bonnie++ results:
http://home.comcast.net/~jpiszcz/20080607/raid5-benchmarks-3to10-veliciraptors/veliciraptor-raid.html
For those who want the results in text:
http://home.comcast.net/~jpiszcz/20080607/raid5-benchmarks-3to10-veliciraptors/veliciraptor-raid.txt
System used (same/similar to before):
Motherboard: Intel DG965WH
Memory: 8GiB
Kernel: 2.6.25.4
Distribution: Debian Testing x86_64
Filesystem: XFS with default mkfs.xfs parameters [auto-optimized for SW RAID]
Mount options: defaults,noatime,nodiratime,logbufs=8,logbsize=262144 0 1
Chunk size: 1024KiB
RAID5 Layout: Default (left-symmetric)
Mdadm Superblock used: 0.90
Optimizations used (the last one is specific to the CFQ scheduler); together
they improve performance by a modest 5-10 MiB/s:
http://home.comcast.net/~jpiszcz/raid/20080601/raid5.html
# Tell user what's going on.
echo "Optimizing RAID Arrays..."
# Define DISKS.
cd /sys/block
DISKS=$(/bin/ls -1d sd[a-z])
# Set read-ahead: 65536 x 512-byte sectors = 32 MiB.
echo "Setting read-ahead to 32 MiB for /dev/md3"
blockdev --setra 65536 /dev/md3
# Set stripe_cache_size for RAID5.
echo "Setting stripe_cache_size to 16 MiB for /dev/md3"
echo 16384 > /sys/block/md3/md/stripe_cache_size
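# NB: stripe_cache_size is counted in 4 KiB pages per member disk, so 16384
# here pins 64 MiB of RAM per disk, not 16 MiB total (see Dan Williams'
# correction later in the thread).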
# Disable NCQ on all disks.
echo "Disabling NCQ on all disks..."
for i in $DISKS
do
echo "Disabling NCQ on $i"
echo 1 > /sys/block/"$i"/device/queue_depth
done
# Fix slice_idle.
# See http://www.nextre.it/oracledocs/ioscheduler_03.html
echo "Fixing slice_idle to 0..."
for i in $DISKS
do
echo "Changing slice_idle to 0 on $i"
echo 0 > /sys/block/"$i"/queue/iosched/slice_idle
done
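To confirm the tweaks took effect, the same attributes can simply be read
back, e.g.:

# Read back the settings applied above (values shown are the expected ones).
blockdev --getra /dev/md3                    # expect 65536
cat /sys/block/md3/md/stripe_cache_size      # expect 16384
for i in $DISKS
do
    echo "$i: queue_depth=$(cat /sys/block/"$i"/device/queue_depth)" \
         "slice_idle=$(cat /sys/block/"$i"/queue/iosched/slice_idle)"
done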
----
Order of tests:
1. Create RAID (mdadm)
Example:
if [ $num_disks -eq 3 ]; then
mdadm --create /dev/md3 --verbose --level=5 -n $num_disks -c 1024 -e 0.90 \
/dev/sd[c-e]1 --assume-clean --run
fi
2. Run optimize script (above)
See above.
3. mkfs.xfs -f /dev/md3
mkfs.xfs auto-optimizes for the underlying devices in an mdadm SW RAID.
4. Run bonnie++ as shown below three times and average the results:
/usr/bin/time /usr/sbin/bonnie++ -u 1000 -d /x/test -s 16384 -m p34 -n 16:100000:16:64 > $HOME/test"$run"_$num_disks-disks.txt 2>&1
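Assembled into one loop, the procedure might look like the sketch below. The
member layout (sdc1..sdl1), mount point (/x), and the optimize-script path
are assumptions carried over from the examples above, not the author's
actual harness.

#!/bin/bash
# Hypothetical harness tying steps 1-4 together.
for num_disks in 3 4 5 6 7 8 9 10
do
    # 1. Create the array from the first $num_disks members.
    ALL=(/dev/sd{c,d,e,f,g,h,i,j,k,l}1)
    mdadm --create /dev/md3 --verbose --level=5 -n $num_disks -c 1024 -e 0.90 \
        ${ALL[@]:0:$num_disks} --assume-clean --run
    # 2. Apply the optimization script shown earlier (path assumed).
    /root/optimize-raid.sh
    # 3. Make and mount the filesystem.
    mkfs.xfs -f /dev/md3
    mount -o noatime,nodiratime,logbufs=8,logbsize=262144 /dev/md3 /x
    mkdir -p /x/test && chown 1000 /x/test
    # 4. Three bonnie++ runs per configuration; average the results afterwards.
    for run in 1 2 3
    do
        /usr/bin/time /usr/sbin/bonnie++ -u 1000 -d /x/test -s 16384 -m p34 \
            -n 16:100000:16:64 > $HOME/test"$run"_$num_disks-disks.txt 2>&1
    done
    umount /x
    mdadm --stop /dev/md3
done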
----
A little more info: after 4-5 parallel dd's I have already maxed out what
the chipset can offer; see below:
knoppix@Knoppix:~$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 2755556 6176 203584 0 0 153 1 25 371 3 1 84 11
0 0 0 2755556 6176 203588 0 0 0 0 66 257 0 0 100 0
0 1 0 2605400 152204 203584 0 0 0 146028 257 396 0 5 77 18
0 1 0 2478176 277520 203604 0 0 0 125316 345 794 1 4 75 20
1 0 0 2349472 403984 203592 0 0 0 119136 297 256 0 5 75 20
2 1 0 2117292 631172 203512 0 0 0 232336 498 1019 0 8 66 26
0 2 0 2014400 731968 203556 0 0 0 241472 542 2078 1 11 63 25
3 0 0 2013412 733756 203492 0 0 0 302104 672 2760 0 14 59 27
0 3 0 2013576 735624 203520 0 0 0 362524 808 3356 0 15 56 29
0 4 0 2039312 736728 174860 0 0 120 425484 956 4899 1 20 52 26
0 4 0 2050236 738508 163712 0 0 0 482868 1008 5030 1 24 46 29
5 3 0 2050192 737916 163756 0 0 0 531532 1175 6033 0 26 43 31
3 4 0 2050220 738028 163744 0 0 0 606560 1312 6664 1 32 38 30
1 5 0 2049432 739184 163628 0 0 0 592756 1291 7195 1 30 35 34
8 3 0 2049488 738868 163580 0 0 0 675228 1721 10540 1 38 30 31
Here, at ~5 Raptor 300s, there is no more linear improvement:
4 4 0 2050048 737816 163744 0 0 0 677820 1771 10514 1 36 32 31
6 4 0 2048764 738612 163684 0 0 0 697640 1842 13231 1 40 27 33
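The dd invocations themselves aren't shown in the post; a hypothetical
reconstruction of that kind of load, one raw-device writer per disk, might
look like this (WARNING: it overwrites the named disks and is shown only to
illustrate the access pattern):

# DANGER: destructive. One sequential writer per disk, all in parallel.
for d in sdc sdd sde sdf sdg
do
    dd if=/dev/zero of=/dev/$d bs=1M count=4096 &
done
wait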
^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-07 14:22 Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors Justin Piszcz
@ 2008-06-07 15:54 ` David Lethe
  2008-06-08  1:46 ` Dan Williams
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: David Lethe @ 2008-06-07 15:54 UTC (permalink / raw)
To: Justin Piszcz, linux-kernel, linux-raid, xfs; +Cc: Alan Piszcz

This is all interesting, but this has no relevance to the real world, where
computers run application software. You have a great foundation here, but it
won't help anybody who is running a database, mail, or file/backup server,
because the I/Os are too large, and homogeneous. You will get profoundly
different sweet spots for RAID configurations once you model your bench to
match something that people actually run.

I am not criticizing you for this; it is just that now I have a taste for
what you have accomplished, and I want more more more :)

David

> -----Original Message-----
> From: Justin Piszcz, Saturday, June 07, 2008 9:23 AM
> [original benchmark post quoted in full; trimmed]

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-07 14:22 Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors Justin Piszcz
  2008-06-07 15:54 ` David Lethe
@ 2008-06-08  1:46 ` Dan Williams
  2008-06-09  7:51   ` thomas62186218
  2008-06-11 17:02 ` Nat Makarevitch
  2008-06-11 20:27 ` Bill Davidsen
  3 siblings, 1 reply; 14+ messages in thread
From: Dan Williams @ 2008-06-08 1:46 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-kernel, linux-raid, xfs, Alan Piszcz

On Sat, Jun 7, 2008 at 7:22 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> [original post quoted in full; trimmed to the relevant lines]
>
> # Set stripe_cache_size for RAID5.
> echo "Setting stripe_cache_size to 16 MiB for /dev/md3"

Sorry to sound like a broken record, 16MiB is not correct.

size=$((num_disks * 4 * 16384 / 1024))
echo "Setting stripe_cache_size to $size MiB for /dev/md3"

...and commit 8b3e6cdc should improve the performance /
stripe_cache_size ratio.

> echo 16384 > /sys/block/md3/md/stripe_cache_size

Thanks for putting this data together.

Regards,
Dan

^ permalink raw reply	[flat|nested] 14+ messages in thread
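(A quick check of Dan's formula, assuming the usual 4 KiB page size:
stripe_cache_size counts cache entries, and each entry holds one page per
member disk, so the script's fixed 16384 pins 64 MiB per disk rather than
16 MiB total. For the ten-disk case:)

# Memory pinned by the stripe cache under Dan's formula (4 KiB pages).
num_disks=10
echo "$((num_disks * 4 * 16384 / 1024)) MiB"   # prints "640 MiB" for 10 disks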
* Re: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-08  1:46 ` Dan Williams
@ 2008-06-09  7:51   ` thomas62186218
  2008-06-09  8:43     ` Keld Jørn Simonsen
  2008-06-09 13:41     ` David Lethe
  0 siblings, 2 replies; 14+ messages in thread
From: thomas62186218 @ 2008-06-09 7:51 UTC (permalink / raw)
To: dan.j.williams, jpiszcz; +Cc: linux-kernel, linux-raid, xfs, ap

Thank you for sharing these results. One issue that I consistently see with
these results is miserable random IO performance. Looking at these numbers,
even a low-end RAID controller with 128MB of cache will outrun md-based
RAIDs in random IO benchmarks. In today's world of virtual machines, etc.,
random IO is far more common than sequential IO. What can be done with md
(or something else) to alleviate this problem?

-Thomas

> [Dan Williams' reply, including the full quoted original post, trimmed]

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-09  7:51   ` thomas62186218
@ 2008-06-09  8:43     ` Keld Jørn Simonsen
  2008-06-09 13:41     ` David Lethe
  1 sibling, 0 replies; 14+ messages in thread
From: Keld Jørn Simonsen @ 2008-06-09 8:43 UTC (permalink / raw)
To: thomas62186218; +Cc: dan.j.williams, jpiszcz, linux-kernel, linux-raid, xfs, ap

On Mon, Jun 09, 2008 at 03:51:07AM -0400, thomas62186218@aol.com wrote:
> Thank you for sharing these results. One issue that I consistently see
> with these results is miserable random IO performance. [...] What can
> be done with md (or something else) to alleviate this problem?

Have you got any numbers to back this up? What benchmark are you using for
random IO?

Anyway, the numbers that Justin reported were with an outdated motherboard.
My take is that Linux MD RAID can outperform most HW RAID by a factor of two
on random IO.

Best regards
keld

> [remainder of quoted thread trimmed]

^ permalink raw reply	[flat|nested] 14+ messages in thread
* RE: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-09  7:51   ` thomas62186218
  2008-06-09  8:43     ` Keld Jørn Simonsen
@ 2008-06-09 13:41     ` David Lethe
  2008-06-09 14:27       ` Keld Jørn Simonsen
  1 sibling, 1 reply; 14+ messages in thread
From: David Lethe @ 2008-06-09 13:41 UTC (permalink / raw)
To: thomas62186218, dan.j.williams, jpiszcz; +Cc: linux-kernel, linux-raid, xfs, ap

For faster random I/O:
* Decrease the chunk size.
* Migrate files that see higher random I/O to a RAID1 set, using disks with
  the lowest access time/latency.
* If possible, use the /dev/shm file system.
* Determine the I/O size of the apps that produce most of the random I/O,
  and make sure that md+filesystem matches it. If most random I/O is 32KB,
  then don't waste bandwidth by making md read 256KB at a time, or making it
  issue 2x16KB I/Os. Also don't build md sets like a 4-drive RAID5 (do a
  5-drive RAID5 set instead), because otherwise the non-parity data isn't a
  multiple of 2. A 10-drive RAID5 set with heavy random I/O is also
  profoundly wrong, because you are just removing the opportunity to have
  all of those heads processing random I/O independently.
* If you have only one partition on an md set, then partition it into a few
  file systems. This may provide greater opportunity for caching I/Os.
* Experiment with different file systems, and optimize accordingly.
* Turn off journaling, or at least move journals to RAID1 devices.
* Add RAM and try to increase the buffer cache in an attempt to improve the
  cache hit percentage (this works up to a point).
* Buy a small SSD and migrate the files that get pounded with random I/O to
  that device. (Make sure you don't get a flash SSD, but a DRAM-based SSD
  that satisfies random I/O in nanoseconds instead of milliseconds.) They
  are expensive, but they are the appropriate device; this is how companies
  such as Google & eBay manage to get things done.

The biggest thing to remember about random I/O is that it is expensive, so
step back and think about ways to minimize the I/O requests to disk in the
first place, and/or to spread the I/O across multiple raidsets that can work
independently to satisfy your load. Not all of the suggestions above will
work for everybody; you must understand the nature of the bottleneck.

David

> [quoted message from thomas62186218, including the earlier thread, trimmed]

^ permalink raw reply	[flat|nested] 14+ messages in thread
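(The thread never names a random-I/O benchmark; a minimal fio invocation
along the lines below would measure the 32KB random reads David describes.
The file name, size, and queue depth are illustrative, not from the thread.)

# Hypothetical random-read IOPS measurement; adjust paths/sizes as needed.
fio --name=randread32k --filename=/x/test/fio.dat --size=4g \
    --rw=randread --bs=32k --direct=1 --ioengine=libaio \
    --iodepth=16 --runtime=60 --time_based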
* Re: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-09 13:41     ` David Lethe
@ 2008-06-09 14:27       ` Keld Jørn Simonsen
  2008-06-09 14:56         ` David Lethe
  0 siblings, 1 reply; 14+ messages in thread
From: Keld Jørn Simonsen @ 2008-06-09 14:27 UTC (permalink / raw)
To: David Lethe
Cc: thomas62186218, dan.j.williams, jpiszcz, linux-kernel, linux-raid, xfs, ap

On Mon, Jun 09, 2008 at 08:41:18AM -0500, David Lethe wrote:
> For faster random I/O:
> [David's list of suggestions quoted in full; trimmed]

For faster random IO I would suggest using raid10,f2. For random reading it
performs like raid0, something like more than double the speed of a normal
single-drive file system. For random writes raid10,f2 performs like most
other mirrored raids, given that data needs to be written twice.

Try and see if you can get any HW RAID to match that performance.

best regards
keld

^ permalink raw reply	[flat|nested] 14+ messages in thread
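(For reference, the raid10,f2 layout Keld recommends is selected with
mdadm's --layout flag; a minimal sketch, with illustrative device names:)

# Two-disk RAID10 with the "far 2" layout (raid10,f2).
mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=2 \
    /dev/sdb1 /dev/sdc1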
* RE: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-09 14:27       ` Keld Jørn Simonsen
@ 2008-06-09 14:56         ` David Lethe
  2008-06-09 23:15           ` Keld Jørn Simonsen
  0 siblings, 1 reply; 14+ messages in thread
From: David Lethe @ 2008-06-09 14:56 UTC (permalink / raw)
To: Keld Jørn Simonsen
Cc: thomas62186218, dan.j.williams, jpiszcz, linux-kernel, linux-raid, xfs, ap

On Mon, Jun 09, 2008, Keld Jørn Simonsen wrote:
> For faster random IO I would suggest using raid10,f2. [...] Try and see
> if you can get any HW RAID to match that performance.

Keld:
That is counter-intuitive. The issue is random IOPs, not throughput. I do
not understand how a RAID10 would provide more IOs per sec than RAID1. Or,
since you are using RAID10, how could RAID10 serve more random I/Os than a
pair of RAID1 filesystems?

RAID0 dictates that each disk will supply half of the data you want per
application I/O request. At least with RAID1, each disk can get all the data
you want with a single request, and dual-porting/load balancing will allow
both disks to work independently of each other on reads, so the disk with
the least load at any time can work on the request. That is why RAID1 can be
faster than JBOD.

Granted, writes are handled differently, but with any RAID0 implementation
you still have to write half of the data to each disk, requiring 2 I/Os plus
journaling & housekeeping.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-09 14:56         ` David Lethe
@ 2008-06-09 23:15           ` Keld Jørn Simonsen
  0 siblings, 0 replies; 14+ messages in thread
From: Keld Jørn Simonsen @ 2008-06-09 23:15 UTC (permalink / raw)
To: David Lethe
Cc: thomas62186218, dan.j.williams, jpiszcz, linux-kernel, linux-raid, xfs, ap

On Mon, Jun 09, 2008 at 09:56:14AM -0500, David Lethe wrote:
> That is counter-intuitive. The issue is random IOPs, not throughput.

That probably depends on your use. I run Linux mirrors, and for that purpose
thruput of random IO, especially reading, is key. For databases it is
probably something else, probably IOPs; here I also think that Linux MD RAID
has good performance. Once again I think my pet RAID type, raid10,f2, has
something to offer, especially at lower random seek rates, as the track span
is shorter and lies on the outer, faster tracks. And other uses may have
other bottlenecks.

In general I think that thruput is an important figure, as it shows how fast
a system can process a given amount of data. Areas where this may count
include web servers, file servers, print servers, and ordinary workstations.

I actually think those two measures for random IO (IO thruput and IO
transactions per second, for read and write) are the most important. For IO
transactions per second I agree that your suggestions are good advice. I
would like to have good benchmarking tools for this, and I would also like
figures on how Linux MD compares to different HW RAID.

> I do not understand how a RAID10 would provide more IOs per sec than
> RAID1. Or, since you are using RAID10, how could RAID10 serve more random
> I/Os than a pair of RAID1 filesystems?

In theory you are right. The MD implementation of RAID1 does not seem to
handle random seeks so well, AFAIK. And with raid10,f2 the seeks are
confined to less than half of the disk arm movement; that does speed things
up a little.

> Granted, writes are handled differently, but with any RAID0 implementation
> you still have to write half of the data to each disk, requiring 2 I/Os
> plus journaling & housekeeping.

yes, indeed.

best regards
keld

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-07 14:22 Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors Justin Piszcz
  ` (2 preceding siblings ...)
@ 2008-06-11 17:02 ` Nat Makarevitch
  2008-06-11 20:27 ` Bill Davidsen
  3 siblings, 0 replies; 14+ messages in thread
From: Nat Makarevitch @ 2008-06-11 17:02 UTC (permalink / raw)
To: linux-raid

Justin Piszcz <jpiszcz <at> lucidpixels.com> writes:
> Ever wonder what kind of speed is possible with 3 disk,
> 4,5,6,7,8,9,10-disk RAID5s?
> Here are the bonnie++ results:
> http://home.comcast.net/~jpiszcz/20080607/raid5-benchmarks-3to10-veliciraptors/veliciraptor-raid.html

Why does the number of spindles have nearly no effect on the number of seeks
per second?

3 disks:  713.9 seeks/s (AFAIK the Raptor spins at 10000 rpm; getting 230+
seeks/s per drive is astonishing)
10 disks: 705.5 seeks/s (the same as 3 disks?!)

Did I miss something? Or did you use a very large stripe size (to the point
of forbidding the 16 GB test file from spanning all spindles)? Or is it some
glitch in the RAID code (I don't think so; on a RAID10 with 10 low-end disks
I obtained ~1000 IOPS: http://www.makarevitch.org/rant/raid/#3wmd)?

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-07 14:22 Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors Justin Piszcz
  ` (2 preceding siblings ...)
  2008-06-11 17:02 ` Nat Makarevitch
@ 2008-06-11 20:27 ` Bill Davidsen
  2008-06-11 20:48   ` Justin Piszcz
  3 siblings, 1 reply; 14+ messages in thread
From: Bill Davidsen @ 2008-06-11 20:27 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-kernel, linux-raid, xfs, Alan Piszcz

Justin Piszcz wrote:
> [original post quoted; trimmed]
>
> In short? The 965 no longer does justice with faster drives, a new
> chipset and motherboard are needed. After reading or writing to 4-5
> veliciraptors it saturates the bus/965 chipset.

This is very interesting, but a 16GB chunk size bears no relationship to
anything I would run in the real world, and I suspect most people are in the
same category.

--
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-11 20:27 ` Bill Davidsen
@ 2008-06-11 20:48   ` Justin Piszcz
  2008-06-11 20:53     ` Justin Piszcz
  0 siblings, 1 reply; 14+ messages in thread
From: Justin Piszcz @ 2008-06-11 20:48 UTC (permalink / raw)
To: Bill Davidsen; +Cc: linux-kernel, linux-raid, xfs, Alan Piszcz

On Wed, 11 Jun 2008, Bill Davidsen wrote:
> This is very interesting, but a 16GB chunk size bears no relationship to
> anything I would run in the real world, and I suspect most people are in
> the same category.

I based my bonnie++ test on:
http://everything2.org/?node_id=1479435

So I could compare to his results.

I use a 1024k (1 MiB) chunk with a 16384 stripe_cache_size; this offered the
best overall read/write/rewrite performance AFAIK.

Justin.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-11 20:48   ` Justin Piszcz
@ 2008-06-11 20:53     ` Justin Piszcz
  2008-06-12 19:08       ` Bill Davidsen
  0 siblings, 1 reply; 14+ messages in thread
From: Justin Piszcz @ 2008-06-11 20:53 UTC (permalink / raw)
To: Bill Davidsen; +Cc: linux-kernel, linux-raid, xfs, Alan Piszcz

On Wed, 11 Jun 2008, Justin Piszcz wrote:
> I use a 1024k (1 MiB) chunk with a 16384 stripe_cache_size; this offered
> the best overall read/write/rewrite performance AFAIK.

1024k chunk size (the raid5 chunk size)
echo 16384 > stripe_cache_size

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-11 20:53     ` Justin Piszcz
@ 2008-06-12 19:08       ` Bill Davidsen
  0 siblings, 0 replies; 14+ messages in thread
From: Bill Davidsen @ 2008-06-12 19:08 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-kernel, linux-raid, xfs, Alan Piszcz

Justin Piszcz wrote:
> 1024k chunk size (the raid5 chunk size)
> echo 16384 > stripe_cache_size

Please don't explain any more, I'm confused enough already. I can't make
those numbers match 16G no matter how I add them; either the contents of the
column labeled "size:chunk size" isn't the size of the chunk, or you have a
multiplier floating around that I don't see.

And you eliminated the degraded performance: since your stripe_cache_size is
less than (raid5 chunk size)*(#disks), I would expect reads in degraded mode
to be dog slow because they don't fit in cache, at least if 1024k is what I
call chunk size, and certainly if the chunk size is 16G.

--
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark

^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2008-06-12 19:08 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-07 14:22 Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors Justin Piszcz
2008-06-07 15:54 ` David Lethe
2008-06-08  1:46 ` Dan Williams
2008-06-09  7:51   ` thomas62186218
2008-06-09  8:43     ` Keld Jørn Simonsen
2008-06-09 13:41     ` David Lethe
2008-06-09 14:27       ` Keld Jørn Simonsen
2008-06-09 14:56         ` David Lethe
2008-06-09 23:15           ` Keld Jørn Simonsen
2008-06-11 17:02 ` Nat Makarevitch
2008-06-11 20:27 ` Bill Davidsen
2008-06-11 20:48   ` Justin Piszcz
2008-06-11 20:53     ` Justin Piszcz
2008-06-12 19:08       ` Bill Davidsen