* Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Justin Piszcz @ 2008-01-16 16:13 UTC
To: xfs, linux-raid; +Cc: Alan Piszcz
For these benchmarks I timed how long it takes to extract a standard 4.4
GiB DVD:
Settings: Software RAID 5 with the following settings (until I change
those too):
Base setup:
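# read-ahead for the array, in 512-byte sectors
blockdev --setra 65536 /dev/md3
echo 16384 > /sys/block/md3/md/stripe_cache_size
echo "Disabling NCQ on all disks..."
# $DISKS holds the member disk names (sdc ... sdl in this setup);
# a queue_depth of 1 effectively turns NCQ off
for i in $DISKS
do
    echo "Disabling NCQ on $i"
    echo 1 > /sys/block/"$i"/device/queue_depth
done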
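As a rough sizing check on the two md knobs above, here is a small bash sketch. It assumes 4 KiB pages and the commonly cited rule that the RAID 5 stripe cache costs page_size x stripe_cache_size x nr_disks bytes; blockdev --setra counts 512-byte sectors. The helper names are made up for illustration.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope sizes for the md tuning used in the base setup.
# Assumption: 4 KiB pages; stripe cache memory ~= page * entries * disks.

# stripe_cache_mib ENTRIES NR_DISKS [PAGE_BYTES]
stripe_cache_mib() {
    local entries=$1 disks=$2 page=${3:-4096}
    echo $(( entries * page * disks / 1024 / 1024 ))
}

# readahead_mib SECTORS  (blockdev --setra/--getra use 512-byte sectors)
readahead_mib() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}

echo "stripe_cache_size=16384 on 10 disks: $(stripe_cache_mib 16384 10) MiB"
echo "--setra 65536: $(readahead_mib 65536) MiB of read-ahead"
```

Under those assumptions the 16384-entry stripe cache pins about 640 MiB of RAM on this 10-disk array, and the read-ahead is 32 MiB.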
p34:~# grep : *chunk* |sort -n
4-chunk.txt:0:45.31
8-chunk.txt:0:44.32
16-chunk.txt:0:41.02
32-chunk.txt:0:40.50
64-chunk.txt:0:40.88
128-chunk.txt:0:40.21
256-chunk.txt:0:40.14***
512-chunk.txt:0:40.35
1024-chunk.txt:0:41.11
2048-chunk.txt:0:43.89
4096-chunk.txt:0:47.34
8192-chunk.txt:0:57.86
16384-chunk.txt:1:09.39
32768-chunk.txt:1:26.61
It would appear a 256 KiB chunk-size is optimal.
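Note that sort -n orders these lines by the leading parameter value, not by elapsed time. A small sketch (a hypothetical helper in plain awk) that converts the GNU time %E field, [h:]m:ss.ss, to seconds and reports the fastest result; trailing annotations such as *** are ignored because awk's numeric conversion keeps only the leading number.

```bash
#!/usr/bin/env bash
# best_result: read "NAME:[h:]m:ss.ss[annotation]" lines on stdin and print
# the NAME with the lowest elapsed time, plus that time in seconds.
best_result() {
    awk -F: '
        {
            t = 0
            # fields 2..NF are the [h:]m:ss.ss parts; "40.14***" converts to 40.14
            for (i = 2; i <= NF; i++) t = t * 60 + $i
            if (NR == 1 || t < best) { best = t; name = $1 }
        }
        END { if (NR) printf "%s %.2f\n", name, best }'
}
```

For example, grep : *chunk* | best_result on the chunk table above prints 256-chunk.txt 40.14.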
So what about NCQ?
1=ncq_depth.txt:0:40.86***
2=ncq_depth.txt:0:40.99
4=ncq_depth.txt:0:42.52
8=ncq_depth.txt:0:43.57
16=ncq_depth.txt:0:42.54
31=ncq_depth.txt:0:42.51
Keeping it off seems best.
1=stripe_and_read_ahead.txt:0:40.86
2=stripe_and_read_ahead.txt:0:40.99
4=stripe_and_read_ahead.txt:0:42.52
8=stripe_and_read_ahead.txt:0:43.57
16=stripe_and_read_ahead.txt:0:42.54
31=stripe_and_read_ahead.txt:0:42.51
256=stripe_and_read_ahead.txt:1:44.16
1024=stripe_and_read_ahead.txt:1:07.01
2048=stripe_and_read_ahead.txt:0:53.59
4096=stripe_and_read_ahead.txt:0:45.66
8192=stripe_and_read_ahead.txt:0:40.73
16384=stripe_and_read_ahead.txt:0:38.99**
16384=stripe_and_65536_read_ahead.txt:0:38.67
16384=stripe_and_65536_read_ahead.txt:0:38.69 (again, this is what I use
from earlier benchmarks)
32768=stripe_and_read_ahead.txt:0:38.84
What about logbufs?
2=logbufs.txt:0:39.21
4=logbufs.txt:0:39.24
8=logbufs.txt:0:38.71
(again)
2=logbufs.txt:0:42.16
4=logbufs.txt:0:38.79
8=logbufs.txt:0:38.71** (yes)
What about logbsize?
16k=logbsize.txt:1:09.22
32k=logbsize.txt:0:38.70
64k=logbsize.txt:0:39.04
128k=logbsize.txt:0:39.06
256k=logbsize.txt:0:38.59** (best)
What about allocsize? (default=1024k)
4k=allocsize.txt:0:39.35
8k=allocsize.txt:0:38.95
16k=allocsize.txt:0:38.79
32k=allocsize.txt:0:39.71
64k=allocsize.txt:1:09.67
128k=allocsize.txt:0:39.04
256k=allocsize.txt:0:39.11
512k=allocsize.txt:0:39.01
1024k=allocsize.txt:0:38.75** (default)
2048k=allocsize.txt:0:39.07
4096k=allocsize.txt:0:39.15
8192k=allocsize.txt:0:39.40
16384k=allocsize.txt:0:39.36
What about the agcount?
2=agcount.txt:0:37.53
4=agcount.txt:0:38.56
8=agcount.txt:0:40.86
16=agcount.txt:0:39.05
32=agcount.txt:0:39.07** (default)
64=agcount.txt:0:39.29
128=agcount.txt:0:39.42
256=agcount.txt:0:38.76
512=agcount.txt:0:38.27
1024=agcount.txt:0:38.29
2048=agcount.txt:1:08.55
4096=agcount.txt:0:52.65
8192=agcount.txt:1:06.96
16384=agcount.txt:1:31.21
32768=agcount.txt:1:09.06
65536=agcount.txt:1:54.96
So far I have:
p34:~# mkfs.xfs -f -l lazy-count=1,version=2,size=128m -i attr=2 /dev/md3
meta-data=/dev/md3               isize=256    agcount=32, agsize=10302272 blks
         =                       sectsz=4096  attr=2
data     =                       bsize=4096   blocks=329671296, imaxpct=25
         =                       sunit=64     swidth=576 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=32768, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=2359296 blocks=0, rtextents=0
p34:~# grep /dev/md3 /etc/fstab
/dev/md3 /r1 xfs noatime,nodiratime,logbufs=8,logbsize=262144 0 1
Notice how mkfs.xfs 'knows' the sunit and swidth, and with the correct
values too: because this is software RAID, it pulls this information from
the md layer, unlike HW RAID, where it has no clue what is underneath and
reports sunit=0,swidth=0.
However, in earlier testing I actually set both to 0, and it made
performance better:
http://home.comcast.net/~jpiszcz/sunit-swidth/results.html
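The mkfs.xfs numbers above can be checked by hand: sunit and swidth are reported in filesystem blocks (bsize=4096), so sunit=64 is the 256 KiB chunk and swidth=576 is 9 x 64, i.e. the nine data disks of a 10-disk RAID 5 (576 x 4096 = 2359296 bytes, the same figure as the extsz printed). A minimal bash sketch of that arithmetic, assuming RAID 5 with one disk's worth of parity per stripe; the helper name and /dev/mdX are placeholders. On hardware RAID, where mkfs.xfs cannot see the geometry, the same values can be given explicitly with the -d su= and sw= options.

```bash
#!/usr/bin/env bash
# xfs_geometry CHUNK_KIB NR_DISKS [FS_BLOCK_BYTES]
# Expected sunit/swidth (in filesystem blocks) for an md RAID 5 array,
# plus the equivalent explicit mkfs.xfs -d su=,sw= options.
xfs_geometry() {
    local chunk_kib=$1 disks=$2 bsize=${3:-4096}
    local data_disks=$(( disks - 1 ))            # RAID 5: one disk of parity
    local sunit=$(( chunk_kib * 1024 / bsize ))  # stripe unit, in fs blocks
    local swidth=$(( sunit * data_disks ))       # full data stripe, in fs blocks
    echo "sunit=$sunit swidth=$swidth"
    echo "mkfs.xfs -d su=${chunk_kib}k,sw=${data_disks} /dev/mdX"
}

xfs_geometry 256 10
```

For the array above this prints sunit=64 swidth=576, matching the mkfs.xfs output, followed by mkfs.xfs -d su=256k,sw=9 /dev/mdX.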
In any case, I am re-running bonnie++ with a 256 KiB chunk and will
compare against those values in a bit.
Justin.
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Justin Piszcz @ 2008-01-16 16:55 UTC
To: xfs, linux-raid; +Cc: Alan Piszcz
On Wed, 16 Jan 2008, Justin Piszcz wrote:
> For these benchmarks I timed how long it takes to extract a standard 4.4 GiB
> DVD:
>
> Settings: Software RAID 5 with the following settings (until I change those
> too):
http://home.comcast.net/~jpiszcz/sunit-swidth/newresults.html
Any idea why an sunit and swidth of 0 (with -d agcount=4) is faster, at least
for sequential input/output, than the proper sunit/swidth?
It does not make sense.
Justin.
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Al Boldi @ 2008-01-16 17:27 UTC
To: Justin Piszcz, xfs, linux-raid; +Cc: Alan Piszcz
Justin Piszcz wrote:
> For these benchmarks I timed how long it takes to extract a standard 4.4
> GiB DVD:
>
> Settings: Software RAID 5 with the following settings (until I change
> those too):
>
> [ ... ]
>
> It would appear a 256 KiB chunk-size is optimal.
Can you retest with different max_sectors_kb on both md and sd?
Also, can you retest using dd with different block-sizes?
Thanks!
--
Al
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Justin Piszcz @ 2008-01-16 18:02 UTC
To: Al Boldi; +Cc: xfs, linux-raid, Alan Piszcz
On Wed, 16 Jan 2008, Al Boldi wrote:
> Justin Piszcz wrote:
>> [ ... ]
>> It would appear a 256 KiB chunk-size is optimal.
>
> Can you retest with different max_sectors_kb on both md and sd?
Remember this is SW RAID, so max_sectors_kb will only affect the
individual disks underneath the SW RAID. I have benchmarked this in the past:
the defaults chosen by the kernel are optimal, and changing them did not make
any noticeable improvement.
> Also, can you retest using dd with different block-sizes?
I can do this, one moment...
I know about oflag=direct, but I chose to use dd followed by sync and measure
the total time it takes.
/usr/bin/time -f %E -o ~/$i=chunk.txt bash -c 'dd if=/dev/zero of=/r1/bigfile bs=1M count=10240; sync'
I was asked on the mailing list to test dd with various chunk sizes;
here is how long it took to write 10 GiB and sync for each chunk size:
4=chunk.txt:0:25.46
8=chunk.txt:0:25.63
16=chunk.txt:0:25.26
32=chunk.txt:0:25.08
64=chunk.txt:0:25.55
128=chunk.txt:0:25.26
256=chunk.txt:0:24.72
512=chunk.txt:0:24.71
1024=chunk.txt:0:25.40
2048=chunk.txt:0:25.71
4096=chunk.txt:0:27.18
8192=chunk.txt:0:29.00
16384=chunk.txt:0:31.43
32768=chunk.txt:0:50.11
65536=chunk.txt:2:20.80
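To read these times as throughput, divide the 10240 MiB written by the elapsed time. Since each time includes the trailing sync, the result is an average over the whole write plus flush, not a peak rate. A small bash/awk sketch (the helper name is hypothetical):

```bash
#!/usr/bin/env bash
# throughput_mib_s SIZE_MIB ELAPSED
# ELAPSED is GNU time's %E format, [h:]m:ss.ss; prints MiB/s, rounded.
throughput_mib_s() {
    awk -v size="$1" -v t="$2" 'BEGIN {
        n = split(t, part, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + part[i]
        printf "%.0f\n", size / secs
    }'
}

throughput_mib_s 10240 0:24.71   # 512 KiB chunk row
throughput_mib_s 10240 2:20.80   # 65536 KiB chunk row
```

This gives roughly 414 MiB/s for the 512 KiB chunk and 73 MiB/s for the 65536 KiB chunk.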
Justin.
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Al Boldi @ 2008-01-16 21:19 UTC
To: Justin Piszcz; +Cc: xfs, linux-raid, Alan Piszcz
Justin Piszcz wrote:
> On Wed, 16 Jan 2008, Al Boldi wrote:
> > > Also, can you retest using dd with different block-sizes?
>
> I can do this, moment..
>
>
> I know about oflag=direct but I choose to use dd with sync and measure the
> total time it takes.
> /usr/bin/time -f %E -o ~/$i=chunk.txt bash -c 'dd if=/dev/zero
> of=/r1/bigfile bs=1M count=10240; sync'
>
> So I was asked on the mailing list to test dd with various chunk sizes,
> here is the length of time it took
> to write 10 GiB and sync per each chunk size:
>
> 4=chunk.txt:0:25.46
> 8=chunk.txt:0:25.63
> 16=chunk.txt:0:25.26
> 32=chunk.txt:0:25.08
> 64=chunk.txt:0:25.55
> 128=chunk.txt:0:25.26
> 256=chunk.txt:0:24.72
> 512=chunk.txt:0:24.71
> 1024=chunk.txt:0:25.40
> 2048=chunk.txt:0:25.71
> 4096=chunk.txt:0:27.18
> 8192=chunk.txt:0:29.00
> 16384=chunk.txt:0:31.43
> 32768=chunk.txt:0:50.11
> 65536=chunk.txt:2:20.80
What do you get with bs=512,1k,2k,4k,8k,16k...
Thanks!
--
Al
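To compare block sizes fairly, the total amount written should stay fixed while bs changes, so count has to scale inversely with bs. A bash sketch that only prints the commands (it does not run them), assuming the same 10 GiB target file used above; conv=fdatasync is GNU dd's way of making dd itself wait for the data to reach the disk, used here in place of a separate sync. The helper name is made up for illustration.

```bash
#!/usr/bin/env bash
# dd_count BS_BYTES TOTAL_MIB -> block count so that bs * count = TOTAL_MIB MiB
dd_count() {
    local bs=$1 total_mib=$2
    echo $(( total_mib * 1024 * 1024 / bs ))
}

# Print (do not execute) one dd command per block size asked about.
for bs in 512 1024 2048 4096 8192 16384; do
    echo "dd if=/dev/zero of=/r1/bigfile bs=$bs count=$(dd_count "$bs" 10240) conv=fdatasync"
done
```

With bs=512 this needs count=20971520; with bs=16384, count=655360.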
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Justin Piszcz @ 2008-01-16 22:53 UTC
To: Al Boldi; +Cc: xfs, linux-raid, Alan Piszcz
On Thu, 17 Jan 2008, Al Boldi wrote:
> Justin Piszcz wrote:
>> On Wed, 16 Jan 2008, Al Boldi wrote:
>>>> Also, can you retest using dd with different block-sizes?
>> [ ... ]
>
> What do you get with bs=512,1k,2k,4k,8k,16k...
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Done testing for now, but I did test bs=256k with a 256k chunk, and
obviously that got good results, just like bs=1M with a 1 MiB chunk:
460-480 MiB/s.
Justin.
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Justin Piszcz @ 2008-01-16 22:55 UTC
To: Al Boldi; +Cc: xfs, linux-raid, Alan Piszcz
On Thu, 17 Jan 2008, Al Boldi wrote:
> Justin Piszcz wrote:
>> On Wed, 16 Jan 2008, Al Boldi wrote:
>>>> Also, can you retest using dd with different block-sizes?
>> [ ... ]
>
> What do you get with bs=512,1k,2k,4k,8k,16k...
>
root 4621 0.0 0.0 12404 760 pts/2 D+ 17:53 0:00 mdadm -S /dev/md3
root 4664 0.0 0.0 4264 728 pts/5 S+ 17:54 0:00 grep D
Tried to stop it when it was re-syncing, DEADLOCK :(
[ 305.464904] md: md3 still in use.
[ 314.595281] md: md_do_sync() got signal ... exiting
Anyhow, I'm done testing; time to move the data back on, if I can kill the
resync process without deadlocking.
Justin.
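While an array is resyncing, md exposes the current operation in /sys/block/<md>/md/sync_action, and writing idle there asks md to abort the running resync/recovery. A hedged bash sketch of checking and interrupting it before trying mdadm -S; whether this sidesteps the hang seen here on 2.6.23 is not established in this thread. The SYSFS override is a hypothetical convenience so the helpers can be tried against a scratch directory.

```bash
#!/usr/bin/env bash
# SYSFS defaults to /sys; override it only for testing against a fake tree.

# md_sync_state MD   e.g. md_sync_state md3  -> idle, resync, recover, check, ...
md_sync_state() {
    cat "${SYSFS:-/sys}/block/$1/md/sync_action"
}

# md_stop_sync MD    ask md to abort a running resync/recovery (needs root)
md_stop_sync() {
    echo idle > "${SYSFS:-/sys}/block/$1/md/sync_action"
}
```

For example, if md_sync_state md3 reports resync, md_stop_sync md3 requests that it stop; md may start the resync again later.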
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Bill Davidsen @ 2008-01-18 15:24 UTC
To: Justin Piszcz; +Cc: Al Boldi, xfs, linux-raid, Alan Piszcz
Justin Piszcz wrote:
>
> [ ... ]
>
> root 4621 0.0 0.0 12404 760 pts/2 D+ 17:53 0:00 mdadm -S /dev/md3
> root 4664 0.0 0.0 4264 728 pts/5 S+ 17:54 0:00 grep D
>
> Tried to stop it when it was re-syncing, DEADLOCK :(
>
> [ 305.464904] md: md3 still in use.
> [ 314.595281] md: md_do_sync() got signal ... exiting
>
> Anyhow, done testing, time to move data back on if I can kill the
> resync process w/out deadlock.
So does that indicate that there is still a deadlock issue, or that you
don't have the latest patches installed?
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Justin Piszcz @ 2008-01-18 15:28 UTC
To: Bill Davidsen; +Cc: Al Boldi, xfs, linux-raid, Alan Piszcz
On Fri, 18 Jan 2008, Bill Davidsen wrote:
> Justin Piszcz wrote:
>> [ ... ]
>>
>> root 4621 0.0 0.0 12404 760 pts/2 D+ 17:53 0:00 mdadm -S /dev/md3
>> root 4664 0.0 0.0 4264 728 pts/5 S+ 17:54 0:00 grep D
>>
>> Tried to stop it when it was re-syncing, DEADLOCK :(
>>
>> [ 305.464904] md: md3 still in use.
>> [ 314.595281] md: md_do_sync() got signal ... exiting
>>
>> Anyhow, done testing, time to move data back on if I can kill the resync
>> process w/out deadlock.
>
> So does that indicate that there is still a deadlock issue, or that you don't
> have the latest patches installed?
>
I was trying to stop the RAID while it was building; this is vanilla 2.6.23.14.
Justin.
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Greg Cormier @ 2008-01-16 18:09 UTC
To: Justin Piszcz; +Cc: xfs, linux-raid, Alan Piszcz
What sort of tools are you using to get these benchmarks, and can I
use them for ext3?
Very interested in running this on my server.
Thanks,
Greg
On Jan 16, 2008 11:13 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> For these benchmarks I timed how long it takes to extract a standard 4.4
> GiB DVD:
> [ ... ]
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Justin Piszcz @ 2008-01-16 18:19 UTC
To: Greg Cormier; +Cc: xfs, linux-raid, Alan Piszcz
On Wed, 16 Jan 2008, Greg Cormier wrote:
> What sort of tools are you using to get these benchmarks, and can I
> used them for ext3?
>
> Very interested in running this on my server.
>
>
> Thanks,
> Greg
>
You can use whatever suits you: untarring a kernel source tree, copying files, untarring backups, etc. You should benchmark specifically what *your* workload is.
Here is the skeleton, using bash (don't forget to turn off the cron
daemon):
# WARNING: every iteration re-creates /dev/md3 and its filesystem,
# destroying all data on the member disks.
for i in 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536
do
    cd /
    umount /r1
    mdadm -S /dev/md3
    # --chunk is in KiB; --assume-clean skips the initial resync
    mdadm --create --assume-clean --verbose /dev/md3 --level=5 --raid-devices=10 --chunk=$i --run /dev/sd[c-l]1
    /etc/init.d/oraid.sh # to optimize my raid stuff
    mkfs.xfs -f /dev/md3
    mount /dev/md3 /r1 -o logbufs=8,logbsize=262144
    # then simply add what you do often here
    # everyone's workload is different
    /usr/bin/time -f %E -o ~/$i=chunk.txt bash -c 'dd if=/dev/zero of=/r1/bigfile bs=1M count=10240; sync'
done
Then just run grep : /root/*chunk* | sort -n to get the results in the same format.
Justin.
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Peter Grandi @ 2008-02-19 21:48 UTC
To: Linux RAID, Linux XFS
>> What sort of tools are you using to get these benchmarks, and can I
>> used them for ext3?
The only simple tools I have found that give semi-reasonable
numbers, avoiding most of the many pitfalls of storage speed
testing (almost all storage benchmarks I see are largely
meaningless), are recent versions of GNU 'dd' used with the
'fdatasync' and 'direct' flags, and Bonnie 1.4 with the options
'-u -y -o_direct', both used with a moderately large volume of
data (dependent on the size of the host adapter cache, if any).
In particular one must be very careful when using older versions
of 'dd' or Bonnie, or using bonnie++, iozone (unless with -U or
-I), ...
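For reference, the two GNU dd variants mentioned above can be wrapped like this: conv=fdatasync makes dd call fdatasync() on the output before exiting, so the elapsed time includes flushing the data to disk, while oflag=direct opens the output with O_DIRECT and bypasses the page cache. A minimal bash sketch; the function name is made up for illustration, and O_DIRECT only works where the target filesystem supports it (tmpfs, for one, may not).

```bash
#!/usr/bin/env bash
# bench_write FILE SIZE_MIB [MODE]
#   MODE=fdatasync (default): buffered writes, dd exits only after fdatasync()
#   MODE=direct:              O_DIRECT writes that bypass the page cache
bench_write() {
    local file=$1 size_mib=$2 mode=${3:-fdatasync} flag
    case $mode in
        fdatasync) flag=conv=fdatasync ;;
        direct)    flag=oflag=direct ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    # dd reports its transfer summary on stderr; keep only the final line.
    dd if=/dev/zero of="$file" bs=1M count="$size_mib" "$flag" 2>&1 | tail -n 1
}
```

For example, bench_write /r1/bigfile 10240 and bench_write /r1/bigfile 10240 direct would repeat the 10 GiB write test in both modes.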
[ ... ]
> for i in 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536
> do
> cd /
> umount /r1
> mdadm -S /dev/md3
> mdadm --create --assume-clean --verbose /dev/md3 --level=5 --raid-devices=10 --chunk=$i --run /dev/sd[c-l]1
> /etc/init.d/oraid.sh # to optimize my raid stuff
> mkfs.xfs -f /dev/md3
> mount /dev/md3 /r1 -o logbufs=8,logbsize=262144
> /usr/bin/time -f %E -o ~/$i=chunk.txt bash -c 'dd if=/dev/zero of=/r1/bigfile bs=1M count=10240; sync'
> done
I would not consider the results from this particularly
meaningful (that 'sync' only helps a little bit), even for
large sequential write testing. One would also have to document
the elevator used and the flusher daemon parameters.
Let's say that storage benchmarking is a lot more difficult and
subtle than it looks to the untrained eye.
It is just so much easier to use Bonnie 1.4 (with the flags
mentioned above) as a first (and often last) approximation (but
always remember to mention which elevator was in use).
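Following the advice to record the elevator and flusher settings alongside any result, here is a small bash sketch that prints the active I/O scheduler of each member disk (the bracketed entry in /sys/block/<disk>/queue/scheduler) and the vm.dirty_* writeback tunables. The SYSFS/PROCFS overrides are hypothetical conveniences for testing against a fake tree.

```bash
#!/usr/bin/env bash
# show_io_settings DISK...   e.g. show_io_settings sdc sdd sde
show_io_settings() {
    local sysfs=${SYSFS:-/sys} procfs=${PROCFS:-/proc} d f
    for d in "$@"; do
        # scheduler lists every elevator with the active one in brackets,
        # e.g. "noop anticipatory deadline [cfq]"
        sed -n 's/.*\[\(.*\)\].*/'"$d"': elevator=\1/p' "$sysfs/block/$d/queue/scheduler"
    done
    # Writeback (flusher) tunables that shape buffered-write benchmarks
    for f in dirty_ratio dirty_background_ratio dirty_expire_centisecs dirty_writeback_centisecs; do
        if [ -r "$procfs/sys/vm/$f" ]; then
            echo "vm.$f=$(cat "$procfs/sys/vm/$f")"
        fi
    done
}
```

Running show_io_settings sdc sdd sde sdf sdg sdh sdi sdj sdk sdl before each benchmark pass would capture the settings for the 10-disk array above.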
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Greg Cormier @ 2008-01-18 15:35 UTC
To: Justin Piszcz; +Cc: linux-raid
Justin, thanks for the script. Here are my results. I ran it a few times
with different tests, hence the small number of results you see here;
I slowly trimmed out the obviously non-ideal sizes.
System
---
Athlon64 3500
2GB RAM
4x500GB WD RAID Edition drives in RAID 5. sde is the older 4-platter version
(5000YS); the others are the 3-platter version. Faster :-)
/dev/sdb:
Timing buffered disk reads: 240 MB in 3.00 seconds = 79.91 MB/sec
/dev/sdc:
Timing buffered disk reads: 248 MB in 3.01 seconds = 82.36 MB/sec
/dev/sdd:
Timing buffered disk reads: 248 MB in 3.02 seconds = 82.22 MB/sec
/dev/sde: (older model, 4 platters instead of 3)
Timing buffered disk reads: 210 MB in 3.01 seconds = 69.87 MB/sec
/dev/md3:
Timing buffered disk reads: 628 MB in 3.00 seconds = 209.09 MB/sec
Testing
---
Test was : dd if=/dev/zero of=/r1/bigfile bs=1M count=10240; sync
64-chunka.txt:2:00.63
128-chunka.txt:2:00.20
256-chunka.txt:2:01.67
512-chunka.txt:2:19.90
1024-chunka.txt:2:59.32
Test was : Unraring multipart RARs, 1.2 gigabytes. Source and destination
drive were both the RAID array.
64-chunkc.txt:1:04.20
128-chunkc.txt:0:49.37
256-chunkc.txt:0:48.88
512-chunkc.txt:0:41.20
1024-chunkc.txt:0:40.82
So, it's a toss-up between 256 and 512. If I'm interpreting this
correctly, raw throughput is better with 256, but 512 seems to
work better with real-world stuff? I'll perhaps think up another test
or two, and drop 64 from the possible options to save
time (mke2fs takes a while on 1.5TB).
Next step will be playing with read aheads and stripe cache sizes I
guess! I'm open to any comments/suggestions you guys have!
Greg
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Justin Piszcz @ 2008-01-18 15:39 UTC
To: Greg Cormier; +Cc: linux-raid, xfs, Alan Piszcz
On Fri, 18 Jan 2008, Greg Cormier wrote:
> Justin, thanks for the script. Here's my results. I ran it a few times
> with different tests, hence the small number of results you see here,
> I slowly trimmed out the obvious not-ideal sizes.
Nice, we all love benchmarks!! :)
>
> System
> ---
> Athlon64 3500
> 2GB RAM
> 4x500GB WD Raid editions, raid 5. SDE is the old 4-platter version
> (5000YS), the others are the 3 platter version. Faster :-)
Ok.
>
> /dev/sdb:
> Timing buffered disk reads: 240 MB in 3.00 seconds = 79.91 MB/sec
> /dev/sdc:
> Timing buffered disk reads: 248 MB in 3.01 seconds = 82.36 MB/sec
> /dev/sdd:
> Timing buffered disk reads: 248 MB in 3.02 seconds = 82.22 MB/sec
> /dev/sde: (older model, 4 platters instead of 3)
> Timing buffered disk reads: 210 MB in 3.01 seconds = 69.87 MB/sec
> /dev/md3:
> Timing buffered disk reads: 628 MB in 3.00 seconds = 209.09 MB/sec
>
>
> Testing
> ---
> Test was : dd if=/dev/zero of=/r1/bigfile bs=1M count=10240; sync
> 64-chunka.txt:2:00.63
> 128-chunka.txt:2:00.20
> 256-chunka.txt:2:01.67
> 512-chunka.txt:2:19.90
> 1024-chunka.txt:2:59.32
For your configuration, a 64-256k chunk seems optimal for this hypothetical
benchmark :)
>
>
> Test was : Unraring multipart RAR's, 1.2 gigabytes. Source and dest
> drive were the raid array.
> 64-chunkc.txt:1:04.20
> 128-chunkc.txt:0:49.37
> 256-chunkc.txt:0:48.88
> 512-chunkc.txt:0:41.20
> 1024-chunkc.txt:0:40.82
1 MiB looks like it's the best, which is what I use today; a 1 MiB chunk offers
the best performance by far, at least in all of my testing with big files,
such as the tests you performed.
>
>
>
> So, there's a toss up between 256 and 512.
Yeah, for dd performance, not real life.
> If I'm interpreting
> correctly here, raw throughput is better with 256, but 512 seems to
> work better with real-world stuff?
Look above, 1 MiB got you the fastest unrar time.
> I'll try to think up another test
> or two perhaps, and removing 64 as one of the possible options to save
> time (mke2fs takes a while on 1.5TB)
Also, don't use ext*; XFS can be up to 2-3x faster (in many of the
benchmarks).
>
> Next step will be playing with read aheads and stripe cache sizes I
> guess! I'm open to any comments/suggestions you guys have!
>
> Greg
>
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Greg Cormier @ 2008-01-18 15:43 UTC
To: Justin Piszcz; +Cc: linux-raid, xfs, Alan Piszcz
> Also, don't use ext*, XFS can be up to 2-3x faster (in many of the
> benchmarks).
I'm going to swap file systems and give it a shot right now! :)
How is the stability of XFS? I've heard recovery is easier with ext2/3 because
more people use it, more tools are available, etc.?
Greg
* Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
From: Justin Piszcz @ 2008-01-18 15:58 UTC
To: Greg Cormier; +Cc: linux-raid, xfs, Alan Piszcz
On Fri, 18 Jan 2008, Greg Cormier wrote:
>> Also, don't use ext*, XFS can be up to 2-3x faster (in many of the
>> benchmarks).
>
> I'm going to swap file systems and give it a shot right now! :)
>
> How is stability of XFS? I heard recovery is easier with ext2/3 due to
> more people using it, more tools available, etc?
>
> Greg
>
Recovery is actually easier with XFS because the log recovery code is
built into the kernel (the journal is replayed at mount time, so you don't
need a utility for that); however, there is xfs_repair if the in-kernel
part cannot fix it.
I have been using it for 4-5+ years now.
Also, many Coraid (ATA over Ethernet) arrays are above 8TB, and ext3
only works up to 8TB, so it's not even an option any longer.
Justin.
Thread overview: 16+ messages
2008-01-16 16:13 Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again Justin Piszcz
2008-01-16 16:55 ` Justin Piszcz
2008-01-16 17:27 ` Al Boldi
2008-01-16 18:02 ` Justin Piszcz
2008-01-16 21:19 ` Al Boldi
2008-01-16 22:53 ` Justin Piszcz
2008-01-16 22:55 ` Justin Piszcz
2008-01-18 15:24 ` Bill Davidsen
2008-01-18 15:28 ` Justin Piszcz
2008-01-16 18:09 ` Greg Cormier
2008-01-16 18:19 ` Justin Piszcz
2008-02-19 21:48 ` Peter Grandi
2008-01-18 15:35 ` Greg Cormier
2008-01-18 15:39 ` Justin Piszcz
2008-01-18 15:43 ` Greg Cormier
2008-01-18 15:58 ` Justin Piszcz