Linux RAID subsystem development
* Fastest Chunk Size w/XFS For MD Software RAID = 1024k
@ 2007-06-27 23:20 Justin Piszcz
  2007-06-27 23:20 ` Justin Piszcz
  2007-06-28  3:43 ` Peter Rabbitson
  0 siblings, 2 replies; 20+ messages in thread
From: Justin Piszcz @ 2007-06-27 23:20 UTC (permalink / raw)
  To: linux-raid, xfs; +Cc: Alan Piszcz

The results speak for themselves:

http://home.comcast.net/~jpiszcz/chunk/index.html



* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-27 23:20 Fastest Chunk Size w/XFS For MD Software RAID = 1024k Justin Piszcz
@ 2007-06-27 23:20 ` Justin Piszcz
  2007-06-27 23:24   ` Justin Piszcz
  2007-06-28  5:08   ` David Chinner
  2007-06-28  3:43 ` Peter Rabbitson
  1 sibling, 2 replies; 20+ messages in thread
From: Justin Piszcz @ 2007-06-27 23:20 UTC (permalink / raw)
  To: linux-raid, xfs; +Cc: Alan Piszcz

For drives with 16MB of cache (in this case, raptors).

Justin.

On Wed, 27 Jun 2007, Justin Piszcz wrote:

> The results speak for themselves:
>
> http://home.comcast.net/~jpiszcz/chunk/index.html
>

* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-27 23:20 ` Justin Piszcz
@ 2007-06-27 23:24   ` Justin Piszcz
  2007-06-28  5:08   ` David Chinner
  1 sibling, 0 replies; 20+ messages in thread
From: Justin Piszcz @ 2007-06-27 23:24 UTC (permalink / raw)
  To: linux-raid, xfs; +Cc: Alan Piszcz

For the e-mail archives:

p34-128k-chunk,15696M,77236.3,99,445653,86.3333,192267,34.3333,78773.7,99,524463,41,594.9,0,16:100000:16/64,1298.67,10.6667,5964.33,17.3333,3035.67,18.3333,1512,13.6667,5334.33,16,2634.67,19
p34-512k-chunk,15696M,78383,99,436842,86,162969,27,79624,99,486892,38,583.0,0,16:100000:16/64,2019,17,9715,29,4272,23,2250,22,17095,45,3691,30
p34-1024k-chunk,15696M,77672.3,99,455267,87.3333,183772,29.6667,79601.3,99,578225,43.3333,595.933,0,16:100000:16/64,2085.67,18,12953,39,3908.33,23.3333,2375.33,23.3333,18492,51.6667,3388.33,27
p34-4096k-chunk,15696M,33791.1,43.5556,176630,37.3333,72235.1,11.5556,34424.9,44,247925,18.2222,271.644,0,16:100000:16/64,560,4.88889,2928,8.88889,1039.56,5.77778,571.556,5.33333,1729.78,5.33333,1289.33,9.33333
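
The CSV lines above are hard to compare by eye. A minimal sketch for pulling
out the two block throughput columns follows. The field positions are an
assumption based on the usual bonnie++ CSV layout (field 5 = block write,
field 11 = block read); field 11 does match the 578MB/s read figure discussed
later in the thread. The function name summarize_bonnie is made up for
illustration.

```shell
#!/bin/sh
# Summarize bonnie++ CSV lines as "name write=... read=...".
# Assumed field layout: $1 = test name, $5 = block write K/sec,
# $11 = block read K/sec. Dividing by 1000 follows the thread's rounding
# (578225 is read as 578MB/s).
summarize_bonnie() {
    awk -F, '{ printf "%s write=%.0fMB/s read=%.0fMB/s\n", $1, $5 / 1000, $11 / 1000 }'
}

# Only the first twelve fields of each result line are needed here.
summarize_bonnie <<'EOF'
p34-128k-chunk,15696M,77236.3,99,445653,86.3333,192267,34.3333,78773.7,99,524463,41
p34-512k-chunk,15696M,78383,99,436842,86,162969,27,79624,99,486892,38
p34-1024k-chunk,15696M,77672.3,99,455267,87.3333,183772,29.6667,79601.3,99,578225,43.3333
p34-4096k-chunk,15696M,33791.1,43.5556,176630,37.3333,72235.1,11.5556,34424.9,44,247925,18.2222
EOF
```

Each output line carries the run name plus the two rounded rates, which makes
the drop at the 4096k chunk size easy to spot.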


On Wed, 27 Jun 2007, Justin Piszcz wrote:

> For drives with 16MB of cache (in this case, raptors).
>
> Justin.
>
> On Wed, 27 Jun 2007, Justin Piszcz wrote:
>
>> The results speak for themselves:
>> 
>> http://home.comcast.net/~jpiszcz/chunk/index.html
>> 

* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-27 23:20 Fastest Chunk Size w/XFS For MD Software RAID = 1024k Justin Piszcz
  2007-06-27 23:20 ` Justin Piszcz
@ 2007-06-28  3:43 ` Peter Rabbitson
  2007-06-28  8:07   ` Justin Piszcz
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Rabbitson @ 2007-06-28  3:43 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid, xfs, Alan Piszcz

Justin Piszcz wrote:
> The results speak for themselves:
> 
> http://home.comcast.net/~jpiszcz/chunk/index.html
> 


What is the array layout (-l ? -n ? -p ?)


* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-27 23:20 ` Justin Piszcz
  2007-06-27 23:24   ` Justin Piszcz
@ 2007-06-28  5:08   ` David Chinner
  2007-06-28  7:53     ` David Greaves
  2007-06-28  8:07     ` Justin Piszcz
  1 sibling, 2 replies; 20+ messages in thread
From: David Chinner @ 2007-06-28  5:08 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid, xfs, Alan Piszcz

On Wed, Jun 27, 2007 at 07:20:42PM -0400, Justin Piszcz wrote:
> For drives with 16MB of cache (in this case, raptors).

That's four (4) drives, right?

If so, how do you get a block read rate of 578MB/s from
4 drives? That's 145MB/s per drive....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  5:08   ` David Chinner
@ 2007-06-28  7:53     ` David Greaves
  2007-06-28  8:07     ` Justin Piszcz
  1 sibling, 0 replies; 20+ messages in thread
From: David Greaves @ 2007-06-28  7:53 UTC (permalink / raw)
  To: David Chinner; +Cc: Justin Piszcz, linux-raid, xfs, Alan Piszcz

David Chinner wrote:
> On Wed, Jun 27, 2007 at 07:20:42PM -0400, Justin Piszcz wrote:
>> For drives with 16MB of cache (in this case, raptors).
> 
> That's four (4) drives, right?

I'm pretty sure he's using 10 - email a few days back...
>>>>>> Justin Piszcz wrote:
>>>>> Running test with 10 RAPTOR 150 hard drives, expect it to take 
>>>>> awhile until I get the results, avg them etc. :)

> If so, how do you get a block read rate of 578MB/s from
> 4 drives? That's 145MB/s per drive....

Which gives a far more reasonable 60MB/s per drive...
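
The per-drive figures quoted above can be checked with a small shell sketch.
It simply divides the 578225 K/sec block read result by the drive count; the
function name per_drive_mbs is hypothetical, and dividing by 1000 follows the
thread's own rounding.

```shell
#!/bin/sh
# per_drive_mbs TOTAL_KBS DRIVES: average throughput per drive, rounded MB/s.
per_drive_mbs() {
    awk -v kbs="$1" -v n="$2" 'BEGIN { printf "%.0f\n", kbs / 1000 / n }'
}

per_drive_mbs 578225 4    # the four-drive guess
per_drive_mbs 578225 10   # the ten drives actually used
```

This is only an average over all member drives; it says nothing about how
evenly the load was spread.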

David



* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  3:43 ` Peter Rabbitson
@ 2007-06-28  8:07   ` Justin Piszcz
  2007-06-28  8:24     ` Peter Rabbitson
  0 siblings, 1 reply; 20+ messages in thread
From: Justin Piszcz @ 2007-06-28  8:07 UTC (permalink / raw)
  To: Peter Rabbitson; +Cc: linux-raid, xfs, Alan Piszcz

mdadm --create \
       --verbose /dev/md3 \
       --level=5 \
       --raid-devices=10 \
       --chunk=1024 \
       --force \
       --run \
       /dev/sd[cdefghijkl]1

Justin.


On Thu, 28 Jun 2007, Peter Rabbitson wrote:

> Justin Piszcz wrote:
>> The results speak for themselves:
>> 
>> http://home.comcast.net/~jpiszcz/chunk/index.html
>> 
>
>
> What is the array layout (-l ? -n ? -p ?)

* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  5:08   ` David Chinner
  2007-06-28  7:53     ` David Greaves
@ 2007-06-28  8:07     ` Justin Piszcz
  1 sibling, 0 replies; 20+ messages in thread
From: Justin Piszcz @ 2007-06-28  8:07 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-raid, xfs, Alan Piszcz

10 disks total.

Justin.

On Thu, 28 Jun 2007, David Chinner wrote:

> On Wed, Jun 27, 2007 at 07:20:42PM -0400, Justin Piszcz wrote:
>> For drives with 16MB of cache (in this case, raptors).
>
> That's four (4) drives, right?
>
> If so, how do you get a block read rate of 578MB/s from
> 4 drives? That's 145MB/s per drive....
>
> Cheers,
>
> Dave.
> -- 
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
>


* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  8:07   ` Justin Piszcz
@ 2007-06-28  8:24     ` Peter Rabbitson
  2007-06-28  8:27       ` Justin Piszcz
  2007-06-28  9:05       ` Matti Aarnio
  0 siblings, 2 replies; 20+ messages in thread
From: Peter Rabbitson @ 2007-06-28  8:24 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid, xfs, Alan Piszcz

Justin Piszcz wrote:
> mdadm --create \
>       --verbose /dev/md3 \
>       --level=5 \
>       --raid-devices=10 \
>       --chunk=1024 \
>       --force \
>       --run \
>       /dev/sd[cdefghijkl]1
> 
> Justin.

Interesting, I came up with the same results (1M chunk being superior) 
with a completely different raid set with XFS on top:

mdadm	--create \
	--level=10 \
	--chunk=1024 \
	--raid-devices=4 \
	--layout=f3 \
	...

Could it be attributed to XFS itself?

Peter



* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  8:24     ` Peter Rabbitson
@ 2007-06-28  8:27       ` Justin Piszcz
  2007-06-28  8:36         ` Peter Rabbitson
  2007-06-28 22:05         ` David Chinner
  2007-06-28  9:05       ` Matti Aarnio
  1 sibling, 2 replies; 20+ messages in thread
From: Justin Piszcz @ 2007-06-28  8:27 UTC (permalink / raw)
  To: Peter Rabbitson; +Cc: linux-raid, xfs, Alan Piszcz



On Thu, 28 Jun 2007, Peter Rabbitson wrote:

> Justin Piszcz wrote:
>> mdadm --create \
>>       --verbose /dev/md3 \
>>       --level=5 \
>>       --raid-devices=10 \
>>       --chunk=1024 \
>>       --force \
>>       --run \
>>       /dev/sd[cdefghijkl]1
>> 
>> Justin.
>
> Interesting, I came up with the same results (1M chunk being superior) with a 
> completely different raid set with XFS on top:
>
> mdadm	--create \
> 	--level=10 \
> 	--chunk=1024 \
> 	--raid-devices=4 \
> 	--layout=f3 \
> 	...
>
> Could it be attributed to XFS itself?
>
> Peter
>

Good question. By the way, how much cache do the drives you are testing 
with have?

Justin.


* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  8:27       ` Justin Piszcz
@ 2007-06-28  8:36         ` Peter Rabbitson
  2007-06-28  8:38           ` Justin Piszcz
  2007-06-28  8:42           ` Justin Piszcz
  2007-06-28 22:05         ` David Chinner
  1 sibling, 2 replies; 20+ messages in thread
From: Peter Rabbitson @ 2007-06-28  8:36 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid

Justin Piszcz wrote:
> 
> On Thu, 28 Jun 2007, Peter Rabbitson wrote:
> 
>> Interesting, I came up with the same results (1M chunk being superior) 
>> with a completely different raid set with XFS on top:
>>
>> ...
>>
>> Could it be attributed to XFS itself?
>>
>> Peter
>>
> 
> Good question, by the way how much cache do the drives have that you are 
> testing with?
> 

I believe 8MB, but I am not sure I am looking at the right number:

root@Arzamas:~# hdparm -i /dev/sda

/dev/sda:

  Model=aMtxro7 2Y050M                          , FwRev=AY5RH10W, SerialNo=6YB6Z7E4
  Config={ Fixed }
  RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
  BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=?0?
  CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
  IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
  PIO modes:  pio0 pio1 pio2 pio3 pio4
  DMA modes:  mdma0 mdma1 mdma2
  UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
  AdvancedPM=yes: disabled (255) WriteCache=enabled
  Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0:  ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

  * signifies the current active mode

root@Arzamas:~#

1M chunk consistently delivered best performance with:

o A plain dumb dd run
o bonnie
o two bonnie threads
o iozone with 4 threads

My readahead (RA) is set to 256 sectors for the drives and 16384 sectors
for the array (128k and 8M respectively)
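
The readahead values and their sizes in bytes are related by the 512-byte
sector unit that blockdev uses. A small sketch of the conversion follows; the
function name ra_sectors_to_kib and the device names in the commented-out
commands are placeholders, not taken from this setup.

```shell
#!/bin/sh
# ra_sectors_to_kib SECTORS: convert a readahead value given in 512-byte
# sectors to KiB.
ra_sectors_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

ra_sectors_to_kib 256     # per-drive value quoted above
ra_sectors_to_kib 16384   # array value quoted above

# Reading or changing the setting needs root and real devices, so these
# lines are left commented out (device names are placeholders):
# blockdev --getra /dev/sdX
# blockdev --setra 16384 /dev/mdX
```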


* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  8:36         ` Peter Rabbitson
@ 2007-06-28  8:38           ` Justin Piszcz
  2007-07-03  4:23             ` Mr. James W. Laferriere
  2007-06-28  8:42           ` Justin Piszcz
  1 sibling, 1 reply; 20+ messages in thread
From: Justin Piszcz @ 2007-06-28  8:38 UTC (permalink / raw)
  To: Peter Rabbitson; +Cc: linux-raid



On Thu, 28 Jun 2007, Peter Rabbitson wrote:

> Justin Piszcz wrote:
>> 
>> On Thu, 28 Jun 2007, Peter Rabbitson wrote:
>> 
>>> Interesting, I came up with the same results (1M chunk being superior) 
>>> with a completely different raid set with XFS on top:
>>> 
>>> ...
>>> 
>>> Could it be attributed to XFS itself?
>>> 
>>> Peter
>>> 
>> 
>> Good question, by the way how much cache do the drives have that you are 
>> testing with?
>> 
>
> I believe 8MB, but I am not sure I am looking at the right number:
>
> 1M chunk consistently delivered best performance with:
>
> o A plain dumb dd run
> o bonnie
> o two bonnie threads
> o iozone with 4 threads
>
> My RA is set at 256 for the drives and 16384 for the array (128k and 8M 
> respectively)
>

8MB yup: BuffSize=7936kB.

My read ahead is set to 64 megabytes and 16384 for the stripe_cache_size.
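
For a sense of scale, the md documentation gives the memory used by the
RAID5 stripe cache as page size * number of disks * stripe_cache_size. A
rough sketch of that arithmetic follows, assuming 4096-byte pages and the
10-disk array from this thread; the function name stripe_cache_mib is
hypothetical.

```shell
#!/bin/sh
# stripe_cache_mib PAGE_BYTES NR_DISKS STRIPE_CACHE_SIZE: memory used by the
# md stripe cache, in MiB.
stripe_cache_mib() {
    echo $(( $1 * $2 * $3 / 1024 / 1024 ))
}

stripe_cache_mib 4096 10 16384   # 4096-byte pages are an assumption

# Applying the settings needs root; 64 megabytes of readahead is 131072
# sectors of 512 bytes:
# echo 16384 > /sys/block/md3/md/stripe_cache_size
# blockdev --setra 131072 /dev/md3
```

A cache this large is a deliberate trade of RAM for write throughput, so the
value is worth revisiting on machines with less memory.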

Justin.


* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  8:36         ` Peter Rabbitson
  2007-06-28  8:38           ` Justin Piszcz
@ 2007-06-28  8:42           ` Justin Piszcz
  2007-06-28  8:46             ` Justin Piszcz
  1 sibling, 1 reply; 20+ messages in thread
From: Justin Piszcz @ 2007-06-28  8:42 UTC (permalink / raw)
  To: Peter Rabbitson; +Cc: linux-raid



On Thu, 28 Jun 2007, Peter Rabbitson wrote:

> Justin Piszcz wrote:
>> 
>> On Thu, 28 Jun 2007, Peter Rabbitson wrote:
>> 
>>> Interesting, I came up with the same results (1M chunk being superior) 
>>> with a completely different raid set with XFS on top:
>>> 
>>> ...
>>> 
>>> Could it be attributed to XFS itself?
>>> 
>>> Peter
>>> 
>> 
>> Good question, by the way how much cache do the drives have that you are 
>> testing with?
>> 
>
> I believe 8MB, but I am not sure I am looking at the right number:
>
> 1M chunk consistently delivered best performance with:
>
> o A plain dumb dd run
> o bonnie
> o two bonnie threads
> o iozone with 4 threads
>
> My RA is set at 256 for the drives and 16384 for the array (128k and 8M 
> respectively)

Have you also tried tuning:

1. nr_requests for each disk? I noticed the bonnie tests finishing 10-20 
seconds faster (overall) when I set nr_requests to 512 on all disks in the array.
   echo 512 > /sys/block/"$i"/queue/nr_requests

2. Also disable NCQ.
   echo 1 > /sys/block/"$i"/device/queue_depth
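
The two echo lines above rely on a loop variable $i that is not shown. A
minimal sketch of such a loop follows. The function name tune_disks and its
sysfs-root argument are introduced here for illustration (the argument only
makes the function testable without touching real devices); the disk names in
the commented example are those from the mdadm command earlier in the thread.

```shell
#!/bin/sh
# tune_disks SYSFS_ROOT DISK...: write both settings for each named disk.
tune_disks() {
    root=$1
    shift
    for i in "$@"; do
        # Allow up to 512 queued requests per disk.
        echo 512 > "$root/block/$i/queue/nr_requests"
        # A queue depth of 1 effectively disables NCQ.
        echo 1 > "$root/block/$i/device/queue_depth"
    done
}

# Example (needs root and real devices, so it is left commented out):
# tune_disks /sys sdc sdd sde sdf sdg sdh sdi sdj sdk sdl
```

These values do not survive a reboot, so a loop like this would normally live
in a boot script.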



* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  8:42           ` Justin Piszcz
@ 2007-06-28  8:46             ` Justin Piszcz
  0 siblings, 0 replies; 20+ messages in thread
From: Justin Piszcz @ 2007-06-28  8:46 UTC (permalink / raw)
  To: Peter Rabbitson; +Cc: linux-raid



On Thu, 28 Jun 2007, Justin Piszcz wrote:

>
>
> On Thu, 28 Jun 2007, Peter Rabbitson wrote:
>
>> Justin Piszcz wrote:
>>> 
>>> On Thu, 28 Jun 2007, Peter Rabbitson wrote:
>>> 
>>>> Interesting, I came up with the same results (1M chunk being superior) 
>>>> with a completely different raid set with XFS on top:
>>>> 
>>>> ...
>>>> 
>>>> Could it be attributed to XFS itself?
>>>> 
>>>> Peter
>>>> 
>>> 
>>> Good question, by the way how much cache do the drives have that you are 
>>> testing with?
>>> 
>> 
>> I believe 8MB, but I am not sure I am looking at the right number:
>> 
>> 1M chunk consistently delivered best performance with:
>> 
>> o A plain dumb dd run
>> o bonnie
>> o two bonnie threads
>> o iozone with 4 threads
>> 
>> My RA is set at 256 for the drives and 16384 for the array (128k and 8M 
>> respectively)
>
> Have you also tried tuning:
>
> 1. nr_requests for each disk? I noticed the bonnie tests finishing 10-20 
> seconds faster (overall) when I set nr_requests to 512 on all disks in the array.
>  echo 512 > /sys/block/"$i"/queue/nr_requests
>
> 2. Also disable NCQ.
>  echo 1 > /sys/block/"$i"/device/queue_depth
>

Also, for the XFS mount options:

noatime,logbufs=8

I am testing various options. So far the logbufs=8 option is detrimental, 
making the entire bonnie++ run a little slower.  I believe the default is 
2 buffers of 32k(?) each if the blocksize is less than 16K (see the man 
page excerpt below). I am currently trying noatime,logbufs=8,logbsize=262144.

        logbufs=value
               Set the number of in-memory log buffers. Valid numbers range
               from 2-8 inclusive. The default value is 8 buffers for
               filesystems with a blocksize of 64K, 4 buffers for filesystems
               with a blocksize of 32K, 3 buffers for filesystems with a
               blocksize of 16K, and 2 buffers for all other configurations.
               Increasing the number of buffers may increase performance on
               some workloads at the cost of the memory used for the
               additional log buffers and their associated control structures.
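
The memory cost mentioned in the excerpt is roughly the number of log buffers
times the buffer size. A small sketch of that arithmetic for the options
discussed above follows; the function name log_buffer_kib is hypothetical, the
32k default buffer size is the unconfirmed figure from the message, and the
mount point in the commented command is a placeholder.

```shell
#!/bin/sh
# log_buffer_kib LOGBUFS LOGBSIZE_BYTES: total in-memory log buffer space, KiB.
log_buffer_kib() {
    echo $(( $1 * $2 / 1024 ))
}

log_buffer_kib 8 262144   # logbufs=8,logbsize=262144
log_buffer_kib 2 32768    # the believed default: 2 buffers of 32k

# Mounting with the options under test (needs root; /mnt/test is a placeholder):
# mount -o noatime,logbufs=8,logbsize=262144 /dev/md3 /mnt/test
```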



* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  8:24     ` Peter Rabbitson
  2007-06-28  8:27       ` Justin Piszcz
@ 2007-06-28  9:05       ` Matti Aarnio
  2007-06-28 11:19         ` Justin Piszcz
  2007-06-28 13:27         ` Jon Nelson
  1 sibling, 2 replies; 20+ messages in thread
From: Matti Aarnio @ 2007-06-28  9:05 UTC (permalink / raw)
  To: Peter Rabbitson; +Cc: Justin Piszcz, linux-raid, xfs, Alan Piszcz

On Thu, Jun 28, 2007 at 10:24:54AM +0200, Peter Rabbitson wrote:
> Interesting, I came up with the same results (1M chunk being superior) 
> with a completely different raid set with XFS on top:
> 
> mdadm	--create \
> 	--level=10 \
> 	--chunk=1024 \
> 	--raid-devices=4 \
> 	--layout=f3 \
> 	...
> 
> Could it be attributed to XFS itself?

Sort of..

 /dev/md4:
         Version : 00.90.03
      Raid Level : raid5
    Raid Devices : 4
   Total Devices : 4
 Preferred Minor : 4
 
  Active Devices : 4
 Working Devices : 4

          Layout : left-symmetric
      Chunk Size : 256K

This means there are 3x 256k for the user data..
Now I had to carefully tune the XFS  bsize/sunit/swidth  to match that:

 meta-data=/dev/DataDisk/lvol0    isize=256    agcount=32, agsize=7325824 blks
          =                       sectsz=512   attr=1
 data     =                       bsize=4096   blocks=234426368, imaxpct=25
          =                       sunit=64     swidth=192 blks, unwritten=1
 ...

That is, 4k * 64 = 256k,   and   64 * 3 = 192
With that, bulk writing on the file system runs without the need to
read back blocks from disk to calculate RAID5 parity data, which is what
happens when the filesystem's idea of a block does not align with the
RAID5 surface.

I do have LVM in between the MD-RAID5 and XFS, so I did also align
the LVM to that  3 * 256k.

Doing this alignment thing did boost write performance by nearly
a factor of 2 from mkfs.xfs with default parameters.
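
The sunit/swidth arithmetic above can be sketched as a small shell function.
It reports the values in filesystem blocks, as in the xfs_info output shown,
together with the equivalent su/sw form accepted by mkfs.xfs -d. The function
name xfs_geometry is made up for illustration, and the second call assumes a
4096-byte block size for the 10-disk RAID5 from the start of the thread.

```shell
#!/bin/sh
# xfs_geometry CHUNK_KIB NR_DISKS PARITY_DISKS BLOCK_BYTES
# Prints su/sw (for mkfs.xfs -d) and sunit/swidth in filesystem blocks.
xfs_geometry() {
    data_disks=$(( $2 - $3 ))          # disks carrying data in each stripe
    sunit=$(( $1 * 1024 / $4 ))        # one chunk, in filesystem blocks
    swidth=$(( sunit * data_disks ))   # one full stripe of data, in blocks
    echo "su=${1}k sw=${data_disks} sunit=${sunit} swidth=${swidth}"
}

xfs_geometry 256 4 1 4096     # the 4-disk RAID5 with 256K chunks above
xfs_geometry 1024 10 1 4096   # 10-disk RAID5, 1024k chunks (block size assumed)

# The su/sw pair would then be passed at mkfs time, for example:
# mkfs.xfs -d su=256k,sw=3 /dev/DataDisk/lvol0
```

The first call reproduces the sunit=64, swidth=192 values from the xfs_info
output above.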


With very wide RAID5, like the original question...  I would find it
very surprising if the alignment of upper layers to MD-RAID level
would not be important there as well.

Very small continuous writes do not make good use of the disk mechanism
(seek time, rotation delay), so something on the order of 128k-1024k will
speed things up -- presuming that when you are writing, you are doing
it many MB at a time.  Database transactions are a lot smaller, and
are indeed harmed by such large megachunk-IO oriented surfaces.

RAID levels 0 and 1 (and 10) do not need to read back parts of the
surface when an incoming write alters only a subset of it.

Some DB application on top of the filesystem would benefit if we had
a way for it to ask about these alignment boundary issues, so it could
read whole alignment block even though it writes out only a subset of it.
(Theory being that those same blocks would also exist in memory cache
and thus be available for write-back parity calculation.)


> Peter

/Matti Aarnio


* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  9:05       ` Matti Aarnio
@ 2007-06-28 11:19         ` Justin Piszcz
  2007-06-28 13:27         ` Jon Nelson
  1 sibling, 0 replies; 20+ messages in thread
From: Justin Piszcz @ 2007-06-28 11:19 UTC (permalink / raw)
  To: Matti Aarnio; +Cc: Peter Rabbitson, linux-raid, xfs, Alan Piszcz



On Thu, 28 Jun 2007, Matti Aarnio wrote:

> On Thu, Jun 28, 2007 at 10:24:54AM +0200, Peter Rabbitson wrote:
>> Interesting, I came up with the same results (1M chunk being superior)
>> with a completely different raid set with XFS on top:
>>
>> mdadm	--create \
>> 	--level=10 \
>> 	--chunk=1024 \
>> 	--raid-devices=4 \
>> 	--layout=f3 \
>> 	...
>>
>> Could it be attributed to XFS itself?

If anyone is interested, I also tested a 2048k chunk; 1024k definitely
results in the best overall configuration.

p34-128k-chunk,15696M,77236.3,99,445653,86.3333,192267,34.3333,78773.7,99,524463,41,594.9,0,16:100000:16/64,1298.67,10.6667,5964.33,17.3333,3035.67,18.3333,1512,13.6667,5334.33,16,2634.67,19
p34-512k-chunk,15696M,78383,99,436842,86,162969,27,79624,99,486892,38,583.0,0,16:100000:16/64,2019,17,9715,29,4272,23,2250,22,17095,45,3691,30
p34-1024k-chunk,15696M,77672.3,99,455267,87.3333,183772,29.6667,79601.3,99,578225,43.3333,595.933,0,16:100000:16/64,2085.67,18,12953,39,3908.33,23.3333,2375.33,23.3333,18492,51.6667,3388.33,27
p34-2048k-chunk,15696M,76822,98,435439,86,164140,26.3333,77065.3,99,582948,44,631.467,0,16:100000:16/64,1795.33,15,17612.3,49.3333,3668.67,20.6667,2040.67,19,13384,38,3255.33,25
p34-4096k-chunk,15696M,33791.1,43.5556,176630,37.3333,72235.1,11.5556,34424.9,44,247925,18.2222,271.644,0,16:100000:16/64,560,4.88889,2928,8.88889,1039.56,5.77778,571.556,5.33333,1729.78,5.33333,1289.33,9.33333

http://home.comcast.net/~jpiszcz/chunk/

Justin.



* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  9:05       ` Matti Aarnio
  2007-06-28 11:19         ` Justin Piszcz
@ 2007-06-28 13:27         ` Jon Nelson
  1 sibling, 0 replies; 20+ messages in thread
From: Jon Nelson @ 2007-06-28 13:27 UTC (permalink / raw)
  To: Matti Aarnio; +Cc: Peter Rabbitson, Justin Piszcz, linux-raid, xfs, Alan Piszcz

On Thu, 28 Jun 2007, Matti Aarnio wrote:

> I do have LVM in between the MD-RAID5 and XFS, so I did also align
> the LVM to that  3 * 256k.

How did you align the LVM ?


--
Jon Nelson <jnelson-linux-raid@jamponi.net>


* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  8:27       ` Justin Piszcz
  2007-06-28  8:36         ` Peter Rabbitson
@ 2007-06-28 22:05         ` David Chinner
  1 sibling, 0 replies; 20+ messages in thread
From: David Chinner @ 2007-06-28 22:05 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Peter Rabbitson, linux-raid, xfs, Alan Piszcz

On Thu, Jun 28, 2007 at 04:27:15AM -0400, Justin Piszcz wrote:
> 
> 
> On Thu, 28 Jun 2007, Peter Rabbitson wrote:
> 
> >Justin Piszcz wrote:
> >>mdadm --create \
> >>      --verbose /dev/md3 \
> >>      --level=5 \
> >>      --raid-devices=10 \
> >>      --chunk=1024 \
> >>      --force \
> >>      --run \
> >>      /dev/sd[cdefghijkl]1
> >>
> >>Justin.
> >
> >Interesting, I came up with the same results (1M chunk being superior) 
> >with a completely different raid set with XFS on top:
> >
> >mdadm	--create \
> >	--level=10 \
> >	--chunk=1024 \
> >	--raid-devices=4 \
> >	--layout=f3 \
> >	...
> >
> >Could it be attributed to XFS itself?

More likely it's related to the I/O size being sent to the disks. The larger
the chunk size, the larger the I/O hitting each disk. I think the maximum I/O
size is 512k ATM on x86(_64), so a chunk of 1MB will guarantee that there are
maximally sized I/Os being sent to the disk....
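
A simplified way to picture this: the largest single I/O a chunk can produce
on one disk is bounded by both the chunk size and the maximum I/O size. The
sketch below ignores request merging and alignment, so it is only a rough
model; the function name per_disk_io_kib is hypothetical and the 512k limit
is the figure suggested above, not a measured value.

```shell
#!/bin/sh
# per_disk_io_kib CHUNK_KIB MAX_IO_KIB: largest single I/O one chunk can
# produce on a disk, ignoring request merging (a simplification).
per_disk_io_kib() {
    if [ "$1" -lt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

for chunk in 128 512 1024; do
    echo "chunk=${chunk}k per-disk I/O <= $(per_disk_io_kib "$chunk" 512)k"
done

# The actual limit on a given system can be read per device, for example:
# cat /sys/block/sdc/queue/max_sectors_kb
```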

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-06-28  8:38           ` Justin Piszcz
@ 2007-07-03  4:23             ` Mr. James W. Laferriere
  2007-07-03  8:42               ` Justin Piszcz
  0 siblings, 1 reply; 20+ messages in thread
From: Mr. James W. Laferriere @ 2007-07-03  4:23 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid maillist

 	Hello Justin (& all) ,

On Thu, 28 Jun 2007, Justin Piszcz wrote:
> On Thu, 28 Jun 2007, Peter Rabbitson wrote:
>
>> Justin Piszcz wrote:
>>> 
>>> On Thu, 28 Jun 2007, Peter Rabbitson wrote:
>>> 
>>>> Interesting, I came up with the same results (1M chunk being superior) 
>>>> with a completely different raid set with XFS on top:
>>>> 
>>>> ...
>>>> 
>>>> Could it be attributed to XFS itself?
>>>> 
>>>> Peter
>>>> 
>>> 
>>> Good question, by the way how much cache do the drives have that you are 
>>> testing with?
>>> 
>> 
>> I believe 8MB, but I am not sure I am looking at the right number:
>> 
>> root@Arzamas:~# hdparm -i /dev/sda
>> 
>> /dev/sda:
>> 
>> Model=aMtxro7 2Y050M                          , FwRev=AY5RH10W, 
>> SerialNo=6YB6Z7E4
>> Config={ Fixed }
>> RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
>> BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=?0?
>> CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
>> IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
>> PIO modes:  pio0 pio1 pio2 pio3 pio4
>> DMA modes:  mdma0 mdma1 mdma2
>> UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
>> AdvancedPM=yes: disabled (255) WriteCache=enabled
>> Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0:  ATA/ATAPI-1 
>> ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7
>> 
>> * signifies the current active mode
>> 
>> root@Arzamas:~#
>> 
>> 1M chunk consistently delivered best performance with:
>> 
>> o A plain dumb dd run
>> o bonnie
>> o two bonnie threads
>> o iozone with 4 threads
>> 
>> My RA is set at 256 for the drives and 16384 for the array (128k and 8M 
>> respectively)
>> 
>
> 8MB yup: BuffSize=7936kB.
>
> My read ahead is set to 64 megabytes and 16384 for the stripe_cache_size.

 	Might you know of a tool for acquiring these (*) parameters for a scsi 
drive ?  hdparm really doesn't like real scsi drives so that doesn't seem to work for 
me .

(*)	"BuffType=DualPortCache, BuffSize=7936kB,"  Stolen from above .


 		Tia ,  JimL
-- 
+-----------------------------------------------------------------+
| James   W.   Laferriere | System   Techniques | Give me VMS     |
| Network        Engineer | 663  Beaumont  Blvd |  Give me Linux  |
| babydr@baby-dragons.com | Pacifica, CA. 94044 |   only  on  AXP |
+-----------------------------------------------------------------+


* Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
  2007-07-03  4:23             ` Mr. James W. Laferriere
@ 2007-07-03  8:42               ` Justin Piszcz
  0 siblings, 0 replies; 20+ messages in thread
From: Justin Piszcz @ 2007-07-03  8:42 UTC (permalink / raw)
  To: Mr. James W. Laferriere; +Cc: linux-raid maillist



On Mon, 2 Jul 2007, Mr. James W. Laferriere wrote:

> 	Hello Justin (& all) ,
>
>> 
>> My read ahead is set to 64 megabytes and 16384 for the stripe_cache_size.
>
> 	Might you know of a tool for acquiring these (*) parameters for a 
> scsi drive ?  hdparm really doesn't like real scsi drives so that doesn't 
> seem to work for me .
>
> (*)	"BuffType=DualPortCache, BuffSize=7936kB,"  Stolen from above .
>
>
> 		Tia ,  JimL

Best bet is to benchmark and test: try 128, 256, 512 and 1024 for the chunk 
size. Also, if your drives are on the PCI bus, it probably won't matter 
much.

Justin.
