linux-raid.vger.kernel.org archive mirror
* RAID-0/5/6 performances
@ 2013-12-05 19:24 Piergiorgio Sartor
  2013-12-05 21:57 ` NeilBrown
  2013-12-06  9:24 ` Stan Hoeppner
  0 siblings, 2 replies; 7+ messages in thread
From: Piergiorgio Sartor @ 2013-12-05 19:24 UTC (permalink / raw)
  To: linux-raid

Hi all,

I've a system with an LSI 2308 SAS controller
and five 2.5" HDDs attached.
Each HDD can do around 100MB/sec read/write.
This was tested with all HDDs in parallel, to
make sure the controller can sustain them.
A single disk has the same performance.
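
Roughly, such a parallel per-disk test can be done like this (just a
sketch; the device names /dev/sdb..sdf and the 4GB size are assumptions):

# read 4GB from each disk simultaneously, bypassing the page cache
for d in /dev/sd{b,c,d,e,f}; do
    dd if="$d" of=/dev/null bs=1M count=4096 iflag=direct &
done
wait
# per-disk throughput watched in another terminal with "iostat -k 5"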

I was testing RAID 0/5/6 performance and I found
something I could not clearly understand.

The test was done with "dd", since I wanted to know the
maximum possible performance.
Specifically, for reading:

dd if=/dev/md127 of=/dev/null bs=4k

For writing:

dd if=/dev/zero of=/dev/md127 bs=4k conv=fdatasync

Note that a larger block size did not change the
results. I guess the page size is already quite optimal.

I tested each RAID with 4 and 5 HDDs, with chunk
size of 512k, 64k and 16k.
The "stripe_cache_size" was set to the max 32768.

The results were observed with "iostat -k 5",
taking care to consider variations and ramp up.

The table below is in MB/sec; the number is the count
of HDDs, "r" is read and "w" is write:

chunk RAID 4r  4w  5r  5w
512k   0   400 400 500 500
512k   5   260 300 360 400
512k   6    55 180 100 290

 64k   0   400 400 440 500
 64k   5   150 300 160 400
 64k   6   100 180 140 290

 16k   0   380 400 350 500
 16k   5   100 300 130 390
 16k   6    80 180 100 290

Now, RAID-0/5 seem to perform as expected,
depending on the number of HDDs, especially
with large chunk sizes.
Write performance is not a problem, even
though parity RAID is CPU intensive.
RAID-0/5 do not react well to small chunks.
RAID-6, on the other hand, seems to have an
idea of its own.
First of all, it does not seem to respect
proportionality. I would think a 4-HDD
RAID-6 should read more or less as fast as
2 HDDs. I can understand some loss, due to
the parity skip, but not so much. In fact it
improves with a smaller chunk.
With 5 HDDs (3 data disks per stripe), I would
expect something better than 100MB/sec.

Any idea on this? Am I doing something wrong?
Some suggestion on tuning something in order
to try to improve RAID-6?

Thanks,

bye,

-- 

piergiorgio


* Re: RAID-0/5/6 performances
  2013-12-05 19:24 RAID-0/5/6 performances Piergiorgio Sartor
@ 2013-12-05 21:57 ` NeilBrown
  2013-12-05 22:29   ` Piergiorgio Sartor
  2013-12-06 22:47   ` Piergiorgio Sartor
  2013-12-06  9:24 ` Stan Hoeppner
  1 sibling, 2 replies; 7+ messages in thread
From: NeilBrown @ 2013-12-05 21:57 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid


On Thu, 5 Dec 2013 20:24:54 +0100 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> Hi all,
> 
> I've a system with an LSI 2308 SAS controller
> and five 2.5" HDDs attached.
> Each HDD can do around 100MB/sec read/write.
> This was tested with all HDDs in parallel, to
> make sure the controller can sustain them.
> A single disk has the same performance.
> 
> I was testing RAID 0/5/6 performance and I found
> something I could not clearly understand.
> 
> The test was done with "dd", since I wanted to know the
> maximum possible performance.
> Specifically, for reading:
> 
> dd if=/dev/md127 of=/dev/null bs=4k
> 
> For writing:
> 
> dd if=/dev/zero of=/dev/md127 bs=4k conv=fdatasync
> 
> Note that a larger block size did not change the
> results. I guess the page size is already quite optimal.
> 
> I tested each RAID with 4 and 5 HDDs, with chunk
> size of 512k, 64k and 16k.
> The "stripe_cache_size" was set to the max 32768.
> 
> The results were observed with "iostat -k 5",
> taking care to consider variations and ramp up.
> 
> The table below is in MB/sec; the number is the count
> of HDDs, "r" is read and "w" is write:
> 
> chunk RAID 4r  4w  5r  5w
> 512k   0   400 400 500 500
> 512k   5   260 300 360 400
> 512k   6    55 180 100 290
> 
>  64k   0   400 400 440 500
>  64k   5   150 300 160 400
>  64k   6   100 180 140 290
> 
>  16k   0   380 400 350 500
>  16k   5   100 300 130 390
>  16k   6    80 180 100 290
> 
> Now, RAID-0/5 seem to perform as expected,
> depending on the number of HDDs, especially
> with large chunk sizes.
> Write performance is not a problem, even
> though parity RAID is CPU intensive.
> RAID-0/5 do not react well to small chunks.
> RAID-6, on the other hand, seems to have an
> idea of its own.
> First of all, it does not seem to respect
> proportionality. I would think a 4-HDD
> RAID-6 should read more or less as fast as
> 2 HDDs. I can understand some loss, due to
> the parity skip, but not so much. In fact it
> improves with a smaller chunk.
> With 5 HDDs (3 data disks per stripe), I would
> expect something better than 100MB/sec.
> 
> Any idea on this? Am I doing something wrong?
> Some suggestion on tuning something in order
> to try to improve RAID-6?
> 
> Thanks,
> 
> bye,
> 

Does look strange.
First thing I would check is the read-ahead size.
md sets it for you but might be messing up somehow.
Have a look at 
   /sys/block/mdX/bdi/read_ahead_kb
for each configuration and see if making it some uniform large number has any
effect.
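
For example, something like this (md127 and the 8192 value are only
placeholders):

cat /sys/block/md127/bdi/read_ahead_kb            # current read-ahead, in KB
echo 8192 > /sys/block/md127/bdi/read_ahead_kb    # try a large fixed value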

NeilBrown



* Re: RAID-0/5/6 performances
  2013-12-05 21:57 ` NeilBrown
@ 2013-12-05 22:29   ` Piergiorgio Sartor
  2013-12-06 22:47   ` Piergiorgio Sartor
  1 sibling, 0 replies; 7+ messages in thread
From: Piergiorgio Sartor @ 2013-12-05 22:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: Piergiorgio Sartor, linux-raid

On Fri, Dec 06, 2013 at 08:57:12AM +1100, NeilBrown wrote:
> On Thu, 5 Dec 2013 20:24:54 +0100 Piergiorgio Sartor
> <piergiorgio.sartor@nexgo.de> wrote:
> 
> > Hi all,
> > 
> > I've a system with an LSI 2308 SAS controller
> > and five 2.5" HDDs attached.
> > Each HDD can do around 100MB/sec read/write.
> > This was tested with all HDDs in parallel, to
> > make sure the controller can sustain them.
> > A single disk has the same performance.
> > 
> > I was testing RAID 0/5/6 performance and I found
> > something I could not clearly understand.
> > 
> > The test was done with "dd", since I wanted to know the
> > maximum possible performance.
> > Specifically, for reading:
> > 
> > dd if=/dev/md127 of=/dev/null bs=4k
> > 
> > For writing:
> > 
> > dd if=/dev/zero of=/dev/md127 bs=4k conv=fdatasync
> > 
> > Note that a larger block size did not change the
> > results. I guess the page size is already quite optimal.
> > 
> > I tested each RAID with 4 and 5 HDDs, with chunk
> > size of 512k, 64k and 16k.
> > The "stripe_cache_size" was set to the max 32768.
> > 
> > The results were observed with "iostat -k 5",
> > taking care to consider variations and ramp up.
> > 
> > The table below is in MB/sec; the number is the count
> > of HDDs, "r" is read and "w" is write:
> > 
> > chunk RAID 4r  4w  5r  5w
> > 512k   0   400 400 500 500
> > 512k   5   260 300 360 400
> > 512k   6    55 180 100 290
> > 
> >  64k   0   400 400 440 500
> >  64k   5   150 300 160 400
> >  64k   6   100 180 140 290
> > 
> >  16k   0   380 400 350 500
> >  16k   5   100 300 130 390
> >  16k   6    80 180 100 290
> > 
> > Now, RAID-0/5 seem to perform as expected,
> > depending on the number of HDDs, especially
> > with large chunk sizes.
> > Write performance is not a problem, even
> > though parity RAID is CPU intensive.
> > RAID-0/5 do not react well to small chunks.
> > RAID-6, on the other hand, seems to have an
> > idea of its own.
> > First of all, it does not seem to respect
> > proportionality. I would think a 4-HDD
> > RAID-6 should read more or less as fast as
> > 2 HDDs. I can understand some loss, due to
> > the parity skip, but not so much. In fact it
> > improves with a smaller chunk.
> > With 5 HDDs (3 data disks per stripe), I would
> > expect something better than 100MB/sec.
> > 
> > Any idea on this? Am I doing something wrong?
> > Some suggestion on tuning something in order
> > to try to improve RAID-6?
> > 
> > Thanks,
> > 
> > bye,
> > 
> 
> Does look strange.
> First thing I would check is the read-ahead size.
> md sets it for you but might be messing up somehow.
> Have a look at 
>    /sys/block/mdX/bdi/read_ahead_kb
> for each configuration and see if making it some uniform large number has any
> effect.

Hi Neil,

thanks for the hint, I knew I needed _the_ expert :-)

Using a chunk of 64k (the best in the table above), with
a 5-HDD RAID-6, the default read_ahead_kb is 384.
I tried increasing it, with the following improvements:

read_ahead_kb 5r
1024    -->   200MB/sec
4096    -->   300MB/sec
8192    -->   310MB/sec
32768   -->   310MB/sec
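
A sweep like this can be scripted roughly as follows (a sketch;
/dev/md127, the sizes and the cache-drop step are assumptions):

for ra in 384 1024 4096 8192 32768; do
    echo "$ra" > /sys/block/md127/bdi/read_ahead_kb
    echo 3 > /proc/sys/vm/drop_caches                    # drop page cache between runs
    dd if=/dev/md127 of=/dev/null bs=4k count=2097152    # read 8GB
done
# throughput observed in parallel with "iostat -k 5"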

So, it seems the maximum is reached between 4096 and 8192,
which is roughly what I would expect for a 5-HDD RAID-6.

I'll try (tomorrow) with different chunk sizes to see what changes.

In any case, 384 seems a bit too little. Maybe 5 HDDs are
not a real RAID-6 use case, I do not know.

Thanks again,

bye,

pg

-- 

piergiorgio


* Re: RAID-0/5/6 performances
  2013-12-05 19:24 RAID-0/5/6 performances Piergiorgio Sartor
  2013-12-05 21:57 ` NeilBrown
@ 2013-12-06  9:24 ` Stan Hoeppner
  2013-12-06 18:13   ` Piergiorgio Sartor
  1 sibling, 1 reply; 7+ messages in thread
From: Stan Hoeppner @ 2013-12-06  9:24 UTC (permalink / raw)
  To: Piergiorgio Sartor, linux-raid

On 12/5/2013 1:24 PM, Piergiorgio Sartor wrote:

> The "stripe_cache_size" was set to the max 32768.

You don't want to set this so high.  Doing this will:

1.  Usually decrease throughput
2.  Eat a huge amount of memory.  With 5 drives:

    ((32768*4096)*5)/1048576 = 640 MB RAM consumed for the stripe buffer

For 5 or fewer pieces of spinning rust a value of 2048 or less should be
sufficient.  Test 512, 1024, 2048, 4096, and 8192.  You should see your
throughput go up and then back down.  Find the sweet spot and use that
value.  If two of these yield throughput within 5% of one another, use
the lower value as it eats less RAM.
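
A quick way to run that sweep could look like this (a sketch; the
device name and test size are assumptions):

for scs in 512 1024 2048 4096 8192; do
    echo "$scs" > /sys/block/md127/md/stripe_cache_size
    dd if=/dev/zero of=/dev/md127 bs=4k count=2097152 conv=fdatasync   # write 8GB
done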

-- 
Stan


* Re: RAID-0/5/6 performances
  2013-12-06  9:24 ` Stan Hoeppner
@ 2013-12-06 18:13   ` Piergiorgio Sartor
  2013-12-06 23:29     ` Stan Hoeppner
  0 siblings, 1 reply; 7+ messages in thread
From: Piergiorgio Sartor @ 2013-12-06 18:13 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Piergiorgio Sartor, linux-raid

On Fri, Dec 06, 2013 at 03:24:18AM -0600, Stan Hoeppner wrote:
> On 12/5/2013 1:24 PM, Piergiorgio Sartor wrote:
> 
> > The "stripe_cache_size" was set to the max 32768.
> 
> You don't want to set this so high.  Doing this will:
> 
> 1.  Usually decrease throughput
> 2.  Eat a huge amount of memory.  With 5 drives:
> 
>     ((32768*4096)*5)/1048576 = 640 MB RAM consumed for the stripe buffer
> 
> For 5 or fewer pieces of spinning rust a value of 2048 or less should be
> sufficient.  Test 512, 1024, 2048, 4096, and 8192.  You should see your
> throughput go up and then back down.  Find the sweet spot and use that
> value.  If two of these yield throughput within 5% of one another, use
> the lower value as it eats less RAM.

Hi Stan,

thanks for the reply, I was looking forward to it,
since you always provide useful information.

I checked two systems: a different one with RAID-5,
and the actual RAID-6 one.

In the first one, 2048 seems to be the best stripe
cache size, while larger values result in slower write
speed, albeit not by much.

For the RAID-6, it seems 32768 is the best value.

There is one difference: the RAID-5 has a chunk size
of 512k (the default), while the RAID-6 still has 64k.

BTW, why is that? I mean, why does a large stripe
cache result in lower write speed?

Thanks,

bye,

-- 

piergiorgio


* Re: RAID-0/5/6 performances
  2013-12-05 21:57 ` NeilBrown
  2013-12-05 22:29   ` Piergiorgio Sartor
@ 2013-12-06 22:47   ` Piergiorgio Sartor
  1 sibling, 0 replies; 7+ messages in thread
From: Piergiorgio Sartor @ 2013-12-06 22:47 UTC (permalink / raw)
  To: NeilBrown; +Cc: Piergiorgio Sartor, linux-raid

On Fri, Dec 06, 2013 at 08:57:12AM +1100, NeilBrown wrote:
> On Thu, 5 Dec 2013 20:24:54 +0100 Piergiorgio Sartor
> <piergiorgio.sartor@nexgo.de> wrote:
> 
> > Hi all,
> > 
> > I've a system with an LSI 2308 SAS controller
> > and five 2.5" HDDs attached.
> > Each HDD can do around 100MB/sec read/write.
> > This was tested with all HDDs in parallel, to
> > make sure the controller can sustain them.
> > A single disk has the same performance.
> > 
> > I was testing RAID 0/5/6 performance and I found
> > something I could not clearly understand.
> > 
> > The test was done with "dd", since I wanted to know the
> > maximum possible performance.
> > Specifically, for reading:
> > 
> > dd if=/dev/md127 of=/dev/null bs=4k
> > 
> > For writing:
> > 
> > dd if=/dev/zero of=/dev/md127 bs=4k conv=fdatasync
> > 
> > Note that a larger block size did not change the
> > results. I guess the page size is already quite optimal.
> > 
> > I tested each RAID with 4 and 5 HDDs, with chunk
> > size of 512k, 64k and 16k.
> > The "stripe_cache_size" was set to the max 32768.
> > 
> > The results were observed with "iostat -k 5",
> > taking care to consider variations and ramp up.
> > 
> > The table below is in MB/sec; the number is the count
> > of HDDs, "r" is read and "w" is write:
> > 
> > chunk RAID 4r  4w  5r  5w
> > 512k   0   400 400 500 500
> > 512k   5   260 300 360 400
> > 512k   6    55 180 100 290
> > 
> >  64k   0   400 400 440 500
> >  64k   5   150 300 160 400
> >  64k   6   100 180 140 290
> > 
> >  16k   0   380 400 350 500
> >  16k   5   100 300 130 390
> >  16k   6    80 180 100 290
> > 
> > Now, RAID-0/5 seem to perform as expected,
> > depending on the number of HDDs, especially
> > with large chunk sizes.
> > Write performance is not a problem, even
> > though parity RAID is CPU intensive.
> > RAID-0/5 do not react well to small chunks.
> > RAID-6, on the other hand, seems to have an
> > idea of its own.
> > First of all, it does not seem to respect
> > proportionality. I would think a 4-HDD
> > RAID-6 should read more or less as fast as
> > 2 HDDs. I can understand some loss, due to
> > the parity skip, but not so much. In fact it
> > improves with a smaller chunk.
> > With 5 HDDs (3 data disks per stripe), I would
> > expect something better than 100MB/sec.
> > 
> > Any idea on this? Am I doing something wrong?
> > Some suggestion on tuning something in order
> > to try to improve RAID-6?
> > 
> > Thanks,
> > 
> > bye,
> > 
> 
> Does look strange.
> First thing I would check is the read-ahead size.
> md sets it for you but might be messing up somehow.
> Have a look at 
>    /sys/block/mdX/bdi/read_ahead_kb
> for each configuration and see if making it some uniform large number has any
> effect.

Hi again Neil,

I tested some "read_ahead_kb" configurations,
with RAID-6, 4 and 5 HDDs, and 512k chunk size.

Increasing the value to very large numbers,
like 65536 or 131072, did indeed improve read
performance.

I tested from 4096 to 131072, doubling the
value at each run.

So, for 4 HDDs I got around 150MB/sec and
for 5 HDDs around 190 MB/sec.

This is better than the 55 and 100 I got
before, but still below the expected 200
and 300 I get with chunk size 64k.

Anyhow, I guess the read-ahead tuning did
the trick.

Thanks again,

bye,

-- 

piergiorgio


* Re: RAID-0/5/6 performances
  2013-12-06 18:13   ` Piergiorgio Sartor
@ 2013-12-06 23:29     ` Stan Hoeppner
  0 siblings, 0 replies; 7+ messages in thread
From: Stan Hoeppner @ 2013-12-06 23:29 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On 12/6/2013 12:13 PM, Piergiorgio Sartor wrote:
> On Fri, Dec 06, 2013 at 03:24:18AM -0600, Stan Hoeppner wrote:
>> On 12/5/2013 1:24 PM, Piergiorgio Sartor wrote:
>>
>>> The "stripe_cache_size" was set to the max 32768.
>>
>> You don't want to set this so high.  Doing this will:
>>
>> 1.  Usually decrease throughput
>> 2.  Eat a huge amount of memory.  With 5 drives:
>>
>>     ((32768*4096)*5)/1048576 = 640 MB RAM consumed for the stripe buffer
>>
>> For 5 or fewer pieces of spinning rust a value of 2048 or less should be
>> sufficient.  Test 512, 1024, 2048, 4096, and 8192.  You should see your
>> throughput go up and then back down.  Find the sweet spot and use that
>> value.  If two of these yield throughput within 5% of one another, use
>> the lower value as it eats less RAM.
> 
> Hi Stan,
> 
> thanks for the reply, I was looking forward to it,
> since you always provide useful information.
> 
> I checked two systems: a different one with RAID-5,
> and the actual RAID-6 one.
> 
> In the first one, 2048 seems to be the best stripe
> cache size, while larger values result in slower write
> speed, albeit not by much.
> 
> For the RAID-6, it seems 32768 is the best value.
> 
> There is one difference: the RAID-5 has a chunk size
> of 512k (the default), while the RAID-6 still has 64k.
> 
> BTW, why is that? I mean, why does a large stripe
> cache result in lower write speed?

I don't have the answer to this question.  It has been asked before.  I
can only speculate that the larger cache table introduces overhead of
some kind.  You may want to ask Neil directly.

Note that you're using dd for testing this.  dd produces a single stream
of serial IO.  If you test other IO patterns, such as parallel or
asynchronous IO, with software such as FIO, the results may be a bit different.
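
For example, a parallel sequential-read run with FIO could look like
this (a sketch; the parameters are only illustrative):

fio --name=seqread --filename=/dev/md127 --rw=read --bs=64k \
    --ioengine=libaio --iodepth=16 --numjobs=4 --direct=1 \
    --runtime=60 --time_based --group_reporting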

-- 
Stan


end of thread (newest: 2013-12-06 23:29 UTC)

Thread overview: 7+ messages:
2013-12-05 19:24 RAID-0/5/6 performances Piergiorgio Sartor
2013-12-05 21:57 ` NeilBrown
2013-12-05 22:29   ` Piergiorgio Sartor
2013-12-06 22:47   ` Piergiorgio Sartor
2013-12-06  9:24 ` Stan Hoeppner
2013-12-06 18:13   ` Piergiorgio Sartor
2013-12-06 23:29     ` Stan Hoeppner
