Small chunk size read performance penalty

Linux RAID subsystem development

* Small chunk size read performance penalty
@ 2013-08-18 22:05 Ian Pilcher
  2013-08-18 22:16 ` Roberto Spadim
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Ian Pilcher @ 2013-08-18 22:05 UTC (permalink / raw)
  To: linux-raid

Can anyone point me to a good explanation of the read performance impact
of small (RAID-5 and RAID-6) chunk sizes?

I understand why large chunks hurt write performance, but I haven't been
able to reason through the small-chunk/read case, and my Interweb
searches haven't really turned anything up.

The "read penalty" is definitely there; I can see it in the test data
from my NAS.  I just don't understand *why* it's there.

Thanks!

-- 
========================================================================
Ian Pilcher                                         arequipeno@gmail.com
Sometimes there's nothing left to do but crash and burn...or die trying.
========================================================================


* Re: Small chunk size read performance penalty
  2013-08-18 22:05 Small chunk size read performance penalty Ian Pilcher
@ 2013-08-18 22:16 ` Roberto Spadim
  2013-08-19  1:40 ` Stan Hoeppner
  2013-08-19  3:01 ` Roberto Spadim
  2 siblings, 0 replies; 6+ messages in thread
From: Roberto Spadim @ 2013-08-18 22:16 UTC (permalink / raw)
  To: Ian Pilcher; +Cc: Linux-RAID

I'm not sure, but the problem may be the same as with a small chunk size
on RAID-0: the head moves a lot to do small pieces of work, when it could
move less and do more work per move.

But I'm not sure about it.

2013/8/18 Ian Pilcher <arequipeno@gmail.com>:
> Can anyone point me to a good explanation of the read performance impact
> of small (RAID-5 and RAID-6) chunk sizes?
>
> I understand why large chunks hurt write performance, but I haven't been
> able to reason through the small-chunk/read case, and my Interweb
> searches haven't really turned anything up.
>
> The "read penalty" is definitely there; I can see it in the test data
> from my NAS.  I just don't understand *why* it's there.
>
> Thanks!
>
> --
> ========================================================================
> Ian Pilcher                                         arequipeno@gmail.com
> Sometimes there's nothing left to do but crash and burn...or die trying.
> ========================================================================



-- 
Roberto Spadim


* Re: Small chunk size read performance penalty
  2013-08-18 22:05 Small chunk size read performance penalty Ian Pilcher
  2013-08-18 22:16 ` Roberto Spadim
@ 2013-08-19  1:40 ` Stan Hoeppner
  2013-08-19  5:49   ` Ian Pilcher
  2013-08-19  3:01 ` Roberto Spadim
  2 siblings, 1 reply; 6+ messages in thread
From: Stan Hoeppner @ 2013-08-19  1:40 UTC (permalink / raw)
  To: Ian Pilcher; +Cc: linux-raid

On 8/18/2013 5:05 PM, Ian Pilcher wrote:
> Can anyone point me to a good explanation of the read performance impact
> of small (RAID-5 and RAID-6) chunk sizes?

Can you elaborate on your workload that demonstrates this?  Different
workloads behave differently with different chunk sizes.

> I understand why large chunks hurt write performance...

Again this is workload dependent.  Large chunks increase write and read
performance for large streaming workloads.

> The "read penalty" is definitely there; I can see it in the test data
> from my NAS.  I just don't understand *why* it's there.

If you can see it, then please demonstrate this read penalty with
numbers.  You obviously have test data from the same set of disks with
two different RAID5s of different chunk sizes.  This is required to see
such a difference in performance.  Please share this data with us.

-- 
Stan


* Re: Small chunk size read performance penalty
  2013-08-18 22:05 Small chunk size read performance penalty Ian Pilcher
  2013-08-18 22:16 ` Roberto Spadim
  2013-08-19  1:40 ` Stan Hoeppner
@ 2013-08-19  3:01 ` Roberto Spadim
  2 siblings, 0 replies; 6+ messages in thread
From: Roberto Spadim @ 2013-08-19  3:01 UTC (permalink / raw)
  To: Ian Pilcher; +Cc: Linux-RAID

Hi Ian,
Just some points you should consider. This is an idea about the
theory; don't treat it as a definitive guide to linux-raid or anything
else.

Some basics: a disk has one head arm. In other words, it can
read/write a sequence of bits, move, read/write more bits, move, etc.

When you use striping (RAID-0, 5, 6, RAID-10) you change how data is
written. Instead of a continuous byte stream, you divide it into chunks:
chunk 1 at disk 1 position 0, chunk 2 at disk 2 position 0, chunk 3 at
disk 1 position 1, etc. This places each disk close to one specific
chunk.
Example: consider a read from position 0 to the last position of the
array. The disk used = chunk id % number_of_disks. With a chunk of
128MB and two disks, reading 1GB you will read 0-128MB from disk 1,
128-256MB from disk 2, 256-384MB from disk 1, etc.
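
The mapping described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: plain striping
with no parity (RAID-5/6 also rotate parity chunks across the disks, so
the real md layout differs), disks numbered from 0 rather than 1, and a
made-up function name, locate_chunk.

```python
MiB = 1024 * 1024


def locate_chunk(offset, chunk_size, n_disks):
    """Map a byte offset in a plainly striped array to (disk, offset on disk).

    Assumes RAID-0 style striping with no parity; disks are numbered from 0.
    """
    chunk_id = offset // chunk_size      # which chunk of the array this is
    disk = chunk_id % n_disks            # chunks go round-robin over the disks
    row = chunk_id // n_disks            # how many chunks precede it on that disk
    disk_offset = row * chunk_size + offset % chunk_size
    return disk, disk_offset


# The example from the text: 128 MiB chunks on two disks.
for start in range(0, 512 * MiB, 128 * MiB):
    disk, pos = locate_chunk(start, 128 * MiB, 2)
    print(f"array offset {start // MiB:3d} MiB -> disk {disk + 1}, "
          f"disk offset {pos // MiB} MiB")
```

With these inputs the array offsets alternate between disk 1 and disk 2,
matching the 0-128MB / 128-256MB / 256-384MB sequence above.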

When you read more than one chunk, you use two head arms (two disks),
and here you get a speed boost. You must check what the best chunk size
is for your workload (and chunks should only be used when needed; on
some workloads RAID-1 is better).

Using two heads you can read/write faster, but this only helps for a
continuous stream.
When you need parallel work (many threads) you can use RAID-1 to read
in parallel: since it has no chunks, it can read any data from a single
disk. In other words, the workload can (when possible) be shared at
one disk per thread. If you have 10 disks in RAID-1, you can get good
performance for 10 threads without problems. For writes, the slowest
disk will stall write performance, but you should consider what you
need.

That's a superficial explanation; the implementation can be a bit
different, but it explains the idea behind the use of chunks.

The workload tells you what's better.
On my system, RAID-1 is sometimes better than RAID-10 because I have
many threads reading different parts of the disk. In this case I can
add many head arms (disks), one for each important thread, and I get
good performance (considering a heavily loaded system). But if you need
fast continuous reads, chunking is VERY good.
For example, if you need to read 1GB and have 10 disks, you can get
10x the performance with a good chunk size, since each disk will be
used when a chunk read is requested, and each disk can read in parallel
and continuously (without much head movement).
If you set a chunk size of 1GB and you need to read 1GB, you don't
get any performance boost from chunks.
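
A rough way to see both ends of this is to count how many disks a
single read spans. The sketch below assumes plain striping without
parity and uses a hypothetical helper, disks_touched. It says nothing
about seek time or readahead; it only shows how many head arms can work
on one request.

```python
KiB = 1024
GiB = 1024 ** 3


def disks_touched(offset, length, chunk_size, n_disks):
    """Number of distinct disks a read spans under plain striping (no parity)."""
    if length <= 0:
        return 0
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    n_chunks = last_chunk - first_chunk + 1
    # Consecutive chunks land on consecutive disks, so the count is capped
    # by the number of disks in the array.
    return min(n_chunks, n_disks)


# A 1 GiB sequential read on 10 disks, with different chunk sizes.
for chunk in (64 * KiB, 512 * KiB, GiB):
    print(f"chunk {chunk // KiB:>7d} KiB -> "
          f"{disks_touched(0, GiB, chunk, 10)} disk(s) busy")
```

With a 64 KiB or 512 KiB chunk all 10 disks take part in the 1 GiB
read; with a 1 GiB chunk (and an aligned read) only one does.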

The best thing to do is TEST with your workload.
I think this can help; if not, delete it from your mailbox :)


* Re: Small chunk size read performance penalty
  2013-08-19  1:40 ` Stan Hoeppner
@ 2013-08-19  5:49   ` Ian Pilcher
  2013-08-20  2:28     ` Stan Hoeppner
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Pilcher @ 2013-08-19  5:49 UTC (permalink / raw)
  To: linux-raid

On 08/18/2013 08:40 PM, Stan Hoeppner wrote:
> Can you elaborate on your workload that demonstrates this?  Different
> workloads behave differently with different chunk sizes.

dd ... at block sizes between 4KiB and 1MiB, on RAID-5 and -6 arrays
with chunk sizes in the same range.

Hardware is 5 7200 RPM SATA drives in a NAS (Thecus N5550) with an Atom
D2550 processor and an ICH10R chipset.  The drives are all connected to
the chipset's built-in AHCI controller.

> If you can see it, then please demonstrate this read penalty with
> numbers.  You obviously have test data from the same set of disks with
> two different RAID5s of different chunk sizes.  This is required to see
> such a difference in performance.  Please share this data with us.

I've uploaded the data (in OpenDocument spreadsheet form) to Dropbox.  I
think that it's accessible at this link:

  https://www.dropbox.com/s/4dq93th4wu5rr2y/nas_benchmarks.ods

(This is my first attempt at sharing anything via Dropbox, so let me
know if it doesn't work.)

I actually find your response really interesting.  From my Interweb
searching, the "small stripe size read penalty" seems to be pretty
widely accepted, much as the "large stripe size write penalty" is.  It
certainly does show up in my data; as the chunk size increases reads of
even small blocks get faster.

-- 
========================================================================
Ian Pilcher                                         arequipeno@gmail.com
Sometimes there's nothing left to do but crash and burn...or die trying.
========================================================================


* Re: Small chunk size read performance penalty
  2013-08-19  5:49   ` Ian Pilcher
@ 2013-08-20  2:28     ` Stan Hoeppner
  0 siblings, 0 replies; 6+ messages in thread
From: Stan Hoeppner @ 2013-08-20  2:28 UTC (permalink / raw)
  To: Ian Pilcher; +Cc: linux-raid

On 8/19/2013 12:49 AM, Ian Pilcher wrote:
> On 08/18/2013 08:40 PM, Stan Hoeppner wrote:
>> Can you elaborate on your workload that demonstrates this?  Different
>> workloads behave differently with different chunk sizes.
> 
> dd ... at block sizes between 4KiB and 1MiB, on RAID-5 and -6 arrays
> with chunk sizes in the same range.
> 
> Hardware is 5 7200 RPM SATA drives in a NAS (Thecus N5550) with an Atom
> D2550 processor and an ICH10R chipset.  The drives are all connected to
> the chipset's built-in AHCI controller.
> 
>> If you can see it, then please demonstrate this read penalty with
>> numbers.  You obviously have test data from the same set of disks with
>> two different RAID5s of different chunk sizes.  This is required to see
>> such a difference in performance.  Please share this data with us.
> 
> I've uploaded the data (in OpenDocument spreadsheet form) to Dropbox.  I
> think that it's accessible at this link:
> 
>   https://www.dropbox.com/s/4dq93th4wu5rr2y/nas_benchmarks.ods
> 
> (This is my first attempt at sharing anything via Dropbox, so let me
> know if it doesn't work.)
> 
> I actually find your response really interesting.  From my Interweb
> searching, the "small stripe size read penalty" seems to be pretty
> widely accepted, much as the "large stripe size write penalty" is.  It
> certainly does show up in my data; as the chunk size increases reads of
> even small blocks get faster.

Everything in the world of storage performance depends on the workload.
The statements above assume an unstated workload, and are so general as
to not be worth repeating, and certainly not putting any stock in.

The former is true of large streaming workloads.  If your workload deals
with small IO reads, such as mail serving, then a small stripe is not
detrimental as the mail file you're reading is almost always smaller
than the stripe size, and often smaller than the chunk size.  Using a
large chunk/stripe with such a workload can create hotspots on some
disks in the array, increasing latency, and decreasing throughput.

However, in this scenario, the big win is in write latency.  A large
chunk/stripe size will generate a huge amount of unnecessary read IO
during RMW cycles to recalculate parity when you write a new mail
message into an existing stripe.  With an optimal chunk/stripe for this
workload, you read few extra sectors during RMW.  It's often very
difficult to get this balance right.  And even if you do, mail workloads
are still many times slower on parity RAID than on mirrors or striped
mirrors (RAID10).  This obviously depends on load.  Even "low end"
modern server hardware with md RAID6 and a handful of disks can easily
handle a few hundred active mail users.  Once you get into the thousands
you'll need mirror based RAID as RMW latency will grind you to a halt.
The same hardware is plenty.  You simply change the RAID level.  You'll
need a couple more disks to maintain total capacity, but simply changing
to mirror based RAID will increase throughput 5-15 fold, and decrease
latency substantially.

Any "large stripe size write penalty" will be a function of mismatching
the workload to the RAID stripe and/or array/drive hardware.  Using a
large stripe with a mail workload will yield poor performance indeed due
to large RMW bandwidth/latency.  Large stripe with this workload
typically means >32-64KB.  Yes, that's stripe, not chunk.  For this
workload using a 6 drive RAID6 you'd want an 8-16KB chunk for a 32-64KB
stripe.  This is the opposite of the meme you quote above.  Again,
workload dependent.
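
The chunk/stripe arithmetic used above can be written out as a small
sketch. It assumes "stripe" means the data portion of one full stripe
(chunk size times the number of data disks), with one parity chunk per
stripe for RAID5 and two for RAID6. The helper names are illustrative
only and are not part of mdadm.

```python
PARITY_CHUNKS = {"raid5": 1, "raid6": 2}  # parity chunks per stripe


def data_disks(n_drives, level):
    """Number of data chunks in one stripe for a parity RAID level."""
    return n_drives - PARITY_CHUNKS[level]


def stripe_kib(n_drives, level, chunk_kib):
    """Data portion of a full stripe, in KiB."""
    return data_disks(n_drives, level) * chunk_kib


# The 6-drive RAID6 case from the text: 4 data disks per stripe.
print(stripe_kib(6, "raid6", 8))    # 32
print(stripe_kib(6, "raid6", 16))   # 64
```

The same helpers applied to a 5-drive array give 4 data disks for RAID5
and 3 for RAID6.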

If your workload is HPC file serving, where user files are 10s to 100s
of GB, even TBs in size, then you'd want the largest chunk/strip/stripe
your hardware can perform well with.  This may be as low as 512KB or it
may be as large as 2MB.  And it will likely be hardware based RAID, not
Linux md.

-- 
Stan
