Re: calculating optimal chunk size for Linux software-RAID

From: Stan Hoeppner <stan@hardwarefreak.com>
To: Martin T <m4rtntns@gmail.com>
Cc: "linux-raid@vger.kernel.org List" <linux-raid@vger.kernel.org>
Subject: Re: calculating optimal chunk size for Linux software-RAID
Date: Fri, 07 Mar 2014 23:37:37 -0600
Message-ID: <531AACA1.6000606@hardwarefreak.com>
In-Reply-To: <CAJx5YvH1GE0KUcsSnVwbtUz6qP4zMv3N2qt3HhrjdF69NT2vLQ@mail.gmail.com>

On 3/7/2014 9:15 PM, Martin T wrote:
> Stan,
> 
> ok, I see. However, are there utilities out there which help one to
> analyze how applications on a server use the file-system over the time
> and help to make an educated decision regarding the chunk size?

My apologies.  You're a complete novice and I'm leading you down the
textbook storage architectural design path.  Let's short circuit that as
I don't have the time.

As you're starting from zero, let me give you what works best with 99%
of workloads.  Use a chunk size of 32KB or 64KB.  Such a chunk will work
extremely well with any singular or mixed workloads, on parity and
non-parity RAID.  The only workload that should have a significantly
larger chunk than this is a purely streaming allocation workload of
large files.
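
To put numbers on those chunk sizes, here is a rough Python sketch of how
much data one full stripe holds for a given chunk and array.  The helper
name, the level table, and the 4-disk RAID5 example are illustrative
assumptions, not anything from md itself.  RAID1 is left out on purpose
because it mirrors rather than stripes, so chunk size has no meaning there.

```python
# Illustrative sketch only: function and table names are hypothetical.
# Data-bearing disks per stripe for common md RAID levels.
DATA_DISKS = {
    "raid0": lambda n: n,        # no redundancy, every disk holds data
    "raid5": lambda n: n - 1,    # one chunk of parity per stripe
    "raid6": lambda n: n - 2,    # two chunks of parity per stripe
    "raid10": lambda n: n // 2,  # assumes 2 copies and an even disk count
}


def full_stripe_kib(level, disks, chunk_kib):
    """Return the data held by one full stripe, in KiB.

    Full stripe = chunk size * number of data-bearing disks.
    """
    if level not in DATA_DISKS:
        raise ValueError(f"no striping model for level {level!r}")
    data_disks = DATA_DISKS[level](disks)
    if data_disks < 1 or chunk_kib <= 0:
        raise ValueError("need at least one data disk and a positive chunk")
    return chunk_kib * data_disks


if __name__ == "__main__":
    # Example assumption: a 4-disk RAID5 with the two suggested chunk sizes.
    for chunk in (32, 64):
        print(f"chunk {chunk} KiB -> full stripe "
              f"{full_stripe_kib('raid5', 4, chunk)} KiB")
```

With a 64KB chunk on that example array a full stripe is 192KB, so a parity
array can fill whole stripes with fairly modest writes.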

If you want a more technical explanation, you can read all of my
relevant posts in the linux-raid or XFS archives, as I've explained this
hundreds of times in great detail.  Or you can wait a few months to read
the kernel documentation I'm working on, which will teach the reader the
formal storage stack design process, soup to nuts.  I wish it was
already finished, as I could simply paste the link for you, which,
coincidentally, is the exact reason I'm writing it. :)

> regards,
> Martin
> 
> On Fri, Mar 7, 2014 at 11:58 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> On 3/6/2014 8:06 PM, Martin T wrote:
>>> Am I correct that optimal chunk size is usually the size of the
>>> average file read/written to disk divided by number of block devices
>>> in RAID array storing the data? For example if the average file size
>>> is 1024KiB and I have four disks in RAID1, then I should choose the
>>> chunk size around 256KiB to get the optimal read performance? Or if I
>>> have two drives in RAID0, then I should choose the chunk size 512KiB
>>> instead? Or are there better methods/benchmarks to determine the
>>> optimal chunk size for software-RAID?
>>
>> You're asking the wrong question.  Storage architecture design always
>> begins with the workload.  The correct question is:
>>
>> My workload (application mix) performs *most* IO in manner X, where X is
>>
>> 1.  large streaming write/read
>> 2.  small file write/read
>> 3.  metadata heavy
>>
>> I have Y number of disk drives.  I plan to use XFS/EXT4/etc filesystem.
>>  What RAID level and chunk size are optimal for my workload, and how do
>> I properly tune my filesystem to my workload and storage stack?
>>
>>> Last but not least, is there a
>>> good utility which could help one to measure the average I/O
>>> read/write size?
>>
>> In flight IO size has no correlation to stripe and chunk size.  What you
>> need to know is how your application(s) write to the filesystem and how
>> your filesystem issues write IOs.  You should already know the
>> former, and it's easy to determine the latter.
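
As one small, concrete piece of the filesystem tuning mentioned in the
quoted text, the sketch below builds the stripe geometry options for
mkfs.xfs from the chunk size and the number of data disks.  The helper name
and the example values are assumptions for illustration.  mkfs.xfs normally
picks up md geometry on its own, so treat this as a way to check what you
expect rather than something you must always pass by hand.

```python
# Illustrative sketch only: the helper name is hypothetical.
def xfs_geometry_opts(chunk_kib, data_disks):
    """Build the mkfs.xfs data-section stripe options.

    su = stripe unit (the RAID chunk size)
    sw = stripe width, expressed as a multiple of su (the data disk count)
    """
    if chunk_kib <= 0 or data_disks <= 0:
        raise ValueError("chunk size and data disk count must be positive")
    return f"-d su={chunk_kib}k,sw={data_disks}"


if __name__ == "__main__":
    # Example assumption: 64 KiB chunk on a 4-disk RAID5 (3 data disks).
    print("mkfs.xfs", xfs_geometry_opts(64, 3), "/dev/mdX")
```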

-- 
Stan

Thread overview: 7+ messages
2014-03-07  2:06 calculating optimal chunk size for Linux software-RAID Martin T
2014-03-07 23:58 ` Stan Hoeppner
2014-03-08  3:15   ` Martin T
2014-03-08  5:37     ` Stan Hoeppner [this message]
2014-03-08 22:03       ` Bill Davidsen
2014-03-12 15:21         ` Martin T
2014-03-13 10:15           ` Stan Hoeppner
