Re: recommendations for stripe/chunk size

From: "Keld Jørn Simonsen" <keld@dkuug.dk>
To: Wolfgang Denk <wd@denx.de>
Cc: Bill Davidsen <davidsen@tmr.com>, linux-raid@vger.kernel.org
Subject: Re: recommendations for stripe/chunk size
Date: Thu, 7 Feb 2008 01:31:16 +0100
Message-ID: <20080207003116.GA23341@rap.rap.dk>
In-Reply-To: <20080206202536.3316124D1D@gemini.denx.de>

On Wed, Feb 06, 2008 at 09:25:36PM +0100, Wolfgang Denk wrote:
> In message <47AA08E7.5000801@tmr.com> you wrote:
> >
> > > I actually think the kernel should operate with block sizes
> > > like this and not with 4 kiB blocks. It is the readahead and the elevator
> > > algorithms that save us from randomly reading 4 kiB at a time.
> > >
> > >   
> > Exactly, and nothing saves an R-A-RW cycle if the write is a partial chunk.
> 
> Indeed kernel page size is an important factor in such optimizations.
> But you have to keep in mind that this is mostly efficient for (very)
> large, strictly sequential I/O operations only - actual file system
> traffic may be *very* different.
> 
> We implemented the option to select kernel page sizes of 4, 16, 64
> and 256 kB for some PowerPC systems (440SPe, to be precise). A nice
> graph of the effect can be found here:
> 
> https://www.amcc.com/MyAMCC/retrieveDocument/PowerPC/440SPe/RAIDinLinux_PB_0529a.pdf

Yes, that is also what I would expect, for sequential reads.
Random writes of small data blocks, the kind of thing done in big
databases, should show a different picture, as others have also described.

If you look at a single disk, would you get improved performance with
asynchronous I/O?

I am a bit puzzled about my SATA-II performance: nominally I could get
300 MB/s on SATA-II, but I only get about 80 MB/s. Why is that?
I thought it was because of latency with synchronous reads.
I.e., when a chunk is read, you need to complete the I/O operation, and then
issue a new one. In the meantime, while the CPU is doing these
calculations, the disk has spun a little, and to get the next data chunk,
we need to wait for the disk to spin around to have the head positioned
over the right place on the disk surface. Is that so? Or does the
controller take care of this, reading the rest of the not-yet-requested
track into a buffer, which can then be delivered next time? Modern disks
often have buffers of about 8 or 16 MB. I wonder why they don't have
bigger buffers.
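
To put rough numbers on the missed-revolution idea, here is a minimal
sketch. The 7200 rpm spindle speed and the 64 KiB request size are
assumptions for illustration, not figures from this thread, and the
function names are made up. The model is deliberately pessimistic: every
synchronous read is assumed to cost one full revolution, and transfer
time is ignored.

```python
# Worst-case model: each synchronous read waits one full revolution.
# ASSUMPTIONS (not from the thread): 7200 rpm drive, 64 KiB per request.
RPM = 7200
CHUNK_BYTES = 64 * 1024
OBSERVED_MB_S = 80.0  # the throughput reported above


def revolution_time_s(rpm):
    """Time for one full platter revolution, in seconds."""
    return 60.0 / rpm


def throughput_if_every_read_misses(chunk_bytes, rpm):
    """MB/s if every request of chunk_bytes costs one full revolution."""
    return chunk_bytes / revolution_time_s(rpm) / 1e6


def request_size_for(rate_mb_s, rpm):
    """Bytes per request needed to reach rate_mb_s under the same model."""
    return rate_mb_s * 1e6 * revolution_time_s(rpm)


if __name__ == "__main__":
    print("revolution time: %.2f ms" % (revolution_time_s(RPM) * 1e3))
    print("64 KiB per revolution: %.2f MB/s"
          % throughput_if_every_read_misses(CHUNK_BYTES, RPM))
    print("request size for 80 MB/s: %.0f KiB"
          % (request_size_for(OBSERVED_MB_S, RPM) / 1024))
```

Under these assumptions, a drive that missed a revolution on every
64 KiB read would deliver well under 10 MB/s, so an observed 80 MB/s
suggests that something (readahead, the drive's buffer, or larger
requests) is hiding most of that latency.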

Anyway, why does a SATA-II drive not deliver something like 300 MB/s?
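
For reference, the nominal 300 MB/s follows from the SATA-II link itself:
3 Gbit/s signalling with 8b/10b encoding. The sketch below only
reproduces that arithmetic; it describes the interface and says nothing
about how fast the platters can deliver data. The 80 MB/s figure is the
one quoted above, and the names are made up for illustration.

```python
# Where the nominal SATA-II figure comes from: link rate, not media rate.
LINE_RATE_BITS_S = 3.0e9        # SATA-II signalling rate, 3 Gbit/s
ENCODING_EFFICIENCY = 8 / 10    # 8b/10b: 8 payload bits per 10 line bits
OBSERVED_MB_S = 80.0            # throughput reported in this message


def payload_mb_s(line_rate_bits_s, efficiency):
    """Payload bandwidth of the link in MB/s (1 MB = 10**6 bytes)."""
    return line_rate_bits_s * efficiency / 8 / 1e6


if __name__ == "__main__":
    link = payload_mb_s(LINE_RATE_BITS_S, ENCODING_EFFICIENCY)
    print("link payload rate: %.0f MB/s" % link)
    print("observed share of link: %.0f%%" % (100 * OBSERVED_MB_S / link))
```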

best regards
keld
