Re: 3x slower file reading oddity

From: Andrew Morton <akpm@zip.com.au>
To: dean gaudet <dean-list-linux-kernel@arctic.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: 3x slower file reading oddity
Date: Mon, 17 Jun 2002 17:36:12 -0700
Message-ID: <3D0E807C.5D50C17E@zip.com.au>
In-Reply-To: Pine.LNX.4.44.0206171649270.18507-100000@twinlark.arctic.org

dean gaudet wrote:
> 
> ...
> > You'll get best throughput with a single read thread.
> 
> what if you have a disk array with lots of spindles?  it seems at some
> point that you need to give the array or some lower level driver a lot of
> i/os to choose from so that it can get better parallelism out of the
> hardware.

mm.  For that particular test, you'd get nice speedups from striping
the blockgroups across disks, so each `cat' is probably talking to
a different disk.  I don't think I've seen anything like that proposed
though.

But regardless of the disk topology, the sanest way to get good IO
scheduling is to throw a lot of requests at the block layer.  That's
simple for writes.  But for reads, it's harder.

You could fork one `cat' per file ;)  (Not so silly, really.  But if
you took this approach, you'd need "many" more threads than blockgroups.)
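
A rough userspace sketch of the "one reader per file" idea, using a thread
pool in place of forked `cat' processes.  The names read_file and
read_all_concurrently, the chunk size and the default of 64 workers are all
made up for illustration; how many workers count as "many" depends on the
filesystem layout and would need measuring.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_file(path, chunk_size=1 << 16):
    """Read one file to EOF and return its byte count (like `cat f >/dev/null`)."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def read_all_concurrently(paths, workers=64):
    """Read every path from a pool of threads so that many reads are
    outstanding at once; returns the total number of bytes read.
    The worker count is an arbitrary illustrative default."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_file, paths))
```

Blocking reads release the interpreter lock in CPython, so the threads can
have requests queued at the block layer at the same time.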

Or teach `cat' to perform asynchronous (aio) reads.  You'd need async
opens, too.   But generally we get a good cache hit rate against the
data which is needed to open a small file.
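
This is not real aio, but a similar effect can be approximated from
userspace with readahead hints: open a batch of files, ask the kernel to
start fetching each one, and only then read them.  The sketch below assumes
Linux and Python's os.posix_fadvise; prefetch and drain are hypothetical
helper names.  The opens are still synchronous, which is the limitation
noted above.

```python
import os


def prefetch(paths):
    """Open each file and hint that its contents will be needed soon.
    Returns the open file descriptors in the same order as `paths`."""
    fds = []
    for path in paths:
        fd = os.open(path, os.O_RDONLY)  # the open itself still blocks
        try:
            # offset 0, length 0 means "from the start to end of file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        except OSError:
            pass  # the hint is advisory; carry on without it
        fds.append(fd)
    return fds


def drain(fds, chunk_size=1 << 16):
    """Read every descriptor to EOF, close it, and return total bytes read."""
    total = 0
    for fd in fds:
        try:
            while True:
                chunk = os.read(fd, chunk_size)
                if not chunk:
                    break
                total += len(chunk)
        finally:
            os.close(fd)
    return total
```

With a large tree you would feed this in batches to stay under the
open-file limit, and how much the kernel actually reads ahead per hint is
up to the kernel.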

hmm.  What else?  Physical readahead - read metadata into the block
device's pagecache and flip pages from there into directories and
files on-demand.  Fat chance of that happening.

Or change ext2/3 to not place directories in different block groups
at all.  That's super-effective, but does cause somewhat worse long-term
fragmentation.

You can probably lessen the seek-rate by accessing the files in the correct
order.  Read all the files from a directory before descending into any of
its subdirectories.  Can find(1) do that?  You should be able to pretty
much achieve disk bandwidth this way - it depends on how bad the inter-
and intra-file fragmentation has become.
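
The ordering described above is easy to get from a small script even if
find(1) won't do it.  A minimal sketch, with files_before_subdirs as a
made-up name: Python's os.walk in top-down mode hands back a directory's
own entries before it descends into any subdirectory.  Sorting by name is
only there to make the order deterministic; it is not claimed to match the
on-disk layout.

```python
import os


def files_before_subdirs(root):
    """Yield paths so that every non-directory entry in a directory comes
    before anything that lives in one of its subdirectories."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        dirnames.sort()  # in-place sort fixes the order of descent
        for name in sorted(filenames):
            # `filenames` holds every non-directory entry, not only
            # regular files; filter further if that matters
            yield os.path.join(dirpath, name)
```

Feeding the resulting list to a single sequential reader gives the
"all files in a directory, then its subdirectories" access pattern.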
