public inbox for linux-kernel@vger.kernel.org
From: Andrew Morton <akpm@zip.com.au>
To: dean gaudet <dean-list-linux-kernel@arctic.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: 3x slower file reading oddity
Date: Mon, 17 Jun 2002 16:26:57 -0700
Message-ID: <3D0E7041.860710CA@zip.com.au>
In-Reply-To: Pine.LNX.4.44.0206171246270.31265-100000@twinlark.arctic.org

dean gaudet wrote:
> 
> i was trying to study various cpu & i/o bottlenecks for a backup
> tool (rdiff-backup) and i stumbled into this oddity:
> 
> # time xargs -0 -n100 cat -- > /dev/null < /tmp/filelist
> 0.520u 5.310s 0:36.92 15.7%     0+0k 0+0io 11275pf+0w
> # time xargs -0 -n100 cat -- > /dev/null < /tmp/filelist
> 0.510u 5.090s 0:35.05 15.9%     0+0k 0+0io 11275pf+0w
> 
> # time xargs -0 -P2 -n100 cat -- > /dev/null < /tmp/filelist
> 0.500u 5.380s 1:30.51 6.4%      0+0k 0+0io 11275pf+0w
> # time xargs -0 -P2 -n100 cat -- > /dev/null < /tmp/filelist
> 0.420u 4.810s 1:36.73 5.4%      0+0k 0+0io 11275pf+0w
> 
> 3x slower with the two cats in parallel.

Note that the CPU time remained constant.  The wall time went up.
You did more seeking with the dual-thread approach.

It rather depends on what is in /tmp/filelist.  I assume it's
something like the output of `find'.  And I assume you're
using ext2 or ext3?

- ext2/3 will chop the filesystem up into 128-megabyte block groups.

- It attempts to place all the files in a directory into the same
  block group.

- It will explicitly place new directories into a different block group
  from their parent.
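
For reference, here is a rough sketch of where the 128-megabyte figure
comes from.  It assumes 4 KiB filesystem blocks and that one bitmap block
(one bit per block) describes one block group; other block sizes give a
different group size.

```shell
# Assumption: 4 KiB blocks, one block-bitmap block per block group.
block_size=4096
blocks_per_group=$(( block_size * 8 ))            # one bit per tracked block
group_bytes=$(( blocks_per_group * block_size ))  # bytes covered by one group
echo "$blocks_per_group blocks per group, $(( group_bytes / 1024 / 1024 )) MiB per group"
```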

And I suspect it's the latter point which has caught you out.  You have
two threads, and probably each thread's list of 100 files is from a
different directory.  And hence it lives in a different block group.
And hence your two threads are competing for the disk head.
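
A toy model of that competition, not a measurement: treat every read as
landing in some block group and add up how many groups the head crosses
between consecutive reads.  The function name head_travel and the group
numbers 10 and 50 are made up for illustration.

```shell
# Hypothetical helper: sum of |group[i] - group[i-1]| over a read sequence.
head_travel() {
    printf '%s\n' "$@" |
        awk 'NR > 1 { d = $1 - prev; if (d < 0) d = -d; total += d }
             { prev = $1 }
             END { print total + 0 }'
}

# One reader: finishes the directory in group 10, then the one in group 50.
serial=$(head_travel 10 10 10 10 50 50 50 50)

# Two readers: requests for the two directories alternate at the disk.
interleaved=$(head_travel 10 50 10 50 10 50 10 50)

echo "serial=$serial interleaved=$interleaved"
```

In this model the single reader crosses between the two groups once, while
the alternating pattern crosses on every request.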

Even increasing the elevator read latency won't help you here - we don't
perform inter-file readahead, so as soon as thread 1 blocks on a read,
it has *no* reads queued up and the other thread's requests are then 
serviced.

You'll get best throughput with a single read thread.  There are some
smarter readahead things we could do in there, but it tends to be
that device-level readahead fixes everything up anyway.

-
