public inbox for linux-kernel@vger.kernel.org
From: Phillip Susi <psusi@cfl.rr.com>
To: Linux-kernel <linux-kernel@vger.kernel.org>
Subject: Investigating poor du performance on ReiserFS
Date: Fri, 06 Oct 2006 10:56:57 -0400
Message-ID: <45266EB9.1080205@cfl.rr.com>

I've been looking into a performance problem on ReiserFS when
executing du on a rather large directory.  For reference, this is a
Maildir of lkml since Jan 1 of this year, which currently contains around
90,000 messages/files.  Executing du in this directory with cold
caches takes a horribly long time.  A find completes rather quickly, but
all the stat()s that du performs seem to take a very long time to read
in the required data (orders of magnitude longer than it should take
for the disks to read the amount of data transferred).

I believe this is due to a massive seek storm caused in the process of 
reading all of the leaf nodes to fetch the stat blocks.  I have surmised 
that this is due to the fact that the directory entries are sorted by 
their hash value, and that is the order they are returned to du in, 
which then performs a stat() on each one in sequence.  The problem is 
that the hash sort order has no relationship to the order of the leaf 
nodes that hold the stat info.  While the leaf nodes generally have 
keys, and thus block locations, close to the parent directory, the order 
they are accessed in is essentially random, which causes the seek storm.
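
To give a rough feel for the effect described above, here is a toy
model in Python.  It is only a sketch under loud assumptions: each
file's stat data is assumed to sit in its own block, the blocks are
assumed to be laid out in ascending key order, and a random shuffle
stands in for the hash-sorted order that readdir returns.  The names
total_seek_distance and simulate are made up for this illustration,
and the model says nothing about real seek times or caching.

```python
import random


def total_seek_distance(block_order):
    """Sum of absolute head movements when visiting blocks in this order."""
    return sum(abs(b - a) for a, b in zip(block_order, block_order[1:]))


def simulate(n_files=90_000, seed=0):
    """Return (distance in shuffled order, distance in block order).

    Assumption: file i has its stat data in block i, so sorting by key
    is the same as sorting by block number in this toy model.
    """
    rng = random.Random(seed)
    blocks = list(range(n_files))

    # Stand-in for hash order: unrelated to the on-disk block order.
    hash_order = blocks[:]
    rng.shuffle(hash_order)

    # Visiting the same blocks sorted by key/block number.
    key_order = sorted(hash_order)

    return total_seek_distance(hash_order), total_seek_distance(key_order)


if __name__ == "__main__":
    shuffled, ordered = simulate()
    print("head travel, hash (shuffled) order:", shuffled)
    print("head travel, key (sorted) order:   ", ordered)
```

In this model the sorted walk always moves the head n - 1 blocks in
total, while the shuffled walk moves it a distance that grows roughly
with the square of the number of files.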

Does this theory sound plausible?  How hard would it be to sort the 
directory listing by key before returning it?  Would doing that likely 
fix the problem by causing du to stat the files in the order of the leaf 
node keys, and thus, quite likely in the order that the blocks appear on 
disk?
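
The same idea can be tried from userspace without touching the
filesystem, as a minimal sketch: read the whole directory first, sort
the entries by inode number, and only then stat() them.  This is not
the kernel-side change asked about above.  It assumes that the inode
number reported by readdir tracks the leaf node key closely enough to
approximate on-disk order, which I have not verified for ReiserFS.
The function names are hypothetical, and the walk is deliberately flat
(no recursion into subdirectories).

```python
import os


def entries_in_inode_order(path):
    """Return (inode, path) pairs for a directory, sorted by inode number.

    On Unix, DirEntry.inode() comes from the directory entry itself,
    so building this list does not stat() any file.
    """
    with os.scandir(path) as it:
        entries = [(entry.inode(), entry.path) for entry in it]
    entries.sort()
    return entries


def disk_usage_bytes(path):
    """Sum allocated bytes for the entries directly inside path.

    Files are stat()ed in inode order rather than readdir order.
    st_blocks is counted in 512-byte units on Linux.
    """
    total = 0
    for _inode, entry_path in entries_in_inode_order(path):
        st = os.lstat(entry_path)
        total += st.st_blocks * 512
    return total


if __name__ == "__main__":
    print(disk_usage_bytes("."))
```

Comparing the cold-cache run time of this against plain du on the same
directory would be one way to check whether the ordering theory holds.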

