linux-fsdevel.vger.kernel.org archive mirror
From: Jamie Lokier <jamie@shareable.org>
To: Phillip Susi <psusi@cfl.rr.com>
Cc: Evgeniy Polyakov <zbr@ioremap.net>,
	linux-fsdevel@vger.kernel.org,
	Linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: readahead on directories
Date: Wed, 21 Apr 2010 21:01:04 +0100
Message-ID: <20100421200104.GT27575@shareable.org>
In-Reply-To: <4BCF509E.2040903@cfl.rr.com>

Phillip Susi wrote:
> On 4/21/2010 2:51 PM, Jamie Lokier wrote:
> > Fwiw, I found sorting directories by inode and reading them in that
> > order help to reduce seeks, some 10 years ago.  I implemented
> > something like 'find' which works like that, keeping a queue of
> > directories to read and things to open/stat, ordered by inode number
> > seen in d_ino before open/stat and st_ino after.  However it did not
> > try to readahead the blocks inside a directory, or sort operations by
> > block number.  It reduced some 'find'-like operations to about a
> > quarter of the time on cold cache.  I still use that program sometimes
> > before "git status" ;-)  Google "treescan" and "lokier" if you're
> > interested in trying it (though I use 0.7 which isn't published).
> 
> That helps with open()ing or stat()ing the files since you access the
> inodes in order, but ureadahead already preloads all of the inode tables
> so this won't help.

It helps a little with data access too, because block group locality
tends to follow inode numbers.  Don't read inodes and data in the same
batch though.
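
A minimal sketch of the inode-ordered approach, assuming Python on
Linux.  It is not the treescan code; the function names are made up
for illustration.  os.scandir() exposes d_ino from readdir(), so the
first pass can be ordered without any stat() call, and the data reads
are kept in a separate second pass ordered by st_ino:

```python
import os
import stat

def entries_by_inode(path):
    """Return (d_ino, full_path) pairs for one directory, sorted by d_ino."""
    with os.scandir(path) as it:
        return sorted((e.inode(), e.path) for e in it)

def stat_pass(path):
    """Pass 1: lstat() every entry in ascending d_ino order (inode reads only)."""
    return [(full, os.lstat(full)) for _ino, full in entries_by_inode(path)]

def data_pass(stats):
    """Pass 2: read regular files in ascending st_ino order (data reads only).

    Returns the total number of bytes read.
    """
    total = 0
    for full, st in sorted(stats, key=lambda pair: pair[1].st_ino):
        if stat.S_ISREG(st.st_mode):
            with open(full, "rb") as f:
                total += len(f.read())
    return total
```

Any speed-up depends on the filesystem's allocation policy and on the
cache being cold, so treat this as something to measure rather than a
guaranteed win.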

> >> it is not about readdir(). Plain read() is synchronous too. But
> >> filesystem can respond to readahead calls and read next block to current
> >> one, while it won't do this for next direntry.
> > 
> > I'm surprised it makes much difference, as directories are usually not
> > very large anyway.
> 
> That's just it; it doesn't help.  That's why I want to readahead() all
> of the directories at once instead of reading them one block at a time.

OK, this discussion has got a bit confused.  The text above refers to
needing to asynchronously read the next block in a directory, but if
directories are small then that's not important.

> > But if it does, go on, try FIEMAP and blockdev reading, you know you
> > want to :-)
> 
> Why reinvent the wheel when that's readahead()'s job?  As a workaround
> I'm about to try just threading all of the calls to open().

The FIEMAP suggestion applies only if you think you need to issue
reads for multiple blocks in the _same_ directory in parallel.  From
what you say, I doubt that's important.

FIEMAP is not relevant for reading different directories in parallel.
You'd still have to thread the FIEMAP calls for that - it's a
different problem.
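
For reference, a rough sketch of fetching a file's extent map with
the FIEMAP ioctl, assuming Python on Linux.  The struct layouts and
the request number follow <linux/fiemap.h> and <linux/fs.h>; the
helper names are hypothetical.  Whether the ioctl is accepted on a
directory depends on the filesystem, and issuing reads against the
block device for the returned ranges (which normally needs root) is
not shown:

```python
import fcntl
import os
import struct

FS_IOC_FIEMAP = 0xC020660B  # _IOWR('f', 11, struct fiemap)

# struct fiemap header: fm_start, fm_length, fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved
FIEMAP_HDR = struct.Struct("=QQIIII")
# struct fiemap_extent: fe_logical, fe_physical, fe_length,
# fe_reserved64[2], fe_flags, fe_reserved[3]
FIEMAP_EXT = struct.Struct("=QQQ2QI3I")

def build_request(max_extents, flags=0):
    """Build a request buffer covering the whole file."""
    hdr = FIEMAP_HDR.pack(0, 0xFFFFFFFFFFFFFFFF, flags, 0, max_extents, 0)
    return bytearray(hdr) + bytearray(FIEMAP_EXT.size * max_extents)

def parse_reply(buf):
    """Return (logical, physical, length) byte triples from a filled buffer."""
    mapped = FIEMAP_HDR.unpack_from(buf, 0)[3]
    out = []
    for i in range(mapped):
        ext = FIEMAP_EXT.unpack_from(buf, FIEMAP_HDR.size + i * FIEMAP_EXT.size)
        out.append((ext[0], ext[1], ext[2]))
    return out

def extents(path, max_extents=32):
    """Ask the kernel for up to max_extents extents of path."""
    buf = build_request(max_extents)
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # the kernel fills buf in place
    finally:
        os.close(fd)
    return parse_reply(buf)
```

This only maps one file per call, which is why it does nothing for
reading many different directories in parallel.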

> Each one will queue a read and block, but with them all doing so at
> once should fill the queue with plenty of reads.  It is inefficient,
> but better than one block at a time.

That was my first suggestion: threads with readdir().  I thought it
had been rejected, hence the further discussion.

(Actually I would use clone + open + getdirentries with a tiny
userspace stack to avoid using tons of memory.  But that's just a
tweak, only to be used if the threading is effective.)
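
The threads-with-readdir() idea can be sketched as follows, assuming
Python and a thread pool instead of the clone-based variant.  The
names list_dir and parallel_walk and the worker count of 32 are
hypothetical.  Each worker blocks on its own directory read, so many
reads can sit in the I/O queue at once:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def list_dir(path):
    """Read one directory; the calling thread blocks until the I/O completes."""
    try:
        with os.scandir(path) as it:
            # is_dir() normally uses d_type from readdir(), so no stat() is issued.
            return path, [(e.name, e.is_dir(follow_symlinks=False)) for e in it]
    except OSError:
        return path, []  # unreadable directory: treat as empty

def parallel_walk(root, workers=32):
    """Walk a tree level by level, reading each level's directories in parallel.

    Returns a dict mapping each directory path to its sorted entry names.
    """
    found = {}
    pending = [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            batch, pending = pending, []
            for path, entries in pool.map(list_dir, batch):
                found[path] = sorted(name for name, _ in entries)
                pending.extend(
                    os.path.join(path, name) for name, is_dir in entries if is_dir
                )
    return found
```

Walking one level at a time keeps the sketch short; a work queue that
starts on subdirectories as soon as they are discovered would keep the
I/O queue fuller, at the cost of more bookkeeping.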

-- Jamie
