Re: readahead on directories

From: Jamie Lokier <jamie@shareable.org>
To: Phillip Susi <psusi@cfl.rr.com>
Cc: Evgeniy Polyakov <zbr@ioremap.net>,
	linux-fsdevel@vger.kernel.org,
	Linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: readahead on directories
Date: Wed, 21 Apr 2010 21:37:21 +0100
Message-ID: <20100421203721.GW27575@shareable.org>
In-Reply-To: <4BCF5C87.8060509@cfl.rr.com>

Phillip Susi wrote:
> On 4/21/2010 4:01 PM, Jamie Lokier wrote:
> > Ok, this discussion has got a bit confused.  Text above refers to
> > needing to asynchronously read next block in a directory, but if they
> > are small then that's not important.
> 
> It is very much important since if you read each small directory one
> block at a time, it is very slow.  You want to queue up reads to all of
> them at once so they can be batched.

I don't understand what you are saying at this point.  Or you don't
understand what I'm saying.  Or I didn't understand what Evgeniy was
saying :-)

Small directories don't _have_ next blocks; this is not a problem for
them.  And you've explained that filesystems of interest already fetch
readahead_size in larger directories, so they don't have the "next
block" problem either.

> > That was my first suggestion: threads with readdir(); I thought it had
> > been rejected hence the further discussion.
> 
> Yes, it was sort of rejected, which is why I said it's just a workaround
> for now until readahead() works on directories.  It will produce the
> desired IO pattern but at the expense of ram and cpu cycles creating a
> bunch of short lived threads that go to sleep almost immediately after
> being created, and exit when they wake up.  readahead() would be much
> more efficient.

Some test results comparing AIO with kernel threads indicate that
threads are more efficient than you might expect for this.  Especially
in the cold I/O cache cases.  readahead() has to do a lot of the same
work, in a different way and with less opportunity to parallelise the
metadata stage.

clone() threads with tiny stacks (you can even preallocate the stacks,
and they can be smaller than a page) aren't especially slow or big,
and ideally you'll use *long-lived* threads with an efficient
multi-consumer queue that they pull requests from, written to by the
main program and kept full enough to avoid blocking the threads.
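
A minimal sketch of that arrangement, under simplifying assumptions: the
"queue" here is just a pre-filled array of paths with a mutex-protected
cursor rather than a queue the main program keeps topping up, and the
threads are ordinary pthreads rather than raw clone() threads with tiny
stacks.  The names dir_queue, dir_worker and prefetch_dirs are made up
for illustration.

```c
#include <assert.h>
#include <dirent.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical work queue: a fixed array of paths plus a shared cursor. */
struct dir_queue {
    const char **paths;
    size_t count;
    size_t next;            /* index of the next unclaimed path */
    size_t done;            /* directories read to the end */
    pthread_mutex_t lock;
};

/* Worker: claim paths until none are left.  Each directory is read to
 * the end and the entries are discarded; the only goal is to pull the
 * directory blocks into the cache. */
static void *dir_worker(void *arg)
{
    struct dir_queue *q = arg;

    for (;;) {
        pthread_mutex_lock(&q->lock);
        if (q->next >= q->count) {
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        const char *path = q->paths[q->next++];
        pthread_mutex_unlock(&q->lock);

        DIR *d = opendir(path);
        if (!d)
            continue;       /* unreadable directory: skip it */
        while (readdir(d) != NULL)
            ;               /* discard every entry */
        closedir(d);

        pthread_mutex_lock(&q->lock);
        q->done++;
        pthread_mutex_unlock(&q->lock);
    }
}

/* Start up to nthreads workers (capped at 64), wait for all of them,
 * and return how many directories were read successfully. */
size_t prefetch_dirs(const char **paths, size_t count, int nthreads)
{
    struct dir_queue q = { paths, count, 0, 0 };
    pthread_t tid[64];
    int started = 0;

    assert(paths != NULL || count == 0);
    pthread_mutex_init(&q.lock, NULL);
    if (nthreads > 64)
        nthreads = 64;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tid[started], NULL, dir_worker, &q) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&q.lock);
    return q.done;
}
```

The workers block in the kernel independently, so several directory
reads can be outstanding at once.  A real version would keep the threads
alive across batches and wake them with a condition variable or similar
when the main program adds paths.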

Also since you're discarding the getdirentries() data, you can read
all of it into the same memory for hot cache goodness.  (One per CPU
please.)
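
A rough sketch of the reused-buffer idea, assuming Linux and the raw
getdents64 system call (the modern counterpart of getdirentries()).  The
helper name drain_dir is hypothetical; each worker thread would own one
buffer and pass it to every call, which approximates "one per CPU".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Read every entry of the directory at path into buf.  The buffer is
 * overwritten on each pass because the entries are never inspected;
 * reusing it keeps the same few cache lines hot.  Returns the total
 * number of bytes of directory entries read, or -1 on error. */
long drain_dir(const char *path, char *buf, size_t len)
{
    long total = 0, n;
    int fd;

    assert(buf != NULL && len > 0);
    fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    /* getdents64 returns 0 at end of directory, -1 on error. */
    while ((n = syscall(SYS_getdents64, fd, buf, len)) > 0)
        total += n;
    close(fd);
    return n < 0 ? -1 : total;
}
```

A buffer of a few tens of kilobytes (say 32 KiB) is enough to hold at
least one entry on any filesystem, which is all the call needs in order
to make progress.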

I don't know what performance that'll get you, but I think it'll be
faster than you are expecting - *if* the directory locking is
sufficiently scalable at this point.  That's an unknown.

Try it with files if you want to get a comparative picture.

-- Jamie
