linux-fsdevel.vger.kernel.org archive mirror
From: Jamie Lokier <jamie@shareable.org>
To: Phillip Susi <psusi@cfl.rr.com>
Cc: linux-fsdevel@vger.kernel.org,
	Linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: readahead on directories
Date: Wed, 21 Apr 2010 17:12:11 +0100
Message-ID: <20100421161211.GC27575@shareable.org>
In-Reply-To: <4BCF123C.6010400@cfl.rr.com>

Phillip Susi wrote:
> On 4/20/2010 8:44 PM, Jamie Lokier wrote:
> > readahead() doesn't make much sense on a directory - the offset and
> > size aren't meaningful.
> > 
> > But does plain opendir/readdir/closedir solve the problem?
> 
> No, since those are synchronous.  I want to have readahead() queue up
> reading the entire directory in the background to avoid blocking, and
> get the queue filled with a bunch of requests that can be merged into
> larger segments before being dispatched to the hardware.

Asynchrony is available: use clone() or pthreads.

More broadly: one way to get better I/O sorting is to make sure
you've got enough things in parallel that the I/O queue is never
empty, so what you issue has time to get sorted before it reaches the
head of the queue for dispatch.  On the other hand, you don't want so
many things in parallel that the queues fill up and throttle.  Unfortunately it
only works if things aren't serialised by kernel locks - but there's been
a lot of work on lockless this and that in the kernel, which may help.
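A minimal sketch of the pthreads approach, assuming the goal is simply to warm the cache for a known list of directories: a small, fixed pool of threads each open a directory and read it to the end, discarding the entries, so several directory reads are outstanding at once and the I/O scheduler has something to sort. The names `warm_dirs`, `worker` and `MAX_THREADS` are hypothetical, and the thread count is an assumption to tune against the trade-off described in the paragraph above (an empty queue versus one full enough to throttle).

```c
#include <dirent.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64   /* hypothetical cap for this sketch */

/* Shared work list: directory paths handed out one at a time. */
struct work {
    char **paths;
    size_t count;
    size_t next;            /* index of the next path to hand out */
    long done;              /* directories read to the end */
    pthread_mutex_t lock;
};

/* Each worker takes paths until the list is empty and reads every
 * entry, discarding it: the point is only to keep reads in flight. */
static void *worker(void *arg)
{
    struct work *w = arg;
    for (;;) {
        pthread_mutex_lock(&w->lock);
        size_t i = w->next < w->count ? w->next++ : w->count;
        pthread_mutex_unlock(&w->lock);
        if (i == w->count)
            return NULL;    /* no work left */

        DIR *d = opendir(w->paths[i]);
        if (d == NULL)
            continue;       /* skip unreadable directories */
        while (readdir(d) != NULL)
            ;               /* entries are discarded */
        closedir(d);

        pthread_mutex_lock(&w->lock);
        w->done++;
        pthread_mutex_unlock(&w->lock);
    }
}

/* Read all directories with at most nthreads concurrent readers.
 * Returns the number of directories read, or -1 if no thread started. */
long warm_dirs(char **paths, size_t count, int nthreads)
{
    struct work w = { .paths = paths, .count = count,
                      .lock = PTHREAD_MUTEX_INITIALIZER };
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&tids[t], NULL, worker, &w) != 0)
            break;          /* carry on with the threads that did start */
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    return started > 0 ? w.done : -1;
}
```

Each thread owns its own DIR stream, so readdir() needs no extra locking; the only shared state is the work index and the counter.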

Back to your problem: You need a bunch of scattered block requests to
be queued and sorted sanely, and readdir doesn't do that, and even
waits for each block before issuing the next request.

Or does it?

A quick skim of fs/{ext3,ext4}/dir.c finds a call to
page_cache_sync_readahead.  Doesn't that do any reading ahead? :-)

> I don't actually care to have the contents of the
> directories returned, so readdir() does more than I need in that
> respect, and also it performs a blocking read of one disk block at a
> time, which is horribly slow with a cold cache.

I/O is probably the biggest cost, so it's more important to get the
I/O pattern you want than to worry about return values you'll discard.

If readdir() is slowed by the sheer number of calls and libc overhead,
consider calling the underlying getdents64 system call directly
(getdirentries(3) is a libc wrapper around it).

If not, fs/ext4/namei.c:ext4_dir_inode_operations points to
ext4_fiemap.  So you may have luck calling FIEMAP or FIBMAP on the
directory, and then reading blocks using the block device.  I'm not
sure if the cache loaded via the block device (when mounted) will then
be used for directory lookups.

-- Jamie

Thread overview: 34+ messages
2010-04-19 15:51 readahead on directories Phillip Susi
2010-04-21  0:44 ` Jamie Lokier
2010-04-21 14:57   ` Phillip Susi
2010-04-21 16:12     ` Jamie Lokier [this message]
2010-04-21 18:10       ` Phillip Susi
2010-04-21 20:22         ` Jamie Lokier
2010-04-21 20:59           ` Phillip Susi
2010-04-21 22:06             ` Jamie Lokier
2010-04-22  7:01               ` Brad Boyer
2010-04-22 14:26               ` Phillip Susi
2010-04-22 17:53                 ` Jamie Lokier
2010-04-22 19:23                   ` Phillip Susi
2010-04-22 20:35                     ` Jamie Lokier
2010-04-22 21:22                       ` Phillip Susi
2010-04-22 22:43                         ` Jamie Lokier
2010-04-23  4:13                           ` Phillip Susi
2010-04-21 18:38       ` Evgeniy Polyakov
2010-04-21 18:51         ` Jamie Lokier
2010-04-21 18:56           ` Evgeniy Polyakov
2010-04-21 20:02             ` Jamie Lokier
2010-04-21 20:21               ` Evgeniy Polyakov
2010-04-21 20:39                 ` Jamie Lokier
2010-04-21 19:23           ` Phillip Susi
2010-04-21 20:01             ` Jamie Lokier
2010-04-21 20:13               ` Phillip Susi
2010-04-21 20:37                 ` Jamie Lokier
2010-05-07 13:38 ` unified page and buffer cache? (was: readahead on directories) Phillip Susi
2010-05-07 13:53   ` Matthew Wilcox
2010-05-07 15:45     ` unified page and buffer cache? Phillip Susi
2010-05-07 18:30       ` Matthew Wilcox
2010-05-08  0:50         ` Phillip Susi
2010-05-08  0:46       ` tytso
2010-05-08  0:54         ` Phillip Susi
2010-05-08 12:52           ` tytso