Re: readahead on directories

From: Phillip Susi <psusi@cfl.rr.com>
To: Jamie Lokier <jamie@shareable.org>
Cc: linux-fsdevel@vger.kernel.org,
	Linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: readahead on directories
Date: Thu, 22 Apr 2010 15:23:55 -0400
Message-ID: <4BD0A24B.4060209@cfl.rr.com>
In-Reply-To: <20100422175322.GE6265@shareable.org>

On 4/22/2010 1:53 PM, Jamie Lokier wrote:
> Right, but finding those blocks is highly filesystem-dependent which
> is why making it a generic feature would need support in each filesystem.

It already exists, it's called ->get_blocks().  That's how readahead()
figures out which blocks need to be read.

> support FIEMAP on directories should work.  We're back to why not do
> it yourself then, as very few programs need directory readahead.

Because there's already a system call to accomplish that exact task; why
reinvent the wheel?
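
For reference, a minimal sketch of what calling that system call on a
regular file looks like from user space.  The helper name prefetch_file()
is made up for illustration, and this assumes Linux with glibc; it is
not taken from any existing tool.

```c
#define _GNU_SOURCE           /* readahead() is a Linux-specific extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to pull a whole regular file into
 * the page cache.  Returns 0 on success, -1 on error with errno set. */
int prefetch_file(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    int rc = fstat(fd, &st);
    if (rc == 0)
        rc = readahead(fd, 0, (size_t)st.st_size) < 0 ? -1 : 0;

    int saved = errno;        /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return rc;
}
```

A caller would use it as prefetch_file("/some/file") and check for -1.
Per the man page, readahead() fails with EINVAL when the descriptor is
not a file type it can be applied to.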

> If you're interested, try finding all the places which could sleep for
> a write() call...  Note that POSIX requires a mutex for write; you
> can't easily change that.  Reading is easier to make fully async than
> writing.

POSIX doesn't say anything about how write() must be implemented
internally.  You can do without mutexes just fine.  A good deal of the
current code does use mutexes, but does not have to.  If your data is
organized well then the critical sections of code that modify it can be
kept very small, and guarded with either atomic access functions or a
spin lock.  A mutex is more convenient since it allows you to have
much larger critical sections and to sleep, but we don't really like
having coarse-grained locking in the kernel.
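
To illustrate the idea of a very small critical section, here is a
user-space analogy using C11 atomics.  It is only a sketch: the names
tiny_lock, file_pos and reserve_range() are invented for this example,
and the kernel has its own spinlock and atomic primitives rather than
<stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical minimal spin lock built on a C11 atomic flag. */
struct tiny_lock { atomic_flag held; };

static void tiny_lock_acquire(struct tiny_lock *l)
{
    /* Busy-wait; acceptable only because the guarded section is tiny. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void tiny_lock_release(struct tiny_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical per-file state: current append offset and file length. */
struct file_pos {
    struct tiny_lock lock;
    long offset;
    long length;
};

void file_pos_init(struct file_pos *p)
{
    atomic_flag_clear(&p->lock.held);
    p->offset = 0;
    p->length = 0;
}

/* Reserve n bytes for a writer and return where its range starts.
 * The critical section is a few loads and stores: no sleeping, no I/O. */
long reserve_range(struct file_pos *p, long n)
{
    assert(n >= 0);

    tiny_lock_acquire(&p->lock);
    long start = p->offset;
    p->offset += n;
    if (p->offset > p->length)
        p->length = p->offset;
    tiny_lock_release(&p->lock);

    return start;   /* the data copy itself happens outside the lock */
}
```

The point of the design is that only the bookkeeping is serialized; the
slow part of the work is done after the lock has been dropped.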

> Then readahead() isn't async, which was your request...  It can block
> waiting for memory and other things when you call it.

It doesn't have to block; it can return -ENOMEM or -EWOULDBLOCK.

> Exactly.  And making it so it _never_ blocks when called is a ton of
> work, more lines of code (in C anyway), a maintainability nightmare,
> and adds some different bottlenecks you've not thought of.  At this
> point I suggest you look up the 2007 discussions about fibrils which
> are quite good: They cover the overheads of setting up state for async
> calls when unnecessary, and the beautiful simplicity of treating stack
> frames as states in their own right.

Sounds like an interesting compromise.  I'll look it up.

> No: In that particular case, waiting while the indirect block is
> parsed is advantageous.  But suppose the first indirect block is
> located close to the second file's data blocks.  Or the second file's
> data blocks are on a different MD backing disk.  Or the disk has
> different seeking characteristics (flash, DRBD).

Hrm... true, so knowing this, defrag could lay out the indirect block of
the first file after the first 12 blocks of the second file to maintain
optimal reading.  Hrm... I might have to try that.

> I reckon the same applies to your readahead() calls: A queue which you
> make sure is always full enough that threads never block, sorted by
> inode number or better hints where known, with a small number of
> threads calling readahead() for files, and doing whatever is useful
> for directories.

Yes, and ureadahead already orders the calls to readahead() based on
disk block order.  Multithreading it leads to the problem with backward
seeks right now, but a tweak to the way defrag lays out the indirect
blocks should fix that.  The more I think about it, the better this idea
sounds.
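
A rough sketch of the ordering step, for illustration only.  It sorts
by inode number, which is just the cheap proxy suggested in the quoted
text; ordering by real disk block position would need the block mapping
and is not shown.  The names ra_item, ra_prepare() and ra_issue() are
invented here, and this is not ureadahead's code.

```c
#define _GNU_SOURCE           /* for readahead() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical work item: one file to prefetch. */
struct ra_item {
    const char *path;
    ino_t ino;
    off_t size;
};

/* qsort() comparator: ascending inode number. */
static int by_inode(const void *a, const void *b)
{
    const struct ra_item *x = a, *y = b;
    return (x->ino > y->ino) - (x->ino < y->ino);
}

/* Stat every path, keep only regular files (packed at the front of the
 * array), and sort those by inode number.  Returns how many were kept. */
size_t ra_prepare(struct ra_item *items, size_t n)
{
    assert(items != NULL || n == 0);

    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        struct stat st;
        if (stat(items[i].path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;
        items[kept].path = items[i].path;
        items[kept].ino = st.st_ino;
        items[kept].size = st.st_size;
        kept++;
    }
    qsort(items, kept, sizeof *items, by_inode);
    return kept;
}

/* Issue readahead() for each prepared item in sorted order.
 * Returns the number of calls that succeeded. */
size_t ra_issue(const struct ra_item *items, size_t n)
{
    size_t ok = 0;
    for (size_t i = 0; i < n; i++) {
        int fd = open(items[i].path, O_RDONLY);
        if (fd < 0)
            continue;
        if (readahead(fd, 0, (size_t)items[i].size) == 0)
            ok++;
        close(fd);
    }
    return ok;
}
```

A caller would fill in the path fields, call ra_prepare(), and then
hand the sorted prefix to ra_issue(), possibly from a small number of
threads each taking a contiguous slice.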
