From: Andreas Dilger <adilger@sun.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH 3/3] ext4: Use readahead when reading an inode from the inode table
Date: Wed, 01 Oct 2008 19:17:46 -0700 [thread overview]
Message-ID: <20081002021746.GL3160@webber.adilger.int> (raw)
In-Reply-To: <20080928042749.GA8711@mit.edu>
On Sep 28, 2008 00:27 -0400, Theodore Ts'o wrote:
> ext4: Use readahead when reading an inode from the inode table
>
> With modern hard drives, reading 64k takes roughly the same time as
> reading a 4k block. So request readahead for adjacent inode table
> blocks to reduce the time it takes when iterating over directories
> (especially when doing this in htree sort order) in a cold cache case.
> With this patch, the time it takes to run "git status" on a kernel
> tree after flushing the caches via "echo 3 > /proc/sys/vm/drop_caches"
> is reduced by 21%.
I'd actually thought that having a tunable in units of "kB" is better
than blocks, since userspace shouldn't have to know the filesystem
block size to tune readahead for a device. Depending on the block size
this tunable can vary by 64x the amount of readahead (1kB vs. 64kB blocks).
> @@ -3969,6 +3934,36 @@ static int __ext4_get_inode_loc(struct inode *inode,
>
> make_io:
> /*
> + * If we need to do any I/O, try to readahead up to 16
> + * blocks from the inode table.
Comment is out of date.
> + if (EXT4_SB(sb)->s_inode_readahead_blks) {
> + /* Make sure s_inode_readahead_blks is a power of 2 */
> + while (EXT4_SB(sb)->s_inode_readahead_blks &
> + (EXT4_SB(sb)->s_inode_readahead_blks-1))
> + EXT4_SB(sb)->s_inode_readahead_blks =
> + (EXT4_SB(sb)->s_inode_readahead_blks &
> + (EXT4_SB(sb)->s_inode_readahead_blks-1));
Is there a good reason why the readahead blocks is a power of 2? Given
that the blocks are likely NOT contiguous for a directory, nor are they
aligned to the underlying LUN offsets, I don't think this is a benefit.
In any case, any tweaking of s_inode_readahead_blks should probably be
done at the time it is set instead of each time an inode is read.
> + ext4_error(sb, "ext4_get_inode_loc",
s/ext4_get_inode_loc/__func__/?
> + case Opt_inode_readahead_blks:
> + if (option < 0 || option > 31)
> + return 0;
> + sbi->s_inode_readahead_blks = option;
This would appear to limit the inode_readahead_blks to 31 blocks, yet the
default is 32? I suspect this is left over from when it was a shift?
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
next prev parent reply other threads:[~2008-10-02 2:17 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-09-24 16:53 [PATCH 1/3] ext4: move /proc setup and teardown out of mballoc.c Theodore Ts'o
2008-09-24 16:53 ` [PATCH 2/3] ext4: Combine proc file handling into a single set of functions Theodore Ts'o
2008-09-24 16:53 ` [PATCH 3/3] ext4: Use readahead when reading an inode from the inode table Theodore Ts'o
2008-09-26 8:21 ` Andreas Dilger
2008-09-26 13:47 ` Eric Sandeen
2008-09-28 4:27 ` Theodore Tso
2008-10-02 2:17 ` Andreas Dilger [this message]
2008-10-03 3:56 ` Theodore Tso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081002021746.GL3160@webber.adilger.int \
--to=adilger@sun.com \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.