From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 5/6] libext2fs/e2fsck: provide routines to read-ahead metadata
Date: Mon, 11 Aug 2014 13:50:19 -0700
Message-ID: <20140811205019.GB1695@birch.djwong.org>
In-Reply-To: <20140811201030.GH6553@thunk.org>

On Mon, Aug 11, 2014 at 04:10:30PM -0400, Theodore Ts'o wrote:
> On Mon, Aug 11, 2014 at 11:55:32AM -0700, Darrick J. Wong wrote:
> > I was expecting 16 groups (32M readahead) to win, but as the observations in my
> > spreadsheet show, 2MB tends to win. I _think_ the reason is that if we
> > encounter indirect map blocks or ETB blocks, they tend to be fairly close to
> > the file blocks in the block group, and if we're trying to do a large readahead
> > at the same time, we end up with a largeish seek penalty (half the flexbg on
> > average) for every ETB/map block.
>
> Hmm, that might be an argument for not trying to increase the flex_bg
> size, since we want to keep seek distances within a flex_bg to be
> dominated by settling time, and not by the track-to-track
> acceleration/coasting/deceleration time.

It might not be too bad a regression, since the distance between tracks has
gotten shorter and the cylinders themselves have gotten bigger. I suppose
you'd have to test a variety of flexbg sizes against a disk from, say, five
years ago. If you know the sizes of the files you'll be storing at mkfs time
(as with the mk_hugefiles.c options), then increasing the flexbg size to avoid
fragmentation is probably fine.

But yes, I was sort of enjoying how stuff within a flexbg gets (sort of) faster
as disks get bigger. :)
> > I figured out what was going on with the 1TB SSD -- it has a huge RAM cache big
> > enough to store most of the metadata. At that point, reads are essentially
> > free, but readahead costs us ~1ms per fadvise call.
>
> Do we understand why fadvise() takes 1ms? Is that something we can fix?
>
> And readahead(2) was even worse, right?

From the readahead(2) man page:

  "readahead() blocks until the specified data has been read."

The fadvise time is pretty consistently 1ms, but with readahead you have to
wait for it to read everything off the disk. That's fine for threaded
readahead, but for our single-thread readahead it's not much better than
regular blocking reads. Letting the kernel do the readahead in the background
is way faster.
I don't know why fadvise takes so long. I'll ftrace it to see where it goes.
--D
>
> - Ted