linux-ext4.vger.kernel.org archive mirror
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 5/6] libext2fs/e2fsck: provide routines to read-ahead metadata
Date: Mon, 11 Aug 2014 11:55:32 -0700
Message-ID: <20140811185532.GA1695@birch.djwong.org>
In-Reply-To: <20140811183258.GF6553@thunk.org>

On Mon, Aug 11, 2014 at 02:32:58PM -0400, Theodore Ts'o wrote:
> On Mon, Aug 11, 2014 at 11:05:09AM -0700, Darrick J. Wong wrote:
> > 
> > Using the bitmap turns out to be pretty quick (~130us to start RA for 4 groups
> > vs. ~70us per group if I issue the RA directly).  Each fadvise call seems to
> > cost us ~1ms, so I'll keep using the bitmap to minimize the number of fadvise
> > calls, since it's also a lot less code.
> 
> 4 groups?  Since the default flex_bg size is 16 block groups, I would
> have expected that you would want to start RA every 16 groups.

I was expecting 16 groups (32MB readahead) to win, but as the observations in my
spreadsheet show, 2MB tends to win.  I _think_ the reason is that if we
encounter indirect map blocks or ETBs, they tend to be fairly close to
the file blocks in the block group, and if we're trying to do a large readahead
at the same time, we end up with a largish seek penalty (half the flexbg on
average) for every ETB/map block.

I figured out what was going on with the 1TB SSD -- it has a huge RAM cache big
enough to store most of the metadata.  At that point, reads are essentially
free, but readahead costs us ~1ms per fadvise call.  If you use an RA buffer
that's big enough that there aren't many fadvise calls, then you still come out
ahead (ditto if you shove the RA into a separate thread), but otherwise the
fadvise calls add up, badly.
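
To illustrate why fewer, larger fadvise calls help when each call has a roughly
fixed cost, here is a minimal sketch.  It is not the code from the patch: it
merges a sorted array of block numbers into contiguous runs rather than walking
a bitmap, and the names ra_extent, ra_coalesce, and ra_issue are hypothetical.
Only posix_fadvise() and POSIX_FADV_WILLNEED are real interfaces.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise() */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* One contiguous run of blocks to read ahead (hypothetical type). */
struct ra_extent {
	uint64_t start;   /* first block of the run */
	uint64_t count;   /* number of blocks in the run */
};

/*
 * Merge a sorted list of block numbers into contiguous runs, so that each
 * run needs only one fadvise call.  Duplicates are ignored.  At most
 * max_out runs are written to out; the number written is returned.
 */
size_t ra_coalesce(const uint64_t *blocks, size_t nblocks,
		   struct ra_extent *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < nblocks; i++) {
		if (n > 0) {
			uint64_t end = out[n - 1].start + out[n - 1].count;

			if (blocks[i] < end)
				continue;            /* duplicate block */
			if (blocks[i] == end) {
				out[n - 1].count++;  /* extends current run */
				continue;
			}
		}
		if (n == max_out)
			break;                       /* output array is full */
		out[n].start = blocks[i];
		out[n].count = 1;
		n++;
	}
	return n;
}

/*
 * Issue one POSIX_FADV_WILLNEED hint per run.  posix_fadvise() returns 0 on
 * success or an error number (it does not set errno).
 */
int ra_issue(int fd, const struct ra_extent *ext, size_t n,
	     unsigned int blocksize)
{
	for (size_t i = 0; i < n; i++) {
		int err = posix_fadvise(fd,
					(off_t)(ext[i].start * blocksize),
					(off_t)(ext[i].count * blocksize),
					POSIX_FADV_WILLNEED);
		if (err)
			return err;
	}
	return 0;
}
```

With the ~1ms per call figure above, seven scattered block requests that merge
into three runs would cost roughly 3ms of fadvise overhead instead of 7ms.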

Actually, I'd considered using a default of flexbg_size * itable_size, but (a)
the USB results are pretty bad for 32MB vs. 2MB, and (b) I was thinking that 2MB
of readahead might be small enough that we could enable it by default without
having to worry about the ill effects of parallel e2fsck runs.
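
For reference, the flexbg_size * itable_size arithmetic can be sketched as
below.  The helper names are hypothetical, and the sample values (8192 inodes
per group, 256-byte inodes) are assumed typical mke2fs defaults for 4KiB
blocks, not numbers taken from this thread; only the flex_bg size of 16 groups
and the 32MB total come from the discussion above.

```c
#include <stdint.h>

/* Bytes of inode table in a single block group (hypothetical helper). */
uint64_t itable_bytes_per_group(uint32_t inodes_per_group,
				uint32_t inode_size)
{
	return (uint64_t)inodes_per_group * inode_size;
}

/*
 * Bytes of inode table in one flex_bg.  The flex_bg size is passed as a
 * log2 value, so 4 means 16 groups per flex_bg.
 */
uint64_t flexbg_itable_bytes(unsigned int log_groups_per_flex,
			     uint32_t inodes_per_group,
			     uint32_t inode_size)
{
	return itable_bytes_per_group(inodes_per_group, inode_size)
		<< log_groups_per_flex;
}
```

Under those assumptions, flexbg_itable_bytes(4, 8192, 256) gives 16 groups of
2MiB each, i.e. the 32MB readahead candidate that is being compared against 2MB.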

A logical next step might be to do ETB/block map readahead, but let's keep it
simple for now.  I should have time to update the spreadsheet to reflect
performance of the new bitmap code while I go mess with fixing the jbd2
problems.

> (And BTW, I've been wondering whether we should increase the flex_bg
> size for bigger file systems.  By the time we get to 4TB disks, having
> a flex_bg every 2GB seems a little small.)

:)

--D
> 
> 						- Ted

Thread overview: 24+ messages
2014-08-09  4:26 [PATCH 0/6] e2fsprogs Summer 2014 patchbomb, part 5 Darrick J. Wong
2014-08-09  4:26 ` [PATCH 1/6] libext2fs: create inlinedata symlinks Darrick J. Wong
2014-08-24 16:15   ` Theodore Ts'o
2014-08-09  4:26 ` [PATCH 2/6] misc: fix gcc warnings Darrick J. Wong
2014-08-24 16:24   ` Theodore Ts'o
2014-08-09  4:26 ` [PATCH 3/6] mke2fs: set block_validity as a default mount option Darrick J. Wong
2014-08-24 22:47   ` Theodore Ts'o
2014-08-25 15:52     ` Darrick J. Wong
2014-08-25 16:36       ` [PATCH] ext4: enable block_validity by default Darrick J. Wong
2014-09-02  2:02         ` Theodore Ts'o
2014-08-09  4:26 ` [PATCH 4/6] ext2fs: add readahead method to improve scanning Darrick J. Wong
2014-08-09  4:26 ` [PATCH 5/6] libext2fs/e2fsck: provide routines to read-ahead metadata Darrick J. Wong
2014-08-11  5:21   ` Darrick J. Wong
2014-08-11  6:24     ` Theodore Ts'o
2014-08-11  6:31       ` Darrick J. Wong
2014-08-11 14:34         ` Theodore Ts'o
2014-08-11 18:05           ` Darrick J. Wong
2014-08-11 18:32             ` Theodore Ts'o
2014-08-11 18:55               ` Darrick J. Wong [this message]
2014-08-11 20:10                 ` Theodore Ts'o
2014-08-11 20:50                   ` Darrick J. Wong
2014-08-09  4:26 ` [PATCH 6/6] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
2014-08-09  5:53 ` [PATCH 0/6] e2fsprogs Summer 2014 patchbomb, part 5 Theodore Ts'o
2014-08-09  5:59   ` Darrick J. Wong
