From: Eric Sandeen <sandeen@redhat.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
Zheng Liu <gnehzuil.liu@gmail.com>
Subject: Re: [PATCH 0/5 v2] add extent status tree caching
Date: Thu, 18 Jul 2013 13:35:24 -0500
Message-ID: <51E8356C.9030603@redhat.com>
In-Reply-To: <1373987883-4466-1-git-send-email-tytso@mit.edu>
On 7/16/13 10:17 AM, Theodore Ts'o wrote:
> In addition to fixing a few bugs and addressing review comments, we now
> add a new ioctl, EXT4_IOC_PRECACHE_EXTENTS, which forces all of the
> extents in an inode to be cached in the extents status tree, and marks
> them to be preferentially protected when under memory pressure.
>
> This is critically important when using AIO to a preallocated file,
> since if we need to read in blocks from the extent tree, the
> io_submit(2) system call becomes synchronous, which is rather rude to
> applications which were expecting the AIO to be "A".
>
> As a bonus, using the extent status tree to store the logical to
> physical block mapping is usually more compact than having to keep one
> or more extent tree blocks in the buffer cache.
>
> (Should we do this all the time, instead of when the application
> explicitly requests it? Maybe; there could be cases with very large,
> fragmented files accessed by an application such as "file" that only needs
> to look at a small subset of the file, where this could result in
> unnecessary work and memory allocation. OTOH, 95%+ of the time this
> would probably be a win...)
I'd say yes, we should - maybe not in all cases but if you need it for
AIO, try to make it "all the time" at least for that AIO?
We keep telling application writers not to assume certain things about
various filesystems, or to write applications that treat ext4 differently
than ext3 differently than xfs etc...
This goes the other way.
In the end who (besides google?) is really going to call this IOCTL?
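For concreteness, a caller would presumably have to do something like the
sketch below. It assumes the request number _IO('f', 18), which is what later
mainline kernels define for EXT4_IOC_PRECACHE_EXTENTS; the value in this v2
series may differ. The helper name precache_extents() is made up for
illustration.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Assumption: request number as defined in later mainline kernels;
 * the v2 series under discussion may use a different value. */
#ifndef EXT4_IOC_PRECACHE_EXTENTS
#define EXT4_IOC_PRECACHE_EXTENTS _IO('f', 18)
#endif

/* Hypothetical helper: ask ext4 to load every extent of fd's inode
 * into the extent status tree before any AIO is submitted.
 * Returns 0 on success or -errno (for example -ENOTTY on a
 * filesystem that does not implement the ioctl). */
int precache_extents(int fd)
{
	if (ioctl(fd, EXT4_IOC_PRECACHE_EXTENTS) < 0)
		return -errno;
	return 0;
}
```

An application would call this once after open(2) and before its first
io_submit(2), and would need to tolerate the error return on every
filesystem other than ext4, which is the per-filesystem special-casing
described above.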
I wondered if only doing this when files are opened O_DIRECT might make
sense, but Jeff Moyer pointed out that giant databases probably don't
want to read in their entire block mapping tree - OTOH, they probably use
preallocation if they're smart, and maybe it's not that bad.
Or what about tying this into POSIX_FADV_WILLNEED? Hohum, that gets
into force_page_cache_readahead(). We need POSIX_FADV_WILLNEED_META...
-Eric
>
> Theodore Ts'o (5):
> ext4: refactor code to read the extent tree block
> ext4: print the block number of invalid extent tree blocks
> ext4: use unsigned int for es_status values
> ext4: cache all of an extent tree's leaf block upon reading
> ext4: add new ioctl EXT4_IOC_PRECACHE_EXTENTS
>
> fs/ext4/ext4.h | 19 +++-
> fs/ext4/extents.c | 259 +++++++++++++++++++++++++++++---------------
> fs/ext4/extents_status.c | 52 ++++++++-
> fs/ext4/extents_status.h | 50 +++++----
> fs/ext4/inode.c | 6 +-
> fs/ext4/ioctl.c | 3 +
> fs/ext4/migrate.c | 2 +-
> fs/ext4/move_extent.c | 2 +-
> include/trace/events/ext4.h | 28 +++--
> 9 files changed, 296 insertions(+), 125 deletions(-)
>
Thread overview: 32+ messages
2013-07-16 15:17 [PATCH 0/5 v2] add extent status tree caching Theodore Ts'o
2013-07-16 15:17 ` [PATCH 1/5] ext4: refactor code to read the extent tree block Theodore Ts'o
2013-07-16 15:18 ` [PATCH 2/5] ext4: print the block number of invalid extent tree blocks Theodore Ts'o
2013-07-18 0:56 ` Zheng Liu
2013-07-16 15:18 ` [PATCH 3/5] ext4: use unsigned int for es_status values Theodore Ts'o
2013-07-16 15:18 ` [PATCH 4/5] ext4: cache all of an extent tree's leaf block upon reading Theodore Ts'o
2013-07-16 15:18 ` [PATCH 5/5] ext4: add new ioctl EXT4_IOC_PRECACHE_EXTENTS Theodore Ts'o
2013-07-18 1:19 ` Zheng Liu
2013-07-18 2:50 ` Theodore Ts'o
2013-07-18 13:06 ` Zheng Liu
2013-07-18 15:21 ` Theodore Ts'o
2013-07-18 18:35 ` Eric Sandeen [this message]
2013-07-18 18:53 ` [PATCH 0/5 v2] add extent status tree caching Theodore Ts'o
2013-07-19 0:56 ` Eric Sandeen
2013-07-19 2:59 ` Theodore Ts'o
2013-07-19 3:33 ` Dave Chinner
2013-07-19 14:22 ` Jeff Moyer
2013-07-19 16:19 ` Theodore Ts'o
2013-07-22 1:38 ` Dave Chinner
2013-07-22 2:17 ` Zheng Liu
2013-07-22 10:02 ` Dave Chinner
2013-07-22 12:57 ` Zheng Liu
2013-07-30 3:08 ` Dave Chinner
2013-08-04 1:27 ` Theodore Ts'o
2013-08-13 3:10 ` Dave Chinner
2013-08-13 3:21 ` Eric Sandeen
2013-08-13 13:04 ` Theodore Ts'o
2013-08-16 3:21 ` Dave Chinner
2013-08-16 14:39 ` Theodore Ts'o
2013-07-18 23:54 ` Zheng Liu
2013-07-19 0:07 ` Theodore Ts'o
2013-07-19 1:03 ` Zheng Liu