From: Eric Sandeen <sandeen@redhat.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dave Chinner <david@fromorbit.com>,
Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH 0/5 v2] add extent status tree caching
Date: Mon, 12 Aug 2013 22:21:45 -0500 [thread overview]
Message-ID: <5209A649.90406@redhat.com> (raw)
In-Reply-To: <20130813031057.GV12779@dastard>
On 8/12/13 10:10 PM, Dave Chinner wrote:
> On Sat, Aug 03, 2013 at 09:27:40PM -0400, Theodore Ts'o wrote:
>> On Tue, Jul 30, 2013 at 01:08:07PM +1000, Dave Chinner wrote:
>>> But Ted's case is not a "hint" - it's a direct command to fetch the
>>> extent map from disk. You can do that already with FIEMAP, so no new
>>> code or interfaces are needed. fadvise() is not the proper interface
>>> for manipulating filesystem metadata behaviour, and fiemap can
>>> already do what you need. There is no need for any new interfaces
>>> here.
>>
>> I've been looking at the definition of fiemap, and I'm not convinced.
>> To quote from the fiemap.txt:
>>
>> The fiemap ioctl is an efficient method for userspace to get file
>> extent mappings.
Ted -
Changing fiemap.txt is easy, if that's the only problem... :)
>> That's not what is going on here. We are pre-caching them into kernel
>> memory, not in user-space. In addition, we're also setting a flag to
>> keep these extents preferentially in memory compared to other entries
>> in the extent cache.
Reading extents via fiemap almost certainly moves that metadata into
kernel cache, simply by the act of reading the block device to get them.
It doesn't set any caching preference, but how needed is that, really,
in practice?
>> I agree that posix_fadvise() isn't really a good match, either:
>>
>> "posix_fadvise - predeclare an access pattern for file data"
>>
>> How about this? FIEMAP is an ioctl, anyway. How about if we just
>> declare this as a new fs-independent ioctl, much like FS_IOC_FIEMAP?
>>
>> #define FS_IOC_PRECACHE_EXTENTS _IO('f', 18)
>>
>> This is, of course, assuming that other file systems are interested in
>> implementing this functionality. If not, we can just keep it as
>> EXT4_IOC_PRECACHE_EXTENTS, and just call it a day. (We can always add
>> a definition of FS_IOC_PRECACHE_EXTENTS set to ext4 ioctl's code
>> point, at some later point, if people change their minds.)
>
> We *don't need to add any code* to the kernel to read extents into
> the kernel cache. The FIEMAP interface as it exists today -without
> modification- fulfils your stated requirement.
>
> I do no see any reason for adding a new, duplicated interface that
> we have to maintain and hook up to all the relevant filesystems,
> write test code for and then support forever more. It just makes no
> sense at all.
I see Dave's point that we _do_ have an interface today to read
all file extents into cache. We don't mark them as particularly sticky,
however.
This seems pretty clearly driven by a Google workload need; something you
can probably test. Does FIEMAP do the job for you or not? If not, why not?
-Eric
> Cheers,
>
> Dave.
>
next prev parent reply other threads:[~2013-08-13 3:21 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-16 15:17 [PATCH 0/5 v2] add extent status tree caching Theodore Ts'o
2013-07-16 15:17 ` [PATCH 1/5] ext4: refactor code to read the extent tree block Theodore Ts'o
2013-07-16 15:18 ` [PATCH 2/5] ext4: print the block number of invalid extent tree blocks Theodore Ts'o
2013-07-18 0:56 ` Zheng Liu
2013-07-16 15:18 ` [PATCH 3/5] ext4: use unsigned int for es_status values Theodore Ts'o
2013-07-16 15:18 ` [PATCH 4/5] ext4: cache all of an extent tree's leaf block upon reading Theodore Ts'o
2013-07-16 15:18 ` [PATCH 5/5] ext4: add new ioctl EXT4_IOC_PRECACHE_EXTENTS Theodore Ts'o
2013-07-18 1:19 ` Zheng Liu
2013-07-18 2:50 ` Theodore Ts'o
2013-07-18 13:06 ` Zheng Liu
2013-07-18 15:21 ` Theodore Ts'o
2013-07-18 18:35 ` [PATCH 0/5 v2] add extent status tree caching Eric Sandeen
2013-07-18 18:53 ` Theodore Ts'o
2013-07-19 0:56 ` Eric Sandeen
2013-07-19 2:59 ` Theodore Ts'o
2013-07-19 3:33 ` Dave Chinner
2013-07-19 14:22 ` Jeff Moyer
2013-07-19 16:19 ` Theodore Ts'o
2013-07-22 1:38 ` Dave Chinner
2013-07-22 2:17 ` Zheng Liu
2013-07-22 10:02 ` Dave Chinner
2013-07-22 12:57 ` Zheng Liu
2013-07-30 3:08 ` Dave Chinner
2013-08-04 1:27 ` Theodore Ts'o
2013-08-13 3:10 ` Dave Chinner
2013-08-13 3:21 ` Eric Sandeen [this message]
2013-08-13 13:04 ` Theodore Ts'o
2013-08-16 3:21 ` Dave Chinner
2013-08-16 14:39 ` Theodore Ts'o
2013-07-18 23:54 ` Zheng Liu
2013-07-19 0:07 ` Theodore Ts'o
2013-07-19 1:03 ` Zheng Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5209A649.90406@redhat.com \
--to=sandeen@redhat.com \
--cc=david@fromorbit.com \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).