From: Zheng Liu <gnehzuil.liu@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>, Eric Sandeen <sandeen@redhat.com>,
Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH 0/5 v2] add extent status tree caching
Date: Mon, 22 Jul 2013 20:57:45 +0800
Message-ID: <20130722125745.GA2827@gmail.com>
In-Reply-To: <20130722100255.GF11674@dastard>
On Mon, Jul 22, 2013 at 08:02:55PM +1000, Dave Chinner wrote:
> On Mon, Jul 22, 2013 at 10:17:42AM +0800, Zheng Liu wrote:
> > On Mon, Jul 22, 2013 at 11:38:31AM +1000, Dave Chinner wrote:
> > > On Fri, Jul 19, 2013 at 12:19:30PM -0400, Theodore Ts'o wrote:
> > > > On Fri, Jul 19, 2013 at 01:33:09PM +1000, Dave Chinner wrote:
> > > > > An ioctl is kinda silly for this. Just use O_NONBLOCK when calling
> > > > > open() and do the prefetch right in the open call. The open() can
> > > > > block, anyway, and what you are trying to do is non-blocking IO with
> > > > > AIO, so it seems like we've already got a sensible, generic
> > > > > interface for triggering this sort of prefetch operation.
> > > >
> > > > O_NONBLOCK (either set via open or fcntl) is a possibility, since it's
> > > > carefully defined to be unspecified for regular files by SUSv3. It is
> > > > quite different from the existing semantics for O_NONBLOCK, though.
> > > > Currently, for all file types where O_NONBLOCK is not ignored, open(2)
> > > > is guaranteed itself not to block. If we use O_NONBLOCK for regular
> > > > files to mean that any necessary metadata blocks required for AIO to
> > > > be "A" will be cached, then it will make open(2) much more likely to
> > > > block. Also, for all file types where O_NONBLOCK is not ignored,
> > > > read(2) will not block but instead return -1 and set errno to EAGAIN.
> > > > This would also be a change.
> > > >
> > > > If we tried to get this new semantics for O_NONBLOCK to be accepted by
> > > > the Austin Group for standardization in the future, would they accept
> > > > it, or would they say, "this makes me vomit"?  I have a suspicion
> > > > their reaction might be closer to the latter....
> > > >
> > > > If we want a VFS-level API, in my opinion an fadvise() flag would be a
> > > > better choice.
> > >
> > > Sure. Make it an fadvise() flag - just don't add ioctls for things
> > > that are generically useful.
> > >
> > > On second thoughts - you're trying to get the extent map read in. We
> > > already have an interface for querying extent maps - fiemap.
> > > FIEMAP_FLAG_PREFETCH along with the range of the file you want the
> > > extent map prefetched for?
> >
> > I don't think fiemap is a good interface. The application uses
> > fiemap(2) to retrieve extent mapping.
>
> fiemap is used to query information about extent maps. What it
> returns is entirely dependent on the input parameters that are
> passed to it. Indeed, from Documentation/filesystems/fiemap.txt:
>
> "If fm_extent_count is zero, then the fm_extents[] array is ignored
> (no extents will be returned), and the fm_mapped_extents count will
> hold the number of extents needed in fm_extents[] to hold the file's
> current mapping."
>
> Think about that for a minute. What does the filesystem do with such
> an fiemap query when the extent map is not cached? That's right,
> *fiemap reads the extent map from disk into the cache* and then
> returns the number of extents in the range.
>
> All I have suggested is adding a flag to make this an *explicit
> operation* rather than a side effect of a "count extents" query. I
> fail to see any justification for a whole new interface when we
> already have a perfectly functional one that already provides the
> functionality that is required...
Yes, I understand your point of view.  We can use fiemap to do that.
My only concern is about semantics.  When someone mentions fiemap, the
first thing I think of is that I can use it to retrieve the extent
mappings.  fadvise looks more natural for this.  When I look at it, I
think of giving a hint to the kernel and letting the kernel do the
rest for me.  That is why I prefer an fadvise flag to a fiemap flag.
Regards,
- Zheng