linux-ext4.vger.kernel.org archive mirror
From: Zheng Liu <gnehzuil.liu@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>, Eric Sandeen <sandeen@redhat.com>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH 0/5 v2] add extent status tree caching
Date: Mon, 22 Jul 2013 20:57:45 +0800
Message-ID: <20130722125745.GA2827@gmail.com>
In-Reply-To: <20130722100255.GF11674@dastard>

On Mon, Jul 22, 2013 at 08:02:55PM +1000, Dave Chinner wrote:
> On Mon, Jul 22, 2013 at 10:17:42AM +0800, Zheng Liu wrote:
> > On Mon, Jul 22, 2013 at 11:38:31AM +1000, Dave Chinner wrote:
> > > On Fri, Jul 19, 2013 at 12:19:30PM -0400, Theodore Ts'o wrote:
> > > > On Fri, Jul 19, 2013 at 01:33:09PM +1000, Dave Chinner wrote:
> > > > > An ioctl is kinda silly for this. Just use O_NONBLOCK when calling
> > > > > open() and do the prefetch right in the open call. The open() can
> > > > > block, anyway, and what you are trying to do is non-blocking IO with
> > > > > AIO, so it seems like we've already got a sensible, generic
> > > > > interface for triggering this sort of prefetch operation.
> > > > 
> > > > O_NONBLOCK (either set via open or fcntl) is a possibility, since it's
> > > > carefully defined to be unspecified for regular files by SUSv3.  It is
> > > > quite different from the existing semantics for O_NONBLOCK, though.
> > > > Currently, for all file types where O_NONBLOCK is not ignored, open(2)
> > > > is guaranteed itself not to block.  If we use O_NONBLOCK for regular
> > > > files to mean that any necessary metadata blocks required for AIO to
> > > > be "A" will be cached, then it will make open(2) much more likely to
> > > > block.  Also, for all file types where O_NONBLOCK is not ignored,
> > > > read(2) will not block but instead return -1 and set errno to EAGAIN.
> > > > This would also be a change.
> > > > 
> > > > If we tried to get this new semantics for O_NONBLOCK to be accepted by
> > > > the Austin Group for standardization in the future, would they accept
> > > > it, or would they say, "this makes me vomit"?  I have a suspicion
> > > > their reaction might be closer to the latter....
> > > > 
> > > > If we want a VFS-level API, in my opinion an fadvise() flag would be a
> > > > better choice.
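The existing O_NONBLOCK behavior Ted describes can be checked from userspace. Below is a minimal C sketch, assuming Linux with POSIX headers; the helper names are hypothetical. On an empty pipe whose write end is still open, a non-blocking read() fails at once with EAGAIN. On a regular file the flag is accepted but ignored, so read() just returns the data.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to an already-open descriptor via
 * fcntl(), the second of the two ways mentioned above (open or fcntl). */
static int set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Empty pipe, write end still open: a blocking read() would wait, a
 * non-blocking one returns -1 immediately.  Returns the errno it set. */
int pipe_nonblock_read_errno(void)
{
    int fds[2];
    char buf[16];
    int err = 0;

    if (pipe(fds) != 0)
        return -1;
    if (set_nonblock(fds[0]) == 0 && read(fds[0], buf, sizeof buf) < 0)
        err = errno;            /* expected: EAGAIN (== EWOULDBLOCK on Linux) */
    close(fds[0]);
    close(fds[1]);
    return err;
}

/* Regular file: O_NONBLOCK has no effect on read().
 * Returns the number of bytes read back (5 for "hello"), or -1. */
ssize_t regular_file_nonblock_read(void)
{
    char path[] = "/tmp/nonblock-demo-XXXXXX";
    char buf[16];
    ssize_t n = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* file persists until fd is closed */
    if (write(fd, "hello", 5) == 5 && set_nonblock(fd) == 0 &&
        lseek(fd, 0, SEEK_SET) == 0)
        n = read(fd, buf, sizeof buf);
    close(fd);
    return n;
}
```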
> > > 
> > > Sure. Make it an fadvise() flag - just don't add ioctls for things
> > > that are generically useful.
> > > 
> > > On second thoughts - you're trying to get the extent map read in. We
> > > already have an interface for querying extent maps - fiemap.
> > > FIEMAP_FLAG_PREFETCH along with the range of the file you want the
> > > extent map prefetched for?
> > 
> > I don't think fiemap is a good interface.  Applications use the
> > FIEMAP ioctl to retrieve extent mappings.
> 
> fiemap is used to query information about extent maps. What it
> returns is entirely dependent on the input parameters that are
> passed to it. Indeed, from Documentation/filesystems/fiemap.txt:
> 
> "If fm_extent_count is zero, then the fm_extents[] array is ignored
> (no extents will be returned), and the fm_mapped_extents count will
> hold the number of extents needed in fm_extents[] to hold the file's
> current mapping."
> 
> Think about that for a minute. What does the filesystem do with such
> an fiemap query when the extent map is not cached?  That's right,
> *fiemap reads the extent map from disk into the cache* and then
> returns the number of extents in the range.
> 
> All I have suggested is adding a flag to make this an *explicit
> operation* rather than a side effect of a "count extents" query. I
> fail to see any justification for a whole new interface when we
> already have a perfectly functional one that already provides the
> functionality that is required...
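The count-only query quoted from Documentation/filesystems/fiemap.txt is issued through the FS_IOC_FIEMAP ioctl with fm_extent_count set to zero. Below is a minimal C sketch, assuming Linux UAPI headers; count_extents() and demo_count_extents() are hypothetical names.

- It does not use FIEMAP_FLAG_PREFETCH, which is only proposed above and is not an existing flag.
- On filesystems without fiemap support (tmpfs, for example) the ioctl fails, typically with EOPNOTSUPP.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, FIEMAP_MAX_OFFSET */

/* Count the extents mapping [start, start + len) without copying any out.
 * fm_extent_count == 0 means fm_extents[] is ignored and fm_mapped_extents
 * reports how many would be needed.  Returns that count, or -errno. */
long count_extents(int fd, unsigned long long start, unsigned long long len)
{
    struct fiemap fm;

    memset(&fm, 0, sizeof fm);
    fm.fm_start = start;
    fm.fm_length = len;
    fm.fm_flags = 0;
    fm.fm_extent_count = 0;     /* count only */
    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0)
        return -errno;
    return (long)fm.fm_mapped_extents;
}

/* Hypothetical demo: write a small temp file, fsync it so blocks are
 * allocated, then count extents over the whole file. */
long demo_count_extents(void)
{
    char path[] = "/tmp/fiemap-demo-XXXXXX";
    char data[4096];
    ssize_t w;
    long ret;
    int fd = mkstemp(path);

    if (fd < 0)
        return -errno;
    unlink(path);
    memset(data, 'x', sizeof data);
    w = write(fd, data, sizeof data);
    if (w != (ssize_t)sizeof data)
        ret = (w < 0) ? -errno : -EIO;
    else if (fsync(fd) != 0)
        ret = -errno;
    else
        ret = count_extents(fd, 0, FIEMAP_MAX_OFFSET);
    close(fd);
    return ret;
}
```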

Yes, I understand your point of view.  We can use fiemap to do that.
My only concern is the semantics.  When someone mentions fiemap, the
first thing I think of is using it to retrieve extent mappings.  fadvise
feels more natural here: when I look at it, I think of giving the kernel
a hint and letting the kernel do the rest for me.  That is why I prefer
an fadvise flag over fiemap.
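The closest existing hint of this kind is posix_fadvise(). Below is a minimal C sketch, assuming Linux; hint_willneed() and demo_hint_tempfile() are hypothetical names.

POSIX_FADV_WILLNEED asks the kernel to start reading file data ahead. It is not documented to precache extent metadata. The extent-precache advice discussed in this thread would therefore be a new advice value that does not exist today.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Tell the kernel we will soon need [offset, offset + len); len == 0 means
 * "through end of file".  posix_fadvise() returns 0 or an error number
 * directly; it does not set errno. */
int hint_willneed(int fd, off_t offset, off_t len)
{
    return posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
}

/* Hypothetical demo: give the hint on a freshly written temp file.
 * Returns 0 on success or a positive error number. */
int demo_hint_tempfile(void)
{
    char path[] = "/tmp/fadvise-demo-XXXXXX";
    int ret;
    int fd = mkstemp(path);

    if (fd < 0)
        return errno;
    unlink(path);
    if (write(fd, "hello", 5) != 5)
        ret = EIO;
    else
        ret = hint_willneed(fd, 0, 0);
    close(fd);
    return ret;
}
```

The call is purely advisory: the kernel may ignore the hint, so a zero return does not prove that anything was cached.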

Regards,
                                                - Zheng

Thread overview: 32+ messages
2013-07-16 15:17 [PATCH 0/5 v2] add extent status tree caching Theodore Ts'o
2013-07-16 15:17 ` [PATCH 1/5] ext4: refactor code to read the extent tree block Theodore Ts'o
2013-07-16 15:18 ` [PATCH 2/5] ext4: print the block number of invalid extent tree blocks Theodore Ts'o
2013-07-18  0:56   ` Zheng Liu
2013-07-16 15:18 ` [PATCH 3/5] ext4: use unsigned int for es_status values Theodore Ts'o
2013-07-16 15:18 ` [PATCH 4/5] ext4: cache all of an extent tree's leaf block upon reading Theodore Ts'o
2013-07-16 15:18 ` [PATCH 5/5] ext4: add new ioctl EXT4_IOC_PRECACHE_EXTENTS Theodore Ts'o
2013-07-18  1:19   ` Zheng Liu
2013-07-18  2:50     ` Theodore Ts'o
2013-07-18 13:06       ` Zheng Liu
2013-07-18 15:21         ` Theodore Ts'o
2013-07-18 18:35 ` [PATCH 0/5 v2] add extent status tree caching Eric Sandeen
2013-07-18 18:53   ` Theodore Ts'o
2013-07-19  0:56     ` Eric Sandeen
2013-07-19  2:59       ` Theodore Ts'o
2013-07-19  3:33         ` Dave Chinner
2013-07-19 14:22           ` Jeff Moyer
2013-07-19 16:19           ` Theodore Ts'o
2013-07-22  1:38             ` Dave Chinner
2013-07-22  2:17               ` Zheng Liu
2013-07-22 10:02                 ` Dave Chinner
2013-07-22 12:57                   ` Zheng Liu [this message]
2013-07-30  3:08                     ` Dave Chinner
2013-08-04  1:27                       ` Theodore Ts'o
2013-08-13  3:10                         ` Dave Chinner
2013-08-13  3:21                           ` Eric Sandeen
2013-08-13 13:04                             ` Theodore Ts'o
2013-08-16  3:21                               ` Dave Chinner
2013-08-16 14:39                                 ` Theodore Ts'o
2013-07-18 23:54   ` Zheng Liu
2013-07-19  0:07     ` Theodore Ts'o
2013-07-19  1:03       ` Zheng Liu