linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jamie Lokier <jamie@shareable.org>
To: jim owens <jowens@hp.com>, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 0/4] Fiemap, an extent mapping ioctl - round 2
Date: Fri, 4 Jul 2008 13:13:25 +0100	[thread overview]
Message-ID: <20080704121325.GB29484@shareable.org> (raw)
In-Reply-To: <20080703225124.GB29319@disturbed>

Dave Chinner wrote:
> The point of this SYNC flag is to ensure that you get nothing other
> than blocks mapped to disk - no delalloc regions, etc. The only sane
> way to do that is an atomic 'sync+map' operation. This is not a
> filesystem specific feature - it's what the SYNC flag should be
> defined as providing.

Wait a minute.

I think Jim, and you Dave, have imagined different use-cases
for FIEMAP - and that's the reason for this difference of opinion.

The two use-cases are:

    1. To get a detailed fragmentation report, which is guidance (and
       can only be guidance: it may be invalid the moment it's returned).

    2. To get a block mapping suitable for _reading_ those blocks from
       the physical device directly (e.g. LILO).

For 1, atomic 'sync+map' does make sense, if you want the report to
not have any delalloc extents, and you want to operate on files which
are being modified by other processes.

For 2, Jim appears to be correct that atomic 'sync+map' is not useful.
You can only read blocks if the mapping remains stable after returning
it, which means the application _must_ ensure no process is modifying
the file, and that it's on a filesystem which doesn't arbitrarily move
blocks when it feels like it.  Given that,
'make_sure_nothing_modifies; atomic(sync + map); read data;
ok_you_can_modify' is no different from 'make_sure_nothing_modifies;
fsync(); map; read data; ok_you_can_modify'.

> The fact that it's only implemented in XFS right now has absolutely
> *zero* consideration in determining this feature is necessary or
> not.

You're right, but that's not what Jim's arguing.  He's saying the
feature isn't necessary since it provides no _dependable_ semantic
guarantees, and therefore arguments to keep it are for legacy
compatibility alone.  That may be reason enough to keep it, though.

However, he's mistaken.  You've explained that it does provide a
guarantee: the resulting map will be valid for a consistent snapshot
of the file at some instant in time during the FIEMAP call.  In other
words, with concurrent modifiers, atomic sync+map ensures no delalloc
regions (is there anything else?) in the map, while fsync() + map gets
close but does not ensure it.  But either way, with concurrent
modifiers, you can only use the result for guidance, in a
fragmentation report, so is preventing delalloc regions actually
useful?  Maybe it is.  It would be good to see an example, though.

Dave, can you give an actual situation where you have seen atomic
'sync+map' used with XFS where it is necessary for an application to
behave correctly?  I'm having trouble thinking of one, other than "the
current app code doesn't know what to do with a delalloc extent".

I'm thinking when those programs are updated to use the new interface,
wouldn't it be better to update them to handle delalloc extents
(treating them as "region unknown" in fragmentation reports), because
some filesystems won't support atomic sync+map anyway?

Finally, if the real intent here is "the returned map for
fragmentation report shall not include any delalloc extents", perhaps
that should be the request flag instead?  There are other ways to
ensure that which don't require blocking concurrent modifications, for
potentially a significant time (esp. block-based filesystems).  We
like lock-free algorithms these days, if the results are suitable.

-- Jamie

  parent reply	other threads:[~2008-07-04 12:13 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-25 22:18 [PATCH 0/4] Fiemap, an extent mapping ioctl - round 2 Mark Fasheh
2008-06-26  3:03 ` Andreas Dilger
2008-06-26  9:36 ` Jamie Lokier
2008-06-26 10:24   ` Andreas Dilger
2008-06-26 11:37     ` Anton Altaparmakov
2008-06-26 12:19     ` Jamie Lokier
2008-06-26 13:16       ` Dave Chinner
2008-06-26 13:27         ` Jamie Lokier
2008-06-26 13:48         ` Eric Sandeen
2008-06-26 14:16           ` Jamie Lokier
2008-06-26 16:56             ` Andreas Dilger
2008-06-29 19:12               ` Anton Altaparmakov
2008-06-29 21:45                 ` Dave Chinner
2008-06-30 22:57                   ` Jamie Lokier
2008-06-30 23:07                     ` Mark Fasheh
2008-07-01  2:01                       ` Brad Boyer
2008-07-02  6:38                         ` Andreas Dilger
2008-07-02  6:33                 ` Andreas Dilger
2008-07-02 14:26                   ` Jamie Lokier
2008-06-26 17:17       ` Andreas Dilger
2008-06-26 14:03 ` Eric Sandeen
2008-06-27  1:41   ` Dave Chinner
2008-06-27  9:41     ` Jamie Lokier
2008-06-27 10:01       ` Dave Chinner
2008-06-27 10:32         ` Jamie Lokier
2008-06-27 22:48       ` Andreas Dilger
2008-06-28  4:21         ` Eric Sandeen
2008-07-02  6:26           ` Andreas Dilger
2008-07-02 14:28             ` Jamie Lokier
2008-07-02 21:20               ` Mark Fasheh
2008-07-03 14:45                 ` Jamie Lokier
2008-06-26 14:04 ` Dave Kleikamp
2008-06-26 14:15   ` Eric Sandeen
2008-06-26 14:27     ` Dave Kleikamp
2008-07-02 23:48       ` jim owens
2008-07-03 11:17         ` Dave Chinner
2008-07-03 12:11           ` jim owens
2008-07-03 22:51             ` Dave Chinner
2008-07-04  8:31               ` Andreas Dilger
2008-07-04 12:13               ` Jamie Lokier [this message]
2008-07-07  7:40                 ` Dave Chinner
2008-07-07 16:53                   ` Jamie Lokier
2008-07-07 22:51                     ` Dave Chinner
2008-07-07 21:16               ` jim owens
2008-07-08  3:01                 ` Dave Chinner
2008-07-07 22:02               ` jim owens
2008-07-09  2:03                 ` Jamie Lokier
2008-07-03 12:21           ` jim owens
2008-07-03 12:42             ` Andi Kleen
2008-07-04 20:32             ` Anton Altaparmakov
2008-07-05 10:49               ` Jamie Lokier
2008-07-05 21:44                 ` Anton Altaparmakov
2008-07-07 23:01               ` jim owens
2008-07-08  1:51                 ` Dave Chinner
2008-07-08 13:02                   ` jim owens
2008-07-08 14:03                     ` jim owens
2008-07-08 14:39                       ` jim owens
2008-07-08 14:30                     ` Theodore Tso
2008-07-09  1:50                       ` Jamie Lokier
2008-06-26 17:01   ` Andreas Dilger
2008-07-03 14:37 ` jim owens
2008-07-03 15:17   ` Jamie Lokier
2008-07-04  8:49     ` Andreas Dilger
2008-07-04 11:28       ` Jamie Lokier
2008-07-03 23:00   ` Dave Chinner
2008-07-04  9:00   ` Andreas Dilger
2008-07-07 23:28     ` jim owens
2008-07-09  1:53       ` Jamie Lokier
2008-07-09 15:01         ` jim owens
2008-07-08  0:06     ` jim owens

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080704121325.GB29484@shareable.org \
    --to=jamie@shareable.org \
    --cc=jowens@hp.com \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).