qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Dave Chinner <dchinner@redhat.com>
Cc: Fam Zheng <famz@redhat.com>, Eric Blake <eblake@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Max Reitz <mreitz@redhat.com>,
	Lukas Czerner <lczerner@redhat.com>,
	P@draigBrady.com, Niels de Vos <ndevos@redhat.com>
Subject: Re: [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server)
Date: Fri, 22 Jul 2016 06:41:24 -0400 (EDT)	[thread overview]
Message-ID: <1401857113.9662127.1469184084100.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <20160722085857.GT2031@devil.localdomain>

> > > i.e. the use of fiemap to duplicate the exact layout of a file
> > > from userspace is only posisble if you can /guarantee/ the source
> > > file has not changed in any way during the copy operation at the
> > > pointin time you finalise the destination data copy.
> > 
> > We don't do exactly that, exactly because it's messy when you have
> > concurrent accesses (which shouldn't be done but you never know).
> 
> Which means you *cannot make the assumption it won't happen*.
> FIEMAP is not guaranteed to tell you exactly where the data in the
> file is that you need to copy is and that nothing you can do from
> userspace changes that. I can't say it any clearer than that.

You've said it very clearly.  But I'm not saying "fix the damn FIEMAP", I'm
saying "this is what we need, lseek doesn't provide it, FIEMAP comes
close but really doesn't".  If the solution is to fix FIEMAP, if it's
possible at all, that'd be great.  But any other solution is okay.

Do you at least agree that it's possible to use the kind of information
in struct fiemap_extent (extent start, length and flags) in a way that is
not racy, or at least not any different from SEEK_DATA and SEEK_HOLE's
raciness?  I'm not saying that you'd get that information from FIEMAP.
It's just the kind of information that I'd like to get. 

(BTW, the documentation of FIEMAP is horrible.  It does not say at all
that FIEMAP_FLAG_SYNC is needed to return extents that match what
SEEK_HOLE/SEEK_DATA would return.  The obvious reading is that
FIEMAP_FLAG_SYNC would avoid returning FIEMAP_EXTENT_DELALLOC extents,
and that in turn would not be a problem if you don't need fe_physical.
Perhaps it would help if fiemap.txt said started with *why* would one
use FIEMAP, not just what it does).

> > When doing a copy, we use(d to use) FIEMAP the same way as you'd use lseek,
> > querying one extent at a time.  If you proceed this way, all of these
> > can cause the same races:
> > 
> > - pread(ofs=10MB, len=10MB) returns all zeroes, so the 10MB..20MB is
> > not copied
> > 
> > - pread(ofs=10MB, len=10MB) returns non-zero data, so the 10MB..20MB is
> > copied
> > 
> > - lseek(SEEK_DATA, 10MB) returns 20MB, so the 10MB..20MB area is not
> > copied
> > 
> > - lseek(SEEK_HOLE, 10MB) returns 20MB, so the 10MB..20MB area is
> > copied
> > 
> > - ioctl(FIEMAP at 10MB) returns an extent starting at 20MB, so
> > the 10MB..20MB area is not copied
> 
> No, FIEMAP is not guaranteed to behave like this. what is returned
> is filesystem dependent. Fielsystems that don't support holes will
> return data extents. Filesystems that support compression might
> return a compressed data extent rather than a hole. Encrypted files
> might not expose holes at all, so people can't easily find known
> plain text regions in the encrypted data. Filesystems could report
> holes as deduplicated data, etc.  What do you do when FIEMAP returns
> "OFFLINE" to indicate that the data is located elsewhere and will
> need to be retrieved by the HSM operating on top of the filesystem
> before layout can be determined?

lseek(SEEK_DATA) might also say you're not on a hole because the file
is compressed/encrypted/deduplicated/offline/whatnot.  So lseek is
also filesystem dependent, isn't it?  It also doesn't work on block
devices, so it's really file descriptor dependent.  That's not news.

Of course read, lseek and FIEMAP will not return exactly the same
information.  The point is that they're subject to exactly the same
races, and that struct fiemap_extent *can* be parsed conservatively.
The code I attached to the previous message does that, if it finds any
extent kind other than an unwritten extent it just treats it as data.

Based on this it would be nice to understand the reason why FIEMAP needs
FIEMAP_FLAG_SYNC to return meaningful values (the meaningful values
are there, they're just what lseek or read use), or to have a more
powerful function than just lseek(SEEK_DATA/SEEK_HOLE).  All we want is
to be able to distinguish between the three fallocate modes.

> The assumptions being made about FIEMAP behaviour will only lead to
> user data corruption, as they already have several times in the past.

Indeed, FIEMAP as is ranks just above gets() in usability.  But there's
no reason why it has to be that way.

Paolo

  reply	other threads:[~2016-07-22 10:41 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-19  4:07 [Qemu-devel] [PATCH for-2.7 v5 00/14] nbd: efficient write zeroes Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 01/14] nbd: Fix bad flag detection on server Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 02/14] nbd: Add qemu-nbd -D for human-readable description Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 03/14] nbd: Limit nbdflags to 16 bits Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 04/14] nbd: Treat flags vs. command type as separate fields Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 05/14] nbd: Share common reply-sending code in server Eric Blake
2016-07-19  5:10   ` Fam Zheng
2016-07-19 14:52     ` Eric Blake
2016-07-20  4:39       ` Fam Zheng
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 06/14] nbd: Send message along with server NBD_REP_ERR errors Eric Blake
2016-07-19  5:15   ` Fam Zheng
2016-10-11 15:12     ` Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 07/14] nbd: Share common option-sending code in client Eric Blake
2016-07-19  5:31   ` Fam Zheng
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 08/14] nbd: Let server know when client gives up negotiation Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 09/14] nbd: Let client skip portions of server reply Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 10/14] nbd: Less allocation during NBD_OPT_LIST Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 11/14] nbd: Support shorter handshake Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 12/14] nbd: Improve server handling of shutdown requests Eric Blake
2016-07-19  4:08 ` [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
2016-07-19  6:21   ` Fam Zheng
2016-07-19 15:28     ` Eric Blake
2016-07-19 15:45       ` Paolo Bonzini
2016-07-20  3:34         ` Fam Zheng
2016-07-20  3:47           ` Eric Blake
2016-07-20  4:37             ` Fam Zheng
2016-07-20  7:09               ` Paolo Bonzini
2016-07-20  7:38                 ` Fam Zheng
2016-07-20  8:16                   ` Paolo Bonzini
2016-07-20  9:04                     ` Fam Zheng
2016-07-20  9:19                   ` [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server) Paolo Bonzini
2016-07-20 12:30                     ` Dave Chinner
2016-07-20 13:35                       ` Niels de Vos
2016-07-21 11:43                         ` Dave Chinner
2016-07-21 12:31                           ` Pádraig Brady
2016-07-21 13:15                             ` Dave Chinner
2016-07-20 13:40                       ` Paolo Bonzini
2016-07-21 12:41                         ` Dave Chinner
2016-07-21 13:01                           ` Pádraig Brady
2016-07-21 14:23                           ` Paolo Bonzini
2016-07-22  8:58                             ` Dave Chinner
2016-07-22 10:41                               ` Paolo Bonzini [this message]
2018-02-15 16:40                                 ` Vladimir Sementsov-Ogievskiy
2018-02-15 16:42                                   ` Paolo Bonzini
2018-04-18 14:25                                     ` Vladimir Sementsov-Ogievskiy
2018-04-18 14:41                                       ` [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC Eric Blake
2016-08-18 13:50   ` [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server Vladimir Sementsov-Ogievskiy
2016-08-18 13:52     ` Paolo Bonzini
2016-07-19  4:08 ` [Qemu-devel] [PATCH v5 14/14] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
2016-07-19  6:24   ` Fam Zheng
2016-07-19 15:31     ` Eric Blake
2016-07-19  6:33 ` [Qemu-devel] [PATCH for-2.7 v5 00/14] nbd: efficient write zeroes Fam Zheng
2016-07-19  8:53 ` Paolo Bonzini
2016-07-19 15:33   ` Eric Blake
2016-07-19 15:41     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1401857113.9662127.1469184084100.JavaMail.zimbra@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=P@draigBrady.com \
    --cc=dchinner@redhat.com \
    --cc=eblake@redhat.com \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=lczerner@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=ndevos@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).