From: Dave Chinner <dchinner@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fam Zheng <famz@redhat.com>, Eric Blake <eblake@redhat.com>,
Kevin Wolf <kwolf@redhat.com>,
qemu-block@nongnu.org, qemu-devel@nongnu.org,
Max Reitz <mreitz@redhat.com>,
Lukas Czerner <lczerner@redhat.com>,
P@draigBrady.com, Niels de Vos <ndevos@redhat.com>
Subject: Re: [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server)
Date: Fri, 22 Jul 2016 18:58:57 +1000 [thread overview]
Message-ID: <20160722085857.GT2031@devil.localdomain> (raw)
In-Reply-To: <1072047668.9469836.1469111028358.JavaMail.zimbra@redhat.com>
On Thu, Jul 21, 2016 at 10:23:48AM -0400, Paolo Bonzini wrote:
> > > 1) avoid copying zero data, to keep the copy process efficient. For this,
> > > SEEK_HOLE/SEEK_DATA are enough.
> > >
> > > 2) copy file contents while preserving the allocation state of the file's
> > > extents.
> >
> > Which is /very difficult/ to do safely and reliably.
> > i.e. the use of fiemap to duplicate the exact layout of a file
> > from userspace is only posisble if you can /guarantee/ the source
> > file has not changed in any way during the copy operation at the
> > pointin time you finalise the destination data copy.
>
> We don't do exactly that, exactly because it's messy when you have
> concurrent accesses (which shouldn't be done but you never know).
Which means you *cannot make the assumption it won't happen*.
FIEMAP is not guaranteed to tell you exactly where the data in the
file is that you need to copy is and that nothing you can do from
userspace changes that. I can't say it any clearer than that.
> When
> doing a copy, we use(d to use) FIEMAP the same way as you'd use lseek,
> querying one extent at a time. If you proceed this way, all of these
> can cause the same races:
>
> - pread(ofs=10MB, len=10MB) returns all zeroes, so the 10MB..20MB is
> not copied
>
> - pread(ofs=10MB, len=10MB) returns non-zero data, so the 10MB..20MB is
> copied
>
> - lseek(SEEK_DATA, 10MB) returns 20MB, so the 10MB..20MB area is not
> copied
>
> - lseek(SEEK_HOLE, 10MB) returns 20MB, so the 10MB..20MB area is
> copied
>
> - ioctl(FIEMAP at 10MB) returns an extent starting at 20MB, so
> the 10MB..20MB area is not copied
No, FIEMAP is not guaranteed to behave like this. what is returned
is filesystem dependent. Fielsystems that don't support holes will
return data extents. Filesystems that support compression might
return a compressed data extent rather than a hole. Encrypted files
might not expose holes at all, so people can't easily find known
plain text regions in the encrypted data. Filesystems could report
holes as deduplicated data, etc. What do you do when FIEMAP returns
"OFFLINE" to indicate that the data is located elsewhere and will
need to be retrieved by the HSM operating on top of the filesystem
before layout can be determined?
All of the above are *valid* and *correct*, because the filesytem
defines what FIEMAP returns for a given file offset. just because
ext4 and XFS have mostly the same behaviour, it doesn't mean that
every other filesystem behaves the same way.
The assumptions being made about FIEMAP behaviour will only lead to
user data corruption, as they already have several times in the past.
Cheers,
Dave.
--
Dave Chinner
dchinner@redhat.com
next prev parent reply other threads:[~2016-07-22 8:59 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-19 4:07 [Qemu-devel] [PATCH for-2.7 v5 00/14] nbd: efficient write zeroes Eric Blake
2016-07-19 4:07 ` [Qemu-devel] [PATCH v5 01/14] nbd: Fix bad flag detection on server Eric Blake
2016-07-19 4:07 ` [Qemu-devel] [PATCH v5 02/14] nbd: Add qemu-nbd -D for human-readable description Eric Blake
2016-07-19 4:07 ` [Qemu-devel] [PATCH v5 03/14] nbd: Limit nbdflags to 16 bits Eric Blake
2016-07-19 4:07 ` [Qemu-devel] [PATCH v5 04/14] nbd: Treat flags vs. command type as separate fields Eric Blake
2016-07-19 4:07 ` [Qemu-devel] [PATCH v5 05/14] nbd: Share common reply-sending code in server Eric Blake
2016-07-19 5:10 ` Fam Zheng
2016-07-19 14:52 ` Eric Blake
2016-07-20 4:39 ` Fam Zheng
2016-07-19 4:07 ` [Qemu-devel] [PATCH v5 06/14] nbd: Send message along with server NBD_REP_ERR errors Eric Blake
2016-07-19 5:15 ` Fam Zheng
2016-10-11 15:12 ` Eric Blake
2016-07-19 4:07 ` [Qemu-devel] [PATCH v5 07/14] nbd: Share common option-sending code in client Eric Blake
2016-07-19 5:31 ` Fam Zheng
2016-07-19 4:07 ` [Qemu-devel] [PATCH v5 08/14] nbd: Let server know when client gives up negotiation Eric Blake
2016-07-19 4:07 ` [Qemu-devel] [PATCH v5 09/14] nbd: Let client skip portions of server reply Eric Blake
2016-07-19 4:07 ` [Qemu-devel] [PATCH v5 10/14] nbd: Less allocation during NBD_OPT_LIST Eric Blake
2016-07-19 4:07 ` [Qemu-devel] [PATCH v5 11/14] nbd: Support shorter handshake Eric Blake
2016-07-19 4:07 ` [Qemu-devel] [PATCH v5 12/14] nbd: Improve server handling of shutdown requests Eric Blake
2016-07-19 4:08 ` [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
2016-07-19 6:21 ` Fam Zheng
2016-07-19 15:28 ` Eric Blake
2016-07-19 15:45 ` Paolo Bonzini
2016-07-20 3:34 ` Fam Zheng
2016-07-20 3:47 ` Eric Blake
2016-07-20 4:37 ` Fam Zheng
2016-07-20 7:09 ` Paolo Bonzini
2016-07-20 7:38 ` Fam Zheng
2016-07-20 8:16 ` Paolo Bonzini
2016-07-20 9:04 ` Fam Zheng
2016-07-20 9:19 ` [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server) Paolo Bonzini
2016-07-20 12:30 ` Dave Chinner
2016-07-20 13:35 ` Niels de Vos
2016-07-21 11:43 ` Dave Chinner
2016-07-21 12:31 ` Pádraig Brady
2016-07-21 13:15 ` Dave Chinner
2016-07-20 13:40 ` Paolo Bonzini
2016-07-21 12:41 ` Dave Chinner
2016-07-21 13:01 ` Pádraig Brady
2016-07-21 14:23 ` Paolo Bonzini
2016-07-22 8:58 ` Dave Chinner [this message]
2016-07-22 10:41 ` Paolo Bonzini
2018-02-15 16:40 ` Vladimir Sementsov-Ogievskiy
2018-02-15 16:42 ` Paolo Bonzini
2018-04-18 14:25 ` Vladimir Sementsov-Ogievskiy
2018-04-18 14:41 ` [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC Eric Blake
2016-08-18 13:50 ` [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server Vladimir Sementsov-Ogievskiy
2016-08-18 13:52 ` Paolo Bonzini
2016-07-19 4:08 ` [Qemu-devel] [PATCH v5 14/14] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
2016-07-19 6:24 ` Fam Zheng
2016-07-19 15:31 ` Eric Blake
2016-07-19 6:33 ` [Qemu-devel] [PATCH for-2.7 v5 00/14] nbd: efficient write zeroes Fam Zheng
2016-07-19 8:53 ` Paolo Bonzini
2016-07-19 15:33 ` Eric Blake
2016-07-19 15:41 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160722085857.GT2031@devil.localdomain \
--to=dchinner@redhat.com \
--cc=P@draigBrady.com \
--cc=eblake@redhat.com \
--cc=famz@redhat.com \
--cc=kwolf@redhat.com \
--cc=lczerner@redhat.com \
--cc=mreitz@redhat.com \
--cc=ndevos@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).