From: Dave Chinner <david@fromorbit.com>
To: Gionatan Danti <g.danti@assyoma.it>
Cc: linux-xfs@vger.kernel.org
Subject: Re: Reflink (cow) copy of busy files
Date: Mon, 26 Feb 2018 11:25:34 +1100
Message-ID: <20180226002533.GG30854@dastard>
In-Reply-To: <d105d0000652be75774cc1f5f23eae68@assyoma.it>

On Sun, Feb 25, 2018 at 10:58:16PM +0100, Gionatan Danti wrote:
> On 25-02-2018 22:13 Dave Chinner wrote:
> >This isn't a copy on write issue. This is an issue of the state of
> >the file and the I/O stack above it at the time the data extents are
> >shared. There is I/O inflight, and so there's no guarantee that what
> >is in the extents being shared is consistent. Freezing the
> >filesystem stops IO in flight, so the extents can be shared while
> >the filesystem knows it has consistent state on stable storage.
>
> Hmm, that seems to be the very same definition (and caveats) of a
> "crash-consistent" snapshot...
>
> Suppose an XFS filesystem used for VM disk images hosting, with
> running VMs. I naively execute a cp --reflink=always copy, stop the
> original VM and start the copied one.
>
> For an atomic snapshot I would expect the data loss to be comparable
> to a "power pull" case:
> - async writes are lost. After all, they were in the pagecache and
> never hit the backing file;
> - unacknowledged sync writes are lost. Again, they never
> successfully hit the disk;
> - acknowledged sync writes (ie: the ones which returned) are properly
> written to the backing file.

Acknowledged sync writes are not guaranteed to be stable. They may
still be sitting in volatile caches below the backing file, and so
until there is a cache flush pushed down through all layers of the
storage stack (e.g. fsync on the backing file) those acknowledged
sync writes are not stable. That's one of the things quiescing the
filesystem guarantees, but running reflink to clone the file does
not.
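
To make the "fsync on the backing file" step concrete, here is a minimal
sketch in Python of a writer that asks for a flush before treating its data
as stable. The helper name write_durably is hypothetical, and the sketch
assumes every layer below the file actually honours cache flushes, which is
exactly the part that cannot be taken for granted:

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data to path, then ask the kernel to flush it to stable storage.

    Hypothetical helper for illustration only. os.fsync() requests a flush
    of the file's data and metadata; whether the data is really stable
    afterwards still depends on the storage stack honouring that flush.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call, a successful write() only means the data
        # reached the page cache, not the disk.
        os.fsync(fd)
    finally:
        os.close(fd)
```

A plain write() that returns successfully gives none of this, and a reflink
copy of the file does not issue the flush on the writer's behalf.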
IOWs, "properly written" is easy to say but very hard to guarantee.
We cannot make such assumptions about random user configs, nor can
we base recommendations on such assumptions. If you choose not to
quiesce the filesystems before snapshotting them, then it's your
responsibility to guarantee your storage stack will work correctly.
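
For the VM image case, one way to order "quiesce, then snapshot" could look
like the sketch below (Python). It is an illustration under assumptions that
are not part of this thread: the guest is managed by libvirt, the QEMU guest
agent is running so that virsh domfsfreeze/domfsthaw work, and the function
name and injectable run parameter are hypothetical. Whether a guest freeze
is sufficient still depends on the rest of the storage stack:

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(list(cmd), check=True)


def quiesced_reflink_copy(
    domain: str,
    src: str,
    dst: str,
    run: Callable[[Sequence[str]], None] = _run,
) -> None:
    """Freeze the guest's filesystems, reflink the image, then thaw.

    Assumes a libvirt guest with the QEMU guest agent installed; the
    domain name and image paths are supplied by the caller.
    """
    # Quiesce the filesystems inside the guest so the image is consistent.
    run(["virsh", "domfsfreeze", domain])
    try:
        # Share the extents while the guest is quiesced.
        run(["cp", "--reflink=always", src, dst])
    finally:
        # Always thaw, even if the copy failed, or the guest stays frozen.
        run(["virsh", "domfsthaw", domain])
```

The try/finally matters more than the copy itself: a guest left frozen after
a failed copy is worse than a missing snapshot.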
> If the above is correct, when starting the new (copied) VM, the
> guest filesystem will behave as if power was lost: its journal will be
> replayed and brought to a consistent state. Applications can/will be
> affected based on what they were doing at the time of the reflinked
> copy, but important ones (ie: the ones correctly using fsync), such as
> databases, will gracefully recover by replaying their logs.
>
> This should be similar to how LVM snapshot works when no filesystem
> is (directly) layered on top of the volume (ie: volume assigned to a
> VM).

You still have to quiesce the filesystem when it's on top of an LVM
snapshot volume.

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com