linux-xfs.vger.kernel.org archive mirror
From: Amir Goldstein <amir73il@gmail.com>
To: Gionatan Danti <g.danti@assyoma.it>
Cc: Dave Chinner <david@fromorbit.com>,
	linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: Reflink (cow) copy of busy files
Date: Mon, 26 Feb 2018 09:58:09 +0200
Message-ID: <CAOQ4uxixAP6F8f6hT5VENQLD0o_3Bym6pkAJhXva5b99BhE9Eg@mail.gmail.com>
In-Reply-To: <6eacd8faae2779b8dfb62fb0d65a9411@assyoma.it>

On Mon, Feb 26, 2018 at 9:19 AM, Gionatan Danti <g.danti@assyoma.it> wrote:
> Full disclaimer: maybe my point of view is influenced by thinking in the
> context of Qemu/KVM + software RAID (where much work was done to be sure
> about proper barrier passing) or BBU/NV hardware RAID.
>
> Il 26-02-2018 01:25 Dave Chinner ha scritto:
>>
>> Acknowledged sync writes are not guaranteed to be stable. They may
>> still be sitting in volatile caches below the backing file, and so
>> until there is a cache flush pushed down through all layers of the
>> storage stack (e.g. fsync on the backing file) those acknowledged
>> sync writes are not stable. That's one of the things quiescing the
>> filesystem guarantees, but running reflink to clone the file does
>> not.
>
>
> Sure, but not-passed-down fsync/write barriers will thwart even "normal"
> (ie: not CoW/snapshotted/reflinked) sync writes, and will inevitably cause
> problems (ie: a power loss becomes a big problem). How is it different for a
> reflinked copy?
>
>> IOWs, "properly written" is easy to say but very hard to guarantee.
>> We cannot make such assumptions about random user configs, nor can we
>> base recommendations on such assumptions.  If you choose not to
>> quiesce the filesystems before snapshotting them, then it's your
>> responsibility to guarantee your storage stack will work correctly.
>
>
> Absolutely, and I *really* appreciate your advice.
>
>> You still have to quiesce the filesystem when it's on top of a LVM
>> snapshot volume.
>
>
> When the LVM volume is passed to a guest VM, the host cannot quiesce the
> filesystem. Host/guest communication can be achieved by means of a guest
> agent and a private control channel, but this has its own problems. I have
> thoroughly tested live, LVM-backed VM snapshots, and every time I run them,
> the guest filesystem replays its log without problems. I always double-check
> that the entire I/O stack (from guest down to the physical disks) honors
> write barriers, though.
>
> Back to the original question: if a reflinked copy is an *atomic* operation
> on all the data extents comprising a file, and in the context of properly
> passed barriers/fsync, I would think that an unquiesced snapshot will work
> for the (reduced) consistency model of a crash-consistent snapshot.
>
> If the reflink copy is not atomic (ie: the different extents are CoWed at
> different times, making it only a "faster copy" rather than a snapshot) this
> will *not* work and I will end up with binary garbage (ie: writes can be
> reordered from the snapshot's view).
>
> I think it all can be reduced to a single question: putting aside quiescing
> problems, is a reflinked copy a true *atomic* snapshot, or is it "only" a
> faster copy?
>

Gionatan,

First of all, the answer to your question is: it is "just" a faster copy.
Reflinking a file is much faster than copying it, but it is not O(1).
I believe cp --reflink can leave a partially cloned file if the system
crashes mid-operation, so in any case the operation is not *atomic*
in that sense.
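For reference, on Linux a whole-file reflink is requested with the `FICLONE` ioctl, and as far as I know GNU `cp --reflink` is built on it. The sketch below is illustrative only: it assumes a kernel and headers that define `FICLONE` in `<linux/fs.h>`, and `reflink_file` is a hypothetical helper name. It makes the point above concrete: from userspace this is a single call, but as I understand it the kernel still remaps the source's extents one by one, so it is not an O(1) snapshot.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FICLONE */

/* Hypothetical helper: create dst_path as a reflink of src_path, i.e. a
 * new file that shares src_path's data extents instead of copying them.
 * Refuses to overwrite an existing dst_path (O_EXCL).
 * Returns 0 on success, -1 with errno set on failure, for example
 * EOPNOTSUPP if the filesystem cannot reflink, or EXDEV if the two
 * paths are on different filesystems/mounts. */
int reflink_file(const char *src_path, const char *dst_path)
{
    int src = open(src_path, O_RDONLY);
    if (src < 0)
        return -1;

    int dst = open(dst_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (dst < 0) {
        int saved = errno;
        close(src);
        errno = saved;
        return -1;
    }

    /* One ioctl from userspace; the kernel does the extent remapping,
     * so the cost roughly grows with the number of source extents. */
    int ret = ioctl(dst, FICLONE, src);
    int saved = errno;
    close(src);
    close(dst);
    if (ret < 0) {
        unlink(dst_path);   /* don't leave an empty placeholder behind */
        errno = saved;
        return -1;
    }
    return 0;
}
```

A call such as `reflink_file("vm.img", "vm-clone.img")` either produces a clone or fails (for example with `EOPNOTSUPP` on a filesystem without reflink support); unlike `cp --reflink=auto`, it never falls back to a byte-by-byte copy.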

But your question about quiescing the filesystem and your question
about the *atomic* nature of the clone operation are two very different
questions.

What you seem to *think* xfs reflink does, it does not actually do.
xfs reflink does NOT reflink the file's in-memory data.
xfs reflink "only" reflinks the file's on-disk data.
Right now, if you write a large file without fsync and clone it, you
might just as well get a clone of an unallocated or partly allocated file,
with zeroes or stale data.

Going forward, I think there is an intention to "clone" the file's in-memory
data as well, by sharing the read-only cache pages between cloned files,
but I don't think dirty pages are going to be shared between clones anyway,
so you are back to square one: you need to get the data on-disk before
cloning the file.
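The practical rule that follows is an ordering: make the data stable first, clone second. A minimal sketch of that sequence, assuming Linux with `FICLONE` from `<linux/fs.h>`; the helper `flush_then_clone` and its parameters are hypothetical. The explicit fsync(2) calls are a conservative choice that does not depend on what the clone ioctl does or does not flush internally. This covers a single file only and does not replace quiescing an application that still has writes in flight.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FICLONE */

/* Hypothetical helper: create `name` inside directory dir_fd as a
 * reflink of src_fd, flushing before and after the clone.
 * Returns 0 on success, -1 with errno set on failure. */
int flush_then_clone(int src_fd, int dir_fd, const char *name)
{
    /* 1. Write back the source's dirty pages and metadata so its on-disk
     *    extents hold the data we intend to share. */
    if (fsync(src_fd) < 0)
        return -1;

    /* O_EXCL: never clobber an existing file of the same name. */
    int dst = openat(dir_fd, name, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (dst < 0)
        return -1;

    /* 2. Share the source's extents with the new file, then
     * 3. fsync the new file so its extent mapping is persisted too. */
    if (ioctl(dst, FICLONE, src_fd) < 0 || fsync(dst) < 0) {
        int saved = errno;
        close(dst);
        unlinkat(dir_fd, name, 0);   /* remove the half-made clone */
        errno = saved;
        return -1;
    }
    if (close(dst) < 0)
        return -1;

    /* 4. Persist the directory entry that names the clone. */
    return fsync(dir_fd);
}
```

Passing a directory descriptor rather than a path lets the sketch fsync the directory as well, so the clone's name is as durable as its contents.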

Cheers,
Amir.


Thread overview: 24+ messages
2018-02-24 18:20 Reflink (cow) copy of busy files Gionatan Danti
2018-02-24 22:07 ` Dave Chinner
2018-02-24 22:57   ` Gionatan Danti
2018-02-25  2:47     ` Dave Chinner
2018-02-25 11:40       ` Gionatan Danti
2018-02-25 21:13         ` Dave Chinner
2018-02-25 21:58           ` Gionatan Danti
2018-02-26  0:25             ` Dave Chinner
2018-02-26  7:19               ` Gionatan Danti
2018-02-26  7:58                 ` Amir Goldstein [this message]
2018-02-26  8:26                   ` Gionatan Danti
2018-02-26 17:26                     ` Darrick J. Wong
2018-02-26 21:23                       ` Gionatan Danti
2018-02-26 21:31                         ` Darrick J. Wong
2018-02-26 21:39                           ` Gionatan Danti
2018-02-27  0:33                       ` Dave Chinner
2018-02-27  0:58                         ` Darrick J. Wong
2018-02-27  8:06                         ` Gionatan Danti
2018-02-27 22:04                           ` Dave Chinner
2018-02-28  7:08                             ` Gionatan Danti
2018-02-28 17:07                               ` Darrick J. Wong
2018-02-28 18:27                                 ` Gionatan Danti
2018-02-26 20:29                     ` Amir Goldstein
2018-02-26 21:28                       ` Gionatan Danti
