From: Gionatan Danti <g.danti@assyoma.it>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	g.danti@assyoma.it
Subject: Re: Reflink (cow) copy of busy files
Date: Mon, 26 Feb 2018 09:26:14 +0100
Message-ID: <73ef828837683073feb1c49e3405dd6a@assyoma.it>
In-Reply-To: <CAOQ4uxixAP6F8f6hT5VENQLD0o_3Bym6pkAJhXva5b99BhE9Eg@mail.gmail.com>

Hi Amir,

Il 26-02-2018 08:58 Amir Goldstein ha scritto:
> 
> Gionatan,
> 
> First of all, the answer to your question is "just" faster copy.
> Reflinking a file is much faster than copying it, but it is not O(1).
> I believe cp --reflink can result in cloning only part of the file if
> the system crashes mid-operation, so in any case, the operation is not
> *atomic* in that sense.
> 
> But your question about quiescing the filesystem and your question
> about the *atomic* nature of the clone operation are two very different
> questions.

Can this result in out-of-order writes from the cloned file's point of 
view? I mean:
- take a 10-extent file;
- a vm/db/whatever is writing to the file;
- a cp --reflink is executed;
- extents are cloned one by one; extents 1-4 are already cloned and 5 is 
in progress;
- the vm/db writes to extent 1 - this write will *not* be present in 
the cloned file;
- the application writes to extent 6, which will be cloned shortly;
- the cloned file ends up with the later write to extent 6 but not the 
earlier one to extent 1;
- bad things happen!

If the above is true, then cp --reflink can't be used even for 
relaxed-consistency backups/clones.
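
The interleaving asked about above can be written down as a toy model. This is only a sketch of the hypothetical in the question: it assumes extents are copied one at a time with no locking against concurrent writers, which is exactly the point being asked and not a statement about what XFS does. The function name and the `writes_during_clone` mapping are made up for illustration.

```python
def clone_extent_by_extent(source, writes_during_clone):
    """Toy model: copy extents one at a time while writes land in between.

    writes_during_clone maps "number of extents already cloned" to a list
    of (extent_index, new_value) writes applied to the source at that point.
    This models the question only; it is not how any real filesystem works.
    """
    clone = []
    for i in range(len(source)):
        # Apply the writes that arrive just before extent i is cloned.
        for idx, value in writes_during_clone.get(i, []):
            source[idx] = value
        clone.append(source[i])
    return clone


# A 10-extent file, extents numbered 1..10 (list indexes 0..9).
source = [f"old{i}" for i in range(1, 11)]

# Extents 1-4 are cloned and 5 is in progress when the application
# writes extent 1 first and extent 6 afterwards.
writes = {4: [(0, "new1"), (5, "new6")]}

clone = clone_extent_by_extent(source, writes)
print(clone[0], clone[5])
```

In this model the clone holds the later write to extent 6 but not the earlier write to extent 1, which is the reordering the question is worried about.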

> What you seem to *think* xfs reflink does, it does not actually do.
> xfs reflink does NOT reflink the file in-memory data.
> xfs reflink "only" reflinks the file on-disk data.
> Right now, if you write a large file without fsync and clone it, you
> might as well get a clone of unallocated or partly fallocated file with
> zero or stale data.

Oh, I absolutely do not expect reflink/clone to work on in-memory 
data. I *surely* expect dirty, uncommitted data to be lost: this is 
the very reason I wrote about crash-consistent backups.

In short: is cloning/reflinking the same as "pulling the plug" for the 
cloned file? I mean:
- a successful clone (so, one that was not interrupted and did not 
crash) is akin to an atomic process for the cloned file;
- async writes/dirty data are lost;
- fsynced writes are preserved;
- writes are not reordered/committed out of order.
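
A minimal sketch of "get the data on disk, then clone", following the advice quoted in this thread. It assumes Linux, where the clone request is the FICLONE ioctl (the same operation cp --reflink relies on); the helper name clone_after_fsync is made up for illustration. It says nothing about writes that arrive while the clone is running, and it only flushes what the writer has already handed to the kernel, not data still buffered inside the application.

```python
import fcntl
import os

# FICLONE from <linux/fs.h>: _IOW(0x94, 9, int)
FICLONE = 0x40049409


def clone_after_fsync(src_path, dst_path):
    """fsync the source, then ask the kernel to reflink it into dst_path.

    Returns True if the clone succeeded, False if the ioctl failed
    (for example because the filesystem has no reflink support).
    """
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        # Flush the file's dirty page-cache data to disk before cloning.
        os.fsync(src_fd)
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            fcntl.ioctl(dst_fd, FICLONE, src_fd)
            return True
        except OSError:
            # No reflink here: remove the empty destination file.
            os.unlink(dst_path)
            return False
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```

A call such as clone_after_fsync("vm.img", "vm.img.clone") (hypothetical file names) would give, at best, the crash-consistent image described in the list above; anything stronger still needs the application to be quiesced.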

Maybe the entire discussion is skewed by the fact that, in some cases, I 
am willing to relax my consistency model to include a crash-consistent 
backup option. The fact is, in the virtualization world there are many 
backup utilities/applications which *use* this model, and I wondered if 
a cp --reflink would give similar results without the hassle.

Maybe the entire crash-vs-application consistency topic is out of place 
on a filesystem mailing list, where you (rightfully!!!) strive for 
perfect/maximum data consistency (and I *really* appreciate that). 
However, given the recent reflink work on XFS, I wonder if I can 
put this to "good use" once it is considered stable.

> Going forward, I think there is an intention to "clone" the file
> in-memory data as well by sharing the READONLY cache pages between
> cloned files, but I don't think dirty pages are going to be shared
> between clones anyway, so you are back to square one - need to get
> the data on-disk before cloning the file.

Great - I think this would do wonders for cache efficiency...

> 
> Cheers,
> Amir.

Thanks.

PS: sorry if I keep rephrasing the question in different terms. English 
is not my primary language, please bear with me :p

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
