linux-xfs.vger.kernel.org archive mirror
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Gionatan Danti <g.danti@assyoma.it>
Cc: Amir Goldstein <amir73il@gmail.com>,
	Dave Chinner <david@fromorbit.com>,
	linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: Reflink (cow) copy of busy files
Date: Mon, 26 Feb 2018 13:31:02 -0800
Message-ID: <20180226213102.GD19312@magnolia>
In-Reply-To: <2f48d103c0a6eff6c0a1136057e828b6@assyoma.it>

On Mon, Feb 26, 2018 at 10:23:45PM +0100, Gionatan Danti wrote:
> On 26-02-2018 18:26 Darrick J. Wong wrote:
> >The way reflink is supposed to work wrt consistency is:
> >
> >1. lock out all new io/fallocate activity on both inodes (iolock/mmaplock)
> >2. wait for all directio to complete
> >3. fsync both files (write all the dirty pagecache to disk)
> >4. lock both inodes (ilock)
> >5. clone each extent atomically
> >6. unlock ilock
> >7. unlock iolock/mmaplock
> >
> >So at least in theory the cloned file will match whatever the host saw
> >on disk and page cache at the time the reflink call was initiated.
> >I say 'in theory' because there could be bugs.
> 
> Great! CoW will be a great addition to XFS once it is considered
> stable.
> 
> >Whatever dirty state is in the guest VM stays in that VM, which means
> >that if you only cp --reflink on the host, the clone you get will
> >reflect the virtual disk state as if you'd kill -9'd the VM, cloned the
> >VM disk, and restarted the VM.  Upon restart the log recovers whatever
> >metadata made it out of the VM.
> 
> Sure, that is what I mean by "crash-consistent".
> 
> >However, if you tell the guest to freeze the fs before cloning (as Dave
> >suggested earlier) the guest will flush all its state to the upper level
> >(the host) and the host will push all that out to disk before cloning.
> >The snapshot you create should be cleaner because you're effectively
> >prepaying the recovery costs by flushing everything before taking the
> >snapshot.
> 
> True, and this is "application-level consistency" (which requires a guest
> agent and possibly even an application-specific agent).

I believe qemu-ga takes care of guest fs freeze inside the guest,
and you can invoke it from the host via 'virsh domfsfreeze' or the
--quiesce argument to snapshot-create... but you ought to confirm that
for yourself.
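
As a rough illustration of the freeze / clone / thaw sequence, here is a
minimal sketch, not a tested procedure. It assumes a libvirt guest with
qemu-ga running inside it and a disk image on a reflink-capable filesystem.
The helper names (run, snapshot_guest_disk), the DRY_RUN switch and the
domain and image arguments are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Sketch: freeze the guest filesystems, reflink-copy the disk image, thaw.
# Assumes libvirt + qemu-ga and a reflink-capable host filesystem.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_guest_disk DOMAIN SRC_IMAGE DST_IMAGE (all hypothetical names)
snapshot_guest_disk() {
    domain=$1
    src=$2
    dst=$3

    # Ask the guest agent to flush and freeze the guest filesystems.
    run virsh domfsfreeze "$domain" || return 1

    # Clone the image; --reflink=always fails rather than falling back
    # to a full data copy if the filesystem cannot share extents.
    run cp --reflink=always "$src" "$dst"
    rc=$?

    # Thaw even if the clone failed, so the guest is not left frozen.
    run virsh domfsthaw "$domain" || return 1
    return "$rc"
}
```

Setting DRY_RUN=1 and calling "snapshot_guest_disk vm1 /path/a.img
/path/b.img" prints the three commands in order without touching the guest,
which is a cheap way to check the sequence before trying it for real.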

--D

> >Also note that if the host goes down before returning from the syscall,
> >the log will continue on with whichever extent was being cloned at the
> >time in order to preserve metadata integrity, but the destination file
> >will reflect a partial copy.
> 
> Thanks for pointing that out, and for your extremely clear explanation!
> 
> 
> -- 
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.danti@assyoma.it - info@assyoma.it
> GPG public key ID: FF5F32A8
