From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Gionatan Danti <g.danti@assyoma.it>
Cc: Amir Goldstein <amir73il@gmail.com>,
Dave Chinner <david@fromorbit.com>,
linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: Reflink (cow) copy of busy files
Date: Mon, 26 Feb 2018 13:31:02 -0800
Message-ID: <20180226213102.GD19312@magnolia>
In-Reply-To: <2f48d103c0a6eff6c0a1136057e828b6@assyoma.it>
On Mon, Feb 26, 2018 at 10:23:45PM +0100, Gionatan Danti wrote:
> On 26-02-2018 18:26, Darrick J. Wong wrote:
> >The way reflink is supposed to work wrt consistency is:
> >
> >1. lock out all new io/fallocate activity on both inodes (iolock/mmaplock)
> >2. wait for all directio to complete
> >3. fsync both files (write all the dirty pagecache to disk)
> >4. lock both inodes (ilock)
> >5. clone each extent atomically
> >6. unlock ilock
> >7. unlock iolock/mmaplock
> >
> >So at least in theory the cloned file will match whatever the host saw
> >on disk and page cache at the time the reflink call was initiated.
> >I say 'in theory' because there could be bugs.
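For reference, all of the steps quoted above happen inside the kernel
once userspace issues a single clone ioctl; the caller only supplies two
file descriptors. A minimal sketch in Python, assuming Linux and assuming
the FICLONE request number for x86-64/arm64 (it may differ on other
architectures). The helper name reflink_or_copy and its fallback to a
plain copy are my own illustration, not anything XFS provides:

```python
import fcntl
import shutil

# FICLONE = _IOW(0x94, 9, int). This numeric value is an assumption that
# holds on x86-64 and arm64; check linux/fs.h on other architectures.
FICLONE = 0x40049409


def reflink_or_copy(src_path, dst_path):
    """Clone src into dst with FICLONE; fall back to a plain copy.

    Returns True if the extents were shared, False if data was copied.
    (Hypothetical helper, for illustration only.)
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # The kernel does the locking, flushing and extent cloning;
            # userspace just names the destination and source fds.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            # Typically: no reflink support, or files on different
            # filesystems. A real tool should inspect errno instead of
            # treating every failure the same way.
            shutil.copyfileobj(src, dst)
            return False
```

This is roughly what cp --reflink=auto does: try the clone, and copy
the data if the filesystem refuses.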
>
> Great! CoW will be a great addition to XFS once it is considered
> stable.
>
> >Whatever dirty state is in the guest VM stays in that VM, which means
> >that if you only cp --reflink on the host, the clone you get will
> >reflect the virtual disk state as if you'd kill -9'd the VM, cloned the
> >VM disk, and restarted the VM. Upon restart the log recovers whatever
> >metadata made it out of the VM.
>
> Sure, that is what I mean by "crash-consistent".
>
> >However, if you tell the guest to freeze the fs before cloning (as Dave
> >suggested earlier) the guest will flush all its state to the upper level
> >(the host) and the host will push all that out to disk before cloning.
> >The snapshot you create should be cleaner because you're effectively
> >prepaying the recovery costs by flushing everything before taking the
> >snapshot.
>
> True, and this is "application-level consistency" (which requires a guest
> agent and possibly even an application-specific agent)
I believe qemu-ga takes care of guest fs freeze inside the guest,
and you can invoke it from the host via 'virsh domfsfreeze' or the
--quiesce argument to snapshot-create... but you ought to confirm that
for yourself.
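
An untested sketch of the freeze / clone / thaw sequence as driven from
the host, in Python. It assumes the guest runs qemu-ga, that virsh and
GNU cp are on the host's PATH, and that the disk image lives on a
reflink-capable filesystem. The function name and the injectable "run"
parameter are hypothetical, added only so the ordering can be checked
without a real VM:

```python
import subprocess


def snapshot_disk_quiesced(domain, src, dst, run=subprocess.run):
    """Freeze guest filesystems, reflink the disk image, then thaw.

    Hypothetical helper: domain is the libvirt domain name, src/dst are
    host paths to the disk image and its clone.
    """
    # Ask the guest agent to flush and freeze the guest filesystems.
    run(["virsh", "domfsfreeze", domain], check=True)
    try:
        # Fail rather than silently fall back to a full data copy.
        run(["cp", "--reflink=always", src, dst], check=True)
    finally:
        # Always thaw, even if the copy failed, so the guest is not
        # left frozen.
        run(["virsh", "domfsthaw", domain], check=True)
```

The try/finally is the important part: a guest that stays frozen
because the copy step threw is worse than a missing snapshot.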
--D
> >Also note that if the host goes down before returning from the syscall,
> >the log will continue on with whichever extent was being cloned at the
> >time in order to preserve metadata integrity, but the destination file
> >will reflect a partial copy.
>
> Thanks for pointing that out, and for your extremely clear explanation!
>
>
> --
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.danti@assyoma.it - info@assyoma.it
> GPG public key ID: FF5F32A8