From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>,
	lsf-pc@lists.linux-foundation.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [LSF/MM TOPIC] Lazy file reflink
Date: Tue, 29 Jan 2019 08:26:43 +1100
Message-ID: <20190128212642.GQ4205@dastard>
In-Reply-To: <20190128125044.GC27972@quack2.suse.cz>

On Mon, Jan 28, 2019 at 01:50:44PM +0100, Jan Kara wrote:
> Hi,
> 
> On Fri 25-01-19 16:27:52, Amir Goldstein wrote:
> > I would like to discuss the concept of lazy file reflink.
> > The use case is backup of a very large read-mostly file.
> > Backup application would like to read consistent content from the
> > file, an "atomic read", so to speak.
> > 
> > With filesystem that supports reflink, that can be done by:
> > - Create O_TMPFILE
> > - Reflink origin to temp file
> > - Backup from temp file
> >
> > However, since the origin file is very likely not to be modified,
> > the reflink step, that may incur lots of metadata updates, is a waste.
> > Instead, if filesystem could be notified that atomic content was
> > requested (O_ATOMIC|O_RDONLY or O_CLONE|O_RDONLY),
> > filesystem could defer reflink to an O_TMPFILE until origin file is
> > open for write or actually modified.

That makes me want to run screaming for the hills.

> > What I just described above is actually already implemented with
> > Overlayfs snapshots [1], but for many applications overlayfs snapshots
> > are not a practical solution.
> > 
> > I have based my assumption that reflink of a large file may incur
> > lots of metadata updates on my limited knowledge of xfs reflink
> > implementation, but perhaps it is not the case for other filesystems?

Comparatively speaking, reflink is cheap on any filesystem that
implements it when the alternative is copying a large file. Sure,
reflinking on XFS is CPU limited, IIRC, to ~10,000-20,000 extents
per second per reflink op per AG, but that's still faster than
copying 10,000-20,000 extents per second per copy op on all but the
very fastest, unloaded NVMe SSDs...
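
For reference, the reflink-based backup sequence quoted above
(O_TMPFILE, reflink the origin into it, read the backup from the temp
file) can be sketched roughly as follows. This is only a minimal
illustration: reflink_snapshot() is a made-up helper name, and it
assumes tmp_dir lives on the same reflink-capable filesystem as the
origin file.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for O_TMPFILE with glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>           /* FICLONE */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not an existing API: returns an fd for an
 * unnamed temp file sharing all extents of src_path, or -1 with errno
 * set. The caller reads the backup from the returned fd and closes it,
 * at which point the unnamed file goes away.
 */
int reflink_snapshot(const char *src_path, const char *tmp_dir)
{
    assert(src_path && tmp_dir);

    int src = open(src_path, O_RDONLY);
    if (src < 0)
        return -1;

    /* O_TMPFILE must be opened for writing; FICLONE needs that too. */
    int tmp = open(tmp_dir, O_TMPFILE | O_RDWR, 0600);
    if (tmp < 0) {
        int saved = errno;
        close(src);
        errno = saved;
        return -1;
    }

    /* Share the origin's extents with the temp file (the reflink). */
    if (ioctl(tmp, FICLONE, src) < 0) {
        int saved = errno;
        close(tmp);
        close(src);
        errno = saved;
        return -1;
    }

    close(src);
    return tmp;
}
```

The FICLONE step is where the per-extent metadata cost discussed here
is paid, whether or not the origin is ever modified afterwards.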

> > (btrfs?) and perhaps the current metadata overhead on reflink of a large
> > file is an implementation detail that could be optimized in the future?
> > 
> > The point of the matter is that there is no API to make an explicit
> > request for a "volatile reflink" that does not need to survive power
> > failure and that limits the ability of filesystems to optimize this case.
> 
> Well, to me this seems like a relatively rare usecase (and performance
> gain) for the complexity. Also the speed of reflink is fs dependent - e.g.
> for btrfs it is rather cheap AFAIK.

I suspect for "very large read-mostly file" it's still an expensive
operation on btrfs.

Really, though, for this use case it'd make more sense to have "per
file freeze" semantics. i.e. if you want a consistent backup image
on snapshot capable storage, the process is usually "freeze
filesystem, snapshot fs, unfreeze fs, do backup from snapshot,
remove snapshot". We can already transparently block incoming
writes/modifications on files via the freeze mechanism, so why not
just extend that to per-file granularity so writes to the "very
large read-mostly file" block while it's being backed up....

Indeed, this would probably only require a simple extension to
FIFREEZE/FITHAW - the parameter is currently ignored, but as defined
by XFS it was a "freeze level". Set this to 0xffffffff and then it
freezes just the fd passed in, not the whole filesystem.
Alternatively, FI_FREEZE_FILE/FI_THAW_FILE is simple to define...
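
To make the starting point concrete, here is a minimal sketch of the
existing whole-filesystem FIFREEZE/FITHAW interface that such an
extension would build on. freeze_fs() and thaw_fs() are made-up helper
names. The per-file behaviour (a 0xffffffff argument, or
FI_FREEZE_FILE/FI_THAW_FILE) is only an idea floated in this mail and
is not shown as working code; today these calls freeze the whole
filesystem and need CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>           /* FIFREEZE, FITHAW */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: freeze the filesystem containing mount_path.
 * Returns an fd to pass to thaw_fs() on success, -1 with errno set on
 * failure.
 */
int freeze_fs(const char *mount_path)
{
    assert(mount_path);

    int fd = open(mount_path, O_RDONLY);
    if (fd < 0)
        return -1;

    /*
     * The third argument is currently ignored. The proposal in this
     * mail is that a value such as 0xffffffff could mean "freeze only
     * this fd"; that is NOT implemented, so 0 is passed here.
     */
    if (ioctl(fd, FIFREEZE, 0) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: thaw what freeze_fs() froze and close the fd. */
int thaw_fs(int fd)
{
    int ret = ioctl(fd, FITHAW, 0);
    int saved = errno;

    close(fd);
    errno = saved;
    return ret < 0 ? -1 : 0;
}
```

A backup tool would call freeze_fs(), take the snapshot, then call
thaw_fs() right away, keeping the frozen window as short as possible.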

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
