linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "bfields@fieldses.org" <bfields@fieldses.org>
To: Trond Myklebust <trondmy@primarydata.com>
Cc: "hch@infradead.org" <hch@infradead.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"kolga@netapp.com" <kolga@netapp.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
Date: Thu, 9 Mar 2017 10:29:48 -0500	[thread overview]
Message-ID: <20170309152948.GB3929@fieldses.org> (raw)
In-Reply-To: <1489006194.3098.12.camel@primarydata.com>

On Wed, Mar 08, 2017 at 08:49:56PM +0000, Trond Myklebust wrote:
> On Wed, 2017-03-08 at 15:32 -0500, bfields@fieldses.org wrote:
> > On Wed, Mar 08, 2017 at 08:18:31PM +0000, Trond Myklebust wrote:
> > > On Wed, 2017-03-08 at 15:00 -0500, Olga Kornievskaia wrote:
> > > > > On Mar 8, 2017, at 2:53 PM, J. Bruce Fields <bfields@fieldses.o
> > > > > rg>
> > > > > wrote:
> > > > > 
> > > > > On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia
> > > > > wrote:
> > > > > > 
> > > > > > > On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infrade
> > > > > > > ad.o
> > > > > > > rg>
> > > > > > > wrote:
> > > > > > > 
> > > > > > > On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields
> > > > > > > wrote:
> > > > > > > > Since copy isn't atomic that check is never going to be
> > > > > > > > reliable.
> > > > > > > 
> > > > > > > That's true for everything that COPY does.  By that logic
> > > > > > > we
> > > > > > > should
> > > > > > > not implement it at all (a logic that I'd fully support)
> > > > > > 
> > > > > > If you were to only keep CLONE then you’d lose a huge
> > > > > > performance
> > > > > > gain
> > > > > > you get from server-to-server COPY. 
> > > > > 
> > > > > Yes.  Also, I think copy-like copy implementations have
> > > > > reasonable
> > > > > semantics that are basically the same as read:
> > > > > 
> > > > > 	- copy can return successfully with less copied than
> > > > > requested.
> > > > > 	- it's fine for the copied range to start and/or end
> > > > > past end
> > > > > of
> > > > > 	  file, it'll just return a short read.
> > > > > 	- A copy of more than 0 bytes returning 0 means you're
> > > > > at end
> > > > > of
> > > > > 	  file.
> > > > > 
> > > > > The particular problem here is that that doesn't fit how clone
> > > > > works at
> > > > > all.
> > > > > 
> > > > > It feels like what happened is that copy_file_range() was made
> > > > > mainly
> > > > > for the clone case, with the idea that copy might be
> > > > > reluctantly
> > > > > accepted as a second-class implementation.
> > > 
> > > Historically? No... Christoph added clone as a valid implementation
> > > of
> > > copy_file_range() almost a year after Zach and Anna defined the
> > > semantics of vfs_copy_file_range(). git blame is your friend...
> > 
> > Yeah, I know.  It still feels to me like the interface was originally
> > designed with clone in mind, but that's my vague impression from the
> > man
> > pages and half-remembered conversations.
> > 
> > Though the lack of a "just copy the whole file regardless of size"
> > case
> > is weird for clone.  All you can do is stat the file and then hope it
> > doesn't change before you issue the copy_file_range.  But I'd think
> > it'd
> > be easy for an atomic clone implementation to handle, say, getting a
> > snapshot of a log file while it's getting continuously appended to.
> 
> It really isn't that interesting in the continuously appended case
> (what difference does it make if you only get data from just a few
> moments ago), but I can see it being an issue in the case of random
> writes where the file size is being extended.

Bah, yes, apologies for the bad example.

> The thing is that in both those cases, the copy_file_range() semantics
> are worse, since they don't even guarantee a time-consistent copy.
>
> > > > > But the performance gain of copy offload is too big to just
> > > > > ignore,
> > > > > and
> > > > > in fact it's what copy_file_range does on every filesystem but
> > > > > btrfs and
> > > > > ocfs2 (and maybe cifs?), so I don't think we can just ignore
> > > > > it.
> > > > > 
> > > > > If we had separate copy_file_range and clone_file_range, I
> > > > > *think*
> > > > > it
> > > > > could all be made sensible.  Am I missing something?
> > > > > 
> > > > 
> > > > How would the application (cp) know when to call the
> > > > clone_file_range
> > > > and when to call copy_file_range?
> > > 
> > > cp can probably call copy_file_range(), but any application that
> > > needs
> > > atomic semantics (i.e. a binary operation success/fail) must call
> > > clone_file_range().
> > 
> > I don't believe there's a clone_file_range().  I see the vfs
> > interface,
> > but no system call.
> 
> There is a standard FICLONERANGE ioctl() that can be used on all
> filesystems that support the vfs interface.

Oh, thanks, I forgot about that.

So I don't understand why it needed to be added to copy_file_range().
The copy and clone semantics are different enough that I think callers
want to know which they're getting.

> > And implementing a simple cp is harder than it should be when you
> > don't
> > know whether it's implemented as copy or clone.  You have to stat for
> > the file size first, retry if you got it wrong, and also retry if you
> > get a short read.  The example in the clone_file_range() man page is
> > incomplete.
> 
> As I said, you shouldn't be using copy_file_range() either in the case
> where the file is being modified.

Don't we want it to have more or less the same behavior as a read-write
loop?  People are probably running backup programs that depend on just
simple copies, and maybe the results are good enough for their purposes,
or maybe they're actually corrupting parts of their backups and don't
know, but we can't suddenly start aborting their backups with errors and
tell users it's for their own good.  So copy_file_range() callers will
need to handle EINVAL on changing files somehow.

--b.

  reply	other threads:[~2017-03-09 15:30 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 01/19] fs: Don't copy beyond the end of the file Olga Kornievskaia
2017-03-02 16:22   ` Christoph Hellwig
2017-03-02 16:34     ` Olga Kornievskaia
2017-03-03 20:47     ` J. Bruce Fields
2017-03-03 21:08       ` Olga Kornievskaia
2017-03-03 21:32         ` J. Bruce Fields
     [not found]           ` <B3F80DA0-B4F8-4628-88C5-E5C047620F17@netapp.com>
2017-03-04  2:10             ` J. Bruce Fields
2017-03-06 16:27               ` [RFC v1 01/19] " Olga Kornievskaia
2017-03-06 19:09                 ` J. Bruce Fields
2017-03-06 19:23                 ` J. Bruce Fields
2017-03-07 14:18                   ` Olga Kornievskaia
2017-03-07 23:40       ` [RFC v1 01/19] fs: " Christoph Hellwig
2017-03-08 17:05         ` J. Bruce Fields
2017-03-08 17:25           ` Christoph Hellwig
2017-03-08 17:32             ` Olga Kornievskaia
2017-03-08 19:53               ` J. Bruce Fields
2017-03-08 20:00                 ` Olga Kornievskaia
2017-03-08 20:18                   ` J. Bruce Fields
2017-03-08 20:18                   ` Trond Myklebust
2017-03-08 20:32                     ` bfields
2017-03-08 20:49                       ` Trond Myklebust
2017-03-09 15:29                         ` bfields [this message]
2017-03-09 15:35                           ` hch
2017-03-09 16:16                             ` bfields
2017-03-09 16:17                               ` hch
2017-03-09 17:28                                 ` Olga Kornievskaia
2017-03-09 18:40                                   ` bfields
2017-03-09 21:55                                   ` hch
2017-03-09 17:35                               ` Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 02/19] VFS permit cross device vfs_copy_file_range Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 03/19] VFS don't try clone if superblocks are different Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 04/19] NFS inter ssc open Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 05/19] NFS add COPY_NOTIFY operation Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 06/19] NFS add ca_source_server<> to COPY Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 07/19] NFS CB_OFFLOAD xdr Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 08/19] NFS OFFLOAD_STATUS xdr Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 09/19] NFS OFFLOAD_STATUS op Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 10/19] NFS OFFLOAD_CANCEL xdr Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 11/19] NFS COPY xdr handle async reply Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 12/19] NFS add support for asynchronous COPY Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 13/19] NFS handle COPY reply CB_OFFLOAD call race Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 14/19] NFS send OFFLOAD_CANCEL when COPY killed Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 15/19] NFS make COPY synchronous xdr configurable Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 16/19] NFS handle COPY ERR_OFFLOAD_NO_REQS Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 17/19] NFS skip recovery of copy open on dest server Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 18/19] NFS recover from destination server reboot for copies Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 19/19] NFS if we got partial copy ignore errors Olga Kornievskaia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170309152948.GB3929@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=hch@infradead.org \
    --cc=kolga@netapp.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@primarydata.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).