From: Eric Wong <normalperson@yhbt.net>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: Zach Brown <zab@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>,
	Ric Wheeler <rwheeler@redhat.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Chris L. Mason" <clmason@fusionio.com>,
	Christoph Hellwig <hch@infradead.org>,
	Alexander Viro <aviro@redhat.com>,
	"Martin K. Petersen" <mkp@mkp.net>,
	Hannes Reinecke <hare@suse.de>, Joel Becker <jlbec@evilplan.org>
Subject: Re: New copyfile system call - discuss before LSF?
Date: Sat, 23 Feb 2013 00:32:39 +0000
Message-ID: <20130223003239.GA1469@dcvr.yhbt.net>
In-Reply-To: <4FA345DA4F4AE44899BD2B03EEEC2FA9235DB089@SACEXCMBX04-PRD.hq.netapp.com>

"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > -----Original Message-----
> > From: Zach Brown [mailto:zab@redhat.com]
> > Sent: Thursday, February 21, 2013 5:25 PM
> > To: Myklebust, Trond
> > Cc: Paolo Bonzini; Ric Wheeler; Linux FS Devel; linux-kernel@vger.kernel.org;
> > Chris L. Mason; Christoph Hellwig; Alexander Viro; Martin K. Petersen;
> > Hannes Reinecke; Joel Becker
> > Subject: Re: New copyfile system call - discuss before LSF?
> > 
> > On Thu, Feb 21, 2013 at 08:50:27PM +0000, Myklebust, Trond wrote:
> > > On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> > > > Il 21/02/2013 15:57, Ric Wheeler ha scritto:
> > > > >>>
> > > > >> sendfile64() pretty much already has the right arguments for a
> > > > >> "copyfile", however it would be nice to add a 'flags' parameter:
> > > > >> the NFSv4.2 version would use that to specify whether or not to
> > > > >> copy file metadata.
> > > > >
> > > > > That would seem to be enough to me and has the advantage that it
> > > > > is a relatively obvious extension to something that is at least
> > > > > not totally unknown to developers.
> > > > >
> > > > > Do we need more than that for non-NFS paths I wonder? What does
> > > > > reflink need or the SCSI mechanism?
> > > >
> > > > For virt we would like to be able to specify arbitrary block ranges.
> > > > Copying an entire file helps some copy operations like storage
> > > > migration.  However, it is not enough to convert the guest's
> > > > offloaded copies to host-side offloaded copies.
> > >
> > > So how would a system call based on sendfile64() plus my flag
> > > parameter prevent an underlying implementation from meeting your
> > > criterion?
> > 
> > If I'm guessing correctly, sendfile64()+flags would be annoying because it's
> > missing an out_fd_offset.  The host will want to offload the guest's copies by
> > calling sendfile on block ranges of a guest disk image file that correspond to
> > the mappings of the in and out files in the guest.
> > 
> > You could make it work with some locking and out_fd seeking to set the
> > write offset before calling sendfile64()+flags, but ugh.
> > 
> >  ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
> >                   out_offset, size_t count, int flags);
> > 
> > That seems closer.
> 
> psendfile() ?
> 
> I fully agree that sounds reasonable... Just being an ass. :-)

splice() already has offset for both fds and a flags arg:

       ssize_t splice(int fd_in, loff_t *off_in, int fd_out,
                      loff_t *off_out, size_t len, unsigned int flags);

The current downside is it requires one fd to be a pipe, so it's
just not very easy to use from my perspective[1].
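
To make the pipe requirement concrete, here is a rough sketch of what a
file-to-file copy with explicit offsets on both sides looks like with
splice() today. splice_copy() is a made-up helper name, not an existing
API, and the error handling is minimal; it also assumes out_fd was not
opened with O_APPEND.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes from in_fd at in_off to out_fd at
 * out_off by bouncing the data through a temporary pipe.
 * Returns the number of bytes copied, or -1 on error.
 */
ssize_t splice_copy(int in_fd, loff_t in_off,
                    int out_fd, loff_t out_off, size_t len)
{
    int pfd[2];
    ssize_t total = 0;

    if (pipe(pfd) < 0)
        return -1;

    while (len > 0) {
        /* file -> pipe: moves at most one pipe's worth per call */
        ssize_t n = splice(in_fd, &in_off, pfd[1], NULL, len,
                           SPLICE_F_MOVE);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)     /* EOF on in_fd */
            break;
        len -= (size_t)n;

        /* pipe -> file: drain everything we just queued */
        while (n > 0) {
            ssize_t w = splice(pfd[0], NULL, out_fd, &out_off,
                               (size_t)n, SPLICE_F_MOVE);
            if (w <= 0) {
                total = -1;
                goto out;
            }
            n -= w;
            total += w;
        }
    }
out:
    close(pfd[0]);
    close(pfd[1]);
    return total;
}
```

Every chunk costs two syscalls plus the pipe setup and teardown, and the
chunk size is capped by the pipe capacity.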

> > We might also want to pre-emptively offer iovs instead of offsets, because
> > that's the very first thing that's going to be requested after people prototype
> > having to iterate calling sendfile() for each contiguous copy region.
> 
> vpsendfile() then? I agree that might be a little more future-proof.
> Particularly given that the underlying protocols tend to be fully
> asynchronous, and so it makes sense to queue up more than one copy at
> a time...

splicev() might be nice to have in that case, too.
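
For comparison, this is roughly the loop userspace has to write with the
existing sendfile() when it wants to copy several discontiguous regions.
struct copy_range and copy_ranges() are hypothetical names used only for
illustration; nothing like them exists in the kernel or libc. The sketch
assumes a kernel where out_fd may be a regular file, and it relies on
nobody else moving out_fd's file position, which is the locking problem
mentioned above.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of one contiguous region to copy. */
struct copy_range {
    off_t  in_off;   /* where to start reading in in_fd */
    off_t  out_off;  /* where to start writing in out_fd */
    size_t len;      /* number of bytes to copy */
};

/* Copy each range in turn. Returns 0 on success, -1 on error. */
int copy_ranges(int in_fd, int out_fd,
                const struct copy_range *r, size_t nr)
{
    for (size_t i = 0; i < nr; i++) {
        off_t in_off = r[i].in_off;
        size_t left = r[i].len;

        /* sendfile() takes no output offset: position out_fd by hand */
        if (lseek(out_fd, r[i].out_off, SEEK_SET) == (off_t)-1)
            return -1;

        while (left > 0) {
            ssize_t n = sendfile(out_fd, in_fd, &in_off, left);
            if (n < 0)
                return -1;
            if (n == 0)     /* in_fd is shorter than the range */
                break;
            left -= (size_t)n;
        }
    }
    return 0;
}
```

That is at least two syscalls per region, which is the overhead an
iov-style interface would fold into a single call.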



[1] my splice() annoyances:
    * need to create/manage a pipe
    * copy size limited by pipe size
    * doesn't reduce userspace syscalls (just data copy overhead)
    * easy to misuse and starve with blocking sockets + big buffers
    * not many users, so bugs creep in (v3.7.8 was the first usable
      version of the 3.7 series for TCP sockets)
