* copy offload support in Linux - new system call needed? @ 2011-12-14 19:22 Ric Wheeler 2011-12-14 19:27 ` Al Viro 2011-12-14 19:59 ` Jeremy Allison 0 siblings, 2 replies; 29+ messages in thread From: Ric Wheeler @ 2011-12-14 19:22 UTC (permalink / raw) To: linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley Back at LinuxCon Prague, we talked about the new NFS and SCSI commands that let us offload copy operations to a storage device (like an NFS server or storage array). This got new life in the virtual machine world where you might want to clone bulky guest files or ranges of blocks and was driven through the standards bodies by vmware, microsoft and some of the major storage vendors. Windows8 has this functionality fully coded and integrated in the GUI, I assume vmware also uses it and there are some vendors who announced support at the SNIA SDC conference. We had an active thread a couple of years back that came out of the reflink work and, at the time, there seemed to be moderately positive support for adding a new system call that would fit this use case (Joel Becker's copyfile()). Can we resurrect this effort? Is copyfile() still a good way to go, or should we look at other hooks? Thanks! Ric ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-14 19:22 copy offload support in Linux - new system call needed? Ric Wheeler @ 2011-12-14 19:27 ` Al Viro 2011-12-14 19:42 ` Ric Wheeler 2011-12-14 19:59 ` Jeremy Allison 1 sibling, 1 reply; 29+ messages in thread From: Al Viro @ 2011-12-14 19:27 UTC (permalink / raw) To: Ric Wheeler Cc: linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > We had an active thread a couple of years back that came out of the > reflink work and, at the time, there seemed to be moderately > positive support for adding a new system call that would fit this > use case (Joel Becker's copyfile()). > > Can we resurrect this effort? Is copyfile() still a good way to go, > or should we look at other hooks? copyfile(2) is probably a good way to go, provided that we do _not_ go baroque as it had happened the last time syscall had been discussed. IOW, to hell with progress reports, etc. - just a fastpath kind of thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). If it works - fine, if not - caller has to be ready to deal with handling cross-device case anyway. ^ permalink raw reply [flat|nested] 29+ messages in thread
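A minimal C sketch of the fastpath-plus-fallback shape Al describes above, in the spirit of rename(2) versus mv(1). The copyfile() call proposed in this thread is not used here. `fast_copy_fn`, `fast_copy_unsupported()` and `copy_with_fallback()` are hypothetical names invented for illustration, and the sketch assumes the fast path is all-or-nothing (on failure it has copied nothing and moved no file offsets).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fast-path hook standing in for the proposed offload call.
 * Assumed contract: returns 0 on success, -1 with errno set on failure,
 * and leaves both descriptors untouched when it fails. */
typedef int (*fast_copy_fn)(int src_fd, int dst_fd);

/* Illustrative stub: a fast path that always refuses, the way rename(2)
 * refuses a cross-device move with EXDEV. */
int fast_copy_unsupported(int src_fd, int dst_fd)
{
    (void)src_fd;
    (void)dst_fd;
    errno = EXDEV;
    return -1;
}

/* The "cp(1)" side of the analogy: a plain read/write loop. */
static int slow_copy(int src_fd, int dst_fd)
{
    char buf[65536];

    for (;;) {
        ssize_t n = read(src_fd, buf, sizeof buf);
        if (n == 0)
            return 0;               /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the read */
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;       /* interrupted, retry the write */
                return -1;
            }
            off += w;               /* handle short writes */
        }
    }
}

/* Try the fast path once; on any failure fall back to the byte copy. */
int copy_with_fallback(int src_fd, int dst_fd, fast_copy_fn fast)
{
    assert(src_fd >= 0 && dst_fd >= 0);
    if (fast != NULL && fast(src_fd, dst_fd) == 0)
        return 0;
    return slow_copy(src_fd, dst_fd);
}
```

The point of the shape is that the caller carries the fallback anyway, so the fast path never needs progress reporting or retry logic of its own.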
* Re: copy offload support in Linux - new system call needed? 2011-12-14 19:27 ` Al Viro @ 2011-12-14 19:42 ` Ric Wheeler 2011-12-14 22:27 ` J. Bruce Fields 2011-12-16 8:00 ` Joel Becker 0 siblings, 2 replies; 29+ messages in thread From: Ric Wheeler @ 2011-12-14 19:42 UTC (permalink / raw) To: Al Viro Cc: linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On 12/14/2011 02:27 PM, Al Viro wrote: > On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > >> We had an active thread a couple of years back that came out of the >> reflink work and, at the time, there seemed to be moderately >> positive support for adding a new system call that would fit this >> use case (Joel Becker's copyfile()). >> >> Can we resurrect this effort? Is copyfile() still a good way to go, >> or should we look at other hooks? > copyfile(2) is probably a good way to go, provided that we do _not_ > go baroque as it had happened the last time syscall had been discussed. > > IOW, to hell with progress reports, etc. - just a fastpath kind of > thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). > If it works - fine, if not - caller has to be ready to deal with handling > cross-device case anyway. I think that this approach makes a lot of sense. Most of the devices/targets that support the copy offload, will do it in very reasonable amounts of time. Let me see if I can dig up some of the presentations from the NetApp guys who presented overviews or the specifications from the IETF and T10.... Ric ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-14 19:42 ` Ric Wheeler @ 2011-12-14 22:27 ` J. Bruce Fields 2011-12-15 14:59 ` Trond Myklebust 2011-12-16 8:00 ` Joel Becker 1 sibling, 1 reply; 29+ messages in thread From: J. Bruce Fields @ 2011-12-14 22:27 UTC (permalink / raw) To: Ric Wheeler Cc: Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley
On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> I think that this approach makes a lot of sense. Most of the
> devices/targets that support the copy offload, will do it in very
> reasonable amounts of time.

The current NFSv4.2 draft rolls both the "fast" and "slow" cases into one operation:

http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2

Perhaps we should ask for separate operations for the two cases. (Or at least a "please don't bother if this is going to take 8 hours" flag....)

--b.

> Let me see if I can dig up some of the presentations from the NetApp
> guys who presented overviews or the specifications from the IETF and
> T10....
>
> Ric

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-14 22:27 ` J. Bruce Fields @ 2011-12-15 14:59 ` Trond Myklebust 2011-12-15 15:52 ` Chris Mason ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 14:59 UTC (permalink / raw) To: J. Bruce Fields Cc: Ric Wheeler, Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley
On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> one operation:
>
> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
>
> Perhaps we should ask for separate operations for the two cases. (Or at
> least a "please don't bother if this is going to take 8 hours" flag....)

How would the server know? I suggest we deal with this by adding an ioctl() to allow the application to poll for progress: I'm assuming now that we don't expect more than 1 copyfile() system call at a time per file descriptor...

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-15 14:59 ` Trond Myklebust @ 2011-12-15 15:52 ` Chris Mason 2011-12-15 16:00 ` Trond Myklebust 2011-12-15 16:03 ` Jeff Layton 2011-12-15 16:08 ` Loke, Chetan 2011-12-15 17:44 ` J. Bruce Fields 2 siblings, 2 replies; 29+ messages in thread From: Chris Mason @ 2011-12-15 15:52 UTC (permalink / raw) To: Trond Myklebust Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley
On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> How would the server know? I suggest we deal with this by adding an
> ioctl() to allow the application to poll for progress: I'm assuming now
> that we don't expect more than 1 copyfile() system call at a time per
> file descriptor...

If we're using this to copy VM image files, I could easily imagine wanting to clone multiple copies of the VM in parallel.

-chris

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-15 15:52 ` Chris Mason @ 2011-12-15 16:00 ` Trond Myklebust 2011-12-15 16:03 ` Jeff Layton 1 sibling, 0 replies; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 16:00 UTC (permalink / raw) To: Chris Mason Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley
On Thu, 2011-12-15 at 10:52 -0500, Chris Mason wrote:
> On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> > How would the server know? I suggest we deal with this by adding an
> > ioctl() to allow the application to poll for progress: I'm assuming now
> > that we don't expect more than 1 copyfile() system call at a time per
> > file descriptor...
>
> If we're using this to copy VM image files, I could easily imagine
> wanting to clone multiple copies of the VM in parallel.

Sure, but in that case, your target file descriptors will differ, right?

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-15 15:52 ` Chris Mason 2011-12-15 16:00 ` Trond Myklebust @ 2011-12-15 16:03 ` Jeff Layton 2011-12-15 16:06 ` Trond Myklebust 1 sibling, 1 reply; 29+ messages in thread From: Jeff Layton @ 2011-12-15 16:03 UTC (permalink / raw) To: Chris Mason Cc: Trond Myklebust, J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley
On Thu, 15 Dec 2011 10:52:13 -0500 Chris Mason <chris.mason@oracle.com> wrote:
> On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> > How would the server know? I suggest we deal with this by adding an
> > ioctl() to allow the application to poll for progress: I'm assuming now
> > that we don't expect more than 1 copyfile() system call at a time per
> > file descriptor...
>
> If we're using this to copy VM image files, I could easily imagine
> wanting to clone multiple copies of the VM in parallel.
>
> -chris
>

Not really a problem is it? Just dup() the fd before you issue the copyfile()? Or even simpler, just do periodic stat() on the destination file if you want a progress report.

Regardless, I like the simple approach that Al is suggesting here.

--
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-15 16:03 ` Jeff Layton @ 2011-12-15 16:06 ` Trond Myklebust 2011-12-15 16:16 ` Jeff Layton 0 siblings, 1 reply; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 16:06 UTC (permalink / raw) To: Jeff Layton Cc: Chris Mason, J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley
On Thu, 2011-12-15 at 11:03 -0500, Jeff Layton wrote:
> Not really a problem is it? Just dup() the fd before you issue the
> copyfile()? Or even simpler, just do periodic stat() on the destination
> file if you want a progress report.
>
> Regardless, I like the simple approach that Al is suggesting here.

Periodic stat() isn't good enough if you are copying subranges of a file. Part of the application here (as I understood it) is to initialise specific disk volumes on existing VM images when doing thin provisioning. In that case, the reported image size won't ever change...

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-15 16:06 ` Trond Myklebust @ 2011-12-15 16:16 ` Jeff Layton 2011-12-15 16:38 ` Trond Myklebust 0 siblings, 1 reply; 29+ messages in thread From: Jeff Layton @ 2011-12-15 16:16 UTC (permalink / raw) To: Trond Myklebust Cc: Chris Mason, J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley
On Thu, 15 Dec 2011 11:06:16 -0500 Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> Periodic stat() isn't good enough if you are copying subranges of a
> file. Part of the application here (as I understood it) is to initialise
> specific disk volumes on existing VM images when doing thin
> provisioning. In that case, the reported image size won't ever change...
>

If they were sparse files then st_blocks would presumably change, but that's not necessarily going to be the case. So, ok stat() is out for this...

What's the use-case for these sorts of progress reports anyway? Progress meters in GUI apps?

Either way, I think adding as simple an interface as possible to begin with makes sense. If you want to add progress reports or other doohickeys later, then that can be done in a separate set of patches...

--
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply [flat|nested] 29+ messages in thread
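The stat()-based progress idea, and the reason it breaks down for subrange copies, can be sketched in C roughly as follows. `stat_progress_percent()` and `report_progress()` are hypothetical helper names, not part of any proposed interface, and the sketch assumes the caller knows the expected final size of the destination.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Estimate progress from the destination's current size.
 * Returns 0..100, or -1 if fstat() fails. This only works when the
 * copy grows the file: overwriting a range inside an existing file
 * leaves st_size unchanged, so the estimate never moves. */
int stat_progress_percent(int dst_fd, off_t expected_size)
{
    struct stat st;

    assert(expected_size > 0);
    if (fstat(dst_fd, &st) != 0)
        return -1;
    if (st.st_size >= expected_size)
        return 100;
    return (int)((double)st.st_size * 100.0 / (double)expected_size);
}

/* Print size and allocated blocks. For a sparse destination st_blocks
 * (512-byte units on Linux) may grow as ranges are filled in, but for a
 * fully allocated file it stays constant as well. */
void report_progress(int dst_fd, off_t expected_size)
{
    struct stat st;

    if (fstat(dst_fd, &st) != 0) {
        perror("fstat");
        return;
    }
    printf("%lld of %lld bytes, %lld blocks allocated\n",
           (long long)st.st_size, (long long)expected_size,
           (long long)st.st_blocks);
}
```

A caller polling `stat_progress_percent()` during a whole-file copy would see the value climb, while a copy into the middle of a preallocated image would keep reporting the same number from start to finish.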
* Re: copy offload support in Linux - new system call needed? 2011-12-15 16:16 ` Jeff Layton @ 2011-12-15 16:38 ` Trond Myklebust 0 siblings, 0 replies; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 16:38 UTC (permalink / raw) To: Jeff Layton Cc: Chris Mason, J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley
On Thu, 2011-12-15 at 11:16 -0500, Jeff Layton wrote:
> If they were sparse files then st_blocks would presumably change, but
> that's not necessarily going to be the case. So, ok stat() is out for
> this...
>
> What's the use-case for these sorts of progress reports anyway?
> Progress meters in GUI apps?

Mainly... If you are copying several GB worth of data, you expect it to take some time, but you'd like to know that the server hasn't just crashed or something...

> Either way, I think adding as simple an interface as possible to begin
> with makes sense. If you want to add progress reports or other
> doohickeys later, then that can be done in a separate set of patches...

Agreed. ...and doing it as an ioctl allows for that. I just want to make sure someone else here doesn't have a use case that might blow that idea out of the water...

--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed? 2011-12-15 14:59 ` Trond Myklebust 2011-12-15 15:52 ` Chris Mason @ 2011-12-15 16:08 ` Loke, Chetan 2011-12-15 16:11 ` Trond Myklebust 2011-12-15 17:44 ` J. Bruce Fields 2 siblings, 1 reply; 29+ messages in thread From: Loke, Chetan @ 2011-12-15 16:08 UTC (permalink / raw) To: Trond Myklebust, J. Bruce Fields Cc: Ric Wheeler, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley
> How would the server know? I suggest we deal with this by adding an
> ioctl() to allow the application to poll for progress: I'm assuming now

Why not support something like the async-iocb?

^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed? 2011-12-15 16:08 ` Loke, Chetan @ 2011-12-15 16:11 ` Trond Myklebust 2011-12-15 16:40 ` Loke, Chetan 0 siblings, 1 reply; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 16:11 UTC (permalink / raw) To: Loke, Chetan Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Thu, 2011-12-15 at 11:08 -0500, Loke, Chetan wrote: > > How would the server know? I suggest we deal with this by adding an > > ioctl() to allow the application to poll for progress: I'm assuming now > > Why not support something like the async-iocb? You could, but that would tie copyfile() to the aio interface which was one of the things that I believe Al was opposed to when we discussed this at LSF/MM-2010. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed? 2011-12-15 16:11 ` Trond Myklebust @ 2011-12-15 16:40 ` Loke, Chetan 2011-12-15 16:53 ` Trond Myklebust 0 siblings, 1 reply; 29+ messages in thread From: Loke, Chetan @ 2011-12-15 16:40 UTC (permalink / raw) To: Trond Myklebust Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley
> >
> > Why not support something like the async-iocb?
>
> You could, but that would tie copyfile() to the aio interface which was
> one of the things that I believe Al was opposed to when we discussed
> this at LSF/MM-2010.
>

virtualization vendors who support this offload do it at a layer above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So I think 'copyfile' is going to be appealing to application-developers more than the hypervisor-vendors.

So let's think about it from end-users perspective:
Won't everyone replicate code to check - 'Am I done'? It will just make application folks write more (ugly)code. Because you would then have to maintain another queue/etc to check for this operation.

We can just support full-copy. Partial copies can be returned as failure.

^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed? 2011-12-15 16:40 ` Loke, Chetan @ 2011-12-15 16:53 ` Trond Myklebust 2011-12-15 17:18 ` Ric Wheeler 2011-12-15 17:27 ` Loke, Chetan 0 siblings, 2 replies; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 16:53 UTC (permalink / raw) To: Loke, Chetan Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Thu, 2011-12-15 at 11:40 -0500, Loke, Chetan wrote: > > > > > > Why not support something like the async-iocb? > > > > You could, but that would tie copyfile() to the aio interface which was > > one of the things that I believe Al was opposed to when we discussed > > this at LSF/MM-2010. > > > > virtualization vendors who support this offload do it at a layer above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So I think 'copyfile' is going to be appealing to application-developers more than the hypervisor-vendors. The application is thin provisioning, not the 'cp' command. When virtualisation vendors do support this, it will mainly be as part of their image management toolkits, not the hypervisor. > So let's think about it from end-users perspective: > Won't everyone replicate code to check - 'Am I done'? It will just make application folks write more (ugly)code. Because you would then have to maintain another queue/etc to check for this operation. 'Am I done' is easy: copyfile() returns with the number of bytes that have been copied. 'Is my copyfile() syscall making progress' is the question that needs answering. > We can just support full-copy. Partial copies can be returned as failure. Then you have to check the entire range on error instead of just resuming the copy from where it stopped. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 29+ messages in thread
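Trond's point about partial copies can be sketched in a few lines. This is only an illustration under stated assumptions, not the proposed syscall: `copy_some()` is a hypothetical stand-in for a copyfile()-style call that may copy fewer bytes than requested and returns the count, and `copy_range()` resumes from the last reported offset rather than re-checking or restarting the whole range.

```python
import os


def copy_some(src_fd, dst_fd, offset, length, max_step=4096):
    """Hypothetical stand-in for a copyfile()-style call.

    Copies at most max_step bytes starting at `offset` and returns the
    number of bytes copied (0 once the source is exhausted).
    """
    data = os.pread(src_fd, min(length, max_step), offset)
    written = 0
    while written < len(data):
        written += os.pwrite(dst_fd, data[written:], offset + written)
    return written


def copy_range(src_fd, dst_fd, offset, length):
    """Resume after every short return instead of starting over."""
    done = 0
    while done < length:
        n = copy_some(src_fd, dst_fd, offset + done, length - done)
        if n == 0:  # source ended early; report what was copied so far
            break
        done += n
    return done
```

Because the return value says how far the copy got, the caller only ever retries the remaining tail, which is the behaviour a "partial copy is a failure" rule would give up.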
* Re: copy offload support in Linux - new system call needed? 2011-12-15 16:53 ` Trond Myklebust @ 2011-12-15 17:18 ` Ric Wheeler 2011-12-15 17:25 ` Trond Myklebust 2011-12-15 17:31 ` Loke, Chetan 2011-12-15 17:27 ` Loke, Chetan 1 sibling, 2 replies; 29+ messages in thread From: Ric Wheeler @ 2011-12-15 17:18 UTC (permalink / raw) To: Trond Myklebust Cc: Loke, Chetan, J. Bruce Fields, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On 12/15/2011 11:53 AM, Trond Myklebust wrote: > On Thu, 2011-12-15 at 11:40 -0500, Loke, Chetan wrote: >>>> Why not support something like the async-iocb? >>> You could, but that would tie copyfile() to the aio interface which was >>> one of the things that I believe Al was opposed to when we discussed >>> this at LSF/MM-2010. >>> >> virtualization vendors who support this offload do it at a layer above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So I think 'copyfile' is going to be appealing to application-developers more than the hypervisor-vendors. > The application is thin provisioning, not the 'cp' command. When > virtualisation vendors do support this, it will mainly be as part of > their image management toolkits, not the hypervisor. I think that hypervisor vendors will be very interested in this feature which would explain why vmware was active in drafting both the NFS and T10 specs. Not to mention those of us who use KVM or XEN :) As Trond mentions, we might have this in the management tool chain or other places in the stack. > >> So let's think about it from end-users perspective: >> Won't everyone replicate code to check - 'Am I done'? It will just make application folks write more (ugly)code. Because you would then have to maintain another queue/etc to check for this operation. > 'Am I done' is easy: copyfile() returns with the number of bytes that > have been copied. 
> > 'Is my copyfile() syscall making progress' is the question that needs > answering. > >> We can just support full-copy. Partial copies can be returned as failure. > Then you have to check the entire range on error instead of just > resuming the copy from where it stopped. > I also like simple first. I am not too certain about the need for polling (especially given how little we have done historically to take advantage of the notifications, water marks, etc in things like thin provisioning :)). On the other hand, I also don't object to having the ability to poll (through the ioctl or whatever) if others find that useful. What I would like to see is a way to make sure that we can interrupt any long running command & also make sure that our timeouts (for SCSI specifically) are not too aggressive. Ric ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-15 17:18 ` Ric Wheeler @ 2011-12-15 17:25 ` Trond Myklebust 2011-12-15 17:31 ` Loke, Chetan 1 sibling, 0 replies; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 17:25 UTC (permalink / raw) To: Ric Wheeler Cc: Loke, Chetan, J. Bruce Fields, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Thu, 2011-12-15 at 12:18 -0500, Ric Wheeler wrote: > What I would like to see is a way to make sure that we can interrupt any long > running command & also make sure that our timeouts (for SCSI specifically) are > not too aggressive. The draft NFSv4.2 protocol contains features to make interruption possible, so as far as the NFS client is concerned, that should be doable. I can't answer for CIFS or SCSI... Cheers Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed? 2011-12-15 17:18 ` Ric Wheeler 2011-12-15 17:25 ` Trond Myklebust @ 2011-12-15 17:31 ` Loke, Chetan 2011-12-15 17:55 ` Ric Wheeler 1 sibling, 1 reply; 29+ messages in thread From: Loke, Chetan @ 2011-12-15 17:31 UTC (permalink / raw) To: Ric Wheeler, Trond Myklebust Cc: J. Bruce Fields, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley > I think that hypervisor vendors will be very interested in this feature > which > would explain why vmware was active in drafting both the NFS and T10 Specs are the only way to convince storage-target-vendors ;). Otherwise target-stack will need to implement multiple custom-CDB-handlers for different front-end APIs(which is ugly). Chetan ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-15 17:31 ` Loke, Chetan @ 2011-12-15 17:55 ` Ric Wheeler 0 siblings, 0 replies; 29+ messages in thread From: Ric Wheeler @ 2011-12-15 17:55 UTC (permalink / raw) To: Loke, Chetan Cc: Trond Myklebust, J. Bruce Fields, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On 12/15/2011 12:31 PM, Loke, Chetan wrote: >> I think that hypervisor vendors will be very interested in this feature >> which >> would explain why vmware was active in drafting both the NFS and T10 > Specs are the only way to convince storage-target-vendors ;). Otherwise target-stack will need to implement multiple custom-CDB-handlers for different front-end APIs(which is ugly). > > > Chetan Hi Chetan, I should post from my "Red Hat" email to make this less confusing for you - I know that this is in fact interesting to vendors :) Thanks! Ric ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed? 2011-12-15 16:53 ` Trond Myklebust 2011-12-15 17:18 ` Ric Wheeler @ 2011-12-15 17:27 ` Loke, Chetan 1 sibling, 0 replies; 29+ messages in thread From: Loke, Chetan @ 2011-12-15 17:27 UTC (permalink / raw) To: Trond Myklebust Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley > > virtualization vendors who support this offload do it at a layer > above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So > I think 'copyfile' is going to be appealing to application-developers > more than the hypervisor-vendors. > > The application is thin provisioning, not the 'cp' command. When thin-provisioning is one use-case. There are quite a few use-cases of 'copyfile' depending on your business-logic and the type of appliance you sell. > virtualisation vendors do support this, it will mainly be as part of > their image management toolkits, not the hypervisor. > Toolkits? May not be true. The toolkit might need to talk to some hypervisor-component to ensure LUN-locking etc on the target. So this is not entirely isolated as you might think. There is some integration. As an example(just to prove the point) - Have you ever seen anyone not use vsphere-client on VMware for copying VM templates? > > So let's think about it from end-users perspective: > > Won't everyone replicate code to check - 'Am I done'? It will just > make application folks write more (ugly)code. Because you would then > have to maintain another queue/etc to check for this operation. > > 'Am I done' is easy: copyfile() returns with the number of bytes that > have been copied. > > 'Is my copyfile() syscall making progress' is the question that needs > answering. > Understood. But as a user, we don't know what 'am I done' is going to report. 'am I done' can return: 1)ACK[copy done] - simplistic case. 2)IN-progress. 3)NACK[copy failed(with status values) or copy partially completed] And if you are using the copy-VM use-case then very few VMs are under 4GBs. So we will hit 2) above more frequently than 1) and 3). > > We can just support full-copy. Partial copies can be returned as > failure. > > Then you have to check the entire range on error instead of just > resuming the copy from where it stopped. > Why not restart? What if the LUN was implementing thin-provisioning and now it ran out-of-space after partially copying your data. So why not restart the copy? If the target doesn't support auto-extend, someone(storage-admin etc) would have to step-in and manage that LUN. You might as-well restart the copy in this case. Chetan Loke ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-15 14:59 ` Trond Myklebust 2011-12-15 15:52 ` Chris Mason 2011-12-15 16:08 ` Loke, Chetan @ 2011-12-15 17:44 ` J. Bruce Fields 2 siblings, 0 replies; 29+ messages in thread From: J. Bruce Fields @ 2011-12-15 17:44 UTC (permalink / raw) To: Trond Myklebust Cc: Ric Wheeler, Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote: > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote: > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote: > > > On 12/14/2011 02:27 PM, Al Viro wrote: > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > > > > > > >>We had an active thread a couple of years back that came out of the > > > >>reflink work and, at the time, there seemed to be moderately > > > >>positive support for adding a new system call that would fit this > > > >>use case (Joel Becker's copyfile()). > > > >> > > > >>Can we resurrect this effort? Is copyfile() still a good way to go, > > > >>or should we look at other hooks? > > > >copyfile(2) is probably a good way to go, provided that we do _not_ > > > >go baroque as it had happened the last time syscall had been discussed. > > > > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). > > > >If it works - fine, if not - caller has to be ready to deal with handling > > > >cross-device case anyway. > > > > > > I think that this approach makes a lot of sense. Most of the > > > devices/targets that support the copy offload, will do it in very > > > reasonable amounts of time. 
> > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into > > one operation: > > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2 > > > > Perhaps we should ask for separate operations for the two cases. (Or at > > least a "please don't bother if this is going to take 8 hours" flag....) > > How would the server know? Sorry, "8 hours" was a joke--no, you can't require the server to predict whether an operation will take more or less than some precise duration. I'm assuming the "fast" case that Al's proposing we do as a first step cover CoW operations? (So O(1) or close to it, users typically won't be asking for progress reports, operation may be atomic (with no partial-failure case), ?) --b. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-14 19:42 ` Ric Wheeler 2011-12-14 22:27 ` J. Bruce Fields @ 2011-12-16 8:00 ` Joel Becker 1 sibling, 0 replies; 29+ messages in thread From: Joel Becker @ 2011-12-16 8:00 UTC (permalink / raw) To: Ric Wheeler Cc: Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, James Bottomley On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote: > On 12/14/2011 02:27 PM, Al Viro wrote: > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > > >>We had an active thread a couple of years back that came out of the > >>reflink work and, at the time, there seemed to be moderately > >>positive support for adding a new system call that would fit this > >>use case (Joel Becker's copyfile()). > >> > >>Can we resurrect this effort? Is copyfile() still a good way to go, > >>or should we look at other hooks? > >copyfile(2) is probably a good way to go, provided that we do _not_ > >go baroque as it had happened the last time syscall had been discussed. > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). > >If it works - fine, if not - caller has to be ready to deal with handling > >cross-device case anyway. > > I think that this approach makes a lot of sense. Most of the > devices/targets that support the copy offload, will do it in very > reasonable amounts of time. > > Let me see if I can dig up some of the presentations from the NetApp > guys who presented overviews or the specifications from the IETF and > T10.... Whee! I've been down the rabbit hole, but I've promised myself to get the updated patch out soon. I know that Trond et al are probably wondering what happened to the patch. more soon. Joel -- Life's Little Instruction Book #207 "Swing for the fence." http://www.jlbec.org/ jlbec@evilplan.org ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-14 19:22 copy offload support in Linux - new system call needed? Ric Wheeler 2011-12-14 19:27 ` Al Viro @ 2011-12-14 19:59 ` Jeremy Allison 2011-12-14 20:30 ` Ric Wheeler 2011-12-19 22:19 ` H. Peter Anvin 1 sibling, 2 replies; 29+ messages in thread From: Jeremy Allison @ 2011-12-14 19:59 UTC (permalink / raw) To: Ric Wheeler Cc: linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > Back at LinuxCon Prague, we talked about the new NFS and SCSI > commands that let us offload copy operations to a storage device > (like an NFS server or storage array). > > This got new life in the virtual machine world where you might want > to clone bulky guest files or ranges of blocks and was driven > through the standards bodies by vmware, microsoft and some of the > major storage vendors. Windows8 has this functionality fully coded > and integrated in the GUI, I assume vmware also uses it and there > are some vendors who announced support at the SNIA SDC conference. > > We had an active thread a couple of years back that came out of the > reflink work and, at the time, there seemed to be moderately > positive support for adding a new system call that would fit this > use case (Joel Becker's copyfile()). > > Can we resurrect this effort? Is copyfile() still a good way to go, > or should we look at other hooks? Windows uses a COPYCHUNK call, which specifies the following parameters: Definition of a copy "chunk": hyper source_off; hyper target_off; uint32 length; and an array of these chunks which is passed into their kernel. This is what we have to implement in Samba. Jeremy. ^ permalink raw reply [flat|nested] 29+ messages in thread
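The chunk array Jeremy describes can be modelled roughly as follows. This is a sketch under assumptions, not the SMB wire format or Samba's implementation: `CopyChunk`, `CHUNK_FMT` and `apply_chunks()` are hypothetical names, `hyper` is taken to mean a 64-bit integer, and in-memory byte buffers stand in for the source and target files.

```python
import struct
from dataclasses import dataclass

# Assumed layout for illustration only: two 64-bit offsets and one
# unsigned 32-bit length, little-endian, with no padding.
CHUNK_FMT = "<qqI"


@dataclass(frozen=True)
class CopyChunk:
    source_off: int  # hyper source_off
    target_off: int  # hyper target_off
    length: int      # uint32 length

    def pack(self) -> bytes:
        return struct.pack(CHUNK_FMT, self.source_off, self.target_off, self.length)

    @classmethod
    def unpack(cls, raw: bytes) -> "CopyChunk":
        return cls(*struct.unpack(CHUNK_FMT, raw))


def apply_chunks(src: bytes, dst: bytearray, chunks) -> int:
    """Copy each chunk from src into dst; return the total bytes copied."""
    total = 0
    for c in chunks:
        data = src[c.source_off:c.source_off + c.length]
        end = c.target_off + len(data)
        if end > len(dst):  # grow the target like a file being extended
            dst.extend(b"\0" * (end - len(dst)))
        dst[c.target_off:end] = data
        total += len(data)
    return total
```

Passing the chunks as one array lets a single request describe a scattered copy, which is the main difference from a single (offset, offset, length) call.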
* Re: copy offload support in Linux - new system call needed? 2011-12-14 19:59 ` Jeremy Allison @ 2011-12-14 20:30 ` Ric Wheeler 2011-12-19 12:38 ` Hannes Reinecke 2011-12-19 22:19 ` H. Peter Anvin 1 sibling, 1 reply; 29+ messages in thread From: Ric Wheeler @ 2011-12-14 20:30 UTC (permalink / raw) To: Jeremy Allison Cc: linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On 12/14/2011 02:59 PM, Jeremy Allison wrote: > On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: >> Back at LinuxCon Prague, we talked about the new NFS and SCSI >> commands that let us offload copy operations to a storage device >> (like an NFS server or storage array). >> >> This got new life in the virtual machine world where you might want >> to clone bulky guest files or ranges of blocks and was driven >> through the standards bodies by vmware, microsoft and some of the >> major storage vendors. Windows8 has this functionality fully coded >> and integrated in the GUI, I assume vmware also uses it and there >> are some vendors who announced support at the SNIA SDC conference. >> >> We had an active thread a couple of years back that came out of the >> reflink work and, at the time, there seemed to be moderately >> positive support for adding a new system call that would fit this >> use case (Joel Becker's copyfile()). >> >> Can we resurrect this effort? Is copyfile() still a good way to go, >> or should we look at other hooks? > Windows uses a COPYCHUNK call, which specifies the > following parameters: > > Definition of a copy "chunk": > > hyper source_off; > hyper target_off; > uint32 length; > > and an array of these chunks which is passed > into their kernel. > > This is what we have to implement in Samba. > > Jeremy. This is a public pointer to the draft NFS proposal: http://tools.ietf.org/id/draft-lentini-nfsv4-server-side-copy-06.txt The T10 site has some click through that I was not too happy about agreeing to. 
NetApp (Fred Knight) had some nice presentations that he presented about how SCSI does this in two different ways... Ric ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-14 20:30 ` Ric Wheeler @ 2011-12-19 12:38 ` Hannes Reinecke 0 siblings, 0 replies; 29+ messages in thread From: Hannes Reinecke @ 2011-12-19 12:38 UTC (permalink / raw) To: Ric Wheeler Cc: Jeremy Allison, linux-scsi@vger.kernel.org, linux-fsdevel, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On 12/14/2011 09:30 PM, Ric Wheeler wrote: > On 12/14/2011 02:59 PM, Jeremy Allison wrote: >> On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: >>> Back at LinuxCon Prague, we talked about the new NFS and SCSI >>> commands that let us offload copy operations to a storage device >>> (like an NFS server or storage array). >>> >>> This got new life in the virtual machine world where you might want >>> to clone bulky guest files or ranges of blocks and was driven >>> through the standards bodies by vmware, microsoft and some of the >>> major storage vendors. Windows8 has this functionality fully coded >>> and integrated in the GUI, I assume vmware also uses it and there >>> are some vendors who announced support at the SNIA SDC conference. >>> >>> We had an active thread a couple of years back that came out of the >>> reflink work and, at the time, there seemed to be moderately >>> positive support for adding a new system call that would fit this >>> use case (Joel Becker's copyfile()). >>> >>> Can we resurrect this effort? Is copyfile() still a good way to go, >>> or should we look at other hooks? >> Windows uses a COPYCHUNK call, which specifies the >> following parameters: >> >> Definition of a copy "chunk": >> >> hyper source_off; >> hyper target_off; >> uint32 length; >> >> and an array of these chunks which is passed >> into their kernel. >> >> This is what we have to implement in Samba. >> >> Jeremy. 
> > This is a public pointer to the draft NFS proposal: > > http://tools.ietf.org/id/draft-lentini-nfsv4-server-side-copy-06.txt > > The T10 site has some click through that I was not too happy about > agreeing to. NetApp (Fred Knight) had some nice presentations that > he presented about how SCSI does this in two different ways... > Yes, the 'XCOPY Lite' mechanism. With that the whole copy process is broken into two steps: - Create a reference to the requested blocks - Use that reference to request the operation The neat thing with that is that there might be some delay between those steps, effectively creating a snapshot in time. An additional bonus is that one doesn't have to create those over-complicated source and target descriptors, but rather have the array create one for you. So all-in-all nice and easy to use. With the slight disadvantage that no-one implements it. Yet. Hence we might be wanting to use the old-style EXTENDED COPY after all ... However, both approaches have in common that an opaque 'identifier' is used to identify any currently running copy process. So when designing this interface we should keep in mind that we would need to store this identifier somewhere. As loath as I am to admit it, the async-I/O mechanism would fit the bill far better than a single copyfile() call ... Which could be easily implemented on top of the Async I/O call, btw. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ^ permalink raw reply [flat|nested] 29+ messages in thread
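The two-step flow Hannes outlines (create a reference, then use it) can be illustrated with a toy in-memory model. Everything here is hypothetical: `FakeArray`, `create_reference()` and `write_using_reference()` are made-up names, not SCSI commands or a kernel interface. The sketch only shows why the caller has to hold on to an opaque identifier between the two steps.

```python
import secrets


class FakeArray:
    """Toy stand-in for a storage array; blocks are keyed by LBA."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # lba -> bytes
        self._tokens = {}           # opaque token -> captured block data

    def create_reference(self, lbas):
        """Step 1: capture the requested blocks and return an opaque token."""
        token = secrets.token_hex(8)
        self._tokens[token] = [self.blocks[lba] for lba in lbas]
        return token

    def write_using_reference(self, token, dest_lbas):
        """Step 2: write the captured blocks to the destination LBAs."""
        data = self._tokens[token]  # KeyError for an unknown or used token
        if len(data) != len(dest_lbas):
            raise ValueError("destination size does not match the reference")
        del self._tokens[token]     # single use in this toy model
        for lba, block in zip(dest_lbas, data):
            self.blocks[lba] = block
        return len(data)
```

Data changed on the source between the two steps does not affect what step 2 writes, which mirrors the "snapshot in time" effect mentioned above.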
* Re: copy offload support in Linux - new system call needed? 2011-12-14 19:59 ` Jeremy Allison 2011-12-14 20:30 ` Ric Wheeler @ 2011-12-19 22:19 ` H. Peter Anvin 2011-12-19 22:34 ` Jeremy Allison 2011-12-19 22:57 ` Dave Chinner 1 sibling, 2 replies; 29+ messages in thread From: H. Peter Anvin @ 2011-12-19 22:19 UTC (permalink / raw) To: Jeremy Allison Cc: Ric Wheeler, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On 12/14/2011 11:59 AM, Jeremy Allison wrote: >> >> Can we resurrect this effort? Is copyfile() still a good way to go, >> or should we look at other hooks? > > Windows uses a COPYCHUNK call, which specifies the > following parameters: > > Definition of a copy "chunk": > > hyper source_off; > hyper target_off; > uint32 length; > > and an array of these chunks which is passed > into their kernel. > > This is what we have to implement in Samba. > Could we do this by (re-)allowing sendfile() between two files? -hpa ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-19 22:19 ` H. Peter Anvin @ 2011-12-19 22:34 ` Jeremy Allison 2011-12-19 22:57 ` Dave Chinner 1 sibling, 0 replies; 29+ messages in thread From: Jeremy Allison @ 2011-12-19 22:34 UTC (permalink / raw) To: H. Peter Anvin Cc: Jeremy Allison, Ric Wheeler, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Mon, Dec 19, 2011 at 02:19:43PM -0800, H. Peter Anvin wrote: > On 12/14/2011 11:59 AM, Jeremy Allison wrote: > >> > >> Can we resurrect this effort? Is copyfile() still a good way to go, > >> or should we look at other hooks? > > > > Windows uses a COPYCHUNK call, which specifies the > > following parameters: > > > > Definition of a copy "chunk": > > > > hyper source_off; > > hyper target_off; > > uint32 length; > > > > and an array of these chunks which is passed > > into their kernel. > > > > This is what we have to implement in Samba. > > > > Could we do this by (re-)allowing sendfile() between two files? Oooh - nice idea ! Yes, having a completely symmetric sendfile which allows socket -> file, file -> socket, socket -> socket, file -> file would be a great idea (IMHO). Jeremy. ^ permalink raw reply [flat|nested] 29+ messages in thread
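A file-to-file sendfile() fast path with a fallback might look like the sketch below. It assumes Python's `os.sendfile()` wrapper; whether the destination may be a regular file depends on the platform and kernel, so the code treats any `OSError` as "fast path not available" and drops to a plain read/write loop. `copy_fastpath()` is a hypothetical helper name.

```python
import os


def copy_fastpath(src_fd, dst_fd, length, bufsize=65536):
    """Copy `length` bytes from the start of src_fd to dst_fd's position.

    Tries os.sendfile() first and falls back to read()/write() if the
    platform refuses it. Returns the number of bytes copied.
    """
    copied = 0
    use_sendfile = hasattr(os, "sendfile")
    while copied < length:
        want = min(bufsize, length - copied)
        if use_sendfile:
            try:
                n = os.sendfile(dst_fd, src_fd, copied, want)
            except OSError:
                use_sendfile = False  # fast path refused; use the slow path
                continue
        else:
            os.lseek(src_fd, copied, os.SEEK_SET)
            data = os.read(src_fd, want)
            n = os.write(dst_fd, data) if data else 0
        if n == 0:  # end of source
            break
        copied += n
    return copied
```

The caller sees the same result either way, which is the rename(2)-to-mv(1) style relationship suggested earlier in the thread: use the fast path if it works, otherwise be ready to do the copy the ordinary way.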
* Re: copy offload support in Linux - new system call needed? 2011-12-19 22:19 ` H. Peter Anvin 2011-12-19 22:34 ` Jeremy Allison @ 2011-12-19 22:57 ` Dave Chinner 2011-12-19 23:29 ` H. Peter Anvin 1 sibling, 1 reply; 29+ messages in thread From: Dave Chinner @ 2011-12-19 22:57 UTC (permalink / raw) To: H. Peter Anvin Cc: Jeremy Allison, Ric Wheeler, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Mon, Dec 19, 2011 at 02:19:43PM -0800, H. Peter Anvin wrote: > On 12/14/2011 11:59 AM, Jeremy Allison wrote: > >> > >> Can we resurrect this effort? Is copyfile() still a good way to go, > >> or should we look at other hooks? > > > > Windows uses a COPYCHUNK call, which specifies the > > following parameters: > > > > Definition of a copy "chunk": > > > > hyper source_off; > > hyper target_off; > > uint32 length; > > > > and an array of these chunks which is passed > > into their kernel. > > > > This is what we have to implement in Samba. > > > > Could we do this by (re-)allowing sendfile() between two files? That was my immediate thought, but sendfile has plumbing that is page cache based and we require completely different infrastructure and semantics for an array offload. e.g. for an array offload, we have to flush the source file page cache first so that the data being copied is known to be on disk, then invalidate the destination page cache if overwriting or extend and pre-allocate blocks if not. Then we have to map both files and hand that off to the array. Then there's a whole bunch of tricky questions about what the state of the destination file should look like while the copy is in progress, whether the source file should be allowed to change (e.g. it can't be truncated and have blocks freed and then reused by other files half way through the copy offload operation), and so on. 
sendfile() has well known, fixed semantics that we can't change to suit what is needed for an offload operation that could potentially take hours to complete. Hence I think a new syscall is the way to go.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 29+ messages in thread
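The ordering Dave describes (get the source data onto stable storage, make sure the destination range exists, and only then hand the copy off) can be loosely mirrored from userspace. This is an analogy under assumptions, not the kernel-side work: `prepare_offload()` is a hypothetical helper, page cache invalidation and extent mapping have no direct equivalent here and are omitted, and `ftruncate()` only extends the file without preallocating blocks.

```python
import os


def prepare_offload(src_fd, dst_fd, dst_off, length):
    """Userspace analogue of the preparation steps before an offloaded copy.

    Returns the destination file size after preparation.
    """
    os.fsync(src_fd)                    # source data must be on disk first
    end = dst_off + length
    if os.fstat(dst_fd).st_size < end:
        os.ftruncate(dst_fd, end)       # extend the target to cover the range
    os.fsync(dst_fd)                    # flush pending destination writes
    return os.fstat(dst_fd).st_size
```

Even this simplified version shows that the steps are stateful and ordered, which is part of why the semantics differ from a page cache based sendfile().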
* Re: copy offload support in Linux - new system call needed? 2011-12-19 22:57 ` Dave Chinner @ 2011-12-19 23:29 ` H. Peter Anvin 0 siblings, 0 replies; 29+ messages in thread From: H. Peter Anvin @ 2011-12-19 23:29 UTC (permalink / raw) To: Dave Chinner Cc: Jeremy Allison, Ric Wheeler, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On 12/19/2011 02:57 PM, Dave Chinner wrote: > > That was my immediate thought, but sendfile has plumbing that is > page cache based and we require completely different infrastructure > and semantics for an array offload. > The plumbing is internal to the kernel and doesn't mean we have to use the same VFS methods. > e.g. for an array offload, we have to flush the source file page > cache first so that the data being copied is known to be on disk, > then invalidate the destination page cache if overwriting or extend > and pre-allocate blocks if not. Then we have to map both files and > hand that off to the array. > > Then there's a whole bunch of tricky questions about what the state > of the destination file should look like while the copy is in > progress, whether the source file should be allowed to change (e.g. > it can't be truncated and have blocks freed and then reused by other > files half way through the copy offload operation), and so on. > > sendfile() has well known, fixed semantics that we can't change to > suit what is needed for an offload operation that could potentially > take hours to complete. Hence I think a new syscall is the way to > go.... Perhaps what we need first is an explicit enumeration of the semantics you're looking for. -hpa ^ permalink raw reply [flat|nested] 29+ messages in thread