* copy offload support in Linux - new system call needed? @ 2011-12-14 19:22 Ric Wheeler 2011-12-14 19:27 ` Al Viro 2011-12-14 19:59 ` Jeremy Allison 0 siblings, 2 replies; 29+ messages in thread From: Ric Wheeler @ 2011-12-14 19:22 UTC (permalink / raw) To: linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley Back at LinuxCon Prague, we talked about the new NFS and SCSI commands that let us offload copy operations to a storage device (like an NFS server or storage array). This got new life in the virtual machine world where you might want to clone bulky guest files or ranges of blocks and was driven through the standards bodies by vmware, microsoft and some of the major storage vendors. Windows8 has this functionality fully coded and integrated in the GUI, I assume vmware also uses it and there are some vendors who announced support at the SNIA SDC conference. We had an active thread a couple of years back that came out of the reflink work and, at the time, there seemed to be moderately positive support for adding a new system call that would fit this use case (Joel Becker's copyfile()). Can we resurrect this effort? Is copyfile() still a good way to go, or should we look at other hooks? Thanks! Ric -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-14 19:22 copy offload support in Linux - new system call needed? Ric Wheeler @ 2011-12-14 19:27 ` Al Viro [not found] ` <20111214192739.GN2203-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 2011-12-14 19:59 ` Jeremy Allison 1 sibling, 1 reply; 29+ messages in thread From: Al Viro @ 2011-12-14 19:27 UTC (permalink / raw) To: Ric Wheeler Cc: linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > We had an active thread a couple of years back that came out of the > reflink work and, at the time, there seemed to be moderately > positive support for adding a new system call that would fit this > use case (Joel Becker's copyfile()). > > Can we resurrect this effort? Is copyfile() still a good way to go, > or should we look at other hooks? copyfile(2) is probably a good way to go, provided that we do _not_ go baroque as it had happened the last time syscall had been discussed. IOW, to hell with progress reports, etc. - just a fastpath kind of thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). If it works - fine, if not - caller has to be ready to deal with handling cross-device case anyway. ^ permalink raw reply [flat|nested] 29+ messages in thread
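A rough illustration of the relationship Al describes: mv(1) first tries rename(2) and, when that fails with EXDEV, falls back to copying the data itself. A copy fast path could be consumed the same way. The sketch below is in Python and assumes a hypothetical `fastpath` callable standing in for the proposed copyfile(2), which did not exist when this thread was written. The set of errno values treated as "not available here" is also an assumption.

```python
import errno
import os
import shutil
import tempfile

# Errors that are assumed to mean "the fast path cannot help here".
FALLBACK_ERRNOS = (errno.ENOSYS, errno.EXDEV)


def copy_with_fastpath(src, dst, fastpath):
    """Try fastpath(src, dst) first; fall back to an ordinary userspace copy."""
    try:
        fastpath(src, dst)
        return "offloaded"
    except OSError as exc:
        if exc.errno not in FALLBACK_ERRNOS:
            raise  # a real error, not "unsupported": report it to the caller
    shutil.copyfile(src, dst)  # plain read/write copy done by the caller
    return "fallback"


def unsupported_fastpath(src, dst):
    # Hypothetical stand-in for the proposed syscall on a filesystem or
    # device pair that cannot offload the copy.
    raise OSError(errno.EXDEV, os.strerror(errno.EXDEV))
```

The caller always has to carry the slow path, exactly as mv(1) does for the cross-device case, so the fast path itself can stay free of progress reporting and similar extras.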
* Re: copy offload support in Linux - new system call needed? [not found] ` <20111214192739.GN2203-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> @ 2011-12-14 19:42 ` Ric Wheeler [not found] ` <4EE8FC2E.3010207-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2011-12-16 8:00 ` Joel Becker 0 siblings, 2 replies; 29+ messages in thread From: Ric Wheeler @ 2011-12-14 19:42 UTC (permalink / raw) To: Al Viro Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley On 12/14/2011 02:27 PM, Al Viro wrote: > On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > >> We had an active thread a couple of years back that came out of the >> reflink work and, at the time, there seemed to be moderately >> positive support for adding a new system call that would fit this >> use case (Joel Becker's copyfile()). >> >> Can we resurrect this effort? Is copyfile() still a good way to go, >> or should we look at other hooks? > copyfile(2) is probably a good way to go, provided that we do _not_ > go baroque as it had happened the last time syscall had been discussed. > > IOW, to hell with progress reports, etc. - just a fastpath kind of > thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). > If it works - fine, if not - caller has to be ready to deal with handling > cross-device case anyway. I think that this approach makes a lot of sense. Most of the devices/targets that support the copy offload, will do it in very reasonable amounts of time. Let me see if I can dig up some of the presentations from the NetApp guys who presented overviews or the specifications from the IETF and T10.... 
Ric
* Re: copy offload support in Linux - new system call needed? [not found] ` <4EE8FC2E.3010207-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2011-12-14 22:27 ` J. Bruce Fields 2011-12-15 14:59 ` Trond Myklebust 0 siblings, 1 reply; 29+ messages in thread From: J. Bruce Fields @ 2011-12-14 22:27 UTC (permalink / raw) To: Ric Wheeler Cc: Al Viro, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote: > On 12/14/2011 02:27 PM, Al Viro wrote: > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > > >>We had an active thread a couple of years back that came out of the > >>reflink work and, at the time, there seemed to be moderately > >>positive support for adding a new system call that would fit this > >>use case (Joel Becker's copyfile()). > >> > >>Can we resurrect this effort? Is copyfile() still a good way to go, > >>or should we look at other hooks? > >copyfile(2) is probably a good way to go, provided that we do _not_ > >go baroque as it had happened the last time syscall had been discussed. > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). > >If it works - fine, if not - caller has to be ready to deal with handling > >cross-device case anyway. > > I think that this approach makes a lot of sense. Most of the > devices/targets that support the copy offload, will do it in very > reasonable amounts of time. The current NFSv4.2 draft rolls both the "fast" and "slow" cases into one operation: http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2 Perhaps we should ask for separate operations for the two cases. (Or at least a "please don't bother if this is going to take 8 hours" flag....) --b. 
> > Let me see if I can dig up some of the presentations from the NetApp > guys who presented overviews or the specifications from the IETF and > T10.... > > Ric
* Re: copy offload support in Linux - new system call needed? 2011-12-14 22:27 ` J. Bruce Fields @ 2011-12-15 14:59 ` Trond Myklebust 2011-12-15 15:52 ` Chris Mason ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 14:59 UTC (permalink / raw) To: J. Bruce Fields Cc: Ric Wheeler, Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote: > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote: > > On 12/14/2011 02:27 PM, Al Viro wrote: > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > > > > >>We had an active thread a couple of years back that came out of the > > >>reflink work and, at the time, there seemed to be moderately > > >>positive support for adding a new system call that would fit this > > >>use case (Joel Becker's copyfile()). > > >> > > >>Can we resurrect this effort? Is copyfile() still a good way to go, > > >>or should we look at other hooks? > > >copyfile(2) is probably a good way to go, provided that we do _not_ > > >go baroque as it had happened the last time syscall had been discussed. > > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). > > >If it works - fine, if not - caller has to be ready to deal with handling > > >cross-device case anyway. > > > > I think that this approach makes a lot of sense. Most of the > > devices/targets that support the copy offload, will do it in very > > reasonable amounts of time. > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into > one operation: > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2 > > Perhaps we should ask for separate operations for the two cases. (Or at > least a "please don't bother if this is going to take 8 hours" flag....) How would the server know? 
I suggest we deal with this by adding an ioctl() to allow the application to poll for progress: I'm assuming now that we don't expect more than 1 copyfile() system call at a time per file descriptor... Cheers Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com
* Re: copy offload support in Linux - new system call needed? 2011-12-15 14:59 ` Trond Myklebust @ 2011-12-15 15:52 ` Chris Mason 2011-12-15 16:00 ` Trond Myklebust 2011-12-15 16:03 ` Jeff Layton 2011-12-15 16:08 ` Loke, Chetan [not found] ` <1323961140.14317.2.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org> 2 siblings, 2 replies; 29+ messages in thread From: Chris Mason @ 2011-12-15 15:52 UTC (permalink / raw) To: Trond Myklebust Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote: > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote: > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote: > > > On 12/14/2011 02:27 PM, Al Viro wrote: > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > > > > > > >>We had an active thread a couple of years back that came out of the > > > >>reflink work and, at the time, there seemed to be moderately > > > >>positive support for adding a new system call that would fit this > > > >>use case (Joel Becker's copyfile()). > > > >> > > > >>Can we resurrect this effort? Is copyfile() still a good way to go, > > > >>or should we look at other hooks? > > > >copyfile(2) is probably a good way to go, provided that we do _not_ > > > >go baroque as it had happened the last time syscall had been discussed. > > > > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). > > > >If it works - fine, if not - caller has to be ready to deal with handling > > > >cross-device case anyway. > > > > > > I think that this approach makes a lot of sense. Most of the > > > devices/targets that support the copy offload, will do it in very > > > reasonable amounts of time. 
> > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into > > one operation: > > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2 > > > > Perhaps we should ask for separate operations for the two cases. (Or at > > least a "please don't bother if this is going to take 8 hours" flag....) > > How would the server know? I suggest we deal with this by adding an > ioctl() to allow the application to poll for progress: I'm assuming now > that we don't expect more than 1 copyfile() system call at a time per > file descriptor... If we're using this to copy VM image files, I could easily imagine wanting to clone multiple copies of the VM in parallel. -chris ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-15 15:52 ` Chris Mason @ 2011-12-15 16:00 ` Trond Myklebust 2011-12-15 16:03 ` Jeff Layton 1 sibling, 0 replies; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 16:00 UTC (permalink / raw) To: Chris Mason Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley On Thu, 2011-12-15 at 10:52 -0500, Chris Mason wrote: > On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote: > > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote: > > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote: > > > > On 12/14/2011 02:27 PM, Al Viro wrote: > > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > > > > > > > > >>We had an active thread a couple of years back that came out of the > > > > >>reflink work and, at the time, there seemed to be moderately > > > > >>positive support for adding a new system call that would fit this > > > > >>use case (Joel Becker's copyfile()). > > > > >> > > > > >>Can we resurrect this effort? Is copyfile() still a good way to go, > > > > >>or should we look at other hooks? > > > > >copyfile(2) is probably a good way to go, provided that we do _not_ > > > > >go baroque as it had happened the last time syscall had been discussed. > > > > > > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of > > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). > > > > >If it works - fine, if not - caller has to be ready to deal with handling > > > > >cross-device case anyway. > > > > > > > > I think that this approach makes a lot of sense. Most of the > > > > devices/targets that support the copy offload, will do it in very > > > > reasonable amounts of time. 
> > > > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into > > > one operation: > > > > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2 > > > > > > Perhaps we should ask for separate operations for the two cases. (Or at > > > least a "please don't bother if this is going to take 8 hours" flag....) > > > > How would the server know? I suggest we deal with this by adding an > > ioctl() to allow the application to poll for progress: I'm assuming now > > that we don't expect more than 1 copyfile() system call at a time per > > file descriptor... > > If we're using this to copy VM image files, I could easily imagine > wanting to clone multiple copies of the VM in parallel. Sure, but in that case, your target file descriptors will differ, right? Cheers Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org www.netapp.com
* Re: copy offload support in Linux - new system call needed? 2011-12-15 15:52 ` Chris Mason 2011-12-15 16:00 ` Trond Myklebust @ 2011-12-15 16:03 ` Jeff Layton [not found] ` <20111215110330.33aed3a6-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org> 1 sibling, 1 reply; 29+ messages in thread From: Jeff Layton @ 2011-12-15 16:03 UTC (permalink / raw) To: Chris Mason Cc: Trond Myklebust, J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley On Thu, 15 Dec 2011 10:52:13 -0500 Chris Mason <chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote: > On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote: > > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote: > > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote: > > > > On 12/14/2011 02:27 PM, Al Viro wrote: > > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > > > > > > > > >>We had an active thread a couple of years back that came out of the > > > > >>reflink work and, at the time, there seemed to be moderately > > > > >>positive support for adding a new system call that would fit this > > > > >>use case (Joel Becker's copyfile()). > > > > >> > > > > >>Can we resurrect this effort? Is copyfile() still a good way to go, > > > > >>or should we look at other hooks? > > > > >copyfile(2) is probably a good way to go, provided that we do _not_ > > > > >go baroque as it had happened the last time syscall had been discussed. > > > > > > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of > > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). > > > > >If it works - fine, if not - caller has to be ready to deal with handling > > > > >cross-device case anyway. > > > > > > > > I think that this approach makes a lot of sense. 
Most of the > > > > devices/targets that support the copy offload, will do it in very > > > > reasonable amounts of time. > > > > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into > > > one operation: > > > > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2 > > > > > > Perhaps we should ask for separate operations for the two cases. (Or at > > > least a "please don't bother if this is going to take 8 hours" flag....) > > > > How would the server know? I suggest we deal with this by adding an > > ioctl() to allow the application to poll for progress: I'm assuming now > > that we don't expect more than 1 copyfile() system call at a time per > > file descriptor... > > If we're using this to copy VM image files, I could easily imagine > wanting to clone multiple copies of the VM in parallel. > > -chris > Not really a problem is it? Just dup() the fd before you issue the copyfile()? Or even simpler, just do periodic stat() on the destination file if you want a progress report. Regardless, I like the simple approach that Al is suggesting here. -- Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
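Jeff's "periodic stat()" suggestion can be sketched as follows, assuming a whole-file copy whose destination grows as data lands. The helper name `copy_progress` and the file name are made up for illustration, and whether an offloaded copy exposes intermediate sizes at all would depend on the filesystem or server, so treat this as a sketch of the idea rather than a dependable progress meter.

```python
import os
import tempfile


def copy_progress(dst_path, expected_size):
    """Return the fraction of expected_size currently visible in st_size."""
    if expected_size <= 0:
        return 1.0
    return min(os.stat(dst_path).st_size / expected_size, 1.0)


# Simulate a half-finished whole-file copy of a 1024-byte source.
with tempfile.TemporaryDirectory() as tmp:
    dst = os.path.join(tmp, "clone.img")
    with open(dst, "wb") as f:
        f.write(b"\0" * 512)
    half = copy_progress(dst, 1024)
```

A monitoring thread would call `copy_progress()` on a timer while another thread is blocked in the copy call.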
* Re: copy offload support in Linux - new system call needed? [not found] ` <20111215110330.33aed3a6-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org> @ 2011-12-15 16:06 ` Trond Myklebust [not found] ` <1323965176.14317.11.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org> 0 siblings, 1 reply; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 16:06 UTC (permalink / raw) To: Jeff Layton Cc: Chris Mason, J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley On Thu, 2011-12-15 at 11:03 -0500, Jeff Layton wrote: > On Thu, 15 Dec 2011 10:52:13 -0500 > Chris Mason <chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote: > > > On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote: > > > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote: > > > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote: > > > > > On 12/14/2011 02:27 PM, Al Viro wrote: > > > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > > > > > > > > > > >>We had an active thread a couple of years back that came out of the > > > > > >>reflink work and, at the time, there seemed to be moderately > > > > > >>positive support for adding a new system call that would fit this > > > > > >>use case (Joel Becker's copyfile()). > > > > > >> > > > > > >>Can we resurrect this effort? Is copyfile() still a good way to go, > > > > > >>or should we look at other hooks? > > > > > >copyfile(2) is probably a good way to go, provided that we do _not_ > > > > > >go baroque as it had happened the last time syscall had been discussed. > > > > > > > > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of > > > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). 
> > > > > >If it works - fine, if not - caller has to be ready to deal with handling > > > > > >cross-device case anyway. > > > > > > > > > > I think that this approach makes a lot of sense. Most of the > > > > > devices/targets that support the copy offload, will do it in very > > > > > reasonable amounts of time. > > > > > > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into > > > > one operation: > > > > > > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2 > > > > > > > > Perhaps we should ask for separate operations for the two cases. (Or at > > > > least a "please don't bother if this is going to take 8 hours" flag....) > > > > > > How would the server know? I suggest we deal with this by adding an > > > ioctl() to allow the application to poll for progress: I'm assuming now > > > that we don't expect more than 1 copyfile() system call at a time per > > > file descriptor... > > > > If we're using this to copy VM image files, I could easily imagine > > wanting to clone multiple copies of the VM in parallel. > > > > -chris > > > > Not really a problem is it? Just dup() the fd before you issue the > copyfile()? Or even simpler, just do periodic stat() on the destination > file if you want a progress report. > > Regardless, I like the simple approach that Al is suggesting here. Periodic stat() isn't good enough if you are copying subranges of a file. Part of the application here (as I understood it) is to initialise specific disk volumes on existing VM images when doing thin provisioning. In that case, the reported image size won't ever change... 
Cheers Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org www.netapp.com
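Trond's objection is easy to reproduce in userspace: when data is written into a subrange of a file that is already at its final size, st_size never moves, so polling it says nothing about progress. A small Python sketch, where the image name, the 1 MiB size, and the offset are arbitrary values chosen for illustration:

```python
import os
import tempfile


def size_after_subrange_write(path, offset, data):
    """Overwrite data at offset inside an existing file; return the new st_size."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
    return os.stat(path).st_size


with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "guest.img")
    with open(image, "wb") as f:
        f.truncate(1 << 20)  # 1 MiB image, already at its final size
    before = os.stat(image).st_size
    # Stand-in for initialising one volume inside the image.
    after = size_after_subrange_write(image, 8192, b"x" * 4096)
```

Here `before` and `after` are both 1048576, even though 4 KiB of new data was written inside the image.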
* Re: copy offload support in Linux - new system call needed? [not found] ` <1323965176.14317.11.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org> @ 2011-12-15 16:16 ` Jeff Layton 2011-12-15 16:38 ` Trond Myklebust 0 siblings, 1 reply; 29+ messages in thread From: Jeff Layton @ 2011-12-15 16:16 UTC (permalink / raw) To: Trond Myklebust Cc: Chris Mason, J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley On Thu, 15 Dec 2011 11:06:16 -0500 Trond Myklebust <Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org> wrote: > On Thu, 2011-12-15 at 11:03 -0500, Jeff Layton wrote: > > On Thu, 15 Dec 2011 10:52:13 -0500 > > Chris Mason <chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote: > > > > > On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote: > > > > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote: > > > > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote: > > > > > > On 12/14/2011 02:27 PM, Al Viro wrote: > > > > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > > > > > > > > > > > > >>We had an active thread a couple of years back that came out of the > > > > > > >>reflink work and, at the time, there seemed to be moderately > > > > > > >>positive support for adding a new system call that would fit this > > > > > > >>use case (Joel Becker's copyfile()). > > > > > > >> > > > > > > >>Can we resurrect this effort? Is copyfile() still a good way to go, > > > > > > >>or should we look at other hooks? > > > > > > >copyfile(2) is probably a good way to go, provided that we do _not_ > > > > > > >go baroque as it had happened the last time syscall had been discussed. > > > > > > > > > > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of > > > > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). 
> > > > > > >If it works - fine, if not - caller has to be ready to deal with handling > > > > > > >cross-device case anyway. > > > > > > > > > > > > I think that this approach makes a lot of sense. Most of the > > > > > > devices/targets that support the copy offload, will do it in very > > > > > > reasonable amounts of time. > > > > > > > > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into > > > > > one operation: > > > > > > > > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2 > > > > > > > > > > Perhaps we should ask for separate operations for the two cases. (Or at > > > > > least a "please don't bother if this is going to take 8 hours" flag....) > > > > > > > > How would the server know? I suggest we deal with this by adding an > > > > ioctl() to allow the application to poll for progress: I'm assuming now > > > > that we don't expect more than 1 copyfile() system call at a time per > > > > file descriptor... > > > > > > If we're using this to copy VM image files, I could easily imagine > > > wanting to clone multiple copies of the VM in parallel. > > > > > > -chris > > > > > > > Not really a problem is it? Just dup() the fd before you issue the > > copyfile()? Or even simpler, just do periodic stat() on the destination > > file if you want a progress report. > > > > Regardless, I like the simple approach that Al is suggesting here. > > Periodic stat() isn't good enough if you are copying subranges of a > file. Part of the application here (as I understood it) is to initialise > specific disk volumes on existing VM images when doing thin > provisioning. In that case, the reported image size won't ever change... > If they were sparse files then st_blocks would presumably change, but that's not necessarily going to be the case. So, ok stat() is out for this... What's the use-case for these sorts of progress reports anyway? Progress meters in GUI apps? 
Either way, I think adding as simple an interface as possible to begin with makes sense. If you want to add progress reports or other doohickeys later, then that can be done in a separate set of patches... -- Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
* Re: copy offload support in Linux - new system call needed? 2011-12-15 16:16 ` Jeff Layton @ 2011-12-15 16:38 ` Trond Myklebust 0 siblings, 0 replies; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 16:38 UTC (permalink / raw) To: Jeff Layton Cc: Chris Mason, J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Thu, 2011-12-15 at 11:16 -0500, Jeff Layton wrote: > On Thu, 15 Dec 2011 11:06:16 -0500 > Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > > > On Thu, 2011-12-15 at 11:03 -0500, Jeff Layton wrote: > > > On Thu, 15 Dec 2011 10:52:13 -0500 > > > Chris Mason <chris.mason@oracle.com> wrote: > > > > > > > On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote: > > > > > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote: > > > > > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote: > > > > > > > On 12/14/2011 02:27 PM, Al Viro wrote: > > > > > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > > > > > > > > > > > > > > >>We had an active thread a couple of years back that came out of the > > > > > > > >>reflink work and, at the time, there seemed to be moderately > > > > > > > >>positive support for adding a new system call that would fit this > > > > > > > >>use case (Joel Becker's copyfile()). > > > > > > > >> > > > > > > > >>Can we resurrect this effort? Is copyfile() still a good way to go, > > > > > > > >>or should we look at other hooks? > > > > > > > >copyfile(2) is probably a good way to go, provided that we do _not_ > > > > > > > >go baroque as it had happened the last time syscall had been discussed. > > > > > > > > > > > > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of > > > > > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). 
> > > > > > > >If it works - fine, if not - caller has to be ready to deal with handling > > > > > > > >cross-device case anyway. > > > > > > > > > > > > > > I think that this approach makes a lot of sense. Most of the > > > > > > > devices/targets that support the copy offload, will do it in very > > > > > > > reasonable amounts of time. > > > > > > > > > > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into > > > > > > one operation: > > > > > > > > > > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2 > > > > > > > > > > > > Perhaps we should ask for separate operations for the two cases. (Or at > > > > > > least a "please don't bother if this is going to take 8 hours" flag....) > > > > > > > > > > How would the server know? I suggest we deal with this by adding an > > > > > ioctl() to allow the application to poll for progress: I'm assuming now > > > > > that we don't expect more than 1 copyfile() system call at a time per > > > > > file descriptor... > > > > > > > > If we're using this to copy VM image files, I could easily imagine > > > > wanting to clone multiple copies of the VM in parallel. > > > > > > > > -chris > > > > > > > > > > Not really a problem is it? Just dup() the fd before you issue the > > > copyfile()? Or even simpler, just do periodic stat() on the destination > > > file if you want a progress report. > > > > > > Regardless, I like the simple approach that Al is suggesting here. > > > > Periodic stat() isn't good enough if you are copying subranges of a > > file. Part of the application here (as I understood it) is to initialise > > specific disk volumes on existing VM images when doing thin > > provisioning. In that case, the reported image size won't ever change... > > > > If they were sparse files then st_blocks would presumably change, but > that's not necessarily going to be the case. So, ok stat() is out for > this... > > What's the use-case for these sorts of progress reports anyway? 
> Progress meters in GUI apps? Mainly... If you are copying several GB worth of data, you expect it to take some time, but you'd like to know that the server hasn't just crashed or something... > Either way, I think adding as simple an interface as possible to begin > with makes sense. If you want to add progress reports or other > doohickeys later, then that can be done in a separate set of patches... Agreed. ...and doing it as an ioctl allows for that. I just want to make sure someone else here doesn't have a use case that might blow that idea out of the water... -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed? 2011-12-15 14:59 ` Trond Myklebust 2011-12-15 15:52 ` Chris Mason @ 2011-12-15 16:08 ` Loke, Chetan [not found] ` <D3F292ADF945FB49B35E96C94C2061B91516E391-2s2rCY1e8UXHBhWB4kaBDUEOCMrvLtNR@public.gmane.org> [not found] ` <1323961140.14317.2.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org> 2 siblings, 1 reply; 29+ messages in thread From: Loke, Chetan @ 2011-12-15 16:08 UTC (permalink / raw) To: Trond Myklebust, J. Bruce Fields Cc: Ric Wheeler, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley > How would the server know? I suggest we deal with this by adding an > ioctl() to allow the application to poll for progress: I'm assuming now Why not support something like the async-iocb? ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed? [not found] ` <D3F292ADF945FB49B35E96C94C2061B91516E391-2s2rCY1e8UXHBhWB4kaBDUEOCMrvLtNR@public.gmane.org> @ 2011-12-15 16:11 ` Trond Myklebust 2011-12-15 16:40 ` Loke, Chetan 0 siblings, 1 reply; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 16:11 UTC (permalink / raw) To: Loke, Chetan Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley On Thu, 2011-12-15 at 11:08 -0500, Loke, Chetan wrote: > > How would the server know? I suggest we deal with this by adding an > > ioctl() to allow the application to poll for progress: I'm assuming now > > Why not support something like the async-iocb? You could, but that would tie copyfile() to the aio interface which was one of the things that I believe Al was opposed to when we discussed this at LSF/MM-2010. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org www.netapp.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed? 2011-12-15 16:11 ` Trond Myklebust @ 2011-12-15 16:40 ` Loke, Chetan 2011-12-15 16:53 ` Trond Myklebust 0 siblings, 1 reply; 29+ messages in thread From: Loke, Chetan @ 2011-12-15 16:40 UTC (permalink / raw) To: Trond Myklebust Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley > > > > Why not support something like the async-iocb? > > You could, but that would tie copyfile() to the aio interface which was > one of the things that I believe Al was opposed to when we discussed > this at LSF/MM-2010. > Virtualization vendors who support this offload do it at a layer above the guest OS (Intra-LUN(tm) locking or whatever fancy locking). So I think 'copyfile' is going to appeal to application developers more than to hypervisor vendors. So let's think about it from the end user's perspective: won't everyone replicate code to check 'Am I done'? It will just make application folks write more (ugly) code, because you would then have to maintain another queue etc. to check for this operation. We can just support full copy. Partial copies can be returned as failure. ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed? 2011-12-15 16:40 ` Loke, Chetan @ 2011-12-15 16:53 ` Trond Myklebust [not found] ` <1323968015.14317.28.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org> 0 siblings, 1 reply; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 16:53 UTC (permalink / raw) To: Loke, Chetan Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Thu, 2011-12-15 at 11:40 -0500, Loke, Chetan wrote: > > > > > > Why not support something like the async-iocb? > > > > You could, but that would tie copyfile() to the aio interface which was > > one of the things that I believe Al was opposed to when we discussed > > this at LSF/MM-2010. > > > > virtualization vendors who support this offload do it at a layer above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So I think 'copyfile' is going to be appealing to application-developers more than the hypervisor-vendors. The application is thin provisioning, not the 'cp' command. When virtualisation vendors do support this, it will mainly be as part of their image management toolkits, not the hypervisor. > So let's think about it from end-users perspective: > Won't everyone replicate code to check - 'Am I done'? It will just make application folks write more (ugly)code. Because you would then have to maintain another queue/etc to check for this operation. 'Am I done' is easy: copyfile() returns with the number of bytes that have been copied. 'Is my copyfile() syscall making progress' is the question that needs answering. > We can just support full-copy. Partial copies can be returned as failure. Then you have to check the entire range on error instead of just resuming the copy from where it stopped. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 29+ messages in thread
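A minimal sketch of the "resume from where it stopped" point, assuming a copyfile()-style call that returns the number of bytes copied (possibly fewer than requested) or -1 on error. No signature had been agreed in this thread, so `copy_fn_t`, `copy_all()` and the toy backend `short_copy()` are hypothetical names used purely for illustration, and the buffers stand in for files.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for a copyfile()-style call: copies up to len
 * bytes from src+off to dst+off and returns the byte count actually
 * copied (may be short), or -1 on error. Not a real kernel interface. */
typedef ssize_t (*copy_fn_t)(char *dst, const char *src, size_t off, size_t len);

/* Resume after every short return instead of restarting from offset 0.
 * Returns the total bytes copied; *calls counts invocations of copy_fn. */
static size_t copy_all(copy_fn_t copy_fn, char *dst, const char *src,
                       size_t len, int *calls)
{
    size_t done = 0;

    assert(copy_fn != NULL && calls != NULL);
    *calls = 0;
    while (done < len) {
        ssize_t n = copy_fn(dst, src, done, len - done);
        (*calls)++;
        if (n <= 0)
            break;              /* error or no progress: done < len */
        done += (size_t)n;
    }
    return done;
}

/* Toy backend that never copies more than 4 bytes per call. */
static ssize_t short_copy(char *dst, const char *src, size_t off, size_t len)
{
    size_t n = len < 4 ? len : 4;

    memcpy(dst + off, src + off, n);
    return (ssize_t)n;
}
```

With a 10-byte source, `copy_all(short_copy, ...)` needs three calls (4 + 4 + 2 bytes) and never re-copies a range that already succeeded. A full-copy-or-fail interface would instead force the caller to redo or re-verify the whole range after any error.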
* Re: copy offload support in Linux - new system call needed? [not found] ` <1323968015.14317.28.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org> @ 2011-12-15 17:18 ` Ric Wheeler 2011-12-15 17:25 ` Trond Myklebust 2011-12-15 17:31 ` Loke, Chetan 2011-12-15 17:27 ` Loke, Chetan 1 sibling, 2 replies; 29+ messages in thread From: Ric Wheeler @ 2011-12-15 17:18 UTC (permalink / raw) To: Trond Myklebust Cc: Loke, Chetan, J. Bruce Fields, Al Viro, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley On 12/15/2011 11:53 AM, Trond Myklebust wrote: > On Thu, 2011-12-15 at 11:40 -0500, Loke, Chetan wrote: >>>> Why not support something like the async-iocb? >>> You could, but that would tie copyfile() to the aio interface which was >>> one of the things that I believe Al was opposed to when we discussed >>> this at LSF/MM-2010. >>> >> virtualization vendors who support this offload do it at a layer above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So I think 'copyfile' is going to be appealing to application-developers more than the hypervisor-vendors. > The application is thin provisioning, not the 'cp' command. When > virtualisation vendors do support this, it will mainly be as part of > their image management toolkits, not the hypervisor. I think that hypervisor vendors will be very interested in this feature which would explain why vmware was active in drafting both the NFS and T10 specs. Not to mention those of us who use KVM or XEN :) As Trond mentions, we might have this in the management tool chain or other places in the stack. > >> So let's think about it from end-users perspective: >> Won't everyone replicate code to check - 'Am I done'? It will just make application folks write more (ugly)code. Because you would then have to maintain another queue/etc to check for this operation. 
> 'Am I done' is easy: copyfile() returns with the number of bytes that > have been copied. > > 'Is my copyfile() syscall making progress' is the question that needs > answering. > >> We can just support full-copy. Partial copies can be returned as failure. > Then you have to check the entire range on error instead of just > resuming the copy from where it stopped. > I also like simple first. I am not too certain about the need for polling (especially given how little we have done historically to take advantage of the notifications, water marks, etc in things like thin provisioning :)). On the other hand, I also don't object to having the ability to poll (through the ioctl or whatever) if others find that useful. What I would like to see is a way to make sure that we can interrupt any long running command & also make sure that our timeouts (for SCSI specifically) are not too aggressive. Ric ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-15 17:18 ` Ric Wheeler @ 2011-12-15 17:25 ` Trond Myklebust 2011-12-15 17:31 ` Loke, Chetan 1 sibling, 0 replies; 29+ messages in thread From: Trond Myklebust @ 2011-12-15 17:25 UTC (permalink / raw) To: Ric Wheeler Cc: Loke, Chetan, J. Bruce Fields, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Thu, 2011-12-15 at 12:18 -0500, Ric Wheeler wrote: > What I would like to see is a way to make sure that we can interrupt any long > running command & also make sure that our timeouts (for SCSI specifically) are > not too aggressive. The draft NFSv4.2 protocol contains features to make interruption possible, so as far as the NFS client is concerned, that should be doable. I can't answer for CIFS or SCSI... Cheers Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed? 2011-12-15 17:18 ` Ric Wheeler 2011-12-15 17:25 ` Trond Myklebust @ 2011-12-15 17:31 ` Loke, Chetan 2011-12-15 17:55 ` Ric Wheeler 1 sibling, 1 reply; 29+ messages in thread From: Loke, Chetan @ 2011-12-15 17:31 UTC (permalink / raw) To: Ric Wheeler, Trond Myklebust Cc: J. Bruce Fields, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley > > I think that hypervisor vendors will be very interested in this feature > which > would explain why vmware was active in drafting both the NFS and T10 Specs are the only way to convince storage-target-vendors ;). Otherwise target-stack will need to implement multiple custom-CDB-handlers for different front-end APIs(which is ugly). Chetan ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-15 17:31 ` Loke, Chetan @ 2011-12-15 17:55 ` Ric Wheeler 0 siblings, 0 replies; 29+ messages in thread From: Ric Wheeler @ 2011-12-15 17:55 UTC (permalink / raw) To: Loke, Chetan Cc: Trond Myklebust, J. Bruce Fields, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On 12/15/2011 12:31 PM, Loke, Chetan wrote: >> I think that hypervisor vendors will be very interested in this feature >> which >> would explain why vmware was active in drafting both the NFS and T10 > Specs are the only way to convince storage-target-vendors ;). Otherwise target-stack will need to implement multiple custom-CDB-handlers for different front-end APIs(which is ugly). > > > Chetan Hi Chetan, I should post from my "Red Hat" email to make this less confusing for you - I know that this is in fact interesting to vendors :) Thanks! Ric ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed? [not found] ` <1323968015.14317.28.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org> 2011-12-15 17:18 ` Ric Wheeler @ 2011-12-15 17:27 ` Loke, Chetan 1 sibling, 0 replies; 29+ messages in thread From: Loke, Chetan @ 2011-12-15 17:27 UTC (permalink / raw) To: Trond Myklebust Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley > > virtualization vendors who support this offload do it at a layer > above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So > I think 'copyfile' is going to be appealing to application-developers > more than the hypervisor-vendors. > > The application is thin provisioning, not the 'cp' command. When thin-provisioning is one use-case. There are quite a few use-cases of 'copyfile' depending on your business-logic and the type of appliance you sell. > virtualisation vendors do support this, it will mainly be as part of > their image management toolkits, not the hypervisor. > Toolkits? May not be true. The toolkit might need to talk to some hypervisor-component to ensure LUN-locking etc on the target. So this is not entirely isolated as you might think. There is some integration. As an example(just to prove the point) - Have you ever seen anyone not use vsphere-client on VMware for copying VM templates? > > So let's think about it from end-users perspective: > > Won't everyone replicate code to check - 'Am I done'? It will just > make application folks write more (ugly)code. Because you would then > have to maintain another queue/etc to check for this operation. > > 'Am I done' is easy: copyfile() returns with the number of bytes that > have been copied. > > 'Is my copyfile() syscall making progress' is the question that needs > answering. > Understood. But as a user, we don't know what 'am I done' is going to report. 
'am I done' can return: 1) ACK [copy done] - the simplistic case. 2) IN-PROGRESS. 3) NACK [copy failed (with status values) or copy partially completed]. And if you are using the copy-VM use case, very few VMs are under 4 GB, so we will hit 2) above more frequently than 1) and 3). > > We can just support full-copy. Partial copies can be returned as > failure. > > Then you have to check the entire range on error instead of just > resuming the copy from where it stopped. > Why not restart? What if the LUN was implementing thin provisioning and it ran out of space after partially copying your data? So why not restart the copy? If the target doesn't support auto-extend, someone (storage admin etc.) would have to step in and manage that LUN. You might as well restart the copy in this case. Chetan Loke ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? [not found] ` <1323961140.14317.2.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org> @ 2011-12-15 17:44 ` J. Bruce Fields 0 siblings, 0 replies; 29+ messages in thread From: J. Bruce Fields @ 2011-12-15 17:44 UTC (permalink / raw) To: Trond Myklebust Cc: Ric Wheeler, Al Viro, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote: > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote: > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote: > > > On 12/14/2011 02:27 PM, Al Viro wrote: > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > > > > > > >>We had an active thread a couple of years back that came out of the > > > >>reflink work and, at the time, there seemed to be moderately > > > >>positive support for adding a new system call that would fit this > > > >>use case (Joel Becker's copyfile()). > > > >> > > > >>Can we resurrect this effort? Is copyfile() still a good way to go, > > > >>or should we look at other hooks? > > > >copyfile(2) is probably a good way to go, provided that we do _not_ > > > >go baroque as it had happened the last time syscall had been discussed. > > > > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). > > > >If it works - fine, if not - caller has to be ready to deal with handling > > > >cross-device case anyway. > > > > > > I think that this approach makes a lot of sense. Most of the > > > devices/targets that support the copy offload, will do it in very > > > reasonable amounts of time. 
> > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into > > one operation: > > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2 > > > > Perhaps we should ask for separate operations for the two cases. (Or at > > least a "please don't bother if this is going to take 8 hours" flag....) > > How would the server know? Sorry, "8 hours" was a joke--no, you can't require the server to predict whether an operation will take more or less than some precise duration. I'm assuming the "fast" case that Al's proposing we do as a first step covers CoW operations? (So O(1) or close to it, users typically won't be asking for progress reports, and the operation may be atomic (with no partial-failure case)?) --b. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-14 19:42 ` Ric Wheeler [not found] ` <4EE8FC2E.3010207-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2011-12-16 8:00 ` Joel Becker 1 sibling, 0 replies; 29+ messages in thread From: Joel Becker @ 2011-12-16 8:00 UTC (permalink / raw) To: Ric Wheeler Cc: Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, James Bottomley On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote: > On 12/14/2011 02:27 PM, Al Viro wrote: > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > > >>We had an active thread a couple of years back that came out of the > >>reflink work and, at the time, there seemed to be moderately > >>positive support for adding a new system call that would fit this > >>use case (Joel Becker's copyfile()). > >> > >>Can we resurrect this effort? Is copyfile() still a good way to go, > >>or should we look at other hooks? > >copyfile(2) is probably a good way to go, provided that we do _not_ > >go baroque as it had happened the last time syscall had been discussed. > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1). > >If it works - fine, if not - caller has to be ready to deal with handling > >cross-device case anyway. > > I think that this approach makes a lot of sense. Most of the > devices/targets that support the copy offload, will do it in very > reasonable amounts of time. > > Let me see if I can dig up some of the presentations from the NetApp > guys who presented overviews or the specifications from the IETF and > T10.... Whee! I've been down the rabbit hole, but I've promised myself to get the updated patch out soon. I know that Trond et al are probably wondering what happened to the patch. More soon. Joel -- Life's Little Instruction Book #207 "Swing for the fence."
http://www.jlbec.org/ jlbec@evilplan.org ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-14 19:22 copy offload support in Linux - new system call needed? Ric Wheeler 2011-12-14 19:27 ` Al Viro @ 2011-12-14 19:59 ` Jeremy Allison 2011-12-14 20:30 ` Ric Wheeler 2011-12-19 22:19 ` H. Peter Anvin 1 sibling, 2 replies; 29+ messages in thread From: Jeremy Allison @ 2011-12-14 19:59 UTC (permalink / raw) To: Ric Wheeler Cc: linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: > > Back at LinuxCon Prague, we talked about the new NFS and SCSI > commands that let us offload copy operations to a storage device > (like an NFS server or storage array). > > This got new life in the virtual machine world where you might want > to clone bulky guest files or ranges of blocks and was driven > through the standards bodies by vmware, microsoft and some of the > major storage vendors. Windows8 has this functionality fully coded > and integrated in the GUI, I assume vmware also uses it and there > are some vendors who announced support at the SNIA SDC conference. > > We had an active thread a couple of years back that came out of the > reflink work and, at the time, there seemed to be moderately > positive support for adding a new system call that would fit this > use case (Joel Becker's copyfile()). > > Can we resurrect this effort? Is copyfile() still a good way to go, > or should we look at other hooks? Windows uses a COPYCHUNK call, which specifies the following parameters: Definition of a copy "chunk": hyper source_off; hyper target_off; uint32 length; and an array of these chunks which is passed into their kernel. This is what we have to implement in Samba. Jeremy. ^ permalink raw reply [flat|nested] 29+ messages in thread
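The chunk layout Jeremy describes maps naturally onto a C struct. The sketch below takes the field names from his message but is otherwise an assumption: `struct copy_chunk`, `copy_one_chunk()` and `copy_chunks()` are illustrative names, and the pread()/pwrite() loop is only the plain fallback a file server might use when no offload is available, not how Windows or Samba implement the call.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* One copy "chunk" as listed in the thread: two 64-bit offsets
 * ("hyper") and a 32-bit length. */
struct copy_chunk {
    uint64_t source_off;
    uint64_t target_off;
    uint32_t length;
};

/* Plain read/write fallback for a single chunk. Returns bytes copied
 * (short if the source ends early) or -1 on I/O error. */
static ssize_t copy_one_chunk(int src_fd, int dst_fd, const struct copy_chunk *c)
{
    char buf[4096];
    uint32_t done = 0;

    while (done < c->length) {
        size_t want = c->length - done;
        if (want > sizeof(buf))
            want = sizeof(buf);

        ssize_t r = pread(src_fd, buf, want, (off_t)(c->source_off + done));
        if (r < 0)
            return -1;
        if (r == 0)
            break;                      /* source shorter than the chunk */

        ssize_t written = 0;
        while (written < r) {
            ssize_t w = pwrite(dst_fd, buf + written, (size_t)(r - written),
                               (off_t)(c->target_off + done + (uint64_t)written));
            if (w <= 0)
                return -1;
            written += w;
        }
        done += (uint32_t)r;
    }
    return (ssize_t)done;
}

/* Apply an array of chunks in order; returns total bytes copied or -1. */
static ssize_t copy_chunks(int src_fd, int dst_fd,
                           const struct copy_chunk *chunks, size_t n)
{
    ssize_t total = 0;

    assert(chunks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        ssize_t r = copy_one_chunk(src_fd, dst_fd, &chunks[i]);
        if (r < 0)
            return -1;
        total += r;
    }
    return total;
}
```

Because each chunk carries its own source and target offset, one request can rearrange ranges as well as copy them. A single (offset, length) copyfile() call would need several invocations to express the same thing.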
* Re: copy offload support in Linux - new system call needed? 2011-12-14 19:59 ` Jeremy Allison @ 2011-12-14 20:30 ` Ric Wheeler 2011-12-19 12:38 ` Hannes Reinecke 2011-12-19 22:19 ` H. Peter Anvin 1 sibling, 1 reply; 29+ messages in thread From: Ric Wheeler @ 2011-12-14 20:30 UTC (permalink / raw) To: Jeremy Allison Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley On 12/14/2011 02:59 PM, Jeremy Allison wrote: > On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: >> Back at LinuxCon Prague, we talked about the new NFS and SCSI >> commands that let us offload copy operations to a storage device >> (like an NFS server or storage array). >> >> This got new life in the virtual machine world where you might want >> to clone bulky guest files or ranges of blocks and was driven >> through the standards bodies by vmware, microsoft and some of the >> major storage vendors. Windows8 has this functionality fully coded >> and integrated in the GUI, I assume vmware also uses it and there >> are some vendors who announced support at the SNIA SDC conference. >> >> We had an active thread a couple of years back that came out of the >> reflink work and, at the time, there seemed to be moderately >> positive support for adding a new system call that would fit this >> use case (Joel Becker's copyfile()). >> >> Can we resurrect this effort? Is copyfile() still a good way to go, >> or should we look at other hooks? > Windows uses a COPYCHUNK call, which specifies the > following parameters: > > Definition of a copy "chunk": > > hyper source_off; > hyper target_off; > uint32 length; > > and an array of these chunks which is passed > into their kernel. > > This is what we have to implement in Samba. > > Jeremy. 
This is a public pointer to the draft NFS proposal: http://tools.ietf.org/id/draft-lentini-nfsv4-server-side-copy-06.txt The T10 site has a click-through agreement that I was not too happy about accepting. Fred Knight of NetApp gave some nice presentations on how SCSI does this in two different ways... Ric ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-14 20:30 ` Ric Wheeler @ 2011-12-19 12:38 ` Hannes Reinecke 0 siblings, 0 replies; 29+ messages in thread From: Hannes Reinecke @ 2011-12-19 12:38 UTC (permalink / raw) To: Ric Wheeler Cc: Jeremy Allison, linux-scsi@vger.kernel.org, linux-fsdevel, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On 12/14/2011 09:30 PM, Ric Wheeler wrote: > On 12/14/2011 02:59 PM, Jeremy Allison wrote: >> On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote: >>> Back at LinuxCon Prague, we talked about the new NFS and SCSI >>> commands that let us offload copy operations to a storage device >>> (like an NFS server or storage array). >>> >>> This got new life in the virtual machine world where you might want >>> to clone bulky guest files or ranges of blocks and was driven >>> through the standards bodies by vmware, microsoft and some of the >>> major storage vendors. Windows8 has this functionality fully coded >>> and integrated in the GUI, I assume vmware also uses it and there >>> are some vendors who announced support at the SNIA SDC conference. >>> >>> We had an active thread a couple of years back that came out of the >>> reflink work and, at the time, there seemed to be moderately >>> positive support for adding a new system call that would fit this >>> use case (Joel Becker's copyfile()). >>> >>> Can we resurrect this effort? Is copyfile() still a good way to go, >>> or should we look at other hooks? >> Windows uses a COPYCHUNK call, which specifies the >> following parameters: >> >> Definition of a copy "chunk": >> >> hyper source_off; >> hyper target_off; >> uint32 length; >> >> and an array of these chunks which is passed >> into their kernel. >> >> This is what we have to implement in Samba. >> >> Jeremy. 
> > This is a public pointer to the draft NFS proposal: > > http://tools.ietf.org/id/draft-lentini-nfsv4-server-side-copy-06.txt > > The T10 site has some click through that I was not too happy about > agreeing to. NetApp (Fred Knight) had some nice presentations that > he presented about how SCSI does this in two different ways... > Yes, the 'XCOPY Lite' mechanism. With that the whole copy process is broken into two steps: - Create a reference to the requested blocks - Use that reference to request the operation The neat thing with that is that there might be some delay between those steps, effectively creating a snapshot in time. An additional bonus is that one doesn't have to create those over-complicated source and target descriptors, but rather have the array create one for you. So all in all nice and easy to use. With the slight disadvantage that no one implements it. Yet. Hence we might want to use the old-style EXTENDED COPY after all ... However, both approaches have in common that an opaque 'identifier' is used to identify any currently running copy process. So when designing this interface we should keep in mind that we would need to store this identifier somewhere. As loath as I am to admit it, the async-I/O mechanism would fit the bill far better than a single copyfile() call ... Which could be easily implemented on top of the Async I/O call, btw. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ^ permalink raw reply [flat|nested] 29+ messages in thread
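The two-step flow Hannes outlines can be modelled with a toy in-memory example. Everything here is hypothetical: `struct copy_token`, `token_create()`, `token_write()` and `token_free()` are invented names, and the token simply holds a private copy of the bytes to mimic point-in-time behaviour. A real array would presumably pin block references rather than duplicate data, and this is not the SCSI command set.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Opaque-to-the-caller handle standing in for the identifier the
 * storage array would hand back (hypothetical). */
struct copy_token {
    size_t len;
    char *frozen;   /* toy model: private copy of the referenced range */
};

/* Step 1: create a reference to the requested source range. */
static struct copy_token *token_create(const char *src, size_t off, size_t len)
{
    struct copy_token *t = malloc(sizeof(*t));

    if (t == NULL)
        return NULL;
    t->frozen = malloc(len ? len : 1);
    if (t->frozen == NULL) {
        free(t);
        return NULL;
    }
    memcpy(t->frozen, src + off, len);
    t->len = len;
    return t;
}

/* Step 2: use the reference, possibly much later, to request the write. */
static size_t token_write(char *dst, size_t off, const struct copy_token *t)
{
    assert(t != NULL);
    memcpy(dst + off, t->frozen, t->len);
    return t->len;
}

/* The caller has to keep the token somewhere until it is consumed or
 * dropped, which is the storage question raised in the message. */
static void token_free(struct copy_token *t)
{
    if (t != NULL) {
        free(t->frozen);
        free(t);
    }
}
```

If the source buffer is modified between `token_create()` and `token_write()`, the destination still receives the bytes as they were when the token was created, which is the "snapshot in time" property. The need to hold `struct copy_token` across two calls is also why a single blocking copyfile() call is an awkward fit for this style of interface.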
* Re: copy offload support in Linux - new system call needed? 2011-12-14 19:59 ` Jeremy Allison 2011-12-14 20:30 ` Ric Wheeler @ 2011-12-19 22:19 ` H. Peter Anvin 2011-12-19 22:34 ` Jeremy Allison 2011-12-19 22:57 ` Dave Chinner 1 sibling, 2 replies; 29+ messages in thread From: H. Peter Anvin @ 2011-12-19 22:19 UTC (permalink / raw) To: Jeremy Allison Cc: Ric Wheeler, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley On 12/14/2011 11:59 AM, Jeremy Allison wrote: >> >> Can we resurrect this effort? Is copyfile() still a good way to go, >> or should we look at other hooks? > > Windows uses a COPYCHUNK call, which specifies the > following parameters: > > Definition of a copy "chunk": > > hyper source_off; > hyper target_off; > uint32 length; > > and an array of these chunks which is passed > into their kernel. > > This is what we have to implement in Samba. > Could we do this by (re-)allowing sendfile() between two files? -hpa ^ permalink raw reply [flat|nested] 29+ messages in thread
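For reference, a sketch of what a file-to-file sendfile() copy looks like from userspace on Linux. The sendfile(2) man page states that out_fd may be any file since Linux 2.6.33 (earlier kernels required a socket). The helper name `copy_with_sendfile()` is an assumption, and this is an ordinary in-kernel copy, not a storage offload.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy up to count bytes from the start of in_fd to the current file
 * position of out_fd. Returns bytes copied (short at EOF) or -1. */
static ssize_t copy_with_sendfile(int out_fd, int in_fd, size_t count)
{
    off_t off = 0;          /* sendfile() reads from and advances *off */
    size_t left = count;

    assert(out_fd >= 0 && in_fd >= 0);
    while (left > 0) {
        ssize_t n = sendfile(out_fd, in_fd, &off, left);
        if (n < 0)
            return -1;
        if (n == 0)
            break;          /* end of input file */
        left -= (size_t)n;
    }
    return (ssize_t)(count - left);
}
```

The call already loops on short returns, so callers get the same "bytes copied so far" contract discussed earlier in the thread. Whether it could also carry offload semantics is the open question here.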
* Re: copy offload support in Linux - new system call needed? 2011-12-19 22:19 ` H. Peter Anvin @ 2011-12-19 22:34 ` Jeremy Allison 2011-12-19 22:57 ` Dave Chinner 1 sibling, 0 replies; 29+ messages in thread From: Jeremy Allison @ 2011-12-19 22:34 UTC (permalink / raw) To: H. Peter Anvin Cc: Jeremy Allison, Ric Wheeler, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Mon, Dec 19, 2011 at 02:19:43PM -0800, H. Peter Anvin wrote: > On 12/14/2011 11:59 AM, Jeremy Allison wrote: > >> > >> Can we resurrect this effort? Is copyfile() still a good way to go, > >> or should we look at other hooks? > > > > Windows uses a COPYCHUNK call, which specifies the > > following parameters: > > > > Definition of a copy "chunk": > > > > hyper source_off; > > hyper target_off; > > uint32 length; > > > > and an array of these chunks which is passed > > into their kernel. > > > > This is what we have to implement in Samba. > > > > Could we do this by (re-)allowing sendfile() between two files? Oooh - nice idea ! Yes, having a completely symmetric sendfile which allows socket -> file, file -> socket, socket -> socket, file -> file would be a great idea (IMHO). Jeremy. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed? 2011-12-19 22:19 ` H. Peter Anvin 2011-12-19 22:34 ` Jeremy Allison @ 2011-12-19 22:57 ` Dave Chinner 2011-12-19 23:29 ` H. Peter Anvin 1 sibling, 1 reply; 29+ messages in thread From: Dave Chinner @ 2011-12-19 22:57 UTC (permalink / raw) To: H. Peter Anvin Cc: Jeremy Allison, Ric Wheeler, linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker, James Bottomley On Mon, Dec 19, 2011 at 02:19:43PM -0800, H. Peter Anvin wrote: > On 12/14/2011 11:59 AM, Jeremy Allison wrote: > >> > >> Can we resurrect this effort? Is copyfile() still a good way to go, > >> or should we look at other hooks? > > > > Windows uses a COPYCHUNK call, which specifies the > > following parameters: > > > > Definition of a copy "chunk": > > > > hyper source_off; > > hyper target_off; > > uint32 length; > > > > and an array of these chunks which is passed > > into their kernel. > > > > This is what we have to implement in Samba. > > > > Could we do this by (re-)allowing sendfile() between two files? That was my immediate thought, but sendfile has plumbing that is page cache based and we require completely different infrastructure and semantics for an array offload. e.g. for an array offload, we have to flush the source file page cache first so that the data being copied is known to be on disk, then invalidate the destination page cache if overwriting or extend and pre-allocate blocks if not. Then we have to map both files and hand that off to the array. Then there's a whole bunch of tricky questions about what the state of the destination file should look like while the copy is in progress, whether the source file should be allowed to change (e.g. it can't be truncated and have blocks freed and then reused by other files half way through the copy offload operation), and so on. 
sendfile() has well-known, fixed semantics that we can't change to suit what is needed for an offload operation that could potentially take hours to complete. Hence I think a new syscall is the way to go.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 29+ messages in thread
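The preparation steps Dave lists for an array offload can be loosely approximated from userspace, which may help make the ordering concrete. This is only an analogy under stated assumptions: `prepare_for_offload()` is a hypothetical helper, the real work would be done inside the kernel and filesystem, and POSIX_FADV_DONTNEED is merely an advisory hint rather than a true page cache invalidation.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Userspace approximation of the steps before handing a copy to the
 * array. Returns 0 on success or -1 with errno set. */
static int prepare_for_offload(int src_fd, int dst_fd, off_t dst_off, off_t len)
{
    struct stat st;
    int err;

    assert(len > 0 && dst_off >= 0);

    /* 1. Flush dirty source data so the copy sees what is on disk. */
    if (fsync(src_fd) != 0)
        return -1;

    if (fstat(dst_fd, &st) != 0)
        return -1;

    if (dst_off + len > st.st_size) {
        /* 2a. Extending the destination: reserve blocks up front. */
        err = posix_fallocate(dst_fd, dst_off, len);
    } else {
        /* 2b. Overwriting: write back, then hint that cached pages
         * for the range can be dropped. */
        if (fsync(dst_fd) != 0)
            return -1;
        err = posix_fadvise(dst_fd, dst_off, len, POSIX_FADV_DONTNEED);
    }
    if (err != 0) {
        errno = err;    /* both calls return the error number directly */
        return -1;
    }

    /* 3. Mapping both files to block ranges and handing them to the
     * array is filesystem and driver work, not shown here. */
    return 0;
}
```

None of this addresses the harder questions in the message, such as what the destination looks like mid-copy or how source blocks are kept from being freed and reused while the array is still reading them.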
* Re: copy offload support in Linux - new system call needed?
  2011-12-19 22:57 ` Dave Chinner
@ 2011-12-19 23:29   ` H. Peter Anvin
  0 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2011-12-19 23:29 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jeremy Allison, Ric Wheeler,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-fsdevel, Hannes Reinecke, Andrew Morton,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley

On 12/19/2011 02:57 PM, Dave Chinner wrote:
>
> That was my immediate thought, but sendfile has plumbing that is
> page cache based and we require completely different infrastructure
> and semantics for an array offload.
>

The plumbing is internal to the kernel and doesn't mean we have to use
the same VFS methods.

> e.g. for an array offload, we have to flush the source file page
> cache first so that the data being copied is known to be on disk,
> then invalidate the destination page cache if overwriting or extend
> and pre-allocate blocks if not. Then we have to map both files and
> hand that off to the array.
>
> Then there's a whole bunch of tricky questions about what the state
> of the destination file should look like while the copy is in
> progress, whether the source file should be allowed to change (e.g.
> it can't be truncated and have blocks freed and then reused by other
> files half way through the copy offload operation), and so on.
>
> sendfile() has well known, fixed semantics that we can't change to
> suit what is needed for an offload operation that could potentially
> take hours to complete. Hence I think a new syscall is the way to
> go....

Perhaps what we need first is an explicit enumeration of the semantics
you're looking for.
-hpa

^ permalink raw reply	[flat|nested] 29+ messages in thread