* copy offload support in Linux - new system call needed?
@ 2011-12-14 19:22 Ric Wheeler
2011-12-14 19:27 ` Al Viro
2011-12-14 19:59 ` Jeremy Allison
0 siblings, 2 replies; 29+ messages in thread
From: Ric Wheeler @ 2011-12-14 19:22 UTC (permalink / raw)
To: linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
Joel Becker, James Bottomley
Back at LinuxCon Prague, we talked about the new NFS and SCSI commands that let
us offload copy operations to a storage device (like an NFS server or storage
array).
This got new life in the virtual machine world where you might want to clone
bulky guest files or ranges of blocks and was driven through the standards
bodies by vmware, microsoft and some of the major storage vendors. Windows8 has
this functionality fully coded and integrated in the GUI, I assume vmware also
uses it and there are some vendors who announced support at the SNIA SDC conference.
We had an active thread a couple of years back that came out of the reflink work
and, at the time, there seemed to be moderately positive support for adding a
new system call that would fit this use case (Joel Becker's copyfile()).
Can we resurrect this effort? Is copyfile() still a good way to go, or should we
look at other hooks?
Thanks!
Ric
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-14 19:22 copy offload support in Linux - new system call needed? Ric Wheeler
@ 2011-12-14 19:27 ` Al Viro
[not found] ` <20111214192739.GN2203-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2011-12-14 19:59 ` Jeremy Allison
1 sibling, 1 reply; 29+ messages in thread
From: Al Viro @ 2011-12-14 19:27 UTC (permalink / raw)
To: Ric Wheeler
Cc: linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke,
Andrew Morton, linux-nfs, Joel Becker, James Bottomley
On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> We had an active thread a couple of years back that came out of the
> reflink work and, at the time, there seemed to be moderately
> positive support for adding a new system call that would fit this
> use case (Joel Becker's copyfile()).
>
> Can we resurrect this effort? Is copyfile() still a good way to go,
> or should we look at other hooks?
copyfile(2) is probably a good way to go, provided that we do _not_
go baroque as it had happened the last time syscall had been discussed.
IOW, to hell with progress reports, etc. - just a fastpath kind of
thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
If it works - fine, if not - caller has to be ready to deal with handling
cross-device case anyway.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
[not found] ` <20111214192739.GN2203-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
@ 2011-12-14 19:42 ` Ric Wheeler
[not found] ` <4EE8FC2E.3010207-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2011-12-16 8:00 ` Joel Becker
0 siblings, 2 replies; 29+ messages in thread
From: Ric Wheeler @ 2011-12-14 19:42 UTC (permalink / raw)
To: Al Viro
Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
Joel Becker, James Bottomley
On 12/14/2011 02:27 PM, Al Viro wrote:
> On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
>
>> We had an active thread a couple of years back that came out of the
>> reflink work and, at the time, there seemed to be moderately
>> positive support for adding a new system call that would fit this
>> use case (Joel Becker's copyfile()).
>>
>> Can we resurrect this effort? Is copyfile() still a good way to go,
>> or should we look at other hooks?
> copyfile(2) is probably a good way to go, provided that we do _not_
> go baroque as it had happened the last time syscall had been discussed.
>
> IOW, to hell with progress reports, etc. - just a fastpath kind of
> thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> If it works - fine, if not - caller has to be ready to deal with handling
> cross-device case anyway.
I think that this approach makes a lot of sense. Most of the devices/targets
that support the copy offload, will do it in very reasonable amounts of time.
Let me see if I can dig up some of the presentations from the NetApp guys who
presented overviews or the specifications from the IETF and T10....
Ric
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-14 19:22 copy offload support in Linux - new system call needed? Ric Wheeler
2011-12-14 19:27 ` Al Viro
@ 2011-12-14 19:59 ` Jeremy Allison
2011-12-14 20:30 ` Ric Wheeler
2011-12-19 22:19 ` H. Peter Anvin
1 sibling, 2 replies; 29+ messages in thread
From: Jeremy Allison @ 2011-12-14 19:59 UTC (permalink / raw)
To: Ric Wheeler
Cc: linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke,
Andrew Morton, linux-nfs, Joel Becker, James Bottomley
On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
>
> Back at LinuxCon Prague, we talked about the new NFS and SCSI
> commands that let us offload copy operations to a storage device
> (like an NFS server or storage array).
>
> This got new life in the virtual machine world where you might want
> to clone bulky guest files or ranges of blocks and was driven
> through the standards bodies by vmware, microsoft and some of the
> major storage vendors. Windows8 has this functionality fully coded
> and integrated in the GUI, I assume vmware also uses it and there
> are some vendors who announced support at the SNIA SDC conference.
>
> We had an active thread a couple of years back that came out of the
> reflink work and, at the time, there seemed to be moderately
> positive support for adding a new system call that would fit this
> use case (Joel Becker's copyfile()).
>
> Can we resurrect this effort? Is copyfile() still a good way to go,
> or should we look at other hooks?
Windows uses a COPYCHUNK call, which specifies the
following parameters:
Definition of a copy "chunk":
hyper source_off;
hyper target_off;
uint32 length;
and an array of these chunks which is passed
into their kernel.
This is what we have to implement in Samba.
Jeremy.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-14 19:59 ` Jeremy Allison
@ 2011-12-14 20:30 ` Ric Wheeler
2011-12-19 12:38 ` Hannes Reinecke
2011-12-19 22:19 ` H. Peter Anvin
1 sibling, 1 reply; 29+ messages in thread
From: Ric Wheeler @ 2011-12-14 20:30 UTC (permalink / raw)
To: Jeremy Allison
Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
Joel Becker, James Bottomley
On 12/14/2011 02:59 PM, Jeremy Allison wrote:
> On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
>> Back at LinuxCon Prague, we talked about the new NFS and SCSI
>> commands that let us offload copy operations to a storage device
>> (like an NFS server or storage array).
>>
>> This got new life in the virtual machine world where you might want
>> to clone bulky guest files or ranges of blocks and was driven
>> through the standards bodies by vmware, microsoft and some of the
>> major storage vendors. Windows8 has this functionality fully coded
>> and integrated in the GUI, I assume vmware also uses it and there
>> are some vendors who announced support at the SNIA SDC conference.
>>
>> We had an active thread a couple of years back that came out of the
>> reflink work and, at the time, there seemed to be moderately
>> positive support for adding a new system call that would fit this
>> use case (Joel Becker's copyfile()).
>>
>> Can we resurrect this effort? Is copyfile() still a good way to go,
>> or should we look at other hooks?
> Windows uses a COPYCHUNK call, which specifies the
> following parameters:
>
> Definition of a copy "chunk":
>
> hyper source_off;
> hyper target_off;
> uint32 length;
>
> and an array of these chunks which is passed
> into their kernel.
>
> This is what we have to implement in Samba.
>
> Jeremy.
This is a public pointer to the draft NFS proposal:
http://tools.ietf.org/id/draft-lentini-nfsv4-server-side-copy-06.txt
The T10 site has some click through that I was not too happy about agreeing to.
NetApp (Fred Knight) had some nice presentations that he presented about how
SCSI does this in two different ways...
Ric
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
[not found] ` <4EE8FC2E.3010207-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2011-12-14 22:27 ` J. Bruce Fields
2011-12-15 14:59 ` Trond Myklebust
0 siblings, 1 reply; 29+ messages in thread
From: J. Bruce Fields @ 2011-12-14 22:27 UTC (permalink / raw)
To: Ric Wheeler
Cc: Al Viro, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-fsdevel, Hannes Reinecke, Andrew Morton,
linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley
On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> On 12/14/2011 02:27 PM, Al Viro wrote:
> >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> >
> >>We had an active thread a couple of years back that came out of the
> >>reflink work and, at the time, there seemed to be moderately
> >>positive support for adding a new system call that would fit this
> >>use case (Joel Becker's copyfile()).
> >>
> >>Can we resurrect this effort? Is copyfile() still a good way to go,
> >>or should we look at other hooks?
> >copyfile(2) is probably a good way to go, provided that we do _not_
> >go baroque as it had happened the last time syscall had been discussed.
> >
> >IOW, to hell with progress reports, etc. - just a fastpath kind of
> >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> >If it works - fine, if not - caller has to be ready to deal with handling
> >cross-device case anyway.
>
> I think that this approach makes a lot of sense. Most of the
> devices/targets that support the copy offload, will do it in very
> reasonable amounts of time.
The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
one operation:
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
Perhaps we should ask for separate operations for the two cases. (Or at
least a "please don't bother if this is going to take 8 hours" flag....)
--b.
>
> Let me see if I can dig up some of the presentations from the NetApp
> guys who presented overviews or the specifications from the IETF and
> T10....
>
> Ric
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-14 22:27 ` J. Bruce Fields
@ 2011-12-15 14:59 ` Trond Myklebust
2011-12-15 15:52 ` Chris Mason
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Trond Myklebust @ 2011-12-15 14:59 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Ric Wheeler, Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker,
James Bottomley
On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > On 12/14/2011 02:27 PM, Al Viro wrote:
> > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > >
> > >>We had an active thread a couple of years back that came out of the
> > >>reflink work and, at the time, there seemed to be moderately
> > >>positive support for adding a new system call that would fit this
> > >>use case (Joel Becker's copyfile()).
> > >>
> > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > >>or should we look at other hooks?
> > >copyfile(2) is probably a good way to go, provided that we do _not_
> > >go baroque as it had happened the last time syscall had been discussed.
> > >
> > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > >If it works - fine, if not - caller has to be ready to deal with handling
> > >cross-device case anyway.
> >
> > I think that this approach makes a lot of sense. Most of the
> > devices/targets that support the copy offload, will do it in very
> > reasonable amounts of time.
>
> The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> one operation:
>
> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
>
> Perhaps we should ask for separate operations for the two cases. (Or at
> least a "please don't bother if this is going to take 8 hours" flag....)
How would the server know? I suggest we deal with this by adding an
ioctl() to allow the application to poll for progress: I'm assuming now
that we don't expect more than 1 copyfile() system call at a time per
file descriptor...
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-15 14:59 ` Trond Myklebust
@ 2011-12-15 15:52 ` Chris Mason
2011-12-15 16:00 ` Trond Myklebust
2011-12-15 16:03 ` Jeff Layton
2011-12-15 16:08 ` Loke, Chetan
[not found] ` <1323961140.14317.2.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
2 siblings, 2 replies; 29+ messages in thread
From: Chris Mason @ 2011-12-15 15:52 UTC (permalink / raw)
To: Trond Myklebust
Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi@vger.kernel.org,
linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs,
Joel Becker, James Bottomley
On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > >
> > > >>We had an active thread a couple of years back that came out of the
> > > >>reflink work and, at the time, there seemed to be moderately
> > > >>positive support for adding a new system call that would fit this
> > > >>use case (Joel Becker's copyfile()).
> > > >>
> > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > >>or should we look at other hooks?
> > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > >go baroque as it had happened the last time syscall had been discussed.
> > > >
> > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > >cross-device case anyway.
> > >
> > > I think that this approach makes a lot of sense. Most of the
> > > devices/targets that support the copy offload, will do it in very
> > > reasonable amounts of time.
> >
> > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > one operation:
> >
> > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> >
> > Perhaps we should ask for separate operations for the two cases. (Or at
> > least a "please don't bother if this is going to take 8 hours" flag....)
>
> How would the server know? I suggest we deal with this by adding an
> ioctl() to allow the application to poll for progress: I'm assuming now
> that we don't expect more than 1 copyfile() system call at a time per
> file descriptor...
If we're using this to copy VM image files, I could easily imagine
wanting to clone multiple copies of the VM in parallel.
-chris
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-15 15:52 ` Chris Mason
@ 2011-12-15 16:00 ` Trond Myklebust
2011-12-15 16:03 ` Jeff Layton
1 sibling, 0 replies; 29+ messages in thread
From: Trond Myklebust @ 2011-12-15 16:00 UTC (permalink / raw)
To: Chris Mason
Cc: J. Bruce Fields, Ric Wheeler, Al Viro,
linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
Joel Becker, James Bottomley
On Thu, 2011-12-15 at 10:52 -0500, Chris Mason wrote:
> On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > > >
> > > > >>We had an active thread a couple of years back that came out of the
> > > > >>reflink work and, at the time, there seemed to be moderately
> > > > >>positive support for adding a new system call that would fit this
> > > > >>use case (Joel Becker's copyfile()).
> > > > >>
> > > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > > >>or should we look at other hooks?
> > > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > > >go baroque as it had happened the last time syscall had been discussed.
> > > > >
> > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > > >cross-device case anyway.
> > > >
> > > > I think that this approach makes a lot of sense. Most of the
> > > > devices/targets that support the copy offload, will do it in very
> > > > reasonable amounts of time.
> > >
> > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > > one operation:
> > >
> > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> > >
> > > Perhaps we should ask for separate operations for the two cases. (Or at
> > > least a "please don't bother if this is going to take 8 hours" flag....)
> >
> > How would the server know? I suggest we deal with this by adding an
> > ioctl() to allow the application to poll for progress: I'm assuming now
> > that we don't expect more than 1 copyfile() system call at a time per
> > file descriptor...
>
> If we're using this to copy VM image files, I could easily imagine
> wanting to clone multiple copies of the VM in parallel.
Sure, but in that case, your target file descriptors will differ, right?
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-15 15:52 ` Chris Mason
2011-12-15 16:00 ` Trond Myklebust
@ 2011-12-15 16:03 ` Jeff Layton
[not found] ` <20111215110330.33aed3a6-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>
1 sibling, 1 reply; 29+ messages in thread
From: Jeff Layton @ 2011-12-15 16:03 UTC (permalink / raw)
To: Chris Mason
Cc: Trond Myklebust, J. Bruce Fields, Ric Wheeler, Al Viro,
linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
Joel Becker, James Bottomley
On Thu, 15 Dec 2011 10:52:13 -0500
Chris Mason <chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > > >
> > > > >>We had an active thread a couple of years back that came out of the
> > > > >>reflink work and, at the time, there seemed to be moderately
> > > > >>positive support for adding a new system call that would fit this
> > > > >>use case (Joel Becker's copyfile()).
> > > > >>
> > > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > > >>or should we look at other hooks?
> > > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > > >go baroque as it had happened the last time syscall had been discussed.
> > > > >
> > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > > >cross-device case anyway.
> > > >
> > > > I think that this approach makes a lot of sense. Most of the
> > > > devices/targets that support the copy offload, will do it in very
> > > > reasonable amounts of time.
> > >
> > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > > one operation:
> > >
> > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> > >
> > > Perhaps we should ask for separate operations for the two cases. (Or at
> > > least a "please don't bother if this is going to take 8 hours" flag....)
> >
> > How would the server know? I suggest we deal with this by adding an
> > ioctl() to allow the application to poll for progress: I'm assuming now
> > that we don't expect more than 1 copyfile() system call at a time per
> > file descriptor...
>
> If we're using this to copy VM image files, I could easily imagine
> wanting to clone multiple copies of the VM in parallel.
>
> -chris
>
Not really a problem is it? Just dup() the fd before you issue the
copyfile()? Or even simpler, just do periodic stat() on the destination
file if you want a progress report.
Regardless, I like the simple approach that Al is suggesting here.
--
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
[not found] ` <20111215110330.33aed3a6-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>
@ 2011-12-15 16:06 ` Trond Myklebust
[not found] ` <1323965176.14317.11.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2011-12-15 16:06 UTC (permalink / raw)
To: Jeff Layton
Cc: Chris Mason, J. Bruce Fields, Ric Wheeler, Al Viro,
linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
Joel Becker, James Bottomley
On Thu, 2011-12-15 at 11:03 -0500, Jeff Layton wrote:
> On Thu, 15 Dec 2011 10:52:13 -0500
> Chris Mason <chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>
> > On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> > > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > > > >
> > > > > >>We had an active thread a couple of years back that came out of the
> > > > > >>reflink work and, at the time, there seemed to be moderately
> > > > > >>positive support for adding a new system call that would fit this
> > > > > >>use case (Joel Becker's copyfile()).
> > > > > >>
> > > > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > > > >>or should we look at other hooks?
> > > > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > > > >go baroque as it had happened the last time syscall had been discussed.
> > > > > >
> > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > > > >cross-device case anyway.
> > > > >
> > > > > I think that this approach makes a lot of sense. Most of the
> > > > > devices/targets that support the copy offload, will do it in very
> > > > > reasonable amounts of time.
> > > >
> > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > > > one operation:
> > > >
> > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> > > >
> > > > Perhaps we should ask for separate operations for the two cases. (Or at
> > > > least a "please don't bother if this is going to take 8 hours" flag....)
> > >
> > > How would the server know? I suggest we deal with this by adding an
> > > ioctl() to allow the application to poll for progress: I'm assuming now
> > > that we don't expect more than 1 copyfile() system call at a time per
> > > file descriptor...
> >
> > If we're using this to copy VM image files, I could easily imagine
> > wanting to clone multiple copies of the VM in parallel.
> >
> > -chris
> >
>
> Not really a problem is it? Just dup() the fd before you issue the
> copyfile()? Or even simpler, just do periodic stat() on the destination
> file if you want a progress report.
>
> Regardless, I like the simple approach that Al is suggesting here.
Periodic stat() isn't good enough if you are copying subranges of a
file. Part of the application here (as I understood it) is to initialise
specific disk volumes on existing VM images when doing thin
provisioning. In that case, the reported image size won't ever change...
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed?
2011-12-15 14:59 ` Trond Myklebust
2011-12-15 15:52 ` Chris Mason
@ 2011-12-15 16:08 ` Loke, Chetan
[not found] ` <D3F292ADF945FB49B35E96C94C2061B91516E391-2s2rCY1e8UXHBhWB4kaBDUEOCMrvLtNR@public.gmane.org>
[not found] ` <1323961140.14317.2.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
2 siblings, 1 reply; 29+ messages in thread
From: Loke, Chetan @ 2011-12-15 16:08 UTC (permalink / raw)
To: Trond Myklebust, J. Bruce Fields
Cc: Ric Wheeler, Al Viro, linux-scsi, linux-fsdevel, Hannes Reinecke,
Andrew Morton, linux-nfs, Joel Becker, James Bottomley
> How would the server know? I suggest we deal with this by adding an
> ioctl() to allow the application to poll for progress: I'm assuming now
Why not support something like the async-iocb?
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed?
[not found] ` <D3F292ADF945FB49B35E96C94C2061B91516E391-2s2rCY1e8UXHBhWB4kaBDUEOCMrvLtNR@public.gmane.org>
@ 2011-12-15 16:11 ` Trond Myklebust
2011-12-15 16:40 ` Loke, Chetan
0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2011-12-15 16:11 UTC (permalink / raw)
To: Loke, Chetan
Cc: J. Bruce Fields, Ric Wheeler, Al Viro,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel, Hannes Reinecke,
Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker,
James Bottomley
On Thu, 2011-12-15 at 11:08 -0500, Loke, Chetan wrote:
> > How would the server know? I suggest we deal with this by adding an
> > ioctl() to allow the application to poll for progress: I'm assuming now
>
> Why not support something like the async-iocb?
You could, but that would tie copyfile() to the aio interface which was
one of the things that I believe Al was opposed to when we discussed
this at LSF/MM-2010.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
[not found] ` <1323965176.14317.11.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
@ 2011-12-15 16:16 ` Jeff Layton
2011-12-15 16:38 ` Trond Myklebust
0 siblings, 1 reply; 29+ messages in thread
From: Jeff Layton @ 2011-12-15 16:16 UTC (permalink / raw)
To: Trond Myklebust
Cc: Chris Mason, J. Bruce Fields, Ric Wheeler, Al Viro,
linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
Joel Becker, James Bottomley
On Thu, 15 Dec 2011 11:06:16 -0500
Trond Myklebust <Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org> wrote:
> On Thu, 2011-12-15 at 11:03 -0500, Jeff Layton wrote:
> > On Thu, 15 Dec 2011 10:52:13 -0500
> > Chris Mason <chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> >
> > > On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> > > > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > > > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > > > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > > > > >
> > > > > > >>We had an active thread a couple of years back that came out of the
> > > > > > >>reflink work and, at the time, there seemed to be moderately
> > > > > > >>positive support for adding a new system call that would fit this
> > > > > > >>use case (Joel Becker's copyfile()).
> > > > > > >>
> > > > > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > > > > >>or should we look at other hooks?
> > > > > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > > > > >go baroque as it had happened the last time syscall had been discussed.
> > > > > > >
> > > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > > > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > > > > >cross-device case anyway.
> > > > > >
> > > > > > I think that this approach makes a lot of sense. Most of the
> > > > > > devices/targets that support the copy offload, will do it in very
> > > > > > reasonable amounts of time.
> > > > >
> > > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > > > > one operation:
> > > > >
> > > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> > > > >
> > > > > Perhaps we should ask for separate operations for the two cases. (Or at
> > > > > least a "please don't bother if this is going to take 8 hours" flag....)
> > > >
> > > > How would the server know? I suggest we deal with this by adding an
> > > > ioctl() to allow the application to poll for progress: I'm assuming now
> > > > that we don't expect more than 1 copyfile() system call at a time per
> > > > file descriptor...
> > >
> > > If we're using this to copy VM image files, I could easily imagine
> > > wanting to clone multiple copies of the VM in parallel.
> > >
> > > -chris
> > >
> >
> > Not really a problem is it? Just dup() the fd before you issue the
> > copyfile()? Or even simpler, just do periodic stat() on the destination
> > file if you want a progress report.
> >
> > Regardless, I like the simple approach that Al is suggesting here.
>
> Periodic stat() isn't good enough if you are copying subranges of a
> file. Part of the application here (as I understood it) is to initialise
> specific disk volumes on existing VM images when doing thin
> provisioning. In that case, the reported image size won't ever change...
>
If they were sparse files then st_blocks would presumably change, but
that's not necessarily going to be the case. So, ok stat() is out for
this...
What's the use-case for these sorts of progress reports anyway?
Progress meters in GUI apps?
Either way, I think adding as simple an interface as possible to begin
with makes sense. If you want to add progress reports or other
doohickeys later, then that can be done in a separate set of patches...
--
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-15 16:16 ` Jeff Layton
@ 2011-12-15 16:38 ` Trond Myklebust
0 siblings, 0 replies; 29+ messages in thread
From: Trond Myklebust @ 2011-12-15 16:38 UTC (permalink / raw)
To: Jeff Layton
Cc: Chris Mason, J. Bruce Fields, Ric Wheeler, Al Viro,
linux-scsi@vger.kernel.org, linux-fsdevel, Hannes Reinecke,
Andrew Morton, linux-nfs, Joel Becker, James Bottomley
On Thu, 2011-12-15 at 11:16 -0500, Jeff Layton wrote:
> On Thu, 15 Dec 2011 11:06:16 -0500
> Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
>
> > On Thu, 2011-12-15 at 11:03 -0500, Jeff Layton wrote:
> > > On Thu, 15 Dec 2011 10:52:13 -0500
> > > Chris Mason <chris.mason@oracle.com> wrote:
> > >
> > > > On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> > > > > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > > > > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > > > > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > > > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > > > > > >
> > > > > > > >>We had an active thread a couple of years back that came out of the
> > > > > > > >>reflink work and, at the time, there seemed to be moderately
> > > > > > > >>positive support for adding a new system call that would fit this
> > > > > > > >>use case (Joel Becker's copyfile()).
> > > > > > > >>
> > > > > > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > > > > > >>or should we look at other hooks?
> > > > > > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > > > > > >go baroque as it had happened the last time syscall had been discussed.
> > > > > > > >
> > > > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > > > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > > > > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > > > > > >cross-device case anyway.
> > > > > > >
> > > > > > > I think that this approach makes a lot of sense. Most of the
> > > > > > > devices/targets that support the copy offload, will do it in very
> > > > > > > reasonable amounts of time.
> > > > > >
> > > > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > > > > > one operation:
> > > > > >
> > > > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> > > > > >
> > > > > > Perhaps we should ask for separate operations for the two cases. (Or at
> > > > > > least a "please don't bother if this is going to take 8 hours" flag....)
> > > > >
> > > > > How would the server know? I suggest we deal with this by adding an
> > > > > ioctl() to allow the application to poll for progress: I'm assuming now
> > > > > that we don't expect more than 1 copyfile() system call at a time per
> > > > > file descriptor...
> > > >
> > > > If we're using this to copy VM image files, I could easily imagine
> > > > wanting to clone multiple copies of the VM in parallel.
> > > >
> > > > -chris
> > > >
> > >
> > > Not really a problem is it? Just dup() the fd before you issue the
> > > copyfile()? Or even simpler, just do periodic stat() on the destination
> > > file if you want a progress report.
> > >
> > > Regardless, I like the simple approach that Al is suggesting here.
> >
> > Periodic stat() isn't good enough if you are copying subranges of a
> > file. Part of the application here (as I understood it) is to initialise
> > specific disk volumes on existing VM images when doing thin
> > provisioning. In that case, the reported image size won't ever change...
> >
>
> If they were sparse files then st_blocks would presumably change, but
> that's not necessarily going to be the case. So, ok stat() is out for
> this...
>
> What's the use-case for these sorts of progress reports anyway?
> Progress meters in GUI apps?
Mainly... If you are copying several GB worth of data, you expect it to
take some time, but you'd like to know that the server hasn't just
crashed or something...
> Either way, I think adding as simple an interface as possible to begin
> with makes sense. If you want to add progress reports or other
> doohickeys later, then that can be done in a separate set of patches...
Agreed. ...and doing it as an ioctl allows for that. I just want to make
sure someone else here doesn't have a use case that might blow that idea
out of the water...
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed?
2011-12-15 16:11 ` Trond Myklebust
@ 2011-12-15 16:40 ` Loke, Chetan
2011-12-15 16:53 ` Trond Myklebust
0 siblings, 1 reply; 29+ messages in thread
From: Loke, Chetan @ 2011-12-15 16:40 UTC (permalink / raw)
To: Trond Myklebust
Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker,
James Bottomley
> >
> > Why not support something like the async-iocb?
>
> You could, but that would tie copyfile() to the aio interface which was
> one of the things that I believe Al was opposed to when we discussed
> this at LSF/MM-2010.
>
virtualization vendors who support this offload do it at a layer above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So I think 'copyfile' is going to be appealing to application-developers more than the hypervisor-vendors.
So let's think about it from end-users perspective:
Won't everyone replicate code to check - 'Am I done'? It will just make application folks write more (ugly)code. Because you would then have to maintain another queue/etc to check for this operation.
We can just support full-copy. Partial copies can be returned as failure.
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed?
2011-12-15 16:40 ` Loke, Chetan
@ 2011-12-15 16:53 ` Trond Myklebust
[not found] ` <1323968015.14317.28.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2011-12-15 16:53 UTC (permalink / raw)
To: Loke, Chetan
Cc: J. Bruce Fields, Ric Wheeler, Al Viro, linux-scsi, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker,
James Bottomley
On Thu, 2011-12-15 at 11:40 -0500, Loke, Chetan wrote:
> > >
> > > Why not support something like the async-iocb?
> >
> > You could, but that would tie copyfile() to the aio interface which was
> > one of the things that I believe Al was opposed to when we discussed
> > this at LSF/MM-2010.
> >
>
> virtualization vendors who support this offload do it at a layer above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So I think 'copyfile' is going to be appealing to application-developers more than the hypervisor-vendors.
The application is thin provisioning, not the 'cp' command. When
virtualisation vendors do support this, it will mainly be as part of
their image management toolkits, not the hypervisor.
> So let's think about it from end-users perspective:
> Won't everyone replicate code to check - 'Am I done'? It will just make application folks write more (ugly)code. Because you would then have to maintain another queue/etc to check for this operation.
'Am I done' is easy: copyfile() returns with the number of bytes that
have been copied.
'Is my copyfile() syscall making progress' is the question that needs
answering.
> We can just support full-copy. Partial copies can be returned as failure.
Then you have to check the entire range on error instead of just
resuming the copy from where it stopped.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
[not found] ` <1323968015.14317.28.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
@ 2011-12-15 17:18 ` Ric Wheeler
2011-12-15 17:25 ` Trond Myklebust
2011-12-15 17:31 ` Loke, Chetan
2011-12-15 17:27 ` Loke, Chetan
1 sibling, 2 replies; 29+ messages in thread
From: Ric Wheeler @ 2011-12-15 17:18 UTC (permalink / raw)
To: Trond Myklebust
Cc: Loke, Chetan, J. Bruce Fields, Al Viro,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel, Hannes Reinecke,
Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker,
James Bottomley
On 12/15/2011 11:53 AM, Trond Myklebust wrote:
> On Thu, 2011-12-15 at 11:40 -0500, Loke, Chetan wrote:
>>>> Why not support something like the async-iocb?
>>> You could, but that would tie copyfile() to the aio interface which was
>>> one of the things that I believe Al was opposed to when we discussed
>>> this at LSF/MM-2010.
>>>
>> virtualization vendors who support this offload do it at a layer above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So I think 'copyfile' is going to be appealing to application-developers more than the hypervisor-vendors.
> The application is thin provisioning, not the 'cp' command. When
> virtualisation vendors do support this, it will mainly be as part of
> their image management toolkits, not the hypervisor.
I think that hypervisor vendors will be very interested in this feature which
would explain why vmware was active in drafting both the NFS and T10 specs. Not
to mention those of us who use KVM or XEN :)
As Trond mentions, we might have this in the management tool chain or other
places in the stack.
>
>> So let's think about it from end-users perspective:
>> Won't everyone replicate code to check - 'Am I done'? It will just make application folks write more (ugly)code. Because you would then have to maintain another queue/etc to check for this operation.
> 'Am I done' is easy: copyfile() returns with the number of bytes that
> have been copied.
>
> 'Is my copyfile() syscall making progress' is the question that needs
> answering.
>
>> We can just support full-copy. Partial copies can be returned as failure.
> Then you have to check the entire range on error instead of just
> resuming the copy from where it stopped.
>
I also like simple first. I am not too certain about the need for polling
(especially given how little we have done historically to take advantage of the
notifications, water marks, etc in things like thin provisioning :)).
On the other hand, I also don't object to having the ability to poll (through
the ioctl or whatever) if others find that useful.
What I would like to see is a way to make sure that we can interrupt any long
running command & also make sure that our timeouts (for SCSI specifically) are
not too aggressive.
Ric
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-15 17:18 ` Ric Wheeler
@ 2011-12-15 17:25 ` Trond Myklebust
2011-12-15 17:31 ` Loke, Chetan
1 sibling, 0 replies; 29+ messages in thread
From: Trond Myklebust @ 2011-12-15 17:25 UTC (permalink / raw)
To: Ric Wheeler
Cc: Loke, Chetan, J. Bruce Fields, Al Viro, linux-scsi, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker,
James Bottomley
On Thu, 2011-12-15 at 12:18 -0500, Ric Wheeler wrote:
> What I would like to see is a way to make sure that we can interrupt any long
> running command & also make sure that our timeouts (for SCSI specifically) are
> not too aggressive.
The draft NFSv4.2 protocol contains features to make interruption
possible, so as far as the NFS client is concerned, that should be
doable. I can't answer for CIFS or SCSI...
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed?
[not found] ` <1323968015.14317.28.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
2011-12-15 17:18 ` Ric Wheeler
@ 2011-12-15 17:27 ` Loke, Chetan
1 sibling, 0 replies; 29+ messages in thread
From: Loke, Chetan @ 2011-12-15 17:27 UTC (permalink / raw)
To: Trond Myklebust
Cc: J. Bruce Fields, Ric Wheeler, Al Viro,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel, Hannes Reinecke,
Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker,
James Bottomley
> > virtualization vendors who support this offload do it at a layer
> above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So
> I think 'copyfile' is going to be appealing to application-developers
> more than the hypervisor-vendors.
>
> The application is thin provisioning, not the 'cp' command. When
thin-provisioning is one use-case. There are quite a few use-cases of 'copyfile' depending on your business-logic and the type of appliance you sell.
> virtualisation vendors do support this, it will mainly be as part of
> their image management toolkits, not the hypervisor.
>
Toolkits? May not be true. The toolkit might need to talk to some hypervisor-component to ensure LUN-locking etc on the target. So this is not entirely isolated as you might think. There is some integration. As an example(just to prove the point) - Have you ever seen anyone not use vsphere-client on VMware for copying VM templates?
> > So let's think about it from end-users perspective:
> > Won't everyone replicate code to check - 'Am I done'? It will just
> make application folks write more (ugly)code. Because you would then
> have to maintain another queue/etc to check for this operation.
>
> 'Am I done' is easy: copyfile() returns with the number of bytes that
> have been copied.
>
> 'Is my copyfile() syscall making progress' is the question that needs
> answering.
>
Understood. But as a user, we don't know what 'am I done' is going to report.
'am I done' can return:
1)ACK[copy done] - simplistic case.
2)IN-progress.
3)NACK[copy failed(with status values) or copy partially completed]
And if you are using the copy-VM use-case then very few VMs are under 4GBs. So we will hit 2) above more frequently than 1) and 3).
> > We can just support full-copy. Partial copies can be returned as
> failure.
>
> Then you have to check the entire range on error instead of just
> resuming the copy from where it stopped.
>
Why not restart? What if the LUN was implementing thin-provisioning and now it ran out-of-space after partially copying your data.
So why not restart the copy? If the target doesn't support auto-extend, someone(storage-admin etc) would have to step-in and manage that LUN.
You might as-well restart the copy in this case.
Chetan Loke
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: copy offload support in Linux - new system call needed?
2011-12-15 17:18 ` Ric Wheeler
2011-12-15 17:25 ` Trond Myklebust
@ 2011-12-15 17:31 ` Loke, Chetan
2011-12-15 17:55 ` Ric Wheeler
1 sibling, 1 reply; 29+ messages in thread
From: Loke, Chetan @ 2011-12-15 17:31 UTC (permalink / raw)
To: Ric Wheeler, Trond Myklebust
Cc: J. Bruce Fields, Al Viro, linux-scsi, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs, Joel Becker,
James Bottomley
>
> I think that hypervisor vendors will be very interested in this feature
> which
> would explain why vmware was active in drafting both the NFS and T10
Specs are the only way to convince storage-target-vendors ;). Otherwise target-stack will need to implement multiple custom-CDB-handlers for different front-end APIs(which is ugly).
Chetan
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
[not found] ` <1323961140.14317.2.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
@ 2011-12-15 17:44 ` J. Bruce Fields
0 siblings, 0 replies; 29+ messages in thread
From: J. Bruce Fields @ 2011-12-15 17:44 UTC (permalink / raw)
To: Trond Myklebust
Cc: Ric Wheeler, Al Viro,
linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
Joel Becker, James Bottomley
On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > >
> > > >>We had an active thread a couple of years back that came out of the
> > > >>reflink work and, at the time, there seemed to be moderately
> > > >>positive support for adding a new system call that would fit this
> > > >>use case (Joel Becker's copyfile()).
> > > >>
> > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > >>or should we look at other hooks?
> > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > >go baroque as it had happened the last time syscall had been discussed.
> > > >
> > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > >cross-device case anyway.
> > >
> > > I think that this approach makes a lot of sense. Most of the
> > > devices/targets that support the copy offload, will do it in very
> > > reasonable amounts of time.
> >
> > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > one operation:
> >
> > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> >
> > Perhaps we should ask for separate operations for the two cases. (Or at
> > least a "please don't bother if this is going to take 8 hours" flag....)
>
> How would the server know?
Sorry, "8 hours" was a joke--no, you can't require the server to predict
whether an operation will take more or less than some precise duration.
I'm assuming the "fast" case that Al's proposing we do as a first step
cover CoW operations? (So O(1) or close to it, users typically won't be
asking for progress reports, operation may be atomic (with no
partial-failure case), ?)
--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-15 17:31 ` Loke, Chetan
@ 2011-12-15 17:55 ` Ric Wheeler
0 siblings, 0 replies; 29+ messages in thread
From: Ric Wheeler @ 2011-12-15 17:55 UTC (permalink / raw)
To: Loke, Chetan
Cc: Trond Myklebust, J. Bruce Fields, Al Viro, linux-scsi,
linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs,
Joel Becker, James Bottomley
On 12/15/2011 12:31 PM, Loke, Chetan wrote:
>> I think that hypervisor vendors will be very interested in this feature
>> which
>> would explain why vmware was active in drafting both the NFS and T10
> Specs are the only way to convince storage-target-vendors ;). Otherwise target-stack will need to implement multiple custom-CDB-handlers for different front-end APIs(which is ugly).
>
>
> Chetan
Hi Chetan,
I should post from my "Red Hat" email to make this less confusing for you - I
know that this is in fact interesting to vendors :)
Thanks!
Ric
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-14 19:42 ` Ric Wheeler
[not found] ` <4EE8FC2E.3010207-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2011-12-16 8:00 ` Joel Becker
1 sibling, 0 replies; 29+ messages in thread
From: Joel Becker @ 2011-12-16 8:00 UTC (permalink / raw)
To: Ric Wheeler
Cc: Al Viro, linux-scsi@vger.kernel.org, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs, James Bottomley
On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> On 12/14/2011 02:27 PM, Al Viro wrote:
> >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> >
> >>We had an active thread a couple of years back that came out of the
> >>reflink work and, at the time, there seemed to be moderately
> >>positive support for adding a new system call that would fit this
> >>use case (Joel Becker's copyfile()).
> >>
> >>Can we resurrect this effort? Is copyfile() still a good way to go,
> >>or should we look at other hooks?
> >copyfile(2) is probably a good way to go, provided that we do _not_
> >go baroque as it had happened the last time syscall had been discussed.
> >
> >IOW, to hell with progress reports, etc. - just a fastpath kind of
> >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> >If it works - fine, if not - caller has to be ready to deal with handling
> >cross-device case anyway.
>
> I think that this approach makes a lot of sense. Most of the
> devices/targets that support the copy offload, will do it in very
> reasonable amounts of time.
>
> Let me see if I can dig up some of the presentations from the NetApp
> guys who presented overviews or the specifications from the IETF and
> T10....
Whee! I've been down the rabbit hole, but I've promised myself
to get the updated patch out soon. I know that Trond et al are probably
wondering what happened to the patch. more soon.
Joel
--
Life's Little Instruction Book #207
"Swing for the fence."
http://www.jlbec.org/
jlbec@evilplan.org
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-14 20:30 ` Ric Wheeler
@ 2011-12-19 12:38 ` Hannes Reinecke
0 siblings, 0 replies; 29+ messages in thread
From: Hannes Reinecke @ 2011-12-19 12:38 UTC (permalink / raw)
To: Ric Wheeler
Cc: Jeremy Allison, linux-scsi@vger.kernel.org, linux-fsdevel,
Andrew Morton, linux-nfs, Joel Becker, James Bottomley
On 12/14/2011 09:30 PM, Ric Wheeler wrote:
> On 12/14/2011 02:59 PM, Jeremy Allison wrote:
>> On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
>>> Back at LinuxCon Prague, we talked about the new NFS and SCSI
>>> commands that let us offload copy operations to a storage device
>>> (like an NFS server or storage array).
>>>
>>> This got new life in the virtual machine world where you might want
>>> to clone bulky guest files or ranges of blocks and was driven
>>> through the standards bodies by vmware, microsoft and some of the
>>> major storage vendors. Windows8 has this functionality fully coded
>>> and integrated in the GUI, I assume vmware also uses it and there
>>> are some vendors who announced support at the SNIA SDC conference.
>>>
>>> We had an active thread a couple of years back that came out of the
>>> reflink work and, at the time, there seemed to be moderately
>>> positive support for adding a new system call that would fit this
>>> use case (Joel Becker's copyfile()).
>>>
>>> Can we resurrect this effort? Is copyfile() still a good way to go,
>>> or should we look at other hooks?
>> Windows uses a COPYCHUNK call, which specifies the
>> following parameters:
>>
>> Definition of a copy "chunk":
>>
>> hyper source_off;
>> hyper target_off;
>> uint32 length;
>>
>> and an array of these chunks which is passed
>> into their kernel.
>>
>> This is what we have to implement in Samba.
>>
>> Jeremy.
>
> This is a public pointer to the draft NFS proposal:
>
> http://tools.ietf.org/id/draft-lentini-nfsv4-server-side-copy-06.txt
>
> The T10 site has some click through that I was not too happy about
> agreeing to. NetApp (Fred Knight) had some nice presentations that
> he presented about how SCSI does this in two different ways...
>
Yes, the 'XCOPY Lite' mechanism.
With that the whole copy process is broken into two steps:
- Create a reference to the requested blocks
- Use that reference to request the operation
The neat thing with that is that there might be some delay between
those steps, effectively creating a snapshot in time.
An additional bonus is that one doesn't have to create those
over-complicated source and target descriptors, but rather have the
array create one for you.
So all-in-all nice and easy to use. With the slight disadvantage
that no-one implements it. Yet.
Hence we might be wanting to use the old-style EXTENDED COPY after
all ...
However, both approaches have in common that an opaque 'identifier'
is used to identify any currently running copy process.
So when designing this interface we should keep in mind that we
would need to store this identifier somewhere. As as loath as I'm to
admit it, the async-I/O mechanism would fit the bill far better than
a single copyfile() call ...
Which could be easily implemented on top of the Async I/O call, btw.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-14 19:59 ` Jeremy Allison
2011-12-14 20:30 ` Ric Wheeler
@ 2011-12-19 22:19 ` H. Peter Anvin
2011-12-19 22:34 ` Jeremy Allison
2011-12-19 22:57 ` Dave Chinner
1 sibling, 2 replies; 29+ messages in thread
From: H. Peter Anvin @ 2011-12-19 22:19 UTC (permalink / raw)
To: Jeremy Allison
Cc: Ric Wheeler, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-fsdevel, Hannes Reinecke, Andrew Morton,
linux-nfs-u79uwXL29TY76Z2rM5mHXA, Joel Becker, James Bottomley
On 12/14/2011 11:59 AM, Jeremy Allison wrote:
>>
>> Can we resurrect this effort? Is copyfile() still a good way to go,
>> or should we look at other hooks?
>
> Windows uses a COPYCHUNK call, which specifies the
> following parameters:
>
> Definition of a copy "chunk":
>
> hyper source_off;
> hyper target_off;
> uint32 length;
>
> and an array of these chunks which is passed
> into their kernel.
>
> This is what we have to implement in Samba.
>
Could we do this by (re-)allowing sendfile() between two files?
-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-19 22:19 ` H. Peter Anvin
@ 2011-12-19 22:34 ` Jeremy Allison
2011-12-19 22:57 ` Dave Chinner
1 sibling, 0 replies; 29+ messages in thread
From: Jeremy Allison @ 2011-12-19 22:34 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Jeremy Allison, Ric Wheeler, linux-scsi@vger.kernel.org,
linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs,
Joel Becker, James Bottomley
On Mon, Dec 19, 2011 at 02:19:43PM -0800, H. Peter Anvin wrote:
> On 12/14/2011 11:59 AM, Jeremy Allison wrote:
> >>
> >> Can we resurrect this effort? Is copyfile() still a good way to go,
> >> or should we look at other hooks?
> >
> > Windows uses a COPYCHUNK call, which specifies the
> > following parameters:
> >
> > Definition of a copy "chunk":
> >
> > hyper source_off;
> > hyper target_off;
> > uint32 length;
> >
> > and an array of these chunks which is passed
> > into their kernel.
> >
> > This is what we have to implement in Samba.
> >
>
> Could we do this by (re-)allowing sendfile() between two files?
Oooh - nice idea ! Yes, having a completely symmetric sendfile
which allows socket -> file, file -> socket, socket -> socket,
file -> file would be a great idea (IMHO).
Jeremy.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-19 22:19 ` H. Peter Anvin
2011-12-19 22:34 ` Jeremy Allison
@ 2011-12-19 22:57 ` Dave Chinner
2011-12-19 23:29 ` H. Peter Anvin
1 sibling, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2011-12-19 22:57 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Jeremy Allison, Ric Wheeler, linux-scsi@vger.kernel.org,
linux-fsdevel, Hannes Reinecke, Andrew Morton, linux-nfs,
Joel Becker, James Bottomley
On Mon, Dec 19, 2011 at 02:19:43PM -0800, H. Peter Anvin wrote:
> On 12/14/2011 11:59 AM, Jeremy Allison wrote:
> >>
> >> Can we resurrect this effort? Is copyfile() still a good way to go,
> >> or should we look at other hooks?
> >
> > Windows uses a COPYCHUNK call, which specifies the
> > following parameters:
> >
> > Definition of a copy "chunk":
> >
> > hyper source_off;
> > hyper target_off;
> > uint32 length;
> >
> > and an array of these chunks which is passed
> > into their kernel.
> >
> > This is what we have to implement in Samba.
> >
>
> Could we do this by (re-)allowing sendfile() between two files?
That was my immediate thought, but sendfile has plumbing that is
page cache based and we require completely different infrastructure
and semantics for an array offload.
e.g. for an array offload, we have to flush the source file page
cache first so that the data being copied is known to be on disk,
then invalidate the destination page cache if overwriting or extend
and pre-allocate blocks if not. Then we have to map both files and
hand that off to the array.
Then there's a whole bunch of tricky questions about what the state
of the destination file should look like while the copy is in
progress, whether the source file should be allowed to change (e.g.
it can't be truncated and have blocks freed and then reused by other
files half way through the copy offload operation), and so on.
sendfile() has well known, fixed semantics that we can't change to
suit what is needed for an offload operation that could potentially
take hours to complete. Hence I think an new syscall is the way to
go....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: copy offload support in Linux - new system call needed?
2011-12-19 22:57 ` Dave Chinner
@ 2011-12-19 23:29 ` H. Peter Anvin
0 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2011-12-19 23:29 UTC (permalink / raw)
To: Dave Chinner
Cc: Jeremy Allison, Ric Wheeler,
linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel,
Hannes Reinecke, Andrew Morton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
Joel Becker, James Bottomley
On 12/19/2011 02:57 PM, Dave Chinner wrote:
>
> That was my immediate thought, but sendfile has plumbing that is
> page cache based and we require completely different infrastructure
> and semantics for an array offload.
>
The plumbing is internal to the kernel and doesn't mean we have to use
the same VFS methods.
> e.g. for an array offload, we have to flush the source file page
> cache first so that the data being copied is known to be on disk,
> then invalidate the destination page cache if overwriting or extend
> and pre-allocate blocks if not. Then we have to map both files and
> hand that off to the array.
>
> Then there's a whole bunch of tricky questions about what the state
> of the destination file should look like while the copy is in
> progress, whether the source file should be allowed to change (e.g.
> it can't be truncated and have blocks freed and then reused by other
> files half way through the copy offload operation), and so on.
>
> sendfile() has well known, fixed semantics that we can't change to
> suit what is needed for an offload operation that could potentially
> take hours to complete. Hence I think an new syscall is the way to
> go....
Perhaps what we need first in an explicit enumeration of the semantics
you're looking for.
-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2011-12-19 23:29 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-14 19:22 copy offload support in Linux - new system call needed? Ric Wheeler
2011-12-14 19:27 ` Al Viro
[not found] ` <20111214192739.GN2203-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2011-12-14 19:42 ` Ric Wheeler
[not found] ` <4EE8FC2E.3010207-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2011-12-14 22:27 ` J. Bruce Fields
2011-12-15 14:59 ` Trond Myklebust
2011-12-15 15:52 ` Chris Mason
2011-12-15 16:00 ` Trond Myklebust
2011-12-15 16:03 ` Jeff Layton
[not found] ` <20111215110330.33aed3a6-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>
2011-12-15 16:06 ` Trond Myklebust
[not found] ` <1323965176.14317.11.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
2011-12-15 16:16 ` Jeff Layton
2011-12-15 16:38 ` Trond Myklebust
2011-12-15 16:08 ` Loke, Chetan
[not found] ` <D3F292ADF945FB49B35E96C94C2061B91516E391-2s2rCY1e8UXHBhWB4kaBDUEOCMrvLtNR@public.gmane.org>
2011-12-15 16:11 ` Trond Myklebust
2011-12-15 16:40 ` Loke, Chetan
2011-12-15 16:53 ` Trond Myklebust
[not found] ` <1323968015.14317.28.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
2011-12-15 17:18 ` Ric Wheeler
2011-12-15 17:25 ` Trond Myklebust
2011-12-15 17:31 ` Loke, Chetan
2011-12-15 17:55 ` Ric Wheeler
2011-12-15 17:27 ` Loke, Chetan
[not found] ` <1323961140.14317.2.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
2011-12-15 17:44 ` J. Bruce Fields
2011-12-16 8:00 ` Joel Becker
2011-12-14 19:59 ` Jeremy Allison
2011-12-14 20:30 ` Ric Wheeler
2011-12-19 12:38 ` Hannes Reinecke
2011-12-19 22:19 ` H. Peter Anvin
2011-12-19 22:34 ` Jeremy Allison
2011-12-19 22:57 ` Dave Chinner
2011-12-19 23:29 ` H. Peter Anvin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).