linux-fsdevel.vger.kernel.org archive mirror
* [RFC v0 0/4] sys_copy_range() rough draft
@ 2013-05-14 21:15 Zach Brown
  2013-05-14 21:15 ` [RFC v0 1/4] vfs: add copy_range syscall and vfs entry point Zach Brown
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Zach Brown @ 2013-05-14 21:15 UTC
  To: Martin K. Petersen, Trond Myklebust, linux-kernel, linux-fsdevel,
	linux-btrfs, linux-nfs

We've been talking about implementing some form of bulk data copy
offloading for a while now.  BTRFS and OCFS2 implement forms of copy
offloading with ioctls, NFS 4.2 will include a byte-granular COPY
operation, and the SCSI XCOPY command is being implemented now that
Windows can issue it.

In the past we've discussed promoting the ocfs2 reflink ioctl into a
system call that would create a new file and implicitly copy the
source data into the new file:
https://lkml.org/lkml/2009/9/14/481

These draft patches take the simpler approach of only copying data
between existing files.  The patches 1) make a system call out of the
btrfs CLONE_RANGE ioctl, 2) implement the btrfs .copy_range method with
the ioctl's guts, 3) implement the nfs .copy_range by sending a COPY
op, and 4) serve the COPY op in nfsd by calling the .copy_range method
again.

The nfs patch is an untested hack.  I'm happy to beat it into shape
but I'll need some guidance.

I'd like strong review feedback on the interfaces.  Here are some
possible topics:

a) Hopefully being able to specify a portion of the data to copy will
avoid *huge* syscall latencies and the motivation for new async
semantics.

b) The BTRFS ioctl and nfs COPY let you specify a count of 0 to copy
from the start offset to the end of the file.  Does anyone have a
strong feeling about this?  I'm leaning towards not bothering with it
in the syscall interface.

c) I chose to return partial progress in the ssize_t return code.  This
limits the length of the range: the size_t count argument can be too
large, in which case the call returns an error, much like other io
syscalls.  This seemed less awful than some extra argument with a
pointer to a status value.

d) I'm dreading mentioning a vector of ranges to copy in one syscall
because I don't want to think about overlapping ranges and file systems
that use range locks -- xfs for now, but more if Jan gets his way.
I'd rather that we get some experience with this simpler syscall before
taking on that headache.
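
As an illustration of item (c), here is a minimal userspace sketch of how
a caller might consume partial progress reported through an ssize_t return
value.  Everything in it is an assumption made for illustration:
fake_copy_range() is a hypothetical in-memory stand-in that copies at most
MAX_CHUNK bytes per call, and its arguments are not the syscall signature
from the patches, which this message does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical per-call limit, chosen only to force partial progress. */
#define MAX_CHUNK 4

/*
 * Hypothetical stand-in for the proposed call.  It copies at most
 * MAX_CHUNK bytes between two in-memory buffers and returns the number
 * of bytes copied.  It is NOT the interface from the patches.
 */
static ssize_t fake_copy_range(char *dst, size_t dst_off,
                               const char *src, size_t src_off,
                               size_t count)
{
    size_t n = count < MAX_CHUNK ? count : MAX_CHUNK;

    memcpy(dst + dst_off, src + src_off, n);
    return (ssize_t)n;
}

/*
 * Call the copy routine until the whole range is done, advancing both
 * offsets by the partial progress reported in each return value.
 * Returns the total number of bytes copied, or a negative value if a
 * call fails.  *calls receives the number of calls that were needed.
 */
static ssize_t copy_all(char *dst, const char *src, size_t count, int *calls)
{
    size_t done = 0;

    assert(calls != NULL);
    *calls = 0;

    while (done < count) {
        ssize_t ret = fake_copy_range(dst, done, src, done, count - done);

        (*calls)++;
        if (ret < 0)
            return ret;        /* propagate the error */
        if (ret == 0)
            break;             /* no progress, stop rather than spin */
        done += (size_t)ret;
    }
    return (ssize_t)done;
}
```

With a 10 byte source and the 4 byte limit assumed above, copy_all()
needs three calls (4 + 4 + 2 bytes).  The caller never needs a separate
status pointer, which is the trade-off item (c) describes.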

I'm sure I'm forgetting some other details.

I'm going to keep hacking away at this.  My next step is to get ext4
supporting .copy_range, probably with a quick hack to copy the
contents of bios.  Hopefully that'll give enough time to also integrate
review feedback.

Thoughts?

- z

* Re: [RFC v0 1/4] vfs: add copy_range syscall and vfs entry point
@ 2013-05-15 17:50 Steve French
  2013-05-15 18:54 ` J. Bruce Fields
       [not found] ` <CAH2r5ms0P8Hgv1mUpyHA32Er38iiaC1HHC4fhxvz2SBFy6Sucw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 2 replies; 24+ messages in thread
From: Steve French @ 2013-05-15 17:50 UTC
  To: linux-fsdevel, zab-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Doesn't the new syscall have to invalidate the page cache pages that
the server is about to overwrite, as btrfs does with the following call
in fs/btrfs/ioctl.c?

	truncate_inode_pages_range(&inode->i_data, destoff,
				   PAGE_CACHE_ALIGN(destoff + len) - 1);

(and doesn't truncate_inode_pages_range handle page cache alignment
anyway - and also why did btrfs use truncate_inode_pages_range instead
of invalidate?)
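
For reference, the end offset computed in that call can be worked through
in a small standalone sketch.  It assumes 4 KiB pages and re-implements the
round-up with made-up names (SKETCH_PAGE_SIZE, sketch_page_align,
sketch_truncate_end); it only mirrors the arithmetic of
PAGE_CACHE_ALIGN(destoff + len) - 1 and is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Round x up to the next multiple of the page size. */
static uint64_t sketch_page_align(uint64_t x)
{
    return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Offset of the last byte of the page that contains the end of the
 * destination range, i.e. the value of the second offset argument in
 * the quoted call.  Requires len > 0 so the subtraction cannot wrap.
 */
static uint64_t sketch_truncate_end(uint64_t destoff, uint64_t len)
{
    assert(len > 0);
    assert(destoff + len > destoff);   /* reject overflow */
    return sketch_page_align(destoff + len) - 1;
}
```

For example, destoff = 4096 and len = 100 give an end offset of 8191, the
last byte of the second page, while the start offset passed to the call
stays at destoff unchanged.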

Does the nfs client ever have the case where two different superblocks
map to the same nfs export (and thus the check below is restricting the
ability to do server side copy)?

+	if (inode_in->i_sb != inode_out->i_sb ||
+	    file_in->f_path.mnt != file_out->f_path.mnt)
+		return -EXDEV;

I am working on cifs client patches for the ioctl, and the new syscall
also looks pretty easy.  Some popular cifs servers (like Windows)
have supported smb/cifs copy offload for many, many years.  Samba, with
the support that David Disseldorp added for the clone range ioctl, now
supports copychunk (server side copy from Windows to Samba) as well, so
it is about time to finish the cifs client equivalent.

-- 
Thanks,

Steve


end of thread, other threads:[~2013-05-21 19:50 UTC | newest]

Thread overview: 24+ messages
2013-05-14 21:15 [RFC v0 0/4] sys_copy_range() rough draft Zach Brown
2013-05-14 21:15 ` [RFC v0 1/4] vfs: add copy_range syscall and vfs entry point Zach Brown
2013-05-15 19:44   ` Eric Wong
2013-05-15 20:03     ` Zach Brown
2013-05-16 21:16       ` Ric Wheeler
2013-05-21 19:47       ` Eric Wong
2013-05-21 19:50         ` Zach Brown
2013-05-14 21:15 ` [RFC v0 2/4] x86: add sys_copy_range to syscall tables Zach Brown
2013-05-14 21:15 ` [RFC v0 3/4] btrfs: add .copy_range file operation Zach Brown
     [not found] ` <1368566126-17610-1-git-send-email-zab-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-05-14 21:15   ` [RFC v0 4/4] nfs, nfsd: rough sys_copy_range and COPY support Zach Brown
     [not found]     ` <1368566126-17610-5-git-send-email-zab-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-05-15 20:19       ` J. Bruce Fields
     [not found]         ` <20130515201949.GD25994-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-05-15 20:21           ` Myklebust, Trond
2013-05-15 20:24             ` J. Bruce Fields
2013-05-14 21:42   ` [RFC v0 0/4] sys_copy_range() rough draft Dave Chinner
2013-05-14 22:04     ` Zach Brown
2013-05-15  1:01       ` Dave Chinner
  -- strict thread matches above, loose matches on Subject: below --
2013-05-15 17:50 [RFC v0 1/4] vfs: add copy_range syscall and vfs entry point Steve French
2013-05-15 18:54 ` J. Bruce Fields
     [not found]   ` <20130515185429.GA25994-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-05-15 19:39     ` Zach Brown
     [not found] ` <CAH2r5ms0P8Hgv1mUpyHA32Er38iiaC1HHC4fhxvz2SBFy6Sucw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-05-15 19:36   ` Zach Brown
     [not found]     ` <20130515193600.GA318-fypN+1c5dIyjpB87vu3CluTW4wlIGRCZ@public.gmane.org>
2013-05-15 20:08       ` Steve French
2013-05-15 20:16         ` Chris Mason
     [not found]           ` <20130515201614.24668.83788-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2013-05-15 20:21             ` Steve French
2013-05-15 20:25               ` Chris Mason
