* vfs: move btrfs clone ioctls to common code
@ 2015-11-26 18:50 Christoph Hellwig
2015-11-26 18:50 ` [PATCH 2/5] locks: new locks_mandatory_area calling convention Christoph Hellwig
` (3 more replies)
0 siblings, 4 replies; 17+ messages in thread
From: Christoph Hellwig @ 2015-11-26 18:50 UTC (permalink / raw)
To: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
Cc: tao.peng-7I+n7zu2hftEKMMhf/gKZA,
jeff.layton-7I+n7zu2hftEKMMhf/gKZA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA,
linux-cifs-u79uwXL29TY76Z2rM5mHXA
This patch set moves the existing btrfs clone ioctls that other file
systems have started to implement to common code, and allows the NFS
server to export this functionality to remote systems.
This work is based originally on my NFS CLONE prototype, which reused
code from Anna Schumaker's NFS COPY prototype, as well as various
updates from Peng Tao to this code.
The patches are also available as a git branch and on gitweb:
git://git.infradead.org/users/hch/pnfs.git clone-for-viro
http://git.infradead.org/users/hch/pnfs.git/shortlog/refs/heads/clone-for-viro
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 2/5] locks: new locks_mandatory_area calling convention
2015-11-26 18:50 vfs: move btrfs clone ioctls to common code Christoph Hellwig
@ 2015-11-26 18:50 ` Christoph Hellwig
[not found] ` <1448563859-21922-3-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2015-11-26 18:50 ` [PATCH 3/5] vfs: pull btrfs clone API to vfs layer Christoph Hellwig
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2015-11-26 18:50 UTC (permalink / raw)
To: viro
Cc: tao.peng, jeff.layton, linux-fsdevel, linux-btrfs, linux-nfs, linux-cifs

Pass a loff_t end for the last byte instead of the 32-bit count
parameter to allow full file clones even on 32-bit architectures.
While we're at it also drop the pointless inode argument and simplify
the read/write selection.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/locks.c         | 22 +++++++++-------------
 fs/read_write.c    |  5 ++---
 include/linux/fs.h | 28 +++++++++++++---------------
 3 files changed, 24 insertions(+), 31 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 0d2b326..d503669 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1227,21 +1227,17 @@ int locks_mandatory_locked(struct file *file)
 
 /**
  * locks_mandatory_area - Check for a conflicting lock
- * @read_write: %FLOCK_VERIFY_WRITE for exclusive access, %FLOCK_VERIFY_READ
- *		for shared
- * @inode:      the file to check
  * @filp:       how the file was opened (if it was)
- * @offset:     start of area to check
- * @count:      length of area to check
+ * @start:	first byte in the file to check
+ * @end:	lastbyte in the file to check
+ * @write:	%true if checking for write access
  *
  * Searches the inode's list of locks to find any POSIX locks which conflict.
- * This function is called from rw_verify_area() and
- * locks_verify_truncate().
  */
-int locks_mandatory_area(int read_write, struct inode *inode,
-			 struct file *filp, loff_t offset,
-			 size_t count)
+int locks_mandatory_area(struct file *filp, loff_t start, loff_t end,
+			 bool write)
 {
+	struct inode *inode = file_inode(filp);
 	struct file_lock fl;
 	int error;
 	bool sleep = false;
@@ -1252,9 +1248,9 @@ int locks_mandatory_area(int read_write, struct inode *inode,
 	fl.fl_flags = FL_POSIX | FL_ACCESS;
 	if (filp && !(filp->f_flags & O_NONBLOCK))
 		sleep = true;
-	fl.fl_type = (read_write == FLOCK_VERIFY_WRITE) ? F_WRLCK : F_RDLCK;
-	fl.fl_start = offset;
-	fl.fl_end = offset + count - 1;
+	fl.fl_type = write ? F_WRLCK : F_RDLCK;
+	fl.fl_start = start;
+	fl.fl_end = end;
 
 	for (;;) {
 		if (filp) {
diff --git a/fs/read_write.c b/fs/read_write.c
index c81ef39..48157dd 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -396,9 +396,8 @@ int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t
 	}
 
 	if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
-		retval = locks_mandatory_area(
-			read_write == READ ? FLOCK_VERIFY_READ : FLOCK_VERIFY_WRITE,
-			inode, file, pos, count);
+		retval = locks_mandatory_area(file, pos, pos + count - 1,
+				read_write == READ ? false : true);
 		if (retval < 0)
 			return retval;
 	}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 870a76e..e640f791 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2030,12 +2030,9 @@ extern struct kobject *fs_kobj;
 
 #define MAX_RW_COUNT (INT_MAX & PAGE_CACHE_MASK)
 
-#define FLOCK_VERIFY_READ  1
-#define FLOCK_VERIFY_WRITE 2
-
 #ifdef CONFIG_FILE_LOCKING
 extern int locks_mandatory_locked(struct file *);
-extern int locks_mandatory_area(int, struct inode *, struct file *, loff_t, size_t);
+extern int locks_mandatory_area(struct file *, loff_t, loff_t, bool);
 
 /*
  * Candidates for mandatory locking have the setgid bit set
@@ -2068,14 +2065,16 @@ static inline int locks_verify_truncate(struct inode *inode,
 				    struct file *filp,
 				    loff_t size)
 {
-	if (inode->i_flctx && mandatory_lock(inode))
-		return locks_mandatory_area(
-			FLOCK_VERIFY_WRITE, inode, filp,
-			size < inode->i_size ? size : inode->i_size,
-			(size < inode->i_size ? inode->i_size - size
-			 : size - inode->i_size)
-		);
-	return 0;
+	if (!inode->i_flctx || !mandatory_lock(inode))
+		return 0;
+
+	if (size < inode->i_size) {
+		return locks_mandatory_area(filp, size, inode->i_size - 1,
+				true);
+	} else {
+		return locks_mandatory_area(filp, inode->i_size, size - 1,
+				true);
+	}
 }
 
 static inline int break_lease(struct inode *inode, unsigned int mode)
@@ -2144,9 +2143,8 @@ static inline int locks_mandatory_locked(struct file *file)
 	return 0;
 }
 
-static inline int locks_mandatory_area(int rw, struct inode *inode,
-				       struct file *filp, loff_t offset,
-				       size_t count)
+static inline int locks_mandatory_area(struct file *filp, loff_t start,
+				       loff_t end, bool write)
 {
 	return 0;
 }
-- 
1.9.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
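To make the calling convention in the patch above concrete, here is a
minimal userspace sketch (not kernel code). It assumes int64_t as a
stand-in for loff_t and uint32_t for a 32-bit size_t; the names
byte_range, range_from_count, truncate_range and range_len are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive byte range, like the new (start, end) argument pair. */
struct byte_range {
	int64_t start;	/* first byte to check */
	int64_t end;	/* last byte to check, inclusive */
};

/* Conversion done by rw_verify_area() in the patch: end = pos + count - 1. */
static struct byte_range range_from_count(int64_t pos, uint32_t count)
{
	struct byte_range r = { pos, pos + (int64_t)count - 1 };
	return r;
}

/* The two branches of locks_verify_truncate() in the patch. */
static struct byte_range truncate_range(int64_t i_size, int64_t size)
{
	struct byte_range r;

	if (size < i_size) {	/* shrinking: the bytes being cut off */
		r.start = size;
		r.end = i_size - 1;
	} else {		/* growing: the bytes being added */
		r.start = i_size;
		r.end = size - 1;
	}
	return r;
}

/* Number of bytes covered; end == start - 1 encodes an empty range. */
static int64_t range_len(struct byte_range r)
{
	assert(r.end >= r.start - 1);
	return r.end - r.start + 1;
}
```

A range such as { 0, 6 GiB - 1 } is expressible this way, while its
length would not fit in a 32-bit count, which is the motivation stated
in the commit message.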
[parent not found: <1448563859-21922-3-git-send-email-hch-jcswGhMUV9g@public.gmane.org>]
* Re: [PATCH 2/5] locks: new locks_mandatory_area calling convention
[not found] ` <1448563859-21922-3-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2015-11-30 22:38 ` J. Bruce Fields
[not found] ` <20151130223830.GB31564-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: J. Bruce Fields @ 2015-11-30 22:38 UTC (permalink / raw)
To: Christoph Hellwig
Cc: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
tao.peng-7I+n7zu2hftEKMMhf/gKZA,
jeff.layton-7I+n7zu2hftEKMMhf/gKZA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA,
linux-cifs-u79uwXL29TY76Z2rM5mHXA

On Thu, Nov 26, 2015 at 07:50:56PM +0100, Christoph Hellwig wrote:
> Pass a loff_t end for the last byte instead of the 32-bit count
> parameter to allow full file clones even on 32-bit architectures.
> While we're at it also drop the pointless inode argument and simplify
> the read/write selection.
>
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> ---
>  fs/locks.c         | 22 +++++++++-------------
>  fs/read_write.c    |  5 ++---
>  include/linux/fs.h | 28 +++++++++++++---------------
>  3 files changed, 24 insertions(+), 31 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 0d2b326..d503669 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1227,21 +1227,17 @@ int locks_mandatory_locked(struct file *file)
>
>  /**
>   * locks_mandatory_area - Check for a conflicting lock
> - * @read_write: %FLOCK_VERIFY_WRITE for exclusive access, %FLOCK_VERIFY_READ
> - *		for shared
> - * @inode:      the file to check
>   * @filp:       how the file was opened (if it was)
> - * @offset:     start of area to check
> - * @count:      length of area to check
> + * @start:	first byte in the file to check
> + * @end:	lastbyte in the file to check
> + * @write:	%true if checking for write access
>   *
>   * Searches the inode's list of locks to find any POSIX locks which conflict.
> - * This function is called from rw_verify_area() and
> - * locks_verify_truncate().
>   */
> -int locks_mandatory_area(int read_write, struct inode *inode,
> -			 struct file *filp, loff_t offset,
> -			 size_t count)
> +int locks_mandatory_area(struct file *filp, loff_t start, loff_t end,
> +			 bool write)
>  {
> +	struct inode *inode = file_inode(filp);
>  	struct file_lock fl;
>  	int error;
>  	bool sleep = false;
> @@ -1252,9 +1248,9 @@ int locks_mandatory_area(int read_write, struct inode *inode,
>  	fl.fl_flags = FL_POSIX | FL_ACCESS;
>  	if (filp && !(filp->f_flags & O_NONBLOCK))
>  		sleep = true;
> -	fl.fl_type = (read_write == FLOCK_VERIFY_WRITE) ? F_WRLCK : F_RDLCK;
> -	fl.fl_start = offset;
> -	fl.fl_end = offset + count - 1;
> +	fl.fl_type = write ? F_WRLCK : F_RDLCK;
> +	fl.fl_start = start;
> +	fl.fl_end = end;
>
>  	for (;;) {
>  		if (filp) {
> diff --git a/fs/read_write.c b/fs/read_write.c
> index c81ef39..48157dd 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -396,9 +396,8 @@ int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t
>  	}
>
>  	if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
> -		retval = locks_mandatory_area(
> -			read_write == READ ? FLOCK_VERIFY_READ : FLOCK_VERIFY_WRITE,
> -			inode, file, pos, count);
> +		retval = locks_mandatory_area(file, pos, pos + count - 1,
> +				read_write == READ ? false : true);
>  		if (retval < 0)
>  			return retval;
>  	}
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 870a76e..e640f791 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2030,12 +2030,9 @@ extern struct kobject *fs_kobj;
>
>  #define MAX_RW_COUNT (INT_MAX & PAGE_CACHE_MASK)
>
> -#define FLOCK_VERIFY_READ  1
> -#define FLOCK_VERIFY_WRITE 2
> -
>  #ifdef CONFIG_FILE_LOCKING
>  extern int locks_mandatory_locked(struct file *);
> -extern int locks_mandatory_area(int, struct inode *, struct file *, loff_t, size_t);
> +extern int locks_mandatory_area(struct file *, loff_t, loff_t, bool);
>
>  /*
>   * Candidates for mandatory locking have the setgid bit set
> @@ -2068,14 +2065,16 @@ static inline int locks_verify_truncate(struct inode *inode,
>  				    struct file *filp,
>  				    loff_t size)
>  {
> -	if (inode->i_flctx && mandatory_lock(inode))
> -		return locks_mandatory_area(
> -			FLOCK_VERIFY_WRITE, inode, filp,
> -			size < inode->i_size ? size : inode->i_size,
> -			(size < inode->i_size ? inode->i_size - size
> -			 : size - inode->i_size)
> -		);
> -	return 0;
> +	if (!inode->i_flctx || !mandatory_lock(inode))
> +		return 0;
> +
> +	if (size < inode->i_size) {
> +		return locks_mandatory_area(filp, size, inode->i_size - 1,
> +				true);
> +	} else {
> +		return locks_mandatory_area(filp, inode->i_size, size - 1,
> +				true);

I feel like these callers would be just slightly more self-documenting
if that last parameter was F_WRLCK instead of true.

But I could live with it either way, patch looks like an
improvement--ACK.

--b.

> +	}
>  }
>
>  static inline int break_lease(struct inode *inode, unsigned int mode)
> @@ -2144,9 +2143,8 @@ static inline int locks_mandatory_locked(struct file *file)
>  	return 0;
>  }
>
> -static inline int locks_mandatory_area(int rw, struct inode *inode,
> -				       struct file *filp, loff_t offset,
> -				       size_t count)
> +static inline int locks_mandatory_area(struct file *filp, loff_t start,
> +				       loff_t end, bool write)
>  {
>  	return 0;
>  }
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <20151130223830.GB31564-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: [PATCH 2/5] locks: new locks_mandatory_area calling convention
[not found] ` <20151130223830.GB31564-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2015-12-01 7:37 ` Christoph Hellwig
0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2015-12-01 7:37 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Christoph Hellwig,
viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
tao.peng-7I+n7zu2hftEKMMhf/gKZA,
jeff.layton-7I+n7zu2hftEKMMhf/gKZA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA,
linux-cifs-u79uwXL29TY76Z2rM5mHXA

On Mon, Nov 30, 2015 at 05:38:30PM -0500, J. Bruce Fields wrote:
> > +	if (size < inode->i_size) {
> > +		return locks_mandatory_area(filp, size, inode->i_size - 1,
> > +				true);
> > +	} else {
> > +		return locks_mandatory_area(filp, inode->i_size, size - 1,
> > +				true);
>
> I feel like these callers would be just slightly more self-documenting
> if that last parameter was F_WRLCK instead of true.

Sure, I can change that for the next version.
^ permalink raw reply [flat|nested] 17+ messages in thread
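The difference between the two call-site styles discussed in this
sub-thread can be sketched in userspace as follows. This is only an
illustration, not the follow-up patch: lock_type_from_bool and
lock_type_checked are hypothetical names, and F_RDLCK/F_WRLCK come from
the userspace <fcntl.h> rather than the kernel headers.

```c
#include <assert.h>
#include <fcntl.h>	/* F_RDLCK, F_WRLCK */
#include <stdbool.h>

/* Shape used in this version of the patch: a bool selects the lock type. */
static int lock_type_from_bool(bool write)
{
	return write ? F_WRLCK : F_RDLCK;	/* same mapping as fl.fl_type */
}

/* Shape suggested in review: the caller names the lock type directly. */
static int lock_type_checked(int type)
{
	assert(type == F_RDLCK || type == F_WRLCK);
	return type;
}
```

With the second shape a call site reads as "check this range for
F_WRLCK" instead of passing a bare true, which is the self-documenting
property the review comment asks for.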
* [PATCH 3/5] vfs: pull btrfs clone API to vfs layer
2015-11-26 18:50 vfs: move btrfs clone ioctls to common code Christoph Hellwig
2015-11-26 18:50 ` [PATCH 2/5] locks: new locks_mandatory_area calling convention Christoph Hellwig
@ 2015-11-26 18:50 ` Christoph Hellwig
2015-11-26 18:50 ` [PATCH 4/5] nfsd: Pass filehandle to nfs4_preprocess_stateid_op() Christoph Hellwig
[not found] ` <1448563859-21922-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
3 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2015-11-26 18:50 UTC (permalink / raw)
To: viro
Cc: tao.peng, jeff.layton, linux-fsdevel, linux-btrfs, linux-nfs, linux-cifs

The btrfs clone ioctls are now adopted by other file systems: NFS since
4.3 and XFS a few kernels in the future, as well as the previous
(incorrect) usage by CIFS. To avoid growth of various slightly
incompatible implementations, add one to the core VFS code.

Note that clones are different from file copies in various ways:

 - they are atomic vs other writers
 - they support whole file clones
 - they support 64-bit length clones
 - they do not allow partial success (aka short writes)
 - clones are expected to be a fast metadata operation

Because of that it would be rather cumbersome to try to piggyback them
on top of the recent copy_file_range infrastructure.

Based on earlier work from Peng Tao.

Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/btrfs/ctree.h | 3 +- fs/btrfs/file.c | 1 + fs/btrfs/ioctl.c | 49 ++-------------------- fs/ioctl.c | 29 +++++++++++++ fs/nfs/nfs42proc.c | 1 + fs/nfs/nfs4file.c | 107 ++++++++--------------------------------------- fs/read_write.c | 71 +++++++++++++++++++++++++++++++ include/linux/fs.h | 7 +++- include/uapi/linux/fs.h | 9 ++++ include/uapi/linux/nfs.h | 11 ----- 10 files changed, 140 insertions(+), 148 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index dd7d888..adc997f 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -4021,7 +4021,6 @@ void btrfs_get_block_group_info(struct list_head *groups_list, void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock, struct btrfs_ioctl_balance_args *bargs); - /* file.c */ int btrfs_auto_defrag_init(void); void btrfs_auto_defrag_exit(void); @@ -4054,6 +4053,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end); ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, size_t len, unsigned int flags); +int btrfs_clone_file_range(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, u64 len); /* tree-defrag.c */ int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 1c0ee74..3b61b0a 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2921,6 +2921,7 @@ const struct file_operations btrfs_file_operations = { .compat_ioctl = btrfs_ioctl, #endif .copy_file_range = btrfs_copy_file_range, + .clone_file_range = btrfs_clone_file_range, }; void btrfs_auto_defrag_exit(void) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 0f92735..85b1cae 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3906,49 +3906,10 @@ ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in, return ret; } -static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, - u64 off, u64 olen, 
u64 destoff) +int btrfs_clone_file_range(struct file *src_file, loff_t off, + struct file *dst_file, loff_t destoff, u64 len) { - struct fd src_file; - int ret; - - /* the destination must be opened for writing */ - if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND)) - return -EINVAL; - - ret = mnt_want_write_file(file); - if (ret) - return ret; - - src_file = fdget(srcfd); - if (!src_file.file) { - ret = -EBADF; - goto out_drop_write; - } - - /* the src must be open for reading */ - if (!(src_file.file->f_mode & FMODE_READ)) { - ret = -EINVAL; - goto out_fput; - } - - ret = btrfs_clone_files(file, src_file.file, off, olen, destoff); - -out_fput: - fdput(src_file); -out_drop_write: - mnt_drop_write_file(file); - return ret; -} - -static long btrfs_ioctl_clone_range(struct file *file, void __user *argp) -{ - struct btrfs_ioctl_clone_range_args args; - - if (copy_from_user(&args, argp, sizeof(args))) - return -EFAULT; - return btrfs_ioctl_clone(file, args.src_fd, args.src_offset, - args.src_length, args.dest_offset); + return btrfs_clone_files(dst_file, src_file, off, len, destoff); } /* @@ -5498,10 +5459,6 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_dev_info(root, argp); case BTRFS_IOC_BALANCE: return btrfs_ioctl_balance(file, NULL); - case BTRFS_IOC_CLONE: - return btrfs_ioctl_clone(file, arg, 0, 0, 0); - case BTRFS_IOC_CLONE_RANGE: - return btrfs_ioctl_clone_range(file, argp); case BTRFS_IOC_TRANS_START: return btrfs_ioctl_trans_start(file); case BTRFS_IOC_TRANS_END: diff --git a/fs/ioctl.c b/fs/ioctl.c index 5d01d26..84c6e79 100644 --- a/fs/ioctl.c +++ b/fs/ioctl.c @@ -215,6 +215,29 @@ static int ioctl_fiemap(struct file *filp, unsigned long arg) return error; } +static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd, + u64 off, u64 olen, u64 destoff) +{ + struct fd src_file = fdget(srcfd); + int ret; + + if (!src_file.file) + return -EBADF; + ret = vfs_clone_file_range(src_file.file, off, dst_file, 
destoff, olen); + fdput(src_file); + return ret; +} + +static long ioctl_file_clone_range(struct file *file, void __user *argp) +{ + struct file_clone_range args; + + if (copy_from_user(&args, argp, sizeof(args))) + return -EFAULT; + return ioctl_file_clone(file, args.src_fd, args.src_offset, + args.src_length, args.dest_offset); +} + #ifdef CONFIG_BLOCK static inline sector_t logical_to_blk(struct inode *inode, loff_t offset) @@ -600,6 +623,12 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, case FIGETBSZ: return put_user(inode->i_sb->s_blocksize, argp); + case FICLONE: + return ioctl_file_clone(filp, arg, 0, 0, 0); + + case FICLONERANGE: + return ioctl_file_clone_range(filp, argp); + default: if (S_ISREG(inode->i_mode)) error = file_ioctl(filp, cmd, arg); diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 3e92a3c..303d22e 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -284,6 +284,7 @@ static int _nfs42_proc_clone(struct rpc_message *msg, struct file *src_f, .dst_fh = NFS_FH(dst_inode), .src_offset = src_offset, .dst_offset = dst_offset, + .count = count, .dst_bitmask = server->cache_consistency_bitmask, }; struct nfs42_clone_res res = { diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index 4aa5719..f46d087 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -194,63 +194,32 @@ static long nfs42_fallocate(struct file *filep, int mode, loff_t offset, loff_t return nfs42_proc_allocate(filep, offset, len); } -static noinline long -nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd, - u64 src_off, u64 dst_off, u64 count) +static int nfs42_clone_file_range(struct file *src_file, loff_t src_off, + struct file *dst_file, loff_t dst_off, u64 count) { struct inode *dst_inode = file_inode(dst_file); struct nfs_server *server = NFS_SERVER(dst_inode); - struct fd src_file; - struct inode *src_inode; + struct inode *src_inode = file_inode(src_file); unsigned int bs = server->clone_blksize; + bool same_inode = 
false; int ret; - /* dst file must be opened for writing */ - if (!(dst_file->f_mode & FMODE_WRITE)) - return -EINVAL; - - ret = mnt_want_write_file(dst_file); - if (ret) - return ret; - - src_file = fdget(srcfd); - if (!src_file.file) { - ret = -EBADF; - goto out_drop_write; - } - - src_inode = file_inode(src_file.file); - - /* src and dst must be different files */ - ret = -EINVAL; - if (src_inode == dst_inode) - goto out_fput; - - /* src file must be opened for reading */ - if (!(src_file.file->f_mode & FMODE_READ)) - goto out_fput; - - /* src and dst must be regular files */ - ret = -EISDIR; - if (!S_ISREG(src_inode->i_mode) || !S_ISREG(dst_inode->i_mode)) - goto out_fput; - - ret = -EXDEV; - if (src_file.file->f_path.mnt != dst_file->f_path.mnt || - src_inode->i_sb != dst_inode->i_sb) - goto out_fput; - /* check alignment w.r.t. clone_blksize */ ret = -EINVAL; if (bs) { if (!IS_ALIGNED(src_off, bs) || !IS_ALIGNED(dst_off, bs)) - goto out_fput; + goto out; if (!IS_ALIGNED(count, bs) && i_size_read(src_inode) != (src_off + count)) - goto out_fput; + goto out; } + if (src_inode == dst_inode) + same_inode = true; + /* XXX: do we lock at all? what if server needs CB_RECALL_LAYOUT? 
*/ - if (dst_inode < src_inode) { + if (same_inode) { + mutex_lock(&src_inode->i_mutex); + } else if (dst_inode < src_inode) { mutex_lock_nested(&dst_inode->i_mutex, I_MUTEX_PARENT); mutex_lock_nested(&src_inode->i_mutex, I_MUTEX_CHILD); } else { @@ -267,7 +236,7 @@ nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd, if (ret) goto out_unlock; - ret = nfs42_proc_clone(src_file.file, dst_file, src_off, dst_off, count); + ret = nfs42_proc_clone(src_file, dst_file, src_off, dst_off, count); /* truncate inode page cache of the dst range so that future reads can fetch * new data from server */ @@ -275,56 +244,20 @@ nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd, truncate_inode_pages_range(&dst_inode->i_data, dst_off, dst_off + count - 1); out_unlock: - if (dst_inode < src_inode) { + if (same_inode) { + mutex_unlock(&src_inode->i_mutex); + } else if (dst_inode < src_inode) { mutex_unlock(&src_inode->i_mutex); mutex_unlock(&dst_inode->i_mutex); } else { mutex_unlock(&dst_inode->i_mutex); mutex_unlock(&src_inode->i_mutex); } -out_fput: - fdput(src_file); -out_drop_write: - mnt_drop_write_file(dst_file); +out: return ret; } - -static long nfs42_ioctl_clone_range(struct file *dst_file, void __user *argp) -{ - struct nfs_ioctl_clone_range_args args; - - if (copy_from_user(&args, argp, sizeof(args))) - return -EFAULT; - - return nfs42_ioctl_clone(dst_file, args.src_fd, args.src_off, args.dst_off, args.count); -} -#else -static long nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd, - u64 src_off, u64 dst_off, u64 count) -{ - return -ENOTTY; -} - -static long nfs42_ioctl_clone_range(struct file *dst_file, void __user *argp) -{ - return -ENOTTY; -} #endif /* CONFIG_NFS_V4_2 */ -long nfs4_ioctl(struct file *file, unsigned int cmd, unsigned long arg) -{ - void __user *argp = (void __user *)arg; - - switch (cmd) { - case NFS_IOC_CLONE: - return nfs42_ioctl_clone(file, arg, 0, 0, 0); - case NFS_IOC_CLONE_RANGE: - return nfs42_ioctl_clone_range(file, 
argp); - } - - return -ENOTTY; -} - const struct file_operations nfs4_file_operations = { #ifdef CONFIG_NFS_V4_2 .llseek = nfs4_file_llseek, @@ -344,12 +277,8 @@ const struct file_operations nfs4_file_operations = { .splice_write = iter_file_splice_write, #ifdef CONFIG_NFS_V4_2 .fallocate = nfs42_fallocate, + .clone_file_range = nfs42_clone_file_range, #endif /* CONFIG_NFS_V4_2 */ .check_flags = nfs_check_flags, .setlease = simple_nosetlease, -#ifdef CONFIG_COMPAT - .unlocked_ioctl = nfs4_ioctl, -#else - .compat_ioctl = nfs4_ioctl, -#endif /* CONFIG_COMPAT */ }; diff --git a/fs/read_write.c b/fs/read_write.c index 48157dd..095e209 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1451,3 +1451,74 @@ out1: out2: return ret; } + +static int clone_verify_area(struct file *file, loff_t pos, u64 len, bool write) +{ + struct inode *inode = file_inode(file); + + if (unlikely(pos < 0)) + return -EINVAL; + + if (unlikely((loff_t) (pos + len) < 0)) + return -EINVAL; + + if (unlikely(inode->i_flctx && mandatory_lock(inode))) { + loff_t end = len ? pos + len - 1 : OFFSET_MAX; + int retval; + + retval = locks_mandatory_area(file, pos, end, write); + if (retval < 0) + return retval; + } + + return security_file_permission(file, write ? 
MAY_WRITE : MAY_READ); +} + +int vfs_clone_file_range(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, u64 len) +{ + struct inode *inode_in = file_inode(file_in); + struct inode *inode_out = file_inode(file_out); + int ret; + + if (inode_in->i_sb != inode_out->i_sb || + file_in->f_path.mnt != file_out->f_path.mnt) + return -EXDEV; + + if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode)) + return -EISDIR; + if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode)) + return -EOPNOTSUPP; + + if (!(file_in->f_mode & FMODE_READ) || + !(file_out->f_mode & FMODE_WRITE) || + (file_out->f_flags & O_APPEND) || + !file_in->f_op->clone_file_range) + return -EBADF; + + ret = clone_verify_area(file_in, pos_in, len, false); + if (ret) + return ret; + + ret = clone_verify_area(file_out, pos_out, len, true); + if (ret) + return ret; + + if (pos_in + len > i_size_read(inode_in)) + return -EINVAL; + + ret = mnt_want_write_file(file_out); + if (ret) + return ret; + + ret = file_in->f_op->clone_file_range(file_in, pos_in, + file_out, pos_out, len); + if (!ret) { + fsnotify_access(file_in); + fsnotify_modify(file_out); + } + + mnt_drop_write_file(file_out); + return ret; +} +EXPORT_SYMBOL(vfs_clone_file_range); diff --git a/include/linux/fs.h b/include/linux/fs.h index e640f791..75ce095 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1629,7 +1629,10 @@ struct file_operations { #ifndef CONFIG_MMU unsigned (*mmap_capabilities)(struct file *); #endif - ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int); + ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, + loff_t, size_t, unsigned int); + int (*clone_file_range)(struct file *, loff_t, struct file *, loff_t, + u64); }; struct inode_operations { @@ -1683,6 +1686,8 @@ extern ssize_t vfs_writev(struct file *, const struct iovec __user *, unsigned long, loff_t *); extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct 
file *, loff_t, size_t, unsigned int); +extern int vfs_clone_file_range(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, u64 len); struct super_operations { struct inode *(*alloc_inode)(struct super_block *sb); diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index f15d980..cd5db7f 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -39,6 +39,13 @@ #define RENAME_EXCHANGE (1 << 1) /* Exchange source and dest */ #define RENAME_WHITEOUT (1 << 2) /* Whiteout source */ +struct file_clone_range { + __s64 src_fd; + __u64 src_offset; + __u64 src_length; + __u64 dest_offset; +}; + struct fstrim_range { __u64 start; __u64 len; @@ -159,6 +166,8 @@ struct inodes_stat_t { #define FIFREEZE _IOWR('X', 119, int) /* Freeze */ #define FITHAW _IOWR('X', 120, int) /* Thaw */ #define FITRIM _IOWR('X', 121, struct fstrim_range) /* Trim */ +#define FICLONE _IOW(0x94, 9, int) +#define FICLONERANGE _IOW(0x94, 13, struct file_clone_range) #define FS_IOC_GETFLAGS _IOR('f', 1, long) #define FS_IOC_SETFLAGS _IOW('f', 2, long) diff --git a/include/uapi/linux/nfs.h b/include/uapi/linux/nfs.h index 654bae3..5e62961 100644 --- a/include/uapi/linux/nfs.h +++ b/include/uapi/linux/nfs.h @@ -33,17 +33,6 @@ #define NFS_PIPE_DIRNAME "nfs" -/* NFS ioctls */ -/* Let's follow btrfs lead on CLONE to avoid messing userspace */ -#define NFS_IOC_CLONE _IOW(0x94, 9, int) -#define NFS_IOC_CLONE_RANGE _IOW(0x94, 13, int) - -struct nfs_ioctl_clone_range_args { - __s64 src_fd; - __u64 src_off, count; - __u64 dst_off; -}; - /* * NFS stats. The good thing with these values is that NFSv3 errors are * a superset of NFSv2 errors (with the exception of NFSERR_WFLUSH which -- 1.9.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
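For readers who want to see how the interface added by the patch above
looks from userspace, here is a minimal sketch. It assumes a
<linux/fs.h> that already carries the FICLONE and FICLONERANGE
definitions and struct file_clone_range exactly as this patch adds them;
the helper names clone_whole_file, clone_range and demo_invalid_fds are
hypothetical. Whether a clone actually succeeds depends on the file
system behind the descriptors.

```c
#include <assert.h>
#include <errno.h>
#include <linux/fs.h>	/* FICLONE, FICLONERANGE, struct file_clone_range */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Clone all of src_fd into dst_fd. Returns 0, or -1 with errno set. */
static int clone_whole_file(int dst_fd, int src_fd)
{
	return ioctl(dst_fd, FICLONE, src_fd);
}

/*
 * Clone len bytes from src_fd at src_off to dst_fd at dst_off.
 * In this patch a length of 0 is how FICLONE requests a whole-file
 * clone (it passes 0, 0, 0 to the shared helper).
 */
static int clone_range(int dst_fd, int src_fd, uint64_t src_off,
		       uint64_t len, uint64_t dst_off)
{
	struct file_clone_range args;

	memset(&args, 0, sizeof(args));
	args.src_fd = src_fd;
	args.src_offset = src_off;
	args.src_length = len;
	args.dest_offset = dst_off;
	return ioctl(dst_fd, FICLONERANGE, &args);
}

/* Sanity check: invalid descriptors are rejected with EBADF. */
static void demo_invalid_fds(void)
{
	assert(clone_whole_file(-1, -1) == -1 && errno == EBADF);
	assert(clone_range(-1, -1, 0, 0, 0) == -1 && errno == EBADF);
}
```

Because clones do not allow partial success, a caller only has to
distinguish 0 from -1; there is no short-write loop as there would be
with a copy.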
* [PATCH 4/5] nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
2015-11-26 18:50 vfs: move btrfs clone ioctls to common code Christoph Hellwig
2015-11-26 18:50 ` [PATCH 2/5] locks: new locks_mandatory_area calling convention Christoph Hellwig
2015-11-26 18:50 ` [PATCH 3/5] vfs: pull btrfs clone API to vfs layer Christoph Hellwig
@ 2015-11-26 18:50 ` Christoph Hellwig
[not found] ` <1448563859-21922-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
3 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2015-11-26 18:50 UTC (permalink / raw)
To: viro
Cc: tao.peng, jeff.layton, linux-fsdevel, linux-btrfs, linux-nfs, linux-cifs, Anna Schumaker, Anna Schumaker

From: Anna Schumaker <Anna.Schumaker@netapp.com>

This will be needed so COPY can look up the saved_fh in addition to the
current_fh.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfsd/nfs4proc.c  | 16 +++++++++-------
 fs/nfsd/nfs4state.c |  5 ++---
 fs/nfsd/state.h     |  4 ++--
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a9f096c..3ba10a3 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -774,8 +774,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);
 
 	/* check stateid */
-	status = nfs4_preprocess_stateid_op(rqstp, cstate, &read->rd_stateid,
-			RD_STATE, &read->rd_filp, &read->rd_tmp_file);
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+					&read->rd_stateid, RD_STATE,
+					&read->rd_filp, &read->rd_tmp_file);
 	if (status) {
 		dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
 		goto out;
@@ -921,7 +922,8 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
 		status = nfs4_preprocess_stateid_op(rqstp, cstate,
-			&setattr->sa_stateid, WR_STATE, NULL, NULL);
+				&cstate->current_fh, &setattr->sa_stateid,
+				WR_STATE, NULL, NULL);
 		if (status) {
 			dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
 			return status;
@@ -985,8 +987,8 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (write->wr_offset >= OFFSET_MAX)
 		return nfserr_inval;
 
-	status = nfs4_preprocess_stateid_op(rqstp, cstate, stateid, WR_STATE,
-					&filp, NULL);
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+						stateid, WR_STATE, &filp, NULL);
 	if (status) {
 		dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
 		return status;
@@ -1016,7 +1018,7 @@ nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	__be32 status = nfserr_notsupp;
 	struct file *file;
 
-	status = nfs4_preprocess_stateid_op(rqstp, cstate,
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    &fallocate->falloc_stateid,
 					    WR_STATE, &file, NULL);
 	if (status != nfs_ok) {
@@ -1055,7 +1057,7 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	__be32 status;
 	struct file *file;
 
-	status = nfs4_preprocess_stateid_op(rqstp, cstate,
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    &seek->seek_stateid,
 					    RD_STATE, &file, NULL);
 	if (status) {
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 6b800b5..df5dba6 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4797,10 +4797,9 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
  */
 __be32
 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
-		struct nfsd4_compound_state *cstate, stateid_t *stateid,
-		int flags, struct file **filpp, bool *tmp_file)
+		struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
+		stateid_t *stateid, int flags, struct file **filpp, bool *tmp_file)
 {
-	struct svc_fh *fhp = &cstate->current_fh;
 	struct inode *ino = d_inode(fhp->fh_dentry);
 	struct net *net = SVC_NET(rqstp);
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 77fdf4d..99432b7 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -578,8 +578,8 @@ struct nfsd4_compound_state;
 struct nfsd_net;
 
 extern __be32 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
-		struct nfsd4_compound_state *cstate, stateid_t *stateid,
-		int flags, struct file **filp, bool *tmp_file);
+		struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
+		stateid_t *stateid, int flags, struct file **filp, bool *tmp_file);
 __be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 		     stateid_t *stateid, unsigned char typemask,
 		     struct nfs4_stid **s, struct nfsd_net *nn);
-- 
1.9.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 1/5] cifs: implement clone_file_range operation [not found] ` <1448563859-21922-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org> @ 2015-11-26 18:50 ` Christoph Hellwig [not found] ` <1448563859-21922-2-git-send-email-hch-jcswGhMUV9g@public.gmane.org> 2015-11-26 18:50 ` [PATCH 5/5] nfsd: implement the NFSv4.2 CLONE operation Christoph Hellwig ` (3 subsequent siblings) 4 siblings, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2015-11-26 18:50 UTC (permalink / raw) To: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn Cc: tao.peng-7I+n7zu2hftEKMMhf/gKZA, jeff.layton-7I+n7zu2hftEKMMhf/gKZA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-btrfs-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-cifs-u79uwXL29TY76Z2rM5mHXA And drop the fake support for the btrfs CLONE ioctl - SMB2 copies are chunked and do not actually implement clone semantics! Heavily based on a previous patch from Peng Tao. Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> --- fs/cifs/cifsfs.c | 25 ++++++++++++++ fs/cifs/cifsfs.h | 4 ++- fs/cifs/ioctl.c | 103 +++++++++++++++++++++++++++++++------------------------ 3 files changed, 86 insertions(+), 46 deletions(-) diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c index cbc0f4b..ad7117a 100644 --- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -914,6 +914,23 @@ const struct inode_operations cifs_symlink_inode_ops = { #endif }; +ssize_t cifs_file_copy_range(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + size_t len, unsigned int flags) +{ + unsigned int xid; + int rc; + + if (flags) + return -EOPNOTSUPP; + + xid = get_xid(); + rc = cifs_file_clone_range(xid, file_in, file_out, pos_in, + len, pos_out, true); + free_xid(xid); + return rc < 0 ? 
rc : len; +} + const struct file_operations cifs_file_ops = { .read_iter = cifs_loose_read_iter, .write_iter = cifs_file_write_iter, @@ -926,6 +943,7 @@ const struct file_operations cifs_file_ops = { .splice_read = generic_file_splice_read, .llseek = cifs_llseek, .unlocked_ioctl = cifs_ioctl, + .copy_file_range = cifs_file_copy_range, .setlease = cifs_setlease, .fallocate = cifs_fallocate, }; @@ -942,6 +960,8 @@ const struct file_operations cifs_file_strict_ops = { .splice_read = generic_file_splice_read, .llseek = cifs_llseek, .unlocked_ioctl = cifs_ioctl, + .copy_file_range = cifs_file_copy_range, + .copy_file_range = cifs_file_copy_range, .setlease = cifs_setlease, .fallocate = cifs_fallocate, }; @@ -958,6 +978,7 @@ const struct file_operations cifs_file_direct_ops = { .mmap = cifs_file_mmap, .splice_read = generic_file_splice_read, .unlocked_ioctl = cifs_ioctl, + .copy_file_range = cifs_file_copy_range, .llseek = cifs_llseek, .setlease = cifs_setlease, .fallocate = cifs_fallocate, @@ -974,6 +995,7 @@ const struct file_operations cifs_file_nobrl_ops = { .splice_read = generic_file_splice_read, .llseek = cifs_llseek, .unlocked_ioctl = cifs_ioctl, + .copy_file_range = cifs_file_copy_range, .setlease = cifs_setlease, .fallocate = cifs_fallocate, }; @@ -989,6 +1011,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = { .splice_read = generic_file_splice_read, .llseek = cifs_llseek, .unlocked_ioctl = cifs_ioctl, + .copy_file_range = cifs_file_copy_range, .setlease = cifs_setlease, .fallocate = cifs_fallocate, }; @@ -1004,6 +1027,7 @@ const struct file_operations cifs_file_direct_nobrl_ops = { .mmap = cifs_file_mmap, .splice_read = generic_file_splice_read, .unlocked_ioctl = cifs_ioctl, + .copy_file_range = cifs_file_copy_range, .llseek = cifs_llseek, .setlease = cifs_setlease, .fallocate = cifs_fallocate, @@ -1014,6 +1038,7 @@ const struct file_operations cifs_dir_ops = { .release = cifs_closedir, .read = generic_read_dir, .unlocked_ioctl = cifs_ioctl, + 
.copy_file_range = cifs_file_copy_range, .llseek = generic_file_llseek, }; diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index c3cc160..797439b 100644 --- a/fs/cifs/cifsfs.h +++ b/fs/cifs/cifsfs.h @@ -131,7 +131,9 @@ extern int cifs_setxattr(struct dentry *, const char *, const void *, extern ssize_t cifs_getxattr(struct dentry *, const char *, void *, size_t); extern ssize_t cifs_listxattr(struct dentry *, char *, size_t); extern long cifs_ioctl(struct file *filep, unsigned int cmd, unsigned long arg); - +extern int cifs_file_clone_range(unsigned int xid, struct file *src_file, + struct file *dst_file, u64 off, u64 len, + u64 destoff, bool dup_extents); #ifdef CONFIG_CIFS_NFSD_EXPORT extern const struct export_operations cifs_export_ops; #endif /* CONFIG_CIFS_NFSD_EXPORT */ diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c index 35cf990..4f92f5c 100644 --- a/fs/cifs/ioctl.c +++ b/fs/cifs/ioctl.c @@ -34,73 +34,43 @@ #include "cifs_ioctl.h" #include <linux/btrfs.h> -static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file, - unsigned long srcfd, u64 off, u64 len, u64 destoff, - bool dup_extents) +int cifs_file_clone_range(unsigned int xid, struct file *src_file, + struct file *dst_file, u64 off, u64 len, + u64 destoff, bool dup_extents) { - int rc; - struct cifsFileInfo *smb_file_target = dst_file->private_data; + struct inode *src_inode = file_inode(src_file); struct inode *target_inode = file_inode(dst_file); - struct cifs_tcon *target_tcon; - struct fd src_file; struct cifsFileInfo *smb_file_src; - struct inode *src_inode; + struct cifsFileInfo *smb_file_target; struct cifs_tcon *src_tcon; + struct cifs_tcon *target_tcon; + int rc; cifs_dbg(FYI, "ioctl clone range\n"); - /* the destination must be opened for writing */ - if (!(dst_file->f_mode & FMODE_WRITE)) { - cifs_dbg(FYI, "file target not open for write\n"); - return -EINVAL; - } - /* check if target volume is readonly and take reference */ - rc = mnt_want_write_file(dst_file); - if (rc) { - 
cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc); - return rc; - } - - src_file = fdget(srcfd); - if (!src_file.file) { - rc = -EBADF; - goto out_drop_write; - } - - if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) { - rc = -EBADF; - cifs_dbg(VFS, "src file seems to be from a different filesystem type\n"); - goto out_fput; - } - - if ((!src_file.file->private_data) || (!dst_file->private_data)) { + if (!src_file->private_data || !dst_file->private_data) { rc = -EBADF; cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n"); - goto out_fput; + goto out; } rc = -EXDEV; smb_file_target = dst_file->private_data; - smb_file_src = src_file.file->private_data; + smb_file_src = src_file->private_data; src_tcon = tlink_tcon(smb_file_src->tlink); target_tcon = tlink_tcon(smb_file_target->tlink); /* check source and target on same server (or volume if dup_extents) */ if (dup_extents && (src_tcon != target_tcon)) { cifs_dbg(VFS, "source and target of copy not on same share\n"); - goto out_fput; + goto out; } if (!dup_extents && (src_tcon->ses != target_tcon->ses)) { cifs_dbg(VFS, "source and target of copy not on same server\n"); - goto out_fput; + goto out; } - src_inode = file_inode(src_file.file); - rc = -EINVAL; - if (S_ISDIR(src_inode->i_mode)) - goto out_fput; - /* * Note: cifs case is easier than btrfs since server responsible for * checks for proper open modes and file type and if it wants @@ -136,6 +106,52 @@ out_unlock: /* although unlocking in the reverse order from locking is not strictly necessary here it is a little cleaner to be consistent */ unlock_two_nondirectories(src_inode, target_inode); +out: + return rc; +} + +static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file, + unsigned long srcfd, u64 off, u64 len, u64 destoff, + bool dup_extents) +{ + int rc; + struct fd src_file; + struct inode *src_inode; + + cifs_dbg(FYI, "ioctl clone range\n"); + /* the destination must be opened for writing */ + if (!(dst_file->f_mode & 
FMODE_WRITE)) { + cifs_dbg(FYI, "file target not open for write\n"); + return -EINVAL; + } + + /* check if target volume is readonly and take reference */ + rc = mnt_want_write_file(dst_file); + if (rc) { + cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc); + return rc; + } + + src_file = fdget(srcfd); + if (!src_file.file) { + rc = -EBADF; + goto out_drop_write; + } + + if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) { + rc = -EBADF; + cifs_dbg(VFS, "src file seems to be from a different filesystem type\n"); + goto out_fput; + } + + src_inode = file_inode(src_file.file); + rc = -EINVAL; + if (S_ISDIR(src_inode->i_mode)) + goto out_fput; + + rc = cifs_file_clone_range(xid, src_file.file, dst_file, off, len, + destoff, dup_extents); + out_fput: fdput(src_file); out_drop_write: @@ -258,9 +274,6 @@ long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg) case CIFS_IOC_COPYCHUNK_FILE: rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, false); break; - case BTRFS_IOC_CLONE: - rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, true); - break; case CIFS_IOC_SET_INTEGRITY: if (pSMBFile == NULL) break; -- 1.9.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 1/5] cifs: implement clone_file_range operation
@ 2015-11-27 10:42 David Disseldorp
From: David Disseldorp @ 2015-11-27 10:42 UTC
To: Christoph Hellwig
Cc: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
tao.peng-7I+n7zu2hftEKMMhf/gKZA,
jeff.layton-7I+n7zu2hftEKMMhf/gKZA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-cifs-u79uwXL29TY76Z2rM5mHXA, Steve French

Hi Christoph,

On Thu, 26 Nov 2015 19:50:55 +0100, Christoph Hellwig wrote:

> And drop the fake support for the btrfs CLONE ioctl - SMB2 copies are
> chunked and do not actually implement clone semantics!

BTRFS_IOC_CLONE is implemented using the new ReFS
FSCTL_DUPLICATE_EXTENTS_TO_FILE request, which was deemed to be COW
based[1]:
"The purpose of this operation is to make it look like a copy of a
region from the source stream to the target stream has occurred when
in reality no data is actually copied. This operation modifies the
target stream’s extent list such that, the same clusters are pointed
to by both the source and target streams’ extent lists for the region
being copied."

I think that's about as close as we're going to get to clone semantics
for cifs. It's also dispatched as a single request covering the full
file - chunking only occurs for CIFS_IOC_COPYCHUNK_FILE based requests,
which are implemented using FSCTL_SRV_COPYCHUNK_WRITE, and not (always)
handled by the server as a COW clone.

It looks like there's also a minor cut 'n paste error here...

> @@ -942,6 +960,8 @@ const struct file_operations cifs_file_strict_ops = {
> .splice_read = generic_file_splice_read,
> .llseek = cifs_llseek,
> .unlocked_ioctl = cifs_ioctl,
> + .copy_file_range = cifs_file_copy_range,
> + .copy_file_range = cifs_file_copy_range,

Cheers, David

1. FSCTL_DUPLICATE_EXTENTS_TO_FILE discussion
https://lists.samba.org/archive/samba-technical/2015-February/105410.html
* Re: [PATCH 1/5] cifs: implement clone_file_range operation
@ 2015-11-30 9:02 Christoph Hellwig
From: Christoph Hellwig @ 2015-11-30 9:02 UTC
To: David Disseldorp
Cc: Christoph Hellwig, viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
tao.peng-7I+n7zu2hftEKMMhf/gKZA,
jeff.layton-7I+n7zu2hftEKMMhf/gKZA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-cifs-u79uwXL29TY76Z2rM5mHXA, Steve French

On Fri, Nov 27, 2015 at 11:42:32AM +0100, David Disseldorp wrote:
> I think that's about as close as we're going to get to clone semantics
> for cifs. It's also dispatched as a single request covering the full
> file - chunking only occurs for CIFS_IOC_COPYCHUNK_FILE based requests,
> which are implemented using FSCTL_SRV_COPYCHUNK_WRITE, and not (always)
> handled by the server as a COW clone.

Oh, I misread cifs_ioctl_clone - it does two entirely different things
based on the dup_extents parameter. It looked like it did both of them
in the dup_extents case.

Re-reading the code it seems close enough, although the client side
sampling of the file size seems a little dangerous. I'll wire it up for
clone_file_range for the next respin, but I'm still a little worried.
* [PATCH 5/5] nfsd: implement the NFSv4.2 CLONE operation [not found] ` <1448563859-21922-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org> 2015-11-26 18:50 ` [PATCH 1/5] cifs: implement clone_file_range operation Christoph Hellwig @ 2015-11-26 18:50 ` Christoph Hellwig 2015-11-30 22:56 ` vfs: move btrfs clone ioctls to common code J. Bruce Fields ` (2 subsequent siblings) 4 siblings, 0 replies; 17+ messages in thread From: Christoph Hellwig @ 2015-11-26 18:50 UTC (permalink / raw) To: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn Cc: tao.peng-7I+n7zu2hftEKMMhf/gKZA, jeff.layton-7I+n7zu2hftEKMMhf/gKZA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-btrfs-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-cifs-u79uwXL29TY76Z2rM5mHXA This is basically a remote version of the btrfs CLONE operation, so the implementation is fairly trivial. Made even more trivial by stealing the XDR code and general framework Anna Schumaker's COPY prototype. Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> --- fs/nfsd/nfs4proc.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ fs/nfsd/nfs4xdr.c | 21 +++++++++++++++++++++ fs/nfsd/vfs.c | 8 ++++++++ fs/nfsd/vfs.h | 2 ++ fs/nfsd/xdr4.h | 10 ++++++++++ include/linux/nfs4.h | 4 ++-- 6 files changed, 90 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 3ba10a3..819ad81 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1012,6 +1012,47 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, } static __be32 +nfsd4_clone(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, + struct nfsd4_clone *clone) +{ + struct file *src, *dst; + __be32 status; + + status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->save_fh, + &clone->cl_src_stateid, RD_STATE, + &src, NULL); + if (status) { + dprintk("NFSD: %s: couldn't process src stateid!\n", __func__); + goto out; + } + + status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, + 
&clone->cl_dst_stateid, WR_STATE, + &dst, NULL); + if (status) { + dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__); + goto out_put_src; + } + + /* fix up for NFS-specific error code */ + if (!S_ISREG(file_inode(src)->i_mode) || + !S_ISREG(file_inode(dst)->i_mode)) { + status = nfserr_wrong_type; + goto out_put_dst; + } + + status = nfsd4_clone_file_range(src, clone->cl_src_pos, + dst, clone->cl_dst_pos, clone->cl_count); + +out_put_dst: + fput(dst); +out_put_src: + fput(src); +out: + return status; +} + +static __be32 nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_fallocate *fallocate, int flags) { @@ -2281,6 +2322,12 @@ static struct nfsd4_operation nfsd4_ops[] = { .op_name = "OP_DEALLOCATE", .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize, }, + [OP_CLONE] = { + .op_func = (nfsd4op_func)nfsd4_clone, + .op_flags = OP_MODIFIES_SOMETHING | OP_CACHEME, + .op_name = "OP_CLONE", + .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize, + }, [OP_SEEK] = { .op_func = (nfsd4op_func)nfsd4_seek, .op_name = "OP_SEEK", diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 51c9e9c..924416f 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1675,6 +1675,25 @@ nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, } static __be32 +nfsd4_decode_clone(struct nfsd4_compoundargs *argp, struct nfsd4_clone *clone) +{ + DECODE_HEAD; + + status = nfsd4_decode_stateid(argp, &clone->cl_src_stateid); + if (status) + return status; + status = nfsd4_decode_stateid(argp, &clone->cl_dst_stateid); + if (status) + return status; + + READ_BUF(8 + 8 + 8); + p = xdr_decode_hyper(p, &clone->cl_src_pos); + p = xdr_decode_hyper(p, &clone->cl_dst_pos); + p = xdr_decode_hyper(p, &clone->cl_count); + DECODE_TAIL; +} + +static __be32 nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) { DECODE_HEAD; @@ -1785,6 +1804,7 @@ static nfsd4_dec nfsd4_dec_ops[] = { [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp, 
[OP_SEEK] = (nfsd4_dec)nfsd4_decode_seek, [OP_WRITE_SAME] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_CLONE] = (nfsd4_dec)nfsd4_decode_clone, }; static inline bool @@ -4292,6 +4312,7 @@ static nfsd4_enc nfsd4_enc_ops[] = { [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop, [OP_SEEK] = (nfsd4_enc)nfsd4_encode_seek, [OP_WRITE_SAME] = (nfsd4_enc)nfsd4_encode_noop, + [OP_CLONE] = (nfsd4_enc)nfsd4_encode_noop, }; /* diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 994d66f..5411bf0 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -36,6 +36,7 @@ #endif /* CONFIG_NFSD_V3 */ #ifdef CONFIG_NFSD_V4 +#include "../internal.h" #include "acl.h" #include "idmap.h" #endif /* CONFIG_NFSD_V4 */ @@ -498,6 +499,13 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp, } #endif +__be32 nfsd4_clone_file_range(struct file *src, u64 src_pos, struct file *dst, + u64 dst_pos, u64 count) +{ + return nfserrno(vfs_clone_file_range(src, src_pos, dst, dst_pos, + count)); +} + __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file, loff_t offset, loff_t len, int flags) diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index fcfc48c..c11ba31 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -56,6 +56,8 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *, struct xdr_netobj *); __be32 nfsd4_vfs_fallocate(struct svc_rqst *, struct svc_fh *, struct file *, loff_t, loff_t, int); +__be32 nfsd4_clone_file_range(struct file *, u64, struct file *, + u64, u64); #endif /* CONFIG_NFSD_V4 */ __be32 nfsd_create(struct svc_rqst *, struct svc_fh *, char *name, int len, struct iattr *attrs, diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index ce7362c..d955481 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -491,6 +491,15 @@ struct nfsd4_fallocate { u64 falloc_length; }; +struct nfsd4_clone { + /* request */ + stateid_t cl_src_stateid; + stateid_t cl_dst_stateid; + u64 cl_src_pos; + u64 cl_dst_pos; + u64 cl_count; +}; + struct nfsd4_seek { /* request */ 
stateid_t seek_stateid; @@ -555,6 +564,7 @@ struct nfsd4_op { /* NFSv4.2 */ struct nfsd4_fallocate allocate; struct nfsd4_fallocate deallocate; + struct nfsd4_clone clone; struct nfsd4_seek seek; } u; struct nfs4_replay * replay; diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h index e7e7853..43aeabd 100644 --- a/include/linux/nfs4.h +++ b/include/linux/nfs4.h @@ -139,10 +139,10 @@ enum nfs_opnum4 { Needs to be updated if more operations are defined in future.*/ #define FIRST_NFS4_OP OP_ACCESS -#define LAST_NFS4_OP OP_WRITE_SAME #define LAST_NFS40_OP OP_RELEASE_LOCKOWNER #define LAST_NFS41_OP OP_RECLAIM_COMPLETE -#define LAST_NFS42_OP OP_WRITE_SAME +#define LAST_NFS42_OP OP_CLONE +#define LAST_NFS4_OP LAST_NFS42_OP enum nfsstat4 { NFS4_OK = 0, -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 17+ messages in thread
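As a reading aid for the decoder in the patch above: after the two stateids, nfsd4_decode_clone() reads 8 + 8 + 8 bytes and decodes them as three 64-bit XDR values (source offset, destination offset, count). XDR integers are big-endian, so that step can be sketched in user space roughly as follows. This is only an illustration, not the kernel code; `struct clone_tail`, `get_be64()` and `decode_clone_tail()` are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of the three u64 fields in nfsd4_clone. */
struct clone_tail {
	uint64_t src_pos;
	uint64_t dst_pos;
	uint64_t count;
};

/* Read one big-endian 64-bit value, as an XDR "hyper" is laid out. */
static uint64_t get_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Decode the fixed 24-byte tail of the CLONE arguments.
 * Returns 0 on success, -1 if the buffer is too short.
 */
int decode_clone_tail(const unsigned char *buf, size_t len,
		      struct clone_tail *out)
{
	assert(buf && out);
	if (len < 8 + 8 + 8)
		return -1;
	out->src_pos = get_be64(buf);
	out->dst_pos = get_be64(buf + 8);
	out->count = get_be64(buf + 16);
	return 0;
}
```

The length check plays the role that READ_BUF(8 + 8 + 8) plays in the patch: nothing is decoded unless all 24 bytes are available.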
* Re: vfs: move btrfs clone ioctls to common code
@ 2015-11-30 22:56 J. Bruce Fields
From: J. Bruce Fields @ 2015-11-30 22:56 UTC
To: Christoph Hellwig
Cc: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
tao.peng-7I+n7zu2hftEKMMhf/gKZA,
jeff.layton-7I+n7zu2hftEKMMhf/gKZA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA,
linux-cifs-u79uwXL29TY76Z2rM5mHXA

On Thu, Nov 26, 2015 at 07:50:54PM +0100, Christoph Hellwig wrote:
> This patch set moves the existing btrfs clone ioctls that other file
> system have started to implement to common code, and allows the NFS
> server to export this functionality to remote systems.
>
> This work is based originally on my NFS CLONE prototype, which reused
> code from Anna Schumaker's NFS COPY prototype, as well as various
> updates from Peng Tao to this code.

Looks good to me. (In particular: ACK to the locks.c and nfsd patches.
But, disclaimer, I haven't tried to test clone.)

--b.

>
> The patches are also available as a git branch and on gitweb:
>
> git://git.infradead.org/users/hch/pnfs.git clone-for-viro
> http://git.infradead.org/users/hch/pnfs.git/shortlog/refs/heads/clone-for-viro
* Re: vfs: move btrfs clone ioctls to common code
@ 2015-12-01 17:09 Chris Mason
From: Chris Mason @ 2015-12-01 17:09 UTC
To: Christoph Hellwig
Cc: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
tao.peng-7I+n7zu2hftEKMMhf/gKZA,
jeff.layton-7I+n7zu2hftEKMMhf/gKZA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA,
linux-cifs-u79uwXL29TY76Z2rM5mHXA

On Thu, Nov 26, 2015 at 07:50:54PM +0100, Christoph Hellwig wrote:
> This patch set moves the existing btrfs clone ioctls that other file
> system have started to implement to common code, and allows the NFS
> server to export this functionality to remote systems.
>
> This work is based originally on my NFS CLONE prototype, which reused
> code from Anna Schumaker's NFS COPY prototype, as well as various
> updates from Peng Tao to this code.
>
> The patches are also available as a git branch and on gitweb:
>
> git://git.infradead.org/users/hch/pnfs.git clone-for-viro

Thanks Christoph, this looks fine to me.

-chris
* Re: vfs: move btrfs clone ioctls to common code
@ 2015-12-01 22:48 Steve French
From: Steve French @ 2015-12-01 22:48 UTC
To: Christoph Hellwig
Cc: Al Viro, Peng Tao, Jeffrey Layton, linux-fsdevel,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

In the new API, is there a way to distinguish between the two copy
offload behaviors?

1) FSCTL_DUPLICATE_EXTENTS (where the server file system increments a
refcount on blocks in the range), and
2) FSCTL_COPYCHUNK (where the server does a server side copy of the
requested range, but does not necessarily use reflink, although in the
case of Samba on btrfs it implements it this way). In this case NTFS
will usually make a copy of the range requested rather than linking
the ranges.

For the former, cifs uses (or used, prior to this patch) the btrfs
ioctl; for the latter it has a private ioctl (CIFS_IOC_COPYCHUNK_FILE).
For the former the files have to be on the same share (export); the
latter just requires that the files be on the same server, and in
common cases (drag and drop in the file explorer on the desktop) the
source and target files would be on different mounts to the same
server.

There is an unimplemented (in cifs.ko) whole file clone operation
(copy-on-write file with a network API similar to the hardlink) but it
looks like it is no longer supported by newer servers, perhaps because
there is more interest in the ODX mechanism for duplicating files ala
https://msdn.microsoft.com/en-us/library/windows/desktop/hh848056(v=vs.85).aspx
across server farms for managing virtualization images. I need to add
the ODX copy offload mechanism to cifs.ko but presumably it would
behave more like FSCTL_COPYCHUNK (i.e. not do a reflink of the blocks).

The performance improvement from server side copy offload is huge
whether or not reflink is done - so allowing the cp command (or common
user space commands ala robocopy, which already does this in Windows)
to do fast copy is particularly important for network file systems.

On Thu, Nov 26, 2015 at 12:50 PM, Christoph Hellwig
<hch-jcswGhMUV9g@public.gmane.org> wrote:
> This patch set moves the existing btrfs clone ioctls that other file
> system have started to implement to common code, and allows the NFS
> server to export this functionality to remote systems.
>
> This work is based originally on my NFS CLONE prototype, which reused
> code from Anna Schumaker's NFS COPY prototype, as well as various
> updates from Peng Tao to this code.
>
> The patches are also available as a git branch and on gitweb:
>
> git://git.infradead.org/users/hch/pnfs.git clone-for-viro
> http://git.infradead.org/users/hch/pnfs.git/shortlog/refs/heads/clone-for-viro

--
Thanks,

Steve
* Re: vfs: move btrfs clone ioctls to common code
@ 2015-12-02 7:27 Christoph Hellwig
From: Christoph Hellwig @ 2015-12-02 7:27 UTC
To: Steve French
Cc: Christoph Hellwig, Al Viro, Peng Tao, Jeffrey Layton, linux-fsdevel,
linux-btrfs, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org

Hi Steve,

we have two APIs in Linux:

- the copy_file_range syscall which just is a "do a copy by any means"
- the btrfs clone ioctls which have stricter semantics that very much
  expect a reflink-like operation

I plan to also wire up copy_file_range to try the clone_file_range method
first if available to make life easier for file systems, but as there isn't
any test coverage for that I don't dare to actually submit it yet. I'll
send a compile tested only RFC for it when resending this series.
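The "try a clone first, otherwise copy by any means" ordering described in the message above can be illustrated from user space. The following is a minimal sketch only, not the kernel change being discussed: it assumes a current Linux where <linux/fs.h> provides the FICLONE ioctl (a name that does not appear in this thread), and `copy_whole_file()` is a hypothetical helper that falls back to a plain pread/pwrite loop when the clone is refused.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>	/* FICLONE, assumed available on current kernels */

/*
 * Hypothetical helper: make dst_fd hold the same data as src_fd.
 * Tries a reflink-style clone first; if the file system refuses,
 * falls back to copying the bytes. Returns 0 on success, -1 on
 * error (errno set). *cloned reports which path was taken.
 */
int copy_whole_file(int src_fd, int dst_fd, int *cloned)
{
	char buf[65536];
	off_t off = 0;
	ssize_t n;

	assert(cloned);
	*cloned = 0;
#ifdef FICLONE
	/* Strict semantics: either the extents are shared, or it fails. */
	if (ioctl(dst_fd, FICLONE, src_fd) == 0) {
		*cloned = 1;
		return 0;
	}
#endif
	/* "Copy by any means": here, the slowest possible means. */
	while ((n = pread(src_fd, buf, sizeof(buf), off)) > 0) {
		ssize_t done = 0;

		while (done < n) {
			ssize_t w = pwrite(dst_fd, buf + done, n - done,
					   off + done);
			if (w < 0)
				return -1;
			done += w;
		}
		off += n;
	}
	return n < 0 ? -1 : 0;
}
```

A caller that needs real clone semantics (shared extents, no extra space used) would check `*cloned` and treat the fallback as a failure; a caller that only wants the data copied can ignore it.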
* Re: vfs: move btrfs clone ioctls to common code
@ 2015-12-02 17:40 Steve French
From: Steve French @ 2015-12-02 17:40 UTC
To: Christoph Hellwig
Cc: Al Viro, Peng Tao, Jeffrey Layton, linux-fsdevel,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, Dec 2, 2015 at 1:27 AM, Christoph Hellwig
<hch-jcswGhMUV9g@public.gmane.org> wrote:
> Hi Steve,
>
> we have two APIs in Linux:
>
> - the copy_file_range syscall which just is a "do a copy by any means"
> - the btrfs clone ioctls which have stricter semantics that very much
> expect a reflink-like operation

If copy_file_range is allowed to use any offload mechanism then
cifs.ko could be changed as follows, to fall back among the three
possible mechanisms depending on what the target supports.

- send the fastest of the three choices, the "reflink-like"
FSCTL_DUPLICATE_EXTENTS (there is a server fs capability that we check
at mount time that indicates whether it is supported). If it is not
supported, or if the source and target are on different shares
(exports), then fall back to
- send the ODX style copy offload (when implemented). This is the
only one that could in theory support cross-server copies (rather than
require copy from a source and target on the same server)
- (if the above aren't supported) send the FSCTL_COPYCHUNK (currently
called via CIFS_IOC_COPYCHUNK_FILE)

For btrfs_ioc_clone_range (or similar), FSCTL_DUPLICATE_EXTENTS could
probably stay the same since it is the only one of the three that
guarantees using reflinks.

If we want to for Linux->Samba, we could probably add a whole file
clone (similar to hardlinks on the wire) to Samba and cifs.ko if that
is useful (as opposed to the three mechanisms above which are copy
ranges).

In addition, I noticed that the cp command has added various
optimizations for sparse file enablement. I need to test those on
cifs.ko and update the ioctls for retrieving sparse ranges to make
sure that they work over SMB3 mounts, for optimizing the case where
the source file is sparse, and mostly empty.

> I plan to also wire up copy_file_range to try the clone_file_range method
> first if available to make life easier for file systems, but as there isn't
> any test coverage for that I don't dare to actually submit it yet. I'll
> send a compile tested only RFC for it when resending this series.

--
Thanks,

Steve
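The fallback order proposed in the message above can be written down as a small decision function. This is a sketch of the ordering only, under the constraints stated in the mail (duplicate extents needs the same share and the server capability, ODX is the only candidate for cross-server copies, copychunk needs the same server). Every name here (`enum copy_method`, `struct copy_ctx`, `pick_copy_method()`) is hypothetical and does not exist in cifs.ko.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three mechanisms named in the mail. */
enum copy_method {
	COPY_DUPLICATE_EXTENTS,	/* reflink-like, same share only */
	COPY_ODX,		/* not implemented in cifs.ko at the time */
	COPY_COPYCHUNK,		/* server side copy, same server only */
	COPY_NONE,		/* no offload possible */
};

/* Hypothetical summary of what the client knows before sending anything. */
struct copy_ctx {
	bool same_server;
	bool same_share;		/* implies same_server */
	bool server_has_dup_extents;	/* capability checked at mount time */
	bool odx_implemented;
};

/* Pick the fastest mechanism the source/target pair allows. */
enum copy_method pick_copy_method(const struct copy_ctx *c)
{
	assert(c);
	if (c->same_share && c->server_has_dup_extents)
		return COPY_DUPLICATE_EXTENTS;
	if (c->odx_implemented)
		return COPY_ODX;
	if (c->same_server)
		return COPY_COPYCHUNK;
	return COPY_NONE;
}
```

A clone-style entry point (the stricter of the two APIs) would accept only COPY_DUPLICATE_EXTENTS from such a function and return an error for anything else, since that is the only mechanism described as guaranteeing reflinks.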
* Re: vfs: move btrfs clone ioctls to common code
@ 2015-12-03 10:30 Christoph Hellwig
From: Christoph Hellwig @ 2015-12-03 10:30 UTC
To: Steve French
Cc: Christoph Hellwig, Al Viro, Peng Tao, Jeffrey Layton, linux-fsdevel,
linux-btrfs, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org

On Wed, Dec 02, 2015 at 11:40:13AM -0600, Steve French wrote:
> If the copy_file_range is allowed to use any offload mechanism then
> cifs.ko could be changed as follows, to fallback among the three
> possible mechanisms depending on what the target supports.

How reliable are the fallbacks? E.g. for clones we usually have alignment
restrictions that we'd need to communicate back, and cifs currently
doesn't have client side checks for those.
* Re: vfs: move btrfs clone ioctls to common code
@ 2015-12-03 19:28 Steve French
From: Steve French @ 2015-12-03 19:28 UTC
To: Christoph Hellwig
Cc: Al Viro, Peng Tao, Jeffrey Layton, linux-fsdevel,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Thu, Dec 3, 2015 at 4:30 AM, Christoph Hellwig
<hch-jcswGhMUV9g@public.gmane.org> wrote:
> On Wed, Dec 02, 2015 at 11:40:13AM -0600, Steve French wrote:
>> If the copy_file_range is allowed to use any offload mechanism then
>> cifs.ko could be changed as follows, to fallback among the three
>> possible mechanisms depending on what the target supports.
>
> How reliable are the fallbacks? E.g. for clones we usually have alignment
> restrictions that we'd need to communicate back, and cifs currently
> doesn't have client side checks for those.

I am not worried about fallback inconsistency for the current two
options: if block refcounting is not supported we will know before we
issue the request, and the fallback copy chunk has few restrictions.
When we add ODX there may be additional alignment restrictions, but we
won't know until we investigate more.

Although we can query alignment over CIFS and SMB3, it is less
important to know over a network file system than a block device, and
unlikely to be a restriction. Although the protocol does not restrict
the maximum chunk size, the server can return an error indicating the
maximum supported chunk size, allowing the client to retry with the
size of chunks the server requests. To match existing server behavior
with reasonable defaults for common servers, the cifs client uses 16
chunks of 1MB each for each FSCTL_SRV_COPYCHUNK_WRITE request sent on
the wire.

--
Thanks,

Steve
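To make the chunking numbers in the message above concrete: with 16 chunks of 1MB per request, each request covers at most 16MB, so a 100MB range needs 7 requests. The sketch below shows that arithmetic and how one request's chunk list could be laid out, including a retry with a smaller chunk size if the server reports one. It is an illustration only; `struct copy_chunk`, `build_request()` and `requests_needed()` are hypothetical names, not cifs.ko code or the SMB wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_CHUNK_SIZE (1024u * 1024u)	/* 1MB per chunk, per the mail */
#define CHUNKS_PER_REQUEST 16u			/* 16 chunks per request, per the mail */

/* Hypothetical description of one chunk inside a copychunk request. */
struct copy_chunk {
	uint64_t src_off;
	uint64_t dst_off;
	uint32_t len;
};

/*
 * Fill at most max_chunks entries covering the start of the range.
 * Returns the number of chunks used and stores the bytes covered in
 * *covered, so the caller can issue the next request from there.
 */
size_t build_request(uint64_t src_off, uint64_t dst_off, uint64_t len,
		     uint32_t chunk_size, struct copy_chunk *out,
		     size_t max_chunks, uint64_t *covered)
{
	uint64_t done = 0;
	size_t n = 0;

	assert(chunk_size > 0 && out && covered);
	while (done < len && n < max_chunks) {
		uint64_t left = len - done;
		uint32_t this_len = left < chunk_size ? (uint32_t)left
						      : chunk_size;

		out[n].src_off = src_off + done;
		out[n].dst_off = dst_off + done;
		out[n].len = this_len;
		done += this_len;
		n++;
	}
	*covered = done;
	return n;
}

/* Number of requests needed for len bytes (rounded up). */
uint64_t requests_needed(uint64_t len, uint32_t chunk_size,
			 uint32_t chunks_per_req)
{
	uint64_t per_req = (uint64_t)chunk_size * chunks_per_req;

	assert(per_req > 0);
	return len / per_req + (len % per_req != 0);
}
```

If the server rejects a request and reports a smaller maximum chunk size, the same `build_request()` call can be repeated with that value as `chunk_size`; the range covered per request shrinks accordingly.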