* [PATCH] new flag COPY_FILE_RANGE_FILESIZE for copy_file_range()
@ 2019-04-13 20:49 Shawn Landden
  2019-04-13 21:48 ` Andy Lutomirski
  0 siblings, 1 reply; 5+ messages in thread
From: Shawn Landden @ 2019-04-13 20:49 UTC (permalink / raw)
  Cc: linux-api, linux-fsdevel, linux-kernel, Shawn Landden, linux-api
If flags includes COPY_FILE_RANGE_FILESIZE then the length
copied is the length of the file. off_in and off_out are
ignored. len must be 0 or the file size.
This implementation saves a call to stat() in the common case
of copying files. It does not fix any race conditions, but that
is possible in the future with this interface.
EAGAIN: If COPY_FILE_RANGE_FILESIZE was passed and len is not 0
or the file size.
Signed-off-by: Shawn Landden <shawn@git.icu>
CC: <linux-api@vger.kernel.org>
---
 fs/read_write.c           | 14 +++++++++++++-
 include/uapi/linux/stat.h |  4 ++++
 2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 61b43ad7608e..6d06361f0856 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1557,7 +1557,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	struct inode *inode_out = file_inode(file_out);
 	ssize_t ret;
 
-	if (flags != 0)
+	if ((flags & ~COPY_FILE_RANGE_FILESIZE) != 0)
 		return -EINVAL;
 
 	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
@@ -1565,6 +1565,18 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
 		return -EINVAL;
 
+	if (flags & COPY_FILE_RANGE_FILESIZE) {
+		struct kstat stat;
+		int error;
+		error = vfs_getattr(&file_in->f_path, &stat,
+				    STATX_SIZE, 0);
+		if (error < 0)
+			return error;
+		if (!(len == 0 || len == stat.size))
+			return -EAGAIN;
+		len = stat.size;
+	}
+
 	ret = rw_verify_area(READ, file_in, &pos_in, len);
 	if (unlikely(ret))
 		return ret;
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 7b35e98d3c58..1075aa4666ef 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -170,5 +170,9 @@ struct statx {
 
 #define STATX_ATTR_AUTOMOUNT		0x00001000 /* Dir: Automount trigger */
 
+/*
+ * Flags for copy_file_range()
+ */
+#define COPY_FILE_RANGE_FILESIZE	0x00000001 /* Copy the full length of the input file */
 
 #endif /* _UAPI_LINUX_STAT_H */
-- 
2.20.1
* [PATCH] new flag COPY_FILE_RANGE_FILESIZE for copy_file_range()
@ 2019-04-13 20:54 Shawn Landden
  2019-04-14  1:02 ` Darrick J. Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Shawn Landden @ 2019-04-13 20:54 UTC (permalink / raw)
  Cc: linux-api, linux-fsdevel, linux-kernel, Shawn Landden
If flags includes COPY_FILE_RANGE_FILESIZE then the length
copied is the length of the file. off_in and off_out are
ignored. len must be 0 or the file size.
This implementation saves a call to stat() in the common case
of copying files. It does not fix any race conditions, but that
is possible in the future with this interface.
EAGAIN: If COPY_FILE_RANGE_FILESIZE was passed and len is not 0
or the file size.
Signed-off-by: Shawn Landden <shawn@git.icu>
CC: <linux-api@vger.kernel.org>
---
 fs/read_write.c           | 14 +++++++++++++-
 include/uapi/linux/stat.h |  4 ++++
 2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 61b43ad7608e..6d06361f0856 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1557,7 +1557,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	struct inode *inode_out = file_inode(file_out);
 	ssize_t ret;
 
-	if (flags != 0)
+	if ((flags & ~COPY_FILE_RANGE_FILESIZE) != 0)
 		return -EINVAL;
 
 	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
@@ -1565,6 +1565,18 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
 		return -EINVAL;
 
+	if (flags & COPY_FILE_RANGE_FILESIZE) {
+		struct kstat stat;
+		int error;
+		error = vfs_getattr(&file_in->f_path, &stat,
+				    STATX_SIZE, 0);
+		if (error < 0)
+			return error;
+		if (!(len == 0 || len == stat.size))
+			return -EAGAIN;
+		len = stat.size;
+	}
+
 	ret = rw_verify_area(READ, file_in, &pos_in, len);
 	if (unlikely(ret))
 		return ret;
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 7b35e98d3c58..1075aa4666ef 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -170,5 +170,9 @@ struct statx {
 
 #define STATX_ATTR_AUTOMOUNT		0x00001000 /* Dir: Automount trigger */
 
+/*
+ * Flags for copy_file_range()
+ */
+#define COPY_FILE_RANGE_FILESIZE	0x00000001 /* Copy the full length of the input file */
 
 #endif /* _UAPI_LINUX_STAT_H */
-- 
2.20.1
* Re: [PATCH] new flag COPY_FILE_RANGE_FILESIZE for copy_file_range()
  2019-04-13 20:49 [PATCH] new flag COPY_FILE_RANGE_FILESIZE for copy_file_range() Shawn Landden
@ 2019-04-13 21:48 ` Andy Lutomirski
  0 siblings, 0 replies; 5+ messages in thread
From: Andy Lutomirski @ 2019-04-13 21:48 UTC (permalink / raw)
  To: Shawn Landden; +Cc: linux-api, linux-fsdevel, linux-kernel, linux-api
> On Apr 13, 2019, at 1:49 PM, Shawn Landden <shawn@git.icu> wrote:
> 
> If flags includes COPY_FILE_RANGE_FILESIZE then the length
> copied is the length of the file. off_in and off_out are
> ignored. len must be 0 or the file size.
> 
> This implementation saves a call to stat() in the common case
> of copying files. It does not fix any race conditions, but that
> is possible in the future with this interface.
> 
> EAGAIN: If COPY_FILE_RANGE_FILESIZE was passed and len is not 0
> or the file size.
I think you’re asking for trouble here. I assume you have some kind of race prevention in mind here.  The trouble is that passing zero means copy the whole thing, but the size-checking behavior is only available for nonzero sizes. This means that anyone who passes their idea of the size needs to account for inconsistent behavior if the size is zero.
Also, what happens if the file size changes mid copy?  I assume the result is more or less arbitrary, but you should document what behavior is allowed.  The docs should cover the case where you race against an O_APPEND write.
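The inconsistency can be made concrete with a small userspace model of the check the patch adds. This is only a sketch: cfr_filesize_len() is a hypothetical helper that mirrors the quoted condition, not kernel code, and it assumes the file size is sampled once, as vfs_getattr() does in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Userspace model (hypothetical helper) of the length check in the patch.
 * Returns the number of bytes that would be copied, or -EAGAIN when the
 * caller's idea of the size does not match the sampled file size.
 */
int64_t cfr_filesize_len(uint64_t len, int64_t file_size)
{
	assert(file_size >= 0);

	/* Mirrors: if (!(len == 0 || len == stat.size)) return -EAGAIN; */
	if (!(len == 0 || len == (uint64_t)file_size))
		return -EAGAIN;

	return file_size;
}
```

A caller that passes its own idea of the size gets -EAGAIN when the file has changed, except when that idea happens to be 0: then len == 0 is read as "copy everything" and the check is skipped.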
> 
> Signed-off-by: Shawn Landden <shawn@git.icu>
> CC: <linux-api@vger.kernel.org>
> ---
> fs/read_write.c           | 14 +++++++++++++-
> include/uapi/linux/stat.h |  4 ++++
> 2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 61b43ad7608e..6d06361f0856 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1557,7 +1557,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>    struct inode *inode_out = file_inode(file_out);
>    ssize_t ret;
> 
> -    if (flags != 0)
> +    if ((flags & ~COPY_FILE_RANGE_FILESIZE) != 0)
>        return -EINVAL;
> 
>    if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> @@ -1565,6 +1565,18 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>    if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
>        return -EINVAL;
> 
> +    if (flags & COPY_FILE_RANGE_FILESIZE) {
> +        struct kstat stat;
> +        int error;
> +        error = vfs_getattr(&file_in->f_path, &stat,
> +                    STATX_SIZE, 0);
> +        if (error < 0)
> +            return error;
> +        if (!(len == 0 || len == stat.size))
> +            return -EAGAIN;
> +        len = stat.size;
> +    }
> +
>    ret = rw_verify_area(READ, file_in, &pos_in, len);
>    if (unlikely(ret))
>        return ret;
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 7b35e98d3c58..1075aa4666ef 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -170,5 +170,9 @@ struct statx {
> 
> #define STATX_ATTR_AUTOMOUNT        0x00001000 /* Dir: Automount trigger */
> 
> +/*
> + * Flags for copy_file_range()
> + */
> +#define COPY_FILE_RANGE_FILESIZE    0x00000001 /* Copy the full length of the input file */
> 
> #endif /* _UAPI_LINUX_STAT_H */
> -- 
> 2.20.1
> 
* Re: [PATCH] new flag COPY_FILE_RANGE_FILESIZE for copy_file_range()
  2019-04-13 20:54 Shawn Landden
@ 2019-04-14  1:02 ` Darrick J. Wong
  2019-04-14  7:10   ` Amir Goldstein
  0 siblings, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2019-04-14  1:02 UTC (permalink / raw)
  To: Shawn Landden; +Cc: linux-api, linux-fsdevel, linux-kernel
On Sat, Apr 13, 2019 at 03:54:39PM -0500, Shawn Landden wrote:
/me pulls out his close-reading glasses and the copy_file_range manpage...
> If flags includes COPY_FILE_RANGE_FILESIZE then the length
> copied is the length of the file. off_in and off_out are
> ignored.  len must be 0 or the file size.
They're ignored?  As in the copy operation reads the number of bytes in
the file referenced by fd_in from fd_in at its current position and
writes that out to fd_out at its current position?  I don't see why I
would want such an operation...
...but I can see how people could make use of a CFR_ENTIRE_FILE that
would check that both file descriptors are actually regular files, and
if so copy the entire contents of the fd_in file into the same position
in the fd_out file, and then set the fd_out file's length to match.  If
@off_in or @off_out are non-NULL then they'll be updated to the new EOFs
if the copy completes successfully and @len can be anything.
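For illustration, the semantics described above can be approximated in userspace today. The sketch below is an assumption-laden emulation, not the proposed kernel interface: cfr() and cfr_entire_file_emulated() are hypothetical names, the copy always starts at offset 0 of both files, and the system call is reached through syscall(2) so the example does not depend on a glibc wrapper.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>	/* O_* flags, for callers that open the descriptors */
#include <stdint.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper around the raw system call (hypothetical helper name). */
static ssize_t cfr(int fd_in, int64_t *off_in, int fd_out, int64_t *off_out,
		   size_t len, unsigned int flags)
{
	return syscall(SYS_copy_file_range, fd_in, off_in, fd_out, off_out,
		       len, flags);
}

/*
 * Emulates the CFR_ENTIRE_FILE idea: both descriptors must be regular
 * files, the whole input is copied to the same position in the output,
 * and the output length is then set to match.
 * Returns 0 on success, -1 with errno set on failure.
 */
int cfr_entire_file_emulated(int fd_in, int fd_out)
{
	struct stat st_in, st_out;
	int64_t off_in = 0, off_out = 0;
	ssize_t n;

	if (fstat(fd_in, &st_in) < 0 || fstat(fd_out, &st_out) < 0)
		return -1;
	if (!S_ISREG(st_in.st_mode) || !S_ISREG(st_out.st_mode)) {
		errno = EINVAL;
		return -1;
	}

	/* The kernel may copy less than requested, so loop until EOF (0). */
	do {
		n = cfr(fd_in, &off_in, fd_out, &off_out, (size_t)1 << 30, 0);
	} while (n > 0);
	if (n < 0)
		return -1;

	/* Both offsets started at 0 and advance by the same amount. */
	assert(off_in == off_out);

	/* Set the output length to match what was copied. */
	return ftruncate(fd_out, (off_t)off_out);
}
```

Unlike a single in-kernel operation, this emulation is not atomic: the input file can still change between calls.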
Also: please update the manual page and the xfstests regression test for
this syscall.
> This implementation saves a call to stat() in the common case
> of copying files. It does not fix any race conditions, but that
> is possible in the future with this interface.
> 
> EAGAIN: If COPY_FILE_RANGE_FILESIZE was passed and len is not 0
> or the file size.
The values are invalid, so why would we tell userspace to try again
instead of the EINVAL that we usually use?
> Signed-off-by: Shawn Landden <shawn@git.icu>
> CC: <linux-api@vger.kernel.org>
> ---
>  fs/read_write.c           | 14 +++++++++++++-
>  include/uapi/linux/stat.h |  4 ++++
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 61b43ad7608e..6d06361f0856 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1557,7 +1557,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	struct inode *inode_out = file_inode(file_out);
>  	ssize_t ret;
>  
> -	if (flags != 0)
> +	if ((flags & ~COPY_FILE_RANGE_FILESIZE) != 0)
FWIW you might as well shorten the prefix to "CFR_" since nobody else is
using it.
--D
>  		return -EINVAL;
>  
>  	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> @@ -1565,6 +1565,18 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
>  		return -EINVAL;
>  
> +	if (flags & COPY_FILE_RANGE_FILESIZE) {
> +		struct kstat stat;
> +		int error;
> +		error = vfs_getattr(&file_in->f_path, &stat,
> +				    STATX_SIZE, 0);
> +		if (error < 0)
> +			return error;
> +		if (!(len == 0 || len == stat.size))
> +			return -EAGAIN;
> +		len = stat.size;
> +	}
> +
>  	ret = rw_verify_area(READ, file_in, &pos_in, len);
>  	if (unlikely(ret))
>  		return ret;
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 7b35e98d3c58..1075aa4666ef 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -170,5 +170,9 @@ struct statx {
>  
>  #define STATX_ATTR_AUTOMOUNT		0x00001000 /* Dir: Automount trigger */
>  
> +/*
> + * Flags for copy_file_range()
> + */
> +#define COPY_FILE_RANGE_FILESIZE	0x00000001 /* Copy the full length of the input file */
>  
>  #endif /* _UAPI_LINUX_STAT_H */
> -- 
> 2.20.1
> 
* Re: [PATCH] new flag COPY_FILE_RANGE_FILESIZE for copy_file_range()
  2019-04-14  1:02 ` Darrick J. Wong
@ 2019-04-14  7:10   ` Amir Goldstein
  0 siblings, 0 replies; 5+ messages in thread
From: Amir Goldstein @ 2019-04-14  7:10 UTC (permalink / raw)
  To: Shawn Landden; +Cc: linux-api, linux-fsdevel, linux-kernel, Darrick J. Wong
On Sun, Apr 14, 2019 at 4:04 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Sat, Apr 13, 2019 at 03:54:39PM -0500, Shawn Landden wrote:
>
> /me pulls out his close-reading glasses and the copy_file_range manpage...
>
> > If flags includes COPY_FILE_RANGE_FILESIZE then the length
> > copied is the length of the file. off_in and off_out are
> > ignored.  len must be 0 or the file size.
>
> They're ignored?  As in the copy operation reads the number of bytes in
> the file referenced by fd_in from fd_in at its current position and is
> writes that out to fd_out at its current position?  I don't see why I
> would want such an operation...
>
> ...but I can see how people could make use of a CFR_ENTIRE_FILE that
> would check that both file descriptors are actually regular files, and
> if so copy the entire contents of the fd_in file into the same position
> in the fd_out file, and then set the fd_out file's length to match.  If
> @off_in or @off_out are non-NULL then they'll be updated to the new EOFs
> if the copy completes succesfully and @len can be anything.
>
IDGI. In what way would that be helpful?
Would the syscall fail if it cannot copy the entire file (like clone_file_range)
or return the number of bytes copied?
If the latter, then the user will have to call the syscall again until it
returns 0.
The user can already call copy_file_range with len=SSIZE_MAX and get almost
the same thing.
Unless the idea is to optimize for fewer syscalls when copying very large files?
In that case, the MAX_RW_COUNT limit for this syscall would need to be relaxed.
While on the subject, something that has been discussed in the past is that
copy_file_range() and sendfile() of a large file are not killable, so that is
something that should be fixed, especially if the interface is going to be
used to copy more data in-kernel.
IOW, the motivation of the patch is not clear to me:
> This implementation saves a call to stat() in the common case
What is the real-life workload where this micro-optimization would
have any effect?
> It does not fix any race conditions, but that is possible in the future
> with this interface.
Then please present a plan or an implementation of how that interface
can solve race conditions. If that is the only motivation for the
interface, then I do not see why we should merge the interface before
the implementation.
Please let me know if I am missing something.
Thanks,
Amir.