From: "Darrick J. Wong" <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: "Michael Kerrisk (man-pages)"
<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Anna Schumaker
<Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
zab-ugsP4Wv/S6ZeoWH0uzbU5w@public.gmane.org,
viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org,
clm-b10kYP2dOMg@public.gmane.org,
andros-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
Subject: Re: [PATCH v2 10/9] copy_file_range.2: New page documenting copy_file_range()
Date: Mon, 14 Sep 2015 11:32:23 -0700 [thread overview]
Message-ID: <20150914183223.GA28469@birch.djwong.org> (raw)
In-Reply-To: <55F52ABA.9070908-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
On Sun, Sep 13, 2015 at 09:50:18AM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Anna,
>
> On 09/11/2015 10:30 PM, Anna Schumaker wrote:
> > copy_file_range() is a new system call for copying ranges of data
> > completely in the kernel. This gives filesystems an opportunity to
> > implement some kind of "copy acceleration", such as reflinks or
> > server-side-copy (in the case of NFS).
> >
> > Signed-off-by: Anna Schumaker <Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
>
> Thanks for writing such a nice first draft man page! I have a few
> comments below. Would you be willing to revise and resubmit?
>
> > ---
> > man2/copy_file_range.2 | 188 +++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 188 insertions(+)
> > create mode 100644 man2/copy_file_range.2
> >
> > diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> > new file mode 100644
> > index 0000000..84912b5
> > --- /dev/null
> > +++ b/man2/copy_file_range.2
> > @@ -0,0 +1,188 @@
> > +.\"This manpage is Copyright (C) 2015 Anna Schumaker <Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
>
> We need a license for this page. Please see
> https://www.kernel.org/doc/man-pages/licenses.html
>
> > +.TH COPY 2 2015-8-31 "Linux" "Linux Programmer's Manual"
>
> Make the month 2 digits (leading 0).
>
> > +.SH NAME
> > +copy_file_range \- Copy a range of data from one file to another
> > +.SH SYNOPSIS
> > +.nf
> > +.B #include <linux/copy.h>
> > +.B #include <sys/syscall.h>
> > +.B #include <unistd.h>
> > +
> > +.BI "ssize_t syscall(__NR_copy_file_range, int " fd_in ", loff_t * " off_in ",
> > +.BI " int " fd_out ", loff_t * " off_out ", size_t " len ",
> > +.BI " unsigned int " flags );
>
> Remove spaces following "*" in the lines above. (man-pages convention for code)
>
> I know that the copy_file_range() (obviously) doesn't yet have a wrapper
> in glibc, but in the man pages we document all system calls as though there
> is a wrapper. See, for example, gettid(2), for an axample of how this is done
> (a note in the SYNOPSIS and then some further details in NOTES).
>
> > +.fi
> > +.SH DESCRIPTION
> > +The
> > +.BR copy_file_range ()
> > +system call performs an in-kernel copy between two file descriptors
> > +without all that tedious mucking about in userspace.
>
> I'd write that last piece as:
>
> "without the cost of (a loop) transferring data from the kernel to a
> user-space buffer and then back to the kernel again.
>
> > +It copies up to
> > +.I len
> > +bytes of data from file descriptor
> > +.I fd_in
> > +to file descriptor
> > +.IR fd_out ,
> > +overwriting any data that exists within the requested range.
>
> s/.$/ of the target file./
>
> > +
> > +The following semantics apply for
> > +.IR off_in ,
> > +and similar statements apply to
> > +.IR off_out :
> > +.IP * 3
> > +If
> > +.I off_in
> > +is NULL, then bytes are read from
> > +.I fd_in
> > +starting from the current file offset and the current
> > +file offset is adjusted appropriately.
> > +.IP *
> > +If
> > +.I off_in
> > +is not NULL, then
> > +.I off_in
> > +must point to a buffer that specifies the starting
> > +offset where bytes from
> > +.I fd_in
> > +will be read. The current file offset of
> > +.I fd_in
> > +is not changed, but
> > +.I off_in
> > +is adjusted appropriately.
> > +.PP
> > +
> > +The
> > +.I flags
> > +argument is a bit mask composed by OR-ing together zero
> > +or more of the following flags:
> > +.TP 1.9i
> > +.B COPY_FR_COPY
> > +Copy all the file data in the requested range.
> > +Some filesystems, like NFS, might be able to accelerate this copy
> > +to avoid unnecessary data transfers.
> > +.TP
> > +.B COPY_FR_REFLINK
> > +Create a lightweight "reflink", where data is not copied until
> > +one of the files is modified.
> > +.PP
> > +The default behavior
> > +.RI ( flags
> > +== 0) is to try creating a reflink,
> > +and if reflinking fails
> > +.BR copy_file_range ()
> > +will fall back on performing a full data copy.
>
> s/back on/back to/
>
> > +This is equivalent to setting
> > +.I flags
> > +equal to
> > +.RB ( COPY_FR_COPY | COPY_FR_REFLINK ).
>
> So, from an API deign perspective, the interoperation of these two
> flags appears odd. Bit flags are commonly (not always) orthogonal.
> I think here it could be pointed out a little more explicitly that
> these two flags are not orthogonal. In particular, perhaps after the
> last sentence, you could add another sentence:
>
> [[
> (This contrasts with specifying
> .I flags
> as just
> .BR COPY_FR_REFLINK ,
> which causes the call to create a reflink,
> and fail if that is not possible,
> rather than falling back to a full data copy.)
> ]]
>
> Furthermore, I even wonder if explicitly specifying flags as
> COPY_FR_COPY | COPY_FR_REFLINK should just generate an EINVAL
> error. 0 already gives us the behavior described above,
> and allowing the combination COPY_FR_COPY | COPY_FR_REFLINK
> perhaps just contributes to misleading the user that these
> flags are orthogonal, when in reality they are not. What do
> you think?
Personally, I think it's a little weird that one turns on reflink with a flag;
turns on regular copy with a different flag; and turns on both by not
specifying either flag. :)
> What are the semantics with respect to signals, especially if data
> copying a very large file range? This info should be included in the
> man page, probably under NOTES.
>
> > +.SH RETURN VALUE
> > +Upon successful completion,
> > +.BR copy_file_range ()
> > +will return the number of bytes copied between files.
> > +This could be less than the length originally requested.
> > +
> > +On error,
> > +.BR copy_file_range ()
> > +returns \-1 and
> > +.I errno
> > +is set to indicate the error.
> > +.SH ERRORS
> > +.TP
> > +.B EBADF
> > +One or more file descriptors are not valid,
> > +or do not have proper read-write mode;
>
> I think that last line can go. I mean, isn't this point (again)
> covered in the next few lines?
Or change the ';' to a ':'?
> > +.I fd_in
> > +is not open for reading; or
> > +.I fd_out
> > +is not open for writing.
> > +.TP
> > +.B EINVAL
> > +Requested range extends beyond the end of the file; or the
>
> s/file/source file/ (right?)
>
> > +.I flags
> > +argument is set to an invalid value.
> > +.TP
> > +.B EIO
> > +A low level I/O error occurred while copying.
> > +.TP
> > +.B ENOMEM
> > +Out of memory.
> > +.TP
> > +.B ENOSPC
> > +There is not enough space to complete the copy.
>
> Space where? On the filesystem?
> => s/space/space on the target filesystem/
>
> > +.TP
> > +.B EOPNOTSUPP
> > +.B COPY_REFLINK
> > +was specified in
> > +.IR flags ,
> > +but the target filesystem does not support reflinks.
> > +.TP
> > +.B EXDEV
> > +Target filesystem doesn't support cross-filesystem copies.
>
> I'm curious. What are some examples of filesystems that produce this
> error?
btrfs and xfs (and probably most local filesystems) don't support cross-fs
copies.
--D
>
> > +.SH VERSIONS
> > +The
> > +.BR copy_file_range ()
> > +system call first appeared in Linux 4.4.
> > +.SH CONFORMING TO
> > +The
> > +.BR copy_file_range ()
> > +system call is a nonstandard Linux extension.
> > +.SH EXAMPLE
> > +.nf
> > +
> > +#define _GNU_SOURCE
> > +#include <fcntl.h>
> > +#include <linux/copy.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <sys/stat.h>
> > +#include <sys/syscall.h>
> > +#include <unistd.h>
> > +
> > +
> > +int main(int argc, char **argv)
> > +{
> > + int fd_in, fd_out;
> > + struct stat stat;
> > + loff_t len, ret;
> > +
> > + if (argc != 3) {
> > + fprintf(stderr, "Usage: %s <source> <destination>\\n", argv[0]);
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + fd_in = open(argv[1], O_RDONLY);
> > + if (fd_in == -1) {
>
> Please replace all "-" in code by "\-". (This is a groff
> detail.)
>
> > + perror("open (argv[1])");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + if (fstat(fd_in, &stat) == -1) {
> > + perror("fstat");
> > + exit(EXIT_FAILURE);
> > + }
> > + len = stat.st_size;
> > +
> > + fd_out = creat(argv[2], 0644);
>
> These days, I think we should discourage the use of creat(). Maybe
> better to use open() instead here?
>
> > + if (fd_out == -1) {
> > + perror("creat (argv[2])");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + do {
> > + ret = syscall(__NR_copy_file_range, fd_in, NULL,
> > + fd_out, NULL, len, 0);
>
> I'd rather see this as:
>
> int
> copy_file_range(int fd_in, loff_t *off_in,
> int fd_out, loff_t *off_out, size_t len,
> unsigned int flags)
> {
> return(syscall(__NR_copy_file_range, fd_in, fd_out, off_out, len, flags);
> }
>
> ...
>
> copy_file_range(fd_in, fd_out, off_out, len, flags);
>
>
> > + if (ret == -1) {
> > + perror("copy_file_range");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + len -= ret;
> > + } while (len > 0);
> > +
> > + close(fd_in);
> > + close(fd_out);
> > + exit(EXIT_SUCCESS);
> > +}
> > +.fi
> > +.SH SEE ALSO
> > +.BR splice (2)
> >
>
> In the next iteration of this patch, could you include a change to
> splice(2) so that copy_file_range(2) is added under SEE ALSO in
> that page. Also, are there any other pages that we should like
> to/from? (sendfile(2)?)
>
> Thanks,
>
> Michael
>
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2015-09-14 18:32 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-11 20:30 [PATCH v2 0/9] VFS: In-kernel copy system call Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 1/9] vfs: add copy_file_range syscall and vfs helper Anna Schumaker
[not found] ` <1442003423-6884-2-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
2015-09-22 11:44 ` David Sterba
[not found] ` <20150922114404.GF8891-1ReQVI26iDCaZKY3DrU6dA@public.gmane.org>
2015-09-22 18:27 ` Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 2/9] x86: add sys_copy_file_range to syscall tables Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 3/9] btrfs: add .copy_file_range file operation Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 4/9] vfs: Copy should check len after file open mode Anna Schumaker
[not found] ` <1442003423-6884-5-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
2015-09-22 11:47 ` David Sterba
[not found] ` <1442003423-6884-1-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
2015-09-11 20:30 ` [PATCH v2 5/9] vfs: Copy shouldn't forbid ranges inside the same file Anna Schumaker
2015-09-22 11:48 ` David Sterba
2015-09-11 20:30 ` [PATCH v2 8/9] vfs: copy_file_range() can do a pagecache copy with splice Anna Schumaker
2015-09-15 3:32 ` Darrick J. Wong
[not found] ` <20150915033217.GG10391-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2015-09-15 15:58 ` Anna Schumaker
[not found] ` <55F8400C.3000402-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
2015-09-15 16:38 ` Darrick J. Wong
[not found] ` <20150915163829.GA12658-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2015-09-15 17:01 ` Austin S Hemmelgarn
2015-09-11 20:30 ` [PATCH v2 6/9] vfs: Copy should use file_out rather than file_in Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 7/9] vfs: Remove copy_file_range mountpoint checks Anna Schumaker
2015-09-22 11:52 ` David Sterba
2015-09-11 20:30 ` [PATCH v2 9/9] btrfs: btrfs_copy_file_range() only supports reflinks Anna Schumaker
2015-09-22 11:56 ` David Sterba
2015-09-11 20:30 ` [PATCH v2 10/9] copy_file_range.2: New page documenting copy_file_range() Anna Schumaker
2015-09-13 7:50 ` Michael Kerrisk (man-pages)
[not found] ` <55F52ABA.9070908-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-09-14 18:32 ` Darrick J. Wong [this message]
[not found] ` <20150914183223.GA28469-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2015-09-22 20:10 ` Anna Schumaker
2015-09-22 20:30 ` Pádraig Brady
2015-09-28 17:23 ` Darrick J. Wong
2015-09-14 19:02 ` Austin S Hemmelgarn
2015-09-22 20:30 ` Anna Schumaker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150914183223.GA28469@birch.djwong.org \
--to=darrick.wong-qhclzuegtsvqt0dzr+alfa@public.gmane.org \
--cc=Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org \
--cc=andros-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org \
--cc=clm-b10kYP2dOMg@public.gmane.org \
--cc=hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
--cc=linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org \
--cc=zab-ugsP4Wv/S6ZeoWH0uzbU5w@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).