linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: "Michael Kerrisk (man-pages)"
	<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Anna Schumaker
	<Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	zab-ugsP4Wv/S6ZeoWH0uzbU5w@public.gmane.org,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org,
	clm-b10kYP2dOMg@public.gmane.org,
	andros-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org,
	hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
Subject: Re: [PATCH v2 10/9] copy_file_range.2: New page documenting copy_file_range()
Date: Mon, 14 Sep 2015 11:32:23 -0700	[thread overview]
Message-ID: <20150914183223.GA28469@birch.djwong.org> (raw)
In-Reply-To: <55F52ABA.9070908-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Sun, Sep 13, 2015 at 09:50:18AM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Anna,
> 
> On 09/11/2015 10:30 PM, Anna Schumaker wrote:
> > copy_file_range() is a new system call for copying ranges of data
> > completely in the kernel.  This gives filesystems an opportunity to
> > implement some kind of "copy acceleration", such as reflinks or
> > server-side-copy (in the case of NFS).
> > 
> > Signed-off-by: Anna Schumaker <Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
> 
> Thanks for writing such a nice first draft man page! I have a few
> comments below. Would you be willing to revise and resubmit?
> 
> > ---
> >  man2/copy_file_range.2 | 188 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 188 insertions(+)
> >  create mode 100644 man2/copy_file_range.2
> > 
> > diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> > new file mode 100644
> > index 0000000..84912b5
> > --- /dev/null
> > +++ b/man2/copy_file_range.2
> > @@ -0,0 +1,188 @@
> > +.\"This manpage is Copyright (C) 2015 Anna Schumaker <Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
> 
> We need a license for this page. Please see
> https://www.kernel.org/doc/man-pages/licenses.html
> 
> > +.TH COPY 2 2015-8-31 "Linux" "Linux Programmer's Manual"
> 
> Make the month 2 digits (leading 0).
> 
> > +.SH NAME
> > +copy_file_range \- Copy a range of data from one file to another
> > +.SH SYNOPSIS
> > +.nf
> > +.B #include <linux/copy.h>
> > +.B #include <sys/syscall.h>
> > +.B #include <unistd.h>
> > +
> > +.BI "ssize_t syscall(__NR_copy_file_range, int " fd_in ", loff_t * " off_in ",
> > +.BI "                int " fd_out ", loff_t * " off_out ", size_t " len ",
> > +.BI "                unsigned int " flags );
> 
> Remove spaces following "*" in the lines above. (man-pages convention for code)
> 
> I know that the copy_file_range() (obviously) doesn't yet have a wrapper
> in glibc, but in the man pages we document all system calls as though there
> is a wrapper. See, for example, gettid(2), for an axample of how this is done
> (a note in the SYNOPSIS and then some further details in NOTES).
> 
> > +.fi
> > +.SH DESCRIPTION
> > +The
> > +.BR copy_file_range ()
> > +system call performs an in-kernel copy between two file descriptors
> > +without all that tedious mucking about in userspace.
> 
> I'd write that last piece as:
> 
> "without the cost of (a loop) transferring data from the kernel to a 
> user-space buffer and then back to the kernel again.
> 
> > +It copies up to
> > +.I len
> > +bytes of data from file descriptor
> > +.I fd_in
> > +to file descriptor
> > +.IR fd_out ,
> > +overwriting any data that exists within the requested range.
> 
> s/.$/ of the target file./
> 
> > +
> > +The following semantics apply for
> > +.IR off_in ,
> > +and similar statements apply to
> > +.IR off_out :
> > +.IP * 3
> > +If
> > +.I off_in
> > +is NULL, then bytes are read from
> > +.I fd_in
> > +starting from the current file offset and the current
> > +file offset is adjusted appropriately.
> > +.IP *
> > +If
> > +.I off_in
> > +is not NULL, then
> > +.I off_in
> > +must point to a buffer that specifies the starting
> > +offset where bytes from
> > +.I fd_in
> > +will be read.  The current file offset of
> > +.I fd_in
> > +is not changed, but
> > +.I off_in
> > +is adjusted appropriately.
> > +.PP
> > +
> > +The
> > +.I flags
> > +argument is a bit mask composed by OR-ing together zero
> > +or more of the following flags:
> > +.TP 1.9i
> > +.B COPY_FR_COPY
> > +Copy all the file data in the requested range.
> > +Some filesystems, like NFS, might be able to accelerate this copy
> > +to avoid unnecessary data transfers.
> > +.TP
> > +.B COPY_FR_REFLINK
> > +Create a lightweight "reflink", where data is not copied until
> > +one of the files is modified.
> > +.PP
> > +The default behavior
> > +.RI ( flags
> > +== 0) is to try creating a reflink,
> > +and if reflinking fails
> > +.BR copy_file_range ()
> > +will fall back on performing a full data copy.
> 
> s/back on/back to/
> 
> > +This is equivalent to setting
> > +.I flags
> > +equal to
> > +.RB ( COPY_FR_COPY | COPY_FR_REFLINK ).
> 
> So, from an API deign perspective, the interoperation of these two
> flags appears odd. Bit flags are commonly (not always) orthogonal.
> I think here it could be pointed out a little more explicitly that
> these two flags are not orthogonal. In particular, perhaps after the
> last sentence, you could add another sentence:
> 
> [[
> (This contrasts with specifying
> .I flags
> as just
> .BR COPY_FR_REFLINK ,
> which causes the call to create a reflink,
> and fail if that is not possible,
> rather than falling back to a full data copy.)
> ]]
> 
> Furthermore, I even wonder if explicitly specifying flags as
> COPY_FR_COPY | COPY_FR_REFLINK should just generate an EINVAL
> error. 0 already gives us the behavior described above,
> and allowing the combination COPY_FR_COPY | COPY_FR_REFLINK
> perhaps just contributes to misleading the user that these
> flags are orthogonal, when in reality they are not. What do
> you think?

Personally, I think it's a little weird that one turns on reflink with a flag;
turns on regular copy with a different flag; and turns on both by not
specifying either flag. :)

> What are the semantics with respect to signals, especially if data 
> copying a very large file range? This info should be included in the 
> man page, probably under NOTES.
> 
> > +.SH RETURN VALUE
> > +Upon successful completion,
> > +.BR copy_file_range ()
> > +will return the number of bytes copied between files.
> > +This could be less than the length originally requested.
> > +
> > +On error,
> > +.BR copy_file_range ()
> > +returns \-1 and
> > +.I errno
> > +is set to indicate the error.
> > +.SH ERRORS
> > +.TP
> > +.B EBADF
> > +One or more file descriptors are not valid,
> > +or do not have proper read-write mode;
> 
> I think that last line can go. I mean, isn't this point (again)
> covered in the next few lines?

Or change the ';' to a ':'?

> > +.I fd_in
> > +is not open for reading; or
> > +.I fd_out
> > +is not open for writing.
> > +.TP
> > +.B EINVAL
> > +Requested range extends beyond the end of the file; or the
> 
> s/file/source file/  (right?)
>
> > +.I flags
> > +argument is set to an invalid value.
> > +.TP
> > +.B EIO
> > +A low level I/O error occurred while copying.
> > +.TP
> > +.B ENOMEM
> > +Out of memory.
> > +.TP
> > +.B ENOSPC
> > +There is not enough space to complete the copy.
> 
> Space where? On the filesystem?
> => s/space/space on the target filesystem/
> 
> > +.TP
> > +.B EOPNOTSUPP
> > +.B COPY_REFLINK
> > +was specified in
> > +.IR flags ,
> > +but the target filesystem does not support reflinks.
> > +.TP
> > +.B EXDEV
> > +Target filesystem doesn't support cross-filesystem copies.
> 
> I'm curious. What are some examples of filesystems that produce this
> error?

btrfs and xfs (and probably most local filesystems) don't support cross-fs
copies.

--D

> 
> > +.SH VERSIONS
> > +The
> > +.BR copy_file_range ()
> > +system call first appeared in Linux 4.4.
> > +.SH CONFORMING TO
> > +The
> > +.BR copy_file_range ()
> > +system call is a nonstandard Linux extension.
> > +.SH EXAMPLE
> > +.nf
> > +
> > +#define _GNU_SOURCE
> > +#include <fcntl.h>
> > +#include <linux/copy.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <sys/stat.h>
> > +#include <sys/syscall.h>
> > +#include <unistd.h>
> > +
> > +
> > +int main(int argc, char **argv)
> > +{
> > +    int fd_in, fd_out;
> > +    struct stat stat;
> > +    loff_t len, ret;
> > +
> > +    if (argc != 3) {
> > +        fprintf(stderr, "Usage: %s <source> <destination>\\n", argv[0]);
> > +        exit(EXIT_FAILURE);
> > +    }
> > +
> > +    fd_in = open(argv[1], O_RDONLY);
> > +    if (fd_in == -1) {
> 
> Please replace all "-" in code by "\-". (This is a groff
> detail.)
> 
> > +        perror("open (argv[1])");
> > +        exit(EXIT_FAILURE);
> > +    }
> > +
> > +    if (fstat(fd_in, &stat) == -1) {
> > +        perror("fstat");
> > +        exit(EXIT_FAILURE);
> > +    }
> > +    len = stat.st_size;
> > +
> > +    fd_out = creat(argv[2], 0644);
> 
> These days, I think we should discourage the use of creat(). Maybe 
> better to use open() instead here?
> 
> > +    if (fd_out == -1) {
> > +        perror("creat (argv[2])");
> > +        exit(EXIT_FAILURE);
> > +    }
> > +
> > +    do {
> > +        ret = syscall(__NR_copy_file_range, fd_in, NULL,
> > +                      fd_out, NULL, len, 0);
> 
> I'd rather see this as:
> 
> int
> copy_file_range(int fd_in, loff_t *off_in,
>                 int fd_out, loff_t *off_out, size_t len,
>                 unsigned int flags)
> {
>     return(syscall(__NR_copy_file_range, fd_in, fd_out, off_out, len, flags);
> }
> 
> ...
> 
>     copy_file_range(fd_in, fd_out, off_out, len, flags);
> 
>  
> > +        if (ret == -1) {
> > +            perror("copy_file_range");
> > +            exit(EXIT_FAILURE);
> > +        }
> > +
> > +        len -= ret;
> > +    } while (len > 0);
> > +
> > +    close(fd_in);
> > +    close(fd_out);
> > +    exit(EXIT_SUCCESS);
> > +}
> > +.fi
> > +.SH SEE ALSO
> > +.BR splice (2)
> > 
> 
> In the next iteration of this patch, could you include a change to
> splice(2) so that copy_file_range(2) is added under SEE ALSO in
> that page. Also, are there any other pages that we should like
> to/from? (sendfile(2)?)
> 
> Thanks,
> 
> Michael
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2015-09-14 18:32 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-11 20:30 [PATCH v2 0/9] VFS: In-kernel copy system call Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 1/9] vfs: add copy_file_range syscall and vfs helper Anna Schumaker
     [not found]   ` <1442003423-6884-2-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
2015-09-22 11:44     ` David Sterba
     [not found]       ` <20150922114404.GF8891-1ReQVI26iDCaZKY3DrU6dA@public.gmane.org>
2015-09-22 18:27         ` Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 2/9] x86: add sys_copy_file_range to syscall tables Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 3/9] btrfs: add .copy_file_range file operation Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 4/9] vfs: Copy should check len after file open mode Anna Schumaker
     [not found]   ` <1442003423-6884-5-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
2015-09-22 11:47     ` David Sterba
     [not found] ` <1442003423-6884-1-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
2015-09-11 20:30   ` [PATCH v2 5/9] vfs: Copy shouldn't forbid ranges inside the same file Anna Schumaker
2015-09-22 11:48     ` David Sterba
2015-09-11 20:30   ` [PATCH v2 8/9] vfs: copy_file_range() can do a pagecache copy with splice Anna Schumaker
2015-09-15  3:32     ` Darrick J. Wong
     [not found]       ` <20150915033217.GG10391-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2015-09-15 15:58         ` Anna Schumaker
     [not found]           ` <55F8400C.3000402-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
2015-09-15 16:38             ` Darrick J. Wong
     [not found]               ` <20150915163829.GA12658-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2015-09-15 17:01                 ` Austin S Hemmelgarn
2015-09-11 20:30 ` [PATCH v2 6/9] vfs: Copy should use file_out rather than file_in Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 7/9] vfs: Remove copy_file_range mountpoint checks Anna Schumaker
2015-09-22 11:52   ` David Sterba
2015-09-11 20:30 ` [PATCH v2 9/9] btrfs: btrfs_copy_file_range() only supports reflinks Anna Schumaker
2015-09-22 11:56   ` David Sterba
2015-09-11 20:30 ` [PATCH v2 10/9] copy_file_range.2: New page documenting copy_file_range() Anna Schumaker
2015-09-13  7:50   ` Michael Kerrisk (man-pages)
     [not found]     ` <55F52ABA.9070908-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-09-14 18:32       ` Darrick J. Wong [this message]
     [not found]         ` <20150914183223.GA28469-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2015-09-22 20:10           ` Anna Schumaker
2015-09-22 20:30             ` Pádraig Brady
2015-09-28 17:23             ` Darrick J. Wong
2015-09-14 19:02     ` Austin S Hemmelgarn
2015-09-22 20:30     ` Anna Schumaker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150914183223.GA28469@birch.djwong.org \
    --to=darrick.wong-qhclzuegtsvqt0dzr+alfa@public.gmane.org \
    --cc=Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org \
    --cc=andros-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org \
    --cc=clm-b10kYP2dOMg@public.gmane.org \
    --cc=hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org \
    --cc=zab-ugsP4Wv/S6ZeoWH0uzbU5w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).