linux-api.vger.kernel.org archive mirror
From: Austin S Hemmelgarn <ahferroin7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Anna Schumaker
	<Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>,
	"Darrick J. Wong"
	<darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: "Andy Lutomirski" <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>,
	"Pádraig Brady" <P@draigbrady.com>,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	"Linux btrfs Developers List"
	<linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"Linux FS Devel"
	<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"Linux API" <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"Zach Brown" <zab-ugsP4Wv/S6ZeoWH0uzbU5w@public.gmane.org>,
	"Al Viro"
	<viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
	"Chris Mason" <clm-b10kYP2dOMg@public.gmane.org>,
	"Michael Kerrisk-manpages"
	<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	andros-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org,
	"Christoph Hellwig" <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	Coreutils <coreutils-mXXj517/zsQ@public.gmane.org>
Subject: Re: [PATCH v1 0/8] VFS: In-kernel copy system call
Date: Thu, 10 Sep 2015 11:49:20 -0400	[thread overview]
Message-ID: <55F1A680.8010905@gmail.com> (raw)
In-Reply-To: <55F19D7F.5090907-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>


On 2015-09-10 11:10, Anna Schumaker wrote:
> On 09/09/2015 05:16 PM, Darrick J. Wong wrote:
>> On Wed, Sep 09, 2015 at 02:52:08PM -0400, Anna Schumaker wrote:
>>> On 09/08/2015 06:39 PM, Darrick J. Wong wrote:
>>>> On Tue, Sep 08, 2015 at 02:45:39PM -0700, Andy Lutomirski wrote:
>>>>> On Tue, Sep 8, 2015 at 2:29 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>>>>>> On Tue, Sep 08, 2015 at 09:03:09PM +0100, Pádraig Brady wrote:
>>>>>>> On 08/09/15 20:10, Andy Lutomirski wrote:
>>>>>>>> On Tue, Sep 8, 2015 at 11:23 AM, Anna Schumaker
>>>>>>>> <Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org> wrote:
>>>>>>>>> On 09/08/2015 11:21 AM, Pádraig Brady wrote:
>>>>>>>>>> I see copy_file_range() is a reflink() on BTRFS?
>>>>>>>>>> That's a bit surprising, as it avoids the copy completely.
>>>>>>>>>> cp(1) for example considered doing a BTRFS clone by default,
>>>>>>>>>> but didn't due to expectations that users actually wanted
>>>>>>>>>> the data duplicated on disk for resilience reasons,
>>>>>>>>>> and for performance reasons so that write latencies were
>>>>>>>>>> restricted to the copy operation, rather than being
>>>>>>>>>> introduced at usage time as the dest file is CoW'd.
>>>>>>>>>>
>>>>>>>>>> If reflink() is a possibility for copy_file_range()
>>>>>>>>>> then could it be done optionally with a flag?
>>>>>>>>>
>>>>>>>>> The idea is that filesystems get to choose how to handle copies in the
>>>>>>>>> default case.  BTRFS could do a reflink, but NFS could do a server side
>>>>>>
>>>>>> Eww, different default behaviors depending on the filesystem. :)
>>>>>>
>>>>>>>>> copy instead.  I can change the default behavior to only do a data copy
>>>>>>>>> (unless the reflink flag is specified) instead, if that is desirable.
>>>>>>>>>
>>>>>>>>> What does everybody think?
>>>>>>>>
>>>>>>>> I think the best you could do is to have a hint asking politely for
>>>>>>>> the data to be deep-copied.  After all, some filesystems reserve the
>>>>>>>> right to transparently deduplicate.
>>>>>>>>
>>>>>>>> Also, on a true COW filesystem (e.g. btrfs sometimes), there may be no
>>>>>>>> advantage to deep copying unless you actually want two copies for
>>>>>>>> locality reasons.
>>>>>>>
>>>>>>> Agreed. The reflink and server side copy are separate things.
>>>>>>> There's no advantage to not doing a server side copy,
>>>>>>> but as mentioned there may be advantages to doing deep copies on BTRFS
>>>>>>> (another reason, not previously mentioned in this thread, would be
>>>>>>> to avoid ENOSPC errors at some time in the future).
>>>>>>>
>>>>>>> So having control over the deep copy seems useful.
>>>>>>> It's debatable whether ALLOW_REFLINK should be on/off by default
>>>>>>> for copy_file_range().  I'd be inclined to have such a setting off by default,
>>>>>>> but cp(1) at least will work with whatever is chosen.
>>>>>>
>>>>>> So far it looks like people are interested in at least these "make data appear
>>>>>> in this other place" filesystem operations:
>>>>>>
>>>>>> 1. reflink
>>>>>> 2. reflink, but only if the contents are the same (dedupe)
>>>>>
>>>>> What I meant by this was: if you ask for "regular copy", you may end
>>>>> up with a reflink anyway.  Anyway, how can you reflink a range and
>>>>> have the contents *not* be the same?
>>>>
>>>> reflink forcibly remaps fd_dest's range to fd_src's range.  If they didn't
>>>> match before, they will afterwards.
>>>>
>>>> dedupe remaps fd_dest's range to fd_src's range only if they match, of course.
>>>>
>>>> Perhaps I should have said "...if the contents are the same before the call"?
>>>>
>>>>>
>>>>>> 3. regular copy
>>>>>> 4. regular copy, but make the hardware do it for us
>>>>>> 5. regular copy, but require a second copy on the media (no-dedupe)
>>>>>
>>>>> If this comes from me, I have no desire to ever use this as a flag.
>>>>
>>>> I meant (5) as a "disable auto-dedupe for this operation" flag, not as
>>>> a "reallocate all the shared blocks now" op...
>>>>
>>>>> If someone wants to use chattr or some new operation to say "make this
>>>>> range of this file belong just to me for purpose of optimizing future
>>>>> writes", then sure, go for it, with the understanding that there are
>>>>> plenty of filesystems for which that doesn't even make sense.
>>>>
>>>> "Unshare these blocks" sounds more like something fallocate could do.
>>>>
>>>> So far in my XFS reflink playground, it seems that using the defrag tool to
>>>> un-cow a file makes most sense.  AFAICT the XFS and ext4 defraggers copy a
>>>> fragmented file's data to a second file and use a 'swap extents' operation,
>>>> after which the donor file is unlinked.
>>>>
>>>> Hey, if this syscall turns into a more generic "do something involving two
>>>> (fd:off:len) (fd:off:len) tuples" call, I guess we could throw in "swap
>>>> extents" as a 7th operation, to refactor the ioctls.  <smirk>
>>>>
>>>>>
>>>>>> 6. regular copy, but don't CoW (eatmyothercopies) (joke)
>>>>>>
>>>>>> (Please add whatever ops I missed.)
>>>>>>
>>>>>> I think I can see a case for letting (4) fall back to (3) since (4) is an
>>>>>> optimization of (3).
>>>>>>
>>>>>> However, I particularly don't like the idea of (1) falling back to (3-5).
>>>>>> Either the kernel can satisfy a request or it can't, but let's not just
>>>>>> assume that we should transmogrify one type of request into another.  Userspace
>>>>>> should decide if a reflink failure should turn into one of the copy variants,
>>>>>> depending on whether the user wants to spread allocation costs over rewrites or
>>>>>> pay it all up front.  Also, if we allow reflink to fall back to copy, how do
>>>>>> programs find out what actually took place?  Or do we simply not allow them to
>>>>>> find out?
>>>>>>
>>>>>> Also, programs that expect reflink either to finish or fail quickly might be
>>>>>> surprised if it's possible for reflink to take a longer time than usual and
>>>>>> with the side effect that a deep(er) copy was made.
>>>>>>
>>>>>> I guess if someone asks for both (1) and (3) we can do the fallback in the
>>>>>> kernel, like how we handle it right now.
>>>>>>
>>>>>
>>>>> I think we should focus on what the actual legit use cases might be.
>>>>> Certainly we want to support a mode that's "reflink or fail".  We
>>>>> could have these flags:
>>>>>
>>>>> COPY_FILE_RANGE_ALLOW_REFLINK
>>>>> COPY_FILE_RANGE_ALLOW_COPY
>>>>>
>>>>> Setting neither gets -EINVAL.  Setting both works as is.  Setting just
>>>>> ALLOW_REFLINK will fail if a reflink can't be supported.  Setting just
>>>>> ALLOW_COPY will make a best-effort attempt not to reflink but
>>>>> expressly permits reflinking in cases where either (a) plain old
>>>>> write(2) might also result in a reflink or (b) there is no advantage
>>>>> to not reflinking.
>>>>
>>>> I don't agree with having a 'copy' flag that can reflink when we also have a
>>>> 'reflink' flag.  I guess I just don't like having a flag with different
>>>> meanings depending on context.
>>>>
>>>> Users should be able to get the default behavior by passing '0' for flags, so
>>>> provide FORBID_REFLINK and FORBID_COPY flags to turn off those behaviors, with
>>>> an admonishment that one should only use them if they have a goooood reason.
>>>> Passing neither gets you reflink-xor-copy, which is what I think we both want
>>>> in the general case.
>>>
>>> I agree here that 0 for flags should do something useful, and I wanted to
>>> double check if reflink-xor-copy is a good default behavior.
>>
>> Ok.
>>
>>>>
>>>> FORBID_REFLINK = 1
>>>> FORBID_COPY = 2
>>>
>>> I don't like the idea of using flags to forbid behavior.  I think it would be
>>> more straightforward to have flags like REFLINK_ONLY or COPY_ONLY so users
>>> can tell us what they want, instead of what they don't want.
>>
>> Seems fine to me.
>>
>>> While I'm thinking about flags, COPY_FILE_RANGE_REFLINK_ONLY would be a bit
>>> of a mouthful.  Does anybody have suggestions for ways that I could make this
>>> shorter?
>>
>> CFR_REFLINK_ONLY?
>
> That could work!  Although I might do as Austin suggests and drop the _ONLY part, and then make the man page clear about what's going on.
>
> Would you expect to trigger a NFS server side copy by passing the pagecache copy flag?  Or would that only happen if I pass flags=0?
Personally, I would think that an NFS server side copy could be counted
under the 'hardware assisted' flag.  From the point of view of an NFS
client, the NFS server is a (usually) opaque piece of storage hardware,
similar to a local disk drive in that you pass commands to it and get
responses.  The only real difference is that NFS is a much higher-level
protocol than, for example, SCSI.
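
To make that concrete, here is a rough sketch of how a dispatcher might
treat an NFS server side copy as just another kind of offload.  All of
the names here (copy_request, copy_backend, copy_target, choose_backend)
are made up for illustration and are not taken from the patch set.  The
choice that a default request also prefers offload when it is available
is an assumption on my part, as is the fallback from an offload request
to a plain copy (Darrick's "(4) falls back to (3)").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request kinds; not the flags from the actual patches. */
enum copy_request { REQ_DEFAULT, REQ_PAGECACHE_COPY, REQ_HW_ASSISTED };

/* Hypothetical backends that could end up doing the work. */
enum copy_backend {
    BACKEND_PAGECACHE,        /* plain read/write through the page cache */
    BACKEND_DEVICE_OFFLOAD,   /* local storage device does the copy */
    BACKEND_NFS_SERVER_COPY   /* NFS server does the copy */
};

struct copy_target {
    bool is_nfs;       /* destination lives on an NFS mount */
    bool can_offload;  /* device or server advertises a copy offload */
};

/*
 * An explicit pagecache request never offloads.  Otherwise offload is
 * used when available, and the NFS server is treated exactly like any
 * other piece of storage hardware that can copy on our behalf.
 */
static enum copy_backend choose_backend(enum copy_request req,
                                        const struct copy_target *t)
{
    assert(t != NULL);

    if (req == REQ_PAGECACHE_COPY || !t->can_offload)
        return BACKEND_PAGECACHE;

    return t->is_nfs ? BACKEND_NFS_SERVER_COPY : BACKEND_DEVICE_OFFLOAD;
}
```

With this shape, passing only the pagecache copy flag on NFS would not
trigger a server side copy, while the 'hardware assisted' request would.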
>>
>> --D
>>
>>>
>>> Thanks,
>>> Anna
>>>
>>>> CHECK_SAME = 4
>>>> HW_COPY = 8
>>>>
>>>> DEDUPE = (FORBID_COPY | CHECK_SAME)
>>>>
>>>> What do you say to that?
>>>>
>>>>> An example of (b) would be a filesystem backed by deduped
>>>>> thinly-provisioned storage that can't do anything about ENOSPC because
>>>>> it doesn't control it in the first place.
>>>>>
>>>>> Another option would be to split up the copy case into "I expect to
>>>>> overwrite a lot of the target file soon, so (c) try to commit space
>>>>> for that or (d) try to make it time-efficient".  Of course, (d) is
>>>>> irrelevant on filesystems with no random access (nvdimms, for
>>>>> example).
>>>>>
>>>>> I guess the tl;dr is that I'm highly skeptical of any use for
>>>>> disallowing reflinking other than forcibly committing space in cases
>>>>> where committing space actually means something.
>>>>
>>>> That's more or less where I was going too. :)
>>>>
>>>> --D
>>>>
>>>
>
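
For reference, the FORBID_* numbering Darrick proposed above could be
validated roughly as sketched below.  The CFR_ prefix, struct cfr_plan
and cfr_parse_flags() are hypothetical names used only for this
illustration.  Two of the checks are assumptions the thread never
settled: that CHECK_SAME is only valid together with FORBID_COPY (that
is, as DEDUPE), and that HW_COPY contradicts FORBID_COPY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as numbered in the thread; the CFR_ prefix is hypothetical. */
#define CFR_FORBID_REFLINK 1u
#define CFR_FORBID_COPY    2u
#define CFR_CHECK_SAME     4u
#define CFR_HW_COPY        8u
#define CFR_DEDUPE         (CFR_FORBID_COPY | CFR_CHECK_SAME)
#define CFR_KNOWN_FLAGS    (CFR_FORBID_REFLINK | CFR_FORBID_COPY | \
                            CFR_CHECK_SAME | CFR_HW_COPY)

/* What the caller has allowed the kernel to do. */
struct cfr_plan {
    bool may_reflink;
    bool may_copy;
    bool must_match;   /* only remap if the contents already match */
    bool try_hw_copy;  /* hint: let hardware (or a server) do the copy */
};

/* Returns 0 and fills *plan, or -EINVAL for a nonsensical combination. */
static int cfr_parse_flags(unsigned int flags, struct cfr_plan *plan)
{
    assert(plan != NULL);

    if (flags & ~CFR_KNOWN_FLAGS)
        return -EINVAL;

    /* flags == 0 leaves both allowed: the reflink-xor-copy default. */
    plan->may_reflink = !(flags & CFR_FORBID_REFLINK);
    plan->may_copy    = !(flags & CFR_FORBID_COPY);
    plan->must_match  = (flags & CFR_CHECK_SAME) != 0;
    plan->try_hw_copy = (flags & CFR_HW_COPY) != 0;

    /* Forbidding both behaviors leaves nothing to do. */
    if (!plan->may_reflink && !plan->may_copy)
        return -EINVAL;

    /* Assumption: CHECK_SAME only makes sense as part of DEDUPE. */
    if (plan->must_match && plan->may_copy)
        return -EINVAL;

    /* Assumption: asking for a hardware copy while forbidding copies
     * is contradictory. */
    if (plan->try_hw_copy && !plan->may_copy)
        return -EINVAL;

    return 0;
}
```

Passing 0 yields a plan that permits either a reflink or a copy, and
CFR_DEDUPE yields a reflink-only plan that also requires matching
contents.  The same table could be expressed with positive flags such as
REFLINK_ONLY and COPY_ONLY, as Anna prefers, without changing the checks.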



