* Re: [fuse-devel] FICLONE / FICLONERANGE support
       [not found] <1fb83b2a-38cf-4b70-8c9e-ac1c77db7080@spawn.link>
@ 2024-01-28 10:07 ` Amir Goldstein
  2024-01-28 19:11   ` Antonio SJ Musumeci
  2024-01-28 21:25   ` Dave Chinner
  0 siblings, 2 replies; 5+ messages in thread
From: Amir Goldstein @ 2024-01-28 10:07 UTC (permalink / raw)
  To: Antonio SJ Musumeci; +Cc: fuse-devel, linux-fsdevel, Miklos Szeredi
On Sun, Jan 28, 2024 at 2:31 AM Antonio SJ Musumeci <trapexit@spawn.link> wrote:
>
> Hello,
>
> Has anyone investigated adding support for FICLONE and FICLONERANGE? I'm
> not seeing any references to either on the mailinglist. I've got a
> passthrough filesystem and with more users taking advantage of btrfs and
> xfs w/ reflinks there has been some demand for the ability to support it.
>
[CC fsdevel because my answer's scope is wider than just FUSE]
FWIW, the kernel implementation of copy_file_range() calls remap_file_range()
(a.k.a. clone_file_range()) for both xfs and btrfs, so if your users control the
application they are using, calling copy_file_range() will propagate via your
fuse filesystem correctly to underlying xfs/btrfs and will effectively result in
clone_file_range().
Thus using tools like cp --reflink, on your passthrough filesystem should yield
the expected result.
For a more practical example see:
https://bugzilla.samba.org/show_bug.cgi?id=12033
Since Samba 4.1, server-side copy is implemented with copy_file_range().
API-wise, there are two main differences between copy_file_range() and
FICLONERANGE:
1. copy_file_range() can result in a partial copy
2. copy_file_range() can result in more used disk space
Other API differences are minor, but the fact that copy_file_range()
is a syscall with a @flags argument makes it a candidate for being
a super-set of both functionalities.
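The first difference above (partial copies) means a caller has to loop until the requested range is done. A minimal sketch of such a loop, assuming Linux and Python 3.8+ where os.copy_file_range() is available; the helper name copy_range_fully and its injectable copy parameter are hypothetical and exist only for illustration:

```python
import os


def copy_range_fully(src_fd, dst_fd, length, copy=None):
    """Repeat the copy call until `length` bytes are copied or EOF is hit.

    `copy` defaults to os.copy_file_range; it is a parameter only so the
    loop can be exercised without a real filesystem (an assumption of
    this sketch, not part of any kernel or libc API).
    Returns the number of bytes actually copied.
    """
    if copy is None:
        copy = os.copy_file_range
    copied = 0
    while copied < length:
        # The call may copy fewer bytes than requested.
        n = copy(src_fd, dst_fd, length - copied)
        if n == 0:
            # Zero bytes copied means end of file on the source.
            break
        copied += n
    return copied
```

A clone request such as FICLONERANGE has no equivalent loop: it either remaps the whole range or fails.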
The question is, for your users, are you actually looking for
clone_file_range() support? or is best-effort copy_file_range() with
clone_file_range() fallback enough?
If your users are looking for the atomic clone_file_range() behavior,
then a single flag in fuse_copy_file_range_in::flags is enough to
indicate to the server that the "atomic clone" behavior is wanted.
Note that the @flags argument to copy_file_range() syscall does not
support any flags at all at the moment.
The only flag defined in the kernel, COPY_FILE_SPLICE, is for
internal use only.
We could define a COPY_FILE_CLONE flag, either for use only
inside the kernel and in the FUSE protocol, or also in the
copy_file_range() syscall.
Sure, we can also add a new FUSE protocol command for
FUSE_CLONE_FILE_RANGE, but I don't think that is
necessary.
It is certainly not necessary if there is agreement to extend the
copy_file_range() syscall to support COPY_FILE_CLONE flag.
What do folks think about this possible API extension?
Thanks,
Amir.
* Re: [fuse-devel] FICLONE / FICLONERANGE support
  2024-01-28 10:07 ` [fuse-devel] FICLONE / FICLONERANGE support Amir Goldstein
@ 2024-01-28 19:11   ` Antonio SJ Musumeci
  2024-01-28 21:25   ` Dave Chinner
  1 sibling, 0 replies; 5+ messages in thread
From: Antonio SJ Musumeci @ 2024-01-28 19:11 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: fuse-devel, linux-fsdevel, Miklos Szeredi
On Sunday, January 28th, 2024 at 4:07 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> 
> 
> On Sun, Jan 28, 2024 at 2:31 AM Antonio SJ Musumeci trapexit@spawn.link wrote:
> 
> > Hello,
> > 
> > Has anyone investigated adding support for FICLONE and FICLONERANGE? I'm
> > not seeing any references to either on the mailinglist. I've got a
> > passthrough filesystem and with more users taking advantage of btrfs and
> > xfs w/ reflinks there has been some demand for the ability to support it.
> 
> 
> [CC fsdevel because my answer's scope is wider than just FUSE]
> 
> FWIW, the kernel implementation of copy_file_range() calls remap_file_range()
> (a.k.a. clone_file_range()) for both xfs and btrfs, so if your users control the
> application they are using, calling copy_file_range() will propagate via your
> fuse filesystem correctly to underlying xfs/btrfs and will effectively result in
> clone_file_range().
> 
> Thus using tools like cp --reflink, on your passthrough filesystem should yield
> the expected result.
> 
> For a more practical example see:
> https://bugzilla.samba.org/show_bug.cgi?id=12033
> Since Samba 4.1, server-side-copy is implemented as copy_file_range()
> 
> API-wise, there are two main differences between copy_file_range() and
> FICLONERANGE:
> 1. copy_file_range() can result in partial copy
> 2. copy_file_range() can results in more used disk space
> 
> Other API differences are minor, but the fact that copy_file_range()
> is a syscall with a @flags argument makes it a candidate for being
> a super-set of both functionalities.
> 
> The question is, for your users, are you actually looking for
> clone_file_range() support? or is best-effort copy_file_range() with
> clone_file_range() fallback enough?
> 
> If your users are looking for the atomic clone_file_range() behavior,
> then a single flag in fuse_copy_file_range_in::flags is enough to
> indicate to the server that the "atomic clone" behavior is wanted.
> 
> Note that the @flags argument to copy_file_range() syscall does not
> support any flags at all at the moment.
> 
> The only flag defined in the kernel COPY_FILE_SPLICE is for
> internal use only.
> 
> We can define a flag COPY_FILE_CLONE to use either only
> internally in kernel and in FUSE protocol or even also in
> copy_file_range() syscall.
> 
> Sure, we can also add a new FUSE protocol command for
> FUSE_CLONE_FILE_RANGE, but I don't think that is
> necessary.
> It is certainly not necessary if there is agreement to extend the
> copy_file_range() syscall to support COPY_FILE_CLONE flag.
> 
> What do folks think about this possible API extension?
> 
> Thanks,
> Amir.
cp --reflink calls FICLONE. It receives EOPNOTSUPP and falls back to copying normally (if set to auto mode). It appears it still does this: https://github.com/coreutils/coreutils/blob/master/src/copy.c#L1509
My users don't control the software they are running. They are using assorted tooling that happens to support FICLONE, such as cp --reflink. In the most recent case, I believe it was being used for some rsnapshot-like backup strategy.
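The clone-then-fall-back behavior described above can be sketched as follows. This is a simplified illustration, not the coreutils implementation: the function name reflink_auto and the exact set of errno values treated as "clone not available" are assumptions. The FICLONE value is _IOW(0x94, 9, int) from <linux/fs.h>.

```python
import errno
import fcntl
import shutil

FICLONE = 0x40049409  # _IOW(0x94, 9, int), as defined in <linux/fs.h>

# Assumption: errno values this sketch treats as "cloning not available".
CLONE_UNSUPPORTED = {
    errno.EOPNOTSUPP, errno.ENOTTY, errno.ENOSYS, errno.EXDEV, errno.EINVAL,
}


def reflink_auto(src_path, dst_path):
    """Try FICLONE first; fall back to a plain data copy.

    Returns "clone" if the ioctl succeeded, "copy" if the fallback ran.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # The ioctl is issued on the destination; the argument is the
            # source file descriptor.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return "clone"
        except OSError as e:
            if e.errno not in CLONE_UNSUPPORTED:
                raise
        # Fallback: ordinary read/write copy of the file contents.
        shutil.copyfileobj(src, dst)
        return "copy"
```

On a FUSE mount without clone support, the ioctl fails and the "copy" branch runs, which is why such tools keep working but lose the space sharing.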
* Re: [fuse-devel] FICLONE / FICLONERANGE support
  2024-01-28 10:07 ` [fuse-devel] FICLONE / FICLONERANGE support Amir Goldstein
  2024-01-28 19:11   ` Antonio SJ Musumeci
@ 2024-01-28 21:25   ` Dave Chinner
  2024-01-29 13:54     ` Amir Goldstein
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2024-01-28 21:25 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Antonio SJ Musumeci, fuse-devel, linux-fsdevel, Miklos Szeredi
On Sun, Jan 28, 2024 at 12:07:22PM +0200, Amir Goldstein wrote:
> On Sun, Jan 28, 2024 at 2:31 AM Antonio SJ Musumeci <trapexit@spawn.link> wrote:
> >
> > Hello,
> >
> > Has anyone investigated adding support for FICLONE and FICLONERANGE? I'm
> > not seeing any references to either on the mailinglist. I've got a
> > passthrough filesystem and with more users taking advantage of btrfs and
> > xfs w/ reflinks there has been some demand for the ability to support it.
> >
> 
> [CC fsdevel because my answer's scope is wider than just FUSE]
> 
> FWIW, the kernel implementation of copy_file_range() calls remap_file_range()
> (a.k.a. clone_file_range()) for both xfs and btrfs, so if your users control the
> application they are using, calling copy_file_range() will propagate via your
> fuse filesystem correctly to underlying xfs/btrfs and will effectively result in
> clone_file_range().
> 
> Thus using tools like cp --reflink, on your passthrough filesystem should yield
> the expected result.
> 
> For a more practical example see:
> https://bugzilla.samba.org/show_bug.cgi?id=12033
> Since Samba 4.1, server-side-copy is implemented as copy_file_range()
> 
> API-wise, there are two main differences between copy_file_range() and
> FICLONERANGE:
> 1. copy_file_range() can result in partial copy
> 2. copy_file_range() can results in more used disk space
> 
> Other API differences are minor, but the fact that copy_file_range()
> is a syscall with a @flags argument makes it a candidate for being
> a super-set of both functionalities.
> 
> The question is, for your users, are you actually looking for
> clone_file_range() support? or is best-effort copy_file_range() with
> clone_file_range() fallback enough?
> 
> If your users are looking for the atomic clone_file_range() behavior,
> then a single flag in fuse_copy_file_range_in::flags is enough to
> indicate to the server that the "atomic clone" behavior is wanted.
> 
> Note that the @flags argument to copy_file_range() syscall does not
> support any flags at all at the moment.
> 
> The only flag defined in the kernel COPY_FILE_SPLICE is for
> internal use only.
> 
> We can define a flag COPY_FILE_CLONE to use either only
> internally in kernel and in FUSE protocol or even also in
> copy_file_range() syscall.
I don't care how fuse implements ->remap_file_range(), but no change
to syscall behaviour, please.
copy_file_range() is supposed to select the best available method
for copying the data based on kernel side technology awareness that
the application knows nothing about (e.g. clone, server-side copy,
block device copy offload, etc). The API is technology agnostic and
largely future proof because of this; adding flags to say "use this
specific technology to copy data or fail" is the exact opposite of
how we want copy_file_range() to work.
That is, if you want a specific type of "copy" to be done (i.e. clone
rather than data copy) then call FICLONE or copy the data yourself
to do exactly what you need. If you just want it done as fast as
possible and don't care about implementation (99% of cases), then
just call copy_file_range().
> Sure, we can also add a new FUSE protocol command for
> FUSE_CLONE_FILE_RANGE, but I don't think that is
> necessary.
> It is certainly not necessary if there is agreement to extend the
> copy_file_range() syscall to support COPY_FILE_CLONE flag.
We already have FICLONE/FICLONERANGE for this operation. Fuse
just needs to implement ->remap_file_range() server stubs, and then
the back end driver can choose to implement it if its storage
mechanisms support such functionality. Then it will get used
automatically for copy_file_range() for those FUSE drivers; the rest
will just copy the data in the kernel using splice as they currently
do...
-Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: [fuse-devel] FICLONE / FICLONERANGE support
  2024-01-28 21:25   ` Dave Chinner
@ 2024-01-29 13:54     ` Amir Goldstein
  2024-01-30  8:08       ` Shachar Sharon
  0 siblings, 1 reply; 5+ messages in thread
From: Amir Goldstein @ 2024-01-29 13:54 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Antonio SJ Musumeci, fuse-devel, linux-fsdevel, Miklos Szeredi
On Sun, Jan 28, 2024 at 11:25 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Sun, Jan 28, 2024 at 12:07:22PM +0200, Amir Goldstein wrote:
> > On Sun, Jan 28, 2024 at 2:31 AM Antonio SJ Musumeci <trapexit@spawn.link> wrote:
> > >
> > > Hello,
> > >
> > > Has anyone investigated adding support for FICLONE and FICLONERANGE? I'm
> > > not seeing any references to either on the mailinglist. I've got a
> > > passthrough filesystem and with more users taking advantage of btrfs and
> > > xfs w/ reflinks there has been some demand for the ability to support it.
> > >
> >
> > [CC fsdevel because my answer's scope is wider than just FUSE]
> >
> > FWIW, the kernel implementation of copy_file_range() calls remap_file_range()
> > (a.k.a. clone_file_range()) for both xfs and btrfs, so if your users control the
> > application they are using, calling copy_file_range() will propagate via your
> > fuse filesystem correctly to underlying xfs/btrfs and will effectively result in
> > clone_file_range().
> >
> > Thus using tools like cp --reflink, on your passthrough filesystem should yield
> > the expected result.
Sorry, cp --reflink indeed uses clone (FICLONE), not copy_file_range().
> >
> > For a more practical example see:
> > https://bugzilla.samba.org/show_bug.cgi?id=12033
> > Since Samba 4.1, server-side-copy is implemented as copy_file_range()
> >
> > API-wise, there are two main differences between copy_file_range() and
> > FICLONERANGE:
> > 1. copy_file_range() can result in partial copy
> > 2. copy_file_range() can results in more used disk space
> >
> > Other API differences are minor, but the fact that copy_file_range()
> > is a syscall with a @flags argument makes it a candidate for being
> > a super-set of both functionalities.
> >
> > The question is, for your users, are you actually looking for
> > clone_file_range() support? or is best-effort copy_file_range() with
> > clone_file_range() fallback enough?
> >
> > If your users are looking for the atomic clone_file_range() behavior,
> > then a single flag in fuse_copy_file_range_in::flags is enough to
> > indicate to the server that the "atomic clone" behavior is wanted.
> >
> > Note that the @flags argument to copy_file_range() syscall does not
> > support any flags at all at the moment.
> >
> > The only flag defined in the kernel COPY_FILE_SPLICE is for
> > internal use only.
> >
> > We can define a flag COPY_FILE_CLONE to use either only
> > internally in kernel and in FUSE protocol or even also in
> > copy_file_range() syscall.
>
> I don't care how fuse implements ->remap_file_range(), but no change
> to syscall behaviour, please.
>
ok.
> copy_file_range() is supposed to select the best available method
> for copying the data based on kernel side technology awareness that
> the application knows nothing about (e.g. clone, server-side copy,
> block device copy offload, etc). The API is technology agnostic and
> largely future proof because of this; adding flags to say "use this
> specific technology to copy data or fail" is the exact opposite of
> how we want copy_file_range() to work.
>
> i.e. if you want a specific type of "copy" to be done (i.e. clone
> rather than data copy) then call FICLONE or copy the data yourself
> to do exactly what you need. If you just want it done fast as
> possible and don't care about implementation (99% of cases), then
> just call copy_file_range().
>
Technically, a COPY_FILE_ATOMIC flag would express a requirement,
not an implementation detail, although currently only filesystems that
implement remap_file_range() could fulfill it. Never mind, though;
I won't be trying to push a syscall API change myself.
> > Sure, we can also add a new FUSE protocol command for
> > FUSE_CLONE_FILE_RANGE, but I don't think that is
> > necessary.
> > It is certainly not necessary if there is agreement to extend the
> > copy_file_range() syscall to support COPY_FILE_CLONE flag.
>
> We have already have FICLONE/FICLONERANGE for this operation. Fuse
> just needs to implement ->remap_file_range() server stubs, and then
> the back end driver  can choose to implement it if it's storage
> mechanisms support such functionality.
For Antonio's request to support FICLONERANGE with FUSE,
that would be enough using a new protocol command.
> Then it will get used
> automatically for copy_file_range() for those FUSE drivers, the rest
> will just copy the data in the kernel using splice as they currently
> do...
This is not the current behavior of FUSE as far as I can tell.
The reason is that vfs_copy_file_range() checks whether the fs implements
->copy_file_range(); if it does, it will not fall back to ->remap_file_range()
or to splice. This is intentional: a fs with ->copy_file_range() has full
control, including the decision to return whatever error code to userspace.
The problem is that the FUSE kernel driver always implements
->copy_file_range(), regardless of whether the FUSE server implements
FUSE_COPY_FILE_RANGE. So for a FUSE server that does not
implement FUSE_COPY_FILE_RANGE, fc->no_copy_file_range is
true and copy_file_range() returns -EOPNOTSUPP.
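A toy model of the dispatch order as described above may make the problem easier to see. This is not kernel code: the function dispatch_copy and the three dictionaries are hypothetical stand-ins, and whether the fuse driver itself does anything further after -EOPNOTSUPP is not modeled here.

```python
import errno


def dispatch_copy(fs_ops, length):
    """Return (path_taken, result) for a toy filesystem.

    fs_ops maps "copy_file_range" / "remap_file_range" to callables that
    take a length and return bytes handled or a negative errno.
    """
    if "copy_file_range" in fs_ops:
        # The filesystem has full control; its result is returned as-is,
        # with no fallback to remap or splice.
        return "copy_file_range", fs_ops["copy_file_range"](length)
    if "remap_file_range" in fs_ops:
        ret = fs_ops["remap_file_range"](length)
        if ret > 0:
            return "remap_file_range", ret
    # Last resort: generic in-kernel data copy (splice).
    return "splice", length


# Hypothetical filesystems, for illustration only.
xfs_like = {"remap_file_range": lambda n: n}
plain_fs = {}
fuse_without_server_support = {
    "copy_file_range": lambda n: -errno.EOPNOTSUPP,
}
```

In this model, dispatch_copy(fuse_without_server_support, 4096) stops at the first branch and hands the error back, while plain_fs, which implements neither method, still gets a splice copy.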
So either the fallback from FUSE_COPY_FILE_RANGE to
FUSE_CLONE_FILE_RANGE will be done internally by FUSE,
or clone/copy support will need to be advertised during FUSE_INIT
and a different set of fuse_file_operations will need to be used
accordingly, which seems overly complicated.
Thanks,
Amir.
* Re: [fuse-devel] FICLONE / FICLONERANGE support
  2024-01-29 13:54     ` Amir Goldstein
@ 2024-01-30  8:08       ` Shachar Sharon
  0 siblings, 0 replies; 5+ messages in thread
From: Shachar Sharon @ 2024-01-30  8:08 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Dave Chinner, Antonio SJ Musumeci, fuse-devel, linux-fsdevel,
	Miklos Szeredi
On Mon, Jan 29, 2024 at 3:54 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Sun, Jan 28, 2024 at 11:25 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Sun, Jan 28, 2024 at 12:07:22PM +0200, Amir Goldstein wrote:
> > > On Sun, Jan 28, 2024 at 2:31 AM Antonio SJ Musumeci <trapexit@spawn.link> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Has anyone investigated adding support for FICLONE and FICLONERANGE? I'm
> > > > not seeing any references to either on the mailinglist. I've got a
> > > > passthrough filesystem and with more users taking advantage of btrfs and
> > > > xfs w/ reflinks there has been some demand for the ability to support it.
> > > >
> > >
> > > [CC fsdevel because my answer's scope is wider than just FUSE]
> > >
> > > FWIW, the kernel implementation of copy_file_range() calls remap_file_range()
> > > (a.k.a. clone_file_range()) for both xfs and btrfs, so if your users control the
> > > application they are using, calling copy_file_range() will propagate via your
> > > fuse filesystem correctly to underlying xfs/btrfs and will effectively result in
> > > clone_file_range().
> > >
> > > Thus using tools like cp --reflink, on your passthrough filesystem should yield
> > > the expected result.
>
> Sorry, cp --reflink indeed uses clone
>
> > >
> > > For a more practical example see:
> > > https://bugzilla.samba.org/show_bug.cgi?id=12033
> > > Since Samba 4.1, server-side-copy is implemented as copy_file_range()
> > >
> > > API-wise, there are two main differences between copy_file_range() and
> > > FICLONERANGE:
> > > 1. copy_file_range() can result in partial copy
> > > 2. copy_file_range() can results in more used disk space
> > >
> > > Other API differences are minor, but the fact that copy_file_range()
> > > is a syscall with a @flags argument makes it a candidate for being
> > > a super-set of both functionalities.
> > >
> > > The question is, for your users, are you actually looking for
> > > clone_file_range() support? or is best-effort copy_file_range() with
> > > clone_file_range() fallback enough?
> > >
> > > If your users are looking for the atomic clone_file_range() behavior,
> > > then a single flag in fuse_copy_file_range_in::flags is enough to
> > > indicate to the server that the "atomic clone" behavior is wanted.
> > >
> > > Note that the @flags argument to copy_file_range() syscall does not
> > > support any flags at all at the moment.
> > >
> > > The only flag defined in the kernel COPY_FILE_SPLICE is for
> > > internal use only.
> > >
> > > We can define a flag COPY_FILE_CLONE to use either only
> > > internally in kernel and in FUSE protocol or even also in
> > > copy_file_range() syscall.
> >
> > I don't care how fuse implements ->remap_file_range(), but no change
> > to syscall behaviour, please.
> >
>
> ok.
>
> > copy_file_range() is supposed to select the best available method
> > for copying the data based on kernel side technology awareness that
> > the application knows nothing about (e.g. clone, server-side copy,
> > block device copy offload, etc). The API is technology agnostic and
> > largely future proof because of this; adding flags to say "use this
> > specific technology to copy data or fail" is the exact opposite of
> > how we want copy_file_range() to work.
> >
> > i.e. if you want a specific type of "copy" to be done (i.e. clone
> > rather than data copy) then call FICLONE or copy the data yourself
> > to do exactly what you need. If you just want it done fast as
> > possible and don't care about implementation (99% of cases), then
> > just call copy_file_range().
> >
>
> Technically, a flag COPY_FILE_ATOMIC would be a requirement
> not an implementation detail, but this requirement could currently be
> fulfilled only by fs that implement remap_file_range(), but nevermind,
> I won't be trying to push a syscall API change myself.
>
> > > Sure, we can also add a new FUSE protocol command for
> > > FUSE_CLONE_FILE_RANGE, but I don't think that is
> > > necessary.
> > > It is certainly not necessary if there is agreement to extend the
> > > copy_file_range() syscall to support COPY_FILE_CLONE flag.
> >
> > We have already have FICLONE/FICLONERANGE for this operation. Fuse
> > just needs to implement ->remap_file_range() server stubs, and then
> > the back end driver  can choose to implement it if it's storage
> > mechanisms support such functionality.
>
> For Antonio's request to support FICLONERANGE with FUSE,
> that would be enough using a new protocol command.
>
> > Then it will get used
> > automatically for copy_file_range() for those FUSE drivers, the rest
> > will just copy the data in the kernel using splice as they currently
> > do...
>
> This is not the current behavior of FUSE as far as I can tell.
> The reason is that vfs_copy_file_range() checks if fs implement
> ->copy_file_range(), if it does, it will not fallback to ->remap_file_range()
> nor to splice. This is intentional - fs with ->copy_file_range() has full
> control including the decision to return whatever error code to userspace.
>
> The problem is that the FUSE kernel driver always implements
> ->copy_file_range(), regardless whether the FUSE server implements
> FUSE_COPY_FILE_RANGE. So for a FUSE server that does not
> implement FUSE_COPY_FILE_RANGE, fc->no_copy_file_range is
> true and copy_file_range() returns -EOPNOTSUPP.
>
> So either the fallback from FUSE_COPY_FILE_RANGE to
> FUSE_CLONE_FILE_RANGE will be done internally by FUSE,
> or clone/copy support will need to be advertised during FUSE_INIT
> and a different set of fuse_file_operations will need to be used
> accordingly, which seems overly complicated.
>
Note that FUSE_COPY_FILE_RANGE uses struct fuse_write_out to report
the number of bytes copied between files (uint32_t size), and therefore it
cannot copy more than 2^32-1 bytes per call. For example, a call to
cp --reflink of a 1T file yields multiple calls to copy_file_range() by
userspace.
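As a rough worked example of that limit, assuming "1T" means 1 TiB and that every call copies the maximum the uint32_t field can report (both are assumptions; the names below are illustrative only):

```python
MAX_PER_CALL = 2**32 - 1  # largest byte count a uint32_t size field can hold
ONE_TIB = 2**40           # assuming "1T" means 1 TiB


def min_calls(total_bytes, per_call=MAX_PER_CALL):
    """Smallest number of calls needed if every call copies the maximum."""
    return -(-total_bytes // per_call)  # ceiling division on integers


print(min_calls(ONE_TIB))  # prints 257
```

256 maximal calls cover 2^40 - 256 bytes, so one more call is needed for the last 256 bytes. Real copies may take more calls, since any call is allowed to return less than the maximum.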
- Shachar.
> Thanks,
> Amir.
>