[PATCH] btrfs: send: fix invalid clone operation for file that got its size decreased

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] btrfs: send: fix invalid clone operation for file that got its size decreased
@ 2024-09-27 10:25 fdmanana
  2024-09-27 10:33 ` Qu Wenruo
  2024-09-27 11:03 ` [PATCH v2] " fdmanana
  0 siblings, 2 replies; 10+ messages in thread
From: fdmanana @ 2024-09-27 10:25 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

During an incremental send we may end up sending an invalid clone
operation, for the last extent of a file which ends at an unaligned offset
that matches the final i_size of the file in the send snapshot, in case
the file had its initial size (the size in the parent snapshot) decreased
in the send snapshot. In this case the destination will fail to apply the
clone operation because its end offset is not sector size aligned and it
ends before the current size of the file.

Sending the truncate operation always happens when we finish processing an
inode, after we process all its extents (and xattrs, names, etc). So fix
this by ensuring the file has the correct size before we send a clone
operation for an unaligned extent that ends at the final i_size of the file.

The following test reproduces the issue:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  # Create a file with a size of 256K + 5 bytes, having two extents, one
  # with a size of 128K and another one with a size of 128K + 5 bytes.
  last_ext_size=$((128 * 1024 + 5))
  xfs_io -f -d -c "pwrite -S 0xab -b 128K 0 128K" \
         -c "pwrite -S 0xcd -b $last_ext_size 128K $last_ext_size" \
         $MNT/foo

  # Another file which we will later clone foo into, but initially with
  # a larger size than foo.
  xfs_io -f -c "pwrite -S 0xef 0 1M" $MNT/bar

  btrfs subvolume snapshot -r $MNT/ $MNT/snap1

  # Now resize bar and clone foo into it.
  xfs_io -c "truncate 0" \
         -c "reflink $MNT/foo" $MNT/bar

  btrfs subvolume snapshot -r $MNT/ $MNT/snap2

  rm -f /tmp/send-full /tmp/send-inc
  btrfs send -f /tmp/send-full $MNT/snap1
  btrfs send -p $MNT/snap1 -f /tmp/send-inc $MNT/snap2

  umount $MNT
  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  btrfs receive -f /tmp/send-full $MNT
  btrfs receive -f /tmp/send-inc $MNT

  umount $MNT

Running it before this patch:

  $ ./test.sh
  (...)
  At subvol snap1
  At snapshot snap2
  ERROR: failed to clone extents to bar: Invalid argument

A test case for fstests will be sent soon.

Reported-by: Ben Millwood <thebenmachine@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAJhrHS2z+WViO2h=ojYvBPDLsATwLbg+7JaNCyYomv0fUxEpQQ@mail.gmail.com/
Fixes: 46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/send.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 5871ca845b0e..3ee27840a95a 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6189,8 +6189,25 @@ static int send_write_or_clone(struct send_ctx *sctx,
 	if (ret < 0)
 		return ret;
 
-	if (clone_root->offset + num_bytes == info.size)
+	if (clone_root->offset + num_bytes == info.size) {
+		/*
+		 * The final size of our file matches the end offset, but it may
+		 * be that its current size is larger, so we have to truncate it
+		 * to that size (or to the start offset of the extent) otherwise
+		 * the clone operation is invalid because it's unaligned and it
+		 * ends before the current EOF.
+		 * We do this truncate when we finish processing the inode, but
+		 * it's too late by then, so we must do it now.
+		 */
+		if (sctx->parent_root != NULL) {
+			ret = send_truncate(sctx, sctx->cur_ino,
+					    sctx->cur_inode_gen,
+					    sctx->cur_inode_size);
+			if (ret < 0)
+				return ret;
+		}
 		goto clone_data;
+	}
 
 write_data:
 	ret = send_extent_data(sctx, path, offset, num_bytes);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] btrfs: send: fix invalid clone operation for file that got its size decreased
  2024-09-27 10:25 [PATCH] btrfs: send: fix invalid clone operation for file that got its size decreased fdmanana
@ 2024-09-27 10:33 ` Qu Wenruo
  2024-09-27 10:36   ` Filipe Manana
  2024-09-27 11:03 ` [PATCH v2] " fdmanana
  1 sibling, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2024-09-27 10:33 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



在 2024/9/27 19:55, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
>
> During an incremental send we may end up sending an invalid clone
> operation, for the last extent of a file which ends at an unaligned offset
> that matches the final i_size of the file in the send snapshot, in case
> the file had its initial size (the size in the parent snapshot) decreased
> in the send snapshot. In this case the destination will fail to apply the
> clone operation because its end offset is not sector size aligned and it
> ends before the current size of the file.
>
> Sending the truncate operation always happens when we finish processing an
> inode, after we process all its extents (and xattrs, names, etc). So fix
> this by ensuring the file has the correct size before we send a clone
> operation for an unaligned extent that ends at the final i_size of the file.
>
> The following test reproduces the issue:
>
>    $ cat test.sh
>    #!/bin/bash
>
>    DEV=/dev/sdi
>    MNT=/mnt/sdi
>
>    mkfs.btrfs -f $DEV
>    mount $DEV $MNT
>
>    # Create a file with a size of 256K + 5 bytes, having two extents, one
>    # with a size of 128K and another one with a size of 128K + 5 bytes.
>    last_ext_size=$((128 * 1024 + 5))
>    xfs_io -f -d -c "pwrite -S 0xab -b 128K 0 128K" \
>           -c "pwrite -S 0xcd -b $last_ext_size 128K $last_ext_size" \
>           $MNT/foo
>
>    # Another file which we will later clone foo into, but initially with
>    # a larger size than foo.
>    xfs_io -f -c "pwrite -S 0xef 0 1M" $MNT/bar
>
>    btrfs subvolume snapshot -r $MNT/ $MNT/snap1
>
>    # Now resize bar and clone foo into it.
>    xfs_io -c "truncate 0" \
>           -c "reflink $MNT/foo" $MNT/bar
>
>    btrfs subvolume snapshot -r $MNT/ $MNT/snap2
>
>    rm -f /tmp/send-full /tmp/send-inc
>    btrfs send -f /tmp/send-full $MNT/snap1
>    btrfs send -p $MNT/snap1 -f /tmp/send-inc $MNT/snap2
>
>    umount $MNT
>    mkfs.btrfs -f $DEV
>    mount $DEV $MNT
>
>    btrfs receive -f /tmp/send-full $MNT
>    btrfs receive -f /tmp/send-inc $MNT
>
>    umount $MNT
>
> Running it before this patch:
>
>    $ ./test.sh
>    (...)
>    At subvol snap1
>    At snapshot snap2
>    ERROR: failed to clone extents to bar: Invalid argument
>
> A test case for fstests will be sent soon.
>
> Reported-by: Ben Millwood <thebenmachine@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CAJhrHS2z+WViO2h=ojYvBPDLsATwLbg+7JaNCyYomv0fUxEpQQ@mail.gmail.com/
> Fixes: 46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Just a small nitpick inlined below.
> ---
>   fs/btrfs/send.c | 19 ++++++++++++++++++-
>   1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 5871ca845b0e..3ee27840a95a 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -6189,8 +6189,25 @@ static int send_write_or_clone(struct send_ctx *sctx,
>   	if (ret < 0)
>   		return ret;
>
> -	if (clone_root->offset + num_bytes == info.size)
> +	if (clone_root->offset + num_bytes == info.size) {
> +		/*
> +		 * The final size of our file matches the end offset, but it may
> +		 * be that its current size is larger, so we have to truncate it
> +		 * to that size (or to the start offset of the extent) otherwise
> +		 * the clone operation is invalid because it's unaligned and it
> +		 * ends before the current EOF.
> +		 * We do this truncate when we finish processing the inode, but
> +		 * it's too late by then, so we must do it now.
> +		 */
> +		if (sctx->parent_root != NULL) {
> +			ret = send_truncate(sctx, sctx->cur_ino,
> +					    sctx->cur_inode_gen,
> +					    sctx->cur_inode_size);

I know this is a little overkilled, but can we just truncate the inode
size to round_down(cur_inode_size, secotrsize)?

As the truncate will still dirty the last sector, and later clone() will
writeback the range covering the last sector, causing unnecessary IO for
the sector we are going to drop immediately.

Thanks,
Qu

> +			if (ret < 0)
> +				return ret;
> +		}
>   		goto clone_data;
> +	}
>
>   write_data:
>   	ret = send_extent_data(sctx, path, offset, num_bytes);


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] btrfs: send: fix invalid clone operation for file that got its size decreased
  2024-09-27 10:33 ` Qu Wenruo
@ 2024-09-27 10:36   ` Filipe Manana
  2024-09-27 10:38     ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Filipe Manana @ 2024-09-27 10:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Sep 27, 2024 at 11:34 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> 在 2024/9/27 19:55, fdmanana@kernel.org 写道:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > During an incremental send we may end up sending an invalid clone
> > operation, for the last extent of a file which ends at an unaligned offset
> > that matches the final i_size of the file in the send snapshot, in case
> > the file had its initial size (the size in the parent snapshot) decreased
> > in the send snapshot. In this case the destination will fail to apply the
> > clone operation because its end offset is not sector size aligned and it
> > ends before the current size of the file.
> >
> > Sending the truncate operation always happens when we finish processing an
> > inode, after we process all its extents (and xattrs, names, etc). So fix
> > this by ensuring the file has the correct size before we send a clone
> > operation for an unaligned extent that ends at the final i_size of the file.
> >
> > The following test reproduces the issue:
> >
> >    $ cat test.sh
> >    #!/bin/bash
> >
> >    DEV=/dev/sdi
> >    MNT=/mnt/sdi
> >
> >    mkfs.btrfs -f $DEV
> >    mount $DEV $MNT
> >
> >    # Create a file with a size of 256K + 5 bytes, having two extents, one
> >    # with a size of 128K and another one with a size of 128K + 5 bytes.
> >    last_ext_size=$((128 * 1024 + 5))
> >    xfs_io -f -d -c "pwrite -S 0xab -b 128K 0 128K" \
> >           -c "pwrite -S 0xcd -b $last_ext_size 128K $last_ext_size" \
> >           $MNT/foo
> >
> >    # Another file which we will later clone foo into, but initially with
> >    # a larger size than foo.
> >    xfs_io -f -c "pwrite -S 0xef 0 1M" $MNT/bar
> >
> >    btrfs subvolume snapshot -r $MNT/ $MNT/snap1
> >
> >    # Now resize bar and clone foo into it.
> >    xfs_io -c "truncate 0" \
> >           -c "reflink $MNT/foo" $MNT/bar
> >
> >    btrfs subvolume snapshot -r $MNT/ $MNT/snap2
> >
> >    rm -f /tmp/send-full /tmp/send-inc
> >    btrfs send -f /tmp/send-full $MNT/snap1
> >    btrfs send -p $MNT/snap1 -f /tmp/send-inc $MNT/snap2
> >
> >    umount $MNT
> >    mkfs.btrfs -f $DEV
> >    mount $DEV $MNT
> >
> >    btrfs receive -f /tmp/send-full $MNT
> >    btrfs receive -f /tmp/send-inc $MNT
> >
> >    umount $MNT
> >
> > Running it before this patch:
> >
> >    $ ./test.sh
> >    (...)
> >    At subvol snap1
> >    At snapshot snap2
> >    ERROR: failed to clone extents to bar: Invalid argument
> >
> > A test case for fstests will be sent soon.
> >
> > Reported-by: Ben Millwood <thebenmachine@gmail.com>
> > Link: https://lore.kernel.org/linux-btrfs/CAJhrHS2z+WViO2h=ojYvBPDLsATwLbg+7JaNCyYomv0fUxEpQQ@mail.gmail.com/
> > Fixes: 46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Reviewed-by: Qu Wenruo <wqu@suse.com>
>
> Just a small nitpick inlined below.
> > ---
> >   fs/btrfs/send.c | 19 ++++++++++++++++++-
> >   1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > index 5871ca845b0e..3ee27840a95a 100644
> > --- a/fs/btrfs/send.c
> > +++ b/fs/btrfs/send.c
> > @@ -6189,8 +6189,25 @@ static int send_write_or_clone(struct send_ctx *sctx,
> >       if (ret < 0)
> >               return ret;
> >
> > -     if (clone_root->offset + num_bytes == info.size)
> > +     if (clone_root->offset + num_bytes == info.size) {
> > +             /*
> > +              * The final size of our file matches the end offset, but it may
> > +              * be that its current size is larger, so we have to truncate it
> > +              * to that size (or to the start offset of the extent) otherwise
> > +              * the clone operation is invalid because it's unaligned and it
> > +              * ends before the current EOF.
> > +              * We do this truncate when we finish processing the inode, but
> > +              * it's too late by then, so we must do it now.
> > +              */
> > +             if (sctx->parent_root != NULL) {
> > +                     ret = send_truncate(sctx, sctx->cur_ino,
> > +                                         sctx->cur_inode_gen,
> > +                                         sctx->cur_inode_size);
>
> I know this is a little overkilled, but can we just truncate the inode
> size to round_down(cur_inode_size, secotrsize)?
>
> As the truncate will still dirty the last sector, and later clone() will
> writeback the range covering the last sector, causing unnecessary IO for
> the sector we are going to drop immediately.

For that it's more logical to truncate to the start offset which is
always aligned.
I'll do that.

Thanks.

>
> Thanks,
> Qu
>
> > +                     if (ret < 0)
> > +                             return ret;
> > +             }
> >               goto clone_data;
> > +     }
> >
> >   write_data:
> >       ret = send_extent_data(sctx, path, offset, num_bytes);
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] btrfs: send: fix invalid clone operation for file that got its size decreased
  2024-09-27 10:36   ` Filipe Manana
@ 2024-09-27 10:38     ` Qu Wenruo
  0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-09-27 10:38 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs



在 2024/9/27 20:06, Filipe Manana 写道:
> On Fri, Sep 27, 2024 at 11:34 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> 在 2024/9/27 19:55, fdmanana@kernel.org 写道:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> During an incremental send we may end up sending an invalid clone
>>> operation, for the last extent of a file which ends at an unaligned offset
>>> that matches the final i_size of the file in the send snapshot, in case
>>> the file had its initial size (the size in the parent snapshot) decreased
>>> in the send snapshot. In this case the destination will fail to apply the
>>> clone operation because its end offset is not sector size aligned and it
>>> ends before the current size of the file.
>>>
>>> Sending the truncate operation always happens when we finish processing an
>>> inode, after we process all its extents (and xattrs, names, etc). So fix
>>> this by ensuring the file has the correct size before we send a clone
>>> operation for an unaligned extent that ends at the final i_size of the file.
>>>
>>> The following test reproduces the issue:
>>>
>>>     $ cat test.sh
>>>     #!/bin/bash
>>>
>>>     DEV=/dev/sdi
>>>     MNT=/mnt/sdi
>>>
>>>     mkfs.btrfs -f $DEV
>>>     mount $DEV $MNT
>>>
>>>     # Create a file with a size of 256K + 5 bytes, having two extents, one
>>>     # with a size of 128K and another one with a size of 128K + 5 bytes.
>>>     last_ext_size=$((128 * 1024 + 5))
>>>     xfs_io -f -d -c "pwrite -S 0xab -b 128K 0 128K" \
>>>            -c "pwrite -S 0xcd -b $last_ext_size 128K $last_ext_size" \
>>>            $MNT/foo
>>>
>>>     # Another file which we will later clone foo into, but initially with
>>>     # a larger size than foo.
>>>     xfs_io -f -c "pwrite -S 0xef 0 1M" $MNT/bar
>>>
>>>     btrfs subvolume snapshot -r $MNT/ $MNT/snap1
>>>
>>>     # Now resize bar and clone foo into it.
>>>     xfs_io -c "truncate 0" \
>>>            -c "reflink $MNT/foo" $MNT/bar
>>>
>>>     btrfs subvolume snapshot -r $MNT/ $MNT/snap2
>>>
>>>     rm -f /tmp/send-full /tmp/send-inc
>>>     btrfs send -f /tmp/send-full $MNT/snap1
>>>     btrfs send -p $MNT/snap1 -f /tmp/send-inc $MNT/snap2
>>>
>>>     umount $MNT
>>>     mkfs.btrfs -f $DEV
>>>     mount $DEV $MNT
>>>
>>>     btrfs receive -f /tmp/send-full $MNT
>>>     btrfs receive -f /tmp/send-inc $MNT
>>>
>>>     umount $MNT
>>>
>>> Running it before this patch:
>>>
>>>     $ ./test.sh
>>>     (...)
>>>     At subvol snap1
>>>     At snapshot snap2
>>>     ERROR: failed to clone extents to bar: Invalid argument
>>>
>>> A test case for fstests will be sent soon.
>>>
>>> Reported-by: Ben Millwood <thebenmachine@gmail.com>
>>> Link: https://lore.kernel.org/linux-btrfs/CAJhrHS2z+WViO2h=ojYvBPDLsATwLbg+7JaNCyYomv0fUxEpQQ@mail.gmail.com/
>>> Fixes: 46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>
>> Reviewed-by: Qu Wenruo <wqu@suse.com>
>>
>> Just a small nitpick inlined below.
>>> ---
>>>    fs/btrfs/send.c | 19 ++++++++++++++++++-
>>>    1 file changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>> index 5871ca845b0e..3ee27840a95a 100644
>>> --- a/fs/btrfs/send.c
>>> +++ b/fs/btrfs/send.c
>>> @@ -6189,8 +6189,25 @@ static int send_write_or_clone(struct send_ctx *sctx,
>>>        if (ret < 0)
>>>                return ret;
>>>
>>> -     if (clone_root->offset + num_bytes == info.size)
>>> +     if (clone_root->offset + num_bytes == info.size) {
>>> +             /*
>>> +              * The final size of our file matches the end offset, but it may
>>> +              * be that its current size is larger, so we have to truncate it
>>> +              * to that size (or to the start offset of the extent) otherwise
>>> +              * the clone operation is invalid because it's unaligned and it
>>> +              * ends before the current EOF.
>>> +              * We do this truncate when we finish processing the inode, but
>>> +              * it's too late by then, so we must do it now.
>>> +              */
>>> +             if (sctx->parent_root != NULL) {
>>> +                     ret = send_truncate(sctx, sctx->cur_ino,
>>> +                                         sctx->cur_inode_gen,
>>> +                                         sctx->cur_inode_size);
>>
>> I know this is a little overkilled, but can we just truncate the inode
>> size to round_down(cur_inode_size, secotrsize)?
>>
>> As the truncate will still dirty the last sector, and later clone() will
>> writeback the range covering the last sector, causing unnecessary IO for
>> the sector we are going to drop immediately.
>
> For that it's more logical to truncate to the start offset which is
> always aligned.

Right, that's more reasonable.

Thanks for the fix.
Qu

> I'll do that.
>
> Thanks.
>
>>
>> Thanks,
>> Qu
>>
>>> +                     if (ret < 0)
>>> +                             return ret;
>>> +             }
>>>                goto clone_data;
>>> +     }
>>>
>>>    write_data:
>>>        ret = send_extent_data(sctx, path, offset, num_bytes);
>>
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] btrfs: send: fix invalid clone operation for file that got its size decreased
  2024-09-27 10:25 [PATCH] btrfs: send: fix invalid clone operation for file that got its size decreased fdmanana
  2024-09-27 10:33 ` Qu Wenruo
@ 2024-09-27 11:03 ` fdmanana
  2024-09-30 18:11   ` Josef Bacik
                     ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: fdmanana @ 2024-09-27 11:03 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

During an incremental send we may end up sending an invalid clone
operation, for the last extent of a file which ends at an unaligned offset
that matches the final i_size of the file in the send snapshot, in case
the file had its initial size (the size in the parent snapshot) decreased
in the send snapshot. In this case the destination will fail to apply the
clone operation because its end offset is not sector size aligned and it
ends before the current size of the file.

Sending the truncate operation always happens when we finish processing an
inode, after we process all its extents (and xattrs, names, etc). So fix
this by ensuring the file has a valid size before we send a clone
operation for an unaligned extent that ends at the final i_size of the
file. The size we truncate to matches the start offset of the clone range
but it could be any value between that start offset and the final size of
the file since the clone operation will expand the i_size if the current
size is smaller than the end offset. The start offset of the range was
chosen because it's always sector size aligned and avoids a truncation
into the middle of a page, which results in dirtying the page due to
filling part of it with zeroes and then making the clone operation at the
receiver trigger IO.

The following test reproduces the issue:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  # Create a file with a size of 256K + 5 bytes, having two extents, one
  # with a size of 128K and another one with a size of 128K + 5 bytes.
  last_ext_size=$((128 * 1024 + 5))
  xfs_io -f -d -c "pwrite -S 0xab -b 128K 0 128K" \
         -c "pwrite -S 0xcd -b $last_ext_size 128K $last_ext_size" \
         $MNT/foo

  # Another file which we will later clone foo into, but initially with
  # a larger size than foo.
  xfs_io -f -c "pwrite -S 0xef 0 1M" $MNT/bar

  btrfs subvolume snapshot -r $MNT/ $MNT/snap1

  # Now resize bar and clone foo into it.
  xfs_io -c "truncate 0" \
         -c "reflink $MNT/foo" $MNT/bar

  btrfs subvolume snapshot -r $MNT/ $MNT/snap2

  rm -f /tmp/send-full /tmp/send-inc
  btrfs send -f /tmp/send-full $MNT/snap1
  btrfs send -p $MNT/snap1 -f /tmp/send-inc $MNT/snap2

  umount $MNT
  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  btrfs receive -f /tmp/send-full $MNT
  btrfs receive -f /tmp/send-inc $MNT

  umount $MNT

Running it before this patch:

  $ ./test.sh
  (...)
  At subvol snap1
  At snapshot snap2
  ERROR: failed to clone extents to bar: Invalid argument

A test case for fstests will be sent soon.

Reported-by: Ben Millwood <thebenmachine@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAJhrHS2z+WViO2h=ojYvBPDLsATwLbg+7JaNCyYomv0fUxEpQQ@mail.gmail.com/
Fixes: 46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/send.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 5871ca845b0e..27306d98ec43 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6189,8 +6189,29 @@ static int send_write_or_clone(struct send_ctx *sctx,
 	if (ret < 0)
 		return ret;

-	if (clone_root->offset + num_bytes == info.size)
+	if (clone_root->offset + num_bytes == info.size) {
+		/*
+		 * The final size of our file matches the end offset, but it may
+		 * be that its current size is larger, so we have to truncate it
+		 * to any value between the start offset of the range and the
+		 * final i_size, otherwise the clone operation is invalid
+		 * because it's unaligned and it ends before the current EOF.
+		 * We do this truncate to the final i_size when we finish
+		 * processing the inode, but it's too late by then. And here we
+		 * truncate to the start offset of the range because it's always
+		 * sector size aligned while if it were the final i_size it
+		 * would result in dirtying part of a page, filling part of a
+		 * page with zeroes and then having the clone operation at the
+		 * receiver trigger IO and wait for it due to the dirty page.
+		 */
+		if (sctx->parent_root != NULL) {
+			ret = send_truncate(sctx, sctx->cur_ino,
+					    sctx->cur_inode_gen, offset);
+			if (ret < 0)
+				return ret;
+		}
 		goto clone_data;
+	}

 write_data:
 	ret = send_extent_data(sctx, path, offset, num_bytes);
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] btrfs: send: fix invalid clone operation for file that got its size decreased
  2024-09-27 11:03 ` [PATCH v2] " fdmanana
@ 2024-09-30 18:11   ` Josef Bacik
  2024-09-30 22:17   ` Qu Wenruo
  2024-11-13 13:07   ` Markus
  2 siblings, 0 replies; 10+ messages in thread
From: Josef Bacik @ 2024-09-30 18:11 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Fri, Sep 27, 2024 at 12:03:55PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> During an incremental send we may end up sending an invalid clone
> operation, for the last extent of a file which ends at an unaligned offset
> that matches the final i_size of the file in the send snapshot, in case
> the file had its initial size (the size in the parent snapshot) decreased
> in the send snapshot. In this case the destination will fail to apply the
> clone operation because its end offset is not sector size aligned and it
> ends before the current size of the file.
> 
> Sending the truncate operation always happens when we finish processing an
> inode, after we process all its extents (and xattrs, names, etc). So fix
> this by ensuring the file has a valid size before we send a clone
> operation for an unaligned extent that ends at the final i_size of the
> file. The size we truncate to matches the start offset of the clone range
> but it could be any value between that start offset and the final size of
> the file since the clone operation will expand the i_size if the current
> size is smaller than the end offset. The start offset of the range was
> chosen because it's always sector size aligned and avoids a truncation
> into the middle of a page, which results in dirtying the page due to
> filling part of it with zeroes and then making the clone operation at the
> receiver trigger IO.
> 
> The following test reproduces the issue:
> 
>   $ cat test.sh
>   #!/bin/bash
> 
>   DEV=/dev/sdi
>   MNT=/mnt/sdi
> 
>   mkfs.btrfs -f $DEV
>   mount $DEV $MNT
> 
>   # Create a file with a size of 256K + 5 bytes, having two extents, one
>   # with a size of 128K and another one with a size of 128K + 5 bytes.
>   last_ext_size=$((128 * 1024 + 5))
>   xfs_io -f -d -c "pwrite -S 0xab -b 128K 0 128K" \
>          -c "pwrite -S 0xcd -b $last_ext_size 128K $last_ext_size" \
>          $MNT/foo
> 
>   # Another file which we will later clone foo into, but initially with
>   # a larger size than foo.
>   xfs_io -f -c "pwrite -S 0xef 0 1M" $MNT/bar
> 
>   btrfs subvolume snapshot -r $MNT/ $MNT/snap1
> 
>   # Now resize bar and clone foo into it.
>   xfs_io -c "truncate 0" \
>          -c "reflink $MNT/foo" $MNT/bar
> 
>   btrfs subvolume snapshot -r $MNT/ $MNT/snap2
> 
>   rm -f /tmp/send-full /tmp/send-inc
>   btrfs send -f /tmp/send-full $MNT/snap1
>   btrfs send -p $MNT/snap1 -f /tmp/send-inc $MNT/snap2
> 
>   umount $MNT
>   mkfs.btrfs -f $DEV
>   mount $DEV $MNT
> 
>   btrfs receive -f /tmp/send-full $MNT
>   btrfs receive -f /tmp/send-inc $MNT
> 
>   umount $MNT
> 
> Running it before this patch:
> 
>   $ ./test.sh
>   (...)
>   At subvol snap1
>   At snapshot snap2
>   ERROR: failed to clone extents to bar: Invalid argument
> 
> A test case for fstests will be sent soon.
> 
> Reported-by: Ben Millwood <thebenmachine@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CAJhrHS2z+WViO2h=ojYvBPDLsATwLbg+7JaNCyYomv0fUxEpQQ@mail.gmail.com/
> Fixes: 46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] btrfs: send: fix invalid clone operation for file that got its size decreased
  2024-09-27 11:03 ` [PATCH v2] " fdmanana
  2024-09-30 18:11   ` Josef Bacik
@ 2024-09-30 22:17   ` Qu Wenruo
  2024-11-13 13:07   ` Markus
  2 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-09-30 22:17 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



在 2024/9/27 20:33, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
>
> During an incremental send we may end up sending an invalid clone
> operation, for the last extent of a file which ends at an unaligned offset
> that matches the final i_size of the file in the send snapshot, in case
> the file had its initial size (the size in the parent snapshot) decreased
> in the send snapshot. In this case the destination will fail to apply the
> clone operation because its end offset is not sector size aligned and it
> ends before the current size of the file.
>
> Sending the truncate operation always happens when we finish processing an
> inode, after we process all its extents (and xattrs, names, etc). So fix
> this by ensuring the file has a valid size before we send a clone
> operation for an unaligned extent that ends at the final i_size of the
> file. The size we truncate to matches the start offset of the clone range
> but it could be any value between that start offset and the final size of
> the file since the clone operation will expand the i_size if the current
> size is smaller than the end offset. The start offset of the range was
> chosen because it's always sector size aligned and avoids a truncation
> into the middle of a page, which results in dirtying the page due to
> filling part of it with zeroes and then making the clone operation at the
> receiver trigger IO.
>
> The following test reproduces the issue:
>
>    $ cat test.sh
>    #!/bin/bash
>
>    DEV=/dev/sdi
>    MNT=/mnt/sdi
>
>    mkfs.btrfs -f $DEV
>    mount $DEV $MNT
>
>    # Create a file with a size of 256K + 5 bytes, having two extents, one
>    # with a size of 128K and another one with a size of 128K + 5 bytes.
>    last_ext_size=$((128 * 1024 + 5))
>    xfs_io -f -d -c "pwrite -S 0xab -b 128K 0 128K" \
>           -c "pwrite -S 0xcd -b $last_ext_size 128K $last_ext_size" \
>           $MNT/foo
>
>    # Another file which we will later clone foo into, but initially with
>    # a larger size than foo.
>    xfs_io -f -c "pwrite -S 0xef 0 1M" $MNT/bar
>
>    btrfs subvolume snapshot -r $MNT/ $MNT/snap1
>
>    # Now resize bar and clone foo into it.
>    xfs_io -c "truncate 0" \
>           -c "reflink $MNT/foo" $MNT/bar
>
>    btrfs subvolume snapshot -r $MNT/ $MNT/snap2
>
>    rm -f /tmp/send-full /tmp/send-inc
>    btrfs send -f /tmp/send-full $MNT/snap1
>    btrfs send -p $MNT/snap1 -f /tmp/send-inc $MNT/snap2
>
>    umount $MNT
>    mkfs.btrfs -f $DEV
>    mount $DEV $MNT
>
>    btrfs receive -f /tmp/send-full $MNT
>    btrfs receive -f /tmp/send-inc $MNT
>
>    umount $MNT
>
> Running it before this patch:
>
>    $ ./test.sh
>    (...)
>    At subvol snap1
>    At snapshot snap2
>    ERROR: failed to clone extents to bar: Invalid argument
>
> A test case for fstests will be sent soon.
>
> Reported-by: Ben Millwood <thebenmachine@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CAJhrHS2z+WViO2h=ojYvBPDLsATwLbg+7JaNCyYomv0fUxEpQQ@mail.gmail.com/
> Fixes: 46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/send.c | 23 ++++++++++++++++++++++-
>   1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 5871ca845b0e..27306d98ec43 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -6189,8 +6189,29 @@ static int send_write_or_clone(struct send_ctx *sctx,
>   	if (ret < 0)
>   		return ret;
>
> -	if (clone_root->offset + num_bytes == info.size)
> +	if (clone_root->offset + num_bytes == info.size) {
> +		/*
> +		 * The final size of our file matches the end offset, but it may
> +		 * be that its current size is larger, so we have to truncate it
> +		 * to any value between the start offset of the range and the
> +		 * final i_size, otherwise the clone operation is invalid
> +		 * because it's unaligned and it ends before the current EOF.
> +		 * We do this truncate to the final i_size when we finish
> +		 * processing the inode, but it's too late by then. And here we
> +		 * truncate to the start offset of the range because it's always
> +		 * sector size aligned while if it were the final i_size it
> +		 * would result in dirtying part of a page, filling part of a
> +		 * page with zeroes and then having the clone operation at the
> +		 * receiver trigger IO and wait for it due to the dirty page.
> +		 */
> +		if (sctx->parent_root != NULL) {
> +			ret = send_truncate(sctx, sctx->cur_ino,
> +					    sctx->cur_inode_gen, offset);
> +			if (ret < 0)
> +				return ret;
> +		}
>   		goto clone_data;
> +	}
>
>   write_data:
>   	ret = send_extent_data(sctx, path, offset, num_bytes);


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] btrfs: send: fix invalid clone operation for file that got its size decreased
  2024-09-27 11:03 ` [PATCH v2] " fdmanana
  2024-09-30 18:11   ` Josef Bacik
  2024-09-30 22:17   ` Qu Wenruo
@ 2024-11-13 13:07   ` Markus
  2024-11-13 15:01     ` Filipe Manana
  2 siblings, 1 reply; 10+ messages in thread
From: Markus @ 2024-11-13 13:07 UTC (permalink / raw)
  To: fdmanana, linux-btrfs

Am 27.09.24 um 13:03 schrieb fdmanana@kernel.org:
> From: Filipe Manana <fdmanana@suse.com>
> 
> During an incremental send we may end up sending an invalid clone
> operation, for the last extent of a file which ends at an unaligned offset
> that matches the final i_size of the file in the send snapshot, in case
> the file had its initial size (the size in the parent snapshot) decreased
> in the send snapshot. In this case the destination will fail to apply the
> clone operation because its end offset is not sector size aligned and it
> ends before the current size of the file.
> 
> Sending the truncate operation always happens when we finish processing an
> inode, after we process all its extents (and xattrs, names, etc). So fix
> this by ensuring the file has a valid size before we send a clone
> operation for an unaligned extent that ends at the final i_size of the
> file. The size we truncate to matches the start offset of the clone range
> but it could be any value between that start offset and the final size of
> the file since the clone operation will expand the i_size if the current
> size is smaller than the end offset. The start offset of the range was
> chosen because it's always sector size aligned and avoids a truncation
> into the middle of a page, which results in dirtying the page due to
> filling part of it with zeroes and then making the clone operation at the
> receiver trigger IO.

I came across this patch/message after I had the "failed to clone
extents" problem 3-4x on Debian 12, 6.1.0-27-amd64. For us, it only
occurs since we periodically run a Btrfs balance via Cronjob.

That's why I'm wondering: Is it possible that Btrfs Balance increases
the likelihood of the problem occurring?

Best,
Markus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] btrfs: send: fix invalid clone operation for file that got its size decreased
  2024-11-13 13:07   ` Markus
@ 2024-11-13 15:01     ` Filipe Manana
  2024-11-13 16:50       ` Markus
  0 siblings, 1 reply; 10+ messages in thread
From: Filipe Manana @ 2024-11-13 15:01 UTC (permalink / raw)
  To: Markus; +Cc: linux-btrfs

On Wed, Nov 13, 2024 at 1:07 PM Markus <markus@opsone.ch> wrote:
>
> Am 27.09.24 um 13:03 schrieb fdmanana@kernel.org:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > During an incremental send we may end up sending an invalid clone
> > operation, for the last extent of a file which ends at an unaligned offset
> > that matches the final i_size of the file in the send snapshot, in case
> > the file had its initial size (the size in the parent snapshot) decreased
> > in the send snapshot. In this case the destination will fail to apply the
> > clone operation because its end offset is not sector size aligned and it
> > ends before the current size of the file.
> >
> > Sending the truncate operation always happens when we finish processing an
> > inode, after we process all its extents (and xattrs, names, etc). So fix
> > this by ensuring the file has a valid size before we send a clone
> > operation for an unaligned extent that ends at the final i_size of the
> > file. The size we truncate to matches the start offset of the clone range
> > but it could be any value between that start offset and the final size of
> > the file since the clone operation will expand the i_size if the current
> > size is smaller than the end offset. The start offset of the range was
> > chosen because it's always sector size aligned and avoids a truncation
> > into the middle of a page, which results in dirtying the page due to
> > filling part of it with zeroes and then making the clone operation at the
> > receiver trigger IO.
>
> I came across this patch/message after I had the "failed to clone
> extents" problem 3-4x on Debian 12, 6.1.0-27-amd64. For us, it only
> occurs since we periodically run a Btrfs balance via Cronjob.

I don't know what Debian's 6.1.0-27 matches, but upstream the fix went
into 6.1.113, and the bug first appeared in 6.1.107.
So for 6.1 kernels, it only affected releases between 6.1.107 and 6.1.112.

So check if that kernel corresponds to 6.1.113+, and if the issue
still happens, run 'btrfs receive' with -vv and provide the output to
help figure out if it's the same issue or something else.

>
> That's why I'm wondering: Is it possible that Btrfs Balance increases
> the likelihood of the problem occurring?

Balance doesn't make aligned extents become aligned and vice-versa (if
so it would change file sizes and cause corruption), doesn't make
extents that were not shared become shared and vice-versa, and doesn't
do any changes to the extent layout of a file. So, no, balance is
totally unrelated.

>
> Best,
> Markus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] btrfs: send: fix invalid clone operation for file that got its size decreased
  2024-11-13 15:01     ` Filipe Manana
@ 2024-11-13 16:50       ` Markus
  0 siblings, 0 replies; 10+ messages in thread
From: Markus @ 2024-11-13 16:50 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs

Am 13.11.24 um 16:01 schrieb Filipe Manana:
> On Wed, Nov 13, 2024 at 1:07 PM Markus <markus@opsone.ch> wrote:
> 
> I don't know what Debian's 6.1.0-27 matches, but upstream the fix went
> into 6.1.113, and the bug first appeared in 6.1.107.
> So for 6.1 kernels, it only affected releases between 6.1.107 and 6.1.112.
> 
> So check if that kernel corresponds to 6.1.113+, and if the issue
> still happens, run 'btrfs receive' with -vv and provide the output to
> help figure out if it's the same issue or something else.

Debian Linux Kernel 6.1.0-26-amd64 is 6.1.112. However, my problem has
just been solved with Debian 12.8 [1] released on 9 November 2024.

After the installation, I am on 6.1.115, now I can no longer reproduce
the problem with the instructions from the patch.

[1] https://lists.debian.org/debian-announce/2024/msg00008.html

> Balance doesn't make aligned extents become aligned and vice-versa (if
> so it would change file sizes and cause corruption), doesn't make
> extents that were not shared become shared and vice-versa, and doesn't
> do any changes to the extent layout of a file. So, no, balance is
> totally unrelated.

Many thanks for your explanations!

Thanks,
Markus

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-11-13 16:50 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-27 10:25 [PATCH] btrfs: send: fix invalid clone operation for file that got its size decreased fdmanana
2024-09-27 10:33 ` Qu Wenruo
2024-09-27 10:36   ` Filipe Manana
2024-09-27 10:38     ` Qu Wenruo
2024-09-27 11:03 ` [PATCH v2] " fdmanana
2024-09-30 18:11   ` Josef Bacik
2024-09-30 22:17   ` Qu Wenruo
2024-11-13 13:07   ` Markus
2024-11-13 15:01     ` Filipe Manana
2024-11-13 16:50       ` Markus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).