linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Btrfs: fix range cloning when same inode used as source and destination
@ 2015-03-31 13:56 Filipe Manana
  2015-04-02 17:25 ` [PATCH v2] " Filipe Manana
  0 siblings, 1 reply; 4+ messages in thread
From: Filipe Manana @ 2015-03-31 13:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana

While searching for extents to clone we might find one where we only use
a part of it coming from its tail. If our destination inode is the same
the source inode, we end up removing the tail part of the extent item and
insert after a new one that point to the same extent with an adjusted
key file offset and data offset. After this we search for the next extent
item in the fs/subvol tree with a key that has an offset incremented by
one. But this second search leaves us at the new extent item we inserted
previously, and since that extent item has a non-zero data offset, it
it can make us call btrfs_drop_extents with an empty range (start == end)
which causes the following warning:

[23978.537119] WARNING: CPU: 6 PID: 16251 at fs/btrfs/file.c:550 btrfs_drop_extent_cache+0x43/0x385 [btrfs]()
(...)
[23978.557266] Call Trace:
[23978.557978]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[23978.559191]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[23978.560699]  [<ffffffffa047f0ea>] ? btrfs_drop_extent_cache+0x43/0x385 [btrfs]
[23978.562389]  [<ffffffff8104544d>] warn_slowpath_null+0x1a/0x1c
[23978.563613]  [<ffffffffa047f0ea>] btrfs_drop_extent_cache+0x43/0x385 [btrfs]
[23978.565103]  [<ffffffff810e3a18>] ? time_hardirqs_off+0x15/0x28
[23978.566294]  [<ffffffff81079ff8>] ? trace_hardirqs_off+0xd/0xf
[23978.567438]  [<ffffffffa047f73d>] __btrfs_drop_extents+0x6b/0x9e1 [btrfs]
[23978.568702]  [<ffffffff8107c03f>] ? trace_hardirqs_on+0xd/0xf
[23978.569763]  [<ffffffff811441c0>] ? ____cache_alloc+0x69/0x2eb
[23978.570817]  [<ffffffff81142269>] ? virt_to_head_page+0x9/0x36
[23978.571872]  [<ffffffff81143c15>] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb
[23978.573466]  [<ffffffff811420d5>] ? kmemleak_alloc_recursive.constprop.52+0x16/0x18
[23978.574962]  [<ffffffffa0480d07>] btrfs_drop_extents+0x66/0x7f [btrfs]
[23978.576179]  [<ffffffffa049aa35>] btrfs_clone+0x516/0xaf5 [btrfs]
[23978.577311]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
[23978.578520]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
[23978.580282]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
(...)
[23978.591887] ---[ end trace 988ec2a653d03ed3 ]---

Then we attempt to insert a new extent item with a key that already
exists, which makes btrfs_insert_empty_item return -EEXIST resulting in
abortion of the current transaction:

[23978.594355] WARNING: CPU: 6 PID: 16251 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
(...)
[23978.622589] Call Trace:
[23978.623181]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[23978.624359]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[23978.625573]  [<ffffffffa044ab6c>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
[23978.626971]  [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
[23978.628003]  [<ffffffff8108a6c8>] ? vprintk_default+0x1d/0x1f
[23978.629138]  [<ffffffffa044ab6c>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
[23978.630528]  [<ffffffffa049ad1b>] btrfs_clone+0x7fc/0xaf5 [btrfs]
[23978.631635]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
[23978.632886]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
[23978.634119]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
(...)
[23978.647714] ---[ end trace 988ec2a653d03ed4 ]---

This is wrong because we should not process the extent item that we just
inserted previously, and instead process the extent item that follows it
in the tree

For example for the test case I wrote for fstests:

   bs=$((64 * 1024))
   mkfs.btrfs -f -l $bs -O ^no-holes /dev/sdc
   mount /dev/sdc /mnt

   xfs_io -f -c "pwrite -S 0xaa $(($bs * 2)) $(($bs * 2))" /mnt/foo

   $CLONER_PROG -s $((3 * $bs)) -d $((267 * $bs)) -l 0 /mnt/foo /mnt/foo
   $CLONER_PROG -s $((217 * $bs)) -d $((95 * $bs)) -l 0 /mnt/foo /mnt/foo

The second clone call fails with -EEXIST, because when we process the
first extent item (offset 262144), we drop part of it (counting from the
end) and then insert a new extent item with a key greater then the key we
found. The next time we search the tree we search for a key with offset
262144 + 1, which leaves us at the new extent item we have just inserted
but we think it refers to an extent that we need to clone.

Fix this by ensuring the next search key uses an offset corresponding to
the offset of the key we found previously plus the data length of the
corresponding extent item. This ensures we skip new extent items that we
inserted and works for the case of implicit holes too (NO_HOLES feature).

A test case for fstests follows soon.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ioctl.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 09a566a..869e39d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3205,6 +3205,8 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 	key.offset = off;
 
 	while (1) {
+		u64 next_key_min_offset;
+
 		/*
 		 * note the key will change type as we walk through the
 		 * tree.
@@ -3285,7 +3287,7 @@ process_slot:
 			} else if (key.offset >= off + len) {
 				break;
 			}
-
+			next_key_min_offset = key.offset + datal;
 			size = btrfs_item_size_nr(leaf, slot);
 			read_extent_buffer(leaf, buf,
 					   btrfs_item_ptr_offset(leaf, slot),
@@ -3500,7 +3502,7 @@ process_slot:
 				break;
 		}
 		btrfs_release_path(path);
-		key.offset++;
+		key.offset = next_key_min_offset;
 	}
 	ret = 0;
 
-- 
2.1.3


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2] Btrfs: fix range cloning when same inode used as source and destination
  2015-03-31 13:56 [PATCH] Btrfs: fix range cloning when same inode used as source and destination Filipe Manana
@ 2015-04-02 17:25 ` Filipe Manana
  2015-04-02 17:31   ` Holger Hoffstätte
  0 siblings, 1 reply; 4+ messages in thread
From: Filipe Manana @ 2015-04-02 17:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana

While searching for extents to clone we might find one where we only use
a part of it coming from its tail. If our destination inode is the same
the source inode, we end up removing the tail part of the extent item and
insert after a new one that point to the same extent with an adjusted
key file offset and data offset. After this we search for the next extent
item in the fs/subvol tree with a key that has an offset incremented by
one. But this second search leaves us at the new extent item we inserted
previously, and since that extent item has a non-zero data offset, it
it can make us call btrfs_drop_extents with an empty range (start == end)
which causes the following warning:

[23978.537119] WARNING: CPU: 6 PID: 16251 at fs/btrfs/file.c:550 btrfs_drop_extent_cache+0x43/0x385 [btrfs]()
(...)
[23978.557266] Call Trace:
[23978.557978]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[23978.559191]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[23978.560699]  [<ffffffffa047f0ea>] ? btrfs_drop_extent_cache+0x43/0x385 [btrfs]
[23978.562389]  [<ffffffff8104544d>] warn_slowpath_null+0x1a/0x1c
[23978.563613]  [<ffffffffa047f0ea>] btrfs_drop_extent_cache+0x43/0x385 [btrfs]
[23978.565103]  [<ffffffff810e3a18>] ? time_hardirqs_off+0x15/0x28
[23978.566294]  [<ffffffff81079ff8>] ? trace_hardirqs_off+0xd/0xf
[23978.567438]  [<ffffffffa047f73d>] __btrfs_drop_extents+0x6b/0x9e1 [btrfs]
[23978.568702]  [<ffffffff8107c03f>] ? trace_hardirqs_on+0xd/0xf
[23978.569763]  [<ffffffff811441c0>] ? ____cache_alloc+0x69/0x2eb
[23978.570817]  [<ffffffff81142269>] ? virt_to_head_page+0x9/0x36
[23978.571872]  [<ffffffff81143c15>] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb
[23978.573466]  [<ffffffff811420d5>] ? kmemleak_alloc_recursive.constprop.52+0x16/0x18
[23978.574962]  [<ffffffffa0480d07>] btrfs_drop_extents+0x66/0x7f [btrfs]
[23978.576179]  [<ffffffffa049aa35>] btrfs_clone+0x516/0xaf5 [btrfs]
[23978.577311]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
[23978.578520]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
[23978.580282]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
(...)
[23978.591887] ---[ end trace 988ec2a653d03ed3 ]---

Then we attempt to insert a new extent item with a key that already
exists, which makes btrfs_insert_empty_item return -EEXIST resulting in
abortion of the current transaction:

[23978.594355] WARNING: CPU: 6 PID: 16251 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
(...)
[23978.622589] Call Trace:
[23978.623181]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[23978.624359]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[23978.625573]  [<ffffffffa044ab6c>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
[23978.626971]  [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
[23978.628003]  [<ffffffff8108a6c8>] ? vprintk_default+0x1d/0x1f
[23978.629138]  [<ffffffffa044ab6c>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
[23978.630528]  [<ffffffffa049ad1b>] btrfs_clone+0x7fc/0xaf5 [btrfs]
[23978.631635]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
[23978.632886]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
[23978.634119]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
(...)
[23978.647714] ---[ end trace 988ec2a653d03ed4 ]---

This is wrong because we should not process the extent item that we just
inserted previously, and instead process the extent item that follows it
in the tree

For example for the test case I wrote for fstests:

   bs=$((64 * 1024))
   mkfs.btrfs -f -l $bs -O ^no-holes /dev/sdc
   mount /dev/sdc /mnt

   xfs_io -f -c "pwrite -S 0xaa $(($bs * 2)) $(($bs * 2))" /mnt/foo

   $CLONER_PROG -s $((3 * $bs)) -d $((267 * $bs)) -l 0 /mnt/foo /mnt/foo
   $CLONER_PROG -s $((217 * $bs)) -d $((95 * $bs)) -l 0 /mnt/foo /mnt/foo

The second clone call fails with -EEXIST, because when we process the
first extent item (offset 262144), we drop part of it (counting from the
end) and then insert a new extent item with a key greater then the key we
found. The next time we search the tree we search for a key with offset
262144 + 1, which leaves us at the new extent item we have just inserted
but we think it refers to an extent that we need to clone.

Fix this by ensuring the next search key uses an offset corresponding to
the offset of the key we found previously plus the data length of the
corresponding extent item. This ensures we skip new extent items that we
inserted and works for the case of implicit holes too (NO_HOLES feature).

A test case for fstests follows soon.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Fixed a warning about potentially uninitialized variable. David
    got this warning on a 4.5.1 gcc, but I didn't on a 4.9.2 gcc
    however.

 fs/btrfs/ioctl.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 09a566a..029f4da 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3254,6 +3254,7 @@ process_slot:
 			u64 datao = 0, datal = 0;
 			u8 comp;
 			u64 drop_start;
+			u64 next_key_min_offset;
 
 			extent = btrfs_item_ptr(leaf, slot,
 						struct btrfs_file_extent_item);
@@ -3285,7 +3286,7 @@ process_slot:
 			} else if (key.offset >= off + len) {
 				break;
 			}
-
+			next_key_min_offset = key.offset + datal;
 			size = btrfs_item_size_nr(leaf, slot);
 			read_extent_buffer(leaf, buf,
 					   btrfs_item_ptr_offset(leaf, slot),
@@ -3498,9 +3499,11 @@ process_slot:
 				goto out;
 			if (new_key.offset + datal >= destoff + len)
 				break;
+			key.offset = next_key_min_offset;
+		} else {
+			key.offset++;
 		}
 		btrfs_release_path(path);
-		key.offset++;
 	}
 	ret = 0;
 
-- 
2.1.3


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] Btrfs: fix range cloning when same inode used as source and destination
  2015-04-02 17:25 ` [PATCH v2] " Filipe Manana
@ 2015-04-02 17:31   ` Holger Hoffstätte
  2015-04-02 18:03     ` Filipe David Manana
  0 siblings, 1 reply; 4+ messages in thread
From: Holger Hoffstätte @ 2015-04-02 17:31 UTC (permalink / raw)
  To: linux-btrfs


On Thu, 02 Apr 2015 18:25:11 +0100, Filipe Manana wrote:

> V2: Fixed a warning about potentially uninitialized variable. David
>     got this warning on a 4.5.1 gcc, but I didn't on a 4.9.2 gcc
>     however.

I was *just* about to post this warning, since I saw it only a minute ago!

I assume you mean:

fs/btrfs/ioctl.c: In function 'btrfs_clone':
fs/btrfs/ioctl.c:3531:14: warning: 'next_key_min_offset' may be used uninitialized in this function [-Wmaybe-uninitialized]
   key.offset = next_key_min_offset;
              ^

..and this is with 4.9.2 here.

Anyway..thanks for being faster :)

Holger


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] Btrfs: fix range cloning when same inode used as source and destination
  2015-04-02 17:31   ` Holger Hoffstätte
@ 2015-04-02 18:03     ` Filipe David Manana
  0 siblings, 0 replies; 4+ messages in thread
From: Filipe David Manana @ 2015-04-02 18:03 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: linux-btrfs@vger.kernel.org

On Thu, Apr 2, 2015 at 6:31 PM, Holger Hoffstätte
<holger.hoffstaette@googlemail.com> wrote:
>
> On Thu, 02 Apr 2015 18:25:11 +0100, Filipe Manana wrote:
>
>> V2: Fixed a warning about potentially uninitialized variable. David
>>     got this warning on a 4.5.1 gcc, but I didn't on a 4.9.2 gcc
>>     however.
>
> I was *just* about to post this warning, since I saw it only a minute ago!
>
> I assume you mean:
>
> fs/btrfs/ioctl.c: In function 'btrfs_clone':
> fs/btrfs/ioctl.c:3531:14: warning: 'next_key_min_offset' may be used uninitialized in this function [-Wmaybe-uninitialized]
>    key.offset = next_key_min_offset;
>               ^
>
> ..and this is with 4.9.2 here.
>
> Anyway..thanks for being faster :)

Yes, that was it.
Thanks.

>
> Holger
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-04-02 18:03 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-03-31 13:56 [PATCH] Btrfs: fix range cloning when same inode used as source and destination Filipe Manana
2015-04-02 17:25 ` [PATCH v2] " Filipe Manana
2015-04-02 17:31   ` Holger Hoffstätte
2015-04-02 18:03     ` Filipe David Manana

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).