* [PATCH v3 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions
2020-08-09 12:09 [PATCH v3 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
@ 2020-08-09 12:09 ` Qu Wenruo
2020-08-11 18:42 ` Josef Bacik
2020-08-09 12:09 ` [PATCH v3 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
` (3 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-08-09 12:09 UTC (permalink / raw)
To: linux-btrfs
Although we have start, len check for extent buffer reader/write (e.g.
read_extent_buffer()), those checks has its limitations:
- No overflow check
Values like start = 1024 len = -1024 can still pass the basic
(start + len) > eb->len check.
- Checks are not consistent
For read_extent_buffer() we only check (start + len) against eb->len.
While for memcmp_extent_buffer() we also check start against eb->len.
- Different error reporting mechanism
We use WARN() in read_extent_buffer() but BUG() in
memcpy_extent_buffer().
- Still modify memory if the request is obviously wrong
In read_extent_buffer() even we find (start + len) > eb->len, we still
call memset(dst, 0, len), which can eaisly cause memory access error
if start + len overflows.
To address above problems, this patch creates a new common function to
check such access, check_eb_range().
- Add overflow check
This function checks start, start + len against eb->len and overflow
check.
- Unified checks
- Unified error reports
Will call WARN() if CONFIG_BTRFS_DEBUG is configured.
And also do btrfs_warn() message for non-debug build.
- Exit ASAP if check fails
No more possible memory corruption.
- Add extra comment for @start @len used in those functions
Even experienced developers sometimes get confused with the @start
@len with logical address in those functions.
I'm not sure what's the cause, maybe it's the extent_buffer::start
naming.
For now, just add some comment.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202817
[ Inspired by above report, the report itself is already addressed ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 76 +++++++++++++++++++++++---------------------
1 file changed, 39 insertions(+), 37 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 617ea38e6fd7..9f583ef1e387 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5620,6 +5620,28 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
return ret;
}
+/*
+ * Check if the [start, start + len) range is valid before reading/writing
+ * the eb.
+ * NOTE: @start and @len are offset *INSIDE* the eb, *NOT* logical address.
+ *
+ * Caller should not touch the dst/src memory if this function returns error.
+ */
+static int check_eb_range(const struct extent_buffer *eb, unsigned long start,
+ unsigned long len)
+{
+ /* start, start + len should not go beyond eb->len nor overflow */
+ if (unlikely(start > eb->len || start + len > eb->len ||
+ len > eb->len)) {
+ btrfs_warn(eb->fs_info,
+"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
+ eb->start, eb->len, start, len);
+ WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+ return -EINVAL;
+ }
+ return 0;
+}
+
void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
unsigned long start, unsigned long len)
{
@@ -5630,12 +5652,8 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
char *dst = (char *)dstv;
unsigned long i = start >> PAGE_SHIFT;
- if (start + len > eb->len) {
- WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
- eb->start, eb->len, start, len);
- memset(dst, 0, len);
+ if (check_eb_range(eb, start, len))
return;
- }
offset = offset_in_page(start);
@@ -5700,8 +5718,8 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
unsigned long i = start >> PAGE_SHIFT;
int ret = 0;
- WARN_ON(start > eb->len);
- WARN_ON(start + len > eb->start + eb->len);
+ if (check_eb_range(eb, start, len))
+ return -EINVAL;
offset = offset_in_page(start);
@@ -5754,8 +5772,8 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
char *src = (char *)srcv;
unsigned long i = start >> PAGE_SHIFT;
- WARN_ON(start > eb->len);
- WARN_ON(start + len > eb->start + eb->len);
+ if (check_eb_range(eb, start, len))
+ return;
offset = offset_in_page(start);
@@ -5783,8 +5801,8 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
char *kaddr;
unsigned long i = start >> PAGE_SHIFT;
- WARN_ON(start > eb->len);
- WARN_ON(start + len > eb->start + eb->len);
+ if (check_eb_range(eb, start, len))
+ return;
offset = offset_in_page(start);
@@ -5828,6 +5846,10 @@ void copy_extent_buffer(const struct extent_buffer *dst,
char *kaddr;
unsigned long i = dst_offset >> PAGE_SHIFT;
+ if (check_eb_range(dst, dst_offset, len) ||
+ check_eb_range(src, src_offset, len))
+ return;
+
WARN_ON(src->len != dst_len);
offset = offset_in_page(dst_offset);
@@ -6017,25 +6039,15 @@ void memcpy_extent_buffer(const struct extent_buffer *dst,
unsigned long dst_offset, unsigned long src_offset,
unsigned long len)
{
- struct btrfs_fs_info *fs_info = dst->fs_info;
size_t cur;
size_t dst_off_in_page;
size_t src_off_in_page;
unsigned long dst_i;
unsigned long src_i;
- if (src_offset + len > dst->len) {
- btrfs_err(fs_info,
- "memmove bogus src_offset %lu move len %lu dst len %lu",
- src_offset, len, dst->len);
- BUG();
- }
- if (dst_offset + len > dst->len) {
- btrfs_err(fs_info,
- "memmove bogus dst_offset %lu move len %lu dst len %lu",
- dst_offset, len, dst->len);
- BUG();
- }
+ if (check_eb_range(dst, dst_offset, len) ||
+ check_eb_range(dst, src_offset, len))
+ return;
while (len > 0) {
dst_off_in_page = offset_in_page(dst_offset);
@@ -6062,7 +6074,6 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
unsigned long dst_offset, unsigned long src_offset,
unsigned long len)
{
- struct btrfs_fs_info *fs_info = dst->fs_info;
size_t cur;
size_t dst_off_in_page;
size_t src_off_in_page;
@@ -6071,18 +6082,9 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
unsigned long dst_i;
unsigned long src_i;
- if (src_offset + len > dst->len) {
- btrfs_err(fs_info,
- "memmove bogus src_offset %lu move len %lu len %lu",
- src_offset, len, dst->len);
- BUG();
- }
- if (dst_offset + len > dst->len) {
- btrfs_err(fs_info,
- "memmove bogus dst_offset %lu move len %lu len %lu",
- dst_offset, len, dst->len);
- BUG();
- }
+ if (check_eb_range(dst, dst_offset, len) ||
+ check_eb_range(dst, src_offset, len))
+ return;
if (dst_offset < src_offset) {
memcpy_extent_buffer(dst, dst_offset, src_offset, len);
return;
--
2.28.0
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: [PATCH v3 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions
2020-08-09 12:09 ` [PATCH v3 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
@ 2020-08-11 18:42 ` Josef Bacik
0 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-08-11 18:42 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 8/9/20 8:09 AM, Qu Wenruo wrote:
> Although we have start, len check for extent buffer reader/write (e.g.
> read_extent_buffer()), those checks has its limitations:
> - No overflow check
> Values like start = 1024 len = -1024 can still pass the basic
> (start + len) > eb->len check.
>
> - Checks are not consistent
> For read_extent_buffer() we only check (start + len) against eb->len.
> While for memcmp_extent_buffer() we also check start against eb->len.
>
> - Different error reporting mechanism
> We use WARN() in read_extent_buffer() but BUG() in
> memcpy_extent_buffer().
>
> - Still modify memory if the request is obviously wrong
> In read_extent_buffer() even we find (start + len) > eb->len, we still
> call memset(dst, 0, len), which can eaisly cause memory access error
> if start + len overflows.
>
> To address above problems, this patch creates a new common function to
> check such access, check_eb_range().
> - Add overflow check
> This function checks start, start + len against eb->len and overflow
> check.
>
> - Unified checks
>
> - Unified error reports
> Will call WARN() if CONFIG_BTRFS_DEBUG is configured.
> And also do btrfs_warn() message for non-debug build.
>
> - Exit ASAP if check fails
> No more possible memory corruption.
>
> - Add extra comment for @start @len used in those functions
> Even experienced developers sometimes get confused with the @start
> @len with logical address in those functions.
> I'm not sure what's the cause, maybe it's the extent_buffer::start
> naming.
> For now, just add some comment.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202817
> [ Inspired by above report, the report itself is already addressed ]
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
2020-08-09 12:09 [PATCH v3 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
2020-08-09 12:09 ` [PATCH v3 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
@ 2020-08-09 12:09 ` Qu Wenruo
2020-08-11 18:45 ` Josef Bacik
2020-08-09 12:09 ` [PATCH v3 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations Qu Wenruo
` (2 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-08-09 12:09 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
__btrfs_free_extent() is one of the best cases to show how optimization
could make a function hard to read.
In fact __btrfs_free_extent() is only doing two major works:
1. Reduce the refs number of an extent backref
Either it's an inlined extent backref (inside EXTENT/METADATA item) or
a keyed extent backref (SHARED_* item).
We only need to locate that backref line, either reduce the number or
remove the backref line completely.
2. Update the refs count in EXTENT/METADATA_ITEM
But in real world, we do it in a complex but somewhat efficient way.
During step 1), we will try to locate the EXTENT/METADATA_ITEM without
triggering another btrfs_search_slot() as fast path.
Only when we failed to locate that item, we will trigger another
btrfs_search_slot() to get that EXTENT/METADATA_ITEM after we
updated/deleted the backref line.
And we have a lot of restrict check on things like refs_to_drop against
extent refs and special case check for single ref extent.
All of these results:
- 7 BUG_ON()s in a single function
Although all these BUG_ON() are doing correct check, they're super
easy to get triggered for fuzzed images.
It's never a good idea to piss the end user.
- Near 300 lines without much useful comments but a lot of hidden
conditions
I believe even the author needs several minutes to recall what the
code is doing
Not to mention a lot of BUG_ON() conditions needs to go back tens of
lines to find out why.
This patch address all these problems by:
- Introduce two examples to show what __btrfs_free_extent() is doing
One inlined backref case and one keyed case.
Should cover most cases.
- Kill all BUG_ON()s with proper error message and optional leaf dump
- Add comment to show the overall workflow
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202819
[ The report triggers one BUG_ON() in __btrfs_free_extent() ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/extent-tree.c | 144 +++++++++++++++++++++++++++++++++++++----
1 file changed, 131 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fa7d83051587..8e86e3524861 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2930,6 +2930,53 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
return 0;
}
+/*
+ * Real work happens here to drop one or more refs of @node.
+ *
+ * The work is mostly done in two parts:
+ * 1. Locate the extent refs.
+ * It's either inlined in EXTENT/METADATA_ITEM or in keyed SHARED_* item.
+ * Locate it then reduces the refs number or remove the ref line completely.
+ *
+ * 2. Update the refs count in EXTENT/METADATA_ITEM
+ *
+ * Due to the above two operations and possible optimizations, the function
+ * is a little hard to read, but with the following examples, the result
+ * of this function should be pretty easy to get.
+ *
+ * Example:
+ * *** Inlined backref case ***
+ * In extent tree we have:
+ * item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 16201 itemsize 82
+ * refs 2 gen 6 flags DATA
+ * extent data backref root FS_TREE objectid 258 offset 0 count 1
+ * extent data backref root FS_TREE objectid 257 offset 0 count 1
+ *
+ * This function get called with
+ * node->bytenr = 13631488, node->num_bytes = 1048576
+ * root_objectid = FS_TREE, owner_objectid = 257, owner_offset = 0,
+ * refs_to_drop = 1
+ * Then we should get some like:
+ * item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 16201 itemsize 82
+ * refs 1 gen 6 flags DATA
+ * extent data backref root FS_TREE objectid 258 offset 0 count 1
+ *
+ * *** Keyed backref case ***
+ * In extent tree we have:
+ * item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 3971 itemsize 24
+ * refs 754 gen 6 flags DATA
+ * [...]
+ * item 2 key (13631488 EXTENT_DATA_REF <HASH>) itemoff 3915 itemsize 28
+ * extent data backref root FS_TREE objectid 866 offset 0 count 1
+ * This function get called with
+ * node->bytenr = 13631488, node->num_bytes = 1048576
+ * root_objectid = FS_TREE, owner_objectid = 866, owner_offset = 0,
+ * refs_to_drop = 1
+ * Then we should get some like:
+ * item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 3971 itemsize 24
+ * refs 753 gen 6 flags DATA
+ * And that (13631488 EXTENT_DATA_REF <HASH>) get removed.
+ */
static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
struct btrfs_delayed_ref_node *node, u64 parent,
u64 root_objectid, u64 owner_objectid,
@@ -2962,7 +3009,15 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
path->leave_spinning = 1;
is_data = owner_objectid >= BTRFS_FIRST_FREE_OBJECTID;
- BUG_ON(!is_data && refs_to_drop != 1);
+
+ if (unlikely(!is_data && refs_to_drop != 1)) {
+ btrfs_crit(info,
+"invalid refs_to_drop, dropping more than 1 refs for tree block %llu refs_to_drop %u",
+ node->bytenr, refs_to_drop);
+ ret = -EINVAL;
+ btrfs_abort_transaction(trans, ret);
+ goto out;
+ }
if (is_data)
skinny_metadata = false;
@@ -2971,6 +3026,13 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
parent, root_objectid, owner_objectid,
owner_offset);
if (ret == 0) {
+ /*
+ * Either the inline backref or the SHARED_DATA_REF/
+ * SHARED_BLOCK_REF is found
+ *
+ * Here is a quick path to locate EXTENT/METADATA_ITEM.
+ * It's possible the EXTENT/METADATA_ITEM is near current slot.
+ */
extent_slot = path->slots[0];
while (extent_slot >= 0) {
btrfs_item_key_to_cpu(path->nodes[0], &key,
@@ -2987,13 +3049,20 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
found_extent = 1;
break;
}
+
+ /* Quick path didn't find the EXTEMT/METADATA_ITEM */
if (path->slots[0] - extent_slot > 5)
break;
extent_slot--;
}
if (!found_extent) {
- BUG_ON(iref);
+ if (unlikely(iref)) {
+ btrfs_crit(info,
+"invalid iref, no EXTENT/METADATA_ITEM found but has inline extent ref");
+ goto err_dump_abort;
+ }
+ /* Must be SHARED_* item, remove the backref first */
ret = remove_extent_backref(trans, path, NULL,
refs_to_drop,
is_data, &last_ref);
@@ -3004,6 +3073,8 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
btrfs_release_path(path);
path->leave_spinning = 1;
+
+ /* Slow path to locate EXTENT/METADATA_ITEM */
key.objectid = bytenr;
key.type = BTRFS_EXTENT_ITEM_KEY;
key.offset = num_bytes;
@@ -3078,19 +3149,24 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
if (owner_objectid < BTRFS_FIRST_FREE_OBJECTID &&
key.type == BTRFS_EXTENT_ITEM_KEY) {
struct btrfs_tree_block_info *bi;
- BUG_ON(item_size < sizeof(*ei) + sizeof(*bi));
+ if (unlikely(item_size < sizeof(*ei) + sizeof(*bi))) {
+ btrfs_crit(info,
+"invalid extent item size for key (%llu, %u, %llu) owner %llu, has %u expect >= %lu",
+ key.objectid, key.type, key.offset,
+ owner_objectid, item_size,
+ sizeof(*ei) + sizeof(*bi));
+ goto err_dump_abort;
+ }
bi = (struct btrfs_tree_block_info *)(ei + 1);
WARN_ON(owner_objectid != btrfs_tree_block_level(leaf, bi));
}
refs = btrfs_extent_refs(leaf, ei);
if (refs < refs_to_drop) {
- btrfs_err(info,
- "trying to drop %d refs but we only have %Lu for bytenr %Lu",
+ btrfs_crit(info,
+ "trying to drop %d refs but we only have %Lu for bytenr %Lu",
refs_to_drop, refs, bytenr);
- ret = -EINVAL;
- btrfs_abort_transaction(trans, ret);
- goto out;
+ goto err_dump_abort;
}
refs -= refs_to_drop;
@@ -3102,7 +3178,11 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
* be updated by remove_extent_backref
*/
if (iref) {
- BUG_ON(!found_extent);
+ if (unlikely(!found_extent)) {
+ btrfs_crit(info,
+"invalid iref, got inlined extent ref but no EXTENT/METADATA_ITEM found");
+ goto err_dump_abort;
+ }
} else {
btrfs_set_extent_refs(leaf, ei, refs);
btrfs_mark_buffer_dirty(leaf);
@@ -3117,13 +3197,36 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
}
}
} else {
+ /* In this branch refs == 1 */
if (found_extent) {
- BUG_ON(is_data && refs_to_drop !=
- extent_data_ref_count(path, iref));
+ if (is_data && refs_to_drop !=
+ extent_data_ref_count(path, iref)) {
+ btrfs_crit(info,
+ "invalid refs_to_drop, current refs %u refs_to_drop %u",
+ extent_data_ref_count(path, iref),
+ refs_to_drop);
+ goto err_dump_abort;
+ }
if (iref) {
- BUG_ON(path->slots[0] != extent_slot);
+ if (path->slots[0] != extent_slot) {
+ btrfs_crit(info,
+"invalid iref, extent item key (%llu %u %llu) doesn't has wanted iref",
+ key.objectid, key.type,
+ key.offset);
+ goto err_dump_abort;
+ }
} else {
- BUG_ON(path->slots[0] != extent_slot + 1);
+ /*
+ * No inline ref, we must at SHARED_* item,
+ * And it's single ref, it must be:
+ * | extent_slot ||extent_slot + 1|
+ * [ EXTENT/METADATA_ITEM ][ SHARED_* ITEM ]
+ */
+ if (path->slots[0] != extent_slot + 1) {
+ btrfs_crit(info,
+ "invalid SHARED_* item, previous item is not EXTENT/METADATA_ITEM");
+ goto err_dump_abort;
+ }
path->slots[0] = extent_slot;
num_to_del = 2;
}
@@ -3164,6 +3267,21 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
out:
btrfs_free_path(path);
return ret;
+err_dump_abort:
+ /*
+ * Leaf dump can take up a lot of dmesg buffer since default nodesize
+ * is already 16K.
+ * So we only do full leaf dump for debug build.
+ */
+ if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
+ btrfs_crit(info, "path->slots[0]=%d extent_slot=%d",
+ path->slots[0], extent_slot);
+ btrfs_print_leaf(path->nodes[0]);
+ }
+
+ btrfs_abort_transaction(trans, -EUCLEAN);
+ btrfs_free_path(path);
+ return -EUCLEAN;
}
/*
--
2.28.0
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: [PATCH v3 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
2020-08-09 12:09 ` [PATCH v3 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
@ 2020-08-11 18:45 ` Josef Bacik
0 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-08-11 18:45 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov
On 8/9/20 8:09 AM, Qu Wenruo wrote:
> __btrfs_free_extent() is one of the best cases to show how optimization
> could make a function hard to read.
>
> In fact __btrfs_free_extent() is only doing two major works:
> 1. Reduce the refs number of an extent backref
> Either it's an inlined extent backref (inside EXTENT/METADATA item) or
> a keyed extent backref (SHARED_* item).
> We only need to locate that backref line, either reduce the number or
> remove the backref line completely.
>
> 2. Update the refs count in EXTENT/METADATA_ITEM
>
> But in real world, we do it in a complex but somewhat efficient way.
> During step 1), we will try to locate the EXTENT/METADATA_ITEM without
> triggering another btrfs_search_slot() as fast path.
>
> Only when we failed to locate that item, we will trigger another
> btrfs_search_slot() to get that EXTENT/METADATA_ITEM after we
> updated/deleted the backref line.
>
> And we have a lot of restrict check on things like refs_to_drop against
> extent refs and special case check for single ref extent.
>
> All of these results:
> - 7 BUG_ON()s in a single function
> Although all these BUG_ON() are doing correct check, they're super
> easy to get triggered for fuzzed images.
> It's never a good idea to piss the end user.
>
> - Near 300 lines without much useful comments but a lot of hidden
> conditions
> I believe even the author needs several minutes to recall what the
> code is doing
> Not to mention a lot of BUG_ON() conditions needs to go back tens of
> lines to find out why.
>
> This patch address all these problems by:
> - Introduce two examples to show what __btrfs_free_extent() is doing
> One inlined backref case and one keyed case.
> Should cover most cases.
>
> - Kill all BUG_ON()s with proper error message and optional leaf dump
>
> - Add comment to show the overall workflow
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202819
> [ The report triggers one BUG_ON() in __btrfs_free_extent() ]
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
2020-08-09 12:09 [PATCH v3 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
2020-08-09 12:09 ` [PATCH v3 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
2020-08-09 12:09 ` [PATCH v3 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
@ 2020-08-09 12:09 ` Qu Wenruo
2020-08-11 18:48 ` Josef Bacik
2020-08-09 12:09 ` [PATCH v3 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
2020-08-09 12:09 ` [PATCH v3 5/5] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
4 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-08-09 12:09 UTC (permalink / raw)
To: linux-btrfs
[BUG]
With crafted image, btrfs will panic at btree operations:
kernel BUG at fs/btrfs/ctree.c:3894!
invalid opcode: 0000 [#1] SMP PTI
CPU: 0 PID: 1138 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
RIP: 0010:__push_leaf_left+0x6b6/0x6e0
Code: 00 00 48 98 48 8d 04 80 48 8d 74 80 65 e8 42 5a 04 00 48 8b bd 78 ff ff ff 8b bf 90 d0 00 00 89 7d 98 83 ef 65 e9 06 ff ff ff <0f> 0b 0f 0b 48 8b 85 78 ff ff ff 8b 90 90 d0 00 00 e9 eb fe ff ff
RSP: 0018:ffffc0bd4128b990 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa0a4ab8f0e38 RCX: 0000000000000000
RDX: ffffa0a280000000 RSI: 0000000000000000 RDI: ffffa0a4b3814000
RBP: ffffc0bd4128ba38 R08: 0000000000001000 R09: ffffc0bd4128b948
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000240
R13: ffffa0a4b556fb60 R14: ffffa0a4ab8f0af0 R15: ffffa0a4ab8f0af0
FS: 0000000000000000(0000) GS:ffffa0a4b7a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2461c80020 CR3: 000000022b32a006 CR4: 00000000000206f0
Call Trace:
? _cond_resched+0x1a/0x50
push_leaf_left+0x179/0x190
btrfs_del_items+0x316/0x470
btrfs_del_csums+0x215/0x3a0
__btrfs_free_extent.isra.72+0x5a7/0xbe0
__btrfs_run_delayed_refs+0x539/0x1120
btrfs_run_delayed_refs+0xdb/0x1b0
btrfs_commit_transaction+0x52/0x950
? start_transaction+0x94/0x450
transaction_kthread+0x163/0x190
kthread+0x105/0x140
? btrfs_cleanup_transaction+0x560/0x560
? kthread_destroy_worker+0x50/0x50
ret_from_fork+0x35/0x40
Modules linked in:
---[ end trace c2425e6e89b5558f ]---
[CAUSE]
The offending csum tree looks like this:
checksum tree key (CSUM_TREE ROOT_ITEM 0)
node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
...
key (EXTENT_CSUM EXTENT_CSUM 85975040) block 29630464 gen 17
key (EXTENT_CSUM EXTENT_CSUM 89911296) block 29642752 gen 17 <<<
key (EXTENT_CSUM EXTENT_CSUM 92274688) block 29646848 gen 17
...
leaf 29630464 items 6 free space 1 generation 17 owner CSUM_TREE
item 0 key (EXTENT_CSUM EXTENT_CSUM 85975040) itemoff 3987 itemsize 8
range start 85975040 end 85983232 length 8192
...
leaf 29642752 items 0 free space 3995 generation 17 owner 0
^ empty leaf invalid owner ^
leaf 29646848 items 1 free space 602 generation 17 owner CSUM_TREE
item 0 key (EXTENT_CSUM EXTENT_CSUM 92274688) itemoff 627 itemsize 3368
range start 92274688 end 95723520 length 3448832
So we have a corrupted csum tree where one tree leaf is completely
empty, causing unbalanced btree, thus leading to unexpected btree
balance error.
[FIX]
For this particular case, we handle it in two directions to catch it:
- Check if the tree block is empty through btrfs_verify_level_key()
So that invalid tree blocks won't be read out through
btrfs_search_slot() and its variants.
- Check 0 tree owner in tree checker
NO tree is using 0 as its tree owner, detect it and reject at tree
block read time.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202821
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c850d7f44fbe..b12804d1faac 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -393,6 +393,14 @@ int btrfs_verify_level_key(struct extent_buffer *eb, int level,
if (!first_key)
return 0;
+ /* We have @first_key, so this @eb must have at least one item */
+ if (btrfs_header_nritems(eb) == 0) {
+ btrfs_err(fs_info,
+ "invalid tree nritems, bytenr=%llu nritems=0 expect >0",
+ eb->start);
+ WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+ return -EUCLEAN;
+ }
/*
* For live tree block (new tree blocks in current transaction),
--
2.28.0
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: [PATCH v3 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
2020-08-09 12:09 ` [PATCH v3 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations Qu Wenruo
@ 2020-08-11 18:48 ` Josef Bacik
2020-08-11 23:04 ` Qu Wenruo
0 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2020-08-11 18:48 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 8/9/20 8:09 AM, Qu Wenruo wrote:
> [BUG]
> With crafted image, btrfs will panic at btree operations:
> kernel BUG at fs/btrfs/ctree.c:3894!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 0 PID: 1138 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
> RIP: 0010:__push_leaf_left+0x6b6/0x6e0
> Code: 00 00 48 98 48 8d 04 80 48 8d 74 80 65 e8 42 5a 04 00 48 8b bd 78 ff ff ff 8b bf 90 d0 00 00 89 7d 98 83 ef 65 e9 06 ff ff ff <0f> 0b 0f 0b 48 8b 85 78 ff ff ff 8b 90 90 d0 00 00 e9 eb fe ff ff
> RSP: 0018:ffffc0bd4128b990 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffa0a4ab8f0e38 RCX: 0000000000000000
> RDX: ffffa0a280000000 RSI: 0000000000000000 RDI: ffffa0a4b3814000
> RBP: ffffc0bd4128ba38 R08: 0000000000001000 R09: ffffc0bd4128b948
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000240
> R13: ffffa0a4b556fb60 R14: ffffa0a4ab8f0af0 R15: ffffa0a4ab8f0af0
> FS: 0000000000000000(0000) GS:ffffa0a4b7a00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2461c80020 CR3: 000000022b32a006 CR4: 00000000000206f0
> Call Trace:
> ? _cond_resched+0x1a/0x50
> push_leaf_left+0x179/0x190
> btrfs_del_items+0x316/0x470
> btrfs_del_csums+0x215/0x3a0
> __btrfs_free_extent.isra.72+0x5a7/0xbe0
> __btrfs_run_delayed_refs+0x539/0x1120
> btrfs_run_delayed_refs+0xdb/0x1b0
> btrfs_commit_transaction+0x52/0x950
> ? start_transaction+0x94/0x450
> transaction_kthread+0x163/0x190
> kthread+0x105/0x140
> ? btrfs_cleanup_transaction+0x560/0x560
> ? kthread_destroy_worker+0x50/0x50
> ret_from_fork+0x35/0x40
> Modules linked in:
> ---[ end trace c2425e6e89b5558f ]---
>
> [CAUSE]
> The offending csum tree looks like this:
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
> ...
> key (EXTENT_CSUM EXTENT_CSUM 85975040) block 29630464 gen 17
> key (EXTENT_CSUM EXTENT_CSUM 89911296) block 29642752 gen 17 <<<
> key (EXTENT_CSUM EXTENT_CSUM 92274688) block 29646848 gen 17
> ...
>
> leaf 29630464 items 6 free space 1 generation 17 owner CSUM_TREE
> item 0 key (EXTENT_CSUM EXTENT_CSUM 85975040) itemoff 3987 itemsize 8
> range start 85975040 end 85983232 length 8192
> ...
> leaf 29642752 items 0 free space 3995 generation 17 owner 0
> ^ empty leaf invalid owner ^
>
> leaf 29646848 items 1 free space 602 generation 17 owner CSUM_TREE
> item 0 key (EXTENT_CSUM EXTENT_CSUM 92274688) itemoff 627 itemsize 3368
> range start 92274688 end 95723520 length 3448832
>
> So we have a corrupted csum tree where one tree leaf is completely
> empty, causing unbalanced btree, thus leading to unexpected btree
> balance error.
>
> [FIX]
> For this particular case, we handle it in two directions to catch it:
> - Check if the tree block is empty through btrfs_verify_level_key()
> So that invalid tree blocks won't be read out through
> btrfs_search_slot() and its variants.
>
> - Check 0 tree owner in tree checker
> NO tree is using 0 as its tree owner, detect it and reject at tree
> block read time.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202821
> Signed-off-by: Qu Wenruo <wqu@suse.com>
This test is done further down, just after a
if (btrfs_header_generation(eb) > fs_info->last_trans_committed)
return 0;
Which I assume is the problem? The generation is 19, is that >
last_trans_committed? Seems like this check just needs to be moved lower,
right? Thanks,
Josef
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH v3 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
2020-08-11 18:48 ` Josef Bacik
@ 2020-08-11 23:04 ` Qu Wenruo
2020-08-12 0:23 ` Josef Bacik
0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-08-11 23:04 UTC (permalink / raw)
To: Josef Bacik, Qu Wenruo, linux-btrfs
On 2020/8/12 上午2:48, Josef Bacik wrote:
> On 8/9/20 8:09 AM, Qu Wenruo wrote:
>> [BUG]
>> With crafted image, btrfs will panic at btree operations:
>> kernel BUG at fs/btrfs/ctree.c:3894!
>> invalid opcode: 0000 [#1] SMP PTI
>> CPU: 0 PID: 1138 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
>> RIP: 0010:__push_leaf_left+0x6b6/0x6e0
>> Code: 00 00 48 98 48 8d 04 80 48 8d 74 80 65 e8 42 5a 04 00 48 8b
>> bd 78 ff ff ff 8b bf 90 d0 00 00 89 7d 98 83 ef 65 e9 06 ff ff ff <0f>
>> 0b 0f 0b 48 8b 85 78 ff ff ff 8b 90 90 d0 00 00 e9 eb fe ff ff
>> RSP: 0018:ffffc0bd4128b990 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: ffffa0a4ab8f0e38 RCX: 0000000000000000
>> RDX: ffffa0a280000000 RSI: 0000000000000000 RDI: ffffa0a4b3814000
>> RBP: ffffc0bd4128ba38 R08: 0000000000001000 R09: ffffc0bd4128b948
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000240
>> R13: ffffa0a4b556fb60 R14: ffffa0a4ab8f0af0 R15: ffffa0a4ab8f0af0
>> FS: 0000000000000000(0000) GS:ffffa0a4b7a00000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f2461c80020 CR3: 000000022b32a006 CR4: 00000000000206f0
>> Call Trace:
>> ? _cond_resched+0x1a/0x50
>> push_leaf_left+0x179/0x190
>> btrfs_del_items+0x316/0x470
>> btrfs_del_csums+0x215/0x3a0
>> __btrfs_free_extent.isra.72+0x5a7/0xbe0
>> __btrfs_run_delayed_refs+0x539/0x1120
>> btrfs_run_delayed_refs+0xdb/0x1b0
>> btrfs_commit_transaction+0x52/0x950
>> ? start_transaction+0x94/0x450
>> transaction_kthread+0x163/0x190
>> kthread+0x105/0x140
>> ? btrfs_cleanup_transaction+0x560/0x560
>> ? kthread_destroy_worker+0x50/0x50
>> ret_from_fork+0x35/0x40
>> Modules linked in:
>> ---[ end trace c2425e6e89b5558f ]---
>>
>> [CAUSE]
>> The offending csum tree looks like this:
>> checksum tree key (CSUM_TREE ROOT_ITEM 0)
>> node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>> ...
>> key (EXTENT_CSUM EXTENT_CSUM 85975040) block 29630464 gen 17
>> key (EXTENT_CSUM EXTENT_CSUM 89911296) block 29642752 gen 17 <<<
>> key (EXTENT_CSUM EXTENT_CSUM 92274688) block 29646848 gen 17
>> ...
>>
>> leaf 29630464 items 6 free space 1 generation 17 owner CSUM_TREE
>> item 0 key (EXTENT_CSUM EXTENT_CSUM 85975040) itemoff 3987
>> itemsize 8
>> range start 85975040 end 85983232 length 8192
>> ...
>> leaf 29642752 items 0 free space 3995 generation 17 owner 0
>> ^ empty leaf invalid owner ^
>>
>> leaf 29646848 items 1 free space 602 generation 17 owner CSUM_TREE
>> item 0 key (EXTENT_CSUM EXTENT_CSUM 92274688) itemoff 627
>> itemsize 3368
>> range start 92274688 end 95723520 length 3448832
>>
>> So we have a corrupted csum tree where one tree leaf is completely
>> empty, causing unbalanced btree, thus leading to unexpected btree
>> balance error.
>>
>> [FIX]
>> For this particular case, we handle it in two directions to catch it:
>> - Check if the tree block is empty through btrfs_verify_level_key()
>> So that invalid tree blocks won't be read out through
>> btrfs_search_slot() and its variants.
>>
>> - Check 0 tree owner in tree checker
>> NO tree is using 0 as its tree owner, detect it and reject at tree
>> block read time.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202821
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> This test is done further down, just after a
>
> if (btrfs_header_generation(eb) > fs_info->last_trans_committed)
> return 0;
>
> Which I assume is the problem? The generation is 19, is that >
> last_trans_committed? Seems like this check just needs to be moved
> lower, right? Thanks,
Nope, that generation 19 is valid. That fs has a higher generation, so
that's completely valid.
The generation 19 is there because there is another csum leaf whose
generation is 19.
Thanks,
Qu
>
> Josef
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
2020-08-11 23:04 ` Qu Wenruo
@ 2020-08-12 0:23 ` Josef Bacik
2020-08-12 0:29 ` Qu Wenruo
0 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2020-08-12 0:23 UTC (permalink / raw)
To: Qu Wenruo, Qu Wenruo, linux-btrfs
On 8/11/20 7:04 PM, Qu Wenruo wrote:
>
>
> On 2020/8/12 上午2:48, Josef Bacik wrote:
>> On 8/9/20 8:09 AM, Qu Wenruo wrote:
>>> [BUG]
>>> With crafted image, btrfs will panic at btree operations:
>>> kernel BUG at fs/btrfs/ctree.c:3894!
>>> invalid opcode: 0000 [#1] SMP PTI
>>> CPU: 0 PID: 1138 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
>>> RIP: 0010:__push_leaf_left+0x6b6/0x6e0
>>> Code: 00 00 48 98 48 8d 04 80 48 8d 74 80 65 e8 42 5a 04 00 48 8b
>>> bd 78 ff ff ff 8b bf 90 d0 00 00 89 7d 98 83 ef 65 e9 06 ff ff ff <0f>
>>> 0b 0f 0b 48 8b 85 78 ff ff ff 8b 90 90 d0 00 00 e9 eb fe ff ff
>>> RSP: 0018:ffffc0bd4128b990 EFLAGS: 00010246
>>> RAX: 0000000000000000 RBX: ffffa0a4ab8f0e38 RCX: 0000000000000000
>>> RDX: ffffa0a280000000 RSI: 0000000000000000 RDI: ffffa0a4b3814000
>>> RBP: ffffc0bd4128ba38 R08: 0000000000001000 R09: ffffc0bd4128b948
>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000240
>>> R13: ffffa0a4b556fb60 R14: ffffa0a4ab8f0af0 R15: ffffa0a4ab8f0af0
>>> FS: 0000000000000000(0000) GS:ffffa0a4b7a00000(0000)
>>> knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 00007f2461c80020 CR3: 000000022b32a006 CR4: 00000000000206f0
>>> Call Trace:
>>> ? _cond_resched+0x1a/0x50
>>> push_leaf_left+0x179/0x190
>>> btrfs_del_items+0x316/0x470
>>> btrfs_del_csums+0x215/0x3a0
>>> __btrfs_free_extent.isra.72+0x5a7/0xbe0
>>> __btrfs_run_delayed_refs+0x539/0x1120
>>> btrfs_run_delayed_refs+0xdb/0x1b0
>>> btrfs_commit_transaction+0x52/0x950
>>> ? start_transaction+0x94/0x450
>>> transaction_kthread+0x163/0x190
>>> kthread+0x105/0x140
>>> ? btrfs_cleanup_transaction+0x560/0x560
>>> ? kthread_destroy_worker+0x50/0x50
>>> ret_from_fork+0x35/0x40
>>> Modules linked in:
>>> ---[ end trace c2425e6e89b5558f ]---
>>>
>>> [CAUSE]
>>> The offending csum tree looks like this:
>>> checksum tree key (CSUM_TREE ROOT_ITEM 0)
>>> node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>>> ...
>>> key (EXTENT_CSUM EXTENT_CSUM 85975040) block 29630464 gen 17
>>> key (EXTENT_CSUM EXTENT_CSUM 89911296) block 29642752 gen 17 <<<
>>> key (EXTENT_CSUM EXTENT_CSUM 92274688) block 29646848 gen 17
>>> ...
>>>
>>> leaf 29630464 items 6 free space 1 generation 17 owner CSUM_TREE
>>> item 0 key (EXTENT_CSUM EXTENT_CSUM 85975040) itemoff 3987
>>> itemsize 8
>>> range start 85975040 end 85983232 length 8192
>>> ...
>>> leaf 29642752 items 0 free space 3995 generation 17 owner 0
>>> ^ empty leaf invalid owner ^
>>>
>>> leaf 29646848 items 1 free space 602 generation 17 owner CSUM_TREE
>>> item 0 key (EXTENT_CSUM EXTENT_CSUM 92274688) itemoff 627
>>> itemsize 3368
>>> range start 92274688 end 95723520 length 3448832
>>>
>>> So we have a corrupted csum tree where one tree leaf is completely
>>> empty, causing unbalanced btree, thus leading to unexpected btree
>>> balance error.
>>>
>>> [FIX]
>>> For this particular case, we handle it in two directions to catch it:
>>> - Check if the tree block is empty through btrfs_verify_level_key()
>>> So that invalid tree blocks won't be read out through
>>> btrfs_search_slot() and its variants.
>>>
>>> - Check 0 tree owner in tree checker
>>> NO tree is using 0 as its tree owner, detect it and reject at tree
>>> block read time.
>>>
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202821
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>
>> This test is done further down, just after a
>>
>> if (btrfs_header_generation(eb) > fs_info->last_trans_committed)
>> return 0;
>>
>> Which I assume is the problem? The generation is 19, is that >
>> last_trans_committed? Seems like this check just needs to be moved
>> lower, right? Thanks,
>
> Nope, that generation 19 is valid. That fs has a higher generation, so
> that's completely valid.
>
> The generation 19 is there because there is another csum leaf whose
> generation is 19.
>
Then this patch does nothing, because we already have this check lower,
so how exactly did it make the panic go away? Thanks,
Josef
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
2020-08-12 0:23 ` Josef Bacik
@ 2020-08-12 0:29 ` Qu Wenruo
2020-08-12 1:50 ` Josef Bacik
0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-08-12 0:29 UTC (permalink / raw)
To: Josef Bacik, Qu Wenruo, linux-btrfs
On 2020/8/12 上午8:23, Josef Bacik wrote:
> On 8/11/20 7:04 PM, Qu Wenruo wrote:
>>
>>
[...]
>>> Which I assume is the problem? The generation is 19, is that >
>>> last_trans_committed? Seems like this check just needs to be moved
>>> lower, right? Thanks,
>>
>> Nope, that generation 19 is valid. That fs has a higher generation, so
>> that's completely valid.
>>
>> The generation 19 is there because there is another csum leaf whose
>> generation is 19.
>>
>
> Then this patch does nothing, because we already have this check lower,
> so how exactly did it make the panic go away? Thanks,
>
> Josef
Sorry, I don't get your point.
The generation 19 isn't larger than last_trans_committed, so that check
has nothing to do with this case.
And then it goes to the header_nritems() check, which is 0, and with
first_key present, which is invalid and we error out, rejecting the
corrupted leaf.
What's the problem then?
Thanks,
Qu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
2020-08-12 0:29 ` Qu Wenruo
@ 2020-08-12 1:50 ` Josef Bacik
2020-08-12 1:53 ` Qu Wenruo
0 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2020-08-12 1:50 UTC (permalink / raw)
To: Qu Wenruo, Qu Wenruo, linux-btrfs
On 8/11/20 8:29 PM, Qu Wenruo wrote:
>
>
> On 2020/8/12 上午8:23, Josef Bacik wrote:
>> On 8/11/20 7:04 PM, Qu Wenruo wrote:
>>>
>>>
> [...]
>>>> Which I assume is the problem? The generation is 19, is that >
>>>> last_trans_committed? Seems like this check just needs to be moved
>>>> lower, right? Thanks,
>>>
>>> Nope, that generation 19 is valid. That fs has a higher generation, so
>>> that's completely valid.
>>>
>>> The generation 19 is there because there is another csum leaf whose
>>> generation is 19.
>>>
>>
>> Then this patch does nothing, because we already have this check lower,
>> so how exactly did it make the panic go away? Thanks,
>>
>> Josef
>
> Sorry, I don't get your point.
>
> The generation 19 isn't larger than last_trans_committed, so that check
> has nothing to do with this case.
>
> And then it goes to the header_nritems() check, which is 0, and with
> first_key present, which is invalid and we error out, rejecting the
> corrupted leaf.
>
> What's the problem then?
/* We have @first_key, so this @eb must have at least one item */
if (btrfs_header_nritems(eb) == 0) {
btrfs_err(fs_info,
"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
eb->start);
WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
return -EUCLEAN;
}
/*
* For live tree block (new tree blocks in current transaction),
* we need proper lock context to avoid race, which is
impossible here.
* So we only checks tree blocks which is read from disk, whose
* generation <= fs_info->last_trans_committed.
*/
if (btrfs_header_generation(eb) > fs_info->last_trans_committed)
return 0;
/* We have @first_key, so this @eb must have at least one item */
if (btrfs_header_nritems(eb) == 0) {
btrfs_err(fs_info,
"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
eb->start);
WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
return -EUCLEAN;
}
This is the code, you have the exact same check above the header
generation thing, and that's not the problem, so I don't understand why
you have added this check.
Josef
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH v3 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
2020-08-12 1:50 ` Josef Bacik
@ 2020-08-12 1:53 ` Qu Wenruo
0 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2020-08-12 1:53 UTC (permalink / raw)
To: Josef Bacik, Qu Wenruo, linux-btrfs
On 2020/8/12 上午9:50, Josef Bacik wrote:
> On 8/11/20 8:29 PM, Qu Wenruo wrote:
>>
>>
>> On 2020/8/12 上午8:23, Josef Bacik wrote:
>>> On 8/11/20 7:04 PM, Qu Wenruo wrote:
>>>>
>>>>
>> [...]
>>>>> Which I assume is the problem? The generation is 19, is that >
>>>>> last_trans_committed? Seems like this check just needs to be moved
>>>>> lower, right? Thanks,
>>>>
>>>> Nope, that generation 19 is valid. That fs has a higher generation, so
>>>> that's completely valid.
>>>>
>>>> The generation 19 is there because there is another csum leaf whose
>>>> generation is 19.
>>>>
>>>
>>> Then this patch does nothing, because we already have this check lower,
>>> so how exactly did it make the panic go away? Thanks,
>>>
>>> Josef
>>
>> Sorry, I don't get your point.
>>
>> The generation 19 isn't larger than last_trans_committed, so that check
>> has nothing to do with this case.
>>
>> And then it goes to the header_nritems() check, which is 0, and with
>> first_key present, which is invalid and we error out, rejecting the
>> corrupted leaf.
>>
>> What's the problem then?
>
>
> /* We have @first_key, so this @eb must have at least one item */
> if (btrfs_header_nritems(eb) == 0) {
> btrfs_err(fs_info,
> "invalid tree nritems, bytenr=%llu nritems=0 expect >0",
> eb->start);
> WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> return -EUCLEAN;
> }
>
> /*
> * For live tree block (new tree blocks in current transaction),
> * we need proper lock context to avoid race, which is
> impossible here.
> * So we only checks tree blocks which is read from disk, whose
> * generation <= fs_info->last_trans_committed.
> */
> if (btrfs_header_generation(eb) > fs_info->last_trans_committed)
> return 0;
>
> /* We have @first_key, so this @eb must have at least one item */
> if (btrfs_header_nritems(eb) == 0) {
> btrfs_err(fs_info,
> "invalid tree nritems, bytenr=%llu nritems=0 expect >0",
> eb->start);
> WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> return -EUCLEAN;
> }
>
>
> This is the code, you have the exact same check above the header
> generation thing, and that's not the problem, so I don't understand why
> you have added this check.
Oh, you're right.
A bad auto merge, which we have merged the same patch before...
Thanks for spotting this!
Qu
>
> Josef
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref()
2020-08-09 12:09 [PATCH v3 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
` (2 preceding siblings ...)
2020-08-09 12:09 ` [PATCH v3 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations Qu Wenruo
@ 2020-08-09 12:09 ` Qu Wenruo
2020-08-11 18:50 ` Josef Bacik
2020-08-09 12:09 ` [PATCH v3 5/5] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
4 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-08-09 12:09 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
[BUG]
With crafted image, btrfs can panic at insert_inline_extent_backref():
kernel BUG at fs/btrfs/extent-tree.c:1857!
invalid opcode: 0000 [#1] SMP PTI
CPU: 0 PID: 1117 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
RIP: 0010:insert_inline_extent_backref+0xcc/0xe0
Code: 45 20 49 8b 7e 50 49 89 d8 4c 8b 4d 10 48 8b 55 c8 4c 89 e1 41 57 4c 89 ee 50 ff 75 18 e8 cc bf ff ff 31 c0 48 83 c4 18 eb b2 <0f> 0b e8 9d df bd ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
RSP: 0018:ffffac4dc1287be8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000001
RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffac4dc1287c28 R08: ffffac4dc1287ab8 R09: ffffac4dc1287ac0
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8febef88a540 R14: ffff8febeaa7bc30 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8febf7a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f663ace94c0 CR3: 0000000235698006 CR4: 00000000000206f0
Call Trace:
? _cond_resched+0x1a/0x50
__btrfs_inc_extent_ref.isra.64+0x7e/0x240
? btrfs_merge_delayed_refs+0xa5/0x330
__btrfs_run_delayed_refs+0x653/0x1120
btrfs_run_delayed_refs+0xdb/0x1b0
btrfs_commit_transaction+0x52/0x950
? start_transaction+0x94/0x450
transaction_kthread+0x163/0x190
kthread+0x105/0x140
? btrfs_cleanup_transaction+0x560/0x560
? kthread_destroy_worker+0x50/0x50
ret_from_fork+0x35/0x40
Modules linked in:
---[ end trace 2ad8b3de903cf825 ]---
[CAUSE]
Due to extent tree corruption (still valid by itself, but bad cross ref),
we can allocate an extent which is still in extent tree.
The offending tree block of that case is from csum tree.
The newly allocated tree block is also for csum tree.
Then we will try to insert an tree block ref for the existing tree block
ref.
For btrfs tree extent item, a tree block can never be shared directly by
the same tree twice.
We have such BUG_ON() to prevent such problem, but BUG_ON() is
definitely not good enough.
[FIX]
Replace that BUG_ON() with proper error message and leaf dump for debug
build.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202829
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/extent-tree.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8e86e3524861..b664ad361bd8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1177,7 +1177,22 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
num_bytes, parent, root_objectid,
owner, offset, 1);
if (ret == 0) {
- BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
+ /*
+ * We're adding refs to an tree block we already own, this
+ * should not happen at all.
+ */
+ if (owner < BTRFS_FIRST_FREE_OBJECTID) {
+ btrfs_crit(trans->fs_info,
+"invalid operation, adding refs to an existing tree ref, bytenr=%llu num_bytes=%llu root_objectid=%llu",
+ bytenr, num_bytes, root_objectid);
+ if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
+ WARN_ON(1);
+ btrfs_crit(trans->fs_info,
+ "path->slots[0]=%d path->nodes[0]:", path->slots[0]);
+ btrfs_print_leaf(path->nodes[0]);
+ }
+ return -EUCLEAN;
+ }
update_inline_extent_backref(path, iref, refs_to_add,
extent_op, NULL);
} else if (ret == -ENOENT) {
@@ -1397,6 +1412,9 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
/*
* __btrfs_inc_extent_ref - insert backreference for a given extent
*
+ * The work is opposite as __btrfs_free_extent().
+ * For more info about how it works or examples, refer to __btrfs_free_extent().
+ *
* @trans: Handle of transaction
*
* @node: The delayed ref node used to get the bytenr/length for
--
2.28.0
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: [PATCH v3 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref()
2020-08-09 12:09 ` [PATCH v3 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
@ 2020-08-11 18:50 ` Josef Bacik
0 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-08-11 18:50 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov
On 8/9/20 8:09 AM, Qu Wenruo wrote:
> [BUG]
> With crafted image, btrfs can panic at insert_inline_extent_backref():
> kernel BUG at fs/btrfs/extent-tree.c:1857!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 0 PID: 1117 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
> RIP: 0010:insert_inline_extent_backref+0xcc/0xe0
> Code: 45 20 49 8b 7e 50 49 89 d8 4c 8b 4d 10 48 8b 55 c8 4c 89 e1 41 57 4c 89 ee 50 ff 75 18 e8 cc bf ff ff 31 c0 48 83 c4 18 eb b2 <0f> 0b e8 9d df bd ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
> RSP: 0018:ffffac4dc1287be8 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000001
> RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffffac4dc1287c28 R08: ffffac4dc1287ab8 R09: ffffac4dc1287ac0
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff8febef88a540 R14: ffff8febeaa7bc30 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff8febf7a00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f663ace94c0 CR3: 0000000235698006 CR4: 00000000000206f0
> Call Trace:
> ? _cond_resched+0x1a/0x50
> __btrfs_inc_extent_ref.isra.64+0x7e/0x240
> ? btrfs_merge_delayed_refs+0xa5/0x330
> __btrfs_run_delayed_refs+0x653/0x1120
> btrfs_run_delayed_refs+0xdb/0x1b0
> btrfs_commit_transaction+0x52/0x950
> ? start_transaction+0x94/0x450
> transaction_kthread+0x163/0x190
> kthread+0x105/0x140
> ? btrfs_cleanup_transaction+0x560/0x560
> ? kthread_destroy_worker+0x50/0x50
> ret_from_fork+0x35/0x40
> Modules linked in:
> ---[ end trace 2ad8b3de903cf825 ]---
>
> [CAUSE]
> Due to extent tree corruption (still valid by itself, but bad cross ref),
> we can allocate an extent which is still in extent tree.
> The offending tree block of that case is from csum tree.
> The newly allocated tree block is also for csum tree.
>
> Then we will try to insert an tree block ref for the existing tree block
> ref.
>
> For btrfs tree extent item, a tree block can never be shared directly by
> the same tree twice.
> We have such BUG_ON() to prevent such problem, but BUG_ON() is
> definitely not good enough.
>
> [FIX]
> Replace that BUG_ON() with proper error message and leaf dump for debug
> build.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202829
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 5/5] btrfs: ctree: Checking key orders before merged tree blocks
2020-08-09 12:09 [PATCH v3 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
` (3 preceding siblings ...)
2020-08-09 12:09 ` [PATCH v3 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
@ 2020-08-09 12:09 ` Qu Wenruo
2020-08-11 18:53 ` Josef Bacik
4 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-08-09 12:09 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
[BUG]
With crafted image, btrfs can panic at btrfs_del_csums().
kernel BUG at fs/btrfs/ctree.c:3188!
invalid opcode: 0000 [#1] SMP PTI
CPU: 0 PID: 1156 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
RIP: 0010:btrfs_set_item_key_safe+0x16c/0x180
Code: b7 48 8d 7d bf 4c 89 fe 48 89 45 c8 0f b6 45 b6 88 45 c7 48 8b 45 ae 48 89 45 bf e8 ce f2 ff ff 85 c0 0f 8f 48 ff ff ff 0f 0b <0f> 0b e8 dd 8d be ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
RSP: 0018:ffff976141257ab8 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff898a6b890930 RCX: 0000000004b70000
RDX: 0000000000000000 RSI: ffff976141257bae RDI: ffff976141257acf
RBP: ffff976141257b10 R08: 0000000000001000 R09: ffff9761412579a8
R10: 0000000000000000 R11: 0000000000000000 R12: ffff976141257abe
R13: 0000000000000003 R14: ffff898a6a8be578 R15: ffff976141257bae
FS: 0000000000000000(0000) GS:ffff898a77a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f779d9cd624 CR3: 000000022b2b4006 CR4: 00000000000206f0
Call Trace:
truncate_one_csum+0xac/0xf0
btrfs_del_csums+0x24f/0x3a0
__btrfs_free_extent.isra.72+0x5a7/0xbe0
__btrfs_run_delayed_refs+0x539/0x1120
btrfs_run_delayed_refs+0xdb/0x1b0
btrfs_commit_transaction+0x52/0x950
? start_transaction+0x94/0x450
transaction_kthread+0x163/0x190
kthread+0x105/0x140
? btrfs_cleanup_transaction+0x560/0x560
? kthread_destroy_worker+0x50/0x50
ret_from_fork+0x35/0x40
Modules linked in:
---[ end trace 93bf9db00e6c374e ]---
[CAUSE]
This crafted image has a very tricky key order corruption:
checksum tree key (CSUM_TREE ROOT_ITEM 0)
node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
...
key (EXTENT_CSUM EXTENT_CSUM 73785344) block 29757440 gen 19
key (EXTENT_CSUM EXTENT_CSUM 77594624) block 29753344 gen 19
...
leaf 29757440 items 5 free space 150 generation 19 owner CSUM_TREE
item 0 key (EXTENT_CSUM EXTENT_CSUM 73785344) itemoff 2323 itemsize 1672
range start 73785344 end 75497472 length 1712128
item 1 key (EXTENT_CSUM EXTENT_CSUM 75497472) itemoff 2319 itemsize 4
range start 75497472 end 75501568 length 4096
item 2 key (EXTENT_CSUM EXTENT_CSUM 75501568) itemoff 579 itemsize 1740
range start 75501568 end 77283328 length 1781760
item 3 key (EXTENT_CSUM EXTENT_CSUM 77283328) itemoff 575 itemsize 4
range start 77283328 end 77287424 length 4096
item 4 key (EXTENT_CSUM EXTENT_CSUM 4120596480) itemoff 275 itemsize 300 <<<
range start 4120596480 end 4120903680 length 307200
leaf 29753344 items 3 free space 1936 generation 19 owner CSUM_TREE
item 0 key (18446744073457893366 EXTENT_CSUM 77594624) itemoff 2323 itemsize 1672
range start 77594624 end 79306752 length 1712128
...
Note the item 4 key of leaf 29757440, which is obviously too large, and
even larger than the first key of the next leaf.
However it still follows the key order in that tree block, thus tree
checker is unable to detect it at read time, since tree checker can only
work inside a leaf, thus such complex corruption can't be rejected in
advance.
[FIX]
The next timing to detect such problem is at tree block merge time,
which is in push_node_left(), balance_node_right(), push_leaf_left() and
push_leaf_right().
Now we check if the key order of the right most key of the left node is
larger than the left most key of the right node.
By this we don't need to call the full tree-check, while still keeps the
key order correct as key order in each node is already checked by tree
checker thus we only need to check the above two slots.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202833
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/ctree.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 70e49d8d4f6c..497abb397ea1 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3159,6 +3159,52 @@ void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
fixup_low_keys(path, &disk_key, 1);
}
+/*
+ * Check the cross tree block key ordering.
+ *
+ * Tree-checker only works inside one tree block, thus the following
+ * corruption can not be rejected by tree-checker:
+ * Leaf @left | Leaf @right
+ * --------------------------------------------------------------
+ * | 1 | 2 | 3 | 4 | 5 | f6 | | 7 | 8 |
+ *
+ * Key f6 in leaf @left itself is valid, but not valid when the next
+ * key in leaf @right is 7.
+ * This can only be checked at tree block merge time.
+ * And since tree checker has ensured all key order in each tree block
+ * is correct, we only need to bother the last key of @left and the first
+ * key of @right.
+ */
+static bool valid_cross_tree_key_order(struct extent_buffer *left,
+ struct extent_buffer *right)
+{
+ struct btrfs_key left_last;
+ struct btrfs_key right_first;
+ int level = btrfs_header_level(left);
+ int nr_left = btrfs_header_nritems(left);
+ int nr_right = btrfs_header_nritems(right);
+
+ /* No key to check in one of the tree blocks */
+ if (!nr_left || !nr_right)
+ return true;
+ if (level) {
+ btrfs_node_key_to_cpu(left, &left_last, nr_left - 1);
+ btrfs_node_key_to_cpu(right, &right_first, 0);
+ } else {
+ btrfs_item_key_to_cpu(left, &left_last, nr_left - 1);
+ btrfs_item_key_to_cpu(right, &right_first, 0);
+ }
+ if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
+ btrfs_crit(left->fs_info,
+"bad key order cross tree blocks, left last (%llu %u %llu) right first (%llu %u %llu",
+ left_last.objectid, left_last.type,
+ left_last.offset, right_first.objectid,
+ right_first.type, right_first.offset);
+ return false;
+ }
+ return true;
+}
+
/*
* try to push data from one node into the next node left in the
* tree.
@@ -3203,6 +3249,12 @@ static int push_node_left(struct btrfs_trans_handle *trans,
} else
push_items = min(src_nritems - 8, push_items);
+ /* dst is the left eb, src is the middle eb */
+ if (!valid_cross_tree_key_order(dst, src)) {
+ ret = -EUCLEAN;
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
ret = tree_mod_log_eb_copy(dst, src, dst_nritems, 0, push_items);
if (ret) {
btrfs_abort_transaction(trans, ret);
@@ -3271,6 +3323,12 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
if (max_push < push_items)
push_items = max_push;
+ /* dst is the right eb, src is the middle eb */
+ if (!valid_cross_tree_key_order(src, dst)) {
+ ret = -EUCLEAN;
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
ret = tree_mod_log_insert_move(dst, push_items, 0, dst_nritems);
BUG_ON(ret < 0);
memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(push_items),
@@ -3747,6 +3805,12 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
if (left_nritems == 0)
goto out_unlock;
+ if (!valid_cross_tree_key_order(left, right)) {
+ ret = -EUCLEAN;
+ btrfs_tree_unlock(right);
+ free_extent_buffer(right);
+ return ret;
+ }
if (path->slots[0] == left_nritems && !empty) {
/* Key greater than all keys in the leaf, right neighbor has
* enough room for it and we're not emptying our leaf to delete
@@ -3984,6 +4048,10 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
goto out;
}
+ if (!valid_cross_tree_key_order(left, right)) {
+ ret = -EUCLEAN;
+ goto out;
+ }
return __push_leaf_left(path, min_data_size,
empty, left, free_space, right_nritems,
max_slot);
--
2.28.0
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: [PATCH v3 5/5] btrfs: ctree: Checking key orders before merged tree blocks
2020-08-09 12:09 ` [PATCH v3 5/5] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
@ 2020-08-11 18:53 ` Josef Bacik
0 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-08-11 18:53 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov
On 8/9/20 8:09 AM, Qu Wenruo wrote:
> [BUG]
> With crafted image, btrfs can panic at btrfs_del_csums().
> kernel BUG at fs/btrfs/ctree.c:3188!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 0 PID: 1156 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
> RIP: 0010:btrfs_set_item_key_safe+0x16c/0x180
> Code: b7 48 8d 7d bf 4c 89 fe 48 89 45 c8 0f b6 45 b6 88 45 c7 48 8b 45 ae 48 89 45 bf e8 ce f2 ff ff 85 c0 0f 8f 48 ff ff ff 0f 0b <0f> 0b e8 dd 8d be ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
> RSP: 0018:ffff976141257ab8 EFLAGS: 00010202
> RAX: 0000000000000001 RBX: ffff898a6b890930 RCX: 0000000004b70000
> RDX: 0000000000000000 RSI: ffff976141257bae RDI: ffff976141257acf
> RBP: ffff976141257b10 R08: 0000000000001000 R09: ffff9761412579a8
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff976141257abe
> R13: 0000000000000003 R14: ffff898a6a8be578 R15: ffff976141257bae
> FS: 0000000000000000(0000) GS:ffff898a77a00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f779d9cd624 CR3: 000000022b2b4006 CR4: 00000000000206f0
> Call Trace:
> truncate_one_csum+0xac/0xf0
> btrfs_del_csums+0x24f/0x3a0
> __btrfs_free_extent.isra.72+0x5a7/0xbe0
> __btrfs_run_delayed_refs+0x539/0x1120
> btrfs_run_delayed_refs+0xdb/0x1b0
> btrfs_commit_transaction+0x52/0x950
> ? start_transaction+0x94/0x450
> transaction_kthread+0x163/0x190
> kthread+0x105/0x140
> ? btrfs_cleanup_transaction+0x560/0x560
> ? kthread_destroy_worker+0x50/0x50
> ret_from_fork+0x35/0x40
> Modules linked in:
> ---[ end trace 93bf9db00e6c374e ]---
>
> [CAUSE]
> This crafted image has a very tricky key order corruption:
>
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
> ...
> key (EXTENT_CSUM EXTENT_CSUM 73785344) block 29757440 gen 19
> key (EXTENT_CSUM EXTENT_CSUM 77594624) block 29753344 gen 19
> ...
>
> leaf 29757440 items 5 free space 150 generation 19 owner CSUM_TREE
> item 0 key (EXTENT_CSUM EXTENT_CSUM 73785344) itemoff 2323 itemsize 1672
> range start 73785344 end 75497472 length 1712128
> item 1 key (EXTENT_CSUM EXTENT_CSUM 75497472) itemoff 2319 itemsize 4
> range start 75497472 end 75501568 length 4096
> item 2 key (EXTENT_CSUM EXTENT_CSUM 75501568) itemoff 579 itemsize 1740
> range start 75501568 end 77283328 length 1781760
> item 3 key (EXTENT_CSUM EXTENT_CSUM 77283328) itemoff 575 itemsize 4
> range start 77283328 end 77287424 length 4096
> item 4 key (EXTENT_CSUM EXTENT_CSUM 4120596480) itemoff 275 itemsize 300 <<<
> range start 4120596480 end 4120903680 length 307200
> leaf 29753344 items 3 free space 1936 generation 19 owner CSUM_TREE
> item 0 key (18446744073457893366 EXTENT_CSUM 77594624) itemoff 2323 itemsize 1672
> range start 77594624 end 79306752 length 1712128
> ...
>
> Note the item 4 key of leaf 29757440, which is obviously too large, and
> even larger than the first key of the next leaf.
>
> However it still follows the key order in that tree block, thus tree
> checker is unable to detect it at read time, since tree checker can only
> work inside a leaf, thus such complex corruption can't be rejected in
> advance.
>
> [FIX]
> The next timing to detect such problem is at tree block merge time,
> which is in push_node_left(), balance_node_right(), push_leaf_left() and
> push_leaf_right().
>
> Now we check if the key order of the right most key of the left node is
> larger than the left most key of the right node.
>
> By this we don't need to call the full tree-check, while still keeps the
> key order correct as key order in each node is already checked by tree
> checker thus we only need to check the above two slots.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202833
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 16+ messages in thread