* [PATCH 0/4] btrfs: fix relocation on RAID stripe-tree filesystems
@ 2024-07-29 12:00 Johannes Thumshirn
2024-07-29 12:00 ` [PATCH 1/4] btrfs: don't dump stripe-tree on lookup error Johannes Thumshirn
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Johannes Thumshirn @ 2024-07-29 12:00 UTC (permalink / raw)
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Johannes Thumshirn, Johannes Thumshirn,
Filipe Manana
When doing relocation on RST backed filesystems, there is a possibility of
a scatter-gather list corruption.
See patch 4 for details.
CI Link: https://github.com/btrfs/linux/actions/runs/10143804038
---
Johannes Thumshirn (4):
btrfs: don't dump stripe-tree on lookup error
btrfs: rename btrfs_io_stripe::is_scrub to rst_search_commit_root
btrfs: set rst_search_commit_root in case of relocation
btrfs: don't readahead the relocation inode on RST
fs/btrfs/bio.c | 3 ++-
fs/btrfs/raid-stripe-tree.c | 6 ++----
fs/btrfs/relocation.c | 14 ++++++++++----
fs/btrfs/scrub.c | 2 +-
fs/btrfs/volumes.h | 2 +-
5 files changed, 16 insertions(+), 11 deletions(-)
---
base-commit: 543cb1b052748dc53ff06b23273fcb78f11b8254
change-id: 20240726-debug-f1fe805ea37b
Best regards,
--
Johannes Thumshirn <jth@kernel.org>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/4] btrfs: don't dump stripe-tree on lookup error
2024-07-29 12:00 [PATCH 0/4] btrfs: fix relocation on RAID stripe-tree filesystems Johannes Thumshirn
@ 2024-07-29 12:00 ` Johannes Thumshirn
2024-07-29 12:00 ` [PATCH 2/4] btrfs: rename btrfs_io_stripe::is_scrub to rst_search_commit_root Johannes Thumshirn
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Johannes Thumshirn @ 2024-07-29 12:00 UTC (permalink / raw)
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Johannes Thumshirn
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
This just creates unnecessary noise and doesn't provide any insights into
debugging RAID stripe-tree related issues.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/raid-stripe-tree.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 684d4744f02d..97430918e923 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -284,8 +284,6 @@ int btrfs_get_raid_extent_offset(struct btrfs_fs_info *fs_info,
if (ret > 0)
ret = -ENOENT;
if (ret && ret != -EIO && !stripe->is_scrub) {
- if (IS_ENABLED(CONFIG_BTRFS_DEBUG))
- btrfs_print_tree(leaf, 1);
btrfs_err(fs_info,
"cannot find raid-stripe for logical [%llu, %llu] devid %llu, profile %s",
logical, logical + *length, stripe->dev->devid,
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/4] btrfs: rename btrfs_io_stripe::is_scrub to rst_search_commit_root
2024-07-29 12:00 [PATCH 0/4] btrfs: fix relocation on RAID stripe-tree filesystems Johannes Thumshirn
2024-07-29 12:00 ` [PATCH 1/4] btrfs: don't dump stripe-tree on lookup error Johannes Thumshirn
@ 2024-07-29 12:00 ` Johannes Thumshirn
2024-07-29 12:00 ` [PATCH 3/4] btrfs: set rst_search_commit_root in case of relocation Johannes Thumshirn
2024-07-29 12:00 ` [PATCH 4/4] btrfs: don't readahead the relocation inode on RST Johannes Thumshirn
3 siblings, 0 replies; 5+ messages in thread
From: Johannes Thumshirn @ 2024-07-29 12:00 UTC (permalink / raw)
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Johannes Thumshirn
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Rename 'btrfs_io_stripe::is_scrub' to 'rst_search_commit_root'. While
'is_scrub' describes the state of the io_stripe (it is a stripe submitted
by scrub) it does not describe the purpose, namely looking at the commit
root when searching RAID stripe-tree entries.
Renaming the stripe to rst_search_commit_root describes this purpose.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/bio.c | 2 +-
fs/btrfs/raid-stripe-tree.c | 4 ++--
fs/btrfs/scrub.c | 2 +-
fs/btrfs/volumes.h | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index f04d93109960..dfb32f7d3fc2 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -679,7 +679,7 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
blk_status_t ret;
int error;
- smap.is_scrub = !bbio->inode;
+ smap.rst_search_commit_root = !bbio->inode;
btrfs_bio_counter_inc_blocked(fs_info);
error = btrfs_map_block(fs_info, btrfs_op(bio), logical, &map_length,
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 97430918e923..bfb567f602b1 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -210,7 +210,7 @@ int btrfs_get_raid_extent_offset(struct btrfs_fs_info *fs_info,
if (!path)
return -ENOMEM;
- if (stripe->is_scrub) {
+ if (stripe->rst_search_commit_root) {
path->skip_locking = 1;
path->search_commit_root = 1;
}
@@ -283,7 +283,7 @@ int btrfs_get_raid_extent_offset(struct btrfs_fs_info *fs_info,
out:
if (ret > 0)
ret = -ENOENT;
- if (ret && ret != -EIO && !stripe->is_scrub) {
+ if (ret && ret != -EIO && !stripe->rst_search_commit_root) {
btrfs_err(fs_info,
"cannot find raid-stripe for logical [%llu, %llu] devid %llu, profile %s",
logical, logical + *length, stripe->dev->devid,
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 0f15f5139896..a1f41e8b00cd 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1695,7 +1695,7 @@ static void scrub_submit_extent_sector_read(struct scrub_ctx *sctx,
(i << fs_info->sectorsize_bits);
int err;
- io_stripe.is_scrub = true;
+ io_stripe.rst_search_commit_root = true;
stripe_len = (nr_sectors - i) << fs_info->sectorsize_bits;
/*
* For RST cases, we need to manually split the bbio to
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c947187539dd..03d2d60afe0c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -444,7 +444,7 @@ struct btrfs_io_stripe {
/* Block mapping. */
u64 physical;
u64 length;
- bool is_scrub;
+ bool rst_search_commit_root;
/* For the endio handler. */
struct btrfs_io_context *bioc;
};
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 3/4] btrfs: set rst_search_commit_root in case of relocation
2024-07-29 12:00 [PATCH 0/4] btrfs: fix relocation on RAID stripe-tree filesystems Johannes Thumshirn
2024-07-29 12:00 ` [PATCH 1/4] btrfs: don't dump stripe-tree on lookup error Johannes Thumshirn
2024-07-29 12:00 ` [PATCH 2/4] btrfs: rename btrfs_io_stripe::is_scrub to rst_search_commit_root Johannes Thumshirn
@ 2024-07-29 12:00 ` Johannes Thumshirn
2024-07-29 12:00 ` [PATCH 4/4] btrfs: don't readahead the relocation inode on RST Johannes Thumshirn
3 siblings, 0 replies; 5+ messages in thread
From: Johannes Thumshirn @ 2024-07-29 12:00 UTC (permalink / raw)
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Johannes Thumshirn
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Set rst_search_commit_root in the btrfs_io_stripe we're passing to
btrfs_map_block() in case we're doing data relocation.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/bio.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index dfb32f7d3fc2..c6563616c813 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -679,7 +679,8 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
blk_status_t ret;
int error;
- smap.rst_search_commit_root = !bbio->inode;
+ smap.rst_search_commit_root =
+ (!bbio->inode || btrfs_is_data_reloc_root(inode->root));
btrfs_bio_counter_inc_blocked(fs_info);
error = btrfs_map_block(fs_info, btrfs_op(bio), logical, &map_length,
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 4/4] btrfs: don't readahead the relocation inode on RST
2024-07-29 12:00 [PATCH 0/4] btrfs: fix relocation on RAID stripe-tree filesystems Johannes Thumshirn
` (2 preceding siblings ...)
2024-07-29 12:00 ` [PATCH 3/4] btrfs: set rst_search_commit_root in case of relocation Johannes Thumshirn
@ 2024-07-29 12:00 ` Johannes Thumshirn
3 siblings, 0 replies; 5+ messages in thread
From: Johannes Thumshirn @ 2024-07-29 12:00 UTC (permalink / raw)
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Johannes Thumshirn, Johannes Thumshirn,
Filipe Manana
From: Johannes Thumshirn <jthumshirn@wdc.com>
On relocation we're doing readahead on the relocation inode, but if the
filesystem is backed by a RAID stripe tree we can get ENOENT from the
lookup.
But readahead doesn't handle the error and submits invalid reads to the
device, causing an assertion in the scatter-gather list code:
BTRFS info (device nvme1n1): balance: start -d -m -s
BTRFS info (device nvme1n1): relocating block group 6480920576 flags data|raid0
BTRFS error (device nvme1n1): cannot find raid-stripe for logical [6481928192, 6481969152] devid 2, profile raid0
------------[ cut here ]------------
kernel BUG at include/linux/scatterlist.h:115!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1012 Comm: btrfs Not tainted 6.10.0-rc7+ #567
RIP: 0010:__blk_rq_map_sg+0x339/0x4a0
RSP: 0018:ffffc90001a43820 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffea00045d4802
RDX: 0000000117520000 RSI: 0000000000000000 RDI: ffff8881027d1000
RBP: 0000000000003000 R08: ffffea00045d4902 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000001000 R12: ffff8881003d10b8
R13: ffffc90001a438f0 R14: 0000000000000000 R15: 0000000000003000
FS: 00007fcc048a6900(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002cd11000 CR3: 00000001109ea001 CR4: 0000000000370eb0
Call Trace:
<TASK>
? __die_body.cold+0x14/0x25
? die+0x2e/0x50
? do_trap+0xca/0x110
? do_error_trap+0x65/0x80
? __blk_rq_map_sg+0x339/0x4a0
? exc_invalid_op+0x50/0x70
? __blk_rq_map_sg+0x339/0x4a0
? asm_exc_invalid_op+0x1a/0x20
? __blk_rq_map_sg+0x339/0x4a0
nvme_prep_rq.part.0+0x9d/0x770
nvme_queue_rq+0x7d/0x1e0
__blk_mq_issue_directly+0x2a/0x90
? blk_mq_get_budget_and_tag+0x61/0x90
blk_mq_try_issue_list_directly+0x56/0xf0
blk_mq_flush_plug_list.part.0+0x52b/0x5d0
__blk_flush_plug+0xc6/0x110
blk_finish_plug+0x28/0x40
read_pages+0x160/0x1c0
page_cache_ra_unbounded+0x109/0x180
relocate_file_extent_cluster+0x611/0x6a0
? btrfs_search_slot+0xba4/0xd20
? balance_dirty_pages_ratelimited_flags+0x26/0xb00
relocate_data_extent.constprop.0+0x134/0x160
relocate_block_group+0x3f2/0x500
btrfs_relocate_block_group+0x250/0x430
btrfs_relocate_chunk+0x3f/0x130
btrfs_balance+0x71b/0xef0
? kmalloc_trace_noprof+0x13b/0x280
btrfs_ioctl+0x2c2e/0x3030
? kvfree_call_rcu+0x1e6/0x340
? list_lru_add_obj+0x66/0x80
? mntput_no_expire+0x3a/0x220
__x64_sys_ioctl+0x96/0xc0
do_syscall_64+0x54/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fcc04514f9b
Code: Unable to access opcode bytes at 0x7fcc04514f71.
RSP: 002b:00007ffeba923370 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fcc04514f9b
RDX: 00007ffeba923460 RSI: 00000000c4009420 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000013 R09: 0000000000000001
R10: 00007fcc043fbba8 R11: 0000000000000246 R12: 00007ffeba924fc5
R13: 00007ffeba923460 R14: 0000000000000002 R15: 00000000004d4bb0
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__blk_rq_map_sg+0x339/0x4a0
RSP: 0018:ffffc90001a43820 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffea00045d4802
RDX: 0000000117520000 RSI: 0000000000000000 RDI: ffff8881027d1000
RBP: 0000000000003000 R08: ffffea00045d4902 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000001000 R12: ffff8881003d10b8
R13: ffffc90001a438f0 R14: 0000000000000000 R15: 0000000000003000
FS: 00007fcc048a6900(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcc04514f71 CR3: 00000001109ea001 CR4: 0000000000370eb0
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception ]---
So in case of a relocation on a RAID stripe-tree based file system, skip
the readahead.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/relocation.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index b592fc8cf368..5a6066f6db42 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -36,6 +36,7 @@
#include "relocation.h"
#include "super.h"
#include "tree-checker.h"
+#include "raid-stripe-tree.h"
/*
* Relocation overview
@@ -2965,21 +2966,26 @@ static int relocate_one_folio(struct reloc_control *rc,
u64 folio_end;
u64 cur;
int ret;
+ bool use_rst =
+ btrfs_need_stripe_tree_update(fs_info, rc->block_group->flags);
ASSERT(index <= last_index);
folio = filemap_lock_folio(inode->i_mapping, index);
if (IS_ERR(folio)) {
- page_cache_sync_readahead(inode->i_mapping, ra, NULL,
- index, last_index + 1 - index);
+ if (!use_rst)
+ page_cache_sync_readahead(inode->i_mapping, ra, NULL,
+ index,
+ last_index + 1 - index);
folio = __filemap_get_folio(inode->i_mapping, index,
- FGP_LOCK | FGP_ACCESSED | FGP_CREAT, mask);
+ FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
+ mask);
if (IS_ERR(folio))
return PTR_ERR(folio);
}
WARN_ON(folio_order(folio));
- if (folio_test_readahead(folio))
+ if (folio_test_readahead(folio) && !use_rst)
page_cache_async_readahead(inode->i_mapping, ra, NULL,
folio, index,
last_index + 1 - index);
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-07-29 12:00 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-29 12:00 [PATCH 0/4] btrfs: fix relocation on RAID stripe-tree filesystems Johannes Thumshirn
2024-07-29 12:00 ` [PATCH 1/4] btrfs: don't dump stripe-tree on lookup error Johannes Thumshirn
2024-07-29 12:00 ` [PATCH 2/4] btrfs: rename btrfs_io_stripe::is_scrub to rst_search_commit_root Johannes Thumshirn
2024-07-29 12:00 ` [PATCH 3/4] btrfs: set rst_search_commit_root in case of relocation Johannes Thumshirn
2024-07-29 12:00 ` [PATCH 4/4] btrfs: don't readahead the relocation inode on RST Johannes Thumshirn
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox