From: Johannes Thumshirn <jth@kernel.org>
To: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
Johannes Thumshirn <jthumshirn@wdc.com>,
Filipe Manana <fdmanana@suse.com>, Qu Wenruo <wqu@suse.com>,
Johannes Thumshirn <johannes.thumshirn@wdc.com>
Subject: [PATCH v3 4/5] btrfs: don't readahead the relocation inode on RST
Date: Wed, 31 Jul 2024 22:43:06 +0200 [thread overview]
Message-ID: <20240731-debug-v3-4-f9b7ed479b10@kernel.org> (raw)
In-Reply-To: <20240731-debug-v3-0-f9b7ed479b10@kernel.org>
From: Johannes Thumshirn <jthumshirn@wdc.com>
On relocation we're doing readahead on the relocation inode, but if the
filesystem is backed by a RAID stripe tree we can get ENOENT (e.g. due to
preallocated extents not being mapped in the RST) from the lookup.
But readahead doesn't handle the error and submits invalid reads to the
device, causing an assertion in the scatter-gather list code:
BTRFS info (device nvme1n1): balance: start -d -m -s
BTRFS info (device nvme1n1): relocating block group 6480920576 flags data|raid0
BTRFS error (device nvme1n1): cannot find raid-stripe for logical [6481928192, 6481969152] devid 2, profile raid0
------------[ cut here ]------------
kernel BUG at include/linux/scatterlist.h:115!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1012 Comm: btrfs Not tainted 6.10.0-rc7+ #567
RIP: 0010:__blk_rq_map_sg+0x339/0x4a0
RSP: 0018:ffffc90001a43820 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffea00045d4802
RDX: 0000000117520000 RSI: 0000000000000000 RDI: ffff8881027d1000
RBP: 0000000000003000 R08: ffffea00045d4902 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000001000 R12: ffff8881003d10b8
R13: ffffc90001a438f0 R14: 0000000000000000 R15: 0000000000003000
FS: 00007fcc048a6900(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002cd11000 CR3: 00000001109ea001 CR4: 0000000000370eb0
Call Trace:
<TASK>
? __die_body.cold+0x14/0x25
? die+0x2e/0x50
? do_trap+0xca/0x110
? do_error_trap+0x65/0x80
? __blk_rq_map_sg+0x339/0x4a0
? exc_invalid_op+0x50/0x70
? __blk_rq_map_sg+0x339/0x4a0
? asm_exc_invalid_op+0x1a/0x20
? __blk_rq_map_sg+0x339/0x4a0
nvme_prep_rq.part.0+0x9d/0x770
nvme_queue_rq+0x7d/0x1e0
__blk_mq_issue_directly+0x2a/0x90
? blk_mq_get_budget_and_tag+0x61/0x90
blk_mq_try_issue_list_directly+0x56/0xf0
blk_mq_flush_plug_list.part.0+0x52b/0x5d0
__blk_flush_plug+0xc6/0x110
blk_finish_plug+0x28/0x40
read_pages+0x160/0x1c0
page_cache_ra_unbounded+0x109/0x180
relocate_file_extent_cluster+0x611/0x6a0
? btrfs_search_slot+0xba4/0xd20
? balance_dirty_pages_ratelimited_flags+0x26/0xb00
relocate_data_extent.constprop.0+0x134/0x160
relocate_block_group+0x3f2/0x500
btrfs_relocate_block_group+0x250/0x430
btrfs_relocate_chunk+0x3f/0x130
btrfs_balance+0x71b/0xef0
? kmalloc_trace_noprof+0x13b/0x280
btrfs_ioctl+0x2c2e/0x3030
? kvfree_call_rcu+0x1e6/0x340
? list_lru_add_obj+0x66/0x80
? mntput_no_expire+0x3a/0x220
__x64_sys_ioctl+0x96/0xc0
do_syscall_64+0x54/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fcc04514f9b
Code: Unable to access opcode bytes at 0x7fcc04514f71.
RSP: 002b:00007ffeba923370 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fcc04514f9b
RDX: 00007ffeba923460 RSI: 00000000c4009420 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000013 R09: 0000000000000001
R10: 00007fcc043fbba8 R11: 0000000000000246 R12: 00007ffeba924fc5
R13: 00007ffeba923460 R14: 0000000000000002 R15: 00000000004d4bb0
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__blk_rq_map_sg+0x339/0x4a0
RSP: 0018:ffffc90001a43820 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffea00045d4802
RDX: 0000000117520000 RSI: 0000000000000000 RDI: ffff8881027d1000
RBP: 0000000000003000 R08: ffffea00045d4902 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000001000 R12: ffff8881003d10b8
R13: ffffc90001a438f0 R14: 0000000000000000 R15: 0000000000003000
FS: 00007fcc048a6900(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcc04514f71 CR3: 00000001109ea001 CR4: 0000000000370eb0
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception ]---
So in case of a relocation on a RAID stripe-tree based file system, skip
the readahead.
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/relocation.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0533d0f82dc9..72fb43b4d27c 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -36,6 +36,7 @@
#include "relocation.h"
#include "super.h"
#include "tree-checker.h"
+#include "raid-stripe-tree.h"
/*
* Relocation overview
@@ -2965,21 +2966,26 @@ static int relocate_one_folio(struct reloc_control *rc,
u64 folio_end;
u64 cur;
int ret;
+ bool use_rst =
+ btrfs_need_stripe_tree_update(fs_info, rc->block_group->flags);
ASSERT(index <= last_index);
folio = filemap_lock_folio(inode->i_mapping, index);
if (IS_ERR(folio)) {
- page_cache_sync_readahead(inode->i_mapping, ra, NULL,
- index, last_index + 1 - index);
+ if (!use_rst)
+ page_cache_sync_readahead(inode->i_mapping, ra, NULL,
+ index,
+ last_index + 1 - index);
folio = __filemap_get_folio(inode->i_mapping, index,
- FGP_LOCK | FGP_ACCESSED | FGP_CREAT, mask);
+ FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
+ mask);
if (IS_ERR(folio))
return PTR_ERR(folio);
}
WARN_ON(folio_order(folio));
- if (folio_test_readahead(folio))
+ if (folio_test_readahead(folio) && !use_rst)
page_cache_async_readahead(inode->i_mapping, ra, NULL,
folio, last_index + 1 - index);
--
2.43.0
next prev parent reply other threads:[~2024-07-31 20:43 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-31 20:43 [PATCH v3 0/5] btrfs: fix relocation on RAID stripe-tree filesystems Johannes Thumshirn
2024-07-31 20:43 ` [PATCH v3 1/5] btrfs: don't dump stripe-tree on lookup error Johannes Thumshirn
2024-07-31 20:43 ` [PATCH v3 2/5] btrfs: rename btrfs_io_stripe::is_scrub to rst_search_commit_root Johannes Thumshirn
2024-07-31 20:43 ` [PATCH v3 3/5] btrfs: set rst_search_commit_root in case of relocation Johannes Thumshirn
2024-08-05 16:28 ` David Sterba
2024-07-31 20:43 ` Johannes Thumshirn [this message]
2024-08-05 16:33 ` [PATCH v3 4/5] btrfs: don't readahead the relocation inode on RST David Sterba
2024-07-31 20:43 ` [PATCH v3 5/5] btrfs: change RST lookup error message to debug Johannes Thumshirn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240731-debug-v3-4-f9b7ed479b10@kernel.org \
--to=jth@kernel.org \
--cc=clm@fb.com \
--cc=dsterba@suse.com \
--cc=fdmanana@suse.com \
--cc=johannes.thumshirn@wdc.com \
--cc=josef@toxicpanda.com \
--cc=jthumshirn@wdc.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox