* Zoned panic WRT btrfs_redirty_list_add
@ 2023-09-11 13:51 Josef Bacik
From: Josef Bacik @ 2023-09-11 13:51 UTC
To: linux-btrfs; +Cc: johannes.thumshirn, naohiro.aota
Hello,
I hit the following panic on our CI this last week:
assertion failed: PageDirty(eb->pages[i]), in fs/btrfs/extent_io.c:3809
------------[ cut here ]------------
kernel BUG at fs/btrfs/extent_io.c:3809!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 858183 Comm: fsstress Not tainted 6.5.0+ #1
RIP: 0010:set_extent_buffer_dirty+0x11a/0x210
RSP: 0018:ffffc9000631fa28 EFLAGS: 00010246
RAX: 0000000000000047 RBX: ffff888116642848 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88817bd21880 RDI: ffff88817bd21880
RBP: 0000000000000004 R08: 0000000000000000 R09: ffffc9000631f8c8
R10: 0000000000000003 R11: ffffffff8c534318 R12: 0000000000000004
R13: 0000000000004000 R14: 0000000000000004 R15: ffff888116642848
FS: 00007f6608d72740(0000) GS:ffff88817bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000c98568 CR3: 000000011f3b8000 CR4: 0000000000350ee0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xda/0x100
? set_extent_buffer_dirty+0x11a/0x210
? set_extent_buffer_dirty+0x11a/0x210
? do_error_trap+0x81/0x110
? set_extent_buffer_dirty+0x11a/0x210
? exc_invalid_op+0x50/0x70
? set_extent_buffer_dirty+0x11a/0x210
? asm_exc_invalid_op+0x1a/0x20
? set_extent_buffer_dirty+0x11a/0x210
? set_extent_buffer_dirty+0x11a/0x210
btrfs_redirty_list_add+0x75/0xd0
btrfs_free_tree_block+0x243/0x2f0
btrfs_del_leaf+0xba/0xe0
btrfs_del_items+0x49c/0x520
__btrfs_free_extent+0x615/0x1260
__btrfs_run_delayed_refs+0x2d9/0x1310
? lock_is_held_type+0x9b/0x110
? find_held_lock+0x2b/0x80
? btrfs_start_dirty_block_groups+0x50/0x5b0
btrfs_run_delayed_refs+0x59/0x220
btrfs_start_dirty_block_groups+0x3bc/0x5b0
? btrfs_commit_transaction+0x41/0x1420
? btrfs_commit_transaction+0x41/0x1420
btrfs_commit_transaction+0x106/0x1420
? btrfs_attach_transaction_barrier+0x22/0x60
? __pfx_sync_fs_one_sb+0x10/0x10
iterate_supers+0x7e/0xf0
We have a check to make sure the pages we just dirtied are actually set to
dirty, and this is what's failing.
In btrfs_redirty_list_add() we have an ASSERT(!EXTENT_BUFFER_DIRTY), so we know
we're going through the path in set_extent_buffer_dirty() where the buffer
wasn't dirty before, is now, and we therefore set the pages dirty.
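
To make the failing check concrete, here is a minimal userspace sketch of the
logic described above. It is not the kernel code: struct model_eb,
MODEL_NUM_PAGES and model_set_extent_buffer_dirty() are hypothetical names, and
plain bools stand in for the EXTENT_BUFFER_DIRTY bit and the per-page dirty
state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: number of pages backing one extent buffer in this model. */
#define MODEL_NUM_PAGES 4

struct model_eb {
	bool eb_dirty;                    /* stands in for EXTENT_BUFFER_DIRTY */
	bool page_dirty[MODEL_NUM_PAGES]; /* stands in for PageDirty(eb->pages[i]) */
};

/*
 * Model of the dirtying path: set the buffer-level flag and, only if it was
 * not already set, dirty every page. Returns true if all pages are dirty
 * afterwards, i.e. if the PageDirty() assertion would hold.
 */
static bool model_set_extent_buffer_dirty(struct model_eb *eb)
{
	bool was_dirty = eb->eb_dirty;
	size_t i;

	eb->eb_dirty = true;

	if (!was_dirty) {
		for (i = 0; i < MODEL_NUM_PAGES; i++)
			eb->page_dirty[i] = true;
	}

	/* The debug check: every page must be dirty at this point. */
	for (i = 0; i < MODEL_NUM_PAGES; i++) {
		if (!eb->page_dirty[i])
			return false;
	}
	return true;
}
```

In this model the check can only fail if something else clears a page's dirty
state while the buffer-level flag says the pages do not need dirtying again, or
between the dirtying loop and the check.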
However, we're calling btrfs_redirty_list_add() outside of the
btrfs_tree_lock(). We always set EXTENT_DIRTY in ->dirty_pages on the
transaction, but we never clear it; we rely on the EXTENT_BUFFER_DIRTY flag
to be the ultimate arbiter of whether or not to write the extent buffer.
That means there's a race: we could be writing this extent buffer out while
also calling btrfs_redirty_list_add() on it.
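
One interleaving that would be consistent with this description can be played
out deterministically in userspace. This is only a sketch under assumptions:
the step ordering is a guess at how writeback and the redirty path could
overlap, and every name here (struct race_eb, wb_claim(), wb_clear_pages(),
redirty_set(), redirty_check()) is hypothetical rather than a kernel function.

```c
#include <stdbool.h>
#include <stddef.h>

#define RACE_NUM_PAGES 4 /* hypothetical page count per extent buffer */

struct race_eb {
	bool eb_dirty;                   /* stands in for EXTENT_BUFFER_DIRTY */
	bool page_dirty[RACE_NUM_PAGES]; /* stands in for per-page dirty state */
};

static void race_init_dirty(struct race_eb *eb)
{
	size_t i;

	eb->eb_dirty = true;
	for (i = 0; i < RACE_NUM_PAGES; i++)
		eb->page_dirty[i] = true;
}

/* Writeback, step 1: claim the buffer by clearing the buffer-level flag. */
static bool wb_claim(struct race_eb *eb)
{
	bool was_dirty = eb->eb_dirty;

	eb->eb_dirty = false;
	return was_dirty;
}

/* Writeback, step 2: clear the page dirty state before submitting the I/O. */
static void wb_clear_pages(struct race_eb *eb)
{
	size_t i;

	for (i = 0; i < RACE_NUM_PAGES; i++)
		eb->page_dirty[i] = false;
}

/* Redirty, step 1: set the flag; dirty the pages only if it was clear. */
static void redirty_set(struct race_eb *eb)
{
	bool was_dirty = eb->eb_dirty;
	size_t i;

	eb->eb_dirty = true;
	if (!was_dirty) {
		for (i = 0; i < RACE_NUM_PAGES; i++)
			eb->page_dirty[i] = true;
	}
}

/* Redirty, step 2: the check that every page is dirty. */
static bool redirty_check(const struct race_eb *eb)
{
	size_t i;

	for (i = 0; i < RACE_NUM_PAGES; i++) {
		if (!eb->page_dirty[i])
			return false;
	}
	return true;
}

/* Assumed racy ordering: the redirty path lands between the writeback steps. */
static bool race_interleaved(void)
{
	struct race_eb eb;

	race_init_dirty(&eb);
	(void)wb_claim(&eb);      /* writeback clears the buffer flag */
	redirty_set(&eb);         /* flag was clear, so pages get dirtied */
	wb_clear_pages(&eb);      /* writeback clears the page dirty state */
	return redirty_check(&eb); /* check now sees clean pages */
}

/* Serialized ordering: writeback finishes before the redirty path starts. */
static bool race_serialized(void)
{
	struct race_eb eb;

	race_init_dirty(&eb);
	(void)wb_claim(&eb);
	wb_clear_pages(&eb);
	redirty_set(&eb);
	return redirty_check(&eb);
}
```

In the model, race_interleaved() reports a failed check while
race_serialized() passes, which is the kind of outcome some form of
serialization between the two paths would be expected to prevent.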
I attempted to naively fix this by adjusting the btrfs_clear_buffer_dirty()
helper to simply not clear dirty for extent buffers if we were zoned, but that
keeps blowing up in my face and I'm not awake enough to figure out why.
I had also tried just wrapping this in btrfs_tree_lock(), but there was an
immediate deadlock that I didn't look into because I tried the
btrfs_clear_buffer_dirty() approach next.
In any case this needs to be reworked with this race in mind. Thanks,
Josef