* [PATCH] btrfs: zoned: allocate dummy checksums for zoned NODATASUM writes
@ 2024-06-07 11:46 Johannes Thumshirn
2024-06-11 0:21 ` Shinichiro Kawasaki
2024-06-11 13:54 ` Naohiro Aota
0 siblings, 2 replies; 3+ messages in thread
From: Johannes Thumshirn @ 2024-06-07 11:46 UTC (permalink / raw)
To: linux-btrfs
Cc: David Sterba, Josef Bacik, Naohiro Aota, Damien Le Moal,
Johannes Thumshirn, Naohiro Aota, Shinichiro Kawasaki
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Shin'ichiro reported that when he's running fstests' test-case
btrfs/167 on emulated zoned devices, he's seeing the following NULL
pointer dereference in 'btrfs_zone_finish_endio()':
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000011: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f]
CPU: 4 PID: 2332440 Comm: kworker/u80:15 Tainted: G W 6.10.0-rc2-kts+ #4
Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
RIP: 0010:btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs]
RSP: 0018:ffff88867f107a90 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff893e5534
RDX: 0000000000000011 RSI: 0000000000000004 RDI: 0000000000000088
RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed1081696028
R10: ffff88840b4b0143 R11: ffff88834dfff600 R12: ffff88840b4b0000
R13: 0000000000020000 R14: 0000000000000000 R15: ffff888530ad5210
FS: 0000000000000000(0000) GS:ffff888e3f800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f87223fff38 CR3: 00000007a7c6a002 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __die_body.cold+0x19/0x27
? die_addr+0x46/0x70
? exc_general_protection+0x14f/0x250
? asm_exc_general_protection+0x26/0x30
? do_raw_read_unlock+0x44/0x70
? btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs]
btrfs_finish_one_ordered+0x5d9/0x19a0 [btrfs]
? __pfx_lock_release+0x10/0x10
? do_raw_write_lock+0x90/0x260
? __pfx_do_raw_write_lock+0x10/0x10
? __pfx_btrfs_finish_one_ordered+0x10/0x10 [btrfs]
? _raw_write_unlock+0x23/0x40
? btrfs_finish_ordered_zoned+0x5a9/0x850 [btrfs]
? lock_acquire+0x435/0x500
btrfs_work_helper+0x1b1/0xa70 [btrfs]
? __schedule+0x10a8/0x60b0
? __pfx___might_resched+0x10/0x10
process_one_work+0x862/0x1410
? __pfx_lock_acquire+0x10/0x10
? __pfx_process_one_work+0x10/0x10
? assign_work+0x16c/0x240
worker_thread+0x5e6/0x1010
? __pfx_worker_thread+0x10/0x10
kthread+0x2c3/0x3a0
? trace_irq_enable.constprop.0+0xce/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
---[ end trace 0000000000000000 ]---
Enabling CONFIG_BTRFS_ASSERT revealed the following assertion to
trigger:
assertion failed: !list_empty(&ordered->list), in fs/btrfs/zoned.c:1815
This indicates, that we're missing the checksums list on the
ordered_extent. As btrfs/167 is doing a NOCOW write this is to be
expected.
Further analysis with drgn confirmed the assumption:
>>> inode = prog.crashed_thread().stack_trace()[11]['ordered'].inode
>>> btrfs_inode = drgn.container_of(inode, "struct btrfs_inode", \
"vfs_inode")
>>> print(btrfs_inode.flags)
(u32)1
As zoned emulation mode simulates conventional zones on regular
devices, we cannot use zone-append for writing. But we're only
attaching dummy checksums if we're doing a zone-append write.
So for NOCOW zoned data writes on conventional zones, also attach a
dummy checksum.
Fixes: cbfce4c7fbde ("btrfs: optimize the logical to physical mapping for zoned writes")
Cc: Naohiro Aota <Naohiro.Aota@wdc.com>
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/bio.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 477f350a8bd0..e3a57196b0ee 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -741,7 +741,9 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
ret = btrfs_bio_csum(bbio);
if (ret)
goto fail_put_bio;
- } else if (use_append) {
+ } else if (use_append ||
+ (btrfs_is_zoned(fs_info) && inode &&
+ inode->flags & BTRFS_INODE_NODATASUM)) {
ret = btrfs_alloc_dummy_sum(bbio);
if (ret)
goto fail_put_bio;
--
2.43.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] btrfs: zoned: allocate dummy checksums for zoned NODATASUM writes
2024-06-07 11:46 [PATCH] btrfs: zoned: allocate dummy checksums for zoned NODATASUM writes Johannes Thumshirn
@ 2024-06-11 0:21 ` Shinichiro Kawasaki
2024-06-11 13:54 ` Naohiro Aota
1 sibling, 0 replies; 3+ messages in thread
From: Shinichiro Kawasaki @ 2024-06-11 0:21 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: linux-btrfs@vger.kernel.org, David Sterba, Josef Bacik,
Naohiro Aota, Damien Le Moal, Johannes Thumshirn
On Jun 07, 2024 / 13:46, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Shin'ichiro reported that when he's running fstests' test-case
> btrfs/167 on emulated zoned devices, he's seeing the following NULL
> pointer dereference in 'btrfs_zone_finish_endio()':
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000011: 0000 [#1] PREEMPT SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f]
> CPU: 4 PID: 2332440 Comm: kworker/u80:15 Tainted: G W 6.10.0-rc2-kts+ #4
> Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
> Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
> RIP: 0010:btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs]
[...]
> As zoned emulation mode simulates conventional zones on regular
> devices, we cannot use zone-append for writing. But we're only
> attaching dummy checksums if we're doing a zone-append write.
>
> So for NOCOW zoned data writes on conventional zones, also attach a
> dummy checksum.
>
> Fixes: cbfce4c7fbde ("btrfs: optimize the logical to physical mapping for zoned writes")
> Cc: Naohiro Aota <Naohiro.Aota@wdc.com>
> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Johaness, thank you very much for the fix. I confirmed that this patch avoids
the KASAN null-ptr-deref as well as the hang at btrfs/167. Also, I ran my zoned
btrfs test set and observed no regression:
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] btrfs: zoned: allocate dummy checksums for zoned NODATASUM writes
2024-06-07 11:46 [PATCH] btrfs: zoned: allocate dummy checksums for zoned NODATASUM writes Johannes Thumshirn
2024-06-11 0:21 ` Shinichiro Kawasaki
@ 2024-06-11 13:54 ` Naohiro Aota
1 sibling, 0 replies; 3+ messages in thread
From: Naohiro Aota @ 2024-06-11 13:54 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: linux-btrfs@vger.kernel.org, David Sterba, Josef Bacik,
Damien Le Moal, Johannes Thumshirn, Shinichiro Kawasaki
On Fri, Jun 07, 2024 at 01:46:28PM GMT, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Shin'ichiro reported that when he's running fstests' test-case
> btrfs/167 on emulated zoned devices, he's seeing the following NULL
> pointer dereference in 'btrfs_zone_finish_endio()':
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000011: 0000 [#1] PREEMPT SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f]
> CPU: 4 PID: 2332440 Comm: kworker/u80:15 Tainted: G W 6.10.0-rc2-kts+ #4
> Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
> Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
> RIP: 0010:btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs]
>
> RSP: 0018:ffff88867f107a90 EFLAGS: 00010206
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff893e5534
> RDX: 0000000000000011 RSI: 0000000000000004 RDI: 0000000000000088
> RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed1081696028
> R10: ffff88840b4b0143 R11: ffff88834dfff600 R12: ffff88840b4b0000
> R13: 0000000000020000 R14: 0000000000000000 R15: ffff888530ad5210
> FS: 0000000000000000(0000) GS:ffff888e3f800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f87223fff38 CR3: 00000007a7c6a002 CR4: 00000000007706f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? __die_body.cold+0x19/0x27
> ? die_addr+0x46/0x70
> ? exc_general_protection+0x14f/0x250
> ? asm_exc_general_protection+0x26/0x30
> ? do_raw_read_unlock+0x44/0x70
> ? btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs]
> btrfs_finish_one_ordered+0x5d9/0x19a0 [btrfs]
> ? __pfx_lock_release+0x10/0x10
> ? do_raw_write_lock+0x90/0x260
> ? __pfx_do_raw_write_lock+0x10/0x10
> ? __pfx_btrfs_finish_one_ordered+0x10/0x10 [btrfs]
> ? _raw_write_unlock+0x23/0x40
> ? btrfs_finish_ordered_zoned+0x5a9/0x850 [btrfs]
> ? lock_acquire+0x435/0x500
> btrfs_work_helper+0x1b1/0xa70 [btrfs]
> ? __schedule+0x10a8/0x60b0
> ? __pfx___might_resched+0x10/0x10
> process_one_work+0x862/0x1410
> ? __pfx_lock_acquire+0x10/0x10
> ? __pfx_process_one_work+0x10/0x10
> ? assign_work+0x16c/0x240
> worker_thread+0x5e6/0x1010
> ? __pfx_worker_thread+0x10/0x10
> kthread+0x2c3/0x3a0
> ? trace_irq_enable.constprop.0+0xce/0x110
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x31/0x70
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
> </TASK>
>
> ---[ end trace 0000000000000000 ]---
>
> Enabling CONFIG_BTRFS_ASSERT revealed the following assertion to
> trigger:
>
> assertion failed: !list_empty(&ordered->list), in fs/btrfs/zoned.c:1815
>
> This indicates, that we're missing the checksums list on the
> ordered_extent. As btrfs/167 is doing a NOCOW write this is to be
> expected.
>
> Further analysis with drgn confirmed the assumption:
>
> >>> inode = prog.crashed_thread().stack_trace()[11]['ordered'].inode
> >>> btrfs_inode = drgn.container_of(inode, "struct btrfs_inode", \
> "vfs_inode")
> >>> print(btrfs_inode.flags)
> (u32)1
>
> As zoned emulation mode simulates conventional zones on regular
> devices, we cannot use zone-append for writing. But we're only
> attaching dummy checksums if we're doing a zone-append write.
>
> So for NOCOW zoned data writes on conventional zones, also attach a
> dummy checksum.
>
> Fixes: cbfce4c7fbde ("btrfs: optimize the logical to physical mapping for zoned writes")
> Cc: Naohiro Aota <Naohiro.Aota@wdc.com>
> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Looks good to me.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Thanks,
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-06-11 13:54 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-07 11:46 [PATCH] btrfs: zoned: allocate dummy checksums for zoned NODATASUM writes Johannes Thumshirn
2024-06-11 0:21 ` Shinichiro Kawasaki
2024-06-11 13:54 ` Naohiro Aota
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox