From: Johannes Thumshirn <jth@kernel.org>
To: linux-btrfs@vger.kernel.org
Cc: David Sterba <dsterba@suse.com>,
	Josef Bacik <josef@toxicpanda.com>,
	Naohiro Aota <naohiro.aota@wdc.com>,
	Damien Le Moal <dlemoal@kernel.org>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>,
	Naohiro Aota <Naohiro.Aota@wdc.com>,
	Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Subject: [PATCH] btrfs: zoned: allocate dummy checksums for zoned NODATASUM writes
Date: Fri,  7 Jun 2024 13:46:28 +0200
Message-ID: <20240607114628.5471-1-jth@kernel.org>

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Shin'ichiro reported that running fstests test case btrfs/167 on
emulated zoned devices triggers the following NULL pointer
dereference in btrfs_zone_finish_endio():

 Oops: general protection fault, probably for non-canonical address 0xdffffc0000000011: 0000 [#1] PREEMPT SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f]
 CPU: 4 PID: 2332440 Comm: kworker/u80:15 Tainted: G        W          6.10.0-rc2-kts+ #4
 Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
 Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
 RIP: 0010:btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs]

 RSP: 0018:ffff88867f107a90 EFLAGS: 00010206
 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff893e5534
 RDX: 0000000000000011 RSI: 0000000000000004 RDI: 0000000000000088
 RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed1081696028
 R10: ffff88840b4b0143 R11: ffff88834dfff600 R12: ffff88840b4b0000
 R13: 0000000000020000 R14: 0000000000000000 R15: ffff888530ad5210
 FS:  0000000000000000(0000) GS:ffff888e3f800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f87223fff38 CR3: 00000007a7c6a002 CR4: 00000000007706f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  ? __die_body.cold+0x19/0x27
  ? die_addr+0x46/0x70
  ? exc_general_protection+0x14f/0x250
  ? asm_exc_general_protection+0x26/0x30
  ? do_raw_read_unlock+0x44/0x70
  ? btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs]
  btrfs_finish_one_ordered+0x5d9/0x19a0 [btrfs]
  ? __pfx_lock_release+0x10/0x10
  ? do_raw_write_lock+0x90/0x260
  ? __pfx_do_raw_write_lock+0x10/0x10
  ? __pfx_btrfs_finish_one_ordered+0x10/0x10 [btrfs]
  ? _raw_write_unlock+0x23/0x40
  ? btrfs_finish_ordered_zoned+0x5a9/0x850 [btrfs]
  ? lock_acquire+0x435/0x500
  btrfs_work_helper+0x1b1/0xa70 [btrfs]
  ? __schedule+0x10a8/0x60b0
  ? __pfx___might_resched+0x10/0x10
  process_one_work+0x862/0x1410
  ? __pfx_lock_acquire+0x10/0x10
  ? __pfx_process_one_work+0x10/0x10
  ? assign_work+0x16c/0x240
  worker_thread+0x5e6/0x1010
  ? __pfx_worker_thread+0x10/0x10
  kthread+0x2c3/0x3a0
  ? trace_irq_enable.constprop.0+0xce/0x110
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x31/0x70
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>

 ---[ end trace 0000000000000000 ]---

With CONFIG_BTRFS_ASSERT enabled, the following assertion triggers
instead:

 assertion failed: !list_empty(&ordered->list), in fs/btrfs/zoned.c:1815

This indicates that the checksum list is missing from the
ordered_extent. As btrfs/167 is doing a NOCOW write, for which no data
checksums are generated, this is to be expected.

Further analysis with drgn confirmed the assumption:

 >>> inode = prog.crashed_thread().stack_trace()[11]['ordered'].inode
 >>> btrfs_inode = drgn.container_of(inode, "struct btrfs_inode", \
					"vfs_inode")
 >>> print(btrfs_inode.flags)
 (u32)1
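
For readers decoding the value printed above, here is a minimal user-space
sketch. It assumes the two bit definitions mirror the kernel's inode flag
macros; the helper names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's btrfs inode flag bit definitions. */
#define BTRFS_INODE_NODATASUM	(1U << 0)
#define BTRFS_INODE_NODATACOW	(1U << 1)

/* Hypothetical helper: true if data checksums are disabled for the inode. */
static inline bool inode_is_nodatasum(uint32_t flags)
{
	return (flags & BTRFS_INODE_NODATASUM) != 0;
}

/* Hypothetical helper: true if copy-on-write is disabled for the inode. */
static inline bool inode_is_nodatacow(uint32_t flags)
{
	return (flags & BTRFS_INODE_NODATACOW) != 0;
}
```

Under that assumption, a flags value of 1 has only the NODATASUM bit set,
so no data checksums are expected on the ordered_extent.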

Zoned emulation mode simulates conventional zones on regular devices,
so we cannot use zone-append for writing. However, dummy checksums are
currently only attached when doing a zone-append write.

So for NODATASUM zoned data writes on conventional zones, also attach
a dummy checksum.
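
The effect of the changed condition can be sketched in user space as
follows. This is only a model under stated assumptions: struct write_ctx
and both helper functions are hypothetical stand-ins for the state that
btrfs_submit_chunk() inspects, and the preceding branch that computes real
checksums is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's flag bit definition. */
#define BTRFS_INODE_NODATASUM	(1U << 0)

/* Hypothetical stand-in for the state consulted at submission time. */
struct write_ctx {
	bool use_append;	/* write is issued as zone-append */
	bool zoned;		/* filesystem is in zoned mode */
	bool has_inode;		/* bio is associated with an inode */
	uint32_t inode_flags;	/* btrfs inode flags */
};

/* Condition before the patch: dummy sums only for zone-append writes. */
static inline bool needs_dummy_sum_old(const struct write_ctx *c)
{
	return c->use_append;
}

/* Condition after the patch: also cover zoned NODATASUM writes. */
static inline bool needs_dummy_sum_new(const struct write_ctx *c)
{
	return c->use_append ||
	       (c->zoned && c->has_inode &&
		(c->inode_flags & BTRFS_INODE_NODATASUM));
}
```

For a NODATASUM write to a conventional zone (zoned, not zone-append),
the old condition is false and the new one is true, which is the case the
reported crash hit.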

Fixes: cbfce4c7fbde ("btrfs: optimize the logical to physical mapping for zoned writes")
Cc: Naohiro Aota <Naohiro.Aota@wdc.com>
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/bio.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 477f350a8bd0..e3a57196b0ee 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -741,7 +741,9 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
 			ret = btrfs_bio_csum(bbio);
 			if (ret)
 				goto fail_put_bio;
-		} else if (use_append) {
+		} else if (use_append ||
+			   (btrfs_is_zoned(fs_info) && inode &&
+			    inode->flags & BTRFS_INODE_NODATASUM)) {
 			ret = btrfs_alloc_dummy_sum(bbio);
 			if (ret)
 				goto fail_put_bio;
-- 
2.43.0


Thread overview: 3+ messages
2024-06-07 11:46 Johannes Thumshirn [this message]
2024-06-11  0:21 ` [PATCH] btrfs: zoned: allocate dummy checksums for zoned NODATASUM writes Shinichiro Kawasaki
2024-06-11 13:54 ` Naohiro Aota
