Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Johannes Thumshirn <jth@kernel.org>
To: linux-btrfs@vger.kernel.org
Cc: David Sterba <dsterba@suse.com>,
	Josef Bacik <josef@toxicpanda.com>,
	Naohiro Aota <naohiro.aota@wdc.com>,
	Damien Le Moal <dlemoal@kernel.org>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>,
	Naohiro Aota <Naohiro.Aota@wdc.com>,
	Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Subject: [PATCH] btrfs: zoned: allocate dummy checksums for zoned NODATASUM writes
Date: Fri,  7 Jun 2024 13:46:28 +0200	[thread overview]
Message-ID: <20240607114628.5471-1-jth@kernel.org> (raw)

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Shin'ichiro reported that when he's running fstests' test-case
btrfs/167 on emulated zoned devices, he's seeing the following NULL
pointer dereference in 'btrfs_zone_finish_endio()':

 Oops: general protection fault, probably for non-canonical address 0xdffffc0000000011: 0000 [#1] PREEMPT SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f]
 CPU: 4 PID: 2332440 Comm: kworker/u80:15 Tainted: G        W          6.10.0-rc2-kts+ #4
 Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
 Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
 RIP: 0010:btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs]

 RSP: 0018:ffff88867f107a90 EFLAGS: 00010206
 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff893e5534
 RDX: 0000000000000011 RSI: 0000000000000004 RDI: 0000000000000088
 RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed1081696028
 R10: ffff88840b4b0143 R11: ffff88834dfff600 R12: ffff88840b4b0000
 R13: 0000000000020000 R14: 0000000000000000 R15: ffff888530ad5210
 FS:  0000000000000000(0000) GS:ffff888e3f800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f87223fff38 CR3: 00000007a7c6a002 CR4: 00000000007706f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  ? __die_body.cold+0x19/0x27
  ? die_addr+0x46/0x70
  ? exc_general_protection+0x14f/0x250
  ? asm_exc_general_protection+0x26/0x30
  ? do_raw_read_unlock+0x44/0x70
  ? btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs]
  btrfs_finish_one_ordered+0x5d9/0x19a0 [btrfs]
  ? __pfx_lock_release+0x10/0x10
  ? do_raw_write_lock+0x90/0x260
  ? __pfx_do_raw_write_lock+0x10/0x10
  ? __pfx_btrfs_finish_one_ordered+0x10/0x10 [btrfs]
  ? _raw_write_unlock+0x23/0x40
  ? btrfs_finish_ordered_zoned+0x5a9/0x850 [btrfs]
  ? lock_acquire+0x435/0x500
  btrfs_work_helper+0x1b1/0xa70 [btrfs]
  ? __schedule+0x10a8/0x60b0
  ? __pfx___might_resched+0x10/0x10
  process_one_work+0x862/0x1410
  ? __pfx_lock_acquire+0x10/0x10
  ? __pfx_process_one_work+0x10/0x10
  ? assign_work+0x16c/0x240
  worker_thread+0x5e6/0x1010
  ? __pfx_worker_thread+0x10/0x10
  kthread+0x2c3/0x3a0
  ? trace_irq_enable.constprop.0+0xce/0x110
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x31/0x70
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>

 ---[ end trace 0000000000000000 ]---

Enabling CONFIG_BTRFS_ASSERT revealed the following assertion to
trigger:

 assertion failed: !list_empty(&ordered->list), in fs/btrfs/zoned.c:1815

This indicates, that we're missing the checksums list on the
ordered_extent. As btrfs/167 is doing a NOCOW write this is to be
expected.

Further analysis with drgn confirmed the assumption:

 >>> inode = prog.crashed_thread().stack_trace()[11]['ordered'].inode
 >>> btrfs_inode = drgn.container_of(inode, "struct btrfs_inode", \
					"vfs_inode")
 >>> print(btrfs_inode.flags)
 (u32)1

As zoned emulation mode simulates conventional zones on regular
devices, we cannot use zone-append for writing. But we're only
attaching dummy checksums if we're doing a zone-append write.

So for NOCOW zoned data writes on conventional zones, also attach a
dummy checksum.

Fixes: cbfce4c7fbde ("btrfs: optimize the logical to physical mapping for zoned writes")
Cc: Naohiro Aota <Naohiro.Aota@wdc.com>
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/bio.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 477f350a8bd0..e3a57196b0ee 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -741,7 +741,9 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
 			ret = btrfs_bio_csum(bbio);
 			if (ret)
 				goto fail_put_bio;
-		} else if (use_append) {
+		} else if (use_append ||
+			   (btrfs_is_zoned(fs_info) && inode &&
+			    inode->flags & BTRFS_INODE_NODATASUM)) {
 			ret = btrfs_alloc_dummy_sum(bbio);
 			if (ret)
 				goto fail_put_bio;
-- 
2.43.0


             reply	other threads:[~2024-06-07 11:46 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-07 11:46 Johannes Thumshirn [this message]
2024-06-11  0:21 ` [PATCH] btrfs: zoned: allocate dummy checksums for zoned NODATASUM writes Shinichiro Kawasaki
2024-06-11 13:54 ` Naohiro Aota

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240607114628.5471-1-jth@kernel.org \
    --to=jth@kernel.org \
    --cc=dlemoal@kernel.org \
    --cc=dsterba@suse.com \
    --cc=johannes.thumshirn@wdc.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=naohiro.aota@wdc.com \
    --cc=shinichiro.kawasaki@wdc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox