Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
* BUG Report: kernel NULL pointer dereference in bio_integrity_advance()
@ 2024-08-26 14:32 pjy
  2024-08-27  7:26 ` Christoph Hellwig
  0 siblings, 1 reply; 2+ messages in thread
From: pjy @ 2024-08-26 14:32 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg; +Cc: linux-nvme


Hi,

I saw that running the following command on 5.4, 5.10, 5.15 stable
kernels crashes the system with a NULL pointer dereference:

 root@pjy:~# touch test.txt
 root@pjy:~# nvme io-passthru /dev/nvme0 --opcode=0x1 --input-file=test.txt --data-len=1 --write --namespace=1 --metadata-len=1
 nvme nvme0: using deprecated NVME_IOCTL_IO_CMD ioctl on the char device!
 Unable to handle kernel NULL pointer dereference at virtual address 000000000000000a
 Mem abort info:
   ESR = 0x96000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000106500000
 [000000000000000a] pgd=0000000000000000, p4d=0000000000000000
 Internal error: Oops: 96000004 [#1] PREEMPT SMP
 Modules linked in: crct10dif_ce nvme nvme_core fuse drm dm_mod ip_tables x_tables ipv6
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.1-gb6abb62daa55 #1
 Hardware name: linux,dummy-virt (DT)
 pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : bio_integrity_advance+0x4c/0x100
 lr : bio_advance+0x34/0x120
 sp : ffff800010003d90
 x29: ffff800010003d90 x28: ffff800011d234c0 x27: ffff80001151ff20
 x26: ffff0000be180310 x25: 00000000000000b1 x24: 0000000000000001
 x23: 0000000000000000 x22: 0000000000000000 x21: ffff0000c6733640
 x20: ffff0000c47afa00 x19: ffff0000c5f81108 x18: 0000000000000001
 x17: ffff8000edf70000 x16: ffff800010004000 x15: 0000000000004000
 x14: 0000000000000001 x13: 0000000000000002 x12: 0000000000000400
 x11: 0000000000000040 x10: ffff0000c0034168 x9 : ffff0000c0034160
 x8 : ffff0000c0424550 x7 : 0000000000000000 x6 : 0000000000000000
 x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
 x2 : ffff0000c5f81100 x1 : 0000000000000001 x0 : 0000000000000000
 Call trace:
  bio_integrity_advance+0x4c/0x100
  bio_advance+0x34/0x120
  blk_update_request+0x174/0x400
  blk_mq_end_request+0x2c/0x150
  nvme_complete_rq+0x4c/0x10c [nvme_core]
  nvme_pci_complete_rq+0x4c/0xa4 [nvme]
  nvme_process_cq+0x144/0x250 [nvme]
  nvme_irq+0x18/0x30 [nvme]
  __handle_irq_event_percpu+0x40/0x15c
  handle_irq_event+0x64/0x140
  handle_fasteoi_irq+0xa8/0x1a0
  handle_domain_irq+0x64/0x94
  gic_handle_irq+0xbc/0x140
  call_on_irq_stack+0x2c/0x60
  do_interrupt_handler+0x54/0x60
  el1_interrupt+0x30/0x80
  el1h_64_irq_handler+0x1c/0x2c
  el1h_64_irq+0x78/0x7c
  finish_task_switch.isra.0+0x98/0x260
  __schedule+0x2a4/0x714
  schedule_idle+0x2c/0x50
  do_idle+0x190/0x2cc
  cpu_startup_entry+0x28/0x80
  rest_init+0xe8/0x100
  arch_call_rest_init+0x14/0x20
  start_kernel+0x634/0x674
  __primary_switched+0xc0/0xc8
 Code: f9402800 f84c8c04 f100009f 9a9f1000 (39402804)
 ---[ end trace 515229a85ac6ccf1 ]---
 Kernel panic - not syncing: Oops: Fatal exception in interrupt
 SMP: stopping secondary CPUs
 Kernel Offset: disabled
 CPU features: 0x11000471,20000846
 Memory Limit: none
 ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

This is because in the function:

void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
{
	struct bio_integrity_payload *bip = bio_integrity(bio);
	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
	unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);

	bip->bip_iter.bi_sector += bio_integrity_intervals(bi, bytes_done >> 9);
	bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
}

Here blk_get_integrity() returns NULL and bio_integrity_bytes() uses it
without checking for NULL.

This issue is also present in mainline but doesn't trigger because after

d4aa57a1cac3 ("block: don't bother iter advancing a fully done bio")

bio_advance() is not called for this reproducer, but this bug might be
triggerable through another path.

I want to send a patch to fix this but need some help to understand
where the change has to be made.

in 5.15 for example:

void bio_advance(struct bio *bio, unsigned bytes)
{
	if (bio_integrity(bio))
		bio_integrity_advance(bio, bytes);

	bio_crypt_advance(bio, bytes);
	bio_advance_iter(bio, &bio->bi_iter, bytes);
}

Here bio_integrity(bio) returns non-null and therefore
bio_integrity_advance() is called but in that fuction,
blk_get_integrity(bio->bi_bdev->bd_disk) returns NULL because for this
disk bi->profile is NULL.

So, the problem is that bi->profile is NULL for this disk but
bio->bi_integrity is non-NULL for this bio.


Please help me debug this further.


P.S. - Reproducing using qemu.
Here are the commands I used:

qemu-system-aarch64 -machine 'virt,gic-version=3' -cpu 'cortex-a57' -smp \
 2 -m 4G -drive format=raw,file=rootfs -device \
 virtio-net-device,netdev=net -netdev user,id=net,hostfwd=tcp::2222-:22 \ 
 -kernel linux/arch/arm64/boot/Image -nographic -append "root=/dev/vda rw \
 console=ttyAMA0 debug earlyprintk=serial slub_debug=UZ nokaslr" -gdb \
 tcp::1234 -d guest_errors,unimp -D log.txt -drive \
 file=nvm.img,if=none,id=nvm -device nvme,serial=deadbeef,drive=nvm

Kernel is v5.15.1 compiled with arm64 defconfig

The following commands will then crash the kernel:
# touch test.txt
# nvme io-passthru /dev/nvme0 --opcode=0x1 --input-file=test.txt --data-len=1 --write --namespace=1 --metadata-len=1


Thanks,
Puranjay


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: BUG Report: kernel NULL pointer dereference in bio_integrity_advance()
  2024-08-26 14:32 BUG Report: kernel NULL pointer dereference in bio_integrity_advance() pjy
@ 2024-08-27  7:26 ` Christoph Hellwig
  0 siblings, 0 replies; 2+ messages in thread
From: Christoph Hellwig @ 2024-08-27  7:26 UTC (permalink / raw)
  To: pjy; +Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	linux-nvme

On Mon, Aug 26, 2024 at 02:32:31PM +0000, pjy@amazon.com wrote:
> This is because in the function:
> 
> void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
> {
> 	struct bio_integrity_payload *bip = bio_integrity(bio);
> 	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
> 	unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
> 
> 	bip->bip_iter.bi_sector += bio_integrity_intervals(bi, bytes_done >> 9);
> 	bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
> }
> 
> Here blk_get_integrity() returns NULL and bio_integrity_bytes() uses it
> without checking for NULL.

So the above is on a NVMe namespace that does not support metadata?

We currently don't check if a namespace supports metadata before
sending it, so something like the patch below should fix it:

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index f1d58e70933f54..b1d1422f812a63 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -4,6 +4,7 @@
  * Copyright (c) 2017-2021 Christoph Hellwig.
  */
 #include <linux/bio-integrity.h>
+#include <linux/blk-integrity.h>
 #include <linux/ptrace.h>	/* for force_successful_syscall_return */
 #include <linux/nvme_ioctl.h>
 #include <linux/io_uring/cmd.h>
@@ -121,6 +122,9 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer,
 	struct block_device *bdev = ns ? ns->disk->part0 : NULL;
 	struct bio *bio = NULL;
 	int ret;
+		
+	if (meta_buffer && meta_len && !blk_get_integrity(bdev->bd_disk))
+		return -EINVAL;
 
 	if (ioucmd && (ioucmd->flags & IORING_URING_CMD_FIXED)) {
 		struct iov_iter iter;


^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-08-27  7:26 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-26 14:32 BUG Report: kernel NULL pointer dereference in bio_integrity_advance() pjy
2024-08-27  7:26 ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox