linux-block.vger.kernel.org archive mirror
* BUG: NULL pointer dereferenced within __blk_rq_map_sg
@ 2025-02-08  2:09 Cheyenne Wills
  2025-02-11 12:13 ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Cheyenne Wills @ 2025-02-08  2:09 UTC (permalink / raw)
  To: linux-block; +Cc: Christoph Hellwig

While I was setting up to test with linux 6.14-rc1 (under Xen), I ran
into a consistent NULL ptr dereference within __blk_rq_map_sg when
booting the system.

Using git bisect I was able to narrow down the "bad" commit to:

block: add a dma mapping iterator (b7175e24d6acf79d9f3af9ce9d3d50de1fa748ec)

Building a kernel with the parent commit
(2caca8fc7aad9ea9a6ea3ed26ed146b1e5f06fab) using the same .config does
not fail.

Following is the console log showing the error as well as the Xen
(libvirt) configuration for the guest that I'm using.

Please let me know if there is any additional information that I can provide.

cheyenne.wills@gmail.com

Console log with error
----

[    6.535764] BUG: kernel NULL pointer dereference, address: 0000000000000028
[    6.547530] #PF: supervisor read access in kernel mode
[    6.556013] #PF: error_code(0x0000) - not-present page
[    6.566162] PGD 0 P4D 0
[    6.572427] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[    6.580457] CPU: 14 UID: 0 PID: 1433 Comm: kworker/14:1H Not tainted 6.14.0-rc1+ #1
[    6.592054] Hardware name: Xen HVM domU, BIOS 4.19.1 01/17/2025
[    6.600738] Workqueue: kblockd blk_mq_requeue_work
[    6.610356] RIP: 0010:__blk_rq_map_sg+0x3d/0x410
[    6.618285] Code: 54 45 31 e4 55 48 89 cd 53 48 89 d3 48 83 ec 60 48 8b 4e 38 65 48 8b 04 25 28 00 00 00 48 89 44 24 58 31 c0 48 89 e8 44 89 e5 <44> 8b 69 28 44 8b 41 2c 49 89 c4 44 8b 79 30 e9 b0 00 00 00 48 85
[    6.640873] RSP: 0018:ffffbd02005ebb38 EFLAGS: 00010046
[    6.649672] RAX: ffffbd02005ebc08 RBX: ffffa18cc11a7200 RCX: 0000000000000000
[    6.660862] RDX: ffffa18cc11a7200 RSI: ffffa18cc11e6600 RDI: ffffa18cc23a8000
[    6.672288] RBP: 0000000000000000 R08: ffffa18cc23a0000 R09: ffffa18cc11e6600
[    6.683278] R10: ffffa18cc1642980 R11: ffffa18cc148e400 R12: 0000000000000000
[    6.695085] R13: ffffa18cc11e6600 R14: ffffa18cc23a0be0 R15: ffffa18cc23a0000
[    6.708417] FS:  0000000000000000(0000) GS:ffffa18dc6d80000(0000) knlGS:0000000000000000
[    6.724049] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.736413] CR2: 0000000000000028 CR3: 000000010a5e2000 CR4: 0000000000750ef0
[    6.748664] PKRU: 55555554
[    6.755404] Call Trace:
[    6.761889]  <TASK>
[    6.766985]  ? __die+0x23/0x70
[    6.774405]  ? page_fault_oops+0x158/0x460
[    6.784689]  ? exc_page_fault+0x6b/0x150
[    6.793848]  ? asm_exc_page_fault+0x26/0x30
[    6.801585]  ? __blk_rq_map_sg+0x3d/0x410
[    6.808362]  blkif_queue_rq+0x1de/0x840
[    6.816009]  blk_mq_dispatch_rq_list+0x117/0x6b0
[    6.822869]  __blk_mq_sched_dispatch_requests+0xb0/0x5b0
[    6.830766]  ? __remove_hrtimer+0x39/0x90
[    6.837653]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.846842]  ? xas_load+0xd/0xd0
[    6.852211]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.858252]  ? xas_find+0x157/0x1a0
[    6.863941]  blk_mq_sched_dispatch_requests+0x2d/0x70
[    6.871505]  blk_mq_run_hw_queue+0x22c/0x2f0
[    6.879164]  blk_mq_run_hw_queues+0x67/0x120
[    6.887146]  blk_mq_requeue_work+0x162/0x1a0
[    6.896083]  process_one_work+0x148/0x360
[    6.905583]  worker_thread+0x2cb/0x3e0
[    6.914302]  ? __pfx_worker_thread+0x10/0x10
[    6.923801]  kthread+0xf1/0x1d0
[    6.931407]  ? __pfx_kthread+0x10/0x10
[    6.940421]  ret_from_fork+0x34/0x50
[    6.948756]  ? __pfx_kthread+0x10/0x10
[    6.956678]  ret_from_fork_asm+0x1a/0x30
[    6.965756]  </TASK>
[    6.971401] Modules linked in:
[    6.977370] CR2: 0000000000000028
[    6.983075] ---[ end trace 0000000000000000 ]---
[    6.989697] RIP: 0010:__blk_rq_map_sg+0x3d/0x410
[    6.998861] Code: 54 45 31 e4 55 48 89 cd 53 48 89 d3 48 83 ec 60 48 8b 4e 38 65 48 8b 04 25 28 00 00 00 48 89 44 24 58 31 c0 48 89 e8 44 89 e5 <44> 8b 69 28 44 8b 41 2c 49 89 c4 44 8b 79 30 e9 b0 00 00 00 48 85
[    7.027159] RSP: 0018:ffffbd02005ebb38 EFLAGS: 00010046
[    7.035909] RAX: ffffbd02005ebc08 RBX: ffffa18cc11a7200 RCX: 0000000000000000
[    7.047863] RDX: ffffa18cc11a7200 RSI: ffffa18cc11e6600 RDI: ffffa18cc23a8000
[    7.060227] RBP: 0000000000000000 R08: ffffa18cc23a0000 R09: ffffa18cc11e6600
[    7.070223] R10: ffffa18cc1642980 R11: ffffa18cc148e400 R12: 0000000000000000
[    7.079521] R13: ffffa18cc11e6600 R14: ffffa18cc23a0be0 R15: ffffa18cc23a0000
[    7.089842] FS:  0000000000000000(0000) GS:ffffa18dc6d80000(0000) knlGS:0000000000000000
[    7.101846] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.110248] CR2: 0000000000000028 CR3: 000000010a5e2000 CR4: 0000000000750ef0
[    7.121235] PKRU: 55555554
[    7.126201] note: kworker/14:1H[1433] exited with irqs disabled
[    7.134082] note: kworker/14:1H[1433] exited with preempt_count 1
[    7.143106] kworker/14:1H (1433) used greatest stack depth: 12848 bytes left
[    1.295002] cpu 9 spinlock event irq 121

----

Here is the libvirt/virt-manager configuration for the Xen guest, in
case it helps. The Xen hypervisor is version 4.19.1, and the dom0 is
Gentoo with a 6.6.67 kernel.

<domain type="xen">
  <name>linux614-test</name>
  <uuid>xxxxxxxxxxxxxxxxxx</uuid>
  <metadata>
    <libosinfo:libosinfo
xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://gentoo.org/gentoo/rolling"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">8388608</memory>
  <currentMemory unit="KiB">8388608</currentMemory>
  <vcpu placement="static">16</vcpu>
  <os>
    <type arch="x86_64" machine="xenfv">hvm</type>
    <loader type="rom">/usr/lib/xen/boot/hvmloader</loader>
    <boot dev="hd"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset="utc"/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/lib/xen/bin/qemu-system-i386</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="raw"/>
      <source file="/var/lib/libvirt/images/linux614-test.img"/>
      <target dev="xvda" bus="xen"/>
    </disk>
    <controller type="xenbus" index="0"/>
    <controller type="ide" index="0"/>
    <interface type="bridge">
      <mac address="xxxxxxx"/>
      <source bridge="br0"/>
      <model type="e1000"/>
    </interface>
    <serial type="pty">
      <target port="0"/>
    </serial>
    <console type="pty">
      <target type="serial" port="0"/>
    </console>
    <input type="tablet" bus="usb"/>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <graphics type="vnc" port="-1" autoport="yes">
      <listen type="address"/>
    </graphics>
    <video>
      <model type="vga" vram="16384" heads="1" primary="yes"/>
    </video>
    <memballoon model="xen"/>
  </devices>
</domain>


* Re: BUG: NULL pointer dereferenced within __blk_rq_map_sg
  2025-02-08  2:09 BUG: NULL pointer dereferenced within __blk_rq_map_sg Cheyenne Wills
@ 2025-02-11 12:13 ` Ming Lei
  2025-02-11 15:28   ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2025-02-11 12:13 UTC (permalink / raw)
  To: Cheyenne Wills; +Cc: linux-block, Christoph Hellwig

On Fri, Feb 07, 2025 at 07:09:39PM -0700, Cheyenne Wills wrote:
> While I was setting up to test with linux 6.14-rc1 (under Xen), I ran
> into a consistent NULL ptr dereference within __blk_rq_map_sg when
> booting the system.
> 
> Using git bisect I was able to narrow down the "bad" commit to:
> 
> block: add a dma mapping iterator (b7175e24d6acf79d9f3af9ce9d3d50de1fa748ec)
> 
> Building a kernel with the parent commit
> (2caca8fc7aad9ea9a6ea3ed26ed146b1e5f06fab) using the same .config does
> not fail.
> 
> Following is the console log showing the error as well as the Xen
> (libvirt) configuration for the guest that I'm using.
> 
> Please let me know if there is any additional information that I can provide.

Can you test the following patch?


diff --git a/block/blk-merge.c b/block/blk-merge.c
index b55c52a42303..1eabde8383fb 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -493,7 +493,7 @@ static bool blk_map_iter_next(struct request *req,
 		return true;
 	}
 
-	if (!iter->iter.bi_size)
+	if (!iter->bio || !iter->iter.bi_size)
 		return false;
 
 	bv = mp_bvec_iter_bvec(iter->bio->bi_io_vec, iter->iter);
@@ -514,7 +514,8 @@ static bool blk_map_iter_next(struct request *req,
 			if (!iter->bio->bi_next)
 				break;
 			iter->bio = iter->bio->bi_next;
-			iter->iter = iter->bio->bi_iter;
+			if (iter->bio)
+				iter->iter = iter->bio->bi_iter;
 		}
 
 		next = mp_bvec_iter_bvec(iter->bio->bi_io_vec, iter->iter);


Thanks,
Ming



* Re: BUG: NULL pointer dereferenced within __blk_rq_map_sg
  2025-02-11 12:13 ` Ming Lei
@ 2025-02-11 15:28   ` Ming Lei
  2025-02-12 23:24     ` Cheyenne Wills
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2025-02-11 15:28 UTC (permalink / raw)
  To: Cheyenne Wills; +Cc: linux-block, Christoph Hellwig

On Tue, Feb 11, 2025 at 08:13:16PM +0800, Ming Lei wrote:
> On Fri, Feb 07, 2025 at 07:09:39PM -0700, Cheyenne Wills wrote:
> > While I was setting up to test with linux 6.14-rc1 (under Xen), I ran
> > into a consistent NULL ptr dereference within __blk_rq_map_sg when
> > booting the system.
> > 
> > Using git bisect I was able to narrow down the "bad" commit to:
> > 
> > block: add a dma mapping iterator (b7175e24d6acf79d9f3af9ce9d3d50de1fa748ec)
> > 
> > Building a kernel with the parent commit
> > (2caca8fc7aad9ea9a6ea3ed26ed146b1e5f06fab) using the same .config does
> > not fail.
> > 
> > Following is the console log showing the error as well as the Xen
> > (libvirt) configuration for the guest that I'm using.
> > 
> > Please let me know if there is any additional information that I can provide.
> 
> Can you test the following patch?
> 

Please try the revised one:


diff --git a/block/blk-merge.c b/block/blk-merge.c
index 15cd231d560c..a66d087a6b55 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -493,7 +493,7 @@ static bool blk_map_iter_next(struct request *req,
 		return true;
 	}
 
-	if (!iter->iter.bi_size)
+	if (!iter->bio || !iter->iter.bi_size)
 		return false;
 
 	bv = mp_bvec_iter_bvec(iter->bio->bi_io_vec, iter->iter);
@@ -514,6 +514,8 @@ static bool blk_map_iter_next(struct request *req,
 			if (!iter->bio->bi_next)
 				break;
 			iter->bio = iter->bio->bi_next;
+			if (!iter->bio)
+				break;
 			iter->iter = iter->bio->bi_iter;
 		}
 



Thanks,
Ming



* Re: BUG: NULL pointer dereferenced within __blk_rq_map_sg
  2025-02-11 15:28   ` Ming Lei
@ 2025-02-12 23:24     ` Cheyenne Wills
  2025-02-13  1:29       ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Cheyenne Wills @ 2025-02-12 23:24 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, Christoph Hellwig

On Tue, Feb 11, 2025 at 8:29 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Tue, Feb 11, 2025 at 08:13:16PM +0800, Ming Lei wrote:
> > On Fri, Feb 07, 2025 at 07:09:39PM -0700, Cheyenne Wills wrote:
> > > While I was setting up to test with linux 6.14-rc1 (under Xen), I ran
> > > into a consistent NULL ptr dereference within __blk_rq_map_sg when
> > > booting the system.
> > >
> > > Using git bisect I was able to narrow down the "bad" commit to:
> > >
> > > block: add a dma mapping iterator (b7175e24d6acf79d9f3af9ce9d3d50de1fa748ec)
> > >
> > > Building a kernel with the parent commit
> > > (2caca8fc7aad9ea9a6ea3ed26ed146b1e5f06fab) using the same .config does
> > > not fail.
> > >
> > > Following is the console log showing the error as well as the Xen
> > > (libvirt) configuration for the guest that I'm using.
> > >
> > > Please let me know if there is any additional information that I can provide.
> >
> > Can you test the following patch?
> >
>
> Please try the revised one:
>
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 15cd231d560c..a66d087a6b55 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -493,7 +493,7 @@ static bool blk_map_iter_next(struct request *req,
>                 return true;
>         }
>
> -       if (!iter->iter.bi_size)
> +       if (!iter->bio || !iter->iter.bi_size)
>                 return false;
>
>         bv = mp_bvec_iter_bvec(iter->bio->bi_io_vec, iter->iter);
> @@ -514,6 +514,8 @@ static bool blk_map_iter_next(struct request *req,
>                         if (!iter->bio->bi_next)
>                                 break;
>                         iter->bio = iter->bio->bi_next;
> +                       if (!iter->bio)
> +                               break;
>                         iter->iter = iter->bio->bi_iter;
>                 }
>
>
>
>
> Thanks,
> Ming
>

Still getting a BUG at the same location.

I was able to capture the BUG using a Xen gdbsx / gdb session. The
offending instruction is mov 0x28(%rdx),%r13d, and it faults because
%rdx is zero. I caught it with this conditional breakpoint:

    break *__blk_rq_map_sg+0x5e if $rdx == 0

It appears that rq->bio is NULL on entry to __blk_rq_map_sg.

Breakpoint 1, __blk_rq_map_sg (q=<optimized out>, rq=rq@entry=0xffff888102231300, sglist=0xffff88810222f600, last_sg=last_sg@entry=0xffffc90000137c08) at block/blk-merge.c:568
(gdb) bt
#0  __blk_rq_map_sg (q=<optimized out>, rq=rq@entry=0xffff888102231300, sglist=0xffff88810222f600, last_sg=last_sg@entry=0xffffc90000137c08) at block/blk-merge.c:568
#1  0xffffffff81db3a27 in blk_rq_map_sg (sglist=<optimized out>, rq=0xffff888102231300, q=<optimized out>) at ./include/linux/blk-mq.h:1165
#2  blkif_queue_rw_req (rinfo=0xffff88810088c000, req=0xffff888102231300) at drivers/block/xen-blkfront.c:754
#3  blkif_queue_request (rinfo=0xffff88810088c000, req=0xffff888102231300) at drivers/block/xen-blkfront.c:880
#4  blkif_queue_rq (hctx=0xffff888102205c00, qd=<optimized out>) at drivers/block/xen-blkfront.c:921
#5  0xffffffff818c1867 in blk_mq_dispatch_rq_list (hctx=hctx@entry=0xffff888102205c00, list=list@entry=0xffffc90000137d38, nr_budgets=nr_budgets@entry=0) at block/blk-mq.c:2120
#6  0xffffffff818c7ca0 in __blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff888102205c00) at block/blk-mq-sched.c:301
#7  0xffffffff818c820d in blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff888102205c00) at block/blk-mq-sched.c:331
#8  0xffffffff818bdbdc in blk_mq_run_hw_queue (hctx=0xffff888102205c00, async=async@entry=false) at block/blk-mq.c:2354
#9  0xffffffff818bec87 in blk_mq_run_hw_queues (q=q@entry=0xffff888100d49b00, async=async@entry=false) at block/blk-mq.c:2403
#10 0xffffffff818bfc52 in blk_mq_requeue_work (work=0xffff888100d49cf8) at block/blk-mq.c:1568
#11 0xffffffff812c5528 in process_one_work (worker=worker@entry=0xffff888100c253c0, work=0xffff888100d49cf8) at kernel/workqueue.c:3236
#12 0xffffffff812c668b in process_scheduled_works (worker=<optimized out>) at kernel/workqueue.c:3317
#13 worker_thread (__worker=0xffff888100c253c0) at kernel/workqueue.c:3398
#14 0xffffffff812cfaf1 in kthread (_create=<optimized out>) at kernel/kthread.c:464
#15 0xffffffff812502d4 in ret_from_fork (prev=<optimized out>, regs=0xffffc90000137f58, fn=0xffffffff812cfa00 <kthread>, fn_arg=0xffff888100c26340) at arch/x86/kernel/process.c:148
#16 0xffffffff812024aa in ret_from_fork_asm () at arch/x86/entry/entry_64.S:244
#17 0x0000000000000000 in ?? ()
(gdb) print *rq
$1 = {
  q = 0xffff888100d49b00,
  mq_ctx = 0xffff888206c37b00,
  mq_hctx = 0xffff888102205c00,
  cmd_flags = 262146,
  rq_flags = 2,
  tag = 2,
  internal_tag = 59,
  timeout = 30000,
  __data_len = 0,
  __sector = 18446744073709551615,
  bio = 0x0 <fixed_percpu_data>,
  biotail = 0x0 <fixed_percpu_data>,
  {
    queuelist = {
      next = 0xffff888102231348,
      prev = 0xffff888102231348
    },
    rq_next = 0xffff888102231348
  },
  part = 0x0 <fixed_percpu_data>,
  start_time_ns = 62585793058,
  io_start_time_ns = 0,
  stats_sectors = 0,
  nr_phys_segments = 0,
  nr_integrity_segments = 0,
  state = MQ_RQ_IN_FLIGHT,
  ref = {
    counter = 1
  },
  deadline = 4294759798,
  {
    hash = {
      next = 0x0 <fixed_percpu_data>,
      pprev = 0x0 <fixed_percpu_data>
    },
    ipi_list = {
      next = 0x0 <fixed_percpu_data>
    }
  },
  {
    rb_node = {
      __rb_parent_color = 18446612686400852888,
      rb_right = 0x0 <fixed_percpu_data>,
      rb_left = 0x0 <fixed_percpu_data>
    },
    special_vec = {
      bv_page = 0xffff888102231398,
      bv_len = 0,
      bv_offset = 0
    }
  },
  elv = {
    icq = 0x0 <fixed_percpu_data>,
    priv = {0x0 <fixed_percpu_data>, 0x0 <fixed_percpu_data>}
  },
  flush = {
    seq = 0,
    saved_end_io = 0x0 <fixed_percpu_data>
  },
  fifo_time = 0,
  end_io = 0xffffffff818b56b0 <flush_end_io>,
  end_io_data = 0x0 <fixed_percpu_data>
}


I suspect that the NULL dereference is in the initialization of the
req_iterator itself:

	struct req_iterator iter = {
		.bio = rq->bio,
		.iter = rq->bio->bi_iter,        <<< here
	};

Again let me know if there is any other information that I can provide.

Cheyenne
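
For readers following the thread, the suspicion above can be illustrated
in userspace. The sketch below is not kernel code: the mock_* types are
hypothetical stand-ins whose layout does not match the real struct bio,
and it assumes a NULL pointer converts to the integer 0 (true on common
platforms). It only shows that reading a member through a NULL struct
pointer touches an address equal to that member's offset, which is
consistent with the small fault address (CR2 = 0x28) and the zero base
register reported in the oops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; field names follow the thread, layouts do not. */
struct mock_bvec_iter {
	unsigned long long bi_sector;
	unsigned int bi_size;
	unsigned int bi_idx;
	unsigned int bi_bvec_done;
};

struct mock_bio {
	struct mock_bio *bi_next;
	struct mock_bvec_iter bi_iter;
};

struct mock_request {
	struct mock_bio *bio;
};

/*
 * Compute the address that a load of rq->bio->bi_iter.bi_size would touch,
 * without performing the load. With rq->bio == NULL the base is 0, so the
 * result is just the offset of the member inside struct mock_bio.
 */
static size_t mock_fault_address(const struct mock_request *rq)
{
	assert(rq != NULL);
	return (size_t)(uintptr_t)rq->bio +
	       offsetof(struct mock_bio, bi_iter.bi_size);
}
```

Calling mock_fault_address() on a request whose bio is NULL returns the
bare member offset, a small nonzero number of the same kind as the
address the kernel reported.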


* Re: BUG: NULL pointer dereferenced within __blk_rq_map_sg
  2025-02-12 23:24     ` Cheyenne Wills
@ 2025-02-13  1:29       ` Ming Lei
  2025-02-13  6:32         ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2025-02-13  1:29 UTC (permalink / raw)
  To: Cheyenne Wills; +Cc: linux-block, Christoph Hellwig

On Wed, Feb 12, 2025 at 04:24:43PM -0700, Cheyenne Wills wrote:
> On Tue, Feb 11, 2025 at 8:29 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Tue, Feb 11, 2025 at 08:13:16PM +0800, Ming Lei wrote:
> > > On Fri, Feb 07, 2025 at 07:09:39PM -0700, Cheyenne Wills wrote:
> > > > While I was setting up to test with linux 6.14-rc1 (under Xen), I ran
> > > > into a consistent NULL ptr dereference within __blk_rq_map_sg when
> > > > booting the system.
> > > >
> > > > Using git bisect I was able to narrow down the "bad" commit to:
> > > >
> > > > block: add a dma mapping iterator (b7175e24d6acf79d9f3af9ce9d3d50de1fa748ec)
> > > >
> > > > Building a kernel with the parent commit
> > > > (2caca8fc7aad9ea9a6ea3ed26ed146b1e5f06fab) using the same .config does
> > > > not fail.
> > > >
> > > > Following is the console log showing the error as well as the Xen
> > > > (libvirt) configuration for the guest that I'm using.
> > > >
> > > > Please let me know if there is any additional information that I can provide.
> > >
> > > Can you test the following patch?
> > >
> >
> > Please try the revised one:
> >
> >
> > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > index 15cd231d560c..a66d087a6b55 100644
> > --- a/block/blk-merge.c
> > +++ b/block/blk-merge.c
> > @@ -493,7 +493,7 @@ static bool blk_map_iter_next(struct request *req,
> >                 return true;
> >         }
> >
> > -       if (!iter->iter.bi_size)
> > +       if (!iter->bio || !iter->iter.bi_size)
> >                 return false;
> >
> >         bv = mp_bvec_iter_bvec(iter->bio->bi_io_vec, iter->iter);
> > @@ -514,6 +514,8 @@ static bool blk_map_iter_next(struct request *req,
> >                         if (!iter->bio->bi_next)
> >                                 break;
> >                         iter->bio = iter->bio->bi_next;
> > +                       if (!iter->bio)
> > +                               break;
> >                         iter->iter = iter->bio->bi_iter;
> >                 }
> >
> >
> >
> >
> > Thanks,
> > Ming
> >
> 
> Still getting a BUG at the same location.
> 
> I was able to capture the BUG using a Xen gdbsx / gdb session. The
> offending instruction is mov 0x28(%rdx),%r13d, and it faults because
> %rdx is zero. I caught it with this conditional breakpoint:
> 
>     break *__blk_rq_map_sg+0x5e if $rdx == 0
> 
> It appears that rq->bio is NULL on entry to __blk_rq_map_sg.

Yeah, turns out oops is triggered in initializing req_iterator for
discard req, and the following patch should be enough:


diff --git a/block/blk-merge.c b/block/blk-merge.c
index 15cd231d560c..9d7e87052882 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -556,11 +556,13 @@ int __blk_rq_map_sg(struct request_queue *q, struct request *rq,
 {
 	struct req_iterator iter = {
 		.bio	= rq->bio,
-		.iter	= rq->bio->bi_iter,
 	};
 	struct phys_vec vec;
 	int nsegs = 0;
 
+	if (iter.bio)
+		iter.iter = iter.bio->bi_iter;
+
 	while (blk_map_iter_next(rq, &iter, &vec)) {
 		*last_sg = blk_next_sg(last_sg, sglist);
 		sg_set_page(*last_sg, phys_to_page(vec.paddr), vec.len,


Thanks,
Ming
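
The effect of the guard in the patch above can be modelled in userspace.
This is a minimal sketch under stated assumptions, not the kernel
implementation: the mock_* types and functions are hypothetical, and each
bio is treated as exactly one segment, which the real blk_map_iter_next()
does not do. It only demonstrates that copying the bvec iterator solely
when a bio exists lets a bio-less request map to zero segments instead of
faulting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures named in the thread. */
struct mock_bvec_iter {
	unsigned int bi_size;
};

struct mock_bio {
	struct mock_bio *bi_next;
	struct mock_bvec_iter bi_iter;
};

struct mock_request {
	struct mock_bio *bio;
};

struct mock_req_iterator {
	struct mock_bio *bio;
	struct mock_bvec_iter iter;
};

/* Simplified stepping: one segment per bio, stop when no bio or no data. */
static bool mock_iter_next(struct mock_req_iterator *it, unsigned int *len)
{
	if (!it->bio || !it->iter.bi_size)
		return false;

	*len = it->iter.bi_size;
	it->bio = it->bio->bi_next;
	if (it->bio)
		it->iter = it->bio->bi_iter;
	else
		it->iter.bi_size = 0;
	return true;
}

/* Returns the number of segments; 0 for a request that carries no bio. */
static int mock_map_request(const struct mock_request *rq)
{
	assert(rq != NULL);

	/* .iter is zero-initialized because it is not named here. */
	struct mock_req_iterator it = { .bio = rq->bio };
	unsigned int len;
	int nsegs = 0;

	/* Only read bi_iter when there is a bio to read it from. */
	if (it.bio)
		it.iter = it.bio->bi_iter;

	while (mock_iter_next(&it, &len))
		nsegs++;
	return nsegs;
}
```

In this model a request with a NULL bio returns 0 segments, and a request
with two chained bios returns 2.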



* Re: BUG: NULL pointer dereferenced within __blk_rq_map_sg
  2025-02-13  1:29       ` Ming Lei
@ 2025-02-13  6:32         ` Christoph Hellwig
  2025-02-13  6:38           ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2025-02-13  6:32 UTC (permalink / raw)
  To: Ming Lei; +Cc: Cheyenne Wills, linux-block, Christoph Hellwig

On Thu, Feb 13, 2025 at 09:29:53AM +0800, Ming Lei wrote:
> Yeah, turns out oops is triggered in initializing req_iterator for
> discard req, and the following patch should be enough:

How do we end up in blk_rq_map_sg for a discard request here?
dma-mapping doesn't make sense for a non-special payload discard
as used by xen-blkfront, and xen-blkfront also only calls
blk_rq_map_sg from blkif_queue_rw_req and not blkif_queue_discard_req.



* Re: BUG: NULL pointer dereferenced within __blk_rq_map_sg
  2025-02-13  6:32         ` Christoph Hellwig
@ 2025-02-13  6:38           ` Christoph Hellwig
  2025-02-13 12:39             ` Cheyenne Wills
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2025-02-13  6:38 UTC (permalink / raw)
  To: Ming Lei; +Cc: Cheyenne Wills, linux-block, Christoph Hellwig

On Thu, Feb 13, 2025 at 07:32:14AM +0100, Christoph Hellwig wrote:
> On Thu, Feb 13, 2025 at 09:29:53AM +0800, Ming Lei wrote:
> > Yeah, turns out oops is triggered in initializing req_iterator for
> > discard req, and the following patch should be enough:
> 
> How do we end up in blk_rq_map_sg for a discard request here?
> dma-mapping doesn't make sense for a non-special payload discard
> as used by xen-blkfront, and xen-blkfront also only calls
> blk_rq_map_sg from blkif_queue_rw_req and not blkif_queue_discard_req.

I think we're probably dealing with a flush command, as that's the
only request that doesn't have a bio except for empty passthrough
commands.  xen-blkfront is a bit weird in calling into these data
transfer helpers despite not having data to transfer, but I guess
something like your patch to safeguard against it should be fine.
But add a comment as well please.
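
The gdb dump earlier in the thread prints cmd_flags = 262146, which can
be decoded to check which operation the request carries. A minimal
sketch, assuming the low 8 bits of cmd_flags hold the operation and that
the values 2 and 3 denote flush and discard, as in recent
include/linux/blk_types.h. The MOCK_* names are hypothetical, and the
constants should be checked against the tree being debugged.

```c
#include <assert.h>

/* Assumed to mirror REQ_OP_BITS / REQ_OP_MASK; verify against your tree. */
#define MOCK_REQ_OP_BITS	8
#define MOCK_REQ_OP_MASK	((1u << MOCK_REQ_OP_BITS) - 1)

/* Assumed operation numbering; only the values needed here are listed. */
enum mock_req_op {
	MOCK_REQ_OP_READ	= 0,
	MOCK_REQ_OP_WRITE	= 1,
	MOCK_REQ_OP_FLUSH	= 2,
	MOCK_REQ_OP_DISCARD	= 3,
};

/* Extract the operation from the low bits of cmd_flags. */
static unsigned int mock_req_op(unsigned int cmd_flags)
{
	return cmd_flags & MOCK_REQ_OP_MASK;
}

/* Return the remaining modifier flag bits with the operation masked out. */
static unsigned int mock_req_flags(unsigned int cmd_flags)
{
	return cmd_flags & ~MOCK_REQ_OP_MASK;
}
```

Under those assumptions 262146 (0x40002) decodes to operation 2 with a
single modifier bit (bit 18) set, that is, a flush rather than a discard.
This agrees with end_io = flush_end_io in the same dump.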


* Re: BUG: NULL pointer dereferenced within __blk_rq_map_sg
  2025-02-13  6:38           ` Christoph Hellwig
@ 2025-02-13 12:39             ` Cheyenne Wills
  0 siblings, 0 replies; 8+ messages in thread
From: Cheyenne Wills @ 2025-02-13 12:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Ming Lei, linux-block

With the patch in __blk_rq_map_sg, I was able to boot successfully.
Note that the kernel I tested also included the earlier patch that
added guards to blk_map_iter_next. I can run a test with only the
__blk_rq_map_sg change if needed.

Thanks

On Wed, Feb 12, 2025 at 11:38 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Feb 13, 2025 at 07:32:14AM +0100, Christoph Hellwig wrote:
> > On Thu, Feb 13, 2025 at 09:29:53AM +0800, Ming Lei wrote:
> > > Yeah, turns out oops is triggered in initializing req_iterator for
> > > discard req, and the following patch should be enough:
> >
> > How do we end up in blk_rq_map_sg for a discard request here?
> > dma-mapping doesn't make sense for a non-special payload discard
> > as used by xen-blkfront, and xen-blkfront also only calls
> > blk_rq_map_sg from blkif_queue_rw_req and not blkif_queue_discard_req.
>
> I think we're probably dealing with a flush command, as that's the
> only request that doesn't have a bio except for empty passthrough
> commands.  xen-blkfront is a bit weird in calling into these data
> transfer helpers despite not having data to transfer, but I guess
> something like your patch to safeguard against it should be fine.
> But add a comment as well please.

