public inbox for linux-block@vger.kernel.org
* [bug report] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049
@ 2026-01-07 16:39 Yi Zhang
  2026-01-07 16:48 ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Yi Zhang @ 2026-01-07 16:39 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Ming Lei, Shinichiro Kawasaki

Hi,
The following issue[2] was triggered by blktests nvme/059, and it's
100% reproducible with commit[1]. Please help check it, and let me know
if you need any info or testing for it.
It seems to be a regression; I will try to test with the latest
linux-block/for-next and also bisect it tomorrow.

[1]
commit 5ee81d4ae52ec4e9206efb4c1b06e269407aba11
Merge: 29cefd61e0c6 fcf463b92a08
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Jan 6 05:48:07 2026 -0700

    Merge branch 'for-7.0/blk-pvec' into for-next

    * for-7.0/blk-pvec:
      types: move phys_vec definition to common header
      nvme-pci: Use size_t for length fields to handle larger sizes
[2]
[16866.579229] run blktests nvme/049 at 2026-01-07 02:00:14
[16869.709147]  slab io_kiocb start ffff88825e6ad400 pointer offset 144 size 248
[16869.716399] list_add corruption. prev->next should be next (ffff888200596100), but was 0000000000000000. (prev=ffff88825e6ad490).
[16869.728106] ------------[ cut here ]------------
[16869.732738] kernel BUG at lib/list_debug.c:32!
[16869.737209] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
[16869.742790] CPU: 15 UID: 0 PID: 71799 Comm: fio Kdump: loaded Not tainted 6.19.0-rc3+ #1 PREEMPT(voluntary)
[16869.752614] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.21.1 09/24/2025
[16869.760267] RIP: 0010:__list_add_valid_or_report+0xf9/0x130
[16869.765849] Code: 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 75 3c 49 8b 55 00 4c 89 e9 48 89 de 48 c7 c7 40 6d f6 9e e8 67 e1 a1 fe <0f> 0b 4c 89 e7 e8 8d eb 78 ff e9 3c ff ff ff 4c 89 ef e8 80 eb 78
[16869.784600] RSP: 0018:ffffc9000aadf990 EFLAGS: 00010282
[16869.789835] RAX: 0000000000000075 RBX: ffff888200596100 RCX: 0000000000000000
[16869.796967] RDX: 0000000000000075 RSI: ffffffff9ef66980 RDI: fffff5200155bf24
[16869.804101] RBP: ffff88825e6adc10 R08: 0000000000000001 R09: fffff5200155bee6
[16869.811234] R10: ffffc9000aadf737 R11: 0000000000000001 R12: ffff888200596108
[16869.818366] R13: ffff88825e6ad490 R14: ffff88825e6ad490 R15: ffff88825e6adc10
[16869.825500] FS:  00007f01a51bb740(0000) GS:ffff88887f6c4000(0000) knlGS:0000000000000000
[16869.833591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16869.839338] CR2: 00007f019cdb7430 CR3: 00000001eeeae000 CR4: 0000000000350ef0
[16869.846469] Call Trace:
[16869.848923]  <TASK>
[16869.851034]  io_issue_sqe+0x7eb/0xdd0
[16869.854707]  ? srso_return_thunk+0x5/0x5f
[16869.858725]  ? io_uring_cmd_prep+0x350/0x560
[16869.863012]  io_submit_sqes+0x475/0x1000
[16869.866942]  ? srso_return_thunk+0x5/0x5f
[16869.870969]  ? __pfx_io_submit_sqes+0x10/0x10
[16869.875332]  ? srso_return_thunk+0x5/0x5f
[16869.879352]  ? __fget_files+0x1b6/0x2f0
[16869.883208]  __do_sys_io_uring_enter+0x433/0x820
[16869.887829]  ? fput+0x4c/0xa0
[16869.890809]  ? __pfx___do_sys_io_uring_enter+0x10/0x10
[16869.895958]  ? srso_return_thunk+0x5/0x5f
[16869.899978]  ? srso_return_thunk+0x5/0x5f
[16869.903999]  ? rcu_is_watching+0x15/0xb0
[16869.907934]  ? srso_return_thunk+0x5/0x5f
[16869.911953]  ? trace_irq_enable.constprop.0+0x13d/0x190
[16869.917183]  ? srso_return_thunk+0x5/0x5f
[16869.921203]  ? syscall_trace_enter+0x13e/0x230
[16869.925656]  ? srso_return_thunk+0x5/0x5f
[16869.929685]  do_syscall_64+0x95/0x520
[16869.933363]  ? srso_return_thunk+0x5/0x5f
[16869.937380]  ? trace_irq_enable.constprop.0+0x13d/0x190
[16869.942608]  ? srso_return_thunk+0x5/0x5f
[16869.946628]  ? do_syscall_64+0x16d/0x520
[16869.950556]  ? __pfx_pgd_none+0x10/0x10
[16869.954408]  ? srso_return_thunk+0x5/0x5f
[16869.958424]  ? __handle_mm_fault+0x97e/0x11d0
[16869.962795]  ? __pfx_css_rstat_updated+0x10/0x10
[16869.967421]  ? __pfx___handle_mm_fault+0x10/0x10
[16869.972050]  ? srso_return_thunk+0x5/0x5f
[16869.976069]  ? rcu_is_watching+0x15/0xb0
[16869.979995]  ? srso_return_thunk+0x5/0x5f
[16869.984016]  ? trace_count_memcg_events+0x14f/0x1a0
[16869.988905]  ? srso_return_thunk+0x5/0x5f
[16869.992924]  ? count_memcg_events+0xe5/0x370
[16869.997198]  ? srso_return_thunk+0x5/0x5f
[16870.001218]  ? srso_return_thunk+0x5/0x5f
[16870.005232]  ? __up_read+0x2c5/0x700
[16870.008821]  ? __pfx___up_read+0x10/0x10
[16870.012756]  ? handle_mm_fault+0x452/0x8a0
[16870.016862]  ? do_user_addr_fault+0x274/0xa60
[16870.021229]  ? srso_return_thunk+0x5/0x5f
[16870.025241]  ? rcu_is_watching+0x15/0xb0
[16870.029172]  ? srso_return_thunk+0x5/0x5f
[16870.033189]  ? rcu_is_watching+0x15/0xb0
[16870.037114]  ? srso_return_thunk+0x5/0x5f
[16870.041126]  ? trace_irq_enable.constprop.0+0x13d/0x190
[16870.046353]  ? srso_return_thunk+0x5/0x5f
[16870.050368]  ? srso_return_thunk+0x5/0x5f
[16870.054387]  ? irqentry_exit+0x93/0x5f0
[16870.058229]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[16870.063288] RIP: 0033:0x558b6250d067
[16870.066876] Code: 24 94 00 00 00 85 f6 74 78 49 8b 44 24 20 41 8b 3c 24 45 31 c0 45 31 c9 41 ba 01 00 00 00 31 d2 44 8b 38 b8 aa 01 00 00 0f 05 <48> 89 c3 89 c5 85 c0 7e 90 89 c2 44 89 fe 4c 89 ef e8 c3 d6 ff ff
[16870.085630] RSP: 002b:00007ffc479b70b0 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
[16870.093205] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000558b6250d067
[16870.100335] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000007
[16870.107468] RBP: 00007f019cdb6000 R08: 0000000000000000 R09: 0000000000000000
[16870.114601] R10: 0000000000000001 R11: 0000000000000246 R12: 0000558b9f3eab00
[16870.121734] R13: 00007f019cdb6000 R14: 0000558b62527000 R15: 0000000000000001
[16870.128882]  </TASK>
[16870.131077] Modules linked in: ext4 crc16 mbcache jbd2
rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace
nfs_localio netfs platform_profile dell_wmi dell_smbios intel_rapl_msr
amd_atl intel_rapl_common sparse_keymap amd64_edac rfkill edac_mce_amd
video vfat dcdbas fat kvm_amd cdc_ether usbnet kvm mii irqbypass
mgag200 wmi_bmof dell_wmi_descriptor rapl i2c_algo_bit pcspkr
acpi_cpufreq ipmi_ssif ptdma i2c_piix4 k10temp i2c_smbus
acpi_power_meter ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler sg
loop fuse xfs sd_mod nvme ahci libahci nvme_core mpt3sas
ghash_clmulni_intel tg3 nvme_keyring ccp libata raid_class nvme_auth
hkdf scsi_transport_sas sp5100_tco wmi sunrpc dm_mirror dm_region_hash
dm_log dm_mod nfnetlink [last unloaded: nvmet]

-- 
Best Regards,
  Yi Zhang



* Re: [bug report] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049
  2026-01-07 16:39 [bug report] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049 Yi Zhang
@ 2026-01-07 16:48 ` Jens Axboe
  2026-01-08  6:39   ` Yi Zhang
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2026-01-07 16:48 UTC (permalink / raw)
  To: Yi Zhang, linux-block; +Cc: Ming Lei, Shinichiro Kawasaki

On 1/7/26 9:39 AM, Yi Zhang wrote:
> Hi
> The following issue[2] was triggered by blktests nvme/059 and it's

nvme/049 presumably?

> 100% reproduced with commit[1]. Please help check it and let me know
> if you need any info/test for it.
> Seems it's one regression, I will try to test with the latest
> linux-block/for-next and also bisect it tomorrow.

Doesn't reproduce for me on the current tree, but nothing since:

> commit 5ee81d4ae52ec4e9206efb4c1b06e269407aba11
> Merge: 29cefd61e0c6 fcf463b92a08
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Tue Jan 6 05:48:07 2026 -0700
> 
>     Merge branch 'for-7.0/blk-pvec' into for-next

should have impacted that. So please do bisect.

-- 
Jens Axboe



* Re: [bug report] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049
  2026-01-07 16:48 ` Jens Axboe
@ 2026-01-08  6:39   ` Yi Zhang
  2026-01-14  5:58     ` [bug report][bisected] " Yi Zhang
  0 siblings, 1 reply; 13+ messages in thread
From: Yi Zhang @ 2026-01-08  6:39 UTC (permalink / raw)
  To: Jens Axboe, fengnanchang; +Cc: linux-block, Ming Lei, Shinichiro Kawasaki

On Thu, Jan 8, 2026 at 12:48 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 1/7/26 9:39 AM, Yi Zhang wrote:
> > Hi
> > The following issue[2] was triggered by blktests nvme/059 and it's
>
> nvme/049 presumably?
>
Yes.

> > 100% reproduced with commit[1]. Please help check it and let me know
> > if you need any info/test for it.
> > Seems it's one regression, I will try to test with the latest
> > linux-block/for-next and also bisect it tomorrow.
>
> Doesn't reproduce for me on the current tree, but nothing since:
>
> > commit 5ee81d4ae52ec4e9206efb4c1b06e269407aba11
> > Merge: 29cefd61e0c6 fcf463b92a08
> > Author: Jens Axboe <axboe@kernel.dk>
> > Date:   Tue Jan 6 05:48:07 2026 -0700
> >
> >     Merge branch 'for-7.0/blk-pvec' into for-next
>
> should have impacted that. So please do bisect.

Hi Jens,
The issue seems to have been introduced by the commit below, and it
cannot be reproduced after reverting this commit.

3c7d76d6128a io_uring: IOPOLL polling improvements

>
> --
> Jens Axboe
>


-- 
Best Regards,
  Yi Zhang



* Re: [bug report][bisected] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049
  2026-01-08  6:39   ` Yi Zhang
@ 2026-01-14  5:58     ` Yi Zhang
  2026-01-14  9:40       ` Alexander Atanasov
  2026-01-14 14:11       ` Ming Lei
  0 siblings, 2 replies; 13+ messages in thread
From: Yi Zhang @ 2026-01-14  5:58 UTC (permalink / raw)
  To: Jens Axboe, fengnanchang; +Cc: linux-block, Ming Lei, Shinichiro Kawasaki

On Thu, Jan 8, 2026 at 2:39 PM Yi Zhang <yi.zhang@redhat.com> wrote:
>
> On Thu, Jan 8, 2026 at 12:48 AM Jens Axboe <axboe@kernel.dk> wrote:
> >
> > On 1/7/26 9:39 AM, Yi Zhang wrote:
> > > Hi
> > > The following issue[2] was triggered by blktests nvme/059 and it's
> >
> > nvme/049 presumably?
> >
> Yes.
>
> > > 100% reproduced with commit[1]. Please help check it and let me know
> > > if you need any info/test for it.
> > > Seems it's one regression, I will try to test with the latest
> > > linux-block/for-next and also bisect it tomorrow.
> >
> > Doesn't reproduce for me on the current tree, but nothing since:
> >
> > > commit 5ee81d4ae52ec4e9206efb4c1b06e269407aba11
> > > Merge: 29cefd61e0c6 fcf463b92a08
> > > Author: Jens Axboe <axboe@kernel.dk>
> > > Date:   Tue Jan 6 05:48:07 2026 -0700
> > >
> > >     Merge branch 'for-7.0/blk-pvec' into for-next
> >
> > should have impacted that. So please do bisect.
>
> Hi Jens
> The issue seems was introduced from below commit.
> and the issue cannot be reproduced after reverting this commit.

The issue can still be reproduced on the latest linux-block/for-next.

>
> 3c7d76d6128a io_uring: IOPOLL polling improvements
>
> >
> > --
> > Jens Axboe
> >
>
>
> --
> Best Regards,
>   Yi Zhang



-- 
Best Regards,
  Yi Zhang



* Re: [bug report][bisected] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049
  2026-01-14  5:58     ` [bug report][bisected] " Yi Zhang
@ 2026-01-14  9:40       ` Alexander Atanasov
  2026-01-14 12:43         ` Christoph Hellwig
  2026-01-14 14:11       ` Ming Lei
  1 sibling, 1 reply; 13+ messages in thread
From: Alexander Atanasov @ 2026-01-14  9:40 UTC (permalink / raw)
  To: Yi Zhang, Jens Axboe, fengnanchang
  Cc: linux-block, Ming Lei, Shinichiro Kawasaki

Hello Yi,

On 14.01.26 7:58, Yi Zhang wrote:
> On Thu, Jan 8, 2026 at 2:39 PM Yi Zhang <yi.zhang@redhat.com> wrote:
>>
>> On Thu, Jan 8, 2026 at 12:48 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>> On 1/7/26 9:39 AM, Yi Zhang wrote:
>>>> Hi
>>>> The following issue[2] was triggered by blktests nvme/059 and it's
>>>
>>> nvme/049 presumably?
>>>
>> Yes.
>>
>>>> 100% reproduced with commit[1]. Please help check it and let me know
>>>> if you need any info/test for it.
>>>> Seems it's one regression, I will try to test with the latest
>>>> linux-block/for-next and also bisect it tomorrow.
>>>
>>> Doesn't reproduce for me on the current tree, but nothing since:
>>>
>>>> commit 5ee81d4ae52ec4e9206efb4c1b06e269407aba11
>>>> Merge: 29cefd61e0c6 fcf463b92a08
>>>> Author: Jens Axboe <axboe@kernel.dk>
>>>> Date:   Tue Jan 6 05:48:07 2026 -0700
>>>>
>>>>      Merge branch 'for-7.0/blk-pvec' into for-next
>>>
>>> should have impacted that. So please do bisect.
>>
>> Hi Jens
>> The issue seems was introduced from below commit.
>> and the issue cannot be reproduced after reverting this commit.
> 
> The issue still can be reproduced on the latest linux-block/for-next
> 
>>
>> 3c7d76d6128a io_uring: IOPOLL polling improvements


Doubly linked lists require init, singly linked lists do not (including
io_wq_work_list). iopoll_node is never list_init-ed, so init it before
adding.

Can you check if this fixes it for you? If yes, I will submit it as a
proper patch - I have no way to test it at the moment.

-- 
have fun,
alex

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cac292d103f1..fba0ae0cbf7b 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1679,6 +1679,7 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
                         ctx->poll_multi_queue = true;
         }

+       list_init(&&req->iopoll_node);
         list_add_tail(&req->iopoll_node, &ctx->iopoll_list);

         if (unlikely(needs_lock)) {




* Re: [bug report][bisected] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049
  2026-01-14  9:40       ` Alexander Atanasov
@ 2026-01-14 12:43         ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2026-01-14 12:43 UTC (permalink / raw)
  To: alex+zkern
  Cc: Yi Zhang, Jens Axboe, fengnanchang, linux-block, Ming Lei,
	Shinichiro Kawasaki

On Wed, Jan 14, 2026 at 11:40:41AM +0200, Alexander Atanasov wrote:
> Doubly linked lists require init, singly linked lists do not (including
> io_wq_work_list). iopoll_node is never list_init-ed, so init it before
> adding.
> 
> Can you check if this fixes it for you? If yes, I will submit it as a
> proper patch - I have no way to test it at the moment.

The heads (anchors) of lists need initialization.  The entries added
to the list do not.  I know this is a bit confusing because they use
the same type, but besides not compiling due to the double-&, the
patch would not do anything even in a version that did compile.
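
To make the distinction concrete, here is a minimal userspace sketch
of the same pattern - a stripped-down re-implementation of the
<linux/list.h> primitives for illustration, not the real kernel code.
The anchor must be initialized because list_add_tail() dereferences
its ->prev, while the entry's pointers are only ever written by the
insertion:

#include <stddef.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head->prev;	/* garbage if head is uninitialized */

	prev->next = new;
	new->prev = prev;
	new->next = head;
	head->prev = new;	/* only writes to the entry, never reads it */
}

struct req { int id; struct list_head iopoll_node; };

int main(void)
{
	struct list_head iopoll_list;
	struct req r = { .id = 49 };	/* r.iopoll_node deliberately uninitialized */

	INIT_LIST_HEAD(&iopoll_list);	/* required for the anchor */
	list_add_tail(&r.iopoll_node, &iopoll_list);	/* fine for the entry */

	printf("entry id via anchor: %d\n",
	       ((struct req *)((char *)iopoll_list.next -
				offsetof(struct req, iopoll_node)))->id);
	return 0;
}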



* Re: [bug report][bisected] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049
  2026-01-14  5:58     ` [bug report][bisected] " Yi Zhang
  2026-01-14  9:40       ` Alexander Atanasov
@ 2026-01-14 14:11       ` Ming Lei
  2026-01-14 14:43         ` Jens Axboe
  2026-01-16 11:54         ` Alexander Atanasov
  1 sibling, 2 replies; 13+ messages in thread
From: Ming Lei @ 2026-01-14 14:11 UTC (permalink / raw)
  To: Yi Zhang; +Cc: Jens Axboe, fengnanchang, linux-block, Shinichiro Kawasaki

On Wed, Jan 14, 2026 at 01:58:03PM +0800, Yi Zhang wrote:
> On Thu, Jan 8, 2026 at 2:39 PM Yi Zhang <yi.zhang@redhat.com> wrote:
> >
> > On Thu, Jan 8, 2026 at 12:48 AM Jens Axboe <axboe@kernel.dk> wrote:
> > >
> > > On 1/7/26 9:39 AM, Yi Zhang wrote:
> > > > Hi
> > > > The following issue[2] was triggered by blktests nvme/059 and it's
> > >
> > > nvme/049 presumably?
> > >
> > Yes.
> >
> > > > 100% reproduced with commit[1]. Please help check it and let me know
> > > > if you need any info/test for it.
> > > > Seems it's one regression, I will try to test with the latest
> > > > linux-block/for-next and also bisect it tomorrow.
> > >
> > > Doesn't reproduce for me on the current tree, but nothing since:
> > >
> > > > commit 5ee81d4ae52ec4e9206efb4c1b06e269407aba11
> > > > Merge: 29cefd61e0c6 fcf463b92a08
> > > > Author: Jens Axboe <axboe@kernel.dk>
> > > > Date:   Tue Jan 6 05:48:07 2026 -0700
> > > >
> > > >     Merge branch 'for-7.0/blk-pvec' into for-next
> > >
> > > should have impacted that. So please do bisect.
> >
> > Hi Jens
> > The issue seems was introduced from below commit.
> > and the issue cannot be reproduced after reverting this commit.
> 
> The issue still can be reproduced on the latest linux-block/for-next

Hi Yi,

Can you try the following patch?


diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index a9c097dacad6..7b0e62b8322b 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -425,14 +425,23 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
 	pdu->result = le64_to_cpu(nvme_req(req)->result.u64);
 
 	/*
-	 * IOPOLL could potentially complete this request directly, but
-	 * if multiple rings are polling on the same queue, then it's possible
-	 * for one ring to find completions for another ring. Punting the
-	 * completion via task_work will always direct it to the right
-	 * location, rather than potentially complete requests for ringA
-	 * under iopoll invocations from ringB.
+	 * For IOPOLL, complete the request inline. The request's io_kiocb
+	 * uses a union for io_task_work and iopoll_node, so scheduling
+	 * task_work would corrupt the iopoll_list while the request is
+	 * still on it. io_uring_cmd_done() handles IOPOLL by setting
+	 * iopoll_completed rather than scheduling task_work.
+	 *
+	 * For non-IOPOLL, complete via task_work to ensure we run in the
+	 * submitter's context and handling multiple rings is safe.
 	 */
-	io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
+	if (blk_rq_is_poll(req)) {
+		if (pdu->bio)
+			blk_rq_unmap_user(pdu->bio);
+		io_uring_cmd_done32(ioucmd, pdu->status, pdu->result, 0);
+	} else {
+		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
+	}
+
 	return RQ_END_IO_FREE;
 }
 


Thanks,
Ming



* Re: [bug report][bisected] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049
  2026-01-14 14:11       ` Ming Lei
@ 2026-01-14 14:43         ` Jens Axboe
  2026-01-14 14:58           ` Jens Axboe
  2026-01-16 11:54         ` Alexander Atanasov
  1 sibling, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2026-01-14 14:43 UTC (permalink / raw)
  To: Ming Lei, Yi Zhang; +Cc: fengnanchang, linux-block, Shinichiro Kawasaki

On 1/14/26 7:11 AM, Ming Lei wrote:
> On Wed, Jan 14, 2026 at 01:58:03PM +0800, Yi Zhang wrote:
>> On Thu, Jan 8, 2026 at 2:39 PM Yi Zhang <yi.zhang@redhat.com> wrote:
>>>
>>> On Thu, Jan 8, 2026 at 12:48 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> On 1/7/26 9:39 AM, Yi Zhang wrote:
>>>>> Hi
>>>>> The following issue[2] was triggered by blktests nvme/059 and it's
>>>>
>>>> nvme/049 presumably?
>>>>
>>> Yes.
>>>
>>>>> 100% reproduced with commit[1]. Please help check it and let me know
>>>>> if you need any info/test for it.
>>>>> Seems it's one regression, I will try to test with the latest
>>>>> linux-block/for-next and also bisect it tomorrow.
>>>>
>>>> Doesn't reproduce for me on the current tree, but nothing since:
>>>>
>>>>> commit 5ee81d4ae52ec4e9206efb4c1b06e269407aba11
>>>>> Merge: 29cefd61e0c6 fcf463b92a08
>>>>> Author: Jens Axboe <axboe@kernel.dk>
>>>>> Date:   Tue Jan 6 05:48:07 2026 -0700
>>>>>
>>>>>     Merge branch 'for-7.0/blk-pvec' into for-next
>>>>
>>>> should have impacted that. So please do bisect.
>>>
>>> Hi Jens
>>> The issue seems was introduced from below commit.
>>> and the issue cannot be reproduced after reverting this commit.
>>
>> The issue still can be reproduced on the latest linux-block/for-next
> 
> Hi Yi,
> 
> Can you try the following patch?
> 
> 
> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> index a9c097dacad6..7b0e62b8322b 100644
> --- a/drivers/nvme/host/ioctl.c
> +++ b/drivers/nvme/host/ioctl.c
> @@ -425,14 +425,23 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
>  	pdu->result = le64_to_cpu(nvme_req(req)->result.u64);
>  
>  	/*
> -	 * IOPOLL could potentially complete this request directly, but
> -	 * if multiple rings are polling on the same queue, then it's possible
> -	 * for one ring to find completions for another ring. Punting the
> -	 * completion via task_work will always direct it to the right
> -	 * location, rather than potentially complete requests for ringA
> -	 * under iopoll invocations from ringB.
> +	 * For IOPOLL, complete the request inline. The request's io_kiocb
> +	 * uses a union for io_task_work and iopoll_node, so scheduling
> +	 * task_work would corrupt the iopoll_list while the request is
> +	 * still on it. io_uring_cmd_done() handles IOPOLL by setting
> +	 * iopoll_completed rather than scheduling task_work.
> +	 *
> +	 * For non-IOPOLL, complete via task_work to ensure we run in the
> +	 * submitter's context and handling multiple rings is safe.
>  	 */
> -	io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
> +	if (blk_rq_is_poll(req)) {
> +		if (pdu->bio)
> +			blk_rq_unmap_user(pdu->bio);
> +		io_uring_cmd_done32(ioucmd, pdu->status, pdu->result, 0);
> +	} else {
> +		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
> +	}
> +
>  	return RQ_END_IO_FREE;
>  }
>  

Ah yes that should fix it, the task_work addition will conflict with
the list addition. Don't think it's safe though, which is why I made
them all use task_work previously. Let me fix it in the IOPOLL patch
instead.
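
To see that conflict in isolation (stand-in types below, not the real
struct io_kiocb or io_task_work layouts), link a request through the
list member of the union and then write the overlapping task_work
member; the result is exactly the prev->next corruption that
__list_add_valid_or_report() reported above:

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

struct io_task_work {			/* stand-in, layout only */
	void *llist_node;
	void (*func)(void *);
};

struct req {
	union {				/* mirrors the pre-fix io_kiocb union */
		struct io_task_work io_task_work;
		struct list_head iopoll_node;
	};
};

int main(void)
{
	struct list_head iopoll_list = { &iopoll_list, &iopoll_list };
	struct req r;

	/* io_iopoll_req_issued(): the request goes on ctx->iopoll_list */
	r.iopoll_node.prev = iopoll_list.prev;
	r.iopoll_node.next = &iopoll_list;
	iopoll_list.prev->next = &r.iopoll_node;
	iopoll_list.prev = &r.iopoll_node;

	/* the end_io handler queues task_work on the same storage,
	 * trampling the list pointers while the request is still linked */
	r.io_task_work.llist_node = NULL;
	r.io_task_work.func = NULL;

	/* what CONFIG_DEBUG_LIST checks on the next list_add() */
	if (iopoll_list.prev->next != &iopoll_list)
		printf("list_add corruption. prev->next should be %p, but was %p.\n",
		       (void *)&iopoll_list, (void *)iopoll_list.prev->next);
	return 0;
}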

-- 
Jens Axboe



* Re: [bug report][bisected] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049
  2026-01-14 14:43         ` Jens Axboe
@ 2026-01-14 14:58           ` Jens Axboe
  2026-01-14 15:20             ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2026-01-14 14:58 UTC (permalink / raw)
  To: Ming Lei, Yi Zhang; +Cc: fengnanchang, linux-block, Shinichiro Kawasaki

On 1/14/26 7:43 AM, Jens Axboe wrote:
> On 1/14/26 7:11 AM, Ming Lei wrote:
>> On Wed, Jan 14, 2026 at 01:58:03PM +0800, Yi Zhang wrote:
> >>> On Thu, Jan 8, 2026 at 2:39 PM Yi Zhang <yi.zhang@redhat.com> wrote:
>>>>
> >>>> On Thu, Jan 8, 2026 at 12:48 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>
>>>>> On 1/7/26 9:39 AM, Yi Zhang wrote:
>>>>>> Hi
>>>>>> The following issue[2] was triggered by blktests nvme/059 and it's
>>>>>
>>>>> nvme/049 presumably?
>>>>>
>>>> Yes.
>>>>
>>>>>> 100% reproduced with commit[1]. Please help check it and let me know
>>>>>> if you need any info/test for it.
>>>>>> Seems it's one regression, I will try to test with the latest
>>>>>> linux-block/for-next and also bisect it tomorrow.
>>>>>
>>>>> Doesn't reproduce for me on the current tree, but nothing since:
>>>>>
>>>>>> commit 5ee81d4ae52ec4e9206efb4c1b06e269407aba11
>>>>>> Merge: 29cefd61e0c6 fcf463b92a08
>>>>>> Author: Jens Axboe <axboe@kernel.dk>
>>>>>> Date:   Tue Jan 6 05:48:07 2026 -0700
>>>>>>
>>>>>>     Merge branch 'for-7.0/blk-pvec' into for-next
>>>>>
>>>>> should have impacted that. So please do bisect.
>>>>
>>>> Hi Jens
>>>> The issue seems was introduced from below commit.
>>>> and the issue cannot be reproduced after reverting this commit.
>>>
>>> The issue still can be reproduced on the latest linux-block/for-next
>>
>> Hi Yi,
>>
>> Can you try the following patch?
>>
>>
>> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
>> index a9c097dacad6..7b0e62b8322b 100644
>> --- a/drivers/nvme/host/ioctl.c
>> +++ b/drivers/nvme/host/ioctl.c
>> @@ -425,14 +425,23 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
>>  	pdu->result = le64_to_cpu(nvme_req(req)->result.u64);
>>  
>>  	/*
>> -	 * IOPOLL could potentially complete this request directly, but
>> -	 * if multiple rings are polling on the same queue, then it's possible
>> -	 * for one ring to find completions for another ring. Punting the
>> -	 * completion via task_work will always direct it to the right
>> -	 * location, rather than potentially complete requests for ringA
>> -	 * under iopoll invocations from ringB.
>> +	 * For IOPOLL, complete the request inline. The request's io_kiocb
>> +	 * uses a union for io_task_work and iopoll_node, so scheduling
>> +	 * task_work would corrupt the iopoll_list while the request is
>> +	 * still on it. io_uring_cmd_done() handles IOPOLL by setting
>> +	 * iopoll_completed rather than scheduling task_work.
>> +	 *
>> +	 * For non-IOPOLL, complete via task_work to ensure we run in the
>> +	 * submitter's context and handling multiple rings is safe.
>>  	 */
>> -	io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
>> +	if (blk_rq_is_poll(req)) {
>> +		if (pdu->bio)
>> +			blk_rq_unmap_user(pdu->bio);
>> +		io_uring_cmd_done32(ioucmd, pdu->status, pdu->result, 0);
>> +	} else {
>> +		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
>> +	}
>> +
>>  	return RQ_END_IO_FREE;
>>  }
>>  
> 
> Ah yes that should fix it, the task_work addition will conflict with
> the list addition. Don't think it's safe though, which is why I made
> them all use task_work previously. Let me fix it in the IOPOLL patch
> instead.

This should be better:

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index dd084a55bed8..1fa8d829cbac 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -719,13 +719,10 @@ struct io_kiocb {
 	atomic_t			refs;
 	bool				cancel_seq_set;
 
-	/*
-	 * IOPOLL doesn't use task_work, so use the ->iopoll_node list
-	 * entry to manage pending iopoll requests.
-	 */
 	union {
 		struct io_task_work	io_task_work;
-		struct list_head	iopoll_node;
+		/* For IOPOLL setup queues, with hybrid polling */
+		u64                     iopoll_start;
 	};
 
 	union {
@@ -734,8 +731,8 @@ struct io_kiocb {
 		 * poll
 		 */
 		struct hlist_node	hash_node;
-		/* For IOPOLL setup queues, with hybrid polling */
-		u64                     iopoll_start;
+		/* IOPOLL completion handling */
+		struct list_head	iopoll_node;
 		/* for private io_kiocb freeing */
 		struct rcu_head		rcu_head;
 	};

-- 
Jens Axboe


* Re: [bug report][bisected] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049
  2026-01-14 14:58           ` Jens Axboe
@ 2026-01-14 15:20             ` Ming Lei
  2026-01-14 15:26               ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2026-01-14 15:20 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Yi Zhang, fengnanchang, linux-block, Shinichiro Kawasaki

On Wed, Jan 14, 2026 at 07:58:54AM -0700, Jens Axboe wrote:
> On 1/14/26 7:43 AM, Jens Axboe wrote:
> > On 1/14/26 7:11 AM, Ming Lei wrote:
> >> On Wed, Jan 14, 2026 at 01:58:03PM +0800, Yi Zhang wrote:
> > >>> On Thu, Jan 8, 2026 at 2:39 PM Yi Zhang <yi.zhang@redhat.com> wrote:
> >>>>
> > >>>> On Thu, Jan 8, 2026 at 12:48 AM Jens Axboe <axboe@kernel.dk> wrote:
> >>>>>
> >>>>> On 1/7/26 9:39 AM, Yi Zhang wrote:
> >>>>>> Hi
> >>>>>> The following issue[2] was triggered by blktests nvme/059 and it's
> >>>>>
> >>>>> nvme/049 presumably?
> >>>>>
> >>>> Yes.
> >>>>
> >>>>>> 100% reproduced with commit[1]. Please help check it and let me know
> >>>>>> if you need any info/test for it.
> >>>>>> Seems it's one regression, I will try to test with the latest
> >>>>>> linux-block/for-next and also bisect it tomorrow.
> >>>>>
> >>>>> Doesn't reproduce for me on the current tree, but nothing since:
> >>>>>
> >>>>>> commit 5ee81d4ae52ec4e9206efb4c1b06e269407aba11
> >>>>>> Merge: 29cefd61e0c6 fcf463b92a08
> >>>>>> Author: Jens Axboe <axboe@kernel.dk>
> >>>>>> Date:   Tue Jan 6 05:48:07 2026 -0700
> >>>>>>
> >>>>>>     Merge branch 'for-7.0/blk-pvec' into for-next
> >>>>>
> >>>>> should have impacted that. So please do bisect.
> >>>>
> >>>> Hi Jens
> >>>> The issue seems was introduced from below commit.
> >>>> and the issue cannot be reproduced after reverting this commit.
> >>>
> >>> The issue still can be reproduced on the latest linux-block/for-next
> >>
> >> Hi Yi,
> >>
> >> Can you try the following patch?
> >>
> >>
> >> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> >> index a9c097dacad6..7b0e62b8322b 100644
> >> --- a/drivers/nvme/host/ioctl.c
> >> +++ b/drivers/nvme/host/ioctl.c
> >> @@ -425,14 +425,23 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
> >>  	pdu->result = le64_to_cpu(nvme_req(req)->result.u64);
> >>  
> >>  	/*
> >> -	 * IOPOLL could potentially complete this request directly, but
> >> -	 * if multiple rings are polling on the same queue, then it's possible
> >> -	 * for one ring to find completions for another ring. Punting the
> >> -	 * completion via task_work will always direct it to the right
> >> -	 * location, rather than potentially complete requests for ringA
> >> -	 * under iopoll invocations from ringB.
> >> +	 * For IOPOLL, complete the request inline. The request's io_kiocb
> >> +	 * uses a union for io_task_work and iopoll_node, so scheduling
> >> +	 * task_work would corrupt the iopoll_list while the request is
> >> +	 * still on it. io_uring_cmd_done() handles IOPOLL by setting
> >> +	 * iopoll_completed rather than scheduling task_work.
> >> +	 *
> >> +	 * For non-IOPOLL, complete via task_work to ensure we run in the
> >> +	 * submitter's context and handling multiple rings is safe.
> >>  	 */
> >> -	io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
> >> +	if (blk_rq_is_poll(req)) {
> >> +		if (pdu->bio)
> >> +			blk_rq_unmap_user(pdu->bio);
> >> +		io_uring_cmd_done32(ioucmd, pdu->status, pdu->result, 0);
> >> +	} else {
> >> +		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
> >> +	}
> >> +
> >>  	return RQ_END_IO_FREE;
> >>  }
> >>  
> > 
> > Ah yes that should fix it, the task_work addition will conflict with
> > the list addition. Don't think it's safe though, which is why I made
> > them all use task_work previously. Let me fix it in the IOPOLL patch
> > instead.
> 
> This should be better:
> 
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index dd084a55bed8..1fa8d829cbac 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -719,13 +719,10 @@ struct io_kiocb {
>  	atomic_t			refs;
>  	bool				cancel_seq_set;
>  
> -	/*
> -	 * IOPOLL doesn't use task_work, so use the ->iopoll_node list
> -	 * entry to manage pending iopoll requests.
> -	 */
>  	union {
>  		struct io_task_work	io_task_work;
> -		struct list_head	iopoll_node;
> +		/* For IOPOLL setup queues, with hybrid polling */
> +		u64                     iopoll_start;
>  	};
>  
>  	union {
> @@ -734,8 +731,8 @@ struct io_kiocb {
>  		 * poll
>  		 */
>  		struct hlist_node	hash_node;
> -		/* For IOPOLL setup queues, with hybrid polling */
> -		u64                     iopoll_start;
> +		/* IOPOLL completion handling */
> +		struct list_head	iopoll_node;
>  		/* for private io_kiocb freeing */
>  		struct rcu_head		rcu_head;
>  	};

This way looks better; just note that `req->iopoll_start` needs to be
read into a local variable first in io_uring_hybrid_poll().
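
A minimal userspace sketch of that hazard (stand-in types again, not
the real io_uring_hybrid_poll() from io_uring/rw.c): with the
rearranged layout, iopoll_start shares storage with io_task_work, so
the timestamp has to be captured before anything that can complete the
request and queue task_work:

#include <stdio.h>

struct io_task_work {			/* stand-in, layout only */
	void *llist_node;
	void (*func)(void *);
};

struct req {
	union {				/* as in the rearranged io_kiocb */
		struct io_task_work io_task_work;
		unsigned long long iopoll_start;	/* hybrid-poll start time */
	};
};

int main(void)
{
	struct req r = { .iopoll_start = 123456789ULL };

	/* read the timestamp into a local before polling ... */
	unsigned long long start = r.iopoll_start;

	/* ... because completing the request may queue task_work on the
	 * same storage and clobber it */
	r.io_task_work.llist_node = NULL;
	r.io_task_work.func = NULL;

	printf("runtime base from local copy: %llu\n", start);
	printf("req->iopoll_start afterwards: %llu (clobbered)\n",
	       r.iopoll_start);
	return 0;
}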


Thanks,
Ming



* Re: [bug report][bisected] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049
  2026-01-14 15:20             ` Ming Lei
@ 2026-01-14 15:26               ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2026-01-14 15:26 UTC (permalink / raw)
  To: Ming Lei; +Cc: Yi Zhang, fengnanchang, linux-block, Shinichiro Kawasaki

On 1/14/26 8:20 AM, Ming Lei wrote:
> On Wed, Jan 14, 2026 at 07:58:54AM -0700, Jens Axboe wrote:
>> On 1/14/26 7:43 AM, Jens Axboe wrote:
>>> On 1/14/26 7:11 AM, Ming Lei wrote:
>>>> On Wed, Jan 14, 2026 at 01:58:03PM +0800, Yi Zhang wrote:
> >>>>> On Thu, Jan 8, 2026 at 2:39 PM Yi Zhang <yi.zhang@redhat.com> wrote:
>>>>>>
> >>>>>> On Thu, Jan 8, 2026 at 12:48 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>
>>>>>>> On 1/7/26 9:39 AM, Yi Zhang wrote:
>>>>>>>> Hi
>>>>>>>> The following issue[2] was triggered by blktests nvme/059 and it's
>>>>>>>
>>>>>>> nvme/049 presumably?
>>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>>> 100% reproduced with commit[1]. Please help check it and let me know
>>>>>>>> if you need any info/test for it.
>>>>>>>> Seems it's one regression, I will try to test with the latest
>>>>>>>> linux-block/for-next and also bisect it tomorrow.
>>>>>>>
>>>>>>> Doesn't reproduce for me on the current tree, but nothing since:
>>>>>>>
>>>>>>>> commit 5ee81d4ae52ec4e9206efb4c1b06e269407aba11
>>>>>>>> Merge: 29cefd61e0c6 fcf463b92a08
>>>>>>>> Author: Jens Axboe <axboe@kernel.dk>
>>>>>>>> Date:   Tue Jan 6 05:48:07 2026 -0700
>>>>>>>>
>>>>>>>>     Merge branch 'for-7.0/blk-pvec' into for-next
>>>>>>>
>>>>>>> should have impacted that. So please do bisect.
>>>>>>
>>>>>> Hi Jens
>>>>>> The issue seems was introduced from below commit.
>>>>>> and the issue cannot be reproduced after reverting this commit.
>>>>>
>>>>> The issue still can be reproduced on the latest linux-block/for-next
>>>>
>>>> Hi Yi,
>>>>
>>>> Can you try the following patch?
>>>>
>>>>
>>>> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
>>>> index a9c097dacad6..7b0e62b8322b 100644
>>>> --- a/drivers/nvme/host/ioctl.c
>>>> +++ b/drivers/nvme/host/ioctl.c
>>>> @@ -425,14 +425,23 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
>>>>  	pdu->result = le64_to_cpu(nvme_req(req)->result.u64);
>>>>  
>>>>  	/*
>>>> -	 * IOPOLL could potentially complete this request directly, but
>>>> -	 * if multiple rings are polling on the same queue, then it's possible
>>>> -	 * for one ring to find completions for another ring. Punting the
>>>> -	 * completion via task_work will always direct it to the right
>>>> -	 * location, rather than potentially complete requests for ringA
>>>> -	 * under iopoll invocations from ringB.
>>>> +	 * For IOPOLL, complete the request inline. The request's io_kiocb
>>>> +	 * uses a union for io_task_work and iopoll_node, so scheduling
>>>> +	 * task_work would corrupt the iopoll_list while the request is
>>>> +	 * still on it. io_uring_cmd_done() handles IOPOLL by setting
>>>> +	 * iopoll_completed rather than scheduling task_work.
>>>> +	 *
>>>> +	 * For non-IOPOLL, complete via task_work to ensure we run in the
>>>> +	 * submitter's context and handling multiple rings is safe.
>>>>  	 */
>>>> -	io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
>>>> +	if (blk_rq_is_poll(req)) {
>>>> +		if (pdu->bio)
>>>> +			blk_rq_unmap_user(pdu->bio);
>>>> +		io_uring_cmd_done32(ioucmd, pdu->status, pdu->result, 0);
>>>> +	} else {
>>>> +		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
>>>> +	}
>>>> +
>>>>  	return RQ_END_IO_FREE;
>>>>  }
>>>>  
>>>
>>> Ah yes that should fix it, the task_work addition will conflict with
>>> the list addition. Don't think it's safe though, which is why I made
>>> them all use task_work previously. Let me fix it in the IOPOLL patch
>>> instead.
>>
>> This should be better:
>>
>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>> index dd084a55bed8..1fa8d829cbac 100644
>> --- a/include/linux/io_uring_types.h
>> +++ b/include/linux/io_uring_types.h
>> @@ -719,13 +719,10 @@ struct io_kiocb {
>>  	atomic_t			refs;
>>  	bool				cancel_seq_set;
>>  
>> -	/*
>> -	 * IOPOLL doesn't use task_work, so use the ->iopoll_node list
>> -	 * entry to manage pending iopoll requests.
>> -	 */
>>  	union {
>>  		struct io_task_work	io_task_work;
>> -		struct list_head	iopoll_node;
>> +		/* For IOPOLL setup queues, with hybrid polling */
>> +		u64                     iopoll_start;
>>  	};
>>  
>>  	union {
>> @@ -734,8 +731,8 @@ struct io_kiocb {
>>  		 * poll
>>  		 */
>>  		struct hlist_node	hash_node;
>> -		/* For IOPOLL setup queues, with hybrid polling */
>> -		u64                     iopoll_start;
>> +		/* IOPOLL completion handling */
>> +		struct list_head	iopoll_node;
>>  		/* for private io_kiocb freeing */
>>  		struct rcu_head		rcu_head;
>>  	};
> 
> This way looks better; just note that `req->iopoll_start` needs to be
> read into a local variable first in io_uring_hybrid_poll().

True, let me send out a v2.

-- 
Jens Axboe



* Re: [bug report][bisected] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049
  2026-01-14 14:11       ` Ming Lei
  2026-01-14 14:43         ` Jens Axboe
@ 2026-01-16 11:54         ` Alexander Atanasov
  2026-01-16 12:41           ` Ming Lei
  1 sibling, 1 reply; 13+ messages in thread
From: Alexander Atanasov @ 2026-01-16 11:54 UTC (permalink / raw)
  To: Ming Lei, Yi Zhang
  Cc: Jens Axboe, fengnanchang, linux-block, Shinichiro Kawasaki

Hello Ming,

On 14.01.26 16:11, Ming Lei wrote:
> On Wed, Jan 14, 2026 at 01:58:03PM +0800, Yi Zhang wrote:
>> On Thu, Jan 8, 2026 at 2:39 PM Yi Zhang <yi.zhang@redhat.com> wrote:
>>>
>>> On Thu, Jan 8, 2026 at 12:48 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> On 1/7/26 9:39 AM, Yi Zhang wrote:
>>>>> Hi
>>>>> The following issue[2] was triggered by blktests nvme/059 and it's
>>>>
>>>> nvme/049 presumably?
>>>>
>>> Yes.
>>>
>>>>> 100% reproduced with commit[1]. Please help check it and let me know
>>>>> if you need any info/test for it.
>>>>> Seems it's one regression, I will try to test with the latest
>>>>> linux-block/for-next and also bisect it tomorrow.
>>>>
>>>> Doesn't reproduce for me on the current tree, but nothing since:
>>>>
>>>>> commit 5ee81d4ae52ec4e9206efb4c1b06e269407aba11
>>>>> Merge: 29cefd61e0c6 fcf463b92a08
>>>>> Author: Jens Axboe <axboe@kernel.dk>
>>>>> Date:   Tue Jan 6 05:48:07 2026 -0700
>>>>>
>>>>>      Merge branch 'for-7.0/blk-pvec' into for-next
>>>>
>>>> should have impacted that. So please do bisect.
>>>
>>> Hi Jens
>>> The issue seems was introduced from below commit.
>>> and the issue cannot be reproduced after reverting this commit.
>>
>> The issue still can be reproduced on the latest linux-block/for-next
> 
> Hi Yi,
> 
> Can you try the following patch?
> 
> 
> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> index a9c097dacad6..7b0e62b8322b 100644
> --- a/drivers/nvme/host/ioctl.c
> +++ b/drivers/nvme/host/ioctl.c
> @@ -425,14 +425,23 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
>   	pdu->result = le64_to_cpu(nvme_req(req)->result.u64);
>   
>   	/*
> -	 * IOPOLL could potentially complete this request directly, but
> -	 * if multiple rings are polling on the same queue, then it's possible
> -	 * for one ring to find completions for another ring. Punting the
> -	 * completion via task_work will always direct it to the right
> -	 * location, rather than potentially complete requests for ringA
> -	 * under iopoll invocations from ringB.
> +	 * For IOPOLL, complete the request inline. The request's io_kiocb
> +	 * uses a union for io_task_work and iopoll_node, so scheduling
> +	 * task_work would corrupt the iopoll_list while the request is
> +	 * still on it. io_uring_cmd_done() handles IOPOLL by setting
> +	 * iopoll_completed rather than scheduling task_work.
> +	 *
> +	 * For non-IOPOLL, complete via task_work to ensure we run in the
> +	 * submitter's context and handling multiple rings is safe.
>   	 */
> -	io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
> +	if (blk_rq_is_poll(req)) {
> +		if (pdu->bio)
> +			blk_rq_unmap_user(pdu->bio);
> +		io_uring_cmd_done32(ioucmd, pdu->status, pdu->result, 0);
> +	} else {
> +		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
> +	}
> +
>   	return RQ_END_IO_FREE;
>   }


While this is a good optimisation and it will fix the list issue for a
single user, it may crash with multiple users of the context. I am
still learning this code, so excuse my ignorance here and there.

The bisected patch 3c7d76d6128a changed io_wq_work_list, which looks
safe to use without locks (it is a derivative of llist), whereas
list_head requires proper locking to be safe.

A ctx can be used to poll multiple files; iopoll_list is a list for
that reason. sqpoll calls io_iopoll_req_issued() without a lock, which
does list_add_tail(); if that races with another list addition or
deletion, it will corrupt the list.

Is there any mechanism to prevent that, or am I missing something?



-- 
have fun,
alex



* Re: [bug report][bisected] kernel BUG at lib/list_debug.c:32! triggered by blktests nvme/049
  2026-01-16 11:54         ` Alexander Atanasov
@ 2026-01-16 12:41           ` Ming Lei
  0 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2026-01-16 12:41 UTC (permalink / raw)
  To: alex+zkern
  Cc: Yi Zhang, Jens Axboe, fengnanchang, linux-block,
	Shinichiro Kawasaki

On Fri, Jan 16, 2026 at 01:54:15PM +0200, Alexander Atanasov wrote:
> Hello Ming,
> 
> On 14.01.26 16:11, Ming Lei wrote:
> > On Wed, Jan 14, 2026 at 01:58:03PM +0800, Yi Zhang wrote:
> > > On Thu, Jan 8, 2026 at 2:39 PM Yi Zhang <yi.zhang@redhat.com> wrote:
> > > > 
> > > > On Thu, Jan 8, 2026 at 12:48 AM Jens Axboe <axboe@kernel.dk> wrote:
> > > > > 
> > > > > On 1/7/26 9:39 AM, Yi Zhang wrote:
> > > > > > Hi
> > > > > > The following issue[2] was triggered by blktests nvme/059 and it's
> > > > > 
> > > > > nvme/049 presumably?
> > > > > 
> > > > Yes.
> > > > 
> > > > > > 100% reproduced with commit[1]. Please help check it and let me know
> > > > > > if you need any info/test for it.
> > > > > > Seems it's one regression, I will try to test with the latest
> > > > > > linux-block/for-next and also bisect it tomorrow.
> > > > > 
> > > > > Doesn't reproduce for me on the current tree, but nothing since:
> > > > > 
> > > > > > commit 5ee81d4ae52ec4e9206efb4c1b06e269407aba11
> > > > > > Merge: 29cefd61e0c6 fcf463b92a08
> > > > > > Author: Jens Axboe <axboe@kernel.dk>
> > > > > > Date:   Tue Jan 6 05:48:07 2026 -0700
> > > > > > 
> > > > > >      Merge branch 'for-7.0/blk-pvec' into for-next
> > > > > 
> > > > > should have impacted that. So please do bisect.
> > > > 
> > > > Hi Jens
> > > > The issue seems was introduced from below commit.
> > > > and the issue cannot be reproduced after reverting this commit.
> > > 
> > > The issue still can be reproduced on the latest linux-block/for-next
> > 
> > Hi Yi,
> > 
> > Can you try the following patch?
> > 
> > 
> > diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> > index a9c097dacad6..7b0e62b8322b 100644
> > --- a/drivers/nvme/host/ioctl.c
> > +++ b/drivers/nvme/host/ioctl.c
> > @@ -425,14 +425,23 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
> >   	pdu->result = le64_to_cpu(nvme_req(req)->result.u64);
> >   	/*
> > -	 * IOPOLL could potentially complete this request directly, but
> > -	 * if multiple rings are polling on the same queue, then it's possible
> > -	 * for one ring to find completions for another ring. Punting the
> > -	 * completion via task_work will always direct it to the right
> > -	 * location, rather than potentially complete requests for ringA
> > -	 * under iopoll invocations from ringB.
> > +	 * For IOPOLL, complete the request inline. The request's io_kiocb
> > +	 * uses a union for io_task_work and iopoll_node, so scheduling
> > +	 * task_work would corrupt the iopoll_list while the request is
> > +	 * still on it. io_uring_cmd_done() handles IOPOLL by setting
> > +	 * iopoll_completed rather than scheduling task_work.
> > +	 *
> > +	 * For non-IOPOLL, complete via task_work to ensure we run in the
> > +	 * submitter's context and handling multiple rings is safe.
> >   	 */
> > -	io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
> > +	if (blk_rq_is_poll(req)) {
> > +		if (pdu->bio)
> > +			blk_rq_unmap_user(pdu->bio);
> > +		io_uring_cmd_done32(ioucmd, pdu->status, pdu->result, 0);
> > +	} else {
> > +		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
> > +	}
> > +
> >   	return RQ_END_IO_FREE;
> >   }
> 
> 
> While this is a good optimisation and it will fix the list issue for a
> single user, it may crash with multiple users of the context. I am
> still learning this code, so excuse my ignorance here and there.

Jens has sent the following fix already:

https://lore.kernel.org/io-uring/aWhGEMsaOf752f5z@fedora/T/#t

> 
> The bisected patch 3c7d76d6128a changed io_wq_work_list, which looks
> safe to use without locks (it is a derivative of llist), whereas
> list_head requires proper locking to be safe.
> 
> A ctx can be used to poll multiple files; iopoll_list is a list for
> that reason. sqpoll calls io_iopoll_req_issued() without a lock, which
> does list_add_tail(); if that races with another list addition or
> deletion, it will corrupt the list.
> 
> Is there any mechanism to prevent that, or am I missing something?

io_iopoll_req_issued() will grab ctx->uring_lock if it isn't held.
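
Roughly this shape, as a runnable userspace sketch - a pthread mutex
standing in for ctx->uring_lock; IO_URING_F_UNLOCKED is the real flag
name, but its value and everything else here are illustrative:

#include <pthread.h>
#include <stdio.h>

#define IO_URING_F_UNLOCKED	(1u << 1)	/* illustrative value */

static pthread_mutex_t uring_lock = PTHREAD_MUTEX_INITIALIZER;
static int iopoll_list_len;		/* stand-in for ctx->iopoll_list */

/* Mirrors io_iopoll_req_issued(): callers already holding uring_lock
 * leave IO_URING_F_UNLOCKED clear; everyone else (e.g. an SQPOLL
 * thread submitting without the lock) has the lock taken here. */
static void io_iopoll_req_issued(unsigned int issue_flags)
{
	unsigned int needs_lock = issue_flags & IO_URING_F_UNLOCKED;

	if (needs_lock)
		pthread_mutex_lock(&uring_lock);

	iopoll_list_len++;		/* the list_add_tail() in the real code */

	if (needs_lock)
		pthread_mutex_unlock(&uring_lock);
}

int main(void)
{
	io_iopoll_req_issued(IO_URING_F_UNLOCKED);	/* unlocked caller */

	pthread_mutex_lock(&uring_lock);	/* caller already holds the lock */
	io_iopoll_req_issued(0);
	pthread_mutex_unlock(&uring_lock);

	printf("requests on iopoll_list: %d\n", iopoll_list_len);
	return 0;
}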


Thanks,
Ming


