* [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited
@ 2026-01-06 15:56 Mikulas Patocka
2026-01-06 16:29 ` Uladzislau Rezki
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Mikulas Patocka @ 2026-01-06 15:56 UTC (permalink / raw)
To: Fengnan Chang, Yu Kuai, Fengnan Chang, Jens Axboe,
Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki
Cc: rcu, linux-block
On the kernel 6.19-rc, I am experiencing 15-second boot stall in a
virtual machine when probing a virtio-scsi disk:
[ 1.011641] SCSI subsystem initialized
[ 1.013972] virtio_scsi virtio6: 16/0/0 default/read/poll queues
[ 1.015983] scsi host0: Virtio SCSI HBA
[ 1.019578] ACPI: \_SB_.GSIA: Enabled at IRQ 16
[ 1.020225] ahci 0000:00:1f.2: AHCI vers 0001.0000, 32 command slots, 1.5 Gbps, SATA mode
[ 1.020228] ahci 0000:00:1f.2: 6/6 ports implemented (port mask 0x3f)
[ 1.020230] ahci 0000:00:1f.2: flags: 64bit ncq only
[ 1.024688] scsi host1: ahci
[ 1.025432] scsi host2: ahci
[ 1.025966] scsi host3: ahci
[ 1.026511] scsi host4: ahci
[ 1.028371] scsi host5: ahci
[ 1.028918] scsi host6: ahci
[ 1.029266] ata1: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23100 irq 16 lpm-pol 1
[ 1.029305] ata2: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23180 irq 16 lpm-pol 1
[ 1.029316] ata3: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23200 irq 16 lpm-pol 1
[ 1.029327] ata4: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23280 irq 16 lpm-pol 1
[ 1.029341] ata5: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23300 irq 16 lpm-pol 1
[ 1.029356] ata6: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23380 irq 16 lpm-pol 1
[ 1.118111] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
[ 1.348916] ata1: SATA link down (SStatus 0 SControl 300)
[ 1.350713] ata2: SATA link down (SStatus 0 SControl 300)
[ 1.351025] ata6: SATA link down (SStatus 0 SControl 300)
[ 1.351160] ata5: SATA link down (SStatus 0 SControl 300)
[ 1.351326] ata3: SATA link down (SStatus 0 SControl 300)
[ 1.351536] ata4: SATA link down (SStatus 0 SControl 300)
[ 1.449153] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
[ 16.483477] sd 0:0:0:0: Power-on or device reset occurred
[ 16.483691] sd 0:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 16.483762] sd 0:0:0:0: [sda] Write Protect is off
[ 16.483877] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 16.569225] sd 0:0:0:0: [sda] Attached SCSI disk
I bisected it and it is caused by the commit 89e1fb7ceffd which
introduces calls to synchronize_rcu_expedited.
This commit replaces synchronize_rcu_expedited and kfree with a call to
kfree_rcu_mightsleep, avoiding the 15-second delay.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 89e1fb7ceffd ("blk-mq: fix potential uaf for 'queue_hw_ctx'")
---
block/blk-mq.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
Index: linux-2.6/block/blk-mq.c
===================================================================
--- linux-2.6.orig/block/blk-mq.c 2026-01-06 16:45:11.000000000 +0100
+++ linux-2.6/block/blk-mq.c 2026-01-06 16:48:00.000000000 +0100
@@ -4553,8 +4553,7 @@ static void __blk_mq_realloc_hw_ctxs(str
* Make sure reading the old queue_hw_ctx from other
* context concurrently won't trigger uaf.
*/
- synchronize_rcu_expedited();
- kfree(hctxs);
+ kfree_rcu_mightsleep(hctxs);
hctxs = new_hctxs;
}
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited 2026-01-06 15:56 [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited Mikulas Patocka @ 2026-01-06 16:29 ` Uladzislau Rezki 2026-01-06 16:59 ` Mikulas Patocka 2026-01-07 12:23 ` Uladzislau Rezki ` (2 subsequent siblings) 3 siblings, 1 reply; 10+ messages in thread From: Uladzislau Rezki @ 2026-01-06 16:29 UTC (permalink / raw) To: Mikulas Patocka Cc: Fengnan Chang, Yu Kuai, Fengnan Chang, Jens Axboe, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki, rcu, linux-block On Tue, Jan 06, 2026 at 04:56:07PM +0100, Mikulas Patocka wrote: > On the kernel 6.19-rc, I am experiencing 15-second boot stall in a > virtual machine when probing a virtio-scsi disk: > [ 1.011641] SCSI subsystem initialized > [ 1.013972] virtio_scsi virtio6: 16/0/0 default/read/poll queues > [ 1.015983] scsi host0: Virtio SCSI HBA > [ 1.019578] ACPI: \_SB_.GSIA: Enabled at IRQ 16 > [ 1.020225] ahci 0000:00:1f.2: AHCI vers 0001.0000, 32 command slots, 1.5 Gbps, SATA mode > [ 1.020228] ahci 0000:00:1f.2: 6/6 ports implemented (port mask 0x3f) > [ 1.020230] ahci 0000:00:1f.2: flags: 64bit ncq only > [ 1.024688] scsi host1: ahci > [ 1.025432] scsi host2: ahci > [ 1.025966] scsi host3: ahci > [ 1.026511] scsi host4: ahci > [ 1.028371] scsi host5: ahci > [ 1.028918] scsi host6: ahci > [ 1.029266] ata1: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23100 irq 16 lpm-pol 1 > [ 1.029305] ata2: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23180 irq 16 lpm-pol 1 > [ 1.029316] ata3: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23200 irq 16 lpm-pol 1 > [ 1.029327] ata4: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23280 irq 16 lpm-pol 1 > [ 1.029341] ata5: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23300 irq 16 lpm-pol 1 > [ 1.029356] ata6: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23380 irq 16 lpm-pol 1 > [ 1.118111] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 > [ 1.348916] ata1: SATA link down (SStatus 0 SControl 300) > [ 1.350713] ata2: SATA link down (SStatus 0 SControl 300) > [ 1.351025] ata6: SATA link down (SStatus 0 SControl 300) > [ 1.351160] ata5: SATA link down (SStatus 0 SControl 300) > [ 1.351326] ata3: SATA link down (SStatus 0 SControl 300) > [ 1.351536] ata4: SATA link down (SStatus 0 SControl 300) > [ 1.449153] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2 > [ 16.483477] sd 0:0:0:0: Power-on or device reset occurred > [ 16.483691] sd 0:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB) > [ 16.483762] sd 0:0:0:0: [sda] Write Protect is off > [ 16.483877] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > [ 16.569225] sd 0:0:0:0: [sda] Attached SCSI disk > > I bisected it and it is caused by the commit 89e1fb7ceffd which > introduces calls to synchronize_rcu_expedited. > > This commit replaces synchronize_rcu_expedited and kfree with a call to > kfree_rcu_mightsleep, avoiding the 15-second delay. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > Fixes: 89e1fb7ceffd ("blk-mq: fix potential uaf for 'queue_hw_ctx'") > > --- > block/blk-mq.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > Index: linux-2.6/block/blk-mq.c > =================================================================== > --- linux-2.6.orig/block/blk-mq.c 2026-01-06 16:45:11.000000000 +0100 > +++ linux-2.6/block/blk-mq.c 2026-01-06 16:48:00.000000000 +0100 > @@ -4553,8 +4553,7 @@ static void __blk_mq_realloc_hw_ctxs(str > * Make sure reading the old queue_hw_ctx from other > * context concurrently won't trigger uaf. > */ > - synchronize_rcu_expedited(); > - kfree(hctxs); > + kfree_rcu_mightsleep(hctxs); > I agree, doing freeing that way is not optimal. But kfree_rcu_mightsleep() also might not work. It has a fallback, if we can not place an object into "page" due to memory allocation failure, it inlines freeing: <snip> synchronize_rcu(); free(). <snip> Please note, synchronize_rcu() can easily be converted into expedited version. See rcu_gp_is_expedited(). Inlining is a corner case but it can happen. The best way is to add rcu_head to the blk_mq_hw_ctx structure and use kfree_rcu(). It never blocks. -- Uladzislau Rezki ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited 2026-01-06 16:29 ` Uladzislau Rezki @ 2026-01-06 16:59 ` Mikulas Patocka 2026-01-07 11:01 ` Uladzislau Rezki 2026-01-07 15:10 ` Jens Axboe 0 siblings, 2 replies; 10+ messages in thread From: Mikulas Patocka @ 2026-01-06 16:59 UTC (permalink / raw) To: Uladzislau Rezki Cc: Fengnan Chang, Yu Kuai, Fengnan Chang, Jens Axboe, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, rcu, linux-block On Tue, 6 Jan 2026, Uladzislau Rezki wrote: > On Tue, Jan 06, 2026 at 04:56:07PM +0100, Mikulas Patocka wrote: > > On the kernel 6.19-rc, I am experiencing 15-second boot stall in a > > virtual machine when probing a virtio-scsi disk: > > [ 1.011641] SCSI subsystem initialized > > [ 1.013972] virtio_scsi virtio6: 16/0/0 default/read/poll queues > > [ 1.015983] scsi host0: Virtio SCSI HBA > > [ 1.019578] ACPI: \_SB_.GSIA: Enabled at IRQ 16 > > [ 1.020225] ahci 0000:00:1f.2: AHCI vers 0001.0000, 32 command slots, 1.5 Gbps, SATA mode > > [ 1.020228] ahci 0000:00:1f.2: 6/6 ports implemented (port mask 0x3f) > > [ 1.020230] ahci 0000:00:1f.2: flags: 64bit ncq only > > [ 1.024688] scsi host1: ahci > > [ 1.025432] scsi host2: ahci > > [ 1.025966] scsi host3: ahci > > [ 1.026511] scsi host4: ahci > > [ 1.028371] scsi host5: ahci > > [ 1.028918] scsi host6: ahci > > [ 1.029266] ata1: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23100 irq 16 lpm-pol 1 > > [ 1.029305] ata2: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23180 irq 16 lpm-pol 1 > > [ 1.029316] ata3: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23200 irq 16 lpm-pol 1 > > [ 1.029327] ata4: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23280 irq 16 lpm-pol 1 > > [ 1.029341] ata5: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23300 irq 16 lpm-pol 1 > > [ 1.029356] ata6: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23380 irq 16 lpm-pol 1 > > [ 1.118111] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 > > [ 1.348916] ata1: SATA link down (SStatus 0 SControl 300) > > [ 1.350713] ata2: SATA link down (SStatus 0 SControl 300) > > [ 1.351025] ata6: SATA link down (SStatus 0 SControl 300) > > [ 1.351160] ata5: SATA link down (SStatus 0 SControl 300) > > [ 1.351326] ata3: SATA link down (SStatus 0 SControl 300) > > [ 1.351536] ata4: SATA link down (SStatus 0 SControl 300) > > [ 1.449153] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2 > > [ 16.483477] sd 0:0:0:0: Power-on or device reset occurred > > [ 16.483691] sd 0:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB) > > [ 16.483762] sd 0:0:0:0: [sda] Write Protect is off > > [ 16.483877] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > > [ 16.569225] sd 0:0:0:0: [sda] Attached SCSI disk > > > > I bisected it and it is caused by the commit 89e1fb7ceffd which > > introduces calls to synchronize_rcu_expedited. > > > > This commit replaces synchronize_rcu_expedited and kfree with a call to > > kfree_rcu_mightsleep, avoiding the 15-second delay. > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > Fixes: 89e1fb7ceffd ("blk-mq: fix potential uaf for 'queue_hw_ctx'") > > > > --- > > block/blk-mq.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > Index: linux-2.6/block/blk-mq.c > > =================================================================== > > --- linux-2.6.orig/block/blk-mq.c 2026-01-06 16:45:11.000000000 +0100 > > +++ linux-2.6/block/blk-mq.c 2026-01-06 16:48:00.000000000 +0100 > > @@ -4553,8 +4553,7 @@ static void __blk_mq_realloc_hw_ctxs(str > > * Make sure reading the old queue_hw_ctx from other > > * context concurrently won't trigger uaf. > > */ > > - synchronize_rcu_expedited(); > > - kfree(hctxs); > > + kfree_rcu_mightsleep(hctxs); > > > I agree, doing freeing that way is not optimal. But kfree_rcu_mightsleep() > also might not work. It has a fallback, if we can not place an object into > "page" due to memory allocation failure, it inlines freeing: > > <snip> > synchronize_rcu(); > free(). > <snip> > > Please note, synchronize_rcu() can easily be converted into expedited > version. See rcu_gp_is_expedited(). > > -- > Uladzislau Rezki Would this patch be better? It does GFP_KERNEL allocation which dones't fail in practice. > Inlining is a corner case but it can happen. The best way is to add > rcu_head to the blk_mq_hw_ctx structure and use kfree_rcu(). It never > blocks. We are not protecting the blk_mq_hw_ctx structure with RCU, we are protecting the q->queue_hw_ctx array. So, rcu_head cannot be added to an array. We could cast the array to rcu_head (and make sure that the initial allocation is at least sizeof(struct rcu_head)), but that is hacky. Mikulas --- block/blk-mq.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) Index: linux-2.6/block/blk-mq.c =================================================================== --- linux-2.6.orig/block/blk-mq.c 2026-01-06 15:55:41.000000000 +0100 +++ linux-2.6/block/blk-mq.c 2026-01-06 16:22:40.000000000 +0100 @@ -4531,6 +4531,18 @@ static struct blk_mq_hw_ctx *blk_mq_allo return NULL; } +struct rcu_free_hctxs { + struct rcu_head head; + struct blk_mq_hw_ctx **hctxs; +}; + +static void rcu_free_hctxs(struct rcu_head *head) +{ + struct rcu_free_hctxs *r = container_of(head, struct rcu_free_hctxs, head); + kfree(r->hctxs); + kfree(r); +} + static void __blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, struct request_queue *q) { @@ -4539,6 +4551,7 @@ static void __blk_mq_realloc_hw_ctxs(str if (q->nr_hw_queues < set->nr_hw_queues) { struct blk_mq_hw_ctx **new_hctxs; + struct rcu_free_hctxs *r; new_hctxs = kcalloc_node(set->nr_hw_queues, sizeof(*new_hctxs), GFP_KERNEL, @@ -4553,8 +4566,14 @@ static void __blk_mq_realloc_hw_ctxs(str * Make sure reading the old queue_hw_ctx from other * context concurrently won't trigger uaf. */ - synchronize_rcu_expedited(); - kfree(hctxs); + r = kmalloc(sizeof(struct rcu_free_hctxs), GFP_KERNEL); + if (!r) { + synchronize_rcu_expedited(); + kfree(hctxs); + } else { + r->hctxs = hctxs; + call_rcu(&r->head, rcu_free_hctxs); + } hctxs = new_hctxs; } > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited 2026-01-06 16:59 ` Mikulas Patocka @ 2026-01-07 11:01 ` Uladzislau Rezki 2026-01-07 12:05 ` Mikulas Patocka 2026-01-07 15:10 ` Jens Axboe 1 sibling, 1 reply; 10+ messages in thread From: Uladzislau Rezki @ 2026-01-07 11:01 UTC (permalink / raw) To: Mikulas Patocka Cc: Uladzislau Rezki, Fengnan Chang, Yu Kuai, Fengnan Chang, Jens Axboe, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, rcu, linux-block On Tue, Jan 06, 2026 at 05:59:16PM +0100, Mikulas Patocka wrote: > > > On Tue, 6 Jan 2026, Uladzislau Rezki wrote: > > > On Tue, Jan 06, 2026 at 04:56:07PM +0100, Mikulas Patocka wrote: > > > On the kernel 6.19-rc, I am experiencing 15-second boot stall in a > > > virtual machine when probing a virtio-scsi disk: > > > [ 1.011641] SCSI subsystem initialized > > > [ 1.013972] virtio_scsi virtio6: 16/0/0 default/read/poll queues > > > [ 1.015983] scsi host0: Virtio SCSI HBA > > > [ 1.019578] ACPI: \_SB_.GSIA: Enabled at IRQ 16 > > > [ 1.020225] ahci 0000:00:1f.2: AHCI vers 0001.0000, 32 command slots, 1.5 Gbps, SATA mode > > > [ 1.020228] ahci 0000:00:1f.2: 6/6 ports implemented (port mask 0x3f) > > > [ 1.020230] ahci 0000:00:1f.2: flags: 64bit ncq only > > > [ 1.024688] scsi host1: ahci > > > [ 1.025432] scsi host2: ahci > > > [ 1.025966] scsi host3: ahci > > > [ 1.026511] scsi host4: ahci > > > [ 1.028371] scsi host5: ahci > > > [ 1.028918] scsi host6: ahci > > > [ 1.029266] ata1: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23100 irq 16 lpm-pol 1 > > > [ 1.029305] ata2: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23180 irq 16 lpm-pol 1 > > > [ 1.029316] ata3: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23200 irq 16 lpm-pol 1 > > > [ 1.029327] ata4: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23280 irq 16 lpm-pol 1 > > > [ 1.029341] ata5: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23300 irq 16 lpm-pol 1 > > > [ 1.029356] ata6: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23380 irq 16 lpm-pol 1 > > > [ 1.118111] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 > > > [ 1.348916] ata1: SATA link down (SStatus 0 SControl 300) > > > [ 1.350713] ata2: SATA link down (SStatus 0 SControl 300) > > > [ 1.351025] ata6: SATA link down (SStatus 0 SControl 300) > > > [ 1.351160] ata5: SATA link down (SStatus 0 SControl 300) > > > [ 1.351326] ata3: SATA link down (SStatus 0 SControl 300) > > > [ 1.351536] ata4: SATA link down (SStatus 0 SControl 300) > > > [ 1.449153] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2 > > > [ 16.483477] sd 0:0:0:0: Power-on or device reset occurred > > > [ 16.483691] sd 0:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB) > > > [ 16.483762] sd 0:0:0:0: [sda] Write Protect is off > > > [ 16.483877] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > > > [ 16.569225] sd 0:0:0:0: [sda] Attached SCSI disk > > > > > > I bisected it and it is caused by the commit 89e1fb7ceffd which > > > introduces calls to synchronize_rcu_expedited. > > > > > > This commit replaces synchronize_rcu_expedited and kfree with a call to > > > kfree_rcu_mightsleep, avoiding the 15-second delay. > > > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > > Fixes: 89e1fb7ceffd ("blk-mq: fix potential uaf for 'queue_hw_ctx'") > > > > > > --- > > > block/blk-mq.c | 3 +-- > > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > > > Index: linux-2.6/block/blk-mq.c > > > =================================================================== > > > --- linux-2.6.orig/block/blk-mq.c 2026-01-06 16:45:11.000000000 +0100 > > > +++ linux-2.6/block/blk-mq.c 2026-01-06 16:48:00.000000000 +0100 > > > @@ -4553,8 +4553,7 @@ static void __blk_mq_realloc_hw_ctxs(str > > > * Make sure reading the old queue_hw_ctx from other > > > * context concurrently won't trigger uaf. > > > */ > > > - synchronize_rcu_expedited(); > > > - kfree(hctxs); > > > + kfree_rcu_mightsleep(hctxs); > > > > > I agree, doing freeing that way is not optimal. But kfree_rcu_mightsleep() > > also might not work. It has a fallback, if we can not place an object into > > "page" due to memory allocation failure, it inlines freeing: > > > > <snip> > > synchronize_rcu(); > > free(). > > <snip> > > > > Please note, synchronize_rcu() can easily be converted into expedited > > version. See rcu_gp_is_expedited(). > > > > -- > > Uladzislau Rezki > > Would this patch be better? It does GFP_KERNEL allocation which dones't > fail in practice. > > > Inlining is a corner case but it can happen. The best way is to add > > rcu_head to the blk_mq_hw_ctx structure and use kfree_rcu(). It never > > blocks. > > We are not protecting the blk_mq_hw_ctx structure with RCU, we are > protecting the q->queue_hw_ctx array. So, rcu_head cannot be added to an > array. We could cast the array to rcu_head (and make sure that the initial > allocation is at least sizeof(struct rcu_head)), but that is hacky. > > Mikulas > > --- > block/blk-mq.c | 23 +++++++++++++++++++++-- > 1 file changed, 21 insertions(+), 2 deletions(-) > > Index: linux-2.6/block/blk-mq.c > =================================================================== > --- linux-2.6.orig/block/blk-mq.c 2026-01-06 15:55:41.000000000 +0100 > +++ linux-2.6/block/blk-mq.c 2026-01-06 16:22:40.000000000 +0100 > @@ -4531,6 +4531,18 @@ static struct blk_mq_hw_ctx *blk_mq_allo > return NULL; > } > > +struct rcu_free_hctxs { > + struct rcu_head head; > + struct blk_mq_hw_ctx **hctxs; > +}; > + > +static void rcu_free_hctxs(struct rcu_head *head) > +{ > + struct rcu_free_hctxs *r = container_of(head, struct rcu_free_hctxs, head); > + kfree(r->hctxs); > + kfree(r); > +} > + > static void __blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, > struct request_queue *q) > { > @@ -4539,6 +4551,7 @@ static void __blk_mq_realloc_hw_ctxs(str > > if (q->nr_hw_queues < set->nr_hw_queues) { > struct blk_mq_hw_ctx **new_hctxs; > + struct rcu_free_hctxs *r; > > new_hctxs = kcalloc_node(set->nr_hw_queues, > sizeof(*new_hctxs), GFP_KERNEL, > @@ -4553,8 +4566,14 @@ static void __blk_mq_realloc_hw_ctxs(str > * Make sure reading the old queue_hw_ctx from other > * context concurrently won't trigger uaf. > */ > - synchronize_rcu_expedited(); > - kfree(hctxs); > + r = kmalloc(sizeof(struct rcu_free_hctxs), GFP_KERNEL); > + if (!r) { > + synchronize_rcu_expedited(); > + kfree(hctxs); > + } else { > + r->hctxs = hctxs; > + call_rcu(&r->head, rcu_free_hctxs); > + } > hctxs = new_hctxs; > } > > > > I see. That will work but this looks like a temporary fix. It would be great to understand why synchronize_rcu_expedited() is blocked for so long. 16 seconds is a way too long. Is that easy to reproduce? -- Uladzislau Rezki ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited 2026-01-07 11:01 ` Uladzislau Rezki @ 2026-01-07 12:05 ` Mikulas Patocka 2026-01-07 12:22 ` Uladzislau Rezki 0 siblings, 1 reply; 10+ messages in thread From: Mikulas Patocka @ 2026-01-07 12:05 UTC (permalink / raw) To: Uladzislau Rezki Cc: Fengnan Chang, Yu Kuai, Fengnan Chang, Jens Axboe, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, rcu, linux-block On Wed, 7 Jan 2026, Uladzislau Rezki wrote: > On Tue, Jan 06, 2026 at 05:59:16PM +0100, Mikulas Patocka wrote: > > > > > > On Tue, 6 Jan 2026, Uladzislau Rezki wrote: > > > > > On Tue, Jan 06, 2026 at 04:56:07PM +0100, Mikulas Patocka wrote: > > > > On the kernel 6.19-rc, I am experiencing 15-second boot stall in a > > > > virtual machine when probing a virtio-scsi disk: > > > > [ 1.011641] SCSI subsystem initialized > > > > [ 1.013972] virtio_scsi virtio6: 16/0/0 default/read/poll queues > > > > [ 1.015983] scsi host0: Virtio SCSI HBA > > > > [ 1.019578] ACPI: \_SB_.GSIA: Enabled at IRQ 16 > > > > [ 1.020225] ahci 0000:00:1f.2: AHCI vers 0001.0000, 32 command slots, 1.5 Gbps, SATA mode > > > > [ 1.020228] ahci 0000:00:1f.2: 6/6 ports implemented (port mask 0x3f) > > > > [ 1.020230] ahci 0000:00:1f.2: flags: 64bit ncq only > > > > [ 1.024688] scsi host1: ahci > > > > [ 1.025432] scsi host2: ahci > > > > [ 1.025966] scsi host3: ahci > > > > [ 1.026511] scsi host4: ahci > > > > [ 1.028371] scsi host5: ahci > > > > [ 1.028918] scsi host6: ahci > > > > [ 1.029266] ata1: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23100 irq 16 lpm-pol 1 > > > > [ 1.029305] ata2: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23180 irq 16 lpm-pol 1 > > > > [ 1.029316] ata3: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23200 irq 16 lpm-pol 1 > > > > [ 1.029327] ata4: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23280 irq 16 lpm-pol 1 > > > > [ 1.029341] ata5: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23300 irq 16 lpm-pol 1 > > > > [ 1.029356] ata6: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23380 irq 16 lpm-pol 1 > > > > [ 1.118111] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 > > > > [ 1.348916] ata1: SATA link down (SStatus 0 SControl 300) > > > > [ 1.350713] ata2: SATA link down (SStatus 0 SControl 300) > > > > [ 1.351025] ata6: SATA link down (SStatus 0 SControl 300) > > > > [ 1.351160] ata5: SATA link down (SStatus 0 SControl 300) > > > > [ 1.351326] ata3: SATA link down (SStatus 0 SControl 300) > > > > [ 1.351536] ata4: SATA link down (SStatus 0 SControl 300) > > > > [ 1.449153] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2 > > > > [ 16.483477] sd 0:0:0:0: Power-on or device reset occurred > > > > [ 16.483691] sd 0:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB) > > > > [ 16.483762] sd 0:0:0:0: [sda] Write Protect is off > > > > [ 16.483877] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > > > > [ 16.569225] sd 0:0:0:0: [sda] Attached SCSI disk > > > > > > > > I bisected it and it is caused by the commit 89e1fb7ceffd which > > > > introduces calls to synchronize_rcu_expedited. > > > > > > > > This commit replaces synchronize_rcu_expedited and kfree with a call to > > > > kfree_rcu_mightsleep, avoiding the 15-second delay. > > > > > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > > > Fixes: 89e1fb7ceffd ("blk-mq: fix potential uaf for 'queue_hw_ctx'") > > > > > > > > --- > > > > block/blk-mq.c | 3 +-- > > > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > > > > > Index: linux-2.6/block/blk-mq.c > > > > =================================================================== > > > > --- linux-2.6.orig/block/blk-mq.c 2026-01-06 16:45:11.000000000 +0100 > > > > +++ linux-2.6/block/blk-mq.c 2026-01-06 16:48:00.000000000 +0100 > > > > @@ -4553,8 +4553,7 @@ static void __blk_mq_realloc_hw_ctxs(str > > > > * Make sure reading the old queue_hw_ctx from other > > > > * context concurrently won't trigger uaf. > > > > */ > > > > - synchronize_rcu_expedited(); > > > > - kfree(hctxs); > > > > + kfree_rcu_mightsleep(hctxs); > > > > > > > I agree, doing freeing that way is not optimal. But kfree_rcu_mightsleep() > > > also might not work. It has a fallback, if we can not place an object into > > > "page" due to memory allocation failure, it inlines freeing: > > > > > > <snip> > > > synchronize_rcu(); > > > free(). > > > <snip> > > > > > > Please note, synchronize_rcu() can easily be converted into expedited > > > version. See rcu_gp_is_expedited(). > > > > > > -- > > > Uladzislau Rezki > > > > Would this patch be better? It does GFP_KERNEL allocation which dones't > > fail in practice. > > > > > Inlining is a corner case but it can happen. The best way is to add > > > rcu_head to the blk_mq_hw_ctx structure and use kfree_rcu(). It never > > > blocks. > > > > We are not protecting the blk_mq_hw_ctx structure with RCU, we are > > protecting the q->queue_hw_ctx array. So, rcu_head cannot be added to an > > array. We could cast the array to rcu_head (and make sure that the initial > > allocation is at least sizeof(struct rcu_head)), but that is hacky. > > > > Mikulas > > > > --- > > block/blk-mq.c | 23 +++++++++++++++++++++-- > > 1 file changed, 21 insertions(+), 2 deletions(-) > > > > Index: linux-2.6/block/blk-mq.c > > =================================================================== > > --- linux-2.6.orig/block/blk-mq.c 2026-01-06 15:55:41.000000000 +0100 > > +++ linux-2.6/block/blk-mq.c 2026-01-06 16:22:40.000000000 +0100 > > @@ -4531,6 +4531,18 @@ static struct blk_mq_hw_ctx *blk_mq_allo > > return NULL; > > } > > > > +struct rcu_free_hctxs { > > + struct rcu_head head; > > + struct blk_mq_hw_ctx **hctxs; > > +}; > > + > > +static void rcu_free_hctxs(struct rcu_head *head) > > +{ > > + struct rcu_free_hctxs *r = container_of(head, struct rcu_free_hctxs, head); > > + kfree(r->hctxs); > > + kfree(r); > > +} > > + > > static void __blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, > > struct request_queue *q) > > { > > @@ -4539,6 +4551,7 @@ static void __blk_mq_realloc_hw_ctxs(str > > > > if (q->nr_hw_queues < set->nr_hw_queues) { > > struct blk_mq_hw_ctx **new_hctxs; > > + struct rcu_free_hctxs *r; > > > > new_hctxs = kcalloc_node(set->nr_hw_queues, > > sizeof(*new_hctxs), GFP_KERNEL, > > @@ -4553,8 +4566,14 @@ static void __blk_mq_realloc_hw_ctxs(str > > * Make sure reading the old queue_hw_ctx from other > > * context concurrently won't trigger uaf. > > */ > > - synchronize_rcu_expedited(); > > - kfree(hctxs); > > + r = kmalloc(sizeof(struct rcu_free_hctxs), GFP_KERNEL); > > + if (!r) { > > + synchronize_rcu_expedited(); > > + kfree(hctxs); > > + } else { > > + r->hctxs = hctxs; > > + call_rcu(&r->head, rcu_free_hctxs); > > + } > > hctxs = new_hctxs; > > } > > > > > > > > I see. That will work but this looks like a temporary fix. It would be > great to understand why synchronize_rcu_expedited() is blocked for so long. > 16 seconds is a way too long. synchronize_rcu_expedited is called 257 times from the block layer. One call is approximately 50ms. This is one of the stacktraces: [ 3.087639] CPU: 12 UID: 0 PID: 260 Comm: (udev-worker) Not tainted 6.19.0-rc4 #26 PREEMPT_{RT,(full)} [ 3.087642] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 3.087643] Call Trace: [ 3.087644] <TASK> [ 3.087644] dump_stack_lvl+0x47/0x60 [ 3.087649] __blk_mq_realloc_hw_ctxs+0x14e/0x170 [ 3.087652] blk_mq_init_allocated_queue+0xf5/0x3b0 [ 3.087654] blk_mq_alloc_queue+0x62/0xa0 [ 3.087655] scsi_alloc_sdev+0x1e2/0x300 [scsi_mod] [ 3.087658] scsi_probe_and_add_lun+0x1de/0x280 [scsi_mod] [ 3.087660] ? preempt_count_add+0x48/0xc0 [ 3.087663] ? rt_spin_unlock+0x47/0xa0 [ 3.087665] ? rt_spin_unlock+0x2d/0xa0 [ 3.087666] __scsi_scan_target+0xd3/0x1d0 [scsi_mod] [ 3.087667] scsi_scan_channel+0x4f/0x80 [scsi_mod] [ 3.087668] scsi_scan_host_selected+0xc0/0xf0 [scsi_mod] [ 3.087670] scsi_scan_host+0x181/0x1a0 [scsi_mod] [ 3.087671] virtscsi_probe+0x333/0x341 [virtio_scsi] [ 3.087674] virtio_dev_probe+0x1e5/0x300 [ 3.087676] really_probe+0xb9/0x240 [ 3.087678] __driver_probe_device+0x6e/0x100 [ 3.087679] driver_probe_device+0x1a/0x70 [ 3.087680] ? __device_attach_driver+0xa0/0xa0 [ 3.087681] __driver_attach+0x84/0x140 [ 3.087682] bus_for_each_dev+0x5e/0xa0 [ 3.087683] bus_add_driver+0xd8/0x1c0 [ 3.087684] ? libata_transport_exit+0x930/0x930 [libata] [ 3.087686] driver_register+0x6c/0xd0 [ 3.087687] virtio_scsi_init+0xa1/0x1000 [virtio_scsi] [ 3.087688] do_one_initcall+0x35/0x160 [ 3.087690] ? do_init_module+0x1f/0x250 [ 3.087691] ? __kmalloc_cache_noprof+0x183/0x330 [ 3.087694] do_init_module+0x5d/0x250 [ 3.087695] ? init_module_from_file+0x9e/0xc0 [ 3.087695] init_module_from_file+0x9e/0xc0 [ 3.087696] idempotent_init_module+0xee/0x2d0 [ 3.087697] __x64_sys_finit_module+0x56/0xa0 [ 3.087698] do_syscall_64+0x31e/0x370 [ 3.087701] entry_SYSCALL_64_after_hwframe+0x4b/0x53 > Is that easy to reproduce? I uploaded my config here: http://www.jikos.cz/~mikulas/testcases/config/.config-6.19-rc4 Mikulas > -- > Uladzislau Rezki > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited 2026-01-07 12:05 ` Mikulas Patocka @ 2026-01-07 12:22 ` Uladzislau Rezki 0 siblings, 0 replies; 10+ messages in thread From: Uladzislau Rezki @ 2026-01-07 12:22 UTC (permalink / raw) To: Mikulas Patocka Cc: Uladzislau Rezki, Fengnan Chang, Yu Kuai, Fengnan Chang, Jens Axboe, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, rcu, linux-block On Wed, Jan 07, 2026 at 01:05:14PM +0100, Mikulas Patocka wrote: > > > On Wed, 7 Jan 2026, Uladzislau Rezki wrote: > > > On Tue, Jan 06, 2026 at 05:59:16PM +0100, Mikulas Patocka wrote: > > > > > > > > > On Tue, 6 Jan 2026, Uladzislau Rezki wrote: > > > > > > > On Tue, Jan 06, 2026 at 04:56:07PM +0100, Mikulas Patocka wrote: > > > > > On the kernel 6.19-rc, I am experiencing 15-second boot stall in a > > > > > virtual machine when probing a virtio-scsi disk: > > > > > [ 1.011641] SCSI subsystem initialized > > > > > [ 1.013972] virtio_scsi virtio6: 16/0/0 default/read/poll queues > > > > > [ 1.015983] scsi host0: Virtio SCSI HBA > > > > > [ 1.019578] ACPI: \_SB_.GSIA: Enabled at IRQ 16 > > > > > [ 1.020225] ahci 0000:00:1f.2: AHCI vers 0001.0000, 32 command slots, 1.5 Gbps, SATA mode > > > > > [ 1.020228] ahci 0000:00:1f.2: 6/6 ports implemented (port mask 0x3f) > > > > > [ 1.020230] ahci 0000:00:1f.2: flags: 64bit ncq only > > > > > [ 1.024688] scsi host1: ahci > > > > > [ 1.025432] scsi host2: ahci > > > > > [ 1.025966] scsi host3: ahci > > > > > [ 1.026511] scsi host4: ahci > > > > > [ 1.028371] scsi host5: ahci > > > > > [ 1.028918] scsi host6: ahci > > > > > [ 1.029266] ata1: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23100 irq 16 lpm-pol 1 > > > > > [ 1.029305] ata2: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23180 irq 16 lpm-pol 1 > > > > > [ 1.029316] ata3: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23200 irq 16 lpm-pol 1 > > > > > [ 1.029327] ata4: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23280 irq 16 lpm-pol 1 > > > > > [ 1.029341] ata5: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23300 irq 16 lpm-pol 1 > > > > > [ 1.029356] ata6: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23380 irq 16 lpm-pol 1 > > > > > [ 1.118111] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 > > > > > [ 1.348916] ata1: SATA link down (SStatus 0 SControl 300) > > > > > [ 1.350713] ata2: SATA link down (SStatus 0 SControl 300) > > > > > [ 1.351025] ata6: SATA link down (SStatus 0 SControl 300) > > > > > [ 1.351160] ata5: SATA link down (SStatus 0 SControl 300) > > > > > [ 1.351326] ata3: SATA link down (SStatus 0 SControl 300) > > > > > [ 1.351536] ata4: SATA link down (SStatus 0 SControl 300) > > > > > [ 1.449153] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2 > > > > > [ 16.483477] sd 0:0:0:0: Power-on or device reset occurred > > > > > [ 16.483691] sd 0:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB) > > > > > [ 16.483762] sd 0:0:0:0: [sda] Write Protect is off > > > > > [ 16.483877] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > > > > > [ 16.569225] sd 0:0:0:0: [sda] Attached SCSI disk > > > > > > > > > > I bisected it and it is caused by the commit 89e1fb7ceffd which > > > > > introduces calls to synchronize_rcu_expedited. > > > > > > > > > > This commit replaces synchronize_rcu_expedited and kfree with a call to > > > > > kfree_rcu_mightsleep, avoiding the 15-second delay. > > > > > > > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > > > > Fixes: 89e1fb7ceffd ("blk-mq: fix potential uaf for 'queue_hw_ctx'") > > > > > > > > > > --- > > > > > block/blk-mq.c | 3 +-- > > > > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > > > > > > > Index: linux-2.6/block/blk-mq.c > > > > > =================================================================== > > > > > --- linux-2.6.orig/block/blk-mq.c 2026-01-06 16:45:11.000000000 +0100 > > > > > +++ linux-2.6/block/blk-mq.c 2026-01-06 16:48:00.000000000 +0100 > > > > > @@ -4553,8 +4553,7 @@ static void __blk_mq_realloc_hw_ctxs(str > > > > > * Make sure reading the old queue_hw_ctx from other > > > > > * context concurrently won't trigger uaf. > > > > > */ > > > > > - synchronize_rcu_expedited(); > > > > > - kfree(hctxs); > > > > > + kfree_rcu_mightsleep(hctxs); > > > > > > > > > I agree, doing freeing that way is not optimal. But kfree_rcu_mightsleep() > > > > also might not work. It has a fallback, if we can not place an object into > > > > "page" due to memory allocation failure, it inlines freeing: > > > > > > > > <snip> > > > > synchronize_rcu(); > > > > free(). > > > > <snip> > > > > > > > > Please note, synchronize_rcu() can easily be converted into expedited > > > > version. See rcu_gp_is_expedited(). > > > > > > > > -- > > > > Uladzislau Rezki > > > > > > Would this patch be better? It does GFP_KERNEL allocation which dones't > > > fail in practice. > > > > > > > Inlining is a corner case but it can happen. The best way is to add > > > > rcu_head to the blk_mq_hw_ctx structure and use kfree_rcu(). It never > > > > blocks. > > > > > > We are not protecting the blk_mq_hw_ctx structure with RCU, we are > > > protecting the q->queue_hw_ctx array. So, rcu_head cannot be added to an > > > array. We could cast the array to rcu_head (and make sure that the initial > > > allocation is at least sizeof(struct rcu_head)), but that is hacky. > > > > > > Mikulas > > > > > > --- > > > block/blk-mq.c | 23 +++++++++++++++++++++-- > > > 1 file changed, 21 insertions(+), 2 deletions(-) > > > > > > Index: linux-2.6/block/blk-mq.c > > > =================================================================== > > > --- linux-2.6.orig/block/blk-mq.c 2026-01-06 15:55:41.000000000 +0100 > > > +++ linux-2.6/block/blk-mq.c 2026-01-06 16:22:40.000000000 +0100 > > > @@ -4531,6 +4531,18 @@ static struct blk_mq_hw_ctx *blk_mq_allo > > > return NULL; > > > } > > > > > > +struct rcu_free_hctxs { > > > + struct rcu_head head; > > > + struct blk_mq_hw_ctx **hctxs; > > > +}; > > > + > > > +static void rcu_free_hctxs(struct rcu_head *head) > > > +{ > > > + struct rcu_free_hctxs *r = container_of(head, struct rcu_free_hctxs, head); > > > + kfree(r->hctxs); > > > + kfree(r); > > > +} > > > + > > > static void __blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, > > > struct request_queue *q) > > > { > > > @@ -4539,6 +4551,7 @@ static void __blk_mq_realloc_hw_ctxs(str > > > > > > if (q->nr_hw_queues < set->nr_hw_queues) { > > > struct blk_mq_hw_ctx **new_hctxs; > > > + struct rcu_free_hctxs *r; > > > > > > new_hctxs = kcalloc_node(set->nr_hw_queues, > > > sizeof(*new_hctxs), GFP_KERNEL, > > > @@ -4553,8 +4566,14 @@ static void __blk_mq_realloc_hw_ctxs(str > > > * Make sure reading the old queue_hw_ctx from other > > > * context concurrently won't trigger uaf. > > > */ > > > - synchronize_rcu_expedited(); > > > - kfree(hctxs); > > > + r = kmalloc(sizeof(struct rcu_free_hctxs), GFP_KERNEL); > > > + if (!r) { > > > + synchronize_rcu_expedited(); > > > + kfree(hctxs); > > > + } else { > > > + r->hctxs = hctxs; > > > + call_rcu(&r->head, rcu_free_hctxs); > > > + } > > > hctxs = new_hctxs; > > > } > > > > > > > > > > > > I see. That will work but this looks like a temporary fix. It would be > > great to understand why synchronize_rcu_expedited() is blocked for so long. > > 16 seconds is a way too long. > > synchronize_rcu_expedited is called 257 times from the block layer. One > call is approximately 50ms. > OK. I thought the _one_ call of synchronize_rcu_expedited() was stuck for ~15 seconds. Whereas you just have many of them. Therefore you can easily just go back to your original patch and use kfree_rcu_mightsleep(hctxs)! -- Uladzislau Rezki ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited 2026-01-06 16:59 ` Mikulas Patocka 2026-01-07 11:01 ` Uladzislau Rezki @ 2026-01-07 15:10 ` Jens Axboe 1 sibling, 0 replies; 10+ messages in thread From: Jens Axboe @ 2026-01-07 15:10 UTC (permalink / raw) To: Mikulas Patocka, Uladzislau Rezki Cc: Fengnan Chang, Yu Kuai, Fengnan Chang, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, rcu, linux-block On 1/6/26 9:59 AM, Mikulas Patocka wrote: > @@ -4553,8 +4566,14 @@ static void __blk_mq_realloc_hw_ctxs(str > * Make sure reading the old queue_hw_ctx from other > * context concurrently won't trigger uaf. > */ > - synchronize_rcu_expedited(); > - kfree(hctxs); > + r = kmalloc(sizeof(struct rcu_free_hctxs), GFP_KERNEL); > + if (!r) { > + synchronize_rcu_expedited(); > + kfree(hctxs); > + } else { > + r->hctxs = hctxs; > + call_rcu(&r->head, rcu_free_hctxs); > + } > hctxs = new_hctxs; > } This is worse in every conceivable way, imho. The proper way to do this would be to embed the rcu_head in whatever is allocated for the hctxs at alloc time, if youre doing an alloc here you may as well just use kfree_rcu_mightsleep() in the first place. There's nothing gained from open coding that. Since kfree_rcu_mightsleep() will only run into trouble under strained conditions anyway, I think the original patch is fine for this. -- Jens Axboe ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited 2026-01-06 15:56 [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited Mikulas Patocka 2026-01-06 16:29 ` Uladzislau Rezki @ 2026-01-07 12:23 ` Uladzislau Rezki 2026-01-07 16:49 ` Bart Van Assche 2026-01-07 16:50 ` Jens Axboe 3 siblings, 0 replies; 10+ messages in thread From: Uladzislau Rezki @ 2026-01-07 12:23 UTC (permalink / raw) To: Mikulas Patocka Cc: Fengnan Chang, Yu Kuai, Fengnan Chang, Jens Axboe, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki, rcu, linux-block On Tue, Jan 06, 2026 at 04:56:07PM +0100, Mikulas Patocka wrote: > On the kernel 6.19-rc, I am experiencing 15-second boot stall in a > virtual machine when probing a virtio-scsi disk: > [ 1.011641] SCSI subsystem initialized > [ 1.013972] virtio_scsi virtio6: 16/0/0 default/read/poll queues > [ 1.015983] scsi host0: Virtio SCSI HBA > [ 1.019578] ACPI: \_SB_.GSIA: Enabled at IRQ 16 > [ 1.020225] ahci 0000:00:1f.2: AHCI vers 0001.0000, 32 command slots, 1.5 Gbps, SATA mode > [ 1.020228] ahci 0000:00:1f.2: 6/6 ports implemented (port mask 0x3f) > [ 1.020230] ahci 0000:00:1f.2: flags: 64bit ncq only > [ 1.024688] scsi host1: ahci > [ 1.025432] scsi host2: ahci > [ 1.025966] scsi host3: ahci > [ 1.026511] scsi host4: ahci > [ 1.028371] scsi host5: ahci > [ 1.028918] scsi host6: ahci > [ 1.029266] ata1: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23100 irq 16 lpm-pol 1 > [ 1.029305] ata2: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23180 irq 16 lpm-pol 1 > [ 1.029316] ata3: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23200 irq 16 lpm-pol 1 > [ 1.029327] ata4: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23280 irq 16 lpm-pol 1 > [ 1.029341] ata5: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23300 irq 16 lpm-pol 1 > [ 1.029356] ata6: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23380 irq 16 lpm-pol 1 > [ 1.118111] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 > [ 1.348916] ata1: SATA link down (SStatus 0 SControl 300) > [ 1.350713] ata2: SATA link down (SStatus 0 SControl 300) > [ 1.351025] ata6: SATA link down (SStatus 0 SControl 300) > [ 1.351160] ata5: SATA link down (SStatus 0 SControl 300) > [ 1.351326] ata3: SATA link down (SStatus 0 SControl 300) > [ 1.351536] ata4: SATA link down (SStatus 0 SControl 300) > [ 1.449153] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2 > [ 16.483477] sd 0:0:0:0: Power-on or device reset occurred > [ 16.483691] sd 0:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB) > [ 16.483762] sd 0:0:0:0: [sda] Write Protect is off > [ 16.483877] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > [ 16.569225] sd 0:0:0:0: [sda] Attached SCSI disk > > I bisected it and it is caused by the commit 89e1fb7ceffd which > introduces calls to synchronize_rcu_expedited. > > This commit replaces synchronize_rcu_expedited and kfree with a call to > kfree_rcu_mightsleep, avoiding the 15-second delay. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > Fixes: 89e1fb7ceffd ("blk-mq: fix potential uaf for 'queue_hw_ctx'") > > --- > block/blk-mq.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > Index: linux-2.6/block/blk-mq.c > =================================================================== > --- linux-2.6.orig/block/blk-mq.c 2026-01-06 16:45:11.000000000 +0100 > +++ linux-2.6/block/blk-mq.c 2026-01-06 16:48:00.000000000 +0100 > @@ -4553,8 +4553,7 @@ static void __blk_mq_realloc_hw_ctxs(str > * Make sure reading the old queue_hw_ctx from other > * context concurrently won't trigger uaf. > */ > - synchronize_rcu_expedited(); > - kfree(hctxs); > + kfree_rcu_mightsleep(hctxs); > hctxs = new_hctxs; > } > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> -- Uladzislau Rezki ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited 2026-01-06 15:56 [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited Mikulas Patocka 2026-01-06 16:29 ` Uladzislau Rezki 2026-01-07 12:23 ` Uladzislau Rezki @ 2026-01-07 16:49 ` Bart Van Assche 2026-01-07 16:50 ` Jens Axboe 3 siblings, 0 replies; 10+ messages in thread From: Bart Van Assche @ 2026-01-07 16:49 UTC (permalink / raw) To: Mikulas Patocka, Fengnan Chang, Yu Kuai, Fengnan Chang, Jens Axboe, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki Cc: rcu, linux-block On 1/6/26 8:56 AM, Mikulas Patocka wrote: > --- linux-2.6.orig/block/blk-mq.c 2026-01-06 16:45:11.000000000 +0100 > +++ linux-2.6/block/blk-mq.c 2026-01-06 16:48:00.000000000 +0100 > @@ -4553,8 +4553,7 @@ static void __blk_mq_realloc_hw_ctxs(str > * Make sure reading the old queue_hw_ctx from other > * context concurrently won't trigger uaf. > */ > - synchronize_rcu_expedited(); > - kfree(hctxs); > + kfree_rcu_mightsleep(hctxs); > hctxs = new_hctxs; > } Reviewed-by: Bart Van Assche <bvanassche@acm.org> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited 2026-01-06 15:56 [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited Mikulas Patocka ` (2 preceding siblings ...) 2026-01-07 16:49 ` Bart Van Assche @ 2026-01-07 16:50 ` Jens Axboe 3 siblings, 0 replies; 10+ messages in thread From: Jens Axboe @ 2026-01-07 16:50 UTC (permalink / raw) To: Fengnan Chang, Yu Kuai, Fengnan Chang, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki, Mikulas Patocka Cc: rcu, linux-block On Tue, 06 Jan 2026 16:56:07 +0100, Mikulas Patocka wrote: > On the kernel 6.19-rc, I am experiencing 15-second boot stall in a > virtual machine when probing a virtio-scsi disk: > [ 1.011641] SCSI subsystem initialized > [ 1.013972] virtio_scsi virtio6: 16/0/0 default/read/poll queues > [ 1.015983] scsi host0: Virtio SCSI HBA > [ 1.019578] ACPI: \_SB_.GSIA: Enabled at IRQ 16 > [ 1.020225] ahci 0000:00:1f.2: AHCI vers 0001.0000, 32 command slots, 1.5 Gbps, SATA mode > [ 1.020228] ahci 0000:00:1f.2: 6/6 ports implemented (port mask 0x3f) > [ 1.020230] ahci 0000:00:1f.2: flags: 64bit ncq only > [ 1.024688] scsi host1: ahci > [ 1.025432] scsi host2: ahci > [ 1.025966] scsi host3: ahci > [ 1.026511] scsi host4: ahci > [ 1.028371] scsi host5: ahci > [ 1.028918] scsi host6: ahci > [ 1.029266] ata1: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23100 irq 16 lpm-pol 1 > [ 1.029305] ata2: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23180 irq 16 lpm-pol 1 > [ 1.029316] ata3: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23200 irq 16 lpm-pol 1 > [ 1.029327] ata4: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23280 irq 16 lpm-pol 1 > [ 1.029341] ata5: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23300 irq 16 lpm-pol 1 > [ 1.029356] ata6: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23380 irq 16 lpm-pol 1 > [ 1.118111] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 > [ 1.348916] ata1: SATA link down (SStatus 0 SControl 300) > [ 1.350713] ata2: SATA link down (SStatus 0 SControl 300) > [ 1.351025] ata6: SATA link down (SStatus 0 SControl 300) > [ 1.351160] ata5: SATA link down (SStatus 0 SControl 300) > [ 1.351326] ata3: SATA link down (SStatus 0 SControl 300) > [ 1.351536] ata4: SATA link down (SStatus 0 SControl 300) > [ 1.449153] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2 > [ 16.483477] sd 0:0:0:0: Power-on or device reset occurred > [ 16.483691] sd 0:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB) > [ 16.483762] sd 0:0:0:0: [sda] Write Protect is off > [ 16.483877] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > [ 16.569225] sd 0:0:0:0: [sda] Attached SCSI disk > > [...] Applied, thanks! [1/1] blk-mq: avoid stall during boot due to synchronize_rcu_expedited commit: 9670db22e7ab4aefe2b2619589a47fef9d3e0c7e Best regards, -- Jens Axboe ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-01-07 16:50 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-01-06 15:56 [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited Mikulas Patocka 2026-01-06 16:29 ` Uladzislau Rezki 2026-01-06 16:59 ` Mikulas Patocka 2026-01-07 11:01 ` Uladzislau Rezki 2026-01-07 12:05 ` Mikulas Patocka 2026-01-07 12:22 ` Uladzislau Rezki 2026-01-07 15:10 ` Jens Axboe 2026-01-07 12:23 ` Uladzislau Rezki 2026-01-07 16:49 ` Bart Van Assche 2026-01-07 16:50 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox