From: Uladzislau Rezki <urezki@gmail.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Uladzislau Rezki <urezki@gmail.com>,
Fengnan Chang <fengnanchang@gmail.com>,
Yu Kuai <yukuai3@huawei.com>,
Fengnan Chang <changfengnan@bytedance.com>,
Jens Axboe <axboe@kernel.dk>,
"Paul E. McKenney" <paulmck@kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun.feng@gmail.com>,
rcu@vger.kernel.org, linux-block@vger.kernel.org
Subject: Re: [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited
Date: Wed, 7 Jan 2026 12:01:38 +0100
Message-ID: <aV49En4-qub5af-C@milan>
In-Reply-To: <2720e388-9341-34af-21c5-0e5e1a822960@redhat.com>
On Tue, Jan 06, 2026 at 05:59:16PM +0100, Mikulas Patocka wrote:
>
>
> On Tue, 6 Jan 2026, Uladzislau Rezki wrote:
>
> > On Tue, Jan 06, 2026 at 04:56:07PM +0100, Mikulas Patocka wrote:
> > > On kernel 6.19-rc, I am experiencing a 15-second boot stall in a
> > > virtual machine when probing a virtio-scsi disk:
> > > [ 1.011641] SCSI subsystem initialized
> > > [ 1.013972] virtio_scsi virtio6: 16/0/0 default/read/poll queues
> > > [ 1.015983] scsi host0: Virtio SCSI HBA
> > > [ 1.019578] ACPI: \_SB_.GSIA: Enabled at IRQ 16
> > > [ 1.020225] ahci 0000:00:1f.2: AHCI vers 0001.0000, 32 command slots, 1.5 Gbps, SATA mode
> > > [ 1.020228] ahci 0000:00:1f.2: 6/6 ports implemented (port mask 0x3f)
> > > [ 1.020230] ahci 0000:00:1f.2: flags: 64bit ncq only
> > > [ 1.024688] scsi host1: ahci
> > > [ 1.025432] scsi host2: ahci
> > > [ 1.025966] scsi host3: ahci
> > > [ 1.026511] scsi host4: ahci
> > > [ 1.028371] scsi host5: ahci
> > > [ 1.028918] scsi host6: ahci
> > > [ 1.029266] ata1: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23100 irq 16 lpm-pol 1
> > > [ 1.029305] ata2: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23180 irq 16 lpm-pol 1
> > > [ 1.029316] ata3: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23200 irq 16 lpm-pol 1
> > > [ 1.029327] ata4: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23280 irq 16 lpm-pol 1
> > > [ 1.029341] ata5: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23300 irq 16 lpm-pol 1
> > > [ 1.029356] ata6: SATA max UDMA/133 abar m4096@0xfea23000 port 0xfea23380 irq 16 lpm-pol 1
> > > [ 1.118111] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > > [ 1.348916] ata1: SATA link down (SStatus 0 SControl 300)
> > > [ 1.350713] ata2: SATA link down (SStatus 0 SControl 300)
> > > [ 1.351025] ata6: SATA link down (SStatus 0 SControl 300)
> > > [ 1.351160] ata5: SATA link down (SStatus 0 SControl 300)
> > > [ 1.351326] ata3: SATA link down (SStatus 0 SControl 300)
> > > [ 1.351536] ata4: SATA link down (SStatus 0 SControl 300)
> > > [ 1.449153] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
> > > [ 16.483477] sd 0:0:0:0: Power-on or device reset occurred
> > > [ 16.483691] sd 0:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
> > > [ 16.483762] sd 0:0:0:0: [sda] Write Protect is off
> > > [ 16.483877] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > [ 16.569225] sd 0:0:0:0: [sda] Attached SCSI disk
> > >
> > > I bisected it to commit 89e1fb7ceffd, which introduces calls to
> > > synchronize_rcu_expedited().
> > >
> > > This commit replaces synchronize_rcu_expedited and kfree with a call to
> > > kfree_rcu_mightsleep, avoiding the 15-second delay.
> > >
> > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > Fixes: 89e1fb7ceffd ("blk-mq: fix potential uaf for 'queue_hw_ctx'")
> > >
> > > ---
> > > block/blk-mq.c | 3 +--
> > > 1 file changed, 1 insertion(+), 2 deletions(-)
> > >
> > > Index: linux-2.6/block/blk-mq.c
> > > ===================================================================
> > > --- linux-2.6.orig/block/blk-mq.c 2026-01-06 16:45:11.000000000 +0100
> > > +++ linux-2.6/block/blk-mq.c 2026-01-06 16:48:00.000000000 +0100
> > > @@ -4553,8 +4553,7 @@ static void __blk_mq_realloc_hw_ctxs(str
> > >  		 * Make sure reading the old queue_hw_ctx from other
> > >  		 * context concurrently won't trigger uaf.
> > >  		 */
> > > -		synchronize_rcu_expedited();
> > > -		kfree(hctxs);
> > > +		kfree_rcu_mightsleep(hctxs);
> > >
> > I agree, freeing that way is not optimal. But kfree_rcu_mightsleep()
> > might not help either. It has a fallback: if it cannot place an object
> > into a "page" because a memory allocation fails, it frees inline:
> >
> > <snip>
> > synchronize_rcu();
> > free().
> > <snip>
> >
> > Please note that synchronize_rcu() can easily become the expedited
> > version. See rcu_gp_is_expedited().
> >
> > --
> > Uladzislau Rezki
>
> Would this patch be better? It does a GFP_KERNEL allocation, which doesn't
> fail in practice.
>
> > Inlining is a corner case but it can happen. The best way is to add
> > rcu_head to the blk_mq_hw_ctx structure and use kfree_rcu(). It never
> > blocks.
>
> We are not protecting the blk_mq_hw_ctx structure with RCU, we are
> protecting the q->queue_hw_ctx array. So, rcu_head cannot be added to an
> array. We could cast the array to rcu_head (and make sure that the initial
> allocation is at least sizeof(struct rcu_head)), but that is hacky.
>
> Mikulas
>
> ---
> block/blk-mq.c | 23 +++++++++++++++++++++--
> 1 file changed, 21 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/block/blk-mq.c
> ===================================================================
> --- linux-2.6.orig/block/blk-mq.c 2026-01-06 15:55:41.000000000 +0100
> +++ linux-2.6/block/blk-mq.c 2026-01-06 16:22:40.000000000 +0100
> @@ -4531,6 +4531,18 @@ static struct blk_mq_hw_ctx *blk_mq_allo
>  	return NULL;
>  }
>
> +struct rcu_free_hctxs {
> +	struct rcu_head head;
> +	struct blk_mq_hw_ctx **hctxs;
> +};
> +
> +static void rcu_free_hctxs(struct rcu_head *head)
> +{
> +	struct rcu_free_hctxs *r = container_of(head, struct rcu_free_hctxs, head);
> +	kfree(r->hctxs);
> +	kfree(r);
> +}
> +
> static void __blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
> struct request_queue *q)
> {
> @@ -4539,6 +4551,7 @@ static void __blk_mq_realloc_hw_ctxs(str
>
>  	if (q->nr_hw_queues < set->nr_hw_queues) {
>  		struct blk_mq_hw_ctx **new_hctxs;
> +		struct rcu_free_hctxs *r;
>
>  		new_hctxs = kcalloc_node(set->nr_hw_queues,
>  				       sizeof(*new_hctxs), GFP_KERNEL,
> @@ -4553,8 +4566,14 @@ static void __blk_mq_realloc_hw_ctxs(str
>  		 * Make sure reading the old queue_hw_ctx from other
>  		 * context concurrently won't trigger uaf.
>  		 */
> -		synchronize_rcu_expedited();
> -		kfree(hctxs);
> +		r = kmalloc(sizeof(struct rcu_free_hctxs), GFP_KERNEL);
> +		if (!r) {
> +			synchronize_rcu_expedited();
> +			kfree(hctxs);
> +		} else {
> +			r->hctxs = hctxs;
> +			call_rcu(&r->head, rcu_free_hctxs);
> +		}
>  		hctxs = new_hctxs;
>  	}
>
> >
>
I see. That will work, but it looks like a temporary fix. It would be
great to understand why synchronize_rcu_expedited() is blocked for so long.
16 seconds is way too long.
Is that easy to reproduce?
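
For anyone following along, the deferred-free wrapper in the patch above
can be modeled in user space. This is only a sketch under stated
assumptions: model_call_rcu(), model_grace_period_end(), struct
deferred_array and retire_array() are hypothetical stand-ins invented for
illustration, not kernel APIs, and the "grace period" here is just a
manual flush of a callback list. It shows why the caller does not block:
the old array is handed to a callback and freed later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space stand-in for the kernel's struct rcu_head (assumption). */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct rcu_head *pending;	/* callbacks waiting for a grace period */
static int freed_arrays;		/* arrays released by the callback so far */

/* Model of call_rcu(): queue the callback and return without waiting. */
static void model_call_rcu(struct rcu_head *head,
			   void (*func)(struct rcu_head *head))
{
	assert(head != NULL && func != NULL);
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Model of a grace period ending: run every queued callback once. */
static int model_grace_period_end(void)
{
	int ran = 0;

	while (pending) {
		struct rcu_head *head = pending;

		pending = head->next;
		head->func(head);
		ran++;
	}
	return ran;
}

/* Wrapper that carries the rcu_head on behalf of a plain pointer array. */
struct deferred_array {
	struct rcu_head head;
	void **array;
};

/* Callback: free the old array, then the wrapper itself. */
static void free_deferred_array(struct rcu_head *head)
{
	struct deferred_array *d =
		container_of(head, struct deferred_array, head);

	free(d->array);
	free(d);
	freed_arrays++;
}

/*
 * Retire an old array without blocking. Returns 1 if the free was queued,
 * 0 if the wrapper allocation failed; in that case the caller still owns
 * the array and must take the blocking path itself.
 */
static int retire_array(void **array)
{
	struct deferred_array *d = malloc(sizeof(*d));

	if (!d)
		return 0;
	d->array = array;
	model_call_rcu(&d->head, free_deferred_array);
	return 1;
}
```

The cost of this shape is one extra small allocation per retired array,
plus the blocking fallback when that allocation fails.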
--
Uladzislau Rezki