From: Florian Bezdeka <florian.bezdeka@siemens.com>
To: Aaron Tomlin <atomlin@atomlin.com>,
axboe@kernel.dk, kbusch@kernel.org, hch@lst.de,
sagi@grimberg.me, mst@redhat.com
Cc: aacraid@microsemi.com, James.Bottomley@HansenPartnership.com,
martin.petersen@oracle.com, liyihang9@h-partners.com,
kashyap.desai@broadcom.com, sumit.saxena@broadcom.com,
shivasharan.srikanteshwara@broadcom.com,
chandrakanth.patil@broadcom.com, sathya.prakash@broadcom.com,
sreekanth.reddy@broadcom.com,
suganath-prabu.subramani@broadcom.com, ranjan.kumar@broadcom.com,
jinpu.wang@cloud.ionos.com, tglx@kernel.org, mingo@redhat.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, akpm@linux-foundation.org,
maz@kernel.org, ruanjinjie@huawei.com, bigeasy@linutronix.de,
yphbchou0911@gmail.com, wagi@kernel.org, frederic@kernel.org,
longman@redhat.com, chenridong@huawei.com, hare@suse.de,
kch@nvidia.com, ming.lei@redhat.com, tom.leiming@gmail.com,
steve@abita.co, sean@ashe.io, chjohnst@gmail.com, neelx@suse.com,
mproche@gmail.com, nick.lange@gmail.com,
marco.crivellari@suse.com, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, virtualization@lists.linux.dev,
linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org,
megaraidlinux.pdl@broadcom.com, mpi3mr-linuxdrv.pdl@broadcom.com,
MPT-FusionLinux.pdl@broadcom.com
Subject: Re: [PATCH v12 00/13] blk: honor isolcpus configuration
Date: Mon, 27 Apr 2026 12:55:20 +0200 [thread overview]
Message-ID: <e350389a5a635660267a7a13f06529da102a95d8.camel@siemens.com> (raw)
In-Reply-To: <20260422185215.100929-1-atomlin@atomlin.com>

Hi all,

On Wed, 2026-04-22 at 14:52 -0400, Aaron Tomlin wrote:
> Hi,
>
> I have decided to drive this series forward on behalf of Daniel Wagner, the
> original author. The series has been rebased on v7.0-12635-g6596a02b2078.
>
> Building upon prior iterations, this series introduces critical
> architectural refinements to the mapping and affinity spreading algorithms
> to guarantee thread safety and resilience against concurrent CPU-hotplug
> operations. Previously, the block layer relied on a shared global static
> mask (i.e., blk_hk_online_mask), which proved vulnerable to race conditions
> during rapid hotplug events. This vulnerability was highlighted by the
> kernel test robot, which encountered a NULL pointer dereference during
> rcutorture (cpuhotplug) stress testing due to concurrent mask modification.
>
> To resolve this, the architecture has been fundamentally hardened. The
> global static state has been eradicated. Instead, the IRQ affinity core now
> employs a newly introduced irq_spread_hk_filter(), which safely intersects
> the natively calculated affinity mask with the HK_TYPE_IO_QUEUE mask.
> Crucially, this is achieved using a local, hotplug-safe snapshot via
> data_race(cpu_online_mask). This approach circumvents the hotplug lock
> deadlocks previously identified by Thomas Gleixner, while explicitly
> avoiding CONFIG_CPUMASK_OFFSTACK stack bloat hazards on high-core-count
> systems. A robust fallback mechanism guarantees that should an interrupt
> vector be assigned exclusively to isolated cores, it is safely re-routed to
> the system's online housekeeping CPUs.
>
> Following rigorous testing of multiple queue maps (such as NVMe poll
> queues) alongside isolated CPUs, the tenth iteration resolved a critical
> page fault regression. The multi-queue mapping logic has been corrected to
> strictly maintain absolute hardware queue indices, ensuring faultless queue
> initialisation and preventing out-of-bounds memory access.
>
> Furthermore, following feedback from Ming Lei, the administrative
> documentation for isolcpus=io_queue has undergone a comprehensive overhaul
> to reflect this architectural reality. Previous iterations lacked the
> required technical precision regarding subsystem impact. The expanded
> kernel-parameters.txt now explicitly details that this parameter applies
> strictly to managed IRQs. It thoroughly documents how the block layer
> intercepts multiqueue allocation to match the housekeeping mask, actively
> preventing MSI-X vector exhaustion on massive topologies and forcing queue
> sharing. Most importantly, it cements the structural guarantee: while an
> application on an isolated CPU may freely submit I/O, the hardware
> completion interrupt is strictly and safely offloaded to a housekeeping
> core.
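As a concrete illustration of the documented guarantee, a boot command line combining the flag with an isolated CPU range might look as follows (the io_queue flag name comes from this series; the CPU range is illustrative):

```
isolcpus=io_queue,4-7
```

With this configuration an application pinned to CPUs 4-7 can still submit I/O, but the completion interrupts of the associated hardware queues are serviced on the housekeeping CPUs 0-3.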
>
> Please let me know your thoughts.

This topic reminds me of a discussion started by Tobias [1] some time
ago about IRQ spreading of network drivers. The problem was (and still
is) that network drivers ignore any CPU isolation when spreading out
device IRQs.

In general we have two different CPU isolation mechanisms:

- The static one, via the isolcpus= command-line parameter
- The dynamic one, via the cgroup (v2) cpuset controller

This series only takes the static "world" into account, right? Are
there any plans to honor CPU isolation configured the dynamic way?
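For comparison, the two configuration paths look roughly like this (the cgroup name and CPU range are illustrative; the dynamic steps require root and a mounted cgroup v2 hierarchy):

```sh
# Static: fixed at boot via the kernel command line.
#   isolcpus=4-7

# Dynamic: an isolated partition via the cgroup v2 cpuset controller.
mkdir /sys/fs/cgroup/isolated
echo 4-7      > /sys/fs/cgroup/isolated/cpuset.cpus
echo isolated > /sys/fs/cgroup/isolated/cpuset.cpus.partition
```

Unlike the boot parameter, the cgroup configuration can change at runtime, which is exactly what makes it hard for a one-shot IRQ spreading decision to honor.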

It has been a while since my last investigation. The last time I went
through the code, the IRQ core was completely decoupled from the
dynamic configuration via cgroups. Are there any plans to close that
gap?

Best regards,
Florian

[1] https://lore.kernel.org/all/a0cad8314124ca98d7c6763e3e08d7192598cf92.camel@siemens.com/
Thread overview: 29+ messages
2026-04-22 18:52 [PATCH v12 00/13] blk: honor isolcpus configuration Aaron Tomlin
2026-04-22 18:52 ` [PATCH v12 01/13] scsi: aacraid: use block layer helpers to calculate num of queues Aaron Tomlin
2026-05-05 18:42 ` Aaron Tomlin
2026-04-22 18:52 ` [PATCH v12 02/13] lib/group_cpus: remove dead !SMP code Aaron Tomlin
2026-04-27 15:21 ` Sebastian Andrzej Siewior
2026-04-29 23:32 ` Aaron Tomlin
2026-04-22 18:52 ` [PATCH v12 03/13] lib/group_cpus: Add group_mask_cpus_evenly() Aaron Tomlin
2026-04-22 18:52 ` [PATCH v12 04/13] genirq/affinity: Add cpumask to struct irq_affinity Aaron Tomlin
2026-05-05 20:40 ` Aaron Tomlin
2026-04-22 18:52 ` [PATCH v12 05/13] blk-mq: add blk_mq_{online|possible}_queue_affinity Aaron Tomlin
2026-04-27 15:34 ` Sebastian Andrzej Siewior
2026-04-28 12:53 ` Daniel Wagner
2026-04-29 7:15 ` Hannes Reinecke
2026-05-05 20:55 ` Aaron Tomlin
2026-04-22 18:52 ` [PATCH v12 06/13] nvme-pci: use block layer helpers to constrain queue affinity Aaron Tomlin
2026-05-05 19:47 ` Aaron Tomlin
2026-04-22 18:52 ` [PATCH v12 07/13] scsi: Use " Aaron Tomlin
2026-04-22 18:52 ` [PATCH v12 08/13] virtio: blk/scsi: use " Aaron Tomlin
2026-04-22 18:52 ` [PATCH v12 09/13] isolation: Introduce io_queue isolcpus type Aaron Tomlin
2026-04-22 18:52 ` [PATCH v12 10/13] blk-mq: use hk cpus only when isolcpus=io_queue is enabled Aaron Tomlin
2026-05-02 21:25 ` Aaron Tomlin
2026-04-22 18:52 ` [PATCH v12 11/13] blk-mq: prevent offlining hk CPUs with associated online isolated CPUs Aaron Tomlin
2026-04-22 18:52 ` [PATCH v12 12/13] genirq/affinity: Restrict managed IRQ affinity to housekeeping CPUs Aaron Tomlin
2026-04-22 18:52 ` [PATCH v12 13/13] docs: add io_queue flag to isolcpus Aaron Tomlin
2026-04-27 10:55 ` Florian Bezdeka [this message]
2026-04-28 13:08 ` [PATCH v12 00/13] blk: honor isolcpus configuration Daniel Wagner
2026-04-29 21:01 ` Florian Bezdeka
2026-04-30 12:09 ` Daniel Wagner
2026-04-30 15:45 ` Jan Kiszka