From: Aaron Tomlin <atomlin@atomlin.com>
To: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
mst@redhat.com
Cc: atomlin@atomlin.com, aacraid@microsemi.com,
James.Bottomley@HansenPartnership.com,
martin.petersen@oracle.com, liyihang9@h-partners.com,
kashyap.desai@broadcom.com, sumit.saxena@broadcom.com,
shivasharan.srikanteshwara@broadcom.com,
chandrakanth.patil@broadcom.com, sathya.prakash@broadcom.com,
sreekanth.reddy@broadcom.com,
suganath-prabu.subramani@broadcom.com, ranjan.kumar@broadcom.com,
jinpu.wang@cloud.ionos.com, tglx@kernel.org, mingo@redhat.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, akpm@linux-foundation.org,
maz@kernel.org, ruanjinjie@huawei.com, bigeasy@linutronix.de,
yphbchou0911@gmail.com, wagi@kernel.org, frederic@kernel.org,
longman@redhat.com, chenridong@huawei.com, hare@suse.de,
kch@nvidia.com, ming.lei@redhat.com, tom.leiming@gmail.com,
steve@abita.co, sean@ashe.io, chjohnst@gmail.com, neelx@suse.com,
mproche@gmail.com, nick.lange@gmail.com,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux.dev, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, megaraidlinux.pdl@broadcom.com,
mpi3mr-linuxdrv.pdl@broadcom.com,
MPT-FusionLinux.pdl@broadcom.com
Subject: [PATCH v11 00/13] blk: honor isolcpus configuration
Date: Thu, 16 Apr 2026 15:29:29 -0400
Message-ID: <20260416192942.1243421-1-atomlin@atomlin.com>
Hi,
I have decided to drive this series forward on behalf of Daniel Wagner, the
original author. This iteration addresses the outstanding architectural and
concurrency concerns raised during the previous review cycle, and the series
has been rebased on v7.0-rc5-509-g545475aebc2a.
Building upon prior iterations, this series introduces critical
architectural refinements to the mapping and affinity spreading algorithms
to guarantee thread safety and resilience against concurrent CPU-hotplug
operations. Previously, the block layer relied on a shared global static
mask (i.e., blk_hk_online_mask), which proved vulnerable to race conditions
during rapid hotplug events. This vulnerability was recently highlighted by
the kernel test robot, which encountered a NULL pointer dereference during
rcutorture (cpuhotplug) stress testing due to concurrent mask modification.
To resolve this, the architecture has been fundamentally hardened. The
global static state has been eradicated. Instead, the IRQ affinity core now
employs a newly introduced irq_spread_hk_filter(), which safely intersects
the natively calculated affinity mask with the HK_TYPE_IO_QUEUE mask.
Crucially, this is achieved using a local, hotplug-safe snapshot via
data_race(cpu_online_mask). This approach circumvents the hotplug lock
deadlocks previously identified by Thomas Gleixner, while explicitly
avoiding CONFIG_CPUMASK_OFFSTACK stack bloat hazards on high-core-count
systems. A robust fallback mechanism guarantees that should an interrupt
vector be assigned exclusively to isolated cores, it is safely re-routed to
the system's online housekeeping CPUs.
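For illustration only, the filter-and-fallback behaviour described above can be modelled in userspace C. This is a minimal sketch under stated assumptions, not the code in this series: hk_filter(), cpu_bits_t and the 64-CPU bitmask are hypothetical stand-ins for irq_spread_hk_filter(), struct cpumask and the HK_TYPE_IO_QUEUE mask, and the caller is assumed to pass in its own snapshot of the online CPUs.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU; a stand-in for struct cpumask, limited to 64 CPUs. */
typedef uint64_t cpu_bits_t;

/*
 * Hypothetical model of the housekeeping filter.
 * vec:    affinity calculated for one interrupt vector
 * hk:     housekeeping CPUs (models the HK_TYPE_IO_QUEUE mask)
 * online: snapshot of the online CPUs taken by the caller
 */
static cpu_bits_t hk_filter(cpu_bits_t vec, cpu_bits_t hk, cpu_bits_t online)
{
	cpu_bits_t filtered = vec & hk;

	/* Assumption of this model: some housekeeping CPU is always online. */
	assert((hk & online) != 0);

	/* Keep the housekeeping part of the vector's mask if there is one. */
	if (filtered)
		return filtered;

	/* The vector covered isolated CPUs only: fall back. */
	return hk & online;
}
```

With housekeeping CPUs 0-3 (0x0F) and all eight CPUs online, a vector spread over CPUs 2-5 (0x3C) is narrowed to CPUs 2-3 (0x0C), while a vector spread over CPUs 4-7 (0xF0) falls back to 0x0F.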
Following rigorous testing of multiple queue maps (such as NVMe poll
queues) alongside isolated CPUs, the tenth iteration resolved a critical
page fault regression. The multi-queue mapping logic has been corrected to
strictly maintain absolute hardware queue indices, ensuring faultless queue
initialisation and preventing out-of-bounds memory access.
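To make "absolute hardware queue indices" concrete, here is a minimal userspace sketch. The struct is a simplified model whose field names mirror struct blk_mq_queue_map; map_cpus() and its round-robin spreading are hypothetical and are not the algorithm used in this series. The point it shows is that every entry in a map must include that map's queue_offset, so a second map (e.g. poll queues) never yields an index relative to its own start.

```c
#include <assert.h>

/* Simplified model; field names mirror struct blk_mq_queue_map. */
struct queue_map {
	unsigned int *mq_map;      /* per-CPU hardware queue index */
	unsigned int nr_queues;    /* queues owned by this map */
	unsigned int queue_offset; /* first hardware queue of this map */
};

/* Hypothetical helper: spread CPUs round-robin over the map's queues. */
static void map_cpus(struct queue_map *qmap, unsigned int nr_cpus)
{
	unsigned int cpu;

	assert(qmap->nr_queues > 0);

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		/* Store the absolute index, not the index within the map. */
		qmap->mq_map[cpu] = qmap->queue_offset + cpu % qmap->nr_queues;
	}
}
```

For a map with nr_queues = 2 and queue_offset = 6, four CPUs are mapped to hardware queues 6, 7, 6, 7 rather than 0, 1, 0, 1.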
Furthermore, following feedback from Ming Lei, the administrative
documentation for isolcpus=io_queue has undergone a comprehensive overhaul
to reflect this architectural reality. Previous iterations lacked the
required technical precision regarding subsystem impact. The expanded
kernel-parameters.txt now explicitly details that this parameter applies
strictly to managed IRQs. It documents how the block layer limits
multiqueue allocation to the housekeeping mask, which prevents MSI-X
vector exhaustion on large topologies and forces queues to be shared.
Most importantly, it states the structural guarantee: while an
application on an isolated CPU may freely submit I/O, the hardware
completion interrupt is strictly and safely offloaded to a housekeeping
core.
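As a rough illustration of matching the queue count to the housekeeping mask, the sketch below caps the number of queues at the number of online housekeeping CPUs. It is a userspace model under stated assumptions: hk_num_queues() and mask_weight() are hypothetical names (mask_weight() plays the role of cpumask_weight()), CPUs are modelled as bits in a 64-bit word, and the exact policy in this series may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits, one bit per CPU; stands in for cpumask_weight(). */
static unsigned int mask_weight(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Hypothetical helper: never allocate more queues than there are
 * online housekeeping CPUs to service their completion interrupts.
 */
static unsigned int hk_num_queues(unsigned int driver_max,
				  uint64_t hk, uint64_t online)
{
	unsigned int hk_cpus = mask_weight(hk & online);

	assert(driver_max > 0 && hk_cpus > 0);

	return driver_max < hk_cpus ? driver_max : hk_cpus;
}
```

With housekeeping CPUs 0-3 on an eight-CPU system, a driver that could use 64 queues is limited to 4, so the isolated CPUs 4-7 have to share those queues.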
Please let me know your thoughts.
Aaron Tomlin (1):
genirq/affinity: Restrict managed IRQ affinity to housekeeping CPUs
Daniel Wagner (12):
scsi: aacraid: use block layer helpers to calculate num of queues
lib/group_cpus: remove dead !SMP code
lib/group_cpus: Add group_mask_cpus_evenly()
genirq/affinity: Add cpumask to struct irq_affinity
blk-mq: add blk_mq_{online|possible}_queue_affinity
nvme-pci: use block layer helpers to constrain queue affinity
scsi: Use block layer helpers to constrain queue affinity
virtio: blk/scsi: use block layer helpers to constrain queue affinity
isolation: Introduce io_queue isolcpus type
blk-mq: use hk cpus only when isolcpus=io_queue is enabled
blk-mq: prevent offlining hk CPUs with associated online isolated CPUs
docs: add io_queue flag to isolcpus
.../admin-guide/kernel-parameters.txt | 30 ++-
block/blk-mq-cpumap.c | 192 ++++++++++++++++--
block/blk-mq.c | 42 ++++
drivers/block/virtio_blk.c | 4 +-
drivers/nvme/host/pci.c | 1 +
drivers/scsi/aacraid/comminit.c | 3 +-
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 1 +
drivers/scsi/megaraid/megaraid_sas_base.c | 5 +-
drivers/scsi/mpi3mr/mpi3mr_fw.c | 6 +-
drivers/scsi/mpt3sas/mpt3sas_base.c | 5 +-
drivers/scsi/pm8001/pm8001_init.c | 1 +
drivers/scsi/virtio_scsi.c | 5 +-
include/linux/blk-mq.h | 2 +
include/linux/group_cpus.h | 3 +
include/linux/interrupt.h | 16 +-
include/linux/sched/isolation.h | 1 +
kernel/irq/affinity.c | 38 +++-
kernel/sched/isolation.c | 7 +
lib/group_cpus.c | 65 ++++--
19 files changed, 379 insertions(+), 48 deletions(-)
base-commit: 3cd8b194bf3428dfa53120fee47e827a7c495815
--
2.51.0