From: Aaron Tomlin <atomlin@atomlin.com>
To: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
mst@redhat.com
Cc: atomlin@atomlin.com, aacraid@microsemi.com,
James.Bottomley@HansenPartnership.com,
martin.petersen@oracle.com, liyihang9@h-partners.com,
kashyap.desai@broadcom.com, sumit.saxena@broadcom.com,
shivasharan.srikanteshwara@broadcom.com,
chandrakanth.patil@broadcom.com, sathya.prakash@broadcom.com,
sreekanth.reddy@broadcom.com,
suganath-prabu.subramani@broadcom.com, ranjan.kumar@broadcom.com,
jinpu.wang@cloud.ionos.com, tglx@kernel.org, mingo@redhat.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, akpm@linux-foundation.org,
maz@kernel.org, ruanjinjie@huawei.com, bigeasy@linutronix.de,
yphbchou0911@gmail.com, wagi@kernel.org, frederic@kernel.org,
longman@redhat.com, chenridong@huawei.com, hare@suse.de,
kch@nvidia.com, ming.lei@redhat.com, tom.leiming@gmail.com,
steve@abita.co, sean@ashe.io, chjohnst@gmail.com, neelx@suse.com,
mproche@gmail.com, nick.lange@gmail.com,
marco.crivellari@suse.com, rishil1999@outlook.com,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v13 5/8] blk-mq: use hk cpus only when isolcpus=io_queue is enabled
Date: Tue, 12 May 2026 20:55:06 -0400 [thread overview]
Message-ID: <20260513005509.135966-6-atomlin@atomlin.com> (raw)
In-Reply-To: <20260513005509.135966-1-atomlin@atomlin.com>
From: Daniel Wagner <wagi@kernel.org>
Extend the capabilities of the generic CPU to hardware queue (hctx)
mapping code so that it maps housekeeping CPUs and isolated CPUs to the
hardware queues evenly.
An hctx is only operational when at least one online housekeeping CPU
is assigned to it (aka active_hctx). Thus, verify in the final mapping
that there is no hctx which has only offline housekeeping CPUs and
online isolated CPUs.
Example mapping result:
16 online CPUs
isolcpus=io_queue,2-3,6-7,12-13
Queue mapping:
hctx0: default 0 2
hctx1: default 1 3
hctx2: default 4 6
hctx3: default 5 7
hctx4: default 8 12
hctx5: default 9 13
hctx6: default 10
hctx7: default 11
hctx8: default 14
hctx9: default 15
IRQ mapping:
irq 42 affinity 0 effective 0 nvme0q0
irq 43 affinity 0 effective 0 nvme0q1
irq 44 affinity 1 effective 1 nvme0q2
irq 45 affinity 4 effective 4 nvme0q3
irq 46 affinity 5 effective 5 nvme0q4
irq 47 affinity 8 effective 8 nvme0q5
irq 48 affinity 9 effective 9 nvme0q6
irq 49 affinity 10 effective 10 nvme0q7
irq 50 affinity 11 effective 11 nvme0q8
irq 51 affinity 14 effective 14 nvme0q9
irq 52 affinity 15 effective 15 nvme0q10
A corner case is when the number of online CPUs and present CPUs
differ and the driver asks for fewer queues than online CPUs, e.g.
8 online CPUs, 16 possible CPUs
isolcpus=io_queue,2-3,6-7,12-13
virtio_blk.num_request_queues=2
Queue mapping:
hctx0: default 0 1 2 3 4 5 6 7 8 12 13
hctx1: default 9 10 11 14 15
IRQ mapping:
irq 27 affinity 0 effective 0 virtio0-config
irq 28 affinity 0-1,4-5,8 effective 5 virtio0-req.0
irq 29 affinity 9-11,14-15 effective 0 virtio0-req.1
Noteworthy is that for the normal/default configuration (no isolcpus)
the mapping will change on systems with non-hyperthreading CPUs. The
main assignment loop relies entirely on group_mask_cpus_evenly() to
do the right thing. The old code would distribute the CPUs linearly
over the hardware contexts:
queue mapping for /dev/nvme0n1
hctx0: default 0 8
hctx1: default 1 9
hctx2: default 2 10
hctx3: default 3 11
hctx4: default 4 12
hctx5: default 5 13
hctx6: default 6 14
hctx7: default 7 15
The new code assigns each hardware context the mask generated by
group_mask_cpus_evenly():
queue mapping for /dev/nvme0n1
hctx0: default 0 1
hctx1: default 2 3
hctx2: default 4 5
hctx3: default 6 7
hctx4: default 8 9
hctx5: default 10 11
hctx6: default 12 13
hctx7: default 14 15
In case of hyperthreading CPUs, the resulting map stays the same.
Signed-off-by: Daniel Wagner <wagi@kernel.org>
[atomlin:
- Updated blk_mq_validate() to use test_bit() for the new bitmap
- Replaced __free cleanups with traditional goto unwinding to align
with subsystem styling
- Updated blk_mq_map_fallback() to use qmap->queue_offset ensuring
secondary maps do not incorrectly route to the primary default map
- Added a bitmap_empty() check to prevent out-of-bounds CPU routing
when all mapped CPUs are offline
- Migrated active_hctx to a dynamically sized bitmap to fix an
out-of-bounds write when hardware queues exceed the system CPU
count
- Fixed absolute vs. relative hardware queue index mix-up in
blk_mq_map_queues() and validation checks
- Fixed typographical errors
- Reduced stack frame size of blk_mq_num_queues()
- Resolved a TOCTOU race against CPU hotplug events by snapshotting
cpu_online_mask to ensure mapping and validation phases agree
- Corrected a loop overwrite bug in blk_mq_map_queues() by iterating
directly over masks to prevent orphaned queues from being activated
- Restored topology-aware multi-queue fallback in
blk_mq_map_hw_queues() for devices lacking IRQ affinity hints
- Hardened isolation logic in blk_mq_map_hw_queues() to require online
housekeeping CPUs before marking a hardware queue as active
- Optimised active queue evaluations by short-circuiting redundant
checks once a valid CPU is found]
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
block/blk-mq-cpumap.c | 224 ++++++++++++++++++++++++++++++++++++++----
1 file changed, 207 insertions(+), 17 deletions(-)
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 705da074ad6c..f953714d190c 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -22,7 +22,11 @@ static unsigned int blk_mq_num_queues(const struct cpumask *mask,
{
unsigned int num;
- num = cpumask_weight(mask);
+ if (housekeeping_enabled(HK_TYPE_IO_QUEUE))
+ num = cpumask_weight_and(mask, housekeeping_cpumask(HK_TYPE_IO_QUEUE));
+ else
+ num = cpumask_weight(mask);
+
return min_not_zero(num, max_queues);
}
@@ -33,7 +37,8 @@ static unsigned int blk_mq_num_queues(const struct cpumask *mask,
* ignored.
*
* Calculates the number of queues to be used for a multiqueue
- * device based on the number of possible CPUs.
+ * device based on the number of possible CPUs. This helper
+ * takes isolcpus settings into account.
*/
unsigned int blk_mq_num_possible_queues(unsigned int max_queues)
{
@@ -48,7 +53,8 @@ EXPORT_SYMBOL_GPL(blk_mq_num_possible_queues);
* ignored.
*
* Calculates the number of queues to be used for a multiqueue
- * device based on the number of online CPUs.
+ * device based on the number of online CPUs. This helper
+ * takes isolcpus settings into account.
*/
unsigned int blk_mq_num_online_queues(unsigned int max_queues)
{
@@ -56,23 +62,139 @@ unsigned int blk_mq_num_online_queues(unsigned int max_queues)
}
EXPORT_SYMBOL_GPL(blk_mq_num_online_queues);
+static bool blk_mq_validate(struct blk_mq_queue_map *qmap,
+ const unsigned long *active_hctx,
+ const struct cpumask *online_mask)
+{
+ /*
+ * Verify if the mapping is usable when housekeeping
+ * configuration is enabled
+ */
+ for (int queue = 0; queue < qmap->nr_queues; queue++) {
+ int cpu;
+
+ if (test_bit(queue, active_hctx)) {
+ /*
+ * This hctx has at least one online CPU thus it
+ * is able to serve any assigned isolated CPU.
+ */
+ continue;
+ }
+
+ /*
+ * There is no housekeeping online CPU for this hctx, all
+ * good as long as all non-housekeeping CPUs are also
+ * offline.
+ */
+ for_each_cpu(cpu, online_mask) {
+ if (qmap->mq_map[cpu] != qmap->queue_offset + queue)
+ continue;
+
+ pr_warn("Unable to create a usable CPU-to-queue mapping with the given constraints\n");
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static void blk_mq_map_fallback(struct blk_mq_queue_map *qmap)
+{
+ unsigned int cpu;
+
+ /*
+ * Map all CPUs to the first hctx of this specific map to ensure
+ * at least one online CPU is serving it, respecting the map's
+ * boundaries so secondary maps do not route into the default map.
+ */
+ for_each_possible_cpu(cpu)
+ qmap->mq_map[cpu] = qmap->queue_offset;
+}
+
void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
{
- const struct cpumask *masks;
+ struct cpumask *masks;
+ const struct cpumask *constraint;
unsigned int queue, cpu, nr_masks;
+ unsigned long *active_hctx;
+ cpumask_var_t online_mask;
- masks = group_cpus_evenly(qmap->nr_queues, &nr_masks);
- if (!masks) {
- for_each_possible_cpu(cpu)
- qmap->mq_map[cpu] = qmap->queue_offset;
- return;
- }
+ active_hctx = bitmap_zalloc(qmap->nr_queues, GFP_KERNEL);
+ if (!active_hctx)
+ goto fallback;
- for (queue = 0; queue < qmap->nr_queues; queue++) {
- for_each_cpu(cpu, &masks[queue % nr_masks])
+ if (!alloc_cpumask_var(&online_mask, GFP_KERNEL))
+ goto free_fallback_hctx;
+
+ /*
+ * Snapshot online CPUs to prevent TOCTOU races between the
+ * mapping phase and the validation phase.
+ */
+ cpumask_copy(online_mask, cpu_online_mask);
+
+ if (housekeeping_enabled(HK_TYPE_IO_QUEUE))
+ constraint = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
+ else
+ constraint = cpu_possible_mask;
+
+ /* Map CPUs to the hardware contexts (hctx) */
+ masks = group_mask_cpus_evenly(qmap->nr_queues, constraint, &nr_masks);
+ if (!masks)
+ goto free_fallback;
+
+ /*
+ * Iterate directly over the generated CPU masks.
+ * Calculate the final, highest hardware queue index that maps to this
+ * mask. This skips all intermediate overwrites and safely evaluates
+ * active_hctx only for queues that survive the mapping.
+ */
+ for (unsigned int idx = 0; idx < nr_masks; idx++) {
+ bool active = false;
+ queue = qmap->nr_queues - 1 -
+ ((qmap->nr_queues - 1 - idx) % nr_masks);
+
+ for_each_cpu(cpu, &masks[idx]) {
qmap->mq_map[cpu] = qmap->queue_offset + queue;
+
+ if (!active && cpumask_test_cpu(cpu, online_mask)) {
+ __set_bit(queue, active_hctx);
+ active = true;
+ }
+ }
+ }
+
+ /*
+ * If all CPUs in the generated masks are offline, the active_hctx
+ * bitmap will be empty. Attempting to route unassigned CPUs to an
+ * empty bitmap will map them out-of-bounds. Fall back instead.
+ */
+ if (bitmap_empty(active_hctx, qmap->nr_queues))
+ goto free_fallback;
+
+ /* Map any unassigned CPU evenly to the hardware contexts (hctx) */
+ queue = find_first_bit(active_hctx, qmap->nr_queues);
+ for_each_cpu_andnot(cpu, cpu_possible_mask, constraint) {
+ qmap->mq_map[cpu] = qmap->queue_offset + queue;
+ queue = find_next_bit_wrap(active_hctx, qmap->nr_queues, queue + 1);
}
+
+ if (!blk_mq_validate(qmap, active_hctx, online_mask))
+ goto free_fallback;
+
kfree(masks);
+ free_cpumask_var(online_mask);
+ bitmap_free(active_hctx);
+
+ return;
+
+free_fallback:
+ kfree(masks);
+ free_cpumask_var(online_mask);
+free_fallback_hctx:
+ bitmap_free(active_hctx);
+
+fallback:
+ blk_mq_map_fallback(qmap);
}
EXPORT_SYMBOL_GPL(blk_mq_map_queues);
@@ -109,24 +231,92 @@ void blk_mq_map_hw_queues(struct blk_mq_queue_map *qmap,
struct device *dev, unsigned int offset)
{
- const struct cpumask *mask;
+ cpumask_var_t mask, online_mask;
+ const struct cpumask *constraint;
+ unsigned long *active_hctx;
unsigned int queue, cpu;
if (!dev->bus->irq_get_affinity)
+ goto map_software;
+
+ active_hctx = bitmap_zalloc(qmap->nr_queues, GFP_KERNEL);
+ if (!active_hctx)
+ goto fallback;
+
+ if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
+ bitmap_free(active_hctx);
goto fallback;
+ }
+
+ if (!alloc_cpumask_var(&online_mask, GFP_KERNEL))
+ goto free_fallback_mask;
+ if (housekeeping_enabled(HK_TYPE_IO_QUEUE))
+ constraint = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
+ else
+ constraint = cpu_possible_mask;
+
+ /*
+ * Snapshot online CPUs to prevent TOCTOU races between the
+ * mapping phase and the validation phase.
+ */
+ cpumask_copy(online_mask, cpu_online_mask);
+
+ /* Map CPUs to the hardware contexts (hctx) */
for (queue = 0; queue < qmap->nr_queues; queue++) {
- mask = dev->bus->irq_get_affinity(dev, queue + offset);
- if (!mask)
- goto fallback;
+ const struct cpumask *affinity_mask;
+ bool active = false;
+
+ affinity_mask = dev->bus->irq_get_affinity(dev, offset + queue);
+ if (!affinity_mask)
+ goto free_fallback;
- for_each_cpu(cpu, mask)
+ for_each_cpu(cpu, affinity_mask) {
qmap->mq_map[cpu] = qmap->queue_offset + queue;
+
+ cpumask_set_cpu(cpu, mask);
+ if (!active && cpumask_test_cpu(cpu, online_mask) &&
+ cpumask_test_cpu(cpu, constraint)) {
+ __set_bit(queue, active_hctx);
+ active = true;
+ }
+ }
+ }
+
+ /*
+ * If all CPUs assigned to this map are offline, the bitmap will
+ * be empty. Fall back instead of routing out of bounds.
+ */
+ if (bitmap_empty(active_hctx, qmap->nr_queues))
+ goto free_fallback;
+
+ /* Map any unassigned CPU evenly to the hardware contexts (hctx) */
+ queue = find_first_bit(active_hctx, qmap->nr_queues);
+ for_each_cpu_andnot(cpu, cpu_possible_mask, mask) {
+ qmap->mq_map[cpu] = qmap->queue_offset + queue;
+ queue = find_next_bit_wrap(active_hctx, qmap->nr_queues, queue + 1);
}
+ if (!blk_mq_validate(qmap, active_hctx, online_mask))
+ goto free_fallback;
+
+ bitmap_free(active_hctx);
+ free_cpumask_var(mask);
+ free_cpumask_var(online_mask);
+
return;
+free_fallback:
+ free_cpumask_var(online_mask);
+free_fallback_mask:
+ bitmap_free(active_hctx);
+ free_cpumask_var(mask);
+
fallback:
+ blk_mq_map_fallback(qmap);
+ return;
+
+map_software:
blk_mq_map_queues(qmap);
}
EXPORT_SYMBOL_GPL(blk_mq_map_hw_queues);
--
2.51.0
Thread overview: 11+ messages
2026-05-13 0:55 [PATCH v13 0/8] blk: honor isolcpus configuration Aaron Tomlin
2026-05-13 0:55 ` [PATCH v13 1/8] scsi: aacraid: use block layer helpers to calculate num of queues Aaron Tomlin
2026-05-13 0:55 ` [PATCH v13 2/8] lib/group_cpus: remove dead !SMP code Aaron Tomlin
2026-05-13 0:55 ` [PATCH v13 3/8] lib/group_cpus: Add group_mask_cpus_evenly() Aaron Tomlin
2026-05-13 0:55 ` [PATCH v13 4/8] isolation: Introduce io_queue isolcpus type Aaron Tomlin
2026-05-13 0:55 ` Aaron Tomlin [this message]
[not found] ` <3af2cd18-1221-4ff6-aa7f-6dab74460eab@nitrogen.local>
2026-05-13 23:30 ` [PATCH v13 5/8] blk-mq: use hk cpus only when isolcpus=io_queue is enabled Aaron Tomlin
2026-05-14 10:42 ` Daniel Wagner
2026-05-13 0:55 ` [PATCH v13 6/8] blk-mq: prevent offlining hk CPUs with associated online isolated CPUs Aaron Tomlin
2026-05-13 0:55 ` [PATCH v13 7/8] genirq/affinity: Restrict managed IRQ affinity to housekeeping CPUs Aaron Tomlin
2026-05-13 0:55 ` [PATCH v13 8/8] docs: add io_queue flag to isolcpus Aaron Tomlin