* [PATCH v10 01/13] scsi: aacraid: use block layer helpers to calculate num of queues
2026-04-01 22:22 [PATCH v10 00/13] blk: honor isolcpus configuration Aaron Tomlin
@ 2026-04-01 22:23 ` Aaron Tomlin
2026-04-03 1:43 ` Martin K. Petersen
2026-04-01 22:23 ` [PATCH v10 02/13] lib/group_cpus: remove dead !SMP code Aaron Tomlin
` (11 subsequent siblings)
12 siblings, 1 reply; 22+ messages in thread
From: Aaron Tomlin @ 2026-04-01 22:23 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, mst
Cc: atomlin, aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, longman, chenridong, hare,
kch, ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
From: Daniel Wagner <wagi@kernel.org>
The calculation of the upper limit for queues does not depend solely on
the number of online CPUs; for example, the isolcpus kernel
command-line option must also be considered.
To account for this, the block layer provides a helper function to
retrieve the maximum number of queues. Use it to set an appropriate
upper queue number limit.
Fixes: 94970cfb5f10 ("scsi: use block layer helpers to calculate num of queues")
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
drivers/scsi/aacraid/comminit.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
index 9bd3f5b868bc..ec165b57182d 100644
--- a/drivers/scsi/aacraid/comminit.c
+++ b/drivers/scsi/aacraid/comminit.c
@@ -469,8 +469,7 @@ void aac_define_int_mode(struct aac_dev *dev)
}
/* Don't bother allocating more MSI-X vectors than cpus */
- msi_count = min(dev->max_msix,
- (unsigned int)num_online_cpus());
+ msi_count = blk_mq_num_online_queues(dev->max_msix);
dev->max_msix = msi_count;
--
2.51.0
* Re: [PATCH v10 01/13] scsi: aacraid: use block layer helpers to calculate num of queues
2026-04-01 22:23 ` [PATCH v10 01/13] scsi: aacraid: use block layer helpers to calculate num of queues Aaron Tomlin
@ 2026-04-03 1:43 ` Martin K. Petersen
0 siblings, 0 replies; 22+ messages in thread
From: Martin K. Petersen @ 2026-04-03 1:43 UTC (permalink / raw)
To: Aaron Tomlin
Cc: axboe, kbusch, hch, sagi, mst, aacraid, James.Bottomley,
martin.petersen, liyihang9, kashyap.desai, sumit.saxena,
shivasharan.srikanteshwara, chandrakanth.patil, sathya.prakash,
sreekanth.reddy, suganath-prabu.subramani, ranjan.kumar,
jinpu.wang, tglx, mingo, peterz, juri.lelli, vincent.guittot,
akpm, maz, ruanjinjie, bigeasy, yphbchou0911, wagi, frederic,
longman, chenridong, hare, kch, ming.lei, steve, sean, chjohnst,
neelx, mproche, linux-block, linux-kernel, virtualization,
linux-nvme, linux-scsi, megaraidlinux.pdl, mpi3mr-linuxdrv.pdl,
MPT-FusionLinux.pdl
Aaron,
> The calculation of the upper limit for queues does not depend solely on
> the number of online CPUs; for example, the isolcpus kernel
> command-line option must also be considered.
>
> To account for this, the block layer provides a helper function to
> retrieve the maximum number of queues. Use it to set an appropriate
> upper queue number limit.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen
* [PATCH v10 02/13] lib/group_cpus: remove dead !SMP code
2026-04-01 22:22 [PATCH v10 00/13] blk: honor isolcpus configuration Aaron Tomlin
2026-04-01 22:23 ` [PATCH v10 01/13] scsi: aacraid: use block layer helpers to calculate num of queues Aaron Tomlin
@ 2026-04-01 22:23 ` Aaron Tomlin
2026-04-03 1:45 ` Martin K. Petersen
2026-04-01 22:23 ` [PATCH v10 03/13] lib/group_cpus: Add group_mask_cpus_evenly() Aaron Tomlin
` (10 subsequent siblings)
12 siblings, 1 reply; 22+ messages in thread
From: Aaron Tomlin @ 2026-04-01 22:23 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, mst
Cc: atomlin, aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, longman, chenridong, hare,
kch, ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
From: Daniel Wagner <wagi@kernel.org>
The support for the !SMP configuration has been removed from the core by
commit cac5cefbade9 ("sched/smp: Make SMP unconditional").
While one can technically still compile a uniprocessor kernel, the core
scheduler now mandates SMP unconditionally, rendering this particular
!SMP fallback handling redundant. Therefore, remove the #ifdef CONFIG_SMP
guards and the fallback logic.
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
[atomlin: Updated commit message to clarify !SMP removal context]
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
lib/group_cpus.c | 20 --------------------
1 file changed, 20 deletions(-)
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index e6e18d7a49bb..b8d54398f88a 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -9,8 +9,6 @@
#include <linux/sort.h>
#include <linux/group_cpus.h>
-#ifdef CONFIG_SMP
-
static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
unsigned int cpus_per_grp)
{
@@ -564,22 +562,4 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps, unsigned int *nummasks)
*nummasks = min(nr_present + nr_others, numgrps);
return masks;
}
-#else /* CONFIG_SMP */
-struct cpumask *group_cpus_evenly(unsigned int numgrps, unsigned int *nummasks)
-{
- struct cpumask *masks;
-
- if (numgrps == 0)
- return NULL;
-
- masks = kzalloc_objs(*masks, numgrps);
- if (!masks)
- return NULL;
-
- /* assign all CPUs(cpu 0) to the 1st group only */
- cpumask_copy(&masks[0], cpu_possible_mask);
- *nummasks = 1;
- return masks;
-}
-#endif /* CONFIG_SMP */
EXPORT_SYMBOL_GPL(group_cpus_evenly);
--
2.51.0
* Re: [PATCH v10 02/13] lib/group_cpus: remove dead !SMP code
2026-04-01 22:23 ` [PATCH v10 02/13] lib/group_cpus: remove dead !SMP code Aaron Tomlin
@ 2026-04-03 1:45 ` Martin K. Petersen
0 siblings, 0 replies; 22+ messages in thread
From: Martin K. Petersen @ 2026-04-03 1:45 UTC (permalink / raw)
To: Aaron Tomlin
Cc: axboe, kbusch, hch, sagi, mst, aacraid, James.Bottomley,
martin.petersen, liyihang9, kashyap.desai, sumit.saxena,
shivasharan.srikanteshwara, chandrakanth.patil, sathya.prakash,
sreekanth.reddy, suganath-prabu.subramani, ranjan.kumar,
jinpu.wang, tglx, mingo, peterz, juri.lelli, vincent.guittot,
akpm, maz, ruanjinjie, bigeasy, yphbchou0911, wagi, frederic,
longman, chenridong, hare, kch, ming.lei, steve, sean, chjohnst,
neelx, mproche, linux-block, linux-kernel, virtualization,
linux-nvme, linux-scsi, megaraidlinux.pdl, mpi3mr-linuxdrv.pdl,
MPT-FusionLinux.pdl
Aaron,
> The support for the !SMP configuration has been removed from the core by
> commit cac5cefbade9 ("sched/smp: Make SMP unconditional").
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen
* [PATCH v10 03/13] lib/group_cpus: Add group_mask_cpus_evenly()
2026-04-01 22:22 [PATCH v10 00/13] blk: honor isolcpus configuration Aaron Tomlin
2026-04-01 22:23 ` [PATCH v10 01/13] scsi: aacraid: use block layer helpers to calculate num of queues Aaron Tomlin
2026-04-01 22:23 ` [PATCH v10 02/13] lib/group_cpus: remove dead !SMP code Aaron Tomlin
@ 2026-04-01 22:23 ` Aaron Tomlin
2026-04-01 22:23 ` [PATCH v10 04/13] genirq/affinity: Add cpumask to struct irq_affinity Aaron Tomlin
` (9 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Aaron Tomlin @ 2026-04-01 22:23 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, mst
Cc: atomlin, aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, longman, chenridong, hare,
kch, ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
From: Daniel Wagner <wagi@kernel.org>
group_mask_cpus_evenly() allows the caller to pass in a CPU mask that
should be evenly distributed. This new function is a more generic
version of the existing group_cpus_evenly(), which always distributes
all present CPUs into groups.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
include/linux/group_cpus.h | 3 ++
lib/group_cpus.c | 59 ++++++++++++++++++++++++++++++++++++++
2 files changed, 62 insertions(+)
diff --git a/include/linux/group_cpus.h b/include/linux/group_cpus.h
index 9d4e5ab6c314..defab4123a82 100644
--- a/include/linux/group_cpus.h
+++ b/include/linux/group_cpus.h
@@ -10,5 +10,8 @@
#include <linux/cpu.h>
struct cpumask *group_cpus_evenly(unsigned int numgrps, unsigned int *nummasks);
+struct cpumask *group_mask_cpus_evenly(unsigned int numgrps,
+ const struct cpumask *mask,
+ unsigned int *nummasks);
#endif
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index b8d54398f88a..d3e9a20250ff 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -8,6 +8,7 @@
#include <linux/cpu.h>
#include <linux/sort.h>
#include <linux/group_cpus.h>
+#include <linux/sched/isolation.h>
static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
unsigned int cpus_per_grp)
@@ -563,3 +564,61 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps, unsigned int *nummasks)
return masks;
}
EXPORT_SYMBOL_GPL(group_cpus_evenly);
+
+/**
+ * group_mask_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
+ * @numgrps: number of cpumasks to create
+ * @mask: CPUs to consider for the grouping
+ * @nummasks: number of initialized cpumasks
+ *
+ * Return: cpumask array if successful, NULL otherwise. Only the CPUs
+ * marked in the mask are considered for the grouping, and each element
+ * of the array holds the CPUs assigned to that group. nummasks contains
+ * the number of initialized masks, which can be less than numgrps.
+ *
+ * Try to put close CPUs from viewpoint of CPU and NUMA locality into
+ * same group, and run two-stage grouping:
+ * 1) allocate present CPUs on these groups evenly first
+ * 2) allocate other possible CPUs on these groups evenly
+ *
+ * The resulting grouping guarantees that all CPUs in the mask are
+ * covered and that no CPU is assigned to multiple groups.
+ */
+struct cpumask *group_mask_cpus_evenly(unsigned int numgrps,
+ const struct cpumask *mask,
+ unsigned int *nummasks)
+{
+ cpumask_var_t *node_to_cpumask;
+ cpumask_var_t nmsk;
+ int ret = -ENOMEM;
+ struct cpumask *masks = NULL;
+
+ if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
+ return NULL;
+
+ node_to_cpumask = alloc_node_to_cpumask();
+ if (!node_to_cpumask)
+ goto fail_nmsk;
+
+ masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
+ if (!masks)
+ goto fail_node_to_cpumask;
+
+ build_node_to_cpumask(node_to_cpumask);
+
+ ret = __group_cpus_evenly(0, numgrps, node_to_cpumask, mask, nmsk,
+ masks);
+
+fail_node_to_cpumask:
+ free_node_to_cpumask(node_to_cpumask);
+
+fail_nmsk:
+ free_cpumask_var(nmsk);
+ if (ret < 0) {
+ kfree(masks);
+ return NULL;
+ }
+ *nummasks = ret;
+ return masks;
+}
+EXPORT_SYMBOL_GPL(group_mask_cpus_evenly);
--
2.51.0
* [PATCH v10 04/13] genirq/affinity: Add cpumask to struct irq_affinity
2026-04-01 22:22 [PATCH v10 00/13] blk: honor isolcpus configuration Aaron Tomlin
` (2 preceding siblings ...)
2026-04-01 22:23 ` [PATCH v10 03/13] lib/group_cpus: Add group_mask_cpus_evenly() Aaron Tomlin
@ 2026-04-01 22:23 ` Aaron Tomlin
2026-04-01 22:23 ` [PATCH v10 05/13] blk-mq: add blk_mq_{online|possible}_queue_affinity Aaron Tomlin
` (8 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Aaron Tomlin @ 2026-04-01 22:23 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, mst
Cc: atomlin, aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, longman, chenridong, hare,
kch, ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
From: Daniel Wagner <wagi@kernel.org>
Pass a cpumask to irq_create_affinity_masks as an additional constraint
to consider when creating the affinity masks. This allows the caller to
exclude specific CPUs, e.g., isolated CPUs (see the 'isolcpus' kernel
command-line parameter).
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
include/linux/interrupt.h | 16 ++++++++++------
kernel/irq/affinity.c | 12 ++++++++++--
2 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 6cd26ffb0505..afd5a2c75b43 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -287,18 +287,22 @@ struct irq_affinity_notify {
* @nr_sets: The number of interrupt sets for which affinity
* spreading is required
* @set_size: Array holding the size of each interrupt set
+ * @mask: cpumask that constrains which CPUs to consider when
+ * calculating the number and size of the interrupt sets
* @calc_sets: Callback for calculating the number and size
* of interrupt sets
* @priv: Private data for usage by @calc_sets, usually a
* pointer to driver/device specific data.
*/
struct irq_affinity {
- unsigned int pre_vectors;
- unsigned int post_vectors;
- unsigned int nr_sets;
- unsigned int set_size[IRQ_AFFINITY_MAX_SETS];
- void (*calc_sets)(struct irq_affinity *, unsigned int nvecs);
- void *priv;
+ unsigned int pre_vectors;
+ unsigned int post_vectors;
+ unsigned int nr_sets;
+ unsigned int set_size[IRQ_AFFINITY_MAX_SETS];
+ const struct cpumask *mask;
+ void (*calc_sets)(struct irq_affinity *,
+ unsigned int nvecs);
+ void *priv;
};
/**
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 85c45cfe7223..076a5ef1e306 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -70,7 +70,13 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
*/
for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
unsigned int nr_masks, this_vecs = affd->set_size[i];
- struct cpumask *result = group_cpus_evenly(this_vecs, &nr_masks);
+ struct cpumask *result;
+
+ if (affd->mask)
+ result = group_mask_cpus_evenly(this_vecs, affd->mask,
+ &nr_masks);
+ else
+ result = group_cpus_evenly(this_vecs, &nr_masks);
if (!result) {
kfree(masks);
@@ -115,7 +121,9 @@ unsigned int irq_calc_affinity_vectors(unsigned int minvec, unsigned int maxvec,
if (resv > minvec)
return 0;
- if (affd->calc_sets) {
+ if (affd->mask) {
+ set_vecs = cpumask_weight(affd->mask);
+ } else if (affd->calc_sets) {
set_vecs = maxvec - resv;
} else {
cpus_read_lock();
--
2.51.0
* [PATCH v10 05/13] blk-mq: add blk_mq_{online|possible}_queue_affinity
2026-04-01 22:22 [PATCH v10 00/13] blk: honor isolcpus configuration Aaron Tomlin
` (3 preceding siblings ...)
2026-04-01 22:23 ` [PATCH v10 04/13] genirq/affinity: Add cpumask to struct irq_affinity Aaron Tomlin
@ 2026-04-01 22:23 ` Aaron Tomlin
2026-04-01 22:23 ` [PATCH v10 06/13] nvme-pci: use block layer helpers to constrain queue affinity Aaron Tomlin
` (7 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Aaron Tomlin @ 2026-04-01 22:23 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, mst
Cc: atomlin, aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, longman, chenridong, hare,
kch, ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
From: Daniel Wagner <wagi@kernel.org>
Introduce the blk_mq_{online|possible}_queue_affinity() helpers, which
return the queue-to-CPU mapping constraints defined by the block layer.
This allows other subsystems (e.g., IRQ affinity setup) to respect block
layer requirements.
It is necessary to provide versions for both the online and possible CPU
masks because some drivers want to spread their I/O queues only across
online CPUs, while others prefer to use all possible CPUs. The mask
used must match the number of queues requested
(see blk_mq_num_{online|possible}_queues()).
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
block/blk-mq-cpumap.c | 24 ++++++++++++++++++++++++
include/linux/blk-mq.h | 2 ++
2 files changed, 26 insertions(+)
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 705da074ad6c..8244ecf87835 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -26,6 +26,30 @@ static unsigned int blk_mq_num_queues(const struct cpumask *mask,
return min_not_zero(num, max_queues);
}
+/**
+ * blk_mq_possible_queue_affinity - Return block layer queue affinity
+ *
+ * Returns an affinity mask that represents the queue-to-CPU mapping
+ * requested by the block layer based on possible CPUs.
+ */
+const struct cpumask *blk_mq_possible_queue_affinity(void)
+{
+ return cpu_possible_mask;
+}
+EXPORT_SYMBOL_GPL(blk_mq_possible_queue_affinity);
+
+/**
+ * blk_mq_online_queue_affinity - Return block layer queue affinity
+ *
+ * Returns an affinity mask that represents the queue-to-CPU mapping
+ * requested by the block layer based on online CPUs.
+ */
+const struct cpumask *blk_mq_online_queue_affinity(void)
+{
+ return cpu_online_mask;
+}
+EXPORT_SYMBOL_GPL(blk_mq_online_queue_affinity);
+
/**
* blk_mq_num_possible_queues - Calc nr of queues for multiqueue devices
* @max_queues: The maximum number of queues the hardware/driver
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 18a2388ba581..ebc45557aee8 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -969,6 +969,8 @@ int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
void blk_mq_unfreeze_queue_non_owner(struct request_queue *q);
void blk_freeze_queue_start_non_owner(struct request_queue *q);
+const struct cpumask *blk_mq_possible_queue_affinity(void);
+const struct cpumask *blk_mq_online_queue_affinity(void);
unsigned int blk_mq_num_possible_queues(unsigned int max_queues);
unsigned int blk_mq_num_online_queues(unsigned int max_queues);
void blk_mq_map_queues(struct blk_mq_queue_map *qmap);
--
2.51.0
* [PATCH v10 06/13] nvme-pci: use block layer helpers to constrain queue affinity
2026-04-01 22:22 [PATCH v10 00/13] blk: honor isolcpus configuration Aaron Tomlin
` (4 preceding siblings ...)
2026-04-01 22:23 ` [PATCH v10 05/13] blk-mq: add blk_mq_{online|possible}_queue_affinity Aaron Tomlin
@ 2026-04-01 22:23 ` Aaron Tomlin
2026-04-03 1:46 ` Martin K. Petersen
2026-04-01 22:23 ` [PATCH v10 07/13] scsi: Use " Aaron Tomlin
` (6 subsequent siblings)
12 siblings, 1 reply; 22+ messages in thread
From: Aaron Tomlin @ 2026-04-01 22:23 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, mst
Cc: atomlin, aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, longman, chenridong, hare,
kch, ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
From: Daniel Wagner <wagi@kernel.org>
Ensure that IRQ affinity setup also respects the queue-to-CPU mapping
constraints provided by the block layer. This allows the NVMe driver
to avoid assigning interrupts to CPUs that the block layer has excluded
(e.g., isolated CPUs).
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
drivers/nvme/host/pci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b78ba239c8ea..8e05ad06283e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2862,6 +2862,7 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
.pre_vectors = 1,
.calc_sets = nvme_calc_irq_sets,
.priv = dev,
+ .mask = blk_mq_possible_queue_affinity(),
};
unsigned int irq_queues, poll_queues;
unsigned int flags = PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY;
--
2.51.0
* Re: [PATCH v10 06/13] nvme-pci: use block layer helpers to constrain queue affinity
2026-04-01 22:23 ` [PATCH v10 06/13] nvme-pci: use block layer helpers to constrain queue affinity Aaron Tomlin
@ 2026-04-03 1:46 ` Martin K. Petersen
0 siblings, 0 replies; 22+ messages in thread
From: Martin K. Petersen @ 2026-04-03 1:46 UTC (permalink / raw)
To: Aaron Tomlin
Cc: axboe, kbusch, hch, sagi, mst, aacraid, James.Bottomley,
martin.petersen, liyihang9, kashyap.desai, sumit.saxena,
shivasharan.srikanteshwara, chandrakanth.patil, sathya.prakash,
sreekanth.reddy, suganath-prabu.subramani, ranjan.kumar,
jinpu.wang, tglx, mingo, peterz, juri.lelli, vincent.guittot,
akpm, maz, ruanjinjie, bigeasy, yphbchou0911, wagi, frederic,
longman, chenridong, hare, kch, ming.lei, steve, sean, chjohnst,
neelx, mproche, linux-block, linux-kernel, virtualization,
linux-nvme, linux-scsi, megaraidlinux.pdl, mpi3mr-linuxdrv.pdl,
MPT-FusionLinux.pdl
Aaron,
> Ensure that IRQ affinity setup also respects the queue-to-CPU mapping
> constraints provided by the block layer. This allows the NVMe driver
> to avoid assigning interrupts to CPUs that the block layer has
> excluded (e.g., isolated CPUs).
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen
* [PATCH v10 07/13] scsi: Use block layer helpers to constrain queue affinity
2026-04-01 22:22 [PATCH v10 00/13] blk: honor isolcpus configuration Aaron Tomlin
` (5 preceding siblings ...)
2026-04-01 22:23 ` [PATCH v10 06/13] nvme-pci: use block layer helpers to constrain queue affinity Aaron Tomlin
@ 2026-04-01 22:23 ` Aaron Tomlin
2026-04-03 1:46 ` Martin K. Petersen
2026-04-01 22:23 ` [PATCH v10 08/13] virtio: blk/scsi: use " Aaron Tomlin
` (5 subsequent siblings)
12 siblings, 1 reply; 22+ messages in thread
From: Aaron Tomlin @ 2026-04-01 22:23 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, mst
Cc: atomlin, aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, longman, chenridong, hare,
kch, ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
From: Daniel Wagner <wagi@kernel.org>
Ensure that IRQ affinity setup also respects the queue-to-CPU mapping
constraints provided by the block layer. This allows the SCSI drivers
to avoid assigning interrupts to CPUs that the block layer has excluded
(e.g., isolated CPUs).
Only convert drivers that already use pci_alloc_irq_vectors_affinity()
with the PCI_IRQ_AFFINITY flag set, since these drivers already let the
IRQ core code set the affinity. Don't update qla2xxx, because the
nvme-fabrics code is not ready yet.
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 1 +
drivers/scsi/megaraid/megaraid_sas_base.c | 5 ++++-
drivers/scsi/mpi3mr/mpi3mr_fw.c | 6 +++++-
drivers/scsi/mpt3sas/mpt3sas_base.c | 5 ++++-
drivers/scsi/pm8001/pm8001_init.c | 1 +
5 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index f69efc6494b8..d1f689224e7b 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2605,6 +2605,7 @@ static int interrupt_preinit_v3_hw(struct hisi_hba *hisi_hba)
struct pci_dev *pdev = hisi_hba->pci_dev;
struct irq_affinity desc = {
.pre_vectors = BASE_VECTORS_V3_HW,
+ .mask = blk_mq_online_queue_affinity(),
};
min_msi = MIN_AFFINE_VECTORS_V3_HW;
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index ac71ea4898b2..7e2a3c187ee0 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -5925,7 +5925,10 @@ static int
__megasas_alloc_irq_vectors(struct megasas_instance *instance)
{
int i, irq_flags;
- struct irq_affinity desc = { .pre_vectors = instance->low_latency_index_start };
+ struct irq_affinity desc = {
+ .pre_vectors = instance->low_latency_index_start,
+ .mask = blk_mq_online_queue_affinity(),
+ };
struct irq_affinity *descp = &desc;
irq_flags = PCI_IRQ_MSIX;
diff --git a/drivers/scsi/mpi3mr/mpi3mr_fw.c b/drivers/scsi/mpi3mr/mpi3mr_fw.c
index c744210cc901..f9b8b3639c64 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_fw.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_fw.c
@@ -830,7 +830,11 @@ static int mpi3mr_setup_isr(struct mpi3mr_ioc *mrioc, u8 setup_one)
int max_vectors, min_vec;
int retval;
int i;
- struct irq_affinity desc = { .pre_vectors = 1, .post_vectors = 1 };
+ struct irq_affinity desc = {
+ .pre_vectors = 1,
+ .post_vectors = 1,
+ .mask = blk_mq_online_queue_affinity(),
+ };
if (mrioc->is_intr_info_set)
return 0;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 79052f2accbd..91e1622b5b77 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3370,7 +3370,10 @@ static int
_base_alloc_irq_vectors(struct MPT3SAS_ADAPTER *ioc)
{
int i, irq_flags = PCI_IRQ_MSIX;
- struct irq_affinity desc = { .pre_vectors = ioc->high_iops_queues };
+ struct irq_affinity desc = {
+ .pre_vectors = ioc->high_iops_queues,
+ .mask = blk_mq_online_queue_affinity(),
+ };
struct irq_affinity *descp = &desc;
/*
* Don't allocate msix vectors for poll_queues.
diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
index e93ea76b565e..6360fa95bcf4 100644
--- a/drivers/scsi/pm8001/pm8001_init.c
+++ b/drivers/scsi/pm8001/pm8001_init.c
@@ -978,6 +978,7 @@ static u32 pm8001_setup_msix(struct pm8001_hba_info *pm8001_ha)
*/
struct irq_affinity desc = {
.pre_vectors = 1,
+ .mask = blk_mq_online_queue_affinity(),
};
rc = pci_alloc_irq_vectors_affinity(
pm8001_ha->pdev, 2, PM8001_MAX_MSIX_VEC,
--
2.51.0
* Re: [PATCH v10 07/13] scsi: Use block layer helpers to constrain queue affinity
2026-04-01 22:23 ` [PATCH v10 07/13] scsi: Use " Aaron Tomlin
@ 2026-04-03 1:46 ` Martin K. Petersen
0 siblings, 0 replies; 22+ messages in thread
From: Martin K. Petersen @ 2026-04-03 1:46 UTC (permalink / raw)
To: Aaron Tomlin
Cc: axboe, kbusch, hch, sagi, mst, aacraid, James.Bottomley,
martin.petersen, liyihang9, kashyap.desai, sumit.saxena,
shivasharan.srikanteshwara, chandrakanth.patil, sathya.prakash,
sreekanth.reddy, suganath-prabu.subramani, ranjan.kumar,
jinpu.wang, tglx, mingo, peterz, juri.lelli, vincent.guittot,
akpm, maz, ruanjinjie, bigeasy, yphbchou0911, wagi, frederic,
longman, chenridong, hare, kch, ming.lei, steve, sean, chjohnst,
neelx, mproche, linux-block, linux-kernel, virtualization,
linux-nvme, linux-scsi, megaraidlinux.pdl, mpi3mr-linuxdrv.pdl,
MPT-FusionLinux.pdl
Aaron,
> Ensure that IRQ affinity setup also respects the queue-to-CPU mapping
> constraints provided by the block layer. This allows the SCSI drivers
> to avoid assigning interrupts to CPUs that the block layer has
> excluded (e.g., isolated CPUs).
>
> Only convert drivers that already use pci_alloc_irq_vectors_affinity()
> with the PCI_IRQ_AFFINITY flag set, since these drivers already let the
> IRQ core code set the affinity. Don't update qla2xxx, because the
> nvme-fabrics code is not ready yet.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen
* [PATCH v10 08/13] virtio: blk/scsi: use block layer helpers to constrain queue affinity
2026-04-01 22:22 [PATCH v10 00/13] blk: honor isolcpus configuration Aaron Tomlin
` (6 preceding siblings ...)
2026-04-01 22:23 ` [PATCH v10 07/13] scsi: Use " Aaron Tomlin
@ 2026-04-01 22:23 ` Aaron Tomlin
2026-04-03 1:47 ` Martin K. Petersen
2026-04-01 22:23 ` [PATCH v10 09/13] isolation: Introduce io_queue isolcpus type Aaron Tomlin
` (4 subsequent siblings)
12 siblings, 1 reply; 22+ messages in thread
From: Aaron Tomlin @ 2026-04-01 22:23 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, mst
Cc: atomlin, aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, longman, chenridong, hare,
kch, ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
From: Daniel Wagner <wagi@kernel.org>
Ensure that IRQ affinity setup also respects the queue-to-CPU mapping
constraints provided by the block layer. This allows the virtio drivers
to avoid assigning interrupts to CPUs that the block layer has excluded
(e.g., isolated CPUs).
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
drivers/block/virtio_blk.c | 4 +++-
drivers/scsi/virtio_scsi.c | 5 ++++-
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index b1c9a27fe00f..9d737510454b 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -964,7 +964,9 @@ static int init_vq(struct virtio_blk *vblk)
unsigned short num_vqs;
unsigned short num_poll_vqs;
struct virtio_device *vdev = vblk->vdev;
- struct irq_affinity desc = { 0, };
+ struct irq_affinity desc = {
+ .mask = blk_mq_possible_queue_affinity(),
+ };
err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
struct virtio_blk_config, num_queues,
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 0ed8558dad72..520a7da5386e 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -849,7 +849,10 @@ static int virtscsi_init(struct virtio_device *vdev,
u32 num_vqs, num_poll_vqs, num_req_vqs;
struct virtqueue_info *vqs_info;
struct virtqueue **vqs;
- struct irq_affinity desc = { .pre_vectors = 2 };
+ struct irq_affinity desc = {
+ .pre_vectors = 2,
+ .mask = blk_mq_possible_queue_affinity(),
+ };
num_req_vqs = vscsi->num_queues;
num_vqs = num_req_vqs + VIRTIO_SCSI_VQ_BASE;
--
2.51.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v10 08/13] virtio: blk/scsi: use block layer helpers to constrain queue affinity
2026-04-01 22:23 ` [PATCH v10 08/13] virtio: blk/scsi: use " Aaron Tomlin
@ 2026-04-03 1:47 ` Martin K. Petersen
0 siblings, 0 replies; 22+ messages in thread
From: Martin K. Petersen @ 2026-04-03 1:47 UTC (permalink / raw)
To: Aaron Tomlin
Cc: axboe, kbusch, hch, sagi, mst, aacraid, James.Bottomley,
martin.petersen, liyihang9, kashyap.desai, sumit.saxena,
shivasharan.srikanteshwara, chandrakanth.patil, sathya.prakash,
sreekanth.reddy, suganath-prabu.subramani, ranjan.kumar,
jinpu.wang, tglx, mingo, peterz, juri.lelli, vincent.guittot,
akpm, maz, ruanjinjie, bigeasy, yphbchou0911, wagi, frederic,
longman, chenridong, hare, kch, ming.lei, steve, sean, chjohnst,
neelx, mproche, linux-block, linux-kernel, virtualization,
linux-nvme, linux-scsi, megaraidlinux.pdl, mpi3mr-linuxdrv.pdl,
MPT-FusionLinux.pdl
Aaron,
> Ensure that IRQ affinity setup also respects the queue-to-CPU mapping
> constraints provided by the block layer. This allows the virtio
> drivers to avoid assigning interrupts to CPUs that the block layer has
> excluded (e.g., isolated CPUs).
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v10 09/13] isolation: Introduce io_queue isolcpus type
2026-04-01 22:22 [PATCH v10 00/13] blk: honor isolcpus configuration Aaron Tomlin
` (7 preceding siblings ...)
2026-04-01 22:23 ` [PATCH v10 08/13] virtio: blk/scsi: use " Aaron Tomlin
@ 2026-04-01 22:23 ` Aaron Tomlin
2026-04-03 1:47 ` Martin K. Petersen
2026-04-01 22:23 ` [PATCH v10 10/13] blk-mq: use hk cpus only when isolcpus=io_queue is enabled Aaron Tomlin
` (3 subsequent siblings)
12 siblings, 1 reply; 22+ messages in thread
From: Aaron Tomlin @ 2026-04-01 22:23 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, mst
Cc: atomlin, aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, longman, chenridong, hare,
kch, ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
From: Daniel Wagner <wagi@kernel.org>
Multiqueue drivers spread I/O queues across all CPUs for optimal
performance. However, these drivers are not aware of CPU isolation
requirements and will distribute queues without considering the isolcpus
configuration.
Introduce a new isolcpus mask that allows users to define which CPUs
should have I/O queues assigned. This is similar to managed_irq, but
intended for drivers that do not use the managed IRQ infrastructure.
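The sub-parameter handling this patch adds to housekeeping_isolcpus_setup() can be sketched in userspace; the flag bit value and function name below are illustrative, not the kernel's:

```c
#include <string.h>

#define HK_FLAG_IO_QUEUE (1U << 0)   /* stand-in for the real flag bit */

/* Minimal model of the isolcpus= sub-parameter scan: each known flag
 * name (with its trailing comma) is consumed from the front of the
 * string and mapped to a flag bit; the remainder is the CPU list. */
static unsigned int parse_isolcpus_flags(const char *str)
{
        unsigned int flags = 0;

        while (*str) {
                if (!strncmp(str, "io_queue,", 9)) {
                        str += 9;
                        flags |= HK_FLAG_IO_QUEUE;
                        continue;
                }
                break;  /* no known flag prefix left */
        }
        return flags;
}
```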
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
include/linux/sched/isolation.h | 1 +
kernel/sched/isolation.c | 7 +++++++
2 files changed, 8 insertions(+)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index dc3975ff1b2e..7b266fc2a405 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -18,6 +18,7 @@ enum hk_type {
HK_TYPE_MANAGED_IRQ,
/* Inverse of boot-time nohz_full= or isolcpus=nohz arguments */
HK_TYPE_KERNEL_NOISE,
+ HK_TYPE_IO_QUEUE,
HK_TYPE_MAX,
/*
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index ef152d401fe2..3406e3024fd4 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -16,6 +16,7 @@ enum hk_flags {
HK_FLAG_DOMAIN = BIT(HK_TYPE_DOMAIN),
HK_FLAG_MANAGED_IRQ = BIT(HK_TYPE_MANAGED_IRQ),
HK_FLAG_KERNEL_NOISE = BIT(HK_TYPE_KERNEL_NOISE),
+ HK_FLAG_IO_QUEUE = BIT(HK_TYPE_IO_QUEUE),
};
DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
@@ -340,6 +341,12 @@ static int __init housekeeping_isolcpus_setup(char *str)
continue;
}
+ if (!strncmp(str, "io_queue,", 9)) {
+ str += 9;
+ flags |= HK_FLAG_IO_QUEUE;
+ continue;
+ }
+
/*
* Skip unknown sub-parameter and validate that it is not
* containing an invalid character.
--
2.51.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v10 09/13] isolation: Introduce io_queue isolcpus type
2026-04-01 22:23 ` [PATCH v10 09/13] isolation: Introduce io_queue isolcpus type Aaron Tomlin
@ 2026-04-03 1:47 ` Martin K. Petersen
0 siblings, 0 replies; 22+ messages in thread
From: Martin K. Petersen @ 2026-04-03 1:47 UTC (permalink / raw)
To: Aaron Tomlin
Cc: axboe, kbusch, hch, sagi, mst, aacraid, James.Bottomley,
martin.petersen, liyihang9, kashyap.desai, sumit.saxena,
shivasharan.srikanteshwara, chandrakanth.patil, sathya.prakash,
sreekanth.reddy, suganath-prabu.subramani, ranjan.kumar,
jinpu.wang, tglx, mingo, peterz, juri.lelli, vincent.guittot,
akpm, maz, ruanjinjie, bigeasy, yphbchou0911, wagi, frederic,
longman, chenridong, hare, kch, ming.lei, steve, sean, chjohnst,
neelx, mproche, linux-block, linux-kernel, virtualization,
linux-nvme, linux-scsi, megaraidlinux.pdl, mpi3mr-linuxdrv.pdl,
MPT-FusionLinux.pdl
Aaron,
> Multiqueue drivers spread I/O queues across all CPUs for optimal
> performance. However, these drivers are not aware of CPU isolation
> requirements and will distribute queues without considering the
> isolcpus configuration.
>
> Introduce a new isolcpus mask that allows users to define which CPUs
> should have I/O queues assigned. This is similar to managed_irq, but
> intended for drivers that do not use the managed IRQ infrastructure
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v10 10/13] blk-mq: use hk cpus only when isolcpus=io_queue is enabled
2026-04-01 22:22 [PATCH v10 00/13] blk: honor isolcpus configuration Aaron Tomlin
` (8 preceding siblings ...)
2026-04-01 22:23 ` [PATCH v10 09/13] isolation: Introduce io_queue isolcpus type Aaron Tomlin
@ 2026-04-01 22:23 ` Aaron Tomlin
2026-04-03 2:06 ` Waiman Long
2026-04-01 22:23 ` [PATCH v10 11/13] blk-mq: prevent offlining hk CPUs with associated online isolated CPUs Aaron Tomlin
` (2 subsequent siblings)
12 siblings, 1 reply; 22+ messages in thread
From: Aaron Tomlin @ 2026-04-01 22:23 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, mst
Cc: atomlin, aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, longman, chenridong, hare,
kch, ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
From: Daniel Wagner <wagi@kernel.org>
Extend the capabilities of the generic CPU to hardware queue (hctx)
mapping code, so it maps housekeeping CPUs and isolated CPUs to the
hardware queues evenly.
An hctx is only operational when at least one online housekeeping CPU
is assigned to it (aka active_hctx). Thus, verify in the final mapping
that there is no hctx with only offline housekeeping CPUs and online
isolated CPUs.
Example mapping result:
16 online CPUs
isolcpus=io_queue,2-3,6-7,12-13
Queue mapping:
hctx0: default 0 2
hctx1: default 1 3
hctx2: default 4 6
hctx3: default 5 7
hctx4: default 8 12
hctx5: default 9 13
hctx6: default 10
hctx7: default 11
hctx8: default 14
hctx9: default 15
IRQ mapping:
irq 42 affinity 0 effective 0 nvme0q0
irq 43 affinity 0 effective 0 nvme0q1
irq 44 affinity 1 effective 1 nvme0q2
irq 45 affinity 4 effective 4 nvme0q3
irq 46 affinity 5 effective 5 nvme0q4
irq 47 affinity 8 effective 8 nvme0q5
irq 48 affinity 9 effective 9 nvme0q6
irq 49 affinity 10 effective 10 nvme0q7
irq 50 affinity 11 effective 11 nvme0q8
irq 51 affinity 14 effective 14 nvme0q9
irq 52 affinity 15 effective 15 nvme0q10
A corner case is when the number of online CPUs and present CPUs
differ and the driver asks for fewer queues than online CPUs, e.g.
8 online CPUs, 16 possible CPUs
isolcpus=io_queue,2-3,6-7,12-13
virtio_blk.num_request_queues=2
Queue mapping:
hctx0: default 0 1 2 3 4 5 6 7 8 12 13
hctx1: default 9 10 11 14 15
IRQ mapping
irq 27 affinity 0 effective 0 virtio0-config
irq 28 affinity 0-1,4-5,8 effective 5 virtio0-req.0
irq 29 affinity 9-11,14-15 effective 0 virtio0-req.1
Noteworthy is that for the normal/default configuration (!isolcpus) the
mapping will change for systems with non-hyperthreaded CPUs. The
main assignment loop relies entirely on group_mask_cpus_evenly to
do the right thing. The old code would distribute the CPUs linearly over
the hardware contexts:
queue mapping for /dev/nvme0n1
hctx0: default 0 8
hctx1: default 1 9
hctx2: default 2 10
hctx3: default 3 11
hctx4: default 4 12
hctx5: default 5 13
hctx6: default 6 14
hctx7: default 7 15
The new code assigns each hardware context the map generated by the
group_mask_cpus_evenly function:
queue mapping for /dev/nvme0n1
hctx0: default 0 1
hctx1: default 2 3
hctx2: default 4 5
hctx3: default 6 7
hctx4: default 8 9
hctx5: default 10 11
hctx6: default 12 13
hctx7: default 14 15
In case of hyperthreading CPUs, the resulting map stays the same.
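The difference between the two spreading strategies shown above can be modelled in userspace. This is a sketch only: it ignores isolcpus, assumes the queue count divides the CPU count evenly, and the function names are illustrative:

```c
/* Old behaviour: linear striping, CPU c serves hctx (c % nq).
 * With 16 CPUs over 8 queues, hctx0 gets CPUs 0 and 8. */
static int map_striped(int cpu, int nq)
{
        return cpu % nq;
}

/* New behaviour in the group_mask_cpus_evenly() style: contiguous
 * blocks, CPU c serves hctx (c / (ncpus / nq)).  With 16 CPUs over
 * 8 queues, hctx0 gets CPUs 0 and 1. */
static int map_grouped(int cpu, int ncpus, int nq)
{
        return cpu / (ncpus / nq);
}
```

This reproduces the two nvme0n1 mappings quoted above for a 16-CPU, 8-queue, non-hyperthreaded layout.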
Signed-off-by: Daniel Wagner <wagi@kernel.org>
[atomlin: Fixed absolute vs. relative hardware queue index mix-up in
blk_mq_map_queues and validation checks; fixed typographical errors.]
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
block/blk-mq-cpumap.c | 175 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 157 insertions(+), 18 deletions(-)
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 8244ecf87835..8d09af49a142 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -22,7 +22,18 @@ static unsigned int blk_mq_num_queues(const struct cpumask *mask,
{
unsigned int num;
- num = cpumask_weight(mask);
+ if (housekeeping_enabled(HK_TYPE_IO_QUEUE)) {
+ const struct cpumask *hk_mask;
+ struct cpumask avail_mask;
+
+ hk_mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
+ cpumask_and(&avail_mask, mask, hk_mask);
+
+ num = cpumask_weight(&avail_mask);
+ } else {
+ num = cpumask_weight(mask);
+ }
+
return min_not_zero(num, max_queues);
}
@@ -31,9 +42,13 @@ static unsigned int blk_mq_num_queues(const struct cpumask *mask,
*
* Returns an affinity mask that represents the queue-to-CPU mapping
* requested by the block layer based on possible CPUs.
+ * This helper takes isolcpus settings into account.
*/
const struct cpumask *blk_mq_possible_queue_affinity(void)
{
+ if (housekeeping_enabled(HK_TYPE_IO_QUEUE))
+ return housekeeping_cpumask(HK_TYPE_IO_QUEUE);
+
return cpu_possible_mask;
}
EXPORT_SYMBOL_GPL(blk_mq_possible_queue_affinity);
@@ -46,6 +61,14 @@ EXPORT_SYMBOL_GPL(blk_mq_possible_queue_affinity);
*/
const struct cpumask *blk_mq_online_queue_affinity(void)
{
+ /*
+ * Return the stable housekeeping mask if enabled. Callers (e.g.,
+ * the IRQ affinity core) are responsible for safely intersecting
+ * this with a local snapshot of the online mask.
+ */
+ if (housekeeping_enabled(HK_TYPE_IO_QUEUE))
+ return housekeeping_cpumask(HK_TYPE_IO_QUEUE);
+
return cpu_online_mask;
}
EXPORT_SYMBOL_GPL(blk_mq_online_queue_affinity);
@@ -57,7 +80,8 @@ EXPORT_SYMBOL_GPL(blk_mq_online_queue_affinity);
* ignored.
*
* Calculates the number of queues to be used for a multiqueue
- * device based on the number of possible CPUs.
+ * device based on the number of possible CPUs. This helper
+ * takes isolcpus settings into account.
*/
unsigned int blk_mq_num_possible_queues(unsigned int max_queues)
{
@@ -72,7 +96,8 @@ EXPORT_SYMBOL_GPL(blk_mq_num_possible_queues);
* ignored.
*
* Calculates the number of queues to be used for a multiqueue
- * device based on the number of online CPUs.
+ * device based on the number of online CPUs. This helper
+ * takes isolcpus settings into account.
*/
unsigned int blk_mq_num_online_queues(unsigned int max_queues)
{
@@ -80,23 +105,104 @@ unsigned int blk_mq_num_online_queues(unsigned int max_queues)
}
EXPORT_SYMBOL_GPL(blk_mq_num_online_queues);
+static bool blk_mq_validate(struct blk_mq_queue_map *qmap,
+ const struct cpumask *active_hctx)
+{
+ /*
+ * Verify if the mapping is usable when housekeeping
+ * configuration is enabled
+ */
+
+ for (int queue = 0; queue < qmap->nr_queues; queue++) {
+ int cpu;
+
+ if (cpumask_test_cpu(queue, active_hctx)) {
+ /*
+ * This hctx has at least one online CPU thus it
+ * is able to serve any assigned isolated CPU.
+ */
+ continue;
+ }
+
+ /*
+ * There is no housekeeping online CPU for this hctx, all
+ * good as long as all non-housekeeping CPUs are also
+ * offline.
+ */
+ for_each_online_cpu(cpu) {
+ if (qmap->mq_map[cpu] != qmap->queue_offset + queue)
+ continue;
+
+ pr_warn("Unable to create a usable CPU-to-queue mapping with the given constraints\n");
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static void blk_mq_map_fallback(struct blk_mq_queue_map *qmap)
+{
+ unsigned int cpu;
+
+ /*
+ * Map all CPUs to the first hctx to ensure at least one online
+ * CPU is serving it.
+ */
+ for_each_possible_cpu(cpu)
+ qmap->mq_map[cpu] = 0;
+}
+
void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
{
- const struct cpumask *masks;
+ struct cpumask *masks __free(kfree) = NULL;
+ const struct cpumask *constraint;
unsigned int queue, cpu, nr_masks;
+ cpumask_var_t active_hctx;
- masks = group_cpus_evenly(qmap->nr_queues, &nr_masks);
- if (!masks) {
- for_each_possible_cpu(cpu)
- qmap->mq_map[cpu] = qmap->queue_offset;
- return;
- }
+ if (!zalloc_cpumask_var(&active_hctx, GFP_KERNEL))
+ goto fallback;
+
+ if (housekeeping_enabled(HK_TYPE_IO_QUEUE))
+ constraint = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
+ else
+ constraint = cpu_possible_mask;
+
+ /* Map CPUs to the hardware contexts (hctx) */
+ masks = group_mask_cpus_evenly(qmap->nr_queues, constraint, &nr_masks);
+ if (!masks)
+ goto free_fallback;
for (queue = 0; queue < qmap->nr_queues; queue++) {
- for_each_cpu(cpu, &masks[queue % nr_masks])
+ unsigned int idx = (qmap->queue_offset + queue) % nr_masks;
+
+ for_each_cpu(cpu, &masks[idx]) {
qmap->mq_map[cpu] = qmap->queue_offset + queue;
+
+ if (cpu_online(cpu))
+ cpumask_set_cpu(queue, active_hctx);
+ }
+ }
+
+ /* Map any unassigned CPU evenly to the hardware contexts (hctx) */
+ queue = cpumask_first(active_hctx);
+ for_each_cpu_andnot(cpu, cpu_possible_mask, constraint) {
+ qmap->mq_map[cpu] = qmap->queue_offset + queue;
+ queue = cpumask_next_wrap(queue, active_hctx);
}
- kfree(masks);
+
+ if (!blk_mq_validate(qmap, active_hctx))
+ goto free_fallback;
+
+ free_cpumask_var(active_hctx);
+
+ return;
+
+free_fallback:
+ free_cpumask_var(active_hctx);
+
+fallback:
+ blk_mq_map_fallback(qmap);
}
EXPORT_SYMBOL_GPL(blk_mq_map_queues);
@@ -133,24 +239,57 @@ void blk_mq_map_hw_queues(struct blk_mq_queue_map *qmap,
struct device *dev, unsigned int offset)
{
- const struct cpumask *mask;
+ cpumask_var_t active_hctx, mask;
unsigned int queue, cpu;
if (!dev->bus->irq_get_affinity)
goto fallback;
+ if (!zalloc_cpumask_var(&active_hctx, GFP_KERNEL))
+ goto fallback;
+
+ if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
+ free_cpumask_var(active_hctx);
+ goto fallback;
+ }
+
+ /* Map CPUs to the hardware contexts (hctx) */
for (queue = 0; queue < qmap->nr_queues; queue++) {
- mask = dev->bus->irq_get_affinity(dev, queue + offset);
- if (!mask)
- goto fallback;
+ const struct cpumask *affinity_mask;
+
+ affinity_mask = dev->bus->irq_get_affinity(dev, offset + queue);
+ if (!affinity_mask)
+ goto free_fallback;
- for_each_cpu(cpu, mask)
+ for_each_cpu(cpu, affinity_mask) {
qmap->mq_map[cpu] = qmap->queue_offset + queue;
+
+ cpumask_set_cpu(cpu, mask);
+ if (cpu_online(cpu))
+ cpumask_set_cpu(queue, active_hctx);
+ }
}
+ /* Map any unassigned CPU evenly to the hardware contexts (hctx) */
+ queue = cpumask_first(active_hctx);
+ for_each_cpu_andnot(cpu, cpu_possible_mask, mask) {
+ qmap->mq_map[cpu] = qmap->queue_offset + queue;
+ queue = cpumask_next_wrap(queue, active_hctx);
+ }
+
+ if (!blk_mq_validate(qmap, active_hctx))
+ goto free_fallback;
+
+ free_cpumask_var(active_hctx);
+ free_cpumask_var(mask);
+
return;
+free_fallback:
+ free_cpumask_var(active_hctx);
+ free_cpumask_var(mask);
+
fallback:
- blk_mq_map_queues(qmap);
+ blk_mq_map_fallback(qmap);
}
EXPORT_SYMBOL_GPL(blk_mq_map_hw_queues);
--
2.51.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v10 10/13] blk-mq: use hk cpus only when isolcpus=io_queue is enabled
2026-04-01 22:23 ` [PATCH v10 10/13] blk-mq: use hk cpus only when isolcpus=io_queue is enabled Aaron Tomlin
@ 2026-04-03 2:06 ` Waiman Long
0 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2026-04-03 2:06 UTC (permalink / raw)
To: Aaron Tomlin, axboe, kbusch, hch, sagi, mst
Cc: aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, chenridong, hare, kch,
ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
On 4/1/26 6:23 PM, Aaron Tomlin wrote:
> From: Daniel Wagner <wagi@kernel.org>
>
> Extend the capabilities of the generic CPU to hardware queue (hctx)
> mapping code, so it maps housekeeping CPUs and isolated CPUs to the
> hardware queues evenly.
>
> An hctx is only operational when at least one online housekeeping CPU
> is assigned to it (aka active_hctx). Thus, verify in the final mapping
> that there is no hctx with only offline housekeeping CPUs and online
> isolated CPUs.
>
> Example mapping result:
>
> 16 online CPUs
>
> isolcpus=io_queue,2-3,6-7,12-13
>
> Queue mapping:
> hctx0: default 0 2
> hctx1: default 1 3
> hctx2: default 4 6
> hctx3: default 5 7
> hctx4: default 8 12
> hctx5: default 9 13
> hctx6: default 10
> hctx7: default 11
> hctx8: default 14
> hctx9: default 15
>
> IRQ mapping:
> irq 42 affinity 0 effective 0 nvme0q0
> irq 43 affinity 0 effective 0 nvme0q1
> irq 44 affinity 1 effective 1 nvme0q2
> irq 45 affinity 4 effective 4 nvme0q3
> irq 46 affinity 5 effective 5 nvme0q4
> irq 47 affinity 8 effective 8 nvme0q5
> irq 48 affinity 9 effective 9 nvme0q6
> irq 49 affinity 10 effective 10 nvme0q7
> irq 50 affinity 11 effective 11 nvme0q8
> irq 51 affinity 14 effective 14 nvme0q9
> irq 52 affinity 15 effective 15 nvme0q10
>
> A corner case is when the number of online CPUs and present CPUs
> differ and the driver asks for fewer queues than online CPUs, e.g.
>
> 8 online CPUs, 16 possible CPUs
>
> isolcpus=io_queue,2-3,6-7,12-13
> virtio_blk.num_request_queues=2
>
> Queue mapping:
> hctx0: default 0 1 2 3 4 5 6 7 8 12 13
> hctx1: default 9 10 11 14 15
>
> IRQ mapping
> irq 27 affinity 0 effective 0 virtio0-config
> irq 28 affinity 0-1,4-5,8 effective 5 virtio0-req.0
> irq 29 affinity 9-11,14-15 effective 0 virtio0-req.1
>
> Noteworthy is that for the normal/default configuration (!isolcpus) the
> mapping will change for systems with non-hyperthreaded CPUs. The
> main assignment loop relies entirely on group_mask_cpus_evenly to
> do the right thing. The old code would distribute the CPUs linearly over
> the hardware contexts:
>
> queue mapping for /dev/nvme0n1
> hctx0: default 0 8
> hctx1: default 1 9
> hctx2: default 2 10
> hctx3: default 3 11
> hctx4: default 4 12
> hctx5: default 5 13
> hctx6: default 6 14
> hctx7: default 7 15
>
> The new code assigns each hardware context the map generated by the
> group_mask_cpus_evenly function:
>
> queue mapping for /dev/nvme0n1
> hctx0: default 0 1
> hctx1: default 2 3
> hctx2: default 4 5
> hctx3: default 6 7
> hctx4: default 8 9
> hctx5: default 10 11
> hctx6: default 12 13
> hctx7: default 14 15
>
> In case of hyperthreading CPUs, the resulting map stays the same.
>
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> [atomlin: Fixed absolute vs. relative hardware queue index mix-up in
> blk_mq_map_queues and validation checks; fixed typographical errors.]
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
> block/blk-mq-cpumap.c | 175 +++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 157 insertions(+), 18 deletions(-)
>
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 8244ecf87835..8d09af49a142 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -22,7 +22,18 @@ static unsigned int blk_mq_num_queues(const struct cpumask *mask,
> {
> unsigned int num;
>
> - num = cpumask_weight(mask);
> + if (housekeeping_enabled(HK_TYPE_IO_QUEUE)) {
> + const struct cpumask *hk_mask;
> + struct cpumask avail_mask;
> +
> + hk_mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
> + cpumask_and(&avail_mask, mask, hk_mask);
> +
> + num = cpumask_weight(&avail_mask);
As said before by Ming Lei, struct cpumask can be rather big in size if
NR_CPUS is large. I will suggest using cpumask_weight_and() instead
which will eliminate the need of the local variables.
Cheers,
Longman
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v10 11/13] blk-mq: prevent offlining hk CPUs with associated online isolated CPUs
2026-04-01 22:22 [PATCH v10 00/13] blk: honor isolcpus configuration Aaron Tomlin
` (9 preceding siblings ...)
2026-04-01 22:23 ` [PATCH v10 10/13] blk-mq: use hk cpus only when isolcpus=io_queue is enabled Aaron Tomlin
@ 2026-04-01 22:23 ` Aaron Tomlin
2026-04-01 22:23 ` [PATCH v10 12/13] genirq/affinity: Restrict managed IRQ affinity to housekeeping CPUs Aaron Tomlin
2026-04-01 22:23 ` [PATCH v10 13/13] docs: add io_queue flag to isolcpus Aaron Tomlin
12 siblings, 0 replies; 22+ messages in thread
From: Aaron Tomlin @ 2026-04-01 22:23 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, mst
Cc: atomlin, aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, longman, chenridong, hare,
kch, ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
From: Daniel Wagner <wagi@kernel.org>
When isolcpus=io_queue is enabled and the last housekeeping CPU for a
given hctx goes offline, no CPU would be left to handle I/O. To
prevent I/O stalls, disallow offlining housekeeping CPUs that are
still serving isolated CPUs.
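The hotplug guard can be modelled with plain bitmasks. This is an illustrative userspace sketch (one bit per CPU; the typedef and function name are not the kernel's):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t cpumask_t;   /* one bit per CPU, illustrative */

/* Sketch of the offline check: refusing to offline housekeeping CPU
 * this_cpu if the hctx would be left with no online housekeeping CPU
 * while an online isolated CPU is still mapped to it. */
static bool can_offline_hk_cpu(cpumask_t hctx_cpus, cpumask_t hk_mask,
                               cpumask_t online, int this_cpu)
{
        cpumask_t remaining  = hctx_cpus & ~(1ULL << this_cpu);
        cpumask_t online_hk  = remaining & hk_mask & online;
        cpumask_t online_iso = remaining & ~hk_mask & online;

        /* Fine if another online housekeeping CPU still serves the
         * hctx, or if no online isolated CPU depends on it. */
        return online_hk != 0 || online_iso == 0;
}
```

As the commit message says: offline the isolated CPUs first, then their housekeeping CPU may go.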
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
block/blk-mq.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3da2215b2912..8671f2170880 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3699,6 +3699,43 @@ static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
return data.has_rq;
}
+static bool blk_mq_hctx_can_offline_hk_cpu(struct blk_mq_hw_ctx *hctx,
+ unsigned int this_cpu)
+{
+ const struct cpumask *hk_mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
+
+ for (int i = 0; i < hctx->nr_ctx; i++) {
+ struct blk_mq_ctx *ctx = hctx->ctxs[i];
+
+ if (ctx->cpu == this_cpu)
+ continue;
+
+ /*
+ * Check if this context has at least one online
+ * housekeeping CPU; in this case the hardware context is
+ * usable.
+ */
+ if (cpumask_test_cpu(ctx->cpu, hk_mask) &&
+ cpu_online(ctx->cpu))
+ break;
+
+ /*
+ * The context doesn't have any online housekeeping CPUs,
+ * but there might be an online isolated CPU mapped to
+ * it.
+ */
+ if (cpu_is_offline(ctx->cpu))
+ continue;
+
+ pr_warn("%s: trying to offline hctx%d but there is still an online isolcpu CPU %d mapped to it\n",
+ hctx->queue->disk->disk_name,
+ hctx->queue_num, ctx->cpu);
+ return false;
+ }
+
+ return true;
+}
+
static bool blk_mq_hctx_has_online_cpu(struct blk_mq_hw_ctx *hctx,
unsigned int this_cpu)
{
@@ -3731,6 +3768,11 @@ static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
struct blk_mq_hw_ctx, cpuhp_online);
int ret = 0;
+ if (housekeeping_enabled(HK_TYPE_IO_QUEUE)) {
+ if (!blk_mq_hctx_can_offline_hk_cpu(hctx, cpu))
+ return -EINVAL;
+ }
+
if (!hctx->nr_ctx || blk_mq_hctx_has_online_cpu(hctx, cpu))
return 0;
--
2.51.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v10 12/13] genirq/affinity: Restrict managed IRQ affinity to housekeeping CPUs
2026-04-01 22:22 [PATCH v10 00/13] blk: honor isolcpus configuration Aaron Tomlin
` (10 preceding siblings ...)
2026-04-01 22:23 ` [PATCH v10 11/13] blk-mq: prevent offlining hk CPUs with associated online isolated CPUs Aaron Tomlin
@ 2026-04-01 22:23 ` Aaron Tomlin
2026-04-01 22:23 ` [PATCH v10 13/13] docs: add io_queue flag to isolcpus Aaron Tomlin
12 siblings, 0 replies; 22+ messages in thread
From: Aaron Tomlin @ 2026-04-01 22:23 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, mst
Cc: atomlin, aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, longman, chenridong, hare,
kch, ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
At present, the managed interrupt spreading algorithm distributes vectors
across all available CPUs within a given node or system. On systems
employing CPU isolation (e.g., "isolcpus=io_queue"), this behaviour
defeats the primary purpose of isolation by routing hardware interrupts
(such as NVMe completion queues) directly to isolated cores.
Update irq_create_affinity_masks() to respect the housekeeping CPU mask.
Introduce irq_spread_hk_filter() to intersect the natively calculated
affinity mask with the HK_TYPE_IO_QUEUE mask, thereby keeping managed
interrupts off isolated CPUs.
To ensure strict isolation whilst guaranteeing a valid routing destination:
1. Fallback mechanism: Should the initial spreading logic assign a
vector exclusively to isolated CPUs (resulting in an empty
intersection), the filter safely falls back to the system's
online housekeeping CPUs.
2. Hotplug safety: The fallback utilises data_race(cpu_online_mask)
instead of allocating a local cpumask snapshot. This circumvents
CONFIG_CPUMASK_OFFSTACK stack bloat hazards on high-core-count
systems. Furthermore, it prevents deadlocks with concurrent CPU
hotplug operations (e.g., during storage driver error recovery)
by eliminating the need to hold the CPU hotplug read lock.
3. Fast-path optimisation: The filtering logic is conditionally
executed only if housekeeping is enabled, thereby ensuring zero
overhead for standard configurations.
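The filter and its fallback described in points 1 and 2 can be sketched in userspace with a bitmask model (the typedef and function name below are illustrative, not the kernel's):

```c
#include <stdint.h>

typedef uint64_t cpumask_t;   /* one bit per CPU, illustrative */

/* Model of irq_spread_hk_filter(): clamp a computed vector affinity
 * to the housekeeping mask; if nothing survives the intersection
 * (the vector targeted only isolated CPUs), fall back to the online
 * housekeeping CPUs so the vector always has a valid destination. */
static cpumask_t spread_hk_filter(cpumask_t mask, cpumask_t hk_mask,
                                  cpumask_t online)
{
        cpumask_t filtered = mask & hk_mask;

        return filtered ? filtered : (hk_mask & online);
}
```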
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
kernel/irq/affinity.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 076a5ef1e306..dd9e7f5fbdec 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -8,6 +8,24 @@
#include <linux/slab.h>
#include <linux/cpu.h>
#include <linux/group_cpus.h>
+#include <linux/sched/isolation.h>
+
+/**
+ * irq_spread_hk_filter - Restrict an interrupt affinity mask to housekeeping CPUs
+ * @mask: The interrupt affinity mask to filter (in/out)
+ * @hk_mask: The system's housekeeping CPU mask
+ *
+ * Intersects @mask with @hk_mask to keep interrupts off isolated CPUs.
+ * If this intersection is empty (meaning all targeted CPUs were isolated),
+ * it falls back to the online housekeeping CPUs to guarantee a valid
+ * routing destination.
+ */
+static void irq_spread_hk_filter(struct cpumask *mask,
+ const struct cpumask *hk_mask)
+{
+ if (!cpumask_and(mask, mask, hk_mask))
+ cpumask_and(mask, hk_mask, data_race(cpu_online_mask));
+}
static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
{
@@ -27,6 +45,8 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
{
unsigned int affvecs, curvec, usedvecs, i;
struct irq_affinity_desc *masks = NULL;
+ const struct cpumask *hk_mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
+ bool hk_enabled = housekeeping_enabled(HK_TYPE_IO_QUEUE);
/*
* Determine the number of vectors which need interrupt affinities
@@ -83,8 +103,12 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
return NULL;
}
- for (int j = 0; j < nr_masks; j++)
+ for (int j = 0; j < nr_masks; j++) {
cpumask_copy(&masks[curvec + j].mask, &result[j]);
+ if (hk_enabled)
+ irq_spread_hk_filter(&masks[curvec + j].mask,
+ hk_mask);
+ }
kfree(result);
curvec += nr_masks;
--
2.51.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v10 13/13] docs: add io_queue flag to isolcpus
2026-04-01 22:22 [PATCH v10 00/13] blk: honor isolcpus configuration Aaron Tomlin
` (11 preceding siblings ...)
2026-04-01 22:23 ` [PATCH v10 12/13] genirq/affinity: Restrict managed IRQ affinity to housekeeping CPUs Aaron Tomlin
@ 2026-04-01 22:23 ` Aaron Tomlin
2026-04-03 2:30 ` Ming Lei
12 siblings, 1 reply; 22+ messages in thread
From: Aaron Tomlin @ 2026-04-01 22:23 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, mst
Cc: atomlin, aacraid, James.Bottomley, martin.petersen, liyihang9,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara,
chandrakanth.patil, sathya.prakash, sreekanth.reddy,
suganath-prabu.subramani, ranjan.kumar, jinpu.wang, tglx, mingo,
peterz, juri.lelli, vincent.guittot, akpm, maz, ruanjinjie,
bigeasy, yphbchou0911, wagi, frederic, longman, chenridong, hare,
kch, ming.lei, steve, sean, chjohnst, neelx, mproche, linux-block,
linux-kernel, virtualization, linux-nvme, linux-scsi,
megaraidlinux.pdl, mpi3mr-linuxdrv.pdl, MPT-FusionLinux.pdl
From: Daniel Wagner <wagi@kernel.org>
The io_queue flag informs multiqueue device drivers where to place
hardware queues. Document this new flag in the isolcpus
command-line argument description.
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
.../admin-guide/kernel-parameters.txt | 22 ++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 03a550630644..9ed7c3ecd158 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2816,7 +2816,6 @@ Kernel parameters
"number of CPUs in system - 1".
managed_irq
-
Isolate from being targeted by managed interrupts
which have an interrupt mask containing isolated
CPUs. The affinity of managed interrupts is
@@ -2839,6 +2838,27 @@ Kernel parameters
housekeeping CPUs has no influence on those
queues.
+ io_queue
+ Isolate from I/O queue work caused by multiqueue
+ device drivers. Restrict the placement of
+ queues to housekeeping CPUs only, ensuring that
+ all I/O work is processed by a housekeeping CPU.
+
+ The io_queue configuration takes precedence
+ over managed_irq. When io_queue is used,
+ managed_irq placement constraints have no
+ effect.
+
+ Note: Offlining housekeeping CPUs which serve
+ isolated CPUs will be rejected. Isolated CPUs
+ need to be offlined before offlining the
+ housekeeping CPUs.
+
+ Note: When an isolated CPU issues an I/O request,
+ it is forwarded to a housekeeping CPU. This will
+ trigger a software interrupt on the completion
+ path.
+
The format of <cpu-list> is described above.
iucv= [HW,NET]
--
2.51.0
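For illustration, a boot configuration combining the documented flags with a CPU list might look like the fragment below (the CPU numbering is hypothetical):

```
# Hypothetical kernel command-line fragment: CPUs 2-7 are isolated
# from I/O queue work; hardware queues are placed so that all I/O
# is processed by housekeeping CPUs 0-1.
isolcpus=io_queue,2-7
```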
^ permalink raw reply related [flat|nested] 22+ messages in thread

* Re: [PATCH v10 13/13] docs: add io_queue flag to isolcpus
2026-04-01 22:23 ` [PATCH v10 13/13] docs: add io_queue flag to isolcpus Aaron Tomlin
@ 2026-04-03 2:30 ` Ming Lei
0 siblings, 0 replies; 22+ messages in thread
From: Ming Lei @ 2026-04-03 2:30 UTC (permalink / raw)
To: Aaron Tomlin
Cc: axboe, kbusch, hch, sagi, mst, aacraid, James.Bottomley,
martin.petersen, liyihang9, kashyap.desai, sumit.saxena,
shivasharan.srikanteshwara, chandrakanth.patil, sathya.prakash,
sreekanth.reddy, suganath-prabu.subramani, ranjan.kumar,
jinpu.wang, tglx, mingo, peterz, juri.lelli, vincent.guittot,
akpm, maz, ruanjinjie, bigeasy, yphbchou0911, wagi, frederic,
longman, chenridong, hare, kch, steve, sean, chjohnst, neelx,
mproche, linux-block, linux-kernel, virtualization, linux-nvme,
linux-scsi, megaraidlinux.pdl, mpi3mr-linuxdrv.pdl,
MPT-FusionLinux.pdl
On Wed, Apr 01, 2026 at 06:23:12PM -0400, Aaron Tomlin wrote:
> From: Daniel Wagner <wagi@kernel.org>
>
> The io_queue flag informs multiqueue device drivers where to place
> hardware queues. Document this new flag in the isolcpus
> command-line argument description.
>
> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
> .../admin-guide/kernel-parameters.txt | 22 ++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 03a550630644..9ed7c3ecd158 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2816,7 +2816,6 @@ Kernel parameters
> "number of CPUs in system - 1".
>
> managed_irq
> -
> Isolate from being targeted by managed interrupts
> which have an interrupt mask containing isolated
> CPUs. The affinity of managed interrupts is
> @@ -2839,6 +2838,27 @@ Kernel parameters
> housekeeping CPUs has no influence on those
> queues.
>
> + io_queue
> + Isolate from I/O queue work caused by multiqueue
> + device drivers. Restrict the placement of
> + queues to housekeeping CPUs only, ensuring that
> + all I/O work is processed by a housekeeping CPU.
All of this can already be achieved with `managed_irq`. Please document what
`io_queue` solves that `managed_irq` cannot cover, so users know how to
choose between the two command-line options.
`Restrict the placement of queues to housekeeping CPUs only` looks stale;
please see patch 10, in which queues are spread across isolated CPUs too.
> +
> + The io_queue configuration takes precedence
> + over managed_irq. When io_queue is used,
> + managed_irq placement constraints have no
> + effect.
> +
> + Note: Offlining housekeeping CPUs which serve
> + isolated CPUs will be rejected. Isolated CPUs
> + need to be offlined before offlining the
> + housekeeping CPUs.
> +
> + Note: When an isolated CPU issues an I/O request,
> + it is forwarded to a housekeeping CPU. This will
> + trigger a software interrupt on the completion
> + path.
`io_queue` does not touch the I/O completion code path, and this is an
implementation detail, so I am not sure the above note is needed.
Thanks,
Ming
^ permalink raw reply [flat|nested] 22+ messages in thread