[PATCH v15 7/8] genirq/affinity: Restrict managed IRQ affinity to housekeeping CPUs

From: Aaron Tomlin <atomlin@atomlin.com>
To: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	mst@redhat.com
Cc: atomlin@atomlin.com, aacraid@microsemi.com,
	James.Bottomley@HansenPartnership.com,
	martin.petersen@oracle.com, liyihang9@h-partners.com,
	kashyap.desai@broadcom.com, sumit.saxena@broadcom.com,
	shivasharan.srikanteshwara@broadcom.com,
	chandrakanth.patil@broadcom.com, sathya.prakash@broadcom.com,
	sreekanth.reddy@broadcom.com,
	suganath-prabu.subramani@broadcom.com, ranjan.kumar@broadcom.com,
	jinpu.wang@cloud.ionos.com, tglx@kernel.org, mingo@redhat.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, akpm@linux-foundation.org,
	maz@kernel.org, ruanjinjie@huawei.com, bigeasy@linutronix.de,
	yphbchou0911@gmail.com, wagi@kernel.org, frederic@kernel.org,
	longman@redhat.com, chenridong@huawei.com, hare@suse.de,
	kch@nvidia.com, ming.lei@redhat.com, tom.leiming@gmail.com,
	steve@abita.co, sean@ashe.io, chjohnst@gmail.com, neelx@suse.com,
	mproche@gmail.com, nick.lange@gmail.com,
	marco.crivellari@suse.com, rishil1999@outlook.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v15 7/8] genirq/affinity: Restrict managed IRQ affinity to housekeeping CPUs
Date: Thu, 21 May 2026 19:29:55 -0400
Message-ID: <20260521232956.553287-8-atomlin@atomlin.com>
In-Reply-To: <20260521232956.553287-1-atomlin@atomlin.com>

At present, the managed interrupt spreading algorithm distributes vectors
across all available CPUs within a given node or system. On systems
employing CPU isolation (e.g., "isolcpus=io_queue"), this behaviour
defeats the primary purpose of isolation by routing hardware interrupts
(such as NVMe completion queues) directly to isolated cores.

Update irq_create_affinity_masks() to respect the housekeeping CPU mask.
By passing the HK_TYPE_IO_QUEUE mask directly to the topological
distribution function (group_mask_cpus_evenly()), we ensure that managed
interrupts are kept strictly off isolated CPUs.
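
As an illustration of the idea above, here is a minimal userspace sketch.
It is not the kernel implementation: toy_cpumask, toy_pick_mask() and
toy_group_mask_cpus() are hypothetical names, a 64-bit integer stands in
for struct cpumask, and the spreading is plain round-robin, whereas the
real helper is topology-aware. It only shows that restricting the input
mask keeps isolated CPUs out of every resulting vector mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit n set means CPU n is a member. */
typedef uint64_t toy_cpumask;

/*
 * Mirrors the mask choice made in the patch: use the housekeeping mask
 * when isolation is enabled, otherwise every possible CPU.
 */
static toy_cpumask toy_pick_mask(int hk_enabled, toy_cpumask hk_mask,
				 toy_cpumask possible_mask)
{
	return hk_enabled ? hk_mask : possible_mask;
}

/*
 * Deal the CPUs in 'allowed' round-robin into out[0..nvecs-1].
 * Returns how many masks ended up non-empty, which is
 * min(nvecs, number of CPUs in 'allowed').
 * Simplification (assumption): no NUMA or sibling awareness.
 */
static unsigned int toy_group_mask_cpus(unsigned int nvecs, toy_cpumask allowed,
					toy_cpumask *out)
{
	unsigned int cpu, i, ncpus = 0;

	assert(nvecs > 0);
	for (i = 0; i < nvecs; i++)
		out[i] = 0;

	for (cpu = 0; cpu < 64; cpu++) {
		toy_cpumask bit = (toy_cpumask)1 << cpu;

		if (!(allowed & bit))
			continue;
		out[ncpus % nvecs] |= bit;
		ncpus++;
	}
	return ncpus < nvecs ? ncpus : nvecs;
}
```

With eight CPUs of which CPUs 0-3 are housekeeping (allowed = 0x0F), two
vectors receive CPUs {0,2} and {1,3}; CPUs 4-7 appear in no mask. Asking
for six vectors fills only four masks, which is the shortfall case the
padding described below deals with.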

This patch additionally addresses the architectural constraints of
restricted vector distribution:

    1.  Vector Limits and Overrides: Update irq_calc_affinity_vectors()
        to bound the number of affinity-managed vectors to the weight of
        the housekeeping mask, with the reserved pre/post vectors added
        on top. The bound also applies to drivers that provide a
        calc_sets() callback, preventing them from wasting memory on
        dead hardware queues whose interrupts must not be routed to
        isolated CPUs.

    2.  Multi-set Alignment and Leak Prevention: When isolation
        constraints result in fewer available masks than requested
        vectors for a given set, the remaining vector slots are padded
        with the housekeeping mask. This replaces the historical
        irq_default_affinity padding, ensuring excess managed queues do
        not leak interrupts onto isolated CPUs.

    3.  Minimum Vector Safety Net: To prevent fatal -ENOSPC device probe
        aborts on heavily isolated systems (where the housekeeping CPU
        count might be lower than a device's structural minimum), the
        final vector calculation is safeguarded to never drop below
        minvec. Queues will safely share the available housekeeping CPUs
        instead of failing the probe.

    4.  No Added Overhead: The housekeeping mask is referenced through a
        const pointer rather than copied, so no temporary mask is
        allocated (e.g., via alloc_cpumask_var()) and no extra bitwise
        operations are performed. When CPU isolation is disabled, the
        code falls back to cpu_possible_mask, so standard configurations
        pay nothing beyond the housekeeping_enabled() check.
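
The arithmetic behind points 1 to 3 can be sketched in userspace as
follows. This is a simplified model under stated assumptions, not the
kernel code: the toy_* names are hypothetical, the housekeeping state is
passed in as plain parameters instead of being queried, and a 64-bit
integer stands in for struct cpumask.

```c
#include <assert.h>

/* Hypothetical stand-in for struct cpumask. */
typedef unsigned long long toy_cpumask;

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Model of the patched vector-count calculation.
 * resv is the number of reserved (pre + post) vectors. hk_weight and
 * possible_weight are CPU counts supplied by the caller (assumption:
 * the kernel derives them from the respective masks).
 */
static unsigned int toy_calc_affinity_vectors(unsigned int minvec,
					      unsigned int maxvec,
					      unsigned int resv,
					      int hk_enabled,
					      unsigned int hk_weight,
					      int has_calc_sets,
					      unsigned int possible_weight)
{
	unsigned int set_vecs;

	assert(minvec <= maxvec);
	if (resv > minvec)
		return 0;

	if (hk_enabled)
		set_vecs = hk_weight;		/* point 1: overrides calc_sets */
	else if (has_calc_sets)
		set_vecs = maxvec - resv;
	else
		set_vecs = possible_weight;

	/* point 3: never drop below the device's minimum */
	return max_u(minvec, resv + min_u(set_vecs, maxvec - resv));
}

/*
 * Point 2: slots nr_masks..this_vecs-1 of one set received no mask from
 * the spreading step, so fill them with the padding mask.
 */
static void toy_pad_set(toy_cpumask *masks, unsigned int nr_masks,
			unsigned int this_vecs, toy_cpumask pad)
{
	unsigned int j;

	for (j = nr_masks; j < this_vecs; j++)
		masks[j] = pad;
}
```

For example, with four housekeeping CPUs, one reserved vector and
maxvec = 32, the model yields 5 vectors. With only two housekeeping CPUs
but minvec = 8, it yields 8, and the six surplus slots would be padded
with the full housekeeping mask.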

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 kernel/irq/affinity.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 78f2418a8925..dade92f8b4b3 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -8,6 +8,7 @@
 #include <linux/slab.h>
 #include <linux/cpu.h>
 #include <linux/group_cpus.h>
+#include <linux/sched/isolation.h>
 
 static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
 {
@@ -25,8 +26,10 @@ static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
 struct irq_affinity_desc *
 irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
 {
-	unsigned int affvecs, curvec, usedvecs, i;
+	unsigned int affvecs, curvec, usedvecs, i, j;
 	struct irq_affinity_desc *masks = NULL;
+	const struct cpumask *hk_mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
+	bool hk_enabled = housekeeping_enabled(HK_TYPE_IO_QUEUE);
 
 	/*
 	 * Determine the number of vectors which need interrupt affinities
@@ -70,19 +73,29 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
 	 */
 	for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
 		unsigned int nr_masks, this_vecs = affd->set_size[i];
-		struct cpumask *result = group_cpus_evenly(this_vecs, &nr_masks);
+		struct cpumask *result;
+		const struct cpumask *mask;
 
+		if (hk_enabled)
+			mask = hk_mask;
+		else
+			mask = cpu_possible_mask;
+
+		result = group_mask_cpus_evenly(this_vecs, mask,
+						&nr_masks);
 		if (!result) {
 			kfree(masks);
 			return NULL;
 		}
-
-		for (int j = 0; j < nr_masks; j++)
+		for (j = 0; j < nr_masks; j++)
 			cpumask_copy(&masks[curvec + j].mask, &result[j]);
+		for (j = nr_masks; j < this_vecs; j++)
+			cpumask_copy(&masks[curvec + j].mask, mask);
+
 		kfree(result);
 
-		curvec += nr_masks;
-		usedvecs += nr_masks;
+		curvec += this_vecs;
+		usedvecs += this_vecs;
 	}
 
 	/* Fill out vectors at the end that don't need affinity */
@@ -115,10 +128,12 @@ unsigned int irq_calc_affinity_vectors(unsigned int minvec, unsigned int maxvec,
 	if (resv > minvec)
 		return 0;
 
-	if (affd->calc_sets)
+	if (housekeeping_enabled(HK_TYPE_IO_QUEUE))
+		set_vecs = cpumask_weight(housekeeping_cpumask(HK_TYPE_IO_QUEUE));
+	else if (affd->calc_sets)
 		set_vecs = maxvec - resv;
 	else
 		set_vecs = cpumask_weight(cpu_possible_mask);
 
-	return resv + min(set_vecs, maxvec - resv);
+	return max(minvec, resv + min(set_vecs, maxvec - resv));
 }
-- 
2.51.0

Thread overview: 11+ messages
2026-05-21 23:29 [PATCH v15 0/8] blk: honor isolcpus configuration Aaron Tomlin
2026-05-21 23:29 ` [PATCH v15 1/8] scsi: aacraid: use block layer helpers to calculate num of queues Aaron Tomlin
2026-05-21 23:29 ` [PATCH v15 2/8] lib/group_cpus: remove dead !SMP code Aaron Tomlin
2026-05-21 23:29 ` [PATCH v15 3/8] lib/group_cpus: Add group_mask_cpus_evenly() Aaron Tomlin
2026-05-21 23:29 ` [PATCH v15 4/8] isolation: Introduce io_queue isolcpus type Aaron Tomlin
2026-05-21 23:29 ` [PATCH v15 5/8] blk-mq: use hk cpus only when isolcpus=io_queue is enabled Aaron Tomlin
2026-05-21 23:29 ` [PATCH v15 6/8] blk-mq: prevent offlining hk CPUs with associated online isolated CPUs Aaron Tomlin
2026-05-21 23:29 ` Aaron Tomlin [this message]
2026-05-21 23:29 ` [PATCH v15 8/8] docs: add io_queue flag to isolcpus Aaron Tomlin
2026-05-26 16:05 ` [PATCH v15 0/8] blk: honor isolcpus configuration Daniel Wagner
2026-05-26 22:02   ` Aaron Tomlin
