From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Aaron Tomlin <atomlin@atomlin.com>
Cc: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
mst@redhat.com, aacraid@microsemi.com,
James.Bottomley@hansenpartnership.com,
martin.petersen@oracle.com, liyihang9@h-partners.com,
kashyap.desai@broadcom.com, sumit.saxena@broadcom.com,
shivasharan.srikanteshwara@broadcom.com,
chandrakanth.patil@broadcom.com, sathya.prakash@broadcom.com,
sreekanth.reddy@broadcom.com,
suganath-prabu.subramani@broadcom.com, ranjan.kumar@broadcom.com,
jinpu.wang@cloud.ionos.com, tglx@kernel.org, mingo@redhat.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, akpm@linux-foundation.org,
maz@kernel.org, ruanjinjie@huawei.com, yphbchou0911@gmail.com,
wagi@kernel.org, frederic@kernel.org, longman@redhat.com,
chenridong@huawei.com, hare@suse.de, kch@nvidia.com,
ming.lei@redhat.com, steve@abita.co, sean@ashe.io,
chjohnst@gmail.com, neelx@suse.com, mproche@gmail.com,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux.dev, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, megaraidlinux.pdl@broadcom.com,
mpi3mr-linuxdrv.pdl@broadcom.com,
MPT-FusionLinux.pdl@broadcom.com
Subject: Re: [PATCH v9 09/13] isolation: Introduce io_queue isolcpus type
Date: Thu, 2 Apr 2026 11:09:40 +0200 [thread overview]
Message-ID: <20260402090940.5j0WmVX_@linutronix.de> (raw)
In-Reply-To: <c7phvlvohdn2ksc2jymxk5foolwlqaqq2jzcdv7oic4uzomh3j@yjimbwcnnst3>
On 2026-04-01 16:58:22 [-0400], Aaron Tomlin wrote:
> Hi Sebastian,
Hi,
> Thank you for taking the time to document the "managed_irq" behaviour; it
> is immensely helpful. You raise a highly pertinent point regarding the
> potential proliferation of "isolcpus=" flags. It is certainly a situation
> that must be managed carefully to prevent every subsystem from demanding
> its own bit.
>
> To clarify the reasoning behind introducing "io_queue" rather than strictly
> relying on managed_irq:
>
> The managed_irq flag belongs firmly to the interrupt subsystem. It dictates
> whether a CPU is eligible to receive hardware interrupts whose affinity is
> managed by the kernel. Whilst many modern block drivers use managed IRQs,
> the block layer multi-queue mapping encompasses far more than just
> interrupt routing. It maps logical queues to CPUs to handle I/O submission,
> software queues, and crucially, poll queues, which do not utilise
> interrupts at all. Furthermore, there are specific drivers that do not use
> the managed IRQ infrastructure but still rely on the block layer for queue
> distribution.
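As an aside, the distinction drawn above between interrupt routing and queue
mapping can be sketched in a toy model (Python illustration only, not kernel
code; the queue counts are invented):

```python
# Toy model of blk-mq map types: a device exposes separate queue maps
# for interrupt-driven ("default") and polled ("poll") I/O.  Only the
# default map corresponds to hardware IRQs; the poll map still assigns
# CPUs to queues although no interrupt is involved at all.

def spread_queues(nr_queues, cpus):
    """Round-robin the given CPUs onto queues; queue -> list of CPUs."""
    qmap = {q: [] for q in range(nr_queues)}
    for i, cpu in enumerate(cpus):
        qmap[i % nr_queues].append(cpu)
    return qmap

cpus = list(range(8))
maps = {
    "default": spread_queues(4, cpus),  # backed by (possibly managed) IRQs
    "poll":    spread_queues(2, cpus),  # no IRQs: completions are polled
}

# Masking IRQ delivery (managed_irq) only affects the "default" map;
# the "poll" map would still place submission/poll work on every CPU.
print(maps["poll"])
```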
Could you tell the block layer which queue maps to which CPU at the
/sys/block/$$/mq/ level? Then you would have one queue going to one CPU.
Then the driver could request one or more interrupts, managed or not. For
managed interrupts you could specify a CPU mask which you desire to occupy.
You have the cases where
- you have more queues than CPUs
  - use all of them
  - use fewer
- fewer queues than CPUs
  - a queue mapped to more than one CPU, in case one goes down or
    becomes unavailable
  - a queue mapped to one CPU
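A sketch of those two cases (a toy model for illustration only; the kernel's
actual spreading lives in lib/group_cpus.c and is more involved):

```python
def assign_queues(nr_queues, cpus):
    """Toy CPU -> queue assignment covering both cases above."""
    # Round-robin CPUs over queues: with fewer queues than CPUs each
    # queue serves several CPUs, so a queue keeps a working CPU even
    # if one of them goes down or becomes unavailable.
    cpu_to_q = {cpu: i % nr_queues for i, cpu in enumerate(cpus)}
    # With more queues than CPUs the surplus queues end up unused
    # (the "use fewer" variant); "use all of them" would instead give
    # some CPUs more than one queue.
    used = set(cpu_to_q.values())
    unused = sorted(q for q in range(nr_queues) if q not in used)
    return cpu_to_q, unused

# More queues (8) than CPUs (4): queues 4..7 stay unused.
print(assign_queues(8, [0, 1, 2, 3]))
# Fewer queues (2) than CPUs (8): each queue serves four CPUs.
print(assign_queues(2, list(range(8))))
```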
Ideally you solve this at one level so that the device(s) can request
fewer queues than CPUs if told to, without patching each and every driver.
This should give you the freedom to isolate CPUs and to decide at boot time
which CPUs get I/O queues assigned. At run time you can tell which
queues go to which CPUs. If you shut down a queue, the interrupt remains
but does not get any I/O requests assigned, so no problem. If the CPU
goes down, the same applies.
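A minimal sketch of that idea (hypothetical; this is not an existing kernel
or sysfs interface): the queue-to-CPU map is filtered against a housekeeping
mask at one level, and a queue whose CPUs all disappear simply stops
receiving I/O while its interrupt stays allocated.

```python
def restrict_to_housekeeping(qmap, isolated):
    """Drop isolated CPUs from each queue's CPU list.

    A queue left with an empty list keeps its interrupt (nothing to
    tear down in the driver) but never gets I/O requests assigned --
    the same outcome as its last CPU going offline.
    """
    return {q: [c for c in cpus if c not in isolated]
            for q, cpus in qmap.items()}

qmap = {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
print(restrict_to_housekeeping(qmap, isolated={4, 5, 6, 7}))
```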
I am trying to come up with a design here which I haven't seen so far.
But I might be late to the party and everyone else may already be fully
aware of it.
> If managed_irq were solely relied upon, the IRQ subsystem would
> successfully keep hardware interrupts off the isolated CPUs, but the block
Managed IRQs can't be influenced by userland; the CPUs are
distributed automatically.
> layer would still blindly map polling queues or non-managed queues to those
> same isolated CPUs. This would force isolated CPUs to process I/O
> submissions or handle polling tasks, thereby breaking the strict isolation.
>
> Regarding the point about the networking subsystem, it is a very valid
> comparison. If the networking layer wishes to respect isolcpus in the
> future, adding a net flag would indeed exacerbate the bit proliferation.
Networking could also have different cases, like adding an RX filter and
having the HW put packets matching it into a dedicated queue. But also in
this case I would like to have the freedom to decide which isolated
CPUs should receive interrupts/traffic and which shouldn't.
> For the present time, retaining io_queue seems the most prudent approach to
> ensure that block queue mapping remains semantically distinct from
> interrupt delivery. This provides an immediate and clean architectural
> boundary. However, if the consensus amongst the maintainers suggests that
> this is too granular, alternative approaches could certainly be considered
> for the future. For instance, a broader, more generic flag could be
> introduced to encompass both block and future networking queue mappings.
> Alternatively, if semantic conflation is deemed acceptable, the existing
> managed_irq housekeeping mask could simply be overloaded within the block
> layer to restrict all queue mappings.
>
> Keeping the current separation appears to be the cleanest solution for this
> series, but your thoughts, and those of the wider community, on potentially
> migrating to a consolidated generic flag in the future would be very much
> welcomed.
I just don't like introducing yet another boot argument, making it a
boot-time constraint, while in my naive view this could to some degree be
managed via sysfs as suggested above.
>
> Kind regards,
Sebastian
Thread overview: 29+ messages
2026-03-30 22:10 [PATCH v9 00/13] blk: honor isolcpus configuration Aaron Tomlin
2026-03-30 22:10 ` [PATCH v9 01/13] scsi: aacraid: use block layer helpers to calculate num of queues Aaron Tomlin
2026-03-30 22:10 ` [PATCH v9 02/13] lib/group_cpus: remove dead !SMP code Aaron Tomlin
2026-04-01 12:29 ` Sebastian Andrzej Siewior
2026-04-01 19:31 ` Aaron Tomlin
2026-03-30 22:10 ` [PATCH v9 03/13] lib/group_cpus: Add group_mask_cpus_evenly() Aaron Tomlin
2026-03-30 22:10 ` [PATCH v9 04/13] genirq/affinity: Add cpumask to struct irq_affinity Aaron Tomlin
2026-03-30 22:10 ` [PATCH v9 05/13] blk-mq: add blk_mq_{online|possible}_queue_affinity Aaron Tomlin
2026-03-30 22:10 ` [PATCH v9 06/13] nvme-pci: use block layer helpers to constrain queue affinity Aaron Tomlin
2026-03-30 22:10 ` [PATCH v9 07/13] scsi: Use " Aaron Tomlin
2026-03-30 22:10 ` [PATCH v9 08/13] virtio: blk/scsi: use " Aaron Tomlin
2026-03-30 22:10 ` [PATCH v9 09/13] isolation: Introduce io_queue isolcpus type Aaron Tomlin
2026-04-01 12:49 ` Sebastian Andrzej Siewior
2026-04-01 19:05 ` Waiman Long
2026-04-02 7:58 ` Sebastian Andrzej Siewior
2026-04-03 1:54 ` Waiman Long
2026-04-01 20:58 ` Aaron Tomlin
2026-04-02 9:09 ` Sebastian Andrzej Siewior [this message]
2026-04-03 0:50 ` Aaron Tomlin
2026-04-03 1:20 ` Ming Lei
2026-03-30 22:10 ` [PATCH v9 10/13] blk-mq: use hk cpus only when isolcpus=io_queue is enabled Aaron Tomlin
2026-03-31 23:05 ` Keith Busch
2026-04-01 17:16 ` Aaron Tomlin
2026-04-03 1:43 ` Ming Lei
2026-03-30 22:10 ` [PATCH v9 11/13] blk-mq: prevent offlining hk CPUs with associated online isolated CPUs Aaron Tomlin
2026-03-30 22:10 ` [PATCH v9 12/13] genirq/affinity: Restrict managed IRQ affinity to housekeeping CPUs Aaron Tomlin
2026-03-30 22:10 ` [PATCH v9 13/13] docs: add io_queue flag to isolcpus Aaron Tomlin
2026-03-31 1:01 ` [PATCH v9 00/13] blk: honor isolcpus configuration Ming Lei
2026-03-31 13:38 ` Aaron Tomlin