From: Florian Bezdeka <florian.bezdeka@siemens.com>
To: "Thomas Gleixner" <tglx@linutronix.de>,
"bigeasy@linutronix.de" <bigeasy@linutronix.de>
Cc: "Preclik, Tobias" <tobias.preclik@siemens.com>,
"Frederic Weisbecker" <frederic@kernel.org>,
"linux-rt-users@vger.kernel.org" <linux-rt-users@vger.kernel.org>,
"Kiszka, Jan" <jan.kiszka@siemens.com>,
"Waiman Long" <longman@redhat.com>,
"Gabriele Monaco" <gmonaco@redhat.com>
Subject: Re: Control of IRQ Affinities from Userspace
Date: Thu, 27 Nov 2025 15:52:17 +0100
Message-ID: <DEJK91DAS7P0.1UN9SHE15VZRK@siemens.com>
In-Reply-To: <877bvchafm.ffs@tglx>

On Wed Nov 26, 2025 at 8:15 PM CET, Thomas Gleixner wrote:
>>
>> Are there any strong reasons for not exporting the default affinity from
>> the IRQ core? Read-only would be enough.
>
> Default affinity is yet another piece which is disconnected from all the
> other isolation mechanics. So we are not exporting it for some quick and
> dirty hack. You can do that of course in your own kernel, but please
> don't send the result to my inbox :)

Well, that's the reason for this discussion here: upstream first.

>
>> In addition I'm quite sure that the housekeeping infrastructure would
>> not help in the area of networking as nobody (except one driver) is
>> based on the managed IRQ API.
>
> Managed interrupts are not user steerable and due to their strict
> CPU/CPUgroup relationship they are not required to be steerable. NVME &
> al have a strict command/response on the same queue scheme, which is
> obviously most efficient when you have per CPU queues. The nice thing
> about that concept is that the queues are only active (and having
> interrupts) when an application on a given CPU issues a R/W operation.
>
> Networking does not have that by default as their strategy of routing
> packets to queues is way more complicated and can be affected by
> hardware filtering etc.
>
> But why can't housekeeping help in general and why do you want to hack
> around the problem in random drivers?

No, that's not what I want. I'm highly interested in solving this
problem properly; at the moment I'm just trying to collect all the
relevant information. I'm quite sure there are still aspects that I
have not taken into account yet.

>
> What's wrong with providing a new irq_set_affinity_hint_xxx() variant
> which takes an additional queue number as argument and let that do:
>
>	if (isolate) {
>		weight = cpumask_weight(housekeeping);
>		qnr %= weight;
>		cpu = cpumask_nth(qnr, housekeeping);
>		mask = cpumask_of(cpu);
>	}
>	return irq_set_affinity_hint(mask);
>
> or something like that. From a quick glance over the drivers this could
> maybe be based on a queue number alone as most drivers do:
>
>	mask = cpumask_of(qnr % num_online_cpus());
>
> or something daft like that, which is obviously broken, but who cares.
> So that would become:
>
>	if (isolate) {
>		weight = cpumask_weight(housekeeping);
>		qnr %= weight;
>		cpu = cpumask_nth(qnr, housekeeping);
>	} else {
>		guard(cpus_read_lock)();
>		qnr %= num_online_cpus();
>		cpu = cpumask_nth(qnr, cpu_online_mask);
>	}
>
>	return irq_set_affinity_hint(cpumask_of(cpu));
>
> See?

That is close to an RFC that I was already preparing, until I realized
that it would only solve one part of the problem.

Part one: Get rid of unwanted IRQ traffic on my isolated cores. That
part would be covered, as the balancing would be limited to the !RT
cores. Fine.

Part two: If the device is actually being used by an RT application and
allowed to run on isolated cores (userspace has properly configured
that upfront), we get the opposite after loading a BPF program: the
IRQs are now configured wrongly.
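
For context, the userspace configuration in question goes through the
procfs affinity interface; the IRQ number and CPU list below are purely
illustrative:

```sh
# Pin IRQ 42 (say, a queue IRQ of the RT application's NIC) to the
# isolated CPUs 2-3 -- this is what userspace sets up upfront, and
# what a later in-kernel re-spread would silently override.
echo 2-3 > /proc/irq/42/smp_affinity_list
cat /proc/irq/42/effective_affinity_list
```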
>
> That lets userspace still override the hint but does at least initial
> spreading within the housekeeping mask. Which ever mask that is out of
> the zoo of masks you best debate with Frederic. :)
>

Choosing the right mask is key. The right mask depends on how the
device is used. Some devices (or maybe even just some queues) should be
limited to !RT CPUs, while others should explicitly run within an
isolated cpuset.

If I'm getting this right, the work from Frederic will bring in the
"isolated" flag for cpusets. That seems like great preparatory work. In
addition, we would need something like a mapping between devices (or
queues, maybe indirectly via IRQs) and cgroups/cpusets.

Have there already been any thoughts about a cpuset.interrupts API, or
something similar?

Best regards,
Florian