* cpu_rmap maps CPUs to wrong interrupts after reprogramming affinities
@ 2024-12-13 18:18 Caleb Sander
2024-12-19 1:23 ` Jakub Kicinski
2024-12-19 22:04 ` Ben Hutchings
0 siblings, 2 replies; 3+ messages in thread
From: Caleb Sander @ 2024-12-13 18:18 UTC (permalink / raw)
To: David Miller, Tom Herbert, Thomas Gleixner, Eli Cohen,
Ben Hutchings, Jakub Kicinski, Eric Dumazet, Paolo Abeni
Cc: netdev, Linux Kernel Mailing List
Hi netdev,
While testing ARFS, we found set_rps_cpu() was calling
ndo_rx_flow_steer() with an RX queue that was not affinitized to the
desired CPU. The issue occurred only after modifying interrupt
affinities. It looks to be a bug in cpu_rmap, where cpu_rmap_update()
can leave CPUs mapped to interrupts which are no longer the most
closely affinitized to them.
Here is the simplest scenario:
1. A network device has 2 IRQs, 1 and 2. Initially only CPU A is
available to process the network device. So both IRQs 1 and 2 are
affinitized to CPU A.
rx_cpu_rmap maps CPU A to IRQ 2 (assuming the affinity of IRQ 2 was
set after IRQ 1)
2. CPU B becomes available to process the network device. So IRQ 2's
affinity is changed from CPU A to CPU B.
cpu_rmap_update() is called for IRQ 2 with its new affinity (CPU B).
It maps CPU B to IRQ 2. CPU A remains mapped to IRQ 2, though with a
higher distance.
rx_cpu_rmap now maps both CPUs A and B to IRQ 2. Any traffic meant to
be steered to CPU A will end up being processed in IRQ 2 on CPU B
instead, even though there is still an IRQ (1) affinitized to CPU A.
If IRQ 1 had been affinitized to CPU A and IRQ 2 to CPU B initially,
the cpu_rmap would have correctly mapped CPU A to IRQ 1 and CPU B to
IRQ 2. So the state of the cpu_rmap depends on the history of the IRQ
affinities, not just the current IRQ affinities.
This behavior was surprising to me, but perhaps it's working as
intended. It seems to be a limitation of struct cpu_rmap: it stores
only one IRQ with the lowest "distance" for each CPU, even if there
are other IRQs of equivalent or higher distance. When an IRQ's
affinity changes, each CPU currently affinitized to it has its
distance invalidated, but its new closest IRQ is selected based on
other CPUs' closest IRQs, ignoring existing IRQs that may be
affinitized to that CPU.
I can see a few possible ways to address this:
- Store the current affinity masks for all the IRQs in struct cpu_rmap
so the next closest IRQ can be computed when a CPU's closest IRQ is
invalidated. This would significantly increase the size of struct
cpu_rmap.
- Store all candidate IRQs and their distances for each CPU in struct
cpu_rmap so the next closest IRQ can be computed when a CPU's closest
IRQ is invalidated. Again, this would significantly increase the size
of struct cpu_rmap.
- Re-fetch the affinity masks of all the IRQs from the irq layer
whenever one IRQ's affinity changes so the next closest IRQ can be
computed for each invalidated CPU. This would avoid using any
additional memory, but would add a lot of calls into the irq layer.
- Work around the cpu_rmap behavior by having userspace always write
to all IRQs' affinity masks when changing the affinity of any one.
This is probably the simplest solution, but I worry that other
userspace applications would hit the same unexpected behavior.
Let me know whether you see this behavior as a bug in cpu_rmap or
something that userspace should work around. If you do think it's a
cpu_rmap bug, how would you like to fix it?
Thanks,
Caleb
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: cpu_rmap maps CPUs to wrong interrupts after reprogramming affinities
2024-12-13 18:18 cpu_rmap maps CPUs to wrong interrupts after reprogramming affinities Caleb Sander
@ 2024-12-19 1:23 ` Jakub Kicinski
2024-12-19 22:04 ` Ben Hutchings
1 sibling, 0 replies; 3+ messages in thread
From: Jakub Kicinski @ 2024-12-19 1:23 UTC (permalink / raw)
To: Caleb Sander
Cc: David Miller, Tom Herbert, Thomas Gleixner, Eli Cohen,
Ben Hutchings, Eric Dumazet, Paolo Abeni, netdev,
Linux Kernel Mailing List
On Fri, 13 Dec 2024 10:18:30 -0800 Caleb Sander wrote:
> I can see a few possible ways to address this:
> - Store the current affinity masks for all the IRQs in struct cpu_rmap
> so the next closest IRQ can be computed when a CPU's closest IRQ is
> invalidated. This would significantly increase the size of struct
> cpu_rmap.
Ahmed is actively working on something along those lines:
https://lore.kernel.org/all/20241218165843.744647-1-ahmed.zaki@intel.com/
hopefully we can leverage it?
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: cpu_rmap maps CPUs to wrong interrupts after reprogramming affinities
2024-12-13 18:18 cpu_rmap maps CPUs to wrong interrupts after reprogramming affinities Caleb Sander
2024-12-19 1:23 ` Jakub Kicinski
@ 2024-12-19 22:04 ` Ben Hutchings
1 sibling, 0 replies; 3+ messages in thread
From: Ben Hutchings @ 2024-12-19 22:04 UTC (permalink / raw)
To: Caleb Sander, David Miller, Tom Herbert, Thomas Gleixner,
Eli Cohen, Jakub Kicinski, Eric Dumazet, Paolo Abeni
Cc: netdev, Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 1166 bytes --]
On Fri, 2024-12-13 at 10:18 -0800, Caleb Sander wrote:
> Hi netdev,
> While testing ARFS, we found set_rps_cpu() was calling
> ndo_rx_flow_steer() with an RX queue that was not affinitized to the
> desired CPU. The issue occurred only after modifying interrupt
> affinities. It looks to be a bug in cpu_rmap, where cpu_rmap_update()
> can leave CPUs mapped to interrupts which are no longer the most
> closely affinitized to them.
>
> Here is the simplest scenario:
> 1. A network device has 2 IRQs, 1 and 2. Initially only CPU A is
> available to process the network device. So both IRQs 1 and 2 are
> affinitized to CPU A.
[...]
This seems like a misconfiguration: there shouldn't be more RX queues
than CPUs to handle them. I probably never considered it when
implementing cpu_rmap. Still, I agree that this could happen as a
transitory state, and the reverse-map ought to become sensible once all
the RX queues are assigned to different CPUs.
But I haven't looked at that code for over a decade, so I'm probably
not the right person to address this now.
Ben.
--
Ben Hutchings
The world is coming to an end. Please log off.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-12-19 22:25 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-13 18:18 cpu_rmap maps CPUs to wrong interrupts after reprogramming affinities Caleb Sander
2024-12-19 1:23 ` Jakub Kicinski
2024-12-19 22:04 ` Ben Hutchings
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).