* cpu_rmap maps CPUs to wrong interrupts after reprogramming affinities
@ 2024-12-13 18:18 Caleb Sander
  2024-12-19  1:23 ` Jakub Kicinski
  2024-12-19 22:04 ` Ben Hutchings
From: Caleb Sander @ 2024-12-13 18:18 UTC
  To: David Miller, Tom Herbert, Thomas Gleixner, Eli Cohen,
	Ben Hutchings, Jakub Kicinski, Eric Dumazet, Paolo Abeni
  Cc: netdev, Linux Kernel Mailing List

Hi netdev,
While testing ARFS, we found set_rps_cpu() was calling
ndo_rx_flow_steer() with an RX queue that was not affinitized to the
desired CPU. The issue occurred only after modifying interrupt
affinities. It appears to be a bug in cpu_rmap: cpu_rmap_update()
can leave CPUs mapped to interrupts that are no longer the ones most
closely affinitized to them.

Here is the simplest scenario (a code sketch of the same sequence
follows below):
1. A network device has 2 IRQs, 1 and 2. Initially only CPU A is
available to process the network device. So both IRQs 1 and 2 are
affinitized to CPU A.
rx_cpu_rmap maps CPU A to IRQ 2 (assuming IRQ 2's affinity was set
after IRQ 1's).
2. CPU B becomes available to process the network device. So IRQ 2's
affinity is changed from CPU A to CPU B.
cpu_rmap_update() is called for IRQ 2 with its new affinity (CPU B).
It maps CPU B to IRQ 2. CPU A remains mapped to IRQ 2, though with a
higher distance.
rx_cpu_rmap now maps both CPUs A and B to IRQ 2. Any traffic meant to
be steered to CPU A will instead be processed by IRQ 2 on CPU B, even
though there is still an IRQ (1) affinitized to CPU A.
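
To make the sequence concrete, here is a rough, untested sketch that
drives it through the cpu_rmap API directly, bypassing the
irq_cpu_rmap glue. CPUs 0 and 1 stand in for CPUs A and B, the two
rmap entries stand in for IRQs 1 and 2, and cpu_rmap_repro() is just
a made-up name for illustration (it assumes at least 2 online CPUs):

#include <linux/cpu_rmap.h>
#include <linux/cpumask.h>
#include <linux/printk.h>
#include <linux/slab.h>

static void cpu_rmap_repro(void)
{
	struct cpu_rmap *rmap = alloc_cpu_rmap(2, GFP_KERNEL);
	int irq1, irq2;

	if (!rmap)
		return;

	irq1 = cpu_rmap_add(rmap, NULL);	/* stands in for IRQ 1 */
	irq2 = cpu_rmap_add(rmap, NULL);	/* stands in for IRQ 2 */

	/* Step 1: both IRQs affinitized to CPU 0 ("CPU A"), IRQ 2 last.
	 * The most recent update wins CPU 0's slot at distance 0.
	 */
	cpu_rmap_update(rmap, irq1, cpumask_of(0));
	cpu_rmap_update(rmap, irq2, cpumask_of(0));

	/* Step 2: IRQ 2's affinity moves to CPU 1 ("CPU B"). */
	cpu_rmap_update(rmap, irq2, cpumask_of(1));

	/*
	 * Expected: CPU 0 -> irq1, CPU 1 -> irq2.
	 * Observed: CPU 0 still maps to irq2 (now at a higher distance),
	 * because cpu_rmap_update() never reconsiders irq1's unchanged
	 * affinity.
	 */
	pr_info("cpu0 -> entry %u, cpu1 -> entry %u\n",
		cpu_rmap_lookup_index(rmap, 0),
		cpu_rmap_lookup_index(rmap, 1));

	cpu_rmap_put(rmap);
}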

If IRQ 1 had been affinitized to CPU A and IRQ 2 to CPU B initially,
the cpu_rmap would have correctly mapped CPU A to IRQ 1 and CPU B to
IRQ 2. So the state of the cpu_rmap depends on the history of the IRQ
affinities, not just the current IRQ affinities.

This behavior was surprising to me, but perhaps it's working as
intended. It seems to be a limitation of struct cpu_rmap: it stores
only one IRQ with the lowest "distance" for each CPU, even if there
are other IRQs at an equivalent or higher distance. When an IRQ's
affinity changes, each CPU whose closest IRQ it was has its distance
invalidated, but the replacement closest IRQ is then chosen only from
other CPUs' closest IRQs, ignoring IRQs that are still directly
affinitized to the invalidated CPU.
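
For reference, this is roughly the per-CPU state struct cpu_rmap
keeps (paraphrased from include/linux/cpu_rmap.h with other fields
trimmed and a made-up name; see the header for the real definition).
Each CPU gets exactly one (index, dist) slot, so an IRQ that loses
the slot is simply forgotten:

#include <linux/types.h>

/* Paraphrase of struct cpu_rmap's per-CPU record, other fields omitted. */
struct cpu_rmap_sketch {
	u16 size;		/* number of reverse-mapped objects (IRQs) */
	struct {
		u16 index;	/* the single "nearest" object recorded for this CPU */
		u16 dist;	/* topological distance to it; 0 = directly affinitized */
	} near[];		/* indexed by CPU number */
};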

I can see a few possible ways to address this:
- Store the current affinity masks for all the IRQs in struct cpu_rmap
so the next closest IRQ can be computed when a CPU's closest IRQ is
invalidated. This would significantly increase the size of struct
cpu_rmap.
- Store all candidate IRQs and their distances for each CPU in struct
cpu_rmap so the next closest IRQ can be computed when a CPU's closest
IRQ is invalidated. Again, this would significantly increase the size
of struct cpu_rmap.
- Re-fetch the affinity masks of all the IRQs from the irq layer
whenever one IRQ's affinity changes so the next closest IRQ can be
computed for each invalidated CPU. This would avoid using any
additional memory, but would add a lot of calls into the irq layer.
- Work around the cpu_rmap behavior by having userspace always write
to all IRQs' affinity masks whenever it changes the affinity of any
one of them (sketched below). This is probably the simplest solution,
but I worry that other userspace applications would hit the same
unexpected behavior.
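
For the last option, something like the following untested userspace
sketch is what I have in mind (the IRQ numbers and masks here are
hypothetical; in practice they would come from the device's MSI-X
vectors). The idea is that rewriting every IRQ's smp_affinity, not
just the one that changed, makes the affinity notifier (and so
cpu_rmap_update()) run for all of them:

#include <stdio.h>

/* Rewrite one IRQ's affinity mask via procfs. */
static int write_smp_affinity(unsigned int irq, const char *mask)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%s\n", mask) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f);
}

int main(void)
{
	/* Hypothetical example: IRQ 1 stays on CPU A (mask 1), IRQ 2 moves to CPU B (mask 2). */
	static const unsigned int irqs[] = { 1, 2 };
	static const char *const masks[] = { "1", "2" };

	/* Rewrite every IRQ's mask, not just the one that changed. */
	for (unsigned int i = 0; i < sizeof(irqs) / sizeof(irqs[0]); i++)
		if (write_smp_affinity(irqs[i], masks[i]) != 0)
			perror("smp_affinity");
	return 0;
}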

Let me know whether you see this behavior as a bug in cpu_rmap or
something that userspace should work around. If you do think it's a
cpu_rmap bug, how would you like to fix it?

Thanks,
Caleb
