* [PATCH for-4.19 0/9] x86/irq: fixes for CPU hot{,un}plug
@ 2024-05-29  9:01 Roger Pau Monne
From: Roger Pau Monne @ 2024-05-29  9:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, George Dunlap,
	Julien Grall, Stefano Stabellini, Oleksii Kurochko

Hello,

The following series aims to fix interrupt handling when doing CPU
plug/unplug operations.  Without this series, running:

cpus=`xl info max_cpu_id`
while [ 1 ]; do
    for i in `seq 1 $cpus`; do
        xen-hptool cpu-offline $i;
        xen-hptool cpu-online $i;
    done
done

quite quickly results in interrupts getting lost and "No irq handler for
vector" messages on the Xen console.  Drivers in dom0 also start getting
interrupt timeouts and the system becomes unusable.

After applying the series, running the loop overnight still results in a
fully usable system: no "No irq handler for vector" messages at all, and
no interrupt losses reported by dom0.  Tested with
x2apic-mode={mixed,cluster}.
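
For anyone repeating the test, the check for the console message can be
scripted rather than read by eye.  The following is only a minimal
sketch, not part of the series: count_lost_vectors is a hypothetical
helper name, and it assumes the Xen console ring is fed to it on stdin
(for example from xl dmesg).

```shell
#!/bin/sh
# Hypothetical helper (not part of the series): count the lines on stdin
# that contain the "No irq handler for vector" console message.
count_lost_vectors() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the helper usable under "set -e".
    grep -c -- "No irq handler for vector" || true
}
```

A possible use, assuming xl dmesg returns the Xen console ring on the
test box, is to run `xl dmesg | count_lost_vectors` after the loop has
been going for a while; a result of 0 matches the behaviour described
above with the series applied.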

I'm tagging this for 4.19 as these are IMO bugfixes, but the series has
grown quite a bit bigger than expected, and hence we need to be careful
not to introduce breakages late in the release cycle.  I've attempted to
document all code as well as I could, since interrupt handling has some
unexpected corner cases that are hard to diagnose and reason about.

I'm currently also doing some extra testing with XenRT in case I've
missed something.

Thanks, Roger.

Roger Pau Monne (9):
  x86/irq: remove offline CPUs from old CPU mask when adjusting
    move_cleanup_count
  xen/cpu: do not get the CPU map in stop_machine_run()
  xen/cpu: ensure get_cpu_maps() returns false if CPU operations are
    underway
  x86/irq: describe how the interrupt CPU movement works
  x86/irq: limit interrupt movement done by fixup_irqs()
  x86/irq: restrict CPU movement in set_desc_affinity()
  x86/irq: deal with old_cpu_mask for interrupts in movement in
    fixup_irqs()
  x86/irq: handle moving interrupts in _assign_irq_vector()
  x86/irq: forward pending interrupts to new destination in fixup_irqs()

 xen/arch/x86/apic.c             |   5 +
 xen/arch/x86/include/asm/apic.h |   3 +
 xen/arch/x86/include/asm/irq.h  |  28 +++++-
 xen/arch/x86/irq.c              | 172 +++++++++++++++++++++++++-------
 xen/common/cpu.c                |   8 +-
 xen/common/stop_machine.c       |  15 +--
 xen/include/xen/cpu.h           |   2 +
 xen/include/xen/rwlock.h        |   2 +
 8 files changed, 190 insertions(+), 45 deletions(-)

-- 
2.44.0




Thread overview: 33+ messages
2024-05-29  9:01 [PATCH for-4.19 0/9] x86/irq: fixes for CPU hot{,un}plug Roger Pau Monne
2024-05-29  9:01 ` [PATCH for-4.19 1/9] x86/irq: remove offline CPUs from old CPU mask when adjusting move_cleanup_count Roger Pau Monne
2024-05-29 12:40   ` Jan Beulich
2024-05-29 15:15     ` Roger Pau Monné
2024-05-29 15:27       ` Jan Beulich
2024-05-29 15:34         ` Roger Pau Monné
2024-05-29  9:01 ` [PATCH for-4.19 2/9] xen/cpu: do not get the CPU map in stop_machine_run() Roger Pau Monne
2024-05-29 13:04   ` Jan Beulich
2024-05-29 15:20     ` Roger Pau Monné
2024-05-29 15:31       ` Jan Beulich
2024-05-29 15:48         ` Roger Pau Monné
2024-05-29  9:01 ` [PATCH for-4.19 3/9] xen/cpu: ensure get_cpu_maps() returns false if CPU operations are underway Roger Pau Monne
2024-05-29 13:35   ` Jan Beulich
2024-05-29 15:03     ` Roger Pau Monné
2024-05-29 15:49       ` Jan Beulich
2024-05-29 16:14         ` Roger Pau Monné
2024-05-31  7:02           ` Jan Beulich
2024-05-31  7:31             ` Roger Pau Monné
2024-05-31  8:33               ` Jan Beulich
2024-05-31  9:15                 ` Roger Pau Monné
2024-05-29  9:01 ` [PATCH for-4.19 4/9] x86/irq: describe how the interrupt CPU movement works Roger Pau Monne
2024-05-29 13:57   ` Jan Beulich
2024-05-29 15:28     ` Roger Pau Monné
2024-05-31  7:06       ` Jan Beulich
2024-05-31  7:39         ` Roger Pau Monné
2024-05-29  9:01 ` [PATCH for-4.19 5/9] x86/irq: limit interrupt movement done by fixup_irqs() Roger Pau Monne
2024-05-29  9:01 ` [PATCH for-4.19 6/9] x86/irq: restrict CPU movement in set_desc_affinity() Roger Pau Monne
2024-05-29  9:01 ` [PATCH for-4.19 7/9] x86/irq: deal with old_cpu_mask for interrupts in movement in fixup_irqs() Roger Pau Monne
2024-05-29  9:01 ` [PATCH for-4.19 8/9] x86/irq: handle moving interrupts in _assign_irq_vector() Roger Pau Monne
2024-05-29  9:01 ` [PATCH for-4.19 9/9] x86/irq: forward pending interrupts to new destination in fixup_irqs() Roger Pau Monne
2024-05-29  9:33   ` Andrew Cooper
2024-05-29  9:37 ` [PATCH for-4.19 0/9] x86/irq: fixes for CPU hot{,un}plug Oleksii K.
2024-06-11 13:25   ` Jan Beulich
