* [RFC] genirq: Fix lockup in handle_edge_irq
@ 2025-07-01 16:35 Liangyan
  2025-07-02 13:17 ` Thomas Gleixner
  0 siblings, 1 reply; 6+ messages in thread
From: Liangyan @ 2025-07-01 16:35 UTC
  To: tglx; +Cc: linux-kernel, Liangyan, Yicong Shen

Yicong reported a soft lockup in a guest VM, triggered by the irqbalance
service changing the affinity of a NIC IRQ.

When the affinity of a NIC IRQ is changed from CPU 0 to CPU 1 while CPU 0
is still handling the first interrupt of this IRQ in handle_edge_irq(),
the second interrupt is delivered to CPU 1. CPU 1 finds the IRQ still in
progress, sets IRQS_PENDING and masks the IRQ, so CPU 0 invokes
handle_irq_event() again after finishing the first interrupt. If the
interval between two interrupts is shorter than the latency of one
iteration of the handle_edge_irq() loop (i.e., unmask_irq() +
handle_irq_event()), CPU 0 may keep invoking handle_irq_event() and never
exit handle_edge_irq(), which eventually causes a soft lockup (the hard
lockup detector is not enabled in the guest VM).

Our production guest VMs run network-heavy workloads in which the NIC
raises more than 1000 interrupts per second, i.e. on average less than
1 ms apart, while each NIC mask/unmask_irq() traps to the host and takes
more than 1 ms. One loop iteration therefore outlasts the interval between
interrupts, and this soft lockup is easy to reproduce. Using bpftrace, we
saw CPU 0 invoke handle_irq_event() more than 5000 times in
handle_edge_irq() when the soft lockup occurred.

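To fix this, limit how many times handle_irq_event() is repeated by
deferring unmask_irq() until the loop in handle_edge_irq() has exited: the
IRQ then stays masked while handle_irq_event() runs, so no new interrupt
can set IRQS_PENDING in the meantime.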

       cpu 0                                        cpu 1

  handle_edge_irq
    spin_lock
    do {
        unmask_irq if IRQS_PENDING
                                                handle_edge_irq
        handle_irq_event
          istate &= ~IRQS_PENDING
          spin_unlock
                                                  spin_lock
                                                  istate |= IRQS_PENDING
          handle_irq_event_percpu                 mask_ack_irq
                                                  spin_unlock
          spin_lock
      } while(istate & IRQS_PENDING)
      spin_unlock

The soft lockup trace looks like this:
-----
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G             L
Hardware name: ByteDance Inc. OpenStack Nova, BIOS
RIP: 0010:__do_softirq+0x78/0x2ac
RSP: 0018:ffffa02a00134f98 EFLAGS: 00000246
RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 00000000ffffffff
RDX: 00000000000000c1 RSI: ffffffff9e801040 RDI: 0000000000000016
RBP: ffffa02a000c7dd8 R08: 000002ea2320b76b R09: 7fffffffffffffff
R10: 000002ea3a1c0080 R11: 00000000002fefff R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000080
FS:  0000000000000000(0000) GS:ffff89323e840000(0000)
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2e5957c000 CR3: 0000000167a9a005 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <IRQ>
 __irq_exit_rcu+0xb9/0xf0
 sysvec_apic_timer_interrupt+0x72/0x90
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x16/0x20
RIP: 0010:cpuidle_enter_state+0xd2/0x400
RSP: 0018:ffffa02a000c7e80 EFLAGS: 00000202
RAX: ffff89323e870bc0 RBX: 0000000000000001 RCX: 00000000ffffffff
RDX: 0000000000000016 RSI: ffffffff9e801040 RDI: 0000000000000000
RBP: ffff89323e87c700 R08: 000002ea22ebdf87 R09: 0000000000000018
R10: 000000000000010d R11: 000000000000020a R12: ffffffff9dab58e0
R13: 000002ea22ebdf87 R14: 0000000000000001 R15: 0000000000000000
 cpuidle_enter+0x29/0x40
 cpuidle_idle_call+0xfa/0x160
 do_idle+0x7b/0xe0
 cpu_startup_entry+0x19/0x20
 start_secondary+0x116/0x140
 secondary_startup_64_no_verify+0xe5/0xeb
 </TASK>

Signed-off-by: Liangyan <liangyan.peng@bytedance.com>
Reported-by: Yicong Shen <shenyicong.1023@bytedance.com>
---
 kernel/irq/chip.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 2b274007e8ba..9f5c50e75e6b 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -764,6 +764,8 @@ EXPORT_SYMBOL_GPL(handle_fasteoi_nmi);
  */
 void handle_edge_irq(struct irq_desc *desc)
 {
+	bool need_unmask = false;
+
 	guard(raw_spinlock)(&desc->lock);
 
 	if (!irq_can_handle(desc)) {
@@ -791,12 +793,16 @@ void handle_edge_irq(struct irq_desc *desc)
 		if (unlikely(desc->istate & IRQS_PENDING)) {
 			if (!irqd_irq_disabled(&desc->irq_data) &&
 			    irqd_irq_masked(&desc->irq_data))
-				unmask_irq(desc);
+				need_unmask = true;
 		}
 
 		handle_irq_event(desc);
 
 	} while ((desc->istate & IRQS_PENDING) && !irqd_irq_disabled(&desc->irq_data));
+
+	if (need_unmask && !irqd_irq_disabled(&desc->irq_data) &&
+	    irqd_irq_masked(&desc->irq_data))
+		unmask_irq(desc);
 }
 EXPORT_SYMBOL(handle_edge_irq);
 
-- 
2.20.1



Thread overview: 6+ messages
2025-07-01 16:35 [RFC] genirq: Fix lockup in handle_edge_irq Liangyan
2025-07-02 13:17 ` Thomas Gleixner
2025-07-03 15:31   ` [External] " Liangyan
2025-07-04 14:42     ` Thomas Gleixner
2025-07-04 15:36       ` Liangyan
2025-07-08  1:43       ` Liangyan
