* [PATCH] x86/apic: Add retry mechanism to add_pin_to_irq_node()
From: Breno Leitao @ 2024-07-29 14:06 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin
  Cc: leit, Peter Zijlstra (Intel), Wei Liu, Marc Zyngier, Adrian Huang,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

I've been running some experiments with the failslab fault injector
enabled to detect a different problem, and the machine always crashes
with the following stack:

	can not alloc irq_pin_list (-1,0,20)
	Kernel panic - not syncing: IO-APIC: failed to add irq-pin. Can not proceed

	Call Trace:
	 panic
	   _printk
	   panic_smp_self_stop
	   rcu_is_watching
	   intel_irq_remapping_free

This happens because add_pin_to_irq_node() panics when adding a pin to
an IRQ fails with -ENOMEM (injected here by the failslab fault
injector). I've been carrying this patch in my test runs so that I can
find real bugs, and I thought it might be a good idea to have it
upstream as well, so that other people hunting for real bugs don't
stumble upon this one. It may also make sense in the real world(?),
where retrying a few times might be better than panicking right away.

Introduce a retry mechanism that attempts to add the pin up to 3 times
before giving up and panicking. This should improve the robustness of
the IO-APIC code in the face of transient errors.

Since __add_pin_to_irq_node() only returns 0 or -ENOMEM, the retry only
covers the -ENOMEM case.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
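For readers who want to try the retry logic outside the kernel, here is
a minimal user-space sketch of the bounded retry described above. It is
an illustration, not the kernel code: add_pin_with_retry(),
ADD_PIN_MAX_TRIES, struct fail_first and fail_first_n() are hypothetical
names invented for this example, and the callback only stands in for
__add_pin_to_irq_node() under the assumption stated in the commit
message that it returns either 0 or -ENOMEM.

```c
#include <errno.h>

/* Same bound as the loop in the patch below. */
#define ADD_PIN_MAX_TRIES 3

/* Hypothetical stand-in for __add_pin_to_irq_node(): 0 or -ENOMEM. */
typedef int (*add_pin_fn)(void *ctx);

/*
 * Call @add up to ADD_PIN_MAX_TRIES times and stop at the first success.
 * Returns 0 on success, or the last error if every attempt failed.
 * The kernel patch calls panic() at the point where this returns an error.
 */
static int add_pin_with_retry(add_pin_fn add, void *ctx)
{
	int ret = -ENOMEM;
	int i;

	for (i = 0; i < ADD_PIN_MAX_TRIES; i++) {
		ret = add(ctx);
		if (!ret)
			return 0;
	}
	return ret;
}

/* Hypothetical fault injector: the first @remaining calls fail. */
struct fail_first {
	int remaining;	/* failures still to inject */
	int calls;	/* total calls observed */
};

static int fail_first_n(void *ctx)
{
	struct fail_first *f = ctx;

	f->calls++;
	if (f->remaining > 0) {
		f->remaining--;
		return -ENOMEM;
	}
	return 0;
}
```

With this bound, up to two consecutive injected failures are absorbed;
if all three attempts fail, the caller still ends up on the panic path.
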
 arch/x86/kernel/apic/io_apic.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 477b740b2f26..2846a90366f2 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -390,8 +390,14 @@ static void __remove_pin_from_irq(struct mp_chip_data *data, int apic, int pin)
 static void add_pin_to_irq_node(struct mp_chip_data *data,
 				int node, int apic, int pin)
 {
-	if (__add_pin_to_irq_node(data, node, apic, pin))
-		panic("IO-APIC: failed to add irq-pin. Can not proceed\n");
+	int ret, i;
+
+	for (i = 0; i < 3; i++) {
+		ret = __add_pin_to_irq_node(data, node, apic, pin);
+		if (!ret)
+			return;
+	}
+	panic("IO-APIC: failed to add irq-pin. Can not proceed\n");
 }
 
 /*
-- 
2.43.0



Thread overview: 6+ messages
2024-07-29 14:06 [PATCH] x86/apic: Add retry mechanism to add_pin_to_irq_node() Breno Leitao
2024-07-29 16:13 ` Thomas Gleixner
2024-07-29 16:55   ` Breno Leitao
2024-07-29 17:44     ` Thomas Gleixner
2024-07-30 10:28       ` Breno Leitao
2024-08-07 16:25     ` [tip: x86/apic] x86/ioapic: Handle allocation failures gracefully tip-bot2 for Thomas Gleixner