linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86/apic: Add retry mechanism to add_pin_to_irq_node()
@ 2024-07-29 14:06 Breno Leitao
  2024-07-29 16:13 ` Thomas Gleixner
  0 siblings, 1 reply; 6+ messages in thread
From: Breno Leitao @ 2024-07-29 14:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin
  Cc: leit, Peter Zijlstra (Intel), Wei Liu, Marc Zyngier, Adrian Huang,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

I've been running some experiments with the failslab fault injector
enabled to detect a different problem, and the machine always crashes
with the following stack:

	can not alloc irq_pin_list (-1,0,20)
	Kernel panic - not syncing: IO-APIC: failed to add irq-pin. Can not proceed

	Call Trace:
	 panic
	   _printk
	   panic_smp_self_stop
	   rcu_is_watching
	   intel_irq_remapping_free

This happens because add_pin_to_irq_node() panics if adding a pin to an
IRQ fails with -ENOMEM (which was injected by the failslab fault
injector).  I've been running with this patch in my test cases in order
to be able to pick up real bugs, and I thought it might be a good idea
to have it upstream as well, so that other people trying to find real
bugs don't stumble upon this one. This also makes sense in the real
world(?), where retrying a few times might be better than just panicking.

Introduce a retry mechanism that attempts to add the pin up to 3 times
before giving up and panicking. This should improve the robustness of
the IO-APIC code in the face of transient errors.

Since __add_pin_to_irq_node() only returns 0 or -ENOMEM, the retry
applies to the -ENOMEM case only.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 arch/x86/kernel/apic/io_apic.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 477b740b2f26..2846a90366f2 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -390,8 +390,14 @@ static void __remove_pin_from_irq(struct mp_chip_data *data, int apic, int pin)
 static void add_pin_to_irq_node(struct mp_chip_data *data,
 				int node, int apic, int pin)
 {
-	if (__add_pin_to_irq_node(data, node, apic, pin))
-		panic("IO-APIC: failed to add irq-pin. Can not proceed\n");
+	int ret, i;
+
+	for (i = 0; i < 3; i++) {
+		ret = __add_pin_to_irq_node(data, node, apic, pin);
+		if (!ret)
+			return;
+	}
+	panic("IO-APIC: failed to add irq-pin. Can not proceed\n");
 }
 
 /*
-- 
2.43.0
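
For readers who want to try the bounded-retry idea outside the kernel, here
is a minimal userspace sketch. It is not the kernel code: try_add_pin(),
add_pin_with_retry(), fail_next_allocs and struct pin_entry are hypothetical
names made up for illustration, the counter only imitates what failslab does
to slab allocations, and the helper returns -ENOMEM where the patch would
call panic().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Same bound as the loop in the patch. */
#define ADD_PIN_MAX_TRIES 3

/* Hypothetical stand-in for failslab: fail the next N allocations. */
static int fail_next_allocs;

/* Hypothetical, simplified pin list entry (not the kernel structure). */
struct pin_entry {
	int apic;
	int pin;
	struct pin_entry *next;
};

static struct pin_entry *pin_list;

/*
 * Userspace analogue of __add_pin_to_irq_node(): returns 0 on success
 * or -ENOMEM when the (possibly injected) allocation failure occurs.
 */
static int try_add_pin(int apic, int pin)
{
	struct pin_entry *entry;

	if (fail_next_allocs > 0) {
		fail_next_allocs--;
		return -ENOMEM;
	}

	entry = malloc(sizeof(*entry));
	if (!entry)
		return -ENOMEM;

	entry->apic = apic;
	entry->pin = pin;
	entry->next = pin_list;
	pin_list = entry;
	return 0;
}

/*
 * Try up to ADD_PIN_MAX_TRIES times. Returns 0 as soon as one attempt
 * succeeds, otherwise the last error. The kernel patch panics at the
 * point where this sketch returns the error.
 */
static int add_pin_with_retry(int apic, int pin)
{
	int ret = -ENOMEM;
	int i;

	assert(ADD_PIN_MAX_TRIES > 0);

	for (i = 0; i < ADD_PIN_MAX_TRIES; i++) {
		ret = try_add_pin(apic, pin);
		if (!ret)
			return 0;
	}
	return ret;
}
```

With fail_next_allocs set to 2 the third attempt succeeds; with 3 or more
every attempt fails, which is the case where the patched kernel would still
panic. The retry only helps when the failure is transient, which is the
assumption the patch description makes.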



Thread overview: 6+ messages
2024-07-29 14:06 [PATCH] x86/apic: Add retry mechanism to add_pin_to_irq_node() Breno Leitao
2024-07-29 16:13 ` Thomas Gleixner
2024-07-29 16:55   ` Breno Leitao
2024-07-29 17:44     ` Thomas Gleixner
2024-07-30 10:28       ` Breno Leitao
2024-08-07 16:25     ` [tip: x86/apic] x86/ioapic: Handle allocation failures gracefully tip-bot2 for Thomas Gleixner
