From: Matt Turner <mattst88@gmail.com>
To: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org,
Richard Henderson <richard.henderson@linaro.org>,
Magnus Lindholm <linmag7@gmail.com>,
Ivan Kokshaysky <ink@unseen.parts>,
Matt Turner <mattst88@gmail.com>
Subject: [PATCH 1/3] alpha: smp: Serialize all synchronous IPI operations to fix SMP deadlock
Date: Sat, 30 May 2026 16:25:42 -0400
Message-ID: <20260530202544.59231-2-mattst88@gmail.com>
In-Reply-To: <20260530202544.59231-1-mattst88@gmail.com>

When two or more CPUs simultaneously call any function that uses
on_each_cpu(wait=1) or smp_call_function(wait=1), they deadlock: each
blocks in csd_lock_wait(), spinning until the remote CPU signals CSD
completion. While spinning, neither CPU can receive the other's IPI, so
neither completion signal arrives and both CPUs hang permanently.

Affected callers: smp_imb(), flush_tlb_all(), flush_tlb_mm(),
flush_tlb_page(), flush_icache_user_page() (smp.c) and
migrate_flush_tlb_page() (tlbflush.c).

Introduce alpha_smp_ipi_lock (a plain spinlock, defined in smp.c and
declared in asm/smp.h) and take it in all six callers. Rather than
spin_lock(), use a trylock loop with alpha_drain_ipi(): if the lock is
held, the loser drains any pending IPI bits on the local CPU before
retrying. This is necessary because some callers run with IRQs disabled
(e.g. paths that take spin_lock_irqsave()), so no RTC interrupt will
fire to rescue a lost wripir edge via alpha_poll_ipi_inirq().
alpha_drain_ipi() calls handle_ipi() under local_irq_save/restore,
satisfying handle_ipi()'s requirement that IRQs be disabled, without
touching lockdep hardirq-context state.

This fix is necessary but not sufficient. A separate, independent
deadlock path exists: if the target CPU is inside do_entInt() at IPL=7
when wripir fires, the hardware IPI edge is lost and the sending CPU
spins forever, even when only one CPU is issuing a wait=1 call. That
race is fixed by alpha_poll_ipi_inirq() (see the follow-on commit). Both
fixes are required for a complete solution.

The deadlock has been observed on EV7/Marvel under workloads that
generate a high rate of synchronous TLB flush IPIs (e.g. the git test
suite).

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Matt Turner <mattst88@gmail.com>
---
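Note for reviewers (not part of the commit message): the trylock-and-drain
loop described in the commit message can be modelled in user space. The
sketch below is an illustration only, not kernel code. It assumes POSIX
threads and C11 atomics, and every name in it (NCPU, ROUNDS, pending,
acks, drain_ipi, sync_call, run_model) is hypothetical. Threads stand in
for CPUs, an atomic flag stands in for alpha_smp_ipi_lock, and there are
no interrupts at all, which corresponds to the IRQs-disabled case.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NCPU   3    /* hypothetical number of modelled CPUs */
#define ROUNDS 200  /* synchronous calls issued by each CPU */

static atomic_flag ipi_lock = ATOMIC_FLAG_INIT; /* stands in for alpha_smp_ipi_lock */
static atomic_int pending[NCPU];   /* stands in for ipi_data[cpu].bits */
static atomic_int acks;            /* stands in for CSD completion */
static atomic_int finished;        /* CPUs that have issued all their calls */
static atomic_long handled[NCPU];  /* requests each CPU has serviced */

/* Service a pending request, if any, and acknowledge it to the sender. */
static void drain_ipi(int cpu)
{
    if (atomic_exchange(&pending[cpu], 0)) {
        atomic_fetch_add(&handled[cpu], 1);
        atomic_fetch_add(&acks, 1);
    }
}

/* Model of a wait=1 call: take the lock, signal every peer, wait for acks. */
static void sync_call(int cpu)
{
    /* Trylock loop: a loser keeps servicing the current holder's request. */
    while (atomic_flag_test_and_set(&ipi_lock)) {
        drain_ipi(cpu);
        sched_yield(); /* only to be kind to the host scheduler */
    }

    atomic_store(&acks, 0);
    for (int i = 0; i < NCPU; i++)
        if (i != cpu)
            atomic_store(&pending[i], 1);

    /* Rough equivalent of csd_lock_wait(): spin until every peer has acked. */
    while (atomic_load(&acks) != NCPU - 1)
        sched_yield();

    atomic_flag_clear(&ipi_lock);
}

static void *cpu_thread(void *arg)
{
    int cpu = *(int *)arg;

    for (int r = 0; r < ROUNDS; r++)
        sync_call(cpu);

    /* Keep acknowledging peers until every CPU has finished its calls. */
    atomic_fetch_add(&finished, 1);
    while (atomic_load(&finished) != NCPU) {
        drain_ipi(cpu);
        sched_yield();
    }
    return NULL;
}

/* Run the model once and return the total number of serviced requests. */
long run_model(void)
{
    pthread_t tid[NCPU];
    int id[NCPU];
    long total = 0;

    atomic_store(&finished, 0);
    for (int i = 0; i < NCPU; i++) {
        atomic_store(&handled[i], 0);
        atomic_store(&pending[i], 0);
    }

    for (int i = 0; i < NCPU; i++) {
        id[i] = i;
        int rc = pthread_create(&tid[i], NULL, cpu_thread, &id[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NCPU; i++) {
        pthread_join(tid[i], NULL);
        total += atomic_load(&handled[i]);
    }
    return total;
}
```

Calling run_model() from a main() and building with -pthread should
terminate, with each call serviced by every other thread. In this model,
a loop that only retried the lock without calling drain_ipi() would never
acknowledge the holder's request, which is the circular wait the patch
breaks.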
arch/alpha/include/asm/smp.h | 9 ++++++
arch/alpha/kernel/smp.c | 62 ++++++++++++++++++++++++++++++++++++
arch/alpha/mm/tlbflush.c | 3 ++
3 files changed, 74 insertions(+)
diff --git ./arch/alpha/include/asm/smp.h ./arch/alpha/include/asm/smp.h
index 2264ae72673b..8bd529376cf6 100644
--- ./arch/alpha/include/asm/smp.h
+++ ./arch/alpha/include/asm/smp.h
@@ -48,6 +48,15 @@ extern int smp_num_cpus;
extern void arch_send_call_function_single_ipi(int cpu);
extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
+/*
+ * Global spinlock serializing all synchronous (wait=1) IPI callers.
+ * Callers must use the trylock+alpha_drain_ipi() pattern, not spin_lock(),
+ * because some call sites hold IRQs disabled and cannot rely on the RTC
+ * interrupt to rescue a lost wripir edge.
+ */
+extern spinlock_t alpha_smp_ipi_lock;
+extern void alpha_drain_ipi(void);
+
#else /* CONFIG_SMP */
#define hard_smp_processor_id() 0
diff --git ./arch/alpha/kernel/smp.c ./arch/alpha/kernel/smp.c
index ed06367ece57..d900da49b0d8 100644
--- ./arch/alpha/kernel/smp.c
+++ ./arch/alpha/kernel/smp.c
@@ -597,11 +597,61 @@ ipi_imb(void *ignored)
imb();
}
+/*
+ * Serialize all synchronous (wait=1) IPI operations to prevent cross-CPU
+ * deadlock on EV7/Marvel. If two CPUs simultaneously call any function that
+ * uses on_each_cpu(wait=1) or smp_call_function(wait=1), each blocks in
+ * csd_lock_wait spinning for the remote CPU to signal completion. While
+ * spinning, neither CPU can receive the other's IPI, so neither completion
+ * signal arrives — permanent hang.
+ *
+ * A plain spinlock (not irqsave) is intentional: the CPU that loses the lock
+ * race spins with IRQs enabled and can service the winner's IPI before
+ * taking the lock itself.
+ *
+ * All callers of synchronous IPIs — including migrate_flush_tlb_page in
+ * tlbflush.c — must hold this lock.
+ */
+DEFINE_SPINLOCK(alpha_smp_ipi_lock);
+
+/*
+ * Drain any pending IPIs for this CPU while spinning on alpha_smp_ipi_lock.
+ *
+ * The lock holder has already sent a wripir but is blocked in csd_lock_wait
+ * waiting for our IPI ACK. We cannot simply spin on the lock: if IRQs are
+ * disabled (e.g. caller holds a spin_lock_irqsave), no RTC interrupt will
+ * fire and the lost wripir edge is never rescued by alpha_poll_ipi_inirq.
+ *
+ * Call this from the trylock loop so the IPI is processed even with IRQs
+ * disabled, breaking the circular wait.
+ *
+ * handle_ipi() requires IRQs disabled: generic_smp_call_function_interrupt
+ * asserts lockdep_assert_irqs_disabled(). Use local_irq_save/restore so
+ * this is safe whether the caller has IRQs enabled (e.g. page fault path)
+ * or disabled (e.g. spin_lock_irqsave holder). Avoid __irq_enter_raw/
+ * __irq_exit_raw: those manipulate lockdep hardirq-context state and trigger
+ * a lockdep WARNING when called while lockdep already tracks hardirq context.
+ */
+void alpha_drain_ipi(void)
+{
+ unsigned long flags;
+
+ if (!READ_ONCE(ipi_data[smp_processor_id()].bits))
+ return;
+
+ local_irq_save(flags);
+ handle_ipi(NULL); /* regs unused in handle_ipi() */
+ local_irq_restore(flags);
+}
+
void
smp_imb(void)
{
/* Must wait other processors to flush their icache before continue. */
+ while (!spin_trylock(&alpha_smp_ipi_lock))
+ alpha_drain_ipi();
on_each_cpu(ipi_imb, NULL, 1);
+ spin_unlock(&alpha_smp_ipi_lock);
}
EXPORT_SYMBOL(smp_imb);
@@ -616,7 +666,10 @@ flush_tlb_all(void)
{
/* Although we don't have any data to pass, we do want to
synchronize with the other processors. */
+ while (!spin_trylock(&alpha_smp_ipi_lock))
+ alpha_drain_ipi();
on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+ spin_unlock(&alpha_smp_ipi_lock);
}
#define asn_locked() (cpu_data[smp_processor_id()].asn_lock)
@@ -651,7 +704,10 @@ flush_tlb_mm(struct mm_struct *mm)
}
}
+ while (!spin_trylock(&alpha_smp_ipi_lock))
+ alpha_drain_ipi();
smp_call_function(ipi_flush_tlb_mm, mm, 1);
+ spin_unlock(&alpha_smp_ipi_lock);
preempt_enable();
}
@@ -702,7 +758,10 @@ flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
data.mm = mm;
data.addr = addr;
+ while (!spin_trylock(&alpha_smp_ipi_lock))
+ alpha_drain_ipi();
smp_call_function(ipi_flush_tlb_page, &data, 1);
+ spin_unlock(&alpha_smp_ipi_lock);
preempt_enable();
}
@@ -752,7 +811,10 @@ flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
}
}
+ while (!spin_trylock(&alpha_smp_ipi_lock))
+ alpha_drain_ipi();
smp_call_function(ipi_flush_icache_page, mm, 1);
+ spin_unlock(&alpha_smp_ipi_lock);
preempt_enable();
}
diff --git ./arch/alpha/mm/tlbflush.c ./arch/alpha/mm/tlbflush.c
index ccbc317b9a34..37607d08796b 100644
--- ./arch/alpha/mm/tlbflush.c
+++ ./arch/alpha/mm/tlbflush.c
@@ -89,7 +89,10 @@ void migrate_flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
* This is the "combined" version of flush_tlb_mm + per-page invalidate.
*/
preempt_disable();
+ while (!spin_trylock(&alpha_smp_ipi_lock))
+ alpha_drain_ipi();
on_each_cpu(ipi_flush_mm_and_page, &d, 1);
+ spin_unlock(&alpha_smp_ipi_lock);
/*
* mimic flush_tlb_mm()'s mm_users<=1 optimization.
--
2.53.0
Thread overview: 5+ messages
2026-05-30 20:25 [PATCH 0/3] alpha SMP fixes for EV7/Marvel Matt Turner
2026-05-30 20:25 ` Matt Turner [this message]
2026-05-30 20:25 ` [PATCH 2/3] alpha: Fix SMP IPI loss when target CPU is in interrupt handler Matt Turner
2026-05-30 20:25 ` [PATCH 3/3] alpha: Break down rescued IPI counter by type in /proc/interrupts Matt Turner
2026-05-31 8:24 ` [PATCH 0/3] alpha SMP fixes for EV7/Marvel Magnus Lindholm