* [PATCH for-4.20 v3 1/5] x86/shutdown: offline APs with interrupts disabled on all CPUs
2025-02-11 11:02 [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors at shutdown Roger Pau Monne
@ 2025-02-11 11:02 ` Roger Pau Monne
2025-02-11 11:23 ` Jan Beulich
2025-02-11 11:02 ` [PATCH for-4.20 v3 2/5] x86/irq: drop fixup_irqs() parameters Roger Pau Monne
` (4 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monne @ 2025-02-11 11:02 UTC (permalink / raw)
To: xen-devel; +Cc: oleksii.kurochko, Roger Pau Monne, Jan Beulich, Andrew Cooper
The current shutdown logic in smp_send_stop() will disable the APs while
having interrupts enabled on the BSP or possibly other APs. On AMD systems
this can lead to local APIC errors:
APIC error on CPU0: 00(08), Receive accept error
Such error message can be printed in a loop, thus blocking the system from
rebooting. I assume this loop is created by the error being triggered by
the console interrupt, which is further stirred by the ESR handler
printing to the console.
Intel SDM states:
"Receive Accept Error.
Set when the local APIC detects that the message it received was not
accepted by any APIC on the APIC bus, including itself. Used only on P6
family and Pentium processors."
So the error shouldn't trigger on any Intel CPU supported by Xen.
However AMD doesn't make such claims, and indeed the error is broadcast to
all local APICs when an interrupt targets a CPU that's already offline.
To prevent the error from stalling the shutdown process perform the
disabling of APs and the BSP local APIC with interrupts disabled on all
CPUs in the system, so that by the time interrupts are unmasked on the BSP
the local APIC is already disabled. This can still lead to a spurious:
APIC error on CPU0: 00(00)
As a result of an LVT Error getting injected while interrupts are masked on
the CPU, and the vector only handled after the local APIC is already
disabled. ESR reports 0 because as part of disable_local_APIC() the ESR
register is cleared.
Note the NMI crash path doesn't have such issue, because disabling of APs
and the caller local APIC is already done in the same contiguous region
with interrupts disabled. There's a possible window on the NMI crash path
(nmi_shootdown_cpus()) where some APs might be disabled (and thus
interrupts targeting them raising "Receive accept error") before others APs
have interrupts disabled. However the shutdown NMI will be handled,
regardless of whether the AP is processing a local APIC error, and hence
such interrupts will not cause the shutdown process to get stuck.
Remove the call to fixup_irqs() in smp_send_stop(): it doesn't achieve the
intended goal of moving all interrupts to the BSP anyway. The logic in
fixup_irqs() will move interrupts whose affinity doesn't overlap with the
passed mask, but the movement of interrupts is done to any CPU set in
cpu_online_map. As in the shutdown path fixup_irqs() is called before APs
are cleared from cpu_online_map this leads to interrupts being shuffled
around, but not assigned to the BSP exclusively.
The Fixes tag is more of a guess than a certainty; it's possible the
previous sleep window in fixup_irqs() allowed any in-flight interrupt to be
delivered before APs went offline. However fixup_irqs() was still
incorrectly used, as it didn't (and still doesn't) move all interrupts to
target the provided cpu mask.
Fixes: e2bb28d62158 ('x86/irq: forward pending interrupts to new destination in fixup_irqs()')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
- s/boradcasted/broadcast/
- Expand commit message regarding fixup_irqs().
Changes since v1:
- Change the approach and instead rely on having interrupts uniformly
disabled when offlining APs.
---
xen/arch/x86/smp.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 02a6ed7593f3..1d3878826f07 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -345,6 +345,11 @@ void __stop_this_cpu(void)
static void cf_check stop_this_cpu(void *dummy)
{
+ const bool *stop_aps = dummy;
+
+ while ( !*stop_aps )
+ cpu_relax();
+
__stop_this_cpu();
for ( ; ; )
halt();
@@ -357,16 +362,25 @@ static void cf_check stop_this_cpu(void *dummy)
void smp_send_stop(void)
{
unsigned int cpu = smp_processor_id();
+ bool stop_aps = false;
+
+ /*
+ * Perform AP offlining and disabling of interrupt controllers with all
+ * CPUs on the system having interrupts disabled to prevent interrupt
+ * delivery errors. On AMD systems "Receive accept error" will be
+ * broadcast to local APICs if interrupts target CPUs that are offline.
+ */
+ if ( num_online_cpus() > 1 )
+ smp_call_function(stop_this_cpu, &stop_aps, 0);
+
+ local_irq_disable();
if ( num_online_cpus() > 1 )
{
int timeout = 10;
- local_irq_disable();
- fixup_irqs(cpumask_of(cpu), 0);
- local_irq_enable();
-
- smp_call_function(stop_this_cpu, NULL, 0);
+ /* Signal APs to stop. */
+ stop_aps = true;
/* Wait 10ms for all other CPUs to go offline. */
while ( (num_online_cpus() > 1) && (timeout-- > 0) )
@@ -375,13 +389,12 @@ void smp_send_stop(void)
if ( cpu_online(cpu) )
{
- local_irq_disable();
disable_IO_APIC();
hpet_disable();
__stop_this_cpu();
x2apic_enabled = (current_local_apic_mode() == APIC_MODE_X2APIC);
- local_irq_enable();
}
+ local_irq_enable();
}
void smp_send_nmi_allbutself(void)
--
2.46.0
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH for-4.20 v3 1/5] x86/shutdown: offline APs with interrupts disabled on all CPUs
2025-02-11 11:02 ` [PATCH for-4.20 v3 1/5] x86/shutdown: offline APs with interrupts disabled on all CPUs Roger Pau Monne
@ 2025-02-11 11:23 ` Jan Beulich
2025-02-11 14:12 ` Roger Pau Monné
0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2025-02-11 11:23 UTC (permalink / raw)
To: Roger Pau Monne; +Cc: oleksii.kurochko, Andrew Cooper, xen-devel
On 11.02.2025 12:02, Roger Pau Monne wrote:
> The current shutdown logic in smp_send_stop() will disable the APs while
> having interrupts enabled on the BSP or possibly other APs. On AMD systems
> this can lead to local APIC errors:
>
> APIC error on CPU0: 00(08), Receive accept error
>
> Such error message can be printed in a loop, thus blocking the system from
> rebooting. I assume this loop is created by the error being triggered by
> the console interrupt, which is further stirred by the ESR handler
> printing to the console.
>
> Intel SDM states:
>
> "Receive Accept Error.
>
> Set when the local APIC detects that the message it received was not
> accepted by any APIC on the APIC bus, including itself. Used only on P6
> family and Pentium processors."
>
> So the error shouldn't trigger on any Intel CPU supported by Xen.
>
> However AMD doesn't make such claims, and indeed the error is broadcast to
> all local APICs when an interrupt targets a CPU that's already offline.
>
> To prevent the error from stalling the shutdown process perform the
> disabling of APs and the BSP local APIC with interrupts disabled on all
> CPUs in the system, so that by the time interrupts are unmasked on the BSP
> the local APIC is already disabled. This can still lead to a spurious:
>
> APIC error on CPU0: 00(00)
>
> As a result of an LVT Error getting injected while interrupts are masked on
> the CPU, and the vector only handled after the local APIC is already
> disabled. ESR reports 0 because as part of disable_local_APIC() the ESR
> register is cleared.
>
> Note the NMI crash path doesn't have such issue, because disabling of APs
> and the caller local APIC is already done in the same contiguous region
> with interrupts disabled. There's a possible window on the NMI crash path
> (nmi_shootdown_cpus()) where some APs might be disabled (and thus
> interrupts targeting them raising "Receive accept error") before others APs
> have interrupts disabled. However the shutdown NMI will be handled,
> regardless of whether the AP is processing a local APIC error, and hence
> such interrupts will not cause the shutdown process to get stuck.
>
> Remove the call to fixup_irqs() in smp_send_stop(): it doesn't achieve the
> intended goal of moving all interrupts to the BSP anyway. The logic in
> fixup_irqs() will move interrupts whose affinity doesn't overlap with the
> passed mask, but the movement of interrupts is done to any CPU set in
> cpu_online_map. As in the shutdown path fixup_irqs() is called before APs
> are cleared from cpu_online_map this leads to interrupts being shuffled
> around, but not assigned to the BSP exclusively.
Which would have been possible to address by changing to something like
if ( !cpumask_intersects(mask, desc->affinity) )
{
break_affinity = true;
cpumask_copy(affinity, mask);
}
else
cpumask_and(affinity, mask, desc->affinity);
there, I guess.
> The Fixes tag is more of a guess than a certainty; it's possible the
> previous sleep window in fixup_irqs() allowed any in-flight interrupt to be
> delivered before APs went offline. However fixup_irqs() was still
> incorrectly used, as it didn't (and still doesn't) move all interrupts to
> target the provided cpu mask.
Plus there's the vector shortage aspect, if everything was moved to the
BSP. I don't think that's possible to get past without doing what you
do.
> Fixes: e2bb28d62158 ('x86/irq: forward pending interrupts to new destination in fixup_irqs()')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH for-4.20 v3 1/5] x86/shutdown: offline APs with interrupts disabled on all CPUs
2025-02-11 11:23 ` Jan Beulich
@ 2025-02-11 14:12 ` Roger Pau Monné
0 siblings, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2025-02-11 14:12 UTC (permalink / raw)
To: Jan Beulich; +Cc: oleksii.kurochko, Andrew Cooper, xen-devel
On Tue, Feb 11, 2025 at 12:23:56PM +0100, Jan Beulich wrote:
> On 11.02.2025 12:02, Roger Pau Monne wrote:
> > The current shutdown logic in smp_send_stop() will disable the APs while
> > having interrupts enabled on the BSP or possibly other APs. On AMD systems
> > this can lead to local APIC errors:
> >
> > APIC error on CPU0: 00(08), Receive accept error
> >
> > Such error message can be printed in a loop, thus blocking the system from
> > rebooting. I assume this loop is created by the error being triggered by
> > the console interrupt, which is further stirred by the ESR handler
> > printing to the console.
> >
> > Intel SDM states:
> >
> > "Receive Accept Error.
> >
> > Set when the local APIC detects that the message it received was not
> > accepted by any APIC on the APIC bus, including itself. Used only on P6
> > family and Pentium processors."
> >
> > So the error shouldn't trigger on any Intel CPU supported by Xen.
> >
> > However AMD doesn't make such claims, and indeed the error is broadcast to
> > all local APICs when an interrupt targets a CPU that's already offline.
> >
> > To prevent the error from stalling the shutdown process perform the
> > disabling of APs and the BSP local APIC with interrupts disabled on all
> > CPUs in the system, so that by the time interrupts are unmasked on the BSP
> > the local APIC is already disabled. This can still lead to a spurious:
> >
> > APIC error on CPU0: 00(00)
> >
> > As a result of an LVT Error getting injected while interrupts are masked on
> > the CPU, and the vector only handled after the local APIC is already
> > disabled. ESR reports 0 because as part of disable_local_APIC() the ESR
> > register is cleared.
> >
> > Note the NMI crash path doesn't have such issue, because disabling of APs
> > and the caller local APIC is already done in the same contiguous region
> > with interrupts disabled. There's a possible window on the NMI crash path
> > (nmi_shootdown_cpus()) where some APs might be disabled (and thus
> > interrupts targeting them raising "Receive accept error") before others APs
> > have interrupts disabled. However the shutdown NMI will be handled,
> > regardless of whether the AP is processing a local APIC error, and hence
> > such interrupts will not cause the shutdown process to get stuck.
> >
> > Remove the call to fixup_irqs() in smp_send_stop(): it doesn't achieve the
> > intended goal of moving all interrupts to the BSP anyway. The logic in
> > fixup_irqs() will move interrupts whose affinity doesn't overlap with the
> > passed mask, but the movement of interrupts is done to any CPU set in
> > cpu_online_map. As in the shutdown path fixup_irqs() is called before APs
> > are cleared from cpu_online_map this leads to interrupts being shuffled
> > around, but not assigned to the BSP exclusively.
>
> Which would have been possible to address by changing to something like
>
> if ( !cpumask_intersects(mask, desc->affinity) )
> {
> break_affinity = true;
> cpumask_copy(affinity, mask);
> }
> else
> cpumask_and(affinity, mask, desc->affinity);
>
> there, I guess.
Possibly, but note _assign_irq_vector() could also refuse to move the
interrupts if there's a pending movement and the current target CPU is
still set as online in cpu_online_map. Overall I think going down
that route is way more complex.
>
> > The Fixes tag is more of a guess than a certainty; it's possible the
> > previous sleep window in fixup_irqs() allowed any in-flight interrupt to be
> > delivered before APs went offline. However fixup_irqs() was still
> > incorrectly used, as it didn't (and still doesn't) move all interrupts to
> > target the provided cpu mask.
>
> Plus there's the vector shortage aspect, if everything was moved to the
> BSP. I don't think that's possible to get past without doing what you
> do.
Indeed, and the interrupt movement was IMO way more complex than what
I'm proposing (even with the followup patches that attempt to silence
device interrupts).
> > Fixes: e2bb28d62158 ('x86/irq: forward pending interrupts to new destination in fixup_irqs()')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Thanks, Roger.
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH for-4.20 v3 2/5] x86/irq: drop fixup_irqs() parameters
2025-02-11 11:02 [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors at shutdown Roger Pau Monne
2025-02-11 11:02 ` [PATCH for-4.20 v3 1/5] x86/shutdown: offline APs with interrupts disabled on all CPUs Roger Pau Monne
@ 2025-02-11 11:02 ` Roger Pau Monne
2025-02-11 11:02 ` [PATCH for-4.20 v3 3/5] x86/smp: perform disabling on interrupts ahead of AP shutdown Roger Pau Monne
` (3 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Roger Pau Monne @ 2025-02-11 11:02 UTC (permalink / raw)
To: xen-devel; +Cc: oleksii.kurochko, Roger Pau Monne, Jan Beulich, Andrew Cooper
The solely remaining caller always passes the same globally available
parameters. Drop the parameters and modify fixup_irqs() to use
cpu_online_map in place of the input mask parameter, and always be verbose
in its output printing.
While there remove some of the checks given the single context where
fixup_irqs() is now called, which should always be in the CPU offline path,
after the CPU going offline has been removed from cpu_online_map.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
There's more cleanup that can likely be done here, but it's best if such
cleanup is done after the cpu_mask and old_cpu_mask irq_desc fields are
converted from cpu masks to integers, as logic delivery mode should never
be used for external interrupts now.
---
xen/arch/x86/include/asm/irq.h | 4 ++--
xen/arch/x86/irq.c | 30 +++++++++++++-----------------
xen/arch/x86/smpboot.c | 2 +-
3 files changed, 16 insertions(+), 20 deletions(-)
diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index d3bc76806808..354868ba31ab 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -168,8 +168,8 @@ void free_domain_pirqs(struct domain *d);
int map_domain_emuirq_pirq(struct domain *d, int pirq, int emuirq);
int unmap_domain_pirq_emuirq(struct domain *d, int pirq);
-/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
-void fixup_irqs(const cpumask_t *mask, bool verbose);
+/* Evacuate interrupts assigned to CPUs not present in the CPU online map. */
+void fixup_irqs(void);
void fixup_eoi(void);
int init_irq_data(void);
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index e56bacc88d84..ff3ac832f4b9 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2590,17 +2590,21 @@ static int __init cf_check setup_dump_irqs(void)
}
__initcall(setup_dump_irqs);
-/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
-void fixup_irqs(const cpumask_t *mask, bool verbose)
+/* Evacuate interrupts assigned to CPUs not present in the CPU online map. */
+void fixup_irqs(void)
{
+ const unsigned int cpu = smp_processor_id();
unsigned int irq;
static int warned;
struct irq_desc *desc;
+ /* Only to be called from the context of a CPU going offline. */
+ ASSERT(!cpu_online(cpu));
+
for ( irq = 0; irq < nr_irqs; irq++ )
{
bool break_affinity = false, set_affinity = true, check_irr = false;
- unsigned int vector, cpu = smp_processor_id();
+ unsigned int vector;
cpumask_t *affinity = this_cpu(scratch_cpumask);
if ( irq == 2 )
@@ -2644,12 +2648,6 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
}
if ( desc->arch.move_in_progress &&
- /*
- * Only attempt to adjust the mask if the current CPU is going
- * offline, otherwise the whole system is going down and leaving
- * stale data in the masks is fine.
- */
- !cpu_online(cpu) &&
cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )
{
/*
@@ -2691,16 +2689,17 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
/*
* Avoid shuffling the interrupt around as long as current target CPUs
- * are a subset of the input mask. What fixup_irqs() cares about is
- * evacuating interrupts from CPUs not in the input mask.
+ * are a subset of the online mask. What fixup_irqs() cares about is
+ * evacuating interrupts from CPUs not in the online mask.
*/
- if ( !desc->action || cpumask_subset(desc->arch.cpu_mask, mask) )
+ if ( !desc->action || cpumask_subset(desc->arch.cpu_mask,
+ &cpu_online_map) )
{
spin_unlock(&desc->lock);
continue;
}
- if ( !cpumask_intersects(mask, desc->affinity) )
+ if ( !cpumask_intersects(&cpu_online_map, desc->affinity) )
{
break_affinity = true;
cpumask_setall(affinity);
@@ -2716,7 +2715,7 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
* the interrupt, signal to check whether there are any pending vectors
* to be handled in the local APIC after the interrupt has been moved.
*/
- if ( !cpu_online(cpu) && cpumask_test_cpu(cpu, desc->arch.cpu_mask) )
+ if ( cpumask_test_cpu(cpu, desc->arch.cpu_mask) )
check_irr = true;
if ( desc->handler->set_affinity )
@@ -2743,9 +2742,6 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
spin_unlock(&desc->lock);
- if ( !verbose )
- continue;
-
if ( !set_affinity )
printk("Cannot set affinity for IRQ%u\n", irq);
else if ( break_affinity )
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 79a79c54c304..891a29fca146 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -1282,7 +1282,7 @@ void __cpu_disable(void)
/* It's now safe to remove this processor from the online map */
cpumask_clear_cpu(cpu, &cpu_online_map);
- fixup_irqs(&cpu_online_map, 1);
+ fixup_irqs();
fixup_eoi();
}
--
2.46.0
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH for-4.20 v3 3/5] x86/smp: perform disabling on interrupts ahead of AP shutdown
2025-02-11 11:02 [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors at shutdown Roger Pau Monne
2025-02-11 11:02 ` [PATCH for-4.20 v3 1/5] x86/shutdown: offline APs with interrupts disabled on all CPUs Roger Pau Monne
2025-02-11 11:02 ` [PATCH for-4.20 v3 2/5] x86/irq: drop fixup_irqs() parameters Roger Pau Monne
@ 2025-02-11 11:02 ` Roger Pau Monne
2025-02-11 11:02 ` [PATCH for-4.20 v3 4/5] x86/pci: disable MSI(-X) on all devices at shutdown Roger Pau Monne
` (2 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Roger Pau Monne @ 2025-02-11 11:02 UTC (permalink / raw)
To: xen-devel; +Cc: oleksii.kurochko, Roger Pau Monne, Jan Beulich, Andrew Cooper
Move the disabling of interrupt sources so it's done ahead of the offlining
of APs. This is to prevent AMD systems triggering "Receive accept error"
when interrupts target CPUs that are no longer online.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes since v1:
- New in this version.
---
xen/arch/x86/smp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 1d3878826f07..4d29a09a9a95 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -374,6 +374,8 @@ void smp_send_stop(void)
smp_call_function(stop_this_cpu, &stop_aps, 0);
local_irq_disable();
+ disable_IO_APIC();
+ hpet_disable();
if ( num_online_cpus() > 1 )
{
@@ -389,8 +391,6 @@ void smp_send_stop(void)
if ( cpu_online(cpu) )
{
- disable_IO_APIC();
- hpet_disable();
__stop_this_cpu();
x2apic_enabled = (current_local_apic_mode() == APIC_MODE_X2APIC);
}
--
2.46.0
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH for-4.20 v3 4/5] x86/pci: disable MSI(-X) on all devices at shutdown
2025-02-11 11:02 [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors at shutdown Roger Pau Monne
` (2 preceding siblings ...)
2025-02-11 11:02 ` [PATCH for-4.20 v3 3/5] x86/smp: perform disabling on interrupts ahead of AP shutdown Roger Pau Monne
@ 2025-02-11 11:02 ` Roger Pau Monne
2025-02-11 11:34 ` Jan Beulich
2025-02-11 14:48 ` [PATCH for-4.20 v4 " Roger Pau Monne
2025-02-11 11:02 ` [PATCH for-4.20 v3 5/5] x86/iommu: disable interrupts " Roger Pau Monne
2025-02-11 18:39 ` [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors " Roger Pau Monné
5 siblings, 2 replies; 18+ messages in thread
From: Roger Pau Monne @ 2025-02-11 11:02 UTC (permalink / raw)
To: xen-devel
Cc: oleksii.kurochko, Roger Pau Monne, Jan Beulich, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini
Attempt to disable MSI(-X) capabilities on all PCI devices know by Xen at
shutdown. Doing such disabling should facilitate kexec chained kernel from
booting more reliably, as device MSI(-X) interrupt generation should be
quiesced. Only attempt to disable MSI(-X) on all devices in the crash
context if the PCI lock is not taken, otherwise the PCI device list could
be in an inconsistent state.
Disabling MSI(-X) should prevent "Receive accept error" being raised as a
result of non-disabled interrupts targeting offline CPUs.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
- Disable MSI(-X) ahead of IO-APIC disabling.
- Duplicate and expand comment about pci_iterate_devices() in function
declaration.
- Add assert interrupts are disabled in pci_iterate_devices().
- Only walk PCI device list on crash context if the pci lock is not taken.
Changes since v1:
- Split from bigger patch.
- Iterate over all devices, even if the handler returns failure.
---
xen/arch/x86/crash.c | 7 ++++++
xen/arch/x86/include/asm/msi.h | 1 +
xen/arch/x86/msi.c | 18 ++++++++++++++++
xen/arch/x86/smp.c | 1 +
xen/drivers/passthrough/pci.c | 39 ++++++++++++++++++++++++++++++++++
xen/include/xen/pci.h | 7 ++++++
6 files changed, 73 insertions(+)
diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index a789416ca3ae..621af81acc09 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -175,6 +175,13 @@ static void nmi_shootdown_cpus(void)
*/
x2apic_enabled = (current_local_apic_mode() == APIC_MODE_X2APIC);
+ if ( !pcidevs_locked() )
+ /*
+ * Assume the PCI device list to be in a consistent state if the
+ * lock is not held when the crash happened.
+ */
+ pci_disable_msi_all();
+
disable_IO_APIC();
hpet_disable();
}
diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
index 63adb19820e8..7f9e531f73e6 100644
--- a/xen/arch/x86/include/asm/msi.h
+++ b/xen/arch/x86/include/asm/msi.h
@@ -86,6 +86,7 @@ extern int pci_enable_msi(struct pci_dev *pdev, struct msi_info *msi,
extern void pci_disable_msi(struct msi_desc *msi_desc);
extern int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool off);
extern void pci_cleanup_msi(struct pci_dev *pdev);
+extern void pci_disable_msi_all(void);
extern int setup_msi_irq(struct irq_desc *desc, struct msi_desc *msidesc);
extern int __setup_msi_irq(struct irq_desc *desc, struct msi_desc *msidesc,
hw_irq_controller *handler);
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index e2360579deda..c9fe942c46f3 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1248,6 +1248,24 @@ void pci_cleanup_msi(struct pci_dev *pdev)
msi_free_irqs(pdev);
}
+static int cf_check disable_msi(struct pci_dev *pdev, void *arg)
+{
+ msi_set_enable(pdev, 0);
+ msix_set_enable(pdev, 0);
+
+ return 0;
+}
+
+/* Disable MSI and/or MSI-X on all devices known by Xen. */
+void pci_disable_msi_all(void)
+{
+ int rc = pci_iterate_devices(disable_msi, NULL);
+
+ if ( rc )
+ printk(XENLOG_ERR
+ "Failed to disable MSI(-X) on some devices: %d\n", rc);
+}
+
int pci_reset_msix_state(struct pci_dev *pdev)
{
unsigned int pos = pdev->msix_pos;
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 4d29a09a9a95..28bc041e03a9 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -374,6 +374,7 @@ void smp_send_stop(void)
smp_call_function(stop_this_cpu, &stop_aps, 0);
local_irq_disable();
+ pci_disable_msi_all();
disable_IO_APIC();
hpet_disable();
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index f398c3aa65dc..6c6b18a16dbc 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1802,6 +1802,45 @@ int iommu_do_pci_domctl(
return ret;
}
+struct segment_iter {
+ int (*handler)(struct pci_dev *pdev, void *arg);
+ void *arg;
+ int rc;
+};
+
+static int cf_check iterate_all(struct pci_seg *pseg, void *arg)
+{
+ struct segment_iter *iter = arg;
+ struct pci_dev *pdev;
+
+ list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
+ {
+ int rc = iter->handler(pdev, iter->arg);
+
+ if ( !iter->rc )
+ iter->rc = rc;
+ }
+
+ return 0;
+}
+
+/*
+ * Iterate without locking or preemption over all PCI devices known by Xen.
+ * Expected to be called with interrupts disabled.
+ */
+int pci_iterate_devices(int (*handler)(struct pci_dev *pdev, void *arg),
+ void *arg)
+{
+ struct segment_iter iter = {
+ .handler = handler,
+ .arg = arg,
+ };
+
+ ASSERT(!local_irq_is_enabled());
+
+ return pci_segments_iterate(iterate_all, &iter) ?: iter.rc;
+}
+
/*
* Local variables:
* mode: C
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index f784e9116059..815589dcea16 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -226,6 +226,13 @@ struct pci_dev *pci_get_pdev(const struct domain *d, pci_sbdf_t sbdf);
struct pci_dev *pci_get_real_pdev(pci_sbdf_t sbdf);
void pci_check_disable_device(u16 seg, u8 bus, u8 devfn);
+/*
+ * Iterate without locking or preemption over all PCI devices known by Xen.
+ * Expected to be called with interrupts disabled.
+ */
+int pci_iterate_devices(int (*handler)(struct pci_dev *pdev, void *arg),
+ void *arg);
+
uint8_t pci_conf_read8(pci_sbdf_t sbdf, unsigned int reg);
uint16_t pci_conf_read16(pci_sbdf_t sbdf, unsigned int reg);
uint32_t pci_conf_read32(pci_sbdf_t sbdf, unsigned int reg);
--
2.46.0
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH for-4.20 v3 4/5] x86/pci: disable MSI(-X) on all devices at shutdown
2025-02-11 11:02 ` [PATCH for-4.20 v3 4/5] x86/pci: disable MSI(-X) on all devices at shutdown Roger Pau Monne
@ 2025-02-11 11:34 ` Jan Beulich
2025-02-11 14:19 ` Roger Pau Monné
2025-02-11 14:48 ` [PATCH for-4.20 v4 " Roger Pau Monne
1 sibling, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2025-02-11 11:34 UTC (permalink / raw)
To: Roger Pau Monne
Cc: oleksii.kurochko, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, xen-devel
On 11.02.2025 12:02, Roger Pau Monne wrote:
> --- a/xen/arch/x86/crash.c
> +++ b/xen/arch/x86/crash.c
> @@ -175,6 +175,13 @@ static void nmi_shootdown_cpus(void)
> */
> x2apic_enabled = (current_local_apic_mode() == APIC_MODE_X2APIC);
>
> + if ( !pcidevs_locked() )
> + /*
> + * Assume the PCI device list to be in a consistent state if the
> + * lock is not held when the crash happened.
> + */
> + pci_disable_msi_all();
Hmm, I really meant try-lock to be used here. For recursive locks
rspin_is_locked() tells you only whether the local CPU owns the lock,
whereas here you want to know whether anyone owns it.
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -1802,6 +1802,45 @@ int iommu_do_pci_domctl(
> return ret;
> }
>
> +struct segment_iter {
> + int (*handler)(struct pci_dev *pdev, void *arg);
> + void *arg;
> + int rc;
> +};
> +
> +static int cf_check iterate_all(struct pci_seg *pseg, void *arg)
> +{
> + struct segment_iter *iter = arg;
> + struct pci_dev *pdev;
> +
> + list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
> + {
> + int rc = iter->handler(pdev, iter->arg);
> +
> + if ( !iter->rc )
> + iter->rc = rc;
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * Iterate without locking or preemption over all PCI devices known by Xen.
> + * Expected to be called with interrupts disabled.
> + */
> +int pci_iterate_devices(int (*handler)(struct pci_dev *pdev, void *arg),
> + void *arg)
> +{
> + struct segment_iter iter = {
> + .handler = handler,
> + .arg = arg,
> + };
> +
> + ASSERT(!local_irq_is_enabled());
I'm not sure we want to go this far. Maybe my earlier comment was ambiguous
though. What I meant is that the function needs to be documented to be
prepared to be called with IRQs off. I didn't mean that to be a requirement
to call the function (as there's no dependency on that, afaics).
Jan
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH for-4.20 v3 4/5] x86/pci: disable MSI(-X) on all devices at shutdown
2025-02-11 11:34 ` Jan Beulich
@ 2025-02-11 14:19 ` Roger Pau Monné
0 siblings, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2025-02-11 14:19 UTC (permalink / raw)
To: Jan Beulich
Cc: oleksii.kurochko, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, xen-devel
On Tue, Feb 11, 2025 at 12:34:41PM +0100, Jan Beulich wrote:
> On 11.02.2025 12:02, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/crash.c
> > +++ b/xen/arch/x86/crash.c
> > @@ -175,6 +175,13 @@ static void nmi_shootdown_cpus(void)
> > */
> > x2apic_enabled = (current_local_apic_mode() == APIC_MODE_X2APIC);
> >
> > + if ( !pcidevs_locked() )
> > + /*
> > + * Assume the PCI device list to be in a consistent state if the
> > + * lock is not held when the crash happened.
> > + */
> > + pci_disable_msi_all();
>
> Hmm, I really meant try-lock to be used here. For recursive locks
> rspin_is_locked() tells you only whether the local CPU owns the lock,
> whereas here you want to know whether anyone owns it.
Indeed, I always forget about this quirk of recursive locks. I will
need to introduce a new pcidevs_trylock() helper then.
> > --- a/xen/drivers/passthrough/pci.c
> > +++ b/xen/drivers/passthrough/pci.c
> > @@ -1802,6 +1802,45 @@ int iommu_do_pci_domctl(
> > return ret;
> > }
> >
> > +struct segment_iter {
> > + int (*handler)(struct pci_dev *pdev, void *arg);
> > + void *arg;
> > + int rc;
> > +};
> > +
> > +static int cf_check iterate_all(struct pci_seg *pseg, void *arg)
> > +{
> > + struct segment_iter *iter = arg;
> > + struct pci_dev *pdev;
> > +
> > + list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
> > + {
> > + int rc = iter->handler(pdev, iter->arg);
> > +
> > + if ( !iter->rc )
> > + iter->rc = rc;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +/*
> > + * Iterate without locking or preemption over all PCI devices known by Xen.
> > + * Expected to be called with interrupts disabled.
> > + */
> > +int pci_iterate_devices(int (*handler)(struct pci_dev *pdev, void *arg),
> > + void *arg)
> > +{
> > + struct segment_iter iter = {
> > + .handler = handler,
> > + .arg = arg,
> > + };
> > +
> > + ASSERT(!local_irq_is_enabled());
>
> I'm not sure we want to go this far. Maybe my earlier comment was ambiguous
> though. What I meant is that the function needs to be documented to be
> prepared to be called with IRQs off. I didn't mean that to be a requirement
> to call the function (as there's no dependency on that, afaics).
Well, I mostly did this because the function is traversing the list of
PCI devices list without any locking, and wanted to make it clear
interrupts might not be safe in case they perform modifications to the
list of PCI devices (I don't think we have such usage).
Since I need to do a v4 of this one I don't mind dropping the assert.
Thanks, Roger.
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH for-4.20 v4 4/5] x86/pci: disable MSI(-X) on all devices at shutdown
2025-02-11 11:02 ` [PATCH for-4.20 v3 4/5] x86/pci: disable MSI(-X) on all devices at shutdown Roger Pau Monne
2025-02-11 11:34 ` Jan Beulich
@ 2025-02-11 14:48 ` Roger Pau Monne
2025-02-11 17:00 ` Jan Beulich
1 sibling, 1 reply; 18+ messages in thread
From: Roger Pau Monne @ 2025-02-11 14:48 UTC (permalink / raw)
To: xen-devel
Cc: oleksii.kurochko, Roger Pau Monne, Jan Beulich, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini
Attempt to disable MSI(-X) capabilities on all PCI devices know by Xen at
shutdown. Doing such disabling should facilitate kexec chained kernel from
booting more reliably, as device MSI(-X) interrupt generation should be
quiesced.
Only attempt to disable MSI(-X) on all devices in the crash context if the
PCI lock is not taken, otherwise the PCI device list could be in an
inconsistent state. This requires introducing a new pcidevs_trylock()
helper to check whether the lock is currently taken.
Disabling MSI(-X) should prevent "Receive accept error" being raised as a
result of non-disabled interrupts targeting offline CPUs.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v3:
- Introduce and use pcidevs_trylock().
- Remove assert from pci_iterate_devices().
Changes since v2:
- Disable MSI(-X) ahead of IO-APIC disabling.
- Duplicate and expand comment about pci_iterate_devices() in function
declaration.
- Add assert interrupts are disabled in pci_iterate_devices().
- Only walk PCI device list on crash context if the pci lock is not taken.
Changes since v1:
- Split from bigger patch.
- Iterate over all devices, even if the handler returns failure.
---
xen/arch/x86/crash.c | 10 ++++++++
xen/arch/x86/include/asm/msi.h | 1 +
xen/arch/x86/msi.c | 18 +++++++++++++++
xen/arch/x86/smp.c | 1 +
xen/drivers/passthrough/pci.c | 42 ++++++++++++++++++++++++++++++++++
xen/include/xen/pci.h | 12 ++++++++++
6 files changed, 84 insertions(+)
diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index a789416ca3ae..22b1121d7a40 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -175,6 +175,16 @@ static void nmi_shootdown_cpus(void)
*/
x2apic_enabled = (current_local_apic_mode() == APIC_MODE_X2APIC);
+ if ( pcidevs_trylock() )
+ {
+ /*
+ * Assume the PCI device list to be in a consistent state if the
+ * lock is not held when the crash happened.
+ */
+ pci_disable_msi_all();
+ pcidevs_unlock();
+ }
+
disable_IO_APIC();
hpet_disable();
}
diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
index 63adb19820e8..7f9e531f73e6 100644
--- a/xen/arch/x86/include/asm/msi.h
+++ b/xen/arch/x86/include/asm/msi.h
@@ -86,6 +86,7 @@ extern int pci_enable_msi(struct pci_dev *pdev, struct msi_info *msi,
extern void pci_disable_msi(struct msi_desc *msi_desc);
extern int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool off);
extern void pci_cleanup_msi(struct pci_dev *pdev);
+extern void pci_disable_msi_all(void);
extern int setup_msi_irq(struct irq_desc *desc, struct msi_desc *msidesc);
extern int __setup_msi_irq(struct irq_desc *desc, struct msi_desc *msidesc,
hw_irq_controller *handler);
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index e2360579deda..c9fe942c46f3 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1248,6 +1248,24 @@ void pci_cleanup_msi(struct pci_dev *pdev)
msi_free_irqs(pdev);
}
+static int cf_check disable_msi(struct pci_dev *pdev, void *arg)
+{
+ msi_set_enable(pdev, 0);
+ msix_set_enable(pdev, 0);
+
+ return 0;
+}
+
+/* Disable MSI and/or MSI-X on all devices known by Xen. */
+void pci_disable_msi_all(void)
+{
+ int rc = pci_iterate_devices(disable_msi, NULL);
+
+ if ( rc )
+ printk(XENLOG_ERR
+ "Failed to disable MSI(-X) on some devices: %d\n", rc);
+}
+
int pci_reset_msix_state(struct pci_dev *pdev)
{
unsigned int pos = pdev->msix_pos;
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 4d29a09a9a95..28bc041e03a9 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -374,6 +374,7 @@ void smp_send_stop(void)
smp_call_function(stop_this_cpu, &stop_aps, 0);
local_irq_disable();
+ pci_disable_msi_all();
disable_IO_APIC();
hpet_disable();
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index f398c3aa65dc..e1a09344df25 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -68,6 +68,11 @@ bool pcidevs_locked(void)
return rspin_is_locked(&_pcidevs_lock);
}
+bool pcidevs_trylock_unsafe(void)
+{
+ return _rspin_trylock(&_pcidevs_lock);
+}
+
static RADIX_TREE(pci_segments);
static inline struct pci_seg *get_pseg(u16 seg)
@@ -1802,6 +1807,43 @@ int iommu_do_pci_domctl(
return ret;
}
+struct segment_iter {
+ int (*handler)(struct pci_dev *pdev, void *arg);
+ void *arg;
+ int rc;
+};
+
+static int cf_check iterate_all(struct pci_seg *pseg, void *arg)
+{
+ struct segment_iter *iter = arg;
+ struct pci_dev *pdev;
+
+ list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
+ {
+ int rc = iter->handler(pdev, iter->arg);
+
+ if ( !iter->rc )
+ iter->rc = rc;
+ }
+
+ return 0;
+}
+
+/*
+ * Iterate without locking or preemption over all PCI devices known by Xen.
+ * Can be called with interrupts disabled.
+ */
+int pci_iterate_devices(int (*handler)(struct pci_dev *pdev, void *arg),
+ void *arg)
+{
+ struct segment_iter iter = {
+ .handler = handler,
+ .arg = arg,
+ };
+
+ return pci_segments_iterate(iterate_all, &iter) ?: iter.rc;
+}
+
/*
* Local variables:
* mode: C
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index f784e9116059..4f12bcf089ac 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -190,6 +190,11 @@ static always_inline void pcidevs_lock(void)
}
void pcidevs_unlock(void);
bool __must_check pcidevs_locked(void);
+bool pcidevs_trylock_unsafe(void);
+static always_inline bool pcidevs_trylock(void)
+{
+ return lock_evaluate_nospec(pcidevs_trylock_unsafe());
+}
#ifndef NDEBUG
/*
@@ -226,6 +231,13 @@ struct pci_dev *pci_get_pdev(const struct domain *d, pci_sbdf_t sbdf);
struct pci_dev *pci_get_real_pdev(pci_sbdf_t sbdf);
void pci_check_disable_device(u16 seg, u8 bus, u8 devfn);
+/*
+ * Iterate without locking or preemption over all PCI devices known by Xen.
+ * Can be called with interrupts disabled.
+ */
+int pci_iterate_devices(int (*handler)(struct pci_dev *pdev, void *arg),
+ void *arg);
+
uint8_t pci_conf_read8(pci_sbdf_t sbdf, unsigned int reg);
uint16_t pci_conf_read16(pci_sbdf_t sbdf, unsigned int reg);
uint32_t pci_conf_read32(pci_sbdf_t sbdf, unsigned int reg);
--
2.46.0
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH for-4.20 v4 4/5] x86/pci: disable MSI(-X) on all devices at shutdown
2025-02-11 14:48 ` [PATCH for-4.20 v4 " Roger Pau Monne
@ 2025-02-11 17:00 ` Jan Beulich
0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2025-02-11 17:00 UTC (permalink / raw)
To: Roger Pau Monne
Cc: oleksii.kurochko, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, xen-devel
On 11.02.2025 15:48, Roger Pau Monne wrote:
> Attempt to disable MSI(-X) capabilities on all PCI devices know by Xen at
> shutdown. Doing such disabling should facilitate kexec chained kernel from
> booting more reliably, as device MSI(-X) interrupt generation should be
> quiesced.
>
> Only attempt to disable MSI(-X) on all devices in the crash context if the
> PCI lock is not taken, otherwise the PCI device list could be in an
> inconsistent state. This requires introducing a new pcidevs_trylock()
> helper to check whether the lock is currently taken.
>
> Disabling MSI(-X) should prevent "Receive accept error" being raised as a
> result of non-disabled interrupts targeting offline CPUs.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH for-4.20 v3 5/5] x86/iommu: disable interrupts at shutdown
2025-02-11 11:02 [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors at shutdown Roger Pau Monne
` (3 preceding siblings ...)
2025-02-11 11:02 ` [PATCH for-4.20 v3 4/5] x86/pci: disable MSI(-X) on all devices at shutdown Roger Pau Monne
@ 2025-02-11 11:02 ` Roger Pau Monne
2025-02-11 11:37 ` Jan Beulich
2025-02-11 18:39 ` [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors " Roger Pau Monné
5 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monne @ 2025-02-11 11:02 UTC (permalink / raw)
To: xen-devel; +Cc: oleksii.kurochko, Roger Pau Monne, Jan Beulich, Andrew Cooper
Add a new hook to inhibit interrupt generation by the IOMMU(s). Note the
hook is currently only implemented for x86 IOMMUs. The purpose is to
disable interrupt generation at shutdown so any kexec chained image finds
the IOMMU(s) in a quiesced state.
It would also prevent "Receive accept error" being raised as a result of
non-disabled interrupts targeting offline CPUs.
Note that the iommu_quiesce() call in nmi_shootdown_cpus() is still
required even when there's a preceding iommu_crash_shutdown() call; the
later can become a no-op depending on the setting of the "crash-disable"
command line option.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
- New in this version.
- Expand commit message.
- Check if the hook is implemented before attempting to call it.
---
xen/arch/x86/crash.c | 1 +
xen/arch/x86/smp.c | 1 +
xen/drivers/passthrough/amd/iommu.h | 1 +
xen/drivers/passthrough/amd/iommu_init.c | 17 +++++++++++++++++
xen/drivers/passthrough/amd/pci_amd_iommu.c | 1 +
xen/drivers/passthrough/iommu.c | 12 ++++++++++++
xen/drivers/passthrough/vtd/iommu.c | 19 +++++++++++++++++++
xen/include/xen/iommu.h | 3 +++
8 files changed, 55 insertions(+)
diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index 621af81acc09..7a65f806100f 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -184,6 +184,7 @@ static void nmi_shootdown_cpus(void)
disable_IO_APIC();
hpet_disable();
+ iommu_quiesce();
}
}
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 28bc041e03a9..516dab5528c1 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -377,6 +377,7 @@ void smp_send_stop(void)
pci_disable_msi_all();
disable_IO_APIC();
hpet_disable();
+ iommu_quiesce();
if ( num_online_cpus() > 1 )
{
diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
index c32e9e9a1602..00e81b4b2ab3 100644
--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -292,6 +292,7 @@ extern unsigned long *shared_intremap_inuse;
void cf_check amd_iommu_resume(void);
int __must_check cf_check amd_iommu_suspend(void);
void cf_check amd_iommu_crash_shutdown(void);
+void cf_check amd_iommu_quiesce(void);
static inline u32 get_field_from_reg_u32(u32 reg_value, u32 mask, u32 shift)
{
diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index ed5e684b93da..934adc5abe59 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1608,3 +1608,20 @@ void cf_check amd_iommu_resume(void)
invalidate_all_domain_pages();
}
}
+
+void cf_check amd_iommu_quiesce(void)
+{
+ struct amd_iommu *iommu;
+
+ for_each_amd_iommu ( iommu )
+ {
+ if ( iommu->ctrl.int_cap_xt_en )
+ {
+ iommu->ctrl.int_cap_xt_en = false;
+ writeq(iommu->ctrl.raw,
+ iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+ }
+ else
+ amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
+ }
+}
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index f96f59440bcc..d00697edb3a2 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -791,6 +791,7 @@ static const struct iommu_ops __initconst_cf_clobber _iommu_ops = {
.crash_shutdown = amd_iommu_crash_shutdown,
.get_reserved_device_memory = amd_iommu_get_reserved_device_memory,
.dump_page_tables = amd_dump_page_tables,
+ .quiesce = amd_iommu_quiesce,
};
static const struct iommu_init_ops __initconstrel _iommu_init_ops = {
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 9e74a1fc72fa..16aad86973f9 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -663,6 +663,18 @@ void iommu_crash_shutdown(void)
#endif
}
+void iommu_quiesce(void)
+{
+ const struct iommu_ops *ops;
+
+ if ( !iommu_enabled )
+ return;
+
+ ops = iommu_get_ops();
+ if ( ops->quiesce )
+ iommu_vcall(ops, quiesce);
+}
+
int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
{
const struct iommu_ops *ops;
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 9d7a9977a6a6..a1927d9f126d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -3207,6 +3207,24 @@ static int cf_check intel_iommu_quarantine_init(struct pci_dev *pdev,
return rc;
}
+static void cf_check vtd_quiesce(void)
+{
+ const struct acpi_drhd_unit *drhd;
+
+ for_each_drhd_unit ( drhd )
+ {
+ const struct vtd_iommu *iommu = drhd->iommu;
+ uint32_t sts = dmar_readl(iommu->reg, DMAR_FECTL_REG);
+
+ /*
+ * Open code dma_msi_mask() to avoid taking the spinlock which could
+ * deadlock if called from crash context.
+ */
+ sts |= DMA_FECTL_IM;
+ dmar_writel(iommu->reg, DMAR_FECTL_REG, sts);
+ }
+}
+
static const struct iommu_ops __initconst_cf_clobber vtd_ops = {
.page_sizes = PAGE_SIZE_4K,
.init = intel_iommu_domain_init,
@@ -3236,6 +3254,7 @@ static const struct iommu_ops __initconst_cf_clobber vtd_ops = {
.iotlb_flush = iommu_flush_iotlb,
.get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
.dump_page_tables = vtd_dump_page_tables,
+ .quiesce = vtd_quiesce,
};
const struct iommu_init_ops __initconstrel intel_iommu_init_ops = {
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index b928c67e1995..77a514019cc6 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -314,6 +314,8 @@ struct iommu_ops {
*/
int (*dt_xlate)(device_t *dev, const struct dt_phandle_args *args);
#endif
+ /* Inhibit all interrupt generation, to be used at shutdown. */
+ void (*quiesce)(void);
};
/*
@@ -404,6 +406,7 @@ static inline int iommu_do_domctl(struct xen_domctl *domctl, struct domain *d,
int __must_check iommu_suspend(void);
void iommu_resume(void);
void iommu_crash_shutdown(void);
+void iommu_quiesce(void);
int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
int iommu_quarantine_dev_init(device_t *dev);
--
2.46.0
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH for-4.20 v3 5/5] x86/iommu: disable interrupts at shutdown
2025-02-11 11:02 ` [PATCH for-4.20 v3 5/5] x86/iommu: disable interrupts " Roger Pau Monne
@ 2025-02-11 11:37 ` Jan Beulich
0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2025-02-11 11:37 UTC (permalink / raw)
To: Roger Pau Monne; +Cc: oleksii.kurochko, Andrew Cooper, xen-devel
On 11.02.2025 12:02, Roger Pau Monne wrote:
> Add a new hook to inhibit interrupt generation by the IOMMU(s). Note the
> hook is currently only implemented for x86 IOMMUs. The purpose is to
> disable interrupt generation at shutdown so any kexec chained image finds
> the IOMMU(s) in a quiesced state.
>
> It would also prevent "Receive accept error" being raised as a result of
> non-disabled interrupts targeting offline CPUs.
>
> Note that the iommu_quiesce() call in nmi_shootdown_cpus() is still
> required even when there's a preceding iommu_crash_shutdown() call; the
> later can become a no-op depending on the setting of the "crash-disable"
> command line option.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Changes since v1:
> - New in this version.
> - Expand commit message.
> - Check if the hook is implemented before attempting to call it.
The latter two are changes since v2, though, aiui. In any event
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors at shutdown
2025-02-11 11:02 [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors at shutdown Roger Pau Monne
` (4 preceding siblings ...)
2025-02-11 11:02 ` [PATCH for-4.20 v3 5/5] x86/iommu: disable interrupts " Roger Pau Monne
@ 2025-02-11 18:39 ` Roger Pau Monné
2025-02-12 8:33 ` Oleksii Kurochko
5 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monné @ 2025-02-11 18:39 UTC (permalink / raw)
To: oleksii.kurochko
Cc: xen-devel, oleksii.kurochko, Jan Beulich, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini
On Tue, Feb 11, 2025 at 12:02:04PM +0100, Roger Pau Monne wrote:
> Hello,
>
> The following series aims to prevent local APIC errors from stalling the
> shtudown process. On XenServer testing we have seen reports of AMD
> boxes sporadically getting stuck in a spam of:
>
> APIC error on CPU0: 00(08), Receive accept error
>
> Messages during shutdown, as a result of device interrupts targeting
> CPUs that are offline (and have the local APIC disabled).
>
> First patch strictly solves the issue of shutdown getting stuck, further
> patches aim to quiesce interrupts from all devices (known by Xen) as an
> attempt to prevent a spurious "APIC error on CPU0: 00(00)" plus also
> make kexec more reliable.
>
> Thanks, Roger.
>
> Roger Pau Monne (5):
> x86/shutdown: offline APs with interrupts disabled on all CPUs
> x86/irq: drop fixup_irqs() parameters
> x86/smp: perform disabling on interrupts ahead of AP shutdown
> x86/pci: disable MSI(-X) on all devices at shutdown
> x86/iommu: disable interrupts at shutdown
This is now fully reviewed, can I get your opinion (and
release-acked-by) on which patches we should take for 4.20?
Thanks, Roger.
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors at shutdown
2025-02-11 18:39 ` [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors " Roger Pau Monné
@ 2025-02-12 8:33 ` Oleksii Kurochko
2025-02-12 8:51 ` Jan Beulich
0 siblings, 1 reply; 18+ messages in thread
From: Oleksii Kurochko @ 2025-02-12 8:33 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Stefano Stabellini
[-- Attachment #1: Type: text/plain, Size: 1913 bytes --]
On 2/11/25 7:39 PM, Roger Pau Monné wrote:
> On Tue, Feb 11, 2025 at 12:02:04PM +0100, Roger Pau Monne wrote:
>> Hello,
>>
>> The following series aims to prevent local APIC errors from stalling the
>> shtudown process. On XenServer testing we have seen reports of AMD
>> boxes sporadically getting stuck in a spam of:
>>
>> APIC error on CPU0: 00(08), Receive accept error
>>
>> Messages during shutdown, as a result of device interrupts targeting
>> CPUs that are offline (and have the local APIC disabled).
>>
>> First patch strictly solves the issue of shutdown getting stuck, further
>> patches aim to quiesce interrupts from all devices (known by Xen) as an
>> attempt to prevent a spurious "APIC error on CPU0: 00(00)" plus also
>> make kexec more reliable.
>>
>> Thanks, Roger.
>>
>> Roger Pau Monne (5):
>> x86/shutdown: offline APs with interrupts disabled on all CPUs
>> x86/irq: drop fixup_irqs() parameters
>> x86/smp: perform disabling on interrupts ahead of AP shutdown
>> x86/pci: disable MSI(-X) on all devices at shutdown
>> x86/iommu: disable interrupts at shutdown
> This is now fully reviewed, can I get your opinion (and
> release-acked-by) on which patches we should take for 4.20?
If my understanding is correct to unblock shutdown process, it is enough just
to have only first patch merged, correct? So the first patch should be merged.
As second patch doesn't have functional changes, IMO, it could be merged to
despite of the fact we have Hard code freeze period.
All other patches, I would like to ask additional opinion (as I am an expert in x86),
at first glance it looks like an absence of these patches in staging branch will
lead only to triggering "Receive accept error" which I believe won't block shutdown
process, so these patches could be postponed until 4.21. On other side, if it is
low-risk fixes then we could consider to merge them now.
~ Oleksii
[-- Attachment #2: Type: text/html, Size: 2360 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors at shutdown
2025-02-12 8:33 ` Oleksii Kurochko
@ 2025-02-12 8:51 ` Jan Beulich
2025-02-12 9:25 ` Roger Pau Monné
0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2025-02-12 8:51 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: xen-devel, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, Roger Pau Monné
On 12.02.2025 09:33, Oleksii Kurochko wrote:
>
> On 2/11/25 7:39 PM, Roger Pau Monné wrote:
>> On Tue, Feb 11, 2025 at 12:02:04PM +0100, Roger Pau Monne wrote:
>>> Hello,
>>>
>>> The following series aims to prevent local APIC errors from stalling the
>>> shtudown process. On XenServer testing we have seen reports of AMD
>>> boxes sporadically getting stuck in a spam of:
>>>
>>> APIC error on CPU0: 00(08), Receive accept error
>>>
>>> Messages during shutdown, as a result of device interrupts targeting
>>> CPUs that are offline (and have the local APIC disabled).
>>>
>>> First patch strictly solves the issue of shutdown getting stuck, further
>>> patches aim to quiesce interrupts from all devices (known by Xen) as an
>>> attempt to prevent a spurious "APIC error on CPU0: 00(00)" plus also
>>> make kexec more reliable.
>>>
>>> Thanks, Roger.
>>>
>>> Roger Pau Monne (5):
>>> x86/shutdown: offline APs with interrupts disabled on all CPUs
>>> x86/irq: drop fixup_irqs() parameters
>>> x86/smp: perform disabling on interrupts ahead of AP shutdown
>>> x86/pci: disable MSI(-X) on all devices at shutdown
>>> x86/iommu: disable interrupts at shutdown
>> This is now fully reviewed, can I get your opinion (and
>> release-acked-by) on which patches we should take for 4.20?
>
> If my understanding is correct to unblock shutdown process, it is enough just
> to have only first patch merged, correct? So the first patch should be merged.
>
> As second patch doesn't have functional changes, IMO, it could be merged to
> despite of the fact we have Hard code freeze period.
>
> All other patches, I would like to ask additional opinion (as I am an expert in x86),
> at first glance it looks like an absence of these patches in staging branch will
> lead only to triggering "Receive accept error" which I believe won't block shutdown
> process, so these patches could be postponed until 4.21. On other side, if it is
> low-risk fixes then we could consider to merge them now.
I'm not Roger, but as a data point: While I'm uncertain about patch 2, all
others in this series will very likely be backported anyway.
Jan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors at shutdown
2025-02-12 8:51 ` Jan Beulich
@ 2025-02-12 9:25 ` Roger Pau Monné
2025-02-12 14:25 ` Oleksii Kurochko
0 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monné @ 2025-02-12 9:25 UTC (permalink / raw)
To: Jan Beulich, Oleksii Kurochko
Cc: xen-devel, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini
On Wed, Feb 12, 2025 at 09:51:16AM +0100, Jan Beulich wrote:
> On 12.02.2025 09:33, Oleksii Kurochko wrote:
> >
> > On 2/11/25 7:39 PM, Roger Pau Monné wrote:
> >> On Tue, Feb 11, 2025 at 12:02:04PM +0100, Roger Pau Monne wrote:
> >>> Hello,
> >>>
> >>> The following series aims to prevent local APIC errors from stalling the
> >>> shtudown process. On XenServer testing we have seen reports of AMD
> >>> boxes sporadically getting stuck in a spam of:
> >>>
> >>> APIC error on CPU0: 00(08), Receive accept error
> >>>
> >>> Messages during shutdown, as a result of device interrupts targeting
> >>> CPUs that are offline (and have the local APIC disabled).
> >>>
> >>> First patch strictly solves the issue of shutdown getting stuck, further
> >>> patches aim to quiesce interrupts from all devices (known by Xen) as an
> >>> attempt to prevent a spurious "APIC error on CPU0: 00(00)" plus also
> >>> make kexec more reliable.
> >>>
> >>> Thanks, Roger.
> >>>
> >>> Roger Pau Monne (5):
> >>> x86/shutdown: offline APs with interrupts disabled on all CPUs
> >>> x86/irq: drop fixup_irqs() parameters
> >>> x86/smp: perform disabling on interrupts ahead of AP shutdown
> >>> x86/pci: disable MSI(-X) on all devices at shutdown
> >>> x86/iommu: disable interrupts at shutdown
> >> This is now fully reviewed, can I get your opinion (and
> >> release-acked-by) on which patches we should take for 4.20?
> >
> > If my understanding is correct to unblock shutdown process, it is enough just
> > to have only first patch merged, correct? So the first patch should be merged.
> >
> > As second patch doesn't have functional changes, IMO, it could be merged to
> > despite of the fact we have Hard code freeze period.
> >
> > All other patches, I would like to ask additional opinion (as I am an expert in x86),
> > at first glance it looks like an absence of these patches in staging branch will
> > lead only to triggering "Receive accept error" which I believe won't block shutdown
> > process, so these patches could be postponed until 4.21. On other side, if it is
> > low-risk fixes then we could consider to merge them now.
I expect the following patches might make kexec'ing from Xen a bit
more reliable, as the kexec'ed kernel should find an environment with
interrupts from all Xen known devices quiesced.
> I'm not Roger, but as a data point: While I'm uncertain about patch 2, all
> others in this series will very likely be backported anyway.
I plan to backport the series to the XenServer patch queue also when it
goes in.
Thanks, Roger.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors at shutdown
2025-02-12 9:25 ` Roger Pau Monné
@ 2025-02-12 14:25 ` Oleksii Kurochko
0 siblings, 0 replies; 18+ messages in thread
From: Oleksii Kurochko @ 2025-02-12 14:25 UTC (permalink / raw)
To: Roger Pau Monné, Jan Beulich
Cc: xen-devel, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini
[-- Attachment #1: Type: text/plain, Size: 2784 bytes --]
On 2/12/25 10:25 AM, Roger Pau Monné wrote:
> On Wed, Feb 12, 2025 at 09:51:16AM +0100, Jan Beulich wrote:
>> On 12.02.2025 09:33, Oleksii Kurochko wrote:
>>> On 2/11/25 7:39 PM, Roger Pau Monné wrote:
>>>> On Tue, Feb 11, 2025 at 12:02:04PM +0100, Roger Pau Monne wrote:
>>>>> Hello,
>>>>>
>>>>> The following series aims to prevent local APIC errors from stalling the
>>>>> shtudown process. On XenServer testing we have seen reports of AMD
>>>>> boxes sporadically getting stuck in a spam of:
>>>>>
>>>>> APIC error on CPU0: 00(08), Receive accept error
>>>>>
>>>>> Messages during shutdown, as a result of device interrupts targeting
>>>>> CPUs that are offline (and have the local APIC disabled).
>>>>>
>>>>> First patch strictly solves the issue of shutdown getting stuck, further
>>>>> patches aim to quiesce interrupts from all devices (known by Xen) as an
>>>>> attempt to prevent a spurious "APIC error on CPU0: 00(00)" plus also
>>>>> make kexec more reliable.
>>>>>
>>>>> Thanks, Roger.
>>>>>
>>>>> Roger Pau Monne (5):
>>>>> x86/shutdown: offline APs with interrupts disabled on all CPUs
>>>>> x86/irq: drop fixup_irqs() parameters
>>>>> x86/smp: perform disabling on interrupts ahead of AP shutdown
>>>>> x86/pci: disable MSI(-X) on all devices at shutdown
>>>>> x86/iommu: disable interrupts at shutdown
>>>> This is now fully reviewed, can I get your opinion (and
>>>> release-acked-by) on which patches we should take for 4.20?
>>> If my understanding is correct to unblock shutdown process, it is enough just
>>> to have only first patch merged, correct? So the first patch should be merged.
>>>
>>> As second patch doesn't have functional changes, IMO, it could be merged to
>>> despite of the fact we have Hard code freeze period.
>>>
>>> All other patches, I would like to ask additional opinion (as I am an expert in x86),
>>> at first glance it looks like an absence of these patches in staging branch will
>>> lead only to triggering "Receive accept error" which I believe won't block shutdown
>>> process, so these patches could be postponed until 4.21. On other side, if it is
>>> low-risk fixes then we could consider to merge them now.
> I expect the following patches might make kexec'ing from Xen a bit
> more reliable, as the kexec'ed kernel should find an environment with
> interrupts from all Xen known devices quiesced.
>
>> I'm not Roger, but as a data point: While I'm uncertain about patch 2, all
>> others in this series will very likely be backported anyway.
> I plan to backport the series to the XenServer patch queue also when it
> goes in.
If it is likely to be backported anyway, then let's merge the patch series now:
Release-Acked-by: Oleksii Kurochko<oleksii.kurochko@gmail.com>
~Oleksii
>
> Thanks, Roger.
[-- Attachment #2: Type: text/html, Size: 3891 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread