* [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64
@ 2026-04-02 10:45 Mykola Kvach
2026-04-02 10:45 ` [PATCH v8 01/13] xen/arm: Add suspend and resume timer helpers Mykola Kvach
` (14 more replies)
0 siblings, 15 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Jan Beulich, Roger Pau Monné, Jens Wiklander, Rahul Singh
From: Mykola Kvach <mykola_kvach@epam.com>
This is part 2 of version 8 of the ARM Xen system suspend/resume patch
series, based on earlier work by Mirela Simonovic and Mykyta Poturai.
The first part is in mainline.
NOTE: Most of the code is guarded by CONFIG_SYSTEM_SUSPEND, which can
currently only be selected when UNSUPPORTED is set. As a result, the
functionality is neither enabled nor even built by default.
This version is ported to Xen master and includes extensive improvements
based on reviewer feedback. The series restructures the code to improve
robustness and maintainability, and implements system Suspend-to-RAM
support on ARM64, triggered by the hardware/control domain.
Key updates in this series:
- Introduced architecture-specific suspend/resume infrastructure
- Integrated GICv2/GICv3 suspend and resume, including memory-backed context
save/restore with error handling
- Added time and IRQ suspend/resume hooks, ensuring correct timer/interrupt
state across suspend cycles
- Implemented proper PSCI SYSTEM_SUSPEND invocation and version checks
- Improved state management and recovery in error cases during suspend/resume
- Added support for IPMMU-VMSA/SMMUv3 context save/restore
- Added support for GICv3 eSPI registers context save/restore
- Added support for ITS registers context save/restore
---
TODOs:
- Enable "xl suspend" support on ARM
- Add suspend/resume CI test for ARM (QEMU if feasible)
- PCI suspend (open question)
---
Detailed changelogs can be found in each patch.
Changes in v8:
- Rebased to latest master and refreshed the series accordingly.
- Added a new GICv3 patch to tolerate retained redistributor LPI state
across CPU_OFF/CPU_ON.
- GICv2 suspend now disables the CPU interface and distributor before
saving state.
- GICv3 suspend/resume fixes the redistributor base used for LPI state.
- ITS and SMMUv3 suspend/resume paths were tightened, with safer
restore/rollback handling and stricter fatal-error handling.
- System suspend now checks that all domains are already in
SHUTDOWN_suspend before proceeding, and renames the hardware-domain
suspend capability/helper for clearer semantics.
- Fixed alignment/cleanup issues in the low-level suspend/resume code.
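As a rough illustration of the new pre-suspend check, the sketch below models "every domain has already reached SHUTDOWN_suspend" over a plain array. The struct, field and enum names are hypothetical stand-ins chosen for this note, not Xen's struct domain or its helpers; the actual implementation operates on Xen's own domain structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of shutdown reasons (names mirror Xen's SHUTDOWN_*). */
enum model_shutdown_reason {
    MODEL_SHUTDOWN_poweroff,
    MODEL_SHUTDOWN_reboot,
    MODEL_SHUTDOWN_suspend,
    MODEL_SHUTDOWN_crash,
};

/* Hypothetical per-domain record: only the two fields the check needs. */
struct model_domain {
    bool is_shut_down;                         /* domain finished shutting down */
    enum model_shutdown_reason shutdown_code;  /* why it shut down */
};

/*
 * Host suspend may proceed only if every domain has already shut down
 * with the "suspend" reason; a running domain, or one stopped for any
 * other reason, vetoes it.
 */
static bool model_all_domains_suspended(const struct model_domain *doms,
                                        size_t nr)
{
    for ( size_t i = 0; i < nr; i++ )
        if ( !doms[i].is_shut_down ||
             doms[i].shutdown_code != MODEL_SHUTDOWN_suspend )
            return false;

    return true;
}
```

Refusing early keeps the host from suspending underneath a domain that never saved its own state.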
Changes in v7:
- Timer helper renamed/clarified; virtual/hyper/phys handling documented.
- GICv2 uses one context block; restore saved CTLR; panic on alloc failure.
- GICv3/eSPI/ITS always suspend/resume; restore LPI/eSPI; rdist timeout.
- IPMMU suspend context allocated before PCI setup.
- System suspend: control domain drives host suspend.
- Dropped v6 IRQ descriptor restore patches; use setup_irq and re-register
local IRQs on resume instead.
For earlier changelogs, please refer to the previous cover letters.
Mirela Simonovic (6):
xen/arm: Add suspend and resume timer helpers
xen/arm: gic-v2: Implement GIC suspend/resume functions
xen/arm: Resume memory management on Xen resume
xen/arm: Save/restore context on suspend/resume
xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface)
xen/arm: Add support for system suspend triggered by hardware domain
Mykola Kvach (6):
xen/arm: gic-v3: tolerate retained redistributor LPI state across
CPU_OFF
xen/arm: gic-v3: Implement GICv3 suspend/resume functions
xen/arm: gic-v3: add ITS suspend/resume support
xen/arm: tee: keep init_tee_secondary() for hotplug and resume
xen/arm: ffa: fix notification SRI across CPU hotplug/suspend
arm/smmu-v3: add suspend/resume handlers
Oleksandr Tyshchenko (1):
iommu/ipmmu-vmsa: Implement suspend/resume callbacks
xen/arch/arm/Kconfig | 2 +
xen/arch/arm/Makefile | 1 +
xen/arch/arm/arm64/head.S | 112 ++++++++
xen/arch/arm/gic-v2.c | 132 +++++++++
xen/arch/arm/gic-v3-its.c | 126 +++++++-
xen/arch/arm/gic-v3-lpi.c | 80 +++++-
xen/arch/arm/gic-v3.c | 349 ++++++++++++++++++++++-
xen/arch/arm/gic.c | 29 ++
xen/arch/arm/include/asm/gic.h | 12 +
xen/arch/arm/include/asm/gic_v3_defs.h | 1 +
xen/arch/arm/include/asm/gic_v3_its.h | 24 ++
xen/arch/arm/include/asm/mm.h | 2 +
xen/arch/arm/include/asm/psci.h | 1 +
xen/arch/arm/include/asm/suspend.h | 31 ++
xen/arch/arm/include/asm/time.h | 5 +
xen/arch/arm/mmu/smpboot.c | 2 +-
xen/arch/arm/psci.c | 23 +-
xen/arch/arm/suspend.c | 195 +++++++++++++
xen/arch/arm/tee/ffa_notif.c | 63 +++-
xen/arch/arm/tee/tee.c | 2 +-
xen/arch/arm/time.c | 44 ++-
xen/arch/arm/vpsci.c | 12 +-
xen/common/Kconfig | 3 +
xen/common/domain.c | 7 +-
xen/drivers/passthrough/arm/ipmmu-vmsa.c | 305 +++++++++++++++++++-
xen/drivers/passthrough/arm/smmu-v3.c | 172 ++++++++---
xen/drivers/passthrough/arm/smmu.c | 10 +
xen/include/xen/list.h | 14 +
28 files changed, 1670 insertions(+), 89 deletions(-)
create mode 100644 xen/arch/arm/suspend.c
--
2.43.0
* [PATCH v8 01/13] xen/arm: Add suspend and resume timer helpers
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
@ 2026-04-02 10:45 ` Mykola Kvach
2026-04-20 15:22 ` Luca Fancellu
2026-04-02 10:45 ` [PATCH v8 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions Mykola Kvach
` (13 subsequent siblings)
14 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Julien Grall
From: Mirela Simonovic <mirela.simonovic@aggios.com>
Timer interrupts must be disabled while the system is suspended to prevent
spurious wake-ups. Suspending timers in Xen consists of disabling the
physical timer and the hypervisor timer on the current CPU. The virtual
timer does not need explicit handling here, as it is already disabled on
vCPU context switch and its state is restored per-vCPU on the next context
restore.
Resuming consists of raising TIMER_SOFTIRQ, which prompts the generic
timer code to reprogram the hypervisor timer with the correct timeout.
Xen does not use or expose the physical timer, so it remains disabled
across suspend/resume.
Introduce a new helper, disable_phys_hyp_timers(), to encapsulate disabling
of the physical and hypervisor timers.
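A hypothetical software model of the suspend/resume split described above: plain struct fields stand in for the timer control registers, and a flag stands in for raise_softirq(TIMER_SOFTIRQ). It only shows which timer is handled where; it does not replace the helpers added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ENABLE is bit 0 of CNTP_CTL_EL0, CNTHP_CTL_EL2 and CNTV_CTL_EL0. */
#define CNT_CTL_ENABLE (1u << 0)

/* Hypothetical software model of one CPU's timer state. */
struct model_cpu_timers {
    uint32_t cntp_ctl;          /* physical timer: never used by Xen */
    uint32_t cnthp_ctl;         /* hypervisor timer: drives Xen's timers */
    uint32_t cntv_ctl;          /* virtual timer: owned by the running vCPU */
    bool timer_softirq_raised;  /* stands in for raise_softirq(TIMER_SOFTIRQ) */
};

static void model_time_suspend(struct model_cpu_timers *t)
{
    /* CNTV is not touched: it was disabled when the last vCPU switched out. */
    t->cntp_ctl = 0;
    t->cnthp_ctl = 0;
}

static void model_time_resume(struct model_cpu_timers *t)
{
    /*
     * Nothing is re-enabled directly: the generic timer code reprograms
     * CNTHP when the softirq runs, CNTP stays off, and CNTV comes back
     * with the next vCPU context restore.
     */
    t->timer_softirq_raised = true;
}

/* True if any timer still has its ENABLE bit set. */
static bool model_any_timer_enabled(const struct model_cpu_timers *t)
{
    return ((t->cntp_ctl | t->cnthp_ctl | t->cntv_ctl) & CNT_CTL_ENABLE) != 0;
}
```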
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Acked-by: Julien Grall <jgrall@amazon.com>
---
Changes in V7:
- Dropped EL1/EL2 wording; use "physical timer" and "hypervisor timer"
- Renamed helper to disable_phys_hyp_timers() to reflect its actual scope
- Clarified virtual timer handling (disabled on vCPU switch-out, restored on
context restore) and added comments in suspend/resume paths
- Added resume comment explaining which timers are restored by TIMER_SOFTIRQ
---
xen/arch/arm/include/asm/time.h | 5 ++++
xen/arch/arm/time.c | 44 ++++++++++++++++++++++++++++-----
2 files changed, 43 insertions(+), 6 deletions(-)
diff --git a/xen/arch/arm/include/asm/time.h b/xen/arch/arm/include/asm/time.h
index c194dbb9f5..9313b157ea 100644
--- a/xen/arch/arm/include/asm/time.h
+++ b/xen/arch/arm/include/asm/time.h
@@ -105,6 +105,11 @@ void preinit_xen_time(void);
void force_update_vcpu_system_time(struct vcpu *v);
+#ifdef CONFIG_SYSTEM_SUSPEND
+void time_suspend(void);
+void time_resume(void);
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
#endif /* __ARM_TIME_H__ */
/*
* Local variables:
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index a12912a106..f91dc64099 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -296,6 +296,14 @@ static void check_timer_irq_cfg(unsigned int irq, const char *which)
static DEFINE_PER_CPU_READ_MOSTLY(struct irqaction, irq_hyp);
static DEFINE_PER_CPU_READ_MOSTLY(struct irqaction, irq_virt);
+/* Disable physical and hypervisor timers on the current CPU */
+static inline void disable_phys_hyp_timers(void)
+{
+ WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
+ WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
+ isb();
+}
+
/* Set up the timer interrupt on this CPU */
void init_timer_interrupt(void)
{
@@ -306,9 +314,7 @@ void init_timer_interrupt(void)
WRITE_SYSREG64(0, CNTVOFF_EL2); /* No VM-specific offset */
/* Do not let the VMs program the physical timer, only read the physical counter */
WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
- WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
- WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
- isb();
+ disable_phys_hyp_timers();
hyp_action->name = "hyptimer";
hyp_action->handler = htimer_interrupt;
@@ -333,9 +339,7 @@ void init_timer_interrupt(void)
*/
static void deinit_timer_interrupt(void)
{
- WRITE_SYSREG(0, CNTP_CTL_EL0); /* Disable physical timer */
- WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Disable hypervisor's timer */
- isb();
+ disable_phys_hyp_timers();
release_irq(timer_irq[TIMER_HYP_PPI], NULL);
release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
@@ -375,6 +379,34 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
/* XXX update guest visible wallclock time */
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+void time_suspend(void)
+{
+ /* CNTV already disabled by virt_timer_save() during vcpu context switch. */
+ disable_phys_hyp_timers();
+}
+
+void time_resume(void)
+{
+ /*
+ * Raising TIMER_SOFTIRQ triggers generic timer code to reprogram the
+ * hypervisor timer with the correct timeout (not known here).
+ *
+ * Xen doesn't use or expose the physical timer, so it remains disabled
+ * across suspend/resume.
+ *
+ * The virtual timer state is restored per-vCPU on the next context switch.
+ *
+ * No further action is needed to restore timekeeping after power down,
+ * since the system counter is unaffected. See ARM DDI 0487 L.a, D12.1.2
+ * "The system counter must be implemented in an always-on power domain."
+ */
+ raise_softirq(TIMER_SOFTIRQ);
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
static int cpu_time_callback(struct notifier_block *nfb,
unsigned long action,
void *hcpu)
--
2.43.0
* [PATCH v8 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
2026-04-02 10:45 ` [PATCH v8 01/13] xen/arm: Add suspend and resume timer helpers Mykola Kvach
@ 2026-04-02 10:45 ` Mykola Kvach
2026-04-21 13:24 ` Luca Fancellu
2026-04-02 10:45 ` [PATCH v8 03/13] xen/arm: gic-v3: tolerate retained redistributor LPI state across CPU_OFF Mykola Kvach
` (12 subsequent siblings)
14 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mirela Simonovic <mirela.simonovic@aggios.com>
System suspend may power down the GIC, so Xen must save the GIC context
on suspend and restore it on resume. The saved context covers only the
registers controlled by the hypervisor; GIC registers accessible to
guests are already saved and restored on vCPU context switch.
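The block-based indexing in the save/restore loops can be sanity-checked against the per-IRQ register formulas of the GICv2 architecture. The sketch below is a standalone check written for this note (the helper names are hypothetical): stepping by sizeof(field) per 32-IRQ block, then by 4 bytes per register, lands on the same register as the architectural formula base + 4 * floor(irq * bits / 32).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GICv2 distributor offsets from the GIC architecture specification. */
#define GICD_ISENABLER  0x100
#define GICD_ISACTIVER  0x300
#define GICD_IPRIORITYR 0x400
#define GICD_ITARGETSR  0x800
#define GICD_ICFGR      0xC00

/* Same layout as the patch's struct irq_block: state for 32 IRQs. */
struct irq_block {
    uint32_t icfgr[2];      /* 2 bits per IRQ: 16 IRQs per register */
    uint32_t ipriorityr[8]; /* 8 bits per IRQ: 4 IRQs per register */
    uint32_t isenabler;     /* 1 bit per IRQ */
    uint32_t isactiver;     /* 1 bit per IRQ */
    uint32_t itargetsr[8];  /* 8 bits per IRQ: 4 IRQs per register */
};

/* Each array's size equals the bytes that field occupies per 32 IRQs. */
_Static_assert(sizeof(((struct irq_block *)0)->icfgr) == 32 * 2 / 8, "icfgr");
_Static_assert(sizeof(((struct irq_block *)0)->ipriorityr) == 32 * 8 / 8,
               "ipriorityr");

/* Offset as the save/restore loops compute it: block stride + word index. */
static size_t block_style_off(size_t base, unsigned int irq, unsigned int bits)
{
    size_t stride = 4u * bits;         /* bytes of this field per 32 IRQs */
    unsigned int per_reg = 32u / bits; /* IRQs covered by one 32-bit register */

    return base + (irq / 32) * stride + ((irq % 32) / per_reg) * 4;
}

/* Architectural formula: IRQ n lives in register floor(n * bits / 32). */
static size_t per_irq_off(size_t base, unsigned int irq, unsigned int bits)
{
    return base + ((size_t)irq * bits / 32) * 4;
}
```

For example, IRQ 37 maps to GICD_IPRIORITYR + 0x24 and GICD_ICFGR + 0x8 under both formulas.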
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V8:
- Disable the CPU interface and distributor before suspend
- Replace 0xffffffff with GENMASK(31, 0)
- Cosmetic changes
Changes in V7:
- Allocate one contiguous memory block for the GICv2 dist suspend context.
- gicv2_resume() no longer unconditionally re-enables the distributor/CPU interface;
it now writes back the saved CTLR values as-is.
- gicv2_alloc_context() now returns 0 on success and panics on failure, since
suspend context allocation is not recoverable.
---
xen/arch/arm/gic-v2.c | 132 +++++++++++++++++++++++++++++++++
xen/arch/arm/gic.c | 29 ++++++++
xen/arch/arm/include/asm/gic.h | 12 +++
3 files changed, 173 insertions(+)
diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index b23e72a3d0..dbff470962 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -1098,6 +1098,129 @@ static int gicv2_iomem_deny_access(struct domain *d)
return iomem_deny_access(d, mfn, mfn + nr);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+/* This struct represents a block of 32 IRQs */
+struct irq_block {
+ uint32_t icfgr[2]; /* 2 registers of 16 IRQs each */
+ uint32_t ipriorityr[8];
+ uint32_t isenabler;
+ uint32_t isactiver;
+ uint32_t itargetsr[8];
+};
+
+/* GICv2 registers to be saved/restored on system suspend/resume */
+struct gicv2_context {
+ /* GICC context */
+ struct cpu_ctx {
+ uint32_t ctlr;
+ uint32_t pmr;
+ uint32_t bpr;
+ } cpu;
+
+ /* GICD context */
+ struct dist_ctx {
+ uint32_t ctlr;
+ /* Includes banked SGI/PPI state for the boot CPU. */
+ struct irq_block *irqs;
+ } dist;
+};
+
+static struct gicv2_context gic_ctx;
+
+static int gicv2_suspend(void)
+{
+ unsigned int i, blocks = DIV_ROUND_UP(gicv2_info.nr_lines, 32);
+
+ /* Save GICC_CTLR configuration. */
+ gic_ctx.cpu.ctlr = readl_gicc(GICC_CTLR);
+
+ /* Quiesce the GIC CPU interface before suspend. */
+ gicv2_cpu_disable();
+
+ /* Save GICD configuration */
+ gic_ctx.dist.ctlr = readl_gicd(GICD_CTLR);
+ writel_gicd(0, GICD_CTLR);
+
+ gic_ctx.cpu.pmr = readl_gicc(GICC_PMR);
+ gic_ctx.cpu.bpr = readl_gicc(GICC_BPR);
+
+ for ( i = 0; i < blocks; i++ )
+ {
+ struct irq_block *irqs = gic_ctx.dist.irqs + i;
+ size_t j, off = i * sizeof(irqs->isenabler);
+
+ irqs->isenabler = readl_gicd(GICD_ISENABLER + off);
+ irqs->isactiver = readl_gicd(GICD_ISACTIVER + off);
+
+ off = i * sizeof(irqs->ipriorityr);
+ for ( j = 0; j < ARRAY_SIZE(irqs->ipriorityr); j++ )
+ {
+ irqs->ipriorityr[j] = readl_gicd(GICD_IPRIORITYR + off + j * 4);
+ irqs->itargetsr[j] = readl_gicd(GICD_ITARGETSR + off + j * 4);
+ }
+
+ off = i * sizeof(irqs->icfgr);
+ for ( j = 0; j < ARRAY_SIZE(irqs->icfgr); j++ )
+ irqs->icfgr[j] = readl_gicd(GICD_ICFGR + off + j * 4);
+ }
+
+ return 0;
+}
+
+static void gicv2_resume(void)
+{
+ unsigned int i, blocks = DIV_ROUND_UP(gicv2_info.nr_lines, 32);
+
+ gicv2_cpu_disable();
+ /* Disable distributor */
+ writel_gicd(0, GICD_CTLR);
+
+ for ( i = 0; i < blocks; i++ )
+ {
+ struct irq_block *irqs = gic_ctx.dist.irqs + i;
+ size_t j, off = i * sizeof(irqs->isenabler);
+
+ writel_gicd(GENMASK(31, 0), GICD_ICENABLER + off);
+ writel_gicd(irqs->isenabler, GICD_ISENABLER + off);
+
+ writel_gicd(GENMASK(31, 0), GICD_ICACTIVER + off);
+ writel_gicd(irqs->isactiver, GICD_ISACTIVER + off);
+
+ off = i * sizeof(irqs->ipriorityr);
+ for ( j = 0; j < ARRAY_SIZE(irqs->ipriorityr); j++ )
+ {
+ writel_gicd(irqs->ipriorityr[j], GICD_IPRIORITYR + off + j * 4);
+ writel_gicd(irqs->itargetsr[j], GICD_ITARGETSR + off + j * 4);
+ }
+
+ off = i * sizeof(irqs->icfgr);
+ for ( j = 0; j < ARRAY_SIZE(irqs->icfgr); j++ )
+ writel_gicd(irqs->icfgr[j], GICD_ICFGR + off + j * 4);
+ }
+
+ /* All registers restored: write back the saved GICD_CTLR */
+ writel_gicd(gic_ctx.dist.ctlr, GICD_CTLR);
+
+ /* Restore GIC CPU interface configuration */
+ writel_gicc(gic_ctx.cpu.pmr, GICC_PMR);
+ writel_gicc(gic_ctx.cpu.bpr, GICC_BPR);
+
+ /* Write back the saved GICC_CTLR (re-enables the CPU interface) */
+ writel_gicc(gic_ctx.cpu.ctlr, GICC_CTLR);
+}
+
+static void __init gicv2_alloc_context(void)
+{
+ uint32_t blocks = DIV_ROUND_UP(gicv2_info.nr_lines, 32);
+
+ gic_ctx.dist.irqs = xzalloc_array(struct irq_block, blocks);
+ if ( !gic_ctx.dist.irqs )
+ panic("Failed to allocate memory for GICv2 suspend context\n");
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
#ifdef CONFIG_ACPI
static unsigned long gicv2_get_hwdom_extra_madt_size(const struct domain *d)
{
@@ -1302,6 +1425,11 @@ static int __init gicv2_init(void)
spin_unlock(&gicv2.lock);
+#ifdef CONFIG_SYSTEM_SUSPEND
+ /* Allocate memory used to save the GIC context on system suspend */
+ gicv2_alloc_context();
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
return 0;
}
@@ -1345,6 +1473,10 @@ static const struct gic_hw_operations gicv2_ops = {
.map_hwdom_extra_mappings = gicv2_map_hwdom_extra_mappings,
.iomem_deny_access = gicv2_iomem_deny_access,
.do_LPI = gicv2_do_LPI,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = gicv2_suspend,
+ .resume = gicv2_resume,
+#endif /* CONFIG_SYSTEM_SUSPEND */
};
/* Set up the GIC */
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index ee75258fc3..7727ffed5a 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -432,6 +432,35 @@ int gic_iomem_deny_access(struct domain *d)
return gic_hw_ops->iomem_deny_access(d);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+int gic_suspend(void)
+{
+ /* Must be called by boot CPU#0 with interrupts disabled */
+ ASSERT(!local_irq_is_enabled());
+ ASSERT(!smp_processor_id());
+
+ if ( !gic_hw_ops->suspend || !gic_hw_ops->resume )
+ return -ENOSYS;
+
+ return gic_hw_ops->suspend();
+}
+
+void gic_resume(void)
+{
+ /*
+ * Must be called by boot CPU#0 with interrupts disabled after gic_suspend
+ * has returned successfully.
+ */
+ ASSERT(!local_irq_is_enabled());
+ ASSERT(!smp_processor_id());
+ ASSERT(gic_hw_ops->resume);
+
+ gic_hw_ops->resume();
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
static int cpu_gic_callback(struct notifier_block *nfb,
unsigned long action,
void *hcpu)
diff --git a/xen/arch/arm/include/asm/gic.h b/xen/arch/arm/include/asm/gic.h
index 8e713aa477..8e8f4ac4c5 100644
--- a/xen/arch/arm/include/asm/gic.h
+++ b/xen/arch/arm/include/asm/gic.h
@@ -280,6 +280,12 @@ extern int gicv_setup(struct domain *d);
extern void gic_save_state(struct vcpu *v);
extern void gic_restore_state(struct vcpu *v);
+#ifdef CONFIG_SYSTEM_SUSPEND
+/* Suspend/resume */
+extern int gic_suspend(void);
+extern void gic_resume(void);
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
/* SGI (AKA IPIs) */
enum gic_sgi {
GIC_SGI_EVENT_CHECK,
@@ -423,6 +429,12 @@ struct gic_hw_operations {
int (*iomem_deny_access)(struct domain *d);
/* Handle LPIs, which require special handling */
void (*do_LPI)(unsigned int lpi);
+#ifdef CONFIG_SYSTEM_SUSPEND
+ /* Save GIC configuration on system suspend */
+ int (*suspend)(void);
+ /* Restore GIC configuration on system resume */
+ void (*resume)(void);
+#endif /* CONFIG_SYSTEM_SUSPEND */
};
extern const struct gic_hw_operations *gic_hw_ops;
--
2.43.0
* [PATCH v8 03/13] xen/arm: gic-v3: tolerate retained redistributor LPI state across CPU_OFF
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
2026-04-02 10:45 ` [PATCH v8 01/13] xen/arm: Add suspend and resume timer helpers Mykola Kvach
2026-04-02 10:45 ` [PATCH v8 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions Mykola Kvach
@ 2026-04-02 10:45 ` Mykola Kvach
2026-04-22 15:55 ` Luca Fancellu
2026-04-02 10:45 ` [PATCH v8 04/13] xen/arm: gic-v3: Implement GICv3 suspend/resume functions Mykola Kvach
` (11 subsequent siblings)
14 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mykola Kvach <mykola_kvach@epam.com>
PSCI does not guarantee that a GICv3 redistributor is powered down across
CPU_OFF -> CPU_ON.
DEN0022F.b says CPU_OFF powers down the calling core (5.5) and CPU_ON
brings the core back with a defined initial CPU state (5.6, 6.4).
However, PSCI leaves interrupt migration and GIC re-initialization to the
supervisory software/firmware stack: the caller must migrate interrupts
away before CPU_OFF (5.5.2), and the execution context that is lost in a
powerdown state must be saved and restored by software (6.8). PSCI also
calls out GIC management explicitly in 6.8, including retargeting SPIs,
preventing PPIs/SGIs from targeting a powered down CPU, and reinitializing
the CPU interface after CPU_ON.
This matches the GIC architecture. IHI0069H.b Chapter 11.1 requires the PE
and CPU interface to share a power domain, but explicitly allows the
associated redistributor, distributor, and ITS to remain powered while the
PE and CPU interface are off. All other GIC power-management behavior is
IMPLEMENTATION DEFINED. DEN0050D Chapter 4.2, "Generic Interrupt
Controller (GIC)", says the GICv3 redistributor may live either in the AP
core power domain or in a relatively always-on parent domain. So after
CPU_OFF -> CPU_ON a secondary CPU can legitimately come back to a live
redistributor with GICR_CTLR.EnableLPIs still set.
Handle that case in the LPI setup path instead of assuming a fully reset
redistributor.
The LPI path needs special care because the GIC spec makes redistributor
LPI state sticky and partially implementation defined. IHI0069H.b 5.1.1
and 5.1.2 say that changing GICR_PROPBASER or GICR_PENDBASER while
GICR_CTLR.EnableLPIs == 1 is UNPREDICTABLE. After clearing EnableLPIs,
software must wait for GICR_CTLR.RWP == 0 before touching the pending
table. The architecture also permits implementations where, once
EnableLPIs has been set, clearing it again is not guaranteed to work.
Where an ITS is present, the spec strongly recommends moving LPIs to
another redistributor before clearing EnableLPIs.
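The clear-then-wait sequence above can be sketched against a hypothetical software model of GICR_CTLR (EnableLPIs is bit 0, RWP is bit 3). The model can be configured to ignore the clear, as the architecture permits. Unlike gicv3_do_wait_for_rwp(), which waits against a time deadline, this sketch bounds the wait by a poll count, an assumption made to keep it self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define GICR_CTLR_ENABLE_LPIS (1u << 0)
#define GICR_CTLR_RWP         (1u << 3)

/* Hypothetical redistributor model; not a real device interface. */
struct fake_rdist {
    uint32_t ctlr;            /* architectural GICR_CTLR bits */
    bool clear_supported;     /* false: clearing EnableLPIs has no effect */
    unsigned int rwp_latency; /* reads reporting RWP == 1 after a write */
    unsigned int rwp_busy;    /* remaining reads that report RWP == 1 */
};

static uint32_t rd_read_ctlr(struct fake_rdist *r)
{
    uint32_t val = r->ctlr;

    if ( r->rwp_busy )
    {
        r->rwp_busy--;
        val |= GICR_CTLR_RWP;
    }
    return val;
}

static void rd_write_ctlr(struct fake_rdist *r, uint32_t val)
{
    /* Implementations may refuse to clear EnableLPIs once it has been set. */
    if ( !r->clear_supported && (r->ctlr & GICR_CTLR_ENABLE_LPIS) )
        val |= GICR_CTLR_ENABLE_LPIS;
    r->ctlr = val & ~GICR_CTLR_RWP;
    r->rwp_busy = r->rwp_latency;
}

/* Clear EnableLPIs, wait for RWP == 0, then confirm the bit really cleared. */
static int model_disable_lpis(struct fake_rdist *r, unsigned int max_polls)
{
    uint32_t reg = rd_read_ctlr(r);
    unsigned int polls = 0;

    if ( !(reg & GICR_CTLR_ENABLE_LPIS) )
        return 0;

    rd_write_ctlr(r, reg & ~GICR_CTLR_ENABLE_LPIS);

    while ( rd_read_ctlr(r) & GICR_CTLR_RWP )
        if ( ++polls >= max_polls )
            return -ETIMEDOUT;

    if ( rd_read_ctlr(r) & GICR_CTLR_ENABLE_LPIS )
        return -EBUSY; /* still set: the tables must not be reprogrammed */

    return 0;
}
```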
Because of that, treat a retained EnableLPIs state as valid when the
redistributor still points at Xen's expected PROPBASER/PENDBASER tables;
in that case gicv3_lpi_init_rdist() returns -EBUSY, which
gicv3_populate_rdist() now treats as non-fatal. Only try to clear
EnableLPIs when the retained configuration does not match Xen's state,
and wait for RWP before reprogramming the tables.
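The resulting decision can be summarised as a pure function over values read back from the redistributor. The enum and function names below are hypothetical. In the patch, the "keep" outcome corresponds to gicv3_lpi_init_rdist() returning -EBUSY, and the "clear" outcome goes through gicv3_lpi_disable_lpis(). The address masks follow GICR_PROPBASER_XEN_MASK and GICR_PENDBASER_XEN_MASK.

```c
#include <assert.h>
#include <stdint.h>

#define GICR_CTLR_ENABLE_LPIS (1u << 0)

/* Bits [hi:lo] set; PROPBASER holds the address in [51:12], PENDBASER in [51:16]. */
#define ADDR_MASK(hi, lo)   ((~0ULL >> (63 - (hi))) & (~0ULL << (lo)))
#define PROPBASER_ADDR_MASK ADDR_MASK(51, 12)
#define PENDBASER_ADDR_MASK ADDR_MASK(51, 16)

enum lpi_init_action {
    LPI_PROGRAM_TABLES,     /* EnableLPIs clear: program tables as at cold boot */
    LPI_KEEP_RETAINED,      /* EnableLPIs set, tables are Xen's: keep them */
    LPI_CLEAR_THEN_PROGRAM, /* EnableLPIs set, foreign tables: try to clear */
};

static enum lpi_init_action choose_lpi_action(uint32_t ctlr,
                                              uint64_t propbaser,
                                              uint64_t pendbaser,
                                              uint64_t xen_prop_pa,
                                              uint64_t xen_pend_pa)
{
    if ( !(ctlr & GICR_CTLR_ENABLE_LPIS) )
        return LPI_PROGRAM_TABLES;

    /* Compare addresses only: attribute bits (e.g. PTZ) may not read back. */
    if ( (propbaser & PROPBASER_ADDR_MASK) ==
             (xen_prop_pa & PROPBASER_ADDR_MASK) &&
         (pendbaser & PENDBASER_ADDR_MASK) ==
             (xen_pend_pa & PENDBASER_ADDR_MASK) )
        return LPI_KEEP_RETAINED;

    return LPI_CLEAR_THEN_PROGRAM;
}
```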
This is also consistent with platform firmware reality: PSCI and the GIC
architecture allow platform-specific redistributor power handling, and not
all TF-A platforms force a full redistributor power-off through
implementation-defined controls during CPU_OFF. Xen therefore needs to
tolerate retained redistributor state on secondary CPU bring-up.
Tested using Xen's non-boot CPU disable/enable path on Arm
FVP_Base_RevC-2xAEMvA, both with and without the model options
    -C gic_distributor.allow-LPIEN-clear=1
    -C gic_distributor.GICR-clear-enable-supported=1
and on an Orange Pi 5.
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
xen/arch/arm/gic-v3-lpi.c | 77 ++++++++++++++++++++++++++-
xen/arch/arm/gic-v3.c | 15 ++++--
xen/arch/arm/include/asm/gic_v3_its.h | 1 +
3 files changed, 87 insertions(+), 6 deletions(-)
diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index de5052e5cf..125f51e61b 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -81,6 +81,13 @@ static DEFINE_PER_CPU(struct lpi_redist_data, lpi_redist);
#define MAX_NR_HOST_LPIS (lpi_data.max_host_lpi_ids - LPI_OFFSET)
#define HOST_LPIS_PER_PAGE (PAGE_SIZE / sizeof(union host_lpi))
+#define GICR_PROPBASER_XEN_MASK GENMASK_ULL(51, 12)
+/*
+ * For retained redistributor state, match the pending table by address only.
+ * Attribute bits such as PTZ may not read back with the programmed value.
+ */
+#define GICR_PENDBASER_XEN_MASK GENMASK_ULL(51, 16)
+
static union host_lpi *gic_get_host_lpi(uint32_t plpi)
{
union host_lpi *block;
@@ -296,6 +303,60 @@ static int gicv3_lpi_set_pendtable(void __iomem *rdist_base)
return 0;
}
+static uint64_t gicv3_lpi_expected_proptable(void)
+{
+ return virt_to_maddr(lpi_data.lpi_property);
+}
+
+static uint64_t gicv3_lpi_expected_pendtable(void)
+{
+ return virt_to_maddr(this_cpu(lpi_redist).pending_table);
+}
+
+static bool gicv3_lpi_tables_match(void __iomem *rdist_base)
+{
+ uint64_t propbase, pendbase;
+
+ if ( !lpi_data.lpi_property || !this_cpu(lpi_redist).pending_table )
+ return false;
+
+ propbase = readq_relaxed(rdist_base + GICR_PROPBASER);
+ pendbase = readq_relaxed(rdist_base + GICR_PENDBASER);
+
+ return ((propbase & GICR_PROPBASER_XEN_MASK) ==
+ (gicv3_lpi_expected_proptable() & GICR_PROPBASER_XEN_MASK)) &&
+ ((pendbase & GICR_PENDBASER_XEN_MASK) ==
+ (gicv3_lpi_expected_pendtable() & GICR_PENDBASER_XEN_MASK));
+}
+
+static int gicv3_lpi_disable_lpis(void __iomem *rdist_base)
+{
+ uint32_t reg = readl_relaxed(rdist_base + GICR_CTLR);
+ int ret;
+
+ if ( !(reg & GICR_CTLR_ENABLE_LPIS) )
+ return 0;
+
+ writel_relaxed(reg & ~GICR_CTLR_ENABLE_LPIS, rdist_base + GICR_CTLR);
+
+ /*
+ * The spec only guarantees programmability when we have observed the bit
+ * cleared. Where clearing is supported, RWP must reach 0 before touching
+ * PROPBASER/PENDBASER again.
+ */
+ wmb();
+
+ ret = gicv3_do_wait_for_rwp(rdist_base);
+ if ( ret )
+ return ret;
+
+ reg = readl_relaxed(rdist_base + GICR_CTLR);
+ if ( reg & GICR_CTLR_ENABLE_LPIS )
+ return -EBUSY;
+
+ return 0;
+}
+
/*
* Tell a redistributor about the (shared) property table, allocating one
* if not already done.
@@ -373,7 +434,21 @@ int gicv3_lpi_init_rdist(void __iomem * rdist_base)
/* Make sure LPIs are disabled before setting up the tables. */
reg = readl_relaxed(rdist_base + GICR_CTLR);
if ( reg & GICR_CTLR_ENABLE_LPIS )
- return -EBUSY;
+ {
+ if ( gicv3_lpi_tables_match(rdist_base) )
+ return -EBUSY;
+
+ ret = gicv3_lpi_disable_lpis(rdist_base);
+ if ( ret == -EBUSY )
+ {
+ printk(XENLOG_ERR
+ "GICv3: CPU%d: LPIs still enabled with unexpected redistributor tables\n",
+ smp_processor_id());
+ return -EINVAL;
+ }
+ if ( ret )
+ return ret;
+ }
ret = gicv3_lpi_set_pendtable(rdist_base);
if ( ret )
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index bc07f97c16..34fb065afc 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -274,8 +274,8 @@ static void gicv3_enable_sre(void)
isb();
}
-/* Wait for completion of a distributor change */
-static void gicv3_do_wait_for_rwp(void __iomem *base)
+/* Wait for completion of a distributor/redistributor write-pending change. */
+int gicv3_do_wait_for_rwp(void __iomem *base)
{
uint32_t val;
bool timeout = false;
@@ -295,17 +295,22 @@ static void gicv3_do_wait_for_rwp(void __iomem *base)
} while ( 1 );
if ( timeout )
+ {
dprintk(XENLOG_ERR, "RWP timeout\n");
+ return -ETIMEDOUT;
+ }
+
+ return 0;
}
static void gicv3_dist_wait_for_rwp(void)
{
- gicv3_do_wait_for_rwp(GICD);
+ (void)gicv3_do_wait_for_rwp(GICD);
}
static void gicv3_redist_wait_for_rwp(void)
{
- gicv3_do_wait_for_rwp(GICD_RDIST_BASE);
+ (void)gicv3_do_wait_for_rwp(GICD_RDIST_BASE);
}
static void gicv3_wait_for_rwp(int irq)
@@ -925,7 +930,7 @@ static int __init gicv3_populate_rdist(void)
gicv3_set_redist_address(rdist_addr, procnum);
ret = gicv3_lpi_init_rdist(ptr);
- if ( ret && ret != -ENODEV )
+ if ( ret && ret != -ENODEV && ret != -EBUSY )
{
printk("GICv3: CPU%d: Cannot initialize LPIs: %u\n",
smp_processor_id(), ret);
diff --git a/xen/arch/arm/include/asm/gic_v3_its.h b/xen/arch/arm/include/asm/gic_v3_its.h
index fc5a84892c..081bd19180 100644
--- a/xen/arch/arm/include/asm/gic_v3_its.h
+++ b/xen/arch/arm/include/asm/gic_v3_its.h
@@ -133,6 +133,7 @@ struct host_its {
/* Map a collection for this host CPU to each host ITS. */
int gicv3_its_setup_collection(unsigned int cpu);
+int gicv3_do_wait_for_rwp(void __iomem *base);
#ifdef CONFIG_HAS_ITS
--
2.43.0
* [PATCH v8 04/13] xen/arm: gic-v3: Implement GICv3 suspend/resume functions
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (2 preceding siblings ...)
2026-04-02 10:45 ` [PATCH v8 03/13] xen/arm: gic-v3: tolerate retained redistributor LPI state across CPU_OFF Mykola Kvach
@ 2026-04-02 10:45 ` Mykola Kvach
2026-04-23 11:28 ` Luca Fancellu
2026-04-02 10:45 ` [PATCH v8 05/13] xen/arm: gic-v3: add ITS suspend/resume support Mykola Kvach
` (10 subsequent siblings)
14 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mykola Kvach <mykola_kvach@epam.com>
System suspend may power down the GIC, so Xen must save the GIC context
on suspend and restore it on resume. The saved context covers only the
registers controlled by the hypervisor; GIC registers accessible to
guests are already saved and restored on vCPU context switch.
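The patch stores one struct dist_irq_block per 32 SPIs (or eSPIs). The sketch below, written for this note with hypothetical helper names, shows how the per-block strides used by gicv3_store_spi_irq_block() map onto the GICv3 register map. IROUTER is 64 bits per interrupt, so it advances by 256 bytes per block, while IPRIORITYR and ICFGR advance by 32 and 8 bytes. The offsets are taken from the GICv3 architecture specification and should be checked against gic_v3_defs.h before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* GICv3 distributor offsets: SPI range and extended-SPI (nE) range. */
#define GICD_IPRIORITYR   0x0400
#define GICD_ICFGR        0x0C00
#define GICD_IROUTER      0x6000
#define GICD_IPRIORITYRnE 0x2000
#define GICD_ICFGRnE      0x3000
#define GICD_IROUTERnE    0x8000

/* Same token-pasting idea as the patch's GET_SPI_REG_OFFSET(). */
#define SPI_REG(name, is_espi) ((is_espi) ? GICD_##name##nE : GICD_##name)

/* IROUTER entry k (0..31) of 32-IRQ block i: 8 bytes per interrupt. */
static size_t irouter_off(unsigned int i, unsigned int k, bool is_espi)
{
    return SPI_REG(IROUTER, is_espi) + i * 32 * sizeof(uint64_t) + 8 * k;
}

/* IPRIORITYR word j (0..7) of block i: 1 byte per interrupt. */
static size_t ipriorityr_off(unsigned int i, unsigned int j, bool is_espi)
{
    return SPI_REG(IPRIORITYR, is_espi) + i * 32 + 4 * j;
}

/* ICFGR word j (0..1) of block i: 2 bits per interrupt. */
static size_t icfgr_off(unsigned int i, unsigned int j, bool is_espi)
{
    return SPI_REG(ICFGR, is_espi) + i * 8 + 4 * j;
}
```

For example, INTID 32 (block 1, entry 0) routes through GICD_IROUTER + 0x100, and eSPI index 5 through GICD_IROUTERnE + 0x28.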
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V8:
- Use the correct redistributor base for GICR_PROPBASER, GICR_PENDBASER and GICR_CTLR
Changes in V7:
- restore LPI regs on resume
- add timeout during redist disabling
- squash with suspend/resume handling for GICv3 eSPI registers
- drop ITS guard paths so suspend/resume always runs; switch missing ctx
allocation to panic
- trim TODO comments; narrow redistributor storage to PPI icfgr
- keep distributor context allocation even without ITS; adjust resume
to use GENMASK(31, 0) for clearing enables
- drop storage of the SGI configuration register, as SGIs are always
edge-triggered
---
xen/arch/arm/gic-v3-lpi.c | 3 +
xen/arch/arm/gic-v3.c | 321 ++++++++++++++++++++++++-
xen/arch/arm/include/asm/gic_v3_defs.h | 1 +
3 files changed, 322 insertions(+), 3 deletions(-)
diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index 125f51e61b..01120aeed9 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -466,6 +466,9 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
switch ( action )
{
case CPU_UP_PREPARE:
+ if ( system_state == SYS_STATE_resume )
+ break;
+
rc = gicv3_lpi_allocate_pendtable(cpu);
if ( rc )
printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%lu\n",
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index 34fb065afc..d182a71478 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1072,12 +1072,12 @@ out:
return res;
}
-static void gicv3_hyp_disable(void)
+static void gicv3_hyp_enable(bool enable)
{
register_t hcr;
hcr = READ_SYSREG(ICH_HCR_EL2);
- hcr &= ~GICH_HCR_EN;
+ hcr = enable ? (hcr | GICH_HCR_EN) : (hcr & ~GICH_HCR_EN);
WRITE_SYSREG(hcr, ICH_HCR_EL2);
isb();
}
@@ -1184,7 +1184,7 @@ static void gicv3_disable_interface(void)
spin_lock(&gicv3.lock);
gicv3_cpu_disable();
- gicv3_hyp_disable();
+ gicv3_hyp_enable(false);
spin_unlock(&gicv3.lock);
}
@@ -1920,6 +1920,313 @@ static bool gic_dist_supports_lpis(void)
return (readl_relaxed(GICD + GICD_TYPER) & GICD_TYPE_LPIS);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+/* This struct represents a block of 32 IRQs */
+struct dist_irq_block {
+ uint32_t icfgr[2];
+ uint32_t ipriorityr[8];
+ uint64_t irouter[32];
+ uint32_t isactiver;
+ uint32_t isenabler;
+};
+
+struct redist_ctx {
+ uint32_t ctlr;
+ uint32_t icfgr; /* only PPIs stored */
+ uint32_t igroupr;
+ uint32_t ipriorityr[8];
+ uint32_t isactiver;
+ uint32_t isenabler;
+
+ uint64_t pendbase;
+ uint64_t propbase;
+};
+
+/* GICv3 registers to be saved/restored on system suspend/resume */
+struct gicv3_ctx {
+ struct dist_ctx {
+ uint32_t ctlr;
+ struct dist_irq_block *irqs, *espi_irqs;
+ } dist;
+
+ /* Only one rdist context: the last CPU still running at suspend */
+ struct redist_ctx rdist;
+
+ struct cpu_ctx {
+ uint32_t ctlr;
+ uint32_t pmr;
+ uint32_t bpr;
+ uint32_t sre_el2;
+ uint32_t grpen;
+ } cpu;
+};
+
+static struct gicv3_ctx gicv3_ctx;
+
+static void __init gicv3_alloc_context(void)
+{
+ uint32_t blocks = DIV_ROUND_UP(gicv3_info.nr_lines, 32);
+
+ /* The spec allows for systems without any SPIs */
+ if ( blocks > 1 )
+ {
+ gicv3_ctx.dist.irqs = xzalloc_array(struct dist_irq_block, blocks - 1);
+ if ( !gicv3_ctx.dist.irqs )
+ panic("Failed to allocate memory for GICv3 suspend context\n");
+ }
+
+#ifdef CONFIG_GICV3_ESPI
+ if ( !gic_number_espis() )
+ return;
+
+ blocks = gic_number_espis() / 32;
+ gicv3_ctx.dist.espi_irqs = xzalloc_array(struct dist_irq_block, blocks);
+ if ( !gicv3_ctx.dist.espi_irqs )
+ panic("Failed to allocate memory for GICv3 eSPI suspend context\n");
+#endif
+}
+
+static int gicv3_disable_redist(void)
+{
+ void __iomem *waker = GICD_RDIST_BASE + GICR_WAKER;
+ s_time_t deadline;
+
+ /*
+ * Avoid an infinite loop if Non-secure software does not have access to
+ * GICR_WAKER. See Arm IHI 0069H.b, 12.11.42 GICR_WAKER:
+ * When GICD_CTLR.DS == 0 and an access is Non-secure, accesses to this
+ * register are RAZ/WI.
+ */
+ if ( !(readl_relaxed(GICD + GICD_CTLR) & GICD_CTLR_DS) )
+ return 0;
+
+ deadline = NOW() + MILLISECS(1000);
+
+ writel_relaxed(readl_relaxed(waker) | GICR_WAKER_ProcessorSleep, waker);
+ while ( (readl_relaxed(waker) & GICR_WAKER_ChildrenAsleep) == 0 )
+ {
+ if ( NOW() > deadline )
+ {
+ printk("GICv3: Timeout waiting for redistributor to sleep\n");
+ return -ETIMEDOUT;
+ }
+ cpu_relax();
+ udelay(10);
+ }
+
+ return 0;
+}
+
+#define GET_SPI_REG_OFFSET(name, is_espi) \
+ ((is_espi) ? GICD_##name##nE : GICD_##name)
+
+static void gicv3_store_spi_irq_block(struct dist_irq_block *irqs,
+ unsigned int i, bool is_espi)
+{
+ void __iomem *base;
+ unsigned int irq;
+
+ base = GICD + GET_SPI_REG_OFFSET(ICFGR, is_espi) + i * sizeof(irqs->icfgr);
+ irqs->icfgr[0] = readl_relaxed(base);
+ irqs->icfgr[1] = readl_relaxed(base + 4);
+
+ base = GICD + GET_SPI_REG_OFFSET(IPRIORITYR, is_espi);
+ base += i * sizeof(irqs->ipriorityr);
+ for ( irq = 0; irq < ARRAY_SIZE(irqs->ipriorityr); irq++ )
+ irqs->ipriorityr[irq] = readl_relaxed(base + 4 * irq);
+
+ base = GICD + GET_SPI_REG_OFFSET(IROUTER, is_espi);
+ base += i * sizeof(irqs->irouter);
+ for ( irq = 0; irq < ARRAY_SIZE(irqs->irouter); irq++ )
+ irqs->irouter[irq] = readq_relaxed_non_atomic(base + 8 * irq);
+
+ base = GICD + GET_SPI_REG_OFFSET(ISACTIVER, is_espi);
+ base += i * sizeof(irqs->isactiver);
+ irqs->isactiver = readl_relaxed(base);
+
+ base = GICD + GET_SPI_REG_OFFSET(ISENABLER, is_espi);
+ base += i * sizeof(irqs->isenabler);
+ irqs->isenabler = readl_relaxed(base);
+}
+
+static void gicv3_restore_spi_irq_block(struct dist_irq_block *irqs,
+ unsigned int i, bool is_espi)
+{
+ void __iomem *base;
+ unsigned int irq;
+
+ base = GICD + GET_SPI_REG_OFFSET(ICFGR, is_espi) + i * sizeof(irqs->icfgr);
+ writel_relaxed(irqs->icfgr[0], base);
+ writel_relaxed(irqs->icfgr[1], base + 4);
+
+ base = GICD + GET_SPI_REG_OFFSET(IPRIORITYR, is_espi);
+ base += i * sizeof(irqs->ipriorityr);
+ for ( irq = 0; irq < ARRAY_SIZE(irqs->ipriorityr); irq++ )
+ writel_relaxed(irqs->ipriorityr[irq], base + 4 * irq);
+
+ base = GICD + GET_SPI_REG_OFFSET(IROUTER, is_espi);
+ base += i * sizeof(irqs->irouter);
+ for ( irq = 0; irq < ARRAY_SIZE(irqs->irouter); irq++ )
+ writeq_relaxed_non_atomic(irqs->irouter[irq], base + 8 * irq);
+
+ base = GICD + GET_SPI_REG_OFFSET(ICENABLER, is_espi) + i * 4;
+ writel_relaxed(GENMASK(31, 0), base);
+
+ base = GICD + GET_SPI_REG_OFFSET(ISENABLER, is_espi);
+ base += i * sizeof(irqs->isenabler);
+ writel_relaxed(irqs->isenabler, base);
+
+ base = GICD + GET_SPI_REG_OFFSET(ICACTIVER, is_espi) + i * 4;
+ writel_relaxed(GENMASK(31, 0), base);
+
+ base = GICD + GET_SPI_REG_OFFSET(ISACTIVER, is_espi);
+ base += i * sizeof(irqs->isactiver);
+ writel_relaxed(irqs->isactiver, base);
+}
+
+static int gicv3_suspend(void)
+{
+ unsigned int i;
+ void __iomem *base;
+ int ret;
+ struct redist_ctx *rdist = &gicv3_ctx.rdist;
+
+ /* Save GICC configuration */
+ gicv3_ctx.cpu.ctlr = READ_SYSREG(ICC_CTLR_EL1);
+ gicv3_ctx.cpu.pmr = READ_SYSREG(ICC_PMR_EL1);
+ gicv3_ctx.cpu.bpr = READ_SYSREG(ICC_BPR1_EL1);
+ gicv3_ctx.cpu.sre_el2 = READ_SYSREG(ICC_SRE_EL2);
+ gicv3_ctx.cpu.grpen = READ_SYSREG(ICC_IGRPEN1_EL1);
+
+ gicv3_disable_interface();
+
+ ret = gicv3_disable_redist();
+ if ( ret )
+ goto out_enable_iface;
+
+ /* Save GICR configuration */
+ gicv3_redist_wait_for_rwp();
+
+ base = GICD_RDIST_BASE;
+
+ rdist->ctlr = readl_relaxed(base + GICR_CTLR);
+
+ rdist->propbase = readq_relaxed(base + GICR_PROPBASER);
+ rdist->pendbase = readq_relaxed(base + GICR_PENDBASER);
+
+ base = GICD_RDIST_SGI_BASE;
+
+ /* Save priority on PPI and SGI interrupts */
+ for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
+ rdist->ipriorityr[i] = readl_relaxed(base + GICR_IPRIORITYR0 + 4 * i);
+
+ rdist->isactiver = readl_relaxed(base + GICR_ISACTIVER0);
+ rdist->isenabler = readl_relaxed(base + GICR_ISENABLER0);
+ rdist->igroupr = readl_relaxed(base + GICR_IGROUPR0);
+ rdist->icfgr = readl_relaxed(base + GICR_ICFGR1);
+
+ /* Save GICD configuration */
+ gicv3_dist_wait_for_rwp();
+ gicv3_ctx.dist.ctlr = readl_relaxed(GICD + GICD_CTLR);
+
+ for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
+ gicv3_store_spi_irq_block(gicv3_ctx.dist.irqs + i - 1, i, false);
+
+#ifdef CONFIG_GICV3_ESPI
+ for ( i = 0; i < gic_number_espis() / 32; i++ )
+ gicv3_store_spi_irq_block(gicv3_ctx.dist.espi_irqs + i, i, true);
+#endif
+
+ return 0;
+
+ out_enable_iface:
+ gicv3_hyp_enable(true);
+ WRITE_SYSREG(gicv3_ctx.cpu.ctlr, ICC_CTLR_EL1);
+ isb();
+
+ return ret;
+}
+
+static void gicv3_resume(void)
+{
+ int ret;
+ unsigned int i;
+ void __iomem *base;
+ struct redist_ctx *rdist = &gicv3_ctx.rdist;
+
+ writel_relaxed(0, GICD + GICD_CTLR);
+
+ for ( i = NR_GIC_LOCAL_IRQS; i < gicv3_info.nr_lines; i += 32 )
+ writel_relaxed(GENMASK(31, 0), GICD + GICD_IGROUPR + (i / 32) * 4);
+
+ for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
+ gicv3_restore_spi_irq_block(gicv3_ctx.dist.irqs + i - 1, i, false);
+
+#ifdef CONFIG_GICV3_ESPI
+ for ( i = 0; i < gic_number_espis() / 32; i++ )
+ gicv3_restore_spi_irq_block(gicv3_ctx.dist.espi_irqs + i, i, true);
+#endif
+
+ writel_relaxed(gicv3_ctx.dist.ctlr, GICD + GICD_CTLR);
+ gicv3_dist_wait_for_rwp();
+
+ ret = gicv3_lpi_init_rdist(GICD_RDIST_BASE);
+ /*
+ * If LPIs are already enabled, assume firmware or the still-powered
+ * redistributor has valid PROPBASER/PENDBASER and skip reprogramming.
+ * gicv3_lpi_init_rdist() reports that case as -EBUSY, which is tolerated.
+ */
+ if ( ret && ret != -ENODEV && ret != -EBUSY )
+ panic("GICv3: Failed to re-initialize LPIs during resume\n");
+ else if ( ret == -EBUSY ) /* Check the preserved tables match the saved ones */
+ {
+ base = GICD_RDIST_BASE;
+ if ( readq_relaxed(base + GICR_PROPBASER) != rdist->propbase ||
+ readq_relaxed(base + GICR_PENDBASER) != rdist->pendbase )
+ {
+ panic("GICv3: LPIs already enabled with unexpected PROPBASER/PENDBASER during resume\n");
+ }
+ }
+
+ /* Restore GICR (Redistributor) configuration */
+ if ( gicv3_enable_redist() )
+ panic("GICv3: Failed to re-enable redistributor during resume\n");
+
+ base = GICD_RDIST_SGI_BASE;
+
+ writel_relaxed(GENMASK(31, 0), base + GICR_ICENABLER0);
+ gicv3_redist_wait_for_rwp();
+
+ for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
+ writel_relaxed(rdist->ipriorityr[i], base + GICR_IPRIORITYR0 + i * 4);
+
+ writel_relaxed(rdist->isactiver, base + GICR_ISACTIVER0);
+ writel_relaxed(rdist->igroupr, base + GICR_IGROUPR0);
+ writel_relaxed(rdist->icfgr, base + GICR_ICFGR1);
+
+ gicv3_redist_wait_for_rwp();
+
+ writel_relaxed(rdist->isenabler, base + GICR_ISENABLER0);
+ writel_relaxed(rdist->ctlr, GICD_RDIST_BASE + GICR_CTLR);
+
+ gicv3_redist_wait_for_rwp();
+
+ WRITE_SYSREG(gicv3_ctx.cpu.sre_el2, ICC_SRE_EL2);
+ isb();
+
+ /* Restore CPU interface (System registers) */
+ WRITE_SYSREG(gicv3_ctx.cpu.pmr, ICC_PMR_EL1);
+ WRITE_SYSREG(gicv3_ctx.cpu.bpr, ICC_BPR1_EL1);
+ WRITE_SYSREG(gicv3_ctx.cpu.ctlr, ICC_CTLR_EL1);
+ WRITE_SYSREG(gicv3_ctx.cpu.grpen, ICC_IGRPEN1_EL1);
+ isb();
+
+ gicv3_hyp_init();
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
/* Set up the GIC */
static int __init gicv3_init(void)
{
@@ -1994,6 +2301,10 @@ static int __init gicv3_init(void)
gicv3_hyp_init();
+#ifdef CONFIG_SYSTEM_SUSPEND
+ gicv3_alloc_context();
+#endif
+
out:
spin_unlock(&gicv3.lock);
@@ -2033,6 +2344,10 @@ static const struct gic_hw_operations gicv3_ops = {
#endif
.iomem_deny_access = gicv3_iomem_deny_access,
.do_LPI = gicv3_do_LPI,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = gicv3_suspend,
+ .resume = gicv3_resume,
+#endif
};
static int __init gicv3_dt_preinit(struct dt_device_node *node, const void *data)
diff --git a/xen/arch/arm/include/asm/gic_v3_defs.h b/xen/arch/arm/include/asm/gic_v3_defs.h
index c373b94d19..992c8f9c2f 100644
--- a/xen/arch/arm/include/asm/gic_v3_defs.h
+++ b/xen/arch/arm/include/asm/gic_v3_defs.h
@@ -94,6 +94,7 @@
#define GICD_TYPE_LPIS (1U << 17)
#define GICD_CTLR_RWP (1UL << 31)
+#define GICD_CTLR_DS (1U << 6)
#define GICD_CTLR_ARE_NS (1U << 4)
#define GICD_CTLR_ENABLE_G1A (1U << 1)
#define GICD_CTLR_ENABLE_G1 (1U << 0)
--
2.43.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
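The `i * sizeof(irqs->field)` offsets in gicv3_store_spi_irq_block() and gicv3_restore_spi_irq_block() work because each field of struct dist_irq_block is exactly as wide as the register span covering 32 interrupts: 2 bits per IRQ for ICFGR, 8 bits for IPRIORITYR, 64 bits for IROUTER and 1 bit for ISACTIVER/ISENABLER. The SPI loops start at i = 1 because block 0 (SGIs and PPIs) is banked in the redistributor and saved separately in struct redist_ctx. The standalone sketch below copies the struct layout and checks that arithmetic; `block_offset()`, `offsets_match()` and `FIELD_SIZE()` are hypothetical helpers, and the register base offsets are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy of the layout used by the patch: one block covers 32 interrupts. */
struct dist_irq_block {
    uint32_t icfgr[2];      /* 2 config bits per IRQ   -> 8 bytes   */
    uint32_t ipriorityr[8]; /* 8 priority bits per IRQ -> 32 bytes  */
    uint64_t irouter[32];   /* one 64-bit route per IRQ -> 256 bytes */
    uint32_t isactiver;     /* 1 active bit per IRQ    -> 4 bytes   */
    uint32_t isenabler;     /* 1 enable bit per IRQ    -> 4 bytes   */
};

#define IRQS_PER_BLOCK 32u
#define FIELD_SIZE(f) sizeof(((struct dist_irq_block *)0)->f)

_Static_assert(FIELD_SIZE(icfgr) == 8, "ICFGR block size");
_Static_assert(FIELD_SIZE(irouter) == 256, "IROUTER block size");

/* Byte offset of block i in a register array using bits_per_irq per IRQ. */
static size_t block_offset(unsigned int i, unsigned int bits_per_irq)
{
    return (size_t)i * IRQS_PER_BLOCK * bits_per_irq / 8;
}

/* True if i * sizeof(field) matches the architectural stride for block i. */
static int offsets_match(unsigned int i)
{
    return i * FIELD_SIZE(icfgr) == block_offset(i, 2) &&
           i * FIELD_SIZE(ipriorityr) == block_offset(i, 8) &&
           i * FIELD_SIZE(irouter) == block_offset(i, 64) &&
           i * FIELD_SIZE(isactiver) == block_offset(i, 1) &&
           i * FIELD_SIZE(isenabler) == block_offset(i, 1);
}
```

The ICENABER/ICACTIVER writes in the restore path use `i * 4` directly, which is the same 1-bit-per-IRQ stride.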
* [PATCH v8 05/13] xen/arm: gic-v3: add ITS suspend/resume support
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (3 preceding siblings ...)
2026-04-02 10:45 ` [PATCH v8 04/13] xen/arm: gic-v3: Implement GICv3 suspend/resume functions Mykola Kvach
@ 2026-04-02 10:45 ` Mykola Kvach
2026-04-24 10:53 ` Luca Fancellu
2026-04-02 10:45 ` [PATCH v8 06/13] xen/arm: tee: keep init_tee_secondary() for hotplug and resume Mykola Kvach
` (9 subsequent siblings)
14 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Jan Beulich, Roger Pau Monné
From: Mykola Kvach <mykola_kvach@epam.com>
Handle system suspend/resume for GICv3 with an ITS present so LPIs keep
working after firmware powers the GIC down. Snapshot the CPU interface,
distributor and last-CPU redistributor state, disable the ITS to cache its
CTLR/CBASER/BASER registers, then restore everything and re-arm the
collection on resume.
Add list_for_each_entry_continue_reverse() in list.h for the ITS suspend
error path that needs to roll back partially saved state.
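The rollback in gicv3_its_suspend() must undo only the ITS instances already disabled, newest first, and must skip the one that failed. The sketch below uses a cut-down circular list modelled on Xen's `struct list_head` (the `list_entry`/`list_add_tail` definitions and `struct fake_its` are simplified, hypothetical stand-ins, not the real headers) to show that the new macro starts at the entry before the failing one and stops at the list head.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_entry(pos, head, member)                              \
    for ((pos) = list_entry((head)->next, __typeof__(*(pos)), member);      \
         &(pos)->member != (head);                                          \
         (pos) = list_entry((pos)->member.next, __typeof__(*(pos)), member))

#define list_for_each_entry_continue_reverse(pos, head, member)             \
    for ((pos) = list_entry((pos)->member.prev, __typeof__(*(pos)), member); \
         &(pos)->member != (head);                                          \
         (pos) = list_entry((pos)->member.prev, __typeof__(*(pos)), member))

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Hypothetical stand-in for struct host_its. */
struct fake_its {
    int id;
    int enabled;
    struct list_head entry;
};

/* Disable each ITS in order; if fail_id cannot be disabled, re-enable the
 * ones already disabled (newest first) and record the rollback order. */
static int suspend_all(struct list_head *head, int fail_id,
                       int *undo, int *nundo)
{
    struct fake_its *its;

    *nundo = 0;
    list_for_each_entry(its, head, entry)
    {
        if ( its->id == fail_id )
            goto err;            /* this instance was never disabled */
        its->enabled = 0;
    }
    return 0;

 err:
    list_for_each_entry_continue_reverse(its, head, entry)
    {
        its->enabled = 1;
        undo[(*nundo)++] = its->id;
    }
    return -1;
}
```

If the very first ITS fails, the reverse walk starts at the head and runs zero iterations, which is the desired behaviour.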
Based on Linux commit dba0bc7b76dc ("irqchip/gic-v3-its: Add ability to save/restore ITS state")
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V8:
- Reword the CBASER/CWRITER comment to match Xen and drop the stale Linux
cmd_write reference.
- Clarify the list_for_each_entry_continue_reverse() comment.
- Factor out per-ITS helpers for collection setup and resume.
- Restore each ITS and re-establish its collection mapping in the same
loop, so a failed ITS resume is not followed by MAPC/SYNC on that
un-restored instance.
- Panic if resuming an ITS fails.
- Clear the BASER cache during suspend.
---
xen/arch/arm/gic-v3-its.c | 126 ++++++++++++++++++++++++--
xen/arch/arm/gic-v3.c | 15 ++-
xen/arch/arm/include/asm/gic_v3_its.h | 23 +++++
xen/include/xen/list.h | 14 +++
4 files changed, 166 insertions(+), 12 deletions(-)
diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
index 9ba068c46f..fe2865eac9 100644
--- a/xen/arch/arm/gic-v3-its.c
+++ b/xen/arch/arm/gic-v3-its.c
@@ -335,6 +335,22 @@ static int its_send_cmd_inv(struct host_its *its,
return its_send_command(its, cmd);
}
+static int gicv3_its_setup_collection_single(struct host_its *its,
+ unsigned int cpu)
+{
+ int ret;
+
+ ret = its_send_cmd_mapc(its, cpu, cpu);
+ if ( ret )
+ return ret;
+
+ ret = its_send_cmd_sync(its, cpu);
+ if ( ret )
+ return ret;
+
+ return gicv3_its_wait_commands(its);
+}
+
/* Set up the (1:1) collection mapping for the given host CPU. */
int gicv3_its_setup_collection(unsigned int cpu)
{
@@ -343,15 +359,7 @@ int gicv3_its_setup_collection(unsigned int cpu)
list_for_each_entry(its, &host_its_list, entry)
{
- ret = its_send_cmd_mapc(its, cpu, cpu);
- if ( ret )
- return ret;
-
- ret = its_send_cmd_sync(its, cpu);
- if ( ret )
- return ret;
-
- ret = gicv3_its_wait_commands(its);
+ ret = gicv3_its_setup_collection_single(its, cpu);
if ( ret )
return ret;
}
@@ -1209,6 +1217,106 @@ int gicv3_its_init(void)
return 0;
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+int gicv3_its_suspend(void)
+{
+ struct host_its *its;
+ int ret;
+
+ list_for_each_entry(its, &host_its_list, entry)
+ {
+ unsigned int i;
+ void __iomem *base = its->its_base;
+
+ its->suspend_ctx.ctlr = readl_relaxed(base + GITS_CTLR);
+ ret = gicv3_disable_its(its);
+ if ( ret )
+ {
+ writel_relaxed(its->suspend_ctx.ctlr, base + GITS_CTLR);
+ goto err;
+ }
+
+ its->suspend_ctx.cbaser = readq_relaxed(base + GITS_CBASER);
+
+ for ( i = 0; i < GITS_BASER_NR_REGS; i++ )
+ {
+ uint64_t baser = readq_relaxed(base + GITS_BASER0 + i * 8);
+
+ its->suspend_ctx.baser[i] = 0;
+
+ if ( !(baser & GITS_VALID_BIT) )
+ continue;
+
+ its->suspend_ctx.baser[i] = baser;
+ }
+ }
+
+ return 0;
+
+ err:
+ list_for_each_entry_continue_reverse(its, &host_its_list, entry)
+ writel_relaxed(its->suspend_ctx.ctlr, its->its_base + GITS_CTLR);
+
+ return ret;
+}
+
+static int gicv3_its_resume_single(struct host_its *its, unsigned int cpu)
+{
+ void __iomem *base = its->its_base;
+ unsigned int i;
+ int ret;
+
+ /*
+ * Make sure that the ITS is disabled. If it fails to quiesce,
+ * don't restore it since writing to CBASER or BASER<n>
+ * registers is undefined according to the GIC v3 ITS
+ * Specification.
+ */
+ WARN_ON(readl_relaxed(base + GITS_CTLR) & GITS_CTLR_ENABLE);
+ ret = gicv3_disable_its(its);
+ if ( ret )
+ return ret;
+
+ writeq_relaxed(its->suspend_ctx.cbaser, base + GITS_CBASER);
+
+ /*
+ * Writing CBASER resets CREADR to 0, so reset CWRITER to
+ * keep the command queue pointers aligned.
+ */
+ writeq_relaxed(0, base + GITS_CWRITER);
+
+ /* Restore GITS_BASER from the value cache. */
+ for ( i = 0; i < GITS_BASER_NR_REGS; i++ )
+ {
+ uint64_t baser = its->suspend_ctx.baser[i];
+
+ if ( !(baser & GITS_VALID_BIT) )
+ continue;
+
+ writeq_relaxed(baser, base + GITS_BASER0 + i * 8);
+ }
+
+ writel_relaxed(its->suspend_ctx.ctlr, base + GITS_CTLR);
+
+ return gicv3_its_setup_collection_single(its, cpu);
+}
+
+void gicv3_its_resume(void)
+{
+ struct host_its *its;
+ unsigned int cpu = smp_processor_id();
+ int ret;
+
+ list_for_each_entry(its, &host_its_list, entry)
+ {
+ ret = gicv3_its_resume_single(its, cpu);
+ if ( ret )
+ panic("GICv3: ITS@%"PRIpaddr": failed to restore during resume: %d\n",
+ its->addr, ret);
+ }
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
/*
* Local variables:
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index d182a71478..ef8318dd50 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -862,7 +862,7 @@ static bool gicv3_enable_lpis(void)
return true;
}
-static int __init gicv3_populate_rdist(void)
+static int gicv3_populate_rdist(void)
{
int i;
uint32_t aff;
@@ -932,7 +932,7 @@ static int __init gicv3_populate_rdist(void)
ret = gicv3_lpi_init_rdist(ptr);
if ( ret && ret != -ENODEV && ret != -EBUSY )
{
- printk("GICv3: CPU%d: Cannot initialize LPIs: %u\n",
+ printk("GICv3: CPU%d: Cannot initialize LPIs: %d\n",
smp_processor_id(), ret);
break;
}
@@ -2101,10 +2101,14 @@ static int gicv3_suspend(void)
gicv3_disable_interface();
- ret = gicv3_disable_redist();
+ ret = gicv3_its_suspend();
if ( ret )
goto out_enable_iface;
+ ret = gicv3_disable_redist();
+ if ( ret )
+ goto out_its_resume;
+
/* Save GICR configuration */
gicv3_redist_wait_for_rwp();
@@ -2140,6 +2144,9 @@ static int gicv3_suspend(void)
return 0;
+ out_its_resume:
+ gicv3_its_resume();
+
out_enable_iface:
gicv3_hyp_enable(true);
WRITE_SYSREG(gicv3_ctx.cpu.ctlr, ICC_CTLR_EL1);
@@ -2212,6 +2219,8 @@ static void gicv3_resume(void)
gicv3_redist_wait_for_rwp();
+ gicv3_its_resume();
+
WRITE_SYSREG(gicv3_ctx.cpu.sre_el2, ICC_SRE_EL2);
isb();
diff --git a/xen/arch/arm/include/asm/gic_v3_its.h b/xen/arch/arm/include/asm/gic_v3_its.h
index 081bd19180..3ca74435c8 100644
--- a/xen/arch/arm/include/asm/gic_v3_its.h
+++ b/xen/arch/arm/include/asm/gic_v3_its.h
@@ -129,6 +129,13 @@ struct host_its {
spinlock_t cmd_lock;
void *cmd_buf;
unsigned int flags;
+#ifdef CONFIG_SYSTEM_SUSPEND
+ struct suspend_ctx {
+ uint32_t ctlr;
+ uint64_t cbaser;
+ uint64_t baser[GITS_BASER_NR_REGS];
+ } suspend_ctx;
+#endif
};
/* Map a collection for this host CPU to each host ITS. */
@@ -205,6 +212,11 @@ uint64_t gicv3_its_get_cacheability(void);
uint64_t gicv3_its_get_shareability(void);
unsigned int gicv3_its_get_memflags(void);
+#ifdef CONFIG_SYSTEM_SUSPEND
+int gicv3_its_suspend(void);
+void gicv3_its_resume(void);
+#endif
+
#else
#ifdef CONFIG_ACPI
@@ -272,6 +284,17 @@ static inline int gicv3_its_make_hwdom_dt_nodes(const struct domain *d,
return 0;
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+static inline int gicv3_its_suspend(void)
+{
+ return 0;
+}
+
+static inline void gicv3_its_resume(void)
+{
+}
+#endif
+
#endif /* CONFIG_HAS_ITS */
#endif
diff --git a/xen/include/xen/list.h b/xen/include/xen/list.h
index 98d8482dab..2aab274157 100644
--- a/xen/include/xen/list.h
+++ b/xen/include/xen/list.h
@@ -535,6 +535,20 @@ static inline void list_splice_init(struct list_head *list,
&(pos)->member != (head); \
(pos) = list_entry((pos)->member.next, typeof(*(pos)), member))
+/**
+ * list_for_each_entry_continue_reverse - iterate backwards from the given point
+ * @pos: the type * to use as a loop cursor.
+ * @head: the head for your list.
+ * @member: the name of the list_head within the struct.
+ *
+ * Iterate backwards over a list of the given type, starting from the entry
+ * preceding the current one (@pos itself is not visited).
+ */
+#define list_for_each_entry_continue_reverse(pos, head, member) \
+ for ((pos) = list_entry((pos)->member.prev, typeof(*(pos)), member); \
+ &(pos)->member != (head); \
+ (pos) = list_entry((pos)->member.prev, typeof(*(pos)), member))
+
/**
* list_for_each_entry_from - iterate over list of given type from the
* current point
--
2.43.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
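The CBASER/CWRITER comment in gicv3_its_resume_single() is about keeping the two command-queue pointers in step: the ITS consumes commands while GITS_CREADR differs from GITS_CWRITER, and a successful write to GITS_CBASER sets CREADR to 0. The toy model below (a hypothetical `struct fake_cmdq`, not the real register interface; 32-byte commands as in the GICv3 ITS) shows why a stale CWRITER left over from before suspend would make the ITS see commands that were never written after resume.

```c
#include <assert.h>
#include <stdint.h>

#define ITS_CMD_SIZE 32u /* every ITS command is 32 bytes long */

/* Toy model: queue size plus the two pointers, as byte offsets. */
struct fake_cmdq {
    uint64_t size;
    uint64_t creadr;
    uint64_t cwriter;
};

/* Modelled rule: a CBASER write sets CREADR to 0.
 * The base address itself is not modelled here. */
static void write_cbaser(struct fake_cmdq *q)
{
    q->creadr = 0;
}

/* Commands the ITS would consume between CREADR and CWRITER (with wrap). */
static uint64_t pending_cmds(const struct fake_cmdq *q)
{
    return ((q->cwriter + q->size - q->creadr) % q->size) / ITS_CMD_SIZE;
}
```

Resetting CWRITER to 0 right after the CBASER write, as the patch does, makes the queue empty again before any new command is queued.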
* [PATCH v8 06/13] xen/arm: tee: keep init_tee_secondary() for hotplug and resume
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (4 preceding siblings ...)
2026-04-02 10:45 ` [PATCH v8 05/13] xen/arm: gic-v3: add ITS suspend/resume support Mykola Kvach
@ 2026-04-02 10:45 ` Mykola Kvach
2026-04-24 10:59 ` Luca Fancellu
` (2 more replies)
2026-04-02 10:45 ` [PATCH v8 07/13] xen/arm: ffa: fix notification SRI across CPU hotplug/suspend Mykola Kvach
` (8 subsequent siblings)
14 siblings, 3 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Volodymyr Babchuk, Bertrand Marquis, Jens Wiklander,
Stefano Stabellini, Julien Grall, Michal Orzel
From: Mykola Kvach <mykola_kvach@epam.com>
init_tee_secondary() was marked __init, so its code is freed after boot.
Calling it from the CPU hotplug/resume path therefore executed discarded
code, which could crash Xen. Drop __init so the TEE mediator's secondary
init can run safely on hotplugged and resumed CPUs.
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
xen/arch/arm/tee/tee.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
index 8501443c8e..00e561fc78 100644
--- a/xen/arch/arm/tee/tee.c
+++ b/xen/arch/arm/tee/tee.c
@@ -128,7 +128,7 @@ static int __init tee_init(void)
presmp_initcall(tee_init);
-void __init init_tee_secondary(void)
+void init_tee_secondary(void)
{
if ( cur_mediator && cur_mediator->ops->init_secondary )
cur_mediator->ops->init_secondary();
--
2.43.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 07/13] xen/arm: ffa: fix notification SRI across CPU hotplug/suspend
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (5 preceding siblings ...)
2026-04-02 10:45 ` [PATCH v8 06/13] xen/arm: tee: keep init_tee_secondary() for hotplug and resume Mykola Kvach
@ 2026-04-02 10:45 ` Mykola Kvach
2026-04-24 12:05 ` Luca Fancellu
2026-04-27 8:20 ` Bertrand Marquis
2026-04-02 10:45 ` [PATCH v8 08/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks Mykola Kvach
` (7 subsequent siblings)
14 siblings, 2 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Volodymyr Babchuk, Bertrand Marquis, Jens Wiklander,
Stefano Stabellini, Julien Grall, Michal Orzel
From: Mykola Kvach <mykola_kvach@epam.com>
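The FF-A notification SRI interrupt handler was not correctly tied to
CPU hotplug and suspend/resume. As a result, CPUs going offline and
coming back online could end up with stale or missing handlers, breaking
delivery of FF-A notifications.
Fix this by registering the SRI handler through a per-CPU irqaction
with setup_irq() and releasing it from a CPU_DYING notifier, so the
handler can be installed again when the CPU comes back online.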
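One way the missing-handler problem can arise is sketched below, assuming the per-CPU descriptor of a banked interrupt keeps its handler across an offline/online cycle and that registration is non-shared, failing with -EBUSY when a handler is already installed. All names are hypothetical and the per-CPU descriptors are reduced to an array; the sketch only illustrates why a per-CPU registration needs a matching release in a CPU_DYING notifier.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4

typedef void (*handler_t)(void);

/* One "descriptor" per CPU for the banked SRI interrupt (hypothetical). */
static handler_t sri_action[SKETCH_NR_CPUS];

static void notif_handler(void) { }

/* Behaves like a non-shared registration: refuses if a handler is set. */
static int slot_setup(unsigned int cpu, handler_t h)
{
    if ( sri_action[cpu] != NULL )
        return -EBUSY;
    sri_action[cpu] = h;
    return 0;
}

static void slot_release(unsigned int cpu)
{
    sri_action[cpu] = NULL;
}

/* A CPU (re)entering the online state installs its handler. */
static int cpu_online(unsigned int cpu)
{
    return slot_setup(cpu, notif_handler);
}

/* release_on_dying models whether a CPU_DYING notifier is registered. */
static void cpu_dying(unsigned int cpu, int release_on_dying)
{
    if ( release_on_dying )
        slot_release(cpu);
}
```

Without the release step, the second cpu_online() for the same CPU fails and the CPU is left with whatever handler state was there before it went down.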
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
xen/arch/arm/tee/ffa_notif.c | 63 ++++++++++++++++++++++++++++--------
1 file changed, 50 insertions(+), 13 deletions(-)
diff --git a/xen/arch/arm/tee/ffa_notif.c b/xen/arch/arm/tee/ffa_notif.c
index 186e726412..513c399594 100644
--- a/xen/arch/arm/tee/ffa_notif.c
+++ b/xen/arch/arm/tee/ffa_notif.c
@@ -360,10 +360,28 @@ static int32_t ffa_notification_bitmap_destroy(uint16_t vm_id)
return ffa_simple_call(FFA_NOTIFICATION_BITMAP_DESTROY, vm_id, 0, 0, 0);
}
-void ffa_notif_init_interrupt(void)
+static DEFINE_PER_CPU_READ_MOSTLY(struct irqaction, sri_irq);
+
+static int request_sri_irq(void)
{
int ret;
+ struct irqaction *sri_action = &this_cpu(sri_irq);
+
+ sri_action->name = "FF-A notif";
+ sri_action->handler = notif_irq_handler;
+ sri_action->dev_id = NULL;
+ sri_action->free_on_release = 0;
+
+ ret = setup_irq(notif_sri_irq, 0, sri_action);
+ if ( ret )
+ printk(XENLOG_ERR "ffa: setup_irq irq %u failed: error %d\n",
+ notif_sri_irq, ret);
+ return ret;
+}
+
+void ffa_notif_init_interrupt(void)
+{
if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI )
{
/*
@@ -376,14 +394,36 @@ void ffa_notif_init_interrupt(void)
* pending, while the SPMC in the secure world will not notice that
* the interrupt was lost.
*/
- ret = request_irq(notif_sri_irq, 0, notif_irq_handler, "FF-A notif",
- NULL);
- if ( ret )
- printk(XENLOG_ERR "ffa: request_irq irq %u failed: error %d\n",
- notif_sri_irq, ret);
+ request_sri_irq();
}
}
+static void deinit_ffa_notif_interrupt(void)
+{
+ if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI )
+ release_irq(notif_sri_irq, NULL);
+}
+
+static int cpu_ffa_notif_callback(struct notifier_block *nfb,
+ unsigned long action,
+ void *hcpu)
+{
+ switch ( action )
+ {
+ case CPU_DYING:
+ deinit_ffa_notif_interrupt();
+ break;
+ default:
+ break;
+ }
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_ffa_notif_nfb = {
+ .notifier_call = cpu_ffa_notif_callback,
+};
+
void ffa_notif_init(void)
{
const struct arm_smccc_1_2_regs arg = {
@@ -392,7 +432,6 @@ void ffa_notif_init(void)
};
struct arm_smccc_1_2_regs resp;
unsigned int irq;
- int ret;
/* Only enable fw notification if all ABIs we need are supported */
if ( ffa_fw_supports_fid(FFA_NOTIFICATION_BITMAP_CREATE) &&
@@ -408,13 +447,11 @@ void ffa_notif_init(void)
notif_sri_irq = irq;
if ( irq >= NR_GIC_SGI )
irq_set_type(irq, IRQ_TYPE_EDGE_RISING);
- ret = request_irq(irq, 0, notif_irq_handler, "FF-A notif", NULL);
- if ( ret )
- {
- printk(XENLOG_ERR "ffa: request_irq irq %u failed: error %d\n",
- irq, ret);
+
+ if ( request_sri_irq() )
return;
- }
+
+ register_cpu_notifier(&cpu_ffa_notif_nfb);
fw_notif_enabled = true;
}
}
--
2.43.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 08/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (6 preceding siblings ...)
2026-04-02 10:45 ` [PATCH v8 07/13] xen/arm: ffa: fix notification SRI across CPU hotplug/suspend Mykola Kvach
@ 2026-04-02 10:45 ` Mykola Kvach
2026-04-24 13:34 ` Luca Fancellu
2026-04-02 10:45 ` [PATCH v8 09/13] arm/smmu-v3: add suspend/resume handlers Mykola Kvach
` (6 subsequent siblings)
14 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Save the active context and micro-TLB registers on system suspend and
restore them on resume.
Tested on R-Car H3 Starter Kit.
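For each master device, the patch keeps two arrays sized by `fwspec->num_ids` (one for IMUCTR, one for IMUASID) and walks the device's uTLB IDs to fill them on suspend and write them back on resume. The sketch below models that with plain arrays standing in for the IPMMU register file; `struct fake_mmu`, `struct utlb_backup` and the helpers are hypothetical, and `calloc()` stands in for `xzalloc_array()`, with the same unwind-on-failure shape as ipmmu_alloc_ctx_suspend().

```c
#include <assert.h>
#include <stdlib.h>

#define UTLB_MAX 48

/* Hypothetical stand-in for the per-uTLB IMUCTR/IMUASID registers. */
struct fake_mmu {
    unsigned int imuctr[UTLB_MAX];
    unsigned int imuasid[UTLB_MAX];
};

/* Per-device backup, sized by the number of uTLB IDs the device uses. */
struct utlb_backup {
    const unsigned int *ids;
    unsigned int num_ids;
    unsigned int *utlbs_val;
    unsigned int *asids_val;
};

static int backup_alloc(struct utlb_backup *b, const unsigned int *ids,
                        unsigned int num_ids)
{
    b->utlbs_val = calloc(num_ids, sizeof(*b->utlbs_val));
    if ( !b->utlbs_val )
        return -1;
    b->asids_val = calloc(num_ids, sizeof(*b->asids_val));
    if ( !b->asids_val )
    {
        free(b->utlbs_val);          /* unwind the first allocation */
        b->utlbs_val = NULL;
        return -1;
    }
    b->ids = ids;
    b->num_ids = num_ids;
    return 0;
}

static void backup_free(struct utlb_backup *b)
{
    free(b->utlbs_val);
    free(b->asids_val);
}

/* Suspend: copy each used uTLB's registers into the backup arrays. */
static void utlbs_save(const struct fake_mmu *mmu, struct utlb_backup *b)
{
    for ( unsigned int i = 0; i < b->num_ids; i++ )
    {
        b->asids_val[i] = mmu->imuasid[b->ids[i]];
        b->utlbs_val[i] = mmu->imuctr[b->ids[i]];
    }
}

/* Resume: write the saved values back to the same uTLB slots. */
static void utlbs_restore(struct fake_mmu *mmu, const struct utlb_backup *b)
{
    for ( unsigned int i = 0; i < b->num_ids; i++ )
    {
        mmu->imuasid[b->ids[i]] = b->asids_val[i];
        mmu->imuctr[b->ids[i]] = b->utlbs_val[i];
    }
}
```

Only the uTLBs listed for the device are touched, so unrelated slots keep their reset values after restore.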
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V7:
- Move the suspend context allocation before the PCI-specific setup.
---
xen/drivers/passthrough/arm/ipmmu-vmsa.c | 305 ++++++++++++++++++++++-
1 file changed, 298 insertions(+), 7 deletions(-)
diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
index ea9fa9ddf3..6765bd3083 100644
--- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
+++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
@@ -71,6 +71,8 @@
})
#endif
+#define dev_dbg(dev, fmt, ...) \
+ dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
#define dev_info(dev, fmt, ...) \
dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
#define dev_warn(dev, fmt, ...) \
@@ -130,6 +132,24 @@ struct ipmmu_features {
unsigned int imuctr_ttsel_mask;
};
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+struct ipmmu_reg_ctx {
+ unsigned int imttlbr0;
+ unsigned int imttubr0;
+ unsigned int imttbcr;
+ unsigned int imctr;
+};
+
+struct ipmmu_vmsa_backup {
+ struct device *dev;
+ unsigned int *utlbs_val;
+ unsigned int *asids_val;
+ struct list_head list;
+};
+
+#endif
+
/* Root/Cache IPMMU device's information */
struct ipmmu_vmsa_device {
struct device *dev;
@@ -142,6 +162,9 @@ struct ipmmu_vmsa_device {
struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
unsigned int utlb_refcount[IPMMU_UTLB_MAX];
const struct ipmmu_features *features;
+#ifdef CONFIG_SYSTEM_SUSPEND
+ struct ipmmu_reg_ctx *reg_backup[IPMMU_CTX_MAX];
+#endif
};
/*
@@ -547,6 +570,245 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
spin_unlock_irqrestore(&mmu->lock, flags);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+static DEFINE_SPINLOCK(ipmmu_devices_backup_lock);
+static LIST_HEAD(ipmmu_devices_backup);
+
+static struct ipmmu_reg_ctx root_pgtable[IPMMU_CTX_MAX];
+
+static uint32_t ipmmu_imuasid_read(struct ipmmu_vmsa_device *mmu,
+ unsigned int utlb)
+{
+ return ipmmu_read(mmu, ipmmu_utlb_reg(mmu, IMUASID(utlb)));
+}
+
+static void ipmmu_utlbs_backup(struct ipmmu_vmsa_device *mmu)
+{
+ struct ipmmu_vmsa_backup *backup_data;
+
+ dev_dbg(mmu->dev, "Handle micro-TLBs backup\n");
+
+ spin_lock(&ipmmu_devices_backup_lock);
+
+ list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
+ {
+ struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
+ unsigned int i;
+
+ if ( to_ipmmu(backup_data->dev) != mmu )
+ continue;
+
+ for ( i = 0; i < fwspec->num_ids; i++ )
+ {
+ unsigned int utlb = fwspec->ids[i];
+
+ backup_data->asids_val[i] = ipmmu_imuasid_read(mmu, utlb);
+ backup_data->utlbs_val[i] = ipmmu_imuctr_read(mmu, utlb);
+ }
+ }
+
+ spin_unlock(&ipmmu_devices_backup_lock);
+}
+
+static void ipmmu_utlbs_restore(struct ipmmu_vmsa_device *mmu)
+{
+ struct ipmmu_vmsa_backup *backup_data;
+
+ dev_dbg(mmu->dev, "Handle micro-TLBs restore\n");
+
+ spin_lock(&ipmmu_devices_backup_lock);
+
+ list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
+ {
+ struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
+ unsigned int i;
+
+ if ( to_ipmmu(backup_data->dev) != mmu )
+ continue;
+
+ for ( i = 0; i < fwspec->num_ids; i++ )
+ {
+ unsigned int utlb = fwspec->ids[i];
+
+ ipmmu_imuasid_write(mmu, utlb, backup_data->asids_val[i]);
+ ipmmu_imuctr_write(mmu, utlb, backup_data->utlbs_val[i]);
+ }
+ }
+
+ spin_unlock(&ipmmu_devices_backup_lock);
+}
+
+static void ipmmu_domain_backup_context(struct ipmmu_vmsa_domain *domain)
+{
+ struct ipmmu_vmsa_device *mmu = domain->mmu->root;
+ struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
+
+ dev_dbg(mmu->dev, "Handle domain context %u backup\n", domain->context_id);
+
+ regs->imttlbr0 = ipmmu_ctx_read_root(domain, IMTTLBR0);
+ regs->imttubr0 = ipmmu_ctx_read_root(domain, IMTTUBR0);
+ regs->imttbcr = ipmmu_ctx_read_root(domain, IMTTBCR);
+ regs->imctr = ipmmu_ctx_read_root(domain, IMCTR);
+}
+
+static void ipmmu_domain_restore_context(struct ipmmu_vmsa_domain *domain)
+{
+ struct ipmmu_vmsa_device *mmu = domain->mmu->root;
+ struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
+
+ dev_dbg(mmu->dev, "Handle domain context %u restore\n", domain->context_id);
+
+ ipmmu_ctx_write_root(domain, IMTTLBR0, regs->imttlbr0);
+ ipmmu_ctx_write_root(domain, IMTTUBR0, regs->imttubr0);
+ ipmmu_ctx_write_root(domain, IMTTBCR, regs->imttbcr);
+ ipmmu_ctx_write_all(domain, IMCTR, regs->imctr | IMCTR_FLUSH);
+}
+
+/*
+ * Xen: Unlike the Linux implementation, Xen uses a single driver instance
+ * for handling all IPMMUs. There is no framework for ipmmu_suspend/resume
+ * callbacks to be invoked for each IPMMU device, so we need to iterate
+ * through all registered IPMMUs and perform the required actions.
+ *
+ * Also take care of restoring special settings, such as translation
+ * table format, etc.
+ */
+static int __must_check ipmmu_suspend(void)
+{
+ struct ipmmu_vmsa_device *mmu;
+
+ if ( !iommu_enabled )
+ return 0;
+
+ printk(XENLOG_DEBUG "ipmmu: Suspending...\n");
+
+ spin_lock(&ipmmu_devices_lock);
+
+ list_for_each_entry( mmu, &ipmmu_devices, list )
+ {
+ if ( ipmmu_is_root(mmu) )
+ {
+ unsigned int i;
+
+ for ( i = 0; i < mmu->num_ctx; i++ )
+ {
+ if ( !mmu->domains[i] )
+ continue;
+ ipmmu_domain_backup_context(mmu->domains[i]);
+ }
+ }
+ else
+ ipmmu_utlbs_backup(mmu);
+ }
+
+ spin_unlock(&ipmmu_devices_lock);
+
+ return 0;
+}
+
+static void ipmmu_resume(void)
+{
+ struct ipmmu_vmsa_device *mmu;
+
+ if ( !iommu_enabled )
+ return;
+
+ printk(XENLOG_DEBUG "ipmmu: Resuming...\n");
+
+ spin_lock(&ipmmu_devices_lock);
+
+ list_for_each_entry( mmu, &ipmmu_devices, list )
+ {
+ uint32_t reg;
+
+ /* Do not use security group function */
+ reg = IMSCTLR + mmu->features->control_offset_base;
+ ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) & ~IMSCTLR_USE_SECGRP);
+
+ if ( ipmmu_is_root(mmu) )
+ {
+ unsigned int i;
+
+ /* Use stage 2 translation table format */
+ reg = IMSAUXCTLR + mmu->features->control_offset_base;
+ ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) | IMSAUXCTLR_S2PTE);
+
+ for ( i = 0; i < mmu->num_ctx; i++ )
+ {
+ if ( !mmu->domains[i] )
+ continue;
+ ipmmu_domain_restore_context(mmu->domains[i]);
+ }
+ }
+ else
+ ipmmu_utlbs_restore(mmu);
+ }
+
+ spin_unlock(&ipmmu_devices_lock);
+}
+
+static int ipmmu_alloc_ctx_suspend(struct device *dev)
+{
+ struct ipmmu_vmsa_backup *backup_data;
+ unsigned int *utlbs_val, *asids_val;
+ struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+
+ utlbs_val = xzalloc_array(unsigned int, fwspec->num_ids);
+ if ( !utlbs_val )
+ return -ENOMEM;
+
+ asids_val = xzalloc_array(unsigned int, fwspec->num_ids);
+ if ( !asids_val )
+ {
+ xfree(utlbs_val);
+ return -ENOMEM;
+ }
+
+ backup_data = xzalloc(struct ipmmu_vmsa_backup);
+ if ( !backup_data )
+ {
+ xfree(utlbs_val);
+ xfree(asids_val);
+ return -ENOMEM;
+ }
+
+ backup_data->dev = dev;
+ backup_data->utlbs_val = utlbs_val;
+ backup_data->asids_val = asids_val;
+
+ spin_lock(&ipmmu_devices_backup_lock);
+ list_add(&backup_data->list, &ipmmu_devices_backup);
+ spin_unlock(&ipmmu_devices_backup_lock);
+
+ return 0;
+}
+
+#ifdef CONFIG_HAS_PCI
+static void ipmmu_free_ctx_suspend(struct device *dev)
+{
+ struct ipmmu_vmsa_backup *backup_data, *tmp;
+
+ spin_lock(&ipmmu_devices_backup_lock);
+
+ list_for_each_entry_safe( backup_data, tmp, &ipmmu_devices_backup, list )
+ {
+ if ( backup_data->dev == dev )
+ {
+ list_del(&backup_data->list);
+ xfree(backup_data->utlbs_val);
+ xfree(backup_data->asids_val);
+ xfree(backup_data);
+ break;
+ }
+ }
+
+ spin_unlock(&ipmmu_devices_backup_lock);
+}
+#endif /* CONFIG_HAS_PCI */
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
{
uint64_t ttbr;
@@ -559,6 +821,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
return ret;
domain->context_id = ret;
+#ifdef CONFIG_SYSTEM_SUSPEND
+ domain->mmu->root->reg_backup[ret] = &root_pgtable[ret];
+#endif
/*
* TTBR0
@@ -615,6 +880,9 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
ipmmu_ctx_write_root(domain, IMCTR, IMCTR_FLUSH);
ipmmu_tlb_sync(domain);
+#ifdef CONFIG_SYSTEM_SUSPEND
+ domain->mmu->root->reg_backup[domain->context_id] = NULL;
+#endif
ipmmu_domain_free_context(domain->mmu->root, domain->context_id);
}
@@ -1340,10 +1608,11 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
struct iommu_fwspec *fwspec;
#ifdef CONFIG_HAS_PCI
+ int ret;
+
if ( dev_is_pci(dev) )
{
struct pci_dev *pdev = dev_to_pci(dev);
- int ret;
if ( devfn != pdev->devfn )
return 0;
@@ -1371,6 +1640,15 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
/* Let Xen know that the master device is protected by an IOMMU. */
dt_device_set_protected(dev_to_dt(dev));
}
+
+#ifdef CONFIG_SYSTEM_SUSPEND
+ if ( ipmmu_alloc_ctx_suspend(dev) )
+ {
+ dev_err(dev, "Failed to allocate context for suspend\n");
+ return -ENOMEM;
+ }
+#endif
+
#ifdef CONFIG_HAS_PCI
if ( dev_is_pci(dev) )
{
@@ -1379,26 +1657,28 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
struct pci_host_bridge *bridge;
struct iommu_fwspec *fwspec_bridge;
unsigned int utlb_osid0 = 0;
- int ret;
bridge = pci_find_host_bridge(pdev->seg, pdev->bus);
if ( !bridge )
{
dev_err(dev, "Failed to find host bridge\n");
- return -ENODEV;
+ ret = -ENODEV;
+ goto free_suspend_ctx;
}
fwspec_bridge = dev_iommu_fwspec_get(dt_to_dev(bridge->dt_node));
if ( fwspec_bridge->num_ids < 1 )
{
dev_err(dev, "Failed to find host bridge uTLB\n");
- return -ENXIO;
+ ret = -ENXIO;
+ goto free_suspend_ctx;
}
if ( fwspec->num_ids < 1 )
{
dev_err(dev, "Failed to find uTLB");
- return -ENXIO;
+ ret = -ENXIO;
+ goto free_suspend_ctx;
}
rcar4_pcie_osid_regs_init(bridge);
@@ -1407,7 +1687,7 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
if ( ret < 0 )
{
dev_err(dev, "No unused OSID regs\n");
- return ret;
+ goto free_suspend_ctx;
}
reg_id = ret;
@@ -1422,7 +1702,7 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
{
rcar4_pcie_osid_bdf_clear(bridge, reg_id);
rcar4_pcie_osid_reg_free(bridge, reg_id);
- return ret;
+ goto free_suspend_ctx;
}
}
#endif
@@ -1431,6 +1711,13 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
dev_name(fwspec->iommu_dev), fwspec->num_ids);
return 0;
+#ifdef CONFIG_HAS_PCI
+ free_suspend_ctx:
+#ifdef CONFIG_SYSTEM_SUSPEND
+ ipmmu_free_ctx_suspend(dev);
+#endif
+ return ret;
+#endif
}
static int ipmmu_iommu_domain_init(struct domain *d)
@@ -1492,6 +1779,10 @@ static const struct iommu_ops ipmmu_iommu_ops =
.unmap_page = arm_iommu_unmap_page,
.dt_xlate = ipmmu_dt_xlate,
.add_device = ipmmu_add_device,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = ipmmu_suspend,
+ .resume = ipmmu_resume,
+#endif
};
static __init int ipmmu_init(struct dt_device_node *node, const void *data)
--
2.43.0
* [PATCH v8 09/13] arm/smmu-v3: add suspend/resume handlers
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (7 preceding siblings ...)
2026-04-02 10:45 ` [PATCH v8 08/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks Mykola Kvach
@ 2026-04-02 10:45 ` Mykola Kvach
2026-04-27 14:01 ` Luca Fancellu
2026-04-27 14:02 ` Luca Fancellu
2026-04-02 10:45 ` [PATCH v8 10/13] xen/arm: Resume memory management on Xen resume Mykola Kvach
` (5 subsequent siblings)
14 siblings, 2 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Bertrand Marquis, Rahul Singh, Stefano Stabellini,
Julien Grall, Michal Orzel, Volodymyr Babchuk
Before the SMMU is suspended, all commands (especially ATC_INV) must have
been consumed from the command queue (CMDQ), i.e. the CMDQ must be empty.
The suspend callback configures the SMMU to abort new transactions,
disables translation and every queue except the CMDQ, drains the command
queue so that any in-flight commands complete, and finally disables the
device. If a step fails, the failing SMMU and all SMMUs suspended before
it are reset again.
The resume callback performs a full device reset via 'arm_smmu_device_reset'
to bring the SMMU back to an operational state.
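The rollback in arm_smmu_suspend() resets the SMMU that failed and then walks back over the ones already suspended, using list_for_each_entry_continue_reverse(). The model below is a minimal userspace sketch, not Xen code. An array stands in for the arm_smmu_devices list. fake_suspend_one() and fake_reset_one() are hypothetical stand-ins for the GBPA/CR0/CMDQ-drain sequence and for arm_smmu_device_reset().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; the array order plays the list order. */
struct fake_smmu {
    bool fail_suspend;   /* inject a suspend failure for this device */
    bool suspended;      /* true once the device has been quiesced */
    int  reset_order;    /* position in the rollback sequence, -1 if not reset */
};

/* Stand-in for GBPA abort + CR0 disable + CMDQ drain + full disable. */
static int fake_suspend_one(struct fake_smmu *s)
{
    if ( s->fail_suspend )
        return -1;       /* e.g. a timeout while draining the CMDQ */
    s->suspended = true;
    return 0;
}

/* Stand-in for arm_smmu_device_reset(): records when it was called. */
static void fake_reset_one(struct fake_smmu *s, int *next_reset)
{
    s->suspended = false;
    s->reset_order = (*next_reset)++;
}

int fake_smmu_suspend_all(struct fake_smmu *smmus, size_t n)
{
    size_t i;
    int next_reset = 0;

    for ( i = 0; i < n; i++ )
        smmus[i].reset_order = -1;

    for ( i = 0; i < n; i++ )
    {
        int ret = fake_suspend_one(&smmus[i]);

        if ( ret )
        {
            /* Reset the failing device first, then walk backwards. */
            fake_reset_one(&smmus[i], &next_reset);
            while ( i-- > 0 )
                fake_reset_one(&smmus[i], &next_reset);
            return ret;
        }
    }

    return 0;
}
```

Resetting in reverse order mirrors how the patch continues the list walk backwards from the failing entry, so no device is left half-suspended.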
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V8:
- Honor ARM_SMMU_FEAT_SEV when draining the CMDQ during suspend, matching
the existing runtime CMD_SYNC path.
- Fold the suspend rollback reset path into a helper and rename the error
reporting to describe suspend rollback rather than resume.
- Treat SMMU reset failure during resume as fatal instead of logging and
continuing with a potentially unusable IOMMU.
- cosmetic changes
---
xen/drivers/passthrough/arm/smmu-v3.c | 172 ++++++++++++++++++++------
1 file changed, 136 insertions(+), 36 deletions(-)
diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
index bf153227db..7607ffc9ca 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -1814,8 +1814,7 @@ static int arm_smmu_write_reg_sync(struct arm_smmu_device *smmu, u32 val,
}
/* GBPA is "special" */
-static int __init arm_smmu_update_gbpa(struct arm_smmu_device *smmu,
- u32 set, u32 clr)
+static int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr)
{
int ret;
u32 reg, __iomem *gbpa = smmu->base + ARM_SMMU_GBPA;
@@ -1995,10 +1994,29 @@ err_free_evtq_irq:
return ret;
}
+static int arm_smmu_enable_irqs(struct arm_smmu_device *smmu)
+{
+ int ret;
+ u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;
+
+ if ( smmu->features & ARM_SMMU_FEAT_PRI )
+ irqen_flags |= IRQ_CTRL_PRIQ_IRQEN;
+
+ /* Enable interrupt generation on the SMMU */
+ ret = arm_smmu_write_reg_sync(smmu, irqen_flags,
+ ARM_SMMU_IRQ_CTRL, ARM_SMMU_IRQ_CTRLACK);
+ if ( ret )
+ {
+ dev_warn(smmu->dev, "failed to enable irqs\n");
+ return ret;
+ }
+
+ return 0;
+}
+
static int __init arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
{
int ret, irq;
- u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;
/* Disable IRQs first */
ret = arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_IRQ_CTRL,
@@ -2028,22 +2046,7 @@ static int __init arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
}
}
- if (smmu->features & ARM_SMMU_FEAT_PRI)
- irqen_flags |= IRQ_CTRL_PRIQ_IRQEN;
-
- /* Enable interrupt generation on the SMMU */
- ret = arm_smmu_write_reg_sync(smmu, irqen_flags,
- ARM_SMMU_IRQ_CTRL, ARM_SMMU_IRQ_CTRLACK);
- if (ret) {
- dev_warn(smmu->dev, "failed to enable irqs\n");
- goto err_free_irqs;
- }
-
return 0;
-
-err_free_irqs:
- arm_smmu_free_irqs(smmu);
- return ret;
}
static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
@@ -2057,7 +2060,7 @@ static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
return ret;
}
-static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
+static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
{
int ret;
u32 reg, enables;
@@ -2163,17 +2166,9 @@ static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
}
}
- ret = arm_smmu_setup_irqs(smmu);
- if (ret) {
- dev_err(smmu->dev, "failed to setup irqs\n");
+ ret = arm_smmu_enable_irqs(smmu);
+ if ( ret )
return ret;
- }
-
- /* Initialize tasklets for threaded IRQs*/
- tasklet_init(&smmu->evtq_irq_tasklet, arm_smmu_evtq_tasklet, smmu);
- tasklet_init(&smmu->priq_irq_tasklet, arm_smmu_priq_tasklet, smmu);
- tasklet_init(&smmu->combined_irq_tasklet, arm_smmu_combined_irq_tasklet,
- smmu);
/* Enable the SMMU interface, or ensure bypass */
if (disable_bypass) {
@@ -2181,20 +2176,16 @@ static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
} else {
ret = arm_smmu_update_gbpa(smmu, 0, GBPA_ABORT);
if (ret)
- goto err_free_irqs;
+ return ret;
}
ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
ARM_SMMU_CR0ACK);
if (ret) {
dev_err(smmu->dev, "failed to enable SMMU interface\n");
- goto err_free_irqs;
+ return ret;
}
return 0;
-
-err_free_irqs:
- arm_smmu_free_irqs(smmu);
- return ret;
}
static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
@@ -2558,10 +2549,23 @@ static int __init arm_smmu_device_probe(struct platform_device *pdev)
if (ret)
goto out_free;
+ ret = arm_smmu_setup_irqs(smmu);
+ if ( ret )
+ {
+ dev_err(smmu->dev, "failed to setup irqs\n");
+ goto out_free;
+ }
+
+ /* Initialize tasklets for threaded IRQs */
+ tasklet_init(&smmu->evtq_irq_tasklet, arm_smmu_evtq_tasklet, smmu);
+ tasklet_init(&smmu->priq_irq_tasklet, arm_smmu_priq_tasklet, smmu);
+ tasklet_init(&smmu->combined_irq_tasklet, arm_smmu_combined_irq_tasklet,
+ smmu);
+
/* Reset the device */
ret = arm_smmu_device_reset(smmu);
if (ret)
- goto out_free;
+ goto out_free_irqs;
/*
* Keep a list of all probed devices. This will be used to query
@@ -2575,6 +2579,8 @@ static int __init arm_smmu_device_probe(struct platform_device *pdev)
return 0;
+out_free_irqs:
+ arm_smmu_free_irqs(smmu);
out_free:
arm_smmu_free_structures(smmu);
@@ -2855,6 +2861,96 @@ static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
xfree(xen_domain);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+static void arm_smmu_reset_for_suspend_rollback(struct arm_smmu_device *smmu)
+{
+ int ret = arm_smmu_device_reset(smmu);
+
+ if ( ret )
+ dev_err(smmu->dev, "Failed to reset during suspend rollback: %d\n",
+ ret);
+}
+
+static int arm_smmu_suspend(void)
+{
+ struct arm_smmu_device *smmu;
+ int ret = 0;
+
+ list_for_each_entry(smmu, &arm_smmu_devices, devices)
+ {
+ bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
+
+ /* Abort all transactions before disable to avoid spurious bypass */
+ ret = arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
+ if ( ret )
+ goto fail;
+
+ /* Disable the SMMU via CR0.EN and all queues except CMDQ */
+ ret = arm_smmu_write_reg_sync(smmu, CR0_CMDQEN, ARM_SMMU_CR0,
+ ARM_SMMU_CR0ACK);
+ if ( ret )
+ {
+ dev_err(smmu->dev, "Timed-out while disabling smmu\n");
+ goto fail;
+ }
+
+ /*
+ * At this point the SMMU is completely disabled and won't access
+ * any translation/config structures, even speculative accesses
+ * aren't performed as per the IHI0070 spec (section 6.3.9.6).
+ */
+
+ /* Wait for the CMDQs to be drained to flush any pending commands */
+ ret = queue_poll_cons(&smmu->cmdq.q, true, wfe);
+ if ( ret )
+ {
+ dev_err(smmu->dev, "Draining queues timed-out\n");
+ goto fail;
+ }
+
+ /* Disable everything */
+ ret = arm_smmu_device_disable(smmu);
+ if ( ret )
+ goto fail;
+
+ dev_dbg(smmu->dev, "Suspended smmu\n");
+ }
+
+ return 0;
+
+ fail:
+ /* Reset the device that failed as well as any already-suspended ones. */
+ arm_smmu_reset_for_suspend_rollback(smmu);
+
+ list_for_each_entry_continue_reverse(smmu, &arm_smmu_devices, devices)
+ arm_smmu_reset_for_suspend_rollback(smmu);
+
+ return ret;
+}
+
+static void arm_smmu_resume(void)
+{
+ int ret;
+ struct arm_smmu_device *smmu;
+
+ list_for_each_entry(smmu, &arm_smmu_devices, devices)
+ {
+ dev_dbg(smmu->dev, "Resuming device\n");
+
+ /*
+ * The reset will re-initialize all the base addresses, queues,
+ * prod and cons maintained within struct arm_smmu_device as well as
+ * re-enable the interrupts.
+ */
+ ret = arm_smmu_device_reset(smmu);
+ if ( ret )
+ panic("SMMUv3: %s: Failed to reset during resume: %d\n",
+ dev_name(smmu->dev), ret);
+ }
+}
+#endif
+
static const struct iommu_ops arm_smmu_iommu_ops = {
.page_sizes = PAGE_SIZE_4K,
.init = arm_smmu_iommu_xen_domain_init,
@@ -2867,6 +2963,10 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
.unmap_page = arm_iommu_unmap_page,
.dt_xlate = arm_smmu_dt_xlate,
.add_device = arm_smmu_add_device,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = arm_smmu_suspend,
+ .resume = arm_smmu_resume,
+#endif
};
static __init int arm_smmu_dt_init(struct dt_device_node *dev,
--
2.43.0
* [PATCH v8 10/13] xen/arm: Resume memory management on Xen resume
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (8 preceding siblings ...)
2026-04-02 10:45 ` [PATCH v8 09/13] arm/smmu-v3: add suspend/resume handlers Mykola Kvach
@ 2026-04-02 10:45 ` Mykola Kvach
2026-04-27 14:50 ` Luca Fancellu
2026-05-07 22:06 ` Volodymyr Babchuk
2026-04-02 10:45 ` [PATCH v8 11/13] xen/arm: Save/restore context on suspend/resume Mykola Kvach
` (4 subsequent siblings)
14 siblings, 2 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mirela Simonovic <mirela.simonovic@aggios.com>
The MMU must be enabled during the resume path before restoring context,
as virtual addresses are used to access the saved context data.
This patch sets up the MMU on resume by reusing the existing
enable_secondary_cpu_mm function, which enables the data cache and the MMU.
Before the MMU is enabled, TTBR0_EL2 is loaded from init_ttbr so that it
points to the page tables used at runtime.
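On resume the MMU is still off, so hyp_resume first computes the offset between the address Xen is actually running at and the address it was linked at. `ldr x0, =start` yields the link-time (virtual) address and `adr x20, start` yields the PC-relative (physical) address. The difference, kept in x20 as "phys-offset", is what enable_secondary_cpu_mm uses. A minimal C model of that arithmetic follows. The function names and the addresses in the usage are hypothetical.

```c
#include <stdint.h>

/*
 * phys-offset = physical address of start - virtual (link) address of start.
 * Unsigned wrap-around is intended: the physical address may be lower than
 * the virtual one, and adding the offset back still yields the right value.
 */
static inline uint64_t phys_offset(uint64_t paddr_start, uint64_t vaddr_start)
{
    return paddr_start - vaddr_start;
}

/* Translate a link-time address to where it lives while the MMU is off. */
static inline uint64_t virt_to_phys_early(uint64_t va, uint64_t offset)
{
    return va + offset;
}
```

For example, if Xen is linked at 0x00200000 but loaded at 0x40200000, the offset is 0x40000000. A symbol linked at 0x00201000 is then found at 0x40201000.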
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in v7:
- no functional changes, just moved commit
---
xen/arch/arm/arm64/head.S | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 72c7b24498..596e960152 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -561,6 +561,30 @@ END(efi_xen_start)
#endif /* CONFIG_ARM_EFI */
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+FUNC(hyp_resume)
+ /* Initialize the UART if earlyprintk has been enabled. */
+#ifdef CONFIG_EARLY_PRINTK
+ bl init_uart
+#endif
+ PRINT_ID("- Xen resuming -\r\n")
+
+ bl check_cpu_mode
+ bl cpu_init
+
+ ldr x0, =start
+ adr x20, start /* x20 := paddr (start) */
+ sub x20, x20, x0 /* x20 := phys-offset */
+ ldr lr, =mmu_resumed
+ b enable_secondary_cpu_mm
+
+mmu_resumed:
+ b .
+END(hyp_resume)
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
/*
* Local variables:
* mode: ASM
--
2.43.0
* [PATCH v8 11/13] xen/arm: Save/restore context on suspend/resume
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (9 preceding siblings ...)
2026-04-02 10:45 ` [PATCH v8 10/13] xen/arm: Resume memory management on Xen resume Mykola Kvach
@ 2026-04-02 10:45 ` Mykola Kvach
2026-04-27 15:26 ` Luca Fancellu
` (2 more replies)
2026-04-02 10:45 ` [PATCH v8 12/13] xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface) Mykola Kvach
` (3 subsequent siblings)
14 siblings, 3 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mirela Simonovic <mirela.simonovic@aggios.com>
The context of the CPU general-purpose and system control registers must
be saved on suspend and restored on resume. Saving is done in
prepare_resume_ctx; restoring is done in hyp_resume, just before it
returns.
prepare_resume_ctx must be invoked just before the PSCI system suspend
call is issued to the ATF. It must return a non-zero value so that the
calling 'if' statement evaluates to true and system suspend is invoked.
On resume, the context saved on suspend is restored, including the link
register. After the context is restored, control therefore returns to
the address held in the saved link register, i.e. the point from which
prepare_resume_ctx was called. To ensure that the calling 'if' statement
does not evaluate to true again and re-initiate system suspend,
hyp_resume must return zero after restoring the context.
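The "returns twice" contract of prepare_resume_ctx/hyp_resume closely resembles setjmp()/longjmp(). The sketch below is a userspace analogy only, not Xen code. fake_system_suspend() is a hypothetical stand-in for call_psci_system_suspend(). On success it never returns to its caller, and execution resumes at the saved context instead. The return-value convention is inverted: setjmp() returns 0 on the direct call and non-zero on resume, while prepare_resume_ctx returns 1 when saving and 0 when resumed.

```c
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf saved_ctx;

/* Success: control comes back through the saved context (like hyp_resume). */
static void fake_system_suspend(bool succeed)
{
    if ( succeed )
        longjmp(saved_ctx, 1);
    /* Failure: the "PSCI call" simply returns to its caller. */
}

struct suspend_trace {
    int branch_entries;  /* how many times the suspend branch ran */
    bool resumed;        /* true if we came back via the saved context */
};

struct suspend_trace suspend_once(bool succeed)
{
    /* volatile: modified between setjmp() and longjmp() */
    volatile int entries = 0;
    volatile bool resumed = false;
    struct suspend_trace t;

    if ( setjmp(saved_ctx) == 0 )   /* plays "prepare_resume_ctx() != 0" */
    {
        entries++;
        fake_system_suspend(succeed);
    }
    else
        resumed = true;             /* plays the zero return from hyp_resume */

    /* Common resume path: reached exactly once on both outcomes. */
    t.branch_entries = entries;
    t.resumed = resumed;
    return t;
}
```

Whether the suspend succeeds or fails, the suspend branch is entered only once. This is the property the non-zero/zero return values guarantee in system_suspend().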
Note that the order in which registers are saved into the cpu_context
structure must match the order in which they are restored.
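Because prepare_resume_ctx and hyp_resume walk the buffer with post-indexed `stp`/`str` (and `ldp`/`ldr`), each field of struct cpu_context must sit at the running offset used by the assembly. The sketch below mirrors the structure from asm/suspend.h and pins those offsets at compile time. It assumes register_t is 64 bits wide, as on arm64; a local ctx_reg_t is used so as not to clash with a host libc's register_t. C11 `_Alignas` stands in for Xen's `__aligned(16)`. These checks are illustrative and are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ctx_reg_t;            /* assumption: 64-bit register_t */

struct cpu_context {
    _Alignas(16) ctx_reg_t callee_regs[12];  /* x19..x28, x29 (fp), lr */
    ctx_reg_t sp;
    ctx_reg_t vbar_el2;
    ctx_reg_t vtcr_el2;
    ctx_reg_t vttbr_el2;
    ctx_reg_t tpidr_el2;
    ctx_reg_t mdcr_el2;
    ctx_reg_t hstr_el2;
    ctx_reg_t cptr_el2;
    ctx_reg_t hcr_el2;
};

/* Six stp pairs of 16 bytes each: 96 bytes of callee-saved registers. */
_Static_assert(offsetof(struct cpu_context, sp) == 96, "sp follows the pairs");
/* Each subsequent str/ldr advances by 8 bytes, in save/restore order. */
_Static_assert(offsetof(struct cpu_context, vbar_el2) == 104, "VBAR_EL2");
_Static_assert(offsetof(struct cpu_context, hcr_el2) == 160, "HCR_EL2 is last");
/* 168 bytes of payload, rounded up to the 16-byte alignment. */
_Static_assert(sizeof(struct cpu_context) == 176, "padded size");
```

Reordering a field without also reordering the matching `mrs`/`str` and `ldr`/`msr` pairs would break one of these assertions instead of silently corrupting state on resume.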
Support for ARM32 is not implemented; if suspend is enabled for ARM32,
the build fails with an #error.
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in v8:
- fix alignments in code
Changes in v7:
- no changes
---
xen/arch/arm/Makefile | 1 +
xen/arch/arm/arm64/head.S | 90 +++++++++++++++++++++++++++++-
xen/arch/arm/include/asm/suspend.h | 26 +++++++++
xen/arch/arm/suspend.c | 14 +++++
4 files changed, 130 insertions(+), 1 deletion(-)
create mode 100644 xen/arch/arm/suspend.c
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 69200b2728..c36158271a 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -51,6 +51,7 @@ obj-y += setup.o
obj-y += shutdown.o
obj-y += smp.o
obj-y += smpboot.o
+obj-$(CONFIG_SYSTEM_SUSPEND) += suspend.o
obj-$(CONFIG_SYSCTL) += sysctl.o
obj-y += time.o
obj-y += traps.o
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 596e960152..2cb02ee314 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -562,6 +562,52 @@ END(efi_xen_start)
#endif /* CONFIG_ARM_EFI */
#ifdef CONFIG_SYSTEM_SUSPEND
+/*
+ * int prepare_resume_ctx(struct cpu_context *ptr)
+ *
+ * x0 - pointer to the storage where callee's context will be saved
+ *
+ * The CPU context saved here is restored on resume by hyp_resume.
+ * prepare_resume_ctx shall return a non-zero value; after restoring the
+ * context, hyp_resume shall return zero instead. C code that invokes
+ * prepare_resume_ctx uses the return value to tell whether the context
+ * was just saved (prepare_resume_ctx) or restored (hyp_resume).
+ */
+FUNC(prepare_resume_ctx)
+ /* Store callee-saved registers */
+ stp x19, x20, [x0], #16
+ stp x21, x22, [x0], #16
+ stp x23, x24, [x0], #16
+ stp x25, x26, [x0], #16
+ stp x27, x28, [x0], #16
+ stp x29, lr, [x0], #16
+
+ /* Store stack-pointer */
+ mov x2, sp
+ str x2, [x0], #8
+
+ /* Store system control registers */
+ mrs x2, VBAR_EL2
+ str x2, [x0], #8
+ mrs x2, VTCR_EL2
+ str x2, [x0], #8
+ mrs x2, VTTBR_EL2
+ str x2, [x0], #8
+ mrs x2, TPIDR_EL2
+ str x2, [x0], #8
+ mrs x2, MDCR_EL2
+ str x2, [x0], #8
+ mrs x2, HSTR_EL2
+ str x2, [x0], #8
+ mrs x2, CPTR_EL2
+ str x2, [x0], #8
+ mrs x2, HCR_EL2
+ str x2, [x0], #8
+
+ /* prepare_resume_ctx must return a non-zero value */
+ mov x0, #1
+ ret
+END(prepare_resume_ctx)
FUNC(hyp_resume)
/* Initialize the UART if earlyprintk has been enabled. */
@@ -580,7 +626,49 @@ FUNC(hyp_resume)
b enable_secondary_cpu_mm
mmu_resumed:
- b .
+ /* Now we can access the cpu_context, so restore the context here */
+ ldr x0, =cpu_context
+
+ /* Restore callee-saved registers */
+ ldp x19, x20, [x0], #16
+ ldp x21, x22, [x0], #16
+ ldp x23, x24, [x0], #16
+ ldp x25, x26, [x0], #16
+ ldp x27, x28, [x0], #16
+ ldp x29, lr, [x0], #16
+
+ /* Restore stack pointer */
+ ldr x2, [x0], #8
+ mov sp, x2
+
+ /* Restore system control registers */
+ ldr x2, [x0], #8
+ msr VBAR_EL2, x2
+ ldr x2, [x0], #8
+ msr VTCR_EL2, x2
+ ldr x2, [x0], #8
+ msr VTTBR_EL2, x2
+ ldr x2, [x0], #8
+ msr TPIDR_EL2, x2
+ ldr x2, [x0], #8
+ msr MDCR_EL2, x2
+ ldr x2, [x0], #8
+ msr HSTR_EL2, x2
+ ldr x2, [x0], #8
+ msr CPTR_EL2, x2
+ ldr x2, [x0], #8
+ msr HCR_EL2, x2
+ isb
+
+ /*
+ * Since the context has been restored, returning from this function
+ * will appear as a return from prepare_resume_ctx. To distinguish
+ * the return from prepare_resume_ctx, which happens while finalizing
+ * suspend, from the return from this function, which happens on
+ * resume, return zero here.
+ */
+ mov x0, #0
+ ret
END(hyp_resume)
#endif /* CONFIG_SYSTEM_SUSPEND */
diff --git a/xen/arch/arm/include/asm/suspend.h b/xen/arch/arm/include/asm/suspend.h
index 31a98a1f1b..c127fa3d78 100644
--- a/xen/arch/arm/include/asm/suspend.h
+++ b/xen/arch/arm/include/asm/suspend.h
@@ -3,6 +3,8 @@
#ifndef ARM_SUSPEND_H
#define ARM_SUSPEND_H
+#include <xen/types.h>
+
struct domain;
struct vcpu;
struct vcpu_guest_context;
@@ -14,6 +16,30 @@ struct resume_info {
void arch_domain_resume(struct domain *d);
+#ifdef CONFIG_SYSTEM_SUSPEND
+#ifdef CONFIG_ARM_64
+struct cpu_context {
+ register_t callee_regs[12];
+ register_t sp;
+ register_t vbar_el2;
+ register_t vtcr_el2;
+ register_t vttbr_el2;
+ register_t tpidr_el2;
+ register_t mdcr_el2;
+ register_t hstr_el2;
+ register_t cptr_el2;
+ register_t hcr_el2;
+} __aligned(16);
+#else
+#error "Define cpu_context structure for arm32"
+#endif
+
+extern struct cpu_context cpu_context;
+
+int prepare_resume_ctx(struct cpu_context *ptr);
+void hyp_resume(void);
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
#endif /* ARM_SUSPEND_H */
/*
diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
new file mode 100644
index 0000000000..e38566b0b7
--- /dev/null
+++ b/xen/arch/arm/suspend.c
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <asm/suspend.h>
+
+struct cpu_context cpu_context = {};
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--
2.43.0
* [PATCH v8 12/13] xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface)
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (10 preceding siblings ...)
2026-04-02 10:45 ` [PATCH v8 11/13] xen/arm: Save/restore context on suspend/resume Mykola Kvach
@ 2026-04-02 10:45 ` Mykola Kvach
2026-04-27 16:21 ` Luca Fancellu
2026-04-02 10:45 ` [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
` (2 subsequent siblings)
14 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mirela Simonovic <mirela.simonovic@aggios.com>
Invoke PSCI SYSTEM_SUSPEND to finalize Xen's suspend sequence on ARM64
platforms. Pass the physical address of the resume entry point
(hyp_resume, implemented in assembly in arm64/head.S) as the entry-point
argument to EL3. Leave the context ID argument unused, as Linux does.
Only take this path when CONFIG_SYSTEM_SUSPEND is set and the PSCI
version is >= 1.0; otherwise return PSCI_NOT_SUPPORTED.
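The gate in call_psci_system_suspend() depends on two encodings from the PSCI specification. The version number packs the major version into bits [31:16] and the minor version into bits [15:0]. SYSTEM_SUSPEND is function number 0xE, with function ID 0x8400000E for SMC32 and 0xC400000E for SMC64, which is what PSCI_1_0_FN_NATIVE() selects between. The sketch below uses local SKETCH_* names as stand-ins. They are not necessarily Xen's exact macro definitions.

```c
#include <stdint.h>

/* PSCI version encoding: major in [31:16], minor in [15:0]. */
#define SKETCH_PSCI_VERSION(major, minor) \
    (((uint32_t)(major) << 16) | (uint32_t)(minor))

/* SYSTEM_SUSPEND function IDs (PSCI 1.0, function number 0xE). */
#define SKETCH_SYSTEM_SUSPEND_FN32 UINT32_C(0x8400000E)
#define SKETCH_SYSTEM_SUSPEND_FN64 UINT32_C(0xC400000E)

/* Pick the native-width ID, as PSCI_1_0_FN_NATIVE() does per build. */
static inline uint32_t system_suspend_fid(int is_64bit)
{
    return is_64bit ? SKETCH_SYSTEM_SUSPEND_FN64 : SKETCH_SYSTEM_SUSPEND_FN32;
}

/*
 * Returns the function ID to issue, or 0 when the firmware predates
 * PSCI 1.0 (the caller would then report PSCI_NOT_SUPPORTED, i.e. -1).
 */
static inline uint32_t system_suspend_fid_for(uint32_t psci_ver, int is_64bit)
{
    if ( psci_ver < SKETCH_PSCI_VERSION(1, 0) )
        return 0;
    return system_suspend_fid(is_64bit);
}
```

Because the encoding places the major version in the high bits, a plain integer comparison orders versions correctly. For example, 0.2 (0x00000002) is less than 1.0 (0x00010000).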
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in v7:
- no changes
---
xen/arch/arm/include/asm/psci.h | 1 +
xen/arch/arm/psci.c | 23 ++++++++++++++++++++++-
2 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/xen/arch/arm/include/asm/psci.h b/xen/arch/arm/include/asm/psci.h
index 48a93e6b79..bb3c73496e 100644
--- a/xen/arch/arm/include/asm/psci.h
+++ b/xen/arch/arm/include/asm/psci.h
@@ -23,6 +23,7 @@ int call_psci_cpu_on(int cpu);
void call_psci_cpu_off(void);
void call_psci_system_off(void);
void call_psci_system_reset(void);
+int call_psci_system_suspend(void);
/* Range of allocated PSCI function numbers */
#define PSCI_FNUM_MIN_VALUE _AC(0,U)
diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
index b6860a7760..c9d126b195 100644
--- a/xen/arch/arm/psci.c
+++ b/xen/arch/arm/psci.c
@@ -17,17 +17,20 @@
#include <asm/cpufeature.h>
#include <asm/psci.h>
#include <asm/acpi.h>
+#include <asm/suspend.h>
/*
* While a 64-bit OS can make calls with SMC32 calling conventions, for
* some calls it is necessary to use SMC64 to pass or return 64-bit values.
- * For such calls PSCI_0_2_FN_NATIVE(x) will choose the appropriate
+ * For such calls PSCI_*_FN_NATIVE(x) will choose the appropriate
* (native-width) function ID.
*/
#ifdef CONFIG_ARM_64
#define PSCI_0_2_FN_NATIVE(name) PSCI_0_2_FN64_##name
+#define PSCI_1_0_FN_NATIVE(name) PSCI_1_0_FN64_##name
#else
#define PSCI_0_2_FN_NATIVE(name) PSCI_0_2_FN32_##name
+#define PSCI_1_0_FN_NATIVE(name) PSCI_1_0_FN32_##name
#endif
uint32_t psci_ver;
@@ -60,6 +63,24 @@ void call_psci_cpu_off(void)
}
}
+int call_psci_system_suspend(void)
+{
+#ifdef CONFIG_SYSTEM_SUSPEND
+ struct arm_smccc_res res;
+
+ if ( psci_ver < PSCI_VERSION(1, 0) )
+ return PSCI_NOT_SUPPORTED;
+
+ /* 2nd argument (context ID) is not used */
+ arm_smccc_smc(PSCI_1_0_FN_NATIVE(SYSTEM_SUSPEND), __pa(hyp_resume), &res);
+ return PSCI_RET(res);
+#else
+ dprintk(XENLOG_WARNING,
+ "SYSTEM_SUSPEND not supported (CONFIG_SYSTEM_SUSPEND disabled)\n");
+ return PSCI_NOT_SUPPORTED;
+#endif
+}
+
void call_psci_system_off(void)
{
if ( psci_ver > PSCI_VERSION(0, 1) )
--
2.43.0
* [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (11 preceding siblings ...)
2026-04-02 10:45 ` [PATCH v8 12/13] xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface) Mykola Kvach
@ 2026-04-02 10:45 ` Mykola Kvach
2026-04-02 11:00 ` Jan Beulich
2026-04-29 8:05 ` Luca Fancellu
2026-04-16 12:51 ` PING: Re: [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
2026-04-16 12:52 ` Mykola Kvach
14 siblings, 2 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-04-02 10:45 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Jan Beulich, Roger Pau Monné, Rahul Singh
From: Mirela Simonovic <mirela.simonovic@aggios.com>
Trigger Xen suspend when the hardware domain initiates suspend via
SHUTDOWN_suspend. Redirect system suspend to CPU#0 to ensure the
suspend logic runs on the boot CPU, as required.
Introduce full suspend/resume infrastructure gated by CONFIG_SYSTEM_SUSPEND,
including logic to:
- disable and enable non-boot physical CPUs
- freeze and thaw domains
- suspend and resume the GIC, timer, iommu and console
- maintain system state before and after suspend
init_ttbr is normally initialized when secondary CPUs are brought up
during boot. On uniprocessor systems no secondary CPU is brought up, so
init_ttbr would remain uninitialized and resume would fail. To address
this, the boot CPU now sets init_ttbr during suspend.
Remove the restriction in the vPSCI interface preventing suspend from the
hardware domain.
Select HAS_SYSTEM_SUSPEND for ARM_64.
Introduce CONFIG_HAS_HWDOM_SYSTEM_SUSPEND as an architecture-selected
capability for platforms where the hardware domain survives
SHUTDOWN_suspend without hwdom_shutdown(). ARM_64 selects it with
SYSTEM_SUSPEND enabled; other architectures keep the existing behaviour.
Note: the code is behind CONFIG_HAS_SYSTEM_SUSPEND, which is currently
selected only when UNSUPPORTED is set and MPU is not, so the functionality
is built but disabled by default.
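The error handling in system_suspend() follows the usual goto-unwind ladder. Each stage that succeeded is undone on the way out, in reverse order. A failure at a given stage jumps to the label that undoes only the stages before it. The sketch below is a hypothetical, reduced model with three stages whose names and log format are illustrative, not the patch's code. In the patch itself, a failed disable_nonboot_cpus() still goes through resume_nonboot_cpus, because some CPUs may already be offline. The model only shows the common shape.

```c
#include <string.h>

struct trace {
    char log[128];       /* ';'-separated record of the actions taken */
};

static void note(struct trace *t, const char *what)
{
    strncat(t->log, what, sizeof(t->log) - strlen(t->log) - 1);
    strncat(t->log, ";", sizeof(t->log) - strlen(t->log) - 1);
}

/* A stage fails (and records nothing) when its id equals fail_at. */
static int stage(struct trace *t, const char *name, int id, int fail_at)
{
    if ( id == fail_at )
        return -1;
    note(t, name);
    return 0;
}

/* fail_at: 0 = no failure, 1..3 = stage that reports an error. */
int model_system_suspend(struct trace *t, int fail_at)
{
    int status;

    t->log[0] = '\0';

    status = stage(t, "cpus_off", 1, fail_at);
    if ( status )
        goto out;

    status = stage(t, "iommu_off", 2, fail_at);
    if ( status )
        goto resume_cpus;

    status = stage(t, "gic_off", 3, fail_at);
    if ( status )
        goto resume_iommu;

    /* Firmware suspend and the later wake-up would happen here. */
    note(t, "gic_on");

 resume_iommu:
    note(t, "iommu_on");
 resume_cpus:
    note(t, "cpus_on");
 out:
    return status;
}
```

Falling through the labels on the success path reuses the same teardown code for resume. This explains why the patch places its resume calls directly under the error labels.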
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V8:
- Add a pre-suspend check in system_suspend() after scheduler_disable()
to require all domains to be in the shut down state with
SHUTDOWN_suspend before proceeding with the global suspend flow.
- Drop the common-level depends on !ARM_64 || !SYSTEM_SUSPEND from
CONFIG_HAS_HWDOM_SHUTDOWN_ON_SUSPEND and model the ARM64 suspend
case with an arch-selected capability instead.
- Rename CONFIG_HAS_HWDOM_SHUTDOWN_ON_SUSPEND to CONFIG_HAS_HWDOM_SYSTEM_SUSPEND.
- Rename need_hwdom_shutdown() to want_hwdom_shutdown().
Changes in V7:
- Control domain is responsible for host suspend
- Move the is_hardware_domain check into host_system_suspend()
- Add an empty inline host_system_suspend() function when SYSTEM_SUSPEND
config is disabled
- Use IS_ENABLED() for config checking instead of #ifdef
- Replace #ifdef checks in domain_shutdown() with IS_ENABLED() to simplify
control flow.
- Factor hardware domain shutdown condition into a helper
(need_hwdom_shutdown()) to avoid preprocessor directives inside the function.
- Squash with iommu suspend/resume commit
---
xen/arch/arm/Kconfig | 2 +
xen/arch/arm/include/asm/mm.h | 2 +
xen/arch/arm/include/asm/suspend.h | 7 +-
xen/arch/arm/mmu/smpboot.c | 2 +-
xen/arch/arm/suspend.c | 181 +++++++++++++++++++++++++++++
xen/arch/arm/vpsci.c | 12 +-
xen/common/Kconfig | 3 +
xen/common/domain.c | 7 +-
xen/drivers/passthrough/arm/smmu.c | 10 ++
9 files changed, 220 insertions(+), 6 deletions(-)
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 2f2b501fda..c2e63ce8ff 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -8,6 +8,8 @@ config ARM_64
depends on !ARM_32
select 64BIT
select HAS_FAST_MULTIPLY
+ select HAS_HWDOM_SYSTEM_SUSPEND if SYSTEM_SUSPEND
+ select HAS_SYSTEM_SUSPEND if !MPU && UNSUPPORTED
select HAS_VPCI_GUEST_SUPPORT if PCI_PASSTHROUGH
config ARM
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 72a6928624..87b54a55dc 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -360,6 +360,8 @@ static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
} while ( (y = cmpxchg(&p->u.inuse.type_info, x, nx)) != x );
}
+void set_init_ttbr(lpae_t *root);
+
#endif /* __ARCH_ARM_MM__ */
/*
* Local variables:
diff --git a/xen/arch/arm/include/asm/suspend.h b/xen/arch/arm/include/asm/suspend.h
index c127fa3d78..c36ba23b10 100644
--- a/xen/arch/arm/include/asm/suspend.h
+++ b/xen/arch/arm/include/asm/suspend.h
@@ -38,7 +38,12 @@ extern struct cpu_context cpu_context;
int prepare_resume_ctx(struct cpu_context *ptr);
void hyp_resume(void);
-#endif /* CONFIG_SYSTEM_SUSPEND */
+void host_system_suspend(struct domain *d);
+
+#else /* !CONFIG_SYSTEM_SUSPEND */
+
+static inline void host_system_suspend(struct domain *d) { (void)d; }
+#endif
#endif /* ARM_SUSPEND_H */
diff --git a/xen/arch/arm/mmu/smpboot.c b/xen/arch/arm/mmu/smpboot.c
index 37e91d72b7..ff508ecf40 100644
--- a/xen/arch/arm/mmu/smpboot.c
+++ b/xen/arch/arm/mmu/smpboot.c
@@ -72,7 +72,7 @@ static void clear_boot_pagetables(void)
clear_table(boot_third);
}
-static void set_init_ttbr(lpae_t *root)
+void set_init_ttbr(lpae_t *root)
{
/*
* init_ttbr is part of the identity mapping which is read-only. So
diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
index e38566b0b7..4d1289776b 100644
--- a/xen/arch/arm/suspend.c
+++ b/xen/arch/arm/suspend.c
@@ -1,9 +1,190 @@
/* SPDX-License-Identifier: GPL-2.0-only */
+#include <asm/psci.h>
#include <asm/suspend.h>
+#include <public/sched.h>
+#include <xen/console.h>
+#include <xen/cpu.h>
+#include <xen/errno.h>
+#include <xen/iommu.h>
+#include <xen/sched.h>
+#include <xen/tasklet.h>
+
struct cpu_context cpu_context = {};
+static int can_system_suspend(void)
+{
+ int ret = 0;
+ struct domain *d;
+
+ rcu_read_lock(&domlist_read_lock);
+
+ for_each_domain ( d )
+ {
+ bool domain_suspended;
+
+ spin_lock(&d->shutdown_lock);
+ domain_suspended = d->is_shut_down &&
+ d->shutdown_code == SHUTDOWN_suspend;
+ spin_unlock(&d->shutdown_lock);
+
+ if ( domain_suspended )
+ continue;
+
+ printk(XENLOG_ERR
+ "System suspend requires all domains to be shut down for suspend (dom%d: isn't in suspend state)\n",
+ d->domain_id);
+
+ ret = -EBUSY;
+ break;
+ }
+
+ rcu_read_unlock(&domlist_read_lock);
+
+ return ret;
+}
+
+/* Xen suspend. data identifies the domain that initiated suspend. */
+static void system_suspend(void *data)
+{
+ int status;
+ unsigned long flags;
+ struct domain *d = (struct domain *)data;
+
+ BUG_ON(system_state != SYS_STATE_active);
+
+ system_state = SYS_STATE_suspend;
+
+ printk("Xen suspending...\n");
+
+ freeze_domains();
+ scheduler_disable();
+
+ status = can_system_suspend();
+ if ( status )
+ {
+ system_state = SYS_STATE_resume;
+ goto resume_scheduler;
+ }
+
+ /*
+ * Non-boot CPUs have to be disabled on suspend and enabled on resume
+ * (hotplug-based mechanism). Disabling the non-boot CPUs causes each
+ * of them to call PSCI CPU_OFF. Depending on the underlying
+ * platform's capabilities, this may result in the CPUs being
+ * physically powered down.
+ */
+ status = disable_nonboot_cpus();
+ if ( status )
+ {
+ system_state = SYS_STATE_resume;
+ goto resume_nonboot_cpus;
+ }
+
+ time_suspend();
+
+ status = iommu_suspend();
+ if ( status )
+ {
+ system_state = SYS_STATE_resume;
+ goto resume_time;
+ }
+
+ console_start_sync();
+ status = console_suspend();
+ if ( status )
+ {
+ dprintk(XENLOG_ERR, "Failed to suspend the console, err=%d\n", status);
+ system_state = SYS_STATE_resume;
+ goto resume_end_sync;
+ }
+
+ local_irq_save(flags);
+ status = gic_suspend();
+ if ( status )
+ {
+ system_state = SYS_STATE_resume;
+ goto resume_irqs;
+ }
+
+ set_init_ttbr(xen_pgtable);
+
+ /*
+ * Enable identity mapping before entering suspend to simplify
+ * the resume path
+ */
+ update_boot_mapping(true);
+
+ if ( prepare_resume_ctx(&cpu_context) )
+ {
+ status = call_psci_system_suspend();
+ /*
+ * If suspend is finalized properly by above system suspend PSCI call,
+ * the code below in this 'if' branch will never execute. Execution
+ * will continue from hyp_resume which is the hypervisor's resume point.
+ * In hyp_resume CPU context will be restored and since link-register is
+ * restored as well, it will appear to return from prepare_resume_ctx.
+ * The difference in returning from prepare_resume_ctx on system suspend
+ * versus resume is in function's return value: on suspend, the return
+ * value is a non-zero value, on resume it is zero. That is why the
+ * control flow will not re-enter this 'if' branch on resume.
+ */
+ if ( status )
+ dprintk(XENLOG_WARNING, "PSCI system suspend failed, err=%d\n",
+ status);
+ }
+
+ system_state = SYS_STATE_resume;
+ update_boot_mapping(false);
+
+ gic_resume();
+
+ resume_irqs:
+ local_irq_restore(flags);
+
+ console_resume();
+ resume_end_sync:
+ console_end_sync();
+
+ iommu_resume();
+
+ resume_time:
+ time_resume();
+
+ resume_nonboot_cpus:
+ /*
+ * The rcu_barrier() has to be added to ensure that the per cpu area is
+ * freed before a non-boot CPU tries to initialize it (_free_percpu_area()
+ * has to be called before the init_percpu_area()). This scenario occurs
+ * when non-boot CPUs are hot-unplugged on suspend and hotplugged on resume.
+ */
+ rcu_barrier();
+ enable_nonboot_cpus();
+
+ resume_scheduler:
+ scheduler_enable();
+ thaw_domains();
+
+ system_state = SYS_STATE_active;
+
+ printk("Resume (status %d)\n", status);
+
+ domain_resume(d);
+}
+
+static DECLARE_TASKLET(system_suspend_tasklet, system_suspend, NULL);
+
+void host_system_suspend(struct domain *d)
+{
+ system_suspend_tasklet.data = (void *)d;
+ /*
+ * The suspend procedure has to be finalized by the pCPU#0 (non-boot pCPUs
+ * will be disabled during the suspend).
+ */
+ tasklet_schedule_on_cpu(&system_suspend_tasklet, 0);
+}
+
/*
* Local variables:
* mode: C
diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
index bd87ec430d..8fb9172186 100644
--- a/xen/arch/arm/vpsci.c
+++ b/xen/arch/arm/vpsci.c
@@ -5,6 +5,7 @@
#include <asm/current.h>
#include <asm/domain.h>
+#include <asm/suspend.h>
#include <asm/vgic.h>
#include <asm/vpsci.h>
#include <asm/event.h>
@@ -232,8 +233,7 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
if ( is_64bit_domain(d) && is_thumb )
return PSCI_INVALID_ADDRESS;
- /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
- if ( is_hardware_domain(d) )
+ if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
return PSCI_NOT_SUPPORTED;
/* Ensure that all CPUs other than the calling one are offline */
@@ -266,6 +266,9 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
"SYSTEM_SUSPEND requested, epoint=%#"PRIregister", cid=%#"PRIregister"\n",
epoint, cid);
+ if ( is_control_domain(d) )
+ host_system_suspend(d);
+
return rc;
}
@@ -290,7 +293,10 @@ static int32_t do_psci_1_0_features(uint32_t psci_func_id)
return 0;
case PSCI_1_0_FN32_SYSTEM_SUSPEND:
case PSCI_1_0_FN64_SYSTEM_SUSPEND:
- return is_hardware_domain(current->domain) ? PSCI_NOT_SUPPORTED : 0;
+ if ( IS_ENABLED(CONFIG_SYSTEM_SUSPEND) ||
+ !is_hardware_domain(current->domain) )
+ return 0;
+ fallthrough;
default:
return PSCI_NOT_SUPPORTED;
}
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 0a20aa0a12..feb1336f46 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -137,6 +137,9 @@ config HAS_EX_TABLE
config HAS_FAST_MULTIPLY
bool
+config HAS_HWDOM_SYSTEM_SUSPEND
+ bool
+
config HAS_IOPORTS
bool
diff --git a/xen/common/domain.c b/xen/common/domain.c
index bb9e210c28..d3edfb2a13 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1375,6 +1375,11 @@ void __domain_crash(struct domain *d)
domain_shutdown(d, SHUTDOWN_crash);
}
+static inline bool want_hwdom_shutdown(uint8_t reason)
+{
+ return !IS_ENABLED(CONFIG_HAS_HWDOM_SYSTEM_SUSPEND) ||
+ reason != SHUTDOWN_suspend;
+}
int domain_shutdown(struct domain *d, u8 reason)
{
@@ -1391,7 +1396,7 @@ int domain_shutdown(struct domain *d, u8 reason)
d->shutdown_code = reason;
reason = d->shutdown_code;
- if ( is_hardware_domain(d) )
+ if ( is_hardware_domain(d) && want_hwdom_shutdown(reason) )
hwdom_shutdown(reason);
if ( d->is_shutting_down )
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 22d306d0cb..45f29ef8ec 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
xfree(xen_domain);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+static int arm_smmu_suspend(void)
+{
+ return -ENOSYS;
+}
+#endif
+
static const struct iommu_ops arm_smmu_iommu_ops = {
.page_sizes = PAGE_SIZE_4K,
.init = arm_smmu_iommu_domain_init,
@@ -2960,6 +2967,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
.map_page = arm_iommu_map_page,
.unmap_page = arm_iommu_unmap_page,
.dt_xlate = arm_smmu_dt_xlate_generic,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = arm_smmu_suspend,
+#endif
};
static struct arm_smmu_device *find_smmu(const struct device *dev)
--
2.43.0
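The comment in system_suspend() above describes a "returns twice" pattern. prepare_resume_ctx() returns non-zero when the context is saved. After a successful PSCI SYSTEM_SUSPEND, the resume path (hyp_resume) lands back at the same point with a return value of zero. The following is only a rough user-space analogy of that control flow, built on the standard C setjmp()/longjmp() pair. Xen uses its own prepare_resume_ctx()/hyp_resume mechanism rather than setjmp(), and every name below (fake_system_suspend(), fake_firmware_suspend(), the counters) is hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-in for cpu_context: the saved "return again" point. */
static jmp_buf fake_cpu_context;

/* Counters used only to observe which paths run, and how often. */
static int suspend_branch_entries;
static int resume_path_entries;

/*
 * Models a successful PSCI SYSTEM_SUSPEND: the call never returns to its
 * caller; execution continues at the saved context instead, so the save
 * point "returns" a second time.
 */
static void fake_firmware_suspend(void)
{
    longjmp(fake_cpu_context, 1);
}

static void fake_system_suspend(void)
{
    /*
     * setjmp() returns 0 when the context is first saved and non-zero when
     * longjmp() comes back. Negating it mirrors prepare_resume_ctx():
     * non-zero on the suspend path, zero on the resume path.
     */
    if ( !setjmp(fake_cpu_context) )
    {
        suspend_branch_entries++;
        fake_firmware_suspend();  /* does not return normally */
    }

    /* Runs once, on the resume path only. */
    resume_path_entries++;
}
```

A single call to fake_system_suspend() enters the suspend branch exactly once and then runs the code after the if block exactly once. This is why the real function does not re-enter the branch on resume. If the "firmware" call returned instead (a PSCI failure in the real code), execution would simply fall through to the same resume sequence.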
* Re: [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain
2026-04-02 10:45 ` [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
@ 2026-04-02 11:00 ` Jan Beulich
2026-04-29 8:05 ` Luca Fancellu
1 sibling, 0 replies; 66+ messages in thread
From: Jan Beulich @ 2026-04-02 11:00 UTC (permalink / raw)
To: Mykola Kvach
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Roger Pau Monné, Rahul Singh, xen-devel
On 02.04.2026 12:45, Mykola Kvach wrote:
> +/* Xen suspend. data identifies the domain that initiated suspend. */
> +static void system_suspend(void *data)
> +{
> + int status;
> + unsigned long flags;
> + struct domain *d = (struct domain *)data;
> +
> + BUG_ON(system_state != SYS_STATE_active);
> +
> + system_state = SYS_STATE_suspend;
> +
> + printk("Xen suspending...\n");
> +
> + freeze_domains();
> + scheduler_disable();
> +
> + status = can_system_suspend();
> + if ( status )
> + {
> + system_state = SYS_STATE_resume;
> + goto resume_scheduler;
> + }
> +
> + /*
> + * Non-boot CPUs have to be disabled on suspend and enabled on resume
> + * (hotplug-based mechanism). Disabling non-boot CPUs will lead to PSCI
> + * CPU_OFF to be called by each non-boot CPU. Depending on the underlying
> + * platform capabilities, this may lead to the physical powering down of
> + * CPUs.
> + */
> + status = disable_nonboot_cpus();
> + if ( status )
> + {
> + system_state = SYS_STATE_resume;
> + goto resume_nonboot_cpus;
> + }
> +
> + time_suspend();
> +
> + status = iommu_suspend();
> + if ( status )
> + {
> + system_state = SYS_STATE_resume;
> + goto resume_time;
> + }
So you've frozen the system just to get ...
> --- a/xen/drivers/passthrough/arm/smmu.c
> +++ b/xen/drivers/passthrough/arm/smmu.c
> @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
> xfree(xen_domain);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +static int arm_smmu_suspend(void)
> +{
> + return -ENOSYS;
> +}
> +#endif
... unconditional failure from here?
Also, ENOSYS is clearly inappropriate to use here. Please use EOPNOTSUPP, or something
even more distinguishable (if this can't be dropped altogether).
Jan
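The review above makes two points. First, a driver whose suspend hook can only fail should be detected before domains are frozen. Second, "not supported" should be reported with a more specific code than -ENOSYS. The sketch below illustrates both ideas. It assumes a hypothetical, trimmed-down ops structure and hypothetical helpers (iommu_can_suspend_sketch(), iommu_suspend_sketch()) that do not exist in Xen. It is not the approach the series actually takes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down view of an IOMMU ops table. */
struct iommu_ops_sketch {
    int (*suspend)(void);
    void (*resume)(void);
};

/*
 * Hypothetical pre-check, meant to run before freeze_domains(): a driver
 * without suspend/resume hooks is rejected up front, before the system
 * has been quiesced.
 */
static bool iommu_can_suspend_sketch(const struct iommu_ops_sketch *ops)
{
    return ops != NULL && ops->suspend != NULL && ops->resume != NULL;
}

static int iommu_suspend_sketch(const struct iommu_ops_sketch *ops)
{
    if ( !iommu_can_suspend_sketch(ops) )
        return -EOPNOTSUPP;  /* "not implemented by this driver" */

    return ops->suspend();   /* any other error is a genuine failure */
}

/* Example driver with working hooks. */
static int ok_suspend(void) { return 0; }
static void ok_resume(void) { }

static const struct iommu_ops_sketch supported_ops = {
    .suspend = ok_suspend,
    .resume = ok_resume,
};

/* Example driver that leaves the hooks unset instead of returning -ENOSYS. */
static const struct iommu_ops_sketch unsupported_ops = {
    .suspend = NULL,
    .resume = NULL,
};
```

With this split, the pre-check can fail early with -EOPNOTSUPP, before any CPU is offlined. Any error returned later by a real suspend hook then unambiguously means a runtime failure.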
* PING: Re: [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (12 preceding siblings ...)
2026-04-02 10:45 ` [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
@ 2026-04-16 12:51 ` Mykola Kvach
2026-04-16 12:52 ` Mykola Kvach
14 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-04-16 12:51 UTC (permalink / raw)
To: Xen-devel
Hi all,
A gentle ping on this series. It's been a couple of weeks since v8 was posted.
Please let me know if there are any further comments or if anything else is
needed from my side to move this forward.
Thanks!
Best regards,
Mykola
On Thu, Apr 2, 2026 at 1:47 PM Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> This is part 2 of version 8 of the ARM Xen system suspend/resume patch
> series, based on earlier work by Mirela Simonovic and Mykyta Poturai.
>
> The first part is in mainline.
>
> NOTE: Most of the code is guarded by CONFIG_SYSTEM_SUSPEND, which can
> currently only be selected when UNSUPPORTED is set, and thus the
> functionality is neither enabled by default nor even built.
>
> This version is ported to Xen master and includes extensive improvements
> based on reviewer feedback. The patch series restructures code to improve
> robustness, maintainability, and implements system Suspend-to-RAM support
> on ARM64 hardware/control domains.
>
> Key updates in this series:
> - Introduced architecture-specific suspend/resume infrastructure
> - Integrated GICv2/GICv3 suspend and resume, including memory-backed context
> save/restore with error handling
> - Added time and IRQ suspend/resume hooks, ensuring correct timer/interrupt
> state across suspend cycles
> - Implemented proper PSCI SYSTEM_SUSPEND invocation and version checks
> - Improved state management and recovery in error cases during suspend/resume
> - Added support for IPMMU-VMSA/SMMUv3 context save/restore
> - Added support for GICv3 eSPI registers context save/restore
> - Added support for ITS registers context save/restore
> ---
>
> TODOs:
> - Enable "xl suspend" support on ARM
> - Add suspend/resume CI test for ARM (QEMU if feasible)
> - PCI suspend ?
> ---
>
> Detailed changelogs can be found in each patch.
>
> Changes in v8:
> - Rebased to latest master and refreshed the series accordingly.
> - Added a new GICv3 patch to tolerate retained redistributor LPI state
> across CPU_OFF/CPU_ON.
> - GICv2 suspend now disables the CPU interface and distributor before
> saving state.
> - GICv3 suspend/resume fixes the redistributor base used for LPI state.
> - ITS and SMMUv3 suspend/resume paths were tightened, with safer
> restore/rollback handling and stricter fatal-error handling.
> - System suspend now checks that all domains are already in
> SHUTDOWN_suspend before proceeding, and renames the hardware-domain
> suspend capability/helper for clearer semantics.
> - Fixed alignment/cleanup issues in the low-level suspend/resume code.
>
> Changes in v7:
> - Timer helper renamed/clarified; virtual/hyper/phys handling documented.
> - GICv2 uses one context block; restore saved CTLR; panic on alloc failure.
> - GICv3/eSPI/ITS always suspend/resume; restore LPI/eSPI; rdist timeout.
> - IPMMU suspend context allocated before PCI setup.
> - System suspend: control domain drives host suspend.
> - Dropped v6 IRQ descriptor restore patches; use setup_irq and re-register
> local IRQs on resume instead.
>
> For earlier changelogs, please refer to the previous cover letters.
>
> Mirela Simonovic (6):
> xen/arm: Add suspend and resume timer helpers
> xen/arm: gic-v2: Implement GIC suspend/resume functions
> xen/arm: Resume memory management on Xen resume
> xen/arm: Save/restore context on suspend/resume
> xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface)
> xen/arm: Add support for system suspend triggered by hardware domain
>
> Mykola Kvach (6):
> xen/arm: gic-v3: tolerate retained redistributor LPI state across
> CPU_OFF
> xen/arm: gic-v3: Implement GICv3 suspend/resume functions
> xen/arm: gic-v3: add ITS suspend/resume support
> xen/arm: tee: keep init_tee_secondary() for hotplug and resume
> xen/arm: ffa: fix notification SRI across CPU hotplug/suspend
> arm/smmu-v3: add suspend/resume handlers
>
> Oleksandr Tyshchenko (1):
> iommu/ipmmu-vmsa: Implement suspend/resume callbacks
>
> xen/arch/arm/Kconfig | 2 +
> xen/arch/arm/Makefile | 1 +
> xen/arch/arm/arm64/head.S | 112 ++++++++
> xen/arch/arm/gic-v2.c | 132 +++++++++
> xen/arch/arm/gic-v3-its.c | 126 +++++++-
> xen/arch/arm/gic-v3-lpi.c | 80 +++++-
> xen/arch/arm/gic-v3.c | 349 ++++++++++++++++++++++-
> xen/arch/arm/gic.c | 29 ++
> xen/arch/arm/include/asm/gic.h | 12 +
> xen/arch/arm/include/asm/gic_v3_defs.h | 1 +
> xen/arch/arm/include/asm/gic_v3_its.h | 24 ++
> xen/arch/arm/include/asm/mm.h | 2 +
> xen/arch/arm/include/asm/psci.h | 1 +
> xen/arch/arm/include/asm/suspend.h | 31 ++
> xen/arch/arm/include/asm/time.h | 5 +
> xen/arch/arm/mmu/smpboot.c | 2 +-
> xen/arch/arm/psci.c | 23 +-
> xen/arch/arm/suspend.c | 195 +++++++++++++
> xen/arch/arm/tee/ffa_notif.c | 63 +++-
> xen/arch/arm/tee/tee.c | 2 +-
> xen/arch/arm/time.c | 44 ++-
> xen/arch/arm/vpsci.c | 12 +-
> xen/common/Kconfig | 3 +
> xen/common/domain.c | 7 +-
> xen/drivers/passthrough/arm/ipmmu-vmsa.c | 305 +++++++++++++++++++-
> xen/drivers/passthrough/arm/smmu-v3.c | 172 ++++++++---
> xen/drivers/passthrough/arm/smmu.c | 10 +
> xen/include/xen/list.h | 14 +
> 28 files changed, 1670 insertions(+), 89 deletions(-)
> create mode 100644 xen/arch/arm/suspend.c
>
> --
> 2.43.0
* PING: Re: [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (13 preceding siblings ...)
2026-04-16 12:51 ` PING: Re: [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
@ 2026-04-16 12:52 ` Mykola Kvach
14 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-04-16 12:52 UTC (permalink / raw)
To: Xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Jan Beulich, Roger Pau Monné, Jens Wiklander, Rahul Singh
Hi all,
A gentle ping on this series. It's been a couple of weeks since v8 was posted.
Please let me know if there are any further comments or if anything else is
needed from my side to move this forward.
Thanks!
Best regards,
Mykola
* Re: [PATCH v8 01/13] xen/arm: Add suspend and resume timer helpers
2026-04-02 10:45 ` [PATCH v8 01/13] xen/arm: Add suspend and resume timer helpers Mykola Kvach
@ 2026-04-20 15:22 ` Luca Fancellu
0 siblings, 0 replies; 66+ messages in thread
From: Luca Fancellu @ 2026-04-20 15:22 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Julien Grall
Hi Mykola,
> On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> Timer interrupts must be disabled while the system is suspended to prevent
> spurious wake-ups. Suspending timers in Xen consists of disabling the
> physical timer and the hypervisor timer on the current CPU. The virtual
> timer does not need explicit handling here, as it is already disabled on
> vCPU context switch and its state is restored per-vCPU on the next context
> restore.
>
> Resuming consists of raising TIMER_SOFTIRQ, which prompts the generic
> timer code to reprogram the hypervisor timer with the correct timeout.
>
> Xen does not use or expose the physical timer, so it remains disabled
> across suspend/resume.
>
> Introduce a new helper, disable_phys_hyp_timers(), to encapsulate disabling
> of the physical and hypervisor timers.
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> Acked-by: Julien Grall <jgrall@amazon.com>
> ---
>
The changes look OK to me.
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Cheers,
Luca
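The commit message quoted above describes two operations. Timer suspend disables the physical timer and the hypervisor timer on the current CPU. Timer resume only raises TIMER_SOFTIRQ, so that the generic timer code reprograms the hypervisor timer. The sketch below models this sequence. It assumes plain variables stand in for the CNTP_CTL_EL0 and CNTHP_CTL_EL2 system registers, and a flag stands in for raising the softirq. All names here are hypothetical, and this is not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CNTx_CTL.ENABLE is bit 0 in the Arm generic timer control registers. */
#define CNT_CTL_ENABLE  (1u << 0)

/* Simulated per-CPU timer control registers (hypothetical). */
static uint32_t fake_cntp_ctl;   /* stands in for CNTP_CTL_EL0 */
static uint32_t fake_cnthp_ctl;  /* stands in for CNTHP_CTL_EL2 */
static bool timer_softirq_pending;

/* Same idea as disable_phys_hyp_timers(): clear both timer controls. */
static void disable_phys_hyp_timers_sketch(void)
{
    fake_cntp_ctl = 0;
    fake_cnthp_ctl = 0;
}

static void time_suspend_sketch(void)
{
    /* The virtual timer is left alone: it is handled on vCPU switch. */
    disable_phys_hyp_timers_sketch();
}

static void time_resume_sketch(void)
{
    /*
     * Only the softirq is raised; the generic timer code later reprograms
     * the hypervisor timer with the next deadline. Xen does not use the
     * physical timer, so it stays disabled.
     */
    timer_softirq_pending = true;
}
```

Resume intentionally does not touch the timer registers itself. Deferring to the softirq means the next deadline is computed by the same code path that programs the timer during normal operation.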
* Re: [PATCH v8 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions
2026-04-02 10:45 ` [PATCH v8 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions Mykola Kvach
@ 2026-04-21 13:24 ` Luca Fancellu
2026-05-07 7:48 ` Mykola Kvach
0 siblings, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-04-21 13:24 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Mykola,
>
> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> index b23e72a3d0..dbff470962 100644
> --- a/xen/arch/arm/gic-v2.c
> +++ b/xen/arch/arm/gic-v2.c
> @@ -1098,6 +1098,129 @@ static int gicv2_iomem_deny_access(struct domain *d)
> return iomem_deny_access(d, mfn, mfn + nr);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +/* This struct represents block of 32 IRQs */
> +struct irq_block {
> + uint32_t icfgr[2]; /* 2 registers of 16 IRQs each */
> + uint32_t ipriorityr[8];
> + uint32_t isenabler;
> + uint32_t isactiver;
> + uint32_t itargetsr[8];
> +};
> +
> +/* GICv2 registers to be saved/restored on system suspend/resume */
> +struct gicv2_context {
> + /* GICC context */
> + struct cpu_ctx {
> + uint32_t ctlr;
> + uint32_t pmr;
> + uint32_t bpr;
> + } cpu;
> +
> + /* GICD context */
> + struct dist_ctx {
> + uint32_t ctlr;
> + /* Includes banked SGI/PPI state for the boot CPU. */
> + struct irq_block *irqs;
> + } dist;
> +};
> +
> +static struct gicv2_context gic_ctx;
> +
> +static int gicv2_suspend(void)
> +{
> + unsigned int i, blocks = DIV_ROUND_UP(gicv2_info.nr_lines, 32);
> +
> + /* Save GICC_CTLR configuration. */
> + gic_ctx.cpu.ctlr = readl_gicc(GICC_CTLR);
> +
> + /* Quiesce the GIC CPU interface before suspend. */
> + gicv2_cpu_disable();
> +
> + /* Save GICD configuration */
> + gic_ctx.dist.ctlr = readl_gicd(GICD_CTLR);
> + writel_gicd(0, GICD_CTLR);
> +
> + gic_ctx.cpu.pmr = readl_gicc(GICC_PMR);
> + gic_ctx.cpu.bpr = readl_gicc(GICC_BPR);
> +
> + for ( i = 0; i < blocks; i++ )
> + {
> + struct irq_block *irqs = gic_ctx.dist.irqs + i;
> + size_t j, off = i * sizeof(irqs->isenabler);
> +
> + irqs->isenabler = readl_gicd(GICD_ISENABLER + off);
> + irqs->isactiver = readl_gicd(GICD_ISACTIVER + off);
> +
> + off = i * sizeof(irqs->ipriorityr);
> + for ( j = 0; j < ARRAY_SIZE(irqs->ipriorityr); j++ )
> + {
> + irqs->ipriorityr[j] = readl_gicd(GICD_IPRIORITYR + off + j * 4);
> + irqs->itargetsr[j] = readl_gicd(GICD_ITARGETSR + off + j * 4);
regarding GICD_ITARGETSR ...
> + }
> +
> + off = i * sizeof(irqs->icfgr);
> + for ( j = 0; j < ARRAY_SIZE(irqs->icfgr); j++ )
> + irqs->icfgr[j] = readl_gicd(GICD_ICFGR + off + j * 4);
> + }
> +
> + return 0;
> +}
> +
> +static void gicv2_resume(void)
> +{
> + unsigned int i, blocks = DIV_ROUND_UP(gicv2_info.nr_lines, 32);
> +
> + gicv2_cpu_disable();
> + /* Disable distributor */
> + writel_gicd(0, GICD_CTLR);
> +
> + for ( i = 0; i < blocks; i++ )
> + {
> + struct irq_block *irqs = gic_ctx.dist.irqs + i;
> + size_t j, off = i * sizeof(irqs->isenabler);
> +
> + writel_gicd(GENMASK(31, 0), GICD_ICENABLER + off);
> + writel_gicd(irqs->isenabler, GICD_ISENABLER + off);
> +
> + writel_gicd(GENMASK(31, 0), GICD_ICACTIVER + off);
> + writel_gicd(irqs->isactiver, GICD_ISACTIVER + off);
> +
> + off = i * sizeof(irqs->ipriorityr);
> + for ( j = 0; j < ARRAY_SIZE(irqs->ipriorityr); j++ )
> + {
> + writel_gicd(irqs->ipriorityr[j], GICD_IPRIORITYR + off + j * 4);
> + writel_gicd(irqs->itargetsr[j], GICD_ITARGETSR + off + j * 4);
… please let me know if I am reading this loop correctly: here GICD_ITARGETSR0 … 7
are restored when i=0, but the specification says that these registers are read-only on
a multiprocessor implementation, so we should skip restoring them.
Saving them could also be skipped, because each field returns a value that corresponds
only to the processor reading the register.
4.3.12 User constraints [1]
> + }
> +
> + off = i * sizeof(irqs->icfgr);
> + for ( j = 0; j < ARRAY_SIZE(irqs->icfgr); j++ )
> + writel_gicd(irqs->icfgr[j], GICD_ICFGR + off + j * 4);
> + }
> +
> + /* Make sure all registers are restored and enable distributor */
> + writel_gicd(gic_ctx.dist.ctlr, GICD_CTLR);
> +
> + /* Restore GIC CPU interface configuration */
> + writel_gicc(gic_ctx.cpu.pmr, GICC_PMR);
> + writel_gicc(gic_ctx.cpu.bpr, GICC_BPR);
> +
> + /* Enable GIC CPU interface */
> + writel_gicc(gic_ctx.cpu.ctlr, GICC_CTLR);
> +}
> +
I also see that we don’t save the pending SGI state (GICD_CPENDSGIRn/GICD_SPENDSGIRn) or the Active Priorities register
state (GICC_APRn, plus GICC_NSAPRn if the Security Extensions are implemented), as described in [1], “4.5 Preserving and restoring GIC state”.
Was this intentional?
[1] https://developer.arm.com/documentation/ihi0048/bb/?lang=en
Cheers,
Luca
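The GICD_ITARGETSR point above follows from the register indexing in the quoted loop. Each GICD_ITARGETSRn holds four 8-bit CPU target fields, so block i=0 (registers 0 to 7) covers interrupts 0 to 31, the SGIs and PPIs. The review quotes the GICv2 specification as making these read-only on a multiprocessor implementation. The sketch below restores only the SPI target registers. It assumes a plain array stands in for the memory-mapped register file, and the helper and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each GICD_ITARGETSRn holds four 8-bit CPU target fields. */
#define IRQS_PER_ITARGETSR      4u
/* Interrupts 0-31 (SGIs and PPIs) use GICD_ITARGETSR0..7. */
#define NR_PRIVATE_ITARGETSR    (32u / IRQS_PER_ITARGETSR)

/*
 * Hypothetical helper: copy the saved target registers back into a
 * simulated register file, skipping GICD_ITARGETSR0..7, which are
 * read-only on multiprocessor implementations. Returns the number of
 * registers written.
 */
static size_t restore_itargetsr_sketch(volatile uint32_t *regs,
                                       const uint32_t *saved,
                                       size_t nr_regs)
{
    size_t n, written = 0;

    for ( n = NR_PRIVATE_ITARGETSR; n < nr_regs; n++ )
    {
        regs[n] = saved[n];
        written++;
    }

    return written;
}
```

For example, with 64 interrupt lines there are 16 target registers, and only the 8 covering SPIs are written. The save side could skip the same range, since reads of those registers only reflect the reading CPU.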
* Re: [PATCH v8 03/13] xen/arm: gic-v3: tolerate retained redistributor LPI state across CPU_OFF
2026-04-02 10:45 ` [PATCH v8 03/13] xen/arm: gic-v3: tolerate retained redistributor LPI state across CPU_OFF Mykola Kvach
@ 2026-04-22 15:55 ` Luca Fancellu
2026-05-05 6:06 ` Mykola Kvach
0 siblings, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-04-22 15:55 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Mykola,
> +
> +static int gicv3_lpi_disable_lpis(void __iomem *rdist_base)
> +{
> + uint32_t reg = readl_relaxed(rdist_base + GICR_CTLR);
> + int ret;
> +
> + if ( !(reg & GICR_CTLR_ENABLE_LPIS) )
> + return 0;
> +
> + writel_relaxed(reg & ~GICR_CTLR_ENABLE_LPIS, rdist_base + GICR_CTLR);
> +
> + /*
> + * The spec only guarantees programmability when we have observed the bit
> + * cleared. Where clearing is supported, RWP must reach 0 before touching
> + * PROPBASER/PENDBASER again.
> + */
> + wmb();
> +
> + ret = gicv3_do_wait_for_rwp(rdist_base);
I’m looking at the implementation of gicv3_do_wait_for_rwp() and I see
that it polls bit 31 (which is UWP in GICR_CTLR) instead of bit 3 (RWP) when used on a redistributor?
This is not related to this patch, but I feel we need to raise it.
> + if ( ret )
> + return ret;
> +
> + reg = readl_relaxed(rdist_base + GICR_CTLR);
> + if ( reg & GICR_CTLR_ENABLE_LPIS )
> + return -EBUSY;
> +
> + return 0;
> +}
> +
> /*
> * Tell a redistributor about the (shared) property table, allocating one
> * if not already done.
> @@ -373,7 +434,21 @@ int gicv3_lpi_init_rdist(void __iomem * rdist_base)
> /* Make sure LPIs are disabled before setting up the tables. */
> reg = readl_relaxed(rdist_base + GICR_CTLR);
> if ( reg & GICR_CTLR_ENABLE_LPIS )
> - return -EBUSY;
> + {
> + if ( gicv3_lpi_tables_match(rdist_base) )
> + return -EBUSY;
> +
> + ret = gicv3_lpi_disable_lpis(rdist_base);
> + if ( ret == -EBUSY )
> + {
> + printk(XENLOG_ERR
> + "GICv3: CPU%d: LPIs still enabled with unexpected redistributor tables\n",
> + smp_processor_id());
> + return -EINVAL;
> + }
> + if ( ret )
> + return ret;
> + }
>
> ret = gicv3_lpi_set_pendtable(rdist_base);
> if ( ret )
> diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> index bc07f97c16..34fb065afc 100644
> --- a/xen/arch/arm/gic-v3.c
> +++ b/xen/arch/arm/gic-v3.c
> @@ -274,8 +274,8 @@ static void gicv3_enable_sre(void)
> isb();
> }
>
> -/* Wait for completion of a distributor change */
> -static void gicv3_do_wait_for_rwp(void __iomem *base)
> +/* Wait for completion of a distributor/redistributor write-pending change. */
> +int gicv3_do_wait_for_rwp(void __iomem *base)
> {
> uint32_t val;
> bool timeout = false;
> @@ -295,17 +295,22 @@ static void gicv3_do_wait_for_rwp(void __iomem *base)
> } while ( 1 );
>
> if ( timeout )
> + {
> dprintk(XENLOG_ERR, "RWP timeout\n");
> + return -ETIMEDOUT;
> + }
> +
> + return 0;
> }
>
> static void gicv3_dist_wait_for_rwp(void)
> {
> - gicv3_do_wait_for_rwp(GICD);
> + (void)gicv3_do_wait_for_rwp(GICD);
> }
>
> static void gicv3_redist_wait_for_rwp(void)
> {
> - gicv3_do_wait_for_rwp(GICD_RDIST_BASE);
> + (void)gicv3_do_wait_for_rwp(GICD_RDIST_BASE);
> }
>
> static void gicv3_wait_for_rwp(int irq)
> @@ -925,7 +930,7 @@ static int __init gicv3_populate_rdist(void)
> gicv3_set_redist_address(rdist_addr, procnum);
>
> ret = gicv3_lpi_init_rdist(ptr);
> - if ( ret && ret != -ENODEV )
> + if ( ret && ret != -ENODEV && ret != -EBUSY )
> {
> printk("GICv3: CPU%d: Cannot initialize LPIs: %u\n",
Shouldn't this be the other way around, i.e. %u for smp_processor_id() and %d for ret?
> smp_processor_id(), ret);
> diff --git a/xen/arch/arm/include/asm/gic_v3_its.h b/xen/arch/arm/include/asm/gic_v3_its.h
> index fc5a84892c..081bd19180 100644
> --- a/xen/arch/arm/include/asm/gic_v3_its.h
> +++ b/xen/arch/arm/include/asm/gic_v3_its.h
Why this header and not gic.h?
> @@ -133,6 +133,7 @@ struct host_its {
>
> /* Map a collection for this host CPU to each host ITS. */
> int gicv3_its_setup_collection(unsigned int cpu);
> +int gicv3_do_wait_for_rwp(void __iomem *base);
>
> #ifdef CONFIG_HAS_ITS
>
>
The rest looks ok to me!
Cheers,
Luca
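On the RWP question raised above: in the GICv3 architecture, the Register Write Pending bit is bit 31 of GICD_CTLR but bit 3 of GICR_CTLR, where bit 31 is UWP instead. The sketch below shows a wait helper that takes the RWP mask from the caller. It assumes a simulated register, and it uses a bounded poll count in place of the time-based timeout in Xen's gicv3_do_wait_for_rwp(). The helper and macro names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define GICD_CTLR_RWP_SKETCH  (1u << 31)  /* GICD_CTLR.RWP */
#define GICR_CTLR_RWP_SKETCH  (1u << 3)   /* GICR_CTLR.RWP (bit 31 is UWP) */

/*
 * Hypothetical helper: poll a CTLR register until the given RWP bit clears.
 * A bounded poll count replaces a real time-based timeout so the sketch
 * stays self-contained.
 */
static int wait_for_rwp_sketch(const volatile uint32_t *ctlr,
                               uint32_t rwp_mask, unsigned int max_polls)
{
    unsigned int i;

    for ( i = 0; i < max_polls; i++ )
    {
        if ( !(*ctlr & rwp_mask) )
            return 0;
    }

    return -ETIMEDOUT;
}
```

Suppose a redistributor has UWP (bit 31) set but RWP (bit 3) clear. Polling with the redistributor mask succeeds immediately, while polling bit 31 times out. That is the kind of mismatch the review points at.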
* Re: [PATCH v8 04/13] xen/arm: gic-v3: Implement GICv3 suspend/resume functions
2026-04-02 10:45 ` [PATCH v8 04/13] xen/arm: gic-v3: Implement GICv3 suspend/resume functions Mykola Kvach
@ 2026-04-23 11:28 ` Luca Fancellu
2026-05-05 7:26 ` Mykola Kvach
0 siblings, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-04-23 11:28 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Mykola,
> diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> index 34fb065afc..d182a71478 100644
> --- a/xen/arch/arm/gic-v3.c
> +++ b/xen/arch/arm/gic-v3.c
> @@ -1072,12 +1072,12 @@ out:
> return res;
> }
>
> -static void gicv3_hyp_disable(void)
> +static void gicv3_hyp_enable(bool enable)
> {
> register_t hcr;
>
> hcr = READ_SYSREG(ICH_HCR_EL2);
> - hcr &= ~GICH_HCR_EN;
> + hcr = enable ? (hcr | GICH_HCR_EN) : (hcr & ~GICH_HCR_EN);
> WRITE_SYSREG(hcr, ICH_HCR_EL2);
> isb();
> }
> @@ -1184,7 +1184,7 @@ static void gicv3_disable_interface(void)
> spin_lock(&gicv3.lock);
>
> gicv3_cpu_disable();
> - gicv3_hyp_disable();
> + gicv3_hyp_enable(false);
>
> spin_unlock(&gicv3.lock);
> }
> @@ -1920,6 +1920,313 @@ static bool gic_dist_supports_lpis(void)
> return (readl_relaxed(GICD + GICD_TYPER) & GICD_TYPE_LPIS);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +/* This struct represent block of 32 IRQs */
NIT: s/represent/represents
> +struct dist_irq_block {
> + uint32_t icfgr[2];
> + uint32_t ipriorityr[8];
> + uint64_t irouter[32];
> + uint32_t isactiver;
> + uint32_t isenabler;
> +};
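To double-check the per-block arithmetic used by gicv3_store_spi_irq_block() and gicv3_restore_spi_irq_block(), here is a standalone sketch (not Xen code). It copies the struct layout above and uses the distributor register offsets from the GICv3 architecture specification (GICD_ISENABLER 0x0100, GICD_ISACTIVER 0x0300, GICD_IPRIORITYR 0x0400, GICD_ICFGR 0x0C00, GICD_IROUTER 0x6000); the helper names are hypothetical. Block i covers IRQs 32*i..32*i+31, so `base + i * sizeof(member)` should land on the register that holds IRQ 32*i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Distributor register base offsets (GICv3 architecture specification). */
#define GICD_ISENABLER   0x0100u
#define GICD_ISACTIVER   0x0300u
#define GICD_IPRIORITYR  0x0400u
#define GICD_ICFGR       0x0C00u
#define GICD_IROUTER     0x6000u

/* Same layout as the patch: saved state for one block of 32 IRQs. */
struct dist_irq_block {
    uint32_t icfgr[2];      /* 2 bits per IRQ: 64 bits = 2 registers  */
    uint32_t ipriorityr[8]; /* 8 bits per IRQ: 256 bits = 8 registers */
    uint64_t irouter[32];   /* one 64-bit GICD_IROUTER<n> per IRQ     */
    uint32_t isactiver;     /* 1 bit per IRQ: 1 register              */
    uint32_t isenabler;     /* 1 bit per IRQ: 1 register              */
};

/* sizeof() does not evaluate its operand, so the null pointer is never read. */
#define BLOCK_MEMBER_SIZE(m) sizeof(((struct dist_irq_block *)0)->m)

/* Offsets computed the way the patch does: base + i * sizeof(member). */
static size_t icfgr_off(unsigned int i)
{ return GICD_ICFGR + i * BLOCK_MEMBER_SIZE(icfgr); }
static size_t ipriorityr_off(unsigned int i)
{ return GICD_IPRIORITYR + i * BLOCK_MEMBER_SIZE(ipriorityr); }
static size_t irouter_off(unsigned int i)
{ return GICD_IROUTER + i * BLOCK_MEMBER_SIZE(irouter); }
static size_t isenabler_off(unsigned int i)
{ return GICD_ISENABLER + i * BLOCK_MEMBER_SIZE(isenabler); }

/* Offsets of the register holding a given IRQ, from each register's width. */
static size_t icfgr_off_for_irq(unsigned int irq)
{ return GICD_ICFGR + (irq / 16) * 4; }      /* 16 IRQs per register */
static size_t ipriorityr_off_for_irq(unsigned int irq)
{ return GICD_IPRIORITYR + (irq / 4) * 4; }  /* 4 IRQs per register  */
static size_t irouter_off_for_irq(unsigned int irq)
{ return GICD_IROUTER + irq * 8; }           /* 1 IRQ per register   */
static size_t isenabler_off_for_irq(unsigned int irq)
{ return GICD_ISENABLER + (irq / 32) * 4; }  /* 32 IRQs per register */
```

The same `i * sizeof(member)` pattern applies to GICD_ISACTIVER, which has the same width as GICD_ISENABLER.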
> +
> +struct redist_ctx {
> + uint32_t ctlr;
> + uint32_t icfgr; /* only PPIs stored */
> + uint32_t igroupr;
I think Xen also writes GICD_IGROUPR<n>E, but we are not saving it, so after a reset
GICD_IGROUPR<n>E would contain its reset value, which is zero.
Alternatively, we could re-initialise it the same way Xen does (all 1s).
> + uint32_t ipriorityr[8];
> + uint32_t isactiver;
> + uint32_t isenabler;
> +
> + uint64_t pendbase;
> + uint64_t propbase;
> +};
> +
> +/* GICv3 registers to be saved/restored on system suspend/resume */
> +struct gicv3_ctx {
> + struct dist_ctx {
> + uint32_t ctlr;
> + struct dist_irq_block *irqs, *espi_irqs;
NIT: I would declare them on separate lines rather than on the same line, but this is probably
a matter of taste, so I will leave the decision to the maintainers.
> + } dist;
> +
> + /* have only one rdist structure for last running CPU during suspend */
> + struct redist_ctx rdist;
> +
> + struct cpu_ctx {
> + uint32_t ctlr;
> + uint32_t pmr;
> + uint32_t bpr;
> + uint32_t sre_el2;
> + uint32_t grpen;
> + } cpu;
> +};
> +
> +static struct gicv3_ctx gicv3_ctx;
> +
> +static void __init gicv3_alloc_context(void)
> +{
> + uint32_t blocks = DIV_ROUND_UP(gicv3_info.nr_lines, 32);
> +
> + /* The spec allows for systems without any SPIs */
> + if ( blocks > 1 )
> + {
> + gicv3_ctx.dist.irqs = xzalloc_array(struct dist_irq_block, blocks - 1);
> + if ( !gicv3_ctx.dist.irqs )
> + panic("Failed to allocate memory for GICv3 suspend context\n");
> + }
> +
> +#ifdef CONFIG_GICV3_ESPI
> + if ( !gic_number_espis() )
> + return;
> +
> + blocks = gic_number_espis() / 32;
> + gicv3_ctx.dist.espi_irqs = xzalloc_array(struct dist_irq_block, blocks);
> + if ( !gicv3_ctx.dist.espi_irqs )
> + panic("Failed to allocate memory for GICv3 eSPI suspend context\n");
> +#endif
> +}
> +
> +static int gicv3_disable_redist(void)
> +{
> + void __iomem *waker = GICD_RDIST_BASE + GICR_WAKER;
> + s_time_t deadline;
> +
> + /*
> + * Avoid infinite loop if Non-secure does not have access to GICR_WAKER.
> + * See Arm IHI 0069H.b, 12.11.42 GICR_WAKER:
> + * When GICD_CTLR.DS == 0 and an access is Non-secure accesses to this
> + * register are RAZ/WI.
> + */
> + if ( !(readl_relaxed(GICD + GICD_CTLR) & GICD_CTLR_DS) )
> + return 0;
> +
> + deadline = NOW() + MILLISECS(1000);
> +
> + writel_relaxed(readl_relaxed(waker) | GICR_WAKER_ProcessorSleep, waker);
> + while ( (readl_relaxed(waker) & GICR_WAKER_ChildrenAsleep) == 0 )
> + {
> + if ( NOW() > deadline )
> + {
> + printk("GICv3: Timeout waiting for redistributor to sleep\n");
> + return -ETIMEDOUT;
> + }
> + cpu_relax();
> + udelay(10);
> + }
> +
> + return 0;
> +}
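A standalone model of the handshake in gicv3_disable_redist() (not Xen code; `struct fake_rdist` and a poll budget are hypothetical stand-ins for the MMIO register and the NOW()-based deadline, and the GICD_CTLR.DS early return is not modelled). GICR_WAKER.ProcessorSleep is bit 1 and ChildrenAsleep is bit 2: software sets the former and polls the latter, giving up with -ETIMEDOUT if the redistributor never reports that it has quiesced.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define GICR_WAKER_ProcessorSleep  (1u << 1)
#define GICR_WAKER_ChildrenAsleep  (1u << 2)

/* Hypothetical redistributor: reports ChildrenAsleep after `delay` reads. */
struct fake_rdist {
    uint32_t waker;
    unsigned int delay;
};

static uint32_t fake_read_waker(struct fake_rdist *r)
{
    if ( r->waker & GICR_WAKER_ProcessorSleep )
    {
        if ( r->delay == 0 )
            r->waker |= GICR_WAKER_ChildrenAsleep;
        else
            r->delay--;
    }
    return r->waker;
}

/* Bounded poll: max_polls plays the role of the 1s deadline. */
static int fake_disable_redist(struct fake_rdist *r, unsigned int max_polls)
{
    unsigned int polls = 0;

    r->waker |= GICR_WAKER_ProcessorSleep;
    while ( !(fake_read_waker(r) & GICR_WAKER_ChildrenAsleep) )
    {
        if ( ++polls > max_polls )
            return -ETIMEDOUT;
    }
    return 0;
}
```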
> +
> +#define GET_SPI_REG_OFFSET(name, is_espi) \
> + ((is_espi) ? GICD_##name##nE : GICD_##name)
> +
> +static void gicv3_store_spi_irq_block(struct dist_irq_block *irqs,
> + unsigned int i, bool is_espi)
> +{
> + void __iomem *base;
> + unsigned int irq;
> +
> + base = GICD + GET_SPI_REG_OFFSET(ICFGR, is_espi) + i * sizeof(irqs->icfgr);
> + irqs->icfgr[0] = readl_relaxed(base);
> + irqs->icfgr[1] = readl_relaxed(base + 4);
> +
> + base = GICD + GET_SPI_REG_OFFSET(IPRIORITYR, is_espi);
> + base += i * sizeof(irqs->ipriorityr);
> + for ( irq = 0; irq < ARRAY_SIZE(irqs->ipriorityr); irq++ )
> + irqs->ipriorityr[irq] = readl_relaxed(base + 4 * irq);
> +
> + base = GICD + GET_SPI_REG_OFFSET(IROUTER, is_espi);
> + base += i * sizeof(irqs->irouter);
> + for ( irq = 0; irq < ARRAY_SIZE(irqs->irouter); irq++ )
> + irqs->irouter[irq] = readq_relaxed_non_atomic(base + 8 * irq);
> +
> + base = GICD + GET_SPI_REG_OFFSET(ISACTIVER, is_espi);
> + base += i * sizeof(irqs->isactiver);
> + irqs->isactiver = readl_relaxed(base);
> +
> + base = GICD + GET_SPI_REG_OFFSET(ISENABLER, is_espi);
> + base += i * sizeof(irqs->isenabler);
> + irqs->isenabler = readl_relaxed(base);
> +}
> +
> +static void gicv3_restore_spi_irq_block(struct dist_irq_block *irqs,
> + unsigned int i, bool is_espi)
> +{
> + void __iomem *base;
> + unsigned int irq;
> +
> + base = GICD + GET_SPI_REG_OFFSET(ICFGR, is_espi) + i * sizeof(irqs->icfgr);
> + writel_relaxed(irqs->icfgr[0], base);
> + writel_relaxed(irqs->icfgr[1], base + 4);
> +
> + base = GICD + GET_SPI_REG_OFFSET(IPRIORITYR, is_espi);
> + base += i * sizeof(irqs->ipriorityr);
> + for ( irq = 0; irq < ARRAY_SIZE(irqs->ipriorityr); irq++ )
> + writel_relaxed(irqs->ipriorityr[irq], base + 4 * irq);
> +
> + base = GICD + GET_SPI_REG_OFFSET(IROUTER, is_espi);
> + base += i * sizeof(irqs->irouter);
> + for ( irq = 0; irq < ARRAY_SIZE(irqs->irouter); irq++ )
> + writeq_relaxed_non_atomic(irqs->irouter[irq], base + 8 * irq);
[1] 12.9.22 GICD_IROUTER<n> says: "These registers are used only when affinity routing is enabled.
When affinity routing is not enabled: These registers are RES0. An implementation is permitted to make
the register RAZ/WI in this case."
So I think these need to be written after we set GICD_CTLR (at least its ARE_NS bit); otherwise anything
written there is lost and the routing configuration won't be restored.
> +
> + base = GICD + GET_SPI_REG_OFFSET(ICENABLER, is_espi) + i * 4;
> + writel_relaxed(GENMASK(31, 0), base);
> +
> + base = GICD + GET_SPI_REG_OFFSET(ISENABLER, is_espi);
> + base += i * sizeof(irqs->isenabler);
> + writel_relaxed(irqs->isenabler, base);
> +
> + base = GICD + GET_SPI_REG_OFFSET(ICACTIVER, is_espi) + i * 4;
> + writel_relaxed(GENMASK(31, 0), base);
> +
> + base = GICD + GET_SPI_REG_OFFSET(ISACTIVER, is_espi);
> + base += i * sizeof(irqs->isactiver);
> + writel_relaxed(irqs->isactiver, base);
> +}
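A standalone sketch of the ordering issue (not Xen code): the fake distributor below ignores GICD_IROUTER<n> writes while GICD_CTLR.ARE_NS is clear, which is one behaviour the spec text quoted above permits (RAZ/WI). The model and both restore helpers are hypothetical; a real implementation would also wait for GICD_CTLR.RWP (gicv3_dist_wait_for_rwp()) after each GICD_CTLR write. Setting ARE_NS first, with the group enables still clear, lets the saved routes stick.

```c
#include <assert.h>
#include <stdint.h>

#define GICD_CTLR_ARE_NS      (1u << 4)
#define GICD_CTLR_ENABLE_G1A  (1u << 1)
#define GICD_CTLR_ENABLE_G1   (1u << 0)
#define NR_ROUTED_IRQS        4      /* tiny model, not a real IRQ count */

/* Hypothetical distributor model: IROUTER is RAZ/WI while ARE_NS == 0. */
struct fake_gicd {
    uint32_t ctlr;
    uint64_t irouter[NR_ROUTED_IRQS];
};

static void gicd_write_irouter(struct fake_gicd *d, unsigned int n, uint64_t v)
{
    if ( d->ctlr & GICD_CTLR_ARE_NS )
        d->irouter[n] = v;
    /* else: write ignored, as permitted when affinity routing is off */
}

/* Order used in the patch: routes first, GICD_CTLR last. */
static void restore_routes_first(struct fake_gicd *d, uint32_t saved_ctlr,
                                 const uint64_t *saved)
{
    unsigned int n;

    d->ctlr = 0;
    for ( n = 0; n < NR_ROUTED_IRQS; n++ )
        gicd_write_irouter(d, n, saved[n]);
    d->ctlr = saved_ctlr;
}

/* Suggested order: ARE_NS only (groups off), then routes, then full CTLR. */
static void restore_are_first(struct fake_gicd *d, uint32_t saved_ctlr,
                              const uint64_t *saved)
{
    unsigned int n;

    d->ctlr = 0;
    d->ctlr = saved_ctlr & GICD_CTLR_ARE_NS;
    for ( n = 0; n < NR_ROUTED_IRQS; n++ )
        gicd_write_irouter(d, n, saved[n]);
    d->ctlr = saved_ctlr;
}
```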
> +
> +static int gicv3_suspend(void)
> +{
> + unsigned int i;
> + void __iomem *base;
> + int ret;
> + struct redist_ctx *rdist = &gicv3_ctx.rdist;
> +
> + /* Save GICC configuration */
> + gicv3_ctx.cpu.ctlr = READ_SYSREG(ICC_CTLR_EL1);
> + gicv3_ctx.cpu.pmr = READ_SYSREG(ICC_PMR_EL1);
> + gicv3_ctx.cpu.bpr = READ_SYSREG(ICC_BPR1_EL1);
> + gicv3_ctx.cpu.sre_el2 = READ_SYSREG(ICC_SRE_EL2);
> + gicv3_ctx.cpu.grpen = READ_SYSREG(ICC_IGRPEN1_EL1);
> +
> + gicv3_disable_interface();
this one also calls gicv3_cpu_disable(), which zeroes ICC_IGRPEN1_EL1 ...
> +
> + ret = gicv3_disable_redist();
> + if ( ret )
> + goto out_enable_iface;
… but when we fail here ...
> +
> + /* Save GICR configuration */
> + gicv3_redist_wait_for_rwp();
> +
> + base = GICD_RDIST_BASE;
> +
> + rdist->ctlr = readl_relaxed(base + GICR_CTLR);
> +
> + rdist->propbase = readq_relaxed(base + GICR_PROPBASER);
> + rdist->pendbase = readq_relaxed(base + GICR_PENDBASER);
> +
> + base = GICD_RDIST_SGI_BASE;
> +
> + /* Save priority on PPI and SGI interrupts */
> + for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
> + rdist->ipriorityr[i] = readl_relaxed(base + GICR_IPRIORITYR0 + 4 * i);
> +
> + rdist->isactiver = readl_relaxed(base + GICR_ISACTIVER0);
> + rdist->isenabler = readl_relaxed(base + GICR_ISENABLER0);
> + rdist->igroupr = readl_relaxed(base + GICR_IGROUPR0);
> + rdist->icfgr = readl_relaxed(base + GICR_ICFGR1);
> +
> + /* Save GICD configuration */
> + gicv3_dist_wait_for_rwp();
> + gicv3_ctx.dist.ctlr = readl_relaxed(GICD + GICD_CTLR);
> +
> + for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
> + gicv3_store_spi_irq_block(gicv3_ctx.dist.irqs + i - 1, i, false);
> +
> +#ifdef CONFIG_GICV3_ESPI
> + for ( i = 0; i < gic_number_espis() / 32; i++ )
> + gicv3_store_spi_irq_block(gicv3_ctx.dist.espi_irqs + i, i, true);
> +#endif
> +
> + return 0;
> +
> + out_enable_iface:
> + gicv3_hyp_enable(true);
> + WRITE_SYSREG(gicv3_ctx.cpu.ctlr, ICC_CTLR_EL1);
we don’t recover ICC_IGRPEN1_EL1
> + isb();
> +
> + return ret;
> +}
> +
> +static void gicv3_resume(void)
> +{
> + int ret;
> + unsigned int i;
> + void __iomem *base;
> + struct redist_ctx *rdist = &gicv3_ctx.rdist;
> +
> + writel_relaxed(0, GICD + GICD_CTLR);
> +
> + for ( i = NR_GIC_LOCAL_IRQS; i < gicv3_info.nr_lines; i += 32 )
> + writel_relaxed(GENMASK(31, 0), GICD + GICD_IGROUPR + (i / 32) * 4);
> +
> + for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
> + gicv3_restore_spi_irq_block(gicv3_ctx.dist.irqs + i - 1, i, false);
> +
> +#ifdef CONFIG_GICV3_ESPI
> + for ( i = 0; i < gic_number_espis() / 32; i++ )
> + gicv3_restore_spi_irq_block(gicv3_ctx.dist.espi_irqs + i, i, true);
> +#endif
> +
> + writel_relaxed(gicv3_ctx.dist.ctlr, GICD + GICD_CTLR);
> + gicv3_dist_wait_for_rwp();
> +
> + ret = gicv3_lpi_init_rdist(GICD_RDIST_BASE);
> + /*
> + * If LPIs are already enabled, assume firmware or the still-powered
> + * redistributor has valid PROPBASER/PENDBASER and skip reprogramming.
> + * Return -EBUSY so callers can ignore this case.
> + */
> + if ( ret && ret != -ENODEV && ret != -EBUSY )
> + panic("GICv3: Failed to re-initialize LPIs during resume\n");
> + else if ( ret == -EBUSY ) /* extra checks, just to be sure */
> + {
> + base = GICD_RDIST_BASE;
> + if ( readq_relaxed(base + GICR_PROPBASER) != rdist->propbase ||
> + readq_relaxed(base + GICR_PENDBASER) != rdist->pendbase )
> + {
> + panic("GICv3: LPIs already enabled with unexpected PROPBASER/PENDBASER during resume\n");
> + }
> + }
> +
> + /* Restore GICR (Redistributor) configuration */
> + if ( gicv3_enable_redist() )
> + panic("GICv3: Failed to re-enable redistributor during resume\n");
> +
> + base = GICD_RDIST_SGI_BASE;
> +
> + writel_relaxed(GENMASK(31, 0), base + GICR_ICENABLER0);
> + gicv3_redist_wait_for_rwp();
> +
> + for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
> + writel_relaxed(rdist->ipriorityr[i], base + GICR_IPRIORITYR0 + i * 4);
> +
> + writel_relaxed(rdist->isactiver, base + GICR_ISACTIVER0);
> + writel_relaxed(rdist->igroupr, base + GICR_IGROUPR0);
> + writel_relaxed(rdist->icfgr, base + GICR_ICFGR1);
> +
> + gicv3_redist_wait_for_rwp();
> +
> + writel_relaxed(rdist->isenabler, base + GICR_ISENABLER0);
> + writel_relaxed(rdist->ctlr, GICD_RDIST_BASE + GICR_CTLR);
> +
> + gicv3_redist_wait_for_rwp();
> +
> + WRITE_SYSREG(gicv3_ctx.cpu.sre_el2, ICC_SRE_EL2);
> + isb();
> +
> + /* Restore CPU interface (System registers) */
> + WRITE_SYSREG(gicv3_ctx.cpu.pmr, ICC_PMR_EL1);
> + WRITE_SYSREG(gicv3_ctx.cpu.bpr, ICC_BPR1_EL1);
> + WRITE_SYSREG(gicv3_ctx.cpu.ctlr, ICC_CTLR_EL1);
> + WRITE_SYSREG(gicv3_ctx.cpu.grpen, ICC_IGRPEN1_EL1);
> + isb();
> +
> + gicv3_hyp_init();
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> /* Set up the GIC */
> static int __init gicv3_init(void)
> {
> @@ -1994,6 +2301,10 @@ static int __init gicv3_init(void)
>
> gicv3_hyp_init();
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + gicv3_alloc_context();
> +#endif
> +
> out:
> spin_unlock(&gicv3.lock);
>
> @@ -2033,6 +2344,10 @@ static const struct gic_hw_operations gicv3_ops = {
> #endif
> .iomem_deny_access = gicv3_iomem_deny_access,
> .do_LPI = gicv3_do_LPI,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = gicv3_suspend,
> + .resume = gicv3_resume,
> +#endif
> };
>
> static int __init gicv3_dt_preinit(struct dt_device_node *node, const void *data)
> diff --git a/xen/arch/arm/include/asm/gic_v3_defs.h b/xen/arch/arm/include/asm/gic_v3_defs.h
> index c373b94d19..992c8f9c2f 100644
> --- a/xen/arch/arm/include/asm/gic_v3_defs.h
> +++ b/xen/arch/arm/include/asm/gic_v3_defs.h
> @@ -94,6 +94,7 @@
> #define GICD_TYPE_LPIS (1U << 17)
>
> #define GICD_CTLR_RWP (1UL << 31)
> +#define GICD_CTLR_DS (1U << 6)
> #define GICD_CTLR_ARE_NS (1U << 4)
> #define GICD_CTLR_ENABLE_G1A (1U << 1)
> #define GICD_CTLR_ENABLE_G1 (1U << 0)
>
Cheers,
Luca
* Re: [PATCH v8 05/13] xen/arm: gic-v3: add ITS suspend/resume support
2026-04-02 10:45 ` [PATCH v8 05/13] xen/arm: gic-v3: add ITS suspend/resume support Mykola Kvach
@ 2026-04-24 10:53 ` Luca Fancellu
2026-05-05 10:09 ` Mykola Kvach
0 siblings, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-04-24 10:53 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné
Hi Mykola,
> On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> Handle system suspend/resume for GICv3 with an ITS present so LPIs keep
> working after firmware powers the GIC down. Snapshot the CPU interface,
> distributor and last-CPU redistributor state, disable the ITS to cache its
> CTLR/CBASER/BASER registers, then restore everything and re-arm the
> collection on resume.
>
> Add list_for_each_entry_continue_reverse() in list.h for the ITS suspend
> error path that needs to roll back partially saved state.
>
> Based on Linux commit dba0bc7b76dc ("irqchip/gic-v3-its: Add ability to save/restore ITS state")
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in V8:
> - Reword the CBASER/CWRITER comment to match Xen and drop the stale Linux
> cmd_write reference.
> - Clarify the list_for_each_entry_continue_reverse() comment.
> - Factor out per-ITS helpers for collection setup and resume.
> - Restore each ITS and re-establish its collection mapping in the same
> loop, so a failed ITS resume is not followed by MAPC/SYNC on that
> un-restored instance.
> - panic in case when resume of an ITS failed
> - cleanup baser cache during suspend
> ---
> xen/arch/arm/gic-v3-its.c | 126 ++++++++++++++++++++++++--
> xen/arch/arm/gic-v3.c | 15 ++-
> xen/arch/arm/include/asm/gic_v3_its.h | 23 +++++
> xen/include/xen/list.h | 14 +++
> 4 files changed, 166 insertions(+), 12 deletions(-)
>
> diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
> index 9ba068c46f..fe2865eac9 100644
> --- a/xen/arch/arm/gic-v3-its.c
> +++ b/xen/arch/arm/gic-v3-its.c
> @@ -335,6 +335,22 @@ static int its_send_cmd_inv(struct host_its *its,
> return its_send_command(its, cmd);
> }
>
> +static int gicv3_its_setup_collection_single(struct host_its *its,
> + unsigned int cpu)
> +{
> + int ret;
> +
> + ret = its_send_cmd_mapc(its, cpu, cpu);
> + if ( ret )
> + return ret;
> +
> + ret = its_send_cmd_sync(its, cpu);
> + if ( ret )
> + return ret;
> +
> + return gicv3_its_wait_commands(its);
> +}
> +
> /* Set up the (1:1) collection mapping for the given host CPU. */
> int gicv3_its_setup_collection(unsigned int cpu)
> {
> @@ -343,15 +359,7 @@ int gicv3_its_setup_collection(unsigned int cpu)
>
> list_for_each_entry(its, &host_its_list, entry)
> {
> - ret = its_send_cmd_mapc(its, cpu, cpu);
> - if ( ret )
> - return ret;
> -
> - ret = its_send_cmd_sync(its, cpu);
> - if ( ret )
> - return ret;
> -
> - ret = gicv3_its_wait_commands(its);
> + ret = gicv3_its_setup_collection_single(its, cpu);
> if ( ret )
> return ret;
> }
> @@ -1209,6 +1217,106 @@ int gicv3_its_init(void)
> return 0;
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +int gicv3_its_suspend(void)
> +{
> + struct host_its *its;
> + int ret;
> +
> + list_for_each_entry(its, &host_its_list, entry)
NIT: coding style, spaces are needed after the opening and before the closing parenthesis
> + {
> + unsigned int i;
> + void __iomem *base = its->its_base;
> +
> + its->suspend_ctx.ctlr = readl_relaxed(base + GITS_CTLR);
> + ret = gicv3_disable_its(its);
This is called from system_suspend(); along that path iommu_suspend() and
console_suspend() are called before we finally reach gic_suspend() and this function.
IHI 0069H.b, 5.6.2 "Disabling an ITS", says:
“Ensure that all interrupts that target the ITS that is being powered down are
either redirected or disabled”. Is it correct to assume that all sources targeting the ITS
are disabled at this point because the domains should already be suspended?
> + if ( ret )
> + {
> + writel_relaxed(its->suspend_ctx.ctlr, base + GITS_CTLR);
Here and in the other places where we write GITS_CTLR: this register has Quiescent as a
read-only bit, so maybe we should mask the write to the writable bits only?
> + goto err;
> + }
> +
> + its->suspend_ctx.cbaser = readq_relaxed(base + GITS_CBASER);
> +
> + for (i = 0; i < GITS_BASER_NR_REGS; i++)
NIT: coding style, spaces are needed inside the parentheses
> + {
> + uint64_t baser = readq_relaxed(base + GITS_BASER0 + i * 8);
> +
> + its->suspend_ctx.baser[i] = 0;
> +
> + if ( !(baser & GITS_VALID_BIT) )
> + continue;
> +
> + its->suspend_ctx.baser[i] = baser;
> + }
> + }
> +
> + return 0;
> +
> + err:
> + list_for_each_entry_continue_reverse(its, &host_its_list, entry)
> + writel_relaxed(its->suspend_ctx.ctlr, its->its_base + GITS_CTLR);
> +
> + return ret;
> +}
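A standalone model of the rollback in gicv3_its_suspend() combined with the masking suggested above (not Xen code; `struct fake_its`, its_suspend_all() and the -EBUSY failure code are hypothetical). The failing ITS gets its GITS_CTLR written back first, then the earlier entries in reverse order, which is the order list_for_each_entry_continue_reverse() provides. Every write-back drops the read-only Quiescent bit (bit 31); GITS_CTLR.Enabled is bit 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define GITS_CTLR_ENABLE     (1u << 0)   /* Enabled, read/write */
#define GITS_CTLR_QUIESCENT  (1u << 31)  /* Quiescent, read-only */

/* Hypothetical stand-in for one struct host_its. */
struct fake_its {
    uint32_t ctlr;        /* value a read of GITS_CTLR returns */
    uint32_t saved_ctlr;  /* models suspend_ctx.ctlr */
    uint32_t last_write;  /* last value software wrote to GITS_CTLR */
    int fail_disable;     /* nonzero: model a failing gicv3_disable_its() */
};

static unsigned int restore_log[8];
static unsigned int restore_count;

/* Write back the saved GITS_CTLR with the read-only Quiescent bit masked. */
static void its_restore_ctlr(struct fake_its *its, unsigned int idx)
{
    assert(restore_count < 8);
    its->last_write = its->saved_ctlr & ~GITS_CTLR_QUIESCENT;
    restore_log[restore_count++] = idx;
}

static int its_suspend_all(struct fake_its *its, unsigned int n)
{
    unsigned int i;

    for ( i = 0; i < n; i++ )
    {
        its[i].saved_ctlr = its[i].ctlr;
        if ( its[i].fail_disable )
        {
            its_restore_ctlr(&its[i], i);   /* the failing ITS itself */
            goto err;
        }
        /* Disable: clear Enabled, never write Quiescent. */
        its[i].last_write = its[i].ctlr & ~GITS_CTLR_ENABLE & ~GITS_CTLR_QUIESCENT;
        its[i].ctlr = GITS_CTLR_QUIESCENT;  /* now disabled and quiescent */
    }
    return 0;

 err:
    /* Earlier entries, newest first, like ..._continue_reverse(). */
    while ( i-- > 0 )
        its_restore_ctlr(&its[i], i);
    return -EBUSY;
}
```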
> +
> +static int gicv3_its_resume_single(struct host_its *its, unsigned int cpu)
> +{
> + void __iomem *base = its->its_base;
> + unsigned int i;
> + int ret;
> +
> + /*
> + * Make sure that the ITS is disabled. If it fails to quiesce,
> + * don't restore it since writing to CBASER or BASER<n>
> + * registers is undefined according to the GIC v3 ITS
> + * Specification.
> + */
> + WARN_ON(readl_relaxed(base + GITS_CTLR) & GITS_CTLR_ENABLE);
> + ret = gicv3_disable_its(its);
> + if ( ret )
> + return ret;
> +
> + writeq_relaxed(its->suspend_ctx.cbaser, base + GITS_CBASER);
> +
> + /*
> + * Writing CBASER resets CREADR to 0, so reset CWRITER to
> + * keep the command queue pointers aligned.
> + */
> + writeq_relaxed(0, base + GITS_CWRITER);
> +
> + /* Restore GITS_BASER from the value cache. */
> + for ( i = 0; i < GITS_BASER_NR_REGS; i++ )
> + {
> + uint64_t baser = its->suspend_ctx.baser[i];
> +
> + if ( !(baser & GITS_VALID_BIT) )
> + continue;
> +
> + writeq_relaxed(baser, base + GITS_BASER0 + i * 8);
> + }
> +
> + writel_relaxed(its->suspend_ctx.ctlr, base + GITS_CTLR);
> +
> + return gicv3_its_setup_collection_single(its, cpu);
> +}
> +
> +void gicv3_its_resume(void)
> +{
> + struct host_its *its;
> + unsigned int cpu = smp_processor_id();
> + int ret;
> +
> + list_for_each_entry(its, &host_its_list, entry)
> + {
> + ret = gicv3_its_resume_single(its, cpu);
> + if ( ret )
> + panic("GICv3: ITS@%"PRIpaddr": failed to restore during resume: %d\n",
> + its->addr, ret);
> + }
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
>
> /*
> * Local variables:
> diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> index d182a71478..ef8318dd50 100644
> --- a/xen/arch/arm/gic-v3.c
> +++ b/xen/arch/arm/gic-v3.c
> @@ -862,7 +862,7 @@ static bool gicv3_enable_lpis(void)
> return true;
> }
>
> -static int __init gicv3_populate_rdist(void)
> +static int gicv3_populate_rdist(void)
> {
> int i;
> uint32_t aff;
> @@ -932,7 +932,7 @@ static int __init gicv3_populate_rdist(void)
> ret = gicv3_lpi_init_rdist(ptr);
> if ( ret && ret != -ENODEV && ret != -EBUSY )
> {
> - printk("GICv3: CPU%d: Cannot initialize LPIs: %u\n",
> + printk("GICv3: CPU%d: Cannot initialize LPIs: %d\n",
This fixes a mistake introduced by an earlier patch in this series.
Cheers,
Luca
* Re: [PATCH v8 06/13] xen/arm: tee: keep init_tee_secondary() for hotplug and resume
2026-04-02 10:45 ` [PATCH v8 06/13] xen/arm: tee: keep init_tee_secondary() for hotplug and resume Mykola Kvach
@ 2026-04-24 10:59 ` Luca Fancellu
2026-04-27 8:19 ` Bertrand Marquis
2026-05-07 22:26 ` Volodymyr Babchuk
2 siblings, 0 replies; 66+ messages in thread
From: Luca Fancellu @ 2026-04-24 10:59 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Volodymyr Babchuk,
Bertrand Marquis, Jens Wiklander, Stefano Stabellini,
Julien Grall, Michal Orzel
Hi Mykola,
> On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> init_tee_secondary() was marked __init and freed after boot. Calling it
> from the CPU hotplug/resume path then executed discarded code, which
> could crash Xen. Drop __init so the TEE mediator secondary init can run
> safely on hotplugged and resumed CPUs.
>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> xen/arch/arm/tee/tee.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
> index 8501443c8e..00e561fc78 100644
> --- a/xen/arch/arm/tee/tee.c
> +++ b/xen/arch/arm/tee/tee.c
> @@ -128,7 +128,7 @@ static int __init tee_init(void)
>
> presmp_initcall(tee_init);
>
> -void __init init_tee_secondary(void)
> +void init_tee_secondary(void)
> {
> if ( cur_mediator && cur_mediator->ops->init_secondary )
> cur_mediator->ops->init_secondary();
>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Cheers,
Luca
* Re: [PATCH v8 07/13] xen/arm: ffa: fix notification SRI across CPU hotplug/suspend
2026-04-02 10:45 ` [PATCH v8 07/13] xen/arm: ffa: fix notification SRI across CPU hotplug/suspend Mykola Kvach
@ 2026-04-24 12:05 ` Luca Fancellu
2026-04-27 8:20 ` Bertrand Marquis
1 sibling, 0 replies; 66+ messages in thread
From: Luca Fancellu @ 2026-04-24 12:05 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Volodymyr Babchuk,
Bertrand Marquis, Jens Wiklander, Stefano Stabellini,
Julien Grall, Michal Orzel
Hi Mykola,
> On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> The FF-A notification SRI interrupt handler was not correctly tied to
> CPU hotplug and suspend/resume. As a result, CPUs going offline and
> back online could end up with stale or missing handlers, breaking
> delivery of FF-A notifications.
>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
>
Looks ok to me
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Cheers,
Luca
* Re: [PATCH v8 08/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2026-04-02 10:45 ` [PATCH v8 08/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks Mykola Kvach
@ 2026-04-24 13:34 ` Luca Fancellu
2026-05-05 11:45 ` Mykola Kvach
0 siblings, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-04-24 13:34 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Mykola,
> On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> Store and restore active context and micro-TLB registers.
>
> Tested on R-Car H3 Starter Kit.
>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in V7:
> - moved suspend context allocation before pci stuff
> ---
> xen/drivers/passthrough/arm/ipmmu-vmsa.c | 305 ++++++++++++++++++++++-
> 1 file changed, 298 insertions(+), 7 deletions(-)
>
> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> index ea9fa9ddf3..6765bd3083 100644
> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> @@ -71,6 +71,8 @@
> })
> #endif
>
> +#define dev_dbg(dev, fmt, ...) \
> + dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
> #define dev_info(dev, fmt, ...) \
> dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
> #define dev_warn(dev, fmt, ...) \
> @@ -130,6 +132,24 @@ struct ipmmu_features {
> unsigned int imuctr_ttsel_mask;
> };
>
> […]
>
> @@ -1340,10 +1608,11 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
> struct iommu_fwspec *fwspec;
>
> #ifdef CONFIG_HAS_PCI
> + int ret;
> +
> if ( dev_is_pci(dev) )
> {
> struct pci_dev *pdev = dev_to_pci(dev);
> - int ret;
>
> if ( devfn != pdev->devfn )
> return 0;
> @@ -1371,6 +1640,15 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
> /* Let Xen know that the master device is protected by an IOMMU. */
> dt_device_set_protected(dev_to_dt(dev));
> }
> +
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + if ( ipmmu_alloc_ctx_suspend(dev) )
> + {
> + dev_err(dev, "Failed to allocate context for suspend\n");
> + return -ENOMEM;
> + }
> +#endif
If this fails, the device remains marked as protected. I suggest moving this before the
`if ( !dev_is_pci(dev) ) { … }` block.
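A standalone model of the suggested ordering (not Xen code; `struct fake_dev` and the helpers are hypothetical stand-ins for the device, ipmmu_alloc_ctx_suspend() and dt_device_set_protected()). Doing the fallible allocation before marking the device protected means an -ENOMEM return leaves no half-configured state behind.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model: only the two pieces of state that matter here. */
struct fake_dev {
    bool is_protected;     /* dt_device_set_protected() has been called */
    bool has_suspend_ctx;  /* suspend context allocated */
};

static bool alloc_should_fail;

/* Stand-in for ipmmu_alloc_ctx_suspend(). */
static int fake_alloc_ctx_suspend(struct fake_dev *dev)
{
    if ( alloc_should_fail )
        return -ENOMEM;
    dev->has_suspend_ctx = true;
    return 0;
}

/* Suggested order: fallible allocation first, then mark as protected. */
static int fake_add_device(struct fake_dev *dev)
{
    if ( fake_alloc_ctx_suspend(dev) )
        return -ENOMEM;
    dev->is_protected = true;
    return 0;
}
```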
The rest looks ok to me, but I’m not an expert in this part.
Cheers,
Luca
* Re: [PATCH v8 06/13] xen/arm: tee: keep init_tee_secondary() for hotplug and resume
2026-04-02 10:45 ` [PATCH v8 06/13] xen/arm: tee: keep init_tee_secondary() for hotplug and resume Mykola Kvach
2026-04-24 10:59 ` Luca Fancellu
@ 2026-04-27 8:19 ` Bertrand Marquis
2026-05-07 22:26 ` Volodymyr Babchuk
2 siblings, 0 replies; 66+ messages in thread
From: Bertrand Marquis @ 2026-04-27 8:19 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Volodymyr Babchuk,
Jens Wiklander, Stefano Stabellini, Julien Grall, Michal Orzel
Hi Mykola,
> On 2 Apr 2026, at 12:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> init_tee_secondary() was marked __init and freed after boot. Calling it
> from the CPU hotplug/resume path then executed discarded code, which
> could crash Xen. Drop __init so the TEE mediator secondary init can run
> safely on hotplugged and resumed CPUs.
>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Cheers
Bertrand
> ---
> xen/arch/arm/tee/tee.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
> index 8501443c8e..00e561fc78 100644
> --- a/xen/arch/arm/tee/tee.c
> +++ b/xen/arch/arm/tee/tee.c
> @@ -128,7 +128,7 @@ static int __init tee_init(void)
>
> presmp_initcall(tee_init);
>
> -void __init init_tee_secondary(void)
> +void init_tee_secondary(void)
> {
> if ( cur_mediator && cur_mediator->ops->init_secondary )
> cur_mediator->ops->init_secondary();
> --
> 2.43.0
>
* Re: [PATCH v8 07/13] xen/arm: ffa: fix notification SRI across CPU hotplug/suspend
2026-04-02 10:45 ` [PATCH v8 07/13] xen/arm: ffa: fix notification SRI across CPU hotplug/suspend Mykola Kvach
2026-04-24 12:05 ` Luca Fancellu
@ 2026-04-27 8:20 ` Bertrand Marquis
2026-05-05 10:18 ` Mykola Kvach
1 sibling, 1 reply; 66+ messages in thread
From: Bertrand Marquis @ 2026-04-27 8:20 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Volodymyr Babchuk,
Jens Wiklander, Stefano Stabellini, Julien Grall, Michal Orzel
Hi Mykola,
> On 2 Apr 2026, at 12:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> The FF-A notification SRI interrupt handler was not correctly tied to
> CPU hotplug and suspend/resume. As a result, CPUs going offline and
> back online could end up with stale or missing handlers, breaking
> delivery of FF-A notifications.
>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
This will probably need a rebase if the harden notification and VM to VM notification
series in FF-A is merged first.
Anyway, changes look good so:
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Cheers
Bertrand
> ---
> xen/arch/arm/tee/ffa_notif.c | 63 ++++++++++++++++++++++++++++--------
> 1 file changed, 50 insertions(+), 13 deletions(-)
>
> diff --git a/xen/arch/arm/tee/ffa_notif.c b/xen/arch/arm/tee/ffa_notif.c
> index 186e726412..513c399594 100644
> --- a/xen/arch/arm/tee/ffa_notif.c
> +++ b/xen/arch/arm/tee/ffa_notif.c
> @@ -360,10 +360,28 @@ static int32_t ffa_notification_bitmap_destroy(uint16_t vm_id)
> return ffa_simple_call(FFA_NOTIFICATION_BITMAP_DESTROY, vm_id, 0, 0, 0);
> }
>
> -void ffa_notif_init_interrupt(void)
> +static DEFINE_PER_CPU_READ_MOSTLY(struct irqaction, sri_irq);
> +
> +static int request_sri_irq(void)
> {
> int ret;
> + struct irqaction *sri_action = &this_cpu(sri_irq);
> +
> + sri_action->name = "FF-A notif";
> + sri_action->handler = notif_irq_handler;
> + sri_action->dev_id = NULL;
> + sri_action->free_on_release = 0;
> +
> + ret = setup_irq(notif_sri_irq, 0, sri_action);
> + if ( ret )
> + printk(XENLOG_ERR "ffa: setup_irq irq %u failed: error %d\n",
> + notif_sri_irq, ret);
>
> + return ret;
> +}
> +
> +void ffa_notif_init_interrupt(void)
> +{
> if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI )
> {
> /*
> @@ -376,14 +394,36 @@ void ffa_notif_init_interrupt(void)
> * pending, while the SPMC in the secure world will not notice that
> * the interrupt was lost.
> */
> - ret = request_irq(notif_sri_irq, 0, notif_irq_handler, "FF-A notif",
> - NULL);
> - if ( ret )
> - printk(XENLOG_ERR "ffa: request_irq irq %u failed: error %d\n",
> - notif_sri_irq, ret);
> + request_sri_irq();
> }
> }
>
> +static void deinit_ffa_notif_interrupt(void)
> +{
> + if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI )
> + release_irq(notif_sri_irq, NULL);
> +}
> +
> +static int cpu_ffa_notif_callback(struct notifier_block *nfb,
> + unsigned long action,
> + void *hcpu)
> +{
> + switch ( action )
> + {
> + case CPU_DYING:
> + deinit_ffa_notif_interrupt();
> + break;
> + default:
> + break;
> + }
> +
> + return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block cpu_ffa_notif_nfb = {
> + .notifier_call = cpu_ffa_notif_callback,
> +};
> +
> void ffa_notif_init(void)
> {
> const struct arm_smccc_1_2_regs arg = {
> @@ -392,7 +432,6 @@ void ffa_notif_init(void)
> };
> struct arm_smccc_1_2_regs resp;
> unsigned int irq;
> - int ret;
>
> /* Only enable fw notification if all ABIs we need are supported */
> if ( ffa_fw_supports_fid(FFA_NOTIFICATION_BITMAP_CREATE) &&
> @@ -408,13 +447,11 @@ void ffa_notif_init(void)
> notif_sri_irq = irq;
> if ( irq >= NR_GIC_SGI )
> irq_set_type(irq, IRQ_TYPE_EDGE_RISING);
> - ret = request_irq(irq, 0, notif_irq_handler, "FF-A notif", NULL);
> - if ( ret )
> - {
> - printk(XENLOG_ERR "ffa: request_irq irq %u failed: error %d\n",
> - irq, ret);
> +
> + if ( request_sri_irq() )
> return;
> - }
> +
> + register_cpu_notifier(&cpu_ffa_notif_nfb);
> fw_notif_enabled = true;
> }
> }
> --
> 2.43.0
>
* Re: [PATCH v8 09/13] arm/smmu-v3: add suspend/resume handlers
2026-04-02 10:45 ` [PATCH v8 09/13] arm/smmu-v3: add suspend/resume handlers Mykola Kvach
@ 2026-04-27 14:01 ` Luca Fancellu
2026-04-27 14:02 ` Luca Fancellu
1 sibling, 0 replies; 66+ messages in thread
From: Luca Fancellu @ 2026-04-27 14:01 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Bertrand Marquis,
Rahul Singh, Stefano Stabellini, Julien Grall, Michal Orzel,
Volodymyr Babchuk
Hi Mykola,
>
> diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
> index bf153227db..7607ffc9ca 100644
> --- a/xen/drivers/passthrough/arm/smmu-v3.c
> +++ b/xen/drivers/passthrough/arm/smmu-v3.c
> @@ -1814,8 +1814,7 @@ static int arm_smmu_write_reg_sync(struct arm_smmu_device *smmu, u32 val,
> }
>
> /* GBPA is "special" */
> -static int __init arm_smmu_update_gbpa(struct arm_smmu_device *smmu,
> - u32 set, u32 clr)
> +static int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr)
> {
> int ret;
> u32 reg, __iomem *gbpa = smmu->base + ARM_SMMU_GBPA;
> @@ -1995,10 +1994,29 @@ err_free_evtq_irq:
> return ret;
> }
>
> +static int arm_smmu_enable_irqs(struct arm_smmu_device *smmu)
> +{
> + int ret;
> + u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;
> +
> + if ( smmu->features & ARM_SMMU_FEAT_PRI )
> + irqen_flags |= IRQ_CTRL_PRIQ_IRQEN;
> +
> + /* Enable interrupt generation on the SMMU */
> + ret = arm_smmu_write_reg_sync(smmu, irqen_flags,
> + ARM_SMMU_IRQ_CTRL, ARM_SMMU_IRQ_CTRLACK);
> + if ( ret )
> + {
> + dev_warn(smmu->dev, "failed to enable irqs\n");
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> static int __init arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
> {
> int ret, irq;
> - u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;
>
> /* Disable IRQs first */
> ret = arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_IRQ_CTRL,
> @@ -2028,22 +2046,7 @@ static int __init arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
> }
> }
>
> - if (smmu->features & ARM_SMMU_FEAT_PRI)
> - irqen_flags |= IRQ_CTRL_PRIQ_IRQEN;
> -
> - /* Enable interrupt generation on the SMMU */
> - ret = arm_smmu_write_reg_sync(smmu, irqen_flags,
> - ARM_SMMU_IRQ_CTRL, ARM_SMMU_IRQ_CTRLACK);
> - if (ret) {
> - dev_warn(smmu->dev, "failed to enable irqs\n");
> - goto err_free_irqs;
> - }
> -
> return 0;
> -
> -err_free_irqs:
> - arm_smmu_free_irqs(smmu);
> - return ret;
> }
>
> static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
> @@ -2057,7 +2060,7 @@ static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
> return ret;
> }
>
> -static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
> +static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
> {
> int ret;
> u32 reg, enables;
> @@ -2163,17 +2166,9 @@ static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
> }
> }
>
> - ret = arm_smmu_setup_irqs(smmu);
> - if (ret) {
> - dev_err(smmu->dev, "failed to setup irqs\n");
We are moving this one into the probe function and...
> + ret = arm_smmu_enable_irqs(smmu);
> + if ( ret )
replacing it with this one, but arm_smmu_setup_irqs() also calls arm_smmu_setup_unique_irqs(), which
calls arm_smmu_setup_msis(). Are we sure that we will end up in the same state on resume?
> return ret;
> - }
> -
> - /* Initialize tasklets for threaded IRQs*/
> - tasklet_init(&smmu->evtq_irq_tasklet, arm_smmu_evtq_tasklet, smmu);
> - tasklet_init(&smmu->priq_irq_tasklet, arm_smmu_priq_tasklet, smmu);
> - tasklet_init(&smmu->combined_irq_tasklet, arm_smmu_combined_irq_tasklet,
> - smmu);
>
> /* Enable the SMMU interface, or ensure bypass */
> if (disable_bypass) {
> @@ -2181,20 +2176,16 @@ static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
> } else {
> ret = arm_smmu_update_gbpa(smmu, 0, GBPA_ABORT);
> if (ret)
> - goto err_free_irqs;
> + return ret;
> }
> ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
> ARM_SMMU_CR0ACK);
> if (ret) {
> dev_err(smmu->dev, "failed to enable SMMU interface\n");
> - goto err_free_irqs;
> + return ret;
> }
>
> return 0;
> -
> -err_free_irqs:
> - arm_smmu_free_irqs(smmu);
> - return ret;
> }
>
> static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
> @@ -2558,10 +2549,23 @@ static int __init arm_smmu_device_probe(struct platform_device *pdev)
> if (ret)
> goto out_free;
>
> + ret = arm_smmu_setup_irqs(smmu);
> + if ( ret )
> + {
> + dev_err(smmu->dev, "failed to setup irqs\n");
> + goto out_free;
> + }
> +
> + /* Initialize tasklets for threaded IRQs*/
> + tasklet_init(&smmu->evtq_irq_tasklet, arm_smmu_evtq_tasklet, smmu);
> + tasklet_init(&smmu->priq_irq_tasklet, arm_smmu_priq_tasklet, smmu);
> + tasklet_init(&smmu->combined_irq_tasklet, arm_smmu_combined_irq_tasklet,
> + smmu);
> +
> /* Reset the device */
> ret = arm_smmu_device_reset(smmu);
> if (ret)
> - goto out_free;
> + goto out_free_irqs;
>
> /*
> * Keep a list of all probed devices. This will be used to query
> @@ -2575,6 +2579,8 @@ static int __init arm_smmu_device_probe(struct platform_device *pdev)
>
> return 0;
>
> +out_free_irqs:
> + arm_smmu_free_irqs(smmu);
>
> out_free:
> arm_smmu_free_structures(smmu);
> @@ -2855,6 +2861,96 @@ static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
> xfree(xen_domain);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +static void arm_smmu_reset_for_suspend_rollback(struct arm_smmu_device *smmu)
> +{
> + int ret = arm_smmu_device_reset(smmu);
> +
> + if ( ret )
> + dev_err(smmu->dev, "Failed to reset during suspend rollback: %d\n",
> + ret);
> +}
> +
> +static int arm_smmu_suspend(void)
> +{
> + struct arm_smmu_device *smmu;
> + int ret = 0;
> +
> + list_for_each_entry(smmu, &arm_smmu_devices, devices)
> + {
> + bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
> +
> + /* Abort all transactions before disable to avoid spurious bypass */
> + ret = arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> + if ( ret )
> + goto fail;
> +
> + /* Disable the SMMU via CR0.EN and all queues except CMDQ */
> + ret = arm_smmu_write_reg_sync(smmu, CR0_CMDQEN, ARM_SMMU_CR0,
> + ARM_SMMU_CR0ACK);
> + if ( ret )
> + {
> + dev_err(smmu->dev, "Timed-out while disabling smmu\n");
> + goto fail;
> + }
> +
> + /*
> + * At this point the SMMU is completely disabled and won't access
> + * any translation/config structures, even speculative accesses
> + * aren't performed as per the IHI0070 spec (section 6.3.9.6).
> + */
> +
> + /* Wait for the CMDQs to be drained to flush any pending commands */
> + ret = queue_poll_cons(&smmu->cmdq.q, true, wfe);
> + if ( ret )
> + {
> + dev_err(smmu->dev, "Draining queues timed-out\n");
> + goto fail;
> + }
Polling the queue doesn’t guarantee that all prior commands have completed;
I would use arm_smmu_cmdq_issue_sync() instead of the above:
ret = arm_smmu_cmdq_issue_sync(smmu);
if ( ret )
goto fail;
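To illustrate the difference, here is a toy software model. Every name in it is hypothetical, and it is not the SMMUv3 driver or the hardware: it is a simplification that assumes fetching a command and completing it are distinct events. The consumer index advances when a command is fetched, while completion lags behind. Waiting for cons == prod therefore only proves that the commands were fetched. Waiting for a sync marker that completes in order proves that every earlier command has completed.

```c
#include <assert.h>

/*
 * Toy model of a command queue (hypothetical; not the SMMUv3 driver).
 * "cons" advances when a command is fetched; completion is tracked
 * separately and may lag behind.
 */
struct toy_cmdq {
    unsigned int prod;      /* commands written by software */
    unsigned int cons;      /* commands fetched by the model "hardware" */
    unsigned int completed; /* commands whose effects are complete */
};

static void toy_hw_fetch(struct toy_cmdq *q)
{
    if ( q->cons < q->prod )
        q->cons++;
}

static void toy_hw_complete(struct toy_cmdq *q)
{
    if ( q->completed < q->cons )
        q->completed++;
}

/* Like polling the consumer index: returns once all commands are fetched. */
static void toy_poll_cons(struct toy_cmdq *q)
{
    while ( q->cons != q->prod )
        toy_hw_fetch(q);
}

/*
 * Like CMD_SYNC: enqueue a marker and wait for it to complete. Commands
 * complete in order in this model, so the marker completing implies that
 * every earlier command has completed as well.
 */
static void toy_issue_sync(struct toy_cmdq *q)
{
    unsigned int sync_idx = q->prod++;

    while ( q->completed <= sync_idx )
    {
        toy_hw_fetch(q);
        toy_hw_complete(q);
    }

    assert(q->completed == q->prod);
}
```

Starting from three queued commands ({ .prod = 3 }), toy_poll_cons() returns with cons == 3 but completed == 0. Only after toy_issue_sync() does completed equal prod (4, counting the marker).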
> +
> + /* Disable everything */
> + ret = arm_smmu_device_disable(smmu);
> + if ( ret )
> + goto fail;
> +
> + dev_dbg(smmu->dev, "Suspended smmu\n");
> + }
> +
> + return 0;
> +
> + fail:
> + /* Reset the device that failed as well as any already-suspended ones. */
> + arm_smmu_reset_for_suspend_rollback(smmu);
> +
> + list_for_each_entry_continue_reverse(smmu, &arm_smmu_devices, devices)
> + arm_smmu_reset_for_suspend_rollback(smmu);
> +
> + return ret;
> +}
> +
Cheers,
Luca
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 09/13] arm/smmu-v3: add suspend/resume handlers
2026-04-02 10:45 ` [PATCH v8 09/13] arm/smmu-v3: add suspend/resume handlers Mykola Kvach
2026-04-27 14:01 ` Luca Fancellu
@ 2026-04-27 14:02 ` Luca Fancellu
2026-05-05 15:23 ` Mykola Kvach
1 sibling, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-04-27 14:02 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Bertrand Marquis,
Rahul Singh, Stefano Stabellini, Julien Grall, Michal Orzel,
Volodymyr Babchuk
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 10/13] xen/arm: Resume memory management on Xen resume
2026-04-02 10:45 ` [PATCH v8 10/13] xen/arm: Resume memory management on Xen resume Mykola Kvach
@ 2026-04-27 14:50 ` Luca Fancellu
2026-05-05 15:55 ` Mykola Kvach
2026-05-07 22:06 ` Volodymyr Babchuk
1 sibling, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-04-27 14:50 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Mykola,
> On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> The MMU must be enabled during the resume path before restoring context,
> as virtual addresses are used to access the saved context data.
>
> This patch adds MMU setup during resume by reusing the existing
> enable_secondary_cpu_mm function, which enables data cache and the MMU.
I don’t understand where this last part happens in this commit:
> Before the MMU is enabled, the content of TTBR0_EL2 is changed to point
> to init_ttbr (page tables used at runtime).
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in v7:
> - no functional changes, just moved commit
> ---
> xen/arch/arm/arm64/head.S | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 72c7b24498..596e960152 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -561,6 +561,30 @@ END(efi_xen_start)
>
> #endif /* CONFIG_ARM_EFI */
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +FUNC(hyp_resume)
I think we should mask all exceptions here:
msr DAIFSet, 0xf
until we have correctly restored the state (VBAR_EL2, etc.).
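For reference, here is a small C model of what the suggested instruction does. It is a sketch: the helper name is hypothetical, and the bit positions follow the AArch64 DAIF register layout. The 4-bit immediate of MSR DAIFSet maps onto D, A, I and F, so 0xf masks debug exceptions, SError, IRQ and FIQ in one go.

```c
#include <assert.h>
#include <stdint.h>

/* PSTATE mask bits as they appear in the AArch64 DAIF register. */
#define DAIF_F (UINT64_C(1) << 6)  /* FIQ */
#define DAIF_I (UINT64_C(1) << 7)  /* IRQ */
#define DAIF_A (UINT64_C(1) << 8)  /* SError */
#define DAIF_D (UINT64_C(1) << 9)  /* Debug */

/*
 * Model of "msr DAIFSet, #imm": imm bits [3:0] correspond to D, A, I, F,
 * i.e. the immediate lands in DAIF bits [9:6]. Bits are only ever set.
 */
static uint64_t toy_daifset(uint64_t daif, unsigned int imm)
{
    assert(imm <= 0xf);
    return daif | ((uint64_t)imm << 6);
}
```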
> + /* Initialize the UART if earlyprintk has been enabled. */
> +#ifdef CONFIG_EARLY_PRINTK
> + bl init_uart
> +#endif
> + PRINT_ID("- Xen resuming -\r\n")
> +
> + bl check_cpu_mode
> + bl cpu_init
> +
> + ldr x0, =start
> + adr x20, start /* x20 := paddr (start) */
> + sub x20, x20, x0 /* x20 := phys-offset */
> + ldr lr, =mmu_resumed
> + b enable_secondary_cpu_mm
> +
> +mmu_resumed:
> + b .
> +END(hyp_resume)
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> /*
> * Local variables:
> * mode: ASM
>
This is more of a trampoline for resuming the core. I’m not sure whether it would be better to squash this
into the following patch; the maintainers can state their preference.
Cheers,
Luca
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 11/13] xen/arm: Save/restore context on suspend/resume
2026-04-02 10:45 ` [PATCH v8 11/13] xen/arm: Save/restore context on suspend/resume Mykola Kvach
@ 2026-04-27 15:26 ` Luca Fancellu
2026-05-07 22:17 ` Volodymyr Babchuk
2026-05-11 16:00 ` Oleksandr Tyshchenko
2 siblings, 0 replies; 66+ messages in thread
From: Luca Fancellu @ 2026-04-27 15:26 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Mykola,
> On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> The context of CPU general purpose and system control registers must be
> saved on suspend and restored on resume. This is implemented in
> prepare_resume_ctx and before the return from the hyp_resume function.
> The prepare_resume_ctx must be invoked just before the PSCI system suspend
> call is issued to the ATF. The prepare_resume_ctx must return a non-zero
> value so that the calling 'if' statement evaluates to true, causing the
> system suspend to be invoked. Upon resume, the context saved on suspend
> will be restored, including the link register. Therefore, after
> restoring the context, the control flow will return to the address
> pointed to by the saved link register, which is the place from which
> prepare_resume_ctx was called. To ensure that the calling 'if' statement
> does not again evaluate to true and initiate system suspend, hyp_resume
> must return a zero value after restoring the context.
>
> Note that the order of saving register context into cpu_context structure
> must match the order of restoring.
>
> Support for ARM32 is not implemented. Instead, compilation fails with a
> build-time error if suspend is enabled for ARM32.
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
This patch looks OK to me. I’ll give my R-by once the maintainers have given their opinion
about squashing this one with the previous one.
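The quoted commit message describes a return-twice contract: prepare_resume_ctx returns non-zero on the suspend path and zero on the resume path. This is essentially the setjmp()/longjmp() pattern from standard C with the polarity inverted. Below is a minimal userspace sketch that uses longjmp() as a stand-in for the firmware resuming at hyp_resume. All names are hypothetical. The real code also has to restore registers from cpu_context after the CPU has lost its state, which longjmp() cannot model.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-ins for cpu_context and the PSCI call. */
static jmp_buf toy_ctx;
static int toy_suspend_calls;

/* "Suspends" and then "resumes" by jumping back to the saved context. */
static void toy_system_suspend(void)
{
    toy_suspend_calls++;
    longjmp(toy_ctx, 1);
}

/*
 * setjmp() returns 0 on the direct call and non-zero when re-entered via
 * longjmp(); prepare_resume_ctx() uses the opposite polarity (non-zero on
 * suspend, zero on resume), hence the negation here.
 */
static int toy_suspend_resume(void)
{
    if ( !setjmp(toy_ctx) )
        toy_system_suspend();

    /* Reached only on the "resume" path; the branch is not re-entered. */
    return toy_suspend_calls;
}
```

toy_suspend_resume() returns 1. The suspend branch is entered once, and the "resumed" execution falls through without re-entering it.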
Cheers,
Luca
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 12/13] xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface)
2026-04-02 10:45 ` [PATCH v8 12/13] xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface) Mykola Kvach
@ 2026-04-27 16:21 ` Luca Fancellu
2026-05-05 16:15 ` Mykola Kvach
0 siblings, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-04-27 16:21 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Mykola,
> On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> Invoke PSCI SYSTEM_SUSPEND to finalize Xen's suspend sequence on ARM64 platforms.
> Pass the resume entry point (hyp_resume) as the first argument to EL3. The resume
> handler is currently a stub and will be implemented later in assembly. Ignore the
> context ID argument, as is done in Linux.
>
> Only enable this path when CONFIG_SYSTEM_SUSPEND is set and
> PSCI version is >= 1.0.
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in v7:
> - no changes
> ---
> xen/arch/arm/include/asm/psci.h | 1 +
> xen/arch/arm/psci.c | 23 ++++++++++++++++++++++-
> 2 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/include/asm/psci.h b/xen/arch/arm/include/asm/psci.h
> index 48a93e6b79..bb3c73496e 100644
> --- a/xen/arch/arm/include/asm/psci.h
> +++ b/xen/arch/arm/include/asm/psci.h
> @@ -23,6 +23,7 @@ int call_psci_cpu_on(int cpu);
> void call_psci_cpu_off(void);
> void call_psci_system_off(void);
> void call_psci_system_reset(void);
> +int call_psci_system_suspend(void);
>
> /* Range of allocated PSCI function numbers */
> #define PSCI_FNUM_MIN_VALUE _AC(0,U)
> diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
> index b6860a7760..c9d126b195 100644
> --- a/xen/arch/arm/psci.c
> +++ b/xen/arch/arm/psci.c
> @@ -17,17 +17,20 @@
> #include <asm/cpufeature.h>
> #include <asm/psci.h>
> #include <asm/acpi.h>
> +#include <asm/suspend.h>
>
> /*
> * While a 64-bit OS can make calls with SMC32 calling conventions, for
> * some calls it is necessary to use SMC64 to pass or return 64-bit values.
> - * For such calls PSCI_0_2_FN_NATIVE(x) will choose the appropriate
> + * For such calls PSCI_*_FN_NATIVE(x) will choose the appropriate
> * (native-width) function ID.
> */
> #ifdef CONFIG_ARM_64
> #define PSCI_0_2_FN_NATIVE(name) PSCI_0_2_FN64_##name
> +#define PSCI_1_0_FN_NATIVE(name) PSCI_1_0_FN64_##name
> #else
> #define PSCI_0_2_FN_NATIVE(name) PSCI_0_2_FN32_##name
> +#define PSCI_1_0_FN_NATIVE(name) PSCI_1_0_FN32_##name
> #endif
>
> uint32_t psci_ver;
> @@ -60,6 +63,24 @@ void call_psci_cpu_off(void)
> }
> }
>
> +int call_psci_system_suspend(void)
> +{
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + struct arm_smccc_res res;
> +
> + if ( psci_ver < PSCI_VERSION(1, 0) )
> + return PSCI_NOT_SUPPORTED;
> +
> + /* 2nd argument (context ID) is not used */
> + arm_smccc_smc(PSCI_1_0_FN_NATIVE(SYSTEM_SUSPEND), __pa(hyp_resume), &res);
I think Linux passes 0 as the context ID, probably to mark it as unused. I think we should do the
same.
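Below is a minimal sketch of the suggested call shape, with a hypothetical recording stub in place of the real SMC helper. The function IDs and the PSCI_VERSION encoding follow the PSCI specification; everything prefixed toy_ is made up for illustration. The version check gates the call, and the context ID is passed explicitly as 0.

```c
#include <assert.h>
#include <stdint.h>

/* Function IDs from the PSCI specification (SMC32 and SMC64 variants). */
#define PSCI_1_0_FN32_SYSTEM_SUSPEND UINT32_C(0x8400000E)
#define PSCI_1_0_FN64_SYSTEM_SUSPEND UINT32_C(0xC400000E)
#define PSCI_VERSION(maj, min)       (((uint32_t)(maj) << 16) | (min))
#define PSCI_NOT_SUPPORTED           (-1)

/* Hypothetical SMC stub that records its arguments instead of trapping. */
struct toy_smc_args {
    uint32_t fid;
    uint64_t a1, a2;
};
static struct toy_smc_args last_smc;

static int32_t toy_smc(uint32_t fid, uint64_t a1, uint64_t a2)
{
    last_smc = (struct toy_smc_args){ fid, a1, a2 };
    return 0; /* a real successful SYSTEM_SUSPEND would not return */
}

static int32_t toy_psci_system_suspend(uint32_t psci_ver, uint64_t entry_pa)
{
    if ( psci_ver < PSCI_VERSION(1, 0) )
        return PSCI_NOT_SUPPORTED;

    /* Entry point in the 1st argument, explicit 0 as the unused context ID. */
    return toy_smc(PSCI_1_0_FN64_SYSTEM_SUSPEND, entry_pa, 0);
}
```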
> + return PSCI_RET(res);
> +#else
> + dprintk(XENLOG_WARNING,
> + "SYSTEM_SUSPEND not supported (CONFIG_SYSTEM_SUSPEND disabled)\n");
> + return PSCI_NOT_SUPPORTED;
> +#endif
> +}
> +
> void call_psci_system_off(void)
> {
> if ( psci_ver > PSCI_VERSION(0, 1) )
>
Cheers,
Luca
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain
2026-04-02 10:45 ` [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
2026-04-02 11:00 ` Jan Beulich
@ 2026-04-29 8:05 ` Luca Fancellu
2026-05-05 20:34 ` Mykola Kvach
1 sibling, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-04-29 8:05 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné,
Rahul Singh
Hi Mykola,
> diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
> index e38566b0b7..4d1289776b 100644
> --- a/xen/arch/arm/suspend.c
> +++ b/xen/arch/arm/suspend.c
> @@ -1,9 +1,190 @@
> /* SPDX-License-Identifier: GPL-2.0-only */
>
> +#include <asm/psci.h>
> #include <asm/suspend.h>
>
> +#include <public/sched.h>
> +#include <xen/console.h>
> +#include <xen/cpu.h>
> +#include <xen/errno.h>
> +#include <xen/iommu.h>
> +#include <xen/sched.h>
> +#include <xen/tasklet.h>
> +
> struct cpu_context cpu_context = {};
>
> +static int can_system_suspend(void)
> +{
> + int ret = 0;
> + struct domain *d;
> +
> + rcu_read_lock(&domlist_read_lock);
> +
> + for_each_domain ( d )
> + {
> + bool domain_suspended;
> +
> + spin_lock(&d->shutdown_lock);
> + domain_suspended = d->is_shut_down &&
> + d->shutdown_code == SHUTDOWN_suspend;
> + spin_unlock(&d->shutdown_lock);
> +
> + if ( domain_suspended )
> + continue;
> +
> + printk(XENLOG_ERR
> + "System suspend requires all domains to be shut down for suspend (dom%d: isn't in suspend state)\n",
d->domain_id is unsigned if I’m not mistaken (typedef uint16_t domid_t;), so it wants %u.
> + d->domain_id);
> +
> + ret = -EBUSY;
> + break;
> + }
> +
> + rcu_read_unlock(&domlist_read_lock);
> +
> + return ret;
> +}
> +
> +/* Xen suspend. data identifies the domain that initiated suspend. */
> +static void system_suspend(void *data)
> +{
> + int status;
> + unsigned long flags;
> + struct domain *d = (struct domain *)data;
> +
> + BUG_ON(system_state != SYS_STATE_active);
> +
> + system_state = SYS_STATE_suspend;
> +
> + printk("Xen suspending...\n");
> +
> + freeze_domains();
> + scheduler_disable();
> +
> + status = can_system_suspend();
> + if ( status )
> + {
> + system_state = SYS_STATE_resume;
> + goto resume_scheduler;
When we hit an error and take the resume_scheduler path, we restore the guest context
previously saved in do_psci_1_0_system_suspend(). Am I correct in saying that the guest
won’t get any PSCI error back and will instead be resumed from its resume entry point?
If so, should we have a different path that returns a PSCI error (PSCI_*) in the guest’s
x0 and skips the context restore?
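Here is one possible shape for such a path, as a hypothetical sketch only. The struct, the function and the -EBUSY to PSCI_DENIED mapping are assumptions. Returning to the instruction after the SMC is simplified to an explicit caller_return_pc. On success the vCPU restarts at the guest entry point with the context ID in x0. On failure the saved context is not applied, and the PSCI error is returned in x0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PSCI_DENIED           (-3)
#define PSCI_INTERNAL_FAILURE (-6)

/* Hypothetical subset of the vCPU state touched on this path. */
struct toy_vcpu {
    uint64_t x0;
    uint64_t pc;
};

/*
 * host_status is 0 if the host suspend/resume cycle succeeded, or a
 * negative errno value if it was aborted before suspending.
 */
static void toy_finish_system_suspend(struct toy_vcpu *v, int host_status,
                                      uint64_t epoint, uint64_t cid,
                                      uint64_t caller_return_pc)
{
    if ( host_status == 0 )
    {
        /* Resume the guest at its entry point with the context ID in x0. */
        v->pc = epoint;
        v->x0 = cid;
        return;
    }

    /* Skip the context restore: the call simply returns an error code. */
    v->pc = caller_return_pc;
    v->x0 = (uint64_t)(int64_t)(host_status == -EBUSY ? PSCI_DENIED
                                                     : PSCI_INTERNAL_FAILURE);
}
```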
> + }
> +
> + /*
> + * Non-boot CPUs have to be disabled on suspend and enabled on resume
> + * (hotplug-based mechanism). Disabling non-boot CPUs will lead to PSCI
> + * CPU_OFF to be called by each non-boot CPU. Depending on the underlying
> + * platform capabilities, this may lead to the physical powering down of
> + * CPUs.
> + */
> + status = disable_nonboot_cpus();
> + if ( status )
> + {
> + system_state = SYS_STATE_resume;
> + goto resume_nonboot_cpus;
> + }
> +
> + time_suspend();
> +
> + status = iommu_suspend();
> + if ( status )
> + {
> + system_state = SYS_STATE_resume;
> + goto resume_time;
> + }
> +
> + console_start_sync();
> + status = console_suspend();
> + if ( status )
> + {
> + dprintk(XENLOG_ERR, "Failed to suspend the console, err=%d\n", status);
> + system_state = SYS_STATE_resume;
> + goto resume_end_sync;
> + }
> +
> + local_irq_save(flags);
> + status = gic_suspend();
> + if ( status )
> + {
> + system_state = SYS_STATE_resume;
> + goto resume_irqs;
> + }
> +
> + set_init_ttbr(xen_pgtable);
> +
> + /*
> + * Enable identity mapping before entering suspend to simplify
> + * the resume path
> + */
> + update_boot_mapping(true);
> +
> + if ( prepare_resume_ctx(&cpu_context) )
> + {
> + status = call_psci_system_suspend();
> + /*
> + * If suspend is finalized properly by above system suspend PSCI call,
> + * the code below in this 'if' branch will never execute. Execution
> + * will continue from hyp_resume which is the hypervisor's resume point.
> + * In hyp_resume CPU context will be restored and since link-register is
> + * restored as well, it will appear to return from prepare_resume_ctx.
> + * The difference in returning from prepare_resume_ctx on system suspend
> + * versus resume is in function's return value: on suspend, the return
> + * value is a non-zero value, on resume it is zero. That is why the
> + * control flow will not re-enter this 'if' branch on resume.
> + */
> + if ( status )
> + dprintk(XENLOG_WARNING, "PSCI system suspend failed, err=%d\n",
> + status);
> + }
> +
> + system_state = SYS_STATE_resume;
> + update_boot_mapping(false);
> +
> + gic_resume();
> +
> + resume_irqs:
> + local_irq_restore(flags);
> +
> + console_resume();
> + resume_end_sync:
> + console_end_sync();
> +
> + iommu_resume();
> +
> + resume_time:
> + time_resume();
> +
> + resume_nonboot_cpus:
> + /*
> + * The rcu_barrier() has to be added to ensure that the per cpu area is
> + * freed before a non-boot CPU tries to initialize it (_free_percpu_area()
> + * has to be called before the init_percpu_area()). This scenario occurs
> + * when non-boot CPUs are hot-unplugged on suspend and hotplugged on resume.
> + */
> + rcu_barrier();
> + enable_nonboot_cpus();
> +
> + resume_scheduler:
> + scheduler_enable();
> + thaw_domains();
> +
> + system_state = SYS_STATE_active;
> +
> + printk("Resume (status %d)\n", status);
> +
> + domain_resume(d);
> +}
> +
> +static DECLARE_TASKLET(system_suspend_tasklet, system_suspend, NULL);
> +
> +void host_system_suspend(struct domain *d)
> +{
> + system_suspend_tasklet.data = (void *)d;
> + /*
> + * The suspend procedure has to be finalized by the pCPU#0 (non-boot pCPUs
> + * will be disabled during the suspend).
> + */
> + tasklet_schedule_on_cpu(&system_suspend_tasklet, 0);
> +}
> +
> /*
> * Local variables:
> * mode: C
> diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> index bd87ec430d..8fb9172186 100644
> --- a/xen/arch/arm/vpsci.c
> +++ b/xen/arch/arm/vpsci.c
> @@ -5,6 +5,7 @@
>
> #include <asm/current.h>
> #include <asm/domain.h>
> +#include <asm/suspend.h>
> #include <asm/vgic.h>
> #include <asm/vpsci.h>
> #include <asm/event.h>
> @@ -232,8 +233,7 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> if ( is_64bit_domain(d) && is_thumb )
> return PSCI_INVALID_ADDRESS;
>
> - /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> - if ( is_hardware_domain(d) )
> + if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
> return PSCI_NOT_SUPPORTED;
>
> /* Ensure that all CPUs other than the calling one are offline */
> @@ -266,6 +266,9 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> "SYSTEM_SUSPEND requested, epoint=%#"PRIregister", cid=%#"PRIregister"\n",
> epoint, cid);
>
> + if ( is_control_domain(d) )
Why is_control_domain() here and not is_hardware_domain() ?
> + host_system_suspend(d);
> +
> return rc;
> }
>
> @@ -290,7 +293,10 @@ static int32_t do_psci_1_0_features(uint32_t psci_func_id)
> return 0;
> case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> - return is_hardware_domain(current->domain) ? PSCI_NOT_SUPPORTED : 0;
> + if ( IS_ENABLED(CONFIG_SYSTEM_SUSPEND) ||
> + !is_hardware_domain(current->domain) )
Should this also include the condition “is hardware domain and psci_ver >= PSCI_VERSION(1, 0)”?
Otherwise, if the host machine doesn’t support PSCI 1.0, we would return OK here but the call would
fail later in call_psci_system_suspend().
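The suggested check could look roughly like the standalone predicate below. It is a sketch: the function name and parameters are hypothetical, and host_psci_ver stands in for Xen's psci_ver. SYSTEM_SUSPEND is optional even in PSCI 1.0, so a stricter variant might query the firmware with PSCI_FEATURES rather than rely on the version alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PSCI_VERSION(maj, min) (((uint32_t)(maj) << 16) | (min))
#define PSCI_NOT_SUPPORTED     (-1)

/*
 * Hypothetical standalone version of the SYSTEM_SUSPEND case in
 * do_psci_1_0_features(): a non-hardware domain is always served by the
 * virtual implementation, while the hardware domain additionally needs
 * CONFIG_SYSTEM_SUSPEND and host firmware implementing PSCI >= 1.0.
 */
static int32_t toy_system_suspend_feature(bool is_hwdom, bool config_enabled,
                                          uint32_t host_psci_ver)
{
    if ( !is_hwdom ||
         (config_enabled && host_psci_ver >= PSCI_VERSION(1, 0)) )
        return 0;

    return PSCI_NOT_SUPPORTED;
}
```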
> + return 0;
> + fallthrough;
> default:
> return PSCI_NOT_SUPPORTED;
> }
> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
> index 0a20aa0a12..feb1336f46 100644
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -137,6 +137,9 @@ config HAS_EX_TABLE
> config HAS_FAST_MULTIPLY
> bool
>
> +config HAS_HWDOM_SYSTEM_SUSPEND
> + bool
> +
> config HAS_IOPORTS
> bool
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index bb9e210c28..d3edfb2a13 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1375,6 +1375,11 @@ void __domain_crash(struct domain *d)
> domain_shutdown(d, SHUTDOWN_crash);
> }
>
> +static inline bool want_hwdom_shutdown(uint8_t reason)
> +{
> + return !IS_ENABLED(CONFIG_HAS_HWDOM_SYSTEM_SUSPEND) ||
> + reason != SHUTDOWN_suspend;
> +}
>
> int domain_shutdown(struct domain *d, u8 reason)
> {
> @@ -1391,7 +1396,7 @@ int domain_shutdown(struct domain *d, u8 reason)
> d->shutdown_code = reason;
> reason = d->shutdown_code;
>
> - if ( is_hardware_domain(d) )
> + if ( is_hardware_domain(d) && want_hwdom_shutdown(reason) )
> hwdom_shutdown(reason);
>
> if ( d->is_shutting_down )
> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> index 22d306d0cb..45f29ef8ec 100644
> --- a/xen/drivers/passthrough/arm/smmu.c
> +++ b/xen/drivers/passthrough/arm/smmu.c
> @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
> xfree(xen_domain);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +static int arm_smmu_suspend(void)
> +{
> + return -ENOSYS;
> +}
> +#endif
Maybe we also want to gate the feature on !CONFIG_ARM_SMMU? I would wait for the maintainers’
view on this.
Cheers,
Luca
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 03/13] xen/arm: gic-v3: tolerate retained redistributor LPI state across CPU_OFF
2026-04-22 15:55 ` Luca Fancellu
@ 2026-05-05 6:06 ` Mykola Kvach
0 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-05-05 6:06 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Luca,
Thank you for the review.
On Wed, Apr 22, 2026 at 6:57 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> > +
> > +static int gicv3_lpi_disable_lpis(void __iomem *rdist_base)
> > +{
> > + uint32_t reg = readl_relaxed(rdist_base + GICR_CTLR);
> > + int ret;
> > +
> > + if ( !(reg & GICR_CTLR_ENABLE_LPIS) )
> > + return 0;
> > +
> > + writel_relaxed(reg & ~GICR_CTLR_ENABLE_LPIS, rdist_base + GICR_CTLR);
> > +
> > + /*
> > + * The spec only guarantees programmability when we have observed the bit
> > + * cleared. Where clearing is supported, RWP must reach 0 before touching
> > + * PROPBASER/PENDBASER again.
> > + */
> > + wmb();
> > +
> > + ret = gicv3_do_wait_for_rwp(rdist_base);
>
> I’m looking into the implementation of gicv3_do_wait_for_rwp() and I see
> it’s polling on bit 31 (UWP) instead of bit 3 (RWP)?
>
> Not related to this patch but I feel we need to raise this.
Good catch, thanks.
UWP does have SGI-related semantics, but it is not the same as redistributor
RWP. The existing helper is used as an RWP wait helper after redistributor
register writes, so the redistributor path should poll GICR_CTLR.RWP rather
than GICR_CTLR.UWP.
I will send a separate prerequisite patch to make the redistributor
path use GICR_CTLR_RWP.
>
> > + if ( ret )
> > + return ret;
> > +
> > + reg = readl_relaxed(rdist_base + GICR_CTLR);
> > + if ( reg & GICR_CTLR_ENABLE_LPIS )
> > + return -EBUSY;
> > +
> > + return 0;
> > +}
> > +
> > /*
> > * Tell a redistributor about the (shared) property table, allocating one
> > * if not already done.
> > @@ -373,7 +434,21 @@ int gicv3_lpi_init_rdist(void __iomem * rdist_base)
> > /* Make sure LPIs are disabled before setting up the tables. */
> > reg = readl_relaxed(rdist_base + GICR_CTLR);
> > if ( reg & GICR_CTLR_ENABLE_LPIS )
> > - return -EBUSY;
> > + {
> > + if ( gicv3_lpi_tables_match(rdist_base) )
> > + return -EBUSY;
> > +
> > + ret = gicv3_lpi_disable_lpis(rdist_base);
> > + if ( ret == -EBUSY )
> > + {
> > + printk(XENLOG_ERR
> > + "GICv3: CPU%d: LPIs still enabled with unexpected redistributor tables\n",
> > + smp_processor_id());
> > + return -EINVAL;
> > + }
> > + if ( ret )
> > + return ret;
> > + }
> >
> > ret = gicv3_lpi_set_pendtable(rdist_base);
> > if ( ret )
> > diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> > index bc07f97c16..34fb065afc 100644
> > --- a/xen/arch/arm/gic-v3.c
> > +++ b/xen/arch/arm/gic-v3.c
> > @@ -274,8 +274,8 @@ static void gicv3_enable_sre(void)
> > isb();
> > }
> >
> > -/* Wait for completion of a distributor change */
> > -static void gicv3_do_wait_for_rwp(void __iomem *base)
> > +/* Wait for completion of a distributor/redistributor write-pending change. */
> > +int gicv3_do_wait_for_rwp(void __iomem *base)
> > {
> > uint32_t val;
> > bool timeout = false;
> > @@ -295,17 +295,22 @@ static void gicv3_do_wait_for_rwp(void __iomem *base)
> > } while ( 1 );
> >
> > if ( timeout )
> > + {
> > dprintk(XENLOG_ERR, "RWP timeout\n");
> > + return -ETIMEDOUT;
> > + }
> > +
> > + return 0;
> > }
> >
> > static void gicv3_dist_wait_for_rwp(void)
> > {
> > - gicv3_do_wait_for_rwp(GICD);
> > + (void)gicv3_do_wait_for_rwp(GICD);
> > }
> >
> > static void gicv3_redist_wait_for_rwp(void)
> > {
> > - gicv3_do_wait_for_rwp(GICD_RDIST_BASE);
> > + (void)gicv3_do_wait_for_rwp(GICD_RDIST_BASE);
> > }
> >
> > static void gicv3_wait_for_rwp(int irq)
> > @@ -925,7 +930,7 @@ static int __init gicv3_populate_rdist(void)
> > gicv3_set_redist_address(rdist_addr, procnum);
> >
> > ret = gicv3_lpi_init_rdist(ptr);
> > - if ( ret && ret != -ENODEV )
> > + if ( ret && ret != -ENODEV && ret != -EBUSY )
> > {
> > printk("GICv3: CPU%d: Cannot initialize LPIs: %u\n",
>
> This should be the other way around? %u for smp_processor_id() and %d for ret?
You're right, thanks. I will fix the format string.
>
> > smp_processor_id(), ret);
> > diff --git a/xen/arch/arm/include/asm/gic_v3_its.h b/xen/arch/arm/include/asm/gic_v3_its.h
> > index fc5a84892c..081bd19180 100644
> > --- a/xen/arch/arm/include/asm/gic_v3_its.h
> > +++ b/xen/arch/arm/include/asm/gic_v3_its.h
>
> Why this header and not gic.h?
You're right, this prototype is not ITS-specific. I will move it to gic.h.
Best regards,
Mykola
>
> > @@ -133,6 +133,7 @@ struct host_its {
> >
> > /* Map a collection for this host CPU to each host ITS. */
> > int gicv3_its_setup_collection(unsigned int cpu);
> > +int gicv3_do_wait_for_rwp(void __iomem *base);
> >
> > #ifdef CONFIG_HAS_ITS
> >
> >
>
> The rest looks ok to me!
>
> Cheers,
> Luca
>
>
>
* Re: [PATCH v8 04/13] xen/arm: gic-v3: Implement GICv3 suspend/resume functions
2026-04-23 11:28 ` Luca Fancellu
@ 2026-05-05 7:26 ` Mykola Kvach
0 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-05-05 7:26 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Luca,
Thank you for the review.
On Thu, Apr 23, 2026 at 2:29 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> > diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> > index 34fb065afc..d182a71478 100644
> > --- a/xen/arch/arm/gic-v3.c
> > +++ b/xen/arch/arm/gic-v3.c
> > @@ -1072,12 +1072,12 @@ out:
> > return res;
> > }
> >
> > -static void gicv3_hyp_disable(void)
> > +static void gicv3_hyp_enable(bool enable)
> > {
> > register_t hcr;
> >
> > hcr = READ_SYSREG(ICH_HCR_EL2);
> > - hcr &= ~GICH_HCR_EN;
> > + hcr = enable ? (hcr | GICH_HCR_EN) : (hcr & ~GICH_HCR_EN);
> > WRITE_SYSREG(hcr, ICH_HCR_EL2);
> > isb();
> > }
> > @@ -1184,7 +1184,7 @@ static void gicv3_disable_interface(void)
> > spin_lock(&gicv3.lock);
> >
> > gicv3_cpu_disable();
> > - gicv3_hyp_disable();
> > + gicv3_hyp_enable(false);
> >
> > spin_unlock(&gicv3.lock);
> > }
> > @@ -1920,6 +1920,313 @@ static bool gic_dist_supports_lpis(void)
> > return (readl_relaxed(GICD + GICD_TYPER) & GICD_TYPE_LPIS);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +/* This struct represent block of 32 IRQs */
>
> NIT: s/represent/represents
Ack.
>
> > +struct dist_irq_block {
> > + uint32_t icfgr[2];
> > + uint32_t ipriorityr[8];
> > + uint64_t irouter[32];
> > + uint32_t isactiver;
> > + uint32_t isenabler;
> > +};
> > +
> > +struct redist_ctx {
> > + uint32_t ctlr;
> > + uint32_t icfgr; /* only PPIs stored */
> > + uint32_t igroupr;
>
> I think Xen also writes GICD_IGROUPR<n>E, but we are not saving it, so after a reset
> GICD_IGROUPR<n>E would contain its reset value, which is zero.
> Alternatively, we could re-initialise it the same way Xen does (all 1s).
Yes, good point.
For the normal SPI range I re-initialise GICD_IGROUPR to all 1s during resume,
but I missed doing the same for the eSPI range. I will add the corresponding
GICD_IGROUPR<n>E re-initialisation, matching the normal Xen initialisation path.
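A minimal sketch of that eSPI re-initialisation, assuming every implemented eSPI should end up in Group 1 (all bits set), as Xen does for the normal SPI range. The array and the function are hypothetical stand-ins for the GICD_IGROUPR<n>E MMIO writes in the resume path:

```c
#include <assert.h>
#include <stdint.h>

#define NR_ESPI_MAX      1024U  /* GICv3.1 allows up to 1024 eSPIs */
#define IRQS_PER_IGROUPR 32U    /* one group bit per interrupt per 32-bit register */

/* Hypothetical stand-in for the GICD_IGROUPR<n>E register bank. */
static uint32_t fake_igroupr_e[NR_ESPI_MAX / IRQS_PER_IGROUPR];

/*
 * Put every implemented eSPI into Group 1 and return the number of
 * registers written.
 */
static unsigned int reinit_espi_groups(uint32_t *igroupr_e, unsigned int nr_espis)
{
    unsigned int n, nr_regs = nr_espis / IRQS_PER_IGROUPR;

    assert(nr_espis % IRQS_PER_IGROUPR == 0);  /* eSPIs come in blocks of 32 */

    for ( n = 0; n < nr_regs; n++ )
        igroupr_e[n] = UINT32_MAX;  /* in Xen: writel_relaxed(GENMASK(31, 0), ...) */

    return nr_regs;
}
```

Registers beyond the implemented eSPI range are left untouched, mirroring the gic_number_espis() / 32 bound used by the save/restore loops in the patch.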
>
> > + uint32_t ipriorityr[8];
> > + uint32_t isactiver;
> > + uint32_t isenabler;
> > +
> > + uint64_t pendbase;
> > + uint64_t propbase;
> > +};
> > +
> > +/* GICv3 registers to be saved/restored on system suspend/resume */
> > +struct gicv3_ctx {
> > + struct dist_ctx {
> > + uint32_t ctlr;
> > + struct dist_irq_block *irqs, *espi_irqs;
>
> NIT: I would declare them on separate lines rather than on the same line, but this is
> probably a matter of taste, so I will leave the decision to the maintainers.
Ack.
>
> > + } dist;
> > +
> > + /* have only one rdist structure for last running CPU during suspend */
> > + struct redist_ctx rdist;
> > +
> > + struct cpu_ctx {
> > + uint32_t ctlr;
> > + uint32_t pmr;
> > + uint32_t bpr;
> > + uint32_t sre_el2;
> > + uint32_t grpen;
> > + } cpu;
> > +};
> > +
> > +static struct gicv3_ctx gicv3_ctx;
> > +
> > +static void __init gicv3_alloc_context(void)
> > +{
> > + uint32_t blocks = DIV_ROUND_UP(gicv3_info.nr_lines, 32);
> > +
> > + /* The spec allows for systems without any SPIs */
> > + if ( blocks > 1 )
> > + {
> > + gicv3_ctx.dist.irqs = xzalloc_array(struct dist_irq_block, blocks - 1);
> > + if ( !gicv3_ctx.dist.irqs )
> > + panic("Failed to allocate memory for GICv3 suspend context\n");
> > + }
> > +
> > +#ifdef CONFIG_GICV3_ESPI
> > + if ( !gic_number_espis() )
> > + return;
> > +
> > + blocks = gic_number_espis() / 32;
> > + gicv3_ctx.dist.espi_irqs = xzalloc_array(struct dist_irq_block, blocks);
> > + if ( !gicv3_ctx.dist.espi_irqs )
> > + panic("Failed to allocate memory for GICv3 eSPI suspend context\n");
> > +#endif
> > +}
> > +
> > +static int gicv3_disable_redist(void)
> > +{
> > + void __iomem *waker = GICD_RDIST_BASE + GICR_WAKER;
> > + s_time_t deadline;
> > +
> > + /*
> > + * Avoid infinite loop if Non-secure does not have access to GICR_WAKER.
> > + * See Arm IHI 0069H.b, 12.11.42 GICR_WAKER:
> > + * When GICD_CTLR.DS == 0 and an access is Non-secure accesses to this
> > + * register are RAZ/WI.
> > + */
> > + if ( !(readl_relaxed(GICD + GICD_CTLR) & GICD_CTLR_DS) )
> > + return 0;
> > +
> > + deadline = NOW() + MILLISECS(1000);
> > +
> > + writel_relaxed(readl_relaxed(waker) | GICR_WAKER_ProcessorSleep, waker);
> > + while ( (readl_relaxed(waker) & GICR_WAKER_ChildrenAsleep) == 0 )
> > + {
> > + if ( NOW() > deadline )
> > + {
> > + printk("GICv3: Timeout waiting for redistributor to sleep\n");
> > + return -ETIMEDOUT;
> > + }
> > + cpu_relax();
> > + udelay(10);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +#define GET_SPI_REG_OFFSET(name, is_espi) \
> > + ((is_espi) ? GICD_##name##nE : GICD_##name)
> > +
> > +static void gicv3_store_spi_irq_block(struct dist_irq_block *irqs,
> > + unsigned int i, bool is_espi)
> > +{
> > + void __iomem *base;
> > + unsigned int irq;
> > +
> > + base = GICD + GET_SPI_REG_OFFSET(ICFGR, is_espi) + i * sizeof(irqs->icfgr);
> > + irqs->icfgr[0] = readl_relaxed(base);
> > + irqs->icfgr[1] = readl_relaxed(base + 4);
> > +
> > + base = GICD + GET_SPI_REG_OFFSET(IPRIORITYR, is_espi);
> > + base += i * sizeof(irqs->ipriorityr);
> > + for ( irq = 0; irq < ARRAY_SIZE(irqs->ipriorityr); irq++ )
> > + irqs->ipriorityr[irq] = readl_relaxed(base + 4 * irq);
> > +
> > + base = GICD + GET_SPI_REG_OFFSET(IROUTER, is_espi);
> > + base += i * sizeof(irqs->irouter);
> > + for ( irq = 0; irq < ARRAY_SIZE(irqs->irouter); irq++ )
> > + irqs->irouter[irq] = readq_relaxed_non_atomic(base + 8 * irq);
> > +
> > + base = GICD + GET_SPI_REG_OFFSET(ISACTIVER, is_espi);
> > + base += i * sizeof(irqs->isactiver);
> > + irqs->isactiver = readl_relaxed(base);
> > +
> > + base = GICD + GET_SPI_REG_OFFSET(ISENABLER, is_espi);
> > + base += i * sizeof(irqs->isenabler);
> > + irqs->isenabler = readl_relaxed(base);
> > +}
> > +
> > +static void gicv3_restore_spi_irq_block(struct dist_irq_block *irqs,
> > + unsigned int i, bool is_espi)
> > +{
> > + void __iomem *base;
> > + unsigned int irq;
> > +
> > + base = GICD + GET_SPI_REG_OFFSET(ICFGR, is_espi) + i * sizeof(irqs->icfgr);
> > + writel_relaxed(irqs->icfgr[0], base);
> > + writel_relaxed(irqs->icfgr[1], base + 4);
> > +
> > + base = GICD + GET_SPI_REG_OFFSET(IPRIORITYR, is_espi);
> > + base += i * sizeof(irqs->ipriorityr);
> > + for ( irq = 0; irq < ARRAY_SIZE(irqs->ipriorityr); irq++ )
> > + writel_relaxed(irqs->ipriorityr[irq], base + 4 * irq);
> > +
> > + base = GICD + GET_SPI_REG_OFFSET(IROUTER, is_espi);
> > + base += i * sizeof(irqs->irouter);
> > + for ( irq = 0; irq < ARRAY_SIZE(irqs->irouter); irq++ )
> > + writeq_relaxed_non_atomic(irqs->irouter[irq], base + 8 * irq);
>
>
> Section 12.9.22, GICD_IROUTER<n>, of [1] says: "These registers are used only when affinity routing is enabled.
> When affinity routing is not enabled: These registers are RES0. An implementation is permitted to make
> the register RAZ/WI in this case."
>
> So I think these need to be written after we set GICD_CTLR; otherwise anything written there is
> lost and the configuration won't be restored.
You are right. Restoring IROUTER before restoring the affinity-routing state is
not safe, because these registers are only meaningful when affinity routing is
enabled.
I will fix the restore ordering in the next version.
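One possible restore ordering, sketched against a hypothetical distributor model that treats GICD_IROUTER<n> as RAZ/WI while affinity routing is disabled, which the quoted spec text permits. The idea (an assumption, not the final patch) is to re-enable ARE_NS first with the interrupt groups still disabled, then restore IROUTER, and only then write back the full saved GICD_CTLR. On real hardware each GICD_CTLR write would also need an RWP wait:

```c
#include <assert.h>
#include <stdint.h>

#define GICD_CTLR_ARE_NS     (1U << 4)
#define GICD_CTLR_ENABLE_G1A (1U << 1)
#define NR_ROUTERS           4U

/* Hypothetical distributor model with a handful of IROUTER registers. */
struct fake_gicd {
    uint32_t ctlr;
    uint64_t irouter[NR_ROUTERS];
};

/* IROUTER writes only take effect while affinity routing is enabled. */
static void gicd_write_irouter(struct fake_gicd *d, unsigned int n, uint64_t val)
{
    assert(n < NR_ROUTERS);

    if ( d->ctlr & GICD_CTLR_ARE_NS )
        d->irouter[n] = val;  /* otherwise the write is ignored (RAZ/WI) */
}

/* Enable ARE_NS only, restore routing, then restore the full saved CTLR. */
static void restore_routing(struct fake_gicd *d, uint32_t saved_ctlr,
                            const uint64_t *saved_irouter)
{
    unsigned int n;

    d->ctlr = saved_ctlr & GICD_CTLR_ARE_NS;  /* groups still disabled */
    for ( n = 0; n < NR_ROUTERS; n++ )
        gicd_write_irouter(d, n, saved_irouter[n]);
    d->ctlr = saved_ctlr;
}
```

Writing IROUTER while ARE_NS is still clear, as the current resume path does after writing 0 to GICD_CTLR, would silently drop the saved routing in this model.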
>
>
> > +
> > + base = GICD + GET_SPI_REG_OFFSET(ICENABLER, is_espi) + i * 4;
> > + writel_relaxed(GENMASK(31, 0), base);
> > +
> > + base = GICD + GET_SPI_REG_OFFSET(ISENABLER, is_espi);
> > + base += i * sizeof(irqs->isenabler);
> > + writel_relaxed(irqs->isenabler, base);
> > +
> > + base = GICD + GET_SPI_REG_OFFSET(ICACTIVER, is_espi) + i * 4;
> > + writel_relaxed(GENMASK(31, 0), base);
> > +
> > + base = GICD + GET_SPI_REG_OFFSET(ISACTIVER, is_espi);
> > + base += i * sizeof(irqs->isactiver);
> > + writel_relaxed(irqs->isactiver, base);
> > +}
> > +
> > +static int gicv3_suspend(void)
> > +{
> > + unsigned int i;
> > + void __iomem *base;
> > + int ret;
> > + struct redist_ctx *rdist = &gicv3_ctx.rdist;
> > +
> > + /* Save GICC configuration */
> > + gicv3_ctx.cpu.ctlr = READ_SYSREG(ICC_CTLR_EL1);
> > + gicv3_ctx.cpu.pmr = READ_SYSREG(ICC_PMR_EL1);
> > + gicv3_ctx.cpu.bpr = READ_SYSREG(ICC_BPR1_EL1);
> > + gicv3_ctx.cpu.sre_el2 = READ_SYSREG(ICC_SRE_EL2);
> > + gicv3_ctx.cpu.grpen = READ_SYSREG(ICC_IGRPEN1_EL1);
> > +
> > + gicv3_disable_interface();
>
> this one also calls gicv3_cpu_disable(), which zeroes ICC_IGRPEN1_EL1 ...
>
> > +
> > + ret = gicv3_disable_redist();
> > + if ( ret )
> > + goto out_enable_iface;
>
> … but when we fail here ...
>
> > +
> > + /* Save GICR configuration */
> > + gicv3_redist_wait_for_rwp();
> > +
> > + base = GICD_RDIST_BASE;
> > +
> > + rdist->ctlr = readl_relaxed(base + GICR_CTLR);
> > +
> > + rdist->propbase = readq_relaxed(base + GICR_PROPBASER);
> > + rdist->pendbase = readq_relaxed(base + GICR_PENDBASER);
> > +
> > + base = GICD_RDIST_SGI_BASE;
> > +
> > + /* Save priority on PPI and SGI interrupts */
> > + for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
> > + rdist->ipriorityr[i] = readl_relaxed(base + GICR_IPRIORITYR0 + 4 * i);
> > +
> > + rdist->isactiver = readl_relaxed(base + GICR_ISACTIVER0);
> > + rdist->isenabler = readl_relaxed(base + GICR_ISENABLER0);
> > + rdist->igroupr = readl_relaxed(base + GICR_IGROUPR0);
> > + rdist->icfgr = readl_relaxed(base + GICR_ICFGR1);
> > +
> > + /* Save GICD configuration */
> > + gicv3_dist_wait_for_rwp();
> > + gicv3_ctx.dist.ctlr = readl_relaxed(GICD + GICD_CTLR);
> > +
> > + for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
> > + gicv3_store_spi_irq_block(gicv3_ctx.dist.irqs + i - 1, i, false);
> > +
> > +#ifdef CONFIG_GICV3_ESPI
> > + for ( i = 0; i < gic_number_espis() / 32; i++ )
> > + gicv3_store_spi_irq_block(gicv3_ctx.dist.espi_irqs + i, i, true);
> > +#endif
> > +
> > + return 0;
> > +
> > + out_enable_iface:
> > + gicv3_hyp_enable(true);
> > + WRITE_SYSREG(gicv3_ctx.cpu.ctlr, ICC_CTLR_EL1);
>
> we don't restore ICC_IGRPEN1_EL1
Yes, you are right.
This series missed the change introduced by commit 18b718b6af3d ("xen/arm:
gic-v3: disable Group 1 before CPU power-down"). Since gicv3_cpu_disable() now
disables ICC_IGRPEN1_EL1, the error path needs to restore it before returning.
I will fix this in the next version.
Best regards,
Mykola
>
> > + isb();
> > +
> > + return ret;
> > +}
> > +
> > +static void gicv3_resume(void)
> > +{
> > + int ret;
> > + unsigned int i;
> > + void __iomem *base;
> > + struct redist_ctx *rdist = &gicv3_ctx.rdist;
> > +
> > + writel_relaxed(0, GICD + GICD_CTLR);
> > +
> > + for ( i = NR_GIC_LOCAL_IRQS; i < gicv3_info.nr_lines; i += 32 )
> > + writel_relaxed(GENMASK(31, 0), GICD + GICD_IGROUPR + (i / 32) * 4);
> > +
> > + for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
> > + gicv3_restore_spi_irq_block(gicv3_ctx.dist.irqs + i - 1, i, false);
> > +
> > +#ifdef CONFIG_GICV3_ESPI
> > + for ( i = 0; i < gic_number_espis() / 32; i++ )
> > + gicv3_restore_spi_irq_block(gicv3_ctx.dist.espi_irqs + i, i, true);
> > +#endif
> > +
> > + writel_relaxed(gicv3_ctx.dist.ctlr, GICD + GICD_CTLR);
> > + gicv3_dist_wait_for_rwp();
> > +
> > + ret = gicv3_lpi_init_rdist(GICD_RDIST_BASE);
> > + /*
> > + * If LPIs are already enabled, assume firmware or the still-powered
> > + * redistributor has valid PROPBASER/PENDBASER and skip reprogramming.
> > + * Return -EBUSY so callers can ignore this case.
> > + */
> > + if ( ret && ret != -ENODEV && ret != -EBUSY )
> > + panic("GICv3: Failed to re-initialize LPIs during resume\n");
> > + else if ( ret == -EBUSY ) /* extra checks, just to be sure */
> > + {
> > + base = GICD_RDIST_BASE;
> > + if ( readq_relaxed(base + GICR_PROPBASER) != rdist->propbase ||
> > + readq_relaxed(base + GICR_PENDBASER) != rdist->pendbase )
> > + {
> > + panic("GICv3: LPIs already enabled with unexpected PROPBASER/PENDBASER during resume\n");
> > + }
> > + }
> > +
> > + /* Restore GICR (Redistributor) configuration */
> > + if ( gicv3_enable_redist() )
> > + panic("GICv3: Failed to re-enable redistributor during resume\n");
> > +
> > + base = GICD_RDIST_SGI_BASE;
> > +
> > + writel_relaxed(GENMASK(31, 0), base + GICR_ICENABLER0);
> > + gicv3_redist_wait_for_rwp();
> > +
> > + for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
> > + writel_relaxed(rdist->ipriorityr[i], base + GICR_IPRIORITYR0 + i * 4);
> > +
> > + writel_relaxed(rdist->isactiver, base + GICR_ISACTIVER0);
> > + writel_relaxed(rdist->igroupr, base + GICR_IGROUPR0);
> > + writel_relaxed(rdist->icfgr, base + GICR_ICFGR1);
> > +
> > + gicv3_redist_wait_for_rwp();
> > +
> > + writel_relaxed(rdist->isenabler, base + GICR_ISENABLER0);
> > + writel_relaxed(rdist->ctlr, GICD_RDIST_BASE + GICR_CTLR);
> > +
> > + gicv3_redist_wait_for_rwp();
> > +
> > + WRITE_SYSREG(gicv3_ctx.cpu.sre_el2, ICC_SRE_EL2);
> > + isb();
> > +
> > + /* Restore CPU interface (System registers) */
> > + WRITE_SYSREG(gicv3_ctx.cpu.pmr, ICC_PMR_EL1);
> > + WRITE_SYSREG(gicv3_ctx.cpu.bpr, ICC_BPR1_EL1);
> > + WRITE_SYSREG(gicv3_ctx.cpu.ctlr, ICC_CTLR_EL1);
> > + WRITE_SYSREG(gicv3_ctx.cpu.grpen, ICC_IGRPEN1_EL1);
> > + isb();
> > +
> > + gicv3_hyp_init();
> > +}
> > +
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> > /* Set up the GIC */
> > static int __init gicv3_init(void)
> > {
> > @@ -1994,6 +2301,10 @@ static int __init gicv3_init(void)
> >
> > gicv3_hyp_init();
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + gicv3_alloc_context();
> > +#endif
> > +
> > out:
> > spin_unlock(&gicv3.lock);
> >
> > @@ -2033,6 +2344,10 @@ static const struct gic_hw_operations gicv3_ops = {
> > #endif
> > .iomem_deny_access = gicv3_iomem_deny_access,
> > .do_LPI = gicv3_do_LPI,
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + .suspend = gicv3_suspend,
> > + .resume = gicv3_resume,
> > +#endif
> > };
> >
> > static int __init gicv3_dt_preinit(struct dt_device_node *node, const void *data)
> > diff --git a/xen/arch/arm/include/asm/gic_v3_defs.h b/xen/arch/arm/include/asm/gic_v3_defs.h
> > index c373b94d19..992c8f9c2f 100644
> > --- a/xen/arch/arm/include/asm/gic_v3_defs.h
> > +++ b/xen/arch/arm/include/asm/gic_v3_defs.h
> > @@ -94,6 +94,7 @@
> > #define GICD_TYPE_LPIS (1U << 17)
> >
> > #define GICD_CTLR_RWP (1UL << 31)
> > +#define GICD_CTLR_DS (1U << 6)
> > #define GICD_CTLR_ARE_NS (1U << 4)
> > #define GICD_CTLR_ENABLE_G1A (1U << 1)
> > #define GICD_CTLR_ENABLE_G1 (1U << 0)
> >
>
> Cheers,
> Luca
>
>
>
>
* Re: [PATCH v8 05/13] xen/arm: gic-v3: add ITS suspend/resume support
2026-04-24 10:53 ` Luca Fancellu
@ 2026-05-05 10:09 ` Mykola Kvach
2026-05-08 11:30 ` Luca Fancellu
0 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-05-05 10:09 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné
Hi Luca,
Thank you for the review.
On Fri, Apr 24, 2026 at 1:54 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> > On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
> >
> > From: Mykola Kvach <mykola_kvach@epam.com>
> >
> > Handle system suspend/resume for GICv3 with an ITS present so LPIs keep
> > working after firmware powers the GIC down. Snapshot the CPU interface,
> > distributor and last-CPU redistributor state, disable the ITS to cache its
> > CTLR/CBASER/BASER registers, then restore everything and re-arm the
> > collection on resume.
> >
> > Add list_for_each_entry_continue_reverse() in list.h for the ITS suspend
> > error path that needs to roll back partially saved state.
> >
> > Based on Linux commit dba0bc7b76dc ("irqchip/gic-v3-its: Add ability to save/restore ITS state")
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V8:
> > - Reword the CBASER/CWRITER comment to match Xen and drop the stale Linux
> > cmd_write reference.
> > - Clarify the list_for_each_entry_continue_reverse() comment.
> > - Factor out per-ITS helpers for collection setup and resume.
> > - Restore each ITS and re-establish its collection mapping in the same
> > loop, so a failed ITS resume is not followed by MAPC/SYNC on that
> > un-restored instance.
> > - panic if resuming an ITS fails
> > - clear the BASER cache during suspend
> > ---
> > xen/arch/arm/gic-v3-its.c | 126 ++++++++++++++++++++++++--
> > xen/arch/arm/gic-v3.c | 15 ++-
> > xen/arch/arm/include/asm/gic_v3_its.h | 23 +++++
> > xen/include/xen/list.h | 14 +++
> > 4 files changed, 166 insertions(+), 12 deletions(-)
> >
> > diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
> > index 9ba068c46f..fe2865eac9 100644
> > --- a/xen/arch/arm/gic-v3-its.c
> > +++ b/xen/arch/arm/gic-v3-its.c
> > @@ -335,6 +335,22 @@ static int its_send_cmd_inv(struct host_its *its,
> > return its_send_command(its, cmd);
> > }
> >
> > +static int gicv3_its_setup_collection_single(struct host_its *its,
> > + unsigned int cpu)
> > +{
> > + int ret;
> > +
> > + ret = its_send_cmd_mapc(its, cpu, cpu);
> > + if ( ret )
> > + return ret;
> > +
> > + ret = its_send_cmd_sync(its, cpu);
> > + if ( ret )
> > + return ret;
> > +
> > + return gicv3_its_wait_commands(its);
> > +}
> > +
> > /* Set up the (1:1) collection mapping for the given host CPU. */
> > int gicv3_its_setup_collection(unsigned int cpu)
> > {
> > @@ -343,15 +359,7 @@ int gicv3_its_setup_collection(unsigned int cpu)
> >
> > list_for_each_entry(its, &host_its_list, entry)
> > {
> > - ret = its_send_cmd_mapc(its, cpu, cpu);
> > - if ( ret )
> > - return ret;
> > -
> > - ret = its_send_cmd_sync(its, cpu);
> > - if ( ret )
> > - return ret;
> > -
> > - ret = gicv3_its_wait_commands(its);
> > + ret = gicv3_its_setup_collection_single(its, cpu);
> > if ( ret )
> > return ret;
> > }
> > @@ -1209,6 +1217,106 @@ int gicv3_its_init(void)
> > return 0;
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +int gicv3_its_suspend(void)
> > +{
> > + struct host_its *its;
> > + int ret;
> > +
> > + list_for_each_entry(its, &host_its_list, entry)
>
> NIT: coding style, spaces after and before the parentheses
Ack, I will fix the coding style.
>
> > + {
> > + unsigned int i;
> > + void __iomem *base = its->its_base;
> > +
> > + its->suspend_ctx.ctlr = readl_relaxed(base + GITS_CTLR);
> > + ret = gicv3_disable_its(its);
>
> This is called from system_suspend(); along that path iommu_suspend() and
> console_suspend() are called before finally reaching gic_suspend() and this function.
>
> IHI 0069H.b, section 5.6.2 "Disabling an ITS", says:
> "Ensure that all interrupts that target the ITS that is being powered down are
> either redirected or disabled". Is it correct to assume that all sources targeting
> the ITS are already disabled at this point, because the domains should already be suspended?
Yes, that is the assumption here.
Before Xen reaches this path, each domain must already have entered
SHUTDOWN_suspend. In other words, the guest OS has already requested
SYSTEM_SUSPEND only after completing its own suspend flow, so the
ITS-targeting interrupt sources owned by that OS are expected to be
quiesced at this point.
So this code relies on the owning OS having disabled or otherwise
quiesced those sources before issuing SYSTEM_SUSPEND, rather than Xen
explicitly doing that in gicv3_its_suspend().
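To make the assumed precondition explicit, here is a hypothetical check that every domain has completed its own suspend flow before the ITS is powered down. Xen does not perform this check in gicv3_its_suspend(); the struct is a trimmed-down stand-in, and the SHUTDOWN_suspend value is assumed to match Xen's public sched.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match SHUTDOWN_suspend in Xen's public sched.h. */
#define SHUTDOWN_suspend 2U

/* Hypothetical view of a domain's shutdown state. */
struct fake_domain {
    bool is_shut_down;           /* domain has finished shutting down */
    unsigned int shutdown_code;  /* reason passed to domain_shutdown() */
};

/* True only if every domain is fully shut down for suspend. */
static bool all_domains_suspended(const struct fake_domain *doms, size_t nr)
{
    size_t i;

    assert(doms != NULL || nr == 0);

    for ( i = 0; i < nr; i++ )
        if ( !doms[i].is_shut_down || doms[i].shutdown_code != SHUTDOWN_suspend )
            return false;

    return true;
}
```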
>
>
> > + if ( ret )
> > + {
> > + writel_relaxed(its->suspend_ctx.ctlr, base + GITS_CTLR);
>
> here and in the other places where we write GITS_CTLR: this register has Quiescent as RO,
> so maybe we should mask the write to only the bits that are writable?
Yes, this was inherited from the Linux ITS suspend/resume code, which restores
the saved GITS_CTLR value directly.
That said, masking the write to the writable bits is cleaner, and I will do
that in the next version.
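A minimal sketch of the masked write-back, assuming Quiescent (bit 31) is the read-only bit to drop and Enabled is bit 0. A real implementation may prefer an explicit mask of all writable bits for the implemented GIC version; the helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define GITS_CTLR_ENABLE    (1U << 0)   /* Enabled, RW */
#define GITS_CTLR_QUIESCENT (1U << 31)  /* Quiescent, RO */

/* Value to write when restoring a saved GITS_CTLR: drop read-only status bits. */
static uint32_t gits_ctlr_restore_value(uint32_t saved)
{
    return saved & ~GITS_CTLR_QUIESCENT;
}
```

The result would then be passed to writel_relaxed(..., base + GITS_CTLR) in both the suspend error path and the resume path.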
>
> > + goto err;
> > + }
> > +
> > + its->suspend_ctx.cbaser = readq_relaxed(base + GITS_CBASER);
> > +
> > + for (i = 0; i < GITS_BASER_NR_REGS; i++)
>
> NIT: coding style, spaces around the parentheses
>
> > + {
> > + uint64_t baser = readq_relaxed(base + GITS_BASER0 + i * 8);
> > +
> > + its->suspend_ctx.baser[i] = 0;
> > +
> > + if ( !(baser & GITS_VALID_BIT) )
> > + continue;
> > +
> > + its->suspend_ctx.baser[i] = baser;
> > + }
> > + }
> > +
> > + return 0;
> > +
> > + err:
> > + list_for_each_entry_continue_reverse(its, &host_its_list, entry)
> > + writel_relaxed(its->suspend_ctx.ctlr, its->its_base + GITS_CTLR);
> > +
> > + return ret;
> > +}
> > +
> > +static int gicv3_its_resume_single(struct host_its *its, unsigned int cpu)
> > +{
> > + void __iomem *base = its->its_base;
> > + unsigned int i;
> > + int ret;
> > +
> > + /*
> > + * Make sure that the ITS is disabled. If it fails to quiesce,
> > + * don't restore it since writing to CBASER or BASER<n>
> > + * registers is undefined according to the GIC v3 ITS
> > + * Specification.
> > + */
> > + WARN_ON(readl_relaxed(base + GITS_CTLR) & GITS_CTLR_ENABLE);
> > + ret = gicv3_disable_its(its);
> > + if ( ret )
> > + return ret;
> > +
> > + writeq_relaxed(its->suspend_ctx.cbaser, base + GITS_CBASER);
> > +
> > + /*
> > + * Writing CBASER resets CREADR to 0, so reset CWRITER to
> > + * keep the command queue pointers aligned.
> > + */
> > + writeq_relaxed(0, base + GITS_CWRITER);
> > +
> > + /* Restore GITS_BASER from the value cache. */
> > + for ( i = 0; i < GITS_BASER_NR_REGS; i++ )
> > + {
> > + uint64_t baser = its->suspend_ctx.baser[i];
> > +
> > + if ( !(baser & GITS_VALID_BIT) )
> > + continue;
> > +
> > + writeq_relaxed(baser, base + GITS_BASER0 + i * 8);
> > + }
> > +
> > + writel_relaxed(its->suspend_ctx.ctlr, base + GITS_CTLR);
> > +
> > + return gicv3_its_setup_collection_single(its, cpu);
> > +}
> > +
> > +void gicv3_its_resume(void)
> > +{
> > + struct host_its *its;
> > + unsigned int cpu = smp_processor_id();
> > + int ret;
> > +
> > + list_for_each_entry(its, &host_its_list, entry)
> > + {
> > + ret = gicv3_its_resume_single(its, cpu);
> > + if ( ret )
> > + panic("GICv3: ITS@%"PRIpaddr": failed to restore during resume: %d\n",
> > + its->addr, ret);
> > + }
> > +}
> > +
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> >
> > /*
> > * Local variables:
> > diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> > index d182a71478..ef8318dd50 100644
> > --- a/xen/arch/arm/gic-v3.c
> > +++ b/xen/arch/arm/gic-v3.c
> > @@ -862,7 +862,7 @@ static bool gicv3_enable_lpis(void)
> > return true;
> > }
> >
> > -static int __init gicv3_populate_rdist(void)
> > +static int gicv3_populate_rdist(void)
> > {
> > int i;
> > uint32_t aff;
> > @@ -932,7 +932,7 @@ static int __init gicv3_populate_rdist(void)
> > ret = gicv3_lpi_init_rdist(ptr);
> > if ( ret && ret != -ENODEV && ret != -EBUSY )
> > {
> > - printk("GICv3: CPU%d: Cannot initialize LPIs: %u\n",
> > + printk("GICv3: CPU%d: Cannot initialize LPIs: %d\n",
>
> this fixes a mistake introduced by an earlier patch in the series,
Yes, I will fold this into the previous patch.
Best regards,
Mykola
>
> Cheers,
> Luca
>
* Re: [PATCH v8 07/13] xen/arm: ffa: fix notification SRI across CPU hotplug/suspend
2026-04-27 8:20 ` Bertrand Marquis
@ 2026-05-05 10:18 ` Mykola Kvach
0 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-05-05 10:18 UTC (permalink / raw)
To: Bertrand Marquis
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Volodymyr Babchuk,
Jens Wiklander, Stefano Stabellini, Julien Grall, Michal Orzel
Hi Bertrand,
Thank you for the review.
On Mon, Apr 27, 2026 at 11:22 AM Bertrand Marquis
<Bertrand.Marquis@arm.com> wrote:
>
> Hi Mykola,
>
> > On 2 Apr 2026, at 12:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
> >
> > From: Mykola Kvach <mykola_kvach@epam.com>
> >
> > The FF-A notification SRI interrupt handler was not correctly tied to
> > CPU hotplug and suspend/resume. As a result, CPUs going offline and
> > back online could end up with stale or missing handlers, breaking
> > delivery of FF-A notifications.
> >
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
>
> This will probably need a rebase if the FF-A notification hardening and VM-to-VM notification
> series is merged first.
I will rebase this patch if the FF-A notification series lands first.
>
> Anyway, changes look good so:
>
> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Thank you, I will add your Reviewed-by.
Best regards,
Mykola
>
> Cheers
> Bertrand
>
> > ---
> > xen/arch/arm/tee/ffa_notif.c | 63 ++++++++++++++++++++++++++++--------
> > 1 file changed, 50 insertions(+), 13 deletions(-)
> >
> > diff --git a/xen/arch/arm/tee/ffa_notif.c b/xen/arch/arm/tee/ffa_notif.c
> > index 186e726412..513c399594 100644
> > --- a/xen/arch/arm/tee/ffa_notif.c
> > +++ b/xen/arch/arm/tee/ffa_notif.c
> > @@ -360,10 +360,28 @@ static int32_t ffa_notification_bitmap_destroy(uint16_t vm_id)
> > return ffa_simple_call(FFA_NOTIFICATION_BITMAP_DESTROY, vm_id, 0, 0, 0);
> > }
> >
> > -void ffa_notif_init_interrupt(void)
> > +static DEFINE_PER_CPU_READ_MOSTLY(struct irqaction, sri_irq);
> > +
> > +static int request_sri_irq(void)
> > {
> > int ret;
> > + struct irqaction *sri_action = &this_cpu(sri_irq);
> > +
> > + sri_action->name = "FF-A notif";
> > + sri_action->handler = notif_irq_handler;
> > + sri_action->dev_id = NULL;
> > + sri_action->free_on_release = 0;
> > +
> > + ret = setup_irq(notif_sri_irq, 0, sri_action);
> > + if ( ret )
> > + printk(XENLOG_ERR "ffa: setup_irq irq %u failed: error %d\n",
> > + notif_sri_irq, ret);
> >
> > + return ret;
> > +}
> > +
> > +void ffa_notif_init_interrupt(void)
> > +{
> > if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI )
> > {
> > /*
> > @@ -376,14 +394,36 @@ void ffa_notif_init_interrupt(void)
> > * pending, while the SPMC in the secure world will not notice that
> > * the interrupt was lost.
> > */
> > - ret = request_irq(notif_sri_irq, 0, notif_irq_handler, "FF-A notif",
> > - NULL);
> > - if ( ret )
> > - printk(XENLOG_ERR "ffa: request_irq irq %u failed: error %d\n",
> > - notif_sri_irq, ret);
> > + request_sri_irq();
> > }
> > }
> >
> > +static void deinit_ffa_notif_interrupt(void)
> > +{
> > + if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI )
> > + release_irq(notif_sri_irq, NULL);
> > +}
> > +
> > +static int cpu_ffa_notif_callback(struct notifier_block *nfb,
> > + unsigned long action,
> > + void *hcpu)
> > +{
> > + switch ( action )
> > + {
> > + case CPU_DYING:
> > + deinit_ffa_notif_interrupt();
> > + break;
> > + default:
> > + break;
> > + }
> > +
> > + return NOTIFY_DONE;
> > +}
> > +
> > +static struct notifier_block cpu_ffa_notif_nfb = {
> > + .notifier_call = cpu_ffa_notif_callback,
> > +};
> > +
> > void ffa_notif_init(void)
> > {
> > const struct arm_smccc_1_2_regs arg = {
> > @@ -392,7 +432,6 @@ void ffa_notif_init(void)
> > };
> > struct arm_smccc_1_2_regs resp;
> > unsigned int irq;
> > - int ret;
> >
> > /* Only enable fw notification if all ABIs we need are supported */
> > if ( ffa_fw_supports_fid(FFA_NOTIFICATION_BITMAP_CREATE) &&
> > @@ -408,13 +447,11 @@ void ffa_notif_init(void)
> > notif_sri_irq = irq;
> > if ( irq >= NR_GIC_SGI )
> > irq_set_type(irq, IRQ_TYPE_EDGE_RISING);
> > - ret = request_irq(irq, 0, notif_irq_handler, "FF-A notif", NULL);
> > - if ( ret )
> > - {
> > - printk(XENLOG_ERR "ffa: request_irq irq %u failed: error %d\n",
> > - irq, ret);
> > +
> > + if ( request_sri_irq() )
> > return;
> > - }
> > +
> > + register_cpu_notifier(&cpu_ffa_notif_nfb);
> > fw_notif_enabled = true;
> > }
> > }
> > --
> > 2.43.0
> >
>
* Re: [PATCH v8 08/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2026-04-24 13:34 ` Luca Fancellu
@ 2026-05-05 11:45 ` Mykola Kvach
0 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-05-05 11:45 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Luca,
Thank you for the review.
On Fri, Apr 24, 2026 at 4:36 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> > On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
> >
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >
> > Store and restore active context and micro-TLB registers.
> >
> > Tested on R-Car H3 Starter Kit.
> >
> > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V7:
> > - moved suspend context allocation before pci stuff
> > ---
> > xen/drivers/passthrough/arm/ipmmu-vmsa.c | 305 ++++++++++++++++++++++-
> > 1 file changed, 298 insertions(+), 7 deletions(-)
> >
> > diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> > index ea9fa9ddf3..6765bd3083 100644
> > --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> > +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> > @@ -71,6 +71,8 @@
> > })
> > #endif
> >
> > +#define dev_dbg(dev, fmt, ...) \
> > + dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
> > #define dev_info(dev, fmt, ...) \
> > dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
> > #define dev_warn(dev, fmt, ...) \
> > @@ -130,6 +132,24 @@ struct ipmmu_features {
> > unsigned int imuctr_ttsel_mask;
> > };
> >
>
>
> > […]
>
>
> >
> > @@ -1340,10 +1608,11 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
> > struct iommu_fwspec *fwspec;
> >
> > #ifdef CONFIG_HAS_PCI
> > + int ret;
> > +
> > if ( dev_is_pci(dev) )
> > {
> > struct pci_dev *pdev = dev_to_pci(dev);
> > - int ret;
> >
> > if ( devfn != pdev->devfn )
> > return 0;
> > @@ -1371,6 +1640,15 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
> > /* Let Xen know that the master device is protected by an IOMMU. */
> > dt_device_set_protected(dev_to_dt(dev));
> > }
> > +
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + if ( ipmmu_alloc_ctx_suspend(dev) )
> > + {
> > + dev_err(dev, "Failed to allocate context for suspend\n");
> > + return -ENOMEM;
> > + }
> > +#endif
>
> If this fails, the device will remain protected. I suggest we move this one before the `if ( !dev_is_pci(dev) ) { … }`
> block.
Good point, thanks.
Yes, this should be fixed. In the original ordering, a failure in
ipmmu_alloc_ctx_suspend() could leave a non-PCI DT device marked as
protected even though ipmmu_add_device() returned an error.
I'll reorder the code so dt_device_set_protected() is done only
after ipmmu_alloc_ctx_suspend() succeeds. This keeps the successful path
unchanged and avoids leaving stale protected state on failure.
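A minimal model of the reordering, assuming simplified hypothetical stand-ins (struct device, alloc_ctx_suspend(), add_device()) rather than the real ipmmu-vmsa types. The fallible allocation happens first, and the "protected" flag is only published once nothing else can fail:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the Xen structures involved. */
struct device {
    bool is_protected;   /* models dt_device_set_protected() */
    void *suspend_ctx;   /* models the per-device suspend context */
    bool fail_alloc;     /* test hook: force the allocation to fail */
};

static int alloc_ctx_suspend(struct device *dev)
{
    if ( dev->fail_alloc )
        return -ENOMEM;
    dev->suspend_ctx = calloc(1, 64);
    return dev->suspend_ctx ? 0 : -ENOMEM;
}

/* Reordered add_device(): allocate first, publish "protected" last. */
static int add_device(struct device *dev)
{
    int ret = alloc_ctx_suspend(dev);

    if ( ret )
        return ret;      /* nothing has been published yet */

    dev->is_protected = true;
    return 0;
}
```

With this ordering an allocation failure leaves the device exactly as it was before the call.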
Best regards,
Mykola
>
> The rest looks ok to me, but I’m not an expert of this part.
>
> Cheers,
> Luca
>
* Re: [PATCH v8 09/13] arm/smmu-v3: add suspend/resume handlers
2026-04-27 14:02 ` Luca Fancellu
@ 2026-05-05 15:23 ` Mykola Kvach
2026-05-08 12:21 ` Luca Fancellu
0 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-05-05 15:23 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Bertrand Marquis,
Rahul Singh, Stefano Stabellini, Julien Grall, Michal Orzel,
Volodymyr Babchuk
Hi Luca,
Thank you for the review.
On Mon, Apr 27, 2026 at 5:03 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> >
> > diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
> > index bf153227db..7607ffc9ca 100644
> > --- a/xen/drivers/passthrough/arm/smmu-v3.c
> > +++ b/xen/drivers/passthrough/arm/smmu-v3.c
> > @@ -1814,8 +1814,7 @@ static int arm_smmu_write_reg_sync(struct arm_smmu_device *smmu, u32 val,
> > }
> >
> > /* GBPA is "special" */
> > -static int __init arm_smmu_update_gbpa(struct arm_smmu_device *smmu,
> > - u32 set, u32 clr)
> > +static int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr)
> > {
> > int ret;
> > u32 reg, __iomem *gbpa = smmu->base + ARM_SMMU_GBPA;
> > @@ -1995,10 +1994,29 @@ err_free_evtq_irq:
> > return ret;
> > }
> >
> > +static int arm_smmu_enable_irqs(struct arm_smmu_device *smmu)
> > +{
> > + int ret;
> > + u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;
> > +
> > + if ( smmu->features & ARM_SMMU_FEAT_PRI )
> > + irqen_flags |= IRQ_CTRL_PRIQ_IRQEN;
> > +
> > + /* Enable interrupt generation on the SMMU */
> > + ret = arm_smmu_write_reg_sync(smmu, irqen_flags,
> > + ARM_SMMU_IRQ_CTRL, ARM_SMMU_IRQ_CTRLACK);
> > + if ( ret )
> > + {
> > + dev_warn(smmu->dev, "failed to enable irqs\n");
> > + return ret;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static int __init arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
> > {
> > int ret, irq;
> > - u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;
> >
> > /* Disable IRQs first */
> > ret = arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_IRQ_CTRL,
> > @@ -2028,22 +2046,7 @@ static int __init arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
> > }
> > }
> >
> > - if (smmu->features & ARM_SMMU_FEAT_PRI)
> > - irqen_flags |= IRQ_CTRL_PRIQ_IRQEN;
> > -
> > - /* Enable interrupt generation on the SMMU */
> > - ret = arm_smmu_write_reg_sync(smmu, irqen_flags,
> > - ARM_SMMU_IRQ_CTRL, ARM_SMMU_IRQ_CTRLACK);
> > - if (ret) {
> > - dev_warn(smmu->dev, "failed to enable irqs\n");
> > - goto err_free_irqs;
> > - }
> > -
> > return 0;
> > -
> > -err_free_irqs:
> > - arm_smmu_free_irqs(smmu);
> > - return ret;
> > }
> >
> > static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
> > @@ -2057,7 +2060,7 @@ static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
> > return ret;
> > }
> >
> > -static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
> > +static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
> > {
> > int ret;
> > u32 reg, enables;
> > @@ -2163,17 +2166,9 @@ static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
> > }
> > }
> >
> > - ret = arm_smmu_setup_irqs(smmu);
> > - if (ret) {
> > - dev_err(smmu->dev, "failed to setup irqs\n");
>
> We are moving this one to the probe and ..
>
> > + ret = arm_smmu_enable_irqs(smmu);
> > + if ( ret )
>
> replacing it with this one, but arm_smmu_setup_irqs() also calls arm_smmu_setup_unique_irqs(), which
> calls arm_smmu_setup_msis(). Are we sure that on resume we will get the same state?
This follows the split introduced in the Linux arm-smmu-v3 runtime/system sleep
series:
https://lore.kernel.org/linux-iommu/20260414194702.1229094-1-praan@google.com/
The intent is to keep IRQ handler registration as one-time probe state, while
reset/resume only restores the SMMU hardware state and re-enables interrupt
generation.
You are right that the MSI case needs extra care. In the Linux series this is
handled by arm_smmu_resume_msis(), which restores the SMMU-side MSI
configuration. I did not port that part in this patch because Xen SMMUv3 MSI
support is currently documented as unsupported and is not part of the
supported/tested path, so this patch only covers the wired IRQ path used by Xen
today.
If Xen SMMUv3 MSI support becomes usable in the future, the resume path will
need an equivalent MSI restore step before IRQ_CTRL is re-enabled.
I will add a code comment and update the commit message to make this scope
explicit. I also noticed that I accidentally dropped the reference to Pranjal's
Linux series while reworking the patch; I will restore the Link/attribution in
the next version.
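A simplified model of that split, assuming hypothetical names (struct smmu_model, setup_irqs(), enable_irqs(), device_reset()). The IRQ_CTRL bit positions are shown for illustration only. Handler registration runs once from probe, while every reset/resume rewrites IRQ_CTRL, which is also where a future MSI restore step would have to go:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative IRQ_CTRL enable bits (model only). */
#define IRQEN_GERROR (1u << 0)
#define IRQEN_PRIQ   (1u << 1)
#define IRQEN_EVTQ   (1u << 2)

struct smmu_model {
    bool has_pri;
    unsigned int handlers_registered; /* request_irq()-style, probe only */
    uint32_t irq_ctrl;                /* models SMMU_IRQ_CTRL contents */
};

/* One-time software state: called from probe only. */
static void setup_irqs(struct smmu_model *s)
{
    s->irq_ctrl = 0;          /* disable IRQ generation first */
    s->handlers_registered++; /* register handlers exactly once */
}

/* Hardware state: safe to repeat on every reset/resume. */
static void enable_irqs(struct smmu_model *s)
{
    uint32_t flags = IRQEN_EVTQ | IRQEN_GERROR;

    if ( s->has_pri )
        flags |= IRQEN_PRIQ;
    s->irq_ctrl = flags;
}

static void device_reset(struct smmu_model *s)
{
    /* A future MSI restore step would go here, before enable_irqs(). */
    enable_irqs(s);
}
```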
>
> > return ret;
> > - }
> > -
> > - /* Initialize tasklets for threaded IRQs*/
> > - tasklet_init(&smmu->evtq_irq_tasklet, arm_smmu_evtq_tasklet, smmu);
> > - tasklet_init(&smmu->priq_irq_tasklet, arm_smmu_priq_tasklet, smmu);
> > - tasklet_init(&smmu->combined_irq_tasklet, arm_smmu_combined_irq_tasklet,
> > - smmu);
> >
> > /* Enable the SMMU interface, or ensure bypass */
> > if (disable_bypass) {
> > @@ -2181,20 +2176,16 @@ static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
> > } else {
> > ret = arm_smmu_update_gbpa(smmu, 0, GBPA_ABORT);
> > if (ret)
> > - goto err_free_irqs;
> > + return ret;
> > }
> > ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
> > ARM_SMMU_CR0ACK);
> > if (ret) {
> > dev_err(smmu->dev, "failed to enable SMMU interface\n");
> > - goto err_free_irqs;
> > + return ret;
> > }
> >
> > return 0;
> > -
> > -err_free_irqs:
> > - arm_smmu_free_irqs(smmu);
> > - return ret;
> > }
> >
> > static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
> > @@ -2558,10 +2549,23 @@ static int __init arm_smmu_device_probe(struct platform_device *pdev)
> > if (ret)
> > goto out_free;
> >
> > + ret = arm_smmu_setup_irqs(smmu);
> > + if ( ret )
> > + {
> > + dev_err(smmu->dev, "failed to setup irqs\n");
> > + goto out_free;
> > + }
> > +
> > + /* Initialize tasklets for threaded IRQs*/
> > + tasklet_init(&smmu->evtq_irq_tasklet, arm_smmu_evtq_tasklet, smmu);
> > + tasklet_init(&smmu->priq_irq_tasklet, arm_smmu_priq_tasklet, smmu);
> > + tasklet_init(&smmu->combined_irq_tasklet, arm_smmu_combined_irq_tasklet,
> > + smmu);
> > +
> > /* Reset the device */
> > ret = arm_smmu_device_reset(smmu);
> > if (ret)
> > - goto out_free;
> > + goto out_free_irqs;
> >
> > /*
> > * Keep a list of all probed devices. This will be used to query
> > @@ -2575,6 +2579,8 @@ static int __init arm_smmu_device_probe(struct platform_device *pdev)
> >
> > return 0;
> >
> > +out_free_irqs:
> > + arm_smmu_free_irqs(smmu);
> >
> > out_free:
> > arm_smmu_free_structures(smmu);
> > @@ -2855,6 +2861,96 @@ static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
> > xfree(xen_domain);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +static void arm_smmu_reset_for_suspend_rollback(struct arm_smmu_device *smmu)
> > +{
> > + int ret = arm_smmu_device_reset(smmu);
> > +
> > + if ( ret )
> > + dev_err(smmu->dev, "Failed to reset during suspend rollback: %d\n",
> > + ret);
> > +}
> > +
> > +static int arm_smmu_suspend(void)
> > +{
> > + struct arm_smmu_device *smmu;
> > + int ret = 0;
> > +
> > + list_for_each_entry(smmu, &arm_smmu_devices, devices)
> > + {
> > + bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
> > +
> > + /* Abort all transactions before disable to avoid spurious bypass */
> > + ret = arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> > + if ( ret )
> > + goto fail;
> > +
> > + /* Disable the SMMU via CR0.EN and all queues except CMDQ */
> > + ret = arm_smmu_write_reg_sync(smmu, CR0_CMDQEN, ARM_SMMU_CR0,
> > + ARM_SMMU_CR0ACK);
> > + if ( ret )
> > + {
> > + dev_err(smmu->dev, "Timed-out while disabling smmu\n");
> > + goto fail;
> > + }
> > +
> > + /*
> > + * At this point the SMMU is completely disabled and won't access
> > + * any translation/config structures, even speculative accesses
> > + * aren't performed as per the IHI0070 spec (section 6.3.9.6).
> > + */
> > +
> > + /* Wait for the CMDQs to be drained to flush any pending commands */
> > + ret = queue_poll_cons(&smmu->cmdq.q, true, wfe);
> > + if ( ret )
> > + {
> > + dev_err(smmu->dev, "Draining queues timed-out\n");
> > + goto fail;
> > + }
>
> Polling the queue doesn’t guarantee that all prior commands are complete;
> I would use arm_smmu_cmdq_issue_sync() for that instead of the above.
>
> ret = arm_smmu_cmdq_issue_sync(smmu);
> if ( ret )
> goto fail;
Yes, I agree.
Polling CONS only shows that the SMMU has consumed the CMDQ entries; it does
not provide the completion semantics we want here. I will replace the direct
queue_poll_cons() in the suspend path with arm_smmu_cmdq_issue_sync(), while
CMDQ is still enabled, and update the comment/commit message accordingly.
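To illustrate the difference, here is a toy software model of a command queue (struct cmdq_model and the hw_* helpers are hypothetical, not SMMU driver code). CONS catching up with PROD only says that every entry was fetched, whereas a CMD_SYNC is only signalled once all earlier commands have completed:

```c
#include <assert.h>
#include <stdbool.h>

#define QSZ 8

/* Hypothetical model of a command queue; not the real SMMU interface. */
struct cmdq_model {
    unsigned int prod, cons;  /* producer/consumer indices */
    bool completed[QSZ];      /* per-command completion state */
};

static void submit(struct cmdq_model *q)
{
    assert(q->prod < QSZ);    /* keep the toy model wrap-free */
    q->completed[q->prod++] = false;
}

/* The SMMU fetches one command: CONS moves, the effect may still be pending. */
static void hw_consume(struct cmdq_model *q)
{
    assert(q->cons < q->prod);
    q->cons++;
}

/* Later, the SMMU finishes the side effects of command i. */
static void hw_complete(struct cmdq_model *q, unsigned int i)
{
    assert(i < q->cons);      /* only fetched commands can complete */
    q->completed[i] = true;
}

/* What polling CONS tells us: everything has been fetched. */
static bool cons_caught_up(const struct cmdq_model *q)
{
    return q->cons == q->prod;
}

/* What a CMD_SYNC guarantees once it signals: all earlier commands are done. */
static bool sync_would_signal(const struct cmdq_model *q)
{
    for ( unsigned int i = 0; i < q->cons; i++ )
        if ( !q->completed[i] )
            return false;
    return true;
}
```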
Best regards,
Mykola
>
> > +
> > + /* Disable everything */
> > + ret = arm_smmu_device_disable(smmu);
> > + if ( ret )
> > + goto fail;
> > +
> > + dev_dbg(smmu->dev, "Suspended smmu\n");
> > + }
> > +
> > + return 0;
> > +
> > + fail:
> > + /* Reset the device that failed as well as any already-suspended ones. */
> > + arm_smmu_reset_for_suspend_rollback(smmu);
> > +
> > + list_for_each_entry_continue_reverse(smmu, &arm_smmu_devices, devices)
> > + arm_smmu_reset_for_suspend_rollback(smmu);
> > +
> > + return ret;
> > +}
> > +
>
> Cheers,
> Luca
>
>
* Re: [PATCH v8 10/13] xen/arm: Resume memory management on Xen resume
2026-04-27 14:50 ` Luca Fancellu
@ 2026-05-05 15:55 ` Mykola Kvach
2026-05-08 13:26 ` Luca Fancellu
0 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-05-05 15:55 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Luca,
Thank you for the review.
On Mon, Apr 27, 2026 at 5:51 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> > On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
> >
> > From: Mirela Simonovic <mirela.simonovic@aggios.com>
> >
> > The MMU must be enabled during the resume path before restoring context,
> > as virtual addresses are used to access the saved context data.
> >
> > This patch adds MMU setup during resume by reusing the existing
> > enable_secondary_cpu_mm function, which enables data cache and the MMU.
>
> I don’t understand where this last part happens in this commit:
This is a leftover from before the commits were reorganized. I will update the
commit message in v9 so that it only describes what this patch actually does.
>
> > Before the MMU is enabled, the content of TTBR0_EL2 is changed to point
> > to init_ttbr (page tables used at runtime).
> >
> > Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> > Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> > Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in v7:
> > - no functional changes, just moved commit
> > ---
> > xen/arch/arm/arm64/head.S | 24 ++++++++++++++++++++++++
> > 1 file changed, 24 insertions(+)
> >
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index 72c7b24498..596e960152 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -561,6 +561,30 @@ END(efi_xen_start)
> >
> > #endif /* CONFIG_ARM_EFI */
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +FUNC(hyp_resume)
>
> I think we should mask all exceptions here:
> msr DAIFSet, 0xf
>
> until we resume correctly the status (VBAR_EL2, etc).
This was discussed in an earlier version:
https://patchew.org/Xen/cover.1741164138.git.xakep.amatop@gmail.com/2ef15cb605f987eb087c5496d123c47c01cc0ae7.1741164138.git.xakep.amatop@gmail.com/#CAGeoDV97no7mXSKd7auFu5E85wSXAHKWvqGW2=-VEAbkrTyU8Q@mail.gmail.com
For SYSTEM_SUSPEND, PSCI ties the call semantics to CPU_SUSPEND. In
particular, section 5.20.2 says that the caller must observe all the rules
described for CPU_SUSPEND, and section 6.4 explicitly says that the initial
state rules also apply to SYSTEM_SUSPEND.
For the return Exception level on AArch64, section 6.4.3.3 requires
SPSR_ELx.{D,A,I,F} to be set to {1, 1, 1, 1}. Therefore Xen expects to enter
this resume path with DAIF already masked by PSCI-compliant firmware.
I agree this assumption is not obvious from the code, so I will add a comment
at the resume entry point to document that this path relies on the PSCI initial
core configuration requirements.
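For reference, the invariant being relied on can be written down as a small C model (the PSR_* names and helpers are illustrative). D, A, I and F are SPSR/PSTATE bits 9..6. The `msr DAIFSet, 0xf` Luca suggested sets exactly those four bits, so with PSCI-compliant firmware it would not change anything, and otherwise it would act as a safety net:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AArch64 PSTATE/SPSR_ELx exception mask bits. */
#define PSR_F_BIT     (UINT64_C(1) << 6)
#define PSR_I_BIT     (UINT64_C(1) << 7)
#define PSR_A_BIT     (UINT64_C(1) << 8)
#define PSR_D_BIT     (UINT64_C(1) << 9)
#define PSR_DAIF_MASK (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)

/* PSCI 6.4.3.3: the resume entry must see SPSR_ELx.{D,A,I,F} == {1,1,1,1}. */
static bool psci_entry_daif_ok(uint64_t spsr)
{
    return (spsr & PSR_DAIF_MASK) == PSR_DAIF_MASK;
}

/* Model of "msr DAIFSet, #imm4": immediate bits 3..0 map to D, A, I, F. */
static uint64_t daifset(uint64_t pstate, unsigned int imm4)
{
    return pstate | ((uint64_t)(imm4 & 0xf) << 6);
}
```

For example, 0x3c9 (EL2h with all four masks set) passes the check, while 0x349 (I clear) does not.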
>
> > + /* Initialize the UART if earlyprintk has been enabled. */
> > +#ifdef CONFIG_EARLY_PRINTK
> > + bl init_uart
> > +#endif
> > + PRINT_ID("- Xen resuming -\r\n")
> > +
> > + bl check_cpu_mode
> > + bl cpu_init
> > +
> > + ldr x0, =start
> > + adr x20, start /* x20 := paddr (start) */
> > + sub x20, x20, x0 /* x20 := phys-offset */
> > + ldr lr, =mmu_resumed
> > + b enable_secondary_cpu_mm
> > +
> > +mmu_resumed:
> > + b .
> > +END(hyp_resume)
> > +
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> > /*
> > * Local variables:
> > * mode: ASM
> >
>
> This is more of a trampoline for the resuming core. I’m not sure whether it would be better to squash this
> into the following patch; the maintainers could state their preference.
Yes, this patch is only the low-level resume trampoline before the context
restore code is added by the following patch. I do not have a strong preference
between keeping it separate and squashing it into the next patch. I can squash
them in v9 unless the maintainers prefer to keep the trampoline separate.
Best regards,
Mykola
>
> Cheers,
> Luca
>
>
>
* Re: [PATCH v8 12/13] xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface)
2026-04-27 16:21 ` Luca Fancellu
@ 2026-05-05 16:15 ` Mykola Kvach
0 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-05-05 16:15 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Luca,
Thank you for the review.
On Mon, Apr 27, 2026 at 7:23 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> > On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
> >
> > From: Mirela Simonovic <mirela.simonovic@aggios.com>
> >
> > Invoke PSCI SYSTEM_SUSPEND to finalize Xen's suspend sequence on ARM64 platforms.
> > Pass the resume entry point (hyp_resume) as the first argument to EL3. The resume
> > handler is currently a stub and will be implemented later in assembly. Ignore the
> > context ID argument, as is done in Linux.
> >
> > Only enable this path when CONFIG_SYSTEM_SUSPEND is set and
> > PSCI version is >= 1.0.
> >
> > Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> > Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> > Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in v7:
> > - no changes
> > ---
> > xen/arch/arm/include/asm/psci.h | 1 +
> > xen/arch/arm/psci.c | 23 ++++++++++++++++++++++-
> > 2 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/arm/include/asm/psci.h b/xen/arch/arm/include/asm/psci.h
> > index 48a93e6b79..bb3c73496e 100644
> > --- a/xen/arch/arm/include/asm/psci.h
> > +++ b/xen/arch/arm/include/asm/psci.h
> > @@ -23,6 +23,7 @@ int call_psci_cpu_on(int cpu);
> > void call_psci_cpu_off(void);
> > void call_psci_system_off(void);
> > void call_psci_system_reset(void);
> > +int call_psci_system_suspend(void);
> >
> > /* Range of allocated PSCI function numbers */
> > #define PSCI_FNUM_MIN_VALUE _AC(0,U)
> > diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
> > index b6860a7760..c9d126b195 100644
> > --- a/xen/arch/arm/psci.c
> > +++ b/xen/arch/arm/psci.c
> > @@ -17,17 +17,20 @@
> > #include <asm/cpufeature.h>
> > #include <asm/psci.h>
> > #include <asm/acpi.h>
> > +#include <asm/suspend.h>
> >
> > /*
> > * While a 64-bit OS can make calls with SMC32 calling conventions, for
> > * some calls it is necessary to use SMC64 to pass or return 64-bit values.
> > - * For such calls PSCI_0_2_FN_NATIVE(x) will choose the appropriate
> > + * For such calls PSCI_*_FN_NATIVE(x) will choose the appropriate
> > * (native-width) function ID.
> > */
> > #ifdef CONFIG_ARM_64
> > #define PSCI_0_2_FN_NATIVE(name) PSCI_0_2_FN64_##name
> > +#define PSCI_1_0_FN_NATIVE(name) PSCI_1_0_FN64_##name
> > #else
> > #define PSCI_0_2_FN_NATIVE(name) PSCI_0_2_FN32_##name
> > +#define PSCI_1_0_FN_NATIVE(name) PSCI_1_0_FN32_##name
> > #endif
> >
> > uint32_t psci_ver;
> > @@ -60,6 +63,24 @@ void call_psci_cpu_off(void)
> > }
> > }
> >
> > +int call_psci_system_suspend(void)
> > +{
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + struct arm_smccc_res res;
> > +
> > + if ( psci_ver < PSCI_VERSION(1, 0) )
> > + return PSCI_NOT_SUPPORTED;
> > +
> > + /* 2nd argument (context ID) is not used */
> > + arm_smccc_smc(PSCI_1_0_FN_NATIVE(SYSTEM_SUSPEND), __pa(hyp_resume), &res);
>
> I think Linux passes 0 as the context ID, probably to mark that it’s not used. I think we should do the
> same.
Yes, agreed.
SYSTEM_SUSPEND takes context_id as the second PSCI argument, and Xen does
not use it. I will pass it explicitly as 0 instead of relying on the SMCCC
wrapper/default register state.
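A sketch of the intended register layout, using a hypothetical mock in place of arm_smccc_smc() so that the arguments can be checked. x0 carries the SMC64 SYSTEM_SUSPEND function ID (0xC400000E), x1 the physical address of the resume entry point, and x2 an explicit 0 context ID:

```c
#include <assert.h>
#include <stdint.h>

#define PSCI_1_0_FN64_SYSTEM_SUSPEND UINT32_C(0xC400000E)

/* Hypothetical stand-in for arm_smccc_smc(): records what would go in x0..x2. */
struct smc_call { uint32_t fid; uint64_t a1, a2; };
static struct smc_call last_call;

static int32_t mock_smc(uint32_t fid, uint64_t a1, uint64_t a2)
{
    last_call = (struct smc_call){ fid, a1, a2 };
    return 0; /* PSCI_SUCCESS */
}

static int32_t call_system_suspend(uint64_t resume_pa)
{
    /* x1 = entry point, x2 = context ID (unused by Xen, so pass 0 explicitly). */
    return mock_smc(PSCI_1_0_FN64_SYSTEM_SUSPEND, resume_pa, 0);
}
```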
Best regards,
Mykola
>
> > + return PSCI_RET(res);
> > +#else
> > + dprintk(XENLOG_WARNING,
> > + "SYSTEM_SUSPEND not supported (CONFIG_SYSTEM_SUSPEND disabled)\n");
> > + return PSCI_NOT_SUPPORTED;
> > +#endif
> > +}
> > +
> > void call_psci_system_off(void)
> > {
> > if ( psci_ver > PSCI_VERSION(0, 1) )
> >
>
> Cheers,
> Luca
>
>
>
* Re: [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain
2026-04-29 8:05 ` Luca Fancellu
@ 2026-05-05 20:34 ` Mykola Kvach
2026-05-07 22:25 ` Volodymyr Babchuk
2026-05-08 14:30 ` Luca Fancellu
0 siblings, 2 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-05-05 20:34 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné,
Rahul Singh
Hi Luca,
Thanks for the feedback.
On Wed, Apr 29, 2026 at 11:06 AM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> > diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
> > index e38566b0b7..4d1289776b 100644
> > --- a/xen/arch/arm/suspend.c
> > +++ b/xen/arch/arm/suspend.c
> > @@ -1,9 +1,190 @@
> > /* SPDX-License-Identifier: GPL-2.0-only */
> >
> > +#include <asm/psci.h>
> > #include <asm/suspend.h>
> >
> > +#include <public/sched.h>
> > +#include <xen/console.h>
> > +#include <xen/cpu.h>
> > +#include <xen/errno.h>
> > +#include <xen/iommu.h>
> > +#include <xen/sched.h>
> > +#include <xen/tasklet.h>
> > +
> > struct cpu_context cpu_context = {};
> >
> > +static int can_system_suspend(void)
> > +{
> > + int ret = 0;
> > + struct domain *d;
> > +
> > + rcu_read_lock(&domlist_read_lock);
> > +
> > + for_each_domain ( d )
> > + {
> > + bool domain_suspended;
> > +
> > + spin_lock(&d->shutdown_lock);
> > + domain_suspended = d->is_shut_down &&
> > + d->shutdown_code == SHUTDOWN_suspend;
> > + spin_unlock(&d->shutdown_lock);
> > +
> > + if ( domain_suspended )
> > + continue;
> > +
> > + printk(XENLOG_ERR
> > + "System suspend requires all domains to be shut down for suspend (dom%d: isn't in suspend state)\n",
>
> d->domain_id is unsigned if I’m not mistaken, so it wants %u (typedef uint16_t domid_t;)
Ack, I will fix it in v9.
>
> > + d->domain_id);
> > +
> > + ret = -EBUSY;
> > + break;
> > + }
> > +
> > + rcu_read_unlock(&domlist_read_lock);
> > +
> > + return ret;
> > +}
> > +
> > +/* Xen suspend. data identifies the domain that initiated suspend. */
> > +static void system_suspend(void *data)
> > +{
> > + int status;
> > + unsigned long flags;
> > + struct domain *d = (struct domain *)data;
> > +
> > + BUG_ON(system_state != SYS_STATE_active);
> > +
> > + system_state = SYS_STATE_suspend;
> > +
> > + printk("Xen suspending...\n");
> > +
> > + freeze_domains();
> > + scheduler_disable();
> > +
> > + status = can_system_suspend();
> > + if ( status )
> > + {
> > + system_state = SYS_STATE_resume;
> > + goto resume_scheduler;
>
> When we hit an error and take the resume_scheduler path, we restore the
> guest context previously saved in do_psci_1_0_system_suspend(). So am I
> correct in saying that the guest won’t get any PSCI error back and will be resumed
> from its resume entry point?
>
> If so, should we have a different path that returns a PSCI error (PSCI_*) in the guest's
> x0 and skips the context restore?
You are right about the current control flow: once the virtual
SYSTEM_SUSPEND request has been accepted and the domain has been parked, a
later failure in the Xen-wide suspend path resumes the domain through the normal
domain resume path, rather than returning a PSCI error from the original call.
This is intentional in the current design. The virtual PSCI SYSTEM_SUSPEND
path parks the domain and saves its resume context. The actual Xen-wide host
suspend is a separate step that is attempted only after all domains are
suspended.
So a failure in the later Xen-wide suspend step is treated as an abort of the
host suspend attempt after the domain suspend was already accepted. The domain
is then resumed through the existing domain resume path, similarly to the
toolstack/xl suspend-resume flow, rather than by re-entering the guest PSCI
call path and modifying the saved vCPU context again.
I agree this design is not obvious from the patch. I will clarify the commit
message and comments. If you or the maintainers think that failures before the
physical SYSTEM_SUSPEND call succeeds should be reported back through the
original virtual PSCI call, then this would require a different flow. I was
trying to avoid that extra complexity in this series.
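The abort semantics can be sketched as an unwind ladder. This is a hypothetical, simplified model of system_suspend(): it only undoes steps that completed, whereas the real code also calls enable_nonboot_cpus() after a failed disable_nonboot_cpus(). Completed steps are undone newest first, and domain_resume() runs on every path, so the guest continues from its saved resume entry point instead of seeing a PSCI error:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the suspend ladder; step names mirror system_suspend(). */
enum step { CPUS, TIME, IOMMU, CONSOLE, GIC, NSTEPS };

static char log_buf[64];

/* Record which step was undone, using one letter per step. */
static void undo(enum step s)
{
    static const char tag[NSTEPS] = { 'c', 't', 'i', 'o', 'g' };
    size_t n = strlen(log_buf);

    log_buf[n] = tag[s];
    log_buf[n + 1] = '\0';
}

/* fail_at == NSTEPS means every step succeeded (and the host later woke up). */
static int suspend_model(int fail_at, int *domain_resumed)
{
    int done = 0;
    int status = 0;

    log_buf[0] = '\0';
    for ( ; done < NSTEPS; done++ )
        if ( done == fail_at )
        {
            status = -1;
            break;
        }

    /* Unwind only what succeeded, newest first, like the goto labels. */
    while ( done-- > 0 )
        undo((enum step)done);

    *domain_resumed = 1; /* domain_resume(d) runs on every path */
    return status;
}
```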
>
> > + }
> > +
> > + /*
> > + * Non-boot CPUs have to be disabled on suspend and enabled on resume
> > + * (hotplug-based mechanism). Disabling non-boot CPUs will lead to PSCI
> > + * CPU_OFF to be called by each non-boot CPU. Depending on the underlying
> > + * platform capabilities, this may lead to the physical powering down of
> > + * CPUs.
> > + */
> > + status = disable_nonboot_cpus();
> > + if ( status )
> > + {
> > + system_state = SYS_STATE_resume;
> > + goto resume_nonboot_cpus;
> > + }
> > +
> > + time_suspend();
> > +
> > + status = iommu_suspend();
> > + if ( status )
> > + {
> > + system_state = SYS_STATE_resume;
> > + goto resume_time;
> > + }
> > +
> > + console_start_sync();
> > + status = console_suspend();
> > + if ( status )
> > + {
> > + dprintk(XENLOG_ERR, "Failed to suspend the console, err=%d\n", status);
> > + system_state = SYS_STATE_resume;
> > + goto resume_end_sync;
> > + }
> > +
> > + local_irq_save(flags);
> > + status = gic_suspend();
> > + if ( status )
> > + {
> > + system_state = SYS_STATE_resume;
> > + goto resume_irqs;
> > + }
> > +
> > + set_init_ttbr(xen_pgtable);
> > +
> > + /*
> > + * Enable identity mapping before entering suspend to simplify
> > + * the resume path
> > + */
> > + update_boot_mapping(true);
> > +
> > + if ( prepare_resume_ctx(&cpu_context) )
> > + {
> > + status = call_psci_system_suspend();
> > + /*
> > + * If suspend is finalized properly by above system suspend PSCI call,
> > + * the code below in this 'if' branch will never execute. Execution
> > + * will continue from hyp_resume which is the hypervisor's resume point.
> > + * In hyp_resume CPU context will be restored and since link-register is
> > + * restored as well, it will appear to return from prepare_resume_ctx.
> > + * The difference in returning from prepare_resume_ctx on system suspend
> > + * versus resume is in function's return value: on suspend, the return
> > + * value is a non-zero value, on resume it is zero. That is why the
> > + * control flow will not re-enter this 'if' branch on resume.
> > + */
> > + if ( status )
> > + dprintk(XENLOG_WARNING, "PSCI system suspend failed, err=%d\n",
> > + status);
> > + }
> > +
> > + system_state = SYS_STATE_resume;
> > + update_boot_mapping(false);
> > +
> > + gic_resume();
> > +
> > + resume_irqs:
> > + local_irq_restore(flags);
> > +
> > + console_resume();
> > + resume_end_sync:
> > + console_end_sync();
> > +
> > + iommu_resume();
> > +
> > + resume_time:
> > + time_resume();
> > +
> > + resume_nonboot_cpus:
> > + /*
> > + * The rcu_barrier() has to be added to ensure that the per cpu area is
> > + * freed before a non-boot CPU tries to initialize it (_free_percpu_area()
> > + * has to be called before the init_percpu_area()). This scenario occurs
> > + * when non-boot CPUs are hot-unplugged on suspend and hotplugged on resume.
> > + */
> > + rcu_barrier();
> > + enable_nonboot_cpus();
> > +
> > + resume_scheduler:
> > + scheduler_enable();
> > + thaw_domains();
> > +
> > + system_state = SYS_STATE_active;
> > +
> > + printk("Resume (status %d)\n", status);
> > +
> > + domain_resume(d);
> > +}
> > +
> > +static DECLARE_TASKLET(system_suspend_tasklet, system_suspend, NULL);
> > +
> > +void host_system_suspend(struct domain *d)
> > +{
> > + system_suspend_tasklet.data = (void *)d;
> > + /*
> > + * The suspend procedure has to be finalized by the pCPU#0 (non-boot pCPUs
> > + * will be disabled during the suspend).
> > + */
> > + tasklet_schedule_on_cpu(&system_suspend_tasklet, 0);
> > +}
> > +
> > /*
> > * Local variables:
> > * mode: C
> > diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> > index bd87ec430d..8fb9172186 100644
> > --- a/xen/arch/arm/vpsci.c
> > +++ b/xen/arch/arm/vpsci.c
> > @@ -5,6 +5,7 @@
> >
> > #include <asm/current.h>
> > #include <asm/domain.h>
> > +#include <asm/suspend.h>
> > #include <asm/vgic.h>
> > #include <asm/vpsci.h>
> > #include <asm/event.h>
> > @@ -232,8 +233,7 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> > if ( is_64bit_domain(d) && is_thumb )
> > return PSCI_INVALID_ADDRESS;
> >
> > - /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> > - if ( is_hardware_domain(d) )
> > + if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
> > return PSCI_NOT_SUPPORTED;
> >
> > /* Ensure that all CPUs other than the calling one are offline */
> > @@ -266,6 +266,9 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> > "SYSTEM_SUSPEND requested, epoint=%#"PRIregister", cid=%#"PRIregister"\n",
> > epoint, cid);
> >
> > + if ( is_control_domain(d) )
>
> Why is_control_domain() here and not is_hardware_domain() ?
The use of is_control_domain() is intentional.
The intended model is that Xen-wide host suspend is orchestrated by the
privileged management/control domain. The control domain coordinates the
toolstack side, asks other domains to enter suspend, and then issues the final
SYSTEM_SUSPEND request to Xen.
This does not have to be the same entity as the hardware domain. If the
hardware domain is separate, it is one of the domains that the control domain
parks before the final host suspend step.
The hwdom-specific checks in this patch have a different purpose: they avoid
the old hwdom_shutdown() path for SHUTDOWN_suspend and allow the hardware
domain to be parked as part of the suspend sequence. They do not define the
policy for who is allowed to trigger Xen-wide host suspend.
That said, this policy may not be optimal for all configurations, especially
when the control and hardware domain roles are split. I would appreciate your
view, as well as the maintainers' views, on whether the trigger should remain
control-domain based, be tied to the hardware domain instead, or be expressed
through a separate host-suspend capability/helper.
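If the third option were chosen, one possible shape is a single policy helper that both the PSCI_FEATURES and SYSTEM_SUSPEND paths consult. The names below (struct domain_model, enum trigger_policy, may_trigger_host_suspend()) are purely hypothetical and only show how split control/hardware roles would be handled:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical domain model: the two roles can be combined or split. */
struct domain_model {
    bool is_control;
    bool is_hardware;
};

enum trigger_policy { TRIGGER_CONTROL, TRIGGER_HARDWARE };

/* Single place that answers "may this domain trigger host suspend?". */
static bool may_trigger_host_suspend(const struct domain_model *d,
                                     enum trigger_policy p)
{
    return p == TRIGGER_CONTROL ? d->is_control : d->is_hardware;
}
```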
>
> > + host_system_suspend(d);
> > +
> > return rc;
> > }
> >
> > @@ -290,7 +293,10 @@ static int32_t do_psci_1_0_features(uint32_t psci_func_id)
> > return 0;
> > case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> > case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> > - return is_hardware_domain(current->domain) ? PSCI_NOT_SUPPORTED : 0;
> > + if ( IS_ENABLED(CONFIG_SYSTEM_SUSPEND) ||
> > + !is_hardware_domain(current->domain) )
>
> Should this also have the condition “is hardware domain and psci_ver >= PSCI_VERSION(1, 0)”?
> Otherwise, if the host machine doesn’t support PSCI 1.0, we would return OK here but the call would
> fail later in call_psci_system_suspend()?
Good point.
I agree that, for the domain allowed to trigger Xen-wide suspend, Xen should
not advertise SYSTEM_SUSPEND if the host suspend path cannot be used.
I think this should be checked as an explicit host SYSTEM_SUSPEND capability,
rather than only as psci_ver >= PSCI_VERSION(1, 0). The same capability check
also needs to be enforced in the actual SYSTEM_SUSPEND handler before parking
the domain, because a caller may invoke SYSTEM_SUSPEND directly without first
querying PSCI_FEATURES.
For ordinary guests, the physical PSCI version is not relevant because they
cannot trigger host suspend; their SYSTEM_SUSPEND path is virtual.
I will make this consistent in v9: PSCI_FEATURES will advertise SYSTEM_SUSPEND
for the host-suspend-triggering domain only when the host SYSTEM_SUSPEND backend
is available, and the actual SYSTEM_SUSPEND path will enforce the same check.
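A minimal sketch of that consistency, assuming one hypothetical helper (`host_suspend_backend_available()`) that both the PSCI_FEATURES query and the SYSTEM_SUSPEND handler consult. The `fake_domain` type, its `triggers_host_suspend` flag and the `host_system_suspend_supported` variable are illustrative stand-ins, not Xen definitions; whether the flag means "control domain" or "hardware domain" is the open policy question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PSCI return codes, values as defined by the PSCI specification. */
#define PSCI_SUCCESS        0
#define PSCI_NOT_SUPPORTED  (-1)

/* Hypothetical, simplified domain descriptor. */
struct fake_domain {
    bool triggers_host_suspend;   /* domain allowed to start host suspend */
};

/* Hypothetical: set at boot once the physical PSCI firmware and other
 * prerequisites (e.g. IOMMU suspend support) have been checked. */
static bool host_system_suspend_supported;

static bool host_suspend_backend_available(void)
{
    return host_system_suspend_supported;
}

/* PSCI_FEATURES(SYSTEM_SUSPEND): advertise only what the handler accepts. */
static int32_t features_system_suspend(const struct fake_domain *d)
{
    if ( d->triggers_host_suspend && !host_suspend_backend_available() )
        return PSCI_NOT_SUPPORTED;
    return PSCI_SUCCESS;   /* ordinary guests: purely virtual suspend */
}

/* SYSTEM_SUSPEND: re-check, since a caller may skip PSCI_FEATURES. */
static int32_t handle_system_suspend(const struct fake_domain *d)
{
    if ( d->triggers_host_suspend && !host_suspend_backend_available() )
        return PSCI_NOT_SUPPORTED;   /* reject before parking the domain */
    /* ... park the domain and, if needed, schedule host suspend ... */
    return PSCI_SUCCESS;
}
```

Keeping the check in a single helper means the advertised capability and the enforced one cannot drift apart.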
>
> > + return 0;
> > + fallthrough;
> > default:
> > return PSCI_NOT_SUPPORTED;
> > }
> > diff --git a/xen/common/Kconfig b/xen/common/Kconfig
> > index 0a20aa0a12..feb1336f46 100644
> > --- a/xen/common/Kconfig
> > +++ b/xen/common/Kconfig
> > @@ -137,6 +137,9 @@ config HAS_EX_TABLE
> > config HAS_FAST_MULTIPLY
> > bool
> >
> > +config HAS_HWDOM_SYSTEM_SUSPEND
> > + bool
> > +
> > config HAS_IOPORTS
> > bool
> >
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index bb9e210c28..d3edfb2a13 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -1375,6 +1375,11 @@ void __domain_crash(struct domain *d)
> > domain_shutdown(d, SHUTDOWN_crash);
> > }
> >
> > +static inline bool want_hwdom_shutdown(uint8_t reason)
> > +{
> > + return !IS_ENABLED(CONFIG_HAS_HWDOM_SYSTEM_SUSPEND) ||
> > + reason != SHUTDOWN_suspend;
> > +}
> >
> > int domain_shutdown(struct domain *d, u8 reason)
> > {
> > @@ -1391,7 +1396,7 @@ int domain_shutdown(struct domain *d, u8 reason)
> > d->shutdown_code = reason;
> > reason = d->shutdown_code;
> >
> > - if ( is_hardware_domain(d) )
> > + if ( is_hardware_domain(d) && want_hwdom_shutdown(reason) )
> > hwdom_shutdown(reason);
> >
> > if ( d->is_shutting_down )
> > diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> > index 22d306d0cb..45f29ef8ec 100644
> > --- a/xen/drivers/passthrough/arm/smmu.c
> > +++ b/xen/drivers/passthrough/arm/smmu.c
> > @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
> > xfree(xen_domain);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +static int arm_smmu_suspend(void)
> > +{
> > + return -ENOSYS;
> > +}
> > +#endif
>
> Maybe we also want to gate the feature on !CONFIG_ARM_SMMU? I would wait for the maintainers'
> view on this.
I feel that gating this strictly on !CONFIG_ARM_SMMU might not be the best
approach here.
CONFIG_ARM_SMMU is a build-time option and does not mean that an old SMMUv1/v2
device is actually present. Using it would disable system suspend even on
platforms where only SMMUv3 is used, because CONFIG_ARM_SMMU is enabled by
default for Arm.
The condition should be runtime-based: whether the active/probed IOMMU devices
have system suspend/resume support. For the old ARM SMMU driver this is not
implemented today, so a platform with an SMMUv1/v2 instance should not expose
or attempt host suspend.
I think we should handle this by tracking whether any old ARM SMMUv1/v2 device
was actually probed, or by adding a generic IOMMU suspend capability check. Then
the host suspend availability check can reject system suspend only when such an
unsupported IOMMU is present, instead of disabling the feature for all Arm builds
with CONFIG_ARM_SMMU enabled.
I would be interested to hear if you or the maintainers see a better way to
express this capability.
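One possible shape for the runtime condition, sketched with hypothetical names (none of these are existing Xen interfaces): each IOMMU driver reports at probe time whether it implements suspend/resume, and the host suspend availability check consults the aggregated result instead of a build-time option.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical count of probed IOMMU instances without suspend support. */
static unsigned int nr_iommus_without_suspend;

/* Hypothetical hook a driver would call from its probe routine. */
static void iommu_note_probed(bool has_suspend_support)
{
    if ( !has_suspend_support )
        nr_iommus_without_suspend++;
}

/*
 * Host suspend is possible only if every probed IOMMU can be suspended.
 * Merely building the legacy SMMUv1/v2 driver does not block suspend;
 * only an actually probed instance does.
 */
static bool iommus_support_system_suspend(void)
{
    return nr_iommus_without_suspend == 0;
}
```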
Best regards,
Mykola
>
> Cheers,
> Luca
>
>
* Re: [PATCH v8 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions
2026-04-21 13:24 ` Luca Fancellu
@ 2026-05-07 7:48 ` Mykola Kvach
2026-05-08 10:56 ` Luca Fancellu
0 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-05-07 7:48 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Luca,
Thank you for the feedback.
On Tue, Apr 21, 2026 at 4:26 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> >
> > diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> > index b23e72a3d0..dbff470962 100644
> > --- a/xen/arch/arm/gic-v2.c
> > +++ b/xen/arch/arm/gic-v2.c
> > @@ -1098,6 +1098,129 @@ static int gicv2_iomem_deny_access(struct domain *d)
> > return iomem_deny_access(d, mfn, mfn + nr);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +/* This struct represents block of 32 IRQs */
> > +struct irq_block {
> > + uint32_t icfgr[2]; /* 2 registers of 16 IRQs each */
> > + uint32_t ipriorityr[8];
> > + uint32_t isenabler;
> > + uint32_t isactiver;
> > + uint32_t itargetsr[8];
> > +};
> > +
> > +/* GICv2 registers to be saved/restored on system suspend/resume */
> > +struct gicv2_context {
> > + /* GICC context */
> > + struct cpu_ctx {
> > + uint32_t ctlr;
> > + uint32_t pmr;
> > + uint32_t bpr;
> > + } cpu;
> > +
> > + /* GICD context */
> > + struct dist_ctx {
> > + uint32_t ctlr;
> > + /* Includes banked SGI/PPI state for the boot CPU. */
> > + struct irq_block *irqs;
> > + } dist;
> > +};
> > +
> > +static struct gicv2_context gic_ctx;
> > +
> > +static int gicv2_suspend(void)
> > +{
> > + unsigned int i, blocks = DIV_ROUND_UP(gicv2_info.nr_lines, 32);
> > +
> > + /* Save GICC_CTLR configuration. */
> > + gic_ctx.cpu.ctlr = readl_gicc(GICC_CTLR);
> > +
> > + /* Quiesce the GIC CPU interface before suspend. */
> > + gicv2_cpu_disable();
> > +
> > + /* Save GICD configuration */
> > + gic_ctx.dist.ctlr = readl_gicd(GICD_CTLR);
> > + writel_gicd(0, GICD_CTLR);
> > +
> > + gic_ctx.cpu.pmr = readl_gicc(GICC_PMR);
> > + gic_ctx.cpu.bpr = readl_gicc(GICC_BPR);
> > +
> > + for ( i = 0; i < blocks; i++ )
> > + {
> > + struct irq_block *irqs = gic_ctx.dist.irqs + i;
> > + size_t j, off = i * sizeof(irqs->isenabler);
> > +
> > + irqs->isenabler = readl_gicd(GICD_ISENABLER + off);
> > + irqs->isactiver = readl_gicd(GICD_ISACTIVER + off);
> > +
> > + off = i * sizeof(irqs->ipriorityr);
> > + for ( j = 0; j < ARRAY_SIZE(irqs->ipriorityr); j++ )
> > + {
> > + irqs->ipriorityr[j] = readl_gicd(GICD_IPRIORITYR + off + j * 4);
> > + irqs->itargetsr[j] = readl_gicd(GICD_ITARGETSR + off + j * 4);
>
> regarding GICD_ITARGETSR ...
>
> > + }
> > +
> > + off = i * sizeof(irqs->icfgr);
> > + for ( j = 0; j < ARRAY_SIZE(irqs->icfgr); j++ )
> > + irqs->icfgr[j] = readl_gicd(GICD_ICFGR + off + j * 4);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void gicv2_resume(void)
> > +{
> > + unsigned int i, blocks = DIV_ROUND_UP(gicv2_info.nr_lines, 32);
> > +
> > + gicv2_cpu_disable();
> > + /* Disable distributor */
> > + writel_gicd(0, GICD_CTLR);
> > +
> > + for ( i = 0; i < blocks; i++ )
> > + {
> > + struct irq_block *irqs = gic_ctx.dist.irqs + i;
> > + size_t j, off = i * sizeof(irqs->isenabler);
> > +
> > + writel_gicd(GENMASK(31, 0), GICD_ICENABLER + off);
> > + writel_gicd(irqs->isenabler, GICD_ISENABLER + off);
> > +
> > + writel_gicd(GENMASK(31, 0), GICD_ICACTIVER + off);
> > + writel_gicd(irqs->isactiver, GICD_ISACTIVER + off);
> > +
> > + off = i * sizeof(irqs->ipriorityr);
> > + for ( j = 0; j < ARRAY_SIZE(irqs->ipriorityr); j++ )
> > + {
> > + writel_gicd(irqs->ipriorityr[j], GICD_IPRIORITYR + off + j * 4);
> > + writel_gicd(irqs->itargetsr[j], GICD_ITARGETSR + off + j * 4);
>
> … please let me know if I am reading this loop correctly, but here GICD_ITARGETSR0 … 7
> are restored when i=0, while the specification says that this block is read-only on
> multiprocessor implementations, so we should skip the restore part.
> Saving it could also be skipped, because each field returns a value that corresponds
> only to the processor reading the register.
>
> 4.3.12 User constraints [1]
You are right, thanks for pointing this out.
I will skip saving/restoring the read-only GICD_ITARGETSR0..7 block in v9.
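A minimal sketch of how the save loop could skip only that block, assuming (as in the patch) that block 0 covers IRQs 0-31, i.e. GICD_ITARGETSR0..7. Register access is mocked with an array so the snippet runs standalone; `fake_gicd`, `mock_readl_gicd` and the trimmed `irq_block` are illustrative, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BLOCKS        4       /* e.g. 128 interrupt lines / 32 per block */
#define GICD_IPRIORITYR  0x400   /* GICv2 distributor register offsets */
#define GICD_ITARGETSR   0x800

/* Trimmed-down irq_block: only the two byte-per-IRQ register arrays. */
struct irq_block {
    uint32_t ipriorityr[8];
    uint32_t itargetsr[8];
};

static uint32_t fake_gicd[0x1000 / 4];   /* mocked distributor MMIO space */
static struct irq_block saved[NR_BLOCKS];

static uint32_t mock_readl_gicd(size_t off)
{
    return fake_gicd[off / 4];
}

static void save_priority_and_targets(void)
{
    for ( unsigned int i = 0; i < NR_BLOCKS; i++ )
    {
        /* 32 IRQs x 1 byte each = 8 registers = 32 bytes per block. */
        size_t off = i * sizeof(saved[i].ipriorityr);

        for ( size_t j = 0; j < 8; j++ )
        {
            saved[i].ipriorityr[j] =
                mock_readl_gicd(GICD_IPRIORITYR + off + j * 4);

            /*
             * GICD_ITARGETSR0..7 (block 0, SGIs/PPIs) are read-only on a
             * multiprocessor implementation and only reflect the reading
             * CPU, so they are skipped.
             */
            if ( i > 0 )
                saved[i].itargetsr[j] =
                    mock_readl_gicd(GICD_ITARGETSR + off + j * 4);
        }
    }
}
```

The same `i > 0` guard would apply to the GICD_ITARGETSR write in gicv2_resume(); the other block 0 registers are still saved and restored.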
>
> > + }
> > +
> > + off = i * sizeof(irqs->icfgr);
> > + for ( j = 0; j < ARRAY_SIZE(irqs->icfgr); j++ )
> > + writel_gicd(irqs->icfgr[j], GICD_ICFGR + off + j * 4);
> > + }
> > +
> > + /* Make sure all registers are restored and enable distributor */
> > + writel_gicd(gic_ctx.dist.ctlr, GICD_CTLR);
> > +
> > + /* Restore GIC CPU interface configuration */
> > + writel_gicc(gic_ctx.cpu.pmr, GICC_PMR);
> > + writel_gicc(gic_ctx.cpu.bpr, GICC_BPR);
> > +
> > + /* Enable GIC CPU interface */
> > + writel_gicc(gic_ctx.cpu.ctlr, GICC_CTLR);
> > +}
> > +
>
> I also see that we don’t save the pending SGI state (GICD_CPENDSGIRn/GICD_SPENDSGIRn) or the Active Priorities
> registers (GICC_APRn/GICC_NSAPRn, the latter only if the Security Extensions are implemented), as described in
> [1] “4.5 Preserving and restoring GIC state”. Was that intentional?
Yes, this was intentional.
The GICv2 suspend callback is called at a quiescent point in the
SYSTEM_SUSPEND path: all domains are already shut down for suspend, guest
execution is quiesced, the scheduler is disabled, non-boot CPUs have been
offlined, and CPU0 enters gic_suspend() with local interrupts disabled.
For SGIs, I don't consider GICD_CPENDSGIRn/GICD_SPENDSGIRn part of the saved
host GIC context. Xen uses physical SGIs as IPIs, and IPI delivery is an
internal synchronization mechanism, not architectural state that should be
replayed after SYSTEM_SUSPEND. Guest SGI state is virtual GIC state and is not
represented by these physical GICD SGI pending registers.
For GICC_APRn/GICC_NSAPRn, those registers describe active priority state for
interrupts already acknowledged by the CPU interface. The final suspend path is
not expected to run with an active physical interrupt context. If those
registers were non-zero there, restoring only APR/NSAPR would not make the
corresponding interrupt handling context valid after resume, and could instead
leave the CPU interface with stale active priority state.
So I did not add save/restore for GICD_CPENDSGIRn/GICD_SPENDSGIRn or
GICC_APRn/GICC_NSAPRn in this patch. I can add a short comment in v9 to make
this scope explicit.
Please let me know if you think there is a suspend/resume path where this
state still needs to be preserved.
Best regards,
Mykola
>
> [1] https://developer.arm.com/documentation/ihi0048/bb/?lang=en
>
> Cheers,
> Luca
>
>
* Re: [PATCH v8 10/13] xen/arm: Resume memory management on Xen resume
2026-04-02 10:45 ` [PATCH v8 10/13] xen/arm: Resume memory management on Xen resume Mykola Kvach
2026-04-27 14:50 ` Luca Fancellu
@ 2026-05-07 22:06 ` Volodymyr Babchuk
2026-05-08 20:59 ` Mykola Kvach
1 sibling, 1 reply; 66+ messages in thread
From: Volodymyr Babchuk @ 2026-05-07 22:06 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel
Hi Mykola,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> The MMU must be enabled during the resume path before restoring context,
> as virtual addresses are used to access the saved context data.
>
I agree with Luca, this patch does not make sense as it is. I don't see
why it should be separated from the rest of the resume path that is
added in the next patch.
> This patch adds MMU setup during resume by reusing the existing
> enable_secondary_cpu_mm function, which enables data cache and the MMU.
> Before the MMU is enabled, the content of TTBR0_EL2 is changed to point
> to init_ttbr (page tables used at runtime).
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in v7:
> - no functional changes, just moved commit
> ---
> xen/arch/arm/arm64/head.S | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 72c7b24498..596e960152 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -561,6 +561,30 @@ END(efi_xen_start)
>
> #endif /* CONFIG_ARM_EFI */
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +FUNC(hyp_resume)
> + /* Initialize the UART if earlyprintk has been enabled. */
> +#ifdef CONFIG_EARLY_PRINTK
> + bl init_uart
> +#endif
> + PRINT_ID("- Xen resuming -\r\n")
> +
> + bl check_cpu_mode
> + bl cpu_init
> +
> + ldr x0, =start
> + adr x20, start /* x20 := paddr (start) */
> + sub x20, x20, x0 /* x20 := phys-offset */
> + ldr lr, =mmu_resumed
> + b enable_secondary_cpu_mm
> +
> +mmu_resumed:
> + b .
> +END(hyp_resume)
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> /*
> * Local variables:
> * mode: ASM
--
WBR, Volodymyr
* Re: [PATCH v8 11/13] xen/arm: Save/restore context on suspend/resume
2026-04-02 10:45 ` [PATCH v8 11/13] xen/arm: Save/restore context on suspend/resume Mykola Kvach
2026-04-27 15:26 ` Luca Fancellu
@ 2026-05-07 22:17 ` Volodymyr Babchuk
2026-05-08 10:38 ` Mykola Kvach
2026-05-11 16:00 ` Oleksandr Tyshchenko
2 siblings, 1 reply; 66+ messages in thread
From: Volodymyr Babchuk @ 2026-05-07 22:17 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel
Hi Mykola,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> The context of CPU general purpose and system control registers must be
> saved on suspend and restored on resume. This is implemented in
> prepare_resume_ctx and before the return from the hyp_resume function.
> The prepare_resume_ctx must be invoked just before the PSCI system suspend
> call is issued to the ATF. The prepare_resume_ctx must return a non-zero
> value so that the calling 'if' statement evaluates to true, causing the
> system suspend to be invoked. Upon resume, the context saved on suspend
> will be restored, including the link register. Therefore, after
> restoring the context, the control flow will return to the address
> pointed to by the saved link register, which is the place from which
> prepare_resume_ctx was called. To ensure that the calling 'if' statement
> does not again evaluate to true and initiate system suspend, hyp_resume
> must return a zero value after restoring the context.
>
> Note that the order of saving register context into cpu_context structure
> must match the order of restoring.
>
> Support for ARM32 is not implemented. Instead, compilation fails with a
> build-time error if suspend is enabled for ARM32.
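The "return twice" contract described above is the same idea as C's setjmp()/longjmp(), with the opposite polarity: setjmp() returns 0 on the saving call and non-zero when resumed, whereas prepare_resume_ctx returns non-zero on save and hyp_resume makes it appear to return zero. The standalone analogy below only illustrates the control flow; it does not touch EL2 state, and `fake_firmware_suspend()` is a hypothetical stand-in for a successful PSCI SYSTEM_SUSPEND followed by hyp_resume.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf resume_ctx;        /* stands in for the saved cpu_context */
static int suspend_branch_runs;   /* how often the "go to sleep" branch ran */

/* Hypothetical firmware stand-in: instead of powering off, it "wakes up"
 * at the saved point, much as hyp_resume restores SP/LR and returns into
 * the caller of prepare_resume_ctx. */
static void fake_firmware_suspend(void)
{
    longjmp(resume_ctx, 1);
}

/* Returns 0 after "resuming"; the suspend branch runs exactly once. */
static int suspend_cycle(void)
{
    /* setjmp() yields 0 on the saving call and 1 when re-entered. */
    if ( setjmp(resume_ctx) == 0 )
    {
        suspend_branch_runs++;
        fake_firmware_suspend();
        return -1;                /* only reached if "suspend" fails */
    }

    return 0;                     /* resume path */
}
```

As with prepare_resume_ctx, this only works because the caller's stack frame is still valid when execution "returns" a second time.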
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in v8:
> - fix alignments in code
>
> Changes in v7:
> - no changes
> ---
> xen/arch/arm/Makefile | 1 +
> xen/arch/arm/arm64/head.S | 90 +++++++++++++++++++++++++++++-
> xen/arch/arm/include/asm/suspend.h | 26 +++++++++
> xen/arch/arm/suspend.c | 14 +++++
> 4 files changed, 130 insertions(+), 1 deletion(-)
> create mode 100644 xen/arch/arm/suspend.c
>
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 69200b2728..c36158271a 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -51,6 +51,7 @@ obj-y += setup.o
> obj-y += shutdown.o
> obj-y += smp.o
> obj-y += smpboot.o
> +obj-$(CONFIG_SYSTEM_SUSPEND) += suspend.o
> obj-$(CONFIG_SYSCTL) += sysctl.o
> obj-y += time.o
> obj-y += traps.o
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 596e960152..2cb02ee314 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -562,6 +562,52 @@ END(efi_xen_start)
> #endif /* CONFIG_ARM_EFI */
>
> #ifdef CONFIG_SYSTEM_SUSPEND
> +/*
> + * int prepare_resume_ctx(struct cpu_context *ptr)
"cpu_context" is very generic name, especially taking into account that
you are introducing a global variable with the same name. How about
"resume_cpu_context"?
> + *
> + * x0 - pointer to the storage where callee's context will be saved
> + *
> + * CPU context saved here will be restored on resume in hyp_resume function.
> + * prepare_resume_ctx shall return a non-zero value. Upon restoring context
> + * hyp_resume shall return value zero instead. From C code that invokes
> + * prepare_resume_ctx, the return value is interpreted to determine whether
> + * the context is saved (prepare_resume_ctx) or restored (hyp_resume).
> + */
> +FUNC(prepare_resume_ctx)
> + /* Store callee-saved registers */
How are you planning to keep this code in sync with the actual struct cpu_context layout?
I am pretty sure it is better to use offsets generated by asm-offsets.c.
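As a sketch of that suggestion: Xen's asm-offsets.c emits named constants (via its OFFSET()/DEFINE() macros) that assembly can use instead of relying on the post-increment store order. The standalone program below only shows the underlying offsetof() arithmetic plus a compile-time check of the layout head.S currently assumes; the `CPU_CTX_*` names and the `reg_t` typedef (standing in for Xen's register_t) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t reg_t;   /* stands in for register_t on arm64 */

/* Mirror of the patch's arm64 struct cpu_context. */
struct cpu_context {
    reg_t callee_regs[12];   /* x19-x28, x29, lr */
    reg_t sp;
    reg_t vbar_el2;
    reg_t vtcr_el2;
    reg_t vttbr_el2;
    reg_t tpidr_el2;
    reg_t mdcr_el2;
    reg_t hstr_el2;
    reg_t cptr_el2;
    reg_t hcr_el2;
} __attribute__((aligned(16)));

/* Offsets the assembly would reference by name (hypothetical names). */
enum {
    CPU_CTX_CALLEE_REGS = offsetof(struct cpu_context, callee_regs),
    CPU_CTX_SP          = offsetof(struct cpu_context, sp),
    CPU_CTX_VBAR_EL2    = offsetof(struct cpu_context, vbar_el2),
    CPU_CTX_HCR_EL2     = offsetof(struct cpu_context, hcr_el2),
};

/* A reordered field would otherwise be silently mis-saved by the
 * sequential stores in prepare_resume_ctx. */
static_assert(CPU_CTX_SP == 12 * 8, "sp must follow the 12 callee-saved regs");
static_assert(CPU_CTX_HCR_EL2 == CPU_CTX_SP + 8 * 8, "hcr_el2 is the last slot");
```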
> + stp x19, x20, [x0], #16
> + stp x21, x22, [x0], #16
> + stp x23, x24, [x0], #16
> + stp x25, x26, [x0], #16
> + stp x27, x28, [x0], #16
> + stp x29, lr, [x0], #16
> +
> + /* Store stack-pointer */
> + mov x2, sp
> + str x2, [x0], #8
> +
> + /* Store system control registers */
> + mrs x2, VBAR_EL2
> + str x2, [x0], #8
> + mrs x2, VTCR_EL2
> + str x2, [x0], #8
> + mrs x2, VTTBR_EL2
> + str x2, [x0], #8
> + mrs x2, TPIDR_EL2
> + str x2, [x0], #8
> + mrs x2, MDCR_EL2
> + str x2, [x0], #8
> + mrs x2, HSTR_EL2
> + str x2, [x0], #8
> + mrs x2, CPTR_EL2
> + str x2, [x0], #8
> + mrs x2, HCR_EL2
> + str x2, [x0], #8
> +
> + /* prepare_resume_ctx must return a non-zero value */
> + mov x0, #1
> + ret
> +END(prepare_resume_ctx)
>
> FUNC(hyp_resume)
> /* Initialize the UART if earlyprintk has been enabled. */
> @@ -580,7 +626,49 @@ FUNC(hyp_resume)
> b enable_secondary_cpu_mm
>
> mmu_resumed:
> - b .
> + /* Now we can access the cpu_context, so restore the context here */
> + ldr x0, =cpu_context
> +
> + /* Restore callee-saved registers */
> + ldp x19, x20, [x0], #16
> + ldp x21, x22, [x0], #16
> + ldp x23, x24, [x0], #16
> + ldp x25, x26, [x0], #16
> + ldp x27, x28, [x0], #16
> + ldp x29, lr, [x0], #16
> +
> + /* Restore stack pointer */
> + ldr x2, [x0], #8
> + mov sp, x2
> +
> + /* Restore system control registers */
> + ldr x2, [x0], #8
> + msr VBAR_EL2, x2
> + ldr x2, [x0], #8
> + msr VTCR_EL2, x2
> + ldr x2, [x0], #8
> + msr VTTBR_EL2, x2
> + ldr x2, [x0], #8
> + msr TPIDR_EL2, x2
> + ldr x2, [x0], #8
> + msr MDCR_EL2, x2
> + ldr x2, [x0], #8
> + msr HSTR_EL2, x2
> + ldr x2, [x0], #8
> + msr CPTR_EL2, x2
> + ldr x2, [x0], #8
> + msr HCR_EL2, x2
> + isb
> +
> + /*
> + * Since context is restored return from this function will appear
> + * as return from prepare_resume_ctx. To distinguish a return from
> + * prepare_resume_ctx which is called upon finalizing the suspend,
> + * as opposed to return from this function which executes on resume,
> + * we need to return zero value here.
> + */
> + mov x0, #0
> + ret
> END(hyp_resume)
>
> #endif /* CONFIG_SYSTEM_SUSPEND */
> diff --git a/xen/arch/arm/include/asm/suspend.h b/xen/arch/arm/include/asm/suspend.h
> index 31a98a1f1b..c127fa3d78 100644
> --- a/xen/arch/arm/include/asm/suspend.h
> +++ b/xen/arch/arm/include/asm/suspend.h
> @@ -3,6 +3,8 @@
> #ifndef ARM_SUSPEND_H
> #define ARM_SUSPEND_H
>
> +#include <xen/types.h>
> +
> struct domain;
> struct vcpu;
> struct vcpu_guest_context;
> @@ -14,6 +16,30 @@ struct resume_info {
>
> void arch_domain_resume(struct domain *d);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +#ifdef CONFIG_ARM_64
> +struct cpu_context {
> + register_t callee_regs[12];
> + register_t sp;
> + register_t vbar_el2;
> + register_t vtcr_el2;
> + register_t vttbr_el2;
> + register_t tpidr_el2;
> + register_t mdcr_el2;
> + register_t hstr_el2;
> + register_t cptr_el2;
> + register_t hcr_el2;
> +} __aligned(16);
> +#else
> +#error "Define cpu_context structure for arm32"
> +#endif
> +
> +extern struct cpu_context cpu_context;
> +
> +int prepare_resume_ctx(struct cpu_context *ptr);
> +void hyp_resume(void);
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> #endif /* ARM_SUSPEND_H */
>
> /*
> diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
> new file mode 100644
> index 0000000000..e38566b0b7
> --- /dev/null
> +++ b/xen/arch/arm/suspend.c
> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <asm/suspend.h>
> +
> +struct cpu_context cpu_context = {};
There is no need to zero-initialize a global variable.
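For reference, C guarantees that an object with static storage duration and no initializer is zero-initialized (in practice it is placed in .bss), so the `= {}` adds nothing. A trivial check, with `demo_ctx` used purely for illustration:

```c
#include <assert.h>
#include <stdint.h>

struct demo_ctx {
    uint64_t regs[12];
    uint64_t sp;
};

/* No initializer: static storage duration guarantees all-zero contents. */
struct demo_ctx demo_ctx;

static int demo_ctx_is_zero(void)
{
    for ( unsigned int i = 0; i < 12; i++ )
        if ( demo_ctx.regs[i] != 0 )
            return 0;
    return demo_ctx.sp == 0;
}
```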
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
--
WBR, Volodymyr
* Re: [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain
2026-05-05 20:34 ` Mykola Kvach
@ 2026-05-07 22:25 ` Volodymyr Babchuk
2026-05-08 8:37 ` Mykola Kvach
2026-05-08 14:30 ` Luca Fancellu
1 sibling, 1 reply; 66+ messages in thread
From: Volodymyr Babchuk @ 2026-05-07 22:25 UTC (permalink / raw)
To: Mykola Kvach
Cc: Luca Fancellu, xen-devel@lists.xenproject.org, Mykola Kvach,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné,
Rahul Singh
Hi Mykola,
Mykola Kvach <xakep.amatop@gmail.com> writes:
[...]
>> > + status = can_system_suspend();
>> > + if ( status )
>> > + {
>> > + system_state = SYS_STATE_resume;
>> > + goto resume_scheduler;
>>
>> When we hit an error and take the resume_scheduler path, we re-apply the guest
>> context saved previously in do_psci_1_0_system_suspend(), so am I
>> correct in saying that the guest won’t get any PSCI error back and that we resume
>> it from its resume entry point?
>>
>> If so, should we have a different path that returns a PSCI error (PSCI_*) in the
>> guest's x0 and skips the context restore?
>
> You are right about the current control flow: once the virtual
> SYSTEM_SUSPEND request has been accepted and the domain has been parked, a
> later failure in the Xen-wide suspend path resumes the domain through the normal
> domain resume path, rather than returning a PSCI error from the original call.
>
> This is intentional in the current design. The virtual PSCI SYSTEM_SUSPEND
> path parks the domain and saves its resume context. The actual Xen-wide host
> suspend is a separate step that is attempted only after all domains are
> suspended.
>
> So a failure in the later Xen-wide suspend step is treated as an abort of the
> host suspend attempt after the domain suspend was already accepted. The domain
> is then resumed through the existing domain resume path, similarly to the
> toolstack/xl suspend-resume flow, rather than by re-entering the guest PSCI
> call path and modifying the saved vCPU context again.
>
> I agree this design is not obvious from the patch. I will clarify the commit
> message and comments. If you or the maintainers think that failures before the
> physical SYSTEM_SUSPEND call succeeds should be reported back through the
> original virtual PSCI call, then this would require a different flow. I was
> trying to avoid that extra complexity in this series.
I think there is no point in reporting an error back to the guest. PSCI
allows resume at any stage, so it is acceptable to have such a brief "suspend".
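The abort path discussed here follows the usual goto-based unwind used by the patch's system_suspend(): each failure jumps to the label that undoes only the steps already completed, and every path ends by resuming the parked domain rather than returning a PSCI error. A reduced sketch with hypothetical step functions and a trace string (not the patch's real calls):

```c
#include <assert.h>
#include <string.h>

static char trace[128];   /* records the order of undo actions */
static int fail_at = -1;  /* hypothetical: index of the step that fails */

static int step(int idx)
{
    return idx == fail_at ? -1 : 0;
}

static void log_action(const char *what)
{
    strcat(trace, what);
}

/* Returns 0 if every step succeeded, else the failing step's error. */
static int host_suspend_sketch(void)
{
    int status;

    trace[0] = '\0';

    if ( (status = step(0)) )   /* e.g. disable non-boot CPUs */
        goto resume_domain;
    if ( (status = step(1)) )   /* e.g. suspend the IOMMUs */
        goto undo_cpus;
    if ( (status = step(2)) )   /* e.g. suspend the GIC */
        goto undo_iommu;

    /* ... firmware SYSTEM_SUSPEND and wake-up would happen here ... */
    log_action("G");            /* resume the GIC */
 undo_iommu:
    log_action("I");            /* resume the IOMMUs */
 undo_cpus:
    log_action("C");            /* re-enable non-boot CPUs */
 resume_domain:
    log_action("D");            /* domain_resume(): no PSCI error to guest */
    return status;
}
```

A failure at any point therefore looks to the guest like an immediate wake-up from its resume entry point.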
>
>>
>> > + }
>> > +
>> > + /*
>> > + * Non-boot CPUs have to be disabled on suspend and enabled on resume
>> > + * (hotplug-based mechanism). Disabling non-boot CPUs will lead to PSCI
>> > + * CPU_OFF to be called by each non-boot CPU. Depending on the underlying
>> > + * platform capabilities, this may lead to the physical powering down of
>> > + * CPUs.
>> > + */
>> > + status = disable_nonboot_cpus();
>> > + if ( status )
>> > + {
>> > + system_state = SYS_STATE_resume;
>> > + goto resume_nonboot_cpus;
>> > + }
>> > +
>> > + time_suspend();
>> > +
>> > + status = iommu_suspend();
>> > + if ( status )
>> > + {
>> > + system_state = SYS_STATE_resume;
>> > + goto resume_time;
>> > + }
>> > +
>> > + console_start_sync();
>> > + status = console_suspend();
>> > + if ( status )
>> > + {
>> > + dprintk(XENLOG_ERR, "Failed to suspend the console, err=%d\n", status);
>> > + system_state = SYS_STATE_resume;
>> > + goto resume_end_sync;
>> > + }
>> > +
>> > + local_irq_save(flags);
>> > + status = gic_suspend();
>> > + if ( status )
>> > + {
>> > + system_state = SYS_STATE_resume;
>> > + goto resume_irqs;
>> > + }
>> > +
>> > + set_init_ttbr(xen_pgtable);
>> > +
>> > + /*
>> > + * Enable identity mapping before entering suspend to simplify
>> > + * the resume path
>> > + */
>> > + update_boot_mapping(true);
>> > +
>> > + if ( prepare_resume_ctx(&cpu_context) )
>> > + {
>> > + status = call_psci_system_suspend();
>> > + /*
>> > + * If suspend is finalized properly by above system suspend PSCI call,
>> > + * the code below in this 'if' branch will never execute. Execution
>> > + * will continue from hyp_resume which is the hypervisor's resume point.
>> > + * In hyp_resume CPU context will be restored and since link-register is
>> > + * restored as well, it will appear to return from prepare_resume_ctx.
>> > + * The difference in returning from prepare_resume_ctx on system suspend
>> > + * versus resume is in function's return value: on suspend, the return
>> > + * value is a non-zero value, on resume it is zero. That is why the
>> > + * control flow will not re-enter this 'if' branch on resume.
>> > + */
>> > + if ( status )
>> > + dprintk(XENLOG_WARNING, "PSCI system suspend failed, err=%d\n",
>> > + status);
>> > + }
>> > +
>> > + system_state = SYS_STATE_resume;
>> > + update_boot_mapping(false);
>> > +
>> > + gic_resume();
>> > +
>> > + resume_irqs:
>> > + local_irq_restore(flags);
>> > +
>> > + console_resume();
>> > + resume_end_sync:
>> > + console_end_sync();
>> > +
>> > + iommu_resume();
>> > +
>> > + resume_time:
>> > + time_resume();
>> > +
>> > + resume_nonboot_cpus:
>> > + /*
>> > + * The rcu_barrier() has to be added to ensure that the per cpu area is
>> > + * freed before a non-boot CPU tries to initialize it (_free_percpu_area()
>> > + * has to be called before the init_percpu_area()). This scenario occurs
>> > + * when non-boot CPUs are hot-unplugged on suspend and hotplugged on resume.
>> > + */
>> > + rcu_barrier();
>> > + enable_nonboot_cpus();
>> > +
>> > + resume_scheduler:
>> > + scheduler_enable();
>> > + thaw_domains();
>> > +
>> > + system_state = SYS_STATE_active;
>> > +
>> > + printk("Resume (status %d)\n", status);
>> > +
>> > + domain_resume(d);
>> > +}
>> > +
>> > +static DECLARE_TASKLET(system_suspend_tasklet, system_suspend, NULL);
>> > +
>> > +void host_system_suspend(struct domain *d)
>> > +{
>> > + system_suspend_tasklet.data = (void *)d;
>> > + /*
>> > + * The suspend procedure has to be finalized by the pCPU#0 (non-boot pCPUs
>> > + * will be disabled during the suspend).
>> > + */
>> > + tasklet_schedule_on_cpu(&system_suspend_tasklet, 0);
>> > +}
>> > +
>> > /*
>> > * Local variables:
>> > * mode: C
>> > diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
>> > index bd87ec430d..8fb9172186 100644
>> > --- a/xen/arch/arm/vpsci.c
>> > +++ b/xen/arch/arm/vpsci.c
>> > @@ -5,6 +5,7 @@
>> >
>> > #include <asm/current.h>
>> > #include <asm/domain.h>
>> > +#include <asm/suspend.h>
>> > #include <asm/vgic.h>
>> > #include <asm/vpsci.h>
>> > #include <asm/event.h>
>> > @@ -232,8 +233,7 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
>> > if ( is_64bit_domain(d) && is_thumb )
>> > return PSCI_INVALID_ADDRESS;
>> >
>> > - /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
>> > - if ( is_hardware_domain(d) )
>> > + if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
>> > return PSCI_NOT_SUPPORTED;
>> >
>> > /* Ensure that all CPUs other than the calling one are offline */
>> > @@ -266,6 +266,9 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
>> > "SYSTEM_SUSPEND requested, epoint=%#"PRIregister", cid=%#"PRIregister"\n",
>> > epoint, cid);
>> >
>> > + if ( is_control_domain(d) )
>>
>> Why is_control_domain() here and not is_hardware_domain() ?
>
> The use of is_control_domain() is intentional.
>
> The intended model is that Xen-wide host suspend is orchestrated by the
> privileged management/control domain. The control domain coordinates the
> toolstack side, asks other domains to enter suspend, and then issues the final
> SYSTEM_SUSPEND request to Xen.
>
> This does not have to be the same entity as the hardware domain. If the
> hardware domain is separate, it is one of the domains that the control domain
> parks before the final host suspend step.
>
> The hwdom-specific checks in this patch have a different purpose: they avoid
> the old hwdom_shutdown() path for SHUTDOWN_suspend and allow the hardware
> domain to be parked as part of the suspend sequence. They do not define the
> policy for who is allowed to trigger Xen-wide host suspend.
>
> That said, this policy may not be optimal for all configurations, especially
> when the control and hardware domain roles are split. I would appreciate your
> view, as well as the maintainers' views, on whether the trigger should remain
> control-domain based, be tied to the hardware domain instead, or be expressed
> through a separate host-suspend capability/helper.
The hardware domain owns all the hardware. The hardware must be put into a
powered-down/suspended state before the SoC is suspended, so that it can be
resumed afterwards. You can't just pause the hardware domain the same way
as all other domains.
(Of course, we'll have the same issues with domains that have
passed-through hardware, but in that case Dom0 shall orchestrate the proper
suspend sequence for them.)
[...]
--
WBR, Volodymyr
* Re: [PATCH v8 06/13] xen/arm: tee: keep init_tee_secondary() for hotplug and resume
2026-04-02 10:45 ` [PATCH v8 06/13] xen/arm: tee: keep init_tee_secondary() for hotplug and resume Mykola Kvach
2026-04-24 10:59 ` Luca Fancellu
2026-04-27 8:19 ` Bertrand Marquis
@ 2026-05-07 22:26 ` Volodymyr Babchuk
2 siblings, 0 replies; 66+ messages in thread
From: Volodymyr Babchuk @ 2026-05-07 22:26 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Bertrand Marquis,
Jens Wiklander, Stefano Stabellini, Julien Grall, Michal Orzel
Hi Mykola,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> init_tee_secondary() was marked __init and freed after boot. Calling it
> from the CPU hotplug/resume path then executed discarded code, which
> could crash Xen. Drop __init so the TEE mediator secondary init can run
> safely on hotplugged and resumed CPUs.
>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
> xen/arch/arm/tee/tee.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
> index 8501443c8e..00e561fc78 100644
> --- a/xen/arch/arm/tee/tee.c
> +++ b/xen/arch/arm/tee/tee.c
> @@ -128,7 +128,7 @@ static int __init tee_init(void)
>
> presmp_initcall(tee_init);
>
> -void __init init_tee_secondary(void)
> +void init_tee_secondary(void)
> {
> if ( cur_mediator && cur_mediator->ops->init_secondary )
> cur_mediator->ops->init_secondary();
--
WBR, Volodymyr
* Re: [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain
2026-05-07 22:25 ` Volodymyr Babchuk
@ 2026-05-08 8:37 ` Mykola Kvach
0 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-05-08 8:37 UTC (permalink / raw)
To: Volodymyr Babchuk
Cc: Luca Fancellu, xen-devel@lists.xenproject.org, Mykola Kvach,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné,
Rahul Singh
Hi Volodymyr,
Thank you for the feedback.
On Fri, May 8, 2026 at 1:25 AM Volodymyr Babchuk
<Volodymyr_Babchuk@epam.com> wrote:
>
> Hi Mykola,
>
> Mykola Kvach <xakep.amatop@gmail.com> writes:
>
> [...]
>
> >> > + status = can_system_suspend();
> >> > + if ( status )
> >> > + {
> >> > + system_state = SYS_STATE_resume;
> >> > + goto resume_scheduler;
> >>
> >> When we get an error and take the resume_scheduler path, we apply back the
> >> context of the guest saved previously in do_psci_1_0_system_suspend(), so am I
> >> correct in saying that the guest won’t get any PSCI error back and that we resume the guest
> >> from the guest resume entrypoint?
> >>
> >> If so, should we have a different path that returns a PSCI error (PSCI_*) in the guest's
> >> x0 and skips the context restore?
> >
> > You are right about the current control flow: once the virtual
> > SYSTEM_SUSPEND request has been accepted and the domain has been parked, a
> > later failure in the Xen-wide suspend path resumes the domain through the normal
> > domain resume path, rather than returning a PSCI error from the original call.
> >
> > This is intentional in the current design. The virtual PSCI SYSTEM_SUSPEND
> > path parks the domain and saves its resume context. The actual Xen-wide host
> > suspend is a separate step that is attempted only after all domains are
> > suspended.
> >
> > So a failure in the later Xen-wide suspend step is treated as an abort of the
> > host suspend attempt after the domain suspend was already accepted. The domain
> > is then resumed through the existing domain resume path, similarly to the
> > toolstack/xl suspend-resume flow, rather than by re-entering the guest PSCI
> > call path and modifying the saved vCPU context again.
> >
> > I agree this design is not obvious from the patch. I will clarify the commit
> > message and comments. If you or the maintainers think that failures before the
> > physical SYSTEM_SUSPEND call succeeds should be reported back through the
> > original virtual PSCI call, then this would require a different flow. I was
> > trying to avoid that extra complexity in this series.
>
> I think there is no point in reporting an error back to the guest. PSCI
> allows resume at any stage, so it is acceptable to have such a brief "suspend".
>
> >
> >>
> >> > + }
> >> > +
> >> > + /*
> >> > + * Non-boot CPUs have to be disabled on suspend and enabled on resume
> >> > + * (hotplug-based mechanism). Disabling non-boot CPUs will lead to PSCI
> >> > + * CPU_OFF to be called by each non-boot CPU. Depending on the underlying
> >> > + * platform capabilities, this may lead to the physical powering down of
> >> > + * CPUs.
> >> > + */
> >> > + status = disable_nonboot_cpus();
> >> > + if ( status )
> >> > + {
> >> > + system_state = SYS_STATE_resume;
> >> > + goto resume_nonboot_cpus;
> >> > + }
> >> > +
> >> > + time_suspend();
> >> > +
> >> > + status = iommu_suspend();
> >> > + if ( status )
> >> > + {
> >> > + system_state = SYS_STATE_resume;
> >> > + goto resume_time;
> >> > + }
> >> > +
> >> > + console_start_sync();
> >> > + status = console_suspend();
> >> > + if ( status )
> >> > + {
> >> > + dprintk(XENLOG_ERR, "Failed to suspend the console, err=%d\n", status);
> >> > + system_state = SYS_STATE_resume;
> >> > + goto resume_end_sync;
> >> > + }
> >> > +
> >> > + local_irq_save(flags);
> >> > + status = gic_suspend();
> >> > + if ( status )
> >> > + {
> >> > + system_state = SYS_STATE_resume;
> >> > + goto resume_irqs;
> >> > + }
> >> > +
> >> > + set_init_ttbr(xen_pgtable);
> >> > +
> >> > + /*
> >> > + * Enable identity mapping before entering suspend to simplify
> >> > + * the resume path
> >> > + */
> >> > + update_boot_mapping(true);
> >> > +
> >> > + if ( prepare_resume_ctx(&cpu_context) )
> >> > + {
> >> > + status = call_psci_system_suspend();
> >> > + /*
> >> > + * If suspend is finalized properly by above system suspend PSCI call,
> >> > + * the code below in this 'if' branch will never execute. Execution
> >> > + * will continue from hyp_resume which is the hypervisor's resume point.
> >> > + * In hyp_resume CPU context will be restored and since link-register is
> >> > + * restored as well, it will appear to return from prepare_resume_ctx.
> >> > + * The difference in returning from prepare_resume_ctx on system suspend
> >> > + * versus resume is in function's return value: on suspend, the return
> >> > + * value is a non-zero value, on resume it is zero. That is why the
> >> > + * control flow will not re-enter this 'if' branch on resume.
> >> > + */
> >> > + if ( status )
> >> > + dprintk(XENLOG_WARNING, "PSCI system suspend failed, err=%d\n",
> >> > + status);
> >> > + }
> >> > +
> >> > + system_state = SYS_STATE_resume;
> >> > + update_boot_mapping(false);
> >> > +
> >> > + gic_resume();
> >> > +
> >> > + resume_irqs:
> >> > + local_irq_restore(flags);
> >> > +
> >> > + console_resume();
> >> > + resume_end_sync:
> >> > + console_end_sync();
> >> > +
> >> > + iommu_resume();
> >> > +
> >> > + resume_time:
> >> > + time_resume();
> >> > +
> >> > + resume_nonboot_cpus:
> >> > + /*
> >> > + * The rcu_barrier() has to be added to ensure that the per cpu area is
> >> > + * freed before a non-boot CPU tries to initialize it (_free_percpu_area()
> >> > + * has to be called before the init_percpu_area()). This scenario occurs
> >> > + * when non-boot CPUs are hot-unplugged on suspend and hotplugged on resume.
> >> > + */
> >> > + rcu_barrier();
> >> > + enable_nonboot_cpus();
> >> > +
> >> > + resume_scheduler:
> >> > + scheduler_enable();
> >> > + thaw_domains();
> >> > +
> >> > + system_state = SYS_STATE_active;
> >> > +
> >> > + printk("Resume (status %d)\n", status);
> >> > +
> >> > + domain_resume(d);
> >> > +}
> >> > +
> >> > +static DECLARE_TASKLET(system_suspend_tasklet, system_suspend, NULL);
> >> > +
> >> > +void host_system_suspend(struct domain *d)
> >> > +{
> >> > + system_suspend_tasklet.data = (void *)d;
> >> > + /*
> >> > + * The suspend procedure has to be finalized by the pCPU#0 (non-boot pCPUs
> >> > + * will be disabled during the suspend).
> >> > + */
> >> > + tasklet_schedule_on_cpu(&system_suspend_tasklet, 0);
> >> > +}
> >> > +
> >> > /*
> >> > * Local variables:
> >> > * mode: C
> >> > diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> >> > index bd87ec430d..8fb9172186 100644
> >> > --- a/xen/arch/arm/vpsci.c
> >> > +++ b/xen/arch/arm/vpsci.c
> >> > @@ -5,6 +5,7 @@
> >> >
> >> > #include <asm/current.h>
> >> > #include <asm/domain.h>
> >> > +#include <asm/suspend.h>
> >> > #include <asm/vgic.h>
> >> > #include <asm/vpsci.h>
> >> > #include <asm/event.h>
> >> > @@ -232,8 +233,7 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> >> > if ( is_64bit_domain(d) && is_thumb )
> >> > return PSCI_INVALID_ADDRESS;
> >> >
> >> > - /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> >> > - if ( is_hardware_domain(d) )
> >> > + if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
> >> > return PSCI_NOT_SUPPORTED;
> >> >
> >> > /* Ensure that all CPUs other than the calling one are offline */
> >> > @@ -266,6 +266,9 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> >> > "SYSTEM_SUSPEND requested, epoint=%#"PRIregister", cid=%#"PRIregister"\n",
> >> > epoint, cid);
> >> >
> >> > + if ( is_control_domain(d) )
> >>
> >> Why is_control_domain() here and not is_hardware_domain() ?
> >
> > The use of is_control_domain() is intentional.
> >
> > The intended model is that Xen-wide host suspend is orchestrated by the
> > privileged management/control domain. The control domain coordinates the
> > toolstack side, asks other domains to enter suspend, and then issues the final
> > SYSTEM_SUSPEND request to Xen.
> >
> > This does not have to be the same entity as the hardware domain. If the
> > hardware domain is separate, it is one of the domains that the control domain
> > parks before the final host suspend step.
> >
> > The hwdom-specific checks in this patch have a different purpose: they avoid
> > the old hwdom_shutdown() path for SHUTDOWN_suspend and allow the hardware
> > domain to be parked as part of the suspend sequence. They do not define the
> > policy for who is allowed to trigger Xen-wide host suspend.
> >
> > That said, this policy may not be optimal for all configurations, especially
> > when the control and hardware domain roles are split. I would appreciate your
> > view, as well as the maintainers' views, on whether the trigger should remain
> > control-domain based, be tied to the hardware domain instead, or be expressed
> > through a separate host-suspend capability/helper.
>
>
> The hardware domain owns all the hardware. The hardware shall be put into a
> power-down/suspended state before suspending the SoC, so that it can be
> resumed afterwards. You can't just pause the hardware domain in the same way
> as all other domains are paused.
>
> (Of course, we'll have the same issues with domains that have
> passed-through hardware, but in this case Dom0 shall orchestrate the proper
> suspend sequence for them.)
Yes, I agree that the hardware domain must not be externally
paused as a replacement for its own suspend path.
What I meant to describe is a guest-driven suspend sequence.
The control domain/toolstack may orchestrate the sequence,
but each domain that needs to quiesce hardware, including
the hardware domain and any domain with passed-through
devices, is expected to enter its own suspend path first and
quiesce its devices before issuing the virtual PSCI
SYSTEM_SUSPEND call.
Xen only treats other domains as ready for host suspend after
they have voluntarily reached SHUTDOWN_suspend. In the split
control/hardware-domain case, the final host-wide suspend
request from the control domain is accepted only after the
other domains, including the hardware domain, are already in
SHUTDOWN_suspend.
So my wording saying that the control domain "parks" the
hardware domain was imprecise. The control domain orchestrates
the sequence; it does not externally pause the hardware domain
as a substitute for its own suspend path.
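To make the acceptance rule described above concrete, here is a minimal, self-contained sketch in C. The `struct dom` type, the `DOM_*` states and both helpers are hypothetical stand-ins (not Xen's `struct domain`, its SHUTDOWN_suspend handling, or any existing Xen function). The sketch only illustrates that the control domain's host-wide request is accepted once every other domain, including a separate hardware domain, has reached the suspend state on its own, and that nothing is paused on a domain's behalf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not Xen's struct domain. */
enum dom_state { DOM_RUNNING, DOM_SUSPENDED /* reached SHUTDOWN_suspend */ };

struct dom {
    bool is_control;
    enum dom_state state;
};

/* True if every domain except the requester has already suspended itself. */
static bool others_ready_for_host_suspend(const struct dom *doms, size_t n,
                                          const struct dom *requester)
{
    assert(doms && requester);

    for ( size_t i = 0; i < n; i++ )
    {
        if ( &doms[i] == requester )
            continue;
        /* Xen does not pause a domain on its behalf; it must get here itself. */
        if ( doms[i].state != DOM_SUSPENDED )
            return false;
    }

    return true;
}

/* Only the control domain's own SYSTEM_SUSPEND turns into a host suspend. */
static bool accept_host_suspend(const struct dom *doms, size_t n,
                                const struct dom *requester)
{
    return requester->is_control &&
           others_ready_for_host_suspend(doms, n, requester);
}
```

With a control domain, a hardware domain that is still running and a third domain already suspended, the request is refused; once the hardware domain has completed its own suspend path, the same request is accepted.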
Best regards,
Mykola
>
> [...]
>
> --
> WBR, Volodymyr
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 11/13] xen/arm: Save/restore context on suspend/resume
2026-05-07 22:17 ` Volodymyr Babchuk
@ 2026-05-08 10:38 ` Mykola Kvach
0 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-05-08 10:38 UTC (permalink / raw)
To: Volodymyr Babchuk
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel
Hi Volodymyr,
Thanks for taking a look at this.
On Fri, May 8, 2026 at 1:17 AM Volodymyr Babchuk
<Volodymyr_Babchuk@epam.com> wrote:
>
> Hi Mykola,
>
> Mykola Kvach <xakep.amatop@gmail.com> writes:
>
> > From: Mirela Simonovic <mirela.simonovic@aggios.com>
> >
> > The context of CPU general purpose and system control registers must be
> > saved on suspend and restored on resume. This is implemented in
> > prepare_resume_ctx and before the return from the hyp_resume function.
> > The prepare_resume_ctx must be invoked just before the PSCI system suspend
> > call is issued to the ATF. The prepare_resume_ctx must return a non-zero
> > value so that the calling 'if' statement evaluates to true, causing the
> > system suspend to be invoked. Upon resume, the context saved on suspend
> > will be restored, including the link register. Therefore, after
> > restoring the context, the control flow will return to the address
> > pointed to by the saved link register, which is the place from which
> > prepare_resume_ctx was called. To ensure that the calling 'if' statement
> > does not again evaluate to true and initiate system suspend, hyp_resume
> > must return a zero value after restoring the context.
> >
> > Note that the order of saving register context into cpu_context structure
> > must match the order of restoring.
> >
> > Support for ARM32 is not implemented. Instead, compilation fails with a
> > build-time error if suspend is enabled for ARM32.
> >
> > Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> > Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> > Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in v8:
> > - fix alignments in code
> >
> > Changes in v7:
> > - no changes
> > ---
> > xen/arch/arm/Makefile | 1 +
> > xen/arch/arm/arm64/head.S | 90 +++++++++++++++++++++++++++++-
> > xen/arch/arm/include/asm/suspend.h | 26 +++++++++
> > xen/arch/arm/suspend.c | 14 +++++
> > 4 files changed, 130 insertions(+), 1 deletion(-)
> > create mode 100644 xen/arch/arm/suspend.c
> >
> > diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> > index 69200b2728..c36158271a 100644
> > --- a/xen/arch/arm/Makefile
> > +++ b/xen/arch/arm/Makefile
> > @@ -51,6 +51,7 @@ obj-y += setup.o
> > obj-y += shutdown.o
> > obj-y += smp.o
> > obj-y += smpboot.o
> > +obj-$(CONFIG_SYSTEM_SUSPEND) += suspend.o
> > obj-$(CONFIG_SYSCTL) += sysctl.o
> > obj-y += time.o
> > obj-y += traps.o
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index 596e960152..2cb02ee314 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -562,6 +562,52 @@ END(efi_xen_start)
> > #endif /* CONFIG_ARM_EFI */
> >
> > #ifdef CONFIG_SYSTEM_SUSPEND
> > +/*
> > + * int prepare_resume_ctx(struct cpu_context *ptr)
>
> "cpu_context" is a very generic name, especially taking into account that
> you are introducing a global variable with the same name. How about
> "resume_cpu_context"?
Ack.
>
> > + *
> > + * x0 - pointer to the storage where callee's context will be saved
> > + *
> > + * CPU context saved here will be restored on resume in hyp_resume function.
> > + * prepare_resume_ctx shall return a non-zero value. Upon restoring context
> > + * hyp_resume shall return value zero instead. From C code that invokes
> > + * prepare_resume_ctx, the return value is interpreted to determine whether
> > + * the context is saved (prepare_resume_ctx) or restored (hyp_resume).
> > + */
> > +FUNC(prepare_resume_ctx)
> > + /* Store callee-saved registers */
>
> How are you planning to synchronise this code with the actual cpu_context?
>
> I am pretty sure it is better to use offsets generated by asm-offsets.c.
Ack.
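As a complement to generating offsets via asm-offsets, the layout that head.S relies on can also be pinned at build time. A minimal sketch, assuming the v8 field order and the `resume_cpu_context` name agreed above; `register_t` is written as `uint64_t` because it is 64-bit on arm64, and `SLOT()` is a hypothetical helper. The assembly walks the structure with post-indexed accesses (six 16-byte `stp`/`ldp` pairs, then one 8-byte slot per system register), so each field's offset is fixed by its position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field order as the v8 struct cpu_context, under the proposed name. */
struct resume_cpu_context {
    uint64_t callee_regs[12];   /* x19..x28, x29 (fp), lr */
    uint64_t sp;
    uint64_t vbar_el2;
    uint64_t vtcr_el2;
    uint64_t vttbr_el2;
    uint64_t tpidr_el2;
    uint64_t mdcr_el2;
    uint64_t hstr_el2;
    uint64_t cptr_el2;
    uint64_t hcr_el2;
} __attribute__((aligned(16)));

/* Byte offset of the n-th 8-byte slot the assembly writes/reads. */
#define SLOT(n) ((size_t)(n) * sizeof(uint64_t))

/* The build fails if the C layout drifts from the stp/str and ldp/ldr order. */
static_assert(offsetof(struct resume_cpu_context, sp) == SLOT(12), "sp");
static_assert(offsetof(struct resume_cpu_context, vbar_el2) == SLOT(13), "vbar_el2");
static_assert(offsetof(struct resume_cpu_context, vtcr_el2) == SLOT(14), "vtcr_el2");
static_assert(offsetof(struct resume_cpu_context, vttbr_el2) == SLOT(15), "vttbr_el2");
static_assert(offsetof(struct resume_cpu_context, tpidr_el2) == SLOT(16), "tpidr_el2");
static_assert(offsetof(struct resume_cpu_context, mdcr_el2) == SLOT(17), "mdcr_el2");
static_assert(offsetof(struct resume_cpu_context, hstr_el2) == SLOT(18), "hstr_el2");
static_assert(offsetof(struct resume_cpu_context, cptr_el2) == SLOT(19), "cptr_el2");
static_assert(offsetof(struct resume_cpu_context, hcr_el2) == SLOT(20), "hcr_el2");
```

With asm-offsets, the assembly would instead address each field explicitly, e.g. an `OFFSET(RESUME_CTX_SP, struct resume_cpu_context, sp)` entry used as `[x0, #RESUME_CTX_SP]`; that symbol name is illustrative only.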
>
> > + stp x19, x20, [x0], #16
> > + stp x21, x22, [x0], #16
> > + stp x23, x24, [x0], #16
> > + stp x25, x26, [x0], #16
> > + stp x27, x28, [x0], #16
> > + stp x29, lr, [x0], #16
> > +
> > + /* Store stack-pointer */
> > + mov x2, sp
> > + str x2, [x0], #8
> > +
> > + /* Store system control registers */
> > + mrs x2, VBAR_EL2
> > + str x2, [x0], #8
> > + mrs x2, VTCR_EL2
> > + str x2, [x0], #8
> > + mrs x2, VTTBR_EL2
> > + str x2, [x0], #8
> > + mrs x2, TPIDR_EL2
> > + str x2, [x0], #8
> > + mrs x2, MDCR_EL2
> > + str x2, [x0], #8
> > + mrs x2, HSTR_EL2
> > + str x2, [x0], #8
> > + mrs x2, CPTR_EL2
> > + str x2, [x0], #8
> > + mrs x2, HCR_EL2
> > + str x2, [x0], #8
> > +
> > + /* prepare_resume_ctx must return a non-zero value */
> > + mov x0, #1
> > + ret
> > +END(prepare_resume_ctx)
> >
> > FUNC(hyp_resume)
> > /* Initialize the UART if earlyprintk has been enabled. */
> > @@ -580,7 +626,49 @@ FUNC(hyp_resume)
> > b enable_secondary_cpu_mm
> >
> > mmu_resumed:
> > - b .
> > + /* Now we can access the cpu_context, so restore the context here */
> > + ldr x0, =cpu_context
> > +
> > + /* Restore callee-saved registers */
> > + ldp x19, x20, [x0], #16
> > + ldp x21, x22, [x0], #16
> > + ldp x23, x24, [x0], #16
> > + ldp x25, x26, [x0], #16
> > + ldp x27, x28, [x0], #16
> > + ldp x29, lr, [x0], #16
> > +
> > + /* Restore stack pointer */
> > + ldr x2, [x0], #8
> > + mov sp, x2
> > +
> > + /* Restore system control registers */
> > + ldr x2, [x0], #8
> > + msr VBAR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr VTCR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr VTTBR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr TPIDR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr MDCR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr HSTR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr CPTR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr HCR_EL2, x2
> > + isb
> > +
> > + /*
> > + * Since context is restored return from this function will appear
> > + * as return from prepare_resume_ctx. To distinguish a return from
> > + * prepare_resume_ctx which is called upon finalizing the suspend,
> > + * as opposed to return from this function which executes on resume,
> > + * we need to return zero value here.
> > + */
> > + mov x0, #0
> > + ret
> > END(hyp_resume)
> >
> > #endif /* CONFIG_SYSTEM_SUSPEND */
> > diff --git a/xen/arch/arm/include/asm/suspend.h b/xen/arch/arm/include/asm/suspend.h
> > index 31a98a1f1b..c127fa3d78 100644
> > --- a/xen/arch/arm/include/asm/suspend.h
> > +++ b/xen/arch/arm/include/asm/suspend.h
> > @@ -3,6 +3,8 @@
> > #ifndef ARM_SUSPEND_H
> > #define ARM_SUSPEND_H
> >
> > +#include <xen/types.h>
> > +
> > struct domain;
> > struct vcpu;
> > struct vcpu_guest_context;
> > @@ -14,6 +16,30 @@ struct resume_info {
> >
> > void arch_domain_resume(struct domain *d);
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +#ifdef CONFIG_ARM_64
> > +struct cpu_context {
> > + register_t callee_regs[12];
> > + register_t sp;
> > + register_t vbar_el2;
> > + register_t vtcr_el2;
> > + register_t vttbr_el2;
> > + register_t tpidr_el2;
> > + register_t mdcr_el2;
> > + register_t hstr_el2;
> > + register_t cptr_el2;
> > + register_t hcr_el2;
> > +} __aligned(16);
> > +#else
> > +#error "Define cpu_context structure for arm32"
> > +#endif
> > +
> > +extern struct cpu_context cpu_context;
> > +
> > +int prepare_resume_ctx(struct cpu_context *ptr);
> > +void hyp_resume(void);
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> > #endif /* ARM_SUSPEND_H */
> >
> > /*
> > diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
> > new file mode 100644
> > index 0000000000..e38566b0b7
> > --- /dev/null
> > +++ b/xen/arch/arm/suspend.c
> > @@ -0,0 +1,14 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +
> > +#include <asm/suspend.h>
> > +
> > +struct cpu_context cpu_context = {};
>
> There is no need to zero-initialize a global variable.
Ack.
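The return-value protocol described in the patch, where prepare_resume_ctx() returns non-zero when the context is saved and hyp_resume() later makes the same call site appear to return zero, is the same "returns twice" idea as setjmp()/longjmp() in hosted C, with the polarity reversed. The sketch below uses only the C standard library to show the caller-side pattern; it does not model the register save, the MMU re-enable or PSCI, and `fake_firmware_resume()` is a hypothetical stand-in for firmware entering hyp_resume.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf resume_ctx;   /* stands in for the saved CPU context */
static int suspend_entries;  /* how many times the "suspend" branch ran */

/* Hypothetical stand-in for firmware resuming Xen at hyp_resume. */
static void fake_firmware_resume(void)
{
    /* Like hyp_resume: restore the saved context and "return" again. */
    longjmp(resume_ctx, 1);
}

static int suspend_once(void)
{
    /*
     * setjmp() returns 0 when the context is saved and non-zero when it
     * is re-entered via longjmp(); prepare_resume_ctx() uses the opposite
     * convention (non-zero on save, zero on resume).
     */
    if ( setjmp(resume_ctx) == 0 )
    {
        suspend_entries++;
        fake_firmware_resume();  /* a successful suspend call never returns */
    }

    /* Only reachable through the resume path, after the branch above ran. */
    assert(suspend_entries > 0);
    return suspend_entries;
}
```

Because the second "return" carries a different value, the branch that issues the suspend is not re-entered on resume, which is exactly why hyp_resume must return zero.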
Best regards,
Mykola
>
> > +
> > +/*
> > + * Local variables:
> > + * mode: C
> > + * c-file-style: "BSD"
> > + * c-basic-offset: 4
> > + * indent-tabs-mode: nil
> > + * End:
> > + */
>
> --
> WBR, Volodymyr
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions
2026-05-07 7:48 ` Mykola Kvach
@ 2026-05-08 10:56 ` Luca Fancellu
2026-05-10 6:02 ` Mykola Kvach
0 siblings, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-05-08 10:56 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Mykola,
>
>>
>>> + }
>>> +
>>> + off = i * sizeof(irqs->icfgr);
>>> + for ( j = 0; j < ARRAY_SIZE(irqs->icfgr); j++ )
>>> + writel_gicd(irqs->icfgr[j], GICD_ICFGR + off + j * 4);
>>> + }
>>> +
>>> + /* Make sure all registers are restored and enable distributor */
>>> + writel_gicd(gic_ctx.dist.ctlr, GICD_CTLR);
>>> +
>>> + /* Restore GIC CPU interface configuration */
>>> + writel_gicc(gic_ctx.cpu.pmr, GICC_PMR);
>>> + writel_gicc(gic_ctx.cpu.bpr, GICC_BPR);
>>> +
>>> + /* Enable GIC CPU interface */
>>> + writel_gicc(gic_ctx.cpu.ctlr, GICC_CTLR);
>>> +}
>>> +
>>
>> I also see that we don’t save the pending SGI state (GICD_CPENDSGIRn/GICD_SPENDSGIRn) or the Active Priorities registers
>> state (GICC_APRn/GICC_NSAPRn [the latter if the Security Extensions are present]) as described in [1] “4.5 Preserving and restoring GIC state”;
>> was it intentional?
>
> Yes, this was intentional.
>
> The GICv2 suspend callback is called at a quiescent point in the
> SYSTEM_SUSPEND path: all domains are already shut down for suspend, guest
> execution is quiesced, the scheduler is disabled, non-boot CPUs have been
> offlined, and CPU0 enters gic_suspend() with local interrupts disabled.
>
> For SGIs, I don't consider GICD_CPENDSGIRn/GICD_SPENDSGIRn part of the saved
> host GIC context. Xen uses physical SGIs as IPIs, and IPI delivery is an
> internal synchronization mechanism, not architectural state that should be
> replayed after SYSTEM_SUSPEND. Guest SGI state is virtual GIC state and is not
> represented by these physical GICD SGI pending registers.
Ack. I would perhaps mention in the commit message that transient IPI/active-priority
state is deliberately excluded at the suspend quiescent point.
>
> For GICC_APRn/GICC_NSAPRn, those registers describe active priority state for
> interrupts already acknowledged by the CPU interface. The final suspend path is
> not expected to run with an active physical interrupt context. If those
> registers were non-zero there, restoring only APR/NSAPR would not make the
> corresponding interrupt handling context valid after resume, and could instead
> leave the CPU interface with stale active priority state.
OK, I understand now. But if we expect GICD_ISACTIVERn to be zero here, why are
we saving/restoring it? Shouldn’t we instead have a runtime check that it’s zero and bail out
if it’s not? The resume path would then only need to clear it.
Am I missing something?
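A minimal sketch of the alternative suggested above: refuse to suspend while any SPI is still active instead of saving that state, and on resume only clear the active bits. The register file is modelled with a plain array; `GICD_NR_ISACTIVER`, `gicd_active[]` and the helper names are hypothetical stand-ins for readl_gicd()/writel_gicd() on GICD_ISACTIVERn/GICD_ICACTIVERn, and returning -EBUSY is an assumption. Register 0 covers SGIs/PPIs and is banked per CPU, so the sketch only looks at the SPI registers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define GICD_NR_ISACTIVER 32   /* hypothetical: 32 x 32 = 1024 INTIDs */

/* Models the active bits read back through GICD_ISACTIVERn. */
static uint32_t gicd_active[GICD_NR_ISACTIVER];

static uint32_t read_isactiver(unsigned int n)
{
    return gicd_active[n];
}

/* Models GICD_ICACTIVERn: writing 1 to a bit clears its active state. */
static void write_icactiver(unsigned int n, uint32_t mask)
{
    gicd_active[n] &= ~mask;
}

/* Suspend side: bail out instead of saving any active state. */
static int gic_check_no_active_spis(unsigned int nr_regs)
{
    assert(nr_regs <= GICD_NR_ISACTIVER);

    for ( unsigned int n = 1; n < nr_regs; n++ )
        if ( read_isactiver(n) )
            return -EBUSY;

    return 0;
}

/* Resume side: nothing is restored, the active state is only cleared. */
static void gic_clear_active_spis(unsigned int nr_regs)
{
    assert(nr_regs <= GICD_NR_ISACTIVER);

    for ( unsigned int n = 1; n < nr_regs; n++ )
        write_icactiver(n, ~0U);
}
```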
>
> So I did not add save/restore for GICD_CPENDSGIRn/GICD_SPENDSGIRn or
> GICC_APRn/GICC_NSAPRn in this patch. I can add a short comment in v9 to make
> this scope explicit.
>
> Please let me know if you think there is a suspend/resume path where this
> state still needs to be preserved.
>
> Best regards,
> Mykola
Cheers,
Luca
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 05/13] xen/arm: gic-v3: add ITS suspend/resume support
2026-05-05 10:09 ` Mykola Kvach
@ 2026-05-08 11:30 ` Luca Fancellu
2026-05-08 22:11 ` Mykola Kvach
0 siblings, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-05-08 11:30 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné
Hi Mykola,
>
> On Fri, Apr 24, 2026 at 1:54 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>>
>> Hi Mykola,
>>
>>> On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
>>>
>>> From: Mykola Kvach <mykola_kvach@epam.com>
>>>
>>> Handle system suspend/resume for GICv3 with an ITS present so LPIs keep
>>> working after firmware powers the GIC down. Snapshot the CPU interface,
>>> distributor and last-CPU redistributor state,
Didn’t “Snapshot the CPU interface, distributor and last-CPU redistributor state” already happen in the previous commit?
>>> disable the ITS to cache its
>>> CTLR/CBASER/BASER registers, then restore everything and re-arm the
>>> collection on resume.
>>>
>>> Add list_for_each_entry_continue_reverse() in list.h for the ITS suspend
>>> error path that needs to roll back partially saved state.
>>>
>>> Based on Linux commit dba0bc7b76dc ("irqchip/gic-v3-its: Add ability to save/restore ITS state")
>>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
>>> ---
[…]
>
>>
>>> + {
>>> + unsigned int i;
>>> + void __iomem *base = its->its_base;
>>> +
>>> + its->suspend_ctx.ctlr = readl_relaxed(base + GITS_CTLR);
>>> + ret = gicv3_disable_its(its);
>>
>> This is called from system_suspend(), along the path where iommu_suspend() and
>> console_suspend() are called, finally reaching gic_suspend() and this function.
>>
>> Section 5.6.2 “Disabling an ITS” of IHI 0069H.b says:
>> “Ensure that all interrupts that target the ITS that is being powered down are
>> either redirected or disabled”. Is it correct to assume that all sources targeting the ITS
>> are disabled at this point because the domains should already be suspended?
>
> Yes, that is the assumption here.
>
> Before Xen reaches this path, each domain must already have entered
> SHUTDOWN_suspend. In other words, the guest OS has already requested
> SYSTEM_SUSPEND only after completing its own suspend flow, so the
> ITS-targeting interrupt sources owned by that OS are expected to be
> quiesced at this point.
>
> So this code relies on the owning OS having disabled or otherwise
> quiesced those sources before issuing SYSTEM_SUSPEND, rather than Xen
> explicitly doing that in gicv3_its_suspend().
OK! I would be in favour of a comment stating this assumption, unless the maintainers disagree.
>
>>
>>
>>> + if ( ret )
>>> + {
>>> + writel_relaxed(its->suspend_ctx.ctlr, base + GITS_CTLR);
>>
>> Here and in the other places where we write GITS_CTLR: this register has Quiescent as RO,
>> so maybe we should mask the write to only the bits that are writable?
>
> Yes, this was inherited from the Linux ITS suspend/resume code, which restores
> the saved GITS_CTLR value directly.
>
> That said, masking the write to the writable bits is cleaner, and I will do
> that in the next version.
ok
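A minimal sketch of masking the GITS_CTLR write as discussed: the saved value is written back without its read-only bits. Only Quiescent (bit 31) is treated as read-only here; the macro names and `its_ctlr_restore_value()` are assumptions, and which other GITS_CTLR bits are writable depends on the GIC version and implementation, so the real mask should be taken from the GICv3/GICv4 specification.

```c
#include <assert.h>
#include <stdint.h>

#define GITS_CTLR_ENABLE     (1U << 0)   /* RW: ITS enable */
#define GITS_CTLR_QUIESCENT  (1U << 31)  /* RO: ITS is quiescent */

/* Hypothetical: bits this sketch treats as read-only in GITS_CTLR. */
#define GITS_CTLR_RO_MASK    GITS_CTLR_QUIESCENT

/* Value to write back on resume or on the suspend error path. */
static uint32_t its_ctlr_restore_value(uint32_t saved)
{
    uint32_t val = saved & ~GITS_CTLR_RO_MASK;

    assert(!(val & GITS_CTLR_RO_MASK));
    return val;
}
```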
Cheers,
Luca
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 09/13] arm/smmu-v3: add suspend/resume handlers
2026-05-05 15:23 ` Mykola Kvach
@ 2026-05-08 12:21 ` Luca Fancellu
2026-05-08 21:44 ` Mykola Kvach
0 siblings, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-05-08 12:21 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Bertrand Marquis,
Rahul Singh, Stefano Stabellini, Julien Grall, Michal Orzel,
Volodymyr Babchuk
Hi Mykola,
>>>
>>> -static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
>>> +static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
>>> {
>>> int ret;
>>> u32 reg, enables;
>>> @@ -2163,17 +2166,9 @@ static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
>>> }
>>> }
>>>
>>> - ret = arm_smmu_setup_irqs(smmu);
>>> - if (ret) {
>>> - dev_err(smmu->dev, "failed to setup irqs\n");
>>
>> We are moving this one to the probe and ..
>>
>>> + ret = arm_smmu_enable_irqs(smmu);
>>> + if ( ret )
>>
>> replacing it with this one, but arm_smmu_setup_irqs() also calls arm_smmu_setup_unique_irqs(), which
>> calls arm_smmu_setup_msis(). Are we sure that we will get the same state on resume?
>
> This follows the split introduced in the Linux arm-smmu-v3 runtime/system sleep
> series:
>
> https://lore.kernel.org/linux-iommu/20260414194702.1229094-1-praan@google.com/
>
> The intent is to keep IRQ handler registration as one-time probe state, while
> reset/resume only restores the SMMU hardware state and re-enables interrupt
> generation.
>
> You are right that the MSI case needs extra care. In the Linux series this is
> handled by arm_smmu_resume_msis(), which restores the SMMU-side MSI
> configuration. I did not port that part in this patch because Xen SMMUv3 MSI
> support is currently documented as unsupported and is not part of the
> supported/tested path, so this patch only covers the wired IRQ path used by Xen
> today.
>
> If Xen SMMUv3 MSI support becomes usable in the future, the resume path will
> need an equivalent MSI restore step before IRQ_CTRL is re-enabled.
In the meantime, should we check that smmu->features doesn’t have the
ARM_SMMU_FEAT_MSI flag, and document this in the commit message?
What do you think? I’m just worried that someone enables CONFIG_MSI together with your
feature and runs into trouble, when we already know that your feature breaks
CONFIG_MSI.
Maybe the maintainers can give their opinion here as well.
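A minimal sketch of the guard suggested above, refusing SMMUv3 suspend when the device reports MSI support because the MSI restore step (arm_smmu_resume_msis() in the Linux series) is not ported. `struct smmu_sketch`, the flag's bit position and `arm_smmu_suspend_supported()` are hypothetical; in the driver the test would be against `smmu->features` and the existing `ARM_SMMU_FEAT_MSI` definition, and -EOPNOTSUPP is an assumed error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative bit only; the driver has its own ARM_SMMU_FEAT_MSI value. */
#define ARM_SMMU_FEAT_MSI  (1U << 0)

struct smmu_sketch {
    uint32_t features;
};

/*
 * Suspend is only offered for wired interrupts: resuming with MSIs would
 * need the SMMU-side MSI configuration to be reprogrammed first.
 */
static int arm_smmu_suspend_supported(const struct smmu_sketch *smmu)
{
    assert(smmu);

    if ( smmu->features & ARM_SMMU_FEAT_MSI )
        return -EOPNOTSUPP;

    return 0;
}
```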
Cheers,
Luca
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 10/13] xen/arm: Resume memory management on Xen resume
2026-05-05 15:55 ` Mykola Kvach
@ 2026-05-08 13:26 ` Luca Fancellu
2026-05-08 20:51 ` Mykola Kvach
0 siblings, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-05-08 13:26 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Mykola,
>>> xen/arch/arm/arm64/head.S | 24 ++++++++++++++++++++++++
>>> 1 file changed, 24 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
>>> index 72c7b24498..596e960152 100644
>>> --- a/xen/arch/arm/arm64/head.S
>>> +++ b/xen/arch/arm/arm64/head.S
>>> @@ -561,6 +561,30 @@ END(efi_xen_start)
>>>
>>> #endif /* CONFIG_ARM_EFI */
>>>
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> +
>>> +FUNC(hyp_resume)
>>
>> I think we should mask all exceptions here:
>> msr DAIFSet, 0xf
>>
>> until we have correctly restored the state (VBAR_EL2, etc).
>
> This was discussed in an earlier version:
>
> https://patchew.org/Xen/cover.1741164138.git.xakep.amatop@gmail.com/2ef15cb605f987eb087c5496d123c47c01cc0ae7.1741164138.git.xakep.amatop@gmail.com/#CAGeoDV97no7mXSKd7auFu5E85wSXAHKWvqGW2=-VEAbkrTyU8Q@mail.gmail.com
>
> For SYSTEM_SUSPEND, PSCI ties the call semantics to CPU_SUSPEND. In
> particular, section 5.20.2 says that the caller must observe all the rules
> described for CPU_SUSPEND, and section 6.4 explicitly says that the initial
> state rules also apply to SYSTEM_SUSPEND.
>
> For the return Exception level on AArch64, section 6.4.3.3 requires
> SPSR_ELx.{D,A,I,F} to be set to {1, 1, 1, 1}. Therefore Xen expects to enter
> this resume path with DAIF already masked by PSCI-compliant firmware.
>
> I agree this assumption is not obvious from the code, so I will add a comment
> at the resume entry point to document that this path relies on the PSCI initial
> core configuration requirements.
Yes please, something along the lines of:
/*
* PSCI SYSTEM_SUSPEND follows CPU_SUSPEND initial-state rules.
* On AArch64, firmware must return with SPSR_ELx.DAIF set, so
* PSTATE.DAIF is already masked on entry here.
*/
Cheers,
Luca
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain
2026-05-05 20:34 ` Mykola Kvach
2026-05-07 22:25 ` Volodymyr Babchuk
@ 2026-05-08 14:30 ` Luca Fancellu
2026-05-08 20:49 ` Mykola Kvach
1 sibling, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-05-08 14:30 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné,
Rahul Singh
Hi Mykola,
>>> +/* Xen suspend. data identifies the domain that initiated suspend. */
>>> +static void system_suspend(void *data)
>>> +{
>>> + int status;
>>> + unsigned long flags;
>>> + struct domain *d = (struct domain *)data;
>>> +
>>> + BUG_ON(system_state != SYS_STATE_active);
>>> +
>>> + system_state = SYS_STATE_suspend;
>>> +
>>> + printk("Xen suspending...\n");
>>> +
>>> + freeze_domains();
>>> + scheduler_disable();
>>> +
>>> + status = can_system_suspend();
>>> + if ( status )
>>> + {
>>> + system_state = SYS_STATE_resume;
>>> + goto resume_scheduler;
>>
>> When we get an error and take the resume_scheduler path, we apply back the
>> context of the guest saved previously in do_psci_1_0_system_suspend(), so am I
>> correct in saying that the guest won’t get any PSCI error back and that we resume the guest
>> from the guest resume entrypoint?
>>
>> If so, should we have a different path that returns a PSCI error (PSCI_*) in the guest's
>> x0 and skips the context restore?
>
> You are right about the current control flow: once the virtual
> SYSTEM_SUSPEND request has been accepted and the domain has been parked, a
> later failure in the Xen-wide suspend path resumes the domain through the normal
> domain resume path, rather than returning a PSCI error from the original call.
>
> This is intentional in the current design. The virtual PSCI SYSTEM_SUSPEND
> path parks the domain and saves its resume context. The actual Xen-wide host
> suspend is a separate step that is attempted only after all domains are
> suspended.
>
> So a failure in the later Xen-wide suspend step is treated as an abort of the
> host suspend attempt after the domain suspend was already accepted. The domain
> is then resumed through the existing domain resume path, similarly to the
> toolstack/xl suspend-resume flow, rather than by re-entering the guest PSCI
> call path and modifying the saved vCPU context again.
>
> I agree this design is not obvious from the patch. I will clarify the commit
> message and comments. If you or the maintainers think that failures before the
> physical SYSTEM_SUSPEND call succeeds should be reported back through the
> original virtual PSCI call, then this would require a different flow. I was
> trying to avoid that extra complexity in this series.
Ok, I understand. I’m wondering if inside do_psci_1_0_system_suspend() we could do something
like:
[…]
if ( is_control_domain(d) && !other_domains_ready_for_suspend(d) )
return PSCI_DENIED;
[…]
But I’m ok also to only document this behaviour.
>>>
>>> diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
>>> index bd87ec430d..8fb9172186 100644
>>> --- a/xen/arch/arm/vpsci.c
>>> +++ b/xen/arch/arm/vpsci.c
>>> @@ -5,6 +5,7 @@
>>>
>>> #include <asm/current.h>
>>> #include <asm/domain.h>
>>> +#include <asm/suspend.h>
>>> #include <asm/vgic.h>
>>> #include <asm/vpsci.h>
>>> #include <asm/event.h>
>>> @@ -232,8 +233,7 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
>>> if ( is_64bit_domain(d) && is_thumb )
>>> return PSCI_INVALID_ADDRESS;
>>>
>>> - /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
>>> - if ( is_hardware_domain(d) )
>>> + if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
>>> return PSCI_NOT_SUPPORTED;
>>>
>>> /* Ensure that all CPUs other than the calling one are offline */
>>> @@ -266,6 +266,9 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
>>> "SYSTEM_SUSPEND requested, epoint=%#"PRIregister", cid=%#"PRIregister"\n",
>>> epoint, cid);
>>>
>>> + if ( is_control_domain(d) )
>>
>> Why is_control_domain() here and not is_hardware_domain() ?
>
> The use of is_control_domain() is intentional.
>
> The intended model is that Xen-wide host suspend is orchestrated by the
> privileged management/control domain. The control domain coordinates the
> toolstack side, asks other domains to enter suspend, and then issues the final
> SYSTEM_SUSPEND request to Xen.
>
> This does not have to be the same entity as the hardware domain. If the
> hardware domain is separate, it is one of the domains that the control domain
> parks before the final host suspend step.
>
> The hwdom-specific checks in this patch have a different purpose: they avoid
> the old hwdom_shutdown() path for SHUTDOWN_suspend and allow the hardware
> domain to be parked as part of the suspend sequence. They do not define the
> policy for who is allowed to trigger Xen-wide host suspend.
>
> That said, this policy may not be optimal for all configurations, especially
> when the control and hardware domain roles are split. I would appreciate your
> view, as well as the maintainers' views, on whether the trigger should remain
> control-domain based, be tied to the hardware domain instead, or be expressed
> through a separate host-suspend capability/helper.
In the commit message and title I saw HW domain, so maybe the commit should be updated
to say control domain instead?
At this point however I’m wondering about this code above:
```
if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
return PSCI_NOT_SUPPORTED;
```
and in do_psci_1_0_features(), shouldn’t we consistently use is_control_domain()?
>
>>
>>> + host_system_suspend(d);
>>> +
>>> return rc;
>>> }
>>>
>>> @@ -290,7 +293,10 @@ static int32_t do_psci_1_0_features(uint32_t psci_func_id)
>>> return 0;
>>> case PSCI_1_0_FN32_SYSTEM_SUSPEND:
>>> case PSCI_1_0_FN64_SYSTEM_SUSPEND:
>>> - return is_hardware_domain(current->domain) ? PSCI_NOT_SUPPORTED : 0;
>>> + if ( IS_ENABLED(CONFIG_SYSTEM_SUSPEND) ||
>>> + !is_hardware_domain(current->domain) )
>>
>> Should this also have the condition “is hardware domain and psci_ver >= PSCI_VERSION(1, 0)”?
>> Otherwise, if the host machine doesn’t support PSCI 1.0, we would return OK here but the call would
>> fail later in call_psci_system_suspend()?
>
> Good point.
>
> I agree that, for the domain allowed to trigger Xen-wide suspend, Xen should
> not advertise SYSTEM_SUSPEND if the host suspend path cannot be used.
>
> I think this should be checked as an explicit host SYSTEM_SUSPEND capability,
> rather than only as psci_ver >= PSCI_VERSION(1, 0). The same capability check
> also needs to be enforced in the actual SYSTEM_SUSPEND handler before parking
> the domain, because a caller may invoke SYSTEM_SUSPEND directly without first
> querying PSCI_FEATURES.
>
> For ordinary guests, the physical PSCI version is not relevant because they
> cannot trigger host suspend; their SYSTEM_SUSPEND path is virtual.
>
> I will make this consistent in v9: PSCI_FEATURES will advertise SYSTEM_SUSPEND
> for the host-suspend-triggering domain only when the host SYSTEM_SUSPEND backend
> is available, and the actual SYSTEM_SUSPEND path will enforce the same check.
ok
>>>
>>> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
>>> index 22d306d0cb..45f29ef8ec 100644
>>> --- a/xen/drivers/passthrough/arm/smmu.c
>>> +++ b/xen/drivers/passthrough/arm/smmu.c
>>> @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
>>> xfree(xen_domain);
>>> }
>>>
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> +static int arm_smmu_suspend(void)
>>> +{
>>> + return -ENOSYS;
>>> +}
>>> +#endif
>>
>> Maybe we also want to gate the feature on !CONFIG_ARM_SMMU? I would wait for the maintainers'
>> view on this.
>
> I feel that gating this strictly on !CONFIG_ARM_SMMU might not be the most
> optimal approach here.
>
> CONFIG_ARM_SMMU is a build-time option and does not mean that an old SMMUv1/v2
> device is actually present. Using it would disable system suspend even on
> platforms where only SMMUv3 is used, because CONFIG_ARM_SMMU is enabled by
> default for Arm.
>
> The condition should be runtime-based: whether the active/probed IOMMU devices
> have system suspend/resume support. For the old ARM SMMU driver this is not
> implemented today, so a platform with an SMMUv1/v2 instance should not expose
> or attempt host suspend.
>
> I think we should handle this by tracking whether any old ARM SMMUv1/v2 device
> was actually probed, or by adding a generic IOMMU suspend capability check. Then
> the host suspend availability check can reject system suspend only when such an
> unsupported IOMMU is present, instead of disabling the feature for all Arm
> builds with CONFIG_ARM_SMMU enabled.
>
> I would be interested to hear if you or the maintainers see a better way to
> express this capability.
Ok, let’s address Jan’s comment now, and then we can see what the maintainers think about this.
Cheers,
Luca
* Re: [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain
2026-05-08 14:30 ` Luca Fancellu
@ 2026-05-08 20:49 ` Mykola Kvach
0 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-05-08 20:49 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné,
Rahul Singh
On Fri, May 8, 2026 at 5:31 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> >>> +/* Xen suspend. data identifies the domain that initiated suspend. */
> >>> +static void system_suspend(void *data)
> >>> +{
> >>> + int status;
> >>> + unsigned long flags;
> >>> + struct domain *d = (struct domain *)data;
> >>> +
> >>> + BUG_ON(system_state != SYS_STATE_active);
> >>> +
> >>> + system_state = SYS_STATE_suspend;
> >>> +
> >>> + printk("Xen suspending...\n");
> >>> +
> >>> + freeze_domains();
> >>> + scheduler_disable();
> >>> +
> >>> + status = can_system_suspend();
> >>> + if ( status )
> >>> + {
> >>> + system_state = SYS_STATE_resume;
> >>> + goto resume_scheduler;
> >>
> >> When we have an error and we get the resume_scheduler path, we apply back the
> >> context of the guest saved previously in do_psci_1_0_system_suspend(), so am I
> >> correct saying the guest won’t get any PSCI error back and we resume the guest
> >> from the guest resume entrypoint?
> >>
> >> If so, should we have a different path that returns a PSCI error (PSCI_*) into the guest
> >> x0, and skips the context restore?
> >
> > You are right about the current control flow: once the virtual
> > SYSTEM_SUSPEND request has been accepted and the domain has been parked, a
> > later failure in the Xen-wide suspend path resumes the domain through the normal
> > domain resume path, rather than returning a PSCI error from the original call.
> >
> > This is intentional in the current design. The virtual PSCI SYSTEM_SUSPEND
> > path parks the domain and saves its resume context. The actual Xen-wide host
> > suspend is a separate step that is attempted only after all domains are
> > suspended.
> >
> > So a failure in the later Xen-wide suspend step is treated as an abort of the
> > host suspend attempt after the domain suspend was already accepted. The domain
> > is then resumed through the existing domain resume path, similarly to the
> > toolstack/xl suspend-resume flow, rather than by re-entering the guest PSCI
> > call path and modifying the saved vCPU context again.
> >
> > I agree this design is not obvious from the patch. I will clarify the commit
> > message and comments. If you or the maintainers think that failures before the
> > physical SYSTEM_SUSPEND call succeeds should be reported back through the
> > original virtual PSCI call, then this would require a different flow. I was
> > trying to avoid that extra complexity in this series.
>
> Ok, I understand. I’m wondering if inside do_psci_1_0_system_suspend() we could do something
> like:
>
> […]
> if ( is_control_domain(d) && !other_domains_ready_for_suspend(d) )
> return PSCI_DENIED;
Yes. I have reworked this locally and will include it in v9.
The control-domain readiness check is now done in the vPSCI SYSTEM_SUSPEND
path before building the guest resume context and before calling
domain_shutdown(..., SHUTDOWN_suspend), so in that case the call returns
PSCI_DENIED early rather than parking the domain first.
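To make the reworked ordering concrete, here is a small C model of it. It is only a sketch under stated assumptions: `model_domain`, `model_system_suspend()` and its boolean fields are hypothetical stand-ins for is_control_domain(), other_domains_ready_for_suspend(), the resume-context setup and domain_shutdown(..., SHUTDOWN_suspend). PSCI_DENIED uses the PSCI specification value (-3). What it shows is that the readiness check runs before anything is modified, so a denied request leaves the domain untouched.

```c
#include <stdbool.h>
#include <stdint.h>

/* PSCI return codes as defined by the PSCI specification. */
#define PSCI_SUCCESS  0
#define PSCI_DENIED   (-3)

/* Hypothetical model of the state the vPSCI handler touches. */
struct model_domain {
    bool is_control;       /* stands in for is_control_domain(d) */
    bool others_ready;     /* stands in for other_domains_ready_for_suspend(d) */
    bool resume_ctx_built; /* guest resume context has been prepared */
    bool parked;           /* domain_shutdown(d, SHUTDOWN_suspend) was called */
};

static int32_t model_system_suspend(struct model_domain *d)
{
    /* 1. Readiness check first: nothing has been modified yet, so the
     *    caller simply gets PSCI_DENIED back in x0. */
    if ( d->is_control && !d->others_ready )
        return PSCI_DENIED;

    /* 2. Only now build the resume context and park the domain. */
    d->resume_ctx_built = true;
    d->parked = true;

    /* 3. For the control domain, the Xen-wide host suspend would start
     *    here; a later failure there resumes the parked domain through
     *    the normal resume path instead of returning a PSCI error. */
    return PSCI_SUCCESS;
}
```

Non-control domains skip the check entirely, which matches the purely virtual SYSTEM_SUSPEND they already get today.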
>
> […]
>
> But I’m ok also to only document this behaviour.
>
>
> >>>
> >>> diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> >>> index bd87ec430d..8fb9172186 100644
> >>> --- a/xen/arch/arm/vpsci.c
> >>> +++ b/xen/arch/arm/vpsci.c
> >>> @@ -5,6 +5,7 @@
> >>>
> >>> #include <asm/current.h>
> >>> #include <asm/domain.h>
> >>> +#include <asm/suspend.h>
> >>> #include <asm/vgic.h>
> >>> #include <asm/vpsci.h>
> >>> #include <asm/event.h>
> >>> @@ -232,8 +233,7 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> >>> if ( is_64bit_domain(d) && is_thumb )
> >>> return PSCI_INVALID_ADDRESS;
> >>>
> >>> - /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> >>> - if ( is_hardware_domain(d) )
> >>> + if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
> >>> return PSCI_NOT_SUPPORTED;
> >>>
> >>> /* Ensure that all CPUs other than the calling one are offline */
> >>> @@ -266,6 +266,9 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> >>> "SYSTEM_SUSPEND requested, epoint=%#"PRIregister", cid=%#"PRIregister"\n",
> >>> epoint, cid);
> >>>
> >>> + if ( is_control_domain(d) )
> >>
> >> Why is_control_domain() here and not is_hardware_domain() ?
> >
> > The use of is_control_domain() is intentional.
> >
> > The intended model is that Xen-wide host suspend is orchestrated by the
> > privileged management/control domain. The control domain coordinates the
> > toolstack side, asks other domains to enter suspend, and then issues the final
> > SYSTEM_SUSPEND request to Xen.
> >
> > This does not have to be the same entity as the hardware domain. If the
> > hardware domain is separate, it is one of the domains that the control domain
> > parks before the final host suspend step.
> >
> > The hwdom-specific checks in this patch have a different purpose: they avoid
> > the old hwdom_shutdown() path for SHUTDOWN_suspend and allow the hardware
> > domain to be parked as part of the suspend sequence. They do not define the
> > policy for who is allowed to trigger Xen-wide host suspend.
> >
> > That said, this policy may not be optimal for all configurations, especially
> > when the control and hardware domain roles are split. I would appreciate your
> > view, as well as the maintainers' views, on whether the trigger should remain
> > control-domain based, be tied to the hardware domain instead, or be expressed
> > through a separate host-suspend capability/helper.
>
> In the commit message and title I saw HW domain, so maybe the commit should be updated
> to say control domain instead?
Yes, that was stale wording from an older version. I have fixed it
locally for v9.
>
> At this point however I’m wondering about this code above:
> ```
> if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
> return PSCI_NOT_SUPPORTED;
> ```
> and in do_psci_1_0_features(), shouldn’t we consistently use is_control_domain()?
Yes. I have reworked this locally so the policy is now explicit and
consistent.
The control domain is the only domain whose SYSTEM_SUSPEND request may
drive the Xen-wide host suspend path. For that domain, both
PSCI_FEATURES(SYSTEM_SUSPEND) and the real SYSTEM_SUSPEND path now
consult the same helper, so the advertised capability and the execution
path stay consistent.
That helper is an explicit host-suspend capability check: it requires
firmware support for PSCI SYSTEM_SUSPEND and it also keeps host suspend
disabled when Xen detects a missing suspend/resume path in required
host-side components.
For a non-control hardware domain, SYSTEM_SUSPEND remains a virtual
guest suspend operation and does not by itself trigger Xen-wide host
suspend. Other guests keep the existing virtual behaviour as well.
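A minimal sketch of the "one predicate, two callers" policy described above. The names (`host_caps`, `host_suspend_available()`, `features_system_suspend()`, `system_suspend_precheck()`) are hypothetical and not the actual v9 helpers; returning PSCI_NOT_SUPPORTED from the handler precheck is likewise an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSCI_NOT_SUPPORTED (-1)

/* Hypothetical summary of what the host-suspend capability depends on. */
struct host_caps {
    bool fw_system_suspend;   /* firmware implements PSCI SYSTEM_SUSPEND */
    bool missing_resume_path; /* a required host component lacks suspend/resume */
};

/* Single predicate consulted by both PSCI_FEATURES and SYSTEM_SUSPEND. */
static bool host_suspend_available(const struct host_caps *caps)
{
    return caps->fw_system_suspend && !caps->missing_resume_path;
}

/* PSCI_FEATURES(SYSTEM_SUSPEND): 0 means "supported". */
static int32_t features_system_suspend(bool is_control,
                                       const struct host_caps *caps)
{
    if ( !is_control )
        return 0; /* virtual suspend only, host capability is irrelevant */
    return host_suspend_available(caps) ? 0 : PSCI_NOT_SUPPORTED;
}

/* Entry check in the SYSTEM_SUSPEND handler, before anything is modified. */
static int32_t system_suspend_precheck(bool is_control,
                                       const struct host_caps *caps)
{
    if ( is_control && !host_suspend_available(caps) )
        return PSCI_NOT_SUPPORTED;
    return 0;
}
```

Because both entry points call the same predicate, PSCI_FEATURES cannot advertise something that the handler will later refuse, even for a caller that skips PSCI_FEATURES.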
>
> >
> >>
> >>> + host_system_suspend(d);
> >>> +
> >>> return rc;
> >>> }
> >>>
> >>> @@ -290,7 +293,10 @@ static int32_t do_psci_1_0_features(uint32_t psci_func_id)
> >>> return 0;
> >>> case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> >>> case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> >>> - return is_hardware_domain(current->domain) ? PSCI_NOT_SUPPORTED : 0;
> >>> + if ( IS_ENABLED(CONFIG_SYSTEM_SUSPEND) ||
> >>> + !is_hardware_domain(current->domain) )
> >>
> >> Should this also have the condition “is hardware domain and psci_ver >= PSCI_VERSION(1, 0)”?
> >> Otherwise, if the host machine doesn’t support PSCI 1.0, we would return OK here but the call would
> >> fail later in call_psci_system_suspend()?
> >
> > Good point.
> >
> > I agree that, for the domain allowed to trigger Xen-wide suspend, Xen should
> > not advertise SYSTEM_SUSPEND if the host suspend path cannot be used.
> >
> > I think this should be checked as an explicit host SYSTEM_SUSPEND capability,
> > rather than only as psci_ver >= PSCI_VERSION(1, 0). The same capability check
> > also needs to be enforced in the actual SYSTEM_SUSPEND handler before parking
> > the domain, because a caller may invoke SYSTEM_SUSPEND directly without first
> > querying PSCI_FEATURES.
> >
> > For ordinary guests, the physical PSCI version is not relevant because they
> > cannot trigger host suspend; their SYSTEM_SUSPEND path is virtual.
> >
> > I will make this consistent in v9: PSCI_FEATURES will advertise SYSTEM_SUSPEND
> > for the host-suspend-triggering domain only when the host SYSTEM_SUSPEND backend
> > is available, and the actual SYSTEM_SUSPEND path will enforce the same check.
>
> ok
I have reworked this locally accordingly.
Rather than open-coding a psci_ver >= PSCI_VERSION(1, 0) test, the new
code uses an explicit host SYSTEM_SUSPEND capability predicate. For the
control domain, PSCI_FEATURES(SYSTEM_SUSPEND) and the actual
SYSTEM_SUSPEND handler now share that same check, so they stay
consistent even when host suspend is blocked by firmware capability or
by Xen-side runtime suspend/resume limitations.
>
> >>>
> >>> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> >>> index 22d306d0cb..45f29ef8ec 100644
> >>> --- a/xen/drivers/passthrough/arm/smmu.c
> >>> +++ b/xen/drivers/passthrough/arm/smmu.c
> >>> @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
> >>> xfree(xen_domain);
> >>> }
> >>>
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> +static int arm_smmu_suspend(void)
> >>> +{
> >>> + return -ENOSYS;
> >>> +}
> >>> +#endif
> >>
> >> Maybe we also want to gate the feature on !CONFIG_ARM_SMMU? I would wait for the maintainers'
> >> view on this.
> >
> > I feel that gating this strictly on !CONFIG_ARM_SMMU might not be the most
> > optimal approach here.
> >
> > CONFIG_ARM_SMMU is a build-time option and does not mean that an old SMMUv1/v2
> > device is actually present. Using it would disable system suspend even on
> > platforms where only SMMUv3 is used, because CONFIG_ARM_SMMU is enabled by
> > default for Arm.
> >
> > The condition should be runtime-based: whether the active/probed IOMMU devices
> > have system suspend/resume support. For the old ARM SMMU driver this is not
> > implemented today, so a platform with an SMMUv1/v2 instance should not expose
> > or attempt host suspend.
> >
> > I think we should handle this by tracking whether any old ARM SMMUv1/v2 device
> > was actually probed, or by adding a generic IOMMU suspend capability check. Then
> > the host suspend availability check can reject system suspend only when such an
> > unsupported IOMMU is present, instead of disabling the feature for all Arm
> > builds with CONFIG_ARM_SMMU enabled.
> >
> > I would be interested to hear if you or the maintainers see a better way to
> > express this capability.
>
> Ok, let’s address Jan’s comment now, and then we can see what the maintainers think about this.
Ack, thanks.
I have already reworked the Jan-related part locally for v9: the
control-domain readiness/capability checks now happen in the vPSCI
SYSTEM_SUSPEND path before the domain finishes suspend, and PSCI_FEATURES
plus the real call now use the same host-suspend capability predicate.
I’ll fold that into the next revision, and then we can see what the
maintainers prefer for the remaining runtime-gating details.
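One possible shape for the runtime gating of the old SMMU driver discussed earlier in the thread, written as a hedged sketch: `legacy_smmu_probed`, `legacy_smmu_note_probed()` and `iommu_blocks_host_suspend()` are hypothetical names, not existing Xen symbols. The idea is that building the driver in (CONFIG_ARM_SMMU) does not block suspend; only an SMMUv1/v2 instance that was actually probed does.

```c
#include <stdbool.h>

/* Hypothetical: number of SMMUv1/v2 instances the legacy driver probed. */
static unsigned int legacy_smmu_probed;

/* Would be called from the legacy driver's probe path on success. */
static void legacy_smmu_note_probed(void)
{
    legacy_smmu_probed++;
}

/* Blocker consulted by the host-suspend capability check: a runtime
 * property of the platform, not a build-time Kconfig choice. */
static bool iommu_blocks_host_suspend(void)
{
    return legacy_smmu_probed != 0;
}
```

A generic per-IOMMU "supports suspend" capability would generalise the same idea to other drivers.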
Best regards,
Mykola
>
> Cheers,
> Luca
>
>
* Re: [PATCH v8 10/13] xen/arm: Resume memory management on Xen resume
2026-05-08 13:26 ` Luca Fancellu
@ 2026-05-08 20:51 ` Mykola Kvach
0 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-05-08 20:51 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
On Fri, May 8, 2026 at 4:28 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> >>> xen/arch/arm/arm64/head.S | 24 ++++++++++++++++++++++++
> >>> 1 file changed, 24 insertions(+)
> >>>
> >>> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> >>> index 72c7b24498..596e960152 100644
> >>> --- a/xen/arch/arm/arm64/head.S
> >>> +++ b/xen/arch/arm/arm64/head.S
> >>> @@ -561,6 +561,30 @@ END(efi_xen_start)
> >>>
> >>> #endif /* CONFIG_ARM_EFI */
> >>>
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> +
> >>> +FUNC(hyp_resume)
> >>
> >> I think we should mask all exceptions here:
> >> msr DAIFSet, 0xf
> >>
> >> until we have correctly restored the state (VBAR_EL2, etc.).
> >
> > This was discussed in an earlier version:
> >
> > https://patchew.org/Xen/cover.1741164138.git.xakep.amatop@gmail.com/2ef15cb605f987eb087c5496d123c47c01cc0ae7.1741164138.git.xakep.amatop@gmail.com/#CAGeoDV97no7mXSKd7auFu5E85wSXAHKWvqGW2=-VEAbkrTyU8Q@mail.gmail.com
> >
> > For SYSTEM_SUSPEND, PSCI ties the call semantics to CPU_SUSPEND. In
> > particular, section 5.20.2 says that the caller must observe all the rules
> > described for CPU_SUSPEND, and section 6.4 explicitly says that the initial
> > state rules also apply to SYSTEM_SUSPEND.
> >
> > For the return Exception level on AArch64, section 6.4.3.3 requires
> > SPSR_ELx.{D,A,I,F} to be set to {1, 1, 1, 1}. Therefore Xen expects to enter
> > this resume path with DAIF already masked by PSCI-compliant firmware.
> >
> > I agree this assumption is not obvious from the code, so I will add a comment
> > at the resume entry point to document that this path relies on the PSCI initial
> > core configuration requirements.
>
> Yes please, something along the line of
>
> /*
> * PSCI SYSTEM_SUSPEND follows CPU_SUSPEND initial-state rules.
> * On AArch64, firmware must return with SPSR_ELx.DAIF set, so
> * PSTATE.DAIF is already masked on entry here.
> */
Ack.
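For reference, the PSCI requirement quoted above (SPSR_ELx.{D,A,I,F} = {1, 1, 1, 1}) expressed as bit positions in C. This is purely illustrative: hyp_resume itself is assembly and does not perform such a check, and `daif_fully_masked()` is a hypothetical helper. The bit positions are the AArch64 SPSR layout (F = 6, I = 7, A = 8, D = 9).

```c
#include <stdbool.h>
#include <stdint.h>

/* AArch64 SPSR_ELx / PSTATE exception-mask bits. */
#define SPSR_F    (UINT64_C(1) << 6) /* FIQ masked */
#define SPSR_I    (UINT64_C(1) << 7) /* IRQ masked */
#define SPSR_A    (UINT64_C(1) << 8) /* SError masked */
#define SPSR_D    (UINT64_C(1) << 9) /* Debug exceptions masked */
#define SPSR_DAIF (SPSR_D | SPSR_A | SPSR_I | SPSR_F)

/* True if all four exception classes are masked in the given SPSR value. */
static bool daif_fully_masked(uint64_t spsr)
{
    return (spsr & SPSR_DAIF) == SPSR_DAIF;
}
```

For example, 0x3c9 (EL2h with DAIF set) satisfies the rule, while 0x349 (IRQ unmasked) does not.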
Best regards,
Mykola
>
> Cheers,
> Luca
>
* Re: [PATCH v8 10/13] xen/arm: Resume memory management on Xen resume
2026-05-07 22:06 ` Volodymyr Babchuk
@ 2026-05-08 20:59 ` Mykola Kvach
2026-05-11 16:11 ` Oleksandr Tyshchenko
0 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-05-08 20:59 UTC (permalink / raw)
To: Volodymyr Babchuk
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel
Hi Volodymyr,
Thank you for the feedback.
On Fri, May 8, 2026 at 1:06 AM Volodymyr Babchuk
<Volodymyr_Babchuk@epam.com> wrote:
>
> Hi Mykola,
>
> Mykola Kvach <xakep.amatop@gmail.com> writes:
>
> > From: Mirela Simonovic <mirela.simonovic@aggios.com>
> >
> > The MMU must be enabled during the resume path before restoring context,
> > as virtual addresses are used to access the saved context data.
> >
>
> I agree with Luca, this patch does not make sense as is. I don't see
> why it should be separated from the rest of the resume path that is
> added in the next patch.
Ack. I'll combine this with the next patch in v9.
Best regards,
Mykola
>
> > This patch adds MMU setup during resume by reusing the existing
> > enable_secondary_cpu_mm function, which enables data cache and the MMU.
> > Before the MMU is enabled, the content of TTBR0_EL2 is changed to point
> > to init_ttbr (page tables used at runtime).
> >
> > Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> > Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> > Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in v7:
> > - no functional changes, just moved commit
> > ---
> > xen/arch/arm/arm64/head.S | 24 ++++++++++++++++++++++++
> > 1 file changed, 24 insertions(+)
> >
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index 72c7b24498..596e960152 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -561,6 +561,30 @@ END(efi_xen_start)
> >
> > #endif /* CONFIG_ARM_EFI */
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +FUNC(hyp_resume)
> > + /* Initialize the UART if earlyprintk has been enabled. */
> > +#ifdef CONFIG_EARLY_PRINTK
> > + bl init_uart
> > +#endif
> > + PRINT_ID("- Xen resuming -\r\n")
> > +
> > + bl check_cpu_mode
> > + bl cpu_init
> > +
> > + ldr x0, =start
> > + adr x20, start /* x20 := paddr (start) */
> > + sub x20, x20, x0 /* x20 := phys-offset */
> > + ldr lr, =mmu_resumed
> > + b enable_secondary_cpu_mm
> > +
> > +mmu_resumed:
> > + b .
> > +END(hyp_resume)
> > +
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> > /*
> > * Local variables:
> > * mode: ASM
>
> --
> WBR, Volodymyr
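For readers less familiar with the head.S idiom used in hyp_resume, the phys-offset computation can be modeled in C. With the MMU off, `adr x20, start` yields the physical address of `start`, while `ldr x0, =start` loads its link-time (virtual) address from a literal pool; their difference is the offset kept in x20. The helper names and the addresses in the usage below are made up for illustration.

```c
#include <stdint.h>

/* x20 := paddr(start) - vaddr(start). Unsigned arithmetic wraps modulo
 * 2^64, just like the AArch64 `sub`, so a physical address below the
 * virtual one still gives a usable offset. */
static uint64_t phys_offset(uint64_t paddr_start, uint64_t vaddr_start)
{
    return paddr_start - vaddr_start;
}

/* Translate a link-time address of Xen's image using that offset. */
static uint64_t image_virt_to_phys(uint64_t va, uint64_t offset)
{
    return va + offset;
}
```

With a hypothetical load address of 0x40200000 and a link address of 0x200000, the offset is 0x40000000, and 0x201000 translates to 0x40201000.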
* Re: [PATCH v8 09/13] arm/smmu-v3: add suspend/resume handlers
2026-05-08 12:21 ` Luca Fancellu
@ 2026-05-08 21:44 ` Mykola Kvach
2026-05-09 7:50 ` Luca Fancellu
0 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-05-08 21:44 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Bertrand Marquis,
Rahul Singh, Stefano Stabellini, Julien Grall, Michal Orzel,
Volodymyr Babchuk
On Fri, May 8, 2026 at 3:22 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> >>>
> >>> -static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
> >>> +static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
> >>> {
> >>> int ret;
> >>> u32 reg, enables;
> >>> @@ -2163,17 +2166,9 @@ static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
> >>> }
> >>> }
> >>>
> >>> - ret = arm_smmu_setup_irqs(smmu);
> >>> - if (ret) {
> >>> - dev_err(smmu->dev, "failed to setup irqs\n");
> >>
> >> We are moving this one to the probe and ..
> >>
> >>> + ret = arm_smmu_enable_irqs(smmu);
> >>> + if ( ret )
> >>
> >> changing with this one, but arm_smmu_setup_irqs() also calls arm_smmu_setup_unique_irqs() which
> >> calls arm_smmu_setup_msis(), are we sure that on resume we will get the same state?
> >
> > This follows the split introduced in the Linux arm-smmu-v3 runtime/system sleep
> > series:
> >
> > https://lore.kernel.org/linux-iommu/20260414194702.1229094-1-praan@google.com/
> >
> > The intent is to keep IRQ handler registration as one-time probe state, while
> > reset/resume only restores the SMMU hardware state and re-enables interrupt
> > generation.
> >
> > You are right that the MSI case needs extra care. In the Linux series this is
> > handled by arm_smmu_resume_msis(), which restores the SMMU-side MSI
> > configuration. I did not port that part in this patch because Xen SMMUv3 MSI
> > support is currently documented as unsupported and is not part of the
> > supported/tested path, so this patch only covers the wired IRQ path used by Xen
> > today.
> >
> > If Xen SMMUv3 MSI support becomes usable in the future, the resume path will
> > need an equivalent MSI restore step before IRQ_CTRL is re-enabled.
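The probe/reset split described above can be modeled in a few lines of C. Everything here is hypothetical (`model_smmu` and the `model_*` functions are not driver symbols); the `model_setup_irqs()` and `model_enable_irqs()` steps only mirror the roles of arm_smmu_setup_irqs() and arm_smmu_enable_irqs() in the patch. The point is that handler registration happens once at probe, while the reset path shared by probe and resume only re-enables interrupt generation.

```c
#include <stdbool.h>

struct model_smmu {
    unsigned int handlers_registered; /* times IRQ handlers were requested */
    bool irq_gen_enabled;             /* stands in for the IRQ_CTRL enables */
};

/* One-time, probe-only step (role of arm_smmu_setup_irqs()). */
static void model_setup_irqs(struct model_smmu *s)
{
    s->handlers_registered++;
}

/* Hardware-only step, safe to repeat (role of arm_smmu_enable_irqs()). */
static void model_enable_irqs(struct model_smmu *s)
{
    s->irq_gen_enabled = true;
}

/* Shared by probe and resume after the split: no handler registration. */
static void model_device_reset(struct model_smmu *s)
{
    s->irq_gen_enabled = false; /* reset starts from a disabled state */
    model_enable_irqs(s);
}

static void model_probe(struct model_smmu *s)
{
    model_setup_irqs(s);
    model_device_reset(s);
}

static void model_suspend(struct model_smmu *s)
{
    s->irq_gen_enabled = false; /* power-down loses hardware state */
}

static void model_resume(struct model_smmu *s)
{
    model_device_reset(s);
}
```

Any number of suspend/resume cycles leaves exactly one handler registration, which is what would break if reset still called the registration step.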
>
> In the meantime, should we maybe check that smmu->features doesn’t have the
> ARM_SMMU_FEAT_MSI flag, and document it in the commit message?
>
> What do you think about it? I’m just worried someone uses CONFIG_MSI and your
> feature and ends up in some trouble, while we know that your feature breaks
> CONFIG_MSI.
Good point.
I don't think checking only ARM_SMMU_FEAT_MSI in this patch is the right
approach, since that reflects hardware capability rather than whether Xen is
actually using the SMMUv3 MSI IRQ path.
For v9, I plan to keep this SMMUv3 patch limited to documenting the current
limitation in the driver and in the commit message: the MSI IRQ path is not
host-suspend-safe yet because resume does not restore the SMMU *_IRQ_CFGn
registers.
The actual runtime block will be added in a later host suspend policy patch,
together with the other runtime blockers. That keeps the policy in one place
and ensures PSCI_FEATURES(SYSTEM_SUSPEND) stays consistent with the actual
SYSTEM_SUSPEND handling.
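A hedged sketch of the distinction made above between hardware capability and actual use. `model_smmu_irq_state`, its fields and `smmu_blocks_host_suspend()` are hypothetical and only illustrate where a future runtime blocker could look; they are not part of the SMMUv3 driver.

```c
#include <stdbool.h>

/* Hypothetical per-instance IRQ state; field names are illustrative only. */
struct model_smmu_irq_state {
    bool hw_has_msi;     /* e.g. ARM_SMMU_FEAT_MSI advertised by the hardware */
    bool using_msi_path; /* Xen actually routed SMMU IRQs via *_IRQ_CFGn MSIs */
};

/* Resume does not restore the *_IRQ_CFGn registers yet, so only an
 * instance that Xen really drives through MSIs blocks host suspend.
 * hw_has_msi alone is deliberately ignored: capability is not use. */
static bool smmu_blocks_host_suspend(const struct model_smmu_irq_state *s)
{
    return s->using_msi_path;
}
```

An MSI-capable SMMU that Xen drives through wired IRQs therefore stays suspend-capable.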
Best regards,
Mykola
>
> Maybe the maintainers can give their opinion here as well.
>
> Cheers,
> Luca
* Re: [PATCH v8 05/13] xen/arm: gic-v3: add ITS suspend/resume support
2026-05-08 11:30 ` Luca Fancellu
@ 2026-05-08 22:11 ` Mykola Kvach
0 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-05-08 22:11 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné
On Fri, May 8, 2026 at 2:31 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> >
> > On Fri, Apr 24, 2026 at 1:54 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
> >>
> >> Hi Mykola,
> >>
> >>> On 2 Apr 2026, at 11:45, Mykola Kvach <xakep.amatop@gmail.com> wrote:
> >>>
> >>> From: Mykola Kvach <mykola_kvach@epam.com>
> >>>
> >>> Handle system suspend/resume for GICv3 with an ITS present so LPIs keep
> >>> working after firmware powers the GIC down. Snapshot the CPU interface,
> >>> distributor and last-CPU redistributor state,
>
> Wasn’t “Snapshot the CPU interface, distributor and last-CPU redistributor state” already done in the previous commit?
Yes, fair point.
That wording is too broad for this patch. It describes the wider GICv3
suspend/resume flow in which the ITS handling is invoked, rather than the
ITS-specific part added here.
The CPU interface, distributor and redistributor handling are covered by
the related GICv3 suspend/resume patches, while this patch itself adds the
ITS state save/restore.
I will tighten the commit message in the next version so it only describes
the ITS-specific suspend/resume handling done by this patch.
>
> >>> disable the ITS to cache its
> >>> CTLR/CBASER/BASER registers, then restore everything and re-arm the
> >>> collection on resume.
> >>>
> >>> Add list_for_each_entry_continue_reverse() in list.h for the ITS suspend
> >>> error path that needs to roll back partially saved state.
> >>>
> >>> Based on Linux commit dba0bc7b76dc ("irqchip/gic-v3-its: Add ability to save/restore ITS state")
> >>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> >>> ---
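The rollback that list_for_each_entry_continue_reverse() enables can be sketched with an array instead of a linked list. The `model_its` type and functions are hypothetical; the intent is only to show the pattern: when suspending instance N fails, walk back from the entry before it and restore N-1 down to 0, newest first.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_its {
    bool fail_on_suspend; /* test hook: make this instance fail */
    bool saved;           /* state captured and ITS disabled */
};

static int model_its_suspend_one(struct model_its *its)
{
    if ( its->fail_on_suspend )
        return -1;
    its->saved = true;
    return 0;
}

static void model_its_resume_one(struct model_its *its)
{
    its->saved = false;
}

/* Suspend all instances; on failure, undo only the ones already suspended. */
static int model_its_suspend_all(struct model_its *its, size_t n)
{
    size_t i;

    for ( i = 0; i < n; i++ )
    {
        int ret = model_its_suspend_one(&its[i]);

        if ( ret )
        {
            /* Walk back from the entry *before* the failing one. */
            while ( i-- > 0 )
                model_its_resume_one(&its[i]);
            return ret;
        }
    }
    return 0;
}
```

The failing instance itself is not restored, because its suspend did not complete.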
> […]
> >
> >>
> >>> + {
> >>> + unsigned int i;
> >>> + void __iomem *base = its->its_base;
> >>> +
> >>> + its->suspend_ctx.ctlr = readl_relaxed(base + GITS_CTLR);
> >>> + ret = gicv3_disable_its(its);
> >>
> >> This is called from system_suspend(), along the path where iommu_suspend() and
> >> console_suspend() are called, finally reaching gic_suspend() and this one.
> >>
> >> In the IHI 0069H.b, 5.6.2 Disabling an ITS, it says:
> >> “Ensure that all interrupts that target the ITS that is being powered down are
> >> either redirected or disabled”. Is it correct to assume that all the ITS-targeting sources
> >> are disabled at this point, because domains should already be suspended?
> >
> > Yes, that is the assumption here.
> >
> > Before Xen reaches this path, each domain must already have entered
> > SHUTDOWN_suspend. In other words, the guest OS has already requested
> > SYSTEM_SUSPEND only after completing its own suspend flow, so the
> > ITS-targeting interrupt sources owned by that OS are expected to be
> > quiesced at this point.
> >
> > So this code relies on the owning OS having disabled or otherwise
> > quiesced those sources before issuing SYSTEM_SUSPEND, rather than Xen
> > explicitly doing that in gicv3_its_suspend().
>
> Ok! I would be in favour of a comment stating this assumption, unless the maintainers disagree.
Ack.
Best regards,
Mykola
>
> >
> >>
> >>
> >>> + if ( ret )
> >>> + {
> >>> + writel_relaxed(its->suspend_ctx.ctlr, base + GITS_CTLR);
> >>
> >> Here and in the other places where we write GITS_CTLR: this register has Quiescent as RO,
> >> so maybe we should mask the write to only the bits that are writable?
> >
> > Yes, this was inherited from the Linux ITS suspend/resume code, which restores
> > the saved GITS_CTLR value directly.
> >
> > That said, masking the write to the writable bits is cleaner, and I will do
> > that in the next version.
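A minimal sketch of the masked restore, assuming the only read-only bit to drop is Quiescent (bit 31), the one named in the review. The `MODEL_*` macros and `gits_ctlr_restore_value()` are hypothetical; the full set of read-only or RES0 bits to exclude should be taken from the GIC architecture specification.

```c
#include <stdint.h>

#define MODEL_GITS_CTLR_ENABLE    (UINT32_C(1) << 0)
#define MODEL_GITS_CTLR_QUIESCENT (UINT32_C(1) << 31) /* read-only status */

/* Value to write back on resume or rollback: drop the read-only Quiescent
 * bit captured at suspend time and keep the rest of the saved value. */
static uint32_t gits_ctlr_restore_value(uint32_t saved)
{
    return saved & ~MODEL_GITS_CTLR_QUIESCENT;
}
```

The Enabled bit is preserved, so a saved value of 0x80000001 is written back as 0x1.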
>
> ok
>
> Cheers,
> Luca
>
* Re: [PATCH v8 09/13] arm/smmu-v3: add suspend/resume handlers
2026-05-08 21:44 ` Mykola Kvach
@ 2026-05-09 7:50 ` Luca Fancellu
0 siblings, 0 replies; 66+ messages in thread
From: Luca Fancellu @ 2026-05-09 7:50 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Bertrand Marquis,
Rahul Singh, Stefano Stabellini, Julien Grall, Michal Orzel,
Volodymyr Babchuk
Hi Mykola,
> On 8 May 2026, at 22:44, Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> On Fri, May 8, 2026 at 3:22 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>>
>> Hi Mykola,
>>
>>>>>
>>>>> -static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
>>>>> +static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
>>>>> {
>>>>> int ret;
>>>>> u32 reg, enables;
>>>>> @@ -2163,17 +2166,9 @@ static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
>>>>> }
>>>>> }
>>>>>
>>>>> - ret = arm_smmu_setup_irqs(smmu);
>>>>> - if (ret) {
>>>>> - dev_err(smmu->dev, "failed to setup irqs\n");
>>>>
>>>> We are moving this one to the probe and ..
>>>>
>>>>> + ret = arm_smmu_enable_irqs(smmu);
>>>>> + if ( ret )
>>>>
>>>> changing with this one, but arm_smmu_setup_irqs() also calls arm_smmu_setup_unique_irqs() which
>>>> calls arm_smmu_setup_msis(), are we sure that on resume we will get the same state?
>>>
>>> This follows the split introduced in the Linux arm-smmu-v3 runtime/system sleep
>>> series:
>>>
>>> https://lore.kernel.org/linux-iommu/20260414194702.1229094-1-praan@google.com/
>>>
>>> The intent is to keep IRQ handler registration as one-time probe state, while
>>> reset/resume only restores the SMMU hardware state and re-enables interrupt
>>> generation.
>>>
>>> You are right that the MSI case needs extra care. In the Linux series this is
>>> handled by arm_smmu_resume_msis(), which restores the SMMU-side MSI
>>> configuration. I did not port that part in this patch because Xen SMMUv3 MSI
>>> support is currently documented as unsupported and is not part of the
>>> supported/tested path, so this patch only covers the wired IRQ path used by Xen
>>> today.
>>>
>>> If Xen SMMUv3 MSI support becomes usable in the future, the resume path will
>>> need an equivalent MSI restore step before IRQ_CTRL is re-enabled.
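The ordering described above (register IRQ handlers once at probe, then on resume restore SMMU-side MSI configuration before re-enabling IRQ_CTRL) can be sketched as a small state model. Everything here is hypothetical: `struct smmu_sketch`, `smmu_enable_irqs()` and `smmu_resume()` only stand in for the Xen/Linux code, and the MSI restore step plays the role of Linux's `arm_smmu_resume_msis()`. Real hardware would not refuse the enable; it would silently raise MSIs to a stale doorbell, so the model reports that case as an error to make the ordering visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the SMMU interrupt state touched on resume. */
struct smmu_sketch {
    bool uses_msi;          /* Xen actually routed evtq/gerror via MSI */
    uint64_t saved_msi_cfg; /* MSI doorbell config saved before suspend */
    uint64_t msi_cfg;       /* live MSI config; lost across suspend */
    bool irq_ctrl_enabled;  /* models the IRQ_CTRL enable bits */
    bool irqs_registered;   /* handlers registered once, at probe */
};

/* Probe-time, one-off: register handlers (never repeated on resume). */
static void probe_setup_irqs(struct smmu_sketch *s)
{
    s->irqs_registered = true;
}

/* Reset/resume-time: only re-enable interrupt generation. */
static int smmu_enable_irqs(struct smmu_sketch *s)
{
    assert(s->irqs_registered);
    /* Model-only check: enabling with a stale MSI config is a bug. */
    if ( s->uses_msi && s->msi_cfg != s->saved_msi_cfg )
        return -1;
    s->irq_ctrl_enabled = true;
    return 0;
}

static int smmu_resume(struct smmu_sketch *s)
{
    if ( s->uses_msi )
        s->msi_cfg = s->saved_msi_cfg; /* MSI restore before IRQ_CTRL */
    return smmu_enable_irqs(s);
}
```

For the wired-IRQ path that Xen uses today `uses_msi` is false, so `smmu_resume()` reduces to `smmu_enable_irqs()`, which matches what this patch implements.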
>>
>> In the meantime, should we perhaps check that smmu->features does not have the
>> ARM_SMMU_FEAT_MSI flag, and document this in the commit message?
>>
>> What do you think about it? I’m just worried someone uses CONFIG_MSI and your
>> feature and ends up in some trouble, while we know that your feature breaks
>> CONFIG_MSI.
>
> Good point.
>
> I don't think checking only ARM_SMMU_FEAT_MSI in this patch is the right
> approach, since that reflects hardware capability rather than whether Xen is
> actually using the SMMUv3 MSI IRQ path.
Yes, you are right. I realised that moments after sending my reply: a good check
would be to complain only if Xen is actually using that path.
Let’s go for documenting the limitation unless maintainers disagree.
Cheers,
Luca
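The distinction above (hardware capability versus the IRQ path Xen actually uses) can be sketched as a suspend-time check keyed on usage rather than on ARM_SMMU_FEAT_MSI. This is only an illustration of the alternative that was discussed; the thread settled on documenting the limitation. `struct smmu_irq_state`, its fields, and `smmu_suspend_check()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-SMMU state. */
struct smmu_irq_state {
    bool hw_has_msi;    /* ARM_SMMU_FEAT_MSI: hardware capability only */
    bool xen_uses_msi;  /* Xen actually programmed the MSI doorbells */
};

/* Refuse suspend only when the unsupported MSI IRQ path is in use. */
static int smmu_suspend_check(const struct smmu_irq_state *s)
{
    /* Xen can only be using MSIs if the hardware supports them. */
    assert(!s->xen_uses_msi || s->hw_has_msi);

    if ( s->xen_uses_msi )
    {
        fprintf(stderr, "SMMUv3: MSI IRQ path in use, suspend unsupported\n");
        return -EOPNOTSUPP;
    }

    /* Wired IRQs, even on MSI-capable hardware: OK to suspend. */
    return 0;
}
```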
* Re: [PATCH v8 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions
2026-05-08 10:56 ` Luca Fancellu
@ 2026-05-10 6:02 ` Mykola Kvach
2026-05-11 6:40 ` Luca Fancellu
0 siblings, 1 reply; 66+ messages in thread
From: Mykola Kvach @ 2026-05-10 6:02 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
On Fri, May 8, 2026 at 1:57 PM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> >
> >>
> >>> + }
> >>> +
> >>> + off = i * sizeof(irqs->icfgr);
> >>> + for ( j = 0; j < ARRAY_SIZE(irqs->icfgr); j++ )
> >>> + writel_gicd(irqs->icfgr[j], GICD_ICFGR + off + j * 4);
> >>> + }
> >>> +
> >>> + /* Make sure all registers are restored and enable distributor */
> >>> + writel_gicd(gic_ctx.dist.ctlr, GICD_CTLR);
> >>> +
> >>> + /* Restore GIC CPU interface configuration */
> >>> + writel_gicc(gic_ctx.cpu.pmr, GICC_PMR);
> >>> + writel_gicc(gic_ctx.cpu.bpr, GICC_BPR);
> >>> +
> >>> + /* Enable GIC CPU interface */
> >>> + writel_gicc(gic_ctx.cpu.ctlr, GICC_CTLR);
> >>> +}
> >>> +
> >>
> >> I also see that we don’t save the pending SGI state (GICD_CPENDSGIRn/GICD_SPENDSGIRn) or the Active Priorities registers
> >> state (GICC_APRn/GICC_NSAPRn, the latter only if the Security Extensions are implemented) as described in [1] “4.5 Preserving and restoring GIC state”.
> >> Was that intentional?
> >
> > Yes, this was intentional.
> >
> > The GICv2 suspend callback is called at a quiescent point in the
> > SYSTEM_SUSPEND path: all domains are already shut down for suspend, guest
> > execution is quiesced, the scheduler is disabled, non-boot CPUs have been
> > offlined, and CPU0 enters gic_suspend() with local interrupts disabled.
> >
> > For SGIs, I don't consider GICD_CPENDSGIRn/GICD_SPENDSGIRn part of the saved
> > host GIC context. Xen uses physical SGIs as IPIs, and IPI delivery is an
> > internal synchronization mechanism, not architectural state that should be
> > replayed after SYSTEM_SUSPEND. Guest SGI state is virtual GIC state and is not
> > represented by these physical GICD SGI pending registers.
>
> ack, I would maybe mention in the commit message that we exclude transient IPI/active-priority
> state at the suspend quiescent point.
Ack.
>
> >
> > For GICC_APRn/GICC_NSAPRn, those registers describe active priority state for
> > interrupts already acknowledged by the CPU interface. The final suspend path is
> > not expected to run with an active physical interrupt context. If those
> > registers were non-zero there, restoring only APR/NSAPR would not make the
> > corresponding interrupt handling context valid after resume, and could instead
> > leave the CPU interface with stale active priority state.
>
> Ok, I understand now, but if we expect GICD_ISACTIVERn to be zero here, why are
> we saving/restoring it? Shouldn’t we instead have a runtime check that it’s zero and bail out
> if it’s not? The resume path would then only need to zero it.
>
> Am I missing something?
Good questions.
Yes, the distinction I should have made clearer is between CPU-interface
active-priority state and distributor active state.
For GICC_APRn/GICC_NSAPRn, I expect the state to be quiesced at this point.
Those registers track active priorities in the CPU interface. Xen reaches
gic_suspend() with local interrupts disabled. In the guest-routed interrupt
case, which is the one that can leave a distributor active bit behind, Xen
has already performed the physical EOI, so the CPU-interface priority has
already been dropped.
There is no CPU-interface active-priority context that we can meaningfully
replay after resume.
That is different from GICD_ISACTIVERn. In EOImode==1, a write to GICC_EOIR
only drops the priority. The interrupt remains active in the distributor until
the separate deactivation step. For a guest-routed interrupt, Xen's GICv2
guest end path performs only the physical EOI; deactivation is completed later
by the virtual GIC/GICV path when the guest completes the interrupt.
This is why APR/NSAPR and ISACTIVERn are treated differently. For example:
1. A physical IRQ routed to a guest is acknowledged by Xen.
2. The GIC marks the interrupt active in the distributor.
3. Xen EOIs it, which drops the physical priority.
4. Xen queues/injects the interrupt to the vGIC.
5. The guest has not yet run, or the virtual interrupt is not yet deliverable
because of guest PMR/priority/local IRQ masking/vGIC state.
6. Therefore the guest-side deactivate has not happened yet, and the physical
distributor active bit remains set.
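Steps 1 to 3 and 6 above can be sketched as a toy model of a single interrupt under GICv2 EOImode==1, where priority drop and deactivation are separate operations. The struct and function names are hypothetical; the comments name the GICv2 register access each function stands for.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one interrupt with EOImode==1 (split EOI). */
struct irq_model {
    bool dist_active;      /* models the GICD_ISACTIVERn bit */
    bool cpu_prio_active;  /* models the matching GICC_APRn bit */
};

/* Step 1-2: read of GICC_IAR marks it active and raises running priority. */
static void ack(struct irq_model *m)
{
    m->dist_active = true;
    m->cpu_prio_active = true;
}

/* Step 3: write of GICC_EOIR; with EOImode==1 this is priority drop only. */
static void eoi(struct irq_model *m)
{
    assert(m->cpu_prio_active);  /* model precondition: it was acknowledged */
    m->cpu_prio_active = false;
}

/* Separate deactivation: GICC_DIR, or the guest completing it via GICV. */
static void deactivate(struct irq_model *m)
{
    m->dist_active = false;
}
```

After `ack()` and `eoi()` the APR side is already clear while the distributor active bit is still set. That is exactly the state that can exist at gic_suspend() if the guest never ran to call the equivalent of `deactivate()`.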
There is also a late suspend window in the current Xen path: domains are
suspended and the scheduler is disabled before local IRQs are disabled.
A guest-routed IRQ can therefore be taken by Xen after the guest is already
suspended, but before gic_suspend(). Xen can EOI/priority-drop it and queue
it for the guest, while the guest cannot run and deactivate it before the
GIC state is saved.
This is the same class of issue handled by Linux for GIC EOImode==1. Linux
saves/restores the active state because forwarded interrupts can remain active
while passed to a VM [1].
So I don't think GICD_ISACTIVERn should be treated as "must be zero" unless we
also add an explicit suspend-abort/quiesce policy for in-flight guest
interrupts. That would be a different design: detect non-zero active/in-flight
state, unwind suspend, thaw domains, let the guest drain/deactivate the
interrupts, and retry later. This series does not implement that policy. Given
the current flow, preserving GICD_ISACTIVERn avoids losing architectural
interrupt-controller state across suspend/resume.
I am not opposed to such a policy as a follow-up if we want stricter suspend
quiescence rules, but I think it should be designed explicitly rather than
inferred from the GIC save/restore code.
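For comparison, the first step of the stricter policy that is mentioned above but not implemented (detect in-flight active state and abort the suspend) might look like the following sketch. `gic_check_no_active_spis()` is hypothetical, it takes already-read GICD_ISACTIVERn values, and it deliberately skips register 0 (SGIs/PPIs, banked per CPU) for simplicity. Unwinding the suspend, thawing domains and retrying would be the caller's job.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical pre-suspend check over GICD_ISACTIVERn values.
 * Register 0 (SGIs/PPIs, banked per CPU) is ignored here; only SPIs
 * are considered. Returns -EBUSY if any SPI is still active.
 */
static int gic_check_no_active_spis(const uint32_t *isactiver, size_t nr_regs)
{
    assert(nr_regs >= 1);  /* GICD_ISACTIVER0 always exists */

    for ( size_t i = 1; i < nr_regs; i++ )
        if ( isactiver[i] )
            return -EBUSY; /* caller would unwind, thaw domains, retry */

    return 0;
}
```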
Best regards,
Mykola
[1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1447701208-18150-5-git-send-email-marc.zyngier@arm.com/
>
> >
> > So I did not add save/restore for GICD_CPENDSGIRn/GICD_SPENDSGIRn or
> > GICC_APRn/GICC_NSAPRn in this patch. I can add a short comment in v9 to make
> > this scope explicit.
> >
> > Please let me know if you think there is a suspend/resume path where this
> > state still needs to be preserved.
> >
> > Best regards,
> > Mykola
>
> Cheers,
> Luca
>
* Re: [PATCH v8 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions
2026-05-10 6:02 ` Mykola Kvach
@ 2026-05-11 6:40 ` Luca Fancellu
2026-05-11 20:41 ` Mykola Kvach
0 siblings, 1 reply; 66+ messages in thread
From: Luca Fancellu @ 2026-05-11 6:40 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Mykola,
>
>>
>>>
>>> For GICC_APRn/GICC_NSAPRn, those registers describe active priority state for
>>> interrupts already acknowledged by the CPU interface. The final suspend path is
>>> not expected to run with an active physical interrupt context. If those
>>> registers were non-zero there, restoring only APR/NSAPR would not make the
>>> corresponding interrupt handling context valid after resume, and could instead
>>> leave the CPU interface with stale active priority state.
>>
>> Ok, I understand now, but if we expect GICD_ISACTIVERn to be zero here, why are
>> we saving/restoring it? Shouldn’t we instead have a runtime check that it’s zero and bail out
>> if it’s not? The resume path would then only need to zero it.
>>
>> Am I missing something?
>
> Good questions.
>
> Yes, the distinction I should have made clearer is between CPU-interface
> active-priority state and distributor active state.
>
> For GICC_APRn/GICC_NSAPRn, I expect the state to be quiesced at this point.
> Those registers track active priorities in the CPU interface. Xen reaches
> gic_suspend() with local interrupts disabled. In the guest-routed interrupt
> case, which is the one that can leave a distributor active bit behind, Xen
> has already performed the physical EOI, so the CPU-interface priority has
> already been dropped.
> There is no CPU-interface active-priority context that we can meaningfully
> replay after resume.
>
> That is different from GICD_ISACTIVERn. In EOImode==1, a write to GICC_EOIR
> only drops the priority. The interrupt remains active in the distributor until
> the separate deactivation step. For a guest-routed interrupt, Xen's GICv2
> guest end path performs only the physical EOI; deactivation is completed later
> by the virtual GIC/GICV path when the guest completes the interrupt.
>
> This is why APR/NSAPR and ISACTIVERn are treated differently. For example:
>
> 1. A physical IRQ routed to a guest is acknowledged by Xen.
> 2. The GIC marks the interrupt active in the distributor.
> 3. Xen EOIs it, which drops the physical priority.
> 4. Xen queues/injects the interrupt to the vGIC.
> 5. The guest has not yet run, or the virtual interrupt is not yet deliverable
> because of guest PMR/priority/local IRQ masking/vGIC state.
> 6. Therefore the guest-side deactivate has not happened yet, and the physical
> distributor active bit remains set.
>
> There is also a late suspend window in the current Xen path: domains are
> suspended and the scheduler is disabled before local IRQs are disabled.
> A guest-routed IRQ can therefore be taken by Xen after the guest is already
> suspended, but before gic_suspend(). Xen can EOI/priority-drop it and queue
> it for the guest, while the guest cannot run and deactivate it before the
> GIC state is saved.
>
> This is the same class of issue handled by Linux for GIC EOImode==1. Linux
> saves/restores the active state because forwarded interrupts can remain active
> while passed to a VM [1].
>
> So I don't think GICD_ISACTIVERn should be treated as "must be zero" unless we
> also add an explicit suspend-abort/quiesce policy for in-flight guest
> interrupts. That would be a different design: detect non-zero active/in-flight
> state, unwind suspend, thaw domains, let the guest drain/deactivate the
> interrupts, and retry later. This series does not implement that policy. Given
> the current flow, preserving GICD_ISACTIVERn avoids losing architectural
> interrupt-controller state across suspend/resume.
>
> I am not opposed to such a policy as a follow-up if we want stricter suspend
> quiescence rules, but I think it should be designed explicitly rather than
> inferred from the GIC save/restore code.
>
> Best regards,
> Mykola
>
> [1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1447701208-18150-5-git-send-email-marc.zyngier@arm.com/
Right, yes I agree! I have another question though, since GICC_APRn state should be
quiesced in the suspend path (all implemented active-priority bits should read as zero),
should we have a runtime check just after disabling the CPU interface?
Cheers,
Luca
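The runtime check Luca suggests could be sketched as below, assuming the APR values have already been read right after the CPU interface is disabled (in Xen that would be readl_gicc() reads of the APR registers). `gicc_apr_nonzero()` and `NR_GICC_APR` are hypothetical. GICv2 provides up to four GICC_APRn registers, but the number of implemented bits is implementation defined.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NR_GICC_APR 4  /* GICC_APR0..3; implemented bits vary */

/*
 * Hypothetical helper: report each non-zero APR value captured at the
 * suspend point and return how many were non-zero (expected: 0).
 */
static unsigned int gicc_apr_nonzero(const uint32_t apr[], unsigned int nr)
{
    unsigned int bad = 0;

    assert(nr <= NR_GICC_APR);

    for ( unsigned int i = 0; i < nr; i++ )
    {
        if ( apr[i] )
        {
            fprintf(stderr, "GICC_APR%u = %#x at suspend\n",
                    i, (unsigned int)apr[i]);
            bad++;
        }
    }

    return bad;
}
```

A non-zero result would then be treated as a WARN or a suspend failure, depending on the policy chosen; the same loop would apply to GICC_NSAPRn where present.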
* Re: [PATCH v8 11/13] xen/arm: Save/restore context on suspend/resume
2026-04-02 10:45 ` [PATCH v8 11/13] xen/arm: Save/restore context on suspend/resume Mykola Kvach
2026-04-27 15:26 ` Luca Fancellu
2026-05-07 22:17 ` Volodymyr Babchuk
@ 2026-05-11 16:00 ` Oleksandr Tyshchenko
2026-05-11 18:52 ` Mykola Kvach
2 siblings, 1 reply; 66+ messages in thread
From: Oleksandr Tyshchenko @ 2026-05-11 16:00 UTC (permalink / raw)
To: Mykola Kvach, xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
On 4/2/26 13:45, Mykola Kvach wrote:
Hello Mykola
I did not spot any obvious issues with this patch. As far as I can tell,
the save/restore register set appears to be complete and correct for the
current codebase.
Just one observation: there is an API asymmetry between
prepare_resume_ctx() and hyp_resume() (save uses pointer, restore
hardcodes global) ...
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> The context of CPU general purpose and system control registers must be
> saved on suspend and restored on resume. This is implemented in
> prepare_resume_ctx and before the return from the hyp_resume function.
> The prepare_resume_ctx must be invoked just before the PSCI system suspend
> call is issued to the ATF. The prepare_resume_ctx must return a non-zero
> value so that the calling 'if' statement evaluates to true, causing the
> system suspend to be invoked. Upon resume, the context saved on suspend
> will be restored, including the link register. Therefore, after
> restoring the context, the control flow will return to the address
> pointed to by the saved link register, which is the place from which
> prepare_resume_ctx was called. To ensure that the calling 'if' statement
> does not again evaluate to true and initiate system suspend, hyp_resume
> must return a zero value after restoring the context.
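The "returns twice" control flow described in this commit message works like a C setjmp()/longjmp() pair. The sketch below is only an analogy under that assumption: `fake_system_suspend()` is a hypothetical stand-in for the PSCI SYSTEM_SUSPEND call plus the hyp_resume() path. Note that the polarity is inverted. setjmp() returns zero on the saving pass and non-zero after longjmp(), whereas prepare_resume_ctx() returns non-zero when saving and hyp_resume() makes it appear to return zero.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf resume_ctx;   /* stand-in for the saved cpu_context */
static int suspend_calls;    /* static, so it survives longjmp() intact */

/* Hypothetical stand-in for SYSTEM_SUSPEND + hyp_resume. */
static void fake_system_suspend(void)
{
    suspend_calls++;
    longjmp(resume_ctx, 1);  /* "resume": jump back into do_suspend() */
}

static int do_suspend(void)
{
    /* First pass: context saved (0), so go to sleep. */
    if ( setjmp(resume_ctx) == 0 )
    {
        fake_system_suspend();
        /* Not reached: the suspend call never returns normally. */
    }

    /* Second pass: we get here only after the "resume". */
    return suspend_calls;
}
```

Either way, the caller must branch on the return value so that the resumed path does not issue the suspend again.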
>
> Note that the order of saving register context into cpu_context structure
> must match the order of restoring.
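One way to keep the save and restore order in sync is to mirror it in a C struct with compile-time offset checks. The layout below is derived only from the post-incremented stores in prepare_resume_ctx() shown in this patch: six 16-byte `stp` pairs (x19..x28, x29, lr), then sp, then eight EL2 registers at 8 bytes each. `struct cpu_context_sketch` is hypothetical and is not the definition from asm/suspend.h, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the prepare_resume_ctx()/hyp_resume order. */
struct cpu_context_sketch {
    uint64_t callee_regs[12]; /* x19..x28, x29 (fp), x30 (lr) */
    uint64_t sp;
    uint64_t vbar_el2;
    uint64_t vtcr_el2;
    uint64_t vttbr_el2;
    uint64_t tpidr_el2;
    uint64_t mdcr_el2;
    uint64_t hstr_el2;
    uint64_t cptr_el2;
    uint64_t hcr_el2;
};

/* 6 pairs * 16 bytes = 96, then one 8-byte slot per register. */
static_assert(offsetof(struct cpu_context_sketch, sp) == 96, "sp slot");
static_assert(offsetof(struct cpu_context_sketch, vbar_el2) == 104,
              "VBAR_EL2 slot");
static_assert(offsetof(struct cpu_context_sketch, hcr_el2) == 160,
              "HCR_EL2 slot");
static_assert(sizeof(struct cpu_context_sketch) == 168, "total size");
```

If a register is added to one assembly path but not the other, or not to the struct, the static assertions are a cheap place for the mismatch to surface.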
>
> Support for ARM32 is not implemented. Instead, compilation fails with a
> build-time error if suspend is enabled for ARM32.
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in v8:
> - fix alignments in code
>
> Changes in v7:
> - no changes
> ---
> xen/arch/arm/Makefile | 1 +
> xen/arch/arm/arm64/head.S | 90 +++++++++++++++++++++++++++++-
> xen/arch/arm/include/asm/suspend.h | 26 +++++++++
> xen/arch/arm/suspend.c | 14 +++++
> 4 files changed, 130 insertions(+), 1 deletion(-)
> create mode 100644 xen/arch/arm/suspend.c
>
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 69200b2728..c36158271a 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -51,6 +51,7 @@ obj-y += setup.o
> obj-y += shutdown.o
> obj-y += smp.o
> obj-y += smpboot.o
> +obj-$(CONFIG_SYSTEM_SUSPEND) += suspend.o
> obj-$(CONFIG_SYSCTL) += sysctl.o
> obj-y += time.o
> obj-y += traps.o
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 596e960152..2cb02ee314 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -562,6 +562,52 @@ END(efi_xen_start)
> #endif /* CONFIG_ARM_EFI */
>
> #ifdef CONFIG_SYSTEM_SUSPEND
> +/*
> + * int prepare_resume_ctx(struct cpu_context *ptr)
> + *
> + * x0 - pointer to the storage where callee's context will be saved
... the C signature takes a pointer (struct cpu_context *ptr) and
the save path uses it, ...
> + *
> + * CPU context saved here will be restored on resume in hyp_resume function.
> + * prepare_resume_ctx shall return a non-zero value. Upon restoring context
> + * hyp_resume shall return value zero instead. From C code that invokes
> + * prepare_resume_ctx, the return value is interpreted to determine whether
> + * the context is saved (prepare_resume_ctx) or restored (hyp_resume).
> + */
> +FUNC(prepare_resume_ctx)
> + /* Store callee-saved registers */
> + stp x19, x20, [x0], #16
> + stp x21, x22, [x0], #16
> + stp x23, x24, [x0], #16
> + stp x25, x26, [x0], #16
> + stp x27, x28, [x0], #16
> + stp x29, lr, [x0], #16
> +
> + /* Store stack-pointer */
> + mov x2, sp
> + str x2, [x0], #8
> +
> + /* Store system control registers */
> + mrs x2, VBAR_EL2
> + str x2, [x0], #8
> + mrs x2, VTCR_EL2
> + str x2, [x0], #8
> + mrs x2, VTTBR_EL2
> + str x2, [x0], #8
> + mrs x2, TPIDR_EL2
> + str x2, [x0], #8
> + mrs x2, MDCR_EL2
> + str x2, [x0], #8
> + mrs x2, HSTR_EL2
> + str x2, [x0], #8
> + mrs x2, CPTR_EL2
> + str x2, [x0], #8
> + mrs x2, HCR_EL2
> + str x2, [x0], #8
> +
> + /* prepare_resume_ctx must return a non-zero value */
> + mov x0, #1
> + ret
> +END(prepare_resume_ctx)
>
> FUNC(hyp_resume)
> /* Initialize the UART if earlyprintk has been enabled. */
> @@ -580,7 +626,49 @@ FUNC(hyp_resume)
> b enable_secondary_cpu_mm
>
> mmu_resumed:
> - b .
> + /* Now we can access the cpu_context, so restore the context here */
> + ldr x0, =cpu_context
... but the restore path hardcodes =cpu_context, ignoring whatever
pointer was originally passed. If a caller were to pass anything other
than &cpu_context, the resume would load from the wrong location. Since
the sole call site does pass &cpu_context (called from system_suspend()
in the last patch), this works correctly today — but the API is somewhat
misleading.
I might be missing something, but why not make prepare_resume_ctx() take
no arguments and use =cpu_context directly inside the assembly? That way
the save and restore paths would both use the same global, and the API
would not be misleading.
> +
> + /* Restore callee-saved registers */
> + ldp x19, x20, [x0], #16
> + ldp x21, x22, [x0], #16
> + ldp x23, x24, [x0], #16
> + ldp x25, x26, [x0], #16
> + ldp x27, x28, [x0], #16
> + ldp x29, lr, [x0], #16
> +
> + /* Restore stack pointer */
> + ldr x2, [x0], #8
> + mov sp, x2
> +
> + /* Restore system control registers */
> + ldr x2, [x0], #8
> + msr VBAR_EL2, x2
> + ldr x2, [x0], #8
> + msr VTCR_EL2, x2
> + ldr x2, [x0], #8
> + msr VTTBR_EL2, x2
> + ldr x2, [x0], #8
> + msr TPIDR_EL2, x2
> + ldr x2, [x0], #8
> + msr MDCR_EL2, x2
> + ldr x2, [x0], #8
> + msr HSTR_EL2, x2
> + ldr x2, [x0], #8
> + msr CPTR_EL2, x2
> + ldr x2, [x0], #8
> + msr HCR_EL2, x2
> + isb
> +
> + /*
> + * Since context is restored return from this function will appear
> + * as return from prepare_resume_ctx. To distinguish a return from
> + * prepare_resume_ctx which is called upon finalizing the suspend,
> + * as opposed to return from this function which executes on resume,
> + * we need to return zero value here.
> + */
> + mov x0, #0
> + ret
> END(hyp_resume)
>
> #endif /* CONFIG_SYSTEM_SUSPEND */
[snip]
* Re: [PATCH v8 10/13] xen/arm: Resume memory management on Xen resume
2026-05-08 20:59 ` Mykola Kvach
@ 2026-05-11 16:11 ` Oleksandr Tyshchenko
0 siblings, 0 replies; 66+ messages in thread
From: Oleksandr Tyshchenko @ 2026-05-11 16:11 UTC (permalink / raw)
To: Mykola Kvach, xen-devel@lists.xenproject.org
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
On 5/8/26 23:59, Mykola Kvach wrote:
Hello Mykola
> Hi Volodymyr,
>
> Thank you for the feedback.
>
> On Fri, May 8, 2026 at 1:06 AM Volodymyr Babchuk
> <Volodymyr_Babchuk@epam.com> wrote:
>>
>> Hi Mykola,
>>
>> Mykola Kvach <xakep.amatop@gmail.com> writes:
>>
>>> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>>>
>>> The MMU must be enabled during the resume path before restoring context,
>>> as virtual addresses are used to access the saved context data.
>>>
>>
>> I agree with Luca, this patch does not make sense as is. I don't see
>> why it should be separated from the rest of the resume path that is
>> added in the next patch.
>
> Ack. I'll combine this with the next patch in v9.
>
> Best regards,
> Mykola
>
>>
>>> This patch adds MMU setup during resume by reusing the existing
>>> enable_secondary_cpu_mm function, which enables data cache and the MMU.
>>> Before the MMU is enabled, the content of TTBR0_EL2 is changed to point
>>> to init_ttbr (page tables used at runtime).
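Before jumping to enable_secondary_cpu_mm, the hyp_resume code in this patch computes the physical offset with `ldr x0, =start` (link-time virtual address), `adr x20, start` (PC-relative, and therefore physical while the MMU is still off) and `sub x20, x20, x0`. The arithmetic can be sketched in C as follows. The function names and example addresses are hypothetical; unsigned 64-bit wrap-around matches the AArch64 `sub`, so the offset also works when the link address is above the load address.

```c
#include <assert.h>
#include <stdint.h>

/* phys-offset = paddr(start) - vaddr(start), modulo 2^64 */
static uint64_t phys_offset(uint64_t paddr_start, uint64_t vaddr_start)
{
    return paddr_start - vaddr_start;
}

/* Translate a Xen virtual address using the computed offset. */
static uint64_t virt_to_phys_sketch(uint64_t va, uint64_t offset)
{
    return va + offset;
}
```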
>>>
>>> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
>>> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
>>> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
>>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
>>> ---
>>> Changes in v7:
>>> - no functional changes, just moved commit
>>> ---
>>> xen/arch/arm/arm64/head.S | 24 ++++++++++++++++++++++++
>>> 1 file changed, 24 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
>>> index 72c7b24498..596e960152 100644
>>> --- a/xen/arch/arm/arm64/head.S
>>> +++ b/xen/arch/arm/arm64/head.S
>>> @@ -561,6 +561,30 @@ END(efi_xen_start)
>>>
>>> #endif /* CONFIG_ARM_EFI */
>>>
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> +
>>> +FUNC(hyp_resume)
>>> + /* Initialize the UART if earlyprintk has been enabled. */
>>> +#ifdef CONFIG_EARLY_PRINTK
>>> + bl init_uart
>>> +#endif
>>> + PRINT_ID("- Xen resuming -\r\n")
>>> +
>>> + bl check_cpu_mode
>>> + bl cpu_init
>>> +
>>> + ldr x0, =start
>>> + adr x20, start /* x20 := paddr (start) */
>>> + sub x20, x20, x0 /* x20 := phys-offset */
>>> + ldr lr, =mmu_resumed
>>> + b enable_secondary_cpu_mm
>>> +
>>> +mmu_resumed:
>>> + b .
I also think this patch would be better squashed with the next one, as
they are tightly coupled.
During the review of patch 11, I had to switch between patches 10 and 11
several times to understand the full context—patch 10 sets up hyp_resume
with a placeholder (b .), and patch 11 immediately fills in the actual
context restore.
>>> +END(hyp_resume)
>>> +
>>> +#endif /* CONFIG_SYSTEM_SUSPEND */
>>> +
>>> /*
>>> * Local variables:
>>> * mode: ASM
>>
>> --
>> WBR, Volodymyr
>
* Re: [PATCH v8 11/13] xen/arm: Save/restore context on suspend/resume
2026-05-11 16:00 ` Oleksandr Tyshchenko
@ 2026-05-11 18:52 ` Mykola Kvach
0 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-05-11 18:52 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Oleksandr,
Thank you for the review.
On Mon, May 11, 2026 at 7:00 PM Oleksandr Tyshchenko
<olekstysh@gmail.com> wrote:
>
>
>
> On 4/2/26 13:45, Mykola Kvach wrote:
>
> Hello Mykola
>
> I did not spot any obvious issues with this patch. As far as I can tell,
> the save/restore register set appears to be complete and correct for the
> current codebase.
>
> Just one observation: there is an API asymmetry between
> prepare_resume_ctx() and hyp_resume() (save uses pointer, restore
> hardcodes global) ...
>
> > From: Mirela Simonovic <mirela.simonovic@aggios.com>
> >
> > The context of CPU general purpose and system control registers must be
> > saved on suspend and restored on resume. This is implemented in
> > prepare_resume_ctx and before the return from the hyp_resume function.
> > The prepare_resume_ctx must be invoked just before the PSCI system suspend
> > call is issued to the ATF. The prepare_resume_ctx must return a non-zero
> > value so that the calling 'if' statement evaluates to true, causing the
> > system suspend to be invoked. Upon resume, the context saved on suspend
> > will be restored, including the link register. Therefore, after
> > restoring the context, the control flow will return to the address
> > pointed to by the saved link register, which is the place from which
> > prepare_resume_ctx was called. To ensure that the calling 'if' statement
> > does not again evaluate to true and initiate system suspend, hyp_resume
> > must return a zero value after restoring the context.
> >
> > Note that the order of saving register context into cpu_context structure
> > must match the order of restoring.
> >
> > Support for ARM32 is not implemented. Instead, compilation fails with a
> > build-time error if suspend is enabled for ARM32.
> >
> > Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> > Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> > Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in v8:
> > - fix alignments in code
> >
> > Changes in v7:
> > - no changes
> > ---
> > xen/arch/arm/Makefile | 1 +
> > xen/arch/arm/arm64/head.S | 90 +++++++++++++++++++++++++++++-
> > xen/arch/arm/include/asm/suspend.h | 26 +++++++++
> > xen/arch/arm/suspend.c | 14 +++++
> > 4 files changed, 130 insertions(+), 1 deletion(-)
> > create mode 100644 xen/arch/arm/suspend.c
> >
> > diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> > index 69200b2728..c36158271a 100644
> > --- a/xen/arch/arm/Makefile
> > +++ b/xen/arch/arm/Makefile
> > @@ -51,6 +51,7 @@ obj-y += setup.o
> > obj-y += shutdown.o
> > obj-y += smp.o
> > obj-y += smpboot.o
> > +obj-$(CONFIG_SYSTEM_SUSPEND) += suspend.o
> > obj-$(CONFIG_SYSCTL) += sysctl.o
> > obj-y += time.o
> > obj-y += traps.o
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index 596e960152..2cb02ee314 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -562,6 +562,52 @@ END(efi_xen_start)
> > #endif /* CONFIG_ARM_EFI */
> >
> > #ifdef CONFIG_SYSTEM_SUSPEND
> > +/*
> > + * int prepare_resume_ctx(struct cpu_context *ptr)
> > + *
> > + * x0 - pointer to the storage where callee's context will be saved
>
> ... the C signature takes a pointer (struct cpu_context *ptr) and
> the save path uses it, ...
>
> > + *
> > + * CPU context saved here will be restored on resume in hyp_resume function.
> > + * prepare_resume_ctx shall return a non-zero value. Upon restoring context
> > + * hyp_resume shall return value zero instead. From C code that invokes
> > + * prepare_resume_ctx, the return value is interpreted to determine whether
> > + * the context is saved (prepare_resume_ctx) or restored (hyp_resume).
> > + */
> > +FUNC(prepare_resume_ctx)
> > + /* Store callee-saved registers */
> > + stp x19, x20, [x0], #16
> > + stp x21, x22, [x0], #16
> > + stp x23, x24, [x0], #16
> > + stp x25, x26, [x0], #16
> > + stp x27, x28, [x0], #16
> > + stp x29, lr, [x0], #16
> > +
> > + /* Store stack-pointer */
> > + mov x2, sp
> > + str x2, [x0], #8
> > +
> > + /* Store system control registers */
> > + mrs x2, VBAR_EL2
> > + str x2, [x0], #8
> > + mrs x2, VTCR_EL2
> > + str x2, [x0], #8
> > + mrs x2, VTTBR_EL2
> > + str x2, [x0], #8
> > + mrs x2, TPIDR_EL2
> > + str x2, [x0], #8
> > + mrs x2, MDCR_EL2
> > + str x2, [x0], #8
> > + mrs x2, HSTR_EL2
> > + str x2, [x0], #8
> > + mrs x2, CPTR_EL2
> > + str x2, [x0], #8
> > + mrs x2, HCR_EL2
> > + str x2, [x0], #8
> > +
> > + /* prepare_resume_ctx must return a non-zero value */
> > + mov x0, #1
> > + ret
> > +END(prepare_resume_ctx)
> >
> > FUNC(hyp_resume)
> > /* Initialize the UART if earlyprintk has been enabled. */
> > @@ -580,7 +626,49 @@ FUNC(hyp_resume)
> > b enable_secondary_cpu_mm
> >
> > mmu_resumed:
> > - b .
> > + /* Now we can access the cpu_context, so restore the context here */
> > + ldr x0, =cpu_context
>
> ... but the restore path hardcodes =cpu_context, ignoring whatever
> pointer was originally passed. If a caller were to pass anything other
> than &cpu_context, the resume would load from the wrong location. Since
> the sole call site does pass &cpu_context (called from system_suspend()
> in the last patch), this works correctly today — but the API is somewhat
> misleading.
>
> I might be missing something, but why not make prepare_resume_ctx() take
> no arguments and use =cpu_context directly inside the assembly? That way
> the save and restore paths would both use the same global, and the API
> would not be misleading.
Yes, good point. Since the resume path restores from the global context object,
the argument to prepare_resume_ctx() is misleading.
I will remove the argument and make both the save and restore paths use the
same global resume_cpu_context object.
Best regards,
Mykola
>
> > +
> > + /* Restore callee-saved registers */
> > + ldp x19, x20, [x0], #16
> > + ldp x21, x22, [x0], #16
> > + ldp x23, x24, [x0], #16
> > + ldp x25, x26, [x0], #16
> > + ldp x27, x28, [x0], #16
> > + ldp x29, lr, [x0], #16
> > +
> > + /* Restore stack pointer */
> > + ldr x2, [x0], #8
> > + mov sp, x2
> > +
> > + /* Restore system control registers */
> > + ldr x2, [x0], #8
> > + msr VBAR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr VTCR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr VTTBR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr TPIDR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr MDCR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr HSTR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr CPTR_EL2, x2
> > + ldr x2, [x0], #8
> > + msr HCR_EL2, x2
> > + isb
> > +
> > + /*
> > + * Since context is restored return from this function will appear
> > + * as return from prepare_resume_ctx. To distinguish a return from
> > + * prepare_resume_ctx which is called upon finalizing the suspend,
> > + * as opposed to return from this function which executes on resume,
> > + * we need to return zero value here.
> > + */
> > + mov x0, #0
> > + ret
> > END(hyp_resume)
> >
> > #endif /* CONFIG_SYSTEM_SUSPEND */
>
>
> [snip]
>
>
* Re: [PATCH v8 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions
2026-05-11 6:40 ` Luca Fancellu
@ 2026-05-11 20:41 ` Mykola Kvach
0 siblings, 0 replies; 66+ messages in thread
From: Mykola Kvach @ 2026-05-11 20:41 UTC (permalink / raw)
To: Luca Fancellu
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
On Mon, May 11, 2026 at 9:41 AM Luca Fancellu <Luca.Fancellu@arm.com> wrote:
>
> Hi Mykola,
>
> >
> >>
> >>>
> >>> For GICC_APRn/GICC_NSAPRn, those registers describe active priority state for
> >>> interrupts already acknowledged by the CPU interface. The final suspend path is
> >>> not expected to run with an active physical interrupt context. If those
> >>> registers were non-zero there, restoring only APR/NSAPR would not make the
> >>> corresponding interrupt handling context valid after resume, and could instead
> >>> leave the CPU interface with stale active priority state.
> >>
> >> Ok, I understand now, but if we expect GICD_ISACTIVERn to be zero here, why are
> >> we saving/restoring it? Shouldn’t we instead have a runtime check that it’s zero and bail out
> >> if it’s not? The resume path would then only need to zero it.
> >>
> >> Am I missing something?
> >
> > Good questions.
> >
> > Yes, the distinction I should have made clearer is between CPU-interface
> > active-priority state and distributor active state.
> >
> > For GICC_APRn/GICC_NSAPRn, I expect the state to be quiesced at this point.
> > Those registers track active priorities in the CPU interface. Xen reaches
> > gic_suspend() with local interrupts disabled, and in the guest-routed
> > interrupt case (the one that can leave a distributor active bit behind),
> > Xen has already performed the physical EOI, so the CPU-interface priority
> > has already been dropped.
> > There is no CPU-interface active-priority context that we can meaningfully
> > replay after resume.
> >
> > That is different from GICD_ISACTIVERn. In EOImode==1, EOIR only drops the
> > priority. The interrupt remains active in the distributor until the separate
> > deactivation step. For a guest-routed interrupt, Xen's GICv2 guest end path
> > performs only the physical EOI; deactivation is completed later by the
> > virtual GIC/GICV path when the guest completes the interrupt.
> >
> > This is why APR/NSAPR and ISACTIVERn are treated differently. For example:
> >
> > 1. A physical IRQ routed to a guest is acknowledged by Xen.
> > 2. The GIC marks the interrupt active in the distributor.
> > 3. Xen EOIs it, which drops the physical priority.
> > 4. Xen queues/injects the interrupt to the vGIC.
> > 5. The guest has not yet run, or the virtual interrupt is not yet deliverable
> > because of guest PMR/priority/local IRQ masking/vGIC state.
> > 6. Therefore the guest-side deactivate has not happened yet, and the physical
> > distributor active bit remains set.
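The six steps above can be modelled with two pieces of state: the CPU-interface active priority (GICC_APRn) and the distributor active bit (GICD_ISACTIVERn). What follows is a toy sketch under EOImode == 1. The struct and function names are hypothetical, and using one APR bit per priority is a simplification of the real register encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one physical IRQ under EOImode == 1 (hypothetical;
 * not Xen's data structures). */
struct irq_model {
    uint32_t apr;    /* CPU-interface active priority bits (GICC_APRn) */
    bool     active; /* distributor active bit (GICD_ISACTIVERn) */
    bool     queued; /* pending in the vGIC for the guest */
};

/* Steps 1-2: Xen acknowledges the IRQ; priority and active bit set. */
static void ack(struct irq_model *s, unsigned int prio_bit)
{
    s->apr |= 1u << prio_bit;
    s->active = true;
}

/* Step 3: an EOIR write with EOImode == 1 only drops the priority. */
static void eoi_priority_drop(struct irq_model *s, unsigned int prio_bit)
{
    s->apr &= ~(1u << prio_bit);
}

/* Step 4: Xen queues the interrupt to the vGIC. */
static void inject_to_guest(struct irq_model *s)
{
    s->queued = true;
}

/* Only if the guest runs: it completes the virtual interrupt and the
 * virtual GIC path deactivates the physical one. */
static void guest_deactivate(struct irq_model *s)
{
    s->queued = false;
    s->active = false;
}
```

If the guest never runs before gic_suspend(), a snapshot taken at that point finds apr == 0 but active == true. This is the asymmetry that makes APR/NSAPR safe to treat as quiesced while ISACTIVERn must be preserved.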
> >
> > There is also a late suspend window in the current Xen path: domains are
> > suspended and the scheduler is disabled before local IRQs are disabled.
> > A guest-routed IRQ can therefore be taken by Xen after the guest is already
> > suspended, but before gic_suspend(). Xen can EOI/priority-drop it and queue
> > it for the guest, while the guest cannot run and deactivate it before the
> > GIC state is saved.
> >
> > This is the same class of issue handled by Linux for GIC EOImode==1. Linux
> > saves/restores the active state because forwarded interrupts can remain active
> > while passed to a VM [1].
> >
> > So I don't think GICD_ISACTIVERn should be treated as "must be zero" unless we
> > also add an explicit suspend-abort/quiesce policy for in-flight guest
> > interrupts. That would be a different design: detect non-zero active/in-flight
> > state, unwind suspend, thaw domains, let the guest drain/deactivate the
> > interrupts, and retry later. This series does not implement that policy. Given
> > the current flow, preserving GICD_ISACTIVERn avoids losing architectural
> > interrupt-controller state across suspend/resume.
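Preserving the distributor active state comes down to reading each GICD_ISACTIVERn on suspend and writing the saved value back on resume. Because ISACTIVERn is write-1-to-set, one way to make the restore exact is to clear first through GICD_ICACTIVERn. The toy model below uses an array as a stand-in for the MMIO registers. The register count and helper names are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

#define NR_ACTIVE_REGS 4 /* hypothetical: 4 x 32 = 128 interrupt IDs */

/* Toy distributor: one 32-bit active word per ISACTIVERn/ICACTIVERn
 * pair. Real hardware is MMIO; this array is only a stand-in. */
static uint32_t dist_active[NR_ACTIVE_REGS];

static uint32_t read_isactiver(unsigned int n) { return dist_active[n]; }
/* Write-1-to-set, like GICD_ISACTIVERn. */
static void write_isactiver(unsigned int n, uint32_t v) { dist_active[n] |= v; }
/* Write-1-to-clear, like GICD_ICACTIVERn. */
static void write_icactiver(unsigned int n, uint32_t v) { dist_active[n] &= ~v; }

struct gicd_active_ctx {
    uint32_t isactiver[NR_ACTIVE_REGS];
};

void save_active_state(struct gicd_active_ctx *ctx)
{
    for ( unsigned int n = 0; n < NR_ACTIVE_REGS; n++ )
        ctx->isactiver[n] = read_isactiver(n);
}

void restore_active_state(const struct gicd_active_ctx *ctx)
{
    for ( unsigned int n = 0; n < NR_ACTIVE_REGS; n++ )
    {
        /* Clear whatever the reset left behind, then set saved bits. */
        write_icactiver(n, ~0u);
        write_isactiver(n, ctx->isactiver[n]);
    }
}

/* Stand-in for the distributor losing its state across suspend. */
void simulate_power_loss(void)
{
    for ( unsigned int n = 0; n < NR_ACTIVE_REGS; n++ )
        dist_active[n] = 0;
}
```

For example, an interrupt with ID 37 lives in register 1, bit 5 (32 + 5). If a guest leaves it active, it is saved, lost across the simulated power cycle, and set again on restore.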
> >
> > I am not opposed to such a policy as a follow-up if we want stricter suspend
> > quiescence rules, but I think it should be designed explicitly rather than
> > inferred from the GIC save/restore code.
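The alternative policy mentioned above (detect in-flight interrupts, unwind, thaw domains, let guests drain, retry) could be structured as a bounded retry loop. This is only a sketch of a design the series explicitly does not implement. Every hook in struct suspend_ops is hypothetical and does not correspond to an existing Xen interface.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hooks; none of these exist in Xen under these names. */
struct suspend_ops {
    bool (*irqs_in_flight)(void); /* any distributor active / vGIC pending? */
    void (*freeze_domains)(void);
    void (*thaw_domains)(void);
    void (*let_guests_run)(void); /* give guests time to deactivate */
    int  (*enter_suspend)(void);  /* 0 after a successful suspend/resume */
};

int suspend_with_quiesce(const struct suspend_ops *ops, unsigned int max_tries)
{
    for ( unsigned int attempt = 0; attempt < max_tries; attempt++ )
    {
        ops->freeze_domains();

        if ( !ops->irqs_in_flight() )
        {
            /* Quiesced: suspend, then thaw domains after resume. */
            int rc = ops->enter_suspend();

            ops->thaw_domains();
            return rc;
        }

        /* In-flight interrupts: unwind, let guests drain, retry. */
        ops->thaw_domains();
        ops->let_guests_run();
    }

    return -EBUSY;
}
```

Bounding the retries stops a guest that never deactivates its interrupt from blocking suspend forever. In that case the caller sees -EBUSY instead.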
> >
> > Best regards,
> > Mykola
> >
> > [1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1447701208-18150-5-git-send-email-marc.zyngier@arm.com/
>
> Right, yes I agree! I have another question though: since GICC_APRn state should be
> quiesced in the suspend path (all implemented active-priority bits should read as zero),
> should we have a runtime check just after disabling the CPU interface?
Yes, I think a runtime check is appropriate here.
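The runtime check agreed on here could look roughly like the following. After the CPU interface is disabled, read every implemented GICC_APRn and report failure if any bit is set. This sketch assumes a fake register array in place of the MMIO mapping, and the helper names are hypothetical. The number of implemented APR registers is passed in because it is implementation dependent. The same loop would apply to GICC_NSAPRn.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_GICC_APR 4 /* GICv2 defines GICC_APR0..GICC_APR3 */

/* Stand-in for MMIO reads of GICC_APRn (hypothetical for this sketch). */
static uint32_t fake_gicc_apr[NR_GICC_APR];
static uint32_t read_gicc_apr(unsigned int n) { return fake_gicc_apr[n]; }

/* Returns true if no implemented APR register holds an active priority. */
bool gicc_apr_quiesced(unsigned int nr_implemented)
{
    bool ok = true;

    for ( unsigned int n = 0; n < nr_implemented && n < NR_GICC_APR; n++ )
    {
        uint32_t v = read_gicc_apr(n);

        if ( v != 0 )
        {
            fprintf(stderr, "GICC_APR%u = %#x: active priority at suspend\n",
                    n, (unsigned int)v);
            ok = false;
        }
    }

    return ok;
}
```

A caller on the suspend path would then abort the suspend when gicc_apr_quiesced() returns false, rather than saving and later replaying stale priority state.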
Best regards,
Mykola
>
> Cheers,
> Luca
>
end of thread (newest message: 2026-05-11 20:41 UTC)
Thread overview: 66+ messages
2026-04-02 10:45 [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
2026-04-02 10:45 ` [PATCH v8 01/13] xen/arm: Add suspend and resume timer helpers Mykola Kvach
2026-04-20 15:22 ` Luca Fancellu
2026-04-02 10:45 ` [PATCH v8 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions Mykola Kvach
2026-04-21 13:24 ` Luca Fancellu
2026-05-07 7:48 ` Mykola Kvach
2026-05-08 10:56 ` Luca Fancellu
2026-05-10 6:02 ` Mykola Kvach
2026-05-11 6:40 ` Luca Fancellu
2026-05-11 20:41 ` Mykola Kvach
2026-04-02 10:45 ` [PATCH v8 03/13] xen/arm: gic-v3: tolerate retained redistributor LPI state across CPU_OFF Mykola Kvach
2026-04-22 15:55 ` Luca Fancellu
2026-05-05 6:06 ` Mykola Kvach
2026-04-02 10:45 ` [PATCH v8 04/13] xen/arm: gic-v3: Implement GICv3 suspend/resume functions Mykola Kvach
2026-04-23 11:28 ` Luca Fancellu
2026-05-05 7:26 ` Mykola Kvach
2026-04-02 10:45 ` [PATCH v8 05/13] xen/arm: gic-v3: add ITS suspend/resume support Mykola Kvach
2026-04-24 10:53 ` Luca Fancellu
2026-05-05 10:09 ` Mykola Kvach
2026-05-08 11:30 ` Luca Fancellu
2026-05-08 22:11 ` Mykola Kvach
2026-04-02 10:45 ` [PATCH v8 06/13] xen/arm: tee: keep init_tee_secondary() for hotplug and resume Mykola Kvach
2026-04-24 10:59 ` Luca Fancellu
2026-04-27 8:19 ` Bertrand Marquis
2026-05-07 22:26 ` Volodymyr Babchuk
2026-04-02 10:45 ` [PATCH v8 07/13] xen/arm: ffa: fix notification SRI across CPU hotplug/suspend Mykola Kvach
2026-04-24 12:05 ` Luca Fancellu
2026-04-27 8:20 ` Bertrand Marquis
2026-05-05 10:18 ` Mykola Kvach
2026-04-02 10:45 ` [PATCH v8 08/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks Mykola Kvach
2026-04-24 13:34 ` Luca Fancellu
2026-05-05 11:45 ` Mykola Kvach
2026-04-02 10:45 ` [PATCH v8 09/13] arm/smmu-v3: add suspend/resume handlers Mykola Kvach
2026-04-27 14:01 ` Luca Fancellu
2026-04-27 14:02 ` Luca Fancellu
2026-05-05 15:23 ` Mykola Kvach
2026-05-08 12:21 ` Luca Fancellu
2026-05-08 21:44 ` Mykola Kvach
2026-05-09 7:50 ` Luca Fancellu
2026-04-02 10:45 ` [PATCH v8 10/13] xen/arm: Resume memory management on Xen resume Mykola Kvach
2026-04-27 14:50 ` Luca Fancellu
2026-05-05 15:55 ` Mykola Kvach
2026-05-08 13:26 ` Luca Fancellu
2026-05-08 20:51 ` Mykola Kvach
2026-05-07 22:06 ` Volodymyr Babchuk
2026-05-08 20:59 ` Mykola Kvach
2026-05-11 16:11 ` Oleksandr Tyshchenko
2026-04-02 10:45 ` [PATCH v8 11/13] xen/arm: Save/restore context on suspend/resume Mykola Kvach
2026-04-27 15:26 ` Luca Fancellu
2026-05-07 22:17 ` Volodymyr Babchuk
2026-05-08 10:38 ` Mykola Kvach
2026-05-11 16:00 ` Oleksandr Tyshchenko
2026-05-11 18:52 ` Mykola Kvach
2026-04-02 10:45 ` [PATCH v8 12/13] xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface) Mykola Kvach
2026-04-27 16:21 ` Luca Fancellu
2026-05-05 16:15 ` Mykola Kvach
2026-04-02 10:45 ` [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
2026-04-02 11:00 ` Jan Beulich
2026-04-29 8:05 ` Luca Fancellu
2026-05-05 20:34 ` Mykola Kvach
2026-05-07 22:25 ` Volodymyr Babchuk
2026-05-08 8:37 ` Mykola Kvach
2026-05-08 14:30 ` Luca Fancellu
2026-05-08 20:49 ` Mykola Kvach
2026-04-16 12:51 ` PING: Re: [PATCH v8 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
2026-04-16 12:52 ` Mykola Kvach