* [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Jan Beulich, Roger Pau Monné, Rahul Singh
From: Mykola Kvach <mykola_kvach@epam.com>
This is part 2 of the ARM Xen system suspend/resume patch series (now at
v6), based on earlier work by Mirela Simonovic and Mykyta Poturai.
The first part is here:
https://marc.info/?l=xen-devel&m=175659181415965&w=2
This version is ported to Xen master (4.21-unstable) and includes
extensive improvements based on reviewer feedback. The series restructures
code to improve robustness and maintainability, and implements system
Suspend-to-RAM support on ARM64, triggered by the hardware domain.
At a high-level, this patch series provides:
- Support for Host system suspend/resume via PSCI SYSTEM_SUSPEND (ARM64)
- Suspend/resume infrastructure for CPU context, timers, GICv2/GICv3 and IPMMU-VMSA
- Proper error propagation and recovery throughout the suspend/resume flow
Key updates in this series:
- Introduced architecture-specific suspend/resume infrastructure (new `suspend.c`, `suspend.h`, low-level context save/restore in `head.S`)
- Integrated GICv2/GICv3 suspend and resume, including memory-backed context save/restore with error handling
- Added time and IRQ suspend/resume hooks, ensuring correct timer/interrupt state across suspend cycles
- Implemented proper PSCI SYSTEM_SUSPEND invocation and version checks
- Improved state management and recovery in error cases during suspend/resume
- Added support for IPMMU-VMSA context save/restore
- Added support for GICv3 eSPI registers context save/restore
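For orientation, below is a simplified sketch of the hardware-domain-triggered
suspend flow implemented by this series. It is illustrative only: error
handling is trimmed and helper names (e.g. host_system_suspend) may not match
the code exactly; the authoritative sequence lives in xen/arch/arm/suspend.c.
```c
/* Illustrative sketch of the host suspend sequence, not the actual code. */
static void host_system_suspend(void *unused)
{
    system_state = SYS_STATE_suspend;

    if ( disable_nonboot_cpus() )           /* park secondary CPUs */
        goto out;

    time_suspend();                         /* patch 01: EL1/EL2 timers off */

    if ( iommu_suspend() )                  /* patches 07/13 */
        goto resume_time;

    if ( gic_suspend() )                    /* patches 02/03: save GIC state */
        goto resume_iommu;

    /* Save CPU context and enter firmware (patches 08/10/11). */
    if ( call_psci_system_suspend() )
        printk(XENLOG_ERR "PSCI SYSTEM_SUSPEND failed\n");

    gic_resume();
 resume_iommu:
    iommu_resume();
 resume_time:
    time_resume();
 out:
    system_state = SYS_STATE_resume;
    enable_nonboot_cpus();
    system_state = SYS_STATE_active;
}
```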
---
TODOs:
- Test system suspend with llc_coloring_enabled set and verify functionality
- Implement SMMUv3 suspend/resume handlers
- Enable "xl suspend" support on ARM
- Properly disable the Xen watchdog timer from the relevant services (only init.d is left)
- Add suspend/resume CI test for ARM (QEMU if feasible)
- Investigate feasibility and need for implementing system suspend on ARM32
---
Changelog for v6:
- Add suspend/resume support for GICv3 eSPI registers (to be applied after the
main eSPI series).
- Drop redundant iommu_enabled check from host system suspend.
- Switch from continue_hypercall_on_cpu to a dedicated tasklet for system
suspend, avoiding user register modification and decoupling guest/system
suspend status.
- Refactor IOMMU register context code.
- Improve IRQ handling: call handler->disable(), move system state checks, and
skip IRQ release during suspend inside release_irq().
- Remove redundant GICv3 save/restore state logic now handled during vCPU
context switch.
- Clarify and unify error/warning messages, comments, and documentation.
- Correct loops for saving/restoring priorities and merge loops where possible.
- Add explicit error for unimplemented ITS suspend support.
- Add missing GICD_CTLR_DS bit definition and clarify GICR_WAKER comments.
- Cleanup active and enable registers before restoring.
- Minor comment improvements and code cleanups.
Changes introduced in V5:
- Add support for IPMMU-VMSA context save/restore
- Add support for GICv3 context save/restore
- Select HAS_SYSTEM_SUSPEND in ARM_64 instead of ARM
- Check llc_coloring_enabled instead of LLC_COLORING during the selection
of HAS_SYSTEM_SUSPEND config
- Call host_system_suspend from guest PSCI system suspend instead of
arch_domain_shutdown, reducing the complexity of the new code
Changes introduced in V4:
- Remove the prior tasklet-based workaround in favor of a more
straightforward and safer solution.
- Rework the approach by adding explicit system state checks around
request_irq and release_irq calls; skip these calls during suspend
and resume states to avoid unsafe memory operations when IRQs are
disabled.
- Prevent reinitialization of local IRQ descriptors on system resume.
- Restore the state of local IRQs during system resume for secondary CPUs.
- Drop code for saving and restoring VCPU context (see part 1 of the patch
series for details).
- Remove IOMMU suspend and resume calls until these features are implemented.
- Move system suspend logic to arch_domain_shutdown, invoked from
domain_shutdown.
- Add console_end_sync to the resume path after system suspend.
- Drop unnecessary DAIF masking; interrupts are already masked on resume.
- Remove leftover TLB flush instructions; flushing is handled in enable_mmu.
- Avoid setting x19 in hyp_resume as it is not required.
- Replace prepare_secondary_mm with set_init_ttbr, and call it from system_suspend.
- Produce a build-time error for ARM32 when CONFIG_SYSTEM_SUSPEND is enabled.
- Use register_t instead of uint64_t in the cpu_context structure.
- Apply minor fixes such as renaming functions, updating comments, and
modifying commit messages to accurately reflect the changes introduced
by this patch series.
For earlier changelogs, please refer to the previous cover letters.
Previous versions:
V1: https://marc.info/?l=xen-devel&m=154202231501850&w=2
V2: https://marc.info/?l=xen-devel&m=166514782207736&w=2
V3: https://lists.xen.org/archives/html/xen-devel/2025-03/msg00168.html
Mirela Simonovic (6):
xen/arm: Add suspend and resume timer helpers
xen/arm: gic-v2: Implement GIC suspend/resume functions
xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface)
xen/arm: Resume memory management on Xen resume
xen/arm: Save/restore context on suspend/resume
xen/arm: Add support for system suspend triggered by hardware domain
Mykola Kvach (5):
xen/arm: gic-v3: Implement GICv3 suspend/resume functions
xen/arm: Don't release IRQs on suspend
xen/arm: irq: avoid local IRQ descriptors reinit on system resume
xen/arm: irq: Restore state of local IRQs during system resume
xen/arm: gic-v3: Add suspend/resume support for eSPI registers
Oleksandr Tyshchenko (2):
iommu/ipmmu-vmsa: Implement suspend/resume callbacks
xen/arm: Suspend/resume IOMMU on Xen suspend/resume
xen/arch/arm/Kconfig | 1 +
xen/arch/arm/Makefile | 1 +
xen/arch/arm/arm64/head.S | 112 +++++++++
xen/arch/arm/gic-v2.c | 143 +++++++++++
xen/arch/arm/gic-v3-lpi.c | 3 +
xen/arch/arm/gic-v3.c | 288 +++++++++++++++++++++++
xen/arch/arm/gic.c | 32 +++
xen/arch/arm/include/asm/gic.h | 12 +
xen/arch/arm/include/asm/gic_v3_defs.h | 1 +
xen/arch/arm/include/asm/mm.h | 2 +
xen/arch/arm/include/asm/psci.h | 1 +
xen/arch/arm/include/asm/suspend.h | 46 ++++
xen/arch/arm/include/asm/time.h | 5 +
xen/arch/arm/irq.c | 46 ++++
xen/arch/arm/mmu/smpboot.c | 2 +-
xen/arch/arm/psci.c | 23 +-
xen/arch/arm/suspend.c | 175 ++++++++++++++
xen/arch/arm/tee/ffa_notif.c | 2 +-
xen/arch/arm/time.c | 49 +++-
xen/arch/arm/vpsci.c | 9 +-
xen/common/domain.c | 4 +
xen/drivers/passthrough/arm/ipmmu-vmsa.c | 257 ++++++++++++++++++++
xen/drivers/passthrough/arm/smmu-v3.c | 10 +
xen/drivers/passthrough/arm/smmu.c | 10 +
24 files changed, 1220 insertions(+), 14 deletions(-)
create mode 100644 xen/arch/arm/include/asm/suspend.h
create mode 100644 xen/arch/arm/suspend.c
--
2.48.1
* [PATCH v6 01/13] xen/arm: Add suspend and resume timer helpers
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi,
Mykola Kvach
From: Mirela Simonovic <mirela.simonovic@aggios.com>
Timer interrupts must be disabled while the system is suspended to prevent
spurious wake-ups. Suspending the timers involves disabling both the EL1
physical timer and the EL2 hypervisor timer. Resuming consists of raising
the TIMER_SOFTIRQ, which prompts the generic timer code to reprogram the
EL2 timer as needed. Re-enabling of the EL1 timer is left to the entity
that uses it.
Introduce a new helper, disable_physical_timers, to encapsulate disabling
of the physical timers.
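As a usage illustration only (the real call sites are added later in this
series, in xen/arch/arm/suspend.c), the helpers are expected to bracket the
firmware suspend call roughly as follows:
```c
/* Sketch: expected usage of time_suspend()/time_resume(). */
time_suspend();          /* disable the EL1 and EL2 physical timers */

/* ... save remaining context and issue PSCI SYSTEM_SUSPEND ... */

time_resume();           /* raise TIMER_SOFTIRQ so the generic timer code
                          * reprograms the EL2 timer with the next deadline */
```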
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V4:
- Rephrased comment and commit message for better clarity
- Created separate function for disabling physical timers
Changes in V3:
- time_suspend and time_resume are now conditionally compiled
under CONFIG_SYSTEM_SUSPEND
---
xen/arch/arm/include/asm/time.h | 5 +++++
xen/arch/arm/time.c | 38 +++++++++++++++++++++++++++------
2 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/xen/arch/arm/include/asm/time.h b/xen/arch/arm/include/asm/time.h
index 49ad8c1a6d..f4fd0c6af5 100644
--- a/xen/arch/arm/include/asm/time.h
+++ b/xen/arch/arm/include/asm/time.h
@@ -108,6 +108,11 @@ void preinit_xen_time(void);
void force_update_vcpu_system_time(struct vcpu *v);
+#ifdef CONFIG_SYSTEM_SUSPEND
+void time_suspend(void);
+void time_resume(void);
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
#endif /* __ARM_TIME_H__ */
/*
* Local variables:
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index e74d30d258..ad984fdfdd 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -303,6 +303,14 @@ static void check_timer_irq_cfg(unsigned int irq, const char *which)
"WARNING: %s-timer IRQ%u is not level triggered.\n", which, irq);
}
+/* Disable physical timers for EL1 and EL2 on the current CPU */
+static inline void disable_physical_timers(void)
+{
+ WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
+ WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
+ isb();
+}
+
/* Set up the timer interrupt on this CPU */
void init_timer_interrupt(void)
{
@@ -310,9 +318,7 @@ void init_timer_interrupt(void)
WRITE_SYSREG64(0, CNTVOFF_EL2); /* No VM-specific offset */
/* Do not let the VMs program the physical timer, only read the physical counter */
WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
- WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
- WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
- isb();
+ disable_physical_timers();
request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
"hyptimer", NULL);
@@ -330,9 +336,7 @@ void init_timer_interrupt(void)
*/
static void deinit_timer_interrupt(void)
{
- WRITE_SYSREG(0, CNTP_CTL_EL0); /* Disable physical timer */
- WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Disable hypervisor's timer */
- isb();
+ disable_physical_timers();
release_irq(timer_irq[TIMER_HYP_PPI], NULL);
release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
@@ -372,6 +376,28 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
/* XXX update guest visible wallclock time */
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+void time_suspend(void)
+{
+ disable_physical_timers();
+}
+
+void time_resume(void)
+{
+ /*
+ * Raising the timer softirq triggers generic code to call reprogram_timer
+ * with the correct timeout (not known here).
+ *
+ * No further action is needed to restore timekeeping after power down,
+ * since the system counter is unaffected. See ARM DDI 0487 L.a, D12.1.2
+ * "The system counter must be implemented in an always-on power domain."
+ */
+ raise_softirq(TIMER_SOFTIRQ);
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
static int cpu_time_callback(struct notifier_block *nfb,
unsigned long action,
void *hcpu)
--
2.48.1
* [PATCH v6 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi,
Mykyta Poturai, Mykola Kvach
From: Mirela Simonovic <mirela.simonovic@aggios.com>
System suspend may lead to a state where the GIC is powered down.
Therefore, Xen should save/restore the GIC context on suspend/resume.
Note that this context consists only of the registers controlled by the
hypervisor; GIC registers accessible by guests are saved/restored on
context switch.
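For context, a minimal sketch of how the gic_suspend()/gic_resume() wrappers
added in gic.c are meant to be driven from the host suspend path introduced
later in the series (the surrounding function is assumed, not shown here):
```c
/* Sketch: boot CPU, interrupts disabled (see the ASSERTs in gic_suspend()). */
int rc = gic_suspend();

if ( rc )               /* e.g. -ENOSYS when the GIC driver has no hooks */
    return rc;          /* abort suspend; hardware state is untouched */

/* ... CPU context save and PSCI SYSTEM_SUSPEND ... */

gic_resume();           /* restore distributor and CPU interface state */
```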
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in v6:
- drop extra func/line printing from dprintk
- drop checking context allocation from resume handler
- merge some loops where it is possible
Changes in v4:
- Add error logging for allocation failures
Changes in v3:
- Drop asserts and return error codes instead.
- Wrap code with CONFIG_SYSTEM_SUSPEND.
Changes in v2:
- Minor fixes after review.
---
xen/arch/arm/gic-v2.c | 143 +++++++++++++++++++++++++++++++++
xen/arch/arm/gic.c | 29 +++++++
xen/arch/arm/include/asm/gic.h | 12 +++
3 files changed, 184 insertions(+)
diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index b23e72a3d0..6373599e69 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -1098,6 +1098,140 @@ static int gicv2_iomem_deny_access(struct domain *d)
return iomem_deny_access(d, mfn, mfn + nr);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+/* GICv2 registers to be saved/restored on system suspend/resume */
+struct gicv2_context {
+ /* GICC context */
+ uint32_t gicc_ctlr;
+ uint32_t gicc_pmr;
+ uint32_t gicc_bpr;
+ /* GICD context */
+ uint32_t gicd_ctlr;
+ uint32_t *gicd_isenabler;
+ uint32_t *gicd_isactiver;
+ uint32_t *gicd_ipriorityr;
+ uint32_t *gicd_itargetsr;
+ uint32_t *gicd_icfgr;
+};
+
+static struct gicv2_context gicv2_context;
+
+static int gicv2_suspend(void)
+{
+ unsigned int i;
+
+ if ( !gicv2_context.gicd_isenabler )
+ {
+ dprintk(XENLOG_WARNING, "GICv2 suspend context not allocated!\n");
+ return -ENOMEM;
+ }
+
+ /* Save GICC configuration */
+ gicv2_context.gicc_ctlr = readl_gicc(GICC_CTLR);
+ gicv2_context.gicc_pmr = readl_gicc(GICC_PMR);
+ gicv2_context.gicc_bpr = readl_gicc(GICC_BPR);
+
+ /* Save GICD configuration */
+ gicv2_context.gicd_ctlr = readl_gicd(GICD_CTLR);
+
+ for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 32); i++ )
+ {
+ gicv2_context.gicd_isenabler[i] = readl_gicd(GICD_ISENABLER + i * 4);
+ gicv2_context.gicd_isactiver[i] = readl_gicd(GICD_ISACTIVER + i * 4);
+ }
+
+ for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 4); i++ )
+ {
+ gicv2_context.gicd_ipriorityr[i] = readl_gicd(GICD_IPRIORITYR + i * 4);
+ gicv2_context.gicd_itargetsr[i] = readl_gicd(GICD_ITARGETSR + i * 4);
+ }
+
+ for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 16); i++ )
+ gicv2_context.gicd_icfgr[i] = readl_gicd(GICD_ICFGR + i * 4);
+
+ return 0;
+}
+
+static void gicv2_resume(void)
+{
+ unsigned int i;
+
+ gicv2_cpu_disable();
+ /* Disable distributor */
+ writel_gicd(0, GICD_CTLR);
+
+ /* Restore GICD configuration */
+ for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 32); i++ )
+ {
+ writel_gicd(0xffffffff, GICD_ICENABLER + i * 4);
+ writel_gicd(gicv2_context.gicd_isenabler[i], GICD_ISENABLER + i * 4);
+
+ writel_gicd(0xffffffff, GICD_ICACTIVER + i * 4);
+ writel_gicd(gicv2_context.gicd_isactiver[i], GICD_ISACTIVER + i * 4);
+ }
+
+ for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 4); i++ )
+ {
+ writel_gicd(gicv2_context.gicd_ipriorityr[i], GICD_IPRIORITYR + i * 4);
+ writel_gicd(gicv2_context.gicd_itargetsr[i], GICD_ITARGETSR + i * 4);
+ }
+
+ for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 16); i++ )
+ writel_gicd(gicv2_context.gicd_icfgr[i], GICD_ICFGR + i * 4);
+
+ /* Make sure all registers are restored and enable distributor */
+ writel_gicd(gicv2_context.gicd_ctlr | GICD_CTL_ENABLE, GICD_CTLR);
+
+ /* Restore GIC CPU interface configuration */
+ writel_gicc(gicv2_context.gicc_pmr, GICC_PMR);
+ writel_gicc(gicv2_context.gicc_bpr, GICC_BPR);
+
+ /* Enable GIC CPU interface */
+ writel_gicc(gicv2_context.gicc_ctlr | GICC_CTL_ENABLE | GICC_CTL_EOI,
+ GICC_CTLR);
+}
+
+static void gicv2_alloc_context(struct gicv2_context *gc)
+{
+ uint32_t n = gicv2_info.nr_lines;
+
+ gc->gicd_isenabler = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 32));
+ if ( !gc->gicd_isenabler )
+ goto err_free;
+
+ gc->gicd_isactiver = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 32));
+ if ( !gc->gicd_isactiver )
+ goto err_free;
+
+ gc->gicd_itargetsr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 4));
+ if ( !gc->gicd_itargetsr )
+ goto err_free;
+
+ gc->gicd_ipriorityr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 4));
+ if ( !gc->gicd_ipriorityr )
+ goto err_free;
+
+ gc->gicd_icfgr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 16));
+ if ( !gc->gicd_icfgr )
+ goto err_free;
+
+ return;
+
+ err_free:
+ printk(XENLOG_ERR "Failed to allocate memory for GICv2 suspend context\n");
+
+ xfree(gc->gicd_icfgr);
+ xfree(gc->gicd_ipriorityr);
+ xfree(gc->gicd_itargetsr);
+ xfree(gc->gicd_isactiver);
+ xfree(gc->gicd_isenabler);
+
+ memset(gc, 0, sizeof(*gc));
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
#ifdef CONFIG_ACPI
static unsigned long gicv2_get_hwdom_extra_madt_size(const struct domain *d)
{
@@ -1302,6 +1436,11 @@ static int __init gicv2_init(void)
spin_unlock(&gicv2.lock);
+#ifdef CONFIG_SYSTEM_SUSPEND
+ /* Allocate memory to be used for saving GIC context during the suspend */
+ gicv2_alloc_context(&gicv2_context);
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
return 0;
}
@@ -1345,6 +1484,10 @@ static const struct gic_hw_operations gicv2_ops = {
.map_hwdom_extra_mappings = gicv2_map_hwdom_extra_mappings,
.iomem_deny_access = gicv2_iomem_deny_access,
.do_LPI = gicv2_do_LPI,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = gicv2_suspend,
+ .resume = gicv2_resume,
+#endif /* CONFIG_SYSTEM_SUSPEND */
};
/* Set up the GIC */
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e80fe0ca24..a018bd7715 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -425,6 +425,35 @@ int gic_iomem_deny_access(struct domain *d)
return gic_hw_ops->iomem_deny_access(d);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+int gic_suspend(void)
+{
+ /* Must be called by boot CPU#0 with interrupts disabled */
+ ASSERT(!local_irq_is_enabled());
+ ASSERT(!smp_processor_id());
+
+ if ( !gic_hw_ops->suspend || !gic_hw_ops->resume )
+ return -ENOSYS;
+
+ return gic_hw_ops->suspend();
+}
+
+void gic_resume(void)
+{
+ /*
+ * Must be called by boot CPU#0 with interrupts disabled after gic_suspend
+ * has returned successfully.
+ */
+ ASSERT(!local_irq_is_enabled());
+ ASSERT(!smp_processor_id());
+ ASSERT(gic_hw_ops->resume);
+
+ gic_hw_ops->resume();
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
static int cpu_gic_callback(struct notifier_block *nfb,
unsigned long action,
void *hcpu)
diff --git a/xen/arch/arm/include/asm/gic.h b/xen/arch/arm/include/asm/gic.h
index 541f0eeb80..a706303008 100644
--- a/xen/arch/arm/include/asm/gic.h
+++ b/xen/arch/arm/include/asm/gic.h
@@ -280,6 +280,12 @@ extern int gicv_setup(struct domain *d);
extern void gic_save_state(struct vcpu *v);
extern void gic_restore_state(struct vcpu *v);
+#ifdef CONFIG_SYSTEM_SUSPEND
+/* Suspend/resume */
+extern int gic_suspend(void);
+extern void gic_resume(void);
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
/* SGI (AKA IPIs) */
enum gic_sgi {
GIC_SGI_EVENT_CHECK,
@@ -395,6 +401,12 @@ struct gic_hw_operations {
int (*iomem_deny_access)(struct domain *d);
/* Handle LPIs, which require special handling */
void (*do_LPI)(unsigned int lpi);
+#ifdef CONFIG_SYSTEM_SUSPEND
+ /* Save GIC configuration for system suspend */
+ int (*suspend)(void);
+ /* Restore GIC configuration on system resume */
+ void (*resume)(void);
+#endif /* CONFIG_SYSTEM_SUSPEND */
};
extern const struct gic_hw_operations *gic_hw_ops;
--
2.48.1
* [PATCH v6 03/13] xen/arm: gic-v3: Implement GICv3 suspend/resume functions
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mykola Kvach <mykola_kvach@epam.com>
System suspend may lead to a state where the GIC is powered down.
Therefore, Xen should save/restore the GIC context on suspend/resume.
Note that this context consists only of the registers controlled by the
hypervisor; GIC registers accessible by guests are saved/restored on
context switch.
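A note on the distributor context layout used below (my reading of the code,
added for orientation): INTIDs are handled in blocks of 32, and block 0
(SGIs/PPIs) is covered by the per-CPU redistributor context, so only the SPI
blocks are allocated.
```c
/*
 * Illustrative example: with gicv3_info.nr_lines == 288,
 * DIV_ROUND_UP(288, 32) == 9 blocks of 32 INTIDs. Block 0 (INTIDs 0-31,
 * SGIs/PPIs) is saved in gicv3_ctx.rdist, so gicv3_alloc_context()
 * allocates 9 - 1 == 8 entries for dist.irqs, and dist.irqs[i - 1]
 * holds SPIs 32*i .. 32*i + 31.
 */
```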
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V6:
- Drop gicv3_save/restore_state since it is already handled during vCPU
context switch.
- The comment about systems without SPIs is clarified for readability.
- Error and warning messages related to suspend context allocation are unified
and now use printk() with XENLOG_ERR for consistency.
- The check for suspend context allocation in gicv3_resume() is removed,
as it is handled earlier in the suspend path.
- The loop for saving and restoring PPI/SGI priorities is corrected to use
the proper increment.
- The gicv3_suspend() function now prints an explicit error if ITS suspend
support is not implemented, and returns ENOSYS in this case.
- The GICD_CTLR_DS bit definition is added to gic_v3_defs.h.
- The comment for GICR_WAKER access is expanded to reference the relevant
ARM specification section and clarify the RAZ/WI behavior for Non-secure
accesses.
- Cleanup active and enable registers before restoring.
---
xen/arch/arm/gic-v3-lpi.c | 3 +
xen/arch/arm/gic-v3.c | 235 +++++++++++++++++++++++++
xen/arch/arm/include/asm/gic_v3_defs.h | 1 +
3 files changed, 239 insertions(+)
diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index de5052e5cf..61a6e18303 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -391,6 +391,9 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
switch ( action )
{
case CPU_UP_PREPARE:
+ if ( system_state == SYS_STATE_resume )
+ break;
+
rc = gicv3_lpi_allocate_pendtable(cpu);
if ( rc )
printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%lu\n",
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index cd3e1acf79..9f1be7e905 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1776,6 +1776,233 @@ static bool gic_dist_supports_lpis(void)
return (readl_relaxed(GICD + GICD_TYPER) & GICD_TYPE_LPIS);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+/* GICv3 registers to be saved/restored on system suspend/resume */
+struct gicv3_ctx {
+ struct dist_ctx {
+ uint32_t ctlr;
+ /*
+ * This struct represents a block of 32 IRQs
+ * TODO: store extended SPI configuration (GICv3.1+)
+ */
+ struct irq_regs {
+ uint32_t icfgr[2];
+ uint32_t ipriorityr[8];
+ uint64_t irouter[32];
+ uint32_t isactiver;
+ uint32_t isenabler;
+ } *irqs;
+ } dist;
+
+ /* Keep a single rdist context: that of the last running CPU at suspend */
+ struct redist_ctx {
+ uint32_t ctlr;
+ /* TODO: handle case when we have more than 16 PPIs (GICv3.1+) */
+ uint32_t icfgr[2];
+ uint32_t igroupr;
+ uint32_t ipriorityr[8];
+ uint32_t isactiver;
+ uint32_t isenabler;
+ } rdist;
+
+ struct cpu_ctx {
+ uint32_t ctlr;
+ uint32_t pmr;
+ uint32_t bpr;
+ uint32_t sre_el2;
+ uint32_t grpen;
+ } cpu;
+};
+
+static struct gicv3_ctx gicv3_ctx;
+
+static void __init gicv3_alloc_context(void)
+{
+ uint32_t blocks = DIV_ROUND_UP(gicv3_info.nr_lines, 32);
+
+ /* We don't have ITS support for suspend */
+ if ( gicv3_its_host_has_its() )
+ return;
+
+ /* The spec allows for systems without any SPIs */
+ if ( blocks > 1 )
+ {
+ gicv3_ctx.dist.irqs = xzalloc_array(typeof(*gicv3_ctx.dist.irqs),
+ blocks - 1);
+ if ( !gicv3_ctx.dist.irqs )
+ printk(XENLOG_ERR "Failed to allocate memory for GICv3 suspend context\n");
+ }
+}
+
+static void gicv3_disable_redist(void)
+{
+ void __iomem* waker = GICD_RDIST_BASE + GICR_WAKER;
+
+ /*
+ * Avoid infinite loop if Non-secure does not have access to GICR_WAKER.
+ * See Arm IHI 0069H.b, 12.11.42 GICR_WAKER:
+ * When GICD_CTLR.DS == 0, Non-secure accesses to this register are
+ * RAZ/WI.
+ */
+ if ( !(readl_relaxed(GICD + GICD_CTLR) & GICD_CTLR_DS) )
+ return;
+
+ writel_relaxed(readl_relaxed(waker) | GICR_WAKER_ProcessorSleep, waker);
+ while ( (readl_relaxed(waker) & GICR_WAKER_ChildrenAsleep) == 0 );
+}
+
+static int gicv3_suspend(void)
+{
+ unsigned int i;
+ void __iomem *base;
+ typeof(gicv3_ctx.rdist)* rdist = &gicv3_ctx.rdist;
+
+ /* TODO: implement support for ITS */
+ if ( gicv3_its_host_has_its() )
+ {
+ printk(XENLOG_ERR "GICv3: ITS suspend support is not implemented\n");
+ return -ENOSYS;
+ }
+
+ if ( !gicv3_ctx.dist.irqs && gicv3_info.nr_lines > NR_GIC_LOCAL_IRQS )
+ {
+ printk(XENLOG_ERR "GICv3: suspend context is not allocated!\n");
+ return -ENOMEM;
+ }
+
+ /* Save GICC configuration */
+ gicv3_ctx.cpu.ctlr = READ_SYSREG(ICC_CTLR_EL1);
+ gicv3_ctx.cpu.pmr = READ_SYSREG(ICC_PMR_EL1);
+ gicv3_ctx.cpu.bpr = READ_SYSREG(ICC_BPR1_EL1);
+ gicv3_ctx.cpu.sre_el2 = READ_SYSREG(ICC_SRE_EL2);
+ gicv3_ctx.cpu.grpen = READ_SYSREG(ICC_IGRPEN1_EL1);
+
+ gicv3_disable_interface();
+ gicv3_disable_redist();
+
+ /* Save GICR configuration */
+ gicv3_redist_wait_for_rwp();
+
+ base = GICD_RDIST_SGI_BASE;
+
+ rdist->ctlr = readl_relaxed(base + GICR_CTLR);
+
+ /* Save priority on PPI and SGI interrupts */
+ for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
+ rdist->ipriorityr[i] = readl_relaxed(base + GICR_IPRIORITYR0 + 4 * i);
+
+ rdist->isactiver = readl_relaxed(base + GICR_ISACTIVER0);
+ rdist->isenabler = readl_relaxed(base + GICR_ISENABLER0);
+ rdist->igroupr = readl_relaxed(base + GICR_IGROUPR0);
+ rdist->icfgr[0] = readl_relaxed(base + GICR_ICFGR0);
+ rdist->icfgr[1] = readl_relaxed(base + GICR_ICFGR1);
+
+ /* Save GICD configuration */
+ gicv3_dist_wait_for_rwp();
+ gicv3_ctx.dist.ctlr = readl_relaxed(GICD + GICD_CTLR);
+
+ for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
+ {
+ typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
+ unsigned int irq;
+
+ base = GICD + GICD_ICFGR + 8 * i;
+ irqs->icfgr[0] = readl_relaxed(base);
+ irqs->icfgr[1] = readl_relaxed(base + 4);
+
+ base = GICD + GICD_IPRIORITYR + 32 * i;
+ for ( irq = 0; irq < 8; irq++ )
+ irqs->ipriorityr[irq] = readl_relaxed(base + 4 * irq);
+
+ base = GICD + GICD_IROUTER + 32 * i;
+ for ( irq = 0; irq < 32; irq++ )
+ irqs->irouter[irq] = readq_relaxed_non_atomic(base + 8 * irq);
+
+ irqs->isactiver = readl_relaxed(GICD + GICD_ISACTIVER + 4 * i);
+ irqs->isenabler = readl_relaxed(GICD + GICD_ISENABLER + 4 * i);
+ }
+
+ return 0;
+}
+
+static void gicv3_resume(void)
+{
+ unsigned int i;
+ void __iomem *base;
+ typeof(gicv3_ctx.rdist)* rdist = &gicv3_ctx.rdist;
+
+ writel_relaxed(0, GICD + GICD_CTLR);
+
+ for ( i = NR_GIC_LOCAL_IRQS; i < gicv3_info.nr_lines; i += 32 )
+ writel_relaxed(GENMASK(31, 0), GICD + GICD_IGROUPR + (i / 32) * 4);
+
+ for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
+ {
+ typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
+ unsigned int irq;
+
+ base = GICD + GICD_ICFGR + 8 * i;
+ writel_relaxed(irqs->icfgr[0], base);
+ writel_relaxed(irqs->icfgr[1], base + 4);
+
+ base = GICD + GICD_IPRIORITYR + 32 * i;
+ for ( irq = 0; irq < 8; irq++ )
+ writel_relaxed(irqs->ipriorityr[irq], base + 4 * irq);
+
+ base = GICD + GICD_IROUTER + 32 * i;
+ for ( irq = 0; irq < 32; irq++ )
+ writeq_relaxed_non_atomic(irqs->irouter[irq], base + 8 * irq);
+
+ writel_relaxed(GENMASK(31, 0), GICD + GICD_ICENABLER + i * 4);
+ writel_relaxed(irqs->isenabler, GICD + GICD_ISENABLER + i * 4);
+
+ writel_relaxed(GENMASK(31, 0), GICD + GICD_ICACTIVER + i * 4);
+ writel_relaxed(irqs->isactiver, GICD + GICD_ISACTIVER + i * 4);
+ }
+
+ writel_relaxed(gicv3_ctx.dist.ctlr, GICD + GICD_CTLR);
+ gicv3_dist_wait_for_rwp();
+
+ /* Restore GICR (Redistributor) configuration */
+ gicv3_enable_redist();
+
+ base = GICD_RDIST_SGI_BASE;
+
+ writel_relaxed(0xffffffff, base + GICR_ICENABLER0);
+ gicv3_redist_wait_for_rwp();
+
+ for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
+ writel_relaxed(rdist->ipriorityr[i], base + GICR_IPRIORITYR0 + i * 4);
+
+ writel_relaxed(rdist->isactiver, base + GICR_ISACTIVER0);
+
+ writel_relaxed(rdist->igroupr, base + GICR_IGROUPR0);
+ writel_relaxed(rdist->icfgr[0], base + GICR_ICFGR0);
+ writel_relaxed(rdist->icfgr[1], base + GICR_ICFGR1);
+
+ gicv3_redist_wait_for_rwp();
+
+ writel_relaxed(rdist->isenabler, base + GICR_ISENABLER0);
+ writel_relaxed(rdist->ctlr, GICD_RDIST_BASE + GICR_CTLR);
+
+ gicv3_redist_wait_for_rwp();
+
+ WRITE_SYSREG(gicv3_ctx.cpu.sre_el2, ICC_SRE_EL2);
+ isb();
+
+ /* Restore CPU interface (System registers) */
+ WRITE_SYSREG(gicv3_ctx.cpu.pmr, ICC_PMR_EL1);
+ WRITE_SYSREG(gicv3_ctx.cpu.bpr, ICC_BPR1_EL1);
+ WRITE_SYSREG(gicv3_ctx.cpu.ctlr, ICC_CTLR_EL1);
+ WRITE_SYSREG(gicv3_ctx.cpu.grpen, ICC_IGRPEN1_EL1);
+ isb();
+
+ gicv3_hyp_init();
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
/* Set up the GIC */
static int __init gicv3_init(void)
{
@@ -1850,6 +2077,10 @@ static int __init gicv3_init(void)
gicv3_hyp_init();
+#ifdef CONFIG_SYSTEM_SUSPEND
+ gicv3_alloc_context();
+#endif
+
out:
spin_unlock(&gicv3.lock);
@@ -1889,6 +2120,10 @@ static const struct gic_hw_operations gicv3_ops = {
#endif
.iomem_deny_access = gicv3_iomem_deny_access,
.do_LPI = gicv3_do_LPI,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = gicv3_suspend,
+ .resume = gicv3_resume,
+#endif
};
static int __init gicv3_dt_preinit(struct dt_device_node *node, const void *data)
diff --git a/xen/arch/arm/include/asm/gic_v3_defs.h b/xen/arch/arm/include/asm/gic_v3_defs.h
index 2af093e774..7e86309acb 100644
--- a/xen/arch/arm/include/asm/gic_v3_defs.h
+++ b/xen/arch/arm/include/asm/gic_v3_defs.h
@@ -56,6 +56,7 @@
#define GICD_TYPE_LPIS (1U << 17)
#define GICD_CTLR_RWP (1UL << 31)
+#define GICD_CTLR_DS (1U << 6)
#define GICD_CTLR_ARE_NS (1U << 4)
#define GICD_CTLR_ENABLE_G1A (1U << 1)
#define GICD_CTLR_ENABLE_G1 (1U << 0)
--
2.48.1
* [PATCH v6 04/13] xen/arm: Don't release IRQs on suspend
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mykola Kvach <mykola_kvach@epam.com>
If we call disable_nonboot_cpus on ARM64 with system_state set
to SYS_STATE_suspend, the following assertion will be triggered:
```
(XEN) [ 25.582712] Disabling non-boot CPUs ...
(XEN) [ 25.587032] Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:714
[...]
(XEN) [ 25.975069] Xen call trace:
(XEN) [ 25.978353] [<00000a000022e098>] xfree+0x130/0x1a4 (PC)
(XEN) [ 25.984314] [<00000a000022e08c>] xfree+0x124/0x1a4 (LR)
(XEN) [ 25.990276] [<00000a00002747d4>] release_irq+0xe4/0xe8
(XEN) [ 25.996152] [<00000a0000278588>] time.c#cpu_time_callback+0x44/0x60
(XEN) [ 26.003150] [<00000a000021d678>] notifier_call_chain+0x7c/0xa0
(XEN) [ 26.009717] [<00000a00002018e0>] cpu.c#cpu_notifier_call_chain+0x24/0x48
(XEN) [ 26.017148] [<00000a000020192c>] cpu.c#_take_cpu_down+0x28/0x34
(XEN) [ 26.023801] [<00000a0000201944>] cpu.c#take_cpu_down+0xc/0x18
(XEN) [ 26.030281] [<00000a0000225c5c>] stop_machine.c#stopmachine_action+0xbc/0xe4
(XEN) [ 26.038057] [<00000a00002264bc>] tasklet.c#do_tasklet_work+0xb8/0x100
(XEN) [ 26.045229] [<00000a00002268a4>] do_tasklet+0x68/0xb0
(XEN) [ 26.051018] [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
(XEN) [ 26.057585] [<00000a0000277e30>] start_secondary+0x21c/0x220
(XEN) [ 26.063978] [<00000a0000361258>] 00000a0000361258
```
This happens because, before take_cpu_down is invoked on the target CPU
via stop_machine_run, stop_machine_run requests the STOPMACHINE_DISABLE_IRQ
state on that CPU. Freeing memory in release_irq with IRQs disabled then
triggers the assertion:
/*
* Heap allocations may need TLB flushes which may require IRQs to be
* enabled (except when only 1 PCPU is online).
*/
This patch adds system state checks to guard calls to request_irq
and release_irq. These calls are now skipped when system_state is
SYS_STATE_{resume,suspend}, preventing unsafe operations during
suspend/resume handling.
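Put differently, the boot-time irqaction is kept alive across the whole
suspend/resume cycle, so neither the free on the teardown path nor a
re-allocation on the bring-up path is needed. A condensed restatement of the
guards added by this patch:
```c
/* Suspend path: keep the irqaction, no xfree() while IRQs are disabled. */
void release_irq(unsigned int irq, const void *dev_id)
{
    if ( system_state == SYS_STATE_suspend )
        return;
    /* ... normal release path ... */
}

/* Resume path: the handler is still registered, so skip request_irq(). */
if ( system_state != SYS_STATE_resume )
    request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
                "hyptimer", NULL);
```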
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V6:
- skipping of IRQ release during system suspend is now handled
inside release_irq().
Changes in V4:
- removed the prior tasklet-based workaround in favor of a more
straightforward and safer solution
- reworked the approach by adding explicit system state checks around
request_irq and release_irq calls, skips these calls during suspend
and resume states to avoid unsafe memory operations when IRQs are
disabled
---
xen/arch/arm/gic.c | 3 +++
xen/arch/arm/irq.c | 3 +++
xen/arch/arm/tee/ffa_notif.c | 2 +-
xen/arch/arm/time.c | 11 +++++++----
4 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index a018bd7715..c64481faa7 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -388,6 +388,9 @@ void gic_dump_info(struct vcpu *v)
void init_maintenance_interrupt(void)
{
+ if ( system_state == SYS_STATE_resume )
+ return;
+
request_irq(gic_hw_ops->info->maintenance_irq, 0, maintenance_interrupt,
"irq-maintenance", NULL);
}
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 02ca82c089..361496a6d0 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -300,6 +300,9 @@ void release_irq(unsigned int irq, const void *dev_id)
unsigned long flags;
struct irqaction *action, **action_ptr;
+ if ( system_state == SYS_STATE_suspend )
+ return;
+
desc = irq_to_desc(irq);
spin_lock_irqsave(&desc->lock,flags);
diff --git a/xen/arch/arm/tee/ffa_notif.c b/xen/arch/arm/tee/ffa_notif.c
index 86bef6b3b2..4835e25619 100644
--- a/xen/arch/arm/tee/ffa_notif.c
+++ b/xen/arch/arm/tee/ffa_notif.c
@@ -363,7 +363,7 @@ void ffa_notif_init_interrupt(void)
{
int ret;
- if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI )
+ if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI && system_state != SYS_STATE_resume )
{
/*
* An error here is unlikely since the primary CPU has already
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index ad984fdfdd..8267fa5191 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -320,10 +320,13 @@ void init_timer_interrupt(void)
WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
disable_physical_timers();
- request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
- "hyptimer", NULL);
- request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt,
- "virtimer", NULL);
+ if ( system_state != SYS_STATE_resume )
+ {
+ request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
+ "hyptimer", NULL);
+ request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt,
+ "virtimer", NULL);
+ }
check_timer_irq_cfg(timer_irq[TIMER_HYP_PPI], "hypervisor");
check_timer_irq_cfg(timer_irq[TIMER_VIRT_PPI], "virtual");
--
2.48.1
* [PATCH v6 05/13] xen/arm: irq: avoid local IRQ descriptors reinit on system resume
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Volodymyr Babchuk
From: Mykola Kvach <mykola_kvach@epam.com>
On ARM, during system resume, CPUs are brought online again. This normally
triggers init_local_irq_data, which reinitializes IRQ descriptors for
banked interrupts (SGIs and PPIs).
These descriptors are statically allocated per CPU and retain valid
state across suspend/resume cycles. Re-initializing them on resume is
unnecessary and may result in loss of interrupt configuration or
restored state.
This patch skips init_local_irq_data when system_state is set to
SYS_STATE_resume, preserving the banked IRQ descriptor state across resume.
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
xen/arch/arm/irq.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 361496a6d0..6c899347ca 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -125,6 +125,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
switch ( action )
{
case CPU_UP_PREPARE:
+ /* Skip local IRQ descriptor re-initialisation on resume */
+ if ( system_state == SYS_STATE_resume )
+ break;
+
rc = init_local_irq_data(cpu);
if ( rc )
printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
--
2.48.1
* [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mykola Kvach <mykola_kvach@epam.com>
On ARM, the first 32 interrupts (SGIs and PPIs) are banked per CPU
and are not restored by gic_resume on secondary CPUs.
This patch introduces restore_local_irqs_on_resume, a function that
restores the state of local interrupts on the target CPU during
system resume.
It iterates over all local IRQs and re-enables those that were not
disabled, reprogramming their routing and affinity accordingly.
The function is invoked from start_secondary, ensuring that local IRQ
state is restored early during CPU bring-up after suspend.
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V6:
- Call handler->disable() instead of just setting the _IRQ_DISABLED flag
- Move the system state check outside of restore_local_irqs_on_resume()
---
xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 6c899347ca..ddd2940554 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
return 0;
}
+/*
+ * The first 32 interrupts (PPIs and SGIs) are per-CPU,
+ * so call this function on the target CPU to restore them.
+ *
+ * SPIs are restored via gic_resume.
+ */
+static void restore_local_irqs_on_resume(void)
+{
+ int irq;
+
+ spin_lock(&local_irqs_type_lock);
+
+ for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
+ {
+ struct irq_desc *desc = irq_to_desc(irq);
+
+ spin_lock(&desc->lock);
+
+ if ( test_bit(_IRQ_DISABLED, &desc->status) )
+ {
+ spin_unlock(&desc->lock);
+ continue;
+ }
+
+ /* Disable the IRQ to avoid assertions in the following calls */
+ desc->handler->disable(desc);
+ gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
+ desc->handler->startup(desc);
+
+ spin_unlock(&desc->lock);
+ }
+
+ spin_unlock(&local_irqs_type_lock);
+}
+
static int cpu_callback(struct notifier_block *nfb, unsigned long action,
void *hcpu)
{
@@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
cpu);
break;
+ case CPU_STARTING:
+ if ( system_state == SYS_STATE_resume )
+ restore_local_irqs_on_resume();
+ break;
}
return notifier_from_errno(rc);
--
2.48.1
* [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Mykola Kvach
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Store and restore active context and micro-TLB registers.
Tested on R-Car H3 Starter Kit.
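For orientation (a sketch, not part of this patch): the new callbacks plug
into iommu_ops and are expected to be reached from the host suspend path via
the generic IOMMU hooks wired up later in the series; the exact call site is
an assumption here.
```c
/* Sketch: how ipmmu_suspend()/ipmmu_resume() are expected to be reached. */
const struct iommu_ops *ops = iommu_get_ops();
int rc = ops->suspend ? ops->suspend() : 0;   /* backs up contexts/uTLBs */

if ( rc )
    return rc;                                /* abort the host suspend */

/* ... PSCI SYSTEM_SUSPEND ... */

if ( ops->resume )
    ops->resume();                            /* restores contexts/uTLBs */
```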
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V6:
- refactor the code around the hw_register struct; it is now called
ipmmu_reg_ctx
---
xen/drivers/passthrough/arm/ipmmu-vmsa.c | 257 +++++++++++++++++++++++
1 file changed, 257 insertions(+)
diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
index ea9fa9ddf3..0973559861 100644
--- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
+++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
@@ -71,6 +71,8 @@
})
#endif
+#define dev_dbg(dev, fmt, ...) \
+ dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
#define dev_info(dev, fmt, ...) \
dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
#define dev_warn(dev, fmt, ...) \
@@ -130,6 +132,24 @@ struct ipmmu_features {
unsigned int imuctr_ttsel_mask;
};
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+struct ipmmu_reg_ctx {
+ unsigned int imttlbr0;
+ unsigned int imttubr0;
+ unsigned int imttbcr;
+ unsigned int imctr;
+};
+
+struct ipmmu_vmsa_backup {
+ struct device *dev;
+ unsigned int *utlbs_val;
+ unsigned int *asids_val;
+ struct list_head list;
+};
+
+#endif
+
/* Root/Cache IPMMU device's information */
struct ipmmu_vmsa_device {
struct device *dev;
@@ -142,6 +162,9 @@ struct ipmmu_vmsa_device {
struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
unsigned int utlb_refcount[IPMMU_UTLB_MAX];
const struct ipmmu_features *features;
+#ifdef CONFIG_SYSTEM_SUSPEND
+ struct ipmmu_reg_ctx *reg_backup[IPMMU_CTX_MAX];
+#endif
};
/*
@@ -547,6 +570,222 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
spin_unlock_irqrestore(&mmu->lock, flags);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+static DEFINE_SPINLOCK(ipmmu_devices_backup_lock);
+static LIST_HEAD(ipmmu_devices_backup);
+
+static struct ipmmu_reg_ctx root_pgtable[IPMMU_CTX_MAX];
+
+static uint32_t ipmmu_imuasid_read(struct ipmmu_vmsa_device *mmu,
+ unsigned int utlb)
+{
+ return ipmmu_read(mmu, ipmmu_utlb_reg(mmu, IMUASID(utlb)));
+}
+
+static void ipmmu_utlbs_backup(struct ipmmu_vmsa_device *mmu)
+{
+ struct ipmmu_vmsa_backup *backup_data;
+
+ dev_dbg(mmu->dev, "Handle micro-TLBs backup\n");
+
+ spin_lock(&ipmmu_devices_backup_lock);
+
+ list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
+ {
+ struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
+ unsigned int i;
+
+ if ( to_ipmmu(backup_data->dev) != mmu )
+ continue;
+
+ for ( i = 0; i < fwspec->num_ids; i++ )
+ {
+ unsigned int utlb = fwspec->ids[i];
+
+ backup_data->asids_val[i] = ipmmu_imuasid_read(mmu, utlb);
+ backup_data->utlbs_val[i] = ipmmu_imuctr_read(mmu, utlb);
+ }
+ }
+
+ spin_unlock(&ipmmu_devices_backup_lock);
+}
+
+static void ipmmu_utlbs_restore(struct ipmmu_vmsa_device *mmu)
+{
+ struct ipmmu_vmsa_backup *backup_data;
+
+ dev_dbg(mmu->dev, "Handle micro-TLBs restore\n");
+
+ spin_lock(&ipmmu_devices_backup_lock);
+
+ list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
+ {
+ struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
+ unsigned int i;
+
+ if ( to_ipmmu(backup_data->dev) != mmu )
+ continue;
+
+ for ( i = 0; i < fwspec->num_ids; i++ )
+ {
+ unsigned int utlb = fwspec->ids[i];
+
+ ipmmu_imuasid_write(mmu, utlb, backup_data->asids_val[i]);
+ ipmmu_imuctr_write(mmu, utlb, backup_data->utlbs_val[i]);
+ }
+ }
+
+ spin_unlock(&ipmmu_devices_backup_lock);
+}
+
+static void ipmmu_domain_backup_context(struct ipmmu_vmsa_domain *domain)
+{
+ struct ipmmu_vmsa_device *mmu = domain->mmu->root;
+ struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
+
+ dev_dbg(mmu->dev, "Handle domain context %u backup\n", domain->context_id);
+
+ regs->imttlbr0 = ipmmu_ctx_read_root(domain, IMTTLBR0);
+ regs->imttubr0 = ipmmu_ctx_read_root(domain, IMTTUBR0);
+ regs->imttbcr = ipmmu_ctx_read_root(domain, IMTTBCR);
+ regs->imctr = ipmmu_ctx_read_root(domain, IMCTR);
+}
+
+static void ipmmu_domain_restore_context(struct ipmmu_vmsa_domain *domain)
+{
+ struct ipmmu_vmsa_device *mmu = domain->mmu->root;
+ struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
+
+ dev_dbg(mmu->dev, "Handle domain context %u restore\n", domain->context_id);
+
+ ipmmu_ctx_write_root(domain, IMTTLBR0, regs->imttlbr0);
+ ipmmu_ctx_write_root(domain, IMTTUBR0, regs->imttubr0);
+ ipmmu_ctx_write_root(domain, IMTTBCR, regs->imttbcr);
+ ipmmu_ctx_write_all(domain, IMCTR, regs->imctr | IMCTR_FLUSH);
+}
+
+/*
+ * Xen: Unlike Linux implementation, Xen uses a single driver instance
+ * for handling all IPMMUs. There is no framework for ipmmu_suspend/resume
+ * callbacks to be invoked for each IPMMU device. So, we need to iterate
+ * through all registered IPMMUs performing required actions.
+ *
+ * Also take care of restoring special settings, such as translation
+ * table format, etc.
+ */
+static int __must_check ipmmu_suspend(void)
+{
+ struct ipmmu_vmsa_device *mmu;
+
+ if ( !iommu_enabled )
+ return 0;
+
+ printk(XENLOG_DEBUG "ipmmu: Suspending ...\n");
+
+ spin_lock(&ipmmu_devices_lock);
+
+ list_for_each_entry( mmu, &ipmmu_devices, list )
+ {
+ if ( ipmmu_is_root(mmu) )
+ {
+ unsigned int i;
+
+ for ( i = 0; i < mmu->num_ctx; i++ )
+ {
+ if ( !mmu->domains[i] )
+ continue;
+ ipmmu_domain_backup_context(mmu->domains[i]);
+ }
+ }
+ else
+ ipmmu_utlbs_backup(mmu);
+ }
+
+ spin_unlock(&ipmmu_devices_lock);
+
+ return 0;
+}
+
+static void ipmmu_resume(void)
+{
+ struct ipmmu_vmsa_device *mmu;
+
+ if ( !iommu_enabled )
+ return;
+
+ printk(XENLOG_DEBUG "ipmmu: Resuming ...\n");
+
+ spin_lock(&ipmmu_devices_lock);
+
+ list_for_each_entry( mmu, &ipmmu_devices, list )
+ {
+ uint32_t reg;
+
+ /* Do not use security group function */
+ reg = IMSCTLR + mmu->features->control_offset_base;
+ ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) & ~IMSCTLR_USE_SECGRP);
+
+ if ( ipmmu_is_root(mmu) )
+ {
+ unsigned int i;
+
+ /* Use stage 2 translation table format */
+ reg = IMSAUXCTLR + mmu->features->control_offset_base;
+ ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) | IMSAUXCTLR_S2PTE);
+
+ for ( i = 0; i < mmu->num_ctx; i++ )
+ {
+ if ( !mmu->domains[i] )
+ continue;
+ ipmmu_domain_restore_context(mmu->domains[i]);
+ }
+ }
+ else
+ ipmmu_utlbs_restore(mmu);
+ }
+
+ spin_unlock(&ipmmu_devices_lock);
+}
+
+static int ipmmu_alloc_ctx_suspend(struct device *dev)
+{
+ struct ipmmu_vmsa_backup *backup_data;
+ unsigned int *utlbs_val, *asids_val;
+ struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+
+ utlbs_val = xzalloc_array(unsigned int, fwspec->num_ids);
+ if ( !utlbs_val )
+ return -ENOMEM;
+
+ asids_val = xzalloc_array(unsigned int, fwspec->num_ids);
+ if ( !asids_val )
+ {
+ xfree(utlbs_val);
+ return -ENOMEM;
+ }
+
+ backup_data = xzalloc(struct ipmmu_vmsa_backup);
+ if ( !backup_data )
+ {
+ xfree(utlbs_val);
+ xfree(asids_val);
+ return -ENOMEM;
+ }
+
+ backup_data->dev = dev;
+ backup_data->utlbs_val = utlbs_val;
+ backup_data->asids_val = asids_val;
+
+ spin_lock(&ipmmu_devices_backup_lock);
+ list_add(&backup_data->list, &ipmmu_devices_backup);
+ spin_unlock(&ipmmu_devices_backup_lock);
+
+ return 0;
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
{
uint64_t ttbr;
@@ -559,6 +798,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
return ret;
domain->context_id = ret;
+#ifdef CONFIG_SYSTEM_SUSPEND
+ domain->mmu->root->reg_backup[ret] = &root_pgtable[ret];
+#endif
/*
* TTBR0
@@ -615,6 +857,9 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
ipmmu_ctx_write_root(domain, IMCTR, IMCTR_FLUSH);
ipmmu_tlb_sync(domain);
+#ifdef CONFIG_SYSTEM_SUSPEND
+ domain->mmu->root->reg_backup[domain->context_id] = NULL;
+#endif
ipmmu_domain_free_context(domain->mmu->root, domain->context_id);
}
@@ -1427,6 +1672,14 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
}
#endif
+#ifdef CONFIG_SYSTEM_SUSPEND
+ if ( ipmmu_alloc_ctx_suspend(dev) )
+ {
+ dev_err(dev, "Failed to allocate context for suspend\n");
+ return -ENOMEM;
+ }
+#endif
+
dev_info(dev, "Added master device (IPMMU %s micro-TLBs %u)\n",
dev_name(fwspec->iommu_dev), fwspec->num_ids);
@@ -1492,6 +1745,10 @@ static const struct iommu_ops ipmmu_iommu_ops =
.unmap_page = arm_iommu_unmap_page,
.dt_xlate = ipmmu_dt_xlate,
.add_device = ipmmu_add_device,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = ipmmu_suspend,
+ .resume = ipmmu_resume,
+#endif
};
static __init int ipmmu_init(struct dt_device_node *node, const void *data)
--
2.48.1
* [PATCH v6 08/13] xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface)
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi,
Mykyta Poturai, Mykola Kvach
From: Mirela Simonovic <mirela.simonovic@aggios.com>
Invoke PSCI SYSTEM_SUSPEND to finalize Xen's suspend sequence on ARM64 platforms.
Pass the resume entry point (hyp_resume) as the first argument to EL3. The resume
handler is currently a stub and will be implemented later in assembly. Ignore the
context ID argument, as is done in Linux.
Only enable this path when CONFIG_SYSTEM_SUSPEND is set and
PSCI version is >= 1.0.
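For reference, the call shape added below together with its PSCI semantics:
on success the calling core is powered down and later restarts at the entry
point passed as the first argument (with the MMU off), so a return value is
only observed on failure. This restates the code below rather than adding new
behaviour:
```c
/* Mirrors call_psci_system_suspend(); comments restate PSCI semantics. */
struct arm_smccc_res res;

/* arg0: physical address of the resume entry point; context ID is unused. */
arm_smccc_smc(PSCI_1_0_FN_NATIVE(SYSTEM_SUSPEND), __pa(hyp_resume), &res);

/*
 * Reaching this point means the call returned, i.e. it failed (for
 * example NOT_SUPPORTED or INVALID_ADDRESS). On success the core
 * restarts at hyp_resume in the PSCI-defined boot state.
 */
return PSCI_RET(res);
```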
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in v6:
- move calling of call_psci_system_suspend to commit that
implements system_suspend call
Changes in v4:
- select the appropriate PSCI SYSTEM_SUSPEND function ID based on platform
- update comments and commit message to reflect recent changes
Changes in v3:
- return PSCI_NOT_SUPPORTED instead of a hardcoded 1 on ARM32
- check PSCI version before invoking SYSTEM_SUSPEND in call_psci_system_suspend
---
xen/arch/arm/arm64/head.S | 8 ++++++++
xen/arch/arm/include/asm/psci.h | 1 +
xen/arch/arm/include/asm/suspend.h | 22 ++++++++++++++++++++++
xen/arch/arm/psci.c | 23 ++++++++++++++++++++++-
4 files changed, 53 insertions(+), 1 deletion(-)
create mode 100644 xen/arch/arm/include/asm/suspend.h
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 72c7b24498..3522c497c5 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -561,6 +561,14 @@ END(efi_xen_start)
#endif /* CONFIG_ARM_EFI */
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+FUNC(hyp_resume)
+ b .
+END(hyp_resume)
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
/*
* Local variables:
* mode: ASM
diff --git a/xen/arch/arm/include/asm/psci.h b/xen/arch/arm/include/asm/psci.h
index 48a93e6b79..bb3c73496e 100644
--- a/xen/arch/arm/include/asm/psci.h
+++ b/xen/arch/arm/include/asm/psci.h
@@ -23,6 +23,7 @@ int call_psci_cpu_on(int cpu);
void call_psci_cpu_off(void);
void call_psci_system_off(void);
void call_psci_system_reset(void);
+int call_psci_system_suspend(void);
/* Range of allocated PSCI function numbers */
#define PSCI_FNUM_MIN_VALUE _AC(0,U)
diff --git a/xen/arch/arm/include/asm/suspend.h b/xen/arch/arm/include/asm/suspend.h
new file mode 100644
index 0000000000..7e04c6e915
--- /dev/null
+++ b/xen/arch/arm/include/asm/suspend.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ASM_ARM_SUSPEND_H__
+#define __ASM_ARM_SUSPEND_H__
+
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+void hyp_resume(void);
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
+#endif /* __ASM_ARM_SUSPEND_H__ */
+
+ /*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
index b6860a7760..c9d126b195 100644
--- a/xen/arch/arm/psci.c
+++ b/xen/arch/arm/psci.c
@@ -17,17 +17,20 @@
#include <asm/cpufeature.h>
#include <asm/psci.h>
#include <asm/acpi.h>
+#include <asm/suspend.h>
/*
* While a 64-bit OS can make calls with SMC32 calling conventions, for
* some calls it is necessary to use SMC64 to pass or return 64-bit values.
- * For such calls PSCI_0_2_FN_NATIVE(x) will choose the appropriate
+ * For such calls PSCI_*_FN_NATIVE(x) will choose the appropriate
* (native-width) function ID.
*/
#ifdef CONFIG_ARM_64
#define PSCI_0_2_FN_NATIVE(name) PSCI_0_2_FN64_##name
+#define PSCI_1_0_FN_NATIVE(name) PSCI_1_0_FN64_##name
#else
#define PSCI_0_2_FN_NATIVE(name) PSCI_0_2_FN32_##name
+#define PSCI_1_0_FN_NATIVE(name) PSCI_1_0_FN32_##name
#endif
uint32_t psci_ver;
@@ -60,6 +63,24 @@ void call_psci_cpu_off(void)
}
}
+int call_psci_system_suspend(void)
+{
+#ifdef CONFIG_SYSTEM_SUSPEND
+ struct arm_smccc_res res;
+
+ if ( psci_ver < PSCI_VERSION(1, 0) )
+ return PSCI_NOT_SUPPORTED;
+
+ /* 2nd argument (context ID) is not used */
+ arm_smccc_smc(PSCI_1_0_FN_NATIVE(SYSTEM_SUSPEND), __pa(hyp_resume), &res);
+ return PSCI_RET(res);
+#else
+ dprintk(XENLOG_WARNING,
+ "SYSTEM_SUSPEND not supported (CONFIG_SYSTEM_SUSPEND disabled)\n");
+ return PSCI_NOT_SUPPORTED;
+#endif
+}
+
void call_psci_system_off(void)
{
if ( psci_ver > PSCI_VERSION(0, 1) )
--
2.48.1
* [PATCH v6 09/13] xen/arm: Resume memory management on Xen resume
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi,
Mykyta Poturai, Mykola Kvach
From: Mirela Simonovic <mirela.simonovic@aggios.com>
The MMU must be enabled during the resume path before restoring context,
as virtual addresses are used to access the saved context data.
This patch adds MMU setup during resume by reusing the existing
enable_secondary_cpu_mm function, which enables the data cache and the MMU.
Before the MMU is enabled, TTBR0_EL2 is switched to point to init_ttbr
(the page tables used at runtime).
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in v6:
- moved changes related to set_init_ttbr to commit that implements
system_suspend call
Changes in v4:
- Drop unnecessary DAIF masking; interrupts are already masked on resume
- Remove leftover TLB flush instructions; flushing is done in enable_mmu
- Avoid setting x19 in hyp_resume; not needed
- Replace prepare_secondary_mm with set_init_ttbr; call it from system_suspend
Changes in v3:
- Update commit message for clarity
- Replace create_page_tables, enable_mmu, and mmu_init_secondary_cpu
with enable_secondary_cpu_mm
- Move prepare_secondary_mm to start_xen to avoid crash
- Add early UART init during resume
Changes in v2:
- Move hyp_resume to head.S to keep resume logic together
- Simplify hyp_resume using existing helpers: check_cpu_mode, cpu_init,
create_page_tables, enable_mmu
---
xen/arch/arm/arm64/head.S | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 3522c497c5..596e960152 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -564,6 +564,22 @@ END(efi_xen_start)
#ifdef CONFIG_SYSTEM_SUSPEND
FUNC(hyp_resume)
+ /* Initialize the UART if earlyprintk has been enabled. */
+#ifdef CONFIG_EARLY_PRINTK
+ bl init_uart
+#endif
+ PRINT_ID("- Xen resuming -\r\n")
+
+ bl check_cpu_mode
+ bl cpu_init
+
+ ldr x0, =start
+ adr x20, start /* x20 := paddr (start) */
+ sub x20, x20, x0 /* x20 := phys-offset */
+ ldr lr, =mmu_resumed
+ b enable_secondary_cpu_mm
+
+mmu_resumed:
b .
END(hyp_resume)
--
2.48.1
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v6 10/13] xen/arm: Save/restore context on suspend/resume
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (8 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 09/13] xen/arm: Resume memory management on Xen resume Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
` (3 subsequent siblings)
13 siblings, 0 replies; 37+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi,
Mykyta Poturai, Mykola Kvach
From: Mirela Simonovic <mirela.simonovic@aggios.com>
The context of the CPU general-purpose and system control registers must be
saved on suspend and restored on resume. Saving is implemented in
prepare_resume_ctx; restoring happens just before the return from hyp_resume.
prepare_resume_ctx must be invoked immediately before the PSCI system suspend
call is issued to ATF, and it must return a non-zero value so that the calling
'if' statement evaluates to true and system suspend is invoked. Upon resume,
the context saved on suspend is restored, including the link register, so
control returns to the place from which prepare_resume_ctx was called. To
ensure that the calling 'if' statement does not evaluate to true again and
re-initiate system suspend, hyp_resume must return zero after restoring the
context.
Note that the order in which registers are saved into the cpu_context
structure must match the order in which they are restored.
Support for ARM32 is not implemented. Instead, compilation fails with a
build-time error if suspend is enabled for ARM32.
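The resulting calling convention is setjmp/longjmp-like. A minimal sketch of
the intended call site (the actual caller is added by the later patch that
introduces host system suspend; 'status' here is just an illustrative local):

    if ( prepare_resume_ctx(&cpu_context) )
    {
        /* Reached only on the suspend path: the context was just saved. */
        status = call_psci_system_suspend();
        /* If the PSCI call fails, execution falls through with an error. */
    }
    /*
     * Reached both when the suspend attempt failed and, after wake-up, when
     * hyp_resume has restored the saved context and returned zero on behalf
     * of prepare_resume_ctx.
     */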
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in v6:
- Rename hyp_suspend to prepare_resume_ctx
- moved invocation of prepare_resume_ctx to commit which
implements system_suspend call
Changes in v4:
- Produce build-time error for ARM32 when CONFIG_SYSTEM_SUSPEND is enabled
- Use register_t instead of uint64_t in cpu_context structure
---
xen/arch/arm/Makefile | 1 +
xen/arch/arm/arm64/head.S | 90 +++++++++++++++++++++++++++++-
xen/arch/arm/include/asm/suspend.h | 22 ++++++++
xen/arch/arm/suspend.c | 14 +++++
4 files changed, 126 insertions(+), 1 deletion(-)
create mode 100644 xen/arch/arm/suspend.c
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index f833cdf207..3f6247adee 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -51,6 +51,7 @@ obj-y += setup.o
obj-y += shutdown.o
obj-y += smp.o
obj-y += smpboot.o
+obj-$(CONFIG_SYSTEM_SUSPEND) += suspend.o
obj-$(CONFIG_SYSCTL) += sysctl.o
obj-y += time.o
obj-y += traps.o
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 596e960152..c6594c0bdd 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -562,6 +562,52 @@ END(efi_xen_start)
#endif /* CONFIG_ARM_EFI */
#ifdef CONFIG_SYSTEM_SUSPEND
+/*
+ * int prepare_resume_ctx(struct cpu_context *ptr)
+ *
+ * x0 - pointer to the storage where callee's context will be saved
+ *
+ * The CPU context saved here will be restored on resume in the hyp_resume
+ * function. prepare_resume_ctx shall return a non-zero value; upon restoring
+ * the context, hyp_resume shall return zero instead. The C code that invokes
+ * prepare_resume_ctx interprets the return value to determine whether the
+ * context has just been saved (prepare_resume_ctx) or restored (hyp_resume).
+ */
+FUNC(prepare_resume_ctx)
+ /* Store callee-saved registers */
+ stp x19, x20, [x0], #16
+ stp x21, x22, [x0], #16
+ stp x23, x24, [x0], #16
+ stp x25, x26, [x0], #16
+ stp x27, x28, [x0], #16
+ stp x29, lr, [x0], #16
+
+ /* Store stack-pointer */
+ mov x2, sp
+ str x2, [x0], #8
+
+ /* Store system control registers */
+ mrs x2, VBAR_EL2
+ str x2, [x0], #8
+ mrs x2, VTCR_EL2
+ str x2, [x0], #8
+ mrs x2, VTTBR_EL2
+ str x2, [x0], #8
+ mrs x2, TPIDR_EL2
+ str x2, [x0], #8
+ mrs x2, MDCR_EL2
+ str x2, [x0], #8
+ mrs x2, HSTR_EL2
+ str x2, [x0], #8
+ mrs x2, CPTR_EL2
+ str x2, [x0], #8
+ mrs x2, HCR_EL2
+ str x2, [x0], #8
+
+ /* prepare_resume_ctx must return a non-zero value */
+ mov x0, #1
+ ret
+END(prepare_resume_ctx)
FUNC(hyp_resume)
/* Initialize the UART if earlyprintk has been enabled. */
@@ -580,7 +626,49 @@ FUNC(hyp_resume)
b enable_secondary_cpu_mm
mmu_resumed:
- b .
+ /* Now we can access the cpu_context, so restore the context here */
+ ldr x0, =cpu_context
+
+ /* Restore callee-saved registers */
+ ldp x19, x20, [x0], #16
+ ldp x21, x22, [x0], #16
+ ldp x23, x24, [x0], #16
+ ldp x25, x26, [x0], #16
+ ldp x27, x28, [x0], #16
+ ldp x29, lr, [x0], #16
+
+ /* Restore stack pointer */
+ ldr x2, [x0], #8
+ mov sp, x2
+
+ /* Restore system control registers */
+ ldr x2, [x0], #8
+ msr VBAR_EL2, x2
+ ldr x2, [x0], #8
+ msr VTCR_EL2, x2
+ ldr x2, [x0], #8
+ msr VTTBR_EL2, x2
+ ldr x2, [x0], #8
+ msr TPIDR_EL2, x2
+ ldr x2, [x0], #8
+ msr MDCR_EL2, x2
+ ldr x2, [x0], #8
+ msr HSTR_EL2, x2
+ ldr x2, [x0], #8
+ msr CPTR_EL2, x2
+ ldr x2, [x0], #8
+ msr HCR_EL2, x2
+ isb
+
+ /*
+ * Since the context has been restored, returning from this function will
+ * appear as a return from prepare_resume_ctx. To distinguish the return
+ * from prepare_resume_ctx taken when finalizing the suspend from the
+ * return taken here on resume, we need to return zero here.
+ */
+ mov x0, #0
+ ret
END(hyp_resume)
#endif /* CONFIG_SYSTEM_SUSPEND */
diff --git a/xen/arch/arm/include/asm/suspend.h b/xen/arch/arm/include/asm/suspend.h
index 7e04c6e915..29eed4ee7f 100644
--- a/xen/arch/arm/include/asm/suspend.h
+++ b/xen/arch/arm/include/asm/suspend.h
@@ -3,9 +3,31 @@
#ifndef __ASM_ARM_SUSPEND_H__
#define __ASM_ARM_SUSPEND_H__
+#include <asm/types.h>
+
#ifdef CONFIG_SYSTEM_SUSPEND
+#ifdef CONFIG_ARM_64
+struct cpu_context {
+ register_t callee_regs[12];
+ register_t sp;
+ register_t vbar_el2;
+ register_t vtcr_el2;
+ register_t vttbr_el2;
+ register_t tpidr_el2;
+ register_t mdcr_el2;
+ register_t hstr_el2;
+ register_t cptr_el2;
+ register_t hcr_el2;
+} __aligned(16);
+#else
+#error "Define cpu_context structure for arm32"
+#endif
+
+extern struct cpu_context cpu_context;
+
void hyp_resume(void);
+int prepare_resume_ctx(struct cpu_context *ptr);
#endif /* CONFIG_SYSTEM_SUSPEND */
diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
new file mode 100644
index 0000000000..5093f1bf3d
--- /dev/null
+++ b/xen/arch/arm/suspend.c
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <asm/suspend.h>
+
+struct cpu_context cpu_context;
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--
2.48.1
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (9 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 10/13] xen/arm: Save/restore context on suspend/resume Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-02 5:56 ` Mykola Kvach
2025-09-02 14:33 ` Jan Beulich
2025-09-01 22:10 ` [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume Mykola Kvach
` (2 subsequent siblings)
13 siblings, 2 replies; 37+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
Anthony PERARD, Jan Beulich, Roger Pau Monné, Saeed Nowshadi,
Mykyta Poturai, Mykola Kvach
From: Mirela Simonovic <mirela.simonovic@aggios.com>
Trigger Xen suspend when the hardware domain initiates suspend via
SHUTDOWN_suspend. Redirect system suspend to CPU#0 to ensure the
suspend logic runs on the boot CPU, as required.
Introduce full suspend/resume infrastructure gated by CONFIG_SYSTEM_SUSPEND,
including logic to:
- disable and enable non-boot physical CPUs
- freeze and thaw domains
- suspend and resume the GIC, timer, iommu and console
- maintain system state before and after suspend
On boot, init_ttbr is normally initialized during secondary CPU hotplug.
On uniprocessor systems, this would leave init_ttbr uninitialized,
causing resume to fail. To address this, the boot CPU now sets init_ttbr
during suspend.
Remove the restriction in the vPSCI interface preventing suspend from the
hardware domain.
Select HAS_SYSTEM_SUSPEND for ARM_64.
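As a condensed sketch, the redirection to the boot pCPU described above boils
down to scheduling a tasklet on pCPU#0 (the complete sequencing, including the
error unwinding, is in the suspend.c hunk below):

    /* Sketch only: the full logic lives in xen/arch/arm/suspend.c below. */
    static void cf_check system_suspend(void *data)
    {
        /*
         * Runs on pCPU#0: freeze domains, offline non-boot CPUs, suspend
         * timer/console/GIC, then issue PSCI SYSTEM_SUSPEND.
         */
    }

    static DECLARE_TASKLET(system_suspend_tasklet, system_suspend, NULL);

    void host_system_suspend(void)
    {
        /* Dom0's vCPU#0 may run on any pCPU; force the work onto pCPU#0. */
        tasklet_schedule_on_cpu(&system_suspend_tasklet, 0);
    }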
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in v6:
- Minor changes in comments.
- The implementation now uses own tasklet instead of continue_hypercall_on_cpu,
as the latter rewrites user registers and would tie system suspend status
to guest suspend status.
- The order of calls when suspending devices has been updated.
Changes in v5:
- select HAS_SYSTEM_SUSPEND in ARM_64 instead of ARM
- check llc_coloring_enabled instead of LLC_COLORING during the selection
of HAS_SYSTEM_SUSPEND config
- call host_system_suspend from guest PSCI system suspend instead of
arch_domain_shutdown, reducing the complexity of the new code
- update some comments
Changes introduced in V4:
- drop code for saving and restoring VCPU context (for more information
refer part 1 of patch series)
- remove IOMMU suspend and resume calls until they will be implemented
- move system suspend logic to arch_domain_shutdown, invoked from
domain_shutdown
- apply minor fixes such as renaming functions, updating comments, and
modifying the commit message to reflect these changes
- add console_end_sync to resume path after system suspend
Changes introduced in V3:
- merge changes from other commits into this patch (previously stashed):
1) disable/enable non-boot physical CPUs and freeze/thaw domains during
suspend/resume
2) suspend/resume the timer, GIC, console, IOMMU, and hardware domain
---
xen/arch/arm/Kconfig | 1 +
xen/arch/arm/include/asm/mm.h | 2 +
xen/arch/arm/include/asm/suspend.h | 2 +
xen/arch/arm/mmu/smpboot.c | 2 +-
xen/arch/arm/suspend.c | 150 +++++++++++++++++++++++++++++
xen/arch/arm/vpsci.c | 9 +-
xen/common/domain.c | 4 +
7 files changed, 168 insertions(+), 2 deletions(-)
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 5355534f3d..fdad53fd68 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -8,6 +8,7 @@ config ARM_64
depends on !ARM_32
select 64BIT
select HAS_FAST_MULTIPLY
+ select HAS_SYSTEM_SUSPEND if UNSUPPORTED
select HAS_VPCI_GUEST_SUPPORT if PCI_PASSTHROUGH
config ARM
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 7a93dad2ed..61e211d087 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -365,6 +365,8 @@ static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
} while ( (y = cmpxchg(&p->u.inuse.type_info, x, nx)) != x );
}
+void set_init_ttbr(lpae_t *root);
+
#endif /* __ARCH_ARM_MM__ */
/*
* Local variables:
diff --git a/xen/arch/arm/include/asm/suspend.h b/xen/arch/arm/include/asm/suspend.h
index 29eed4ee7f..8d30b01b7c 100644
--- a/xen/arch/arm/include/asm/suspend.h
+++ b/xen/arch/arm/include/asm/suspend.h
@@ -29,6 +29,8 @@ extern struct cpu_context cpu_context;
void hyp_resume(void);
int prepare_resume_ctx(struct cpu_context *ptr);
+void host_system_suspend(void);
+
#endif /* CONFIG_SYSTEM_SUSPEND */
#endif /* __ASM_ARM_SUSPEND_H__ */
diff --git a/xen/arch/arm/mmu/smpboot.c b/xen/arch/arm/mmu/smpboot.c
index 37e91d72b7..ff508ecf40 100644
--- a/xen/arch/arm/mmu/smpboot.c
+++ b/xen/arch/arm/mmu/smpboot.c
@@ -72,7 +72,7 @@ static void clear_boot_pagetables(void)
clear_table(boot_third);
}
-static void set_init_ttbr(lpae_t *root)
+void set_init_ttbr(lpae_t *root)
{
/*
* init_ttbr is part of the identity mapping which is read-only. So
diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
index 5093f1bf3d..35b20581f1 100644
--- a/xen/arch/arm/suspend.c
+++ b/xen/arch/arm/suspend.c
@@ -1,9 +1,159 @@
/* SPDX-License-Identifier: GPL-2.0-only */
+#include <asm/psci.h>
#include <asm/suspend.h>
+#include <xen/console.h>
+#include <xen/cpu.h>
+#include <xen/llc-coloring.h>
+#include <xen/sched.h>
+#include <xen/tasklet.h>
+
+/*
+ * TODO list:
+ * - Decide which domain should trigger system suspend: control or hardware?
+ * - Test system suspend with LLC_COLORING enabled and verify functionality
+ * - Implement IOMMU suspend/resume handlers and integrate them
+ * into the suspend/resume path (SMMU)
+ * - Enable "xl suspend" support on ARM architecture
+ * - Properly disable Xen timer watchdog from relevant services (init.d left)
+ * - Add suspend/resume CI test for ARM (QEMU if feasible)
+ * - Investigate feasibility and need for implementing system suspend on ARM32
+ */
+
struct cpu_context cpu_context;
+/* Xen suspend to RAM. Note: the data argument is not used. */
+static void cf_check system_suspend(void *data)
+{
+ int status;
+ unsigned long flags;
+ /* TODO: drop check after verification that features can work together */
+ if ( llc_coloring_enabled )
+ {
+ dprintk(XENLOG_ERR,
+ "System suspend is not supported with LLC_COLORING enabled\n");
+ status = -ENOSYS;
+ goto dom_resume;
+ }
+
+ BUG_ON(system_state != SYS_STATE_active);
+
+ system_state = SYS_STATE_suspend;
+
+ printk("Xen suspending...\n");
+
+ freeze_domains();
+ scheduler_disable();
+
+ /*
+ * Non-boot CPUs have to be disabled on suspend and enabled on resume
+ * (hotplug-based mechanism). Disabling non-boot CPUs causes PSCI CPU_OFF
+ * to be called by each non-boot CPU. Depending on the underlying platform
+ * capabilities, this may lead to the physical powering down of CPUs.
+ */
+ status = disable_nonboot_cpus();
+ if ( status )
+ {
+ system_state = SYS_STATE_resume;
+ goto resume_nonboot_cpus;
+ }
+
+ time_suspend();
+
+ console_start_sync();
+ status = console_suspend();
+ if ( status )
+ {
+ dprintk(XENLOG_ERR, "Failed to suspend the console, err=%d\n", status);
+ system_state = SYS_STATE_resume;
+ goto resume_console;
+ }
+
+ local_irq_save(flags);
+ status = gic_suspend();
+ if ( status )
+ {
+ system_state = SYS_STATE_resume;
+ goto resume_irqs;
+ }
+
+ set_init_ttbr(xen_pgtable);
+
+ /*
+ * Enable identity mapping before entering suspend to simplify
+ * the resume path
+ */
+ update_boot_mapping(true);
+
+ if ( prepare_resume_ctx(&cpu_context) )
+ {
+ status = call_psci_system_suspend();
+ /*
+ * If the suspend is finalized properly by the PSCI system suspend call
+ * above, the code below in this 'if' branch will never execute. Execution
+ * will continue from hyp_resume, which is the hypervisor's resume point.
+ * In hyp_resume the CPU context will be restored and, since the link
+ * register is restored as well, it will appear to return from
+ * prepare_resume_ctx. The difference between returning from
+ * prepare_resume_ctx on suspend and on resume is the function's return
+ * value: non-zero on suspend, zero on resume. That is why the control
+ * flow will not re-enter this 'if' branch on resume.
+ */
+ if ( status )
+ dprintk(XENLOG_WARNING, "PSCI system suspend failed, err=%d\n",
+ status);
+ }
+
+ system_state = SYS_STATE_resume;
+ update_boot_mapping(false);
+
+ gic_resume();
+
+ resume_irqs:
+ local_irq_restore(flags);
+
+ resume_console:
+ console_resume();
+ console_end_sync();
+
+ time_resume();
+
+ resume_nonboot_cpus:
+ /*
+ * The rcu_barrier() has to be added to ensure that the per cpu area is
+ * freed before a non-boot CPU tries to initialize it (_free_percpu_area()
+ * has to be called before the init_percpu_area()). This scenario occurs
+ * when non-boot CPUs are hot-unplugged on suspend and hotplugged on resume.
+ */
+ rcu_barrier();
+ enable_nonboot_cpus();
+ scheduler_enable();
+ thaw_domains();
+
+ system_state = SYS_STATE_active;
+
+ printk("Resume (status %d)\n", status);
+
+ dom_resume:
+ /* The resume of hardware domain should always follow Xen's resume. */
+ domain_resume(hardware_domain);
+}
+
+static DECLARE_TASKLET(system_suspend_tasklet, system_suspend, NULL);
+
+void host_system_suspend(void)
+{
+ /*
+ * system_suspend should be called when the hardware domain finalizes the
+ * suspend procedure from its boot core (vCPU#0). However, Dom0's vCPU#0
+ * could be mapped to any pCPU. The suspend procedure has to be finalized
+ * by pCPU#0 (non-boot pCPUs will be disabled during the suspend).
+ */
+ tasklet_schedule_on_cpu(&system_suspend_tasklet, 0);
+}
+
/*
* Local variables:
* mode: C
diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
index 22c3a5f544..2f52aba48d 100644
--- a/xen/arch/arm/vpsci.c
+++ b/xen/arch/arm/vpsci.c
@@ -4,6 +4,7 @@
#include <xen/types.h>
#include <asm/current.h>
+#include <asm/suspend.h>
#include <asm/vgic.h>
#include <asm/vpsci.h>
#include <asm/event.h>
@@ -221,9 +222,10 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
if ( is_64bit_domain(d) && is_thumb )
return PSCI_INVALID_ADDRESS;
- /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
+#ifndef CONFIG_SYSTEM_SUSPEND
if ( is_hardware_domain(d) )
return PSCI_NOT_SUPPORTED;
+#endif
/* Ensure that all CPUs other than the calling one are offline */
domain_lock(d);
@@ -249,6 +251,11 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
"SYSTEM_SUSPEND requested, epoint=0x%"PRIregister", cid=0x%"PRIregister"\n",
epoint, cid);
+#ifdef CONFIG_SYSTEM_SUSPEND
+ if ( is_hardware_domain(d) )
+ host_system_suspend();
+#endif
+
return rc;
}
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 667017c5e1..5e224740d3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1317,7 +1317,11 @@ int domain_shutdown(struct domain *d, u8 reason)
d->shutdown_code = reason;
reason = d->shutdown_code;
+#if defined(CONFIG_SYSTEM_SUSPEND) && defined(CONFIG_ARM)
+ if ( reason != SHUTDOWN_suspend && is_hardware_domain(d) )
+#else
if ( is_hardware_domain(d) )
+#endif
hwdom_shutdown(reason);
if ( d->is_shutting_down )
--
2.48.1
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (10 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-02 17:25 ` Oleksandr Tyshchenko
2025-09-02 20:51 ` Volodymyr Babchuk
2025-09-01 22:10 ` [PATCH v6 13/13] xen/arm: gic-v3: Add suspend/resume support for eSPI registers Mykola Kvach
2025-09-02 20:48 ` [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Volodymyr Babchuk
13 siblings, 2 replies; 37+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Rahul Singh,
Mykola Kvach
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
This is done using the generic iommu_suspend/resume functions, which invoke
the driver-specific suspend/resume handlers of the enabled IOMMU (provided
the driver implements them).
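For illustration, the generic dispatch described above can be pictured roughly
as follows; this is a hypothetical sketch only (the real helpers come from
part 1 of the series and may differ in detail):

    /* Hypothetical shape of the generic helper, not the actual implementation. */
    int iommu_suspend(void)
    {
        const struct iommu_ops *ops = iommu_get_ops();

        if ( iommu_enabled && ops->suspend )
            return ops->suspend();

        return 0;
    }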
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V6:
- Drop iommu_enabled check from host system suspend.
---
xen/arch/arm/suspend.c | 11 +++++++++++
xen/drivers/passthrough/arm/smmu-v3.c | 10 ++++++++++
xen/drivers/passthrough/arm/smmu.c | 10 ++++++++++
3 files changed, 31 insertions(+)
diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
index 35b20581f1..f3a3b831c5 100644
--- a/xen/arch/arm/suspend.c
+++ b/xen/arch/arm/suspend.c
@@ -5,6 +5,7 @@
#include <xen/console.h>
#include <xen/cpu.h>
+#include <xen/iommu.h>
#include <xen/llc-coloring.h>
#include <xen/sched.h>
#include <xen/tasklet.h>
@@ -62,6 +63,13 @@ static void cf_check system_suspend(void *data)
time_suspend();
+ status = iommu_suspend();
+ if ( status )
+ {
+ system_state = SYS_STATE_resume;
+ goto resume_time;
+ }
+
console_start_sync();
status = console_suspend();
if ( status )
@@ -118,6 +126,9 @@ static void cf_check system_suspend(void *data)
console_resume();
console_end_sync();
+ iommu_resume();
+
+ resume_time:
time_resume();
resume_nonboot_cpus:
diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
index 81071f4018..f887faf7dc 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -2854,6 +2854,13 @@ static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
xfree(xen_domain);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+static int arm_smmu_suspend(void)
+{
+ return -ENOSYS;
+}
+#endif
+
static const struct iommu_ops arm_smmu_iommu_ops = {
.page_sizes = PAGE_SIZE_4K,
.init = arm_smmu_iommu_xen_domain_init,
@@ -2866,6 +2873,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
.unmap_page = arm_iommu_unmap_page,
.dt_xlate = arm_smmu_dt_xlate,
.add_device = arm_smmu_add_device,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = arm_smmu_suspend,
+#endif
};
static __init int arm_smmu_dt_init(struct dt_device_node *dev,
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 22d306d0cb..45f29ef8ec 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
xfree(xen_domain);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+static int arm_smmu_suspend(void)
+{
+ return -ENOSYS;
+}
+#endif
+
static const struct iommu_ops arm_smmu_iommu_ops = {
.page_sizes = PAGE_SIZE_4K,
.init = arm_smmu_iommu_domain_init,
@@ -2960,6 +2967,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
.map_page = arm_iommu_map_page,
.unmap_page = arm_iommu_unmap_page,
.dt_xlate = arm_smmu_dt_xlate_generic,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = arm_smmu_suspend,
+#endif
};
static struct arm_smmu_device *find_smmu(const struct device *dev)
--
2.48.1
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v6 13/13] xen/arm: gic-v3: Add suspend/resume support for eSPI registers
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (11 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-02 20:48 ` [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Volodymyr Babchuk
13 siblings, 0 replies; 37+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mykola Kvach <mykola_kvach@epam.com>
Add suspend/resume handling for GICv3 eSPI registers.
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Note: The main eSPI patch series is still under review.
This commit is intended to be applied after the main eSPI series:
[PATCH v5 00/12] Introduce eSPI support
https://patchew.org/Xen/cover.1756481577.git.leonid._5Fkomarianskyi@epam.com/
---
xen/arch/arm/gic-v3.c | 141 +++++++++++++++++++++++++++++-------------
1 file changed, 97 insertions(+), 44 deletions(-)
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index 9f1be7e905..57403c82a8 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1782,17 +1782,14 @@ static bool gic_dist_supports_lpis(void)
struct gicv3_ctx {
struct dist_ctx {
uint32_t ctlr;
- /*
- * This struct represent block of 32 IRQs
- * TODO: store extended SPI configuration (GICv3.1+)
- */
+ /* This struct represents a block of 32 IRQs */
struct irq_regs {
uint32_t icfgr[2];
uint32_t ipriorityr[8];
uint64_t irouter[32];
uint32_t isactiver;
uint32_t isenabler;
- } *irqs;
+ } *irqs, *espi_irqs;
} dist;
/* have only one rdist structure for last running CPU during suspend */
@@ -1831,8 +1828,26 @@ static void __init gicv3_alloc_context(void)
gicv3_ctx.dist.irqs = xzalloc_array(typeof(*gicv3_ctx.dist.irqs),
blocks - 1);
if ( !gicv3_ctx.dist.irqs )
+ {
printk(XENLOG_ERR "Failed to allocate memory for GICv3 suspend context\n");
+ return;
+ }
}
+
+#ifdef CONFIG_GICV3_ESPI
+ if ( !gicv3_info.nr_espi )
+ return;
+
+ gicv3_ctx.dist.espi_irqs = xzalloc_array(typeof(*gicv3_ctx.dist.espi_irqs),
+ gicv3_info.nr_espi / 32);
+ if ( !gicv3_ctx.dist.espi_irqs )
+ {
+ xfree(gicv3_ctx.dist.irqs);
+ gicv3_ctx.dist.irqs = NULL;
+
+ printk(XENLOG_ERR "Failed to allocate memory for GICv3 eSPI suspend context\n");
+ }
+#endif
}
static void gicv3_disable_redist(void)
@@ -1852,6 +1867,65 @@ static void gicv3_disable_redist(void)
while ( (readl_relaxed(waker) & GICR_WAKER_ChildrenAsleep) == 0 );
}
+#define GET_SPI_REG_OFFSET(name, is_espi) \
+ ((is_espi) ? GICD_##name##nE : GICD_##name)
+
+static void gicv3_store_spi_irq_block(typeof(gicv3_ctx.dist.irqs) irqs,
+ unsigned int i, bool is_espi)
+{
+ void __iomem *base;
+ unsigned int irq;
+
+ base = GICD + GET_SPI_REG_OFFSET(ICFGR, is_espi) + 8 * i;
+ irqs->icfgr[0] = readl_relaxed(base);
+ irqs->icfgr[1] = readl_relaxed(base + 4);
+
+ base = GICD + GET_SPI_REG_OFFSET(IPRIORITYR, is_espi) + 32 * i;
+ for ( irq = 0; irq < 8; irq++ )
+ irqs->ipriorityr[irq] = readl_relaxed(base + 4 * irq);
+
+ base = GICD + GET_SPI_REG_OFFSET(IROUTER, is_espi) + 32 * i;
+ for ( irq = 0; irq < 32; irq++ )
+ irqs->irouter[irq] = readq_relaxed_non_atomic(base + 8 * irq);
+
+ base = GICD + GET_SPI_REG_OFFSET(ISACTIVER, is_espi) + 4 * i;
+ irqs->isactiver = readl_relaxed(base);
+
+ base = GICD + GET_SPI_REG_OFFSET(ISENABLER, is_espi) + 4 * i;
+ irqs->isenabler = readl_relaxed(base);
+}
+
+static void gicv3_restore_spi_irq_block(typeof(gicv3_ctx.dist.irqs) irqs,
+ unsigned int i, bool is_espi)
+{
+ void __iomem *base;
+ unsigned int irq;
+
+ base = GICD + GET_SPI_REG_OFFSET(ICFGR, is_espi) + 8 * i;
+ writel_relaxed(irqs->icfgr[0], base);
+ writel_relaxed(irqs->icfgr[1], base + 4);
+
+ base = GICD + GET_SPI_REG_OFFSET(IPRIORITYR, is_espi) + 32 * i;
+ for ( irq = 0; irq < 8; irq++ )
+ writel_relaxed(irqs->ipriorityr[irq], base + 4 * irq);
+
+ base = GICD + GET_SPI_REG_OFFSET(IROUTER, is_espi) + 32 * i;
+ for ( irq = 0; irq < 32; irq++ )
+ writeq_relaxed_non_atomic(irqs->irouter[irq], base + 8 * irq);
+
+ base = GICD + GET_SPI_REG_OFFSET(ICENABLER, is_espi) + i * 4;
+ writel_relaxed(GENMASK(31, 0), base);
+
+ base = GICD + GET_SPI_REG_OFFSET(ISENABLER, is_espi) + i * 4;
+ writel_relaxed(irqs->isenabler, base);
+
+ base = GICD + GET_SPI_REG_OFFSET(ICACTIVER, is_espi) + i * 4;
+ writel_relaxed(GENMASK(31, 0), base);
+
+ base = GICD + GET_SPI_REG_OFFSET(ISACTIVER, is_espi) + i * 4;
+ writel_relaxed(irqs->isactiver, base);
+}
+
static int gicv3_suspend(void)
{
unsigned int i;
@@ -1871,6 +1945,14 @@ static int gicv3_suspend(void)
return -ENOMEM;
}
+#ifdef CONFIG_GICV3_ESPI
+ if ( gicv3_info.nr_espi && !gicv3_ctx.dist.espi_irqs )
+ {
+ printk(XENLOG_ERR "GICv3: eSPI suspend context is not allocated!\n");
+ return -ENOMEM;
+ }
+#endif
+
/* Save GICC configuration */
gicv3_ctx.cpu.ctlr = READ_SYSREG(ICC_CTLR_EL1);
gicv3_ctx.cpu.pmr = READ_SYSREG(ICC_PMR_EL1);
@@ -1903,25 +1985,12 @@ static int gicv3_suspend(void)
gicv3_ctx.dist.ctlr = readl_relaxed(GICD + GICD_CTLR);
for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
- {
- typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
- unsigned int irq;
+ gicv3_store_spi_irq_block(gicv3_ctx.dist.irqs + i - 1, i, false);
- base = GICD + GICD_ICFGR + 8 * i;
- irqs->icfgr[0] = readl_relaxed(base);
- irqs->icfgr[1] = readl_relaxed(base + 4);
-
- base = GICD + GICD_IPRIORITYR + 32 * i;
- for ( irq = 0; irq < 8; irq++ )
- irqs->ipriorityr[irq] = readl_relaxed(base + 4 * irq);
-
- base = GICD + GICD_IROUTER + 32 * i;
- for ( irq = 0; irq < 32; irq++ )
- irqs->irouter[irq] = readq_relaxed_non_atomic(base + 8 * irq);
-
- irqs->isactiver = readl_relaxed(GICD + GICD_ISACTIVER + 4 * i);
- irqs->isenabler = readl_relaxed(GICD + GICD_ISENABLER + 4 * i);
- }
+#ifdef CONFIG_GICV3_ESPI
+ for ( i = 0; i < gicv3_info.nr_espi / 32; i++ )
+ gicv3_store_spi_irq_block(gicv3_ctx.dist.espi_irqs + i, i, true);
+#endif
return 0;
}
@@ -1938,28 +2007,12 @@ static void gicv3_resume(void)
writel_relaxed(GENMASK(31, 0), GICD + GICD_IGROUPR + (i / 32) * 4);
for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
- {
- typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
- unsigned int irq;
+ gicv3_restore_spi_irq_block(gicv3_ctx.dist.irqs + i - 1, i, false);
- base = GICD + GICD_ICFGR + 8 * i;
- writel_relaxed(irqs->icfgr[0], base);
- writel_relaxed(irqs->icfgr[1], base + 4);
-
- base = GICD + GICD_IPRIORITYR + 32 * i;
- for ( irq = 0; irq < 8; irq++ )
- writel_relaxed(irqs->ipriorityr[irq], base + 4 * irq);
-
- base = GICD + GICD_IROUTER + 32 * i;
- for ( irq = 0; irq < 32; irq++ )
- writeq_relaxed_non_atomic(irqs->irouter[irq], base + 8 * irq);
-
- writel_relaxed(GENMASK(31, 0), GICD + GICD_ICENABLER + i * 4);
- writel_relaxed(irqs->isenabler, GICD + GICD_ISENABLER + i * 4);
-
- writel_relaxed(GENMASK(31, 0), GICD + GICD_ICACTIVER + i * 4);
- writel_relaxed(irqs->isactiver, GICD + GICD_ISACTIVER + i * 4);
- }
+#ifdef CONFIG_GICV3_ESPI
+ for ( i = 0; i < gicv3_info.nr_espi / 32; i++ )
+ gicv3_restore_spi_irq_block(gicv3_ctx.dist.espi_irqs + i, i, true);
+#endif
writel_relaxed(gicv3_ctx.dist.ctlr, GICD + GICD_CTLR);
gicv3_dist_wait_for_rwp();
--
2.48.1
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-01 22:10 ` [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
@ 2025-09-02 5:56 ` Mykola Kvach
2025-09-02 14:33 ` Jan Beulich
1 sibling, 0 replies; 37+ messages in thread
From: Mykola Kvach @ 2025-09-02 5:56 UTC (permalink / raw)
To: xen-devel
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Andrew Cooper, Anthony PERARD, Jan Beulich,
Roger Pau Monné, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach
Hi everyone,
This is the first commit in the second part of the patch series that
cannot exist without part 1. All previous commits are independent and
do not depend on part 1.
On Tue, Sep 2, 2025 at 1:10 AM Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> Trigger Xen suspend when the hardware domain initiates suspend via
> SHUTDOWN_suspend. Redirect system suspend to CPU#0 to ensure the
> suspend logic runs on the boot CPU, as required.
[snip]
Best regards,
Mykola
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-01 22:10 ` [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
2025-09-02 5:56 ` Mykola Kvach
@ 2025-09-02 14:33 ` Jan Beulich
2025-09-03 4:31 ` Mykola Kvach
1 sibling, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2025-09-02 14:33 UTC (permalink / raw)
To: Mykola Kvach
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Roger Pau Monné, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach, xen-devel
On 02.09.2025 00:10, Mykola Kvach wrote:
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1317,7 +1317,11 @@ int domain_shutdown(struct domain *d, u8 reason)
> d->shutdown_code = reason;
> reason = d->shutdown_code;
>
> +#if defined(CONFIG_SYSTEM_SUSPEND) && defined(CONFIG_ARM)
> + if ( reason != SHUTDOWN_suspend && is_hardware_domain(d) )
> +#else
> if ( is_hardware_domain(d) )
> +#endif
> hwdom_shutdown(reason);
I still don't follow why Arm-specific code needs to live here. If this
can't be properly abstracted, then at the very least I'd expect some
code comment here, or at the very, very least something in the description.
From looking at hwdom_shutdown() I get the impression that it doesn't
expect to be called with SHUTDOWN_suspend, yet then the question is why we
make it into domain_shutdown() with that reason code.
Jan
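One way to satisfy this would be an arch hook so the common code stays
arch-agnostic; a purely hypothetical sketch (the helper name
arch_hwdom_defers_shutdown is invented here for illustration):

    /*
     * Hypothetical: common code asks the arch whether the hardware domain's
     * shutdown with this reason should still power down the host.
     */
    if ( is_hardware_domain(d) && !arch_hwdom_defers_shutdown(d, reason) )
        hwdom_shutdown(reason);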
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 03/13] xen/arm: gic-v3: Implement GICv3 suspend/resume functions
2025-09-01 22:10 ` [PATCH v6 03/13] xen/arm: gic-v3: Implement GICv3 " Mykola Kvach
@ 2025-09-02 16:08 ` Oleksandr Tyshchenko
2025-09-02 17:30 ` Mykola Kvach
0 siblings, 1 reply; 37+ messages in thread
From: Oleksandr Tyshchenko @ 2025-09-02 16:08 UTC (permalink / raw)
To: Mykola Kvach, xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
On 02.09.25 01:10, Mykola Kvach wrote:
Hello Mykola
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> System suspend may lead to a state where GIC would be powered down.
> Therefore, Xen should save/restore the context of GIC on suspend/resume.
>
> Note that the context consists of states of registers which are
> controlled by the hypervisor. Other GIC registers which are accessible
> by guests are saved/restored on context switch.
>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in V6:
> - Drop gicv3_save/restore_state since it is already handled during vCPU
> context switch.
> - The comment about systems without SPIs is clarified for readability.
> - Error and warning messages related to suspend context allocation are unified
> and now use printk() with XENLOG_ERR for consistency.
> - The check for suspend context allocation in gicv3_resume() is removed,
> as it is handled earlier in the suspend path.
> - The loop for saving and restoring PPI/SGI priorities is corrected to use
> the proper increment.
> - The gicv3_suspend() function now prints an explicit error if ITS suspend
> support is not implemented, and returns ENOSYS in this case.
> - The GICD_CTLR_DS bit definition is added to gic_v3_defs.h.
> - The comment for GICR_WAKER access is expanded to reference the relevant
> ARM specification section and clarify the RAZ/WI behavior for Non-secure
> accesses.
> - Cleanup active and enable registers before restoring.
> ---
> xen/arch/arm/gic-v3-lpi.c | 3 +
> xen/arch/arm/gic-v3.c | 235 +++++++++++++++++++++++++
> xen/arch/arm/include/asm/gic_v3_defs.h | 1 +
> 3 files changed, 239 insertions(+)
>
> diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
> index de5052e5cf..61a6e18303 100644
> --- a/xen/arch/arm/gic-v3-lpi.c
> +++ b/xen/arch/arm/gic-v3-lpi.c
> @@ -391,6 +391,9 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> switch ( action )
> {
> case CPU_UP_PREPARE:
> + if ( system_state == SYS_STATE_resume )
> + break;
> +
> rc = gicv3_lpi_allocate_pendtable(cpu);
> if ( rc )
> printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%lu\n",
> diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> index cd3e1acf79..9f1be7e905 100644
> --- a/xen/arch/arm/gic-v3.c
> +++ b/xen/arch/arm/gic-v3.c
> @@ -1776,6 +1776,233 @@ static bool gic_dist_supports_lpis(void)
> return (readl_relaxed(GICD + GICD_TYPER) & GICD_TYPE_LPIS);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +/* GICv3 registers to be saved/restored on system suspend/resume */
> +struct gicv3_ctx {
> + struct dist_ctx {
> + uint32_t ctlr;
> + /*
> + * This struct represent block of 32 IRQs
> + * TODO: store extended SPI configuration (GICv3.1+)
> + */
> + struct irq_regs {
> + uint32_t icfgr[2];
> + uint32_t ipriorityr[8];
> + uint64_t irouter[32];
> + uint32_t isactiver;
> + uint32_t isenabler;
> + } *irqs;
> + } dist;
> +
> + /* have only one rdist structure for last running CPU during suspend */
> + struct redist_ctx {
> + uint32_t ctlr;
> + /* TODO: handle case when we have more than 16 PPIs (GICv3.1+) */
> + uint32_t icfgr[2];
> + uint32_t igroupr;
> + uint32_t ipriorityr[8];
> + uint32_t isactiver;
> + uint32_t isenabler;
> + } rdist;
> +
> + struct cpu_ctx {
> + uint32_t ctlr;
> + uint32_t pmr;
> + uint32_t bpr;
> + uint32_t sre_el2;
> + uint32_t grpen;
> + } cpu;
> +};
> +
> +static struct gicv3_ctx gicv3_ctx;
> +
> +static void __init gicv3_alloc_context(void)
> +{
> + uint32_t blocks = DIV_ROUND_UP(gicv3_info.nr_lines, 32);
> +
> + /* We don't have ITS support for suspend */
> + if ( gicv3_its_host_has_its() )
> + return;
> +
> + /* The spec allows for systems without any SPIs */
> + if ( blocks > 1 )
> + {
> + gicv3_ctx.dist.irqs = xzalloc_array(typeof(*gicv3_ctx.dist.irqs),
> + blocks - 1);
> + if ( !gicv3_ctx.dist.irqs )
> + printk(XENLOG_ERR "Failed to allocate memory for GICv3 suspend context\n");
> + }
> +}
> +
> +static void gicv3_disable_redist(void)
> +{
> + void __iomem* waker = GICD_RDIST_BASE + GICR_WAKER;
> +
> + /*
> + * Avoid infinite loop if Non-secure does not have access to GICR_WAKER.
> + * See Arm IHI 0069H.b, 12.11.42 GICR_WAKER:
> + * When GICD_CTLR.DS == 0 and an access is Non-secure accesses to this
> + * register are RAZ/WI.
> + */
> + if ( !(readl_relaxed(GICD + GICD_CTLR) & GICD_CTLR_DS) )
> + return;
> +
> + writel_relaxed(readl_relaxed(waker) | GICR_WAKER_ProcessorSleep, waker);
> + while ( (readl_relaxed(waker) & GICR_WAKER_ChildrenAsleep) == 0 );
> +}
> +
> +static int gicv3_suspend(void)
> +{
> + unsigned int i;
> + void __iomem *base;
> + typeof(gicv3_ctx.rdist)* rdist = &gicv3_ctx.rdist;
> +
> + /* TODO: implement support for ITS */
> + if ( gicv3_its_host_has_its() )
> + {
> + printk(XENLOG_ERR "GICv3: ITS suspend support is not implemented\n");
> + return -ENOSYS;
> + }
> +
> + if ( !gicv3_ctx.dist.irqs && gicv3_info.nr_lines > NR_GIC_LOCAL_IRQS )
> + {
> + printk(XENLOG_ERR "GICv3: suspend context is not allocated!\n");
> + return -ENOMEM;
> + }
> +
> + /* Save GICC configuration */
> + gicv3_ctx.cpu.ctlr = READ_SYSREG(ICC_CTLR_EL1);
> + gicv3_ctx.cpu.pmr = READ_SYSREG(ICC_PMR_EL1);
> + gicv3_ctx.cpu.bpr = READ_SYSREG(ICC_BPR1_EL1);
> + gicv3_ctx.cpu.sre_el2 = READ_SYSREG(ICC_SRE_EL2);
> + gicv3_ctx.cpu.grpen = READ_SYSREG(ICC_IGRPEN1_EL1);
> +
> + gicv3_disable_interface();
> + gicv3_disable_redist();
> +
> + /* Save GICR configuration */
> + gicv3_redist_wait_for_rwp();
> +
> + base = GICD_RDIST_SGI_BASE;
> +
> + rdist->ctlr = readl_relaxed(base + GICR_CTLR);
> +
> + /* Save priority on PPI and SGI interrupts */
> + for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
> + rdist->ipriorityr[i] = readl_relaxed(base + GICR_IPRIORITYR0 + 4 * i);
> +
> + rdist->isactiver = readl_relaxed(base + GICR_ISACTIVER0);
> + rdist->isenabler = readl_relaxed(base + GICR_ISENABLER0);
> + rdist->igroupr = readl_relaxed(base + GICR_IGROUPR0);
> + rdist->icfgr[0] = readl_relaxed(base + GICR_ICFGR0);
GICR_ICFGR0 is for SGIs, which are always edge-triggered, so I am not
sure that we need to save it here ...
> + rdist->icfgr[1] = readl_relaxed(base + GICR_ICFGR1);
> +
> + /* Save GICD configuration */
> + gicv3_dist_wait_for_rwp();
> + gicv3_ctx.dist.ctlr = readl_relaxed(GICD + GICD_CTLR);
> +
> + for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
> + {
> + typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
> + unsigned int irq;
> +
> + base = GICD + GICD_ICFGR + 8 * i;
> + irqs->icfgr[0] = readl_relaxed(base);
> + irqs->icfgr[1] = readl_relaxed(base + 4);
> +
> + base = GICD + GICD_IPRIORITYR + 32 * i;
> + for ( irq = 0; irq < 8; irq++ )
> + irqs->ipriorityr[irq] = readl_relaxed(base + 4 * irq);
> +
> + base = GICD + GICD_IROUTER + 32 * i;
> + for ( irq = 0; irq < 32; irq++ )
> + irqs->irouter[irq] = readq_relaxed_non_atomic(base + 8 * irq);
> +
> + irqs->isactiver = readl_relaxed(GICD + GICD_ISACTIVER + 4 * i);
> + irqs->isenabler = readl_relaxed(GICD + GICD_ISENABLER + 4 * i);
> + }
> +
> + return 0;
> +}
> +
> +static void gicv3_resume(void)
> +{
> + unsigned int i;
> + void __iomem *base;
> + typeof(gicv3_ctx.rdist)* rdist = &gicv3_ctx.rdist;
> +
> + writel_relaxed(0, GICD + GICD_CTLR);
> +
> + for ( i = NR_GIC_LOCAL_IRQS; i < gicv3_info.nr_lines; i += 32 )
> + writel_relaxed(GENMASK(31, 0), GICD + GICD_IGROUPR + (i / 32) * 4);
> +
> + for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
> + {
> + typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
> + unsigned int irq;
> +
> + base = GICD + GICD_ICFGR + 8 * i;
> + writel_relaxed(irqs->icfgr[0], base);
> + writel_relaxed(irqs->icfgr[1], base + 4);
> +
> + base = GICD + GICD_IPRIORITYR + 32 * i;
> + for ( irq = 0; irq < 8; irq++ )
> + writel_relaxed(irqs->ipriorityr[irq], base + 4 * irq);
> +
> + base = GICD + GICD_IROUTER + 32 * i;
> + for ( irq = 0; irq < 32; irq++ )
> + writeq_relaxed_non_atomic(irqs->irouter[irq], base + 8 * irq);
> +
> + writel_relaxed(GENMASK(31, 0), GICD + GICD_ICENABLER + i * 4);
> + writel_relaxed(irqs->isenabler, GICD + GICD_ISENABLER + i * 4);
> +
> + writel_relaxed(GENMASK(31, 0), GICD + GICD_ICACTIVER + i * 4);
> + writel_relaxed(irqs->isactiver, GICD + GICD_ISACTIVER + i * 4);
> + }
> +
> + writel_relaxed(gicv3_ctx.dist.ctlr, GICD + GICD_CTLR);
> + gicv3_dist_wait_for_rwp();
> +
> + /* Restore GICR (Redistributor) configuration */
> + gicv3_enable_redist();
> +
> + base = GICD_RDIST_SGI_BASE;
> +
> + writel_relaxed(0xffffffff, base + GICR_ICENABLER0);
> + gicv3_redist_wait_for_rwp();
> +
> + for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
> + writel_relaxed(rdist->ipriorityr[i], base + GICR_IPRIORITYR0 + i * 4);
> +
> + writel_relaxed(rdist->isactiver, base + GICR_ISACTIVER0);
> +
> + writel_relaxed(rdist->igroupr, base + GICR_IGROUPR0);
> + writel_relaxed(rdist->icfgr[0], base + GICR_ICFGR0);
... and restore it here.
> + writel_relaxed(rdist->icfgr[1], base + GICR_ICFGR1);
> +
[snip]
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
2025-09-01 22:10 ` [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during " Mykola Kvach
@ 2025-09-02 16:49 ` Oleksandr Tyshchenko
2025-09-02 17:43 ` Mykola Kvach
2025-09-02 22:21 ` Mykola Kvach
0 siblings, 2 replies; 37+ messages in thread
From: Oleksandr Tyshchenko @ 2025-09-02 16:49 UTC (permalink / raw)
To: Mykola Kvach, xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
On 02.09.25 01:10, Mykola Kvach wrote:
Hello Mykola
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> On ARM, the first 32 interrupts (SGIs and PPIs) are banked per-CPU
> and not restored by gic_resume (for secondary cpus).
>
> This patch introduces restore_local_irqs_on_resume, a function that
> restores the state of local interrupts on the target CPU during
> system resume.
>
> It iterates over all local IRQs and re-enables those that were not
> disabled, reprogramming their routing and affinity accordingly.
>
> The function is invoked from start_secondary, ensuring that local IRQ
> state is restored early during CPU bring-up after suspend.
>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in V6:
> - Call handler->disable() instead of just setting the _IRQ_DISABLED flag
> - Move the system state check outside of restore_local_irqs_on_resume()
> ---
> xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 39 insertions(+)
>
> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> index 6c899347ca..ddd2940554 100644
> --- a/xen/arch/arm/irq.c
> +++ b/xen/arch/arm/irq.c
> @@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
> return 0;
> }
>
> +/*
> + * The first 32 interrupts (PPIs and SGIs) are per-CPU,
> + * so call this function on the target CPU to restore them.
> + *
> + * SPIs are restored via gic_resume.
> + */
> +static void restore_local_irqs_on_resume(void)
> +{
> + int irq;
NIT: Please, use "unsigned int" if irq cannot be negative
> +
> + spin_lock(&local_irqs_type_lock);
> +
> + for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
> + {
> + struct irq_desc *desc = irq_to_desc(irq);
> +
> + spin_lock(&desc->lock);
> +
> + if ( test_bit(_IRQ_DISABLED, &desc->status) )
> + {
> + spin_unlock(&desc->lock);
> + continue;
> + }
> +
> + /* Disable the IRQ to avoid assertions in the following calls */
> + desc->handler->disable(desc);
> + gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
Shouldn't we use GIC_PRI_IPI for SGIs?
> + desc->handler->startup(desc);
> +
> + spin_unlock(&desc->lock);
> + }
> +
> + spin_unlock(&local_irqs_type_lock);
> +}
> +
> static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> void *hcpu)
> {
> @@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
> cpu);
> break;
> + case CPU_STARTING:
> + if ( system_state == SYS_STATE_resume )
> + restore_local_irqs_on_resume();
> + break;
May I please ask, why all this new code (i.e.
restore_local_irqs_on_resume()) is not covered by #ifdef
CONFIG_SYSTEM_SUSPEND?
> }
>
> return notifier_from_errno(rc);
* Re: [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume
2025-09-01 22:10 ` [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume Mykola Kvach
@ 2025-09-02 17:25 ` Oleksandr Tyshchenko
2025-09-02 17:46 ` Mykola Kvach
2025-09-02 20:51 ` Volodymyr Babchuk
1 sibling, 1 reply; 37+ messages in thread
From: Oleksandr Tyshchenko @ 2025-09-02 17:25 UTC (permalink / raw)
To: Mykola Kvach, xen-devel
Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Rahul Singh,
Mykola Kvach
On 02.09.25 01:10, Mykola Kvach wrote:
Hello Mykola
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> This is done using generic iommu_suspend/resume functions that cause
> IOMMU driver specific suspend/resume handlers to be called for enabled
> IOMMU (if one has suspend/resume driver handlers implemented).
>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in V6:
> - Drop iommu_enabled check from host system suspend.
I do not have any comments for the updated version, thanks.
> ---
> xen/arch/arm/suspend.c | 11 +++++++++++
> xen/drivers/passthrough/arm/smmu-v3.c | 10 ++++++++++
> xen/drivers/passthrough/arm/smmu.c | 10 ++++++++++
> 3 files changed, 31 insertions(+)
>
> diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
> index 35b20581f1..f3a3b831c5 100644
> --- a/xen/arch/arm/suspend.c
> +++ b/xen/arch/arm/suspend.c
> @@ -5,6 +5,7 @@
>
> #include <xen/console.h>
> #include <xen/cpu.h>
> +#include <xen/iommu.h>
> #include <xen/llc-coloring.h>
> #include <xen/sched.h>
> #include <xen/tasklet.h>
> @@ -62,6 +63,13 @@ static void cf_check system_suspend(void *data)
>
> time_suspend();
>
> + status = iommu_suspend();
> + if ( status )
> + {
> + system_state = SYS_STATE_resume;
> + goto resume_time;
> + }
> +
> console_start_sync();
> status = console_suspend();
> if ( status )
> @@ -118,6 +126,9 @@ static void cf_check system_suspend(void *data)
> console_resume();
> console_end_sync();
>
> + iommu_resume();
> +
> + resume_time:
> time_resume();
>
> resume_nonboot_cpus:
> diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
> index 81071f4018..f887faf7dc 100644
> --- a/xen/drivers/passthrough/arm/smmu-v3.c
> +++ b/xen/drivers/passthrough/arm/smmu-v3.c
> @@ -2854,6 +2854,13 @@ static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
> xfree(xen_domain);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +static int arm_smmu_suspend(void)
> +{
> + return -ENOSYS;
> +}
> +#endif
> +
> static const struct iommu_ops arm_smmu_iommu_ops = {
> .page_sizes = PAGE_SIZE_4K,
> .init = arm_smmu_iommu_xen_domain_init,
> @@ -2866,6 +2873,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
> .unmap_page = arm_iommu_unmap_page,
> .dt_xlate = arm_smmu_dt_xlate,
> .add_device = arm_smmu_add_device,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = arm_smmu_suspend,
> +#endif
> };
>
> static __init int arm_smmu_dt_init(struct dt_device_node *dev,
> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> index 22d306d0cb..45f29ef8ec 100644
> --- a/xen/drivers/passthrough/arm/smmu.c
> +++ b/xen/drivers/passthrough/arm/smmu.c
> @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
> xfree(xen_domain);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +static int arm_smmu_suspend(void)
> +{
> + return -ENOSYS;
> +}
> +#endif
> +
> static const struct iommu_ops arm_smmu_iommu_ops = {
> .page_sizes = PAGE_SIZE_4K,
> .init = arm_smmu_iommu_domain_init,
> @@ -2960,6 +2967,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
> .map_page = arm_iommu_map_page,
> .unmap_page = arm_iommu_unmap_page,
> .dt_xlate = arm_smmu_dt_xlate_generic,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = arm_smmu_suspend,
> +#endif
> };
>
> static struct arm_smmu_device *find_smmu(const struct device *dev)
* Re: [PATCH v6 03/13] xen/arm: gic-v3: Implement GICv3 suspend/resume functions
2025-09-02 16:08 ` Oleksandr Tyshchenko
@ 2025-09-02 17:30 ` Mykola Kvach
0 siblings, 0 replies; 37+ messages in thread
From: Mykola Kvach @ 2025-09-02 17:30 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Oleksandr,
On Tue, Sep 2, 2025 at 7:08 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
>
> On 02.09.25 01:10, Mykola Kvach wrote:
>
> Hello Mykola
>
>
> > From: Mykola Kvach <mykola_kvach@epam.com>
> >
> > System suspend may lead to a state where GIC would be powered down.
> > Therefore, Xen should save/restore the context of GIC on suspend/resume.
> >
> > Note that the context consists of states of registers which are
> > controlled by the hypervisor. Other GIC registers which are accessible
> > by guests are saved/restored on context switch.
> >
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V6:
> > - Drop gicv3_save/restore_state since it is already handled during vCPU
> > context switch.
> > - The comment about systems without SPIs is clarified for readability.
> > - Error and warning messages related to suspend context allocation are unified
> > and now use printk() with XENLOG_ERR for consistency.
> > - The check for suspend context allocation in gicv3_resume() is removed,
> > as it is handled earlier in the suspend path.
> > - The loop for saving and restoring PPI/SGI priorities is corrected to use
> > the proper increment.
> > - The gicv3_suspend() function now prints an explicit error if ITS suspend
> > support is not implemented, and returns ENOSYS in this case.
> > - The GICD_CTLR_DS bit definition is added to gic_v3_defs.h.
> > - The comment for GICR_WAKER access is expanded to reference the relevant
> > ARM specification section and clarify the RAZ/WI behavior for Non-secure
> > accesses.
> > - Cleanup active and enable registers before restoring.
> > ---
> > xen/arch/arm/gic-v3-lpi.c | 3 +
> > xen/arch/arm/gic-v3.c | 235 +++++++++++++++++++++++++
> > xen/arch/arm/include/asm/gic_v3_defs.h | 1 +
> > 3 files changed, 239 insertions(+)
> >
> > diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
> > index de5052e5cf..61a6e18303 100644
> > --- a/xen/arch/arm/gic-v3-lpi.c
> > +++ b/xen/arch/arm/gic-v3-lpi.c
> > @@ -391,6 +391,9 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > switch ( action )
> > {
> > case CPU_UP_PREPARE:
> > + if ( system_state == SYS_STATE_resume )
> > + break;
> > +
> > rc = gicv3_lpi_allocate_pendtable(cpu);
> > if ( rc )
> > printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%lu\n",
> > diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> > index cd3e1acf79..9f1be7e905 100644
> > --- a/xen/arch/arm/gic-v3.c
> > +++ b/xen/arch/arm/gic-v3.c
> > @@ -1776,6 +1776,233 @@ static bool gic_dist_supports_lpis(void)
> > return (readl_relaxed(GICD + GICD_TYPER) & GICD_TYPE_LPIS);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +/* GICv3 registers to be saved/restored on system suspend/resume */
> > +struct gicv3_ctx {
> > + struct dist_ctx {
> > + uint32_t ctlr;
> > + /*
> > + * This struct represent block of 32 IRQs
> > + * TODO: store extended SPI configuration (GICv3.1+)
> > + */
> > + struct irq_regs {
> > + uint32_t icfgr[2];
> > + uint32_t ipriorityr[8];
> > + uint64_t irouter[32];
> > + uint32_t isactiver;
> > + uint32_t isenabler;
> > + } *irqs;
> > + } dist;
> > +
> > + /* have only one rdist structure for last running CPU during suspend */
> > + struct redist_ctx {
> > + uint32_t ctlr;
> > + /* TODO: handle case when we have more than 16 PPIs (GICv3.1+) */
> > + uint32_t icfgr[2];
> > + uint32_t igroupr;
> > + uint32_t ipriorityr[8];
> > + uint32_t isactiver;
> > + uint32_t isenabler;
> > + } rdist;
> > +
> > + struct cpu_ctx {
> > + uint32_t ctlr;
> > + uint32_t pmr;
> > + uint32_t bpr;
> > + uint32_t sre_el2;
> > + uint32_t grpen;
> > + } cpu;
> > +};
> > +
> > +static struct gicv3_ctx gicv3_ctx;
> > +
> > +static void __init gicv3_alloc_context(void)
> > +{
> > + uint32_t blocks = DIV_ROUND_UP(gicv3_info.nr_lines, 32);
> > +
> > + /* We don't have ITS support for suspend */
> > + if ( gicv3_its_host_has_its() )
> > + return;
> > +
> > + /* The spec allows for systems without any SPIs */
> > + if ( blocks > 1 )
> > + {
> > + gicv3_ctx.dist.irqs = xzalloc_array(typeof(*gicv3_ctx.dist.irqs),
> > + blocks - 1);
> > + if ( !gicv3_ctx.dist.irqs )
> > + printk(XENLOG_ERR "Failed to allocate memory for GICv3 suspend context\n");
> > + }
> > +}
> > +
> > +static void gicv3_disable_redist(void)
> > +{
> > + void __iomem* waker = GICD_RDIST_BASE + GICR_WAKER;
> > +
> > + /*
> > + * Avoid infinite loop if Non-secure does not have access to GICR_WAKER.
> > + * See Arm IHI 0069H.b, 12.11.42 GICR_WAKER:
> > + * When GICD_CTLR.DS == 0 and an access is Non-secure accesses to this
> > + * register are RAZ/WI.
> > + */
> > + if ( !(readl_relaxed(GICD + GICD_CTLR) & GICD_CTLR_DS) )
> > + return;
> > +
> > + writel_relaxed(readl_relaxed(waker) | GICR_WAKER_ProcessorSleep, waker);
> > + while ( (readl_relaxed(waker) & GICR_WAKER_ChildrenAsleep) == 0 );
> > +}
> > +
> > +static int gicv3_suspend(void)
> > +{
> > + unsigned int i;
> > + void __iomem *base;
> > + typeof(gicv3_ctx.rdist)* rdist = &gicv3_ctx.rdist;
> > +
> > + /* TODO: implement support for ITS */
> > + if ( gicv3_its_host_has_its() )
> > + {
> > + printk(XENLOG_ERR "GICv3: ITS suspend support is not implemented\n");
> > + return -ENOSYS;
> > + }
> > +
> > + if ( !gicv3_ctx.dist.irqs && gicv3_info.nr_lines > NR_GIC_LOCAL_IRQS )
> > + {
> > + printk(XENLOG_ERR "GICv3: suspend context is not allocated!\n");
> > + return -ENOMEM;
> > + }
> > +
> > + /* Save GICC configuration */
> > + gicv3_ctx.cpu.ctlr = READ_SYSREG(ICC_CTLR_EL1);
> > + gicv3_ctx.cpu.pmr = READ_SYSREG(ICC_PMR_EL1);
> > + gicv3_ctx.cpu.bpr = READ_SYSREG(ICC_BPR1_EL1);
> > + gicv3_ctx.cpu.sre_el2 = READ_SYSREG(ICC_SRE_EL2);
> > + gicv3_ctx.cpu.grpen = READ_SYSREG(ICC_IGRPEN1_EL1);
> > +
> > + gicv3_disable_interface();
> > + gicv3_disable_redist();
> > +
> > + /* Save GICR configuration */
> > + gicv3_redist_wait_for_rwp();
> > +
> > + base = GICD_RDIST_SGI_BASE;
> > +
> > + rdist->ctlr = readl_relaxed(base + GICR_CTLR);
> > +
> > + /* Save priority on PPI and SGI interrupts */
> > + for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
> > + rdist->ipriorityr[i] = readl_relaxed(base + GICR_IPRIORITYR0 + 4 * i);
> > +
> > + rdist->isactiver = readl_relaxed(base + GICR_ISACTIVER0);
> > + rdist->isenabler = readl_relaxed(base + GICR_ISENABLER0);
> > + rdist->igroupr = readl_relaxed(base + GICR_IGROUPR0);
> > + rdist->icfgr[0] = readl_relaxed(base + GICR_ICFGR0);
>
> GICR_ICFGR0 is for SGIs, which are always edge-triggered, so I am not
> sure that we need to save it here ...
Looks like I didn’t read the spec carefully and only paid attention to:
12.11.7 GICR_ICFGR0, Interrupt Configuration Register 0
Determines whether the corresponding SGI is edge-triggered or level-sensitive.
But a few lines below it states:
Int_config<x>, bits [2x+1:2x], for x = 15 to 0
Indicates whether the interrupt is level-sensitive or edge-triggered.
0b00 Corresponding interrupt is level-sensitive.
0b10 Corresponding interrupt is edge-triggered.
SGIs are always edge-triggered.
>
>
> > + rdist->icfgr[1] = readl_relaxed(base + GICR_ICFGR1);
> > +
> > + /* Save GICD configuration */
> > + gicv3_dist_wait_for_rwp();
> > + gicv3_ctx.dist.ctlr = readl_relaxed(GICD + GICD_CTLR);
> > +
> > + for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
> > + {
> > + typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
> > + unsigned int irq;
> > +
> > + base = GICD + GICD_ICFGR + 8 * i;
> > + irqs->icfgr[0] = readl_relaxed(base);
> > + irqs->icfgr[1] = readl_relaxed(base + 4);
> > +
> > + base = GICD + GICD_IPRIORITYR + 32 * i;
> > + for ( irq = 0; irq < 8; irq++ )
> > + irqs->ipriorityr[irq] = readl_relaxed(base + 4 * irq);
> > +
> > + base = GICD + GICD_IROUTER + 32 * i;
> > + for ( irq = 0; irq < 32; irq++ )
> > + irqs->irouter[irq] = readq_relaxed_non_atomic(base + 8 * irq);
> > +
> > + irqs->isactiver = readl_relaxed(GICD + GICD_ISACTIVER + 4 * i);
> > + irqs->isenabler = readl_relaxed(GICD + GICD_ISENABLER + 4 * i);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void gicv3_resume(void)
> > +{
> > + unsigned int i;
> > + void __iomem *base;
> > + typeof(gicv3_ctx.rdist)* rdist = &gicv3_ctx.rdist;
> > +
> > + writel_relaxed(0, GICD + GICD_CTLR);
> > +
> > + for ( i = NR_GIC_LOCAL_IRQS; i < gicv3_info.nr_lines; i += 32 )
> > + writel_relaxed(GENMASK(31, 0), GICD + GICD_IGROUPR + (i / 32) * 4);
> > +
> > + for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
> > + {
> > + typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
> > + unsigned int irq;
> > +
> > + base = GICD + GICD_ICFGR + 8 * i;
> > + writel_relaxed(irqs->icfgr[0], base);
> > + writel_relaxed(irqs->icfgr[1], base + 4);
> > +
> > + base = GICD + GICD_IPRIORITYR + 32 * i;
> > + for ( irq = 0; irq < 8; irq++ )
> > + writel_relaxed(irqs->ipriorityr[irq], base + 4 * irq);
> > +
> > + base = GICD + GICD_IROUTER + 32 * i;
> > + for ( irq = 0; irq < 32; irq++ )
> > + writeq_relaxed_non_atomic(irqs->irouter[irq], base + 8 * irq);
> > +
> > + writel_relaxed(GENMASK(31, 0), GICD + GICD_ICENABLER + i * 4);
> > + writel_relaxed(irqs->isenabler, GICD + GICD_ISENABLER + i * 4);
> > +
> > + writel_relaxed(GENMASK(31, 0), GICD + GICD_ICACTIVER + i * 4);
> > + writel_relaxed(irqs->isactiver, GICD + GICD_ISACTIVER + i * 4);
> > + }
> > +
> > + writel_relaxed(gicv3_ctx.dist.ctlr, GICD + GICD_CTLR);
> > + gicv3_dist_wait_for_rwp();
> > +
> > + /* Restore GICR (Redistributor) configuration */
> > + gicv3_enable_redist();
> > +
> > + base = GICD_RDIST_SGI_BASE;
> > +
> > + writel_relaxed(0xffffffff, base + GICR_ICENABLER0);
> > + gicv3_redist_wait_for_rwp();
> > +
> > + for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
> > + writel_relaxed(rdist->ipriorityr[i], base + GICR_IPRIORITYR0 + i * 4);
> > +
> > + writel_relaxed(rdist->isactiver, base + GICR_ISACTIVER0);
> > +
> > + writel_relaxed(rdist->igroupr, base + GICR_IGROUPR0);
> > + writel_relaxed(rdist->icfgr[0], base + GICR_ICFGR0);
>
> ... and restore it here.
Thank you for pointing that out.
I will remove it in the next version of the patch series.
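A minimal sketch of that change, for illustration only (the field name below is hypothetical, not from the posted patch): the redistributor context keeps just the PPI configuration register, since GICR_ICFGR0 covers SGIs and those are always edge-triggered.
```c
/* Hypothetical redist_ctx member: only PPI configuration is kept */
uint32_t icfgr1;

/* gicv3_suspend(): skip GICR_ICFGR0, SGIs are fixed to edge-triggered */
rdist->icfgr1 = readl_relaxed(base + GICR_ICFGR1);

/* gicv3_resume(): restore PPI configuration only */
writel_relaxed(rdist->icfgr1, base + GICR_ICFGR1);
```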
>
>
> > + writel_relaxed(rdist->icfgr[1], base + GICR_ICFGR1);
> > +
>
> [snip]
Best regards,
Mykola
* Re: [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
2025-09-02 16:49 ` Oleksandr Tyshchenko
@ 2025-09-02 17:43 ` Mykola Kvach
2025-09-02 18:16 ` Oleksandr Tyshchenko
2025-09-02 22:21 ` Mykola Kvach
1 sibling, 1 reply; 37+ messages in thread
From: Mykola Kvach @ 2025-09-02 17:43 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Oleksandr,
On Tue, Sep 2, 2025 at 7:49 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
>
> On 02.09.25 01:10, Mykola Kvach wrote:
>
> Hello Mykola
>
> > From: Mykola Kvach <mykola_kvach@epam.com>
> >
> > On ARM, the first 32 interrupts (SGIs and PPIs) are banked per-CPU
> > and not restored by gic_resume (for secondary cpus).
> >
> > This patch introduces restore_local_irqs_on_resume, a function that
> > restores the state of local interrupts on the target CPU during
> > system resume.
> >
> > It iterates over all local IRQs and re-enables those that were not
> > disabled, reprogramming their routing and affinity accordingly.
> >
> > The function is invoked from start_secondary, ensuring that local IRQ
> > state is restored early during CPU bring-up after suspend.
> >
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V6:
> > - Call handler->disable() instead of just setting the _IRQ_DISABLED flag
> > - Move the system state check outside of restore_local_irqs_on_resume()
> > ---
> > xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 39 insertions(+)
> >
> > diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> > index 6c899347ca..ddd2940554 100644
> > --- a/xen/arch/arm/irq.c
> > +++ b/xen/arch/arm/irq.c
> > @@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
> > return 0;
> > }
> >
> > +/*
> > + * The first 32 interrupts (PPIs and SGIs) are per-CPU,
> > + * so call this function on the target CPU to restore them.
> > + *
> > + * SPIs are restored via gic_resume.
> > + */
> > +static void restore_local_irqs_on_resume(void)
> > +{
> > + int irq;
>
> NIT: Please, use "unsigned int" if irq cannot be negative
ok
>
> > +
> > + spin_lock(&local_irqs_type_lock);
> > +
> > + for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
> > + {
> > + struct irq_desc *desc = irq_to_desc(irq);
> > +
> > + spin_lock(&desc->lock);
> > +
> > + if ( test_bit(_IRQ_DISABLED, &desc->status) )
> > + {
> > + spin_unlock(&desc->lock);
> > + continue;
> > + }
> > +
> > + /* Disable the IRQ to avoid assertions in the following calls */
> > + desc->handler->disable(desc);
> > + gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
>
> Shouldn't we use GIC_PRI_IPI for SGIs?
Yes, we should. But currently I am restoring the same value
as it was before suspend...
I definitely agree that this needs to be fixed at the original
place where the issue was introduced, but I was planning to
address it in a future patch.
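A minimal sketch of what such a follow-up could look like inside the loop of
restore_local_irqs_on_resume(), assuming the existing NR_GIC_SGI, GIC_PRI_IPI
and GIC_PRI_IRQ definitions (illustration only, not part of this series):
```c
/* Hypothetical follow-up: SGIs get the IPI priority, PPIs the IRQ one */
unsigned int priority = (irq < NR_GIC_SGI) ? GIC_PRI_IPI : GIC_PRI_IRQ;

/* Disable the IRQ to avoid assertions in the following calls */
desc->handler->disable(desc);
gic_route_irq_to_xen(desc, priority);
desc->handler->startup(desc);
```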
>
>
> > + desc->handler->startup(desc);
> > +
> > + spin_unlock(&desc->lock);
> > + }
> > +
> > + spin_unlock(&local_irqs_type_lock);
> > +}
> > +
> > static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > void *hcpu)
> > {
> > @@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
> > cpu);
> > break;
> > + case CPU_STARTING:
> > + if ( system_state == SYS_STATE_resume )
> > + restore_local_irqs_on_resume();
> > + break;
>
> May I please ask, why all this new code (i.e.
> restore_local_irqs_on_resume()) is not covered by #ifdef
> CONFIG_SYSTEM_SUSPEND?
I don’t see a reason to introduce such "macaron-style" code. On ARM, the
system suspend state is only set when CONFIG_SYSTEM_SUSPEND is defined
anyway.
If you would prefer me to wrap all relevant code with this define, please
let me know and I’ll make the change. In this case, I think the current
approach is cleaner, but I’m open to your opinion.
>
> > }
> >
> > return notifier_from_errno(rc);
>
Best regards,
Mykola
* Re: [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume
2025-09-02 17:25 ` Oleksandr Tyshchenko
@ 2025-09-02 17:46 ` Mykola Kvach
0 siblings, 0 replies; 37+ messages in thread
From: Mykola Kvach @ 2025-09-02 17:46 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Rahul Singh,
Mykola Kvach
Hi Oleksandr,
On Tue, Sep 2, 2025 at 8:25 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
>
> On 02.09.25 01:10, Mykola Kvach wrote:
>
> Hello Mykola
>
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >
> > This is done using generic iommu_suspend/resume functions that cause
> > IOMMU driver specific suspend/resume handlers to be called for enabled
> > IOMMU (if one has suspend/resume driver handlers implemented).
> >
> > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V6:
> > - Drop iommu_enabled check from host system suspend.
>
> I do not have any comments for the updated version, thanks.
Thank you for your time and the review!
>
>
> > ---
> > xen/arch/arm/suspend.c | 11 +++++++++++
> > xen/drivers/passthrough/arm/smmu-v3.c | 10 ++++++++++
> > xen/drivers/passthrough/arm/smmu.c | 10 ++++++++++
> > 3 files changed, 31 insertions(+)
> >
> > diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
> > index 35b20581f1..f3a3b831c5 100644
> > --- a/xen/arch/arm/suspend.c
> > +++ b/xen/arch/arm/suspend.c
> > @@ -5,6 +5,7 @@
> >
> > #include <xen/console.h>
> > #include <xen/cpu.h>
> > +#include <xen/iommu.h>
> > #include <xen/llc-coloring.h>
> > #include <xen/sched.h>
> > #include <xen/tasklet.h>
> > @@ -62,6 +63,13 @@ static void cf_check system_suspend(void *data)
> >
> > time_suspend();
> >
> > + status = iommu_suspend();
> > + if ( status )
> > + {
> > + system_state = SYS_STATE_resume;
> > + goto resume_time;
> > + }
> > +
> > console_start_sync();
> > status = console_suspend();
> > if ( status )
> > @@ -118,6 +126,9 @@ static void cf_check system_suspend(void *data)
> > console_resume();
> > console_end_sync();
> >
> > + iommu_resume();
> > +
> > + resume_time:
> > time_resume();
> >
> > resume_nonboot_cpus:
> > diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
> > index 81071f4018..f887faf7dc 100644
> > --- a/xen/drivers/passthrough/arm/smmu-v3.c
> > +++ b/xen/drivers/passthrough/arm/smmu-v3.c
> > @@ -2854,6 +2854,13 @@ static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
> > xfree(xen_domain);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +static int arm_smmu_suspend(void)
> > +{
> > + return -ENOSYS;
> > +}
> > +#endif
> > +
> > static const struct iommu_ops arm_smmu_iommu_ops = {
> > .page_sizes = PAGE_SIZE_4K,
> > .init = arm_smmu_iommu_xen_domain_init,
> > @@ -2866,6 +2873,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
> > .unmap_page = arm_iommu_unmap_page,
> > .dt_xlate = arm_smmu_dt_xlate,
> > .add_device = arm_smmu_add_device,
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + .suspend = arm_smmu_suspend,
> > +#endif
> > };
> >
> > static __init int arm_smmu_dt_init(struct dt_device_node *dev,
> > diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> > index 22d306d0cb..45f29ef8ec 100644
> > --- a/xen/drivers/passthrough/arm/smmu.c
> > +++ b/xen/drivers/passthrough/arm/smmu.c
> > @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
> > xfree(xen_domain);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +static int arm_smmu_suspend(void)
> > +{
> > + return -ENOSYS;
> > +}
> > +#endif
> > +
> > static const struct iommu_ops arm_smmu_iommu_ops = {
> > .page_sizes = PAGE_SIZE_4K,
> > .init = arm_smmu_iommu_domain_init,
> > @@ -2960,6 +2967,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
> > .map_page = arm_iommu_map_page,
> > .unmap_page = arm_iommu_unmap_page,
> > .dt_xlate = arm_smmu_dt_xlate_generic,
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + .suspend = arm_smmu_suspend,
> > +#endif
> > };
> >
> > static struct arm_smmu_device *find_smmu(const struct device *dev)
>
Best regards,
Mykola
* Re: [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
2025-09-02 17:43 ` Mykola Kvach
@ 2025-09-02 18:16 ` Oleksandr Tyshchenko
2025-09-02 20:08 ` Mykola Kvach
0 siblings, 1 reply; 37+ messages in thread
From: Oleksandr Tyshchenko @ 2025-09-02 18:16 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
On 02.09.25 20:43, Mykola Kvach wrote:
> Hi Oleksandr,
Hello Mykola
>
> On Tue, Sep 2, 2025 at 7:49 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>>
>>
>>
>> On 02.09.25 01:10, Mykola Kvach wrote:
>>
>> Hello Mykola
>>
>>> From: Mykola Kvach <mykola_kvach@epam.com>
>>>
>>> On ARM, the first 32 interrupts (SGIs and PPIs) are banked per-CPU
>>> and not restored by gic_resume (for secondary cpus).
>>>
>>> This patch introduces restore_local_irqs_on_resume, a function that
>>> restores the state of local interrupts on the target CPU during
>>> system resume.
>>>
>>> It iterates over all local IRQs and re-enables those that were not
>>> disabled, reprogramming their routing and affinity accordingly.
>>>
>>> The function is invoked from start_secondary, ensuring that local IRQ
>>> state is restored early during CPU bring-up after suspend.
>>>
>>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
>>> ---
>>> Changes in V6:
>>> - Call handler->disable() instead of just setting the _IRQ_DISABLED flag
>>> - Move the system state check outside of restore_local_irqs_on_resume()
>>> ---
>>> xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 39 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
>>> index 6c899347ca..ddd2940554 100644
>>> --- a/xen/arch/arm/irq.c
>>> +++ b/xen/arch/arm/irq.c
>>> @@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
>>> return 0;
>>> }
>>>
>>> +/*
>>> + * The first 32 interrupts (PPIs and SGIs) are per-CPU,
>>> + * so call this function on the target CPU to restore them.
>>> + *
>>> + * SPIs are restored via gic_resume.
>>> + */
>>> +static void restore_local_irqs_on_resume(void)
>>> +{
>>> + int irq;
>>
>> NIT: Please, use "unsigned int" if irq cannot be negative
>
> ok
>
>>
>>> +
>>> + spin_lock(&local_irqs_type_lock);
>>> +
>>> + for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
>>> + {
>>> + struct irq_desc *desc = irq_to_desc(irq);
>>> +
>>> + spin_lock(&desc->lock);
>>> +
>>> + if ( test_bit(_IRQ_DISABLED, &desc->status) )
>>> + {
>>> + spin_unlock(&desc->lock);
>>> + continue;
>>> + }
>>> +
>>> + /* Disable the IRQ to avoid assertions in the following calls */
>>> + desc->handler->disable(desc);
>>> + gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
>>
>> Shouldn't we use GIC_PRI_IPI for SGIs?
>
> Yes, we should. But currently I am restoring the same value
> as it was before suspend...
>
> I definitely agree that this needs to be fixed at the original
> place where the issue was introduced, but I was planning to
> address it in a future patch.
>
>>
>>
>>> + desc->handler->startup(desc);
>>> +
>>> + spin_unlock(&desc->lock);
>>> + }
>>> +
>>> + spin_unlock(&local_irqs_type_lock);
>>> +}
>>> +
>>> static int cpu_callback(struct notifier_block *nfb, unsigned long action,
>>> void *hcpu)
>>> {
>>> @@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
>>> printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
>>> cpu);
>>> break;
>>> + case CPU_STARTING:
>>> + if ( system_state == SYS_STATE_resume )
>>> + restore_local_irqs_on_resume();
>>> + break;
>>
>> May I please ask, why all this new code (i.e.
>> restore_local_irqs_on_resume()) is not covered by #ifdef
>> CONFIG_SYSTEM_SUSPEND?
>
> I don’t see a reason to introduce such "macaron-style" code. On ARM, the
> system suspend state is only set when CONFIG_SYSTEM_SUSPEND is defined
> anyway.
right
>
> If you would prefer me to wrap all relevant code with this define, please
> let me know and I’ll make the change. In this case, I think the current
> approach is cleaner, but I’m open to your opinion.
In other patches, you seem to wrap functions/code that only get called
during suspend/resume with #ifdef CONFIG_SYSTEM_SUSPEND, so I wondered
why restore_local_irqs_on_resume() could not be compiled out
if the feature is not enabled. But if you still think it would be
cleaner this way (w/o #ifdef), I would be ok.
>
>>
>>> }
>>>
>>> return notifier_from_errno(rc);
>>
>
> Best regards,
> Mykola
* Re: [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
2025-09-02 18:16 ` Oleksandr Tyshchenko
@ 2025-09-02 20:08 ` Mykola Kvach
2025-09-02 20:19 ` Mykola Kvach
0 siblings, 1 reply; 37+ messages in thread
From: Mykola Kvach @ 2025-09-02 20:08 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
On Tue, Sep 2, 2025 at 9:16 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
>
> On 02.09.25 20:43, Mykola Kvach wrote:
> > Hi Oleksandr,
>
> Hello Mykola
>
> >
> > On Tue, Sep 2, 2025 at 7:49 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> >>
> >>
> >>
> >> On 02.09.25 01:10, Mykola Kvach wrote:
> >>
> >> Hello Mykola
> >>
> >>> From: Mykola Kvach <mykola_kvach@epam.com>
> >>>
> >>> On ARM, the first 32 interrupts (SGIs and PPIs) are banked per-CPU
> >>> and not restored by gic_resume (for secondary cpus).
> >>>
> >>> This patch introduces restore_local_irqs_on_resume, a function that
> >>> restores the state of local interrupts on the target CPU during
> >>> system resume.
> >>>
> >>> It iterates over all local IRQs and re-enables those that were not
> >>> disabled, reprogramming their routing and affinity accordingly.
> >>>
> >>> The function is invoked from start_secondary, ensuring that local IRQ
> >>> state is restored early during CPU bring-up after suspend.
> >>>
> >>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> >>> ---
> >>> Changes in V6:
> >>> - Call handler->disable() instead of just setting the _IRQ_DISABLED flag
> >>> - Move the system state check outside of restore_local_irqs_on_resume()
> >>> ---
> >>> xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
> >>> 1 file changed, 39 insertions(+)
> >>>
> >>> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> >>> index 6c899347ca..ddd2940554 100644
> >>> --- a/xen/arch/arm/irq.c
> >>> +++ b/xen/arch/arm/irq.c
> >>> @@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
> >>> return 0;
> >>> }
> >>>
> >>> +/*
> >>> + * The first 32 interrupts (PPIs and SGIs) are per-CPU,
> >>> + * so call this function on the target CPU to restore them.
> >>> + *
> >>> + * SPIs are restored via gic_resume.
> >>> + */
> >>> +static void restore_local_irqs_on_resume(void)
> >>> +{
> >>> + int irq;
> >>
> >> NIT: Please, use "unsigned int" if irq cannot be negative
> >
> > ok
> >
> >>
> >>> +
> >>> + spin_lock(&local_irqs_type_lock);
> >>> +
> >>> + for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
> >>> + {
> >>> + struct irq_desc *desc = irq_to_desc(irq);
> >>> +
> >>> + spin_lock(&desc->lock);
> >>> +
> >>> + if ( test_bit(_IRQ_DISABLED, &desc->status) )
> >>> + {
> >>> + spin_unlock(&desc->lock);
> >>> + continue;
> >>> + }
> >>> +
> >>> + /* Disable the IRQ to avoid assertions in the following calls */
> >>> + desc->handler->disable(desc);
> >>> + gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
> >>
> >> Shouldn't we use GIC_PRI_IPI for SGIs?
> >
> > Yes, we should. But currently I am restoring the same value
> > as it was before suspend...
> >
> > I definitely agree that this needs to be fixed at the original
> > place where the issue was introduced, but I was planning to
> > address it in a future patch.
> >
> >>
> >>
> >>> + desc->handler->startup(desc);
> >>> +
> >>> + spin_unlock(&desc->lock);
> >>> + }
> >>> +
> >>> + spin_unlock(&local_irqs_type_lock);
> >>> +}
> >>> +
> >>> static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> >>> void *hcpu)
> >>> {
> >>> @@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> >>> printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
> >>> cpu);
> >>> break;
> >>> + case CPU_STARTING:
> >>> + if ( system_state == SYS_STATE_resume )
> >>> + restore_local_irqs_on_resume();
> >>> + break;
> >>
> >> May I please ask, why all this new code (i.e.
> >> restore_local_irqs_on_resume()) is not covered by #ifdef
> >> CONFIG_SYSTEM_SUSPEND?
> >
> > I don’t see a reason to introduce such "macaron-style" code. On ARM, the
> > system suspend state is only set when CONFIG_SYSTEM_SUSPEND is defined
> > anyway.
>
> right
>
> >
> > If you would prefer me to wrap all relevant code with this define, please
> > let me know and I’ll make the change. In this case, I think the current
> > approach is cleaner, but I’m open to your opinion.
>
> In other patches, you seem to wrap functions/code that only get called
> during suspend/resume with #ifdef CONFIG_SYSTEM_SUSPEND, so I wondered
> why restore_local_irqs_on_resume() could not be compiled out
> if the feature is not enabled. But if you still think it would be
> cleaner this way (w/o #ifdef), I would be ok.
It’s not entirely true -- I only wrapped code that has a direct dependency
on host_system_suspend(), either being called from it or required for its
correct operation.
If you look through this patch series for the pattern:
SYS_STATE_(suspend|resume)
you’ll see that not all suspend/resume-related code is wrapped in
#ifdef CONFIG_SYSTEM_SUSPEND. This is intentional -- the same applies to
some code already merged into the common parts of Xen.
So restore_local_irqs_on_resume is consistent with the existing approach
in all cpu notifier blocks.
>
> >
> >>
> >>> }
> >>>
> >>> return notifier_from_errno(rc);
> >>
> >
> > Best regards,
> > Mykola
>
* Re: [PATCH v6 01/13] xen/arm: Add suspend and resume timer helpers
2025-09-01 22:10 ` [PATCH v6 01/13] xen/arm: Add suspend and resume timer helpers Mykola Kvach
@ 2025-09-02 20:14 ` Volodymyr Babchuk
0 siblings, 0 replies; 37+ messages in thread
From: Volodymyr Babchuk @ 2025-09-02 20:14 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mirela Simonovic,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Saeed Nowshadi, Mykola Kvach
Hi,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> Timer interrupts must be disabled while the system is suspended to prevent
> spurious wake-ups. Suspending the timers involves disabling both the EL1
> physical timer and the EL2 hypervisor timer. Resuming consists of raising
> the TIMER_SOFTIRQ, which prompts the generic timer code to reprogram the
> EL2 timer as needed. Re-enabling of the EL1 timer is left to the entity
> that uses it.
>
> Introduce a new helper, disable_physical_timers, to encapsulate disabling
> of the physical timers.
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
> Changes in V4:
> - Rephrased comment and commit message for better clarity
> - Created separate function for disabling physical timers
>
> Changes in V3:
> - time_suspend and time_resume are now conditionally compiled
> under CONFIG_SYSTEM_SUSPEND
> ---
> xen/arch/arm/include/asm/time.h | 5 +++++
> xen/arch/arm/time.c | 38 +++++++++++++++++++++++++++------
> 2 files changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/xen/arch/arm/include/asm/time.h b/xen/arch/arm/include/asm/time.h
> index 49ad8c1a6d..f4fd0c6af5 100644
> --- a/xen/arch/arm/include/asm/time.h
> +++ b/xen/arch/arm/include/asm/time.h
> @@ -108,6 +108,11 @@ void preinit_xen_time(void);
>
> void force_update_vcpu_system_time(struct vcpu *v);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +void time_suspend(void);
> +void time_resume(void);
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> #endif /* __ARM_TIME_H__ */
> /*
> * Local variables:
> diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> index e74d30d258..ad984fdfdd 100644
> --- a/xen/arch/arm/time.c
> +++ b/xen/arch/arm/time.c
> @@ -303,6 +303,14 @@ static void check_timer_irq_cfg(unsigned int irq, const char *which)
> "WARNING: %s-timer IRQ%u is not level triggered.\n", which, irq);
> }
>
> +/* Disable physical timers for EL1 and EL2 on the current CPU */
> +static inline void disable_physical_timers(void)
> +{
> + WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
> + WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
> + isb();
> +}
> +
> /* Set up the timer interrupt on this CPU */
> void init_timer_interrupt(void)
> {
> @@ -310,9 +318,7 @@ void init_timer_interrupt(void)
> WRITE_SYSREG64(0, CNTVOFF_EL2); /* No VM-specific offset */
> /* Do not let the VMs program the physical timer, only read the physical counter */
> WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
> - WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
> - WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
> - isb();
> + disable_physical_timers();
>
> request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
> "hyptimer", NULL);
> @@ -330,9 +336,7 @@ void init_timer_interrupt(void)
> */
> static void deinit_timer_interrupt(void)
> {
> - WRITE_SYSREG(0, CNTP_CTL_EL0); /* Disable physical timer */
> - WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Disable hypervisor's timer */
> - isb();
> + disable_physical_timers();
>
> release_irq(timer_irq[TIMER_HYP_PPI], NULL);
> release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
> @@ -372,6 +376,28 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
> /* XXX update guest visible wallclock time */
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +void time_suspend(void)
> +{
> + disable_physical_timers();
> +}
> +
> +void time_resume(void)
> +{
> + /*
> + * Raising the timer softirq triggers generic code to call reprogram_timer
> + * with the correct timeout (not known here).
> + *
> + * No further action is needed to restore timekeeping after power down,
> + * since the system counter is unaffected. See ARM DDI 0487 L.a, D12.1.2
> + * "The system counter must be implemented in an always-on power domain."
> + */
> + raise_softirq(TIMER_SOFTIRQ);
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> static int cpu_time_callback(struct notifier_block *nfb,
> unsigned long action,
> void *hcpu)
--
WBR, Volodymyr
* Re: [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
2025-09-02 20:08 ` Mykola Kvach
@ 2025-09-02 20:19 ` Mykola Kvach
0 siblings, 0 replies; 37+ messages in thread
From: Mykola Kvach @ 2025-09-02 20:19 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
On Tue, Sep 2, 2025 at 11:08 PM Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> On Tue, Sep 2, 2025 at 9:16 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> >
> >
> >
> > On 02.09.25 20:43, Mykola Kvach wrote:
> > > Hi Oleksandr,
> >
> > Hello Mykola
> >
> > >
> > > On Tue, Sep 2, 2025 at 7:49 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> > >>
> > >>
> > >>
> > >> On 02.09.25 01:10, Mykola Kvach wrote:
> > >>
> > >> Hello Mykola
> > >>
> > >>> From: Mykola Kvach <mykola_kvach@epam.com>
> > >>>
> > >>> On ARM, the first 32 interrupts (SGIs and PPIs) are banked per-CPU
> > >>> and not restored by gic_resume (for secondary cpus).
> > >>>
> > >>> This patch introduces restore_local_irqs_on_resume, a function that
> > >>> restores the state of local interrupts on the target CPU during
> > >>> system resume.
> > >>>
> > >>> It iterates over all local IRQs and re-enables those that were not
> > >>> disabled, reprogramming their routing and affinity accordingly.
> > >>>
> > >>> The function is invoked from start_secondary, ensuring that local IRQ
> > >>> state is restored early during CPU bring-up after suspend.
> > >>>
> > >>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > >>> ---
> > >>> Changes in V6:
> > >>> - Call handler->disable() instead of just setting the _IRQ_DISABLED flag
> > >>> - Move the system state check outside of restore_local_irqs_on_resume()
> > >>> ---
> > >>> xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
> > >>> 1 file changed, 39 insertions(+)
> > >>>
> > >>> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> > >>> index 6c899347ca..ddd2940554 100644
> > >>> --- a/xen/arch/arm/irq.c
> > >>> +++ b/xen/arch/arm/irq.c
> > >>> @@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
> > >>> return 0;
> > >>> }
> > >>>
> > >>> +/*
> > >>> + * The first 32 interrupts (PPIs and SGIs) are per-CPU,
> > >>> + * so call this function on the target CPU to restore them.
> > >>> + *
> > >>> + * SPIs are restored via gic_resume.
> > >>> + */
> > >>> +static void restore_local_irqs_on_resume(void)
> > >>> +{
> > >>> + int irq;
> > >>
> > >> NIT: Please, use "unsigned int" if irq cannot be negative
> > >
> > > ok
> > >
> > >>
> > >>> +
> > >>> + spin_lock(&local_irqs_type_lock);
> > >>> +
> > >>> + for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
> > >>> + {
> > >>> + struct irq_desc *desc = irq_to_desc(irq);
> > >>> +
> > >>> + spin_lock(&desc->lock);
> > >>> +
> > >>> + if ( test_bit(_IRQ_DISABLED, &desc->status) )
> > >>> + {
> > >>> + spin_unlock(&desc->lock);
> > >>> + continue;
> > >>> + }
> > >>> +
> > >>> + /* Disable the IRQ to avoid assertions in the following calls */
> > >>> + desc->handler->disable(desc);
> > >>> + gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
> > >>
> > >> Shouldn't we use GIC_PRI_IPI for SGIs?
> > >
> > > Yes, we should. But currently I am restoring the same value
> > > as it was before suspend...
> > >
> > > I definitely agree that this needs to be fixed at the original
> > > place where the issue was introduced, but I was planning to
> > > address it in a future patch.
> > >
> > >>
> > >>
> > >>> + desc->handler->startup(desc);
> > >>> +
> > >>> + spin_unlock(&desc->lock);
> > >>> + }
> > >>> +
> > >>> + spin_unlock(&local_irqs_type_lock);
> > >>> +}
> > >>> +
> > >>> static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > >>> void *hcpu)
> > >>> {
> > >>> @@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > >>> printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
> > >>> cpu);
> > >>> break;
> > >>> + case CPU_STARTING:
> > >>> + if ( system_state == SYS_STATE_resume )
> > >>> + restore_local_irqs_on_resume();
> > >>> + break;
> > >>
> > >> May I please ask, why all this new code (i.e.
> > >> restore_local_irqs_on_resume()) is not covered by #ifdef
> > >> CONFIG_SYSTEM_SUSPEND?
> > >
> > > I don’t see a reason to introduce such "macaron-style" code. On ARM, the
> > > system suspend state is only set when CONFIG_SYSTEM_SUSPEND is defined
> > > anyway.
> >
> > right
> >
> > >
> > > If you would prefer me to wrap all relevant code with this define, please
> > > let me know and I’ll make the change. In this case, I think the current
> > > approach is cleaner, but I’m open to your opinion.
> >
> > In other patches, you seem to wrap functions/code that only get called
> > during suspend/resume with #ifdef CONFIG_SYSTEM_SUSPEND, so I wondered
> > why restore_local_irqs_on_resume() could not be compiled out
> > if the feature is not enabled. But if you still think it would be
> > cleaner this way (w/o #ifdef), I would be ok.
>
> It’s not entirely true -- I only wrapped code that has a direct dependency
> on host_system_suspend(), either being called from it or required for its
> correct operation.
>
> If you look through this patch series for the pattern:
> SYS_STATE_(suspend|resume)
>
> you’ll see that not all suspend/resume-related code is wrapped in
> #ifdef CONFIG_SYSTEM_SUSPEND. This is intentional -- the same applies to
> some code already merged into the common parts of Xen.
>
> So restore_local_irqs_on_resume is consistent with the existing approach
> in all cpu notifier blocks.
Of course, I can wrap all code in this patch series if needed. For me, the
current approach looks clearer and aligns with existing code. On the other
hand, I introduced this config option not so long ago, so that may be why
some parts in common code and even in some architectures like x86 are still
uncovered.
In any case, I don't mind covering all the code if you think it would be better.
Right now, this implementation is mainly my preference and aligns with the
existing code. There isn't any other reasoning behind this decision.
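For completeness, a rough sketch of the wrapped alternative under discussion
(illustration only, not posted code; the function body is as in the patch):
```c
#ifdef CONFIG_SYSTEM_SUSPEND
static void restore_local_irqs_on_resume(void)
{
    /* ... body unchanged from the patch ... */
}
#endif /* CONFIG_SYSTEM_SUSPEND */

/* In cpu_callback(): */
    case CPU_STARTING:
#ifdef CONFIG_SYSTEM_SUSPEND
        if ( system_state == SYS_STATE_resume )
            restore_local_irqs_on_resume();
#endif
        break;
```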
>
> >
> > >
> > >>
> > >>> }
> > >>>
> > >>> return notifier_from_errno(rc);
> > >>
> > >
> > > Best regards,
> > > Mykola
> >
* Re: [PATCH v6 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions
2025-09-01 22:10 ` [PATCH v6 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions Mykola Kvach
@ 2025-09-02 20:24 ` Volodymyr Babchuk
0 siblings, 0 replies; 37+ messages in thread
From: Volodymyr Babchuk @ 2025-09-02 20:24 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mirela Simonovic,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Saeed Nowshadi, Mykyta Poturai, Mykola Kvach
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> System suspend may lead to a state where GIC would be powered down.
> Therefore, Xen should save/restore the context of GIC on suspend/resume.
>
> Note that the context consists of states of registers which are
> controlled by the hypervisor. Other GIC registers which are accessible
> by guests are saved/restored on context switch.
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
> Changes in v6:
> - drop extra func/line printing from dprintk
> - drop checking context allocation from resume handler
> - merge some loops where it is possible
>
> Changes in v4:
> - Add error logging for allocation failures
>
> Changes in v3:
> - Drop asserts and return error codes instead.
> - Wrap code with CONFIG_SYSTEM_SUSPEND.
>
> Changes in v2:
> - Minor fixes after review.
> ---
> xen/arch/arm/gic-v2.c | 143 +++++++++++++++++++++++++++++++++
> xen/arch/arm/gic.c | 29 +++++++
> xen/arch/arm/include/asm/gic.h | 12 +++
> 3 files changed, 184 insertions(+)
>
> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> index b23e72a3d0..6373599e69 100644
> --- a/xen/arch/arm/gic-v2.c
> +++ b/xen/arch/arm/gic-v2.c
> @@ -1098,6 +1098,140 @@ static int gicv2_iomem_deny_access(struct domain *d)
> return iomem_deny_access(d, mfn, mfn + nr);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +/* GICv2 registers to be saved/restored on system suspend/resume */
> +struct gicv2_context {
> + /* GICC context */
> + uint32_t gicc_ctlr;
> + uint32_t gicc_pmr;
> + uint32_t gicc_bpr;
> + /* GICD context */
> + uint32_t gicd_ctlr;
> + uint32_t *gicd_isenabler;
> + uint32_t *gicd_isactiver;
> + uint32_t *gicd_ipriorityr;
> + uint32_t *gicd_itargetsr;
> + uint32_t *gicd_icfgr;
> +};
> +
> +static struct gicv2_context gicv2_context;
> +
> +static int gicv2_suspend(void)
> +{
> + unsigned int i;
> +
> + if ( !gicv2_context.gicd_isenabler )
> + {
> + dprintk(XENLOG_WARNING, "GICv2 suspend context not allocated!\n");
> + return -ENOMEM;
> + }
> +
> + /* Save GICC configuration */
> + gicv2_context.gicc_ctlr = readl_gicc(GICC_CTLR);
> + gicv2_context.gicc_pmr = readl_gicc(GICC_PMR);
> + gicv2_context.gicc_bpr = readl_gicc(GICC_BPR);
> +
> + /* Save GICD configuration */
> + gicv2_context.gicd_ctlr = readl_gicd(GICD_CTLR);
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 32); i++ )
> + {
> + gicv2_context.gicd_isenabler[i] = readl_gicd(GICD_ISENABLER + i * 4);
> + gicv2_context.gicd_isactiver[i] = readl_gicd(GICD_ISACTIVER + i * 4);
> + }
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 4); i++ )
> + {
> + gicv2_context.gicd_ipriorityr[i] = readl_gicd(GICD_IPRIORITYR + i * 4);
> + gicv2_context.gicd_itargetsr[i] = readl_gicd(GICD_ITARGETSR + i * 4);
> + }
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 16); i++ )
> + gicv2_context.gicd_icfgr[i] = readl_gicd(GICD_ICFGR + i * 4);
> +
> + return 0;
> +}
> +
> +static void gicv2_resume(void)
> +{
> + unsigned int i;
> +
> + gicv2_cpu_disable();
> + /* Disable distributor */
> + writel_gicd(0, GICD_CTLR);
> +
> + /* Restore GICD configuration */
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 32); i++ )
> + {
> + writel_gicd(0xffffffff, GICD_ICENABLER + i * 4);
> + writel_gicd(gicv2_context.gicd_isenabler[i], GICD_ISENABLER + i * 4);
> +
> + writel_gicd(0xffffffff, GICD_ICACTIVER + i * 4);
> + writel_gicd(gicv2_context.gicd_isactiver[i], GICD_ISACTIVER + i * 4);
> + }
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 4); i++ )
> + {
> + writel_gicd(gicv2_context.gicd_ipriorityr[i], GICD_IPRIORITYR + i * 4);
> + writel_gicd(gicv2_context.gicd_itargetsr[i], GICD_ITARGETSR + i * 4);
> + }
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 16); i++ )
> + writel_gicd(gicv2_context.gicd_icfgr[i], GICD_ICFGR + i * 4);
> +
> + /* Make sure all registers are restored and enable distributor */
> + writel_gicd(gicv2_context.gicd_ctlr | GICD_CTL_ENABLE, GICD_CTLR);
> +
> + /* Restore GIC CPU interface configuration */
> + writel_gicc(gicv2_context.gicc_pmr, GICC_PMR);
> + writel_gicc(gicv2_context.gicc_bpr, GICC_BPR);
> +
> + /* Enable GIC CPU interface */
> + writel_gicc(gicv2_context.gicc_ctlr | GICC_CTL_ENABLE | GICC_CTL_EOI,
> + GICC_CTLR);
> +}
> +
> +static void gicv2_alloc_context(struct gicv2_context *gc)
> +{
> + uint32_t n = gicv2_info.nr_lines;
> +
> + gc->gicd_isenabler = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 32));
> + if ( !gc->gicd_isenabler )
> + goto err_free;
> +
> + gc->gicd_isactiver = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 32));
> + if ( !gc->gicd_isactiver )
> + goto err_free;
> +
> + gc->gicd_itargetsr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 4));
> + if ( !gc->gicd_itargetsr )
> + goto err_free;
> +
> + gc->gicd_ipriorityr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 4));
> + if ( !gc->gicd_ipriorityr )
> + goto err_free;
> +
> + gc->gicd_icfgr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 16));
> + if ( !gc->gicd_icfgr )
> + goto err_free;
> +
> + return;
> +
> + err_free:
> + printk(XENLOG_ERR "Failed to allocate memory for GICv2 suspend context\n");
> +
> + xfree(gc->gicd_icfgr);
> + xfree(gc->gicd_ipriorityr);
> + xfree(gc->gicd_itargetsr);
> + xfree(gc->gicd_isactiver);
> + xfree(gc->gicd_isenabler);
> +
> + memset(gc, 0, sizeof(*gc));
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> #ifdef CONFIG_ACPI
> static unsigned long gicv2_get_hwdom_extra_madt_size(const struct domain *d)
> {
> @@ -1302,6 +1436,11 @@ static int __init gicv2_init(void)
>
> spin_unlock(&gicv2.lock);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + /* Allocate memory to be used for saving GIC context during the suspend */
> + gicv2_alloc_context(&gicv2_context);
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> return 0;
> }
>
> @@ -1345,6 +1484,10 @@ static const struct gic_hw_operations gicv2_ops = {
> .map_hwdom_extra_mappings = gicv2_map_hwdom_extra_mappings,
> .iomem_deny_access = gicv2_iomem_deny_access,
> .do_LPI = gicv2_do_LPI,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = gicv2_suspend,
> + .resume = gicv2_resume,
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> };
>
> /* Set up the GIC */
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e80fe0ca24..a018bd7715 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -425,6 +425,35 @@ int gic_iomem_deny_access(struct domain *d)
> return gic_hw_ops->iomem_deny_access(d);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +int gic_suspend(void)
> +{
> + /* Must be called by boot CPU#0 with interrupts disabled */
> + ASSERT(!local_irq_is_enabled());
> + ASSERT(!smp_processor_id());
> +
> + if ( !gic_hw_ops->suspend || !gic_hw_ops->resume )
> + return -ENOSYS;
> +
> + return gic_hw_ops->suspend();
> +}
> +
> +void gic_resume(void)
> +{
> + /*
> + * Must be called by boot CPU#0 with interrupts disabled after gic_suspend
> + * has returned successfully.
> + */
> + ASSERT(!local_irq_is_enabled());
> + ASSERT(!smp_processor_id());
> + ASSERT(gic_hw_ops->resume);
> +
> + gic_hw_ops->resume();
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> static int cpu_gic_callback(struct notifier_block *nfb,
> unsigned long action,
> void *hcpu)
> diff --git a/xen/arch/arm/include/asm/gic.h b/xen/arch/arm/include/asm/gic.h
> index 541f0eeb80..a706303008 100644
> --- a/xen/arch/arm/include/asm/gic.h
> +++ b/xen/arch/arm/include/asm/gic.h
> @@ -280,6 +280,12 @@ extern int gicv_setup(struct domain *d);
> extern void gic_save_state(struct vcpu *v);
> extern void gic_restore_state(struct vcpu *v);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +/* Suspend/resume */
> +extern int gic_suspend(void);
> +extern void gic_resume(void);
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> /* SGI (AKA IPIs) */
> enum gic_sgi {
> GIC_SGI_EVENT_CHECK,
> @@ -395,6 +401,12 @@ struct gic_hw_operations {
> int (*iomem_deny_access)(struct domain *d);
> /* Handle LPIs, which require special handling */
> void (*do_LPI)(unsigned int lpi);
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + /* Save GIC configuration due to the system suspend */
> + int (*suspend)(void);
> + /* Restore GIC configuration due to the system resume */
> + void (*resume)(void);
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> };
>
> extern const struct gic_hw_operations *gic_hw_ops;
--
WBR, Volodymyr
* Re: [PATCH v6 04/13] xen/arm: Don't release IRQs on suspend
2025-09-01 22:10 ` [PATCH v6 04/13] xen/arm: Don't release IRQs on suspend Mykola Kvach
@ 2025-09-02 20:31 ` Volodymyr Babchuk
0 siblings, 0 replies; 37+ messages in thread
From: Volodymyr Babchuk @ 2025-09-02 20:31 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel
Hi Mykola,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> If we call disable_nonboot_cpus on ARM64 with system_state set
> to SYS_STATE_suspend, the following assertion will be triggered:
>
> ```
> (XEN) [ 25.582712] Disabling non-boot CPUs ...
> (XEN) [ 25.587032] Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:714
> [...]
> (XEN) [ 25.975069] Xen call trace:
> (XEN) [ 25.978353] [<00000a000022e098>] xfree+0x130/0x1a4 (PC)
> (XEN) [ 25.984314] [<00000a000022e08c>] xfree+0x124/0x1a4 (LR)
> (XEN) [ 25.990276] [<00000a00002747d4>] release_irq+0xe4/0xe8
> (XEN) [ 25.996152] [<00000a0000278588>] time.c#cpu_time_callback+0x44/0x60
> (XEN) [ 26.003150] [<00000a000021d678>] notifier_call_chain+0x7c/0xa0
> (XEN) [ 26.009717] [<00000a00002018e0>] cpu.c#cpu_notifier_call_chain+0x24/0x48
> (XEN) [ 26.017148] [<00000a000020192c>] cpu.c#_take_cpu_down+0x28/0x34
> (XEN) [ 26.023801] [<00000a0000201944>] cpu.c#take_cpu_down+0xc/0x18
> (XEN) [ 26.030281] [<00000a0000225c5c>] stop_machine.c#stopmachine_action+0xbc/0xe4
> (XEN) [ 26.038057] [<00000a00002264bc>] tasklet.c#do_tasklet_work+0xb8/0x100
> (XEN) [ 26.045229] [<00000a00002268a4>] do_tasklet+0x68/0xb0
> (XEN) [ 26.051018] [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
> (XEN) [ 26.057585] [<00000a0000277e30>] start_secondary+0x21c/0x220
> (XEN) [ 26.063978] [<00000a0000361258>] 00000a0000361258
> ```
>
> This happens because before invoking take_cpu_down via the stop_machine_run
> function on the target CPU, stop_machine_run requests
> the STOPMACHINE_DISABLE_IRQ state on that CPU. Releasing memory in
> the release_irq function then triggers the assertion:
>
> /*
> * Heap allocations may need TLB flushes which may require IRQs to be
> * enabled (except when only 1 PCPU is online).
> */
>
> This patch adds system state checks to guard calls to request_irq
> and release_irq. These calls are now skipped when system_state is
> SYS_STATE_{resume,suspend}, preventing unsafe operations during
> suspend/resume handling.
>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
> Changes in V6:
> - skipping of IRQ release during system suspend is now handled
> inside release_irq().
> Changes in V4:
> - removed the prior tasklet-based workaround in favor of a more
> straightforward and safer solution
> - reworked the approach by adding explicit system state checks around
> request_irq and release_irq calls, skips these calls during suspend
> and resume states to avoid unsafe memory operations when IRQs are
> disabled
> ---
> xen/arch/arm/gic.c | 3 +++
> xen/arch/arm/irq.c | 3 +++
> xen/arch/arm/tee/ffa_notif.c | 2 +-
> xen/arch/arm/time.c | 11 +++++++----
> 4 files changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index a018bd7715..c64481faa7 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -388,6 +388,9 @@ void gic_dump_info(struct vcpu *v)
>
> void init_maintenance_interrupt(void)
> {
> + if ( system_state == SYS_STATE_resume )
> + return;
> +
> request_irq(gic_hw_ops->info->maintenance_irq, 0, maintenance_interrupt,
> "irq-maintenance", NULL);
> }
> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> index 02ca82c089..361496a6d0 100644
> --- a/xen/arch/arm/irq.c
> +++ b/xen/arch/arm/irq.c
> @@ -300,6 +300,9 @@ void release_irq(unsigned int irq, const void *dev_id)
> unsigned long flags;
> struct irqaction *action, **action_ptr;
>
> + if ( system_state == SYS_STATE_suspend )
> + return;
> +
> desc = irq_to_desc(irq);
>
> spin_lock_irqsave(&desc->lock,flags);
> diff --git a/xen/arch/arm/tee/ffa_notif.c b/xen/arch/arm/tee/ffa_notif.c
> index 86bef6b3b2..4835e25619 100644
> --- a/xen/arch/arm/tee/ffa_notif.c
> +++ b/xen/arch/arm/tee/ffa_notif.c
> @@ -363,7 +363,7 @@ void ffa_notif_init_interrupt(void)
> {
> int ret;
>
> - if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI )
> + if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI && system_state != SYS_STATE_resume )
> {
> /*
> * An error here is unlikely since the primary CPU has already
> diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> index ad984fdfdd..8267fa5191 100644
> --- a/xen/arch/arm/time.c
> +++ b/xen/arch/arm/time.c
> @@ -320,10 +320,13 @@ void init_timer_interrupt(void)
> WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
> disable_physical_timers();
>
> - request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
> - "hyptimer", NULL);
> - request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt,
> - "virtimer", NULL);
> + if ( system_state != SYS_STATE_resume )
> + {
> + request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
> + "hyptimer", NULL);
> + request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt,
> + "virtimer", NULL);
> + }
>
> check_timer_irq_cfg(timer_irq[TIMER_HYP_PPI], "hypervisor");
> check_timer_irq_cfg(timer_irq[TIMER_VIRT_PPI], "virtual");
--
WBR, Volodymyr
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2025-09-01 22:10 ` [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks Mykola Kvach
@ 2025-09-02 20:39 ` Volodymyr Babchuk
2025-09-03 10:01 ` Oleksandr Tyshchenko
1 sibling, 0 replies; 37+ messages in thread
From: Volodymyr Babchuk @ 2025-09-02 20:39 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Oleksandr Tyshchenko,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Mykola Kvach
Hi,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> Store and restore active context and micro-TLB registers.
>
> Tested on R-Car H3 Starter Kit.
>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
> Changes in V6:
> - refactor code related to hw_register struct, from now it's called
> ipmmu_reg_ctx
> ---
> xen/drivers/passthrough/arm/ipmmu-vmsa.c | 257 +++++++++++++++++++++++
> 1 file changed, 257 insertions(+)
>
> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> index ea9fa9ddf3..0973559861 100644
> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> @@ -71,6 +71,8 @@
> })
> #endif
>
> +#define dev_dbg(dev, fmt, ...) \
> + dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
> #define dev_info(dev, fmt, ...) \
> dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
> #define dev_warn(dev, fmt, ...) \
> @@ -130,6 +132,24 @@ struct ipmmu_features {
> unsigned int imuctr_ttsel_mask;
> };
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +struct ipmmu_reg_ctx {
> + unsigned int imttlbr0;
> + unsigned int imttubr0;
> + unsigned int imttbcr;
> + unsigned int imctr;
> +};
> +
> +struct ipmmu_vmsa_backup {
> + struct device *dev;
> + unsigned int *utlbs_val;
> + unsigned int *asids_val;
> + struct list_head list;
> +};
> +
> +#endif
> +
> /* Root/Cache IPMMU device's information */
> struct ipmmu_vmsa_device {
> struct device *dev;
> @@ -142,6 +162,9 @@ struct ipmmu_vmsa_device {
> struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
> unsigned int utlb_refcount[IPMMU_UTLB_MAX];
> const struct ipmmu_features *features;
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + struct ipmmu_reg_ctx *reg_backup[IPMMU_CTX_MAX];
> +#endif
> };
>
> /*
> @@ -547,6 +570,222 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
> spin_unlock_irqrestore(&mmu->lock, flags);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +static DEFINE_SPINLOCK(ipmmu_devices_backup_lock);
> +static LIST_HEAD(ipmmu_devices_backup);
> +
> +static struct ipmmu_reg_ctx root_pgtable[IPMMU_CTX_MAX];
> +
> +static uint32_t ipmmu_imuasid_read(struct ipmmu_vmsa_device *mmu,
> + unsigned int utlb)
> +{
> + return ipmmu_read(mmu, ipmmu_utlb_reg(mmu, IMUASID(utlb)));
> +}
> +
> +static void ipmmu_utlbs_backup(struct ipmmu_vmsa_device *mmu)
> +{
> + struct ipmmu_vmsa_backup *backup_data;
> +
> + dev_dbg(mmu->dev, "Handle micro-TLBs backup\n");
> +
> + spin_lock(&ipmmu_devices_backup_lock);
> +
> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> + {
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> + unsigned int i;
> +
> + if ( to_ipmmu(backup_data->dev) != mmu )
> + continue;
> +
> + for ( i = 0; i < fwspec->num_ids; i++ )
> + {
> + unsigned int utlb = fwspec->ids[i];
> +
> + backup_data->asids_val[i] = ipmmu_imuasid_read(mmu, utlb);
> + backup_data->utlbs_val[i] = ipmmu_imuctr_read(mmu, utlb);
> + }
> + }
> +
> + spin_unlock(&ipmmu_devices_backup_lock);
> +}
> +
> +static void ipmmu_utlbs_restore(struct ipmmu_vmsa_device *mmu)
> +{
> + struct ipmmu_vmsa_backup *backup_data;
> +
> + dev_dbg(mmu->dev, "Handle micro-TLBs restore\n");
> +
> + spin_lock(&ipmmu_devices_backup_lock);
> +
> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> + {
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> + unsigned int i;
> +
> + if ( to_ipmmu(backup_data->dev) != mmu )
> + continue;
> +
> + for ( i = 0; i < fwspec->num_ids; i++ )
> + {
> + unsigned int utlb = fwspec->ids[i];
> +
> + ipmmu_imuasid_write(mmu, utlb, backup_data->asids_val[i]);
> + ipmmu_imuctr_write(mmu, utlb, backup_data->utlbs_val[i]);
> + }
> + }
> +
> + spin_unlock(&ipmmu_devices_backup_lock);
> +}
> +
> +static void ipmmu_domain_backup_context(struct ipmmu_vmsa_domain *domain)
> +{
> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> +
> + dev_dbg(mmu->dev, "Handle domain context %u backup\n", domain->context_id);
> +
> + regs->imttlbr0 = ipmmu_ctx_read_root(domain, IMTTLBR0);
> + regs->imttubr0 = ipmmu_ctx_read_root(domain, IMTTUBR0);
> + regs->imttbcr = ipmmu_ctx_read_root(domain, IMTTBCR);
> + regs->imctr = ipmmu_ctx_read_root(domain, IMCTR);
> +}
> +
> +static void ipmmu_domain_restore_context(struct ipmmu_vmsa_domain *domain)
> +{
> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> +
> + dev_dbg(mmu->dev, "Handle domain context %u restore\n", domain->context_id);
> +
> + ipmmu_ctx_write_root(domain, IMTTLBR0, regs->imttlbr0);
> + ipmmu_ctx_write_root(domain, IMTTUBR0, regs->imttubr0);
> + ipmmu_ctx_write_root(domain, IMTTBCR, regs->imttbcr);
> + ipmmu_ctx_write_all(domain, IMCTR, regs->imctr | IMCTR_FLUSH);
> +}
> +
> +/*
> + * Xen: Unlike Linux implementation, Xen uses a single driver instance
> + * for handling all IPMMUs. There is no framework for ipmmu_suspend/resume
> + * callbacks to be invoked for each IPMMU device. So, we need to iterate
> + * through all registered IPMMUs performing required actions.
> + *
> + * Also take care of restoring special settings, such as translation
> + * table format, etc.
> + */
> +static int __must_check ipmmu_suspend(void)
> +{
> + struct ipmmu_vmsa_device *mmu;
> +
> + if ( !iommu_enabled )
> + return 0;
> +
> + printk(XENLOG_DEBUG "ipmmu: Suspending ...\n");
> +
> + spin_lock(&ipmmu_devices_lock);
> +
> + list_for_each_entry( mmu, &ipmmu_devices, list )
> + {
> + if ( ipmmu_is_root(mmu) )
> + {
> + unsigned int i;
> +
> + for ( i = 0; i < mmu->num_ctx; i++ )
> + {
> + if ( !mmu->domains[i] )
> + continue;
> + ipmmu_domain_backup_context(mmu->domains[i]);
> + }
> + }
> + else
> + ipmmu_utlbs_backup(mmu);
> + }
> +
> + spin_unlock(&ipmmu_devices_lock);
> +
> + return 0;
> +}
> +
> +static void ipmmu_resume(void)
> +{
> + struct ipmmu_vmsa_device *mmu;
> +
> + if ( !iommu_enabled )
> + return;
> +
> + printk(XENLOG_DEBUG "ipmmu: Resuming ...\n");
> +
> + spin_lock(&ipmmu_devices_lock);
> +
> + list_for_each_entry( mmu, &ipmmu_devices, list )
> + {
> + uint32_t reg;
> +
> + /* Do not use security group function */
> + reg = IMSCTLR + mmu->features->control_offset_base;
> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) & ~IMSCTLR_USE_SECGRP);
> +
> + if ( ipmmu_is_root(mmu) )
> + {
> + unsigned int i;
> +
> + /* Use stage 2 translation table format */
> + reg = IMSAUXCTLR + mmu->features->control_offset_base;
> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) | IMSAUXCTLR_S2PTE);
> +
> + for ( i = 0; i < mmu->num_ctx; i++ )
> + {
> + if ( !mmu->domains[i] )
> + continue;
> + ipmmu_domain_restore_context(mmu->domains[i]);
> + }
> + }
> + else
> + ipmmu_utlbs_restore(mmu);
> + }
> +
> + spin_unlock(&ipmmu_devices_lock);
> +}
> +
> +static int ipmmu_alloc_ctx_suspend(struct device *dev)
> +{
> + struct ipmmu_vmsa_backup *backup_data;
> + unsigned int *utlbs_val, *asids_val;
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +
> + utlbs_val = xzalloc_array(unsigned int, fwspec->num_ids);
> + if ( !utlbs_val )
> + return -ENOMEM;
> +
> + asids_val = xzalloc_array(unsigned int, fwspec->num_ids);
> + if ( !asids_val )
> + {
> + xfree(utlbs_val);
> + return -ENOMEM;
> + }
> +
> + backup_data = xzalloc(struct ipmmu_vmsa_backup);
> + if ( !backup_data )
> + {
> + xfree(utlbs_val);
> + xfree(asids_val);
> + return -ENOMEM;
> + }
> +
> + backup_data->dev = dev;
> + backup_data->utlbs_val = utlbs_val;
> + backup_data->asids_val = asids_val;
> +
> + spin_lock(&ipmmu_devices_backup_lock);
> + list_add(&backup_data->list, &ipmmu_devices_backup);
> + spin_unlock(&ipmmu_devices_backup_lock);
> +
> + return 0;
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> {
> uint64_t ttbr;
> @@ -559,6 +798,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> return ret;
>
> domain->context_id = ret;
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + domain->mmu->root->reg_backup[ret] = &root_pgtable[ret];
> +#endif
>
> /*
> * TTBR0
> @@ -615,6 +857,9 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
> ipmmu_ctx_write_root(domain, IMCTR, IMCTR_FLUSH);
> ipmmu_tlb_sync(domain);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + domain->mmu->root->reg_backup[domain->context_id] = NULL;
> +#endif
> ipmmu_domain_free_context(domain->mmu->root, domain->context_id);
> }
>
> @@ -1427,6 +1672,14 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
> }
> #endif
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + if ( ipmmu_alloc_ctx_suspend(dev) )
> + {
> + dev_err(dev, "Failed to allocate context for suspend\n");
> + return -ENOMEM;
> + }
> +#endif
> +
> dev_info(dev, "Added master device (IPMMU %s micro-TLBs %u)\n",
> dev_name(fwspec->iommu_dev), fwspec->num_ids);
>
> @@ -1492,6 +1745,10 @@ static const struct iommu_ops ipmmu_iommu_ops =
> .unmap_page = arm_iommu_unmap_page,
> .dt_xlate = ipmmu_dt_xlate,
> .add_device = ipmmu_add_device,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = ipmmu_suspend,
> + .resume = ipmmu_resume,
> +#endif
> };
>
> static __init int ipmmu_init(struct dt_device_node *node, const void *data)
--
WBR, Volodymyr
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (12 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 13/13] xen/arm: gic-v3: Add suspend/resume support for eSPI registers Mykola Kvach
@ 2025-09-02 20:48 ` Volodymyr Babchuk
13 siblings, 0 replies; 37+ messages in thread
From: Volodymyr Babchuk @ 2025-09-02 20:48 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Andrew Cooper,
Anthony PERARD, Jan Beulich, Roger Pau Monné, Rahul Singh
Hi Mykola,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> This is part 2 of version 5 of the ARM Xen system suspend/resume patch
> series, based on earlier work by Mirela Simonovic and Mykyta Poturai.
>
> The first part is here:
> https://marc.info/?l=xen-devel&m=175659181415965&w=2
>
> This version is ported to Xen master (4.21-unstable) and includes
> extensive improvements based on reviewer feedback. The patch series
> restructures code to improve robustness, maintainability, and implements
> system Suspend-to-RAM support on ARM64 hardware domains.
>
> At a high-level, this patch series provides:
> - Support for Host system suspend/resume via PSCI SYSTEM_SUSPEND
> (ARM64)
I am wondering whether you had to split this into 3 patches. It looks like
patches 8 and 9 are useless without patch 10: they just add a bunch of
dead code. Maybe it is better to squash them into one patch? I may be
wrong here, so maybe other reviewers/maintainers will correct me.
> [...]
--
WBR, Volodymyr
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume
2025-09-01 22:10 ` [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume Mykola Kvach
2025-09-02 17:25 ` Oleksandr Tyshchenko
@ 2025-09-02 20:51 ` Volodymyr Babchuk
1 sibling, 0 replies; 37+ messages in thread
From: Volodymyr Babchuk @ 2025-09-02 20:51 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Oleksandr Tyshchenko,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Rahul Singh, Mykola Kvach
Hi,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> This is done using generic iommu_suspend/resume functions that cause
> IOMMU driver specific suspend/resume handlers to be called for enabled
> IOMMU (if one has suspend/resume driver handlers implemented).
>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
> Changes in V6:
> - Drop iommu_enabled check from host system suspend.
> ---
> xen/arch/arm/suspend.c | 11 +++++++++++
> xen/drivers/passthrough/arm/smmu-v3.c | 10 ++++++++++
> xen/drivers/passthrough/arm/smmu.c | 10 ++++++++++
> 3 files changed, 31 insertions(+)
>
> diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
> index 35b20581f1..f3a3b831c5 100644
> --- a/xen/arch/arm/suspend.c
> +++ b/xen/arch/arm/suspend.c
> @@ -5,6 +5,7 @@
>
> #include <xen/console.h>
> #include <xen/cpu.h>
> +#include <xen/iommu.h>
> #include <xen/llc-coloring.h>
> #include <xen/sched.h>
> #include <xen/tasklet.h>
> @@ -62,6 +63,13 @@ static void cf_check system_suspend(void *data)
>
> time_suspend();
>
> + status = iommu_suspend();
> + if ( status )
> + {
> + system_state = SYS_STATE_resume;
> + goto resume_time;
> + }
> +
> console_start_sync();
> status = console_suspend();
> if ( status )
> @@ -118,6 +126,9 @@ static void cf_check system_suspend(void *data)
> console_resume();
> console_end_sync();
>
> + iommu_resume();
> +
> + resume_time:
> time_resume();
>
> resume_nonboot_cpus:
> diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
> index 81071f4018..f887faf7dc 100644
> --- a/xen/drivers/passthrough/arm/smmu-v3.c
> +++ b/xen/drivers/passthrough/arm/smmu-v3.c
> @@ -2854,6 +2854,13 @@ static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
> xfree(xen_domain);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +static int arm_smmu_suspend(void)
> +{
> + return -ENOSYS;
> +}
> +#endif
> +
> static const struct iommu_ops arm_smmu_iommu_ops = {
> .page_sizes = PAGE_SIZE_4K,
> .init = arm_smmu_iommu_xen_domain_init,
> @@ -2866,6 +2873,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
> .unmap_page = arm_iommu_unmap_page,
> .dt_xlate = arm_smmu_dt_xlate,
> .add_device = arm_smmu_add_device,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = arm_smmu_suspend,
> +#endif
> };
>
> static __init int arm_smmu_dt_init(struct dt_device_node *dev,
> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> index 22d306d0cb..45f29ef8ec 100644
> --- a/xen/drivers/passthrough/arm/smmu.c
> +++ b/xen/drivers/passthrough/arm/smmu.c
> @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
> xfree(xen_domain);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +static int arm_smmu_suspend(void)
> +{
> + return -ENOSYS;
> +}
> +#endif
> +
> static const struct iommu_ops arm_smmu_iommu_ops = {
> .page_sizes = PAGE_SIZE_4K,
> .init = arm_smmu_iommu_domain_init,
> @@ -2960,6 +2967,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
> .map_page = arm_iommu_map_page,
> .unmap_page = arm_iommu_unmap_page,
> .dt_xlate = arm_smmu_dt_xlate_generic,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = arm_smmu_suspend,
> +#endif
> };
>
> static struct arm_smmu_device *find_smmu(const struct device *dev)
--
WBR, Volodymyr
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
2025-09-02 16:49 ` Oleksandr Tyshchenko
2025-09-02 17:43 ` Mykola Kvach
@ 2025-09-02 22:21 ` Mykola Kvach
1 sibling, 0 replies; 37+ messages in thread
From: Mykola Kvach @ 2025-09-02 22:21 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
On Tue, Sep 2, 2025 at 7:49 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
>
> On 02.09.25 01:10, Mykola Kvach wrote:
>
> Hello Mykola
>
> > From: Mykola Kvach <mykola_kvach@epam.com>
> >
> > On ARM, the first 32 interrupts (SGIs and PPIs) are banked per-CPU
> > and not restored by gic_resume (for secondary cpus).
> >
> > This patch introduces restore_local_irqs_on_resume, a function that
> > restores the state of local interrupts on the target CPU during
> > system resume.
> >
> > It iterates over all local IRQs and re-enables those that were not
> > disabled, reprogramming their routing and affinity accordingly.
> >
> > The function is invoked from start_secondary, ensuring that local IRQ
> > state is restored early during CPU bring-up after suspend.
> >
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V6:
> > - Call handler->disable() instead of just setting the _IRQ_DISABLED flag
> > - Move the system state check outside of restore_local_irqs_on_resume()
> > ---
> > xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 39 insertions(+)
> >
> > diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> > index 6c899347ca..ddd2940554 100644
> > --- a/xen/arch/arm/irq.c
> > +++ b/xen/arch/arm/irq.c
> > @@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
> > return 0;
> > }
> >
> > +/*
> > + * The first 32 interrupts (PPIs and SGIs) are per-CPU,
> > + * so call this function on the target CPU to restore them.
> > + *
> > + * SPIs are restored via gic_resume.
> > + */
> > +static void restore_local_irqs_on_resume(void)
> > +{
> > + int irq;
>
> NIT: Please, use "unsigned int" if irq cannot be negative
>
> > +
> > + spin_lock(&local_irqs_type_lock);
> > +
> > + for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
> > + {
> > + struct irq_desc *desc = irq_to_desc(irq);
> > +
> > + spin_lock(&desc->lock);
> > +
> > + if ( test_bit(_IRQ_DISABLED, &desc->status) )
> > + {
> > + spin_unlock(&desc->lock);
> > + continue;
> > + }
> > +
> > + /* Disable the IRQ to avoid assertions in the following calls */
> > + desc->handler->disable(desc);
> > + gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
>
> Shouldn't we use GIC_PRI_IPI for SGIs?
I'll update the priority value in the next version.
Initially, I assumed gic_route_irq_to_xen() was used for all
interrupts with the same priority. But looking more closely, it
doesn't appear to be called for SGIs at all.
In fact, SGI configuration, including priority, is handled during CPU
initialization in gic_init_secondary_cpu(), which is called before
the CPU_STARTING notifier.
Given that, it's probably better to avoid updating SGI priorities here
entirely and rely on their boot-time configuration instead.
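For illustration, the loop in restore_local_irqs_on_resume() could then
simply start after the SGIs and only touch the PPIs, e.g. (untested
sketch based on the current patch):

    /*
     * Untested sketch: skip SGIs entirely, they are already configured by
     * gic_init_secondary_cpu() before the CPU_STARTING notifier runs.
     */
    for ( irq = NR_GIC_SGI; irq < NR_LOCAL_IRQS; irq++ )
    {
        struct irq_desc *desc = irq_to_desc(irq);

        spin_lock(&desc->lock);

        if ( !test_bit(_IRQ_DISABLED, &desc->status) )
        {
            /* Disable the IRQ to avoid assertions in the following calls */
            desc->handler->disable(desc);
            gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
            desc->handler->startup(desc);
        }

        spin_unlock(&desc->lock);
    }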
>
>
> > + desc->handler->startup(desc);
> > +
> > + spin_unlock(&desc->lock);
> > + }
> > +
> > + spin_unlock(&local_irqs_type_lock);
> > +}
> > +
> > static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > void *hcpu)
> > {
> > @@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
> > cpu);
> > break;
> > + case CPU_STARTING:
> > + if ( system_state == SYS_STATE_resume )
> > + restore_local_irqs_on_resume();
> > + break;
>
> May I please ask, why all this new code (i.e.
> restore_local_irqs_on_resume()) is not covered by #ifdef
> CONFIG_SYSTEM_SUSPEND?
>
> > }
> >
> > return notifier_from_errno(rc);
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-02 14:33 ` Jan Beulich
@ 2025-09-03 4:31 ` Mykola Kvach
0 siblings, 0 replies; 37+ messages in thread
From: Mykola Kvach @ 2025-09-03 4:31 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Roger Pau Monné, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach, xen-devel
Hi Jan,
On Tue, Sep 2, 2025 at 5:33 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 02.09.2025 00:10, Mykola Kvach wrote:
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -1317,7 +1317,11 @@ int domain_shutdown(struct domain *d, u8 reason)
> > d->shutdown_code = reason;
> > reason = d->shutdown_code;
> >
> > +#if defined(CONFIG_SYSTEM_SUSPEND) && defined(CONFIG_ARM)
> > + if ( reason != SHUTDOWN_suspend && is_hardware_domain(d) )
> > +#else
> > if ( is_hardware_domain(d) )
> > +#endif
> > hwdom_shutdown(reason);
>
> I still don't follow why Arm-specific code needs to live here. If this
> can't be properly abstracted, then at the very least I'd expect some
> code comment here, or at the very, very least something in the description.
Looks like I missed your comment about this in the previous version of
the patch series.
>
> From looking at hwdom_shutdown() I get the impression that it doesn't
> expect to be called with SHUTDOWN_suspend, yet then the question is why we
> make it into domain_shutdown() with that reason code.
Thank you for the question; it is a good one.
Thinking about it, with the current implementation (i.e. when the HW domain
requests system suspend), we don't really need to call domain_shutdown().
It would be enough to pause the last running vCPU (the current one) just to
make sure that we don't return control to the domain after exiting from the
hvc trap on the PSCI SYSTEM_SUSPEND command. We also need to set
shutting_down to ensure that any asynchronous code or timer callbacks
behave properly during suspend (i.e. skip their normal actions).
However, if we consider a setup with two separate domains -- one control and
one HW -- where the control domain makes the final decision about system
suspend, then we would need to call __domain_finalise_shutdown() during the
HW domain suspend in order to notify the control domain that the HW domain
state has changed. The control domain would then check this state and call
system suspend for itself after confirming that all other domains are in a
suspended state.
I already added a TODO about moving this logic to the control domain. So, at
first sight (unless I am missing something), we can avoid extra modifications
inside domain_shutdown() and simply not call it for the HW domain.
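Roughly what I have in mind, as an untested sketch (host_system_suspend()
is the helper from this series; the exact flag/helper names are from
memory):

    struct domain *d = current->domain;

    /* Untested sketch: HW domain requested PSCI SYSTEM_SUSPEND */
    d->is_shutting_down = true;  /* let timers/async callbacks skip their work */
    vcpu_pause_nosync(current);  /* never return control to the domain */
    host_system_suspend();       /* tasklet-based system suspend from this series */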
>
> Jan
Best regards,
Mykola
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2025-09-01 22:10 ` [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks Mykola Kvach
2025-09-02 20:39 ` Volodymyr Babchuk
@ 2025-09-03 10:01 ` Oleksandr Tyshchenko
2025-09-03 10:25 ` Mykola Kvach
1 sibling, 1 reply; 37+ messages in thread
From: Oleksandr Tyshchenko @ 2025-09-03 10:01 UTC (permalink / raw)
To: Mykola Kvach, xen-devel@lists.xenproject.org
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Mykola Kvach
On 02.09.25 01:10, Mykola Kvach wrote:
Hello Mykola
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> Store and restore active context and micro-TLB registers.
>
> Tested on R-Car H3 Starter Kit.
>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in V6:
> - refactor code related to hw_register struct, from now it's called
> ipmmu_reg_ctx
The updated version looks good, thanks. However, I have one
concern/request ...
> [...]
> @@ -1427,6 +1672,14 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
> }
> #endif
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + if ( ipmmu_alloc_ctx_suspend(dev) )
> + {
> + dev_err(dev, "Failed to allocate context for suspend\n");
> + return -ENOMEM;
> + }
> +#endif
... The initial version was based on the driver code without PCI
support, but PCI support is now present. There is PCI-specific code
above in this function (not visible in the context) that performs some
initialization, allocation and device assignment. What I mean is that if
the suspend context allocation fails here, we will need to undo those
actions (i.e. deassign the device). I would move this context allocation
(which is much less likely to fail than the PCI device setup) above the
PCI-specific stuff, and perform the context freeing on the PCI error
path.
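I.e. something along these lines (untested, just to show the intended
ordering; ipmmu_free_ctx_suspend() is a made-up helper that would undo
ipmmu_alloc_ctx_suspend()):

#ifdef CONFIG_SYSTEM_SUSPEND
    /* Allocate the suspend context before any PCI-specific setup */
    if ( ipmmu_alloc_ctx_suspend(dev) )
    {
        dev_err(dev, "Failed to allocate context for suspend\n");
        return -ENOMEM;
    }
#endif

    /*
     * ... PCI-specific initialization/assignment goes here; its error path
     * would then call the (hypothetical) ipmmu_free_ctx_suspend(dev) to undo
     * the allocation above.
     */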
> [...]
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2025-09-03 10:01 ` Oleksandr Tyshchenko
@ 2025-09-03 10:25 ` Mykola Kvach
2025-09-03 11:49 ` Oleksandr Tyshchenko
0 siblings, 1 reply; 37+ messages in thread
From: Mykola Kvach @ 2025-09-03 10:25 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel@lists.xenproject.org, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Mykola Kvach
Hi Oleksandr,
On Wed, Sep 3, 2025 at 1:01 PM Oleksandr Tyshchenko
<Oleksandr_Tyshchenko@epam.com> wrote:
>
>
>
> On 02.09.25 01:10, Mykola Kvach wrote:
>
> Hello Mykola
>
>
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >
> > Store and restore active context and micro-TLB registers.
> >
> > Tested on R-Car H3 Starter Kit.
> >
> > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V6:
> > - refactor code related to hw_register struct, from now it's called
> > ipmmu_reg_ctx
>
> The updated version looks good, thanks. However, I have one
> concern/request ...
>
> > ---
> > xen/drivers/passthrough/arm/ipmmu-vmsa.c | 257 +++++++++++++++++++++++
> > 1 file changed, 257 insertions(+)
> >
> > diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> > index ea9fa9ddf3..0973559861 100644
> > --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> > +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> > @@ -71,6 +71,8 @@
> > })
> > #endif
> >
> > +#define dev_dbg(dev, fmt, ...) \
> > + dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
> > #define dev_info(dev, fmt, ...) \
> > dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
> > #define dev_warn(dev, fmt, ...) \
> > @@ -130,6 +132,24 @@ struct ipmmu_features {
> > unsigned int imuctr_ttsel_mask;
> > };
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +struct ipmmu_reg_ctx {
> > + unsigned int imttlbr0;
> > + unsigned int imttubr0;
> > + unsigned int imttbcr;
> > + unsigned int imctr;
> > +};
> > +
> > +struct ipmmu_vmsa_backup {
> > + struct device *dev;
> > + unsigned int *utlbs_val;
> > + unsigned int *asids_val;
> > + struct list_head list;
> > +};
> > +
> > +#endif
> > +
> > /* Root/Cache IPMMU device's information */
> > struct ipmmu_vmsa_device {
> > struct device *dev;
> > @@ -142,6 +162,9 @@ struct ipmmu_vmsa_device {
> > struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
> > unsigned int utlb_refcount[IPMMU_UTLB_MAX];
> > const struct ipmmu_features *features;
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + struct ipmmu_reg_ctx *reg_backup[IPMMU_CTX_MAX];
> > +#endif
> > };
> >
> > /*
> > @@ -547,6 +570,222 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
> > spin_unlock_irqrestore(&mmu->lock, flags);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +static DEFINE_SPINLOCK(ipmmu_devices_backup_lock);
> > +static LIST_HEAD(ipmmu_devices_backup);
> > +
> > +static struct ipmmu_reg_ctx root_pgtable[IPMMU_CTX_MAX];
> > +
> > +static uint32_t ipmmu_imuasid_read(struct ipmmu_vmsa_device *mmu,
> > + unsigned int utlb)
> > +{
> > + return ipmmu_read(mmu, ipmmu_utlb_reg(mmu, IMUASID(utlb)));
> > +}
> > +
> > +static void ipmmu_utlbs_backup(struct ipmmu_vmsa_device *mmu)
> > +{
> > + struct ipmmu_vmsa_backup *backup_data;
> > +
> > + dev_dbg(mmu->dev, "Handle micro-TLBs backup\n");
> > +
> > + spin_lock(&ipmmu_devices_backup_lock);
> > +
> > + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> > + {
> > + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> > + unsigned int i;
> > +
> > + if ( to_ipmmu(backup_data->dev) != mmu )
> > + continue;
> > +
> > + for ( i = 0; i < fwspec->num_ids; i++ )
> > + {
> > + unsigned int utlb = fwspec->ids[i];
> > +
> > + backup_data->asids_val[i] = ipmmu_imuasid_read(mmu, utlb);
> > + backup_data->utlbs_val[i] = ipmmu_imuctr_read(mmu, utlb);
> > + }
> > + }
> > +
> > + spin_unlock(&ipmmu_devices_backup_lock);
> > +}
> > +
> > +static void ipmmu_utlbs_restore(struct ipmmu_vmsa_device *mmu)
> > +{
> > + struct ipmmu_vmsa_backup *backup_data;
> > +
> > + dev_dbg(mmu->dev, "Handle micro-TLBs restore\n");
> > +
> > + spin_lock(&ipmmu_devices_backup_lock);
> > +
> > + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> > + {
> > + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> > + unsigned int i;
> > +
> > + if ( to_ipmmu(backup_data->dev) != mmu )
> > + continue;
> > +
> > + for ( i = 0; i < fwspec->num_ids; i++ )
> > + {
> > + unsigned int utlb = fwspec->ids[i];
> > +
> > + ipmmu_imuasid_write(mmu, utlb, backup_data->asids_val[i]);
> > + ipmmu_imuctr_write(mmu, utlb, backup_data->utlbs_val[i]);
> > + }
> > + }
> > +
> > + spin_unlock(&ipmmu_devices_backup_lock);
> > +}
> > +
> > +static void ipmmu_domain_backup_context(struct ipmmu_vmsa_domain *domain)
> > +{
> > + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> > + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> > +
> > + dev_dbg(mmu->dev, "Handle domain context %u backup\n", domain->context_id);
> > +
> > + regs->imttlbr0 = ipmmu_ctx_read_root(domain, IMTTLBR0);
> > + regs->imttubr0 = ipmmu_ctx_read_root(domain, IMTTUBR0);
> > + regs->imttbcr = ipmmu_ctx_read_root(domain, IMTTBCR);
> > + regs->imctr = ipmmu_ctx_read_root(domain, IMCTR);
> > +}
> > +
> > +static void ipmmu_domain_restore_context(struct ipmmu_vmsa_domain *domain)
> > +{
> > + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> > + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> > +
> > + dev_dbg(mmu->dev, "Handle domain context %u restore\n", domain->context_id);
> > +
> > + ipmmu_ctx_write_root(domain, IMTTLBR0, regs->imttlbr0);
> > + ipmmu_ctx_write_root(domain, IMTTUBR0, regs->imttubr0);
> > + ipmmu_ctx_write_root(domain, IMTTBCR, regs->imttbcr);
> > + ipmmu_ctx_write_all(domain, IMCTR, regs->imctr | IMCTR_FLUSH);
> > +}
> > +
> > +/*
> > + * Xen: Unlike Linux implementation, Xen uses a single driver instance
> > + * for handling all IPMMUs. There is no framework for ipmmu_suspend/resume
> > + * callbacks to be invoked for each IPMMU device. So, we need to iterate
> > + * through all registered IPMMUs performing required actions.
> > + *
> > + * Also take care of restoring special settings, such as translation
> > + * table format, etc.
> > + */
> > +static int __must_check ipmmu_suspend(void)
> > +{
> > + struct ipmmu_vmsa_device *mmu;
> > +
> > + if ( !iommu_enabled )
> > + return 0;
> > +
> > + printk(XENLOG_DEBUG "ipmmu: Suspending ...\n");
> > +
> > + spin_lock(&ipmmu_devices_lock);
> > +
> > + list_for_each_entry( mmu, &ipmmu_devices, list )
> > + {
> > + if ( ipmmu_is_root(mmu) )
> > + {
> > + unsigned int i;
> > +
> > + for ( i = 0; i < mmu->num_ctx; i++ )
> > + {
> > + if ( !mmu->domains[i] )
> > + continue;
> > + ipmmu_domain_backup_context(mmu->domains[i]);
> > + }
> > + }
> > + else
> > + ipmmu_utlbs_backup(mmu);
> > + }
> > +
> > + spin_unlock(&ipmmu_devices_lock);
> > +
> > + return 0;
> > +}
> > +
> > +static void ipmmu_resume(void)
> > +{
> > + struct ipmmu_vmsa_device *mmu;
> > +
> > + if ( !iommu_enabled )
> > + return;
> > +
> > + printk(XENLOG_DEBUG "ipmmu: Resuming ...\n");
> > +
> > + spin_lock(&ipmmu_devices_lock);
> > +
> > + list_for_each_entry( mmu, &ipmmu_devices, list )
> > + {
> > + uint32_t reg;
> > +
> > + /* Do not use security group function */
> > + reg = IMSCTLR + mmu->features->control_offset_base;
> > + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) & ~IMSCTLR_USE_SECGRP);
> > +
> > + if ( ipmmu_is_root(mmu) )
> > + {
> > + unsigned int i;
> > +
> > + /* Use stage 2 translation table format */
> > + reg = IMSAUXCTLR + mmu->features->control_offset_base;
> > + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) | IMSAUXCTLR_S2PTE);
> > +
> > + for ( i = 0; i < mmu->num_ctx; i++ )
> > + {
> > + if ( !mmu->domains[i] )
> > + continue;
> > + ipmmu_domain_restore_context(mmu->domains[i]);
> > + }
> > + }
> > + else
> > + ipmmu_utlbs_restore(mmu);
> > + }
> > +
> > + spin_unlock(&ipmmu_devices_lock);
> > +}
> > +
> > +static int ipmmu_alloc_ctx_suspend(struct device *dev)
> > +{
> > + struct ipmmu_vmsa_backup *backup_data;
> > + unsigned int *utlbs_val, *asids_val;
> > + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> > +
> > + utlbs_val = xzalloc_array(unsigned int, fwspec->num_ids);
> > + if ( !utlbs_val )
> > + return -ENOMEM;
> > +
> > + asids_val = xzalloc_array(unsigned int, fwspec->num_ids);
> > + if ( !asids_val )
> > + {
> > + xfree(utlbs_val);
> > + return -ENOMEM;
> > + }
> > +
> > + backup_data = xzalloc(struct ipmmu_vmsa_backup);
> > + if ( !backup_data )
> > + {
> > + xfree(utlbs_val);
> > + xfree(asids_val);
> > + return -ENOMEM;
> > + }
> > +
> > + backup_data->dev = dev;
> > + backup_data->utlbs_val = utlbs_val;
> > + backup_data->asids_val = asids_val;
> > +
> > + spin_lock(&ipmmu_devices_backup_lock);
> > + list_add(&backup_data->list, &ipmmu_devices_backup);
> > + spin_unlock(&ipmmu_devices_backup_lock);
> > +
> > + return 0;
> > +}
> > +
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> > static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> > {
> > uint64_t ttbr;
> > @@ -559,6 +798,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> > return ret;
> >
> > domain->context_id = ret;
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + domain->mmu->root->reg_backup[ret] = &root_pgtable[ret];
> > +#endif
> >
> > /*
> > * TTBR0
> > @@ -615,6 +857,9 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
> > ipmmu_ctx_write_root(domain, IMCTR, IMCTR_FLUSH);
> > ipmmu_tlb_sync(domain);
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + domain->mmu->root->reg_backup[domain->context_id] = NULL;
> > +#endif
> > ipmmu_domain_free_context(domain->mmu->root, domain->context_id);
> > }
> >
> > @@ -1427,6 +1672,14 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
> > }
> > #endif
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + if ( ipmmu_alloc_ctx_suspend(dev) )
> > + {
> > + dev_err(dev, "Failed to allocate context for suspend\n");
> > + return -ENOMEM;
> > + }
> > +#endif
>
> ... The initial version was based on the driver code without PCI
> support, but PCI support is now present. There is PCI-specific code
> above in this function (not visible in the context) that performs some
> initialization, allocation and device assignment. What I mean is that if
> the suspend context allocation fails here, we will need to undo those
> actions (i.e. deassign the device). I would move this context allocation
> (which is much less likely to fail than the PCI device setup) above the
> PCI-specific stuff, and perform the context freeing on the PCI error
> path.
Maybe it would be better just to add some checks to the suspend handler:
we could skip the suspend when a context is not available, and avoid
deallocating the previously allocated resources. This is similar to what
is done for the GIC.
What do you think? Or do you prefer to tear down everything related to
the IOMMU here on error?
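Something like the following is what I have in mind (untested sketch;
ipmmu_suspend_supported is just a made-up flag name):

/* Untested sketch: remember the failure instead of unwinding add_device() */
static bool ipmmu_suspend_supported = true;

/* in ipmmu_add_device(), instead of returning -ENOMEM: */
if ( ipmmu_alloc_ctx_suspend(dev) )
{
    dev_warn(dev, "Cannot allocate suspend context, S2R will not be supported\n");
    ipmmu_suspend_supported = false;
}

/* in ipmmu_suspend(): */
if ( !ipmmu_suspend_supported )
    return -ENOSYS; /* similar to the GIC when suspend is not implemented */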
> > [...]
Best regards,
Mykola
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2025-09-03 10:25 ` Mykola Kvach
@ 2025-09-03 11:49 ` Oleksandr Tyshchenko
2025-09-03 15:12 ` Mykola Kvach
0 siblings, 1 reply; 37+ messages in thread
From: Oleksandr Tyshchenko @ 2025-09-03 11:49 UTC (permalink / raw)
To: Mykola Kvach, Oleksandr Tyshchenko
Cc: xen-devel@lists.xenproject.org, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Mykola Kvach
On 03.09.25 13:25, Mykola Kvach wrote:
> Hi Oleksandr,
Hello Mykola
>
> On Wed, Sep 3, 2025 at 1:01 PM Oleksandr Tyshchenko
> <Oleksandr_Tyshchenko@epam.com> wrote:
>>
>>
>>
>> On 02.09.25 01:10, Mykola Kvach wrote:
>>
>> Hello Mykola
>>
>>
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> Store and restore active context and micro-TLB registers.
>>>
>>> Tested on R-Car H3 Starter Kit.
>>>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
>>> ---
>>> Changes in V6:
>>> - refactor code related to hw_register struct, from now it's called
>>> ipmmu_reg_ctx
>>
>> The updated version looks good, thanks. However, I have one
>> concern/request ...
>>
>>> ---
>>> xen/drivers/passthrough/arm/ipmmu-vmsa.c | 257 +++++++++++++++++++++++
>>> 1 file changed, 257 insertions(+)
>>>
>>> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
>>> index ea9fa9ddf3..0973559861 100644
>>> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
>>> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
>>> @@ -71,6 +71,8 @@
>>> })
>>> #endif
>>>
>>> +#define dev_dbg(dev, fmt, ...) \
>>> + dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
>>> #define dev_info(dev, fmt, ...) \
>>> dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
>>> #define dev_warn(dev, fmt, ...) \
>>> @@ -130,6 +132,24 @@ struct ipmmu_features {
>>> unsigned int imuctr_ttsel_mask;
>>> };
>>>
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> +
>>> +struct ipmmu_reg_ctx {
>>> + unsigned int imttlbr0;
>>> + unsigned int imttubr0;
>>> + unsigned int imttbcr;
>>> + unsigned int imctr;
>>> +};
>>> +
>>> +struct ipmmu_vmsa_backup {
>>> + struct device *dev;
>>> + unsigned int *utlbs_val;
>>> + unsigned int *asids_val;
>>> + struct list_head list;
>>> +};
>>> +
>>> +#endif
>>> +
>>> /* Root/Cache IPMMU device's information */
>>> struct ipmmu_vmsa_device {
>>> struct device *dev;
>>> @@ -142,6 +162,9 @@ struct ipmmu_vmsa_device {
>>> struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
>>> unsigned int utlb_refcount[IPMMU_UTLB_MAX];
>>> const struct ipmmu_features *features;
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> + struct ipmmu_reg_ctx *reg_backup[IPMMU_CTX_MAX];
>>> +#endif
>>> };
>>>
>>> /*
>>> @@ -547,6 +570,222 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
>>> spin_unlock_irqrestore(&mmu->lock, flags);
>>> }
>>>
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> +
>>> +static DEFINE_SPINLOCK(ipmmu_devices_backup_lock);
>>> +static LIST_HEAD(ipmmu_devices_backup);
>>> +
>>> +static struct ipmmu_reg_ctx root_pgtable[IPMMU_CTX_MAX];
>>> +
>>> +static uint32_t ipmmu_imuasid_read(struct ipmmu_vmsa_device *mmu,
>>> + unsigned int utlb)
>>> +{
>>> + return ipmmu_read(mmu, ipmmu_utlb_reg(mmu, IMUASID(utlb)));
>>> +}
>>> +
>>> +static void ipmmu_utlbs_backup(struct ipmmu_vmsa_device *mmu)
>>> +{
>>> + struct ipmmu_vmsa_backup *backup_data;
>>> +
>>> + dev_dbg(mmu->dev, "Handle micro-TLBs backup\n");
>>> +
>>> + spin_lock(&ipmmu_devices_backup_lock);
>>> +
>>> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
>>> + {
>>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
>>> + unsigned int i;
>>> +
>>> + if ( to_ipmmu(backup_data->dev) != mmu )
>>> + continue;
>>> +
>>> + for ( i = 0; i < fwspec->num_ids; i++ )
>>> + {
>>> + unsigned int utlb = fwspec->ids[i];
>>> +
>>> + backup_data->asids_val[i] = ipmmu_imuasid_read(mmu, utlb);
>>> + backup_data->utlbs_val[i] = ipmmu_imuctr_read(mmu, utlb);
>>> + }
>>> + }
>>> +
>>> + spin_unlock(&ipmmu_devices_backup_lock);
>>> +}
>>> +
>>> +static void ipmmu_utlbs_restore(struct ipmmu_vmsa_device *mmu)
>>> +{
>>> + struct ipmmu_vmsa_backup *backup_data;
>>> +
>>> + dev_dbg(mmu->dev, "Handle micro-TLBs restore\n");
>>> +
>>> + spin_lock(&ipmmu_devices_backup_lock);
>>> +
>>> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
>>> + {
>>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
>>> + unsigned int i;
>>> +
>>> + if ( to_ipmmu(backup_data->dev) != mmu )
>>> + continue;
>>> +
>>> + for ( i = 0; i < fwspec->num_ids; i++ )
>>> + {
>>> + unsigned int utlb = fwspec->ids[i];
>>> +
>>> + ipmmu_imuasid_write(mmu, utlb, backup_data->asids_val[i]);
>>> + ipmmu_imuctr_write(mmu, utlb, backup_data->utlbs_val[i]);
>>> + }
>>> + }
>>> +
>>> + spin_unlock(&ipmmu_devices_backup_lock);
>>> +}
>>> +
>>> +static void ipmmu_domain_backup_context(struct ipmmu_vmsa_domain *domain)
>>> +{
>>> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
>>> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
>>> +
>>> + dev_dbg(mmu->dev, "Handle domain context %u backup\n", domain->context_id);
>>> +
>>> + regs->imttlbr0 = ipmmu_ctx_read_root(domain, IMTTLBR0);
>>> + regs->imttubr0 = ipmmu_ctx_read_root(domain, IMTTUBR0);
>>> + regs->imttbcr = ipmmu_ctx_read_root(domain, IMTTBCR);
>>> + regs->imctr = ipmmu_ctx_read_root(domain, IMCTR);
>>> +}
>>> +
>>> +static void ipmmu_domain_restore_context(struct ipmmu_vmsa_domain *domain)
>>> +{
>>> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
>>> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
>>> +
>>> + dev_dbg(mmu->dev, "Handle domain context %u restore\n", domain->context_id);
>>> +
>>> + ipmmu_ctx_write_root(domain, IMTTLBR0, regs->imttlbr0);
>>> + ipmmu_ctx_write_root(domain, IMTTUBR0, regs->imttubr0);
>>> + ipmmu_ctx_write_root(domain, IMTTBCR, regs->imttbcr);
>>> + ipmmu_ctx_write_all(domain, IMCTR, regs->imctr | IMCTR_FLUSH);
>>> +}
>>> +
>>> +/*
>>> + * Xen: Unlike Linux implementation, Xen uses a single driver instance
>>> + * for handling all IPMMUs. There is no framework for ipmmu_suspend/resume
>>> + * callbacks to be invoked for each IPMMU device. So, we need to iterate
>>> + * through all registered IPMMUs performing required actions.
>>> + *
>>> + * Also take care of restoring special settings, such as translation
>>> + * table format, etc.
>>> + */
>>> +static int __must_check ipmmu_suspend(void)
>>> +{
>>> + struct ipmmu_vmsa_device *mmu;
>>> +
>>> + if ( !iommu_enabled )
>>> + return 0;
>>> +
>>> + printk(XENLOG_DEBUG "ipmmu: Suspending ...\n");
>>> +
>>> + spin_lock(&ipmmu_devices_lock);
>>> +
>>> + list_for_each_entry( mmu, &ipmmu_devices, list )
>>> + {
>>> + if ( ipmmu_is_root(mmu) )
>>> + {
>>> + unsigned int i;
>>> +
>>> + for ( i = 0; i < mmu->num_ctx; i++ )
>>> + {
>>> + if ( !mmu->domains[i] )
>>> + continue;
>>> + ipmmu_domain_backup_context(mmu->domains[i]);
>>> + }
>>> + }
>>> + else
>>> + ipmmu_utlbs_backup(mmu);
>>> + }
>>> +
>>> + spin_unlock(&ipmmu_devices_lock);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void ipmmu_resume(void)
>>> +{
>>> + struct ipmmu_vmsa_device *mmu;
>>> +
>>> + if ( !iommu_enabled )
>>> + return;
>>> +
>>> + printk(XENLOG_DEBUG "ipmmu: Resuming ...\n");
>>> +
>>> + spin_lock(&ipmmu_devices_lock);
>>> +
>>> + list_for_each_entry( mmu, &ipmmu_devices, list )
>>> + {
>>> + uint32_t reg;
>>> +
>>> + /* Do not use security group function */
>>> + reg = IMSCTLR + mmu->features->control_offset_base;
>>> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) & ~IMSCTLR_USE_SECGRP);
>>> +
>>> + if ( ipmmu_is_root(mmu) )
>>> + {
>>> + unsigned int i;
>>> +
>>> + /* Use stage 2 translation table format */
>>> + reg = IMSAUXCTLR + mmu->features->control_offset_base;
>>> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) | IMSAUXCTLR_S2PTE);
>>> +
>>> + for ( i = 0; i < mmu->num_ctx; i++ )
>>> + {
>>> + if ( !mmu->domains[i] )
>>> + continue;
>>> + ipmmu_domain_restore_context(mmu->domains[i]);
>>> + }
>>> + }
>>> + else
>>> + ipmmu_utlbs_restore(mmu);
>>> + }
>>> +
>>> + spin_unlock(&ipmmu_devices_lock);
>>> +}
>>> +
>>> +static int ipmmu_alloc_ctx_suspend(struct device *dev)
>>> +{
>>> + struct ipmmu_vmsa_backup *backup_data;
>>> + unsigned int *utlbs_val, *asids_val;
>>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>>> +
>>> + utlbs_val = xzalloc_array(unsigned int, fwspec->num_ids);
>>> + if ( !utlbs_val )
>>> + return -ENOMEM;
>>> +
>>> + asids_val = xzalloc_array(unsigned int, fwspec->num_ids);
>>> + if ( !asids_val )
>>> + {
>>> + xfree(utlbs_val);
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + backup_data = xzalloc(struct ipmmu_vmsa_backup);
>>> + if ( !backup_data )
>>> + {
>>> + xfree(utlbs_val);
>>> + xfree(asids_val);
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + backup_data->dev = dev;
>>> + backup_data->utlbs_val = utlbs_val;
>>> + backup_data->asids_val = asids_val;
>>> +
>>> + spin_lock(&ipmmu_devices_backup_lock);
>>> + list_add(&backup_data->list, &ipmmu_devices_backup);
>>> + spin_unlock(&ipmmu_devices_backup_lock);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +#endif /* CONFIG_SYSTEM_SUSPEND */
>>> +
>>> static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>>> {
>>> uint64_t ttbr;
>>> @@ -559,6 +798,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>>> return ret;
>>>
>>> domain->context_id = ret;
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> + domain->mmu->root->reg_backup[ret] = &root_pgtable[ret];
>>> +#endif
>>>
>>> /*
>>> * TTBR0
>>> @@ -615,6 +857,9 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
>>> ipmmu_ctx_write_root(domain, IMCTR, IMCTR_FLUSH);
>>> ipmmu_tlb_sync(domain);
>>>
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> + domain->mmu->root->reg_backup[domain->context_id] = NULL;
>>> +#endif
>>> ipmmu_domain_free_context(domain->mmu->root, domain->context_id);
>>> }
>>>
>>> @@ -1427,6 +1672,14 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
>>> }
>>> #endif
>>>
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> + if ( ipmmu_alloc_ctx_suspend(dev) )
>>> + {
>>> + dev_err(dev, "Failed to allocate context for suspend\n");
>>> + return -ENOMEM;
>>> + }
>>> +#endif
>>
>> ... The initial version was based on the driver code without PCI
>> support, but it is now present. There is PCI-specific code above in this
>> function (not visible in the context) that performs some initialization,
>> allocation and device assignment. What I mean is that in case of the
>> suspend context allocation error here, we will need to undo these
>> actions (i.e. deassign device). I would move this context allocation
>> (whose probability to fail is much lower than what is done for PCI dev)
>> above the PCI-specific stuff, and perform the context freeing on the
>> error path.
>
> Maybe it would be better just to add some checks to the suspend handler.
> We could skip suspend in case the context is not available, and avoid
> deallocating previously allocated stuff. This is similar to what is
> done for GICs.
>
> What do you think? Or do you prefer to destroy everything related to the
> IOMMU here on error?
I would prefer that we fail early here in ipmmu_add_device (and roll back
the changes) rather than continue and fail later; other people might think
differently. I think that if we cannot even allocate memory for these
structures, the situation is already bad.
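To illustrate the intended ordering (just a sketch, not actual driver code;
ipmmu_free_ctx_suspend() is a hypothetical counterpart of
ipmmu_alloc_ctx_suspend()):

    static void ipmmu_free_ctx_suspend(struct device *dev)
    {
        struct ipmmu_vmsa_backup *backup_data, *tmp;

        spin_lock(&ipmmu_devices_backup_lock);

        list_for_each_entry_safe( backup_data, tmp, &ipmmu_devices_backup, list )
        {
            if ( backup_data->dev != dev )
                continue;

            list_del(&backup_data->list);
            xfree(backup_data->utlbs_val);
            xfree(backup_data->asids_val);
            xfree(backup_data);
            break;
        }

        spin_unlock(&ipmmu_devices_backup_lock);
    }

    /*
     * In ipmmu_add_device(), allocate the suspend context before the
     * PCI-specific setup and free it if that later setup fails:
     *
     *     ret = ipmmu_alloc_ctx_suspend(dev);
     *     if ( ret )
     *         return ret;
     *
     *     ret = ...PCI-specific init/assignment...;
     *     if ( ret )
     *     {
     *         ipmmu_free_ctx_suspend(dev);
     *         return ret;
     *     }
     */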
>
>>
>>> +
>>> dev_info(dev, "Added master device (IPMMU %s micro-TLBs %u)\n",
>>> dev_name(fwspec->iommu_dev), fwspec->num_ids);
>>>
>>> @@ -1492,6 +1745,10 @@ static const struct iommu_ops ipmmu_iommu_ops =
>>> .unmap_page = arm_iommu_unmap_page,
>>> .dt_xlate = ipmmu_dt_xlate,
>>> .add_device = ipmmu_add_device,
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> + .suspend = ipmmu_suspend,
>>> + .resume = ipmmu_resume,
>>> +#endif
>>> };
>>>
>>> static __init int ipmmu_init(struct dt_device_node *node, const void *data)
>
> Best regards,
> Mykola
>
* Re: [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2025-09-03 11:49 ` Oleksandr Tyshchenko
@ 2025-09-03 15:12 ` Mykola Kvach
0 siblings, 0 replies; 37+ messages in thread
From: Mykola Kvach @ 2025-09-03 15:12 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: Oleksandr Tyshchenko, xen-devel@lists.xenproject.org,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Mykola Kvach
On Wed, Sep 3, 2025 at 2:49 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
>
> On 03.09.25 13:25, Mykola Kvach wrote:
> > Hi Oleksandr,
>
> Hello Mykola
>
> >
> > On Wed, Sep 3, 2025 at 1:01 PM Oleksandr Tyshchenko
> > <Oleksandr_Tyshchenko@epam.com> wrote:
> >>
> >>
> >>
> >> On 02.09.25 01:10, Mykola Kvach wrote:
> >>
> >> Hello Mykola
> >>
> >>
> >>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >>>
> >>> Store and restore active context and micro-TLB registers.
> >>>
> >>> Tested on R-Car H3 Starter Kit.
> >>>
> >>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> >>> ---
> >>> Changes in V6:
> >>> - refactor code related to hw_register struct, from now it's called
> >>> ipmmu_reg_ctx
> >>
> >> The updated version looks good, thanks. However, I have one
> >> concern/request ...
> >>
> >>> ---
> >>> xen/drivers/passthrough/arm/ipmmu-vmsa.c | 257 +++++++++++++++++++++++
> >>> 1 file changed, 257 insertions(+)
> >>>
> >>> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> >>> index ea9fa9ddf3..0973559861 100644
> >>> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> >>> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> >>> @@ -71,6 +71,8 @@
> >>> })
> >>> #endif
> >>>
> >>> +#define dev_dbg(dev, fmt, ...) \
> >>> + dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
> >>> #define dev_info(dev, fmt, ...) \
> >>> dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
> >>> #define dev_warn(dev, fmt, ...) \
> >>> @@ -130,6 +132,24 @@ struct ipmmu_features {
> >>> unsigned int imuctr_ttsel_mask;
> >>> };
> >>>
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> +
> >>> +struct ipmmu_reg_ctx {
> >>> + unsigned int imttlbr0;
> >>> + unsigned int imttubr0;
> >>> + unsigned int imttbcr;
> >>> + unsigned int imctr;
> >>> +};
> >>> +
> >>> +struct ipmmu_vmsa_backup {
> >>> + struct device *dev;
> >>> + unsigned int *utlbs_val;
> >>> + unsigned int *asids_val;
> >>> + struct list_head list;
> >>> +};
> >>> +
> >>> +#endif
> >>> +
> >>> /* Root/Cache IPMMU device's information */
> >>> struct ipmmu_vmsa_device {
> >>> struct device *dev;
> >>> @@ -142,6 +162,9 @@ struct ipmmu_vmsa_device {
> >>> struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
> >>> unsigned int utlb_refcount[IPMMU_UTLB_MAX];
> >>> const struct ipmmu_features *features;
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> + struct ipmmu_reg_ctx *reg_backup[IPMMU_CTX_MAX];
> >>> +#endif
> >>> };
> >>>
> >>> /*
> >>> @@ -547,6 +570,222 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
> >>> spin_unlock_irqrestore(&mmu->lock, flags);
> >>> }
> >>>
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> +
> >>> +static DEFINE_SPINLOCK(ipmmu_devices_backup_lock);
> >>> +static LIST_HEAD(ipmmu_devices_backup);
> >>> +
> >>> +static struct ipmmu_reg_ctx root_pgtable[IPMMU_CTX_MAX];
> >>> +
> >>> +static uint32_t ipmmu_imuasid_read(struct ipmmu_vmsa_device *mmu,
> >>> + unsigned int utlb)
> >>> +{
> >>> + return ipmmu_read(mmu, ipmmu_utlb_reg(mmu, IMUASID(utlb)));
> >>> +}
> >>> +
> >>> +static void ipmmu_utlbs_backup(struct ipmmu_vmsa_device *mmu)
> >>> +{
> >>> + struct ipmmu_vmsa_backup *backup_data;
> >>> +
> >>> + dev_dbg(mmu->dev, "Handle micro-TLBs backup\n");
> >>> +
> >>> + spin_lock(&ipmmu_devices_backup_lock);
> >>> +
> >>> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> >>> + {
> >>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> >>> + unsigned int i;
> >>> +
> >>> + if ( to_ipmmu(backup_data->dev) != mmu )
> >>> + continue;
> >>> +
> >>> + for ( i = 0; i < fwspec->num_ids; i++ )
> >>> + {
> >>> + unsigned int utlb = fwspec->ids[i];
> >>> +
> >>> + backup_data->asids_val[i] = ipmmu_imuasid_read(mmu, utlb);
> >>> + backup_data->utlbs_val[i] = ipmmu_imuctr_read(mmu, utlb);
> >>> + }
> >>> + }
> >>> +
> >>> + spin_unlock(&ipmmu_devices_backup_lock);
> >>> +}
> >>> +
> >>> +static void ipmmu_utlbs_restore(struct ipmmu_vmsa_device *mmu)
> >>> +{
> >>> + struct ipmmu_vmsa_backup *backup_data;
> >>> +
> >>> + dev_dbg(mmu->dev, "Handle micro-TLBs restore\n");
> >>> +
> >>> + spin_lock(&ipmmu_devices_backup_lock);
> >>> +
> >>> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> >>> + {
> >>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> >>> + unsigned int i;
> >>> +
> >>> + if ( to_ipmmu(backup_data->dev) != mmu )
> >>> + continue;
> >>> +
> >>> + for ( i = 0; i < fwspec->num_ids; i++ )
> >>> + {
> >>> + unsigned int utlb = fwspec->ids[i];
> >>> +
> >>> + ipmmu_imuasid_write(mmu, utlb, backup_data->asids_val[i]);
> >>> + ipmmu_imuctr_write(mmu, utlb, backup_data->utlbs_val[i]);
> >>> + }
> >>> + }
> >>> +
> >>> + spin_unlock(&ipmmu_devices_backup_lock);
> >>> +}
> >>> +
> >>> +static void ipmmu_domain_backup_context(struct ipmmu_vmsa_domain *domain)
> >>> +{
> >>> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> >>> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> >>> +
> >>> + dev_dbg(mmu->dev, "Handle domain context %u backup\n", domain->context_id);
> >>> +
> >>> + regs->imttlbr0 = ipmmu_ctx_read_root(domain, IMTTLBR0);
> >>> + regs->imttubr0 = ipmmu_ctx_read_root(domain, IMTTUBR0);
> >>> + regs->imttbcr = ipmmu_ctx_read_root(domain, IMTTBCR);
> >>> + regs->imctr = ipmmu_ctx_read_root(domain, IMCTR);
> >>> +}
> >>> +
> >>> +static void ipmmu_domain_restore_context(struct ipmmu_vmsa_domain *domain)
> >>> +{
> >>> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> >>> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> >>> +
> >>> + dev_dbg(mmu->dev, "Handle domain context %u restore\n", domain->context_id);
> >>> +
> >>> + ipmmu_ctx_write_root(domain, IMTTLBR0, regs->imttlbr0);
> >>> + ipmmu_ctx_write_root(domain, IMTTUBR0, regs->imttubr0);
> >>> + ipmmu_ctx_write_root(domain, IMTTBCR, regs->imttbcr);
> >>> + ipmmu_ctx_write_all(domain, IMCTR, regs->imctr | IMCTR_FLUSH);
> >>> +}
> >>> +
> >>> +/*
> >>> + * Xen: Unlike Linux implementation, Xen uses a single driver instance
> >>> + * for handling all IPMMUs. There is no framework for ipmmu_suspend/resume
> >>> + * callbacks to be invoked for each IPMMU device. So, we need to iterate
> >>> + * through all registered IPMMUs performing required actions.
> >>> + *
> >>> + * Also take care of restoring special settings, such as translation
> >>> + * table format, etc.
> >>> + */
> >>> +static int __must_check ipmmu_suspend(void)
> >>> +{
> >>> + struct ipmmu_vmsa_device *mmu;
> >>> +
> >>> + if ( !iommu_enabled )
> >>> + return 0;
> >>> +
> >>> + printk(XENLOG_DEBUG "ipmmu: Suspending ...\n");
> >>> +
> >>> + spin_lock(&ipmmu_devices_lock);
> >>> +
> >>> + list_for_each_entry( mmu, &ipmmu_devices, list )
> >>> + {
> >>> + if ( ipmmu_is_root(mmu) )
> >>> + {
> >>> + unsigned int i;
> >>> +
> >>> + for ( i = 0; i < mmu->num_ctx; i++ )
> >>> + {
> >>> + if ( !mmu->domains[i] )
> >>> + continue;
> >>> + ipmmu_domain_backup_context(mmu->domains[i]);
> >>> + }
> >>> + }
> >>> + else
> >>> + ipmmu_utlbs_backup(mmu);
> >>> + }
> >>> +
> >>> + spin_unlock(&ipmmu_devices_lock);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static void ipmmu_resume(void)
> >>> +{
> >>> + struct ipmmu_vmsa_device *mmu;
> >>> +
> >>> + if ( !iommu_enabled )
> >>> + return;
> >>> +
> >>> + printk(XENLOG_DEBUG "ipmmu: Resuming ...\n");
> >>> +
> >>> + spin_lock(&ipmmu_devices_lock);
> >>> +
> >>> + list_for_each_entry( mmu, &ipmmu_devices, list )
> >>> + {
> >>> + uint32_t reg;
> >>> +
> >>> + /* Do not use security group function */
> >>> + reg = IMSCTLR + mmu->features->control_offset_base;
> >>> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) & ~IMSCTLR_USE_SECGRP);
> >>> +
> >>> + if ( ipmmu_is_root(mmu) )
> >>> + {
> >>> + unsigned int i;
> >>> +
> >>> + /* Use stage 2 translation table format */
> >>> + reg = IMSAUXCTLR + mmu->features->control_offset_base;
> >>> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) | IMSAUXCTLR_S2PTE);
> >>> +
> >>> + for ( i = 0; i < mmu->num_ctx; i++ )
> >>> + {
> >>> + if ( !mmu->domains[i] )
> >>> + continue;
> >>> + ipmmu_domain_restore_context(mmu->domains[i]);
> >>> + }
> >>> + }
> >>> + else
> >>> + ipmmu_utlbs_restore(mmu);
> >>> + }
> >>> +
> >>> + spin_unlock(&ipmmu_devices_lock);
> >>> +}
> >>> +
> >>> +static int ipmmu_alloc_ctx_suspend(struct device *dev)
> >>> +{
> >>> + struct ipmmu_vmsa_backup *backup_data;
> >>> + unsigned int *utlbs_val, *asids_val;
> >>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> >>> +
> >>> + utlbs_val = xzalloc_array(unsigned int, fwspec->num_ids);
> >>> + if ( !utlbs_val )
> >>> + return -ENOMEM;
> >>> +
> >>> + asids_val = xzalloc_array(unsigned int, fwspec->num_ids);
> >>> + if ( !asids_val )
> >>> + {
> >>> + xfree(utlbs_val);
> >>> + return -ENOMEM;
> >>> + }
> >>> +
> >>> + backup_data = xzalloc(struct ipmmu_vmsa_backup);
> >>> + if ( !backup_data )
> >>> + {
> >>> + xfree(utlbs_val);
> >>> + xfree(asids_val);
> >>> + return -ENOMEM;
> >>> + }
> >>> +
> >>> + backup_data->dev = dev;
> >>> + backup_data->utlbs_val = utlbs_val;
> >>> + backup_data->asids_val = asids_val;
> >>> +
> >>> + spin_lock(&ipmmu_devices_backup_lock);
> >>> + list_add(&backup_data->list, &ipmmu_devices_backup);
> >>> + spin_unlock(&ipmmu_devices_backup_lock);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +#endif /* CONFIG_SYSTEM_SUSPEND */
> >>> +
> >>> static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> >>> {
> >>> uint64_t ttbr;
> >>> @@ -559,6 +798,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> >>> return ret;
> >>>
> >>> domain->context_id = ret;
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> + domain->mmu->root->reg_backup[ret] = &root_pgtable[ret];
> >>> +#endif
> >>>
> >>> /*
> >>> * TTBR0
> >>> @@ -615,6 +857,9 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
> >>> ipmmu_ctx_write_root(domain, IMCTR, IMCTR_FLUSH);
> >>> ipmmu_tlb_sync(domain);
> >>>
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> + domain->mmu->root->reg_backup[domain->context_id] = NULL;
> >>> +#endif
> >>> ipmmu_domain_free_context(domain->mmu->root, domain->context_id);
> >>> }
> >>>
> >>> @@ -1427,6 +1672,14 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
> >>> }
> >>> #endif
> >>>
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> + if ( ipmmu_alloc_ctx_suspend(dev) )
> >>> + {
> >>> + dev_err(dev, "Failed to allocate context for suspend\n");
> >>> + return -ENOMEM;
> >>> + }
> >>> +#endif
> >>
> >> ... The initial version was based on the driver code without PCI
> >> support, but it is now present. There is PCI-specific code above in this
> >> function (not visible in the context) that performs some initialization,
> >> allocation and device assignment. What I mean is that in case of the
> >> suspend context allocation error here, we will need to undo these
> >> actions (i.e. deassign device). I would move this context allocation
> >> (whose probability to fail is much lower than what is done for PCI dev)
> >> above the PCI-specific stuff, and perform the context freeing on the
> >> error path.
> >
> > Maybe it would be better just to add some checks to the suspend handler.
> > We could skip suspend in case the context is not available, and avoid
> > deallocating previously allocated stuff. This is similar to what is
> > done for GICs.
> >
> > What do you think? Or do you prefer to destroy everything related to the
> > IOMMU here on error?
>
> I would prefer that we fail early here in ipmmu_add_device (and roll back
> the changes) rather than continue and fail later; other people might think
> differently. I think that if we cannot even allocate memory for these
> structures, the situation is already bad.
Got it, I’ll fix this in the next version of the patch series.
Thank you for pointing that out.
>
>
>
> >
> >>
> >>> +
> >>> dev_info(dev, "Added master device (IPMMU %s micro-TLBs %u)\n",
> >>> dev_name(fwspec->iommu_dev), fwspec->num_ids);
> >>>
> >>> @@ -1492,6 +1745,10 @@ static const struct iommu_ops ipmmu_iommu_ops =
> >>> .unmap_page = arm_iommu_unmap_page,
> >>> .dt_xlate = ipmmu_dt_xlate,
> >>> .add_device = ipmmu_add_device,
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> + .suspend = ipmmu_suspend,
> >>> + .resume = ipmmu_resume,
> >>> +#endif
> >>> };
> >>>
> >>> static __init int ipmmu_init(struct dt_device_node *node, const void *data)
> >
> > Best regards,
> > Mykola
> >
>
Best regards,
Mykola
end of thread
Thread overview: 37+ messages
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 01/13] xen/arm: Add suspend and resume timer helpers Mykola Kvach
2025-09-02 20:14 ` Volodymyr Babchuk
2025-09-01 22:10 ` [PATCH v6 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions Mykola Kvach
2025-09-02 20:24 ` Volodymyr Babchuk
2025-09-01 22:10 ` [PATCH v6 03/13] xen/arm: gic-v3: Implement GICv3 " Mykola Kvach
2025-09-02 16:08 ` Oleksandr Tyshchenko
2025-09-02 17:30 ` Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 04/13] xen/arm: Don't release IRQs on suspend Mykola Kvach
2025-09-02 20:31 ` Volodymyr Babchuk
2025-09-01 22:10 ` [PATCH v6 05/13] xen/arm: irq: avoid local IRQ descriptors reinit on system resume Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during " Mykola Kvach
2025-09-02 16:49 ` Oleksandr Tyshchenko
2025-09-02 17:43 ` Mykola Kvach
2025-09-02 18:16 ` Oleksandr Tyshchenko
2025-09-02 20:08 ` Mykola Kvach
2025-09-02 20:19 ` Mykola Kvach
2025-09-02 22:21 ` Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks Mykola Kvach
2025-09-02 20:39 ` Volodymyr Babchuk
2025-09-03 10:01 ` Oleksandr Tyshchenko
2025-09-03 10:25 ` Mykola Kvach
2025-09-03 11:49 ` Oleksandr Tyshchenko
2025-09-03 15:12 ` Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 08/13] xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface) Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 09/13] xen/arm: Resume memory management on Xen resume Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 10/13] xen/arm: Save/restore context on suspend/resume Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
2025-09-02 5:56 ` Mykola Kvach
2025-09-02 14:33 ` Jan Beulich
2025-09-03 4:31 ` Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume Mykola Kvach
2025-09-02 17:25 ` Oleksandr Tyshchenko
2025-09-02 17:46 ` Mykola Kvach
2025-09-02 20:51 ` Volodymyr Babchuk
2025-09-01 22:10 ` [PATCH v6 13/13] xen/arm: gic-v3: Add suspend/resume support for eSPI registers Mykola Kvach
2025-09-02 20:48 ` [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Volodymyr Babchuk