* [PATCH v6 01/13] xen/arm: Add suspend and resume timer helpers
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-02 20:14 ` Volodymyr Babchuk
2025-09-12 23:04 ` Julien Grall
2025-09-01 22:10 ` [PATCH v6 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions Mykola Kvach
` (12 subsequent siblings)
13 siblings, 2 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC
To: xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi,
Mykola Kvach
From: Mirela Simonovic <mirela.simonovic@aggios.com>
Timer interrupts must be disabled while the system is suspended to prevent
spurious wake-ups. Suspending the timers involves disabling both the EL1
physical timer and the EL2 hypervisor timer. Resuming consists of raising
the TIMER_SOFTIRQ, which prompts the generic timer code to reprogram the
EL2 timer as needed. Re-enabling of the EL1 timer is left to the entity
that uses it.
Introduce a new helper, disable_physical_timers, to encapsulate disabling
of the physical timers.
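The suspend/resume split described above can be modelled in plain C. This is only a sketch under stated assumptions: `struct mock_cpu`, `mock_time_suspend()`, `mock_time_resume()` and `mock_do_softirq()` are hypothetical stand-ins rather than Xen code, and the real softirq and timer layers are considerably more involved. It illustrates why the resume path only needs to raise a softirq: the deadline is known to the generic layer, not to the arch helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_TIMER_SOFTIRQ 0u  /* hypothetical softirq number */

/* Hypothetical per-CPU state: pending softirqs plus the hypervisor timer. */
struct mock_cpu {
    uint32_t softirq_pending;
    int      hyp_timer_enabled;
    uint64_t hyp_timer_deadline;
};

/* Models time_suspend(): the timer is switched off, nothing else is saved. */
static void mock_time_suspend(struct mock_cpu *c)
{
    c->hyp_timer_enabled = 0;
}

/* Models time_resume(): no deadline is known here, so only flag the softirq. */
static void mock_time_resume(struct mock_cpu *c)
{
    c->softirq_pending |= 1u << MOCK_TIMER_SOFTIRQ;
}

/* Models the generic layer, which knows the deadlines and reprograms. */
static void mock_do_softirq(struct mock_cpu *c,
                            const uint64_t *deadlines, size_t n)
{
    if ( !(c->softirq_pending & (1u << MOCK_TIMER_SOFTIRQ)) )
        return;

    c->softirq_pending &= ~(1u << MOCK_TIMER_SOFTIRQ);
    if ( n == 0 )
        return;

    uint64_t earliest = deadlines[0];
    for ( size_t i = 1; i < n; i++ )
        if ( deadlines[i] < earliest )
            earliest = deadlines[i];

    c->hyp_timer_deadline = earliest;
    c->hyp_timer_enabled = 1;
}
```

In this model, calling `mock_do_softirq()` before `mock_time_resume()` leaves the timer off, which mirrors the point that nothing rearms the hypervisor timer until the softirq is raised.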
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V4:
- Rephrased comment and commit message for better clarity
- Created separate function for disabling physical timers
Changes in V3:
- time_suspend and time_resume are now conditionally compiled
under CONFIG_SYSTEM_SUSPEND
---
xen/arch/arm/include/asm/time.h | 5 +++++
xen/arch/arm/time.c | 38 +++++++++++++++++++++++++++------
2 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/xen/arch/arm/include/asm/time.h b/xen/arch/arm/include/asm/time.h
index 49ad8c1a6d..f4fd0c6af5 100644
--- a/xen/arch/arm/include/asm/time.h
+++ b/xen/arch/arm/include/asm/time.h
@@ -108,6 +108,11 @@ void preinit_xen_time(void);
void force_update_vcpu_system_time(struct vcpu *v);
+#ifdef CONFIG_SYSTEM_SUSPEND
+void time_suspend(void);
+void time_resume(void);
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
#endif /* __ARM_TIME_H__ */
/*
* Local variables:
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index e74d30d258..ad984fdfdd 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -303,6 +303,14 @@ static void check_timer_irq_cfg(unsigned int irq, const char *which)
"WARNING: %s-timer IRQ%u is not level triggered.\n", which, irq);
}
+/* Disable physical timers for EL1 and EL2 on the current CPU */
+static inline void disable_physical_timers(void)
+{
+ WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
+ WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
+ isb();
+}
+
/* Set up the timer interrupt on this CPU */
void init_timer_interrupt(void)
{
@@ -310,9 +318,7 @@ void init_timer_interrupt(void)
WRITE_SYSREG64(0, CNTVOFF_EL2); /* No VM-specific offset */
/* Do not let the VMs program the physical timer, only read the physical counter */
WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
- WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
- WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
- isb();
+ disable_physical_timers();
request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
"hyptimer", NULL);
@@ -330,9 +336,7 @@ void init_timer_interrupt(void)
*/
static void deinit_timer_interrupt(void)
{
- WRITE_SYSREG(0, CNTP_CTL_EL0); /* Disable physical timer */
- WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Disable hypervisor's timer */
- isb();
+ disable_physical_timers();
release_irq(timer_irq[TIMER_HYP_PPI], NULL);
release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
@@ -372,6 +376,28 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
/* XXX update guest visible wallclock time */
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+void time_suspend(void)
+{
+ disable_physical_timers();
+}
+
+void time_resume(void)
+{
+ /*
+ * Raising the timer softirq triggers generic code to call reprogram_timer
+ * with the correct timeout (not known here).
+ *
+ * No further action is needed to restore timekeeping after power down,
+ * since the system counter is unaffected. See ARM DDI 0487 L.a, D12.1.2
+ * "The system counter must be implemented in an always-on power domain."
+ */
+ raise_softirq(TIMER_SOFTIRQ);
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
static int cpu_time_callback(struct notifier_block *nfb,
unsigned long action,
void *hcpu)
--
2.48.1
* Re: [PATCH v6 01/13] xen/arm: Add suspend and resume timer helpers
2025-09-01 22:10 ` [PATCH v6 01/13] xen/arm: Add suspend and resume timer helpers Mykola Kvach
@ 2025-09-02 20:14 ` Volodymyr Babchuk
2025-09-12 23:04 ` Julien Grall
1 sibling, 0 replies; 49+ messages in thread
From: Volodymyr Babchuk @ 2025-09-02 20:14 UTC
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mirela Simonovic,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Saeed Nowshadi, Mykola Kvach
Hi,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> Timer interrupts must be disabled while the system is suspended to prevent
> spurious wake-ups. Suspending the timers involves disabling both the EL1
> physical timer and the EL2 hypervisor timer. Resuming consists of raising
> the TIMER_SOFTIRQ, which prompts the generic timer code to reprogram the
> EL2 timer as needed. Re-enabling of the EL1 timer is left to the entity
> that uses it.
>
> Introduce a new helper, disable_physical_timers, to encapsulate disabling
> of the physical timers.
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
> Changes in V4:
> - Rephrased comment and commit message for better clarity
> - Created separate function for disabling physical timers
>
> Changes in V3:
> - time_suspend and time_resume are now conditionally compiled
> under CONFIG_SYSTEM_SUSPEND
> ---
> xen/arch/arm/include/asm/time.h | 5 +++++
> xen/arch/arm/time.c | 38 +++++++++++++++++++++++++++------
> 2 files changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/xen/arch/arm/include/asm/time.h b/xen/arch/arm/include/asm/time.h
> index 49ad8c1a6d..f4fd0c6af5 100644
> --- a/xen/arch/arm/include/asm/time.h
> +++ b/xen/arch/arm/include/asm/time.h
> @@ -108,6 +108,11 @@ void preinit_xen_time(void);
>
> void force_update_vcpu_system_time(struct vcpu *v);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +void time_suspend(void);
> +void time_resume(void);
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> #endif /* __ARM_TIME_H__ */
> /*
> * Local variables:
> diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> index e74d30d258..ad984fdfdd 100644
> --- a/xen/arch/arm/time.c
> +++ b/xen/arch/arm/time.c
> @@ -303,6 +303,14 @@ static void check_timer_irq_cfg(unsigned int irq, const char *which)
> "WARNING: %s-timer IRQ%u is not level triggered.\n", which, irq);
> }
>
> +/* Disable physical timers for EL1 and EL2 on the current CPU */
> +static inline void disable_physical_timers(void)
> +{
> + WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
> + WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
> + isb();
> +}
> +
> /* Set up the timer interrupt on this CPU */
> void init_timer_interrupt(void)
> {
> @@ -310,9 +318,7 @@ void init_timer_interrupt(void)
> WRITE_SYSREG64(0, CNTVOFF_EL2); /* No VM-specific offset */
> /* Do not let the VMs program the physical timer, only read the physical counter */
> WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
> - WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
> - WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
> - isb();
> + disable_physical_timers();
>
> request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
> "hyptimer", NULL);
> @@ -330,9 +336,7 @@ void init_timer_interrupt(void)
> */
> static void deinit_timer_interrupt(void)
> {
> - WRITE_SYSREG(0, CNTP_CTL_EL0); /* Disable physical timer */
> - WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Disable hypervisor's timer */
> - isb();
> + disable_physical_timers();
>
> release_irq(timer_irq[TIMER_HYP_PPI], NULL);
> release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
> @@ -372,6 +376,28 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
> /* XXX update guest visible wallclock time */
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +void time_suspend(void)
> +{
> + disable_physical_timers();
> +}
> +
> +void time_resume(void)
> +{
> + /*
> + * Raising the timer softirq triggers generic code to call reprogram_timer
> + * with the correct timeout (not known here).
> + *
> + * No further action is needed to restore timekeeping after power down,
> + * since the system counter is unaffected. See ARM DDI 0487 L.a, D12.1.2
> + * "The system counter must be implemented in an always-on power domain."
> + */
> + raise_softirq(TIMER_SOFTIRQ);
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> static int cpu_time_callback(struct notifier_block *nfb,
> unsigned long action,
> void *hcpu)
--
WBR, Volodymyr
* Re: [PATCH v6 01/13] xen/arm: Add suspend and resume timer helpers
2025-09-01 22:10 ` [PATCH v6 01/13] xen/arm: Add suspend and resume timer helpers Mykola Kvach
2025-09-02 20:14 ` Volodymyr Babchuk
@ 2025-09-12 23:04 ` Julien Grall
2025-11-21 8:10 ` Mykola Kvach
1 sibling, 1 reply; 49+ messages in thread
From: Julien Grall @ 2025-09-12 23:04 UTC
To: Mykola Kvach, xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi, Mykola Kvach
Hi Mykola,
On 01/09/2025 23:10, Mykola Kvach wrote:
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> Timer interrupts must be disabled while the system is suspended to prevent
> spurious wake-ups.
Yet, you don't seem to disable the virtual interrupt. Can you explain why?
> Suspending the timers involves disabling both the EL1
> physical timer and the EL2 hypervisor timer.
I know this is what Arm is naming the timers. But I would rather s/EL1//
and s/EL2// because the physical timer is also accessible from EL0.
Note that Xen doesn't use or expose the physical timer. So it should
always be disabled at the point "time_suspend()" is called. I am still
ok to disable it just in case though.
> Resuming consists of raising
> the TIMER_SOFTIRQ, which prompts the generic timer code to reprogram the
> EL2 timer as needed. Re-enabling of the EL1 timer is left to the entity
> that uses it.
>
> Introduce a new helper, disable_physical_timers, to encapsulate disabling
> of the physical timers.
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in V4:
> - Rephrased comment and commit message for better clarity
> - Created separate function for disabling physical timers
>
> Changes in V3:
> - time_suspend and time_resume are now conditionally compiled
> under CONFIG_SYSTEM_SUSPEND
> ---
> xen/arch/arm/include/asm/time.h | 5 +++++
> xen/arch/arm/time.c | 38 +++++++++++++++++++++++++++------
> 2 files changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/xen/arch/arm/include/asm/time.h b/xen/arch/arm/include/asm/time.h
> index 49ad8c1a6d..f4fd0c6af5 100644
> --- a/xen/arch/arm/include/asm/time.h
> +++ b/xen/arch/arm/include/asm/time.h
> @@ -108,6 +108,11 @@ void preinit_xen_time(void);
>
> void force_update_vcpu_system_time(struct vcpu *v);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +void time_suspend(void);
> +void time_resume(void);
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> #endif /* __ARM_TIME_H__ */
> /*
> * Local variables:
> diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> index e74d30d258..ad984fdfdd 100644
> --- a/xen/arch/arm/time.c
> +++ b/xen/arch/arm/time.c
> @@ -303,6 +303,14 @@ static void check_timer_irq_cfg(unsigned int irq, const char *which)
> "WARNING: %s-timer IRQ%u is not level triggered.\n", which, irq);
> }
>
> +/* Disable physical timers for EL1 and EL2 on the current CPU */
The names of the timers are "physical timer" and "hypervisor timer".
> +static inline void disable_physical_timers(void)
"Physical" is a bit misleading in this context. All three timers
(virtual, physical, hypervisor) are physical timers. My preference would
be to name this function disable_timers() (assuming you also need to
disable the virtual timer).
> +{
> + WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
> + WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
> + isb();
> +}
> +
> /* Set up the timer interrupt on this CPU */
> void init_timer_interrupt(void)
> {
> @@ -310,9 +318,7 @@ void init_timer_interrupt(void)
> WRITE_SYSREG64(0, CNTVOFF_EL2); /* No VM-specific offset */
> /* Do not let the VMs program the physical timer, only read the physical counter */
> WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
> - WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
> - WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
> - isb();
> + disable_physical_timers();
>
> request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
> "hyptimer", NULL);
> @@ -330,9 +336,7 @@ void init_timer_interrupt(void)
> */
> static void deinit_timer_interrupt(void)
> {
> - WRITE_SYSREG(0, CNTP_CTL_EL0); /* Disable physical timer */
> - WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Disable hypervisor's timer */
> - isb();
> + disable_physical_timers();
>
> release_irq(timer_irq[TIMER_HYP_PPI], NULL);
> release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
> @@ -372,6 +376,28 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
> /* XXX update guest visible wallclock time */
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +void time_suspend(void)
> +{
> + disable_physical_timers();
> +}
> +
> +void time_resume(void)
> +{
> + /*
> + * Raising the timer softirq triggers generic code to call reprogram_timer
> + * with the correct timeout (not known here).
> + *
> + * No further action is needed to restore timekeeping after power down,
> + * since the system counter is unaffected. See ARM DDI 0487 L.a, D12.1.2
> + * "The system counter must be implemented in an always-on power domain."
> + */
> + raise_softirq(TIMER_SOFTIRQ);
I think we should add a comment about the physical timer.
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> static int cpu_time_callback(struct notifier_block *nfb,
> unsigned long action,
> void *hcpu)
Cheers,
--
Julien Grall
* Re: [PATCH v6 01/13] xen/arm: Add suspend and resume timer helpers
2025-09-12 23:04 ` Julien Grall
@ 2025-11-21 8:10 ` Mykola Kvach
0 siblings, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-11-21 8:10 UTC
To: Julien Grall
Cc: xen-devel, Mirela Simonovic, Stefano Stabellini, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi, Mykola Kvach
Hi Julien,
Thanks for your review and for the time spent on this series.
On Sat, Sep 13, 2025 at 2:04 AM Julien Grall <julien@xen.org> wrote:
>
> Hi Mykola,
>
> On 01/09/2025 23:10, Mykola Kvach wrote:
> > From: Mirela Simonovic <mirela.simonovic@aggios.com>
> >
> > Timer interrupts must be disabled while the system is suspended to prevent
> > spurious wake-ups.
>
> Yet, you don't seem to disable the virtual interrupt. Can you explain why?
Thanks for the question; it looks like I missed calling this out.
The virtual timer is already disabled on vCPU context switch. During the
suspend flow, ctxt_switch_from() calls virt_timer_save(), which clears
CNTV_CTL_EL0.ENABLE and preserves the timer state in
vcpu->arch.virt_timer. Therefore there is no live virtual timer interrupt
source by the time time_suspend() executes.
Also, the context switch happens before the suspend tasklet is invoked,
and time_suspend() is called from that tasklet.
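As a rough illustration of the save path described above, the following self-contained sketch models the virtual timer registers as plain struct fields. The names `struct mock_cpu`, `struct mock_vtimer`, `mock_virt_timer_save()` and `mock_virt_timer_restore()` are hypothetical; the real virt_timer_save() in Xen works on system registers and does more than this.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_CTL_ENABLE (1u << 0)  /* ENABLE is bit 0 of CNTV_CTL_EL0 */

/* Hypothetical stand-in for the live registers of the current CPU. */
struct mock_cpu {
    uint32_t cntv_ctl;   /* models CNTV_CTL_EL0  */
    uint64_t cntv_cval;  /* models CNTV_CVAL_EL0 */
};

/* Hypothetical stand-in for the state kept in vcpu->arch.virt_timer. */
struct mock_vtimer {
    uint32_t ctl;
    uint64_t cval;
};

/* Context switch out: remember the guest view, then silence the hardware. */
static void mock_virt_timer_save(struct mock_cpu *cpu,
                                 struct mock_vtimer *saved)
{
    saved->ctl  = cpu->cntv_ctl;
    saved->cval = cpu->cntv_cval;
    cpu->cntv_ctl = saved->ctl & ~MOCK_CTL_ENABLE;
}

/* Context switch in: the guest view, including ENABLE, comes back. */
static void mock_virt_timer_restore(struct mock_cpu *cpu,
                                    const struct mock_vtimer *saved)
{
    cpu->cntv_cval = saved->cval;
    cpu->cntv_ctl  = saved->ctl;
}
```

The point of the model is that after the save step the hardware ENABLE bit is clear while the saved copy still has it set, so a helper that runs later has nothing left to disable on the virtual timer.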
>
> > Suspending the timers involves disabling both the EL1
> > physical timer and the EL2 hypervisor timer.
> I know this is what Arm is naming the timers. But I would rather s/EL1//
> and s/EL2// because the physical timer is also accessible from EL0.
Thanks, makes sense. I'll drop the EL1/EL2 wording and refer to them as
physical timer and hypervisor timer.
>
> Note that Xen doesn't use or expose the physical timer. So it should
> always be disabled at the point "time_suspend()" is called. I am still
> ok to disable it just in case though.
Right, Xen doesn't rely on CNTP, so it should already be disabled by
the time we reach time_suspend().
>
> > Resuming consists of raising
> > the TIMER_SOFTIRQ, which prompts the generic timer code to reprogram the
> > EL2 timer as needed. Re-enabling of the EL1 timer is left to the entity
> > that uses it.
> >
> > Introduce a new helper, disable_physical_timers, to encapsulate disabling
> > of the physical timers.
> >
> > Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> > Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V4:
> > - Rephrased comment and commit message for better clarity
> > - Created separate function for disabling physical timers
> >
> > Changes in V3:
> > - time_suspend and time_resume are now conditionally compiled
> > under CONFIG_SYSTEM_SUSPEND
> > ---
> > xen/arch/arm/include/asm/time.h | 5 +++++
> > xen/arch/arm/time.c | 38 +++++++++++++++++++++++++++------
> > 2 files changed, 37 insertions(+), 6 deletions(-)
> >
> > diff --git a/xen/arch/arm/include/asm/time.h b/xen/arch/arm/include/asm/time.h
> > index 49ad8c1a6d..f4fd0c6af5 100644
> > --- a/xen/arch/arm/include/asm/time.h
> > +++ b/xen/arch/arm/include/asm/time.h
> > @@ -108,6 +108,11 @@ void preinit_xen_time(void);
> >
> > void force_update_vcpu_system_time(struct vcpu *v);
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +void time_suspend(void);
> > +void time_resume(void);
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> > #endif /* __ARM_TIME_H__ */
> > /*
> > * Local variables:
> > diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> > index e74d30d258..ad984fdfdd 100644
> > --- a/xen/arch/arm/time.c
> > +++ b/xen/arch/arm/time.c
> > @@ -303,6 +303,14 @@ static void check_timer_irq_cfg(unsigned int irq, const char *which)
> > "WARNING: %s-timer IRQ%u is not level triggered.\n", which, irq);
> > }
> >
> > +/* Disable physical timers for EL1 and EL2 on the current CPU */
>
> The names of the timers are "physical timer" and "hypervisor timer".
Ack
>
> > +static inline void disable_physical_timers(void)
>
> "Physical" is a bit misleading in this context. All three timers
> (virtual, physical, hypervisor) are physical timers. My preference would
> be to name this function disable_timers() (assuming you also need to
> disable the virtual timer).
As explained above, CNTV is already disabled before suspend, so the helper
only targets CNTP/CNTHP. Renamed it to disable_phys_hyp_timers() accordingly.
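To make the scope of the renamed helper concrete, here is a small model with the three control registers as struct fields. It is a sketch only: `struct mock_timer_regs`, `mock_disable_phys_hyp_timers()` and `mock_timer_enabled()` are hypothetical names, and the real helper writes system registers and ends with isb().

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_CTL_ENABLE (1u << 0)  /* ENABLE is bit 0 of each CNT*_CTL register */

/* Hypothetical stand-ins for the per-CPU timer control registers. */
struct mock_timer_regs {
    uint32_t cntp_ctl;   /* models CNTP_CTL_EL0  (physical timer)   */
    uint32_t cnthp_ctl;  /* models CNTHP_CTL_EL2 (hypervisor timer) */
    uint32_t cntv_ctl;   /* models CNTV_CTL_EL0  (virtual timer)    */
};

/* Only the physical and hypervisor timers are written; CNTV is left alone. */
static void mock_disable_phys_hyp_timers(struct mock_timer_regs *r)
{
    r->cntp_ctl  = 0;
    r->cnthp_ctl = 0;
    /* The real code would issue isb() here so the writes take effect. */
}

static int mock_timer_enabled(uint32_t ctl)
{
    return (ctl & MOCK_CTL_ENABLE) != 0;
}
```

In the model, the virtual timer field is untouched by the helper, which matches the argument that it has already been handled on context switch.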
>
> > +{
> > + WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
> > + WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
> > + isb();
> > +}
> > +
> > /* Set up the timer interrupt on this CPU */
> > void init_timer_interrupt(void)
> > {
> > @@ -310,9 +318,7 @@ void init_timer_interrupt(void)
> > WRITE_SYSREG64(0, CNTVOFF_EL2); /* No VM-specific offset */
> > /* Do not let the VMs program the physical timer, only read the physical counter */
> > WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
> > - WRITE_SYSREG(0, CNTP_CTL_EL0); /* Physical timer disabled */
> > - WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */
> > - isb();
> > + disable_physical_timers();
> >
> > request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
> > "hyptimer", NULL);
> > @@ -330,9 +336,7 @@ void init_timer_interrupt(void)
> > */
> > static void deinit_timer_interrupt(void)
> > {
> > - WRITE_SYSREG(0, CNTP_CTL_EL0); /* Disable physical timer */
> > - WRITE_SYSREG(0, CNTHP_CTL_EL2); /* Disable hypervisor's timer */
> > - isb();
> > + disable_physical_timers();
> >
> > release_irq(timer_irq[TIMER_HYP_PPI], NULL);
> > release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
> > @@ -372,6 +376,28 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
> > /* XXX update guest visible wallclock time */
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +void time_suspend(void)
> > +{
> > + disable_physical_timers();
> > +}
> > +
> > +void time_resume(void)
> > +{
> > + /*
> > + * Raising the timer softirq triggers generic code to call reprogram_timer
> > + * with the correct timeout (not known here).
> > + *
> > + * No further action is needed to restore timekeeping after power down,
> > + * since the system counter is unaffected. See ARM DDI 0487 L.a, D12.1.2
> > + * "The system counter must be implemented in an always-on power domain."
> > + */
> > + raise_softirq(TIMER_SOFTIRQ);
>
> I think we should add a comment about the physical timer.
I'll add a comment in time_resume() clarifying that the physical timer remains
disabled in Xen, while the virtual timer is restored per-vCPU on
context restore.
>
> > +}
> > +
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> > static int cpu_time_callback(struct notifier_block *nfb,
> > unsigned long action,
> > void *hcpu)
> Cheers,
>
> --
> Julien Grall
>
Best regards,
Mykola
* [PATCH v6 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 01/13] xen/arm: Add suspend and resume timer helpers Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-02 20:24 ` Volodymyr Babchuk
2025-09-12 23:30 ` Julien Grall
2025-09-01 22:10 ` [PATCH v6 03/13] xen/arm: gic-v3: Implement GICv3 " Mykola Kvach
` (11 subsequent siblings)
13 siblings, 2 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC
To: xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi,
Mykyta Poturai, Mykola Kvach
From: Mirela Simonovic <mirela.simonovic@aggios.com>
System suspend may lead to a state where GIC would be powered down.
Therefore, Xen should save/restore the context of GIC on suspend/resume.
Note that the context consists of states of registers which are
controlled by the hypervisor. Other GIC registers which are accessible
by guests are saved/restored on context switch.
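The save/restore idea, and the register counts it implies, can be sketched with a mock distributor held in ordinary memory. Everything named `mock_*` or `MOCK_*` below is hypothetical, the 96-line configuration is an arbitrary example, and only a subset of the registers is modelled. The divisors follow from the GICv2 layout: enable registers hold 1 bit per interrupt, priority registers 8 bits, and configuration registers 2 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DIV_ROUND_UP(x, y) (((x) + (y) - 1) / (y))

#define MOCK_NR_LINES       96u                              /* example only */
#define MOCK_NR_ENABLE_REGS DIV_ROUND_UP(MOCK_NR_LINES, 32)  /* 1 bit/IRQ  */
#define MOCK_NR_PRIO_REGS   DIV_ROUND_UP(MOCK_NR_LINES, 4)   /* 8 bits/IRQ */
#define MOCK_NR_CFG_REGS    DIV_ROUND_UP(MOCK_NR_LINES, 16)  /* 2 bits/IRQ */

/* Hypothetical model of the hypervisor-owned distributor state. */
struct mock_gicd {
    uint32_t ctlr;
    uint32_t isenabler[MOCK_NR_ENABLE_REGS];
    uint32_t ipriorityr[MOCK_NR_PRIO_REGS];
    uint32_t icfgr[MOCK_NR_CFG_REGS];
};

/* Suspend: copy the live state into the preallocated context. */
static void mock_gic_suspend(const struct mock_gicd *hw, struct mock_gicd *ctx)
{
    *ctx = *hw;
}

/* Models the GIC losing its state while the system is powered down. */
static void mock_gic_power_cycle(struct mock_gicd *hw)
{
    memset(hw, 0, sizeof(*hw));
}

/* Resume: keep the distributor off while rewriting, enable it last. */
static void mock_gic_resume(struct mock_gicd *hw, const struct mock_gicd *ctx)
{
    hw->ctlr = 0;
    memcpy(hw->isenabler, ctx->isenabler, sizeof(hw->isenabler));
    memcpy(hw->ipriorityr, ctx->ipriorityr, sizeof(hw->ipriorityr));
    memcpy(hw->icfgr, ctx->icfgr, sizeof(hw->icfgr));
    hw->ctlr = ctx->ctlr;
}
```

For 96 interrupt lines this gives 3 enable registers, 24 priority registers and 6 configuration registers, which is the same arithmetic the allocation and the save/restore loops rely on.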
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in v6:
- drop extra func/line printing from dprintk
- drop checking context allocation from resume handler
- merge some loops where it is possible
Changes in v4:
- Add error logging for allocation failures
Changes in v3:
- Drop asserts and return error codes instead.
- Wrap code with CONFIG_SYSTEM_SUSPEND.
Changes in v2:
- Minor fixes after review.
---
xen/arch/arm/gic-v2.c | 143 +++++++++++++++++++++++++++++++++
xen/arch/arm/gic.c | 29 +++++++
xen/arch/arm/include/asm/gic.h | 12 +++
3 files changed, 184 insertions(+)
diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index b23e72a3d0..6373599e69 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -1098,6 +1098,140 @@ static int gicv2_iomem_deny_access(struct domain *d)
return iomem_deny_access(d, mfn, mfn + nr);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+/* GICv2 registers to be saved/restored on system suspend/resume */
+struct gicv2_context {
+ /* GICC context */
+ uint32_t gicc_ctlr;
+ uint32_t gicc_pmr;
+ uint32_t gicc_bpr;
+ /* GICD context */
+ uint32_t gicd_ctlr;
+ uint32_t *gicd_isenabler;
+ uint32_t *gicd_isactiver;
+ uint32_t *gicd_ipriorityr;
+ uint32_t *gicd_itargetsr;
+ uint32_t *gicd_icfgr;
+};
+
+static struct gicv2_context gicv2_context;
+
+static int gicv2_suspend(void)
+{
+ unsigned int i;
+
+ if ( !gicv2_context.gicd_isenabler )
+ {
+ dprintk(XENLOG_WARNING, "GICv2 suspend context not allocated!\n");
+ return -ENOMEM;
+ }
+
+ /* Save GICC configuration */
+ gicv2_context.gicc_ctlr = readl_gicc(GICC_CTLR);
+ gicv2_context.gicc_pmr = readl_gicc(GICC_PMR);
+ gicv2_context.gicc_bpr = readl_gicc(GICC_BPR);
+
+ /* Save GICD configuration */
+ gicv2_context.gicd_ctlr = readl_gicd(GICD_CTLR);
+
+ for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 32); i++ )
+ {
+ gicv2_context.gicd_isenabler[i] = readl_gicd(GICD_ISENABLER + i * 4);
+ gicv2_context.gicd_isactiver[i] = readl_gicd(GICD_ISACTIVER + i * 4);
+ }
+
+ for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 4); i++ )
+ {
+ gicv2_context.gicd_ipriorityr[i] = readl_gicd(GICD_IPRIORITYR + i * 4);
+ gicv2_context.gicd_itargetsr[i] = readl_gicd(GICD_ITARGETSR + i * 4);
+ }
+
+ for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 16); i++ )
+ gicv2_context.gicd_icfgr[i] = readl_gicd(GICD_ICFGR + i * 4);
+
+ return 0;
+}
+
+static void gicv2_resume(void)
+{
+ unsigned int i;
+
+ gicv2_cpu_disable();
+ /* Disable distributor */
+ writel_gicd(0, GICD_CTLR);
+
+ /* Restore GICD configuration */
+ for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 32); i++ )
+ {
+ writel_gicd(0xffffffff, GICD_ICENABLER + i * 4);
+ writel_gicd(gicv2_context.gicd_isenabler[i], GICD_ISENABLER + i * 4);
+
+ writel_gicd(0xffffffff, GICD_ICACTIVER + i * 4);
+ writel_gicd(gicv2_context.gicd_isactiver[i], GICD_ISACTIVER + i * 4);
+ }
+
+ for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 4); i++ )
+ {
+ writel_gicd(gicv2_context.gicd_ipriorityr[i], GICD_IPRIORITYR + i * 4);
+ writel_gicd(gicv2_context.gicd_itargetsr[i], GICD_ITARGETSR + i * 4);
+ }
+
+ for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 16); i++ )
+ writel_gicd(gicv2_context.gicd_icfgr[i], GICD_ICFGR + i * 4);
+
+ /* Make sure all registers are restored and enable distributor */
+ writel_gicd(gicv2_context.gicd_ctlr | GICD_CTL_ENABLE, GICD_CTLR);
+
+ /* Restore GIC CPU interface configuration */
+ writel_gicc(gicv2_context.gicc_pmr, GICC_PMR);
+ writel_gicc(gicv2_context.gicc_bpr, GICC_BPR);
+
+ /* Enable GIC CPU interface */
+ writel_gicc(gicv2_context.gicc_ctlr | GICC_CTL_ENABLE | GICC_CTL_EOI,
+ GICC_CTLR);
+}
+
+static void gicv2_alloc_context(struct gicv2_context *gc)
+{
+ uint32_t n = gicv2_info.nr_lines;
+
+ gc->gicd_isenabler = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 32));
+ if ( !gc->gicd_isenabler )
+ goto err_free;
+
+ gc->gicd_isactiver = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 32));
+ if ( !gc->gicd_isactiver )
+ goto err_free;
+
+ gc->gicd_itargetsr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 4));
+ if ( !gc->gicd_itargetsr )
+ goto err_free;
+
+ gc->gicd_ipriorityr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 4));
+ if ( !gc->gicd_ipriorityr )
+ goto err_free;
+
+ gc->gicd_icfgr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 16));
+ if ( !gc->gicd_icfgr )
+ goto err_free;
+
+ return;
+
+ err_free:
+ printk(XENLOG_ERR "Failed to allocate memory for GICv2 suspend context\n");
+
+ xfree(gc->gicd_icfgr);
+ xfree(gc->gicd_ipriorityr);
+ xfree(gc->gicd_itargetsr);
+ xfree(gc->gicd_isactiver);
+ xfree(gc->gicd_isenabler);
+
+ memset(gc, 0, sizeof(*gc));
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
#ifdef CONFIG_ACPI
static unsigned long gicv2_get_hwdom_extra_madt_size(const struct domain *d)
{
@@ -1302,6 +1436,11 @@ static int __init gicv2_init(void)
spin_unlock(&gicv2.lock);
+#ifdef CONFIG_SYSTEM_SUSPEND
+ /* Allocate memory to be used for saving GIC context during the suspend */
+ gicv2_alloc_context(&gicv2_context);
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
return 0;
}
@@ -1345,6 +1484,10 @@ static const struct gic_hw_operations gicv2_ops = {
.map_hwdom_extra_mappings = gicv2_map_hwdom_extra_mappings,
.iomem_deny_access = gicv2_iomem_deny_access,
.do_LPI = gicv2_do_LPI,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = gicv2_suspend,
+ .resume = gicv2_resume,
+#endif /* CONFIG_SYSTEM_SUSPEND */
};
/* Set up the GIC */
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e80fe0ca24..a018bd7715 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -425,6 +425,35 @@ int gic_iomem_deny_access(struct domain *d)
return gic_hw_ops->iomem_deny_access(d);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+int gic_suspend(void)
+{
+ /* Must be called by boot CPU#0 with interrupts disabled */
+ ASSERT(!local_irq_is_enabled());
+ ASSERT(!smp_processor_id());
+
+ if ( !gic_hw_ops->suspend || !gic_hw_ops->resume )
+ return -ENOSYS;
+
+ return gic_hw_ops->suspend();
+}
+
+void gic_resume(void)
+{
+ /*
+ * Must be called by boot CPU#0 with interrupts disabled after gic_suspend
+ * has returned successfully.
+ */
+ ASSERT(!local_irq_is_enabled());
+ ASSERT(!smp_processor_id());
+ ASSERT(gic_hw_ops->resume);
+
+ gic_hw_ops->resume();
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
static int cpu_gic_callback(struct notifier_block *nfb,
unsigned long action,
void *hcpu)
diff --git a/xen/arch/arm/include/asm/gic.h b/xen/arch/arm/include/asm/gic.h
index 541f0eeb80..a706303008 100644
--- a/xen/arch/arm/include/asm/gic.h
+++ b/xen/arch/arm/include/asm/gic.h
@@ -280,6 +280,12 @@ extern int gicv_setup(struct domain *d);
extern void gic_save_state(struct vcpu *v);
extern void gic_restore_state(struct vcpu *v);
+#ifdef CONFIG_SYSTEM_SUSPEND
+/* Suspend/resume */
+extern int gic_suspend(void);
+extern void gic_resume(void);
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
/* SGI (AKA IPIs) */
enum gic_sgi {
GIC_SGI_EVENT_CHECK,
@@ -395,6 +401,12 @@ struct gic_hw_operations {
int (*iomem_deny_access)(struct domain *d);
/* Handle LPIs, which require special handling */
void (*do_LPI)(unsigned int lpi);
+#ifdef CONFIG_SYSTEM_SUSPEND
+ /* Save GIC configuration due to the system suspend */
+ int (*suspend)(void);
+ /* Restore GIC configuration due to the system resume */
+ void (*resume)(void);
+#endif /* CONFIG_SYSTEM_SUSPEND */
};
extern const struct gic_hw_operations *gic_hw_ops;
--
2.48.1
* Re: [PATCH v6 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions
2025-09-01 22:10 ` [PATCH v6 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions Mykola Kvach
@ 2025-09-02 20:24 ` Volodymyr Babchuk
2025-09-12 23:30 ` Julien Grall
1 sibling, 0 replies; 49+ messages in thread
From: Volodymyr Babchuk @ 2025-09-02 20:24 UTC
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mirela Simonovic,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Saeed Nowshadi, Mykyta Poturai, Mykola Kvach
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> System suspend may lead to a state where GIC would be powered down.
> Therefore, Xen should save/restore the context of GIC on suspend/resume.
>
> Note that the context consists of states of registers which are
> controlled by the hypervisor. Other GIC registers which are accessible
> by guests are saved/restored on context switch.
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
> Changes in v6:
> - drop extra func/line printing from dprintk
> - drop checking context allocation from resume handler
> - merge some loops where it is possible
>
> Changes in v4:
> - Add error logging for allocation failures
>
> Changes in v3:
> - Drop asserts and return error codes instead.
> - Wrap code with CONFIG_SYSTEM_SUSPEND.
>
> Changes in v2:
> - Minor fixes after review.
> ---
> xen/arch/arm/gic-v2.c | 143 +++++++++++++++++++++++++++++++++
> xen/arch/arm/gic.c | 29 +++++++
> xen/arch/arm/include/asm/gic.h | 12 +++
> 3 files changed, 184 insertions(+)
>
> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> index b23e72a3d0..6373599e69 100644
> --- a/xen/arch/arm/gic-v2.c
> +++ b/xen/arch/arm/gic-v2.c
> @@ -1098,6 +1098,140 @@ static int gicv2_iomem_deny_access(struct domain *d)
> return iomem_deny_access(d, mfn, mfn + nr);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +/* GICv2 registers to be saved/restored on system suspend/resume */
> +struct gicv2_context {
> + /* GICC context */
> + uint32_t gicc_ctlr;
> + uint32_t gicc_pmr;
> + uint32_t gicc_bpr;
> + /* GICD context */
> + uint32_t gicd_ctlr;
> + uint32_t *gicd_isenabler;
> + uint32_t *gicd_isactiver;
> + uint32_t *gicd_ipriorityr;
> + uint32_t *gicd_itargetsr;
> + uint32_t *gicd_icfgr;
> +};
> +
> +static struct gicv2_context gicv2_context;
> +
> +static int gicv2_suspend(void)
> +{
> + unsigned int i;
> +
> + if ( !gicv2_context.gicd_isenabler )
> + {
> + dprintk(XENLOG_WARNING, "GICv2 suspend context not allocated!\n");
> + return -ENOMEM;
> + }
> +
> + /* Save GICC configuration */
> + gicv2_context.gicc_ctlr = readl_gicc(GICC_CTLR);
> + gicv2_context.gicc_pmr = readl_gicc(GICC_PMR);
> + gicv2_context.gicc_bpr = readl_gicc(GICC_BPR);
> +
> + /* Save GICD configuration */
> + gicv2_context.gicd_ctlr = readl_gicd(GICD_CTLR);
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 32); i++ )
> + {
> + gicv2_context.gicd_isenabler[i] = readl_gicd(GICD_ISENABLER + i * 4);
> + gicv2_context.gicd_isactiver[i] = readl_gicd(GICD_ISACTIVER + i * 4);
> + }
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 4); i++ )
> + {
> + gicv2_context.gicd_ipriorityr[i] = readl_gicd(GICD_IPRIORITYR + i * 4);
> + gicv2_context.gicd_itargetsr[i] = readl_gicd(GICD_ITARGETSR + i * 4);
> + }
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 16); i++ )
> + gicv2_context.gicd_icfgr[i] = readl_gicd(GICD_ICFGR + i * 4);
> +
> + return 0;
> +}
> +
> +static void gicv2_resume(void)
> +{
> + unsigned int i;
> +
> + gicv2_cpu_disable();
> + /* Disable distributor */
> + writel_gicd(0, GICD_CTLR);
> +
> + /* Restore GICD configuration */
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 32); i++ )
> + {
> + writel_gicd(0xffffffff, GICD_ICENABLER + i * 4);
> + writel_gicd(gicv2_context.gicd_isenabler[i], GICD_ISENABLER + i * 4);
> +
> + writel_gicd(0xffffffff, GICD_ICACTIVER + i * 4);
> + writel_gicd(gicv2_context.gicd_isactiver[i], GICD_ISACTIVER + i * 4);
> + }
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 4); i++ )
> + {
> + writel_gicd(gicv2_context.gicd_ipriorityr[i], GICD_IPRIORITYR + i * 4);
> + writel_gicd(gicv2_context.gicd_itargetsr[i], GICD_ITARGETSR + i * 4);
> + }
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 16); i++ )
> + writel_gicd(gicv2_context.gicd_icfgr[i], GICD_ICFGR + i * 4);
> +
> + /* Make sure all registers are restored and enable distributor */
> + writel_gicd(gicv2_context.gicd_ctlr | GICD_CTL_ENABLE, GICD_CTLR);
> +
> + /* Restore GIC CPU interface configuration */
> + writel_gicc(gicv2_context.gicc_pmr, GICC_PMR);
> + writel_gicc(gicv2_context.gicc_bpr, GICC_BPR);
> +
> + /* Enable GIC CPU interface */
> + writel_gicc(gicv2_context.gicc_ctlr | GICC_CTL_ENABLE | GICC_CTL_EOI,
> + GICC_CTLR);
> +}
> +
> +static void gicv2_alloc_context(struct gicv2_context *gc)
> +{
> + uint32_t n = gicv2_info.nr_lines;
> +
> + gc->gicd_isenabler = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 32));
> + if ( !gc->gicd_isenabler )
> + goto err_free;
> +
> + gc->gicd_isactiver = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 32));
> + if ( !gc->gicd_isactiver )
> + goto err_free;
> +
> + gc->gicd_itargetsr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 4));
> + if ( !gc->gicd_itargetsr )
> + goto err_free;
> +
> + gc->gicd_ipriorityr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 4));
> + if ( !gc->gicd_ipriorityr )
> + goto err_free;
> +
> + gc->gicd_icfgr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 16));
> + if ( !gc->gicd_icfgr )
> + goto err_free;
> +
> + return;
> +
> + err_free:
> + printk(XENLOG_ERR "Failed to allocate memory for GICv2 suspend context\n");
> +
> + xfree(gc->gicd_icfgr);
> + xfree(gc->gicd_ipriorityr);
> + xfree(gc->gicd_itargetsr);
> + xfree(gc->gicd_isactiver);
> + xfree(gc->gicd_isenabler);
> +
> + memset(gc, 0, sizeof(*gc));
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> #ifdef CONFIG_ACPI
> static unsigned long gicv2_get_hwdom_extra_madt_size(const struct domain *d)
> {
> @@ -1302,6 +1436,11 @@ static int __init gicv2_init(void)
>
> spin_unlock(&gicv2.lock);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + /* Allocate memory to be used for saving GIC context during the suspend */
> + gicv2_alloc_context(&gicv2_context);
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> return 0;
> }
>
> @@ -1345,6 +1484,10 @@ static const struct gic_hw_operations gicv2_ops = {
> .map_hwdom_extra_mappings = gicv2_map_hwdom_extra_mappings,
> .iomem_deny_access = gicv2_iomem_deny_access,
> .do_LPI = gicv2_do_LPI,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = gicv2_suspend,
> + .resume = gicv2_resume,
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> };
>
> /* Set up the GIC */
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e80fe0ca24..a018bd7715 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -425,6 +425,35 @@ int gic_iomem_deny_access(struct domain *d)
> return gic_hw_ops->iomem_deny_access(d);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +int gic_suspend(void)
> +{
> + /* Must be called by boot CPU#0 with interrupts disabled */
> + ASSERT(!local_irq_is_enabled());
> + ASSERT(!smp_processor_id());
> +
> + if ( !gic_hw_ops->suspend || !gic_hw_ops->resume )
> + return -ENOSYS;
> +
> + return gic_hw_ops->suspend();
> +}
> +
> +void gic_resume(void)
> +{
> + /*
> + * Must be called by boot CPU#0 with interrupts disabled after gic_suspend
> + * has returned successfully.
> + */
> + ASSERT(!local_irq_is_enabled());
> + ASSERT(!smp_processor_id());
> + ASSERT(gic_hw_ops->resume);
> +
> + gic_hw_ops->resume();
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> static int cpu_gic_callback(struct notifier_block *nfb,
> unsigned long action,
> void *hcpu)
> diff --git a/xen/arch/arm/include/asm/gic.h b/xen/arch/arm/include/asm/gic.h
> index 541f0eeb80..a706303008 100644
> --- a/xen/arch/arm/include/asm/gic.h
> +++ b/xen/arch/arm/include/asm/gic.h
> @@ -280,6 +280,12 @@ extern int gicv_setup(struct domain *d);
> extern void gic_save_state(struct vcpu *v);
> extern void gic_restore_state(struct vcpu *v);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +/* Suspend/resume */
> +extern int gic_suspend(void);
> +extern void gic_resume(void);
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> /* SGI (AKA IPIs) */
> enum gic_sgi {
> GIC_SGI_EVENT_CHECK,
> @@ -395,6 +401,12 @@ struct gic_hw_operations {
> int (*iomem_deny_access)(struct domain *d);
> /* Handle LPIs, which require special handling */
> void (*do_LPI)(unsigned int lpi);
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + /* Save GIC configuration due to the system suspend */
> + int (*suspend)(void);
> + /* Restore GIC configuration due to the system resume */
> + void (*resume)(void);
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> };
>
> extern const struct gic_hw_operations *gic_hw_ops;
--
WBR, Volodymyr
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions
2025-09-01 22:10 ` [PATCH v6 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions Mykola Kvach
2025-09-02 20:24 ` Volodymyr Babchuk
@ 2025-09-12 23:30 ` Julien Grall
2025-09-17 3:29 ` Mykola Kvach
1 sibling, 1 reply; 49+ messages in thread
From: Julien Grall @ 2025-09-12 23:30 UTC (permalink / raw)
To: Mykola Kvach, xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach
Hi Mykola,
On 01/09/2025 23:10, Mykola Kvach wrote:
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> System suspend may lead to a state where GIC would be powered down.
> Therefore, Xen should save/restore the context of GIC on suspend/resume.
>
> Note that the context consists of states of registers which are
> controlled by the hypervisor. Other GIC registers which are accessible
> by guests are saved/restored on context switch.
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in v6:
> - drop extra func/line printing from dprintk
> - drop checking context allocation from resume handler
> - merge some loops where it is possible
>
> Changes in v4:
> - Add error logging for allocation failures
>
> Changes in v3:
> - Drop asserts and return error codes instead.
> - Wrap code with CONFIG_SYSTEM_SUSPEND.
>
> Changes in v2:
> - Minor fixes after review.
> ---
> xen/arch/arm/gic-v2.c | 143 +++++++++++++++++++++++++++++++++
> xen/arch/arm/gic.c | 29 +++++++
> xen/arch/arm/include/asm/gic.h | 12 +++
> 3 files changed, 184 insertions(+)
>
> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> index b23e72a3d0..6373599e69 100644
> --- a/xen/arch/arm/gic-v2.c
> +++ b/xen/arch/arm/gic-v2.c
> @@ -1098,6 +1098,140 @@ static int gicv2_iomem_deny_access(struct domain *d)
> return iomem_deny_access(d, mfn, mfn + nr);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +/* GICv2 registers to be saved/restored on system suspend/resume */
> +struct gicv2_context {
> + /* GICC context */
> + uint32_t gicc_ctlr;
> + uint32_t gicc_pmr;
> + uint32_t gicc_bpr;
> + /* GICD context */
> + uint32_t gicd_ctlr;
I don't quite follow why all the registers above need to be
saved/restored. Is it just convenience because it is too complicated to
recreate the value?
> + uint32_t *gicd_isenabler;
> + uint32_t *gicd_isactiver;
> + uint32_t *gicd_ipriorityr;
> + uint32_t *gicd_itargetsr;
> + uint32_t *gicd_icfgr;
> +};
> +
> +static struct gicv2_context gicv2_context;
> +
> +static int gicv2_suspend(void)
> +{
> + unsigned int i;
> +
> + if ( !gicv2_context.gicd_isenabler )
> + {
> + dprintk(XENLOG_WARNING, "GICv2 suspend context not allocated!\n");
> + return -ENOMEM;
> + }
> +
> + /* Save GICC configuration */
> + gicv2_context.gicc_ctlr = readl_gicc(GICC_CTLR);
> + gicv2_context.gicc_pmr = readl_gicc(GICC_PMR);
> + gicv2_context.gicc_bpr = readl_gicc(GICC_BPR);
> +
> + /* Save GICD configuration */
> + gicv2_context.gicd_ctlr = readl_gicd(GICD_CTLR);
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 32); i++ )
> + {
> + gicv2_context.gicd_isenabler[i] = readl_gicd(GICD_ISENABLER + i * 4);
> + gicv2_context.gicd_isactiver[i] = readl_gicd(GICD_ISACTIVER + i * 4);
> + }
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 4); i++ )
> + {
> + gicv2_context.gicd_ipriorityr[i] = readl_gicd(GICD_IPRIORITYR + i * 4);
> + gicv2_context.gicd_itargetsr[i] = readl_gicd(GICD_ITARGETSR + i * 4);
> + }
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 16); i++ )
> + gicv2_context.gicd_icfgr[i] = readl_gicd(GICD_ICFGR + i * 4);
> +
> + return 0;
> +}
> +
> +static void gicv2_resume(void)
> +{
> + unsigned int i;
> +
> + gicv2_cpu_disable();
> + /* Disable distributor */
> + writel_gicd(0, GICD_CTLR);
> +
> + /* Restore GICD configuration */
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 32); i++ )
> + {
> + writel_gicd(0xffffffff, GICD_ICENABLER + i * 4);
> + writel_gicd(gicv2_context.gicd_isenabler[i], GICD_ISENABLER + i * 4);
> +
> + writel_gicd(0xffffffff, GICD_ICACTIVER + i * 4);
> + writel_gicd(gicv2_context.gicd_isactiver[i], GICD_ISACTIVER + i * 4);
> + }
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 4); i++ )
> + {
> + writel_gicd(gicv2_context.gicd_ipriorityr[i], GICD_IPRIORITYR + i * 4);
> + writel_gicd(gicv2_context.gicd_itargetsr[i], GICD_ITARGETSR + i * 4);
> + }
> +
> + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 16); i++ )
> + writel_gicd(gicv2_context.gicd_icfgr[i], GICD_ICFGR + i * 4);
> +
> + /* Make sure all registers are restored and enable distributor */
> + writel_gicd(gicv2_context.gicd_ctlr | GICD_CTL_ENABLE, GICD_CTLR);
Why are we forcing CTL_ENABLE? Surely it should have been set and if
not, then why is it fine to override it?
> +
> + /* Restore GIC CPU interface configuration */
> + writel_gicc(gicv2_context.gicc_pmr, GICC_PMR);
> + writel_gicc(gicv2_context.gicc_bpr, GICC_BPR);
> +
> + /* Enable GIC CPU interface */
> + writel_gicc(gicv2_context.gicc_ctlr | GICC_CTL_ENABLE | GICC_CTL_EOI,
> + GICC_CTLR);
Same question here for both ENABLE and EOI.
> +}
> +
> +static void gicv2_alloc_context(struct gicv2_context *gc)
I am a bit surprised this is not returning an error? Why is it ok to
ignore the error and continue? At least for now, if someone enables
CONFIG_SYSTEM_SUSPEND, they would likely want the feature. So it would
be better to crash early.
> +{
> + uint32_t n = gicv2_info.nr_lines;
> +
> + gc->gicd_isenabler = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 32));
> + if ( !gc->gicd_isenabler )
> + goto err_free;
> +
> + gc->gicd_isactiver = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 32));
> + if ( !gc->gicd_isactiver )
> + goto err_free;
> +
> + gc->gicd_itargetsr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 4));
> + if ( !gc->gicd_itargetsr )
> + goto err_free;
> +
> + gc->gicd_ipriorityr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 4));
> + if ( !gc->gicd_ipriorityr )
> + goto err_free;
> +
> + gc->gicd_icfgr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 16));
> + if ( !gc->gicd_icfgr )
> + goto err_free;
I am wondering if we are really saving that much by allocating each
array separately? It would simplify the code if we fix the array to
support up to 1024 interrupts so we allocate a single structure.
> +
> + return;
> +
> + err_free:
> + printk(XENLOG_ERR "Failed to allocate memory for GICv2 suspend context\n");
> +
> + xfree(gc->gicd_icfgr);
> + xfree(gc->gicd_ipriorityr);
> + xfree(gc->gicd_itargetsr);
> + xfree(gc->gicd_isactiver);
> + xfree(gc->gicd_isenabler);
NIT: If you use XFREE(), then you don't need the memset below.
> +
> + memset(gc, 0, sizeof(*gc));
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> #ifdef CONFIG_ACPI
> static unsigned long gicv2_get_hwdom_extra_madt_size(const struct domain *d)
> {
> @@ -1302,6 +1436,11 @@ static int __init gicv2_init(void)
>
> spin_unlock(&gicv2.lock);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + /* Allocate memory to be used for saving GIC context during the suspend */
> + gicv2_alloc_context(&gicv2_context);
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> return 0;
> }
>
> @@ -1345,6 +1484,10 @@ static const struct gic_hw_operations gicv2_ops = {
> .map_hwdom_extra_mappings = gicv2_map_hwdom_extra_mappings,
> .iomem_deny_access = gicv2_iomem_deny_access,
> .do_LPI = gicv2_do_LPI,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = gicv2_suspend,
> + .resume = gicv2_resume,
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> };
>
> /* Set up the GIC */
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e80fe0ca24..a018bd7715 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -425,6 +425,35 @@ int gic_iomem_deny_access(struct domain *d)
> return gic_hw_ops->iomem_deny_access(d);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +int gic_suspend(void)
> +{
> + /* Must be called by boot CPU#0 with interrupts disabled */
What would prevent us from suspending from another CPU?
> + ASSERT(!local_irq_is_enabled());
> + ASSERT(!smp_processor_id());
> +
> + if ( !gic_hw_ops->suspend || !gic_hw_ops->resume )
> + return -ENOSYS;
> +
> + return gic_hw_ops->suspend();
> +}
> +
> +void gic_resume(void)
> +{
> + /*
> + * Must be called by boot CPU#0 with interrupts disabled after gic_suspend
> + * has returned successfully.
> + */
> + ASSERT(!local_irq_is_enabled());
> + ASSERT(!smp_processor_id());
> + ASSERT(gic_hw_ops->resume);
> +
> + gic_hw_ops->resume();
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> static int cpu_gic_callback(struct notifier_block *nfb,
> unsigned long action,
> void *hcpu)
> diff --git a/xen/arch/arm/include/asm/gic.h b/xen/arch/arm/include/asm/gic.h
> index 541f0eeb80..a706303008 100644
> --- a/xen/arch/arm/include/asm/gic.h
> +++ b/xen/arch/arm/include/asm/gic.h
> @@ -280,6 +280,12 @@ extern int gicv_setup(struct domain *d);
> extern void gic_save_state(struct vcpu *v);
> extern void gic_restore_state(struct vcpu *v);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +/* Suspend/resume */
> +extern int gic_suspend(void);
> +extern void gic_resume(void);
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> /* SGI (AKA IPIs) */
> enum gic_sgi {
> GIC_SGI_EVENT_CHECK,
> @@ -395,6 +401,12 @@ struct gic_hw_operations {
> int (*iomem_deny_access)(struct domain *d);
> /* Handle LPIs, which require special handling */
> void (*do_LPI)(unsigned int lpi);
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + /* Save GIC configuration due to the system suspend */
> + int (*suspend)(void);
> + /* Restore GIC configuration due to the system resume */
> + void (*resume)(void);
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> };
>
> extern const struct gic_hw_operations *gic_hw_ops;
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions
2025-09-12 23:30 ` Julien Grall
@ 2025-09-17 3:29 ` Mykola Kvach
0 siblings, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-17 3:29 UTC (permalink / raw)
To: Julien Grall
Cc: xen-devel, Mirela Simonovic, Stefano Stabellini, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach
Hi Julien,
Thank you for the review.
On Sat, Sep 13, 2025 at 2:30 AM Julien Grall <julien@xen.org> wrote:
>
> Hi Mykola,
>
> On 01/09/2025 23:10, Mykola Kvach wrote:
> > From: Mirela Simonovic <mirela.simonovic@aggios.com>
> >
> > System suspend may lead to a state where GIC would be powered down.
> > Therefore, Xen should save/restore the context of GIC on suspend/resume.
> >
> > Note that the context consists of states of registers which are
> > controlled by the hypervisor. Other GIC registers which are accessible
> > by guests are saved/restored on context switch.
> >
> > Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> > Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> > Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in v6:
> > - drop extra func/line printing from dprintk
> > - drop checking context allocation from resume handler
> > - merge some loops where it is possible
> >
> > Changes in v4:
> > - Add error logging for allocation failures
> >
> > Changes in v3:
> > - Drop asserts and return error codes instead.
> > - Wrap code with CONFIG_SYSTEM_SUSPEND.
> >
> > Changes in v2:
> > - Minor fixes after review.
> > ---
> > xen/arch/arm/gic-v2.c | 143 +++++++++++++++++++++++++++++++++
> > xen/arch/arm/gic.c | 29 +++++++
> > xen/arch/arm/include/asm/gic.h | 12 +++
> > 3 files changed, 184 insertions(+)
> >
> > diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> > index b23e72a3d0..6373599e69 100644
> > --- a/xen/arch/arm/gic-v2.c
> > +++ b/xen/arch/arm/gic-v2.c
> > @@ -1098,6 +1098,140 @@ static int gicv2_iomem_deny_access(struct domain *d)
> > return iomem_deny_access(d, mfn, mfn + nr);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +/* GICv2 registers to be saved/restored on system suspend/resume */
> > +struct gicv2_context {
> > + /* GICC context */
> > + uint32_t gicc_ctlr;
> > + uint32_t gicc_pmr;
> > + uint32_t gicc_bpr;
> > + /* GICD context */
> > + uint32_t gicd_ctlr;
>
> I don't quite follow why all the registers above need to be
> saved/restored. Is it just convenience because it is too complicated to
> recreate the value?
Do you mean reinitializing them with the same values as in the init path?
My reasoning for saving/restoring is to avoid duplicating assumptions from
initialization in the resume code. If the init sequence changes in the
future, or if some registers are modified outside of init, the resume path
would also need to be updated. Saving/restoring directly feels like a more
universal and robust approach.
>
> > + uint32_t *gicd_isenabler;
> > + uint32_t *gicd_isactiver;
> > + uint32_t *gicd_ipriorityr;
> > + uint32_t *gicd_itargetsr;
> > + uint32_t *gicd_icfgr;
> > +};
> > +
> > +static struct gicv2_context gicv2_context;
> > +
> > +static int gicv2_suspend(void)
> > +{
> > + unsigned int i;
> > +
> > + if ( !gicv2_context.gicd_isenabler )
> > + {
> > + dprintk(XENLOG_WARNING, "GICv2 suspend context not allocated!\n");
> > + return -ENOMEM;
> > + }
> > +
> > + /* Save GICC configuration */
> > + gicv2_context.gicc_ctlr = readl_gicc(GICC_CTLR);
> > + gicv2_context.gicc_pmr = readl_gicc(GICC_PMR);
> > + gicv2_context.gicc_bpr = readl_gicc(GICC_BPR);
> > +
> > + /* Save GICD configuration */
> > + gicv2_context.gicd_ctlr = readl_gicd(GICD_CTLR);
> > +
> > + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 32); i++ )
> > + {
> > + gicv2_context.gicd_isenabler[i] = readl_gicd(GICD_ISENABLER + i * 4);
> > + gicv2_context.gicd_isactiver[i] = readl_gicd(GICD_ISACTIVER + i * 4);
> > + }
> > +
> > + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 4); i++ )
> > + {
> > + gicv2_context.gicd_ipriorityr[i] = readl_gicd(GICD_IPRIORITYR + i * 4);
> > + gicv2_context.gicd_itargetsr[i] = readl_gicd(GICD_ITARGETSR + i * 4);
> > + }
> > +
> > + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 16); i++ )
> > + gicv2_context.gicd_icfgr[i] = readl_gicd(GICD_ICFGR + i * 4);
> > +
> > + return 0;
> > +}
> > +
> > +static void gicv2_resume(void)
> > +{
> > + unsigned int i;
> > +
> > + gicv2_cpu_disable();
> > + /* Disable distributor */
> > + writel_gicd(0, GICD_CTLR);
> > +
> > + /* Restore GICD configuration */
> > + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 32); i++ )
> > + {
> > + writel_gicd(0xffffffff, GICD_ICENABLER + i * 4);
> > + writel_gicd(gicv2_context.gicd_isenabler[i], GICD_ISENABLER + i * 4);
> > +
> > + writel_gicd(0xffffffff, GICD_ICACTIVER + i * 4);
> > + writel_gicd(gicv2_context.gicd_isactiver[i], GICD_ISACTIVER + i * 4);
> > + }
> > +
> > + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 4); i++ )
> > + {
> > + writel_gicd(gicv2_context.gicd_ipriorityr[i], GICD_IPRIORITYR + i * 4);
> > + writel_gicd(gicv2_context.gicd_itargetsr[i], GICD_ITARGETSR + i * 4);
> > + }
> > +
> > + for ( i = 0; i < DIV_ROUND_UP(gicv2_info.nr_lines, 16); i++ )
> > + writel_gicd(gicv2_context.gicd_icfgr[i], GICD_ICFGR + i * 4);
> > +
> > + /* Make sure all registers are restored and enable distributor */
> > + writel_gicd(gicv2_context.gicd_ctlr | GICD_CTL_ENABLE, GICD_CTLR);
>
> Why are we forcing CTL_ENABLE? Surely it should have been set and if
> not, then why is it fine to override it?
You are right — forcing GICD_CTL_ENABLE is unnecessary here.
The value of GICD_CTLR was already saved before suspend, so restoring
it as-is should be sufficient.
I will drop the | GICD_CTL_ENABLE and just restore the saved value.
>
> > +
> > + /* Restore GIC CPU interface configuration */
> > + writel_gicc(gicv2_context.gicc_pmr, GICC_PMR);
> > + writel_gicc(gicv2_context.gicc_bpr, GICC_BPR);
> > +
> > + /* Enable GIC CPU interface */
> > + writel_gicc(gicv2_context.gicc_ctlr | GICC_CTL_ENABLE | GICC_CTL_EOI,
> > + GICC_CTLR);
>
> Same question here for both ENABLE and EOI.
You are right here as well — we don’t need to force GICC_CTL_ENABLE
or GICC_CTL_EOI. The saved GICC_CTLR value should already reflect
the correct state at the time of suspend.
So it would be cleaner to just restore the saved register value
directly, without OR’ing additional bits.
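As a rough illustration of the two CTLR replies above, the sketch below saves and restores the control registers verbatim. It is a standalone model, not Xen code: fake_regs, reg_read(), reg_write(), struct ctlr_ctx, ctlr_save() and ctlr_restore() are hypothetical stand-ins for the MMIO accessors and for struct gicv2_context.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register model: a plain array instead of GIC MMIO. */
enum { REG_GICD_CTLR, REG_GICC_CTLR, REG_COUNT };
static uint32_t fake_regs[REG_COUNT];

static uint32_t reg_read(unsigned int r)
{
    assert(r < REG_COUNT);
    return fake_regs[r];
}

static void reg_write(uint32_t val, unsigned int r)
{
    assert(r < REG_COUNT);
    fake_regs[r] = val;
}

/* Hypothetical subset of the suspend context: control registers only. */
struct ctlr_ctx {
    uint32_t gicd_ctlr;
    uint32_t gicc_ctlr;
};

static void ctlr_save(struct ctlr_ctx *c)
{
    c->gicd_ctlr = reg_read(REG_GICD_CTLR);
    c->gicc_ctlr = reg_read(REG_GICC_CTLR);
}

/* Write back exactly what was saved; no enable bits are OR'ed in. */
static void ctlr_restore(const struct ctlr_ctx *c)
{
    reg_write(c->gicd_ctlr, REG_GICD_CTLR);
    reg_write(c->gicc_ctlr, REG_GICC_CTLR);
}
```

With this shape, a distributor or CPU interface that was disabled at suspend time stays disabled after resume instead of being silently enabled.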
>
> > +}
> > +
> > +static void gicv2_alloc_context(struct gicv2_context *gc)
>
> I am a bit surprised this is not returning an error? Why is it ok to
> ignore the error and continue? At least for now, if someone enables
> CONFIG_SYSTEM_SUSPEND, they would likely want the feature. So it would
> be better to crash early.
This behavior was introduced based on feedback on one of the earlier
versions of the patch series.
I agree with your point — if CONFIG_SYSTEM_SUSPEND is enabled, then
failing to allocate the context should be treated as fatal. I will
update the code to crash early in this case.
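One possible shape for this, as a standalone sketch rather than the actual change: the allocator reports failure to its caller, which can then decide to treat it as fatal. struct ctx_sketch, zalloc_fn, always_fail() and ctx_alloc() are hypothetical names, and calloc()/free() stand in for xzalloc_array()/xfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical two-array subset of the suspend context. */
struct ctx_sketch {
    uint32_t *isenabler;
    uint32_t *isactiver;
};

/* Allocator hook so the failure path can be exercised; calloc() matches it. */
typedef void *(*zalloc_fn)(size_t nmemb, size_t size);

/* Test helper that simulates an out-of-memory condition. */
static void *always_fail(size_t nmemb, size_t size)
{
    (void)nmemb;
    (void)size;
    return NULL;
}

/*
 * Returns 0 on success or -ENOMEM on failure. On failure both pointers are
 * left NULL. Memory returned by zalloc must be releasable with free().
 */
static int ctx_alloc(struct ctx_sketch *c, unsigned int nr_lines,
                     zalloc_fn zalloc)
{
    size_t words = (nr_lines + 31U) / 32U;

    assert(c != NULL);
    assert(nr_lines > 0);

    c->isenabler = zalloc(words, sizeof(uint32_t));
    c->isactiver = zalloc(words, sizeof(uint32_t));
    if ( !c->isenabler || !c->isactiver )
    {
        free(c->isenabler);
        free(c->isactiver);
        c->isenabler = NULL;
        c->isactiver = NULL;
        return -ENOMEM;
    }

    return 0;
}
```

In the init path, a caller could then either propagate the non-zero return or panic on it; which of the two fits best is left open here.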
>
> > +{
> > + uint32_t n = gicv2_info.nr_lines;
> > +
> > + gc->gicd_isenabler = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 32));
> > + if ( !gc->gicd_isenabler )
> > + goto err_free;
> > +
> > + gc->gicd_isactiver = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 32));
> > + if ( !gc->gicd_isactiver )
> > + goto err_free;
> > +
> > + gc->gicd_itargetsr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 4));
> > + if ( !gc->gicd_itargetsr )
> > + goto err_free;
> > +
> > + gc->gicd_ipriorityr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 4));
> > + if ( !gc->gicd_ipriorityr )
> > + goto err_free;
> > +
> > + gc->gicd_icfgr = xzalloc_array(uint32_t, DIV_ROUND_UP(n, 16));
> > + if ( !gc->gicd_icfgr )
> > + goto err_free;
>
> I am wondering if we are really saving that much by allocating each
> array separately? It would simplify the code if we fix the array to
> support up to 1024 interrupts so we allocate a single structure.
I suppose some systems may have only local interrupts, or a very small
number of SPIs, which is allowed by the spec.
We could rewrite the code to use a single allocation for all arrays, or
possibly avoid dynamic allocation entirely and declare the arrays of
structs in global scope. The latter approach would simplify the code
and reduce the number of allocations, but it would use memory less
efficiently.
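To put a number on that trade-off, here is a sketch of the fixed-size variant, assuming the 1024-interrupt upper bound suggested in the review. SKETCH_MAX_LINES, SKETCH_DIV_ROUND_UP and struct gicv2_context_fixed are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed upper bound on interrupt lines, taken from the review comment. */
#define SKETCH_MAX_LINES 1024U
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1U) / (d))

/* Same fields as the patch, with the per-IRQ arrays sized for the maximum. */
struct gicv2_context_fixed {
    /* GICC context */
    uint32_t gicc_ctlr;
    uint32_t gicc_pmr;
    uint32_t gicc_bpr;
    /* GICD context */
    uint32_t gicd_ctlr;
    uint32_t gicd_isenabler[SKETCH_DIV_ROUND_UP(SKETCH_MAX_LINES, 32U)];
    uint32_t gicd_isactiver[SKETCH_DIV_ROUND_UP(SKETCH_MAX_LINES, 32U)];
    uint32_t gicd_ipriorityr[SKETCH_DIV_ROUND_UP(SKETCH_MAX_LINES, 4U)];
    uint32_t gicd_itargetsr[SKETCH_DIV_ROUND_UP(SKETCH_MAX_LINES, 4U)];
    uint32_t gicd_icfgr[SKETCH_DIV_ROUND_UP(SKETCH_MAX_LINES, 16U)];
};

/* 4 scalars + (32 + 32 + 256 + 256 + 64) array words, 4 bytes each. */
static_assert(sizeof(struct gicv2_context_fixed) == 2576,
              "unexpected size for the fixed suspend context");
```

Every member is a uint32_t, so the structure occupies 16 + 4 * (32 + 32 + 256 + 256 + 64) = 2576 bytes regardless of how many lines the GIC actually implements.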
>
> > +
> > + return;
> > +
> > + err_free:
> > + printk(XENLOG_ERR "Failed to allocate memory for GICv2 suspend context\n");
> > +
> > + xfree(gc->gicd_icfgr);
> > + xfree(gc->gicd_ipriorityr);
> > + xfree(gc->gicd_itargetsr);
> > + xfree(gc->gicd_isactiver);
> > + xfree(gc->gicd_isenabler);
>
> NIT: If you use XFREE(), then you don't need the memset below.
Ack.
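For reference, a minimal sketch of the free-and-clear pattern this refers to. SKETCH_XFREE is a hypothetical user-space stand-in built on free(), not Xen's own macro, and struct ctx_ptrs and ctx_free() are likewise illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Free the allocation and clear the pointer in one step. */
#define SKETCH_XFREE(p) do { free(p); (p) = NULL; } while ( 0 )

/* Hypothetical two-pointer subset of the suspend context. */
struct ctx_ptrs {
    uint32_t *icfgr;
    uint32_t *ipriorityr;
};

/* Each pointer ends up NULL, so no trailing memset() is needed for them. */
static void ctx_free(struct ctx_ptrs *c)
{
    assert(c != NULL);
    SKETCH_XFREE(c->icfgr);
    SKETCH_XFREE(c->ipriorityr);
}
```

Because the pointers are cleared as they are freed, calling ctx_free() a second time is harmless.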
>
> > +
> > + memset(gc, 0, sizeof(*gc));
> > +}
> > +
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> > #ifdef CONFIG_ACPI
> > static unsigned long gicv2_get_hwdom_extra_madt_size(const struct domain *d)
> > {
> > @@ -1302,6 +1436,11 @@ static int __init gicv2_init(void)
> >
> > spin_unlock(&gicv2.lock);
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + /* Allocate memory to be used for saving GIC context during the suspend */
> > + gicv2_alloc_context(&gicv2_context);
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> > return 0;
> > }
> >
> > @@ -1345,6 +1484,10 @@ static const struct gic_hw_operations gicv2_ops = {
> > .map_hwdom_extra_mappings = gicv2_map_hwdom_extra_mappings,
> > .iomem_deny_access = gicv2_iomem_deny_access,
> > .do_LPI = gicv2_do_LPI,
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + .suspend = gicv2_suspend,
> > + .resume = gicv2_resume,
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > };
> >
> > /* Set up the GIC */
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index e80fe0ca24..a018bd7715 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -425,6 +425,35 @@ int gic_iomem_deny_access(struct domain *d)
> > return gic_hw_ops->iomem_deny_access(d);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +int gic_suspend(void)
> > +{
> > + /* Must be called by boot CPU#0 with interrupts disabled */
>
> What would prevent us from suspending from another CPU?
Nothing prevents suspend from being called on another CPU.
According to the PSCI specification, it just needs to be the last
running CPU in the system.
>
> > + ASSERT(!local_irq_is_enabled());
> > + ASSERT(!smp_processor_id());
> > +
> > + if ( !gic_hw_ops->suspend || !gic_hw_ops->resume )
> > + return -ENOSYS;
> > +
> > + return gic_hw_ops->suspend();
> > +}
> > +
> > +void gic_resume(void)
> > +{
> > + /*
> > + * Must be called by boot CPU#0 with interrupts disabled after gic_suspend
> > + * has returned successfully.
> > + */
> > + ASSERT(!local_irq_is_enabled());
> > + ASSERT(!smp_processor_id());
> > + ASSERT(gic_hw_ops->resume);
> > +
> > + gic_hw_ops->resume();
> > +}
> > +
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> > static int cpu_gic_callback(struct notifier_block *nfb,
> > unsigned long action,
> > void *hcpu)
> > diff --git a/xen/arch/arm/include/asm/gic.h b/xen/arch/arm/include/asm/gic.h
> > index 541f0eeb80..a706303008 100644
> > --- a/xen/arch/arm/include/asm/gic.h
> > +++ b/xen/arch/arm/include/asm/gic.h
> > @@ -280,6 +280,12 @@ extern int gicv_setup(struct domain *d);
> > extern void gic_save_state(struct vcpu *v);
> > extern void gic_restore_state(struct vcpu *v);
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +/* Suspend/resume */
> > +extern int gic_suspend(void);
> > +extern void gic_resume(void);
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> > /* SGI (AKA IPIs) */
> > enum gic_sgi {
> > GIC_SGI_EVENT_CHECK,
> > @@ -395,6 +401,12 @@ struct gic_hw_operations {
> > int (*iomem_deny_access)(struct domain *d);
> > /* Handle LPIs, which require special handling */
> > void (*do_LPI)(unsigned int lpi);
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + /* Save GIC configuration due to the system suspend */
> > + int (*suspend)(void);
> > + /* Restore GIC configuration due to the system resume */
> > + void (*resume)(void);
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > };
> >
> > extern const struct gic_hw_operations *gic_hw_ops;
>
> Cheers,
>
> --
> Julien Grall
>
Best regards,
Mykola
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH v6 03/13] xen/arm: gic-v3: Implement GICv3 suspend/resume functions
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 01/13] xen/arm: Add suspend and resume timer helpers Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 02/13] xen/arm: gic-v2: Implement GIC suspend/resume functions Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-02 16:08 ` Oleksandr Tyshchenko
2025-09-01 22:10 ` [PATCH v6 04/13] xen/arm: Don't release IRQs on suspend Mykola Kvach
` (10 subsequent siblings)
13 siblings, 1 reply; 49+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mykola Kvach <mykola_kvach@epam.com>
System suspend may lead to a state where GIC would be powered down.
Therefore, Xen should save/restore the context of GIC on suspend/resume.
Note that the context consists of states of registers which are
controlled by the hypervisor. Other GIC registers which are accessible
by guests are saved/restored on context switch.
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V6:
- Drop gicv3_save/restore_state since it is already handled during vCPU
context switch.
- The comment about systems without SPIs is clarified for readability.
- Error and warning messages related to suspend context allocation are unified
and now use printk() with XENLOG_ERR for consistency.
- The check for suspend context allocation in gicv3_resume() is removed,
as it is handled earlier in the suspend path.
- The loop for saving and restoring PPI/SGI priorities is corrected to use
the proper increment.
- The gicv3_suspend() function now prints an explicit error if ITS suspend
support is not implemented, and returns ENOSYS in this case.
- The GICD_CTLR_DS bit definition is added to gic_v3_defs.h.
- The comment for GICR_WAKER access is expanded to reference the relevant
ARM specification section and clarify the RAZ/WI behavior for Non-secure
accesses.
- Cleanup active and enable registers before restoring.
---
xen/arch/arm/gic-v3-lpi.c | 3 +
xen/arch/arm/gic-v3.c | 235 +++++++++++++++++++++++++
xen/arch/arm/include/asm/gic_v3_defs.h | 1 +
3 files changed, 239 insertions(+)
diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index de5052e5cf..61a6e18303 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -391,6 +391,9 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
switch ( action )
{
case CPU_UP_PREPARE:
+ if ( system_state == SYS_STATE_resume )
+ break;
+
rc = gicv3_lpi_allocate_pendtable(cpu);
if ( rc )
printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%lu\n",
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index cd3e1acf79..9f1be7e905 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1776,6 +1776,233 @@ static bool gic_dist_supports_lpis(void)
return (readl_relaxed(GICD + GICD_TYPER) & GICD_TYPE_LPIS);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+/* GICv3 registers to be saved/restored on system suspend/resume */
+struct gicv3_ctx {
+ struct dist_ctx {
+ uint32_t ctlr;
+ /*
+ * This struct represent block of 32 IRQs
+ * TODO: store extended SPI configuration (GICv3.1+)
+ */
+ struct irq_regs {
+ uint32_t icfgr[2];
+ uint32_t ipriorityr[8];
+ uint64_t irouter[32];
+ uint32_t isactiver;
+ uint32_t isenabler;
+ } *irqs;
+ } dist;
+
+ /* have only one rdist structure for last running CPU during suspend */
+ struct redist_ctx {
+ uint32_t ctlr;
+ /* TODO: handle case when we have more than 16 PPIs (GICv3.1+) */
+ uint32_t icfgr[2];
+ uint32_t igroupr;
+ uint32_t ipriorityr[8];
+ uint32_t isactiver;
+ uint32_t isenabler;
+ } rdist;
+
+ struct cpu_ctx {
+ uint32_t ctlr;
+ uint32_t pmr;
+ uint32_t bpr;
+ uint32_t sre_el2;
+ uint32_t grpen;
+ } cpu;
+};
+
+static struct gicv3_ctx gicv3_ctx;
+
+static void __init gicv3_alloc_context(void)
+{
+ uint32_t blocks = DIV_ROUND_UP(gicv3_info.nr_lines, 32);
+
+ /* We don't have ITS support for suspend */
+ if ( gicv3_its_host_has_its() )
+ return;
+
+ /* The spec allows for systems without any SPIs */
+ if ( blocks > 1 )
+ {
+ gicv3_ctx.dist.irqs = xzalloc_array(typeof(*gicv3_ctx.dist.irqs),
+ blocks - 1);
+ if ( !gicv3_ctx.dist.irqs )
+ printk(XENLOG_ERR "Failed to allocate memory for GICv3 suspend context\n");
+ }
+}
+
+static void gicv3_disable_redist(void)
+{
+ void __iomem* waker = GICD_RDIST_BASE + GICR_WAKER;
+
+ /*
+ * Avoid infinite loop if Non-secure does not have access to GICR_WAKER.
+ * See Arm IHI 0069H.b, 12.11.42 GICR_WAKER:
+ * When GICD_CTLR.DS == 0 and an access is Non-secure accesses to this
+ * register are RAZ/WI.
+ */
+ if ( !(readl_relaxed(GICD + GICD_CTLR) & GICD_CTLR_DS) )
+ return;
+
+ writel_relaxed(readl_relaxed(waker) | GICR_WAKER_ProcessorSleep, waker);
+ while ( (readl_relaxed(waker) & GICR_WAKER_ChildrenAsleep) == 0 );
+}
+
+static int gicv3_suspend(void)
+{
+ unsigned int i;
+ void __iomem *base;
+ typeof(gicv3_ctx.rdist)* rdist = &gicv3_ctx.rdist;
+
+ /* TODO: implement support for ITS */
+ if ( gicv3_its_host_has_its() )
+ {
+ printk(XENLOG_ERR "GICv3: ITS suspend support is not implemented\n");
+ return -ENOSYS;
+ }
+
+ if ( !gicv3_ctx.dist.irqs && gicv3_info.nr_lines > NR_GIC_LOCAL_IRQS )
+ {
+ printk(XENLOG_ERR "GICv3: suspend context is not allocated!\n");
+ return -ENOMEM;
+ }
+
+ /* Save GICC configuration */
+ gicv3_ctx.cpu.ctlr = READ_SYSREG(ICC_CTLR_EL1);
+ gicv3_ctx.cpu.pmr = READ_SYSREG(ICC_PMR_EL1);
+ gicv3_ctx.cpu.bpr = READ_SYSREG(ICC_BPR1_EL1);
+ gicv3_ctx.cpu.sre_el2 = READ_SYSREG(ICC_SRE_EL2);
+ gicv3_ctx.cpu.grpen = READ_SYSREG(ICC_IGRPEN1_EL1);
+
+ gicv3_disable_interface();
+ gicv3_disable_redist();
+
+ /* Save GICR configuration */
+ gicv3_redist_wait_for_rwp();
+
+ base = GICD_RDIST_SGI_BASE;
+
+ rdist->ctlr = readl_relaxed(base + GICR_CTLR);
+
+ /* Save priority on PPI and SGI interrupts */
+ for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
+ rdist->ipriorityr[i] = readl_relaxed(base + GICR_IPRIORITYR0 + 4 * i);
+
+ rdist->isactiver = readl_relaxed(base + GICR_ISACTIVER0);
+ rdist->isenabler = readl_relaxed(base + GICR_ISENABLER0);
+ rdist->igroupr = readl_relaxed(base + GICR_IGROUPR0);
+ rdist->icfgr[0] = readl_relaxed(base + GICR_ICFGR0);
+ rdist->icfgr[1] = readl_relaxed(base + GICR_ICFGR1);
+
+ /* Save GICD configuration */
+ gicv3_dist_wait_for_rwp();
+ gicv3_ctx.dist.ctlr = readl_relaxed(GICD + GICD_CTLR);
+
+ for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
+ {
+ typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
+ unsigned int irq;
+
+ base = GICD + GICD_ICFGR + 8 * i;
+ irqs->icfgr[0] = readl_relaxed(base);
+ irqs->icfgr[1] = readl_relaxed(base + 4);
+
+ base = GICD + GICD_IPRIORITYR + 32 * i;
+ for ( irq = 0; irq < 8; irq++ )
+ irqs->ipriorityr[irq] = readl_relaxed(base + 4 * irq);
+
+ base = GICD + GICD_IROUTER + 32 * i;
+ for ( irq = 0; irq < 32; irq++ )
+ irqs->irouter[irq] = readq_relaxed_non_atomic(base + 8 * irq);
+
+ irqs->isactiver = readl_relaxed(GICD + GICD_ISACTIVER + 4 * i);
+ irqs->isenabler = readl_relaxed(GICD + GICD_ISENABLER + 4 * i);
+ }
+
+ return 0;
+}
+
+static void gicv3_resume(void)
+{
+ unsigned int i;
+ void __iomem *base;
+ typeof(gicv3_ctx.rdist)* rdist = &gicv3_ctx.rdist;
+
+ writel_relaxed(0, GICD + GICD_CTLR);
+
+ for ( i = NR_GIC_LOCAL_IRQS; i < gicv3_info.nr_lines; i += 32 )
+ writel_relaxed(GENMASK(31, 0), GICD + GICD_IGROUPR + (i / 32) * 4);
+
+ for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
+ {
+ typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
+ unsigned int irq;
+
+ base = GICD + GICD_ICFGR + 8 * i;
+ writel_relaxed(irqs->icfgr[0], base);
+ writel_relaxed(irqs->icfgr[1], base + 4);
+
+ base = GICD + GICD_IPRIORITYR + 32 * i;
+ for ( irq = 0; irq < 8; irq++ )
+ writel_relaxed(irqs->ipriorityr[irq], base + 4 * irq);
+
+ base = GICD + GICD_IROUTER + 32 * i;
+ for ( irq = 0; irq < 32; irq++ )
+ writeq_relaxed_non_atomic(irqs->irouter[irq], base + 8 * irq);
+
+ writel_relaxed(GENMASK(31, 0), GICD + GICD_ICENABLER + i * 4);
+ writel_relaxed(irqs->isenabler, GICD + GICD_ISENABLER + i * 4);
+
+ writel_relaxed(GENMASK(31, 0), GICD + GICD_ICACTIVER + i * 4);
+ writel_relaxed(irqs->isactiver, GICD + GICD_ISACTIVER + i * 4);
+ }
+
+ writel_relaxed(gicv3_ctx.dist.ctlr, GICD + GICD_CTLR);
+ gicv3_dist_wait_for_rwp();
+
+ /* Restore GICR (Redistributor) configuration */
+ gicv3_enable_redist();
+
+ base = GICD_RDIST_SGI_BASE;
+
+ writel_relaxed(0xffffffff, base + GICR_ICENABLER0);
+ gicv3_redist_wait_for_rwp();
+
+ for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
+ writel_relaxed(rdist->ipriorityr[i], base + GICR_IPRIORITYR0 + i * 4);
+
+ writel_relaxed(rdist->isactiver, base + GICR_ISACTIVER0);
+
+ writel_relaxed(rdist->igroupr, base + GICR_IGROUPR0);
+ writel_relaxed(rdist->icfgr[0], base + GICR_ICFGR0);
+ writel_relaxed(rdist->icfgr[1], base + GICR_ICFGR1);
+
+ gicv3_redist_wait_for_rwp();
+
+ writel_relaxed(rdist->isenabler, base + GICR_ISENABLER0);
+ writel_relaxed(rdist->ctlr, GICD_RDIST_BASE + GICR_CTLR);
+
+ gicv3_redist_wait_for_rwp();
+
+ WRITE_SYSREG(gicv3_ctx.cpu.sre_el2, ICC_SRE_EL2);
+ isb();
+
+ /* Restore CPU interface (System registers) */
+ WRITE_SYSREG(gicv3_ctx.cpu.pmr, ICC_PMR_EL1);
+ WRITE_SYSREG(gicv3_ctx.cpu.bpr, ICC_BPR1_EL1);
+ WRITE_SYSREG(gicv3_ctx.cpu.ctlr, ICC_CTLR_EL1);
+ WRITE_SYSREG(gicv3_ctx.cpu.grpen, ICC_IGRPEN1_EL1);
+ isb();
+
+ gicv3_hyp_init();
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
/* Set up the GIC */
static int __init gicv3_init(void)
{
@@ -1850,6 +2077,10 @@ static int __init gicv3_init(void)
gicv3_hyp_init();
+#ifdef CONFIG_SYSTEM_SUSPEND
+ gicv3_alloc_context();
+#endif
+
out:
spin_unlock(&gicv3.lock);
@@ -1889,6 +2120,10 @@ static const struct gic_hw_operations gicv3_ops = {
#endif
.iomem_deny_access = gicv3_iomem_deny_access,
.do_LPI = gicv3_do_LPI,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = gicv3_suspend,
+ .resume = gicv3_resume,
+#endif
};
static int __init gicv3_dt_preinit(struct dt_device_node *node, const void *data)
diff --git a/xen/arch/arm/include/asm/gic_v3_defs.h b/xen/arch/arm/include/asm/gic_v3_defs.h
index 2af093e774..7e86309acb 100644
--- a/xen/arch/arm/include/asm/gic_v3_defs.h
+++ b/xen/arch/arm/include/asm/gic_v3_defs.h
@@ -56,6 +56,7 @@
#define GICD_TYPE_LPIS (1U << 17)
#define GICD_CTLR_RWP (1UL << 31)
+#define GICD_CTLR_DS (1U << 6)
#define GICD_CTLR_ARE_NS (1U << 4)
#define GICD_CTLR_ENABLE_G1A (1U << 1)
#define GICD_CTLR_ENABLE_G1 (1U << 0)
--
2.48.1
* Re: [PATCH v6 03/13] xen/arm: gic-v3: Implement GICv3 suspend/resume functions
2025-09-01 22:10 ` [PATCH v6 03/13] xen/arm: gic-v3: Implement GICv3 " Mykola Kvach
@ 2025-09-02 16:08 ` Oleksandr Tyshchenko
2025-09-02 17:30 ` Mykola Kvach
0 siblings, 1 reply; 49+ messages in thread
From: Oleksandr Tyshchenko @ 2025-09-02 16:08 UTC (permalink / raw)
To: Mykola Kvach, xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
On 02.09.25 01:10, Mykola Kvach wrote:
Hello Mykola
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> System suspend may lead to a state where GIC would be powered down.
> Therefore, Xen should save/restore the context of GIC on suspend/resume.
>
> Note that the context consists of states of registers which are
> controlled by the hypervisor. Other GIC registers which are accessible
> by guests are saved/restored on context switch.
>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in V6:
> - Drop gicv3_save/restore_state since it is already handled during vCPU
> context switch.
> - The comment about systems without SPIs is clarified for readability.
> - Error and warning messages related to suspend context allocation are unified
> and now use printk() with XENLOG_ERR for consistency.
> - The check for suspend context allocation in gicv3_resume() is removed,
> as it is handled earlier in the suspend path.
> - The loop for saving and restoring PPI/SGI priorities is corrected to use
> the proper increment.
> - The gicv3_suspend() function now prints an explicit error if ITS suspend
> support is not implemented, and returns ENOSYS in this case.
> - The GICD_CTLR_DS bit definition is added to gic_v3_defs.h.
> - The comment for GICR_WAKER access is expanded to reference the relevant
> ARM specification section and clarify the RAZ/WI behavior for Non-secure
> accesses.
> - Cleanup active and enable registers before restoring.
> ---
> xen/arch/arm/gic-v3-lpi.c | 3 +
> xen/arch/arm/gic-v3.c | 235 +++++++++++++++++++++++++
> xen/arch/arm/include/asm/gic_v3_defs.h | 1 +
> 3 files changed, 239 insertions(+)
>
> diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
> index de5052e5cf..61a6e18303 100644
> --- a/xen/arch/arm/gic-v3-lpi.c
> +++ b/xen/arch/arm/gic-v3-lpi.c
> @@ -391,6 +391,9 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> switch ( action )
> {
> case CPU_UP_PREPARE:
> + if ( system_state == SYS_STATE_resume )
> + break;
> +
> rc = gicv3_lpi_allocate_pendtable(cpu);
> if ( rc )
> printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%lu\n",
> diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> index cd3e1acf79..9f1be7e905 100644
> --- a/xen/arch/arm/gic-v3.c
> +++ b/xen/arch/arm/gic-v3.c
> @@ -1776,6 +1776,233 @@ static bool gic_dist_supports_lpis(void)
> return (readl_relaxed(GICD + GICD_TYPER) & GICD_TYPE_LPIS);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +/* GICv3 registers to be saved/restored on system suspend/resume */
> +struct gicv3_ctx {
> + struct dist_ctx {
> + uint32_t ctlr;
> + /*
> + * This struct represent block of 32 IRQs
> + * TODO: store extended SPI configuration (GICv3.1+)
> + */
> + struct irq_regs {
> + uint32_t icfgr[2];
> + uint32_t ipriorityr[8];
> + uint64_t irouter[32];
> + uint32_t isactiver;
> + uint32_t isenabler;
> + } *irqs;
> + } dist;
> +
> + /* have only one rdist structure for last running CPU during suspend */
> + struct redist_ctx {
> + uint32_t ctlr;
> + /* TODO: handle case when we have more than 16 PPIs (GICv3.1+) */
> + uint32_t icfgr[2];
> + uint32_t igroupr;
> + uint32_t ipriorityr[8];
> + uint32_t isactiver;
> + uint32_t isenabler;
> + } rdist;
> +
> + struct cpu_ctx {
> + uint32_t ctlr;
> + uint32_t pmr;
> + uint32_t bpr;
> + uint32_t sre_el2;
> + uint32_t grpen;
> + } cpu;
> +};
> +
> +static struct gicv3_ctx gicv3_ctx;
> +
> +static void __init gicv3_alloc_context(void)
> +{
> + uint32_t blocks = DIV_ROUND_UP(gicv3_info.nr_lines, 32);
> +
> + /* We don't have ITS support for suspend */
> + if ( gicv3_its_host_has_its() )
> + return;
> +
> + /* The spec allows for systems without any SPIs */
> + if ( blocks > 1 )
> + {
> + gicv3_ctx.dist.irqs = xzalloc_array(typeof(*gicv3_ctx.dist.irqs),
> + blocks - 1);
> + if ( !gicv3_ctx.dist.irqs )
> + printk(XENLOG_ERR "Failed to allocate memory for GICv3 suspend context\n");
> + }
> +}
> +
> +static void gicv3_disable_redist(void)
> +{
> + void __iomem* waker = GICD_RDIST_BASE + GICR_WAKER;
> +
> + /*
> + * Avoid infinite loop if Non-secure does not have access to GICR_WAKER.
> + * See Arm IHI 0069H.b, 12.11.42 GICR_WAKER:
> + * When GICD_CTLR.DS == 0 and an access is Non-secure accesses to this
> + * register are RAZ/WI.
> + */
> + if ( !(readl_relaxed(GICD + GICD_CTLR) & GICD_CTLR_DS) )
> + return;
> +
> + writel_relaxed(readl_relaxed(waker) | GICR_WAKER_ProcessorSleep, waker);
> + while ( (readl_relaxed(waker) & GICR_WAKER_ChildrenAsleep) == 0 );
> +}
> +
> +static int gicv3_suspend(void)
> +{
> + unsigned int i;
> + void __iomem *base;
> + typeof(gicv3_ctx.rdist)* rdist = &gicv3_ctx.rdist;
> +
> + /* TODO: implement support for ITS */
> + if ( gicv3_its_host_has_its() )
> + {
> + printk(XENLOG_ERR "GICv3: ITS suspend support is not implemented\n");
> + return -ENOSYS;
> + }
> +
> + if ( !gicv3_ctx.dist.irqs && gicv3_info.nr_lines > NR_GIC_LOCAL_IRQS )
> + {
> + printk(XENLOG_ERR "GICv3: suspend context is not allocated!\n");
> + return -ENOMEM;
> + }
> +
> + /* Save GICC configuration */
> + gicv3_ctx.cpu.ctlr = READ_SYSREG(ICC_CTLR_EL1);
> + gicv3_ctx.cpu.pmr = READ_SYSREG(ICC_PMR_EL1);
> + gicv3_ctx.cpu.bpr = READ_SYSREG(ICC_BPR1_EL1);
> + gicv3_ctx.cpu.sre_el2 = READ_SYSREG(ICC_SRE_EL2);
> + gicv3_ctx.cpu.grpen = READ_SYSREG(ICC_IGRPEN1_EL1);
> +
> + gicv3_disable_interface();
> + gicv3_disable_redist();
> +
> + /* Save GICR configuration */
> + gicv3_redist_wait_for_rwp();
> +
> + base = GICD_RDIST_SGI_BASE;
> +
> + rdist->ctlr = readl_relaxed(base + GICR_CTLR);
> +
> + /* Save priority on PPI and SGI interrupts */
> + for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
> + rdist->ipriorityr[i] = readl_relaxed(base + GICR_IPRIORITYR0 + 4 * i);
> +
> + rdist->isactiver = readl_relaxed(base + GICR_ISACTIVER0);
> + rdist->isenabler = readl_relaxed(base + GICR_ISENABLER0);
> + rdist->igroupr = readl_relaxed(base + GICR_IGROUPR0);
> + rdist->icfgr[0] = readl_relaxed(base + GICR_ICFGR0);
GICR_ICFGR0 is for SGIs, which are always edge-triggered, so I am not
sure that we need to save it here ...
> + rdist->icfgr[1] = readl_relaxed(base + GICR_ICFGR1);
> +
> + /* Save GICD configuration */
> + gicv3_dist_wait_for_rwp();
> + gicv3_ctx.dist.ctlr = readl_relaxed(GICD + GICD_CTLR);
> +
> + for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
> + {
> + typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
> + unsigned int irq;
> +
> + base = GICD + GICD_ICFGR + 8 * i;
> + irqs->icfgr[0] = readl_relaxed(base);
> + irqs->icfgr[1] = readl_relaxed(base + 4);
> +
> + base = GICD + GICD_IPRIORITYR + 32 * i;
> + for ( irq = 0; irq < 8; irq++ )
> + irqs->ipriorityr[irq] = readl_relaxed(base + 4 * irq);
> +
> + base = GICD + GICD_IROUTER + 32 * i;
> + for ( irq = 0; irq < 32; irq++ )
> + irqs->irouter[irq] = readq_relaxed_non_atomic(base + 8 * irq);
> +
> + irqs->isactiver = readl_relaxed(GICD + GICD_ISACTIVER + 4 * i);
> + irqs->isenabler = readl_relaxed(GICD + GICD_ISENABLER + 4 * i);
> + }
> +
> + return 0;
> +}
> +
> +static void gicv3_resume(void)
> +{
> + unsigned int i;
> + void __iomem *base;
> + typeof(gicv3_ctx.rdist)* rdist = &gicv3_ctx.rdist;
> +
> + writel_relaxed(0, GICD + GICD_CTLR);
> +
> + for ( i = NR_GIC_LOCAL_IRQS; i < gicv3_info.nr_lines; i += 32 )
> + writel_relaxed(GENMASK(31, 0), GICD + GICD_IGROUPR + (i / 32) * 4);
> +
> + for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
> + {
> + typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
> + unsigned int irq;
> +
> + base = GICD + GICD_ICFGR + 8 * i;
> + writel_relaxed(irqs->icfgr[0], base);
> + writel_relaxed(irqs->icfgr[1], base + 4);
> +
> + base = GICD + GICD_IPRIORITYR + 32 * i;
> + for ( irq = 0; irq < 8; irq++ )
> + writel_relaxed(irqs->ipriorityr[irq], base + 4 * irq);
> +
> + base = GICD + GICD_IROUTER + 32 * i;
> + for ( irq = 0; irq < 32; irq++ )
> + writeq_relaxed_non_atomic(irqs->irouter[irq], base + 8 * irq);
> +
> + writel_relaxed(GENMASK(31, 0), GICD + GICD_ICENABLER + i * 4);
> + writel_relaxed(irqs->isenabler, GICD + GICD_ISENABLER + i * 4);
> +
> + writel_relaxed(GENMASK(31, 0), GICD + GICD_ICACTIVER + i * 4);
> + writel_relaxed(irqs->isactiver, GICD + GICD_ISACTIVER + i * 4);
> + }
> +
> + writel_relaxed(gicv3_ctx.dist.ctlr, GICD + GICD_CTLR);
> + gicv3_dist_wait_for_rwp();
> +
> + /* Restore GICR (Redistributor) configuration */
> + gicv3_enable_redist();
> +
> + base = GICD_RDIST_SGI_BASE;
> +
> + writel_relaxed(0xffffffff, base + GICR_ICENABLER0);
> + gicv3_redist_wait_for_rwp();
> +
> + for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
> + writel_relaxed(rdist->ipriorityr[i], base + GICR_IPRIORITYR0 + i * 4);
> +
> + writel_relaxed(rdist->isactiver, base + GICR_ISACTIVER0);
> +
> + writel_relaxed(rdist->igroupr, base + GICR_IGROUPR0);
> + writel_relaxed(rdist->icfgr[0], base + GICR_ICFGR0);
... and restore it here.
> + writel_relaxed(rdist->icfgr[1], base + GICR_ICFGR1);
> +
[snip]
* Re: [PATCH v6 03/13] xen/arm: gic-v3: Implement GICv3 suspend/resume functions
2025-09-02 16:08 ` Oleksandr Tyshchenko
@ 2025-09-02 17:30 ` Mykola Kvach
0 siblings, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-02 17:30 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Oleksandr,
On Tue, Sep 2, 2025 at 7:08 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
>
> On 02.09.25 01:10, Mykola Kvach wrote:
>
> Hello Mykola
>
>
> > From: Mykola Kvach <mykola_kvach@epam.com>
> >
> > System suspend may lead to a state where GIC would be powered down.
> > Therefore, Xen should save/restore the context of GIC on suspend/resume.
> >
> > Note that the context consists of states of registers which are
> > controlled by the hypervisor. Other GIC registers which are accessible
> > by guests are saved/restored on context switch.
> >
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V6:
> > - Drop gicv3_save/restore_state since it is already handled during vCPU
> > context switch.
> > - The comment about systems without SPIs is clarified for readability.
> > - Error and warning messages related to suspend context allocation are unified
> > and now use printk() with XENLOG_ERR for consistency.
> > - The check for suspend context allocation in gicv3_resume() is removed,
> > as it is handled earlier in the suspend path.
> > - The loop for saving and restoring PPI/SGI priorities is corrected to use
> > the proper increment.
> > - The gicv3_suspend() function now prints an explicit error if ITS suspend
> > support is not implemented, and returns ENOSYS in this case.
> > - The GICD_CTLR_DS bit definition is added to gic_v3_defs.h.
> > - The comment for GICR_WAKER access is expanded to reference the relevant
> > ARM specification section and clarify the RAZ/WI behavior for Non-secure
> > accesses.
> > - Cleanup active and enable registers before restoring.
> > ---
> > xen/arch/arm/gic-v3-lpi.c | 3 +
> > xen/arch/arm/gic-v3.c | 235 +++++++++++++++++++++++++
> > xen/arch/arm/include/asm/gic_v3_defs.h | 1 +
> > 3 files changed, 239 insertions(+)
> >
> > diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
> > index de5052e5cf..61a6e18303 100644
> > --- a/xen/arch/arm/gic-v3-lpi.c
> > +++ b/xen/arch/arm/gic-v3-lpi.c
> > @@ -391,6 +391,9 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > switch ( action )
> > {
> > case CPU_UP_PREPARE:
> > + if ( system_state == SYS_STATE_resume )
> > + break;
> > +
> > rc = gicv3_lpi_allocate_pendtable(cpu);
> > if ( rc )
> > printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%lu\n",
> > diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> > index cd3e1acf79..9f1be7e905 100644
> > --- a/xen/arch/arm/gic-v3.c
> > +++ b/xen/arch/arm/gic-v3.c
> > @@ -1776,6 +1776,233 @@ static bool gic_dist_supports_lpis(void)
> > return (readl_relaxed(GICD + GICD_TYPER) & GICD_TYPE_LPIS);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +/* GICv3 registers to be saved/restored on system suspend/resume */
> > +struct gicv3_ctx {
> > + struct dist_ctx {
> > + uint32_t ctlr;
> > + /*
> > + * This struct represent block of 32 IRQs
> > + * TODO: store extended SPI configuration (GICv3.1+)
> > + */
> > + struct irq_regs {
> > + uint32_t icfgr[2];
> > + uint32_t ipriorityr[8];
> > + uint64_t irouter[32];
> > + uint32_t isactiver;
> > + uint32_t isenabler;
> > + } *irqs;
> > + } dist;
> > +
> > + /* have only one rdist structure for last running CPU during suspend */
> > + struct redist_ctx {
> > + uint32_t ctlr;
> > + /* TODO: handle case when we have more than 16 PPIs (GICv3.1+) */
> > + uint32_t icfgr[2];
> > + uint32_t igroupr;
> > + uint32_t ipriorityr[8];
> > + uint32_t isactiver;
> > + uint32_t isenabler;
> > + } rdist;
> > +
> > + struct cpu_ctx {
> > + uint32_t ctlr;
> > + uint32_t pmr;
> > + uint32_t bpr;
> > + uint32_t sre_el2;
> > + uint32_t grpen;
> > + } cpu;
> > +};
> > +
> > +static struct gicv3_ctx gicv3_ctx;
> > +
> > +static void __init gicv3_alloc_context(void)
> > +{
> > + uint32_t blocks = DIV_ROUND_UP(gicv3_info.nr_lines, 32);
> > +
> > + /* We don't have ITS support for suspend */
> > + if ( gicv3_its_host_has_its() )
> > + return;
> > +
> > + /* The spec allows for systems without any SPIs */
> > + if ( blocks > 1 )
> > + {
> > + gicv3_ctx.dist.irqs = xzalloc_array(typeof(*gicv3_ctx.dist.irqs),
> > + blocks - 1);
> > + if ( !gicv3_ctx.dist.irqs )
> > + printk(XENLOG_ERR "Failed to allocate memory for GICv3 suspend context\n");
> > + }
> > +}
> > +
> > +static void gicv3_disable_redist(void)
> > +{
> > + void __iomem* waker = GICD_RDIST_BASE + GICR_WAKER;
> > +
> > + /*
> > + * Avoid infinite loop if Non-secure does not have access to GICR_WAKER.
> > + * See Arm IHI 0069H.b, 12.11.42 GICR_WAKER:
> > + * When GICD_CTLR.DS == 0 and an access is Non-secure accesses to this
> > + * register are RAZ/WI.
> > + */
> > + if ( !(readl_relaxed(GICD + GICD_CTLR) & GICD_CTLR_DS) )
> > + return;
> > +
> > + writel_relaxed(readl_relaxed(waker) | GICR_WAKER_ProcessorSleep, waker);
> > + while ( (readl_relaxed(waker) & GICR_WAKER_ChildrenAsleep) == 0 );
> > +}
> > +
> > +static int gicv3_suspend(void)
> > +{
> > + unsigned int i;
> > + void __iomem *base;
> > + typeof(gicv3_ctx.rdist)* rdist = &gicv3_ctx.rdist;
> > +
> > + /* TODO: implement support for ITS */
> > + if ( gicv3_its_host_has_its() )
> > + {
> > + printk(XENLOG_ERR "GICv3: ITS suspend support is not implemented\n");
> > + return -ENOSYS;
> > + }
> > +
> > + if ( !gicv3_ctx.dist.irqs && gicv3_info.nr_lines > NR_GIC_LOCAL_IRQS )
> > + {
> > + printk(XENLOG_ERR "GICv3: suspend context is not allocated!\n");
> > + return -ENOMEM;
> > + }
> > +
> > + /* Save GICC configuration */
> > + gicv3_ctx.cpu.ctlr = READ_SYSREG(ICC_CTLR_EL1);
> > + gicv3_ctx.cpu.pmr = READ_SYSREG(ICC_PMR_EL1);
> > + gicv3_ctx.cpu.bpr = READ_SYSREG(ICC_BPR1_EL1);
> > + gicv3_ctx.cpu.sre_el2 = READ_SYSREG(ICC_SRE_EL2);
> > + gicv3_ctx.cpu.grpen = READ_SYSREG(ICC_IGRPEN1_EL1);
> > +
> > + gicv3_disable_interface();
> > + gicv3_disable_redist();
> > +
> > + /* Save GICR configuration */
> > + gicv3_redist_wait_for_rwp();
> > +
> > + base = GICD_RDIST_SGI_BASE;
> > +
> > + rdist->ctlr = readl_relaxed(base + GICR_CTLR);
> > +
> > + /* Save priority on PPI and SGI interrupts */
> > + for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
> > + rdist->ipriorityr[i] = readl_relaxed(base + GICR_IPRIORITYR0 + 4 * i);
> > +
> > + rdist->isactiver = readl_relaxed(base + GICR_ISACTIVER0);
> > + rdist->isenabler = readl_relaxed(base + GICR_ISENABLER0);
> > + rdist->igroupr = readl_relaxed(base + GICR_IGROUPR0);
> > + rdist->icfgr[0] = readl_relaxed(base + GICR_ICFGR0);
>
> GICR_ICFGR0 is for SGIs, which are always edge-triggered, so I am not
> sure that we need to save it here ...
Looks like I didn’t read the spec carefully and only paid attention to:
12.11.7 GICR_ICFGR0, Interrupt Configuration Register 0
Determines whether the corresponding SGI is edge-triggered or level-sensitive.
But a few lines below it states:
Int_config<x>, bits [2x+1:2x], for x = 15 to 0
Indicates whether the interrupt is level-sensitive or edge-triggered.
0b00 Corresponding interrupt is level-sensitive.
0b10 Corresponding interrupt is edge-triggered.
SGIs are always edge-triggered.
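
To illustrate the field layout quoted above, here is a minimal sketch of decoding Int_config<x> from a 32-bit ICFGR value. The helper name icfgr_is_edge is hypothetical and not part of Xen; it only assumes the bit positions and encodings listed in the excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if interrupt x in this register is edge-triggered. */
static bool icfgr_is_edge(uint32_t icfgr, unsigned int x)
{
    assert(x < 16); /* one 32-bit ICFGR register describes 16 interrupts */

    /* Int_config<x> occupies bits [2x+1:2x]; 0b10 means edge-triggered. */
    return ((icfgr >> (2 * x)) & 0x3) == 0x2;
}
```

If every SGI field reads as 0b10, the register value would be 0xAAAAAAAA and the helper returns true for all x from 0 to 15, so there is no per-SGI state to preserve.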
>
>
> > + rdist->icfgr[1] = readl_relaxed(base + GICR_ICFGR1);
> > +
> > + /* Save GICD configuration */
> > + gicv3_dist_wait_for_rwp();
> > + gicv3_ctx.dist.ctlr = readl_relaxed(GICD + GICD_CTLR);
> > +
> > + for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
> > + {
> > + typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
> > + unsigned int irq;
> > +
> > + base = GICD + GICD_ICFGR + 8 * i;
> > + irqs->icfgr[0] = readl_relaxed(base);
> > + irqs->icfgr[1] = readl_relaxed(base + 4);
> > +
> > + base = GICD + GICD_IPRIORITYR + 32 * i;
> > + for ( irq = 0; irq < 8; irq++ )
> > + irqs->ipriorityr[irq] = readl_relaxed(base + 4 * irq);
> > +
> > + base = GICD + GICD_IROUTER + 32 * i;
> > + for ( irq = 0; irq < 32; irq++ )
> > + irqs->irouter[irq] = readq_relaxed_non_atomic(base + 8 * irq);
> > +
> > + irqs->isactiver = readl_relaxed(GICD + GICD_ISACTIVER + 4 * i);
> > + irqs->isenabler = readl_relaxed(GICD + GICD_ISENABLER + 4 * i);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void gicv3_resume(void)
> > +{
> > + unsigned int i;
> > + void __iomem *base;
> > + typeof(gicv3_ctx.rdist)* rdist = &gicv3_ctx.rdist;
> > +
> > + writel_relaxed(0, GICD + GICD_CTLR);
> > +
> > + for ( i = NR_GIC_LOCAL_IRQS; i < gicv3_info.nr_lines; i += 32 )
> > + writel_relaxed(GENMASK(31, 0), GICD + GICD_IGROUPR + (i / 32) * 4);
> > +
> > + for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
> > + {
> > + typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
> > + unsigned int irq;
> > +
> > + base = GICD + GICD_ICFGR + 8 * i;
> > + writel_relaxed(irqs->icfgr[0], base);
> > + writel_relaxed(irqs->icfgr[1], base + 4);
> > +
> > + base = GICD + GICD_IPRIORITYR + 32 * i;
> > + for ( irq = 0; irq < 8; irq++ )
> > + writel_relaxed(irqs->ipriorityr[irq], base + 4 * irq);
> > +
> > + base = GICD + GICD_IROUTER + 32 * i;
> > + for ( irq = 0; irq < 32; irq++ )
> > + writeq_relaxed_non_atomic(irqs->irouter[irq], base + 8 * irq);
> > +
> > + writel_relaxed(GENMASK(31, 0), GICD + GICD_ICENABLER + i * 4);
> > + writel_relaxed(irqs->isenabler, GICD + GICD_ISENABLER + i * 4);
> > +
> > + writel_relaxed(GENMASK(31, 0), GICD + GICD_ICACTIVER + i * 4);
> > + writel_relaxed(irqs->isactiver, GICD + GICD_ISACTIVER + i * 4);
> > + }
> > +
> > + writel_relaxed(gicv3_ctx.dist.ctlr, GICD + GICD_CTLR);
> > + gicv3_dist_wait_for_rwp();
> > +
> > + /* Restore GICR (Redistributor) configuration */
> > + gicv3_enable_redist();
> > +
> > + base = GICD_RDIST_SGI_BASE;
> > +
> > + writel_relaxed(0xffffffff, base + GICR_ICENABLER0);
> > + gicv3_redist_wait_for_rwp();
> > +
> > + for ( i = 0; i < NR_GIC_LOCAL_IRQS / 4; i++ )
> > + writel_relaxed(rdist->ipriorityr[i], base + GICR_IPRIORITYR0 + i * 4);
> > +
> > + writel_relaxed(rdist->isactiver, base + GICR_ISACTIVER0);
> > +
> > + writel_relaxed(rdist->igroupr, base + GICR_IGROUPR0);
> > + writel_relaxed(rdist->icfgr[0], base + GICR_ICFGR0);
>
> ... and restore it here.
Thank you for pointing that out.
I will remove it in the next version of the patch series.
>
>
> > + writel_relaxed(rdist->icfgr[1], base + GICR_ICFGR1);
> > +
>
> [snip]
Best regards,
Mykola
* [PATCH v6 04/13] xen/arm: Don't release IRQs on suspend
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (2 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 03/13] xen/arm: gic-v3: Implement GICv3 " Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-02 20:31 ` Volodymyr Babchuk
2025-09-12 23:45 ` Julien Grall
2025-09-01 22:10 ` [PATCH v6 05/13] xen/arm: irq: avoid local IRQ descriptors reinit on system resume Mykola Kvach
` (9 subsequent siblings)
13 siblings, 2 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mykola Kvach <mykola_kvach@epam.com>
If we call disable_nonboot_cpus on ARM64 with system_state set
to SYS_STATE_suspend, the following assertion will be triggered:
```
(XEN) [ 25.582712] Disabling non-boot CPUs ...
(XEN) [ 25.587032] Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:714
[...]
(XEN) [ 25.975069] Xen call trace:
(XEN) [ 25.978353] [<00000a000022e098>] xfree+0x130/0x1a4 (PC)
(XEN) [ 25.984314] [<00000a000022e08c>] xfree+0x124/0x1a4 (LR)
(XEN) [ 25.990276] [<00000a00002747d4>] release_irq+0xe4/0xe8
(XEN) [ 25.996152] [<00000a0000278588>] time.c#cpu_time_callback+0x44/0x60
(XEN) [ 26.003150] [<00000a000021d678>] notifier_call_chain+0x7c/0xa0
(XEN) [ 26.009717] [<00000a00002018e0>] cpu.c#cpu_notifier_call_chain+0x24/0x48
(XEN) [ 26.017148] [<00000a000020192c>] cpu.c#_take_cpu_down+0x28/0x34
(XEN) [ 26.023801] [<00000a0000201944>] cpu.c#take_cpu_down+0xc/0x18
(XEN) [ 26.030281] [<00000a0000225c5c>] stop_machine.c#stopmachine_action+0xbc/0xe4
(XEN) [ 26.038057] [<00000a00002264bc>] tasklet.c#do_tasklet_work+0xb8/0x100
(XEN) [ 26.045229] [<00000a00002268a4>] do_tasklet+0x68/0xb0
(XEN) [ 26.051018] [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
(XEN) [ 26.057585] [<00000a0000277e30>] start_secondary+0x21c/0x220
(XEN) [ 26.063978] [<00000a0000361258>] 00000a0000361258
```
This happens because stop_machine_run requests the STOPMACHINE_DISABLE_IRQ
state on the target CPU before invoking take_cpu_down there. Freeing memory
in release_irq with IRQs disabled then triggers the assertion, which
enforces the following requirement:
/*
* Heap allocations may need TLB flushes which may require IRQs to be
* enabled (except when only 1 PCPU is online).
*/
This patch adds system state checks to guard calls to request_irq
and release_irq: the request_irq call sites are skipped when system_state
is SYS_STATE_resume, and release_irq returns early when system_state is
SYS_STATE_suspend. This prevents unsafe memory operations while IRQs are
disabled during suspend/resume handling.
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V6:
- skipping of IRQ release during system suspend is now handled
inside release_irq().
Changes in V4:
- removed the prior tasklet-based workaround in favor of a more
straightforward and safer solution
- reworked the approach by adding explicit system state checks around
request_irq and release_irq calls; these calls are skipped during the
suspend and resume states to avoid unsafe memory operations when IRQs
are disabled
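
A minimal sketch of the guard pattern described in the commit message, using a toy state variable and a counter in place of the real irqaction allocation. All toy_* names are hypothetical; this only models the control flow and is not the Xen code.

```c
#include <assert.h>

/* Toy stand-in for system_state. */
enum toy_system_state { TOY_STATE_active, TOY_STATE_suspend, TOY_STATE_resume };

static enum toy_system_state toy_state = TOY_STATE_active;

/* Number of registered handlers; stands in for allocated actions. */
static int toy_actions;

static void toy_request_irq(void)
{
    toy_actions++;
}

/* Release is skipped during suspend, so nothing is freed with IRQs off. */
static void toy_release_irq(void)
{
    if ( toy_state == TOY_STATE_suspend )
        return;

    assert(toy_actions > 0);
    toy_actions--;
}

/* The request is skipped on resume, because the handler was never released. */
static void toy_init_timer_interrupt(void)
{
    if ( toy_state != TOY_STATE_resume )
        toy_request_irq();
}
```

The two guards are meant to be used as a pair: because the handler survives suspend, requesting it again on resume would register it twice.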
---
xen/arch/arm/gic.c | 3 +++
xen/arch/arm/irq.c | 3 +++
xen/arch/arm/tee/ffa_notif.c | 2 +-
xen/arch/arm/time.c | 11 +++++++----
4 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index a018bd7715..c64481faa7 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -388,6 +388,9 @@ void gic_dump_info(struct vcpu *v)
void init_maintenance_interrupt(void)
{
+ if ( system_state == SYS_STATE_resume )
+ return;
+
request_irq(gic_hw_ops->info->maintenance_irq, 0, maintenance_interrupt,
"irq-maintenance", NULL);
}
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 02ca82c089..361496a6d0 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -300,6 +300,9 @@ void release_irq(unsigned int irq, const void *dev_id)
unsigned long flags;
struct irqaction *action, **action_ptr;
+ if ( system_state == SYS_STATE_suspend )
+ return;
+
desc = irq_to_desc(irq);
spin_lock_irqsave(&desc->lock,flags);
diff --git a/xen/arch/arm/tee/ffa_notif.c b/xen/arch/arm/tee/ffa_notif.c
index 86bef6b3b2..4835e25619 100644
--- a/xen/arch/arm/tee/ffa_notif.c
+++ b/xen/arch/arm/tee/ffa_notif.c
@@ -363,7 +363,7 @@ void ffa_notif_init_interrupt(void)
{
int ret;
- if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI )
+ if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI && system_state != SYS_STATE_resume )
{
/*
* An error here is unlikely since the primary CPU has already
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index ad984fdfdd..8267fa5191 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -320,10 +320,13 @@ void init_timer_interrupt(void)
WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
disable_physical_timers();
- request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
- "hyptimer", NULL);
- request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt,
- "virtimer", NULL);
+ if ( system_state != SYS_STATE_resume )
+ {
+ request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
+ "hyptimer", NULL);
+ request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt,
+ "virtimer", NULL);
+ }
check_timer_irq_cfg(timer_irq[TIMER_HYP_PPI], "hypervisor");
check_timer_irq_cfg(timer_irq[TIMER_VIRT_PPI], "virtual");
--
2.48.1
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH v6 04/13] xen/arm: Don't release IRQs on suspend
2025-09-01 22:10 ` [PATCH v6 04/13] xen/arm: Don't release IRQs on suspend Mykola Kvach
@ 2025-09-02 20:31 ` Volodymyr Babchuk
2025-09-12 23:45 ` Julien Grall
1 sibling, 0 replies; 49+ messages in thread
From: Volodymyr Babchuk @ 2025-09-02 20:31 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel
Hi Mykola,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> If we call disable_nonboot_cpus on ARM64 with system_state set
> to SYS_STATE_suspend, the following assertion will be triggered:
>
> ```
> (XEN) [ 25.582712] Disabling non-boot CPUs ...
> (XEN) [ 25.587032] Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:714
> [...]
> (XEN) [ 25.975069] Xen call trace:
> (XEN) [ 25.978353] [<00000a000022e098>] xfree+0x130/0x1a4 (PC)
> (XEN) [ 25.984314] [<00000a000022e08c>] xfree+0x124/0x1a4 (LR)
> (XEN) [ 25.990276] [<00000a00002747d4>] release_irq+0xe4/0xe8
> (XEN) [ 25.996152] [<00000a0000278588>] time.c#cpu_time_callback+0x44/0x60
> (XEN) [ 26.003150] [<00000a000021d678>] notifier_call_chain+0x7c/0xa0
> (XEN) [ 26.009717] [<00000a00002018e0>] cpu.c#cpu_notifier_call_chain+0x24/0x48
> (XEN) [ 26.017148] [<00000a000020192c>] cpu.c#_take_cpu_down+0x28/0x34
> (XEN) [ 26.023801] [<00000a0000201944>] cpu.c#take_cpu_down+0xc/0x18
> (XEN) [ 26.030281] [<00000a0000225c5c>] stop_machine.c#stopmachine_action+0xbc/0xe4
> (XEN) [ 26.038057] [<00000a00002264bc>] tasklet.c#do_tasklet_work+0xb8/0x100
> (XEN) [ 26.045229] [<00000a00002268a4>] do_tasklet+0x68/0xb0
> (XEN) [ 26.051018] [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
> (XEN) [ 26.057585] [<00000a0000277e30>] start_secondary+0x21c/0x220
> (XEN) [ 26.063978] [<00000a0000361258>] 00000a0000361258
> ```
>
> This happens because before invoking take_cpu_down via the stop_machine_run
> function on the target CPU, stop_machine_run requests
> the STOPMACHINE_DISABLE_IRQ state on that CPU. Releasing memory in
> the release_irq function then triggers the assertion:
>
> /*
> * Heap allocations may need TLB flushes which may require IRQs to be
> * enabled (except when only 1 PCPU is online).
> */
>
> This patch adds system state checks to guard calls to request_irq
> and release_irq. These calls are now skipped when system_state is
> SYS_STATE_{resume,suspend}, preventing unsafe operations during
> suspend/resume handling.
>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
> Changes in V6:
> - skipping of IRQ release during system suspend is now handled
> inside release_irq().
> Changes in V4:
> - removed the prior tasklet-based workaround in favor of a more
> straightforward and safer solution
> - reworked the approach by adding explicit system state checks around
> request_irq and release_irq calls, skipping these calls during suspend
> and resume states to avoid unsafe memory operations when IRQs are
> disabled
> ---
> xen/arch/arm/gic.c | 3 +++
> xen/arch/arm/irq.c | 3 +++
> xen/arch/arm/tee/ffa_notif.c | 2 +-
> xen/arch/arm/time.c | 11 +++++++----
> 4 files changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index a018bd7715..c64481faa7 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -388,6 +388,9 @@ void gic_dump_info(struct vcpu *v)
>
> void init_maintenance_interrupt(void)
> {
> + if ( system_state == SYS_STATE_resume )
> + return;
> +
> request_irq(gic_hw_ops->info->maintenance_irq, 0, maintenance_interrupt,
> "irq-maintenance", NULL);
> }
> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> index 02ca82c089..361496a6d0 100644
> --- a/xen/arch/arm/irq.c
> +++ b/xen/arch/arm/irq.c
> @@ -300,6 +300,9 @@ void release_irq(unsigned int irq, const void *dev_id)
> unsigned long flags;
> struct irqaction *action, **action_ptr;
>
> + if ( system_state == SYS_STATE_suspend )
> + return;
> +
> desc = irq_to_desc(irq);
>
> spin_lock_irqsave(&desc->lock,flags);
> diff --git a/xen/arch/arm/tee/ffa_notif.c b/xen/arch/arm/tee/ffa_notif.c
> index 86bef6b3b2..4835e25619 100644
> --- a/xen/arch/arm/tee/ffa_notif.c
> +++ b/xen/arch/arm/tee/ffa_notif.c
> @@ -363,7 +363,7 @@ void ffa_notif_init_interrupt(void)
> {
> int ret;
>
> - if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI )
> + if ( fw_notif_enabled && notif_sri_irq < NR_GIC_SGI && system_state != SYS_STATE_resume )
> {
> /*
> * An error here is unlikely since the primary CPU has already
> diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> index ad984fdfdd..8267fa5191 100644
> --- a/xen/arch/arm/time.c
> +++ b/xen/arch/arm/time.c
> @@ -320,10 +320,13 @@ void init_timer_interrupt(void)
> WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
> disable_physical_timers();
>
> - request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
> - "hyptimer", NULL);
> - request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt,
> - "virtimer", NULL);
> + if ( system_state != SYS_STATE_resume )
> + {
> + request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
> + "hyptimer", NULL);
> + request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt,
> + "virtimer", NULL);
> + }
>
> check_timer_irq_cfg(timer_irq[TIMER_HYP_PPI], "hypervisor");
> check_timer_irq_cfg(timer_irq[TIMER_VIRT_PPI], "virtual");
--
WBR, Volodymyr
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 04/13] xen/arm: Don't release IRQs on suspend
2025-09-01 22:10 ` [PATCH v6 04/13] xen/arm: Don't release IRQs on suspend Mykola Kvach
2025-09-02 20:31 ` Volodymyr Babchuk
@ 2025-09-12 23:45 ` Julien Grall
2025-09-17 2:22 ` Mykola Kvach
1 sibling, 1 reply; 49+ messages in thread
From: Julien Grall @ 2025-09-12 23:45 UTC (permalink / raw)
To: Mykola Kvach, xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk
Hi Mykola,
On 01/09/2025 23:10, Mykola Kvach wrote:
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> If we call disable_nonboot_cpus on ARM64 with system_state set
> to SYS_STATE_suspend, the following assertion will be triggered:
Looking at the stack trace, I don't understand why this error would not
happen when offlining a CPU. Can you clarify?
Anyway, I am not very happy to special-case suspend/resume in the IRQ
code. So I would strongly prefer if we follow a different approach.
The one that comes to my mind is to switch from request_irq() to
setup_irq() and allocate the action in a per-cpu variable. With that,
there should be no free happening with the stop_machine helper.
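The difference between the two registration paths can be sketched with a small standalone model. This is an illustration under stated assumptions, not Xen's implementation: every `model_*` name is hypothetical, the standard C heap stands in for Xen's allocator, and the `heap_allocated` flag is invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4

/* Hypothetical, simplified stand-in for an IRQ action; not Xen's struct. */
struct model_irqaction {
    void (*handler)(int irq, void *dev_id);
    const char *name;
    int heap_allocated;   /* set when release must free() the action */
};

/* One registered action per CPU, standing in for a banked PPI. */
static struct model_irqaction *model_registered[MODEL_NR_CPUS];

/* Counts heap frees so the difference between the two paths is visible. */
static unsigned int model_free_count;

/* request_irq()-like path: the action is allocated from the heap. */
static int model_request_irq(unsigned int cpu,
                             void (*handler)(int irq, void *dev_id),
                             const char *name)
{
    struct model_irqaction *action;

    assert(cpu < MODEL_NR_CPUS);
    action = malloc(sizeof(*action));
    if ( action == NULL )
        return -1;

    action->handler = handler;
    action->name = name;
    action->heap_allocated = 1;
    model_registered[cpu] = action;

    return 0;
}

/* setup_irq()-like path: the caller owns the storage, nothing is allocated. */
static void model_setup_irq(unsigned int cpu, struct model_irqaction *action)
{
    assert(cpu < MODEL_NR_CPUS);
    action->heap_allocated = 0;
    model_registered[cpu] = action;
}

/* Release only touches the heap for actions that came from it. */
static void model_release_irq(unsigned int cpu)
{
    struct model_irqaction *action;

    assert(cpu < MODEL_NR_CPUS);
    action = model_registered[cpu];
    model_registered[cpu] = NULL;
    if ( action != NULL && action->heap_allocated )
    {
        free(action);
        model_free_count++;
    }
}

static void model_timer_handler(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
}

/* Statically allocated per-CPU actions, as the suggestion describes. */
static struct model_irqaction model_timer_action[MODEL_NR_CPUS];

static void model_timer_setup(unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    model_timer_action[cpu].handler = model_timer_handler;
    model_timer_action[cpu].name = "hyptimer";
    model_setup_irq(cpu, &model_timer_action[cpu]);
}
```

In this model, releasing an action registered through `model_timer_setup()` never reaches `free()`, so nothing in the teardown path depends on the state of interrupts.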
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 04/13] xen/arm: Don't release IRQs on suspend
2025-09-12 23:45 ` Julien Grall
@ 2025-09-17 2:22 ` Mykola Kvach
0 siblings, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-17 2:22 UTC (permalink / raw)
To: Julien Grall
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
Hi Julien,
Thank you for the review.
On Sat, Sep 13, 2025 at 2:45 AM Julien Grall <julien@xen.org> wrote:
>
> Hi Mykola,
>
> On 01/09/2025 23:10, Mykola Kvach wrote:
> > From: Mykola Kvach <mykola_kvach@epam.com>
> >
> > If we call disable_nonboot_cpus on ARM64 with system_state set
> > to SYS_STATE_suspend, the following assertion will be triggered:
>
> Looking at the stack trace, I don't understand why this error would not
> happen when offlining a CPU. Can you clarify?
>
> Anyway, I am not very happy to special-case suspend/resume in the IRQ
> code. So I would strongly prefer if we follow a different approach.
>
> The one that comes to my mind is to switch from request_irq() to
> setup_irq() and allocate the action in a per-cpu variable. With that,
> there should be no free happening with the stop_machine helper.
Yes, this should help in my case and it also looks like a cleaner
solution, thank you.
Interestingly, my teammate Mykyta Poturai came up with the same idea a
few days ago when he faced a similar problem during CPU hotplug
implementation.
So I will just reuse his commits; this is one of them:
https://github.com/Deedone/xen/commit/3817601c2f437453035839f29e94c069a770817d
>
> Cheers,
>
> --
> Julien Grall
>
Best regards,
Mykola
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH v6 05/13] xen/arm: irq: avoid local IRQ descriptors reinit on system resume
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (3 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 04/13] xen/arm: Don't release IRQs on suspend Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during " Mykola Kvach
` (8 subsequent siblings)
13 siblings, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Volodymyr Babchuk
From: Mykola Kvach <mykola_kvach@epam.com>
On ARM, during system resume, CPUs are brought online again. This normally
triggers init_local_irq_data, which reinitializes IRQ descriptors for
banked interrupts (SGIs and PPIs).
These descriptors are statically allocated per CPU and retain valid
state across suspend/resume cycles. Re-initializing them on resume is
unnecessary and may result in loss of interrupt configuration or
restored state.
This patch skips init_local_irq_data when system_state is set to
SYS_STATE_resume to preserve banked IRQ descs state during resume.
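The effect of skipping the re-initialisation can be shown with a reduced model. This is a sketch only: the `model_*` names, the single `configured` flag and the one-CPU descriptor array are assumptions made for illustration and do not reflect the layout of Xen's IRQ descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_LOCAL_IRQS 32

enum model_sys_state { MODEL_SYS_active, MODEL_SYS_suspend, MODEL_SYS_resume };
enum model_cpu_event { MODEL_CPU_UP_PREPARE, MODEL_CPU_STARTING };

/* Hypothetical per-IRQ state that should survive a suspend/resume cycle. */
struct model_desc {
    bool configured;
};

/* Fresh bring-up path: wipes whatever configuration the descriptors held. */
static void model_init_local_irq_data(struct model_desc *descs)
{
    unsigned int irq;

    for ( irq = 0; irq < MODEL_NR_LOCAL_IRQS; irq++ )
        descs[irq].configured = false;
}

/* Notifier-like callback: re-initialise only when this is not a resume. */
static void model_cpu_callback(struct model_desc *descs,
                               enum model_sys_state state,
                               enum model_cpu_event event)
{
    assert(descs != NULL);

    if ( event == MODEL_CPU_UP_PREPARE && state != MODEL_SYS_resume )
        model_init_local_irq_data(descs);
}
```

A descriptor marked as configured before suspend keeps that state when the callback runs with the resume state, and loses it on an ordinary bring-up.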
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
xen/arch/arm/irq.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 361496a6d0..6c899347ca 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -125,6 +125,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
switch ( action )
{
case CPU_UP_PREPARE:
+ /* Skip local IRQ cleanup on resume */
+ if ( system_state == SYS_STATE_resume )
+ break;
+
rc = init_local_irq_data(cpu);
if ( rc )
printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
--
2.48.1
^ permalink raw reply related [flat|nested] 49+ messages in thread
* [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (4 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 05/13] xen/arm: irq: avoid local IRQ descriptors reinit on system resume Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-02 16:49 ` Oleksandr Tyshchenko
2025-09-01 22:10 ` [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks Mykola Kvach
` (7 subsequent siblings)
13 siblings, 1 reply; 49+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mykola Kvach <mykola_kvach@epam.com>
On ARM, the first 32 interrupts (SGIs and PPIs) are banked per-CPU
and not restored by gic_resume (for secondary cpus).
This patch introduces restore_local_irqs_on_resume, a function that
restores the state of local interrupts on the target CPU during
system resume.
It iterates over all local IRQs and re-enables those that were not
disabled, reprogramming their routing and affinity accordingly.
The function is invoked from start_secondary, ensuring that local IRQ
state is restored early during CPU bring-up after suspend.
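The restore loop described above can be modelled outside Xen as follows. It is a minimal sketch under assumptions: `struct model_local_irq` and `model_restore_local_irqs()` are hypothetical, the software flag plays the role of the _IRQ_DISABLED status bit, and the hardware enable bit is assumed to have been lost while the CPU was powered down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_LOCAL_IRQS 32

/* Hypothetical view of one banked interrupt (SGI or PPI) on one CPU. */
struct model_local_irq {
    bool sw_disabled;   /* software view, like the _IRQ_DISABLED status bit */
    bool hw_enabled;    /* hardware enable bit, assumed lost across suspend */
};

/*
 * Re-enable in "hardware" every local IRQ that software still considers
 * enabled, and leave the disabled ones untouched.
 * Returns the number of interrupts that were restored.
 */
static unsigned int model_restore_local_irqs(struct model_local_irq *irqs)
{
    unsigned int irq, restored = 0;

    assert(irqs != NULL);

    for ( irq = 0; irq < MODEL_NR_LOCAL_IRQS; irq++ )
    {
        if ( irqs[irq].sw_disabled )
            continue;

        irqs[irq].hw_enabled = true;
        restored++;
    }

    return restored;
}
```

The model leaves out locking, routing and priority programming; it only shows the selection rule, namely that the software status decides which interrupts are brought back.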
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V6:
- Call handler->disable() instead of just setting the _IRQ_DISABLED flag
- Move the system state check outside of restore_local_irqs_on_resume()
---
xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 6c899347ca..ddd2940554 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
return 0;
}
+/*
+ * The first 32 interrupts (PPIs and SGIs) are per-CPU,
+ * so call this function on the target CPU to restore them.
+ *
+ * SPIs are restored via gic_resume.
+ */
+static void restore_local_irqs_on_resume(void)
+{
+ int irq;
+
+ spin_lock(&local_irqs_type_lock);
+
+ for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
+ {
+ struct irq_desc *desc = irq_to_desc(irq);
+
+ spin_lock(&desc->lock);
+
+ if ( test_bit(_IRQ_DISABLED, &desc->status) )
+ {
+ spin_unlock(&desc->lock);
+ continue;
+ }
+
+ /* Disable the IRQ to avoid assertions in the following calls */
+ desc->handler->disable(desc);
+ gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
+ desc->handler->startup(desc);
+
+ spin_unlock(&desc->lock);
+ }
+
+ spin_unlock(&local_irqs_type_lock);
+}
+
static int cpu_callback(struct notifier_block *nfb, unsigned long action,
void *hcpu)
{
@@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
cpu);
break;
+ case CPU_STARTING:
+ if ( system_state == SYS_STATE_resume )
+ restore_local_irqs_on_resume();
+ break;
}
return notifier_from_errno(rc);
--
2.48.1
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
2025-09-01 22:10 ` [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during " Mykola Kvach
@ 2025-09-02 16:49 ` Oleksandr Tyshchenko
2025-09-02 17:43 ` Mykola Kvach
2025-09-02 22:21 ` Mykola Kvach
0 siblings, 2 replies; 49+ messages in thread
From: Oleksandr Tyshchenko @ 2025-09-02 16:49 UTC (permalink / raw)
To: Mykola Kvach, xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
On 02.09.25 01:10, Mykola Kvach wrote:
Hello Mykola
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> On ARM, the first 32 interrupts (SGIs and PPIs) are banked per-CPU
> and not restored by gic_resume (for secondary cpus).
>
> This patch introduces restore_local_irqs_on_resume, a function that
> restores the state of local interrupts on the target CPU during
> system resume.
>
> It iterates over all local IRQs and re-enables those that were not
> disabled, reprogramming their routing and affinity accordingly.
>
> The function is invoked from start_secondary, ensuring that local IRQ
> state is restored early during CPU bring-up after suspend.
>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in V6:
> - Call handler->disable() instead of just setting the _IRQ_DISABLED flag
> - Move the system state check outside of restore_local_irqs_on_resume()
> ---
> xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 39 insertions(+)
>
> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> index 6c899347ca..ddd2940554 100644
> --- a/xen/arch/arm/irq.c
> +++ b/xen/arch/arm/irq.c
> @@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
> return 0;
> }
>
> +/*
> + * The first 32 interrupts (PPIs and SGIs) are per-CPU,
> + * so call this function on the target CPU to restore them.
> + *
> + * SPIs are restored via gic_resume.
> + */
> +static void restore_local_irqs_on_resume(void)
> +{
> + int irq;
NIT: Please, use "unsigned int" if irq cannot be negative
> +
> + spin_lock(&local_irqs_type_lock);
> +
> + for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
> + {
> + struct irq_desc *desc = irq_to_desc(irq);
> +
> + spin_lock(&desc->lock);
> +
> + if ( test_bit(_IRQ_DISABLED, &desc->status) )
> + {
> + spin_unlock(&desc->lock);
> + continue;
> + }
> +
> + /* Disable the IRQ to avoid assertions in the following calls */
> + desc->handler->disable(desc);
> + gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
Shouldn't we use GIC_PRI_IPI for SGIs?
> + desc->handler->startup(desc);
> +
> + spin_unlock(&desc->lock);
> + }
> +
> + spin_unlock(&local_irqs_type_lock);
> +}
> +
> static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> void *hcpu)
> {
> @@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
> cpu);
> break;
> + case CPU_STARTING:
> + if ( system_state == SYS_STATE_resume )
> + restore_local_irqs_on_resume();
> + break;
May I please ask, why all this new code (i.e.
restore_local_irqs_on_resume()) is not covered by #ifdef
CONFIG_SYSTEM_SUSPEND?
> }
>
> return notifier_from_errno(rc);
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
2025-09-02 16:49 ` Oleksandr Tyshchenko
@ 2025-09-02 17:43 ` Mykola Kvach
2025-09-02 18:16 ` Oleksandr Tyshchenko
2025-09-02 22:21 ` Mykola Kvach
1 sibling, 1 reply; 49+ messages in thread
From: Mykola Kvach @ 2025-09-02 17:43 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
Hi Oleksandr,
On Tue, Sep 2, 2025 at 7:49 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
>
> On 02.09.25 01:10, Mykola Kvach wrote:
>
> Hello Mykola
>
> > From: Mykola Kvach <mykola_kvach@epam.com>
> >
> > On ARM, the first 32 interrupts (SGIs and PPIs) are banked per-CPU
> > and not restored by gic_resume (for secondary cpus).
> >
> > This patch introduces restore_local_irqs_on_resume, a function that
> > restores the state of local interrupts on the target CPU during
> > system resume.
> >
> > It iterates over all local IRQs and re-enables those that were not
> > disabled, reprogramming their routing and affinity accordingly.
> >
> > The function is invoked from start_secondary, ensuring that local IRQ
> > state is restored early during CPU bring-up after suspend.
> >
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V6:
> > - Call handler->disable() instead of just setting the _IRQ_DISABLED flag
> > - Move the system state check outside of restore_local_irqs_on_resume()
> > ---
> > xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 39 insertions(+)
> >
> > diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> > index 6c899347ca..ddd2940554 100644
> > --- a/xen/arch/arm/irq.c
> > +++ b/xen/arch/arm/irq.c
> > @@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
> > return 0;
> > }
> >
> > +/*
> > + * The first 32 interrupts (PPIs and SGIs) are per-CPU,
> > + * so call this function on the target CPU to restore them.
> > + *
> > + * SPIs are restored via gic_resume.
> > + */
> > +static void restore_local_irqs_on_resume(void)
> > +{
> > + int irq;
>
> NIT: Please, use "unsigned int" if irq cannot be negative
ok
>
> > +
> > + spin_lock(&local_irqs_type_lock);
> > +
> > + for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
> > + {
> > + struct irq_desc *desc = irq_to_desc(irq);
> > +
> > + spin_lock(&desc->lock);
> > +
> > + if ( test_bit(_IRQ_DISABLED, &desc->status) )
> > + {
> > + spin_unlock(&desc->lock);
> > + continue;
> > + }
> > +
> > + /* Disable the IRQ to avoid assertions in the following calls */
> > + desc->handler->disable(desc);
> > + gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
>
> Shouldn't we use GIC_PRI_IPI for SGIs?
Yes, we should. But currently I am restoring the same value
as it was before suspend...
I definitely agree that this needs to be fixed at the original
place where the issue was introduced, but I was planning to
address it in a future patch.
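For reference, the distinction raised in the review could be expressed as a priority chosen per interrupt ID. This is a hypothetical sketch, not the planned fix: the `MODEL_*` names and the two priority values are placeholders chosen for illustration, and only the ID ranges (0-15 for SGIs, 16-31 for PPIs) come from the GIC architecture.

```c
#include <assert.h>

#define MODEL_NR_SGIS        16u   /* GIC interrupt IDs 0-15 are SGIs */
#define MODEL_NR_LOCAL_IRQS  32u   /* IDs 16-31 are PPIs */

/* Placeholder priorities; on the GIC a lower value means higher priority. */
#define MODEL_PRI_IPI 0x90u
#define MODEL_PRI_IRQ 0xa0u

/* Pick the IPI priority for SGIs and the regular IRQ priority for PPIs. */
static unsigned int model_local_irq_priority(unsigned int irq)
{
    assert(irq < MODEL_NR_LOCAL_IRQS);

    return irq < MODEL_NR_SGIS ? MODEL_PRI_IPI : MODEL_PRI_IRQ;
}
```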
>
>
> > + desc->handler->startup(desc);
> > +
> > + spin_unlock(&desc->lock);
> > + }
> > +
> > + spin_unlock(&local_irqs_type_lock);
> > +}
> > +
> > static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > void *hcpu)
> > {
> > @@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
> > cpu);
> > break;
> > + case CPU_STARTING:
> > + if ( system_state == SYS_STATE_resume )
> > + restore_local_irqs_on_resume();
> > + break;
>
> May I please ask, why all this new code (i.e.
> restore_local_irqs_on_resume()) is not covered by #ifdef
> CONFIG_SYSTEM_SUSPEND?
I don’t see a reason to introduce such "macaron-style" code. On ARM, the
system suspend state is only set when CONFIG_SYSTEM_SUSPEND is defined
anyway.
If you would prefer me to wrap all relevant code with this define, please
let me know and I’ll make the change. In this case, I think the current
approach is cleaner, but I’m open to your opinion.
>
> > }
> >
> > return notifier_from_errno(rc);
>
Best regards,
Mykola
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
2025-09-02 17:43 ` Mykola Kvach
@ 2025-09-02 18:16 ` Oleksandr Tyshchenko
2025-09-02 20:08 ` Mykola Kvach
0 siblings, 1 reply; 49+ messages in thread
From: Oleksandr Tyshchenko @ 2025-09-02 18:16 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
On 02.09.25 20:43, Mykola Kvach wrote:
> Hi Oleksandr,
Hello Mykola
>
> On Tue, Sep 2, 2025 at 7:49 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>>
>>
>>
>> On 02.09.25 01:10, Mykola Kvach wrote:
>>
>> Hello Mykola
>>
>>> From: Mykola Kvach <mykola_kvach@epam.com>
>>>
>>> On ARM, the first 32 interrupts (SGIs and PPIs) are banked per-CPU
>>> and not restored by gic_resume (for secondary cpus).
>>>
>>> This patch introduces restore_local_irqs_on_resume, a function that
>>> restores the state of local interrupts on the target CPU during
>>> system resume.
>>>
>>> It iterates over all local IRQs and re-enables those that were not
>>> disabled, reprogramming their routing and affinity accordingly.
>>>
>>> The function is invoked from start_secondary, ensuring that local IRQ
>>> state is restored early during CPU bring-up after suspend.
>>>
>>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
>>> ---
>>> Changes in V6:
>>> - Call handler->disable() instead of just setting the _IRQ_DISABLED flag
>>> - Move the system state check outside of restore_local_irqs_on_resume()
>>> ---
>>> xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 39 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
>>> index 6c899347ca..ddd2940554 100644
>>> --- a/xen/arch/arm/irq.c
>>> +++ b/xen/arch/arm/irq.c
>>> @@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
>>> return 0;
>>> }
>>>
>>> +/*
>>> + * The first 32 interrupts (PPIs and SGIs) are per-CPU,
>>> + * so call this function on the target CPU to restore them.
>>> + *
>>> + * SPIs are restored via gic_resume.
>>> + */
>>> +static void restore_local_irqs_on_resume(void)
>>> +{
>>> + int irq;
>>
>> NIT: Please, use "unsigned int" if irq cannot be negative
>
> ok
>
>>
>>> +
>>> + spin_lock(&local_irqs_type_lock);
>>> +
>>> + for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
>>> + {
>>> + struct irq_desc *desc = irq_to_desc(irq);
>>> +
>>> + spin_lock(&desc->lock);
>>> +
>>> + if ( test_bit(_IRQ_DISABLED, &desc->status) )
>>> + {
>>> + spin_unlock(&desc->lock);
>>> + continue;
>>> + }
>>> +
>>> + /* Disable the IRQ to avoid assertions in the following calls */
>>> + desc->handler->disable(desc);
>>> + gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
>>
>> Shouldn't we use GIC_PRI_IPI for SGIs?
>
> Yes, we should. But currently I am restoring the same value
> as it was before suspend...
>
> I definitely agree that this needs to be fixed at the original
> place where the issue was introduced, but I was planning to
> address it in a future patch.
>
>>
>>
>>> + desc->handler->startup(desc);
>>> +
>>> + spin_unlock(&desc->lock);
>>> + }
>>> +
>>> + spin_unlock(&local_irqs_type_lock);
>>> +}
>>> +
>>> static int cpu_callback(struct notifier_block *nfb, unsigned long action,
>>> void *hcpu)
>>> {
>>> @@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
>>> printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
>>> cpu);
>>> break;
>>> + case CPU_STARTING:
>>> + if ( system_state == SYS_STATE_resume )
>>> + restore_local_irqs_on_resume();
>>> + break;
>>
>> May I please ask, why all this new code (i.e.
>> restore_local_irqs_on_resume()) is not covered by #ifdef
>> CONFIG_SYSTEM_SUSPEND?
>
> I don’t see a reason to introduce such "macaron-style" code. On ARM, the
> system suspend state is only set when CONFIG_SYSTEM_SUSPEND is defined
> anyway.
right
>
> If you would prefer me to wrap all relevant code with this define, please
> let me know and I’ll make the change. In this case, I think the current
> approach is cleaner, but I’m open to your opinion.
In other patches, you seem to wrap functions/code that only get called
during suspend/resume with #ifdef CONFIG_SYSTEM_SUSPEND, so I wondered
why restore_local_irqs_on_resume() could not be compiled out
if the feature is not enabled. But if you still think it would be
cleaner this way (w/o #ifdef), I would be ok.
>
>>
>>> }
>>>
>>> return notifier_from_errno(rc);
>>
>
> Best regards,
> Mykola
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
2025-09-02 18:16 ` Oleksandr Tyshchenko
@ 2025-09-02 20:08 ` Mykola Kvach
2025-09-02 20:19 ` Mykola Kvach
0 siblings, 1 reply; 49+ messages in thread
From: Mykola Kvach @ 2025-09-02 20:08 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
On Tue, Sep 2, 2025 at 9:16 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
>
> On 02.09.25 20:43, Mykola Kvach wrote:
> > Hi Oleksandr,
>
> Hello Mykola
>
> >
> > On Tue, Sep 2, 2025 at 7:49 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> >>
> >>
> >>
> >> On 02.09.25 01:10, Mykola Kvach wrote:
> >>
> >> Hello Mykola
> >>
> >>> From: Mykola Kvach <mykola_kvach@epam.com>
> >>>
> >>> On ARM, the first 32 interrupts (SGIs and PPIs) are banked per-CPU
> >>> and not restored by gic_resume (for secondary cpus).
> >>>
> >>> This patch introduces restore_local_irqs_on_resume, a function that
> >>> restores the state of local interrupts on the target CPU during
> >>> system resume.
> >>>
> >>> It iterates over all local IRQs and re-enables those that were not
> >>> disabled, reprogramming their routing and affinity accordingly.
> >>>
> >>> The function is invoked from start_secondary, ensuring that local IRQ
> >>> state is restored early during CPU bring-up after suspend.
> >>>
> >>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> >>> ---
> >>> Changes in V6:
> >>> - Call handler->disable() instead of just setting the _IRQ_DISABLED flag
> >>> - Move the system state check outside of restore_local_irqs_on_resume()
> >>> ---
> >>> xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
> >>> 1 file changed, 39 insertions(+)
> >>>
> >>> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> >>> index 6c899347ca..ddd2940554 100644
> >>> --- a/xen/arch/arm/irq.c
> >>> +++ b/xen/arch/arm/irq.c
> >>> @@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
> >>> return 0;
> >>> }
> >>>
> >>> +/*
> >>> + * The first 32 interrupts (PPIs and SGIs) are per-CPU,
> >>> + * so call this function on the target CPU to restore them.
> >>> + *
> >>> + * SPIs are restored via gic_resume.
> >>> + */
> >>> +static void restore_local_irqs_on_resume(void)
> >>> +{
> >>> + int irq;
> >>
> >> NIT: Please, use "unsigned int" if irq cannot be negative
> >
> > ok
> >
> >>
> >>> +
> >>> + spin_lock(&local_irqs_type_lock);
> >>> +
> >>> + for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
> >>> + {
> >>> + struct irq_desc *desc = irq_to_desc(irq);
> >>> +
> >>> + spin_lock(&desc->lock);
> >>> +
> >>> + if ( test_bit(_IRQ_DISABLED, &desc->status) )
> >>> + {
> >>> + spin_unlock(&desc->lock);
> >>> + continue;
> >>> + }
> >>> +
> >>> + /* Disable the IRQ to avoid assertions in the following calls */
> >>> + desc->handler->disable(desc);
> >>> + gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
> >>
> >> Shouldn't we use GIC_PRI_IPI for SGIs?
> >
> > Yes, we should. But currently I am restoring the same value
> > as it was before suspend...
> >
> > I definitely agree that this needs to be fixed at the original
> > place where the issue was introduced, but I was planning to
> > address it in a future patch.
> >
> >>
> >>
> >>> + desc->handler->startup(desc);
> >>> +
> >>> + spin_unlock(&desc->lock);
> >>> + }
> >>> +
> >>> + spin_unlock(&local_irqs_type_lock);
> >>> +}
> >>> +
> >>> static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> >>> void *hcpu)
> >>> {
> >>> @@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> >>> printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
> >>> cpu);
> >>> break;
> >>> + case CPU_STARTING:
> >>> + if ( system_state == SYS_STATE_resume )
> >>> + restore_local_irqs_on_resume();
> >>> + break;
> >>
> >> May I please ask, why all this new code (i.e.
> >> restore_local_irqs_on_resume()) is not covered by #ifdef
> >> CONFIG_SYSTEM_SUSPEND?
> >
> > I don’t see a reason to introduce such "macaron-style" code. On ARM, the
> > system suspend state is only set when CONFIG_SYSTEM_SUSPEND is defined
> > anyway.
>
> right
>
> >
> > If you would prefer me to wrap all relevant code with this define, please
> > let me know and I’ll make the change. In this case, I think the current
> > approach is cleaner, but I’m open to your opinion.
>
> In other patches, you seem to wrap functions/code that only get called
> during suspend/resume with #ifdef CONFIG_SYSTEM_SUSPEND, so I wondered
> why restore_local_irqs_on_resume() could not be compiled out
> if the feature is not enabled. But if you still think it would be
> cleaner this way (w/o #ifdef), I would be ok.
It’s not entirely true -- I only wrapped code that has a direct dependency
on host_system_suspend(), either being called from it or required for its
correct operation.

If you look through this patch series for the pattern:
    SYS_STATE_(suspend|resume)

you’ll see that not all suspend/resume-related code is wrapped in
#ifdef CONFIG_SYSTEM_SUSPEND. This is intentional -- the same applies to
some code already merged into the common parts of Xen.

So restore_local_irqs_on_resume is consistent with the existing approach
in all cpu notifier blocks.
>
> >
> >>
> >>> }
> >>>
> >>> return notifier_from_errno(rc);
> >>
> >
> > Best regards,
> > Mykola
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
2025-09-02 20:08 ` Mykola Kvach
@ 2025-09-02 20:19 ` Mykola Kvach
0 siblings, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-02 20:19 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
On Tue, Sep 2, 2025 at 11:08 PM Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> On Tue, Sep 2, 2025 at 9:16 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> >
> >
> >
> > On 02.09.25 20:43, Mykola Kvach wrote:
> > > Hi Oleksandr,
> >
> > Hello Mykola
> >
> > >
> > > On Tue, Sep 2, 2025 at 7:49 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> > >>
> > >>
> > >>
> > >> On 02.09.25 01:10, Mykola Kvach wrote:
> > >>
> > >> Hello Mykola
> > >>
> > >>> From: Mykola Kvach <mykola_kvach@epam.com>
> > >>>
> > >>> On ARM, the first 32 interrupts (SGIs and PPIs) are banked per-CPU
> > >>> and not restored by gic_resume (for secondary cpus).
> > >>>
> > >>> This patch introduces restore_local_irqs_on_resume, a function that
> > >>> restores the state of local interrupts on the target CPU during
> > >>> system resume.
> > >>>
> > >>> It iterates over all local IRQs and re-enables those that were not
> > >>> disabled, reprogramming their routing and affinity accordingly.
> > >>>
> > >>> The function is invoked from start_secondary, ensuring that local IRQ
> > >>> state is restored early during CPU bring-up after suspend.
> > >>>
> > >>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > >>> ---
> > >>> Changes in V6:
> > >>> - Call handler->disable() instead of just setting the _IRQ_DISABLED flag
> > >>> - Move the system state check outside of restore_local_irqs_on_resume()
> > >>> ---
> > >>> xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
> > >>> 1 file changed, 39 insertions(+)
> > >>>
> > >>> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> > >>> index 6c899347ca..ddd2940554 100644
> > >>> --- a/xen/arch/arm/irq.c
> > >>> +++ b/xen/arch/arm/irq.c
> > >>> @@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
> > >>> return 0;
> > >>> }
> > >>>
> > >>> +/*
> > >>> + * The first 32 interrupts (PPIs and SGIs) are per-CPU,
> > >>> + * so call this function on the target CPU to restore them.
> > >>> + *
> > >>> + * SPIs are restored via gic_resume.
> > >>> + */
> > >>> +static void restore_local_irqs_on_resume(void)
> > >>> +{
> > >>> + int irq;
> > >>
> > >> NIT: Please, use "unsigned int" if irq cannot be negative
> > >
> > > ok
> > >
> > >>
> > >>> +
> > >>> + spin_lock(&local_irqs_type_lock);
> > >>> +
> > >>> + for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
> > >>> + {
> > >>> + struct irq_desc *desc = irq_to_desc(irq);
> > >>> +
> > >>> + spin_lock(&desc->lock);
> > >>> +
> > >>> + if ( test_bit(_IRQ_DISABLED, &desc->status) )
> > >>> + {
> > >>> + spin_unlock(&desc->lock);
> > >>> + continue;
> > >>> + }
> > >>> +
> > >>> + /* Disable the IRQ to avoid assertions in the following calls */
> > >>> + desc->handler->disable(desc);
> > >>> + gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
> > >>
> > >> Shouldn't we use GIC_PRI_IPI for SGIs?
> > >
> > > Yes, we should. But currently I am restoring the same value
> > > as it was before suspend...
> > >
> > > I definitely agree that this needs to be fixed at the original
> > > place where the issue was introduced, but I was planning to
> > > address it in a future patch.
> > >
> > >>
> > >>
> > >>> + desc->handler->startup(desc);
> > >>> +
> > >>> + spin_unlock(&desc->lock);
> > >>> + }
> > >>> +
> > >>> + spin_unlock(&local_irqs_type_lock);
> > >>> +}
> > >>> +
> > >>> static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > >>> void *hcpu)
> > >>> {
> > >>> @@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > >>> printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
> > >>> cpu);
> > >>> break;
> > >>> + case CPU_STARTING:
> > >>> + if ( system_state == SYS_STATE_resume )
> > >>> + restore_local_irqs_on_resume();
> > >>> + break;
> > >>
> > >> May I please ask, why all this new code (i.e.
> > >> restore_local_irqs_on_resume()) is not covered by #ifdef
> > >> CONFIG_SYSTEM_SUSPEND?
> > >
> > > I don’t see a reason to introduce such "macaron-style" code. On ARM, the
> > > system suspend state is only set when CONFIG_SYSTEM_SUSPEND is defined
> > > anyway.
> >
> > right
> >
> > >
> > > If you would prefer me to wrap all relevant code with this define, please
> > > let me know and I’ll make the change. In this case, I think the current
> > > approach is cleaner, but I’m open to your opinion.
> >
> > In other patches, you seem to wrap functions/code that only get called
> > during suspend/resume with #ifdef CONFIG_SYSTEM_SUSPEND, so I wondered
> > why restore_local_irqs_on_resume() could not be compiled out
> > if the feature is not enabled. But if you still think it would be
> > cleaner this way (w/o #ifdef), I would be ok.
>
> It’s not entirely true -- I only wrapped code that has a direct dependency
> on host_system_suspend(), either being called from it or required for its
> correct operation.
>
> If you look through this patch series for the pattern:
> SYS_STATE_(suspend|resume)
>
> you’ll see that not all suspend/resume-related code is wrapped in
> #ifdef CONFIG_SYSTEM_SUSPEND. This is intentional -- the same applies to
> some code already merged into the common parts of Xen.
>
> So restore_local_irqs_on_resume is consistent with the existing approach
> in all cpu notifier blocks.
Of course, I can wrap all code in this patch series if needed. For me, the
current approach looks clearer and aligns with existing code. On the other
hand, I introduced this config option not so long ago, so that may be why
some parts in common code and even in some architectures like x86 are still
uncovered.

In any case, I don't mind covering all the code if you think it would be better.

Right now, this implementation is mainly my preference and aligns with the
existing code. There isn't any other reasoning behind this decision.
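
For comparison, the fully wrapped alternative would look roughly like the
sketch below. It is only an illustration, not code from this series:
restore_calls, cpu_starting_sketch() and the sys_state_sketch enum are
hypothetical stand-ins for the real restore loop, the CPU_STARTING case and
system_state.

```c
#include <assert.h>

/* Hypothetical counter so the effect of the call is observable. */
static unsigned int restore_calls;

#ifdef CONFIG_SYSTEM_SUSPEND
static void restore_local_irqs_on_resume(void)
{
    restore_calls++;   /* stands in for the real per-IRQ restore loop */
}
#else
/* Compiled-out variant: the call site below needs no #ifdef of its own. */
static inline void restore_local_irqs_on_resume(void) {}
#endif

/* Hypothetical stand-in for Xen's system_state values. */
enum sys_state_sketch { STATE_ACTIVE, STATE_RESUME };

/* Mirrors the shape of the CPU_STARTING case in cpu_callback(). */
static void cpu_starting_sketch(enum sys_state_sketch state)
{
    if ( state == STATE_RESUME )
        restore_local_irqs_on_resume();
}
```

Built without -DCONFIG_SYSTEM_SUSPEND the call collapses to an empty inline
function; the price is one more #ifdef/#else pair per helper, which is the
trade-off discussed above.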
>
> >
> > >
> > >>
> > >>> }
> > >>>
> > >>> return notifier_from_errno(rc);
> > >>
> > >
> > > Best regards,
> > > Mykola
> >
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during system resume
2025-09-02 16:49 ` Oleksandr Tyshchenko
2025-09-02 17:43 ` Mykola Kvach
@ 2025-09-02 22:21 ` Mykola Kvach
1 sibling, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-02 22:21 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel, Mykola Kvach, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk
On Tue, Sep 2, 2025 at 7:49 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
>
> On 02.09.25 01:10, Mykola Kvach wrote:
>
> Hello Mykola
>
> > From: Mykola Kvach <mykola_kvach@epam.com>
> >
> > On ARM, the first 32 interrupts (SGIs and PPIs) are banked per-CPU
> > and not restored by gic_resume (for secondary cpus).
> >
> > This patch introduces restore_local_irqs_on_resume, a function that
> > restores the state of local interrupts on the target CPU during
> > system resume.
> >
> > It iterates over all local IRQs and re-enables those that were not
> > disabled, reprogramming their routing and affinity accordingly.
> >
> > The function is invoked from start_secondary, ensuring that local IRQ
> > state is restored early during CPU bring-up after suspend.
> >
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V6:
> > - Call handler->disable() instead of just setting the _IRQ_DISABLED flag
> > - Move the system state check outside of restore_local_irqs_on_resume()
> > ---
> > xen/arch/arm/irq.c | 39 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 39 insertions(+)
> >
> > diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> > index 6c899347ca..ddd2940554 100644
> > --- a/xen/arch/arm/irq.c
> > +++ b/xen/arch/arm/irq.c
> > @@ -116,6 +116,41 @@ static int init_local_irq_data(unsigned int cpu)
> > return 0;
> > }
> >
> > +/*
> > + * The first 32 interrupts (PPIs and SGIs) are per-CPU,
> > + * so call this function on the target CPU to restore them.
> > + *
> > + * SPIs are restored via gic_resume.
> > + */
> > +static void restore_local_irqs_on_resume(void)
> > +{
> > + int irq;
>
> NIT: Please, use "unsigned int" if irq cannot be negative
>
> > +
> > + spin_lock(&local_irqs_type_lock);
> > +
> > + for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
> > + {
> > + struct irq_desc *desc = irq_to_desc(irq);
> > +
> > + spin_lock(&desc->lock);
> > +
> > + if ( test_bit(_IRQ_DISABLED, &desc->status) )
> > + {
> > + spin_unlock(&desc->lock);
> > + continue;
> > + }
> > +
> > + /* Disable the IRQ to avoid assertions in the following calls */
> > + desc->handler->disable(desc);
> > + gic_route_irq_to_xen(desc, GIC_PRI_IRQ);
>
> Shouldn't we use GIC_PRI_IPI for SGIs?
I'll update the priority value in the next version.

Initially, I assumed gic_route_irq_to_xen() was used for all
interrupts with the same priority. But looking more closely, it
doesn't appear to be called for SGIs at all.

In fact, SGI configuration, including priority, is handled during CPU
initialization in gic_init_secondary_cpu(), which is called before
the CPU_STARTING notifier.

Given that, it's probably better to avoid updating SGI priorities here
entirely and rely on their boot-time configuration instead.
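
A minimal sketch of that idea, assuming SGIs occupy IRQ numbers 0..15 and
PPIs 16..31. The mock_irq_desc type, NR_SGIS and restore_local_irqs_sketch()
are hypothetical stand-ins, not the real Xen structures or the code that
will go into the next version:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_LOCAL_IRQS 32U  /* SGIs + PPIs, as in the patch */
#define NR_SGIS       16U  /* assumption: SGIs are IRQs 0..15 */

/* Stand-in for struct irq_desc; only the fields the sketch needs. */
struct mock_irq_desc {
    bool disabled;   /* mirrors the _IRQ_DISABLED status bit */
    bool rerouted;   /* set when the sketch "routes" the IRQ */
};

static struct mock_irq_desc local_irqs[NR_LOCAL_IRQS];

/* Returns the number of IRQs that were re-routed and started. */
static unsigned int restore_local_irqs_sketch(void)
{
    unsigned int irq, restored = 0;

    /* Start at the first PPI: SGIs keep their boot-time configuration. */
    for ( irq = NR_SGIS; irq < NR_LOCAL_IRQS; irq++ )
    {
        struct mock_irq_desc *desc = &local_irqs[irq];

        if ( desc->disabled )
            continue;

        /* Stands in for disable() + gic_route_irq_to_xen() + startup(). */
        desc->rerouted = true;
        restored++;
    }

    return restored;
}
```

With this shape GIC_PRI_IRQ is only ever applied to PPIs, so the question of
GIC_PRI_IPI for SGIs does not arise in this function.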
>
>
> > + desc->handler->startup(desc);
> > +
> > + spin_unlock(&desc->lock);
> > + }
> > +
> > + spin_unlock(&local_irqs_type_lock);
> > +}
> > +
> > static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > void *hcpu)
> > {
> > @@ -134,6 +169,10 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> > printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
> > cpu);
> > break;
> > + case CPU_STARTING:
> > + if ( system_state == SYS_STATE_resume )
> > + restore_local_irqs_on_resume();
> > + break;
>
> May I please ask, why all this new code (i.e.
> restore_local_irqs_on_resume()) is not covered by #ifdef
> CONFIG_SYSTEM_SUSPEND?
>
> > }
> >
> > return notifier_from_errno(rc);
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (5 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 06/13] xen/arm: irq: Restore state of local IRQs during " Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-02 20:39 ` Volodymyr Babchuk
2025-09-03 10:01 ` Oleksandr Tyshchenko
2025-09-01 22:10 ` [PATCH v6 08/13] xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface) Mykola Kvach
` (6 subsequent siblings)
13 siblings, 2 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Mykola Kvach
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Store and restore active context and micro-TLB registers.
Tested on R-Car H3 Starter Kit.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V6:
- refactor code related to hw_register struct, from now it's called
ipmmu_reg_ctx
---
xen/drivers/passthrough/arm/ipmmu-vmsa.c | 257 +++++++++++++++++++++++
1 file changed, 257 insertions(+)
diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
index ea9fa9ddf3..0973559861 100644
--- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
+++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
@@ -71,6 +71,8 @@
})
#endif
+#define dev_dbg(dev, fmt, ...) \
+ dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
#define dev_info(dev, fmt, ...) \
dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
#define dev_warn(dev, fmt, ...) \
@@ -130,6 +132,24 @@ struct ipmmu_features {
unsigned int imuctr_ttsel_mask;
};
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+struct ipmmu_reg_ctx {
+ unsigned int imttlbr0;
+ unsigned int imttubr0;
+ unsigned int imttbcr;
+ unsigned int imctr;
+};
+
+struct ipmmu_vmsa_backup {
+ struct device *dev;
+ unsigned int *utlbs_val;
+ unsigned int *asids_val;
+ struct list_head list;
+};
+
+#endif
+
/* Root/Cache IPMMU device's information */
struct ipmmu_vmsa_device {
struct device *dev;
@@ -142,6 +162,9 @@ struct ipmmu_vmsa_device {
struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
unsigned int utlb_refcount[IPMMU_UTLB_MAX];
const struct ipmmu_features *features;
+#ifdef CONFIG_SYSTEM_SUSPEND
+ struct ipmmu_reg_ctx *reg_backup[IPMMU_CTX_MAX];
+#endif
};
/*
@@ -547,6 +570,222 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
spin_unlock_irqrestore(&mmu->lock, flags);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+static DEFINE_SPINLOCK(ipmmu_devices_backup_lock);
+static LIST_HEAD(ipmmu_devices_backup);
+
+static struct ipmmu_reg_ctx root_pgtable[IPMMU_CTX_MAX];
+
+static uint32_t ipmmu_imuasid_read(struct ipmmu_vmsa_device *mmu,
+ unsigned int utlb)
+{
+ return ipmmu_read(mmu, ipmmu_utlb_reg(mmu, IMUASID(utlb)));
+}
+
+static void ipmmu_utlbs_backup(struct ipmmu_vmsa_device *mmu)
+{
+ struct ipmmu_vmsa_backup *backup_data;
+
+ dev_dbg(mmu->dev, "Handle micro-TLBs backup\n");
+
+ spin_lock(&ipmmu_devices_backup_lock);
+
+ list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
+ {
+ struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
+ unsigned int i;
+
+ if ( to_ipmmu(backup_data->dev) != mmu )
+ continue;
+
+ for ( i = 0; i < fwspec->num_ids; i++ )
+ {
+ unsigned int utlb = fwspec->ids[i];
+
+ backup_data->asids_val[i] = ipmmu_imuasid_read(mmu, utlb);
+ backup_data->utlbs_val[i] = ipmmu_imuctr_read(mmu, utlb);
+ }
+ }
+
+ spin_unlock(&ipmmu_devices_backup_lock);
+}
+
+static void ipmmu_utlbs_restore(struct ipmmu_vmsa_device *mmu)
+{
+ struct ipmmu_vmsa_backup *backup_data;
+
+ dev_dbg(mmu->dev, "Handle micro-TLBs restore\n");
+
+ spin_lock(&ipmmu_devices_backup_lock);
+
+ list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
+ {
+ struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
+ unsigned int i;
+
+ if ( to_ipmmu(backup_data->dev) != mmu )
+ continue;
+
+ for ( i = 0; i < fwspec->num_ids; i++ )
+ {
+ unsigned int utlb = fwspec->ids[i];
+
+ ipmmu_imuasid_write(mmu, utlb, backup_data->asids_val[i]);
+ ipmmu_imuctr_write(mmu, utlb, backup_data->utlbs_val[i]);
+ }
+ }
+
+ spin_unlock(&ipmmu_devices_backup_lock);
+}
+
+static void ipmmu_domain_backup_context(struct ipmmu_vmsa_domain *domain)
+{
+ struct ipmmu_vmsa_device *mmu = domain->mmu->root;
+ struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
+
+ dev_dbg(mmu->dev, "Handle domain context %u backup\n", domain->context_id);
+
+ regs->imttlbr0 = ipmmu_ctx_read_root(domain, IMTTLBR0);
+ regs->imttubr0 = ipmmu_ctx_read_root(domain, IMTTUBR0);
+ regs->imttbcr = ipmmu_ctx_read_root(domain, IMTTBCR);
+ regs->imctr = ipmmu_ctx_read_root(domain, IMCTR);
+}
+
+static void ipmmu_domain_restore_context(struct ipmmu_vmsa_domain *domain)
+{
+ struct ipmmu_vmsa_device *mmu = domain->mmu->root;
+ struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
+
+ dev_dbg(mmu->dev, "Handle domain context %u restore\n", domain->context_id);
+
+ ipmmu_ctx_write_root(domain, IMTTLBR0, regs->imttlbr0);
+ ipmmu_ctx_write_root(domain, IMTTUBR0, regs->imttubr0);
+ ipmmu_ctx_write_root(domain, IMTTBCR, regs->imttbcr);
+ ipmmu_ctx_write_all(domain, IMCTR, regs->imctr | IMCTR_FLUSH);
+}
+
+/*
+ * Xen: Unlike Linux implementation, Xen uses a single driver instance
+ * for handling all IPMMUs. There is no framework for ipmmu_suspend/resume
+ * callbacks to be invoked for each IPMMU device. So, we need to iterate
+ * through all registered IPMMUs performing required actions.
+ *
+ * Also take care of restoring special settings, such as translation
+ * table format, etc.
+ */
+static int __must_check ipmmu_suspend(void)
+{
+ struct ipmmu_vmsa_device *mmu;
+
+ if ( !iommu_enabled )
+ return 0;
+
+ printk(XENLOG_DEBUG "ipmmu: Suspending ...\n");
+
+ spin_lock(&ipmmu_devices_lock);
+
+ list_for_each_entry( mmu, &ipmmu_devices, list )
+ {
+ if ( ipmmu_is_root(mmu) )
+ {
+ unsigned int i;
+
+ for ( i = 0; i < mmu->num_ctx; i++ )
+ {
+ if ( !mmu->domains[i] )
+ continue;
+ ipmmu_domain_backup_context(mmu->domains[i]);
+ }
+ }
+ else
+ ipmmu_utlbs_backup(mmu);
+ }
+
+ spin_unlock(&ipmmu_devices_lock);
+
+ return 0;
+}
+
+static void ipmmu_resume(void)
+{
+ struct ipmmu_vmsa_device *mmu;
+
+ if ( !iommu_enabled )
+ return;
+
+ printk(XENLOG_DEBUG "ipmmu: Resuming ...\n");
+
+ spin_lock(&ipmmu_devices_lock);
+
+ list_for_each_entry( mmu, &ipmmu_devices, list )
+ {
+ uint32_t reg;
+
+ /* Do not use security group function */
+ reg = IMSCTLR + mmu->features->control_offset_base;
+ ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) & ~IMSCTLR_USE_SECGRP);
+
+ if ( ipmmu_is_root(mmu) )
+ {
+ unsigned int i;
+
+ /* Use stage 2 translation table format */
+ reg = IMSAUXCTLR + mmu->features->control_offset_base;
+ ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) | IMSAUXCTLR_S2PTE);
+
+ for ( i = 0; i < mmu->num_ctx; i++ )
+ {
+ if ( !mmu->domains[i] )
+ continue;
+ ipmmu_domain_restore_context(mmu->domains[i]);
+ }
+ }
+ else
+ ipmmu_utlbs_restore(mmu);
+ }
+
+ spin_unlock(&ipmmu_devices_lock);
+}
+
+static int ipmmu_alloc_ctx_suspend(struct device *dev)
+{
+ struct ipmmu_vmsa_backup *backup_data;
+ unsigned int *utlbs_val, *asids_val;
+ struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+
+ utlbs_val = xzalloc_array(unsigned int, fwspec->num_ids);
+ if ( !utlbs_val )
+ return -ENOMEM;
+
+ asids_val = xzalloc_array(unsigned int, fwspec->num_ids);
+ if ( !asids_val )
+ {
+ xfree(utlbs_val);
+ return -ENOMEM;
+ }
+
+ backup_data = xzalloc(struct ipmmu_vmsa_backup);
+ if ( !backup_data )
+ {
+ xfree(utlbs_val);
+ xfree(asids_val);
+ return -ENOMEM;
+ }
+
+ backup_data->dev = dev;
+ backup_data->utlbs_val = utlbs_val;
+ backup_data->asids_val = asids_val;
+
+ spin_lock(&ipmmu_devices_backup_lock);
+ list_add(&backup_data->list, &ipmmu_devices_backup);
+ spin_unlock(&ipmmu_devices_backup_lock);
+
+ return 0;
+}
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
{
uint64_t ttbr;
@@ -559,6 +798,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
return ret;
domain->context_id = ret;
+#ifdef CONFIG_SYSTEM_SUSPEND
+ domain->mmu->root->reg_backup[ret] = &root_pgtable[ret];
+#endif
/*
* TTBR0
@@ -615,6 +857,9 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
ipmmu_ctx_write_root(domain, IMCTR, IMCTR_FLUSH);
ipmmu_tlb_sync(domain);
+#ifdef CONFIG_SYSTEM_SUSPEND
+ domain->mmu->root->reg_backup[domain->context_id] = NULL;
+#endif
ipmmu_domain_free_context(domain->mmu->root, domain->context_id);
}
@@ -1427,6 +1672,14 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
}
#endif
+#ifdef CONFIG_SYSTEM_SUSPEND
+ if ( ipmmu_alloc_ctx_suspend(dev) )
+ {
+ dev_err(dev, "Failed to allocate context for suspend\n");
+ return -ENOMEM;
+ }
+#endif
+
dev_info(dev, "Added master device (IPMMU %s micro-TLBs %u)\n",
dev_name(fwspec->iommu_dev), fwspec->num_ids);
@@ -1492,6 +1745,10 @@ static const struct iommu_ops ipmmu_iommu_ops =
.unmap_page = arm_iommu_unmap_page,
.dt_xlate = ipmmu_dt_xlate,
.add_device = ipmmu_add_device,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = ipmmu_suspend,
+ .resume = ipmmu_resume,
+#endif
};
static __init int ipmmu_init(struct dt_device_node *node, const void *data)
--
2.48.1
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2025-09-01 22:10 ` [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks Mykola Kvach
@ 2025-09-02 20:39 ` Volodymyr Babchuk
2025-09-03 10:01 ` Oleksandr Tyshchenko
1 sibling, 0 replies; 49+ messages in thread
From: Volodymyr Babchuk @ 2025-09-02 20:39 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Oleksandr Tyshchenko,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Mykola Kvach
Hi,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> Store and restore active context and micro-TLB registers.
>
> Tested on R-Car H3 Starter Kit.
>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
> Changes in V6:
> - refactor code related to hw_register struct, from now it's called
> ipmmu_reg_ctx
> ---
> xen/drivers/passthrough/arm/ipmmu-vmsa.c | 257 +++++++++++++++++++++++
> 1 file changed, 257 insertions(+)
>
> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> index ea9fa9ddf3..0973559861 100644
> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> @@ -71,6 +71,8 @@
> })
> #endif
>
> +#define dev_dbg(dev, fmt, ...) \
> + dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
> #define dev_info(dev, fmt, ...) \
> dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
> #define dev_warn(dev, fmt, ...) \
> @@ -130,6 +132,24 @@ struct ipmmu_features {
> unsigned int imuctr_ttsel_mask;
> };
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +struct ipmmu_reg_ctx {
> + unsigned int imttlbr0;
> + unsigned int imttubr0;
> + unsigned int imttbcr;
> + unsigned int imctr;
> +};
> +
> +struct ipmmu_vmsa_backup {
> + struct device *dev;
> + unsigned int *utlbs_val;
> + unsigned int *asids_val;
> + struct list_head list;
> +};
> +
> +#endif
> +
> /* Root/Cache IPMMU device's information */
> struct ipmmu_vmsa_device {
> struct device *dev;
> @@ -142,6 +162,9 @@ struct ipmmu_vmsa_device {
> struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
> unsigned int utlb_refcount[IPMMU_UTLB_MAX];
> const struct ipmmu_features *features;
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + struct ipmmu_reg_ctx *reg_backup[IPMMU_CTX_MAX];
> +#endif
> };
>
> /*
> @@ -547,6 +570,222 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
> spin_unlock_irqrestore(&mmu->lock, flags);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +static DEFINE_SPINLOCK(ipmmu_devices_backup_lock);
> +static LIST_HEAD(ipmmu_devices_backup);
> +
> +static struct ipmmu_reg_ctx root_pgtable[IPMMU_CTX_MAX];
> +
> +static uint32_t ipmmu_imuasid_read(struct ipmmu_vmsa_device *mmu,
> + unsigned int utlb)
> +{
> + return ipmmu_read(mmu, ipmmu_utlb_reg(mmu, IMUASID(utlb)));
> +}
> +
> +static void ipmmu_utlbs_backup(struct ipmmu_vmsa_device *mmu)
> +{
> + struct ipmmu_vmsa_backup *backup_data;
> +
> + dev_dbg(mmu->dev, "Handle micro-TLBs backup\n");
> +
> + spin_lock(&ipmmu_devices_backup_lock);
> +
> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> + {
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> + unsigned int i;
> +
> + if ( to_ipmmu(backup_data->dev) != mmu )
> + continue;
> +
> + for ( i = 0; i < fwspec->num_ids; i++ )
> + {
> + unsigned int utlb = fwspec->ids[i];
> +
> + backup_data->asids_val[i] = ipmmu_imuasid_read(mmu, utlb);
> + backup_data->utlbs_val[i] = ipmmu_imuctr_read(mmu, utlb);
> + }
> + }
> +
> + spin_unlock(&ipmmu_devices_backup_lock);
> +}
> +
> +static void ipmmu_utlbs_restore(struct ipmmu_vmsa_device *mmu)
> +{
> + struct ipmmu_vmsa_backup *backup_data;
> +
> + dev_dbg(mmu->dev, "Handle micro-TLBs restore\n");
> +
> + spin_lock(&ipmmu_devices_backup_lock);
> +
> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> + {
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> + unsigned int i;
> +
> + if ( to_ipmmu(backup_data->dev) != mmu )
> + continue;
> +
> + for ( i = 0; i < fwspec->num_ids; i++ )
> + {
> + unsigned int utlb = fwspec->ids[i];
> +
> + ipmmu_imuasid_write(mmu, utlb, backup_data->asids_val[i]);
> + ipmmu_imuctr_write(mmu, utlb, backup_data->utlbs_val[i]);
> + }
> + }
> +
> + spin_unlock(&ipmmu_devices_backup_lock);
> +}
> +
> +static void ipmmu_domain_backup_context(struct ipmmu_vmsa_domain *domain)
> +{
> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> +
> + dev_dbg(mmu->dev, "Handle domain context %u backup\n", domain->context_id);
> +
> + regs->imttlbr0 = ipmmu_ctx_read_root(domain, IMTTLBR0);
> + regs->imttubr0 = ipmmu_ctx_read_root(domain, IMTTUBR0);
> + regs->imttbcr = ipmmu_ctx_read_root(domain, IMTTBCR);
> + regs->imctr = ipmmu_ctx_read_root(domain, IMCTR);
> +}
> +
> +static void ipmmu_domain_restore_context(struct ipmmu_vmsa_domain *domain)
> +{
> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> +
> + dev_dbg(mmu->dev, "Handle domain context %u restore\n", domain->context_id);
> +
> + ipmmu_ctx_write_root(domain, IMTTLBR0, regs->imttlbr0);
> + ipmmu_ctx_write_root(domain, IMTTUBR0, regs->imttubr0);
> + ipmmu_ctx_write_root(domain, IMTTBCR, regs->imttbcr);
> + ipmmu_ctx_write_all(domain, IMCTR, regs->imctr | IMCTR_FLUSH);
> +}
> +
> +/*
> + * Xen: Unlike Linux implementation, Xen uses a single driver instance
> + * for handling all IPMMUs. There is no framework for ipmmu_suspend/resume
> + * callbacks to be invoked for each IPMMU device. So, we need to iterate
> + * through all registered IPMMUs performing required actions.
> + *
> + * Also take care of restoring special settings, such as translation
> + * table format, etc.
> + */
> +static int __must_check ipmmu_suspend(void)
> +{
> + struct ipmmu_vmsa_device *mmu;
> +
> + if ( !iommu_enabled )
> + return 0;
> +
> + printk(XENLOG_DEBUG "ipmmu: Suspending ...\n");
> +
> + spin_lock(&ipmmu_devices_lock);
> +
> + list_for_each_entry( mmu, &ipmmu_devices, list )
> + {
> + if ( ipmmu_is_root(mmu) )
> + {
> + unsigned int i;
> +
> + for ( i = 0; i < mmu->num_ctx; i++ )
> + {
> + if ( !mmu->domains[i] )
> + continue;
> + ipmmu_domain_backup_context(mmu->domains[i]);
> + }
> + }
> + else
> + ipmmu_utlbs_backup(mmu);
> + }
> +
> + spin_unlock(&ipmmu_devices_lock);
> +
> + return 0;
> +}
> +
> +static void ipmmu_resume(void)
> +{
> + struct ipmmu_vmsa_device *mmu;
> +
> + if ( !iommu_enabled )
> + return;
> +
> + printk(XENLOG_DEBUG "ipmmu: Resuming ...\n");
> +
> + spin_lock(&ipmmu_devices_lock);
> +
> + list_for_each_entry( mmu, &ipmmu_devices, list )
> + {
> + uint32_t reg;
> +
> + /* Do not use security group function */
> + reg = IMSCTLR + mmu->features->control_offset_base;
> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) & ~IMSCTLR_USE_SECGRP);
> +
> + if ( ipmmu_is_root(mmu) )
> + {
> + unsigned int i;
> +
> + /* Use stage 2 translation table format */
> + reg = IMSAUXCTLR + mmu->features->control_offset_base;
> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) | IMSAUXCTLR_S2PTE);
> +
> + for ( i = 0; i < mmu->num_ctx; i++ )
> + {
> + if ( !mmu->domains[i] )
> + continue;
> + ipmmu_domain_restore_context(mmu->domains[i]);
> + }
> + }
> + else
> + ipmmu_utlbs_restore(mmu);
> + }
> +
> + spin_unlock(&ipmmu_devices_lock);
> +}
> +
> +static int ipmmu_alloc_ctx_suspend(struct device *dev)
> +{
> + struct ipmmu_vmsa_backup *backup_data;
> + unsigned int *utlbs_val, *asids_val;
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +
> + utlbs_val = xzalloc_array(unsigned int, fwspec->num_ids);
> + if ( !utlbs_val )
> + return -ENOMEM;
> +
> + asids_val = xzalloc_array(unsigned int, fwspec->num_ids);
> + if ( !asids_val )
> + {
> + xfree(utlbs_val);
> + return -ENOMEM;
> + }
> +
> + backup_data = xzalloc(struct ipmmu_vmsa_backup);
> + if ( !backup_data )
> + {
> + xfree(utlbs_val);
> + xfree(asids_val);
> + return -ENOMEM;
> + }
> +
> + backup_data->dev = dev;
> + backup_data->utlbs_val = utlbs_val;
> + backup_data->asids_val = asids_val;
> +
> + spin_lock(&ipmmu_devices_backup_lock);
> + list_add(&backup_data->list, &ipmmu_devices_backup);
> + spin_unlock(&ipmmu_devices_backup_lock);
> +
> + return 0;
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> {
> uint64_t ttbr;
> @@ -559,6 +798,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> return ret;
>
> domain->context_id = ret;
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + domain->mmu->root->reg_backup[ret] = &root_pgtable[ret];
> +#endif
>
> /*
> * TTBR0
> @@ -615,6 +857,9 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
> ipmmu_ctx_write_root(domain, IMCTR, IMCTR_FLUSH);
> ipmmu_tlb_sync(domain);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + domain->mmu->root->reg_backup[domain->context_id] = NULL;
> +#endif
> ipmmu_domain_free_context(domain->mmu->root, domain->context_id);
> }
>
> @@ -1427,6 +1672,14 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
> }
> #endif
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + if ( ipmmu_alloc_ctx_suspend(dev) )
> + {
> + dev_err(dev, "Failed to allocate context for suspend\n");
> + return -ENOMEM;
> + }
> +#endif
> +
> dev_info(dev, "Added master device (IPMMU %s micro-TLBs %u)\n",
> dev_name(fwspec->iommu_dev), fwspec->num_ids);
>
> @@ -1492,6 +1745,10 @@ static const struct iommu_ops ipmmu_iommu_ops =
> .unmap_page = arm_iommu_unmap_page,
> .dt_xlate = ipmmu_dt_xlate,
> .add_device = ipmmu_add_device,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = ipmmu_suspend,
> + .resume = ipmmu_resume,
> +#endif
> };
>
> static __init int ipmmu_init(struct dt_device_node *node, const void *data)
--
WBR, Volodymyr
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2025-09-01 22:10 ` [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks Mykola Kvach
2025-09-02 20:39 ` Volodymyr Babchuk
@ 2025-09-03 10:01 ` Oleksandr Tyshchenko
2025-09-03 10:25 ` Mykola Kvach
1 sibling, 1 reply; 49+ messages in thread
From: Oleksandr Tyshchenko @ 2025-09-03 10:01 UTC (permalink / raw)
To: Mykola Kvach, xen-devel@lists.xenproject.org
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Mykola Kvach
On 02.09.25 01:10, Mykola Kvach wrote:
Hello Mykola
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> Store and restore active context and micro-TLB registers.
>
> Tested on R-Car H3 Starter Kit.
>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in V6:
> - refactor code related to hw_register struct, from now it's called
> ipmmu_reg_ctx
The updated version looks good, thanks. However, I have one
concern/request ...
> ---
> xen/drivers/passthrough/arm/ipmmu-vmsa.c | 257 +++++++++++++++++++++++
> 1 file changed, 257 insertions(+)
>
> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> index ea9fa9ddf3..0973559861 100644
> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> @@ -71,6 +71,8 @@
> })
> #endif
>
> +#define dev_dbg(dev, fmt, ...) \
> + dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
> #define dev_info(dev, fmt, ...) \
> dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
> #define dev_warn(dev, fmt, ...) \
> @@ -130,6 +132,24 @@ struct ipmmu_features {
> unsigned int imuctr_ttsel_mask;
> };
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +struct ipmmu_reg_ctx {
> + unsigned int imttlbr0;
> + unsigned int imttubr0;
> + unsigned int imttbcr;
> + unsigned int imctr;
> +};
> +
> +struct ipmmu_vmsa_backup {
> + struct device *dev;
> + unsigned int *utlbs_val;
> + unsigned int *asids_val;
> + struct list_head list;
> +};
> +
> +#endif
> +
> /* Root/Cache IPMMU device's information */
> struct ipmmu_vmsa_device {
> struct device *dev;
> @@ -142,6 +162,9 @@ struct ipmmu_vmsa_device {
> struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
> unsigned int utlb_refcount[IPMMU_UTLB_MAX];
> const struct ipmmu_features *features;
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + struct ipmmu_reg_ctx *reg_backup[IPMMU_CTX_MAX];
> +#endif
> };
>
> /*
> @@ -547,6 +570,222 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
> spin_unlock_irqrestore(&mmu->lock, flags);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +
> +static DEFINE_SPINLOCK(ipmmu_devices_backup_lock);
> +static LIST_HEAD(ipmmu_devices_backup);
> +
> +static struct ipmmu_reg_ctx root_pgtable[IPMMU_CTX_MAX];
> +
> +static uint32_t ipmmu_imuasid_read(struct ipmmu_vmsa_device *mmu,
> + unsigned int utlb)
> +{
> + return ipmmu_read(mmu, ipmmu_utlb_reg(mmu, IMUASID(utlb)));
> +}
> +
> +static void ipmmu_utlbs_backup(struct ipmmu_vmsa_device *mmu)
> +{
> + struct ipmmu_vmsa_backup *backup_data;
> +
> + dev_dbg(mmu->dev, "Handle micro-TLBs backup\n");
> +
> + spin_lock(&ipmmu_devices_backup_lock);
> +
> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> + {
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> + unsigned int i;
> +
> + if ( to_ipmmu(backup_data->dev) != mmu )
> + continue;
> +
> + for ( i = 0; i < fwspec->num_ids; i++ )
> + {
> + unsigned int utlb = fwspec->ids[i];
> +
> + backup_data->asids_val[i] = ipmmu_imuasid_read(mmu, utlb);
> + backup_data->utlbs_val[i] = ipmmu_imuctr_read(mmu, utlb);
> + }
> + }
> +
> + spin_unlock(&ipmmu_devices_backup_lock);
> +}
> +
> +static void ipmmu_utlbs_restore(struct ipmmu_vmsa_device *mmu)
> +{
> + struct ipmmu_vmsa_backup *backup_data;
> +
> + dev_dbg(mmu->dev, "Handle micro-TLBs restore\n");
> +
> + spin_lock(&ipmmu_devices_backup_lock);
> +
> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> + {
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> + unsigned int i;
> +
> + if ( to_ipmmu(backup_data->dev) != mmu )
> + continue;
> +
> + for ( i = 0; i < fwspec->num_ids; i++ )
> + {
> + unsigned int utlb = fwspec->ids[i];
> +
> + ipmmu_imuasid_write(mmu, utlb, backup_data->asids_val[i]);
> + ipmmu_imuctr_write(mmu, utlb, backup_data->utlbs_val[i]);
> + }
> + }
> +
> + spin_unlock(&ipmmu_devices_backup_lock);
> +}
> +
> +static void ipmmu_domain_backup_context(struct ipmmu_vmsa_domain *domain)
> +{
> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> +
> + dev_dbg(mmu->dev, "Handle domain context %u backup\n", domain->context_id);
> +
> + regs->imttlbr0 = ipmmu_ctx_read_root(domain, IMTTLBR0);
> + regs->imttubr0 = ipmmu_ctx_read_root(domain, IMTTUBR0);
> + regs->imttbcr = ipmmu_ctx_read_root(domain, IMTTBCR);
> + regs->imctr = ipmmu_ctx_read_root(domain, IMCTR);
> +}
> +
> +static void ipmmu_domain_restore_context(struct ipmmu_vmsa_domain *domain)
> +{
> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> +
> + dev_dbg(mmu->dev, "Handle domain context %u restore\n", domain->context_id);
> +
> + ipmmu_ctx_write_root(domain, IMTTLBR0, regs->imttlbr0);
> + ipmmu_ctx_write_root(domain, IMTTUBR0, regs->imttubr0);
> + ipmmu_ctx_write_root(domain, IMTTBCR, regs->imttbcr);
> + ipmmu_ctx_write_all(domain, IMCTR, regs->imctr | IMCTR_FLUSH);
> +}
> +
> +/*
> + * Xen: Unlike Linux implementation, Xen uses a single driver instance
> + * for handling all IPMMUs. There is no framework for ipmmu_suspend/resume
> + * callbacks to be invoked for each IPMMU device. So, we need to iterate
> + * through all registered IPMMUs performing required actions.
> + *
> + * Also take care of restoring special settings, such as translation
> + * table format, etc.
> + */
> +static int __must_check ipmmu_suspend(void)
> +{
> + struct ipmmu_vmsa_device *mmu;
> +
> + if ( !iommu_enabled )
> + return 0;
> +
> + printk(XENLOG_DEBUG "ipmmu: Suspending ...\n");
> +
> + spin_lock(&ipmmu_devices_lock);
> +
> + list_for_each_entry( mmu, &ipmmu_devices, list )
> + {
> + if ( ipmmu_is_root(mmu) )
> + {
> + unsigned int i;
> +
> + for ( i = 0; i < mmu->num_ctx; i++ )
> + {
> + if ( !mmu->domains[i] )
> + continue;
> + ipmmu_domain_backup_context(mmu->domains[i]);
> + }
> + }
> + else
> + ipmmu_utlbs_backup(mmu);
> + }
> +
> + spin_unlock(&ipmmu_devices_lock);
> +
> + return 0;
> +}
> +
> +static void ipmmu_resume(void)
> +{
> + struct ipmmu_vmsa_device *mmu;
> +
> + if ( !iommu_enabled )
> + return;
> +
> + printk(XENLOG_DEBUG "ipmmu: Resuming ...\n");
> +
> + spin_lock(&ipmmu_devices_lock);
> +
> + list_for_each_entry( mmu, &ipmmu_devices, list )
> + {
> + uint32_t reg;
> +
> + /* Do not use security group function */
> + reg = IMSCTLR + mmu->features->control_offset_base;
> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) & ~IMSCTLR_USE_SECGRP);
> +
> + if ( ipmmu_is_root(mmu) )
> + {
> + unsigned int i;
> +
> + /* Use stage 2 translation table format */
> + reg = IMSAUXCTLR + mmu->features->control_offset_base;
> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) | IMSAUXCTLR_S2PTE);
> +
> + for ( i = 0; i < mmu->num_ctx; i++ )
> + {
> + if ( !mmu->domains[i] )
> + continue;
> + ipmmu_domain_restore_context(mmu->domains[i]);
> + }
> + }
> + else
> + ipmmu_utlbs_restore(mmu);
> + }
> +
> + spin_unlock(&ipmmu_devices_lock);
> +}
> +
> +static int ipmmu_alloc_ctx_suspend(struct device *dev)
> +{
> + struct ipmmu_vmsa_backup *backup_data;
> + unsigned int *utlbs_val, *asids_val;
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +
> + utlbs_val = xzalloc_array(unsigned int, fwspec->num_ids);
> + if ( !utlbs_val )
> + return -ENOMEM;
> +
> + asids_val = xzalloc_array(unsigned int, fwspec->num_ids);
> + if ( !asids_val )
> + {
> + xfree(utlbs_val);
> + return -ENOMEM;
> + }
> +
> + backup_data = xzalloc(struct ipmmu_vmsa_backup);
> + if ( !backup_data )
> + {
> + xfree(utlbs_val);
> + xfree(asids_val);
> + return -ENOMEM;
> + }
> +
> + backup_data->dev = dev;
> + backup_data->utlbs_val = utlbs_val;
> + backup_data->asids_val = asids_val;
> +
> + spin_lock(&ipmmu_devices_backup_lock);
> + list_add(&backup_data->list, &ipmmu_devices_backup);
> + spin_unlock(&ipmmu_devices_backup_lock);
> +
> + return 0;
> +}
> +
> +#endif /* CONFIG_SYSTEM_SUSPEND */
> +
> static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> {
> uint64_t ttbr;
> @@ -559,6 +798,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> return ret;
>
> domain->context_id = ret;
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + domain->mmu->root->reg_backup[ret] = &root_pgtable[ret];
> +#endif
>
> /*
> * TTBR0
> @@ -615,6 +857,9 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
> ipmmu_ctx_write_root(domain, IMCTR, IMCTR_FLUSH);
> ipmmu_tlb_sync(domain);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + domain->mmu->root->reg_backup[domain->context_id] = NULL;
> +#endif
> ipmmu_domain_free_context(domain->mmu->root, domain->context_id);
> }
>
> @@ -1427,6 +1672,14 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
> }
> #endif
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + if ( ipmmu_alloc_ctx_suspend(dev) )
> + {
> + dev_err(dev, "Failed to allocate context for suspend\n");
> + return -ENOMEM;
> + }
> +#endif
... The initial version was based on driver code without PCI support,
which is now present. There is PCI-specific code earlier in this
function (not visible in the context) that performs some initialization,
allocation and device assignment. If the suspend context allocation
fails here, we would need to undo those actions (i.e. deassign the
device). I would move this context allocation (which is much less likely
to fail than the PCI device setup) above the PCI-specific code, and free
the context on the error path.
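For illustration only, the ordering suggested above could look roughly
like the sketch below. It is standalone C, not the driver code:
struct fake_dev, alloc_backup(), free_backup(), pci_setup() and the
fail_* hooks are made-up names, and calloc()/free() merely stand in for
xzalloc_array()/xfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a master device; not a real Xen type. */
struct fake_dev {
    unsigned int num_ids;     /* number of micro-TLB ids */
    unsigned int *utlbs_val;  /* suspend backup storage */
    unsigned int *asids_val;
    int assigned;             /* set once the PCI-like step succeeded */
};

/* Hypothetical hooks to force either step to fail. */
static int fail_backup_alloc;
static int fail_pci_setup;

static void free_backup(struct fake_dev *d)
{
    free(d->utlbs_val);
    free(d->asids_val);
    d->utlbs_val = NULL;
    d->asids_val = NULL;
}

static int alloc_backup(struct fake_dev *d)
{
    if ( fail_backup_alloc )
        return -ENOMEM;

    d->utlbs_val = calloc(d->num_ids, sizeof(*d->utlbs_val));
    d->asids_val = calloc(d->num_ids, sizeof(*d->asids_val));
    if ( !d->utlbs_val || !d->asids_val )
    {
        free_backup(d);
        return -ENOMEM;
    }

    return 0;
}

/* Stands in for the PCI-specific initialization and device assignment. */
static int pci_setup(struct fake_dev *d)
{
    if ( fail_pci_setup )
        return -ENODEV;

    d->assigned = 1;
    return 0;
}

static int add_device(struct fake_dev *d)
{
    int ret;

    assert(d != NULL);

    /* Cheap allocation first: on failure there is nothing to undo. */
    ret = alloc_backup(d);
    if ( ret )
        return ret;

    /* Expensive step second: on failure only the backup is freed. */
    ret = pci_setup(d);
    if ( ret )
        free_backup(d);

    return ret;
}
```

With this ordering, the only rollback needed is freeing the backup
buffers, and no device deassignment has to be added to the error path.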
> +
> dev_info(dev, "Added master device (IPMMU %s micro-TLBs %u)\n",
> dev_name(fwspec->iommu_dev), fwspec->num_ids);
>
> @@ -1492,6 +1745,10 @@ static const struct iommu_ops ipmmu_iommu_ops =
> .unmap_page = arm_iommu_unmap_page,
> .dt_xlate = ipmmu_dt_xlate,
> .add_device = ipmmu_add_device,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = ipmmu_suspend,
> + .resume = ipmmu_resume,
> +#endif
> };
>
> static __init int ipmmu_init(struct dt_device_node *node, const void *data)
* Re: [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2025-09-03 10:01 ` Oleksandr Tyshchenko
@ 2025-09-03 10:25 ` Mykola Kvach
2025-09-03 11:49 ` Oleksandr Tyshchenko
0 siblings, 1 reply; 49+ messages in thread
From: Mykola Kvach @ 2025-09-03 10:25 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel@lists.xenproject.org, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Mykola Kvach
Hi Oleksandr,
On Wed, Sep 3, 2025 at 1:01 PM Oleksandr Tyshchenko
<Oleksandr_Tyshchenko@epam.com> wrote:
>
>
>
> On 02.09.25 01:10, Mykola Kvach wrote:
>
> Hello Mykola
>
>
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >
> > Store and restore active context and micro-TLB registers.
> >
> > Tested on R-Car H3 Starter Kit.
> >
> > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V6:
> > - refactor code related to hw_register struct, from now it's called
> > ipmmu_reg_ctx
>
> The updated version looks good, thanks. However, I have one
> concern/request ...
>
> > ---
> > xen/drivers/passthrough/arm/ipmmu-vmsa.c | 257 +++++++++++++++++++++++
> > 1 file changed, 257 insertions(+)
> >
> > diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> > index ea9fa9ddf3..0973559861 100644
> > --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> > +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> > @@ -71,6 +71,8 @@
> > })
> > #endif
> >
> > +#define dev_dbg(dev, fmt, ...) \
> > + dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
> > #define dev_info(dev, fmt, ...) \
> > dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
> > #define dev_warn(dev, fmt, ...) \
> > @@ -130,6 +132,24 @@ struct ipmmu_features {
> > unsigned int imuctr_ttsel_mask;
> > };
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +struct ipmmu_reg_ctx {
> > + unsigned int imttlbr0;
> > + unsigned int imttubr0;
> > + unsigned int imttbcr;
> > + unsigned int imctr;
> > +};
> > +
> > +struct ipmmu_vmsa_backup {
> > + struct device *dev;
> > + unsigned int *utlbs_val;
> > + unsigned int *asids_val;
> > + struct list_head list;
> > +};
> > +
> > +#endif
> > +
> > /* Root/Cache IPMMU device's information */
> > struct ipmmu_vmsa_device {
> > struct device *dev;
> > @@ -142,6 +162,9 @@ struct ipmmu_vmsa_device {
> > struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
> > unsigned int utlb_refcount[IPMMU_UTLB_MAX];
> > const struct ipmmu_features *features;
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + struct ipmmu_reg_ctx *reg_backup[IPMMU_CTX_MAX];
> > +#endif
> > };
> >
> > /*
> > @@ -547,6 +570,222 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
> > spin_unlock_irqrestore(&mmu->lock, flags);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +
> > +static DEFINE_SPINLOCK(ipmmu_devices_backup_lock);
> > +static LIST_HEAD(ipmmu_devices_backup);
> > +
> > +static struct ipmmu_reg_ctx root_pgtable[IPMMU_CTX_MAX];
> > +
> > +static uint32_t ipmmu_imuasid_read(struct ipmmu_vmsa_device *mmu,
> > + unsigned int utlb)
> > +{
> > + return ipmmu_read(mmu, ipmmu_utlb_reg(mmu, IMUASID(utlb)));
> > +}
> > +
> > +static void ipmmu_utlbs_backup(struct ipmmu_vmsa_device *mmu)
> > +{
> > + struct ipmmu_vmsa_backup *backup_data;
> > +
> > + dev_dbg(mmu->dev, "Handle micro-TLBs backup\n");
> > +
> > + spin_lock(&ipmmu_devices_backup_lock);
> > +
> > + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> > + {
> > + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> > + unsigned int i;
> > +
> > + if ( to_ipmmu(backup_data->dev) != mmu )
> > + continue;
> > +
> > + for ( i = 0; i < fwspec->num_ids; i++ )
> > + {
> > + unsigned int utlb = fwspec->ids[i];
> > +
> > + backup_data->asids_val[i] = ipmmu_imuasid_read(mmu, utlb);
> > + backup_data->utlbs_val[i] = ipmmu_imuctr_read(mmu, utlb);
> > + }
> > + }
> > +
> > + spin_unlock(&ipmmu_devices_backup_lock);
> > +}
> > +
> > +static void ipmmu_utlbs_restore(struct ipmmu_vmsa_device *mmu)
> > +{
> > + struct ipmmu_vmsa_backup *backup_data;
> > +
> > + dev_dbg(mmu->dev, "Handle micro-TLBs restore\n");
> > +
> > + spin_lock(&ipmmu_devices_backup_lock);
> > +
> > + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> > + {
> > + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> > + unsigned int i;
> > +
> > + if ( to_ipmmu(backup_data->dev) != mmu )
> > + continue;
> > +
> > + for ( i = 0; i < fwspec->num_ids; i++ )
> > + {
> > + unsigned int utlb = fwspec->ids[i];
> > +
> > + ipmmu_imuasid_write(mmu, utlb, backup_data->asids_val[i]);
> > + ipmmu_imuctr_write(mmu, utlb, backup_data->utlbs_val[i]);
> > + }
> > + }
> > +
> > + spin_unlock(&ipmmu_devices_backup_lock);
> > +}
> > +
> > +static void ipmmu_domain_backup_context(struct ipmmu_vmsa_domain *domain)
> > +{
> > + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> > + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> > +
> > + dev_dbg(mmu->dev, "Handle domain context %u backup\n", domain->context_id);
> > +
> > + regs->imttlbr0 = ipmmu_ctx_read_root(domain, IMTTLBR0);
> > + regs->imttubr0 = ipmmu_ctx_read_root(domain, IMTTUBR0);
> > + regs->imttbcr = ipmmu_ctx_read_root(domain, IMTTBCR);
> > + regs->imctr = ipmmu_ctx_read_root(domain, IMCTR);
> > +}
> > +
> > +static void ipmmu_domain_restore_context(struct ipmmu_vmsa_domain *domain)
> > +{
> > + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> > + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> > +
> > + dev_dbg(mmu->dev, "Handle domain context %u restore\n", domain->context_id);
> > +
> > + ipmmu_ctx_write_root(domain, IMTTLBR0, regs->imttlbr0);
> > + ipmmu_ctx_write_root(domain, IMTTUBR0, regs->imttubr0);
> > + ipmmu_ctx_write_root(domain, IMTTBCR, regs->imttbcr);
> > + ipmmu_ctx_write_all(domain, IMCTR, regs->imctr | IMCTR_FLUSH);
> > +}
> > +
> > +/*
> > + * Xen: Unlike Linux implementation, Xen uses a single driver instance
> > + * for handling all IPMMUs. There is no framework for ipmmu_suspend/resume
> > + * callbacks to be invoked for each IPMMU device. So, we need to iterate
> > + * through all registered IPMMUs performing required actions.
> > + *
> > + * Also take care of restoring special settings, such as translation
> > + * table format, etc.
> > + */
> > +static int __must_check ipmmu_suspend(void)
> > +{
> > + struct ipmmu_vmsa_device *mmu;
> > +
> > + if ( !iommu_enabled )
> > + return 0;
> > +
> > + printk(XENLOG_DEBUG "ipmmu: Suspending ...\n");
> > +
> > + spin_lock(&ipmmu_devices_lock);
> > +
> > + list_for_each_entry( mmu, &ipmmu_devices, list )
> > + {
> > + if ( ipmmu_is_root(mmu) )
> > + {
> > + unsigned int i;
> > +
> > + for ( i = 0; i < mmu->num_ctx; i++ )
> > + {
> > + if ( !mmu->domains[i] )
> > + continue;
> > + ipmmu_domain_backup_context(mmu->domains[i]);
> > + }
> > + }
> > + else
> > + ipmmu_utlbs_backup(mmu);
> > + }
> > +
> > + spin_unlock(&ipmmu_devices_lock);
> > +
> > + return 0;
> > +}
> > +
> > +static void ipmmu_resume(void)
> > +{
> > + struct ipmmu_vmsa_device *mmu;
> > +
> > + if ( !iommu_enabled )
> > + return;
> > +
> > + printk(XENLOG_DEBUG "ipmmu: Resuming ...\n");
> > +
> > + spin_lock(&ipmmu_devices_lock);
> > +
> > + list_for_each_entry( mmu, &ipmmu_devices, list )
> > + {
> > + uint32_t reg;
> > +
> > + /* Do not use security group function */
> > + reg = IMSCTLR + mmu->features->control_offset_base;
> > + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) & ~IMSCTLR_USE_SECGRP);
> > +
> > + if ( ipmmu_is_root(mmu) )
> > + {
> > + unsigned int i;
> > +
> > + /* Use stage 2 translation table format */
> > + reg = IMSAUXCTLR + mmu->features->control_offset_base;
> > + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) | IMSAUXCTLR_S2PTE);
> > +
> > + for ( i = 0; i < mmu->num_ctx; i++ )
> > + {
> > + if ( !mmu->domains[i] )
> > + continue;
> > + ipmmu_domain_restore_context(mmu->domains[i]);
> > + }
> > + }
> > + else
> > + ipmmu_utlbs_restore(mmu);
> > + }
> > +
> > + spin_unlock(&ipmmu_devices_lock);
> > +}
> > +
> > +static int ipmmu_alloc_ctx_suspend(struct device *dev)
> > +{
> > + struct ipmmu_vmsa_backup *backup_data;
> > + unsigned int *utlbs_val, *asids_val;
> > + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> > +
> > + utlbs_val = xzalloc_array(unsigned int, fwspec->num_ids);
> > + if ( !utlbs_val )
> > + return -ENOMEM;
> > +
> > + asids_val = xzalloc_array(unsigned int, fwspec->num_ids);
> > + if ( !asids_val )
> > + {
> > + xfree(utlbs_val);
> > + return -ENOMEM;
> > + }
> > +
> > + backup_data = xzalloc(struct ipmmu_vmsa_backup);
> > + if ( !backup_data )
> > + {
> > + xfree(utlbs_val);
> > + xfree(asids_val);
> > + return -ENOMEM;
> > + }
> > +
> > + backup_data->dev = dev;
> > + backup_data->utlbs_val = utlbs_val;
> > + backup_data->asids_val = asids_val;
> > +
> > + spin_lock(&ipmmu_devices_backup_lock);
> > + list_add(&backup_data->list, &ipmmu_devices_backup);
> > + spin_unlock(&ipmmu_devices_backup_lock);
> > +
> > + return 0;
> > +}
> > +
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> > static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> > {
> > uint64_t ttbr;
> > @@ -559,6 +798,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> > return ret;
> >
> > domain->context_id = ret;
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + domain->mmu->root->reg_backup[ret] = &root_pgtable[ret];
> > +#endif
> >
> > /*
> > * TTBR0
> > @@ -615,6 +857,9 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
> > ipmmu_ctx_write_root(domain, IMCTR, IMCTR_FLUSH);
> > ipmmu_tlb_sync(domain);
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + domain->mmu->root->reg_backup[domain->context_id] = NULL;
> > +#endif
> > ipmmu_domain_free_context(domain->mmu->root, domain->context_id);
> > }
> >
> > @@ -1427,6 +1672,14 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
> > }
> > #endif
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + if ( ipmmu_alloc_ctx_suspend(dev) )
> > + {
> > + dev_err(dev, "Failed to allocate context for suspend\n");
> > + return -ENOMEM;
> > + }
> > +#endif
>
> ... The initial version was based on the driver code without PCI
> support, but it is now present. There is PCI-specific code above in this
> function (not visible in the context) that performs some initialization,
> allocation and device assignment. What I mean is that in case of the
> suspend context allocation error here, we will need to undo these
> actions (i.e. deassign device). I would move this context allocation
> (whose probability to fail is much lower than what is done for PCI dev)
> above the PCI-specific stuff, and perform the context freeing on the
> error path.
Maybe it would be better just to add some checks to the suspend handler.
We could skip suspend if the context is not available, and so avoid
having to free what was allocated earlier. This is similar to what is
done for the GICs.
What do you think? Or do you prefer to destroy everything related to the
IOMMU here on error?
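To make the alternative concrete, here is a rough standalone sketch of
one possible reading of it: the suspend handler checks for missing
backup storage up front and refuses to suspend instead of saving a
partial state. It is not the driver code; struct fake_dev, NR_IDS and
suspend_all() are made-up names, and the plain array hw_regs only
pretends to be hardware registers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_IDS 2  /* hypothetical number of registers per device */

/* Hypothetical stand-in for a master device; not a real Xen type. */
struct fake_dev {
    unsigned int hw_regs[NR_IDS];  /* pretend hardware registers */
    unsigned int *backup;          /* NULL if allocation failed earlier */
};

static int suspend_all(struct fake_dev *devs, size_t n)
{
    size_t i, j;

    assert(devs != NULL || n == 0);

    /* First pass: refuse to suspend if any device has no storage. */
    for ( i = 0; i < n; i++ )
        if ( !devs[i].backup )
            return -ENOMEM;

    /* Second pass: every device has storage, so save all registers. */
    for ( i = 0; i < n; i++ )
        for ( j = 0; j < NR_IDS; j++ )
            devs[i].backup[j] = devs[i].hw_regs[j];

    return 0;
}
```

The trade-off in this reading is that device setup never has to be
rolled back, but a single failed allocation makes every later suspend
request fail.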
>
> > +
> > dev_info(dev, "Added master device (IPMMU %s micro-TLBs %u)\n",
> > dev_name(fwspec->iommu_dev), fwspec->num_ids);
> >
> > @@ -1492,6 +1745,10 @@ static const struct iommu_ops ipmmu_iommu_ops =
> > .unmap_page = arm_iommu_unmap_page,
> > .dt_xlate = ipmmu_dt_xlate,
> > .add_device = ipmmu_add_device,
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + .suspend = ipmmu_suspend,
> > + .resume = ipmmu_resume,
> > +#endif
> > };
> >
> > static __init int ipmmu_init(struct dt_device_node *node, const void *data)
Best regards,
Mykola
* Re: [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2025-09-03 10:25 ` Mykola Kvach
@ 2025-09-03 11:49 ` Oleksandr Tyshchenko
2025-09-03 15:12 ` Mykola Kvach
0 siblings, 1 reply; 49+ messages in thread
From: Oleksandr Tyshchenko @ 2025-09-03 11:49 UTC (permalink / raw)
To: Mykola Kvach, Oleksandr Tyshchenko
Cc: xen-devel@lists.xenproject.org, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Mykola Kvach
On 03.09.25 13:25, Mykola Kvach wrote:
> Hi Oleksandr,
Hello Mykola
>
> On Wed, Sep 3, 2025 at 1:01 PM Oleksandr Tyshchenko
> <Oleksandr_Tyshchenko@epam.com> wrote:
>>
>>
>>
>> On 02.09.25 01:10, Mykola Kvach wrote:
>>
>> Hello Mykola
>>
>>
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> Store and restore active context and micro-TLB registers.
>>>
>>> Tested on R-Car H3 Starter Kit.
>>>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
>>> ---
>>> Changes in V6:
>>> - refactor code related to hw_register struct, from now it's called
>>> ipmmu_reg_ctx
>>
>> The updated version looks good, thanks. However, I have one
>> concern/request ...
>>
>>> ---
>>> xen/drivers/passthrough/arm/ipmmu-vmsa.c | 257 +++++++++++++++++++++++
>>> 1 file changed, 257 insertions(+)
>>>
>>> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
>>> index ea9fa9ddf3..0973559861 100644
>>> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
>>> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
>>> @@ -71,6 +71,8 @@
>>> })
>>> #endif
>>>
>>> +#define dev_dbg(dev, fmt, ...) \
>>> + dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
>>> #define dev_info(dev, fmt, ...) \
>>> dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
>>> #define dev_warn(dev, fmt, ...) \
>>> @@ -130,6 +132,24 @@ struct ipmmu_features {
>>> unsigned int imuctr_ttsel_mask;
>>> };
>>>
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> +
>>> +struct ipmmu_reg_ctx {
>>> + unsigned int imttlbr0;
>>> + unsigned int imttubr0;
>>> + unsigned int imttbcr;
>>> + unsigned int imctr;
>>> +};
>>> +
>>> +struct ipmmu_vmsa_backup {
>>> + struct device *dev;
>>> + unsigned int *utlbs_val;
>>> + unsigned int *asids_val;
>>> + struct list_head list;
>>> +};
>>> +
>>> +#endif
>>> +
>>> /* Root/Cache IPMMU device's information */
>>> struct ipmmu_vmsa_device {
>>> struct device *dev;
>>> @@ -142,6 +162,9 @@ struct ipmmu_vmsa_device {
>>> struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
>>> unsigned int utlb_refcount[IPMMU_UTLB_MAX];
>>> const struct ipmmu_features *features;
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> + struct ipmmu_reg_ctx *reg_backup[IPMMU_CTX_MAX];
>>> +#endif
>>> };
>>>
>>> /*
>>> @@ -547,6 +570,222 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
>>> spin_unlock_irqrestore(&mmu->lock, flags);
>>> }
>>>
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> +
>>> +static DEFINE_SPINLOCK(ipmmu_devices_backup_lock);
>>> +static LIST_HEAD(ipmmu_devices_backup);
>>> +
>>> +static struct ipmmu_reg_ctx root_pgtable[IPMMU_CTX_MAX];
>>> +
>>> +static uint32_t ipmmu_imuasid_read(struct ipmmu_vmsa_device *mmu,
>>> + unsigned int utlb)
>>> +{
>>> + return ipmmu_read(mmu, ipmmu_utlb_reg(mmu, IMUASID(utlb)));
>>> +}
>>> +
>>> +static void ipmmu_utlbs_backup(struct ipmmu_vmsa_device *mmu)
>>> +{
>>> + struct ipmmu_vmsa_backup *backup_data;
>>> +
>>> + dev_dbg(mmu->dev, "Handle micro-TLBs backup\n");
>>> +
>>> + spin_lock(&ipmmu_devices_backup_lock);
>>> +
>>> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
>>> + {
>>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
>>> + unsigned int i;
>>> +
>>> + if ( to_ipmmu(backup_data->dev) != mmu )
>>> + continue;
>>> +
>>> + for ( i = 0; i < fwspec->num_ids; i++ )
>>> + {
>>> + unsigned int utlb = fwspec->ids[i];
>>> +
>>> + backup_data->asids_val[i] = ipmmu_imuasid_read(mmu, utlb);
>>> + backup_data->utlbs_val[i] = ipmmu_imuctr_read(mmu, utlb);
>>> + }
>>> + }
>>> +
>>> + spin_unlock(&ipmmu_devices_backup_lock);
>>> +}
>>> +
>>> +static void ipmmu_utlbs_restore(struct ipmmu_vmsa_device *mmu)
>>> +{
>>> + struct ipmmu_vmsa_backup *backup_data;
>>> +
>>> + dev_dbg(mmu->dev, "Handle micro-TLBs restore\n");
>>> +
>>> + spin_lock(&ipmmu_devices_backup_lock);
>>> +
>>> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
>>> + {
>>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
>>> + unsigned int i;
>>> +
>>> + if ( to_ipmmu(backup_data->dev) != mmu )
>>> + continue;
>>> +
>>> + for ( i = 0; i < fwspec->num_ids; i++ )
>>> + {
>>> + unsigned int utlb = fwspec->ids[i];
>>> +
>>> + ipmmu_imuasid_write(mmu, utlb, backup_data->asids_val[i]);
>>> + ipmmu_imuctr_write(mmu, utlb, backup_data->utlbs_val[i]);
>>> + }
>>> + }
>>> +
>>> + spin_unlock(&ipmmu_devices_backup_lock);
>>> +}
>>> +
>>> +static void ipmmu_domain_backup_context(struct ipmmu_vmsa_domain *domain)
>>> +{
>>> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
>>> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
>>> +
>>> + dev_dbg(mmu->dev, "Handle domain context %u backup\n", domain->context_id);
>>> +
>>> + regs->imttlbr0 = ipmmu_ctx_read_root(domain, IMTTLBR0);
>>> + regs->imttubr0 = ipmmu_ctx_read_root(domain, IMTTUBR0);
>>> + regs->imttbcr = ipmmu_ctx_read_root(domain, IMTTBCR);
>>> + regs->imctr = ipmmu_ctx_read_root(domain, IMCTR);
>>> +}
>>> +
>>> +static void ipmmu_domain_restore_context(struct ipmmu_vmsa_domain *domain)
>>> +{
>>> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
>>> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
>>> +
>>> + dev_dbg(mmu->dev, "Handle domain context %u restore\n", domain->context_id);
>>> +
>>> + ipmmu_ctx_write_root(domain, IMTTLBR0, regs->imttlbr0);
>>> + ipmmu_ctx_write_root(domain, IMTTUBR0, regs->imttubr0);
>>> + ipmmu_ctx_write_root(domain, IMTTBCR, regs->imttbcr);
>>> + ipmmu_ctx_write_all(domain, IMCTR, regs->imctr | IMCTR_FLUSH);
>>> +}
>>> +
>>> +/*
>>> + * Xen: Unlike Linux implementation, Xen uses a single driver instance
>>> + * for handling all IPMMUs. There is no framework for ipmmu_suspend/resume
>>> + * callbacks to be invoked for each IPMMU device. So, we need to iterate
>>> + * through all registered IPMMUs performing required actions.
>>> + *
>>> + * Also take care of restoring special settings, such as translation
>>> + * table format, etc.
>>> + */
>>> +static int __must_check ipmmu_suspend(void)
>>> +{
>>> + struct ipmmu_vmsa_device *mmu;
>>> +
>>> + if ( !iommu_enabled )
>>> + return 0;
>>> +
>>> + printk(XENLOG_DEBUG "ipmmu: Suspending ...\n");
>>> +
>>> + spin_lock(&ipmmu_devices_lock);
>>> +
>>> + list_for_each_entry( mmu, &ipmmu_devices, list )
>>> + {
>>> + if ( ipmmu_is_root(mmu) )
>>> + {
>>> + unsigned int i;
>>> +
>>> + for ( i = 0; i < mmu->num_ctx; i++ )
>>> + {
>>> + if ( !mmu->domains[i] )
>>> + continue;
>>> + ipmmu_domain_backup_context(mmu->domains[i]);
>>> + }
>>> + }
>>> + else
>>> + ipmmu_utlbs_backup(mmu);
>>> + }
>>> +
>>> + spin_unlock(&ipmmu_devices_lock);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void ipmmu_resume(void)
>>> +{
>>> + struct ipmmu_vmsa_device *mmu;
>>> +
>>> + if ( !iommu_enabled )
>>> + return;
>>> +
>>> + printk(XENLOG_DEBUG "ipmmu: Resuming ...\n");
>>> +
>>> + spin_lock(&ipmmu_devices_lock);
>>> +
>>> + list_for_each_entry( mmu, &ipmmu_devices, list )
>>> + {
>>> + uint32_t reg;
>>> +
>>> + /* Do not use security group function */
>>> + reg = IMSCTLR + mmu->features->control_offset_base;
>>> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) & ~IMSCTLR_USE_SECGRP);
>>> +
>>> + if ( ipmmu_is_root(mmu) )
>>> + {
>>> + unsigned int i;
>>> +
>>> + /* Use stage 2 translation table format */
>>> + reg = IMSAUXCTLR + mmu->features->control_offset_base;
>>> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) | IMSAUXCTLR_S2PTE);
>>> +
>>> + for ( i = 0; i < mmu->num_ctx; i++ )
>>> + {
>>> + if ( !mmu->domains[i] )
>>> + continue;
>>> + ipmmu_domain_restore_context(mmu->domains[i]);
>>> + }
>>> + }
>>> + else
>>> + ipmmu_utlbs_restore(mmu);
>>> + }
>>> +
>>> + spin_unlock(&ipmmu_devices_lock);
>>> +}
>>> +
>>> +static int ipmmu_alloc_ctx_suspend(struct device *dev)
>>> +{
>>> + struct ipmmu_vmsa_backup *backup_data;
>>> + unsigned int *utlbs_val, *asids_val;
>>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>>> +
>>> + utlbs_val = xzalloc_array(unsigned int, fwspec->num_ids);
>>> + if ( !utlbs_val )
>>> + return -ENOMEM;
>>> +
>>> + asids_val = xzalloc_array(unsigned int, fwspec->num_ids);
>>> + if ( !asids_val )
>>> + {
>>> + xfree(utlbs_val);
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + backup_data = xzalloc(struct ipmmu_vmsa_backup);
>>> + if ( !backup_data )
>>> + {
>>> + xfree(utlbs_val);
>>> + xfree(asids_val);
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + backup_data->dev = dev;
>>> + backup_data->utlbs_val = utlbs_val;
>>> + backup_data->asids_val = asids_val;
>>> +
>>> + spin_lock(&ipmmu_devices_backup_lock);
>>> + list_add(&backup_data->list, &ipmmu_devices_backup);
>>> + spin_unlock(&ipmmu_devices_backup_lock);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +#endif /* CONFIG_SYSTEM_SUSPEND */
>>> +
>>> static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>>> {
>>> uint64_t ttbr;
>>> @@ -559,6 +798,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>>> return ret;
>>>
>>> domain->context_id = ret;
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> + domain->mmu->root->reg_backup[ret] = &root_pgtable[ret];
>>> +#endif
>>>
>>> /*
>>> * TTBR0
>>> @@ -615,6 +857,9 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
>>> ipmmu_ctx_write_root(domain, IMCTR, IMCTR_FLUSH);
>>> ipmmu_tlb_sync(domain);
>>>
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> + domain->mmu->root->reg_backup[domain->context_id] = NULL;
>>> +#endif
>>> ipmmu_domain_free_context(domain->mmu->root, domain->context_id);
>>> }
>>>
>>> @@ -1427,6 +1672,14 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
>>> }
>>> #endif
>>>
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> + if ( ipmmu_alloc_ctx_suspend(dev) )
>>> + {
>>> + dev_err(dev, "Failed to allocate context for suspend\n");
>>> + return -ENOMEM;
>>> + }
>>> +#endif
>>
>> ... The initial version was based on the driver code without PCI
>> support, but it is now present. There is PCI-specific code above in this
>> function (not visible in the context) that performs some initialization,
>> allocation and device assignment. What I mean is that in case of the
>> suspend context allocation error here, we will need to undo these
>> actions (i.e. deassign device). I would move this context allocation
>> (whose probability to fail is much lower than what is done for PCI dev)
>> above the PCI-specific stuff, and perform the context freeing on the
>> error path.
>
> Maybe it would be better just to add some checks to the suspend handler.
> We could skip suspend in case the context is not available, and avoid
> deallocating previously allocated stuff. This is similar to what is
> done for GICs.
>
> What do you think? Or do you prefer to destroy everything related to the
> IOMMU here on error?
I would prefer that we fail early here in ipmmu_add_device (and roll
back the changes) rather than continue and fail later, though other
people might think differently. I think that if we cannot even allocate
memory for the structures, the situation is bad.
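A minimal, self-contained sketch of the fail-early ordering discussed
above. This is not the driver code: it uses plain libc allocation, and
all names (suspend_ctx, alloc_suspend_ctx, pci_specific_setup,
add_device_sketch) are hypothetical stand-ins. The suspend context is
allocated first, and freed again if the later, more failure-prone step
reports an error.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-device suspend backup data. */
struct suspend_ctx {
    unsigned int *utlbs_val;
    unsigned int *asids_val;
};

void free_suspend_ctx(struct suspend_ctx *ctx)
{
    if ( !ctx )
        return;

    free(ctx->utlbs_val);
    free(ctx->asids_val);
    free(ctx);
}

struct suspend_ctx *alloc_suspend_ctx(unsigned int num_ids)
{
    struct suspend_ctx *ctx = calloc(1, sizeof(*ctx));

    if ( !ctx )
        return NULL;

    ctx->utlbs_val = calloc(num_ids, sizeof(*ctx->utlbs_val));
    ctx->asids_val = calloc(num_ids, sizeof(*ctx->asids_val));
    if ( !ctx->utlbs_val || !ctx->asids_val )
    {
        /* Partial allocation: release whatever was obtained. */
        free_suspend_ctx(ctx);
        return NULL;
    }

    return ctx;
}

/* Hypothetical stand-in for the PCI-specific part; fails on request. */
int pci_specific_setup(int should_fail)
{
    return should_fail ? -ENODEV : 0;
}

int add_device_sketch(unsigned int num_ids, int pci_fails,
                      struct suspend_ctx **out)
{
    /* Cheap allocation first, so there is nothing else to undo yet. */
    struct suspend_ctx *ctx = alloc_suspend_ctx(num_ids);
    int rc;

    if ( !ctx )
        return -ENOMEM;

    rc = pci_specific_setup(pci_fails);
    if ( rc )
    {
        /* Error path: only the suspend context needs freeing. */
        free_suspend_ctx(ctx);
        return rc;
    }

    *out = ctx;
    return 0;
}
```

With this ordering the error path never has to deassign a device; it
only frees memory.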
>
>>
>>> +
>>> dev_info(dev, "Added master device (IPMMU %s micro-TLBs %u)\n",
>>> dev_name(fwspec->iommu_dev), fwspec->num_ids);
>>>
>>> @@ -1492,6 +1745,10 @@ static const struct iommu_ops ipmmu_iommu_ops =
>>> .unmap_page = arm_iommu_unmap_page,
>>> .dt_xlate = ipmmu_dt_xlate,
>>> .add_device = ipmmu_add_device,
>>> +#ifdef CONFIG_SYSTEM_SUSPEND
>>> + .suspend = ipmmu_suspend,
>>> + .resume = ipmmu_resume,
>>> +#endif
>>> };
>>>
>>> static __init int ipmmu_init(struct dt_device_node *node, const void *data)
>
> Best regards,
> Mykola
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks
2025-09-03 11:49 ` Oleksandr Tyshchenko
@ 2025-09-03 15:12 ` Mykola Kvach
0 siblings, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-03 15:12 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: Oleksandr Tyshchenko, xen-devel@lists.xenproject.org,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Mykola Kvach
On Wed, Sep 3, 2025 at 2:49 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
>
> On 03.09.25 13:25, Mykola Kvach wrote:
> > Hi Oleksandr,
>
> Hello Mykola
>
> >
> > On Wed, Sep 3, 2025 at 1:01 PM Oleksandr Tyshchenko
> > <Oleksandr_Tyshchenko@epam.com> wrote:
> >>
> >>
> >>
> >> On 02.09.25 01:10, Mykola Kvach wrote:
> >>
> >> Hello Mykola
> >>
> >>
> >>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >>>
> >>> Store and restore active context and micro-TLB registers.
> >>>
> >>> Tested on R-Car H3 Starter Kit.
> >>>
> >>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> >>> ---
> >>> Changes in V6:
> >>> - refactor code related to hw_register struct, from now it's called
> >>> ipmmu_reg_ctx
> >>
> >> The updated version looks good, thanks. However, I have one
> >> concern/request ...
> >>
> >>> ---
> >>> xen/drivers/passthrough/arm/ipmmu-vmsa.c | 257 +++++++++++++++++++++++
> >>> 1 file changed, 257 insertions(+)
> >>>
> >>> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> >>> index ea9fa9ddf3..0973559861 100644
> >>> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> >>> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> >>> @@ -71,6 +71,8 @@
> >>> })
> >>> #endif
> >>>
> >>> +#define dev_dbg(dev, fmt, ...) \
> >>> + dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
> >>> #define dev_info(dev, fmt, ...) \
> >>> dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
> >>> #define dev_warn(dev, fmt, ...) \
> >>> @@ -130,6 +132,24 @@ struct ipmmu_features {
> >>> unsigned int imuctr_ttsel_mask;
> >>> };
> >>>
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> +
> >>> +struct ipmmu_reg_ctx {
> >>> + unsigned int imttlbr0;
> >>> + unsigned int imttubr0;
> >>> + unsigned int imttbcr;
> >>> + unsigned int imctr;
> >>> +};
> >>> +
> >>> +struct ipmmu_vmsa_backup {
> >>> + struct device *dev;
> >>> + unsigned int *utlbs_val;
> >>> + unsigned int *asids_val;
> >>> + struct list_head list;
> >>> +};
> >>> +
> >>> +#endif
> >>> +
> >>> /* Root/Cache IPMMU device's information */
> >>> struct ipmmu_vmsa_device {
> >>> struct device *dev;
> >>> @@ -142,6 +162,9 @@ struct ipmmu_vmsa_device {
> >>> struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
> >>> unsigned int utlb_refcount[IPMMU_UTLB_MAX];
> >>> const struct ipmmu_features *features;
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> + struct ipmmu_reg_ctx *reg_backup[IPMMU_CTX_MAX];
> >>> +#endif
> >>> };
> >>>
> >>> /*
> >>> @@ -547,6 +570,222 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
> >>> spin_unlock_irqrestore(&mmu->lock, flags);
> >>> }
> >>>
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> +
> >>> +static DEFINE_SPINLOCK(ipmmu_devices_backup_lock);
> >>> +static LIST_HEAD(ipmmu_devices_backup);
> >>> +
> >>> +static struct ipmmu_reg_ctx root_pgtable[IPMMU_CTX_MAX];
> >>> +
> >>> +static uint32_t ipmmu_imuasid_read(struct ipmmu_vmsa_device *mmu,
> >>> + unsigned int utlb)
> >>> +{
> >>> + return ipmmu_read(mmu, ipmmu_utlb_reg(mmu, IMUASID(utlb)));
> >>> +}
> >>> +
> >>> +static void ipmmu_utlbs_backup(struct ipmmu_vmsa_device *mmu)
> >>> +{
> >>> + struct ipmmu_vmsa_backup *backup_data;
> >>> +
> >>> + dev_dbg(mmu->dev, "Handle micro-TLBs backup\n");
> >>> +
> >>> + spin_lock(&ipmmu_devices_backup_lock);
> >>> +
> >>> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> >>> + {
> >>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> >>> + unsigned int i;
> >>> +
> >>> + if ( to_ipmmu(backup_data->dev) != mmu )
> >>> + continue;
> >>> +
> >>> + for ( i = 0; i < fwspec->num_ids; i++ )
> >>> + {
> >>> + unsigned int utlb = fwspec->ids[i];
> >>> +
> >>> + backup_data->asids_val[i] = ipmmu_imuasid_read(mmu, utlb);
> >>> + backup_data->utlbs_val[i] = ipmmu_imuctr_read(mmu, utlb);
> >>> + }
> >>> + }
> >>> +
> >>> + spin_unlock(&ipmmu_devices_backup_lock);
> >>> +}
> >>> +
> >>> +static void ipmmu_utlbs_restore(struct ipmmu_vmsa_device *mmu)
> >>> +{
> >>> + struct ipmmu_vmsa_backup *backup_data;
> >>> +
> >>> + dev_dbg(mmu->dev, "Handle micro-TLBs restore\n");
> >>> +
> >>> + spin_lock(&ipmmu_devices_backup_lock);
> >>> +
> >>> + list_for_each_entry( backup_data, &ipmmu_devices_backup, list )
> >>> + {
> >>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(backup_data->dev);
> >>> + unsigned int i;
> >>> +
> >>> + if ( to_ipmmu(backup_data->dev) != mmu )
> >>> + continue;
> >>> +
> >>> + for ( i = 0; i < fwspec->num_ids; i++ )
> >>> + {
> >>> + unsigned int utlb = fwspec->ids[i];
> >>> +
> >>> + ipmmu_imuasid_write(mmu, utlb, backup_data->asids_val[i]);
> >>> + ipmmu_imuctr_write(mmu, utlb, backup_data->utlbs_val[i]);
> >>> + }
> >>> + }
> >>> +
> >>> + spin_unlock(&ipmmu_devices_backup_lock);
> >>> +}
> >>> +
> >>> +static void ipmmu_domain_backup_context(struct ipmmu_vmsa_domain *domain)
> >>> +{
> >>> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> >>> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> >>> +
> >>> + dev_dbg(mmu->dev, "Handle domain context %u backup\n", domain->context_id);
> >>> +
> >>> + regs->imttlbr0 = ipmmu_ctx_read_root(domain, IMTTLBR0);
> >>> + regs->imttubr0 = ipmmu_ctx_read_root(domain, IMTTUBR0);
> >>> + regs->imttbcr = ipmmu_ctx_read_root(domain, IMTTBCR);
> >>> + regs->imctr = ipmmu_ctx_read_root(domain, IMCTR);
> >>> +}
> >>> +
> >>> +static void ipmmu_domain_restore_context(struct ipmmu_vmsa_domain *domain)
> >>> +{
> >>> + struct ipmmu_vmsa_device *mmu = domain->mmu->root;
> >>> + struct ipmmu_reg_ctx *regs = mmu->reg_backup[domain->context_id];
> >>> +
> >>> + dev_dbg(mmu->dev, "Handle domain context %u restore\n", domain->context_id);
> >>> +
> >>> + ipmmu_ctx_write_root(domain, IMTTLBR0, regs->imttlbr0);
> >>> + ipmmu_ctx_write_root(domain, IMTTUBR0, regs->imttubr0);
> >>> + ipmmu_ctx_write_root(domain, IMTTBCR, regs->imttbcr);
> >>> + ipmmu_ctx_write_all(domain, IMCTR, regs->imctr | IMCTR_FLUSH);
> >>> +}
> >>> +
> >>> +/*
> >>> + * Xen: Unlike Linux implementation, Xen uses a single driver instance
> >>> + * for handling all IPMMUs. There is no framework for ipmmu_suspend/resume
> >>> + * callbacks to be invoked for each IPMMU device. So, we need to iterate
> >>> + * through all registered IPMMUs performing required actions.
> >>> + *
> >>> + * Also take care of restoring special settings, such as translation
> >>> + * table format, etc.
> >>> + */
> >>> +static int __must_check ipmmu_suspend(void)
> >>> +{
> >>> + struct ipmmu_vmsa_device *mmu;
> >>> +
> >>> + if ( !iommu_enabled )
> >>> + return 0;
> >>> +
> >>> + printk(XENLOG_DEBUG "ipmmu: Suspending ...\n");
> >>> +
> >>> + spin_lock(&ipmmu_devices_lock);
> >>> +
> >>> + list_for_each_entry( mmu, &ipmmu_devices, list )
> >>> + {
> >>> + if ( ipmmu_is_root(mmu) )
> >>> + {
> >>> + unsigned int i;
> >>> +
> >>> + for ( i = 0; i < mmu->num_ctx; i++ )
> >>> + {
> >>> + if ( !mmu->domains[i] )
> >>> + continue;
> >>> + ipmmu_domain_backup_context(mmu->domains[i]);
> >>> + }
> >>> + }
> >>> + else
> >>> + ipmmu_utlbs_backup(mmu);
> >>> + }
> >>> +
> >>> + spin_unlock(&ipmmu_devices_lock);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static void ipmmu_resume(void)
> >>> +{
> >>> + struct ipmmu_vmsa_device *mmu;
> >>> +
> >>> + if ( !iommu_enabled )
> >>> + return;
> >>> +
> >>> + printk(XENLOG_DEBUG "ipmmu: Resuming ...\n");
> >>> +
> >>> + spin_lock(&ipmmu_devices_lock);
> >>> +
> >>> + list_for_each_entry( mmu, &ipmmu_devices, list )
> >>> + {
> >>> + uint32_t reg;
> >>> +
> >>> + /* Do not use security group function */
> >>> + reg = IMSCTLR + mmu->features->control_offset_base;
> >>> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) & ~IMSCTLR_USE_SECGRP);
> >>> +
> >>> + if ( ipmmu_is_root(mmu) )
> >>> + {
> >>> + unsigned int i;
> >>> +
> >>> + /* Use stage 2 translation table format */
> >>> + reg = IMSAUXCTLR + mmu->features->control_offset_base;
> >>> + ipmmu_write(mmu, reg, ipmmu_read(mmu, reg) | IMSAUXCTLR_S2PTE);
> >>> +
> >>> + for ( i = 0; i < mmu->num_ctx; i++ )
> >>> + {
> >>> + if ( !mmu->domains[i] )
> >>> + continue;
> >>> + ipmmu_domain_restore_context(mmu->domains[i]);
> >>> + }
> >>> + }
> >>> + else
> >>> + ipmmu_utlbs_restore(mmu);
> >>> + }
> >>> +
> >>> + spin_unlock(&ipmmu_devices_lock);
> >>> +}
> >>> +
> >>> +static int ipmmu_alloc_ctx_suspend(struct device *dev)
> >>> +{
> >>> + struct ipmmu_vmsa_backup *backup_data;
> >>> + unsigned int *utlbs_val, *asids_val;
> >>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> >>> +
> >>> + utlbs_val = xzalloc_array(unsigned int, fwspec->num_ids);
> >>> + if ( !utlbs_val )
> >>> + return -ENOMEM;
> >>> +
> >>> + asids_val = xzalloc_array(unsigned int, fwspec->num_ids);
> >>> + if ( !asids_val )
> >>> + {
> >>> + xfree(utlbs_val);
> >>> + return -ENOMEM;
> >>> + }
> >>> +
> >>> + backup_data = xzalloc(struct ipmmu_vmsa_backup);
> >>> + if ( !backup_data )
> >>> + {
> >>> + xfree(utlbs_val);
> >>> + xfree(asids_val);
> >>> + return -ENOMEM;
> >>> + }
> >>> +
> >>> + backup_data->dev = dev;
> >>> + backup_data->utlbs_val = utlbs_val;
> >>> + backup_data->asids_val = asids_val;
> >>> +
> >>> + spin_lock(&ipmmu_devices_backup_lock);
> >>> + list_add(&backup_data->list, &ipmmu_devices_backup);
> >>> + spin_unlock(&ipmmu_devices_backup_lock);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +#endif /* CONFIG_SYSTEM_SUSPEND */
> >>> +
> >>> static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> >>> {
> >>> uint64_t ttbr;
> >>> @@ -559,6 +798,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
> >>> return ret;
> >>>
> >>> domain->context_id = ret;
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> + domain->mmu->root->reg_backup[ret] = &root_pgtable[ret];
> >>> +#endif
> >>>
> >>> /*
> >>> * TTBR0
> >>> @@ -615,6 +857,9 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
> >>> ipmmu_ctx_write_root(domain, IMCTR, IMCTR_FLUSH);
> >>> ipmmu_tlb_sync(domain);
> >>>
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> + domain->mmu->root->reg_backup[domain->context_id] = NULL;
> >>> +#endif
> >>> ipmmu_domain_free_context(domain->mmu->root, domain->context_id);
> >>> }
> >>>
> >>> @@ -1427,6 +1672,14 @@ static int ipmmu_add_device(u8 devfn, struct device *dev)
> >>> }
> >>> #endif
> >>>
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> + if ( ipmmu_alloc_ctx_suspend(dev) )
> >>> + {
> >>> + dev_err(dev, "Failed to allocate context for suspend\n");
> >>> + return -ENOMEM;
> >>> + }
> >>> +#endif
> >>
> >> ... The initial version was based on the driver code without PCI
> >> support, but it is now present. There is PCI-specific code above in this
> >> function (not visible in the context) that performs some initialization,
> >> allocation and device assignment. What I mean is that in case of the
> >> suspend context allocation error here, we will need to undo these
> >> actions (i.e. deassign device). I would move this context allocation
> >> (whose probability to fail is much lower than what is done for PCI dev)
> >> above the PCI-specific stuff, and perform the context freeing on the
> >> error path.
> >
> > Maybe it would be better just to add some checks to the suspend handler.
> > We could skip suspend in case the context is not available, and avoid
> > deallocating previously allocated stuff. This is similar to what is
> > done for GICs.
> >
> > What do you think? Or do you prefer to destroy everything related to the
> > IOMMU here on error?
>
> I would prefer that we fail early here in ipmmu_add_device (and roll
> back the changes) rather than continue and fail later, though other
> people might think differently. I think that if we cannot even allocate
> memory for the structures, the situation is bad.
Got it, I’ll fix this in the next version of the patch series.
Thank you for pointing that out.
>
>
>
> >
> >>
> >>> +
> >>> dev_info(dev, "Added master device (IPMMU %s micro-TLBs %u)\n",
> >>> dev_name(fwspec->iommu_dev), fwspec->num_ids);
> >>>
> >>> @@ -1492,6 +1745,10 @@ static const struct iommu_ops ipmmu_iommu_ops =
> >>> .unmap_page = arm_iommu_unmap_page,
> >>> .dt_xlate = ipmmu_dt_xlate,
> >>> .add_device = ipmmu_add_device,
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> + .suspend = ipmmu_suspend,
> >>> + .resume = ipmmu_resume,
> >>> +#endif
> >>> };
> >>>
> >>> static __init int ipmmu_init(struct dt_device_node *node, const void *data)
> >
> > Best regards,
> > Mykola
> >
>
Best regards,
Mykola
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH v6 08/13] xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface)
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (6 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 07/13] iommu/ipmmu-vmsa: Implement suspend/resume callbacks Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 09/13] xen/arm: Resume memory management on Xen resume Mykola Kvach
` (5 subsequent siblings)
13 siblings, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi,
Mykyta Poturai, Mykola Kvach
From: Mirela Simonovic <mirela.simonovic@aggios.com>
Invoke PSCI SYSTEM_SUSPEND to finalize Xen's suspend sequence on ARM64 platforms.
Pass the resume entry point (hyp_resume) as the first argument to EL3. The resume
handler is currently a stub and will be implemented later in assembly. Ignore the
context ID argument, as is done in Linux.
Only enable this path when CONFIG_SYSTEM_SUSPEND is set and
PSCI version is >= 1.0.
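As a rough illustration of the gating described above, the sketch below
selects the native-width SYSTEM_SUSPEND function ID and refuses the call
on PSCI versions older than 1.0. It is a standalone approximation, not
the Xen code: pointer width stands in for CONFIG_ARM_64, and
system_suspend_fid is a hypothetical helper. The function IDs and the
version encoding follow the PSCI specification.

```c
#include <stdint.h>

/* PSCI version encoding: major in bits [31:16], minor in bits [15:0]. */
#define PSCI_VERSION(major, minor) (((uint32_t)(major) << 16) | (minor))

/* SYSTEM_SUSPEND function IDs (SMC32 and SMC64 calling conventions). */
#define PSCI_1_0_FN32_SYSTEM_SUSPEND 0x8400000EU
#define PSCI_1_0_FN64_SYSTEM_SUSPEND 0xC400000EU

#define PSCI_NOT_SUPPORTED (-1)

/* Assumption: pointer width stands in for CONFIG_ARM_64 here. */
#if UINTPTR_MAX > 0xFFFFFFFFU
#define PSCI_1_0_FN_NATIVE(name) PSCI_1_0_FN64_##name
#else
#define PSCI_1_0_FN_NATIVE(name) PSCI_1_0_FN32_##name
#endif

/*
 * Hypothetical helper: stores the function ID to use in *fid and
 * returns 0, or returns PSCI_NOT_SUPPORTED if the firmware is too old.
 */
int system_suspend_fid(uint32_t psci_ver, uint32_t *fid)
{
    if ( psci_ver < PSCI_VERSION(1, 0) )
        return PSCI_NOT_SUPPORTED;

    *fid = PSCI_1_0_FN_NATIVE(SYSTEM_SUSPEND);

    return 0;
}
```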
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in v6:
- move calling of call_psci_system_suspend to commit that
implements system_suspend call
Changes in v4:
- select the appropriate PSCI SYSTEM_SUSPEND function ID based on platform
- update comments and commit message to reflect recent changes
Changes in v3:
- return PSCI_NOT_SUPPORTED instead of a hardcoded 1 on ARM32
- check PSCI version before invoking SYSTEM_SUSPEND in call_psci_system_suspend
---
xen/arch/arm/arm64/head.S | 8 ++++++++
xen/arch/arm/include/asm/psci.h | 1 +
xen/arch/arm/include/asm/suspend.h | 22 ++++++++++++++++++++++
xen/arch/arm/psci.c | 23 ++++++++++++++++++++++-
4 files changed, 53 insertions(+), 1 deletion(-)
create mode 100644 xen/arch/arm/include/asm/suspend.h
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 72c7b24498..3522c497c5 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -561,6 +561,14 @@ END(efi_xen_start)
#endif /* CONFIG_ARM_EFI */
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+FUNC(hyp_resume)
+ b .
+END(hyp_resume)
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
/*
* Local variables:
* mode: ASM
diff --git a/xen/arch/arm/include/asm/psci.h b/xen/arch/arm/include/asm/psci.h
index 48a93e6b79..bb3c73496e 100644
--- a/xen/arch/arm/include/asm/psci.h
+++ b/xen/arch/arm/include/asm/psci.h
@@ -23,6 +23,7 @@ int call_psci_cpu_on(int cpu);
void call_psci_cpu_off(void);
void call_psci_system_off(void);
void call_psci_system_reset(void);
+int call_psci_system_suspend(void);
/* Range of allocated PSCI function numbers */
#define PSCI_FNUM_MIN_VALUE _AC(0,U)
diff --git a/xen/arch/arm/include/asm/suspend.h b/xen/arch/arm/include/asm/suspend.h
new file mode 100644
index 0000000000..7e04c6e915
--- /dev/null
+++ b/xen/arch/arm/include/asm/suspend.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ASM_ARM_SUSPEND_H__
+#define __ASM_ARM_SUSPEND_H__
+
+#ifdef CONFIG_SYSTEM_SUSPEND
+
+void hyp_resume(void);
+
+#endif /* CONFIG_SYSTEM_SUSPEND */
+
+#endif /* __ASM_ARM_SUSPEND_H__ */
+
+ /*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
index b6860a7760..c9d126b195 100644
--- a/xen/arch/arm/psci.c
+++ b/xen/arch/arm/psci.c
@@ -17,17 +17,20 @@
#include <asm/cpufeature.h>
#include <asm/psci.h>
#include <asm/acpi.h>
+#include <asm/suspend.h>
/*
* While a 64-bit OS can make calls with SMC32 calling conventions, for
* some calls it is necessary to use SMC64 to pass or return 64-bit values.
- * For such calls PSCI_0_2_FN_NATIVE(x) will choose the appropriate
+ * For such calls PSCI_*_FN_NATIVE(x) will choose the appropriate
* (native-width) function ID.
*/
#ifdef CONFIG_ARM_64
#define PSCI_0_2_FN_NATIVE(name) PSCI_0_2_FN64_##name
+#define PSCI_1_0_FN_NATIVE(name) PSCI_1_0_FN64_##name
#else
#define PSCI_0_2_FN_NATIVE(name) PSCI_0_2_FN32_##name
+#define PSCI_1_0_FN_NATIVE(name) PSCI_1_0_FN32_##name
#endif
uint32_t psci_ver;
@@ -60,6 +63,24 @@ void call_psci_cpu_off(void)
}
}
+int call_psci_system_suspend(void)
+{
+#ifdef CONFIG_SYSTEM_SUSPEND
+ struct arm_smccc_res res;
+
+ if ( psci_ver < PSCI_VERSION(1, 0) )
+ return PSCI_NOT_SUPPORTED;
+
+ /* 2nd argument (context ID) is not used */
+ arm_smccc_smc(PSCI_1_0_FN_NATIVE(SYSTEM_SUSPEND), __pa(hyp_resume), &res);
+ return PSCI_RET(res);
+#else
+ dprintk(XENLOG_WARNING,
+ "SYSTEM_SUSPEND not supported (CONFIG_SYSTEM_SUSPEND disabled)\n");
+ return PSCI_NOT_SUPPORTED;
+#endif
+}
+
void call_psci_system_off(void)
{
if ( psci_ver > PSCI_VERSION(0, 1) )
--
2.48.1
^ permalink raw reply related [flat|nested] 49+ messages in thread
* [PATCH v6 09/13] xen/arm: Resume memory management on Xen resume
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (7 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 08/13] xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface) Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 10/13] xen/arm: Save/restore context on suspend/resume Mykola Kvach
` (4 subsequent siblings)
13 siblings, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi,
Mykyta Poturai, Mykola Kvach
From: Mirela Simonovic <mirela.simonovic@aggios.com>
The MMU must be enabled during the resume path before restoring context,
as virtual addresses are used to access the saved context data.
This patch adds MMU setup during resume by reusing the existing
enable_secondary_cpu_mm function, which enables data cache and the MMU.
Before the MMU is enabled, the content of TTBR0_EL2 is changed to point
to init_ttbr (page tables used at runtime).
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in v6:
- moved changes related to set_init_ttbr to commit that implements
system_suspend call
Changes in v4:
- Drop unnecessary DAIF masking; interrupts are already masked on resume
- Remove leftover TLB flush instructions; flushing is done in enable_mmu
- Avoid setting x19 in hyp_resume; not needed
- Replace prepare_secondary_mm with set_init_ttbr; call it from system_suspend
Changes in v3:
- Update commit message for clarity
- Replace create_page_tables, enable_mmu, and mmu_init_secondary_cpu
with enable_secondary_cpu_mm
- Move prepare_secondary_mm to start_xen to avoid crash
- Add early UART init during resume
Changes in v2:
- Move hyp_resume to head.S to keep resume logic together
- Simplify hyp_resume using existing helpers: check_cpu_mode, cpu_init,
create_page_tables, enable_mmu
---
xen/arch/arm/arm64/head.S | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 3522c497c5..596e960152 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -564,6 +564,22 @@ END(efi_xen_start)
#ifdef CONFIG_SYSTEM_SUSPEND
FUNC(hyp_resume)
+ /* Initialize the UART if earlyprintk has been enabled. */
+#ifdef CONFIG_EARLY_PRINTK
+ bl init_uart
+#endif
+ PRINT_ID("- Xen resuming -\r\n")
+
+ bl check_cpu_mode
+ bl cpu_init
+
+ ldr x0, =start
+ adr x20, start /* x20 := paddr (start) */
+ sub x20, x20, x0 /* x20 := phys-offset */
+ ldr lr, =mmu_resumed
+ b enable_secondary_cpu_mm
+
+mmu_resumed:
b .
END(hyp_resume)
--
2.48.1
^ permalink raw reply related [flat|nested] 49+ messages in thread
* [PATCH v6 10/13] xen/arm: Save/restore context on suspend/resume
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (8 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 09/13] xen/arm: Resume memory management on Xen resume Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-01 22:10 ` [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
` (3 subsequent siblings)
13 siblings, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Saeed Nowshadi,
Mykyta Poturai, Mykola Kvach
From: Mirela Simonovic <mirela.simonovic@aggios.com>
The context of CPU general purpose and system control registers must be
saved on suspend and restored on resume. This is implemented in
prepare_resume_ctx and before the return from the hyp_resume function.
prepare_resume_ctx must be invoked just before the PSCI SYSTEM_SUSPEND
call is issued to the ATF. It must return a non-zero value so that the
calling 'if' statement evaluates to true, causing the system suspend to
be invoked. Upon resume, the context saved on suspend
will be restored, including the link register. Therefore, after
restoring the context, the control flow will return to the address
pointed to by the saved link register, which is the place from which
prepare_resume_ctx was called. To ensure that the calling 'if' statement
does not again evaluate to true and initiate system suspend, hyp_resume
must return a zero value after restoring the context.
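The control flow described above is close in spirit to setjmp/longjmp:
one call site is returned to twice, and the return value tells the two
cases apart. The sketch below is only an analogy in portable C, not the
Xen implementation. Note that the polarity is inverted: setjmp returns 0
when saving and non-zero when resumed, whereas prepare_resume_ctx returns
non-zero when saving and hyp_resume returns zero. The names
fake_suspend_and_resume and suspend_sketch are hypothetical.

```c
#include <setjmp.h>

static jmp_buf saved_ctx;

/* Counts how often the "suspend" path was taken. */
int suspend_calls;

/*
 * Stand-in for the firmware call: instead of powering down and later
 * re-entering at the resume entry point, jump straight back to the
 * saved context.
 */
static void fake_suspend_and_resume(void)
{
    suspend_calls++;
    longjmp(saved_ctx, 1);
}

int suspend_sketch(void)
{
    if ( setjmp(saved_ctx) == 0 )
    {
        /* First return: context saved, so go ahead and suspend. */
        fake_suspend_and_resume();

        return -1; /* Not reached in this sketch. */
    }

    /* Second return: we are on the resume path; do not suspend again. */
    return 0;
}
```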
Note that the order of saving register context into cpu_context structure
must match the order of restoring.
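One way to make this ordering requirement checkable is to pin down the
structure offsets that the post-incrementing stores and loads in the
assembly imply. The sketch below mirrors the cpu_context layout from
this patch, with uint64_t standing in for register_t (an assumption that
holds on arm64) and a GCC/Clang alignment attribute in place of
__aligned(16). The struct name cpu_context_sketch is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the layout in the patch; uint64_t stands in for register_t. */
struct cpu_context_sketch {
    uint64_t callee_regs[12]; /* x19..x28, x29, lr */
    uint64_t sp;
    uint64_t vbar_el2;
    uint64_t vtcr_el2;
    uint64_t vttbr_el2;
    uint64_t tpidr_el2;
    uint64_t mdcr_el2;
    uint64_t hstr_el2;
    uint64_t cptr_el2;
    uint64_t hcr_el2;
} __attribute__((aligned(16)));

/* Six stp instructions of 16 bytes each precede the stack pointer. */
_Static_assert(offsetof(struct cpu_context_sketch, sp) == 6 * 16,
               "sp must follow the callee-saved registers");

/* System registers follow in 8-byte steps, starting with VBAR_EL2. */
_Static_assert(offsetof(struct cpu_context_sketch, vbar_el2) == 13 * 8,
               "vbar_el2 must directly follow sp");
_Static_assert(offsetof(struct cpu_context_sketch, hcr_el2) == 20 * 8,
               "hcr_el2 must be the last saved register");
```

If a register is added on only one side, or the two sequences are
reordered independently, checks of this kind would not catch it by
themselves, but they do document the layout that both sequences rely on.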
Support for ARM32 is not implemented. Instead, compilation fails with a
build-time error if suspend is enabled for ARM32.
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in v6:
- Rename hyp_suspend to prepare_resume_ctx
- moved invocation of prepare_resume_ctx to commit which
implements system_suspend call
Changes in v4:
- Produce build-time error for ARM32 when CONFIG_SYSTEM_SUSPEND is enabled
- Use register_t instead of uint64_t in cpu_context structure
---
xen/arch/arm/Makefile | 1 +
xen/arch/arm/arm64/head.S | 90 +++++++++++++++++++++++++++++-
xen/arch/arm/include/asm/suspend.h | 22 ++++++++
xen/arch/arm/suspend.c | 14 +++++
4 files changed, 126 insertions(+), 1 deletion(-)
create mode 100644 xen/arch/arm/suspend.c
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index f833cdf207..3f6247adee 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -51,6 +51,7 @@ obj-y += setup.o
obj-y += shutdown.o
obj-y += smp.o
obj-y += smpboot.o
+obj-$(CONFIG_SYSTEM_SUSPEND) += suspend.o
obj-$(CONFIG_SYSCTL) += sysctl.o
obj-y += time.o
obj-y += traps.o
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 596e960152..c6594c0bdd 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -562,6 +562,52 @@ END(efi_xen_start)
#endif /* CONFIG_ARM_EFI */
#ifdef CONFIG_SYSTEM_SUSPEND
+/*
+ * int prepare_resume_ctx(struct cpu_context *ptr)
+ *
+ * x0 - pointer to the storage where callee's context will be saved
+ *
+ * CPU context saved here will be restored on resume in hyp_resume function.
+ * prepare_resume_ctx shall return a non-zero value. Upon restoring context
+ * hyp_resume shall return value zero instead. From C code that invokes
+ * prepare_resume_ctx, the return value is interpreted to determine whether
+ * the context is saved (prepare_resume_ctx) or restored (hyp_resume).
+ */
+FUNC(prepare_resume_ctx)
+ /* Store callee-saved registers */
+ stp x19, x20, [x0], #16
+ stp x21, x22, [x0], #16
+ stp x23, x24, [x0], #16
+ stp x25, x26, [x0], #16
+ stp x27, x28, [x0], #16
+ stp x29, lr, [x0], #16
+
+ /* Store stack-pointer */
+ mov x2, sp
+ str x2, [x0], #8
+
+ /* Store system control registers */
+ mrs x2, VBAR_EL2
+ str x2, [x0], #8
+ mrs x2, VTCR_EL2
+ str x2, [x0], #8
+ mrs x2, VTTBR_EL2
+ str x2, [x0], #8
+ mrs x2, TPIDR_EL2
+ str x2, [x0], #8
+ mrs x2, MDCR_EL2
+ str x2, [x0], #8
+ mrs x2, HSTR_EL2
+ str x2, [x0], #8
+ mrs x2, CPTR_EL2
+ str x2, [x0], #8
+ mrs x2, HCR_EL2
+ str x2, [x0], #8
+
+ /* prepare_resume_ctx must return a non-zero value */
+ mov x0, #1
+ ret
+END(prepare_resume_ctx)
FUNC(hyp_resume)
/* Initialize the UART if earlyprintk has been enabled. */
@@ -580,7 +626,49 @@ FUNC(hyp_resume)
b enable_secondary_cpu_mm
mmu_resumed:
- b .
+ /* Now we can access the cpu_context, so restore the context here */
+ ldr x0, =cpu_context
+
+ /* Restore callee-saved registers */
+ ldp x19, x20, [x0], #16
+ ldp x21, x22, [x0], #16
+ ldp x23, x24, [x0], #16
+ ldp x25, x26, [x0], #16
+ ldp x27, x28, [x0], #16
+ ldp x29, lr, [x0], #16
+
+ /* Restore stack pointer */
+ ldr x2, [x0], #8
+ mov sp, x2
+
+ /* Restore system control registers */
+ ldr x2, [x0], #8
+ msr VBAR_EL2, x2
+ ldr x2, [x0], #8
+ msr VTCR_EL2, x2
+ ldr x2, [x0], #8
+ msr VTTBR_EL2, x2
+ ldr x2, [x0], #8
+ msr TPIDR_EL2, x2
+ ldr x2, [x0], #8
+ msr MDCR_EL2, x2
+ ldr x2, [x0], #8
+ msr HSTR_EL2, x2
+ ldr x2, [x0], #8
+ msr CPTR_EL2, x2
+ ldr x2, [x0], #8
+ msr HCR_EL2, x2
+ isb
+
+ /*
+ * Since context is restored return from this function will appear
+ * as return from prepare_resume_ctx. To distinguish a return from
+ * prepare_resume_ctx which is called upon finalizing the suspend,
+ * as opposed to return from this function which executes on resume,
+ * we need to return zero value here.
+ */
+ mov x0, #0
+ ret
END(hyp_resume)
#endif /* CONFIG_SYSTEM_SUSPEND */
diff --git a/xen/arch/arm/include/asm/suspend.h b/xen/arch/arm/include/asm/suspend.h
index 7e04c6e915..29eed4ee7f 100644
--- a/xen/arch/arm/include/asm/suspend.h
+++ b/xen/arch/arm/include/asm/suspend.h
@@ -3,9 +3,31 @@
#ifndef __ASM_ARM_SUSPEND_H__
#define __ASM_ARM_SUSPEND_H__
+#include <asm/types.h>
+
#ifdef CONFIG_SYSTEM_SUSPEND
+#ifdef CONFIG_ARM_64
+struct cpu_context {
+ register_t callee_regs[12];
+ register_t sp;
+ register_t vbar_el2;
+ register_t vtcr_el2;
+ register_t vttbr_el2;
+ register_t tpidr_el2;
+ register_t mdcr_el2;
+ register_t hstr_el2;
+ register_t cptr_el2;
+ register_t hcr_el2;
+} __aligned(16);
+#else
+#error "Define cpu_context structure for arm32"
+#endif
+
+extern struct cpu_context cpu_context;
+
void hyp_resume(void);
+int prepare_resume_ctx(struct cpu_context *ptr);
#endif /* CONFIG_SYSTEM_SUSPEND */
diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
new file mode 100644
index 0000000000..5093f1bf3d
--- /dev/null
+++ b/xen/arch/arm/suspend.c
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <asm/suspend.h>
+
+struct cpu_context cpu_context;
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--
2.48.1
^ permalink raw reply related [flat|nested] 49+ messages in thread
* [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (9 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 10/13] xen/arm: Save/restore context on suspend/resume Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-02 5:56 ` Mykola Kvach
2025-09-02 14:33 ` Jan Beulich
2025-09-01 22:10 ` [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume Mykola Kvach
` (2 subsequent siblings)
13 siblings, 2 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mirela Simonovic, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
Anthony PERARD, Jan Beulich, Roger Pau Monné, Saeed Nowshadi,
Mykyta Poturai, Mykola Kvach
From: Mirela Simonovic <mirela.simonovic@aggios.com>
Trigger Xen suspend when the hardware domain initiates suspend via
SHUTDOWN_suspend. Redirect system suspend to CPU#0 to ensure the
suspend logic runs on the boot CPU, as required.
Introduce full suspend/resume infrastructure gated by CONFIG_SYSTEM_SUSPEND,
including logic to:
- disable and enable non-boot physical CPUs
- freeze and thaw domains
- suspend and resume the GIC, timer, iommu and console
- maintain system state before and after suspend
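A common shape for this kind of sequence is to suspend components in a
fixed order and, if one step fails, resume the already-suspended ones in
reverse. The sketch below shows only that unwinding pattern. The step
names and their order are illustrative assumptions, not the order used
by this patch, and suspend_all, record and trace are hypothetical names.

```c
#include <stddef.h>

/* Illustrative steps only; the real order is defined by the patch. */
enum { STEP_TIMER, STEP_GIC, STEP_IOMMU, STEP_CONSOLE, NR_STEPS };

/* Trace of actions: +(id + 1) means suspended, -(id + 1) means resumed. */
int trace[2 * NR_STEPS];
size_t trace_len;

static void record(int v)
{
    trace[trace_len++] = v;
}

/*
 * Suspend every step in order. If the step with index fail_at fails,
 * resume the steps suspended so far in reverse order and return -1.
 * Pass a negative fail_at to let every step succeed.
 */
int suspend_all(int fail_at)
{
    int i;

    for ( i = 0; i < NR_STEPS; i++ )
    {
        if ( i == fail_at )
            goto unwind;

        record(i + 1);
    }

    return 0;

 unwind:
    while ( i-- > 0 )
        record(-(i + 1));

    return -1;
}
```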
On boot, init_ttbr is normally initialized during secondary CPU hotplug.
On uniprocessor systems, this would leave init_ttbr uninitialized,
causing resume to fail. To address this, the boot CPU now sets init_ttbr
during suspend.
Remove the restriction in the vPSCI interface preventing suspend from the
hardware domain.
Select HAS_SYSTEM_SUSPEND for ARM_64.
Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in v6:
- Minor changes in comments.
- The implementation now uses its own tasklet instead of
continue_hypercall_on_cpu, as the latter rewrites user registers and
would tie system suspend status to guest suspend status.
- The order of calls when suspending devices has been updated.
Changes in v5:
- select HAS_SYSTEM_SUSPEND in ARM_64 instead of ARM
- check llc_coloring_enabled instead of LLC_COLORING during the selection
of HAS_SYSTEM_SUSPEND config
- call host_system_suspend from guest PSCI system suspend instead of
arch_domain_shutdown, reducing the complexity of the new code
- update some comments
Changes introduced in V4:
- drop code for saving and restoring VCPU context (for more information
refer to part 1 of the patch series)
- remove IOMMU suspend and resume calls until they are implemented
- move system suspend logic to arch_domain_shutdown, invoked from
domain_shutdown
- apply minor fixes such as renaming functions, updating comments, and
modifying the commit message to reflect these changes
- add console_end_sync to resume path after system suspend
Changes introduced in V3:
- merge changes from other commits into this patch (previously stashed):
1) disable/enable non-boot physical CPUs and freeze/thaw domains during
suspend/resume
2) suspend/resume the timer, GIC, console, IOMMU, and hardware domain
---
xen/arch/arm/Kconfig | 1 +
xen/arch/arm/include/asm/mm.h | 2 +
xen/arch/arm/include/asm/suspend.h | 2 +
xen/arch/arm/mmu/smpboot.c | 2 +-
xen/arch/arm/suspend.c | 150 +++++++++++++++++++++++++++++
xen/arch/arm/vpsci.c | 9 +-
xen/common/domain.c | 4 +
7 files changed, 168 insertions(+), 2 deletions(-)
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 5355534f3d..fdad53fd68 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -8,6 +8,7 @@ config ARM_64
depends on !ARM_32
select 64BIT
select HAS_FAST_MULTIPLY
+ select HAS_SYSTEM_SUSPEND if UNSUPPORTED
select HAS_VPCI_GUEST_SUPPORT if PCI_PASSTHROUGH
config ARM
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 7a93dad2ed..61e211d087 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -365,6 +365,8 @@ static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
} while ( (y = cmpxchg(&p->u.inuse.type_info, x, nx)) != x );
}
+void set_init_ttbr(lpae_t *root);
+
#endif /* __ARCH_ARM_MM__ */
/*
* Local variables:
diff --git a/xen/arch/arm/include/asm/suspend.h b/xen/arch/arm/include/asm/suspend.h
index 29eed4ee7f..8d30b01b7c 100644
--- a/xen/arch/arm/include/asm/suspend.h
+++ b/xen/arch/arm/include/asm/suspend.h
@@ -29,6 +29,8 @@ extern struct cpu_context cpu_context;
void hyp_resume(void);
int prepare_resume_ctx(struct cpu_context *ptr);
+void host_system_suspend(void);
+
#endif /* CONFIG_SYSTEM_SUSPEND */
#endif /* __ASM_ARM_SUSPEND_H__ */
diff --git a/xen/arch/arm/mmu/smpboot.c b/xen/arch/arm/mmu/smpboot.c
index 37e91d72b7..ff508ecf40 100644
--- a/xen/arch/arm/mmu/smpboot.c
+++ b/xen/arch/arm/mmu/smpboot.c
@@ -72,7 +72,7 @@ static void clear_boot_pagetables(void)
clear_table(boot_third);
}
-static void set_init_ttbr(lpae_t *root)
+void set_init_ttbr(lpae_t *root)
{
/*
* init_ttbr is part of the identity mapping which is read-only. So
diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
index 5093f1bf3d..35b20581f1 100644
--- a/xen/arch/arm/suspend.c
+++ b/xen/arch/arm/suspend.c
@@ -1,9 +1,159 @@
/* SPDX-License-Identifier: GPL-2.0-only */
+#include <asm/psci.h>
#include <asm/suspend.h>
+#include <xen/console.h>
+#include <xen/cpu.h>
+#include <xen/llc-coloring.h>
+#include <xen/sched.h>
+#include <xen/tasklet.h>
+
+/*
+ * TODO list:
+ * - Decide which domain will trigger system suspend ctl or hw ?
+ * - Test system suspend with LLC_COLORING enabled and verify functionality
+ * - Implement IOMMU suspend/resume handlers and integrate them
+ * into the suspend/resume path (SMMU)
+ * - Enable "xl suspend" support on ARM architecture
+ * - Properly disable Xen timer watchdog from relevant services (init.d left)
+ * - Add suspend/resume CI test for ARM (QEMU if feasible)
+ * - Investigate feasibility and need for implementing system suspend on ARM32
+ */
+
struct cpu_context cpu_context;
+/* Xen suspend. Note: data is not used (suspend is the suspend to RAM) */
+static void cf_check system_suspend(void *data)
+{
+ int status;
+ unsigned long flags;
+ /* TODO: drop check after verification that features can work together */
+ if ( llc_coloring_enabled )
+ {
+ dprintk(XENLOG_ERR,
+ "System suspend is not supported with LLC_COLORING enabled\n");
+ status = -ENOSYS;
+ goto dom_resume;
+ }
+
+ BUG_ON(system_state != SYS_STATE_active);
+
+ system_state = SYS_STATE_suspend;
+
+ printk("Xen suspending...\n");
+
+ freeze_domains();
+ scheduler_disable();
+
+ /*
+ * Non-boot CPUs have to be disabled on suspend and enabled on resume
+ * (hotplug-based mechanism). Disabling non-boot CPUs will lead to PSCI
+ * CPU_OFF to be called by each non-boot CPU. Depending on the underlying
+ * platform capabilities, this may lead to the physical powering down of
+ * CPUs.
+ */
+ status = disable_nonboot_cpus();
+ if ( status )
+ {
+ system_state = SYS_STATE_resume;
+ goto resume_nonboot_cpus;
+ }
+
+ time_suspend();
+
+ console_start_sync();
+ status = console_suspend();
+ if ( status )
+ {
+ dprintk(XENLOG_ERR, "Failed to suspend the console, err=%d\n", status);
+ system_state = SYS_STATE_resume;
+ goto resume_console;
+ }
+
+ local_irq_save(flags);
+ status = gic_suspend();
+ if ( status )
+ {
+ system_state = SYS_STATE_resume;
+ goto resume_irqs;
+ }
+
+ set_init_ttbr(xen_pgtable);
+
+ /*
+ * Enable identity mapping before entering suspend to simplify
+ * the resume path
+ */
+ update_boot_mapping(true);
+
+ if ( prepare_resume_ctx(&cpu_context) )
+ {
+ status = call_psci_system_suspend();
+ /*
+ * If suspend is finalized properly by above system suspend PSCI call,
+ * the code below in this 'if' branch will never execute. Execution
+ * will continue from hyp_resume which is the hypervisor's resume point.
+ * In hyp_resume CPU context will be restored and since link-register is
+ * restored as well, it will appear to return from prepare_resume_ctx.
+ * The difference in returning from prepare_resume_ctx on system suspend
+ * versus resume is in function's return value: on suspend, the return
+ * value is a non-zero value, on resume it is zero. That is why the
+ * control flow will not re-enter this 'if' branch on resume.
+ */
+ if ( status )
+ dprintk(XENLOG_WARNING, "PSCI system suspend failed, err=%d\n",
+ status);
+ }
+
+ system_state = SYS_STATE_resume;
+ update_boot_mapping(false);
+
+ gic_resume();
+
+ resume_irqs:
+ local_irq_restore(flags);
+
+ resume_console:
+ console_resume();
+ console_end_sync();
+
+ time_resume();
+
+ resume_nonboot_cpus:
+ /*
+ * The rcu_barrier() has to be added to ensure that the per cpu area is
+ * freed before a non-boot CPU tries to initialize it (_free_percpu_area()
+ * has to be called before the init_percpu_area()). This scenario occurs
+ * when non-boot CPUs are hot-unplugged on suspend and hotplugged on resume.
+ */
+ rcu_barrier();
+ enable_nonboot_cpus();
+ scheduler_enable();
+ thaw_domains();
+
+ system_state = SYS_STATE_active;
+
+ printk("Resume (status %d)\n", status);
+
+ dom_resume:
+ /* The resume of hardware domain should always follow Xen's resume. */
+ domain_resume(hardware_domain);
+}
+
+static DECLARE_TASKLET(system_suspend_tasklet, system_suspend, NULL);
+
+void host_system_suspend(void)
+{
+ /*
+ * system_suspend should be called when hardware domain finalizes the
+ * suspend procedure from its boot core (VCPU#0). However, Dom0's vCPU#0
+ * could be mapped to any pCPU. The suspend procedure has to be finalized
+ * by the pCPU#0 (non-boot pCPUs will be disabled during the suspend).
+ */
+ tasklet_schedule_on_cpu(&system_suspend_tasklet, 0);
+}
+
/*
* Local variables:
* mode: C
diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
index 22c3a5f544..2f52aba48d 100644
--- a/xen/arch/arm/vpsci.c
+++ b/xen/arch/arm/vpsci.c
@@ -4,6 +4,7 @@
#include <xen/types.h>
#include <asm/current.h>
+#include <asm/suspend.h>
#include <asm/vgic.h>
#include <asm/vpsci.h>
#include <asm/event.h>
@@ -221,9 +222,10 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
if ( is_64bit_domain(d) && is_thumb )
return PSCI_INVALID_ADDRESS;
- /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
+#ifndef CONFIG_SYSTEM_SUSPEND
if ( is_hardware_domain(d) )
return PSCI_NOT_SUPPORTED;
+#endif
/* Ensure that all CPUs other than the calling one are offline */
domain_lock(d);
@@ -249,6 +251,11 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
"SYSTEM_SUSPEND requested, epoint=0x%"PRIregister", cid=0x%"PRIregister"\n",
epoint, cid);
+#ifdef CONFIG_SYSTEM_SUSPEND
+ if ( is_hardware_domain(d) )
+ host_system_suspend();
+#endif
+
return rc;
}
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 667017c5e1..5e224740d3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1317,7 +1317,11 @@ int domain_shutdown(struct domain *d, u8 reason)
d->shutdown_code = reason;
reason = d->shutdown_code;
+#if defined(CONFIG_SYSTEM_SUSPEND) && defined(CONFIG_ARM)
+ if ( reason != SHUTDOWN_suspend && is_hardware_domain(d) )
+#else
if ( is_hardware_domain(d) )
+#endif
hwdom_shutdown(reason);
if ( d->is_shutting_down )
--
2.48.1
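The comment around prepare_resume_ctx() in system_suspend() describes a
function that appears to return twice: non-zero on the direct call (the
suspend path) and zero once hyp_resume has restored the saved context. A
rough user-space analogy is setjmp()/longjmp(), with the return-value
convention inverted (setjmp() returns 0 on the direct call). The sketch
below only illustrates that control flow under this assumption; save_point,
fake_firmware_resume() and suspend_cycle() are hypothetical names, not Xen
code.

```c
#include <setjmp.h>

/* Stand-in for struct cpu_context: holds the saved execution context. */
static jmp_buf save_point;

/* Stand-in for firmware waking the CPU and jumping to hyp_resume. */
static void fake_firmware_resume(void)
{
    longjmp(save_point, 1);
}

/* Returns how many times the "suspend" branch was entered. */
int suspend_cycle(void)
{
    /* volatile: modified between setjmp() and longjmp(). */
    volatile int suspend_branch_runs = 0;

    if ( setjmp(save_point) == 0 )
    {
        /* Direct return: corresponds to prepare_resume_ctx() != 0. */
        suspend_branch_runs++;
        fake_firmware_resume();    /* does not return to this point */
    }

    /* Second return: the resume path, the branch above is skipped. */
    return suspend_branch_runs;
}
```

Calling suspend_cycle() enters the branch exactly once, which mirrors why
the 'if' body in system_suspend() is not re-entered on resume.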
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-01 22:10 ` [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
@ 2025-09-02 5:56 ` Mykola Kvach
2025-09-02 14:33 ` Jan Beulich
1 sibling, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-02 5:56 UTC (permalink / raw)
To: xen-devel
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Andrew Cooper, Anthony PERARD, Jan Beulich,
Roger Pau Monné, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach
Hi everyone,
This is the first commit in the second part of the patch series that
cannot exist without part 1. All previous commits are independent and
do not depend on part 1.
On Tue, Sep 2, 2025 at 1:10 AM Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> From: Mirela Simonovic <mirela.simonovic@aggios.com>
>
> Trigger Xen suspend when the hardware domain initiates suspend via
> SHUTDOWN_suspend. Redirect system suspend to CPU#0 to ensure the
> suspend logic runs on the boot CPU, as required.
>
> Introduce full suspend/resume infrastructure gated by CONFIG_SYSTEM_SUSPEND,
> including logic to:
> - disable and enable non-boot physical CPUs
> - freeze and thaw domains
> - suspend and resume the GIC, timer, iommu and console
> - maintain system state before and after suspend
>
> On boot, init_ttbr is normally initialized during secondary CPU hotplug.
> On uniprocessor systems, this would leave init_ttbr uninitialized,
> causing resume to fail. To address this, the boot CPU now sets init_ttbr
> during suspend.
>
> Remove the restriction in the vPSCI interface preventing suspend from the
> hardware domain.
>
> Select HAS_SYSTEM_SUSPEND for ARM_64.
>
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xilinx.com>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in v6:
> - Minor changes in comments.
> - The implementation now uses own tasklet instead of continue_hypercall_on_cpu,
> as the latter rewrites user registers and would tie system suspend status
> to guest suspend status.
> - The order of calls when suspending devices has been updated.
>
> Changes in v5:
> - select HAS_SYSTEM_SUSPEND in ARM_64 instead of ARM
> - check llc_coloring_enabled instead of LLC_COLORING during the selection
> of HAS_SYSTEM_SUSPEND config
> - call host_system_suspend from guest PSCI system suspend instead of
> arch_domain_shutdown, reducing the complexity of the new code
> - update some comments
>
> Changes introduced in V4:
> - drop code for saving and restoring VCPU context (for more information
> refer to part 1 of the patch series)
> - remove IOMMU suspend and resume calls until they are implemented
> - move system suspend logic to arch_domain_shutdown, invoked from
> domain_shutdown
> - apply minor fixes such as renaming functions, updating comments, and
> modifying the commit message to reflect these changes
> - add console_end_sync to resume path after system suspend
>
> Changes introduced in V3:
> - merge changes from other commits into this patch (previously stashed):
> 1) disable/enable non-boot physical CPUs and freeze/thaw domains during
> suspend/resume
> 2) suspend/resume the timer, GIC, console, IOMMU, and hardware domain
> ---
> xen/arch/arm/Kconfig | 1 +
> xen/arch/arm/include/asm/mm.h | 2 +
> xen/arch/arm/include/asm/suspend.h | 2 +
> xen/arch/arm/mmu/smpboot.c | 2 +-
> xen/arch/arm/suspend.c | 150 +++++++++++++++++++++++++++++
> xen/arch/arm/vpsci.c | 9 +-
> xen/common/domain.c | 4 +
> 7 files changed, 168 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index 5355534f3d..fdad53fd68 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -8,6 +8,7 @@ config ARM_64
> depends on !ARM_32
> select 64BIT
> select HAS_FAST_MULTIPLY
> + select HAS_SYSTEM_SUSPEND if UNSUPPORTED
> select HAS_VPCI_GUEST_SUPPORT if PCI_PASSTHROUGH
>
> config ARM
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index 7a93dad2ed..61e211d087 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -365,6 +365,8 @@ static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
> } while ( (y = cmpxchg(&p->u.inuse.type_info, x, nx)) != x );
> }
>
> +void set_init_ttbr(lpae_t *root);
> +
> #endif /* __ARCH_ARM_MM__ */
> /*
> * Local variables:
> diff --git a/xen/arch/arm/include/asm/suspend.h b/xen/arch/arm/include/asm/suspend.h
> index 29eed4ee7f..8d30b01b7c 100644
> --- a/xen/arch/arm/include/asm/suspend.h
> +++ b/xen/arch/arm/include/asm/suspend.h
> @@ -29,6 +29,8 @@ extern struct cpu_context cpu_context;
> void hyp_resume(void);
> int prepare_resume_ctx(struct cpu_context *ptr);
>
> +void host_system_suspend(void);
> +
> #endif /* CONFIG_SYSTEM_SUSPEND */
>
> #endif /* __ASM_ARM_SUSPEND_H__ */
> diff --git a/xen/arch/arm/mmu/smpboot.c b/xen/arch/arm/mmu/smpboot.c
> index 37e91d72b7..ff508ecf40 100644
> --- a/xen/arch/arm/mmu/smpboot.c
> +++ b/xen/arch/arm/mmu/smpboot.c
> @@ -72,7 +72,7 @@ static void clear_boot_pagetables(void)
> clear_table(boot_third);
> }
>
> -static void set_init_ttbr(lpae_t *root)
> +void set_init_ttbr(lpae_t *root)
> {
> /*
> * init_ttbr is part of the identity mapping which is read-only. So
> diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
> index 5093f1bf3d..35b20581f1 100644
> --- a/xen/arch/arm/suspend.c
> +++ b/xen/arch/arm/suspend.c
> @@ -1,9 +1,159 @@
> /* SPDX-License-Identifier: GPL-2.0-only */
>
> +#include <asm/psci.h>
> #include <asm/suspend.h>
>
> +#include <xen/console.h>
> +#include <xen/cpu.h>
> +#include <xen/llc-coloring.h>
> +#include <xen/sched.h>
> +#include <xen/tasklet.h>
> +
> +/*
> + * TODO list:
> + * - Decide which domain will trigger system suspend ctl or hw ?
> + * - Test system suspend with LLC_COLORING enabled and verify functionality
> + * - Implement IOMMU suspend/resume handlers and integrate them
> + * into the suspend/resume path (SMMU)
> + * - Enable "xl suspend" support on ARM architecture
> + * - Properly disable Xen timer watchdog from relevant services (init.d left)
> + * - Add suspend/resume CI test for ARM (QEMU if feasible)
> + * - Investigate feasibility and need for implementing system suspend on ARM32
> + */
> +
> struct cpu_context cpu_context;
>
> +/* Xen suspend. Note: data is not used (suspend is the suspend to RAM) */
> +static void cf_check system_suspend(void *data)
> +{
> + int status;
> + unsigned long flags;
> + /* TODO: drop check after verification that features can work together */
> + if ( llc_coloring_enabled )
> + {
> + dprintk(XENLOG_ERR,
> + "System suspend is not supported with LLC_COLORING enabled\n");
> + status = -ENOSYS;
> + goto dom_resume;
> + }
> +
> + BUG_ON(system_state != SYS_STATE_active);
> +
> + system_state = SYS_STATE_suspend;
> +
> + printk("Xen suspending...\n");
> +
> + freeze_domains();
> + scheduler_disable();
> +
> + /*
> + * Non-boot CPUs have to be disabled on suspend and enabled on resume
> + * (hotplug-based mechanism). Disabling non-boot CPUs will lead to PSCI
> + * CPU_OFF to be called by each non-boot CPU. Depending on the underlying
> + * platform capabilities, this may lead to the physical powering down of
> + * CPUs.
> + */
> + status = disable_nonboot_cpus();
> + if ( status )
> + {
> + system_state = SYS_STATE_resume;
> + goto resume_nonboot_cpus;
> + }
> +
> + time_suspend();
> +
> + console_start_sync();
> + status = console_suspend();
> + if ( status )
> + {
> + dprintk(XENLOG_ERR, "Failed to suspend the console, err=%d\n", status);
> + system_state = SYS_STATE_resume;
> + goto resume_console;
> + }
> +
> + local_irq_save(flags);
> + status = gic_suspend();
> + if ( status )
> + {
> + system_state = SYS_STATE_resume;
> + goto resume_irqs;
> + }
> +
> + set_init_ttbr(xen_pgtable);
> +
> + /*
> + * Enable identity mapping before entering suspend to simplify
> + * the resume path
> + */
> + update_boot_mapping(true);
> +
> + if ( prepare_resume_ctx(&cpu_context) )
> + {
> + status = call_psci_system_suspend();
> + /*
> + * If suspend is finalized properly by above system suspend PSCI call,
> + * the code below in this 'if' branch will never execute. Execution
> + * will continue from hyp_resume which is the hypervisor's resume point.
> + * In hyp_resume CPU context will be restored and since link-register is
> + * restored as well, it will appear to return from prepare_resume_ctx.
> + * The difference in returning from prepare_resume_ctx on system suspend
> + * versus resume is in function's return value: on suspend, the return
> + * value is a non-zero value, on resume it is zero. That is why the
> + * control flow will not re-enter this 'if' branch on resume.
> + */
> + if ( status )
> + dprintk(XENLOG_WARNING, "PSCI system suspend failed, err=%d\n",
> + status);
> + }
> +
> + system_state = SYS_STATE_resume;
> + update_boot_mapping(false);
> +
> + gic_resume();
> +
> + resume_irqs:
> + local_irq_restore(flags);
> +
> + resume_console:
> + console_resume();
> + console_end_sync();
> +
> + time_resume();
> +
> + resume_nonboot_cpus:
> + /*
> + * The rcu_barrier() has to be added to ensure that the per cpu area is
> + * freed before a non-boot CPU tries to initialize it (_free_percpu_area()
> + * has to be called before the init_percpu_area()). This scenario occurs
> + * when non-boot CPUs are hot-unplugged on suspend and hotplugged on resume.
> + */
> + rcu_barrier();
> + enable_nonboot_cpus();
> + scheduler_enable();
> + thaw_domains();
> +
> + system_state = SYS_STATE_active;
> +
> + printk("Resume (status %d)\n", status);
> +
> + dom_resume:
> + /* The resume of hardware domain should always follow Xen's resume. */
> + domain_resume(hardware_domain);
> +}
> +
> +static DECLARE_TASKLET(system_suspend_tasklet, system_suspend, NULL);
> +
> +void host_system_suspend(void)
> +{
> + /*
> + * system_suspend should be called when hardware domain finalizes the
> + * suspend procedure from its boot core (VCPU#0). However, Dom0's vCPU#0
> + * could be mapped to any pCPU. The suspend procedure has to be finalized
> + * by the pCPU#0 (non-boot pCPUs will be disabled during the suspend).
> + */
> + tasklet_schedule_on_cpu(&system_suspend_tasklet, 0);
> +}
> +
> /*
> * Local variables:
> * mode: C
> diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> index 22c3a5f544..2f52aba48d 100644
> --- a/xen/arch/arm/vpsci.c
> +++ b/xen/arch/arm/vpsci.c
> @@ -4,6 +4,7 @@
> #include <xen/types.h>
>
> #include <asm/current.h>
> +#include <asm/suspend.h>
> #include <asm/vgic.h>
> #include <asm/vpsci.h>
> #include <asm/event.h>
> @@ -221,9 +222,10 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> if ( is_64bit_domain(d) && is_thumb )
> return PSCI_INVALID_ADDRESS;
>
> - /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> +#ifndef CONFIG_SYSTEM_SUSPEND
> if ( is_hardware_domain(d) )
> return PSCI_NOT_SUPPORTED;
> +#endif
>
> /* Ensure that all CPUs other than the calling one are offline */
> domain_lock(d);
> @@ -249,6 +251,11 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> "SYSTEM_SUSPEND requested, epoint=0x%"PRIregister", cid=0x%"PRIregister"\n",
> epoint, cid);
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + if ( is_hardware_domain(d) )
> + host_system_suspend();
> +#endif
> +
> return rc;
> }
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 667017c5e1..5e224740d3 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1317,7 +1317,11 @@ int domain_shutdown(struct domain *d, u8 reason)
> d->shutdown_code = reason;
> reason = d->shutdown_code;
>
> +#if defined(CONFIG_SYSTEM_SUSPEND) && defined(CONFIG_ARM)
> + if ( reason != SHUTDOWN_suspend && is_hardware_domain(d) )
> +#else
> if ( is_hardware_domain(d) )
> +#endif
> hwdom_shutdown(reason);
>
> if ( d->is_shutting_down )
> --
> 2.48.1
>
Best regards,
Mykola
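The error handling in system_suspend(), quoted above, follows the usual
goto-based unwinding pattern: each step that can fail jumps to a label that
undoes only the steps already completed, in reverse order. The following is
a minimal user-space sketch of that control flow, not the Xen code itself;
step(), log_step(), suspend_sketch(), suspend_trace() and the fail_at
selector are hypothetical names introduced for illustration.

```c
#include <string.h>

static char trace[256];   /* records the order in which steps ran */
static int fail_at;       /* 0 = no failure, 1 = CPUs, 2 = console, 3 = GIC */

static void log_step(const char *name)
{
    strcat(trace, name);
    strcat(trace, " ");
}

/* Logs the step and fails if it is the one selected by fail_at. */
static int step(const char *name, int id)
{
    log_step(name);
    return fail_at == id ? -1 : 0;
}

int suspend_sketch(int fail)
{
    int status;

    trace[0] = '\0';
    fail_at = fail;

    status = step("cpus_off", 1);
    if ( status )
        goto resume_nonboot_cpus;

    log_step("time_suspend");

    status = step("console_suspend", 2);
    if ( status )
        goto resume_console;

    log_step("irq_save");
    status = step("gic_suspend", 3);
    if ( status )
        goto resume_irqs;

    log_step("psci_suspend");
    log_step("gic_resume");

 resume_irqs:
    log_step("irq_restore");

 resume_console:
    log_step("console_resume");
    log_step("time_resume");

 resume_nonboot_cpus:
    log_step("cpus_on");

    return status;
}

const char *suspend_trace(void)
{
    return trace;
}
```

With fail set to 2, for example, the trace skips the IRQ and GIC steps and
goes straight from console_suspend to console_resume, which matches the
resume_console label in the patch.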
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-01 22:10 ` [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
2025-09-02 5:56 ` Mykola Kvach
@ 2025-09-02 14:33 ` Jan Beulich
2025-09-03 4:31 ` Mykola Kvach
1 sibling, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2025-09-02 14:33 UTC (permalink / raw)
To: Mykola Kvach
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Roger Pau Monné, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach, xen-devel
On 02.09.2025 00:10, Mykola Kvach wrote:
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1317,7 +1317,11 @@ int domain_shutdown(struct domain *d, u8 reason)
> d->shutdown_code = reason;
> reason = d->shutdown_code;
>
> +#if defined(CONFIG_SYSTEM_SUSPEND) && defined(CONFIG_ARM)
> + if ( reason != SHUTDOWN_suspend && is_hardware_domain(d) )
> +#else
> if ( is_hardware_domain(d) )
> +#endif
> hwdom_shutdown(reason);
I still don't follow why Arm-specific code needs to live here. If this
can't be properly abstracted, then at the very least I'd expect some
code comment here, or at the very, very least something in the description.
From looking at hwdom_shutdown() I get the impression that it doesn't
expect to be called with SHUTDOWN_suspend, yet then the question is why we
make it into domain_shutdown() with that reason code.
Jan
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-02 14:33 ` Jan Beulich
@ 2025-09-03 4:31 ` Mykola Kvach
2025-09-09 6:29 ` Mykola Kvach
0 siblings, 1 reply; 49+ messages in thread
From: Mykola Kvach @ 2025-09-03 4:31 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Roger Pau Monné, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach, xen-devel
Hi Jan,
On Tue, Sep 2, 2025 at 5:33 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 02.09.2025 00:10, Mykola Kvach wrote:
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -1317,7 +1317,11 @@ int domain_shutdown(struct domain *d, u8 reason)
> > d->shutdown_code = reason;
> > reason = d->shutdown_code;
> >
> > +#if defined(CONFIG_SYSTEM_SUSPEND) && defined(CONFIG_ARM)
> > + if ( reason != SHUTDOWN_suspend && is_hardware_domain(d) )
> > +#else
> > if ( is_hardware_domain(d) )
> > +#endif
> > hwdom_shutdown(reason);
>
> I still don't follow why Arm-specific code needs to live here. If this
> can't be properly abstracted, then at the very least I'd expect some
> code comment here, or at the very, very least something in the description.
Looks like I missed your comment about this in the previous version of
the patch series.
>
> From looking at hwdom_shutdown() I get the impression that it doesn't
> expect to be called with SHUTDOWN_suspend, yet then the question is why we
> make it into domain_shutdown() with that reason code.
Thank you for the question, it is a good one.
Thinking about it, with the current implementation (i.e. when the HW domain
requests system suspend), we don't really need to call domain_shutdown().
It would be enough to pause the last running vCPU (the current one) just to
make sure that we don't return control to the domain after exiting from the
hvc trap on the PSCI SYSTEM_SUSPEND command. We also need to set
is_shutting_down to ensure that any asynchronous code or timer callbacks
behave properly during suspend (i.e. skip their normal actions).
However, if we consider a setup with two separate domains -- one control and
one HW -- where the control domain makes the final decision about system
suspend, then we would need to call __domain_finalise_shutdown() during the
HW domain suspend in order to notify the control domain that the HW domain
state has changed. The control domain would then check this state and call
system suspend for itself after confirming that all other domains are in a
suspended state.
I already added a TODO about moving this logic to the control domain. So, at
first sight (unless I am missing something), we can avoid extra modifications
inside domain_shutdown() and simply avoid calling it in case of HW domain.
>
> Jan
Best regards,
Mykola
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-03 4:31 ` Mykola Kvach
@ 2025-09-09 6:29 ` Mykola Kvach
2025-09-09 6:57 ` Jan Beulich
0 siblings, 1 reply; 49+ messages in thread
From: Mykola Kvach @ 2025-09-09 6:29 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Roger Pau Monné, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach, xen-devel
Hi Jan,
On Wed, Sep 3, 2025 at 7:31 AM Mykola Kvach <xakep.amatop@gmail.com> wrote:
>
> Hi Jan,
>
> On Tue, Sep 2, 2025 at 5:33 PM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 02.09.2025 00:10, Mykola Kvach wrote:
> > > --- a/xen/common/domain.c
> > > +++ b/xen/common/domain.c
> > > @@ -1317,7 +1317,11 @@ int domain_shutdown(struct domain *d, u8 reason)
> > > d->shutdown_code = reason;
> > > reason = d->shutdown_code;
> > >
> > > +#if defined(CONFIG_SYSTEM_SUSPEND) && defined(CONFIG_ARM)
> > > + if ( reason != SHUTDOWN_suspend && is_hardware_domain(d) )
> > > +#else
> > > if ( is_hardware_domain(d) )
> > > +#endif
> > > hwdom_shutdown(reason);
> >
> > I still don't follow why Arm-specific code needs to live here. If this
> > can't be properly abstracted, then at the very least I'd expect some
> > code comment here, or at the very, very least something in the description.
>
> Looks like I missed your comment about this in the previous version of
> the patch series.
>
> >
> > From looking at hwdom_shutdown() I get the impression that it doesn't
> > expect to be called with SHUTDOWN_suspend, yet then the question is why we
> > make it into domain_shutdown() with that reason code.
>
> Thank you for the question, it is a good one.
>
> Thinking about it, with the current implementation (i.e. when the HW domain
> requests system suspend), we don't really need to call domain_shutdown().
> It would be enough to pause the last running vCPU (the current one) just to
> make sure that we don't return control to the domain after exiting from the
> hvc trap on the PSCI SYSTEM_SUSPEND command. We also need to set
> shutting_down to ensure that any asynchronous code or timer callbacks
> behave properly during suspend (i.e. skip their normal actions).
If we avoid calling domain_shutdown() for the hardware domain during
suspend, we would need to duplicate most of its logic except for the
hwdom_shutdown() call, which is not ideal.
To improve this, I suggest introducing a helper function:
static inline bool need_hwdom_shutdown(const struct domain *d, u8 reason)
{
if ( IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && IS_ENABLED(CONFIG_ARM) )
return is_hardware_domain(d) && reason != SHUTDOWN_suspend;
return is_hardware_domain(d);
}
Then, in domain_shutdown(), we can call need_hwdom_shutdown() instead
of directly checking is_hardware_domain(d). This keeps the logic
readable and avoids code duplication.
What do you think about this approach?
>
> However, if we consider a setup with two separate domains -- one control and
> one HW -- where the control domain makes the final decision about system
> suspend, then we would need to call __domain_finalise_shutdown() during the
> HW domain suspend in order to notify the control domain that the HW domain
> state has changed. The control domain would then check this state and call
> system suspend for itself after confirming that all other domains are in a
> suspended state.
>
> I already added a TODO about moving this logic to the control domain. So, at
> first sight (unless I am missing something), we can avoid extra modifications
> inside domain_shutdown() and simply avoid calling it in case of HW domain.
>
> >
> > Jan
>
> Best regards,
> Mykola
Best regards,
Mykola
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-09 6:29 ` Mykola Kvach
@ 2025-09-09 6:57 ` Jan Beulich
2025-09-09 8:14 ` Mykola Kvach
0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2025-09-09 6:57 UTC (permalink / raw)
To: Mykola Kvach
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Roger Pau Monné, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach, xen-devel
On 09.09.2025 08:29, Mykola Kvach wrote:
> On Wed, Sep 3, 2025 at 7:31 AM Mykola Kvach <xakep.amatop@gmail.com> wrote:
>> On Tue, Sep 2, 2025 at 5:33 PM Jan Beulich <jbeulich@suse.com> wrote:
>>> On 02.09.2025 00:10, Mykola Kvach wrote:
>>>> --- a/xen/common/domain.c
>>>> +++ b/xen/common/domain.c
>>>> @@ -1317,7 +1317,11 @@ int domain_shutdown(struct domain *d, u8 reason)
>>>> d->shutdown_code = reason;
>>>> reason = d->shutdown_code;
>>>>
>>>> +#if defined(CONFIG_SYSTEM_SUSPEND) && defined(CONFIG_ARM)
>>>> + if ( reason != SHUTDOWN_suspend && is_hardware_domain(d) )
>>>> +#else
>>>> if ( is_hardware_domain(d) )
>>>> +#endif
>>>> hwdom_shutdown(reason);
>>>
>>> I still don't follow why Arm-specific code needs to live here. If this
>>> can't be properly abstracted, then at the very least I'd expect some
>>> code comment here, or at the very, very least something in the description.
>>
>> Looks like I missed your comment about this in the previous version of
>> the patch series.
>>
>>>
>>> From looking at hwdom_shutdown() I get the impression that it doesn't
>>> expect to be called with SHUTDOWN_suspend, yet then the question is why we
>>> make it into domain_shutdown() with that reason code.
>>
>> Thank you for the question, it is a good one.
>>
>> Thinking about it, with the current implementation (i.e. when the HW domain
>> requests system suspend), we don't really need to call domain_shutdown().
>> It would be enough to pause the last running vCPU (the current one) just to
>> make sure that we don't return control to the domain after exiting from the
>> hvc trap on the PSCI SYSTEM_SUSPEND command. We also need to set
>> shutting_down to ensure that any asynchronous code or timer callbacks
>> behave properly during suspend (i.e. skip their normal actions).
>
> If we avoid calling domain_shutdown() for the hardware domain during
> suspend, we would need to duplicate most of its logic except for the
> hwdom_shutdown() call, which is not ideal.
That is, you effectively take back what you said earlier (as to not needing
to call domain_shutdown())?
> To improve this, I suggest introducing a helper function:
>
> static inline bool need_hwdom_shutdown(const struct domain *d, u8 reason)
> {
> if ( IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && IS_ENABLED(CONFIG_ARM) )
> return is_hardware_domain(d) && reason != SHUTDOWN_suspend;
>
> return is_hardware_domain(d);
> }
If I see a call to a function of this name, I'd expect the "hardware
domain" nature to have been checked already. I.e. a call site would
rather look like
if ( is_hardware_domain(d) && need_hwdom_shutdown(d, reason) )
...;
> Then, in domain_shutdown(), we can call need_hwdom_shutdown() instead
> of directly checking is_hardware_domain(d). This keeps the logic
> readable and avoids code duplication.
>
> What do you think about this approach?
Well, there's still the CONFIG_ARM check in there that I would like to
see gone. (As a nit, the use of u8 would also want to go away.)
Furthermore with continuing to (ab)use domain_shutdown() also for the
suspend case (Dom0 isn't really shut down when suspending, aiui), you
retain the widening of the issue with the bogus setting of
d->is_shutting_down (and hence the need for later clearing the flag
again) that I mentioned elsewhere. (Yes, I remain of the opinion that
you don't need to sort that as a prereq to your work, yet at the same
time I think the goal should be to at least not make a bad situation
worse.)
Jan
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-09 6:57 ` Jan Beulich
@ 2025-09-09 8:14 ` Mykola Kvach
2025-09-09 9:14 ` Jan Beulich
0 siblings, 1 reply; 49+ messages in thread
From: Mykola Kvach @ 2025-09-09 8:14 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Roger Pau Monné, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach, xen-devel
Thank you for the fast response.
On Tue, Sep 9, 2025 at 9:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 09.09.2025 08:29, Mykola Kvach wrote:
> > On Wed, Sep 3, 2025 at 7:31 AM Mykola Kvach <xakep.amatop@gmail.com> wrote:
> >> On Tue, Sep 2, 2025 at 5:33 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>> On 02.09.2025 00:10, Mykola Kvach wrote:
> >>>> --- a/xen/common/domain.c
> >>>> +++ b/xen/common/domain.c
> >>>> @@ -1317,7 +1317,11 @@ int domain_shutdown(struct domain *d, u8 reason)
> >>>> d->shutdown_code = reason;
> >>>> reason = d->shutdown_code;
> >>>>
> >>>> +#if defined(CONFIG_SYSTEM_SUSPEND) && defined(CONFIG_ARM)
> >>>> + if ( reason != SHUTDOWN_suspend && is_hardware_domain(d) )
> >>>> +#else
> >>>> if ( is_hardware_domain(d) )
> >>>> +#endif
> >>>> hwdom_shutdown(reason);
> >>>
> >>> I still don't follow why Arm-specific code needs to live here. If this
> >>> can't be properly abstracted, then at the very least I'd expect some
> >>> code comment here, or at the very, very least something in the description.
> >>
> >> Looks like I missed your comment about this in the previous version of
> >> the patch series.
> >>
> >>>
> >>> From looking at hwdom_shutdown() I get the impression that it doesn't
> >>> expect to be called with SHUTDOWN_suspend, yet then the question is why we
> >>> make it into domain_shutdown() with that reason code.
> >>
> >> Thank you for the question, it is a good one.
> >>
> >> Thinking about it, with the current implementation (i.e. when the HW domain
> >> requests system suspend), we don't really need to call domain_shutdown().
> >> It would be enough to pause the last running vCPU (the current one) just to
> >> make sure that we don't return control to the domain after exiting from the
> >> hvc trap on the PSCI SYSTEM_SUSPEND command. We also need to set
> >> shutting_down to ensure that any asynchronous code or timer callbacks
> >> behave properly during suspend (i.e. skip their normal actions).
> >
> > If we avoid calling domain_shutdown() for the hardware domain during
> > suspend, we would need to duplicate most of its logic except for the
> > hwdom_shutdown() call, which is not ideal.
>
> That is, you effectively take back what you said earlier (as to not needing
> to call domain_shutdown())?
Sure. Looking more closely, I see that for the vCPUs, for example, many flags
are checked. In the case of the control domain initiating shutdown, I need
to see the __domain_finalise_shutdown() call.
We currently don’t have any functionality inside arch_domain_shutdown()
for ARM, but it would be nice to have it in the future. Calling
domain_shutdown() for every domain makes the code more consistent.
The flow for all domains will be the same during suspend, at least within
Xen’s internal code.
>
> > To improve this, I suggest introducing a helper function:
> >
> > static inline bool need_hwdom_shutdown(const struct domain *d, u8 reason)
> > {
> > if ( IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && IS_ENABLED(CONFIG_ARM) )
> > return is_hardware_domain(d) && reason != SHUTDOWN_suspend;
> >
> > return is_hardware_domain(d);
> > }
>
> If I see a call to a function of this name, I'd expect the "hardware
> domain" nature already having been checked. I.e. a call site would
> rather look like
>
> if ( is_hardware_domain(d) && need_hwdom_shutdown(d, reason) )
> ...;
>
For me, the name simply indicates whether we need to call
hwdom_shutdown() or not, and I expect it to perform the check for whether
the domain is a hardware domain inside the function itself.
> > Then, in domain_shutdown(), we can call need_hwdom_shutdown() instead
> > of directly checking is_hardware_domain(d). This keeps the logic
> > readable and avoids code duplication.
> >
> > What do you think about this approach?
>
> Well, there's still the CONFIG_ARM check in there that I would like to
> see gone. (As a nit, the use of u8 would also want to go away.)
We could combine your proposal from v5 of this patch series, i.e., using the
HAS_HWDOM_SUSPEND extra config together with this helper function:
static inline bool need_hwdom_shutdown(const struct domain *d)
{
    bool is_hw_dom = is_hardware_domain(d);

    if ( !IS_ENABLED(CONFIG_HAS_HWDOM_SUSPEND) )
        return is_hw_dom && d->shutdown_code != SHUTDOWN_suspend;

    return is_hw_dom;
}
As for the second argument (reason), I can extract it directly from the
domain structure, as is done in the function above.
>
> Furthermore with continuing to (ab)use domain_shutdown() also for the
> suspend case (Dom0 isn't really shut down when suspending, aiui), you
> retain the widening of the issue with the bogus setting of
> d->is_shutting_down (and hence the need for later clearing the flag
> again) that I mentioned elsewhere. (Yes, I remain of the opinion that
> you don't need to sort that as a prereq to your work, yet at the same
> time I think the goal should be to at least not make a bad situation
> worse.)
From the perspective of ARM logic inside Xen, we perform the exact same
shutdown steps as for other domains, except that in the end we need to
call Xen suspend.
For a domain with a toolstack, it is possible to have a running Xen
watchdog service. For example, if we have systemd, it can be easily stopped
from the guest because we have hooks and can perform some actions before
suspend.
The same story applies to a Linux kernel driver: if it has PM ops installed
for the Xen watchdog driver, nothing bad happens.
However, in the case of using init.d, it isn’t easy to stop the Xen WDT
automatically right before suspend. Therefore, Xen code has an extra check
(see domain_watchdog_timeout) where it checks the is_shutting_down flag
and does nothing if it is set.
The is_shutting_down flag is easily reset on Xen resume via a
domain_resume call, so I don’t see any problems with that.
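To make the watchdog argument concrete, here is a minimal standalone sketch
of the behaviour described above. It is not the Xen code: the struct, the
counter and the *_sketch function names are hypothetical stand-ins, and only
the is_shutting_down check mirrors what is said here about
domain_watchdog_timeout.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct domain. */
struct domain {
    bool is_shutting_down;
    unsigned int watchdog_actions;   /* counts "normal" watchdog actions */
};

/* Models the timer callback: skip the normal action while suspending. */
static void watchdog_timeout_sketch(struct domain *d)
{
    if ( d->is_shutting_down )
        return;

    d->watchdog_actions++;           /* stands in for the real timeout action */
}

/* Models the flag being set on the suspend path ... */
static void suspend_sketch(struct domain *d)
{
    d->is_shutting_down = true;
}

/* ... and being cleared again on resume. */
static void resume_sketch(struct domain *d)
{
    d->is_shutting_down = false;
}
```

With the flag set, a timer that fires during suspend does nothing; once the
flag is cleared on resume, the callback acts normally again.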
>
> Jan
Best regards,
Mykola
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-09 8:14 ` Mykola Kvach
@ 2025-09-09 9:14 ` Jan Beulich
2025-09-09 9:55 ` Mykola Kvach
0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2025-09-09 9:14 UTC (permalink / raw)
To: Mykola Kvach
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Roger Pau Monné, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach, xen-devel
On 09.09.2025 10:14, Mykola Kvach wrote:
> On Tue, Sep 9, 2025 at 9:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 09.09.2025 08:29, Mykola Kvach wrote:
>>> Then, in domain_shutdown(), we can call need_hwdom_shutdown() instead
>>> of directly checking is_hardware_domain(d). This keeps the logic
>>> readable and avoids code duplication.
>>>
>>> What do you think about this approach?
>>
>> Well, there's still the CONFIG_ARM check in there that I would like to
>> see gone. (As a nit, the use of u8 would also want to go away.)
>
> We could combine your proposal from v5 of this patch series, i.e., using the
> HAS_HWDOM_SUSPEND extra config together with this helper function:
>
> static inline bool need_hwdom_shutdown(const struct domain *d)
> {
> bool is_hw_dom = is_hardware_domain(d);
>
> if ( !IS_ENABLED(CONFIG_HAS_HWDOM_SUSPEND) )
> return is_hw_dom && d->shutdown_code != SHUTDOWN_suspend;
>
> return is_hw_dom;
> }
Maybe. Yet then the next thing striking me as odd is the redundant
checking of is_hw_dom. Why not simply
{
    if ( !is_hardware_domain(d) )
        return false;

    return IS_ENABLED(CONFIG_HAS_HWDOM_SUSPEND) || d->shutdown_code != SHUTDOWN_suspend;
}
Yet as said - my expectation is anyway that the is_hardware_domain() check
would live in the caller.
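A minimal standalone sketch of that caller-side arrangement, with uint8_t
for the reason code. Everything here is a stand-in: the struct layout, the
HAS_HWDOM_SUSPEND macro (modelling IS_ENABLED(CONFIG_HAS_HWDOM_SUSPEND)) and
the call counter are hypothetical. It also assumes the option means "the
hardware domain may suspend the system", so that the suspend case is skipped
when the option is on, as in the #if block of the patch; the polarity of the
config test in the final helper would want double-checking against that.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the Xen reason codes; values are illustrative only. */
#define SHUTDOWN_poweroff 0
#define SHUTDOWN_suspend  2

/* Hypothetical switch modelling IS_ENABLED(CONFIG_HAS_HWDOM_SUSPEND). */
#ifndef HAS_HWDOM_SUSPEND
#define HAS_HWDOM_SUSPEND 1
#endif

struct domain {
    bool is_hwdom;                     /* models is_hardware_domain(d) */
    uint8_t shutdown_code;
    unsigned int hwdom_shutdown_calls; /* counts hwdom_shutdown() calls */
};

static bool is_hardware_domain(const struct domain *d)
{
    return d->is_hwdom;
}

/* The caller has already established that this is the hardware domain. */
static bool need_hwdom_shutdown(uint8_t reason)
{
    return !HAS_HWDOM_SUSPEND || reason != SHUTDOWN_suspend;
}

static void domain_shutdown_sketch(struct domain *d, uint8_t reason)
{
    d->shutdown_code = reason;

    if ( is_hardware_domain(d) && need_hwdom_shutdown(reason) )
        d->hwdom_shutdown_calls++;     /* stands in for hwdom_shutdown(reason) */
}
```

Under these assumptions a hardware-domain suspend request leaves the counter
untouched, while a poweroff request from the same domain increments it.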
> As for the second argument (reason), I can extract it directly from the
> domain structure, as is done in the function above.
Looks like a misunderstanding: I don't mind the function parameter. But
the "u8" type shouldn't be used anymore in new code; that's uint8_t now.
>> Furthermore with continuing to (ab)use domain_shutdown() also for the
>> suspend case (Dom0 isn't really shut down when suspending, aiui), you
>> retain the widening of the issue with the bogus setting of
>> d->is_shutting_down (and hence the need for later clearing the flag
>> again) that I mentioned elsewhere. (Yes, I remain of the opinion that
>> you don't need to sort that as a prereq to your work, yet at the same
>> time I think the goal should be to at least not make a bad situation
>> worse.)
>
> From the perspective of ARM logic inside Xen, we perform the exact same
> shutdown steps as for other domains, except that in the end we need to
> call Xen suspend.
Which, as said, feels wrong. Domains to be revived after resume aren't
really shut down, so imo they should never have ->is_shutting_down set.
> For a domain with a toolstack, it is possible to have a running Xen
> watchdog service. For example, if we have systemd, it can be easily stopped
> from the guest because we have hooks and can perform some actions before
> suspend.
>
> The same story applies to a Linux kernel driver: if it has PM ops installed
> for the Xen watchdog driver, nothing bad happens.
>
> However, in the case of using init.d, it isn’t easy to stop the Xen WDT
> automatically right before suspend. Therefore, Xen code has an extra check
> (see domain_watchdog_timeout) where it checks the is_shutting_down flag
> and does nothing if it is set.
I don't understand how these watchdog considerations come into play here.
> The is_shutting_down flag is easily reset on Xen resume via a
> domain_resume call, so I don’t see any problems with that.
You did read my earlier mail though, regarding concerns towards the clearing
of that flag once it was set? (You must have, since iirc you even asked [1]
whether you're expected to address those issues up front.)
Jan
[1] https://lists.xen.org/archives/html/xen-devel/2025-08/msg02057.html
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-09 9:14 ` Jan Beulich
@ 2025-09-09 9:55 ` Mykola Kvach
2025-09-09 11:48 ` Jan Beulich
0 siblings, 1 reply; 49+ messages in thread
From: Mykola Kvach @ 2025-09-09 9:55 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Roger Pau Monné, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach, xen-devel
Thanks for your detailed comments and suggestions — much appreciated.
On Tue, Sep 9, 2025 at 12:14 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 09.09.2025 10:14, Mykola Kvach wrote:
> > On Tue, Sep 9, 2025 at 9:57 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> On 09.09.2025 08:29, Mykola Kvach wrote:
> >>> Then, in domain_shutdown(), we can call need_hwdom_shutdown() instead
> >>> of directly checking is_hardware_domain(d). This keeps the logic
> >>> readable and avoids code duplication.
> >>>
> >>> What do you think about this approach?
> >>
> >> Well, there's still the CONFIG_ARM check in there that I would like to
> >> see gone. (As a nit, the use of u8 would also want to go away.)
> >
> > We could combine your proposal from v5 of this patch series, i.e., using the
> > HAS_HWDOM_SUSPEND extra config together with this helper function:
> >
> > static inline bool need_hwdom_shutdown(const struct domain *d)
> > {
> > bool is_hw_dom = is_hardware_domain(d);
> >
> > if ( !IS_ENABLED(CONFIG_HAS_HWDOM_SUSPEND) )
> > return is_hw_dom && d->shutdown_code != SHUTDOWN_suspend;
> >
> > return is_hw_dom;
> > }
>
> Maybe. Yet then the next thing striking me as odd is the redundant
> checking of is_hw_dom. Why not simply
>
> {
> if ( !is_hardware_domain(d) )
> return false;
>
> return IS_ENABLED(CONFIG_HAS_HWDOM_SUSPEND) || d->shutdown_code != SHUTDOWN_suspend;
> }
>
> Yet as said - my expectation is anyway that the is_hardware_domain() check
> would live in the caller.
Ack.
>
> > As for the second argument (reason), I can extract it directly from the
> > domain structure, as is done in the function above.
>
> Looks like a misunderstanding: I don't mind the function parameter. But
> the "u8" type shouldn't be used anymore in new code; that's uint8_t now.
Oh, got it.
I just used the same type as in domain_shutdown().
>
> >> Furthermore with continuing to (ab)use domain_shutdown() also for the
> >> suspend case (Dom0 isn't really shut down when suspending, aiui), you
> >> retain the widening of the issue with the bogus setting of
> >> d->is_shutting_down (and hence the need for later clearing the flag
> >> again) that I mentioned elsewhere. (Yes, I remain of the opinion that
> >> you don't need to sort that as a prereq to your work, yet at the same
> >> time I think the goal should be to at least not make a bad situation
> >> worse.)
> >
> > From the perspective of ARM logic inside Xen, we perform the exact same
> > shutdown steps as for other domains, except that in the end we need to
> > call Xen suspend.
>
> Which, as said, feels wrong. Domains to be revived after resume aren't
> really shut down, so imo they should never have ->is_shutting_down set.
I believe this is out of scope for this series;
actually, the same applies to shutdown_code.
>
> > For a domain with a toolstack, it is possible to have a running Xen
> > watchdog service. For example, if we have systemd, it can be easily stopped
> > from the guest because we have hooks and can perform some actions before
> > suspend.
> >
> > The same story applies to a Linux kernel driver: if it has PM ops installed
> > for the Xen watchdog driver, nothing bad happens.
> >
> > However, in the case of using init.d, it isn’t easy to stop the Xen WDT
> > automatically right before suspend. Therefore, Xen code has an extra check
> > (see domain_watchdog_timeout) where it checks the is_shutting_down flag
> > and does nothing if it is set.
>
> I don't understand how these watchdog considerations come into play here.
I’m just trying to explain why we still need to set this flag
even for the HW domain.
>
> > The is_shutting_down flag is easily reset on Xen resume via a
> > domain_resume call, so I don’t see any problems with that.
>
> You did read my earlier mail though, regarding concerns towards the clearing
> of that flag once it was set? (You must have, since iirc you even asked [1]
> whether you're expected to address those issues up front.)
As far as I understand, this issue is relevant to x86, and I believe
it is out of scope for this series.
See my previous message here:
https://lists.xen.org/archives/html/xen-devel/2025-08/msg02127.html
I will prepare a separate patch series to address it.
>
> Jan
>
> [1] https://lists.xen.org/archives/html/xen-devel/2025-08/msg02057.html
Best regards,
Mykola
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain
2025-09-09 9:55 ` Mykola Kvach
@ 2025-09-09 11:48 ` Jan Beulich
0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2025-09-09 11:48 UTC (permalink / raw)
To: Mykola Kvach
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
Roger Pau Monné, Saeed Nowshadi, Mykyta Poturai,
Mykola Kvach, xen-devel
On 09.09.2025 11:55, Mykola Kvach wrote:
> On Tue, Sep 9, 2025 at 12:14 PM Jan Beulich <jbeulich@suse.com> wrote:
>> On 09.09.2025 10:14, Mykola Kvach wrote:
>>> On Tue, Sep 9, 2025 at 9:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>> Furthermore with continuing to (ab)use domain_shutdown() also for the
>>>> suspend case (Dom0 isn't really shut down when suspending, aiui), you
>>>> retain the widening of the issue with the bogus setting of
>>>> d->is_shutting_down (and hence the need for later clearing the flag
>>>> again) that I mentioned elsewhere. (Yes, I remain of the opinion that
>>>> you don't need to sort that as a prereq to your work, yet at the same
>>>> time I think the goal should be to at least not make a bad situation
>>>> worse.)
>>>
>>> From the perspective of ARM logic inside Xen, we perform the exact same
>>> shutdown steps as for other domains, except that in the end we need to
>>> call Xen suspend.
>>
>> Which, as said, feels wrong. Domains to be revived after resume aren't
>> really shut down, so imo they should never have ->is_shutting_down set.
>
> I believe this is out of scope for this series;
Yes, but see at the bottom.
> actually, the same applies to shutdown_code.
Not quite sure there.
>>> The is_shutting_down flag is easily reset on Xen resume via a
>>> domain_resume call, so I don’t see any problems with that.
>>
>> You did read my earlier mail though, regarding concerns towards the clearing
>> of that flag once it was set? (You must have, since iirc you even asked [1]
>> whether you're expected to address those issues up front.)
>
> As far as I understand, this issue is relevant to x86, and I believe
> it is out of scope for this series.
Yes and ...
> See my previous message here:
> https://lists.xen.org/archives/html/xen-devel/2025-08/msg02127.html
>
> I will prepare a separate patch series to address it.
... thanks. My request to not extend the badness remains though, as to the
series here.
Jan
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (10 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 11/13] xen/arm: Add support for system suspend triggered by hardware domain Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-02 17:25 ` Oleksandr Tyshchenko
2025-09-02 20:51 ` Volodymyr Babchuk
2025-09-01 22:10 ` [PATCH v6 13/13] xen/arm: gic-v3: Add suspend/resume support for eSPI registers Mykola Kvach
2025-09-02 20:48 ` [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Volodymyr Babchuk
13 siblings, 2 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Rahul Singh,
Mykola Kvach
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
This is done using generic iommu_suspend/resume functions that cause
IOMMU driver specific suspend/resume handlers to be called for enabled
IOMMU (if one has suspend/resume driver handlers implemented).
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V6:
- Drop iommu_enabled check from host system suspend.
---
xen/arch/arm/suspend.c | 11 +++++++++++
xen/drivers/passthrough/arm/smmu-v3.c | 10 ++++++++++
xen/drivers/passthrough/arm/smmu.c | 10 ++++++++++
3 files changed, 31 insertions(+)
diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
index 35b20581f1..f3a3b831c5 100644
--- a/xen/arch/arm/suspend.c
+++ b/xen/arch/arm/suspend.c
@@ -5,6 +5,7 @@
#include <xen/console.h>
#include <xen/cpu.h>
+#include <xen/iommu.h>
#include <xen/llc-coloring.h>
#include <xen/sched.h>
#include <xen/tasklet.h>
@@ -62,6 +63,13 @@ static void cf_check system_suspend(void *data)
time_suspend();
+ status = iommu_suspend();
+ if ( status )
+ {
+ system_state = SYS_STATE_resume;
+ goto resume_time;
+ }
+
console_start_sync();
status = console_suspend();
if ( status )
@@ -118,6 +126,9 @@ static void cf_check system_suspend(void *data)
console_resume();
console_end_sync();
+ iommu_resume();
+
+ resume_time:
time_resume();
resume_nonboot_cpus:
diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
index 81071f4018..f887faf7dc 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -2854,6 +2854,13 @@ static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
xfree(xen_domain);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+static int arm_smmu_suspend(void)
+{
+ return -ENOSYS;
+}
+#endif
+
static const struct iommu_ops arm_smmu_iommu_ops = {
.page_sizes = PAGE_SIZE_4K,
.init = arm_smmu_iommu_xen_domain_init,
@@ -2866,6 +2873,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
.unmap_page = arm_iommu_unmap_page,
.dt_xlate = arm_smmu_dt_xlate,
.add_device = arm_smmu_add_device,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = arm_smmu_suspend,
+#endif
};
static __init int arm_smmu_dt_init(struct dt_device_node *dev,
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 22d306d0cb..45f29ef8ec 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
xfree(xen_domain);
}
+#ifdef CONFIG_SYSTEM_SUSPEND
+static int arm_smmu_suspend(void)
+{
+ return -ENOSYS;
+}
+#endif
+
static const struct iommu_ops arm_smmu_iommu_ops = {
.page_sizes = PAGE_SIZE_4K,
.init = arm_smmu_iommu_domain_init,
@@ -2960,6 +2967,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
.map_page = arm_iommu_map_page,
.unmap_page = arm_iommu_unmap_page,
.dt_xlate = arm_smmu_dt_xlate_generic,
+#ifdef CONFIG_SYSTEM_SUSPEND
+ .suspend = arm_smmu_suspend,
+#endif
};
static struct arm_smmu_device *find_smmu(const struct device *dev)
--
2.48.1
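
The ordering this patch adds to system_suspend() can be sketched standalone
as follows. This is an illustration under stated assumptions, not the Xen
implementation: the trace buffer, record() and system_suspend_sketch() are
hypothetical, all stages after the IOMMU are collapsed into a single "later"
step, and the resume_iommu label does not exist in the patch. The sketch
also assumes that iommu_suspend() passes a driver's error (such as the
-ENOSYS returned by the SMMU stubs above) back to the caller.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Space-separated record of the steps taken, for inspection. */
static char trace[128];

static void record(const char *step)
{
    size_t used = strlen(trace);

    snprintf(trace + used, sizeof(trace) - used, "%s%s",
             used ? " " : "", step);
}

/*
 * iommu_rc models the return value of iommu_suspend(); later_rc models
 * every suspend stage that follows it (console and so on).
 */
static int system_suspend_sketch(int iommu_rc, int later_rc)
{
    int status;

    trace[0] = '\0';

    record("time_suspend");

    record("iommu_suspend");
    status = iommu_rc;
    if ( status )
        goto resume_time;      /* IOMMU was not suspended: skip iommu_resume */

    record("later_suspend");
    status = later_rc;
    if ( status )
        goto resume_iommu;     /* hypothetical label, see lead-in */

    record("later_resume");

 resume_iommu:
    record("iommu_resume");

 resume_time:
    record("time_resume");

    return status;
}
```

Calling system_suspend_sketch(-ENOSYS, 0) models a platform whose SMMU
driver only has the stub handler: the suspend attempt is abandoned, and only
the timer is resumed. Resume steps run in the reverse order of the suspend
steps, which is why iommu_resume() sits between the console and timer resume
calls in the patch.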
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume
2025-09-01 22:10 ` [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume Mykola Kvach
@ 2025-09-02 17:25 ` Oleksandr Tyshchenko
2025-09-02 17:46 ` Mykola Kvach
2025-09-02 20:51 ` Volodymyr Babchuk
1 sibling, 1 reply; 49+ messages in thread
From: Oleksandr Tyshchenko @ 2025-09-02 17:25 UTC (permalink / raw)
To: Mykola Kvach, xen-devel
Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Rahul Singh,
Mykola Kvach
On 02.09.25 01:10, Mykola Kvach wrote:
Hello Mykola
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> This is done using generic iommu_suspend/resume functions that cause
> IOMMU driver specific suspend/resume handlers to be called for enabled
> IOMMU (if one has suspend/resume driver handlers implemented).
>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in V6:
> - Drop iommu_enabled check from host system suspend.
I do not have any comments for the updated version, thanks.
> ---
> xen/arch/arm/suspend.c | 11 +++++++++++
> xen/drivers/passthrough/arm/smmu-v3.c | 10 ++++++++++
> xen/drivers/passthrough/arm/smmu.c | 10 ++++++++++
> 3 files changed, 31 insertions(+)
>
> diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
> index 35b20581f1..f3a3b831c5 100644
> --- a/xen/arch/arm/suspend.c
> +++ b/xen/arch/arm/suspend.c
> @@ -5,6 +5,7 @@
>
> #include <xen/console.h>
> #include <xen/cpu.h>
> +#include <xen/iommu.h>
> #include <xen/llc-coloring.h>
> #include <xen/sched.h>
> #include <xen/tasklet.h>
> @@ -62,6 +63,13 @@ static void cf_check system_suspend(void *data)
>
> time_suspend();
>
> + status = iommu_suspend();
> + if ( status )
> + {
> + system_state = SYS_STATE_resume;
> + goto resume_time;
> + }
> +
> console_start_sync();
> status = console_suspend();
> if ( status )
> @@ -118,6 +126,9 @@ static void cf_check system_suspend(void *data)
> console_resume();
> console_end_sync();
>
> + iommu_resume();
> +
> + resume_time:
> time_resume();
>
> resume_nonboot_cpus:
> diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
> index 81071f4018..f887faf7dc 100644
> --- a/xen/drivers/passthrough/arm/smmu-v3.c
> +++ b/xen/drivers/passthrough/arm/smmu-v3.c
> @@ -2854,6 +2854,13 @@ static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
> xfree(xen_domain);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +static int arm_smmu_suspend(void)
> +{
> + return -ENOSYS;
> +}
> +#endif
> +
> static const struct iommu_ops arm_smmu_iommu_ops = {
> .page_sizes = PAGE_SIZE_4K,
> .init = arm_smmu_iommu_xen_domain_init,
> @@ -2866,6 +2873,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
> .unmap_page = arm_iommu_unmap_page,
> .dt_xlate = arm_smmu_dt_xlate,
> .add_device = arm_smmu_add_device,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = arm_smmu_suspend,
> +#endif
> };
>
> static __init int arm_smmu_dt_init(struct dt_device_node *dev,
> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> index 22d306d0cb..45f29ef8ec 100644
> --- a/xen/drivers/passthrough/arm/smmu.c
> +++ b/xen/drivers/passthrough/arm/smmu.c
> @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
> xfree(xen_domain);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +static int arm_smmu_suspend(void)
> +{
> + return -ENOSYS;
> +}
> +#endif
> +
> static const struct iommu_ops arm_smmu_iommu_ops = {
> .page_sizes = PAGE_SIZE_4K,
> .init = arm_smmu_iommu_domain_init,
> @@ -2960,6 +2967,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
> .map_page = arm_iommu_map_page,
> .unmap_page = arm_iommu_unmap_page,
> .dt_xlate = arm_smmu_dt_xlate_generic,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = arm_smmu_suspend,
> +#endif
> };
>
> static struct arm_smmu_device *find_smmu(const struct device *dev)
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume
2025-09-02 17:25 ` Oleksandr Tyshchenko
@ 2025-09-02 17:46 ` Mykola Kvach
0 siblings, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-02 17:46 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Rahul Singh,
Mykola Kvach
Hi Oleksandr,
On Tue, Sep 2, 2025 at 8:25 PM Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
>
> On 02.09.25 01:10, Mykola Kvach wrote:
>
> Hello Mykola
>
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >
> > This is done using generic iommu_suspend/resume functions that cause
> > IOMMU driver specific suspend/resume handlers to be called for enabled
> > IOMMU (if one has suspend/resume driver handlers implemented).
> >
> > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V6:
> > - Drop iommu_enabled check from host system suspend.
>
> I do not have any comments for the updated version, thanks.
Thank you for your time and the review!
>
>
> > ---
> > xen/arch/arm/suspend.c | 11 +++++++++++
> > xen/drivers/passthrough/arm/smmu-v3.c | 10 ++++++++++
> > xen/drivers/passthrough/arm/smmu.c | 10 ++++++++++
> > 3 files changed, 31 insertions(+)
> >
> > diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
> > index 35b20581f1..f3a3b831c5 100644
> > --- a/xen/arch/arm/suspend.c
> > +++ b/xen/arch/arm/suspend.c
> > @@ -5,6 +5,7 @@
> >
> > #include <xen/console.h>
> > #include <xen/cpu.h>
> > +#include <xen/iommu.h>
> > #include <xen/llc-coloring.h>
> > #include <xen/sched.h>
> > #include <xen/tasklet.h>
> > @@ -62,6 +63,13 @@ static void cf_check system_suspend(void *data)
> >
> > time_suspend();
> >
> > + status = iommu_suspend();
> > + if ( status )
> > + {
> > + system_state = SYS_STATE_resume;
> > + goto resume_time;
> > + }
> > +
> > console_start_sync();
> > status = console_suspend();
> > if ( status )
> > @@ -118,6 +126,9 @@ static void cf_check system_suspend(void *data)
> > console_resume();
> > console_end_sync();
> >
> > + iommu_resume();
> > +
> > + resume_time:
> > time_resume();
> >
> > resume_nonboot_cpus:
> > diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
> > index 81071f4018..f887faf7dc 100644
> > --- a/xen/drivers/passthrough/arm/smmu-v3.c
> > +++ b/xen/drivers/passthrough/arm/smmu-v3.c
> > @@ -2854,6 +2854,13 @@ static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
> > xfree(xen_domain);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +static int arm_smmu_suspend(void)
> > +{
> > + return -ENOSYS;
> > +}
> > +#endif
> > +
> > static const struct iommu_ops arm_smmu_iommu_ops = {
> > .page_sizes = PAGE_SIZE_4K,
> > .init = arm_smmu_iommu_xen_domain_init,
> > @@ -2866,6 +2873,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
> > .unmap_page = arm_iommu_unmap_page,
> > .dt_xlate = arm_smmu_dt_xlate,
> > .add_device = arm_smmu_add_device,
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + .suspend = arm_smmu_suspend,
> > +#endif
> > };
> >
> > static __init int arm_smmu_dt_init(struct dt_device_node *dev,
> > diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> > index 22d306d0cb..45f29ef8ec 100644
> > --- a/xen/drivers/passthrough/arm/smmu.c
> > +++ b/xen/drivers/passthrough/arm/smmu.c
> > @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
> > xfree(xen_domain);
> > }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +static int arm_smmu_suspend(void)
> > +{
> > + return -ENOSYS;
> > +}
> > +#endif
> > +
> > static const struct iommu_ops arm_smmu_iommu_ops = {
> > .page_sizes = PAGE_SIZE_4K,
> > .init = arm_smmu_iommu_domain_init,
> > @@ -2960,6 +2967,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
> > .map_page = arm_iommu_map_page,
> > .unmap_page = arm_iommu_unmap_page,
> > .dt_xlate = arm_smmu_dt_xlate_generic,
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > + .suspend = arm_smmu_suspend,
> > +#endif
> > };
> >
> > static struct arm_smmu_device *find_smmu(const struct device *dev)
>
Best regards,
Mykola
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume
2025-09-01 22:10 ` [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume Mykola Kvach
2025-09-02 17:25 ` Oleksandr Tyshchenko
@ 2025-09-02 20:51 ` Volodymyr Babchuk
1 sibling, 0 replies; 49+ messages in thread
From: Volodymyr Babchuk @ 2025-09-02 20:51 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Oleksandr Tyshchenko,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Rahul Singh, Mykola Kvach
Hi,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> This is done using generic iommu_suspend/resume functions that cause
> IOMMU driver specific suspend/resume handlers to be called for enabled
> IOMMU (if one has suspend/resume driver handlers implemented).
>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
> Changes in V6:
> - Drop iommu_enabled check from host system suspend.
> ---
> xen/arch/arm/suspend.c | 11 +++++++++++
> xen/drivers/passthrough/arm/smmu-v3.c | 10 ++++++++++
> xen/drivers/passthrough/arm/smmu.c | 10 ++++++++++
> 3 files changed, 31 insertions(+)
>
> diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
> index 35b20581f1..f3a3b831c5 100644
> --- a/xen/arch/arm/suspend.c
> +++ b/xen/arch/arm/suspend.c
> @@ -5,6 +5,7 @@
>
> #include <xen/console.h>
> #include <xen/cpu.h>
> +#include <xen/iommu.h>
> #include <xen/llc-coloring.h>
> #include <xen/sched.h>
> #include <xen/tasklet.h>
> @@ -62,6 +63,13 @@ static void cf_check system_suspend(void *data)
>
> time_suspend();
>
> + status = iommu_suspend();
> + if ( status )
> + {
> + system_state = SYS_STATE_resume;
> + goto resume_time;
> + }
> +
> console_start_sync();
> status = console_suspend();
> if ( status )
> @@ -118,6 +126,9 @@ static void cf_check system_suspend(void *data)
> console_resume();
> console_end_sync();
>
> + iommu_resume();
> +
> + resume_time:
> time_resume();
>
> resume_nonboot_cpus:
> diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
> index 81071f4018..f887faf7dc 100644
> --- a/xen/drivers/passthrough/arm/smmu-v3.c
> +++ b/xen/drivers/passthrough/arm/smmu-v3.c
> @@ -2854,6 +2854,13 @@ static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
> xfree(xen_domain);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +static int arm_smmu_suspend(void)
> +{
> + return -ENOSYS;
> +}
> +#endif
> +
> static const struct iommu_ops arm_smmu_iommu_ops = {
> .page_sizes = PAGE_SIZE_4K,
> .init = arm_smmu_iommu_xen_domain_init,
> @@ -2866,6 +2873,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
> .unmap_page = arm_iommu_unmap_page,
> .dt_xlate = arm_smmu_dt_xlate,
> .add_device = arm_smmu_add_device,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = arm_smmu_suspend,
> +#endif
> };
>
> static __init int arm_smmu_dt_init(struct dt_device_node *dev,
> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> index 22d306d0cb..45f29ef8ec 100644
> --- a/xen/drivers/passthrough/arm/smmu.c
> +++ b/xen/drivers/passthrough/arm/smmu.c
> @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
> xfree(xen_domain);
> }
>
> +#ifdef CONFIG_SYSTEM_SUSPEND
> +static int arm_smmu_suspend(void)
> +{
> + return -ENOSYS;
> +}
> +#endif
> +
> static const struct iommu_ops arm_smmu_iommu_ops = {
> .page_sizes = PAGE_SIZE_4K,
> .init = arm_smmu_iommu_domain_init,
> @@ -2960,6 +2967,9 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
> .map_page = arm_iommu_map_page,
> .unmap_page = arm_iommu_unmap_page,
> .dt_xlate = arm_smmu_dt_xlate_generic,
> +#ifdef CONFIG_SYSTEM_SUSPEND
> + .suspend = arm_smmu_suspend,
> +#endif
> };
>
> static struct arm_smmu_device *find_smmu(const struct device *dev)
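For anyone following the error path in system_suspend(): a minimal
standalone sketch of the ordering, as I read it from the hunk above. All
demo_* names are hypothetical stand-ins, not Xen's struct iommu_ops or the
real iommu_suspend(). Treating a disabled IOMMU or a missing handler as
"nothing to do" is my assumption, based on the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a driver ops table; not Xen's struct iommu_ops. */
struct demo_iommu_ops {
    int (*suspend)(void);   /* optional */
    void (*resume)(void);   /* optional */
};

/* Records the order of steps, one letter per step. */
static char demo_trace[16];

static void demo_step(char c)
{
    size_t len = strlen(demo_trace);

    assert(len + 1 < sizeof(demo_trace));
    demo_trace[len] = c;
    demo_trace[len + 1] = '\0';
}

/* Same shape as the SMMU/SMMUv3 stubs: suspend is not implemented. */
static int demo_stub_suspend(void)
{
    return -ENOSYS;
}

/* Assumption: a disabled IOMMU or a missing handler is not an error. */
static int demo_iommu_suspend(const struct demo_iommu_ops *ops, bool enabled)
{
    if ( !enabled || !ops || !ops->suspend )
        return 0;

    return ops->suspend();
}

static void demo_iommu_resume(const struct demo_iommu_ops *ops, bool enabled)
{
    if ( enabled && ops && ops->resume )
        ops->resume();
}

static int demo_system_suspend(const struct demo_iommu_ops *ops, bool enabled)
{
    int status;

    demo_trace[0] = '\0';
    demo_step('T');                 /* timers suspended */

    status = demo_iommu_suspend(ops, enabled);
    if ( status )
        goto resume_time;           /* IOMMU not suspended: skip its resume */

    demo_step('I');                 /* IOMMU suspended */
    /* Console, GIC and the PSCI call would follow here. */
    demo_iommu_resume(ops, enabled);
    demo_step('i');                 /* IOMMU resumed */

 resume_time:
    demo_step('t');                 /* timers resumed on every path */
    return status;
}
```

With an ops table whose .suspend is demo_stub_suspend, the call returns
-ENOSYS and only the timer steps run, so a platform with an enabled SMMU
would refuse to suspend rather than lose IOMMU state.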
--
WBR, Volodymyr
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH v6 13/13] xen/arm: gic-v3: Add suspend/resume support for eSPI registers
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (11 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 12/13] xen/arm: Suspend/resume IOMMU on Xen suspend/resume Mykola Kvach
@ 2025-09-01 22:10 ` Mykola Kvach
2025-09-02 20:48 ` [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Volodymyr Babchuk
13 siblings, 0 replies; 49+ messages in thread
From: Mykola Kvach @ 2025-09-01 22:10 UTC (permalink / raw)
To: xen-devel
Cc: Mykola Kvach, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk
From: Mykola Kvach <mykola_kvach@epam.com>
Add suspend/resume handling for GICv3 eSPI registers.
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Note: The main eSPI patch series is still under review.
This commit is intended to be applied after the main eSPI series:
[PATCH v5 00/12] Introduce eSPI support
https://patchew.org/Xen/cover.1756481577.git.leonid._5Fkomarianskyi@epam.com/
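
For readers unfamiliar with the eSPI register layout, the bank selection
done by GET_SPI_REG_OFFSET and the per-block strides can be sketched
standalone as below. This is only an illustration: the DEMO_* names are
hypothetical, and the offsets are my reading of the GICv3.1 distributor
register map, so please cross-check them against gic_v3_defs.h and the
eSPI series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed distributor offsets (GICv3.1); verify against the real headers. */
#define DEMO_GICD_ISENABLER     0x0100
#define DEMO_GICD_ISENABLERnE   0x1200
#define DEMO_GICD_IPRIORITYR    0x0400
#define DEMO_GICD_IPRIORITYRnE  0x2000
#define DEMO_GICD_ICFGR         0x0C00
#define DEMO_GICD_ICFGRnE       0x3000
#define DEMO_GICD_IROUTER       0x6000
#define DEMO_GICD_IROUTERnE     0x8000

/* Token pasting picks the regular or the extended ("nE") register bank. */
#define DEMO_SPI_REG_OFFSET(name, is_espi) \
    ((is_espi) ? DEMO_GICD_##name##nE : DEMO_GICD_##name)

/*
 * Byte offset of the first register covering a given block of 32 IRQs.
 * A block occupies 32 * bits_per_irq / 8 bytes:
 *   ISENABLER/ISACTIVER: 1 bit per IRQ   ->   4 bytes
 *   ICFGR:               2 bits per IRQ  ->   8 bytes
 *   IPRIORITYR:          8 bits per IRQ  ->  32 bytes
 *   IROUTER:             64 bits per IRQ -> 256 bytes
 */
static inline uint32_t demo_block_offset(uint32_t bank,
                                         unsigned int bits_per_irq,
                                         unsigned int block)
{
    assert(bits_per_irq >= 1 && bits_per_irq <= 64);

    return bank + (32u * bits_per_irq / 8u) * block;
}
```

For example, demo_block_offset(DEMO_SPI_REG_OFFSET(IROUTER, false), 64, 1)
gives 0x6100, the offset of the routing register for IRQ 32, while eSPI
blocks are numbered from 0 within their own bank.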
---
xen/arch/arm/gic-v3.c | 141 +++++++++++++++++++++++++++++-------------
1 file changed, 97 insertions(+), 44 deletions(-)
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index 9f1be7e905..57403c82a8 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1782,17 +1782,14 @@ static bool gic_dist_supports_lpis(void)
struct gicv3_ctx {
struct dist_ctx {
uint32_t ctlr;
- /*
- * This struct represent block of 32 IRQs
- * TODO: store extended SPI configuration (GICv3.1+)
- */
+ /* This struct represents a block of 32 IRQs */
struct irq_regs {
uint32_t icfgr[2];
uint32_t ipriorityr[8];
uint64_t irouter[32];
uint32_t isactiver;
uint32_t isenabler;
- } *irqs;
+ } *irqs, *espi_irqs;
} dist;
/* have only one rdist structure for last running CPU during suspend */
@@ -1831,8 +1828,26 @@ static void __init gicv3_alloc_context(void)
gicv3_ctx.dist.irqs = xzalloc_array(typeof(*gicv3_ctx.dist.irqs),
blocks - 1);
if ( !gicv3_ctx.dist.irqs )
+ {
printk(XENLOG_ERR "Failed to allocate memory for GICv3 suspend context\n");
+ return;
+ }
}
+
+#ifdef CONFIG_GICV3_ESPI
+ if ( !gicv3_info.nr_espi )
+ return;
+
+ gicv3_ctx.dist.espi_irqs = xzalloc_array(typeof(*gicv3_ctx.dist.espi_irqs),
+ gicv3_info.nr_espi / 32);
+ if ( !gicv3_ctx.dist.espi_irqs )
+ {
+ xfree(gicv3_ctx.dist.irqs);
+ gicv3_ctx.dist.irqs = NULL;
+
+ printk(XENLOG_ERR "Failed to allocate memory for GICv3 eSPI suspend context\n");
+ }
+#endif
}
static void gicv3_disable_redist(void)
@@ -1852,6 +1867,65 @@ static void gicv3_disable_redist(void)
while ( (readl_relaxed(waker) & GICR_WAKER_ChildrenAsleep) == 0 );
}
+#define GET_SPI_REG_OFFSET(name, is_espi) \
+ ((is_espi) ? GICD_##name##nE : GICD_##name)
+
+static void gicv3_store_spi_irq_block(typeof(gicv3_ctx.dist.irqs) irqs,
+ unsigned int i, bool is_espi)
+{
+ void __iomem *base;
+ unsigned int irq;
+
+ base = GICD + GET_SPI_REG_OFFSET(ICFGR, is_espi) + 8 * i;
+ irqs->icfgr[0] = readl_relaxed(base);
+ irqs->icfgr[1] = readl_relaxed(base + 4);
+
+ base = GICD + GET_SPI_REG_OFFSET(IPRIORITYR, is_espi) + 32 * i;
+ for ( irq = 0; irq < 8; irq++ )
+ irqs->ipriorityr[irq] = readl_relaxed(base + 4 * irq);
+
+ base = GICD + GET_SPI_REG_OFFSET(IROUTER, is_espi) + 256 * i;
+ for ( irq = 0; irq < 32; irq++ )
+ irqs->irouter[irq] = readq_relaxed_non_atomic(base + 8 * irq);
+
+ base = GICD + GET_SPI_REG_OFFSET(ISACTIVER, is_espi) + 4 * i;
+ irqs->isactiver = readl_relaxed(base);
+
+ base = GICD + GET_SPI_REG_OFFSET(ISENABLER, is_espi) + 4 * i;
+ irqs->isenabler = readl_relaxed(base);
+}
+
+static void gicv3_restore_spi_irq_block(typeof(gicv3_ctx.dist.irqs) irqs,
+ unsigned int i, bool is_espi)
+{
+ void __iomem *base;
+ unsigned int irq;
+
+ base = GICD + GET_SPI_REG_OFFSET(ICFGR, is_espi) + 8 * i;
+ writel_relaxed(irqs->icfgr[0], base);
+ writel_relaxed(irqs->icfgr[1], base + 4);
+
+ base = GICD + GET_SPI_REG_OFFSET(IPRIORITYR, is_espi) + 32 * i;
+ for ( irq = 0; irq < 8; irq++ )
+ writel_relaxed(irqs->ipriorityr[irq], base + 4 * irq);
+
+ base = GICD + GET_SPI_REG_OFFSET(IROUTER, is_espi) + 256 * i;
+ for ( irq = 0; irq < 32; irq++ )
+ writeq_relaxed_non_atomic(irqs->irouter[irq], base + 8 * irq);
+
+ base = GICD + GET_SPI_REG_OFFSET(ICENABLER, is_espi) + i * 4;
+ writel_relaxed(GENMASK(31, 0), base);
+
+ base = GICD + GET_SPI_REG_OFFSET(ISENABLER, is_espi) + i * 4;
+ writel_relaxed(irqs->isenabler, base);
+
+ base = GICD + GET_SPI_REG_OFFSET(ICACTIVER, is_espi) + i * 4;
+ writel_relaxed(GENMASK(31, 0), base);
+
+ base = GICD + GET_SPI_REG_OFFSET(ISACTIVER, is_espi) + i * 4;
+ writel_relaxed(irqs->isactiver, base);
+}
+
static int gicv3_suspend(void)
{
unsigned int i;
@@ -1871,6 +1945,14 @@ static int gicv3_suspend(void)
return -ENOMEM;
}
+#ifdef CONFIG_GICV3_ESPI
+ if ( gicv3_info.nr_espi && !gicv3_ctx.dist.espi_irqs )
+ {
+ printk(XENLOG_ERR "GICv3: eSPI suspend context is not allocated!\n");
+ return -ENOMEM;
+ }
+#endif
+
/* Save GICC configuration */
gicv3_ctx.cpu.ctlr = READ_SYSREG(ICC_CTLR_EL1);
gicv3_ctx.cpu.pmr = READ_SYSREG(ICC_PMR_EL1);
@@ -1903,25 +1985,12 @@ static int gicv3_suspend(void)
gicv3_ctx.dist.ctlr = readl_relaxed(GICD + GICD_CTLR);
for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
- {
- typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
- unsigned int irq;
+ gicv3_store_spi_irq_block(gicv3_ctx.dist.irqs + i - 1, i, false);
- base = GICD + GICD_ICFGR + 8 * i;
- irqs->icfgr[0] = readl_relaxed(base);
- irqs->icfgr[1] = readl_relaxed(base + 4);
-
- base = GICD + GICD_IPRIORITYR + 32 * i;
- for ( irq = 0; irq < 8; irq++ )
- irqs->ipriorityr[irq] = readl_relaxed(base + 4 * irq);
-
- base = GICD + GICD_IROUTER + 32 * i;
- for ( irq = 0; irq < 32; irq++ )
- irqs->irouter[irq] = readq_relaxed_non_atomic(base + 8 * irq);
-
- irqs->isactiver = readl_relaxed(GICD + GICD_ISACTIVER + 4 * i);
- irqs->isenabler = readl_relaxed(GICD + GICD_ISENABLER + 4 * i);
- }
+#ifdef CONFIG_GICV3_ESPI
+ for ( i = 0; i < gicv3_info.nr_espi / 32; i++ )
+ gicv3_store_spi_irq_block(gicv3_ctx.dist.espi_irqs + i, i, true);
+#endif
return 0;
}
@@ -1938,28 +2007,12 @@ static void gicv3_resume(void)
writel_relaxed(GENMASK(31, 0), GICD + GICD_IGROUPR + (i / 32) * 4);
for ( i = 1; i < DIV_ROUND_UP(gicv3_info.nr_lines, 32); i++ )
- {
- typeof(gicv3_ctx.dist.irqs) irqs = gicv3_ctx.dist.irqs + i - 1;
- unsigned int irq;
+ gicv3_restore_spi_irq_block(gicv3_ctx.dist.irqs + i - 1, i, false);
- base = GICD + GICD_ICFGR + 8 * i;
- writel_relaxed(irqs->icfgr[0], base);
- writel_relaxed(irqs->icfgr[1], base + 4);
-
- base = GICD + GICD_IPRIORITYR + 32 * i;
- for ( irq = 0; irq < 8; irq++ )
- writel_relaxed(irqs->ipriorityr[irq], base + 4 * irq);
-
- base = GICD + GICD_IROUTER + 32 * i;
- for ( irq = 0; irq < 32; irq++ )
- writeq_relaxed_non_atomic(irqs->irouter[irq], base + 8 * irq);
-
- writel_relaxed(GENMASK(31, 0), GICD + GICD_ICENABLER + i * 4);
- writel_relaxed(irqs->isenabler, GICD + GICD_ISENABLER + i * 4);
-
- writel_relaxed(GENMASK(31, 0), GICD + GICD_ICACTIVER + i * 4);
- writel_relaxed(irqs->isactiver, GICD + GICD_ISACTIVER + i * 4);
- }
+#ifdef CONFIG_GICV3_ESPI
+ for ( i = 0; i < gicv3_info.nr_espi / 32; i++ )
+ gicv3_restore_spi_irq_block(gicv3_ctx.dist.espi_irqs + i, i, true);
+#endif
writel_relaxed(gicv3_ctx.dist.ctlr, GICD + GICD_CTLR);
gicv3_dist_wait_for_rwp();
--
2.48.1
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64
2025-09-01 22:10 [PATCH v6 00/13] Add initial Xen Suspend-to-RAM support on ARM64 Mykola Kvach
` (12 preceding siblings ...)
2025-09-01 22:10 ` [PATCH v6 13/13] xen/arm: gic-v3: Add suspend/resume support for eSPI registers Mykola Kvach
@ 2025-09-02 20:48 ` Volodymyr Babchuk
13 siblings, 0 replies; 49+ messages in thread
From: Volodymyr Babchuk @ 2025-09-02 20:48 UTC (permalink / raw)
To: Mykola Kvach
Cc: xen-devel@lists.xenproject.org, Mykola Kvach, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Andrew Cooper,
Anthony PERARD, Jan Beulich, Roger Pau Monné, Rahul Singh
Hi Mykola,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> This is part 2 of version 5 of the ARM Xen system suspend/resume patch
> series, based on earlier work by Mirela Simonovic and Mykyta Poturai.
>
> The first part is here:
> https://marc.info/?l=xen-devel&m=175659181415965&w=2
>
> This version is ported to Xen master (4.21-unstable) and includes
> extensive improvements based on reviewer feedback. The patch series
> restructures code to improve robustness, maintainability, and implements
> system Suspend-to-RAM support on ARM64 hardware domains.
>
> At a high level, this patch series provides:
> - Support for Host system suspend/resume via PSCI SYSTEM_SUSPEND
> (ARM64)
I am wondering whether you had to split this into 3 patches. It looks like
patches 8 and 9 are useless without patch 10: they just add a bunch of dead
code. Maybe it would be better to squash them into one patch? I may be wrong
here, so maybe other reviewers/maintainers will correct me.
> - Suspend/resume infrastructure for CPU context, timers, GICv2/GICv3 and IPMMU-VMSA
> - Proper error propagation and recovery throughout the suspend/resume flow
>
> Key updates in this series:
> - Introduced architecture-specific suspend/resume infrastructure (new `suspend.c`, `suspend.h`, low-level context save/restore in `head.S`)
> - Integrated GICv2/GICv3 suspend and resume, including memory-backed context save/restore with error handling
> - Added time and IRQ suspend/resume hooks, ensuring correct timer/interrupt state across suspend cycles
> - Implemented proper PSCI SYSTEM_SUSPEND invocation and version checks
> - Improved state management and recovery in error cases during suspend/resume
> - Added support for IPMMU-VMSA context save/restore
> - Added support for GICv3 eSPI registers context save/restore
>
> ---
> TODOs:
> - Test system suspend with llc_coloring_enabled set and verify functionality
> - Implement SMMUv3 suspend/resume handlers
> - Enable "xl suspend" support on ARM
> - Properly disable Xen timer watchdog from relevant services (only init.d left)
> - Add suspend/resume CI test for ARM (QEMU if feasible)
> - Investigate feasibility and need for implementing system suspend on ARM32
> ---
>
> Changelog for v6:
> - Add suspend/resume support for GICv3 eSPI registers (to be applied after the
> main eSPI series).
> - Drop redundant iommu_enabled check from host system suspend.
> - Switch from continue_hypercall_on_cpu to a dedicated tasklet for system
> suspend, avoiding user register modification and decoupling guest/system
> suspend status.
> - Refactor IOMMU register context code.
> - Improve IRQ handling: call handler->disable(), move system state checks, and
> skip IRQ release during suspend inside release_irq().
> - Remove redundant GICv3 save/restore state logic now handled during vCPU
> context switch.
> - Clarify and unify error/warning messages, comments, and documentation.
> - Correct loops for saving/restoring priorities and merge loops where possible.
> - Add explicit error for unimplemented ITS suspend support.
> - Add missing GICD_CTLR_DS bit definition and clarify GICR_WAKER comments.
> - Cleanup active and enable registers before restoring.
> - Minor comment improvements and code cleanups.
>
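On "Cleanup active and enable registers before restoring": as far as I
understand, ISENABLER/ISACTIVER are write-1-to-set and
ICENABLER/ICACTIVER are write-1-to-clear, so writing the saved value
alone cannot drop bits that are set at resume time. A toy model of one
32-bit bank (plain memory, not real MMIO; all demo_* names are
hypothetical) shows why the clear has to come first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one 32-interrupt enable bank. */
struct demo_enable_bank {
    uint32_t enabled;
};

/* ISENABLER-style write: 1 bits are set, 0 bits are ignored. */
static void demo_write_set(struct demo_enable_bank *b, uint32_t v)
{
    b->enabled |= v;
}

/* ICENABLER-style write: 1 bits are cleared, 0 bits are ignored. */
static void demo_write_clear(struct demo_enable_bank *b, uint32_t v)
{
    b->enabled &= ~v;
}

/* Restore as done in the series: clear everything, then set the saved bits. */
static void demo_restore(struct demo_enable_bank *b, uint32_t saved)
{
    assert(b != NULL);

    demo_write_clear(b, UINT32_MAX);
    demo_write_set(b, saved);
}
```

Starting from enabled = 0xFF, demo_write_set(&bank, 0x0F) leaves 0xFF in
place, whereas demo_restore(&bank, 0x0F) ends with exactly 0x0F.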
> Changes introduced in V5:
> - Add support for IPMMU-VMSA context save/restore
> - Add support for GICv3 context save/restore
> - Select HAS_SYSTEM_SUSPEND in ARM_64 instead of ARM
> - Check llc_coloring_enabled instead of LLC_COLORING during the selection
> of HAS_SYSTEM_SUSPEND config
> - Call host_system_suspend from guest PSCI system suspend instead of
> arch_domain_shutdown, reducing the complexity of the new code
>
> Changes introduced in V4:
> - Remove the prior tasklet-based workaround in favor of a more
> straightforward and safer solution.
> - Rework the approach by adding explicit system state checks around
> request_irq and release_irq calls; skip these calls during suspend
> and resume states to avoid unsafe memory operations when IRQs are
> disabled.
> - Prevent reinitialization of local IRQ descriptors on system resume.
> - Restore the state of local IRQs during system resume for secondary CPUs.
> - Drop code for saving and restoring VCPU context (see part 1 of the patch
> series for details).
> - Remove IOMMU suspend and resume calls until these features are implemented.
> - Move system suspend logic to arch_domain_shutdown, invoked from
> domain_shutdown.
> - Add console_end_sync to the resume path after system suspend.
> - Drop unnecessary DAIF masking; interrupts are already masked on resume.
> - Remove leftover TLB flush instructions; flushing is handled in enable_mmu.
> - Avoid setting x19 in hyp_resume as it is not required.
> - Replace prepare_secondary_mm with set_init_ttbr, and call it from system_suspend.
> - Produce a build-time error for ARM32 when CONFIG_SYSTEM_SUSPEND is enabled.
> - Use register_t instead of uint64_t in the cpu_context structure.
> - Apply minor fixes such as renaming functions, updating comments, and
> modifying commit messages to accurately reflect the changes introduced
> by this patch series.
>
> For earlier changelogs, please refer to the previous cover letters.
>
> Previous versions:
> V1: https://marc.info/?l=xen-devel&m=154202231501850&w=2
> V2: https://marc.info/?l=xen-devel&m=166514782207736&w=2
> V3: https://lists.xen.org/archives/html/xen-devel/2025-03/msg00168.html
>
> Mirela Simonovic (6):
> xen/arm: Add suspend and resume timer helpers
> xen/arm: gic-v2: Implement GIC suspend/resume functions
> xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface)
> xen/arm: Resume memory management on Xen resume
> xen/arm: Save/restore context on suspend/resume
> xen/arm: Add support for system suspend triggered by hardware domain
>
> Mykola Kvach (5):
> xen/arm: gic-v3: Implement GICv3 suspend/resume functions
> xen/arm: Don't release IRQs on suspend
> xen/arm: irq: avoid local IRQ descriptors reinit on system resume
> xen/arm: irq: Restore state of local IRQs during system resume
> xen/arm: gic-v3: Add suspend/resume support for eSPI registers
>
> Oleksandr Tyshchenko (2):
> iommu/ipmmu-vmsa: Implement suspend/resume callbacks
> xen/arm: Suspend/resume IOMMU on Xen suspend/resume
>
> xen/arch/arm/Kconfig | 1 +
> xen/arch/arm/Makefile | 1 +
> xen/arch/arm/arm64/head.S | 112 +++++++++
> xen/arch/arm/gic-v2.c | 143 +++++++++++
> xen/arch/arm/gic-v3-lpi.c | 3 +
> xen/arch/arm/gic-v3.c | 288 +++++++++++++++++++++++
> xen/arch/arm/gic.c | 32 +++
> xen/arch/arm/include/asm/gic.h | 12 +
> xen/arch/arm/include/asm/gic_v3_defs.h | 1 +
> xen/arch/arm/include/asm/mm.h | 2 +
> xen/arch/arm/include/asm/psci.h | 1 +
> xen/arch/arm/include/asm/suspend.h | 46 ++++
> xen/arch/arm/include/asm/time.h | 5 +
> xen/arch/arm/irq.c | 46 ++++
> xen/arch/arm/mmu/smpboot.c | 2 +-
> xen/arch/arm/psci.c | 23 +-
> xen/arch/arm/suspend.c | 175 ++++++++++++++
> xen/arch/arm/tee/ffa_notif.c | 2 +-
> xen/arch/arm/time.c | 49 +++-
> xen/arch/arm/vpsci.c | 9 +-
> xen/common/domain.c | 4 +
> xen/drivers/passthrough/arm/ipmmu-vmsa.c | 257 ++++++++++++++++++++
> xen/drivers/passthrough/arm/smmu-v3.c | 10 +
> xen/drivers/passthrough/arm/smmu.c | 10 +
> 24 files changed, 1220 insertions(+), 14 deletions(-)
> create mode 100644 xen/arch/arm/include/asm/suspend.h
> create mode 100644 xen/arch/arm/suspend.c
--
WBR, Volodymyr
^ permalink raw reply [flat|nested] 49+ messages in thread