From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: Mykola Kvach <xakep.amatop@gmail.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Mykola Kvach <Mykola_Kvach@epam.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Julien Grall <julien@xen.org>,
	Bertrand Marquis <bertrand.marquis@arm.com>,
	Michal Orzel <michal.orzel@amd.com>
Subject: Re: [PATCH v5 04/12] xen/arm: Prevent crash during disable_nonboot_cpus on suspend
Date: Sat, 23 Aug 2025 00:36:52 +0000
Message-ID: <87qzx2an6k.fsf@epam.com>
In-Reply-To: <98957da5c5068ae8340a21a9aa15a962905a8a22.1754943874.git.mykola_kvach@epam.com> (Mykola Kvach's message of "Mon, 11 Aug 2025 23:48:00 +0300")


Hi Mykola,

Mykola Kvach <xakep.amatop@gmail.com> writes:

While I approve of the change, the commit message is somewhat
unclear. Maybe "Don't release IRQs on suspend" would be better?

> From: Mykola Kvach <mykola_kvach@epam.com>
>
> If we call disable_nonboot_cpus on ARM64 with system_state set
> to SYS_STATE_suspend, the following assertion will be triggered:
>
> ```
> (XEN) [   25.582712] Disabling non-boot CPUs ...
> (XEN) [   25.587032] Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:714
> [...]
> (XEN) [   25.975069] Xen call trace:
> (XEN) [   25.978353]    [<00000a000022e098>] xfree+0x130/0x1a4 (PC)
> (XEN) [   25.984314]    [<00000a000022e08c>] xfree+0x124/0x1a4 (LR)
> (XEN) [   25.990276]    [<00000a00002747d4>] release_irq+0xe4/0xe8
> (XEN) [   25.996152]    [<00000a0000278588>] time.c#cpu_time_callback+0x44/0x60
> (XEN) [   26.003150]    [<00000a000021d678>] notifier_call_chain+0x7c/0xa0
> (XEN) [   26.009717]    [<00000a00002018e0>] cpu.c#cpu_notifier_call_chain+0x24/0x48
> (XEN) [   26.017148]    [<00000a000020192c>] cpu.c#_take_cpu_down+0x28/0x34
> (XEN) [   26.023801]    [<00000a0000201944>] cpu.c#take_cpu_down+0xc/0x18
> (XEN) [   26.030281]    [<00000a0000225c5c>] stop_machine.c#stopmachine_action+0xbc/0xe4
> (XEN) [   26.038057]    [<00000a00002264bc>] tasklet.c#do_tasklet_work+0xb8/0x100
> (XEN) [   26.045229]    [<00000a00002268a4>] do_tasklet+0x68/0xb0
> (XEN) [   26.051018]    [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
> (XEN) [   26.057585]    [<00000a0000277e30>] start_secondary+0x21c/0x220
> (XEN) [   26.063978]    [<00000a0000361258>] 00000a0000361258
> ```
>
> This happens because before invoking take_cpu_down via the stop_machine_run
> function on the target CPU, stop_machine_run requests
> the STOPMACHINE_DISABLE_IRQ state on that CPU. Releasing memory in
> the release_irq function then triggers the assertion:
>
> /*
>  * Heap allocations may need TLB flushes which may require IRQs to be
>  * enabled (except when only 1 PCPU is online).
>  */
>
> This patch adds system state checks to guard calls to request_irq
> and release_irq. These calls are now skipped when system_state is
> SYS_STATE_{resume,suspend}, preventing unsafe operations during
> suspend/resume handling.

If any call to release_irq() during suspend triggers the ASSERT, and it
is fine to leave the IRQs as they are during suspend, maybe it would be
easier to put

+        if ( system_state == SYS_STATE_suspend )
+            return;

straight into release_irq() itself? That would be easier than playing
whack-a-mole whenever some other patch adds another release_irq() call
somewhere.
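
To make the idea concrete, here is a minimal standalone sketch. It is
not the actual Xen code: the model_* functions, the irqs_enabled and
online_cpus variables, the run_demo() helper and the IRQ numbers are
all hypothetical stand-ins, and the xfree() model only checks the
IRQ-related part of the assertion from the trace (in_irq() is left
out). It only shows that one early return inside the release path
covers every caller:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for Xen's system_state; only the states relevant here. */
enum system_states { SYS_STATE_active, SYS_STATE_suspend, SYS_STATE_resume };
static enum system_states system_state = SYS_STATE_active;

/* Hypothetical model of the CPU/IRQ state the assertion looks at. */
static bool irqs_enabled = true;
static unsigned int online_cpus = 4;
static unsigned int freed_actions;

/* Models the IRQ-related part of the condition xfree() asserts. */
static void model_xfree(void)
{
    assert(irqs_enabled || online_cpus <= 1);
    freed_actions++;
}

/* Hypothetical release_irq() with the proposed early return. */
static void model_release_irq(unsigned int irq)
{
    (void)irq;

    /* Keep the IRQ action allocated across suspend; nothing to free. */
    if ( system_state == SYS_STATE_suspend )
        return;

    model_xfree();
}

/* Returns how many actions were freed over the two scenarios. */
static unsigned int run_demo(void)
{
    freed_actions = 0;
    system_state = SYS_STATE_active;
    irqs_enabled = true;

    /* Regular CPU offlining with IRQs enabled: the action is freed. */
    model_release_irq(26);

    /* Suspend path: stop_machine has disabled IRQs on the dying CPU. */
    system_state = SYS_STATE_suspend;
    irqs_enabled = false;
    model_release_irq(27); /* skipped, so the assertion is never reached */

    return freed_actions;
}
```

With the check in one place, callers such as the CPU_DYING handlers
would not need their own system_state tests on the release side.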


>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in V4:
>   - removed the prior tasklet-based workaround in favor of a more
>     straightforward and safer solution
>   - reworked the approach by adding explicit system state checks around
>     request_irq and release_irq calls, skips these calls during suspend
>     and resume states to avoid unsafe memory operations when IRQs are
>     disabled
> ---
>  xen/arch/arm/gic.c           |  6 ++++++
>  xen/arch/arm/tee/ffa_notif.c |  2 +-
>  xen/arch/arm/time.c          | 18 ++++++++++++------
>  3 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index a018bd7715..9856cb1592 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -388,6 +388,9 @@ void gic_dump_info(struct vcpu *v)
>  
>  void init_maintenance_interrupt(void)
>  {
> +    if ( system_state == SYS_STATE_resume )
> +        return;
> +
>      request_irq(gic_hw_ops->info->maintenance_irq, 0, maintenance_interrupt,
>                  "irq-maintenance", NULL);
>  }
> @@ -461,6 +464,9 @@ static int cpu_gic_callback(struct notifier_block *nfb,
>      switch ( action )
>      {
>      case CPU_DYING:
> +        if ( system_state == SYS_STATE_suspend )
> +            break;
> +
>          /* This is reverting the work done in init_maintenance_interrupt */
>          release_irq(gic_hw_ops->info->maintenance_irq, NULL);
>          break;
> diff --git a/xen/arch/arm/tee/ffa_notif.c b/xen/arch/arm/tee/ffa_notif.c
> index 00efaf8f73..06f715a82b 100644
> --- a/xen/arch/arm/tee/ffa_notif.c
> +++ b/xen/arch/arm/tee/ffa_notif.c
> @@ -347,7 +347,7 @@ void ffa_notif_init_interrupt(void)
>  {
>      int ret;
>  
> -    if ( notif_enabled && notif_sri_irq < NR_GIC_SGI )
> +    if ( notif_enabled && notif_sri_irq < NR_GIC_SGI && system_state != SYS_STATE_resume )
>      {
>          /*
>           * An error here is unlikely since the primary CPU has already
> diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> index ad984fdfdd..b2e07ade43 100644
> --- a/xen/arch/arm/time.c
> +++ b/xen/arch/arm/time.c
> @@ -320,10 +320,13 @@ void init_timer_interrupt(void)
>      WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
>      disable_physical_timers();
>  
> -    request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
> -                "hyptimer", NULL);
> -    request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt,
> -                   "virtimer", NULL);
> +    if ( system_state != SYS_STATE_resume )
> +    {
> +        request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
> +                    "hyptimer", NULL);
> +        request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt,
> +                    "virtimer", NULL);
> +    }
>  
>      check_timer_irq_cfg(timer_irq[TIMER_HYP_PPI], "hypervisor");
>      check_timer_irq_cfg(timer_irq[TIMER_VIRT_PPI], "virtual");
> @@ -338,8 +341,11 @@ static void deinit_timer_interrupt(void)
>  {
>      disable_physical_timers();
>  
> -    release_irq(timer_irq[TIMER_HYP_PPI], NULL);
> -    release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
> +    if ( system_state != SYS_STATE_suspend )
> +    {
> +        release_irq(timer_irq[TIMER_HYP_PPI], NULL);
> +        release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
> +    }
>  }
>  
>  /* Wait a set number of microseconds */

-- 
WBR, Volodymyr
