Re: [PATCH] cpu: fix hard lockup triggered during stress-ng stress testing.

From: Catalin Marinas <catalin.marinas@arm.com>
To: shechenglong <shechenglong@xfusion.com>
Cc: will@kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, stone.xulei@xfusion.com,
	chenjialong@xfusion.com, yuxiating@xfusion.com
Subject: Re: [PATCH] cpu: fix hard lockup triggered during stress-ng stress testing.
Date: Thu, 18 Sep 2025 12:28:05 +0100
Message-ID: <aMvsxd8nHb5roC0o@arm.com>
In-Reply-To: <20250918064907.1832-1-shechenglong@xfusion.com>

On Thu, Sep 18, 2025 at 02:49:07PM +0800, shechenglong wrote:
> Context of the Issue:
> In an ARM64 environment, the following steps were performed:
> 
> 1. Repeatedly ran stress-ng to stress the CPU, memory, and I/O.
> 2. Cyclically executed test case pty06 from the LTP test suite.
> 3. Added mitigations=off to the GRUB parameters.
> 
> After 1–2 hours of stress testing, a hardlockup occurred,
> causing a system crash.
> 
> Root Cause of the Hardlockup:
> Each time stress-ng starts, it invokes the /sys/kernel/debug/clear_warn_once
> interface, which clears the values in the memory section from __start_once
> to __end_once. This caused functions like pr_info_once() — originally
> designed to print only once — to print again every time stress-ng was called.
> If the pty06 test case happened to be using the serial module at that same
> moment, it would sleep in waiter.list within the __down_common function.
> 
> After pr_info_once() completed its output using the serial module,
> it invoked the semaphore up() function to wake up the process waiting
> in waiter.list. This sequence triggered an A-A deadlock, ultimately
> leading to a hardlockup and system crash.
> 
> To prevent this, a local variable should be used to control and ensure
> the print operation occurs only once.
> 
> Hard lockup call stack:
> 
> _raw_spin_lock_nested+168
> ttwu_queue+180 (rq_lock(rq, &rf); 2nd acquiring the rq->__lock)
> try_to_wake_up+548
> wake_up_process+32
> __up+88
> up+100
> __up_console_sem+96
> console_unlock+696
> vprintk_emit+428
> vprintk_default+64
> vprintk_func+220
> printk+104
> spectre_v4_enable_task_mitigation+344
> __switch_to+100
> __schedule+1028 (rq_lock(rq, &rf); 1st acquiring the rq->__lock)
> schedule_idle+48
> do_idle+388
> cpu_startup_entry+44
> secondary_start_kernel+352

Is the problem actually that we call the spectre v4 stuff on the
switch_to() path (we can't change this) under the rq_lock() and it
subsequently calls printk() which takes the console semaphore? I think
the "once" aspect makes it less likely but does not address the actual
problem.

> diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c
> index edf1783ffc81..f8663157e041 100644
> --- a/arch/arm64/kernel/proton-pack.c
> +++ b/arch/arm64/kernel/proton-pack.c
> @@ -424,8 +424,10 @@ static bool spectre_v4_mitigations_off(void)
>  	bool ret = cpu_mitigations_off() ||
>  		   __spectre_v4_policy == SPECTRE_V4_POLICY_MITIGATION_DISABLED;
>  
> -	if (ret)
> -		pr_info_once("spectre-v4 mitigation disabled by command-line option\n");
> +	static atomic_t __printk_once = ATOMIC_INIT(0);
> +
> +	if (ret && !atomic_cmpxchg(&__printk_once, 0, 1))
> +		pr_info("spectre-v4 mitigation disabled by command-line option\n");
>  
>  	return ret;
>  }

I think we should just avoid the printk() on the
spectre_v4_enable_task_mitigation() path. Well, I'd remove it altogether
from the spectre_v4_mitigations_off() as it's called on kernel entry as
well. Just add a different way to print the status during kernel boot if
there isn't one already, maybe an initcall.
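
To illustrate the shape of that suggestion, here is a minimal user-space
sketch, not the actual proton-pack.c change. All names ending in _model,
the mitigations_off_cmdline flag and the messages_printed counter are
hypothetical stand-ins. The idea is that the predicate used on the
context-switch path stays free of printing, and a separate function,
which in the kernel would presumably be registered through one of the
initcall macros, reports the status once from boot context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state; not the real arm64 code. */
static bool mitigations_off_cmdline = true;	/* models mitigations=off */
static int messages_printed;			/* counts status messages */

/*
 * Hot-path query: a pure predicate with no printing, so it is safe to
 * call from a path that already holds a scheduler lock.
 */
static bool spectre_v4_mitigations_off_model(void)
{
	return mitigations_off_cmdline;
}

/*
 * Boot-time reporter: called once from init context, where taking the
 * console lock cannot recurse into the scheduler path shown above.
 */
static int spectre_v4_report_status_model(void)
{
	if (spectre_v4_mitigations_off_model()) {
		printf("spectre-v4 mitigation disabled by command-line option\n");
		messages_printed++;
	}
	return 0;
}

/* Models the context-switch path querying the predicate n times. */
static int simulate_context_switches(int n)
{
	int off = 0;

	for (int i = 0; i < n; i++)
		if (spectre_v4_mitigations_off_model())
			off++;
	return off;
}

/* Sanity check: many hot-path calls, exactly one message. */
static void self_check(void)
{
	messages_printed = 0;
	spectre_v4_report_status_model();
	assert(simulate_context_switches(1000) == 1000);
	assert(messages_printed == 1);
}
```

With this split, clearing the "once" state through
/sys/kernel/debug/clear_warn_once has nothing to re-arm on the
switch_to() path, because that path no longer prints at all.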

-- 
Catalin

Thread overview (20+ messages):
2025-09-18  6:49 [PATCH] cpu: fix hard lockup triggered during stress-ng stress testing shechenglong
2025-09-18 11:28 ` Catalin Marinas [this message]
2025-09-19 12:05   ` Reply: " shechenglong
2025-09-22 16:54     ` Catalin Marinas
2025-09-22 16:08   ` Mark Rutland
2025-09-24 12:32 ` [PATCH] cpu: fix hard lockup triggered by printk calls within scheduling context shechenglong
2025-09-25 13:48   ` Catalin Marinas
2025-10-03 14:23   ` Will Deacon
2025-10-20 14:51 ` [PATCH v2 0/2] arm64: spectre: Fix hard lockup and cleanup mitigation messages shechenglong
2025-10-20 14:51   ` [PATCH v2 1/2] cpu:Remove the print when the CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY Kconfig option is disabled shechenglong
2025-10-20 14:51   ` [PATCH v2 2/2] cpu: fix hard lockup triggered by printk calls within scheduling context shechenglong
2025-10-29  3:45 ` [RESEND v2 0/2] arm64: spectre: Fix hard lockup and cleanup mitigation messages shechenglong
2025-10-29  3:45   ` [PATCH v2 1/2] cpu:Remove the print when the CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY Kconfig option is disabled shechenglong
2025-10-30 14:48     ` Will Deacon
2025-10-29  3:45   ` [PATCH v2 2/2] cpu: fix hard lockup triggered by printk calls within scheduling context shechenglong
2025-10-30 14:50     ` Will Deacon
2025-10-31  9:15 ` [PATCH v3 0/2] arm64: spectre: Fix hard lockup and cleanup mitigation messages shechenglong
2025-10-31  9:15   ` [PATCH v3 1/2] cpu:Remove the print when the CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY Kconfig option is disabled shechenglong
2025-10-31  9:15   ` [PATCH v3 2/2] cpu: fix hard lockup triggered by printk calls within scheduling context shechenglong
2025-11-07 15:53   ` [PATCH v3 0/2] arm64: spectre: Fix hard lockup and cleanup mitigation messages Will Deacon
