From: Catalin Marinas <catalin.marinas@arm.com>
To: shechenglong <shechenglong@xfusion.com>
Cc: will@kernel.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, stone.xulei@xfusion.com,
chenjialong@xfusion.com, yuxiating@xfusion.com
Subject: Re: [PATCH] cpu: fix hard lockup triggered during stress-ng stress testing.
Date: Thu, 18 Sep 2025 12:28:05 +0100
Message-ID: <aMvsxd8nHb5roC0o@arm.com>
In-Reply-To: <20250918064907.1832-1-shechenglong@xfusion.com>

On Thu, Sep 18, 2025 at 02:49:07PM +0800, shechenglong wrote:
> Context of the Issue:
> In an ARM64 environment, the following steps were performed:
>
> 1. Repeatedly ran stress-ng to stress the CPU, memory, and I/O.
> 2. Cyclically executed test case pty06 from the LTP test suite.
> 3. Added mitigations=off to the GRUB parameters.
>
> After 1–2 hours of stress testing, a hardlockup occurred,
> causing a system crash.
>
> Root Cause of the Hardlockup:
> Each time stress-ng starts, it invokes the /sys/kernel/debug/clear_warn_once
> interface, which clears the values in the memory section from __start_once
> to __end_once. This caused functions like pr_info_once() — originally
> designed to print only once — to print again every time stress-ng was called.
> If the pty06 test case happened to be using the serial module at that same
> moment, it would sleep in waiter.list within the __down_common function.
>
> After pr_info_once() completed its output using the serial module,
> it invoked the semaphore up() function to wake up the process waiting
> in waiter.list. This sequence triggered an A-A deadlock, ultimately
> leading to a hardlockup and system crash.
>
> To prevent this, a local variable should be used to control and ensure
> the print operation occurs only once.
>
> Hard lockup call stack:
>
> _raw_spin_lock_nested+168
> ttwu_queue+180 (rq_lock(rq, &rf); 2nd acquiring the rq->__lock)
> try_to_wake_up+548
> wake_up_process+32
> __up+88
> up+100
> __up_console_sem+96
> console_unlock+696
> vprintk_emit+428
> vprintk_default+64
> vprintk_func+220
> printk+104
> spectre_v4_enable_task_mitigation+344
> __switch_to+100
> __schedule+1028 (rq_lock(rq, &rf); 1st acquiring the rq->__lock)
> schedule_idle+48
> do_idle+388
> cpu_startup_entry+44
> secondary_start_kernel+352

Is the problem actually that we call the spectre v4 stuff on the
switch_to() path (we can't change this) under the rq_lock() and it
subsequently calls printk() which takes the console semaphore? I think
the "once" aspect makes it less likely but does not address the actual
problem.
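
To make the lock ordering in the call stack above concrete, here is a
rough single-CPU userspace model, not kernel code. Every name in it
(model_lock, model_rq_lock, model_printk, model_context_switch and so
on) is hypothetical and only stands in for the corresponding step of the
trace; the model assumes the wake-up targets the same runqueue whose
lock is already held.

```c
#include <stdbool.h>

/* Toy model of a non-recursive spinlock (stand-in for rq->__lock). */
struct model_lock {
	bool held;
};

/* Returns false where a real spinlock would spin forever (A-A deadlock). */
static bool model_lock_acquire(struct model_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void model_lock_release(struct model_lock *l)
{
	l->held = false;
}

static struct model_lock model_rq_lock;
static int model_deadlocks;	/* number of would-be self-deadlocks */

/* Models up() -> wake_up_process() -> ttwu_queue(): needs the rq lock. */
static void model_wake_console_waiter(void)
{
	if (!model_lock_acquire(&model_rq_lock)) {
		model_deadlocks++;	/* 2nd acquisition by the same CPU */
		return;
	}
	model_lock_release(&model_rq_lock);
}

/* Models printk() -> console_unlock() -> up(): wakes a waiter if any. */
static void model_printk(bool console_has_waiter)
{
	if (console_has_waiter)
		model_wake_console_waiter();
}

/* Models __schedule() -> __switch_to() with a print in the switch path. */
static void model_context_switch(bool prints, bool console_has_waiter)
{
	model_lock_acquire(&model_rq_lock);	/* 1st acquisition */
	if (prints)
		model_printk(console_has_waiter);
	model_lock_release(&model_rq_lock);
}
```

In this model a "print once" flag only changes how often prints is true;
any single print with a waiter on the console semaphore is enough to hit
the second acquisition.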

> diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c
> index edf1783ffc81..f8663157e041 100644
> --- a/arch/arm64/kernel/proton-pack.c
> +++ b/arch/arm64/kernel/proton-pack.c
> @@ -424,8 +424,10 @@ static bool spectre_v4_mitigations_off(void)
>  	bool ret = cpu_mitigations_off() ||
>  		   __spectre_v4_policy == SPECTRE_V4_POLICY_MITIGATION_DISABLED;
>  
> -	if (ret)
> -		pr_info_once("spectre-v4 mitigation disabled by command-line option\n");
> +	static atomic_t __printk_once = ATOMIC_INIT(0);
> +
> +	if (ret && !atomic_cmpxchg(&__printk_once, 0, 1))
> +		pr_info("spectre-v4 mitigation disabled by command-line option\n");
>  
>  	return ret;
>  }

I think we should just avoid the printk() on the
spectre_v4_enable_task_mitigation() path. Well, I'd remove it altogether
from spectre_v4_mitigations_off(), as that is called on kernel entry as
well. Just add a different way to print the status during kernel boot if
there isn't one already, maybe an initcall.
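
The shape of that suggestion can be sketched in userspace as follows.
This is only an illustration under stated assumptions, not the actual
proton-pack.c change: model_cmdline_mitigations_off,
model_spectre_v4_report_init() and model_messages_printed are made-up
names, and printf() stands in for pr_info(). In the kernel, the reporter
role could be played by an initcall.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state; not real proton-pack.c symbols. */
static bool model_cmdline_mitigations_off;
static int model_messages_printed;

/* Hot-path predicate: no logging, so it is safe to call under any lock. */
static bool model_spectre_v4_mitigations_off(void)
{
	return model_cmdline_mitigations_off;
}

/* Boot-time reporter: runs once, outside the scheduler and entry paths. */
static int model_spectre_v4_report_init(void)
{
	if (model_spectre_v4_mitigations_off()) {
		printf("spectre-v4 mitigation disabled by command-line option\n");
		model_messages_printed++;
	}
	return 0;
}
```

With this split, the number of messages no longer depends on how often
the predicate is evaluated, and clearing the "once" state has no effect
on the context-switch path.
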
--
Catalin