Linux-ARM-Kernel Archive on lore.kernel.org
From: Sven Peter <sven@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>,
	Yureka Lilian <yureka@cyberchaos.dev>
Cc: Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, asahi@lists.linux.dev,
	Sasha Finkelstein <k@chaosmail.tech>
Subject: Re: [PATCH v2] arm64: errata: Handle Apple WFI State Loss
Date: Fri, 19 Jun 2026 14:40:22 +0200
Message-ID: <c13d95d0-911f-48f6-b4ca-8a5e9aacef6c@kernel.org>
In-Reply-To: <ajUbcv75FuSPPi9o@J2N7QTR9R3>



On 6/19/26 12:38, Mark Rutland wrote:
> On Wed, Jun 17, 2026 at 09:23:03PM +0200, Yureka Lilian wrote:
>> On 6/15/26 17:02, Will Deacon wrote:
>>> On Mon, Jun 15, 2026 at 02:21:36PM +0200, Yureka Lilian wrote:
>>>> Apple Silicon CPUs can lose register state in WFI, leading to crashes
>>>> in the idle loop early in the boot process.
>>>> This applies to any previous Apple Silicon CPUs too, but is worked
>>>> around by configuring the WFI mode in SYS_IMP_APL_CYC_OVRD sysreg
>>>> during m1n1's chickens setup.
>>>> This workaround no longer exists since M4.
> 
> Are we *certain* that there's no equivalent control elsewhere? i.e. this
> hasn't just moved?

We are as certain as we can be short of Apple confirming this, which
isn't going to happen.
XNU has a helper function to "force wfi to use clock gating only" [1] 
which is how we learned about this control originally on M1.
This has been disabled starting with M4 using the "NO_CPU_OVRD" define 
which they describe as "CPU_OVRD register accesses are banned" [2]. If 
there was an equivalent control elsewhere I would expect them to just 
use that one instead.

In addition, most non-architectural sysregs are read-only starting with
M4 in the non-Apple-entitled boot mode so even if there was such a 
control we would likely not be able to access it.


[1] 
https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/osfmk/arm64/machine_routines_asm.s#L1129
[2] 
https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/pexpert/pexpert/arm64/board_config.h#L197

> 
>>>> Add a workaround capability for replacing wfi and wfit with nop, and
>>>> an erratum to enable it on the affected CPUs if the workaround using the
>>>> sysreg is not already applied. Leave the decision whether the sysreg
[...]
>>>> +	} while (0)
>>> How can you guarantee that we don't run one of these prior to patching?
>>
>> We can't, but there are a few points to our advantage, namely the boot cpu
>> isn't actually affected by this (when the CYC_OVRD bits are not configured
>> or not supported), and first round of patching happens quite early before
>> the other cpus are started.
> 
> I think you're saying that:
> 
> * On the boot CPU, WFI *never* loses register state.
> 
> * On other CPUs, WFI *might* lose register state (and this cannot be
>    inhibited).
> 
> Is that understanding correct, or are there other conditions where a WFI
> on the boot CPU can lose register state?

Those are our current observations, yes. We don't know why the boot CPU
behaves differently, and there are no differences in any Apple sysregs
that would explain it.

But looking at all wfis in the kernel, there are a bunch in head.S and
similar for infinite loops where we don't care if register state is
lost. The only two that currently matter are a wfit in __delay and the
wfi in the idle loop.
The __delay one gets enabled after arm64_features are found, which
happens just before arm64_errata from setup_boot_cpu_features(), and
there's no __delay call in between that and when alternatives are
applied. If we follow Will's suggestion with an early_param, that happens
much earlier as well.
My understanding is that the idle loop won't be reached before 
sched_init() and that also happens much later.
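
To make the ordering argument concrete, here is a rough userspace sketch.
It is not kernel code: the struct, the table and the helper are all
hypothetical names, and the event order is an assumption taken from the
description above rather than from a boot trace. It only checks that every
wait whose caller needs its registers back comes after the step that
rewrites wfi/wfit to nop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One step of early boot. All names here are illustrative, not kernel API. */
struct boot_event {
	const char *name;
	bool state_sensitive_wait; /* runs a wfi/wfit whose caller needs its registers back */
	bool applies_patch;        /* rewrites wfi/wfit to nop */
};

/* Assumed ordering, taken from the description above rather than from a trace. */
static const struct boot_event boot_order[] = {
	{ "head.S style infinite-loop wfi",        false, false },
	{ "detect arm64_features, arm64_errata",   false, false },
	{ "apply alternatives (wfi/wfit -> nop)",  false, true  },
	{ "__delay using wfit",                    true,  false },
	{ "sched_init(), later the idle loop wfi", true,  false },
};

#define BOOT_ORDER_LEN (sizeof(boot_order) / sizeof(boot_order[0]))

/*
 * Return the index of the first state-sensitive wait that runs before
 * patching, or -1 if every such wait comes after the patch step.
 */
static int first_unpatched_wait(const struct boot_event *ev, size_t n)
{
	bool patched = false;

	assert(ev != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (ev[i].applies_patch)
			patched = true;
		else if (ev[i].state_sensitive_wait && !patched)
			return (int)i;
	}
	return -1;
}
```

With the table as written, first_unpatched_wait(boot_order, BOOT_ORDER_LEN)
finds no offending entry. Moving the __delay entry above the alternatives
entry makes it report that entry's index, which is the case we would have
to rule out by inspection in the real boot path.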

> 
> IIRC kdump doesn't ensure the new kernel is started on the boot CPU, so
> I think that would be broken. I guess you can't kexec generally due to a
> lack of offlining of secondary CPUs.

Next to that, kexec also runs into issues with all the various 
co-processors which we can't easily reset or shut down once they've been 
brought up once.


Sven


