* [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac
@ 2025-01-24 14:52 Emanuele Rocca
2025-01-30 11:43 ` Will Deacon
2025-01-30 12:19 ` Mark Rutland
0 siblings, 2 replies; 15+ messages in thread
From: Emanuele Rocca @ 2025-01-24 14:52 UTC (permalink / raw)
To: linux-arm-kernel; +Cc: Mark Rutland
Hi,
longterm kernel 6.1.123 crashes early when booting on the Lenovo Thinkpad X13s
with the following error:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000264
pc: arm_smccc_version_init+0x90/0x1ac
According to faddr2line, that is line 31 of smccc.c:
arm_smccc_version_init+0x90/0x1ac:
arm_smccc_version_init at debian/build/build_arm64_none_arm64/drivers/firmware/smccc/smccc.c:31
22 void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
23 {
24 struct arm_smccc_res res;
25
26 smccc_version = version;
27 smccc_conduit = conduit;
28
29 smccc_trng_available = smccc_probe_trng();
30
31 if ((smccc_version >= ARM_SMCCC_VERSION_1_2) &&
This is with kernel 6.1.123. The last known good kernel I have available right
now is 6.1.119. In the 6.1.120 changelog I see the following commit which seems
potentially related?
https://lore.kernel.org/all/20241106160448.2712997-1-mark.rutland@arm.com/
That's stable commit [1].
The relevant upstream commit [2] is in linux 6.12, and that kernel version does
not crash. Comparing [1] vs [2] I see differences, but I can't tell if they can
help debug the issue further.
Thanks,
ema
[1] https://git.kernel.org/linus/bfcaffd4cc2d61ecb0571c5baf127c4089978ad4
[2] https://git.kernel.org/linus/8c462d56487e3abdbf8a61cedfe7c795a54f4a78
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-01-24 14:52 [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac Emanuele Rocca @ 2025-01-30 11:43 ` Will Deacon 2025-02-05 10:35 ` Emanuele Rocca 2025-01-30 12:19 ` Mark Rutland 1 sibling, 1 reply; 15+ messages in thread From: Will Deacon @ 2025-01-30 11:43 UTC (permalink / raw) To: Emanuele Rocca; +Cc: linux-arm-kernel, Mark Rutland On Fri, Jan 24, 2025 at 03:52:10PM +0100, Emanuele Rocca wrote: > longterm kernel 6.1.123 crashes early when booting on the Lenovo Thinkpad X13s > with the following error: > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000264 > > pc: arm_smccc_version_init+0x90/0x1ac Please can you share the full crash log? > According to faddr2line, that is line 31 of smccc.c: > > arm_smccc_version_init+0x90/0x1ac: > arm_smccc_version_init at debian/build/build_arm64_none_arm64/drivers/firmware/smccc/smccc.c:31 > > 22 void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit) > 23 { > 24 struct arm_smccc_res res; > 25 > 26 smccc_version = version; > 27 smccc_conduit = conduit; > 28 > 29 smccc_trng_available = smccc_probe_trng(); > 30 > 31 if ((smccc_version >= ARM_SMCCC_VERSION_1_2) && > > This is with kernel 6.1.123. The last known good kernel I have available right > now is 6.1.119. In the 6.1.120 changelog I see the following commit which seems > potentially related? > > https://lore.kernel.org/all/20241106160448.2712997-1-mark.rutland@arm.com/ > > That's stable commit [1]. I can't see anything wrong with that patch but if you have reason to suspect that it's broken can you try reverting it to see if the problem disappears? Will ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-01-30 11:43 ` Will Deacon @ 2025-02-05 10:35 ` Emanuele Rocca 2025-02-06 13:08 ` Will Deacon 0 siblings, 1 reply; 15+ messages in thread From: Emanuele Rocca @ 2025-02-05 10:35 UTC (permalink / raw) To: Will Deacon; +Cc: linux-arm-kernel, Mark Rutland Hello Will and Mark, On 2025-01-30 11:43, Will Deacon wrote: > On Fri, Jan 24, 2025 at 03:52:10PM +0100, Emanuele Rocca wrote: > > This is with kernel 6.1.123. The last known good kernel I have available right > > now is 6.1.119. In the 6.1.120 changelog I see the following commit which seems > > potentially related? > > > > https://lore.kernel.org/all/20241106160448.2712997-1-mark.rutland@arm.com/ > > > > That's stable commit [1]. > > I can't see anything wrong with that patch but if you have reason to > suspect that it's broken can you try reverting it to see if the problem > disappears? Confirmed: reverting the patch fixes the issue. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-02-05 10:35 ` Emanuele Rocca @ 2025-02-06 13:08 ` Will Deacon 0 siblings, 0 replies; 15+ messages in thread From: Will Deacon @ 2025-02-06 13:08 UTC (permalink / raw) To: Emanuele Rocca; +Cc: linux-arm-kernel, Mark Rutland On Wed, Feb 05, 2025 at 11:35:18AM +0100, Emanuele Rocca wrote: > On 2025-01-30 11:43, Will Deacon wrote: > > On Fri, Jan 24, 2025 at 03:52:10PM +0100, Emanuele Rocca wrote: > > > This is with kernel 6.1.123. The last known good kernel I have available right > > > now is 6.1.119. In the 6.1.120 changelog I see the following commit which seems > > > potentially related? > > > > > > https://lore.kernel.org/all/20241106160448.2712997-1-mark.rutland@arm.com/ > > > > > > That's stable commit [1]. > > > > I can't see anything wrong with that patch but if you have reason to > > suspect that it's broken can you try reverting it to see if the problem > > disappears? > > Confirmed: reverting the patch fixes the issue. I guess that's just a happy accident thanks to the clobber list changing on the smc asm with the sve check. Will ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-01-24 14:52 [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac Emanuele Rocca 2025-01-30 11:43 ` Will Deacon @ 2025-01-30 12:19 ` Mark Rutland 2025-01-30 14:56 ` Emanuele Rocca 1 sibling, 1 reply; 15+ messages in thread From: Mark Rutland @ 2025-01-30 12:19 UTC (permalink / raw) To: Emanuele Rocca; +Cc: linux-arm-kernel On Fri, Jan 24, 2025 at 03:52:10PM +0100, Emanuele Rocca wrote: > Hi, Hi Emanuele, > longterm kernel 6.1.123 crashes early when booting on the Lenovo Thinkpad X13s > with the following error: > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000264 > > pc: arm_smccc_version_init+0x90/0x1ac > > According to faddr2line, that is line 31 of smccc.c: > > arm_smccc_version_init+0x90/0x1ac: > arm_smccc_version_init at debian/build/build_arm64_none_arm64/drivers/firmware/smccc/smccc.c:31 > > 22 void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit) > 23 { > 24 struct arm_smccc_res res; > 25 > 26 smccc_version = version; > 27 smccc_conduit = conduit; > 28 > 29 smccc_trng_available = smccc_probe_trng(); > 30 > 31 if ((smccc_version >= ARM_SMCCC_VERSION_1_2) && For the benefit others, when we looked into this a few days ago it appeared that a GPR was being clobbered across an SMCCC call, resulting in a later crash (as that GPR should hold the ADRP'd base address of 'smccc_version'). I didn't have the time to dig more into that (e.g. to figure out whether kernel/compiler/firmware was to blame). Emanuele, could you please dump the result of: objdump --disassemble=arm_smccc_version_init vmlinux ... for this kernel? That'd make it possible for others to perform/verify the analysis I mentioned above. If you can share any more details from the crash, that'd be helpful. The GPR dump would be *enormously* helpful in this case, and even a photo of the crash log might be useful. > This is with kernel 6.1.123. The last known good kernel I have available right > now is 6.1.119. In the 6.1.120 changelog I see the following commit which seems > potentially related? > > https://lore.kernel.org/all/20241106160448.2712997-1-mark.rutland@arm.com/ Last I looked, there was no obvious reason why that should have an effect on this issue. It's possible that the differing asm constraints have an effect on code generation, and happen to mask the issue. From a quick scan, I note that the asm constraints *don't* include clobber x17, and maybe that's getting clobbered by a veneer between the BL and __arm_smccc_sve_check(). As above, it would really help to have the disassembly for arm_smccc_version_init(), and the GPRs at the time of the crash. Mark. > > That's stable commit [1]. > > The relevant upstream commit [2] is in linux 6.12, and that kernel version does > not crash. Comparing [1] vs [2] I see differences, but I can't tell if they can > help debug the issue further. > > Thanks, > ema > > [1] https://git.kernel.org/linus/bfcaffd4cc2d61ecb0571c5baf127c4089978ad4 > [2] https://git.kernel.org/linus/8c462d56487e3abdbf8a61cedfe7c795a54f4a78 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-01-30 12:19 ` Mark Rutland @ 2025-01-30 14:56 ` Emanuele Rocca 2025-01-31 12:41 ` Will Deacon 0 siblings, 1 reply; 15+ messages in thread From: Emanuele Rocca @ 2025-01-30 14:56 UTC (permalink / raw) To: Mark Rutland; +Cc: linux-arm-kernel On 2025-01-30 12:19, Mark Rutland wrote: > For the benefit others, when we looked into this a few days ago it > appeared that a GPR was being clobbered across an SMCCC call, resulting > in a later crash (as that GPR should hold the ADRP'd base address of > 'smccc_version'). I didn't have the time to dig more into that (e.g. to > figure out whether kernel/compiler/firmware was to blame). > > Emanuele, could you please dump the result of: > > objdump --disassemble=arm_smccc_version_init vmlinux > > ... for this kernel? That'd make it possible for others to > perform/verify the analysis I mentioned above. Sure, here it is: https://people.debian.org/~ema/Z5OpGluX6oX5NLxh@NH27D9T0LF.objdump.txt > If you can share any more details from the crash, that'd be helpful. The > GPR dump would be *enormously* helpful in this case, and even a photo of > the crash log might be useful. pc : arm_smccc_version_init+0x90/0x1ac lr : psci_probe+0x1fc/0x2c0 sp : ffffa3d8eea33ca0 x29: ffffa3d8eea33ce0 x28: 0000000081000200 x27: ffffa3d8edd72a80 x26: ffffa3d8edd728b0 x25: ffffa3d8eec66248 x24: ffffa3d8edc528d8 x23: ffffa3d8eed7f4f8 x22: 0000000000000001 x21: ffffa3d8eea58410 x20: ffffa3d8eed7f000 x19: 0000000000010001 x18: 0000000000000006 x17: 0000000009f69730 x16: 00000008760968d0 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 00000000ffffefff x10: ffffa3d8eeac09c8 x9 : ffffa3d8eea687d8 x8 : ffffa3d8eea33cb8 x7 : 0000000000000000 x6 : 0000000000000000 x5 : ffffa3d8eed7f000 x4 : ffffa3d8edeb0000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000000 Call trace: arm_smccc_version_init+0x90/0x1ac psci_0_2_init+0x20/0x2c psci_1_0_init+0x1c/0x58 psci_dt_init+0x6c/0x98 setup_arch+0x400/0x5ec start_kernel+0x90/0x788 __primary_switched+0xbc/0xc4 Code: d29fffe1 eb01001f 1a9f97e0 f0ffd384 (b94264c2) The above was transcribed by hand. I double-checked, but there may be mistakes. Photo here: https://people.debian.org/~ema/Z5OpGluX6oX5NLxh@NH27D9T0LF.jpg Thanks, ema ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-01-30 14:56 ` Emanuele Rocca @ 2025-01-31 12:41 ` Will Deacon 2025-01-31 13:54 ` Mark Rutland 0 siblings, 1 reply; 15+ messages in thread From: Will Deacon @ 2025-01-31 12:41 UTC (permalink / raw) To: Emanuele Rocca; +Cc: Mark Rutland, linux-arm-kernel On Thu, Jan 30, 2025 at 03:56:10PM +0100, Emanuele Rocca wrote: > On 2025-01-30 12:19, Mark Rutland wrote: > > For the benefit others, when we looked into this a few days ago it > > appeared that a GPR was being clobbered across an SMCCC call, resulting > > in a later crash (as that GPR should hold the ADRP'd base address of > > 'smccc_version'). I didn't have the time to dig more into that (e.g. to > > figure out whether kernel/compiler/firmware was to blame). > > > > Emanuele, could you please dump the result of: > > > > objdump --disassemble=arm_smccc_version_init vmlinux > > > > ... for this kernel? That'd make it possible for others to > > perform/verify the analysis I mentioned above. > > Sure, here it is: > https://people.debian.org/~ema/Z5OpGluX6oX5NLxh@NH27D9T0LF.objdump.txt > > > If you can share any more details from the crash, that'd be helpful. The > > GPR dump would be *enormously* helpful in this case, and even a photo of > > the crash log might be useful. > > pc : arm_smccc_version_init+0x90/0x1ac > lr : psci_probe+0x1fc/0x2c0 > sp : ffffa3d8eea33ca0 > x29: ffffa3d8eea33ce0 x28: 0000000081000200 x27: ffffa3d8edd72a80 > x26: ffffa3d8edd728b0 x25: ffffa3d8eec66248 x24: ffffa3d8edc528d8 > x23: ffffa3d8eed7f4f8 x22: 0000000000000001 x21: ffffa3d8eea58410 > x20: ffffa3d8eed7f000 x19: 0000000000010001 x18: 0000000000000006 > x17: 0000000009f69730 x16: 00000008760968d0 x15: 0000000000000000 > x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 > x11: 00000000ffffefff x10: ffffa3d8eeac09c8 x9 : ffffa3d8eea687d8 > x8 : ffffa3d8eea33cb8 x7 : 0000000000000000 x6 : 0000000000000000 > x5 : ffffa3d8eed7f000 x4 : ffffa3d8edeb0000 x3 : 0000000000000000 > x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000000 > > Call trace: > arm_smccc_version_init+0x90/0x1ac > psci_0_2_init+0x20/0x2c > psci_1_0_init+0x1c/0x58 > psci_dt_init+0x6c/0x98 > setup_arch+0x400/0x5ec > start_kernel+0x90/0x788 > __primary_switched+0xbc/0xc4 > Code: d29fffe1 eb01001f 1a9f97e0 f0ffd384 (b94264c2) > > The above was transcribed by hand. I double-checked, but there may be > mistakes. > > Photo here: https://people.debian.org/~ema/Z5OpGluX6oX5NLxh@NH27D9T0LF.jpg Crikey, you transcribed all that junk by _hand_?! Thanks for the effort, but next time please don't waste your time! The screenshot is fine. Anyway, it looks like x6 gets trashed across the SMC. I think that's fine per the SMCCC spec: | Unused result and scratch registers can leak information after an | SMC or HVC call. An implementation can mitigate this risk by either | preserving the register state over the call, or returning a constant | value,such as zero, in each register. so the question is whether or not the compiler should be relying on the register being preserved. It looks to me like the arm_smccc_1_1_smc() macro is actually broken in this regard: it constructs the input list based on the number of arguments passed and doesn't take the above into account at all. So I think we should change that so unused arguments are clobbered instead. Mark -- what do you think? Will ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-01-31 12:41 ` Will Deacon @ 2025-01-31 13:54 ` Mark Rutland 2025-02-04 10:00 ` Will Deacon 0 siblings, 1 reply; 15+ messages in thread From: Mark Rutland @ 2025-01-31 13:54 UTC (permalink / raw) To: Will Deacon; +Cc: Emanuele Rocca, linux-arm-kernel On Fri, Jan 31, 2025 at 12:41:04PM +0000, Will Deacon wrote: > On Thu, Jan 30, 2025 at 03:56:10PM +0100, Emanuele Rocca wrote: > > On 2025-01-30 12:19, Mark Rutland wrote: > > > For the benefit others, when we looked into this a few days ago it > > > appeared that a GPR was being clobbered across an SMCCC call, resulting > > > in a later crash (as that GPR should hold the ADRP'd base address of > > > 'smccc_version'). I didn't have the time to dig more into that (e.g. to > > > figure out whether kernel/compiler/firmware was to blame). > > > > > > Emanuele, could you please dump the result of: > > > > > > objdump --disassemble=arm_smccc_version_init vmlinux > > > > > > ... for this kernel? That'd make it possible for others to > > > perform/verify the analysis I mentioned above. > > > > Sure, here it is: > > https://people.debian.org/~ema/Z5OpGluX6oX5NLxh@NH27D9T0LF.objdump.txt > > > > > If you can share any more details from the crash, that'd be helpful. The > > > GPR dump would be *enormously* helpful in this case, and even a photo of > > > the crash log might be useful. > > > > pc : arm_smccc_version_init+0x90/0x1ac > > lr : psci_probe+0x1fc/0x2c0 > > sp : ffffa3d8eea33ca0 > > x29: ffffa3d8eea33ce0 x28: 0000000081000200 x27: ffffa3d8edd72a80 > > x26: ffffa3d8edd728b0 x25: ffffa3d8eec66248 x24: ffffa3d8edc528d8 > > x23: ffffa3d8eed7f4f8 x22: 0000000000000001 x21: ffffa3d8eea58410 > > x20: ffffa3d8eed7f000 x19: 0000000000010001 x18: 0000000000000006 > > x17: 0000000009f69730 x16: 00000008760968d0 x15: 0000000000000000 > > x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 > > x11: 00000000ffffefff x10: ffffa3d8eeac09c8 x9 : ffffa3d8eea687d8 > > x8 : ffffa3d8eea33cb8 x7 : 0000000000000000 x6 : 0000000000000000 > > x5 : ffffa3d8eed7f000 x4 : ffffa3d8edeb0000 x3 : 0000000000000000 > > x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000000 > > > > Call trace: > > arm_smccc_version_init+0x90/0x1ac > > psci_0_2_init+0x20/0x2c > > psci_1_0_init+0x1c/0x58 > > psci_dt_init+0x6c/0x98 > > setup_arch+0x400/0x5ec > > start_kernel+0x90/0x788 > > __primary_switched+0xbc/0xc4 > > Code: d29fffe1 eb01001f 1a9f97e0 f0ffd384 (b94264c2) > > > > The above was transcribed by hand. I double-checked, but there may be > > mistakes. > > > > Photo here: https://people.debian.org/~ema/Z5OpGluX6oX5NLxh@NH27D9T0LF.jpg > > Crikey, you transcribed all that junk by _hand_?! Thanks for the effort, > but next time please don't waste your time! The screenshot is fine. > > Anyway, it looks like x6 gets trashed across the SMC. I think that's fine > per the SMCCC spec: > > | Unused result and scratch registers can leak information after an > | SMC or HVC call. An implementation can mitigate this risk by either > | preserving the register state over the call, or returning a constant > | value,such as zero, in each register. I think that's misleading/stale and x6 specifically is not supposed to be clobbered. Shortly before that there's wording about preserving the registers, summary/history below: * SMCCC v1.0 allowed x0 to x17 to be clobbered over a call; all other GPRs had to be preserved. That was defined in ARM DEN 0028B, AKA "SMC CALLING CONVENTION": https://developer.arm.com/documentation/den0028/b/?lang=en * SMCCC v1.1 mandated that x4 to x17 were preserved. This was critical for some of the ARCH_WORKAROUND_* calls made from kernel/kvm entry asm. That was defined in ARM DEN 0070, AKA "Firmware interfaces for mitigating cache speculation vulnerabilities": https://developer.arm.com/cache-speculation-vulnerability-firmware-specification * SMCCC v1.2 relaxed things such that x4 to x17 must be preserved *unless* they contained results of the function. That was defined in ARM DEN 0028C, AKA "SMC CALLING CONVENTION": https://developer.arm.com/documentation/den0028/c/?lang=en .. which folded inthe requirements from ARM DEN 0070, with the relaxation above, but also kept the stale not from ARM DEN 0028B. In Linux we depend on x4 to x17 not being clobbered *unless* they're being used to return a result. The current wording in the spec is messed up, and suggests that SMC32 calls could clobber the upper 32 bits of x4 to x17, which is inccorect and would break the ARCH_WORKAROUND_* calls mentioend above (given those are SMC32). I'll go chase the SMCCC architects about that; it's clearly an unintentional error when reformatting the document. The *intent* is that x4 to x17 are preserved here. AFAICT, nothing in arm_smccc_version_init() returns a value in x4 to x17. If any of those are clobbered for an SMCCCv1.1 call, it's definitely a firmware bug. > so the question is whether or not the compiler should be relying on the > register being preserved. It looks to me like the arm_smccc_1_1_smc() > macro is actually broken in this regard: it constructs the input list > based on the number of arguments passed and doesn't take the above into > account at all. As above, I think that arm_smccc_1_1_smc() is correct; it *always* clobbers x0 to x3, and cannot be used for SMCCCv1.2+ calls that return values in x4 to x17. That doesn't change the fact that we'll need a workaround, though. > So I think we should change that so unused arguments are clobbered > instead. Mark -- what do you think? As above, if FW gets this wrong for calls made from entry asm, the results could be catastrophic, so it'd be good if we could detect this and log a FW BUG the first time we hit it. We'd have to assume that any of x0 to x17 could be clobbered, and at that point I suspect it'd be better to move this out-of-line for almost all callers, which we effectively already have in the form of the arm_smccc_1_2_{smc,hvc}() trampolines. We could bounce arm_smccc_1_1_*() calls through those. Mark. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-01-31 13:54 ` Mark Rutland @ 2025-02-04 10:00 ` Will Deacon 2025-02-05 16:50 ` Emanuele Rocca 0 siblings, 1 reply; 15+ messages in thread From: Will Deacon @ 2025-02-04 10:00 UTC (permalink / raw) To: Mark Rutland; +Cc: Emanuele Rocca, linux-arm-kernel On Fri, Jan 31, 2025 at 01:54:57PM +0000, Mark Rutland wrote: > On Fri, Jan 31, 2025 at 12:41:04PM +0000, Will Deacon wrote: > > Anyway, it looks like x6 gets trashed across the SMC. I think that's fine > > per the SMCCC spec: > > > > | Unused result and scratch registers can leak information after an > > | SMC or HVC call. An implementation can mitigate this risk by either > > | preserving the register state over the call, or returning a constant > > | value,such as zero, in each register. > > I think that's misleading/stale and x6 specifically is not supposed to > be clobbered. Shortly before that there's wording about preserving the > registers, summary/history below: > > * SMCCC v1.0 allowed x0 to x17 to be clobbered over a call; all other > GPRs had to be preserved. > > That was defined in ARM DEN 0028B, AKA "SMC CALLING CONVENTION": > https://developer.arm.com/documentation/den0028/b/?lang=en > > * SMCCC v1.1 mandated that x4 to x17 were preserved. This was critical > for some of the ARCH_WORKAROUND_* calls made from kernel/kvm entry > asm. > > That was defined in ARM DEN 0070, AKA "Firmware interfaces for > mitigating cache speculation vulnerabilities": > > https://developer.arm.com/cache-speculation-vulnerability-firmware-specification I can't find "DEN 0070" anywhere there... > * SMCCC v1.2 relaxed things such that x4 to x17 must be preserved > *unless* they contained results of the function. > > That was defined in ARM DEN 0028C, AKA "SMC CALLING CONVENTION": > https://developer.arm.com/documentation/den0028/c/?lang=en > > .. which folded inthe requirements from ARM DEN 0070, with the > relaxation above, but also kept the stale not from ARM DEN 0028B. > > In Linux we depend on x4 to x17 not being clobbered *unless* they're > being used to return a result. > > The current wording in the spec is messed up, and suggests that SMC32 > calls could clobber the upper 32 bits of x4 to x17, which is inccorect > and would break the ARCH_WORKAROUND_* calls mentioend above (given those > are SMC32). I'll go chase the SMCCC architects about that; it's clearly > an unintentional error when reformatting the document. > > The *intent* is that x4 to x17 are preserved here. AFAICT, nothing in > arm_smccc_version_init() returns a value in x4 to x17. If any of those > are clobbered for an SMCCCv1.1 call, it's definitely a firmware bug. Perhaps, but I must admit that I have sympathy for the f/w developers here given that the latest spec on the Arm website says that they can do this and the spec that apparently requires the registers to be preserved is either missing or hard to find. How are they supposed to know what the intention was when the documentation definitively says something else? Emanuele -- could you hack the code to poison the other unused result registers () and see if they are also cleared? ARM_SMCCC_TRNG_VERSION looks like a 32-bit call, so that would be W1-W7 afaict. Will ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-02-04 10:00 ` Will Deacon @ 2025-02-05 16:50 ` Emanuele Rocca 2025-02-06 13:11 ` Will Deacon 0 siblings, 1 reply; 15+ messages in thread From: Emanuele Rocca @ 2025-02-05 16:50 UTC (permalink / raw) To: Will Deacon; +Cc: Mark Rutland, linux-arm-kernel Hello Will, On 2025-02-04 10:00, Will Deacon wrote: > Emanuele -- could you hack the code to poison the other unused result > registers () and see if they are also cleared? ARM_SMCCC_TRNG_VERSION > looks like a 32-bit call, so that would be W1-W7 afaict. Not sure if this is exactly what you are asking for, but right before the call to smccc_probe_trng(): https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/firmware/smccc/smccc.c?h=v6.1.124#n29 I did the following to write to all registers. First I tried W1-W7, then I went for W1-W17 too: register unsigned long w1 asm("w1"); [...] register unsigned long w17 asm("w17"); asm volatile( "mov w1, #0x1234\n" "mov w2, #0x2234\n" [...] "mov w16, #0x0234\n" "mov w17, #0x1234\n" ); The values I wrote were not overwritten, see https://people.debian.org/~ema/w1-w17.jpg ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-02-05 16:50 ` Emanuele Rocca @ 2025-02-06 13:11 ` Will Deacon 2025-02-06 16:57 ` Emanuele Rocca 0 siblings, 1 reply; 15+ messages in thread From: Will Deacon @ 2025-02-06 13:11 UTC (permalink / raw) To: Emanuele Rocca; +Cc: Mark Rutland, linux-arm-kernel On Wed, Feb 05, 2025 at 05:50:13PM +0100, Emanuele Rocca wrote: > Hello Will, > > On 2025-02-04 10:00, Will Deacon wrote: > > Emanuele -- could you hack the code to poison the other unused result > > registers () and see if they are also cleared? ARM_SMCCC_TRNG_VERSION > > looks like a 32-bit call, so that would be W1-W7 afaict. > > Not sure if this is exactly what you are asking for, but right before > the call to smccc_probe_trng(): > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/firmware/smccc/smccc.c?h=v6.1.124#n29 > > I did the following to write to all registers. First I tried W1-W7, then > I went for W1-W17 too: > > register unsigned long w1 asm("w1"); > [...] > register unsigned long w17 asm("w17"); > > asm volatile( > "mov w1, #0x1234\n" > "mov w2, #0x2234\n" > [...] > "mov w16, #0x0234\n" > "mov w17, #0x1234\n" > ); > > The values I wrote were not overwritten, see https://people.debian.org/~ema/w1-w17.jpg Hrm, now I'm confused :/ In your screenshot, x6 looks like it's retained its poison value, but that was the register being corrupted in the initial report. Maybe you could share the diff you made? Will ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-02-06 13:11 ` Will Deacon @ 2025-02-06 16:57 ` Emanuele Rocca 2025-03-13 22:08 ` Will Deacon 0 siblings, 1 reply; 15+ messages in thread From: Emanuele Rocca @ 2025-02-06 16:57 UTC (permalink / raw) To: Will Deacon; +Cc: Mark Rutland, linux-arm-kernel On 2025-02-06 01:11, Will Deacon wrote: > In your screenshot, x6 looks like it's retained its poison value, but > that was the register being corrupted in the initial report. Maybe you > could share the diff you made? Sure, please see [1]. A kernel built with that patch crashes and all registers retain their poison values. I now also tried moving the poisoning before the smccc_version / smccc_conduit assignments, and the resulting kernel does *not* crash. See [2]. [1] https://people.debian.org/~ema/w1-w17-crash.diff [2] https://people.debian.org/~ema/w1-w17-nocrash.diff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-02-06 16:57 ` Emanuele Rocca @ 2025-03-13 22:08 ` Will Deacon 2025-09-26 10:36 ` Dan Carpenter 0 siblings, 1 reply; 15+ messages in thread From: Will Deacon @ 2025-03-13 22:08 UTC (permalink / raw) To: Emanuele Rocca; +Cc: Mark Rutland, linux-arm-kernel On Thu, Feb 06, 2025 at 05:57:41PM +0100, Emanuele Rocca wrote: > On 2025-02-06 01:11, Will Deacon wrote: > > In your screenshot, x6 looks like it's retained its poison value, but > > that was the register being corrupted in the initial report. Maybe you > > could share the diff you made? > > Sure, please see [1]. A kernel built with that patch crashes and all > registers retain their poison values. Okey doke, at least it sounds like the firmware isn't taking advantage of the broken spec, then. In any case, I think the right way forward here is to use the out-of-line 1.2 helpers whenever we can (the spectre mitigations can stay as they are). Will ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-03-13 22:08 ` Will Deacon @ 2025-09-26 10:36 ` Dan Carpenter 2025-09-26 11:03 ` Will Deacon 0 siblings, 1 reply; 15+ messages in thread From: Dan Carpenter @ 2025-09-26 10:36 UTC (permalink / raw) To: Will Deacon; +Cc: Emanuele Rocca, Mark Rutland, linux-arm-kernel On Thu, Mar 13, 2025 at 10:08:36PM +0000, Will Deacon wrote: > On Thu, Feb 06, 2025 at 05:57:41PM +0100, Emanuele Rocca wrote: > > On 2025-02-06 01:11, Will Deacon wrote: > > > In your screenshot, x6 looks like it's retained its poison value, but > > > that was the register being corrupted in the initial report. Maybe you > > > could share the diff you made? > > > > Sure, please see [1]. A kernel built with that patch crashes and all > > registers retain their poison values. > > Okey doke, at least it sounds like the firmware isn't taking advantage > of the broken spec, then. > > In any case, I think the right way forward here is to use the > out-of-line 1.2 helpers whenever we can (the spectre mitigations can > stay as they are). > > Will > Any updates on this? I guess it only affects gcc13? Clang seems to boot okay. regards, dan carpenter ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac 2025-09-26 10:36 ` Dan Carpenter @ 2025-09-26 11:03 ` Will Deacon 0 siblings, 0 replies; 15+ messages in thread From: Will Deacon @ 2025-09-26 11:03 UTC (permalink / raw) To: Dan Carpenter; +Cc: Emanuele Rocca, Mark Rutland, linux-arm-kernel On Fri, Sep 26, 2025 at 01:36:34PM +0300, Dan Carpenter wrote: > On Thu, Mar 13, 2025 at 10:08:36PM +0000, Will Deacon wrote: > > On Thu, Feb 06, 2025 at 05:57:41PM +0100, Emanuele Rocca wrote: > > > On 2025-02-06 01:11, Will Deacon wrote: > > > > In your screenshot, x6 looks like it's retained its poison value, but > > > > that was the register being corrupted in the initial report. Maybe you > > > > could share the diff you made? > > > > > > Sure, please see [1]. A kernel built with that patch crashes and all > > > registers retain their poison values. > > > > Okey doke, at least it sounds like the firmware isn't taking advantage > > of the broken spec, then. > > > > In any case, I think the right way forward here is to use the > > out-of-line 1.2 helpers whenever we can (the spectre mitigations can > > stay as they are). > > > > Will > > > > Any updates on this? I guess it only affects gcc13? Clang seems to boot > okay. I don't recall seeing a patch but we should just move to the out-of-line SMCCC helpers wherever we can. The inline macros are fiddly to use, especially when you have to pad things out with zeroes at the caller to prevent registers from getting corrupted. Will ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-09-26 11:04 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-24 14:52 [BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac Emanuele Rocca 2025-01-30 11:43 ` Will Deacon 2025-02-05 10:35 ` Emanuele Rocca 2025-02-06 13:08 ` Will Deacon 2025-01-30 12:19 ` Mark Rutland 2025-01-30 14:56 ` Emanuele Rocca 2025-01-31 12:41 ` Will Deacon 2025-01-31 13:54 ` Mark Rutland 2025-02-04 10:00 ` Will Deacon 2025-02-05 16:50 ` Emanuele Rocca 2025-02-06 13:11 ` Will Deacon 2025-02-06 16:57 ` Emanuele Rocca 2025-03-13 22:08 ` Will Deacon 2025-09-26 10:36 ` Dan Carpenter 2025-09-26 11:03 ` Will Deacon
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox