* [PATCH] arm64: traps: Mark kernel as tainted on SError panic
@ 2025-07-10 10:46 Breno Leitao
2025-07-13 22:46 ` Will Deacon
0 siblings, 1 reply; 5+ messages in thread
From: Breno Leitao @ 2025-07-10 10:46 UTC
To: Catalin Marinas, Will Deacon
Cc: linux-arm-kernel, linux-kernel, kernel-team, osandov, leo.yan,
rmikey, Breno Leitao
Set TAINT_MACHINE_CHECK when SError interrupts trigger a panic to
flag potential hardware faults. This tainting mechanism aids in
debugging and enables correlation of hardware-related crashes in
large-scale deployments.
This change aligns with similar patches[1] that mark machine check
events when the system crashes due to hardware errors.
Link: https://lore.kernel.org/all/20250702-add_tain-v1-1-9187b10914b9@debian.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
---
arch/arm64/kernel/traps.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 9bfa5c944379d..7468b22585cef 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
 
 void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
 {
+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
 	console_verbose();
 
 	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
---
base-commit: 8c2e52ebbe885c7eeaabd3b7ddcdc1246fc400d2
change-id: 20250710-arm_serror-77fca8d732d4
Best regards,
--
Breno Leitao <leitao@debian.org>
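
For readers who want to correlate such crashes from userspace, the taint
state can be read back from /proc/sys/kernel/tainted, where
TAINT_MACHINE_CHECK is bit 4 (value 16, flag letter 'M'). The following is
only a minimal sketch and not part of the patch: the helper names
taint_has_machine_check() and read_taint_mask() are hypothetical, and it
assumes the file holds a single decimal mask.

```c
#include <stdio.h>

/* Bit position of TAINT_MACHINE_CHECK in the kernel taint mask (flag 'M'). */
#define TAINT_MACHINE_CHECK_BIT 4

/* Hypothetical helper: return 1 if the machine-check bit is set, else 0. */
static int taint_has_machine_check(unsigned long mask)
{
	return (int)((mask >> TAINT_MACHINE_CHECK_BIT) & 1UL);
}

/*
 * Hypothetical helper: read a decimal taint mask from a procfs-style file.
 * Returns 0 on success and -1 if the file cannot be opened or parsed.
 */
static int read_taint_mask(const char *path, unsigned long *mask)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;
	parsed = fscanf(f, "%lu", mask);
	fclose(f);
	return parsed == 1 ? 0 : -1;
}
```

A caller would pass "/proc/sys/kernel/tainted" to read_taint_mask() and then
test the result with taint_has_machine_check(). Since both paths touched in
this thread end in a panic or die, the mask is more likely to be inspected in
a crash dump than on a live system.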
* Re: [PATCH] arm64: traps: Mark kernel as tainted on SError panic
2025-07-10 10:46 [PATCH] arm64: traps: Mark kernel as tainted on SError panic Breno Leitao
@ 2025-07-13 22:46 ` Will Deacon
2025-07-14 12:26 ` Breno Leitao
0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2025-07-13 22:46 UTC
To: Breno Leitao
Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, kernel-team,
osandov, leo.yan, rmikey
On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
> Set TAINT_MACHINE_CHECK when SError interrupts trigger a panic to
> flag potential hardware faults. This tainting mechanism aids in
> debugging and enables correlation of hardware-related crashes in
> large-scale deployments.
>
> This change aligns with similar patches[1] that mark machine check
> events when the system crashes due to hardware errors.
>
> Link: https://lore.kernel.org/all/20250702-add_tain-v1-1-9187b10914b9@debian.org/ [1]
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> arch/arm64/kernel/traps.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 9bfa5c944379d..7468b22585cef 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
>
> void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
> {
> + add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> console_verbose();
>
> pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
If we're going to taint for SError, shouldn't we also taint for an
unclaimed SEA?
Will
* Re: [PATCH] arm64: traps: Mark kernel as tainted on SError panic
2025-07-13 22:46 ` Will Deacon
@ 2025-07-14 12:26 ` Breno Leitao
2025-07-15 14:02 ` Will Deacon
0 siblings, 1 reply; 5+ messages in thread
From: Breno Leitao @ 2025-07-14 12:26 UTC
To: Will Deacon
Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, kernel-team,
osandov, leo.yan, rmikey
Hello Will,
On Sun, Jul 13, 2025 at 11:46:06PM +0100, Will Deacon wrote:
> On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
> > --- a/arch/arm64/kernel/traps.c
> > +++ b/arch/arm64/kernel/traps.c
> > @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
> >
> > void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
> > {
> > + add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> > console_verbose();
> >
> > pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
>
> If we're going to taint for SError, shouldn't we also taint for an
> unclaimed SEA?
Yes. I was not very familiar with SEA errors, given that I haven't seen one
in production yet, but from reading about them, this is another path that
crashes the system due to hardware errors, so we want to taint with
MACHINE_CHECK there as well.
What about this?
Author: Breno Leitao <leitao@debian.org>
Date: Mon Jul 14 05:16:55 2025 -0700
arm64: Taint kernel on fatal hardware error in do_sea()
This patch updates the do_sea() handler to taint the kernel with
TAINT_MACHINE_CHECK when a fatal hardware error is detected and
reported through Synchronous External Abort (SEA). By marking
the kernel as tainted at the point of error, we improve
post-mortem diagnostics and make it clear that a machine check
or unrecoverable hardware fault has occurred.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 11eb8d1adc84..f590dc71ce99 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -838,6 +838,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
 		 */
 		siaddr = untagged_addr(far);
 	}
+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
 	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
 
 	return 0;
Thanks for the suggestion,
--breno
* Re: [PATCH] arm64: traps: Mark kernel as tainted on SError panic
2025-07-14 12:26 ` Breno Leitao
@ 2025-07-15 14:02 ` Will Deacon
2025-07-15 16:09 ` Breno Leitao
0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2025-07-15 14:02 UTC
To: Breno Leitao
Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, kernel-team,
osandov, leo.yan, rmikey
On Mon, Jul 14, 2025 at 05:26:43AM -0700, Breno Leitao wrote:
> On Sun, Jul 13, 2025 at 11:46:06PM +0100, Will Deacon wrote:
> > On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
>
> > > --- a/arch/arm64/kernel/traps.c
> > > +++ b/arch/arm64/kernel/traps.c
> > > @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
> > >
> > > void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
> > > {
> > > + add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> > > console_verbose();
> > >
> > > pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
> >
> > If we're going to taint for SError, shouldn't we also taint for an
> > unclaimed SEA?
>
> Yes. I was not very familiar with SEA errors, given that I haven't seen one
> in production yet, but from reading about them, this is another path that
> crashes the system due to hardware errors, so we want to taint with
> MACHINE_CHECK there as well.
>
> What about this?
>
> Author: Breno Leitao <leitao@debian.org>
> Date: Mon Jul 14 05:16:55 2025 -0700
>
> arm64: Taint kernel on fatal hardware error in do_sea()
>
> This patch updates the do_sea() handler to taint the kernel with
> TAINT_MACHINE_CHECK when a fatal hardware error is detected and
> reported through Synchronous External Abort (SEA). By marking
> the kernel as tainted at the point of error, we improve
> post-mortem diagnostics and make it clear that a machine check
> or unrecoverable hardware fault has occurred.
>
> Suggested-by: Will Deacon <will@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>
>
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 11eb8d1adc84..f590dc71ce99 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -838,6 +838,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
> */
> siaddr = untagged_addr(far);
> }
> + add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
>
> return 0;
Yeah, I reckon so. Probably just fold these into a single patch, though.
Cheers,
Will
* Re: [PATCH] arm64: traps: Mark kernel as tainted on SError panic
2025-07-15 14:02 ` Will Deacon
@ 2025-07-15 16:09 ` Breno Leitao
0 siblings, 0 replies; 5+ messages in thread
From: Breno Leitao @ 2025-07-15 16:09 UTC
To: Will Deacon
Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, kernel-team,
osandov, leo.yan, rmikey
On Tue, Jul 15, 2025 at 03:02:13PM +0100, Will Deacon wrote:
> On Mon, Jul 14, 2025 at 05:26:43AM -0700, Breno Leitao wrote:
> > On Sun, Jul 13, 2025 at 11:46:06PM +0100, Will Deacon wrote:
> > > On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
> >
> > > > --- a/arch/arm64/kernel/traps.c
> > > > +++ b/arch/arm64/kernel/traps.c
> > > > @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
> > > >
> > > > void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
> > > > {
> > > > + add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> > > > console_verbose();
> > > >
> > > > pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
> > >
> > > If we're going to taint for SError, shouldn't we also taint for an
> > > unclaimed SEA?
> >
> > Yes. I was not very familiar with SEA errors, given that I haven't seen one
> > in production yet, but from reading about them, this is another path that
> > crashes the system due to hardware errors, so we want to taint with
> > MACHINE_CHECK there as well.
> >
> > What about this?
> >
> > Author: Breno Leitao <leitao@debian.org>
> > Date: Mon Jul 14 05:16:55 2025 -0700
> >
> > arm64: Taint kernel on fatal hardware error in do_sea()
> >
> > This patch updates the do_sea() handler to taint the kernel with
> > TAINT_MACHINE_CHECK when a fatal hardware error is detected and
> > reported through Synchronous External Abort (SEA). By marking
> > the kernel as tainted at the point of error, we improve
> > post-mortem diagnostics and make it clear that a machine check
> > or unrecoverable hardware fault has occurred.
> >
> > Suggested-by: Will Deacon <will@kernel.org>
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> >
> > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > index 11eb8d1adc84..f590dc71ce99 100644
> > --- a/arch/arm64/mm/fault.c
> > +++ b/arch/arm64/mm/fault.c
> > @@ -838,6 +838,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
> > */
> > siaddr = untagged_addr(far);
> > }
> > + add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> > arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
> >
> > return 0;
>
> Yeah, I reckon so. Probably just fold these into a single patch, though.
Thanks. I'll test it more thoroughly tomorrow and then send it.
Thanks for the suggestions,
--breno