* [PATCH] arm64: traps: Mark kernel as tainted on SError panic
From: Breno Leitao @ 2025-07-10 10:46 UTC
To: Catalin Marinas, Will Deacon
Cc: linux-arm-kernel, linux-kernel, kernel-team, osandov, leo.yan, rmikey, Breno Leitao

Set TAINT_MACHINE_CHECK when SError interrupts trigger a panic to
flag potential hardware faults. This tainting mechanism aids in
debugging and enables correlation of hardware-related crashes in
large-scale deployments.

This change aligns with similar patches[1] that mark machine check
events when the system crashes due to hardware errors.

Link: https://lore.kernel.org/all/20250702-add_tain-v1-1-9187b10914b9@debian.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 arch/arm64/kernel/traps.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 9bfa5c944379d..7468b22585cef 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
 
 void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
 {
+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
 	console_verbose();
 
 	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",

---
base-commit: 8c2e52ebbe885c7eeaabd3b7ddcdc1246fc400d2
change-id: 20250710-arm_serror-77fca8d732d4

Best regards,
--
Breno Leitao <leitao@debian.org>
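For readers who want to correlate such crashes, the following is a minimal userspace sketch of how a monitoring tool might test a taint mask for the machine-check flag. It assumes TAINT_MACHINE_CHECK is bit 4 of the mask (the 'M' flag in oops output) and that the mask is read as a decimal number from /proc/sys/kernel/tainted or taken from a crash dump. The helper names taint_has_machine_check() and read_taint_mask() are hypothetical and are not part of the patch.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Assumption: TAINT_MACHINE_CHECK is bit 4 of the kernel taint mask
 * (reported as 'M' in oops/panic output).
 */
#define TAINT_MACHINE_CHECK_BIT 4

/* Hypothetical helper: true if the machine-check bit is set in mask. */
static bool taint_has_machine_check(unsigned long mask)
{
	return (mask >> TAINT_MACHINE_CHECK_BIT) & 1UL;
}

/*
 * Hypothetical helper: read a decimal taint mask from a file such as
 * /proc/sys/kernel/tainted.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_taint_mask(const char *path, unsigned long *mask)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;

	parsed = fscanf(f, "%lu", mask);
	fclose(f);

	return parsed == 1 ? 0 : -1;
}
```

A caller would pass "/proc/sys/kernel/tainted" to read_taint_mask() and then hand the resulting mask to taint_has_machine_check(); a mask recovered from a vmcore can be checked with the second helper alone.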
* Re: [PATCH] arm64: traps: Mark kernel as tainted on SError panic
From: Will Deacon @ 2025-07-13 22:46 UTC
To: Breno Leitao
Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, kernel-team, osandov, leo.yan, rmikey

On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
> Set TAINT_MACHINE_CHECK when SError interrupts trigger a panic to
> flag potential hardware faults. This tainting mechanism aids in
> debugging and enables correlation of hardware-related crashes in
> large-scale deployments.
>
> This change aligns with similar patches[1] that mark machine check
> events when the system crashes due to hardware errors.
>
> Link: https://lore.kernel.org/all/20250702-add_tain-v1-1-9187b10914b9@debian.org/ [1]
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  arch/arm64/kernel/traps.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 9bfa5c944379d..7468b22585cef 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
>
>  void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
>  {
> +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
>  	console_verbose();
>
>  	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",

If we're going to taint for SError, shouldn't we also taint for an
unclaimed SEA?

Will
* Re: [PATCH] arm64: traps: Mark kernel as tainted on SError panic
From: Breno Leitao @ 2025-07-14 12:26 UTC
To: Will Deacon
Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, kernel-team, osandov, leo.yan, rmikey

Hello Will,

On Sun, Jul 13, 2025 at 11:46:06PM +0100, Will Deacon wrote:
> On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:

> > --- a/arch/arm64/kernel/traps.c
> > +++ b/arch/arm64/kernel/traps.c
> > @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
> >
> >  void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
> >  {
> > +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> >  	console_verbose();
> >
> >  	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
>
> If we're going to taint for SError, shouldn't we also taint for an
> unclaimed SEA?

Yes. I was not very familiar with SEA errors, given I haven't seen one
in production yet, but, reading about it, that is another error that
seems to crash the system due to hardware errors; thus, we want to
taint MACHINE_CHECK.

What about this?

	Author: Breno Leitao <leitao@debian.org>
	Date:   Mon Jul 14 05:16:55 2025 -0700

	    arm64: Taint kernel on fatal hardware error in do_sea()

	    This patch updates the do_sea() handler to taint the kernel with
	    TAINT_MACHINE_CHECK when a fatal hardware error is detected and
	    reported through Synchronous External Abort (SEA). By marking
	    the kernel as tainted at the point of error, we improve
	    post-mortem diagnostics and make it clear that a machine check
	    or unrecoverable hardware fault has occurred.

	    Suggested-by: Will Deacon <will@kernel.org>
	    Signed-off-by: Breno Leitao <leitao@debian.org>

	diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
	index 11eb8d1adc84..f590dc71ce99 100644
	--- a/arch/arm64/mm/fault.c
	+++ b/arch/arm64/mm/fault.c
	@@ -838,6 +838,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
	 		 */
	 		siaddr = untagged_addr(far);
	 	}
	+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
	 	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
	 
	 	return 0;

Thanks for the suggestion,
--breno
* Re: [PATCH] arm64: traps: Mark kernel as tainted on SError panic
From: Will Deacon @ 2025-07-15 14:02 UTC
To: Breno Leitao
Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, kernel-team, osandov, leo.yan, rmikey

On Mon, Jul 14, 2025 at 05:26:43AM -0700, Breno Leitao wrote:
> On Sun, Jul 13, 2025 at 11:46:06PM +0100, Will Deacon wrote:
> > On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
>
> > > --- a/arch/arm64/kernel/traps.c
> > > +++ b/arch/arm64/kernel/traps.c
> > > @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
> > >
> > >  void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
> > >  {
> > > +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> > >  	console_verbose();
> > >
> > >  	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
> >
> > If we're going to taint for SError, shouldn't we also taint for an
> > unclaimed SEA?
>
> Yes. I was not very familiar with SEA errors, given I haven't seen one
> in production yet, but, reading about it, that is another error that
> seems to crash the system due to hardware errors; thus, we want to
> taint MACHINE_CHECK.
>
> What about this?
>
> 	Author: Breno Leitao <leitao@debian.org>
> 	Date:   Mon Jul 14 05:16:55 2025 -0700
>
> 	    arm64: Taint kernel on fatal hardware error in do_sea()
>
> 	    This patch updates the do_sea() handler to taint the kernel with
> 	    TAINT_MACHINE_CHECK when a fatal hardware error is detected and
> 	    reported through Synchronous External Abort (SEA). By marking
> 	    the kernel as tainted at the point of error, we improve
> 	    post-mortem diagnostics and make it clear that a machine check
> 	    or unrecoverable hardware fault has occurred.
>
> 	    Suggested-by: Will Deacon <will@kernel.org>
> 	    Signed-off-by: Breno Leitao <leitao@debian.org>
>
> 	diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> 	index 11eb8d1adc84..f590dc71ce99 100644
> 	--- a/arch/arm64/mm/fault.c
> 	+++ b/arch/arm64/mm/fault.c
> 	@@ -838,6 +838,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
> 	 		 */
> 	 		siaddr = untagged_addr(far);
> 	 	}
> 	+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> 	 	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
> 	 
> 	 	return 0;

Yeah, I reckon so. Probably just fold these into a single patch, though.

Cheers,

Will
* Re: [PATCH] arm64: traps: Mark kernel as tainted on SError panic
From: Breno Leitao @ 2025-07-15 16:09 UTC
To: Will Deacon
Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, kernel-team, osandov, leo.yan, rmikey

On Tue, Jul 15, 2025 at 03:02:13PM +0100, Will Deacon wrote:
> On Mon, Jul 14, 2025 at 05:26:43AM -0700, Breno Leitao wrote:
> > On Sun, Jul 13, 2025 at 11:46:06PM +0100, Will Deacon wrote:
> > > On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
> >
> > > > --- a/arch/arm64/kernel/traps.c
> > > > +++ b/arch/arm64/kernel/traps.c
> > > > @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
> > > >
> > > >  void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
> > > >  {
> > > > +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> > > >  	console_verbose();
> > > >
> > > >  	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
> > >
> > > If we're going to taint for SError, shouldn't we also taint for an
> > > unclaimed SEA?
> >
> > Yes. I was not very familiar with SEA errors, given I haven't seen one
> > in production yet, but, reading about it, that is another error that
> > seems to crash the system due to hardware errors; thus, we want to
> > taint MACHINE_CHECK.
> >
> > What about this?
> >
> > 	Author: Breno Leitao <leitao@debian.org>
> > 	Date:   Mon Jul 14 05:16:55 2025 -0700
> >
> > 	    arm64: Taint kernel on fatal hardware error in do_sea()
> >
> > 	    This patch updates the do_sea() handler to taint the kernel with
> > 	    TAINT_MACHINE_CHECK when a fatal hardware error is detected and
> > 	    reported through Synchronous External Abort (SEA). By marking
> > 	    the kernel as tainted at the point of error, we improve
> > 	    post-mortem diagnostics and make it clear that a machine check
> > 	    or unrecoverable hardware fault has occurred.
> >
> > 	    Suggested-by: Will Deacon <will@kernel.org>
> > 	    Signed-off-by: Breno Leitao <leitao@debian.org>
> >
> > 	diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > 	index 11eb8d1adc84..f590dc71ce99 100644
> > 	--- a/arch/arm64/mm/fault.c
> > 	+++ b/arch/arm64/mm/fault.c
> > 	@@ -838,6 +838,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
> > 	 		 */
> > 	 		siaddr = untagged_addr(far);
> > 	 	}
> > 	+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> > 	 	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
> > 	 
> > 	 	return 0;
>
> Yeah, I reckon so. Probably just fold these into a single patch, though.

Thanks. I'll test it better tomorrow, then send it.

Thanks for the suggestions,
--breno