From: Breno Leitao <leitao@debian.org>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
Len Brown <lenb@kernel.org>, James Morse <james.morse@arm.com>,
Borislav Petkov <bp@alien8.de>,
linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
kernel-team@meta.com, kbusch@kernel.org, rmikey@meta.com
Subject: Re: [PATCH] acpi/ghes: add TAINT_MACHINE_CHECK on GHES panic path
Date: Wed, 2 Jul 2025 10:22:50 -0700
Message-ID: <aGVq6khN+QdqD5Aj@gmail.com>
In-Reply-To: <aGVe4nv18dRHHV16@agluck-desk3>
On Wed, Jul 02, 2025 at 09:31:30AM -0700, Luck, Tony wrote:
> On Wed, Jul 02, 2025 at 08:39:51AM -0700, Breno Leitao wrote:
> > When a GHES (Generic Hardware Error Source) triggers a panic, add the
> > TAINT_MACHINE_CHECK taint flag to the kernel. This explicitly marks the
>
> While it might not strictly be a machine check that caused GHES to
> panic, it seems close enough from the available TAINT options.
Right, that was my reasoning as well. There are other cases where
TAINT_MACHINE_CHECK is set when the hardware is broken.
> So unless someone feels it would be better to create a new TAINT
> flag (TAINT_FATAL_GHES? TAINT_FIRMWARE_REPORTED_FATAL_ERROR?)
> then this seems OK to me.
Thanks. That brings up another topic. I am seeing crashes and warnings
that only happen after recoverable errors. I.e., there is a GHES
recoverable error, and then the machine crashes minutes later. A classic
example is when a PCI downstream port disappears and recovers later,
re-enumerating everything, which is simply chaotic.
I would like to be able to correlate the crash/warning with a machine
that had a recoverable error. At scale, this improves the kernel
monitoring by a lot.
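To illustrate the kind of check a fleet monitoring agent could do with
such a taint, here is a minimal user-space sketch. It assumes the agent
reads the decimal mask from /proc/sys/kernel/tainted, where bit 4 ('M')
is TAINT_MACHINE_CHECK. The helper names (parse_taint_mask,
taint_has_machine_check, read_taint_mask) are made up for this example
and are not part of any kernel or library API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Bit 4 ('M') of the kernel taint mask is TAINT_MACHINE_CHECK. */
#define TAINT_MACHINE_CHECK_BIT 4

/* Return true if the taint mask has the machine-check bit set. */
bool taint_has_machine_check(unsigned long mask)
{
	return (mask >> TAINT_MACHINE_CHECK_BIT) & 1UL;
}

/*
 * Parse a decimal taint mask as printed by /proc/sys/kernel/tainted.
 * A single trailing newline is accepted. Returns 0 on success and
 * stores the mask in *out, or -1 on malformed input.
 */
int parse_taint_mask(const char *text, unsigned long *out)
{
	char *end;
	unsigned long value;

	errno = 0;
	value = strtoul(text, &end, 10);
	if (end == text || errno)
		return -1;
	if (*end != '\0' && *end != '\n')
		return -1;

	*out = value;
	return 0;
}

/*
 * Hypothetical helper: read the mask from a file such as
 * /proc/sys/kernel/tainted. Returns 0 on success, -1 on any error.
 */
int read_taint_mask(const char *path, unsigned long *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	return parse_taint_mask(buf, out);
}
```

An agent would call read_taint_mask("/proc/sys/kernel/tainted", &mask)
and attach the result of taint_has_machine_check(mask) to any later
crash or warning report. Note that with only the patch quoted below,
the bit is set on the panic path, so a taint for recoverable errors
would be needed before this check helps with the case described here.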
So, if we go toward using TAINT_FATAL_GHES, can we have two flavors?
TAINT_FATAL_GHES_RECOVERABLE and TAINT_FATAL_GHES_FATAL?
Thanks for the review,
--breno
> Reviewed-by: Tony Luck <tony.luck@intel.com>
>
> > kernel as tainted due to a machine check event, improving diagnostics
> > and post-mortem analysis. The taint is set with LOCKDEP_STILL_OK to
> > indicate lockdep remains valid.
> >
> > At large scale deployment, this helps to quickly determine panics that
> > are coming due to hardware failures.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> > drivers/acpi/apei/ghes.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> > index f0584ccad4519..3d44f926afe8e 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -1088,6 +1088,8 @@ static void __ghes_panic(struct ghes *ghes,
> >
> > __ghes_print_estatus(KERN_EMERG, ghes->generic, estatus);
> >
> > + add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> > +
> > ghes_clear_estatus(ghes, estatus, buf_paddr, fixmap_idx);
> >
> > if (!panic_timeout)
> >
> > ---
> > base-commit: e96ee511c906c59b7c4e6efd9d9b33917730e000
> > change-id: 20250702-add_tain-902925f3eb96
> >
> > Best regards,
> > --
> > Breno Leitao <leitao@debian.org>
> >