From: Dan Williams <dan.j.williams@intel.com>
To: "Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Len Brown <lenb@kernel.org>,
"Mahesh J Salgaonkar" <mahesh@linux.ibm.com>,
Oliver O'Halloran <oohall@gmail.com>,
Bjorn Helgaas <bhelgaas@google.com>,
<linux-kernel@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
<linuxppc-dev@lists.ozlabs.org>, <linux-pci@vger.kernel.org>,
Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section
Date: Tue, 6 Aug 2024 12:56:24 -0700 [thread overview]
Message-ID: <66b27fe8d73fe_c144829438@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <20240527144356.246220-3-fabio.m.de.francesco@linux.intel.com>
Fabio M. De Francesco wrote:
> Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd().
I think the critical detail is is that print_extlog_rcd() is only
triggered when ras_userspace_consumers() returns true. The observation
is that ras_userspace_consumers() hides information from the trace path
when the intended purpose of it was to hide duplicate emissions to the
kernel log when userspace is watching the tracepoints.
Setting aside whether ras_userspace_consumers() is still a good idea or
not, it is obvious that this patch as is may surprise environments that
start seeing kernel error logs where the kernel was silent before.
I think the path of least surprise would be to make sure that
pci_print_aer() optionally skips emitting to the kernel log when not
needed wanted.
So perhaps first do a lead-in patch to optionally quiet the print
messages in pci_print_aer() and then pass in KERN_DEBUG from the
extlog_print() path. Then we can decide later what to do about
ras_userspace_consumers().
> the similar ghes_do_proc() (GHES) prints to kernel log and calls
> pci_print_aer() to report via the ftrace infrastructure.
>
> Add support to report the CPER PCIe Error section also via the ftrace
> infrastructure by calling pci_print_aer() to make ELOG act consistently
> with GHES.
You might also want to explain a bit about the motivation for this which
is that I/O Machine Check Arcitecture events may signal failing PCIe
components or links. The AER event contains details on what was
happening on the wire when the error was signaled.
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> ---
> drivers/acpi/acpi_extlog.c | 30 ++++++++++++++++++++++++++++++
> drivers/pci/pcie/aer.c | 2 +-
> include/linux/aer.h | 13 +++++++++++--
> 3 files changed, 42 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index e025ae390737..007ce96f8672 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -131,6 +131,32 @@ static int print_extlog_rcd(const char *pfx,
> return 1;
> }
>
> +static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
> + int severity)
> +{
> + struct aer_capability_regs *aer;
> + struct pci_dev *pdev;
> + unsigned int devfn;
> + unsigned int bus;
> + int aer_severity;
> + int domain;
> +
> + if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
> + pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
> + aer_severity = cper_severity_to_aer(severity);
> + aer = (struct aer_capability_regs *)pcie_err->aer_info;
> + domain = pcie_err->device_id.segment;
> + bus = pcie_err->device_id.bus;
> + devfn = PCI_DEVFN(pcie_err->device_id.device,
> + pcie_err->device_id.function);
> + pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> + if (!pdev)
> + return;
> + pci_print_aer(pdev, aer_severity, aer);
...per above this would become:
pci_print_aer(KERN_DEBUG, pdev, aer_severity, aer);
[..]
Rest of the changes look good to me.
next prev parent reply other threads:[~2024-08-06 19:56 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-27 14:43 [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco
2024-05-27 14:43 ` [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
2024-08-06 19:21 ` Dan Williams
2024-08-06 21:07 ` Bjorn Helgaas
2024-05-27 14:43 ` [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
2024-08-06 19:56 ` Dan Williams [this message]
2024-10-23 13:35 ` Fabio M. De Francesco
2024-12-11 1:51 ` Dan Williams
2024-08-06 21:31 ` Bjorn Helgaas
2024-08-07 20:28 ` Dan Williams
2024-08-07 20:31 ` Dan Williams
2024-07-23 13:43 ` [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=66b27fe8d73fe_c144829438@dwillia2-xfh.jf.intel.com.notmuch \
--to=dan.j.williams@intel.com \
--cc=bhelgaas@google.com \
--cc=fabio.m.de.francesco@linux.intel.com \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mahesh@linux.ibm.com \
--cc=oohall@gmail.com \
--cc=rafael@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox