From: "Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com>
To: linux-kernel@vger.kernel.org, Dan Williams <dan.j.williams@intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
Len Brown <lenb@kernel.org>,
Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
Oliver O'Halloran <oohall@gmail.com>,
Bjorn Helgaas <bhelgaas@google.com>,
linux-acpi@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-pci@vger.kernel.org,
Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section
Date: Wed, 23 Oct 2024 15:35:18 +0200
Message-ID: <8286502.jJDZkT8p0M@fdefranc-mobl3>
In-Reply-To: <66b27fe8d73fe_c144829438@dwillia2-xfh.jf.intel.com.notmuch>
On Tuesday, August 6, 2024 9:56:24 PM GMT+2 Dan Williams wrote:
> Fabio M. De Francesco wrote:
> > Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> > v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd().
>
> I think the critical detail is that print_extlog_rcd() is only
> triggered when ras_userspace_consumers() returns true. The observation
> is that ras_userspace_consumers() hides information from the trace path
> when the intended purpose of it was to hide duplicate emissions to the
> kernel log when userspace is watching the tracepoints.
>
> Setting aside whether ras_userspace_consumers() is still a good idea or
> not, it is obvious that this patch as is may surprise environments that
> start seeing kernel error logs where the kernel was silent before.
>
> I think the path of least surprise would be to make sure that
> pci_print_aer() optionally skips emitting to the kernel log when not
> wanted.

Sorry for replying so late...

I'm not entirely sure that users would not prefer to be surprised by
_finally_ seeing kernel error logs for failing PCIe components. I suspect
that users might have been confused by not seeing any output.

> So perhaps first do a lead-in patch to optionally quiet the print
> messages in pci_print_aer() and then pass in KERN_DEBUG from the
> extlog_print() path. Then we can decide later what to do about
> ras_userspace_consumers().
Anyway, I'll do it.
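
To make the intent of that lead-in patch concrete, here is a minimal userspace sketch, not kernel code. It assumes the trace record is emitted unconditionally and that only the level of the log lines is chosen by the caller. The names model_sinks, model_print_aer() and model_console_lines() are hypothetical, and the rule that the console shows only levels below console_loglevel is the usual printk behavior as I understand it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model: counts what each sink would receive. */
struct model_sinks {
	int trace_events;	/* records seen by a trace consumer */
	int log_lines[8];	/* kernel-log lines, indexed by level 0..7 */
};

/*
 * Stand-in for a pci_print_aer() that takes the log level from its caller.
 * The trace record is emitted unconditionally; only the level of the
 * log line changes.
 */
static void model_print_aer(struct model_sinks *s, int level)
{
	assert(s != NULL && level >= 0 && level <= 7);
	s->log_lines[level]++;
	s->trace_events++;
}

/* Lines the console would show: those with a level below console_loglevel. */
static int model_console_lines(const struct model_sinks *s, int console_loglevel)
{
	int n = 0;

	for (int l = 0; l < 8 && l < console_loglevel; l++)
		n += s->log_lines[l];
	return n;
}
```

With a console log level of 7, a call at level 7 (debug) adds a trace record but no console line, while a call at level 3 (error) adds both. The debug-level line would still reach the log buffer, which this model does not distinguish from the console.
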
> > the similar ghes_do_proc() (GHES) prints to kernel log and calls
> > pci_print_aer() to report via the ftrace infrastructure.
> >
> > Add support to report the CPER PCIe Error section also via the ftrace
> > infrastructure by calling pci_print_aer() to make ELOG act consistently
> > with GHES.
>
> You might also want to explain a bit about the motivation for this which
> is that I/O Machine Check Architecture events may signal failing PCIe
> components or links. The AER event contains details on what was
> happening on the wire when the error was signaled.
Yes, I agree.
> >
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> > ---
> > drivers/acpi/acpi_extlog.c | 30 ++++++++++++++++++++++++++++++
> > drivers/pci/pcie/aer.c | 2 +-
> > include/linux/aer.h | 13 +++++++++++--
> > 3 files changed, 42 insertions(+), 3 deletions(-)
> >
> > [...]
> >
> > + pci_print_aer(pdev, aer_severity, aer);
>
> ...per above this would become:
>
> pci_print_aer(KERN_DEBUG, pdev, aer_severity, aer);
>
> [..]
>
> Rest of the changes look good to me.
I need to be sure that I understood...

void pci_print_aer(char *level, struct pci_dev *dev, int aer_severity,
		   struct aer_capability_regs *aer)
{
	[...]
	if (printk_get_level(level) <= console_loglevel) {
		pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
			status, mask);
		__aer_print_error(dev, &info);
		pci_err(dev, "aer_layer=%s, aer_agent=%s\n",
			aer_error_layer[layer], aer_agent_string[agent]);
		if (aer_severity != AER_CORRECTABLE)
			pci_err(dev, "aer_uncor_severity: 0x%08x\n",
				aer->uncor_severity);
		if (tlp_header_valid)
			__print_tlp_header(dev, &aer->header_log);
	}
	[...]
}

It would require changing a couple of call sites, like in
aer_recover_work_func():

	pci_print_aer(KERN_ERR, pdev, entry.severity, entry.regs);

Would you please confirm that the code shown above is what
you asked for?
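
One detail of the check above may need care. As far as I can tell from include/linux/printk.h, printk_get_level() returns the level character ('0' to '7', or 'c' for KERN_CONT) rather than its numeric value, so comparing the result directly with console_loglevel would compare an ASCII code against a small integer. The userspace sketch below models that behavior; the MODEL_* macros and both model_* functions are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's KERN_* prefixes (SOH byte + level char). */
#define MODEL_KERN_SOH   "\001"
#define MODEL_KERN_ERR   MODEL_KERN_SOH "3"
#define MODEL_KERN_DEBUG MODEL_KERN_SOH "7"

/*
 * Mirrors the behavior assumed for printk_get_level(): returns the level
 * character ('0'..'7' or 'c'), or 0 if the string has no level prefix.
 */
static int model_printk_get_level(const char *buffer)
{
	assert(buffer != NULL);
	if (buffer[0] == '\001' && buffer[1]) {
		char c = buffer[1];

		if ((c >= '0' && c <= '7') || c == 'c')
			return c;
	}
	return 0;
}

/* Hypothetical helper: numeric level 0..7, or -1 when there is none. */
static int model_level_number(const char *buffer)
{
	int c = model_printk_get_level(buffer);

	return (c >= '0' && c <= '7') ? c - '0' : -1;
}
```

Under these assumptions, the raw return value for an error-level prefix is the character '3' (ASCII 51), which is never less than or equal to a console log level of 7, whereas the converted value 3 is. If the gating approach is kept, the comparison would presumably need the conversion.
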
Thanks,
Fabio
Thread overview: 12+ messages
2024-05-27 14:43 [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco
2024-05-27 14:43 ` [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
2024-08-06 19:21 ` Dan Williams
2024-08-06 21:07 ` Bjorn Helgaas
2024-05-27 14:43 ` [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
2024-08-06 19:56 ` Dan Williams
2024-10-23 13:35 ` Fabio M. De Francesco [this message]
2024-12-11 1:51 ` Dan Williams
2024-08-06 21:31 ` Bjorn Helgaas
2024-08-07 20:28 ` Dan Williams
2024-08-07 20:31 ` Dan Williams
2024-07-23 13:43 ` [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco