From: Dan Williams <dan.j.williams@intel.com>
To: Dan Williams <dan.j.williams@intel.com>,
Bjorn Helgaas <helgaas@kernel.org>,
"Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
Len Brown <lenb@kernel.org>,
Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
Oliver O'Halloran <oohall@gmail.com>,
Bjorn Helgaas <bhelgaas@google.com>,
<linux-kernel@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
<linuxppc-dev@lists.ozlabs.org>, <linux-pci@vger.kernel.org>,
Dan Williams <dan.j.williams@intel.com>, <bp@alien8.de>
Subject: Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section
Date: Wed, 7 Aug 2024 13:31:30 -0700 [thread overview]
Message-ID: <66b3d9a24ae9b_2142c2941b@dwillia2-mobl3.amr.corp.intel.com.notmuch> (raw)
In-Reply-To: <66b3d8ec30e74_2142c2945d@dwillia2-mobl3.amr.corp.intel.com.notmuch>
Dan Williams wrote:
> [ add Boris ]
[ actually add Boris ]
Boris, see below, thoughts on deprecating acpi_extlog...
> Bjorn Helgaas wrote:
> > On Mon, May 27, 2024 at 04:43:41PM +0200, Fabio M. De Francesco wrote:
> > > Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> > > v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd(). Instead,
> > > the similar ghes_do_proc() (GHES) prints to kernel log and calls
> > > pci_print_aer() to report via the ftrace infrastructure.
> > >
> > > Add support to report the CPER PCIe Error section also via the ftrace
> > > infrastructure by calling pci_print_aer() to make ELOG act consistently
> > > with GHES.
> > >
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> > > ---
> > > drivers/acpi/acpi_extlog.c | 30 ++++++++++++++++++++++++++++++
> > > drivers/pci/pcie/aer.c | 2 +-
> > > include/linux/aer.h | 13 +++++++++++--
> > > 3 files changed, 42 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> > > index e025ae390737..007ce96f8672 100644
> > > --- a/drivers/acpi/acpi_extlog.c
> > > +++ b/drivers/acpi/acpi_extlog.c
> > > @@ -131,6 +131,32 @@ static int print_extlog_rcd(const char *pfx,
> > > return 1;
> > > }
> > >
> > > +static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
> > > + int severity)
> > > +{
> > > + struct aer_capability_regs *aer;
> > > + struct pci_dev *pdev;
> > > + unsigned int devfn;
> > > + unsigned int bus;
> > > + int aer_severity;
> > > + int domain;
> > > +
> > > + if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
> > > + pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
> > > + aer_severity = cper_severity_to_aer(severity);
> > > + aer = (struct aer_capability_regs *)pcie_err->aer_info;
> > > + domain = pcie_err->device_id.segment;
> > > + bus = pcie_err->device_id.bus;
> > > + devfn = PCI_DEVFN(pcie_err->device_id.device,
> > > + pcie_err->device_id.function);
> > > + pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> > > + if (!pdev)
> > > + return;
> > > + pci_print_aer(pdev, aer_severity, aer);
> > > + pci_dev_put(pdev);
> > > + }
> >
> > I'm 100% in favor of making error reporting work and look the same
> > across GHES and ELOG. But I do have to gripe a bit...
> >
> > It's already unfortunate that GHES and the native AER handling are
> > separate paths that lead to the same place (__aer_print_error()).
> >
> > I'm sorry that we need to add a third path that again does
> > fundamentally the same thing. The fact that they're separate means
> > all the design, reviewing, testing, and maintenance effort is diluted,
> > and error handling always gets too little love in the first place.
> > I think this is a recipe for confusion.
> >
> > ghes_do_proc # GHES
> > apei_estatus_for_each_section
> > ...
> > if (guid_equal(sec_type, &CPER_SEC_PCIE))
> > ghes_handle_aer
> > cper_severity_to_aer
> > aer_recover_queue
> > kfifo_in_spinlocked(&aer_recover_ring) # add to queue
> > aer_recover_work_func # another thread
> > kfifo_get(&aer_recover_ring) # remove from queue
> > pci_print_aer
> > __aer_print_error <---
> >
> > aer_irq # native AER
> > kfifo_put(&aer_fifo) # add to queue
> > aer_isr # another thread
> > kfifo_get(&aer_fifo) # remove from queue
> > ...
> > aer_isr_one_error
> > aer_process_err_devices
> > aer_print_error
> > __aer_print_error <---
> >
> > extlog_print # extlog (x86 only)
> > apei_estatus_for_each_section
> > ...
> > if (guid_equal(sec_type, &CPER_SEC_PCIE))
> > extlog_print_pcie
> > cper_severity_to_aer
> > pci_get_domain_bus_and_slot
> > pci_print_aer
> > __aer_print_error <---
> >
> > And we also have CXL paths that lead to __aer_print_error(), although
> > it seems like they at least start in the native AER (and maybe GHES?)
> > path and branch out somewhere. My head is spinning.
> >
> > Do I *object* to this patch? No, not really; it's a trivial change in
> > drivers/pci/, and Rafael can add my
> >
> > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> >
> > as needed. But I am afraid we're making ourselves a maintenance
> > headache.
>
> To be honest, I am too. Upon discovering that extlog_print() behaves
> differently than ghes_do_proc(), I had the snarky thought "great, can we
> now just go ahead and deprecate the extlog path, it's just a source of
> maintenance pain.".
>
> So *if*we keep acpi_extlog it then I definitely think it should be
> consistent with other CPER handlers (needs this patch). But, I am also
> open to entertaining "deprecate it".
>
> To me, the fact that ras_userspace_consumers() is only honored for
> acpi_extlog is clear evidence that the kernel has already painted itself
> into a confusing user ABI corner and maybe the proper path forward at
> this point is to cut acpi_extlog loose.
next prev parent reply other threads:[~2024-08-07 20:31 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-27 14:43 [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco
2024-05-27 14:43 ` [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
2024-08-06 19:21 ` Dan Williams
2024-08-06 21:07 ` Bjorn Helgaas
2024-05-27 14:43 ` [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
2024-08-06 19:56 ` Dan Williams
2024-10-23 13:35 ` Fabio M. De Francesco
2024-12-11 1:51 ` Dan Williams
2024-08-06 21:31 ` Bjorn Helgaas
2024-08-07 20:28 ` Dan Williams
2024-08-07 20:31 ` Dan Williams [this message]
2024-07-23 13:43 ` [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=66b3d9a24ae9b_2142c2941b@dwillia2-mobl3.amr.corp.intel.com.notmuch \
--to=dan.j.williams@intel.com \
--cc=bhelgaas@google.com \
--cc=bp@alien8.de \
--cc=fabio.m.de.francesco@linux.intel.com \
--cc=helgaas@kernel.org \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mahesh@linux.ibm.com \
--cc=oohall@gmail.com \
--cc=rafael@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox