Linux PCI subsystem development
 help / color / mirror / Atom feed
From: "Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com>
To: linux-kernel@vger.kernel.org, Dan Williams <dan.j.williams@intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Len Brown <lenb@kernel.org>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	Oliver O'Halloran <oohall@gmail.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-acpi@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-pci@vger.kernel.org,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section
Date: Wed, 23 Oct 2024 15:35:18 +0200	[thread overview]
Message-ID: <8286502.jJDZkT8p0M@fdefranc-mobl3> (raw)
In-Reply-To: <66b27fe8d73fe_c144829438@dwillia2-xfh.jf.intel.com.notmuch>

On Tuesday, August 6, 2024 9:56:24 PM GMT+2 Dan Williams wrote:
> Fabio M. De Francesco wrote:
> > Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> > v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd().
> 
> I think the critical detail is is that print_extlog_rcd() is only
> triggered when ras_userspace_consumers() returns true. The observation
> is that ras_userspace_consumers() hides information from the trace path
> when the intended purpose of it was to hide duplicate emissions to the
> kernel log when userspace is watching the tracepoints.
>
> Setting aside whether ras_userspace_consumers() is still a good idea or
> not, it is obvious that this patch as is may surprise environments that
> start seeing kernel error logs where the kernel was silent before.
>
> I think the path of least surprise would be to make sure that
> pci_print_aer() optionally skips emitting to the kernel log when not
> needed wanted.

Sorry for replying so late...

I'm not entirely sure that users would not prefer to be surprised by 
_finally_ seeing kernel error logs for failing PCIe components. I suspect 
that users might have been confused by not seeing any output.
 
> So perhaps first do a lead-in patch to optionally quiet the print
> messages in pci_print_aer() and then pass in KERN_DEBUG from the
> extlog_print() path. Then we can decide later what to do about
> ras_userspace_consumers().

Anyway, I'll do it.

> > the similar ghes_do_proc() (GHES) prints to kernel log and calls
> > pci_print_aer() to report via the ftrace infrastructure.
> > 
> > Add support to report the CPER PCIe Error section also via the ftrace
> > infrastructure by calling pci_print_aer() to make ELOG act consistently
> > with GHES.
> 
> You might also want to explain a bit about the motivation for this which
> is that I/O Machine Check Arcitecture events may signal failing PCIe
> components or links. The AER event contains details on what was
> happening on the wire when the error was signaled.

Yes, I agree.

> > 
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Fabio M. De Francesco 
<fabio.m.de.francesco@linux.intel.com>
> > ---
> >  drivers/acpi/acpi_extlog.c | 30 ++++++++++++++++++++++++++++++
> >  drivers/pci/pcie/aer.c     |  2 +-
> >  include/linux/aer.h        | 13 +++++++++++--
> >  3 files changed, 42 insertions(+), 3 deletions(-)
> > 
> > [...]
> >
> > +		pci_print_aer(pdev, aer_severity, aer);
> 
> ...per above this would become:
> 
>     pci_print_aer(KERN_DEBUG, pdev, aer_severity, aer);
> 
> [..]
> 
> Rest of the changes look good to me.

I need to be sure that I understood...

void pci_print_aer(char *level, struct pci_dev *dev, int aer_severity,
                   struct aer_capability_regs *aer)
{
        [...]

        if (printk_get_level(level) <= console_loglevel) {
                pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
                        status, mask);
                __aer_print_error(dev, &info);
                pci_err(dev, "aer_layer=%s, aer_agent=%s\n",
                        aer_error_layer[layer], aer_agent_string[agent]);

                if (aer_severity != AER_CORRECTABLE)
                        pci_err(dev, "aer_uncor_severity: 0x%08x\n",
                                aer->uncor_severity);

                if (tlp_header_valid)
                        __print_tlp_header(dev, &aer->header_log);
        }

        [...]
}	

It would require changing a couple of call sites, like in    
aer_recover_work_func():

pci_print_aer(KERN_ERR, pdev, entry.severity, entry.regs);
 
Would you please confirm that the code shown above is what
you asked for?

Thanks,

Fabio



  reply	other threads:[~2024-10-23 13:35 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-27 14:43 [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco
2024-05-27 14:43 ` [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
2024-08-06 19:21   ` Dan Williams
2024-08-06 21:07   ` Bjorn Helgaas
2024-05-27 14:43 ` [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
2024-08-06 19:56   ` Dan Williams
2024-10-23 13:35     ` Fabio M. De Francesco [this message]
2024-12-11  1:51       ` Dan Williams
2024-08-06 21:31   ` Bjorn Helgaas
2024-08-07 20:28     ` Dan Williams
2024-08-07 20:31       ` Dan Williams
2024-07-23 13:43 ` [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8286502.jJDZkT8p0M@fdefranc-mobl3 \
    --to=fabio.m.de.francesco@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mahesh@linux.ibm.com \
    --cc=oohall@gmail.com \
    --cc=rafael@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox