Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Karolina Stolarek <karolina.stolarek@oracle.com>
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	Jon Pan-Doh <pandoh@google.com>,
	Terry Bowman <terry.bowman@amd.com>, Len Brown <lenb@kernel.org>,
	James Morse <james.morse@arm.com>,
	Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>,
	Ben Cheatham <Benjamin.Cheatham@amd.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Shuai Xue <xueshuai@linux.alibaba.com>,
	Liu Xinpeng <liuxp11@chinatelecom.cn>,
	Darren Hart <darren@os.amperecomputing.com>,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH] PCI/AER: Consolidate CXL and native AER reporting paths
Date: Thu, 20 Mar 2025 13:17:36 -0500	[thread overview]
Message-ID: <20250320181736.GA1091349@bhelgaas> (raw)
In-Reply-To: <b919c39e-bb0f-40f6-84c9-f712404c0ac0@oracle.com>

On Thu, Mar 20, 2025 at 04:14:04PM +0100, Karolina Stolarek wrote:
> On 19/03/2025 23:21, Bjorn Helgaas wrote:
> > On Mon, Mar 17, 2025 at 10:14:46AM +0000, Karolina Stolarek wrote:
> > > Make CXL devices use aer_print_error() when reporting AER errors.
> > > Add a helper function to populate aer_err_info struct before logging
> > > an error. Move struct aer_err_info definition to the aer.h header
> > > to make it visible to CXL.
> > 
> > Previously, pci_print_aer() was used by both CXL (via
> > cxl_handle_rdport_errors()) and by ACPI GHES via aer_recover_queue()
> > and aer_recover_work_func(), right?
> > 
> > And after this patch, they would use aer_print_error() like native
> > AER, native DPC, and the ACPI EDR DPC path?
> 
> That is correct.

> > I think this consolidation is a good thing, because I don't think we
> > should log errors differently just because we learned about them via a
> > different path.
> > 
> > But I think this also changes the text we put in dmesg, which is
> > potentially disruptive to users and scripts that consume it, so I
> > think we should include a comparison of the previous and new text in
> > the commit log.
> 
> Like I said in a comment to the patch, I tested CXL error reporting in QEMU
> with and without my patch, and the output is the same:
> 
> pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0c:00.0
> pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0c:00.0
> pcieport 0000:0c:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
> pcieport 0000:0c:00.0:   device [8086:7075] error status/mask=00004000/0000a000
> pcieport 0000:0c:00.0:    [14] CorrIntErr

Maybe there's CXL magic that I missed.  It looks like Terry's series
changes some of this path.  And GHES also currently uses
pci_print_aer().  Some sample logs at [1,2].

Looking at v6.14-rc1, only aer_print_error() logs the "error status"
string, and only pci_print_aer() logs "aer_status", "aer_layer", etc.

The previous path is:

  pci_print_aer
    pci_err("aer_status: 0x%08x, aer_mask: 0x%08x\n")    <--
    __aer_print_error
    pci_err("aer_layer=%s, aer_agent=%s\n")              <--
    pcie_print_tlp_log

New path is:

  aer_print_error
    pci_printk("PCIe Bus Error: severity=%s, type=%s, (%s)\n")
    pci_printk("  device [%04x:%04x] error status/mask=%08x/%08x\n)
    __aer_print_error
    pcie_print_tlp_log

So I expected that the lines I marked in pci_print_aer() would be
different.

Bjorn

[1] https://lore.kernel.org/lkml/2149597.8uJZFlvqrj@xrated/T/
[2] https://lore.kernel.org/all/e8a58616-aeae-ad78-d496-6dfcef4ddcaa@arm.com/T/

  reply	other threads:[~2025-03-20 18:17 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-17 10:14 [PATCH] PCI/AER: Consolidate CXL and native AER reporting paths Karolina Stolarek
2025-03-19 22:21 ` Bjorn Helgaas
2025-03-20 15:14   ` Karolina Stolarek
2025-03-20 18:17     ` Bjorn Helgaas [this message]
2025-03-21 13:56       ` Karolina Stolarek
2025-03-21 15:06         ` Bjorn Helgaas
2025-03-24 19:31           ` Karolina Stolarek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250320181736.GA1091349@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=Benjamin.Cheatham@amd.com \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=darren@os.amperecomputing.com \
    --cc=ira.weiny@intel.com \
    --cc=james.morse@arm.com \
    --cc=karolina.stolarek@oracle.com \
    --cc=lenb@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=liuxp11@chinatelecom.cn \
    --cc=pandoh@google.com \
    --cc=terry.bowman@amd.com \
    --cc=tony.luck@intel.com \
    --cc=xueshuai@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox