From: "Dan Williams (nvidia)" <djbw@kernel.org>
To: "Bowman, Terry" <terry.bowman@amd.com>,
	 "Dan Williams (nvidia)" <djbw@kernel.org>,
	 Jonathan Cameron <jic23@kernel.org>
Cc: dave@stgolabs.net,  dave.jiang@intel.com,
	 alison.schofield@intel.com,  bhelgaas@google.com,
	 shiju.jose@huawei.com,  ming.li@zohomail.com,
	 Smita.KoralahalliChannabasappa@amd.com,  rrichter@amd.com,
	 dan.carpenter@linaro.org,  PradeepVineshReddy.Kodamati@amd.com,
	 lukas@wunner.de,  Benjamin.Cheatham@amd.com,
	 sathyanarayanan.kuppuswamy@linux.intel.com,
	 vishal.l.verma@intel.com,  alucerop@amd.com,
	 ira.weiny@intel.com,  corbet@lwn.net,  rafael@kernel.org,
	 xueshuai@linux.alibaba.com,  linux-cxl@vger.kernel.org,
	 linux-kernel@vger.kernel.org,  linux-pci@vger.kernel.org,
	 linux-acpi@vger.kernel.org,  linux-doc@vger.kernel.org,
	 Mauro Carvalho Chehab <mchehab@kernel.org>
Subject: Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace events
Date: Mon, 11 May 2026 16:28:49 -0700
Message-ID: <6a026631b4f86_1b86a100d7@djbw-dev.notmuch>
In-Reply-To: <09796934-e093-44e6-b6e2-2d0dd5a29673@amd.com>

Bowman, Terry wrote:
> On 5/8/2026 10:49 PM, Dan Williams (nvidia) wrote:
> > Jonathan Cameron wrote:
> >> On Thu, 7 May 2026 13:33:45 -0500
> >> "Bowman, Terry" <terry.bowman@amd.com> wrote:
> > [..]
> >>>> This concerns me (sorry I wasn't paying attention to the v16 thread).
> >>>> It is a userspace regression against code that is out in the wild and typically
> >>>> not updated in sync with the kernel.
> >>>>
> >>>> If you are suggesting breaking ras-daemon at the very least +CC the maintainer.
> > 
> > Sorry, that was not the intent, see below.
> > 
> >>>>
> >>>> To get to a unified tracepoint add a new one that does what you want, but
> >>>> maintain the existing ones as well.  Userspace can then migrate and maybe
> >>>> in 5+ years time we can delete the non unified ones.
> >>>>
> >>>> No actual comments on the code, just left it all here for Mauro,
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Jonathan
> >>>>   
> >>>
> >>> Dan was clear about using a single set of CE and UE handlers for all CXL RAS 
> >>> protocol errors. While I understand there may be concerns, please direct any 
> >>> objections to Dan and clarify what changes are required to avoid this 
> >>> repeatedly going back and forth.
> >>>
> >>> [1] https://lore.kernel.org/linux-cxl/69cb2d5ba3111_178904100b7@dwillia2-mobl4.notmuch/
> >>
> >> Sure - Dan's on this thread so I'm sure he'll see it sooner or later.
> >>
> >> Perhaps I'm missing something that makes this less critical than it appears.
> > 
> > No, it is breakage and a thinko on my part on the advice to Terry on the
> > backwards compatibility rules for tracepoints. At the time I was only
> > tracking data type and order of the payload. I.e. string at same
> > position. However, the name of the argument is ABI.
> > 
> > Something like this incremental fixup I think gets this back on track.
> > It keeps legacy ABI support for "memdev" field in the payload. It
> > incrementally lets updated userspace understand "port" and "dport"
> > events. It stops us from growing a new set of events just to update the
> > arguments. It enhances the CPER events to now handle switch ports in
> > addition to endpoint ports.
> > 
> > The bulk of the change is passing @port and @dport to the CXL trace
> > events instead of a plain @dev.
> > 
> 
> Thanks Dan and Jonathan,
> 
> I have a few questions.
> 
> Does this miss logging the Upstream SwitchPort device errors? Add another 
> entry "uport=$"?
> 
> How does the user know which of the devices (memdev, port, or dport) is the 
> erroring device? Do the traces need another string variable indicating which
> device triggered the error?

I expect that can be determined from what values get populated.

Endpoint:
memdev=memX port=endpointY dport= host=parent(memX)

Downstream:
memdev= port=portX dport=dport_dev(dportY) host=uport_dev(portX)

Upstream:
memdev= port=portX dport= host=uport_dev(portX)

If dport= is populated, that is the device that triggered the error,
otherwise it is the host= value.
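
As a rough illustration of that rule, here is a minimal Python sketch of
how a consumer might classify an event. It is not part of the patch or
of any existing userspace tool: the function names are hypothetical, and
it assumes the event text carries space-separated "key=value" pairs in
which an unpopulated field is rendered as "key=" with nothing after the
equals sign.

```python
import re

# Hypothetical parser: assumes "key=value" pairs separated by whitespace,
# with an unpopulated field printed as "key=" (empty value).
FIELD_RE = re.compile(r"(\w+)=(\S*)")


def parse_fields(event_text):
    """Return a dict of field name -> value ('' when unpopulated)."""
    return dict(FIELD_RE.findall(event_text))


def erroring_device(event_text):
    """If dport= is populated it names the erroring device, else host=."""
    fields = parse_fields(event_text)
    return fields.get("dport") or fields.get("host") or None


def port_kind(event_text):
    """Classify the event from which fields are populated."""
    fields = parse_fields(event_text)
    if fields.get("memdev"):
        return "endpoint"
    if fields.get("dport"):
        return "downstream"
    return "upstream"
```

The device names fed to these helpers would come from whatever the
kernel prints; the sketch only encodes the "dport= wins, otherwise
host=" decision and the populated-field patterns listed above.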

> And, I need to confirm: the Endpoint is NULL unless the CXL Port is an Endpoint 
> Port?

You mean memdev is empty, right? 

