From: "Dan Williams (nvidia)" <djbw@kernel.org>
To: "Bowman, Terry" <terry.bowman@amd.com>,
"Dan Williams (nvidia)" <djbw@kernel.org>,
Jonathan Cameron <jic23@kernel.org>
Cc: dave@stgolabs.net, dave.jiang@intel.com,
alison.schofield@intel.com, bhelgaas@google.com,
shiju.jose@huawei.com, ming.li@zohomail.com,
Smita.KoralahalliChannabasappa@amd.com, rrichter@amd.com,
dan.carpenter@linaro.org, PradeepVineshReddy.Kodamati@amd.com,
lukas@wunner.de, Benjamin.Cheatham@amd.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
vishal.l.verma@intel.com, alucerop@amd.com,
ira.weiny@intel.com, corbet@lwn.net, rafael@kernel.org,
xueshuai@linux.alibaba.com, linux-cxl@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-acpi@vger.kernel.org, linux-doc@vger.kernel.org,
Mauro Carvalho Chehab <mchehab@kernel.org>
Subject: Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace events
Date: Mon, 11 May 2026 16:28:49 -0700
Message-ID: <6a026631b4f86_1b86a100d7@djbw-dev.notmuch>
In-Reply-To: <09796934-e093-44e6-b6e2-2d0dd5a29673@amd.com>
Bowman, Terry wrote:
> On 5/8/2026 10:49 PM, Dan Williams (nvidia) wrote:
> > Jonathan Cameron wrote:
> >> On Thu, 7 May 2026 13:33:45 -0500
> >> "Bowman, Terry" <terry.bowman@amd.com> wrote:
> > [..]
> >>>> This concerns me (sorry I wasn't paying attention to the v16 thread).
> >>>> It is a userspace regression against code that is out in the wild and typically
> >>>> not updated in sync with the kernel.
> >>>>
> >>>> If you are suggesting breaking ras-daemon at the very least +CC the maintainer.
> >
> > Sorry, that was not the intent, see below.
> >
> >>>>
> >>>> To get to a unified tracepoint add a new one that does what you want, but
> >>>> maintain the existing ones as well. Userspace can then migrate and maybe
> >>>> in 5+ years time we can delete the non unified ones.
> >>>>
> >>>> No actual comments on the code, just left it all here for Mauro,
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Jonathan
> >>>>
> >>>
> >>> Dan was clear about using a single set of CE and UE handlers for all CXL RAS
> >>> protocol errors. While I understand there may be concerns, please direct any
> >>> objections to Dan and clarify what changes are required to avoid this
> >>> repeatedly going back and forth.
> >>>
> >>> [1] https://lore.kernel.org/linux-cxl/69cb2d5ba3111_178904100b7@dwillia2-mobl4.notmuch/
> >>
> >> Sure - Dan's on this thread so I'm sure he'll see it sooner or later.
> >>
> >> Perhaps I'm missing something that makes this less critical than it appears.
> >
> > No, it is breakage and a thinko on my part on the advice to Terry on the
> > backwards compatibility rules for tracepoints. At the time I was only
> > tracking data type and order of the payload. I.e. string at same
> > position. However, the name of the argument is ABI.
> >
> > Something like this incremental fixup I think gets this back on track.
> > It keeps legacy ABI support for "memdev" field in the payload. It
> > incrementally lets updated userspace understand "port" and "dport"
> > events. It stops us from growing a new set of events just to update the
> > arguments. It enhances the CPER events to now handle switch ports in
> > addition to endpoint ports.
> >
> > The bulk of the change is passing @port and @dport to the CXL trace
> > events instead of a plain @dev.
> >
>
> Thanks Dan and Jonathan,
>
> I have a few questions.
>
> Does this miss logging the Upstream SwitchPort device errors? Add another
> entry "uport=$"?
>
> How does the user know which of the devices (memdev, port, or dport) is the
> erroring device? Do the traces need another string variable indicating which
> device triggered the error?

I expect that can be determined from what values get populated.

Endpoint:
    memdev=memX port=endpointY dport= host=parent(memX)

Downstream:
    memdev= port=portX dport=dport_dev(dportY) host=uport_dev(portX)

Upstream:
    memdev= port=portX dport= host=uport_dev(portX)

If dport= is populated, that is the device that triggered the error,
otherwise it is the host= value.
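
For illustration only, a minimal sketch of how a userspace consumer might
apply that rule. It assumes the payload arrives as whitespace-separated
key=value tokens with the four field names used above; the final trace
format may differ, and the function names and device names here are
hypothetical.

```python
import re

# Assumption: fields are "key=value" tokens separated by whitespace, and an
# unpopulated field is printed as "key=" with nothing after the equals sign.
FIELD_RE = re.compile(r"(\w+)=(\S*)")


def parse_fields(payload):
    """Split the payload into a dict; an unpopulated field maps to ''."""
    return dict(FIELD_RE.findall(payload))


def erroring_device(payload):
    """A populated dport= names the erroring device, otherwise host= does."""
    fields = parse_fields(payload)
    return fields.get("dport") or fields.get("host", "")


def event_kind(payload):
    """Classify the event by which fields are populated."""
    fields = parse_fields(payload)
    if fields.get("memdev"):
        return "endpoint"
    if fields.get("dport"):
        return "downstream"
    return "upstream"


if __name__ == "__main__":
    # Made-up device names, following the three layouts listed above.
    for line in (
        "memdev=mem0 port=endpoint3 dport= host=0000:0f:00.0",
        "memdev= port=port2 dport=0000:0d:01.0 host=0000:0c:00.0",
        "memdev= port=port2 dport= host=0000:0c:00.0",
    ):
        print(event_kind(line), erroring_device(line))
```

A legacy consumer that only looks at memdev= keeps working for Endpoint
events under this scheme, and simply sees an empty value for port events.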

> And, I need to confirm: the Endpoint is NULL unless the CXL Port is an Endpoint
> Port?

You mean memdev is empty, right?