* [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging
@ 2026-01-14 18:20 Terry Bowman
From: Terry Bowman @ 2026-01-14 18:20 UTC
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
This patchset updates CXL Protocol Error handling for CXL Ports and CXL
Endpoints (EPs). The previous version of this series can be found here:
https://lore.kernel.org/linux-cxl/20251104170305.4163840-1-terry.bowman@amd.com/
Patches 1-3 introduce pcie_is_cxl(). This includes moving the DVSEC
definitions to uapi/linux/pci_regs.h and then updating the formatting to match
existing definitions.
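
As a rough user-space illustration of the kind of check pcie_is_cxl() depends
on, the sketch below splits a DVSEC header into its fields and tests the
Flex Bus Port Status bits. It is not the kernel code. The function names are
hypothetical, and the constants (CXL DVSEC vendor ID 0x1e98, Flex Bus Port
DVSEC ID 7, Cache_Enabled in bit 0, Mem_Enabled in bit 2) are assumptions
based on the CXL specification that should be verified against pci_regs.h.

```python
# User-space model only; the kernel reads these values from PCI config space.
CXL_DVSEC_VENDOR_ID = 0x1E98      # assumed: CXL DVSEC vendor ID
CXL_DVSEC_FLEXBUS_PORT = 7        # assumed: Flex Bus Port DVSEC ID
FLEXBUS_STATUS_CACHE = 1 << 0     # assumed: Cache_Enabled status bit
FLEXBUS_STATUS_MEM = 1 << 2       # assumed: Mem_Enabled status bit


def dvsec_header1(value):
    """Split DVSEC Header 1 into (vendor id, revision, length in bytes)."""
    return value & 0xFFFF, (value >> 16) & 0xF, (value >> 20) & 0xFFF


def dvsec_header2_id(value):
    """The DVSEC ID is the low 16 bits of DVSEC Header 2."""
    return value & 0xFFFF


def is_cxl_port(header1, header2, flexbus_status):
    """Hypothetical stand-in for a CXL.cache/CXL.mem enabled check."""
    vendor, _rev, _length = dvsec_header1(header1)
    if vendor != CXL_DVSEC_VENDOR_ID:
        return False
    if dvsec_header2_id(header2) != CXL_DVSEC_FLEXBUS_PORT:
        return False
    return bool(flexbus_status & (FLEXBUS_STATUS_CACHE | FLEXBUS_STATUS_MEM))
```

In this model a port that reports only CXL.io (status bit 1) is not treated as
CXL, which mirrors the "check for CXL.mem and CXL.cache" item in the v11
changelog.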
Patches 4-12 improve maintainability by separating CXL RAS logic from AER
and CXL PCI logic. Three new files are created as a result. CXL RCH code in the
AER driver is moved to a new file, pci/pcie/aer_cxl_rch.c. The CXL driver's
RAS-related code is moved to cxl/core/ras.c, also a new file.
cxl/core/ras_rch.c is introduced to hold the CXL driver's RCH logic. These
moves allow the existing #ifdef conditional compilation to be removed.
Patches 13-18 are general changes needed for CXL handling. These replace
locks with guard(), add bus type to AER logging, add AER documentation, and
replace kernel config CONFIG_PCIEAER_CXL with CONFIG_CXL_RAS.
Patches 19-29 update the CXL Port and Downstream Port discovery. This set
of patches moves Endpoint RAS registers to the parent CXL Endpoint Port,
updates find_cxl_port_by_uport() to work with EPs, and moves dport RAS
setup to EP Port initialization. This subset of patches also moves the AER
driver's remaining CXL logic into pci/pcie/aer_cxl_vh.c.
Patches 30-34 introduce AER-CXL kfifo dequeue logic and add CXL Port
protocol handlers for CXL Ports and EPs. CXL EPs will also include PCI
protocol error handlers. Recent changes in this patch subset include
logging and handling only the error source device instead of recursing
into child devices.
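
The forwarding path can be pictured as a bounded queue with one producer and
one consumer: the AER side enqueues the error source device and its severity,
and a CXL worker dequeues and handles only that device. The following is a
simplified user-space model under that assumption. ProtoErr, ProtoErrFifo,
forward_error() and work_fn() are hypothetical names, the FIFO depth is
arbitrary, and a collections.deque stands in for the kernel kfifo.

```python
from collections import deque, namedtuple

# Hypothetical work item: the error source device and the AER severity.
ProtoErr = namedtuple("ProtoErr", ["bdf", "severity"])

FIFO_DEPTH = 4  # arbitrary depth chosen for this model


class ProtoErrFifo:
    def __init__(self, depth=FIFO_DEPTH):
        self.depth = depth
        self.items = deque()
        self.refs = {}  # bdf -> outstanding device references

    def forward_error(self, bdf, severity):
        """Producer side: take a reference, then enqueue. Drop when full."""
        if len(self.items) >= self.depth:
            return False
        self.refs[bdf] = self.refs.get(bdf, 0) + 1
        self.items.append(ProtoErr(bdf, severity))
        return True

    def work_fn(self, handler):
        """Consumer side: drain the FIFO, handling only the source device."""
        handled = []
        while self.items:
            err = self.items.popleft()
            handled.append(handler(err))
            self.refs[err.bdf] -= 1  # drop the reference taken at enqueue
        return handled
```

The reference counting in the model follows the v14 changelog note about
taking a pdev reference before enqueue and dropping it after dequeue and
handling. No child devices are visited by the consumer.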
@Bjorn - The patches requiring PCI approval use a "PCI" prefixed title.
== Testing ==
Below are the testing results while using QEMU. The QEMU testing uses a
CXL Root Port, CXL Upstream Switch Port, CXL Downstream Switch Port and CXL
Endpoint as given below. I've attached the QEMU startup commandline used.
This testing uses protocol error injection at all the devices.
The sub-topology for the QEMU testing is:
---------------------
| CXL RP - 0C:00.0 |
---------------------
|
---------------------
| CXL USP - 0D:00.0 |
---------------------
|
---------------------
| CXL DSP - 0E:00.0 |
---------------------
|
---------------------
| CXL EP - 0F:00.0 |
---------------------
root@tbowman-cxl:~# lspci -t
-+-[0000:00]- -00.0
| +-01.0
| +-02.0
| +-03.0
| +-1f.0
| +-1f.2
| \-1f.3
\-[0000:0c]---00.0-[0d-0f]----00.0-[0e-0f]----00.0-[0f]----00.0
The topology was created with:
${qemu} -boot menu=on \
-cpu host \
-nographic \
-monitor telnet:127.0.0.1:1234,server,nowait \
-M virt,cxl=on \
-chardev stdio,id=s1,signal=off,mux=on -serial none \
-device isa-serial,chardev=s1 -mon chardev=s1,mode=readline \
-machine q35,cxl=on \
-m 16G,maxmem=24G,slots=8 \
-cpu EPYC-v3 \
-smp 32 \
-accel kvm \
-drive file=${img},format=raw,index=0,media=disk \
-device e1000,netdev=user.0 \
-netdev user,id=user.0,hostfwd=tcp::5555-:22 \
-object memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \
-object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M \
-object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M \
-object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M \
-object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M \
-object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M \
-object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M \
-object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
-device cxl-upstream,bus=root_port0,id=us0 \
-device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
-device cxl-type3,bus=swport0,volatile-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-vmem0 \
-device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
-device cxl-type3,bus=swport1,volatile-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-vmem1 \
-device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
-device cxl-type3,bus=swport2,volatile-memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-vmem2 \
-device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
-device cxl-type3,bus=swport3,volatile-memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-vmem3 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k
=== Root Port ===
root@tbowman-cxl:~/aer-inject# ./root-ce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0c:00.0
pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0c:00.0
aer_event: 0000:0c:00.0 CXL Bus Error: severity=Corrected, Corrected Internal Error, TLP Header=Not available
pcieport 0000:0c:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
pcieport 0000:0c:00.0: device [8086:7075] error status/mask=00004000/0000a000
pcieport 0000:0c:00.0: [14] CorrIntErr
cxl_port_aer_correctable_error: device=0000:0c:00.0 host=pci0000:0c status='CRC Threshold Hit'
root@tbowman-cxl:~/aer-inject# ./root-uce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00000000/00400000 into device 0000:0c:00.0
pcieport 0000:0c:00.0: AER: Uncorrectable (Fatal) error message received from 0000:0c:00.0
aer_event: 0000:0c:00.0 CXL Bus Error: severity=Fatal, Uncorrectable Internal Error, TLP Header=Not available
pcieport 0000:0c:00.0: CXL Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
pcieport 0000:0c:00.0: device [8086:7075] error status/mask=00400000/02000000
pcieport 0000:0c:00.0: [22] UncorrIntErr
cxl_port_aer_uncorrectable_error: device=0000:0c:00.0 host=pci0000:0c status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
Kernel panic - not syncing: CXL cachemem error.
CPU: 26 UID: 0 PID: 176 Comm: kworker/26:0 Tainted: G E 6.19.0-rc4-gd2320443c4cf #222 PREEMPT(voluntary)
Tainted: [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: events cxl_proto_err_work_fn [cxl_core]
Call Trace:
<TASK>
dump_stack_lvl+0x26/0xc0
dump_stack+0x10/0x20
vpanic+0x35e/0x3b0
panic+0x57/0x60
cxl_proto_err_work_fn+0x1fc/0x210 [cxl_core]
process_one_work+0x22b/0x600
worker_thread+0x195/0x350
kthread+0x119/0x230
? __pfx_worker_thread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x261/0x2e0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Kernel Offset: disabled
---[ end Kernel panic - not syncing: CXL cachemem error. ]---
=== Upstream Port ===
root@tbowman-cxl:~/aer-inject# ./us-ce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0d:00.0
pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0d:00.0
aer_event: 0000:0d:00.0 CXL Bus Error: severity=Corrected, Corrected Internal Error, TLP Header=Not available
pcieport 0000:0d:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
pcieport 0000:0d:00.0: device [19e5:a128] error status/mask=00004000/0000a000
pcieport 0000:0d:00.0: [14] CorrIntErr
cxl_port_aer_correctable_error: device=0000:0d:00.0 host=0000:0c:00.0 status='CRC Threshold Hit'
root@tbowman-cxl:~/aer-inject# ./us-uce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00000000/00400000 into device 0000:0d:00.0
pcieport 0000:0c:00.0: AER: Uncorrectable (Fatal) error message received from 0000:0d:00.0
aer_event: 0000:0d:00.0 CXL Bus Error: severity=Fatal, , TLP Header=Not available
pcieport 0000:0d:00.0: AER: CXL Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID)
cxl_port_aer_uncorrectable_error: device=0000:0f:00.0 host=0000:0e:00.0 status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
Kernel panic - not syncing: CXL cachemem error.
CPU: 26 UID: 0 PID: 247 Comm: irq/24-aerdrv Tainted: G E 6.19.0-rc4-gd2320443c4cf #222 PREEMPT(voluntary)
Tainted: [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x26/0xc0
dump_stack+0x10/0x20
vpanic+0x35e/0x3b0
panic+0x57/0x60
cxl_pci_error_detected+0xbb/0xc0 [cxl_core]
report_error_detected+0xda/0x1d0
? __pfx_report_frozen_detected+0x10/0x10
__pci_walk_bus+0x51/0x80
? __pfx_report_frozen_detected+0x10/0x10
__pci_walk_bus+0x39/0x80
? __pfx_report_frozen_detected+0x10/0x10
__pci_walk_bus+0x39/0x80
? __pfx_report_frozen_detected+0x10/0x10
pci_walk_bus+0x32/0x50
pcie_do_recovery+0x302/0x450
? __pfx_aer_root_reset+0x10/0x10
aer_isr_one_error_type+0x18e/0x330
aer_isr_one_error+0x124/0x150
aer_isr+0x4c/0x80
irq_thread_fn+0x28/0x70
irq_thread+0x19a/0x2a0
? __pfx_irq_thread_fn+0x10/0x10
? __pfx_irq_thread_dtor+0x10/0x10
kthread+0x119/0x230
? __pfx_irq_thread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x261/0x2e0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Kernel Offset: disabled
---[ end Kernel panic - not syncing: CXL cachemem error. ]---
=== Downstream Port ===
root@tbowman-cxl:~/aer-inject# ./ds-ce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0e:00.0
pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0e:00.0
aer_event: 0000:0e:00.0 CXL Bus Error: severity=Corrected, Corrected Internal Error, TLP Header=Not available
pcieport 0000:0e:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
pcieport 0000:0e:00.0: device [19e5:a129] error status/mask=00004000/0000a000
pcieport 0000:0e:00.0: [14] CorrIntErr
cxl_port_aer_correctable_error: device=0000:0e:00.0 host=0000:0d:00.0 status='CRC Threshold Hit'
root@tbowman-cxl:~/aer-inject# ./ds-uce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00000000/00400000 into device 0000:0e:00.0
pcieport 0000:0c:00.0: AER: Uncorrectable (Fatal) error message received from 0000:0e:00.0
aer_event: 0000:0e:00.0 CXL Bus Error: severity=Fatal, Uncorrectable Internal Error, TLP Header=Not available
pcieport 0000:0e:00.0: CXL Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
pcieport 0000:0e:00.0: device [19e5:a129] error status/mask=00400000/02000000
pcieport 0000:0e:00.0: [22] UncorrIntErr
cxl_port_aer_uncorrectable_error: device=0000:0e:00.0 host=0000:0d:00.0 status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
Kernel panic - not syncing: CXL cachemem error.
CPU: 26 UID: 0 PID: 176 Comm: kworker/26:0 Tainted: G E 6.19.0-rc4-gd2320443c4cf #222 PREEMPT(voluntary)
Tainted: [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: events cxl_proto_err_work_fn [cxl_core]
Call Trace:
<TASK>
dump_stack_lvl+0x26/0xc0
dump_stack+0x10/0x20
vpanic+0x35e/0x3b0
panic+0x57/0x60
cxl_proto_err_work_fn+0x1fc/0x210 [cxl_core]
process_one_work+0x22b/0x600
worker_thread+0x195/0x350
kthread+0x119/0x230
? __pfx_worker_thread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x261/0x2e0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Kernel Offset: disabled
---[ end Kernel panic - not syncing: CXL cachemem error. ]---
=== Endpoint ===
root@tbowman-cxl:~/aer-inject# ./ep-ce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0f:00.0
pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0f:00.0
aer_event: 0000:0f:00.0 CXL Bus Error: severity=Corrected, Corrected Internal Error, TLP Header=Not available
cxl_pci 0000:0f:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
cxl_pci 0000:0f:00.0: device [8086:0d93] error status/mask=00004000/0000a000
cxl_pci 0000:0f:00.0: [14] CorrIntErr
cxl_port_aer_correctable_error: device=0000:0f:00.0 host=0000:0e:00.0 status='CRC Threshold Hit'
root@tbowman-cxl:~/aer-inject# ./ep-uce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00000000/00400000 into device 0000:0f:00.0
pcieport 0000:0c:00.0: AER: Uncorrectable (Fatal) error message received from 0000:0f:00.0
aer_event: 0000:0f:00.0 CXL Bus Error: severity=Fatal, , TLP Header=Not available
cxl_pci 0000:0f:00.0: AER: CXL Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID)
cxl_port_aer_uncorrectable_error: device=0000:0f:00.0 host=0000:0e:00.0 status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
Kernel panic - not syncing: CXL cachemem error.
CPU: 26 UID: 0 PID: 247 Comm: irq/24-aerdrv Tainted: G E 6.19.0-rc4-gd2320443c4cf #222 PREEMPT(voluntary)
Tainted: [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x26/0xc0
dump_stack+0x10/0x20
vpanic+0x35e/0x3b0
panic+0x57/0x60
cxl_pci_error_detected+0xbb/0xc0 [cxl_core]
report_error_detected+0xda/0x1d0
? __pfx_report_frozen_detected+0x10/0x10
report_frozen_detected+0x16/0x20
__pci_walk_bus+0x51/0x80
? __pfx_report_frozen_detected+0x10/0x10
pci_walk_bus+0x32/0x50
pcie_do_recovery+0x302/0x450
? __pfx_aer_root_reset+0x10/0x10
aer_isr_one_error_type+0x18e/0x330
aer_isr_one_error+0x124/0x150
aer_isr+0x4c/0x80
irq_thread_fn+0x28/0x70
irq_thread+0x19a/0x2a0
? __pfx_irq_thread_fn+0x10/0x10
? __pfx_irq_thread_dtor+0x10/0x10
kthread+0x119/0x230
? __pfx_irq_thread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x261/0x2e0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Kernel Offset: disabled
---[ end Kernel panic - not syncing: CXL cachemem error. ]---
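
When reading the logs above, note that each aer_inject line carries the
correctable and uncorrectable status words in that order, and each test sets
a single bit. The helper below is a small sketch for decoding such a line.
The function names are illustrative, and the bit tables are deliberately
limited to the two bits seen in these logs, so this is not a complete AER
decoder.

```python
import re

# Only the bits exercised in the logs above; not a full AER bit table.
COR_BITS = {14: "CorrIntErr"}
UNCOR_BITS = {22: "UncorrIntErr"}

INJECT_RE = re.compile(
    r"Injecting errors ([0-9a-f]{8})/([0-9a-f]{8}) into device "
    r"([0-9a-f]{4}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])"
)


def set_bits(value):
    """Return the positions of the bits set in a 32-bit status word."""
    return [bit for bit in range(32) if value & (1 << bit)]


def decode_inject_line(line):
    """Parse one aer_inject log line into its target bus and error names."""
    match = INJECT_RE.search(line)
    if match is None:
        return None
    cor, uncor = int(match.group(1), 16), int(match.group(2), 16)
    return {
        "bus": int(match.group(4), 16),
        "correctable": [COR_BITS.get(b, "bit %d" % b) for b in set_bits(cor)],
        "uncorrectable": [UNCOR_BITS.get(b, "bit %d" % b) for b in set_bits(uncor)],
    }
```

For the Root Port correctable test this yields bus 12, which matches the
bus_nr=12 given to pxb-cxl in the QEMU command line, and the single
correctable bit 14.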
== Changes ==
Changes in v13->v14:
PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
- Add Jonathan's and Dan's review-by
- Update commit title prefix (Bjorn)
- Revert format fix for cxl_sbr_masked() (Jonathan)
- Update 'Compute Express Link' comment block (Jonathan)
- Move PCI_DVSEC_CXL_FLEXBUS definitions to later patch where
used (Jonathan)
- Removed stray change (Bjorn)
PCI: Update CXL DVSEC definitions
- New patch. Split from previous patch such that there is now a separate
move patch and a format fix patch.
- Formatting update requested (Bjorn)
- Remove PCI_DVSEC_HEADER1_LENGTH_MASK because it duplicates
PCI_DVSEC_HEADER1_LEN() (Bjorn)
- Add Dan's review-by
PCI: Introduce pcie_is_cxl()
- Move FLEXBUS_STATUS DVSEC here (Jonathan)
- Remove check for EP and USP (Dan)
- Update commit message (Bjorn)
- Fix writing past 80 columns (Bjorn)
- Add pci_is_pcie() parent bridge check at beginning of function (Bjorn)
PCI: Replace cxl_error_is_native() with pcie_aer_is_native()
- New commit
cxl/pci: Move CXL driver's RCH error handling into core/ras_rch.c
- Add sign-off for Dan and Jonathan
- Revert inadvertent formatting of cxl_dport_map_rch_aer() (Jonathan)
- Remove default value for CXL_RCH_RAS (Dan)
- Remove unnecessary pci.h include in core.h & ras_rch.c (Jonathan)
- Add linux/types.h include in ras_rch.c (Jonathan)
- Change CONFIG_CXL_RCH_RAS -> CONFIG_CXL_RAS (Dan)
PCI/AER: Export pci_aer_unmask_internal_errors
- New commit. Bjorn requested separating this out and adding it immediately
before it is used. This is called from cxl_rch_enable_rcec() in the
following patch.
PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
- New commit
PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c
- Add review-by and signed-off for Dan
- Commit message fixup (Dan)
- Update commit message with use-case description (Dan, Lukas)
- Make cxl_error_is_native() static (Dan)
- Make is_internal_error() non-static, non-export (Terry)
PCI/AER: Use guard() in cxl_rch_handle_error_iter()
- Add review-by for Jonathan, Dave Jiang, Dan Williams, and Bjorn
- Remove cleanup.h (Jonathan)
- Reverted comment removal (Bjorn)
- Move this patch after pci/pcie/aer_cxl_rch.c creation (Bjorn)
PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS
- New commit
PCI/AER: Report CXL or PCIe bus type in AER trace logging
- Merged with Dan's commit. Changes move bus_type to be the last
parameter in function calls (Dan)
- Removed all DCOs because of changes (Terry)
- Update commit message (Bjorn)
- Add Bjorn's ack-by
PCI/AER: Update struct aer_err_info with kernel-doc formatting
- New commit
cxl/mem: Clarify @host for devm_cxl_add_nvdimm()
- New commit
cxl/port: Remove "enumerate dports" helpers
- New commit
cxl/port: Fix devm resource leaks around with dport management
- New commit
cxl/port: Move dport operations to a driver event
- New commit
cxl/port: Move dport RAS reporting to a port resource
- New commit
cxl: Map CXL Endpoint Port and CXL Switch Port RAS registers
- Correct message spelling (Terry)
cxl/port: Move endpoint component register management to cxl_port
- Correct message spelling (Terry)
cxl/port: Map Port component registers before switchport init
- Updates to use cxl_port_setup_regs() (Dan)
cxl: Change CXL handlers to use guard() instead of scoped_guard()
- Add reviewed-by for Jonathan and Dave Jiang
PCI/ERR: Introduce PCI_ERS_RESULT_PANIC
- Add review-by for Dan
- Update Title prefix (Bjorn)
- Removed merge_result. Only logging error for device reporting the
error (Dan)
- Remove PCI_ERS_RESULT_PANIC paragraph in pci-error-recovery.rst (Bjorn)
PCI/AER: Move AER driver's CXL VH handling to pcie/aer_cxl_vh.c
- Replaced workqueue_types.h include with 'struct work_struct'
predeclaration (Bjorn)
- Update error message (Bjorn)
- Reordered 'struct cxl_proto_err_work_data' (Bjorn)
- Remove export of cxl_error_is_native() here (Bjorn)
cxl/port: Unify endpoint and switch port lookup
- New patch
PCI/AER: Dequeue forwarded CXL error
- Update commit title's prefix (Bjorn)
- Add pdev ref get in AER driver before enqueue and add pdev ref put in
CXL driver after dequeue and handling (Dan)
- Removed handling to simplify patch context (Terry)
PCI: Introduce CXL Port protocol error handlers
- Add Dave Jiang's review-by
- Update commit message & headline (Bjorn)
- Refactor cxl_port_error_detected()/cxl_port_cor_error_detected() to
one line (Jonathan)
- Remove cxl_walk_port(). Only log the erroring device. No port walking. (Dan)
- Remove cxl_pci_drv_bound(). Check for 'is_cxl' parent port is
sufficient (Dan)
- Remove device_lock_if()
- Combine CE and UCE here (Terry)
cxl: Update Endpoint uncorrectable protocol error handling
- Update commit headline (Bjorn)
- Rename pci_error_detected()/pci_cor_error_detected() ->
cxl_pci_error_detected/cxl_pci_cor_error_detected() (Jonathan)
- Remove now-invalid comment in cxl_error_detected() (Jonathan)
- Split into separate patches for UCE and CE (Terry)
cxl: Update Endpoint correctable protocol error handling
- New commit
- Change cxl_cor_error_detected() parameter to &pdev->dev device from
memdev device. (Terry)
cxl: Enable CXL protocol errors during CXL Port probe
- Update commit title's prefix (Bjorn)
Changes in v12->v13:
CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
- Add Dave Jiang's reviewed-by
- Remove changes to existing PCI_DVSEC_CXL_PORT* defines. Update commit
message. (Jonathan)
PCI/CXL: Introduce pcie_is_cxl()
- Add Ben's "reviewed-by"
cxl/pci: Remove unnecessary CXL Endpoint handling helper functions
- None
cxl/pci: Remove unnecessary CXL RCH handling helper functions
- None
cxl: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core
- None
cxl: Move CXL driver's RCH error handling into core/ras_rch.c
- None
CXL/AER: Replace device_lock() in cxl_rch_handle_error_iter() with guard() lock
- New patch
CXL/AER: Move AER drivers RCH error handling into pcie/aer_cxl_rch.c
- Add forward declaration of 'struct aer_err_info' in pci/pci.h (Terry)
- Changed copyright date from 2025 to 2023 (Jonathan)
- Add Dave Jiang's, Jonathan's, and Ben's review-by
- Readd 'struct aer_err_info' (Bot)
PCI/AER: Report CXL or PCIe bus error type in trace logging
- Remove duplicated aer_err_info inline comments. Is already in the
kernel-doc header (Ben)
cxl/pci: Update RAS handler interfaces to also support CXL Ports
- None
cxl/pci: Log message if RAS registers are unmapped
- Added Ben's review-by
cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports
- Added Dave Jiang's review-by
cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors
- Add Ben's review-by
cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers
- Change as result of dport delay fix. No longer need switchport and
endport approach. Refactor. (Terry)
CXL/PCI: Introduce PCI_ERS_RESULT_PANIC
- Add Dave Jiang's, Jonathan's, Ben's review-by
- Typo fix (Ben)
CXL/AER: Introduce pcie/aer_cxl_vh.c in AER driver for forwarding CXL errors
- Add Dave Jiang's review-by
- Update error message (Ben)
cxl: Introduce cxl_pci_drv_bound() to check for bound driver
- Add Dave Jiang's review-by.
cxl: Change CXL handlers to use guard() instead of scoped_guard()
- New patch
cxl/pci: Introduce CXL protocol error handlers for endpoints
- Updated all the implementation and commit message. (Terry)
- Refactored cxl_cor_error_detected()/cxl_error_detected() to remove
pdev (Dave Jiang)
CXL/PCI: Introduce CXL Port protocol error handlers
- Move get_pci_cxl_host_dev() and cxl_handle_proto_error() to Dequeue
patch (Terry)
- Remove EP case in cxl_get_ras_base(), not used. (Terry)
- Remove check for dport->dport_dev (Dave)
- Remove whitespace (Terry)
PCI/AER: Dequeue forwarded CXL error
- Rewrite cxl_handle_proto_error() and cxl_proto_err_work_fn() (Terry)
- Rename get_cxl_host dev() to be get_cxl_port() (Terry)
- Remove exporting of unused function, pci_aer_clear_fatal_status() (Dave Jiang)
- Change pr_err() calls to ratelimited. (Terry)
- Update commit message. (Terry)
- Remove namespace qualifier from pcie_clear_device_status()
export (Dave Jiang)
- Move locks into cxl_proto_err_work_fn() (Dave)
- Update log messages in cxl_forward_error() (Ben)
CXL/PCI: Export and rename merge_result() to pci_ers_merge_result()
- Renamed pci_ers_merge_result() to pcie_ers_merge_result().
pci_ers_merge_result() is already used in eeh driver. (Bot)
CXL/PCI: Introduce CXL uncorrectable protocol error recovery
- Rewrite report_error_detected() and cxl_walk_port (Terry)
- Add guard() before calling cxl_pci_drv_bound() (Dave Jiang)
- Add guard() calls for EP (cxlds->cxlmd->dev & pdev->dev) and ports
(pdev->dev & parent cxl_port) in cxl_report_error_detected() and
cxl_handle_proto_error() (Terry)
- Remove unnecessary check for endpoint port. (Dave Jiang)
- Remove check for RCIEP EP in cxl_report_error_detected() (Terry)
CXL/PCI: Enable CXL protocol errors during CXL Port probe
- Add dev and dev_is_pci() NULL checks in cxl_unmask_proto_interrupts() (Terry)
- Add Dave Jiang's and Ben's review-by
CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup
- Added dev and dev_is_pci() checks in cxl_mask_proto_interrupts() (Terry)
Changes in v11 -> v12:
cxl/pci: Remove unnecessary CXL Endpoint handling helper functions
- Added Dave Jiang's review by
- Moved to front of series
cxl/pci: Remove unnecessary CXL RCH handling helper functions
- Add reviewed-by for Alejandro & Dave Jiang
- Moved to front of series
cxl: Remove ifdef blocks of CONFIG_PCIEAER_CXL from core/pci.c
- Update CONFIG_CXL_RAS in CXL Kconfig to have CXL_PCI dependency (Terry)
CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS
- Added review-by for Sathyanarayanan
- Changed Kconfig dependency from PCIEAER_CXL to PCIEAER. Moved
this backwards into this patch.
cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS conditio
- Moved CXL_RCH_RAS Kconfig definition here from following commit
CXL/AER: Introduce aer_cxl_rch.c into AER driver for handling CXL RCH errors
- Rename drivers/pci/pcie/cxl_rch.c to drivers/pci/pcie/aer_cxl_rch.c (Lukas)
- Removed forward declaration of 'struct aer_err_info' in pci/pci.h (Terry)
CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
- Change formatting to be same as existing definitions
- Change GENMASK() -> __GENMASK() and BIT() to _BITUL()
PCI/CXL: Introduce pcie_is_cxl()
- Add review-by for Alejandro
- Add comment in set_pcie_cxl() explaining why updating parent status.
PCI/AER: Report CXL or PCIe bus error type in trace logging
- Change aer_err_info::is_cxl to be a bool bitfield. Update structure padding. (Lukas)
- Add kernel-doc for 'struct aer_err_info' (Lukas)
cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports
- Correct parameters to call trace_cxl_aer_correctable_error() (Shiju)
- Add reviewed-by for Jonathan and Shiju
cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers
- Add check for dport_parent->rch before calling cxl_dport_init_ras_reporting().
- RCH dports are initialized from cxl_dport_init_ras_reporting cxl_mem_probe().
CXL/PCI: Introduce PCI_ERS_RESULT_PANIC
- Documentation requested by (Lukas)
CXL/AER: Introduce aer_cxl_vh.c in AER driver for forwarding CXL errors
- Rename drivers/pci/pcie/cxl_aer.c to drivers/pci/pcie/aer_cxl_vh.c (Lukas)
cxl: Introduce cxl_pci_drv_bound() to check for bound driver
- New patch
PCI/AER: Dequeue forwarded CXL error
- Add guard for CE case in cxl_handle_proto_error() (Dave)
- Updated commit message (Terry)
CXL/PCI: Introduce CXL Port protocol error handlers
- Add call to cxl_pci_drv_bound() in cxl_handle_proto_error() and
pci_to_cxl_dev() (Lukas)
- Change cxl_error_detected() -> cxl_cor_error_detected() (Terry)
- Remove NULL variable assignments (Jonathan)
- Replace bus_find_device() with find_cxl_port_by_uport() for upstream
port searches. (Dave)
CXL/PCI: Export and rename merge_result() to pci_ers_merge_result()
- Remove static inline pci_ers_merge_result() definition for !CONFIG_PCIEAER.
Is not needed. (Lukas)
CXL/PCI: Introduce CXL uncorrectable protocol error recovery
- Clean up port discovery in cxl_do_recovery() (Dave)
- Add PCI_EXP_TYPE_RC_END to type check in cxl_report_error_detected()
Changes in v10 -> v11:
cxl: Remove ifdef blocks of CONFIG_PCIEAER_CXL from core/pci.c
- New patch
CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS
- New patch
cxl/pci: Remove unnecessary CXL RCH handling helper functions
- New patch
cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS conditional block
- New patch
CXL/AER: Introduce rch_aer.c into AER driver for handling CXL RCH errors
- Remove changes in code-split and move to earlier, new patch
- Add #include <linux/bitfield.h> to cxl_ras.c
- Move cxl_rch_handle_error() & cxl_rch_enable_rcec() declarations from pci.h
to aer.h, more localized.
- Introduce CONFIG_CXL_RCH_RAS, includes Makefile changes, ras.c ifdef changes
CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
- New patch
PCI/CXL: Introduce pcie_is_cxl()
- Amended set_pcie_cxl() to check for Upstream Port's and EP's parent
downstream port by calling set_pcie_cxl(). (Dan)
- Retitle patch: 'Add' -> 'Introduce'
- Add check for CXL.mem and CXL.cache (Alejandro, Dan)
PCI/AER: Report CXL or PCIe bus error type in trace logging
- Remove duplicate call to trace_aer_event() (Shiju)
- Added Dan William's and Dave Jiang's reviewed-by
CXL/AER: Update PCI class code check to use FIELD_GET()
- Add #include <linux/bitfield.h> to cxl_ras.c (Terry)
- Removed line wrapping at "(CXL 3.2, 8.1.12.1)". (Jonathan)
cxl/pci: Log message if RAS registers are unmapped
- Added Dave Jiang's review-by (Terry)
cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports
- Updated CE and UCE trace routines to maintain consistent TP_Struct ABI
and unchanged TP_printk() logging. (Shiju, Alison)
cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors
- Added Dave Jiang and Jonathan Cameron's review-by
- Changes moved to core/ras.c
cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers
- Use local pointer for readability in cxl_switch_port_init_ras() (Jonathan Cameron)
- Rename port to be ep in cxl_endpoint_port_init_ras() (Dave Jiang)
- Rename dport to be parent_dport in cxl_endpoint_port_init_ras()
and cxl_switch_port_init_ras() (Dave Jiang)
- Port helper changes were in cxl/port.c, now in core/ras.c (Dave Jiang)
cxl/pci: Introduce CXL Endpoint protocol error handlers
- cxl_error_detected() - Change handlers' scoped_guard() to guard() (Jonathan)
- cxl_error_detected() - Remove extra line (Shiju)
- Changes moved to core/ras.c (Terry)
- cxl_error_detected(), remove 'ue' and return with function call. (Jonathan)
- Remove extra space in documentation for PCI_ERS_RESULT_PANIC definition
- Move #include "pci.h" from cxl.h to core.h (Terry)
- Remove unnecessary includes of cxl.h and core.h in mem.c (Terry)
CXL/AER: Introduce cxl_aer.c into AER driver for forwarding CXL errors
- Move RCH implementation to cxl_rch.c and RCH declarations to pci/pci.h. (Terry)
- Introduce 'struct cxl_proto_err_kfifo' containing semaphore, fifo,
and work struct. (Dan)
- Remove embedded struct from cxl_proto_err_work (Dan)
- Make 'struct work_struct *cxl_proto_err_work' definition static (Jonathan)
- Add check for NULL cxl_proto_err_kfifo to determine if CXL driver is
not registered for workqueue. (Dan)
PCI/AER: Dequeue forwarded CXL error
- Reword patch commit message to remove RCiEP details (Jonathan)
- Add #include <linux/bitfield.h> (Terry)
- is_cxl_rcd() - Fix short comment message wrap (Jonathan)
- is_cxl_rcd() - Combine return calls into 1 (Jonathan)
- cxl_handle_proto_error() - Move comment earlier (Jonathan)
- Use FIELD_GET() in discovering class code (Jonathan)
- Remove BDF from cxl_proto_err_work_data. Use 'struct pci_dev *' (Dan)
CXL/PCI: Introduce CXL Port protocol error handlers
- Removed check for PCI_EXP_TYPE_RC_END in cxl_report_error_detected() (Terry)
- Update is_cxl_error() to check for acceptable PCI EP and port types
CXL/PCI: Export and rename merge_result() to pci_ers_merge_result()
- pci_ers_merge_result() - Change export to non-namespace and rename
to be pci_ers_merge_result() (Jonathan)
- Move pci_ers_merge_result() definition to pci.h. Needs pci_ers_result (Terry)
CXL/PCI: Introduce CXL uncorrectable protocol error recovery
- pci_ers_merge_results() - Move to earlier patch
CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup
- Remove guard() in cxl_mask_proto_interrupts(). Observed device lockup/block
during testing. (Terry)
Changes in v9 -> v10:
- Add drivers/pci/pcie/cxl_aer.c
- Add drivers/cxl/core/native_ras.c
- Change cxl_register_prot_err_work()/cxl_unregister_prot_err_work to return void
- Check for pcie_ports_native in cxl_do_recovery()
- Remove debug logging in cxl_do_recovery()
- Update PCI_ERS_RESULT_PANIC definition to indicate is CXL specific
- Revert trace logging changes: name,parent -> memdev,host.
- Use FIELD_GET() to check for EP class code (cxl_aer.c & native_ras.c).
- Change _prot_ to _proto_ everywhere
- cxl_rch_handle_error_iter(), check if driver is cxl_pci_driver
- Remove cxl_create_prot_error_info(). Move logic into forward_cxl_error()
- Remove sbdf_to_pci() and move logic into cxl_handle_proto_error()
- Simplify/refactor get_pci_cxl_host_dev()
- Simplify/refactor cxl_get_ras_base()
- Move patch 'Remove unnecessary CXL Endpoint handling helper functions' to front
- Update description for 'CXL/PCI: Introduce CXL Port protocol error
handlers' with why state is not used to determine handling
- Introduce cxl_pci_drv_bound() and call from cxl_rch_handle_error_iter()
Changes in v8 -> v9:
- Updated reference counting to use pci_get_device()/pci_put_device() in
cxl_disable_prot_errors()/cxl_enable_prot_errors()
- Refactored cxl_create_prot_err_info() to fix reference counting
- Removed 'struct cxl_port' driver changes for error handler. Instead
check for CXL device type (EP or Port device) and call handler
- Make pcie_is_cxl() static inline in include/linux/linux.h
- Remove NULL check in create_prot_err_info()
- Change success return in cxl_ras_init() to use hardcoded 0
- Changed 'struct work_struct cxl_prot_err_work' declaration to static
- Change to use rate limited log with dev anchor in forward_cxl_error()
- Refactored forward_cxl_error() to remove severity auto variable
- Changed pci_aer_clear_nonfatal_status() to be static inline for
!(CONFIG_PCIEAER)
- Renamed merge_result() to be cxl_merge_result()
- Removed 'ue' condition in cxl_error_detected()
- Updated 2nd parameter in call to __cxl_handle_cor_ras()/__cxl_handle_ras()
in unify patch
- Added log message for failure while assigning interrupt disable callback
- Updated pci_aer_mask_internal_errors() to use pci_clear_and_set_config_dword()
- Simplified patch titles for clarity
- Moved CXL error interrupt disabling into cxl/core/port.c with CXL Port
teardown
- Updated 'struct cxl_port_err_info' to only contain sbdf and severity.
Removed everything else.
- Added pdev and CXL device get_device()/put_device() before calling handlers
Changes in v7 -> v8:
[Dan] Use kfifo. Move handling to CXL driver. AER forwards error to CXL
driver
[Dan] Add device reference incrementors where needed throughout
[Dan] Initiate CXL Port RAS init from Switch Port and Endpoint Port init
[Dan] Combine CXL Port and CXL Endpoint trace routine
[Dan] Introduce aer_info::is_cxl. Use to indicate CXL or PCI errors
[Jonathan] Add serial number for all devices in trace
[DaveJ] Move find_cxl_port() change into patch using it
[Terry] Move CXL Port RAS init into cxl/port.c
[Terry] Moved kfifo functions into cxl/core/ras.c
Changes in v6 -> v7:
[Terry] Move updated trace routine call to later patch. Was causing build
error.
Changes in v5 -> v6:
[Ira] Move pcie_is_cxl(dev) define to an inline function
[Ira] Update returning value from pcie_is_cxl_port() to bool w/o cast
[Ira] Change cxl_report_error_detected() cleanup to return correct bool
[Ira] Introduce and use PCI_ERS_RESULT_PANIC
[Ira] Reuse comment for PCIe and CXL recovery paths
[Jonathan] Add type check for cxl_handle_cor_ras() and cxl_handle_ras()
[Jonathan] cxl_uport/dport_init_ras_reporting(), added a mutex.
[Jonathan] Add logging example to patches updating trace output
[Jonathan] Make parameter 'const' to eliminate need for cast in match_uport()
[Jonathan] Use __free() in cxl_pci_port_ras()
[Terry] Add patch to log the PCIe SBDF along with CXL device name
[Terry] Add patch to handle CXL endpoint and RCH DP errors as CXL errors
[Terry] Remove patch w USP UCE fatal support @ aer_get_device_error_info()
[Terry] Rebase to cxl/next commit 5585e342e8d3 ("cxl/memdev: Remove unused partition values")
[Gregory] Pre-initialize pointer to NULL in cxl_pci_port_ras()
[Gregory] Move AER driver bus name detection to a static function
Changes in v4 -> v5:
[Alejandro] Refactor cxl_walk_bridge to simplify 'status' variable usage
[Alejandro] Add WARN_ONCE() in __cxl_handle_ras() and cxl_handle_cor_ras()
[Ming] Remove unnecessary NULL check in cxl_pci_port_ras()
[Terry] Add failure check for call to to_cxl_port() in cxl_pci_port_ras()
[Ming] Use port->dev for call to devm_add_action_or_reset() in
cxl_dport_init_ras_reporting() and cxl_uport_init_ras_reporting()
[Jonathan] Use get_device()/put_device() to prevent race condition in
cxl_clear_port_error_handlers() and cxl_clear_port_error_handlers()
[Terry] Commit message cleanup. Capitalize keywords from CXL and PCI
specifications
Changes in v3 -> v4:
[Lukas] Capitalize PCIe and CXL device names as in specifications
[Lukas] Move call to pcie_is_cxl() into cxl_port_devsec()
[Lukas] Correct namespace spelling
[Lukas] Removed export from pcie_is_cxl_port()
[Lukas] Simplify 'if' blocks in cxl_handle_error()
[Lukas] Change panic message to remove redundant 'panic' text
[Ming] Update to call cxl_dport_init_ras_reporting() in RCH case
[lkp@intel] 'host' parameter is already removed. Remove parameter description too.
[Terry] Added field description for cxl_err_handlers in pci.h comment block
Changes in v1 -> v2:
[Jonathan] Remove extra NULL check and cleanup in cxl_pci_port_ras()
[Jonathan] Update description to DSP map patch description
[Jonathan] Update cxl_pci_port_ras() to check for NULL port
[Jonathan] Don't call handler before handler port changes are present (patch order)
[Bjorn] Fix linebreak in cover sheet URL
[Bjorn] Remove timestamps from test logs in cover sheet
[Bjorn] Retitle AER commits to use "PCI/AER:"
[Bjorn] Retitle patch#3 to use renaming instead of refactoring
[Bjorn] Fix base commit-id on cover sheet
[Bjorn] Add VH spec reference/citation
[Terry] Removed last 2 patches to enable internal errors. Is not needed
because internal errors are enabled in AER driver.
[Dan] Create cxl_do_recovery() and pci_driver::cxl_err_handlers.
[Dan] Use kernel panic in CXL recovery
[Dan] cxl_port_hndlrs -> cxl_port_error_handlers
Dan Williams (8):
PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS
cxl/mem: Clarify @host for devm_cxl_add_nvdimm()
cxl/port: Remove "enumerate dports" helpers
cxl/port: Fix devm resource leaks around with dport management
cxl/port: Move dport operations to a driver event
cxl/port: Move dport RAS reporting to a port resource
cxl/port: Move endpoint component register management to cxl_port
cxl/port: Unify endpoint and switch port lookup
Dave Jiang (1):
cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional
blocks from core/pci.c
Terry Bowman (25):
PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
PCI: Update CXL DVSEC definitions
PCI: Introduce pcie_is_cxl()
cxl/pci: Remove unnecessary CXL Endpoint handling helper functions
cxl/pci: Remove unnecessary CXL RCH handling helper functions
PCI: Replace cxl_error_is_native() with pcie_aer_is_native()
cxl/pci: Move CXL driver's RCH error handling into core/ras_rch.c
PCI/AER: Export pci_aer_unmask_internal_errors()
PCI/AER: Update is_internal_error() to be non-static
is_aer_internal_error()
PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c
PCI/AER: Use guard() in cxl_rch_handle_error_iter()
PCI/AER: Report CXL or PCIe bus type in AER trace logging
PCI/AER: Update struct aer_err_info with kernel-doc formatting
cxl: Update RAS handler interfaces to also support CXL Ports
cxl: Update CXL Endpoint tracing
cxl: Map CXL Endpoint Port and CXL Switch Port RAS registers
cxl/port: Map Port component registers before switchport init
cxl: Change CXL handlers to use guard() instead of scoped_guard()
PCI/ERR: Introduce PCI_ERS_RESULT_PANIC
PCI/AER: Move AER driver's CXL VH handling to pcie/aer_cxl_vh.c
PCI/AER: Dequeue forwarded CXL error
PCI: Introduce CXL Port protocol error handlers
cxl: Update Endpoint uncorrectable protocol error handling
cxl: Update Endpoint correctable protocol error handling
cxl: Enable CXL protocol errors during CXL Port probe
Documentation/PCI/pci-error-recovery.rst | 2 +
drivers/cxl/Kconfig | 4 +
drivers/cxl/acpi.c | 11 +-
drivers/cxl/core/Makefile | 3 +-
drivers/cxl/core/core.h | 26 ++
drivers/cxl/core/hdm.c | 6 +-
drivers/cxl/core/pci.c | 402 +++-------------------
drivers/cxl/core/pmem.c | 13 +-
drivers/cxl/core/port.c | 245 +++++++-------
drivers/cxl/core/ras.c | 405 ++++++++++++++++++++++-
drivers/cxl/core/ras_rch.c | 121 +++++++
drivers/cxl/core/regs.c | 14 +-
drivers/cxl/core/trace.h | 25 +-
drivers/cxl/cxl.h | 53 +--
drivers/cxl/cxlmem.h | 4 +-
drivers/cxl/cxlpci.h | 93 +++---
drivers/cxl/mem.c | 4 +-
drivers/cxl/pci.c | 73 +---
drivers/cxl/port.c | 187 ++++++++++-
drivers/pci/pci.c | 1 +
drivers/pci/pci.h | 39 ++-
drivers/pci/pcie/Kconfig | 9 -
drivers/pci/pcie/Makefile | 2 +
drivers/pci/pcie/aer.c | 144 ++------
drivers/pci/pcie/aer_cxl_rch.c | 104 ++++++
drivers/pci/pcie/aer_cxl_vh.c | 82 +++++
drivers/pci/pcie/portdrv.h | 16 +
drivers/pci/probe.c | 31 ++
include/linux/aer.h | 26 ++
include/linux/pci.h | 11 +
include/ras/ras_event.h | 12 +-
include/uapi/linux/pci_regs.h | 64 +++-
tools/testing/cxl/Kbuild | 11 +-
tools/testing/cxl/cxl_core_exports.c | 22 --
tools/testing/cxl/exports.h | 13 -
tools/testing/cxl/test/cxl.c | 6 +-
tools/testing/cxl/test/mock.c | 54 +--
tools/testing/cxl/test/mock.h | 4 +-
38 files changed, 1435 insertions(+), 907 deletions(-)
create mode 100644 drivers/cxl/core/ras_rch.c
create mode 100644 drivers/pci/pcie/aer_cxl_rch.c
create mode 100644 drivers/pci/pcie/aer_cxl_vh.c
delete mode 100644 tools/testing/cxl/exports.h
base-commit: 0f61b1860cc3f52aef9036d7235ed1f017632193
--
2.34.1
* [PATCH v14 01/34] PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-22 18:58 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 02/34] PCI: Update CXL DVSEC definitions Terry Bowman
` (32 subsequent siblings)
33 siblings, 1 reply; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
The CXL DVSECs are currently defined in drivers/cxl/cxlpci.h. These are not
accessible to other subsystems. Move these to uapi/linux/pci_regs.h.
The CXL DVSEC definitions will be renamed and reformatted to fit better
with existing defines.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v13->v14:
- Add Jonathan's and Dan's Reviewed-by
- Update commit title prefix (Bjorn)
- Revert format fix for cxl_sbr_masked() (Jonathan)
- Update 'Compute Express Link' comment block (Jonathan)
- Move PCI_DVSEC_CXL_FLEXBUS definitions to later patch where
used (Jonathan)
- Removed stray change (Bjorn)
Changes in v12->v13:
- Add Dave Jiang's reviewed-by
- Remove changes to existing PCI_DVSEC_CXL_PORT* defines. Update commit
message. (Jonathan)
Changes in v11 -> v12:
- Change formatting to be same as existing definitions
- Change GENMASK() -> __GENMASK() and BIT() to _BITUL()
Changes in v10 -> v11:
- New commit
---
drivers/cxl/cxlpci.h | 53 -----------------------------
include/uapi/linux/pci_regs.h | 64 ++++++++++++++++++++++++++++++++---
2 files changed, 59 insertions(+), 58 deletions(-)
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 1d526bea8431..cdb7cf3dbcb4 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -7,59 +7,6 @@
#define CXL_MEMORY_PROGIF 0x10
-/*
- * See section 8.1 Configuration Space Registers in the CXL 2.0
- * Specification. Names are taken straight from the specification with "CXL" and
- * "DVSEC" redundancies removed. When obvious, abbreviations may be used.
- */
-#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
-
-/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
-#define CXL_DVSEC_PCIE_DEVICE 0
-#define CXL_DVSEC_CAP_OFFSET 0xA
-#define CXL_DVSEC_MEM_CAPABLE BIT(2)
-#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
-#define CXL_DVSEC_CTRL_OFFSET 0xC
-#define CXL_DVSEC_MEM_ENABLE BIT(2)
-#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
-#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
-#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
-#define CXL_DVSEC_MEM_ACTIVE BIT(1)
-#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
-#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
-#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
-#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
-
-#define CXL_DVSEC_RANGE_MAX 2
-
-/* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
-#define CXL_DVSEC_FUNCTION_MAP 2
-
-/* CXL 2.0 8.1.5: CXL 2.0 Extensions DVSEC for Ports */
-#define CXL_DVSEC_PORT_EXTENSIONS 3
-
-/* CXL 2.0 8.1.6: GPF DVSEC for CXL Port */
-#define CXL_DVSEC_PORT_GPF 4
-#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
-#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK GENMASK(3, 0)
-#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK GENMASK(11, 8)
-#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
-#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK GENMASK(3, 0)
-#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK GENMASK(11, 8)
-
-/* CXL 2.0 8.1.7: GPF DVSEC for CXL Device */
-#define CXL_DVSEC_DEVICE_GPF 5
-
-/* CXL 2.0 8.1.8: PCIe DVSEC for Flex Bus Port */
-#define CXL_DVSEC_PCIE_FLEXBUS_PORT 7
-
-/* CXL 2.0 8.1.9: Register Locator DVSEC */
-#define CXL_DVSEC_REG_LOCATOR 8
-#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
-#define CXL_DVSEC_REG_LOCATOR_BIR_MASK GENMASK(2, 0)
-#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
-#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
-
/*
* NOTE: Currently all the functions which are enabled for CXL require their
* vectors to be in the first 16. Use this as the default max.
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 3add74ae2594..6c4b6f19b18e 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1253,11 +1253,6 @@
#define PCI_DEV3_STA 0x0c /* Device 3 Status Register */
#define PCI_DEV3_STA_SEGMENT 0x8 /* Segment Captured (end-to-end flit-mode detected) */
-/* Compute Express Link (CXL r3.1, sec 8.1.5) */
-#define PCI_DVSEC_CXL_PORT 3
-#define PCI_DVSEC_CXL_PORT_CTL 0x0c
-#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
-
/* Integrity and Data Encryption Extended Capability */
#define PCI_IDE_CAP 0x04
#define PCI_IDE_CAP_LINK 0x1 /* Link IDE Stream Supported */
@@ -1338,4 +1333,63 @@
#define PCI_IDE_SEL_ADDR_3(x) (28 + (x) * PCI_IDE_SEL_ADDR_BLOCK_SIZE)
#define PCI_IDE_SEL_BLOCK_SIZE(nr_assoc) (20 + PCI_IDE_SEL_ADDR_BLOCK_SIZE * (nr_assoc))
+/* Compute Express Link (CXL r3.1, sec 8.1.5) */
+#define PCI_DVSEC_CXL_PORT 3
+#define PCI_DVSEC_CXL_PORT_CTL 0x0c
+#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
+
+/*
+ * Compute Express Link (CXL r3.2, sec 8.1)
+ *
+ * Note that CXL DVSEC id 3 and 7 to be ignored when the CXL link state
+ * is "disconnected" (CXL r3.2, sec 9.12.3). Re-enumerate these
+ * registers on downstream link-up events.
+ */
+#define PCI_DVSEC_HEADER1_LENGTH_MASK __GENMASK(31, 20)
+
+/* CXL 3.2 8.1.3: PCIe DVSEC for CXL Device */
+#define CXL_DVSEC_PCIE_DEVICE 0
+#define CXL_DVSEC_CAP_OFFSET 0xA
+#define CXL_DVSEC_MEM_CAPABLE _BITUL(2)
+#define CXL_DVSEC_HDM_COUNT_MASK __GENMASK(5, 4)
+#define CXL_DVSEC_CTRL_OFFSET 0xC
+#define CXL_DVSEC_MEM_ENABLE _BITUL(2)
+#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
+#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
+#define CXL_DVSEC_MEM_INFO_VALID _BITUL(0)
+#define CXL_DVSEC_MEM_ACTIVE _BITUL(1)
+#define CXL_DVSEC_MEM_SIZE_LOW_MASK __GENMASK(31, 28)
+#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
+#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
+#define CXL_DVSEC_MEM_BASE_LOW_MASK __GENMASK(31, 28)
+
+#define CXL_DVSEC_RANGE_MAX 2
+
+/* CXL 3.2 8.1.4: Non-CXL Function Map DVSEC */
+#define CXL_DVSEC_FUNCTION_MAP 2
+
+/* CXL 3.2 8.1.5: Extensions DVSEC for Ports */
+#define CXL_DVSEC_PORT 3
+#define CXL_DVSEC_PORT_CTL 0x0c
+#define CXL_DVSEC_PORT_CTL_UNMASK_SBR 0x00000001
+
+/* CXL 3.2 8.1.6: GPF DVSEC for CXL Port */
+#define CXL_DVSEC_PORT_GPF 4
+#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
+#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK __GENMASK(3, 0)
+#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK __GENMASK(11, 8)
+#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
+#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK __GENMASK(3, 0)
+#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK __GENMASK(11, 8)
+
+/* CXL 3.2 8.1.7: GPF DVSEC for CXL Device */
+#define CXL_DVSEC_DEVICE_GPF 5
+
+/* CXL 3.2 8.1.9: Register Locator DVSEC */
+#define CXL_DVSEC_REG_LOCATOR 8
+#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
+#define CXL_DVSEC_REG_LOCATOR_BIR_MASK __GENMASK(2, 0)
+#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK __GENMASK(15, 8)
+#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK __GENMASK(31, 16)
+
#endif /* LINUX_PCI_REGS_H */
--
2.34.1
* [PATCH v14 02/34] PCI: Update CXL DVSEC definitions
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2026-01-14 18:20 ` [PATCH v14 01/34] PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 18:53 ` Jonathan Cameron
2026-01-22 18:37 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 03/34] PCI: Introduce pcie_is_cxl() Terry Bowman
` (31 subsequent siblings)
33 siblings, 2 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
CXL DVSEC definitions were recently moved into uapi/linux/pci_regs.h, but
the newly added macros do not follow the file's existing naming conventions.
The current format uses CXL_DVSEC_XYZ, while the new CXL entries must
instead use the PCI_DVSEC_CXL_XYZ prefix to match the conventions already
established in pci_regs.h.
The new CXL DVSEC macros also introduce _MASK and _OFFSET suffixes, which
are not used anywhere else in the file. These suffixes lengthen the
identifiers and reduce readability. Remove _MASK and _OFFSET from the
recently added definitions.
Additionally, remove PCI_DVSEC_HEADER1_LENGTH_MASK, as it duplicates the
existing PCI_DVSEC_HEADER1_LEN() macro.
Update all existing references to use the new macro names.
Finally, update the inline documentation to reference the latest revision
of the CXL specification.
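As an illustration of how the renamed masks read at a call site, here is a
minimal userspace sketch. It is not part of the patch. The EX_BIT(),
EX_GENMASK() and ex_*() names are hypothetical stand-ins for the kernel's
_BITUL(), __GENMASK() and FIELD_GET(); the three PCI_DVSEC_CXL_* values are
local copies of the definitions in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's _BITUL()/__GENMASK() helpers. */
#define EX_BIT(n)        (UINT32_C(1) << (n))
#define EX_GENMASK(h, l) ((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Local copies of values from the renamed definitions in this patch. */
#define PCI_DVSEC_CXL_MEM_CAPABLE   EX_BIT(2)
#define PCI_DVSEC_CXL_HDM_COUNT     EX_GENMASK(5, 4)
#define PCI_DVSEC_CXL_MEM_SIZE_LOW  EX_GENMASK(31, 28)

/* Hypothetical helper: extract a masked field and shift it down to bit 0. */
static uint32_t ex_field_get(uint32_t mask, uint32_t val)
{
	assert(mask != 0);
	/* (mask & (~mask + 1)) isolates the lowest set bit of the mask. */
	return (val & mask) / (mask & (~mask + 1));
}

/* True when the capability word advertises CXL.mem support. */
static bool ex_mem_capable(uint16_t cap)
{
	return (cap & PCI_DVSEC_CXL_MEM_CAPABLE) != 0;
}

/* Number of HDM ranges reported in the capability word. */
static uint32_t ex_hdm_count(uint16_t cap)
{
	return ex_field_get(PCI_DVSEC_CXL_HDM_COUNT, cap);
}

/* Combine the high dword with the masked low dword into a 64-bit size. */
static uint64_t ex_range_size(uint32_t size_high, uint32_t size_low)
{
	return ((uint64_t)size_high << 32) |
	       (size_low & PCI_DVSEC_CXL_MEM_SIZE_LOW);
}
```

Without the _MASK suffix, a mask such as PCI_DVSEC_CXL_HDM_COUNT names the
field itself, which matches how other fields in pci_regs.h are named.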
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v13->v14:
- New patch. Split from previous patch such that there is now a separate
move patch and a format fix patch.
- Formatting update requested (Bjorn)
- Remove PCI_DVSEC_HEADER1_LENGTH_MASK because it duplicates
PCI_DVSEC_HEADER1_LEN() (Bjorn)
- Add Dan's Reviewed-by
---
drivers/cxl/core/pci.c | 58 ++++++++++-----------
drivers/cxl/core/regs.c | 14 +++---
drivers/cxl/pci.c | 2 +-
include/uapi/linux/pci_regs.h | 94 ++++++++++++++++-------------------
4 files changed, 81 insertions(+), 87 deletions(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 5b023a0178a4..077b386e0c8d 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -86,12 +86,12 @@ static int cxl_dvsec_mem_range_valid(struct cxl_dev_state *cxlds, int id)
i = 1;
do {
rc = pci_read_config_dword(pdev,
- d + CXL_DVSEC_RANGE_SIZE_LOW(id),
+ d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id),
&temp);
if (rc)
return rc;
- valid = FIELD_GET(CXL_DVSEC_MEM_INFO_VALID, temp);
+ valid = FIELD_GET(PCI_DVSEC_CXL_MEM_INFO_VALID, temp);
if (valid)
break;
msleep(1000);
@@ -121,11 +121,11 @@ static int cxl_dvsec_mem_range_active(struct cxl_dev_state *cxlds, int id)
/* Check MEM ACTIVE bit, up to 60s timeout by default */
for (i = media_ready_timeout; i; i--) {
rc = pci_read_config_dword(
- pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(id), &temp);
+ pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id), &temp);
if (rc)
return rc;
- active = FIELD_GET(CXL_DVSEC_MEM_ACTIVE, temp);
+ active = FIELD_GET(PCI_DVSEC_CXL_MEM_ACTIVE, temp);
if (active)
break;
msleep(1000);
@@ -154,11 +154,11 @@ int cxl_await_media_ready(struct cxl_dev_state *cxlds)
u16 cap;
rc = pci_read_config_word(pdev,
- d + CXL_DVSEC_CAP_OFFSET, &cap);
+ d + PCI_DVSEC_CXL_CAP, &cap);
if (rc)
return rc;
- hdm_count = FIELD_GET(CXL_DVSEC_HDM_COUNT_MASK, cap);
+ hdm_count = FIELD_GET(PCI_DVSEC_CXL_HDM_COUNT, cap);
for (i = 0; i < hdm_count; i++) {
rc = cxl_dvsec_mem_range_valid(cxlds, i);
if (rc)
@@ -186,16 +186,16 @@ static int cxl_set_mem_enable(struct cxl_dev_state *cxlds, u16 val)
u16 ctrl;
int rc;
- rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
+ rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CTRL, &ctrl);
if (rc < 0)
return rc;
- if ((ctrl & CXL_DVSEC_MEM_ENABLE) == val)
+ if ((ctrl & PCI_DVSEC_CXL_MEM_ENABLE) == val)
return 1;
- ctrl &= ~CXL_DVSEC_MEM_ENABLE;
+ ctrl &= ~PCI_DVSEC_CXL_MEM_ENABLE;
ctrl |= val;
- rc = pci_write_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, ctrl);
+ rc = pci_write_config_word(pdev, d + PCI_DVSEC_CXL_CTRL, ctrl);
if (rc < 0)
return rc;
@@ -211,7 +211,7 @@ static int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds)
{
int rc;
- rc = cxl_set_mem_enable(cxlds, CXL_DVSEC_MEM_ENABLE);
+ rc = cxl_set_mem_enable(cxlds, PCI_DVSEC_CXL_MEM_ENABLE);
if (rc < 0)
return rc;
if (rc > 0)
@@ -273,11 +273,11 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
return -ENXIO;
}
- rc = pci_read_config_word(pdev, d + CXL_DVSEC_CAP_OFFSET, &cap);
+ rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CAP, &cap);
if (rc)
return rc;
- if (!(cap & CXL_DVSEC_MEM_CAPABLE)) {
+ if (!(cap & PCI_DVSEC_CXL_MEM_CAPABLE)) {
dev_dbg(dev, "Not MEM Capable\n");
return -ENXIO;
}
@@ -288,7 +288,7 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
* driver is for a spec defined class code which must be CXL.mem
* capable, there is no point in continuing to enable CXL.mem.
*/
- hdm_count = FIELD_GET(CXL_DVSEC_HDM_COUNT_MASK, cap);
+ hdm_count = FIELD_GET(PCI_DVSEC_CXL_HDM_COUNT, cap);
if (!hdm_count || hdm_count > 2)
return -EINVAL;
@@ -297,11 +297,11 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
* disabled, and they will remain moot after the HDM Decoder
* capability is enabled.
*/
- rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
+ rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CTRL, &ctrl);
if (rc)
return rc;
- info->mem_enabled = FIELD_GET(CXL_DVSEC_MEM_ENABLE, ctrl);
+ info->mem_enabled = FIELD_GET(PCI_DVSEC_CXL_MEM_ENABLE, ctrl);
if (!info->mem_enabled)
return 0;
@@ -314,35 +314,35 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
return rc;
rc = pci_read_config_dword(
- pdev, d + CXL_DVSEC_RANGE_SIZE_HIGH(i), &temp);
+ pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i), &temp);
if (rc)
return rc;
size = (u64)temp << 32;
rc = pci_read_config_dword(
- pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(i), &temp);
+ pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(i), &temp);
if (rc)
return rc;
- size |= temp & CXL_DVSEC_MEM_SIZE_LOW_MASK;
+ size |= temp & PCI_DVSEC_CXL_MEM_SIZE_LOW;
if (!size) {
continue;
}
rc = pci_read_config_dword(
- pdev, d + CXL_DVSEC_RANGE_BASE_HIGH(i), &temp);
+ pdev, d + PCI_DVSEC_CXL_RANGE_BASE_HIGH(i), &temp);
if (rc)
return rc;
base = (u64)temp << 32;
rc = pci_read_config_dword(
- pdev, d + CXL_DVSEC_RANGE_BASE_LOW(i), &temp);
+ pdev, d + PCI_DVSEC_CXL_RANGE_BASE_LOW(i), &temp);
if (rc)
return rc;
- base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;
+ base |= temp & PCI_DVSEC_CXL_MEM_BASE_LOW;
info->dvsec_range[ranges++] = (struct range) {
.start = base,
@@ -1068,7 +1068,7 @@ u16 cxl_gpf_get_dvsec(struct device *dev)
is_port = false;
dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
- is_port ? CXL_DVSEC_PORT_GPF : CXL_DVSEC_DEVICE_GPF);
+ is_port ? PCI_DVSEC_CXL_PORT_GPF : PCI_DVSEC_CXL_DEVICE_GPF);
if (!dvsec)
dev_warn(dev, "%s GPF DVSEC not present\n",
is_port ? "Port" : "Device");
@@ -1084,14 +1084,14 @@ static int update_gpf_port_dvsec(struct pci_dev *pdev, int dvsec, int phase)
switch (phase) {
case 1:
- offset = CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET;
- base = CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK;
- scale = CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK;
+ offset = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_CONTROL;
+ base = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_BASE;
+ scale = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_SCALE;
break;
case 2:
- offset = CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET;
- base = CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK;
- scale = CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK;
+ offset = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_CONTROL;
+ base = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_BASE;
+ scale = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_SCALE;
break;
default:
return -EINVAL;
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 5ca7b0eed568..a010b3214342 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -271,10 +271,10 @@ EXPORT_SYMBOL_NS_GPL(cxl_map_device_regs, "CXL");
static bool cxl_decode_regblock(struct pci_dev *pdev, u32 reg_lo, u32 reg_hi,
struct cxl_register_map *map)
{
- u8 reg_type = FIELD_GET(CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK, reg_lo);
- int bar = FIELD_GET(CXL_DVSEC_REG_LOCATOR_BIR_MASK, reg_lo);
+ u8 reg_type = FIELD_GET(PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_ID, reg_lo);
+ int bar = FIELD_GET(PCI_DVSEC_CXL_REG_LOCATOR_BIR, reg_lo);
u64 offset = ((u64)reg_hi << 32) |
- (reg_lo & CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK);
+ (reg_lo & PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_OFF_LOW);
if (offset > pci_resource_len(pdev, bar)) {
dev_warn(&pdev->dev,
@@ -311,15 +311,15 @@ static int __cxl_find_regblock_instance(struct pci_dev *pdev, enum cxl_regloc_ty
};
regloc = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
- CXL_DVSEC_REG_LOCATOR);
+ PCI_DVSEC_CXL_REG_LOCATOR);
if (!regloc)
return -ENXIO;
pci_read_config_dword(pdev, regloc + PCI_DVSEC_HEADER1, &regloc_size);
- regloc_size = FIELD_GET(PCI_DVSEC_HEADER1_LENGTH_MASK, regloc_size);
+ regloc_size = PCI_DVSEC_HEADER1_LEN(regloc_size);
- regloc += CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET;
- regblocks = (regloc_size - CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET) / 8;
+ regloc += PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1;
+ regblocks = (regloc_size - PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1) / 8;
for (i = 0; i < regblocks; i++, regloc += 8) {
u32 reg_lo, reg_hi;
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 0be4e508affe..b7f694bda913 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -933,7 +933,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
cxlds->rcd = is_cxl_restricted(pdev);
cxlds->serial = pci_get_dsn(pdev);
cxlds->cxl_dvsec = pci_find_dvsec_capability(
- pdev, PCI_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
+ pdev, PCI_VENDOR_ID_CXL, PCI_DVSEC_CXL_DEVICE);
if (!cxlds->cxl_dvsec)
dev_warn(&pdev->dev,
"Device DVSEC not present, skip CXL.mem init\n");
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 6c4b6f19b18e..662582bdccf0 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1333,63 +1333,57 @@
#define PCI_IDE_SEL_ADDR_3(x) (28 + (x) * PCI_IDE_SEL_ADDR_BLOCK_SIZE)
#define PCI_IDE_SEL_BLOCK_SIZE(nr_assoc) (20 + PCI_IDE_SEL_ADDR_BLOCK_SIZE * (nr_assoc))
-/* Compute Express Link (CXL r3.1, sec 8.1.5) */
-#define PCI_DVSEC_CXL_PORT 3
-#define PCI_DVSEC_CXL_PORT_CTL 0x0c
-#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
-
/*
- * Compute Express Link (CXL r3.2, sec 8.1)
+ * Compute Express Link (CXL r4.0, sec 8.1)
*
* Note that CXL DVSEC id 3 and 7 to be ignored when the CXL link state
- * is "disconnected" (CXL r3.2, sec 9.12.3). Re-enumerate these
+ * is "disconnected" (CXL r4.0, sec 9.12.3). Re-enumerate these
* registers on downstream link-up events.
*/
-#define PCI_DVSEC_HEADER1_LENGTH_MASK __GENMASK(31, 20)
-
-/* CXL 3.2 8.1.3: PCIe DVSEC for CXL Device */
-#define CXL_DVSEC_PCIE_DEVICE 0
-#define CXL_DVSEC_CAP_OFFSET 0xA
-#define CXL_DVSEC_MEM_CAPABLE _BITUL(2)
-#define CXL_DVSEC_HDM_COUNT_MASK __GENMASK(5, 4)
-#define CXL_DVSEC_CTRL_OFFSET 0xC
-#define CXL_DVSEC_MEM_ENABLE _BITUL(2)
-#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
-#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
-#define CXL_DVSEC_MEM_INFO_VALID _BITUL(0)
-#define CXL_DVSEC_MEM_ACTIVE _BITUL(1)
-#define CXL_DVSEC_MEM_SIZE_LOW_MASK __GENMASK(31, 28)
-#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
-#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
-#define CXL_DVSEC_MEM_BASE_LOW_MASK __GENMASK(31, 28)
+
+/* CXL r4.0, 8.1.3: PCIe DVSEC for CXL Device */
+#define PCI_DVSEC_CXL_DEVICE 0
+#define PCI_DVSEC_CXL_CAP 0xA
+#define PCI_DVSEC_CXL_MEM_CAPABLE _BITUL(2)
+#define PCI_DVSEC_CXL_HDM_COUNT __GENMASK(5, 4)
+#define PCI_DVSEC_CXL_CTRL 0xC
+#define PCI_DVSEC_CXL_MEM_ENABLE _BITUL(2)
+#define PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
+#define PCI_DVSEC_CXL_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
+#define PCI_DVSEC_CXL_MEM_INFO_VALID _BITUL(0)
+#define PCI_DVSEC_CXL_MEM_ACTIVE _BITUL(1)
+#define PCI_DVSEC_CXL_MEM_SIZE_LOW __GENMASK(31, 28)
+#define PCI_DVSEC_CXL_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
+#define PCI_DVSEC_CXL_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
+#define PCI_DVSEC_CXL_MEM_BASE_LOW __GENMASK(31, 28)
#define CXL_DVSEC_RANGE_MAX 2
-/* CXL 3.2 8.1.4: Non-CXL Function Map DVSEC */
-#define CXL_DVSEC_FUNCTION_MAP 2
-
-/* CXL 3.2 8.1.5: Extensions DVSEC for Ports */
-#define CXL_DVSEC_PORT 3
-#define CXL_DVSEC_PORT_CTL 0x0c
-#define CXL_DVSEC_PORT_CTL_UNMASK_SBR 0x00000001
-
-/* CXL 3.2 8.1.6: GPF DVSEC for CXL Port */
-#define CXL_DVSEC_PORT_GPF 4
-#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
-#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK __GENMASK(3, 0)
-#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK __GENMASK(11, 8)
-#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
-#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK __GENMASK(3, 0)
-#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK __GENMASK(11, 8)
-
-/* CXL 3.2 8.1.7: GPF DVSEC for CXL Device */
-#define CXL_DVSEC_DEVICE_GPF 5
-
-/* CXL 3.2 8.1.9: Register Locator DVSEC */
-#define CXL_DVSEC_REG_LOCATOR 8
-#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
-#define CXL_DVSEC_REG_LOCATOR_BIR_MASK __GENMASK(2, 0)
-#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK __GENMASK(15, 8)
-#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK __GENMASK(31, 16)
+/* CXL r4.0, 8.1.4: Non-CXL Function Map DVSEC */
+#define PCI_DVSEC_CXL_FUNCTION_MAP 2
+
+/* CXL r4.0, 8.1.5: Extensions DVSEC for Ports */
+#define PCI_DVSEC_CXL_PORT 3
+#define PCI_DVSEC_CXL_PORT_CTL 0x0c
+#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
+
+/* CXL r4.0, 8.1.6: GPF DVSEC for CXL Port */
+#define PCI_DVSEC_CXL_PORT_GPF 4
+#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_CONTROL 0x0C
+#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_BASE __GENMASK(3, 0)
+#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_SCALE __GENMASK(11, 8)
+#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_CONTROL 0xE
+#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_BASE __GENMASK(3, 0)
+#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_SCALE __GENMASK(11, 8)
+
+/* CXL r4.0, 8.1.7: GPF DVSEC for CXL Device */
+#define PCI_DVSEC_CXL_DEVICE_GPF 5
+
+/* CXL r4.0, 8.1.9: Register Locator DVSEC */
+#define PCI_DVSEC_CXL_REG_LOCATOR 8
+#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1 0xC
+#define PCI_DVSEC_CXL_REG_LOCATOR_BIR __GENMASK(2, 0)
+#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_ID __GENMASK(15, 8)
+#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_OFF_LOW __GENMASK(31, 16)
#endif /* LINUX_PCI_REGS_H */
--
2.34.1
* [PATCH v14 03/34] PCI: Introduce pcie_is_cxl()
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2026-01-14 18:20 ` [PATCH v14 01/34] PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h Terry Bowman
2026-01-14 18:20 ` [PATCH v14 02/34] PCI: Update CXL DVSEC definitions Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-21 1:19 ` dan.j.williams
2026-01-22 18:39 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 04/34] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions Terry Bowman
` (30 subsequent siblings)
33 siblings, 2 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
CXL and AER drivers need the ability to identify CXL devices.
Introduce set_pcie_cxl() with logic checking for CXL.mem or CXL.cache
status in the CXL Flex Bus DVSEC status register. The CXL Flex Bus DVSEC
presence is used because it is required for all the CXL PCIe devices.[1]
Add boolean 'struct pci_dev::is_cxl' to cache the CXL.cache and
CXL.mem status.
Call set_pcie_cxl() for the parent bridge as well. By the time a device is
created, the parent's alternate protocol training or CXL state may have
changed, so this ensures the cached parent CXL state is current.
Add function pcie_is_cxl() to return 'struct pci_dev::is_cxl'.
[1] CXL 3.1 Spec, 8.1.1 PCIe Designated Vendor-Specific Extended
Capability (DVSEC) ID Assignment, Table 8-2
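As an illustration only, the test that set_pcie_cxl() applies to the Flex
Bus DVSEC status word can be sketched in plain userspace C. The demo_ names
are hypothetical and not part of this patch; the bit positions mirror
PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_CACHE (bit 0) and
PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_MEM (bit 2) as added to pci_regs.h below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirrors of the status bits added by this patch. */
#define DEMO_FLEXBUS_STATUS_CACHE (UINT16_C(1) << 0)
#define DEMO_FLEXBUS_STATUS_MEM   (UINT16_C(1) << 2)

/*
 * Return true when the 16-bit Flex Bus Port status word reports that
 * CXL.cache or CXL.mem is enabled. This is the condition the patch
 * caches in 'struct pci_dev::is_cxl'.
 */
static bool demo_flexbus_status_is_cxl(uint16_t status)
{
	return (status & DEMO_FLEXBUS_STATUS_CACHE) ||
	       (status & DEMO_FLEXBUS_STATUS_MEM);
}
```

With this sketch, a status of 0x0004 (only the CXL.mem bit set) yields true,
while 0x0000 or a word with only bit 1 set yields false, since the patch
tests bits 0 and 2 only.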
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v13->v14:
- Move FLEXBUS_STATUS DVSEC here (Jonathan)
- Remove check for EP and USP (Dan)
- Update commit message (Bjorn)
- Fix writing past 80 columns (Bjorn)
- Add pci_is_pcie() parent bridge check at beginning of function (Bjorn)
Changes in v12->v13:
- Add Ben's "reviewed-by"
Changes in v11->v12:
- Add review-by for Alejandro
- Add comment in set_pcie_cxl() explaining why updating parent status.
Changes in v10->v11:
- Amend set_pcie_cxl() to check for Upstream Port's and EP's parent
downstream port by calling set_pcie_cxl(). (Dan)
- Retitle patch: 'Add' -> 'Introduce'
- Add check for CXL.mem and CXL.cache (Alejandro, Dan)
---
drivers/pci/probe.c | 31 +++++++++++++++++++++++++++++++
include/linux/pci.h | 6 ++++++
include/uapi/linux/pci_regs.h | 6 ++++++
3 files changed, 43 insertions(+)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 41183aed8f5d..bd7ce41d0c7a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1735,6 +1735,35 @@ static void set_pcie_thunderbolt(struct pci_dev *dev)
dev->is_thunderbolt = 1;
}
+static void set_pcie_cxl(struct pci_dev *dev)
+{
+ struct pci_dev *bridge;
+ u16 dvsec, cap;
+
+ if (!pci_is_pcie(dev))
+ return;
+
+ /*
+ * Update parent's CXL state because alternate protocol training
+ * may have changed
+ */
+ bridge = pci_upstream_bridge(dev);
+ if (bridge)
+ set_pcie_cxl(bridge);
+
+ dvsec = pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
+ PCI_DVSEC_CXL_FLEXBUS_PORT);
+ if (!dvsec)
+ return;
+
+ pci_read_config_word(dev, dvsec + PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS,
+ &cap);
+
+ dev->is_cxl = FIELD_GET(PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_CACHE, cap) ||
+ FIELD_GET(PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_MEM, cap);
+
+}
+
static void set_pcie_untrusted(struct pci_dev *dev)
{
struct pci_dev *parent = pci_upstream_bridge(dev);
@@ -2065,6 +2094,8 @@ int pci_setup_device(struct pci_dev *dev)
/* Need to have dev->cfg_size ready */
set_pcie_thunderbolt(dev);
+ set_pcie_cxl(dev);
+
set_pcie_untrusted(dev);
if (pci_is_pcie(dev))
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 864775651c6f..f8e8b3df794d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -463,6 +463,7 @@ struct pci_dev {
unsigned int is_pciehp:1;
unsigned int shpc_managed:1; /* SHPC owned by shpchp */
unsigned int is_thunderbolt:1; /* Thunderbolt controller */
+ unsigned int is_cxl:1; /* Compute Express Link (CXL) */
/*
* Devices marked being untrusted are the ones that can potentially
* execute DMA attacks and similar. They are typically connected
@@ -791,6 +792,11 @@ static inline bool pci_is_display(struct pci_dev *pdev)
return (pdev->class >> 16) == PCI_BASE_CLASS_DISPLAY;
}
+static inline bool pcie_is_cxl(struct pci_dev *pci_dev)
+{
+ return pci_dev->is_cxl;
+}
+
#define for_each_pci_bridge(dev, bus) \
list_for_each_entry(dev, &bus->devices, bus_list) \
if (!pci_is_bridge(dev)) {} else
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 662582bdccf0..b6622fd60fd9 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1379,6 +1379,12 @@
/* CXL r4.0, 8.1.7: GPF DVSEC for CXL Device */
#define PCI_DVSEC_CXL_DEVICE_GPF 5
+/* CXL r4.0, 8.1.8: Flex Bus DVSEC */
+#define PCI_DVSEC_CXL_FLEXBUS_PORT 7
+#define PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS 0xE
+#define PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_CACHE _BITUL(0)
+#define PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_MEM _BITUL(2)
+
/* CXL r4.0, 8.1.9: Register Locator DVSEC */
#define PCI_DVSEC_CXL_REG_LOCATOR 8
#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1 0xC
--
2.34.1
* [PATCH v14 04/34] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (2 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 03/34] PCI: Introduce pcie_is_cxl() Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 18:20 ` [PATCH v14 05/34] cxl/pci: Remove unnecessary CXL RCH " Terry Bowman
` (29 subsequent siblings)
33 siblings, 0 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
The CXL driver's cxl_handle_endpoint_cor_ras()/cxl_handle_endpoint_ras()
are unnecessary helper functions used only for Endpoints. Remove these
functions as they are not common for all CXL devices and do not provide
value for EP handling.
Rename __cxl_handle_ras() to cxl_handle_ras() and __cxl_handle_cor_ras()
to cxl_handle_cor_ras().
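For readers unfamiliar with the renamed uncorrectable handler, its
first-error selection can be sketched in userspace C as follows. This is a
simplified model, not the driver code: the demo_ names are hypothetical, and
the 6-bit width of the first-error pointer is an assumption standing in for
CXL_RAS_CAP_CONTROL_FE_MASK. The kernel uses hweight32(), BIT() and
FIELD_GET() for the same steps.

```c
#include <stdint.h>

/* Assumed width of the first-error pointer field (bits 5:0). */
#define DEMO_RAS_CAP_CONTROL_FE_MASK UINT32_C(0x3f)

/* Count set bits; a portable stand-in for the kernel's hweight32(). */
static unsigned int demo_hweight32(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Pick the error the header log refers to. With a single status bit set,
 * that bit is the first error. With several set, the first-error pointer
 * in the capability control register names the bit position.
 */
static uint32_t demo_first_error(uint32_t status, uint32_t cap_control)
{
	if (demo_hweight32(status) > 1) {
		uint32_t idx = cap_control & DEMO_RAS_CAP_CONTROL_FE_MASK;

		/* Shift in 64 bits so an index above 31 cannot overflow. */
		return (uint32_t)(UINT64_C(1) << idx);
	}
	return status;
}
```

For example, a status of 0x5 with a first-error pointer of 2 selects 0x4,
whereas a status of 0x4 alone is returned unchanged regardless of the
control register.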
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v13->v14:
- None
Changes in v12->v13:
- None
Changes in v11->v12:
- Added Dave Jiang's review by
- Moved to front of series
Changes in v10->v11:
- None
---
drivers/cxl/core/pci.c | 26 ++++++++------------------
1 file changed, 8 insertions(+), 18 deletions(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 077b386e0c8d..3ec7407f0c5d 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -632,8 +632,8 @@ void read_cdat_data(struct cxl_port *port)
}
EXPORT_SYMBOL_NS_GPL(read_cdat_data, "CXL");
-static void __cxl_handle_cor_ras(struct cxl_dev_state *cxlds,
- void __iomem *ras_base)
+static void cxl_handle_cor_ras(struct cxl_dev_state *cxlds,
+ void __iomem *ras_base)
{
void __iomem *addr;
u32 status;
@@ -649,11 +649,6 @@ static void __cxl_handle_cor_ras(struct cxl_dev_state *cxlds,
}
}
-static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds)
-{
- return __cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
-}
-
/* CXL spec rev3.0 8.2.4.16.1 */
static void header_log_copy(void __iomem *ras_base, u32 *log)
{
@@ -675,8 +670,8 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
* Log the state of the RAS status registers and prepare them to log the
* next error status. Return 1 if reset needed.
*/
-static bool __cxl_handle_ras(struct cxl_dev_state *cxlds,
- void __iomem *ras_base)
+static bool cxl_handle_ras(struct cxl_dev_state *cxlds,
+ void __iomem *ras_base)
{
u32 hl[CXL_HEADERLOG_SIZE_U32];
void __iomem *addr;
@@ -709,11 +704,6 @@ static bool __cxl_handle_ras(struct cxl_dev_state *cxlds,
return true;
}
-static bool cxl_handle_endpoint_ras(struct cxl_dev_state *cxlds)
-{
- return __cxl_handle_ras(cxlds, cxlds->regs.ras);
-}
-
#ifdef CONFIG_PCIEAER_CXL
static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
@@ -792,13 +782,13 @@ EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
static void cxl_handle_rdport_cor_ras(struct cxl_dev_state *cxlds,
struct cxl_dport *dport)
{
- return __cxl_handle_cor_ras(cxlds, dport->regs.ras);
+ return cxl_handle_cor_ras(cxlds, dport->regs.ras);
}
static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds,
struct cxl_dport *dport)
{
- return __cxl_handle_ras(cxlds, dport->regs.ras);
+ return cxl_handle_ras(cxlds, dport->regs.ras);
}
/*
@@ -895,7 +885,7 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
if (cxlds->rcd)
cxl_handle_rdport_errors(cxlds);
- cxl_handle_endpoint_cor_ras(cxlds);
+ cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
}
}
EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
@@ -924,7 +914,7 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
* chance the situation is recoverable dump the status of the RAS
* capability registers and bounce the active state of the memdev.
*/
- ue = cxl_handle_endpoint_ras(cxlds);
+ ue = cxl_handle_ras(cxlds, cxlds->regs.ras);
}
--
2.34.1
* [PATCH v14 05/34] cxl/pci: Remove unnecessary CXL RCH handling helper functions
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (3 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 04/34] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 18:20 ` [PATCH v14 06/34] PCI: Replace cxl_error_is_native() with pcie_aer_is_native() Terry Bowman
` (28 subsequent siblings)
33 siblings, 0 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
cxl_handle_rdport_cor_ras() and cxl_handle_rdport_ras() are specific
to Restricted CXL Host (RCH) handling. Improve readability and
maintainability by replacing these and instead using the common
cxl_handle_cor_ras() and cxl_handle_ras() functions.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v13->v14:
- None
Changes in v12->v13:
- None
Changes in v11->v12:
- Add reviewed-by for Alejandro & Dave Jiang
- Moved to front of series
Changes in v10->v11:
- New patch
---
drivers/cxl/core/pci.c | 16 ++--------------
1 file changed, 2 insertions(+), 14 deletions(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 3ec7407f0c5d..51bb0f372e40 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -779,18 +779,6 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
}
EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
-static void cxl_handle_rdport_cor_ras(struct cxl_dev_state *cxlds,
- struct cxl_dport *dport)
-{
- return cxl_handle_cor_ras(cxlds, dport->regs.ras);
-}
-
-static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds,
- struct cxl_dport *dport)
-{
- return cxl_handle_ras(cxlds, dport->regs.ras);
-}
-
/*
* Copy the AER capability registers using 32 bit read accesses.
* This is necessary because RCRB AER capability is MMIO mapped. Clear the
@@ -860,9 +848,9 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
pci_print_aer(pdev, severity, &aer_regs);
if (severity == AER_CORRECTABLE)
- cxl_handle_rdport_cor_ras(cxlds, dport);
+ cxl_handle_cor_ras(cxlds, dport->regs.ras);
else
- cxl_handle_rdport_ras(cxlds, dport);
+ cxl_handle_ras(cxlds, dport->regs.ras);
}
#else
--
2.34.1
* [PATCH v14 06/34] PCI: Replace cxl_error_is_native() with pcie_aer_is_native()
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (4 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 05/34] cxl/pci: Remove unnecessary CXL RCH " Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 18:55 ` Jonathan Cameron
` (2 more replies)
2026-01-14 18:20 ` [PATCH v14 07/34] cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c Terry Bowman
` (27 subsequent siblings)
33 siblings, 3 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
The AER driver includes a CXL support function, cxl_error_is_native(). This
function adds no value over pcie_aer_is_native().
Simplify the codebase by removing cxl_error_is_native() and replacing
occurrences of cxl_error_is_native() with pcie_aer_is_native().
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v13->v14:
- New commit (Dan)
---
drivers/pci/pcie/aer.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e0bcaa896803..c99ba2a1159c 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1166,13 +1166,6 @@ static bool is_cxl_mem_dev(struct pci_dev *dev)
return true;
}
-static bool cxl_error_is_native(struct pci_dev *dev)
-{
- struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
-
- return (pcie_ports_native || host->native_aer);
-}
-
static bool is_internal_error(struct aer_err_info *info)
{
if (info->severity == AER_CORRECTABLE)
@@ -1186,7 +1179,7 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
struct aer_err_info *info = (struct aer_err_info *)data;
const struct pci_error_handlers *err_handler;
- if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
+ if (!is_cxl_mem_dev(dev) || !pcie_aer_is_native(dev))
return 0;
/* Protect dev->driver */
@@ -1227,7 +1220,7 @@ static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
bool *handles_cxl = data;
if (!*handles_cxl)
- *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev);
+ *handles_cxl = is_cxl_mem_dev(dev) && pcie_aer_is_native(dev);
/* Non-zero terminates iteration */
return *handles_cxl;
--
2.34.1
* [PATCH v14 07/34] cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (5 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 06/34] PCI: Replace cxl_error_is_native() with pcie_aer_is_native() Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 20:51 ` Dave Jiang
2026-01-14 18:20 ` [PATCH v14 08/34] cxl/pci: Move CXL driver's RCH error handling into core/ras_rch.c Terry Bowman
` (26 subsequent siblings)
33 siblings, 1 reply; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
From: Dave Jiang <dave.jiang@intel.com>
Create new config CONFIG_CXL_RAS and put all CXL RAS items behind the
config. The config will depend on CPER and PCIE AER to build. Move the
related VH RAS code from core/pci.c to core/ras.c.
Restricted CXL host (RCH) RAS functions will be moved in a future patch.
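The header pattern this patch relies on, real prototypes when the config
is enabled and static inline stubs otherwise, can be sketched in a single
userspace file. DEMO_CXL_RAS and the demo_ functions are hypothetical
stand-ins for CONFIG_CXL_RAS and the driver's handlers; in the kernel the
enabled implementation lives in a separate object (core/ras.c) rather than
inline as shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * DEMO_CXL_RAS is a hypothetical stand-in for CONFIG_CXL_RAS. Build with
 * -DDEMO_CXL_RAS to select the handler instead of the stub.
 */
#ifdef DEMO_CXL_RAS
/* Enabled: report an uncorrectable error when any status bit is set. */
static bool demo_handle_ras(uint32_t status)
{
	return status != 0;
}
#else
/* Disabled: the stub keeps callers compiling and reports no error. */
static inline bool demo_handle_ras(uint32_t status)
{
	(void)status;
	return false;
}
#endif

/* The caller is identical in both configurations and needs no #ifdef. */
static int demo_error_detected(uint32_t status)
{
	return demo_handle_ras(status) ? 1 : 0;
}
```

Because the stub has the same signature as the real function, callers such
as demo_error_detected() compile unchanged either way, which is what lets
the #ifdef blocks be confined to the headers.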
Cc: Robert Richter <rrichter@amd.com>
Cc: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v13->v14:
- None
Changes in v12->v13:
- None
Changes in v11->v12:
- None
Changes in v10->v11:
- New patch
- Updated by Terry Bowman to use (ACPI_APEI_GHES && PCIEAER_CXL) dependency
in Kconfig. Otherwise checks will be required for CONFIG_PCIEAER because
AER driver functions are called.
---
drivers/cxl/Kconfig | 4 +
drivers/cxl/core/Makefile | 2 +-
drivers/cxl/core/core.h | 31 +++++++
drivers/cxl/core/pci.c | 189 +-------------------------------------
drivers/cxl/core/ras.c | 176 +++++++++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 8 --
drivers/cxl/cxlpci.h | 16 ++++
tools/testing/cxl/Kbuild | 2 +-
8 files changed, 233 insertions(+), 195 deletions(-)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 48b7314afdb8..217888992c88 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -233,4 +233,8 @@ config CXL_MCE
def_bool y
depends on X86_MCE && MEMORY_FAILURE
+config CXL_RAS
+ def_bool y
+ depends on ACPI_APEI_GHES && PCIEAER && CXL_PCI
+
endif
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 5ad8fef210b5..b2930cc54f8b 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -14,9 +14,9 @@ cxl_core-y += pci.o
cxl_core-y += hdm.o
cxl_core-y += pmu.o
cxl_core-y += cdat.o
-cxl_core-y += ras.o
cxl_core-$(CONFIG_TRACING) += trace.o
cxl_core-$(CONFIG_CXL_REGION) += region.o
cxl_core-$(CONFIG_CXL_MCE) += mce.o
cxl_core-$(CONFIG_CXL_FEATURES) += features.o
cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
+cxl_core-$(CONFIG_CXL_RAS) += ras.o
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 1fb66132b777..bc818de87ccc 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -144,8 +144,39 @@ int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c);
int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
struct access_coordinate *c);
+#ifdef CONFIG_CXL_RAS
int cxl_ras_init(void);
void cxl_ras_exit(void);
+bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
+void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
+#else
+static inline int cxl_ras_init(void)
+{
+ return 0;
+}
+
+static inline void cxl_ras_exit(void)
+{
+}
+
+static inline bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
+{
+ return false;
+}
+static inline void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base) { }
+#endif /* CONFIG_CXL_RAS */
+
+/* Restricted CXL Host specific RAS functions */
+#ifdef CONFIG_CXL_RAS
+void cxl_dport_map_rch_aer(struct cxl_dport *dport);
+void cxl_disable_rch_root_ints(struct cxl_dport *dport);
+void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);
+#else
+static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
+static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
+static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
+#endif /* CONFIG_CXL_RAS */
+
int cxl_gpf_port_setup(struct cxl_dport *dport);
struct cxl_hdm;
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 51bb0f372e40..e132fff80979 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -632,81 +632,8 @@ void read_cdat_data(struct cxl_port *port)
}
EXPORT_SYMBOL_NS_GPL(read_cdat_data, "CXL");
-static void cxl_handle_cor_ras(struct cxl_dev_state *cxlds,
- void __iomem *ras_base)
-{
- void __iomem *addr;
- u32 status;
-
- if (!ras_base)
- return;
-
- addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
- status = readl(addr);
- if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
- writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
- trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
- }
-}
-
-/* CXL spec rev3.0 8.2.4.16.1 */
-static void header_log_copy(void __iomem *ras_base, u32 *log)
-{
- void __iomem *addr;
- u32 *log_addr;
- int i, log_u32_size = CXL_HEADERLOG_SIZE / sizeof(u32);
-
- addr = ras_base + CXL_RAS_HEADER_LOG_OFFSET;
- log_addr = log;
-
- for (i = 0; i < log_u32_size; i++) {
- *log_addr = readl(addr);
- log_addr++;
- addr += sizeof(u32);
- }
-}
-
-/*
- * Log the state of the RAS status registers and prepare them to log the
- * next error status. Return 1 if reset needed.
- */
-static bool cxl_handle_ras(struct cxl_dev_state *cxlds,
- void __iomem *ras_base)
-{
- u32 hl[CXL_HEADERLOG_SIZE_U32];
- void __iomem *addr;
- u32 status;
- u32 fe;
-
- if (!ras_base)
- return false;
-
- addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
- status = readl(addr);
- if (!(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK))
- return false;
-
- /* If multiple errors, log header points to first error from ctrl reg */
- if (hweight32(status) > 1) {
- void __iomem *rcc_addr =
- ras_base + CXL_RAS_CAP_CONTROL_OFFSET;
-
- fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
- readl(rcc_addr)));
- } else {
- fe = status;
- }
-
- header_log_copy(ras_base, hl);
- trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe, hl);
- writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
-
- return true;
-}
-
-#ifdef CONFIG_PCIEAER_CXL
-
-static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
+#ifdef CONFIG_CXL_RAS
+void cxl_dport_map_rch_aer(struct cxl_dport *dport)
{
resource_size_t aer_phys;
struct device *host;
@@ -721,19 +648,7 @@ static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
}
}
-static void cxl_dport_map_ras(struct cxl_dport *dport)
-{
- struct cxl_register_map *map = &dport->reg_map;
- struct device *dev = dport->dport_dev;
-
- if (!map->component_map.ras.valid)
- dev_dbg(dev, "RAS registers not found\n");
- else if (cxl_map_component_regs(map, &dport->regs.component,
- BIT(CXL_CM_CAP_CAP_ID_RAS)))
- dev_dbg(dev, "Failed to map RAS capability.\n");
-}
-
-static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
+void cxl_disable_rch_root_ints(struct cxl_dport *dport)
{
void __iomem *aer_base = dport->regs.dport_aer;
u32 aer_cmd_mask, aer_cmd;
@@ -757,28 +672,6 @@ static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
}
-/**
- * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
- * @dport: the cxl_dport that needs to be initialized
- * @host: host device for devm operations
- */
-void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
-{
- dport->reg_map.host = host;
- cxl_dport_map_ras(dport);
-
- if (dport->rch) {
- struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport->dport_dev);
-
- if (!host_bridge->native_aer)
- return;
-
- cxl_dport_map_rch_aer(dport);
- cxl_disable_rch_root_ints(dport);
- }
-}
-EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
-
/*
* Copy the AER capability registers using 32 bit read accesses.
* This is necessary because RCRB AER capability is MMIO mapped. Clear the
@@ -827,7 +720,7 @@ static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
return false;
}
-static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
+void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
{
struct pci_dev *pdev = to_pci_dev(cxlds->dev);
struct aer_capability_regs aer_regs;
@@ -852,82 +745,8 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
else
cxl_handle_ras(cxlds, dport->regs.ras);
}
-
-#else
-static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
#endif
-void cxl_cor_error_detected(struct pci_dev *pdev)
-{
- struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
- struct device *dev = &cxlds->cxlmd->dev;
-
- scoped_guard(device, dev) {
- if (!dev->driver) {
- dev_warn(&pdev->dev,
- "%s: memdev disabled, abort error handling\n",
- dev_name(dev));
- return;
- }
-
- if (cxlds->rcd)
- cxl_handle_rdport_errors(cxlds);
-
- cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
- }
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
-
-pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
- pci_channel_state_t state)
-{
- struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
- struct cxl_memdev *cxlmd = cxlds->cxlmd;
- struct device *dev = &cxlmd->dev;
- bool ue;
-
- scoped_guard(device, dev) {
- if (!dev->driver) {
- dev_warn(&pdev->dev,
- "%s: memdev disabled, abort error handling\n",
- dev_name(dev));
- return PCI_ERS_RESULT_DISCONNECT;
- }
-
- if (cxlds->rcd)
- cxl_handle_rdport_errors(cxlds);
- /*
- * A frozen channel indicates an impending reset which is fatal to
- * CXL.mem operation, and will likely crash the system. On the off
- * chance the situation is recoverable dump the status of the RAS
- * capability registers and bounce the active state of the memdev.
- */
- ue = cxl_handle_ras(cxlds, cxlds->regs.ras);
- }
-
-
- switch (state) {
- case pci_channel_io_normal:
- if (ue) {
- device_release_driver(dev);
- return PCI_ERS_RESULT_NEED_RESET;
- }
- return PCI_ERS_RESULT_CAN_RECOVER;
- case pci_channel_io_frozen:
- dev_warn(&pdev->dev,
- "%s: frozen state error detected, disable CXL.mem\n",
- dev_name(dev));
- device_release_driver(dev);
- return PCI_ERS_RESULT_NEED_RESET;
- case pci_channel_io_perm_failure:
- dev_warn(&pdev->dev,
- "failure state error detected, request disconnect\n");
- return PCI_ERS_RESULT_DISCONNECT;
- }
- return PCI_ERS_RESULT_NEED_RESET;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
-
static int cxl_flit_size(struct pci_dev *pdev)
{
if (cxl_pci_flit_256(pdev))
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 2731ba3a0799..b933030b8e1e 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -5,6 +5,7 @@
#include <linux/aer.h>
#include <cxl/event.h>
#include <cxlmem.h>
+#include <cxlpci.h>
#include "trace.h"
static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,
@@ -124,3 +125,178 @@ void cxl_ras_exit(void)
cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
cancel_work_sync(&cxl_cper_prot_err_work);
}
+
+static void cxl_dport_map_ras(struct cxl_dport *dport)
+{
+ struct cxl_register_map *map = &dport->reg_map;
+ struct device *dev = dport->dport_dev;
+
+ if (!map->component_map.ras.valid)
+ dev_dbg(dev, "RAS registers not found\n");
+ else if (cxl_map_component_regs(map, &dport->regs.component,
+ BIT(CXL_CM_CAP_CAP_ID_RAS)))
+ dev_dbg(dev, "Failed to map RAS capability.\n");
+}
+
+/**
+ * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
+ * @dport: the cxl_dport that needs to be initialized
+ * @host: host device for devm operations
+ */
+void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
+{
+ dport->reg_map.host = host;
+ cxl_dport_map_ras(dport);
+
+ if (dport->rch) {
+ struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport->dport_dev);
+
+ if (!host_bridge->native_aer)
+ return;
+
+ cxl_dport_map_rch_aer(dport);
+ cxl_disable_rch_root_ints(dport);
+ }
+}
+EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
+
+void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
+{
+ void __iomem *addr;
+ u32 status;
+
+ if (!ras_base)
+ return;
+
+ addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
+ status = readl(addr);
+ if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
+ writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
+ trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
+ }
+}
+
+/* CXL spec rev3.0 8.2.4.16.1 */
+static void header_log_copy(void __iomem *ras_base, u32 *log)
+{
+ void __iomem *addr;
+ u32 *log_addr;
+ int i, log_u32_size = CXL_HEADERLOG_SIZE / sizeof(u32);
+
+ addr = ras_base + CXL_RAS_HEADER_LOG_OFFSET;
+ log_addr = log;
+
+ for (i = 0; i < log_u32_size; i++) {
+ *log_addr = readl(addr);
+ log_addr++;
+ addr += sizeof(u32);
+ }
+}
+
+/*
+ * Log the state of the RAS status registers and prepare them to log the
+ * next error status. Return 1 if reset needed.
+ */
+bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
+{
+ u32 hl[CXL_HEADERLOG_SIZE_U32];
+ void __iomem *addr;
+ u32 status;
+ u32 fe;
+
+ if (!ras_base)
+ return false;
+
+ addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
+ status = readl(addr);
+ if (!(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK))
+ return false;
+
+ /* If multiple errors, log header points to first error from ctrl reg */
+ if (hweight32(status) > 1) {
+ void __iomem *rcc_addr =
+ ras_base + CXL_RAS_CAP_CONTROL_OFFSET;
+
+ fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
+ readl(rcc_addr)));
+ } else {
+ fe = status;
+ }
+
+ header_log_copy(ras_base, hl);
+ trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe, hl);
+ writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
+
+ return true;
+}
+
+void cxl_cor_error_detected(struct pci_dev *pdev)
+{
+ struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+ struct device *dev = &cxlds->cxlmd->dev;
+
+ scoped_guard(device, dev) {
+ if (!dev->driver) {
+ dev_warn(&pdev->dev,
+ "%s: memdev disabled, abort error handling\n",
+ dev_name(dev));
+ return;
+ }
+
+ if (cxlds->rcd)
+ cxl_handle_rdport_errors(cxlds);
+
+ cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
+ }
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
+
+pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+ struct cxl_memdev *cxlmd = cxlds->cxlmd;
+ struct device *dev = &cxlmd->dev;
+ bool ue;
+
+ scoped_guard(device, dev) {
+ if (!dev->driver) {
+ dev_warn(&pdev->dev,
+ "%s: memdev disabled, abort error handling\n",
+ dev_name(dev));
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ if (cxlds->rcd)
+ cxl_handle_rdport_errors(cxlds);
+ /*
+ * A frozen channel indicates an impending reset which is fatal to
+ * CXL.mem operation, and will likely crash the system. On the off
+ * chance the situation is recoverable dump the status of the RAS
+ * capability registers and bounce the active state of the memdev.
+ */
+ ue = cxl_handle_ras(cxlds, cxlds->regs.ras);
+ }
+
+
+ switch (state) {
+ case pci_channel_io_normal:
+ if (ue) {
+ device_release_driver(dev);
+ return PCI_ERS_RESULT_NEED_RESET;
+ }
+ return PCI_ERS_RESULT_CAN_RECOVER;
+ case pci_channel_io_frozen:
+ dev_warn(&pdev->dev,
+ "%s: frozen state error detected, disable CXL.mem\n",
+ dev_name(dev));
+ device_release_driver(dev);
+ return PCI_ERS_RESULT_NEED_RESET;
+ case pci_channel_io_perm_failure:
+ dev_warn(&pdev->dev,
+ "failure state error detected, request disconnect\n");
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+ return PCI_ERS_RESULT_NEED_RESET;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index ba17fa86d249..42a76a7a088f 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -803,14 +803,6 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
struct device *dport_dev, int port_id,
resource_size_t rcrb);
-#ifdef CONFIG_PCIEAER_CXL
-void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport);
-void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
-#else
-static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
- struct device *host) { }
-#endif
-
struct cxl_decoder *to_cxl_decoder(struct device *dev);
struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index cdb7cf3dbcb4..6f9c78886fd9 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -76,7 +76,23 @@ static inline bool cxl_pci_flit_256(struct pci_dev *pdev)
struct cxl_dev_state;
void read_cdat_data(struct cxl_port *port);
+
+#ifdef CONFIG_CXL_RAS
void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
+void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
+#else
+static inline void cxl_cor_error_detected(struct pci_dev *pdev) { }
+
+static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ return PCI_ERS_RESULT_NONE;
+}
+
+static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
+ struct device *host) { }
+#endif
+
#endif /* __CXL_PCI_H__ */
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 0e151d0572d1..b7ea66382f3b 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -57,12 +57,12 @@ cxl_core-y += $(CXL_CORE_SRC)/pci.o
cxl_core-y += $(CXL_CORE_SRC)/hdm.o
cxl_core-y += $(CXL_CORE_SRC)/pmu.o
cxl_core-y += $(CXL_CORE_SRC)/cdat.o
-cxl_core-y += $(CXL_CORE_SRC)/ras.o
cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o
cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o
cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o
cxl_core-$(CONFIG_CXL_FEATURES) += $(CXL_CORE_SRC)/features.o
cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += $(CXL_CORE_SRC)/edac.o
+cxl_core-$(CONFIG_CXL_RAS) += $(CXL_CORE_SRC)/ras.o
cxl_core-y += config_check.o
cxl_core-y += cxl_core_test.o
cxl_core-y += cxl_core_exports.o
--
2.34.1
* [PATCH v14 08/34] cxl/pci: Move CXL driver's RCH error handling into core/ras_rch.c
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (6 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 07/34] cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 20:35 ` Dave Jiang
2026-01-14 18:20 ` [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman
` (25 subsequent siblings)
33 siblings, 1 reply; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
Restricted CXL Host (RCH) protocol error handling uses a procedure distinct
from the CXL Virtual Hierarchy (VH) handling. This is because of the
differences in the RCH and VH topologies. Improve maintainability and make
it possible to enable or disable RCH handling at build time.
Move the RCH handling code out of core/pci.c into a single new file,
core/ras_rch.c, conditionally compiled with the CONFIG_CXL_RAS kernel config.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v13->v14:
- Add sign-off for Dan and Jonathan
- Revert inadvertent formatting of cxl_dport_map_rch_aer() (Jonathan)
- Remove default value for CXL_RCH_RAS (Dan)
- Remove unnecessary pci.h include in core.h & ras_rch.c (Jonathan)
- Add linux/types.h include in ras_rch.c (Jonathan)
- Change CONFIG_CXL_RCH_RAS -> CONFIG_CXL_RAS (Dan)
Changes in v12->v13:
- None
Changes v11->v12:
- Moved CXL_RCH_RAS Kconfig definition here from following commit.
Changes v10->v11:
- New patch
---
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/core.h | 11 +---
drivers/cxl/core/pci.c | 115 -----------------------------------
drivers/cxl/core/ras_rch.c | 121 +++++++++++++++++++++++++++++++++++++
tools/testing/cxl/Kbuild | 1 +
5 files changed, 126 insertions(+), 123 deletions(-)
create mode 100644 drivers/cxl/core/ras_rch.c
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index b2930cc54f8b..b37f38d502d8 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -20,3 +20,4 @@ cxl_core-$(CONFIG_CXL_MCE) += mce.o
cxl_core-$(CONFIG_CXL_FEATURES) += features.o
cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
cxl_core-$(CONFIG_CXL_RAS) += ras.o
+cxl_core-$(CONFIG_CXL_RAS) += ras_rch.o
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index bc818de87ccc..724361195057 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -149,6 +149,9 @@ int cxl_ras_init(void);
void cxl_ras_exit(void);
bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
+void cxl_dport_map_rch_aer(struct cxl_dport *dport);
+void cxl_disable_rch_root_ints(struct cxl_dport *dport);
+void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);
#else
static inline int cxl_ras_init(void)
{
@@ -164,14 +167,6 @@ static inline bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras
return false;
}
static inline void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base) { }
-#endif /* CONFIG_CXL_RAS */
-
-/* Restricted CXL Host specific RAS functions */
-#ifdef CONFIG_CXL_RAS
-void cxl_dport_map_rch_aer(struct cxl_dport *dport);
-void cxl_disable_rch_root_ints(struct cxl_dport *dport);
-void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);
-#else
static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index e132fff80979..b838c59d7a3c 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -632,121 +632,6 @@ void read_cdat_data(struct cxl_port *port)
}
EXPORT_SYMBOL_NS_GPL(read_cdat_data, "CXL");
-#ifdef CONFIG_CXL_RAS
-void cxl_dport_map_rch_aer(struct cxl_dport *dport)
-{
- resource_size_t aer_phys;
- struct device *host;
- u16 aer_cap;
-
- aer_cap = cxl_rcrb_to_aer(dport->dport_dev, dport->rcrb.base);
- if (aer_cap) {
- host = dport->reg_map.host;
- aer_phys = aer_cap + dport->rcrb.base;
- dport->regs.dport_aer = devm_cxl_iomap_block(host, aer_phys,
- sizeof(struct aer_capability_regs));
- }
-}
-
-void cxl_disable_rch_root_ints(struct cxl_dport *dport)
-{
- void __iomem *aer_base = dport->regs.dport_aer;
- u32 aer_cmd_mask, aer_cmd;
-
- if (!aer_base)
- return;
-
- /*
- * Disable RCH root port command interrupts.
- * CXL 3.0 12.2.1.1 - RCH Downstream Port-detected Errors
- *
- * This sequence may not be necessary. CXL spec states disabling
- * the root cmd register's interrupts is required. But, PCI spec
- * shows these are disabled by default on reset.
- */
- aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN |
- PCI_ERR_ROOT_CMD_NONFATAL_EN |
- PCI_ERR_ROOT_CMD_FATAL_EN);
- aer_cmd = readl(aer_base + PCI_ERR_ROOT_COMMAND);
- aer_cmd &= ~aer_cmd_mask;
- writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
-}
-
-/*
- * Copy the AER capability registers using 32 bit read accesses.
- * This is necessary because RCRB AER capability is MMIO mapped. Clear the
- * status after copying.
- *
- * @aer_base: base address of AER capability block in RCRB
- * @aer_regs: destination for copying AER capability
- */
-static bool cxl_rch_get_aer_info(void __iomem *aer_base,
- struct aer_capability_regs *aer_regs)
-{
- int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32);
- u32 *aer_regs_buf = (u32 *)aer_regs;
- int n;
-
- if (!aer_base)
- return false;
-
- /* Use readl() to guarantee 32-bit accesses */
- for (n = 0; n < read_cnt; n++)
- aer_regs_buf[n] = readl(aer_base + n * sizeof(u32));
-
- writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS);
- writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS);
-
- return true;
-}
-
-/* Get AER severity. Return false if there is no error. */
-static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
- int *severity)
-{
- if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
- if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
- *severity = AER_FATAL;
- else
- *severity = AER_NONFATAL;
- return true;
- }
-
- if (aer_regs->cor_status & ~aer_regs->cor_mask) {
- *severity = AER_CORRECTABLE;
- return true;
- }
-
- return false;
-}
-
-void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
-{
- struct pci_dev *pdev = to_pci_dev(cxlds->dev);
- struct aer_capability_regs aer_regs;
- struct cxl_dport *dport;
- int severity;
-
- struct cxl_port *port __free(put_cxl_port) =
- cxl_pci_find_port(pdev, &dport);
- if (!port)
- return;
-
- if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
- return;
-
- if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
- return;
-
- pci_print_aer(pdev, severity, &aer_regs);
-
- if (severity == AER_CORRECTABLE)
- cxl_handle_cor_ras(cxlds, dport->regs.ras);
- else
- cxl_handle_ras(cxlds, dport->regs.ras);
-}
-#endif
-
static int cxl_flit_size(struct pci_dev *pdev)
{
if (cxl_pci_flit_256(pdev))
diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c
new file mode 100644
index 000000000000..ed58afd18ecc
--- /dev/null
+++ b/drivers/cxl/core/ras_rch.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2025 AMD Corporation. All rights reserved. */
+
+#include <linux/types.h>
+#include <linux/aer.h>
+#include "cxl.h"
+#include "core.h"
+#include "cxlmem.h"
+
+void cxl_dport_map_rch_aer(struct cxl_dport *dport)
+{
+ resource_size_t aer_phys;
+ struct device *host;
+ u16 aer_cap;
+
+ aer_cap = cxl_rcrb_to_aer(dport->dport_dev, dport->rcrb.base);
+ if (aer_cap) {
+ host = dport->reg_map.host;
+ aer_phys = aer_cap + dport->rcrb.base;
+ dport->regs.dport_aer =
+ devm_cxl_iomap_block(host, aer_phys,
+ sizeof(struct aer_capability_regs));
+ }
+}
+
+void cxl_disable_rch_root_ints(struct cxl_dport *dport)
+{
+ void __iomem *aer_base = dport->regs.dport_aer;
+ u32 aer_cmd_mask, aer_cmd;
+
+ if (!aer_base)
+ return;
+
+ /*
+ * Disable RCH root port command interrupts.
+ * CXL 3.0 12.2.1.1 - RCH Downstream Port-detected Errors
+ *
+ * This sequence may not be necessary. CXL spec states disabling
+ * the root cmd register's interrupts is required. But, PCI spec
+ * shows these are disabled by default on reset.
+ */
+ aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN |
+ PCI_ERR_ROOT_CMD_NONFATAL_EN |
+ PCI_ERR_ROOT_CMD_FATAL_EN);
+ aer_cmd = readl(aer_base + PCI_ERR_ROOT_COMMAND);
+ aer_cmd &= ~aer_cmd_mask;
+ writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
+}
+
+/*
+ * Copy the AER capability registers using 32 bit read accesses.
+ * This is necessary because RCRB AER capability is MMIO mapped. Clear the
+ * status after copying.
+ *
+ * @aer_base: base address of AER capability block in RCRB
+ * @aer_regs: destination for copying AER capability
+ */
+static bool cxl_rch_get_aer_info(void __iomem *aer_base,
+ struct aer_capability_regs *aer_regs)
+{
+ int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32);
+ u32 *aer_regs_buf = (u32 *)aer_regs;
+ int n;
+
+ if (!aer_base)
+ return false;
+
+ /* Use readl() to guarantee 32-bit accesses */
+ for (n = 0; n < read_cnt; n++)
+ aer_regs_buf[n] = readl(aer_base + n * sizeof(u32));
+
+ writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS);
+ writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS);
+
+ return true;
+}
+
+/* Get AER severity. Return false if there is no error. */
+static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
+ int *severity)
+{
+ if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
+ if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
+ *severity = AER_FATAL;
+ else
+ *severity = AER_NONFATAL;
+ return true;
+ }
+
+ if (aer_regs->cor_status & ~aer_regs->cor_mask) {
+ *severity = AER_CORRECTABLE;
+ return true;
+ }
+
+ return false;
+}
+
+void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
+{
+ struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+ struct aer_capability_regs aer_regs;
+ struct cxl_dport *dport;
+ int severity;
+
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
+ if (!port)
+ return;
+
+ if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
+ return;
+
+ if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
+ return;
+
+ pci_print_aer(pdev, severity, &aer_regs);
+ if (severity == AER_CORRECTABLE)
+ cxl_handle_cor_ras(cxlds, dport->regs.ras);
+ else
+ cxl_handle_ras(cxlds, dport->regs.ras);
+}
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index b7ea66382f3b..6eceefefb0e0 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -63,6 +63,7 @@ cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o
cxl_core-$(CONFIG_CXL_FEATURES) += $(CXL_CORE_SRC)/features.o
cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += $(CXL_CORE_SRC)/edac.o
cxl_core-$(CONFIG_CXL_RAS) += $(CXL_CORE_SRC)/ras.o
+cxl_core-$(CONFIG_CXL_RAS) += $(CXL_CORE_SRC)/ras_rch.o
cxl_core-y += config_check.o
cxl_core-y += cxl_core_test.o
cxl_core-y += cxl_core_exports.o
--
2.34.1
* [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors()
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (7 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 08/34] cxl/pci: Move CXL driver's RCH error handling into core/ras_rch.c Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 19:01 ` Jonathan Cameron
` (4 more replies)
2026-01-14 18:20 ` [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error() Terry Bowman
` (24 subsequent siblings)
33 siblings, 5 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
Internal PCIe errors are not enabled by default during initialization. This
creates a problem for CXL drivers, which rely on PCIe Correctable and
Uncorrectable Internal Errors to receive CXL protocol error notifications.
Export pci_aer_unmask_internal_errors() so CXL and other drivers can
enable internal PCIe errors.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13->v14:
- New commit. Bjorn requested separating this out and adding it immediately
before it is used. It is called from cxl_rch_enable_rcec() in a
following patch.
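A rough user-space sketch of the read-modify-write that unmasking performs,
for readers unfamiliar with the AER mask registers. The struct and function
names prefixed with model_ are hypothetical and exist only for illustration.
The two bit values are, to my understanding, those in
include/uapi/linux/pci_regs.h. The real function operates on config space
through pci_read_config_dword()/pci_write_config_dword(), not on a struct.

```c
#include <stdint.h>

/* Bit values as I understand them from include/uapi/linux/pci_regs.h */
#define PCI_ERR_UNC_INTN     0x00400000u /* Uncorrectable Internal Error */
#define PCI_ERR_COR_INTERNAL 0x00004000u /* Corrected Internal Error */

/* Hypothetical stand-in for a device's two AER mask registers. */
struct model_aer_masks {
	uint32_t uncor_mask;
	uint32_t cor_mask;
};

/*
 * Clear only the internal-error bits in each mask. A cleared mask bit
 * means the corresponding error is reported; all other bits are left
 * as they were.
 */
static void model_unmask_internal_errors(struct model_aer_masks *m)
{
	m->uncor_mask &= ~PCI_ERR_UNC_INTN;
	m->cor_mask &= ~PCI_ERR_COR_INTERNAL;
}
```

Only the two internal-error bits change, so any masking policy already
applied to other error types is preserved.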
---
drivers/pci/pcie/aer.c | 6 +++---
include/linux/aer.h | 2 ++
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index c99ba2a1159c..63658e691aa2 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1120,8 +1120,6 @@ static bool find_source_device(struct pci_dev *parent,
return true;
}
-#ifdef CONFIG_PCIEAER_CXL
-
/**
* pci_aer_unmask_internal_errors - unmask internal errors
* @dev: pointer to the pci_dev data structure
@@ -1132,7 +1130,7 @@ static bool find_source_device(struct pci_dev *parent,
* Note: AER must be enabled and supported by the device which must be
* checked in advance, e.g. with pcie_aer_is_native().
*/
-static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
+void pci_aer_unmask_internal_errors(struct pci_dev *dev)
{
int aer = dev->aer_cap;
u32 mask;
@@ -1145,7 +1143,9 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
mask &= ~PCI_ERR_COR_INTERNAL;
pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
}
+EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors);
+#ifdef CONFIG_PCIEAER_CXL
static bool is_cxl_mem_dev(struct pci_dev *dev)
{
/*
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 02940be66324..df0f5c382286 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -56,12 +56,14 @@ struct aer_capability_regs {
#if defined(CONFIG_PCIEAER)
int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
int pcie_aer_is_native(struct pci_dev *dev);
+void pci_aer_unmask_internal_errors(struct pci_dev *dev);
#else
static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
{
return -EINVAL;
}
static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
+static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
#endif
void pci_print_aer(struct pci_dev *dev, int aer_severity,
--
2.34.1
* [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (8 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 19:08 ` Jonathan Cameron
` (2 more replies)
2026-01-14 18:20 ` [PATCH v14 11/34] PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c Terry Bowman
` (23 subsequent siblings)
33 siblings, 3 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
The AER driver includes significant logic for handling CXL protocol errors.
The AER driver will be updated in the future to separate the AER and CXL
logic.
Rename is_internal_error() to is_aer_internal_error() because the new name
states its purpose more precisely. Make is_aer_internal_error() non-static
so that other PCI code can call it.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13->v14:
- New patch
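For reference, a user-space sketch of the check the renamed helper performs:
the status bit that counts as "internal" depends on the error severity. The
model_ names and the enum are hypothetical stand-ins for struct aer_err_info
and the AER_* severity constants. The bit values are, to my understanding,
those in include/uapi/linux/pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as I understand them from include/uapi/linux/pci_regs.h */
#define PCI_ERR_UNC_INTN     0x00400000u /* Uncorrectable Internal Error */
#define PCI_ERR_COR_INTERNAL 0x00004000u /* Corrected Internal Error */

/* Hypothetical severity values; the kernel uses AER_* constants. */
enum model_severity {
	MODEL_AER_NONFATAL,
	MODEL_AER_FATAL,
	MODEL_AER_CORRECTABLE,
};

/* Hypothetical reduced form of struct aer_err_info. */
struct model_err_info {
	enum model_severity severity;
	uint32_t status;
};

/*
 * Correctable errors are checked against the correctable status bit;
 * every other severity is checked against the uncorrectable one.
 */
static bool model_is_aer_internal_error(const struct model_err_info *info)
{
	if (info->severity == MODEL_AER_CORRECTABLE)
		return info->status & PCI_ERR_COR_INTERNAL;

	return info->status & PCI_ERR_UNC_INTN;
}
```

The same status value can therefore give different answers for different
severities, which is why the helper takes the whole info structure rather
than a bare status word.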
---
drivers/pci/pcie/aer.c | 4 ++--
drivers/pci/pcie/portdrv.h | 9 +++++++++
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 63658e691aa2..2527e8370186 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1166,7 +1166,7 @@ static bool is_cxl_mem_dev(struct pci_dev *dev)
return true;
}
-static bool is_internal_error(struct aer_err_info *info)
+bool is_aer_internal_error(struct aer_err_info *info)
{
if (info->severity == AER_CORRECTABLE)
return info->status & PCI_ERR_COR_INTERNAL;
@@ -1211,7 +1211,7 @@ static void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
* device driver.
*/
if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
- is_internal_error(info))
+ is_aer_internal_error(info))
pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
}
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index bd29d1cc7b8b..e7a0a2cffea9 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -123,4 +123,13 @@ static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {}
#endif /* !CONFIG_PCIE_PME */
struct device *pcie_port_find_device(struct pci_dev *dev, u32 service);
+
+struct aer_err_info;
+
+#ifdef CONFIG_PCIEAER_CXL
+bool is_aer_internal_error(struct aer_err_info *info);
+#else
+static inline bool is_aer_internal_error(struct aer_err_info *info) { return false; }
+#endif /* CONFIG_PCIEAER_CXL */
+
#endif /* _PORTDRV_H_ */
--
2.34.1
* [PATCH v14 11/34] PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (9 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error() Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-22 17:23 ` Markus Elfring
2026-01-22 18:53 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 12/34] PCI/AER: Use guard() in cxl_rch_handle_error_iter() Terry Bowman
` (22 subsequent siblings)
33 siblings, 2 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
The Restricted CXL Host (RCH) AER error handling logic currently resides
in the AER driver file, aer.c. The CXL specific changes are conditionally
compiled using #ifdefs.
Improve the AER driver maintainability by separating the RCH specific logic
from the AER driver's core functionality and removing the ifdefs. Introduce
drivers/pci/pcie/aer_cxl_rch.c to hold the RCH AER logic. Conditionally
compile the file using the CONFIG_CXL_RAS Kconfig.
Move the CXL logic into the new file but leave the CXL helper function
is_aer_internal_error() in aer.c for now as it will be moved in a future
patch for CXL Virtual Hierarchy handling.
To maintain compilation after the move other changes are required. Change
cxl_rch_handle_error() and cxl_rch_enable_rcec() to be non-static in order
to be accessible from the AER driver.
Update the new file with the SPDX and 2023 AMD copyright notations because
the RCH bits were initially contributed in 2023 by AMD. See
commit 0a867568bb0d ("PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler")
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v13->v14:
- Add review-by and signed-off for Dan
- Commit message fixup (Dan)
- Update commit message with use-case description (Dan, Lukas)
- Make cxl_error_is_native() static (Dan)
Changes in v12->v13:
- Add forward declaration of 'struct aer_err_info' in pci/pci.h (Terry)
- Changed copyright date from 2025 to 2023 (Jonathan)
- Add Dave Jiang's, Jonathan's, and Ben's review-by
- Re-add 'struct aer_err_info' (Bot)
Changes in v11->v12:
- Rename drivers/pci/pcie/cxl_rch.c to drivers/pci/pcie/aer_cxl_rch.c (Lukas)
- Removed forward declaration of 'struct aer_err_info' in pci/pci.h (Terry)
Changes in v10->v11:
- Remove changes in code-split and move to earlier, new patch
- Add #include <linux/bitfield.h> to cxl_ras.c
- Move cxl_rch_handle_error() & cxl_rch_enable_rcec() declarations from pci.h
to aer.h, more localized.
- Introduce CONFIG_CXL_RCH_RAS, includes Makefile changes, ras.c
ifdef changes
---
drivers/pci/pcie/Makefile | 1 +
drivers/pci/pcie/aer.c | 99 +-----------------------------
drivers/pci/pcie/aer_cxl_rch.c | 106 +++++++++++++++++++++++++++++++++
drivers/pci/pcie/portdrv.h | 9 ++-
4 files changed, 114 insertions(+), 101 deletions(-)
create mode 100644 drivers/pci/pcie/aer_cxl_rch.c
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 173829aa02e6..b0b43a18c304 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_PCIEPORTBUS) += pcieportdrv.o bwctrl.o
obj-y += aspm.o
obj-$(CONFIG_PCIEAER) += aer.o err.o tlp.o
+obj-$(CONFIG_CXL_RAS) += aer_cxl_rch.o
obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o
obj-$(CONFIG_PCIE_PME) += pme.o
obj-$(CONFIG_PCIE_DPC) += dpc.o
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 2527e8370186..b1e6ee7468b9 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1145,27 +1145,7 @@ void pci_aer_unmask_internal_errors(struct pci_dev *dev)
}
EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors);
-#ifdef CONFIG_PCIEAER_CXL
-static bool is_cxl_mem_dev(struct pci_dev *dev)
-{
- /*
- * The capability, status, and control fields in Device 0,
- * Function 0 DVSEC control the CXL functionality of the
- * entire device (CXL 3.0, 8.1.3).
- */
- if (dev->devfn != PCI_DEVFN(0, 0))
- return false;
-
- /*
- * CXL Memory Devices must have the 502h class code set (CXL
- * 3.0, 8.1.12.1).
- */
- if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
- return false;
-
- return true;
-}
-
+#ifdef CONFIG_CXL_RAS
bool is_aer_internal_error(struct aer_err_info *info)
{
if (info->severity == AER_CORRECTABLE)
@@ -1173,83 +1153,6 @@ bool is_aer_internal_error(struct aer_err_info *info)
return info->status & PCI_ERR_UNC_INTN;
}
-
-static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
-{
- struct aer_err_info *info = (struct aer_err_info *)data;
- const struct pci_error_handlers *err_handler;
-
- if (!is_cxl_mem_dev(dev) || !pcie_aer_is_native(dev))
- return 0;
-
- /* Protect dev->driver */
- device_lock(&dev->dev);
-
- err_handler = dev->driver ? dev->driver->err_handler : NULL;
- if (!err_handler)
- goto out;
-
- if (info->severity == AER_CORRECTABLE) {
- if (err_handler->cor_error_detected)
- err_handler->cor_error_detected(dev);
- } else if (err_handler->error_detected) {
- if (info->severity == AER_NONFATAL)
- err_handler->error_detected(dev, pci_channel_io_normal);
- else if (info->severity == AER_FATAL)
- err_handler->error_detected(dev, pci_channel_io_frozen);
- }
-out:
- device_unlock(&dev->dev);
- return 0;
-}
-
-static void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
-{
- /*
- * Internal errors of an RCEC indicate an AER error in an
- * RCH's downstream port. Check and handle them in the CXL.mem
- * device driver.
- */
- if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
- is_aer_internal_error(info))
- pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
-}
-
-static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
-{
- bool *handles_cxl = data;
-
- if (!*handles_cxl)
- *handles_cxl = is_cxl_mem_dev(dev) && pcie_aer_is_native(dev);
-
- /* Non-zero terminates iteration */
- return *handles_cxl;
-}
-
-static bool handles_cxl_errors(struct pci_dev *rcec)
-{
- bool handles_cxl = false;
-
- if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC &&
- pcie_aer_is_native(rcec))
- pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl);
-
- return handles_cxl;
-}
-
-static void cxl_rch_enable_rcec(struct pci_dev *rcec)
-{
- if (!handles_cxl_errors(rcec))
- return;
-
- pci_aer_unmask_internal_errors(rcec);
- pci_info(rcec, "CXL: Internal errors unmasked");
-}
-
-#else
-static inline void cxl_rch_enable_rcec(struct pci_dev *dev) { }
-static inline void cxl_rch_handle_error(struct pci_dev *dev,
- struct aer_err_info *info) { }
#endif
/**
diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
new file mode 100644
index 000000000000..6b515edb12c1
--- /dev/null
+++ b/drivers/pci/pcie/aer_cxl_rch.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2023 AMD Corporation. All rights reserved. */
+
+#include <linux/pci.h>
+#include <linux/aer.h>
+#include <linux/bitfield.h>
+#include "../pci.h"
+#include "portdrv.h"
+
+static bool is_cxl_mem_dev(struct pci_dev *dev)
+{
+ /*
+ * The capability, status, and control fields in Device 0,
+ * Function 0 DVSEC control the CXL functionality of the
+ * entire device (CXL 3.0, 8.1.3).
+ */
+ if (dev->devfn != PCI_DEVFN(0, 0))
+ return false;
+
+ /*
+ * CXL Memory Devices must have the 502h class code set (CXL
+ * 3.0, 8.1.12.1).
+ */
+ if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
+ return false;
+
+ return true;
+}
+
+static bool cxl_error_is_native(struct pci_dev *dev)
+{
+ struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
+
+ return (pcie_ports_native || host->native_aer);
+}
+
+static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
+{
+ struct aer_err_info *info = (struct aer_err_info *)data;
+ const struct pci_error_handlers *err_handler;
+
+ if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
+ return 0;
+
+ device_lock(&dev->dev);
+
+ err_handler = dev->driver ? dev->driver->err_handler : NULL;
+ if (!err_handler)
+ goto out;
+
+ if (info->severity == AER_CORRECTABLE) {
+ if (err_handler->cor_error_detected)
+ err_handler->cor_error_detected(dev);
+ } else if (err_handler->error_detected) {
+ if (info->severity == AER_NONFATAL)
+ err_handler->error_detected(dev, pci_channel_io_normal);
+ else if (info->severity == AER_FATAL)
+ err_handler->error_detected(dev, pci_channel_io_frozen);
+ }
+out:
+ device_unlock(&dev->dev);
+ return 0;
+}
+
+void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
+{
+ /*
+ * Internal errors of an RCEC indicate an AER error in an
+ * RCH's downstream port. Check and handle them in the CXL.mem
+ * device driver.
+ */
+ if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
+ is_aer_internal_error(info))
+ pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
+}
+
+static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
+{
+ bool *handles_cxl = data;
+
+ if (!*handles_cxl)
+ *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev);
+
+ /* Non-zero terminates iteration */
+ return *handles_cxl;
+}
+
+static bool handles_cxl_errors(struct pci_dev *rcec)
+{
+ bool handles_cxl = false;
+
+ if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC &&
+ pcie_aer_is_native(rcec))
+ pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl);
+
+ return handles_cxl;
+}
+
+void cxl_rch_enable_rcec(struct pci_dev *rcec)
+{
+ if (!handles_cxl_errors(rcec))
+ return;
+
+ pci_aer_unmask_internal_errors(rcec);
+ pci_info(rcec, "CXL: Internal errors unmasked");
+}
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index e7a0a2cffea9..cc58bf2f2c84 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -126,10 +126,13 @@ struct device *pcie_port_find_device(struct pci_dev *dev, u32 service);
struct aer_err_info;
-#ifdef CONFIG_PCIEAER_CXL
+#ifdef CONFIG_CXL_RAS
bool is_aer_internal_error(struct aer_err_info *info);
+void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info);
+void cxl_rch_enable_rcec(struct pci_dev *rcec);
#else
static inline bool is_aer_internal_error(struct aer_err_info *info) { return false; }
-#endif /* CONFIG_PCIEAER_CXL */
-
+static inline void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info) { }
+static inline void cxl_rch_enable_rcec(struct pci_dev *rcec) { }
+#endif /* CONFIG_CXL_RAS */
#endif /* _PORTDRV_H_ */
--
2.34.1
* [PATCH v14 12/34] PCI/AER: Use guard() in cxl_rch_handle_error_iter()
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (10 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 11/34] PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 19:11 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 13/34] PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS Terry Bowman
` (21 subsequent siblings)
33 siblings, 1 reply; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
cxl_rch_handle_error_iter() includes a call to device_lock() using a goto
for multiple return paths. Improve readability and maintainability by
using the guard() lock variant.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
Changes in v13 -> v14:
- Add review-by for Jonathan, Dave Jiang, Dan Williams, and Bjorn
- Remove cleanup.h (Jonathan)
- Reverted comment removal (Bjorn)
- Move this patch after pci/pcie/aer_cxl_rch.c creation (Bjorn)
Changes in v12 -> v13:
- New patch
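
Note for readers unfamiliar with guard(): the userspace sketch below
illustrates the scope-based unlock idea using the GCC/Clang cleanup
attribute. All names (fake_lock, FAKE_GUARD, handle_error) are hypothetical
and only model the pattern; this is not the kernel's cleanup.h
implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device lock: it only counts hold depth. */
struct fake_lock {
	int depth;
};

static void fake_lock_acquire(struct fake_lock *l) { l->depth++; }
static void fake_lock_release(struct fake_lock *l) { l->depth--; }

/* Cleanup callback: the compiler passes a pointer to the guarded variable. */
static void fake_guard_exit(struct fake_lock **lp)
{
	if (*lp)
		fake_lock_release(*lp);
}

/* Take the lock now and release it when the enclosing scope is left. */
#define FAKE_GUARD(lock) \
	struct fake_lock *fake_guard_var \
		__attribute__((cleanup(fake_guard_exit), unused)) = \
		(fake_lock_acquire(lock), (lock))

/* Every return path drops the lock, so no "goto out" label is needed. */
static int handle_error(struct fake_lock *lock, const void *handler)
{
	FAKE_GUARD(lock);

	assert(lock->depth == 1);	/* held for the rest of this scope */
	if (!handler)
		return 0;		/* early return, lock still released */

	/* ... dispatch to the handler here ... */
	return 0;
}
```

Calling handle_error() with or without a handler should leave the lock
depth at zero afterwards, which is the property the goto-based version had
to maintain by hand.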
---
drivers/pci/pcie/aer_cxl_rch.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
index 6b515edb12c1..e471eefec9c4 100644
--- a/drivers/pci/pcie/aer_cxl_rch.c
+++ b/drivers/pci/pcie/aer_cxl_rch.c
@@ -42,11 +42,11 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
return 0;
- device_lock(&dev->dev);
+ guard(device)(&dev->dev);
err_handler = dev->driver ? dev->driver->err_handler : NULL;
if (!err_handler)
- goto out;
+ return 0;
if (info->severity == AER_CORRECTABLE) {
if (err_handler->cor_error_detected)
@@ -57,8 +57,6 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
else if (info->severity == AER_FATAL)
err_handler->error_detected(dev, pci_channel_io_frozen);
}
-out:
- device_unlock(&dev->dev);
return 0;
}
--
2.34.1
* [PATCH v14 13/34] PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (11 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 12/34] PCI/AER: Use guard() in cxl_rch_handle_error_iter() Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 19:12 ` Jonathan Cameron
` (3 more replies)
2026-01-14 18:20 ` [PATCH v14 14/34] PCI/AER: Report CXL or PCIe bus type in AER trace logging Terry Bowman
` (20 subsequent siblings)
33 siblings, 4 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
From: Dan Williams <dan.j.williams@intel.com>
One of the primary reasons for the CXL driver to exist is to perform error
handling. If both PCIEAER and CXL are enabled then light up CXL error
handling as well. The work to remove CONFIG_PCIEAER_CXL started in:
commit 4ae6ae66649c ("cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c")
Finish that off with conditionally compiling all CXL RAS related helpers
with CONFIG_CXL_RAS.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13->v14:
- New commit
---
drivers/cxl/Kconfig | 2 +-
drivers/pci/pcie/Kconfig | 9 ---------
2 files changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 217888992c88..70acddc08c39 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -235,6 +235,6 @@ config CXL_MCE
config CXL_RAS
def_bool y
- depends on ACPI_APEI_GHES && PCIEAER && CXL_PCI
+ depends on ACPI_APEI_GHES && PCIEAER && CXL_BUS
endif
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 17919b99fa66..207c2deae35f 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -49,15 +49,6 @@ config PCIEAER_INJECT
gotten from:
https://github.com/intel/aer-inject.git
-config PCIEAER_CXL
- bool "PCI Express CXL RAS support"
- default y
- depends on PCIEAER && CXL_PCI
- help
- Enables CXL error handling.
-
- If unsure, say Y.
-
#
# PCI Express ECRC
#
--
2.34.1
* [PATCH v14 14/34] PCI/AER: Report CXL or PCIe bus type in AER trace logging
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (12 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 13/34] PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 19:45 ` Jonathan Cameron
2026-01-14 20:56 ` Dave Jiang
2026-01-14 18:20 ` [PATCH v14 15/34] PCI/AER: Update struct aer_err_info with kernel-doc formatting Terry Bowman
` (19 subsequent siblings)
33 siblings, 2 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
The AER service driver and aer_event tracing currently log 'PCIe Bus Error'
for all errors. Update the driver and aer_event tracing to log 'CXL Bus
Error' for CXL device errors.
This requires that AER can identify and distinguish between PCIe errors and
CXL errors.
Introduce boolean 'is_cxl' to 'struct aer_err_info'. Add assignment in
aer_get_device_error_info() and pci_print_aer().
Update the aer_event trace routine to accept a bus type string parameter.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
Changes in v13->v14:
- Merged with Dan's commit. Changes are moving bus_type to the last
parameter in function calls (Dan)
- Removed all DCOs because of changes (Terry)
- Update commit message (Bjorn)
- Add Bjorn's ack-by
Changes in v12->v13:
- Remove duplicated aer_err_info inline comments. Is already in the
kernel-doc header (Ben)
Changes in v11->v12:
- Change aer_err_info::is_cxl to be a bitfield. Update structure
padding. (Lukas)
- Add kernel-doc for 'struct aer_err_info' (Lukas)
Changes in v10->v11:
- Remove duplicate call to trace_aer_event() (Shiju)
- Added Dan Williams' and Dave Jiang's reviewed-by
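
Illustration only: a standalone sketch of how a 1-bit flag can select the
bus-type prefix of the log headline. The struct and function names here
(err_info, err_bus, format_headline) are hypothetical simplifications, not
the kernel definitions touched by the diff below.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct aer_err_info. */
struct err_info {
	unsigned int severity:2;
	unsigned int is_cxl:1;	/* 0: PCIe bus error, 1: CXL bus error */
};

/* Pick the bus-type string from the flag. */
static const char *err_bus(const struct err_info *info)
{
	return info->is_cxl ? "CXL" : "PCIe";
}

/* Format the headline into buf; returns what snprintf() returns. */
static int format_headline(char *buf, size_t len,
			   const struct err_info *info, const char *severity)
{
	return snprintf(buf, len, "%s Bus Error: severity=%s",
			err_bus(info), severity);
}
```

Passing the string as a parameter, rather than the flag, keeps the
formatting code free of any CXL-specific knowledge.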
---
drivers/pci/pci.h | 8 +++++++-
drivers/pci/pcie/aer.c | 20 +++++++++++++-------
include/ras/ras_event.h | 12 ++++++++----
3 files changed, 28 insertions(+), 12 deletions(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 0e67014aa001..41ec38e82c08 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -738,7 +738,8 @@ struct aer_err_info {
unsigned int multi_error_valid:1;
unsigned int first_error:5;
- unsigned int __pad2:2;
+ unsigned int __pad2:1;
+ unsigned int is_cxl:1;
unsigned int tlp_header_valid:1;
unsigned int status; /* COR/UNCOR Error Status */
@@ -749,6 +750,11 @@ struct aer_err_info {
int aer_get_device_error_info(struct aer_err_info *info, int i);
void aer_print_error(struct aer_err_info *info, int i);
+static inline const char *aer_err_bus(struct aer_err_info *info)
+{
+ return info->is_cxl ? "CXL" : "PCIe";
+}
+
int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
unsigned int tlp_len, bool flit,
struct pcie_tlp_log *log);
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index b1e6ee7468b9..d30a217fae46 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -870,6 +870,7 @@ void aer_print_error(struct aer_err_info *info, int i)
struct pci_dev *dev;
int layer, agent, id;
const char *level = info->level;
+ const char *bus_type = aer_err_bus(info);
if (WARN_ON_ONCE(i >= AER_MAX_MULTI_ERR_DEVICES))
return;
@@ -879,22 +880,22 @@ void aer_print_error(struct aer_err_info *info, int i)
pci_dev_aer_stats_incr(dev, info);
trace_aer_event(pci_name(dev), (info->status & ~info->mask),
- info->severity, info->tlp_header_valid, &info->tlp);
+ info->severity, info->tlp_header_valid, &info->tlp, bus_type);
if (!info->ratelimit_print[i])
return;
if (!info->status) {
- pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
- aer_error_severity_string[info->severity]);
+ pci_err(dev, "%s Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
+ bus_type, aer_error_severity_string[info->severity]);
goto out;
}
layer = AER_GET_LAYER_ERROR(info->severity, info->status);
agent = AER_GET_AGENT(info->severity, info->status);
- aer_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
- aer_error_severity_string[info->severity],
+ aer_printk(level, dev, "%s Bus Error: severity=%s, type=%s, (%s)\n",
+ bus_type, aer_error_severity_string[info->severity],
aer_error_layer[layer], aer_agent_string[agent]);
aer_printk(level, dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
@@ -928,6 +929,7 @@ EXPORT_SYMBOL_GPL(cper_severity_to_aer);
void pci_print_aer(struct pci_dev *dev, int aer_severity,
struct aer_capability_regs *aer)
{
+ const char *bus_type;
int layer, agent, tlp_header_valid = 0;
u32 status, mask;
struct aer_err_info info = {
@@ -948,10 +950,13 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
info.status = status;
info.mask = mask;
+ info.is_cxl = pcie_is_cxl(dev);
+
+ bus_type = aer_err_bus(&info);
pci_dev_aer_stats_incr(dev, &info);
- trace_aer_event(pci_name(dev), (status & ~mask),
- aer_severity, tlp_header_valid, &aer->header_log);
+ trace_aer_event(pci_name(dev), (status & ~mask), aer_severity,
+ tlp_header_valid, &aer->header_log, bus_type);
if (!aer_ratelimit(dev, info.severity))
return;
@@ -1301,6 +1306,7 @@ int aer_get_device_error_info(struct aer_err_info *info, int i)
/* Must reset in this function */
info->status = 0;
info->tlp_header_valid = 0;
+ info->is_cxl = pcie_is_cxl(dev);
/* The device might not support AER */
if (!aer)
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index eaecc3c5f772..fdb785fa4613 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -339,9 +339,11 @@ TRACE_EVENT(aer_event,
const u32 status,
const u8 severity,
const u8 tlp_header_valid,
- struct pcie_tlp_log *tlp),
+ struct pcie_tlp_log *tlp,
+ const char *bus_type),
- TP_ARGS(dev_name, status, severity, tlp_header_valid, tlp),
+
+ TP_ARGS(dev_name, status, severity, tlp_header_valid, tlp, bus_type),
TP_STRUCT__entry(
__string( dev_name, dev_name )
@@ -349,10 +351,12 @@ TRACE_EVENT(aer_event,
__field( u8, severity )
__field( u8, tlp_header_valid)
__array( u32, tlp_header, PCIE_STD_MAX_TLP_HEADERLOG)
+ __string( bus_type, bus_type )
),
TP_fast_assign(
__assign_str(dev_name);
+ __assign_str(bus_type);
__entry->status = status;
__entry->severity = severity;
__entry->tlp_header_valid = tlp_header_valid;
@@ -364,8 +368,8 @@ TRACE_EVENT(aer_event,
}
),
- TP_printk("%s PCIe Bus Error: severity=%s, %s, TLP Header=%s\n",
- __get_str(dev_name),
+ TP_printk("%s %s Bus Error: severity=%s, %s, TLP Header=%s\n",
+ __get_str(dev_name), __get_str(bus_type),
__entry->severity == AER_CORRECTABLE ? "Corrected" :
__entry->severity == AER_FATAL ?
"Fatal" : "Uncorrected, non-fatal",
--
2.34.1
* [PATCH v14 15/34] PCI/AER: Update struct aer_err_info with kernel-doc formatting
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (13 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 14/34] PCI/AER: Report CXL or PCIe bus type in AER trace logging Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 19:48 ` Jonathan Cameron
` (2 more replies)
2026-01-14 18:20 ` [PATCH v14 16/34] cxl/mem: Clarify @host for devm_cxl_add_nvdimm() Terry Bowman
` (18 subsequent siblings)
33 siblings, 3 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
Update the existing 'struct aer_err_info' definition to use kernel-doc
formatting and remove the inline comments that the kernel-doc header now
covers. This improves readability and maintainability. No functional changes.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v13->v14:
- New commit
---
drivers/pci/pci.h | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 41ec38e82c08..dbc547db208a 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -724,16 +724,33 @@ static inline bool pci_dev_binding_disallowed(struct pci_dev *dev)
#define AER_MAX_MULTI_ERR_DEVICES 5 /* Not likely to have more */
+/**
+ * struct aer_err_info - AER Error Information
+ * @dev: Devices reporting error
+ * @ratelimit_print: Flag to log or not log the devices' error. 0=NotLog/1=Log
+ * @error_dev_num: Number of devices reporting an error
+ * @level: printk level to use in logging
+ * @id: Value from register PCI_ERR_ROOT_ERR_SRC
+ * @severity: AER severity, 0-UNCOR Non-fatal, 1-UNCOR fatal, 2-COR
+ * @root_ratelimit_print: Flag to log or not log the root's error. 0=NotLog/1=Log
+ * @multi_error_valid: If multiple errors are reported
+ * @first_error: First reported error
+ * @is_cxl: Bus type error: 0-PCI Bus error, 1-CXL Bus error
+ * @tlp_header_valid: Indicates if TLP field contains error information
+ * @status: COR/UNCOR error status
+ * @mask: COR/UNCOR mask
+ * @tlp: Transaction packet information
+ */
struct aer_err_info {
struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
int ratelimit_print[AER_MAX_MULTI_ERR_DEVICES];
int error_dev_num;
- const char *level; /* printk level */
+ const char *level;
unsigned int id:16;
- unsigned int severity:2; /* 0:NONFATAL | 1:FATAL | 2:COR */
- unsigned int root_ratelimit_print:1; /* 0=skip, 1=print */
+ unsigned int severity:2;
+ unsigned int root_ratelimit_print:1;
unsigned int __pad1:4;
unsigned int multi_error_valid:1;
@@ -742,9 +759,9 @@ struct aer_err_info {
unsigned int is_cxl:1;
unsigned int tlp_header_valid:1;
- unsigned int status; /* COR/UNCOR Error Status */
- unsigned int mask; /* COR/UNCOR Error Mask */
- struct pcie_tlp_log tlp; /* TLP Header */
+ unsigned int status;
+ unsigned int mask;
+ struct pcie_tlp_log tlp;
};
int aer_get_device_error_info(struct aer_err_info *info, int i);
--
2.34.1
* [PATCH v14 16/34] cxl/mem: Clarify @host for devm_cxl_add_nvdimm()
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (14 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 15/34] PCI/AER: Update struct aer_err_info with kernel-doc formatting Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 19:49 ` Jonathan Cameron
2026-01-14 21:08 ` Dave Jiang
2026-01-14 18:20 ` [PATCH v14 17/34] cxl: Update RAS handler interfaces to also support CXL Ports Terry Bowman
` (17 subsequent siblings)
33 siblings, 2 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
From: Dan Williams <dan.j.williams@intel.com>
The convention for devm_ helpers in the CXL driver is that the first
argument is the @host for the operation (locked driver::probe() context).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13 -> v14:
- New patch
---
drivers/cxl/core/pmem.c | 13 +++++++------
drivers/cxl/cxl.h | 3 ++-
drivers/cxl/mem.c | 2 +-
3 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c
index 8853415c106a..e7b1e6fa0ea0 100644
--- a/drivers/cxl/core/pmem.c
+++ b/drivers/cxl/core/pmem.c
@@ -237,12 +237,13 @@ static void cxlmd_release_nvdimm(void *_cxlmd)
/**
* devm_cxl_add_nvdimm() - add a bridge between a cxl_memdev and an nvdimm
- * @parent_port: parent port for the (to be added) @cxlmd endpoint port
- * @cxlmd: cxl_memdev instance that will perform LIBNVDIMM operations
+ * @host: host device for devm operations
+ * @port: any port in the CXL topology to find the nvdimm-bridge device
+ * @cxlmd: parent of the to be created cxl_nvdimm device
*
* Return: 0 on success negative error code on failure.
*/
-int devm_cxl_add_nvdimm(struct cxl_port *parent_port,
+int devm_cxl_add_nvdimm(struct device *host, struct cxl_port *port,
struct cxl_memdev *cxlmd)
{
struct cxl_nvdimm_bridge *cxl_nvb;
@@ -250,7 +251,7 @@ int devm_cxl_add_nvdimm(struct cxl_port *parent_port,
struct device *dev;
int rc;
- cxl_nvb = cxl_find_nvdimm_bridge(parent_port);
+ cxl_nvb = cxl_find_nvdimm_bridge(port);
if (!cxl_nvb)
return -ENODEV;
@@ -270,10 +271,10 @@ int devm_cxl_add_nvdimm(struct cxl_port *parent_port,
if (rc)
goto err;
- dev_dbg(&cxlmd->dev, "register %s\n", dev_name(dev));
+ dev_dbg(host, "register %s\n", dev_name(dev));
/* @cxlmd carries a reference on @cxl_nvb until cxlmd_release_nvdimm */
- return devm_add_action_or_reset(&cxlmd->dev, cxlmd_release_nvdimm, cxlmd);
+ return devm_add_action_or_reset(host, cxlmd_release_nvdimm, cxlmd);
err:
put_device(dev);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 42a76a7a088f..6f3741a57932 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -887,7 +887,8 @@ struct cxl_nvdimm_bridge *devm_cxl_add_nvdimm_bridge(struct device *host,
struct cxl_port *port);
struct cxl_nvdimm *to_cxl_nvdimm(struct device *dev);
bool is_cxl_nvdimm(struct device *dev);
-int devm_cxl_add_nvdimm(struct cxl_port *parent_port, struct cxl_memdev *cxlmd);
+int devm_cxl_add_nvdimm(struct device *host, struct cxl_port *port,
+ struct cxl_memdev *cxlmd);
struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_port *port);
#ifdef CONFIG_CXL_REGION
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 6e6777b7bafb..c2ee7f7f6320 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -153,7 +153,7 @@ static int cxl_mem_probe(struct device *dev)
}
if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) {
- rc = devm_cxl_add_nvdimm(parent_port, cxlmd);
+ rc = devm_cxl_add_nvdimm(dev, parent_port, cxlmd);
if (rc) {
if (rc == -ENODEV)
dev_info(dev, "PMEM disabled by platform\n");
--
2.34.1
* [PATCH v14 17/34] cxl: Update RAS handler interfaces to also support CXL Ports
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (15 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 16/34] cxl/mem: Clarify @host for devm_cxl_add_nvdimm() Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 18:20 ` [PATCH v14 18/34] cxl/port: Remove "enumerate dports" helpers Terry Bowman
` (16 subsequent siblings)
33 siblings, 0 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
CXL PCIe Port Protocol Error handling support will be added to the
CXL drivers in the future. In preparation, update the existing RAS
handler interfaces so they can serve all CXL PCIe Port Protocol Errors.
The driver's RAS support functions currently rely on a 'struct
cxl_dev_state' type parameter, which is not available for CXL Port
devices. However, since the same CXL RAS capability structure is
needed across most CXL components and devices [1], a common handling
approach should be adopted.
To accommodate this, update the cxl_handle_cor_ras() and
cxl_handle_ras() functions to use a 'struct device' instead of
'struct cxl_dev_state'.
No functional changes are introduced.
[1] CXL 3.1 Spec, 8.2.4 CXL.cache and CXL.mem Registers
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v13->v14:
- None
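
Illustration only: a userspace sketch of recovering an enclosing object
from an embedded device pointer, which is the general idea behind passing
a 'struct device' and converting it back where needed. The types and
helpers (fake_device, fake_memdev, fake_container_of, to_fake_memdev) are
hypothetical; the conversion is only valid when the pointer really is
embedded in the enclosing type.

```c
#include <stddef.h>

/* Hypothetical generic device, embedded in larger objects. */
struct fake_device {
	const char *name;
};

/* Hypothetical enclosing object, analogous to a memdev. */
struct fake_memdev {
	int id;
	struct fake_device dev;	/* embedded, not a pointer */
};

/* Step back from the member address to the start of the enclosing type. */
#define fake_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct fake_memdev *to_fake_memdev(struct fake_device *dev)
{
	return fake_container_of(dev, struct fake_memdev, dev);
}

/* Handler takes the generic device; returns the memdev id if status is set. */
static int handle_cor_ras(struct fake_device *dev, unsigned int status)
{
	struct fake_memdev *md = to_fake_memdev(dev);

	return status ? md->id : -1;
}
```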
---
drivers/cxl/core/core.h | 14 +++++---------
drivers/cxl/core/ras.c | 12 ++++++------
drivers/cxl/core/ras_rch.c | 4 ++--
3 files changed, 13 insertions(+), 17 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 724361195057..422531799af2 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -147,8 +147,8 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
#ifdef CONFIG_CXL_RAS
int cxl_ras_init(void);
void cxl_ras_exit(void);
-bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
-void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
+bool cxl_handle_ras(struct device *dev, void __iomem *ras_base);
+void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base);
void cxl_dport_map_rch_aer(struct cxl_dport *dport);
void cxl_disable_rch_root_ints(struct cxl_dport *dport);
void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);
@@ -157,16 +157,12 @@ static inline int cxl_ras_init(void)
{
return 0;
}
-
-static inline void cxl_ras_exit(void)
-{
-}
-
-static inline bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
+static inline void cxl_ras_exit(void) { }
+static inline bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
{
return false;
}
-static inline void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base) { }
+static inline void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base) { }
static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index b933030b8e1e..72908f3ced77 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -160,7 +160,7 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
}
EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
-void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
+void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
{
void __iomem *addr;
u32 status;
@@ -172,7 +172,7 @@ void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
status = readl(addr);
if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
- trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
+ trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);
}
}
@@ -197,7 +197,7 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
* Log the state of the RAS status registers and prepare them to log the
* next error status. Return 1 if reset needed.
*/
-bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
+bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
{
u32 hl[CXL_HEADERLOG_SIZE_U32];
void __iomem *addr;
@@ -224,7 +224,7 @@ bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
}
header_log_copy(ras_base, hl);
- trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe, hl);
+ trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);
writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
return true;
@@ -246,7 +246,7 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
if (cxlds->rcd)
cxl_handle_rdport_errors(cxlds);
- cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
+ cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->regs.ras);
}
}
EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
@@ -275,7 +275,7 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
* chance the situation is recoverable dump the status of the RAS
* capability registers and bounce the active state of the memdev.
*/
- ue = cxl_handle_ras(cxlds, cxlds->regs.ras);
+ ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->regs.ras);
}
diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c
index ed58afd18ecc..0a8b3b9b6388 100644
--- a/drivers/cxl/core/ras_rch.c
+++ b/drivers/cxl/core/ras_rch.c
@@ -115,7 +115,7 @@ void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
pci_print_aer(pdev, severity, &aer_regs);
if (severity == AER_CORRECTABLE)
- cxl_handle_cor_ras(cxlds, dport->regs.ras);
+ cxl_handle_cor_ras(&cxlds->cxlmd->dev, dport->regs.ras);
else
- cxl_handle_ras(cxlds, dport->regs.ras);
+ cxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras);
}
--
2.34.1
* [PATCH v14 18/34] cxl/port: Remove "enumerate dports" helpers
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (16 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 17/34] cxl: Update RAS handler interfaces to also support CXL Ports Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 19:50 ` Jonathan Cameron
` (2 more replies)
2026-01-14 18:20 ` [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management Terry Bowman
` (15 subsequent siblings)
33 siblings, 3 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
From: Dan Williams <dan.j.williams@intel.com>
Now that cxl_switch_port_probe() no longer walks potential dports, because
they are enumerated dynamically on descendant endpoint arrival, remove the
dead code.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13 -> v14:
- New patch
---
drivers/cxl/core/pci.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index b838c59d7a3c..0305a421504e 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -71,6 +71,14 @@ struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
}
EXPORT_SYMBOL_NS_GPL(__devm_cxl_add_dport_by_dev, "CXL");
+struct cxl_walk_context {
+ struct pci_bus *bus;
+ struct cxl_port *port;
+ int type;
+ int error;
+ int count;
+};
+
static int cxl_dvsec_mem_range_valid(struct cxl_dev_state *cxlds, int id)
{
struct pci_dev *pdev = to_pci_dev(cxlds->dev);
@@ -820,14 +828,6 @@ int cxl_gpf_port_setup(struct cxl_dport *dport)
return 0;
}
-struct cxl_walk_context {
- struct pci_bus *bus;
- struct cxl_port *port;
- int type;
- int error;
- int count;
-};
-
static int count_dports(struct pci_dev *pdev, void *data)
{
struct cxl_walk_context *ctx = data;
--
2.34.1
* [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (17 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 18/34] cxl/port: Remove "enumerate dports" helpers Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 21:26 ` Dave Jiang
2026-01-15 14:46 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 20/34] cxl/port: Move dport operations to a driver event Terry Bowman
` (14 subsequent siblings)
33 siblings, 2 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
From: Dan Williams <dan.j.williams@intel.com>
With dport addition moving out of cxl_switch_port_probe() it is no longer
the case that a single dport-add failure will cause all dport resources
to be automatically unwound.
devm still helps all dport resources get cleaned up when the port is
detached, but setup now needs to avoid leaking resources if an early exit
occurs during setup.
Convert from a "devm add" model, to an "auto remove" model that makes the
caller responsible for registering devm reclaim after the object is fully
instantiated.
As a side effect of this reorganization, port->nr_dports is now always
consistent with the number of entries in the port->dports xarray, and this
can stop playing games with ida_is_empty(), which is unreliable as a
detector of whether decoders are set up. I.e. consider how
CONFIG_DEBUG_KOBJECT_RELEASE might wreak havoc with this approach.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13 -> v14:
- New patch
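
Illustration only: a userspace sketch of the error-pointer return
convention that add_dport() switches to in the diff below, together with a
counter that stays in step with the stored entries. The helpers
(fake_err_ptr, fake_ptr_err, fake_is_err) and types (fake_port, fake_dport)
are hypothetical re-creations for this example, not the kernel's err.h or
the CXL structures.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value, and decode it again. */
static inline void *fake_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long fake_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int fake_is_err(const void *p)
{
	/* Error values occupy the top FAKE_MAX_ERRNO addresses. */
	return (uintptr_t)p >= (uintptr_t)(-FAKE_MAX_ERRNO);
}

#define FAKE_MAX_DPORTS 4

struct fake_port {
	int ids[FAKE_MAX_DPORTS];
	int nr_dports;		/* always equals the number of stored ids */
};

struct fake_dport {
	int port_id;
};

/* Returns the dport on success, or an encoded errno on failure. */
static struct fake_dport *fake_add_dport(struct fake_port *port,
					 struct fake_dport *dport)
{
	for (int i = 0; i < port->nr_dports; i++)
		if (port->ids[i] == dport->port_id)
			return fake_err_ptr(-EBUSY);	/* non-unique port id */

	if (port->nr_dports == FAKE_MAX_DPORTS)
		return fake_err_ptr(-ENOSPC);

	/* Count and storage are updated together, only on success. */
	port->ids[port->nr_dports++] = dport->port_id;
	return dport;
}
```

A caller would check the result with fake_is_err() and only then register
any teardown action, mirroring the "auto remove" ordering described above.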
---
drivers/cxl/acpi.c | 11 +-
drivers/cxl/core/pci.c | 10 +-
drivers/cxl/core/port.c | 225 ++++++++++++++++-----------
drivers/cxl/cxl.h | 23 +--
drivers/cxl/port.c | 8 +-
tools/testing/cxl/Kbuild | 3 +-
tools/testing/cxl/cxl_core_exports.c | 13 +-
tools/testing/cxl/exports.h | 4 +-
tools/testing/cxl/test/cxl.c | 6 +-
tools/testing/cxl/test/mock.c | 25 ++-
tools/testing/cxl/test/mock.h | 4 +-
11 files changed, 188 insertions(+), 144 deletions(-)
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 77ac940e3013..1e1383eb9bd5 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -679,16 +679,19 @@ static int add_host_bridge_dport(struct device *match, void *arg)
if (ctx.cxl_version == ACPI_CEDT_CHBS_VERSION_CXL11) {
dev_dbg(match, "RCRB found for UID %lld: %pa\n", ctx.uid,
&ctx.base);
- dport = devm_cxl_add_rch_dport(root_port, bridge, ctx.uid,
- ctx.base);
+ dport = cxl_add_rch_dport(root_port, bridge, ctx.uid, ctx.base);
} else {
- dport = devm_cxl_add_dport(root_port, bridge, ctx.uid,
- CXL_RESOURCE_NONE);
+ dport = cxl_add_dport(root_port, bridge, ctx.uid,
+ CXL_RESOURCE_NONE);
}
if (IS_ERR(dport))
return PTR_ERR(dport);
+ ret = cxl_dport_autoremove(dport);
+ if (ret)
+ return ret;
+
ret = get_genport_coordinates(match, dport);
if (ret)
dev_dbg(match, "Failed to get generic port perf coordinates.\n");
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 0305a421504e..512a3e29a095 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -41,14 +41,14 @@ static int pci_get_port_num(struct pci_dev *pdev)
}
/**
- * __devm_cxl_add_dport_by_dev - allocate a dport by dport device
+ * __cxl_add_dport_by_dev - allocate a dport by dport device
* @port: cxl_port that hosts the dport
* @dport_dev: 'struct device' of the dport
*
* Returns the allocated dport on success or ERR_PTR() of -errno on error
*/
-struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev)
+struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
+ struct device *dport_dev)
{
struct cxl_register_map map;
struct pci_dev *pdev;
@@ -67,9 +67,9 @@ struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
return ERR_PTR(rc);
device_lock_assert(&port->dev);
- return devm_cxl_add_dport(port, dport_dev, port_num, map.resource);
+ return cxl_add_dport(port, dport_dev, port_num, map.resource);
}
-EXPORT_SYMBOL_NS_GPL(__devm_cxl_add_dport_by_dev, "CXL");
+EXPORT_SYMBOL_NS_GPL(__cxl_add_dport_by_dev, "CXL");
struct cxl_walk_context {
struct pci_bus *bus;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index fef3aa0c6680..a05a1812bb6e 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1051,7 +1051,8 @@ static struct cxl_dport *find_dport(struct cxl_port *port, int id)
return NULL;
}
-static int add_dport(struct cxl_port *port, struct cxl_dport *dport)
+static struct cxl_dport *add_dport(struct cxl_port *port,
+ struct cxl_dport *dport)
{
struct cxl_dport *dup;
int rc;
@@ -1063,16 +1064,33 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *dport)
"unable to add dport%d-%s non-unique port id (%s)\n",
dport->port_id, dev_name(dport->dport_dev),
dev_name(dup->dport_dev));
- return -EBUSY;
+ return ERR_PTR(-EBUSY);
+ }
+
+ /*
+ * Unlike CXL switch upstream ports where it can train a CXL link
+ * independent of its downstream ports, a host bridge upstream port may
+ * not enable CXL registers until at least one downstream port (root
+ * port) trains CXL. Enumerate registers once when the number of dports
+ * transitions from zero to one.
+ */
+ if (!port->nr_dports) {
+ rc = cxl_port_setup_regs(port, port->component_reg_phys);
+ if (rc)
+ return ERR_PTR(rc);
}
+ /* Arrange for dport_dev to be valid through remove_dport() */
+ struct device *dev __free(put_device) = get_device(dport->dport_dev);
+
rc = xa_insert(&port->dports, (unsigned long)dport->dport_dev, dport,
GFP_KERNEL);
if (rc)
- return rc;
+ return ERR_PTR(rc);
+ retain_and_null_ptr(dev);
port->nr_dports++;
- return 0;
+ return dport;
}
/*
@@ -1094,51 +1112,32 @@ static void cond_cxl_root_unlock(struct cxl_port *port)
device_unlock(&port->dev);
}
-static void cxl_dport_remove(void *data)
+static void remove_dport(struct cxl_dport *dport)
{
- struct cxl_dport *dport = data;
struct cxl_port *port = dport->port;
+ port->nr_dports--;
xa_erase(&port->dports, (unsigned long) dport->dport_dev);
put_device(dport->dport_dev);
}
-static void cxl_dport_unlink(void *data)
-{
- struct cxl_dport *dport = data;
- struct cxl_port *port = dport->port;
- char link_name[CXL_TARGET_STRLEN];
+DEFINE_FREE(remove_dport, struct cxl_dport *,
+ if (!IS_ERR_OR_NULL(_T)) remove_dport(_T))
- sprintf(link_name, "dport%d", dport->port_id);
- sysfs_remove_link(&port->dev.kobj, link_name);
-}
-
-static struct cxl_dport *
-__devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
- int port_id, resource_size_t component_reg_phys,
- resource_size_t rcrb)
+static struct cxl_dport *__cxl_add_dport(struct cxl_port *port,
+ struct device *dport_dev, int port_id,
+ resource_size_t component_reg_phys,
+ resource_size_t rcrb)
{
char link_name[CXL_TARGET_STRLEN];
- struct cxl_dport *dport;
- struct device *host;
int rc;
- if (is_cxl_root(port))
- host = port->uport_dev;
- else
- host = &port->dev;
-
- if (!host->driver) {
- dev_WARN_ONCE(&port->dev, 1, "dport:%s bad devm context\n",
- dev_name(dport_dev));
- return ERR_PTR(-ENXIO);
- }
-
if (snprintf(link_name, CXL_TARGET_STRLEN, "dport%d", port_id) >=
CXL_TARGET_STRLEN)
return ERR_PTR(-EINVAL);
- dport = devm_kzalloc(host, sizeof(*dport), GFP_KERNEL);
+ struct cxl_dport *dport __free(kfree) =
+ kzalloc(sizeof(*dport), GFP_KERNEL);
if (!dport)
return ERR_PTR(-ENOMEM);
@@ -1176,48 +1175,27 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
&component_reg_phys);
cond_cxl_root_lock(port);
- rc = add_dport(port, dport);
+ struct cxl_dport *dport_add __free(remove_dport) =
+ add_dport(port, dport);
cond_cxl_root_unlock(port);
- if (rc)
- return ERR_PTR(rc);
-
- /*
- * Setup port register if this is the first dport showed up. Having
- * a dport also means that there is at least 1 active link.
- */
- if (port->nr_dports == 1 &&
- port->component_reg_phys != CXL_RESOURCE_NONE) {
- rc = cxl_port_setup_regs(port, port->component_reg_phys);
- if (rc) {
- xa_erase(&port->dports, (unsigned long)dport->dport_dev);
- return ERR_PTR(rc);
- }
- port->component_reg_phys = CXL_RESOURCE_NONE;
- }
+ if (IS_ERR(dport_add))
+ return dport_add;
- get_device(dport_dev);
- rc = devm_add_action_or_reset(host, cxl_dport_remove, dport);
- if (rc)
- return ERR_PTR(rc);
+ if (dev_is_pci(dport_dev))
+ dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
rc = sysfs_create_link(&port->dev.kobj, &dport_dev->kobj, link_name);
if (rc)
return ERR_PTR(rc);
- rc = devm_add_action_or_reset(host, cxl_dport_unlink, dport);
- if (rc)
- return ERR_PTR(rc);
-
- if (dev_is_pci(dport_dev))
- dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
-
cxl_debugfs_create_dport_dir(dport);
- return dport;
+ retain_and_null_ptr(dport_add);
+ return no_free_ptr(dport);
}
/**
- * devm_cxl_add_dport - append VH downstream port data to a cxl_port
+ * cxl_add_dport - append VH downstream port data to a cxl_port
* @port: the cxl_port that references this dport
* @dport_dev: firmware or PCI device representing the dport
* @port_id: identifier for this dport in a decoder's target list
@@ -1227,14 +1205,13 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
* either the port's host (for root ports), or the port itself (for
* switch ports)
*/
-struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
- struct device *dport_dev, int port_id,
- resource_size_t component_reg_phys)
+struct cxl_dport *cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
+ int port_id, resource_size_t component_reg_phys)
{
struct cxl_dport *dport;
- dport = __devm_cxl_add_dport(port, dport_dev, port_id,
- component_reg_phys, CXL_RESOURCE_NONE);
+ dport = __cxl_add_dport(port, dport_dev, port_id, component_reg_phys,
+ CXL_RESOURCE_NONE);
if (IS_ERR(dport)) {
dev_dbg(dport_dev, "failed to add dport to %s: %ld\n",
dev_name(&port->dev), PTR_ERR(dport));
@@ -1245,10 +1222,10 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
return dport;
}
-EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport, "CXL");
+EXPORT_SYMBOL_NS_GPL(cxl_add_dport, "CXL");
/**
- * devm_cxl_add_rch_dport - append RCH downstream port data to a cxl_port
+ * cxl_add_rch_dport - append RCH downstream port data to a cxl_port
* @port: the cxl_port that references this dport
* @dport_dev: firmware or PCI device representing the dport
* @port_id: identifier for this dport in a decoder's target list
@@ -1256,9 +1233,9 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport, "CXL");
*
* See CXL 3.0 9.11.8 CXL Devices Attached to an RCH
*/
-struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
- struct device *dport_dev, int port_id,
- resource_size_t rcrb)
+struct cxl_dport *cxl_add_rch_dport(struct cxl_port *port,
+ struct device *dport_dev, int port_id,
+ resource_size_t rcrb)
{
struct cxl_dport *dport;
@@ -1267,8 +1244,8 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
return ERR_PTR(-EINVAL);
}
- dport = __devm_cxl_add_dport(port, dport_dev, port_id,
- CXL_RESOURCE_NONE, rcrb);
+ dport = __cxl_add_dport(port, dport_dev, port_id, CXL_RESOURCE_NONE,
+ rcrb);
if (IS_ERR(dport)) {
dev_dbg(dport_dev, "failed to add RCH dport to %s: %ld\n",
dev_name(&port->dev), PTR_ERR(dport));
@@ -1279,7 +1256,7 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
return dport;
}
-EXPORT_SYMBOL_NS_GPL(devm_cxl_add_rch_dport, "CXL");
+EXPORT_SYMBOL_NS_GPL(cxl_add_rch_dport, "CXL");
static int add_ep(struct cxl_ep *new)
{
@@ -1439,13 +1416,42 @@ static void delete_switch_port(struct cxl_port *port)
devm_release_action(port->dev.parent, unregister_port, port);
}
+static void unlink_dport(void *data)
+{
+ struct cxl_dport *dport = data;
+ struct cxl_port *port = dport->port;
+ char link_name[CXL_TARGET_STRLEN];
+
+ sprintf(link_name, "dport%d", dport->port_id);
+ sysfs_remove_link(&port->dev.kobj, link_name);
+ remove_dport(dport);
+ kfree(dport);
+}
+
+int cxl_dport_autoremove(struct cxl_dport *dport)
+{
+ struct cxl_port *port = dport->port;
+ struct device *host;
+
+ if (is_cxl_root(port))
+ host = port->uport_dev;
+ else
+ host = &port->dev;
+
+ return devm_add_action_or_reset(host, unlink_dport, dport);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_dport_autoremove, "CXL");
+
+/*
+ * Note: this only services dynamic removal of mid-level ports, root ports are
+ * always removed by the platform driver (e.g. cxl_acpi). @host can be
+ * hard-coded to &port->dev.
+ */
static void del_dport(struct cxl_dport *dport)
{
struct cxl_port *port = dport->port;
- devm_release_action(&port->dev, cxl_dport_unlink, dport);
- devm_release_action(&port->dev, cxl_dport_remove, dport);
- devm_kfree(&port->dev, dport);
+ devm_release_action(&port->dev, unlink_dport, dport);
}
static void del_dports(struct cxl_port *port)
@@ -1597,10 +1603,24 @@ static int update_decoder_targets(struct device *dev, void *data)
return 0;
}
-DEFINE_FREE(del_cxl_dport, struct cxl_dport *, if (!IS_ERR_OR_NULL(_T)) del_dport(_T))
+static struct cxl_port *cxl_port_devres_group(struct cxl_port *port)
+{
+ if (!devres_open_group(&port->dev, port, GFP_KERNEL))
+ return ERR_PTR(-ENOMEM);
+ return port;
+}
+DEFINE_FREE(cxl_port_group_free, struct cxl_port *,
+ if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->dev, _T))
+
+static void cxl_port_group_close(struct cxl_port *port)
+{
+ devres_remove_group(&port->dev, port);
+}
+
static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
struct device *dport_dev)
{
+ struct cxl_dport *new_dport;
struct cxl_dport *dport;
int rc;
@@ -1615,29 +1635,46 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
return ERR_PTR(-EBUSY);
}
- struct cxl_dport *new_dport __free(del_cxl_dport) =
- devm_cxl_add_dport_by_dev(port, dport_dev);
- if (IS_ERR(new_dport))
- return new_dport;
-
- cxl_switch_parse_cdat(new_dport);
+ /*
+ * With the first dport arrival it is now safe to start looking at
+ * component registers. Be careful to not strand resources if dport
+ * creation ultimately fails.
+ */
+ struct cxl_port *port_group __free(cxl_port_group_free) =
+ cxl_port_devres_group(port);
+ if (IS_ERR(port_group))
+ return ERR_CAST(port_group);
- if (ida_is_empty(&port->decoder_ida)) {
+ if (port->nr_dports == 0) {
rc = devm_cxl_switch_port_decoders_setup(port);
if (rc)
return ERR_PTR(rc);
- dev_dbg(&port->dev, "first dport%d:%s added with decoders\n",
- new_dport->port_id, dev_name(dport_dev));
- return no_free_ptr(new_dport);
+ /*
+ * Note, when nr_dports returns to zero the port is unregistered
+ * and triggers cleanup. I.e. no need for open-coded release
+ * action on dport removal. See cxl_detach_ep() for that logic.
+ */
}
+ new_dport = cxl_add_dport_by_dev(port, dport_dev);
+ if (IS_ERR(new_dport))
+ return new_dport;
+
+ rc = cxl_dport_autoremove(new_dport);
+ if (rc)
+ return ERR_PTR(rc);
+
+ cxl_switch_parse_cdat(new_dport);
+
+ cxl_port_group_close(no_free_ptr(port_group));
+
+ dev_dbg(&port->dev, "dport[%d] id:%d dport_dev: %s added\n",
+ port->nr_dports - 1, new_dport->port_id, dev_name(dport_dev));
+
/* New dport added, update the decoder targets */
device_for_each_child(&port->dev, new_dport, update_decoder_targets);
- dev_dbg(&port->dev, "dport%d:%s added\n", new_dport->port_id,
- dev_name(dport_dev));
-
- return no_free_ptr(new_dport);
+ return new_dport;
}
static struct cxl_dport *devm_cxl_create_port(struct device *ep_dev,
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 6f3741a57932..47ee06c95433 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -796,12 +796,12 @@ struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
struct cxl_dport **dport);
bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
-struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
- struct device *dport, int port_id,
- resource_size_t component_reg_phys);
-struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
- struct device *dport_dev, int port_id,
- resource_size_t rcrb);
+struct cxl_dport *cxl_add_dport(struct cxl_port *port, struct device *dport,
+ int port_id,
+ resource_size_t component_reg_phys);
+struct cxl_dport *cxl_add_rch_dport(struct cxl_port *port,
+ struct device *dport_dev, int port_id,
+ resource_size_t rcrb);
struct cxl_decoder *to_cxl_decoder(struct device *dev);
struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
@@ -824,6 +824,7 @@ static inline int cxl_root_decoder_autoremove(struct device *host,
return cxl_decoder_autoremove(host, &cxlrd->cxlsd.cxld);
}
int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
+int cxl_dport_autoremove(struct cxl_dport *dport);
/**
* struct cxl_endpoint_dvsec_info - Cached DVSEC info
@@ -937,10 +938,10 @@ void cxl_coordinates_combine(struct access_coordinate *out,
struct access_coordinate *c2);
bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
-struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev);
-struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev);
+struct cxl_dport *cxl_add_dport_by_dev(struct cxl_port *port,
+ struct device *dport_dev);
+struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
+ struct device *dport_dev);
/*
* Unit test builds overrides this to __weak, find the 'strong' version
@@ -964,7 +965,7 @@ u16 cxl_gpf_get_dvsec(struct device *dev);
*/
#ifndef CXL_TEST_ENABLE
#define DECLARE_TESTABLE(x) __##x
-#define devm_cxl_add_dport_by_dev DECLARE_TESTABLE(devm_cxl_add_dport_by_dev)
+#define cxl_add_dport_by_dev DECLARE_TESTABLE(cxl_add_dport_by_dev)
#define devm_cxl_switch_port_decoders_setup DECLARE_TESTABLE(devm_cxl_switch_port_decoders_setup)
#endif
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 51c8f2f84717..167cc0a87484 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -59,8 +59,12 @@ static int discover_region(struct device *dev, void *unused)
static int cxl_switch_port_probe(struct cxl_port *port)
{
- /* Reset nr_dports for rebind of driver */
- port->nr_dports = 0;
+ /*
+ * Unfortunately, typical driver operations like "find and map
+ * registers", can not be done at port device attach time and must wait
+ * for dport arrival. See cxl_port_add_dport() and the comments in
+ * add_dport() for details.
+ */
/* Cache the data early to ensure is_visible() works */
read_cdat_data(port);
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 6eceefefb0e0..4d740392aac5 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -5,7 +5,8 @@ ldflags-y += --wrap=acpi_evaluate_integer
ldflags-y += --wrap=acpi_pci_find_root
ldflags-y += --wrap=nvdimm_bus_register
ldflags-y += --wrap=cxl_await_media_ready
-ldflags-y += --wrap=devm_cxl_add_rch_dport
+ldflags-y += --wrap=cxl_add_rch_dport
+ldflags-y += --wrap=cxl_rcd_component_reg_phys
ldflags-y += --wrap=cxl_endpoint_parse_cdat
ldflags-y += --wrap=cxl_dport_init_ras_reporting
ldflags-y += --wrap=devm_cxl_endpoint_decoders_setup
diff --git a/tools/testing/cxl/cxl_core_exports.c b/tools/testing/cxl/cxl_core_exports.c
index 6754de35598d..02d479867a12 100644
--- a/tools/testing/cxl/cxl_core_exports.c
+++ b/tools/testing/cxl/cxl_core_exports.c
@@ -7,16 +7,15 @@
/* Exporting of cxl_core symbols that are only used by cxl_test */
EXPORT_SYMBOL_NS_GPL(cxl_num_decoders_committed, "CXL");
-cxl_add_dport_by_dev_fn _devm_cxl_add_dport_by_dev =
- __devm_cxl_add_dport_by_dev;
-EXPORT_SYMBOL_NS_GPL(_devm_cxl_add_dport_by_dev, "CXL");
+cxl_add_dport_by_dev_fn _cxl_add_dport_by_dev = __cxl_add_dport_by_dev;
+EXPORT_SYMBOL_NS_GPL(_cxl_add_dport_by_dev, "CXL");
-struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev)
+struct cxl_dport *cxl_add_dport_by_dev(struct cxl_port *port,
+ struct device *dport_dev)
{
- return _devm_cxl_add_dport_by_dev(port, dport_dev);
+ return _cxl_add_dport_by_dev(port, dport_dev);
}
-EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport_by_dev, "CXL");
+EXPORT_SYMBOL_NS_GPL(cxl_add_dport_by_dev, "CXL");
cxl_switch_decoders_setup_fn _devm_cxl_switch_port_decoders_setup =
__devm_cxl_switch_port_decoders_setup;
diff --git a/tools/testing/cxl/exports.h b/tools/testing/cxl/exports.h
index 7ebee7c0bd67..cbb16073be18 100644
--- a/tools/testing/cxl/exports.h
+++ b/tools/testing/cxl/exports.h
@@ -4,8 +4,8 @@
#define __MOCK_CXL_EXPORTS_H_
typedef struct cxl_dport *(*cxl_add_dport_by_dev_fn)(struct cxl_port *port,
- struct device *dport_dev);
-extern cxl_add_dport_by_dev_fn _devm_cxl_add_dport_by_dev;
+ struct device *dport_dev);
+extern cxl_add_dport_by_dev_fn _cxl_add_dport_by_dev;
typedef int(*cxl_switch_decoders_setup_fn)(struct cxl_port *port);
extern cxl_switch_decoders_setup_fn _devm_cxl_switch_port_decoders_setup;
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 81e2aef3627a..b7a2b550c0b0 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -1060,8 +1060,8 @@ static struct cxl_dport *mock_cxl_add_dport_by_dev(struct cxl_port *port,
if (&pdev->dev != dport_dev)
continue;
- return devm_cxl_add_dport(port, &pdev->dev, pdev->id,
- CXL_RESOURCE_NONE);
+ return cxl_add_dport(port, &pdev->dev, pdev->id,
+ CXL_RESOURCE_NONE);
}
return ERR_PTR(-ENODEV);
@@ -1126,9 +1126,9 @@ static struct cxl_mock_ops cxl_mock_ops = {
.devm_cxl_switch_port_decoders_setup = mock_cxl_switch_port_decoders_setup,
.devm_cxl_endpoint_decoders_setup = mock_cxl_endpoint_decoders_setup,
.cxl_endpoint_parse_cdat = mock_cxl_endpoint_parse_cdat,
- .devm_cxl_add_dport_by_dev = mock_cxl_add_dport_by_dev,
.hmat_get_extended_linear_cache_size =
mock_hmat_get_extended_linear_cache_size,
+ .cxl_add_dport_by_dev = mock_cxl_add_dport_by_dev,
.list = LIST_HEAD_INIT(cxl_mock_ops.list),
};
diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index 44bce80ef3ff..660e8402189c 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -15,14 +15,13 @@
static LIST_HEAD(mock);
static struct cxl_dport *
-redirect_devm_cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev);
+redirect_cxl_add_dport_by_dev(struct cxl_port *port, struct device *dport_dev);
static int redirect_devm_cxl_switch_port_decoders_setup(struct cxl_port *port);
void register_cxl_mock_ops(struct cxl_mock_ops *ops)
{
list_add_rcu(&ops->list, &mock);
- _devm_cxl_add_dport_by_dev = redirect_devm_cxl_add_dport_by_dev;
+ _cxl_add_dport_by_dev = redirect_cxl_add_dport_by_dev;
_devm_cxl_switch_port_decoders_setup =
redirect_devm_cxl_switch_port_decoders_setup;
}
@@ -34,7 +33,7 @@ void unregister_cxl_mock_ops(struct cxl_mock_ops *ops)
{
_devm_cxl_switch_port_decoders_setup =
__devm_cxl_switch_port_decoders_setup;
- _devm_cxl_add_dport_by_dev = __devm_cxl_add_dport_by_dev;
+ _cxl_add_dport_by_dev = __cxl_add_dport_by_dev;
list_del_rcu(&ops->list);
synchronize_srcu(&cxl_mock_srcu);
}
@@ -207,7 +206,7 @@ int __wrap_cxl_await_media_ready(struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(__wrap_cxl_await_media_ready, "CXL");
-struct cxl_dport *__wrap_devm_cxl_add_rch_dport(struct cxl_port *port,
+struct cxl_dport *__wrap_cxl_add_rch_dport(struct cxl_port *port,
struct device *dport_dev,
int port_id,
resource_size_t rcrb)
@@ -217,19 +216,19 @@ struct cxl_dport *__wrap_devm_cxl_add_rch_dport(struct cxl_port *port,
struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
if (ops && ops->is_mock_port(dport_dev)) {
- dport = devm_cxl_add_dport(port, dport_dev, port_id,
- CXL_RESOURCE_NONE);
+ dport = cxl_add_dport(port, dport_dev, port_id,
+ CXL_RESOURCE_NONE);
if (!IS_ERR(dport)) {
dport->rcrb.base = rcrb;
dport->rch = true;
}
} else
- dport = devm_cxl_add_rch_dport(port, dport_dev, port_id, rcrb);
+ dport = cxl_add_rch_dport(port, dport_dev, port_id, rcrb);
put_cxl_mock_ops(index);
return dport;
}
-EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_add_rch_dport, "CXL");
+EXPORT_SYMBOL_NS_GPL(__wrap_cxl_add_rch_dport, "CXL");
void __wrap_cxl_endpoint_parse_cdat(struct cxl_port *port)
{
@@ -257,17 +256,17 @@ void __wrap_cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device
}
EXPORT_SYMBOL_NS_GPL(__wrap_cxl_dport_init_ras_reporting, "CXL");
-struct cxl_dport *redirect_devm_cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev)
+struct cxl_dport *redirect_cxl_add_dport_by_dev(struct cxl_port *port,
+ struct device *dport_dev)
{
int index;
struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
struct cxl_dport *dport;
if (ops && ops->is_mock_port(port->uport_dev))
- dport = ops->devm_cxl_add_dport_by_dev(port, dport_dev);
+ dport = ops->cxl_add_dport_by_dev(port, dport_dev);
else
- dport = __devm_cxl_add_dport_by_dev(port, dport_dev);
+ dport = __cxl_add_dport_by_dev(port, dport_dev);
put_cxl_mock_ops(index);
return dport;
diff --git a/tools/testing/cxl/test/mock.h b/tools/testing/cxl/test/mock.h
index 2684b89c8aa2..fa13aca4e260 100644
--- a/tools/testing/cxl/test/mock.h
+++ b/tools/testing/cxl/test/mock.h
@@ -22,8 +22,8 @@ struct cxl_mock_ops {
int (*devm_cxl_switch_port_decoders_setup)(struct cxl_port *port);
int (*devm_cxl_endpoint_decoders_setup)(struct cxl_port *port);
void (*cxl_endpoint_parse_cdat)(struct cxl_port *port);
- struct cxl_dport *(*devm_cxl_add_dport_by_dev)(struct cxl_port *port,
- struct device *dport_dev);
+ struct cxl_dport *(*cxl_add_dport_by_dev)(struct cxl_port *port,
+ struct device *dport_dev);
int (*hmat_get_extended_linear_cache_size)(struct resource *backing_res,
int nid,
resource_size_t *cache_size);
--
2.34.1
* [PATCH v14 20/34] cxl/port: Move dport operations to a driver event
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (18 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 21:45 ` Dave Jiang
2026-01-15 14:56 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 21/34] cxl/port: Move dport RAS reporting to a port resource Terry Bowman
` (13 subsequent siblings)
33 siblings, 2 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
From: Dan Williams <dan.j.williams@intel.com>
In preparation for adding more register setup to the cxl_port_add_dport()
path (for RAS register mapping), move the dport creation event to a driver
callback. This achieves 2 things: it puts driver operations logically where
they belong, in a driver, and it obviates the gymnastics of
DECLARE_TESTABLE(), which just makes a mess of grepping for CXL symbols.
In other words, a driver callback is less of an ongoing maintenance burden
than this DECLARE_TESTABLE arrangement that does not scale and diminishes
the grep-ability of the codebase.
cxl_port_add_dport() moves mostly unmodified from drivers/cxl/core/port.c
to drivers/cxl/port.c. The only deliberate change is that it now assumes
that the device_lock is held on entry and the driver is attached (just like
cxl_port_probe()).
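
For readers unfamiliar with the pattern, the dispatch described above can be
sketched in userspace C roughly as follows. This is only an illustration under
stated assumptions, not the kernel code: struct port, struct bus_driver and
demo_add_dport() are made-up stand-ins, and errors are returned as plain
negative ints rather than ERR_PTR() values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() helper */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-ins for the driver-model types (hypothetical) */
struct device_driver { const char *name; };
struct device { struct device_driver *driver; };

struct port {
	struct device dev;
	int nr_dports;
};

/* Hypothetical bus-level driver wrapper with an optional add_dport hook */
struct bus_driver {
	int (*add_dport)(struct port *port, int dport_id);
	struct device_driver drv;
};

/*
 * Core-side dispatch: fail if no driver is attached or if the attached
 * driver does not implement the hook, otherwise hand off to the driver.
 */
static int probe_dport(struct port *port, int dport_id)
{
	struct bus_driver *drv;

	if (!port->dev.driver)
		return -ENXIO;

	drv = container_of(port->dev.driver, struct bus_driver, drv);
	if (!drv->add_dport)
		return -ENXIO;

	return drv->add_dport(port, dport_id);
}

/* Driver-side hook (hypothetical): count the dport and echo its id */
static int demo_add_dport(struct port *port, int dport_id)
{
	port->nr_dports++;
	return dport_id;
}
```

The point of the arrangement is that the core only knows about the hook, so a
test module can supply its own driver or hook without symbol renaming tricks.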
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13 -> v14:
- New patch
---
drivers/cxl/core/hdm.c | 6 +--
drivers/cxl/core/pci.c | 8 +--
drivers/cxl/core/port.c | 79 ++++++----------------------
drivers/cxl/cxl.h | 23 ++------
drivers/cxl/port.c | 71 +++++++++++++++++++++++++
tools/testing/cxl/Kbuild | 2 +
tools/testing/cxl/cxl_core_exports.c | 21 --------
tools/testing/cxl/exports.h | 13 -----
tools/testing/cxl/test/mock.c | 23 +++-----
9 files changed, 107 insertions(+), 139 deletions(-)
delete mode 100644 tools/testing/cxl/exports.h
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 1c5d2022c87a..365b02b7a241 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -1219,12 +1219,12 @@ static int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm,
}
/**
- * __devm_cxl_switch_port_decoders_setup - allocate and setup switch decoders
+ * devm_cxl_switch_port_decoders_setup - allocate and setup switch decoders
* @port: CXL port context
*
* Return 0 or -errno on error
*/
-int __devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
+int devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
{
struct cxl_hdm *cxlhdm;
@@ -1248,7 +1248,7 @@ int __devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
dev_err(&port->dev, "HDM decoder capability not found\n");
return -ENXIO;
}
-EXPORT_SYMBOL_NS_GPL(__devm_cxl_switch_port_decoders_setup, "CXL");
+EXPORT_SYMBOL_NS_GPL(devm_cxl_switch_port_decoders_setup, "CXL");
/**
* devm_cxl_endpoint_decoders_setup - allocate and setup endpoint decoders
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 512a3e29a095..8633bfdef38d 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -41,14 +41,14 @@ static int pci_get_port_num(struct pci_dev *pdev)
}
/**
- * __cxl_add_dport_by_dev - allocate a dport by dport device
+ * cxl_add_dport_by_dev - allocate a dport by dport device
* @port: cxl_port that hosts the dport
* @dport_dev: 'struct device' of the dport
*
* Returns the allocated dport on success or ERR_PTR() of -errno on error
*/
-struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev)
+struct cxl_dport *cxl_add_dport_by_dev(struct cxl_port *port,
+ struct device *dport_dev)
{
struct cxl_register_map map;
struct pci_dev *pdev;
@@ -69,7 +69,7 @@ struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
device_lock_assert(&port->dev);
return cxl_add_dport(port, dport_dev, port_num, map.resource);
}
-EXPORT_SYMBOL_NS_GPL(__cxl_add_dport_by_dev, "CXL");
+EXPORT_SYMBOL_NS_GPL(cxl_add_dport_by_dev, "CXL");
struct cxl_walk_context {
struct pci_bus *bus;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index a05a1812bb6e..2184c20af011 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1603,78 +1603,31 @@ static int update_decoder_targets(struct device *dev, void *data)
return 0;
}
-static struct cxl_port *cxl_port_devres_group(struct cxl_port *port)
+void cxl_port_update_decoder_targets(struct cxl_port *port,
+ struct cxl_dport *dport)
{
- if (!devres_open_group(&port->dev, port, GFP_KERNEL))
- return ERR_PTR(-ENOMEM);
- return port;
+ device_for_each_child(&port->dev, dport, update_decoder_targets);
}
+EXPORT_SYMBOL_NS_GPL(cxl_port_update_decoder_targets, "CXL");
+
DEFINE_FREE(cxl_port_group_free, struct cxl_port *,
if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->dev, _T))
-static void cxl_port_group_close(struct cxl_port *port)
-{
- devres_remove_group(&port->dev, port);
-}
-
-static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
- struct device *dport_dev)
+static struct cxl_dport *probe_dport(struct cxl_port *port,
+ struct device *dport_dev)
{
- struct cxl_dport *new_dport;
- struct cxl_dport *dport;
- int rc;
+ struct cxl_driver *drv;
device_lock_assert(&port->dev);
if (!port->dev.driver)
return ERR_PTR(-ENXIO);
- dport = cxl_find_dport_by_dev(port, dport_dev);
- if (dport) {
- dev_dbg(&port->dev, "dport%d:%s already exists\n",
- dport->port_id, dev_name(dport_dev));
- return ERR_PTR(-EBUSY);
- }
-
- /*
- * With the first dport arrival it is now safe to start looking at
- * component registers. Be careful to not strand resources if dport
- * creation ultimately fails.
- */
- struct cxl_port *port_group __free(cxl_port_group_free) =
- cxl_port_devres_group(port);
- if (IS_ERR(port_group))
- return ERR_CAST(port_group);
-
- if (port->nr_dports == 0) {
- rc = devm_cxl_switch_port_decoders_setup(port);
- if (rc)
- return ERR_PTR(rc);
- /*
- * Note, when nr_dports returns to zero the port is unregistered
- * and triggers cleanup. I.e. no need for open-coded release
- * action on dport removal. See cxl_detach_ep() for that logic.
- */
- }
-
- new_dport = cxl_add_dport_by_dev(port, dport_dev);
- if (IS_ERR(new_dport))
- return new_dport;
-
- rc = cxl_dport_autoremove(new_dport);
- if (rc)
- return ERR_PTR(rc);
-
- cxl_switch_parse_cdat(new_dport);
-
- cxl_port_group_close(no_free_ptr(port_group));
-
- dev_dbg(&port->dev, "dport[%d] id:%d dport_dev: %s added\n",
- port->nr_dports - 1, new_dport->port_id, dev_name(dport_dev));
-
- /* New dport added, update the decoder targets */
- device_for_each_child(&port->dev, new_dport, update_decoder_targets);
+ drv = container_of(port->dev.driver, struct cxl_driver, drv);
+ if (!drv->add_dport)
+ return ERR_PTR(-ENXIO);
- return new_dport;
+ /* see cxl_port_add_dport() */
+ return drv->add_dport(port, dport_dev);
}
static struct cxl_dport *devm_cxl_create_port(struct device *ep_dev,
@@ -1721,7 +1674,7 @@ static struct cxl_dport *devm_cxl_create_port(struct device *ep_dev,
}
guard(device)(&port->dev);
- return cxl_port_add_dport(port, dport_dev);
+ return probe_dport(port, dport_dev);
}
static int add_port_attach_ep(struct cxl_memdev *cxlmd,
@@ -1753,7 +1706,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
scoped_guard(device, &parent_port->dev) {
parent_dport = cxl_find_dport_by_dev(parent_port, dparent);
if (!parent_dport) {
- parent_dport = cxl_port_add_dport(parent_port, dparent);
+ parent_dport = probe_dport(parent_port, dparent);
if (IS_ERR(parent_dport))
return PTR_ERR(parent_dport);
}
@@ -1789,7 +1742,7 @@ static struct cxl_dport *find_or_add_dport(struct cxl_port *port,
device_lock_assert(&port->dev);
dport = cxl_find_dport_by_dev(port, dport_dev);
if (!dport) {
- dport = cxl_port_add_dport(port, dport_dev);
+ dport = probe_dport(port, dport_dev);
if (IS_ERR(dport))
return dport;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 47ee06c95433..46491046f101 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -841,8 +841,9 @@ struct cxl_endpoint_dvsec_info {
};
int devm_cxl_switch_port_decoders_setup(struct cxl_port *port);
-int __devm_cxl_switch_port_decoders_setup(struct cxl_port *port);
int devm_cxl_endpoint_decoders_setup(struct cxl_port *port);
+void cxl_port_update_decoder_targets(struct cxl_port *port,
+ struct cxl_dport *dport);
struct cxl_dev_state;
int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
@@ -856,6 +857,8 @@ struct cxl_driver {
const char *name;
int (*probe)(struct device *dev);
void (*remove)(struct device *dev);
+ struct cxl_dport *(*add_dport)(struct cxl_port *port,
+ struct device *dport_dev);
struct device_driver drv;
int id;
};
@@ -940,8 +943,6 @@ void cxl_coordinates_combine(struct access_coordinate *out,
bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
struct cxl_dport *cxl_add_dport_by_dev(struct cxl_port *port,
struct device *dport_dev);
-struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev);
/*
* Unit test builds overrides this to __weak, find the 'strong' version
@@ -953,20 +954,4 @@ struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
u16 cxl_gpf_get_dvsec(struct device *dev);
-/*
- * Declaration for functions that are mocked by cxl_test that are called by
- * cxl_core. The respective functions are defined as __foo() and called by
- * cxl_core as foo(). The macros below ensures that those functions would
- * exist as foo(). See tools/testing/cxl/cxl_core_exports.c and
- * tools/testing/cxl/exports.h for setting up the mock functions. The dance
- * is done to avoid a circular dependency where cxl_core calls a function that
- * ends up being a mock function and goes to * cxl_test where it calls a
- * cxl_core function.
- */
-#ifndef CXL_TEST_ENABLE
-#define DECLARE_TESTABLE(x) __##x
-#define cxl_add_dport_by_dev DECLARE_TESTABLE(cxl_add_dport_by_dev)
-#define devm_cxl_switch_port_decoders_setup DECLARE_TESTABLE(devm_cxl_switch_port_decoders_setup)
-#endif
-
#endif /* __CXL_H__ */
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 167cc0a87484..2770bc8520d3 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -155,9 +155,80 @@ static const struct attribute_group *cxl_port_attribute_groups[] = {
NULL,
};
+static struct cxl_port *cxl_port_devres_group(struct cxl_port *port)
+{
+ if (!devres_open_group(&port->dev, port, GFP_KERNEL))
+ return ERR_PTR(-ENOMEM);
+ return port;
+}
+DEFINE_FREE(cxl_port_group_free, struct cxl_port *,
+ if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->dev, _T))
+
+static void cxl_port_group_close(struct cxl_port *port)
+{
+ devres_remove_group(&port->dev, port);
+}
+
+static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
+ struct device *dport_dev)
+{
+ struct cxl_dport *new_dport;
+ struct cxl_dport *dport;
+ int rc;
+
+ dport = cxl_find_dport_by_dev(port, dport_dev);
+ if (dport) {
+ dev_dbg(&port->dev, "dport%d:%s already exists\n",
+ dport->port_id, dev_name(dport_dev));
+ return ERR_PTR(-EBUSY);
+ }
+
+ /*
+ * With the first dport arrival it is now safe to start looking at
+ * component registers. Be careful to not strand resources if dport
+ * creation ultimately fails.
+ */
+ struct cxl_port *port_group __free(cxl_port_group_free) =
+ cxl_port_devres_group(port);
+ if (IS_ERR(port_group))
+ return ERR_CAST(port_group);
+
+ if (port->nr_dports == 0) {
+ rc = devm_cxl_switch_port_decoders_setup(port);
+ if (rc)
+ return ERR_PTR(rc);
+ /*
+ * Note, when nr_dports returns to zero the port is unregistered
+ * and triggers cleanup. I.e. no need for open-coded release
+ * action on dport removal. See cxl_detach_ep() for that logic.
+ */
+ }
+
+ new_dport = cxl_add_dport_by_dev(port, dport_dev);
+ if (IS_ERR(new_dport))
+ return new_dport;
+
+ rc = cxl_dport_autoremove(new_dport);
+ if (rc)
+ return ERR_PTR(rc);
+
+ cxl_switch_parse_cdat(new_dport);
+
+ cxl_port_group_close(no_free_ptr(port_group));
+
+ dev_dbg(&port->dev, "dport[%d] id:%d dport_dev: %s added\n",
+ port->nr_dports - 1, new_dport->port_id, dev_name(dport_dev));
+
+ /* New dport added, update the decoder targets */
+ cxl_port_update_decoder_targets(port, new_dport);
+
+ return new_dport;
+}
+
static struct cxl_driver cxl_port_driver = {
.name = "cxl_port",
.probe = cxl_port_probe,
+ .add_dport = cxl_port_add_dport,
.id = CXL_DEVICE_PORT,
.drv = {
.dev_groups = cxl_port_attribute_groups,
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 4d740392aac5..25516728535e 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -11,6 +11,8 @@ ldflags-y += --wrap=cxl_endpoint_parse_cdat
ldflags-y += --wrap=cxl_dport_init_ras_reporting
ldflags-y += --wrap=devm_cxl_endpoint_decoders_setup
ldflags-y += --wrap=hmat_get_extended_linear_cache_size
+ldflags-y += --wrap=cxl_add_dport_by_dev
+ldflags-y += --wrap=devm_cxl_switch_port_decoders_setup
DRIVERS := ../../../drivers
CXL_SRC := $(DRIVERS)/cxl
diff --git a/tools/testing/cxl/cxl_core_exports.c b/tools/testing/cxl/cxl_core_exports.c
index 02d479867a12..f088792a8925 100644
--- a/tools/testing/cxl/cxl_core_exports.c
+++ b/tools/testing/cxl/cxl_core_exports.c
@@ -2,27 +2,6 @@
/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
#include "cxl.h"
-#include "exports.h"
/* Exporting of cxl_core symbols that are only used by cxl_test */
EXPORT_SYMBOL_NS_GPL(cxl_num_decoders_committed, "CXL");
-
-cxl_add_dport_by_dev_fn _cxl_add_dport_by_dev = __cxl_add_dport_by_dev;
-EXPORT_SYMBOL_NS_GPL(_cxl_add_dport_by_dev, "CXL");
-
-struct cxl_dport *cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev)
-{
- return _cxl_add_dport_by_dev(port, dport_dev);
-}
-EXPORT_SYMBOL_NS_GPL(cxl_add_dport_by_dev, "CXL");
-
-cxl_switch_decoders_setup_fn _devm_cxl_switch_port_decoders_setup =
- __devm_cxl_switch_port_decoders_setup;
-EXPORT_SYMBOL_NS_GPL(_devm_cxl_switch_port_decoders_setup, "CXL");
-
-int devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
-{
- return _devm_cxl_switch_port_decoders_setup(port);
-}
-EXPORT_SYMBOL_NS_GPL(devm_cxl_switch_port_decoders_setup, "CXL");
diff --git a/tools/testing/cxl/exports.h b/tools/testing/cxl/exports.h
deleted file mode 100644
index cbb16073be18..000000000000
--- a/tools/testing/cxl/exports.h
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/* Copyright(c) 2025 Intel Corporation */
-#ifndef __MOCK_CXL_EXPORTS_H_
-#define __MOCK_CXL_EXPORTS_H_
-
-typedef struct cxl_dport *(*cxl_add_dport_by_dev_fn)(struct cxl_port *port,
- struct device *dport_dev);
-extern cxl_add_dport_by_dev_fn _cxl_add_dport_by_dev;
-
-typedef int(*cxl_switch_decoders_setup_fn)(struct cxl_port *port);
-extern cxl_switch_decoders_setup_fn _devm_cxl_switch_port_decoders_setup;
-
-#endif
diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index 660e8402189c..10140a4c5fac 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -10,20 +10,12 @@
#include <cxlmem.h>
#include <cxlpci.h>
#include "mock.h"
-#include "../exports.h"
static LIST_HEAD(mock);
-static struct cxl_dport *
-redirect_cxl_add_dport_by_dev(struct cxl_port *port, struct device *dport_dev);
-static int redirect_devm_cxl_switch_port_decoders_setup(struct cxl_port *port);
-
void register_cxl_mock_ops(struct cxl_mock_ops *ops)
{
list_add_rcu(&ops->list, &mock);
- _cxl_add_dport_by_dev = redirect_cxl_add_dport_by_dev;
- _devm_cxl_switch_port_decoders_setup =
- redirect_devm_cxl_switch_port_decoders_setup;
}
EXPORT_SYMBOL_GPL(register_cxl_mock_ops);
@@ -31,9 +23,6 @@ DEFINE_STATIC_SRCU(cxl_mock_srcu);
void unregister_cxl_mock_ops(struct cxl_mock_ops *ops)
{
- _devm_cxl_switch_port_decoders_setup =
- __devm_cxl_switch_port_decoders_setup;
- _cxl_add_dport_by_dev = __cxl_add_dport_by_dev;
list_del_rcu(&ops->list);
synchronize_srcu(&cxl_mock_srcu);
}
@@ -162,7 +151,7 @@ __wrap_nvdimm_bus_register(struct device *dev,
}
EXPORT_SYMBOL_GPL(__wrap_nvdimm_bus_register);
-int redirect_devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
+int __wrap_devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
{
int rc, index;
struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
@@ -170,11 +159,12 @@ int redirect_devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
if (ops && ops->is_mock_port(port->uport_dev))
rc = ops->devm_cxl_switch_port_decoders_setup(port);
else
- rc = __devm_cxl_switch_port_decoders_setup(port);
+ rc = devm_cxl_switch_port_decoders_setup(port);
put_cxl_mock_ops(index);
return rc;
}
+EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_switch_port_decoders_setup, "CXL");
int __wrap_devm_cxl_endpoint_decoders_setup(struct cxl_port *port)
{
@@ -256,8 +246,8 @@ void __wrap_cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device
}
EXPORT_SYMBOL_NS_GPL(__wrap_cxl_dport_init_ras_reporting, "CXL");
-struct cxl_dport *redirect_cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev)
+struct cxl_dport *__wrap_cxl_add_dport_by_dev(struct cxl_port *port,
+ struct device *dport_dev)
{
int index;
struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
@@ -266,11 +256,12 @@ struct cxl_dport *redirect_cxl_add_dport_by_dev(struct cxl_port *port,
if (ops && ops->is_mock_port(port->uport_dev))
dport = ops->cxl_add_dport_by_dev(port, dport_dev);
else
- dport = __cxl_add_dport_by_dev(port, dport_dev);
+ dport = cxl_add_dport_by_dev(port, dport_dev);
put_cxl_mock_ops(index);
return dport;
}
+EXPORT_SYMBOL_NS_GPL(__wrap_cxl_add_dport_by_dev, "CXL");
MODULE_LICENSE("GPL v2");
MODULE_DESCRIPTION("cxl_test: emulation module");
--
2.34.1
* [PATCH v14 21/34] cxl/port: Move dport RAS reporting to a port resource
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (19 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 20/34] cxl/port: Move dport operations to a driver event Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 21:47 ` Dave Jiang
2026-01-15 15:02 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 22/34] cxl: Update CXL Endpoint tracing Terry Bowman
` (12 subsequent siblings)
33 siblings, 2 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
From: Dan Williams <dan.j.williams@intel.com>
Towards the end goal of making all CXL RAS capability handling uniform
across upstream host bridges, upstream switch ports, and upstream endpoint
ports, move dport RAS setup to cxl_endpoint_port_probe(). Rename the RAS
setup helper to devm_cxl_dport_ras_setup() for symmetry with
devm_cxl_switch_port_decoders_setup().
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13 -> v14:
- New patch
---
drivers/cxl/core/ras.c | 12 ++++++------
drivers/cxl/cxlpci.h | 8 ++++----
drivers/cxl/mem.c | 2 --
drivers/cxl/port.c | 12 ++++++++++++
tools/testing/cxl/Kbuild | 2 +-
tools/testing/cxl/test/mock.c | 6 +++---
6 files changed, 26 insertions(+), 16 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 72908f3ced77..d71fcac31cf2 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -139,17 +139,17 @@ static void cxl_dport_map_ras(struct cxl_dport *dport)
}
/**
- * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
+ * devm_cxl_dport_ras_setup - Setup CXL RAS report on this dport
* @dport: the cxl_dport that needs to be initialized
- * @host: host device for devm operations
*/
-void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
+void devm_cxl_dport_ras_setup(struct cxl_dport *dport)
{
- dport->reg_map.host = host;
+ dport->reg_map.host = &dport->port->dev;
cxl_dport_map_ras(dport);
if (dport->rch) {
- struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport->dport_dev);
+ struct pci_host_bridge *host_bridge =
+ to_pci_host_bridge(dport->dport_dev);
if (!host_bridge->native_aer)
return;
@@ -158,7 +158,7 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
cxl_disable_rch_root_ints(dport);
}
}
-EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
+EXPORT_SYMBOL_NS_GPL(devm_cxl_dport_ras_setup, "CXL");
void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
{
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 6f9c78886fd9..e41bb93d583a 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -81,7 +81,7 @@ void read_cdat_data(struct cxl_port *port);
void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
-void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
+void devm_cxl_dport_ras_setup(struct cxl_dport *dport);
#else
static inline void cxl_cor_error_detected(struct pci_dev *pdev) { }
@@ -90,9 +90,9 @@ static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
{
return PCI_ERS_RESULT_NONE;
}
-
-static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
- struct device *host) { }
+static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport)
+{
+}
#endif
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index c2ee7f7f6320..e25c33f8c6cf 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -166,8 +166,6 @@ static int cxl_mem_probe(struct device *dev)
else
endpoint_parent = &parent_port->dev;
- cxl_dport_init_ras_reporting(dport, dev);
-
scoped_guard(device, endpoint_parent) {
if (!endpoint_parent->driver) {
dev_err(dev, "CXL port topology %s not enabled\n",
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 2770bc8520d3..8f8fc98c1428 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -75,6 +75,7 @@ static int cxl_switch_port_probe(struct cxl_port *port)
static int cxl_endpoint_port_probe(struct cxl_port *port)
{
struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
+ struct cxl_dport *dport = port->parent_dport;
int rc;
/* Cache the data early to ensure is_visible() works */
@@ -90,6 +91,17 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
if (rc)
return rc;
+ /*
+ * With VH (CXL Virtual Host) topology the cxl_port::add_dport() method
+ * handles RAS setup for downstream ports. With RCH (CXL Restricted CXL
+ * Host) topologies the downstream port is enumerated early by platform
+ * firmware, but the RCRB (root complex register block) is not mapped
+ * until after the cxl_pci driver attaches to the RCIeP (root complex
+ * integrated endpoint).
+ */
+ if (dport->rch)
+ devm_cxl_dport_ras_setup(dport);
+
/*
* Now that all endpoint decoders are successfully enumerated, try to
* assemble regions from committed decoders
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 25516728535e..7250bedf0448 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -8,7 +8,7 @@ ldflags-y += --wrap=cxl_await_media_ready
ldflags-y += --wrap=cxl_add_rch_dport
ldflags-y += --wrap=cxl_rcd_component_reg_phys
ldflags-y += --wrap=cxl_endpoint_parse_cdat
-ldflags-y += --wrap=cxl_dport_init_ras_reporting
+ldflags-y += --wrap=devm_cxl_dport_ras_setup
ldflags-y += --wrap=devm_cxl_endpoint_decoders_setup
ldflags-y += --wrap=hmat_get_extended_linear_cache_size
ldflags-y += --wrap=cxl_add_dport_by_dev
diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index 10140a4c5fac..8883357ee50d 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -234,17 +234,17 @@ void __wrap_cxl_endpoint_parse_cdat(struct cxl_port *port)
}
EXPORT_SYMBOL_NS_GPL(__wrap_cxl_endpoint_parse_cdat, "CXL");
-void __wrap_cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
+void __wrap_devm_cxl_dport_ras_setup(struct cxl_dport *dport)
{
int index;
struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
if (!ops || !ops->is_mock_port(dport->dport_dev))
- cxl_dport_init_ras_reporting(dport, host);
+ devm_cxl_dport_ras_setup(dport);
put_cxl_mock_ops(index);
}
-EXPORT_SYMBOL_NS_GPL(__wrap_cxl_dport_init_ras_reporting, "CXL");
+EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_dport_ras_setup, "CXL");
struct cxl_dport *__wrap_cxl_add_dport_by_dev(struct cxl_port *port,
struct device *dport_dev)
--
2.34.1
* [PATCH v14 22/34] cxl: Update CXL Endpoint tracing
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (20 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 21/34] cxl/port: Move dport RAS reporting to a port resource Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 18:20 ` [PATCH v14 23/34] cxl: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
` (11 subsequent siblings)
33 siblings, 0 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
CXL protocol error handling will soon be expanded to include CXL Port
support along with the existing Endpoint support. Two updates are needed first:
- Update calling interfaces to use 'struct device*'
- Log serial number
Add serial number parameter to the trace logging. This is used for EPs
and 0 is provided for CXL port devices without a serial number.
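The interface change described above can be sketched in userspace C. This is
only an illustration under stated assumptions, not the kernel tracepoint:
'struct model_dev', model_log_cor_error() and the output format are
hypothetical stand-ins. It shows the caller passing a generic device plus an
explicit serial number, with 0 supplied for a port that has none.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct device: a name and a parent. */
struct model_dev {
	const char *name;
	const struct model_dev *parent;
};

/*
 * Format one correctable-error record into @buf. The logger no longer
 * derives the serial number from a memdev; the caller passes it in, and
 * passes 0 for devices (e.g. ports) that have no serial number.
 * Returns the snprintf() result.
 */
static int model_log_cor_error(char *buf, size_t len,
			       const struct model_dev *dev,
			       uint32_t status, uint64_t serial)
{
	const char *host = dev->parent ? dev->parent->name : "(none)";

	return snprintf(buf, len, "dev=%s host=%s serial=%llu status=0x%08x",
			dev->name, host, (unsigned long long)serial,
			(unsigned int)status);
}
```

An endpoint caller would pass its own serial number in the same position, so
one logging routine can serve both device types.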
Leave the correctable and uncorrectable trace routines' TP_STRUCT__entry()
unchanged with respect to member data types and order.
Below is example output of correctable and uncorrectable protocol error
logging for a CXL Root Port and a CXL Endpoint.
Root Port:
cxl_aer_correctable_error: device=0000:0c:00.0 host=pci0000:0c serial: 0 status='CRC Threshold Hit'
cxl_aer_uncorrectable_error: device=0000:0c:00.0 host=pci0000:0c serial: 0 status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
Endpoint:
cxl_aer_correctable_error: memdev=mem3 host=0000:0f:00.0 serial=0 status='CRC Threshold Hit'
cxl_aer_uncorrectable_error: memdev=mem3 host=0000:0f:00.0 serial: 0 status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
Changes in v13->v14:
- Update commit headline (Bjorn)
Changes in v12->v13:
- Added Dave Jiang's review-by
Changes in v11 -> v12:
- Correct parameters to call trace_cxl_aer_correctable_error()
- Add reviewed-by for Jonathan and Shiju
Changes in v10->v11:
- Updated CE and UCE trace routines to maintain consistent TP_Struct ABI
and unchanged TP_printk() logging.
---
drivers/cxl/core/core.h | 4 ++--
drivers/cxl/core/ras.c | 35 ++++++++++++++++++++---------------
drivers/cxl/core/ras_rch.c | 4 ++--
drivers/cxl/core/trace.h | 25 +++++++++++++------------
4 files changed, 37 insertions(+), 31 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 422531799af2..306762a15dc0 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -147,8 +147,8 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
#ifdef CONFIG_CXL_RAS
int cxl_ras_init(void);
void cxl_ras_exit(void);
-bool cxl_handle_ras(struct device *dev, void __iomem *ras_base);
-void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base);
+bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);
+void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base);
void cxl_dport_map_rch_aer(struct cxl_dport *dport);
void cxl_disable_rch_root_ints(struct cxl_dport *dport);
void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index d71fcac31cf2..84abcf90fa99 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -13,7 +13,7 @@ static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,
{
u32 status = ras_cap.cor_status & ~ras_cap.cor_mask;
- trace_cxl_port_aer_correctable_error(&pdev->dev, status);
+ trace_cxl_aer_correctable_error(&pdev->dev, status, 0);
}
static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,
@@ -28,8 +28,8 @@ static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,
else
fe = status;
- trace_cxl_port_aer_uncorrectable_error(&pdev->dev, status, fe,
- ras_cap.header_log);
+ trace_cxl_aer_uncorrectable_error(&pdev->dev, status, fe,
+ ras_cap.header_log, 0);
}
static void cxl_cper_trace_corr_prot_err(struct cxl_memdev *cxlmd,
@@ -37,7 +37,7 @@ static void cxl_cper_trace_corr_prot_err(struct cxl_memdev *cxlmd,
{
u32 status = ras_cap.cor_status & ~ras_cap.cor_mask;
- trace_cxl_aer_correctable_error(cxlmd, status);
+ trace_cxl_aer_correctable_error(&cxlmd->dev, status, cxlmd->cxlds->serial);
}
static void
@@ -45,6 +45,7 @@ cxl_cper_trace_uncorr_prot_err(struct cxl_memdev *cxlmd,
struct cxl_ras_capability_regs ras_cap)
{
u32 status = ras_cap.uncor_status & ~ras_cap.uncor_mask;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
u32 fe;
if (hweight32(status) > 1)
@@ -53,8 +54,9 @@ cxl_cper_trace_uncorr_prot_err(struct cxl_memdev *cxlmd,
else
fe = status;
- trace_cxl_aer_uncorrectable_error(cxlmd, status, fe,
- ras_cap.header_log);
+ trace_cxl_aer_uncorrectable_error(&cxlmd->dev, status, fe,
+ ras_cap.header_log,
+ cxlds->serial);
}
static int match_memdev_by_parent(struct device *dev, const void *uport)
@@ -160,7 +162,7 @@ void devm_cxl_dport_ras_setup(struct cxl_dport *dport)
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_dport_ras_setup, "CXL");
-void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
+void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
{
void __iomem *addr;
u32 status;
@@ -170,10 +172,11 @@ void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
status = readl(addr);
- if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
- writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
- trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);
- }
+ if (!(status & CXL_RAS_CORRECTABLE_STATUS_MASK))
+ return;
+ writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
+
+ trace_cxl_aer_correctable_error(dev, status, serial);
}
/* CXL spec rev3.0 8.2.4.16.1 */
@@ -197,7 +200,7 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
* Log the state of the RAS status registers and prepare them to log the
* next error status. Return 1 if reset needed.
*/
-bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
+bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
{
u32 hl[CXL_HEADERLOG_SIZE_U32];
void __iomem *addr;
@@ -224,7 +227,7 @@ bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
}
header_log_copy(ras_base, hl);
- trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);
+ trace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);
writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
return true;
@@ -246,7 +249,8 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
if (cxlds->rcd)
cxl_handle_rdport_errors(cxlds);
- cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->regs.ras);
+ cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->serial,
+ cxlds->regs.ras);
}
}
EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
@@ -275,7 +279,8 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
* chance the situation is recoverable dump the status of the RAS
* capability registers and bounce the active state of the memdev.
*/
- ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->regs.ras);
+ ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->serial,
+ cxlds->regs.ras);
}
diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c
index 0a8b3b9b6388..3e33374e07f2 100644
--- a/drivers/cxl/core/ras_rch.c
+++ b/drivers/cxl/core/ras_rch.c
@@ -115,7 +115,7 @@ void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
pci_print_aer(pdev, severity, &aer_regs);
if (severity == AER_CORRECTABLE)
- cxl_handle_cor_ras(&cxlds->cxlmd->dev, dport->regs.ras);
+ cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->serial, dport->regs.ras);
else
- cxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras);
+ cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->serial, dport->regs.ras);
}
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index a972e4ef1936..c569d92b6000 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -77,11 +77,12 @@ TRACE_EVENT(cxl_port_aer_uncorrectable_error,
);
TRACE_EVENT(cxl_aer_uncorrectable_error,
- TP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32 *hl),
- TP_ARGS(cxlmd, status, fe, hl),
+ TP_PROTO(const struct device *cxlmd, u32 status, u32 fe, u32 *hl,
+ u64 serial),
+ TP_ARGS(cxlmd, status, fe, hl, serial),
TP_STRUCT__entry(
- __string(memdev, dev_name(&cxlmd->dev))
- __string(host, dev_name(cxlmd->dev.parent))
+ __string(memdev, dev_name(cxlmd))
+ __string(host, dev_name(cxlmd->parent))
__field(u64, serial)
__field(u32, status)
__field(u32, first_error)
@@ -90,7 +91,7 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
TP_fast_assign(
__assign_str(memdev);
__assign_str(host);
- __entry->serial = cxlmd->cxlds->serial;
+ __entry->serial = serial;
__entry->status = status;
__entry->first_error = fe;
/*
@@ -138,24 +139,24 @@ TRACE_EVENT(cxl_port_aer_correctable_error,
__entry->status = status;
),
TP_printk("device=%s host=%s status='%s'",
- __get_str(device), __get_str(host),
- show_ce_errs(__entry->status)
+ __get_str(device), __get_str(host),
+ show_ce_errs(__entry->status)
)
);
TRACE_EVENT(cxl_aer_correctable_error,
- TP_PROTO(const struct cxl_memdev *cxlmd, u32 status),
- TP_ARGS(cxlmd, status),
+ TP_PROTO(const struct device *cxlmd, u32 status, u64 serial),
+ TP_ARGS(cxlmd, status, serial),
TP_STRUCT__entry(
- __string(memdev, dev_name(&cxlmd->dev))
- __string(host, dev_name(cxlmd->dev.parent))
+ __string(memdev, dev_name(cxlmd))
+ __string(host, dev_name(cxlmd->parent))
__field(u64, serial)
__field(u32, status)
),
TP_fast_assign(
__assign_str(memdev);
__assign_str(host);
- __entry->serial = cxlmd->cxlds->serial;
+ __entry->serial = serial;
__entry->status = status;
),
TP_printk("memdev=%s host=%s serial=%lld: status: '%s'",
--
2.34.1
* [PATCH v14 23/34] cxl: Map CXL Endpoint Port and CXL Switch Port RAS registers
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (21 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 22/34] cxl: Update CXL Endpoint tracing Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 21:53 ` Dave Jiang
2026-01-15 15:17 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 24/34] cxl/port: Move endpoint component register management to cxl_port Terry Bowman
` (10 subsequent siblings)
33 siblings, 2 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
In preparation for CXL VH (Virtual Host) topology protocol error handling,
add RAS capability register mapping for all ports in a CXL VH topology.
This includes the RAS capabilities of Switch Upstream Ports, Switch
Downstream Ports, Host Bridge Ports ("upstream"), and Root Ports
("downstream").
Update cxl_port_add_dport() to map the upstream RAS capability on first
'dport' attach, and downstream RAS capability on each 'dport' attach.
Arrange for dport mappings to be released at del_dport() time.
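The mapping lifetime described above can be modeled in plain userspace C.
This is a simplified sketch, not the devres-based implementation in the
patch: 'struct model_port', model_add_dport() and model_del_dport() are
hypothetical names, and boolean flags stand in for the register mappings. It
also does not model the port being unregistered when its dport count
returns to zero.

```c
#include <stdbool.h>

#define MODEL_MAX_DPORTS 4

/* Hypothetical userspace model of a port; not the kernel's struct cxl_port. */
struct model_port {
	bool upstream_ras_mapped;			/* mapped once, on first dport */
	bool dport_ras_mapped[MODEL_MAX_DPORTS];	/* one mapping per dport */
	bool dport_present[MODEL_MAX_DPORTS];
	int nr_dports;
};

/* Attach dport @id. Returns 0 on success, -1 if @id is invalid or in use. */
static int model_add_dport(struct model_port *port, int id)
{
	if (id < 0 || id >= MODEL_MAX_DPORTS || port->dport_present[id])
		return -1;

	/* First dport arrival: map the upstream RAS capability once. */
	if (port->nr_dports == 0)
		port->upstream_ras_mapped = true;

	/* Every dport arrival: map that dport's downstream RAS capability. */
	port->dport_present[id] = true;
	port->dport_ras_mapped[id] = true;
	port->nr_dports++;
	return 0;
}

/* Detach dport @id and release only the resources tied to that dport. */
static void model_del_dport(struct model_port *port, int id)
{
	if (id < 0 || id >= MODEL_MAX_DPORTS || !port->dport_present[id])
		return;

	port->dport_ras_mapped[id] = false;
	port->dport_present[id] = false;
	port->nr_dports--;
}
```

In the model, deleting one dport leaves the upstream mapping and the other
dports' mappings in place, which is the behavior the per-dport resource
group is meant to provide.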
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
[djbw: reword changelog, fix devm handling]
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v13->v14:
- Correct message spelling (Terry)
---
drivers/cxl/core/port.c | 2 +-
drivers/cxl/core/ras.c | 11 +++++++++++
drivers/cxl/cxl.h | 2 ++
drivers/cxl/cxlpci.h | 4 ++++
drivers/cxl/port.c | 37 +++++++++++++++++++++++++++++++++++
tools/testing/cxl/Kbuild | 1 +
tools/testing/cxl/test/mock.c | 12 ++++++++++++
7 files changed, 68 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 2184c20af011..2c4e28e7975c 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1451,7 +1451,7 @@ static void del_dport(struct cxl_dport *dport)
{
struct cxl_port *port = dport->port;
- devm_release_action(&port->dev, unlink_dport, dport);
+ devres_release_group(&port->dev, dport);
}
static void del_dports(struct cxl_port *port)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 84abcf90fa99..76ac567724e3 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -162,6 +162,17 @@ void devm_cxl_dport_ras_setup(struct cxl_dport *dport)
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_dport_ras_setup, "CXL");
+void devm_cxl_port_ras_setup(struct cxl_port *port)
+{
+ struct cxl_register_map *map = &port->reg_map;
+
+ map->host = &port->dev;
+ if (cxl_map_component_regs(map, &port->regs,
+ BIT(CXL_CM_CAP_CAP_ID_RAS)))
+ dev_dbg(&port->dev, "Failed to map RAS capability\n");
+}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
+
void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
{
void __iomem *addr;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 46491046f101..805923693707 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -607,6 +607,7 @@ struct cxl_dax_region {
* @parent_dport: dport that points to this port in the parent
* @decoder_ida: allocator for decoder ids
* @reg_map: component and ras register mapping parameters
+ * @regs: mapped component registers
* @nr_dports: number of entries in @dports
* @hdm_end: track last allocated HDM decoder instance for allocation ordering
* @commit_end: cursor to track highest committed decoder for commit ordering
@@ -628,6 +629,7 @@ struct cxl_port {
struct cxl_dport *parent_dport;
struct ida decoder_ida;
struct cxl_register_map reg_map;
+ struct cxl_component_regs regs;
int nr_dports;
int hdm_end;
int commit_end;
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index e41bb93d583a..ef4496b4e55e 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -82,6 +82,7 @@ void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
void devm_cxl_dport_ras_setup(struct cxl_dport *dport);
+void devm_cxl_port_ras_setup(struct cxl_port *port);
#else
static inline void cxl_cor_error_detected(struct pci_dev *pdev) { }
@@ -93,6 +94,9 @@ static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport)
{
}
+static inline void devm_cxl_port_ras_setup(struct cxl_port *port)
+{
+}
#endif
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 8f8fc98c1428..0d6e010e21ca 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -176,11 +176,29 @@ static struct cxl_port *cxl_port_devres_group(struct cxl_port *port)
DEFINE_FREE(cxl_port_group_free, struct cxl_port *,
if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->dev, _T))
+static struct cxl_dport *cxl_dport_devres_group(struct cxl_dport *dport)
+{
+ if (!devres_open_group(&dport->port->dev, dport, GFP_KERNEL))
+ return ERR_PTR(-ENOMEM);
+ return dport;
+}
+DEFINE_FREE(cxl_dport_group_free, struct cxl_dport *,
+ if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->port->dev, _T))
+
static void cxl_port_group_close(struct cxl_port *port)
{
devres_remove_group(&port->dev, port);
}
+/*
+ * Unlike the port group, that just facilitates unwind of setup failures, the
+ * dport group needs to stay live for del_dport() to reference.
+ */
+static void cxl_dport_group_close(struct cxl_dport *dport)
+{
+ devres_close_group(&dport->port->dev, dport);
+}
+
static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
struct device *dport_dev)
{
@@ -209,6 +227,13 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
rc = devm_cxl_switch_port_decoders_setup(port);
if (rc)
return ERR_PTR(rc);
+
+ /*
+ * RAS setup is optional, either driver operation can continue
+ * on failure, or the device does not implement RAS registers.
+ */
+ devm_cxl_port_ras_setup(port);
+
/*
* Note, when nr_dports returns to zero the port is unregistered
* and triggers cleanup. I.e. no need for open-coded release
@@ -220,12 +245,24 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
if (IS_ERR(new_dport))
return new_dport;
+ /*
+ * Establish a group for all dport resources that need to be released
+ * when the dport is deleted.
+ */
+ struct cxl_dport *dport_group __free(cxl_dport_group_free) =
+ cxl_dport_devres_group(new_dport);
+ if (IS_ERR(dport_group))
+ return ERR_CAST(dport_group);
+
rc = cxl_dport_autoremove(new_dport);
if (rc)
return ERR_PTR(rc);
+ devm_cxl_dport_ras_setup(new_dport);
+
cxl_switch_parse_cdat(new_dport);
+ cxl_dport_group_close(no_free_ptr(dport_group));
cxl_port_group_close(no_free_ptr(port_group));
dev_dbg(&port->dev, "dport[%d] id:%d dport_dev: %s added\n",
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 7250bedf0448..6c516019600e 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -13,6 +13,7 @@ ldflags-y += --wrap=devm_cxl_endpoint_decoders_setup
ldflags-y += --wrap=hmat_get_extended_linear_cache_size
ldflags-y += --wrap=cxl_add_dport_by_dev
ldflags-y += --wrap=devm_cxl_switch_port_decoders_setup
+ldflags-y += --wrap=devm_cxl_port_ras_setup
DRIVERS := ../../../drivers
CXL_SRC := $(DRIVERS)/cxl
diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index 8883357ee50d..a0b87bbb2f75 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -246,6 +246,18 @@ void __wrap_devm_cxl_dport_ras_setup(struct cxl_dport *dport)
}
EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_dport_ras_setup, "CXL");
+void __wrap_devm_cxl_port_ras_setup(struct cxl_port *port)
+{
+ int index;
+ struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
+
+ if (!ops || !ops->is_mock_port(port->uport_dev))
+ devm_cxl_port_ras_setup(port);
+
+ put_cxl_mock_ops(index);
+}
+EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_port_ras_setup, "CXL");
+
struct cxl_dport *__wrap_cxl_add_dport_by_dev(struct cxl_port *port,
struct device *dport_dev)
{
--
2.34.1
* [PATCH v14 24/34] cxl/port: Move endpoint component register management to cxl_port
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (22 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 23/34] cxl: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 21:55 ` Dave Jiang
2026-01-15 15:28 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 25/34] cxl/port: Map Port component registers before switchport init Terry Bowman
` (9 subsequent siblings)
33 siblings, 2 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
From: Dan Williams <dan.j.williams@intel.com>
In preparation for generic protocol error handling across CXL endpoints,
whether they be memory expander class devices or accelerators, drop the
endpoint component management from cxl_dev_state.
Organize all CXL port component management through the common cxl_port
driver.
Note that the end game is that drivers/cxl/core/ras.c loses all
dependencies on a 'struct cxl_dev_state' parameter and operates only on
port resources. The removal of component register mapping from cxl_pci is
an incremental step towards that.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13 -> v14:
- New patch
- Update log message for cxl_ras_unmask() failure (Dan)
---
drivers/cxl/core/ras.c | 6 ++--
drivers/cxl/cxlmem.h | 4 +--
drivers/cxl/pci.c | 63 +-----------------------------------------
drivers/cxl/port.c | 54 ++++++++++++++++++++++++++++++++++++
4 files changed, 60 insertions(+), 67 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 76ac567724e3..b37108f60c56 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -247,6 +247,7 @@ bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
void cxl_cor_error_detected(struct pci_dev *pdev)
{
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+ struct cxl_memdev *cxlmd = cxlds->cxlmd;
struct device *dev = &cxlds->cxlmd->dev;
scoped_guard(device, dev) {
@@ -261,7 +262,7 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
cxl_handle_rdport_errors(cxlds);
cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->serial,
- cxlds->regs.ras);
+ cxlmd->endpoint->regs.ras);
}
}
EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
@@ -291,10 +292,9 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
* capability registers and bounce the active state of the memdev.
*/
ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->serial,
- cxlds->regs.ras);
+ cxlmd->endpoint->regs.ras);
}
-
switch (state) {
case pci_channel_io_normal:
if (ue) {
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 434031a0c1f7..ab7201ef3ea6 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -415,7 +415,7 @@ struct cxl_dpa_partition {
* @dev: The device associated with this CXL state
* @cxlmd: The device representing the CXL.mem capabilities of @dev
* @reg_map: component and ras register mapping parameters
- * @regs: Parsed register blocks
+ * @regs: Class device "Device" registers
* @cxl_dvsec: Offset to the PCIe device DVSEC
* @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
* @media_ready: Indicate whether the device media is usable
@@ -431,7 +431,7 @@ struct cxl_dev_state {
struct device *dev;
struct cxl_memdev *cxlmd;
struct cxl_register_map reg_map;
- struct cxl_regs regs;
+ struct cxl_device_regs regs;
int cxl_dvsec;
bool rcd;
bool media_ready;
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index b7f694bda913..acb0eb2a13c3 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -535,52 +535,6 @@ static int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
return cxl_setup_regs(map);
}
-static int cxl_pci_ras_unmask(struct pci_dev *pdev)
-{
- struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
- void __iomem *addr;
- u32 orig_val, val, mask;
- u16 cap;
- int rc;
-
- if (!cxlds->regs.ras) {
- dev_dbg(&pdev->dev, "No RAS registers.\n");
- return 0;
- }
-
- /* BIOS has PCIe AER error control */
- if (!pcie_aer_is_native(pdev))
- return 0;
-
- rc = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap);
- if (rc)
- return rc;
-
- if (cap & PCI_EXP_DEVCTL_URRE) {
- addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
- orig_val = readl(addr);
-
- mask = CXL_RAS_UNCORRECTABLE_MASK_MASK |
- CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
- val = orig_val & ~mask;
- writel(val, addr);
- dev_dbg(&pdev->dev,
- "Uncorrectable RAS Errors Mask: %#x -> %#x\n",
- orig_val, val);
- }
-
- if (cap & PCI_EXP_DEVCTL_CERE) {
- addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET;
- orig_val = readl(addr);
- val = orig_val & ~CXL_RAS_CORRECTABLE_MASK_MASK;
- writel(val, addr);
- dev_dbg(&pdev->dev, "Correctable RAS Errors Mask: %#x -> %#x\n",
- orig_val, val);
- }
-
- return 0;
-}
-
static void free_event_buf(void *buf)
{
kvfree(buf);
@@ -912,13 +866,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
unsigned int i;
bool irq_avail;
- /*
- * Double check the anonymous union trickery in struct cxl_regs
- * FIXME switch to struct_group()
- */
- BUILD_BUG_ON(offsetof(struct cxl_regs, memdev) !=
- offsetof(struct cxl_regs, device_regs.memdev));
-
rc = pcim_enable_device(pdev);
if (rc)
return rc;
@@ -942,7 +889,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
return rc;
- rc = cxl_map_device_regs(&map, &cxlds->regs.device_regs);
+ rc = cxl_map_device_regs(&map, &cxlds->regs);
if (rc)
return rc;
@@ -957,11 +904,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
else if (!cxlds->reg_map.component_map.ras.valid)
dev_dbg(&pdev->dev, "RAS registers not found\n");
- rc = cxl_map_component_regs(&cxlds->reg_map, &cxlds->regs.component,
- BIT(CXL_CM_CAP_CAP_ID_RAS));
- if (rc)
- dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
-
rc = cxl_pci_type3_init_mailbox(cxlds);
if (rc)
return rc;
@@ -1052,9 +994,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
return rc;
- if (cxl_pci_ras_unmask(pdev))
- dev_dbg(&pdev->dev, "No RAS reporting unmasked\n");
-
pci_save_state(pdev);
return rc;
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 0d6e010e21ca..d76b4b532064 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+#include <linux/aer.h>
#include <linux/device.h>
#include <linux/module.h>
#include <linux/slab.h>
@@ -72,6 +73,55 @@ static int cxl_switch_port_probe(struct cxl_port *port)
return 0;
}
+static int cxl_ras_unmask(struct cxl_port *port)
+{
+ struct pci_dev *pdev;
+ void __iomem *addr;
+ u32 orig_val, val, mask;
+ u16 cap;
+ int rc;
+
+ if (!dev_is_pci(port->uport_dev))
+ return 0;
+ pdev = to_pci_dev(port->uport_dev);
+
+ if (!port->regs.ras) {
+ pci_dbg(pdev, "No RAS registers.\n");
+ return 0;
+ }
+
+ /* BIOS has PCIe AER error control */
+ if (!pcie_aer_is_native(pdev))
+ return 0;
+
+ rc = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap);
+ if (rc)
+ return rc;
+
+ if (cap & PCI_EXP_DEVCTL_URRE) {
+ addr = port->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
+ orig_val = readl(addr);
+
+ mask = CXL_RAS_UNCORRECTABLE_MASK_MASK |
+ CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
+ val = orig_val & ~mask;
+ writel(val, addr);
+ pci_dbg(pdev, "Uncorrectable RAS Errors Mask: %#x -> %#x\n",
+ orig_val, val);
+ }
+
+ if (cap & PCI_EXP_DEVCTL_CERE) {
+ addr = port->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET;
+ orig_val = readl(addr);
+ val = orig_val & ~CXL_RAS_CORRECTABLE_MASK_MASK;
+ writel(val, addr);
+ pci_dbg(pdev, "Correctable RAS Errors Mask: %#x -> %#x\n",
+ orig_val, val);
+ }
+
+ return 0;
+}
+
static int cxl_endpoint_port_probe(struct cxl_port *port)
{
struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
@@ -102,6 +152,10 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
if (dport->rch)
devm_cxl_dport_ras_setup(dport);
+ devm_cxl_port_ras_setup(port);
+ if (cxl_ras_unmask(port))
+ dev_dbg(&port->dev, "failed to unmask RAS interrupts\n");
+
/*
* Now that all endpoint decoders are successfully enumerated, try to
* assemble regions from committed decoders
--
2.34.1
* [PATCH v14 25/34] cxl/port: Map Port component registers before switchport init
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (23 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 24/34] cxl/port: Move endpoint component register management to cxl_port Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 21:59 ` Dave Jiang
2026-01-15 15:30 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 26/34] cxl: Change CXL handlers to use guard() instead of scoped_guard() Terry Bowman
` (8 subsequent siblings)
33 siblings, 2 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
Port HDM registers must be mapped before calling
devm_cxl_switch_port_decoders_setup(). Export cxl_port_setup_regs() and
call it in cxl_port_add_dport() ahead of the decoder setup.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
drivers/cxl/core/port.c | 3 ++-
drivers/cxl/cxlpci.h | 3 +++
drivers/cxl/port.c | 5 +++++
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 2c4e28e7975c..3f730511f11d 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -778,7 +778,7 @@ static int cxl_setup_comp_regs(struct device *host, struct cxl_register_map *map
return cxl_setup_regs(map);
}
-static int cxl_port_setup_regs(struct cxl_port *port,
+int cxl_port_setup_regs(struct cxl_port *port,
resource_size_t component_reg_phys)
{
if (dev_is_platform(port->uport_dev))
@@ -786,6 +786,7 @@ static int cxl_port_setup_regs(struct cxl_port *port,
return cxl_setup_comp_regs(&port->dev, &port->reg_map,
component_reg_phys);
}
+EXPORT_SYMBOL_NS_GPL(cxl_port_setup_regs, "CXL");
static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
resource_size_t component_reg_phys)
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index ef4496b4e55e..532506595d0f 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -99,4 +99,7 @@ static inline void devm_cxl_port_ras_setup(struct cxl_port *port)
}
#endif
+int cxl_port_setup_regs(struct cxl_port *port,
+ resource_size_t component_reg_phys);
+
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index d76b4b532064..f8a33dbf8222 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -278,6 +278,11 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
return ERR_CAST(port_group);
if (port->nr_dports == 0) {
+
+ rc = cxl_port_setup_regs(port, port->component_reg_phys);
+ if (rc)
+ return ERR_PTR(rc);
+
rc = devm_cxl_switch_port_decoders_setup(port);
if (rc)
return ERR_PTR(rc);
--
2.34.1
* [PATCH v14 26/34] cxl: Change CXL handlers to use guard() instead of scoped_guard()
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (24 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 25/34] cxl/port: Map Port component registers before switchport init Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-23 10:05 ` Markus Elfring
2026-01-14 18:20 ` [PATCH v14 27/34] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC Terry Bowman
` (7 subsequent siblings)
33 siblings, 1 reply; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
The CXL protocol error handlers use scoped_guard() to guarantee access to
the underlying CXL memory device. Improve readability and reduce complexity
by replacing the current scoped_guard() with guard().
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
Changes in v13->v14:
- Add reviewed-by for Jonathan and Dave Jiang
Changes in v12->v13:
- New patch
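For readers less familiar with the cleanup helpers, here is a minimal
userspace sketch of why a function-scope guard lets the handler body drop
one level of indentation. It assumes a GCC or Clang compiler for the
cleanup attribute. DEMO_GUARD, struct demo_lock, and demo_handler() are
hypothetical stand-ins for illustration only, not the kernel's
guard(device) implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device lock; counts how often it is held. */
struct demo_lock {
	int held;
};

static void demo_lock_acquire(struct demo_lock *l)
{
	l->held++;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held > 0);
	l->held--;
}

/* Cleanup callback: receives a pointer to the guarded variable. */
static void demo_guard_exit(struct demo_lock **lp)
{
	if (*lp)
		demo_lock_release(*lp);
}

/*
 * Function-scope guard: take the lock now and drop it when the enclosing
 * scope ends. Relies on the GCC/Clang cleanup attribute. Use at most once
 * per scope, since the variable name is fixed.
 */
#define DEMO_GUARD(lock)						\
	struct demo_lock *demo_guard_var				\
		__attribute__((cleanup(demo_guard_exit), unused)) =	\
		(demo_lock_acquire(lock), (lock))

/* Returns false on the early-exit path, true when the body ran locked. */
static bool demo_handler(struct demo_lock *lock, bool driver_bound)
{
	DEMO_GUARD(lock);

	if (!driver_bound)
		return false;	/* lock is released by the cleanup callback */

	/* ... error handling would run here with the lock held ... */
	return lock->held == 1;
}
```

Both the early return and the normal return release the lock without an
explicit unlock call, which is the property the conversion relies on.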
---
drivers/cxl/core/ras.c | 58 +++++++++++++++++++++---------------------
1 file changed, 29 insertions(+), 29 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index b37108f60c56..bf82880e19b4 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -250,20 +250,20 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
struct cxl_memdev *cxlmd = cxlds->cxlmd;
struct device *dev = &cxlds->cxlmd->dev;
- scoped_guard(device, dev) {
- if (!dev->driver) {
- dev_warn(&pdev->dev,
- "%s: memdev disabled, abort error handling\n",
- dev_name(dev));
- return;
- }
+ guard(device)(dev);
- if (cxlds->rcd)
- cxl_handle_rdport_errors(cxlds);
-
- cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->serial,
- cxlmd->endpoint->regs.ras);
+ if (!dev->driver) {
+ dev_warn(&pdev->dev,
+ "%s: memdev disabled, abort error handling\n",
+ dev_name(dev));
+ return;
}
+
+ if (cxlds->rcd)
+ cxl_handle_rdport_errors(cxlds);
+
+ cxl_handle_cor_ras(&cxlmd->dev, cxlds->serial,
+ cxlmd->endpoint->regs.ras);
}
EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
@@ -275,26 +275,26 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
struct device *dev = &cxlmd->dev;
bool ue;
- scoped_guard(device, dev) {
- if (!dev->driver) {
- dev_warn(&pdev->dev,
- "%s: memdev disabled, abort error handling\n",
- dev_name(dev));
- return PCI_ERS_RESULT_DISCONNECT;
- }
+ guard(device)(dev);
- if (cxlds->rcd)
- cxl_handle_rdport_errors(cxlds);
- /*
- * A frozen channel indicates an impending reset which is fatal to
- * CXL.mem operation, and will likely crash the system. On the off
- * chance the situation is recoverable dump the status of the RAS
- * capability registers and bounce the active state of the memdev.
- */
- ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->serial,
- cxlmd->endpoint->regs.ras);
+ if (!dev->driver) {
+ dev_warn(&pdev->dev,
+ "%s: memdev disabled, abort error handling\n",
+ dev_name(dev));
+ return PCI_ERS_RESULT_DISCONNECT;
}
+ if (cxlds->rcd)
+ cxl_handle_rdport_errors(cxlds);
+ /*
+ * A frozen channel indicates an impending reset which is fatal to
+ * CXL.mem operation, and will likely crash the system. On the off
+ * chance the situation is recoverable dump the status of the RAS
+ * capability registers and bounce the active state of the memdev.
+ */
+ ue = cxl_handle_ras(&cxlmd->dev, cxlds->serial,
+ cxlmd->endpoint->regs.ras);
+
switch (state) {
case pci_channel_io_normal:
if (ue) {
--
2.34.1
* [PATCH v14 27/34] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (25 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 26/34] cxl: Change CXL handlers to use guard() instead of scoped_guard() Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 18:58 ` Kuppuswamy Sathyanarayanan
2026-01-14 18:20 ` [PATCH v14 28/34] PCI/AER: Move AER driver's CXL VH handling to pcie/aer_cxl_vh.c Terry Bowman
` (6 subsequent siblings)
33 siblings, 1 reply; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
The CXL driver's error handling for uncorrectable errors (UCE) will be
updated in the future. A required change is for the error handlers to
force a system panic when a UCE is detected.
Introduce PCI_ERS_RESULT_PANIC as an 'enum pci_ers_result' value. This will
be used by CXL UCE fatal and non-fatal recovery in future patches. Update
PCIe recovery documentation with details of PCI_ERS_RESULT_PANIC.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v13 -> v14:
- Add review-by for Dan
- Update Title prefix (Bjorn)
- Removed merge_result. Only logging error for device reporting the
error (Dan)
Changes in v12->v13:
- Add Dave Jiang's, Jonathan's, Ben's review-by
- Typo fix (Ben)
Changes v11 -> v12:
- Documentation requested (Lukas)
---
Documentation/PCI/pci-error-recovery.rst | 2 ++
include/linux/pci.h | 3 +++
2 files changed, 5 insertions(+)
diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
index 43bc4e3665b4..82ee2c8c0450 100644
--- a/Documentation/PCI/pci-error-recovery.rst
+++ b/Documentation/PCI/pci-error-recovery.rst
@@ -102,6 +102,8 @@ Possible return values are::
PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
+ PCI_ERS_RESULT_NO_AER_DRIVER, /* No AER capabilities registered for the driver */
+ PCI_ERS_RESULT_PANIC, /* System is unstable, panic. Is CXL specific */
};
A driver does not have to implement all of these callbacks; however,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index f8e8b3df794d..ee05d5925b13 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -921,6 +921,9 @@ enum pci_ers_result {
/* No AER capabilities registered for the driver */
PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
+
+ /* System is unstable, panic. Is CXL specific */
+ PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
};
/* PCI bus error event callbacks */
--
2.34.1
* [PATCH v14 28/34] PCI/AER: Move AER driver's CXL VH handling to pcie/aer_cxl_vh.c
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (26 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 27/34] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-15 15:40 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 29/34] cxl/port: Unify endpoint and switch port lookup Terry Bowman
` (5 subsequent siblings)
33 siblings, 1 reply; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
CXL virtual hierarchy (VH) RAS handling for CXL Port devices will be added
soon. This requires a notification mechanism for the AER driver to share
the AER interrupt with the CXL driver. The notification will be used as an
indication for the CXL drivers to handle and log the CXL RAS errors.
Note, 'CXL protocol error' terminology will refer to CXL VH and not
CXL RCH errors unless specifically noted going forward.
Introduce a new file in the AER driver, pci/pcie/aer_cxl_vh.c, to handle
the CXL protocol errors.
Add a kfifo work queue to be used by the AER and CXL drivers. The AER
driver will be the sole kfifo producer adding work and the cxl_core will be
the sole kfifo consumer removing work. Add the boilerplate kfifo support.
Encapsulate the kfifo, RW semaphore, and work pointer in a single structure.
Add CXL work queue handler registration functions in the AER driver and
export them for use by the CXL driver. The CXL driver uses these functions
to assign or clear the work handler function.
Synchronize accesses using the RW semaphore.
Introduce 'struct cxl_proto_err_work_data' to serve as the kfifo work data.
This will contain a reference to the PCI error source device and the error
severity. This will be used when the work is dequeued by the cxl_core driver.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
Changes in v13 -> v14:
- Replaced workqueue_types.h include with 'struct work_struct'
predeclaration (Bjorn)
- Update error message (Bjorn)
- Reordered 'struct cxl_proto_err_work_data' (Bjorn)
- Remove export of cxl_error_is_native() here (Bjorn)
Changes in v12->v13:
- Added Dave Jiang's review-by
- Update error message (Ben)
Changes in v11->v12:
- None
Changes in v10->v11:
- cxl_error_detected() - Change handlers' scoped_guard() to guard() (Jonathan)
- cxl_error_detected() - Remove extra line (Shiju)
- Changes moved to core/ras.c (Terry)
- cxl_error_detected(), remove 'ue' and return with function call. (Jonathan)
- Remove extra space in documentation for PCI_ERS_RESULT_PANIC definition
- Move #include "pci.h" from cxl.h to core.h (Terry)
- Remove unnecessary includes of cxl.h and core.h in mem.c (Terry)
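The producer/consumer arrangement described in the commit message can be
sketched in userspace C roughly as follows. This is an illustration under
stated assumptions, not the kernel code: struct demo_fifo, demo_forward(),
demo_get(), and the register helpers are hypothetical names, the handler
runs synchronously instead of being deferred through schedule_work(), and
the RW semaphore is omitted because the sketch is single threaded.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FIFO_SIZE 4	/* power of two, as kfifo sizes are */

/* Hypothetical mirror of 'struct cxl_proto_err_work_data'. */
struct demo_work_data {
	void *pdev;	/* stands in for 'struct pci_dev *' */
	int severity;
};

struct demo_fifo {
	void (*handler)(void);	/* stands in for the registered work item */
	unsigned int in, out;	/* free-running producer/consumer indices */
	struct demo_work_data slots[DEMO_FIFO_SIZE];
};

static struct demo_fifo demo_fifo;
static int demo_notified;	/* counts handler invocations */

static void demo_count_notify(void)
{
	demo_notified++;
}

static void demo_register(void (*handler)(void))
{
	demo_fifo.handler = handler;
}

static void demo_unregister(void)
{
	demo_fifo.handler = NULL;
}

/* Producer side: fails when no consumer is registered or the FIFO is full. */
static bool demo_forward(void *pdev, int severity)
{
	struct demo_fifo *f = &demo_fifo;

	if (!f->handler || f->in - f->out == DEMO_FIFO_SIZE)
		return false;

	f->slots[f->in++ % DEMO_FIFO_SIZE] =
		(struct demo_work_data){ .pdev = pdev, .severity = severity };
	f->handler();	/* the kernel code defers this via schedule_work() */
	return true;
}

/* Consumer side: returns false when the FIFO is empty. */
static bool demo_get(struct demo_work_data *wd)
{
	struct demo_fifo *f = &demo_fifo;

	if (f->in == f->out)
		return false;

	*wd = f->slots[f->out++ % DEMO_FIFO_SIZE];
	return true;
}
```

As in the patch, an error is dropped (and would be logged) both when no
handler is registered and when the queue is full, so the producer never
blocks in the interrupt handling path.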
---
drivers/pci/pcie/Makefile | 1 +
drivers/pci/pcie/aer.c | 15 ++-----
drivers/pci/pcie/aer_cxl_vh.c | 78 +++++++++++++++++++++++++++++++++++
drivers/pci/pcie/portdrv.h | 4 ++
include/linux/aer.h | 22 ++++++++++
5 files changed, 109 insertions(+), 11 deletions(-)
create mode 100644 drivers/pci/pcie/aer_cxl_vh.c
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index b0b43a18c304..62d3d3c69a5d 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_PCIEPORTBUS) += pcieportdrv.o bwctrl.o
obj-y += aspm.o
obj-$(CONFIG_PCIEAER) += aer.o err.o tlp.o
obj-$(CONFIG_CXL_RAS) += aer_cxl_rch.o
+obj-$(CONFIG_CXL_RAS) += aer_cxl_vh.o
obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o
obj-$(CONFIG_PCIE_PME) += pme.o
obj-$(CONFIG_PCIE_DPC) += dpc.o
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d30a217fae46..c2030d32a19c 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1150,16 +1150,6 @@ void pci_aer_unmask_internal_errors(struct pci_dev *dev)
}
EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors);
-#ifdef CONFIG_CXL_RAS
-bool is_aer_internal_error(struct aer_err_info *info)
-{
- if (info->severity == AER_CORRECTABLE)
- return info->status & PCI_ERR_COR_INTERNAL;
-
- return info->status & PCI_ERR_UNC_INTN;
-}
-#endif
-
/**
* pci_aer_handle_error - handle logging error into an event log
* @dev: pointer to pci_dev data structure of error source device
@@ -1196,7 +1186,10 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
{
cxl_rch_handle_error(dev, info);
- pci_aer_handle_error(dev, info);
+ if (is_cxl_error(dev, info))
+ cxl_forward_error(dev, info);
+ else
+ pci_aer_handle_error(dev, info);
pci_dev_put(dev);
}
diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
new file mode 100644
index 000000000000..2189d3c6cef1
--- /dev/null
+++ b/drivers/pci/pcie/aer_cxl_vh.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2025 AMD Corporation. All rights reserved. */
+
+#include <linux/types.h>
+#include <linux/aer.h>
+#include <linux/bitfield.h>
+#include <linux/kfifo.h>
+#include "../pci.h"
+#include "portdrv.h"
+
+#define CXL_ERROR_SOURCES_MAX 128
+
+struct cxl_proto_err_kfifo {
+ struct work_struct *work;
+ struct rw_semaphore rw_sema;
+ DECLARE_KFIFO(fifo, struct cxl_proto_err_work_data,
+ CXL_ERROR_SOURCES_MAX);
+};
+
+static struct cxl_proto_err_kfifo cxl_proto_err_kfifo = {
+ .rw_sema = __RWSEM_INITIALIZER(cxl_proto_err_kfifo.rw_sema)
+};
+
+bool is_aer_internal_error(struct aer_err_info *info)
+{
+ if (info->severity == AER_CORRECTABLE)
+ return info->status & PCI_ERR_COR_INTERNAL;
+
+ return info->status & PCI_ERR_UNC_INTN;
+}
+
+bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
+{
+ if (!info || !info->is_cxl)
+ return false;
+
+ if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
+ return false;
+
+ return is_aer_internal_error(info);
+}
+
+void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info)
+{
+ struct cxl_proto_err_work_data wd = (struct cxl_proto_err_work_data) {
+ .severity = info->severity,
+ .pdev = pdev
+ };
+
+ guard(rwsem_read)(&cxl_proto_err_kfifo.rw_sema);
+ if (!cxl_proto_err_kfifo.work || !kfifo_put(&cxl_proto_err_kfifo.fifo, wd)) {
+ dev_err_ratelimited(&pdev->dev, "AER-CXL kfifo error\n");
+ return;
+ }
+
+ schedule_work(cxl_proto_err_kfifo.work);
+}
+
+void cxl_register_proto_err_work(struct work_struct *work)
+{
+ guard(rwsem_write)(&cxl_proto_err_kfifo.rw_sema);
+ cxl_proto_err_kfifo.work = work;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_register_proto_err_work, "CXL");
+
+void cxl_unregister_proto_err_work(void)
+{
+ guard(rwsem_write)(&cxl_proto_err_kfifo.rw_sema);
+ cxl_proto_err_kfifo.work = NULL;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_unregister_proto_err_work, "CXL");
+
+int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd)
+{
+ guard(rwsem_read)(&cxl_proto_err_kfifo.rw_sema);
+ return kfifo_get(&cxl_proto_err_kfifo.fifo, wd);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_proto_err_kfifo_get, "CXL");
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index cc58bf2f2c84..66a6b8099c96 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -130,9 +130,13 @@ struct aer_err_info;
bool is_aer_internal_error(struct aer_err_info *info);
void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info);
void cxl_rch_enable_rcec(struct pci_dev *rcec);
+bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info);
+void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info);
#else
static inline bool is_aer_internal_error(struct aer_err_info *info) { return false; }
static inline void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info) { }
static inline void cxl_rch_enable_rcec(struct pci_dev *rcec) { }
+static inline bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info) { return false; }
+static inline void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info) { }
#endif /* CONFIG_CXL_RAS */
#endif /* _PORTDRV_H_ */
diff --git a/include/linux/aer.h b/include/linux/aer.h
index df0f5c382286..f351e41dd979 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -53,6 +53,16 @@ struct aer_capability_regs {
u16 uncor_err_source;
};
+/**
+ * struct cxl_proto_err_work_data - Error information used in CXL error handling
+ * @pdev: PCI device detecting the error
+ * @severity: AER severity
+ */
+struct cxl_proto_err_work_data {
+ struct pci_dev *pdev;
+ int severity;
+};
+
#if defined(CONFIG_PCIEAER)
int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
int pcie_aer_is_native(struct pci_dev *dev);
@@ -66,6 +76,18 @@ static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
#endif
+struct work_struct;
+
+#ifdef CONFIG_CXL_RAS
+int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd);
+void cxl_register_proto_err_work(struct work_struct *work);
+void cxl_unregister_proto_err_work(void);
+#else
+static inline int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd) { return 0; }
+static inline void cxl_register_proto_err_work(struct work_struct *work) { }
+static inline void cxl_unregister_proto_err_work(void) { }
+#endif
+
void pci_print_aer(struct pci_dev *dev, int aer_severity,
struct aer_capability_regs *aer);
int cper_severity_to_aer(int cper_severity);
--
2.34.1
* [PATCH v14 29/34] cxl/port: Unify endpoint and switch port lookup
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (27 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 28/34] PCI/AER: Move AER driver's CXL VH handling to pcie/aer_cxl_vh.c Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 23:04 ` Dave Jiang
2026-01-15 15:44 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 30/34] PCI/AER: Dequeue forwarded CXL error Terry Bowman
` (4 subsequent siblings)
33 siblings, 2 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
From: Dan Williams <dan.j.williams@intel.com>
In support of generic CXL protocol error handling across various 'struct
cxl_port' types, update find_cxl_port_by_uport() to retrieve endpoint CXL
port companions from endpoint PCIe device instances.
The end result is that upstream switch ports and endpoint ports can share
error handling, and the misplaced cxl_error_handlers can eventually be
deleted from the cxl_pci class driver.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13->v14:
- New patch
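Illustration only, not part of the patch: the lookup rule described above
can be modelled in plain userspace C. This is a minimal sketch, assuming
hypothetical stand-in types (struct sketch_device, struct sketch_port) in
place of 'struct device' and 'struct cxl_port', and a boolean flag in place
of is_cxl_memdev().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for 'struct device' and 'struct cxl_port'. */
struct sketch_device {
	struct sketch_device *parent;
	bool is_memdev;		/* models is_cxl_memdev() */
};

struct sketch_port {
	struct sketch_device *uport_dev;
};

/*
 * Return true when @port is the companion of @uport_dev. Endpoint ports
 * are hosted by a memdev, so the comparison is made against the memdev's
 * parent device instead of the memdev itself.
 */
static bool sketch_match_port_by_uport(const struct sketch_port *port,
				       const struct sketch_device *uport_dev)
{
	if (port->uport_dev->is_memdev)
		return uport_dev == port->uport_dev->parent;
	return uport_dev == port->uport_dev;
}
```

In this model an endpoint port matches the device that parents its memdev,
while a switch port still matches its uport device directly.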
---
drivers/cxl/core/port.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 3f730511f11d..a535e57360e0 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1561,10 +1561,20 @@ static int match_port_by_uport(struct device *dev, const void *data)
return 0;
port = to_cxl_port(dev);
+ /* Endpoint ports are hosted by memdevs */
+ if (is_cxl_memdev(port->uport_dev))
+ return uport_dev == port->uport_dev->parent;
return uport_dev == port->uport_dev;
}
-/*
+/**
+ * find_cxl_port_by_uport - Find a CXL port device companion
+ * @uport_dev: Device that acts as a switch or endpoint in the CXL hierarchy
+ *
+ * In the case of endpoint ports recall that port->uport_dev points to a 'struct
+ * cxl_memdev' device. So, the @uport_dev argument is the parent device of the
+ * 'struct cxl_memdev' in that case.
+ *
* Function takes a device reference on the port device. Caller should do a
* put_device() when done.
*/
--
2.34.1
* [PATCH v14 30/34] PCI/AER: Dequeue forwarded CXL error
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (28 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 29/34] cxl/port: Unify endpoint and switch port lookup Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 23:18 ` Dave Jiang
` (2 more replies)
2026-01-14 18:20 ` [PATCH v14 31/34] PCI: Introduce CXL Port protocol error handlers Terry Bowman
` (3 subsequent siblings)
33 siblings, 3 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
The AER driver now forwards CXL protocol errors to the CXL driver via a
kfifo. The CXL driver must consume these work items and initiate protocol
error handling while ensuring the device's RAS mappings remain valid
throughout processing.
Implement cxl_proto_err_work_fn() to dequeue work items forwarded by the
AER service driver. Lock the parent CXL Port device to ensure the CXL
device's RAS registers are accessible during handling. Add a pdev reference
put to match the reference get taken in the AER driver before enqueue. This
keeps pdev valid after the kfifo dequeue. These changes apply to CXL Ports
and CXL Endpoints.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13->v14:
- Update commit title's prefix (Bjorn)
- Add pdev ref get in AER driver before enqueue and add pdev ref put in
CXL driver after dequeue and handling (Dan)
- Removed handling to simplify patch context (Terry)
Changes in v12->v13:
- Add cxlmd lock using guard() (Terry)
- Remove exporting of unused function, pci_aer_clear_fatal_status() (Dave Jiang)
- Change pr_err() calls to ratelimited. (Terry)
- Update commit message. (Terry)
- Remove namespace qualifier from pcie_clear_device_status()
export (Dave Jiang)
- Move locks into cxl_proto_err_work_fn() (Dave)
- Update log messages in cxl_forward_error() (Ben)
Changes in v11->v12:
- Add guard for CE case in cxl_handle_proto_error() (Dave)
Changes in v10->v11:
- Reword patch commit message to remove RCiEP details (Jonathan)
- Add #include <linux/bitfield.h> (Terry)
- is_cxl_rcd() - Fix short comment message wrap (Jonathan)
- is_cxl_rcd() - Combine return calls into 1 (Jonathan)
- cxl_handle_proto_error() - Move comment earlier (Jonathan)
- Use FIELD_GET() in discovering class code (Jonathan)
- Remove BDF from cxl_proto_err_work_data. Use 'struct
pci_dev *' (Dan)
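Illustration only, not part of the patch: the get-before-enqueue and
put-after-dequeue pairing can be modelled in userspace C. This is a rough
sketch, assuming a hypothetical fixed-depth ring (struct sketch_fifo) in
place of the kernel kfifo and a plain counter in place of the pci_dev
reference count. Dropping the reference when the enqueue fails is an
assumption of this sketch, not something the patch text states.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_FIFO_DEPTH 4	/* hypothetical depth, power of two */

struct sketch_dev {
	int refcount;		/* models the pci_dev reference count */
};

struct sketch_fifo {
	struct sketch_dev *slot[SKETCH_FIFO_DEPTH];
	unsigned int head, tail;	/* free-running indices */
};

static void sketch_dev_get(struct sketch_dev *d) { d->refcount++; }
static void sketch_dev_put(struct sketch_dev *d) { d->refcount--; }

/* Producer: pin the device, enqueue it, and unpin again if the FIFO is full. */
static bool sketch_forward_error(struct sketch_fifo *f, struct sketch_dev *d)
{
	sketch_dev_get(d);
	if (f->head - f->tail == SKETCH_FIFO_DEPTH) {
		/* Assumption: a failed enqueue must not leak the reference. */
		sketch_dev_put(d);
		return false;
	}
	f->slot[f->head++ % SKETCH_FIFO_DEPTH] = d;
	return true;
}

/* Consumer: drain every queued item, then drop the producer's reference. */
static unsigned int sketch_drain(struct sketch_fifo *f)
{
	unsigned int handled = 0;

	while (f->head != f->tail) {
		struct sketch_dev *d = f->slot[f->tail++ % SKETCH_FIFO_DEPTH];

		/* Error handling would run here while the device is pinned. */
		handled++;
		sketch_dev_put(d);
	}
	return handled;
}
```

The point of the model is that every successful enqueue is balanced by
exactly one put on the consumer side, so the count returns to its starting
value once the queue is drained.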
---
drivers/cxl/core/core.h | 3 ++
drivers/cxl/core/port.c | 6 +--
drivers/cxl/core/ras.c | 98 +++++++++++++++++++++++++++++++----
drivers/pci/pcie/aer_cxl_vh.c | 1 +
4 files changed, 94 insertions(+), 14 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 306762a15dc0..39324e1b8940 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -169,6 +169,9 @@ static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
#endif /* CONFIG_CXL_RAS */
int cxl_gpf_port_setup(struct cxl_dport *dport);
+struct cxl_port *find_cxl_port(struct device *dport_dev,
+ struct cxl_dport **dport);
+struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev);
struct cxl_hdm;
int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index a535e57360e0..0bec10be5d56 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1335,8 +1335,8 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
return NULL;
}
-static struct cxl_port *find_cxl_port(struct device *dport_dev,
- struct cxl_dport **dport)
+struct cxl_port *find_cxl_port(struct device *dport_dev,
+ struct cxl_dport **dport)
{
struct cxl_find_port_ctx ctx = {
.dport_dev = dport_dev,
@@ -1578,7 +1578,7 @@ static int match_port_by_uport(struct device *dev, const void *data)
* Function takes a device reference on the port device. Caller should do a
* put_device() when done.
*/
-static struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev)
+struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev)
{
struct device *dev;
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index bf82880e19b4..0c640b84ad70 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -117,17 +117,6 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
}
static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
-int cxl_ras_init(void)
-{
- return cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
-}
-
-void cxl_ras_exit(void)
-{
- cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
- cancel_work_sync(&cxl_cper_prot_err_work);
-}
-
static void cxl_dport_map_ras(struct cxl_dport *dport)
{
struct cxl_register_map *map = &dport->reg_map;
@@ -173,6 +162,44 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
+/*
+ * Return 'struct cxl_port *' parent CXL Port of dev
+ *
+ * Reference count increments returned port on success
+ *
+ * @pdev: Find the parent CXL Port of this device
+ */
+static struct cxl_port *get_cxl_port(struct pci_dev *pdev)
+{
+ switch (pci_pcie_type(pdev)) {
+ case PCI_EXP_TYPE_ROOT_PORT:
+ case PCI_EXP_TYPE_DOWNSTREAM:
+ {
+ struct cxl_dport *dport;
+ struct cxl_port *port = find_cxl_port(&pdev->dev, &dport);
+
+ if (!port) {
+ pci_err(pdev, "Failed to find the CXL device");
+ return NULL;
+ }
+ return port;
+ }
+ case PCI_EXP_TYPE_UPSTREAM:
+ case PCI_EXP_TYPE_ENDPOINT:
+ {
+ struct cxl_port *port = find_cxl_port_by_uport(&pdev->dev);
+
+ if (!port) {
+ pci_err(pdev, "Failed to find the CXL device");
+ return NULL;
+ }
+ return port;
+ }
+ }
+ pci_warn_once(pdev, "Error: Unsupported device type (%#x)", pci_pcie_type(pdev));
+ return NULL;
+}
+
void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
{
void __iomem *addr;
@@ -316,3 +343,52 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
return PCI_ERS_RESULT_NEED_RESET;
}
EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
+
+static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
+{
+}
+
+static void cxl_proto_err_work_fn(struct work_struct *work)
+{
+ struct cxl_proto_err_work_data wd;
+
+ while (cxl_proto_err_kfifo_get(&wd)) {
+ struct pci_dev *pdev __free(pci_dev_put) = wd.pdev;
+
+ if (!pdev) {
+ pr_err_ratelimited("NULL PCI device passed in AER-CXL KFIFO\n");
+ continue;
+ }
+
+ struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
+ if (!port) {
+ pr_err_ratelimited("Failed to find parent Port device in CXL topology.\n");
+ continue;
+ }
+ guard(device)(&port->dev);
+
+ cxl_handle_proto_error(&wd);
+ }
+}
+
+static struct work_struct cxl_proto_err_work;
+static DECLARE_WORK(cxl_proto_err_work, cxl_proto_err_work_fn);
+
+int cxl_ras_init(void)
+{
+ if (cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work))
+ pr_err("Failed to initialize CXL RAS CPER\n");
+
+ cxl_register_proto_err_work(&cxl_proto_err_work);
+
+ return 0;
+}
+
+void cxl_ras_exit(void)
+{
+ cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
+ cancel_work_sync(&cxl_cper_prot_err_work);
+
+ cxl_unregister_proto_err_work();
+ cancel_work_sync(&cxl_proto_err_work);
+}
diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
index 2189d3c6cef1..0f616f5fafcf 100644
--- a/drivers/pci/pcie/aer_cxl_vh.c
+++ b/drivers/pci/pcie/aer_cxl_vh.c
@@ -48,6 +48,7 @@ void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info)
};
guard(rwsem_read)(&cxl_proto_err_kfifo.rw_sema);
+ pci_dev_get(pdev);
if (!cxl_proto_err_kfifo.work || !kfifo_put(&cxl_proto_err_kfifo.fifo, wd)) {
dev_err_ratelimited(&pdev->dev, "AER-CXL kfifo error");
return;
--
2.34.1
* [PATCH v14 31/34] PCI: Introduce CXL Port protocol error handlers
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (29 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 30/34] PCI/AER: Dequeue forwarded CXL error Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 23:37 ` Dave Jiang
2026-01-22 18:27 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 32/34] cxl: Update Endpoint uncorrectable protocol error handling Terry Bowman
` (2 subsequent siblings)
33 siblings, 2 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
Add CXL protocol error handlers for CXL Port devices (Root Ports,
Downstream Ports, and Upstream Ports). Implement cxl_port_cor_error_detected()
and cxl_port_error_detected() to handle correctable and uncorrectable errors
respectively.
Introduce cxl_get_ras_base() to retrieve the cached RAS register base
address for a given CXL port. This function supports CXL Root Ports,
Downstream Ports, Upstream Ports, and Endpoints by returning their
previously mapped RAS register addresses.
Update the AER driver's is_cxl_error() to recognize CXL Port devices in
addition to CXL Endpoints, as both now have CXL-specific error handlers.
Future patch(es) will include port error handling changes to support
Endpoint protocol errors.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13->v14:
- Add Dave Jiang's review-by
- Update commit message & headline (Bjorn)
- Refactor cxl_port_error_detected()/cxl_port_cor_error_detected() to
one line (Jonathan)
- Remove cxl_walk_port() (Dan)
- Remove cxl_pci_drv_bound(). Check for 'is_cxl' parent port is
sufficient (Dan)
- Remove device_lock_if()
- Combined CE and UCE here (Terry)
Changes in v12->v13:
- Move get_pci_cxl_host_dev() and cxl_handle_proto_error() to Dequeue
patch (Terry)
- Remove EP case in cxl_get_ras_base(), not used. (Terry)
- Remove check for dport->dport_dev (Dave)
- Remove whitespace (Terry)
Changes in v11->v12:
- Add call to cxl_pci_drv_bound() in cxl_handle_proto_error() and
pci_to_cxl_dev()
- Change cxl_error_detected() -> cxl_cor_error_detected()
- Remove NULL variable assignments
- Replace bus_find_device() with find_cxl_port_by_uport() for upstream
port searches.
Changes in v10->v11:
- None
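Illustration only, not part of the patch: the RAS base selection described
in the commit message can be modelled in userspace C. This is a minimal
sketch, assuming a hypothetical enum for the PCIe port type and a
hypothetical struct sketch_ras_cache holding the previously mapped
register blocks.

```c
#include <stddef.h>

enum sketch_port_type {
	SKETCH_ROOT_PORT,
	SKETCH_DOWNSTREAM,
	SKETCH_UPSTREAM,
	SKETCH_OTHER,
};

/* Hypothetical cache of previously mapped RAS register blocks. */
struct sketch_ras_cache {
	void *dport_ras;	/* mapped for Root and Downstream Ports */
	void *uport_ras;	/* mapped for Upstream Ports */
};

/* Pick the cached RAS base by port type, or NULL for unsupported types. */
static void *sketch_get_ras_base(enum sketch_port_type type,
				 const struct sketch_ras_cache *cache)
{
	switch (type) {
	case SKETCH_ROOT_PORT:
	case SKETCH_DOWNSTREAM:
		return cache->dport_ras;
	case SKETCH_UPSTREAM:
		return cache->uport_ras;
	default:
		return NULL;	/* caller must cope with a missing mapping */
	}
}
```

Callers in this model have to tolerate a NULL return, both for an
unsupported type and for a block that was never mapped.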
---
drivers/cxl/core/ras.c | 101 +++++++++++++++++++++++++++++++++-
drivers/pci/pci.c | 1 +
drivers/pci/pci.h | 2 -
drivers/pci/pcie/aer.c | 1 +
drivers/pci/pcie/aer_cxl_vh.c | 5 +-
include/linux/aer.h | 2 +
include/linux/pci.h | 2 +
7 files changed, 109 insertions(+), 5 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 0c640b84ad70..96ce85cc0a46 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -200,6 +200,67 @@ static struct cxl_port *get_cxl_port(struct pci_dev *pdev)
return NULL;
}
+static void __iomem *cxl_get_ras_base(struct device *dev)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ switch (pci_pcie_type(pdev)) {
+ case PCI_EXP_TYPE_ROOT_PORT:
+ case PCI_EXP_TYPE_DOWNSTREAM:
+ {
+ struct cxl_dport *dport;
+ struct cxl_port *port __free(put_cxl_port) = find_cxl_port(&pdev->dev, &dport);
+
+ if (!dport) {
+ pci_err(pdev, "Failed to find the CXL device");
+ return NULL;
+ }
+ return dport->regs.ras;
+ }
+ case PCI_EXP_TYPE_UPSTREAM:
+ {
+ struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_uport(&pdev->dev);
+
+ if (!port) {
+ pci_err(pdev, "Failed to find the CXL device");
+ return NULL;
+ }
+ return port->regs.ras;
+ }
+ }
+ dev_warn_once(dev, "Error: Unsupported device type (%#x)", pci_pcie_type(pdev));
+ return NULL;
+}
+
+static pci_ers_result_t cxl_port_error_detected(struct device *dev);
+
+static void cxl_do_recovery(struct pci_dev *pdev)
+{
+ struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
+ pci_ers_result_t status;
+
+ if (!port) {
+ pci_err(pdev, "Failed to find the CXL device\n");
+ return;
+ }
+
+ status = cxl_port_error_detected(&pdev->dev);
+ if (status == PCI_ERS_RESULT_PANIC)
+ panic("CXL cachemem error.");
+
+ /*
+ * If we have native control of AER, clear error status in the device
+ * that detected the error. If the platform retained control of AER,
+ * it is responsible for clearing this status. In that case, the
+ * signaling device may not even be visible to the OS.
+ */
+ if (pcie_aer_is_native(pdev)) {
+ pcie_clear_device_status(pdev);
+ pci_aer_clear_nonfatal_status(pdev);
+ pci_aer_clear_fatal_status(pdev);
+ }
+}
+
void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
{
void __iomem *addr;
@@ -214,7 +275,10 @@ void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
return;
writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
- trace_cxl_aer_correctable_error(dev, status, serial);
+ if (is_cxl_memdev(dev))
+ trace_cxl_aer_correctable_error(dev, status, serial);
+ else
+ trace_cxl_port_aer_correctable_error(dev, status);
}
/* CXL spec rev3.0 8.2.4.16.1 */
@@ -265,12 +329,27 @@ bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
}
header_log_copy(ras_base, hl);
- trace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);
+
+ if (is_cxl_memdev(dev))
+ trace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);
+ else
+ trace_cxl_port_aer_uncorrectable_error(dev, status, fe, hl);
+
writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
return true;
}
+static void cxl_port_cor_error_detected(struct device *dev)
+{
+ cxl_handle_cor_ras(dev, 0, cxl_get_ras_base(dev));
+}
+
+static pci_ers_result_t cxl_port_error_detected(struct device *dev)
+{
+ return cxl_handle_ras(dev, 0, cxl_get_ras_base(dev));
+}
+
void cxl_cor_error_detected(struct pci_dev *pdev)
{
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
@@ -346,6 +425,24 @@ EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
{
+ struct pci_dev *pdev = err_info->pdev;
+
+ if (err_info->severity == AER_CORRECTABLE) {
+
+ if (!pcie_aer_is_native(pdev))
+ return;
+
+ if (pdev->aer_cap)
+ pci_clear_and_set_config_dword(pdev,
+ pdev->aer_cap + PCI_ERR_COR_STATUS,
+ 0, PCI_ERR_COR_INTERNAL);
+
+ cxl_port_cor_error_detected(&pdev->dev);
+
+ pcie_clear_device_status(pdev);
+ } else {
+ cxl_do_recovery(pdev);
+ }
}
static void cxl_proto_err_work_fn(struct work_struct *work)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 13dbb405dc31..b7bfefdaf990 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2248,6 +2248,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
}
+EXPORT_SYMBOL_GPL(pcie_clear_device_status);
#endif
/**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index dbc547db208a..8bb703524f52 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -229,7 +229,6 @@ void pci_refresh_power_state(struct pci_dev *dev);
int pci_power_up(struct pci_dev *dev);
void pci_disable_enabled_device(struct pci_dev *dev);
int pci_finish_runtime_suspend(struct pci_dev *dev);
-void pcie_clear_device_status(struct pci_dev *dev);
void pcie_clear_root_pme_status(struct pci_dev *dev);
bool pci_check_pme_status(struct pci_dev *dev);
void pci_pme_wakeup_bus(struct pci_bus *bus);
@@ -1196,7 +1195,6 @@ void pci_restore_aer_state(struct pci_dev *dev);
static inline void pci_no_aer(void) { }
static inline void pci_aer_init(struct pci_dev *d) { }
static inline void pci_aer_exit(struct pci_dev *d) { }
-static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
static inline void pci_save_aer_state(struct pci_dev *dev) { }
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index c2030d32a19c..dd7c49651612 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -298,6 +298,7 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
if (status)
pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
}
+EXPORT_SYMBOL_GPL(pci_aer_clear_fatal_status);
/**
* pci_aer_raw_clear_status - Clear AER error registers.
diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
index 0f616f5fafcf..aa69e504302f 100644
--- a/drivers/pci/pcie/aer_cxl_vh.c
+++ b/drivers/pci/pcie/aer_cxl_vh.c
@@ -34,7 +34,10 @@ bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
if (!info || !info->is_cxl)
return false;
- if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
+ if ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT) &&
+ (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
+ (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM) &&
+ (pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM))
return false;
return is_aer_internal_error(info);
diff --git a/include/linux/aer.h b/include/linux/aer.h
index f351e41dd979..c1aef7859d0a 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -65,6 +65,7 @@ struct cxl_proto_err_work_data {
#if defined(CONFIG_PCIEAER)
int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
+void pci_aer_clear_fatal_status(struct pci_dev *dev);
int pcie_aer_is_native(struct pci_dev *dev);
void pci_aer_unmask_internal_errors(struct pci_dev *dev);
#else
@@ -72,6 +73,7 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
{
return -EINVAL;
}
+static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
#endif
diff --git a/include/linux/pci.h b/include/linux/pci.h
index ee05d5925b13..1ef4743bf151 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1921,8 +1921,10 @@ static inline void pci_hp_unignore_link_change(struct pci_dev *pdev) { }
#ifdef CONFIG_PCIEAER
bool pci_aer_available(void);
+void pcie_clear_device_status(struct pci_dev *dev);
#else
static inline bool pci_aer_available(void) { return false; }
+static inline void pcie_clear_device_status(struct pci_dev *dev) { }
#endif
bool pci_ats_disabled(void);
--
2.34.1
* [PATCH v14 32/34] cxl: Update Endpoint uncorrectable protocol error handling
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (30 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 31/34] PCI: Introduce CXL Port protocol error handlers Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 22:07 ` dan.j.williams
2026-01-14 18:20 ` [PATCH v14 33/34] cxl: Update Endpoint correctable " Terry Bowman
2026-01-14 18:20 ` [PATCH v14 34/34] cxl: Enable CXL protocol errors during CXL Port probe Terry Bowman
33 siblings, 1 reply; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
The CXL drivers must support handling Endpoint CXL and PCI uncorrectable
(UCE) protocol errors. Update the drivers to support both.
Introduce cxl_pci_error_detected() to handle PCI uncorrectable errors,
replacing cxl_error_detected(). Implement this new function to call
the existing CXL Port uncorrectable handler, cxl_port_error_detected().
Update cxl_port_error_detected() for Endpoint handling. Take the CXL
memory device lock, check for a valid driver, and handle a Restricted
CXL Device (RCD) if needed. This is the same sequence originally used in
cxl_error_detected(). However, the UCE handler's result logic is simpler:
recovery is not attempted, and a UCE instead results in the CXL driver
invoking a system panic.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13->v14:
- Update commit headline (Bjorn)
- Rename pci_error_detected()/pci_cor_error_detected() ->
cxl_pci_error_detected/cxl_pci_cor_error_detected() (Jonathan)
- Remove now-invalid comment in cxl_error_detected() (Jonathan)
- Split into separate patches for UCE and CE (Terry)
Changes in v12->v13:
- Update commit message (Terry)
- Updated all the implementation and commit message. (Terry)
- Refactored cxl_cor_error_detected()/cxl_error_detected() to remove
pdev (Dave Jiang)
Changes in v11->v12:
- None
Changes in v10->v11:
- cxl_error_detected() - Change handlers' scoped_guard() to guard() (Jonathan)
- cxl_error_detected() - Remove extra line (Shiju)
- Changes moved to core/ras.c (Terry)
- cxl_error_detected(), remove 'ue' and return with function call. (Jonathan)
- Remove extra space in documentation for PCI_ERS_RESULT_PANIC definition
- Move #include "pci.h" from cxl.h to core.h (Terry)
- Remove unnecessary includes of cxl.h and core.h in mem.c (Terry)
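Illustration only, not part of the patch: the simplified uncorrectable
result logic can be modelled in userspace C. This is a rough sketch,
assuming a hypothetical status mask (SKETCH_UNCOR_MASK, not the real
register layout), a plain integer standing in for the write-1-to-clear RAS
status register, and a two-value enum in place of pci_ers_result_t.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_UNCOR_MASK 0x0000ffffu	/* hypothetical, not the real layout */

enum sketch_result { SKETCH_RESULT_NONE, SKETCH_RESULT_PANIC };

/* @status_reg models a write-1-to-clear RAS uncorrectable status register. */
static enum sketch_result sketch_handle_ras(uint32_t *status_reg)
{
	uint32_t status;

	if (!status_reg)
		return SKETCH_RESULT_NONE;	/* RAS block is not mapped */

	status = *status_reg;
	if (!(status & SKETCH_UNCOR_MASK))
		return SKETCH_RESULT_NONE;	/* nothing was logged */

	/* Model write-1-to-clear: the bits written back as 1 are cleared. */
	*status_reg &= ~(status & SKETCH_UNCOR_MASK);
	return SKETCH_RESULT_PANIC;
}
```

In this model any logged uncorrectable status maps to the panic result, and
there is no intermediate recovery state to choose between.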
---
drivers/cxl/core/core.h | 9 ++--
drivers/cxl/core/ras.c | 92 +++++++++++++++++++----------------------
drivers/cxl/cxlpci.h | 15 ++++---
drivers/cxl/pci.c | 6 +--
4 files changed, 60 insertions(+), 62 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 39324e1b8940..96c6cf478427 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -4,6 +4,7 @@
#ifndef __CXL_CORE_H__
#define __CXL_CORE_H__
+#include <linux/pci.h>
#include <cxl/mailbox.h>
#include <linux/rwsem.h>
@@ -147,7 +148,7 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
#ifdef CONFIG_CXL_RAS
int cxl_ras_init(void);
void cxl_ras_exit(void);
-bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);
+pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);
void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base);
void cxl_dport_map_rch_aer(struct cxl_dport *dport);
void cxl_disable_rch_root_ints(struct cxl_dport *dport);
@@ -158,11 +159,11 @@ static inline int cxl_ras_init(void)
return 0;
}
static inline void cxl_ras_exit(void) { }
-static inline bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
+static inline pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
{
- return false;
+ return PCI_ERS_RESULT_NONE;
}
-static inline void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base) { }
+static inline void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base) { }
static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 96ce85cc0a46..dc6e02d64821 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -218,6 +218,7 @@ static void __iomem *cxl_get_ras_base(struct device *dev)
return dport->regs.ras;
}
case PCI_EXP_TYPE_UPSTREAM:
+ case PCI_EXP_TYPE_ENDPOINT:
{
struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_uport(&pdev->dev);
@@ -302,20 +303,22 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
* Log the state of the RAS status registers and prepare them to log the
* next error status. Return 1 if reset needed.
*/
-bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
+pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
{
u32 hl[CXL_HEADERLOG_SIZE_U32];
void __iomem *addr;
u32 status;
u32 fe;
- if (!ras_base)
- return false;
+ if (!ras_base) {
+ dev_warn_once(dev, "CXL RAS register block is not mapped");
+ return PCI_ERS_RESULT_NONE;
+ }
addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
status = readl(addr);
if (!(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK))
- return false;
+ return PCI_ERS_RESULT_NONE;
/* If multiple errors, log header points to first error from ctrl reg */
if (hweight32(status) > 1) {
@@ -337,7 +340,7 @@ bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
- return true;
+ return PCI_ERS_RESULT_PANIC;
}
static void cxl_port_cor_error_detected(struct device *dev)
@@ -347,7 +350,30 @@ static void cxl_port_cor_error_detected(struct device *dev)
static pci_ers_result_t cxl_port_error_detected(struct device *dev)
{
- return cxl_handle_ras(dev, 0, cxl_get_ras_base(dev));
+ struct pci_dev *pdev = to_pci_dev(dev);
+ struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
+ u64 serial = 0;
+
+ if (is_cxl_endpoint(port)) {
+ struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ guard(device)(&cxlmd->dev);
+
+ if (!dev->driver) {
+ dev_warn(&pdev->dev,
+ "%s: memdev disabled, abort error handling\n",
+ dev_name(dev));
+ return PCI_ERS_RESULT_NONE;
+ }
+
+ if (cxlds->rcd)
+ cxl_handle_rdport_errors(cxlds);
+
+ serial = cxlds->serial;
+ }
+
+ return cxl_handle_ras(dev, serial, cxl_get_ras_base(dev));
}
void cxl_cor_error_detected(struct pci_dev *pdev)
@@ -373,55 +399,21 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
}
EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
-pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
- pci_channel_state_t state)
+pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t error)
{
- struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
- struct cxl_memdev *cxlmd = cxlds->cxlmd;
- struct device *dev = &cxlmd->dev;
- bool ue;
+ struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
+ pci_ers_result_t rc;
- guard(device)(dev);
+ guard(device)(&port->dev);
- if (!dev->driver) {
- dev_warn(&pdev->dev,
- "%s: memdev disabled, abort error handling\n",
- dev_name(dev));
- return PCI_ERS_RESULT_DISCONNECT;
- }
+ rc = cxl_port_error_detected(&pdev->dev);
+ if (rc == PCI_ERS_RESULT_PANIC)
+ panic("CXL cachemem error.");
- if (cxlds->rcd)
- cxl_handle_rdport_errors(cxlds);
- /*
- * A frozen channel indicates an impending reset which is fatal to
- * CXL.mem operation, and will likely crash the system. On the off
- * chance the situation is recoverable dump the status of the RAS
- * capability registers and bounce the active state of the memdev.
- */
- ue = cxl_handle_ras(&cxlmd->dev, cxlds->serial,
- cxlmd->endpoint->regs.ras);
-
- switch (state) {
- case pci_channel_io_normal:
- if (ue) {
- device_release_driver(dev);
- return PCI_ERS_RESULT_NEED_RESET;
- }
- return PCI_ERS_RESULT_CAN_RECOVER;
- case pci_channel_io_frozen:
- dev_warn(&pdev->dev,
- "%s: frozen state error detected, disable CXL.mem\n",
- dev_name(dev));
- device_release_driver(dev);
- return PCI_ERS_RESULT_NEED_RESET;
- case pci_channel_io_perm_failure:
- dev_warn(&pdev->dev,
- "failure state error detected, request disconnect\n");
- return PCI_ERS_RESULT_DISCONNECT;
- }
- return PCI_ERS_RESULT_NEED_RESET;
+ return rc;
}
-EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
+EXPORT_SYMBOL_NS_GPL(cxl_pci_error_detected, "CXL");
static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
{
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 532506595d0f..f218b343e179 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -79,15 +79,20 @@ void read_cdat_data(struct cxl_port *port);
#ifdef CONFIG_CXL_RAS
void cxl_cor_error_detected(struct pci_dev *pdev);
-pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
- pci_channel_state_t state);
+pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t error);
void devm_cxl_dport_ras_setup(struct cxl_dport *dport);
void devm_cxl_port_ras_setup(struct cxl_port *port);
+void __cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
+void __cxl_uport_init_ras_reporting(struct cxl_port *port,
+ struct device *host);
+int __cxl_await_media_ready(struct cxl_dev_state *cxlds);
+resource_size_t __cxl_rcd_component_reg_phys(struct device *dev,
+ struct cxl_dport *dport);
#else
static inline void cxl_cor_error_detected(struct pci_dev *pdev) { }
-
-static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
- pci_channel_state_t state)
+static inline pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
{
return PCI_ERS_RESULT_NONE;
}
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index acb0eb2a13c3..ff741adc7c7f 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1051,8 +1051,8 @@ static void cxl_reset_done(struct pci_dev *pdev)
}
}
-static const struct pci_error_handlers cxl_error_handlers = {
- .error_detected = cxl_error_detected,
+static const struct pci_error_handlers pci_error_handlers = {
+ .error_detected = cxl_pci_error_detected,
.slot_reset = cxl_slot_reset,
.resume = cxl_error_resume,
.cor_error_detected = cxl_cor_error_detected,
@@ -1063,7 +1063,7 @@ static struct pci_driver cxl_pci_driver = {
.name = KBUILD_MODNAME,
.id_table = cxl_mem_pci_tbl,
.probe = cxl_pci_probe,
- .err_handler = &cxl_error_handlers,
+ .err_handler = &pci_error_handlers,
.dev_groups = cxl_rcd_groups,
.driver = {
.probe_type = PROBE_PREFER_ASYNCHRONOUS,
--
2.34.1
* [PATCH v14 33/34] cxl: Update Endpoint correctable protocol error handling
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (31 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 32/34] cxl: Update Endpoint uncorrectable protocol error handling Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-14 18:20 ` [PATCH v14 34/34] cxl: Enable CXL protocol errors during CXL Port probe Terry Bowman
33 siblings, 0 replies; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
The CXL drivers must support handling Endpoint CXL and PCI correctable
(CE) protocol errors. Update the driver to support both.
Introduce cxl_pci_cor_error_detected() to handle PCI correctable errors,
replacing cxl_cor_error_detected(). Implement this new function to call
the existing CXL correctable handler, cxl_port_cor_error_detected().
Update cxl_port_cor_error_detected() for correct Endpoint handling.
Take the CXL memory device lock, check for a valid driver, and handle
Restricted CXL Device (RCD) if needed.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v13->v14:
- New commit
- Change cxl_cor_error_detected() parameter to &pdev->dev device from
memdev device. (Terry)
- Updated commit message (Terry)
---
drivers/cxl/core/ras.c | 52 ++++++++++++++++++++++++++----------------
drivers/cxl/cxlpci.h | 6 +++--
drivers/cxl/pci.c | 2 +-
3 files changed, 37 insertions(+), 23 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index dc6e02d64821..427009a8a78a 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -267,8 +267,10 @@ void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
void __iomem *addr;
u32 status;
- if (!ras_base)
+ if (!ras_base) {
+ dev_warn_once(dev, "CXL RAS register block is not mapped");
return;
+ }
addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
status = readl(addr);
@@ -345,7 +347,30 @@ pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ra
static void cxl_port_cor_error_detected(struct device *dev)
{
- cxl_handle_cor_ras(dev, 0, cxl_get_ras_base(dev));
+ struct pci_dev *pdev = to_pci_dev(dev);
+ struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
+ u64 serial = 0;
+
+ if (is_cxl_endpoint(port)) {
+ struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ guard(device)(&cxlmd->dev);
+
+ if (!dev->driver) {
+ dev_warn(&pdev->dev,
+ "%s: memdev disabled, abort error handling\n",
+ dev_name(dev));
+ return;
+ }
+
+ if (cxlds->rcd)
+ cxl_handle_rdport_errors(cxlds);
+
+ serial = cxlds->serial;
+ }
+
+ cxl_handle_cor_ras(dev, serial, cxl_get_ras_base(dev));
}
static pci_ers_result_t cxl_port_error_detected(struct device *dev)
@@ -376,28 +401,15 @@ static pci_ers_result_t cxl_port_error_detected(struct device *dev)
return cxl_handle_ras(dev, serial, cxl_get_ras_base(dev));
}
-void cxl_cor_error_detected(struct pci_dev *pdev)
+void cxl_pci_cor_error_detected(struct pci_dev *pdev)
{
- struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
- struct cxl_memdev *cxlmd = cxlds->cxlmd;
- struct device *dev = &cxlds->cxlmd->dev;
-
- guard(device)(dev);
-
- if (!dev->driver) {
- dev_warn(&pdev->dev,
- "%s: memdev disabled, abort error handling\n",
- dev_name(dev));
- return;
- }
+ struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
- if (cxlds->rcd)
- cxl_handle_rdport_errors(cxlds);
+ guard(device)(&port->dev);
- cxl_handle_cor_ras(&cxlmd->dev, cxlds->serial,
- cxlmd->endpoint->regs.ras);
+ cxl_port_cor_error_detected(&pdev->dev);
}
-EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
+EXPORT_SYMBOL_NS_GPL(cxl_pci_cor_error_detected, "CXL");
pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
pci_channel_state_t error)
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index f218b343e179..3d70f9b4a193 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -78,7 +78,7 @@ struct cxl_dev_state;
void read_cdat_data(struct cxl_port *port);
#ifdef CONFIG_CXL_RAS
-void cxl_cor_error_detected(struct pci_dev *pdev);
+void cxl_pci_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
pci_channel_state_t error);
void devm_cxl_dport_ras_setup(struct cxl_dport *dport);
@@ -90,7 +90,9 @@ int __cxl_await_media_ready(struct cxl_dev_state *cxlds);
resource_size_t __cxl_rcd_component_reg_phys(struct device *dev,
struct cxl_dport *dport);
#else
-static inline void cxl_cor_error_detected(struct pci_dev *pdev) { }
+static inline void cxl_pci_cor_error_detected(struct pci_dev *pdev)
+{
+}
static inline pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
pci_channel_state_t state)
{
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index ff741adc7c7f..328b4ea8dbc5 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1055,7 +1055,7 @@ static const struct pci_error_handlers pci_error_handlers = {
.error_detected = cxl_pci_error_detected,
.slot_reset = cxl_slot_reset,
.resume = cxl_error_resume,
- .cor_error_detected = cxl_cor_error_detected,
+ .cor_error_detected = cxl_pci_cor_error_detected,
.reset_done = cxl_reset_done,
};
--
2.34.1
* [PATCH v14 34/34] cxl: Enable CXL protocol errors during CXL Port probe
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (32 preceding siblings ...)
2026-01-14 18:20 ` [PATCH v14 33/34] cxl: Update Endpoint correctable " Terry Bowman
@ 2026-01-14 18:20 ` Terry Bowman
2026-01-15 16:18 ` Jonathan Cameron
33 siblings, 1 reply; 129+ messages in thread
From: Terry Bowman @ 2026-01-14 18:20 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
CXL protocol errors are not enabled for all CXL devices after boot. These
must be enabled in order to process CXL protocol errors.
Introduce cxl_unmask_proto_interrupts() to call pci_aer_unmask_internal_errors().
pci_aer_unmask_internal_errors() expects pdev->aer_cap to be initialized,
but pdev->aer_cap is not initialized for CXL Upstream Switch Ports and CXL
Downstream Switch Ports. Initialize pdev->aer_cap if necessary. Enable AER
correctable internal errors and uncorrectable internal errors for all CXL
devices.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
Changes in v13->v14:
- Update commit title's prefix (Bjorn)
Changes in v12->v13:
- Add dev and dev_is_pci() NULL checks in cxl_unmask_proto_interrupts() (Terry)
- Add Dave Jiang's and Ben's review-by
Changes in v11->v12:
- None
Changes in v10->v11:
- Added check for valid PCI devices in is_cxl_error() (Terry)
- Removed check for RCiEP in cxl_handle_proto_err() and
cxl_report_error_detected() (Terry)
---
drivers/cxl/core/port.c | 2 ++
drivers/cxl/core/ras.c | 22 ++++++++++++++++++++++
drivers/cxl/cxlpci.h | 4 ++++
3 files changed, 28 insertions(+)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 0bec10be5d56..588801c5d406 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1828,6 +1828,8 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
rc = cxl_add_ep(dport, &cxlmd->dev);
+ cxl_unmask_proto_interrupts(cxlmd->cxlds->dev);
+
/*
* If the endpoint already exists in the port's list,
* that's ok, it was added on a previous pass.
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 427009a8a78a..e299eb50fbe4 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -117,6 +117,24 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
}
static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
+void cxl_unmask_proto_interrupts(struct device *dev)
+{
+ if (!dev || !dev_is_pci(dev))
+ return;
+
+ struct pci_dev *pdev __free(pci_dev_put) = pci_dev_get(to_pci_dev(dev));
+
+ if (!pdev->aer_cap) {
+ pdev->aer_cap = pci_find_ext_capability(pdev,
+ PCI_EXT_CAP_ID_ERR);
+ if (!pdev->aer_cap)
+ return;
+ }
+
+ pci_aer_unmask_internal_errors(pdev);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_unmask_proto_interrupts, "CXL");
+
static void cxl_dport_map_ras(struct cxl_dport *dport)
{
struct cxl_register_map *map = &dport->reg_map;
@@ -127,6 +145,8 @@ static void cxl_dport_map_ras(struct cxl_dport *dport)
else if (cxl_map_component_regs(map, &dport->regs.component,
BIT(CXL_CM_CAP_CAP_ID_RAS)))
dev_dbg(dev, "Failed to map RAS capability.\n");
+
+ cxl_unmask_proto_interrupts(dev);
}
/**
@@ -159,6 +179,8 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
if (cxl_map_component_regs(map, &port->regs,
BIT(CXL_CM_CAP_CAP_ID_RAS)))
dev_dbg(&port->dev, "Failed to map RAS capability\n");
+
+ cxl_unmask_proto_interrupts(port->uport_dev);
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 3d70f9b4a193..0c915c0bdfac 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -89,6 +89,7 @@ void __cxl_uport_init_ras_reporting(struct cxl_port *port,
int __cxl_await_media_ready(struct cxl_dev_state *cxlds);
resource_size_t __cxl_rcd_component_reg_phys(struct device *dev,
struct cxl_dport *dport);
+void cxl_unmask_proto_interrupts(struct device *dev);
#else
static inline void cxl_pci_cor_error_detected(struct pci_dev *pdev)
{
@@ -104,6 +105,9 @@ static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport)
static inline void devm_cxl_port_ras_setup(struct cxl_port *port)
{
}
+static inline void cxl_unmask_proto_interrupts(struct device *dev)
+{
+}
#endif
int cxl_port_setup_regs(struct cxl_port *port,
--
2.34.1
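For readers following the thread, the unmask step that this patch relies on can be illustrated in userspace. This is only a sketch under stated assumptions: struct fake_aer_masks and fake_unmask_internal_errors() are hypothetical stand-ins for a device's two AER mask registers and for pci_aer_unmask_internal_errors(); the real function reads and writes PCI config space at an offset from pdev->aer_cap. The two bit values are the ones defined in include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h */
#define PCI_ERR_UNC_INTN     0x00400000u /* Uncorrectable Internal Error */
#define PCI_ERR_COR_INTERNAL 0x00004000u /* Corrected Internal Error */

/* Hypothetical stand-in for the AER mask registers of one device. */
struct fake_aer_masks {
	uint32_t uncor_mask; /* models the Uncorrectable Error Mask register */
	uint32_t cor_mask;   /* models the Correctable Error Mask register */
};

/*
 * Clear only the internal-error bit in each mask register. A cleared
 * mask bit means the error is reported; all other bits are preserved.
 */
static void fake_unmask_internal_errors(struct fake_aer_masks *m)
{
	m->uncor_mask &= ~PCI_ERR_UNC_INTN;
	m->cor_mask &= ~PCI_ERR_COR_INTERNAL;
}
```

The read-modify-write shape matters here: unrelated mask bits chosen by firmware or by the AER core are left as they were.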
* Re: [PATCH v14 02/34] PCI: Update CXL DVSEC definitions
2026-01-14 18:20 ` [PATCH v14 02/34] PCI: Update CXL DVSEC definitions Terry Bowman
@ 2026-01-14 18:53 ` Jonathan Cameron
2026-01-19 23:44 ` dan.j.williams
2026-01-22 18:37 ` Bjorn Helgaas
1 sibling, 1 reply; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-14 18:53 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:23 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> CXL DVSEC definitions were recently moved into uapi/pci_regs.h, but the
> newly added macros do not follow the file's existing naming conventions.
> The current format uses CXL_DVSEC_XYZ, while the new CXL entries must
> instead use the PCI_DVSEC_CXL_XYZ prefix to match the conventions already
> established in pci_regs.h.
>
> The new CXL DVSEC macros also introduce _MASK and _OFFSET suffixes, which
> are not used anywhere else in the file. These suffixes lengthen the
> identifiers and reduce readability. Remove _MASK and _OFFSET from the
> recently added definitions.
>
> Additionally, remove PCI_DVSEC_HEADER1_LENGTH, as it duplicates the existing
> PCI_DVSEC_HEADER1_LEN() macro.
>
> Update all existing references to use the new macro names.
>
> Finally, update the inline documentation to reference the latest revision
> of the CXL specification.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>
> ---
>
> Changes in v13->v14:
> - New patch. Split from previous patch such that there is now a separate
> move patch and a format fix patch.
> - Formatting update requested (Bjorn)
> - Remove PCI_DVSEC_HEADER1_LENGTH_MASK because it duplicates
> PCI_DVSEC_HEADER1_LEN() (Bjorn)
> - Add Dan's review-by
A couple of name choices don't quite work to my reading.
I care more about the _DEVICE_ one than the other.
If consensus is that it is fine without then I could probably be
persuaded.
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 6c4b6f19b18e..662582bdccf0 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> +/* CXL r4.0, 8.1.3: PCIe DVSEC for CXL Device */
> +#define PCI_DVSEC_CXL_DEVICE 0
> +#define PCI_DVSEC_CXL_CAP 0xA
Why drop the _DEVICE_ bit of these? I'd kind of expect
#define PCI_DVSEC_CXL_DEVICE_CAP
to indicate which DVSEC it is in.
> +#define PCI_DVSEC_CXL_MEM_CAPABLE _BITUL(2)
> +#define PCI_DVSEC_CXL_HDM_COUNT __GENMASK(5, 4)
> +#define PCI_DVSEC_CXL_CTRL 0xC
> +#define PCI_DVSEC_CXL_MEM_ENABLE _BITUL(2)
> +#define PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
> +#define PCI_DVSEC_CXL_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
> +#define PCI_DVSEC_CXL_MEM_INFO_VALID _BITUL(0)
> +#define PCI_DVSEC_CXL_MEM_ACTIVE _BITUL(1)
> +#define PCI_DVSEC_CXL_MEM_SIZE_LOW __GENMASK(31, 28)
> +#define PCI_DVSEC_CXL_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
> +#define PCI_DVSEC_CXL_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
> +#define PCI_DVSEC_CXL_MEM_BASE_LOW __GENMASK(31, 28)
>
> +/* CXL r4.0, 8.1.4: Non-CXL Function Map DVSEC */
> +#define PCI_DVSEC_CXL_FUNCTION_MAP 2
> +
> +/* CXL r4.0, 8.1.5: Extensions DVSEC for Ports */
> +#define PCI_DVSEC_CXL_PORT 3
> +#define PCI_DVSEC_CXL_PORT_CTL 0x0c
> +#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
> +
> +/* CXL r4.0, 8.1.6: GPF DVSEC for CXL Port */
> +#define PCI_DVSEC_CXL_PORT_GPF 4
Nothing like ambiguous naming in the CXL spec as the
following fields sound like they are in the CXL_PORT dvsec
but they aren't. Well the spec avoids it with GPF_FOR_PORT
but we don't want to go there. I wonder...
PCI_DVSEC_CXL_PORTGPF maybe to avoid that?
Sigh. It's probably not worth it and does look horrible, so stick
with these.
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_CONTROL 0x0C
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_BASE __GENMASK(3, 0)
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_SCALE __GENMASK(11, 8)
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_CONTROL 0xE
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_BASE __GENMASK(3, 0)
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_SCALE __GENMASK(11, 8)
> +
> +/* CXL r4.0, 8.1.7: GPF DVSEC for CXL Device */
> +#define PCI_DVSEC_CXL_DEVICE_GPF 5
> +
> +/* CXL r4.0, 8.1.9: Register Locator DVSEC */
> +#define PCI_DVSEC_CXL_REG_LOCATOR 8
> +#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1 0xC
> +#define PCI_DVSEC_CXL_REG_LOCATOR_BIR __GENMASK(2, 0)
> +#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_ID __GENMASK(15, 8)
> +#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_OFF_LOW __GENMASK(31, 16)
>
> #endif /* LINUX_PCI_REGS_H */
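As background to the naming question above, here is a minimal userspace sketch of how the field macros are applied to a value read from the capability register at offset PCI_DVSEC_CXL_CAP. It assumes the two field definitions quoted in the patch; SKETCH_BIT(), SKETCH_GENMASK(), and the sketch_* helpers are hypothetical local replacements for the kernel's _BITUL(), __GENMASK(), and FIELD_GET(), written out so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local equivalents of _BITUL() and __GENMASK() (32-bit). */
#define SKETCH_BIT(n)        (1u << (n))
#define SKETCH_GENMASK(h, l) (((~0u) >> (31 - (h))) & ((~0u) << (l)))

/* Field layout as quoted in the patch under review. */
#define PCI_DVSEC_CXL_MEM_CAPABLE SKETCH_BIT(2)
#define PCI_DVSEC_CXL_HDM_COUNT   SKETCH_GENMASK(5, 4)

/* Extract a field: mask it, then divide by the mask's lowest set bit. */
static unsigned int sketch_field_get(uint32_t mask, uint32_t reg)
{
	uint32_t lowest = mask & (~mask + 1u);

	return (reg & mask) / lowest;
}

static bool sketch_mem_capable(uint16_t cap)
{
	return cap & PCI_DVSEC_CXL_MEM_CAPABLE;
}

static unsigned int sketch_hdm_count(uint16_t cap)
{
	return sketch_field_get(PCI_DVSEC_CXL_HDM_COUNT, cap);
}
```

Nothing in a name such as PCI_DVSEC_CXL_HDM_COUNT says which register or which DVSEC the mask applies to, which is the readability concern raised in the review.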
* Re: [PATCH v14 06/34] PCI: Replace cxl_error_is_native() with pcie_aer_is_native()
2026-01-14 18:20 ` [PATCH v14 06/34] PCI: Replace cxl_error_is_native() with pcie_aer_is_native() Terry Bowman
@ 2026-01-14 18:55 ` Jonathan Cameron
2026-01-14 20:16 ` Dave Jiang
2026-01-14 20:15 ` Dave Jiang
2026-01-22 18:23 ` Bjorn Helgaas
2 siblings, 1 reply; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-14 18:55 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:27 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> The AER driver includes a CXL support function cxl_error_is_native(). This
> function adds no additional value from pcie_aer_is_native().
>
> Simplify the codebase by removing cxl_error_is_native() and replace
> occurrences of cxl_error_is_native() with pcie_aer_is_native().
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Dave, if for any reason the rest gets delayed, nice if we can
pick this one up anyway.
* Re: [PATCH v14 27/34] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC
2026-01-14 18:20 ` [PATCH v14 27/34] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC Terry Bowman
@ 2026-01-14 18:58 ` Kuppuswamy Sathyanarayanan
2026-01-14 19:20 ` Bowman, Terry
0 siblings, 1 reply; 129+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2026-01-14 18:58 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham, linux-cxl,
vishal.l.verma, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
Hi,
On 1/14/2026 10:20 AM, Terry Bowman wrote:
> The CXL driver's error handling for uncorrectable errors (UCE) will be
> updated in the future. A required change is for the error handlers to
> force a system panic when a UCE is detected.
>
> Introduce PCI_ERS_RESULT_PANIC as a 'enum pci_ers_result' type. This will
> be used by CXL UCE fatal and non-fatal recovery in future patches. Update
> PCIe recovery documentation with details of PCI_ERS_RESULT_PANIC.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>
> Changes in v13 -> v14:
> - Add review-by for Dan
> - Update Title prefix (Bjorn)
> - Removed merge_result. Only logging error for device reporting the
> error (Dan)
>
> Changes in v12->v13:
> - Add Dave Jiang's, Jonathan's, Ben's review-by
> - Typo fix (Ben)
>
> Changes v11 -> v12:
> - Documentation requested (Lukas)
> ---
> Documentation/PCI/pci-error-recovery.rst | 2 ++
> include/linux/pci.h | 3 +++
> 2 files changed, 5 insertions(+)
>
> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
> index 43bc4e3665b4..82ee2c8c0450 100644
> --- a/Documentation/PCI/pci-error-recovery.rst
> +++ b/Documentation/PCI/pci-error-recovery.rst
> @@ -102,6 +102,8 @@ Possible return values are::
> PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
> PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
> PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
> + PCI_ERS_RESULT_NO_AER_DRIVER, /* No AER capabilities registered for the driver */
> + PCI_ERS_RESULT_PANIC, /* System is unstable, panic. Is CXL specific */
> };
I think you also need to update the "Detailed Steps" section of this
document to include details on when these new values should be returned
and how they affect the recovery flow.
>
> A driver does not have to implement all of these callbacks; however,
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index f8e8b3df794d..ee05d5925b13 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -921,6 +921,9 @@ enum pci_ers_result {
>
> /* No AER capabilities registered for the driver */
> PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
> +
> + /* System is unstable, panic. Is CXL specific */
> + PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
> };
>
> /* PCI bus error event callbacks */
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
* Re: [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors()
2026-01-14 18:20 ` [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman
@ 2026-01-14 19:01 ` Jonathan Cameron
2026-01-14 19:09 ` Kuppuswamy Sathyanarayanan
` (3 subsequent siblings)
4 siblings, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-14 19:01 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:30 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> Internal PCIe errors are not enabled by default during initialization. This
> creates a problem for CXL drivers, which rely on PCIe Correctable and
> Uncorrectable Internal Errors to receive CXL protocol error notifications.
>
> Export pci_aer_unmask_internal_errors() so CXL and other drivers can
> enable internal PCIe errors.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
* Re: [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
2026-01-14 18:20 ` [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error() Terry Bowman
@ 2026-01-14 19:08 ` Jonathan Cameron
2026-01-15 20:42 ` dan.j.williams
2026-01-20 2:20 ` dan.j.williams
2026-01-22 18:48 ` Bjorn Helgaas
2 siblings, 1 reply; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-14 19:08 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:31 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> The AER driver includes significant logic for handling CXL protocol errors.
> The AER driver will be updated in the future to separate the AER and CXL
> logic.
>
> Rename the is_internal_error() function to is_aer_internal_error() as it
> gives a more precise indication of the purpose. Make is_aer_internal_error()
> non-static to allow for other PCI drivers to access.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Hi Terry,
I don't see it as sensible to have is_aer_internal_error()
return false if CXL is not built. That question has nothing to
do with CXL. Hence if we are doing generic naming, I think we
should just always have the function available. Gating on CXL
belongs at whatever called it. Which is the case already for
cxl_rch_handle_error() which has a stub that doesn't call this for
when CXL stuff isn't built.
Should just be a case of moving it out of the ifdef in aer.c
as part of this patch.
Jonathan
>
> ---
>
> Changes in v13->v14:
> - New patch
> ---
> drivers/pci/pcie/aer.c | 4 ++--
> drivers/pci/pcie/portdrv.h | 9 +++++++++
> 2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 63658e691aa2..2527e8370186 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1166,7 +1166,7 @@ static bool is_cxl_mem_dev(struct pci_dev *dev)
> return true;
> }
>
> -static bool is_internal_error(struct aer_err_info *info)
> +bool is_aer_internal_error(struct aer_err_info *info)
> {
> if (info->severity == AER_CORRECTABLE)
> return info->status & PCI_ERR_COR_INTERNAL;
> @@ -1211,7 +1211,7 @@ static void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> * device driver.
> */
> if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> - is_internal_error(info))
> + is_aer_internal_error(info))
> pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
> }
>
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index bd29d1cc7b8b..e7a0a2cffea9 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -123,4 +123,13 @@ static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {}
> #endif /* !CONFIG_PCIE_PME */
>
> struct device *pcie_port_find_device(struct pci_dev *dev, u32 service);
> +
> +struct aer_err_info;
> +
> +#ifdef CONFIG_PCIEAER_CXL
> +bool is_aer_internal_error(struct aer_err_info *info);
> +#else
> +static inline bool is_aer_internal_error(struct aer_err_info *info) { return false; }
This seems odd. It's either an AER internal error or it isn't, whether
or not CXL is enabled. That stubbing out should I think go up to the
caller that can decide whether it cares or not.
> +#endif /* CONFIG_PCIEAER_CXL */
> +
> #endif /* _PORTDRV_H_ */
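The review point above, that the predicate should not depend on CXL being built and that any gating belongs at the call site, can be sketched in userspace as follows. This is an illustration under assumptions, not the proposed kernel change: the sketch_* names and SKETCH_CXL_ENABLED are hypothetical, and only the body of the predicate mirrors the function quoted in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h */
#define PCI_ERR_UNC_INTN     0x00400000u
#define PCI_ERR_COR_INTERNAL 0x00004000u

/* Hypothetical reduced form of struct aer_err_info. */
enum sketch_severity { SKETCH_NONFATAL, SKETCH_FATAL, SKETCH_CORRECTABLE };

struct sketch_err_info {
	enum sketch_severity severity;
	uint32_t status;
};

/* Always available: the answer does not depend on CXL support. */
static bool sketch_is_aer_internal_error(const struct sketch_err_info *info)
{
	if (info->severity == SKETCH_CORRECTABLE)
		return info->status & PCI_ERR_COR_INTERNAL;

	return info->status & PCI_ERR_UNC_INTN;
}

/* Hypothetical build-time switch standing in for a Kconfig symbol. */
#ifndef SKETCH_CXL_ENABLED
#define SKETCH_CXL_ENABLED 1
#endif

/* The caller decides whether it cares about the result. */
static bool sketch_should_forward_to_cxl(const struct sketch_err_info *info)
{
	if (!SKETCH_CXL_ENABLED)
		return false;

	return sketch_is_aer_internal_error(info);
}
```

With this split, a build without CXL support changes only what the caller does with the answer, while the predicate keeps reporting the same thing for the same status bits.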
* Re: [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors()
2026-01-14 18:20 ` [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman
2026-01-14 19:01 ` Jonathan Cameron
@ 2026-01-14 19:09 ` Kuppuswamy Sathyanarayanan
2026-01-14 20:40 ` Dave Jiang
` (2 subsequent siblings)
4 siblings, 0 replies; 129+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2026-01-14 19:09 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham, linux-cxl,
vishal.l.verma, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/2026 10:20 AM, Terry Bowman wrote:
> Internal PCIe errors are not enabled by default during initialization. This
> creates a problem for CXL drivers, which rely on PCIe Correctable and
> Uncorrectable Internal Errors to receive CXL protocol error notifications.
>
> Export pci_aer_unmask_internal_errors() so CXL and other drivers can
> enable internal PCIe errors.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>
> ---
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>
> Changes in v13->v14:
> - New commit. Bjorn requested separating out and adding immediately
> before being used. This is called from cxl_rch_enable_rcec() in
> following patch.
> ---
> drivers/pci/pcie/aer.c | 6 +++---
> include/linux/aer.h | 2 ++
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index c99ba2a1159c..63658e691aa2 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1120,8 +1120,6 @@ static bool find_source_device(struct pci_dev *parent,
> return true;
> }
>
> -#ifdef CONFIG_PCIEAER_CXL
> -
> /**
> * pci_aer_unmask_internal_errors - unmask internal errors
> * @dev: pointer to the pci_dev data structure
> @@ -1132,7 +1130,7 @@ static bool find_source_device(struct pci_dev *parent,
> * Note: AER must be enabled and supported by the device which must be
> * checked in advance, e.g. with pcie_aer_is_native().
> */
> -static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> +void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> {
> int aer = dev->aer_cap;
> u32 mask;
> @@ -1145,7 +1143,9 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> mask &= ~PCI_ERR_COR_INTERNAL;
> pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
> }
> +EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors);
>
> +#ifdef CONFIG_PCIEAER_CXL
> static bool is_cxl_mem_dev(struct pci_dev *dev)
> {
> /*
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 02940be66324..df0f5c382286 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -56,12 +56,14 @@ struct aer_capability_regs {
> #if defined(CONFIG_PCIEAER)
> int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> int pcie_aer_is_native(struct pci_dev *dev);
> +void pci_aer_unmask_internal_errors(struct pci_dev *dev);
> #else
> static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> {
> return -EINVAL;
> }
> static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
> +static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
> #endif
>
> void pci_print_aer(struct pci_dev *dev, int aer_severity,
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
* Re: [PATCH v14 12/34] PCI/AER: Use guard() in cxl_rch_handle_error_iter()
2026-01-14 18:20 ` [PATCH v14 12/34] PCI/AER: Use guard() in cxl_rch_handle_error_iter() Terry Bowman
@ 2026-01-14 19:11 ` Jonathan Cameron
0 siblings, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-14 19:11 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:33 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> cxl_rch_handle_error_iter() includes a call to device_lock() using a goto
> for multiple return paths. Improve readability and maintainability by
> using the guard() lock variant.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>
> ---
>
> Changes in v13 -> v14:
> - Add review-by for Jonathan, Dave Jiang, Dan Williams, and Bjorn
> - Remove cleanup.h (Jonathan)
I'm confused. I asked you to add the include (it wasn't there to be
removed!)
> - Reverted comment removal (Bjorn)
> - Move this patch after pci/pcie/aer_cxl_rch.c creation (Bjorn)
>
> Changes in v12 -> v13:
> - New patch
> ---
> drivers/pci/pcie/aer_cxl_rch.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
> index 6b515edb12c1..e471eefec9c4 100644
> --- a/drivers/pci/pcie/aer_cxl_rch.c
> +++ b/drivers/pci/pcie/aer_cxl_rch.c
> @@ -42,11 +42,11 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
> if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
> return 0;
>
> - device_lock(&dev->dev);
> + guard(device)(&dev->dev);
>
> err_handler = dev->driver ? dev->driver->err_handler : NULL;
> if (!err_handler)
> - goto out;
> + return 0;
>
> if (info->severity == AER_CORRECTABLE) {
> if (err_handler->cor_error_detected)
> @@ -57,8 +57,6 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
> else if (info->severity == AER_FATAL)
> err_handler->error_detected(dev, pci_channel_io_frozen);
> }
> -out:
> - device_unlock(&dev->dev);
> return 0;
> }
>
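The goto-to-guard() conversion in the diff above can be illustrated outside the kernel. The kernel's guard() in include/linux/cleanup.h is built on the compiler's cleanup attribute; the sketch below uses that same GCC/Clang attribute directly. Everything named sketch_* or SKETCH_* is hypothetical, and the lock is a plain counter rather than a real device lock.

```c
#include <assert.h>

/* Hypothetical lock that records its state so the effect is observable. */
struct sketch_lock {
	int held;
	int releases;
};

static void sketch_lock_acquire(struct sketch_lock *l)
{
	l->held = 1;
}

/* Cleanup handlers receive a pointer to the annotated variable. */
static void sketch_lock_release(struct sketch_lock **lp)
{
	(*lp)->held = 0;
	(*lp)->releases++;
}

/* Take the lock now; release it when the enclosing scope is left. */
#define SKETCH_GUARD(l)                                                   \
	struct sketch_lock *sketch_guard_                                 \
		__attribute__((cleanup(sketch_lock_release))) =           \
		(sketch_lock_acquire(l), (l))

static int sketch_handle(struct sketch_lock *l, int have_handler)
{
	SKETCH_GUARD(l);

	if (!have_handler)
		return 0; /* early return, lock released automatically */

	return 1; /* normal return, lock released automatically */
}
```

Because the release is tied to scope exit, each return path drops the lock without an "out:" label, which is the readability gain the commit message describes.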
* Re: [PATCH v14 13/34] PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS
2026-01-14 18:20 ` [PATCH v14 13/34] PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS Terry Bowman
@ 2026-01-14 19:12 ` Jonathan Cameron
2026-01-14 20:49 ` Dave Jiang
` (2 subsequent siblings)
3 siblings, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-14 19:12 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:34 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> One of the primary reasons for the CXL driver to exist is to perform error
> handling. If both PCIEAER and CXL are enabled then light up CXL error
> handling as well. The work to remove CONFIG_PCIEAER_CXL started in:
>
> commit 4ae6ae66649c ("cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c")
>
> Finish that off with conditionally compiling all CXL RAS related helpers
> with CONFIG_CXL_RAS.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
* Re: [PATCH v14 27/34] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC
2026-01-14 18:58 ` Kuppuswamy Sathyanarayanan
@ 2026-01-14 19:20 ` Bowman, Terry
2026-01-14 19:45 ` Kuppuswamy Sathyanarayanan
0 siblings, 1 reply; 129+ messages in thread
From: Bowman, Terry @ 2026-01-14 19:20 UTC (permalink / raw)
To: Kuppuswamy Sathyanarayanan, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham, linux-cxl,
vishal.l.verma, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/2026 12:58 PM, Kuppuswamy Sathyanarayanan wrote:
> Hi,
>
> On 1/14/2026 10:20 AM, Terry Bowman wrote:
>> The CXL driver's error handling for uncorrectable errors (UCE) will be
>> updated in the future. A required change is for the error handlers to
>> force a system panic when a UCE is detected.
>>
>> Introduce PCI_ERS_RESULT_PANIC as an 'enum pci_ers_result' type. This will
>> be used by CXL UCE fatal and non-fatal recovery in future patches. Update
>> PCIe recovery documentation with details of PCI_ERS_RESULT_PANIC.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>>
>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>
>> ---
>>
>> Changes in v13 -> v14:
>> - Add review-by for Dan
>> - Update Title prefix (Bjorn)
>> - Removed merge_result. Only logging error for device reporting the
>> error (Dan)
>>
>> Changes in v12->v13:
>> - Add Dave Jiang's, Jonathan's, Ben's review-by
>> - Typo fix (Ben)
>>
>> Changes v11 -> v12:
>> - Documentation requested (Lukas)
>> ---
>> Documentation/PCI/pci-error-recovery.rst | 2 ++
>> include/linux/pci.h | 3 +++
>> 2 files changed, 5 insertions(+)
>>
>> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
>> index 43bc4e3665b4..82ee2c8c0450 100644
>> --- a/Documentation/PCI/pci-error-recovery.rst
>> +++ b/Documentation/PCI/pci-error-recovery.rst
>> @@ -102,6 +102,8 @@ Possible return values are::
>> PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
>> PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
>> PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
>> + PCI_ERS_RESULT_NO_AER_DRIVER, /* No AER capabilities registered for the driver */
>> + PCI_ERS_RESULT_PANIC, /* System is unstable, panic. Is CXL specific */
>> };
>
> I think you also need to update the "Detailed Steps" section of this
> document to include details on when these new values should be returned
> and how they affect the recovery flow.
>
I had the PCI_ERS_RESULT_PANIC details you mention in v13. Bjorn asked me to remove them.
-Terry
>>
>> A driver does not have to implement all of these callbacks; however,
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index f8e8b3df794d..ee05d5925b13 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -921,6 +921,9 @@ enum pci_ers_result {
>>
>> /* No AER capabilities registered for the driver */
>> PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
>> +
>> + /* System is unstable, panic. Is CXL specific */
>> + PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
>> };
>>
>> /* PCI bus error event callbacks */
>
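To make the intended meaning of the new value a little more concrete, here is a hypothetical userspace sketch of a caller that treats PCI_ERS_RESULT_PANIC as terminal. Values 6 and 7 come from the patch above; values 1-5 are assumed from include/linux/pci.h. The action enum and uce_next_action() are invented for illustration and are not part of this series, which leaves the actual users to future patches.

```c
#include <assert.h>

/*
 * Userspace model of enum pci_ers_result. Values 6 and 7 are taken from
 * the patch above; 1-5 are assumed from include/linux/pci.h.
 */
enum ers_result {
	ERS_RESULT_NONE = 1,
	ERS_RESULT_CAN_RECOVER = 2,
	ERS_RESULT_NEED_RESET = 3,
	ERS_RESULT_DISCONNECT = 4,
	ERS_RESULT_RECOVERED = 5,
	ERS_RESULT_NO_AER_DRIVER = 6,
	ERS_RESULT_PANIC = 7,
};

/* Hypothetical follow-up actions; not part of the series. */
enum uce_action { UCE_CONTINUE, UCE_RESET, UCE_GIVE_UP, UCE_PANIC };

/* Illustrative mapping only: PANIC short-circuits any recovery attempt. */
static enum uce_action uce_next_action(enum ers_result res)
{
	switch (res) {
	case ERS_RESULT_PANIC:
		return UCE_PANIC;
	case ERS_RESULT_NEED_RESET:
		return UCE_RESET;
	case ERS_RESULT_DISCONNECT:
		return UCE_GIVE_UP;
	default:
		return UCE_CONTINUE;
	}
}
```

The only point of the sketch is that the new value is distinct from DISCONNECT: the device is not merely lost, the system is considered unstable.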
* Re: [PATCH v14 27/34] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC
2026-01-14 19:20 ` Bowman, Terry
@ 2026-01-14 19:45 ` Kuppuswamy Sathyanarayanan
0 siblings, 0 replies; 129+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2026-01-14 19:45 UTC (permalink / raw)
To: Bowman, Terry, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham, linux-cxl,
vishal.l.verma, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
Hi,
On 1/14/2026 11:20 AM, Bowman, Terry wrote:
> On 1/14/2026 12:58 PM, Kuppuswamy Sathyanarayanan wrote:
>> Hi,
>>
>> On 1/14/2026 10:20 AM, Terry Bowman wrote:
>>> The CXL driver's error handling for uncorrectable errors (UCE) will be
>>> updated in the future. A required change is for the error handlers to
>>> force a system panic when a UCE is detected.
>>>
>>> Introduce PCI_ERS_RESULT_PANIC as an 'enum pci_ers_result' type. This will
>>> be used by CXL UCE fatal and non-fatal recovery in future patches. Update
>>> PCIe recovery documentation with details of PCI_ERS_RESULT_PANIC.
>>>
>>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>>>
>>
>> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>>> ---
>>>
>>> Changes in v13 -> v14:
>>> - Add review-by for Dan
>>> - Update Title prefix (Bjorn)
>>> - Removed merge_result. Only logging error for device reporting the
>>> error (Dan)
>>>
>>> Changes in v12->v13:
>>> - Add Dave Jiang's, Jonathan's, Ben's review-by
>>> - Typo fix (Ben)
>>>
>>> Changes v11 -> v12:
>>> - Documentation requested (Lukas)
>>> ---
>>> Documentation/PCI/pci-error-recovery.rst | 2 ++
>>> include/linux/pci.h | 3 +++
>>> 2 files changed, 5 insertions(+)
>>>
>>> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
>>> index 43bc4e3665b4..82ee2c8c0450 100644
>>> --- a/Documentation/PCI/pci-error-recovery.rst
>>> +++ b/Documentation/PCI/pci-error-recovery.rst
>>> @@ -102,6 +102,8 @@ Possible return values are::
>>> PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
>>> PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
>>> PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
>>> + PCI_ERS_RESULT_NO_AER_DRIVER, /* No AER capabilities registered for the driver */
>>> + PCI_ERS_RESULT_PANIC, /* System is unstable, panic. Is CXL specific */
>>> };
>>
>> I think you also need to update the "Detailed Steps" section of this
>> document to include details on when these new values should be returned
>> and how they affect the recovery flow.
>>
>
> I had the PCI_ERS_RESULT_PANIC details you mention in v13. Bjorn asked me to remove them.
Sorry, I did not check the previous version.
What about PCI_ERS_RESULT_NO_AER_DRIVER? I think it needs to be included as part
of the STEP 1 details, but likely as a separate patch since it is unrelated to the
CXL changes in this series.
>
> -Terry
>
>>>
>>> A driver does not have to implement all of these callbacks; however,
>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>> index f8e8b3df794d..ee05d5925b13 100644
>>> --- a/include/linux/pci.h
>>> +++ b/include/linux/pci.h
>>> @@ -921,6 +921,9 @@ enum pci_ers_result {
>>>
>>> /* No AER capabilities registered for the driver */
>>> PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
>>> +
>>> + /* System is unstable, panic. Is CXL specific */
>>> + PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
>>> };
>>>
>>> /* PCI bus error event callbacks */
>>
>
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
* Re: [PATCH v14 14/34] PCI/AER: Report CXL or PCIe bus type in AER trace logging
2026-01-14 18:20 ` [PATCH v14 14/34] PCI/AER: Report CXL or PCIe bus type in AER trace logging Terry Bowman
@ 2026-01-14 19:45 ` Jonathan Cameron
2026-01-15 15:55 ` Mauro Carvalho Chehab
2026-01-14 20:56 ` Dave Jiang
1 sibling, 1 reply; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-14 19:45 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci,
M.Chehab
On Wed, 14 Jan 2026 12:20:35 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> The AER service driver and aer_event tracing currently log 'PCIe Bus Type'
> for all errors. Update the driver and aer_event tracing to log 'CXL Bus
> Type' for CXL device errors.
>
> This requires that AER can identify and distinguish between PCIe errors and
> CXL errors.
>
> Introduce boolean 'is_cxl' to 'struct aer_err_info'. Add assignment in
> aer_get_device_error_info() and pci_print_aer().
>
> Update the aer_event trace routine to accept a bus type string parameter.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
I wonder if it is worth using __print_symbolic() etc. with integer
storage rather than a string in the tracepoints.
However, it's not really that important to me, as the strings are small anyway
and there is no precedent for this in the RAS trace events.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
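As a rough illustration of the integer-storage alternative mentioned above, the userspace sketch below stores a small integer and resolves it to a string only at print time, in the spirit of the value/name pairs given to __print_symbolic(). It is not tracepoint code; enum aer_bus_type, struct sym_entry and bus_type_name() are hypothetical names, and the "unknown" fallback is a simplification made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical integer encoding of the bus type kept in the trace record. */
enum aer_bus_type { AER_BUS_PCIE = 0, AER_BUS_CXL = 1 };

struct sym_entry {
	unsigned long val;
	const char *name;
};

/* Value-to-name table, similar in spirit to __print_symbolic() pairs. */
static const struct sym_entry bus_type_syms[] = {
	{ AER_BUS_PCIE, "PCIe Bus Type" },
	{ AER_BUS_CXL,  "CXL Bus Type" },
};

/*
 * Resolve the stored integer to a string at print time. Unknown values
 * get a fallback so the lookup never returns NULL.
 */
static const char *bus_type_name(unsigned long val)
{
	size_t i;

	for (i = 0; i < sizeof(bus_type_syms) / sizeof(bus_type_syms[0]); i++)
		if (bus_type_syms[i].val == val)
			return bus_type_syms[i].name;
	return "unknown";
}
```

The trade-off is the one raised in the review: the record carries a small integer instead of a string, at the cost of a lookup table that has to stay in sync with the encoding.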
* Re: [PATCH v14 15/34] PCI/AER: Update struct aer_err_info with kernel-doc formatting
2026-01-14 18:20 ` [PATCH v14 15/34] PCI/AER: Update struct aer_err_info with kernel-doc formatting Terry Bowman
@ 2026-01-14 19:48 ` Jonathan Cameron
2026-01-15 20:56 ` dan.j.williams
2026-01-14 21:06 ` Dave Jiang
2026-01-22 18:29 ` Bjorn Helgaas
2 siblings, 1 reply; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-14 19:48 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:36 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> Update the existing 'struct aer_err_info' definition to use kernel-doc
> formatting. Remove the inline comments to reduce noise and do not introduce
> functional changes. This will improve readability and maintainability.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Hi Terry.
I didn't check, but I think the kernel-doc script will complain
about partial docs. Other than that possibly needing a fix via
a trivial entry for __pad1,
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>
> ---
>
> Changes in v13->v14:
> - New commit
> ---
> drivers/pci/pci.h | 29 +++++++++++++++++++++++------
> 1 file changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 41ec38e82c08..dbc547db208a 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -724,16 +724,33 @@ static inline bool pci_dev_binding_disallowed(struct pci_dev *dev)
>
> #define AER_MAX_MULTI_ERR_DEVICES 5 /* Not likely to have more */
>
> +/**
> + * struct aer_err_info - AER Error Information
> + * @dev: Devices reporting error
> + * @ratelimit_print: Flag to log or not log the devices' error. 0=NotLog/1=Log
> + * @error_dev_num: Number of devices reporting an error
> + * @level: printk level to use in logging
> + * @id: Value from register PCI_ERR_ROOT_ERR_SRC
> + * @severity: AER severity, 0-UNCOR Non-fatal, 1-UNCOR fatal, 2-COR
> + * @root_ratelimit_print: Flag to log or not log the root's error. 0=NotLog/1=Log
The kernel-doc scripts are annoyingly fussy. Do they not moan about __pad1
being undocumented?
> + * @multi_error_valid: If multiple errors are reported
> + * @first_error: First reported error
> + * @is_cxl: Bus type error: 0-PCI Bus error, 1-CXL Bus error
> + * @tlp_header_valid: Indicates if TLP field contains error information
> + * @status: COR/UNCOR error status
> + * @mask: COR/UNCOR mask
> + * @tlp: Transaction packet information
> + */
> struct aer_err_info {
> struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
> int ratelimit_print[AER_MAX_MULTI_ERR_DEVICES];
> int error_dev_num;
> - const char *level; /* printk level */
> + const char *level;
>
> unsigned int id:16;
>
> - unsigned int severity:2; /* 0:NONFATAL | 1:FATAL | 2:COR */
> - unsigned int root_ratelimit_print:1; /* 0=skip, 1=print */
> + unsigned int severity:2;
> + unsigned int root_ratelimit_print:1;
> unsigned int __pad1:4;
> unsigned int multi_error_valid:1;
>
> @@ -742,9 +759,9 @@ struct aer_err_info {
> unsigned int is_cxl:1;
> unsigned int tlp_header_valid:1;
>
> - unsigned int status; /* COR/UNCOR Error Status */
> - unsigned int mask; /* COR/UNCOR Error Mask */
> - struct pcie_tlp_log tlp; /* TLP Header */
> + unsigned int status;
> + unsigned int mask;
> + struct pcie_tlp_log tlp;
> };
>
> int aer_get_device_error_info(struct aer_err_info *info, int i);
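On the __pad1 question, there are two usual ways to keep kernel-doc quiet about a padding member: add a trivial @__pad1 entry, or hide the member behind the "private:" comment tag that kernel-doc recognises inside struct definitions. The sketch below shows the second option on a cut-down struct. struct aer_err_info_sketch and make_info() are hypothetical and exist only so the example compiles in userspace; whether the real struct should use this approach is for the author to decide.

```c
#include <assert.h>

/**
 * struct aer_err_info_sketch - cut-down, hypothetical copy for illustration
 * @severity: AER severity, 0-UNCOR Non-fatal, 1-UNCOR fatal, 2-COR
 * @root_ratelimit_print: Flag to log or not log the root's error
 * @multi_error_valid: If multiple errors are reported
 */
struct aer_err_info_sketch {
	unsigned int severity:2;
	unsigned int root_ratelimit_print:1;
	/* private: padding only, intentionally left out of the docs */
	unsigned int __pad1:4;
	/* public: */
	unsigned int multi_error_valid:1;
};

/* Small helper so the layout can be exercised from a test. */
static struct aer_err_info_sketch make_info(unsigned int severity)
{
	struct aer_err_info_sketch info = { 0 };

	info.severity = severity & 0x3;	/* field is two bits wide */
	info.multi_error_valid = 1;
	return info;
}
```

Members between the "private:" and "public:" tags are omitted from the generated documentation, so no @__pad1 line is needed in the comment block.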
* Re: [PATCH v14 16/34] cxl/mem: Clarify @host for devm_cxl_add_nvdimm()
2026-01-14 18:20 ` [PATCH v14 16/34] cxl/mem: Clarify @host for devm_cxl_add_nvdimm() Terry Bowman
@ 2026-01-14 19:49 ` Jonathan Cameron
2026-01-14 21:08 ` Dave Jiang
1 sibling, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-14 19:49 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:37 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> The convention for devm_ helpers in the CXL driver is that the first
> argument is the @host for the operation (locked driver::probe() context).
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
* Re: [PATCH v14 18/34] cxl/port: Remove "enumerate dports" helpers
2026-01-14 18:20 ` [PATCH v14 18/34] cxl/port: Remove "enumerate dports" helpers Terry Bowman
@ 2026-01-14 19:50 ` Jonathan Cameron
2026-01-14 21:23 ` Dave Jiang
2026-01-16 3:15 ` dan.j.williams
2026-01-14 21:24 ` Dave Jiang
2026-01-16 3:21 ` dan.j.williams
2 siblings, 2 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-14 19:50 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:39 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> Now that cxl_switch_port_probe() no longer walks potential dports, because
> they are enumerated dynamically on descendant endpoint arrival, remove the
> dead code.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Patch description doesn't match patch.
>
> ---
>
> Changes in v13 -> v14:
> - New patch
> ---
> drivers/cxl/core/pci.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index b838c59d7a3c..0305a421504e 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -71,6 +71,14 @@ struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
> }
> EXPORT_SYMBOL_NS_GPL(__devm_cxl_add_dport_by_dev, "CXL");
>
> +struct cxl_walk_context {
> + struct pci_bus *bus;
> + struct cxl_port *port;
> + int type;
> + int error;
> + int count;
> +};
> +
> static int cxl_dvsec_mem_range_valid(struct cxl_dev_state *cxlds, int id)
> {
> struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> @@ -820,14 +828,6 @@ int cxl_gpf_port_setup(struct cxl_dport *dport)
> return 0;
> }
>
> -struct cxl_walk_context {
> - struct pci_bus *bus;
> - struct cxl_port *port;
> - int type;
> - int error;
> - int count;
> -};
> -
> static int count_dports(struct pci_dev *pdev, void *data)
> {
> struct cxl_walk_context *ctx = data;
* Re: [PATCH v14 06/34] PCI: Replace cxl_error_is_native() with pcie_aer_is_native()
2026-01-14 18:20 ` [PATCH v14 06/34] PCI: Replace cxl_error_is_native() with pcie_aer_is_native() Terry Bowman
2026-01-14 18:55 ` Jonathan Cameron
@ 2026-01-14 20:15 ` Dave Jiang
2026-01-22 18:23 ` Bjorn Helgaas
2 siblings, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 20:15 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> The AER driver includes a CXL support function cxl_error_is_native(). This
> function adds no additional value over pcie_aer_is_native().
>
> Simplify the codebase by removing cxl_error_is_native() and replace
> occurrences of cxl_error_is_native() with pcie_aer_is_native().
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> ---
>
> Changes in v13->v14:
> - New commit (Dan)
> ---
> drivers/pci/pcie/aer.c | 11 ++---------
> 1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e0bcaa896803..c99ba2a1159c 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1166,13 +1166,6 @@ static bool is_cxl_mem_dev(struct pci_dev *dev)
> return true;
> }
>
> -static bool cxl_error_is_native(struct pci_dev *dev)
> -{
> - struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
> -
> - return (pcie_ports_native || host->native_aer);
> -}
> -
> static bool is_internal_error(struct aer_err_info *info)
> {
> if (info->severity == AER_CORRECTABLE)
> @@ -1186,7 +1179,7 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
> struct aer_err_info *info = (struct aer_err_info *)data;
> const struct pci_error_handlers *err_handler;
>
> - if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
> + if (!is_cxl_mem_dev(dev) || !pcie_aer_is_native(dev))
> return 0;
>
> /* Protect dev->driver */
> @@ -1227,7 +1220,7 @@ static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
> bool *handles_cxl = data;
>
> if (!*handles_cxl)
> - *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev);
> + *handles_cxl = is_cxl_mem_dev(dev) && pcie_aer_is_native(dev);
>
> /* Non-zero terminates iteration */
> return *handles_cxl;
* Re: [PATCH v14 06/34] PCI: Replace cxl_error_is_native() with pcie_aer_is_native()
2026-01-14 18:55 ` Jonathan Cameron
@ 2026-01-14 20:16 ` Dave Jiang
0 siblings, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 20:16 UTC (permalink / raw)
To: Jonathan Cameron, Terry Bowman
Cc: dave, alison.schofield, dan.j.williams, bhelgaas, shiju.jose,
ming.li, Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On 1/14/26 11:55 AM, Jonathan Cameron wrote:
> On Wed, 14 Jan 2026 12:20:27 -0600
> Terry Bowman <terry.bowman@amd.com> wrote:
>
>> The AER driver includes a CXL support function cxl_error_is_native(). This
>> function adds no additional value over pcie_aer_is_native().
>>
>> Simplify the codebase by removing cxl_error_is_native() and replace
>> occurrences of cxl_error_is_native() with pcie_aer_is_native().
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>
> Dave, if for any reason the rest gets delayed, nice if we can
> pick this one up anyway.
Needs an Ack from Bjorn one way or another.
* Re: [PATCH v14 08/34] cxl/pci: Move CXL driver's RCH error handling into core/ras_rch.c
2026-01-14 18:20 ` [PATCH v14 08/34] cxl/pci: Move CXL driver's RCH error handling into core/ras_rch.c Terry Bowman
@ 2026-01-14 20:35 ` Dave Jiang
0 siblings, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 20:35 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> Restricted CXL Host (RCH) protocol error handling uses a procedure distinct
> from the CXL Virtual Hierarchy (VH) handling. This is because of the
> differences in the RCH and VH topologies. Improve the maintainability and
> add ability to enable/disable RCH handling.
>
> Move and combine the RCH handling code into a single block conditionally
> compiled with the CONFIG_CXL_RCH_RAS kernel config.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
>
> Changes in v13->v14:
> - Add sign-off for Dan and Jonathan
> - Revert inadvertent formatting of cxl_dport_map_rch_aer() (Jonathan)
> - Remove default value for CXL_RCH_RAS (Dan)
> - Remove unnecessary pci.h include in core.h & ras_rch.c (Jonathan)
> - Add linux/types.h include in ras_rch.c (Jonathan)
> - Change CONFIG_CXL_RCH_RAS -> CONFIG_CXL_RAS (Dan)
>
> Changes in v12->v13:
> - None
>
> Changes v11->v12:
> - Moved CXL_RCH_RAS Kconfig definition here from following commit.
>
> Changes v10->v11:
> - New patch
> ---
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/core.h | 11 +---
> drivers/cxl/core/pci.c | 115 -----------------------------------
> drivers/cxl/core/ras_rch.c | 121 +++++++++++++++++++++++++++++++++++++
> tools/testing/cxl/Kbuild | 1 +
> 5 files changed, 126 insertions(+), 123 deletions(-)
> create mode 100644 drivers/cxl/core/ras_rch.c
>
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index b2930cc54f8b..b37f38d502d8 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -20,3 +20,4 @@ cxl_core-$(CONFIG_CXL_MCE) += mce.o
> cxl_core-$(CONFIG_CXL_FEATURES) += features.o
> cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
> cxl_core-$(CONFIG_CXL_RAS) += ras.o
> +cxl_core-$(CONFIG_CXL_RAS) += ras_rch.o
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index bc818de87ccc..724361195057 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -149,6 +149,9 @@ int cxl_ras_init(void);
> void cxl_ras_exit(void);
> bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
> void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
> +void cxl_dport_map_rch_aer(struct cxl_dport *dport);
> +void cxl_disable_rch_root_ints(struct cxl_dport *dport);
> +void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);
> #else
> static inline int cxl_ras_init(void)
> {
> @@ -164,14 +167,6 @@ static inline bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras
> return false;
> }
> static inline void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base) { }
> -#endif /* CONFIG_CXL_RAS */
> -
> -/* Restricted CXL Host specific RAS functions */
> -#ifdef CONFIG_CXL_RAS
> -void cxl_dport_map_rch_aer(struct cxl_dport *dport);
> -void cxl_disable_rch_root_ints(struct cxl_dport *dport);
> -void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);
> -#else
> static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
> static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
> static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index e132fff80979..b838c59d7a3c 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -632,121 +632,6 @@ void read_cdat_data(struct cxl_port *port)
> }
> EXPORT_SYMBOL_NS_GPL(read_cdat_data, "CXL");
>
> -#ifdef CONFIG_CXL_RAS
> -void cxl_dport_map_rch_aer(struct cxl_dport *dport)
> -{
> - resource_size_t aer_phys;
> - struct device *host;
> - u16 aer_cap;
> -
> - aer_cap = cxl_rcrb_to_aer(dport->dport_dev, dport->rcrb.base);
> - if (aer_cap) {
> - host = dport->reg_map.host;
> - aer_phys = aer_cap + dport->rcrb.base;
> - dport->regs.dport_aer = devm_cxl_iomap_block(host, aer_phys,
> - sizeof(struct aer_capability_regs));
> - }
> -}
> -
> -void cxl_disable_rch_root_ints(struct cxl_dport *dport)
> -{
> - void __iomem *aer_base = dport->regs.dport_aer;
> - u32 aer_cmd_mask, aer_cmd;
> -
> - if (!aer_base)
> - return;
> -
> - /*
> - * Disable RCH root port command interrupts.
> - * CXL 3.0 12.2.1.1 - RCH Downstream Port-detected Errors
> - *
> - * This sequence may not be necessary. CXL spec states disabling
> - * the root cmd register's interrupts is required. But, PCI spec
> - * shows these are disabled by default on reset.
> - */
> - aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN |
> - PCI_ERR_ROOT_CMD_NONFATAL_EN |
> - PCI_ERR_ROOT_CMD_FATAL_EN);
> - aer_cmd = readl(aer_base + PCI_ERR_ROOT_COMMAND);
> - aer_cmd &= ~aer_cmd_mask;
> - writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
> -}
> -
> -/*
> - * Copy the AER capability registers using 32 bit read accesses.
> - * This is necessary because RCRB AER capability is MMIO mapped. Clear the
> - * status after copying.
> - *
> - * @aer_base: base address of AER capability block in RCRB
> - * @aer_regs: destination for copying AER capability
> - */
> -static bool cxl_rch_get_aer_info(void __iomem *aer_base,
> - struct aer_capability_regs *aer_regs)
> -{
> - int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32);
> - u32 *aer_regs_buf = (u32 *)aer_regs;
> - int n;
> -
> - if (!aer_base)
> - return false;
> -
> - /* Use readl() to guarantee 32-bit accesses */
> - for (n = 0; n < read_cnt; n++)
> - aer_regs_buf[n] = readl(aer_base + n * sizeof(u32));
> -
> - writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS);
> - writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS);
> -
> - return true;
> -}
> -
> -/* Get AER severity. Return false if there is no error. */
> -static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
> - int *severity)
> -{
> - if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
> - if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
> - *severity = AER_FATAL;
> - else
> - *severity = AER_NONFATAL;
> - return true;
> - }
> -
> - if (aer_regs->cor_status & ~aer_regs->cor_mask) {
> - *severity = AER_CORRECTABLE;
> - return true;
> - }
> -
> - return false;
> -}
> -
> -void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
> -{
> - struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> - struct aer_capability_regs aer_regs;
> - struct cxl_dport *dport;
> - int severity;
> -
> - struct cxl_port *port __free(put_cxl_port) =
> - cxl_pci_find_port(pdev, &dport);
> - if (!port)
> - return;
> -
> - if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
> - return;
> -
> - if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
> - return;
> -
> - pci_print_aer(pdev, severity, &aer_regs);
> -
> - if (severity == AER_CORRECTABLE)
> - cxl_handle_cor_ras(cxlds, dport->regs.ras);
> - else
> - cxl_handle_ras(cxlds, dport->regs.ras);
> -}
> -#endif
> -
> static int cxl_flit_size(struct pci_dev *pdev)
> {
> if (cxl_pci_flit_256(pdev))
> diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c
> new file mode 100644
> index 000000000000..ed58afd18ecc
> --- /dev/null
> +++ b/drivers/cxl/core/ras_rch.c
> @@ -0,0 +1,121 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2025 AMD Corporation. All rights reserved. */
> +
> +#include <linux/types.h>
> +#include <linux/aer.h>
> +#include "cxl.h"
> +#include "core.h"
> +#include "cxlmem.h"
> +
> +void cxl_dport_map_rch_aer(struct cxl_dport *dport)
> +{
> + resource_size_t aer_phys;
> + struct device *host;
> + u16 aer_cap;
> +
> + aer_cap = cxl_rcrb_to_aer(dport->dport_dev, dport->rcrb.base);
> + if (aer_cap) {
> + host = dport->reg_map.host;
> + aer_phys = aer_cap + dport->rcrb.base;
> + dport->regs.dport_aer =
> + devm_cxl_iomap_block(host, aer_phys,
> + sizeof(struct aer_capability_regs));
> + }
> +}
> +
> +void cxl_disable_rch_root_ints(struct cxl_dport *dport)
> +{
> + void __iomem *aer_base = dport->regs.dport_aer;
> + u32 aer_cmd_mask, aer_cmd;
> +
> + if (!aer_base)
> + return;
> +
> + /*
> + * Disable RCH root port command interrupts.
> + * CXL 3.0 12.2.1.1 - RCH Downstream Port-detected Errors
> + *
> + * This sequence may not be necessary. CXL spec states disabling
> + * the root cmd register's interrupts is required. But, PCI spec
> + * shows these are disabled by default on reset.
> + */
> + aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN |
> + PCI_ERR_ROOT_CMD_NONFATAL_EN |
> + PCI_ERR_ROOT_CMD_FATAL_EN);
> + aer_cmd = readl(aer_base + PCI_ERR_ROOT_COMMAND);
> + aer_cmd &= ~aer_cmd_mask;
> + writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
> +}
> +
> +/*
> + * Copy the AER capability registers using 32 bit read accesses.
> + * This is necessary because RCRB AER capability is MMIO mapped. Clear the
> + * status after copying.
> + *
> + * @aer_base: base address of AER capability block in RCRB
> + * @aer_regs: destination for copying AER capability
> + */
> +static bool cxl_rch_get_aer_info(void __iomem *aer_base,
> + struct aer_capability_regs *aer_regs)
> +{
> + int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32);
> + u32 *aer_regs_buf = (u32 *)aer_regs;
> + int n;
> +
> + if (!aer_base)
> + return false;
> +
> + /* Use readl() to guarantee 32-bit accesses */
> + for (n = 0; n < read_cnt; n++)
> + aer_regs_buf[n] = readl(aer_base + n * sizeof(u32));
> +
> + writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS);
> + writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS);
> +
> + return true;
> +}
> +
> +/* Get AER severity. Return false if there is no error. */
> +static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
> + int *severity)
> +{
> + if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
> + if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
> + *severity = AER_FATAL;
> + else
> + *severity = AER_NONFATAL;
> + return true;
> + }
> +
> + if (aer_regs->cor_status & ~aer_regs->cor_mask) {
> + *severity = AER_CORRECTABLE;
> + return true;
> + }
> +
> + return false;
> +}
> +
> +void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
> +{
> + struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> + struct aer_capability_regs aer_regs;
> + struct cxl_dport *dport;
> + int severity;
> +
> + struct cxl_port *port __free(put_cxl_port) =
> + cxl_pci_find_port(pdev, &dport);
> + if (!port)
> + return;
> +
> + if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
> + return;
> +
> + if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
> + return;
> +
> + pci_print_aer(pdev, severity, &aer_regs);
> + if (severity == AER_CORRECTABLE)
> + cxl_handle_cor_ras(cxlds, dport->regs.ras);
> + else
> + cxl_handle_ras(cxlds, dport->regs.ras);
> +}
> diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> index b7ea66382f3b..6eceefefb0e0 100644
> --- a/tools/testing/cxl/Kbuild
> +++ b/tools/testing/cxl/Kbuild
> @@ -63,6 +63,7 @@ cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o
> cxl_core-$(CONFIG_CXL_FEATURES) += $(CXL_CORE_SRC)/features.o
> cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += $(CXL_CORE_SRC)/edac.o
> cxl_core-$(CONFIG_CXL_RAS) += $(CXL_CORE_SRC)/ras.o
> +cxl_core-$(CONFIG_CXL_RAS) += $(CXL_CORE_SRC)/ras_rch.o
> cxl_core-y += config_check.o
> cxl_core-y += cxl_core_test.o
> cxl_core-y += cxl_core_exports.o
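
For readers following the severity decision in cxl_rch_get_aer_severity() quoted above, here is a minimal user-space sketch of the same ordering: unmasked uncorrectable status is checked first, then unmasked correctable status. All demo_* names, the reduced register struct and DEMO_FATAL_BIT are assumptions made for illustration. They are not kernel definitions, and the bit only stands in for the PCI_ERR_ROOT_FATAL_RCV test in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Demo-only stand-ins for AER_NONFATAL, AER_FATAL and AER_CORRECTABLE. */
enum demo_severity { DEMO_NONFATAL, DEMO_FATAL, DEMO_CORRECTABLE };

/* Hypothetical subset of struct aer_capability_regs. */
struct demo_aer_regs {
	uint32_t uncor_status;
	uint32_t uncor_mask;
	uint32_t cor_status;
	uint32_t cor_mask;
};

/* Hypothetical bit that marks a fatal condition in this sketch. */
#define DEMO_FATAL_BIT (UINT32_C(1) << 6)

/* Return false if no unmasked error is pending, else report its severity. */
static bool demo_get_severity(const struct demo_aer_regs *regs,
			      enum demo_severity *sev)
{
	/* Uncorrectable errors take priority over correctable ones. */
	if (regs->uncor_status & ~regs->uncor_mask) {
		*sev = (regs->uncor_status & DEMO_FATAL_BIT) ?
			DEMO_FATAL : DEMO_NONFATAL;
		return true;
	}

	if (regs->cor_status & ~regs->cor_mask) {
		*sev = DEMO_CORRECTABLE;
		return true;
	}

	return false;
}
```

A status bit that is also set in the matching mask register is ignored, so a fully masked uncorrectable error falls through to the correctable check.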
* Re: [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors()
2026-01-14 18:20 ` [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman
2026-01-14 19:01 ` Jonathan Cameron
2026-01-14 19:09 ` Kuppuswamy Sathyanarayanan
@ 2026-01-14 20:40 ` Dave Jiang
2026-01-20 2:09 ` dan.j.williams
2026-01-22 18:49 ` Bjorn Helgaas
4 siblings, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 20:40 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> Internal PCIe errors are not enabled by default during initialization. This
> creates a problem for CXL drivers, which rely on PCIe Correctable and
> Uncorrectable Internal Errors to receive CXL protocol error notifications.
>
> Export pci_aer_unmask_internal_errors() so CXL and other drivers can
> enable internal PCIe errors.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> ---
>
> Changes in v13->v14:
> - New commit. Bjorn requested separating this out and adding it immediately
> before it is used. It is called from cxl_rch_enable_rcec() in the
> following patch.
> ---
> drivers/pci/pcie/aer.c | 6 +++---
> include/linux/aer.h | 2 ++
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index c99ba2a1159c..63658e691aa2 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1120,8 +1120,6 @@ static bool find_source_device(struct pci_dev *parent,
> return true;
> }
>
> -#ifdef CONFIG_PCIEAER_CXL
> -
> /**
> * pci_aer_unmask_internal_errors - unmask internal errors
> * @dev: pointer to the pci_dev data structure
> @@ -1132,7 +1130,7 @@ static bool find_source_device(struct pci_dev *parent,
> * Note: AER must be enabled and supported by the device which must be
> * checked in advance, e.g. with pcie_aer_is_native().
> */
> -static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> +void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> {
> int aer = dev->aer_cap;
> u32 mask;
> @@ -1145,7 +1143,9 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> mask &= ~PCI_ERR_COR_INTERNAL;
> pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
> }
> +EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors);
>
> +#ifdef CONFIG_PCIEAER_CXL
> static bool is_cxl_mem_dev(struct pci_dev *dev)
> {
> /*
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 02940be66324..df0f5c382286 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -56,12 +56,14 @@ struct aer_capability_regs {
> #if defined(CONFIG_PCIEAER)
> int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> int pcie_aer_is_native(struct pci_dev *dev);
> +void pci_aer_unmask_internal_errors(struct pci_dev *dev);
> #else
> static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> {
> return -EINVAL;
> }
> static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
> +static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
> #endif
>
> void pci_print_aer(struct pci_dev *dev, int aer_severity,
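
As an illustration of what "unmasking internal errors" means at the register level, the sketch below clears the internal-error bits in in-memory copies of the two AER mask registers. It is a user-space model under stated assumptions: the demo_* names and struct are hypothetical, the bit values are believed to match PCI_ERR_UNC_INTN and PCI_ERR_COR_INTERNAL in pci_regs.h, and the uncorrectable half is assumed to mirror the correctable half visible in the hunk above. The real helper reads and writes config space instead.

```c
#include <stdint.h>

/* Assumed to match PCI_ERR_UNC_INTN and PCI_ERR_COR_INTERNAL. */
#define DEMO_ERR_UNC_INTN     UINT32_C(0x00400000) /* Uncorrectable Internal Error */
#define DEMO_ERR_COR_INTERNAL UINT32_C(0x00004000) /* Corrected Internal Error */

/* Hypothetical in-memory copy of the two AER mask registers. */
struct demo_aer_masks {
	uint32_t uncor_mask;
	uint32_t cor_mask;
};

/* Clear only the internal-error mask bits; all other bits are preserved. */
static void demo_unmask_internal_errors(struct demo_aer_masks *m)
{
	m->uncor_mask &= ~DEMO_ERR_UNC_INTN;
	m->cor_mask &= ~DEMO_ERR_COR_INTERNAL;
}
```

A cleared mask bit means the corresponding error is reported, which is why the helper clears bits rather than setting them.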
* Re: [PATCH v14 13/34] PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS
2026-01-14 18:20 ` [PATCH v14 13/34] PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS Terry Bowman
2026-01-14 19:12 ` Jonathan Cameron
@ 2026-01-14 20:49 ` Dave Jiang
2026-01-14 20:50 ` Dave Jiang
2026-01-22 18:24 ` Bjorn Helgaas
3 siblings, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 20:49 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> One of the primary reasons for the CXL driver to exist is to perform error
> handling. If both PCIEAER and CXL are enabled then light up CXL error
> handling as well. The work to remove CONFIG_PCIEAER_CXL started in:
>
> commit 4ae6ae66649c ("cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c")
>
> Finish that off with conditionally compiling all CXL RAS related helpers
> with CONFIG_CXL_RAS.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> ----
>
> Changes in v13->v14:
> - New commit
> ---
> drivers/cxl/Kconfig | 2 +-
> drivers/pci/pcie/Kconfig | 9 ---------
> 2 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 217888992c88..70acddc08c39 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -235,6 +235,6 @@ config CXL_MCE
>
> config CXL_RAS
> def_bool y
> - depends on ACPI_APEI_GHES && PCIEAER && CXL_PCI
> + depends on ACPI_APEI_GHES && PCIEAER && CXL_BUS
>
> endif
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 17919b99fa66..207c2deae35f 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -49,15 +49,6 @@ config PCIEAER_INJECT
> gotten from:
> https://github.com/intel/aer-inject.git
>
> -config PCIEAER_CXL
> - bool "PCI Express CXL RAS support"
> - default y
> - depends on PCIEAER && CXL_PCI
> - help
> - Enables CXL error handling.
> -
> - If unsure, say Y.
> -
> #
> # PCI Express ECRC
> #
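
The commit message describes compiling all CXL RAS helpers conditionally on CONFIG_CXL_RAS. A minimal sketch of the usual pattern follows: real prototypes when the option is on, empty static inline stubs when it is off, so callers need no #ifdef. DEMO_CXL_RAS and every demo_* name are hypothetical stand-ins, not symbols from this series.

```c
#include <stdbool.h>

/* Build with -DDEMO_CXL_RAS to select the real helpers (defined elsewhere). */
#ifdef DEMO_CXL_RAS
int demo_ras_init(void);
bool demo_handle_ras(void *ras_base);
#else
/* Stubs: succeed silently and report that no error was handled. */
static inline int demo_ras_init(void)
{
	return 0;
}

static inline bool demo_handle_ras(void *ras_base)
{
	(void)ras_base;
	return false;
}
#endif

/* Caller code is identical in both configurations. */
static int demo_probe(void *ras_base)
{
	int rc = demo_ras_init();

	if (rc)
		return rc;

	return demo_handle_ras(ras_base) ? 1 : 0;
}
```

With the option disabled the compiler can drop the stub calls entirely, which is what allows the #ifdef blocks to move out of the .c files and into one header.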
* Re: [PATCH v14 13/34] PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS
2026-01-14 18:20 ` [PATCH v14 13/34] PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS Terry Bowman
2026-01-14 19:12 ` Jonathan Cameron
2026-01-14 20:49 ` Dave Jiang
@ 2026-01-14 20:50 ` Dave Jiang
2026-01-22 18:24 ` Bjorn Helgaas
3 siblings, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 20:50 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> One of the primary reasons for the CXL driver to exist is to perform error
> handling. If both PCIEAER and CXL are enabled then light up CXL error
> handling as well. The work to remove CONFIG_PCIEAER_CXL started in:
>
> commit 4ae6ae66649c ("cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c")
>
> Finish that off with conditionally compiling all CXL RAS related helpers
> with CONFIG_CXL_RAS.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Terry, if you are including this patch from Dan in your series, you need to sign off on it.
>
> ----
>
> Changes in v13->v14:
> - New commit
> ---
> drivers/cxl/Kconfig | 2 +-
> drivers/pci/pcie/Kconfig | 9 ---------
> 2 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 217888992c88..70acddc08c39 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -235,6 +235,6 @@ config CXL_MCE
>
> config CXL_RAS
> def_bool y
> - depends on ACPI_APEI_GHES && PCIEAER && CXL_PCI
> + depends on ACPI_APEI_GHES && PCIEAER && CXL_BUS
>
> endif
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 17919b99fa66..207c2deae35f 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -49,15 +49,6 @@ config PCIEAER_INJECT
> gotten from:
> https://github.com/intel/aer-inject.git
>
> -config PCIEAER_CXL
> - bool "PCI Express CXL RAS support"
> - default y
> - depends on PCIEAER && CXL_PCI
> - help
> - Enables CXL error handling.
> -
> - If unsure, say Y.
> -
> #
> # PCI Express ECRC
> #
* Re: [PATCH v14 07/34] cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c
2026-01-14 18:20 ` [PATCH v14 07/34] cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c Terry Bowman
@ 2026-01-14 20:51 ` Dave Jiang
0 siblings, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 20:51 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> From: Dave Jiang <dave.jiang@intel.com>
>
> Create new config CONFIG_CXL_RAS and put all CXL RAS items behind the
> config. The config will depend on CPER and PCIE AER to build. Move the
> related VH RAS code from core/pci.c to core/ras.c.
>
> Restricted CXL host (RCH) RAS functions will be moved in a future patch.
>
> Cc: Robert Richter <rrichter@amd.com>
> Cc: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Terry, missing your sign off as well.
>
> ---
>
> Changes in v13->v14:
> - None
>
> Changes in v12->v13:
> - None
>
> Changes in v11->v12:
> - None
>
> Changes in v10->v11:
> - New patch
> - Updated by Terry Bowman to use (ACPI_APEI_GHES && PCIEAER_CXL) dependency
> in Kconfig. Otherwise checks will be required for CONFIG_PCIEAER because
> AER driver functions are called.
> ---
> drivers/cxl/Kconfig | 4 +
> drivers/cxl/core/Makefile | 2 +-
> drivers/cxl/core/core.h | 31 +++++++
> drivers/cxl/core/pci.c | 189 +-------------------------------------
> drivers/cxl/core/ras.c | 176 +++++++++++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 8 --
> drivers/cxl/cxlpci.h | 16 ++++
> tools/testing/cxl/Kbuild | 2 +-
> 8 files changed, 233 insertions(+), 195 deletions(-)
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 48b7314afdb8..217888992c88 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -233,4 +233,8 @@ config CXL_MCE
> def_bool y
> depends on X86_MCE && MEMORY_FAILURE
>
> +config CXL_RAS
> + def_bool y
> + depends on ACPI_APEI_GHES && PCIEAER && CXL_PCI
> +
> endif
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 5ad8fef210b5..b2930cc54f8b 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -14,9 +14,9 @@ cxl_core-y += pci.o
> cxl_core-y += hdm.o
> cxl_core-y += pmu.o
> cxl_core-y += cdat.o
> -cxl_core-y += ras.o
> cxl_core-$(CONFIG_TRACING) += trace.o
> cxl_core-$(CONFIG_CXL_REGION) += region.o
> cxl_core-$(CONFIG_CXL_MCE) += mce.o
> cxl_core-$(CONFIG_CXL_FEATURES) += features.o
> cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
> +cxl_core-$(CONFIG_CXL_RAS) += ras.o
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 1fb66132b777..bc818de87ccc 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -144,8 +144,39 @@ int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c);
> int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
> struct access_coordinate *c);
>
> +#ifdef CONFIG_CXL_RAS
> int cxl_ras_init(void);
> void cxl_ras_exit(void);
> +bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
> +void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
> +#else
> +static inline int cxl_ras_init(void)
> +{
> + return 0;
> +}
> +
> +static inline void cxl_ras_exit(void)
> +{
> +}
> +
> +static inline bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
> +{
> + return false;
> +}
> +static inline void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base) { }
> +#endif /* CONFIG_CXL_RAS */
> +
> +/* Restricted CXL Host specific RAS functions */
> +#ifdef CONFIG_CXL_RAS
> +void cxl_dport_map_rch_aer(struct cxl_dport *dport);
> +void cxl_disable_rch_root_ints(struct cxl_dport *dport);
> +void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);
> +#else
> +static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
> +static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
> +static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
> +#endif /* CONFIG_CXL_RAS */
> +
> int cxl_gpf_port_setup(struct cxl_dport *dport);
>
> struct cxl_hdm;
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 51bb0f372e40..e132fff80979 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -632,81 +632,8 @@ void read_cdat_data(struct cxl_port *port)
> }
> EXPORT_SYMBOL_NS_GPL(read_cdat_data, "CXL");
>
> -static void cxl_handle_cor_ras(struct cxl_dev_state *cxlds,
> - void __iomem *ras_base)
> -{
> - void __iomem *addr;
> - u32 status;
> -
> - if (!ras_base)
> - return;
> -
> - addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
> - status = readl(addr);
> - if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
> - writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> - trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
> - }
> -}
> -
> -/* CXL spec rev3.0 8.2.4.16.1 */
> -static void header_log_copy(void __iomem *ras_base, u32 *log)
> -{
> - void __iomem *addr;
> - u32 *log_addr;
> - int i, log_u32_size = CXL_HEADERLOG_SIZE / sizeof(u32);
> -
> - addr = ras_base + CXL_RAS_HEADER_LOG_OFFSET;
> - log_addr = log;
> -
> - for (i = 0; i < log_u32_size; i++) {
> - *log_addr = readl(addr);
> - log_addr++;
> - addr += sizeof(u32);
> - }
> -}
> -
> -/*
> - * Log the state of the RAS status registers and prepare them to log the
> - * next error status. Return 1 if reset needed.
> - */
> -static bool cxl_handle_ras(struct cxl_dev_state *cxlds,
> - void __iomem *ras_base)
> -{
> - u32 hl[CXL_HEADERLOG_SIZE_U32];
> - void __iomem *addr;
> - u32 status;
> - u32 fe;
> -
> - if (!ras_base)
> - return false;
> -
> - addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
> - status = readl(addr);
> - if (!(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK))
> - return false;
> -
> - /* If multiple errors, log header points to first error from ctrl reg */
> - if (hweight32(status) > 1) {
> - void __iomem *rcc_addr =
> - ras_base + CXL_RAS_CAP_CONTROL_OFFSET;
> -
> - fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
> - readl(rcc_addr)));
> - } else {
> - fe = status;
> - }
> -
> - header_log_copy(ras_base, hl);
> - trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe, hl);
> - writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
> -
> - return true;
> -}
> -
> -#ifdef CONFIG_PCIEAER_CXL
> -
> -static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
> +#ifdef CONFIG_CXL_RAS
> +void cxl_dport_map_rch_aer(struct cxl_dport *dport)
> {
> resource_size_t aer_phys;
> struct device *host;
> @@ -721,19 +648,7 @@ static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
> }
> }
>
> -static void cxl_dport_map_ras(struct cxl_dport *dport)
> -{
> - struct cxl_register_map *map = &dport->reg_map;
> - struct device *dev = dport->dport_dev;
> -
> - if (!map->component_map.ras.valid)
> - dev_dbg(dev, "RAS registers not found\n");
> - else if (cxl_map_component_regs(map, &dport->regs.component,
> - BIT(CXL_CM_CAP_CAP_ID_RAS)))
> - dev_dbg(dev, "Failed to map RAS capability.\n");
> -}
> -
> -static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
> +void cxl_disable_rch_root_ints(struct cxl_dport *dport)
> {
> void __iomem *aer_base = dport->regs.dport_aer;
> u32 aer_cmd_mask, aer_cmd;
> @@ -757,28 +672,6 @@ static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
> writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
> }
>
> -/**
> - * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
> - * @dport: the cxl_dport that needs to be initialized
> - * @host: host device for devm operations
> - */
> -void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
> -{
> - dport->reg_map.host = host;
> - cxl_dport_map_ras(dport);
> -
> - if (dport->rch) {
> - struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport->dport_dev);
> -
> - if (!host_bridge->native_aer)
> - return;
> -
> - cxl_dport_map_rch_aer(dport);
> - cxl_disable_rch_root_ints(dport);
> - }
> -}
> -EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
> -
> /*
> * Copy the AER capability registers using 32 bit read accesses.
> * This is necessary because RCRB AER capability is MMIO mapped. Clear the
> @@ -827,7 +720,7 @@ static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
> return false;
> }
>
> -static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
> +void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
> {
> struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> struct aer_capability_regs aer_regs;
> @@ -852,82 +745,8 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
> else
> cxl_handle_ras(cxlds, dport->regs.ras);
> }
> -
> -#else
> -static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
> #endif
>
> -void cxl_cor_error_detected(struct pci_dev *pdev)
> -{
> - struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> - struct device *dev = &cxlds->cxlmd->dev;
> -
> - scoped_guard(device, dev) {
> - if (!dev->driver) {
> - dev_warn(&pdev->dev,
> - "%s: memdev disabled, abort error handling\n",
> - dev_name(dev));
> - return;
> - }
> -
> - if (cxlds->rcd)
> - cxl_handle_rdport_errors(cxlds);
> -
> - cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
> - }
> -}
> -EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
> -
> -pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> - pci_channel_state_t state)
> -{
> - struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> - struct cxl_memdev *cxlmd = cxlds->cxlmd;
> - struct device *dev = &cxlmd->dev;
> - bool ue;
> -
> - scoped_guard(device, dev) {
> - if (!dev->driver) {
> - dev_warn(&pdev->dev,
> - "%s: memdev disabled, abort error handling\n",
> - dev_name(dev));
> - return PCI_ERS_RESULT_DISCONNECT;
> - }
> -
> - if (cxlds->rcd)
> - cxl_handle_rdport_errors(cxlds);
> - /*
> - * A frozen channel indicates an impending reset which is fatal to
> - * CXL.mem operation, and will likely crash the system. On the off
> - * chance the situation is recoverable dump the status of the RAS
> - * capability registers and bounce the active state of the memdev.
> - */
> - ue = cxl_handle_ras(cxlds, cxlds->regs.ras);
> - }
> -
> -
> - switch (state) {
> - case pci_channel_io_normal:
> - if (ue) {
> - device_release_driver(dev);
> - return PCI_ERS_RESULT_NEED_RESET;
> - }
> - return PCI_ERS_RESULT_CAN_RECOVER;
> - case pci_channel_io_frozen:
> - dev_warn(&pdev->dev,
> - "%s: frozen state error detected, disable CXL.mem\n",
> - dev_name(dev));
> - device_release_driver(dev);
> - return PCI_ERS_RESULT_NEED_RESET;
> - case pci_channel_io_perm_failure:
> - dev_warn(&pdev->dev,
> - "failure state error detected, request disconnect\n");
> - return PCI_ERS_RESULT_DISCONNECT;
> - }
> - return PCI_ERS_RESULT_NEED_RESET;
> -}
> -EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
> -
> static int cxl_flit_size(struct pci_dev *pdev)
> {
> if (cxl_pci_flit_256(pdev))
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 2731ba3a0799..b933030b8e1e 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -5,6 +5,7 @@
> #include <linux/aer.h>
> #include <cxl/event.h>
> #include <cxlmem.h>
> +#include <cxlpci.h>
> #include "trace.h"
>
> static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,
> @@ -124,3 +125,178 @@ void cxl_ras_exit(void)
> cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
> cancel_work_sync(&cxl_cper_prot_err_work);
> }
> +
> +static void cxl_dport_map_ras(struct cxl_dport *dport)
> +{
> + struct cxl_register_map *map = &dport->reg_map;
> + struct device *dev = dport->dport_dev;
> +
> + if (!map->component_map.ras.valid)
> + dev_dbg(dev, "RAS registers not found\n");
> + else if (cxl_map_component_regs(map, &dport->regs.component,
> + BIT(CXL_CM_CAP_CAP_ID_RAS)))
> + dev_dbg(dev, "Failed to map RAS capability.\n");
> +}
> +
> +/**
> + * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
> + * @dport: the cxl_dport that needs to be initialized
> + * @host: host device for devm operations
> + */
> +void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
> +{
> + dport->reg_map.host = host;
> + cxl_dport_map_ras(dport);
> +
> + if (dport->rch) {
> + struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport->dport_dev);
> +
> + if (!host_bridge->native_aer)
> + return;
> +
> + cxl_dport_map_rch_aer(dport);
> + cxl_disable_rch_root_ints(dport);
> + }
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
> +
> +void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
> +{
> + void __iomem *addr;
> + u32 status;
> +
> + if (!ras_base)
> + return;
> +
> + addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
> + status = readl(addr);
> + if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
> + writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> + trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
> + }
> +}
> +
> +/* CXL spec rev3.0 8.2.4.16.1 */
> +static void header_log_copy(void __iomem *ras_base, u32 *log)
> +{
> + void __iomem *addr;
> + u32 *log_addr;
> + int i, log_u32_size = CXL_HEADERLOG_SIZE / sizeof(u32);
> +
> + addr = ras_base + CXL_RAS_HEADER_LOG_OFFSET;
> + log_addr = log;
> +
> + for (i = 0; i < log_u32_size; i++) {
> + *log_addr = readl(addr);
> + log_addr++;
> + addr += sizeof(u32);
> + }
> +}
> +
> +/*
> + * Log the state of the RAS status registers and prepare them to log the
> + * next error status. Return 1 if reset needed.
> + */
> +bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
> +{
> + u32 hl[CXL_HEADERLOG_SIZE_U32];
> + void __iomem *addr;
> + u32 status;
> + u32 fe;
> +
> + if (!ras_base)
> + return false;
> +
> + addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
> + status = readl(addr);
> + if (!(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK))
> + return false;
> +
> + /* If multiple errors, log header points to first error from ctrl reg */
> + if (hweight32(status) > 1) {
> + void __iomem *rcc_addr =
> + ras_base + CXL_RAS_CAP_CONTROL_OFFSET;
> +
> + fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
> + readl(rcc_addr)));
> + } else {
> + fe = status;
> + }
> +
> + header_log_copy(ras_base, hl);
> + trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe, hl);
> + writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
> +
> + return true;
> +}
> +
> +void cxl_cor_error_detected(struct pci_dev *pdev)
> +{
> + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> + struct device *dev = &cxlds->cxlmd->dev;
> +
> + scoped_guard(device, dev) {
> + if (!dev->driver) {
> + dev_warn(&pdev->dev,
> + "%s: memdev disabled, abort error handling\n",
> + dev_name(dev));
> + return;
> + }
> +
> + if (cxlds->rcd)
> + cxl_handle_rdport_errors(cxlds);
> +
> + cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
> + }
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
> +
> +pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> + pci_channel_state_t state)
> +{
> + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> + struct cxl_memdev *cxlmd = cxlds->cxlmd;
> + struct device *dev = &cxlmd->dev;
> + bool ue;
> +
> + scoped_guard(device, dev) {
> + if (!dev->driver) {
> + dev_warn(&pdev->dev,
> + "%s: memdev disabled, abort error handling\n",
> + dev_name(dev));
> + return PCI_ERS_RESULT_DISCONNECT;
> + }
> +
> + if (cxlds->rcd)
> + cxl_handle_rdport_errors(cxlds);
> + /*
> + * A frozen channel indicates an impending reset which is fatal to
> + * CXL.mem operation, and will likely crash the system. On the off
> + * chance the situation is recoverable dump the status of the RAS
> + * capability registers and bounce the active state of the memdev.
> + */
> + ue = cxl_handle_ras(cxlds, cxlds->regs.ras);
> + }
> +
> +
> + switch (state) {
> + case pci_channel_io_normal:
> + if (ue) {
> + device_release_driver(dev);
> + return PCI_ERS_RESULT_NEED_RESET;
> + }
> + return PCI_ERS_RESULT_CAN_RECOVER;
> + case pci_channel_io_frozen:
> + dev_warn(&pdev->dev,
> + "%s: frozen state error detected, disable CXL.mem\n",
> + dev_name(dev));
> + device_release_driver(dev);
> + return PCI_ERS_RESULT_NEED_RESET;
> + case pci_channel_io_perm_failure:
> + dev_warn(&pdev->dev,
> + "failure state error detected, request disconnect\n");
> + return PCI_ERS_RESULT_DISCONNECT;
> + }
> + return PCI_ERS_RESULT_NEED_RESET;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index ba17fa86d249..42a76a7a088f 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -803,14 +803,6 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
> struct device *dport_dev, int port_id,
> resource_size_t rcrb);
>
> -#ifdef CONFIG_PCIEAER_CXL
> -void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport);
> -void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
> -#else
> -static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
> - struct device *host) { }
> -#endif
> -
> struct cxl_decoder *to_cxl_decoder(struct device *dev);
> struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
> struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index cdb7cf3dbcb4..6f9c78886fd9 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -76,7 +76,23 @@ static inline bool cxl_pci_flit_256(struct pci_dev *pdev)
>
> struct cxl_dev_state;
> void read_cdat_data(struct cxl_port *port);
> +
> +#ifdef CONFIG_CXL_RAS
> void cxl_cor_error_detected(struct pci_dev *pdev);
> pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> pci_channel_state_t state);
> +void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
> +#else
> +static inline void cxl_cor_error_detected(struct pci_dev *pdev) { }
> +
> +static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> + pci_channel_state_t state)
> +{
> + return PCI_ERS_RESULT_NONE;
> +}
> +
> +static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
> + struct device *host) { }
> +#endif
> +
> #endif /* __CXL_PCI_H__ */
> diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> index 0e151d0572d1..b7ea66382f3b 100644
> --- a/tools/testing/cxl/Kbuild
> +++ b/tools/testing/cxl/Kbuild
> @@ -57,12 +57,12 @@ cxl_core-y += $(CXL_CORE_SRC)/pci.o
> cxl_core-y += $(CXL_CORE_SRC)/hdm.o
> cxl_core-y += $(CXL_CORE_SRC)/pmu.o
> cxl_core-y += $(CXL_CORE_SRC)/cdat.o
> -cxl_core-y += $(CXL_CORE_SRC)/ras.o
> cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o
> cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o
> cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o
> cxl_core-$(CONFIG_CXL_FEATURES) += $(CXL_CORE_SRC)/features.o
> cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += $(CXL_CORE_SRC)/edac.o
> +cxl_core-$(CONFIG_CXL_RAS) += $(CXL_CORE_SRC)/ras.o
> cxl_core-y += config_check.o
> cxl_core-y += cxl_core_test.o
> cxl_core-y += cxl_core_exports.o
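
The first-error selection in cxl_handle_ras() quoted above can be modelled in user space as follows. If more than one uncorrectable status bit is set, the first-error pointer from the control register picks the bit to report; otherwise the status itself is the first error. DEMO_FE_MASK assumes the pointer occupies control register bits 5:0, and the range guard is an addition for this sketch that keeps the 32-bit shift well defined. All demo_* names are hypothetical.

```c
#include <stdint.h>

/* Assumption: first-error pointer lives in control register bits 5:0. */
#define DEMO_FE_MASK UINT32_C(0x3f)

/* Return the status bit to report as the first error. */
static uint32_t demo_first_error(uint32_t status, uint32_t cap_control)
{
	uint32_t ptr;

	/* status & (status - 1) is non-zero only when two or more bits are set. */
	if (!(status & (status - 1)))
		return status;

	ptr = cap_control & DEMO_FE_MASK;
	if (ptr >= 32)
		return 0; /* sketch-only guard: pointer outside a 32-bit status */

	return UINT32_C(1) << ptr;
}
```

For example, a status of 0x5 with a pointer value of 2 reports bit 2 (0x4) as the first error, while a single-bit status is returned unchanged regardless of the pointer.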
* Re: [PATCH v14 14/34] PCI/AER: Report CXL or PCIe bus type in AER trace logging
2026-01-14 18:20 ` [PATCH v14 14/34] PCI/AER: Report CXL or PCIe bus type in AER trace logging Terry Bowman
2026-01-14 19:45 ` Jonathan Cameron
@ 2026-01-14 20:56 ` Dave Jiang
1 sibling, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 20:56 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> The AER service driver and aer_event tracing currently log 'PCIe Bus Type'
> for all errors. Update the driver and aer_event tracing to log 'CXL Bus
> Type' for CXL device errors.
>
> This requires that AER can identify and distinguish between PCIe errors and
> CXL errors.
>
> Introduce boolean 'is_cxl' to 'struct aer_err_info'. Add assignment in
> aer_get_device_error_info() and pci_print_aer().
>
> Update the aer_event trace routine to accept a bus type string parameter.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> ---
>
> Changes in v13->v14:
> - Merged with Dan's commit. The changes move bus_type to be the last
> parameter in function calls (Dan)
> - Removed all DCOs because of changes (Terry)
> - Update commit message (Bjorn)
> - Add Bjorn's ack-by
>
> Changes in v12->v13:
> - Remove duplicated aer_err_info inline comments. Is already in the
> kernel-doc header (Ben)
>
> Changes in v11->v12:
> - Change aer_err_info::is_cxl to be a bool bitfield. Update structure
> padding. (Lukas)
> - Add kernel-doc for 'struct aer_err_info' (Lukas)
>
> Changes in v10->v11:
> - Remove duplicate call to trace_aer_event() (Shiju)
> - Added Dan Williams' and Dave Jiang's reviewed-by
> ---
> drivers/pci/pci.h | 8 +++++++-
> drivers/pci/pcie/aer.c | 20 +++++++++++++-------
> include/ras/ras_event.h | 12 ++++++++----
> 3 files changed, 28 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 0e67014aa001..41ec38e82c08 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -738,7 +738,8 @@ struct aer_err_info {
> unsigned int multi_error_valid:1;
>
> unsigned int first_error:5;
> - unsigned int __pad2:2;
> + unsigned int __pad2:1;
> + unsigned int is_cxl:1;
> unsigned int tlp_header_valid:1;
>
> unsigned int status; /* COR/UNCOR Error Status */
> @@ -749,6 +750,11 @@ struct aer_err_info {
> int aer_get_device_error_info(struct aer_err_info *info, int i);
> void aer_print_error(struct aer_err_info *info, int i);
>
> +static inline const char *aer_err_bus(struct aer_err_info *info)
> +{
> + return info->is_cxl ? "CXL" : "PCIe";
> +}
> +
> int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
> unsigned int tlp_len, bool flit,
> struct pcie_tlp_log *log);
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index b1e6ee7468b9..d30a217fae46 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -870,6 +870,7 @@ void aer_print_error(struct aer_err_info *info, int i)
> struct pci_dev *dev;
> int layer, agent, id;
> const char *level = info->level;
> + const char *bus_type = aer_err_bus(info);
>
> if (WARN_ON_ONCE(i >= AER_MAX_MULTI_ERR_DEVICES))
> return;
> @@ -879,22 +880,22 @@ void aer_print_error(struct aer_err_info *info, int i)
>
> pci_dev_aer_stats_incr(dev, info);
> trace_aer_event(pci_name(dev), (info->status & ~info->mask),
> - info->severity, info->tlp_header_valid, &info->tlp);
> + info->severity, info->tlp_header_valid, &info->tlp, bus_type);
>
> if (!info->ratelimit_print[i])
> return;
>
> if (!info->status) {
> - pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
> - aer_error_severity_string[info->severity]);
> + pci_err(dev, "%s Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
> + bus_type, aer_error_severity_string[info->severity]);
> goto out;
> }
>
> layer = AER_GET_LAYER_ERROR(info->severity, info->status);
> agent = AER_GET_AGENT(info->severity, info->status);
>
> - aer_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> - aer_error_severity_string[info->severity],
> + aer_printk(level, dev, "%s Bus Error: severity=%s, type=%s, (%s)\n",
> + bus_type, aer_error_severity_string[info->severity],
> aer_error_layer[layer], aer_agent_string[agent]);
>
> aer_printk(level, dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
> @@ -928,6 +929,7 @@ EXPORT_SYMBOL_GPL(cper_severity_to_aer);
> void pci_print_aer(struct pci_dev *dev, int aer_severity,
> struct aer_capability_regs *aer)
> {
> + const char *bus_type;
> int layer, agent, tlp_header_valid = 0;
> u32 status, mask;
> struct aer_err_info info = {
> @@ -948,10 +950,13 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
>
> info.status = status;
> info.mask = mask;
> + info.is_cxl = pcie_is_cxl(dev);
> +
> + bus_type = aer_err_bus(&info);
>
> pci_dev_aer_stats_incr(dev, &info);
> - trace_aer_event(pci_name(dev), (status & ~mask),
> - aer_severity, tlp_header_valid, &aer->header_log);
> + trace_aer_event(pci_name(dev), (status & ~mask), aer_severity,
> + tlp_header_valid, &aer->header_log, bus_type);
>
> if (!aer_ratelimit(dev, info.severity))
> return;
> @@ -1301,6 +1306,7 @@ int aer_get_device_error_info(struct aer_err_info *info, int i)
> /* Must reset in this function */
> info->status = 0;
> info->tlp_header_valid = 0;
> + info->is_cxl = pcie_is_cxl(dev);
>
> /* The device might not support AER */
> if (!aer)
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index eaecc3c5f772..fdb785fa4613 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -339,9 +339,11 @@ TRACE_EVENT(aer_event,
> const u32 status,
> const u8 severity,
> const u8 tlp_header_valid,
> - struct pcie_tlp_log *tlp),
> + struct pcie_tlp_log *tlp,
> + const char *bus_type),
>
> - TP_ARGS(dev_name, status, severity, tlp_header_valid, tlp),
> +
> + TP_ARGS(dev_name, status, severity, tlp_header_valid, tlp, bus_type),
>
> TP_STRUCT__entry(
> __string( dev_name, dev_name )
> @@ -349,10 +351,12 @@ TRACE_EVENT(aer_event,
> __field( u8, severity )
> __field( u8, tlp_header_valid)
> __array( u32, tlp_header, PCIE_STD_MAX_TLP_HEADERLOG)
> + __string( bus_type, bus_type )
> ),
>
> TP_fast_assign(
> __assign_str(dev_name);
> + __assign_str(bus_type);
> __entry->status = status;
> __entry->severity = severity;
> __entry->tlp_header_valid = tlp_header_valid;
> @@ -364,8 +368,8 @@ TRACE_EVENT(aer_event,
> }
> ),
>
> - TP_printk("%s PCIe Bus Error: severity=%s, %s, TLP Header=%s\n",
> - __get_str(dev_name),
> + TP_printk("%s %s Bus Error: severity=%s, %s, TLP Header=%s\n",
> + __get_str(dev_name), __get_str(bus_type),
> __entry->severity == AER_CORRECTABLE ? "Corrected" :
> __entry->severity == AER_FATAL ?
> "Fatal" : "Uncorrected, non-fatal",
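Editorial sketch of the change quoted above: one padding bit in the
bitfield group becomes the is_cxl flag, and a small helper turns that
flag into the bus-type string used in the log prefix. This is a
user-space model, not the kernel code. The struct is reduced to the
bitfields around is_cxl, and the names aer_err_info_sketch,
aer_err_bus_sketch() and format_bus_error() are hypothetical stand-ins
introduced only for illustration.

```c
#include <stdio.h>

/* Reduced stand-in for struct aer_err_info: only the bitfield group
 * that contains is_cxl is modeled (assumption for illustration). */
struct aer_err_info_sketch {
	unsigned int first_error:5;
	unsigned int __pad2:1;		/* 2 bits wide before is_cxl took one */
	unsigned int is_cxl:1;
	unsigned int tlp_header_valid:1;
};

/* Same selection logic as aer_err_bus() in the quoted patch. */
static inline const char *
aer_err_bus_sketch(const struct aer_err_info_sketch *info)
{
	return info->is_cxl ? "CXL" : "PCIe";
}

/* Hypothetical helper: builds the "<bus> Bus Error: severity=..." prefix
 * that the patch passes to pci_err()/aer_printk() as a format string. */
static int format_bus_error(char *buf, size_t len,
			    const struct aer_err_info_sketch *info,
			    const char *severity)
{
	return snprintf(buf, len, "%s Bus Error: severity=%s",
			aer_err_bus_sketch(info), severity);
}
```

Because the flag is carved out of existing padding, the 8-bit group
(5 + 1 + 1 + 1) keeps its width, which is presumably why the patch
shrinks __pad2 rather than appending a new member.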
* Re: [PATCH v14 15/34] PCI/AER: Update struct aer_err_info with kernel-doc formatting
2026-01-14 18:20 ` [PATCH v14 15/34] PCI/AER: Update struct aer_err_info with kernel-doc formatting Terry Bowman
2026-01-14 19:48 ` Jonathan Cameron
@ 2026-01-14 21:06 ` Dave Jiang
2026-01-22 18:29 ` Bjorn Helgaas
2 siblings, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 21:06 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> Update the existing 'struct aer_err_info' definition to use kernel-doc
> formatting. Remove the inline comments to reduce noise and do not introduce
> functional changes. This will improve readability and maintainability.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> ---
>
> Changes in v13->v14:
> - New commit
> ---
> drivers/pci/pci.h | 29 +++++++++++++++++++++++------
> 1 file changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 41ec38e82c08..dbc547db208a 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -724,16 +724,33 @@ static inline bool pci_dev_binding_disallowed(struct pci_dev *dev)
>
> #define AER_MAX_MULTI_ERR_DEVICES 5 /* Not likely to have more */
>
> +/**
> + * struct aer_err_info - AER Error Information
> + * @dev: Devices reporting error
> + * @ratelimit_print: Flag to log or not log the devices' error. 0=NotLog/1=Log
> + * @error_dev_num: Number of devices reporting an error
> + * @level: printk level to use in logging
> + * @id: Value from register PCI_ERR_ROOT_ERR_SRC
> + * @severity: AER severity, 0-UNCOR Non-fatal, 1-UNCOR fatal, 2-COR
> + * @root_ratelimit_print: Flag to log or not log the root's error. 0=NotLog/1=Log
> + * @multi_error_valid: If multiple errors are reported
> + * @first_error: First reported error
> + * @is_cxl: Bus type error: 0-PCI Bus error, 1-CXL Bus error
> + * @tlp_header_valid: Indicates if TLP field contains error information
> + * @status: COR/UNCOR error status
> + * @mask: COR/UNCOR mask
> + * @tlp: Transaction packet information
> + */
> struct aer_err_info {
> struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
> int ratelimit_print[AER_MAX_MULTI_ERR_DEVICES];
> int error_dev_num;
> - const char *level; /* printk level */
> + const char *level;
>
> unsigned int id:16;
>
> - unsigned int severity:2; /* 0:NONFATAL | 1:FATAL | 2:COR */
> - unsigned int root_ratelimit_print:1; /* 0=skip, 1=print */
> + unsigned int severity:2;
> + unsigned int root_ratelimit_print:1;
> unsigned int __pad1:4;
> unsigned int multi_error_valid:1;
>
> @@ -742,9 +759,9 @@ struct aer_err_info {
> unsigned int is_cxl:1;
> unsigned int tlp_header_valid:1;
>
> - unsigned int status; /* COR/UNCOR Error Status */
> - unsigned int mask; /* COR/UNCOR Error Mask */
> - struct pcie_tlp_log tlp; /* TLP Header */
> + unsigned int status;
> + unsigned int mask;
> + struct pcie_tlp_log tlp;
> };
>
> int aer_get_device_error_info(struct aer_err_info *info, int i);
* Re: [PATCH v14 16/34] cxl/mem: Clarify @host for devm_cxl_add_nvdimm()
2026-01-14 18:20 ` [PATCH v14 16/34] cxl/mem: Clarify @host for devm_cxl_add_nvdimm() Terry Bowman
2026-01-14 19:49 ` Jonathan Cameron
@ 2026-01-14 21:08 ` Dave Jiang
2026-01-16 3:07 ` dan.j.williams
1 sibling, 1 reply; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 21:08 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> The convention for devm_ helpers in the CXL driver is that the first
> argument is the @host for the operation (locked driver::probe() context).
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
A nit below
>
> ---
>
> Changes in v13 -> v14:
> - New patch
> ---
> drivers/cxl/core/pmem.c | 13 +++++++------
> drivers/cxl/cxl.h | 3 ++-
> drivers/cxl/mem.c | 2 +-
> 3 files changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c
> index 8853415c106a..e7b1e6fa0ea0 100644
> --- a/drivers/cxl/core/pmem.c
> +++ b/drivers/cxl/core/pmem.c
> @@ -237,12 +237,13 @@ static void cxlmd_release_nvdimm(void *_cxlmd)
>
> /**
> * devm_cxl_add_nvdimm() - add a bridge between a cxl_memdev and an nvdimm
> - * @parent_port: parent port for the (to be added) @cxlmd endpoint port
> - * @cxlmd: cxl_memdev instance that will perform LIBNVDIMM operations
> + * @host: host device for devm operations
> + * @port: any port in the CXL topology to find the nvdimm-bridge device
> + * @cxlmd: parent of the to be created cxl_nvdimm device
> *
> * Return: 0 on success negative error code on failure.
> */
> -int devm_cxl_add_nvdimm(struct cxl_port *parent_port,
> +int devm_cxl_add_nvdimm(struct device *host, struct cxl_port *port,
s/port/parent_port/ to maintain clarity of the port
DJ
> struct cxl_memdev *cxlmd)
> {
> struct cxl_nvdimm_bridge *cxl_nvb;
> @@ -250,7 +251,7 @@ int devm_cxl_add_nvdimm(struct cxl_port *parent_port,
> struct device *dev;
> int rc;
>
> - cxl_nvb = cxl_find_nvdimm_bridge(parent_port);
> + cxl_nvb = cxl_find_nvdimm_bridge(port);
> if (!cxl_nvb)
> return -ENODEV;
>
> @@ -270,10 +271,10 @@ int devm_cxl_add_nvdimm(struct cxl_port *parent_port,
> if (rc)
> goto err;
>
> - dev_dbg(&cxlmd->dev, "register %s\n", dev_name(dev));
> + dev_dbg(host, "register %s\n", dev_name(dev));
>
> /* @cxlmd carries a reference on @cxl_nvb until cxlmd_release_nvdimm */
> - return devm_add_action_or_reset(&cxlmd->dev, cxlmd_release_nvdimm, cxlmd);
> + return devm_add_action_or_reset(host, cxlmd_release_nvdimm, cxlmd);
>
> err:
> put_device(dev);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 42a76a7a088f..6f3741a57932 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -887,7 +887,8 @@ struct cxl_nvdimm_bridge *devm_cxl_add_nvdimm_bridge(struct device *host,
> struct cxl_port *port);
> struct cxl_nvdimm *to_cxl_nvdimm(struct device *dev);
> bool is_cxl_nvdimm(struct device *dev);
> -int devm_cxl_add_nvdimm(struct cxl_port *parent_port, struct cxl_memdev *cxlmd);
> +int devm_cxl_add_nvdimm(struct device *host, struct cxl_port *port,
> + struct cxl_memdev *cxlmd);
> struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_port *port);
>
> #ifdef CONFIG_CXL_REGION
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 6e6777b7bafb..c2ee7f7f6320 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -153,7 +153,7 @@ static int cxl_mem_probe(struct device *dev)
> }
>
> if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) {
> - rc = devm_cxl_add_nvdimm(parent_port, cxlmd);
> + rc = devm_cxl_add_nvdimm(dev, parent_port, cxlmd);
> if (rc) {
> if (rc == -ENODEV)
> dev_info(dev, "PMEM disabled by platform\n");
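Editorial sketch of why the @host argument matters: a devm release
action is tied to whichever device it is registered against, so passing
the host explicitly makes that lifetime visible at the call site. The
code below is a user-space model under stated assumptions. struct host,
host_add_action_or_reset(), host_detach() and add_nvdimm_sketch() are
hypothetical stand-ins, loosely modeled on devm_add_action_or_reset()
and driver detach, and are not kernel APIs.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry on a device's devres list. */
struct action {
	void (*fn)(void *data);
	void *data;
	struct action *next;
};

/* Hypothetical stand-in for the @host device. */
struct host {
	struct action *actions;	/* most recently registered first */
};

/* Models devm_add_action_or_reset(): if the action cannot be
 * registered, it runs immediately so nothing is left behind. */
static int host_add_action_or_reset(struct host *host,
				    void (*fn)(void *), void *data)
{
	struct action *a = malloc(sizeof(*a));

	if (!a) {
		fn(data);
		return -ENOMEM;
	}
	a->fn = fn;
	a->data = data;
	a->next = host->actions;
	host->actions = a;
	return 0;
}

/* Models driver detach: actions run in reverse registration order. */
static void host_detach(struct host *host)
{
	while (host->actions) {
		struct action *a = host->actions;

		host->actions = a->next;
		a->fn(a->data);
		free(a);
	}
}

static void release_nvdimm_sketch(void *data)
{
	*(int *)data = 0;	/* mark the object as released */
}

/* Host comes first, following the devm_ helper convention described
 * in the commit message. */
static int add_nvdimm_sketch(struct host *host, int *registered)
{
	*registered = 1;
	return host_add_action_or_reset(host, release_nvdimm_sketch,
					registered);
}
```

In this model, detaching an unrelated host leaves the object in place.
Only detaching the host that was passed in runs the release action.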
* Re: [PATCH v14 18/34] cxl/port: Remove "enumerate dports" helpers
2026-01-14 19:50 ` Jonathan Cameron
@ 2026-01-14 21:23 ` Dave Jiang
2026-01-16 3:15 ` dan.j.williams
1 sibling, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 21:23 UTC (permalink / raw)
To: Jonathan Cameron, Terry Bowman
Cc: dave, alison.schofield, dan.j.williams, bhelgaas, shiju.jose,
ming.li, Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On 1/14/26 12:50 PM, Jonathan Cameron wrote:
> On Wed, 14 Jan 2026 12:20:39 -0600
> Terry Bowman <terry.bowman@amd.com> wrote:
>
>> From: Dan Williams <dan.j.williams@intel.com>
>>
>> Now that cxl_switch_port_probe() no longer walks potential dports, because
>> they are enumerated dynamically on descendant endpoint arrival, remove the
>> dead code.
>>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
>
> Patch description doesn't match patch.
Looks like it's the clean up of this commit that's upstream.
Fixes: 3f5b8f7f34f6 ("cxl/port: Remove devm_cxl_port_enumerate_dports()")
>
>>
>> ---
>>
>> Changes in v13 -> v14:
>> - New patch
>> ---
>> drivers/cxl/core/pci.c | 16 ++++++++--------
>> 1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
>> index b838c59d7a3c..0305a421504e 100644
>> --- a/drivers/cxl/core/pci.c
>> +++ b/drivers/cxl/core/pci.c
>> @@ -71,6 +71,14 @@ struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
>> }
>> EXPORT_SYMBOL_NS_GPL(__devm_cxl_add_dport_by_dev, "CXL");
>>
>> +struct cxl_walk_context {
>> + struct pci_bus *bus;
>> + struct cxl_port *port;
>> + int type;
>> + int error;
>> + int count;
>> +};
>> +
>> static int cxl_dvsec_mem_range_valid(struct cxl_dev_state *cxlds, int id)
>> {
>> struct pci_dev *pdev = to_pci_dev(cxlds->dev);
>> @@ -820,14 +828,6 @@ int cxl_gpf_port_setup(struct cxl_dport *dport)
>> return 0;
>> }
>>
>> -struct cxl_walk_context {
>> - struct pci_bus *bus;
>> - struct cxl_port *port;
>> - int type;
>> - int error;
>> - int count;
>> -};
>> -
>> static int count_dports(struct pci_dev *pdev, void *data)
>> {
>> struct cxl_walk_context *ctx = data;
>
* Re: [PATCH v14 18/34] cxl/port: Remove "enumerate dports" helpers
2026-01-14 18:20 ` [PATCH v14 18/34] cxl/port: Remove "enumerate dports" helpers Terry Bowman
2026-01-14 19:50 ` Jonathan Cameron
@ 2026-01-14 21:24 ` Dave Jiang
2026-01-16 3:21 ` dan.j.williams
2 siblings, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 21:24 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> Now that cxl_switch_port_probe() no longer walks potential dports, because
> they are enumerated dynamically on descendant endpoint arrival, remove the
> dead code.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Missing your sign off tag
>
> ---
>
> Changes in v13 -> v14:
> - New patch
> ---
> drivers/cxl/core/pci.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index b838c59d7a3c..0305a421504e 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -71,6 +71,14 @@ struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
> }
> EXPORT_SYMBOL_NS_GPL(__devm_cxl_add_dport_by_dev, "CXL");
>
> +struct cxl_walk_context {
> + struct pci_bus *bus;
> + struct cxl_port *port;
> + int type;
> + int error;
> + int count;
> +};
> +
> static int cxl_dvsec_mem_range_valid(struct cxl_dev_state *cxlds, int id)
> {
> struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> @@ -820,14 +828,6 @@ int cxl_gpf_port_setup(struct cxl_dport *dport)
> return 0;
> }
>
> -struct cxl_walk_context {
> - struct pci_bus *bus;
> - struct cxl_port *port;
> - int type;
> - int error;
> - int count;
> -};
> -
> static int count_dports(struct pci_dev *pdev, void *data)
> {
> struct cxl_walk_context *ctx = data;
* Re: [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management
2026-01-14 18:20 ` [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management Terry Bowman
@ 2026-01-14 21:26 ` Dave Jiang
2026-01-15 14:46 ` Jonathan Cameron
1 sibling, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 21:26 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> With dport addition moving out of cxl_switch_port_probe() it is no longer
> the case that a single dport-add failure will cause all dport resources
> to be automatically unwound.
>
> devm still helps all dport resources get cleaned up when the port is
> detached, but setup now needs to avoid leaking resources if an early exit
> occurs during setup.
>
> Convert from a "devm add" model, to an "auto remove" model that makes the
> caller responsible for registering devm reclaim after the object is fully
> instantiated.
>
> As a side effect of this reorganization, port->nr_dports is now always
> consistent with the number of entries in the port->dports xarray, and the
> code can stop playing games with ida_is_empty(), which is unreliable as a
> detector of whether decoders are set up. I.e. consider how
> CONFIG_DEBUG_KOBJECT_RELEASE might wreak havoc with this approach.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Missing sign off tag
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
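Editorial sketch of the "auto remove" model described in the commit
message: the constructor unwinds its own partial state on early exit,
and the caller registers a single teardown action only once the object
is fully built. This is a user-space model, not the kernel
implementation. It relies on the GCC/Clang cleanup attribute as a
stand-in for the kernel's __free() helpers, and every name ending in
_sketch, plus struct host and its two functions, is a hypothetical
illustration.

```c
#include <errno.h>
#include <stdlib.h>

struct port { int nr_dports; };
struct dport { struct port *port; int linked; };

/* Scope-exit free, standing in for __free(kfree). Requires the
 * GCC/Clang cleanup attribute (assumption). */
static void free_dport(struct dport **p) { free(*p); }
#define __free_dport __attribute__((cleanup(free_dport)))

/* Hypothetical constructor. fail_link simulates a late setup error. */
static struct dport *add_dport_sketch(struct port *port, int fail_link)
{
	struct dport *dport __free_dport = calloc(1, sizeof(*dport));
	struct dport *ret;

	if (!dport)
		return NULL;
	dport->port = port;
	port->nr_dports++;
	if (fail_link) {
		port->nr_dports--;	/* undo the count by hand */
		return NULL;		/* dport is freed at scope exit */
	}
	dport->linked = 1;
	ret = dport;
	dport = NULL;			/* like no_free_ptr(): caller owns it */
	return ret;
}

/* One teardown action for the whole object, in the spirit of
 * unlink_dport() in the quoted patch. */
static void unlink_dport_sketch(void *data)
{
	struct dport *dport = data;

	dport->port->nr_dports--;
	free(dport);
}

/* Tiny fixed-size model of a devres host (hypothetical). */
struct host { void (*fn[8])(void *); void *data[8]; int n; };

static int host_add_action_or_reset(struct host *h,
				    void (*fn)(void *), void *data)
{
	if (h->n == 8) {
		fn(data);		/* cannot register: reset right away */
		return -ENOMEM;
	}
	h->fn[h->n] = fn;
	h->data[h->n] = data;
	h->n++;
	return 0;
}

static void host_detach(struct host *h)
{
	while (h->n > 0) {
		h->n--;
		h->fn[h->n](h->data[h->n]);
	}
}
```

In this model a failed construction leaves nr_dports untouched, and a
successful one is reclaimed by exactly one action at detach time, which
mirrors the bookkeeping consistency the commit message claims for
port->nr_dports.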
>
> ---
>
> Changes in v13 -> v14:
> - New patch
> ---
> drivers/cxl/acpi.c | 11 +-
> drivers/cxl/core/pci.c | 10 +-
> drivers/cxl/core/port.c | 225 ++++++++++++++++-----------
> drivers/cxl/cxl.h | 23 +--
> drivers/cxl/port.c | 8 +-
> tools/testing/cxl/Kbuild | 3 +-
> tools/testing/cxl/cxl_core_exports.c | 13 +-
> tools/testing/cxl/exports.h | 4 +-
> tools/testing/cxl/test/cxl.c | 6 +-
> tools/testing/cxl/test/mock.c | 25 ++-
> tools/testing/cxl/test/mock.h | 4 +-
> 11 files changed, 188 insertions(+), 144 deletions(-)
>
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 77ac940e3013..1e1383eb9bd5 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -679,16 +679,19 @@ static int add_host_bridge_dport(struct device *match, void *arg)
> if (ctx.cxl_version == ACPI_CEDT_CHBS_VERSION_CXL11) {
> dev_dbg(match, "RCRB found for UID %lld: %pa\n", ctx.uid,
> &ctx.base);
> - dport = devm_cxl_add_rch_dport(root_port, bridge, ctx.uid,
> - ctx.base);
> + dport = cxl_add_rch_dport(root_port, bridge, ctx.uid, ctx.base);
> } else {
> - dport = devm_cxl_add_dport(root_port, bridge, ctx.uid,
> - CXL_RESOURCE_NONE);
> + dport = cxl_add_dport(root_port, bridge, ctx.uid,
> + CXL_RESOURCE_NONE);
> }
>
> if (IS_ERR(dport))
> return PTR_ERR(dport);
>
> + ret = cxl_dport_autoremove(dport);
> + if (ret)
> + return ret;
> +
> ret = get_genport_coordinates(match, dport);
> if (ret)
> dev_dbg(match, "Failed to get generic port perf coordinates.\n");
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 0305a421504e..512a3e29a095 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -41,14 +41,14 @@ static int pci_get_port_num(struct pci_dev *pdev)
> }
>
> /**
> - * __devm_cxl_add_dport_by_dev - allocate a dport by dport device
> + * __cxl_add_dport_by_dev - allocate a dport by dport device
> * @port: cxl_port that hosts the dport
> * @dport_dev: 'struct device' of the dport
> *
> * Returns the allocated dport on success or ERR_PTR() of -errno on error
> */
> -struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
> - struct device *dport_dev)
> +struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
> + struct device *dport_dev)
> {
> struct cxl_register_map map;
> struct pci_dev *pdev;
> @@ -67,9 +67,9 @@ struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
> return ERR_PTR(rc);
>
> device_lock_assert(&port->dev);
> - return devm_cxl_add_dport(port, dport_dev, port_num, map.resource);
> + return cxl_add_dport(port, dport_dev, port_num, map.resource);
> }
> -EXPORT_SYMBOL_NS_GPL(__devm_cxl_add_dport_by_dev, "CXL");
> +EXPORT_SYMBOL_NS_GPL(__cxl_add_dport_by_dev, "CXL");
>
> struct cxl_walk_context {
> struct pci_bus *bus;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index fef3aa0c6680..a05a1812bb6e 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1051,7 +1051,8 @@ static struct cxl_dport *find_dport(struct cxl_port *port, int id)
> return NULL;
> }
>
> -static int add_dport(struct cxl_port *port, struct cxl_dport *dport)
> +static struct cxl_dport *add_dport(struct cxl_port *port,
> + struct cxl_dport *dport)
> {
> struct cxl_dport *dup;
> int rc;
> @@ -1063,16 +1064,33 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *dport)
> "unable to add dport%d-%s non-unique port id (%s)\n",
> dport->port_id, dev_name(dport->dport_dev),
> dev_name(dup->dport_dev));
> - return -EBUSY;
> + return ERR_PTR(-EBUSY);
> + }
> +
> + /*
> + * Unlike CXL switch upstream ports where it can train a CXL link
> + * independent of its downstream ports, a host bridge upstream port may
> + * not enable CXL registers until at least one downstream port (root
> + * port) trains CXL. Enumerate registers once when the number of dports
> + * transitions from zero to one.
> + */
> + if (!port->nr_dports) {
> + rc = cxl_port_setup_regs(port, port->component_reg_phys);
> + if (rc)
> + return ERR_PTR(rc);
> }
>
> + /* Arrange for dport_dev to be valid through remove_dport() */
> + struct device *dev __free(put_device) = get_device(dport->dport_dev);
> +
> rc = xa_insert(&port->dports, (unsigned long)dport->dport_dev, dport,
> GFP_KERNEL);
> if (rc)
> - return rc;
> + return ERR_PTR(rc);
>
> + retain_and_null_ptr(dev);
> port->nr_dports++;
> - return 0;
> + return dport;
> }
>
> /*
> @@ -1094,51 +1112,32 @@ static void cond_cxl_root_unlock(struct cxl_port *port)
> device_unlock(&port->dev);
> }
>
> -static void cxl_dport_remove(void *data)
> +static void remove_dport(struct cxl_dport *dport)
> {
> - struct cxl_dport *dport = data;
> struct cxl_port *port = dport->port;
>
> + port->nr_dports--;
> xa_erase(&port->dports, (unsigned long) dport->dport_dev);
> put_device(dport->dport_dev);
> }
>
> -static void cxl_dport_unlink(void *data)
> -{
> - struct cxl_dport *dport = data;
> - struct cxl_port *port = dport->port;
> - char link_name[CXL_TARGET_STRLEN];
> +DEFINE_FREE(remove_dport, struct cxl_dport *,
> + if (!IS_ERR_OR_NULL(_T)) remove_dport(_T))
>
> - sprintf(link_name, "dport%d", dport->port_id);
> - sysfs_remove_link(&port->dev.kobj, link_name);
> -}
> -
> -static struct cxl_dport *
> -__devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> - int port_id, resource_size_t component_reg_phys,
> - resource_size_t rcrb)
> +static struct cxl_dport *__cxl_add_dport(struct cxl_port *port,
> + struct device *dport_dev, int port_id,
> + resource_size_t component_reg_phys,
> + resource_size_t rcrb)
> {
> char link_name[CXL_TARGET_STRLEN];
> - struct cxl_dport *dport;
> - struct device *host;
> int rc;
>
> - if (is_cxl_root(port))
> - host = port->uport_dev;
> - else
> - host = &port->dev;
> -
> - if (!host->driver) {
> - dev_WARN_ONCE(&port->dev, 1, "dport:%s bad devm context\n",
> - dev_name(dport_dev));
> - return ERR_PTR(-ENXIO);
> - }
> -
> if (snprintf(link_name, CXL_TARGET_STRLEN, "dport%d", port_id) >=
> CXL_TARGET_STRLEN)
> return ERR_PTR(-EINVAL);
>
> - dport = devm_kzalloc(host, sizeof(*dport), GFP_KERNEL);
> + struct cxl_dport *dport __free(kfree) =
> + kzalloc(sizeof(*dport), GFP_KERNEL);
> if (!dport)
> return ERR_PTR(-ENOMEM);
>
> @@ -1176,48 +1175,27 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> &component_reg_phys);
>
> cond_cxl_root_lock(port);
> - rc = add_dport(port, dport);
> + struct cxl_dport *dport_add __free(remove_dport) =
> + add_dport(port, dport);
> cond_cxl_root_unlock(port);
> - if (rc)
> - return ERR_PTR(rc);
> -
> - /*
> - * Setup port register if this is the first dport showed up. Having
> - * a dport also means that there is at least 1 active link.
> - */
> - if (port->nr_dports == 1 &&
> - port->component_reg_phys != CXL_RESOURCE_NONE) {
> - rc = cxl_port_setup_regs(port, port->component_reg_phys);
> - if (rc) {
> - xa_erase(&port->dports, (unsigned long)dport->dport_dev);
> - return ERR_PTR(rc);
> - }
> - port->component_reg_phys = CXL_RESOURCE_NONE;
> - }
> + if (IS_ERR(dport_add))
> + return dport_add;
>
> - get_device(dport_dev);
> - rc = devm_add_action_or_reset(host, cxl_dport_remove, dport);
> - if (rc)
> - return ERR_PTR(rc);
> + if (dev_is_pci(dport_dev))
> + dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
>
> rc = sysfs_create_link(&port->dev.kobj, &dport_dev->kobj, link_name);
> if (rc)
> return ERR_PTR(rc);
>
> - rc = devm_add_action_or_reset(host, cxl_dport_unlink, dport);
> - if (rc)
> - return ERR_PTR(rc);
> -
> - if (dev_is_pci(dport_dev))
> - dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
> -
> cxl_debugfs_create_dport_dir(dport);
>
> - return dport;
> + retain_and_null_ptr(dport_add);
> + return no_free_ptr(dport);
> }
>
> /**
> - * devm_cxl_add_dport - append VH downstream port data to a cxl_port
> + * cxl_add_dport - append VH downstream port data to a cxl_port
> * @port: the cxl_port that references this dport
> * @dport_dev: firmware or PCI device representing the dport
> * @port_id: identifier for this dport in a decoder's target list
> @@ -1227,14 +1205,13 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> * either the port's host (for root ports), or the port itself (for
> * switch ports)
> */
> -struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
> - struct device *dport_dev, int port_id,
> - resource_size_t component_reg_phys)
> +struct cxl_dport *cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> + int port_id, resource_size_t component_reg_phys)
> {
> struct cxl_dport *dport;
>
> - dport = __devm_cxl_add_dport(port, dport_dev, port_id,
> - component_reg_phys, CXL_RESOURCE_NONE);
> + dport = __cxl_add_dport(port, dport_dev, port_id, component_reg_phys,
> + CXL_RESOURCE_NONE);
> if (IS_ERR(dport)) {
> dev_dbg(dport_dev, "failed to add dport to %s: %ld\n",
> dev_name(&port->dev), PTR_ERR(dport));
> @@ -1245,10 +1222,10 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
>
> return dport;
> }
> -EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport, "CXL");
> +EXPORT_SYMBOL_NS_GPL(cxl_add_dport, "CXL");
>
> /**
> - * devm_cxl_add_rch_dport - append RCH downstream port data to a cxl_port
> + * cxl_add_rch_dport - append RCH downstream port data to a cxl_port
> * @port: the cxl_port that references this dport
> * @dport_dev: firmware or PCI device representing the dport
> * @port_id: identifier for this dport in a decoder's target list
> @@ -1256,9 +1233,9 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport, "CXL");
> *
> * See CXL 3.0 9.11.8 CXL Devices Attached to an RCH
> */
> -struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
> - struct device *dport_dev, int port_id,
> - resource_size_t rcrb)
> +struct cxl_dport *cxl_add_rch_dport(struct cxl_port *port,
> + struct device *dport_dev, int port_id,
> + resource_size_t rcrb)
> {
> struct cxl_dport *dport;
>
> @@ -1267,8 +1244,8 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
> return ERR_PTR(-EINVAL);
> }
>
> - dport = __devm_cxl_add_dport(port, dport_dev, port_id,
> - CXL_RESOURCE_NONE, rcrb);
> + dport = __cxl_add_dport(port, dport_dev, port_id, CXL_RESOURCE_NONE,
> + rcrb);
> if (IS_ERR(dport)) {
> dev_dbg(dport_dev, "failed to add RCH dport to %s: %ld\n",
> dev_name(&port->dev), PTR_ERR(dport));
> @@ -1279,7 +1256,7 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
>
> return dport;
> }
> -EXPORT_SYMBOL_NS_GPL(devm_cxl_add_rch_dport, "CXL");
> +EXPORT_SYMBOL_NS_GPL(cxl_add_rch_dport, "CXL");
>
> static int add_ep(struct cxl_ep *new)
> {
> @@ -1439,13 +1416,42 @@ static void delete_switch_port(struct cxl_port *port)
> devm_release_action(port->dev.parent, unregister_port, port);
> }
>
> +static void unlink_dport(void *data)
> +{
> + struct cxl_dport *dport = data;
> + struct cxl_port *port = dport->port;
> + char link_name[CXL_TARGET_STRLEN];
> +
> + sprintf(link_name, "dport%d", dport->port_id);
> + sysfs_remove_link(&port->dev.kobj, link_name);
> + remove_dport(dport);
> + kfree(dport);
> +}
> +
> +int cxl_dport_autoremove(struct cxl_dport *dport)
> +{
> + struct cxl_port *port = dport->port;
> + struct device *host;
> +
> + if (is_cxl_root(port))
> + host = port->uport_dev;
> + else
> + host = &port->dev;
> +
> + return devm_add_action_or_reset(host, unlink_dport, dport);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_dport_autoremove, "CXL");
> +
> +/*
> + * Note: this only services dynamic removal of mid-level ports, root ports are
> + * always removed by the platform driver (e.g. cxl_acpi). @host can be
> + * hard-coded to &port->dev.
> + */
> static void del_dport(struct cxl_dport *dport)
> {
> struct cxl_port *port = dport->port;
>
> - devm_release_action(&port->dev, cxl_dport_unlink, dport);
> - devm_release_action(&port->dev, cxl_dport_remove, dport);
> - devm_kfree(&port->dev, dport);
> + devm_release_action(&port->dev, unlink_dport, dport);
> }
>
> static void del_dports(struct cxl_port *port)
> @@ -1597,10 +1603,24 @@ static int update_decoder_targets(struct device *dev, void *data)
> return 0;
> }
>
> -DEFINE_FREE(del_cxl_dport, struct cxl_dport *, if (!IS_ERR_OR_NULL(_T)) del_dport(_T))
> +static struct cxl_port *cxl_port_devres_group(struct cxl_port *port)
> +{
> + if (!devres_open_group(&port->dev, port, GFP_KERNEL))
> + return ERR_PTR(-ENOMEM);
> + return port;
> +}
> +DEFINE_FREE(cxl_port_group_free, struct cxl_port *,
> + if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->dev, _T))
> +
> +static void cxl_port_group_close(struct cxl_port *port)
> +{
> + devres_remove_group(&port->dev, port);
> +}
> +
> static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> struct device *dport_dev)
> {
> + struct cxl_dport *new_dport;
> struct cxl_dport *dport;
> int rc;
>
> @@ -1615,29 +1635,46 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> return ERR_PTR(-EBUSY);
> }
>
> - struct cxl_dport *new_dport __free(del_cxl_dport) =
> - devm_cxl_add_dport_by_dev(port, dport_dev);
> - if (IS_ERR(new_dport))
> - return new_dport;
> -
> - cxl_switch_parse_cdat(new_dport);
> + /*
> + * With the first dport arrival it is now safe to start looking at
> + * component registers. Be careful to not strand resources if dport
> + * creation ultimately fails.
> + */
> + struct cxl_port *port_group __free(cxl_port_group_free) =
> + cxl_port_devres_group(port);
> + if (IS_ERR(port_group))
> + return ERR_CAST(port_group);
>
> - if (ida_is_empty(&port->decoder_ida)) {
> + if (port->nr_dports == 0) {
> rc = devm_cxl_switch_port_decoders_setup(port);
> if (rc)
> return ERR_PTR(rc);
> - dev_dbg(&port->dev, "first dport%d:%s added with decoders\n",
> - new_dport->port_id, dev_name(dport_dev));
> - return no_free_ptr(new_dport);
> + /*
> + * Note, when nr_dports returns to zero the port is unregistered
> + * and triggers cleanup. I.e. no need for open-coded release
> + * action on dport removal. See cxl_detach_ep() for that logic.
> + */
> }
>
> + new_dport = cxl_add_dport_by_dev(port, dport_dev);
> + if (IS_ERR(new_dport))
> + return new_dport;
> +
> + rc = cxl_dport_autoremove(new_dport);
> + if (rc)
> + return ERR_PTR(rc);
> +
> + cxl_switch_parse_cdat(new_dport);
> +
> + cxl_port_group_close(no_free_ptr(port_group));
> +
> + dev_dbg(&port->dev, "dport[%d] id:%d dport_dev: %s added\n",
> + port->nr_dports - 1, new_dport->port_id, dev_name(dport_dev));
> +
> /* New dport added, update the decoder targets */
> device_for_each_child(&port->dev, new_dport, update_decoder_targets);
>
> - dev_dbg(&port->dev, "dport%d:%s added\n", new_dport->port_id,
> - dev_name(dport_dev));
> -
> - return no_free_ptr(new_dport);
> + return new_dport;
> }
>
> static struct cxl_dport *devm_cxl_create_port(struct device *ep_dev,
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 6f3741a57932..47ee06c95433 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -796,12 +796,12 @@ struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
> struct cxl_dport **dport);
> bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
>
> -struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
> - struct device *dport, int port_id,
> - resource_size_t component_reg_phys);
> -struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
> - struct device *dport_dev, int port_id,
> - resource_size_t rcrb);
> +struct cxl_dport *cxl_add_dport(struct cxl_port *port, struct device *dport,
> + int port_id,
> + resource_size_t component_reg_phys);
> +struct cxl_dport *cxl_add_rch_dport(struct cxl_port *port,
> + struct device *dport_dev, int port_id,
> + resource_size_t rcrb);
>
> struct cxl_decoder *to_cxl_decoder(struct device *dev);
> struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
> @@ -824,6 +824,7 @@ static inline int cxl_root_decoder_autoremove(struct device *host,
> return cxl_decoder_autoremove(host, &cxlrd->cxlsd.cxld);
> }
> int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
> +int cxl_dport_autoremove(struct cxl_dport *dport);
>
> /**
> * struct cxl_endpoint_dvsec_info - Cached DVSEC info
> @@ -937,10 +938,10 @@ void cxl_coordinates_combine(struct access_coordinate *out,
> struct access_coordinate *c2);
>
> bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
> -struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port,
> - struct device *dport_dev);
> -struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
> - struct device *dport_dev);
> +struct cxl_dport *cxl_add_dport_by_dev(struct cxl_port *port,
> + struct device *dport_dev);
> +struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
> + struct device *dport_dev);
>
> /*
> * Unit test builds overrides this to __weak, find the 'strong' version
> @@ -964,7 +965,7 @@ u16 cxl_gpf_get_dvsec(struct device *dev);
> */
> #ifndef CXL_TEST_ENABLE
> #define DECLARE_TESTABLE(x) __##x
> -#define devm_cxl_add_dport_by_dev DECLARE_TESTABLE(devm_cxl_add_dport_by_dev)
> +#define cxl_add_dport_by_dev DECLARE_TESTABLE(cxl_add_dport_by_dev)
> #define devm_cxl_switch_port_decoders_setup DECLARE_TESTABLE(devm_cxl_switch_port_decoders_setup)
> #endif
>
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index 51c8f2f84717..167cc0a87484 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -59,8 +59,12 @@ static int discover_region(struct device *dev, void *unused)
>
> static int cxl_switch_port_probe(struct cxl_port *port)
> {
> - /* Reset nr_dports for rebind of driver */
> - port->nr_dports = 0;
> + /*
> + * Unfortunately, typical driver operations like "find and map
> + * registers", can not be done at port device attach time and must wait
> + * for dport arrival. See cxl_port_add_dport() and the comments in
> + * add_dport() for details.
> + */
>
> /* Cache the data early to ensure is_visible() works */
> read_cdat_data(port);
> diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> index 6eceefefb0e0..4d740392aac5 100644
> --- a/tools/testing/cxl/Kbuild
> +++ b/tools/testing/cxl/Kbuild
> @@ -5,7 +5,8 @@ ldflags-y += --wrap=acpi_evaluate_integer
> ldflags-y += --wrap=acpi_pci_find_root
> ldflags-y += --wrap=nvdimm_bus_register
> ldflags-y += --wrap=cxl_await_media_ready
> -ldflags-y += --wrap=devm_cxl_add_rch_dport
> +ldflags-y += --wrap=cxl_add_rch_dport
> +ldflags-y += --wrap=cxl_rcd_component_reg_phys
> ldflags-y += --wrap=cxl_endpoint_parse_cdat
> ldflags-y += --wrap=cxl_dport_init_ras_reporting
> ldflags-y += --wrap=devm_cxl_endpoint_decoders_setup
> diff --git a/tools/testing/cxl/cxl_core_exports.c b/tools/testing/cxl/cxl_core_exports.c
> index 6754de35598d..02d479867a12 100644
> --- a/tools/testing/cxl/cxl_core_exports.c
> +++ b/tools/testing/cxl/cxl_core_exports.c
> @@ -7,16 +7,15 @@
> /* Exporting of cxl_core symbols that are only used by cxl_test */
> EXPORT_SYMBOL_NS_GPL(cxl_num_decoders_committed, "CXL");
>
> -cxl_add_dport_by_dev_fn _devm_cxl_add_dport_by_dev =
> - __devm_cxl_add_dport_by_dev;
> -EXPORT_SYMBOL_NS_GPL(_devm_cxl_add_dport_by_dev, "CXL");
> +cxl_add_dport_by_dev_fn _cxl_add_dport_by_dev = __cxl_add_dport_by_dev;
> +EXPORT_SYMBOL_NS_GPL(_cxl_add_dport_by_dev, "CXL");
>
> -struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port,
> - struct device *dport_dev)
> +struct cxl_dport *cxl_add_dport_by_dev(struct cxl_port *port,
> + struct device *dport_dev)
> {
> - return _devm_cxl_add_dport_by_dev(port, dport_dev);
> + return _cxl_add_dport_by_dev(port, dport_dev);
> }
> -EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport_by_dev, "CXL");
> +EXPORT_SYMBOL_NS_GPL(cxl_add_dport_by_dev, "CXL");
>
> cxl_switch_decoders_setup_fn _devm_cxl_switch_port_decoders_setup =
> __devm_cxl_switch_port_decoders_setup;
> diff --git a/tools/testing/cxl/exports.h b/tools/testing/cxl/exports.h
> index 7ebee7c0bd67..cbb16073be18 100644
> --- a/tools/testing/cxl/exports.h
> +++ b/tools/testing/cxl/exports.h
> @@ -4,8 +4,8 @@
> #define __MOCK_CXL_EXPORTS_H_
>
> typedef struct cxl_dport *(*cxl_add_dport_by_dev_fn)(struct cxl_port *port,
> - struct device *dport_dev);
> -extern cxl_add_dport_by_dev_fn _devm_cxl_add_dport_by_dev;
> + struct device *dport_dev);
> +extern cxl_add_dport_by_dev_fn _cxl_add_dport_by_dev;
>
> typedef int(*cxl_switch_decoders_setup_fn)(struct cxl_port *port);
> extern cxl_switch_decoders_setup_fn _devm_cxl_switch_port_decoders_setup;
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 81e2aef3627a..b7a2b550c0b0 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -1060,8 +1060,8 @@ static struct cxl_dport *mock_cxl_add_dport_by_dev(struct cxl_port *port,
> if (&pdev->dev != dport_dev)
> continue;
>
> - return devm_cxl_add_dport(port, &pdev->dev, pdev->id,
> - CXL_RESOURCE_NONE);
> + return cxl_add_dport(port, &pdev->dev, pdev->id,
> + CXL_RESOURCE_NONE);
> }
>
> return ERR_PTR(-ENODEV);
> @@ -1126,9 +1126,9 @@ static struct cxl_mock_ops cxl_mock_ops = {
> .devm_cxl_switch_port_decoders_setup = mock_cxl_switch_port_decoders_setup,
> .devm_cxl_endpoint_decoders_setup = mock_cxl_endpoint_decoders_setup,
> .cxl_endpoint_parse_cdat = mock_cxl_endpoint_parse_cdat,
> - .devm_cxl_add_dport_by_dev = mock_cxl_add_dport_by_dev,
> .hmat_get_extended_linear_cache_size =
> mock_hmat_get_extended_linear_cache_size,
> + .cxl_add_dport_by_dev = mock_cxl_add_dport_by_dev,
> .list = LIST_HEAD_INIT(cxl_mock_ops.list),
> };
>
> diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
> index 44bce80ef3ff..660e8402189c 100644
> --- a/tools/testing/cxl/test/mock.c
> +++ b/tools/testing/cxl/test/mock.c
> @@ -15,14 +15,13 @@
> static LIST_HEAD(mock);
>
> static struct cxl_dport *
> -redirect_devm_cxl_add_dport_by_dev(struct cxl_port *port,
> - struct device *dport_dev);
> +redirect_cxl_add_dport_by_dev(struct cxl_port *port, struct device *dport_dev);
> static int redirect_devm_cxl_switch_port_decoders_setup(struct cxl_port *port);
>
> void register_cxl_mock_ops(struct cxl_mock_ops *ops)
> {
> list_add_rcu(&ops->list, &mock);
> - _devm_cxl_add_dport_by_dev = redirect_devm_cxl_add_dport_by_dev;
> + _cxl_add_dport_by_dev = redirect_cxl_add_dport_by_dev;
> _devm_cxl_switch_port_decoders_setup =
> redirect_devm_cxl_switch_port_decoders_setup;
> }
> @@ -34,7 +33,7 @@ void unregister_cxl_mock_ops(struct cxl_mock_ops *ops)
> {
> _devm_cxl_switch_port_decoders_setup =
> __devm_cxl_switch_port_decoders_setup;
> - _devm_cxl_add_dport_by_dev = __devm_cxl_add_dport_by_dev;
> + _cxl_add_dport_by_dev = __cxl_add_dport_by_dev;
> list_del_rcu(&ops->list);
> synchronize_srcu(&cxl_mock_srcu);
> }
> @@ -207,7 +206,7 @@ int __wrap_cxl_await_media_ready(struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(__wrap_cxl_await_media_ready, "CXL");
>
> -struct cxl_dport *__wrap_devm_cxl_add_rch_dport(struct cxl_port *port,
> +struct cxl_dport *__wrap_cxl_add_rch_dport(struct cxl_port *port,
> struct device *dport_dev,
> int port_id,
> resource_size_t rcrb)
> @@ -217,19 +216,19 @@ struct cxl_dport *__wrap_devm_cxl_add_rch_dport(struct cxl_port *port,
> struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
>
> if (ops && ops->is_mock_port(dport_dev)) {
> - dport = devm_cxl_add_dport(port, dport_dev, port_id,
> - CXL_RESOURCE_NONE);
> + dport = cxl_add_dport(port, dport_dev, port_id,
> + CXL_RESOURCE_NONE);
> if (!IS_ERR(dport)) {
> dport->rcrb.base = rcrb;
> dport->rch = true;
> }
> } else
> - dport = devm_cxl_add_rch_dport(port, dport_dev, port_id, rcrb);
> + dport = cxl_add_rch_dport(port, dport_dev, port_id, rcrb);
> put_cxl_mock_ops(index);
>
> return dport;
> }
> -EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_add_rch_dport, "CXL");
> +EXPORT_SYMBOL_NS_GPL(__wrap_cxl_add_rch_dport, "CXL");
>
> void __wrap_cxl_endpoint_parse_cdat(struct cxl_port *port)
> {
> @@ -257,17 +256,17 @@ void __wrap_cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device
> }
> EXPORT_SYMBOL_NS_GPL(__wrap_cxl_dport_init_ras_reporting, "CXL");
>
> -struct cxl_dport *redirect_devm_cxl_add_dport_by_dev(struct cxl_port *port,
> - struct device *dport_dev)
> +struct cxl_dport *redirect_cxl_add_dport_by_dev(struct cxl_port *port,
> + struct device *dport_dev)
> {
> int index;
> struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
> struct cxl_dport *dport;
>
> if (ops && ops->is_mock_port(port->uport_dev))
> - dport = ops->devm_cxl_add_dport_by_dev(port, dport_dev);
> + dport = ops->cxl_add_dport_by_dev(port, dport_dev);
> else
> - dport = __devm_cxl_add_dport_by_dev(port, dport_dev);
> + dport = __cxl_add_dport_by_dev(port, dport_dev);
> put_cxl_mock_ops(index);
>
> return dport;
> diff --git a/tools/testing/cxl/test/mock.h b/tools/testing/cxl/test/mock.h
> index 2684b89c8aa2..fa13aca4e260 100644
> --- a/tools/testing/cxl/test/mock.h
> +++ b/tools/testing/cxl/test/mock.h
> @@ -22,8 +22,8 @@ struct cxl_mock_ops {
> int (*devm_cxl_switch_port_decoders_setup)(struct cxl_port *port);
> int (*devm_cxl_endpoint_decoders_setup)(struct cxl_port *port);
> void (*cxl_endpoint_parse_cdat)(struct cxl_port *port);
> - struct cxl_dport *(*devm_cxl_add_dport_by_dev)(struct cxl_port *port,
> - struct device *dport_dev);
> + struct cxl_dport *(*cxl_add_dport_by_dev)(struct cxl_port *port,
> + struct device *dport_dev);
> int (*hmat_get_extended_linear_cache_size)(struct resource *backing_res,
> int nid,
> resource_size_t *cache_size);
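The devres-group handling that this patch adds to cxl_port_add_dport() (open a group, release it on any error, remove only the group marker on success) can be modeled in user space. The following is a minimal sketch, not the kernel implementation: dev_model, group_open(), group_release(), group_remove() and add_dport_model() are hypothetical stand-ins for struct device, devres_open_group(), devres_release_group(), devres_remove_group() and cxl_port_add_dport() respectively.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 16

typedef void (*release_fn)(void *data);

struct action {
	release_fn fn;
	void *data;
};

/* Hypothetical model of a device's managed-resource stack */
struct dev_model {
	struct action stack[MAX_ACTIONS];
	size_t count;	/* number of registered release actions */
};

/* Open a group: remember the current stack depth as the group mark */
static size_t group_open(struct dev_model *dev)
{
	return dev->count;
}

/* Register a release action, loosely like a devm_*() helper would */
static int add_action(struct dev_model *dev, release_fn fn, void *data)
{
	if (dev->count == MAX_ACTIONS)
		return -1;
	dev->stack[dev->count].fn = fn;
	dev->stack[dev->count].data = data;
	dev->count++;
	return 0;
}

/* Release a group: run everything registered since the mark, newest first */
static void group_release(struct dev_model *dev, size_t mark)
{
	assert(mark <= dev->count);
	while (dev->count > mark) {
		struct action *a = &dev->stack[--dev->count];

		a->fn(a->data);
	}
}

/* Remove a group: forget the mark, keep the actions; a no-op in this model */
static void group_remove(struct dev_model *dev, size_t mark)
{
	(void)dev;
	(void)mark;
}

/* Release action used for illustration: counts how often it ran */
static void note_release(void *data)
{
	(*(int *)data)++;
}

/*
 * Model of the add-dport flow: two resources are set up inside a group.
 * fail_second simulates dport creation failing after the first resource
 * was already registered.
 */
static int add_dport_model(struct dev_model *dev, int *released,
			   int fail_second)
{
	size_t mark = group_open(dev);

	if (add_action(dev, note_release, released))
		goto err;
	if (fail_second || add_action(dev, note_release, released))
		goto err;

	/* Success: the resources now live until the device goes away */
	group_remove(dev, mark);
	return 0;
err:
	/* Failure: nothing registered since group_open() is left stranded */
	group_release(dev, mark);
	return -1;
}
```

In the patch itself the error path is not spelled out with goto; the __free(cxl_port_group_free) annotation releases the group on every early return, and no_free_ptr() disarms it on success.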
* Re: [PATCH v14 20/34] cxl/port: Move dport operations to a driver event
2026-01-14 18:20 ` [PATCH v14 20/34] cxl/port: Move dport operations to a driver event Terry Bowman
@ 2026-01-14 21:45 ` Dave Jiang
2026-01-15 14:56 ` Jonathan Cameron
1 sibling, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 21:45 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> In preparation for adding more register setup to the cxl_port_add_dport()
> path (for RAS register mapping), move the dport creation event to a driver
> callback. This achieves 2 things: it puts driver operations logically where
> they belong, in a driver, and it obviates the gymnastics of
> DECLARE_TESTABLE() which just makes a mess of grepping for CXL symbols.
>
> In other words, a driver callback is less of an ongoing maintenance burden
> than this DECLARE_TESTABLE arrangement that does not scale and diminishes
> the grep-ability of the codebase.
>
> cxl_port_add_dport() moves mostly unmodified from drivers/cxl/core/port.c.
> The only deliberate change is that it now assumes that the device_lock is
> held on entry and the driver is attached (just like cxl_port_probe()).
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Missing your Signed-off-by tag.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
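
For readers following the thread, the dispatch described in the commit message (the core looks up the bound driver and calls its add_dport callback) can be sketched in user space. This is a minimal sketch under stated assumptions: the structs below are trimmed-down, hypothetical stand-ins for the kernel types, the `locked` flag stands in for device_lock_assert(), dports are identified by a name string instead of a struct device, and NULL is returned where the kernel code returns ERR_PTR(-ENXIO).

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's container_of() */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified versions of the kernel structures */
struct device_driver {
	const char *name;
};

struct device {
	int locked;			/* models the device lock being held */
	struct device_driver *driver;	/* NULL while no driver is bound */
};

struct cxl_port {
	struct device dev;
	int nr_dports;
};

struct cxl_dport {
	const char *name;
};

struct cxl_driver {
	struct cxl_dport *(*add_dport)(struct cxl_port *port,
				       const char *dport_name);
	struct device_driver drv;
};

/* Driver side: a single static dport is enough for this sketch */
static struct cxl_dport example_dport;

static struct cxl_dport *port_add_dport(struct cxl_port *port,
					const char *dport_name)
{
	example_dport.name = dport_name;
	port->nr_dports++;
	return &example_dport;
}

static struct cxl_driver port_driver = {
	.add_dport = port_add_dport,
	.drv = { .name = "cxl_port" },
};

/* Core side: dispatch to whichever driver is attached to the port */
static struct cxl_dport *probe_dport(struct cxl_port *port,
				     const char *dport_name)
{
	struct cxl_driver *drv;

	assert(port->dev.locked);	/* caller must hold the device lock */
	if (!port->dev.driver)
		return NULL;		/* no driver attached */

	drv = container_of(port->dev.driver, struct cxl_driver, drv);
	if (!drv->add_dport)
		return NULL;		/* driver does not handle dports */

	return drv->add_dport(port, dport_name);
}
```

The point of the arrangement is that the core no longer needs to call a mockable symbol directly; it only needs the callback pointer in the bound driver.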
>
> ---
>
> Changes in v13 -> v14:
> - New patch
> ---
> drivers/cxl/core/hdm.c | 6 +--
> drivers/cxl/core/pci.c | 8 +--
> drivers/cxl/core/port.c | 79 ++++++----------------------
> drivers/cxl/cxl.h | 23 ++------
> drivers/cxl/port.c | 71 +++++++++++++++++++++++++
> tools/testing/cxl/Kbuild | 2 +
> tools/testing/cxl/cxl_core_exports.c | 21 --------
> tools/testing/cxl/exports.h | 13 -----
> tools/testing/cxl/test/mock.c | 23 +++-----
> 9 files changed, 107 insertions(+), 139 deletions(-)
> delete mode 100644 tools/testing/cxl/exports.h
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 1c5d2022c87a..365b02b7a241 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -1219,12 +1219,12 @@ static int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm,
> }
>
> /**
> - * __devm_cxl_switch_port_decoders_setup - allocate and setup switch decoders
> + * devm_cxl_switch_port_decoders_setup - allocate and setup switch decoders
> * @port: CXL port context
> *
> * Return 0 or -errno on error
> */
> -int __devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
> +int devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
> {
> struct cxl_hdm *cxlhdm;
>
> @@ -1248,7 +1248,7 @@ int __devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
> dev_err(&port->dev, "HDM decoder capability not found\n");
> return -ENXIO;
> }
> -EXPORT_SYMBOL_NS_GPL(__devm_cxl_switch_port_decoders_setup, "CXL");
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_switch_port_decoders_setup, "CXL");
>
> /**
> * devm_cxl_endpoint_decoders_setup - allocate and setup endpoint decoders
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 512a3e29a095..8633bfdef38d 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -41,14 +41,14 @@ static int pci_get_port_num(struct pci_dev *pdev)
> }
>
> /**
> - * __cxl_add_dport_by_dev - allocate a dport by dport device
> + * cxl_add_dport_by_dev - allocate a dport by dport device
> * @port: cxl_port that hosts the dport
> * @dport_dev: 'struct device' of the dport
> *
> * Returns the allocated dport on success or ERR_PTR() of -errno on error
> */
> -struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
> - struct device *dport_dev)
> +struct cxl_dport *cxl_add_dport_by_dev(struct cxl_port *port,
> + struct device *dport_dev)
> {
> struct cxl_register_map map;
> struct pci_dev *pdev;
> @@ -69,7 +69,7 @@ struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
> device_lock_assert(&port->dev);
> return cxl_add_dport(port, dport_dev, port_num, map.resource);
> }
> -EXPORT_SYMBOL_NS_GPL(__cxl_add_dport_by_dev, "CXL");
> +EXPORT_SYMBOL_NS_GPL(cxl_add_dport_by_dev, "CXL");
>
> struct cxl_walk_context {
> struct pci_bus *bus;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index a05a1812bb6e..2184c20af011 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1603,78 +1603,31 @@ static int update_decoder_targets(struct device *dev, void *data)
> return 0;
> }
>
> -static struct cxl_port *cxl_port_devres_group(struct cxl_port *port)
> +void cxl_port_update_decoder_targets(struct cxl_port *port,
> + struct cxl_dport *dport)
> {
> - if (!devres_open_group(&port->dev, port, GFP_KERNEL))
> - return ERR_PTR(-ENOMEM);
> - return port;
> + device_for_each_child(&port->dev, dport, update_decoder_targets);
> }
> +EXPORT_SYMBOL_NS_GPL(cxl_port_update_decoder_targets, "CXL");
> +
> DEFINE_FREE(cxl_port_group_free, struct cxl_port *,
> if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->dev, _T))
>
> -static void cxl_port_group_close(struct cxl_port *port)
> -{
> - devres_remove_group(&port->dev, port);
> -}
> -
> -static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> - struct device *dport_dev)
> +static struct cxl_dport *probe_dport(struct cxl_port *port,
> + struct device *dport_dev)
> {
> - struct cxl_dport *new_dport;
> - struct cxl_dport *dport;
> - int rc;
> + struct cxl_driver *drv;
>
> device_lock_assert(&port->dev);
> if (!port->dev.driver)
> return ERR_PTR(-ENXIO);
>
> - dport = cxl_find_dport_by_dev(port, dport_dev);
> - if (dport) {
> - dev_dbg(&port->dev, "dport%d:%s already exists\n",
> - dport->port_id, dev_name(dport_dev));
> - return ERR_PTR(-EBUSY);
> - }
> -
> - /*
> - * With the first dport arrival it is now safe to start looking at
> - * component registers. Be careful to not strand resources if dport
> - * creation ultimately fails.
> - */
> - struct cxl_port *port_group __free(cxl_port_group_free) =
> - cxl_port_devres_group(port);
> - if (IS_ERR(port_group))
> - return ERR_CAST(port_group);
> -
> - if (port->nr_dports == 0) {
> - rc = devm_cxl_switch_port_decoders_setup(port);
> - if (rc)
> - return ERR_PTR(rc);
> - /*
> - * Note, when nr_dports returns to zero the port is unregistered
> - * and triggers cleanup. I.e. no need for open-coded release
> - * action on dport removal. See cxl_detach_ep() for that logic.
> - */
> - }
> -
> - new_dport = cxl_add_dport_by_dev(port, dport_dev);
> - if (IS_ERR(new_dport))
> - return new_dport;
> -
> - rc = cxl_dport_autoremove(new_dport);
> - if (rc)
> - return ERR_PTR(rc);
> -
> - cxl_switch_parse_cdat(new_dport);
> -
> - cxl_port_group_close(no_free_ptr(port_group));
> -
> - dev_dbg(&port->dev, "dport[%d] id:%d dport_dev: %s added\n",
> - port->nr_dports - 1, new_dport->port_id, dev_name(dport_dev));
> -
> - /* New dport added, update the decoder targets */
> - device_for_each_child(&port->dev, new_dport, update_decoder_targets);
> + drv = container_of(port->dev.driver, struct cxl_driver, drv);
> + if (!drv->add_dport)
> + return ERR_PTR(-ENXIO);
>
> - return new_dport;
> + /* see cxl_port_add_dport() */
> + return drv->add_dport(port, dport_dev);
> }
>
> static struct cxl_dport *devm_cxl_create_port(struct device *ep_dev,
> @@ -1721,7 +1674,7 @@ static struct cxl_dport *devm_cxl_create_port(struct device *ep_dev,
> }
>
> guard(device)(&port->dev);
> - return cxl_port_add_dport(port, dport_dev);
> + return probe_dport(port, dport_dev);
> }
>
> static int add_port_attach_ep(struct cxl_memdev *cxlmd,
> @@ -1753,7 +1706,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
> scoped_guard(device, &parent_port->dev) {
> parent_dport = cxl_find_dport_by_dev(parent_port, dparent);
> if (!parent_dport) {
> - parent_dport = cxl_port_add_dport(parent_port, dparent);
> + parent_dport = probe_dport(parent_port, dparent);
> if (IS_ERR(parent_dport))
> return PTR_ERR(parent_dport);
> }
> @@ -1789,7 +1742,7 @@ static struct cxl_dport *find_or_add_dport(struct cxl_port *port,
> device_lock_assert(&port->dev);
> dport = cxl_find_dport_by_dev(port, dport_dev);
> if (!dport) {
> - dport = cxl_port_add_dport(port, dport_dev);
> + dport = probe_dport(port, dport_dev);
> if (IS_ERR(dport))
> return dport;
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 47ee06c95433..46491046f101 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -841,8 +841,9 @@ struct cxl_endpoint_dvsec_info {
> };
>
> int devm_cxl_switch_port_decoders_setup(struct cxl_port *port);
> -int __devm_cxl_switch_port_decoders_setup(struct cxl_port *port);
> int devm_cxl_endpoint_decoders_setup(struct cxl_port *port);
> +void cxl_port_update_decoder_targets(struct cxl_port *port,
> + struct cxl_dport *dport);
>
> struct cxl_dev_state;
> int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
> @@ -856,6 +857,8 @@ struct cxl_driver {
> const char *name;
> int (*probe)(struct device *dev);
> void (*remove)(struct device *dev);
> + struct cxl_dport *(*add_dport)(struct cxl_port *port,
> + struct device *dport_dev);
> struct device_driver drv;
> int id;
> };
> @@ -940,8 +943,6 @@ void cxl_coordinates_combine(struct access_coordinate *out,
> bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
> struct cxl_dport *cxl_add_dport_by_dev(struct cxl_port *port,
> struct device *dport_dev);
> -struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
> - struct device *dport_dev);
>
> /*
> * Unit test builds overrides this to __weak, find the 'strong' version
> @@ -953,20 +954,4 @@ struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
>
> u16 cxl_gpf_get_dvsec(struct device *dev);
>
> -/*
> - * Declaration for functions that are mocked by cxl_test that are called by
> - * cxl_core. The respective functions are defined as __foo() and called by
> - * cxl_core as foo(). The macros below ensures that those functions would
> - * exist as foo(). See tools/testing/cxl/cxl_core_exports.c and
> - * tools/testing/cxl/exports.h for setting up the mock functions. The dance
> - * is done to avoid a circular dependency where cxl_core calls a function that
> - * ends up being a mock function and goes to * cxl_test where it calls a
> - * cxl_core function.
> - */
> -#ifndef CXL_TEST_ENABLE
> -#define DECLARE_TESTABLE(x) __##x
> -#define cxl_add_dport_by_dev DECLARE_TESTABLE(cxl_add_dport_by_dev)
> -#define devm_cxl_switch_port_decoders_setup DECLARE_TESTABLE(devm_cxl_switch_port_decoders_setup)
> -#endif
> -
> #endif /* __CXL_H__ */
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index 167cc0a87484..2770bc8520d3 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -155,9 +155,80 @@ static const struct attribute_group *cxl_port_attribute_groups[] = {
> NULL,
> };
>
> +static struct cxl_port *cxl_port_devres_group(struct cxl_port *port)
> +{
> + if (!devres_open_group(&port->dev, port, GFP_KERNEL))
> + return ERR_PTR(-ENOMEM);
> + return port;
> +}
> +DEFINE_FREE(cxl_port_group_free, struct cxl_port *,
> + if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->dev, _T))
> +
> +static void cxl_port_group_close(struct cxl_port *port)
> +{
> + devres_remove_group(&port->dev, port);
> +}
> +
> +static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> + struct device *dport_dev)
> +{
> + struct cxl_dport *new_dport;
> + struct cxl_dport *dport;
> + int rc;
> +
> + dport = cxl_find_dport_by_dev(port, dport_dev);
> + if (dport) {
> + dev_dbg(&port->dev, "dport%d:%s already exists\n",
> + dport->port_id, dev_name(dport_dev));
> + return ERR_PTR(-EBUSY);
> + }
> +
> + /*
> + * With the first dport arrival it is now safe to start looking at
> + * component registers. Be careful to not strand resources if dport
> + * creation ultimately fails.
> + */
> + struct cxl_port *port_group __free(cxl_port_group_free) =
> + cxl_port_devres_group(port);
> + if (IS_ERR(port_group))
> + return ERR_CAST(port_group);
> +
> + if (port->nr_dports == 0) {
> + rc = devm_cxl_switch_port_decoders_setup(port);
> + if (rc)
> + return ERR_PTR(rc);
> + /*
> + * Note, when nr_dports returns to zero the port is unregistered
> + * and triggers cleanup. I.e. no need for open-coded release
> + * action on dport removal. See cxl_detach_ep() for that logic.
> + */
> + }
> +
> + new_dport = cxl_add_dport_by_dev(port, dport_dev);
> + if (IS_ERR(new_dport))
> + return new_dport;
> +
> + rc = cxl_dport_autoremove(new_dport);
> + if (rc)
> + return ERR_PTR(rc);
> +
> + cxl_switch_parse_cdat(new_dport);
> +
> + cxl_port_group_close(no_free_ptr(port_group));
> +
> + dev_dbg(&port->dev, "dport[%d] id:%d dport_dev: %s added\n",
> + port->nr_dports - 1, new_dport->port_id, dev_name(dport_dev));
> +
> + /* New dport added, update the decoder targets */
> + cxl_port_update_decoder_targets(port, new_dport);
> +
> + return new_dport;
> +}
> +
> static struct cxl_driver cxl_port_driver = {
> .name = "cxl_port",
> .probe = cxl_port_probe,
> + .add_dport = cxl_port_add_dport,
> .id = CXL_DEVICE_PORT,
> .drv = {
> .dev_groups = cxl_port_attribute_groups,
> diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> index 4d740392aac5..25516728535e 100644
> --- a/tools/testing/cxl/Kbuild
> +++ b/tools/testing/cxl/Kbuild
> @@ -11,6 +11,8 @@ ldflags-y += --wrap=cxl_endpoint_parse_cdat
> ldflags-y += --wrap=cxl_dport_init_ras_reporting
> ldflags-y += --wrap=devm_cxl_endpoint_decoders_setup
> ldflags-y += --wrap=hmat_get_extended_linear_cache_size
> +ldflags-y += --wrap=cxl_add_dport_by_dev
> +ldflags-y += --wrap=devm_cxl_switch_port_decoders_setup
>
> DRIVERS := ../../../drivers
> CXL_SRC := $(DRIVERS)/cxl
> diff --git a/tools/testing/cxl/cxl_core_exports.c b/tools/testing/cxl/cxl_core_exports.c
> index 02d479867a12..f088792a8925 100644
> --- a/tools/testing/cxl/cxl_core_exports.c
> +++ b/tools/testing/cxl/cxl_core_exports.c
> @@ -2,27 +2,6 @@
> /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
>
> #include "cxl.h"
> -#include "exports.h"
>
> /* Exporting of cxl_core symbols that are only used by cxl_test */
> EXPORT_SYMBOL_NS_GPL(cxl_num_decoders_committed, "CXL");
> -
> -cxl_add_dport_by_dev_fn _cxl_add_dport_by_dev = __cxl_add_dport_by_dev;
> -EXPORT_SYMBOL_NS_GPL(_cxl_add_dport_by_dev, "CXL");
> -
> -struct cxl_dport *cxl_add_dport_by_dev(struct cxl_port *port,
> - struct device *dport_dev)
> -{
> - return _cxl_add_dport_by_dev(port, dport_dev);
> -}
> -EXPORT_SYMBOL_NS_GPL(cxl_add_dport_by_dev, "CXL");
> -
> -cxl_switch_decoders_setup_fn _devm_cxl_switch_port_decoders_setup =
> - __devm_cxl_switch_port_decoders_setup;
> -EXPORT_SYMBOL_NS_GPL(_devm_cxl_switch_port_decoders_setup, "CXL");
> -
> -int devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
> -{
> - return _devm_cxl_switch_port_decoders_setup(port);
> -}
> -EXPORT_SYMBOL_NS_GPL(devm_cxl_switch_port_decoders_setup, "CXL");
> diff --git a/tools/testing/cxl/exports.h b/tools/testing/cxl/exports.h
> deleted file mode 100644
> index cbb16073be18..000000000000
> --- a/tools/testing/cxl/exports.h
> +++ /dev/null
> @@ -1,13 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -/* Copyright(c) 2025 Intel Corporation */
> -#ifndef __MOCK_CXL_EXPORTS_H_
> -#define __MOCK_CXL_EXPORTS_H_
> -
> -typedef struct cxl_dport *(*cxl_add_dport_by_dev_fn)(struct cxl_port *port,
> - struct device *dport_dev);
> -extern cxl_add_dport_by_dev_fn _cxl_add_dport_by_dev;
> -
> -typedef int(*cxl_switch_decoders_setup_fn)(struct cxl_port *port);
> -extern cxl_switch_decoders_setup_fn _devm_cxl_switch_port_decoders_setup;
> -
> -#endif
> diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
> index 660e8402189c..10140a4c5fac 100644
> --- a/tools/testing/cxl/test/mock.c
> +++ b/tools/testing/cxl/test/mock.c
> @@ -10,20 +10,12 @@
> #include <cxlmem.h>
> #include <cxlpci.h>
> #include "mock.h"
> -#include "../exports.h"
>
> static LIST_HEAD(mock);
>
> -static struct cxl_dport *
> -redirect_cxl_add_dport_by_dev(struct cxl_port *port, struct device *dport_dev);
> -static int redirect_devm_cxl_switch_port_decoders_setup(struct cxl_port *port);
> -
> void register_cxl_mock_ops(struct cxl_mock_ops *ops)
> {
> list_add_rcu(&ops->list, &mock);
> - _cxl_add_dport_by_dev = redirect_cxl_add_dport_by_dev;
> - _devm_cxl_switch_port_decoders_setup =
> - redirect_devm_cxl_switch_port_decoders_setup;
> }
> EXPORT_SYMBOL_GPL(register_cxl_mock_ops);
>
> @@ -31,9 +23,6 @@ DEFINE_STATIC_SRCU(cxl_mock_srcu);
>
> void unregister_cxl_mock_ops(struct cxl_mock_ops *ops)
> {
> - _devm_cxl_switch_port_decoders_setup =
> - __devm_cxl_switch_port_decoders_setup;
> - _cxl_add_dport_by_dev = __cxl_add_dport_by_dev;
> list_del_rcu(&ops->list);
> synchronize_srcu(&cxl_mock_srcu);
> }
> @@ -162,7 +151,7 @@ __wrap_nvdimm_bus_register(struct device *dev,
> }
> EXPORT_SYMBOL_GPL(__wrap_nvdimm_bus_register);
>
> -int redirect_devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
> +int __wrap_devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
> {
> int rc, index;
> struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
> @@ -170,11 +159,12 @@ int redirect_devm_cxl_switch_port_decoders_setup(struct cxl_port *port)
> if (ops && ops->is_mock_port(port->uport_dev))
> rc = ops->devm_cxl_switch_port_decoders_setup(port);
> else
> - rc = __devm_cxl_switch_port_decoders_setup(port);
> + rc = devm_cxl_switch_port_decoders_setup(port);
> put_cxl_mock_ops(index);
>
> return rc;
> }
> +EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_switch_port_decoders_setup, "CXL");
>
> int __wrap_devm_cxl_endpoint_decoders_setup(struct cxl_port *port)
> {
> @@ -256,8 +246,8 @@ void __wrap_cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device
> }
> EXPORT_SYMBOL_NS_GPL(__wrap_cxl_dport_init_ras_reporting, "CXL");
>
> -struct cxl_dport *redirect_cxl_add_dport_by_dev(struct cxl_port *port,
> - struct device *dport_dev)
> +struct cxl_dport *__wrap_cxl_add_dport_by_dev(struct cxl_port *port,
> + struct device *dport_dev)
> {
> int index;
> struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
> @@ -266,11 +256,12 @@ struct cxl_dport *redirect_cxl_add_dport_by_dev(struct cxl_port *port,
> if (ops && ops->is_mock_port(port->uport_dev))
> dport = ops->cxl_add_dport_by_dev(port, dport_dev);
> else
> - dport = __cxl_add_dport_by_dev(port, dport_dev);
> + dport = cxl_add_dport_by_dev(port, dport_dev);
> put_cxl_mock_ops(index);
>
> return dport;
> }
> +EXPORT_SYMBOL_NS_GPL(__wrap_cxl_add_dport_by_dev, "CXL");
>
> MODULE_LICENSE("GPL v2");
> MODULE_DESCRIPTION("cxl_test: emulation module");
* Re: [PATCH v14 21/34] cxl/port: Move dport RAS reporting to a port resource
2026-01-14 18:20 ` [PATCH v14 21/34] cxl/port: Move dport RAS reporting to a port resource Terry Bowman
@ 2026-01-14 21:47 ` Dave Jiang
2026-01-15 15:02 ` Jonathan Cameron
1 sibling, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 21:47 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> Towards the end goal of making all CXL RAS capability handling uniform
> across upstream host bridges, upstream switch ports, and upstream endpoint
> ports, move dport RAS setup to cxl_endpoint_port_probe(). Rename the RAS
> setup helper to devm_cxl_dport_ras_setup() for symmetry with
> devm_cxl_switch_port_decoders_setup().
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Missing your Signed-off-by tag.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> ---
>
> Changes in v13 -> v14:
> - New patch
> ---
> drivers/cxl/core/ras.c | 12 ++++++------
> drivers/cxl/cxlpci.h | 8 ++++----
> drivers/cxl/mem.c | 2 --
> drivers/cxl/port.c | 12 ++++++++++++
> tools/testing/cxl/Kbuild | 2 +-
> tools/testing/cxl/test/mock.c | 6 +++---
> 6 files changed, 26 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 72908f3ced77..d71fcac31cf2 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -139,17 +139,17 @@ static void cxl_dport_map_ras(struct cxl_dport *dport)
> }
>
> /**
> - * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
> + * devm_cxl_dport_ras_setup - Setup CXL RAS report on this dport
> * @dport: the cxl_dport that needs to be initialized
> - * @host: host device for devm operations
> */
> -void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
> +void devm_cxl_dport_ras_setup(struct cxl_dport *dport)
> {
> - dport->reg_map.host = host;
> + dport->reg_map.host = &dport->port->dev;
> cxl_dport_map_ras(dport);
>
> if (dport->rch) {
> - struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport->dport_dev);
> + struct pci_host_bridge *host_bridge =
> + to_pci_host_bridge(dport->dport_dev);
>
> if (!host_bridge->native_aer)
> return;
> @@ -158,7 +158,7 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
> cxl_disable_rch_root_ints(dport);
> }
> }
> -EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_dport_ras_setup, "CXL");
>
> void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
> {
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 6f9c78886fd9..e41bb93d583a 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -81,7 +81,7 @@ void read_cdat_data(struct cxl_port *port);
> void cxl_cor_error_detected(struct pci_dev *pdev);
> pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> pci_channel_state_t state);
> -void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
> +void devm_cxl_dport_ras_setup(struct cxl_dport *dport);
> #else
> static inline void cxl_cor_error_detected(struct pci_dev *pdev) { }
>
> @@ -90,9 +90,9 @@ static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> {
> return PCI_ERS_RESULT_NONE;
> }
> -
> -static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
> - struct device *host) { }
> +static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport)
> +{
> +}
> #endif
>
> #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index c2ee7f7f6320..e25c33f8c6cf 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -166,8 +166,6 @@ static int cxl_mem_probe(struct device *dev)
> else
> endpoint_parent = &parent_port->dev;
>
> - cxl_dport_init_ras_reporting(dport, dev);
> -
> scoped_guard(device, endpoint_parent) {
> if (!endpoint_parent->driver) {
> dev_err(dev, "CXL port topology %s not enabled\n",
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index 2770bc8520d3..8f8fc98c1428 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -75,6 +75,7 @@ static int cxl_switch_port_probe(struct cxl_port *port)
> static int cxl_endpoint_port_probe(struct cxl_port *port)
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
> + struct cxl_dport *dport = port->parent_dport;
> int rc;
>
> /* Cache the data early to ensure is_visible() works */
> @@ -90,6 +91,17 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
> if (rc)
> return rc;
>
> + /*
> + * With VH (CXL Virtual Host) topology the cxl_port::add_dport() method
> + * handles RAS setup for downstream ports. With RCH (CXL Restricted CXL
> + * Host) topologies the downstream port is enumerated early by platform
> + * firmware, but the RCRB (root complex register block) is not mapped
> + * until after the cxl_pci driver attaches to the RCIeP (root complex
> + * integrated endpoint).
> + */
> + if (dport->rch)
> + devm_cxl_dport_ras_setup(dport);
> +
> /*
> * Now that all endpoint decoders are successfully enumerated, try to
> * assemble regions from committed decoders
> diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> index 25516728535e..7250bedf0448 100644
> --- a/tools/testing/cxl/Kbuild
> +++ b/tools/testing/cxl/Kbuild
> @@ -8,7 +8,7 @@ ldflags-y += --wrap=cxl_await_media_ready
> ldflags-y += --wrap=cxl_add_rch_dport
> ldflags-y += --wrap=cxl_rcd_component_reg_phys
> ldflags-y += --wrap=cxl_endpoint_parse_cdat
> -ldflags-y += --wrap=cxl_dport_init_ras_reporting
> +ldflags-y += --wrap=devm_cxl_dport_ras_setup
> ldflags-y += --wrap=devm_cxl_endpoint_decoders_setup
> ldflags-y += --wrap=hmat_get_extended_linear_cache_size
> ldflags-y += --wrap=cxl_add_dport_by_dev
> diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
> index 10140a4c5fac..8883357ee50d 100644
> --- a/tools/testing/cxl/test/mock.c
> +++ b/tools/testing/cxl/test/mock.c
> @@ -234,17 +234,17 @@ void __wrap_cxl_endpoint_parse_cdat(struct cxl_port *port)
> }
> EXPORT_SYMBOL_NS_GPL(__wrap_cxl_endpoint_parse_cdat, "CXL");
>
> -void __wrap_cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
> +void __wrap_devm_cxl_dport_ras_setup(struct cxl_dport *dport)
> {
> int index;
> struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
>
> if (!ops || !ops->is_mock_port(dport->dport_dev))
> - cxl_dport_init_ras_reporting(dport, host);
> + devm_cxl_dport_ras_setup(dport);
>
> put_cxl_mock_ops(index);
> }
> -EXPORT_SYMBOL_NS_GPL(__wrap_cxl_dport_init_ras_reporting, "CXL");
> +EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_dport_ras_setup, "CXL");
>
> struct cxl_dport *__wrap_cxl_add_dport_by_dev(struct cxl_port *port,
> struct device *dport_dev)
* Re: [PATCH v14 23/34] cxl: Map CXL Endpoint Port and CXL Switch Port RAS registers
2026-01-14 18:20 ` [PATCH v14 23/34] cxl: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
@ 2026-01-14 21:53 ` Dave Jiang
2026-01-15 15:17 ` Jonathan Cameron
1 sibling, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 21:53 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> In preparation for CXL VH (Virtual Host) topology protocol error handling,
> add RAS capability register mapping for all ports in a CXL VH topology.
> This includes the RAS capabilities of Switch Upstream Ports, Switch
> Downstream Ports, Host Bridge Ports ("upstream"), and Root Ports
> ("downstream").
>
> Update cxl_port_add_dport() to map the upstream RAS capability on first
> 'dport' attach, and downstream RAS capability on each 'dport' attach.
> Arrange for dport mappings to be released at del_dport() time.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> [djbw: reword changelog, fix devm handling]
drop the line above
DJ
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> ---
>
> Changes in v13->v14:
> - Correct message spelling (Terry)
> ---
> drivers/cxl/core/port.c | 2 +-
> drivers/cxl/core/ras.c | 11 +++++++++++
> drivers/cxl/cxl.h | 2 ++
> drivers/cxl/cxlpci.h | 4 ++++
> drivers/cxl/port.c | 37 +++++++++++++++++++++++++++++++++++
> tools/testing/cxl/Kbuild | 1 +
> tools/testing/cxl/test/mock.c | 12 ++++++++++++
> 7 files changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 2184c20af011..2c4e28e7975c 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1451,7 +1451,7 @@ static void del_dport(struct cxl_dport *dport)
> {
> struct cxl_port *port = dport->port;
>
> - devm_release_action(&port->dev, unlink_dport, dport);
> + devres_release_group(&port->dev, dport);
> }
>
> static void del_dports(struct cxl_port *port)
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 84abcf90fa99..76ac567724e3 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -162,6 +162,17 @@ void devm_cxl_dport_ras_setup(struct cxl_dport *dport)
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_dport_ras_setup, "CXL");
>
> +void devm_cxl_port_ras_setup(struct cxl_port *port)
> +{
> + struct cxl_register_map *map = &port->reg_map;
> +
> + map->host = &port->dev;
> + if (cxl_map_component_regs(map, &port->regs,
> + BIT(CXL_CM_CAP_CAP_ID_RAS)))
> + dev_dbg(&port->dev, "Failed to map RAS capability\n");
> +}
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
> +
> void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> {
> void __iomem *addr;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 46491046f101..805923693707 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -607,6 +607,7 @@ struct cxl_dax_region {
> * @parent_dport: dport that points to this port in the parent
> * @decoder_ida: allocator for decoder ids
> * @reg_map: component and ras register mapping parameters
> + * @regs: mapped component registers
> * @nr_dports: number of entries in @dports
> * @hdm_end: track last allocated HDM decoder instance for allocation ordering
> * @commit_end: cursor to track highest committed decoder for commit ordering
> @@ -628,6 +629,7 @@ struct cxl_port {
> struct cxl_dport *parent_dport;
> struct ida decoder_ida;
> struct cxl_register_map reg_map;
> + struct cxl_component_regs regs;
> int nr_dports;
> int hdm_end;
> int commit_end;
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index e41bb93d583a..ef4496b4e55e 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -82,6 +82,7 @@ void cxl_cor_error_detected(struct pci_dev *pdev);
> pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> pci_channel_state_t state);
> void devm_cxl_dport_ras_setup(struct cxl_dport *dport);
> +void devm_cxl_port_ras_setup(struct cxl_port *port);
> #else
> static inline void cxl_cor_error_detected(struct pci_dev *pdev) { }
>
> @@ -93,6 +94,9 @@ static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport)
> {
> }
> +static inline void devm_cxl_port_ras_setup(struct cxl_port *port)
> +{
> +}
> #endif
>
> #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index 8f8fc98c1428..0d6e010e21ca 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -176,11 +176,29 @@ static struct cxl_port *cxl_port_devres_group(struct cxl_port *port)
> DEFINE_FREE(cxl_port_group_free, struct cxl_port *,
> if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->dev, _T))
>
> +static struct cxl_dport *cxl_dport_devres_group(struct cxl_dport *dport)
> +{
> + if (!devres_open_group(&dport->port->dev, dport, GFP_KERNEL))
> + return ERR_PTR(-ENOMEM);
> + return dport;
> +}
> +DEFINE_FREE(cxl_dport_group_free, struct cxl_dport *,
> + if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->port->dev, _T))
> +
> static void cxl_port_group_close(struct cxl_port *port)
> {
> devres_remove_group(&port->dev, port);
> }
>
> +/*
> + * Unlike the port group, that just facilitates unwind of setup failures, the
> + * dport group needs to stay live for del_dport() to reference.
> + */
> +static void cxl_dport_group_close(struct cxl_dport *dport)
> +{
> + devres_close_group(&dport->port->dev, dport);
> +}
> +
> static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> struct device *dport_dev)
> {
> @@ -209,6 +227,13 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> rc = devm_cxl_switch_port_decoders_setup(port);
> if (rc)
> return ERR_PTR(rc);
> +
> + /*
> + * RAS setup is optional, either driver operation can continue
> + * on failure, or the device does not implement RAS registers.
> + */
> + devm_cxl_port_ras_setup(port);
> +
> /*
> * Note, when nr_dports returns to zero the port is unregistered
> * and triggers cleanup. I.e. no need for open-coded release
> @@ -220,12 +245,24 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> if (IS_ERR(new_dport))
> return new_dport;
>
> + /*
> + * Establish a group for all dport resources that need to be released
> + * when the dport is deleted.
> + */
> + struct cxl_dport *dport_group __free(cxl_dport_group_free) =
> + cxl_dport_devres_group(new_dport);
> + if (IS_ERR(dport_group))
> + return ERR_CAST(dport_group);
> +
> rc = cxl_dport_autoremove(new_dport);
> if (rc)
> return ERR_PTR(rc);
>
> + devm_cxl_dport_ras_setup(new_dport);
> +
> cxl_switch_parse_cdat(new_dport);
>
> + cxl_dport_group_close(no_free_ptr(dport_group));
> cxl_port_group_close(no_free_ptr(port_group));
>
> dev_dbg(&port->dev, "dport[%d] id:%d dport_dev: %s added\n",
> diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> index 7250bedf0448..6c516019600e 100644
> --- a/tools/testing/cxl/Kbuild
> +++ b/tools/testing/cxl/Kbuild
> @@ -13,6 +13,7 @@ ldflags-y += --wrap=devm_cxl_endpoint_decoders_setup
> ldflags-y += --wrap=hmat_get_extended_linear_cache_size
> ldflags-y += --wrap=cxl_add_dport_by_dev
> ldflags-y += --wrap=devm_cxl_switch_port_decoders_setup
> +ldflags-y += --wrap=devm_cxl_port_ras_setup
>
> DRIVERS := ../../../drivers
> CXL_SRC := $(DRIVERS)/cxl
> diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
> index 8883357ee50d..a0b87bbb2f75 100644
> --- a/tools/testing/cxl/test/mock.c
> +++ b/tools/testing/cxl/test/mock.c
> @@ -246,6 +246,18 @@ void __wrap_devm_cxl_dport_ras_setup(struct cxl_dport *dport)
> }
> EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_dport_ras_setup, "CXL");
>
> +void __wrap_devm_cxl_port_ras_setup(struct cxl_port *port)
> +{
> + int index;
> + struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
> +
> + if (!ops || !ops->is_mock_port(port->uport_dev))
> + devm_cxl_port_ras_setup(port);
> +
> + put_cxl_mock_ops(index);
> +}
> +EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_port_ras_setup, "CXL");
> +
> struct cxl_dport *__wrap_cxl_add_dport_by_dev(struct cxl_port *port,
> struct device *dport_dev)
> {
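The close-versus-remove distinction called out in the cxl_dport_group_close()
comment above can be modeled in userspace. This is only a rough sketch under
stated assumptions: every model_* name is made up for illustration, and the
kernel's devres implementation is considerably more involved. It shows why a
closed group can still be released as a unit later (what del_dport() relies
on), while a removed group cannot.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_RES 16

typedef void (*model_release_fn)(void *data);

/* Hypothetical stand-in for one managed resource on a device. */
struct model_res {
	const void *group;	/* owning group id, or NULL once ungrouped */
	model_release_fn release;
	void *data;
	int live;
};

/* Hypothetical stand-in for a device's managed-resource list. */
struct model_devres {
	struct model_res res[MODEL_MAX_RES];
	size_t count;
	const void *open_group;	/* group currently collecting new resources */
};

/* Convenience release callback: bump an int counter. */
static void model_count_release(void *data)
{
	++*(int *)data;
}

static void model_open_group(struct model_devres *d, const void *id)
{
	assert(d);
	d->open_group = id;
}

/* Register a resource; it joins whichever group is open right now. */
static int model_add(struct model_devres *d, model_release_fn fn, void *data)
{
	if (d->count == MODEL_MAX_RES)
		return -1;
	d->res[d->count++] = (struct model_res){
		.group = d->open_group, .release = fn, .data = data, .live = 1,
	};
	return 0;
}

/* Close: stop collecting, but keep the group so it can be released later. */
static void model_close_group(struct model_devres *d, const void *id)
{
	if (d->open_group == id)
		d->open_group = NULL;
}

/* Remove: forget the grouping; the resources themselves stay registered. */
static void model_remove_group(struct model_devres *d, const void *id)
{
	for (size_t i = 0; i < d->count; i++)
		if (d->res[i].group == id)
			d->res[i].group = NULL;
	model_close_group(d, id);
}

/* Release every live resource in the group, newest first. */
static int model_release_group(struct model_devres *d, const void *id)
{
	int released = 0;

	for (size_t i = d->count; i-- > 0; ) {
		struct model_res *r = &d->res[i];

		if (r->live && r->group == id) {
			r->release(r->data);
			r->live = 0;
			released++;
		}
	}
	model_close_group(d, id);
	return released;
}
```

In this model, a group that was closed after setup still releases all of its
members when model_release_group() is called, whereas a group that was removed
releases nothing because its members are no longer tagged.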
* Re: [PATCH v14 24/34] cxl/port: Move endpoint component register management to cxl_port
2026-01-14 18:20 ` [PATCH v14 24/34] cxl/port: Move endpoint component register management to cxl_port Terry Bowman
@ 2026-01-14 21:55 ` Dave Jiang
2026-01-15 15:28 ` Jonathan Cameron
1 sibling, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 21:55 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> In preparation for generic protocol error handling across CXL endpoints,
> whether they be memory expander class devices or accelerators, drop the
> endpoint component management from cxl_dev_state.
>
> Organize all CXL port component management through the common cxl_port
> driver.
>
> Note that the end game is that drivers/cxl/core/ras.c loses all
> dependencies on a 'struct cxl_dev_state' parameter and operates only on
> port resources. The removal of component register mapping from cxl_pci is
> an incremental step towards that.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Missing Signed-off-by tag.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> ---
>
> Changes in v13 -> v14:
> - New patch
> - Update log message for cxl_ras_unmask() failure (Dan)
> ---
> drivers/cxl/core/ras.c | 6 ++--
> drivers/cxl/cxlmem.h | 4 +--
> drivers/cxl/pci.c | 63 +-----------------------------------------
> drivers/cxl/port.c | 54 ++++++++++++++++++++++++++++++++++++
> 4 files changed, 60 insertions(+), 67 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 76ac567724e3..b37108f60c56 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -247,6 +247,7 @@ bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> void cxl_cor_error_detected(struct pci_dev *pdev)
> {
> struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> + struct cxl_memdev *cxlmd = cxlds->cxlmd;
> struct device *dev = &cxlds->cxlmd->dev;
>
> scoped_guard(device, dev) {
> @@ -261,7 +262,7 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
> cxl_handle_rdport_errors(cxlds);
>
> cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->serial,
> - cxlds->regs.ras);
> + cxlmd->endpoint->regs.ras);
> }
> }
> EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
> @@ -291,10 +292,9 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> * capability registers and bounce the active state of the memdev.
> */
> ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->serial,
> - cxlds->regs.ras);
> + cxlmd->endpoint->regs.ras);
> }
>
> -
> switch (state) {
> case pci_channel_io_normal:
> if (ue) {
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 434031a0c1f7..ab7201ef3ea6 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -415,7 +415,7 @@ struct cxl_dpa_partition {
> * @dev: The device associated with this CXL state
> * @cxlmd: The device representing the CXL.mem capabilities of @dev
> * @reg_map: component and ras register mapping parameters
> - * @regs: Parsed register blocks
> + * @regs: Class device "Device" registers
> * @cxl_dvsec: Offset to the PCIe device DVSEC
> * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
> * @media_ready: Indicate whether the device media is usable
> @@ -431,7 +431,7 @@ struct cxl_dev_state {
> struct device *dev;
> struct cxl_memdev *cxlmd;
> struct cxl_register_map reg_map;
> - struct cxl_regs regs;
> + struct cxl_device_regs regs;
> int cxl_dvsec;
> bool rcd;
> bool media_ready;
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index b7f694bda913..acb0eb2a13c3 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -535,52 +535,6 @@ static int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
> return cxl_setup_regs(map);
> }
>
> -static int cxl_pci_ras_unmask(struct pci_dev *pdev)
> -{
> - struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> - void __iomem *addr;
> - u32 orig_val, val, mask;
> - u16 cap;
> - int rc;
> -
> - if (!cxlds->regs.ras) {
> - dev_dbg(&pdev->dev, "No RAS registers.\n");
> - return 0;
> - }
> -
> - /* BIOS has PCIe AER error control */
> - if (!pcie_aer_is_native(pdev))
> - return 0;
> -
> - rc = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap);
> - if (rc)
> - return rc;
> -
> - if (cap & PCI_EXP_DEVCTL_URRE) {
> - addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
> - orig_val = readl(addr);
> -
> - mask = CXL_RAS_UNCORRECTABLE_MASK_MASK |
> - CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
> - val = orig_val & ~mask;
> - writel(val, addr);
> - dev_dbg(&pdev->dev,
> - "Uncorrectable RAS Errors Mask: %#x -> %#x\n",
> - orig_val, val);
> - }
> -
> - if (cap & PCI_EXP_DEVCTL_CERE) {
> - addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET;
> - orig_val = readl(addr);
> - val = orig_val & ~CXL_RAS_CORRECTABLE_MASK_MASK;
> - writel(val, addr);
> - dev_dbg(&pdev->dev, "Correctable RAS Errors Mask: %#x -> %#x\n",
> - orig_val, val);
> - }
> -
> - return 0;
> -}
> -
> static void free_event_buf(void *buf)
> {
> kvfree(buf);
> @@ -912,13 +866,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> unsigned int i;
> bool irq_avail;
>
> - /*
> - * Double check the anonymous union trickery in struct cxl_regs
> - * FIXME switch to struct_group()
> - */
> - BUILD_BUG_ON(offsetof(struct cxl_regs, memdev) !=
> - offsetof(struct cxl_regs, device_regs.memdev));
> -
> rc = pcim_enable_device(pdev);
> if (rc)
> return rc;
> @@ -942,7 +889,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (rc)
> return rc;
>
> - rc = cxl_map_device_regs(&map, &cxlds->regs.device_regs);
> + rc = cxl_map_device_regs(&map, &cxlds->regs);
> if (rc)
> return rc;
>
> @@ -957,11 +904,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> else if (!cxlds->reg_map.component_map.ras.valid)
> dev_dbg(&pdev->dev, "RAS registers not found\n");
>
> - rc = cxl_map_component_regs(&cxlds->reg_map, &cxlds->regs.component,
> - BIT(CXL_CM_CAP_CAP_ID_RAS));
> - if (rc)
> - dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
> -
> rc = cxl_pci_type3_init_mailbox(cxlds);
> if (rc)
> return rc;
> @@ -1052,9 +994,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (rc)
> return rc;
>
> - if (cxl_pci_ras_unmask(pdev))
> - dev_dbg(&pdev->dev, "No RAS reporting unmasked\n");
> -
> pci_save_state(pdev);
>
> return rc;
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index 0d6e010e21ca..d76b4b532064 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0-only
> /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> +#include <linux/aer.h>
> #include <linux/device.h>
> #include <linux/module.h>
> #include <linux/slab.h>
> @@ -72,6 +73,55 @@ static int cxl_switch_port_probe(struct cxl_port *port)
> return 0;
> }
>
> +static int cxl_ras_unmask(struct cxl_port *port)
> +{
> + struct pci_dev *pdev;
> + void __iomem *addr;
> + u32 orig_val, val, mask;
> + u16 cap;
> + int rc;
> +
> + if (!dev_is_pci(port->uport_dev))
> + return 0;
> + pdev = to_pci_dev(port->uport_dev);
> +
> + if (!port->regs.ras) {
> + pci_dbg(pdev, "No RAS registers.\n");
> + return 0;
> + }
> +
> + /* BIOS has PCIe AER error control */
> + if (!pcie_aer_is_native(pdev))
> + return 0;
> +
> + rc = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap);
> + if (rc)
> + return rc;
> +
> + if (cap & PCI_EXP_DEVCTL_URRE) {
> + addr = port->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
> + orig_val = readl(addr);
> +
> + mask = CXL_RAS_UNCORRECTABLE_MASK_MASK |
> + CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
> + val = orig_val & ~mask;
> + writel(val, addr);
> + pci_dbg(pdev, "Uncorrectable RAS Errors Mask: %#x -> %#x\n",
> + orig_val, val);
> + }
> +
> + if (cap & PCI_EXP_DEVCTL_CERE) {
> + addr = port->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET;
> + orig_val = readl(addr);
> + val = orig_val & ~CXL_RAS_CORRECTABLE_MASK_MASK;
> + writel(val, addr);
> + pci_dbg(pdev, "Correctable RAS Errors Mask: %#x -> %#x\n",
> + orig_val, val);
> + }
> +
> + return 0;
> +}
> +
> static int cxl_endpoint_port_probe(struct cxl_port *port)
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
> @@ -102,6 +152,10 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
> if (dport->rch)
> devm_cxl_dport_ras_setup(dport);
>
> + devm_cxl_port_ras_setup(port);
> + if (cxl_ras_unmask(port))
> + dev_dbg(&port->dev, "failed to unmask RAS interrupts\n");
> +
> /*
> * Now that all endpoint decoders are successfully enumerated, try to
> * assemble regions from committed decoders
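The mask handling in cxl_ras_unmask() above boils down to clearing mask bits
only for the error classes that Device Control already has reporting enabled
for. A minimal userspace sketch of that decision follows. The MODEL_* names and
the model_ras structure are assumptions made for illustration, and the two
*_MASK_BITS values are placeholders rather than the real CXL register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror the Device Control reporting-enable bits tested by the patch. */
#define MODEL_DEVCTL_CERE	0x0001u	/* correctable error reporting enable */
#define MODEL_DEVCTL_URRE	0x0008u	/* unsupported request reporting enable */

/* Placeholder bit sets; the real values live in the CXL driver headers. */
#define MODEL_UNCOR_MASK_BITS	0x0001cfffu
#define MODEL_COR_MASK_BITS	0x0000007fu

/* Hypothetical in-memory copy of the two RAS mask registers. */
struct model_ras {
	uint32_t uncor_mask;
	uint32_t cor_mask;
};

/*
 * Clear mask bits for each class whose reporting is enabled in devctl.
 * Returns true if any mask register was rewritten. When firmware owns
 * AER (aer_native == false) the masks are left untouched.
 */
static bool model_ras_unmask(struct model_ras *ras, uint16_t devctl,
			     bool aer_native)
{
	bool touched = false;

	assert(ras);

	if (!aer_native)
		return false;

	if (devctl & MODEL_DEVCTL_URRE) {
		ras->uncor_mask &= ~MODEL_UNCOR_MASK_BITS;
		touched = true;
	}

	if (devctl & MODEL_DEVCTL_CERE) {
		ras->cor_mask &= ~MODEL_COR_MASK_BITS;
		touched = true;
	}

	return touched;
}
```

Bits outside the placeholder sets are preserved, matching the
read-modify-write shape of the quoted code.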
* Re: [PATCH v14 25/34] cxl/port: Map Port component registers before switchport init
2026-01-14 18:20 ` [PATCH v14 25/34] cxl/port: Map Port component registers before switchport init Terry Bowman
@ 2026-01-14 21:59 ` Dave Jiang
2026-01-15 15:30 ` Jonathan Cameron
1 sibling, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 21:59 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> Port HDM registers must be mapped before calling
> devm_cxl_switch_port_decoders_setup(). Invoke a call to this function
> in cxl_port_add_dport().
s/Invoke.../Map the per port component registers when the first dport is being enumerated./
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/port.c | 3 ++-
> drivers/cxl/cxlpci.h | 3 +++
> drivers/cxl/port.c | 5 +++++
> 3 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 2c4e28e7975c..3f730511f11d 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -778,7 +778,7 @@ static int cxl_setup_comp_regs(struct device *host, struct cxl_register_map *map
> return cxl_setup_regs(map);
> }
>
> -static int cxl_port_setup_regs(struct cxl_port *port,
> +int cxl_port_setup_regs(struct cxl_port *port,
> resource_size_t component_reg_phys)
> {
> if (dev_is_platform(port->uport_dev))
> @@ -786,6 +786,7 @@ static int cxl_port_setup_regs(struct cxl_port *port,
> return cxl_setup_comp_regs(&port->dev, &port->reg_map,
> component_reg_phys);
> }
> +EXPORT_SYMBOL_NS_GPL(cxl_port_setup_regs, "CXL");
>
> static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
> resource_size_t component_reg_phys)
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index ef4496b4e55e..532506595d0f 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -99,4 +99,7 @@ static inline void devm_cxl_port_ras_setup(struct cxl_port *port)
> }
> #endif
>
> +int cxl_port_setup_regs(struct cxl_port *port,
> + resource_size_t component_reg_phys);
> +
> #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index d76b4b532064..f8a33dbf8222 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -278,6 +278,11 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> return ERR_CAST(port_group);
>
> if (port->nr_dports == 0) {
> +
> + rc = cxl_port_setup_regs(port, port->component_reg_phys);
> + if (rc)
> + return ERR_PTR(rc);
> +
> rc = devm_cxl_switch_port_decoders_setup(port);
> if (rc)
> return ERR_PTR(rc);
* Re: [PATCH v14 32/34] cxl: Update Endpoint uncorrectable protocol error handling
2026-01-14 18:20 ` [PATCH v14 32/34] cxl: Update Endpoint uncorrectable protocol error handling Terry Bowman
@ 2026-01-14 22:07 ` dan.j.williams
2026-01-15 15:26 ` Bowman, Terry
2026-01-15 15:27 ` Bowman, Terry
0 siblings, 2 replies; 129+ messages in thread
From: dan.j.williams @ 2026-01-14 22:07 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
Terry Bowman wrote:
> The CXL drivers must support handling Endpoint CXL and PCI uncorrectable
> (UCE) protocol errors. Update the drivers to support both.
>
> Introduce cxl_pci_error_detected() to handle PCI uncorrectable errors,
> replacing cxl_error_detected(). Implement this new function to call
> the existing CXL Port uncorrectable handler, cxl_port_error_detected().
>
> Update cxl_port_error_detected() for Endpoint handling. Take the CXL
> memory device lock, check for a valid driver, and handle Restricted
> CXL Host (RCH) errors if needed. This is the same sequence originally in
> cxl_error_detected(). However, the UCE handler's result handling is
> simplified because recovery is not attempted; instead, UCEs
> result in the CXL driver invoking a system panic.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>
> ---
>
> Changes in v13->v14:
> - Update commit headline (Bjorn)
> - Rename pci_error_detected()/pci_cor_error_detected() ->
> cxl_pci_error_detected/cxl_pci_cor_error_detected() (Jonathan)
> - Remove now-invalid comment in cxl_error_detected() (Jonathan)
> - Split into separate patches for UCE and CE (Terry)
>
> Changes in v12->v13:
> - Update commit message (Terry)
> - Updated all the implementation and commit message. (Terry)
> - Refactored cxl_cor_error_detected()/cxl_error_detected() to remove
> pdev (Dave Jiang)
>
> Changes in v11->v12:
> - None
>
> Changes in v10->v11:
> - cxl_error_detected() - Change handlers' scoped_guard() to guard() (Jonathan)
> - cxl_error_detected() - Remove extra line (Shiju)
> - Changes moved to core/ras.c (Terry)
> - cxl_error_detected(), remove 'ue' and return with function call. (Jonathan)
> - Remove extra space in documentation for PCI_ERS_RESULT_PANIC definition
> - Move #include "pci.h" from cxl.h to core.h (Terry)
> - Remove unnecessary includes of cxl.h and core.h in mem.c (Terry)
[..]
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 96ce85cc0a46..dc6e02d64821 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
[..]
> @@ -373,55 +399,21 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
>
> -pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> - pci_channel_state_t state)
> +pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
> + pci_channel_state_t error)
> {
> - struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> - struct cxl_memdev *cxlmd = cxlds->cxlmd;
> - struct device *dev = &cxlmd->dev;
> - bool ue;
> + struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
> + pci_ers_result_t rc;
>
> - guard(device)(dev);
> + guard(device)(&port->dev);
>
> - if (!dev->driver) {
> - dev_warn(&pdev->dev,
> - "%s: memdev disabled, abort error handling\n",
> - dev_name(dev));
> - return PCI_ERS_RESULT_DISCONNECT;
> - }
> + rc = cxl_port_error_detected(&pdev->dev);
> + if (rc == PCI_ERS_RESULT_PANIC)
> + panic("CXL cachemem error.");
[..]
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index acb0eb2a13c3..ff741adc7c7f 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1051,8 +1051,8 @@ static void cxl_reset_done(struct pci_dev *pdev)
> }
> }
>
> -static const struct pci_error_handlers cxl_error_handlers = {
> - .error_detected = cxl_error_detected,
> +static const struct pci_error_handlers pci_error_handlers = {
> + .error_detected = cxl_pci_error_detected,
I still feel like we are disconnected on the fundamental question of who
is responsible for invoking CXL protocol error handling.
To be clear, all of this:
cxl/port: Remove "enumerate dports" helpers
cxl/port: Fix devm resource leaks around with dport management
cxl/port: Move dport operations to a driver event
cxl/port: Move dport RAS reporting to a port resource
cxl/port: Move endpoint component register management to cxl_port
cxl/port: Unify endpoint and switch port lookup
was done with the intent that cxl_pci, and any other driver that creates
a cxl_memdev, never needs to worry about CXL protocol error handling. It
comes "for free" by registering a "struct cxl_memdev".
This is the rationale for "struct pci_dev" to grow an "is_cxl"
attribute, and for the PCI core to learn how to forward PCIe internal
errors on CXL devices to the CXL core.
The only errors that cxl_pci needs to worry about are non-internal /
native PCI errors. All CXL errors will have already been routed to the
CXL core for generic handling based on a port lookup.
So the end state I am looking for is no call to
cxl_port_error_detected() from any 'struct pci_error_handlers'
implementation. Untangle that ambiguity in the AER core and do not
inflict it on every CXL driver that comes after.
I think we are close to that outcome if not already there by simply
deleting this last cxl_pci_error_detected() -> cxl_port_error_detected()
"false dependency".
Now, if an endpoint driver ever thinks it can do anything sane with CXL
protocol error beyond what the core is already handling, then we can
think about complications like passing a cxl_port error handler
template. I struggle to think of a case like that.
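To make the split I am describing concrete, here is a minimal user-space sketch of the routing decision. It is only a model under stated assumptions: `model_dev`, `route_aer_error()` and the enums are hypothetical names invented for illustration, not kernel APIs, and the real decision lives in the AER core.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the objects discussed above (not kernel types). */
enum err_kind { ERR_NATIVE_PCI, ERR_INTERNAL };
enum err_owner { OWNER_PCI_DRIVER, OWNER_CXL_CORE };

struct model_dev {
	bool is_cxl;	/* models the "is_cxl" attribute on struct pci_dev */
};

/*
 * Model of the routing: internal errors on a CXL device are treated as CXL
 * protocol errors and go to the CXL core. Everything else stays with the
 * driver's own pci_error_handlers.
 */
static enum err_owner route_aer_error(const struct model_dev *dev,
				      enum err_kind kind)
{
	assert(dev);

	if (dev->is_cxl && kind == ERR_INTERNAL)
		return OWNER_CXL_CORE;
	return OWNER_PCI_DRIVER;
}
```

In this model the endpoint driver never sees `ERR_INTERNAL` on a CXL device, which is the "for free" property I am after.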
* Re: [PATCH v14 29/34] cxl/port: Unify endpoint and switch port lookup
2026-01-14 18:20 ` [PATCH v14 29/34] cxl/port: Unify endpoint and switch port lookup Terry Bowman
@ 2026-01-14 23:04 ` Dave Jiang
2026-01-15 15:44 ` Jonathan Cameron
1 sibling, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 23:04 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> In support of generic CXL protocol error handling across various 'struct
> cxl_port' types, update find_cxl_port_by_uport() to retrieve endpoint CXL
> port companions from endpoint PCIe device instances.
>
> The end result is that upstream switch ports and endpoint ports can share
> error handling and eventually delete the misplaced cxl_error_handlers from
> the cxl_pci class driver.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
missing sign off
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> ---
>
> Changes in v13->v14:
> - New patch
> ---
> drivers/cxl/core/port.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 3f730511f11d..a535e57360e0 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1561,10 +1561,20 @@ static int match_port_by_uport(struct device *dev, const void *data)
> return 0;
>
> port = to_cxl_port(dev);
> + /* Endpoint ports are hosted by memdevs */
> + if (is_cxl_memdev(port->uport_dev))
> + return uport_dev == port->uport_dev->parent;
> return uport_dev == port->uport_dev;
> }
>
> -/*
> +/**
> + * find_cxl_port_by_uport - Find a CXL port device companion
> + * @uport_dev: Device that acts as a switch or endpoint in the CXL hierarchy
> + *
> + * In the case of endpoint ports recall that port->uport_dev points to a 'struct
> + * cxl_memdev' device. So, the @uport_dev argument is the parent device of the
> + * 'struct cxl_memdev' in that case.
> + *
> * Function takes a device reference on the port device. Caller should do a
> * put_device() when done.
> */
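For anyone following along, the matching rule in the hunk above can be modelled in user space roughly as below. This is a sketch only: `model_device` and `model_port` are hypothetical stand-ins for 'struct device' and 'struct cxl_port', and the `is_memdev` flag stands in for is_cxl_memdev().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for struct device / struct cxl_port. */
struct model_device {
	struct model_device *parent;
	bool is_memdev;		/* stands in for is_cxl_memdev() */
};

struct model_port {
	struct model_device *uport_dev;
};

/*
 * Endpoint ports are hosted by memdevs, so the caller's device (for example
 * the endpoint PCIe device) is compared against the memdev's parent. Switch
 * ports are compared against uport_dev directly.
 */
static bool match_port_by_uport(const struct model_port *port,
				const struct model_device *uport_dev)
{
	assert(port && port->uport_dev);

	if (port->uport_dev->is_memdev)
		return uport_dev == port->uport_dev->parent;
	return uport_dev == port->uport_dev;
}
```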
* Re: [PATCH v14 30/34] PCI/AER: Dequeue forwarded CXL error
2026-01-14 18:20 ` [PATCH v14 30/34] PCI/AER: Dequeue forwarded CXL error Terry Bowman
@ 2026-01-14 23:18 ` Dave Jiang
2026-01-16 14:42 ` Bowman, Terry
2026-01-15 16:01 ` Jonathan Cameron
2026-01-22 18:32 ` Bjorn Helgaas
2 siblings, 1 reply; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 23:18 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> The AER driver now forwards CXL protocol errors to the CXL driver via a
> kfifo. The CXL driver must consume these work items and initiate protocol
> error handling while ensuring the device's RAS mappings remain valid
> throughout processing.
>
> Implement cxl_proto_err_work_fn() to dequeue work items forwarded by the
> AER service driver. Lock the parent CXL Port device to ensure the CXL
> device's RAS registers are accessible during handling. Add pdev reference-put
> to match reference-get in AER driver. This will ensure pdev access after
> kfifo dequeue. These changes apply to CXL Ports and CXL Endpoints.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>
> ---
>
> Changes in v13->v14:
> - Update commit title's prefix (Bjorn)
> - Add pdev ref get in AER driver before enqueue and add pdev ref put in
> CXL driver after dequeue and handling (Dan)
> - Removed handling to simplify patch context (Terry)
>
> Changes in v12->v13:
> - Add cxlmd lock using guard() (Terry)
> - Remove exporting of unused function, pci_aer_clear_fatal_status() (Dave Jiang)
> - Change pr_err() calls to ratelimited. (Terry)
> - Update commit message. (Terry)
> - Remove namespace qualifier from pcie_clear_device_status()
> export (Dave Jiang)
> - Move locks into cxl_proto_err_work_fn() (Dave)
> - Update log messages in cxl_forward_error() (Ben)
>
> Changes in v11->v12:
> - Add guard for CE case in cxl_handle_proto_error() (Dave)
>
> Changes in v10->v11:
> - Reword patch commit message to remove RCiEP details (Jonathan)
> - Add #include <linux/bitfield.h> (Terry)
> - is_cxl_rcd() - Fix short comment message wrap (Jonathan)
> - is_cxl_rcd() - Combine return calls into 1 (Jonathan)
> - cxl_handle_proto_error() - Move comment earlier (Jonathan)
> - Use FIELD_GET() in discovering class code (Jonathan)
> - Remove BDF from cxl_proto_err_work_data. Use 'struct
> pci_dev *' (Dan)
> ---
> drivers/cxl/core/core.h | 3 ++
> drivers/cxl/core/port.c | 6 +--
> drivers/cxl/core/ras.c | 98 +++++++++++++++++++++++++++++++----
> drivers/pci/pcie/aer_cxl_vh.c | 1 +
> 4 files changed, 94 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 306762a15dc0..39324e1b8940 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -169,6 +169,9 @@ static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
> #endif /* CONFIG_CXL_RAS */
>
> int cxl_gpf_port_setup(struct cxl_dport *dport);
> +struct cxl_port *find_cxl_port(struct device *dport_dev,
> + struct cxl_dport **dport);
> +struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev);
>
> struct cxl_hdm;
> int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index a535e57360e0..0bec10be5d56 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1335,8 +1335,8 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
> return NULL;
> }
>
> -static struct cxl_port *find_cxl_port(struct device *dport_dev,
> - struct cxl_dport **dport)
> +struct cxl_port *find_cxl_port(struct device *dport_dev,
> + struct cxl_dport **dport)
> {
> struct cxl_find_port_ctx ctx = {
> .dport_dev = dport_dev,
> @@ -1578,7 +1578,7 @@ static int match_port_by_uport(struct device *dev, const void *data)
> * Function takes a device reference on the port device. Caller should do a
> * put_device() when done.
> */
> -static struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev)
> +struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev)
> {
> struct device *dev;
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index bf82880e19b4..0c640b84ad70 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -117,17 +117,6 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
> }
> static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>
> -int cxl_ras_init(void)
> -{
> - return cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
> -}
> -
> -void cxl_ras_exit(void)
> -{
> - cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
> - cancel_work_sync(&cxl_cper_prot_err_work);
> -}
> -
> static void cxl_dport_map_ras(struct cxl_dport *dport)
> {
> struct cxl_register_map *map = &dport->reg_map;
> @@ -173,6 +162,44 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
>
> +/*
> + * Return 'struct cxl_port *' parent CXL Port of dev
> + *
> + * Reference count increments returned port on success
> + *
> + * @pdev: Find the parent CXL Port of this device
> + */
> +static struct cxl_port *get_cxl_port(struct pci_dev *pdev)
> +{
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + {
> + struct cxl_dport *dport;
> + struct cxl_port *port = find_cxl_port(&pdev->dev, &dport);
> +
> + if (!port) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> + return port;
> + }
> + case PCI_EXP_TYPE_UPSTREAM:
> + case PCI_EXP_TYPE_ENDPOINT:
> + {
> + struct cxl_port *port = find_cxl_port_by_uport(&pdev->dev);
> +
> + if (!port) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> + return port;
> + }
> + }
> + pci_warn_once(pdev, "Error: Unsupported device type (%#x)", pci_pcie_type(pdev));
> + return NULL;
> +}
> +
> void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> {
> void __iomem *addr;
> @@ -316,3 +343,52 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> return PCI_ERS_RESULT_NEED_RESET;
> }
> EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
> +
> +static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
> +{
> +}
> +
> +static void cxl_proto_err_work_fn(struct work_struct *work)
> +{
> + struct cxl_proto_err_work_data wd;
> +
> + while (cxl_proto_err_kfifo_get(&wd)) {
> + struct pci_dev *pdev __free(pci_dev_put) = wd.pdev;
> +
> + if (!pdev) {
> + pr_err_ratelimited("NULL PCI device passed in AER-CXL KFIFO\n");
> + continue;
> + }
> +
> + struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
> + if (!port) {
> + pr_err_ratelimited("Failed to find parent Port device in CXL topology.\n");
> + continue;
> + }
> + guard(device)(&port->dev);
> +
> + cxl_handle_proto_error(&wd);
> + }
> +}
> +
> +static struct work_struct cxl_proto_err_work;
> +static DECLARE_WORK(cxl_proto_err_work, cxl_proto_err_work_fn);
> +
> +int cxl_ras_init(void)
> +{
> + if (cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work))
> + pr_err("Failed to initialize CXL RAS CPER\n");
> +
> + cxl_register_proto_err_work(&cxl_proto_err_work);
> +
> + return 0;
> +}
> +
> +void cxl_ras_exit(void)
> +{
> + cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
> + cancel_work_sync(&cxl_cper_prot_err_work);
> +
> + cxl_unregister_proto_err_work();
> + cancel_work_sync(&cxl_proto_err_work);
> +}
> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
> index 2189d3c6cef1..0f616f5fafcf 100644
> --- a/drivers/pci/pcie/aer_cxl_vh.c
> +++ b/drivers/pci/pcie/aer_cxl_vh.c
> @@ -48,6 +48,7 @@ void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info)
> };
>
> guard(rwsem_read)(&cxl_proto_err_kfifo.rw_sema);
> + pci_dev_get(pdev);
Should this chunk move to the commit that implements cxl_forward_error()?
> if (!cxl_proto_err_kfifo.work || !kfifo_put(&cxl_proto_err_kfifo.fifo, wd)) {
> dev_err_ratelimited(&pdev->dev, "AER-CXL kfifo error");
> return;
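As a way to reason about the get/put pairing across the kfifo, here is a small user-space model. Everything in it is hypothetical (`model_pdev`, `model_fifo`, `forward_error()`, `drain_errors()`); it is not the kernel kfifo API. Note one assumption: the sketch drops the reference again when the enqueue fails, whereas the hunk as quoted does not show a matching put on that path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a refcounted device and a small fixed-size FIFO. */
struct model_pdev { int refcount; };

#define FIFO_DEPTH 4
struct model_fifo {
	struct model_pdev *slot[FIFO_DEPTH];
	size_t head, tail;	/* head: next write, tail: next read */
};

static bool fifo_put(struct model_fifo *f, struct model_pdev *pdev)
{
	if (f->head - f->tail == FIFO_DEPTH)
		return false;	/* full */
	f->slot[f->head++ % FIFO_DEPTH] = pdev;
	return true;
}

static bool fifo_get(struct model_fifo *f, struct model_pdev **pdev)
{
	if (f->head == f->tail)
		return false;	/* empty */
	*pdev = f->slot[f->tail++ % FIFO_DEPTH];
	return true;
}

/* Producer: take the reference before enqueue, drop it if enqueue fails. */
static bool forward_error(struct model_fifo *f, struct model_pdev *pdev)
{
	assert(pdev->refcount > 0);

	pdev->refcount++;
	if (!fifo_put(f, pdev)) {
		pdev->refcount--;	/* assumption: undo the get on failure */
		return false;
	}
	return true;
}

/* Consumer: handle each item, then drop the producer's reference. */
static int drain_errors(struct model_fifo *f)
{
	struct model_pdev *pdev;
	int handled = 0;

	while (fifo_get(f, &pdev)) {
		handled++;		/* protocol error handling would go here */
		pdev->refcount--;
	}
	return handled;
}
```

Keeping the get and the put in the same commit as the enqueue and dequeue respectively makes this pairing easier to review.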
* Re: [PATCH v14 31/34] PCI: Introduce CXL Port protocol error handlers
2026-01-14 18:20 ` [PATCH v14 31/34] PCI: Introduce CXL Port protocol error handlers Terry Bowman
@ 2026-01-14 23:37 ` Dave Jiang
2026-01-15 16:12 ` Jonathan Cameron
2026-01-22 18:27 ` Bjorn Helgaas
1 sibling, 1 reply; 129+ messages in thread
From: Dave Jiang @ 2026-01-14 23:37 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/26 11:20 AM, Terry Bowman wrote:
> Add CXL protocol error handlers for CXL Port devices (Root Ports,
> Downstream Ports, and Upstream Ports). Implement cxl_port_cor_error_detected()
> and cxl_port_error_detected() to handle correctable and uncorrectable errors
> respectively.
>
> Introduce cxl_get_ras_base() to retrieve the cached RAS register base
> address for a given CXL port. This function supports CXL Root Ports,
> Downstream Ports, Upstream Ports, and Endpoints by returning their
> previously mapped RAS register addresses.
>
> Update the AER driver's is_cxl_error() to recognize CXL Port devices in
> addition to CXL Endpoints, as both now have CXL-specific error handlers.
>
> Future patch(es) will include port error handling changes to support
> Endpoint protocol errors.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> > ---
>
> Changes in v13->v14:
> - Add Dave Jiang's review-by
Doesn't look like that happened?
> - Update commit message & headline (Bjorn)
> - Refactor cxl_port_error_detected()/cxl_port_cor_error_detected() to
> one line (Jonathan)
> - Remove cxl_walk_port() (Dan)
> - Remove cxl_pci_drv_bound(). Check for 'is_cxl' parent port is
> sufficient (Dan)
> - Remove device_lock_if()
> - Combined CE and UCE here (Terry)
>
> Changes in v12->v13:
> - Move get_pci_cxl_host_dev() and cxl_handle_proto_error() to Dequeue
> patch (Terry)
> - Remove EP case in cxl_get_ras_base(), not used. (Terry)
> - Remove check for dport->dport_dev (Dave)
> - Remove whitespace (Terry)
>
> Changes in v11->v12:
> - Add call to cxl_pci_drv_bound() in cxl_handle_proto_error() and
> pci_to_cxl_dev()
> - Change cxl_error_detected() -> cxl_cor_error_detected()
> - Remove NULL variable assignments
> - Replace bus_find_device() with find_cxl_port_by_uport() for upstream
> port searches.
>
> Changes in v10->v11:
> - None
> ---
> drivers/cxl/core/ras.c | 101 +++++++++++++++++++++++++++++++++-
> drivers/pci/pci.c | 1 +
> drivers/pci/pci.h | 2 -
> drivers/pci/pcie/aer.c | 1 +
> drivers/pci/pcie/aer_cxl_vh.c | 5 +-
> include/linux/aer.h | 2 +
> include/linux/pci.h | 2 +
> 7 files changed, 109 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 0c640b84ad70..96ce85cc0a46 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -200,6 +200,67 @@ static struct cxl_port *get_cxl_port(struct pci_dev *pdev)
> return NULL;
> }
>
> +static void __iomem *cxl_get_ras_base(struct device *dev)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + {
> + struct cxl_dport *dport;
> + struct cxl_port *port __free(put_cxl_port) = find_cxl_port(&pdev->dev, &dport);
> +
> + if (!dport) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> + return dport->regs.ras;
> + }
> + case PCI_EXP_TYPE_UPSTREAM:
> + {
> + struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_uport(&pdev->dev);
> +
> + if (!port) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> + return port->regs.ras;
> + }
> + }
> + dev_warn_once(dev, "Error: Unsupported device type (%#x)", pci_pcie_type(pdev));
> + return NULL;
> +}
> +
> +static pci_ers_result_t cxl_port_error_detected(struct device *dev);
> +
> +static void cxl_do_recovery(struct pci_dev *pdev)
> +{
> + struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
To minimize errors, move this line to right above where you check !port. Inline declarations are acceptable when it comes to cleanup macros.
DJ
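Roughly what I mean, as a user-space sketch using the compiler cleanup attribute that __free() is built on. The names (`model_port`, `get_port()`, `put_port()`, `AUTO_PUT_PORT`) are made up for illustration and are not the CXL helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct cxl_port. */
struct model_port { int refcount; };

/* Cleanup callback: receives a pointer to the annotated variable. */
static void put_port(struct model_port **p)
{
	if (*p)
		(*p)->refcount--;
}
#define AUTO_PUT_PORT __attribute__((cleanup(put_port)))

static struct model_port *get_port(struct model_port *candidate)
{
	if (candidate)
		candidate->refcount++;
	return candidate;
}

static int do_recovery(struct model_port *candidate)
{
	int status;

	/* Declared at the point of use, immediately before the NULL check. */
	struct model_port *port AUTO_PUT_PORT = get_port(candidate);
	if (!port)
		return -1;

	assert(port->refcount > 0);
	status = 0;	/* error handling would go here */
	return status;	/* put_port() runs automatically on every return */
}
```

With the declaration next to the check, nothing can be inserted between the acquire and the error test by accident.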
> + pci_ers_result_t status;
> +
> + if (!port) {
> + pci_err(pdev, "Failed to find the CXL device\n");
> + return;
> + }
> +
> + status = cxl_port_error_detected(&pdev->dev);
> + if (status == PCI_ERS_RESULT_PANIC)
> + panic("CXL cachemem error.");
> +
> + /*
> + * If we have native control of AER, clear error status in the device
> + * that detected the error. If the platform retained control of AER,
> + * it is responsible for clearing this status. In that case, the
> + * signaling device may not even be visible to the OS.
> + */
> + if (pcie_aer_is_native(pdev)) {
> + pcie_clear_device_status(pdev);
> + pci_aer_clear_nonfatal_status(pdev);
> + pci_aer_clear_fatal_status(pdev);
> + }
> +}
> +
> void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> {
> void __iomem *addr;
> @@ -214,7 +275,10 @@ void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> return;
> writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
>
> - trace_cxl_aer_correctable_error(dev, status, serial);
> + if (is_cxl_memdev(dev))
> + trace_cxl_aer_correctable_error(dev, status, serial);
> + else
> + trace_cxl_port_aer_correctable_error(dev, status);
> }
>
> /* CXL spec rev3.0 8.2.4.16.1 */
> @@ -265,12 +329,27 @@ bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> }
>
> header_log_copy(ras_base, hl);
> - trace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);
> +
> + if (is_cxl_memdev(dev))
> + trace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);
> + else
> + trace_cxl_port_aer_uncorrectable_error(dev, status, fe, hl);
> +
> writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
>
> return true;
> }
>
> +static void cxl_port_cor_error_detected(struct device *dev)
> +{
> + cxl_handle_cor_ras(dev, 0, cxl_get_ras_base(dev));
> +}
> +
> +static pci_ers_result_t cxl_port_error_detected(struct device *dev)
> +{
> + return cxl_handle_ras(dev, 0, cxl_get_ras_base(dev));
> +}
> +
> void cxl_cor_error_detected(struct pci_dev *pdev)
> {
> struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> @@ -346,6 +425,24 @@ EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
>
> static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
> {
> + struct pci_dev *pdev = err_info->pdev;
> +
> + if (err_info->severity == AER_CORRECTABLE) {
> +
> + if (!pcie_aer_is_native(pdev))
> + return;
> +
> + if (pdev->aer_cap)
> + pci_clear_and_set_config_dword(pdev,
> + pdev->aer_cap + PCI_ERR_COR_STATUS,
> + 0, PCI_ERR_COR_INTERNAL);
> +
> + cxl_port_cor_error_detected(&pdev->dev);
> +
> + pcie_clear_device_status(pdev);
> + } else {
> + cxl_do_recovery(pdev);
> + }
> }
>
> static void cxl_proto_err_work_fn(struct work_struct *work)
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 13dbb405dc31..b7bfefdaf990 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2248,6 +2248,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
> pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> }
> +EXPORT_SYMBOL_GPL(pcie_clear_device_status);
> #endif
>
> /**
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index dbc547db208a..8bb703524f52 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -229,7 +229,6 @@ void pci_refresh_power_state(struct pci_dev *dev);
> int pci_power_up(struct pci_dev *dev);
> void pci_disable_enabled_device(struct pci_dev *dev);
> int pci_finish_runtime_suspend(struct pci_dev *dev);
> -void pcie_clear_device_status(struct pci_dev *dev);
> void pcie_clear_root_pme_status(struct pci_dev *dev);
> bool pci_check_pme_status(struct pci_dev *dev);
> void pci_pme_wakeup_bus(struct pci_bus *bus);
> @@ -1196,7 +1195,6 @@ void pci_restore_aer_state(struct pci_dev *dev);
> static inline void pci_no_aer(void) { }
> static inline void pci_aer_init(struct pci_dev *d) { }
> static inline void pci_aer_exit(struct pci_dev *d) { }
> -static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
> static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
> static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
> static inline void pci_save_aer_state(struct pci_dev *dev) { }
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index c2030d32a19c..dd7c49651612 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -298,6 +298,7 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
> if (status)
> pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> }
> +EXPORT_SYMBOL_GPL(pci_aer_clear_fatal_status);
>
> /**
> * pci_aer_raw_clear_status - Clear AER error registers.
> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
> index 0f616f5fafcf..aa69e504302f 100644
> --- a/drivers/pci/pcie/aer_cxl_vh.c
> +++ b/drivers/pci/pcie/aer_cxl_vh.c
> @@ -34,7 +34,10 @@ bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
> if (!info || !info->is_cxl)
> return false;
>
> - if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
> + if ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT) &&
> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM) &&
> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM))
> return false;
>
> return is_aer_internal_error(info);
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index f351e41dd979..c1aef7859d0a 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -65,6 +65,7 @@ struct cxl_proto_err_work_data {
>
> #if defined(CONFIG_PCIEAER)
> int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> +void pci_aer_clear_fatal_status(struct pci_dev *dev);
> int pcie_aer_is_native(struct pci_dev *dev);
> void pci_aer_unmask_internal_errors(struct pci_dev *dev);
> #else
> @@ -72,6 +73,7 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> {
> return -EINVAL;
> }
> +static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
> static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
> static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
> #endif
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index ee05d5925b13..1ef4743bf151 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1921,8 +1921,10 @@ static inline void pci_hp_unignore_link_change(struct pci_dev *pdev) { }
>
> #ifdef CONFIG_PCIEAER
> bool pci_aer_available(void);
> +void pcie_clear_device_status(struct pci_dev *dev);
> #else
> static inline bool pci_aer_available(void) { return false; }
> +static inline void pcie_clear_device_status(struct pci_dev *dev) { }
> #endif
>
> bool pci_ats_disabled(void);
* Re: [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management
2026-01-14 18:20 ` [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management Terry Bowman
2026-01-14 21:26 ` Dave Jiang
@ 2026-01-15 14:46 ` Jonathan Cameron
2026-01-16 4:45 ` dan.j.williams
1 sibling, 1 reply; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-15 14:46 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:40 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> With dport addition moving out of cxl_switch_port_probe() it is no longer
> the case that a single dport-add failure will cause all dport resources
> to be automatically unwound.
>
> devm still helps all dport resources get cleaned up when the port is
> detached, but setup now needs to avoid leaking resources if an early exit
> occurs during setup.
>
> Convert from a "devm add" model, to an "auto remove" model that makes the
> caller responsible for registering devm reclaim after the object is fully
> instantiated.
>
> As a side effect of this reorganization port->nr_dports is now always
> consistent with the number of entries in the port->dports xarray, and this
> can stop playing games with ida_is_empty() which is unreliable as a
> detector of whether decoders are setup. I.e. consider how
> CONFIG_DEBUG_KOBJECT_RELEASE might wreak havoc with this approach.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
>
> ---
>
> Changes in v13 -> v14:
> - New patch
Hi Dan, Terry,
I think this needs a little reorganization to ensure we don't have
dport and dport_add both being the same pointer for different free
reasons. By adding a helper we can combine them with a clear
handover of ownership.
Wrapping devres_remove_group() in a function that is called close_group()
rings alarm bells.
Jonathan
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index fef3aa0c6680..a05a1812bb6e 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> -static struct cxl_dport *
> -__devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> - int port_id, resource_size_t component_reg_phys,
> - resource_size_t rcrb)
> +static struct cxl_dport *__cxl_add_dport(struct cxl_port *port,
> + struct device *dport_dev, int port_id,
> + resource_size_t component_reg_phys,
> + resource_size_t rcrb)
> {
> char link_name[CXL_TARGET_STRLEN];
> - struct cxl_dport *dport;
> - struct device *host;
> int rc;
>
> - if (is_cxl_root(port))
> - host = port->uport_dev;
> - else
> - host = &port->dev;
> -
> - if (!host->driver) {
> - dev_WARN_ONCE(&port->dev, 1, "dport:%s bad devm context\n",
> - dev_name(dport_dev));
> - return ERR_PTR(-ENXIO);
> - }
> -
> if (snprintf(link_name, CXL_TARGET_STRLEN, "dport%d", port_id) >=
> CXL_TARGET_STRLEN)
> return ERR_PTR(-EINVAL);
>
> - dport = devm_kzalloc(host, sizeof(*dport), GFP_KERNEL);
> + struct cxl_dport *dport __free(kfree) =
> + kzalloc(sizeof(*dport), GFP_KERNEL);
> if (!dport)
> return ERR_PTR(-ENOMEM);
>
> @@ -1176,48 +1175,27 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> &component_reg_phys);
>
> cond_cxl_root_lock(port);
> - rc = add_dport(port, dport);
> + struct cxl_dport *dport_add __free(remove_dport) =
> + add_dport(port, dport);
This pattern of having both dport and dport_add effectively
holding the same pointer concerns me from a readability / maintainability
point of view. We've often made use of helper functions to avoid doing
this and I think that would make sense here as well.
Take everything down to and including add_dport() and move it into a helper
called something like (naming needs work!)
struct cxl_dport *dport __free(remove_and_free_dport) =
add_dport_wrapper();
With the __free doing the kfree as well as the remove.
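A minimal user-space sketch of the shape I have in mind, so there is one pointer with one cleanup and an explicit handover at the end. All of it is hypothetical: `model_dport`, `nr_registered` (standing in for port->nr_dports), and the two helpers only borrow the placeholder names from above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cxl_dport. */
struct model_dport { bool registered; };

static int nr_registered;	/* models port->nr_dports */

/* Single cleanup: undo the registration and free the allocation. */
static void remove_and_free_dport(struct model_dport **p)
{
	struct model_dport *dport = *p;

	if (!dport)
		return;
	if (dport->registered) {
		assert(nr_registered > 0);
		nr_registered--;
	}
	free(dport);
}

/* Helper covering both the allocation and the add_dport() step. */
static struct model_dport *add_dport_wrapper(void)
{
	struct model_dport *dport = calloc(1, sizeof(*dport));

	if (!dport)
		return NULL;
	dport->registered = true;
	nr_registered++;
	return dport;
}

static struct model_dport *create_dport(bool fail_late)
{
	struct model_dport *dport
		__attribute__((cleanup(remove_and_free_dport))) =
		add_dport_wrapper();
	struct model_dport *ret;

	if (!dport)
		return NULL;
	if (fail_late)
		return NULL;	/* cleanup unregisters and frees */

	/* Hand ownership to the caller, similar in spirit to no_free_ptr(). */
	ret = dport;
	dport = NULL;
	return ret;
}
```

The late-failure path then needs no open-coded unwinding, and there is only one variable whose ownership a reader has to track.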
> cond_cxl_root_unlock(port);
> - if (rc)
> - return ERR_PTR(rc);
> -
> - /*
> - * Setup port register if this is the first dport showed up. Having
> - * a dport also means that there is at least 1 active link.
> - */
> - if (port->nr_dports == 1 &&
> - port->component_reg_phys != CXL_RESOURCE_NONE) {
> - rc = cxl_port_setup_regs(port, port->component_reg_phys);
> - if (rc) {
> - xa_erase(&port->dports, (unsigned long)dport->dport_dev);
> - return ERR_PTR(rc);
> - }
> - port->component_reg_phys = CXL_RESOURCE_NONE;
> - }
> + if (IS_ERR(dport_add))
> + return dport_add;
>
> - get_device(dport_dev);
> - rc = devm_add_action_or_reset(host, cxl_dport_remove, dport);
> - if (rc)
> - return ERR_PTR(rc);
> + if (dev_is_pci(dport_dev))
> + dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
>
> rc = sysfs_create_link(&port->dev.kobj, &dport_dev->kobj, link_name);
> if (rc)
> return ERR_PTR(rc);
>
> - rc = devm_add_action_or_reset(host, cxl_dport_unlink, dport);
> - if (rc)
> - return ERR_PTR(rc);
> -
> - if (dev_is_pci(dport_dev))
> - dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
> -
> cxl_debugfs_create_dport_dir(dport);
>
> - return dport;
> + retain_and_null_ptr(dport_add);
> + return no_free_ptr(dport);
> }
> +
> +/*
> + * Note: this only services dynamic removal of mid-level ports, root ports are
> + * always removed by the platform driver (e.g. cxl_acpi). @host can be
> + * hard-coded to &port->dev.
> + */
> static void del_dport(struct cxl_dport *dport)
> {
> struct cxl_port *port = dport->port;
>
> - devm_release_action(&port->dev, cxl_dport_unlink, dport);
> - devm_release_action(&port->dev, cxl_dport_remove, dport);
> - devm_kfree(&port->dev, dport);
> + devm_release_action(&port->dev, unlink_dport, dport);
> }
>
> static void del_dports(struct cxl_port *port)
> @@ -1597,10 +1603,24 @@ static int update_decoder_targets(struct device *dev, void *data)
> return 0;
> }
>
> -DEFINE_FREE(del_cxl_dport, struct cxl_dport *, if (!IS_ERR_OR_NULL(_T)) del_dport(_T))
> +static struct cxl_port *cxl_port_devres_group(struct cxl_port *port)
> +{
> + if (!devres_open_group(&port->dev, port, GFP_KERNEL))
> + return ERR_PTR(-ENOMEM);
> + return port;
> +}
> +DEFINE_FREE(cxl_port_group_free, struct cxl_port *,
> + if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->dev, _T))
> +
> +static void cxl_port_group_close(struct cxl_port *port)
This feels like misleading naming and I'm not sure what the intent is.
I would have expected it to call devres_close_group().
> +{
> + devres_remove_group(&port->dev, port);
> +}
> +
> static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> struct device *dport_dev)
> {
> + struct cxl_dport *new_dport;
> struct cxl_dport *dport;
> int rc;
>
> @@ -1615,29 +1635,46 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> return ERR_PTR(-EBUSY);
> }
>
> - struct cxl_dport *new_dport __free(del_cxl_dport) =
> - devm_cxl_add_dport_by_dev(port, dport_dev);
> - if (IS_ERR(new_dport))
> - return new_dport;
> -
> - cxl_switch_parse_cdat(new_dport);
> + /*
> + * With the first dport arrival it is now safe to start looking at
> + * component registers. Be careful to not strand resources if dport
> + * creation ultimately fails.
> + */
> + struct cxl_port *port_group __free(cxl_port_group_free) =
> + cxl_port_devres_group(port);
> + if (IS_ERR(port_group))
> + return ERR_CAST(port_group);
>
> - if (ida_is_empty(&port->decoder_ida)) {
> + if (port->nr_dports == 0) {
> rc = devm_cxl_switch_port_decoders_setup(port);
> if (rc)
> return ERR_PTR(rc);
> - dev_dbg(&port->dev, "first dport%d:%s added with decoders\n",
> - new_dport->port_id, dev_name(dport_dev));
> - return no_free_ptr(new_dport);
> + /*
> + * Note, when nr_dports returns to zero the port is unregistered
> + * and triggers cleanup. I.e. no need for open-coded release
> + * action on dport removal. See cxl_detach_ep() for that logic.
> + */
> }
>
> + new_dport = cxl_add_dport_by_dev(port, dport_dev);
> + if (IS_ERR(new_dport))
> + return new_dport;
> +
> + rc = cxl_dport_autoremove(new_dport);
> + if (rc)
> + return ERR_PTR(rc);
> +
> + cxl_switch_parse_cdat(new_dport);
> +
> + cxl_port_group_close(no_free_ptr(port_group));
Given the name vs what it does, I'm not sure how this currently works.
> +
> + dev_dbg(&port->dev, "dport[%d] id:%d dport_dev: %s added\n",
> + port->nr_dports - 1, new_dport->port_id, dev_name(dport_dev));
> +
> /* New dport added, update the decoder targets */
> device_for_each_child(&port->dev, new_dport, update_decoder_targets);
>
> - dev_dbg(&port->dev, "dport%d:%s added\n", new_dport->port_id,
> - dev_name(dport_dev));
> -
> - return no_free_ptr(new_dport);
> + return new_dport;
> }
>
> static struct cxl_dport *devm_cxl_create_port(struct device *ep_dev,
* Re: [PATCH v14 20/34] cxl/port: Move dport operations to a driver event
2026-01-14 18:20 ` [PATCH v14 20/34] cxl/port: Move dport operations to a driver event Terry Bowman
2026-01-14 21:45 ` Dave Jiang
@ 2026-01-15 14:56 ` Jonathan Cameron
1 sibling, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-15 14:56 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:41 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> In preparation for adding more register setup to the cxl_port_add_dport()
> path (for RAS register mapping), move the dport creation event to a driver
> callback. This achieves 2 things: it puts driver operations logically where
> they belong, in a driver, and it obviates the gymnastics of
> DECLARE_TESTABLE() which just makes a mess of grepping for CXL symbols.
>
> In other words, a driver callback is less of an ongoing maintenance burden
> than this DECLARE_TESTABLE arrangement that does not scale and diminishes
> the grep-ability of the codebase.
>
> cxl_port_add_dport() moves mostly unmodified from drivers/cxl/core/port.c.
> The only deliberate change is that it now assumes that the device_lock is
> held on entry and the driver is attached (just like cxl_port_probe()).
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Subject to carrying fixes from earlier review forwards in the code movement,
this looks fine to me.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 21/34] cxl/port: Move dport RAS reporting to a port resource
2026-01-14 18:20 ` [PATCH v14 21/34] cxl/port: Move dport RAS reporting to a port resource Terry Bowman
2026-01-14 21:47 ` Dave Jiang
@ 2026-01-15 15:02 ` Jonathan Cameron
1 sibling, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-15 15:02 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:42 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> Towards the end goal of making all CXL RAS capability handling uniform
> across upstream host bridges, upstream switch ports, and upstream endpoint
> ports, move dport RAS setup to cxl_endpoint_port_probe(). Rename the RAS
> setup helper to devm_cxl_dport_ras_setup() for symmetry with
> devm_cxl_switch_port_decoders_setup().
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
One trivial thing inline.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>
> ---
>
> Changes in v13 -> v14:
> - New patch
> ---
> drivers/cxl/core/ras.c | 12 ++++++------
> drivers/cxl/cxlpci.h | 8 ++++----
> drivers/cxl/mem.c | 2 --
> drivers/cxl/port.c | 12 ++++++++++++
> tools/testing/cxl/Kbuild | 2 +-
> tools/testing/cxl/test/mock.c | 6 +++---
> 6 files changed, 26 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 72908f3ced77..d71fcac31cf2 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -139,17 +139,17 @@ static void cxl_dport_map_ras(struct cxl_dport *dport)
> }
>
> /**
> - * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
> + * devm_cxl_dport_ras_setup - Setup CXL RAS report on this dport
> * @dport: the cxl_dport that needs to be initialized
> - * @host: host device for devm operations
> */
> -void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
> +void devm_cxl_dport_ras_setup(struct cxl_dport *dport)
Not a thing for this patch set, but it might be nice at some point to
prefix all the functions that have devm_ registrations in them so
it is obvious where those are hiding. I had to dig down a few
levels to find the call.
> {
> - dport->reg_map.host = host;
> + dport->reg_map.host = &dport->port->dev;
> cxl_dport_map_ras(dport);
>
> if (dport->rch) {
> - struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport->dport_dev);
> + struct pci_host_bridge *host_bridge =
> + to_pci_host_bridge(dport->dport_dev);
Unrelated change. This series is complex, so this sort of noise is not helpful
to reviewability!
>
> if (!host_bridge->native_aer)
> return;
> @@ -158,7 +158,7 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
> cxl_disable_rch_root_ints(dport);
> }
> }
> -EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_dport_ras_setup, "CXL");
>
> void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
> {
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 23/34] cxl: Map CXL Endpoint Port and CXL Switch Port RAS registers
2026-01-14 18:20 ` [PATCH v14 23/34] cxl: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
2026-01-14 21:53 ` Dave Jiang
@ 2026-01-15 15:17 ` Jonathan Cameron
1 sibling, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-15 15:17 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:44 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> In preparation for CXL VH (Virtual Host) topology protocol error handling,
> add RAS capability register mapping for all ports in a CXL VH topology.
> This includes the RAS capabilities of Switch Upstream Ports, Switch
> Downstream Ports, Host Bridge Ports ("upstream"), and Root Ports
> ("downstream").
>
> Update cxl_port_add_dport() to map the upstream RAS capability on first
> 'dport' attach, and downstream RAS capability on each 'dport' attach.
> Arrange for dport mappings to be released at del_dport() time.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> [djbw: reword changelog, fix devm handling]
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
One comment below about the level at which we handle failures in RAS setup, but
that behaviour already exists, so this patch doesn't make things worse.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
I'm not particularly keen on failing to pass errors up to
callers of devm_cxl_dport_ras_setup() which could then cleanly
ignore them with comments saying why. However, that predates this
anyway so a question for another day.
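To make the shape concrete, a userspace sketch only (not the kernel code; all names here are hypothetical, and it assumes a missing RAS mapping is non-fatal for this particular caller):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bits of struct cxl_dport that matter here. */
struct sketch_dport {
	bool ras_present;	/* does the device expose a RAS capability? */
	bool ras_mapped;	/* did setup manage to map it? */
};

/* Setup helper reports failure instead of swallowing it. */
static int sketch_dport_ras_setup(struct sketch_dport *dport)
{
	assert(dport);

	if (!dport->ras_present)
		return -ENXIO;

	dport->ras_mapped = true;
	return 0;
}

/* The caller decides what the failure means and documents why. */
static int sketch_port_probe(struct sketch_dport *dport)
{
	int rc = sketch_dport_ras_setup(dport);

	/*
	 * Assumption for this sketch: RAS reporting is optional for this
	 * caller, so carry on without it rather than failing probe.
	 */
	if (rc)
		dport->ras_mapped = false;

	return 0;
}
```

The point is only that the decision to ignore the error lives in the caller, next to a comment explaining it.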
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 32/34] cxl: Update Endpoint uncorrectable protocol error handling
2026-01-14 22:07 ` dan.j.williams
@ 2026-01-15 15:26 ` Bowman, Terry
2026-01-15 15:27 ` Bowman, Terry
1 sibling, 0 replies; 129+ messages in thread
From: Bowman, Terry @ 2026-01-15 15:26 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/2026 4:07 PM, dan.j.williams@intel.com wrote:
> Terry Bowman wrote:
>> The CXL drivers must support handling Endpoint CXL and PCI uncorrectable
>> (UCE) protocol errors. Update the drivers to support both.
>>
>> Introduce cxl_pci_error_detected() to handle PCI correctable errors,
>> replacing cxl_error_detected(). Implement this new function to call
>> the existing CXL Port uncorrectable handler, cxl_port_error_detected().
>>
>> Update cxl_port_error_detected() for Endpoint handling. Take the CXL
>> memory device lock, check for a valid driver, and handle restricted
>> CXL device (RCH) if needed. This is the same sequence initially in
>> cxl_error_detected(). But, the UCE handler's logic for the returned
>> result errors is simplified because recovery will not be tried and
>> instead UCE's will result in the CXL driver invoking system panic.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>>
>> ---
>>
>> Changes in v13->v14:
>> - Update commit headline (Bjorn)
>> - Rename pci_error_detected()/pci_cor_error_detected() ->
>> cxl_pci_error_detected/cxl_pci_cor_error_detected() (Jonathan)
>> - Remove now-invalid comment in cxl_error_detected() (Jonathan)
>> - Split into separate patches for UCE and CE (Terry)
>>
>> Changes in v12->v13:
>> - Update commit message (Terry)
>> - Updated all the implementation and commit message. (Terry)
>> - Refactored cxl_cor_error_detected()/cxl_error_detected() to remove
>> pdev (Dave Jiang)
>>
>> Changes in v11->v12:
>> - None
>>
>> Changes in v10->v11:
>> - cxl_error_detected() - Change handlers' scoped_guard() to guard() (Jonathan)
>> - cxl_error_detected() - Remove extra line (Shiju)
>> - Changes moved to core/ras.c (Terry)
>> - cxl_error_detected(), remove 'ue' and return with function call. (Jonathan)
>> - Remove extra space in documentation for PCI_ERS_RESULT_PANIC definition
>> - Move #include "pci.h from cxl.h to core.h (Terry)
>> - Remove unnecessary includes of cxl.h and core.h in mem.c (Terry)
> [..]
>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>> index 96ce85cc0a46..dc6e02d64821 100644
>> --- a/drivers/cxl/core/ras.c
>> +++ b/drivers/cxl/core/ras.c
> [..]
>> @@ -373,55 +399,21 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
>>
>> -pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>> - pci_channel_state_t state)
>> +pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
>> + pci_channel_state_t error)
>> {
>> - struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>> - struct cxl_memdev *cxlmd = cxlds->cxlmd;
>> - struct device *dev = &cxlmd->dev;
>> - bool ue;
>> + struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
>> + pci_ers_result_t rc;
>>
>> - guard(device)(dev);
>> + guard(device)(&port->dev);
>>
>> - if (!dev->driver) {
>> - dev_warn(&pdev->dev,
>> - "%s: memdev disabled, abort error handling\n",
>> - dev_name(dev));
>> - return PCI_ERS_RESULT_DISCONNECT;
>> - }
>> + rc = cxl_port_error_detected(&pdev->dev);
>> + if (rc == PCI_ERS_RESULT_PANIC)
>> + panic("CXL cachemem error.");
> [..]
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index acb0eb2a13c3..ff741adc7c7f 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -1051,8 +1051,8 @@ static void cxl_reset_done(struct pci_dev *pdev)
>> }
>> }
>>
>> -static const struct pci_error_handlers cxl_error_handlers = {
>> - .error_detected = cxl_error_detected,
>> +static const struct pci_error_handlers pci_error_handlers = {
>> + .error_detected = cxl_pci_error_detected,
>
> I still feel like we are disconnected on the fundamental question of who
> is responsible for invoking CXL protocol error handling.
>
> To be clear, all of this:
>
> cxl/port: Remove "enumerate dports" helpers
> cxl/port: Fix devm resource leaks around with dport management
> cxl/port: Move dport operations to a driver event
> cxl/port: Move dport RAS reporting to a port resource
> cxl/port: Move endpoint component register management to cxl_port
> cxl/port: Unify endpoint and switch port lookup
>
> Was with the intent that cxl_pci and any other driver that creates a
> cxl_memdev never needs to worry about CXL protocol error handling. It
> comes "for free" by registering a "struct cxl_memdev".
>
> This is the rationale for "struct pci_dev" to grow an "is_cxl"
> attribute, and for the PCI core to learn how to forward PCIE internal
> errors on CXL devices to the CXL core.
>
> The only errors that cxl_pci needs to worry about are non-internal /
> native PCI errors. All CXL errors will have already been routed to the
> CXL core for generic handling based on a port lookup.
>
> So the end state I am looking for is no call to
> cxl_port_error_detected() from any 'struct pci_error_handlers'
> implementation. Untangle that ambiguity in the AER core and do not
> inflict it on every CXL driver that comes after.
>
> I think we are close to that outcome if not already there by simply
> deleting this last cxl_pci_error_detected() -> cxl_port_error_detected()
> "false dependency".
>
> Now, if an endpoint driver ever thinks it can do anything sane with CXL
> protocol error beyond what the core is already handling, then we can
> think about complications like passing a cxl_port error handler
> template. I struggle to think of a case like that.
Thanks for explaining. If I understand correctly, the CXL PCI error handlers
should only look at AER (no CXL RAS). We probably don't need a CXL PCI CE
handler in this case either, because AER is already handled and logged by
the AER driver. The UCE CXL PCI handler is still needed to return a
pci_ers_result to the AER driver. How does this sound?
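
For illustration, a userspace mock of what an AER-only uncorrectable handler might reduce to. This is a sketch, not the patch: the enums below only mirror the kernel's names so it builds standalone, the function name is hypothetical, and the state-to-result mapping is an assumption based on the usual PCI error recovery pattern.

```c
/* Local mirrors of the kernel enum names (values are not meaningful here). */
typedef enum {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
} pci_channel_state_t;

typedef enum {
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
} pci_ers_result_t;

/*
 * Hypothetical handler: no CXL RAS access and no call into the CXL port
 * error path; it only translates the channel state into a result for
 * the AER driver.
 */
static pci_ers_result_t sketch_pci_error_detected(pci_channel_state_t state)
{
	switch (state) {
	case pci_channel_io_normal:
		return PCI_ERS_RESULT_CAN_RECOVER;
	case pci_channel_io_frozen:
		return PCI_ERS_RESULT_NEED_RESET;
	case pci_channel_io_perm_failure:
	default:
		return PCI_ERS_RESULT_DISCONNECT;
	}
}
```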
-Terry
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 32/34] cxl: Update Endpoint uncorrectable protocol error handling
2026-01-14 22:07 ` dan.j.williams
2026-01-15 15:26 ` Bowman, Terry
@ 2026-01-15 15:27 ` Bowman, Terry
1 sibling, 0 replies; 129+ messages in thread
From: Bowman, Terry @ 2026-01-15 15:27 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/2026 4:07 PM, dan.j.williams@intel.com wrote:
> Terry Bowman wrote:
>> The CXL drivers must support handling Endpoint CXL and PCI uncorrectable
>> (UCE) protocol errors. Update the drivers to support both.
>>
>> Introduce cxl_pci_error_detected() to handle PCI correctable errors,
>> replacing cxl_error_detected(). Implement this new function to call
>> the existing CXL Port uncorrectable handler, cxl_port_error_detected().
>>
>> Update cxl_port_error_detected() for Endpoint handling. Take the CXL
>> memory device lock, check for a valid driver, and handle restricted
>> CXL device (RCH) if needed. This is the same sequence initially in
>> cxl_error_detected(). But, the UCE handler's logic for the returned
>> result errors is simplified because recovery will not be tried and
>> instead UCE's will result in the CXL driver invoking system panic.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>>
>> ---
>>
>> Changes in v13->v14:
>> - Update commit headline (Bjorn)
>> - Rename pci_error_detected()/pci_cor_error_detected() ->
>> cxl_pci_error_detected/cxl_pci_cor_error_detected() (Jonathan)
>> - Remove now-invalid comment in cxl_error_detected() (Jonathan)
>> - Split into separate patches for UCE and CE (Terry)
>>
>> Changes in v12->v13:
>> - Update commit message (Terry)
>> - Updated all the implementation and commit message. (Terry)
>> - Refactored cxl_cor_error_detected()/cxl_error_detected() to remove
>> pdev (Dave Jiang)
>>
>> Changes in v11->v12:
>> - None
>>
>> Changes in v10->v11:
>> - cxl_error_detected() - Change handlers' scoped_guard() to guard() (Jonathan)
>> - cxl_error_detected() - Remove extra line (Shiju)
>> - Changes moved to core/ras.c (Terry)
>> - cxl_error_detected(), remove 'ue' and return with function call. (Jonathan)
>> - Remove extra space in documentation for PCI_ERS_RESULT_PANIC definition
>> - Move #include "pci.h from cxl.h to core.h (Terry)
>> - Remove unnecessary includes of cxl.h and core.h in mem.c (Terry)
> [..]
>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>> index 96ce85cc0a46..dc6e02d64821 100644
>> --- a/drivers/cxl/core/ras.c
>> +++ b/drivers/cxl/core/ras.c
> [..]
>> @@ -373,55 +399,21 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
>>
>> -pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>> - pci_channel_state_t state)
>> +pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
>> + pci_channel_state_t error)
>> {
>> - struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>> - struct cxl_memdev *cxlmd = cxlds->cxlmd;
>> - struct device *dev = &cxlmd->dev;
>> - bool ue;
>> + struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
>> + pci_ers_result_t rc;
>>
>> - guard(device)(dev);
>> + guard(device)(&port->dev);
>>
>> - if (!dev->driver) {
>> - dev_warn(&pdev->dev,
>> - "%s: memdev disabled, abort error handling\n",
>> - dev_name(dev));
>> - return PCI_ERS_RESULT_DISCONNECT;
>> - }
>> + rc = cxl_port_error_detected(&pdev->dev);
>> + if (rc == PCI_ERS_RESULT_PANIC)
>> + panic("CXL cachemem error.");
> [..]
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index acb0eb2a13c3..ff741adc7c7f 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -1051,8 +1051,8 @@ static void cxl_reset_done(struct pci_dev *pdev)
>> }
>> }
>>
>> -static const struct pci_error_handlers cxl_error_handlers = {
>> - .error_detected = cxl_error_detected,
>> +static const struct pci_error_handlers pci_error_handlers = {
>> + .error_detected = cxl_pci_error_detected,
>
> I still feel like we are disconnected on the fundamental question of who
> is responsible for invoking CXL protocol error handling.
>
> To be clear, all of this:
>
> cxl/port: Remove "enumerate dports" helpers
> cxl/port: Fix devm resource leaks around with dport management
> cxl/port: Move dport operations to a driver event
> cxl/port: Move dport RAS reporting to a port resource
> cxl/port: Move endpoint component register management to cxl_port
> cxl/port: Unify endpoint and switch port lookup
>
> Was with the intent that cxl_pci and any other driver that creates a
> cxl_memdev never needs to worry about CXL protocol error handling. It
> comes "for free" by registering a "struct cxl_memdev".
>
> This is the rationale for "struct pci_dev" to grow an "is_cxl"
> attribute, and for the PCI core to learn how to forward PCIE internal
> errors on CXL devices to the CXL core.
>
> The only errors that cxl_pci needs to worry about are non-internal /
> native PCI errors. All CXL errors will have already been routed to the
> CXL core for generic handling based on a port lookup.
>
> So the end state I am looking for is no call to
> cxl_port_error_detected() from any 'struct pci_error_handlers'
> implementation. Untangle that ambiguity in the AER core and do not
> inflict it on every CXL driver that comes after.
>
> I think we are close to that outcome if not already there by simply
> deleting this last cxl_pci_error_detected() -> cxl_port_error_detected()
> "false dependency".
>
> Now, if an endpoint driver ever thinks it can do anything sane with CXL
> protocol error beyond what the core is already handling, then we can
> think about complications like passing a cxl_port error handler
> template. I struggle to think of a case like that.
Thanks for explaining. If I understand correctly, the CXL PCI error handlers
should only look at AER (no CXL RAS). We probably don't need a CXL PCI CE
handler in this case either, because AER is already handled and logged by
the AER driver. The UCE CXL PCI handler is still needed to return a
pci_ers_result to the AER driver. How does this sound?
-Terry
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 24/34] cxl/port: Move endpoint component register management to cxl_port
2026-01-14 18:20 ` [PATCH v14 24/34] cxl/port: Move endpoint component register management to cxl_port Terry Bowman
2026-01-14 21:55 ` Dave Jiang
@ 2026-01-15 15:28 ` Jonathan Cameron
1 sibling, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-15 15:28 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:45 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> In preparation for generic protocol error handling across CXL endpoints,
> whether they be memory expander class devices or accelerators, drop the
> endpoint component management from cxl_dev_state.
>
> Organize all CXL port component management through the common cxl_port
> driver.
>
> Note that the end game is that drivers/cxl/core/ras.c loses all
> dependencies on a 'struct cxl_dev_state' parameter and operates only on
> port resources. The removal of component register mapping from cxl_pci is
> an incremental step towards that.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 25/34] cxl/port: Map Port component registers before switchport init
2026-01-14 18:20 ` [PATCH v14 25/34] cxl/port: Map Port component registers before switchport init Terry Bowman
2026-01-14 21:59 ` Dave Jiang
@ 2026-01-15 15:30 ` Jonathan Cameron
1 sibling, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-15 15:30 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:46 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> Port HDM registers must be mapped before calling
> devm_cxl_switch_port_decoders_setup(). Invoke a call to this function
> in cxl_port_add_dport().
As I read that description, there is a bisection break before this
in that if you build up to patch 24, they won't be mapped before
it is called.
Maybe this needs squashing with an earlier patch, or if it is for
some reason safe, then add a comment here on why.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
> drivers/cxl/core/port.c | 3 ++-
> drivers/cxl/cxlpci.h | 3 +++
> drivers/cxl/port.c | 5 +++++
> 3 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 2c4e28e7975c..3f730511f11d 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -778,7 +778,7 @@ static int cxl_setup_comp_regs(struct device *host, struct cxl_register_map *map
> return cxl_setup_regs(map);
> }
>
> -static int cxl_port_setup_regs(struct cxl_port *port,
> +int cxl_port_setup_regs(struct cxl_port *port,
> resource_size_t component_reg_phys)
> {
> if (dev_is_platform(port->uport_dev))
> @@ -786,6 +786,7 @@ static int cxl_port_setup_regs(struct cxl_port *port,
> return cxl_setup_comp_regs(&port->dev, &port->reg_map,
> component_reg_phys);
> }
> +EXPORT_SYMBOL_NS_GPL(cxl_port_setup_regs, "CXL");
>
> static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
> resource_size_t component_reg_phys)
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index ef4496b4e55e..532506595d0f 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -99,4 +99,7 @@ static inline void devm_cxl_port_ras_setup(struct cxl_port *port)
> }
> #endif
>
> +int cxl_port_setup_regs(struct cxl_port *port,
> + resource_size_t component_reg_phys);
> +
> #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index d76b4b532064..f8a33dbf8222 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -278,6 +278,11 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> return ERR_CAST(port_group);
>
> if (port->nr_dports == 0) {
> +
> + rc = cxl_port_setup_regs(port, port->component_reg_phys);
> + if (rc)
> + return ERR_PTR(rc);
> +
> rc = devm_cxl_switch_port_decoders_setup(port);
> if (rc)
> return ERR_PTR(rc);
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 28/34] PCI/AER: Move AER driver's CXL VH handling to pcie/aer_cxl_vh.c
2026-01-14 18:20 ` [PATCH v14 28/34] PCI/AER: Move AER driver's CXL VH handling to pcie/aer_cxl_vh.c Terry Bowman
@ 2026-01-15 15:40 ` Jonathan Cameron
0 siblings, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-15 15:40 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:49 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> CXL virtual hierarchy (VH) RAS handling for CXL Port devices will be added
> soon. This requires a notification mechanism for the AER driver to share
> the AER interrupt with the CXL driver. The notification will be used as an
> indication for the CXL drivers to handle and log the CXL RAS errors.
>
> Note, 'CXL protocol error' terminology will refer to CXL VH and not
> CXL RCH errors unless specifically noted going forward.
>
> Introduce a new file in the AER driver to handle the CXL protocol errors
> named pci/pcie/aer_cxl_vh.c.
>
> Add a kfifo work queue to be used by the AER and CXL drivers. The AER
> driver will be the sole kfifo producer adding work and the cxl_core will be
> the sole kfifo consumer removing work. Add the boilerplate kfifo support.
> Encapsulate the kfifo, RW semaphore, and work pointer in a single structure.
>
> Add CXL work queue handler registration functions in the AER driver. Export
> the functions allowing CXL driver to access. Implement registration
> functions for the CXL driver to assign or clear the work handler function.
> Synchronize accesses using the RW semaphore.
>
> Introduce 'struct cxl_proto_err_work_data' to serve as the kfifo work data.
> This will contain a reference to the PCI error source device and the error
> severity. This will be used when the work is dequeued by the cxl_core driver.
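
As a reading aid for the description above, a single-threaded userspace sketch of the one-producer/one-consumer queue. It is not the kernel kfifo API: the names are hypothetical, the record fields are stand-ins for the PCI device reference and severity, and the memory ordering and RW semaphore that the real code needs are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_FIFO_SIZE 8	/* power of two; the patch sizes its fifo at 128 */

/* Stand-in for struct cxl_proto_err_work_data. */
struct sketch_work_data {
	int dev_id;	/* stand-in for the PCI error source device reference */
	int severity;	/* stand-in for the AER error severity */
};

struct sketch_fifo {
	struct sketch_work_data slots[SKETCH_FIFO_SIZE];
	unsigned int in;	/* advanced only by the producer (AER side) */
	unsigned int out;	/* advanced only by the consumer (cxl_core side) */
};

/* Producer: returns false when the fifo is full and the record is dropped. */
static bool sketch_fifo_put(struct sketch_fifo *f, struct sketch_work_data wd)
{
	assert(f);

	if (f->in - f->out == SKETCH_FIFO_SIZE)
		return false;

	f->slots[f->in++ % SKETCH_FIFO_SIZE] = wd;
	return true;
}

/* Consumer: returns false when there is nothing to dequeue. */
static bool sketch_fifo_get(struct sketch_fifo *f, struct sketch_work_data *wd)
{
	assert(f && wd);

	if (f->in == f->out)
		return false;

	*wd = f->slots[f->out++ % SKETCH_FIFO_SIZE];
	return true;
}
```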
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
I'd not trust me :). Some more comments inline.
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
> new file mode 100644
> index 000000000000..2189d3c6cef1
> --- /dev/null
> +++ b/drivers/pci/pcie/aer_cxl_vh.c
> @@ -0,0 +1,78 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2025 AMD Corporation. All rights reserved. */
> +
> +#include <linux/types.h>
I'd pick an ordering for headers. This doesn't seem to follow any common style.
> +#include <linux/aer.h>
> +#include <linux/bitfield.h>
> +#include <linux/kfifo.h>
> +#include "../pci.h"
> +#include "portdrv.h"
> +
> +#define CXL_ERROR_SOURCES_MAX 128
> +
> +struct cxl_proto_err_kfifo {
> + struct work_struct *work;
> + struct rw_semaphore rw_sema;
> + DECLARE_KFIFO(fifo, struct cxl_proto_err_work_data,
> + CXL_ERROR_SOURCES_MAX);
> +};
I've not checked later patches yet, but given the type is never
really used, can we make this an anonymous structure?
It's just there for grouping a bunch of related things so
not surprising the structure type isn't much used.
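Roughly what I mean, as an untested userspace sketch. The field types are plain stand-ins for the work pointer, rw_semaphore and DECLARE_KFIFO(), and every name is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ERR_SOURCES_MAX 128	/* stand-in for CXL_ERROR_SOURCES_MAX */

/* Stand-in for struct cxl_proto_err_work_data. */
struct sketch_err_work_data {
	int severity;
};

/*
 * No struct tag: the type is never named anywhere else, so only this
 * one static instance exists and nothing else can declare another.
 */
static struct {
	void *work;		/* stand-in for struct work_struct * */
	int lock;		/* stand-in for struct rw_semaphore */
	struct sketch_err_work_data fifo[SKETCH_ERR_SOURCES_MAX];
	size_t count;		/* number of queued records */
} sketch_proto_err_fifo;

/* Queue one record; returns 0 on success, -1 if the fifo is full. */
static int sketch_proto_err_push(const struct sketch_err_work_data *wd)
{
	assert(wd);

	if (sketch_proto_err_fifo.count >= SKETCH_ERR_SOURCES_MAX)
		return -1;

	sketch_proto_err_fifo.fifo[sketch_proto_err_fifo.count++] = *wd;
	return 0;
}
```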
> +
> +static struct cxl_proto_err_kfifo cxl_proto_err_kfifo = {
> + .rw_sema = __RWSEM_INITIALIZER(cxl_proto_err_kfifo.rw_sema)
> +};
> +
> +bool is_aer_internal_error(struct aer_err_info *info)
> +{
> + if (info->severity == AER_CORRECTABLE)
> + return info->status & PCI_ERR_COR_INTERNAL;
> +
> + return info->status & PCI_ERR_UNC_INTN;
> +}
As per earlier feedback, this is generic AER stuff. I'd leave
it in the original file and remove the ifdef magic around it
(thus avoiding the stub etc)
> +
> +bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
> +{
> + if (!info || !info->is_cxl)
> + return false;
> +
> + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
To me this doesn't fit with the function name. Lots of things
are CXL errors from non-endpoint sources. So I'd choose a more
specific name.
> + return false;
> +
> + return is_aer_internal_error(info);
> +}
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 29/34] cxl/port: Unify endpoint and switch port lookup
2026-01-14 18:20 ` [PATCH v14 29/34] cxl/port: Unify endpoint and switch port lookup Terry Bowman
2026-01-14 23:04 ` Dave Jiang
@ 2026-01-15 15:44 ` Jonathan Cameron
1 sibling, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-15 15:44 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:50 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> In support of generic CXL protocol error handling across various 'struct
> cxl_port' types, update find_cxl_port_by_uport() to retrieve endpoint CXL
> port companions from endpoint PCIe device instances.
>
> The end result is that upstream switch ports and endpoint ports can share
> error handling and eventually delete the misplaced cxl_error_handlers from
> the cxl_pci class driver.
Should mention that you are converting to kernel-doc.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 129+ messages in thread
* RE: [PATCH v14 14/34] PCI/AER: Report CXL or PCIe bus type in AER trace logging
2026-01-14 19:45 ` Jonathan Cameron
@ 2026-01-15 15:55 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 129+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-15 15:55 UTC (permalink / raw)
To: Jonathan Cameron, Terry Bowman
Cc: dave@stgolabs.net, dave.jiang@intel.com,
alison.schofield@intel.com, dan.j.williams@intel.com,
bhelgaas@google.com, Shiju Jose, ming.li@zohomail.com,
mchehab+huawei@kernel.org, Smita.KoralahalliChannabasappa@amd.com,
rrichter@amd.com, dan.carpenter@linaro.org,
PradeepVineshReddy.Kodamati@amd.com, lukas@wunner.de,
Benjamin.Cheatham@amd.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
linux-cxl@vger.kernel.org, vishal.l.verma@intel.com,
alucerop@amd.com, ira.weiny@intel.com,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
> > The AER service driver and aer_event tracing currently log 'PCIe Bus Type'
> > for all errors. Update the driver and aer_event tracing to log 'CXL
> > Bus Type' for CXL device errors.
> >
> > This requires that AER can identify and distinguish between PCIe
> > errors and CXL errors.
> >
> > Introduce boolean 'is_cxl' to 'struct aer_err_info'. Add assignment in
> > aer_get_device_error_info() and pci_print_aer().
> >
> > Update the aer_event trace routine to accept a bus type string parameter.
> >
> > Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> > Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>
> I wonder if it is worth using __print_symbolic() etc and an integer storage rather than a string in the tracepoints.
> However, not really that important to me as the strings are small anyway and there is no precedent for this in ras trace events.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
It would be a lot better to pass integer values instead of strings. Right now, I'm working on a CI
to check whether the kernel + rasdaemon are doing the right thing. The idea is to inject errors
in QEMU via the QMP interface (the main patches are already in the QEMU tree).
With integers, it should be easier to verify that everything was properly handled, as we can
ignore __print_symbolic in rasdaemon and pick up the actual values directly.
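
A rough userspace sketch of the idea. The enum and its values are made up for illustration and are not the actual tracepoint fields; the lookup is only similar in spirit to what __print_symbolic() does when formatting:

```c
#include <stddef.h>

/* Hypothetical integer encoding of the bus type carried in the event. */
enum sketch_aer_bus_type {
	SKETCH_BUS_PCIE = 0,
	SKETCH_BUS_CXL = 1,
};

struct sketch_sym {
	int val;
	const char *name;
};

/* Value-to-name table, used only when a human-readable string is wanted. */
static const struct sketch_sym sketch_bus_type_syms[] = {
	{ SKETCH_BUS_PCIE, "PCIe Bus Type" },
	{ SKETCH_BUS_CXL, "CXL Bus Type" },
};

/* Map the stored integer to its display name; NULL if the value is unknown. */
static const char *sketch_bus_type_name(int val)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_bus_type_syms) / sizeof(sketch_bus_type_syms[0]); i++) {
		if (sketch_bus_type_syms[i].val == val)
			return sketch_bus_type_syms[i].name;
	}

	return NULL;
}
```

A test harness would then compare the raw integer from the event and skip the string table entirely.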
Regards,
Mauro
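
A minimal userspace sketch of the integer-plus-symbol-table idea discussed
above, not the code in this series. The enum values, the table, and both
helper names are assumptions made for illustration; only the two strings
come from the patch description. The table plays the role that
__print_symbolic() would play in a tracepoint, while a checker can compare
the raw integer directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bus-type codes; the series itself passes a string today. */
enum aer_bus_type {
	AER_BUS_TYPE_PCIE = 0,
	AER_BUS_TYPE_CXL  = 1,
};

struct symbol_entry {
	int value;
	const char *name;
};

/* Value-to-name table, standing in for a __print_symbolic() mapping. */
static const struct symbol_entry aer_bus_type_names[] = {
	{ AER_BUS_TYPE_PCIE, "PCIe Bus Type" },
	{ AER_BUS_TYPE_CXL,  "CXL Bus Type"  },
};

/* Return the printable name for a bus type, or "Unknown" if unmapped. */
static const char *aer_bus_type_name(int bus_type)
{
	size_t n = sizeof(aer_bus_type_names) / sizeof(aer_bus_type_names[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (aer_bus_type_names[i].value == bus_type)
			return aer_bus_type_names[i].name;
	}
	return "Unknown";
}

/* A consumer can test the raw integer and never parse the string. */
static int event_is_cxl(int bus_type)
{
	return bus_type == AER_BUS_TYPE_CXL;
}
```

With this shape the human-readable output is unchanged, and a tool only
needs the integer field to decide whether an event was a CXL error.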
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 30/34] PCI/AER: Dequeue forwarded CXL error
2026-01-14 18:20 ` [PATCH v14 30/34] PCI/AER: Dequeue forwarded CXL error Terry Bowman
2026-01-14 23:18 ` Dave Jiang
@ 2026-01-15 16:01 ` Jonathan Cameron
2026-01-15 17:29 ` Bowman, Terry
2026-01-22 18:32 ` Bjorn Helgaas
2 siblings, 1 reply; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-15 16:01 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:51 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> The AER driver now forwards CXL protocol errors to the CXL driver via a
> kfifo. The CXL driver must consume these work items and initiate protocol
> error handling while ensuring the device's RAS mappings remain valid
> throughout processing.
>
> Implement cxl_proto_err_work_fn() to dequeue work items forwarded by the
> AER service driver. Lock the parent CXL Port device to ensure the CXL
> device's RAS registers are accessible during handling. Add pdev reference-put
> to match reference-get in AER driver. This will ensure pdev access after
> kfifo dequeue. These changes apply to CXL Ports and CXL Endpoints.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Few things inline.
Thanks,
Jonathan
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index bf82880e19b4..0c640b84ad70 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -117,17 +117,6 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
> +/*
> + * Return 'struct cxl_port *' parent CXL Port of dev
> + *
> + * Reference count increments returned port on success
> + *
> + * @pdev: Find the parent CXL Port of this device
This is a non standard type of a comment. I'd make it formal
kernel-doc.
> +
> +static void cxl_proto_err_work_fn(struct work_struct *work)
> +{
> + struct cxl_proto_err_work_data wd;
> +
> + while (cxl_proto_err_kfifo_get(&wd)) {
I'm probably being slow today but where does that helper come from?
> + struct pci_dev *pdev __free(pci_dev_put) = wd.pdev;
> +
> + if (!pdev) {
> + pr_err_ratelimited("NULL PCI device passed in AER-CXL KFIFO\n");
> + continue;
> + }
> +
> + struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
> + if (!port) {
> + pr_err_ratelimited("Failed to find parent Port device in CXL topology.\n");
> + continue;
> + }
> + guard(device)(&port->dev);
> +
> + cxl_handle_proto_error(&wd);
> + }
> +}
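
The dequeue loop above pairs a reference taken by the producer with a put
on the consumer side, including on the early "continue" paths. A rough
userspace sketch of that pattern follows, assuming GCC or Clang for the
cleanup attribute (which is what the kernel's __free() builds on). All
names here (fake_dev, work_fifo, drain_fifo, and so on) are hypothetical
stand-ins, not the kernel's kfifo or PCI APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a refcounted struct pci_dev. */
struct fake_dev {
	int refcount;
};

static void fake_dev_put(struct fake_dev *dev)
{
	if (dev)
		dev->refcount--;
}

/* Cleanup handlers receive a pointer to the annotated variable. */
static void fake_dev_put_cleanup(struct fake_dev **devp)
{
	fake_dev_put(*devp);
}

#define FIFO_SIZE 8

struct work_fifo {
	struct fake_dev *items[FIFO_SIZE];
	size_t head, tail;
};

/* Producer side: take a reference before the item is queued. */
static bool fifo_put(struct work_fifo *f, struct fake_dev *dev)
{
	if (f->tail - f->head == FIFO_SIZE)
		return false;
	if (dev)
		dev->refcount++;
	f->items[f->tail++ % FIFO_SIZE] = dev;
	return true;
}

static bool fifo_get(struct work_fifo *f, struct fake_dev **out)
{
	if (f->head == f->tail)
		return false;
	*out = f->items[f->head++ % FIFO_SIZE];
	return true;
}

/* Consumer side: drain the FIFO and return the number of items handled. */
static int drain_fifo(struct work_fifo *f)
{
	struct fake_dev *item;
	int handled = 0;

	while (fifo_get(f, &item)) {
		/* The put runs whenever 'dev' leaves scope, 'continue' included. */
		struct fake_dev *dev
			__attribute__((cleanup(fake_dev_put_cleanup))) = item;

		if (!dev)
			continue;	/* NULL entry: skipped, put is a no-op */

		handled++;	/* real code would handle the error here */
	}
	return handled;
}
```

Because the variable is declared inside the loop body, each iteration
drops exactly the reference that was taken for that queued item.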
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 31/34] PCI: Introduce CXL Port protocol error handlers
2026-01-14 23:37 ` Dave Jiang
@ 2026-01-15 16:12 ` Jonathan Cameron
0 siblings, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-15 16:12 UTC (permalink / raw)
To: Dave Jiang
Cc: Terry Bowman, dave, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
> > diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> > index 0c640b84ad70..96ce85cc0a46 100644
> > --- a/drivers/cxl/core/ras.c
> > +++ b/drivers/cxl/core/ras.c
> > +
> > +static pci_ers_result_t cxl_port_error_detected(struct device *dev);
> > +
> > +static void cxl_do_recovery(struct pci_dev *pdev)
> > +{
> > + struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
> To minimize errors, move this line to right above the !port check. It's acceptable to do inline declaration when it comes to cleanup macros.
>
> DJ
> > + pci_ers_result_t status;
> > +
> > + if (!port) {
> > + pci_err(pdev, "Failed to find the CXL device\n");
> > + return;
> > + }
> > +
> > + status = cxl_port_error_detected(&pdev->dev);
> > + if (status == PCI_ERS_RESULT_PANIC)
> > + panic("CXL cachemem error.");
> > +
> > + /*
> > + * If we have native control of AER, clear error status in the device
> > + * that detected the error. If the platform retained control of AER,
> > + * it is responsible for clearing this status. In that case, the
> > + * signaling device may not even be visible to the OS.
> > + */
> > + if (pcie_aer_is_native(pdev)) {
> > + pcie_clear_device_status(pdev);
> > + pci_aer_clear_nonfatal_status(pdev);
> > + pci_aer_clear_fatal_status(pdev);
> > + }
> > +}
> > +
> > void cxl_cor_error_detected(struct pci_dev *pdev)
> > {
> > struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> > @@ -346,6 +425,24 @@ EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
> >
> > static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
> > {
> > + struct pci_dev *pdev = err_info->pdev;
> > +
> > + if (err_info->severity == AER_CORRECTABLE) {
> > +
> > + if (!pcie_aer_is_native(pdev))
> > + return;
> > +
> > + if (pdev->aer_cap)
> > + pci_clear_and_set_config_dword(pdev,
> > + pdev->aer_cap + PCI_ERR_COR_STATUS,
> > + 0, PCI_ERR_COR_INTERNAL);
> > +
> > + cxl_port_cor_error_detected(&pdev->dev);
> > +
> > + pcie_clear_device_status(pdev);
> > + } else {
> > + cxl_do_recovery(pdev);
> > + }
Could flip logic to get out of here quickly in one case.
	if (err_info->severity != AER_CORRECTABLE) {
		cxl_do_recovery(pdev);
		return;
	}

	if (!pci...
just to reduce indent we don't need. Up to you though.
> > }
> >
> > static void cxl_proto_err_work_fn(struct work_struct *work)
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 13dbb405dc31..b7bfefdaf990 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2248,6 +2248,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
> > pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> > pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> > }
> > +EXPORT_SYMBOL_GPL(pcie_clear_device_status);
To me it's a little odd that we restrict this to AER
given it's not in AER specific registers or anything like that.
It only happens to be used in that code right now so I guess
it is ok to do this anyway.
> > #endif
> > diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
> > index 0f616f5fafcf..aa69e504302f 100644
> > --- a/drivers/pci/pcie/aer_cxl_vh.c
> > +++ b/drivers/pci/pcie/aer_cxl_vh.c
> > @@ -34,7 +34,10 @@ bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
> > if (!info || !info->is_cxl)
> > return false;
> >
> > - if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
> > + if ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT) &&
> > + (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
> > + (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM) &&
> > + (pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM))
> > return false;
Ah. This fixes the earlier comment. Maybe add a temp comment
or similar there to say you'll handle others later.
Also, maybe this is cleaner as a switch to avoid all those pci_pcie_type(pdev)
(or a local variable might also work).
	switch (pci_pcie_type(pdev)) {
	case PCI_EXP_TYPE_ENDPOINT:
	case PCI_EXP_TYPE_ROOT_PORT:
	case PCI_EXP_TYPE_UPSTREAM:
	case PCI_EXP_TYPE_DOWNSTREAM:
		return is_aer_internal_error(info);
	default:
		return false;
	}
> >
> > return is_aer_internal_error(info);
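
For comparison, the "local variable" alternative mentioned above could
look roughly like the userspace sketch below. The type encodings and the
PCI_EXP_FLAGS_TYPE mask match include/uapi/linux/pci_regs.h; everything
else (fake_pdev, fake_err_info, fake_pcie_type, is_cxl_error_sketch, and
the 'internal' flag standing in for is_aer_internal_error()) is
hypothetical and only there to make the example compile on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Device/Port type field of the PCIe Capabilities register. */
#define PCI_EXP_FLAGS_TYPE      0x00f0
#define PCI_EXP_TYPE_ENDPOINT   0x0
#define PCI_EXP_TYPE_ROOT_PORT  0x4
#define PCI_EXP_TYPE_UPSTREAM   0x5
#define PCI_EXP_TYPE_DOWNSTREAM 0x6
#define PCI_EXP_TYPE_RC_END     0x9

/* Hypothetical reduced stand-ins for struct pci_dev and aer_err_info. */
struct fake_pdev {
	unsigned short pcie_flags_reg;
};

struct fake_err_info {
	bool is_cxl;
	bool internal;	/* stands in for is_aer_internal_error(info) */
};

static int fake_pcie_type(const struct fake_pdev *pdev)
{
	return (pdev->pcie_flags_reg & PCI_EXP_FLAGS_TYPE) >> 4;
}

static bool is_cxl_error_sketch(const struct fake_pdev *pdev,
				const struct fake_err_info *info)
{
	int type;

	if (!info || !info->is_cxl)
		return false;

	/* Read the type once instead of once per comparison. */
	type = fake_pcie_type(pdev);
	if (type != PCI_EXP_TYPE_ENDPOINT &&
	    type != PCI_EXP_TYPE_ROOT_PORT &&
	    type != PCI_EXP_TYPE_UPSTREAM &&
	    type != PCI_EXP_TYPE_DOWNSTREAM)
		return false;

	return info->internal;
}
```

Both forms behave the same; the switch reads slightly better once more
device types are added, while the local variable keeps the diff smaller.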
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 34/34] cxl: Enable CXL protocol errors during CXL Port probe
2026-01-14 18:20 ` [PATCH v14 34/34] cxl: Enable CXL protocol errors during CXL Port probe Terry Bowman
@ 2026-01-15 16:18 ` Jonathan Cameron
2026-01-15 19:41 ` Bowman, Terry
0 siblings, 1 reply; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-15 16:18 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 14 Jan 2026 12:20:55 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> CXL protocol errors are not enabled for all CXL devices after boot. These
> must be enabled in order to process CXL protocol errors.
>
> Introduce cxl_unmask_proto_interrupts() to call pci_aer_unmask_internal_errors().
> pci_aer_unmask_internal_errors() expects the pdev->aer_cap is initialized.
> But, dev->aer_cap is not initialized for CXL Upstream Switch Ports and CXL
> Downstream Switch Ports. Initialize the dev->aer_cap if necessary. Enable AER
> correctable internal errors and uncorrectable internal errors for all CXL
> devices.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>
A question inline.
> ---
>
> Changes in v13->v14:
> - Update commit title's prefix (Bjorn)
>
> Changes in v12->v13:
> - Add dev and dev_is_pci() NULL checks in cxl_unmask_proto_interrupts() (Terry)
> - Add Dave Jiang's and Ben's review-by
>
> Changes in v11->v12:
> - None
>
> Changes in v10->v11:
> - Added check for valid PCI devices in is_cxl_error() (Terry)
> - Removed check for RCiEP in cxl_handle_proto_err() and
> cxl_report_error_detected() (Terry)
> ---
> drivers/cxl/core/port.c | 2 ++
> drivers/cxl/core/ras.c | 22 ++++++++++++++++++++++
> drivers/cxl/cxlpci.h | 4 ++++
> 3 files changed, 28 insertions(+)
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 0bec10be5d56..588801c5d406 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1828,6 +1828,8 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
>
> rc = cxl_add_ep(dport, &cxlmd->dev);
>
> + cxl_unmask_proto_interrupts(cxlmd->cxlds->dev);
> +
> /*
> * If the endpoint already exists in the port's list,
> * that's ok, it was added on a previous pass.
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 427009a8a78a..e299eb50fbe4 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -117,6 +117,24 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
> }
> static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>
> +void cxl_unmask_proto_interrupts(struct device *dev)
> +{
> + if (!dev || !dev_is_pci(dev))
> + return;
> +
> + struct pci_dev *pdev __free(pci_dev_put) = pci_dev_get(to_pci_dev(dev));
> +
> + if (!pdev->aer_cap) {
Add a comment to say why this might not be set. How did we get here
without calling pci_aer_init()?
> + pdev->aer_cap = pci_find_ext_capability(pdev,
> + PCI_EXT_CAP_ID_ERR);
> + if (!pdev->aer_cap)
> + return;
> + }
> +
> + pci_aer_unmask_internal_errors(pdev);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_unmask_proto_interrupts, "CXL");
> +
> static void cxl_dport_map_ras(struct cxl_dport *dport)
> {
> struct cxl_register_map *map = &dport->reg_map;
> @@ -127,6 +145,8 @@ static void cxl_dport_map_ras(struct cxl_dport *dport)
> else if (cxl_map_component_regs(map, &dport->regs.component,
> BIT(CXL_CM_CAP_CAP_ID_RAS)))
> dev_dbg(dev, "Failed to map RAS capability.\n");
> +
> + cxl_unmask_proto_interrupts(dev);
> }
>
> /**
> @@ -159,6 +179,8 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
> if (cxl_map_component_regs(map, &port->regs,
> BIT(CXL_CM_CAP_CAP_ID_RAS)))
> dev_dbg(&port->dev, "Failed to map RAS capability\n");
> +
> + cxl_unmask_proto_interrupts(port->uport_dev);
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
>
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 3d70f9b4a193..0c915c0bdfac 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -89,6 +89,7 @@ void __cxl_uport_init_ras_reporting(struct cxl_port *port,
> int __cxl_await_media_ready(struct cxl_dev_state *cxlds);
> resource_size_t __cxl_rcd_component_reg_phys(struct device *dev,
> struct cxl_dport *dport);
> +void cxl_unmask_proto_interrupts(struct device *dev);
> #else
> static inline void cxl_pci_cor_error_detected(struct pci_dev *pdev)
> {
> @@ -104,6 +105,9 @@ static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport)
> static inline void devm_cxl_port_ras_setup(struct cxl_port *port)
> {
> }
> +static inline void cxl_unmask_proto_interrupts(struct device *dev)
> +{
> +}
> #endif
>
> int cxl_port_setup_regs(struct cxl_port *port,
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 30/34] PCI/AER: Dequeue forwarded CXL error
2026-01-15 16:01 ` Jonathan Cameron
@ 2026-01-15 17:29 ` Bowman, Terry
0 siblings, 0 replies; 129+ messages in thread
From: Bowman, Terry @ 2026-01-15 17:29 UTC (permalink / raw)
To: Jonathan Cameron
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On 1/15/2026 10:01 AM, Jonathan Cameron wrote:
> On Wed, 14 Jan 2026 12:20:51 -0600
> Terry Bowman <terry.bowman@amd.com> wrote:
>
>> The AER driver now forwards CXL protocol errors to the CXL driver via a
>> kfifo. The CXL driver must consume these work items and initiate protocol
>> error handling while ensuring the device's RAS mappings remain valid
>> throughout processing.
>>
>> Implement cxl_proto_err_work_fn() to dequeue work items forwarded by the
>> AER service driver. Lock the parent CXL Port device to ensure the CXL
>> device's RAS registers are accessible during handling. Add pdev reference-put
>> to match reference-get in AER driver. This will ensure pdev access after
>> kfifo dequeue. These changes apply to CXL Ports and CXL Endpoints.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>
> Few things inline.
> Thanks,
>
> Jonathan
>
>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>> index bf82880e19b4..0c640b84ad70 100644
>> --- a/drivers/cxl/core/ras.c
>> +++ b/drivers/cxl/core/ras.c
>> @@ -117,17 +117,6 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
>
>> +/*
>> + * Return 'struct cxl_port *' parent CXL Port of dev
>> + *
>> + * Reference count increments returned port on success
>> + *
>> + * @pdev: Find the parent CXL Port of this device
>
> This is a non standard type of a comment. I'd make it formal
> kernel-doc.
>
>
Ok, I'll update it.
>
>> +
>> +static void cxl_proto_err_work_fn(struct work_struct *work)
>> +{
>> + struct cxl_proto_err_work_data wd;
>> +
>> + while (cxl_proto_err_kfifo_get(&wd)) {
>
> I'm probably being slow today but where does that helper come from?
>
drivers/pci/pcie/aer_cxl_vh.c
Thanks for reviewing.
-Terry
>> + struct pci_dev *pdev __free(pci_dev_put) = wd.pdev;
>> +
>> + if (!pdev) {
>> + pr_err_ratelimited("NULL PCI device passed in AER-CXL KFIFO\n");
>> + continue;
>> + }
>> +
>> + struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
>> + if (!port) {
>> + pr_err_ratelimited("Failed to find parent Port device in CXL topology.\n");
>> + continue;
>> + }
>> + guard(device)(&port->dev);
>> +
>> + cxl_handle_proto_error(&wd);
>> + }
>> +}
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 34/34] cxl: Enable CXL protocol errors during CXL Port probe
2026-01-15 16:18 ` Jonathan Cameron
@ 2026-01-15 19:41 ` Bowman, Terry
0 siblings, 0 replies; 129+ messages in thread
From: Bowman, Terry @ 2026-01-15 19:41 UTC (permalink / raw)
To: Jonathan Cameron
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On 1/15/2026 10:18 AM, Jonathan Cameron wrote:
> On Wed, 14 Jan 2026 12:20:55 -0600
> Terry Bowman <terry.bowman@amd.com> wrote:
>
>> CXL protocol errors are not enabled for all CXL devices after boot. These
>> must be enabled in order to process CXL protocol errors.
>>
>> Introduce cxl_unmask_proto_interrupts() to call pci_aer_unmask_internal_errors().
>> pci_aer_unmask_internal_errors() expects the pdev->aer_cap is initialized.
>> But, dev->aer_cap is not initialized for CXL Upstream Switch Ports and CXL
>> Downstream Switch Ports. Initialize the dev->aer_cap if necessary. Enable AER
>> correctable internal errors and uncorrectable internal errors for all CXL
>> devices.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>>
>
> A question inline.
>
>> ---
>>
>> Changes in v13->v14:
>> - Update commit title's prefix (Bjorn)
>>
>> Changes in v12->v13:
>> - Add dev and dev_is_pci() NULL checks in cxl_unmask_proto_interrupts() (Terry)
>> - Add Dave Jiang's and Ben's review-by
>>
>> Changes in v11->v12:
>> - None
>>
>> Changes in v10->v11:
>> - Added check for valid PCI devices in is_cxl_error() (Terry)
>> - Removed check for RCiEP in cxl_handle_proto_err() and
>> cxl_report_error_detected() (Terry)
>> ---
>> drivers/cxl/core/port.c | 2 ++
>> drivers/cxl/core/ras.c | 22 ++++++++++++++++++++++
>> drivers/cxl/cxlpci.h | 4 ++++
>> 3 files changed, 28 insertions(+)
>>
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index 0bec10be5d56..588801c5d406 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -1828,6 +1828,8 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
>>
>> rc = cxl_add_ep(dport, &cxlmd->dev);
>>
>> + cxl_unmask_proto_interrupts(cxlmd->cxlds->dev);
>> +
>> /*
>> * If the endpoint already exists in the port's list,
>> * that's ok, it was added on a previous pass.
>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>> index 427009a8a78a..e299eb50fbe4 100644
>> --- a/drivers/cxl/core/ras.c
>> +++ b/drivers/cxl/core/ras.c
>> @@ -117,6 +117,24 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
>> }
>> static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>>
>> +void cxl_unmask_proto_interrupts(struct device *dev)
>> +{
>> + if (!dev || !dev_is_pci(dev))
>> + return;
>> +
>> + struct pci_dev *pdev __free(pci_dev_put) = pci_dev_get(to_pci_dev(dev));
>> +
>> + if (!pdev->aer_cap) {
>
> Add a comment to say why this might not be set. How did we get here
> without calling pci_aer_init()?
>
I borrowed this from the AER driver. cxl/core/ras.c and pci_dev::aer_cap are
both gated by CONFIG_PCIEAER, so the only explanation for !pdev->aer_cap
is a missing AER capability. The CXL device is broken if this
happens.
-Terry
>> + pdev->aer_cap = pci_find_ext_capability(pdev,
>> + PCI_EXT_CAP_ID_ERR);
>> + if (!pdev->aer_cap)
>> + return;
>> + }
>> +
>> + pci_aer_unmask_internal_errors(pdev);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_unmask_proto_interrupts, "CXL");
>> +
>> static void cxl_dport_map_ras(struct cxl_dport *dport)
>> {
>> struct cxl_register_map *map = &dport->reg_map;
>> @@ -127,6 +145,8 @@ static void cxl_dport_map_ras(struct cxl_dport *dport)
>> else if (cxl_map_component_regs(map, &dport->regs.component,
>> BIT(CXL_CM_CAP_CAP_ID_RAS)))
>> dev_dbg(dev, "Failed to map RAS capability.\n");
>> +
>> + cxl_unmask_proto_interrupts(dev);
>> }
>>
>> /**
>> @@ -159,6 +179,8 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
>> if (cxl_map_component_regs(map, &port->regs,
>> BIT(CXL_CM_CAP_CAP_ID_RAS)))
>> dev_dbg(&port->dev, "Failed to map RAS capability\n");
>> +
>> + cxl_unmask_proto_interrupts(port->uport_dev);
>> }
>> EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
>>
>> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
>> index 3d70f9b4a193..0c915c0bdfac 100644
>> --- a/drivers/cxl/cxlpci.h
>> +++ b/drivers/cxl/cxlpci.h
>> @@ -89,6 +89,7 @@ void __cxl_uport_init_ras_reporting(struct cxl_port *port,
>> int __cxl_await_media_ready(struct cxl_dev_state *cxlds);
>> resource_size_t __cxl_rcd_component_reg_phys(struct device *dev,
>> struct cxl_dport *dport);
>> +void cxl_unmask_proto_interrupts(struct device *dev);
>> #else
>> static inline void cxl_pci_cor_error_detected(struct pci_dev *pdev)
>> {
>> @@ -104,6 +105,9 @@ static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport)
>> static inline void devm_cxl_port_ras_setup(struct cxl_port *port)
>> {
>> }
>> +static inline void cxl_unmask_proto_interrupts(struct device *dev)
>> +{
>> +}
>> #endif
>>
>> int cxl_port_setup_regs(struct cxl_port *port,
>
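
To make the !pdev->aer_cap fallback discussed above concrete, here is a
userspace sketch of a lazy extended-capability lookup over a simulated
config space. It is not the kernel's pci_find_ext_capability(); the
header layout (capability ID in bits 15:0, next offset in bits 31:20,
list starting at offset 0x100, AER ID 0x0001) follows the PCIe
specification, while fake_pdev, cfg_read32(), cfg_write32(),
find_ext_cap() and ensure_aer_cap() are hypothetical names. The
simulated registers are stored in host byte order for simplicity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE      256		/* start of extended config space */
#define CFG_SPACE_EXP_SIZE  4096
#define EXT_CAP_ID_ERR      0x01	/* Advanced Error Reporting */
#define EXT_CAP_ID(h)       ((h) & 0x0000ffff)
#define EXT_CAP_NEXT(h)     (((h) >> 20) & 0xffc)

struct fake_pdev {
	uint8_t cfg[CFG_SPACE_EXP_SIZE];
	uint16_t aer_cap;	/* cached offset, 0 while unknown */
};

static uint32_t cfg_read32(const struct fake_pdev *pdev, int pos)
{
	uint32_t v;

	memcpy(&v, &pdev->cfg[pos], sizeof(v));
	return v;
}

static void cfg_write32(struct fake_pdev *pdev, int pos, uint32_t v)
{
	memcpy(&pdev->cfg[pos], &v, sizeof(v));
}

/* Walk the extended capability list; return the offset of 'cap' or 0. */
static int find_ext_cap(const struct fake_pdev *pdev, uint32_t cap)
{
	/* Bound the walk so a looping list cannot spin forever. */
	int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8;
	int pos = CFG_SPACE_SIZE;
	uint32_t header = cfg_read32(pdev, pos);

	if (header == 0)
		return 0;	/* no extended capabilities at all */

	while (ttl-- > 0) {
		if (EXT_CAP_ID(header) == cap)
			return pos;
		pos = (int)EXT_CAP_NEXT(header);
		if (pos < CFG_SPACE_SIZE)
			break;	/* end of list */
		header = cfg_read32(pdev, pos);
	}
	return 0;
}

/* Look the AER capability up once and cache it; 0 means "no AER". */
static int ensure_aer_cap(struct fake_pdev *pdev)
{
	if (!pdev->aer_cap)
		pdev->aer_cap = (uint16_t)find_ext_cap(pdev, EXT_CAP_ID_ERR);
	return pdev->aer_cap;
}
```

A return value of 0 corresponds to the early return in
cxl_unmask_proto_interrupts(): without an AER capability there is
nothing to unmask.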
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
2026-01-14 19:08 ` Jonathan Cameron
@ 2026-01-15 20:42 ` dan.j.williams
2026-01-22 13:34 ` Lukas Wunner
0 siblings, 1 reply; 129+ messages in thread
From: dan.j.williams @ 2026-01-15 20:42 UTC (permalink / raw)
To: Jonathan Cameron, Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
Jonathan Cameron wrote:
> On Wed, 14 Jan 2026 12:20:31 -0600
> Terry Bowman <terry.bowman@amd.com> wrote:
>
> > The AER driver includes significant logic for handling CXL protocol errors.
> > The AER driver will be updated in the future to separate the AER and CXL
> > logic.
> >
> > Rename the is_internal_error() function to is_aer_internal_error() as it
> > gives a more precise indication of the purpose. Make is_aer_internal_error()
> > non-static to allow for other PCI drivers to access.
> >
> > Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Hi Terry,
>
> I don't see it as sensible to have is_aer_internal_error()
> return false if CXL is not built. That question has nothing to
> do with CXL. Hence if we are doing generic naming, I think we
> should just always have the function available. Gating on CXL
> belongs at whatever called it. Which is the case already for
> cxl_rch_handle_error() which has a stub that doesn't call this for
> when CXL stuff isn't built.
>
> Should just be a case of moving it out of the ifdef in aer.c
> as part of this patch.
I agree with the general sentiment, but not the conclusion, especially
because this is a private detail. Linux has long ignored internal
errors. The only reason to consider them now is because CXL decided to
multiplex its error model on top of this oft-ignored feature of PCIe
AER.
Specifically, portdrv.h is not in the global include namespace; this is
a private detail of the only consumer of internal errors:
drivers/pci/pcie/aer_cxl_{rch,vh}.c
At most we should have this as a comment to clarify:
/*
* Note, internal errors are only considered for the CXL error model,
* not for other implementations.
*/
...and the pci_aer_unmask_internal_errors() export should be:
EXPORT_SYMBOL_FOR_MODULES(pci_aer_unmask_internal_errors, "cxl_core")
...for the same reason. Steer folks away from thinking that it is open
season for adding more internal error support.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 15/34] PCI/AER: Update struct aer_err_info with kernel-doc formatting
2026-01-14 19:48 ` Jonathan Cameron
@ 2026-01-15 20:56 ` dan.j.williams
0 siblings, 0 replies; 129+ messages in thread
From: dan.j.williams @ 2026-01-15 20:56 UTC (permalink / raw)
To: Jonathan Cameron, Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
Jonathan Cameron wrote:
> On Wed, 14 Jan 2026 12:20:36 -0600
> Terry Bowman <terry.bowman@amd.com> wrote:
>
> > Update the existing 'struct aer_err_info' definition to use kernel-doc
> > formatting. Remove the inline comments to reduce noise and do not introduce
> > functional changes. This will improve readability and maintainability.
> >
> > Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Hi Terry.
>
> I didn't check but I think kernel-doc script will complain
> about partial docs. Other than that possibly needing fixing with
> a trivial entry for __pad1
It does:
Warning: drivers/pci/pci.h:764 struct member '__pad1' not described in 'aer_err_info'
Warning: drivers/pci/pci.h:764 struct member '__pad2' not described in 'aer_err_info'
...those are the only warnings in this set. Btw, this is my hacky script for
checking for new kdoc errors introduced in a patch series. I assume the
0day robot has something similar. Maybe something to clean up and
contribute to checkpatch:
KDOC=~/git/linux/scripts/kernel-doc

for p in $(stg series -A --noprefix)
do
	echo KERNELDOC $p
	stg goto $p >/dev/null
	for i in $(stg files --bare $p)
	do
		# only show the new errors relative to the contents of
		# the file in the previous commit
		f1=$(mktemp)
		if git show HEAD^:$i >$f1 2>/dev/null; then
			f2=$(mktemp)
			f3=$(mktemp)
			$KDOC $f1 2>$f2 >/dev/null
			$KDOC $i 2>$f3 >/dev/null
			diff -u $f2 $f3
			rm $f2 $f3
		else
			# file is new in this commit: show all of its warnings
			$KDOC $i 2>&1 1>/dev/null
		fi
		rm $f1
	done
done
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 16/34] cxl/mem: Clarify @host for devm_cxl_add_nvdimm()
2026-01-14 21:08 ` Dave Jiang
@ 2026-01-16 3:07 ` dan.j.williams
2026-01-16 16:22 ` Dave Jiang
0 siblings, 1 reply; 129+ messages in thread
From: dan.j.williams @ 2026-01-16 3:07 UTC (permalink / raw)
To: Dave Jiang, Terry Bowman, dave, jonathan.cameron,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
Dave Jiang wrote:
>
>
> On 1/14/26 11:20 AM, Terry Bowman wrote:
> > From: Dan Williams <dan.j.williams@intel.com>
> >
> > The convention for devm_ helpers in the CXL driver is that the first
> > argument is the @host for the operation (locked driver::probe() context).
> >
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > Reviewed-by: Terry Bowman <terry.bowman@amd.com>
>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> A nit below
>
> >
> > ---
> >
> > Changes in v13 -> v14:
> > - New patch
> > ---
> > drivers/cxl/core/pmem.c | 13 +++++++------
> > drivers/cxl/cxl.h | 3 ++-
> > drivers/cxl/mem.c | 2 +-
> > 3 files changed, 10 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c
> > index 8853415c106a..e7b1e6fa0ea0 100644
> > --- a/drivers/cxl/core/pmem.c
> > +++ b/drivers/cxl/core/pmem.c
> > @@ -237,12 +237,13 @@ static void cxlmd_release_nvdimm(void *_cxlmd)
> >
> > /**
> > * devm_cxl_add_nvdimm() - add a bridge between a cxl_memdev and an nvdimm
> > - * @parent_port: parent port for the (to be added) @cxlmd endpoint port
> > - * @cxlmd: cxl_memdev instance that will perform LIBNVDIMM operations
> > + * @host: host device for devm operations
> > + * @port: any port in the CXL topology to find the nvdimm-bridge device
> > + * @cxlmd: parent of the to be created cxl_nvdimm device
> > *
> > * Return: 0 on success negative error code on failure.
> > */
> > -int devm_cxl_add_nvdimm(struct cxl_port *parent_port,
> > +int devm_cxl_add_nvdimm(struct device *host, struct cxl_port *port,
>
> s/port/parent_port/ to maintain clarity of the port
...but it is not used as a "parent" port in this function. Any port in
the topology will do. The reason a port argument is needed is to
disambiguate when there are multiple CXL root devices. That currently
only happens when cxl_test is loaded.
However, after writing that, it may make more sense to make that
semantic explicit and just have the caller responsible for passing in an
@cxl_root argument.
That is a change for a later series, not this one.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 18/34] cxl/port: Remove "enumerate dports" helpers
2026-01-14 19:50 ` Jonathan Cameron
2026-01-14 21:23 ` Dave Jiang
@ 2026-01-16 3:15 ` dan.j.williams
1 sibling, 0 replies; 129+ messages in thread
From: dan.j.williams @ 2026-01-16 3:15 UTC (permalink / raw)
To: Jonathan Cameron, Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
Jonathan Cameron wrote:
> On Wed, 14 Jan 2026 12:20:39 -0600
> Terry Bowman <terry.bowman@amd.com> wrote:
>
> > From: Dan Williams <dan.j.williams@intel.com>
> >
> > Now that cxl_switch_port_probe() no longer walks potential dports, because
> > they are enumerated dynamically on descendant endpoint arrival, remove the
> > dead code.
> >
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > Reviewed-by: Terry Bowman <terry.bowman@amd.com>
>
> Patch description doesn't match patch.
Yeah, something got frobbed from the reference patch:
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/commit/?id=ac97e6edd792
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 18/34] cxl/port: Remove "enumerate dports" helpers
2026-01-14 18:20 ` [PATCH v14 18/34] cxl/port: Remove "enumerate dports" helpers Terry Bowman
2026-01-14 19:50 ` Jonathan Cameron
2026-01-14 21:24 ` Dave Jiang
@ 2026-01-16 3:21 ` dan.j.williams
2 siblings, 0 replies; 129+ messages in thread
From: dan.j.williams @ 2026-01-16 3:21 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
Terry Bowman wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> Now that cxl_switch_port_probe() no longer walks potential dports, because
> they are enumerated dynamically on descendant endpoint arrival, remove the
> dead code.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Terry, the reference patch I sent was based on 6.18-rc4. That means it
was missing this change:
3f5b8f7f34f6 cxl/port: Remove devm_cxl_port_enumerate_dports()
This cxl_walk_context move, which was not part of my original patch,
also does not appear to be needed, so this patch can simply be dropped.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management
2026-01-15 14:46 ` Jonathan Cameron
@ 2026-01-16 4:45 ` dan.j.williams
2026-01-16 15:01 ` Jonathan Cameron
0 siblings, 1 reply; 129+ messages in thread
From: dan.j.williams @ 2026-01-16 4:45 UTC (permalink / raw)
To: Jonathan Cameron, Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
Jonathan Cameron wrote:
> On Wed, 14 Jan 2026 12:20:40 -0600
> Terry Bowman <terry.bowman@amd.com> wrote:
>
> > From: Dan Williams <dan.j.williams@intel.com>
> >
> > With dport addition moving out of cxl_switch_port_probe() it is no longer
> > the case that a single dport-add failure will cause all dport resources
> > to be automatically unwound.
> >
> > devm still helps all dport resources get cleaned up when the port is
> > detached, but setup now needs to avoid leaking resources if an early exit
> > occurs during setup.
> >
> > Convert from a "devm add" model, to an "auto remove" model that makes the
> > caller responsible for registering devm reclaim after the object is fully
> > instantiated.
> >
> > As a side of effect of this reorganization port->nr_dports is now always
> > consistent with the number of entries in the port->dports xarray, and this
> > can stop playing games with ida_is_empty() which is unreliable as a
> > detector of whether decoders are setup. I.e. consider how
> > CONFIG_DEBUG_KOBJECT_RELEASE might wreak havoc with this approach.
> >
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > Reviewed-by: Terry Bowman <terry.bowman@amd.com>
> >
> > ---
> >
> > Changes in v13 -> v14:
> > - New patch
> Hi Dan, Terry,
>
> I think this needs a little reorganization to ensure we don't have
> dport and dport_add both being the same pointer for different free
> reasons. Adding a helper and we can combine them with a clear
> hand over of ownership.
>
> Wrapping devres_remove_group() in a function that is called close_group()
> rings alarm bells.
>
> Jonathan
[..]
>
> > @@ -1176,48 +1175,27 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> > &component_reg_phys);
> >
> > cond_cxl_root_lock(port);
> > - rc = add_dport(port, dport);
> > + struct cxl_dport *dport_add __free(remove_dport) =
> > + add_dport(port, dport);
>
> This pattern of having both dport and dport_add effectively
> pointing to the same pointer concerns me from a readability / maintainability
> point of view. We've often made use of helper functions to avoid doing
> this and I think that would make sense here as well.
Yeah, while I do think the multi-variable pattern is useful for
many-step object construction, I can usually easily be persuaded to
consider a helper function.
> Take everything down to and including dport_add() as a helper called
> something like (naming needs work!)
> struct dport_dev *dport __free(remove_and_free_dport) =
> add_dport_wrapper();
I ended up with the patch below which is similar in spirit to this
without a new DEFINE_FREE().
>
> > cond_cxl_root_unlock(port);
> > - if (rc)
> > - return ERR_PTR(rc);
> > -
> > - /*
> > - * Setup port register if this is the first dport showed up. Having
> > - * a dport also means that there is at least 1 active link.
> > - */
> > - if (port->nr_dports == 1 &&
> > - port->component_reg_phys != CXL_RESOURCE_NONE) {
> > - rc = cxl_port_setup_regs(port, port->component_reg_phys);
> > - if (rc) {
> > - xa_erase(&port->dports, (unsigned long)dport->dport_dev);
> > - return ERR_PTR(rc);
> > - }
> > - port->component_reg_phys = CXL_RESOURCE_NONE;
> > - }
> > + if (IS_ERR(dport_add))
> > + return dport_add;
> >
> > - get_device(dport_dev);
> > - rc = devm_add_action_or_reset(host, cxl_dport_remove, dport);
> > - if (rc)
> > - return ERR_PTR(rc);
> > + if (dev_is_pci(dport_dev))
> > + dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
> >
> > rc = sysfs_create_link(&port->dev.kobj, &dport_dev->kobj, link_name);
> > if (rc)
> > return ERR_PTR(rc);
> >
> > - rc = devm_add_action_or_reset(host, cxl_dport_unlink, dport);
> > - if (rc)
> > - return ERR_PTR(rc);
> > -
> > - if (dev_is_pci(dport_dev))
> > - dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
> > -
> > cxl_debugfs_create_dport_dir(dport);
> >
> > - return dport;
> > + retain_and_null_ptr(dport_add);
> > + return no_free_ptr(dport);
> > }
>
>
>
> > +
> > +/*
> > + * Note: this only services dynamic removal of mid-level ports, root ports are
> > + * always removed by the platform driver (e.g. cxl_acpi). @host can be
> > + * hard-coded to &port->dev.
> > + */
> > static void del_dport(struct cxl_dport *dport)
> > {
> > struct cxl_port *port = dport->port;
> >
> > - devm_release_action(&port->dev, cxl_dport_unlink, dport);
> > - devm_release_action(&port->dev, cxl_dport_remove, dport);
> > - devm_kfree(&port->dev, dport);
> > + devm_release_action(&port->dev, unlink_dport, dport);
> > }
> >
> > static void del_dports(struct cxl_port *port)
> > @@ -1597,10 +1603,24 @@ static int update_decoder_targets(struct device *dev, void *data)
> > return 0;
> > }
> >
> > -DEFINE_FREE(del_cxl_dport, struct cxl_dport *, if (!IS_ERR_OR_NULL(_T)) del_dport(_T))
> > +static struct cxl_port *cxl_port_devres_group(struct cxl_port *port)
> > +{
> > + if (!devres_open_group(&port->dev, port, GFP_KERNEL))
> > + return ERR_PTR(-ENOMEM);
> > + return port;
> > +}
> > +DEFINE_FREE(cxl_port_group_free, struct cxl_port *,
> > + if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->dev, _T))
> > +
> > +static void cxl_port_group_close(struct cxl_port *port)
>
> This feels like misleading naming and I'm not sure what intent is.
> Would have expected it to call devres_close_group()
Agree. The hastiness of this patch shows. Switched all the naming to not
be surprising. The flow is:

    cxl_port_open_group(): start recording devres resource acquisition
    cxl_port_remove_group(): on success, stop tracking the group, leave the resources
    cxl_port_release_group(): on failure, destroy the group, free the resources

New patch, added a Fixes: tag.
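
For anyone following along, the open/remove/release flow can be modeled outside the kernel. This is only a userspace sketch under stated assumptions, not the devres API: struct res_group, group_open(), group_remove(), group_release(), acquire() and add_two() are hypothetical stand-ins invented for illustration. It relies on the GCC/Clang cleanup attribute, which is the same mechanism the kernel's __free() is built on.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RES 8

/* Hypothetical stand-in for a device's list of managed resources. */
struct res_group {
	int live[MAX_RES];	/* ids of resources currently held */
	int nr_live;
	int mark;		/* where the open group started, -1 if none */
};

/* Start recording acquisitions (models devres_open_group()). */
static struct res_group *group_open(struct res_group *g)
{
	g->mark = g->nr_live;
	return g;
}

/* Success: forget the marker, keep the resources (models devres_remove_group()). */
static void group_remove(struct res_group *g)
{
	g->mark = -1;
}

/* Failure: drop everything acquired since the marker (models devres_release_group()). */
static void group_release(struct res_group *g)
{
	g->nr_live = g->mark;
	g->mark = -1;
}

/* Scope-exit hook: release the group unless ownership was handed off. */
static void group_auto_release(struct res_group **gp)
{
	if (*gp)
		group_release(*gp);
}

/* Models a devm-style acquisition that lands in the open group. */
static int acquire(struct res_group *g, int id)
{
	if (g->nr_live == MAX_RES)
		return -1;
	g->live[g->nr_live++] = id;
	return 0;
}

/* Acquire two resources as a unit; fail_second simulates a late error. */
int add_two(struct res_group *dev, int fail_second)
{
	struct res_group *grp __attribute__((cleanup(group_auto_release))) =
		group_open(dev);

	if (acquire(grp, 1))
		return -1;
	if (fail_second || acquire(grp, 2))
		return -1;	/* cleanup hook unwinds resource 1 */

	group_remove(grp);
	grp = NULL;		/* disarm the hook, similar in spirit to no_free_ptr() */
	return 0;
}
```

In the patch below the same shape appears as cxl_port_open_group() paired with __free(cxl_port_release_group), and cxl_port_remove_group(no_free_ptr(port_group)) on the success path. Resources acquired before the group was opened are untouched by a failed attempt.
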
-- 8< --
From 9731bb6cb5638a0d2141dc072f90db0d00400680 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams@intel.com>
Date: Wed, 14 Jan 2026 12:20:40 -0600
Subject: [PATCH] cxl/port: Fix devm resource leaks with dport management
With dport addition moving out of cxl_switch_port_probe() it is no longer
the case that a single dport-add failure will cause all dport resources
to be automatically unwound.

devm still helps all dport resources get cleaned up when the port is
detached, but setup now needs to avoid leaking resources if an early exit
occurs during setup.

Convert from a "devm add" model to an "auto remove" model that makes the
caller responsible for registering devm reclaim after the object is fully
instantiated.

As a side effect of this reorganization port->nr_dports is now always
consistent with the number of entries in the port->dports xarray, and this
can stop playing games with ida_is_empty() which is unreliable as a
detector of whether decoders are set up. I.e. consider how
CONFIG_DEBUG_KOBJECT_RELEASE might wreak havoc with this approach.

Cc: <stable@vger.kernel.org>
Fixes: 4f06d81e7c6a ("cxl: Defer dport allocation for switch ports")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/cxl.h | 23 +--
tools/testing/cxl/exports.h | 4 +-
tools/testing/cxl/test/mock.h | 4 +-
drivers/cxl/acpi.c | 11 +-
drivers/cxl/core/pci.c | 10 +-
drivers/cxl/core/port.c | 252 ++++++++++++++++-----------
drivers/cxl/port.c | 8 +-
tools/testing/cxl/cxl_core_exports.c | 13 +-
tools/testing/cxl/test/cxl.c | 6 +-
tools/testing/cxl/test/mock.c | 25 ++-
tools/testing/cxl/Kbuild | 3 +-
11 files changed, 209 insertions(+), 150 deletions(-)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 6f3741a57932..47ee06c95433 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -796,12 +796,12 @@ struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
struct cxl_dport **dport);
bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
-struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
- struct device *dport, int port_id,
- resource_size_t component_reg_phys);
-struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
- struct device *dport_dev, int port_id,
- resource_size_t rcrb);
+struct cxl_dport *cxl_add_dport(struct cxl_port *port, struct device *dport,
+ int port_id,
+ resource_size_t component_reg_phys);
+struct cxl_dport *cxl_add_rch_dport(struct cxl_port *port,
+ struct device *dport_dev, int port_id,
+ resource_size_t rcrb);
struct cxl_decoder *to_cxl_decoder(struct device *dev);
struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
@@ -824,6 +824,7 @@ static inline int cxl_root_decoder_autoremove(struct device *host,
return cxl_decoder_autoremove(host, &cxlrd->cxlsd.cxld);
}
int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
+int cxl_dport_autoremove(struct cxl_dport *dport);
/**
* struct cxl_endpoint_dvsec_info - Cached DVSEC info
@@ -937,10 +938,10 @@ void cxl_coordinates_combine(struct access_coordinate *out,
struct access_coordinate *c2);
bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
-struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev);
-struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev);
+struct cxl_dport *cxl_add_dport_by_dev(struct cxl_port *port,
+ struct device *dport_dev);
+struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
+ struct device *dport_dev);
/*
* Unit test builds overrides this to __weak, find the 'strong' version
@@ -964,7 +965,7 @@ u16 cxl_gpf_get_dvsec(struct device *dev);
*/
#ifndef CXL_TEST_ENABLE
#define DECLARE_TESTABLE(x) __##x
-#define devm_cxl_add_dport_by_dev DECLARE_TESTABLE(devm_cxl_add_dport_by_dev)
+#define cxl_add_dport_by_dev DECLARE_TESTABLE(cxl_add_dport_by_dev)
#define devm_cxl_switch_port_decoders_setup DECLARE_TESTABLE(devm_cxl_switch_port_decoders_setup)
#endif
diff --git a/tools/testing/cxl/exports.h b/tools/testing/cxl/exports.h
index 7ebee7c0bd67..cbb16073be18 100644
--- a/tools/testing/cxl/exports.h
+++ b/tools/testing/cxl/exports.h
@@ -4,8 +4,8 @@
#define __MOCK_CXL_EXPORTS_H_
typedef struct cxl_dport *(*cxl_add_dport_by_dev_fn)(struct cxl_port *port,
- struct device *dport_dev);
-extern cxl_add_dport_by_dev_fn _devm_cxl_add_dport_by_dev;
+ struct device *dport_dev);
+extern cxl_add_dport_by_dev_fn _cxl_add_dport_by_dev;
typedef int(*cxl_switch_decoders_setup_fn)(struct cxl_port *port);
extern cxl_switch_decoders_setup_fn _devm_cxl_switch_port_decoders_setup;
diff --git a/tools/testing/cxl/test/mock.h b/tools/testing/cxl/test/mock.h
index 2684b89c8aa2..fa13aca4e260 100644
--- a/tools/testing/cxl/test/mock.h
+++ b/tools/testing/cxl/test/mock.h
@@ -22,8 +22,8 @@ struct cxl_mock_ops {
int (*devm_cxl_switch_port_decoders_setup)(struct cxl_port *port);
int (*devm_cxl_endpoint_decoders_setup)(struct cxl_port *port);
void (*cxl_endpoint_parse_cdat)(struct cxl_port *port);
- struct cxl_dport *(*devm_cxl_add_dport_by_dev)(struct cxl_port *port,
- struct device *dport_dev);
+ struct cxl_dport *(*cxl_add_dport_by_dev)(struct cxl_port *port,
+ struct device *dport_dev);
int (*hmat_get_extended_linear_cache_size)(struct resource *backing_res,
int nid,
resource_size_t *cache_size);
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 77ac940e3013..1e1383eb9bd5 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -679,16 +679,19 @@ static int add_host_bridge_dport(struct device *match, void *arg)
if (ctx.cxl_version == ACPI_CEDT_CHBS_VERSION_CXL11) {
dev_dbg(match, "RCRB found for UID %lld: %pa\n", ctx.uid,
&ctx.base);
- dport = devm_cxl_add_rch_dport(root_port, bridge, ctx.uid,
- ctx.base);
+ dport = cxl_add_rch_dport(root_port, bridge, ctx.uid, ctx.base);
} else {
- dport = devm_cxl_add_dport(root_port, bridge, ctx.uid,
- CXL_RESOURCE_NONE);
+ dport = cxl_add_dport(root_port, bridge, ctx.uid,
+ CXL_RESOURCE_NONE);
}
if (IS_ERR(dport))
return PTR_ERR(dport);
+ ret = cxl_dport_autoremove(dport);
+ if (ret)
+ return ret;
+
ret = get_genport_coordinates(match, dport);
if (ret)
dev_dbg(match, "Failed to get generic port perf coordinates.\n");
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index b838c59d7a3c..ce117812e5c8 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -41,14 +41,14 @@ static int pci_get_port_num(struct pci_dev *pdev)
}
/**
- * __devm_cxl_add_dport_by_dev - allocate a dport by dport device
+ * __cxl_add_dport_by_dev - allocate a dport by dport device
* @port: cxl_port that hosts the dport
* @dport_dev: 'struct device' of the dport
*
* Returns the allocated dport on success or ERR_PTR() of -errno on error
*/
-struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev)
+struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
+ struct device *dport_dev)
{
struct cxl_register_map map;
struct pci_dev *pdev;
@@ -67,9 +67,9 @@ struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
return ERR_PTR(rc);
device_lock_assert(&port->dev);
- return devm_cxl_add_dport(port, dport_dev, port_num, map.resource);
+ return cxl_add_dport(port, dport_dev, port_num, map.resource);
}
-EXPORT_SYMBOL_NS_GPL(__devm_cxl_add_dport_by_dev, "CXL");
+EXPORT_SYMBOL_NS_GPL(__cxl_add_dport_by_dev, "CXL");
static int cxl_dvsec_mem_range_valid(struct cxl_dev_state *cxlds, int id)
{
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index fef3aa0c6680..41b65babd057 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1066,11 +1066,28 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *dport)
return -EBUSY;
}
+ /*
+ * Unlike a CXL switch upstream port, which can train a CXL link
+ * independently of its downstream ports, a host bridge upstream port
+ * may not enable CXL registers until at least one downstream port (root
+ * port) trains CXL. Enumerate registers once when the number of dports
+ * transitions from zero to one.
+ */
+ if (!port->nr_dports) {
+ rc = cxl_port_setup_regs(port, port->component_reg_phys);
+ if (rc)
+ return rc;
+ }
+
+ /* Arrange for dport_dev to be valid through remove_dport() */
+ struct device *dev __free(put_device) = get_device(dport->dport_dev);
+
rc = xa_insert(&port->dports, (unsigned long)dport->dport_dev, dport,
GFP_KERNEL);
if (rc)
return rc;
+ retain_and_null_ptr(dev);
port->nr_dports++;
return 0;
}
@@ -1094,51 +1111,64 @@ static void cond_cxl_root_unlock(struct cxl_port *port)
device_unlock(&port->dev);
}
-static void cxl_dport_remove(void *data)
+static void remove_dport(struct cxl_dport *dport)
{
- struct cxl_dport *dport = data;
struct cxl_port *port = dport->port;
+ port->nr_dports--;
xa_erase(&port->dports, (unsigned long) dport->dport_dev);
put_device(dport->dport_dev);
}
-static void cxl_dport_unlink(void *data)
+static struct cxl_dport *__register_dport(struct cxl_dport *dport)
{
- struct cxl_dport *dport = data;
- struct cxl_port *port = dport->port;
+ int rc;
char link_name[CXL_TARGET_STRLEN];
+ struct cxl_port *port = dport->port;
+ struct device *dport_dev = dport->dport_dev;
- sprintf(link_name, "dport%d", dport->port_id);
- sysfs_remove_link(&port->dev.kobj, link_name);
-}
+ if (snprintf(link_name, CXL_TARGET_STRLEN, "dport%d", dport->port_id) >=
+ CXL_TARGET_STRLEN)
+ return ERR_PTR(-EINVAL);
-static struct cxl_dport *
-__devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
- int port_id, resource_size_t component_reg_phys,
- resource_size_t rcrb)
-{
- char link_name[CXL_TARGET_STRLEN];
- struct cxl_dport *dport;
- struct device *host;
- int rc;
+ cond_cxl_root_lock(port);
+ rc = add_dport(port, dport);
+ cond_cxl_root_unlock(port);
+ if (rc)
+ return ERR_PTR(rc);
- if (is_cxl_root(port))
- host = port->uport_dev;
- else
- host = &port->dev;
+ if (dev_is_pci(dport_dev))
+ dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
- if (!host->driver) {
- dev_WARN_ONCE(&port->dev, 1, "dport:%s bad devm context\n",
- dev_name(dport_dev));
- return ERR_PTR(-ENXIO);
+ rc = sysfs_create_link(&port->dev.kobj, &dport_dev->kobj, link_name);
+ if (rc) {
+ remove_dport(dport);
+ return ERR_PTR(rc);
}
- if (snprintf(link_name, CXL_TARGET_STRLEN, "dport%d", port_id) >=
- CXL_TARGET_STRLEN)
- return ERR_PTR(-EINVAL);
+ cxl_debugfs_create_dport_dir(dport);
- dport = devm_kzalloc(host, sizeof(*dport), GFP_KERNEL);
+ return dport;
+}
+
+static struct cxl_dport *register_or_free_dport(struct cxl_dport *dport)
+{
+ struct cxl_dport *result = __register_dport(dport);
+
+ if (IS_ERR(result))
+ kfree(dport);
+ return result;
+}
+
+static struct cxl_dport *__cxl_add_dport(struct cxl_port *port,
+ struct device *dport_dev, int port_id,
+ resource_size_t component_reg_phys,
+ resource_size_t rcrb)
+{
+ int rc;
+
+ struct cxl_dport *dport __free(kfree) =
+ kzalloc(sizeof(*dport), GFP_KERNEL);
if (!dport)
return ERR_PTR(-ENOMEM);
@@ -1175,49 +1205,11 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
dev_dbg(dport_dev, "Component Registers found for dport: %pa\n",
&component_reg_phys);
- cond_cxl_root_lock(port);
- rc = add_dport(port, dport);
- cond_cxl_root_unlock(port);
- if (rc)
- return ERR_PTR(rc);
-
- /*
- * Setup port register if this is the first dport showed up. Having
- * a dport also means that there is at least 1 active link.
- */
- if (port->nr_dports == 1 &&
- port->component_reg_phys != CXL_RESOURCE_NONE) {
- rc = cxl_port_setup_regs(port, port->component_reg_phys);
- if (rc) {
- xa_erase(&port->dports, (unsigned long)dport->dport_dev);
- return ERR_PTR(rc);
- }
- port->component_reg_phys = CXL_RESOURCE_NONE;
- }
-
- get_device(dport_dev);
- rc = devm_add_action_or_reset(host, cxl_dport_remove, dport);
- if (rc)
- return ERR_PTR(rc);
-
- rc = sysfs_create_link(&port->dev.kobj, &dport_dev->kobj, link_name);
- if (rc)
- return ERR_PTR(rc);
-
- rc = devm_add_action_or_reset(host, cxl_dport_unlink, dport);
- if (rc)
- return ERR_PTR(rc);
-
- if (dev_is_pci(dport_dev))
- dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
-
- cxl_debugfs_create_dport_dir(dport);
-
- return dport;
+ return register_or_free_dport(no_free_ptr(dport));
}
/**
- * devm_cxl_add_dport - append VH downstream port data to a cxl_port
+ * cxl_add_dport - append VH downstream port data to a cxl_port
* @port: the cxl_port that references this dport
* @dport_dev: firmware or PCI device representing the dport
* @port_id: identifier for this dport in a decoder's target list
@@ -1227,14 +1219,13 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
* either the port's host (for root ports), or the port itself (for
* switch ports)
*/
-struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
- struct device *dport_dev, int port_id,
- resource_size_t component_reg_phys)
+struct cxl_dport *cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
+ int port_id, resource_size_t component_reg_phys)
{
struct cxl_dport *dport;
- dport = __devm_cxl_add_dport(port, dport_dev, port_id,
- component_reg_phys, CXL_RESOURCE_NONE);
+ dport = __cxl_add_dport(port, dport_dev, port_id, component_reg_phys,
+ CXL_RESOURCE_NONE);
if (IS_ERR(dport)) {
dev_dbg(dport_dev, "failed to add dport to %s: %ld\n",
dev_name(&port->dev), PTR_ERR(dport));
@@ -1245,10 +1236,10 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
return dport;
}
-EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport, "CXL");
+EXPORT_SYMBOL_NS_GPL(cxl_add_dport, "CXL");
/**
- * devm_cxl_add_rch_dport - append RCH downstream port data to a cxl_port
+ * cxl_add_rch_dport - append RCH downstream port data to a cxl_port
* @port: the cxl_port that references this dport
* @dport_dev: firmware or PCI device representing the dport
* @port_id: identifier for this dport in a decoder's target list
@@ -1256,9 +1247,9 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport, "CXL");
*
* See CXL 3.0 9.11.8 CXL Devices Attached to an RCH
*/
-struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
- struct device *dport_dev, int port_id,
- resource_size_t rcrb)
+struct cxl_dport *cxl_add_rch_dport(struct cxl_port *port,
+ struct device *dport_dev, int port_id,
+ resource_size_t rcrb)
{
struct cxl_dport *dport;
@@ -1267,8 +1258,8 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
return ERR_PTR(-EINVAL);
}
- dport = __devm_cxl_add_dport(port, dport_dev, port_id,
- CXL_RESOURCE_NONE, rcrb);
+ dport = __cxl_add_dport(port, dport_dev, port_id, CXL_RESOURCE_NONE,
+ rcrb);
if (IS_ERR(dport)) {
dev_dbg(dport_dev, "failed to add RCH dport to %s: %ld\n",
dev_name(&port->dev), PTR_ERR(dport));
@@ -1279,7 +1270,7 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
return dport;
}
-EXPORT_SYMBOL_NS_GPL(devm_cxl_add_rch_dport, "CXL");
+EXPORT_SYMBOL_NS_GPL(cxl_add_rch_dport, "CXL");
static int add_ep(struct cxl_ep *new)
{
@@ -1439,13 +1430,42 @@ static void delete_switch_port(struct cxl_port *port)
devm_release_action(port->dev.parent, unregister_port, port);
}
+static void unlink_dport(void *data)
+{
+ struct cxl_dport *dport = data;
+ struct cxl_port *port = dport->port;
+ char link_name[CXL_TARGET_STRLEN];
+
+ sprintf(link_name, "dport%d", dport->port_id);
+ sysfs_remove_link(&port->dev.kobj, link_name);
+ remove_dport(dport);
+ kfree(dport);
+}
+
+int cxl_dport_autoremove(struct cxl_dport *dport)
+{
+ struct cxl_port *port = dport->port;
+ struct device *host;
+
+ if (is_cxl_root(port))
+ host = port->uport_dev;
+ else
+ host = &port->dev;
+
+ return devm_add_action_or_reset(host, unlink_dport, dport);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_dport_autoremove, "CXL");
+
+/*
+ * Note: this only services dynamic removal of mid-level ports, root ports are
+ * always removed by the platform driver (e.g. cxl_acpi). @host can be
+ * hard-coded to &port->dev.
+ */
static void del_dport(struct cxl_dport *dport)
{
struct cxl_port *port = dport->port;
- devm_release_action(&port->dev, cxl_dport_unlink, dport);
- devm_release_action(&port->dev, cxl_dport_remove, dport);
- devm_kfree(&port->dev, dport);
+ devm_release_action(&port->dev, unlink_dport, dport);
}
static void del_dports(struct cxl_port *port)
@@ -1597,10 +1617,24 @@ static int update_decoder_targets(struct device *dev, void *data)
return 0;
}
-DEFINE_FREE(del_cxl_dport, struct cxl_dport *, if (!IS_ERR_OR_NULL(_T)) del_dport(_T))
+static struct cxl_port *cxl_port_open_group(struct cxl_port *port)
+{
+ if (!devres_open_group(&port->dev, port, GFP_KERNEL))
+ return ERR_PTR(-ENOMEM);
+ return port;
+}
+DEFINE_FREE(cxl_port_release_group, struct cxl_port *,
+ if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->dev, _T))
+
+static void cxl_port_remove_group(struct cxl_port *port)
+{
+ devres_remove_group(&port->dev, port);
+}
+
static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
struct device *dport_dev)
{
+ struct cxl_dport *new_dport;
struct cxl_dport *dport;
int rc;
@@ -1615,29 +1649,47 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
return ERR_PTR(-EBUSY);
}
- struct cxl_dport *new_dport __free(del_cxl_dport) =
- devm_cxl_add_dport_by_dev(port, dport_dev);
- if (IS_ERR(new_dport))
- return new_dport;
-
- cxl_switch_parse_cdat(new_dport);
+ /*
+ * With the first dport arrival it is now safe to start looking at
+ * component registers. Be careful to not strand resources if dport
+ * creation ultimately fails.
+ */
+ struct cxl_port *port_group __free(cxl_port_release_group) =
+ cxl_port_open_group(port);
+ if (IS_ERR(port_group))
+ return ERR_CAST(port_group);
- if (ida_is_empty(&port->decoder_ida)) {
+ if (port->nr_dports == 0) {
rc = devm_cxl_switch_port_decoders_setup(port);
if (rc)
return ERR_PTR(rc);
- dev_dbg(&port->dev, "first dport%d:%s added with decoders\n",
- new_dport->port_id, dev_name(dport_dev));
- return no_free_ptr(new_dport);
+ /*
+ * Note, when nr_dports returns to zero the port is unregistered
+ * and triggers cleanup. I.e. no need for open-coded release
+ * action on dport removal. See cxl_detach_ep() for that logic.
+ */
}
+ new_dport = cxl_add_dport_by_dev(port, dport_dev);
+ if (IS_ERR(new_dport))
+ return new_dport;
+
+ rc = cxl_dport_autoremove(new_dport);
+ if (rc)
+ return ERR_PTR(rc);
+
+ cxl_switch_parse_cdat(new_dport);
+
+ /* group tracking no longer needed, dport successfully added */
+ cxl_port_remove_group(no_free_ptr(port_group));
+
+ dev_dbg(&port->dev, "dport[%d] id:%d dport_dev: %s added\n",
+ port->nr_dports - 1, new_dport->port_id, dev_name(dport_dev));
+
/* New dport added, update the decoder targets */
device_for_each_child(&port->dev, new_dport, update_decoder_targets);
- dev_dbg(&port->dev, "dport%d:%s added\n", new_dport->port_id,
- dev_name(dport_dev));
-
- return no_free_ptr(new_dport);
+ return new_dport;
}
static struct cxl_dport *devm_cxl_create_port(struct device *ep_dev,
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 51c8f2f84717..167cc0a87484 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -59,8 +59,12 @@ static int discover_region(struct device *dev, void *unused)
static int cxl_switch_port_probe(struct cxl_port *port)
{
- /* Reset nr_dports for rebind of driver */
- port->nr_dports = 0;
+ /*
+ * Unfortunately, typical driver operations like "find and map
+ * registers" cannot be done at port device attach time and must wait
+ * for dport arrival. See cxl_port_add_dport() and the comments in
+ * add_dport() for details.
+ */
/* Cache the data early to ensure is_visible() works */
read_cdat_data(port);
diff --git a/tools/testing/cxl/cxl_core_exports.c b/tools/testing/cxl/cxl_core_exports.c
index 6754de35598d..02d479867a12 100644
--- a/tools/testing/cxl/cxl_core_exports.c
+++ b/tools/testing/cxl/cxl_core_exports.c
@@ -7,16 +7,15 @@
/* Exporting of cxl_core symbols that are only used by cxl_test */
EXPORT_SYMBOL_NS_GPL(cxl_num_decoders_committed, "CXL");
-cxl_add_dport_by_dev_fn _devm_cxl_add_dport_by_dev =
- __devm_cxl_add_dport_by_dev;
-EXPORT_SYMBOL_NS_GPL(_devm_cxl_add_dport_by_dev, "CXL");
+cxl_add_dport_by_dev_fn _cxl_add_dport_by_dev = __cxl_add_dport_by_dev;
+EXPORT_SYMBOL_NS_GPL(_cxl_add_dport_by_dev, "CXL");
-struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev)
+struct cxl_dport *cxl_add_dport_by_dev(struct cxl_port *port,
+ struct device *dport_dev)
{
- return _devm_cxl_add_dport_by_dev(port, dport_dev);
+ return _cxl_add_dport_by_dev(port, dport_dev);
}
-EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport_by_dev, "CXL");
+EXPORT_SYMBOL_NS_GPL(cxl_add_dport_by_dev, "CXL");
cxl_switch_decoders_setup_fn _devm_cxl_switch_port_decoders_setup =
__devm_cxl_switch_port_decoders_setup;
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 81e2aef3627a..b7a2b550c0b0 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -1060,8 +1060,8 @@ static struct cxl_dport *mock_cxl_add_dport_by_dev(struct cxl_port *port,
if (&pdev->dev != dport_dev)
continue;
- return devm_cxl_add_dport(port, &pdev->dev, pdev->id,
- CXL_RESOURCE_NONE);
+ return cxl_add_dport(port, &pdev->dev, pdev->id,
+ CXL_RESOURCE_NONE);
}
return ERR_PTR(-ENODEV);
@@ -1126,9 +1126,9 @@ static struct cxl_mock_ops cxl_mock_ops = {
.devm_cxl_switch_port_decoders_setup = mock_cxl_switch_port_decoders_setup,
.devm_cxl_endpoint_decoders_setup = mock_cxl_endpoint_decoders_setup,
.cxl_endpoint_parse_cdat = mock_cxl_endpoint_parse_cdat,
- .devm_cxl_add_dport_by_dev = mock_cxl_add_dport_by_dev,
.hmat_get_extended_linear_cache_size =
mock_hmat_get_extended_linear_cache_size,
+ .cxl_add_dport_by_dev = mock_cxl_add_dport_by_dev,
.list = LIST_HEAD_INIT(cxl_mock_ops.list),
};
diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index 44bce80ef3ff..660e8402189c 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -15,14 +15,13 @@
static LIST_HEAD(mock);
static struct cxl_dport *
-redirect_devm_cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev);
+redirect_cxl_add_dport_by_dev(struct cxl_port *port, struct device *dport_dev);
static int redirect_devm_cxl_switch_port_decoders_setup(struct cxl_port *port);
void register_cxl_mock_ops(struct cxl_mock_ops *ops)
{
list_add_rcu(&ops->list, &mock);
- _devm_cxl_add_dport_by_dev = redirect_devm_cxl_add_dport_by_dev;
+ _cxl_add_dport_by_dev = redirect_cxl_add_dport_by_dev;
_devm_cxl_switch_port_decoders_setup =
redirect_devm_cxl_switch_port_decoders_setup;
}
@@ -34,7 +33,7 @@ void unregister_cxl_mock_ops(struct cxl_mock_ops *ops)
{
_devm_cxl_switch_port_decoders_setup =
__devm_cxl_switch_port_decoders_setup;
- _devm_cxl_add_dport_by_dev = __devm_cxl_add_dport_by_dev;
+ _cxl_add_dport_by_dev = __cxl_add_dport_by_dev;
list_del_rcu(&ops->list);
synchronize_srcu(&cxl_mock_srcu);
}
@@ -207,7 +206,7 @@ int __wrap_cxl_await_media_ready(struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(__wrap_cxl_await_media_ready, "CXL");
-struct cxl_dport *__wrap_devm_cxl_add_rch_dport(struct cxl_port *port,
+struct cxl_dport *__wrap_cxl_add_rch_dport(struct cxl_port *port,
struct device *dport_dev,
int port_id,
resource_size_t rcrb)
@@ -217,19 +216,19 @@ struct cxl_dport *__wrap_devm_cxl_add_rch_dport(struct cxl_port *port,
struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
if (ops && ops->is_mock_port(dport_dev)) {
- dport = devm_cxl_add_dport(port, dport_dev, port_id,
- CXL_RESOURCE_NONE);
+ dport = cxl_add_dport(port, dport_dev, port_id,
+ CXL_RESOURCE_NONE);
if (!IS_ERR(dport)) {
dport->rcrb.base = rcrb;
dport->rch = true;
}
} else
- dport = devm_cxl_add_rch_dport(port, dport_dev, port_id, rcrb);
+ dport = cxl_add_rch_dport(port, dport_dev, port_id, rcrb);
put_cxl_mock_ops(index);
return dport;
}
-EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_add_rch_dport, "CXL");
+EXPORT_SYMBOL_NS_GPL(__wrap_cxl_add_rch_dport, "CXL");
void __wrap_cxl_endpoint_parse_cdat(struct cxl_port *port)
{
@@ -257,17 +256,17 @@ void __wrap_cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device
}
EXPORT_SYMBOL_NS_GPL(__wrap_cxl_dport_init_ras_reporting, "CXL");
-struct cxl_dport *redirect_devm_cxl_add_dport_by_dev(struct cxl_port *port,
- struct device *dport_dev)
+struct cxl_dport *redirect_cxl_add_dport_by_dev(struct cxl_port *port,
+ struct device *dport_dev)
{
int index;
struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
struct cxl_dport *dport;
if (ops && ops->is_mock_port(port->uport_dev))
- dport = ops->devm_cxl_add_dport_by_dev(port, dport_dev);
+ dport = ops->cxl_add_dport_by_dev(port, dport_dev);
else
- dport = __devm_cxl_add_dport_by_dev(port, dport_dev);
+ dport = __cxl_add_dport_by_dev(port, dport_dev);
put_cxl_mock_ops(index);
return dport;
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 6eceefefb0e0..4d740392aac5 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -5,7 +5,8 @@ ldflags-y += --wrap=acpi_evaluate_integer
ldflags-y += --wrap=acpi_pci_find_root
ldflags-y += --wrap=nvdimm_bus_register
ldflags-y += --wrap=cxl_await_media_ready
-ldflags-y += --wrap=devm_cxl_add_rch_dport
+ldflags-y += --wrap=cxl_add_rch_dport
+ldflags-y += --wrap=cxl_rcd_component_reg_phys
ldflags-y += --wrap=cxl_endpoint_parse_cdat
ldflags-y += --wrap=cxl_dport_init_ras_reporting
ldflags-y += --wrap=devm_cxl_endpoint_decoders_setup
--
2.52.0
^ permalink raw reply related [flat|nested] 129+ messages in thread
* Re: [PATCH v14 30/34] PCI/AER: Dequeue forwarded CXL error
2026-01-14 23:18 ` Dave Jiang
@ 2026-01-16 14:42 ` Bowman, Terry
0 siblings, 0 replies; 129+ messages in thread
From: Bowman, Terry @ 2026-01-16 14:42 UTC (permalink / raw)
To: Dave Jiang, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/14/2026 5:18 PM, Dave Jiang wrote:
>
>
> On 1/14/26 11:20 AM, Terry Bowman wrote:
>> The AER driver now forwards CXL protocol errors to the CXL driver via a
>> kfifo. The CXL driver must consume these work items and initiate protocol
>> error handling while ensuring the device's RAS mappings remain valid
>> throughout processing.
>>
>> Implement cxl_proto_err_work_fn() to dequeue work items forwarded by the
>> AER service driver. Lock the parent CXL Port device to ensure the CXL
>> device's RAS registers are accessible during handling. Add pdev reference-put
>> to match reference-get in AER driver. This will ensure pdev access after
>> kfifo dequeue. These changes apply to CXL Ports and CXL Endpoints.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>>
>> ---
>>
>> Changes in v13->v14:
>> - Update commit title's prefix (Bjorn)
>> - Add pdev ref get in AER driver before enqueue and add pdev ref put in
>> CXL driver after dequeue and handling (Dan)
>> - Removed handling to simplify patch context (Terry)
>>
>> Changes in v12->v13:
>> - Add cxlmd lock using guard() (Terry)
>> - Remove exporting of unused function, pci_aer_clear_fatal_status() (Dave Jiang)
>> - Change pr_err() calls to ratelimited. (Terry)
>> - Update commit message. (Terry)
>> - Remove namespace qualifier from pcie_clear_device_status()
>> export (Dave Jiang)
>> - Move locks into cxl_proto_err_work_fn() (Dave)
>> - Update log messages in cxl_forward_error() (Ben)
>>
>> Changes in v11->v12:
>> - Add guard for CE case in cxl_handle_proto_error() (Dave)
>>
>> Changes in v10->v11:
>> - Reword patch commit message to remove RCiEP details (Jonathan)
>> - Add #include <linux/bitfield.h> (Terry)
>> - is_cxl_rcd() - Fix short comment message wrap (Jonathan)
>> - is_cxl_rcd() - Combine return calls into 1 (Jonathan)
>> - cxl_handle_proto_error() - Move comment earlier (Jonathan)
>> - Use FIELD_GET() in discovering class code (Jonathan)
>> - Remove BDF from cxl_proto_err_work_data. Use 'struct
>> pci_dev *' (Dan)
>> ---
>> drivers/cxl/core/core.h | 3 ++
>> drivers/cxl/core/port.c | 6 +--
>> drivers/cxl/core/ras.c | 98 +++++++++++++++++++++++++++++++----
>> drivers/pci/pcie/aer_cxl_vh.c | 1 +
>> 4 files changed, 94 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
>> index 306762a15dc0..39324e1b8940 100644
>> --- a/drivers/cxl/core/core.h
>> +++ b/drivers/cxl/core/core.h
>> @@ -169,6 +169,9 @@ static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
>> #endif /* CONFIG_CXL_RAS */
>>
>> int cxl_gpf_port_setup(struct cxl_dport *dport);
>> +struct cxl_port *find_cxl_port(struct device *dport_dev,
>> + struct cxl_dport **dport);
>> +struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev);
>>
>> struct cxl_hdm;
>> int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index a535e57360e0..0bec10be5d56 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -1335,8 +1335,8 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
>> return NULL;
>> }
>>
>> -static struct cxl_port *find_cxl_port(struct device *dport_dev,
>> - struct cxl_dport **dport)
>> +struct cxl_port *find_cxl_port(struct device *dport_dev,
>> + struct cxl_dport **dport)
>> {
>> struct cxl_find_port_ctx ctx = {
>> .dport_dev = dport_dev,
>> @@ -1578,7 +1578,7 @@ static int match_port_by_uport(struct device *dev, const void *data)
>> * Function takes a device reference on the port device. Caller should do a
>> * put_device() when done.
>> */
>> -static struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev)
>> +struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev)
>> {
>> struct device *dev;
>>
>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>> index bf82880e19b4..0c640b84ad70 100644
>> --- a/drivers/cxl/core/ras.c
>> +++ b/drivers/cxl/core/ras.c
>> @@ -117,17 +117,6 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
>> }
>> static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>>
>> -int cxl_ras_init(void)
>> -{
>> - return cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
>> -}
>> -
>> -void cxl_ras_exit(void)
>> -{
>> - cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
>> - cancel_work_sync(&cxl_cper_prot_err_work);
>> -}
>> -
>> static void cxl_dport_map_ras(struct cxl_dport *dport)
>> {
>> struct cxl_register_map *map = &dport->reg_map;
>> @@ -173,6 +162,44 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
>> }
>> EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
>>
>> +/*
>> + * Return 'struct cxl_port *' parent CXL Port of dev
>> + *
>> + * Reference count increments returned port on success
>> + *
>> + * @pdev: Find the parent CXL Port of this device
>> + */
>> +static struct cxl_port *get_cxl_port(struct pci_dev *pdev)
>> +{
>> + switch (pci_pcie_type(pdev)) {
>> + case PCI_EXP_TYPE_ROOT_PORT:
>> + case PCI_EXP_TYPE_DOWNSTREAM:
>> + {
>> + struct cxl_dport *dport;
>> + struct cxl_port *port = find_cxl_port(&pdev->dev, &dport);
>> +
>> + if (!port) {
>> + pci_err(pdev, "Failed to find the CXL device");
>> + return NULL;
>> + }
>> + return port;
>> + }
>> + case PCI_EXP_TYPE_UPSTREAM:
>> + case PCI_EXP_TYPE_ENDPOINT:
>> + {
>> + struct cxl_port *port = find_cxl_port_by_uport(&pdev->dev);
>> +
>> + if (!port) {
>> + pci_err(pdev, "Failed to find the CXL device");
>> + return NULL;
>> + }
>> + return port;
>> + }
>> + }
>> + pci_warn_once(pdev, "Error: Unsupported device type (%#x)", pci_pcie_type(pdev));
>> + return NULL;
>> +}
>> +
>> void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
>> {
>> void __iomem *addr;
>> @@ -316,3 +343,52 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>> return PCI_ERS_RESULT_NEED_RESET;
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
>> +
>> +static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
>> +{
>> +}
>> +
>> +static void cxl_proto_err_work_fn(struct work_struct *work)
>> +{
>> + struct cxl_proto_err_work_data wd;
>> +
>> + while (cxl_proto_err_kfifo_get(&wd)) {
>> + struct pci_dev *pdev __free(pci_dev_put) = wd.pdev;
>> +
>> + if (!pdev) {
>> + pr_err_ratelimited("NULL PCI device passed in AER-CXL KFIFO\n");
>> + continue;
>> + }
>> +
>> + struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
>> + if (!port) {
>> + pr_err_ratelimited("Failed to find parent Port device in CXL topology.\n");
>> + continue;
>> + }
>> + guard(device)(&port->dev);
>> +
>> + cxl_handle_proto_error(&wd);
>> + }
>> +}
>> +
>> +static struct work_struct cxl_proto_err_work;
>> +static DECLARE_WORK(cxl_proto_err_work, cxl_proto_err_work_fn);
>> +
>> +int cxl_ras_init(void)
>> +{
>> + if (cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work))
>> + pr_err("Failed to initialize CXL RAS CPER\n");
>> +
>> + cxl_register_proto_err_work(&cxl_proto_err_work);
>> +
>> + return 0;
>> +}
>> +
>> +void cxl_ras_exit(void)
>> +{
>> + cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
>> + cancel_work_sync(&cxl_cper_prot_err_work);
>> +
>> + cxl_unregister_proto_err_work();
>> + cancel_work_sync(&cxl_proto_err_work);
>> +}
>> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
>> index 2189d3c6cef1..0f616f5fafcf 100644
>> --- a/drivers/pci/pcie/aer_cxl_vh.c
>> +++ b/drivers/pci/pcie/aer_cxl_vh.c
>> @@ -48,6 +48,7 @@ void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info)
>> };
>>
>> guard(rwsem_read)(&cxl_proto_err_kfifo.rw_sema);
>> + pci_dev_get(pdev);
>
> Should this chunk move to where the commit that implements cxl_forward_error()?
>
>
Yes, that makes better sense.
-Terry
>> if (!cxl_proto_err_kfifo.work || !kfifo_put(&cxl_proto_err_kfifo.fifo, wd)) {
>> dev_err_ratelimited(&pdev->dev, "AER-CXL kfifo error");
>> return;
>
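For reference, the get-before-enqueue / put-after-dequeue pairing described in the commit message can be modelled in plain userspace C. This is only a toy sketch: every toy_* name is made up for illustration, the ring buffer stands in for the kfifo, and dropping the reference when the enqueue fails is an assumption of the sketch, not a statement about what the patch does.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FIFO_SIZE 4

/* Stand-in for a refcounted device; not 'struct pci_dev'. */
struct toy_pdev {
	int refcount;
};

/* Tiny ring buffer standing in for the AER-CXL kfifo. */
struct toy_fifo {
	struct toy_pdev *slots[TOY_FIFO_SIZE];
	size_t head;	/* next slot to write */
	size_t tail;	/* next slot to read */
};

static void toy_pdev_get(struct toy_pdev *p)
{
	p->refcount++;
}

static void toy_pdev_put(struct toy_pdev *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Producer side: pin the device before it is queued. */
static int toy_forward_error(struct toy_fifo *f, struct toy_pdev *p)
{
	toy_pdev_get(p);
	if (f->head - f->tail == TOY_FIFO_SIZE) {
		/* Sketch assumption: undo the get when the fifo is full. */
		toy_pdev_put(p);
		return -1;
	}
	f->slots[f->head++ % TOY_FIFO_SIZE] = p;
	return 0;
}

/* Consumer side: drain the fifo, unpin each device after handling it. */
static int toy_err_work_fn(struct toy_fifo *f)
{
	int handled = 0;

	while (f->tail != f->head) {
		struct toy_pdev *p = f->slots[f->tail++ % TOY_FIFO_SIZE];

		/* Error handling would run here while p is still pinned. */
		handled++;
		toy_pdev_put(p);
	}
	return handled;
}
```

The point of the pairing is that the consumer never touches a device whose last reference went away between enqueue and dequeue.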
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management
2026-01-16 4:45 ` dan.j.williams
@ 2026-01-16 15:01 ` Jonathan Cameron
2026-01-16 16:16 ` Jonathan Cameron
2026-01-19 2:48 ` dan.j.williams
0 siblings, 2 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-16 15:01 UTC (permalink / raw)
To: dan.j.williams
Cc: Terry Bowman, dave, dave.jiang, alison.schofield, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Thu, 15 Jan 2026 20:45:20 -0800
dan.j.williams@intel.com wrote:
> Jonathan Cameron wrote:
> > On Wed, 14 Jan 2026 12:20:40 -0600
> > Terry Bowman <terry.bowman@amd.com> wrote:
> >
> > > From: Dan Williams <dan.j.williams@intel.com>
> > >
> > > With dport addition moving out of cxl_switch_port_probe() it is no longer
> > > the case that a single dport-add failure will cause all dport resources
> > > to be automatically unwound.
> > >
> > > devm still helps all dport resources get cleaned up when the port is
> > > detached, but setup now needs to avoid leaking resources if an early exit
> > > occurs during setup.
> > >
> > > Convert from a "devm add" model, to an "auto remove" model that makes the
> > > caller responsible for registering devm reclaim after the object is fully
> > > instantiated.
> > >
> > > As a side of effect of this reorganization port->nr_dports is now always
> > > consistent with the number of entries in the port->dports xarray, and this
> > > can stop playing games with ida_is_empty() which is unreliable as a
> > > detector of whether decoders are setup. I.e. consider how
> > > CONFIG_DEBUG_KOBJECT_RELEASE might wreak havoc with this approach.
> > >
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > Reviewed-by: Terry Bowman <terry.bowman@amd.com>
> > >
> > > ---
> > >
> > > Changes in v13 -> v14:
> > > - New patch
> > Hi Dan, Terry,
> >
> > I think this needs a little reorganization to ensure we don't have
> > dport and dport_add both being the same pointer for different free
> > reasons. Adding a helper and we can combine them with a clear
> > hand over of ownership.
> >
> > Wrapping devres_remove_group() in a function that is called close_group()
> > rings alarm bells.
> >
> > Jonathan
> [..]
> >
> > > @@ -1176,48 +1175,27 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> > > &component_reg_phys);
> > >
> > > cond_cxl_root_lock(port);
> > > - rc = add_dport(port, dport);
> > > + struct cxl_dport *dport_add __free(remove_dport) =
> > > + add_dport(port, dport);
> >
> > This pattern of having both dport and dport_add effectively
> > pointing to the same pointer concerns me from a readability / maintainability
> > point of view. We've often made use of helper functions to avoid doing
> > this and I think that would make sense here as well.
>
> Yeah, while I do think the multi-variable pattern is useful for
> many-step object construction, I can usually easily be persuaded to
> consider a helper function.
>
> > Take everything down to and including dport_add() as a helper called
> > something like (naming needs work!)
> > struct dport_dev *dport __free(remove_and_free_dport) =
> > add_dport_wrapper();
>
> I ended up with the patch below which is similar in spirit to this
> without a new DEFINE_FREE().
>
>
> >
> > > cond_cxl_root_unlock(port);
> > > - if (rc)
> > > - return ERR_PTR(rc);
> > > -
> > > - /*
> > > - * Setup port register if this is the first dport showed up. Having
> > > - * a dport also means that there is at least 1 active link.
> > > - */
> > > - if (port->nr_dports == 1 &&
> > > - port->component_reg_phys != CXL_RESOURCE_NONE) {
> > > - rc = cxl_port_setup_regs(port, port->component_reg_phys);
> > > - if (rc) {
> > > - xa_erase(&port->dports, (unsigned long)dport->dport_dev);
> > > - return ERR_PTR(rc);
> > > - }
> > > - port->component_reg_phys = CXL_RESOURCE_NONE;
> > > - }
> > > + if (IS_ERR(dport_add))
> > > + return dport_add;
> > >
> > > - get_device(dport_dev);
> > > - rc = devm_add_action_or_reset(host, cxl_dport_remove, dport);
> > > - if (rc)
> > > - return ERR_PTR(rc);
> > > + if (dev_is_pci(dport_dev))
> > > + dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
> > >
> > > rc = sysfs_create_link(&port->dev.kobj, &dport_dev->kobj, link_name);
> > > if (rc)
> > > return ERR_PTR(rc);
> > >
> > > - rc = devm_add_action_or_reset(host, cxl_dport_unlink, dport);
> > > - if (rc)
> > > - return ERR_PTR(rc);
> > > -
> > > - if (dev_is_pci(dport_dev))
> > > - dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
> > > -
> > > cxl_debugfs_create_dport_dir(dport);
> > >
> > > - return dport;
> > > + retain_and_null_ptr(dport_add);
> > > + return no_free_ptr(dport);
> > > }
> >
> >
> >
> > > +
> > > +/*
> > > + * Note: this only services dynamic removal of mid-level ports, root ports are
> > > + * always removed by the platform driver (e.g. cxl_acpi). @host can be
> > > + * hard-coded to &port->dev.
> > > + */
> > > static void del_dport(struct cxl_dport *dport)
> > > {
> > > struct cxl_port *port = dport->port;
> > >
> > > - devm_release_action(&port->dev, cxl_dport_unlink, dport);
> > > - devm_release_action(&port->dev, cxl_dport_remove, dport);
> > > - devm_kfree(&port->dev, dport);
> > > + devm_release_action(&port->dev, unlink_dport, dport);
> > > }
> > >
> > > static void del_dports(struct cxl_port *port)
> > > @@ -1597,10 +1603,24 @@ static int update_decoder_targets(struct device *dev, void *data)
> > > return 0;
> > > }
> > >
> > > -DEFINE_FREE(del_cxl_dport, struct cxl_dport *, if (!IS_ERR_OR_NULL(_T)) del_dport(_T))
> > > +static struct cxl_port *cxl_port_devres_group(struct cxl_port *port)
> > > +{
> > > + if (!devres_open_group(&port->dev, port, GFP_KERNEL))
> > > + return ERR_PTR(-ENOMEM);
> > > + return port;
> > > +}
> > > +DEFINE_FREE(cxl_port_group_free, struct cxl_port *,
> > > + if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->dev, _T))
> > > +
> > > +static void cxl_port_group_close(struct cxl_port *port)
> >
> > This feels like misleading naming and I'm not sure what intent is.
> > Would have expected it to call devres_close_group()
>
> Agree. The hastiness of this patch shows. Switched all the naming to not
> be surprising. The flow is:
>
> cxl_port_open_group(): start recording devres resource acquisition
> cxl_port_remove_group(): on success, stop tracking the group, leave the resources
> cxl_port_release_group(): on failure, destroy the group, free the resources
Hi Dan, thanks for getting back on this so quickly!
Ok. So I'd misunderstood intent. If we don't have the option of close_group()
then these are just for temporary tracking of potential cleanup stuff
rather than because we want to optionally roll back part of the main
devres stuff (the bit that gets cleaned up on driver unbind /
just after remove())
I was thinking this was odd usage, but it is documented in devres.rst as one
of the two group examples, so I'm less concerned about that. Maybe
sprinkle a comment or two on the temporary nature of the devres group?
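To make the temporary-tracking flow concrete (open the group, then either remove it on success or release it on failure), here is a userspace toy model. It is not the kernel devres implementation: all toy_* names are hypothetical, and the fixed-size stack is just the simplest thing that shows the behaviour.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ENTRIES 16

typedef void (*toy_release_fn)(void *data);

struct toy_entry {
	toy_release_fn release;	/* NULL marks a group-open marker */
	void *data;		/* action argument, or group id for a marker */
};

struct toy_dev {
	struct toy_entry stack[TOY_MAX_ENTRIES];
	size_t count;
};

static int toy_push(struct toy_dev *dev, toy_release_fn fn, void *data)
{
	if (dev->count == TOY_MAX_ENTRIES)
		return -1;
	dev->stack[dev->count++] = (struct toy_entry){ fn, data };
	return 0;
}

/* Record a release action, loosely like devm_add_action(). */
static int toy_add_action(struct toy_dev *dev, toy_release_fn fn, void *data)
{
	assert(fn);
	return toy_push(dev, fn, data);
}

/* Start recording, loosely like devres_open_group(). */
static int toy_open_group(struct toy_dev *dev, void *id)
{
	return toy_push(dev, NULL, id);
}

/* Index of the most recently opened marker for @id, or -1. */
static int toy_find_group(const struct toy_dev *dev, const void *id)
{
	for (size_t i = dev->count; i > 0; i--)
		if (!dev->stack[i - 1].release && dev->stack[i - 1].data == id)
			return (int)(i - 1);
	return -1;
}

/* Success path: drop only the marker, keep the recorded actions. */
static void toy_remove_group(struct toy_dev *dev, void *id)
{
	int idx = toy_find_group(dev, id);

	if (idx < 0)
		return;
	for (size_t i = (size_t)idx; i + 1 < dev->count; i++)
		dev->stack[i] = dev->stack[i + 1];
	dev->count--;
}

/* Failure path: run everything recorded since the marker, newest first. */
static void toy_release_group(struct toy_dev *dev, void *id)
{
	int idx = toy_find_group(dev, id);

	if (idx < 0)
		return;
	while (dev->count > (size_t)idx + 1) {
		struct toy_entry *e = &dev->stack[--dev->count];

		if (e->release)
			e->release(e->data);
	}
	dev->count--;	/* drop the marker itself */
}

/* Sample release action: counts how many times it ran. */
static void toy_count_release(void *data)
{
	(*(int *)data)++;
}
```

After toy_remove_group() the recorded actions stay on the device and only run at final teardown, which is the "group was only temporary" behaviour; toy_release_group() is the rollback.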
However, this makes me wonder why the other model in devres.rst
https://elixir.bootlin.com/linux/v6.19-rc5/source/Documentation/driver-api/driver-model/devres.rst#L187
int my_midlayer_create_something()
{
if (!devres_open_group(dev, my_midlayer_create_something, GFP_KERNEL))
return -ENOMEM;
...
devres_close_group(dev, my_midlayer_create_something);
return 0;
}
void my_midlayer_destroy_something()
{
devres_release_group(dev, my_midlayer_create_something);
}
isn't more appropriate here.
Did you give that approach a go? Assuming unlink_dport() cleans up all the
same stuff as was covered by the group (it doesn't quite because of the
last few things that can't fail) it should be a much less invasive
change. A small complexity is you'd need group to be created on the right dev
so that it matches what is done in the new autoremove code.
I'll give this a go, but might take a while so sending this in the meantime.
Anyhow, some of the comments that follow are on what you have done, and
a few others are on what the devres_close_group approach would look like.
This might be the hardest to review patch I've looked at in a while...
Not sure what you could do about that though!
>
> New patch, added a Fixes: tag.
>
> -- 8< --
> From 9731bb6cb5638a0d2141dc072f90db0d00400680 Mon Sep 17 00:00:00 2001
> From: Dan Williams <dan.j.williams@intel.com>
> Date: Wed, 14 Jan 2026 12:20:40 -0600
> Subject: [PATCH] cxl/port: Fix devm resource leaks with dport management
>
> With dport addition moving out of cxl_switch_port_probe() it is no longer
> the case that a single dport-add failure will cause all dport resources
> to be automatically unwound.
>
> devm still helps all dport resources get cleaned up when the port is
> detached, but setup now needs to avoid leaking resources if an early exit
> occurs during setup.
>
> Convert from a "devm add" model, to an "auto remove" model that makes the
> caller responsible for registering devm reclaim after the object is fully
> instantiated.
>
> As a side of effect of this reorganization port->nr_dports is now always
> consistent with the number of entries in the port->dports xarray, and this
> can stop playing games with ida_is_empty() which is unreliable as a
> detector of whether decoders are setup. I.e. consider how
> CONFIG_DEBUG_KOBJECT_RELEASE might wreak havoc with this approach.
Given complexity of ownership, can we have a flow chart of who owns what when?
>
> Cc: <stable@vger.kernel.org>
> Fixes: 4f06d81e7c6a ("cxl: Defer dport allocation for switch ports")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index b838c59d7a3c..ce117812e5c8 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -41,14 +41,14 @@ static int pci_get_port_num(struct pci_dev *pdev)
> }
>
> /**
> - * __devm_cxl_add_dport_by_dev - allocate a dport by dport device
> + * __cxl_add_dport_by_dev - allocate a dport by dport device
> * @port: cxl_port that hosts the dport
> * @dport_dev: 'struct device' of the dport
> *
> * Returns the allocated dport on success or ERR_PTR() of -errno on error
> */
> -struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
> - struct device *dport_dev)
> +struct cxl_dport *__cxl_add_dport_by_dev(struct cxl_port *port,
> + struct device *dport_dev)
> {
> struct cxl_register_map map;
> struct pci_dev *pdev;
> @@ -67,9 +67,9 @@ struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port,
> return ERR_PTR(rc);
>
> device_lock_assert(&port->dev);
> - return devm_cxl_add_dport(port, dport_dev, port_num, map.resource);
> + return cxl_add_dport(port, dport_dev, port_num, map.resource);
> }
> -EXPORT_SYMBOL_NS_GPL(__devm_cxl_add_dport_by_dev, "CXL");
> +EXPORT_SYMBOL_NS_GPL(__cxl_add_dport_by_dev, "CXL");
>
> static int cxl_dvsec_mem_range_valid(struct cxl_dev_state *cxlds, int id)
> {
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index fef3aa0c6680..41b65babd057 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1066,11 +1066,28 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *dport)
> return -EBUSY;
> }
>
> + /*
> + * Unlike CXL switch upstream ports where it can train a CXL link
> + * independent of its downstream ports, a host bridge upstream port may
> + * not enable CXL registers until at least one downstream port (root
> + * port) trains CXL. Enumerate registers once when the number of dports
> + * transitions from zero to one.
> + */
> + if (!port->nr_dports) {
> + rc = cxl_port_setup_regs(port, port->component_reg_phys);
> + if (rc)
> + return rc;
> + }
> +
> + /* Arrange for dport_dev to be valid through remove_dport() */
> + struct device *dev __free(put_device) = get_device(dport->dport_dev);
> +
> rc = xa_insert(&port->dports, (unsigned long)dport->dport_dev, dport,
> GFP_KERNEL);
> if (rc)
> return rc;
>
> + retain_and_null_ptr(dev);
> port->nr_dports++;
> return 0;
> }
> @@ -1094,51 +1111,64 @@ static void cond_cxl_root_unlock(struct cxl_port *port)
> device_unlock(&port->dev);
> }
>
> -static void cxl_dport_remove(void *data)
> +static void remove_dport(struct cxl_dport *dport)
> {
> - struct cxl_dport *dport = data;
> struct cxl_port *port = dport->port;
>
> + port->nr_dports--;
> xa_erase(&port->dports, (unsigned long) dport->dport_dev);
> put_device(dport->dport_dev);
> }
>
> -static void cxl_dport_unlink(void *data)
> +static struct cxl_dport *__register_dport(struct cxl_dport *dport)
> {
> - struct cxl_dport *dport = data;
> - struct cxl_port *port = dport->port;
> + int rc;
> char link_name[CXL_TARGET_STRLEN];
> + struct cxl_port *port = dport->port;
> + struct device *dport_dev = dport->dport_dev;
>
> - sprintf(link_name, "dport%d", dport->port_id);
> - sysfs_remove_link(&port->dev.kobj, link_name);
> -}
> + if (snprintf(link_name, CXL_TARGET_STRLEN, "dport%d", dport->port_id) >=
> + CXL_TARGET_STRLEN)
> + return ERR_PTR(-EINVAL);
>
> -static struct cxl_dport *
> -__devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> - int port_id, resource_size_t component_reg_phys,
> - resource_size_t rcrb)
> -{
> - char link_name[CXL_TARGET_STRLEN];
> - struct cxl_dport *dport;
> - struct device *host;
> - int rc;
> + cond_cxl_root_lock(port);
> + rc = add_dport(port, dport);
> + cond_cxl_root_unlock(port);
> + if (rc)
> + return ERR_PTR(rc);
>
> - if (is_cxl_root(port))
> - host = port->uport_dev;
> - else
> - host = &port->dev;
> + if (dev_is_pci(dport_dev))
> + dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
>
> - if (!host->driver) {
> - dev_WARN_ONCE(&port->dev, 1, "dport:%s bad devm context\n",
> - dev_name(dport_dev));
> - return ERR_PTR(-ENXIO);
> + rc = sysfs_create_link(&port->dev.kobj, &dport_dev->kobj, link_name);
> + if (rc) {
There are several operations in here that get undone in unlink_dport()
I'd wrap those up in an __unregister_dport() function so it is easy to see
what pairs with what.
> + remove_dport(dport);
> + return ERR_PTR(rc);
> }
>
> - if (snprintf(link_name, CXL_TARGET_STRLEN, "dport%d", port_id) >=
> - CXL_TARGET_STRLEN)
> - return ERR_PTR(-EINVAL);
> + cxl_debugfs_create_dport_dir(dport);
>
> - dport = devm_kzalloc(host, sizeof(*dport), GFP_KERNEL);
> + return dport;
> +}
> @@ -1439,13 +1430,42 @@ static void delete_switch_port(struct cxl_port *port)
> devm_release_action(port->dev.parent, unregister_port, port);
> }
>
> +static void unlink_dport(void *data)
> +{
> + struct cxl_dport *dport = data;
> + struct cxl_port *port = dport->port;
> + char link_name[CXL_TARGET_STRLEN];
> +
> + sprintf(link_name, "dport%d", dport->port_id);
> + sysfs_remove_link(&port->dev.kobj, link_name);
> + remove_dport(dport);
> + kfree(dport);
To me this removes half the advantage of devres which is
that we don't need to be careful to remove things in the right
order. Ah well, perhaps a price we need to pay.
> +}
> +
> +int cxl_dport_autoremove(struct cxl_dport *dport)
> +{
> + struct cxl_port *port = dport->port;
> + struct device *host;
> +
> + if (is_cxl_root(port))
> + host = port->uport_dev;
> + else
> + host = &port->dev;
> +
> + return devm_add_action_or_reset(host, unlink_dport, dport);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_dport_autoremove, "CXL");
> +
> +/*
> + * Note: this only services dynamic removal of mid-level ports, root ports are
> + * always removed by the platform driver (e.g. cxl_acpi). @host can be
> + * hard-coded to &port->dev.
> + */
> static void del_dport(struct cxl_dport *dport)
> {
> struct cxl_port *port = dport->port;
>
> - devm_release_action(&port->dev, cxl_dport_unlink, dport);
> - devm_release_action(&port->dev, cxl_dport_remove, dport);
> - devm_kfree(&port->dev, dport);
> + devm_release_action(&port->dev, unlink_dport, dport);
If you did go with the devres_close_group() suggestion I think you could
then call devres_remove_group() to undo the dport stuff leaving the
rest of the devres on port-dev in place.
> }
>
> static void del_dports(struct cxl_port *port)
> @@ -1597,10 +1617,24 @@ static int update_decoder_targets(struct device *dev, void *data)
> return 0;
> }
>
> -DEFINE_FREE(del_cxl_dport, struct cxl_dport *, if (!IS_ERR_OR_NULL(_T)) del_dport(_T))
> +static struct cxl_port *cxl_port_open_group(struct cxl_port *port)
> +{
> + if (!devres_open_group(&port->dev, port, GFP_KERNEL))
The use of port as the ID is a tiny bit nasty but necessary I guess for the DEFINE_FREE to
work (you could use &port->dev, but that doesn't really help).
Disadvantage is you can't stack these groups without breaking the advice
in devres not to reuse IDs. In practice I think that actually works but
it's the sort of advice comment that makes me think it might not always do so!
> + return ERR_PTR(-ENOMEM);
> + return port;
> +}
> +DEFINE_FREE(cxl_port_release_group, struct cxl_port *,
> + if (!IS_ERR_OR_NULL(_T)) devres_release_group(&(_T)->dev, _T))
> +
> +static void cxl_port_remove_group(struct cxl_port *port)
> +{
> + devres_remove_group(&port->dev, port);
> +}
> +
> static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> struct device *dport_dev)
> {
> + struct cxl_dport *new_dport;
> struct cxl_dport *dport;
Too many dports. Given the existing one is only used for a sanity check, can we
have a precursor that gets rid of it via a helper
int check_no_existing_dport(struct cxl_port *port, struct device *dport_dev)
{
struct cxl_dport *dport = cxl_find_dport_by_dev(port, dport_dev);
if (dport) {
dev_dbg(&port->dev, "dport%d:%s already exists\n",
dport->port_id, dev_name(dport_dev));
return -EBUSY;
}
return 0;
}
...
rc = check_no_existing_dport(port, dport_dev);
if (rc)
return ERR_PTR(rc);
then can rename new_dport to dport for this patch.
Or don't bother hiding it and just use dport variable for both.
> int rc;
>
> @@ -1615,29 +1649,47 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> return ERR_PTR(-EBUSY);
> }
>
> - struct cxl_dport *new_dport __free(del_cxl_dport) =
> - devm_cxl_add_dport_by_dev(port, dport_dev);
> - if (IS_ERR(new_dport))
> - return new_dport;
> -
> - cxl_switch_parse_cdat(new_dport);
> + /*
> + * With the first dport arrival it is now safe to start looking at
> + * component registers. Be careful to not strand resources if dport
> + * creation ultimately fails.
> + */
> + struct cxl_port *port_group __free(cxl_port_release_group) =
> + cxl_port_open_group(port);
So this relies on everything being registered against port->dev, whereas
for root ports this then gets handed off to port->uport_dev.
That's a bit obscure - hence request for some patch description text
on who owns what resources + when.
> + if (IS_ERR(port_group))
> + return ERR_CAST(port_group);
>
> - if (ida_is_empty(&port->decoder_ida)) {
> + if (port->nr_dports == 0) {
> rc = devm_cxl_switch_port_decoders_setup(port);
> if (rc)
> return ERR_PTR(rc);
> - dev_dbg(&port->dev, "first dport%d:%s added with decoders\n",
> - new_dport->port_id, dev_name(dport_dev));
> - return no_free_ptr(new_dport);
> + /*
> + * Note, when nr_dports returns to zero the port is unregistered
> + * and triggers cleanup. I.e. no need for open-coded release
> + * action on dport removal. See cxl_detach_ep() for that logic.
> + */
> }
>
> + new_dport = cxl_add_dport_by_dev(port, dport_dev);
> + if (IS_ERR(new_dport))
> + return new_dport;
> +
> + rc = cxl_dport_autoremove(new_dport);
> + if (rc)
> + return ERR_PTR(rc);
> +
> + cxl_switch_parse_cdat(new_dport);
> +
> + /* group tracking no longer needed, dport successfully added */
> + cxl_port_remove_group(no_free_ptr(port_group));
I think this is a tiny bit too late (though my head hurts so could be wrong).
If we hit the error condition just above, we already freed the stuff
that this group controls.
I'd be tempted to have a helper for the entire region the group is held for
helper()
{
struct cxl_port *port_group __free(cxl_port_release_group) =
cxl_port_open_group(port);
if (IS_ERR(port_group))
return ERR_CAST(port_group);
...
cxl_port_remove_group(no_free_ptr(port_group));
return something good;
}
so that the scope is clear.
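As a rough userspace illustration of that shape (GCC/Clang only, since it uses the cleanup attribute that the kernel's __free() is built on), the sketch below keeps the group variable inside one helper. All toy_* names and the TOY_FREE_GROUP macro are made up for this example; toy_no_free() plays the role of no_free_ptr().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the devres group state; not a kernel type. */
struct toy_group {
	int released;	/* failure path tore the group down */
	int removed;	/* success path stopped tracking */
};

/* Cleanup handler: runs at scope exit unless ownership was handed off. */
static void toy_group_release(struct toy_group **gp)
{
	if (*gp)
		(*gp)->released = 1;
}

#define TOY_FREE_GROUP __attribute__((cleanup(toy_group_release)))

/* Return the pointer and NULL the local so the handler does nothing. */
static struct toy_group *toy_no_free(struct toy_group **gp)
{
	struct toy_group *g = *gp;

	*gp = NULL;
	return g;
}

/* Success path: stop tracking, leave the resources in place. */
static void toy_group_remove(struct toy_group *g)
{
	assert(g);
	g->removed = 1;
}

/*
 * The group is held only for the lifetime of this helper.
 * @fail simulates an error in one of the steps between open and remove.
 */
static int toy_add_dport_grouped(struct toy_group *g, int fail)
{
	struct toy_group *group TOY_FREE_GROUP = g;

	if (fail)
		return -1;	/* early exit: the handler releases the group */

	toy_group_remove(toy_no_free(&group));
	return 0;
}
```

With the group confined to the helper, every early return inside it is covered by the release, and nothing after the helper returns can accidentally trigger it.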
> +
> + dev_dbg(&port->dev, "dport[%d] id:%d dport_dev: %s added\n",
> + port->nr_dports - 1, new_dport->port_id, dev_name(dport_dev));
> +
> /* New dport added, update the decoder targets */
> device_for_each_child(&port->dev, new_dport, update_decoder_targets);
>
> - dev_dbg(&port->dev, "dport%d:%s added\n", new_dport->port_id,
> - dev_name(dport_dev));
> -
> - return no_free_ptr(new_dport);
> + return new_dport;
> }
>
> static struct cxl_dport *devm_cxl_create_port(struct device *ep_dev,
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management
2026-01-16 15:01 ` Jonathan Cameron
@ 2026-01-16 16:16 ` Jonathan Cameron
2026-01-19 23:02 ` dan.j.williams
2026-01-19 2:48 ` dan.j.williams
1 sibling, 1 reply; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-16 16:16 UTC (permalink / raw)
To: dan.j.williams
Cc: Terry Bowman, dave, dave.jiang, alison.schofield, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
> rc = check_no_existing_dport(port, dport_dev);
> if (rc)
> return ERR_PTR(rc);
>
> then can rename new_dport to dport for this patch.
> Or don't bother hiding it and just use dport variable for both.
>
>
> > int rc;
> >
> > @@ -1615,29 +1649,47 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
> > return ERR_PTR(-EBUSY);
> > }
> >
> > - struct cxl_dport *new_dport __free(del_cxl_dport) =
> > - devm_cxl_add_dport_by_dev(port, dport_dev);
> > - if (IS_ERR(new_dport))
> > - return new_dport;
> > -
> > - cxl_switch_parse_cdat(new_dport);
> > + /*
> > + * With the first dport arrival it is now safe to start looking at
> > + * component registers. Be careful to not strand resources if dport
> > + * creation ultimately fails.
> > + */
> > + struct cxl_port *port_group __free(cxl_port_release_group) =
> > + cxl_port_open_group(port);
>
> So this is relies on everything being registered against port->dev, whereas
> for root ports this then gets handed off to port->uport_dev.
> That's a bit obscure - hence request for some patch description text
> on who owns what resources + when.
>
> > + if (IS_ERR(port_group))
> > + return ERR_CAST(port_group);
> >
> > - if (ida_is_empty(&port->decoder_ida)) {
> > + if (port->nr_dports == 0) {
> > rc = devm_cxl_switch_port_decoders_setup(port);
> > if (rc)
> > return ERR_PTR(rc);
I'm not totally sure I have an appropriate base for messing with this but
with just this patch on top of cxl/for-7.0/cxl-init I'm getting problems
with the reorder here as the devm_cxl_switch_port fails to map HDM decoders.
This is on a simple test with a single switch and a couple of devices.
I'm probably suffering Friday afternoon syndrome (and naughty testing just
a middle patch in a series)
My 'guess' is that it's because we no longer add the port first.
I couldn't spot any other patch messing with related logic earlier in Terry's series
(and wanted to keep what I was messing with for trying alternative fixes to a minimum)
Any thoughts on what I'm missing?
Jonathan
> > - dev_dbg(&port->dev, "first dport%d:%s added with decoders\n",
> > - new_dport->port_id, dev_name(dport_dev));
> > - return no_free_ptr(new_dport);
> > + /*
> > + * Note, when nr_dports returns to zero the port is unregistered
> > + * and triggers cleanup. I.e. no need for open-coded release
> > + * action on dport removal. See cxl_detach_ep() for that logic.
> > + */
> > }
> >
> > + new_dport = cxl_add_dport_by_dev(port, dport_dev);
> > + if (IS_ERR(new_dport))
> > + return new_dport;
> > +
> > + rc = cxl_dport_autoremove(new_dport);
> > + if (rc)
> > + return ERR_PTR(rc);
> > +
> > + cxl_switch_parse_cdat(new_dport);
> > +
> > + /* group tracking no longer needed, dport successfully added */
> > + cxl_port_remove_group(no_free_ptr(port_group));
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 16/34] cxl/mem: Clarify @host for devm_cxl_add_nvdimm()
2026-01-16 3:07 ` dan.j.williams
@ 2026-01-16 16:22 ` Dave Jiang
0 siblings, 0 replies; 129+ messages in thread
From: Dave Jiang @ 2026-01-16 16:22 UTC (permalink / raw)
To: dan.j.williams, Terry Bowman, dave, jonathan.cameron,
alison.schofield, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/15/26 8:07 PM, dan.j.williams@intel.com wrote:
> Dave Jiang wrote:
>>
>>
>> On 1/14/26 11:20 AM, Terry Bowman wrote:
>>> From: Dan Williams <dan.j.williams@intel.com>
>>>
>>> The convention for devm_ helpers in the CXL driver is that the first
>>> argument is the @host for the operation (locked driver::probe() context).
>>>
>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
>>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>>
>> A nit below
>>
>>>
>>> ---
>>>
>>> Changes in v13 -> v14:
>>> - New patch
>>> ---
>>> drivers/cxl/core/pmem.c | 13 +++++++------
>>> drivers/cxl/cxl.h | 3 ++-
>>> drivers/cxl/mem.c | 2 +-
>>> 3 files changed, 10 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c
>>> index 8853415c106a..e7b1e6fa0ea0 100644
>>> --- a/drivers/cxl/core/pmem.c
>>> +++ b/drivers/cxl/core/pmem.c
>>> @@ -237,12 +237,13 @@ static void cxlmd_release_nvdimm(void *_cxlmd)
>>>
>>> /**
>>> * devm_cxl_add_nvdimm() - add a bridge between a cxl_memdev and an nvdimm
>>> - * @parent_port: parent port for the (to be added) @cxlmd endpoint port
>>> - * @cxlmd: cxl_memdev instance that will perform LIBNVDIMM operations
>>> + * @host: host device for devm operations
>>> + * @port: any port in the CXL topology to find the nvdimm-bridge device
>>> + * @cxlmd: parent of the to be created cxl_nvdimm device
>>> *
>>> * Return: 0 on success negative error code on failure.
>>> */
>>> -int devm_cxl_add_nvdimm(struct cxl_port *parent_port,
>>> +int devm_cxl_add_nvdimm(struct device *host, struct cxl_port *port,
>>
>> s/port/parent_port/ to maintain clarity of the port
>
> ...but it is not used as a "parent" port in this function. Any port in
> the topology will do. The reason a port argument is needed is to
> disambiguate when there are multiple CXL root devices. That currently
> only happens when cxl_test is loaded.
>
> However, after writing that, it may make more sense to make that
> semantic explicit and just have the caller responsible for passing in an
> @cxl_root argument.
>
> A change for a later series, not this one.
I'll make a note in the backlog.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management
2026-01-16 15:01 ` Jonathan Cameron
2026-01-16 16:16 ` Jonathan Cameron
@ 2026-01-19 2:48 ` dan.j.williams
1 sibling, 0 replies; 129+ messages in thread
From: dan.j.williams @ 2026-01-19 2:48 UTC (permalink / raw)
To: Jonathan Cameron, dan.j.williams
Cc: Terry Bowman, dave, dave.jiang, alison.schofield, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
Jonathan Cameron wrote:
[..]
> This might be the hardest to review patch I've looked at in a while...
Well, that *is* fatal feedback. I think this simply needs to be broken
up into smaller, easier to digest pieces. Will send that shortly.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management
2026-01-16 16:16 ` Jonathan Cameron
@ 2026-01-19 23:02 ` dan.j.williams
2026-01-20 12:25 ` Jonathan Cameron
0 siblings, 1 reply; 129+ messages in thread
From: dan.j.williams @ 2026-01-19 23:02 UTC (permalink / raw)
To: Jonathan Cameron, dan.j.williams
Cc: Terry Bowman, dave, dave.jiang, alison.schofield, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
Jonathan Cameron wrote:
[..]
> Any thoughts on what I'm missing?
Probably:
cxl/port: Map Port component registers before switchport init
...which really should not be a distinct patch from the one that changes
the ordering. I will send out a more patient series to settle this so
Terry does not need to keep carrying it.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 02/34] PCI: Update CXL DVSEC definitions
2026-01-14 18:53 ` Jonathan Cameron
@ 2026-01-19 23:44 ` dan.j.williams
0 siblings, 0 replies; 129+ messages in thread
From: dan.j.williams @ 2026-01-19 23:44 UTC (permalink / raw)
To: Jonathan Cameron, Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
Jonathan Cameron wrote:
[..]
> > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> > index 6c4b6f19b18e..662582bdccf0 100644
> > --- a/include/uapi/linux/pci_regs.h
> > +++ b/include/uapi/linux/pci_regs.h
>
>
> > +/* CXL r4.0, 8.1.3: PCIe DVSEC for CXL Device */
> > +#define PCI_DVSEC_CXL_DEVICE 0
> > +#define PCI_DVSEC_CXL_CAP 0xA
>
> Why drop the _DEVICE_ bit of these I'd kind of expect
> #define PCI_DVSEC_CXL_DEVICE_CAP
> to indicate which DVSEC it is in.
We got by without the redundant _DEVICE_ in the name to date, the port
DVSEC has the _PORT_ differentiation. In the interest of not needing to
review another version of this simple patch I vote leave well enough
alone. Will leave it to Bjorn if he wants to override.
[..]
> > +/* CXL r4.0, 8.1.6: GPF DVSEC for CXL Port */
> > +#define PCI_DVSEC_CXL_PORT_GPF 4
>
> Nothing like ambiguous naming in the CXL spec as the
> following fields sound like they are in the CXL_PORT dvsec
> but they aren't. Well the spec avoids it with GPF_FOR_PORT
> but we don't want to go there. I wonder...
> PCI_DVSEC_CXL_PORTGPF maybe to avoid that?
>
> Sigh. It's probably not worth it and does look horrible, so stick
> with these.
Not seeing a significant improvement in the suggestion.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors()
2026-01-14 18:20 ` [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman
` (2 preceding siblings ...)
2026-01-14 20:40 ` Dave Jiang
@ 2026-01-20 2:09 ` dan.j.williams
2026-01-22 10:31 ` Lukas Wunner
2026-01-22 18:49 ` Bjorn Helgaas
4 siblings, 1 reply; 129+ messages in thread
From: dan.j.williams @ 2026-01-20 2:09 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
Terry Bowman wrote:
> Internal PCIe errors are not enabled by default during initialization. This
> creates a problem for CXL drivers, which rely on PCIe Correctable and
> Uncorrectable Internal Errors to receive CXL protocol error notifications.
>
> Export pci_aer_unmask_internal_errors() so CXL and other drivers can
> enable internal PCIe errors.
I folded in the following to this patch because opening up internal
errors for PCIe drivers in general is not a goal.
1: cb9a15481d8c ! 1: 7433e0204753 PCI/AER: Export pci_aer_unmask_internal_errors()
@@ Metadata
## Commit message ##
PCI/AER: Export pci_aer_unmask_internal_errors()
- Internal PCIe errors are not enabled by default during initialization. This
- creates a problem for CXL drivers, which rely on PCIe Correctable and
- Uncorrectable Internal Errors to receive CXL protocol error notifications.
+ Internal PCIe errors are not enabled by default during initialization
+ because their behavior is too device-specific and there is no standard way
+ to reason about them. However, for CXL an internal error is the standard
+ mechanism for conveying CXL protocol errors.
- Export pci_aer_unmask_internal_errors() so CXL and other drivers can
- enable internal PCIe errors.
+ Export pci_aer_unmask_internal_errors() for CXL, but make it clear that
+ they are only meant for CXL and the status quo for leaving them masked for
+ PCIe in general remains.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260114182055.46029-10-terry.bowman@amd.com
+ Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
## include/linux/aer.h ##
@@ drivers/pci/pcie/aer.c: static bool find_source_device(struct pci_dev *parent,
int aer = dev->aer_cap;
u32 mask;
@@ drivers/pci/pcie/aer.c: static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
- mask &= ~PCI_ERR_COR_INTERNAL;
pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
}
-+EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors);
++/*
++ * Internal errors are too device-specific to enable generally, however for CXL
++ * their behavior is standardized for conveying CXL protocol errors.
++ */
++EXPORT_SYMBOL_FOR_MODULES(pci_aer_unmask_internal_errors, "cxl_core");
++
+#ifdef CONFIG_PCIEAER_CXL
static bool is_cxl_mem_dev(struct pci_dev *dev)
{
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
2026-01-14 18:20 ` [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error() Terry Bowman
2026-01-14 19:08 ` Jonathan Cameron
@ 2026-01-20 2:20 ` dan.j.williams
2026-01-20 15:15 ` Bowman, Terry
2026-01-22 18:48 ` Bjorn Helgaas
2 siblings, 1 reply; 129+ messages in thread
From: dan.j.williams @ 2026-01-20 2:20 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci, terry.bowman
Terry Bowman wrote:
> The AER driver includes significant logic for handling CXL protocol errors.
> The AER driver will be updated in the future to separate the AER and CXL
> logic.
>
> Rename the is_internal_error() function to is_aer_internal_error() as it
> gives a more precise indication of the purpose. Make is_aer_internal_error()
> non-static to allow for other PCI drivers to access.
Not even sure this rename is needed given that it is private to
drivers/pci/pcie/ and the sharing is only for cxl_{rch,vh}.c, not for
"other PCI drivers". Consistent with the idea that internal errors are
not going to become a first-class citizen let us keep this a CXL-only
consideration.
I'll update the changelog to drop the "other PCI drivers" comment.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management
2026-01-19 23:02 ` dan.j.williams
@ 2026-01-20 12:25 ` Jonathan Cameron
0 siblings, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-20 12:25 UTC (permalink / raw)
To: dan.j.williams
Cc: Terry Bowman, dave, dave.jiang, alison.schofield, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Mon, 19 Jan 2026 15:02:09 -0800
dan.j.williams@intel.com wrote:
> Jonathan Cameron wrote:
> [..]
> > Any thoughts on what I'm missing?
>
> Probably:
>
> cxl/port: Map Port component registers before switchport init
>
> ...which really should not be a distinct patch from the one that changes
> the ordering. I will send out a more patient series to settle this so
> Terry does not need to keep carrying it.
>
Ah. I stopped looking when I got to this patch :) Spot on - thanks!
Below is my suggestion of another approach. I think it ends up
a fair bit simpler, but you know this code a lot better than I do, so I may
be missing some problems! I like the simpler reaping. That approach looks
like it can be used elsewhere in this file.
Testing so far is one representative config only.
The patch you highlight above is squashed into this. At least one comment
I made on your patch applies here too (naughty me :)
From a95567d9e2e3809ca1953a9aec24324ae13f6145 Mon Sep 17 00:00:00 2001
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Date: Tue, 20 Jan 2026 10:53:14 +0000
Subject: [PATCH] Alternative to Dan's approach for managing dport resources.
Create a devres group to manage a subset of resources.
On success, close the devres group, but also stash a copy in
struct cxl_dport so we can use it for dport reaping.
On error, while the group is still open, release it.
Somewhat tested, but not what I'd consider exhaustive yet!
Used a 2 port switch, 2 EP test config.
1. Unbind endpoints, checked reap happened.
2. Unbind port1, check tear down happened.
3. Inject errors so some calls fail (stepping through each
group that was created). Some cases still get set up because
a different downstream device arriving retriggers the flow,
but that seems fine.
---
drivers/cxl/core/port.c | 76 ++++++++++++++++++++---------------
drivers/cxl/cxl.h | 2 +
drivers/cxl/cxlpci.h | 4 ++
tools/testing/cxl/test/mock.c | 1 +
4 files changed, 51 insertions(+), 32 deletions(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index fef3aa0c6680..e3c53b2fc78f 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -778,7 +778,7 @@ static int cxl_setup_comp_regs(struct device *host, struct cxl_register_map *map
return cxl_setup_regs(map);
}
-static int cxl_port_setup_regs(struct cxl_port *port,
+int cxl_port_setup_regs(struct cxl_port *port,
resource_size_t component_reg_phys)
{
if (dev_is_platform(port->uport_dev))
@@ -786,6 +786,7 @@ static int cxl_port_setup_regs(struct cxl_port *port,
return cxl_setup_comp_regs(&port->dev, &port->reg_map,
component_reg_phys);
}
+EXPORT_SYMBOL_NS_GPL(cxl_port_setup_regs, "CXL");
static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
resource_size_t component_reg_phys)
@@ -1065,7 +1066,12 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *dport)
dev_name(dup->dport_dev));
return -EBUSY;
}
-
+ //comment from original patch here.
+ if (!port->nr_dports) {
+ rc = cxl_port_setup_regs(port, port->component_reg_phys);
+ if (rc)
+ return rc;
+ }
rc = xa_insert(&port->dports, (unsigned long)dport->dport_dev, dport,
GFP_KERNEL);
if (rc)
@@ -1181,20 +1187,6 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
if (rc)
return ERR_PTR(rc);
- /*
- * Setup port register if this is the first dport showed up. Having
- * a dport also means that there is at least 1 active link.
- */
- if (port->nr_dports == 1 &&
- port->component_reg_phys != CXL_RESOURCE_NONE) {
- rc = cxl_port_setup_regs(port, port->component_reg_phys);
- if (rc) {
- xa_erase(&port->dports, (unsigned long)dport->dport_dev);
- return ERR_PTR(rc);
- }
- port->component_reg_phys = CXL_RESOURCE_NONE;
- }
-
get_device(dport_dev);
rc = devm_add_action_or_reset(host, cxl_dport_remove, dport);
if (rc)
@@ -1441,11 +1433,7 @@ static void delete_switch_port(struct cxl_port *port)
static void del_dport(struct cxl_dport *dport)
{
- struct cxl_port *port = dport->port;
-
- devm_release_action(&port->dev, cxl_dport_unlink, dport);
- devm_release_action(&port->dev, cxl_dport_remove, dport);
- devm_kfree(&port->dev, dport);
+ devres_release_group(&dport->port->dev, dport->devres_group);
}
static void del_dports(struct cxl_port *port)
@@ -1602,6 +1590,9 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
struct device *dport_dev)
{
struct cxl_dport *dport;
+ struct cxl_dport *new_dport;
+ struct device *host;
+ void *devres_group;
int rc;
device_lock_assert(&port->dev);
@@ -1615,21 +1606,31 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
return ERR_PTR(-EBUSY);
}
- struct cxl_dport *new_dport __free(del_cxl_dport) =
- devm_cxl_add_dport_by_dev(port, dport_dev);
- if (IS_ERR(new_dport))
- return new_dport;
+ if (is_cxl_root(port))
+ host = port->uport_dev;
+ else
+ host = &port->dev;
+
+ devres_group = devres_open_group(host, NULL, GFP_KERNEL);
+ if (!devres_group)
+ return ERR_PTR(-ENOMEM);
- cxl_switch_parse_cdat(new_dport);
+ if (port->nr_dports == 0) {
+ rc = cxl_port_setup_regs(port, port->component_reg_phys);
+ if (rc)
+ goto release_group;
- if (ida_is_empty(&port->decoder_ida)) {
rc = devm_cxl_switch_port_decoders_setup(port);
if (rc)
- return ERR_PTR(rc);
- dev_dbg(&port->dev, "first dport%d:%s added with decoders\n",
- new_dport->port_id, dev_name(dport_dev));
- return no_free_ptr(new_dport);
+ goto release_group;
+ }
+
+ new_dport = devm_cxl_add_dport_by_dev(port, dport_dev);
+ if (IS_ERR(new_dport)) {
+ rc = PTR_ERR(new_dport);
+ goto release_group;
}
+ cxl_switch_parse_cdat(new_dport);
/* New dport added, update the decoder targets */
device_for_each_child(&port->dev, new_dport, update_decoder_targets);
@@ -1637,7 +1638,18 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
dev_dbg(&port->dev, "dport%d:%s added\n", new_dport->port_id,
dev_name(dport_dev));
- return no_free_ptr(new_dport);
+ /*
+ * Stash the group for use during reaping when all downstream devices
+ * go away.
+ */
+ new_dport->devres_group = devres_group;
+ devres_close_group(host, devres_group);
+
+ return new_dport;
+
+release_group:
+ devres_release_group(host, devres_group);
+ return ERR_PTR(rc);
}
static struct cxl_dport *devm_cxl_create_port(struct device *ep_dev,
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index c796c3db36e0..855f7629f5c4 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -688,6 +688,7 @@ struct cxl_rcrb_info {
* @coord: access coordinates (bandwidth and latency performance attributes)
* @link_latency: calculated PCIe downstream latency
* @gpf_dvsec: Cached GPF port DVSEC
+ * @devres_group: Used to simplify reaping ports.
*/
struct cxl_dport {
struct device *dport_dev;
@@ -700,6 +701,7 @@ struct cxl_dport {
struct access_coordinate coord[ACCESS_COORDINATE_MAX];
long link_latency;
int gpf_dvsec;
+ void *devres_group;
};
/**
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 1d526bea8431..a3fbedb66dbb 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -132,4 +132,8 @@ void read_cdat_data(struct cxl_port *port);
void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
+
+int cxl_port_setup_regs(struct cxl_port *port,
+ resource_size_t component_reg_phys);
+
#endif /* __CXL_PCI_H__ */
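
For readers unfamiliar with devres groups, the flow in the patch above (open a
group, register resources, close and stash the group on success, release it on
error or at reap time) can be sketched in userspace roughly as below. This is
only an illustration under stated assumptions: every fake_* name is
hypothetical, and the model is far simpler than the kernel's
devres_open_group() / devres_close_group() / devres_release_group().

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_MAX_RES 16

/* Hypothetical stand-in for one devres entry tagged with its group */
struct fake_res {
	int group;			/* 0 means "not in any group" */
	void (*release)(void *);	/* NULL once released */
	void *data;
};

/* Hypothetical stand-in for the per-device resource list */
struct fake_dev {
	struct fake_res res[FAKE_MAX_RES];
	int nr;
	int next_group;
	int open_group;
};

/* Open a group: later actions are tagged with the returned handle */
static int fake_open_group(struct fake_dev *dev)
{
	dev->open_group = ++dev->next_group;
	return dev->open_group;
}

/* Close the group: later actions no longer belong to it */
static void fake_close_group(struct fake_dev *dev)
{
	dev->open_group = 0;
}

/* Register a release action; returns 0 on success, -1 when full */
static int fake_add_action(struct fake_dev *dev, void (*fn)(void *), void *data)
{
	assert(fn != NULL);
	if (dev->nr == FAKE_MAX_RES)
		return -1;
	dev->res[dev->nr++] = (struct fake_res){ dev->open_group, fn, data };
	return 0;
}

/* Release one group: run its pending actions in reverse order */
static void fake_release_group(struct fake_dev *dev, int group)
{
	for (int i = dev->nr - 1; i >= 0; i--) {
		struct fake_res *r = &dev->res[i];

		if (r->group != group || !r->release)
			continue;
		r->release(r->data);
		r->release = NULL;
	}
}

/* Example release callback: counts how often it was invoked */
static void fake_count_release(void *data)
{
	(*(int *)data)++;
}
```

The point of stashing the handle (as the patch does in dport->devres_group) is
that a single release call undoes everything registered while the group was
open, without open-coding one release per resource.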
^ permalink raw reply related [flat|nested] 129+ messages in thread
* Re: [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
2026-01-20 2:20 ` dan.j.williams
@ 2026-01-20 15:15 ` Bowman, Terry
2026-01-20 16:53 ` dan.j.williams
0 siblings, 1 reply; 129+ messages in thread
From: Bowman, Terry @ 2026-01-20 15:15 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
On 1/19/2026 8:20 PM, dan.j.williams@intel.com wrote:
> Terry Bowman wrote:
>> The AER driver includes significant logic for handling CXL protocol errors.
>> The AER driver will be updated in the future to separate the AER and CXL
>> logic.
>>
>> Rename the is_internal_error() function to is_aer_internal_error() as it
>> gives a more precise indication of the purpose. Make is_aer_internal_error()
>> non-static to allow for other PCI drivers to access.
>
> Not even sure this rename is needed given that it is private to
> drivers/pci/pcie/ and the sharing is only for cxl_{rch,vh}.c, not for
> "other PCI drivers". Consistent with the idea that internal errors are
> not going to become a first-class citizen let us keep this a CXL-only
> consideration.
>
> I'll update the changelog to drop the "other PCI drivers" comment.
The name choice was addressed by Bjorn here:
https://lore.kernel.org/linux-cxl/20251208180624.GA3300935@bhelgaas/
Terry
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
2026-01-20 15:15 ` Bowman, Terry
@ 2026-01-20 16:53 ` dan.j.williams
0 siblings, 0 replies; 129+ messages in thread
From: dan.j.williams @ 2026-01-20 16:53 UTC (permalink / raw)
To: Bowman, Terry, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny
Cc: linux-kernel, linux-pci
Bowman, Terry wrote:
> On 1/19/2026 8:20 PM, dan.j.williams@intel.com wrote:
> > Terry Bowman wrote:
> >> The AER driver includes significant logic for handling CXL protocol errors.
> >> The AER driver will be updated in the future to separate the AER and CXL
> >> logic.
> >>
> >> Rename the is_internal_error() function to is_aer_internal_error() as it
> >> gives a more precise indication of the purpose. Make is_aer_internal_error()
> >> non-static to allow for other PCI drivers to access.
> >
> > Not even sure this rename is needed given that it is private to
> > drivers/pci/pcie/ and the sharing is only for cxl_{rch,vh}.c, not for
> > "other PCI drivers". Consistent with the idea that internal errors are
> > not going to become a first-class citizen let us keep this a CXL-only
> > consideration.
> >
> > I'll update the changelog to drop the "other PCI drivers" comment.
>
> The name choice was addressed by Bjorn here:
>
> https://lore.kernel.org/linux-cxl/20251208180624.GA3300935@bhelgaas/
Thanks, yes, I only folded in the following changes to the changelog:
10: 417535d35e9f ! 11: 098f14e1d884 PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
@@ Commit message
logic.
Rename the is_internal_error() function to is_aer_internal_error() as it
- gives a more precise indication of the purpose. Make is_aer_internal_error()
- non-static to allow for other PCI drivers to access.
+ gives a more precise indication of the purpose. Make
+ is_aer_internal_error() non-static to allow for the 2 different CXL
+ topology error model implementations (RCH and VH) to share this helper.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
-
- ---
-
- Changes in v13->v14:
- - New patch
+ Link: https://patch.msgid.link/20260114182055.46029-11-terry.bowman@amd.com
+ Signed-off-by: Dan Williams <dan.j.williams@intel.com>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 03/34] PCI: Introduce pcie_is_cxl()
2026-01-14 18:20 ` [PATCH v14 03/34] PCI: Introduce pcie_is_cxl() Terry Bowman
@ 2026-01-21 1:19 ` dan.j.williams
2026-01-22 18:39 ` Bjorn Helgaas
1 sibling, 0 replies; 129+ messages in thread
From: dan.j.williams @ 2026-01-21 1:19 UTC (permalink / raw)
To: bhelgaas
Cc: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci, terry.bowman
Terry Bowman wrote:
> CXL and AER drivers need the ability to identify CXL devices.
>
> Introduce set_pcie_cxl() with logic checking for CXL.mem or CXL.cache
> status in the CXL Flex Bus DVSEC status register. The CXL Flex Bus DVSEC
> presence is used because it is required for all the CXL PCIe devices.[1]
>
> Add boolean 'struct pci_dev::is_cxl' to cache the
> CXL.cache and CXL.mem status.
>
> Call set_pcie_cxl() for the parent bridge. Once a device is created there
> is a possibility the parent training or CXL state was updated as well. This
> will make certain the correct parent CXL state is cached.
>
> Add function pcie_is_cxl() to return 'struct pci_dev::is_cxl'.
>
> [1] CXL 3.1 Spec, 8.1.1 PCIe Designated Vendor-Specific Extended
> Capability (DVSEC) ID Assignment, Table 8-2
Hi Bjorn, this is probably the highest impact change to the PCI core to
enable the rest of the series that does not yet have your ack.
I think the changelog could use some enhancement to highlight more of
the "why" than the "what":
---
CXL is a protocol that runs on top of PCIe electricals. Its error model
also runs on top of the PCIe AER error model by standardizing "internal"
errors as "CXL" errors. Linux has historically ignored internal errors.
CXL protocol error handling is then a task of enhancing the PCIe AER
core to understand that PCIe ports (upstream and downstream) and
endpoints may throw internal errors that represent standard CXL protocol
errors.
The proposed method to make that determination is to teach 'struct
pci_dev' to cache when its link has trained the CXL.mem and/or CXL.cache
protocols and then treat all internal errors as CXL errors. A design
goal is to not burden the PCIe AER core with CXL knowledge beyond just
enough to forward error notifications to the CXL RAS core. The forwarded
notification looks up a 'struct cxl_port' or 'struct cxl_dport'
companion device to the PCI device.
Introduce set_pcie_cxl() with logic checking for CXL.mem or CXL.cache
status in the CXL Flex Bus DVSEC status register. The CXL Flex Bus DVSEC
presence is used because it is required for all the CXL PCIe devices.[1]
[1] CXL 3.1 Spec, 8.1.1 PCIe Designated Vendor-Specific Extended
Capability (DVSEC) ID Assignment, Table 8-2
---
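
As a rough illustration of the caching described in the changelog above, the
check could look something like the userspace sketch below. The fake_* names
are hypothetical, and the status bit positions are an assumption made for the
example (Cache enabled in bit 0, Mem enabled in bit 2); they should be checked
against the CXL specification and the definitions added earlier in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit layout of the Flex Bus DVSEC port status register */
#define FAKE_FLEXBUS_STATUS_CACHE	(1u << 0)
#define FAKE_FLEXBUS_STATUS_MEM		(1u << 2)

/* Hypothetical stand-in for the relevant part of struct pci_dev */
struct fake_pci_dev {
	bool is_cxl;
};

/*
 * Cache whether the link trained CXL.cache and/or CXL.mem. A device
 * without the Flex Bus DVSEC is never treated as CXL.
 */
static void fake_set_pcie_cxl(struct fake_pci_dev *dev, bool has_flexbus_dvsec,
			      uint16_t status)
{
	assert(dev != NULL);
	dev->is_cxl = has_flexbus_dvsec &&
		      (status & (FAKE_FLEXBUS_STATUS_CACHE |
				 FAKE_FLEXBUS_STATUS_MEM));
}

/* Return the cached state; the AER core only needs this one bit */
static bool fake_pcie_is_cxl(const struct fake_pci_dev *dev)
{
	return dev->is_cxl;
}
```

With the state cached, the AER core can decide whether an internal error should
be forwarded to the CXL RAS core without carrying any further CXL knowledge.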
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors()
2026-01-20 2:09 ` dan.j.williams
@ 2026-01-22 10:31 ` Lukas Wunner
2026-01-22 16:48 ` dan.j.williams
0 siblings, 1 reply; 129+ messages in thread
From: Lukas Wunner @ 2026-01-22 10:31 UTC (permalink / raw)
To: dan.j.williams
Cc: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Mon, Jan 19, 2026 at 06:09:39PM -0800, dan.j.williams@intel.com wrote:
> Terry Bowman wrote:
> > Internal PCIe errors are not enabled by default during initialization. This
> > creates a problem for CXL drivers, which rely on PCIe Correctable and
> > Uncorrectable Internal Errors to receive CXL protocol error notifications.
> >
> > Export pci_aer_unmask_internal_errors() so CXL and other drivers can
> > enable internal PCIe errors.
>
> I folded in the following to this patch because opening up internal
> errors for PCIe drivers in general is not a goal.
As said, the "xe" driver needs to unmask Internal Errors and could
take advantage of this helper, so I'd call opening this up for PCI
drivers if not a goal then at least a "desirable side effect". ;)
https://lore.kernel.org/all/aR1_M_i3yIygd8v-@wunner.de/
> + Internal PCIe errors are not enabled by default during initialization
> + because their behavior is too device-specific and there is no standard way
> + to reason about them.
Well, they're not enabled by default because per the spec they're
masked in the Uncorrectable Error Mask and Correctable Error Mask
Registers. It's up to drivers to unmask them if they know the
hardware signals them. CXL just happens to be one of those drivers.
> @@ drivers/pci/pcie/aer.c: static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> - mask &= ~PCI_ERR_COR_INTERNAL;
> pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
> }
Unexplained change vis-à-vis Terry's submission. It seems you're
reading the Correctable Error Mask Register and writing the same
value back. That doesn't seem to make sense.
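
For reference, a minimal userspace sketch of what unmasking amounts to: a set
bit in an AER mask register suppresses reporting, so the helper has to clear
the Internal Error bit in both mask registers before writing them back. The
two bit values match include/uapi/linux/pci_regs.h as far as I know; the
fake_* names and the struct standing in for config space are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_ERR_UNC_INTN	0x00400000u	/* Uncorrectable Internal Error */
#define PCI_ERR_COR_INTERNAL	0x00004000u	/* Corrected Internal Error */

/* Hypothetical stand-in for one device's two AER mask registers */
struct fake_aer {
	uint32_t uncor_mask;
	uint32_t cor_mask;
};

/*
 * Read-modify-write of both masks. Dropping either "&= ~bit" step
 * turns that half into a plain read and write-back of the same value.
 */
static void fake_unmask_internal_errors(struct fake_aer *aer)
{
	assert(aer != NULL);
	aer->uncor_mask &= ~PCI_ERR_UNC_INTN;
	aer->cor_mask &= ~PCI_ERR_COR_INTERNAL;
}
```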
> -+EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors);
>
> ++/*
> ++ * Internal errors are too device-specific to enable generally, however for CXL
> ++ * their behavior is standardized for conveying CXL protocol errors.
> ++ */
> ++EXPORT_SYMBOL_FOR_MODULES(pci_aer_unmask_internal_errors, "cxl_core");
> ++
This change will require touching aer.c every time a driver
(such as xe) has the need to unmask Internal Errors.
Not sure if that's such a good idea...
Thanks,
Lukas
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
2026-01-15 20:42 ` dan.j.williams
@ 2026-01-22 13:34 ` Lukas Wunner
2026-01-22 19:09 ` dan.j.williams
0 siblings, 1 reply; 129+ messages in thread
From: Lukas Wunner @ 2026-01-22 13:34 UTC (permalink / raw)
To: dan.j.williams
Cc: Jonathan Cameron, Terry Bowman, dave, dave.jiang,
alison.schofield, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Thu, Jan 15, 2026 at 12:42:36PM -0800, dan.j.williams@intel.com wrote:
> I agree with the general sentiment, but not the conclusion, especially
> because this is a private detail. Linux has long ignored internal
> errors. The only reason to consider them now is because CXL decided to
> multiplex its error model on top of this oft-ignored feature of PCIe
> AER.
>
> Specifically, portdrv.h is not in the global include namespace, this is
> a private detail of the only consumer of internal errors:
> drivers/pci/pcie/aer_cxl_{rch,vh}.c
>
> At most we should have this as a comment to clarify:
>
> /*
> * Note, internal errors are only considered for the CXL error model,
> * not for other implementations.
> */
>
> ...and the pci_aer_unmask_internal_errors() export should be:
>
> EXPORT_SYMBOL_FOR_MODULES(pci_aer_unmask_internal_errors, "cxl_core")
>
> ...for the same reason. Steer folks away from thinking that it is open
> season for adding more internal error support.
It's not like Internal Errors are a bad thing per se. They're a way
to signal "other" errors besides the spec-defined ones.
As an example, and I'm keeping this in general terms to avoid divulging
information about future products, a device possessing ECC RAM may raise
a Correctable Internal Error when ECC successfully recovers from flipped
bits because it allows alerting the user in advance that the device might
need to be replaced in the near future. If ECC recovery fails, the device
might try to use a reserved spare portion of RAM in lieu of the failing one
and instruct the AER driver to recover through a bus reset. Such errors
are not covered by the spec-defined types. Using the Internal Error type
is the only possibility it seems.
My point is, there are valid (upcoming, not theoretical) use cases for
Internal Errors and creating infrastructure in the kernel to take advantage
of them is a good thing. Hence my continued pushing back on hiding or
discouraging their use.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors()
2026-01-22 10:31 ` Lukas Wunner
@ 2026-01-22 16:48 ` dan.j.williams
2026-01-22 18:51 ` Lukas Wunner
0 siblings, 1 reply; 129+ messages in thread
From: dan.j.williams @ 2026-01-22 16:48 UTC (permalink / raw)
To: Lukas Wunner, dan.j.williams
Cc: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
Lukas Wunner wrote:
> On Mon, Jan 19, 2026 at 06:09:39PM -0800, dan.j.williams@intel.com wrote:
> > Terry Bowman wrote:
> > > Internal PCIe errors are not enabled by default during initialization. This
> > > creates a problem for CXL drivers, which rely on PCIe Correctable and
> > > Uncorrectable Internal Errors to receive CXL protocol error notifications.
> > >
> > > Export pci_aer_unmask_internal_errors() so CXL and other drivers can
> > > enable internal PCIe errors.
> >
> > I folded in the following to this patch because opening up internal
> > errors for PCIe drivers in general is not a goal.
>
> As said, the "xe" driver needs to unmask Internal Errors and could
> take advantage of this helper, so I'd call opening this up for PCI
> drivers if not a goal then at least a "desirable side effect". ;)
>
> https://lore.kernel.org/all/aR1_M_i3yIygd8v-@wunner.de/
I missed that earlier. How did xe manage to be the only device in the history
of Linux that needs internal errors unmasked?
What happens if Linux says "no, that error model has never been supported and
it creates an ongoing mental / maintenance load if internal errors do not
matter for PCIe, only CXL (except xe)"?
> > + Internal PCIe errors are not enabled by default during initialization
> > + because their behavior is too device-specific and there is no standard way
> > + to reason about them.
>
> Well, they're not enabled by default because per the spec they're
> masked in the Uncorrectable Error Mask and Correctable Error Mask
> Registers. It's up to drivers to unmask them if they know the
> hardware signals them. CXL just happens to be one of those drivers.
>
> > @@ drivers/pci/pcie/aer.c: static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> > - mask &= ~PCI_ERR_COR_INTERNAL;
> > pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
> > }
>
> Unexplained change vis-à-vis Terry's submission. It seems you're
> reading the Correctable Error Mask Register and writing the same
> value back. That doesn't seem to make sense.
No, sorry, this is an interdiff so that change was just a change in context.
It also caused me to do a double-take until I realized it was a pure hunk
context change.
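For anyone else tripping over the same hunk, the helper is a plain
read-modify-write of the two AER mask registers. A minimal userspace sketch
of that logic follows. It assumes a made-up struct sketch_aer_regs in place
of config space, and SKETCH_* constants that are meant to mirror
PCI_ERR_UNC_INTN and PCI_ERR_COR_INTERNAL from pci_regs.h. It is an
illustration only, not the code in aer.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror PCI_ERR_UNC_INTN / PCI_ERR_COR_INTERNAL in pci_regs.h */
#define SKETCH_ERR_UNC_INTN	0x00400000u	/* Uncorrectable Internal Error */
#define SKETCH_ERR_COR_INTERNAL	0x00004000u	/* Corrected Internal Error */

/* Hypothetical stand-in for the two AER mask registers in config space */
struct sketch_aer_regs {
	uint32_t uncor_mask;
	uint32_t cor_mask;
};

/* Clear only the internal error mask bits; leave all other bits alone */
static void sketch_unmask_internal_errors(struct sketch_aer_regs *aer)
{
	uint32_t mask;

	assert(aer != NULL);

	mask = aer->uncor_mask;			/* read */
	mask &= ~SKETCH_ERR_UNC_INTN;		/* modify */
	aer->uncor_mask = mask;			/* write back */

	mask = aer->cor_mask;
	mask &= ~SKETCH_ERR_COR_INTERNAL;
	aer->cor_mask = mask;
}
```

Only the internal error bits are cleared; every other mask bit is written
back unchanged.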
>
> > -+EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors);
> >
> > ++/*
> > ++ * Internal errors are too device-specific to enable generally, however for CXL
> > ++ * their behavior is standardized for conveying CXL protocol errors.
> > ++ */
> > ++EXPORT_SYMBOL_FOR_MODULES(pci_aer_unmask_internal_errors, "cxl_core");
> > ++
>
> This change will require touching aer.c every time a driver
> (such as xe) has the need to unmask Internal Errors.
> Not sure if that's such a good idea...
The xe driver can always come back and change this to plain EXPORT_SYMBOL_GPL()
once they clear the hurdle above of, "please reconsider your error model to not
require this never needed before feature of AER".
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 11/34] PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c
2026-01-14 18:20 ` [PATCH v14 11/34] PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c Terry Bowman
@ 2026-01-22 17:23 ` Markus Elfring
2026-01-22 20:05 ` Bowman, Terry
2026-01-22 18:53 ` Bjorn Helgaas
1 sibling, 1 reply; 129+ messages in thread
From: Markus Elfring @ 2026-01-22 17:23 UTC (permalink / raw)
To: Terry Bowman, linux-pci, linux-cxl, Alejandro Lucero Palau,
Alison Schofield, Benjamin Cheatham, Bjorn Helgaas, Dan Carpenter,
Dan Williams, Dave Jiang, Davidlohr Bueso, Ira Weiny,
Jonathan Cameron, Kuppuswamy Sathyanarayanan, Lukas Wunner,
Li Ming, Pradeep Vinesh Reddy Kodamati, Robert Richter,
Shiju Jose, Smita Koralahalli, Vishal Verma
Cc: LKML
…
> +++ b/drivers/pci/pcie/aer_cxl_rch.c
> @@ -0,0 +1,353 @@
…
> +static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
> +{
…
> + device_lock(&dev->dev);
> +
> + err_handler = dev->driver ? dev->driver->err_handler : NULL;
…
> +out:
> + device_unlock(&dev->dev);
> + return 0;
> +}
…
Under which circumstances would you consider applying a statement
like “guard(device)(&dev->dev);” here?
https://elixir.bootlin.com/linux/v6.19-rc5/source/include/linux/device.h#L913
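To illustrate what such a guard buys in a function with several exit paths,
here is a minimal userspace sketch. It assumes GCC or Clang (the cleanup
variable attribute) and uses hypothetical stand-ins (struct sketch_device,
SKETCH_DEVICE_GUARD) rather than the kernel's struct device and
guard(device):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device; "locked" models the device lock */
struct sketch_device {
	int locked;
	int has_handler;
};

static void sketch_device_lock(struct sketch_device *dev)
{
	assert(dev != NULL);
	dev->locked = 1;
}

static void sketch_device_unlock(struct sketch_device *dev)
{
	dev->locked = 0;
}

/* Cleanup callback: receives a pointer to the guarded variable */
static void sketch_device_guard_release(struct sketch_device **devp)
{
	if (*devp)
		sketch_device_unlock(*devp);
}

/* Take the lock now and release it when "name" goes out of scope */
#define SKETCH_DEVICE_GUARD(name, dev)					\
	struct sketch_device *name					\
		__attribute__((cleanup(sketch_device_guard_release))) = (dev); \
	sketch_device_lock(name)

static int sketch_handle_error_iter(struct sketch_device *dev)
{
	SKETCH_DEVICE_GUARD(guard, dev);

	if (!dev->has_handler)
		return 0;	/* early exit: lock dropped automatically */

	return dev->locked;	/* evaluated while the lock is still held */
}
```

With this shape neither the "out:" label nor the explicit unlock call is
needed, because every return path releases the lock.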
Regards,
Markus
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 06/34] PCI: Replace cxl_error_is_native() with pcie_aer_is_native()
2026-01-14 18:20 ` [PATCH v14 06/34] PCI: Replace cxl_error_is_native() with pcie_aer_is_native() Terry Bowman
2026-01-14 18:55 ` Jonathan Cameron
2026-01-14 20:15 ` Dave Jiang
@ 2026-01-22 18:23 ` Bjorn Helgaas
2 siblings, 0 replies; 129+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 18:23 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Wed, Jan 14, 2026 at 12:20:27PM -0600, Terry Bowman wrote:
> The AER driver includes a CXL support function cxl_error_is_native(). This
> function adds no additional value over pcie_aer_is_native().
>
> Simplify the codebase by removing cxl_error_is_native() and replace
> occurrences of cxl_error_is_native() with pcie_aer_is_native().
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>
> Changes in v13->v14:
> - New commit (Dan)
> ---
> drivers/pci/pcie/aer.c | 11 ++---------
> 1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e0bcaa896803..c99ba2a1159c 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1166,13 +1166,6 @@ static bool is_cxl_mem_dev(struct pci_dev *dev)
> return true;
> }
>
> -static bool cxl_error_is_native(struct pci_dev *dev)
> -{
> - struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
> -
> - return (pcie_ports_native || host->native_aer);
> -}
> -
> static bool is_internal_error(struct aer_err_info *info)
> {
> if (info->severity == AER_CORRECTABLE)
> @@ -1186,7 +1179,7 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
> struct aer_err_info *info = (struct aer_err_info *)data;
> const struct pci_error_handlers *err_handler;
>
> - if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
> + if (!is_cxl_mem_dev(dev) || !pcie_aer_is_native(dev))
> return 0;
>
> /* Protect dev->driver */
> @@ -1227,7 +1220,7 @@ static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
> bool *handles_cxl = data;
>
> if (!*handles_cxl)
> - *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev);
> + *handles_cxl = is_cxl_mem_dev(dev) && pcie_aer_is_native(dev);
>
> /* Non-zero terminates iteration */
> return *handles_cxl;
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 13/34] PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS
2026-01-14 18:20 ` [PATCH v14 13/34] PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS Terry Bowman
` (2 preceding siblings ...)
2026-01-14 20:50 ` Dave Jiang
@ 2026-01-22 18:24 ` Bjorn Helgaas
3 siblings, 0 replies; 129+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 18:24 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Wed, Jan 14, 2026 at 12:20:34PM -0600, Terry Bowman wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> One of the primary reasons for the CXL driver to exist is to perform error
> handling. If both PCIEAER and CXL are enabled then light up CXL error
> handling as well. The work to remove CONFIG_PCIEAER_CXL started in:
>
> commit 4ae6ae66649c ("cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c")
>
> Finish that off with conditionally compiling all CXL RAS related helpers
> with CONFIG_CXL_RAS.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>
> ----
>
> Changes in v13->v14:
> - New commit
> ---
> drivers/cxl/Kconfig | 2 +-
> drivers/pci/pcie/Kconfig | 9 ---------
> 2 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 217888992c88..70acddc08c39 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -235,6 +235,6 @@ config CXL_MCE
>
> config CXL_RAS
> def_bool y
> - depends on ACPI_APEI_GHES && PCIEAER && CXL_PCI
> + depends on ACPI_APEI_GHES && PCIEAER && CXL_BUS
>
> endif
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 17919b99fa66..207c2deae35f 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -49,15 +49,6 @@ config PCIEAER_INJECT
> gotten from:
> https://github.com/intel/aer-inject.git
>
> -config PCIEAER_CXL
> - bool "PCI Express CXL RAS support"
> - default y
> - depends on PCIEAER && CXL_PCI
> - help
> - Enables CXL error handling.
> -
> - If unsure, say Y.
> -
> #
> # PCI Express ECRC
> #
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 31/34] PCI: Introduce CXL Port protocol error handlers
2026-01-14 18:20 ` [PATCH v14 31/34] PCI: Introduce CXL Port protocol error handlers Terry Bowman
2026-01-14 23:37 ` Dave Jiang
@ 2026-01-22 18:27 ` Bjorn Helgaas
1 sibling, 0 replies; 129+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 18:27 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Wed, Jan 14, 2026 at 12:20:52PM -0600, Terry Bowman wrote:
> Add CXL protocol error handlers for CXL Port devices (Root Ports,
> Downstream Ports, and Upstream Ports). Implement cxl_port_cor_error_detected()
> and cxl_port_error_detected() to handle correctable and uncorrectable errors
> respectively.
>
> Introduce cxl_get_ras_base() to retrieve the cached RAS register base
> address for a given CXL port. This function supports CXL Root Ports,
> Downstream Ports, Upstream Ports, and Endpoints by returning their
> previously mapped RAS register addresses.
>
> Update the AER driver's is_cxl_error() to recognize CXL Port devices in
> addition to CXL Endpoints, as both now have CXL-specific error handlers.
>
> Future patch(es) will include port error handling changes to support
> Endpoint protocol errors.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
To me, this is primarily a CXL patch and the PCI changes are only
incidental, so I would use a "cxl" prefix on the subject line.
> drivers/cxl/core/ras.c | 101 +++++++++++++++++++++++++++++++++-
> drivers/pci/pci.c | 1 +
> drivers/pci/pci.h | 2 -
> drivers/pci/pcie/aer.c | 1 +
> drivers/pci/pcie/aer_cxl_vh.c | 5 +-
> include/linux/aer.h | 2 +
> include/linux/pci.h | 2 +
> 7 files changed, 109 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 0c640b84ad70..96ce85cc0a46 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -200,6 +200,67 @@ static struct cxl_port *get_cxl_port(struct pci_dev *pdev)
> return NULL;
> }
>
> +static void __iomem *cxl_get_ras_base(struct device *dev)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + {
> + struct cxl_dport *dport;
> + struct cxl_port *port __free(put_cxl_port) = find_cxl_port(&pdev->dev, &dport);
> +
> + if (!dport) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> + return dport->regs.ras;
> + }
> + case PCI_EXP_TYPE_UPSTREAM:
> + {
> + struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_uport(&pdev->dev);
> +
> + if (!port) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> + return port->regs.ras;
> + }
> + }
> + dev_warn_once(dev, "Error: Unsupported device type (%#x)", pci_pcie_type(pdev));
> + return NULL;
> +}
> +
> +static pci_ers_result_t cxl_port_error_detected(struct device *dev);
> +
> +static void cxl_do_recovery(struct pci_dev *pdev)
> +{
> + struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
> + pci_ers_result_t status;
> +
> + if (!port) {
> + pci_err(pdev, "Failed to find the CXL device\n");
> + return;
> + }
> +
> + status = cxl_port_error_detected(&pdev->dev);
> + if (status == PCI_ERS_RESULT_PANIC)
> + panic("CXL cachemem error.");
> +
> + /*
> + * If we have native control of AER, clear error status in the device
> + * that detected the error. If the platform retained control of AER,
> + * it is responsible for clearing this status. In that case, the
> + * signaling device may not even be visible to the OS.
> + */
> + if (pcie_aer_is_native(pdev)) {
> + pcie_clear_device_status(pdev);
> + pci_aer_clear_nonfatal_status(pdev);
> + pci_aer_clear_fatal_status(pdev);
> + }
> +}
> +
> void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> {
> void __iomem *addr;
> @@ -214,7 +275,10 @@ void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> return;
> writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
>
> - trace_cxl_aer_correctable_error(dev, status, serial);
> + if (is_cxl_memdev(dev))
> + trace_cxl_aer_correctable_error(dev, status, serial);
> + else
> + trace_cxl_port_aer_correctable_error(dev, status);
> }
>
> /* CXL spec rev3.0 8.2.4.16.1 */
> @@ -265,12 +329,27 @@ bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> }
>
> header_log_copy(ras_base, hl);
> - trace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);
> +
> + if (is_cxl_memdev(dev))
> + trace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);
> + else
> + trace_cxl_port_aer_uncorrectable_error(dev, status, fe, hl);
> +
> writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
>
> return true;
> }
>
> +static void cxl_port_cor_error_detected(struct device *dev)
> +{
> + cxl_handle_cor_ras(dev, 0, cxl_get_ras_base(dev));
> +}
> +
> +static pci_ers_result_t cxl_port_error_detected(struct device *dev)
> +{
> + return cxl_handle_ras(dev, 0, cxl_get_ras_base(dev));
> +}
> +
> void cxl_cor_error_detected(struct pci_dev *pdev)
> {
> struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> @@ -346,6 +425,24 @@ EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
>
> static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
> {
> + struct pci_dev *pdev = err_info->pdev;
> +
> + if (err_info->severity == AER_CORRECTABLE) {
> +
> + if (!pcie_aer_is_native(pdev))
> + return;
> +
> + if (pdev->aer_cap)
> + pci_clear_and_set_config_dword(pdev,
> + pdev->aer_cap + PCI_ERR_COR_STATUS,
> + 0, PCI_ERR_COR_INTERNAL);
> +
> + cxl_port_cor_error_detected(&pdev->dev);
> +
> + pcie_clear_device_status(pdev);
> + } else {
> + cxl_do_recovery(pdev);
> + }
> }
>
> static void cxl_proto_err_work_fn(struct work_struct *work)
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 13dbb405dc31..b7bfefdaf990 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2248,6 +2248,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
> pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> }
> +EXPORT_SYMBOL_GPL(pcie_clear_device_status);
> #endif
>
> /**
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index dbc547db208a..8bb703524f52 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -229,7 +229,6 @@ void pci_refresh_power_state(struct pci_dev *dev);
> int pci_power_up(struct pci_dev *dev);
> void pci_disable_enabled_device(struct pci_dev *dev);
> int pci_finish_runtime_suspend(struct pci_dev *dev);
> -void pcie_clear_device_status(struct pci_dev *dev);
> void pcie_clear_root_pme_status(struct pci_dev *dev);
> bool pci_check_pme_status(struct pci_dev *dev);
> void pci_pme_wakeup_bus(struct pci_bus *bus);
> @@ -1196,7 +1195,6 @@ void pci_restore_aer_state(struct pci_dev *dev);
> static inline void pci_no_aer(void) { }
> static inline void pci_aer_init(struct pci_dev *d) { }
> static inline void pci_aer_exit(struct pci_dev *d) { }
> -static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
> static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
> static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
> static inline void pci_save_aer_state(struct pci_dev *dev) { }
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index c2030d32a19c..dd7c49651612 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -298,6 +298,7 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
> if (status)
> pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> }
> +EXPORT_SYMBOL_GPL(pci_aer_clear_fatal_status);
>
> /**
> * pci_aer_raw_clear_status - Clear AER error registers.
> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
> index 0f616f5fafcf..aa69e504302f 100644
> --- a/drivers/pci/pcie/aer_cxl_vh.c
> +++ b/drivers/pci/pcie/aer_cxl_vh.c
> @@ -34,7 +34,10 @@ bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
> if (!info || !info->is_cxl)
> return false;
>
> - if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
> + if ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT) &&
> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM) &&
> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM))
> return false;
>
> return is_aer_internal_error(info);
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index f351e41dd979..c1aef7859d0a 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -65,6 +65,7 @@ struct cxl_proto_err_work_data {
>
> #if defined(CONFIG_PCIEAER)
> int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> +void pci_aer_clear_fatal_status(struct pci_dev *dev);
> int pcie_aer_is_native(struct pci_dev *dev);
> void pci_aer_unmask_internal_errors(struct pci_dev *dev);
> #else
> @@ -72,6 +73,7 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> {
> return -EINVAL;
> }
> +static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
> static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
> static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
> #endif
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index ee05d5925b13..1ef4743bf151 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1921,8 +1921,10 @@ static inline void pci_hp_unignore_link_change(struct pci_dev *pdev) { }
>
> #ifdef CONFIG_PCIEAER
> bool pci_aer_available(void);
> +void pcie_clear_device_status(struct pci_dev *dev);
> #else
> static inline bool pci_aer_available(void) { return false; }
> +static inline void pcie_clear_device_status(struct pci_dev *dev) { }
> #endif
>
> bool pci_ats_disabled(void);
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 15/34] PCI/AER: Update struct aer_err_info with kernel-doc formatting
2026-01-14 18:20 ` [PATCH v14 15/34] PCI/AER: Update struct aer_err_info with kernel-doc formatting Terry Bowman
2026-01-14 19:48 ` Jonathan Cameron
2026-01-14 21:06 ` Dave Jiang
@ 2026-01-22 18:29 ` Bjorn Helgaas
2 siblings, 0 replies; 129+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 18:29 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Wed, Jan 14, 2026 at 12:20:36PM -0600, Terry Bowman wrote:
> Update the existing 'struct aer_err_info' definition to use kernel-doc
> formatting. Remove the inline comments to reduce noise and do not introduce
> functional changes. This will improve readability and maintainability.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>
> Changes in v13->v14:
> - New commit
> ---
> drivers/pci/pci.h | 29 +++++++++++++++++++++++------
> 1 file changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 41ec38e82c08..dbc547db208a 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -724,16 +724,33 @@ static inline bool pci_dev_binding_disallowed(struct pci_dev *dev)
>
> #define AER_MAX_MULTI_ERR_DEVICES 5 /* Not likely to have more */
>
> +/**
> + * struct aer_err_info - AER Error Information
> + * @dev: Devices reporting error
> + * @ratelimit_print: Flag to log or not log the devices' error. 0=NotLog/1=Log
> + * @error_dev_num: Number of devices reporting an error
> + * @level: printk level to use in logging
> + * @id: Value from register PCI_ERR_ROOT_ERR_SRC
> + * @severity: AER severity, 0-UNCOR Non-fatal, 1-UNCOR fatal, 2-COR
> + * @root_ratelimit_print: Flag to log or not log the root's error. 0=NotLog/1=Log
> + * @multi_error_valid: If multiple errors are reported
> + * @first_error: First reported error
> + * @is_cxl: Bus type error: 0-PCI Bus error, 1-CXL Bus error
> + * @tlp_header_valid: Indicates if TLP field contains error information
> + * @status: COR/UNCOR error status
> + * @mask: COR/UNCOR mask
> + * @tlp: Transaction packet information
> + */
> struct aer_err_info {
> struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
> int ratelimit_print[AER_MAX_MULTI_ERR_DEVICES];
> int error_dev_num;
> - const char *level; /* printk level */
> + const char *level;
>
> unsigned int id:16;
>
> - unsigned int severity:2; /* 0:NONFATAL | 1:FATAL | 2:COR */
> - unsigned int root_ratelimit_print:1; /* 0=skip, 1=print */
> + unsigned int severity:2;
> + unsigned int root_ratelimit_print:1;
> unsigned int __pad1:4;
> unsigned int multi_error_valid:1;
>
> @@ -742,9 +759,9 @@ struct aer_err_info {
> unsigned int is_cxl:1;
> unsigned int tlp_header_valid:1;
>
> - unsigned int status; /* COR/UNCOR Error Status */
> - unsigned int mask; /* COR/UNCOR Error Mask */
> - struct pcie_tlp_log tlp; /* TLP Header */
> + unsigned int status;
> + unsigned int mask;
> + struct pcie_tlp_log tlp;
> };
>
> int aer_get_device_error_info(struct aer_err_info *info, int i);
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 30/34] PCI/AER: Dequeue forwarded CXL error
2026-01-14 18:20 ` [PATCH v14 30/34] PCI/AER: Dequeue forwarded CXL error Terry Bowman
2026-01-14 23:18 ` Dave Jiang
2026-01-15 16:01 ` Jonathan Cameron
@ 2026-01-22 18:32 ` Bjorn Helgaas
2 siblings, 0 replies; 129+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 18:32 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Wed, Jan 14, 2026 at 12:20:51PM -0600, Terry Bowman wrote:
> The AER driver now forwards CXL protocol errors to the CXL driver via a
> kfifo. The CXL driver must consume these work items and initiate protocol
> error handling while ensuring the device's RAS mappings remain valid
> throughout processing.
>
> Implement cxl_proto_err_work_fn() to dequeue work items forwarded by the
> AER service driver. Lock the parent CXL Port device to ensure the CXL
> device's RAS registers are accessible during handling. Add pdev reference-put
> to match reference-get in AER driver. This will ensure pdev access after
> kfifo dequeue. These changes apply to CXL Ports and CXL Endpoints.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
I suppose you used a "PCI/AER" prefix just so I would look at this
patch :) Like the other one, this only touches drivers/pci
incidentally, so I don't think it really merits "PCI/AER". Might just
have to poke me directly if you want my ack on things like this.
> drivers/cxl/core/core.h | 3 ++
> drivers/cxl/core/port.c | 6 +--
> drivers/cxl/core/ras.c | 98 +++++++++++++++++++++++++++++++----
> drivers/pci/pcie/aer_cxl_vh.c | 1 +
> 4 files changed, 94 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 306762a15dc0..39324e1b8940 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -169,6 +169,9 @@ static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
> #endif /* CONFIG_CXL_RAS */
>
> int cxl_gpf_port_setup(struct cxl_dport *dport);
> +struct cxl_port *find_cxl_port(struct device *dport_dev,
> + struct cxl_dport **dport);
> +struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev);
>
> struct cxl_hdm;
> int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index a535e57360e0..0bec10be5d56 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1335,8 +1335,8 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
> return NULL;
> }
>
> -static struct cxl_port *find_cxl_port(struct device *dport_dev,
> - struct cxl_dport **dport)
> +struct cxl_port *find_cxl_port(struct device *dport_dev,
> + struct cxl_dport **dport)
> {
> struct cxl_find_port_ctx ctx = {
> .dport_dev = dport_dev,
> @@ -1578,7 +1578,7 @@ static int match_port_by_uport(struct device *dev, const void *data)
> * Function takes a device reference on the port device. Caller should do a
> * put_device() when done.
> */
> -static struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev)
> +struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev)
> {
> struct device *dev;
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index bf82880e19b4..0c640b84ad70 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -117,17 +117,6 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
> }
> static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>
> -int cxl_ras_init(void)
> -{
> - return cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
> -}
> -
> -void cxl_ras_exit(void)
> -{
> - cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
> - cancel_work_sync(&cxl_cper_prot_err_work);
> -}
> -
> static void cxl_dport_map_ras(struct cxl_dport *dport)
> {
> struct cxl_register_map *map = &dport->reg_map;
> @@ -173,6 +162,44 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
>
> +/*
> + * Return 'struct cxl_port *' parent CXL Port of dev
> + *
> + * Reference count increments returned port on success
> + *
> + * @pdev: Find the parent CXL Port of this device
> + */
> +static struct cxl_port *get_cxl_port(struct pci_dev *pdev)
> +{
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + {
> + struct cxl_dport *dport;
> + struct cxl_port *port = find_cxl_port(&pdev->dev, &dport);
> +
> + if (!port) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> + return port;
> + }
> + case PCI_EXP_TYPE_UPSTREAM:
> + case PCI_EXP_TYPE_ENDPOINT:
> + {
> + struct cxl_port *port = find_cxl_port_by_uport(&pdev->dev);
> +
> + if (!port) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> + return port;
> + }
> + }
> + pci_warn_once(pdev, "Error: Unsupported device type (%#x)", pci_pcie_type(pdev));
> + return NULL;
> +}
> +
> void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> {
> void __iomem *addr;
> @@ -316,3 +343,52 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> return PCI_ERS_RESULT_NEED_RESET;
> }
> EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
> +
> +static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
> +{
> +}
> +
> +static void cxl_proto_err_work_fn(struct work_struct *work)
> +{
> + struct cxl_proto_err_work_data wd;
> +
> + while (cxl_proto_err_kfifo_get(&wd)) {
> + struct pci_dev *pdev __free(pci_dev_put) = wd.pdev;
> +
> + if (!pdev) {
> + pr_err_ratelimited("NULL PCI device passed in AER-CXL KFIFO\n");
> + continue;
> + }
> +
> + struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
> + if (!port) {
> + pr_err_ratelimited("Failed to find parent Port device in CXL topology.\n");
> + continue;
> + }
> + guard(device)(&port->dev);
> +
> + cxl_handle_proto_error(&wd);
> + }
> +}
> +
> +static struct work_struct cxl_proto_err_work;
> +static DECLARE_WORK(cxl_proto_err_work, cxl_proto_err_work_fn);
> +
> +int cxl_ras_init(void)
> +{
> + if (cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work))
> + pr_err("Failed to initialize CXL RAS CPER\n");
> +
> + cxl_register_proto_err_work(&cxl_proto_err_work);
> +
> + return 0;
> +}
> +
> +void cxl_ras_exit(void)
> +{
> + cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
> + cancel_work_sync(&cxl_cper_prot_err_work);
> +
> + cxl_unregister_proto_err_work();
> + cancel_work_sync(&cxl_proto_err_work);
> +}
> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
> index 2189d3c6cef1..0f616f5fafcf 100644
> --- a/drivers/pci/pcie/aer_cxl_vh.c
> +++ b/drivers/pci/pcie/aer_cxl_vh.c
> @@ -48,6 +48,7 @@ void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info)
> };
>
> guard(rwsem_read)(&cxl_proto_err_kfifo.rw_sema);
> + pci_dev_get(pdev);
> if (!cxl_proto_err_kfifo.work || !kfifo_put(&cxl_proto_err_kfifo.fifo, wd)) {
> dev_err_ratelimited(&pdev->dev, "AER-CXL kfifo error");
> return;
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 02/34] PCI: Update CXL DVSEC definitions
2026-01-14 18:20 ` [PATCH v14 02/34] PCI: Update CXL DVSEC definitions Terry Bowman
2026-01-14 18:53 ` Jonathan Cameron
@ 2026-01-22 18:37 ` Bjorn Helgaas
1 sibling, 0 replies; 129+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 18:37 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Wed, Jan 14, 2026 at 12:20:23PM -0600, Terry Bowman wrote:
> CXL DVSEC definitions were recently moved into uapi/pci_regs.h, but the
> newly added macros do not follow the file's existing naming conventions.
> The current format uses CXL_DVSEC_XYZ, while the new CXL entries must
> instead use the PCI_DVSEC_CXL_XYZ prefix to match the conventions already
> established in pci_regs.h.
>
> The new CXL DVSEC macros also introduce _MASK and _OFFSET suffixes, which
> are not used anywhere else in the file. These suffixes lengthen the
> identifiers and reduce readability. Remove _MASK and _OFFSET from the
> recently added definitions.
>
> Additionally, remove PCI_DVSEC_HEADER1_LENGTH, as it duplicates the existing
> PCI_DVSEC_HEADER1_LEN() macro.
>
> Update all existing references to use the new macro names.
>
> Finally, update the inline documentation to reference the latest revision
> of the CXL specification.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
I do agree that PCI_DVSEC_CXL_CAP seems possibly a little too generic
given that there may be more CXL-related DVSECs, so I like Jonathan's
PCI_DVSEC_CXL_DEVICE_CAP idea.
Keep my ack either way.
> ---
>
> Changes in v13->v14:
> - New patch. Split from previous patch such that there is now a separate
> move patch and a format fix patch.
> - Formatting update requested (Bjorn)
> - Remove PCI_DVSEC_HEADER1_LENGTH_MASK because it duplicates
> PCI_DVSEC_HEADER1_LEN() (Bjorn)
> - Add Dan's review-by
> ---
> drivers/cxl/core/pci.c | 58 ++++++++++-----------
> drivers/cxl/core/regs.c | 14 +++---
> drivers/cxl/pci.c | 2 +-
> include/uapi/linux/pci_regs.h | 94 ++++++++++++++++-------------------
> 4 files changed, 81 insertions(+), 87 deletions(-)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 5b023a0178a4..077b386e0c8d 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -86,12 +86,12 @@ static int cxl_dvsec_mem_range_valid(struct cxl_dev_state *cxlds, int id)
> i = 1;
> do {
> rc = pci_read_config_dword(pdev,
> - d + CXL_DVSEC_RANGE_SIZE_LOW(id),
> + d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id),
> &temp);
> if (rc)
> return rc;
>
> - valid = FIELD_GET(CXL_DVSEC_MEM_INFO_VALID, temp);
> + valid = FIELD_GET(PCI_DVSEC_CXL_MEM_INFO_VALID, temp);
> if (valid)
> break;
> msleep(1000);
> @@ -121,11 +121,11 @@ static int cxl_dvsec_mem_range_active(struct cxl_dev_state *cxlds, int id)
> /* Check MEM ACTIVE bit, up to 60s timeout by default */
> for (i = media_ready_timeout; i; i--) {
> rc = pci_read_config_dword(
> - pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(id), &temp);
> + pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id), &temp);
> if (rc)
> return rc;
>
> - active = FIELD_GET(CXL_DVSEC_MEM_ACTIVE, temp);
> + active = FIELD_GET(PCI_DVSEC_CXL_MEM_ACTIVE, temp);
> if (active)
> break;
> msleep(1000);
> @@ -154,11 +154,11 @@ int cxl_await_media_ready(struct cxl_dev_state *cxlds)
> u16 cap;
>
> rc = pci_read_config_word(pdev,
> - d + CXL_DVSEC_CAP_OFFSET, &cap);
> + d + PCI_DVSEC_CXL_CAP, &cap);
> if (rc)
> return rc;
>
> - hdm_count = FIELD_GET(CXL_DVSEC_HDM_COUNT_MASK, cap);
> + hdm_count = FIELD_GET(PCI_DVSEC_CXL_HDM_COUNT, cap);
> for (i = 0; i < hdm_count; i++) {
> rc = cxl_dvsec_mem_range_valid(cxlds, i);
> if (rc)
> @@ -186,16 +186,16 @@ static int cxl_set_mem_enable(struct cxl_dev_state *cxlds, u16 val)
> u16 ctrl;
> int rc;
>
> - rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
> + rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CTRL, &ctrl);
> if (rc < 0)
> return rc;
>
> - if ((ctrl & CXL_DVSEC_MEM_ENABLE) == val)
> + if ((ctrl & PCI_DVSEC_CXL_MEM_ENABLE) == val)
> return 1;
> - ctrl &= ~CXL_DVSEC_MEM_ENABLE;
> + ctrl &= ~PCI_DVSEC_CXL_MEM_ENABLE;
> ctrl |= val;
>
> - rc = pci_write_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, ctrl);
> + rc = pci_write_config_word(pdev, d + PCI_DVSEC_CXL_CTRL, ctrl);
> if (rc < 0)
> return rc;
>
> @@ -211,7 +211,7 @@ static int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds)
> {
> int rc;
>
> - rc = cxl_set_mem_enable(cxlds, CXL_DVSEC_MEM_ENABLE);
> + rc = cxl_set_mem_enable(cxlds, PCI_DVSEC_CXL_MEM_ENABLE);
> if (rc < 0)
> return rc;
> if (rc > 0)
> @@ -273,11 +273,11 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
> return -ENXIO;
> }
>
> - rc = pci_read_config_word(pdev, d + CXL_DVSEC_CAP_OFFSET, &cap);
> + rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CAP, &cap);
> if (rc)
> return rc;
>
> - if (!(cap & CXL_DVSEC_MEM_CAPABLE)) {
> + if (!(cap & PCI_DVSEC_CXL_MEM_CAPABLE)) {
> dev_dbg(dev, "Not MEM Capable\n");
> return -ENXIO;
> }
> @@ -288,7 +288,7 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
> * driver is for a spec defined class code which must be CXL.mem
> * capable, there is no point in continuing to enable CXL.mem.
> */
> - hdm_count = FIELD_GET(CXL_DVSEC_HDM_COUNT_MASK, cap);
> + hdm_count = FIELD_GET(PCI_DVSEC_CXL_HDM_COUNT, cap);
> if (!hdm_count || hdm_count > 2)
> return -EINVAL;
>
> @@ -297,11 +297,11 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
> * disabled, and they will remain moot after the HDM Decoder
> * capability is enabled.
> */
> - rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
> + rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CTRL, &ctrl);
> if (rc)
> return rc;
>
> - info->mem_enabled = FIELD_GET(CXL_DVSEC_MEM_ENABLE, ctrl);
> + info->mem_enabled = FIELD_GET(PCI_DVSEC_CXL_MEM_ENABLE, ctrl);
> if (!info->mem_enabled)
> return 0;
>
> @@ -314,35 +314,35 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
> return rc;
>
> rc = pci_read_config_dword(
> - pdev, d + CXL_DVSEC_RANGE_SIZE_HIGH(i), &temp);
> + pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i), &temp);
> if (rc)
> return rc;
>
> size = (u64)temp << 32;
>
> rc = pci_read_config_dword(
> - pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(i), &temp);
> + pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(i), &temp);
> if (rc)
> return rc;
>
> - size |= temp & CXL_DVSEC_MEM_SIZE_LOW_MASK;
> + size |= temp & PCI_DVSEC_CXL_MEM_SIZE_LOW;
> if (!size) {
> continue;
> }
>
> rc = pci_read_config_dword(
> - pdev, d + CXL_DVSEC_RANGE_BASE_HIGH(i), &temp);
> + pdev, d + PCI_DVSEC_CXL_RANGE_BASE_HIGH(i), &temp);
> if (rc)
> return rc;
>
> base = (u64)temp << 32;
>
> rc = pci_read_config_dword(
> - pdev, d + CXL_DVSEC_RANGE_BASE_LOW(i), &temp);
> + pdev, d + PCI_DVSEC_CXL_RANGE_BASE_LOW(i), &temp);
> if (rc)
> return rc;
>
> - base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;
> + base |= temp & PCI_DVSEC_CXL_MEM_BASE_LOW;
>
> info->dvsec_range[ranges++] = (struct range) {
> .start = base,
> @@ -1068,7 +1068,7 @@ u16 cxl_gpf_get_dvsec(struct device *dev)
> is_port = false;
>
> dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> - is_port ? CXL_DVSEC_PORT_GPF : CXL_DVSEC_DEVICE_GPF);
> + is_port ? PCI_DVSEC_CXL_PORT_GPF : PCI_DVSEC_CXL_DEVICE_GPF);
> if (!dvsec)
> dev_warn(dev, "%s GPF DVSEC not present\n",
> is_port ? "Port" : "Device");
> @@ -1084,14 +1084,14 @@ static int update_gpf_port_dvsec(struct pci_dev *pdev, int dvsec, int phase)
>
> switch (phase) {
> case 1:
> - offset = CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET;
> - base = CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK;
> - scale = CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK;
> + offset = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_CONTROL;
> + base = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_BASE;
> + scale = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_SCALE;
> break;
> case 2:
> - offset = CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET;
> - base = CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK;
> - scale = CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK;
> + offset = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_CONTROL;
> + base = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_BASE;
> + scale = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_SCALE;
> break;
> default:
> return -EINVAL;
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index 5ca7b0eed568..a010b3214342 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -271,10 +271,10 @@ EXPORT_SYMBOL_NS_GPL(cxl_map_device_regs, "CXL");
> static bool cxl_decode_regblock(struct pci_dev *pdev, u32 reg_lo, u32 reg_hi,
> struct cxl_register_map *map)
> {
> - u8 reg_type = FIELD_GET(CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK, reg_lo);
> - int bar = FIELD_GET(CXL_DVSEC_REG_LOCATOR_BIR_MASK, reg_lo);
> + u8 reg_type = FIELD_GET(PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_ID, reg_lo);
> + int bar = FIELD_GET(PCI_DVSEC_CXL_REG_LOCATOR_BIR, reg_lo);
> u64 offset = ((u64)reg_hi << 32) |
> - (reg_lo & CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK);
> + (reg_lo & PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_OFF_LOW);
>
> if (offset > pci_resource_len(pdev, bar)) {
> dev_warn(&pdev->dev,
> @@ -311,15 +311,15 @@ static int __cxl_find_regblock_instance(struct pci_dev *pdev, enum cxl_regloc_ty
> };
>
> regloc = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> - CXL_DVSEC_REG_LOCATOR);
> + PCI_DVSEC_CXL_REG_LOCATOR);
> if (!regloc)
> return -ENXIO;
>
> pci_read_config_dword(pdev, regloc + PCI_DVSEC_HEADER1, &regloc_size);
> - regloc_size = FIELD_GET(PCI_DVSEC_HEADER1_LENGTH_MASK, regloc_size);
> + regloc_size = PCI_DVSEC_HEADER1_LEN(regloc_size);
>
> - regloc += CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET;
> - regblocks = (regloc_size - CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET) / 8;
> + regloc += PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1;
> + regblocks = (regloc_size - PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1) / 8;
>
> for (i = 0; i < regblocks; i++, regloc += 8) {
> u32 reg_lo, reg_hi;
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 0be4e508affe..b7f694bda913 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -933,7 +933,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> cxlds->rcd = is_cxl_restricted(pdev);
> cxlds->serial = pci_get_dsn(pdev);
> cxlds->cxl_dvsec = pci_find_dvsec_capability(
> - pdev, PCI_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
> + pdev, PCI_VENDOR_ID_CXL, PCI_DVSEC_CXL_DEVICE);
> if (!cxlds->cxl_dvsec)
> dev_warn(&pdev->dev,
> "Device DVSEC not present, skip CXL.mem init\n");
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 6c4b6f19b18e..662582bdccf0 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -1333,63 +1333,57 @@
> #define PCI_IDE_SEL_ADDR_3(x) (28 + (x) * PCI_IDE_SEL_ADDR_BLOCK_SIZE)
> #define PCI_IDE_SEL_BLOCK_SIZE(nr_assoc) (20 + PCI_IDE_SEL_ADDR_BLOCK_SIZE * (nr_assoc))
>
> -/* Compute Express Link (CXL r3.1, sec 8.1.5) */
> -#define PCI_DVSEC_CXL_PORT 3
> -#define PCI_DVSEC_CXL_PORT_CTL 0x0c
> -#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
> -
> /*
> - * Compute Express Link (CXL r3.2, sec 8.1)
> + * Compute Express Link (CXL r4.0, sec 8.1)
> *
> * Note that CXL DVSEC id 3 and 7 to be ignored when the CXL link state
> - * is "disconnected" (CXL r3.2, sec 9.12.3). Re-enumerate these
> + * is "disconnected" (CXL r4.0, sec 9.12.3). Re-enumerate these
> * registers on downstream link-up events.
> */
> -#define PCI_DVSEC_HEADER1_LENGTH_MASK __GENMASK(31, 20)
> -
> -/* CXL 3.2 8.1.3: PCIe DVSEC for CXL Device */
> -#define CXL_DVSEC_PCIE_DEVICE 0
> -#define CXL_DVSEC_CAP_OFFSET 0xA
> -#define CXL_DVSEC_MEM_CAPABLE _BITUL(2)
> -#define CXL_DVSEC_HDM_COUNT_MASK __GENMASK(5, 4)
> -#define CXL_DVSEC_CTRL_OFFSET 0xC
> -#define CXL_DVSEC_MEM_ENABLE _BITUL(2)
> -#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
> -#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
> -#define CXL_DVSEC_MEM_INFO_VALID _BITUL(0)
> -#define CXL_DVSEC_MEM_ACTIVE _BITUL(1)
> -#define CXL_DVSEC_MEM_SIZE_LOW_MASK __GENMASK(31, 28)
> -#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
> -#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
> -#define CXL_DVSEC_MEM_BASE_LOW_MASK __GENMASK(31, 28)
> +
> +/* CXL r4.0, 8.1.3: PCIe DVSEC for CXL Device */
> +#define PCI_DVSEC_CXL_DEVICE 0
> +#define PCI_DVSEC_CXL_CAP 0xA
> +#define PCI_DVSEC_CXL_MEM_CAPABLE _BITUL(2)
> +#define PCI_DVSEC_CXL_HDM_COUNT __GENMASK(5, 4)
> +#define PCI_DVSEC_CXL_CTRL 0xC
> +#define PCI_DVSEC_CXL_MEM_ENABLE _BITUL(2)
> +#define PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
> +#define PCI_DVSEC_CXL_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
> +#define PCI_DVSEC_CXL_MEM_INFO_VALID _BITUL(0)
> +#define PCI_DVSEC_CXL_MEM_ACTIVE _BITUL(1)
> +#define PCI_DVSEC_CXL_MEM_SIZE_LOW __GENMASK(31, 28)
> +#define PCI_DVSEC_CXL_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
> +#define PCI_DVSEC_CXL_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
> +#define PCI_DVSEC_CXL_MEM_BASE_LOW __GENMASK(31, 28)
>
> #define CXL_DVSEC_RANGE_MAX 2
>
> -/* CXL 3.2 8.1.4: Non-CXL Function Map DVSEC */
> -#define CXL_DVSEC_FUNCTION_MAP 2
> -
> -/* CXL 3.2 8.1.5: Extensions DVSEC for Ports */
> -#define CXL_DVSEC_PORT 3
> -#define CXL_DVSEC_PORT_CTL 0x0c
> -#define CXL_DVSEC_PORT_CTL_UNMASK_SBR 0x00000001
> -
> -/* CXL 3.2 8.1.6: GPF DVSEC for CXL Port */
> -#define CXL_DVSEC_PORT_GPF 4
> -#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
> -#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK __GENMASK(3, 0)
> -#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK __GENMASK(11, 8)
> -#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
> -#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK __GENMASK(3, 0)
> -#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK __GENMASK(11, 8)
> -
> -/* CXL 3.2 8.1.7: GPF DVSEC for CXL Device */
> -#define CXL_DVSEC_DEVICE_GPF 5
> -
> -/* CXL 3.2 8.1.9: Register Locator DVSEC */
> -#define CXL_DVSEC_REG_LOCATOR 8
> -#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
> -#define CXL_DVSEC_REG_LOCATOR_BIR_MASK __GENMASK(2, 0)
> -#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK __GENMASK(15, 8)
> -#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK __GENMASK(31, 16)
> +/* CXL r4.0, 8.1.4: Non-CXL Function Map DVSEC */
> +#define PCI_DVSEC_CXL_FUNCTION_MAP 2
> +
> +/* CXL r4.0, 8.1.5: Extensions DVSEC for Ports */
> +#define PCI_DVSEC_CXL_PORT 3
> +#define PCI_DVSEC_CXL_PORT_CTL 0x0c
> +#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
> +
> +/* CXL r4.0, 8.1.6: GPF DVSEC for CXL Port */
> +#define PCI_DVSEC_CXL_PORT_GPF 4
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_CONTROL 0x0C
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_BASE __GENMASK(3, 0)
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_SCALE __GENMASK(11, 8)
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_CONTROL 0xE
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_BASE __GENMASK(3, 0)
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_SCALE __GENMASK(11, 8)
> +
> +/* CXL r4.0, 8.1.7: GPF DVSEC for CXL Device */
> +#define PCI_DVSEC_CXL_DEVICE_GPF 5
> +
> +/* CXL r4.0, 8.1.9: Register Locator DVSEC */
> +#define PCI_DVSEC_CXL_REG_LOCATOR 8
> +#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1 0xC
> +#define PCI_DVSEC_CXL_REG_LOCATOR_BIR __GENMASK(2, 0)
> +#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_ID __GENMASK(15, 8)
> +#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_OFF_LOW __GENMASK(31, 16)
>
> #endif /* LINUX_PCI_REGS_H */
> --
> 2.34.1
>
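As a reading aid for the rename above, here is a minimal userspace sketch of how the renamed mask and offset macros combine when a DVSEC range size is assembled, as in cxl_dvsec_rr_decode(). The SKETCH_* names, the local GENMASK stand-in, and sketch_range_size() are assumptions made for illustration; they are not kernel code and only mirror the values shown in the diff.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's __GENMASK(); an assumption of this sketch. */
#define SKETCH_GENMASK(h, l) \
	(((~UINT32_C(0)) >> (31 - (h))) & ((~UINT32_C(0)) << (l)))

/* Mirrors PCI_DVSEC_CXL_MEM_SIZE_LOW from the patch: bits 31:28. */
#define SKETCH_MEM_SIZE_LOW		SKETCH_GENMASK(31, 28)

/* Mirror PCI_DVSEC_CXL_RANGE_SIZE_{HIGH,LOW}(i): 0x10 bytes per range. */
#define SKETCH_RANGE_SIZE_HIGH(i)	(0x18 + ((i) * 0x10))
#define SKETCH_RANGE_SIZE_LOW(i)	(0x1C + ((i) * 0x10))

/*
 * Combine the two 32-bit config-space reads into a 64-bit size.
 * The low dword also carries status bits (for example the
 * "info valid" and "active" flags), so it is masked before use.
 */
static uint64_t sketch_range_size(uint32_t size_high, uint32_t size_low)
{
	uint64_t size = (uint64_t)size_high << 32;

	size |= size_low & SKETCH_MEM_SIZE_LOW;
	return size;
}
```

Dropping the _MASK and _OFFSET suffixes does not change this arithmetic; the same name is simply used either as an offset added to the DVSEC base or as a mask applied to the value read.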
* Re: [PATCH v14 03/34] PCI: Introduce pcie_is_cxl()
2026-01-14 18:20 ` [PATCH v14 03/34] PCI: Introduce pcie_is_cxl() Terry Bowman
2026-01-21 1:19 ` dan.j.williams
@ 2026-01-22 18:39 ` Bjorn Helgaas
1 sibling, 0 replies; 129+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 18:39 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Wed, Jan 14, 2026 at 12:20:24PM -0600, Terry Bowman wrote:
> CXL and AER drivers need the ability to identify CXL devices.
>
> Introduce set_pcie_cxl() with logic checking for CXL.mem or CXL.cache
> status in the CXL Flex Bus DVSEC status register. The CXL Flex Bus DVSEC
> presence is used because it is required for all the CXL PCIe devices.[1]
>
> Add boolean 'struct pci_dev::is_cxl' to cache the CXL.cache and
> CXL.mem status.
>
> Call set_pcie_cxl() for the parent bridge. Once a device is created there
> is a possibility the parent training or CXL state was updated as well. This
> will make certain the correct parent CXL state is cached.
>
> Add function pcie_is_cxl() to return 'struct pci_dev::is_cxl'.
>
> [1] CXL 3.1 Spec, 8.1.1 PCIe Designated Vendor-Specific Extended
> Capability (DVSEC) ID Assignment, Table 8-2
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>
> Changes in v13->v14:
> - Move FLEXBUS_STATUS DVSEC here (Jonathan)
> - Remove check for EP and USP (Dan)
> - Update commit message (Bjorn)
> - Fix writing past 80 columns (Bjorn)
> - Add pci_is_pcie() parent bridge check at beginning of function (Bjorn)
>
> Changes in v12->v13:
> - Add Ben's "reviewed-by"
>
> Changes in v11->v12:
> - Add review-by for Alejandro
> - Add comment in set_pcie_cxl() explaining why updating parent status.
>
> Changes in v10->v11:
> - Amend set_pcie_cxl() to check for Upstream Port's and EP's parent
> downstream port by calling set_pcie_cxl(). (Dan)
> - Retitle patch: 'Add' -> 'Introduce'
> - Add check for CXL.mem and CXL.cache (Alejandro, Dan)
> ---
> drivers/pci/probe.c | 31 +++++++++++++++++++++++++++++++
> include/linux/pci.h | 6 ++++++
> include/uapi/linux/pci_regs.h | 6 ++++++
> 3 files changed, 43 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 41183aed8f5d..bd7ce41d0c7a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1735,6 +1735,35 @@ static void set_pcie_thunderbolt(struct pci_dev *dev)
> dev->is_thunderbolt = 1;
> }
>
> +static void set_pcie_cxl(struct pci_dev *dev)
> +{
> + struct pci_dev *bridge;
> + u16 dvsec, cap;
> +
> + if (!pci_is_pcie(dev))
> + return;
> +
> + /*
> + * Update parent's CXL state because alternate protocol training
> + * may have changed
> + */
> + bridge = pci_upstream_bridge(dev);
> + if (bridge)
> + set_pcie_cxl(bridge);
> +
> + dvsec = pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
> + PCI_DVSEC_CXL_FLEXBUS_PORT);
> + if (!dvsec)
> + return;
> +
> + pci_read_config_word(dev, dvsec + PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS,
> + &cap);
> +
> + dev->is_cxl = FIELD_GET(PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_CACHE, cap) ||
> + FIELD_GET(PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_MEM, cap);
> +
> +}
> +
> static void set_pcie_untrusted(struct pci_dev *dev)
> {
> struct pci_dev *parent = pci_upstream_bridge(dev);
> @@ -2065,6 +2094,8 @@ int pci_setup_device(struct pci_dev *dev)
> /* Need to have dev->cfg_size ready */
> set_pcie_thunderbolt(dev);
>
> + set_pcie_cxl(dev);
> +
> set_pcie_untrusted(dev);
>
> if (pci_is_pcie(dev))
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 864775651c6f..f8e8b3df794d 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -463,6 +463,7 @@ struct pci_dev {
> unsigned int is_pciehp:1;
> unsigned int shpc_managed:1; /* SHPC owned by shpchp */
> unsigned int is_thunderbolt:1; /* Thunderbolt controller */
> + unsigned int is_cxl:1; /* Compute Express Link (CXL) */
> /*
> * Devices marked being untrusted are the ones that can potentially
> * execute DMA attacks and similar. They are typically connected
> @@ -791,6 +792,11 @@ static inline bool pci_is_display(struct pci_dev *pdev)
> return (pdev->class >> 16) == PCI_BASE_CLASS_DISPLAY;
> }
>
> +static inline bool pcie_is_cxl(struct pci_dev *pci_dev)
> +{
> + return pci_dev->is_cxl;
> +}
> +
> #define for_each_pci_bridge(dev, bus) \
> list_for_each_entry(dev, &bus->devices, bus_list) \
> if (!pci_is_bridge(dev)) {} else
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 662582bdccf0..b6622fd60fd9 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -1379,6 +1379,12 @@
> /* CXL r4.0, 8.1.7: GPF DVSEC for CXL Device */
> #define PCI_DVSEC_CXL_DEVICE_GPF 5
>
> +/* CXL r4.0, 8.1.8: Flex Bus DVSEC */
> +#define PCI_DVSEC_CXL_FLEXBUS_PORT 7
> +#define PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS 0xE
> +#define PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_CACHE _BITUL(0)
> +#define PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_MEM _BITUL(2)
> +
> /* CXL r4.0, 8.1.9: Register Locator DVSEC */
> #define PCI_DVSEC_CXL_REG_LOCATOR 8
> #define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1 0xC
> --
> 2.34.1
>
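The is_cxl decision in set_pcie_cxl() reduces to a test on two status bits. A minimal userspace sketch of that test follows, assuming the bit positions given in the patch (bit 0 for CXL.cache, bit 2 for CXL.mem). The SKETCH_* names and sketch_status_is_cxl() are hypothetical and stand in for the FIELD_GET() calls in the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirror PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_{CACHE,MEM} from the patch. */
#define SKETCH_FLEXBUS_STATUS_CACHE	(UINT16_C(1) << 0)
#define SKETCH_FLEXBUS_STATUS_MEM	(UINT16_C(1) << 2)

/*
 * A device is treated as CXL when either CXL.cache or CXL.mem is
 * reported in the Flex Bus Port status word. Other bits are ignored.
 */
static bool sketch_status_is_cxl(uint16_t status)
{
	return (status & SKETCH_FLEXBUS_STATUS_CACHE) ||
	       (status & SKETCH_FLEXBUS_STATUS_MEM);
}
```

In the patch the result is cached in 'struct pci_dev::is_cxl' and re-evaluated for the upstream bridge, so the sketch covers only the bit test, not the parent walk.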
* Re: [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
2026-01-14 18:20 ` [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error() Terry Bowman
2026-01-14 19:08 ` Jonathan Cameron
2026-01-20 2:20 ` dan.j.williams
@ 2026-01-22 18:48 ` Bjorn Helgaas
2 siblings, 0 replies; 129+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 18:48 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Wed, Jan 14, 2026 at 12:20:31PM -0600, Terry Bowman wrote:
> The AER driver includes significant logic for handling CXL protocol errors.
> The AER driver will be updated in the future to separate the AER and CXL
> logic.
>
> Rename the is_internal_error() function to is_aer_internal_error() as it
> gives a more precise indication of the purpose. Make is_aer_internal_error()
> non-static to allow for other PCI drivers to access.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Personally I would put "aer_" at the beginning, i.e.,
"aer_is_internal_error()" to match other AER functions.
But either is OK.
> ---
>
> Changes in v13->v14:
> - New patch
> ---
> drivers/pci/pcie/aer.c | 4 ++--
> drivers/pci/pcie/portdrv.h | 9 +++++++++
> 2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 63658e691aa2..2527e8370186 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1166,7 +1166,7 @@ static bool is_cxl_mem_dev(struct pci_dev *dev)
> return true;
> }
>
> -static bool is_internal_error(struct aer_err_info *info)
> +bool is_aer_internal_error(struct aer_err_info *info)
> {
> if (info->severity == AER_CORRECTABLE)
> return info->status & PCI_ERR_COR_INTERNAL;
> @@ -1211,7 +1211,7 @@ static void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> * device driver.
> */
> if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> - is_internal_error(info))
> + is_aer_internal_error(info))
> pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
> }
>
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index bd29d1cc7b8b..e7a0a2cffea9 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -123,4 +123,13 @@ static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {}
> #endif /* !CONFIG_PCIE_PME */
>
> struct device *pcie_port_find_device(struct pci_dev *dev, u32 service);
> +
> +struct aer_err_info;
> +
> +#ifdef CONFIG_PCIEAER_CXL
> +bool is_aer_internal_error(struct aer_err_info *info);
> +#else
> +static inline bool is_aer_internal_error(struct aer_err_info *info) { return false; }
> +#endif /* CONFIG_PCIEAER_CXL */
> +
> #endif /* _PORTDRV_H_ */
> --
> 2.34.1
>
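For context, the check being renamed selects a status bit based on severity. The sketch below is a userspace approximation: the correctable branch follows the hunk quoted above, while the uncorrectable branch, the enum ordering, and the bit values (0x00004000 and 0x00400000) are assumptions taken from memory of the kernel headers rather than from this patch. All sketch_* and SKETCH_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values of PCI_ERR_COR_INTERNAL and PCI_ERR_UNC_INTN. */
#define SKETCH_ERR_COR_INTERNAL	0x00004000u
#define SKETCH_ERR_UNC_INTN	0x00400000u

/* Stand-in for the AER severity values; the ordering is an assumption. */
enum sketch_severity {
	SKETCH_NONFATAL,
	SKETCH_FATAL,
	SKETCH_CORRECTABLE,
};

/* Reduced stand-in for struct aer_err_info: only the fields used here. */
struct sketch_err_info {
	enum sketch_severity severity;
	uint32_t status;
};

/*
 * Correctable errors are checked against the Corrected Internal Error
 * bit; everything else against the Uncorrectable Internal Error bit.
 */
static bool sketch_is_internal_error(const struct sketch_err_info *info)
{
	if (info->severity == SKETCH_CORRECTABLE)
		return info->status & SKETCH_ERR_COR_INTERNAL;

	return info->status & SKETCH_ERR_UNC_INTN;
}
```

Whichever name is chosen, is_aer_internal_error() or aer_is_internal_error(), the logic stays the same; only the symbol's visibility and name change in this patch.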
* Re: [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors()
2026-01-14 18:20 ` [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman
` (3 preceding siblings ...)
2026-01-20 2:09 ` dan.j.williams
@ 2026-01-22 18:49 ` Bjorn Helgaas
4 siblings, 0 replies; 129+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 18:49 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Wed, Jan 14, 2026 at 12:20:30PM -0600, Terry Bowman wrote:
> Internal PCIe errors are not enabled by default during initialization. This
> creates a problem for CXL drivers, which rely on PCIe Correctable and
> Uncorrectable Internal Errors to receive CXL protocol error notifications.
>
> Export pci_aer_unmask_internal_errors() so CXL and other drivers can
> enable internal PCIe errors.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>
> Changes in v13->v14:
> - New commit. Bjorn requested separating out and adding immediately
> before being used. This is called from cxl_rch_enable_rcec() in
> following patch.
> ---
> drivers/pci/pcie/aer.c | 6 +++---
> include/linux/aer.h | 2 ++
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index c99ba2a1159c..63658e691aa2 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1120,8 +1120,6 @@ static bool find_source_device(struct pci_dev *parent,
> return true;
> }
>
> -#ifdef CONFIG_PCIEAER_CXL
> -
> /**
> * pci_aer_unmask_internal_errors - unmask internal errors
> * @dev: pointer to the pci_dev data structure
> @@ -1132,7 +1130,7 @@ static bool find_source_device(struct pci_dev *parent,
> * Note: AER must be enabled and supported by the device which must be
> * checked in advance, e.g. with pcie_aer_is_native().
> */
> -static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> +void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> {
> int aer = dev->aer_cap;
> u32 mask;
> @@ -1145,7 +1143,9 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> mask &= ~PCI_ERR_COR_INTERNAL;
> pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
> }
> +EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors);
>
> +#ifdef CONFIG_PCIEAER_CXL
> static bool is_cxl_mem_dev(struct pci_dev *dev)
> {
> /*
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 02940be66324..df0f5c382286 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -56,12 +56,14 @@ struct aer_capability_regs {
> #if defined(CONFIG_PCIEAER)
> int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> int pcie_aer_is_native(struct pci_dev *dev);
> +void pci_aer_unmask_internal_errors(struct pci_dev *dev);
> #else
> static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> {
> return -EINVAL;
> }
> static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
> +static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
> #endif
>
> void pci_print_aer(struct pci_dev *dev, int aer_severity,
> --
> 2.34.1
>
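The effect of the exported helper can be modelled without config-space access. The sketch below treats the two AER mask registers as plain fields and clears only the internal-error bit in each. The correctable-mask step follows the hunk quoted above; the uncorrectable-mask step and both bit values are assumptions from memory of the kernel headers. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed values of PCI_ERR_COR_INTERNAL and PCI_ERR_UNC_INTN. */
#define SKETCH_ERR_COR_INTERNAL	0x00004000u
#define SKETCH_ERR_UNC_INTN	0x00400000u

/* Stand-in for the AER Uncorrectable and Correctable Error Mask registers. */
struct sketch_aer_masks {
	uint32_t uncor_mask;
	uint32_t cor_mask;
};

/*
 * Read-modify-write of each mask register: a set bit masks the error,
 * so clearing the internal-error bit enables reporting. All other
 * mask bits are left as they were.
 */
static void sketch_unmask_internal_errors(struct sketch_aer_masks *m)
{
	m->uncor_mask &= ~SKETCH_ERR_UNC_INTN;
	m->cor_mask &= ~SKETCH_ERR_COR_INTERNAL;
}
```

As the kernel-doc in the patch notes, the real helper expects the caller to have confirmed that AER is supported and natively owned, for example with pcie_aer_is_native(); the sketch does not model that precondition.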
* Re: [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors()
2026-01-22 16:48 ` dan.j.williams
@ 2026-01-22 18:51 ` Lukas Wunner
0 siblings, 0 replies; 129+ messages in thread
From: Lukas Wunner @ 2026-01-22 18:51 UTC (permalink / raw)
To: dan.j.williams
Cc: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Thu, Jan 22, 2026 at 08:48:36AM -0800, dan.j.williams@intel.com wrote:
> Lukas Wunner wrote:
> > As said, the "xe" driver needs to unmask Internal Errors and could
> > take advantage of this helper, so I'd call opening this up for PCI
> > drivers if not a goal then at least a "desirable side effect". ;)
> >
> > https://lore.kernel.org/all/aR1_M_i3yIygd8v-@wunner.de/
>
> I missed that earlier. How did xe manage to be the only device in the history
> of Linux that needs internal errors unmasked?
>
> What happens if Linux says "no, that error model has never been supported and
> it creates an ongoing mental / maintenance load; internal errors do not
> matter for PCIe, only CXL (except xe)."
Every spec-defined feature is fair game to be used by drivers and
denying them support would seem unreasonable.
> > > -+EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors);
> > >
> > > ++/*
> > > ++ * Internal errors are too device-specific to enable generally, however for CXL
> > > ++ * their behavior is standardized for conveying CXL protocol errors.
> > > ++ */
> > > ++EXPORT_SYMBOL_FOR_MODULES(pci_aer_unmask_internal_errors, "cxl_core");
> > > ++
> >
> > This change will require touching aer.c every time a driver
> > (such as xe) has the need to unmask Internal Errors.
> > Not sure if that's such a good idea...
>
> The xe driver can always come back and change this to plain
> EXPORT_SYMBOL_GPL() once they clear the hurdle above of,
> "please reconsider your error model to not require this
> never needed before feature of AER".
Of course, but it may annoy Bjorn that he'll have to deal with
an amendment of this macro for each individual driver that needs it,
or at least for the *next* driver that needs it.
Thanks,
Lukas
* Re: [PATCH v14 11/34] PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c
2026-01-14 18:20 ` [PATCH v14 11/34] PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c Terry Bowman
2026-01-22 17:23 ` Markus Elfring
@ 2026-01-22 18:53 ` Bjorn Helgaas
1 sibling, 0 replies; 129+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 18:53 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Wed, Jan 14, 2026 at 12:20:32PM -0600, Terry Bowman wrote:
> The Restricted CXL Host (RCH) AER error handling logic currently resides
> in the AER driver file, aer.c. CXL specific changes are conditionally
> compiled using #ifdefs.
>
> Improve the AER driver maintainability by separating the RCH specific logic
> from the AER driver's core functionality and removing the ifdefs. Introduce
> drivers/pci/pcie/aer_cxl_rch.c to hold the RCH AER logic. Conditionally
> compile the file using the CONFIG_CXL_RCH_RAS Kconfig.
>
> Move the CXL logic into the new file but leave the CXL helper function
> is_internal_error() in aer.c for now, as it will be moved in a future patch
> for CXL Virtual Hierarchy handling.
>
> To maintain compilation after the move other changes are required. Change
> cxl_rch_handle_error(), cxl_rch_enable_rcec(), and is_internal_error() to
> be non-static inorder for accessing from the AER driver.
s/inorder for accessing from the/so they can be used by the/
> Update the new file with the SPDX and 2023 AMD copyright notations because
> the RCH bits were initially contributed in 2023 by AMD. See commit:
> commit 0a867568bb0d ("PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler")
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>
> Changes in v13->v14:
> - Add review-by and signed-off for Dan
> - Commit message fixup (Dan)
> - Update commit message with use-case description (Dan, Lukas)
> - Make cxl_error_is_native() static (Dan)
>
> Changes in v12->v13:
> - Add forward declaration of 'struct aer_err_info' in pci/pci.h (Terry)
> - Changed copyright date from 2025 to 2023 (Jonathan)
> - Add David Jiang's, Jonathan's, and Ben's review-by
> - Re-add 'struct aer_err_info' (Bot)
>
> Changes in v11->v12:
> - Rename drivers/pci/pcie/cxl_rch.c to drivers/pci/pcie/aer_cxl_rch.c (Lukas)
> - Removed forward declaration of 'struct aer_err_info' in pci/pci.h (Terry)
>
> Changes in v10->v11:
> - Remove changes in code-split and move to earlier, new patch
> - Add #include <linux/bitfield.h> to cxl_ras.c
> - Move cxl_rch_handle_error() & cxl_rch_enable_rcec() declarations from pci.h
> to aer.h, more localized.
> - Introduce CONFIG_CXL_RCH_RAS, includes Makefile changes, ras.c
> ifdef changes
> ---
> drivers/pci/pcie/Makefile | 1 +
> drivers/pci/pcie/aer.c | 99 +-----------------------------
> drivers/pci/pcie/aer_cxl_rch.c | 106 +++++++++++++++++++++++++++++++++
> drivers/pci/pcie/portdrv.h | 9 ++-
> 4 files changed, 114 insertions(+), 101 deletions(-)
> create mode 100644 drivers/pci/pcie/aer_cxl_rch.c
>
> diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
> index 173829aa02e6..b0b43a18c304 100644
> --- a/drivers/pci/pcie/Makefile
> +++ b/drivers/pci/pcie/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_PCIEPORTBUS) += pcieportdrv.o bwctrl.o
>
> obj-y += aspm.o
> obj-$(CONFIG_PCIEAER) += aer.o err.o tlp.o
> +obj-$(CONFIG_CXL_RAS) += aer_cxl_rch.o
> obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o
> obj-$(CONFIG_PCIE_PME) += pme.o
> obj-$(CONFIG_PCIE_DPC) += dpc.o
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 2527e8370186..b1e6ee7468b9 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1145,27 +1145,7 @@ void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> }
> EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors);
>
> -#ifdef CONFIG_PCIEAER_CXL
> -static bool is_cxl_mem_dev(struct pci_dev *dev)
> -{
> - /*
> - * The capability, status, and control fields in Device 0,
> - * Function 0 DVSEC control the CXL functionality of the
> - * entire device (CXL 3.0, 8.1.3).
> - */
> - if (dev->devfn != PCI_DEVFN(0, 0))
> - return false;
> -
> - /*
> - * CXL Memory Devices must have the 502h class code set (CXL
> - * 3.0, 8.1.12.1).
> - */
> - if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> - return false;
> -
> - return true;
> -}
> -
> +#ifdef CONFIG_CXL_RAS
> bool is_aer_internal_error(struct aer_err_info *info)
> {
> if (info->severity == AER_CORRECTABLE)
> @@ -1173,83 +1153,6 @@ bool is_aer_internal_error(struct aer_err_info *info)
>
> return info->status & PCI_ERR_UNC_INTN;
> }
> -
> -static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
> -{
> - struct aer_err_info *info = (struct aer_err_info *)data;
> - const struct pci_error_handlers *err_handler;
> -
> - if (!is_cxl_mem_dev(dev) || !pcie_aer_is_native(dev))
> - return 0;
> -
> - /* Protect dev->driver */
> - device_lock(&dev->dev);
> -
> - err_handler = dev->driver ? dev->driver->err_handler : NULL;
> - if (!err_handler)
> - goto out;
> -
> - if (info->severity == AER_CORRECTABLE) {
> - if (err_handler->cor_error_detected)
> - err_handler->cor_error_detected(dev);
> - } else if (err_handler->error_detected) {
> - if (info->severity == AER_NONFATAL)
> - err_handler->error_detected(dev, pci_channel_io_normal);
> - else if (info->severity == AER_FATAL)
> - err_handler->error_detected(dev, pci_channel_io_frozen);
> - }
> -out:
> - device_unlock(&dev->dev);
> - return 0;
> -}
> -
> -static void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> -{
> - /*
> - * Internal errors of an RCEC indicate an AER error in an
> - * RCH's downstream port. Check and handle them in the CXL.mem
> - * device driver.
> - */
> - if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> - is_aer_internal_error(info))
> - pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
> -}
> -
> -static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
> -{
> - bool *handles_cxl = data;
> -
> - if (!*handles_cxl)
> - *handles_cxl = is_cxl_mem_dev(dev) && pcie_aer_is_native(dev);
> -
> - /* Non-zero terminates iteration */
> - return *handles_cxl;
> -}
> -
> -static bool handles_cxl_errors(struct pci_dev *rcec)
> -{
> - bool handles_cxl = false;
> -
> - if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC &&
> - pcie_aer_is_native(rcec))
> - pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl);
> -
> - return handles_cxl;
> -}
> -
> -static void cxl_rch_enable_rcec(struct pci_dev *rcec)
> -{
> - if (!handles_cxl_errors(rcec))
> - return;
> -
> - pci_aer_unmask_internal_errors(rcec);
> - pci_info(rcec, "CXL: Internal errors unmasked");
> -}
> -
> -#else
> -static inline void cxl_rch_enable_rcec(struct pci_dev *dev) { }
> -static inline void cxl_rch_handle_error(struct pci_dev *dev,
> - struct aer_err_info *info) { }
> #endif
>
> /**
> diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
> new file mode 100644
> index 000000000000..6b515edb12c1
> --- /dev/null
> +++ b/drivers/pci/pcie/aer_cxl_rch.c
> @@ -0,0 +1,106 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2023 AMD Corporation. All rights reserved. */
> +
> +#include <linux/pci.h>
> +#include <linux/aer.h>
> +#include <linux/bitfield.h>
> +#include "../pci.h"
> +#include "portdrv.h"
> +
> +static bool is_cxl_mem_dev(struct pci_dev *dev)
> +{
> + /*
> + * The capability, status, and control fields in Device 0,
> + * Function 0 DVSEC control the CXL functionality of the
> + * entire device (CXL 3.0, 8.1.3).
> + */
> + if (dev->devfn != PCI_DEVFN(0, 0))
> + return false;
> +
> + /*
> + * CXL Memory Devices must have the 502h class code set (CXL
> + * 3.0, 8.1.12.1).
> + */
> + if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> + return false;
> +
> + return true;
> +}
> +
> +static bool cxl_error_is_native(struct pci_dev *dev)
> +{
> + struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
> +
> + return (pcie_ports_native || host->native_aer);
> +}
> +
> +static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
> +{
> + struct aer_err_info *info = (struct aer_err_info *)data;
> + const struct pci_error_handlers *err_handler;
> +
> + if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
> + return 0;
> +
> + device_lock(&dev->dev);
> +
> + err_handler = dev->driver ? dev->driver->err_handler : NULL;
> + if (!err_handler)
> + goto out;
> +
> + if (info->severity == AER_CORRECTABLE) {
> + if (err_handler->cor_error_detected)
> + err_handler->cor_error_detected(dev);
> + } else if (err_handler->error_detected) {
> + if (info->severity == AER_NONFATAL)
> + err_handler->error_detected(dev, pci_channel_io_normal);
> + else if (info->severity == AER_FATAL)
> + err_handler->error_detected(dev, pci_channel_io_frozen);
> + }
> +out:
> + device_unlock(&dev->dev);
> + return 0;
> +}
> +
> +void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> +{
> + /*
> + * Internal errors of an RCEC indicate an AER error in an
> + * RCH's downstream port. Check and handle them in the CXL.mem
> + * device driver.
> + */
> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> + is_aer_internal_error(info))
> + pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
> +}
> +
> +static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
> +{
> + bool *handles_cxl = data;
> +
> + if (!*handles_cxl)
> + *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev);
> +
> + /* Non-zero terminates iteration */
> + return *handles_cxl;
> +}
> +
> +static bool handles_cxl_errors(struct pci_dev *rcec)
> +{
> + bool handles_cxl = false;
> +
> + if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC &&
> + pcie_aer_is_native(rcec))
> + pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl);
> +
> + return handles_cxl;
> +}
> +
> +void cxl_rch_enable_rcec(struct pci_dev *rcec)
> +{
> + if (!handles_cxl_errors(rcec))
> + return;
> +
> + pci_aer_unmask_internal_errors(rcec);
> + pci_info(rcec, "CXL: Internal errors unmasked");
> +}
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index e7a0a2cffea9..cc58bf2f2c84 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -126,10 +126,13 @@ struct device *pcie_port_find_device(struct pci_dev *dev, u32 service);
>
> struct aer_err_info;
>
> -#ifdef CONFIG_PCIEAER_CXL
> +#ifdef CONFIG_CXL_RAS
> bool is_aer_internal_error(struct aer_err_info *info);
> +void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info);
> +void cxl_rch_enable_rcec(struct pci_dev *rcec);
> #else
> static inline bool is_aer_internal_error(struct aer_err_info *info) { return false; }
> -#endif /* CONFIG_PCIEAER_CXL */
> -
> +static inline void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info) { }
> +static inline void cxl_rch_enable_rcec(struct pci_dev *rcec) { }
> +#endif /* CONFIG_CXL_RAS */
> #endif /* _PORTDRV_H_ */
> --
> 2.34.1
>
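For readers following the move, the severity dispatch in cxl_rch_handle_error_iter() can be sketched in user space as below. This is only an illustration under stated assumptions: the enum names, struct err_handlers, dispatch_rch_error(), and the call_log recorder are hypothetical stand-ins for the kernel's AER severities, pci_channel_state values, and struct pci_error_handlers. The device lock and the is_cxl_mem_dev()/cxl_error_is_native() checks are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's AER severities and channel states. */
enum sev { SEV_CORRECTABLE, SEV_NONFATAL, SEV_FATAL };
enum chan_state { CHAN_IO_NORMAL, CHAN_IO_FROZEN };

/* Hypothetical stand-in for the two pci_error_handlers callbacks used here. */
struct err_handlers {
	void (*cor_error_detected)(void *dev);
	void (*error_detected)(void *dev, enum chan_state state);
};

/* Records which callback ran so the dispatch can be checked. */
struct call_log {
	int cor_calls;
	int uncor_calls;
	enum chan_state last_state;
};

static void log_cor(void *dev)
{
	((struct call_log *)dev)->cor_calls++;
}

static void log_uncor(void *dev, enum chan_state state)
{
	struct call_log *log = dev;

	log->uncor_calls++;
	log->last_state = state;
}

/*
 * Mirrors the branch structure of the quoted function: correctable errors
 * go to cor_error_detected(), non-fatal errors report a normal channel,
 * and fatal errors report a frozen channel.
 * Returns 1 if a callback was invoked, 0 otherwise.
 */
static int dispatch_rch_error(const struct err_handlers *h, void *dev,
			      enum sev severity)
{
	if (!h)
		return 0;

	if (severity == SEV_CORRECTABLE) {
		if (!h->cor_error_detected)
			return 0;
		h->cor_error_detected(dev);
		return 1;
	}

	if (!h->error_detected)
		return 0;

	if (severity == SEV_NONFATAL)
		h->error_detected(dev, CHAN_IO_NORMAL);
	else if (severity == SEV_FATAL)
		h->error_detected(dev, CHAN_IO_FROZEN);
	else
		return 0;

	return 1;
}
```

The sketch returns whether a handler ran only so the behaviour can be checked; the quoted kernel function always returns 0 so that pcie_walk_rcec() keeps iterating.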
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 01/34] PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
2026-01-14 18:20 ` [PATCH v14 01/34] PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h Terry Bowman
@ 2026-01-22 18:58 ` Bjorn Helgaas
2026-01-22 19:43 ` Bowman, Terry
0 siblings, 1 reply; 129+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 18:58 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Wed, Jan 14, 2026 at 12:20:22PM -0600, Terry Bowman wrote:
> The CXL DVSECs are currently defined in cxl/core/cxlpci.h. These are not
> accessible to other subsystems. Move these to uapi/linux/pci_regs.h.
I'm looking a little bit askance at adding things to
uapi/linux/pci_regs.h and then renaming them. I know it's OCD to
worry about that momentary blip, but changes in uapi potentially break
userspace.
Maybe we could rename them first, then move them to pci_regs.h?
Either way:
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> The CXL DVSEC definitions will be renamed and reformatted to fit better
> with existing defines.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>
> ----
>
> Changes in v13->v14:
> - Add Jonathan's and Dan's review-by
> - Update commit title prefix (Bjorn)
> - Revert format fix for cxl_sbr_masked() (Jonathan)
> - Update 'Compute Express Link' comment block (Jonathan)
> - Move PCI_DVSEC_CXL_FLEXBUS definitions to later patch where
> used (Jonathan)
> - Removed stray change (Bjorn)
>
> Changes in v12->v13:
> - Add Dave Jiang's reviewed-by
> - Remove changes to existing PCI_DVSEC_CXL_PORT* defines. Update commit
> message. (Jonathan)
>
> Changes in v11 -> v12:
> - Change formatting to be same as existing definitions
> - Change GENMASK() -> __GENMASK() and BIT() to _BITUL()
>
> Changes in v10 -> v11:
> - New commit
> ---
> drivers/cxl/cxlpci.h | 53 -----------------------------
> include/uapi/linux/pci_regs.h | 64 ++++++++++++++++++++++++++++++++---
> 2 files changed, 59 insertions(+), 58 deletions(-)
>
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 1d526bea8431..cdb7cf3dbcb4 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -7,59 +7,6 @@
>
> #define CXL_MEMORY_PROGIF 0x10
>
> -/*
> - * See section 8.1 Configuration Space Registers in the CXL 2.0
> - * Specification. Names are taken straight from the specification with "CXL" and
> - * "DVSEC" redundancies removed. When obvious, abbreviations may be used.
> - */
> -#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
> -
> -/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
> -#define CXL_DVSEC_PCIE_DEVICE 0
> -#define CXL_DVSEC_CAP_OFFSET 0xA
> -#define CXL_DVSEC_MEM_CAPABLE BIT(2)
> -#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
> -#define CXL_DVSEC_CTRL_OFFSET 0xC
> -#define CXL_DVSEC_MEM_ENABLE BIT(2)
> -#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
> -#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
> -#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
> -#define CXL_DVSEC_MEM_ACTIVE BIT(1)
> -#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
> -#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
> -#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
> -#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
> -
> -#define CXL_DVSEC_RANGE_MAX 2
> -
> -/* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
> -#define CXL_DVSEC_FUNCTION_MAP 2
> -
> -/* CXL 2.0 8.1.5: CXL 2.0 Extensions DVSEC for Ports */
> -#define CXL_DVSEC_PORT_EXTENSIONS 3
> -
> -/* CXL 2.0 8.1.6: GPF DVSEC for CXL Port */
> -#define CXL_DVSEC_PORT_GPF 4
> -#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
> -#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK GENMASK(3, 0)
> -#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK GENMASK(11, 8)
> -#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
> -#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK GENMASK(3, 0)
> -#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK GENMASK(11, 8)
> -
> -/* CXL 2.0 8.1.7: GPF DVSEC for CXL Device */
> -#define CXL_DVSEC_DEVICE_GPF 5
> -
> -/* CXL 2.0 8.1.8: PCIe DVSEC for Flex Bus Port */
> -#define CXL_DVSEC_PCIE_FLEXBUS_PORT 7
> -
> -/* CXL 2.0 8.1.9: Register Locator DVSEC */
> -#define CXL_DVSEC_REG_LOCATOR 8
> -#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
> -#define CXL_DVSEC_REG_LOCATOR_BIR_MASK GENMASK(2, 0)
> -#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
> -#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
> -
> /*
> * NOTE: Currently all the functions which are enabled for CXL require their
> * vectors to be in the first 16. Use this as the default max.
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 3add74ae2594..6c4b6f19b18e 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -1253,11 +1253,6 @@
> #define PCI_DEV3_STA 0x0c /* Device 3 Status Register */
> #define PCI_DEV3_STA_SEGMENT 0x8 /* Segment Captured (end-to-end flit-mode detected) */
>
> -/* Compute Express Link (CXL r3.1, sec 8.1.5) */
> -#define PCI_DVSEC_CXL_PORT 3
> -#define PCI_DVSEC_CXL_PORT_CTL 0x0c
> -#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
> -
> /* Integrity and Data Encryption Extended Capability */
> #define PCI_IDE_CAP 0x04
> #define PCI_IDE_CAP_LINK 0x1 /* Link IDE Stream Supported */
> @@ -1338,4 +1333,63 @@
> #define PCI_IDE_SEL_ADDR_3(x) (28 + (x) * PCI_IDE_SEL_ADDR_BLOCK_SIZE)
> #define PCI_IDE_SEL_BLOCK_SIZE(nr_assoc) (20 + PCI_IDE_SEL_ADDR_BLOCK_SIZE * (nr_assoc))
>
> +/* Compute Express Link (CXL r3.1, sec 8.1.5) */
> +#define PCI_DVSEC_CXL_PORT 3
> +#define PCI_DVSEC_CXL_PORT_CTL 0x0c
> +#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
> +
> +/*
> + * Compute Express Link (CXL r3.2, sec 8.1)
> + *
> + * Note that CXL DVSEC id 3 and 7 to be ignored when the CXL link state
> + * is "disconnected" (CXL r3.2, sec 9.12.3). Re-enumerate these
> + * registers on downstream link-up events.
> + */
> +#define PCI_DVSEC_HEADER1_LENGTH_MASK __GENMASK(31, 20)
> +
> +/* CXL 3.2 8.1.3: PCIe DVSEC for CXL Device */
> +#define CXL_DVSEC_PCIE_DEVICE 0
> +#define CXL_DVSEC_CAP_OFFSET 0xA
> +#define CXL_DVSEC_MEM_CAPABLE _BITUL(2)
> +#define CXL_DVSEC_HDM_COUNT_MASK __GENMASK(5, 4)
> +#define CXL_DVSEC_CTRL_OFFSET 0xC
> +#define CXL_DVSEC_MEM_ENABLE _BITUL(2)
> +#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
> +#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
> +#define CXL_DVSEC_MEM_INFO_VALID _BITUL(0)
> +#define CXL_DVSEC_MEM_ACTIVE _BITUL(1)
> +#define CXL_DVSEC_MEM_SIZE_LOW_MASK __GENMASK(31, 28)
> +#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
> +#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
> +#define CXL_DVSEC_MEM_BASE_LOW_MASK __GENMASK(31, 28)
> +
> +#define CXL_DVSEC_RANGE_MAX 2
> +
> +/* CXL 3.2 8.1.4: Non-CXL Function Map DVSEC */
> +#define CXL_DVSEC_FUNCTION_MAP 2
> +
> +/* CXL 3.2 8.1.5: Extensions DVSEC for Ports */
> +#define CXL_DVSEC_PORT 3
> +#define CXL_DVSEC_PORT_CTL 0x0c
> +#define CXL_DVSEC_PORT_CTL_UNMASK_SBR 0x00000001
> +
> +/* CXL 3.2 8.1.6: GPF DVSEC for CXL Port */
> +#define CXL_DVSEC_PORT_GPF 4
> +#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
> +#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK __GENMASK(3, 0)
> +#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK __GENMASK(11, 8)
> +#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
> +#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK __GENMASK(3, 0)
> +#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK __GENMASK(11, 8)
> +
> +/* CXL 3.2 8.1.7: GPF DVSEC for CXL Device */
> +#define CXL_DVSEC_DEVICE_GPF 5
> +
> +/* CXL 3.2 8.1.9: Register Locator DVSEC */
> +#define CXL_DVSEC_REG_LOCATOR 8
> +#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
> +#define CXL_DVSEC_REG_LOCATOR_BIR_MASK __GENMASK(2, 0)
> +#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK __GENMASK(15, 8)
> +#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK __GENMASK(31, 16)
> +
> #endif /* LINUX_PCI_REGS_H */
> --
> 2.34.1
>
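As a rough illustration of how mask-style definitions like the ones quoted above are typically consumed, here is a user-space sketch. It is not taken from the patch: EX_GENMASK32(), the EX_* constants, and ex_field_get() are hypothetical local stand-ins for __GENMASK(), _BITUL(), and the kernel's FIELD_GET(), and the register values in the usage note are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit stand-in for __GENMASK(h, l): bits h..l set. */
#define EX_GENMASK32(h, l) \
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Local copies of three of the quoted definitions, for illustration only. */
#define EX_DVSEC_HEADER1_LENGTH_MASK	EX_GENMASK32(31, 20)
#define EX_DVSEC_MEM_CAPABLE		(UINT32_C(1) << 2)
#define EX_DVSEC_HDM_COUNT_MASK		EX_GENMASK32(5, 4)

/*
 * Extract the field selected by a non-zero, contiguous mask. Dividing by
 * the lowest set bit of the mask is equivalent to shifting the masked
 * value down to bit 0.
 */
static uint32_t ex_field_get(uint32_t mask, uint32_t reg)
{
	uint32_t lowest_bit = mask & (~mask + 1);

	return (reg & mask) / lowest_bit;
}
```

With a made-up DVSEC Header 1 value of 0x03C11E98, ex_field_get(EX_DVSEC_HEADER1_LENGTH_MASK, ...) yields 0x3C, and a made-up capability value of 0x14 has EX_DVSEC_MEM_CAPABLE set with an HDM count field of 1.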
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
2026-01-22 13:34 ` Lukas Wunner
@ 2026-01-22 19:09 ` dan.j.williams
2026-01-22 19:32 ` Lukas Wunner
0 siblings, 1 reply; 129+ messages in thread
From: dan.j.williams @ 2026-01-22 19:09 UTC (permalink / raw)
To: Lukas Wunner, dan.j.williams
Cc: Jonathan Cameron, Terry Bowman, dave, dave.jiang,
alison.schofield, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
Lukas Wunner wrote:
> On Thu, Jan 15, 2026 at 12:42:36PM -0800, dan.j.williams@intel.com wrote:
> > I agree with the general sentiment, but not the conclusion, especially
> > because this is a private detail. Linux has long ignored internal
> > errors. The only reason to consider them now is because CXL decided to
> > multiplex its error model on top of this oft-ignored feature of PCIe
> > AER.
> >
> > Specifically, portdrv.h is not in the global include namespace, this is
> > a private detail of the only consumer of internal errors:
> > drivers/pci/pcie/aer_cxl_{rch,vh}.c
> >
> > At most we should have this as a comment to clarify:
> >
> > /*
> > * Note, internal errors are only considered for the CXL error model,
> > * not for other implementations.
> > */
> >
> > ...and the pci_aer_unmask_internal_errors() export should be:
> >
> > EXPORT_SYMBOL_FOR_MODULES(pci_aer_unmask_internal_errors, "cxl_core")
> >
> > ...for the same reason. Steer folks away from thinking that it is open
> > season for adding more internal error support.
>
> It's not like Internal Errors are a bad thing per se. They're a way
> to signal "other" errors besides the spec-defined ones.
>
> As an example, and I'm keeping this in general terms to avoid divulging
> information about future products, a device possessing ECC RAM may raise
> a Correctable Internal Error when ECC successfully recovers from flipped
> bits because it allows alerting the user in advance that the device might
> need to be replaced in the near future. If ECC recovery fails, the device
> might try to use a reserved spare portion of RAM in lieu of the failing one
> and instruct the AER driver to recover through a bus reset. Such errors
> are not covered by the spec-defined types. Using the Internal Error type
> is the only possibility it seems.
The Internal Error type is a poor fit for that. This ECC RAM scenario is simply
an internal device event, not a PCIe visible error case. Consider that CXL
Memory Expanders are nothing if not "devices possessing ECC RAM" that may
encounter correctable errors in that RAM. Yes, the user has need for those
correctable errors to be reported, and no, PCIe AER has no reason to care about
conveying those reports. CXL bypasses AER for internal ECC RAM events.
PCIe AER only notices device-internal ECC RAM events in the case where a PCIe
transaction encounters an error. For example, a completer abort attempting to
pull from bad RAM.
So if CXL saw no need to architect internal ECC events into AER, why does Xe
think it is special in this regard?
The CXL solution is simply a typical device interrupt that notifies new entries
in the device event log. See trace_cxl_dram() and trace_cxl_general_media() for
that event handling.
> My point is, there are valid (upcoming, not theoretical) use cases for
> Internal Errors and creating infrastructure in the kernel to take advantage
> of them is a good thing. Hence my continued pushing back on hiding or
> discouraging their use.
It is fine to look ahead, but I would not go so far as to pull in future
requirements into a present patch set. Especially when those future
requirements are suspect.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
2026-01-22 19:09 ` dan.j.williams
@ 2026-01-22 19:32 ` Lukas Wunner
2026-01-22 21:32 ` dan.j.williams
0 siblings, 1 reply; 129+ messages in thread
From: Lukas Wunner @ 2026-01-22 19:32 UTC (permalink / raw)
To: dan.j.williams
Cc: Jonathan Cameron, Terry Bowman, dave, dave.jiang,
alison.schofield, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On Thu, Jan 22, 2026 at 11:09:48AM -0800, dan.j.williams@intel.com wrote:
> Lukas Wunner wrote:
> > a device possessing ECC RAM may raise
> > a Correctable Internal Error when ECC successfully recovers from flipped
> > bits because it allows alerting the user in advance that the device might
> > need to be replaced in the near future. If ECC recovery fails, the device
> > might try to use a reserved spare portion of RAM in lieu of the failing one
> > and instruct the AER driver to recover through a bus reset. Such errors
> > are not covered by the spec-defined types. Using the Internal Error type
> > is the only possibility it seems.
>
> The Internal Error type is a poor fit for that. This ECC RAM scenario is
> simply an internal device event, not a PCIe visible error case. Consider
> that CXL Memory Expanders are nothing if not "devices possessing ECC RAM"
> that may encounter correctable errors in that RAM. Yes, the user has need
> for those correctable errors to be reported, and no, PCIe AER has no reason
> to care about conveying those reports.
I'm not aware of a better PCIe spec-defined mechanism to report such
errors besides AER (Advanced Error *Reporting*), so I'm not sure why
you consider it a poor fit.
However, reporting corrected ECC errors is only half of the equation.
As stated above, if the ECC error is not correctable, the device may
choose to replace the faulty memory region with reserved spare memory,
but then a reset is required to recover from the error. Precisely what
the AER driver provides, so again I'm not sure why it's a poor fit.
> So if CXL saw no need to architect internal ECC events into AER, why does Xe
> think it is special in this regard?
The most charitable interpretation is that it's just the first mover
and others will follow. Well actually CXL is the first mover. ;)
> The CXL solution is simply a typical device interrupt that notifies
> new entries in the device event log. See trace_cxl_dram() and
> trace_cxl_general_media() for that event handling.
This seems to be based on CPER, which is not part of the PCIe Base Spec.
I can only guess that xe devices are intended to be used on non-ACPI
platforms as well, which may have led to the decision to use a
PCIe spec-defined mechanism.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [PATCH v14 01/34] PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
2026-01-22 18:58 ` Bjorn Helgaas
@ 2026-01-22 19:43 ` Bowman, Terry
0 siblings, 0 replies; 129+ messages in thread
From: Bowman, Terry @ 2026-01-22 19:43 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
On 1/22/2026 12:58 PM, Bjorn Helgaas wrote:
> On Wed, Jan 14, 2026 at 12:20:22PM -0600, Terry Bowman wrote:
>> The CXL DVSECs are currently defined in cxl/core/cxlpci.h. These are not
>> accessible to other subsystems. Move these to uapi/linux/pci_regs.h.
>
> I'm looking a little bit askance at adding things to
> uapi/linux/pci_regs.h and then renaming them. I know it's OCD to
> worry about that momentary blip, but changes in uapi potentially break
> userspace.
>
> Maybe we could rename them first, then move them to pci_regs.h?
>
> Either way:
>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>
Ok, I'll update the naming before moving.
Thanks for reviewing.
-Terry
>> The CXL DVSEC definitions will be renamed and reformatted to fit better
>> with existing defines.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>>
>> ----
>>
>> Changes in v13->v14:
>> - Add Jonathan's and Dan's review-by
>> - Update commit title prefix (Bjorn)
>> - Revert format fix for cxl_sbr_masked() (Jonathan)
>> - Update 'Compute Express Link' comment block (Jonathan)
>> - Move PCI_DVSEC_CXL_FLEXBUS definitions to later patch where
>> used (Jonathan)
>> - Removed stray change (Bjorn)
>>
>> Changes in v12->v13:
>> - Add Dave Jiang's reviewed-by
>> - Remove changes to existing PCI_DVSEC_CXL_PORT* defines. Update commit
>> message. (Jonathan)
>>
>> Changes in v11 -> v12:
>> - Change formatting to be same as existing definitions
>> - Change GENMASK() -> __GENMASK() and BIT() to _BITUL()
>>
>> Changes in v10 -> v11:
>> - New commit
>> ---
>> drivers/cxl/cxlpci.h | 53 -----------------------------
>> include/uapi/linux/pci_regs.h | 64 ++++++++++++++++++++++++++++++++---
>> 2 files changed, 59 insertions(+), 58 deletions(-)
>>
>> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
>> index 1d526bea8431..cdb7cf3dbcb4 100644
>> --- a/drivers/cxl/cxlpci.h
>> +++ b/drivers/cxl/cxlpci.h
>> @@ -7,59 +7,6 @@
>>
>> #define CXL_MEMORY_PROGIF 0x10
>>
>> -/*
>> - * See section 8.1 Configuration Space Registers in the CXL 2.0
>> - * Specification. Names are taken straight from the specification with "CXL" and
>> - * "DVSEC" redundancies removed. When obvious, abbreviations may be used.
>> - */
>> -#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
>> -
>> -/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
>> -#define CXL_DVSEC_PCIE_DEVICE 0
>> -#define CXL_DVSEC_CAP_OFFSET 0xA
>> -#define CXL_DVSEC_MEM_CAPABLE BIT(2)
>> -#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
>> -#define CXL_DVSEC_CTRL_OFFSET 0xC
>> -#define CXL_DVSEC_MEM_ENABLE BIT(2)
>> -#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
>> -#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
>> -#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
>> -#define CXL_DVSEC_MEM_ACTIVE BIT(1)
>> -#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
>> -#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
>> -#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
>> -#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
>> -
>> -#define CXL_DVSEC_RANGE_MAX 2
>> -
>> -/* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
>> -#define CXL_DVSEC_FUNCTION_MAP 2
>> -
>> -/* CXL 2.0 8.1.5: CXL 2.0 Extensions DVSEC for Ports */
>> -#define CXL_DVSEC_PORT_EXTENSIONS 3
>> -
>> -/* CXL 2.0 8.1.6: GPF DVSEC for CXL Port */
>> -#define CXL_DVSEC_PORT_GPF 4
>> -#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
>> -#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK GENMASK(3, 0)
>> -#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK GENMASK(11, 8)
>> -#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
>> -#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK GENMASK(3, 0)
>> -#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK GENMASK(11, 8)
>> -
>> -/* CXL 2.0 8.1.7: GPF DVSEC for CXL Device */
>> -#define CXL_DVSEC_DEVICE_GPF 5
>> -
>> -/* CXL 2.0 8.1.8: PCIe DVSEC for Flex Bus Port */
>> -#define CXL_DVSEC_PCIE_FLEXBUS_PORT 7
>> -
>> -/* CXL 2.0 8.1.9: Register Locator DVSEC */
>> -#define CXL_DVSEC_REG_LOCATOR 8
>> -#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
>> -#define CXL_DVSEC_REG_LOCATOR_BIR_MASK GENMASK(2, 0)
>> -#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
>> -#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
>> -
>> /*
>> * NOTE: Currently all the functions which are enabled for CXL require their
>> * vectors to be in the first 16. Use this as the default max.
>> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
>> index 3add74ae2594..6c4b6f19b18e 100644
>> --- a/include/uapi/linux/pci_regs.h
>> +++ b/include/uapi/linux/pci_regs.h
>> @@ -1253,11 +1253,6 @@
>> #define PCI_DEV3_STA 0x0c /* Device 3 Status Register */
>> #define PCI_DEV3_STA_SEGMENT 0x8 /* Segment Captured (end-to-end flit-mode detected) */
>>
>> -/* Compute Express Link (CXL r3.1, sec 8.1.5) */
>> -#define PCI_DVSEC_CXL_PORT 3
>> -#define PCI_DVSEC_CXL_PORT_CTL 0x0c
>> -#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
>> -
>> /* Integrity and Data Encryption Extended Capability */
>> #define PCI_IDE_CAP 0x04
>> #define PCI_IDE_CAP_LINK 0x1 /* Link IDE Stream Supported */
>> @@ -1338,4 +1333,63 @@
>> #define PCI_IDE_SEL_ADDR_3(x) (28 + (x) * PCI_IDE_SEL_ADDR_BLOCK_SIZE)
>> #define PCI_IDE_SEL_BLOCK_SIZE(nr_assoc) (20 + PCI_IDE_SEL_ADDR_BLOCK_SIZE * (nr_assoc))
>>
>> +/* Compute Express Link (CXL r3.1, sec 8.1.5) */
>> +#define PCI_DVSEC_CXL_PORT 3
>> +#define PCI_DVSEC_CXL_PORT_CTL 0x0c
>> +#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
>> +
>> +/*
>> + * Compute Express Link (CXL r3.2, sec 8.1)
>> + *
>> + * Note that CXL DVSEC id 3 and 7 to be ignored when the CXL link state
>> + * is "disconnected" (CXL r3.2, sec 9.12.3). Re-enumerate these
>> + * registers on downstream link-up events.
>> + */
>> +#define PCI_DVSEC_HEADER1_LENGTH_MASK __GENMASK(31, 20)
>> +
>> +/* CXL 3.2 8.1.3: PCIe DVSEC for CXL Device */
>> +#define CXL_DVSEC_PCIE_DEVICE 0
>> +#define CXL_DVSEC_CAP_OFFSET 0xA
>> +#define CXL_DVSEC_MEM_CAPABLE _BITUL(2)
>> +#define CXL_DVSEC_HDM_COUNT_MASK __GENMASK(5, 4)
>> +#define CXL_DVSEC_CTRL_OFFSET 0xC
>> +#define CXL_DVSEC_MEM_ENABLE _BITUL(2)
>> +#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
>> +#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
>> +#define CXL_DVSEC_MEM_INFO_VALID _BITUL(0)
>> +#define CXL_DVSEC_MEM_ACTIVE _BITUL(1)
>> +#define CXL_DVSEC_MEM_SIZE_LOW_MASK __GENMASK(31, 28)
>> +#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
>> +#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
>> +#define CXL_DVSEC_MEM_BASE_LOW_MASK __GENMASK(31, 28)
>> +
>> +#define CXL_DVSEC_RANGE_MAX 2
>> +
>> +/* CXL 3.2 8.1.4: Non-CXL Function Map DVSEC */
>> +#define CXL_DVSEC_FUNCTION_MAP 2
>> +
>> +/* CXL 3.2 8.1.5: Extensions DVSEC for Ports */
>> +#define CXL_DVSEC_PORT 3
>> +#define CXL_DVSEC_PORT_CTL 0x0c
>> +#define CXL_DVSEC_PORT_CTL_UNMASK_SBR 0x00000001
>> +
>> +/* CXL 3.2 8.1.6: GPF DVSEC for CXL Port */
>> +#define CXL_DVSEC_PORT_GPF 4
>> +#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
>> +#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK __GENMASK(3, 0)
>> +#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK __GENMASK(11, 8)
>> +#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
>> +#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK __GENMASK(3, 0)
>> +#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK __GENMASK(11, 8)
>> +
>> +/* CXL 3.2 8.1.7: GPF DVSEC for CXL Device */
>> +#define CXL_DVSEC_DEVICE_GPF 5
>> +
>> +/* CXL 3.2 8.1.9: Register Locator DVSEC */
>> +#define CXL_DVSEC_REG_LOCATOR 8
>> +#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
>> +#define CXL_DVSEC_REG_LOCATOR_BIR_MASK __GENMASK(2, 0)
>> +#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK __GENMASK(15, 8)
>> +#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK __GENMASK(31, 16)
>> +
>> #endif /* LINUX_PCI_REGS_H */
>> --
>> 2.34.1
>>
* Re: [PATCH v14 11/34] PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c
2026-01-22 17:23 ` Markus Elfring
@ 2026-01-22 20:05 ` Bowman, Terry
0 siblings, 0 replies; 129+ messages in thread
From: Bowman, Terry @ 2026-01-22 20:05 UTC (permalink / raw)
To: Markus Elfring, linux-pci, linux-cxl, Alejandro Lucero Palau,
Alison Schofield, Benjamin Cheatham, Bjorn Helgaas, Dan Carpenter,
Dan Williams, Dave Jiang, Davidlohr Bueso, Ira Weiny,
Jonathan Cameron, Kuppuswamy Sathyanarayanan, Lukas Wunner,
Li Ming, Pradeep Vinesh Reddy Kodamati, Robert Richter,
Shiju Jose, Smita Koralahalli, Vishal Verma
Cc: LKML
On 1/22/2026 11:23 AM, Markus Elfring wrote:
> …
>> +++ b/drivers/pci/pcie/aer_cxl_rch.c
>> @@ -0,0 +1,353 @@
> …
>> +static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
>> +{
> …
>> + device_lock(&dev->dev);
>> +
>> + err_handler = dev->driver ? dev->driver->err_handler : NULL;
> …
>> +out:
>> + device_unlock(&dev->dev);
>> + return 0;
>> +}
> …
>
> Under which circumstances would you become interested to apply a statement
> like “guard(device)(&dev->dev);”?
> https://elixir.bootlin.com/linux/v6.19-rc5/source/include/linux/device.h#L913
>
> Regards,
> Markus
Hi Markus,
This patch was a move (leaving the lock/unlock calls intact). I change
the lock/unlock to guard() in the following patch, after the move:
PCI/AER: Use guard() in cxl_rch_handle_error_iter()
-Terry
* Re: [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
2026-01-22 19:32 ` Lukas Wunner
@ 2026-01-22 21:32 ` dan.j.williams
2026-01-23 12:22 ` Jonathan Cameron
0 siblings, 1 reply; 129+ messages in thread
From: dan.j.williams @ 2026-01-22 21:32 UTC (permalink / raw)
To: Lukas Wunner, dan.j.williams
Cc: Jonathan Cameron, Terry Bowman, dave, dave.jiang,
alison.schofield, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, vishal.l.verma, alucerop,
ira.weiny, linux-kernel, linux-pci
Lukas Wunner wrote:
> On Thu, Jan 22, 2026 at 11:09:48AM -0800, dan.j.williams@intel.com wrote:
> > Lukas Wunner wrote:
> > > a device possessing ECC RAM may raise
> > > a Correctable Internal Error when ECC successfully recovers from flipped
> > > bits because it allows alerting the user in advance that the device might
> > > need to be replaced in the near future. If ECC recovery fails, the device
> > > might try to use a reserved spare portion of RAM in lieu of the failing one
> > > and instruct the AER driver to recover through a bus reset. Such errors
> > > are not covered by the spec-defined types. Using the Internal Error type
> > > is the only possibility it seems.
> >
> > The Internal Error type is a poor fit for that. This ECC RAM scenario is
> > simply an internal device event, not a PCIe visible error case. Consider
> > that CXL Memory Expanders are nothing if not "devices possessing ECC RAM"
> > that may encounter correctable errors in that RAM. Yes, the user has need
> > for those correctable errors to be reported, and no, PCIe AER has no reason
> > to care about conveying those reports.
>
> I'm not aware of a better PCIe spec-defined mechanism to report such
> errors besides AER (Advanced Error *Reporting*), so I'm not sure why
> you consider it a poor fit.
PCIe spec has no role defining the internal error model of devices.
Linux has reason to not endorse a blurring of the lines of where the
PCIe error model ends and the device-specific error model begins. CXL
respects those boundaries, Xe is pushing the boundary.
> However, reporting corrected ECC errors is only half of the equation.
> As stated above, if the ECC error is not correctable, the device may
> choose to replace the faulty memory region with reserved spare memory,
> but then a reset is required to recover from the error. Precisely what
> the AER driver provides, so again I'm not sure why it's a poor fit.
Again CXL has a model for this, those are the "post-package repair"
events handled internally to the device / driver either transparently or
user coordinated. No AER needed. In general devices have plenty of
reasons that the driver determines they need to be reset, they do not
need AER core help to reset themselves on error.
AER is there for link recovery.
> > So if CXL saw no need to architect internal ECC events into AER, why does Xe
> > think it is special in this regard?
>
> The most charitable interpretation is that it's just the first mover
> and others will follow. Well actually CXL is the first mover. ;)
...first mover that helps clarify the role of AER that just happens to
match the status quo that the PCIe AER core ignores internal errors.
> > The CXL solution is simply a typical device interrupt that notifies
> > new entries in the device event log. See trace_cxl_dram() and
> > trace_cxl_general_media() for that event handling.
>
> This seems to be based on CPER, which is not part of the PCIe Base Spec.
> I can only guess that xe devices are intended to be used on non-ACPI
> platforms as well, which may have led to the decision to use a
> PCIe spec-defined mechanism.
CPER is a compatibility hack for operating systems that do not have native
CXL drivers. The native support is just an interrupt fronting an event
log retrieved with mailbox commands.
* Re: [PATCH v14 26/34] cxl: Change CXL handlers to use guard() instead of scoped_guard()
2026-01-14 18:20 ` [PATCH v14 26/34] cxl: Change CXL handlers to use guard() instead of scoped_guard() Terry Bowman
@ 2026-01-23 10:05 ` Markus Elfring
0 siblings, 0 replies; 129+ messages in thread
From: Markus Elfring @ 2026-01-23 10:05 UTC (permalink / raw)
To: Terry Bowman, linux-pci, linux-cxl, Alejandro Lucero Palau,
Alison Schofield, Benjamin Cheatham, Bjorn Helgaas, Dan Carpenter,
Dan Williams, Dave Jiang, Davidlohr Bueso, Ira Weiny,
Jonathan Cameron, Kuppuswamy Sathyanarayanan, Lukas Wunner,
Li Ming, Pradeep Vinesh Reddy Kodamati, Robert Richter,
Shiju Jose, Smita Koralahalli, Vishal Verma
Cc: LKML
> The CXL protocol error handlers use scoped_guard() to guarantee access to
> the underlying CXL memory device. Improve readability and reduce complexity
> by changing the current scoped_guard() to be guard().
Would it be worth mentioning that this adjustment can only be performed
because lock scopes can be kept the same for two function implementations?
Regards,
Markus
* Re: [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
2026-01-22 21:32 ` dan.j.williams
@ 2026-01-23 12:22 ` Jonathan Cameron
0 siblings, 0 replies; 129+ messages in thread
From: Jonathan Cameron @ 2026-01-23 12:22 UTC (permalink / raw)
To: dan.j.williams
Cc: Lukas Wunner, Terry Bowman, dave, dave.jiang, alison.schofield,
bhelgaas, shiju.jose, ming.li, Smita.KoralahalliChannabasappa,
rrichter, dan.carpenter, PradeepVineshReddy.Kodamati,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
vishal.l.verma, alucerop, ira.weiny, linux-kernel, linux-pci
On Thu, 22 Jan 2026 13:32:08 -0800
dan.j.williams@intel.com wrote:
> Lukas Wunner wrote:
> > On Thu, Jan 22, 2026 at 11:09:48AM -0800, dan.j.williams@intel.com wrote:
> > > Lukas Wunner wrote:
> > > > a device possessing ECC RAM may raise
> > > > a Correctable Internal Error when ECC successfully recovers from flipped
> > > > bits because it allows alerting the user in advance that the device might
> > > > need to be replaced in the near future. If ECC recovery fails, the device
> > > > might try to use a reserved spare portion of RAM in lieu of the failing one
> > > > and instruct the AER driver to recover through a bus reset. Such errors
> > > > are not covered by the spec-defined types. Using the Internal Error type
> > > > is the only possibility it seems.
> > >
> > > The Internal Error type is a poor fit for that. This ECC RAM scenario is
> > > simply an internal device event, not a PCIe visible error case. Consider
> > > that CXL Memory Expanders are nothing if not "devices possessing ECC RAM"
> > > that may encounter correctable errors in that RAM. Yes, the user has need
> > > for those correctable errors to be reported, and no, PCIe AER has no reason
> > > to care about conveying those reports.
> >
> > I'm not aware of a better PCIe spec-defined mechanism to report such
> > errors besides AER (Advanced Error *Reporting*), so I'm not sure why
> > you consider it a poor fit.
>
> PCIe spec has no role defining the internal error model of devices.
> Linux has reason to not endorse a blurring of the lines of where the
> PCIe error model ends and the device-specific error model begins. CXL
> respects those boundaries, Xe is pushing the boundary.
FWIW we have a bunch of older hardware where we could report this sort
of error either via AER or via an MSI. After some push back years
ago, we flipped them all to the MSI path. That includes stuff that
triggers device resets. I don't think it caused us too much trouble
to make that switch.
>
> > However, reporting corrected ECC errors is only half of the equation.
> > As stated above, if the ECC error is not correctable, the device may
> > choose to replace the faulty memory region with reserved spare memory,
> > but then a reset is required to recover from the error. Precisely what
> > the AER driver provides, so again I'm not sure why it's a poor fit.
>
> Again CXL has a model for this, those are the "post-package repair"
> events handled internally to the device / driver either transparently or
> user coordinated. No AER needed. In general devices have plenty of
> reasons that the driver determines they need to be reset, they do not
> need AER core help to reset themselves on error.
>
> AER is there for link recovery.
>
> > > So if CXL saw no need to architect internal ECC events into AER, why does Xe
> > > think it is special in this regard?
> >
> > The most charitable interpretation is that it's just the first mover
> > and others will follow. Well actually CXL is the first mover. ;)
>
> ...first mover that helps clarify the role of AER that just happens to
> match the status quo that the PCIe AER core ignores internal errors.
>
> > > The CXL solution is simply a typical device interrupt that notifies
> > > new entries in the device event log. See trace_cxl_dram() and
> > > trace_cxl_general_media() for that event handling.
> >
> > This seems to be based on CPER, which is not part of the PCIe Base Spec.
> > I can only guess that xe devices are intended to be used on non-ACPI
> > platforms as well, which may have led to the decision to use a
> > PCIe spec-defined mechanism.
>
> CPER is a compatibility hack for operating systems that do not have native
> CXL drivers. The native support is just an interrupt fronting an event
> log retrieved with mailbox commands.
Just as a side note, CXL also has FW specific interrupts with a negotiation
process for whether they are used, or MSI-X is used for event queues.
Jonathan
Thread overview: 129+ messages
2026-01-14 18:20 [PATCH v14 00/34] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2026-01-14 18:20 ` [PATCH v14 01/34] PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h Terry Bowman
2026-01-22 18:58 ` Bjorn Helgaas
2026-01-22 19:43 ` Bowman, Terry
2026-01-14 18:20 ` [PATCH v14 02/34] PCI: Update CXL DVSEC definitions Terry Bowman
2026-01-14 18:53 ` Jonathan Cameron
2026-01-19 23:44 ` dan.j.williams
2026-01-22 18:37 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 03/34] PCI: Introduce pcie_is_cxl() Terry Bowman
2026-01-21 1:19 ` dan.j.williams
2026-01-22 18:39 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 04/34] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions Terry Bowman
2026-01-14 18:20 ` [PATCH v14 05/34] cxl/pci: Remove unnecessary CXL RCH " Terry Bowman
2026-01-14 18:20 ` [PATCH v14 06/34] PCI: Replace cxl_error_is_native() with pcie_aer_is_native() Terry Bowman
2026-01-14 18:55 ` Jonathan Cameron
2026-01-14 20:16 ` Dave Jiang
2026-01-14 20:15 ` Dave Jiang
2026-01-22 18:23 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 07/34] cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c Terry Bowman
2026-01-14 20:51 ` Dave Jiang
2026-01-14 18:20 ` [PATCH v14 08/34] cxl/pci: Move CXL driver's RCH error handling into core/ras_rch.c Terry Bowman
2026-01-14 20:35 ` Dave Jiang
2026-01-14 18:20 ` [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman
2026-01-14 19:01 ` Jonathan Cameron
2026-01-14 19:09 ` Kuppuswamy Sathyanarayanan
2026-01-14 20:40 ` Dave Jiang
2026-01-20 2:09 ` dan.j.williams
2026-01-22 10:31 ` Lukas Wunner
2026-01-22 16:48 ` dan.j.williams
2026-01-22 18:51 ` Lukas Wunner
2026-01-22 18:49 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 10/34] PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error() Terry Bowman
2026-01-14 19:08 ` Jonathan Cameron
2026-01-15 20:42 ` dan.j.williams
2026-01-22 13:34 ` Lukas Wunner
2026-01-22 19:09 ` dan.j.williams
2026-01-22 19:32 ` Lukas Wunner
2026-01-22 21:32 ` dan.j.williams
2026-01-23 12:22 ` Jonathan Cameron
2026-01-20 2:20 ` dan.j.williams
2026-01-20 15:15 ` Bowman, Terry
2026-01-20 16:53 ` dan.j.williams
2026-01-22 18:48 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 11/34] PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c Terry Bowman
2026-01-22 17:23 ` Markus Elfring
2026-01-22 20:05 ` Bowman, Terry
2026-01-22 18:53 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 12/34] PCI/AER: Use guard() in cxl_rch_handle_error_iter() Terry Bowman
2026-01-14 19:11 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 13/34] PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS Terry Bowman
2026-01-14 19:12 ` Jonathan Cameron
2026-01-14 20:49 ` Dave Jiang
2026-01-14 20:50 ` Dave Jiang
2026-01-22 18:24 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 14/34] PCI/AER: Report CXL or PCIe bus type in AER trace logging Terry Bowman
2026-01-14 19:45 ` Jonathan Cameron
2026-01-15 15:55 ` Mauro Carvalho Chehab
2026-01-14 20:56 ` Dave Jiang
2026-01-14 18:20 ` [PATCH v14 15/34] PCI/AER: Update struct aer_err_info with kernel-doc formatting Terry Bowman
2026-01-14 19:48 ` Jonathan Cameron
2026-01-15 20:56 ` dan.j.williams
2026-01-14 21:06 ` Dave Jiang
2026-01-22 18:29 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 16/34] cxl/mem: Clarify @host for devm_cxl_add_nvdimm() Terry Bowman
2026-01-14 19:49 ` Jonathan Cameron
2026-01-14 21:08 ` Dave Jiang
2026-01-16 3:07 ` dan.j.williams
2026-01-16 16:22 ` Dave Jiang
2026-01-14 18:20 ` [PATCH v14 17/34] cxl: Update RAS handler interfaces to also support CXL Ports Terry Bowman
2026-01-14 18:20 ` [PATCH v14 18/34] cxl/port: Remove "enumerate dports" helpers Terry Bowman
2026-01-14 19:50 ` Jonathan Cameron
2026-01-14 21:23 ` Dave Jiang
2026-01-16 3:15 ` dan.j.williams
2026-01-14 21:24 ` Dave Jiang
2026-01-16 3:21 ` dan.j.williams
2026-01-14 18:20 ` [PATCH v14 19/34] cxl/port: Fix devm resource leaks around with dport management Terry Bowman
2026-01-14 21:26 ` Dave Jiang
2026-01-15 14:46 ` Jonathan Cameron
2026-01-16 4:45 ` dan.j.williams
2026-01-16 15:01 ` Jonathan Cameron
2026-01-16 16:16 ` Jonathan Cameron
2026-01-19 23:02 ` dan.j.williams
2026-01-20 12:25 ` Jonathan Cameron
2026-01-19 2:48 ` dan.j.williams
2026-01-14 18:20 ` [PATCH v14 20/34] cxl/port: Move dport operations to a driver event Terry Bowman
2026-01-14 21:45 ` Dave Jiang
2026-01-15 14:56 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 21/34] cxl/port: Move dport RAS reporting to a port resource Terry Bowman
2026-01-14 21:47 ` Dave Jiang
2026-01-15 15:02 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 22/34] cxl: Update CXL Endpoint tracing Terry Bowman
2026-01-14 18:20 ` [PATCH v14 23/34] cxl: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
2026-01-14 21:53 ` Dave Jiang
2026-01-15 15:17 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 24/34] cxl/port: Move endpoint component register management to cxl_port Terry Bowman
2026-01-14 21:55 ` Dave Jiang
2026-01-15 15:28 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 25/34] cxl/port: Map Port component registers before switchport init Terry Bowman
2026-01-14 21:59 ` Dave Jiang
2026-01-15 15:30 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 26/34] cxl: Change CXL handlers to use guard() instead of scoped_guard() Terry Bowman
2026-01-23 10:05 ` Markus Elfring
2026-01-14 18:20 ` [PATCH v14 27/34] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC Terry Bowman
2026-01-14 18:58 ` Kuppuswamy Sathyanarayanan
2026-01-14 19:20 ` Bowman, Terry
2026-01-14 19:45 ` Kuppuswamy Sathyanarayanan
2026-01-14 18:20 ` [PATCH v14 28/34] PCI/AER: Move AER driver's CXL VH handling to pcie/aer_cxl_vh.c Terry Bowman
2026-01-15 15:40 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 29/34] cxl/port: Unify endpoint and switch port lookup Terry Bowman
2026-01-14 23:04 ` Dave Jiang
2026-01-15 15:44 ` Jonathan Cameron
2026-01-14 18:20 ` [PATCH v14 30/34] PCI/AER: Dequeue forwarded CXL error Terry Bowman
2026-01-14 23:18 ` Dave Jiang
2026-01-16 14:42 ` Bowman, Terry
2026-01-15 16:01 ` Jonathan Cameron
2026-01-15 17:29 ` Bowman, Terry
2026-01-22 18:32 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 31/34] PCI: Introduce CXL Port protocol error handlers Terry Bowman
2026-01-14 23:37 ` Dave Jiang
2026-01-15 16:12 ` Jonathan Cameron
2026-01-22 18:27 ` Bjorn Helgaas
2026-01-14 18:20 ` [PATCH v14 32/34] cxl: Update Endpoint uncorrectable protocol error handling Terry Bowman
2026-01-14 22:07 ` dan.j.williams
2026-01-15 15:26 ` Bowman, Terry
2026-01-15 15:27 ` Bowman, Terry
2026-01-14 18:20 ` [PATCH v14 33/34] cxl: Update Endpoint correctable " Terry Bowman
2026-01-14 18:20 ` [PATCH v14 34/34] cxl: Enable CXL protocol errors during CXL Port probe Terry Bowman
2026-01-15 16:18 ` Jonathan Cameron
2026-01-15 19:41 ` Bowman, Terry