* [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging
@ 2025-08-27 1:35 Terry Bowman
2025-08-27 1:35 ` [PATCH v11 01/23] cxl: Remove ifdef blocks of CONFIG_PCIEAER_CXL from core/pci.c Terry Bowman
` (23 more replies)
0 siblings, 24 replies; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
This patchset adds CXL Protocol Error handling for CXL Ports and updates CXL
Endpoints (EP) handling. Previous versions of this series can be found here:
https://lore.kernel.org/linux-cxl/20250626224252.1415009-1-terry.bowman@amd.com/
The first 6 patches reorganize protocol error related CXL code into new files.
The first patch is authored by Dave Jiang and moves the CXL handling and logging into
cxl/core/ras.c. Patch 5 and 6 move the AER driver's restricted CXL host (RCH) logic
into pci/pcie/rch_aer.c. The RCH logic is used for CXL RCH devices from CXL1.1.
CONFIG_PCIEAER_CXL is removed and replaced with CONFIG_CXL_RAS, defined in CXL
Kconfig.
The next 4 patches introduce pcie_is_cxl() and use in the AER driver to detect and
log the error type as PCIe error or CXL error.
The next 4 patches are error handler cleanup in preparation for upcoming error handler
changes.
The next 7 patches introduce CXL Port error handling as well as updating the existing
CXL Endpoint handling. This sets all CXL Ports and EPs to use similar handling and logging
flow. Note, the PCIe EP handling remains for cases the EP is not recognized as a CXL
device. These patches also include change to move the AER driver's CXL virtual hierarchy
code into a new file named pci/pcie/cxl_aer.c. This separates the AER driver's CXL
implementation from the AER driver's core file.
The final 2 patches enable/disable CXL protocol error interrupts during CXL port
creation and teardown.
Note, I'll be on PTO until 9/10. Responses may be delayed.
==== Testing ====
Below are the testing results while using QEMU. The QEMU testing uses a CXL Root
Port, CXL Upstream Switch Port, CXL Downstream Switch Port and CXL Endpoint as
given below. I've attached the QEMU startup commandline used.
The sub-topology for the QEMU testing is:
---------------------
| CXL RP - 0C:00.0 |
---------------------
|
---------------------
| CXL USP - 0D:00.0 |
---------------------
|
---------------------
| CXL DSP - 0E:00.0 |
---------------------
|
---------------------
| CXL EP - 0F:00.0 |
---------------------
root@tbowman-cxl:~# lspci -t
-+-[0000:00]- -00.0
| +-01.0
| +-02.0
| +-03.0
| +-1f.0
| +-1f.2
| \-1f.3
\-[0000:0c]---00.0-[0d-0f]----00.0-[0e-0f]----00.0-[0f]----00.0
The topology was created with:
${qemu} -boot menu=on \
-cpu host \
-nographic \
-monitor telnet:127.0.0.1:1234,server,nowait \
-M virt,cxl=on \
-chardev stdio,id=s1,signal=off,mux=on -serial none \
-device isa-serial,chardev=s1 -mon chardev=s1,mode=readline \
-machine q35,cxl=on \
-m 16G,maxmem=24G,slots=8 \
-cpu EPYC-v3 \
-smp 16 \
-accel kvm \
-drive file=${img},format=raw,index=0,media=disk \
-device e1000,netdev=user.0 \
-netdev user,id=user.0,hostfwd=tcp::5555-:22 \
-object memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \
-object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
-device cxl-upstream,bus=root_port0,id=us0 \
-device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
-device cxl-type3,bus=swport0,volatile-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-vmem0 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k
=== Root Port ===
root@tbowman-cxl:~/aer-inject# ./root-ce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0c:00.0
pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0c:00.0
aer_event: 0000:0c:00.0 CXL Bus Error: severity=Corrected, Corrected Internal Error, TLP Header=Not availabl e
pcieport 0000:0c:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
pcieport 0000:0c:00.0: device [8086:7075] error status/mask=00004000/0000a000
pcieport 0000:0c:00.0: [14] CorrIntErr
cxl_aer_correctable_error: memdev=0000:0c:00.0 host=pci0000:0c serial=0: status: 'CRC Threshold Hit'
root@tbowman-cxl:~/aer-inject# ./root-uce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00000000/00400000 into device 0000:0c:00.0
pcieport 0000:0c:00.0: AER: Uncorrectable (Fatal) error message received from 0000:0c:00.0
aer_event: 0000:0c:00.0 CXL Bus Error: severity=Fatal, Uncorrectable Internal Error, TLP Header=Not availabl e
pcieport 0000:0c:00.0: CXL Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
pcieport 0000:0c:00.0: device [8086:7075] error status/mask=00400000/02000000
pcieport 0000:0c:00.0: [22] UncorrIntErr
cxl_aer_uncorrectable_error: memdev=0000:0c:00.0 host=pci0000:0c serial=0: status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 10 UID: 0 PID: 216 Comm: kworker/10:1 Tainted: G E 6.16.0-rc4-testing-00054-g263b76bedfed #1306 PREEMPT(voluntary)
Tainted: [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: events cxl_proto_err_work_fn [cxl_core]
RIP: 0010:cxl_error_detected+0x18/0xd0 [cxl_core]
Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 56 41 55 41 54 55 48 89 fd 53 48 8b 5f 78 <4c> 8b 63 08 4d 8d b4 24 80 00 00 00 4c 89 f7 e8 e4 68 0a f6 be 0f
RSP: 0018:ffffd1870091fd58 EFLAGS: 00010282
RAX: 0000000000000007 RBX: 0000000000000000 RCX: 0000000000000003
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8e0241c120c8
RBP: ffff8e0241c120c8 R08: 0000000000000284 R09: 0000000000000001
R10: 00000000ffffffff R11: 00cccccc00cccccc R12: ffff8e0241c12148
R13: ffff8e0241724f00 R14: 0000000000000000 R15: ffff8e0248571740
FS: 0000000000000000(0000) GS:ffff8e05f7ba5000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000113a8b000 CR4: 00000000003506f0
Call Trace:
<TASK>
cxl_report_error_detected+0x3e/0x70 [cxl_core]
cxl_walk_port.constprop.0+0xa4/0x140 [cxl_core]
cxl_proto_err_work_fn+0x1fa/0x430 [cxl_core]
? srso_return_thunk+0x5/0x5f
process_scheduled_works+0xa8/0x420
? __pfx_worker_thread+0x10/0x10
worker_thread+0x11c/0x260
? __pfx_worker_thread+0x10/0x10
kthread+0xfe/0x210
? __pfx_kthread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x18d/0x1e0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: cfg80211(E) binfmt_misc(E) ppdev(E) intel_rapl_msr(E) cxl_mem(E) intel_rapl_common(E) cxl_pci(E) parport_pc(E) cxl_pmem(E) parport(E) input_leds(E) joydev(E) mac_hid(E) serio_raw(E) dm_m)
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
RIP: 0010:cxl_error_detected+0x18/0xd0 [cxl_core]
Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 56 41 55 41 54 55 48 89 fd 53 48 8b 5f 78 <4c> 8b 63 08 4d 8d b4 24 80 00 00 00 4c 89 f7 e8 e4 68 0a f6 be 0f
RSP: 0018:ffffd1870091fd58 EFLAGS: 00010282
RAX: 0000000000000007 RBX: 0000000000000000 RCX: 0000000000000003
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8e0241c120c8
RBP: ffff8e0241c120c8 R08: 0000000000000284 R09: 0000000000000001
R10: 00000000ffffffff R11: 00cccccc00cccccc R12: ffff8e0241c12148
R13: ffff8e0241724f00 R14: 0000000000000000 R15: ffff8e0248571740
FS: 0000000000000000(0000) GS:ffff8e05f7ba5000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000113a8b000 CR4: 00000000003506f0
note: kworker/10:1[216] exited with irqs disabled
=== Upstream Switch Port ===
root@tbowman-cxl:~/aer-inject# ./us-ce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0d:00.0
pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0d:00.0
aer_event: 0000:0d:00.0 CXL Bus Error: severity=Corrected, Corrected Internal Error, TLP Header=Not available
pcieport 0000:0d:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
pcieport 0000:0d:00.0: device [19e5:a128] error status/mask=00004000/0000a000
pcieport 0000:0d:00.0: [14] CorrIntErr
cxl_aer_correctable_error: memdev=0000:0d:00.0 host=0000:0c:00.0 serial=0: status: 'CRC Threshold Hit'
root@tbowman-cxl:~/aer-inject# ./us-uce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00000000/00400000 into device 0000:0d:00.0
pcieport 0000:0c:00.0: AER: Uncorrectable (Fatal) error message received from 0000:0d:00.0
aer_event: 0000:0d:00.0 CXL Bus Error: severity=Fatal, , TLP Header=Not available
pcieport 0000:0d:00.0: AER: CXL Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID)
cxl_aer_uncorrectable_error: memdev=mem0 host=0000:0f:00.0 serial=0: status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
Kernel panic - not syncing: CXL cachemem error.
CPU: 10 UID: 0 PID: 147 Comm: irq/24-aerdrv Tainted: G E 6.16.0-rc4-testing-00054-g263b76bedfed #1306 PREEMPT(voluntary)
Tainted: [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
panic+0x364/0x3d0
? __pfx_aer_root_reset+0x10/0x10
pci_error_detected+0x2b/0x30 [cxl_core]
report_error_detected+0xf7/0x170
? __pfx_report_frozen_detected+0x10/0x10
__pci_walk_bus+0x4c/0x70
? __pfx_report_frozen_detected+0x10/0x10
__pci_walk_bus+0x34/0x70
? __pfx_report_frozen_detected+0x10/0x10
__pci_walk_bus+0x34/0x70
? __pfx_report_frozen_detected+0x10/0x10
pci_walk_bus+0x31/0x50
pcie_do_recovery+0x163/0x2b0
aer_isr_one_error_type+0x1f3/0x380
aer_isr_one_error+0x11d/0x140
aer_isr+0x4c/0x80
irq_thread_fn+0x24/0x70
irq_thread+0x199/0x2a0
? __pfx_irq_thread_fn+0x10/0x10
? __pfx_irq_thread_dtor+0x10/0x10
? __pfx_irq_thread+0x10/0x10
kthread+0xfe/0x210
? __pfx_kthread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x18d/0x1e0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Kernel Offset: 0x13400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: CXL cachemem error. ]---
=== Downstream Switch Port ===
root@tbowman-cxl:~/aer-inject# ./ds-ce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0e:00.0
pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0e:00.0
aer_event: 0000:0e:00.0 CXL Bus Error: severity=Corrected, Corrected Internal Error, TLP Header=Not available
pcieport 0000:0e:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
pcieport 0000:0e:00.0: device [19e5:a129] error status/mask=00004000/0000a000
pcieport 0000:0e:00.0: [14] CorrIntErr
cxl_aer_correctable_error: memdev=0000:0e:00.0 host=0000:0d:00.0 serial=0: status: 'CRC Threshold Hit'
root@tbowman-cxl:~/aer-inject# ./ds-uce-inject# ./ds-uce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00000000/00400000 into device 0000:0e:00.0
pcieport 0000:0c:00.0: AER: Uncorrectable (Fatal) error message received from 0000:0e:00.0
aer_event: 0000:0e:00.0 CXL Bus Error: severity=Fatal, Uncorrectable Internal Error, TLP Header=Not available
pcieport 0000:0e:00.0: CXL Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
pcieport 0000:0e:00.0: device [19e5:a129] error status/mask=00400000/02000000
pcieport 0000:0e:00.0: [22] UncorrIntErr
cxl_aer_uncorrectable_error: memdev=0000:0d:00.0 host=0000:0c:00.0 serial=0: status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 10 UID: 0 PID: 196 Comm: kworker/10:1 Tainted: G E 6.16.0-rc4-testing-00054-g263b76bedfed #1306 PREEMPT(voluntary)
Tainted: [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: events cxl_proto_err_work_fn [cxl_core]
RIP: 0010:cxl_error_detected+0x18/0xd0 [cxl_core]
Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 56 41 55 41 54 55 48 89 fd 53 48 8b 5f 78 <4c> 8b 63 08 4d 8d b4 24 80 00 00 00 4c 89 f7 e8 e4 68 aa d9 be 0f
RSP: 0018:ffffd1178086fd58 EFLAGS: 00010282
RAX: 0000000000000007 RBX: 0000000000000000 RCX: 0000000000000003
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8a0201c510c8
RBP: ffff8a0201c510c8 R08: 000000000000028c R09: 0000000000000001
R10: 00000000ffffffff R11: 00cccccc00cccccc R12: ffff8a0201c51148
R13: ffff8a0201c56000 R14: ffff8a02011ed000 R15: ffff8a0201110000
FS: 0000000000000000(0000) GS:ffff8a05d3fa5000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000103574000 CR4: 00000000003506f0
Call Trace:
<TASK>
cxl_report_error_detected+0x3e/0x70 [cxl_core]
cxl_walk_port.constprop.0+0x12a/0x140 [cxl_core]
? srso_return_thunk+0x5/0x5f
cxl_proto_err_work_fn+0x1fa/0x430 [cxl_core]
? srso_return_thunk+0x5/0x5f
process_scheduled_works+0xa8/0x420
? __pfx_worker_thread+0x10/0x10
worker_thread+0x11c/0x260
? __pfx_worker_thread+0x10/0x10
kthread+0xfe/0x210
? __pfx_kthread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x18d/0x1e0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: cfg80211(E) binfmt_misc(E) intel_rapl_msr(E) intel_rapl_common(E) ppdev(E) cxl_mem(E) cxl_pmem(E) parport_pc(E) cxl_pci(E) parport(E) joydev(E) input_leds(E) mac_hid(E) serio_raw(E) dm_m)
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
RIP: 0010:cxl_error_detected+0x18/0xd0 [cxl_core]
Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 56 41 55 41 54 55 48 89 fd 53 48 8b 5f 78 <4c> 8b 63 08 4d 8d b4 24 80 00 00 00 4c 89 f7 e8 e4 68 aa d9 be 0f
RSP: 0018:ffffd1178086fd58 EFLAGS: 00010282
RAX: 0000000000000007 RBX: 0000000000000000 RCX: 0000000000000003
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8a0201c510c8
RBP: ffff8a0201c510c8 R08: 000000000000028c R09: 0000000000000001
R10: 00000000ffffffff R11: 00cccccc00cccccc R12: ffff8a0201c51148
R13: ffff8a0201c56000 R14: ffff8a02011ed000 R15: ffff8a0201110000
FS: 0000000000000000(0000) GS:ffff8a05d3fa5000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000103574000 CR4: 00000000003506f0
note: kworker/10:1[196] exited with irqs disabled
=== Endpoint ===
root@tbowman-cxl:~/aer-inject# ./ep-ce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0f:00.0
pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0f:00.0
aer_event: 0000:0f:00.0 CXL Bus Error: severity=Corrected, Corrected Internal Error, TLP Header=Not available
cxl_pci 0000:0f:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
cxl_pci 0000:0f:00.0: device [8086:0d93] error status/mask=00004000/00000000
cxl_pci 0000:0f:00.0: [14] CorrIntErr
cxl_aer_uncorrectable_error: memdev=mem0 host=0000:0f:00.0 serial=0: status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
root@tbowman-cxl:~/aer-inject# ./ep-uce-inject.sh
pcieport 0000:0c:00.0: aer_inject: Injecting errors 00000000/00400000 into device 0000:0f:00.0
pcieport 0000:0c:00.0: AER: Uncorrectable (Fatal) error message received from 0000:0f:00.0
aer_event: 0000:0f:00.0 CXL Bus Error: severity=Fatal, , TLP Header=Not available
cxl_pci 0000:0f:00.0: AER: CXL Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID)
cxl_aer_uncorrectable_error: memdev=mem0 host=0000:0f:00.0 serial=0: status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
Kernel panic - not syncing: CXL cachemem error.
CPU: 10 UID: 0 PID: 148 Comm: irq/24-aerdrv Tainted: G E 6.16.0-rc4-testing-00054-g263b76bedfed #1306 PREEMPT(voluntary)
Tainted: [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
panic+0x364/0x3d0
? __pfx_aer_root_reset+0x10/0x10
pci_error_detected+0x2b/0x30 [cxl_core]
report_error_detected+0xf7/0x170
? __pfx_report_frozen_detected+0x10/0x10
__pci_walk_bus+0x4c/0x70
? __pfx_report_frozen_detected+0x10/0x10
pci_walk_bus+0x31/0x50
pcie_do_recovery+0x163/0x2b0
aer_isr_one_error_type+0x1f3/0x380
aer_isr_one_error+0x11d/0x140
aer_isr+0x4c/0x80
irq_thread_fn+0x24/0x70
irq_thread+0x199/0x2a0
? __pfx_irq_thread_fn+0x10/0x10
? __pfx_irq_thread_dtor+0x10/0x10
? __pfx_irq_thread+0x10/0x10
kthread+0xfe/0x210
? __pfx_kthread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x18d/0x1e0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Kernel Offset: 0xd600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: CXL cachemem error. ]---
==== Changes ====
Changes in v10 -> v11:
cxl: Remove ifdef blocks of CONFIG_PCIEAER_CXL from core/pci.c
- New patch
CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS
- New patch
cxl/pci: Remove unnecessary CXL RCH handling helper functions
- New patch
cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS conditional block
- New patch
CXL/AER: Introduce rch_aer.c into AER driver for handling CXL RCH errors
- Remove changes in code-split and move to earlier, new patch
- Add #include <linux/bitfield.h> to cxl_ras.c
- Move cxl_rch_handle_error() & cxl_rch_enable_rcec() declarations from pci.h
to aer.h, more localized.
- Introduce CONFIG_CXL_RCH_RAS, includes Makefile changes, ras.c ifdef changes
CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
- New patch
PCI/CXL: Introduce pcie_is_cxl()
- Amended set_pcie_cxl() to check for Upstream Port's and EP's parent
downstream port by calling set_pcie_cxl(). (Dan)
- Retitle patch: 'Add' -> 'Introduce'
- Add check for CXL.mem and CXL.cache (Alejandro, Dan)
PCI/AER: Report CXL or PCIe bus error type in trace logging
- Remove duplicate call to trace_aer_event() (Shiju)
- Added Dan William's and Dave Jiang's reviewed-by
CXL/AER: Update PCI class code check to use FIELD_GET()
- Add #include <linux/bitfield.h> to cxl_ras.c (Terry)
- Removed line wrapping at "(CXL 3.2, 8.1.12.1)". (Jonathan)
cxl/pci: Log message if RAS registers are unmapped
- Added Dave Jiang's review-by (Terry)
cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports
- Updated CE and UCE trace routines to maintian consistent TP_Struct ABI
and unchanged TP_printk() logging. (Shiju, Alison)
cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors
- Added Dave Jiang and Jonathan Cameron's review-by
- Changes moved to core/ras.c
cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers
- Use local pointer for readability in cxl_switch_port_init_ras() (Jonathan Cameron)
- Rename port to be ep in cxl_endpoint_port_init_ras() (Dave Jiang)
- Rename dport to be parent_dport in cxl_endpoint_port_init_ras()
and cxl_switch_port_init_ras() (Dave Jiang)
- Port helper changes were in cxl/port.c, now in core/ras.c (Dave Jiang)
cxl/pci: Introduce CXL Endpoint protocol error handlers
- cxl_error_detected() - Change handlers' scoped_guard() to guard() (Jonathan)
- cxl_error_detected() - Remove extra line (Shiju)
- Changes moved to core/ras.c (Terry)
- cxl_error_detected(), remove 'ue' and return with function call. (Jonathan)
- Remove extra space in documentation for PCI_ERS_RESULT_PANIC definition
- Move #include "pci.h from cxl.h to core.h (Terry)
- Remove unnecessary includes of cxl.h and core.h in mem.c (Terry)
CXL/AER: Introduce cxl_aer.c into AER driver for forwarding CXL errors
- Move RCH implementation to cxl_rch.c and RCH declarations to pci/pci.h. (Terry)
- Introduce 'struct cxl_proto_err_kfifo' containing semaphore, fifo,
and work struct. (Dan)
- Remove embedded struct from cxl_proto_err_work (Dan)
- Make 'struct work_struct *cxl_proto_err_work' definition static (Jonathan)
- Add check for NULL cxl_proto_err_kfifo to determine if CXL driver is
not registered for workqueue. (Dan)
PCI/AER: Dequeue forwarded CXL error
- Reword patch commit message to remove RCiEP details (Jonathan)
- Add #include <linux/bitfield.h> (Terry)
- is_cxl_rcd() - Fix short comment message wrap (Jonathan)
- is_cxl_rcd() - Combine return calls into 1 (Jonathan)
- cxl_handle_proto_error() - Move comment earlier (Jonathan)
- Usse FIELD_GET() in discovering class code (Jonathan)
- Remove BDF from cxl_proto_err_work_data. Use 'struct pci_dev *' (Dan)
CXL/PCI: Introduce CXL Port protocol error handlers
- Removed check for PCI_EXP_TYPE_RC_END in cxl_report_error_detected() (Terry)
- Update is_cxl_error() to check for acceptable PCI EP and port types
CXL/PCI: Export and rename merge_result() to pci_ers_merge_result()
- pci_ers_merge_result() - Change export to non-namespace and rename
to be pci_ers_merge_result() (Jonathan)
- Move pci_ers_merge_result() definition to pci.h. Needs pci_ers_result (Terry)
CXL/PCI: Introduce CXL uncorrectable protocol error recovery
- pci_ers_merge_results() - Move to earlier patch
CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup
- Remove guard() in cxl_mask_proto_interrupts(). Observed device lockup/block
during testing. (Terry)
Changes in v9 -> v10:
- Add drivers/pci/pcie/cxl_aer.c
- Add drivers/cxl/core/native_ras.c
- Change cxl_register_prot_err_work()/cxl_unregister_prot_err_work to return void
- Check for pcie_ports_native in cxl_do_recovery()
- Remove debug logging in cxl_do_recovery()
- Update PCI_ERS_RESULT_PANIC definition to indicate is CXL specific
- Revert trace logging changes: name,parent -> memdev,host.
- Use FIELD_GET() to check for EP class code (cxl_aer.c & native_ras.c).
- Change _prot_ to _proto_ everywhere
- cxl_rch_handle_error_iter(), check if driver is cxl_pci_driver
- Remove cxl_create_prot_error_info(). Move logic into forward_cxl_error()
- Remove sbdf_to_pci() and move logic into cxl_handle_proto_error()
- Simplify/refactor get_pci_cxl_host_dev()
- Simplify/refactor cxl_get_ras_base()
- Move patch 'Remove unnecessary CXL Endpoint handling helper functions' to front
- Update description for 'CXL/PCI: Introduce CXL Port protocol error
handlers' with why state is not used to determine handling
- Introduce cxl_pci_drv_bound() and call from cxl_rch_handle_error_iter()
Changes in v8 -> v9:
- Updated reference counting to use pci_get_device()/pci_put_device() in
cxl_disable_prot_errors()/cxl_enable_prot_errors
- Refactored cxl_create_prot_err_info() to fix reference counting
- Removed 'struct cxl_port' driver changes for error handler. Instead
check for CXL device type (EP or Port device) and call handler
- Make pcie_is_cxl() static inline in include/linux/linux.h
- Remove NULL check in create_prot_err_info()
- Change success return in cxl_ras_init() to use hardcoded 0
- Changed 'struct work_struct cxl_prot_err_work' declaration to static
- Change to use rate limited log with dev anchor in forward_cxl_error()
- Refactored forward-cxl_error() to remove severity auto variable
- Changed pci_aer_clear_nonfatal_status() to be static inline for
!(CONFIG_PCIEAER)
- Renamed merge_result() to be cxl_merge_result()
- Removed 'ue' condition in cxl_error_detected()
- Updated 2nd parameter in call to __cxl_handle_cor_ras()/__cxl_handle_ras()
in unify patch
- Added log message for failure while assigning interrupt disable callback
- Updated pci_aer_mask_internal_errors() to use pci_clear_and_set_config_dword()
- Simplified patch titles for clarity
- Moved CXL error interrupt disabling into cxl/core/port.c with CXL Port
teardown
- Updated 'struct cxl_port_err_info' to only contain sbdf and severity
Removed everything else.
- Added pdev and CXL device get_device()/put_device() before calling handlers
Changes in v7 -> v8:
[Dan] Use kfifo. Move handling to CXL driver. AER forwards error to CXL
driver
[Dan] Add device reference incrementors where needed throughout
[Dan] Initiate CXL Port RAS init from Switch Port and Endpoint Port init
[Dan] Combine CXL Port and CXL Endpoint trace routine
[Dan] Introduce aer_info::is_cxl. Use to indicate CXL or PCI errors
[Jonathan] Add serial number for all devices in trace
[DaveJ] Move find_cxl_port() change into patch using it
[Terry] Move CXL Port RAS init into cxl/port.c
[Terry] Moved kfifo functions into cxl/core/ras.c
Changes in v6 -> v7:
[Terry] Move updated trace routine call to later patch. Was causing build
error.
Changes in v5 -> v6:
[Ira] Move pcie_is_cxl(dev) define to a inline function
[Ira] Update returning value from pcie_is_cxl_port() to bool w/o cast
[Ira] Change cxl_report_error_detected() cleanup to return correct bool
[Ira] Introduce and use PCI_ERS_RESULT_PANIC
[Ira] Reuse comment for PCIe and CXL recovery paths
[Jonathan] Add type check in for cxl_handle_cor_ras() and cxl_handle_ras()
[Jonathan] cxl_uport/dport_init_ras_reporting(), added a mutex.
[Jonathan] Add logging example to patches updating trace output
[Jonathan] Make parameter 'const' to eliminate for cast in match_uport()
[Jonathan] Use __free() in cxl_pci_port_ras()
[Terry] Add patch to log the PCIe SBDF along with CXL device name
[Terry] Add patch to handle CXL endpoint and RCH DP errors as CXL errors
[Terry] Remove patch w USP UCE fatal support @ aer_get_device_error_info()
[Terry] Rebase to cxl/next commit 5585e342e8d3 ("cxl/memdev: Remove unused partition values")
[Gregory] Pre-initialize pointer to NULL in cxl_pci_port_ras()
[Gregory] Move AER driver bus name detection to a static function
Changes in v4 -> v5:
[Alejandro] Refactor cxl_walk_bridge to simplify 'status' variable usage
[Alejandro] Add WARN_ONCE() in __cxl_handle_ras() and cxl_handle_cor_ras()
[Ming] Remove unnecessary NULL check in cxl_pci_port_ras()
[Terry] Add failure check for call to to_cxl_port() in cxl_pci_port_ras()
[Ming] Use port->dev for call to devm_add_action_or_reset() in
cxl_dport_init_ras_reporting() and cxl_uport_init_ras_reporting()
[Jonathan] Use get_device()/put_device() to prevent race condition in
cxl_clear_port_error_handlers() and cxl_clear_port_error_handlers()
[Terry] Commit message cleanup. Capitalize keywords from CXL and PCI
specifications
Changes in v3 -> v4:
[Lukas] Capitalize PCIe and CXL device names as in specifications
[Lukas] Move call to pcie_is_cxl() into cxl_port_devsec()
[Lukas] Correct namespace spelling
[Lukas] Removed export from pcie_is_cxl_port()
[Lukas] Simplify 'if' blocks in cxl_handle_error()
[Lukas] Change panic message to remove redundant 'panic' text
[Ming] Update to call cxl_dport_init_ras_reporting() in RCH case
[lkp@intel] 'host' parameter is already removed. Remove parameter description too.
[Terry] Added field description for cxl_err_handlers in pci.h comment block
Changes in v1 -> v2:
[Jonathan] Remove extra NULL check and cleanup in cxl_pci_port_ras()
[Jonathan] Update description to DSP map patch description
[Jonathan] Update cxl_pci_port_ras() to check for NULL port
[Jonathan] Dont call handler before handler port changes are present (patch order)
[Bjorn] Fix linebreak in cover sheet URL
[Bjorn] Remove timestamps from test logs in cover sheet
[Bjorn] Retitle AER commits to use "PCI/AER:"
[Bjorn] Retitle patch#3 to use renaming instead of refactoring
[Bjorn] Fix base commit-id on cover sheet
[Bjorn] Add VH spec reference/citation
[Terry] Removed last 2 patches to enable internal errors. Is not needed
because internal errors are enabled in AER driver.
[Dan] Create cxl_do_recovery() and pci_driver::cxl_err_handlers.
[Dan] Use kernel panic in CXL recovery
[Dan] cxl_port_hndlrs -> cxl_port_error_handlers
Dave Jiang (1):
cxl: Remove ifdef blocks of CONFIG_PCIEAER_CXL from core/pci.c
Terry Bowman (22):
CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS
cxl/pci: Remove unnecessary CXL Endpoint handling helper functions
cxl/pci: Remove unnecessary CXL RCH handling helper functions
cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS
conditional block
CXL/AER: Introduce rch_aer.c into AER driver for handling CXL RCH
errors
CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
PCI/CXL: Introduce pcie_is_cxl()
PCI/AER: Report CXL or PCIe bus error type in trace logging
CXL/AER: Update PCI class code check to use FIELD_GET()
cxl/pci: Update RAS handler interfaces to also support CXL Ports
cxl/pci: Log message if RAS registers are unmapped
cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports
cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors
cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers
cxl/pci: Introduce CXL Endpoint protocol error handlers
CXL/AER: Introduce cxl_aer.c into AER driver for forwarding CXL errors
PCI/AER: Dequeue forwarded CXL error
CXL/PCI: Introduce CXL Port protocol error handlers
CXL/PCI: Export and rename merge_result() to pci_ers_merge_result()
CXL/PCI: Introduce CXL uncorrectable protocol error recovery
CXL/PCI: Enable CXL protocol errors during CXL Port probe
CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup
drivers/cxl/Kconfig | 10 +
drivers/cxl/core/Makefile | 2 +-
drivers/cxl/core/core.h | 46 +++
drivers/cxl/core/pci.c | 381 ++----------------
drivers/cxl/core/port.c | 11 +-
drivers/cxl/core/ras.c | 700 +++++++++++++++++++++++++++++++++-
drivers/cxl/core/regs.c | 12 +-
drivers/cxl/core/trace.h | 68 +---
drivers/cxl/cxl.h | 11 +-
drivers/cxl/cxlpci.h | 56 ---
drivers/cxl/mem.c | 6 +-
drivers/cxl/pci.c | 11 +-
drivers/cxl/port.c | 5 +
drivers/pci/pci.c | 19 +-
drivers/pci/pci.h | 31 +-
drivers/pci/pcie/Kconfig | 9 -
drivers/pci/pcie/Makefile | 2 +
drivers/pci/pcie/aer.c | 164 +-------
drivers/pci/pcie/cxl_aer.c | 144 +++++++
drivers/pci/pcie/err.c | 14 +-
drivers/pci/pcie/rcec.c | 1 +
drivers/pci/pcie/rch_aer.c | 99 +++++
drivers/pci/probe.c | 25 ++
include/linux/aer.h | 29 ++
include/linux/pci.h | 30 ++
include/ras/ras_event.h | 9 +-
include/uapi/linux/pci_regs.h | 65 +++-
tools/testing/cxl/Kbuild | 2 +-
28 files changed, 1286 insertions(+), 676 deletions(-)
create mode 100644 drivers/pci/pcie/cxl_aer.c
create mode 100644 drivers/pci/pcie/rch_aer.c
base-commit: f11a5f89910a7ae970fbce4fdc02d86a8ba8570f
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v11 01/23] cxl: Remove ifdef blocks of CONFIG_PCIEAER_CXL from core/pci.c
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-27 1:35 ` [PATCH v11 02/23] CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS Terry Bowman
` (22 subsequent siblings)
23 siblings, 0 replies; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
From: Dave Jiang <dave.jiang@intel.com>
Create new config CONFIG_CXL_RAS and put all CXL RAS items behind
the config. The config will depend on CPER and PCIE AER to build.
Move the related RAS code from core/pci.c to core/ras.c.
Cc: Robert Richter <rrichter@amd.com>
Cc: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
- Updated by Terry Bowman to use (ACPI_APEI_GHES && PCIEAER_CXL) dependency
in Kconfig. Otherwise checks will be reauired for CONFIG_PCIEAER because
AER driver functions are called.
---
drivers/cxl/Kconfig | 3 +
drivers/cxl/core/Makefile | 2 +-
drivers/cxl/core/core.h | 12 ++
drivers/cxl/core/pci.c | 319 --------------------------------------
drivers/cxl/core/ras.c | 311 +++++++++++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 8 -
drivers/cxl/cxlpci.h | 16 ++
tools/testing/cxl/Kbuild | 2 +-
8 files changed, 344 insertions(+), 329 deletions(-)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 48b7314afdb8..1c7c8989fd8b 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -233,4 +233,7 @@ config CXL_MCE
def_bool y
depends on X86_MCE && MEMORY_FAILURE
+config CXL_RAS
+ def_bool y
+ depends on ACPI_APEI_GHES && PCIEAER_CXL
endif
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 5ad8fef210b5..b2930cc54f8b 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -14,9 +14,9 @@ cxl_core-y += pci.o
cxl_core-y += hdm.o
cxl_core-y += pmu.o
cxl_core-y += cdat.o
-cxl_core-y += ras.o
cxl_core-$(CONFIG_TRACING) += trace.o
cxl_core-$(CONFIG_CXL_REGION) += region.o
cxl_core-$(CONFIG_CXL_MCE) += mce.o
cxl_core-$(CONFIG_CXL_FEATURES) += features.o
cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
+cxl_core-$(CONFIG_CXL_RAS) += ras.o
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 2669f251d677..2c81a43d7b05 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -143,8 +143,20 @@ bool cxl_need_node_perf_attrs_update(int nid);
int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
struct access_coordinate *c);
+#ifdef CONFIG_CXL_RAS
int cxl_ras_init(void);
void cxl_ras_exit(void);
+#else
+static inline int cxl_ras_init(void)
+{
+ return 0;
+}
+
+static inline void cxl_ras_exit(void)
+{
+}
+#endif // CONFIG_CXL_RAS
+
int cxl_gpf_port_setup(struct cxl_dport *dport);
#ifdef CONFIG_CXL_FEATURES
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index b50551601c2e..a3aef78f903a 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -6,7 +6,6 @@
#include <linux/delay.h>
#include <linux/pci.h>
#include <linux/pci-doe.h>
-#include <linux/aer.h>
#include <cxlpci.h>
#include <cxlmem.h>
#include <cxl.h>
@@ -664,324 +663,6 @@ void read_cdat_data(struct cxl_port *port)
}
EXPORT_SYMBOL_NS_GPL(read_cdat_data, "CXL");
-static void __cxl_handle_cor_ras(struct cxl_dev_state *cxlds,
- void __iomem *ras_base)
-{
- void __iomem *addr;
- u32 status;
-
- if (!ras_base)
- return;
-
- addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
- status = readl(addr);
- if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
- writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
- trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
- }
-}
-
-static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds)
-{
- return __cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
-}
-
-/* CXL spec rev3.0 8.2.4.16.1 */
-static void header_log_copy(void __iomem *ras_base, u32 *log)
-{
- void __iomem *addr;
- u32 *log_addr;
- int i, log_u32_size = CXL_HEADERLOG_SIZE / sizeof(u32);
-
- addr = ras_base + CXL_RAS_HEADER_LOG_OFFSET;
- log_addr = log;
-
- for (i = 0; i < log_u32_size; i++) {
- *log_addr = readl(addr);
- log_addr++;
- addr += sizeof(u32);
- }
-}
-
-/*
- * Log the state of the RAS status registers and prepare them to log the
- * next error status. Return 1 if reset needed.
- */
-static bool __cxl_handle_ras(struct cxl_dev_state *cxlds,
- void __iomem *ras_base)
-{
- u32 hl[CXL_HEADERLOG_SIZE_U32];
- void __iomem *addr;
- u32 status;
- u32 fe;
-
- if (!ras_base)
- return false;
-
- addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
- status = readl(addr);
- if (!(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK))
- return false;
-
- /* If multiple errors, log header points to first error from ctrl reg */
- if (hweight32(status) > 1) {
- void __iomem *rcc_addr =
- ras_base + CXL_RAS_CAP_CONTROL_OFFSET;
-
- fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
- readl(rcc_addr)));
- } else {
- fe = status;
- }
-
- header_log_copy(ras_base, hl);
- trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe, hl);
- writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
-
- return true;
-}
-
-static bool cxl_handle_endpoint_ras(struct cxl_dev_state *cxlds)
-{
- return __cxl_handle_ras(cxlds, cxlds->regs.ras);
-}
-
-#ifdef CONFIG_PCIEAER_CXL
-
-static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
-{
- resource_size_t aer_phys;
- struct device *host;
- u16 aer_cap;
-
- aer_cap = cxl_rcrb_to_aer(dport->dport_dev, dport->rcrb.base);
- if (aer_cap) {
- host = dport->reg_map.host;
- aer_phys = aer_cap + dport->rcrb.base;
- dport->regs.dport_aer = devm_cxl_iomap_block(host, aer_phys,
- sizeof(struct aer_capability_regs));
- }
-}
-
-static void cxl_dport_map_ras(struct cxl_dport *dport)
-{
- struct cxl_register_map *map = &dport->reg_map;
- struct device *dev = dport->dport_dev;
-
- if (!map->component_map.ras.valid)
- dev_dbg(dev, "RAS registers not found\n");
- else if (cxl_map_component_regs(map, &dport->regs.component,
- BIT(CXL_CM_CAP_CAP_ID_RAS)))
- dev_dbg(dev, "Failed to map RAS capability.\n");
-}
-
-static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
-{
- void __iomem *aer_base = dport->regs.dport_aer;
- u32 aer_cmd_mask, aer_cmd;
-
- if (!aer_base)
- return;
-
- /*
- * Disable RCH root port command interrupts.
- * CXL 3.0 12.2.1.1 - RCH Downstream Port-detected Errors
- *
- * This sequence may not be necessary. CXL spec states disabling
- * the root cmd register's interrupts is required. But, PCI spec
- * shows these are disabled by default on reset.
- */
- aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN |
- PCI_ERR_ROOT_CMD_NONFATAL_EN |
- PCI_ERR_ROOT_CMD_FATAL_EN);
- aer_cmd = readl(aer_base + PCI_ERR_ROOT_COMMAND);
- aer_cmd &= ~aer_cmd_mask;
- writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
-}
-
-/**
- * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
- * @dport: the cxl_dport that needs to be initialized
- * @host: host device for devm operations
- */
-void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
-{
- dport->reg_map.host = host;
- cxl_dport_map_ras(dport);
-
- if (dport->rch) {
- struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport->dport_dev);
-
- if (!host_bridge->native_aer)
- return;
-
- cxl_dport_map_rch_aer(dport);
- cxl_disable_rch_root_ints(dport);
- }
-}
-EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
-
-static void cxl_handle_rdport_cor_ras(struct cxl_dev_state *cxlds,
- struct cxl_dport *dport)
-{
- return __cxl_handle_cor_ras(cxlds, dport->regs.ras);
-}
-
-static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds,
- struct cxl_dport *dport)
-{
- return __cxl_handle_ras(cxlds, dport->regs.ras);
-}
-
-/*
- * Copy the AER capability registers using 32 bit read accesses.
- * This is necessary because RCRB AER capability is MMIO mapped. Clear the
- * status after copying.
- *
- * @aer_base: base address of AER capability block in RCRB
- * @aer_regs: destination for copying AER capability
- */
-static bool cxl_rch_get_aer_info(void __iomem *aer_base,
- struct aer_capability_regs *aer_regs)
-{
- int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32);
- u32 *aer_regs_buf = (u32 *)aer_regs;
- int n;
-
- if (!aer_base)
- return false;
-
- /* Use readl() to guarantee 32-bit accesses */
- for (n = 0; n < read_cnt; n++)
- aer_regs_buf[n] = readl(aer_base + n * sizeof(u32));
-
- writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS);
- writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS);
-
- return true;
-}
-
-/* Get AER severity. Return false if there is no error. */
-static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
- int *severity)
-{
- if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
- if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
- *severity = AER_FATAL;
- else
- *severity = AER_NONFATAL;
- return true;
- }
-
- if (aer_regs->cor_status & ~aer_regs->cor_mask) {
- *severity = AER_CORRECTABLE;
- return true;
- }
-
- return false;
-}
-
-static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
-{
- struct pci_dev *pdev = to_pci_dev(cxlds->dev);
- struct aer_capability_regs aer_regs;
- struct cxl_dport *dport;
- int severity;
-
- struct cxl_port *port __free(put_cxl_port) =
- cxl_pci_find_port(pdev, &dport);
- if (!port)
- return;
-
- if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
- return;
-
- if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
- return;
-
- pci_print_aer(pdev, severity, &aer_regs);
-
- if (severity == AER_CORRECTABLE)
- cxl_handle_rdport_cor_ras(cxlds, dport);
- else
- cxl_handle_rdport_ras(cxlds, dport);
-}
-
-#else
-static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
-#endif
-
-void cxl_cor_error_detected(struct pci_dev *pdev)
-{
- struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
- struct device *dev = &cxlds->cxlmd->dev;
-
- scoped_guard(device, dev) {
- if (!dev->driver) {
- dev_warn(&pdev->dev,
- "%s: memdev disabled, abort error handling\n",
- dev_name(dev));
- return;
- }
-
- if (cxlds->rcd)
- cxl_handle_rdport_errors(cxlds);
-
- cxl_handle_endpoint_cor_ras(cxlds);
- }
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
-
-pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
- pci_channel_state_t state)
-{
- struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
- struct cxl_memdev *cxlmd = cxlds->cxlmd;
- struct device *dev = &cxlmd->dev;
- bool ue;
-
- scoped_guard(device, dev) {
- if (!dev->driver) {
- dev_warn(&pdev->dev,
- "%s: memdev disabled, abort error handling\n",
- dev_name(dev));
- return PCI_ERS_RESULT_DISCONNECT;
- }
-
- if (cxlds->rcd)
- cxl_handle_rdport_errors(cxlds);
- /*
- * A frozen channel indicates an impending reset which is fatal to
- * CXL.mem operation, and will likely crash the system. On the off
- * chance the situation is recoverable dump the status of the RAS
- * capability registers and bounce the active state of the memdev.
- */
- ue = cxl_handle_endpoint_ras(cxlds);
- }
-
-
- switch (state) {
- case pci_channel_io_normal:
- if (ue) {
- device_release_driver(dev);
- return PCI_ERS_RESULT_NEED_RESET;
- }
- return PCI_ERS_RESULT_CAN_RECOVER;
- case pci_channel_io_frozen:
- dev_warn(&pdev->dev,
- "%s: frozen state error detected, disable CXL.mem\n",
- dev_name(dev));
- device_release_driver(dev);
- return PCI_ERS_RESULT_NEED_RESET;
- case pci_channel_io_perm_failure:
- dev_warn(&pdev->dev,
- "failure state error detected, request disconnect\n");
- return PCI_ERS_RESULT_DISCONNECT;
- }
- return PCI_ERS_RESULT_NEED_RESET;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
-
static int cxl_flit_size(struct pci_dev *pdev)
{
if (cxl_pci_flit_256(pdev))
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 2731ba3a0799..c4f0fa7e40aa 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -5,6 +5,7 @@
#include <linux/aer.h>
#include <cxl/event.h>
#include <cxlmem.h>
+#include <cxlpci.h>
#include "trace.h"
static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,
@@ -124,3 +125,313 @@ void cxl_ras_exit(void)
cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
cancel_work_sync(&cxl_cper_prot_err_work);
}
+
+static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
+{
+ resource_size_t aer_phys;
+ struct device *host;
+ u16 aer_cap;
+
+ aer_cap = cxl_rcrb_to_aer(dport->dport_dev, dport->rcrb.base);
+ if (aer_cap) {
+ host = dport->reg_map.host;
+ aer_phys = aer_cap + dport->rcrb.base;
+ dport->regs.dport_aer = devm_cxl_iomap_block(host, aer_phys,
+ sizeof(struct aer_capability_regs));
+ }
+}
+
+static void cxl_dport_map_ras(struct cxl_dport *dport)
+{
+ struct cxl_register_map *map = &dport->reg_map;
+ struct device *dev = dport->dport_dev;
+
+ if (!map->component_map.ras.valid)
+ dev_dbg(dev, "RAS registers not found\n");
+ else if (cxl_map_component_regs(map, &dport->regs.component,
+ BIT(CXL_CM_CAP_CAP_ID_RAS)))
+ dev_dbg(dev, "Failed to map RAS capability.\n");
+}
+
+static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
+{
+ void __iomem *aer_base = dport->regs.dport_aer;
+ u32 aer_cmd_mask, aer_cmd;
+
+ if (!aer_base)
+ return;
+
+ /*
+ * Disable RCH root port command interrupts.
+ * CXL 3.0 12.2.1.1 - RCH Downstream Port-detected Errors
+ *
+ * This sequence may not be necessary. CXL spec states disabling
+ * the root cmd register's interrupts is required. But, PCI spec
+ * shows these are disabled by default on reset.
+ */
+ aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN |
+ PCI_ERR_ROOT_CMD_NONFATAL_EN |
+ PCI_ERR_ROOT_CMD_FATAL_EN);
+ aer_cmd = readl(aer_base + PCI_ERR_ROOT_COMMAND);
+ aer_cmd &= ~aer_cmd_mask;
+ writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
+}
+
+
+/**
+ * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
+ * @dport: the cxl_dport that needs to be initialized
+ * @host: host device for devm operations
+ */
+void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
+{
+ dport->reg_map.host = host;
+ cxl_dport_map_ras(dport);
+
+ if (dport->rch) {
+ struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport->dport_dev);
+
+ if (!host_bridge->native_aer)
+ return;
+
+ cxl_dport_map_rch_aer(dport);
+ cxl_disable_rch_root_ints(dport);
+ }
+}
+EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
+
+static void __cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
+{
+ void __iomem *addr;
+ u32 status;
+
+ if (!ras_base)
+ return;
+
+ addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
+ status = readl(addr);
+ if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
+ writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
+ trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
+ }
+}
+
+/* CXL spec rev3.0 8.2.4.16.1 */
+static void header_log_copy(void __iomem *ras_base, u32 *log)
+{
+ void __iomem *addr;
+ u32 *log_addr;
+ int i, log_u32_size = CXL_HEADERLOG_SIZE / sizeof(u32);
+
+ addr = ras_base + CXL_RAS_HEADER_LOG_OFFSET;
+ log_addr = log;
+
+ for (i = 0; i < log_u32_size; i++) {
+ *log_addr = readl(addr);
+ log_addr++;
+ addr += sizeof(u32);
+ }
+}
+
+static void cxl_handle_rdport_cor_ras(struct cxl_dev_state *cxlds,
+ struct cxl_dport *dport)
+{
+ return __cxl_handle_cor_ras(cxlds, dport->regs.ras);
+}
+
+/*
+ * Log the state of the RAS status registers and prepare them to log the
+ * next error status. Return 1 if reset needed.
+ */
+static bool __cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
+{
+ u32 hl[CXL_HEADERLOG_SIZE_U32];
+ void __iomem *addr;
+ u32 status;
+ u32 fe;
+
+ if (!ras_base)
+ return false;
+
+ addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
+ status = readl(addr);
+ if (!(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK))
+ return false;
+
+ /* If multiple errors, log header points to first error from ctrl reg */
+ if (hweight32(status) > 1) {
+ void __iomem *rcc_addr =
+ ras_base + CXL_RAS_CAP_CONTROL_OFFSET;
+
+ fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
+ readl(rcc_addr)));
+ } else {
+ fe = status;
+ }
+
+ header_log_copy(ras_base, hl);
+ trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe, hl);
+ writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
+
+ return true;
+}
+
+static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds,
+ struct cxl_dport *dport)
+{
+ return __cxl_handle_ras(cxlds, dport->regs.ras);
+}
+
+/*
+ * Copy the AER capability registers using 32 bit read accesses.
+ * This is necessary because RCRB AER capability is MMIO mapped. Clear the
+ * status after copying.
+ *
+ * @aer_base: base address of AER capability block in RCRB
+ * @aer_regs: destination for copying AER capability
+ */
+static bool cxl_rch_get_aer_info(void __iomem *aer_base,
+ struct aer_capability_regs *aer_regs)
+{
+ int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32);
+ u32 *aer_regs_buf = (u32 *)aer_regs;
+ int n;
+
+ if (!aer_base)
+ return false;
+
+ /* Use readl() to guarantee 32-bit accesses */
+ for (n = 0; n < read_cnt; n++)
+ aer_regs_buf[n] = readl(aer_base + n * sizeof(u32));
+
+ writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS);
+ writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS);
+
+ return true;
+}
+
+/* Get AER severity. Return false if there is no error. */
+static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
+ int *severity)
+{
+ if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
+ if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
+ *severity = AER_FATAL;
+ else
+ *severity = AER_NONFATAL;
+ return true;
+ }
+
+ if (aer_regs->cor_status & ~aer_regs->cor_mask) {
+ *severity = AER_CORRECTABLE;
+ return true;
+ }
+
+ return false;
+}
+
+static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
+{
+ struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+ struct aer_capability_regs aer_regs;
+ struct cxl_dport *dport;
+ int severity;
+
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
+ if (!port)
+ return;
+
+ if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
+ return;
+
+ if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
+ return;
+
+ pci_print_aer(pdev, severity, &aer_regs);
+ if (severity == AER_CORRECTABLE)
+ cxl_handle_rdport_cor_ras(cxlds, dport);
+ else
+ cxl_handle_rdport_ras(cxlds, dport);
+}
+
+static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds)
+{
+ return __cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
+}
+
+static bool cxl_handle_endpoint_ras(struct cxl_dev_state *cxlds)
+{
+ return __cxl_handle_ras(cxlds, cxlds->regs.ras);
+}
+
+void cxl_cor_error_detected(struct pci_dev *pdev)
+{
+ struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+ struct device *dev = &cxlds->cxlmd->dev;
+
+ scoped_guard(device, dev) {
+ if (!dev->driver) {
+ dev_warn(&pdev->dev,
+ "%s: memdev disabled, abort error handling\n",
+ dev_name(dev));
+ return;
+ }
+
+ if (cxlds->rcd)
+ cxl_handle_rdport_errors(cxlds);
+
+ cxl_handle_endpoint_cor_ras(cxlds);
+ }
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
+
+pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+ struct cxl_memdev *cxlmd = cxlds->cxlmd;
+ struct device *dev = &cxlmd->dev;
+ bool ue;
+
+ scoped_guard(device, dev) {
+ if (!dev->driver) {
+ dev_warn(&pdev->dev,
+ "%s: memdev disabled, abort error handling\n",
+ dev_name(dev));
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ if (cxlds->rcd)
+ cxl_handle_rdport_errors(cxlds);
+ /*
+ * A frozen channel indicates an impending reset which is fatal to
+ * CXL.mem operation, and will likely crash the system. On the off
+ * chance the situation is recoverable dump the status of the RAS
+ * capability registers and bounce the active state of the memdev.
+ */
+ ue = cxl_handle_endpoint_ras(cxlds);
+ }
+
+
+ switch (state) {
+ case pci_channel_io_normal:
+ if (ue) {
+ device_release_driver(dev);
+ return PCI_ERS_RESULT_NEED_RESET;
+ }
+ return PCI_ERS_RESULT_CAN_RECOVER;
+ case pci_channel_io_frozen:
+ dev_warn(&pdev->dev,
+ "%s: frozen state error detected, disable CXL.mem\n",
+ dev_name(dev));
+ device_release_driver(dev);
+ return PCI_ERS_RESULT_NEED_RESET;
+ case pci_channel_io_perm_failure:
+ dev_warn(&pdev->dev,
+ "failure state error detected, request disconnect\n");
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+ return PCI_ERS_RESULT_NEED_RESET;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b7111e3568d0..8f6224ac6785 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -761,14 +761,6 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
struct device *dport_dev, int port_id,
resource_size_t rcrb);
-#ifdef CONFIG_PCIEAER_CXL
-void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport);
-void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
-#else
-static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
- struct device *host) { }
-#endif
-
struct cxl_decoder *to_cxl_decoder(struct device *dev);
struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 54e219b0049e..3959fa7e2ead 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -132,7 +132,23 @@ struct cxl_dev_state;
int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
struct cxl_endpoint_dvsec_info *info);
void read_cdat_data(struct cxl_port *port);
+
+#ifdef CONFIG_CXL_RAS
void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
+void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
+#else
+static inline void cxl_cor_error_detected(struct pci_dev *pdev) { }
+
+static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ return PCI_ERS_RESULT_NONE;
+}
+
+static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
+ struct device *host) { }
+#endif
+
#endif /* __CXL_PCI_H__ */
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index d07f14cb7aa4..385301aeaeb3 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -61,12 +61,12 @@ cxl_core-y += $(CXL_CORE_SRC)/pci.o
cxl_core-y += $(CXL_CORE_SRC)/hdm.o
cxl_core-y += $(CXL_CORE_SRC)/pmu.o
cxl_core-y += $(CXL_CORE_SRC)/cdat.o
-cxl_core-y += $(CXL_CORE_SRC)/ras.o
cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o
cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o
cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o
cxl_core-$(CONFIG_CXL_FEATURES) += $(CXL_CORE_SRC)/features.o
cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += $(CXL_CORE_SRC)/edac.o
+cxl_core-$(CONFIG_CXL_RAS) += $(CXL_CORE_SRC)/ras.o
cxl_core-y += config_check.o
cxl_core-y += cxl_core_test.o
cxl_core-y += cxl_core_exports.o
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 02/23] CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2025-08-27 1:35 ` [PATCH v11 01/23] cxl: Remove ifdef blocks of CONFIG_PCIEAER_CXL from core/pci.c Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-29 15:24 ` Jonathan Cameron
2025-08-29 18:16 ` Sathyanarayanan Kuppuswamy
2025-08-27 1:35 ` [PATCH v11 03/23] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions Terry Bowman
` (21 subsequent siblings)
23 siblings, 2 replies; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
CXL RAS compilation is enabled using CONFIG_CXL_RAS while the AER CXL logic
uses CONFIG_PCIEAER_CXL. The 2 share the same dependencies and can be
combined. The 2 kernel configs are unnecessary and are problematic for the
user because of the duplication. Replace occurrences of CONFIG_PCIEAER_CXL
to be CONFIG_CXL_RAS.
Update the CONFIG_CXL_RAS Kconfig definition to include dependencies 'PCIEAER
&& CXL_PCI' taken from the CONFIG_PCIEAER_CXL definition.
Remove the Kconfig CONFIG_PCIEAER_CXL definition.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v10 -> v11:
- New patch
---
drivers/pci/pcie/Kconfig | 9 ---------
drivers/pci/pcie/aer.c | 2 +-
2 files changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 17919b99fa66..207c2deae35f 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -49,15 +49,6 @@ config PCIEAER_INJECT
gotten from:
https://github.com/intel/aer-inject.git
-config PCIEAER_CXL
- bool "PCI Express CXL RAS support"
- default y
- depends on PCIEAER && CXL_PCI
- help
- Enables CXL error handling.
-
- If unsure, say Y.
-
#
# PCI Express ECRC
#
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 70ac66188367..7fe9f883f5c5 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1086,7 +1086,7 @@ static bool find_source_device(struct pci_dev *parent,
return true;
}
-#ifdef CONFIG_PCIEAER_CXL
+#ifdef CONFIG_CXL_RAS
/**
* pci_aer_unmask_internal_errors - unmask internal errors
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 03/23] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2025-08-27 1:35 ` [PATCH v11 01/23] cxl: Remove ifdef blocks of CONFIG_PCIEAER_CXL from core/pci.c Terry Bowman
2025-08-27 1:35 ` [PATCH v11 02/23] CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-28 15:28 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 04/23] cxl/pci: Remove unnecessary CXL RCH " Terry Bowman
` (20 subsequent siblings)
23 siblings, 1 reply; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
The CXL driver's cxl_handle_endpoint_cor_ras()/cxl_handle_endpoint_ras()
are unnecessary helper functions used only for Endpoints. Remove these
functions as they are not common for all CXL devices and do not provide
value for EP handling.
Rename __cxl_handle_ras to cxl_handle_ras() and __cxl_handle_cor_ras()
to cxl_handle_cor_ras().
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
Changes in v10->v11:
- None
---
drivers/cxl/core/ras.c | 22 ++++++----------------
1 file changed, 6 insertions(+), 16 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index c4f0fa7e40aa..544a0d8773fa 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -200,7 +200,7 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
}
EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
-static void __cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
+static void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
{
void __iomem *addr;
u32 status;
@@ -236,14 +236,14 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
static void cxl_handle_rdport_cor_ras(struct cxl_dev_state *cxlds,
struct cxl_dport *dport)
{
- return __cxl_handle_cor_ras(cxlds, dport->regs.ras);
+ return cxl_handle_cor_ras(cxlds, dport->regs.ras);
}
/*
* Log the state of the RAS status registers and prepare them to log the
* next error status. Return 1 if reset needed.
*/
-static bool __cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
+static bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
{
u32 hl[CXL_HEADERLOG_SIZE_U32];
void __iomem *addr;
@@ -279,7 +279,7 @@ static bool __cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base
static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds,
struct cxl_dport *dport)
{
- return __cxl_handle_ras(cxlds, dport->regs.ras);
+ return cxl_handle_ras(cxlds, dport->regs.ras);
}
/*
@@ -355,16 +355,6 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
cxl_handle_rdport_ras(cxlds, dport);
}
-static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds)
-{
- return __cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
-}
-
-static bool cxl_handle_endpoint_ras(struct cxl_dev_state *cxlds)
-{
- return __cxl_handle_ras(cxlds, cxlds->regs.ras);
-}
-
void cxl_cor_error_detected(struct pci_dev *pdev)
{
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
@@ -381,7 +371,7 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
if (cxlds->rcd)
cxl_handle_rdport_errors(cxlds);
- cxl_handle_endpoint_cor_ras(cxlds);
+ cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
}
}
EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
@@ -410,7 +400,7 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
* chance the situation is recoverable dump the status of the RAS
* capability registers and bounce the active state of the memdev.
*/
- ue = cxl_handle_endpoint_ras(cxlds);
+ ue = cxl_handle_ras(cxlds, cxlds->regs.ras);
}
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 04/23] cxl/pci: Remove unnecessary CXL RCH handling helper functions
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (2 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 03/23] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-28 8:35 ` Alejandro Lucero Palau
2025-08-28 17:32 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 05/23] cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS conditional block Terry Bowman
` (19 subsequent siblings)
23 siblings, 2 replies; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
cxl_handle_rdport_cor_ras() and cxl_handle_rdport_ras() are specific
to Restricted CXL Host (RCH) handling. Improve readability and
maintainability by replacing these and instead using the common
cxl_handle_cor_ras() and cxl_handle_ras() functions.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v10->v11:
- New patch
---
drivers/cxl/core/ras.c | 16 ++--------------
1 file changed, 2 insertions(+), 14 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 544a0d8773fa..0875ce8116ff 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -233,12 +233,6 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
}
}
-static void cxl_handle_rdport_cor_ras(struct cxl_dev_state *cxlds,
- struct cxl_dport *dport)
-{
- return cxl_handle_cor_ras(cxlds, dport->regs.ras);
-}
-
/*
* Log the state of the RAS status registers and prepare them to log the
* next error status. Return 1 if reset needed.
@@ -276,12 +270,6 @@ static bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
return true;
}
-static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds,
- struct cxl_dport *dport)
-{
- return cxl_handle_ras(cxlds, dport->regs.ras);
-}
-
/*
* Copy the AER capability registers using 32 bit read accesses.
* This is necessary because RCRB AER capability is MMIO mapped. Clear the
@@ -350,9 +338,9 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
pci_print_aer(pdev, severity, &aer_regs);
if (severity == AER_CORRECTABLE)
- cxl_handle_rdport_cor_ras(cxlds, dport);
+ cxl_handle_cor_ras(cxlds, dport->regs.ras);
else
- cxl_handle_rdport_ras(cxlds, dport);
+ cxl_handle_ras(cxlds, dport->regs.ras);
}
void cxl_cor_error_detected(struct pci_dev *pdev)
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 05/23] cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS conditional block
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (3 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 04/23] cxl/pci: Remove unnecessary CXL RCH " Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-28 8:57 ` Alejandro Lucero Palau
2025-08-29 15:33 ` Jonathan Cameron
2025-08-27 1:35 ` [PATCH v11 06/23] CXL/AER: Introduce rch_aer.c into AER driver for handling CXL RCH errors Terry Bowman
` (18 subsequent siblings)
23 siblings, 2 replies; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
Restricted CXL Host (RCH) protocol error handling uses a procedure distinct
from the CXL Virtual Hierarchy (VH) handling. This is because of the
differences in the RCH and VH topologies. Improve the maintainability and
add ability to enable/disable RCH handling.
Move and combine the RCH handling code into a single block conditionally
compiled with the CONFIG_CXL_RCH_RAS kernel config.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
v10->v11:
- New patch
---
drivers/cxl/core/ras.c | 175 +++++++++++++++++++++--------------------
1 file changed, 90 insertions(+), 85 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 0875ce8116ff..f42f9a255ef8 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -126,6 +126,7 @@ void cxl_ras_exit(void)
cancel_work_sync(&cxl_cper_prot_err_work);
}
+#ifdef CONFIG_CXL_RCH_RAS
static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
{
resource_size_t aer_phys;
@@ -141,18 +142,6 @@ static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
}
}
-static void cxl_dport_map_ras(struct cxl_dport *dport)
-{
- struct cxl_register_map *map = &dport->reg_map;
- struct device *dev = dport->dport_dev;
-
- if (!map->component_map.ras.valid)
- dev_dbg(dev, "RAS registers not found\n");
- else if (cxl_map_component_regs(map, &dport->regs.component,
- BIT(CXL_CM_CAP_CAP_ID_RAS)))
- dev_dbg(dev, "Failed to map RAS capability.\n");
-}
-
static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
{
void __iomem *aer_base = dport->regs.dport_aer;
@@ -177,6 +166,95 @@ static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
}
+/*
+ * Copy the AER capability registers using 32 bit read accesses.
+ * This is necessary because RCRB AER capability is MMIO mapped. Clear the
+ * status after copying.
+ *
+ * @aer_base: base address of AER capability block in RCRB
+ * @aer_regs: destination for copying AER capability
+ */
+static bool cxl_rch_get_aer_info(void __iomem *aer_base,
+ struct aer_capability_regs *aer_regs)
+{
+ int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32);
+ u32 *aer_regs_buf = (u32 *)aer_regs;
+ int n;
+
+ if (!aer_base)
+ return false;
+
+ /* Use readl() to guarantee 32-bit accesses */
+ for (n = 0; n < read_cnt; n++)
+ aer_regs_buf[n] = readl(aer_base + n * sizeof(u32));
+
+ writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS);
+ writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS);
+
+ return true;
+}
+
+/* Get AER severity. Return false if there is no error. */
+static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
+ int *severity)
+{
+ if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
+ if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
+ *severity = AER_FATAL;
+ else
+ *severity = AER_NONFATAL;
+ return true;
+ }
+
+ if (aer_regs->cor_status & ~aer_regs->cor_mask) {
+ *severity = AER_CORRECTABLE;
+ return true;
+ }
+
+ return false;
+}
+
+static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
+{
+ struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+ struct aer_capability_regs aer_regs;
+ struct cxl_dport *dport;
+ int severity;
+
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
+ if (!port)
+ return;
+
+ if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
+ return;
+
+ if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
+ return;
+
+ pci_print_aer(pdev, severity, &aer_regs);
+ if (severity == AER_CORRECTABLE)
+ cxl_handle_cor_ras(cxlds, dport->regs.ras);
+ else
+ cxl_handle_ras(cxlds, dport->regs.ras);
+}
+#else
+static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
+static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
+static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
+#endif
+
+static void cxl_dport_map_ras(struct cxl_dport *dport)
+{
+ struct cxl_register_map *map = &dport->reg_map;
+ struct device *dev = dport->dport_dev;
+
+ if (!map->component_map.ras.valid)
+ dev_dbg(dev, "RAS registers not found\n");
+ else if (cxl_map_component_regs(map, &dport->regs.component,
+ BIT(CXL_CM_CAP_CAP_ID_RAS)))
+ dev_dbg(dev, "Failed to map RAS capability.\n");
+}
/**
* cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
@@ -270,79 +348,6 @@ static bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
return true;
}
-/*
- * Copy the AER capability registers using 32 bit read accesses.
- * This is necessary because RCRB AER capability is MMIO mapped. Clear the
- * status after copying.
- *
- * @aer_base: base address of AER capability block in RCRB
- * @aer_regs: destination for copying AER capability
- */
-static bool cxl_rch_get_aer_info(void __iomem *aer_base,
- struct aer_capability_regs *aer_regs)
-{
- int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32);
- u32 *aer_regs_buf = (u32 *)aer_regs;
- int n;
-
- if (!aer_base)
- return false;
-
- /* Use readl() to guarantee 32-bit accesses */
- for (n = 0; n < read_cnt; n++)
- aer_regs_buf[n] = readl(aer_base + n * sizeof(u32));
-
- writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS);
- writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS);
-
- return true;
-}
-
-/* Get AER severity. Return false if there is no error. */
-static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
- int *severity)
-{
- if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
- if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
- *severity = AER_FATAL;
- else
- *severity = AER_NONFATAL;
- return true;
- }
-
- if (aer_regs->cor_status & ~aer_regs->cor_mask) {
- *severity = AER_CORRECTABLE;
- return true;
- }
-
- return false;
-}
-
-static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
-{
- struct pci_dev *pdev = to_pci_dev(cxlds->dev);
- struct aer_capability_regs aer_regs;
- struct cxl_dport *dport;
- int severity;
-
- struct cxl_port *port __free(put_cxl_port) =
- cxl_pci_find_port(pdev, &dport);
- if (!port)
- return;
-
- if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
- return;
-
- if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
- return;
-
- pci_print_aer(pdev, severity, &aer_regs);
- if (severity == AER_CORRECTABLE)
- cxl_handle_cor_ras(cxlds, dport->regs.ras);
- else
- cxl_handle_ras(cxlds, dport->regs.ras);
-}
-
void cxl_cor_error_detected(struct pci_dev *pdev)
{
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 06/23] CXL/AER: Introduce rch_aer.c into AER driver for handling CXL RCH errors
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (4 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 05/23] cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS conditional block Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-28 20:53 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 07/23] CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h Terry Bowman
` (17 subsequent siblings)
23 siblings, 1 reply; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
The restricted CXL Host (RCH) AER error handling logic currently resides
in the AER driver file, drivers/pci/pcie/aer.c. CXL specific changes are
conditionally compiled using #ifdefs.
Improve the AER driver maintainability by separating the RCH specific logic
from the AER driver's core functionality and removing the ifdefs. Introduce
drivers/pci/pcie/rch_aer.c for moving the RCH AER logic into.
Move the CXL logic into the new file but leave helper functions in aer.c
for now as they will be moved in future patch for CXL virtual hierarchy
handling.
2 changes are required to maintain compilation after the move. Change
cxl_rch_handle_error() & cxl_rch_enable_rcec() to be non-static inorder for
accessing from the AER driver in aer.c.
Introduce CONFIG_CXL_RCH_RAS in cxl/Kconfig. Update pcie/pcie/Makefile to
conditionally compile rch_aer.c file using CONFIG_CXL_RCH_RAS.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v10->v11:
- Remove changes in code-split and move to earlier, new patch
- Add #include <linux/bitfield.h> to cxl_ras.c
- Move cxl_rch_handle_error() & cxl_rch_enable_rcec() declarations from pci.h
to aer.h, more localized.
- Introduce CONFIG_CXL_RCH_RAS, includes Makefile changes, ras.c ifdef changes
---
drivers/cxl/Kconfig | 9 +++-
drivers/cxl/core/ras.c | 3 ++
drivers/pci/pci.h | 20 +++++++
drivers/pci/pcie/Makefile | 1 +
drivers/pci/pcie/aer.c | 108 +++----------------------------------
drivers/pci/pcie/rch_aer.c | 99 ++++++++++++++++++++++++++++++++++
6 files changed, 138 insertions(+), 102 deletions(-)
create mode 100644 drivers/pci/pcie/rch_aer.c
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 1c7c8989fd8b..028201e24523 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -235,5 +235,12 @@ config CXL_MCE
config CXL_RAS
def_bool y
- depends on ACPI_APEI_GHES && PCIEAER_CXL
+ depends on ACPI_APEI_GHES && PCIEAER && CXL_PCI
+
+config CXL_RCH_RAS
+ bool "CXL: Restricted CXL Host (RCH) protocol error handling"
+ def_bool n
+ depends on CXL_RAS
+ help
+ RAS support for Restricted CXL Host (RCH) defined in CXL1.1.
endif
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index f42f9a255ef8..c9f2f0335bfd 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -126,6 +126,9 @@ void cxl_ras_exit(void)
cancel_work_sync(&cxl_cper_prot_err_work);
}
+static bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
+static void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
+
#ifdef CONFIG_CXL_RCH_RAS
static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
{
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 12215ee72afb..c8a0c0ec0073 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1159,4 +1159,24 @@ static inline int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int inde
(PCI_CONF1_ADDRESS(bus, dev, func, reg) | \
PCI_CONF1_EXT_REG(reg))
+struct aer_err_info;
+
+#ifdef CONFIG_CXL_RCH_RAS
+void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info);
+void cxl_rch_enable_rcec(struct pci_dev *rcec);
+#else
+static inline void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info) { }
+static inline void cxl_rch_enable_rcec(struct pci_dev *rcec) { }
+#endif
+
+#ifdef CONFIG_CXL_RAS
+void pci_aer_unmask_internal_errors(struct pci_dev *dev);
+bool cxl_error_is_native(struct pci_dev *dev);
+bool is_internal_error(struct aer_err_info *info);
+#else
+static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
+static inline bool cxl_error_is_native(struct pci_dev *dev) { return false; }
+static inline bool is_internal_error(struct aer_err_info *info) { return false; }
+#endif
+
#endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 173829aa02e6..07c299dbcdd7 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_PCIEPORTBUS) += pcieportdrv.o bwctrl.o
obj-y += aspm.o
obj-$(CONFIG_PCIEAER) += aer.o err.o tlp.o
+obj-$(CONFIG_CXL_RCH_RAS) += rch_aer.o
obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o
obj-$(CONFIG_PCIE_PME) += pme.o
obj-$(CONFIG_PCIE_DPC) += dpc.o
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 7fe9f883f5c5..29de7ee861f7 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1098,7 +1098,7 @@ static bool find_source_device(struct pci_dev *parent,
* Note: AER must be enabled and supported by the device which must be
* checked in advance, e.g. with pcie_aer_is_native().
*/
-static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
+void pci_aer_unmask_internal_errors(struct pci_dev *dev)
{
int aer = dev->aer_cap;
u32 mask;
@@ -1111,119 +1111,25 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
mask &= ~PCI_ERR_COR_INTERNAL;
pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
}
+EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, "CXL");
-static bool is_cxl_mem_dev(struct pci_dev *dev)
-{
- /*
- * The capability, status, and control fields in Device 0,
- * Function 0 DVSEC control the CXL functionality of the
- * entire device (CXL 3.0, 8.1.3).
- */
- if (dev->devfn != PCI_DEVFN(0, 0))
- return false;
-
- /*
- * CXL Memory Devices must have the 502h class code set (CXL
- * 3.0, 8.1.12.1).
- */
- if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
- return false;
-
- return true;
-}
-
-static bool cxl_error_is_native(struct pci_dev *dev)
+bool cxl_error_is_native(struct pci_dev *dev)
{
struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
return (pcie_ports_native || host->native_aer);
}
+EXPORT_SYMBOL_NS_GPL(cxl_error_is_native, "CXL");
-static bool is_internal_error(struct aer_err_info *info)
+bool is_internal_error(struct aer_err_info *info)
{
if (info->severity == AER_CORRECTABLE)
return info->status & PCI_ERR_COR_INTERNAL;
return info->status & PCI_ERR_UNC_INTN;
}
-
-static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
-{
- struct aer_err_info *info = (struct aer_err_info *)data;
- const struct pci_error_handlers *err_handler;
-
- if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
- return 0;
-
- /* Protect dev->driver */
- device_lock(&dev->dev);
-
- err_handler = dev->driver ? dev->driver->err_handler : NULL;
- if (!err_handler)
- goto out;
-
- if (info->severity == AER_CORRECTABLE) {
- if (err_handler->cor_error_detected)
- err_handler->cor_error_detected(dev);
- } else if (err_handler->error_detected) {
- if (info->severity == AER_NONFATAL)
- err_handler->error_detected(dev, pci_channel_io_normal);
- else if (info->severity == AER_FATAL)
- err_handler->error_detected(dev, pci_channel_io_frozen);
- }
-out:
- device_unlock(&dev->dev);
- return 0;
-}
-
-static void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
-{
- /*
- * Internal errors of an RCEC indicate an AER error in an
- * RCH's downstream port. Check and handle them in the CXL.mem
- * device driver.
- */
- if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
- is_internal_error(info))
- pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
-}
-
-static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
-{
- bool *handles_cxl = data;
-
- if (!*handles_cxl)
- *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev);
-
- /* Non-zero terminates iteration */
- return *handles_cxl;
-}
-
-static bool handles_cxl_errors(struct pci_dev *rcec)
-{
- bool handles_cxl = false;
-
- if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC &&
- pcie_aer_is_native(rcec))
- pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl);
-
- return handles_cxl;
-}
-
-static void cxl_rch_enable_rcec(struct pci_dev *rcec)
-{
- if (!handles_cxl_errors(rcec))
- return;
-
- pci_aer_unmask_internal_errors(rcec);
- pci_info(rcec, "CXL: Internal errors unmasked");
-}
-
-#else
-static inline void cxl_rch_enable_rcec(struct pci_dev *dev) { }
-static inline void cxl_rch_handle_error(struct pci_dev *dev,
- struct aer_err_info *info) { }
-#endif
+EXPORT_SYMBOL_NS_GPL(is_internal_error, "CXL");
+#endif /* CONFIG_CXL_RAS */
/**
* pci_aer_handle_error - handle logging error into an event log
diff --git a/drivers/pci/pcie/rch_aer.c b/drivers/pci/pcie/rch_aer.c
new file mode 100644
index 000000000000..bfe071eebf67
--- /dev/null
+++ b/drivers/pci/pcie/rch_aer.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2025 AMD Corporation. All rights reserved. */
+
+#include <linux/pci.h>
+#include <linux/aer.h>
+#include <linux/bitfield.h>
+#include "../pci.h"
+
+static bool is_cxl_mem_dev(struct pci_dev *dev)
+{
+ /*
+ * The capability, status, and control fields in Device 0,
+ * Function 0 DVSEC control the CXL functionality of the
+ * entire device (CXL 3.0, 8.1.3).
+ */
+ if (dev->devfn != PCI_DEVFN(0, 0))
+ return false;
+
+ /*
+ * CXL Memory Devices must have the 502h class code set (CXL
+ * 3.0, 8.1.12.1).
+ */
+ if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
+ return false;
+
+ return true;
+}
+
+static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
+{
+ struct aer_err_info *info = (struct aer_err_info *)data;
+ const struct pci_error_handlers *err_handler;
+
+ if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
+ return 0;
+
+ /* Protect dev->driver */
+ device_lock(&dev->dev);
+
+ err_handler = dev->driver ? dev->driver->err_handler : NULL;
+ if (!err_handler)
+ goto out;
+
+ if (info->severity == AER_CORRECTABLE) {
+ if (err_handler->cor_error_detected)
+ err_handler->cor_error_detected(dev);
+ } else if (err_handler->error_detected) {
+ if (info->severity == AER_NONFATAL)
+ err_handler->error_detected(dev, pci_channel_io_normal);
+ else if (info->severity == AER_FATAL)
+ err_handler->error_detected(dev, pci_channel_io_frozen);
+ }
+out:
+ device_unlock(&dev->dev);
+ return 0;
+}
+
+void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
+{
+ /*
+ * Internal errors of an RCEC indicate an AER error in an
+ * RCH's downstream port. Check and handle them in the CXL.mem
+ * device driver.
+ */
+ if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
+ is_internal_error(info))
+ pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
+}
+
+static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
+{
+ bool *handles_cxl = data;
+
+ if (!*handles_cxl)
+ *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev);
+
+ /* Non-zero terminates iteration */
+ return *handles_cxl;
+}
+
+static bool handles_cxl_errors(struct pci_dev *rcec)
+{
+ bool handles_cxl = false;
+
+ if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC &&
+ pcie_aer_is_native(rcec))
+ pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl);
+
+ return handles_cxl;
+}
+
+void cxl_rch_enable_rcec(struct pci_dev *rcec)
+{
+ if (!handles_cxl_errors(rcec))
+ return;
+
+ pci_aer_unmask_internal_errors(rcec);
+ pci_info(rcec, "CXL: Internal errors unmasked");
+}
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 07/23] CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (5 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 06/23] CXL/AER: Introduce rch_aer.c into AER driver for handling CXL RCH errors Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-27 14:51 ` Lukas Wunner
2025-08-28 21:07 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 08/23] PCI/CXL: Introduce pcie_is_cxl() Terry Bowman
` (16 subsequent siblings)
23 siblings, 2 replies; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
The CXL DVSECs are currently defined in cxl/core/cxlpci.h. These are not
accessible to other subsystems.
Change DVSEC name formatting to follow the existing PCI format in
pci_regs.h. The current format uses CXL_DVSEC_XYZ. Change to be PCI_DVSEC_CXL_XYZ.
Update existing occurrences to match the name change.
Update the inline documentation to refer to latest CXL spec version.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
drivers/cxl/core/pci.c | 62 +++++++++++++++++------------------
drivers/cxl/core/regs.c | 12 +++----
drivers/cxl/cxlpci.h | 53 ------------------------------
drivers/cxl/pci.c | 2 +-
drivers/pci/pci.c | 18 +++++-----
include/uapi/linux/pci_regs.h | 60 ++++++++++++++++++++++++++++++---
6 files changed, 104 insertions(+), 103 deletions(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index a3aef78f903a..d677691f8a05 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -110,19 +110,19 @@ static int cxl_dvsec_mem_range_valid(struct cxl_dev_state *cxlds, int id)
int rc, i;
u32 temp;
- if (id > CXL_DVSEC_RANGE_MAX)
+ if (id > PCI_DVSEC_CXL_RANGE_MAX)
return -EINVAL;
/* Check MEM INFO VALID bit first, give up after 1s */
i = 1;
do {
rc = pci_read_config_dword(pdev,
- d + CXL_DVSEC_RANGE_SIZE_LOW(id),
+ d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id),
&temp);
if (rc)
return rc;
- valid = FIELD_GET(CXL_DVSEC_MEM_INFO_VALID, temp);
+ valid = FIELD_GET(PCI_DVSEC_CXL_MEM_INFO_VALID, temp);
if (valid)
break;
msleep(1000);
@@ -146,17 +146,17 @@ static int cxl_dvsec_mem_range_active(struct cxl_dev_state *cxlds, int id)
int rc, i;
u32 temp;
- if (id > CXL_DVSEC_RANGE_MAX)
+ if (id > PCI_DVSEC_CXL_RANGE_MAX)
return -EINVAL;
/* Check MEM ACTIVE bit, up to 60s timeout by default */
for (i = media_ready_timeout; i; i--) {
rc = pci_read_config_dword(
- pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(id), &temp);
+ pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id), &temp);
if (rc)
return rc;
- active = FIELD_GET(CXL_DVSEC_MEM_ACTIVE, temp);
+ active = FIELD_GET(PCI_DVSEC_CXL_MEM_ACTIVE, temp);
if (active)
break;
msleep(1000);
@@ -185,11 +185,11 @@ int cxl_await_media_ready(struct cxl_dev_state *cxlds)
u16 cap;
rc = pci_read_config_word(pdev,
- d + CXL_DVSEC_CAP_OFFSET, &cap);
+ d + PCI_DVSEC_CXL_CAP_OFFSET, &cap);
if (rc)
return rc;
- hdm_count = FIELD_GET(CXL_DVSEC_HDM_COUNT_MASK, cap);
+ hdm_count = FIELD_GET(PCI_DVSEC_CXL_HDM_COUNT_MASK, cap);
for (i = 0; i < hdm_count; i++) {
rc = cxl_dvsec_mem_range_valid(cxlds, i);
if (rc)
@@ -217,16 +217,16 @@ static int cxl_set_mem_enable(struct cxl_dev_state *cxlds, u16 val)
u16 ctrl;
int rc;
- rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
+ rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CTRL_OFFSET, &ctrl);
if (rc < 0)
return rc;
- if ((ctrl & CXL_DVSEC_MEM_ENABLE) == val)
+ if ((ctrl & PCI_DVSEC_CXL_MEM_ENABLE) == val)
return 1;
- ctrl &= ~CXL_DVSEC_MEM_ENABLE;
+ ctrl &= ~PCI_DVSEC_CXL_MEM_ENABLE;
ctrl |= val;
- rc = pci_write_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, ctrl);
+ rc = pci_write_config_word(pdev, d + PCI_DVSEC_CXL_CTRL_OFFSET, ctrl);
if (rc < 0)
return rc;
@@ -242,7 +242,7 @@ static int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds)
{
int rc;
- rc = cxl_set_mem_enable(cxlds, CXL_DVSEC_MEM_ENABLE);
+ rc = cxl_set_mem_enable(cxlds, PCI_DVSEC_CXL_MEM_ENABLE);
if (rc < 0)
return rc;
if (rc > 0)
@@ -304,11 +304,11 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
return -ENXIO;
}
- rc = pci_read_config_word(pdev, d + CXL_DVSEC_CAP_OFFSET, &cap);
+ rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CAP_OFFSET, &cap);
if (rc)
return rc;
- if (!(cap & CXL_DVSEC_MEM_CAPABLE)) {
+ if (!(cap & PCI_DVSEC_CXL_MEM_CAPABLE)) {
dev_dbg(dev, "Not MEM Capable\n");
return -ENXIO;
}
@@ -319,7 +319,7 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
* driver is for a spec defined class code which must be CXL.mem
* capable, there is no point in continuing to enable CXL.mem.
*/
- hdm_count = FIELD_GET(CXL_DVSEC_HDM_COUNT_MASK, cap);
+ hdm_count = FIELD_GET(PCI_DVSEC_CXL_HDM_COUNT_MASK, cap);
if (!hdm_count || hdm_count > 2)
return -EINVAL;
@@ -328,11 +328,11 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
* disabled, and they will remain moot after the HDM Decoder
* capability is enabled.
*/
- rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
+ rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CTRL_OFFSET, &ctrl);
if (rc)
return rc;
- info->mem_enabled = FIELD_GET(CXL_DVSEC_MEM_ENABLE, ctrl);
+ info->mem_enabled = FIELD_GET(PCI_DVSEC_CXL_MEM_ENABLE, ctrl);
if (!info->mem_enabled)
return 0;
@@ -345,35 +345,35 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
return rc;
rc = pci_read_config_dword(
- pdev, d + CXL_DVSEC_RANGE_SIZE_HIGH(i), &temp);
+ pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i), &temp);
if (rc)
return rc;
size = (u64)temp << 32;
rc = pci_read_config_dword(
- pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(i), &temp);
+ pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(i), &temp);
if (rc)
return rc;
- size |= temp & CXL_DVSEC_MEM_SIZE_LOW_MASK;
+ size |= temp & PCI_DVSEC_CXL_MEM_SIZE_LOW_MASK;
if (!size) {
continue;
}
rc = pci_read_config_dword(
- pdev, d + CXL_DVSEC_RANGE_BASE_HIGH(i), &temp);
+ pdev, d + PCI_DVSEC_CXL_RANGE_BASE_HIGH(i), &temp);
if (rc)
return rc;
base = (u64)temp << 32;
rc = pci_read_config_dword(
- pdev, d + CXL_DVSEC_RANGE_BASE_LOW(i), &temp);
+ pdev, d + PCI_DVSEC_CXL_RANGE_BASE_LOW(i), &temp);
if (rc)
return rc;
- base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;
+ base |= temp & PCI_DVSEC_CXL_MEM_BASE_LOW_MASK;
info->dvsec_range[ranges++] = (struct range) {
.start = base,
@@ -781,7 +781,7 @@ u16 cxl_gpf_get_dvsec(struct device *dev)
is_port = false;
dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
- is_port ? CXL_DVSEC_PORT_GPF : CXL_DVSEC_DEVICE_GPF);
+ is_port ? PCI_DVSEC_CXL_PORT_GPF : PCI_DVSEC_CXL_DEVICE_GPF);
if (!dvsec)
dev_warn(dev, "%s GPF DVSEC not present\n",
is_port ? "Port" : "Device");
@@ -797,14 +797,14 @@ static int update_gpf_port_dvsec(struct pci_dev *pdev, int dvsec, int phase)
switch (phase) {
case 1:
- offset = CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET;
- base = CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK;
- scale = CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK;
+ offset = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_CONTROL_OFFSET;
+ base = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_BASE_MASK;
+ scale = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_SCALE_MASK;
break;
case 2:
- offset = CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET;
- base = CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK;
- scale = CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK;
+ offset = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_CONTROL_OFFSET;
+ base = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_BASE_MASK;
+ scale = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_SCALE_MASK;
break;
default:
return -EINVAL;
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 5ca7b0eed568..fb70ffbba72d 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -271,10 +271,10 @@ EXPORT_SYMBOL_NS_GPL(cxl_map_device_regs, "CXL");
static bool cxl_decode_regblock(struct pci_dev *pdev, u32 reg_lo, u32 reg_hi,
struct cxl_register_map *map)
{
- u8 reg_type = FIELD_GET(CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK, reg_lo);
- int bar = FIELD_GET(CXL_DVSEC_REG_LOCATOR_BIR_MASK, reg_lo);
+ u8 reg_type = FIELD_GET(PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_ID_MASK, reg_lo);
+ int bar = FIELD_GET(PCI_DVSEC_CXL_REG_LOCATOR_BIR_MASK, reg_lo);
u64 offset = ((u64)reg_hi << 32) |
- (reg_lo & CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK);
+ (reg_lo & PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_OFF_LOW_MASK);
if (offset > pci_resource_len(pdev, bar)) {
dev_warn(&pdev->dev,
@@ -311,15 +311,15 @@ static int __cxl_find_regblock_instance(struct pci_dev *pdev, enum cxl_regloc_ty
};
regloc = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
- CXL_DVSEC_REG_LOCATOR);
+ PCI_DVSEC_CXL_REG_LOCATOR);
if (!regloc)
return -ENXIO;
pci_read_config_dword(pdev, regloc + PCI_DVSEC_HEADER1, ®loc_size);
regloc_size = FIELD_GET(PCI_DVSEC_HEADER1_LENGTH_MASK, regloc_size);
- regloc += CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET;
- regblocks = (regloc_size - CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET) / 8;
+ regloc += PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1_OFFSET;
+ regblocks = (regloc_size - PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1_OFFSET) / 8;
for (i = 0; i < regblocks; i++, regloc += 8) {
u32 reg_lo, reg_hi;
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 3959fa7e2ead..ad24d81e9eaa 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -7,59 +7,6 @@
#define CXL_MEMORY_PROGIF 0x10
-/*
- * See section 8.1 Configuration Space Registers in the CXL 2.0
- * Specification. Names are taken straight from the specification with "CXL" and
- * "DVSEC" redundancies removed. When obvious, abbreviations may be used.
- */
-#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
-
-/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
-#define CXL_DVSEC_PCIE_DEVICE 0
-#define CXL_DVSEC_CAP_OFFSET 0xA
-#define CXL_DVSEC_MEM_CAPABLE BIT(2)
-#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
-#define CXL_DVSEC_CTRL_OFFSET 0xC
-#define CXL_DVSEC_MEM_ENABLE BIT(2)
-#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
-#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
-#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
-#define CXL_DVSEC_MEM_ACTIVE BIT(1)
-#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
-#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
-#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
-#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
-
-#define CXL_DVSEC_RANGE_MAX 2
-
-/* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
-#define CXL_DVSEC_FUNCTION_MAP 2
-
-/* CXL 2.0 8.1.5: CXL 2.0 Extensions DVSEC for Ports */
-#define CXL_DVSEC_PORT_EXTENSIONS 3
-
-/* CXL 2.0 8.1.6: GPF DVSEC for CXL Port */
-#define CXL_DVSEC_PORT_GPF 4
-#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
-#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK GENMASK(3, 0)
-#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK GENMASK(11, 8)
-#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
-#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK GENMASK(3, 0)
-#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK GENMASK(11, 8)
-
-/* CXL 2.0 8.1.7: GPF DVSEC for CXL Device */
-#define CXL_DVSEC_DEVICE_GPF 5
-
-/* CXL 2.0 8.1.8: PCIe DVSEC for Flex Bus Port */
-#define CXL_DVSEC_PCIE_FLEXBUS_PORT 7
-
-/* CXL 2.0 8.1.9: Register Locator DVSEC */
-#define CXL_DVSEC_REG_LOCATOR 8
-#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
-#define CXL_DVSEC_REG_LOCATOR_BIR_MASK GENMASK(2, 0)
-#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
-#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
-
/*
* NOTE: Currently all the functions which are enabled for CXL require their
* vectors to be in the first 16. Use this as the default max.
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index bd100ac31672..bd95be1f3d5c 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -933,7 +933,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
cxlds->rcd = is_cxl_restricted(pdev);
cxlds->serial = pci_get_dsn(pdev);
cxlds->cxl_dvsec = pci_find_dvsec_capability(
- pdev, PCI_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
+ pdev, PCI_VENDOR_ID_CXL, PCI_DVSEC_CXL_DEVICE);
if (!cxlds->cxl_dvsec)
dev_warn(&pdev->dev,
"Device DVSEC not present, skip CXL.mem init\n");
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9e42090fb108..d775ed37a79b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5031,7 +5031,7 @@ static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
static u16 cxl_port_dvsec(struct pci_dev *dev)
{
return pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
- PCI_DVSEC_CXL_PORT);
+ PCI_DVSEC_CXL_PORT_EXT);
}
static bool cxl_sbr_masked(struct pci_dev *dev)
@@ -5043,7 +5043,9 @@ static bool cxl_sbr_masked(struct pci_dev *dev)
if (!dvsec)
return false;
- rc = pci_read_config_word(dev, dvsec + PCI_DVSEC_CXL_PORT_CTL, ®);
+ rc = pci_read_config_word(dev,
+ dvsec + PCI_DVSEC_CXL_PORT_EXT_CTL_OFFSET,
+ ®);
if (rc || PCI_POSSIBLE_ERROR(reg))
return false;
@@ -5052,7 +5054,7 @@ static bool cxl_sbr_masked(struct pci_dev *dev)
* bit in Bridge Control has no effect. When 1, the Port generates
* hot reset when the SBR bit is set to 1.
*/
- if (reg & PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR)
+ if (reg & PCI_DVSEC_CXL_PORT_EXT_CTL_UNMASK_SBR)
return false;
return true;
@@ -5097,22 +5099,22 @@ static int cxl_reset_bus_function(struct pci_dev *dev, bool probe)
if (probe)
return 0;
- rc = pci_read_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL, ®);
+ rc = pci_read_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_EXT_CTL_OFFSET, ®);
if (rc)
return -ENOTTY;
- if (reg & PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR) {
+ if (reg & PCI_DVSEC_CXL_PORT_EXT_CTL_UNMASK_SBR) {
val = reg;
} else {
- val = reg | PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR;
- pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL,
+ val = reg | PCI_DVSEC_CXL_PORT_EXT_CTL_UNMASK_SBR;
+ pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_EXT_CTL_OFFSET,
val);
}
rc = pci_reset_bus_function(dev, probe);
if (reg != val)
- pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL,
+ pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_EXT_CTL_OFFSET,
reg);
return rc;
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index a3a3e942dedf..b03244d55aea 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1225,9 +1225,61 @@
/* Deprecated old name, replaced with PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE */
#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE
-/* Compute Express Link (CXL r3.1, sec 8.1.5) */
-#define PCI_DVSEC_CXL_PORT 3
-#define PCI_DVSEC_CXL_PORT_CTL 0x0c
-#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
+/* Compute Express Link (CXL r3.2, sec 8.1)
+ *
+ * Note that CXL DVSEC id 3 and 7 to be ignored when the CXL link state
+ * is "disconnected" (CXL r3.2, sec 9.12.3). Re-enumerate these
+ * registers on downstream link-up events.
+ */
+
+#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
+
+/* CXL 3.2 8.1.3: PCIe DVSEC for CXL Device */
+#define PCI_DVSEC_CXL_DEVICE 0
+#define PCI_DVSEC_CXL_CAP_OFFSET 0xA
+#define PCI_DVSEC_CXL_MEM_CAPABLE BIT(2)
+#define PCI_DVSEC_CXL_HDM_COUNT_MASK GENMASK(5, 4)
+#define PCI_DVSEC_CXL_CTRL_OFFSET 0xC
+#define PCI_DVSEC_CXL_MEM_ENABLE BIT(2)
+#define PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
+#define PCI_DVSEC_CXL_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
+#define PCI_DVSEC_CXL_MEM_INFO_VALID BIT(0)
+#define PCI_DVSEC_CXL_MEM_ACTIVE BIT(1)
+#define PCI_DVSEC_CXL_MEM_SIZE_LOW_MASK GENMASK(31, 28)
+#define PCI_DVSEC_CXL_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
+#define PCI_DVSEC_CXL_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
+#define PCI_DVSEC_CXL_MEM_BASE_LOW_MASK GENMASK(31, 28)
+
+#define PCI_DVSEC_CXL_RANGE_MAX 2
+
+/* CXL 3.2 8.1.4: Non-CXL Function Map DVSEC */
+#define PCI_DVSEC_CXL_FUNCTION_MAP 2
+
+/* CXL 3.2 8.1.5: Extensions DVSEC for Ports */
+#define PCI_DVSEC_CXL_PORT_EXT 3
+#define PCI_DVSEC_CXL_PORT_EXT_CTL_OFFSET 0x0c
+#define PCI_DVSEC_CXL_PORT_EXT_CTL_UNMASK_SBR 0x00000001
+
+/* CXL 3.2 8.1.6: GPF DVSEC for CXL Port */
+#define PCI_DVSEC_CXL_PORT_GPF 4
+#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
+#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_BASE_MASK GENMASK(3, 0)
+#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_SCALE_MASK GENMASK(11, 8)
+#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
+#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_BASE_MASK GENMASK(3, 0)
+#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_SCALE_MASK GENMASK(11, 8)
+
+/* CXL 3.2 8.1.7: GPF DVSEC for CXL Device */
+#define PCI_DVSEC_CXL_DEVICE_GPF 5
+
+/* CXL 3.2 8.1.8: PCIe DVSEC for Flex Bus Port */
+#define PCI_DVSEC_CXL_FLEXBUS_PORT 7
+
+/* CXL 3.2 8.1.9: Register Locator DVSEC */
+#define PCI_DVSEC_CXL_REG_LOCATOR 8
+#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1_OFFSET 0xC
+#define PCI_DVSEC_CXL_REG_LOCATOR_BIR_MASK GENMASK(2, 0)
+#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
+#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
#endif /* LINUX_PCI_REGS_H */
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 08/23] PCI/CXL: Introduce pcie_is_cxl()
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (6 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 07/23] CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-28 8:18 ` Alejandro Lucero Palau
2025-08-27 1:35 ` [PATCH v11 09/23] PCI/AER: Report CXL or PCIe bus error type in trace logging Terry Bowman
` (15 subsequent siblings)
23 siblings, 1 reply; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
CXL and AER drivers need the ability to identify CXL devices.
Introduce set_pcie_cxl() with logic checking for CXL.mem or CXL.cache
status in the CXL Flexbus DVSEC status register. The CXL Flexbus DVSEC
presence is used because it is required for all the CXL PCIe devices.[1]
Add boolean 'struct pci_dev::is_cxl' with the purpose to cache the CXL
CXL.cache and CXl.mem status.
In the case the device is an EP or USP, call set_pcie_cxl() on behalf of
the parent downstream device. This will make certain the correct state
is cached.
Add function pcie_is_cxl() to return 'struct pci_dev::is_cxl'.
[1] CXL 3.1 Spec, 8.1.1 PCIe Designated Vendor-Specific Extended
Capability (DVSEC) ID Assignment, Table 8-2
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
Changes in v10->v11:
- Amended set_pcie_cxl() to check for Upstream Port's and EP's parent
downstream port by calling set_pcie_cxl(). (Dan)
- Retitle patch: 'Add' -> 'Introduce'
- Add check for CXL.mem and CXL.cache (Alejandro, Dan)
---
drivers/pci/probe.c | 25 +++++++++++++++++++++++++
include/linux/pci.h | 6 ++++++
include/uapi/linux/pci_regs.h | 3 +++
3 files changed, 34 insertions(+)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 4b8693ec9e4c..b08cd0346136 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1691,6 +1691,29 @@ static void set_pcie_thunderbolt(struct pci_dev *dev)
dev->is_thunderbolt = 1;
}
+static void set_pcie_cxl(struct pci_dev *dev)
+{
+ struct pci_dev *parent;
+ u16 dvsec = pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
+ PCI_DVSEC_CXL_FLEXBUS_PORT);
+ if (dvsec) {
+ u16 cap;
+
+ pci_read_config_word(dev, dvsec + PCI_DVSEC_CXL_FLEXBUS_STATUS_OFFSET, &cap);
+
+ dev->is_cxl = FIELD_GET(PCI_DVSEC_CXL_FLEXBUS_STATUS_CACHE_MASK, cap) ||
+ FIELD_GET(PCI_DVSEC_CXL_FLEXBUS_STATUS_MEM_MASK, cap);
+ }
+
+ if (!pci_is_pcie(dev) ||
+ !(pci_pcie_type(dev) == PCI_EXP_TYPE_ENDPOINT ||
+ pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM))
+ return;
+
+ parent = pci_upstream_bridge(dev);
+ set_pcie_cxl(parent);
+}
+
static void set_pcie_untrusted(struct pci_dev *dev)
{
struct pci_dev *parent = pci_upstream_bridge(dev);
@@ -2021,6 +2044,8 @@ int pci_setup_device(struct pci_dev *dev)
/* Need to have dev->cfg_size ready */
set_pcie_thunderbolt(dev);
+ set_pcie_cxl(dev);
+
set_pcie_untrusted(dev);
if (pci_is_pcie(dev))
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 05e68f35f392..79878243b681 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -453,6 +453,7 @@ struct pci_dev {
unsigned int is_hotplug_bridge:1;
unsigned int shpc_managed:1; /* SHPC owned by shpchp */
unsigned int is_thunderbolt:1; /* Thunderbolt controller */
+ unsigned int is_cxl:1; /* Compute Express Link (CXL) */
/*
* Devices marked being untrusted are the ones that can potentially
* execute DMA attacks and similar. They are typically connected
@@ -744,6 +745,11 @@ static inline bool pci_is_vga(struct pci_dev *pdev)
return false;
}
+static inline bool pcie_is_cxl(struct pci_dev *pci_dev)
+{
+ return pci_dev->is_cxl;
+}
+
#define for_each_pci_bridge(dev, bus) \
list_for_each_entry(dev, &bus->devices, bus_list) \
if (!pci_is_bridge(dev)) {} else
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index b03244d55aea..252c06402b13 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1274,6 +1274,9 @@
/* CXL 3.2 8.1.8: PCIe DVSEC for Flex Bus Port */
#define PCI_DVSEC_CXL_FLEXBUS_PORT 7
+#define PCI_DVSEC_CXL_FLEXBUS_STATUS_OFFSET 0xE
+#define PCI_DVSEC_CXL_FLEXBUS_STATUS_CACHE_MASK BIT(0)
+#define PCI_DVSEC_CXL_FLEXBUS_STATUS_MEM_MASK BIT(2)
/* CXL 3.2 8.1.9: Register Locator DVSEC */
#define PCI_DVSEC_CXL_REG_LOCATOR 8
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 09/23] PCI/AER: Report CXL or PCIe bus error type in trace logging
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (7 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 08/23] PCI/CXL: Introduce pcie_is_cxl() Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-27 7:37 ` Lukas Wunner
2025-08-27 1:35 ` [PATCH v11 10/23] CXL/AER: Update PCI class code check to use FIELD_GET() Terry Bowman
` (14 subsequent siblings)
23 siblings, 1 reply; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
The AER service driver and aer_event tracing currently log 'PCIe Bus Type'
for all errors. Update the driver and aer_event tracing to log 'CXL Bus
Type' for CXL device errors.
This requires the AER can identify and distinguish between PCIe errors and
CXL errors.
Introduce boolean 'is_cxl' to 'struct aer_err_info'. Add assignment in
aer_get_device_error_info() and pci_print_aer().
Update the aer_event trace routine to accept a bus type string parameter.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
Changes in v10->v11:
- Remove duplicate call to trace_aer_event() (Shiju)
- Added Dan William's and Dave Jiang's reviewed-by
---
drivers/pci/pci.h | 6 ++++++
drivers/pci/pcie/aer.c | 18 ++++++++++++------
include/ras/ras_event.h | 9 ++++++---
3 files changed, 24 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c8a0c0ec0073..417a088d815f 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -608,6 +608,7 @@ struct aer_err_info {
int ratelimit_print[AER_MAX_MULTI_ERR_DEVICES];
int error_dev_num;
const char *level; /* printk level */
+ bool is_cxl;
unsigned int id:16;
@@ -628,6 +629,11 @@ struct aer_err_info {
int aer_get_device_error_info(struct aer_err_info *info, int i);
void aer_print_error(struct aer_err_info *info, int i);
+static inline const char *aer_err_bus(struct aer_err_info *info)
+{
+ return info->is_cxl ? "CXL" : "PCIe";
+}
+
int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
unsigned int tlp_len, bool flit,
struct pcie_tlp_log *log);
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 29de7ee861f7..1b5f5b0cdc4f 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -837,6 +837,7 @@ void aer_print_error(struct aer_err_info *info, int i)
struct pci_dev *dev;
int layer, agent, id;
const char *level = info->level;
+ const char *bus_type = aer_err_bus(info);
if (WARN_ON_ONCE(i >= AER_MAX_MULTI_ERR_DEVICES))
return;
@@ -845,23 +846,23 @@ void aer_print_error(struct aer_err_info *info, int i)
id = pci_dev_id(dev);
pci_dev_aer_stats_incr(dev, info);
- trace_aer_event(pci_name(dev), (info->status & ~info->mask),
+ trace_aer_event(pci_name(dev), bus_type, (info->status & ~info->mask),
info->severity, info->tlp_header_valid, &info->tlp);
if (!info->ratelimit_print[i])
return;
if (!info->status) {
- pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
- aer_error_severity_string[info->severity]);
+ pci_err(dev, "%s Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
+ bus_type, aer_error_severity_string[info->severity]);
goto out;
}
layer = AER_GET_LAYER_ERROR(info->severity, info->status);
agent = AER_GET_AGENT(info->severity, info->status);
- aer_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
- aer_error_severity_string[info->severity],
+ aer_printk(level, dev, "%s Bus Error: severity=%s, type=%s, (%s)\n",
+ bus_type, aer_error_severity_string[info->severity],
aer_error_layer[layer], aer_agent_string[agent]);
aer_printk(level, dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
@@ -895,6 +896,7 @@ EXPORT_SYMBOL_GPL(cper_severity_to_aer);
void pci_print_aer(struct pci_dev *dev, int aer_severity,
struct aer_capability_regs *aer)
{
+ const char *bus_type;
int layer, agent, tlp_header_valid = 0;
u32 status, mask;
struct aer_err_info info = {
@@ -915,9 +917,12 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
info.status = status;
info.mask = mask;
+ info.is_cxl = pcie_is_cxl(dev);
+
+ bus_type = aer_err_bus(&info);
pci_dev_aer_stats_incr(dev, &info);
- trace_aer_event(pci_name(dev), (status & ~mask),
+ trace_aer_event(pci_name(dev), bus_type, (status & ~mask),
aer_severity, tlp_header_valid, &aer->header_log);
if (!aer_ratelimit(dev, info.severity))
@@ -1277,6 +1282,7 @@ int aer_get_device_error_info(struct aer_err_info *info, int i)
/* Must reset in this function */
info->status = 0;
info->tlp_header_valid = 0;
+ info->is_cxl = pcie_is_cxl(dev);
/* The device might not support AER */
if (!aer)
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 14c9f943d53f..080829d59c36 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -297,15 +297,17 @@ TRACE_EVENT(non_standard_event,
TRACE_EVENT(aer_event,
TP_PROTO(const char *dev_name,
+ const char *bus_type,
const u32 status,
const u8 severity,
const u8 tlp_header_valid,
struct pcie_tlp_log *tlp),
- TP_ARGS(dev_name, status, severity, tlp_header_valid, tlp),
+ TP_ARGS(dev_name, bus_type, status, severity, tlp_header_valid, tlp),
TP_STRUCT__entry(
__string( dev_name, dev_name )
+ __string( bus_type, bus_type )
__field( u32, status )
__field( u8, severity )
__field( u8, tlp_header_valid)
@@ -314,6 +316,7 @@ TRACE_EVENT(aer_event,
TP_fast_assign(
__assign_str(dev_name);
+ __assign_str(bus_type);
__entry->status = status;
__entry->severity = severity;
__entry->tlp_header_valid = tlp_header_valid;
@@ -325,8 +328,8 @@ TRACE_EVENT(aer_event,
}
),
- TP_printk("%s PCIe Bus Error: severity=%s, %s, TLP Header=%s\n",
- __get_str(dev_name),
+ TP_printk("%s %s Bus Error: severity=%s, %s, TLP Header=%s\n",
+ __get_str(dev_name), __get_str(bus_type),
__entry->severity == AER_CORRECTABLE ? "Corrected" :
__entry->severity == AER_FATAL ?
"Fatal" : "Uncorrected, non-fatal",
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 10/23] CXL/AER: Update PCI class code check to use FIELD_GET()
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (8 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 09/23] PCI/AER: Report CXL or PCIe bus error type in trace logging Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-29 16:03 ` Jonathan Cameron
2025-08-27 1:35 ` [PATCH v11 11/23] cxl/pci: Update RAS handler interfaces to also support CXL Ports Terry Bowman
` (13 subsequent siblings)
23 siblings, 1 reply; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
Update the AER driver's is_cxl_mem_dev() to use FIELD_GET() while checking
for a CXL Endpoint class code.
Introduce a genmask bitmask for checking PCI class codes and locate in
include/uapi/linux/pci_regs.h.
Update the function documentation to reference the latest CXL
specification.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v10->v11:
- Add #include <linux/bitfield.h> to cxl_ras.c
- Removed line wrapping at "(CXL 3.2, 8.1.12.1)".
---
drivers/pci/pcie/aer.c | 1 +
drivers/pci/pcie/rch_aer.c | 6 +++---
include/uapi/linux/pci_regs.h | 2 ++
3 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 1b5f5b0cdc4f..ed1de9256898 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -30,6 +30,7 @@
#include <linux/kfifo.h>
#include <linux/ratelimit.h>
#include <linux/slab.h>
+#include <linux/bitfield.h>
#include <acpi/apei.h>
#include <acpi/ghes.h>
#include <ras/ras_event.h>
diff --git a/drivers/pci/pcie/rch_aer.c b/drivers/pci/pcie/rch_aer.c
index bfe071eebf67..c3e2d4cbe8cc 100644
--- a/drivers/pci/pcie/rch_aer.c
+++ b/drivers/pci/pcie/rch_aer.c
@@ -17,10 +17,10 @@ static bool is_cxl_mem_dev(struct pci_dev *dev)
return false;
/*
- * CXL Memory Devices must have the 502h class code set (CXL
- * 3.0, 8.1.12.1).
+ * CXL Memory Devices must have the 502h class code set
+ * (CXL 3.2, 8.1.12.1).
*/
- if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
+ if (FIELD_GET(PCI_CLASS_CODE_MASK, dev->class) != PCI_CLASS_MEMORY_CXL)
return false;
return true;
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 252c06402b13..c7b635f6cf36 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -73,6 +73,8 @@
#define PCI_CLASS_PROG 0x09 /* Reg. Level Programming Interface */
#define PCI_CLASS_DEVICE 0x0a /* Device class */
+#define PCI_CLASS_CODE_MASK __GENMASK(23, 8)
+
#define PCI_CACHE_LINE_SIZE 0x0c /* 8 bits */
#define PCI_LATENCY_TIMER 0x0d /* 8 bits */
#define PCI_HEADER_TYPE 0x0e /* 8 bits */
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 11/23] cxl/pci: Update RAS handler interfaces to also support CXL Ports
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (9 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 10/23] CXL/AER: Update PCI class code check to use FIELD_GET() Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-27 1:35 ` [PATCH v11 12/23] cxl/pci: Log message if RAS registers are unmapped Terry Bowman
` (12 subsequent siblings)
23 siblings, 0 replies; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
CXL PCIe Port Protocol Error handling support will be added to the
CXL drivers in the future. In preparation, rename the existing
interfaces to support handling all CXL PCIe Port Protocol Errors.
The driver's RAS support functions currently rely on a 'struct
cxl_dev_state' type parameter, which is not available for CXL Port
devices. However, since the same CXL RAS capability structure is
needed across most CXL components and devices, a common handling
approach should be adopted.
To accommodate this, update the __cxl_handle_cor_ras() and
__cxl_handle_ras() functions to use a `struct device` instead of
`struct cxl_dev_state`.
No functional changes are introduced.
[1] CXL 3.1 Spec, 8.2.4 CXL.cache and CXL.mem Registers
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
Changes in v10->v11:
- None
---
drivers/cxl/core/ras.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index c9f2f0335bfd..f65557e7bfa6 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -126,8 +126,8 @@ void cxl_ras_exit(void)
cancel_work_sync(&cxl_cper_prot_err_work);
}
-static bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
-static void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
+static bool cxl_handle_ras(struct device *dev, void __iomem *ras_base);
+static void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base);
#ifdef CONFIG_CXL_RCH_RAS
static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
@@ -237,9 +237,9 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
pci_print_aer(pdev, severity, &aer_regs);
if (severity == AER_CORRECTABLE)
- cxl_handle_cor_ras(cxlds, dport->regs.ras);
+ cxl_handle_cor_ras(&cxlds->cxlmd->dev, dport->regs.ras);
else
- cxl_handle_ras(cxlds, dport->regs.ras);
+ cxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras);
}
#else
static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
@@ -281,7 +281,7 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
}
EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
-static void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
+static void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
{
void __iomem *addr;
u32 status;
@@ -293,7 +293,7 @@ static void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_ba
status = readl(addr);
if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
- trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
+ trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);
}
}
@@ -318,7 +318,7 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
* Log the state of the RAS status registers and prepare them to log the
* next error status. Return 1 if reset needed.
*/
-static bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
+static bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
{
u32 hl[CXL_HEADERLOG_SIZE_U32];
void __iomem *addr;
@@ -345,7 +345,7 @@ static bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
}
header_log_copy(ras_base, hl);
- trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe, hl);
+ trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);
writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
return true;
@@ -367,7 +367,7 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
if (cxlds->rcd)
cxl_handle_rdport_errors(cxlds);
- cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
+ cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->regs.ras);
}
}
EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
@@ -396,7 +396,7 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
* chance the situation is recoverable dump the status of the RAS
* capability registers and bounce the active state of the memdev.
*/
- ue = cxl_handle_ras(cxlds, cxlds->regs.ras);
+ ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->regs.ras);
}
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 12/23] cxl/pci: Log message if RAS registers are unmapped
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (10 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 11/23] cxl/pci: Update RAS handler interfaces to also support CXL Ports Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-27 1:35 ` [PATCH v11 13/23] cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports Terry Bowman
` (11 subsequent siblings)
23 siblings, 0 replies; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
The CXL RAS handlers do not currently log if the RAS registers are
unmapped. This is needed in order to help debug CXL error handling. Update
the CXL driver to log a warning message if the RAS register block is
unmapped during RAS error handling.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
Changes v10->v11:
- Added Dave Jiang review-by
---
drivers/cxl/core/ras.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index f65557e7bfa6..3454cf1a118d 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -286,8 +286,10 @@ static void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
void __iomem *addr;
u32 status;
- if (!ras_base)
+ if (!ras_base) {
+ dev_warn_once(dev, "CXL RAS register block is not mapped");
return;
+ }
addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
status = readl(addr);
@@ -325,8 +327,10 @@ static bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
u32 status;
u32 fe;
- if (!ras_base)
+ if (!ras_base) {
+ dev_warn_once(dev, "CXL RAS register block is not mapped");
return false;
+ }
addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
status = readl(addr);
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 13/23] cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (11 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 12/23] cxl/pci: Log message if RAS registers are unmapped Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-27 11:55 ` Shiju Jose
2025-08-27 1:35 ` [PATCH v11 14/23] cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors Terry Bowman
` (10 subsequent siblings)
23 siblings, 1 reply; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
CXL currently has separate trace routines for CXL Port errors and CXL
Endpoint errors. This is inconvenient for the user because they must enable
2 sets of trace routines. Make updates to the trace logging such that a
single trace routine logs both CXL Endpoint and CXL Port protocol errors.
Keep the trace log fields 'memdev' and 'host'. While these are not accurate
for non-Endpoints the fields will remain as-is to prevent breaking
userspace RAS trace consumers.
Add serial number parameter to the trace logging. This is used for EPs
and 0 is provided for CXL port devices without a serial number.
Leave the correctable and uncorrectable trace routines' TP_STRUCT__entry()
unchanged with respect to member data types and order.
Below is output of correctable and uncorrectable protocol error logging.
CXL Root Port and CXL Endpoint examples are included below.
Root Port:
cxl_aer_correctable_error: memdev=0000:0c:00.0 host=pci0000:0c serial: 0 status='CRC Threshold Hit'
cxl_aer_uncorrectable_error: memdev=0000:0c:00.0 host=pci0000:0c serial: 0 status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
Endpoint:
cxl_aer_correctable_error: memdev=mem3 host=0000:0f:00.0 serial=0 status='CRC Threshold Hit'
cxl_aer_uncorrectable_error: memdev=mem3 host=0000:0f:00.0 serial: 0 status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v10->v11:
- Updated CE and UCE trace routines to maintain consistent TP_Struct ABI
and unchanged TP_printk() logging.
---
drivers/cxl/core/ras.c | 35 +++++++++++----------
drivers/cxl/core/trace.h | 68 +++++++---------------------------------
2 files changed, 30 insertions(+), 73 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 3454cf1a118d..fda3b0a64dab 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -13,7 +13,7 @@ static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,
{
u32 status = ras_cap.cor_status & ~ras_cap.cor_mask;
- trace_cxl_port_aer_correctable_error(&pdev->dev, status);
+ trace_cxl_aer_correctable_error(&pdev->dev, status, 0);
}
static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,
@@ -28,8 +28,8 @@ static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,
else
fe = status;
- trace_cxl_port_aer_uncorrectable_error(&pdev->dev, status, fe,
- ras_cap.header_log);
+ trace_cxl_aer_uncorrectable_error(&pdev->dev, status, fe,
+ ras_cap.header_log, 0);
}
static void cxl_cper_trace_corr_prot_err(struct cxl_memdev *cxlmd,
@@ -37,7 +37,8 @@ static void cxl_cper_trace_corr_prot_err(struct cxl_memdev *cxlmd,
{
u32 status = ras_cap.cor_status & ~ras_cap.cor_mask;
- trace_cxl_aer_correctable_error(cxlmd, status);
+ trace_cxl_aer_correctable_error(&cxlmd->dev, cxlmd->cxlds->serial,
+ status);
}
static void
@@ -45,6 +46,7 @@ cxl_cper_trace_uncorr_prot_err(struct cxl_memdev *cxlmd,
struct cxl_ras_capability_regs ras_cap)
{
u32 status = ras_cap.uncor_status & ~ras_cap.uncor_mask;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
u32 fe;
if (hweight32(status) > 1)
@@ -53,8 +55,9 @@ cxl_cper_trace_uncorr_prot_err(struct cxl_memdev *cxlmd,
else
fe = status;
- trace_cxl_aer_uncorrectable_error(cxlmd, status, fe,
- ras_cap.header_log);
+ trace_cxl_aer_uncorrectable_error(&cxlmd->dev, status, fe,
+ ras_cap.header_log,
+ cxlds->serial);
}
static int match_memdev_by_parent(struct device *dev, const void *uport)
@@ -126,8 +129,8 @@ void cxl_ras_exit(void)
cancel_work_sync(&cxl_cper_prot_err_work);
}
-static bool cxl_handle_ras(struct device *dev, void __iomem *ras_base);
-static void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base);
+static bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);
+static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base);
#ifdef CONFIG_CXL_RCH_RAS
static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
@@ -237,9 +240,9 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
pci_print_aer(pdev, severity, &aer_regs);
if (severity == AER_CORRECTABLE)
- cxl_handle_cor_ras(&cxlds->cxlmd->dev, dport->regs.ras);
+ cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->serial, dport->regs.ras);
else
- cxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras);
+ cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->serial, dport->regs.ras);
}
#else
static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
@@ -281,7 +284,7 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
}
EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
-static void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
+static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
{
void __iomem *addr;
u32 status;
@@ -295,7 +298,7 @@ static void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
status = readl(addr);
if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
- trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);
+ trace_cxl_aer_correctable_error(dev, status, serial);
}
}
@@ -320,7 +323,7 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
* Log the state of the RAS status registers and prepare them to log the
* next error status. Return 1 if reset needed.
*/
-static bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
+static bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
{
u32 hl[CXL_HEADERLOG_SIZE_U32];
void __iomem *addr;
@@ -349,7 +352,7 @@ static bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
}
header_log_copy(ras_base, hl);
- trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);
+ trace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);
writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
return true;
@@ -371,7 +374,7 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
if (cxlds->rcd)
cxl_handle_rdport_errors(cxlds);
- cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->regs.ras);
+ cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->serial, cxlds->regs.ras);
}
}
EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
@@ -400,7 +403,7 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
* chance the situation is recoverable dump the status of the RAS
* capability registers and bounce the active state of the memdev.
*/
- ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->regs.ras);
+ ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->serial, cxlds->regs.ras);
}
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index a53ec4798b12..60b49beb5e3f 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -48,40 +48,13 @@
{ CXL_RAS_UC_IDE_RX_ERR, "IDE Rx Error" } \
)
-TRACE_EVENT(cxl_port_aer_uncorrectable_error,
- TP_PROTO(struct device *dev, u32 status, u32 fe, u32 *hl),
- TP_ARGS(dev, status, fe, hl),
- TP_STRUCT__entry(
- __string(device, dev_name(dev))
- __string(host, dev_name(dev->parent))
- __field(u32, status)
- __field(u32, first_error)
- __array(u32, header_log, CXL_HEADERLOG_SIZE_U32)
- ),
- TP_fast_assign(
- __assign_str(device);
- __assign_str(host);
- __entry->status = status;
- __entry->first_error = fe;
- /*
- * Embed the 512B headerlog data for user app retrieval and
- * parsing, but no need to print this in the trace buffer.
- */
- memcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);
- ),
- TP_printk("device=%s host=%s status: '%s' first_error: '%s'",
- __get_str(device), __get_str(host),
- show_uc_errs(__entry->status),
- show_uc_errs(__entry->first_error)
- )
-);
-
TRACE_EVENT(cxl_aer_uncorrectable_error,
- TP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32 *hl),
- TP_ARGS(cxlmd, status, fe, hl),
+ TP_PROTO(const struct device *cxlmd, u32 status, u32 fe, u32 *hl,
+ u64 serial),
+ TP_ARGS(cxlmd, status, fe, hl, serial),
TP_STRUCT__entry(
- __string(memdev, dev_name(&cxlmd->dev))
- __string(host, dev_name(cxlmd->dev.parent))
+ __string(memdev, dev_name(cxlmd))
+ __string(host, dev_name(cxlmd->parent))
__field(u64, serial)
__field(u32, status)
__field(u32, first_error)
@@ -90,7 +63,7 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
TP_fast_assign(
__assign_str(memdev);
__assign_str(host);
- __entry->serial = cxlmd->cxlds->serial;
+ __entry->serial = serial;
__entry->status = status;
__entry->first_error = fe;
/*
@@ -124,38 +97,19 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
{ CXL_RAS_CE_PHYS_LAYER_ERR, "Received Error From Physical Layer" } \
)
-TRACE_EVENT(cxl_port_aer_correctable_error,
- TP_PROTO(struct device *dev, u32 status),
- TP_ARGS(dev, status),
- TP_STRUCT__entry(
- __string(device, dev_name(dev))
- __string(host, dev_name(dev->parent))
- __field(u32, status)
- ),
- TP_fast_assign(
- __assign_str(device);
- __assign_str(host);
- __entry->status = status;
- ),
- TP_printk("device=%s host=%s status='%s'",
- __get_str(device), __get_str(host),
- show_ce_errs(__entry->status)
- )
-);
-
TRACE_EVENT(cxl_aer_correctable_error,
- TP_PROTO(const struct cxl_memdev *cxlmd, u32 status),
- TP_ARGS(cxlmd, status),
+ TP_PROTO(const struct device *cxlmd, u32 status, u64 serial),
+ TP_ARGS(cxlmd, status, serial),
TP_STRUCT__entry(
- __string(memdev, dev_name(&cxlmd->dev))
- __string(host, dev_name(cxlmd->dev.parent))
+ __string(memdev, dev_name(cxlmd))
+ __string(host, dev_name(cxlmd->parent))
__field(u64, serial)
__field(u32, status)
),
TP_fast_assign(
__assign_str(memdev);
__assign_str(host);
- __entry->serial = cxlmd->cxlds->serial;
+ __entry->serial = serial;
__entry->status = status;
),
TP_printk("memdev=%s host=%s serial=%lld: status: '%s'",
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 14/23] cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (12 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 13/23] cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-27 1:35 ` [PATCH v11 15/23] cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
` (9 subsequent siblings)
23 siblings, 0 replies; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
Update cxl_handle_cor_ras() to exit early in the case there is no RAS
errors detected after applying the status mask. This change will make
the correctable handler's implementation consistent with the uncorrectable
handler, cxl_handle_ras().
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
Changes v10->v11:
- Added Dave Jiang and Jonathan Cameron's review-by
- Changes moved to core/ras.c
---
drivers/cxl/core/ras.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index fda3b0a64dab..69559043b772 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -296,10 +296,11 @@ static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras
addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
status = readl(addr);
- if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
- writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
- trace_cxl_aer_correctable_error(dev, status, serial);
- }
+ if (!(status & CXL_RAS_CORRECTABLE_STATUS_MASK))
+ return;
+ writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
+
+ trace_cxl_aer_correctable_error(dev, status, serial);
}
/* CXL spec rev3.0 8.2.4.16.1 */
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 15/23] cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (13 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 14/23] cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-28 23:05 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 16/23] cxl/pci: Introduce CXL Endpoint protocol error handlers Terry Bowman
` (8 subsequent siblings)
23 siblings, 1 reply; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
CXL Endpoint (EP) Ports may include Root Ports (RP) or Downstream Switch
Ports (DSP). CXL RPs and DSPs contain RAS registers that require memory
mapping to enable RAS logging. This initialization is currently missing and
must be added for CXL RPs and DSPs.
Update cxl_dport_init_ras_reporting() to support RP and DSP RAS mapping.
Add alongside the existing Restricted CXL Host Downstream Port RAS mapping.
Update cxl_endpoint_port_probe() to invoke cxl_dport_init_ras_reporting().
This will initiate the RAS mapping for CXL RPs and DSPs when each CXL EP is
created and added to the EP port.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v10->v11:
- Use local pointer for readability in cxl_switch_port_init_ras() (Jonathan Cameron)
- Rename port to be ep in cxl_endpoint_port_init_ras() (Dave Jiang)
- Rename dport to be parent_dport in cxl_endpoint_port_init_ras()
and cxl_switch_port_init_ras() (Dave Jiang)
- Port helper changes were in cxl/port.c, now in core/ras.c (Dave Jiang)
---
drivers/cxl/core/core.h | 7 ++++++
drivers/cxl/core/ras.c | 47 +++++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 2 ++
drivers/cxl/cxlpci.h | 4 ----
drivers/cxl/mem.c | 4 +++-
drivers/cxl/port.c | 5 +++++
6 files changed, 64 insertions(+), 5 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 2c81a43d7b05..2fa76a913264 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -146,6 +146,9 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
#ifdef CONFIG_CXL_RAS
int cxl_ras_init(void);
void cxl_ras_exit(void);
+void cxl_switch_port_init_ras(struct cxl_port *port);
+void cxl_endpoint_port_init_ras(struct cxl_port *ep);
+void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
#else
static inline int cxl_ras_init(void)
{
@@ -155,6 +158,10 @@ static inline int cxl_ras_init(void)
static inline void cxl_ras_exit(void)
{
}
+static inline void cxl_switch_port_init_ras(struct cxl_port *port) { }
+static inline void cxl_endpoint_port_init_ras(struct cxl_port *ep) { }
+static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
+ struct device *host) { }
#endif // CONFIG_CXL_RAS
int cxl_gpf_port_setup(struct cxl_dport *dport);
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 69559043b772..42b6e0b092d5 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -284,6 +284,53 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
}
EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
+static void cxl_uport_init_ras_reporting(struct cxl_port *port,
+ struct device *host)
+{
+ struct cxl_register_map *map = &port->reg_map;
+
+ map->host = host;
+ if (cxl_map_component_regs(map, &port->uport_regs,
+ BIT(CXL_CM_CAP_CAP_ID_RAS)))
+ dev_dbg(&port->dev, "Failed to map RAS capability\n");
+}
+
+void cxl_switch_port_init_ras(struct cxl_port *port)
+{
+ struct cxl_dport *parent_dport = port->parent_dport;
+
+ if (is_cxl_root(to_cxl_port(port->dev.parent)))
+ return;
+
+ /* May have parent DSP or RP */
+ if (parent_dport && dev_is_pci(parent_dport->dport_dev)) {
+ struct pci_dev *pdev = to_pci_dev(parent_dport->dport_dev);
+
+ if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) ||
+ (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM))
+ cxl_dport_init_ras_reporting(parent_dport, &port->dev);
+ }
+
+ cxl_uport_init_ras_reporting(port, &port->dev);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_switch_port_init_ras, "CXL");
+
+void cxl_endpoint_port_init_ras(struct cxl_port *ep)
+{
+ struct cxl_dport *parent_dport;
+ struct cxl_memdev *cxlmd = to_cxl_memdev(ep->uport_dev);
+ struct cxl_port *parent_port __free(put_cxl_port) =
+ cxl_mem_find_port(cxlmd, &parent_dport);
+
+ if (!parent_dport || !dev_is_pci(parent_dport->dport_dev)) {
+ dev_err(&ep->dev, "CXL port topology not found\n");
+ return;
+ }
+
+ cxl_dport_init_ras_reporting(parent_dport, cxlmd->cxlds->dev);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_endpoint_port_init_ras, "CXL");
+
static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
{
void __iomem *addr;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 8f6224ac6785..32fccad9a7f6 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -586,6 +586,7 @@ struct cxl_dax_region {
* @parent_dport: dport that points to this port in the parent
* @decoder_ida: allocator for decoder ids
* @reg_map: component and ras register mapping parameters
+ * @uport_regs: mapped component registers
* @nr_dports: number of entries in @dports
* @hdm_end: track last allocated HDM decoder instance for allocation ordering
* @commit_end: cursor to track highest committed decoder for commit ordering
@@ -606,6 +607,7 @@ struct cxl_port {
struct cxl_dport *parent_dport;
struct ida decoder_ida;
struct cxl_register_map reg_map;
+ struct cxl_component_regs uport_regs;
int nr_dports;
int hdm_end;
int commit_end;
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index ad24d81e9eaa..a6da0abfa506 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -84,7 +84,6 @@ void read_cdat_data(struct cxl_port *port);
void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
-void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
#else
static inline void cxl_cor_error_detected(struct pci_dev *pdev) { }
@@ -93,9 +92,6 @@ static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
{
return PCI_ERS_RESULT_NONE;
}
-
-static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
- struct device *host) { }
#endif
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 6e6777b7bafb..f7dc0ba8905d 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -7,6 +7,7 @@
#include "cxlmem.h"
#include "cxlpci.h"
+#include "core/core.h"
/**
* DOC: cxl mem
@@ -166,7 +167,8 @@ static int cxl_mem_probe(struct device *dev)
else
endpoint_parent = &parent_port->dev;
- cxl_dport_init_ras_reporting(dport, dev);
+ if (dport->rch)
+ cxl_dport_init_ras_reporting(dport, dev);
scoped_guard(device, endpoint_parent) {
if (!endpoint_parent->driver) {
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index fe4b593331da..e66c7f2e1955 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -6,6 +6,7 @@
#include "cxlmem.h"
#include "cxlpci.h"
+#include "core/core.h"
/**
* DOC: cxl port
@@ -71,6 +72,8 @@ static int cxl_switch_port_probe(struct cxl_port *port)
cxl_switch_parse_cdat(port);
+ cxl_switch_port_init_ras(port);
+
cxlhdm = devm_cxl_setup_hdm(port, NULL);
if (!IS_ERR(cxlhdm))
return devm_cxl_enumerate_decoders(cxlhdm, NULL);
@@ -125,6 +128,8 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
if (rc)
return rc;
+ cxl_endpoint_port_init_ras(port);
+
/*
* Now that all endpoint decoders are successfully enumerated, try to
* assemble regions from committed decoders
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 16/23] cxl/pci: Introduce CXL Endpoint protocol error handlers
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (14 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 15/23] cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-27 7:48 ` Lukas Wunner
2025-08-27 1:35 ` [PATCH v11 17/23] CXL/AER: Introduce cxl_aer.c into AER driver for forwarding CXL errors Terry Bowman
` (7 subsequent siblings)
23 siblings, 1 reply; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
CXL Endpoint protocol errors are currently handled using PCI error
handlers. The CXL Endpoint requires CXL specific handling in the case of
uncorrectable error (UCE) handling not provided by the PCI handlers.
Add CXL specific handlers for CXL Endpoints. Rename the existing
cxl_error_handlers to be pci_error_handlers to more correctly indicate
the error type and follow naming consistency.
The PCI handlers will be called if the CXL device is not trained for
alternate protocol (CXL). Update the CXL Endpoint PCI handlers to call the
CXL UCE handlers.
The existing EP UCE handler includes checks for various results. These are
no longer needed because CXL UCE recovery will not be attempted. Implement
cxl_handle_ras() to return PCI_ERS_RESULT_NONE or PCI_ERS_RESULT_PANIC. The
CXL UCE handler is called by cxl_do_recovery() that acts on the return
value. In the case of the PCI handler path, call panic() if the result is
PCI_ERS_RESULT_PANIC.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
Changes in v10->v11:
- cxl_error_detected() - Change handlers' scoped_guard() to guard() (Jonathan)
- cxl_error_detected() - Remove extra line (Shiju)
- Changes moved to core/ras.c (Terry)
- cxl_error_detected(), remove 'ue' and return with function call. (Jonathan)
- Remove extra space in documentation for PCI_ERS_RESULT_PANIC definition
- Move #include "pci.h from cxl.h to core.h (Terry)
- Remove unnecessary includes of cxl.h and core.h in mem.c (Terry)
---
drivers/cxl/core/core.h | 17 +++++++
drivers/cxl/core/ras.c | 110 +++++++++++++++++++---------------------
drivers/cxl/cxlpci.h | 15 ------
drivers/cxl/pci.c | 9 ++--
include/linux/pci.h | 3 ++
5 files changed, 78 insertions(+), 76 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 2fa76a913264..6e3e7f2e0e2d 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -6,6 +6,7 @@
#include <cxl/mailbox.h>
#include <linux/rwsem.h>
+#include <linux/pci.h>
extern const struct device_type cxl_nvdimm_bridge_type;
extern const struct device_type cxl_nvdimm_type;
@@ -149,6 +150,11 @@ void cxl_ras_exit(void);
void cxl_switch_port_init_ras(struct cxl_port *port);
void cxl_endpoint_port_init_ras(struct cxl_port *ep);
void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
+pci_ers_result_t pci_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t error);
+void pci_cor_error_detected(struct pci_dev *pdev);
+void cxl_cor_error_detected(struct device *dev);
+pci_ers_result_t cxl_error_detected(struct device *dev);
#else
static inline int cxl_ras_init(void)
{
@@ -162,6 +168,17 @@ static inline void cxl_switch_port_init_ras(struct cxl_port *port) { }
static inline void cxl_endpoint_port_init_ras(struct cxl_port *ep) { }
static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
struct device *host) { }
+static inline pci_ers_result_t pci_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t error)
+{
+ return PCI_ERS_RESULT_NONE;
+}
+static inline void pci_cor_error_detected(struct pci_dev *pdev) { }
+static inline void cxl_cor_error_detected(struct device *dev) { }
+static inline pci_ers_result_t cxl_error_detected(struct device *dev)
+{
+ return PCI_ERS_RESULT_NONE;
+}
#endif // CONFIG_CXL_RAS
int cxl_gpf_port_setup(struct cxl_dport *dport);
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 42b6e0b092d5..b285448c2d9c 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -129,7 +129,7 @@ void cxl_ras_exit(void)
cancel_work_sync(&cxl_cper_prot_err_work);
}
-static bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);
+static pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);
static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base);
#ifdef CONFIG_CXL_RCH_RAS
@@ -371,7 +371,7 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
* Log the state of the RAS status registers and prepare them to log the
* next error status. Return 1 if reset needed.
*/
-static bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
+static pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
{
u32 hl[CXL_HEADERLOG_SIZE_U32];
void __iomem *addr;
@@ -380,13 +380,13 @@ static bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_bas
if (!ras_base) {
dev_warn_once(dev, "CXL RAS register block is not mapped");
- return false;
+ return PCI_ERS_RESULT_NONE;
}
addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
status = readl(addr);
if (!(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK))
- return false;
+ return PCI_ERS_RESULT_NONE;
/* If multiple errors, log header points to first error from ctrl reg */
if (hweight32(status) > 1) {
@@ -403,76 +403,72 @@ static bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_bas
trace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);
writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
- return true;
+ return PCI_ERS_RESULT_PANIC;
}
-void cxl_cor_error_detected(struct pci_dev *pdev)
+void cxl_cor_error_detected(struct device *dev)
{
+ struct pci_dev *pdev = to_pci_dev(dev);
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
- struct device *dev = &cxlds->cxlmd->dev;
+ struct device *cxlmd_dev = &cxlds->cxlmd->dev;
- scoped_guard(device, dev) {
- if (!dev->driver) {
- dev_warn(&pdev->dev,
- "%s: memdev disabled, abort error handling\n",
- dev_name(dev));
- return;
- }
+ guard(device)(cxlmd_dev);
- if (cxlds->rcd)
- cxl_handle_rdport_errors(cxlds);
-
- cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->serial, cxlds->regs.ras);
+ if (!cxlmd_dev->driver) {
+ dev_warn(&pdev->dev, "%s: memdev disabled, abort error handling", dev_name(dev));
+ return;
}
+
+ if (cxlds->rcd)
+ cxl_handle_rdport_errors(cxlds);
+
+ cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->serial, cxlds->regs.ras);
}
EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
-pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
- pci_channel_state_t state)
+void pci_cor_error_detected(struct pci_dev *pdev)
{
+ cxl_cor_error_detected(&pdev->dev);
+}
+EXPORT_SYMBOL_NS_GPL(pci_cor_error_detected, "CXL");
+
+pci_ers_result_t cxl_error_detected(struct device *dev)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
- struct cxl_memdev *cxlmd = cxlds->cxlmd;
- struct device *dev = &cxlmd->dev;
- bool ue;
+ struct device *cxlmd_dev = &cxlds->cxlmd->dev;
- scoped_guard(device, dev) {
- if (!dev->driver) {
- dev_warn(&pdev->dev,
- "%s: memdev disabled, abort error handling\n",
- dev_name(dev));
- return PCI_ERS_RESULT_DISCONNECT;
- }
+ guard(device)(cxlmd_dev);
- if (cxlds->rcd)
- cxl_handle_rdport_errors(cxlds);
- /*
- * A frozen channel indicates an impending reset which is fatal to
- * CXL.mem operation, and will likely crash the system. On the off
- * chance the situation is recoverable dump the status of the RAS
- * capability registers and bounce the active state of the memdev.
- */
- ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->serial, cxlds->regs.ras);
- }
-
-
- switch (state) {
- case pci_channel_io_normal:
- if (ue) {
- device_release_driver(dev);
- return PCI_ERS_RESULT_NEED_RESET;
- }
- return PCI_ERS_RESULT_CAN_RECOVER;
- case pci_channel_io_frozen:
+ if (!dev->driver) {
dev_warn(&pdev->dev,
- "%s: frozen state error detected, disable CXL.mem\n",
+ "%s: memdev disabled, abort error handling\n",
dev_name(dev));
- device_release_driver(dev);
- return PCI_ERS_RESULT_NEED_RESET;
- case pci_channel_io_perm_failure:
- dev_warn(&pdev->dev,
- "failure state error detected, request disconnect\n");
return PCI_ERS_RESULT_DISCONNECT;
}
- return PCI_ERS_RESULT_NEED_RESET;
+
+ if (cxlds->rcd)
+ cxl_handle_rdport_errors(cxlds);
+
+ /*
+ * A frozen channel indicates an impending reset which is fatal to
+ * CXL.mem operation, and will likely crash the system. On the off
+ * chance the situation is recoverable dump the status of the RAS
+ * capability registers and bounce the active state of the memdev.
+ */
+ return cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->serial, cxlds->regs.ras);
}
EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
+
+pci_ers_result_t pci_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t error)
+{
+ pci_ers_result_t rc;
+
+ rc = cxl_error_detected(&pdev->dev);
+ if (rc == PCI_ERS_RESULT_PANIC)
+ panic("CXL cachemem error.");
+
+ return rc;
+}
+EXPORT_SYMBOL_NS_GPL(pci_error_detected, "CXL");
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index a6da0abfa506..ccf0ca36bc00 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -79,19 +79,4 @@ struct cxl_dev_state;
int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
struct cxl_endpoint_dvsec_info *info);
void read_cdat_data(struct cxl_port *port);
-
-#ifdef CONFIG_CXL_RAS
-void cxl_cor_error_detected(struct pci_dev *pdev);
-pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
- pci_channel_state_t state);
-#else
-static inline void cxl_cor_error_detected(struct pci_dev *pdev) { }
-
-static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
- pci_channel_state_t state)
-{
- return PCI_ERS_RESULT_NONE;
-}
-#endif
-
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index bd95be1f3d5c..6803c2fb906b 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -16,6 +16,7 @@
#include "cxlpci.h"
#include "cxl.h"
#include "pmu.h"
+#include "core/core.h"
/**
* DOC: cxl pci
@@ -1112,11 +1113,11 @@ static void cxl_reset_done(struct pci_dev *pdev)
}
}
-static const struct pci_error_handlers cxl_error_handlers = {
- .error_detected = cxl_error_detected,
+static const struct pci_error_handlers pci_error_handlers = {
+ .error_detected = pci_error_detected,
.slot_reset = cxl_slot_reset,
.resume = cxl_error_resume,
- .cor_error_detected = cxl_cor_error_detected,
+ .cor_error_detected = pci_cor_error_detected,
.reset_done = cxl_reset_done,
};
@@ -1124,7 +1125,7 @@ static struct pci_driver cxl_pci_driver = {
.name = KBUILD_MODNAME,
.id_table = cxl_mem_pci_tbl,
.probe = cxl_pci_probe,
- .err_handler = &cxl_error_handlers,
+ .err_handler = &pci_error_handlers,
.dev_groups = cxl_rcd_groups,
.driver = {
.probe_type = PROBE_PREFER_ASYNCHRONOUS,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 79878243b681..3dcab36c437f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -868,6 +868,9 @@ enum pci_ers_result {
/* No AER capabilities registered for the driver */
PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
+
+ /* System is unstable, panic. Is CXL specific */
+ PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
};
/* PCI bus error event callbacks */
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 17/23] CXL/AER: Introduce cxl_aer.c into AER driver for forwarding CXL errors
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (15 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 16/23] cxl/pci: Introduce CXL Endpoint protocol error handlers Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-27 7:56 ` Lukas Wunner
2025-08-27 1:35 ` [PATCH v11 18/23] PCI/AER: Dequeue forwarded CXL error Terry Bowman
` (6 subsequent siblings)
23 siblings, 1 reply; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
CXL virtual hierarchy (VH) RAS handling for CXL Port devices will be added
soon. This requires a notification mechanism for the AER driver to share
the AER interrupt with the CXL driver. The notification will be used as an
indication for the CXL drivers to handle and log the CXL RAS errors.
Note, 'CXL protocol error' terminology will refer to CXL VH and not
CXL RCH errors unless specifically noted going forward.
Introduce a new file in the AER driver to handle the CXL protocol errors
named pci/pcie/cxl_aer.c.
Add a kfifo work queue to be used by the AER and CXL drivers. The AER
driver will be the sole kfifo producer adding work and the cxl_core will be
the sole kfifo consumer removing work. Add the boilerplate kfifo support.
Encapsulate the kfifo, RW semaphore, and work pointer in a single structure.
Add CXL work queue handler registration functions in the AER driver. Export
the functions allowing CXL driver to access. Implement registration
functions for the CXL driver to assign or clear the work handler function.
Synchronize accesses using the RW semaphore.
Introduce 'struct cxl_proto_err_work_data' to serve as the kfifo work data.
This will contain a reference to the erring PCI device and the error
severity. This will be used when the work is dequeued by the cxl_core driver.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
Changes in v10->v11:
- Move RCH implementation to cxl_march.c and RCH declarations to pci/pci.h. (Terry)
- Introduce 'struct cxl_proto_err_kfifo' containing semaphore, fifo,
and work struct. (Dan)
- Remove embedded struct from cxl_proto_err_work (Dan)
- Make 'struct work_struct *cxl_proto_err_work' definition static (Jonathan)
- Add check for NULL cxl_proto_err_kfifo to determine if CXL driver is
not registered for workqueue. (Dan)
---
drivers/pci/pci.h | 4 ++
drivers/pci/pcie/Makefile | 1 +
drivers/pci/pcie/aer.c | 50 ++--------------
drivers/pci/pcie/cxl_aer.c | 120 +++++++++++++++++++++++++++++++++++++
include/linux/aer.h | 21 +++++++
5 files changed, 150 insertions(+), 46 deletions(-)
create mode 100644 drivers/pci/pcie/cxl_aer.c
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 417a088d815f..cfa75903dd3f 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1179,10 +1179,14 @@ static inline void cxl_rch_enable_rcec(struct pci_dev *rcec) { }
void pci_aer_unmask_internal_errors(struct pci_dev *dev);
bool cxl_error_is_native(struct pci_dev *dev);
bool is_internal_error(struct aer_err_info *info);
+bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info);
+void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info);
#else
static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
static inline bool cxl_error_is_native(struct pci_dev *dev) { return false; }
static inline bool is_internal_error(struct aer_err_info *info) { return false; }
+static inline bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info) { return false; }
+static inline void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info) { }
#endif
#endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 07c299dbcdd7..bfe5bb5a3c89 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_PCIEPORTBUS) += pcieportdrv.o bwctrl.o
obj-y += aspm.o
obj-$(CONFIG_PCIEAER) += aer.o err.o tlp.o
obj-$(CONFIG_CXL_RCH_RAS) += rch_aer.o
+obj-$(CONFIG_CXL_RAS) += cxl_aer.o
obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o
obj-$(CONFIG_PCIE_PME) += pme.o
obj-$(CONFIG_PCIE_DPC) += dpc.o
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index ed1de9256898..627d89ccea9c 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1092,51 +1092,6 @@ static bool find_source_device(struct pci_dev *parent,
return true;
}
-#ifdef CONFIG_CXL_RAS
-
-/**
- * pci_aer_unmask_internal_errors - unmask internal errors
- * @dev: pointer to the pci_dev data structure
- *
- * Unmask internal errors in the Uncorrectable and Correctable Error
- * Mask registers.
- *
- * Note: AER must be enabled and supported by the device which must be
- * checked in advance, e.g. with pcie_aer_is_native().
- */
-void pci_aer_unmask_internal_errors(struct pci_dev *dev)
-{
- int aer = dev->aer_cap;
- u32 mask;
-
- pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, &mask);
- mask &= ~PCI_ERR_UNC_INTN;
- pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, mask);
-
- pci_read_config_dword(dev, aer + PCI_ERR_COR_MASK, &mask);
- mask &= ~PCI_ERR_COR_INTERNAL;
- pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
-}
-EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, "CXL");
-
-bool cxl_error_is_native(struct pci_dev *dev)
-{
- struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
-
- return (pcie_ports_native || host->native_aer);
-}
-EXPORT_SYMBOL_NS_GPL(cxl_error_is_native, "CXL");
-
-bool is_internal_error(struct aer_err_info *info)
-{
- if (info->severity == AER_CORRECTABLE)
- return info->status & PCI_ERR_COR_INTERNAL;
-
- return info->status & PCI_ERR_UNC_INTN;
-}
-EXPORT_SYMBOL_NS_GPL(is_internal_error, "CXL");
-#endif /* CONFIG_CXL_RAS */
-
/**
* pci_aer_handle_error - handle logging error into an event log
* @dev: pointer to pci_dev data structure of error source device
@@ -1173,7 +1128,10 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
{
cxl_rch_handle_error(dev, info);
- pci_aer_handle_error(dev, info);
+ if (is_cxl_error(dev, info))
+ cxl_forward_error(dev, info);
+ else
+ pci_aer_handle_error(dev, info);
pci_dev_put(dev);
}
diff --git a/drivers/pci/pcie/cxl_aer.c b/drivers/pci/pcie/cxl_aer.c
new file mode 100644
index 000000000000..74e6d2d04ab6
--- /dev/null
+++ b/drivers/pci/pcie/cxl_aer.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2025 AMD Corporation. All rights reserved. */
+
+#include <linux/pci.h>
+#include <linux/aer.h>
+#include <linux/pci.h>
+#include <linux/bitfield.h>
+#include <linux/kfifo.h>
+#include "../pci.h"
+
+#define CXL_ERROR_SOURCES_MAX 128
+
+struct cxl_proto_err_kfifo {
+ struct work_struct *work;
+ struct rw_semaphore rw_sema;
+ DECLARE_KFIFO(fifo, struct cxl_proto_err_work_data,
+ CXL_ERROR_SOURCES_MAX);
+};
+
+static struct cxl_proto_err_kfifo cxl_proto_err_kfifo = {
+ .rw_sema = __RWSEM_INITIALIZER(cxl_proto_err_kfifo.rw_sema)
+};
+
+/**
+ * pci_aer_unmask_internal_errors - unmask internal errors
+ * @dev: pointer to the pci_dev data structure
+ *
+ * Unmask internal errors in the Uncorrectable and Correctable Error
+ * Mask registers.
+ *
+ * Note: AER must be enabled and supported by the device which must be
+ * checked in advance, e.g. with pcie_aer_is_native().
+ */
+void pci_aer_unmask_internal_errors(struct pci_dev *dev)
+{
+ int aer = dev->aer_cap;
+ u32 mask;
+
+ pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, &mask);
+ mask &= ~PCI_ERR_UNC_INTN;
+ pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, mask);
+
+ pci_read_config_dword(dev, aer + PCI_ERR_COR_MASK, &mask);
+ mask &= ~PCI_ERR_COR_INTERNAL;
+ pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
+}
+EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, "CXL");
+
+bool cxl_error_is_native(struct pci_dev *dev)
+{
+ struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
+
+ return (pcie_ports_native || host->native_aer);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_error_is_native, "CXL");
+
+bool is_internal_error(struct aer_err_info *info)
+{
+ if (info->severity == AER_CORRECTABLE)
+ return info->status & PCI_ERR_COR_INTERNAL;
+
+ return info->status & PCI_ERR_UNC_INTN;
+}
+EXPORT_SYMBOL_NS_GPL(is_internal_error, "CXL");
+
+bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
+{
+ if (!info || !info->is_cxl)
+ return false;
+
+ if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
+ return false;
+
+ return is_internal_error(info);
+}
+EXPORT_SYMBOL_NS_GPL(is_cxl_error, "CXL");
+
+void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info)
+{
+ struct cxl_proto_err_work_data wd = (struct cxl_proto_err_work_data) {
+ .severity = info->severity,
+ .pdev = pdev
+ };
+
+ guard(rwsem_write)(&cxl_proto_err_kfifo.rw_sema);
+
+ if (!cxl_proto_err_kfifo.work) {
+ dev_warn_once(&pdev->dev, "CXL driver is not registered for kfifo");
+ return;
+ }
+
+ if (!kfifo_put(&cxl_proto_err_kfifo.fifo, wd)) {
+ dev_err_ratelimited(&pdev->dev, "CXL kfifo overflow\n");
+ return;
+ }
+
+ schedule_work(cxl_proto_err_kfifo.work);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_forward_error, "CXL");
+
+void cxl_register_proto_err_work(struct work_struct *work)
+{
+ guard(rwsem_write)(&cxl_proto_err_kfifo.rw_sema);
+ cxl_proto_err_kfifo.work = work;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_register_proto_err_work, "CXL");
+
+void cxl_unregister_proto_err_work(void)
+{
+ guard(rwsem_write)(&cxl_proto_err_kfifo.rw_sema);
+ cxl_proto_err_kfifo.work = NULL;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_unregister_proto_err_work, "CXL");
+
+int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd)
+{
+ guard(rwsem_read)(&cxl_proto_err_kfifo.rw_sema);
+ return kfifo_get(&cxl_proto_err_kfifo.fifo, wd);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_proto_err_kfifo_get, "CXL");
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 02940be66324..f8eb32805957 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -10,6 +10,7 @@
#include <linux/errno.h>
#include <linux/types.h>
+#include <linux/workqueue_types.h>
#define AER_NONFATAL 0
#define AER_FATAL 1
@@ -53,6 +54,16 @@ struct aer_capability_regs {
u16 uncor_err_source;
};
+/**
+ * struct cxl_proto_err_work_data - Error information used in CXL error handling
+ * @severity: AER severity
+ * @pdev: PCI device detecting the error
+ */
+struct cxl_proto_err_work_data {
+ int severity;
+ struct pci_dev *pdev;
+};
+
#if defined(CONFIG_PCIEAER)
int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
int pcie_aer_is_native(struct pci_dev *dev);
@@ -64,6 +75,16 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
#endif
+#ifdef CONFIG_CXL_RAS
+int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd);
+void cxl_register_proto_err_work(struct work_struct *work);
+void cxl_unregister_proto_err_work(void);
+#else
+static inline int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd) { return 0; }
+static inline void cxl_register_proto_err_work(struct work_struct *work) { }
+static inline void cxl_unregister_proto_err_work(void) { }
+#endif
+
void pci_print_aer(struct pci_dev *dev, int aer_severity,
struct aer_capability_regs *aer);
int cper_severity_to_aer(int cper_severity);
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 18/23] PCI/AER: Dequeue forwarded CXL error
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (16 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 17/23] CXL/AER: Introduce cxl_aer.c into AER driver for forwarding CXL errors Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-29 0:43 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 19/23] CXL/PCI: Introduce CXL Port protocol error handlers Terry Bowman
` (5 subsequent siblings)
23 siblings, 1 reply; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
The AER driver is now designed to forward CXL protocol errors to the CXL
driver. Update the CXL driver with functionality to dequeue the forwarded
CXL error from the kfifo. Also, update the CXL driver to begin the protocol
error handling processing using the work received from the FIFO.
Update function cxl_proto_err_work_fn() to dequeue work forwarded by the
AER service driver. This will begin the CXL protocol error processing with
a call to cxl_handle_proto_error().
Introduce logic to take the SBDF values from 'struct cxl_proto_error_info'
and use in discovering the erring PCI device. The call to pci_get_domain_bus_and_slot()
will return a reference counted 'struct pci_dev *'. This will serve as
reference count to prevent releasing the CXL Endpoint's mapped RAS while
handling the error. Use scope base __free() to put the reference count.
This will change when adding support for CXL port devices in the future.
Implement cxl_handle_proto_error() to differentiate between Restricted CXL
Host (RCH) protocol errors and CXL virtual host (VH) protocol errors.
Maintain the existing RCH handling. Export the AER driver's pcie_walk_rcec()
allowing the CXL driver to walk the RCEC's secondary bus.
VH correctable error (CE) processing will call the CXL CE handler. VH
uncorrectable errors (UCE) will call cxl_do_recovery(), implemented as a
stub for now and to be updated in future patch. Export pci_aer_clean_fatal_status()
and pci_clean_device_status() used to clean up AER status after handling.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
Changes in v10->v11:
- Reword patch commit message to remove RCiEP details (Jonathan)
- Add #include <linux/bitfield.h> (Terry)
- is_cxl_rcd() - Fix short comment message wrap (Jonathan)
- is_cxl_rcd() - Combine return calls into 1 (Jonathan)
- cxl_handle_proto_error() - Move comment earlier (Jonathan)
- Usse FIELD_GET() in discovering class code (Jonathan)
- Remove BDF from cxl_proto_err_work_data. Use 'struct pci_dev *' (Dan)
---
drivers/cxl/core/ras.c | 68 ++++++++++++++++++++++++++++++++++-------
drivers/pci/pci.c | 1 +
drivers/pci/pci.h | 7 -----
drivers/pci/pcie/aer.c | 1 +
drivers/pci/pcie/rcec.c | 1 +
include/linux/aer.h | 2 ++
include/linux/pci.h | 10 ++++++
7 files changed, 72 insertions(+), 18 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index b285448c2d9c..a2e95c49f965 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -118,17 +118,6 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
}
static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
-int cxl_ras_init(void)
-{
- return cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
-}
-
-void cxl_ras_exit(void)
-{
- cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
- cancel_work_sync(&cxl_cper_prot_err_work);
-}
-
static pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);
static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base);
@@ -331,6 +320,10 @@ void cxl_endpoint_port_init_ras(struct cxl_port *ep)
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_port_init_ras, "CXL");
+static void cxl_do_recovery(struct device *dev)
+{
+}
+
static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
{
void __iomem *addr;
@@ -472,3 +465,56 @@ pci_ers_result_t pci_error_detected(struct pci_dev *pdev,
return rc;
}
EXPORT_SYMBOL_NS_GPL(pci_error_detected, "CXL");
+
+static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
+{
+ struct pci_dev *pdev = err_info->pdev;
+ struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+ struct cxl_memdev *cxlmd = cxlds->cxlmd;
+ struct device *host_dev __free(put_device) = get_device(&cxlmd->dev);
+
+ if (err_info->severity == AER_CORRECTABLE) {
+ int aer = pdev->aer_cap;
+
+ if (aer)
+ pci_clear_and_set_config_dword(pdev,
+ aer + PCI_ERR_COR_STATUS,
+ 0, PCI_ERR_COR_INTERNAL);
+
+ cxl_cor_error_detected(&cxlmd->dev);
+
+ pcie_clear_device_status(pdev);
+ } else {
+ cxl_do_recovery(&cxlmd->dev);
+ }
+}
+
+static void cxl_proto_err_work_fn(struct work_struct *work)
+{
+ struct cxl_proto_err_work_data wd;
+
+ while (cxl_proto_err_kfifo_get(&wd))
+ cxl_handle_proto_error(&wd);
+}
+
+static struct work_struct cxl_proto_err_work;
+static DECLARE_WORK(cxl_proto_err_work, cxl_proto_err_work_fn);
+
+int cxl_ras_init(void)
+{
+ if (cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work))
+ pr_err("Failed to initialize CXL RAS CPER\n");
+
+ cxl_register_proto_err_work(&cxl_proto_err_work);
+
+ return 0;
+}
+
+void cxl_ras_exit(void)
+{
+ cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
+ cancel_work_sync(&cxl_cper_prot_err_work);
+
+ cxl_unregister_proto_err_work();
+ cancel_work_sync(&cxl_proto_err_work);
+}
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d775ed37a79b..2c9827690cb3 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2328,6 +2328,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
}
+EXPORT_SYMBOL_NS_GPL(pcie_clear_device_status, "CXL");
#endif
/**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index cfa75903dd3f..69ff7c2d214f 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -671,16 +671,10 @@ static inline bool pci_dpc_recovered(struct pci_dev *pdev) { return false; }
void pci_rcec_init(struct pci_dev *dev);
void pci_rcec_exit(struct pci_dev *dev);
void pcie_link_rcec(struct pci_dev *rcec);
-void pcie_walk_rcec(struct pci_dev *rcec,
- int (*cb)(struct pci_dev *, void *),
- void *userdata);
#else
static inline void pci_rcec_init(struct pci_dev *dev) { }
static inline void pci_rcec_exit(struct pci_dev *dev) { }
static inline void pcie_link_rcec(struct pci_dev *rcec) { }
-static inline void pcie_walk_rcec(struct pci_dev *rcec,
- int (*cb)(struct pci_dev *, void *),
- void *userdata) { }
#endif
#ifdef CONFIG_PCI_ATS
@@ -1022,7 +1016,6 @@ void pci_restore_aer_state(struct pci_dev *dev);
static inline void pci_no_aer(void) { }
static inline void pci_aer_init(struct pci_dev *d) { }
static inline void pci_aer_exit(struct pci_dev *d) { }
-static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
static inline void pci_save_aer_state(struct pci_dev *dev) { }
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 627d89ccea9c..45abe1622316 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -288,6 +288,7 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
if (status)
pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
}
+EXPORT_SYMBOL_GPL(pci_aer_clear_fatal_status);
/**
* pci_aer_raw_clear_status - Clear AER error registers.
diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
index d0bcd141ac9c..fb6cf6449a1d 100644
--- a/drivers/pci/pcie/rcec.c
+++ b/drivers/pci/pcie/rcec.c
@@ -145,6 +145,7 @@ void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *, void *),
walk_rcec(walk_rcec_helper, &rcec_data);
}
+EXPORT_SYMBOL_NS_GPL(pcie_walk_rcec, "CXL");
void pci_rcec_init(struct pci_dev *dev)
{
diff --git a/include/linux/aer.h b/include/linux/aer.h
index f8eb32805957..1f79f0be4bf7 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -66,12 +66,14 @@ struct cxl_proto_err_work_data {
#if defined(CONFIG_PCIEAER)
int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
+void pci_aer_clear_fatal_status(struct pci_dev *dev);
int pcie_aer_is_native(struct pci_dev *dev);
#else
static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
{
return -EINVAL;
}
+static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
#endif
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3dcab36c437f..3407d687459d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1804,6 +1804,9 @@ extern bool pcie_ports_native;
int pcie_set_target_speed(struct pci_dev *port, enum pci_bus_speed speed_req,
bool use_lt);
+void pcie_walk_rcec(struct pci_dev *rcec,
+ int (*cb)(struct pci_dev *, void *),
+ void *userdata);
#else
#define pcie_ports_disabled true
#define pcie_ports_native false
@@ -1814,8 +1817,15 @@ static inline int pcie_set_target_speed(struct pci_dev *port,
{
return -EOPNOTSUPP;
}
+
+static inline void pcie_walk_rcec(struct pci_dev *rcec,
+ int (*cb)(struct pci_dev *, void *),
+ void *userdata) { }
+
#endif
+void pcie_clear_device_status(struct pci_dev *dev);
+
#define PCIE_LINK_STATE_L0S (BIT(0) | BIT(1)) /* Upstr/dwnstr L0s */
#define PCIE_LINK_STATE_L1 BIT(2) /* L1 state */
#define PCIE_LINK_STATE_L1_1 BIT(3) /* ASPM L1.1 state */
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 19/23] CXL/PCI: Introduce CXL Port protocol error handlers
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (17 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 18/23] PCI/AER: Dequeue forwarded CXL error Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-30 0:17 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 20/23] CXL/PCI: Export and rename merge_result() to pci_ers_merge_result() Terry Bowman
` (4 subsequent siblings)
23 siblings, 1 reply; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
Introduce CXL error handlers for CXL Port devices.
Add functions cxl_port_cor_error_detected() and cxl_port_error_detected().
These will serve as the handlers for all CXL Port devices. Introduce
cxl_get_ras_base() to provide the RAS base address needed by the handlers.
Update cxl_handle_proto_error() to call the CXL Port or CXL Endpoint
handler depending on which CXL device reports the error.
Implement cxl_get_ras_base() to return the cached RAS register address of a
CXL Root Port, CXL Downstream Port, or CXL Upstream Port.
Introduce get_pci_cxl_host_dev() to return the host responsible for
releasing the RAS mapped resources. CXL endpoints do not use a host to
manage its resources, allow for NULL in the case of an EP. Use reference
count increment on the host to prevent resource release. Make the caller
responsible for the reference decrement.
Update the AER driver's is_cxl_error() PCI type check because CXL Port
devices are now supported.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
Changes in v10->v11:
- None
---
drivers/cxl/core/core.h | 9 ++
drivers/cxl/core/port.c | 4 +-
drivers/cxl/core/ras.c | 176 +++++++++++++++++++++++++++++++++++--
drivers/pci/pcie/cxl_aer.c | 5 +-
4 files changed, 186 insertions(+), 8 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 6e3e7f2e0e2d..7e66fbb07b8a 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -155,6 +155,8 @@ pci_ers_result_t pci_error_detected(struct pci_dev *pdev,
void pci_cor_error_detected(struct pci_dev *pdev);
void cxl_cor_error_detected(struct device *dev);
pci_ers_result_t cxl_error_detected(struct device *dev);
+void cxl_port_cor_error_detected(struct device *dev);
+pci_ers_result_t cxl_port_error_detected(struct device *dev);
#else
static inline int cxl_ras_init(void)
{
@@ -179,9 +181,16 @@ static inline pci_ers_result_t cxl_error_detected(struct device *dev)
{
return PCI_ERS_RESULT_NONE;
}
+static inline void cxl_port_cor_error_detected(struct device *dev) { }
+static inline pci_ers_result_t cxl_port_error_detected(struct device *dev)
+{
+ return PCI_ERS_RESULT_NONE;
+}
#endif // CONFIG_CXL_RAS
int cxl_gpf_port_setup(struct cxl_dport *dport);
+struct cxl_port *find_cxl_port(struct device *dport_dev,
+ struct cxl_dport **dport);
#ifdef CONFIG_CXL_FEATURES
struct cxl_feat_entry *
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 29197376b18e..758fb73374c1 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1335,8 +1335,8 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
return NULL;
}
-static struct cxl_port *find_cxl_port(struct device *dport_dev,
- struct cxl_dport **dport)
+struct cxl_port *find_cxl_port(struct device *dport_dev,
+ struct cxl_dport **dport)
{
struct cxl_find_port_ctx ctx = {
.dport_dev = dport_dev,
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index a2e95c49f965..536ca9c815ce 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -251,6 +251,154 @@ static void cxl_dport_map_ras(struct cxl_dport *dport)
dev_dbg(dev, "Failed to map RAS capability.\n");
}
+static int match_uport(struct device *dev, const void *data)
+{
+ const struct device *uport_dev = data;
+ struct cxl_port *port;
+
+ if (!is_cxl_port(dev))
+ return 0;
+
+ port = to_cxl_port(dev);
+
+ return port->uport_dev == uport_dev;
+}
+
+static void __iomem *cxl_get_ras_base(struct device *dev)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ switch (pci_pcie_type(pdev)) {
+ case PCI_EXP_TYPE_ROOT_PORT:
+ case PCI_EXP_TYPE_DOWNSTREAM:
+ {
+ struct cxl_dport *dport = NULL;
+ struct cxl_port *port __free(put_cxl_port) = find_cxl_port(&pdev->dev, &dport);
+
+ if (!dport || !dport->dport_dev) {
+ pci_err(pdev, "Failed to find the CXL device");
+ return NULL;
+ }
+
+ if (!dport)
+ return NULL;
+
+ return dport->regs.ras;
+ }
+ case PCI_EXP_TYPE_UPSTREAM:
+ {
+ struct cxl_port *port;
+ struct device *port_dev __free(put_device) =
+ bus_find_device(&cxl_bus_type, NULL,
+ &pdev->dev, match_uport);
+
+ if (!port_dev || !is_cxl_port(port_dev)) {
+ pci_err(pdev, "Failed to find the CXL device");
+ return NULL;
+ }
+
+ port = to_cxl_port(port_dev);
+ if (!port)
+ return NULL;
+
+ return port->uport_regs.ras;
+ }
+ }
+
+ dev_warn_once(dev, "Error: Unsupported device type (%X)", pci_pcie_type(pdev));
+ return NULL;
+}
+
+static struct device *pci_to_cxl_dev(struct pci_dev *pdev)
+{
+ switch (pci_pcie_type(pdev)) {
+ case PCI_EXP_TYPE_ROOT_PORT:
+ case PCI_EXP_TYPE_DOWNSTREAM:
+ {
+ struct cxl_dport *dport = NULL;
+ struct cxl_port *port __free(put_cxl_port) =
+ find_cxl_port(&pdev->dev, &dport);
+
+ if (!dport) {
+ pci_err(pdev, "Failed to find the CXL device");
+ return NULL;
+ }
+
+ return dport->dport_dev;
+ }
+ case PCI_EXP_TYPE_UPSTREAM:
+ {
+ struct cxl_port *port;
+ struct device *port_dev __free(put_device) =
+ bus_find_device(&cxl_bus_type, NULL,
+ &pdev->dev, match_uport);
+
+ if (!port_dev || !is_cxl_port(port_dev)) {
+ pci_err(pdev, "Failed to find the CXL device");
+ return NULL;
+ }
+
+ port = to_cxl_port(port_dev);
+ if (!port)
+ return NULL;
+
+ return port->uport_dev;
+ }
+ case PCI_EXP_TYPE_ENDPOINT:
+ {
+ struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+
+ return cxlds->dev;
+ }
+ }
+
+ pci_warn_once(pdev, "Error: Unsupported device type (%X)", pci_pcie_type(pdev));
+ return NULL;
+}
+
+
+/*
+ * Return 'struct device *' responsible for freeing pdev's CXL resources.
+ * Caller is responsible for reference count decrementing the return
+ * 'struct device *'.
+ *
+ * dev: Find the host of this dev
+ */
+static struct device *get_cxl_host_dev(struct device *dev)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ switch (pci_pcie_type(pdev)) {
+ case PCI_EXP_TYPE_ROOT_PORT:
+ case PCI_EXP_TYPE_DOWNSTREAM:
+ {
+ struct cxl_dport *dport = NULL;
+ struct cxl_port *port = find_cxl_port(&pdev->dev, &dport);
+
+ if (!port)
+ return NULL;
+
+ return &port->dev;
+ }
+ case PCI_EXP_TYPE_UPSTREAM:
+ {
+ struct device *port_dev = bus_find_device(&cxl_bus_type, NULL,
+ &pdev->dev, match_uport);
+
+ if (!port_dev || !is_cxl_port(port_dev))
+ return NULL;
+
+ return port_dev;
+ }
+ /* Endpoint resources are managed by endpoint itself */
+ case PCI_EXP_TYPE_ENDPOINT:
+ return NULL;
+ }
+
+ dev_warn_once(dev, "Error: Unsupported device type (%X)", pci_pcie_type(pdev));
+ return NULL;
+}
+
/**
* cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
* @dport: the cxl_dport that needs to be initialized
@@ -399,6 +547,22 @@ static pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial, void __io
return PCI_ERS_RESULT_PANIC;
}
+void cxl_port_cor_error_detected(struct device *dev)
+{
+ void __iomem *ras_base = cxl_get_ras_base(dev);
+
+ cxl_handle_cor_ras(dev, 0, ras_base);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_port_cor_error_detected, "CXL");
+
+pci_ers_result_t cxl_port_error_detected(struct device *dev)
+{
+ void __iomem *ras_base = cxl_get_ras_base(dev);
+
+ return cxl_handle_ras(dev, 0, ras_base);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_port_error_detected, "CXL");
+
void cxl_cor_error_detected(struct device *dev)
{
struct pci_dev *pdev = to_pci_dev(dev);
@@ -469,9 +633,8 @@ EXPORT_SYMBOL_NS_GPL(pci_error_detected, "CXL");
static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
{
struct pci_dev *pdev = err_info->pdev;
- struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
- struct cxl_memdev *cxlmd = cxlds->cxlmd;
- struct device *host_dev __free(put_device) = get_device(&cxlmd->dev);
+ struct device *dev = pci_to_cxl_dev(pdev);
+ struct device *host_dev __free(put_device) = get_cxl_host_dev(&pdev->dev);
if (err_info->severity == AER_CORRECTABLE) {
int aer = pdev->aer_cap;
@@ -481,11 +644,14 @@ static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
aer + PCI_ERR_COR_STATUS,
0, PCI_ERR_COR_INTERNAL);
- cxl_cor_error_detected(&cxlmd->dev);
+ if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ENDPOINT)
+ cxl_error_detected(&pdev->dev);
+ else
+ cxl_port_cor_error_detected(dev);
pcie_clear_device_status(pdev);
} else {
- cxl_do_recovery(&cxlmd->dev);
+ cxl_do_recovery(dev);
}
}
diff --git a/drivers/pci/pcie/cxl_aer.c b/drivers/pci/pcie/cxl_aer.c
index 74e6d2d04ab6..6eeff0b78b47 100644
--- a/drivers/pci/pcie/cxl_aer.c
+++ b/drivers/pci/pcie/cxl_aer.c
@@ -68,7 +68,10 @@ bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
if (!info || !info->is_cxl)
return false;
- if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
+ if ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT) &&
+ (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
+ (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM) &&
+ (pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM))
return false;
return is_internal_error(info);
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 20/23] CXL/PCI: Export and rename merge_result() to pci_ers_merge_result()
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (18 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 19/23] CXL/PCI: Introduce CXL Port protocol error handlers Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-27 8:04 ` Lukas Wunner
2025-08-27 12:19 ` kernel test robot
2025-08-27 1:35 ` [PATCH v11 21/23] CXL/PCI: Introduce CXL uncorrectable protocol error recovery Terry Bowman
` (3 subsequent siblings)
23 siblings, 2 replies; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
CXL uncorrectable errors (UCE) will soon be handled separately from the PCI
AER handling. The merge_result() function can be made common to use in both
handling paths.
Rename the PCI subsystem's merge_result() to be pci_ers_merge_result().
Export pci_ers_merge_result() to make available for the CXL and other
drivers to use.
Update pci_ers_merge_result() to support recently introduced PCI_ERS_RESULT_PANIC
result.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v10->v11:
- New patch
- pci_ers_merge_result() - Change export to non-namespace and rename
to be pci_ers_merge_result()
- Move pci_ers_merge_result() definition to pci.h. Needs pci_ers_result
---
drivers/pci/pcie/err.c | 14 +++++++++-----
include/linux/pci.h | 11 +++++++++++
2 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index de6381c690f5..368bad0cb90e 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -21,9 +21,12 @@
#include "portdrv.h"
#include "../pci.h"
-static pci_ers_result_t merge_result(enum pci_ers_result orig,
- enum pci_ers_result new)
+pci_ers_result_t pci_ers_merge_result(enum pci_ers_result orig,
+ enum pci_ers_result new)
{
+ if (new == PCI_ERS_RESULT_PANIC)
+ return PCI_ERS_RESULT_PANIC;
+
if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
return PCI_ERS_RESULT_NO_AER_DRIVER;
@@ -45,6 +48,7 @@ static pci_ers_result_t merge_result(enum pci_ers_result orig,
return orig;
}
+EXPORT_SYMBOL(pci_ers_merge_result);
static int report_error_detected(struct pci_dev *dev,
pci_channel_state_t state,
@@ -81,7 +85,7 @@ static int report_error_detected(struct pci_dev *dev,
vote = err_handler->error_detected(dev, state);
}
pci_uevent_ers(dev, vote);
- *result = merge_result(*result, vote);
+ *result = pci_ers_merge_result(*result, vote);
device_unlock(&dev->dev);
return 0;
}
@@ -121,7 +125,7 @@ static int report_mmio_enabled(struct pci_dev *dev, void *data)
err_handler = pdrv->err_handler;
vote = err_handler->mmio_enabled(dev);
- *result = merge_result(*result, vote);
+ *result = pci_ers_merge_result(*result, vote);
out:
device_unlock(&dev->dev);
return 0;
@@ -140,7 +144,7 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
err_handler = pdrv->err_handler;
vote = err_handler->slot_reset(dev);
- *result = merge_result(*result, vote);
+ *result = pci_ers_merge_result(*result, vote);
out:
device_unlock(&dev->dev);
return 0;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3407d687459d..ff6812b2b9b6 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2760,6 +2760,17 @@ static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev)
void pci_uevent_ers(struct pci_dev *pdev, enum pci_ers_result err_type);
#endif
+#if defined(CONFIG_PCIEAER)
+pci_ers_result_t pci_ers_merge_result(enum pci_ers_result orig,
+ enum pci_ers_result new);
+#else
+static inline pci_ers_result_t pci_ers_merge_result(enum pci_ers_result orig,
+ enum pci_ers_result new)
+{
+ return PCI_ERS_RESULT_NONE;
+}
+#endif
+
#include <linux/dma-mapping.h>
#define pci_emerg(pdev, fmt, arg...) dev_emerg(&(pdev)->dev, fmt, ##arg)
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 21/23] CXL/PCI: Introduce CXL uncorrectable protocol error recovery
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (19 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 20/23] CXL/PCI: Export and rename merge_result() to pci_ers_merge_result() Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-09-03 22:30 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 22/23] CXL/PCI: Enable CXL protocol errors during CXL Port probe Terry Bowman
` (2 subsequent siblings)
23 siblings, 1 reply; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
Populate the cxl_do_recovery() function with uncorrectable protocol error (UCE)
handling. Follow similar design as found in PCIe error driver,
pcie_do_recovery(). One difference is cxl_do_recovery() will treat all UCEs
as fatal with a kernel panic. This is to prevent corruption on CXL memory.
Introduce cxl_walk_port(). Make this analogous to pci_walk_bridge() but walking
CXL ports instead. This will iterate through the CXL topology from the
erroring device through the downstream CXL Ports and Endpoints.
Export pci_aer_clear_fatal_status() for CXL to use if a UCE is not found.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
Changes in v10->v11:
- pci_ers_merge_results() - Move to earlier patch
---
drivers/cxl/core/port.c | 1 +
drivers/cxl/core/ras.c | 94 +++++++++++++++++++++++++++++++++++++++++
drivers/pci/pci.h | 2 -
include/linux/aer.h | 2 +
4 files changed, 97 insertions(+), 2 deletions(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 758fb73374c1..085c8620a797 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1347,6 +1347,7 @@ struct cxl_port *find_cxl_port(struct device *dport_dev,
port = __find_cxl_port(&ctx);
return port;
}
+EXPORT_SYMBOL_NS_GPL(find_cxl_port, "CXL");
static struct cxl_port *find_cxl_port_at(struct cxl_port *parent_port,
struct device *dport_dev,
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 536ca9c815ce..3da675f72616 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -6,6 +6,7 @@
#include <cxl/event.h>
#include <cxlmem.h>
#include <cxlpci.h>
+#include <cxl.h>
#include "trace.h"
static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,
@@ -468,8 +469,101 @@ void cxl_endpoint_port_init_ras(struct cxl_port *ep)
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_port_init_ras, "CXL");
+static int cxl_report_error_detected(struct device *dev, void *data)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+ pci_ers_result_t vote, *result = data;
+
+ guard(device)(dev);
+
+ if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ENDPOINT)
+ vote = cxl_error_detected(dev);
+ else
+ vote = cxl_port_error_detected(dev);
+
+ vote = cxl_error_detected(dev);
+ *result = pci_ers_merge_result(*result, vote);
+
+ return 0;
+}
+
+static int match_port_by_parent_dport(struct device *dev, const void *dport_dev)
+{
+ struct cxl_port *port;
+
+ if (!is_cxl_port(dev))
+ return 0;
+
+ port = to_cxl_port(dev);
+
+ return port->parent_dport->dport_dev == dport_dev;
+}
+
+static void cxl_walk_port(struct device *port_dev,
+ int (*cb)(struct device *, void *),
+ void *userdata)
+{
+ struct cxl_dport *dport = NULL;
+ struct cxl_port *port;
+ unsigned long index;
+
+ if (!port_dev)
+ return;
+
+ port = to_cxl_port(port_dev);
+ if (port->uport_dev && dev_is_pci(port->uport_dev))
+ cb(port->uport_dev, userdata);
+
+ xa_for_each(&port->dports, index, dport)
+ {
+ struct device *child_port_dev __free(put_device) =
+ bus_find_device(&cxl_bus_type, &port->dev, dport,
+ match_port_by_parent_dport);
+
+ cb(dport->dport_dev, userdata);
+
+ cxl_walk_port(child_port_dev, cxl_report_error_detected, userdata);
+ }
+
+ if (is_cxl_endpoint(port))
+ cb(port->uport_dev->parent, userdata);
+}
+
static void cxl_do_recovery(struct device *dev)
{
+ pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
+ struct pci_dev *pdev = to_pci_dev(dev);
+ struct cxl_dport *dport;
+ struct cxl_port *port;
+
+ if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) ||
+ (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM)) {
+ port = find_cxl_port(&pdev->dev, &dport);
+ } else if (pci_pcie_type(pdev) == PCI_EXP_TYPE_UPSTREAM) {
+ struct device *port_dev = bus_find_device(&cxl_bus_type, NULL,
+ &pdev->dev, match_uport);
+ port = to_cxl_port(port_dev);
+ }
+
+ if (!port)
+ return;
+
+ cxl_walk_port(&port->dev, cxl_report_error_detected, &status);
+ if (status == PCI_ERS_RESULT_PANIC)
+ panic("CXL cachemem error.");
+
+ /*
+ * If we have native control of AER, clear error status in the device
+ * that detected the error. If the platform retained control of AER,
+ * it is responsible for clearing this status. In that case, the
+ * signaling device may not even be visible to the OS.
+ */
+ if (cxl_error_is_native(pdev)) {
+ pcie_clear_device_status(pdev);
+ pci_aer_clear_nonfatal_status(pdev);
+ pci_aer_clear_fatal_status(pdev);
+ }
+ put_device(&port->dev);
}
static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 69ff7c2d214f..0c4f73dd645f 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1170,13 +1170,11 @@ static inline void cxl_rch_enable_rcec(struct pci_dev *rcec) { }
#ifdef CONFIG_CXL_RAS
void pci_aer_unmask_internal_errors(struct pci_dev *dev);
-bool cxl_error_is_native(struct pci_dev *dev);
bool is_internal_error(struct aer_err_info *info);
bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info);
void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info);
#else
static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
-static inline bool cxl_error_is_native(struct pci_dev *dev) { return false; }
static inline bool is_internal_error(struct aer_err_info *info) { return false; }
static inline bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info) { return false; }
static inline void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info) { }
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 1f79f0be4bf7..751a026fea73 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -81,10 +81,12 @@ static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd);
void cxl_register_proto_err_work(struct work_struct *work);
void cxl_unregister_proto_err_work(void);
+bool cxl_error_is_native(struct pci_dev *dev);
#else
static inline int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd) { return 0; }
static inline void cxl_register_proto_err_work(struct work_struct *work) { }
static inline void cxl_unregister_proto_err_work(void) { }
+static inline bool cxl_error_is_native(struct pci_dev *dev) { return false; }
#endif
void pci_print_aer(struct pci_dev *dev, int aer_severity,
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 22/23] CXL/PCI: Enable CXL protocol errors during CXL Port probe
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (20 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 21/23] CXL/PCI: Introduce CXL uncorrectable protocol error recovery Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-09-03 23:23 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 23/23] CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup Terry Bowman
2025-08-29 0:07 ` [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Dave Jiang
23 siblings, 1 reply; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
CXL protocol errors are not enabled for all CXL devices after boot. These
must be enabled inorder to process CXL protocol errors.
Introduce cxl_unmask_proto_interrupts() to call pci_aer_unmask_internal_errors().
pci_aer_unmask_internal_errors() expects the pdev->aer_cap is initialized.
But, dev->aer_cap is not initialized for CXL Upstream Switch Ports and CXL
Downstream Switch Ports. Initialize the dev->aer_cap if necessary. Enable AER
correctable internal errors and uncorrectable internal errors for all CXL
devices.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
Changes in v10->v11:
- Added check for valid PCI devices in is_cxl_error() (Terry)
- Removed check for RCiEP in cxl_handle_proto_err() and
cxl_report_error_detected() (Terry)
---
drivers/cxl/core/ras.c | 26 +++++++++++++++++++++++++-
drivers/pci/pci.h | 2 --
include/linux/aer.h | 2 ++
3 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 3da675f72616..90ea0dfb942f 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -122,6 +122,21 @@ static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
static pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);
static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base);
+static void cxl_unmask_proto_interrupts(struct device *dev)
+{
+ struct pci_dev *pdev __free(pci_dev_put) =
+ pci_dev_get(to_pci_dev(dev));
+
+ if (!pdev->aer_cap) {
+ pdev->aer_cap = pci_find_ext_capability(pdev,
+ PCI_EXT_CAP_ID_ERR);
+ if (!pdev->aer_cap)
+ return;
+ }
+
+ pci_aer_unmask_internal_errors(pdev);
+}
+
#ifdef CONFIG_CXL_RCH_RAS
static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
{
@@ -418,7 +433,10 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
cxl_dport_map_rch_aer(dport);
cxl_disable_rch_root_ints(dport);
+ return;
}
+
+ cxl_unmask_proto_interrupts(dport->dport_dev);
}
EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
@@ -429,8 +447,12 @@ static void cxl_uport_init_ras_reporting(struct cxl_port *port,
map->host = host;
if (cxl_map_component_regs(map, &port->uport_regs,
- BIT(CXL_CM_CAP_CAP_ID_RAS)))
+ BIT(CXL_CM_CAP_CAP_ID_RAS))) {
dev_dbg(&port->dev, "Failed to map RAS capability\n");
+ return;
+ }
+
+ cxl_unmask_proto_interrupts(port->uport_dev);
}
void cxl_switch_port_init_ras(struct cxl_port *port)
@@ -466,6 +488,8 @@ void cxl_endpoint_port_init_ras(struct cxl_port *ep)
}
cxl_dport_init_ras_reporting(parent_dport, cxlmd->cxlds->dev);
+
+ cxl_unmask_proto_interrupts(cxlmd->cxlds->dev);
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_port_init_ras, "CXL");
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 0c4f73dd645f..090b52a26862 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1169,12 +1169,10 @@ static inline void cxl_rch_enable_rcec(struct pci_dev *rcec) { }
#endif
#ifdef CONFIG_CXL_RAS
-void pci_aer_unmask_internal_errors(struct pci_dev *dev);
bool is_internal_error(struct aer_err_info *info);
bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info);
void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info);
#else
-static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
static inline bool is_internal_error(struct aer_err_info *info) { return false; }
static inline bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info) { return false; }
static inline void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info) { }
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 751a026fea73..4e2fc55f2497 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -82,11 +82,13 @@ int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd);
void cxl_register_proto_err_work(struct work_struct *work);
void cxl_unregister_proto_err_work(void);
bool cxl_error_is_native(struct pci_dev *dev);
+void pci_aer_unmask_internal_errors(struct pci_dev *dev);
#else
static inline int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd) { return 0; }
static inline void cxl_register_proto_err_work(struct work_struct *work) { }
static inline void cxl_unregister_proto_err_work(void) { }
static inline bool cxl_error_is_native(struct pci_dev *dev) { return false; }
+static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
#endif
void pci_print_aer(struct pci_dev *dev, int aer_severity,
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v11 23/23] CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (21 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 22/23] CXL/PCI: Enable CXL protocol errors during CXL Port probe Terry Bowman
@ 2025-08-27 1:35 ` Terry Bowman
2025-08-29 0:07 ` [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Dave Jiang
23 siblings, 0 replies; 53+ messages in thread
From: Terry Bowman @ 2025-08-27 1:35 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
During CXL device cleanup the CXL PCIe Port device interrupts remain
enabled. This potentially allows unnecessary interrupt processing on
behalf of the CXL errors while the device is destroyed.
Disable CXL protocol errors by setting the CXL devices' AER mask register.
Introduce pci_aer_mask_internal_errors() similar to pci_aer_unmask_internal_errors().
Introduce cxl_mask_proto_interrupts() to call pci_aer_mask_internal_errors().
Add calls to cxl_mask_proto_interrupts() within CXL Port teardown for CXL
Root Ports, CXL Downstream Switch Ports, CXL Upstream Switch Ports, and CXL
Endpoints. Follow the same "bottom-up" approach used during CXL Port
teardown.
Implement cxl_mask_proto_interrupts() in a header file to avoid introducing
Kconfig ifdefs in cxl/core/port.c.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
Changes in v10->v11:
- Removed guard() cxl_mask_proto_interrupts(). RP was blocking during
testing. (Terry)
---
drivers/cxl/core/core.h | 2 ++
drivers/cxl/core/port.c | 6 ++++++
drivers/cxl/core/ras.c | 8 ++++++++
drivers/pci/pcie/cxl_aer.c | 21 +++++++++++++++++++++
include/linux/aer.h | 2 ++
5 files changed, 39 insertions(+)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 7e66fbb07b8a..385bfd38b778 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -157,6 +157,7 @@ void cxl_cor_error_detected(struct device *dev);
pci_ers_result_t cxl_error_detected(struct device *dev);
void cxl_port_cor_error_detected(struct device *dev);
pci_ers_result_t cxl_port_error_detected(struct device *dev);
+void cxl_mask_proto_interrupts(struct device *dev);
#else
static inline int cxl_ras_init(void)
{
@@ -186,6 +187,7 @@ static inline pci_ers_result_t cxl_port_error_detected(struct device *dev)
{
return PCI_ERS_RESULT_NONE;
}
+static inline void cxl_mask_proto_interrupts(struct device *dev) { }
#endif // CONFIG_CXL_RAS
int cxl_gpf_port_setup(struct cxl_dport *dport);
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 085c8620a797..bb326dc95d5f 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1428,6 +1428,9 @@ EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, "CXL");
*/
static void delete_switch_port(struct cxl_port *port)
{
+ cxl_mask_proto_interrupts(port->uport_dev);
+ cxl_mask_proto_interrupts(port->parent_dport->dport_dev);
+
devm_release_action(port->dev.parent, cxl_unlink_parent_dport, port);
devm_release_action(port->dev.parent, cxl_unlink_uport, port);
devm_release_action(port->dev.parent, unregister_port, port);
@@ -1441,6 +1444,7 @@ static void reap_dports(struct cxl_port *port)
device_lock_assert(&port->dev);
xa_for_each(&port->dports, index, dport) {
+ cxl_mask_proto_interrupts(dport->dport_dev);
devm_release_action(&port->dev, cxl_dport_unlink, dport);
devm_release_action(&port->dev, cxl_dport_remove, dport);
devm_kfree(&port->dev, dport);
@@ -1471,6 +1475,8 @@ static void cxl_detach_ep(void *data)
{
struct cxl_memdev *cxlmd = data;
+ cxl_mask_proto_interrupts(cxlmd->cxlds->dev);
+
for (int i = cxlmd->depth - 1; i >= 1; i--) {
struct cxl_port *port, *parent_port;
struct detach_ctx ctx = {
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 90ea0dfb942f..564144c2d23f 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -137,6 +137,14 @@ static void cxl_unmask_proto_interrupts(struct device *dev)
pci_aer_unmask_internal_errors(pdev);
}
+void cxl_mask_proto_interrupts(struct device *dev)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ pci_aer_mask_internal_errors(pdev);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mask_proto_interrupts, "CXL");
+
#ifdef CONFIG_CXL_RCH_RAS
static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
{
diff --git a/drivers/pci/pcie/cxl_aer.c b/drivers/pci/pcie/cxl_aer.c
index 6eeff0b78b47..2de2d9e7934b 100644
--- a/drivers/pci/pcie/cxl_aer.c
+++ b/drivers/pci/pcie/cxl_aer.c
@@ -46,6 +46,27 @@ void pci_aer_unmask_internal_errors(struct pci_dev *dev)
}
EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, "CXL");
+/**
+ * pci_aer_mask_internal_errors - mask internal errors
+ * @dev: pointer to the pcie_dev data structure
+ *
+ * Masks internal errors in the Uncorrectable and Correctable Error
+ * Mask registers.
+ *
+ * Note: AER must be enabled and supported by the device which must be
+ * checked in advance, e.g. with pcie_aer_is_native().
+ */
+void pci_aer_mask_internal_errors(struct pci_dev *dev)
+{
+ int aer = dev->aer_cap;
+
+ pci_clear_and_set_config_dword(dev, aer + PCI_ERR_UNCOR_MASK,
+ 0, PCI_ERR_UNC_INTN);
+ pci_clear_and_set_config_dword(dev, aer + PCI_ERR_COR_MASK,
+ 0, PCI_ERR_COR_INTERNAL);
+}
+EXPORT_SYMBOL_NS_GPL(pci_aer_mask_internal_errors, "CXL");
+
bool cxl_error_is_native(struct pci_dev *dev)
{
struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 4e2fc55f2497..82264221ad09 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -83,12 +83,14 @@ void cxl_register_proto_err_work(struct work_struct *work);
void cxl_unregister_proto_err_work(void);
bool cxl_error_is_native(struct pci_dev *dev);
void pci_aer_unmask_internal_errors(struct pci_dev *dev);
+void pci_aer_mask_internal_errors(struct pci_dev *dev);
#else
static inline int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd) { return 0; }
static inline void cxl_register_proto_err_work(struct work_struct *work) { }
static inline void cxl_unregister_proto_err_work(void) { }
static inline bool cxl_error_is_native(struct pci_dev *dev) { return false; }
static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
+static inline void pci_aer_mask_internal_errors(struct pci_dev *dev) { }
#endif
void pci_print_aer(struct pci_dev *dev, int aer_severity,
--
2.51.0.rc2.21.ge5ab6b3e5a
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v11 09/23] PCI/AER: Report CXL or PCIe bus error type in trace logging
2025-08-27 1:35 ` [PATCH v11 09/23] PCI/AER: Report CXL or PCIe bus error type in trace logging Terry Bowman
@ 2025-08-27 7:37 ` Lukas Wunner
0 siblings, 0 replies; 53+ messages in thread
From: Lukas Wunner @ 2025-08-27 7:37 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny,
linux-kernel, linux-pci
On Tue, Aug 26, 2025 at 08:35:24PM -0500, Terry Bowman wrote:
> The AER service driver and aer_event tracing currently log 'PCIe Bus Type'
> for all errors. Update the driver and aer_event tracing to log 'CXL Bus
> Type' for CXL device errors.
>
> This requires the AER can identify and distinguish between PCIe errors and
> CXL errors.
>
> Introduce boolean 'is_cxl' to 'struct aer_err_info'. Add assignment in
> aer_get_device_error_info() and pci_print_aer().
>
> Update the aer_event trace routine to accept a bus type string parameter.
aer_print_error() has a pointer to the struct pci_dev and you've added
an is_cxl bit to that struct in the preceding patch.
Is there a reason why you can't just use that dev->is_cxl bit, in lieu of
adding another is_cxl bit to struct aer_err_info?
If so, please document it in a code comment or at least in the commit
message. If there isn't, please use dev->is_cxl.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 16/23] cxl/pci: Introduce CXL Endpoint protocol error handlers
2025-08-27 1:35 ` [PATCH v11 16/23] cxl/pci: Introduce CXL Endpoint protocol error handlers Terry Bowman
@ 2025-08-27 7:48 ` Lukas Wunner
0 siblings, 0 replies; 53+ messages in thread
From: Lukas Wunner @ 2025-08-27 7:48 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny,
linux-kernel, linux-pci
On Tue, Aug 26, 2025 at 08:35:31PM -0500, Terry Bowman wrote:
> +++ b/include/linux/pci.h
> @@ -868,6 +868,9 @@ enum pci_ers_result {
>
> /* No AER capabilities registered for the driver */
> PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
> +
> + /* System is unstable, panic. Is CXL specific */
> + PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
> };
Please update Documentation/PCI/pci-error-recovery.rst as well with
this new return value.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 17/23] CXL/AER: Introduce cxl_aer.c into AER driver for forwarding CXL errors
2025-08-27 1:35 ` [PATCH v11 17/23] CXL/AER: Introduce cxl_aer.c into AER driver for forwarding CXL errors Terry Bowman
@ 2025-08-27 7:56 ` Lukas Wunner
0 siblings, 0 replies; 53+ messages in thread
From: Lukas Wunner @ 2025-08-27 7:56 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny,
linux-kernel, linux-pci
On Tue, Aug 26, 2025 at 08:35:32PM -0500, Terry Bowman wrote:
> +++ b/drivers/pci/pcie/aer.c
> @@ -1092,51 +1092,6 @@ static bool find_source_device(struct pci_dev *parent,
> return true;
> }
>
> -#ifdef CONFIG_CXL_RAS
> -
> -/**
> - * pci_aer_unmask_internal_errors - unmask internal errors
> - * @dev: pointer to the pci_dev data structure
> - *
> - * Unmask internal errors in the Uncorrectable and Correctable Error
> - * Mask registers.
> - *
> - * Note: AER must be enabled and supported by the device which must be
> - * checked in advance, e.g. with pcie_aer_is_native().
> - */
> -void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> -{
> - int aer = dev->aer_cap;
> - u32 mask;
> -
> - pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, &mask);
> - mask &= ~PCI_ERR_UNC_INTN;
> - pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, mask);
> -
> - pci_read_config_dword(dev, aer + PCI_ERR_COR_MASK, &mask);
> - mask &= ~PCI_ERR_COR_INTERNAL;
> - pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
> -}
> -EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, "CXL");
PCI device drivers may have a need to unmask internal errors if they know
that the device is capable of signaling them and their pci_error_handlers
can take care of them.
Hence I'm in favor of keeping pci_aer_unmask_internal_errors() as a helper
in drivers/pci/pcie/aer.c which PCI device drivers may call.
In particular, the "xe" driver would be a potential future user of this
helper.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 20/23] CXL/PCI: Export and rename merge_result() to pci_ers_merge_result()
2025-08-27 1:35 ` [PATCH v11 20/23] CXL/PCI: Export and rename merge_result() to pci_ers_merge_result() Terry Bowman
@ 2025-08-27 8:04 ` Lukas Wunner
2025-08-27 12:19 ` kernel test robot
1 sibling, 0 replies; 53+ messages in thread
From: Lukas Wunner @ 2025-08-27 8:04 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny,
linux-kernel, linux-pci
On Tue, Aug 26, 2025 at 08:35:35PM -0500, Terry Bowman wrote:
> +++ b/include/linux/pci.h
> @@ -2760,6 +2760,17 @@ static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev)
> void pci_uevent_ers(struct pci_dev *pdev, enum pci_ers_result err_type);
> #endif
>
> +#if defined(CONFIG_PCIEAER)
> +pci_ers_result_t pci_ers_merge_result(enum pci_ers_result orig,
> + enum pci_ers_result new);
> +#else
> +static inline pci_ers_result_t pci_ers_merge_result(enum pci_ers_result orig,
> + enum pci_ers_result new)
> +{
> + return PCI_ERS_RESULT_NONE;
> +}
> +#endif
> +
> #include <linux/dma-mapping.h>
>
> #define pci_emerg(pdev, fmt, arg...) dev_emerg(&(pdev)->dev, fmt, ##arg)
Would it be possible for you to just declare a local version of
pci_ers_merge_result() within drivers/cxl/ which is encapsulated by
"#ifndef CONFIG_PCIEAER"?
That would avoid the need to make this public in include/linux/pci.h.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 53+ messages in thread
* RE: [PATCH v11 13/23] cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports
2025-08-27 1:35 ` [PATCH v11 13/23] cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports Terry Bowman
@ 2025-08-27 11:55 ` Shiju Jose
2025-08-29 16:06 ` Jonathan Cameron
0 siblings, 1 reply; 53+ messages in thread
From: Shiju Jose @ 2025-08-27 11:55 UTC (permalink / raw)
To: Terry Bowman, dave@stgolabs.net, Jonathan Cameron,
dave.jiang@intel.com, alison.schofield@intel.com,
dan.j.williams@intel.com, bhelgaas@google.com,
ming.li@zohomail.com, Smita.KoralahalliChannabasappa@amd.com,
rrichter@amd.com, dan.carpenter@linaro.org,
PradeepVineshReddy.Kodamati@amd.com, lukas@wunner.de,
Benjamin.Cheatham@amd.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
linux-cxl@vger.kernel.org, alucerop@amd.com, ira.weiny@intel.com
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
>-----Original Message-----
>From: Terry Bowman <terry.bowman@amd.com>
>Sent: 27 August 2025 02:35
>To: dave@stgolabs.net; Jonathan Cameron <jonathan.cameron@huawei.com>;
>dave.jiang@intel.com; alison.schofield@intel.com; dan.j.williams@intel.com;
>bhelgaas@google.com; Shiju Jose <shiju.jose@huawei.com>;
>ming.li@zohomail.com; Smita.KoralahalliChannabasappa@amd.com;
>rrichter@amd.com; dan.carpenter@linaro.org;
>PradeepVineshReddy.Kodamati@amd.com; lukas@wunner.de;
>Benjamin.Cheatham@amd.com;
>sathyanarayanan.kuppuswamy@linux.intel.com; linux-cxl@vger.kernel.org;
>alucerop@amd.com; ira.weiny@intel.com
>Cc: linux-kernel@vger.kernel.org; linux-pci@vger.kernel.org
>Subject: [PATCH v11 13/23] cxl/pci: Unify CXL trace logging for CXL Endpoints
>and CXL Ports
>
>CXL currently has separate trace routines for CXL Port errors and CXL Endpoint
>errors. This is inconvenient for the user because they must enable
>2 sets of trace routines. Make updates to the trace logging such that a single
>trace routine logs both CXL Endpoint and CXL Port protocol errors.
>
>Keep the trace log fields 'memdev' and 'host'. While these are not accurate for
>non-Endpoints the fields will remain as-is to prevent breaking userspace RAS
>trace consumers.
>
>Add serial number parameter to the trace logging. This is used for EPs and 0 is
>provided for CXL port devices without a serial number.
>
>Leave the correctable and uncorrectable trace routines' TP_STRUCT__entry()
>unchanged with respect to member data types and order.
>
>Below is output of correctable and uncorrectable protocol error logging.
>CXL Root Port and CXL Endpoint examples are included below.
>
>Root Port:
>cxl_aer_correctable_error: memdev=0000:0c:00.0 host=pci0000:0c serial: 0
>status='CRC Threshold Hit'
>cxl_aer_uncorrectable_error: memdev=0000:0c:00.0 host=pci0000:0c serial: 0
>status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity
>Error'
>
>Endpoint:
>cxl_aer_correctable_error: memdev=mem3 host=0000:0f:00.0 serial=0
>status='CRC Threshold Hit'
>cxl_aer_uncorrectable_error: memdev=mem3 host=0000:0f:00.0 serial: 0 status:
>'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
>
>Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Shiju Jose <shiju.jose@huawei.com>,
apart from one error below.
>
>---
>Changes in v10->v11:
>- Updated CE and UCE trace routines to maintain consistent TP_Struct ABI and
>unchanged TP_printk() logging.
>---
> drivers/cxl/core/ras.c | 35 +++++++++++----------
> drivers/cxl/core/trace.h | 68 +++++++---------------------------------
> 2 files changed, 30 insertions(+), 73 deletions(-)
>
>diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c index
>3454cf1a118d..fda3b0a64dab 100644
>--- a/drivers/cxl/core/ras.c
>+++ b/drivers/cxl/core/ras.c
>@@ -13,7 +13,7 @@ static void cxl_cper_trace_corr_port_prot_err(struct
>pci_dev *pdev, {
> u32 status = ras_cap.cor_status & ~ras_cap.cor_mask;
>
>- trace_cxl_port_aer_correctable_error(&pdev->dev, status);
>+ trace_cxl_aer_correctable_error(&pdev->dev, status, 0);
> }
>
> static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev, @@ -
>28,8 +28,8 @@ static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev
>*pdev,
> else
> fe = status;
>
>- trace_cxl_port_aer_uncorrectable_error(&pdev->dev, status, fe,
>- ras_cap.header_log);
>+ trace_cxl_aer_uncorrectable_error(&pdev->dev, status, fe,
>+ ras_cap.header_log, 0);
> }
>
> static void cxl_cper_trace_corr_prot_err(struct cxl_memdev *cxlmd, @@ -37,7
>+37,8 @@ static void cxl_cper_trace_corr_prot_err(struct cxl_memdev *cxlmd,
>{
> u32 status = ras_cap.cor_status & ~ras_cap.cor_mask;
>
>- trace_cxl_aer_correctable_error(cxlmd, status);
>+ trace_cxl_aer_correctable_error(&cxlmd->dev, cxlmd->cxlds->serial,
>+ status);
Please correct to
trace_cxl_aer_correctable_error(&cxlmd->dev, status,
cxlmd->cxlds->serial);
> }
>
> static void
[...]
>-
> TRACE_EVENT(cxl_aer_correctable_error,
>- TP_PROTO(const struct cxl_memdev *cxlmd, u32 status),
>- TP_ARGS(cxlmd, status),
>+ TP_PROTO(const struct device *cxlmd, u32 status, u64 serial),
>+ TP_ARGS(cxlmd, status, serial),
> TP_STRUCT__entry(
>- __string(memdev, dev_name(&cxlmd->dev))
>- __string(host, dev_name(cxlmd->dev.parent))
>+ __string(memdev, dev_name(cxlmd))
>+ __string(host, dev_name(cxlmd->parent))
> __field(u64, serial)
> __field(u32, status)
> ),
> TP_fast_assign(
> __assign_str(memdev);
> __assign_str(host);
>- __entry->serial = cxlmd->cxlds->serial;
>+ __entry->serial = serial;
> __entry->status = status;
> ),
> TP_printk("memdev=%s host=%s serial=%lld: status: '%s'",
>--
>2.51.0.rc2.21.ge5ab6b3e5a
Thanks,
Shiju
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 20/23] CXL/PCI: Export and rename merge_result() to pci_ers_merge_result()
2025-08-27 1:35 ` [PATCH v11 20/23] CXL/PCI: Export and rename merge_result() to pci_ers_merge_result() Terry Bowman
2025-08-27 8:04 ` Lukas Wunner
@ 2025-08-27 12:19 ` kernel test robot
1 sibling, 0 replies; 53+ messages in thread
From: kernel test robot @ 2025-08-27 12:19 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: oe-kbuild-all, linux-kernel, linux-pci
Hi Terry,
kernel test robot noticed the following build errors:
[auto build test ERROR on f11a5f89910a7ae970fbce4fdc02d86a8ba8570f]
url: https://github.com/intel-lab-lkp/linux/commits/Terry-Bowman/cxl-Remove-ifdef-blocks-of-CONFIG_PCIEAER_CXL-from-core-pci-c/20250827-094257
base: f11a5f89910a7ae970fbce4fdc02d86a8ba8570f
patch link: https://lore.kernel.org/r/20250827013539.903682-21-terry.bowman%40amd.com
patch subject: [PATCH v11 20/23] CXL/PCI: Export and rename merge_result() to pci_ers_merge_result()
config: powerpc64-randconfig-002-20250827 (https://download.01.org/0day-ci/archive/20250827/202508271903.HTGgt8kV-lkp@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 11.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250827/202508271903.HTGgt8kV-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508271903.HTGgt8kV-lkp@intel.com/
All errors (new ones prefixed by >>):
>> arch/powerpc/kernel/eeh_driver.c:68:28: error: redefinition of 'pci_ers_merge_result'
68 | static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old,
| ^~~~~~~~~~~~~~~~~~~~
In file included from arch/powerpc/kernel/eeh_driver.c:13:
include/linux/pci.h:2767:32: note: previous definition of 'pci_ers_merge_result' with type 'pci_ers_result_t(enum pci_ers_result, enum pci_ers_result)' {aka 'unsigned int(enum pci_ers_result, enum pci_ers_result)'}
2767 | static inline pci_ers_result_t pci_ers_merge_result(enum pci_ers_result orig,
| ^~~~~~~~~~~~~~~~~~~~
vim +/pci_ers_merge_result +68 arch/powerpc/kernel/eeh_driver.c
20b344971433da Sam Bobroff 2018-05-25 67
30424e386a30d1 Sam Bobroff 2018-05-25 @68 static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old,
30424e386a30d1 Sam Bobroff 2018-05-25 69 enum pci_ers_result new)
30424e386a30d1 Sam Bobroff 2018-05-25 70 {
30424e386a30d1 Sam Bobroff 2018-05-25 71 if (eeh_result_priority(new) > eeh_result_priority(old))
30424e386a30d1 Sam Bobroff 2018-05-25 72 return new;
30424e386a30d1 Sam Bobroff 2018-05-25 73 return old;
30424e386a30d1 Sam Bobroff 2018-05-25 74 }
30424e386a30d1 Sam Bobroff 2018-05-25 75
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 07/23] CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
2025-08-27 1:35 ` [PATCH v11 07/23] CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h Terry Bowman
@ 2025-08-27 14:51 ` Lukas Wunner
2025-08-29 15:42 ` Jonathan Cameron
2025-08-29 15:47 ` Jonathan Cameron
2025-08-28 21:07 ` Dave Jiang
1 sibling, 2 replies; 53+ messages in thread
From: Lukas Wunner @ 2025-08-27 14:51 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny,
linux-kernel, linux-pci
On Tue, Aug 26, 2025 at 08:35:22PM -0500, Terry Bowman wrote:
> The CXL DVSECs are currently defined in cxl/core/cxlpci.h. These are not
> accessible to other subsystems.
>
> Change DVSEC name formatting to follow the existing PCI format in
> pci_regs.h. The current format uses CXL_DVSEC_XYZ. Change to be PCI_DVSEC_CXL_XYZ.
[...]
> +++ b/include/uapi/linux/pci_regs.h
> @@ -1225,9 +1225,61 @@
> /* Deprecated old name, replaced with PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE */
> #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE
>
> -/* Compute Express Link (CXL r3.1, sec 8.1.5) */
> -#define PCI_DVSEC_CXL_PORT 3
> -#define PCI_DVSEC_CXL_PORT_CTL 0x0c
> -#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
> +/* Compute Express Link (CXL r3.2, sec 8.1)
> + *
> + * Note that CXL DVSEC id 3 and 7 to be ignored when the CXL link state
> + * is "disconnected" (CXL r3.2, sec 9.12.3). Re-enumerate these
> + * registers on downstream link-up events.
> + */
> +
> +#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
> +
> +/* CXL 3.2 8.1.3: PCIe DVSEC for CXL Device */
> +#define PCI_DVSEC_CXL_DEVICE 0
> +#define PCI_DVSEC_CXL_CAP_OFFSET 0xA
> +#define PCI_DVSEC_CXL_MEM_CAPABLE BIT(2)
> +#define PCI_DVSEC_CXL_HDM_COUNT_MASK GENMASK(5, 4)
> +#define PCI_DVSEC_CXL_CTRL_OFFSET 0xC
> +#define PCI_DVSEC_CXL_MEM_ENABLE BIT(2)
> +#define PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
> +#define PCI_DVSEC_CXL_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
> +#define PCI_DVSEC_CXL_MEM_INFO_VALID BIT(0)
> +#define PCI_DVSEC_CXL_MEM_ACTIVE BIT(1)
> +#define PCI_DVSEC_CXL_MEM_SIZE_LOW_MASK GENMASK(31, 28)
> +#define PCI_DVSEC_CXL_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
> +#define PCI_DVSEC_CXL_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
> +#define PCI_DVSEC_CXL_MEM_BASE_LOW_MASK GENMASK(31, 28)
Is it legal to use BIT() in a uapi header?
We've only allowed use of GENMASK() in uapi headers since 2023 with
commit 3c7a8e190bc5 ("uapi: introduce uapi-friendly macros for GENMASK").
But there is no uapi header in the kernel tree defining BIT().
I note that include/uapi/cxl/features.h has plenty of occurrences of BIT()
since commit 9b8e73cdb141 ("cxl: Move cxl feature command structs to user
header"), which went into v6.15.
ndctl contains a bitmap.h header defining BIT(), I guess that's why this
wasn't perceived as a problem so far:
https://github.com/pmem/ndctl/raw/main/util/bitmap.h
However existing user space applications including <linux/pci_regs.h>
may not have a BIT() definition and I suspect your change above will
break the build of those applications.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 08/23] PCI/CXL: Introduce pcie_is_cxl()
2025-08-27 1:35 ` [PATCH v11 08/23] PCI/CXL: Introduce pcie_is_cxl() Terry Bowman
@ 2025-08-28 8:18 ` Alejandro Lucero Palau
0 siblings, 0 replies; 53+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-28 8:18 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, ira.weiny
Cc: linux-kernel, linux-pci
Hi Terry,
On 8/27/25 02:35, Terry Bowman wrote:
> CXL and AER drivers need the ability to identify CXL devices.
>
> Introduce set_pcie_cxl() with logic checking for CXL.mem or CXL.cache
> status in the CXL Flexbus DVSEC status register. The CXL Flexbus DVSEC
> presence is used because it is required for all the CXL PCIe devices.[1]
>
> Add boolean 'struct pci_dev::is_cxl' with the purpose to cache the CXL
> CXL.cache and CXl.mem status.
>
> In the case the device is an EP or USP, call set_pcie_cxl() on behalf of
> the parent downstream device. This will make certain the correct state
> is cached.
>
> Add function pcie_is_cxl() to return 'struct pci_dev::is_cxl'.
>
> [1] CXL 3.1 Spec, 8.1.1 PCIe Designated Vendor-Specific Extended
> Capability (DVSEC) ID Assignment, Table 8-2
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
With the changes for checking flexbus state:
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Just a minor thing below, something I do not fully understand but I
guess it was discussed/explained previously.
> ---
> Changes in v10->v11:
> - Amended set_pcie_cxl() to check for Upstream Port's and EP's parent
> downstream port by calling set_pcie_cxl(). (Dan)
> - Retitle patch: 'Add' -> 'Introduce'
> - Add check for CXL.mem and CXL.cache (Alejandro, Dan)
> ---
> drivers/pci/probe.c | 25 +++++++++++++++++++++++++
> include/linux/pci.h | 6 ++++++
> include/uapi/linux/pci_regs.h | 3 +++
> 3 files changed, 34 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 4b8693ec9e4c..b08cd0346136 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1691,6 +1691,29 @@ static void set_pcie_thunderbolt(struct pci_dev *dev)
> dev->is_thunderbolt = 1;
> }
>
> +static void set_pcie_cxl(struct pci_dev *dev)
> +{
> + struct pci_dev *parent;
> + u16 dvsec = pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
> + PCI_DVSEC_CXL_FLEXBUS_PORT);
> + if (dvsec) {
> + u16 cap;
> +
> + pci_read_config_word(dev, dvsec + PCI_DVSEC_CXL_FLEXBUS_STATUS_OFFSET, &cap);
> +
> + dev->is_cxl = FIELD_GET(PCI_DVSEC_CXL_FLEXBUS_STATUS_CACHE_MASK, cap) ||
> + FIELD_GET(PCI_DVSEC_CXL_FLEXBUS_STATUS_MEM_MASK, cap);
> + }
> +
> + if (!pci_is_pcie(dev) ||
> + !(pci_pcie_type(dev) == PCI_EXP_TYPE_ENDPOINT ||
> + pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM))
> + return;
> +
> + parent = pci_upstream_bridge(dev);
> + set_pcie_cxl(parent);
This recursion is confusing to me.
Is it not the parent already having this set from its own pci setup? Or
maybe do we expect that to change after a reset and this is a sanity check?
> +}
> +
> static void set_pcie_untrusted(struct pci_dev *dev)
> {
> struct pci_dev *parent = pci_upstream_bridge(dev);
> @@ -2021,6 +2044,8 @@ int pci_setup_device(struct pci_dev *dev)
> /* Need to have dev->cfg_size ready */
> set_pcie_thunderbolt(dev);
>
> + set_pcie_cxl(dev);
> +
> set_pcie_untrusted(dev);
>
> if (pci_is_pcie(dev))
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 05e68f35f392..79878243b681 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -453,6 +453,7 @@ struct pci_dev {
> unsigned int is_hotplug_bridge:1;
> unsigned int shpc_managed:1; /* SHPC owned by shpchp */
> unsigned int is_thunderbolt:1; /* Thunderbolt controller */
> + unsigned int is_cxl:1; /* Compute Express Link (CXL) */
> /*
> * Devices marked being untrusted are the ones that can potentially
> * execute DMA attacks and similar. They are typically connected
> @@ -744,6 +745,11 @@ static inline bool pci_is_vga(struct pci_dev *pdev)
> return false;
> }
>
> +static inline bool pcie_is_cxl(struct pci_dev *pci_dev)
> +{
> + return pci_dev->is_cxl;
> +}
> +
> #define for_each_pci_bridge(dev, bus) \
> list_for_each_entry(dev, &bus->devices, bus_list) \
> if (!pci_is_bridge(dev)) {} else
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index b03244d55aea..252c06402b13 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -1274,6 +1274,9 @@
>
> /* CXL 3.2 8.1.8: PCIe DVSEC for Flex Bus Port */
> #define PCI_DVSEC_CXL_FLEXBUS_PORT 7
> +#define PCI_DVSEC_CXL_FLEXBUS_STATUS_OFFSET 0xE
> +#define PCI_DVSEC_CXL_FLEXBUS_STATUS_CACHE_MASK BIT(0)
> +#define PCI_DVSEC_CXL_FLEXBUS_STATUS_MEM_MASK BIT(2)
>
> /* CXL 3.2 8.1.9: Register Locator DVSEC */
> #define PCI_DVSEC_CXL_REG_LOCATOR 8
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 04/23] cxl/pci: Remove unnecessary CXL RCH handling helper functions
2025-08-27 1:35 ` [PATCH v11 04/23] cxl/pci: Remove unnecessary CXL RCH " Terry Bowman
@ 2025-08-28 8:35 ` Alejandro Lucero Palau
2025-08-28 17:32 ` Dave Jiang
1 sibling, 0 replies; 53+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-28 8:35 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, ira.weiny
Cc: linux-kernel, linux-pci
On 8/27/25 02:35, Terry Bowman wrote:
> cxl_handle_rdport_cor_ras() and cxl_handle_rdport_ras() are specific
> to Restricted CXL Host (RCH) handling. Improve readability and
> maintainability by replacing these and instead using the common
> cxl_handle_cor_ras() and cxl_handle_ras() functions.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Good and simple.
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> ---
>
> Changes in v10->v11:
> - New patch
> ---
> drivers/cxl/core/ras.c | 16 ++--------------
> 1 file changed, 2 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 544a0d8773fa..0875ce8116ff 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -233,12 +233,6 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
> }
> }
>
> -static void cxl_handle_rdport_cor_ras(struct cxl_dev_state *cxlds,
> - struct cxl_dport *dport)
> -{
> - return cxl_handle_cor_ras(cxlds, dport->regs.ras);
> -}
> -
> /*
> * Log the state of the RAS status registers and prepare them to log the
> * next error status. Return 1 if reset needed.
> @@ -276,12 +270,6 @@ static bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
> return true;
> }
>
> -static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds,
> - struct cxl_dport *dport)
> -{
> - return cxl_handle_ras(cxlds, dport->regs.ras);
> -}
> -
> /*
> * Copy the AER capability registers using 32 bit read accesses.
> * This is necessary because RCRB AER capability is MMIO mapped. Clear the
> @@ -350,9 +338,9 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
>
> pci_print_aer(pdev, severity, &aer_regs);
> if (severity == AER_CORRECTABLE)
> - cxl_handle_rdport_cor_ras(cxlds, dport);
> + cxl_handle_cor_ras(cxlds, dport->regs.ras);
> else
> - cxl_handle_rdport_ras(cxlds, dport);
> + cxl_handle_ras(cxlds, dport->regs.ras);
> }
>
> void cxl_cor_error_detected(struct pci_dev *pdev)
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 05/23] cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS conditional block
2025-08-27 1:35 ` [PATCH v11 05/23] cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS conditional block Terry Bowman
@ 2025-08-28 8:57 ` Alejandro Lucero Palau
2025-08-29 15:33 ` Jonathan Cameron
1 sibling, 0 replies; 53+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-28 8:57 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, ira.weiny
Cc: linux-kernel, linux-pci
On 8/27/25 02:35, Terry Bowman wrote:
> Restricted CXL Host (RCH) protocol error handling uses a procedure distinct
> from the CXL Virtual Hierarchy (VH) handling. This is because of the
> differences in the RCH and VH topologies. Improve the maintainability and
> add ability to enable/disable RCH handling.
>
> Move and combine the RCH handling code into a single block conditionally
> compiled with the CONFIG_CXL_RCH_RAS kernel config.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>
> ---
> v10->v11:
> - New patch
> ---
> drivers/cxl/core/ras.c | 175 +++++++++++++++++++++--------------------
> 1 file changed, 90 insertions(+), 85 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 0875ce8116ff..f42f9a255ef8 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -126,6 +126,7 @@ void cxl_ras_exit(void)
> cancel_work_sync(&cxl_cper_prot_err_work);
> }
>
> +#ifdef CONFIG_CXL_RCH_RAS
You are introducing CONFIG_CXL_RCH_RAS in the next patch. Is it correct
to use it here?
> static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
> {
> resource_size_t aer_phys;
> @@ -141,18 +142,6 @@ static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
> }
> }
>
> -static void cxl_dport_map_ras(struct cxl_dport *dport)
> -{
> - struct cxl_register_map *map = &dport->reg_map;
> - struct device *dev = dport->dport_dev;
> -
> - if (!map->component_map.ras.valid)
> - dev_dbg(dev, "RAS registers not found\n");
> - else if (cxl_map_component_regs(map, &dport->regs.component,
> - BIT(CXL_CM_CAP_CAP_ID_RAS)))
> - dev_dbg(dev, "Failed to map RAS capability.\n");
> -}
> -
> static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
> {
> void __iomem *aer_base = dport->regs.dport_aer;
> @@ -177,6 +166,95 @@ static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
> writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
> }
>
> +/*
> + * Copy the AER capability registers using 32 bit read accesses.
> + * This is necessary because RCRB AER capability is MMIO mapped. Clear the
> + * status after copying.
> + *
> + * @aer_base: base address of AER capability block in RCRB
> + * @aer_regs: destination for copying AER capability
> + */
> +static bool cxl_rch_get_aer_info(void __iomem *aer_base,
> + struct aer_capability_regs *aer_regs)
> +{
> + int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32);
> + u32 *aer_regs_buf = (u32 *)aer_regs;
> + int n;
> +
> + if (!aer_base)
> + return false;
> +
> + /* Use readl() to guarantee 32-bit accesses */
> + for (n = 0; n < read_cnt; n++)
> + aer_regs_buf[n] = readl(aer_base + n * sizeof(u32));
> +
> + writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS);
> + writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS);
> +
> + return true;
> +}
> +
> +/* Get AER severity. Return false if there is no error. */
> +static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
> + int *severity)
> +{
> + if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
> + if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
> + *severity = AER_FATAL;
> + else
> + *severity = AER_NONFATAL;
> + return true;
> + }
> +
> + if (aer_regs->cor_status & ~aer_regs->cor_mask) {
> + *severity = AER_CORRECTABLE;
> + return true;
> + }
> +
> + return false;
> +}
> +
> +static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
> +{
> + struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> + struct aer_capability_regs aer_regs;
> + struct cxl_dport *dport;
> + int severity;
> +
> + struct cxl_port *port __free(put_cxl_port) =
> + cxl_pci_find_port(pdev, &dport);
> + if (!port)
> + return;
> +
> + if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
> + return;
> +
> + if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
> + return;
> +
> + pci_print_aer(pdev, severity, &aer_regs);
> + if (severity == AER_CORRECTABLE)
> + cxl_handle_cor_ras(cxlds, dport->regs.ras);
> + else
> + cxl_handle_ras(cxlds, dport->regs.ras);
> +}
> +#else
> +static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
> +static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
> +static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
> +#endif
> +
> +static void cxl_dport_map_ras(struct cxl_dport *dport)
> +{
> + struct cxl_register_map *map = &dport->reg_map;
> + struct device *dev = dport->dport_dev;
> +
> + if (!map->component_map.ras.valid)
> + dev_dbg(dev, "RAS registers not found\n");
> + else if (cxl_map_component_regs(map, &dport->regs.component,
> + BIT(CXL_CM_CAP_CAP_ID_RAS)))
> + dev_dbg(dev, "Failed to map RAS capability.\n");
> +}
>
> /**
> * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
> @@ -270,79 +348,6 @@ static bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
> return true;
> }
>
> -/*
> - * Copy the AER capability registers using 32 bit read accesses.
> - * This is necessary because RCRB AER capability is MMIO mapped. Clear the
> - * status after copying.
> - *
> - * @aer_base: base address of AER capability block in RCRB
> - * @aer_regs: destination for copying AER capability
> - */
> -static bool cxl_rch_get_aer_info(void __iomem *aer_base,
> - struct aer_capability_regs *aer_regs)
> -{
> - int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32);
> - u32 *aer_regs_buf = (u32 *)aer_regs;
> - int n;
> -
> - if (!aer_base)
> - return false;
> -
> - /* Use readl() to guarantee 32-bit accesses */
> - for (n = 0; n < read_cnt; n++)
> - aer_regs_buf[n] = readl(aer_base + n * sizeof(u32));
> -
> - writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS);
> - writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS);
> -
> - return true;
> -}
> -
> -/* Get AER severity. Return false if there is no error. */
> -static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
> - int *severity)
> -{
> - if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
> - if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
> - *severity = AER_FATAL;
> - else
> - *severity = AER_NONFATAL;
> - return true;
> - }
> -
> - if (aer_regs->cor_status & ~aer_regs->cor_mask) {
> - *severity = AER_CORRECTABLE;
> - return true;
> - }
> -
> - return false;
> -}
> -
> -static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
> -{
> - struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> - struct aer_capability_regs aer_regs;
> - struct cxl_dport *dport;
> - int severity;
> -
> - struct cxl_port *port __free(put_cxl_port) =
> - cxl_pci_find_port(pdev, &dport);
> - if (!port)
> - return;
> -
> - if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
> - return;
> -
> - if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
> - return;
> -
> - pci_print_aer(pdev, severity, &aer_regs);
> - if (severity == AER_CORRECTABLE)
> - cxl_handle_cor_ras(cxlds, dport->regs.ras);
> - else
> - cxl_handle_ras(cxlds, dport->regs.ras);
> -}
> -
> void cxl_cor_error_detected(struct pci_dev *pdev)
> {
> struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 03/23] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions
2025-08-27 1:35 ` [PATCH v11 03/23] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions Terry Bowman
@ 2025-08-28 15:28 ` Dave Jiang
0 siblings, 0 replies; 53+ messages in thread
From: Dave Jiang @ 2025-08-28 15:28 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
On 8/26/25 6:35 PM, Terry Bowman wrote:
> The CXL driver's cxl_handle_endpoint_cor_ras()/cxl_handle_endpoint_ras()
> are unnecessary helper functions used only for Endpoints. Remove these
> functions as they are not common for all CXL devices and do not provide
> value for EP handling.
>
> Rename __cxl_handle_ras to cxl_handle_ras() and __cxl_handle_cor_ras()
> to cxl_handle_cor_ras().
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> ---
>
> Changes in v10->v11:
> - None
> ---
> drivers/cxl/core/ras.c | 22 ++++++----------------
> 1 file changed, 6 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index c4f0fa7e40aa..544a0d8773fa 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -200,7 +200,7 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
>
> -static void __cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
> +static void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
> {
> void __iomem *addr;
> u32 status;
> @@ -236,14 +236,14 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
> static void cxl_handle_rdport_cor_ras(struct cxl_dev_state *cxlds,
> struct cxl_dport *dport)
> {
> - return __cxl_handle_cor_ras(cxlds, dport->regs.ras);
> + return cxl_handle_cor_ras(cxlds, dport->regs.ras);
> }
>
> /*
> * Log the state of the RAS status registers and prepare them to log the
> * next error status. Return 1 if reset needed.
> */
> -static bool __cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
> +static bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
> {
> u32 hl[CXL_HEADERLOG_SIZE_U32];
> void __iomem *addr;
> @@ -279,7 +279,7 @@ static bool __cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base
> static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds,
> struct cxl_dport *dport)
> {
> - return __cxl_handle_ras(cxlds, dport->regs.ras);
> + return cxl_handle_ras(cxlds, dport->regs.ras);
> }
>
> /*
> @@ -355,16 +355,6 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
> cxl_handle_rdport_ras(cxlds, dport);
> }
>
> -static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds)
> -{
> - return __cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
> -}
> -
> -static bool cxl_handle_endpoint_ras(struct cxl_dev_state *cxlds)
> -{
> - return __cxl_handle_ras(cxlds, cxlds->regs.ras);
> -}
> -
> void cxl_cor_error_detected(struct pci_dev *pdev)
> {
> struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> @@ -381,7 +371,7 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
> if (cxlds->rcd)
> cxl_handle_rdport_errors(cxlds);
>
> - cxl_handle_endpoint_cor_ras(cxlds);
> + cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
> }
> }
> EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
> @@ -410,7 +400,7 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> * chance the situation is recoverable dump the status of the RAS
> * capability registers and bounce the active state of the memdev.
> */
> - ue = cxl_handle_endpoint_ras(cxlds);
> + ue = cxl_handle_ras(cxlds, cxlds->regs.ras);
> }
>
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 04/23] cxl/pci: Remove unnecessary CXL RCH handling helper functions
2025-08-27 1:35 ` [PATCH v11 04/23] cxl/pci: Remove unnecessary CXL RCH " Terry Bowman
2025-08-28 8:35 ` Alejandro Lucero Palau
@ 2025-08-28 17:32 ` Dave Jiang
1 sibling, 0 replies; 53+ messages in thread
From: Dave Jiang @ 2025-08-28 17:32 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
On 8/26/25 6:35 PM, Terry Bowman wrote:
> cxl_handle_rdport_cor_ras() and cxl_handle_rdport_ras() are specific
> to Restricted CXL Host (RCH) handling. Improve readability and
> maintainability by replacing these and instead using the common
> cxl_handle_cor_ras() and cxl_handle_ras() functions.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> ---
>
> Changes in v10->v11:
> - New patch
> ---
> drivers/cxl/core/ras.c | 16 ++--------------
> 1 file changed, 2 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 544a0d8773fa..0875ce8116ff 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -233,12 +233,6 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
> }
> }
>
> -static void cxl_handle_rdport_cor_ras(struct cxl_dev_state *cxlds,
> - struct cxl_dport *dport)
> -{
> - return cxl_handle_cor_ras(cxlds, dport->regs.ras);
> -}
> -
> /*
> * Log the state of the RAS status registers and prepare them to log the
> * next error status. Return 1 if reset needed.
> @@ -276,12 +270,6 @@ static bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
> return true;
> }
>
> -static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds,
> - struct cxl_dport *dport)
> -{
> - return cxl_handle_ras(cxlds, dport->regs.ras);
> -}
> -
> /*
> * Copy the AER capability registers using 32 bit read accesses.
> * This is necessary because RCRB AER capability is MMIO mapped. Clear the
> @@ -350,9 +338,9 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
>
> pci_print_aer(pdev, severity, &aer_regs);
> if (severity == AER_CORRECTABLE)
> - cxl_handle_rdport_cor_ras(cxlds, dport);
> + cxl_handle_cor_ras(cxlds, dport->regs.ras);
> else
> - cxl_handle_rdport_ras(cxlds, dport);
> + cxl_handle_ras(cxlds, dport->regs.ras);
> }
>
> void cxl_cor_error_detected(struct pci_dev *pdev)
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 06/23] CXL/AER: Introduce rch_aer.c into AER driver for handling CXL RCH errors
2025-08-27 1:35 ` [PATCH v11 06/23] CXL/AER: Introduce rch_aer.c into AER driver for handling CXL RCH errors Terry Bowman
@ 2025-08-28 20:53 ` Dave Jiang
2025-08-29 8:39 ` Lukas Wunner
0 siblings, 1 reply; 53+ messages in thread
From: Dave Jiang @ 2025-08-28 20:53 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
On 8/26/25 6:35 PM, Terry Bowman wrote:
> The restricted CXL Host (RCH) AER error handling logic currently resides
> in the AER driver file, drivers/pci/pcie/aer.c. CXL specific changes are
> conditionally compiled using #ifdefs.
>
> Improve the AER driver maintainability by separating the RCH specific logic
> from the AER driver's core functionality and removing the ifdefs. Introduce
> drivers/pci/pcie/rch_aer.c for moving the RCH AER logic into.
>
> Move the CXL logic into the new file but leave helper functions in aer.c
> for now as they will be moved in future patch for CXL virtual hierarchy
> handling.
>
> 2 changes are required to maintain compilation after the move. Change
> cxl_rch_handle_error() & cxl_rch_enable_rcec() to be non-static inorder for
> accessing from the AER driver in aer.c.
>
> Introduce CONFIG_CXL_RCH_RAS in cxl/Kconfig. Update pcie/pcie/Makefile to
> conditionally compile rch_aer.c file using CONFIG_CXL_RCH_RAS.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>
> ---
> Changes in v10->v11:
> - Remove changes in code-split and move to earlier, new patch
> - Add #include <linux/bitfield.h> to cxl_ras.c
> - Move cxl_rch_handle_error() & cxl_rch_enable_rcec() declarations from pci.h
> to aer.h, more localized.
> - Introduce CONFIG_CXL_RCH_RAS, includes Makefile changes, ras.c ifdef changes
> ---
> drivers/cxl/Kconfig | 9 +++-
> drivers/cxl/core/ras.c | 3 ++
> drivers/pci/pci.h | 20 +++++++
> drivers/pci/pcie/Makefile | 1 +
> drivers/pci/pcie/aer.c | 108 +++----------------------------------
> drivers/pci/pcie/rch_aer.c | 99 ++++++++++++++++++++++++++++++++++
I wonder if this should be cxl_rch_aer.c to be clear that it's cxl related code.
DJ
> 6 files changed, 138 insertions(+), 102 deletions(-)
> create mode 100644 drivers/pci/pcie/rch_aer.c
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 1c7c8989fd8b..028201e24523 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -235,5 +235,12 @@ config CXL_MCE
>
> config CXL_RAS
> def_bool y
> - depends on ACPI_APEI_GHES && PCIEAER_CXL
> + depends on ACPI_APEI_GHES && PCIEAER && CXL_PCI
> +
> +config CXL_RCH_RAS
> + bool "CXL: Restricted CXL Host (RCH) protocol error handling"
> + def_bool n
> + depends on CXL_RAS
> + help
> + RAS support for Restricted CXL Host (RCH) defined in CXL1.1.
> endif
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index f42f9a255ef8..c9f2f0335bfd 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -126,6 +126,9 @@ void cxl_ras_exit(void)
> cancel_work_sync(&cxl_cper_prot_err_work);
> }
>
> +static bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
> +static void cxl_handle_cor_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base);
> +
> #ifdef CONFIG_CXL_RCH_RAS
> static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
> {
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 12215ee72afb..c8a0c0ec0073 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -1159,4 +1159,24 @@ static inline int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int inde
> (PCI_CONF1_ADDRESS(bus, dev, func, reg) | \
> PCI_CONF1_EXT_REG(reg))
>
> +struct aer_err_info;
> +
> +#ifdef CONFIG_CXL_RCH_RAS
> +void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info);
> +void cxl_rch_enable_rcec(struct pci_dev *rcec);
> +#else
> +static inline void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info) { }
> +static inline void cxl_rch_enable_rcec(struct pci_dev *rcec) { }
> +#endif
> +
> +#ifdef CONFIG_CXL_RAS
> +void pci_aer_unmask_internal_errors(struct pci_dev *dev);
> +bool cxl_error_is_native(struct pci_dev *dev);
> +bool is_internal_error(struct aer_err_info *info);
> +#else
> +static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
> +static inline bool cxl_error_is_native(struct pci_dev *dev) { return false; }
> +static inline bool is_internal_error(struct aer_err_info *info) { return false; }
> +#endif
> +
> #endif /* DRIVERS_PCI_H */
> diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
> index 173829aa02e6..07c299dbcdd7 100644
> --- a/drivers/pci/pcie/Makefile
> +++ b/drivers/pci/pcie/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_PCIEPORTBUS) += pcieportdrv.o bwctrl.o
>
> obj-y += aspm.o
> obj-$(CONFIG_PCIEAER) += aer.o err.o tlp.o
> +obj-$(CONFIG_CXL_RCH_RAS) += rch_aer.o
> obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o
> obj-$(CONFIG_PCIE_PME) += pme.o
> obj-$(CONFIG_PCIE_DPC) += dpc.o
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 7fe9f883f5c5..29de7ee861f7 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1098,7 +1098,7 @@ static bool find_source_device(struct pci_dev *parent,
> * Note: AER must be enabled and supported by the device which must be
> * checked in advance, e.g. with pcie_aer_is_native().
> */
> -static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> +void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> {
> int aer = dev->aer_cap;
> u32 mask;
> @@ -1111,119 +1111,25 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> mask &= ~PCI_ERR_COR_INTERNAL;
> pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
> }
> +EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, "CXL");
>
> -static bool is_cxl_mem_dev(struct pci_dev *dev)
> -{
> - /*
> - * The capability, status, and control fields in Device 0,
> - * Function 0 DVSEC control the CXL functionality of the
> - * entire device (CXL 3.0, 8.1.3).
> - */
> - if (dev->devfn != PCI_DEVFN(0, 0))
> - return false;
> -
> - /*
> - * CXL Memory Devices must have the 502h class code set (CXL
> - * 3.0, 8.1.12.1).
> - */
> - if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> - return false;
> -
> - return true;
> -}
> -
> -static bool cxl_error_is_native(struct pci_dev *dev)
> +bool cxl_error_is_native(struct pci_dev *dev)
> {
> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
>
> return (pcie_ports_native || host->native_aer);
> }
> +EXPORT_SYMBOL_NS_GPL(cxl_error_is_native, "CXL");
>
> -static bool is_internal_error(struct aer_err_info *info)
> +bool is_internal_error(struct aer_err_info *info)
> {
> if (info->severity == AER_CORRECTABLE)
> return info->status & PCI_ERR_COR_INTERNAL;
>
> return info->status & PCI_ERR_UNC_INTN;
> }
> -
> -static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
> -{
> - struct aer_err_info *info = (struct aer_err_info *)data;
> - const struct pci_error_handlers *err_handler;
> -
> - if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
> - return 0;
> -
> - /* Protect dev->driver */
> - device_lock(&dev->dev);
> -
> - err_handler = dev->driver ? dev->driver->err_handler : NULL;
> - if (!err_handler)
> - goto out;
> -
> - if (info->severity == AER_CORRECTABLE) {
> - if (err_handler->cor_error_detected)
> - err_handler->cor_error_detected(dev);
> - } else if (err_handler->error_detected) {
> - if (info->severity == AER_NONFATAL)
> - err_handler->error_detected(dev, pci_channel_io_normal);
> - else if (info->severity == AER_FATAL)
> - err_handler->error_detected(dev, pci_channel_io_frozen);
> - }
> -out:
> - device_unlock(&dev->dev);
> - return 0;
> -}
> -
> -static void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> -{
> - /*
> - * Internal errors of an RCEC indicate an AER error in an
> - * RCH's downstream port. Check and handle them in the CXL.mem
> - * device driver.
> - */
> - if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> - is_internal_error(info))
> - pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
> -}
> -
> -static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
> -{
> - bool *handles_cxl = data;
> -
> - if (!*handles_cxl)
> - *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev);
> -
> - /* Non-zero terminates iteration */
> - return *handles_cxl;
> -}
> -
> -static bool handles_cxl_errors(struct pci_dev *rcec)
> -{
> - bool handles_cxl = false;
> -
> - if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC &&
> - pcie_aer_is_native(rcec))
> - pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl);
> -
> - return handles_cxl;
> -}
> -
> -static void cxl_rch_enable_rcec(struct pci_dev *rcec)
> -{
> - if (!handles_cxl_errors(rcec))
> - return;
> -
> - pci_aer_unmask_internal_errors(rcec);
> - pci_info(rcec, "CXL: Internal errors unmasked");
> -}
> -
> -#else
> -static inline void cxl_rch_enable_rcec(struct pci_dev *dev) { }
> -static inline void cxl_rch_handle_error(struct pci_dev *dev,
> - struct aer_err_info *info) { }
> -#endif
> +EXPORT_SYMBOL_NS_GPL(is_internal_error, "CXL");
> +#endif /* CONFIG_CXL_RAS */
>
> /**
> * pci_aer_handle_error - handle logging error into an event log
> diff --git a/drivers/pci/pcie/rch_aer.c b/drivers/pci/pcie/rch_aer.c
> new file mode 100644
> index 000000000000..bfe071eebf67
> --- /dev/null
> +++ b/drivers/pci/pcie/rch_aer.c
> @@ -0,0 +1,99 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2025 AMD Corporation. All rights reserved. */
> +
> +#include <linux/pci.h>
> +#include <linux/aer.h>
> +#include <linux/bitfield.h>
> +#include "../pci.h"
> +
> +static bool is_cxl_mem_dev(struct pci_dev *dev)
> +{
> + /*
> + * The capability, status, and control fields in Device 0,
> + * Function 0 DVSEC control the CXL functionality of the
> + * entire device (CXL 3.0, 8.1.3).
> + */
> + if (dev->devfn != PCI_DEVFN(0, 0))
> + return false;
> +
> + /*
> + * CXL Memory Devices must have the 502h class code set (CXL
> + * 3.0, 8.1.12.1).
> + */
> + if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> + return false;
> +
> + return true;
> +}
> +
> +static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
> +{
> + struct aer_err_info *info = (struct aer_err_info *)data;
> + const struct pci_error_handlers *err_handler;
> +
> + if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
> + return 0;
> +
> + /* Protect dev->driver */
> + device_lock(&dev->dev);
> +
> + err_handler = dev->driver ? dev->driver->err_handler : NULL;
> + if (!err_handler)
> + goto out;
> +
> + if (info->severity == AER_CORRECTABLE) {
> + if (err_handler->cor_error_detected)
> + err_handler->cor_error_detected(dev);
> + } else if (err_handler->error_detected) {
> + if (info->severity == AER_NONFATAL)
> + err_handler->error_detected(dev, pci_channel_io_normal);
> + else if (info->severity == AER_FATAL)
> + err_handler->error_detected(dev, pci_channel_io_frozen);
> + }
> +out:
> + device_unlock(&dev->dev);
> + return 0;
> +}
> +
> +void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> +{
> + /*
> + * Internal errors of an RCEC indicate an AER error in an
> + * RCH's downstream port. Check and handle them in the CXL.mem
> + * device driver.
> + */
> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> + is_internal_error(info))
> + pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
> +}
> +
> +static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
> +{
> + bool *handles_cxl = data;
> +
> + if (!*handles_cxl)
> + *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev);
> +
> + /* Non-zero terminates iteration */
> + return *handles_cxl;
> +}
> +
> +static bool handles_cxl_errors(struct pci_dev *rcec)
> +{
> + bool handles_cxl = false;
> +
> + if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC &&
> + pcie_aer_is_native(rcec))
> + pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl);
> +
> + return handles_cxl;
> +}
> +
> +void cxl_rch_enable_rcec(struct pci_dev *rcec)
> +{
> + if (!handles_cxl_errors(rcec))
> + return;
> +
> + pci_aer_unmask_internal_errors(rcec);
> + pci_info(rcec, "CXL: Internal errors unmasked");
> +}
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 07/23] CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
2025-08-27 1:35 ` [PATCH v11 07/23] CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h Terry Bowman
2025-08-27 14:51 ` Lukas Wunner
@ 2025-08-28 21:07 ` Dave Jiang
1 sibling, 0 replies; 53+ messages in thread
From: Dave Jiang @ 2025-08-28 21:07 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
On 8/26/25 6:35 PM, Terry Bowman wrote:
> The CXL DVSECs are currently defined in cxl/core/cxlpci.h. These are not
> accessible to other subsystems.
>
> Change DVSEC name formatting to follow the existing PCI format in
> pci_regs.h. The current format uses CXL_DVSEC_XYZ. Change to be PCI_DVSEC_CXL_XYZ.
I don't think renaming is necessary. Especially changing all the existing CXL code. CXL isn't exactly considered a subset of PCI (not part of PCI consortium). IMO it may be better to leave it as it was. Maybe others have different opinions.
DJ
>
> Update existing occurrences to match the name change.
>
> Update the inline documentation to refer to latest CXL spec version.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
> drivers/cxl/core/pci.c | 62 +++++++++++++++++------------------
> drivers/cxl/core/regs.c | 12 +++----
> drivers/cxl/cxlpci.h | 53 ------------------------------
> drivers/cxl/pci.c | 2 +-
> drivers/pci/pci.c | 18 +++++-----
> include/uapi/linux/pci_regs.h | 60 ++++++++++++++++++++++++++++++---
> 6 files changed, 104 insertions(+), 103 deletions(-)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index a3aef78f903a..d677691f8a05 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -110,19 +110,19 @@ static int cxl_dvsec_mem_range_valid(struct cxl_dev_state *cxlds, int id)
> int rc, i;
> u32 temp;
>
> - if (id > CXL_DVSEC_RANGE_MAX)
> + if (id > PCI_DVSEC_CXL_RANGE_MAX)
> return -EINVAL;
>
> /* Check MEM INFO VALID bit first, give up after 1s */
> i = 1;
> do {
> rc = pci_read_config_dword(pdev,
> - d + CXL_DVSEC_RANGE_SIZE_LOW(id),
> + d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id),
> &temp);
> if (rc)
> return rc;
>
> - valid = FIELD_GET(CXL_DVSEC_MEM_INFO_VALID, temp);
> + valid = FIELD_GET(PCI_DVSEC_CXL_MEM_INFO_VALID, temp);
> if (valid)
> break;
> msleep(1000);
> @@ -146,17 +146,17 @@ static int cxl_dvsec_mem_range_active(struct cxl_dev_state *cxlds, int id)
> int rc, i;
> u32 temp;
>
> - if (id > CXL_DVSEC_RANGE_MAX)
> + if (id > PCI_DVSEC_CXL_RANGE_MAX)
> return -EINVAL;
>
> /* Check MEM ACTIVE bit, up to 60s timeout by default */
> for (i = media_ready_timeout; i; i--) {
> rc = pci_read_config_dword(
> - pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(id), &temp);
> + pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id), &temp);
> if (rc)
> return rc;
>
> - active = FIELD_GET(CXL_DVSEC_MEM_ACTIVE, temp);
> + active = FIELD_GET(PCI_DVSEC_CXL_MEM_ACTIVE, temp);
> if (active)
> break;
> msleep(1000);
> @@ -185,11 +185,11 @@ int cxl_await_media_ready(struct cxl_dev_state *cxlds)
> u16 cap;
>
> rc = pci_read_config_word(pdev,
> - d + CXL_DVSEC_CAP_OFFSET, &cap);
> + d + PCI_DVSEC_CXL_CAP_OFFSET, &cap);
> if (rc)
> return rc;
>
> - hdm_count = FIELD_GET(CXL_DVSEC_HDM_COUNT_MASK, cap);
> + hdm_count = FIELD_GET(PCI_DVSEC_CXL_HDM_COUNT_MASK, cap);
> for (i = 0; i < hdm_count; i++) {
> rc = cxl_dvsec_mem_range_valid(cxlds, i);
> if (rc)
> @@ -217,16 +217,16 @@ static int cxl_set_mem_enable(struct cxl_dev_state *cxlds, u16 val)
> u16 ctrl;
> int rc;
>
> - rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
> + rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CTRL_OFFSET, &ctrl);
> if (rc < 0)
> return rc;
>
> - if ((ctrl & CXL_DVSEC_MEM_ENABLE) == val)
> + if ((ctrl & PCI_DVSEC_CXL_MEM_ENABLE) == val)
> return 1;
> - ctrl &= ~CXL_DVSEC_MEM_ENABLE;
> + ctrl &= ~PCI_DVSEC_CXL_MEM_ENABLE;
> ctrl |= val;
>
> - rc = pci_write_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, ctrl);
> + rc = pci_write_config_word(pdev, d + PCI_DVSEC_CXL_CTRL_OFFSET, ctrl);
> if (rc < 0)
> return rc;
>
> @@ -242,7 +242,7 @@ static int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds)
> {
> int rc;
>
> - rc = cxl_set_mem_enable(cxlds, CXL_DVSEC_MEM_ENABLE);
> + rc = cxl_set_mem_enable(cxlds, PCI_DVSEC_CXL_MEM_ENABLE);
> if (rc < 0)
> return rc;
> if (rc > 0)
> @@ -304,11 +304,11 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
> return -ENXIO;
> }
>
> - rc = pci_read_config_word(pdev, d + CXL_DVSEC_CAP_OFFSET, &cap);
> + rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CAP_OFFSET, &cap);
> if (rc)
> return rc;
>
> - if (!(cap & CXL_DVSEC_MEM_CAPABLE)) {
> + if (!(cap & PCI_DVSEC_CXL_MEM_CAPABLE)) {
> dev_dbg(dev, "Not MEM Capable\n");
> return -ENXIO;
> }
> @@ -319,7 +319,7 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
> * driver is for a spec defined class code which must be CXL.mem
> * capable, there is no point in continuing to enable CXL.mem.
> */
> - hdm_count = FIELD_GET(CXL_DVSEC_HDM_COUNT_MASK, cap);
> + hdm_count = FIELD_GET(PCI_DVSEC_CXL_HDM_COUNT_MASK, cap);
> if (!hdm_count || hdm_count > 2)
> return -EINVAL;
>
> @@ -328,11 +328,11 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
> * disabled, and they will remain moot after the HDM Decoder
> * capability is enabled.
> */
> - rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
> + rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CTRL_OFFSET, &ctrl);
> if (rc)
> return rc;
>
> - info->mem_enabled = FIELD_GET(CXL_DVSEC_MEM_ENABLE, ctrl);
> + info->mem_enabled = FIELD_GET(PCI_DVSEC_CXL_MEM_ENABLE, ctrl);
> if (!info->mem_enabled)
> return 0;
>
> @@ -345,35 +345,35 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
> return rc;
>
> rc = pci_read_config_dword(
> - pdev, d + CXL_DVSEC_RANGE_SIZE_HIGH(i), &temp);
> + pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i), &temp);
> if (rc)
> return rc;
>
> size = (u64)temp << 32;
>
> rc = pci_read_config_dword(
> - pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(i), &temp);
> + pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(i), &temp);
> if (rc)
> return rc;
>
> - size |= temp & CXL_DVSEC_MEM_SIZE_LOW_MASK;
> + size |= temp & PCI_DVSEC_CXL_MEM_SIZE_LOW_MASK;
> if (!size) {
> continue;
> }
>
> rc = pci_read_config_dword(
> - pdev, d + CXL_DVSEC_RANGE_BASE_HIGH(i), &temp);
> + pdev, d + PCI_DVSEC_CXL_RANGE_BASE_HIGH(i), &temp);
> if (rc)
> return rc;
>
> base = (u64)temp << 32;
>
> rc = pci_read_config_dword(
> - pdev, d + CXL_DVSEC_RANGE_BASE_LOW(i), &temp);
> + pdev, d + PCI_DVSEC_CXL_RANGE_BASE_LOW(i), &temp);
> if (rc)
> return rc;
>
> - base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;
> + base |= temp & PCI_DVSEC_CXL_MEM_BASE_LOW_MASK;
>
> info->dvsec_range[ranges++] = (struct range) {
> .start = base,
> @@ -781,7 +781,7 @@ u16 cxl_gpf_get_dvsec(struct device *dev)
> is_port = false;
>
> dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> - is_port ? CXL_DVSEC_PORT_GPF : CXL_DVSEC_DEVICE_GPF);
> + is_port ? PCI_DVSEC_CXL_PORT_GPF : PCI_DVSEC_CXL_DEVICE_GPF);
> if (!dvsec)
> dev_warn(dev, "%s GPF DVSEC not present\n",
> is_port ? "Port" : "Device");
> @@ -797,14 +797,14 @@ static int update_gpf_port_dvsec(struct pci_dev *pdev, int dvsec, int phase)
>
> switch (phase) {
> case 1:
> - offset = CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET;
> - base = CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK;
> - scale = CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK;
> + offset = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_CONTROL_OFFSET;
> + base = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_BASE_MASK;
> + scale = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_SCALE_MASK;
> break;
> case 2:
> - offset = CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET;
> - base = CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK;
> - scale = CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK;
> + offset = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_CONTROL_OFFSET;
> + base = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_BASE_MASK;
> + scale = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_SCALE_MASK;
> break;
> default:
> return -EINVAL;
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index 5ca7b0eed568..fb70ffbba72d 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -271,10 +271,10 @@ EXPORT_SYMBOL_NS_GPL(cxl_map_device_regs, "CXL");
> static bool cxl_decode_regblock(struct pci_dev *pdev, u32 reg_lo, u32 reg_hi,
> struct cxl_register_map *map)
> {
> - u8 reg_type = FIELD_GET(CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK, reg_lo);
> - int bar = FIELD_GET(CXL_DVSEC_REG_LOCATOR_BIR_MASK, reg_lo);
> + u8 reg_type = FIELD_GET(PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_ID_MASK, reg_lo);
> + int bar = FIELD_GET(PCI_DVSEC_CXL_REG_LOCATOR_BIR_MASK, reg_lo);
> u64 offset = ((u64)reg_hi << 32) |
> - (reg_lo & CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK);
> + (reg_lo & PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_OFF_LOW_MASK);
>
> if (offset > pci_resource_len(pdev, bar)) {
> dev_warn(&pdev->dev,
> @@ -311,15 +311,15 @@ static int __cxl_find_regblock_instance(struct pci_dev *pdev, enum cxl_regloc_ty
> };
>
> regloc = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> - CXL_DVSEC_REG_LOCATOR);
> + PCI_DVSEC_CXL_REG_LOCATOR);
> if (!regloc)
> return -ENXIO;
>
> pci_read_config_dword(pdev, regloc + PCI_DVSEC_HEADER1, ®loc_size);
> regloc_size = FIELD_GET(PCI_DVSEC_HEADER1_LENGTH_MASK, regloc_size);
>
> - regloc += CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET;
> - regblocks = (regloc_size - CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET) / 8;
> + regloc += PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1_OFFSET;
> + regblocks = (regloc_size - PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1_OFFSET) / 8;
>
> for (i = 0; i < regblocks; i++, regloc += 8) {
> u32 reg_lo, reg_hi;
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 3959fa7e2ead..ad24d81e9eaa 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -7,59 +7,6 @@
>
> #define CXL_MEMORY_PROGIF 0x10
>
> -/*
> - * See section 8.1 Configuration Space Registers in the CXL 2.0
> - * Specification. Names are taken straight from the specification with "CXL" and
> - * "DVSEC" redundancies removed. When obvious, abbreviations may be used.
> - */
> -#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
> -
> -/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
> -#define CXL_DVSEC_PCIE_DEVICE 0
> -#define CXL_DVSEC_CAP_OFFSET 0xA
> -#define CXL_DVSEC_MEM_CAPABLE BIT(2)
> -#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
> -#define CXL_DVSEC_CTRL_OFFSET 0xC
> -#define CXL_DVSEC_MEM_ENABLE BIT(2)
> -#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
> -#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
> -#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
> -#define CXL_DVSEC_MEM_ACTIVE BIT(1)
> -#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
> -#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
> -#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
> -#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
> -
> -#define CXL_DVSEC_RANGE_MAX 2
> -
> -/* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
> -#define CXL_DVSEC_FUNCTION_MAP 2
> -
> -/* CXL 2.0 8.1.5: CXL 2.0 Extensions DVSEC for Ports */
> -#define CXL_DVSEC_PORT_EXTENSIONS 3
> -
> -/* CXL 2.0 8.1.6: GPF DVSEC for CXL Port */
> -#define CXL_DVSEC_PORT_GPF 4
> -#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
> -#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK GENMASK(3, 0)
> -#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK GENMASK(11, 8)
> -#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
> -#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK GENMASK(3, 0)
> -#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK GENMASK(11, 8)
> -
> -/* CXL 2.0 8.1.7: GPF DVSEC for CXL Device */
> -#define CXL_DVSEC_DEVICE_GPF 5
> -
> -/* CXL 2.0 8.1.8: PCIe DVSEC for Flex Bus Port */
> -#define CXL_DVSEC_PCIE_FLEXBUS_PORT 7
> -
> -/* CXL 2.0 8.1.9: Register Locator DVSEC */
> -#define CXL_DVSEC_REG_LOCATOR 8
> -#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
> -#define CXL_DVSEC_REG_LOCATOR_BIR_MASK GENMASK(2, 0)
> -#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
> -#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
> -
> /*
> * NOTE: Currently all the functions which are enabled for CXL require their
> * vectors to be in the first 16. Use this as the default max.
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index bd100ac31672..bd95be1f3d5c 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -933,7 +933,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> cxlds->rcd = is_cxl_restricted(pdev);
> cxlds->serial = pci_get_dsn(pdev);
> cxlds->cxl_dvsec = pci_find_dvsec_capability(
> - pdev, PCI_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
> + pdev, PCI_VENDOR_ID_CXL, PCI_DVSEC_CXL_DEVICE);
> if (!cxlds->cxl_dvsec)
> dev_warn(&pdev->dev,
> "Device DVSEC not present, skip CXL.mem init\n");
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 9e42090fb108..d775ed37a79b 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5031,7 +5031,7 @@ static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
> static u16 cxl_port_dvsec(struct pci_dev *dev)
> {
> return pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
> - PCI_DVSEC_CXL_PORT);
> + PCI_DVSEC_CXL_PORT_EXT);
> }
>
> static bool cxl_sbr_masked(struct pci_dev *dev)
> @@ -5043,7 +5043,9 @@ static bool cxl_sbr_masked(struct pci_dev *dev)
> if (!dvsec)
> return false;
>
> - rc = pci_read_config_word(dev, dvsec + PCI_DVSEC_CXL_PORT_CTL, ®);
> + rc = pci_read_config_word(dev,
> + dvsec + PCI_DVSEC_CXL_PORT_EXT_CTL_OFFSET,
> + ®);
> if (rc || PCI_POSSIBLE_ERROR(reg))
> return false;
>
> @@ -5052,7 +5054,7 @@ static bool cxl_sbr_masked(struct pci_dev *dev)
> * bit in Bridge Control has no effect. When 1, the Port generates
> * hot reset when the SBR bit is set to 1.
> */
> - if (reg & PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR)
> + if (reg & PCI_DVSEC_CXL_PORT_EXT_CTL_UNMASK_SBR)
> return false;
>
> return true;
> @@ -5097,22 +5099,22 @@ static int cxl_reset_bus_function(struct pci_dev *dev, bool probe)
> if (probe)
> return 0;
>
> - rc = pci_read_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL, ®);
> + rc = pci_read_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_EXT_CTL_OFFSET, ®);
> if (rc)
> return -ENOTTY;
>
> - if (reg & PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR) {
> + if (reg & PCI_DVSEC_CXL_PORT_EXT_CTL_UNMASK_SBR) {
> val = reg;
> } else {
> - val = reg | PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR;
> - pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL,
> + val = reg | PCI_DVSEC_CXL_PORT_EXT_CTL_UNMASK_SBR;
> + pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_EXT_CTL_OFFSET,
> val);
> }
>
> rc = pci_reset_bus_function(dev, probe);
>
> if (reg != val)
> - pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL,
> + pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_EXT_CTL_OFFSET,
> reg);
>
> return rc;
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index a3a3e942dedf..b03244d55aea 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -1225,9 +1225,61 @@
> /* Deprecated old name, replaced with PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE */
> #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE
>
> -/* Compute Express Link (CXL r3.1, sec 8.1.5) */
> -#define PCI_DVSEC_CXL_PORT 3
> -#define PCI_DVSEC_CXL_PORT_CTL 0x0c
> -#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
> +/* Compute Express Link (CXL r3.2, sec 8.1)
> + *
> + * Note that CXL DVSEC id 3 and 7 to be ignored when the CXL link state
> + * is "disconnected" (CXL r3.2, sec 9.12.3). Re-enumerate these
> + * registers on downstream link-up events.
> + */
> +
> +#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
> +
> +/* CXL 3.2 8.1.3: PCIe DVSEC for CXL Device */
> +#define PCI_DVSEC_CXL_DEVICE 0
> +#define PCI_DVSEC_CXL_CAP_OFFSET 0xA
> +#define PCI_DVSEC_CXL_MEM_CAPABLE BIT(2)
> +#define PCI_DVSEC_CXL_HDM_COUNT_MASK GENMASK(5, 4)
> +#define PCI_DVSEC_CXL_CTRL_OFFSET 0xC
> +#define PCI_DVSEC_CXL_MEM_ENABLE BIT(2)
> +#define PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
> +#define PCI_DVSEC_CXL_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
> +#define PCI_DVSEC_CXL_MEM_INFO_VALID BIT(0)
> +#define PCI_DVSEC_CXL_MEM_ACTIVE BIT(1)
> +#define PCI_DVSEC_CXL_MEM_SIZE_LOW_MASK GENMASK(31, 28)
> +#define PCI_DVSEC_CXL_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
> +#define PCI_DVSEC_CXL_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
> +#define PCI_DVSEC_CXL_MEM_BASE_LOW_MASK GENMASK(31, 28)
> +
> +#define PCI_DVSEC_CXL_RANGE_MAX 2
> +
> +/* CXL 3.2 8.1.4: Non-CXL Function Map DVSEC */
> +#define PCI_DVSEC_CXL_FUNCTION_MAP 2
> +
> +/* CXL 3.2 8.1.5: Extensions DVSEC for Ports */
> +#define PCI_DVSEC_CXL_PORT_EXT 3
> +#define PCI_DVSEC_CXL_PORT_EXT_CTL_OFFSET 0x0c
> +#define PCI_DVSEC_CXL_PORT_EXT_CTL_UNMASK_SBR 0x00000001
> +
> +/* CXL 3.2 8.1.6: GPF DVSEC for CXL Port */
> +#define PCI_DVSEC_CXL_PORT_GPF 4
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_BASE_MASK GENMASK(3, 0)
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_SCALE_MASK GENMASK(11, 8)
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_BASE_MASK GENMASK(3, 0)
> +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_SCALE_MASK GENMASK(11, 8)
> +
> +/* CXL 3.2 8.1.7: GPF DVSEC for CXL Device */
> +#define PCI_DVSEC_CXL_DEVICE_GPF 5
> +
> +/* CXL 3.2 8.1.8: PCIe DVSEC for Flex Bus Port */
> +#define PCI_DVSEC_CXL_FLEXBUS_PORT 7
> +
> +/* CXL 3.2 8.1.9: Register Locator DVSEC */
> +#define PCI_DVSEC_CXL_REG_LOCATOR 8
> +#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1_OFFSET 0xC
> +#define PCI_DVSEC_CXL_REG_LOCATOR_BIR_MASK GENMASK(2, 0)
> +#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
> +#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
>
> #endif /* LINUX_PCI_REGS_H */
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 15/23] cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers
2025-08-27 1:35 ` [PATCH v11 15/23] cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
@ 2025-08-28 23:05 ` Dave Jiang
0 siblings, 0 replies; 53+ messages in thread
From: Dave Jiang @ 2025-08-28 23:05 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
On 8/26/25 6:35 PM, Terry Bowman wrote:
> CXL Endpoint (EP) Ports may include Root Ports (RP) or Downstream Switch
> Ports (DSP). CXL RPs and DSPs contain RAS registers that require memory
> mapping to enable RAS logging. This initialization is currently missing and
> must be added for CXL RPs and DSPs.
>
> Update cxl_dport_init_ras_reporting() to support RP and DSP RAS mapping.
> Add alongside the existing Restricted CXL Host Downstream Port RAS mapping.
>
> Update cxl_endpoint_port_probe() to invoke cxl_dport_init_ras_reporting().
> This will initiate the RAS mapping for CXL RPs and DSPs when each CXL EP is
> created and added to the EP port.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
> Changes in v10->v11:
> - Use local pointer for readability in cxl_switch_port_init_ras() (Jonathan Cameron)
> - Rename port to be ep in cxl_endpoint_port_init_ras() (Dave Jiang)
> - Rename dport to be parent_dport in cxl_endpoint_port_init_ras()
> and cxl_switch_port_init_ras() (Dave Jiang)
> - Port helper changes were in cxl/port.c, now in core/ras.c (Dave Jiang)
> ---
> drivers/cxl/core/core.h | 7 ++++++
> drivers/cxl/core/ras.c | 47 +++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 2 ++
> drivers/cxl/cxlpci.h | 4 ----
> drivers/cxl/mem.c | 4 +++-
> drivers/cxl/port.c | 5 +++++
> 6 files changed, 64 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 2c81a43d7b05..2fa76a913264 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -146,6 +146,9 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
> #ifdef CONFIG_CXL_RAS
> int cxl_ras_init(void);
> void cxl_ras_exit(void);
> +void cxl_switch_port_init_ras(struct cxl_port *port);
> +void cxl_endpoint_port_init_ras(struct cxl_port *ep);
> +void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
> #else
> static inline int cxl_ras_init(void)
> {
> @@ -155,6 +158,10 @@ static inline int cxl_ras_init(void)
> static inline void cxl_ras_exit(void)
> {
> }
> +static inline void cxl_switch_port_init_ras(struct cxl_port *port) { }
> +static inline void cxl_endpoint_port_init_ras(struct cxl_port *ep) { }
> +static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
> + struct device *host) { }
> #endif // CONFIG_CXL_RAS
>
> int cxl_gpf_port_setup(struct cxl_dport *dport);
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 69559043b772..42b6e0b092d5 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -284,6 +284,53 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
>
> +static void cxl_uport_init_ras_reporting(struct cxl_port *port,
> + struct device *host)
> +{
> + struct cxl_register_map *map = &port->reg_map;
> +
> + map->host = host;
> + if (cxl_map_component_regs(map, &port->uport_regs,
> + BIT(CXL_CM_CAP_CAP_ID_RAS)))
> + dev_dbg(&port->dev, "Failed to map RAS capability\n");
> +}
> +
> +void cxl_switch_port_init_ras(struct cxl_port *port)
> +{
> + struct cxl_dport *parent_dport = port->parent_dport;
> +
> + if (is_cxl_root(to_cxl_port(port->dev.parent)))
> + return;
> +
> + /* May have parent DSP or RP */
> + if (parent_dport && dev_is_pci(parent_dport->dport_dev)) {
> + struct pci_dev *pdev = to_pci_dev(parent_dport->dport_dev);
> +
> + if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) ||
> + (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM))
> + cxl_dport_init_ras_reporting(parent_dport, &port->dev);
> + }
> +
> + cxl_uport_init_ras_reporting(port, &port->dev);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_switch_port_init_ras, "CXL");
> +
> +void cxl_endpoint_port_init_ras(struct cxl_port *ep)
> +{
> + struct cxl_dport *parent_dport;
> + struct cxl_memdev *cxlmd = to_cxl_memdev(ep->uport_dev);
> + struct cxl_port *parent_port __free(put_cxl_port) =
> + cxl_mem_find_port(cxlmd, &parent_dport);
> +
> + if (!parent_dport || !dev_is_pci(parent_dport->dport_dev)) {
> + dev_err(&ep->dev, "CXL port topology not found\n");
> + return;
> + }
> +
> + cxl_dport_init_ras_reporting(parent_dport, cxlmd->cxlds->dev);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_endpoint_port_init_ras, "CXL");
> +
> static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> {
> void __iomem *addr;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 8f6224ac6785..32fccad9a7f6 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -586,6 +586,7 @@ struct cxl_dax_region {
> * @parent_dport: dport that points to this port in the parent
> * @decoder_ida: allocator for decoder ids
> * @reg_map: component and ras register mapping parameters
> + * @uport_regs: mapped component registers
> * @nr_dports: number of entries in @dports
> * @hdm_end: track last allocated HDM decoder instance for allocation ordering
> * @commit_end: cursor to track highest committed decoder for commit ordering
> @@ -606,6 +607,7 @@ struct cxl_port {
> struct cxl_dport *parent_dport;
> struct ida decoder_ida;
> struct cxl_register_map reg_map;
> + struct cxl_component_regs uport_regs;
> int nr_dports;
> int hdm_end;
> int commit_end;
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index ad24d81e9eaa..a6da0abfa506 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -84,7 +84,6 @@ void read_cdat_data(struct cxl_port *port);
> void cxl_cor_error_detected(struct pci_dev *pdev);
> pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> pci_channel_state_t state);
> -void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
> #else
> static inline void cxl_cor_error_detected(struct pci_dev *pdev) { }
>
> @@ -93,9 +92,6 @@ static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> {
> return PCI_ERS_RESULT_NONE;
> }
> -
> -static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
> - struct device *host) { }
> #endif
>
> #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 6e6777b7bafb..f7dc0ba8905d 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -7,6 +7,7 @@
>
> #include "cxlmem.h"
> #include "cxlpci.h"
> +#include "core/core.h"
>
> /**
> * DOC: cxl mem
> @@ -166,7 +167,8 @@ static int cxl_mem_probe(struct device *dev)
> else
> endpoint_parent = &parent_port->dev;
>
> - cxl_dport_init_ras_reporting(dport, dev);
> + if (dport->rch)
> + cxl_dport_init_ras_reporting(dport, dev);
So the endpoint port probe calls this via cxl_endpoint_port_init_ras(), and if it's RCH the memedev probe also calls this. Trying to understand why it happens for both drivers for the RCH case...
>
> scoped_guard(device, endpoint_parent) {
> if (!endpoint_parent->driver) {
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index fe4b593331da..e66c7f2e1955 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -6,6 +6,7 @@
>
> #include "cxlmem.h"
> #include "cxlpci.h"
> +#include "core/core.h"
>
> /**
> * DOC: cxl port
> @@ -71,6 +72,8 @@ static int cxl_switch_port_probe(struct cxl_port *port)
>
> cxl_switch_parse_cdat(port);
>
> + cxl_switch_port_init_ras(port);
> +
> cxlhdm = devm_cxl_setup_hdm(port, NULL);
> if (!IS_ERR(cxlhdm))
> return devm_cxl_enumerate_decoders(cxlhdm, NULL);
> @@ -125,6 +128,8 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
> if (rc)
> return rc;
>
> + cxl_endpoint_port_init_ras(port);
> +
> /*
> * Now that all endpoint decoders are successfully enumerated, try to
> * assemble regions from committed decoders
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
` (22 preceding siblings ...)
2025-08-27 1:35 ` [PATCH v11 23/23] CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup Terry Bowman
@ 2025-08-29 0:07 ` Dave Jiang
23 siblings, 0 replies; 53+ messages in thread
From: Dave Jiang @ 2025-08-29 0:07 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
On 8/26/25 6:35 PM, Terry Bowman wrote:
> This patchset adds CXL Protocol Error handling for CXL Ports and updates CXL
> Endpoints (EP) handling. Previous versions of this series can be found here:
> https://lore.kernel.org/linux-cxl/20250626224252.1415009-1-terry.bowman@amd.com/
>
> The first 6 patches reorganize protocol error related CXL code into new files.
> The first patch is authored by Dave Jiang and moves the CXL handling and logging into
> cxl/core/ras.c. Patch 5 and 6 move the AER driver's restricted CXL host (RCH) logic
> into pci/pcie/rch_aer.c. The RCH logic is used for CXL RCH devices from CXL1.1.
> CONFIG_PCIEAER_CXL is removed and replaced with CONFIG_CXL_RAS, defined in CXL
> Kconfig.
>
> The next 4 patches introduce pcie_is_cxl() and use in the AER driver to detect and
> log the error type as PCIe error or CXL error.
>
> The next 4 patches are error handler cleanup in preparation for upcoming error handler
> changes.
>
> The next 7 patches introduce CXL Port error handling as well as updating the existing
> CXL Endpoint handling. This sets all CXL Ports and EPs to use similar handling and logging
> flow. Note, the PCIe EP handling remains for cases the EP is not recognized as a CXL
> device. These patches also include change to move the AER driver's CXL virtual hierarchy
> code into a new file named pci/pcie/cxl_aer.c. This separates the AER driver's CXL
> implementation from the AER driver's core file.
>
> The final 2 patches enable/disable CXL protocol error interrupts during CXL port
> creation and teardown.
>
> Note, I'll be on PTO until 9/10. Responses may be delayed.
>
> ==== Testing ====
>
> Below are the testing results while using QEMU. The QEMU testing uses a CXL Root
> Port, CXL Upstream Switch Port, CXL Downstream Switch Port and CXL Endpoint as
> given below. I've attached the QEMU startup commandline used.
>
> The sub-topology for the QEMU testing is:
> ---------------------
> | CXL RP - 0C:00.0 |
> ---------------------
> |
> ---------------------
> | CXL USP - 0D:00.0 |
> ---------------------
> |
> ---------------------
> | CXL DSP - 0E:00.0 |
> ---------------------
> |
> ---------------------
> | CXL EP - 0F:00.0 |
> ---------------------
>
> root@tbowman-cxl:~# lspci -t
> -+-[0000:00]- -00.0
> | +-01.0
> | +-02.0
> | +-03.0
> | +-1f.0
> | +-1f.2
> | \-1f.3
> \-[0000:0c]---00.0-[0d-0f]----00.0-[0e-0f]----00.0-[0f]----00.0
>
> The topology was created with:
> ${qemu} -boot menu=on \
> -cpu host \
> -nographic \
> -monitor telnet:127.0.0.1:1234,server,nowait \
> -M virt,cxl=on \
> -chardev stdio,id=s1,signal=off,mux=on -serial none \
> -device isa-serial,chardev=s1 -mon chardev=s1,mode=readline \
> -machine q35,cxl=on \
> -m 16G,maxmem=24G,slots=8 \
> -cpu EPYC-v3 \
> -smp 16 \
> -accel kvm \
> -drive file=${img},format=raw,index=0,media=disk \
> -device e1000,netdev=user.0 \
> -netdev user,id=user.0,hostfwd=tcp::5555-:22 \
> -object memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \
> -object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M \
> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
> -device cxl-upstream,bus=root_port0,id=us0 \
> -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
> -device cxl-type3,bus=swport0,volatile-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-vmem0 \
> -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k
>
> === Root Port ===
> root@tbowman-cxl:~/aer-inject# ./root-ce-inject.sh
> pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0c:00.0
> pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0c:00.0
> aer_event: 0000:0c:00.0 CXL Bus Error: severity=Corrected, Corrected Internal Error, TLP Header=Not availabl e
> pcieport 0000:0c:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
> pcieport 0000:0c:00.0: device [8086:7075] error status/mask=00004000/0000a000
> pcieport 0000:0c:00.0: [14] CorrIntErr
> cxl_aer_correctable_error: memdev=0000:0c:00.0 host=pci0000:0c serial=0: status: 'CRC Threshold Hit'
>
> root@tbowman-cxl:~/aer-inject# ./root-uce-inject.sh
> pcieport 0000:0c:00.0: aer_inject: Injecting errors 00000000/00400000 into device 0000:0c:00.0
> pcieport 0000:0c:00.0: AER: Uncorrectable (Fatal) error message received from 0000:0c:00.0
> aer_event: 0000:0c:00.0 CXL Bus Error: severity=Fatal, Uncorrectable Internal Error, TLP Header=Not availabl e
> pcieport 0000:0c:00.0: CXL Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
> pcieport 0000:0c:00.0: device [8086:7075] error status/mask=00400000/02000000
> pcieport 0000:0c:00.0: [22] UncorrIntErr
> cxl_aer_uncorrectable_error: memdev=0000:0c:00.0 host=pci0000:0c serial=0: status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: Oops: 0000 [#1] SMP NOPTI
> CPU: 10 UID: 0 PID: 216 Comm: kworker/10:1 Tainted: G E 6.16.0-rc4-testing-00054-g263b76bedfed #1306 PREEMPT(voluntary)
> Tainted: [E]=UNSIGNED_MODULE
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> Workqueue: events cxl_proto_err_work_fn [cxl_core]
> RIP: 0010:cxl_error_detected+0x18/0xd0 [cxl_core]
> Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 56 41 55 41 54 55 48 89 fd 53 48 8b 5f 78 <4c> 8b 63 08 4d 8d b4 24 80 00 00 00 4c 89 f7 e8 e4 68 0a f6 be 0f
> RSP: 0018:ffffd1870091fd58 EFLAGS: 00010282
> RAX: 0000000000000007 RBX: 0000000000000000 RCX: 0000000000000003
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8e0241c120c8
> RBP: ffff8e0241c120c8 R08: 0000000000000284 R09: 0000000000000001
> R10: 00000000ffffffff R11: 00cccccc00cccccc R12: ffff8e0241c12148
> R13: ffff8e0241724f00 R14: 0000000000000000 R15: ffff8e0248571740
> FS: 0000000000000000(0000) GS:ffff8e05f7ba5000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 0000000113a8b000 CR4: 00000000003506f0
> Call Trace:
> <TASK>
> cxl_report_error_detected+0x3e/0x70 [cxl_core]
> cxl_walk_port.constprop.0+0xa4/0x140 [cxl_core]
> cxl_proto_err_work_fn+0x1fa/0x430 [cxl_core]
> ? srso_return_thunk+0x5/0x5f
> process_scheduled_works+0xa8/0x420
> ? __pfx_worker_thread+0x10/0x10
> worker_thread+0x11c/0x260
> ? __pfx_worker_thread+0x10/0x10
> kthread+0xfe/0x210
> ? __pfx_kthread+0x10/0x10
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x18d/0x1e0
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
> </TASK>
> Modules linked in: cfg80211(E) binfmt_misc(E) ppdev(E) intel_rapl_msr(E) cxl_mem(E) intel_rapl_common(E) cxl_pci(E) parport_pc(E) cxl_pmem(E) parport(E) input_leds(E) joydev(E) mac_hid(E) serio_raw(E) dm_m)
> CR2: 0000000000000008
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:cxl_error_detected+0x18/0xd0 [cxl_core]
> Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 56 41 55 41 54 55 48 89 fd 53 48 8b 5f 78 <4c> 8b 63 08 4d 8d b4 24 80 00 00 00 4c 89 f7 e8 e4 68 0a f6 be 0f
> RSP: 0018:ffffd1870091fd58 EFLAGS: 00010282
> RAX: 0000000000000007 RBX: 0000000000000000 RCX: 0000000000000003
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8e0241c120c8
> RBP: ffff8e0241c120c8 R08: 0000000000000284 R09: 0000000000000001
> R10: 00000000ffffffff R11: 00cccccc00cccccc R12: ffff8e0241c12148
> R13: ffff8e0241724f00 R14: 0000000000000000 R15: ffff8e0248571740
> FS: 0000000000000000(0000) GS:ffff8e05f7ba5000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 0000000113a8b000 CR4: 00000000003506f0
> note: kworker/10:1[216] exited with irqs disabled
>
> === Upstream Switch Port ===
> root@tbowman-cxl:~/aer-inject# ./us-ce-inject.sh
> pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0d:00.0
> pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0d:00.0
> aer_event: 0000:0d:00.0 CXL Bus Error: severity=Corrected, Corrected Internal Error, TLP Header=Not available
> pcieport 0000:0d:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
> pcieport 0000:0d:00.0: device [19e5:a128] error status/mask=00004000/0000a000
> pcieport 0000:0d:00.0: [14] CorrIntErr
> cxl_aer_correctable_error: memdev=0000:0d:00.0 host=0000:0c:00.0 serial=0: status: 'CRC Threshold Hit'
>
> root@tbowman-cxl:~/aer-inject# ./us-uce-inject.sh
> pcieport 0000:0c:00.0: aer_inject: Injecting errors 00000000/00400000 into device 0000:0d:00.0
> pcieport 0000:0c:00.0: AER: Uncorrectable (Fatal) error message received from 0000:0d:00.0
> aer_event: 0000:0d:00.0 CXL Bus Error: severity=Fatal, , TLP Header=Not available
> pcieport 0000:0d:00.0: AER: CXL Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID)
> cxl_aer_uncorrectable_error: memdev=mem0 host=0000:0f:00.0 serial=0: status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
> Kernel panic - not syncing: CXL cachemem error.
> CPU: 10 UID: 0 PID: 147 Comm: irq/24-aerdrv Tainted: G E 6.16.0-rc4-testing-00054-g263b76bedfed #1306 PREEMPT(voluntary)
> Tainted: [E]=UNSIGNED_MODULE
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> Call Trace:
> <TASK>
> panic+0x364/0x3d0
> ? __pfx_aer_root_reset+0x10/0x10
> pci_error_detected+0x2b/0x30 [cxl_core]
> report_error_detected+0xf7/0x170
> ? __pfx_report_frozen_detected+0x10/0x10
> __pci_walk_bus+0x4c/0x70
> ? __pfx_report_frozen_detected+0x10/0x10
> __pci_walk_bus+0x34/0x70
> ? __pfx_report_frozen_detected+0x10/0x10
> __pci_walk_bus+0x34/0x70
> ? __pfx_report_frozen_detected+0x10/0x10
> pci_walk_bus+0x31/0x50
> pcie_do_recovery+0x163/0x2b0
> aer_isr_one_error_type+0x1f3/0x380
> aer_isr_one_error+0x11d/0x140
> aer_isr+0x4c/0x80
> irq_thread_fn+0x24/0x70
> irq_thread+0x199/0x2a0
> ? __pfx_irq_thread_fn+0x10/0x10
> ? __pfx_irq_thread_dtor+0x10/0x10
> ? __pfx_irq_thread+0x10/0x10
> kthread+0xfe/0x210
> ? __pfx_kthread+0x10/0x10
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x18d/0x1e0
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
> </TASK>
> Kernel Offset: 0x13400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> ---[ end Kernel panic - not syncing: CXL cachemem error. ]---
>
> === Downstream Switch Port ===
> root@tbowman-cxl:~/aer-inject# ./ds-ce-inject.sh
> pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0e:00.0
> pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0e:00.0
> aer_event: 0000:0e:00.0 CXL Bus Error: severity=Corrected, Corrected Internal Error, TLP Header=Not available
> pcieport 0000:0e:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
> pcieport 0000:0e:00.0: device [19e5:a129] error status/mask=00004000/0000a000
> pcieport 0000:0e:00.0: [14] CorrIntErr
> cxl_aer_correctable_error: memdev=0000:0e:00.0 host=0000:0d:00.0 serial=0: status: 'CRC Threshold Hit'
>
> root@tbowman-cxl:~/aer-inject# ./ds-uce-inject# ./ds-uce-inject.sh
> pcieport 0000:0c:00.0: aer_inject: Injecting errors 00000000/00400000 into device 0000:0e:00.0
> pcieport 0000:0c:00.0: AER: Uncorrectable (Fatal) error message received from 0000:0e:00.0
> aer_event: 0000:0e:00.0 CXL Bus Error: severity=Fatal, Uncorrectable Internal Error, TLP Header=Not available
> pcieport 0000:0e:00.0: CXL Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
> pcieport 0000:0e:00.0: device [19e5:a129] error status/mask=00400000/02000000
> pcieport 0000:0e:00.0: [22] UncorrIntErr
> cxl_aer_uncorrectable_error: memdev=0000:0d:00.0 host=0000:0c:00.0 serial=0: status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: Oops: 0000 [#1] SMP NOPTI
> CPU: 10 UID: 0 PID: 196 Comm: kworker/10:1 Tainted: G E 6.16.0-rc4-testing-00054-g263b76bedfed #1306 PREEMPT(voluntary)
> Tainted: [E]=UNSIGNED_MODULE
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> Workqueue: events cxl_proto_err_work_fn [cxl_core]
> RIP: 0010:cxl_error_detected+0x18/0xd0 [cxl_core]
> Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 56 41 55 41 54 55 48 89 fd 53 48 8b 5f 78 <4c> 8b 63 08 4d 8d b4 24 80 00 00 00 4c 89 f7 e8 e4 68 aa d9 be 0f
> RSP: 0018:ffffd1178086fd58 EFLAGS: 00010282
> RAX: 0000000000000007 RBX: 0000000000000000 RCX: 0000000000000003
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8a0201c510c8
> RBP: ffff8a0201c510c8 R08: 000000000000028c R09: 0000000000000001
> R10: 00000000ffffffff R11: 00cccccc00cccccc R12: ffff8a0201c51148
> R13: ffff8a0201c56000 R14: ffff8a02011ed000 R15: ffff8a0201110000
> FS: 0000000000000000(0000) GS:ffff8a05d3fa5000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 0000000103574000 CR4: 00000000003506f0
> Call Trace:
> <TASK>
> cxl_report_error_detected+0x3e/0x70 [cxl_core]
> cxl_walk_port.constprop.0+0x12a/0x140 [cxl_core]
> ? srso_return_thunk+0x5/0x5f
> cxl_proto_err_work_fn+0x1fa/0x430 [cxl_core]
> ? srso_return_thunk+0x5/0x5f
> process_scheduled_works+0xa8/0x420
> ? __pfx_worker_thread+0x10/0x10
> worker_thread+0x11c/0x260
> ? __pfx_worker_thread+0x10/0x10
> kthread+0xfe/0x210
> ? __pfx_kthread+0x10/0x10
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x18d/0x1e0
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
> </TASK>
> Modules linked in: cfg80211(E) binfmt_misc(E) intel_rapl_msr(E) intel_rapl_common(E) ppdev(E) cxl_mem(E) cxl_pmem(E) parport_pc(E) cxl_pci(E) parport(E) joydev(E) input_leds(E) mac_hid(E) serio_raw(E) dm_m)
> CR2: 0000000000000008
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:cxl_error_detected+0x18/0xd0 [cxl_core]
> Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 56 41 55 41 54 55 48 89 fd 53 48 8b 5f 78 <4c> 8b 63 08 4d 8d b4 24 80 00 00 00 4c 89 f7 e8 e4 68 aa d9 be 0f
> RSP: 0018:ffffd1178086fd58 EFLAGS: 00010282
> RAX: 0000000000000007 RBX: 0000000000000000 RCX: 0000000000000003
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8a0201c510c8
> RBP: ffff8a0201c510c8 R08: 000000000000028c R09: 0000000000000001
> R10: 00000000ffffffff R11: 00cccccc00cccccc R12: ffff8a0201c51148
> R13: ffff8a0201c56000 R14: ffff8a02011ed000 R15: ffff8a0201110000
> FS: 0000000000000000(0000) GS:ffff8a05d3fa5000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 0000000103574000 CR4: 00000000003506f0
> note: kworker/10:1[196] exited with irqs disabled
>
> === Endpoint ===
> root@tbowman-cxl:~/aer-inject# ./ep-ce-inject.sh
> pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0f:00.0
> pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0f:00.0
> aer_event: 0000:0f:00.0 CXL Bus Error: severity=Corrected, Corrected Internal Error, TLP Header=Not available
> cxl_pci 0000:0f:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
> cxl_pci 0000:0f:00.0: device [8086:0d93] error status/mask=00004000/00000000
> cxl_pci 0000:0f:00.0: [14] CorrIntErr
> cxl_aer_uncorrectable_error: memdev=mem0 host=0000:0f:00.0 serial=0: status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
>
> root@tbowman-cxl:~/aer-inject# ./ep-uce-inject.sh
> pcieport 0000:0c:00.0: aer_inject: Injecting errors 00000000/00400000 into device 0000:0f:00.0
> pcieport 0000:0c:00.0: AER: Uncorrectable (Fatal) error message received from 0000:0f:00.0
> aer_event: 0000:0f:00.0 CXL Bus Error: severity=Fatal, , TLP Header=Not available
> cxl_pci 0000:0f:00.0: AER: CXL Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID)
> cxl_aer_uncorrectable_error: memdev=mem0 host=0000:0f:00.0 serial=0: status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
> Kernel panic - not syncing: CXL cachemem error.
> CPU: 10 UID: 0 PID: 148 Comm: irq/24-aerdrv Tainted: G E 6.16.0-rc4-testing-00054-g263b76bedfed #1306 PREEMPT(voluntary)
> Tainted: [E]=UNSIGNED_MODULE
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> Call Trace:
> <TASK>
> panic+0x364/0x3d0
> ? __pfx_aer_root_reset+0x10/0x10
> pci_error_detected+0x2b/0x30 [cxl_core]
> report_error_detected+0xf7/0x170
> ? __pfx_report_frozen_detected+0x10/0x10
> __pci_walk_bus+0x4c/0x70
> ? __pfx_report_frozen_detected+0x10/0x10
> pci_walk_bus+0x31/0x50
> pcie_do_recovery+0x163/0x2b0
> aer_isr_one_error_type+0x1f3/0x380
> aer_isr_one_error+0x11d/0x140
> aer_isr+0x4c/0x80
> irq_thread_fn+0x24/0x70
> irq_thread+0x199/0x2a0
> ? __pfx_irq_thread_fn+0x10/0x10
> ? __pfx_irq_thread_dtor+0x10/0x10
> ? __pfx_irq_thread+0x10/0x10
> kthread+0xfe/0x210
> ? __pfx_kthread+0x10/0x10
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x18d/0x1e0
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
> </TASK>
> Kernel Offset: 0xd600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> ---[ end Kernel panic - not syncing: CXL cachemem error. ]---
>
> ==== Changes ====
> Changes in v10 -> v11:
> cxl: Remove ifdef blocks of CONFIG_PCIEAER_CXL from core/pci.c
> - New patch
> CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS
> - New patch
> cxl/pci: Remove unnecessary CXL RCH handling helper functions
> - New patch
> cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS conditional block
> - New patch
> CXL/AER: Introduce rch_aer.c into AER driver for handling CXL RCH errors
> - Remove changes in code-split and move to earlier, new patch
> - Add #include <linux/bitfield.h> to cxl_ras.c
> - Move cxl_rch_handle_error() & cxl_rch_enable_rcec() declarations from pci.h
> to aer.h, more localized.
> - Introduce CONFIG_CXL_RCH_RAS, includes Makefile changes, ras.c ifdef changes
> CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
> - New patch
> PCI/CXL: Introduce pcie_is_cxl()
> - Amended set_pcie_cxl() to check for Upstream Port's and EP's parent
> downstream port by calling set_pcie_cxl(). (Dan)
> - Retitle patch: 'Add' -> 'Introduce'
> - Add check for CXL.mem and CXL.cache (Alejandro, Dan)
> PCI/AER: Report CXL or PCIe bus error type in trace logging
> - Remove duplicate call to trace_aer_event() (Shiju)
> - Added Dan William's and Dave Jiang's reviewed-by
> CXL/AER: Update PCI class code check to use FIELD_GET()
> - Add #include <linux/bitfield.h> to cxl_ras.c (Terry)
> - Removed line wrapping at "(CXL 3.2, 8.1.12.1)". (Jonathan)
> cxl/pci: Log message if RAS registers are unmapped
> - Added Dave Jiang's review-by (Terry)
> cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports
> - Updated CE and UCE trace routines to maintian consistent TP_Struct ABI
> and unchanged TP_printk() logging. (Shiju, Alison)
> cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors
> - Added Dave Jiang and Jonathan Cameron's review-by
> - Changes moved to core/ras.c
> cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers
> - Use local pointer for readability in cxl_switch_port_init_ras() (Jonathan Cameron)
> - Rename port to be ep in cxl_endpoint_port_init_ras() (Dave Jiang)
> - Rename dport to be parent_dport in cxl_endpoint_port_init_ras()
> and cxl_switch_port_init_ras() (Dave Jiang)
> - Port helper changes were in cxl/port.c, now in core/ras.c (Dave Jiang)
> cxl/pci: Introduce CXL Endpoint protocol error handlers
> - cxl_error_detected() - Change handlers' scoped_guard() to guard() (Jonathan)
> - cxl_error_detected() - Remove extra line (Shiju)
> - Changes moved to core/ras.c (Terry)
> - cxl_error_detected(), remove 'ue' and return with function call. (Jonathan)
> - Remove extra space in documentation for PCI_ERS_RESULT_PANIC definition
> - Move #include "pci.h from cxl.h to core.h (Terry)
> - Remove unnecessary includes of cxl.h and core.h in mem.c (Terry)
> CXL/AER: Introduce cxl_aer.c into AER driver for forwarding CXL errors
> - Move RCH implementation to cxl_rch.c and RCH declarations to pci/pci.h. (Terry)
> - Introduce 'struct cxl_proto_err_kfifo' containing semaphore, fifo,
> and work struct. (Dan)
> - Remove embedded struct from cxl_proto_err_work (Dan)
> - Make 'struct work_struct *cxl_proto_err_work' definition static (Jonathan)
> - Add check for NULL cxl_proto_err_kfifo to determine if CXL driver is
> not registered for workqueue. (Dan)
> PCI/AER: Dequeue forwarded CXL error
> - Reword patch commit message to remove RCiEP details (Jonathan)
> - Add #include <linux/bitfield.h> (Terry)
> - is_cxl_rcd() - Fix short comment message wrap (Jonathan)
> - is_cxl_rcd() - Combine return calls into 1 (Jonathan)
> - cxl_handle_proto_error() - Move comment earlier (Jonathan)
> - Usse FIELD_GET() in discovering class code (Jonathan)
> - Remove BDF from cxl_proto_err_work_data. Use 'struct pci_dev *' (Dan)
> CXL/PCI: Introduce CXL Port protocol error handlers
> - Removed check for PCI_EXP_TYPE_RC_END in cxl_report_error_detected() (Terry)
> - Update is_cxl_error() to check for acceptable PCI EP and port types
> CXL/PCI: Export and rename merge_result() to pci_ers_merge_result()
> - pci_ers_merge_result() - Change export to non-namespace and rename
> to be pci_ers_merge_result() (Jonathan)
> - Move pci_ers_merge_result() definition to pci.h. Needs pci_ers_result (Terry)
> CXL/PCI: Introduce CXL uncorrectable protocol error recovery
> - pci_ers_merge_results() - Move to earlier patch
> CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup
> - Remove guard() in cxl_mask_proto_interrupts(). Observed device lockup/block
> during testing. (Terry)
>
> Changes in v9 -> v10:
> - Add drivers/pci/pcie/cxl_aer.c
> - Add drivers/cxl/core/native_ras.c
> - Change cxl_register_prot_err_work()/cxl_unregister_prot_err_work to return void
> - Check for pcie_ports_native in cxl_do_recovery()
> - Remove debug logging in cxl_do_recovery()
> - Update PCI_ERS_RESULT_PANIC definition to indicate is CXL specific
> - Revert trace logging changes: name,parent -> memdev,host.
> - Use FIELD_GET() to check for EP class code (cxl_aer.c & native_ras.c).
> - Change _prot_ to _proto_ everywhere
> - cxl_rch_handle_error_iter(), check if driver is cxl_pci_driver
> - Remove cxl_create_prot_error_info(). Move logic into forward_cxl_error()
> - Remove sbdf_to_pci() and move logic into cxl_handle_proto_error()
> - Simplify/refactor get_pci_cxl_host_dev()
> - Simplify/refactor cxl_get_ras_base()
> - Move patch 'Remove unnecessary CXL Endpoint handling helper functions' to front
> - Update description for 'CXL/PCI: Introduce CXL Port protocol error
> handlers' with why state is not used to determine handling
> - Introduce cxl_pci_drv_bound() and call from cxl_rch_handle_error_iter()
>
> Changes in v8 -> v9:
> - Updated reference counting to use pci_get_device()/pci_put_device() in
> cxl_disable_prot_errors()/cxl_enable_prot_errors
> - Refactored cxl_create_prot_err_info() to fix reference counting
> - Removed 'struct cxl_port' driver changes for error handler. Instead
> check for CXL device type (EP or Port device) and call handler
> - Make pcie_is_cxl() static inline in include/linux/linux.h
> - Remove NULL check in create_prot_err_info()
> - Change success return in cxl_ras_init() to use hardcoded 0
> - Changed 'struct work_struct cxl_prot_err_work' declaration to static
> - Change to use rate limited log with dev anchor in forward_cxl_error()
> - Refactored forward-cxl_error() to remove severity auto variable
> - Changed pci_aer_clear_nonfatal_status() to be static inline for
> !(CONFIG_PCIEAER)
> - Renamed merge_result() to be cxl_merge_result()
> - Removed 'ue' condition in cxl_error_detected()
> - Updated 2nd parameter in call to __cxl_handle_cor_ras()/__cxl_handle_ras()
> in unify patch
> - Added log message for failure while assigning interrupt disable callback
> - Updated pci_aer_mask_internal_errors() to use pci_clear_and_set_config_dword()
> - Simplified patch titles for clarity
> - Moved CXL error interrupt disabling into cxl/core/port.c with CXL Port
> teardown
> - Updated 'struct cxl_port_err_info' to only contain sbdf and severity
> Removed everything else.
> - Added pdev and CXL device get_device()/put_device() before calling handlers
>
> Changes in v7 -> v8:
> [Dan] Use kfifo. Move handling to CXL driver. AER forwards error to CXL
> driver
> [Dan] Add device reference incrementors where needed throughout
> [Dan] Initiate CXL Port RAS init from Switch Port and Endpoint Port init
> [Dan] Combine CXL Port and CXL Endpoint trace routine
> [Dan] Introduce aer_info::is_cxl. Use to indicate CXL or PCI errors
> [Jonathan] Add serial number for all devices in trace
> [DaveJ] Move find_cxl_port() change into patch using it
> [Terry] Move CXL Port RAS init into cxl/port.c
> [Terry] Moved kfifo functions into cxl/core/ras.c
>
> Changes in v6 -> v7:
> [Terry] Move updated trace routine call to later patch. Was causing build
> error.
>
> Changes in v5 -> v6:
> [Ira] Move pcie_is_cxl(dev) define to a inline function
> [Ira] Update returning value from pcie_is_cxl_port() to bool w/o cast
> [Ira] Change cxl_report_error_detected() cleanup to return correct bool
> [Ira] Introduce and use PCI_ERS_RESULT_PANIC
> [Ira] Reuse comment for PCIe and CXL recovery paths
> [Jonathan] Add type check in for cxl_handle_cor_ras() and cxl_handle_ras()
> [Jonathan] cxl_uport/dport_init_ras_reporting(), added a mutex.
> [Jonathan] Add logging example to patches updating trace output
> [Jonathan] Make parameter 'const' to eliminate for cast in match_uport()
> [Jonathan] Use __free() in cxl_pci_port_ras()
> [Terry] Add patch to log the PCIe SBDF along with CXL device name
> [Terry] Add patch to handle CXL endpoint and RCH DP errors as CXL errors
> [Terry] Remove patch w USP UCE fatal support @ aer_get_device_error_info()
> [Terry] Rebase to cxl/next commit 5585e342e8d3 ("cxl/memdev: Remove unused partition values")
> [Gregory] Pre-initialize pointer to NULL in cxl_pci_port_ras()
> [Gregory] Move AER driver bus name detection to a static function
>
> Changes in v4 -> v5:
> [Alejandro] Refactor cxl_walk_bridge to simplify 'status' variable usage
> [Alejandro] Add WARN_ONCE() in __cxl_handle_ras() and cxl_handle_cor_ras()
> [Ming] Remove unnecessary NULL check in cxl_pci_port_ras()
> [Terry] Add failure check for call to to_cxl_port() in cxl_pci_port_ras()
> [Ming] Use port->dev for call to devm_add_action_or_reset() in
> cxl_dport_init_ras_reporting() and cxl_uport_init_ras_reporting()
> [Jonathan] Use get_device()/put_device() to prevent race condition in
> cxl_clear_port_error_handlers() and cxl_clear_port_error_handlers()
> [Terry] Commit message cleanup. Capitalize keywords from CXL and PCI
> specifications
>
> Changes in v3 -> v4:
> [Lukas] Capitalize PCIe and CXL device names as in specifications
> [Lukas] Move call to pcie_is_cxl() into cxl_port_devsec()
> [Lukas] Correct namespace spelling
> [Lukas] Removed export from pcie_is_cxl_port()
> [Lukas] Simplify 'if' blocks in cxl_handle_error()
> [Lukas] Change panic message to remove redundant 'panic' text
> [Ming] Update to call cxl_dport_init_ras_reporting() in RCH case
> [lkp@intel] 'host' parameter is already removed. Remove parameter description too.
> [Terry] Added field description for cxl_err_handlers in pci.h comment block
>
> Changes in v1 -> v2:
> [Jonathan] Remove extra NULL check and cleanup in cxl_pci_port_ras()
> [Jonathan] Update description to DSP map patch description
> [Jonathan] Update cxl_pci_port_ras() to check for NULL port
> [Jonathan] Dont call handler before handler port changes are present (patch order)
> [Bjorn] Fix linebreak in cover sheet URL
> [Bjorn] Remove timestamps from test logs in cover sheet
> [Bjorn] Retitle AER commits to use "PCI/AER:"
> [Bjorn] Retitle patch#3 to use renaming instead of refactoring
> [Bjorn] Fix base commit-id on cover sheet
> [Bjorn] Add VH spec reference/citation
> [Terry] Removed last 2 patches to enable internal errors. Is not needed
> because internal errors are enabled in AER driver.
> [Dan] Create cxl_do_recovery() and pci_driver::cxl_err_handlers.
> [Dan] Use kernel panic in CXL recovery
> [Dan] cxl_port_hndlrs -> cxl_port_error_handlers
>
>
> Dave Jiang (1):
> cxl: Remove ifdef blocks of CONFIG_PCIEAER_CXL from core/pci.c
>
> Terry Bowman (22):
> CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS
> cxl/pci: Remove unnecessary CXL Endpoint handling helper functions
> cxl/pci: Remove unnecessary CXL RCH handling helper functions
> cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS
> conditional block
> CXL/AER: Introduce rch_aer.c into AER driver for handling CXL RCH
> errors
> CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
> PCI/CXL: Introduce pcie_is_cxl()
> PCI/AER: Report CXL or PCIe bus error type in trace logging
> CXL/AER: Update PCI class code check to use FIELD_GET()
> cxl/pci: Update RAS handler interfaces to also support CXL Ports
> cxl/pci: Log message if RAS registers are unmapped
> cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports
> cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors
> cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers
> cxl/pci: Introduce CXL Endpoint protocol error handlers
> CXL/AER: Introduce cxl_aer.c into AER driver for forwarding CXL errors
> PCI/AER: Dequeue forwarded CXL error
> CXL/PCI: Introduce CXL Port protocol error handlers
> CXL/PCI: Export and rename merge_result() to pci_ers_merge_result()
> CXL/PCI: Introduce CXL uncorrectable protocol error recovery
> CXL/PCI: Enable CXL protocol errors during CXL Port probe
> CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup
>
> drivers/cxl/Kconfig | 10 +
> drivers/cxl/core/Makefile | 2 +-
> drivers/cxl/core/core.h | 46 +++
> drivers/cxl/core/pci.c | 381 ++----------------
> drivers/cxl/core/port.c | 11 +-
> drivers/cxl/core/ras.c | 700 +++++++++++++++++++++++++++++++++-
> drivers/cxl/core/regs.c | 12 +-
> drivers/cxl/core/trace.h | 68 +---
> drivers/cxl/cxl.h | 11 +-
> drivers/cxl/cxlpci.h | 56 ---
> drivers/cxl/mem.c | 6 +-
> drivers/cxl/pci.c | 11 +-
> drivers/cxl/port.c | 5 +
> drivers/pci/pci.c | 19 +-
> drivers/pci/pci.h | 31 +-
> drivers/pci/pcie/Kconfig | 9 -
> drivers/pci/pcie/Makefile | 2 +
> drivers/pci/pcie/aer.c | 164 +-------
> drivers/pci/pcie/cxl_aer.c | 144 +++++++
> drivers/pci/pcie/err.c | 14 +-
> drivers/pci/pcie/rcec.c | 1 +
> drivers/pci/pcie/rch_aer.c | 99 +++++
> drivers/pci/probe.c | 25 ++
> include/linux/aer.h | 29 ++
> include/linux/pci.h | 30 ++
> include/ras/ras_event.h | 9 +-
> include/uapi/linux/pci_regs.h | 65 +++-
> tools/testing/cxl/Kbuild | 2 +-
> 28 files changed, 1286 insertions(+), 676 deletions(-)
> create mode 100644 drivers/pci/pcie/cxl_aer.c
> create mode 100644 drivers/pci/pcie/rch_aer.c
>
> base-commit: f11a5f89910a7ae970fbce4fdc02d86a8ba8570f
Hi Terry, this seems to be some version of cxl/next based on 6.16-rc4. Please rebase the code against linus upstream next time. Should've been a version of 6.17-rc. Thanks.
DJ
> --
> 2.51.0.rc2.21.ge5ab6b3e5a
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 18/23] PCI/AER: Dequeue forwarded CXL error
2025-08-27 1:35 ` [PATCH v11 18/23] PCI/AER: Dequeue forwarded CXL error Terry Bowman
@ 2025-08-29 0:43 ` Dave Jiang
2025-08-29 7:10 ` Lukas Wunner
0 siblings, 1 reply; 53+ messages in thread
From: Dave Jiang @ 2025-08-29 0:43 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
On 8/26/25 6:35 PM, Terry Bowman wrote:
> The AER driver is now designed to forward CXL protocol errors to the CXL
I would rephrase it to:
The AER driver enqueues the CXL protocol error info to the created kfifo for the CXL driver to consume.
> driver. Update the CXL driver with functionality to dequeue the forwarded
> CXL error from the kfifo. Also, update the CXL driver to begin the protocol
> error handling processing using the work received from the FIFO.
>
> Update function cxl_proto_err_work_fn() to dequeue work forwarded by the
> AER service driver. This will begin the CXL protocol error processing with
> a call to cxl_handle_proto_error().
>
> Introduce logic to take the SBDF values from 'struct cxl_proto_error_info'
> and use in discovering the erring PCI device. The call to pci_get_domain_bus_and_slot()
> will return a reference counted 'struct pci_dev *'. This will serve as
> reference count to prevent releasing the CXL Endpoint's mapped RAS while
> handling the error. Use scope base __free() to put the reference count.
> This will change when adding support for CXL port devices in the future.
>
> Implement cxl_handle_proto_error() to differentiate between Restricted CXL
> Host (RCH) protocol errors and CXL virtual host (VH) protocol errors.
> Maintain the existing RCH handling. Export the AER driver's pcie_walk_rcec()
> allowing the CXL driver to walk the RCEC's secondary bus.
>
> VH correctable error (CE) processing will call the CXL CE handler. VH
> uncorrectable errors (UCE) will call cxl_do_recovery(), implemented as a
> stub for now and to be updated in future patch. Export pci_aer_clean_fatal_status()
> and pci_clean_device_status() used to clean up AER status after handling.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>
> ---
> Changes in v10->v11:
> - Reword patch commit message to remove RCiEP details (Jonathan)
> - Add #include <linux/bitfield.h> (Terry)
> - is_cxl_rcd() - Fix short comment message wrap (Jonathan)
> - is_cxl_rcd() - Combine return calls into 1 (Jonathan)
> - cxl_handle_proto_error() - Move comment earlier (Jonathan)
> - Usse FIELD_GET() in discovering class code (Jonathan)
> - Remove BDF from cxl_proto_err_work_data. Use 'struct pci_dev *' (Dan)
> ---
> drivers/cxl/core/ras.c | 68 ++++++++++++++++++++++++++++++++++-------
> drivers/pci/pci.c | 1 +
> drivers/pci/pci.h | 7 -----
> drivers/pci/pcie/aer.c | 1 +
> drivers/pci/pcie/rcec.c | 1 +
> include/linux/aer.h | 2 ++
> include/linux/pci.h | 10 ++++++
> 7 files changed, 72 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index b285448c2d9c..a2e95c49f965 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -118,17 +118,6 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
> }
> static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>
> -int cxl_ras_init(void)
> -{
> - return cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
> -}
> -
> -void cxl_ras_exit(void)
> -{
> - cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
> - cancel_work_sync(&cxl_cper_prot_err_work);
> -}
> -
> static pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);
> static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base);
>
> @@ -331,6 +320,10 @@ void cxl_endpoint_port_init_ras(struct cxl_port *ep)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_endpoint_port_init_ras, "CXL");
>
> +static void cxl_do_recovery(struct device *dev)
> +{
> +}
> +
> static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> {
> void __iomem *addr;
> @@ -472,3 +465,56 @@ pci_ers_result_t pci_error_detected(struct pci_dev *pdev,
> return rc;
> }
> EXPORT_SYMBOL_NS_GPL(pci_error_detected, "CXL");
> +
> +static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
> +{
> + struct pci_dev *pdev = err_info->pdev;
> + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
So this function is called from the workqueue thread to consume data from the kfifo right? Do we need to take the device lock of the pdev to ensure that a driver is bound to the device before we attempt to retrieve the data? And do we also need to verify that the driver bound is the cxl_pci driver (and not something like vfio_pci)? Otherwise I think assuming the drv data is cxl_dev_state may cause crash.
DJ
> + struct cxl_memdev *cxlmd = cxlds->cxlmd;
> + struct device *host_dev __free(put_device) = get_device(&cxlmd->dev);
> +> + if (err_info->severity == AER_CORRECTABLE) {
> + int aer = pdev->aer_cap;
> +
> + if (aer)
> + pci_clear_and_set_config_dword(pdev,
> + aer + PCI_ERR_COR_STATUS,
> + 0, PCI_ERR_COR_INTERNAL);
> +
> + cxl_cor_error_detected(&cxlmd->dev);
> +
> + pcie_clear_device_status(pdev);
> + } else {
> + cxl_do_recovery(&cxlmd->dev);
> + }
> +}
> +
> +static void cxl_proto_err_work_fn(struct work_struct *work)
> +{
> + struct cxl_proto_err_work_data wd;
> +
> + while (cxl_proto_err_kfifo_get(&wd))
> + cxl_handle_proto_error(&wd);
> +}
> +
> +static struct work_struct cxl_proto_err_work;
> +static DECLARE_WORK(cxl_proto_err_work, cxl_proto_err_work_fn);
> +
> +int cxl_ras_init(void)
> +{
> + if (cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work))
> + pr_err("Failed to initialize CXL RAS CPER\n");
> +
> + cxl_register_proto_err_work(&cxl_proto_err_work);
> +
> + return 0;
> +}
> +
> +void cxl_ras_exit(void)
> +{
> + cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
> + cancel_work_sync(&cxl_cper_prot_err_work);
> +
> + cxl_unregister_proto_err_work();
> + cancel_work_sync(&cxl_proto_err_work);
> +}
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index d775ed37a79b..2c9827690cb3 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2328,6 +2328,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
> pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> }
> +EXPORT_SYMBOL_NS_GPL(pcie_clear_device_status, "CXL");
> #endif
>
> /**
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index cfa75903dd3f..69ff7c2d214f 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -671,16 +671,10 @@ static inline bool pci_dpc_recovered(struct pci_dev *pdev) { return false; }
> void pci_rcec_init(struct pci_dev *dev);
> void pci_rcec_exit(struct pci_dev *dev);
> void pcie_link_rcec(struct pci_dev *rcec);
> -void pcie_walk_rcec(struct pci_dev *rcec,
> - int (*cb)(struct pci_dev *, void *),
> - void *userdata);
> #else
> static inline void pci_rcec_init(struct pci_dev *dev) { }
> static inline void pci_rcec_exit(struct pci_dev *dev) { }
> static inline void pcie_link_rcec(struct pci_dev *rcec) { }
> -static inline void pcie_walk_rcec(struct pci_dev *rcec,
> - int (*cb)(struct pci_dev *, void *),
> - void *userdata) { }
> #endif
>
> #ifdef CONFIG_PCI_ATS
> @@ -1022,7 +1016,6 @@ void pci_restore_aer_state(struct pci_dev *dev);
> static inline void pci_no_aer(void) { }
> static inline void pci_aer_init(struct pci_dev *d) { }
> static inline void pci_aer_exit(struct pci_dev *d) { }
> -static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
> static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
> static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
> static inline void pci_save_aer_state(struct pci_dev *dev) { }
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 627d89ccea9c..45abe1622316 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -288,6 +288,7 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
> if (status)
> pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> }
> +EXPORT_SYMBOL_GPL(pci_aer_clear_fatal_status);
>
> /**
> * pci_aer_raw_clear_status - Clear AER error registers.
> diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
> index d0bcd141ac9c..fb6cf6449a1d 100644
> --- a/drivers/pci/pcie/rcec.c
> +++ b/drivers/pci/pcie/rcec.c
> @@ -145,6 +145,7 @@ void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *, void *),
>
> walk_rcec(walk_rcec_helper, &rcec_data);
> }
> +EXPORT_SYMBOL_NS_GPL(pcie_walk_rcec, "CXL");
>
> void pci_rcec_init(struct pci_dev *dev)
> {
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index f8eb32805957..1f79f0be4bf7 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -66,12 +66,14 @@ struct cxl_proto_err_work_data {
>
> #if defined(CONFIG_PCIEAER)
> int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> +void pci_aer_clear_fatal_status(struct pci_dev *dev);
> int pcie_aer_is_native(struct pci_dev *dev);
> #else
> static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> {
> return -EINVAL;
> }
> +static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
> static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
> #endif
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 3dcab36c437f..3407d687459d 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1804,6 +1804,9 @@ extern bool pcie_ports_native;
>
> int pcie_set_target_speed(struct pci_dev *port, enum pci_bus_speed speed_req,
> bool use_lt);
> +void pcie_walk_rcec(struct pci_dev *rcec,
> + int (*cb)(struct pci_dev *, void *),
> + void *userdata);
> #else
> #define pcie_ports_disabled true
> #define pcie_ports_native false
> @@ -1814,8 +1817,15 @@ static inline int pcie_set_target_speed(struct pci_dev *port,
> {
> return -EOPNOTSUPP;
> }
> +
> +static inline void pcie_walk_rcec(struct pci_dev *rcec,
> + int (*cb)(struct pci_dev *, void *),
> + void *userdata) { }
> +
> #endif
>
> +void pcie_clear_device_status(struct pci_dev *dev);
> +
> #define PCIE_LINK_STATE_L0S (BIT(0) | BIT(1)) /* Upstr/dwnstr L0s */
> #define PCIE_LINK_STATE_L1 BIT(2) /* L1 state */
> #define PCIE_LINK_STATE_L1_1 BIT(3) /* ASPM L1.1 state */
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 18/23] PCI/AER: Dequeue forwarded CXL error
2025-08-29 0:43 ` Dave Jiang
@ 2025-08-29 7:10 ` Lukas Wunner
0 siblings, 0 replies; 53+ messages in thread
From: Lukas Wunner @ 2025-08-29 7:10 UTC (permalink / raw)
To: Dave Jiang
Cc: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny,
linux-kernel, linux-pci
On Thu, Aug 28, 2025 at 05:43:31PM -0700, Dave Jiang wrote:
> On 8/26/25 6:35 PM, Terry Bowman wrote:
> > +static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
> > +{
> > + struct pci_dev *pdev = err_info->pdev;
> > + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>
> So this function is called from the workqueue thread to consume data
> from the kfifo right? Do we need to take the device lock of the pdev
> to ensure that a driver is bound to the device before we attempt to
> retrieve the data? And do we also need to verify that the driver bound
> is the cxl_pci driver (and not something like vfio_pci)? Otherwise I
> think assuming the drv data is cxl_dev_state may cause crash.
In v10 of this series, there used to be a cxl_pci_drv_bound() function
to verify that the cxl_pci_driver is bound and not some other driver.
That function was called from cxl_rch_handle_error_iter().
It seems this is gone in v11?
Thanks,
Lukas
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 06/23] CXL/AER: Introduce rch_aer.c into AER driver for handling CXL RCH errors
2025-08-28 20:53 ` Dave Jiang
@ 2025-08-29 8:39 ` Lukas Wunner
0 siblings, 0 replies; 53+ messages in thread
From: Lukas Wunner @ 2025-08-29 8:39 UTC (permalink / raw)
To: Dave Jiang
Cc: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny,
linux-kernel, linux-pci
On Thu, Aug 28, 2025 at 01:53:35PM -0700, Dave Jiang wrote:
> On 8/26/25 6:35 PM, Terry Bowman wrote:
> > drivers/cxl/Kconfig | 9 +++-
> > drivers/cxl/core/ras.c | 3 ++
> > drivers/pci/pci.h | 20 +++++++
> > drivers/pci/pcie/Makefile | 1 +
> > drivers/pci/pcie/aer.c | 108 +++----------------------------------
> > drivers/pci/pcie/rch_aer.c | 99 ++++++++++++++++++++++++++++++++++
>
> I wonder if this should be cxl_rch_aer.c to be clear that it's cxl
> related code.
I'd prefer an "aer_" prefix at the beginning of filenames,
but maybe that's just me...
Thanks,
Lukas
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 02/23] CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS
2025-08-27 1:35 ` [PATCH v11 02/23] CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS Terry Bowman
@ 2025-08-29 15:24 ` Jonathan Cameron
2025-08-29 18:16 ` Sathyanarayanan Kuppuswamy
1 sibling, 0 replies; 53+ messages in thread
From: Jonathan Cameron @ 2025-08-29 15:24 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
alucerop, ira.weiny, linux-kernel, linux-pci
On Tue, 26 Aug 2025 20:35:17 -0500
Terry Bowman <terry.bowman@amd.com> wrote:
> CXL RAS compilation is enabled using CONFIG_CXL_RAS while the AER CXL logic
> uses CONFIG_PCIEAER_CXL. The 2 share the same dependencies and can be
> combined. The 2 kernel configs are unnecessary and are problematic for the
> user because of the duplication. Replace occurrences of CONFIG_PCIEAER_CXL
> to be CONFIG_CXL_RAS.
>
> Update the CONFIG_CXL_RAS Kconfig definition to include dependencies 'PCIEAER
> && CXL_PCI' taken from the CONFIG_PCIEAER_CXL definition.
>
> Remove the Kconfig CONFIG_PCIEAER_CXL definition.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Bjorn, do you mind having some code in PCIE land built depending on
a config symbol from drivers/cxl? Alternative to this would be to remove
the visible symbol (drop text after bool) and select it from
the CONFIG_CXL_RAS entry.
>
> ---
> Changes in v10 -> v11:
> - New patch
> ---
> drivers/pci/pcie/Kconfig | 9 ---------
> drivers/pci/pcie/aer.c | 2 +-
> 2 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 17919b99fa66..207c2deae35f 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -49,15 +49,6 @@ config PCIEAER_INJECT
> gotten from:
> https://github.com/intel/aer-inject.git
>
> -config PCIEAER_CXL
> - bool "PCI Express CXL RAS support"
> - default y
> - depends on PCIEAER && CXL_PCI
> - help
> - Enables CXL error handling.
> -
> - If unsure, say Y.
> -
> #
> # PCI Express ECRC
> #
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 70ac66188367..7fe9f883f5c5 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1086,7 +1086,7 @@ static bool find_source_device(struct pci_dev *parent,
> return true;
> }
>
> -#ifdef CONFIG_PCIEAER_CXL
> +#ifdef CONFIG_CXL_RAS
>
> /**
> * pci_aer_unmask_internal_errors - unmask internal errors
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 05/23] cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS conditional block
2025-08-27 1:35 ` [PATCH v11 05/23] cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS conditional block Terry Bowman
2025-08-28 8:57 ` Alejandro Lucero Palau
@ 2025-08-29 15:33 ` Jonathan Cameron
1 sibling, 0 replies; 53+ messages in thread
From: Jonathan Cameron @ 2025-08-29 15:33 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
alucerop, ira.weiny, linux-kernel, linux-pci
On Tue, 26 Aug 2025 20:35:20 -0500
Terry Bowman <terry.bowman@amd.com> wrote:
> Restricted CXL Host (RCH) protocol error handling uses a procedure distinct
> from the CXL Virtual Hierarchy (VH) handling. This is because of the
> differences in the RCH and VH topologies. Improve the maintainability and
> add ability to enable/disable RCH handling.
>
> Move and combine the RCH handling code into a single block conditionally
> compiled with the CONFIG_CXL_RCH_RAS kernel config.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>
How painful to move this to a ras_rch.c file and conditionally compile that?
Would want to do that is some merged thing with patch 1 though, rather than
moving at least some of the code twice.
> ---
> v10->v11:
> - New patch
> ---
> drivers/cxl/core/ras.c | 175 +++++++++++++++++++++--------------------
> 1 file changed, 90 insertions(+), 85 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 0875ce8116ff..f42f9a255ef8 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -126,6 +126,7 @@ void cxl_ras_exit(void)
> cancel_work_sync(&cxl_cper_prot_err_work);
> }
>
> +#ifdef CONFIG_CXL_RCH_RAS
> static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
> {
> resource_size_t aer_phys;
> @@ -141,18 +142,6 @@ static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
> }
> }
>
> -static void cxl_dport_map_ras(struct cxl_dport *dport)
> -{
> - struct cxl_register_map *map = &dport->reg_map;
> - struct device *dev = dport->dport_dev;
> -
> - if (!map->component_map.ras.valid)
> - dev_dbg(dev, "RAS registers not found\n");
> - else if (cxl_map_component_regs(map, &dport->regs.component,
> - BIT(CXL_CM_CAP_CAP_ID_RAS)))
> - dev_dbg(dev, "Failed to map RAS capability.\n");
> -}
> -
> static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
> {
> void __iomem *aer_base = dport->regs.dport_aer;
> @@ -177,6 +166,95 @@ static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
> writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
> }
>
> +/*
> + * Copy the AER capability registers using 32 bit read accesses.
> + * This is necessary because RCRB AER capability is MMIO mapped. Clear the
> + * status after copying.
> + *
> + * @aer_base: base address of AER capability block in RCRB
> + * @aer_regs: destination for copying AER capability
> + */
> +static bool cxl_rch_get_aer_info(void __iomem *aer_base,
> + struct aer_capability_regs *aer_regs)
> +{
> + int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32);
> + u32 *aer_regs_buf = (u32 *)aer_regs;
> + int n;
> +
> + if (!aer_base)
> + return false;
> +
> + /* Use readl() to guarantee 32-bit accesses */
> + for (n = 0; n < read_cnt; n++)
> + aer_regs_buf[n] = readl(aer_base + n * sizeof(u32));
> +
> + writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS);
> + writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS);
> +
> + return true;
> +}
> +
> +/* Get AER severity. Return false if there is no error. */
> +static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
> + int *severity)
> +{
> + if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
> + if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
> + *severity = AER_FATAL;
> + else
> + *severity = AER_NONFATAL;
> + return true;
> + }
> +
> + if (aer_regs->cor_status & ~aer_regs->cor_mask) {
> + *severity = AER_CORRECTABLE;
> + return true;
> + }
> +
> + return false;
> +}
> +
> +static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
> +{
> + struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> + struct aer_capability_regs aer_regs;
> + struct cxl_dport *dport;
> + int severity;
> +
> + struct cxl_port *port __free(put_cxl_port) =
> + cxl_pci_find_port(pdev, &dport);
> + if (!port)
> + return;
> +
> + if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
> + return;
> +
> + if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
> + return;
> +
> + pci_print_aer(pdev, severity, &aer_regs);
> + if (severity == AER_CORRECTABLE)
> + cxl_handle_cor_ras(cxlds, dport->regs.ras);
> + else
> + cxl_handle_ras(cxlds, dport->regs.ras);
> +}
> +#else
> +static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
> +static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
> +static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
> +#endif
> +
> +static void cxl_dport_map_ras(struct cxl_dport *dport)
> +{
> + struct cxl_register_map *map = &dport->reg_map;
> + struct device *dev = dport->dport_dev;
> +
> + if (!map->component_map.ras.valid)
> + dev_dbg(dev, "RAS registers not found\n");
> + else if (cxl_map_component_regs(map, &dport->regs.component,
> + BIT(CXL_CM_CAP_CAP_ID_RAS)))
> + dev_dbg(dev, "Failed to map RAS capability.\n");
> +}
>
> /**
> * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
> @@ -270,79 +348,6 @@ static bool cxl_handle_ras(struct cxl_dev_state *cxlds, void __iomem *ras_base)
> return true;
> }
>
> -/*
> - * Copy the AER capability registers using 32 bit read accesses.
> - * This is necessary because RCRB AER capability is MMIO mapped. Clear the
> - * status after copying.
> - *
> - * @aer_base: base address of AER capability block in RCRB
> - * @aer_regs: destination for copying AER capability
> - */
> -static bool cxl_rch_get_aer_info(void __iomem *aer_base,
> - struct aer_capability_regs *aer_regs)
> -{
> - int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32);
> - u32 *aer_regs_buf = (u32 *)aer_regs;
> - int n;
> -
> - if (!aer_base)
> - return false;
> -
> - /* Use readl() to guarantee 32-bit accesses */
> - for (n = 0; n < read_cnt; n++)
> - aer_regs_buf[n] = readl(aer_base + n * sizeof(u32));
> -
> - writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS);
> - writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS);
> -
> - return true;
> -}
> -
> -/* Get AER severity. Return false if there is no error. */
> -static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
> - int *severity)
> -{
> - if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
> - if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
> - *severity = AER_FATAL;
> - else
> - *severity = AER_NONFATAL;
> - return true;
> - }
> -
> - if (aer_regs->cor_status & ~aer_regs->cor_mask) {
> - *severity = AER_CORRECTABLE;
> - return true;
> - }
> -
> - return false;
> -}
> -
> -static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
> -{
> - struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> - struct aer_capability_regs aer_regs;
> - struct cxl_dport *dport;
> - int severity;
> -
> - struct cxl_port *port __free(put_cxl_port) =
> - cxl_pci_find_port(pdev, &dport);
> - if (!port)
> - return;
> -
> - if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
> - return;
> -
> - if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
> - return;
> -
> - pci_print_aer(pdev, severity, &aer_regs);
> - if (severity == AER_CORRECTABLE)
> - cxl_handle_cor_ras(cxlds, dport->regs.ras);
> - else
> - cxl_handle_ras(cxlds, dport->regs.ras);
> -}
> -
> void cxl_cor_error_detected(struct pci_dev *pdev)
> {
> struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 07/23] CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
2025-08-27 14:51 ` Lukas Wunner
@ 2025-08-29 15:42 ` Jonathan Cameron
2025-08-29 15:47 ` Jonathan Cameron
1 sibling, 0 replies; 53+ messages in thread
From: Jonathan Cameron @ 2025-08-29 15:42 UTC (permalink / raw)
To: Lukas Wunner
Cc: Terry Bowman, dave, dave.jiang, alison.schofield, dan.j.williams,
bhelgaas, shiju.jose, ming.li, Smita.KoralahalliChannabasappa,
rrichter, dan.carpenter, PradeepVineshReddy.Kodamati,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
alucerop, ira.weiny, linux-kernel, linux-pci
On Wed, 27 Aug 2025 16:51:15 +0200
Lukas Wunner <lukas@wunner.de> wrote:
> On Tue, Aug 26, 2025 at 08:35:22PM -0500, Terry Bowman wrote:
> > The CXL DVSECs are currently defined in cxl/core/cxlpci.h. These are not
> > accessible to other subsystems.
> >
> > Change DVSEC name formatting to follow the existing PCI format in
> > pci_regs.h. The current format uses CXL_DVSEC_XYZ. Change to be PCI_DVSEC_CXL_XYZ.
> [...]
> > +++ b/include/uapi/linux/pci_regs.h
> > @@ -1225,9 +1225,61 @@
> > /* Deprecated old name, replaced with PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE */
> > #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE
> >
> > -/* Compute Express Link (CXL r3.1, sec 8.1.5) */
> > -#define PCI_DVSEC_CXL_PORT 3
> > -#define PCI_DVSEC_CXL_PORT_CTL 0x0c
> > -#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
> > +/* Compute Express Link (CXL r3.2, sec 8.1)
> > + *
> > + * Note that CXL DVSEC id 3 and 7 to be ignored when the CXL link state
> > + * is "disconnected" (CXL r3.2, sec 9.12.3). Re-enumerate these
> > + * registers on downstream link-up events.
> > + */
> > +
> > +#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
> > +
> > +/* CXL 3.2 8.1.3: PCIe DVSEC for CXL Device */
> > +#define PCI_DVSEC_CXL_DEVICE 0
> > +#define PCI_DVSEC_CXL_CAP_OFFSET 0xA
> > +#define PCI_DVSEC_CXL_MEM_CAPABLE BIT(2)
> > +#define PCI_DVSEC_CXL_HDM_COUNT_MASK GENMASK(5, 4)
> > +#define PCI_DVSEC_CXL_CTRL_OFFSET 0xC
> > +#define PCI_DVSEC_CXL_MEM_ENABLE BIT(2)
> > +#define PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
> > +#define PCI_DVSEC_CXL_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
> > +#define PCI_DVSEC_CXL_MEM_INFO_VALID BIT(0)
> > +#define PCI_DVSEC_CXL_MEM_ACTIVE BIT(1)
> > +#define PCI_DVSEC_CXL_MEM_SIZE_LOW_MASK GENMASK(31, 28)
> > +#define PCI_DVSEC_CXL_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
> > +#define PCI_DVSEC_CXL_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
> > +#define PCI_DVSEC_CXL_MEM_BASE_LOW_MASK GENMASK(31, 28)
>
> Is it legal to use BIT() in a uapi header?
>
> We've only allowed use of GENMASK() in uapi headers since 2023 with
> commit 3c7a8e190bc5 ("uapi: introduce uapi-friendly macros for GENMASK").
>
> But there is no uapi header in the kernel tree defining BIT().
>
> I note that include/uapi/cxl/features.h has plenty of occurrences of BIT()
> since commit 9b8e73cdb141 ("cxl: Move cxl feature command structs to user
> header"), which went into v6.15.
>
> ndctl contains a bitmap.h header defining BIT(), I guess that's why this
> wasn't perceived as a problem so far:
> https://github.com/pmem/ndctl/raw/main/util/bitmap.h
>
> However existing user space applications including <linux/pci_regs.h>
> may not have a BIT() definition and I suspect your change above will
> break the build of those applications.
Probably not breaking existing code, as it would have to actually use the
define to run into problems. However, it's still a an excellent point
as adding register definitions that can't be used is not helpful.
Jonathan
>
> Thanks,
>
> Lukas
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 07/23] CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
2025-08-27 14:51 ` Lukas Wunner
2025-08-29 15:42 ` Jonathan Cameron
@ 2025-08-29 15:47 ` Jonathan Cameron
1 sibling, 0 replies; 53+ messages in thread
From: Jonathan Cameron @ 2025-08-29 15:47 UTC (permalink / raw)
To: Lukas Wunner
Cc: Terry Bowman, dave, dave.jiang, alison.schofield, dan.j.williams,
bhelgaas, shiju.jose, ming.li, Smita.KoralahalliChannabasappa,
rrichter, dan.carpenter, PradeepVineshReddy.Kodamati,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
alucerop, ira.weiny, linux-kernel, linux-pci, shiju.jose
On Wed, 27 Aug 2025 16:51:15 +0200
Lukas Wunner <lukas@wunner.de> wrote:
> On Tue, Aug 26, 2025 at 08:35:22PM -0500, Terry Bowman wrote:
> > The CXL DVSECs are currently defined in cxl/core/cxlpci.h. These are not
> > accessible to other subsystems.
> >
> > Change DVSEC name formatting to follow the existing PCI format in
> > pci_regs.h. The current format uses CXL_DVSEC_XYZ. Change to be PCI_DVSEC_CXL_XYZ.
> [...]
> > +++ b/include/uapi/linux/pci_regs.h
> > @@ -1225,9 +1225,61 @@
> > /* Deprecated old name, replaced with PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE */
> > #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE
> >
> > -/* Compute Express Link (CXL r3.1, sec 8.1.5) */
> > -#define PCI_DVSEC_CXL_PORT 3
> > -#define PCI_DVSEC_CXL_PORT_CTL 0x0c
> > -#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
> > +/* Compute Express Link (CXL r3.2, sec 8.1)
> > + *
> > + * Note that CXL DVSEC id 3 and 7 to be ignored when the CXL link state
> > + * is "disconnected" (CXL r3.2, sec 9.12.3). Re-enumerate these
> > + * registers on downstream link-up events.
> > + */
> > +
> > +#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
> > +
> > +/* CXL 3.2 8.1.3: PCIe DVSEC for CXL Device */
> > +#define PCI_DVSEC_CXL_DEVICE 0
> > +#define PCI_DVSEC_CXL_CAP_OFFSET 0xA
> > +#define PCI_DVSEC_CXL_MEM_CAPABLE BIT(2)
> > +#define PCI_DVSEC_CXL_HDM_COUNT_MASK GENMASK(5, 4)
> > +#define PCI_DVSEC_CXL_CTRL_OFFSET 0xC
> > +#define PCI_DVSEC_CXL_MEM_ENABLE BIT(2)
> > +#define PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
> > +#define PCI_DVSEC_CXL_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
> > +#define PCI_DVSEC_CXL_MEM_INFO_VALID BIT(0)
> > +#define PCI_DVSEC_CXL_MEM_ACTIVE BIT(1)
> > +#define PCI_DVSEC_CXL_MEM_SIZE_LOW_MASK GENMASK(31, 28)
> > +#define PCI_DVSEC_CXL_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
> > +#define PCI_DVSEC_CXL_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
> > +#define PCI_DVSEC_CXL_MEM_BASE_LOW_MASK GENMASK(31, 28)
>
> Is it legal to use BIT() in a uapi header?
>
> We've only allowed use of GENMASK() in uapi headers since 2023 with
> commit 3c7a8e190bc5 ("uapi: introduce uapi-friendly macros for GENMASK").
>
I was curious, so went looking at that series.
There was a second patch that has what we need here
https://lore.kernel.org/all/20240131225010.2872733-3-pbonzini@redhat.com/
So upshot is we should use _BITUL() for user space headers defined in
uapi/linux/const.h, also should be using __GENMASK() not GENMASK()
in uapi/cxl/features.h
Anyhow fancy fixing up the existing cxl headers? I'll get to it maybe but
will be a while as have a massive review backlog.
Jonathan
> But there is no uapi header in the kernel tree defining BIT().
>
> I note that include/uapi/cxl/features.h has plenty of occurrences of BIT()
> since commit 9b8e73cdb141 ("cxl: Move cxl feature command structs to user
> header"), which went into v6.15.
>
> ndctl contains a bitmap.h header defining BIT(), I guess that's why this
> wasn't perceived as a problem so far:
> https://github.com/pmem/ndctl/raw/main/util/bitmap.h
>
> However existing user space applications including <linux/pci_regs.h>
> may not have a BIT() definition and I suspect your change above will
> break the build of those applications.
>
> Thanks,
>
> Lukas
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 10/23] CXL/AER: Update PCI class code check to use FIELD_GET()
2025-08-27 1:35 ` [PATCH v11 10/23] CXL/AER: Update PCI class code check to use FIELD_GET() Terry Bowman
@ 2025-08-29 16:03 ` Jonathan Cameron
0 siblings, 0 replies; 53+ messages in thread
From: Jonathan Cameron @ 2025-08-29 16:03 UTC (permalink / raw)
To: Terry Bowman
Cc: dave, dave.jiang, alison.schofield, dan.j.williams, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, linux-cxl,
alucerop, ira.weiny, linux-kernel, linux-pci
On Tue, 26 Aug 2025 20:35:25 -0500
Terry Bowman <terry.bowman@amd.com> wrote:
> Update the AER driver's is_cxl_mem_dev() to use FIELD_GET() while checking
> for a CXL Endpoint class code.
>
> Introduce a genmask bitmask for checking PCI class codes and locate in
> include/uapi/linux/pci_regs.h.
>
> Update the function documentation to reference the latest CXL
> specification.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
In general I like this change, but maybe we should make a broader use of
it even in this introductory patch. E.g. use FIELD_PREP()
in cxl_mem_pci_tbl() and perhaps add a define for the mask
of the programming interface as well?
Maybe use it in pci_ids.h (which raises the question of whether
it's in the right place) for PCI_CLASS_BRIDGE_PCI_NORMAL / SUBTRACTIVE
and the various USB definitions that include the programming interface.
Perhaps wider use is something for a follow up, but nice to at least
use it everywhere relevant in the CXL code to make the justification stronger
from the start.
>
> ---
> Changes in v10->v11:
> - Add #include <linux/bitfield.h> to cxl_ras.c
I think you missed and hit wrong file!
> - Removed line wrapping at "(CXL 3.2, 8.1.12.1)".
> ---
> drivers/pci/pcie/aer.c | 1 +
> drivers/pci/pcie/rch_aer.c | 6 +++---
> include/uapi/linux/pci_regs.h | 2 ++
> 3 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 1b5f5b0cdc4f..ed1de9256898 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -30,6 +30,7 @@
> #include <linux/kfifo.h>
> #include <linux/ratelimit.h>
> #include <linux/slab.h>
> +#include <linux/bitfield.h>
This seems unrelated as no use of new stuff in here.
Also, looks from what I can see here to be alphabetical order.
> #include <acpi/apei.h>
> #include <acpi/ghes.h>
> #include <ras/ras_event.h>
> diff --git a/drivers/pci/pcie/rch_aer.c b/drivers/pci/pcie/rch_aer.c
> index bfe071eebf67..c3e2d4cbe8cc 100644
> --- a/drivers/pci/pcie/rch_aer.c
> +++ b/drivers/pci/pcie/rch_aer.c
> @@ -17,10 +17,10 @@ static bool is_cxl_mem_dev(struct pci_dev *dev)
> return false;
>
> /*
> - * CXL Memory Devices must have the 502h class code set (CXL
> - * 3.0, 8.1.12.1).
> + * CXL Memory Devices must have the 502h class code set
> + * (CXL 3.2, 8.1.12.1).
> */
> - if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> + if (FIELD_GET(PCI_CLASS_CODE_MASK, dev->class) != PCI_CLASS_MEMORY_CXL)
> return false;
>
> return true;
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 252c06402b13..c7b635f6cf36 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -73,6 +73,8 @@
> #define PCI_CLASS_PROG 0x09 /* Reg. Level Programming Interface */
> #define PCI_CLASS_DEVICE 0x0a /* Device class */
>
> +#define PCI_CLASS_CODE_MASK __GENMASK(23, 8)
> +
> #define PCI_CACHE_LINE_SIZE 0x0c /* 8 bits */
> #define PCI_LATENCY_TIMER 0x0d /* 8 bits */
> #define PCI_HEADER_TYPE 0x0e /* 8 bits */
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 13/23] cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports
2025-08-27 11:55 ` Shiju Jose
@ 2025-08-29 16:06 ` Jonathan Cameron
0 siblings, 0 replies; 53+ messages in thread
From: Jonathan Cameron @ 2025-08-29 16:06 UTC (permalink / raw)
To: Shiju Jose
Cc: Terry Bowman, dave@stgolabs.net, dave.jiang@intel.com,
alison.schofield@intel.com, dan.j.williams@intel.com,
bhelgaas@google.com, ming.li@zohomail.com,
Smita.KoralahalliChannabasappa@amd.com, rrichter@amd.com,
dan.carpenter@linaro.org, PradeepVineshReddy.Kodamati@amd.com,
lukas@wunner.de, Benjamin.Cheatham@amd.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
linux-cxl@vger.kernel.org, alucerop@amd.com, ira.weiny@intel.com,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
> >Subject: [PATCH v11 13/23] cxl/pci: Unify CXL trace logging for CXL Endpoints
> >and CXL Ports
> >
> >CXL currently has separate trace routines for CXL Port errors and CXL Endpoint
> >errors. This is inconvenient for the user because they must enable
> >2 sets of trace routines. Make updates to the trace logging such that a single
> >trace routine logs both CXL Endpoint and CXL Port protocol errors.
> >
> >Keep the trace log fields 'memdev' and 'host'. While these are not accurate for
> >non-Endpoints the fields will remain as-is to prevent breaking userspace RAS
> >trace consumers.
> >
> >Add serial number parameter to the trace logging. This is used for EPs and 0 is
> >provided for CXL port devices without a serial number.
> >
> >Leave the correctable and uncorrectable trace routines' TP_STRUCT__entry()
> >unchanged with respect to member data types and order.
> >
> >Below is output of correctable and uncorrectable protocol error logging.
> >CXL Root Port and CXL Endpoint examples are included below.
> >
> >Root Port:
> >cxl_aer_correctable_error: memdev=0000:0c:00.0 host=pci0000:0c serial: 0
> >status='CRC Threshold Hit'
> >cxl_aer_uncorrectable_error: memdev=0000:0c:00.0 host=pci0000:0c serial: 0
> >status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity
> >Error'
> >
> >Endpoint:
> >cxl_aer_correctable_error: memdev=mem3 host=0000:0f:00.0 serial=0
> >status='CRC Threshold Hit'
> >cxl_aer_uncorrectable_error: memdev=mem3 host=0000:0f:00.0 serial: 0 status:
> >'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
> >
> >Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>
> Reviewed-by: Shiju Jose <shiju.jose@huawei.com>,
> apart from one error below.
>
Good spot.
With that fixed up,
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 02/23] CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS
2025-08-27 1:35 ` [PATCH v11 02/23] CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS Terry Bowman
2025-08-29 15:24 ` Jonathan Cameron
@ 2025-08-29 18:16 ` Sathyanarayanan Kuppuswamy
1 sibling, 0 replies; 53+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-08-29 18:16 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
alison.schofield, dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham, linux-cxl,
alucerop, ira.weiny
Cc: linux-kernel, linux-pci
On 8/26/25 6:35 PM, Terry Bowman wrote:
> CXL RAS compilation is enabled using CONFIG_CXL_RAS while the AER CXL logic
> uses CONFIG_PCIEAER_CXL. The 2 share the same dependencies and can be
> combined. The 2 kernel configs are unnecessary and are problematic for the
> user because of the duplication. Replace occurrences of CONFIG_PCIEAER_CXL
> to be CONFIG_CXL_RAS.
>
> Update the CONFIG_CXL_RAS Kconfig definition to include dependencies 'PCIEAER
> && CXL_PCI' taken from the CONFIG_PCIEAER_CXL definition.
>
> Remove the Kconfig CONFIG_PCIEAER_CXL definition.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>
> ---
Looks good to me.
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Changes in v10 -> v11:
> - New patch
> ---
> drivers/pci/pcie/Kconfig | 9 ---------
> drivers/pci/pcie/aer.c | 2 +-
> 2 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 17919b99fa66..207c2deae35f 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -49,15 +49,6 @@ config PCIEAER_INJECT
> gotten from:
> https://github.com/intel/aer-inject.git
>
> -config PCIEAER_CXL
> - bool "PCI Express CXL RAS support"
> - default y
> - depends on PCIEAER && CXL_PCI
> - help
> - Enables CXL error handling.
> -
> - If unsure, say Y.
> -
> #
> # PCI Express ECRC
> #
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 70ac66188367..7fe9f883f5c5 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1086,7 +1086,7 @@ static bool find_source_device(struct pci_dev *parent,
> return true;
> }
>
> -#ifdef CONFIG_PCIEAER_CXL
> +#ifdef CONFIG_CXL_RAS
>
> /**
> * pci_aer_unmask_internal_errors - unmask internal errors
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 19/23] CXL/PCI: Introduce CXL Port protocol error handlers
2025-08-27 1:35 ` [PATCH v11 19/23] CXL/PCI: Introduce CXL Port protocol error handlers Terry Bowman
@ 2025-08-30 0:17 ` Dave Jiang
0 siblings, 0 replies; 53+ messages in thread
From: Dave Jiang @ 2025-08-30 0:17 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
On 8/26/25 6:35 PM, Terry Bowman wrote:
> Introduce CXL error handlers for CXL Port devices.
>
> Add functions cxl_port_cor_error_detected() and cxl_port_error_detected().
> These will serve as the handlers for all CXL Port devices. Introduce
> cxl_get_ras_base() to provide the RAS base address needed by the handlers.
>
> Update cxl_handle_proto_error() to call the CXL Port or CXL Endpoint
> handler depending on which CXL device reports the error.
>
> Implement cxl_get_ras_base() to return the cached RAS register address of a
> CXL Root Port, CXL Downstream Port, or CXL Upstream Port.
>
> Introduce get_pci_cxl_host_dev() to return the host responsible for
> releasing the RAS mapped resources. CXL endpoints do not use a host to
> manage its resources, allow for NULL in the case of an EP. Use reference
> count increment on the host to prevent resource release. Make the caller
> responsible for the reference decrement.
>
> Update the AER driver's is_cxl_error() PCI type check because CXL Port
> devices are now supported.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>
> ---
> Changes in v10->v11:
> - None
> ---
> drivers/cxl/core/core.h | 9 ++
> drivers/cxl/core/port.c | 4 +-
> drivers/cxl/core/ras.c | 176 +++++++++++++++++++++++++++++++++++--
> drivers/pci/pcie/cxl_aer.c | 5 +-
> 4 files changed, 186 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 6e3e7f2e0e2d..7e66fbb07b8a 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -155,6 +155,8 @@ pci_ers_result_t pci_error_detected(struct pci_dev *pdev,
> void pci_cor_error_detected(struct pci_dev *pdev);
> void cxl_cor_error_detected(struct device *dev);
> pci_ers_result_t cxl_error_detected(struct device *dev);
> +void cxl_port_cor_error_detected(struct device *dev);
> +pci_ers_result_t cxl_port_error_detected(struct device *dev);
> #else
> static inline int cxl_ras_init(void)
> {
> @@ -179,9 +181,16 @@ static inline pci_ers_result_t cxl_error_detected(struct device *dev)
> {
> return PCI_ERS_RESULT_NONE;
> }
> +static inline void cxl_port_cor_error_detected(struct device *dev) { }
> +static inline pci_ers_result_t cxl_port_error_detected(struct device *dev)
> +{
> + return PCI_ERS_RESULT_NONE;
> +}
> #endif // CONFIG_CXL_RAS
>
> int cxl_gpf_port_setup(struct cxl_dport *dport);
> +struct cxl_port *find_cxl_port(struct device *dport_dev,
> + struct cxl_dport **dport);
>
> #ifdef CONFIG_CXL_FEATURES
> struct cxl_feat_entry *
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 29197376b18e..758fb73374c1 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1335,8 +1335,8 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
> return NULL;
> }
>
> -static struct cxl_port *find_cxl_port(struct device *dport_dev,
> - struct cxl_dport **dport)
> +struct cxl_port *find_cxl_port(struct device *dport_dev,
> + struct cxl_dport **dport)
> {
> struct cxl_find_port_ctx ctx = {
> .dport_dev = dport_dev,
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index a2e95c49f965..536ca9c815ce 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -251,6 +251,154 @@ static void cxl_dport_map_ras(struct cxl_dport *dport)
> dev_dbg(dev, "Failed to map RAS capability.\n");
> }
>
> +static int match_uport(struct device *dev, const void *data)
> +{
> + const struct device *uport_dev = data;
> + struct cxl_port *port;
> +
> + if (!is_cxl_port(dev))
> + return 0;
> +
> + port = to_cxl_port(dev);
> +
> + return port->uport_dev == uport_dev;
> +}
Just heads up, there's a find_cxl_port_by_uport() coming with the deferred dport init series that you can use instead. Just FYI.
> +
> +static void __iomem *cxl_get_ras_base(struct device *dev)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + {
> + struct cxl_dport *dport = NULL;
> + struct cxl_port *port __free(put_cxl_port) = find_cxl_port(&pdev->dev, &dport);
> +
> + if (!dport || !dport->dport_dev) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> +
> + if (!dport)
> + return NULL;
> +
> + return dport->regs.ras;
> + }
> + case PCI_EXP_TYPE_UPSTREAM:
> + {
> + struct cxl_port *port;
> + struct device *port_dev __free(put_device) =
> + bus_find_device(&cxl_bus_type, NULL,
> + &pdev->dev, match_uport);
> +
> + if (!port_dev || !is_cxl_port(port_dev)) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> +
> + port = to_cxl_port(port_dev);
> + if (!port)
> + return NULL;
> +
> + return port->uport_regs.ras;
> + }
> + }
> +
> + dev_warn_once(dev, "Error: Unsupported device type (%X)", pci_pcie_type(pdev));
> + return NULL;
> +}
> +
> +static struct device *pci_to_cxl_dev(struct pci_dev *pdev)
> +{
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + {
> + struct cxl_dport *dport = NULL;
> + struct cxl_port *port __free(put_cxl_port) =
> + find_cxl_port(&pdev->dev, &dport);
> +
> + if (!dport) {
I think you can just check 'port' here right? and then you don't need to set dport to NULL.
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> +
> + return dport->dport_dev;
> + }
> + case PCI_EXP_TYPE_UPSTREAM:
> + {
> + struct cxl_port *port;
> + struct device *port_dev __free(put_device) =
> + bus_find_device(&cxl_bus_type, NULL,
> + &pdev->dev, match_uport);
> +
> + if (!port_dev || !is_cxl_port(port_dev)) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> +
> + port = to_cxl_port(port_dev);
> + if (!port)
> + return NULL;
> +
> + return port->uport_dev;
> + }
> + case PCI_EXP_TYPE_ENDPOINT:
> + {
> + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
With this function called through cxl_proto_err_work_fn() and not the AER handling stack, I have concerns of if a driver is bound to the endpoint, and if the pci driver bound to the endpoint is cxl_pci for cxlds to be valid.
> +
> + return cxlds->dev;
> + }
> + }
> +
> + pci_warn_once(pdev, "Error: Unsupported device type (%X)", pci_pcie_type(pdev));
> + return NULL;
> +}
> +
> +
> +/*
> + * Return 'struct device *' responsible for freeing pdev's CXL resources.
> + * Caller is responsible for reference count decrementing the return
> + * 'struct device *'.
> + *
> + * dev: Find the host of this dev
> + */
> +static struct device *get_cxl_host_dev(struct device *dev)
> +{> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + {
> + struct cxl_dport *dport = NULL;
No need to set to NULL. dport not used.
DJ
> + struct cxl_port *port = find_cxl_port(&pdev->dev, &dport);
> +
> + if (!port)
> + return NULL;
> +
> + return &port->dev;
> + }> + case PCI_EXP_TYPE_UPSTREAM:
> + {
> + struct device *port_dev = bus_find_device(&cxl_bus_type, NULL,
> + &pdev->dev, match_uport);
> +
> + if (!port_dev || !is_cxl_port(port_dev))
> + return NULL;
> +
> + return port_dev;
> + }
> + /* Endpoint resources are managed by endpoint itself */
> + case PCI_EXP_TYPE_ENDPOINT:
> + return NULL;
> + }
> +
> + dev_warn_once(dev, "Error: Unsupported device type (%X)", pci_pcie_type(pdev));
> + return NULL;
> +}
> +
> /**
> * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
> * @dport: the cxl_dport that needs to be initialized
> @@ -399,6 +547,22 @@ static pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial, void __io
> return PCI_ERS_RESULT_PANIC;
> }
>
> +void cxl_port_cor_error_detected(struct device *dev)
> +{
> + void __iomem *ras_base = cxl_get_ras_base(dev);
> +
> + cxl_handle_cor_ras(dev, 0, ras_base);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_port_cor_error_detected, "CXL");
> +
> +pci_ers_result_t cxl_port_error_detected(struct device *dev)
> +{
> + void __iomem *ras_base = cxl_get_ras_base(dev);
> +
> + return cxl_handle_ras(dev, 0, ras_base);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_port_error_detected, "CXL");
> +
> void cxl_cor_error_detected(struct device *dev)
> {
> struct pci_dev *pdev = to_pci_dev(dev);
> @@ -469,9 +633,8 @@ EXPORT_SYMBOL_NS_GPL(pci_error_detected, "CXL");
> static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
> {
> struct pci_dev *pdev = err_info->pdev;
> - struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> - struct cxl_memdev *cxlmd = cxlds->cxlmd;
> - struct device *host_dev __free(put_device) = get_device(&cxlmd->dev);
> + struct device *dev = pci_to_cxl_dev(pdev);
> + struct device *host_dev __free(put_device) = get_cxl_host_dev(&pdev->dev);
>
> if (err_info->severity == AER_CORRECTABLE) {
> int aer = pdev->aer_cap;
> @@ -481,11 +644,14 @@ static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
> aer + PCI_ERR_COR_STATUS,
> 0, PCI_ERR_COR_INTERNAL);
>
> - cxl_cor_error_detected(&cxlmd->dev);
> + if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ENDPOINT)
> + cxl_error_detected(&pdev->dev);
> + else
> + cxl_port_cor_error_detected(dev);
>
> pcie_clear_device_status(pdev);
> } else {
> - cxl_do_recovery(&cxlmd->dev);
> + cxl_do_recovery(dev);
> }
> }
>
> diff --git a/drivers/pci/pcie/cxl_aer.c b/drivers/pci/pcie/cxl_aer.c
> index 74e6d2d04ab6..6eeff0b78b47 100644
> --- a/drivers/pci/pcie/cxl_aer.c
> +++ b/drivers/pci/pcie/cxl_aer.c
> @@ -68,7 +68,10 @@ bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
> if (!info || !info->is_cxl)
> return false;
>
> - if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
> + if ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT) &&
> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM) &&
> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM))
> return false;
>
> return is_internal_error(info);
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 21/23] CXL/PCI: Introduce CXL uncorrectable protocol error recovery
2025-08-27 1:35 ` [PATCH v11 21/23] CXL/PCI: Introduce CXL uncorrectable protocol error recovery Terry Bowman
@ 2025-09-03 22:30 ` Dave Jiang
0 siblings, 0 replies; 53+ messages in thread
From: Dave Jiang @ 2025-09-03 22:30 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
On 8/26/25 6:35 PM, Terry Bowman wrote:
> Populate the cxl_do_recovery() function with uncorrectable protocol error (UCE)
> handling. Follow similar design as found in PCIe error driver,
> pcie_do_recovery(). One difference is cxl_do_recovery() will treat all UCEs
> as fatal with a kernel panic. This is to prevent corruption on CXL memory.
>
> Introduce cxl_walk_port(). Make this analogous to pci_walk_bridge() but walking
> CXL ports instead. This will iterate through the CXL topology from the
> erroring device through the downstream CXL Ports and Endpoints.
>
> Export pci_aer_clear_fatal_status() for CXL to use if a UCE is not found.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>
> ---
> Changes in v10->v11:
> - pci_ers_merge_results() - Move to earlier patch
> ---
> drivers/cxl/core/port.c | 1 +
> drivers/cxl/core/ras.c | 94 +++++++++++++++++++++++++++++++++++++++++
> drivers/pci/pci.h | 2 -
> include/linux/aer.h | 2 +
> 4 files changed, 97 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 758fb73374c1..085c8620a797 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1347,6 +1347,7 @@ struct cxl_port *find_cxl_port(struct device *dport_dev,
> port = __find_cxl_port(&ctx);
> return port;
> }
> +EXPORT_SYMBOL_NS_GPL(find_cxl_port, "CXL");
>
> static struct cxl_port *find_cxl_port_at(struct cxl_port *parent_port,
> struct device *dport_dev,
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 536ca9c815ce..3da675f72616 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -6,6 +6,7 @@
> #include <cxl/event.h>
> #include <cxlmem.h>
> #include <cxlpci.h>
> +#include <cxl.h>
> #include "trace.h"
>
> static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,
> @@ -468,8 +469,101 @@ void cxl_endpoint_port_init_ras(struct cxl_port *ep)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_endpoint_port_init_ras, "CXL");
>
> +static int cxl_report_error_detected(struct device *dev, void *data)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> + pci_ers_result_t vote, *result = data;
> +
> + guard(device)(dev);
> +
> + if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ENDPOINT)
> + vote = cxl_error_detected(dev);
> + else
> + vote = cxl_port_error_detected(dev);
> +
> + vote = cxl_error_detected(dev);
> + *result = pci_ers_merge_result(*result, vote);
> +
> + return 0;
> +}
> +
> +static int match_port_by_parent_dport(struct device *dev, const void *dport_dev)
> +{
> + struct cxl_port *port;
> +
> + if (!is_cxl_port(dev))
> + return 0;
> +
> + port = to_cxl_port(dev);
> +
> + return port->parent_dport->dport_dev == dport_dev;
> +}
> +
> +static void cxl_walk_port(struct device *port_dev,
> + int (*cb)(struct device *, void *),
> + void *userdata)
> +{
> + struct cxl_dport *dport = NULL;
> + struct cxl_port *port;
> + unsigned long index;
> +
> + if (!port_dev)
> + return;
> +
> + port = to_cxl_port(port_dev);
> + if (port->uport_dev && dev_is_pci(port->uport_dev))
> + cb(port->uport_dev, userdata);
> +
> + xa_for_each(&port->dports, index, dport)
> + {
> + struct device *child_port_dev __free(put_device) =
> + bus_find_device(&cxl_bus_type, &port->dev, dport,
> + match_port_by_parent_dport);
> +
> + cb(dport->dport_dev, userdata);
> +
> + cxl_walk_port(child_port_dev, cxl_report_error_detected, userdata);
> + }
> +
> + if (is_cxl_endpoint(port))
> + cb(port->uport_dev->parent, userdata);
> +}
> +
> static void cxl_do_recovery(struct device *dev)
> {
> + pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> + struct pci_dev *pdev = to_pci_dev(dev);
> + struct cxl_dport *dport;
> + struct cxl_port *port;
> +
> + if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) ||
> + (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> + port = find_cxl_port(&pdev->dev, &dport);
> + } else if (pci_pcie_type(pdev) == PCI_EXP_TYPE_UPSTREAM) {
> + struct device *port_dev = bus_find_device(&cxl_bus_type, NULL,
> + &pdev->dev, match_uport);
> + port = to_cxl_port(port_dev);
> + }
Do we not attempt recovery if the device is an endpoint? Is it because it is handled directly by AER callback of the cxl_pci driver? Should endpoint error just not be forwarded from the AER kfifo producer instead of being checked on the consumer end after going through the kfifo mechanism?
DJ
> +
> + if (!port)
> + return;
> +
> + cxl_walk_port(&port->dev, cxl_report_error_detected, &status);
> + if (status == PCI_ERS_RESULT_PANIC)
> + panic("CXL cachemem error.");
> +
> + /*
> + * If we have native control of AER, clear error status in the device
> + * that detected the error. If the platform retained control of AER,
> + * it is responsible for clearing this status. In that case, the
> + * signaling device may not even be visible to the OS.
> + */
> + if (cxl_error_is_native(pdev)) {
> + pcie_clear_device_status(pdev);
> + pci_aer_clear_nonfatal_status(pdev);
> + pci_aer_clear_fatal_status(pdev);
> + }
> + put_device(&port->dev);
> }
>
> static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 69ff7c2d214f..0c4f73dd645f 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -1170,13 +1170,11 @@ static inline void cxl_rch_enable_rcec(struct pci_dev *rcec) { }
>
> #ifdef CONFIG_CXL_RAS
> void pci_aer_unmask_internal_errors(struct pci_dev *dev);
> -bool cxl_error_is_native(struct pci_dev *dev);
> bool is_internal_error(struct aer_err_info *info);
> bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info);
> void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info);
> #else
> static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
> -static inline bool cxl_error_is_native(struct pci_dev *dev) { return false; }
> static inline bool is_internal_error(struct aer_err_info *info) { return false; }
> static inline bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info) { return false; }
> static inline void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info) { }
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 1f79f0be4bf7..751a026fea73 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -81,10 +81,12 @@ static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
> int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd);
> void cxl_register_proto_err_work(struct work_struct *work);
> void cxl_unregister_proto_err_work(void);
> +bool cxl_error_is_native(struct pci_dev *dev);
> #else
> static inline int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd) { return 0; }
> static inline void cxl_register_proto_err_work(struct work_struct *work) { }
> static inline void cxl_unregister_proto_err_work(void) { }
> +static inline bool cxl_error_is_native(struct pci_dev *dev) { return false; }
> #endif
>
> void pci_print_aer(struct pci_dev *dev, int aer_severity,
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v11 22/23] CXL/PCI: Enable CXL protocol errors during CXL Port probe
2025-08-27 1:35 ` [PATCH v11 22/23] CXL/PCI: Enable CXL protocol errors during CXL Port probe Terry Bowman
@ 2025-09-03 23:23 ` Dave Jiang
0 siblings, 0 replies; 53+ messages in thread
From: Dave Jiang @ 2025-09-03 23:23 UTC (permalink / raw)
To: Terry Bowman, dave, jonathan.cameron, alison.schofield,
dan.j.williams, bhelgaas, shiju.jose, ming.li,
Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
sathyanarayanan.kuppuswamy, linux-cxl, alucerop, ira.weiny
Cc: linux-kernel, linux-pci
On 8/26/25 6:35 PM, Terry Bowman wrote:
> CXL protocol errors are not enabled for all CXL devices after boot. These
> must be enabled inorder to process CXL protocol errors.
>
> Introduce cxl_unmask_proto_interrupts() to call pci_aer_unmask_internal_errors().
> pci_aer_unmask_internal_errors() expects the pdev->aer_cap is initialized.
> But, dev->aer_cap is not initialized for CXL Upstream Switch Ports and CXL
> Downstream Switch Ports. Initialize the dev->aer_cap if necessary. Enable AER
> correctable internal errors and uncorrectable internal errors for all CXL
> devices.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> ---
> Changes in v10->v11:
> - Added check for valid PCI devices in is_cxl_error() (Terry)
> - Removed check for RCiEP in cxl_handle_proto_err() and
> cxl_report_error_detected() (Terry)
> ---
> drivers/cxl/core/ras.c | 26 +++++++++++++++++++++++++-
> drivers/pci/pci.h | 2 --
> include/linux/aer.h | 2 ++
> 3 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 3da675f72616..90ea0dfb942f 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -122,6 +122,21 @@ static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
> static pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);
> static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base);
>
> +static void cxl_unmask_proto_interrupts(struct device *dev)
> +{
> + struct pci_dev *pdev __free(pci_dev_put) =
> + pci_dev_get(to_pci_dev(dev));
> +
> + if (!pdev->aer_cap) {
> + pdev->aer_cap = pci_find_ext_capability(pdev,
> + PCI_EXT_CAP_ID_ERR);
> + if (!pdev->aer_cap)
> + return;
> + }
> +
> + pci_aer_unmask_internal_errors(pdev);
> +}
> +
> #ifdef CONFIG_CXL_RCH_RAS
> static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
> {
> @@ -418,7 +433,10 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
>
> cxl_dport_map_rch_aer(dport);
> cxl_disable_rch_root_ints(dport);
> + return;
> }
> +
> + cxl_unmask_proto_interrupts(dport->dport_dev);
> }
> EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");
>
> @@ -429,8 +447,12 @@ static void cxl_uport_init_ras_reporting(struct cxl_port *port,
>
> map->host = host;
> if (cxl_map_component_regs(map, &port->uport_regs,
> - BIT(CXL_CM_CAP_CAP_ID_RAS)))
> + BIT(CXL_CM_CAP_CAP_ID_RAS))) {
> dev_dbg(&port->dev, "Failed to map RAS capability\n");
> + return;
> + }
> +
> + cxl_unmask_proto_interrupts(port->uport_dev);
> }
>
> void cxl_switch_port_init_ras(struct cxl_port *port)
> @@ -466,6 +488,8 @@ void cxl_endpoint_port_init_ras(struct cxl_port *ep)
> }
>
> cxl_dport_init_ras_reporting(parent_dport, cxlmd->cxlds->dev);
> +
> + cxl_unmask_proto_interrupts(cxlmd->cxlds->dev);
> }
> EXPORT_SYMBOL_NS_GPL(cxl_endpoint_port_init_ras, "CXL");
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 0c4f73dd645f..090b52a26862 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -1169,12 +1169,10 @@ static inline void cxl_rch_enable_rcec(struct pci_dev *rcec) { }
> #endif
>
> #ifdef CONFIG_CXL_RAS
> -void pci_aer_unmask_internal_errors(struct pci_dev *dev);
> bool is_internal_error(struct aer_err_info *info);
> bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info);
> void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info);
> #else
> -static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
> static inline bool is_internal_error(struct aer_err_info *info) { return false; }
> static inline bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info) { return false; }
> static inline void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info) { }
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 751a026fea73..4e2fc55f2497 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -82,11 +82,13 @@ int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd);
> void cxl_register_proto_err_work(struct work_struct *work);
> void cxl_unregister_proto_err_work(void);
> bool cxl_error_is_native(struct pci_dev *dev);
> +void pci_aer_unmask_internal_errors(struct pci_dev *dev);
> #else
> static inline int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd) { return 0; }
> static inline void cxl_register_proto_err_work(struct work_struct *work) { }
> static inline void cxl_unregister_proto_err_work(void) { }
> static inline bool cxl_error_is_native(struct pci_dev *dev) { return false; }
> +static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
> #endif
>
> void pci_print_aer(struct pci_dev *dev, int aer_severity,
^ permalink raw reply [flat|nested] 53+ messages in thread
end of thread, other threads:[~2025-09-03 23:23 UTC | newest]
Thread overview: 53+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-27 1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2025-08-27 1:35 ` [PATCH v11 01/23] cxl: Remove ifdef blocks of CONFIG_PCIEAER_CXL from core/pci.c Terry Bowman
2025-08-27 1:35 ` [PATCH v11 02/23] CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS Terry Bowman
2025-08-29 15:24 ` Jonathan Cameron
2025-08-29 18:16 ` Sathyanarayanan Kuppuswamy
2025-08-27 1:35 ` [PATCH v11 03/23] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions Terry Bowman
2025-08-28 15:28 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 04/23] cxl/pci: Remove unnecessary CXL RCH " Terry Bowman
2025-08-28 8:35 ` Alejandro Lucero Palau
2025-08-28 17:32 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 05/23] cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS conditional block Terry Bowman
2025-08-28 8:57 ` Alejandro Lucero Palau
2025-08-29 15:33 ` Jonathan Cameron
2025-08-27 1:35 ` [PATCH v11 06/23] CXL/AER: Introduce rch_aer.c into AER driver for handling CXL RCH errors Terry Bowman
2025-08-28 20:53 ` Dave Jiang
2025-08-29 8:39 ` Lukas Wunner
2025-08-27 1:35 ` [PATCH v11 07/23] CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h Terry Bowman
2025-08-27 14:51 ` Lukas Wunner
2025-08-29 15:42 ` Jonathan Cameron
2025-08-29 15:47 ` Jonathan Cameron
2025-08-28 21:07 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 08/23] PCI/CXL: Introduce pcie_is_cxl() Terry Bowman
2025-08-28 8:18 ` Alejandro Lucero Palau
2025-08-27 1:35 ` [PATCH v11 09/23] PCI/AER: Report CXL or PCIe bus error type in trace logging Terry Bowman
2025-08-27 7:37 ` Lukas Wunner
2025-08-27 1:35 ` [PATCH v11 10/23] CXL/AER: Update PCI class code check to use FIELD_GET() Terry Bowman
2025-08-29 16:03 ` Jonathan Cameron
2025-08-27 1:35 ` [PATCH v11 11/23] cxl/pci: Update RAS handler interfaces to also support CXL Ports Terry Bowman
2025-08-27 1:35 ` [PATCH v11 12/23] cxl/pci: Log message if RAS registers are unmapped Terry Bowman
2025-08-27 1:35 ` [PATCH v11 13/23] cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports Terry Bowman
2025-08-27 11:55 ` Shiju Jose
2025-08-29 16:06 ` Jonathan Cameron
2025-08-27 1:35 ` [PATCH v11 14/23] cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors Terry Bowman
2025-08-27 1:35 ` [PATCH v11 15/23] cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
2025-08-28 23:05 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 16/23] cxl/pci: Introduce CXL Endpoint protocol error handlers Terry Bowman
2025-08-27 7:48 ` Lukas Wunner
2025-08-27 1:35 ` [PATCH v11 17/23] CXL/AER: Introduce cxl_aer.c into AER driver for forwarding CXL errors Terry Bowman
2025-08-27 7:56 ` Lukas Wunner
2025-08-27 1:35 ` [PATCH v11 18/23] PCI/AER: Dequeue forwarded CXL error Terry Bowman
2025-08-29 0:43 ` Dave Jiang
2025-08-29 7:10 ` Lukas Wunner
2025-08-27 1:35 ` [PATCH v11 19/23] CXL/PCI: Introduce CXL Port protocol error handlers Terry Bowman
2025-08-30 0:17 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 20/23] CXL/PCI: Export and rename merge_result() to pci_ers_merge_result() Terry Bowman
2025-08-27 8:04 ` Lukas Wunner
2025-08-27 12:19 ` kernel test robot
2025-08-27 1:35 ` [PATCH v11 21/23] CXL/PCI: Introduce CXL uncorrectable protocol error recovery Terry Bowman
2025-09-03 22:30 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 22/23] CXL/PCI: Enable CXL protocol errors during CXL Port probe Terry Bowman
2025-09-03 23:23 ` Dave Jiang
2025-08-27 1:35 ` [PATCH v11 23/23] CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup Terry Bowman
2025-08-29 0:07 ` [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Dave Jiang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).