From: Richard Cheng <icheng@nvidia.com>
To: Srirangan Madhavan <smadhavan@nvidia.com>
Cc: Alison Schofield <alison.schofield@intel.com>,
Bjorn Helgaas <bhelgaas@google.com>,
Dan Williams <djbw@kernel.org>,
Dave Jiang <dave.jiang@intel.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Ira Weiny <ira.weiny@intel.com>,
Jonathan Cameron <jic23@kernel.org>,
Vishal Verma <vishal.l.verma@intel.com>,
linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, vsethi@nvidia.com,
alwilliamson@nvidia.com, Dan Williams <danwilliams@nvidia.com>,
Sai Yashwanth Reddy Kancherla <skancherla@nvidia.com>,
Vishal Aslot <vaslot@nvidia.com>,
Manish Honap <mhonap@nvidia.com>, Jiandi An <jan@nvidia.com>,
linux-tegra@vger.kernel.org
Subject: Re: [PATCH v7 00/11] PCI/CXL: Add CXL reset support for Type 2 devices
Date: Wed, 24 Jun 2026 22:26:53 +0800
Message-ID: <ajvlJNPOPkm_hj3O@MWDK4CY14F>
In-Reply-To: <20260623032453.3404772-1-smadhavan@nvidia.com>
On Tue, Jun 23, 2026 at 03:24:42AM +0800, Srirangan Madhavan wrote:
> Hi folks!
>
> This series adds CXL Reset support for CXL Type 2 devices through the
> existing PCI reset_method ABI. The reset sequence follows the CXL 4.0
> specification [1], including CXL.cache disable, optional cache
> writeback, CXL Reset initiation, ResetComplete polling, and ResetError
> reporting.
>
> The userspace ABI is the existing PCI reset interface:
>
> /sys/bus/pci/devices/.../reset_method
> /sys/bus/pci/devices/.../reset
>
> Userspace can select "cxl_reset" in reset_method and then trigger reset
> through the existing reset attribute.
>
Hi Srirangan,
Thanks for the work. I applied your series and ran some tests on a
CXL Type 2 capable GPU, and something seems to be wrong.
The device's BDF is 0002:81:00.0 and the DVSEC base is at 0x10c, which gives
CAP=0x116, CTRL2=0x11c, STATUS2=0x11e.
STATUS2 bits: bit0 CACHE_INV, bit1 RST_DONE, bit2 RST_ERR
CTRL2 bits: bit0 = DISABLE_CACHING
I ran all of the following commands as root.
"""
# b=0002:81:00.0; dev=/sys/bus/pci/devices/$b
# echo cxl_reset > $dev/reset_method
# echo "PRE CAP=0x$(setpci -s $b 0x116.w) CTRL2=0x$(setpci -s $b 0x11c.w) STATUS2=0x$(setpci -s $b 0x11e.w)"
# dmesg -C
# time echo 1 > $dev/reset
# echo "POST CAP=0x$(setpci -s $b 0x116.w) CTRL2=0x$(setpci -s $b 0x11c.w) STATUS2=0x$(setpci -s $b 0x11e.w)"
"""
The result:
PRE CAP=0x8bd7 CTRL2=0x0000 STATUS2=0x8000
==> RESET rc=1 elapsed_ms=114 err=[bash: line 9: echo: write error: Input/output error]
POST CAP=0x8bd7 CTRL2=0x0001 STATUS2=0x8003
device-present=1 reset_method=cxl_reset leaked_cxl_reset_iomem=0
dmesg shows no output.
The write() to reset failed with -EIO after ~114 ms, but STATUS2 went from 0x8000 to 0x8003 (CACHE_INV and RST_DONE set, RST_ERR clear).
The device completed the reset, so the kernel returned failure for a reset that the HW performed successfully.
CTRL2 went from 0x0000 to 0x0001, so the device's CXL.cache is left disabled after the "failed" reset.
~114 ms is roughly the msleep(100) in cxl_reset_wait_done() plus the first poll. On the first poll
the Status2 read returns 0xffff. 0xffff has bit 2 set, which the code reads as RST_ERR, so it returns -EIO with no retry.
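A minimal shell sketch of that decoding, using only the STATUS2 bit assignments listed above. The helper name decode_status2 and the all-ones guard are illustrative assumptions, not code from the series. The point is that a 0xffff read looks like "no response" and probably should be ruled out before any single bit is trusted.

```shell
# Decode a 16-bit STATUS2 value as printed by "setpci -s $b 0x11e.w".
# decode_status2 is a hypothetical helper; it accepts "8003" or "0x8003".
decode_status2() {
    val=$(( 0x${1#0x} ))
    # An all-ones read means the config access itself got no answer,
    # so the individual bits carry no information.
    if [ "$val" -eq $(( 0xffff )) ]; then
        echo "all-ones (no response)"
        return 0
    fi
    out=""
    if [ $(( val & 0x1 )) -ne 0 ]; then out="$out CACHE_INV"; fi
    if [ $(( val & 0x2 )) -ne 0 ]; then out="$out RST_DONE"; fi
    if [ $(( val & 0x4 )) -ne 0 ]; then out="$out RST_ERR"; fi
    out=${out# }
    # Print "none" when none of the three known bits is set.
    echo "${out:-none}"
}
```

For the values from this run, decode_status2 0x8003 reports CACHE_INV and RST_DONE only, whereas a plain bit-2 test on 0xffff would report RST_ERR.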
After that I ran the same "echo 1 > $dev/reset" 25 times in a loop, logging rc, elapsed ms, CTRL2, and STATUS2 at each iteration, and then checked dmesg.
"""
PRE CAP=0x8bd7 CTRL2=0x0001 STATUS2=0x8003
iter 1 rc=1 ms= 114 CTRL2=0x0001 STATUS2=0x8003 present=1
iter 2 rc=1 ms= 171 CTRL2=0xffff STATUS2=0xffff present=1
iter 3 rc=1 ms= 6 CTRL2=0xffff STATUS2=0xffff present=1
[snip]
iter 25 rc=1 ms= 5 CTRL2=0xffff STATUS2=0xffff present=1
### rc histogram: rc=1 : 25x
### leaked cxl_reset iomem regions: 0
"""
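For anyone who wants to repeat the loop, a rough sketch of how such a run could be scripted is below. It is not the exact script used for the log above: the function name reset_loop, its arguments, and the use of GNU date for nanosecond timestamps are assumptions, and the output columns are only an approximation of the log format.

```shell
# Hypothetical helper: repeat the sysfs reset and log one line per iteration.
# Usage (as root): reset_loop 0002:81:00.0 25
# The optional third argument overrides the sysfs device directory.
reset_loop() (
    bdf=$1
    count=${2:-25}
    dev=${3:-/sys/bus/pci/devices/$bdf}
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s%N)   # nanoseconds since the epoch; GNU date assumed
        # rc is the exit status of the write to the reset attribute.
        if echo 1 > "$dev/reset" 2>/dev/null; then rc=0; else rc=$?; fi
        end=$(date +%s%N)
        ms=$(( (end - start) / 1000000 ))
        # Same register offsets as in the PRE/POST commands above.
        ctrl2=$(setpci -s "$bdf" 0x11c.w)
        status2=$(setpci -s "$bdf" 0x11e.w)
        echo "iter $i rc=$rc ms=$ms CTRL2=0x$ctrl2 STATUS2=0x$status2"
        i=$(( i + 1 ))
    done
)
```

The body runs in a subshell, so its variables do not leak into the calling shell.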
The complete dmesg shows
"""
[ 1892.870193] ------------[ cut here ]------------
[ 1892.870215] index 7 is out of range for type 'resource_size_t [6]'
[ 1892.870218] CPU: 121 UID: 0 PID: 19436 Comm: bash Not tainted 7.1.0-rc7+ #1 PREEMPT(full)
[ 1892.870221] Hardware name: NVIDIA VR NVL72/P3809-BMC, BIOS NV_SBIOS: 06.02.00.00, OEM_SBIOS: 06.02.00.00 Mon Jun 8 08:22:03 PM UTC 2026
[ 1892.870222] Call trace:
[ 1892.870223] show_stack+0x24/0x50 (C)
[ 1892.870229] dump_stack_lvl+0x80/0x140
[ 1892.870236] dump_stack+0x1c/0x38
[ 1892.870237] __ubsan_handle_out_of_bounds+0xd0/0x128
[ 1892.870242] pci_restore_iov_state+0x250/0x270
[ 1892.870249] pci_restore_state+0x10c/0x2c0
[ 1892.870251] pci_dev_restore+0x6c/0xb0
[ 1892.870252] pci_reset_function+0x94/0x160
[ 1892.870254] reset_store+0x78/0xf0
[ 1892.870258] dev_attr_store+0x24/0x78
[ 1892.870263] sysfs_kf_write+0x88/0xc8
[ 1892.870268] kernfs_fop_write_iter+0x170/0x228
[ 1892.870272] vfs_write+0x270/0x3a8
[ 1892.870276] ksys_write+0x7c/0x138
[ 1892.870277] __arm64_sys_write+0x28/0x50
[ 1892.870279] invoke_syscall.constprop.0+0xac/0x100
[ 1892.870282] do_el0_svc+0x4c/0x100
[ 1892.870283] el0_svc+0x50/0x2b0
[ 1892.870285] el0t_64_sync_handler+0xc0/0x108
[ 1892.870286] el0t_64_sync+0x1b8/0x1c0
[ 1892.870291] ---[ end trace ]---
[ 1892.870293] ------------[ cut here ]------------
[ 1892.870293] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
[ 1892.870295] shift exponent 64 is too large for 64-bit type 'long unsigned int'
[ 1892.870296] CPU: 121 UID: 0 PID: 19436 Comm: bash Not tainted 7.1.0-rc7+ #1 PREEMPT(full)
[ 1892.870297] Hardware name: NVIDIA VR NVL72/P3809-BMC, BIOS NV_SBIOS: 06.02.00.00, OEM_SBIOS: 06.02.00.00 Mon Jun 8 08:22:03 PM UTC 2026
[ 1892.870298] Call trace:
[ 1892.870298] show_stack+0x24/0x50 (C)
[ 1892.870299] dump_stack_lvl+0x80/0x140
[ 1892.870300] dump_stack+0x1c/0x38
[ 1892.870301] __ubsan_handle_shift_out_of_bounds+0x154/0x260
[ 1892.870302] pci_rebar_bytes_to_size+0x98/0xc8
[ 1892.870305] pci_restore_iov_state+0x1f0/0x270
[ 1892.870306] pci_restore_state+0x10c/0x2c0
[ 1892.870307] pci_dev_restore+0x6c/0xb0
[ 1892.870308] pci_reset_function+0x94/0x160
[ 1892.870308] reset_store+0x78/0xf0
[ 1892.870309] dev_attr_store+0x24/0x78
[ 1892.870310] sysfs_kf_write+0x88/0xc8
[ 1892.870311] kernfs_fop_write_iter+0x170/0x228
[ 1892.870312] vfs_write+0x270/0x3a8
[ 1892.870313] ksys_write+0x7c/0x138
[ 1892.870314] __arm64_sys_write+0x28/0x50
[ 1892.870315] invoke_syscall.constprop.0+0xac/0x100
[ 1892.870315] do_el0_svc+0x4c/0x100
[ 1892.870316] el0_svc+0x50/0x2b0
[ 1892.870317] el0t_64_sync_handler+0xc0/0x108
[ 1892.870318] el0t_64_sync+0x1b8/0x1c0
[ 1892.870319] ---[ end trace ]---
"""
Iter 1 behaved like the first run, but iter 2 took the device to CTRL2=0xffff STATUS2=0xffff: config space returns all-ones,
which means the function stopped responding to config cycles and did not recover on its own.
During that iteration's pci_dev_restore(), the PCI core read the device's VF ReBAR config, which now reads as 0xffffffff,
and used the garbage as an array index: bar_idx = 7 into dev->sriov->barsz[], which has 6 entries. Hence the two UBSAN reports.
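The index arithmetic can be checked by hand. The sketch below assumes the BAR index sits in the low three bits of the ReBAR control dword (mask 0x7) and that barsz[] has six entries, as the UBSAN message states. rebar_bar_idx is a hypothetical helper for illustration, not a kernel function.

```shell
# Extract the BAR index from a ReBAR control dword and compare it with
# the six-entry barsz[] array named in the UBSAN report.
# Assumption: the index field is bits 2:0 (mask 0x7).
rebar_bar_idx() {
    ctrl=$(( $1 ))
    idx=$(( ctrl & 0x7 ))
    if [ "$idx" -ge 6 ]; then
        echo "bar_idx=$idx out of range"
    else
        echo "bar_idx=$idx ok"
    fi
}
```

An all-ones dword gives bar_idx=7, which matches the "index 7 is out of range" report. One possible fix would be to check for an all-ones read before using the field.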
Best regards,
Richard Cheng.
> Following Dan's v6 feedback, this replaces the proposed memdev sysfs ABI
> with the existing PCI reset_method interface.
>
> v7 changes from v6 [2]:
> - Move the ABI from a CXL memdev attribute to PCI reset_method.
> - Drop the memdev dependency from reset entry; advertise cxl_reset for
> Type 2 functions that report CXL Reset support in the CXL Device DVSEC.
> - Incorporate Dan's HDM reset refactor: shared decoder settings,
> pci_dev->hdm cached state, and built-in CONFIG_CXL_HDM helpers.
> - Cache endpoint HDM settings during PCI enumeration when MMIO decoding
> is already enabled, and let CXL core refresh the same cache later.
> - Reduce the earlier PCI/CXL save/restore series [3] to the HDM state
> cache and restore infrastructure needed by this reset flow.
> - Use cached HDM ranges to reject reset while affected ranges are busy
> and to invalidate CPU caches before reset.
> - Discover the CXL reset scope with the Non-CXL Function Map and CXL
> cache/mem capability bits.
> - Quiesce affected sibling functions with PCI save/disable and IOMMU
> reset prepare/done before executing reset.
> - Restore cached HDM decoder state after reset before completing PCI
> reset recovery.
> - Keep CXL Reset Memory Clear disabled.
>
> Motivation:
> -----------
> - Type 2 devices need a CXL-specific reset mechanism beyond existing PCI
> reset methods.
>
> - FLR does not reset CXL.cache or CXL.mem protocol state. CXL Reset is
> the architectural reset mechanism for those protocols.
>
> - The PCI reset_method ABI lets userspace select this narrower CXL reset
> before falling back to broader bus reset methods.
>
> Change Description:
> -------------------
>
> Patch 1: cxl/hdm: Split decoder programming into a reusable helper
> - Move shared decoder settings to include/cxl/cxl.h.
> - Factor low-level HDM register programming into cxl_commit().
>
> Patch 2: cxl/hdm: Cache decoder settings on PCI devices
> - Cache CXL core HDM decoder settings in pci_dev->hdm.
> - Refresh the cache as decoders are enumerated, committed, or reset.
>
> Patch 3: cxl/hdm: Cache endpoint decoder settings during PCI enumeration
> - Snapshot endpoint HDM state during PCI capability initialization when
> memory decoding is already enabled.
> - Reuse the same cache when CXL core later enumerates the device.
>
> Patch 4: PCI: Export pci_dev_save_and_disable() and pci_dev_restore()
> - Export PCI reset lifecycle helpers for CXL reset orchestration.
>
> Patch 5: PCI/CXL: Add CXL Device Reset helper
> - Add the internal DVSEC reset sequence.
> - Disable CXL.cache, perform cache writeback where supported, initiate
> CXL Reset, and wait for completion.
>
> Patch 6: PCI/CXL: Validate HDM ranges before CXL reset
> - Collect enabled cached HDM ranges.
> - Reject reset if affected ranges are busy and invalidate CPU caches.
>
> Patch 7: PCI/CXL: Discover the CXL reset scope
> - Discover same-scope CXL functions with the Non-CXL Function Map and
> CXL cache/mem capability bits.
>
> Patch 8: PCI/CXL: Coordinate sibling functions for CXL reset
> - Lock, save, disable, and IOMMU-block affected sibling functions.
> - Include mem-capable siblings in HDM range validation and cache flush.
>
> Patch 9: cxl/pci: Restore CXL HDM state after PCI reset
> - Restore cached global and per-decoder HDM state after reset.
> - Keep IOMMU reset blocks active until HDM restore completes.
>
> Patch 10: PCI/CXL: Expose CXL Reset as a PCI reset method
> - Add "cxl_reset" to the PCI reset_method table for Type 2 reset-capable
> CXL devices.
>
> Patch 11: Documentation/ABI: Document CXL Reset PCI reset method
> - Document the new reset_method value and reset behavior.
>
> The CPU cache invalidation step depends on
> cpu_cache_invalidate_memregion() support for the affected address ranges.
> If no provider is available, reset fails before hardware reset is
> requested.
>
> Example:
>
> echo cxl_reset > /sys/bus/pci/devices/0000:bb:dd.f/reset_method
> echo 1 > /sys/bus/pci/devices/0000:bb:dd.f/reset
>
> Basic CXL DVSEC reset testing was done on a CXL Type 2 device. The reset
> sequence completed successfully and ResetComplete was observed.
>
> References:
> [1] https://computeexpresslink.org/wp-content/uploads/2026/02/CXL-Specification_rev4p0_ver1p0_2026February26_clean_evalcopy_v2.pdf
> [2] https://lore.kernel.org/linux-cxl/20260528083154.137979-1-smadhavan@nvidia.com/
> [3] https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/
>
> Srirangan Madhavan (11):
> cxl/hdm: Split decoder programming into a reusable helper
> cxl/hdm: Cache decoder settings on PCI devices
> cxl/hdm: Cache endpoint decoder settings during PCI enumeration
> PCI: Export pci_dev_save_and_disable() and pci_dev_restore()
> PCI/CXL: Add CXL Device Reset helper
> PCI/CXL: Validate HDM ranges before CXL reset
> PCI/CXL: Discover the CXL reset scope
> PCI/CXL: Coordinate sibling functions for CXL reset
> cxl/pci: Restore CXL HDM state after PCI reset
> PCI/CXL: Expose CXL Reset as a PCI reset method
> Documentation/ABI: Document CXL Reset PCI reset method
>
> Documentation/ABI/testing/sysfs-bus-pci | 14 +
> drivers/cxl/Kconfig | 4 +
> drivers/cxl/core/Makefile | 2 +-
> drivers/cxl/core/hdm.c | 234 ++---
> drivers/cxl/core/region.c | 6 +-
> drivers/cxl/core/reset.c | 1276 +++++++++++++++++++++++
> drivers/cxl/cxl.h | 43 -
> drivers/pci/pci.c | 25 +-
> drivers/pci/probe.c | 2 +
> include/cxl/cxl.h | 85 +-
> include/linux/pci.h | 10 +-
> include/uapi/linux/pci_regs.h | 15 +
> tools/testing/cxl/test/cxl.c | 10 +-
> 13 files changed, 1554 insertions(+), 172 deletions(-)
> create mode 100644 drivers/cxl/core/reset.c
>
> base-commit: 72afdd8181219f459142e571999b3b44ef7b85fb
> --
> 2.43.0