From: Richard Cheng <icheng@nvidia.com>
To: Srirangan Madhavan <smadhavan@nvidia.com>
Cc: Alison Schofield <alison.schofield@intel.com>,
	 Bjorn Helgaas <bhelgaas@google.com>,
	Dan Williams <djbw@kernel.org>,
	 Dave Jiang <dave.jiang@intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	 Ira Weiny <ira.weiny@intel.com>,
	Jonathan Cameron <jic23@kernel.org>,
	 Vishal Verma <vishal.l.verma@intel.com>,
	linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
	 linux-kernel@vger.kernel.org, vsethi@nvidia.com,
	alwilliamson@nvidia.com,  Dan Williams <danwilliams@nvidia.com>,
	Sai Yashwanth Reddy Kancherla <skancherla@nvidia.com>,
	 Vishal Aslot <vaslot@nvidia.com>,
	Manish Honap <mhonap@nvidia.com>, Jiandi An <jan@nvidia.com>,
	 linux-tegra@vger.kernel.org
Subject: Re: [PATCH v7 00/11] PCI/CXL: Add CXL reset support for Type 2 devices
Date: Wed, 24 Jun 2026 22:26:53 +0800
Message-ID: <ajvlJNPOPkm_hj3O@MWDK4CY14F>
In-Reply-To: <20260623032453.3404772-1-smadhavan@nvidia.com>

On Tue, Jun 23, 2026 at 03:24:42AM +0800, Srirangan Madhavan wrote:
> Hi folks!
> 
> This series adds CXL Reset support for CXL Type 2 devices through the
> existing PCI reset_method ABI. The reset sequence follows the CXL 4.0
> specification [1], including CXL.cache disable, optional cache
> writeback, CXL Reset initiation, ResetComplete polling, and ResetError
> reporting.
> 
> The userspace ABI is the existing PCI reset interface:
> 
>     /sys/bus/pci/devices/.../reset_method
>     /sys/bus/pci/devices/.../reset
> 
> Userspace can select "cxl_reset" in reset_method and then trigger reset
> through the existing reset attribute.
>

Hi Srirangan,

Thanks for the work. I applied your series and ran some tests on a
CXL Type 2 capable GPU, and something seems to be wrong.

The device's BDF is 0002:81:00.0 and the CXL DVSEC base is at 0x10c, which gives
these config-space offsets: CAP=0x116, CTRL2=0x11c, STATUS2=0x11e
STATUS2 bits: bit0 CACHE_INV, bit1 RST_DONE, bit2 RST_ERR
CTRL2 bits: bit0 = DISABLE_CACHING

I ran all of the following commands as root.

"""
# b=0002:81:00.0; dev=/sys/bus/pci/devices/$b
# echo cxl_reset > $dev/reset_method
# echo "PRE  CAP=0x$(setpci -s $b 0x116.w) CTRL2=0x$(setpci -s $b 0x11c.w) STATUS2=0x$(setpci -s $b 0x11e.w)"
# dmesg -C
# time echo 1 > $dev/reset
# echo "POST CAP=0x$(setpci -s $b 0x116.w) CTRL2=0x$(setpci -s $b 0x11c.w) STATUS2=0x$(setpci -s $b 0x11e.w)"
"""

The result was:
PRE  CAP=0x8bd7 CTRL2=0x0000 STATUS2=0x8000
==> RESET rc=1 elapsed_ms=114 err=[bash: line 9: echo: write error: Input/output error]
POST CAP=0x8bd7 CTRL2=0x0001 STATUS2=0x8003
device-present=1 reset_method=cxl_reset leaked_cxl_reset_iomem=0

dmesg showed no output.

The write() to reset failed with -EIO after ~114 ms, but STATUS2 went from 0x8000 to 0x8003:
CACHE_INV and RST_DONE are set and RST_ERR is clear.
So the device completed the reset, and the kernel returned failure for a reset the HW performed successfully.
CTRL2 went from 0x0000 to 0x0001, so the device's CXL.cache is left disabled after the "failed" reset.
~114 ms is roughly the msleep(100) in cxl_reset_wait_done() plus the first poll. On the first poll
the Status2 read returns 0xffff. Since 0xffff has bit 2 set, the code reads it as RST_ERR and returns -EIO with no retry.
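
To illustrate the point about all-ones reads, here is a small userspace sketch of how a Status2 value could be classified before its bits are trusted. The bit positions are the ones listed above; the function name decode_status2 and the "no-response" label are made up for this example, and this is not the kernel code from the series.

```shell
# Classify a 16-bit Status2 value given as hex, with or without a 0x prefix
# (setpci prints it without one). Hypothetical helper, for illustration only.
decode_status2() {
    val=$(( 0x${1#0x} ))
    # An all-ones read usually means the function did not answer the config
    # cycle at all, so none of the individual bits can be trusted.
    if [ "$val" -eq 65535 ]; then
        echo "no-response"
        return 0
    fi
    flags=""
    if [ $(( val & 0x1 )) -ne 0 ]; then flags="$flags CACHE_INV"; fi
    if [ $(( val & 0x2 )) -ne 0 ]; then flags="$flags RST_DONE"; fi
    if [ $(( val & 0x4 )) -ne 0 ]; then flags="$flags RST_ERR"; fi
    if [ -z "$flags" ]; then flags=" none"; fi
    # Drop the leading separator before printing.
    echo "${flags# }"
}

decode_status2 8000   # prints: none
decode_status2 8003   # prints: CACHE_INV RST_DONE
decode_status2 ffff   # prints: no-response
```

With a guard of this kind, a 0xffff read on the first poll would show up as an unresponsive function rather than as RST_ERR.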


After that I ran the same "echo 1 > $dev/reset" 25 times in a loop, logging rc, elapsed ms, CTRL2 and STATUS2 at each iteration, and then checked dmesg.
"""
PRE  CAP=0x8bd7 CTRL2=0x0001 STATUS2=0x8003
iter  1 rc=1 ms= 114 CTRL2=0x0001 STATUS2=0x8003 present=1
iter  2 rc=1 ms= 171 CTRL2=0xffff STATUS2=0xffff present=1
iter  3 rc=1 ms=   6 CTRL2=0xffff STATUS2=0xffff present=1
[snip]
iter 25 rc=1 ms=   5 CTRL2=0xffff STATUS2=0xffff present=1
### rc histogram:  rc=1 : 25x
### leaked cxl_reset iomem regions: 0
"""
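
For reference, a loop along these lines can be sketched as follows. This is a reconstruction under assumptions, not the exact script that produced the log above: the helper names are hypothetical, the elapsed time relies on GNU date, and the present= column and the rc histogram are left out.

```shell
# Millisecond timestamp (assumes GNU date with %N support).
now_ms() { echo $(( $(date +%s%N) / 1000000 )); }

# Write 1 to <dev>/reset and print the exit status and the elapsed time.
reset_once() {
    t0=$(now_ms)
    rc=0
    echo 1 2>/dev/null > "$1/reset" || rc=$?
    t1=$(now_ms)
    echo "rc=$rc ms=$(( t1 - t0 ))"
}

# Repeat the reset <count> times on <bdf>, reading CTRL2 (0x11c) and
# STATUS2 (0x11e) back with setpci after every attempt.
reset_loop() {
    bdf=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        res=$(reset_once "/sys/bus/pci/devices/$bdf")
        echo "iter $i $res CTRL2=0x$(setpci -s "$bdf" 0x11c.w) STATUS2=0x$(setpci -s "$bdf" 0x11e.w)"
        i=$(( i + 1 ))
    done
}
```

Run as root, "reset_loop 0002:81:00.0 25" followed by "dmesg" would reproduce the sequence of steps described above.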

The complete dmesg output:
"""
  [ 1892.870193] ------------[ cut here ]------------
  [ 1892.870215] index 7 is out of range for type 'resource_size_t [6]'
  [ 1892.870218] CPU: 121 UID: 0 PID: 19436 Comm: bash Not tainted 7.1.0-rc7+ #1 PREEMPT(full)
  [ 1892.870221] Hardware name: NVIDIA VR NVL72/P3809-BMC, BIOS NV_SBIOS: 06.02.00.00, OEM_SBIOS: 06.02.00.00 Mon Jun  8 08:22:03 PM UTC 2026
  [ 1892.870222] Call trace:
  [ 1892.870223]  show_stack+0x24/0x50 (C)
  [ 1892.870229]  dump_stack_lvl+0x80/0x140
  [ 1892.870236]  dump_stack+0x1c/0x38
  [ 1892.870237]  __ubsan_handle_out_of_bounds+0xd0/0x128
  [ 1892.870242]  pci_restore_iov_state+0x250/0x270
  [ 1892.870249]  pci_restore_state+0x10c/0x2c0
  [ 1892.870251]  pci_dev_restore+0x6c/0xb0
  [ 1892.870252]  pci_reset_function+0x94/0x160
  [ 1892.870254]  reset_store+0x78/0xf0
  [ 1892.870258]  dev_attr_store+0x24/0x78
  [ 1892.870263]  sysfs_kf_write+0x88/0xc8
  [ 1892.870268]  kernfs_fop_write_iter+0x170/0x228
  [ 1892.870272]  vfs_write+0x270/0x3a8
  [ 1892.870276]  ksys_write+0x7c/0x138
  [ 1892.870277]  __arm64_sys_write+0x28/0x50
  [ 1892.870279]  invoke_syscall.constprop.0+0xac/0x100
  [ 1892.870282]  do_el0_svc+0x4c/0x100
  [ 1892.870283]  el0_svc+0x50/0x2b0
  [ 1892.870285]  el0t_64_sync_handler+0xc0/0x108
  [ 1892.870286]  el0t_64_sync+0x1b8/0x1c0
  [ 1892.870291] ---[ end trace ]---
  [ 1892.870293] ------------[ cut here ]------------
  [ 1892.870293] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
  [ 1892.870295] shift exponent 64 is too large for 64-bit type 'long unsigned int'
  [ 1892.870296] CPU: 121 UID: 0 PID: 19436 Comm: bash Not tainted 7.1.0-rc7+ #1 PREEMPT(full)
  [ 1892.870297] Hardware name: NVIDIA VR NVL72/P3809-BMC, BIOS NV_SBIOS: 06.02.00.00, OEM_SBIOS: 06.02.00.00 Mon Jun  8 08:22:03 PM UTC 2026
  [ 1892.870298] Call trace:
  [ 1892.870298]  show_stack+0x24/0x50 (C)
  [ 1892.870299]  dump_stack_lvl+0x80/0x140
  [ 1892.870300]  dump_stack+0x1c/0x38
  [ 1892.870301]  __ubsan_handle_shift_out_of_bounds+0x154/0x260
  [ 1892.870302]  pci_rebar_bytes_to_size+0x98/0xc8
  [ 1892.870305]  pci_restore_iov_state+0x1f0/0x270
  [ 1892.870306]  pci_restore_state+0x10c/0x2c0
  [ 1892.870307]  pci_dev_restore+0x6c/0xb0
  [ 1892.870308]  pci_reset_function+0x94/0x160
  [ 1892.870308]  reset_store+0x78/0xf0
  [ 1892.870309]  dev_attr_store+0x24/0x78
  [ 1892.870310]  sysfs_kf_write+0x88/0xc8
  [ 1892.870311]  kernfs_fop_write_iter+0x170/0x228
  [ 1892.870312]  vfs_write+0x270/0x3a8
  [ 1892.870313]  ksys_write+0x7c/0x138
  [ 1892.870314]  __arm64_sys_write+0x28/0x50
  [ 1892.870315]  invoke_syscall.constprop.0+0xac/0x100
  [ 1892.870315]  do_el0_svc+0x4c/0x100
  [ 1892.870316]  el0_svc+0x50/0x2b0
  [ 1892.870317]  el0t_64_sync_handler+0xc0/0x108
  [ 1892.870318]  el0t_64_sync+0x1b8/0x1c0
  [ 1892.870319] ---[ end trace ]---
"""

After iter 1, iter 2 left the device at CTRL2=0xffff STATUS2=0xffff: config space reads return all-ones,
which means the function stopped responding to config cycles and did not recover on its own.
During that iteration's pci_dev_restore(), the PCI core read the device's VF-ReBAR config, which now reads back as 0xffffffff,
and used the garbage as an array index: bar_idx = 7 into dev->sriov->barsz[], which has 6 entries. That produced the two UBSAN reports above.
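
The arithmetic behind the out-of-range index can be checked by hand. The sketch below assumes the BAR index occupies bits 2:0 of the VF Resizable BAR control register (following the PCIe Resizable BAR control register layout) and that barsz[] has 6 entries, as the UBSAN report says. The function names are hypothetical and this is not the kernel's code.

```shell
# Extract the BAR index field (assumed to be bits 2:0) from a 32-bit
# control register value given as hex, with or without a 0x prefix.
rebar_bar_idx() {
    echo $(( 0x${1#0x} & 0x7 ))
}

# Report whether the extracted index fits an array of 6 entries (valid: 0..5).
check_bar_idx() {
    idx=$(rebar_bar_idx "$1")
    if [ "$idx" -lt 6 ]; then
        echo "idx=$idx ok"
    else
        echo "idx=$idx out-of-range"
    fi
}

check_bar_idx ffffffff   # prints: idx=7 out-of-range
```

A range check of this kind, or an all-ones check on the value read, before indexing would presumably avoid the first report.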

Best regards,
Richard Cheng.




> Following Dan's v6 feedback, this replaces the proposed memdev sysfs ABI
> with the existing PCI reset_method interface.
> 
> v7 changes from v6 [2]:
> - Move the ABI from a CXL memdev attribute to PCI reset_method.
> - Drop the memdev dependency from reset entry; advertise cxl_reset for
>   Type 2 functions that report CXL Reset support in the CXL Device DVSEC.
> - Incorporate Dan's HDM reset refactor: shared decoder settings,
>   pci_dev->hdm cached state, and built-in CONFIG_CXL_HDM helpers.
> - Cache endpoint HDM settings during PCI enumeration when MMIO decoding
>   is already enabled, and let CXL core refresh the same cache later.
> - Reduce the earlier PCI/CXL save/restore series [3] to the HDM state
>   cache and restore infrastructure needed by this reset flow.
> - Use cached HDM ranges to reject reset while affected ranges are busy
>   and to invalidate CPU caches before reset.
> - Discover the CXL reset scope with the Non-CXL Function Map and CXL
>   cache/mem capability bits.
> - Quiesce affected sibling functions with PCI save/disable and IOMMU
>   reset prepare/done before executing reset.
> - Restore cached HDM decoder state after reset before completing PCI
>   reset recovery.
> - Keep CXL Reset Memory Clear disabled.
> 
> Motivation:
> -----------
> - Type 2 devices need a CXL-specific reset mechanism beyond existing PCI
>   reset methods.
> 
> - FLR does not reset CXL.cache or CXL.mem protocol state. CXL Reset is
>   the architectural reset mechanism for those protocols.
> 
> - The PCI reset_method ABI lets userspace select this narrower CXL reset
>   before falling back to broader bus reset methods.
> 
> Change Description:
> -------------------
> 
> Patch 1: cxl/hdm: Split decoder programming into a reusable helper
> - Move shared decoder settings to include/cxl/cxl.h.
> - Factor low-level HDM register programming into cxl_commit().
> 
> Patch 2: cxl/hdm: Cache decoder settings on PCI devices
> - Cache CXL core HDM decoder settings in pci_dev->hdm.
> - Refresh the cache as decoders are enumerated, committed, or reset.
> 
> Patch 3: cxl/hdm: Cache endpoint decoder settings during PCI enumeration
> - Snapshot endpoint HDM state during PCI capability initialization when
>   memory decoding is already enabled.
> - Reuse the same cache when CXL core later enumerates the device.
> 
> Patch 4: PCI: Export pci_dev_save_and_disable() and pci_dev_restore()
> - Export PCI reset lifecycle helpers for CXL reset orchestration.
> 
> Patch 5: PCI/CXL: Add CXL Device Reset helper
> - Add the internal DVSEC reset sequence.
> - Disable CXL.cache, perform cache writeback where supported, initiate
>   CXL Reset, and wait for completion.
> 
> Patch 6: PCI/CXL: Validate HDM ranges before CXL reset
> - Collect enabled cached HDM ranges.
> - Reject reset if affected ranges are busy and invalidate CPU caches.
> 
> Patch 7: PCI/CXL: Discover the CXL reset scope
> - Discover same-scope CXL functions with the Non-CXL Function Map and
>   CXL cache/mem capability bits.
> 
> Patch 8: PCI/CXL: Coordinate sibling functions for CXL reset
> - Lock, save, disable, and IOMMU-block affected sibling functions.
> - Include mem-capable siblings in HDM range validation and cache flush.
> 
> Patch 9: cxl/pci: Restore CXL HDM state after PCI reset
> - Restore cached global and per-decoder HDM state after reset.
> - Keep IOMMU reset blocks active until HDM restore completes.
> 
> Patch 10: PCI/CXL: Expose CXL Reset as a PCI reset method
> - Add "cxl_reset" to the PCI reset_method table for Type 2 reset-capable
>   CXL devices.
> 
> Patch 11: Documentation/ABI: Document CXL Reset PCI reset method
> - Document the new reset_method value and reset behavior.
> 
> The CPU cache invalidation step depends on
> cpu_cache_invalidate_memregion() support for the affected address ranges.
> If no provider is available, reset fails before hardware reset is
> requested.
> 
> Example:
> 
>     echo cxl_reset > /sys/bus/pci/devices/0000:bb:dd.f/reset_method
>     echo 1 > /sys/bus/pci/devices/0000:bb:dd.f/reset
> 
> Basic CXL DVSEC reset testing was done on a CXL Type 2 device. The reset
> sequence completed successfully and ResetComplete was observed.
> 
> References:
> [1] https://computeexpresslink.org/wp-content/uploads/2026/02/CXL-Specification_rev4p0_ver1p0_2026February26_clean_evalcopy_v2.pdf
> [2] https://lore.kernel.org/linux-cxl/20260528083154.137979-1-smadhavan@nvidia.com/
> [3] https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/
> 
> Srirangan Madhavan (11):
>   cxl/hdm: Split decoder programming into a reusable helper
>   cxl/hdm: Cache decoder settings on PCI devices
>   cxl/hdm: Cache endpoint decoder settings during PCI enumeration
>   PCI: Export pci_dev_save_and_disable() and pci_dev_restore()
>   PCI/CXL: Add CXL Device Reset helper
>   PCI/CXL: Validate HDM ranges before CXL reset
>   PCI/CXL: Discover the CXL reset scope
>   PCI/CXL: Coordinate sibling functions for CXL reset
>   cxl/pci: Restore CXL HDM state after PCI reset
>   PCI/CXL: Expose CXL Reset as a PCI reset method
>   Documentation/ABI: Document CXL Reset PCI reset method
> 
>  Documentation/ABI/testing/sysfs-bus-pci |   14 +
>  drivers/cxl/Kconfig                     |    4 +
>  drivers/cxl/core/Makefile               |    2 +-
>  drivers/cxl/core/hdm.c                  |  234 ++---
>  drivers/cxl/core/region.c               |    6 +-
>  drivers/cxl/core/reset.c                | 1276 +++++++++++++++++++++++
>  drivers/cxl/cxl.h                       |   43 -
>  drivers/pci/pci.c                       |   25 +-
>  drivers/pci/probe.c                     |    2 +
>  include/cxl/cxl.h                       |   85 +-
>  include/linux/pci.h                     |   10 +-
>  include/uapi/linux/pci_regs.h           |   15 +
>  tools/testing/cxl/test/cxl.c            |   10 +-
>  13 files changed, 1554 insertions(+), 172 deletions(-)
>  create mode 100644 drivers/cxl/core/reset.c
> 
> base-commit: 72afdd8181219f459142e571999b3b44ef7b85fb
> -- 
> 2.43.0
