Linux CXL
* (no subject)
@ 2026-04-28 18:24 Fabio M. De Francesco
  2026-04-28 18:24 ` [PATCH 1/2] PCI/CXL: Allow PM Init to complete on cxl_bus reset if ACS SV enabled Fabio M. De Francesco
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Fabio M. De Francesco @ 2026-04-28 18:24 UTC (permalink / raw)
  To: linux-cxl
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams, Bjorn Helgaas,
	linux-kernel, linux-pci, Fabio M. De Francesco

Subject: [PATCH 0/2] PCI/CXL: Recover CXL Downstream Ports from PM Init failure

CXL r4.0 sec 8.1.5.1 Implementation Note describes a scenario in which a
Secondary Bus Reset, a Link Down, or Downstream Port Containment on a
CXL Downstream Port prevents Port PM Init from completing when ACS
Source Validation is enabled on the Downstream Port. The spec states
that another SBR alone does not recover the port and describes a
software recovery sequence.

Patch 1 extends cxl_reset_bus_function(), the helper backing the cxl_bus
PCI/CXL reset method exposed to userspace via sysfs. It saves, clears,
and restores ACS Source Validation and Bus Master Enable on the CXL
Downstream Port around the SBR it issues. This keeps the userspace
cxl_bus reset path from leaving the port unable to complete PM Init.

Patch 2 adds a recovery pass during CXL enumeration. For each CXL
Downstream Port in a memdev's ancestry, the CXL core checks whether PM
Init has completed. If it has not, regardless of what caused the
failure, it invokes cxl_reset_bus_function() on the child below the port
in the hope of restoring the port to a usable state. CXL enumeration
re-runs after events that tear down and re-probe the memdev, including
DPC, AER, and Link Down, so those paths reach this recovery.

This small series evolved from an earlier RFC v3:
https://lore.kernel.org/linux-cxl/20260330193347.25072-1-fabio.m.de.francesco@linux.intel.com/

Fabio M. De Francesco (2):
  PCI/CXL: Allow PM Init to complete on cxl_bus reset if ACS SV enabled
  cxl/core: Recover from PM Init failure via cxl_reset_bus_function()

 drivers/cxl/core/pci.c        | 30 ++++++++++++++++++++
 drivers/cxl/core/port.c       | 22 +++++++++++++++
 drivers/cxl/cxlpci.h          |  3 ++
 drivers/pci/pci.c             | 52 ++++++++++++++++++++++++++++++++++-
 include/linux/pci.h           |  1 +
 include/uapi/linux/pci_regs.h |  2 ++
 6 files changed, 109 insertions(+), 1 deletion(-)

-- 
2.53.0


* [NDCTL PATCH v3 2/2] cxl: Add check for regions before disabling memdev
@ 2023-11-30 21:51 Dave Jiang
  2024-04-17  6:46 ` Yao Xingtao
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Jiang @ 2023-11-30 21:51 UTC (permalink / raw)
  To: linux-cxl, nvdimm; +Cc: vishal.l.verma, caoqq

Add a check for memdev disable to see if there are active regions present
before disabling the device. This is necessary now that regions are present,
and it fulfills the TODO that was left there. The best way to determine if a
region is active is to see if there are decoders enabled for the mem
device. This is only best effort, as the state is a snapshot the
kernel provides and is not atomic WRT the memdev disable operation. The
expectation is that the admin issuing the command has full control of the mem
device and that no other agents are also attempting to control the device.

Reviewed-by: Quanquan Cao <caoqq@fujitsu.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
v3:
- Add emission of warning for forcing operation. (Quanquan)
v2:
- Warn if active region regardless of -f. (Alison)
- Expound on -f behavior in man page. (Vishal)
---
 Documentation/cxl/cxl-disable-memdev.txt |    4 +++-
 cxl/memdev.c                             |   20 +++++++++++++++++---
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/Documentation/cxl/cxl-disable-memdev.txt b/Documentation/cxl/cxl-disable-memdev.txt
index c4edb93ee94a..34b720288705 100644
--- a/Documentation/cxl/cxl-disable-memdev.txt
+++ b/Documentation/cxl/cxl-disable-memdev.txt
@@ -27,7 +27,9 @@ include::bus-option.txt[]
 	a device if the tool determines the memdev is in active usage. Recall
 	that CXL memory ranges might have been established by platform
 	firmware and disabling an active device is akin to force removing
-	memory from a running system.
+	memory from a running system. Going down this path does not offline
+	active memory if it is currently online. Users are advised to
+	offline and disable the appropriate regions before disabling the memdevs.
 
 -v::
 	Turn on verbose debug messages in the library (if libcxl was built with
diff --git a/cxl/memdev.c b/cxl/memdev.c
index 2dd2e7fcc4dd..ed962d478048 100644
--- a/cxl/memdev.c
+++ b/cxl/memdev.c
@@ -437,14 +437,28 @@ static int action_free_dpa(struct cxl_memdev *memdev,
 
 static int action_disable(struct cxl_memdev *memdev, struct action_context *actx)
 {
+	struct cxl_endpoint *ep;
+	struct cxl_port *port;
+
 	if (!cxl_memdev_is_enabled(memdev))
 		return 0;
 
-	if (!param.force) {
-		/* TODO: actually detect rather than assume active */
+	ep = cxl_memdev_get_endpoint(memdev);
+	if (!ep)
+		return -ENODEV;
+
+	port = cxl_endpoint_get_port(ep);
+	if (!port)
+		return -ENODEV;
+
+	if (cxl_port_decoders_committed(port)) {
 		log_err(&ml, "%s is part of an active region\n",
 			cxl_memdev_get_devname(memdev));
-		return -EBUSY;
+		if (!param.force)
+			return -EBUSY;
+
+		log_err(&ml, "Forcing %s disable with an active region!\n",
+			cxl_memdev_get_devname(memdev));
 	}
 
 	return cxl_memdev_disable_invalidate(memdev);



* [PATCH v5 0/5] CXL Poison List Retrieval & Tracing
@ 2023-01-18 20:59 alison.schofield
  2023-01-27  1:59 ` Dan Williams
  0 siblings, 1 reply; 13+ messages in thread
From: alison.schofield @ 2023-01-18 20:59 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Dave Jiang, Ben Widawsky,
	Steven Rostedt
  Cc: Alison Schofield, linux-cxl, linux-kernel

From: Alison Schofield <alison.schofield@intel.com>

**RESENDING this cover letter, which was previously mis-threaded.

Changes in v5:
- Rebase on cxl/next
- Use struct_size() to calc mbox cmd payload .min_out
- s/INTERNAL/INJECTED mocked poison record source
- Added Jonathan's Reviewed-by tag on Patch 3

Link to v4:
https://lore.kernel.org/linux-cxl/cover.1671135967.git.alison.schofield@intel.com/

Add support for retrieving device poison lists and storing the returned
error records as kernel trace events.

The handling of the poison list is guided by the CXL 3.0 Specification
Section 8.2.9.8.4.1. [1]

Example, triggered by memdev:
$ echo 1 > /sys/bus/cxl/devices/mem3/trigger_poison_list
cxl_poison: memdev=mem3 pcidev=cxl_mem.3 region= region_uuid=00000000-0000-0000-0000-000000000000 dpa=0x0 length=0x40 source=Internal flags= overflow_time=0

Example, triggered by region:
$ echo 1 > /sys/bus/cxl/devices/region5/trigger_poison_list
cxl_poison: memdev=mem0 pcidev=cxl_mem.0 region=region5 region_uuid=bfcb7a29-890e-4a41-8236-fe22221fc75c dpa=0x0 length=0x40 source=Internal flags= overflow_time=0
cxl_poison: memdev=mem1 pcidev=cxl_mem.1 region=region5 region_uuid=bfcb7a29-890e-4a41-8236-fe22221fc75c dpa=0x0 length=0x40 source=Internal flags= overflow_time=0

[1]: https://www.computeexpresslink.org/download-the-specification

Alison Schofield (5):
  cxl/mbox: Add GET_POISON_LIST mailbox command
  cxl/trace: Add TRACE support for CXL media-error records
  cxl/memdev: Add trigger_poison_list sysfs attribute
  cxl/region: Add trigger_poison_list sysfs attribute
  tools/testing/cxl: Mock support for Get Poison List

 Documentation/ABI/testing/sysfs-bus-cxl | 28 +++++++++
 drivers/cxl/core/mbox.c                 | 78 +++++++++++++++++++++++
 drivers/cxl/core/memdev.c               | 45 ++++++++++++++
 drivers/cxl/core/region.c               | 33 ++++++++++
 drivers/cxl/core/trace.h                | 83 +++++++++++++++++++++++++
 drivers/cxl/cxlmem.h                    | 69 +++++++++++++++++++-
 drivers/cxl/pci.c                       |  4 ++
 tools/testing/cxl/test/mem.c            | 42 +++++++++++++
 8 files changed, 381 insertions(+), 1 deletion(-)


base-commit: 589c3357370a596ef7c99c00baca8ac799fce531
-- 
2.37.3



end of thread, other threads:[~2026-05-06  5:54 UTC | newest]

Thread overview: 13+ messages
2026-04-28 18:24 Fabio M. De Francesco
2026-04-28 18:24 ` [PATCH 1/2] PCI/CXL: Allow PM Init to complete on cxl_bus reset if ACS SV enabled Fabio M. De Francesco
2026-05-01 18:36   ` Dave Jiang
2026-04-28 18:24 ` [PATCH 2/2] cxl/core: Recover from PM Init failure via cxl_reset_bus_function() Fabio M. De Francesco
2026-05-01 21:59   ` Dave Jiang
2026-05-06  5:54   ` Alison Schofield
2026-05-01 22:01 ` Dave Jiang
  -- strict thread matches above, loose matches on Subject: below --
2023-11-30 21:51 [NDCTL PATCH v3 2/2] cxl: Add check for regions before disabling memdev Dave Jiang
2024-04-17  6:46 ` Yao Xingtao
2024-04-17 18:14   ` Verma, Vishal L
2024-04-22  7:26     ` Re: Xingtao Yao (Fujitsu)
2023-01-18 20:59 [PATCH v5 0/5] CXL Poison List Retrieval & Tracing alison.schofield
2023-01-27  1:59 ` Dan Williams
2023-01-27 16:10   ` Alison Schofield
2023-01-27 19:16     ` Re: Dan Williams
2023-01-27 21:36       ` Re: Alison Schofield
2023-01-27 22:04         ` Re: Dan Williams
