Linux Documentation

* Re: [PATCH v18 12/13] PCI/CXL: Mask/Unmask CXL protocol errors
From: Dave Jiang @ 2026-07-20 22:52 UTC
  To: Terry Bowman, Bjorn Helgaas, Dan Williams, Ira Weiny,
	Jonathan Cameron, Len Brown, Rafael J . Wysocki, Robert Richter
  Cc: linux-acpi, linux-cxl, linux-doc, linux-kernel, linux-pci,
	linuxppc-dev, Alejandro Lucero, Alison Schofield, Ankit Agrawal,
	Ard Biesheuvel, Ben Cheatham, Borislav Petkov, Breno Leitao,
	Davidlohr Bueso, Fabio M . De Francesco, Gregory Price,
	Hanjun Guo, Jonathan Corbet, Kees Cook,
	Kuppuswamy Sathyanarayanan, Li Ming, Mahesh J Salgaonkar,
	Mauro Carvalho Chehab, Oliver O'Halloran, Shiju Jose,
	Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck, Vishal Verma
In-Reply-To: <20260717222706.3540281-13-terry.bowman@amd.com>



On 7/17/26 3:27 PM, Terry Bowman wrote:
> CXL protocol errors are not enabled for all CXL devices after boot.
> They must be enabled in order to process CXL protocol errors. Provide
> matching teardown helpers so the masks are restored when a CXL Port
> or dport goes away.
> 
> Add pci_aer_mask_internal_errors() as the symmetric counterpart to
> pci_aer_unmask_internal_errors() and export both for the cxl_core
> module.
> 
> Introduce cxl_unmask_proto_interrupts() and cxl_mask_proto_interrupts()
> in cxl_core to wrap the PCI helpers with the dev_is_pci() and
> pcie_aer_is_native() gating CXL needs. Both helpers tolerate a NULL
> or non-PCI @dev so callers do not have to special-case it.
> 
> Wire cxl_unmask_proto_interrupts() into the success path of
> cxl_dport_map_ras() and devm_cxl_port_ras_setup() so the unmask
> only runs when the RAS register block was actually mapped. Pair each
> unmask with a devm_add_action_or_reset() registration of
> cxl_mask_proto_irqs() scoped to the host device so the mask is
> restored when devres is released. This applies to dports, Endpoints,
> Upstream Switch Ports, Downstream Switch Ports, and Root Ports.
> 
> Remove the dev_is_pci(dport->dport_dev) guard in
> devm_cxl_dport_rch_ras_setup(). On RCH systems dport->dport_dev is the
> pci_host_bridge device, which is not on pci_bus_type, so this guard
> caused the function to return early on real hardware without mapping
> dport RAS or AER registers. The caller already gates on dport->rch,
> which is sufficient to exclude cxl_test mock devices.
> 
> Co-developed-by: Dan Williams <djbw@kernel.org>
> Signed-off-by: Dan Williams <djbw@kernel.org>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
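
For readers following the teardown logic: the commit message pairs each
unmask with a devres action that re-masks when the host device's devres is
released. Below is a minimal userspace sketch of that pairing, not the
kernel code. Every fake_* name is a hypothetical stand-in. The only kernel
behaviour modelled is that devm_add_action_or_reset() runs the action
immediately and returns an error when registration fails, which is why the
patch can clear the RAS mapping on that path and know the mask is back in
place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_MAX_ACTIONS 4

typedef void (*fake_action_fn)(void *data);

/* Hypothetical stand-in for a device's devres list. */
struct fake_devres {
	fake_action_fn fn[FAKE_MAX_ACTIONS];
	void *data[FAKE_MAX_ACTIONS];
	size_t count;
};

/* Hypothetical stand-in for a port or dport with a RAS mapping. */
struct fake_port {
	bool ras_mapped;
	bool unmasked;
};

/* Models devm_add_action_or_reset(): on failure, run the action now. */
static int fake_add_action_or_reset(struct fake_devres *dr,
				    fake_action_fn fn, void *data)
{
	assert(dr && fn);
	if (dr->count == FAKE_MAX_ACTIONS) {
		fn(data);
		return -1;
	}
	dr->fn[dr->count] = fn;
	dr->data[dr->count] = data;
	dr->count++;
	return 0;
}

/* Models devres release: run registered actions in reverse order. */
static void fake_release_all(struct fake_devres *dr)
{
	while (dr->count) {
		dr->count--;
		dr->fn[dr->count](dr->data[dr->count]);
	}
}

/* Plays the role of cxl_mask_proto_irqs() in this sketch. */
static void fake_mask(void *data)
{
	((struct fake_port *)data)->unmasked = false;
}

/* Unmask only after a successful map, then register the re-mask. */
static void fake_ras_setup(struct fake_devres *host, struct fake_port *port)
{
	port->ras_mapped = true;	/* assume the register map succeeded */
	port->unmasked = true;
	if (fake_add_action_or_reset(host, fake_mask, port))
		port->ras_mapped = false; /* error reporting stays disabled */
}
```

On the normal path the port stays unmasked until fake_release_all() runs.
On the failure path the port is already masked again by the time
fake_ras_setup() returns.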


> 
> ---
> 
> Changes in v17->v18:
> - Make cxl_unmask_proto_interrupts() and cxl_mask_proto_interrupts() static
> - Remove dev_is_pci() guard from devm_cxl_dport_rch_ras_setup(); the guard
>   blocked real RCH hardware because pci_host_bridge is not on pci_bus_type
> 
> Changes in v16->v17:
> - Drop redundant cxl_mask_proto_interrupts() calls from unregister_port()
>   and cxl_dport_remove(); the devres action registered alongside the unmask
>   is the sole mask path.
> - Update title
> - Remove unnecessary check for aer_capabilities
> - Gate cxl_unmask_proto_interrupts() on pcie_aer_is_native()
> - Add pci_aer_mask_internal_errors() and cxl_mask_proto_interrupts()
> - Only unmask on successful cxl_map_component_regs()
> - NULL-check @dev in cxl_{un,}mask_proto_interrupts()
> - Drop static and declare in core/core.h
> 
> Change in v15 -> v16:
> - None
> 
> Change in v14 -> v15:
> - None
> 
> Changes in v13->v14:
> - Update commit title's prefix (Bjorn)
> 
> Changes in v12->v13:
> - Add dev and dev_is_pci() NULL checks in cxl_unmask_proto_interrupts() (Terry)
> - Add Dave Jiang's and Ben's review-by
> 
> Changes in v11->v12:
> - None
> ---
>  drivers/cxl/core/ras.c        | 73 +++++++++++++++++++++++++++++++----
>  drivers/pci/pcie/aer.c        | 28 ++++++++++++--
>  include/linux/aer.h           |  2 +
>  tools/testing/cxl/Kbuild      |  1 +
>  tools/testing/cxl/test/mock.c | 12 ++++++
>  5 files changed, 105 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 69b320c74469c..d77208af41e03 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -117,16 +117,64 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
>  }
>  static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>  
> +static void cxl_unmask_proto_interrupts(struct device *dev)
> +{
> +	struct pci_dev *pdev;
> +
> +	if (!dev || !dev_is_pci(dev))
> +		return;
> +
> +	pdev = to_pci_dev(dev);
> +	if (!pcie_aer_is_native(pdev))
> +		return;
> +
> +	pci_aer_unmask_internal_errors(pdev);
> +}
> +
> +static void cxl_mask_proto_interrupts(struct device *dev)
> +{
> +	struct pci_dev *pdev;
> +
> +	if (!dev || !dev_is_pci(dev))
> +		return;
> +
> +	pdev = to_pci_dev(dev);
> +	if (!pcie_aer_is_native(pdev))
> +		return;
> +
> +	pci_aer_mask_internal_errors(pdev);
> +}
> +
> +static void cxl_mask_proto_irqs(void *dev)
> +{
> +	cxl_mask_proto_interrupts(dev);
> +}
> +
>  static void cxl_dport_map_ras(struct cxl_dport *dport)
>  {
>  	struct cxl_register_map *map = &dport->reg_map;
>  	struct device *dev = dport->dport_dev;
>  
> -	if (!map->component_map.ras.valid)
> +	if (!map->component_map.ras.valid) {
>  		dev_dbg(dev, "RAS registers not found\n");
> -	else if (cxl_map_component_regs(map, &dport->regs.component,
> -					BIT(CXL_CM_CAP_CAP_ID_RAS)))
> +		return;
> +	}
> +
> +	if (cxl_map_component_regs(map, &dport->regs.component,
> +				   BIT(CXL_CM_CAP_CAP_ID_RAS))) {
>  		dev_dbg(dev, "Failed to map RAS capability.\n");
> +		return;
> +	}
> +
> +	if (!dev_is_pci(dev))
> +		return;
> +
> +	cxl_unmask_proto_interrupts(dev);
> +	if (devm_add_action_or_reset(dport_to_host(dport),
> +				     cxl_mask_proto_irqs, dev)) {
> +		dev_warn(dev, "failed to defer CXL proto-irq mask; CXL protocol error reporting disabled\n");
> +		dport->regs.component.ras = NULL;
> +	}
>  }
>  
>  /**
> @@ -143,9 +191,6 @@ void devm_cxl_dport_rch_ras_setup(struct cxl_dport *dport)
>  {
>  	struct pci_host_bridge *host_bridge;
>  
> -	if (!dev_is_pci(dport->dport_dev))
> -		return;
> -
>  	devm_cxl_dport_ras_setup(dport);
>  
>  	host_bridge = to_pci_host_bridge(dport->dport_dev);
> @@ -160,6 +205,7 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_dport_rch_ras_setup, "CXL");
>  void devm_cxl_port_ras_setup(struct cxl_port *port)
>  {
>  	struct cxl_register_map *map = &port->reg_map;
> +	struct device *dev;
>  
>  	if (!map->component_map.ras.valid) {
>  		dev_dbg(&port->dev, "RAS registers not found\n");
> @@ -168,8 +214,21 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
>  
>  	map->host = &port->dev;
>  	if (cxl_map_component_regs(map, &port->regs,
> -				   BIT(CXL_CM_CAP_CAP_ID_RAS)))
> +				   BIT(CXL_CM_CAP_CAP_ID_RAS))) {
>  		dev_dbg(&port->dev, "Failed to map RAS capability\n");
> +		return;
> +	}
> +
> +	dev = is_cxl_endpoint(port) ? port->uport_dev->parent : port->uport_dev;
> +	if (!dev_is_pci(dev))
> +		return;
> +
> +	cxl_unmask_proto_interrupts(dev);
> +	if (devm_add_action_or_reset(&port->dev, cxl_mask_proto_irqs, dev)) {
> +		dev_warn(&port->dev,
> +			 "failed to defer CXL proto-irq mask; CXL protocol error reporting disabled\n");
> +		port->regs.ras = NULL;
> +	}
>  }
>  EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
>  
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 0bd23a65e7ebc..be6dc2cbd4491 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1143,12 +1143,32 @@ void pci_aer_unmask_internal_errors(struct pci_dev *dev)
>  	mask &= ~PCI_ERR_COR_INTERNAL;
>  	pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
>  }
> +EXPORT_SYMBOL_FOR_MODULES(pci_aer_unmask_internal_errors, "cxl_core");
>  
> -/*
> - * Internal errors are too device-specific to enable generally, however for CXL
> - * their behavior is standardized for conveying CXL protocol errors.
> +/**
> + * pci_aer_mask_internal_errors - mask internal errors
> + * @dev: pointer to the pci_dev data structure
> + *
> + * Mask internal errors in the Uncorrectable and Correctable Error
> + * Mask registers.
> + *
> + * Note: AER must be enabled and supported by the device which must be
> + * checked in advance, e.g. with pcie_aer_is_native().
>   */
> -EXPORT_SYMBOL_FOR_MODULES(pci_aer_unmask_internal_errors, "cxl_core");
> +void pci_aer_mask_internal_errors(struct pci_dev *dev)
> +{
> +	int aer = dev->aer_cap;
> +	u32 mask;
> +
> +	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, &mask);
> +	mask |= PCI_ERR_UNC_INTN;
> +	pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, mask);
> +
> +	pci_read_config_dword(dev, aer + PCI_ERR_COR_MASK, &mask);
> +	mask |= PCI_ERR_COR_INTERNAL;
> +	pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(pci_aer_mask_internal_errors, "cxl_core");
>  
>  /**
>   * pci_aer_handle_error - handle logging error into an event log
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 8eba3192e2d15..b3657b80564b9 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -58,6 +58,7 @@ struct aer_capability_regs {
>  int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>  int pcie_aer_is_native(struct pci_dev *dev);
>  void pci_aer_unmask_internal_errors(struct pci_dev *dev);
> +void pci_aer_mask_internal_errors(struct pci_dev *dev);
>  #else
>  static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>  {
> @@ -65,6 +66,7 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>  }
>  static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>  static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
> +static inline void pci_aer_mask_internal_errors(struct pci_dev *dev) { }
>  #endif
>  
>  #ifdef CONFIG_CXL_RAS
> diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> index 2be1df80fcc93..957945201f04d 100644
> --- a/tools/testing/cxl/Kbuild
> +++ b/tools/testing/cxl/Kbuild
> @@ -6,6 +6,7 @@ ldflags-y += --wrap=acpi_pci_find_root
>  ldflags-y += --wrap=nvdimm_bus_register
>  ldflags-y += --wrap=cxl_await_media_ready
>  ldflags-y += --wrap=devm_cxl_add_rch_dport
> +ldflags-y += --wrap=devm_cxl_dport_rch_ras_setup
>  ldflags-y += --wrap=cxl_endpoint_parse_cdat
>  ldflags-y += --wrap=devm_cxl_endpoint_decoders_setup
>  ldflags-y += --wrap=hmat_get_extended_linear_cache_size
> diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
> index 6454b868b122c..5ad3243da8d29 100644
> --- a/tools/testing/cxl/test/mock.c
> +++ b/tools/testing/cxl/test/mock.c
> @@ -220,6 +220,18 @@ struct cxl_dport *__wrap_devm_cxl_add_rch_dport(struct cxl_port *port,
>  }
>  EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_add_rch_dport, "CXL");
>  
> +void __wrap_devm_cxl_dport_rch_ras_setup(struct cxl_dport *dport)
> +{
> +	int index;
> +	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
> +
> +	if (!ops || !ops->is_mock_port(dport->dport_dev))
> +		devm_cxl_dport_rch_ras_setup(dport);
> +
> +	put_cxl_mock_ops(index);
> +}
> +EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_dport_rch_ras_setup, "CXL");
> +
>  void __wrap_cxl_endpoint_parse_cdat(struct cxl_port *port)
>  {
>  	int index;


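
As an illustration of the aer.c hunk above, the sketch below runs the same
read-modify-write sequence against a fake register array in userspace. The
offsets and bit values are the ones defined in
include/uapi/linux/pci_regs.h. The fake_* names are hypothetical and stand
in for pci_read_config_dword()/pci_write_config_dword() on a real device.
Note that the mask helper sets the two internal-error bits unconditionally.
It does not restore a previously saved mask value.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets and bits as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNCOR_MASK	0x08		/* Uncorrectable Error Mask */
#define PCI_ERR_COR_MASK	0x14		/* Correctable Error Mask */
#define PCI_ERR_UNC_INTN	0x00400000u	/* Uncorrectable Internal Error */
#define PCI_ERR_COR_INTERNAL	0x00004000u	/* Corrected Internal Error */

/* Hypothetical stand-in for one device's AER capability registers. */
struct fake_aer {
	uint32_t regs[16];	/* indexed by byte offset / 4 */
};

static uint32_t fake_read(const struct fake_aer *aer, unsigned int off)
{
	assert(off % 4 == 0 && off / 4 < 16);
	return aer->regs[off / 4];
}

static void fake_write(struct fake_aer *aer, unsigned int off, uint32_t val)
{
	assert(off % 4 == 0 && off / 4 < 16);
	aer->regs[off / 4] = val;
}

/* Clear both internal-error mask bits so the errors get reported. */
static void fake_unmask_internal_errors(struct fake_aer *aer)
{
	uint32_t mask;

	mask = fake_read(aer, PCI_ERR_UNCOR_MASK);
	fake_write(aer, PCI_ERR_UNCOR_MASK, mask & ~PCI_ERR_UNC_INTN);

	mask = fake_read(aer, PCI_ERR_COR_MASK);
	fake_write(aer, PCI_ERR_COR_MASK, mask & ~PCI_ERR_COR_INTERNAL);
}

/* Set both internal-error mask bits so the errors are suppressed. */
static void fake_mask_internal_errors(struct fake_aer *aer)
{
	uint32_t mask;

	mask = fake_read(aer, PCI_ERR_UNCOR_MASK);
	fake_write(aer, PCI_ERR_UNCOR_MASK, mask | PCI_ERR_UNC_INTN);

	mask = fake_read(aer, PCI_ERR_COR_MASK);
	fake_write(aer, PCI_ERR_COR_MASK, mask | PCI_ERR_COR_INTERNAL);
}
```

Other mask bits are left untouched in both directions, which is the
property the symmetric helper pair relies on.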

* Re: [PATCH v18 10/13] cxl: Add port and dport identifiers to CXL AER trace events
From: Dave Jiang @ 2026-07-20 22:44 UTC
  To: Terry Bowman, Bjorn Helgaas, Dan Williams, Ira Weiny,
	Jonathan Cameron, Len Brown, Rafael J . Wysocki, Robert Richter
  Cc: linux-acpi, linux-cxl, linux-doc, linux-kernel, linux-pci,
	linuxppc-dev, Alejandro Lucero, Alison Schofield, Ankit Agrawal,
	Ard Biesheuvel, Ben Cheatham, Borislav Petkov, Breno Leitao,
	Davidlohr Bueso, Fabio M . De Francesco, Gregory Price,
	Hanjun Guo, Jonathan Corbet, Kees Cook,
	Kuppuswamy Sathyanarayanan, Li Ming, Mahesh J Salgaonkar,
	Mauro Carvalho Chehab, Oliver O'Halloran, Shiju Jose,
	Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck, Vishal Verma
In-Reply-To: <20260717222706.3540281-11-terry.bowman@amd.com>



On 7/17/26 3:27 PM, Terry Bowman wrote:
> From: Dan Williams <djbw@kernel.org>
> 
> Pass struct cxl_port * and struct cxl_dport * to the cxl_aer_*
> trace events instead of a plain struct device * derived at the
> caller. The trace event helpers then derive the right strings for
> endpoints, switch ports, root ports, and RCH downstream ports
> consistently across the CPER and native AER paths.
> 
> The unified cxl_aer_* events keep "memdev" as the legacy field
> (endpoint events populate it with the memdev name; non-endpoint
> events emit memdev="") and add new "port" and "dport" string fields
> populated for all CXL device classes. Updated userspace can key
> off "port" and "dport" without a parallel set of events.
> 
> Remove the separate cxl_port_aer_uncorrectable_error and
> cxl_port_aer_correctable_error trace events. All CXL AER events now
> use the unified cxl_aer_* events with port and dport fields.
> 
> Rework cxl_cper_handle_prot_err() to use find_cxl_port_by_dev() and
> the unified trace helpers, replacing the per-port-type branching and
> bus_find_device() memdev lookup.
> 
> The TP_printk format string places "port=%s dport=%s" between
> "memdev=%s" and "host=%s", changing the text-mode field order from
> the pre-patch output. This does not affect consumers such as
> rasdaemon that use libtraceevent to parse fields by name rather than
> by fixed text position.
> 
> For non-endpoint events (switch port, root port, RCH dport),
> "memdev" is empty and "port"/"dport" carry the topology information.
> 
> The serial number is retrieved via pci_get_dsn() which performs live
> PCI configuration space reads. A following patch ("PCI: Cache PCI
> DSN into pci_dev->dsn during probe") replaces these with a cached
> serial number to avoid config space access in error handlers and panic
> paths.
> 
> Below are examples of the different CXL devices' error trace logs
> after this patch:
> 
>      ---------------------
>      | CXL RP - 0C:00.0  |
>      ---------------------
>                |
>      ---------------------
>      | CXL USP - 0D:00.0 |
>      ---------------------
>                |
>      ---------------------
>      | CXL DSP - 0E:00.0 |
>      ---------------------
>                |
>      ---------------------
>      | CXL EP - 0F:00.0  |
>      ---------------------
> 
> Root Port:
> cxl_aer_correctable_error: memdev= port=port1 dport=0000:0c:00.0 \
>    host=pci0000:0c serial=0: status: 'Memory Data ECC Error'
> 
> cxl_aer_uncorrectable_error: memdev= port=port1 dport=0000:0c:00.0 \
>    host=pci0000:0c serial=0: status: 'Cache Address Parity Error'  \
>    first_error: 'Cache Address Parity Error'
> 
> Upstream Switch Port:
> cxl_aer_correctable_error: memdev= port=port2 dport= host=0000:0d:00.0 \
>    serial=0: status: 'Memory Data ECC Error'
> 
> UCE NA - Upstream Switch Port UCEs are handled in the portdrv driver's
> PCI AER callbacks, which are not CXL aware.
> 
> Downstream Switch Port:
> cxl_aer_correctable_error: memdev= port=port2 dport=0000:0e:00.0 \
>    host=0000:0d:00.0 serial=0: status: 'Memory Data ECC Error'
> 
> cxl_aer_uncorrectable_error: memdev= port=port2 dport=0000:0e:00.0 \
>    host=0000:0d:00.0 serial=0: status: 'Cache Address Parity Error' \
>    first_error: 'Cache Address Parity Error'
> 
> Endpoint:
> cxl_aer_uncorrectable_error: memdev=mem1 port=endpoint4 dport= \
>    host=0000:0f:00.0 serial=0: status: 'Cache Address Parity Error' \
>    first_error: 'Cache Address Parity Error'
> 
> cxl_aer_correctable_error: memdev=mem1 port=endpoint4 dport= host=0000:0f:00.0 \
>    serial=0: status: 'Memory Data ECC Error'
> 
> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Dan Williams <djbw@kernel.org>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
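
To make the field rules in the commit message concrete, here is a small
userspace sketch of how the memdev, port, dport and host strings are chosen
for the unified events. It is an approximation under stated assumptions:
struct fake_port and the fake_* functions are hypothetical, the device
names are passed in as plain strings, and the status text is supplied by
the caller instead of being decoded from the register value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, flattened view of the names the trace helpers look up. */
struct fake_port {
	bool is_endpoint;
	const char *name;		/* port device name, e.g. "port1" */
	const char *uport_name;		/* upstream device; the memdev for endpoints */
	const char *uport_parent;	/* endpoint only: the memdev's parent */
};

/* Only endpoints have a memdev; everything else reports "". */
static const char *fake_memdev_name(const struct fake_port *p)
{
	return p->is_endpoint ? p->uport_name : "";
}

/* Endpoints report the memdev's parent, other ports their uport. */
static const char *fake_host_name(const struct fake_port *p)
{
	return p->is_endpoint ? p->uport_parent : p->uport_name;
}

/* A missing dport (endpoint, upstream switch port) reports "". */
static const char *fake_dport_name(const char *dport_name)
{
	return dport_name ? dport_name : "";
}

/* Field order mirrors the TP_printk format string in the patch. */
static int fake_format_event(char *buf, size_t len, const struct fake_port *p,
			     const char *dport_name, unsigned long long serial,
			     const char *status)
{
	assert(buf && p && status);
	return snprintf(buf, len,
			"memdev=%s port=%s dport=%s host=%s serial=%lld: status: '%s'",
			fake_memdev_name(p), p->name,
			fake_dport_name(dport_name), fake_host_name(p),
			(long long)serial, status);
}
```

Fed the Root Port and Endpoint values from the examples in the commit
message, this produces the same field text as the correctable-error lines
shown there, with an empty memdev for the Root Port and an empty dport for
the Endpoint.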

> 
> ---
> Changes in v17->v18:
> - Consolidate double find_cxl_port_by_dev() in cxl_cper_handle_prot_err()
> - Add comment noting dport is NULL for Endpoint and Upstream Port devices
> - Add cxl_trace_* helpers
> - Add CPER refactor
> 
> Changes in v16->v17:
> - Replace cxlds->serial with pci_get_dsn()
> - Change 'memdev' to 'device' (Dan)
> - Updated Commit message
> 
> Changes in v15->v16:
> - Add Dan's review-by
> - Incorporate Dan's comment into commit message:
> "Add the serial number at the end to preserve compatibility with
> libtraceevent parsing of the parameters."
> 
> Changes in v14->v15:
> - Update commit message.
> - Moved cxl_handle_ras/cxl_handle_cor_ras() changes to future patch (terry)
> 
> Changes in v13->v14:
> - Update commit headline (Bjorn)
> 
> Changes in v12->v13:
> - Added Dave Jiang's review-by
> 
> Changes in v11 -> v12:
> - Correct parameters to call trace_cxl_aer_correctable_error()
> - Add reviewed-by for Jonathan and Shiju
> 
> Changes in v10->v11:
> - Updated CE and UCE trace routines to maintain consistent TP_Struct ABI
> and unchanged TP_printk() logging.
> ---
>  drivers/cxl/core/core.h    |   8 +--
>  drivers/cxl/core/ras.c     | 131 +++++++++++--------------------------
>  drivers/cxl/core/ras_rch.c |   3 +-
>  drivers/cxl/core/trace.c   |  35 ++++++++++
>  drivers/cxl/core/trace.h   |  91 ++++++++------------------
>  drivers/cxl/cxlmem.h       |   7 ++
>  6 files changed, 113 insertions(+), 162 deletions(-)
> 
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 5ca1275fd8f35..a55a4e409feda 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -186,11 +186,11 @@ static inline struct device *dport_to_host(struct cxl_dport *dport)
>  void cxl_ras_init(void);
>  void cxl_ras_exit(void);
>  bool cxl_handle_ras(struct cxl_port *port, struct cxl_dport *dport,
> -		    void __iomem *ras_base);
> +		    void __iomem *ras_base, u64 serial);
>  void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port,
>  		     struct cxl_dport *dport);
>  void cxl_handle_cor_ras(struct cxl_port *port, struct cxl_dport *dport,
> -			void __iomem *ras_base);
> +			void __iomem *ras_base, u64 serial);
>  void cxl_dport_map_rch_aer(struct cxl_dport *dport);
>  void cxl_disable_rch_root_ints(struct cxl_dport *dport);
>  void cxl_handle_rdport_errors(struct pci_dev *pdev);
> @@ -200,14 +200,14 @@ void devm_cxl_dport_ras_setup(struct cxl_dport *dport);
>  static inline void cxl_ras_init(void) { }
>  static inline void cxl_ras_exit(void) { }
>  static inline bool cxl_handle_ras(struct cxl_port *port, struct cxl_dport *dport,
> -				  void __iomem *ras_base)
> +				  void __iomem *ras_base, u64 serial)
>  {
>  	return false;
>  }
>  static inline void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port,
>  				   struct cxl_dport *dport) { }
>  static inline void cxl_handle_cor_ras(struct cxl_port *port, struct cxl_dport *dport,
> -				      void __iomem *ras_base) { }
> +				      void __iomem *ras_base, u64 serial) { }
>  static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
>  static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
>  static inline void cxl_handle_rdport_errors(struct pci_dev *pdev) { }
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index d5dc2c22565da..acf40b2396c3b 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -12,69 +12,37 @@
>  static_assert(CXL_HEADERLOG_TRACE_SIZE_U32 == 128,
>  	      "rasdaemon ABI requires exactly 128 u32s");
>  
> -static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,
> -					      struct cxl_ras_capability_regs ras_cap)
> -{
> -	u32 status = ras_cap.cor_status & ~ras_cap.cor_mask;
> -
> -	trace_cxl_port_aer_correctable_error(&pdev->dev, status);
> -}
> -
> -static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,
> -						struct cxl_ras_capability_regs ras_cap)
> +static void cxl_cper_trace_uncorr_prot_err(struct cxl_port *port, struct cxl_dport *dport,
> +					   u64 serial, struct cxl_ras_capability_regs *ras_cap)
>  {
>  	u32 hl[CXL_HEADERLOG_TRACE_SIZE_U32] = {};
> -	u32 status = ras_cap.uncor_status & ~ras_cap.uncor_mask;
> +	u32 status = ras_cap->uncor_status & ~ras_cap->uncor_mask;
>  	u32 fe;
>  
>  	if (hweight32(status) > 1)
>  		fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
> -				   ras_cap.cap_control));
> -	else
> -		fe = status;
> -
> -	memcpy(hl, ras_cap.header_log, CXL_HEADERLOG_SIZE);
> -	trace_cxl_port_aer_uncorrectable_error(&pdev->dev, status, fe, hl);
> -}
> -
> -static void cxl_cper_trace_corr_prot_err(struct cxl_memdev *cxlmd,
> -					 struct cxl_ras_capability_regs ras_cap)
> -{
> -	u32 status = ras_cap.cor_status & ~ras_cap.cor_mask;
> -
> -	trace_cxl_aer_correctable_error(cxlmd, status);
> -}
> -
> -static void
> -cxl_cper_trace_uncorr_prot_err(struct cxl_memdev *cxlmd,
> -			       struct cxl_ras_capability_regs ras_cap)
> -{
> -	u32 hl[CXL_HEADERLOG_TRACE_SIZE_U32] = {};
> -	u32 status = ras_cap.uncor_status & ~ras_cap.uncor_mask;
> -	u32 fe;
> -
> -	if (hweight32(status) > 1)
> -		fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
> -				   ras_cap.cap_control));
> +				   ras_cap->cap_control));
>  	else
>  		fe = status;
>  
>  	/*
> -	 * ras_cap.header_log[] holds CXL_HEADERLOG_SIZE_U32 (16) hardware
> +	 * ras_cap->header_log[] holds CXL_HEADERLOG_SIZE_U32 (16) hardware
>  	 * dwords.  Copy them into the front of a zero-filled
>  	 * CXL_HEADERLOG_TRACE_SIZE_U32 (128) u32 staging buffer so the trace
>  	 * event memcpy sees a full 512-byte source and the userspace ABI
>  	 * (rasdaemon) is preserved.
>  	 */
> -	memcpy(hl, ras_cap.header_log, CXL_HEADERLOG_SIZE);
> -	trace_cxl_aer_uncorrectable_error(cxlmd, status, fe, hl);
> +	memcpy(hl, ras_cap->header_log, CXL_HEADERLOG_SIZE);
> +	trace_cxl_aer_uncorrectable_error(port, dport, status, fe,
> +					  hl, serial);
>  }
>  
> -static int match_memdev_by_parent(struct device *dev, const void *uport)
> +static void cxl_cper_trace_corr_prot_err(struct cxl_port *port, struct cxl_dport *dport,
> +					 u64 serial, struct cxl_ras_capability_regs *ras_cap)
>  {
> -	if (is_cxl_memdev(dev) && dev->parent == uport)
> -		return 1;
> -	return 0;
> +	u32 status = ras_cap->cor_status & ~ras_cap->cor_mask;
> +
> +	trace_cxl_aer_correctable_error(port, dport, status, serial);
>  }
>  
>  
> @@ -109,47 +77,34 @@ static struct cxl_port *find_cxl_port_by_dev(struct device *dev, struct cxl_dpor
>  
>  void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
>  {
> +	struct cxl_dport *dport;
>  	unsigned int devfn = PCI_DEVFN(data->prot_err.agent_addr.device,
>  				       data->prot_err.agent_addr.function);
> -	struct pci_dev *pdev __free(pci_dev_put) =
> -		pci_get_domain_bus_and_slot(data->prot_err.agent_addr.segment,
> -					    data->prot_err.agent_addr.bus,
> -					    devfn);
> -	struct cxl_memdev *cxlmd;
> -	int port_type;
> -
> -	if (!pdev)
> -		return;
> -
> -	port_type = pci_pcie_type(pdev);
> -	if (port_type == PCI_EXP_TYPE_ROOT_PORT ||
> -	    port_type == PCI_EXP_TYPE_DOWNSTREAM ||
> -	    port_type == PCI_EXP_TYPE_UPSTREAM) {
> -		if (data->severity == AER_CORRECTABLE)
> -			cxl_cper_trace_corr_port_prot_err(pdev, data->ras_cap);
> -		else
> -			cxl_cper_trace_uncorr_port_prot_err(pdev, data->ras_cap);
> -
> +	struct pci_dev *pdev __free(pci_dev_put) = pci_get_domain_bus_and_slot(
> +		data->prot_err.agent_addr.segment, data->prot_err.agent_addr.bus, devfn);
> +	if (!pdev) {
> +		pr_err_ratelimited("Failed to find CPER device in CXL topology\n");
>  		return;
>  	}
>  
> -	guard(device)(&pdev->dev);
> -	if (!pdev->dev.driver) {
> -		dev_warn_ratelimited(&pdev->dev,
> -				     "Device is unbound, abort CPER error handling\n");
> +	struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_dev(&pdev->dev, NULL);
> +	if (!port) {
> +		dev_err_ratelimited(&pdev->dev,
> +				    "Failed to find parent port device in CXL topology\n");
>  		return;
>  	}
>  
> -	struct device *mem_dev __free(put_device) = bus_find_device(
> -		&cxl_bus_type, NULL, pdev, match_memdev_by_parent);
> -	if (!mem_dev)
> -		return;
> +	guard(device)(&port->dev);
> +
> +	/* dport is NULL for Endpoint and Upstream Port devices */
> +	dport = cxl_find_dport_by_dev(port, &pdev->dev);
>  
> -	cxlmd = to_cxl_memdev(mem_dev);
>  	if (data->severity == AER_CORRECTABLE)
> -		cxl_cper_trace_corr_prot_err(cxlmd, data->ras_cap);
> +		cxl_cper_trace_corr_prot_err(port, dport, pci_get_dsn(pdev),
> +					     &data->ras_cap);
>  	else
> -		cxl_cper_trace_uncorr_prot_err(cxlmd, data->ras_cap);
> +		cxl_cper_trace_uncorr_prot_err(port, dport, pci_get_dsn(pdev),
> +					       &data->ras_cap);
>  }
>  EXPORT_SYMBOL_GPL(cxl_cper_handle_prot_err);
>  
> @@ -240,14 +195,14 @@ void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port, struct cxl_dpo
>  		return;
>  	}
>  
> -	if (cxl_handle_ras(port, dport, ras_base))
> +	if (cxl_handle_ras(port, dport, ras_base, pci_get_dsn(pdev)))
>  		panic("CXL cachemem error");
>  
>  	dev_dbg(&pdev->dev,
>  		"CXL UCE signaled but no CXL RAS status bits set\n");
>  }
>  
> -void cxl_handle_cor_ras(struct cxl_port *port, struct cxl_dport *dport, void __iomem *ras_base)
> +void cxl_handle_cor_ras(struct cxl_port *port, struct cxl_dport *dport, void __iomem *ras_base, u64 serial)
>  {
>  	u32 status;
>  	void __iomem *addr;
> @@ -259,12 +214,7 @@ void cxl_handle_cor_ras(struct cxl_port *port, struct cxl_dport *dport, void __i
>  	status = readl(addr);
>  	if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
>  		writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> -		if (is_cxl_endpoint(port))
> -			trace_cxl_aer_correctable_error(to_cxl_memdev(port->uport_dev), status);
> -		else if (dport)
> -			trace_cxl_port_aer_correctable_error(dport->dport_dev, status);
> -		else
> -			trace_cxl_port_aer_correctable_error(port->uport_dev, status);
> +		trace_cxl_aer_correctable_error(port, dport, status, serial);
>  	}
>  }
>  
> @@ -289,7 +239,8 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
>   * Log the state of the RAS status registers and prepare them to log the
>   * next error status. Return 1 if reset needed.
>   */
> -bool cxl_handle_ras(struct cxl_port *port, struct cxl_dport *dport, void __iomem *ras_base)
> +bool cxl_handle_ras(struct cxl_port *port, struct cxl_dport *dport,
> +		    void __iomem *ras_base, u64 serial)
>  {
>  	u32 hl[CXL_HEADERLOG_TRACE_SIZE_U32] = {};
>  	void __iomem *addr;
> @@ -316,12 +267,7 @@ bool cxl_handle_ras(struct cxl_port *port, struct cxl_dport *dport, void __iomem
>  	}
>  
>  	header_log_copy(ras_base, hl);
> -	if (is_cxl_endpoint(port))
> -		trace_cxl_aer_uncorrectable_error(to_cxl_memdev(port->uport_dev), status, fe, hl);
> -	else if (dport)
> -		trace_cxl_port_aer_uncorrectable_error(dport->dport_dev, status, fe, hl);
> -	else
> -		trace_cxl_port_aer_uncorrectable_error(port->uport_dev, status, fe, hl);
> +	trace_cxl_aer_uncorrectable_error(port, dport, status, fe, hl, serial);
>  
>  	writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
>  
> @@ -360,7 +306,8 @@ pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
>  		 * cases below handle AER recovery for devices without active
>  		 * CXL.mem traffic.
>  		 */
> -		ue = cxl_handle_ras(port, NULL, to_ras_base(port, NULL));
> +		ue = cxl_handle_ras(port, NULL, to_ras_base(port, NULL),
> +				    pci_get_dsn(pdev));
>  	}
>  
>  	/*
> @@ -392,7 +339,7 @@ static void cxl_handle_proto_error(struct pci_dev *pdev, struct cxl_port *port,
>  				   struct cxl_dport *dport, int severity)
>  {
>  	if (severity == AER_CORRECTABLE)
> -		cxl_handle_cor_ras(port, dport, to_ras_base(port, dport));
> +		cxl_handle_cor_ras(port, dport, to_ras_base(port, dport), pci_get_dsn(pdev));
>  	else
>  		cxl_do_recovery(pdev, port, dport);
>  }
> diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c
> index f4b98f2c11a1c..0385d2f4a2f66 100644
> --- a/drivers/cxl/core/ras_rch.c
> +++ b/drivers/cxl/core/ras_rch.c
> @@ -118,7 +118,8 @@ void cxl_handle_rdport_errors(struct pci_dev *pdev)
>  
>  	pci_print_aer(pdev, severity, &aer_regs);
>  	if (severity == AER_CORRECTABLE)
> -		cxl_handle_cor_ras(dport->port, dport, to_ras_base(port, dport));
> +		cxl_handle_cor_ras(dport->port, dport, to_ras_base(port, dport),
> +				   pci_get_dsn(pdev));
>  	else
>  		cxl_do_recovery(pdev, dport->port, dport);
>  }
> diff --git a/drivers/cxl/core/trace.c b/drivers/cxl/core/trace.c
> index 7f2a9dd0d0e3f..df42d119c53dd 100644
> --- a/drivers/cxl/core/trace.c
> +++ b/drivers/cxl/core/trace.c
> @@ -2,7 +2,42 @@
>  /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
>  
>  #include <cxl.h>
> +#include <cxlmem.h>
>  #include "core.h"
>  
> +const char *cxl_trace_memdev_name(struct cxl_port *port)
> +{
> +	if (is_cxl_endpoint(port)) {
> +		struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
> +
> +		return dev_name(&cxlmd->dev);
> +	}
> +
> +	return "";
> +}
> +
> +const char *cxl_trace_host_name(struct cxl_port *port)
> +{
> +	if (is_cxl_endpoint(port)) {
> +		struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
> +
> +		return dev_name(cxlmd->dev.parent);
> +	}
> +
> +	return dev_name(port->uport_dev);
> +}
> +
> +const char *cxl_trace_port_name(struct cxl_port *port)
> +{
> +	return dev_name(&port->dev);
> +}
> +
> +const char *cxl_trace_dport_name(struct cxl_dport *dport)
> +{
> +	if (dport)
> +		return dev_name(dport->dport_dev);
> +	return "";
> +}
> +
>  #define CREATE_TRACE_POINTS
>  #include "trace.h"
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index d37876096dd7c..910aceb2ca3ab 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -48,44 +48,15 @@
>  	{ CXL_RAS_UC_IDE_RX_ERR, "IDE Rx Error" }			  \
>  )
>  
> -TRACE_EVENT(cxl_port_aer_uncorrectable_error,
> -	TP_PROTO(struct device *dev, u32 status, u32 fe, u32 *hl),
> -	TP_ARGS(dev, status, fe, hl),
> -	TP_STRUCT__entry(
> -		__string(device, dev_name(dev))
> -		__string(host, dev_name(dev->parent))
> -		__field(u32, status)
> -		__field(u32, first_error)
> -		__array(u32, header_log, CXL_HEADERLOG_TRACE_SIZE_U32)
> -	),
> -	TP_fast_assign(
> -		__assign_str(device);
> -		__assign_str(host);
> -		__entry->status = status;
> -		__entry->first_error = fe;
> -		/*
> -		 * Embed headerlog data for user app retrieval and parsing,
> -		 * but no need to print in the trace buffer. Only
> -		 * CXL_HEADERLOG_SIZE_U32 (16) dwords are hardware data;
> -		 * the remaining entries preserve the 512-byte ABI layout
> -		 * rasdaemon depends on and are zero-filled by the caller.
> -		 */
> -		memcpy(__entry->header_log, hl,
> -			CXL_HEADERLOG_TRACE_SIZE_U32 * sizeof(u32));
> -	),
> -	TP_printk("device=%s host=%s status: '%s' first_error: '%s'",
> -		  __get_str(device), __get_str(host),
> -		  show_uc_errs(__entry->status),
> -		  show_uc_errs(__entry->first_error)
> -	)
> -);
> -
>  TRACE_EVENT(cxl_aer_uncorrectable_error,
> -	TP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32 *hl),
> -	TP_ARGS(cxlmd, status, fe, hl),
> +	TP_PROTO(struct cxl_port *port, struct cxl_dport *dport,
> +		 u32 status, u32 fe, u32 *hl, u64 serial),
> +	TP_ARGS(port, dport, status, fe, hl, serial),
>  	TP_STRUCT__entry(
> -		__string(memdev, dev_name(&cxlmd->dev))
> -		__string(host, dev_name(cxlmd->dev.parent))
> +		__string(memdev, cxl_trace_memdev_name(port))
> +		__string(port, cxl_trace_port_name(port))
> +		__string(dport, cxl_trace_dport_name(dport))
> +		__string(host, cxl_trace_host_name(port))
>  		__field(u64, serial)
>  		__field(u32, status)
>  		__field(u32, first_error)
> @@ -93,8 +64,10 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
>  	),
>  	TP_fast_assign(
>  		__assign_str(memdev);
> +		__assign_str(port);
> +		__assign_str(dport);
>  		__assign_str(host);
> -		__entry->serial = cxlmd->cxlds->serial;
> +		__entry->serial = serial;
>  		__entry->status = status;
>  		__entry->first_error = fe;
>  		/*
> @@ -107,8 +80,9 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
>  		memcpy(__entry->header_log, hl,
>  			CXL_HEADERLOG_TRACE_SIZE_U32 * sizeof(u32));
>  	),
> -	TP_printk("memdev=%s host=%s serial=%lld: status: '%s' first_error: '%s'",
> -		  __get_str(memdev), __get_str(host), __entry->serial,
> +	TP_printk("memdev=%s port=%s dport=%s host=%s serial=%lld: status: '%s' first_error: '%s'",
> +		  __get_str(memdev), __get_str(port), __get_str(dport),
> +		  __get_str(host), __entry->serial,
>  		  show_uc_errs(__entry->status),
>  		  show_uc_errs(__entry->first_error)
>  	)
> @@ -132,42 +106,29 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
>  	{ CXL_RAS_CE_PHYS_LAYER_ERR, "Received Error From Physical Layer" }	\
>  )
>  
> -TRACE_EVENT(cxl_port_aer_correctable_error,
> -	TP_PROTO(struct device *dev, u32 status),
> -	TP_ARGS(dev, status),
> -	TP_STRUCT__entry(
> -		__string(device, dev_name(dev))
> -		__string(host, dev_name(dev->parent))
> -		__field(u32, status)
> -	),
> -	TP_fast_assign(
> -		__assign_str(device);
> -		__assign_str(host);
> -		__entry->status = status;
> -	),
> -	TP_printk("device=%s host=%s status='%s'",
> -		  __get_str(device), __get_str(host),
> -		  show_ce_errs(__entry->status)
> -	)
> -);
> -
>  TRACE_EVENT(cxl_aer_correctable_error,
> -	TP_PROTO(const struct cxl_memdev *cxlmd, u32 status),
> -	TP_ARGS(cxlmd, status),
> +	TP_PROTO(struct cxl_port *port, struct cxl_dport *dport,
> +		 u32 status, u64 serial),
> +	TP_ARGS(port, dport, status, serial),
>  	TP_STRUCT__entry(
> -		__string(memdev, dev_name(&cxlmd->dev))
> -		__string(host, dev_name(cxlmd->dev.parent))
> +		__string(memdev, cxl_trace_memdev_name(port))
> +		__string(port, cxl_trace_port_name(port))
> +		__string(dport, cxl_trace_dport_name(dport))
> +		__string(host, cxl_trace_host_name(port))
>  		__field(u64, serial)
>  		__field(u32, status)
>  	),
>  	TP_fast_assign(
>  		__assign_str(memdev);
> +		__assign_str(port);
> +		__assign_str(dport);
>  		__assign_str(host);
> -		__entry->serial = cxlmd->cxlds->serial;
> +		__entry->serial = serial;
>  		__entry->status = status;
>  	),
> -	TP_printk("memdev=%s host=%s serial=%lld: status: '%s'",
> -		  __get_str(memdev), __get_str(host), __entry->serial,
> +	TP_printk("memdev=%s port=%s dport=%s host=%s serial=%lld: status: '%s'",
> +		  __get_str(memdev), __get_str(port), __get_str(dport),
> +		  __get_str(host), __entry->serial,
>  		  show_ce_errs(__entry->status)
>  	)
>  );
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index ed419d0c59f2f..f1ef8b78db18a 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -125,6 +125,13 @@ static inline int cxl_memdev_attach_region(struct cxl_memdev *cxlmd)
>  #endif
>  
>  struct cxl_memdev *devm_cxl_add_classdev(struct cxl_dev_state *cxlds);
> +
> +/* trace-event helpers */
> +const char *cxl_trace_memdev_name(struct cxl_port *port);
> +const char *cxl_trace_host_name(struct cxl_port *port);
> +const char *cxl_trace_port_name(struct cxl_port *port);
> +const char *cxl_trace_dport_name(struct cxl_dport *dport);
> +
>  struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
>  					 const struct cxl_memdev_attach *attach);
>  int devm_cxl_sanitize_setup_notifier(struct device *host,



* Re: [PATCH v7 03/12] PCI: liveupdate: Track incoming preserved PCI devices
From: Pasha Tatashin @ 2026-07-20 22:44 UTC (permalink / raw)
  To: David Matlack
  Cc: Pasha Tatashin, kexec, linux-doc, linux-kernel, linux-mm,
	linux-pci, Adithya Jayachandran, Alexander Graf, Alex Williamson,
	Bjorn Helgaas, Chris Li, David Rientjes, Jacob Pan,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Leon Romanovsky,
	Lukas Wunner, Mike Rapoport, Parav Pandit, Pranjal Shrivastava,
	Pratyush Yadav, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, William Tu, Yi Liu
In-Reply-To: <CALzav=e1Gcu-9TD7F--41cYqKBrStKOE4VqQ4K3PKdNuYnNmcQ@mail.gmail.com>

On 2026-07-20 14:54:51-07:00, David Matlack wrote:
> On Fri, Jul 17, 2026 at 2:47 PM Pasha Tatashin
> <pasha.tatashin@soleen.com> wrote:
> 
> > On Fri, 10 Jul 2026 21:26:06 +0000, David Matlack <dmatlack@google.com> wrote:
> >
> > I am confused by this. Each preserved PCI device must have an associated
> > FD preserved with it via LUO. I.e., vfiofd would need to be preserved. If
> > the vfiofd was not reclaimed, and finish is not possible, that vfiofd
> > would still be owned by LUO, and therefore PCI FLB refcount would stay
> > positive.
> >
> > However, if finish is possible, and this is the last vfiofd that is
> > finished, FLB will be freed as soon as the reference count reaches zero,
> > which I would think is the expected behavior.
> >
> > What is the point of holding a reference here, instead of only for the
> > duration of FLB access, i.e. to make sure we are accessing a valid data?
> 
> The duration of the access is from here until pci_liveupdate_finish()
> because that is when the pointer (dev->liveupdate.incoming) is
> cleared. So that is why the PCI core holds the reference from here
> until pci_liveupdate_finish().
> 
> We could avoid this by deleting dev->liveupdate.incoming and,
> instead, fetching the incoming FLB and doing the xarray lookup every
> time the PCI core needs to access the device's incoming ser struct,
> but that would be inefficient.

Thanks for the explanation. As I understand, the most straightforward 
way to avoid holding the permanent reference is indeed to delete 
dev->liveupdate.incoming entirely and perform an xarray lookup on every 
access, like this:

bool pci_liveupdate_is_incoming(struct pci_dev *dev)
{
	...
	incoming = pci_liveupdate_flb_get_incoming();
	...
	dev_ser = xa_load(&incoming->xa, key);
	...
	pci_liveupdate_flb_put_incoming();
	return dev_ser && dev_ser->refcount > 0;
}

However, as you note, this is inefficient because it affects every 
single device and adds lookup overhead to every access (not sure about 
the actual cost though, xarray access is pretty fast!).

We can, however, still avoid tinkering with the lifecycle of the FLB, 
and instead treat dev->liveupdate.incoming as a "hint" that we validate 
on access with a fast liveness check:

1. At Setup: In pci_liveupdate_setup_device(), we do the xarray lookup 
once, cache the pointer in dev->liveupdate.incoming, and immediately 
call pci_liveupdate_flb_put_incoming(). We do not hold a permanent 
reference.

2. On Access: When an accessor runs, instead of doing a full xarray 
lookup, it just validates the cached pointer's liveness by temporarily 
securing the FLB:

static struct pci_flb_incoming *pci_liveupdate_get_incoming(struct pci_dev *dev)
{
	struct pci_flb_incoming *incoming;

	incoming = pci_liveupdate_flb_get_incoming();
	if (!incoming)
		return NULL;

	if (dev->liveupdate.incoming)
		return incoming;

	pci_liveupdate_flb_put_incoming();
	return NULL;
}

* If get_incoming() returns NULL (the FLB has already finished/freed), 
  the hint is invalid and the device is no longer incoming.

* If it returns a valid pointer, the FLB is guaranteed to be alive, so 
  we can immediately use the cached dev->liveupdate.incoming pointer 
  without xarray lookups. The accessor then puts the transient 
  reference when done.

3. In Finish/Cleanup: We can simply clear dev->liveupdate.incoming,
and decrement ser->nr_devices.
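
To make the three steps concrete, here is a minimal single-threaded
userspace sketch of the hint pattern. It is only a model under stated
assumptions: struct flb, struct dev_state, owner_get()/owner_put() and
flb_get()/flb_put() are hypothetical stand-ins for the FLB,
dev->liveupdate and the pci_liveupdate_flb_{get,put}_incoming() helpers,
and locking is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the incoming FLB; it is "freed" at refcount 0. */
struct flb {
	int refcount;
};

/* Hypothetical per-device state; hint models dev->liveupdate.incoming. */
struct dev_state {
	const void *hint;
};

static struct flb the_flb;

/* Long-lived reference, modelling the one LUO holds for a preserved FD. */
static void owner_get(void)
{
	the_flb.refcount++;
}

static void owner_put(void)
{
	assert(the_flb.refcount > 0);
	the_flb.refcount--;
}

/* Transient reference: fails once the FLB has been finished. */
static struct flb *flb_get(void)
{
	if (the_flb.refcount == 0)
		return NULL;
	the_flb.refcount++;
	return &the_flb;
}

static void flb_put(void)
{
	assert(the_flb.refcount > 0);
	the_flb.refcount--;
}

/* Step 1: look up once, cache the hint, drop the reference immediately. */
static bool dev_setup(struct dev_state *dev, const void *ser)
{
	if (!flb_get())
		return false;
	dev->hint = ser;
	flb_put();
	return true;
}

/* Step 2: the hint is trusted only while a transient reference is held. */
static bool dev_is_incoming(const struct dev_state *dev)
{
	bool incoming;

	if (!flb_get())
		return false;
	incoming = dev->hint != NULL;
	flb_put();
	return incoming;
}

/* Step 3: finish just clears the hint. */
static void dev_finish(struct dev_state *dev)
{
	dev->hint = NULL;
}
```

In this model, once the owner reference is dropped the stale hint is
never consulted, because the transient get fails first.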


* Re: [PATCH v18 05/13] PCI/AER: Introduce AER-CXL protocol error kfifo
From: Jonathan Cameron @ 2026-07-20 22:41 UTC (permalink / raw)
  To: Terry Bowman
  Cc: Bjorn Helgaas, Dan Williams, Dave Jiang, Ira Weiny, Len Brown,
	Rafael J . Wysocki, Robert Richter, linux-acpi, linux-cxl,
	linux-doc, linux-kernel, linux-pci, linuxppc-dev,
	Alejandro Lucero, Alison Schofield, Ankit Agrawal, Ard Biesheuvel,
	Ben Cheatham, Borislav Petkov, Breno Leitao, Davidlohr Bueso,
	Fabio M . De Francesco, Gregory Price, Hanjun Guo,
	Jonathan Corbet, Kees Cook, Kuppuswamy Sathyanarayanan, Li Ming,
	Mahesh J Salgaonkar, Mauro Carvalho Chehab, Oliver O'Halloran,
	Shiju Jose, Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck,
	Vishal Verma, Ashok Raj
In-Reply-To: <20260717222706.3540281-6-terry.bowman@amd.com>

On Fri, 17 Jul 2026 17:26:58 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

Hi Terry.

This is a rather lengthy patch description.  Might be worth
a pass to see if key details can be covered in something that folk
are more likely to read!

> CXL VH RAS handling requires a path for the AER driver to hand off
> CXL protocol errors to cxl_core for logging and recovery before PCIe
> AER recovery tears down the device. Add
> drivers/pci/pcie/aer_cxl_vh.c to implement this handoff via a kfifo-backed
> work item.
> 
> Introduce is_aer_internal_error() to identify CXL protocol errors
> from AER internal error status bits across both correctable and
> uncorrectable severities.

Already existed. Just in a different location. 

> 
> Introduce is_cxl_error() to gate the VH kfifo path.

Feels like too much detail to me, given there's not really anything to say
about it other than the obvious.

> 
> Introduce struct cxl_proto_err_work_data to carry the error source
> PCI device and severity through the kfifo. Encapsulate the kfifo,
> per-producer spinlock, registration rwsem, and work pointer in struct
> cxl_proto_err_kfifo. Initialize the embedded kfifo via INIT_KFIFO()

Can we cut this down a little where we don't lose useful info? e.g.
"Initialize the kfifo from a subsys_initcall() so it is ready before
any producer or consumer runs."

> from a subsys_initcall so its metadata is populated before any
> producer or consumer runs.
> 

> Introduce cxl_forward_error() to enqueue a CXL protocol error. A
> reference is taken on the PCI device; the consumer releases it via
> for_each_cxl_proto_err(). On enqueue failure the reference is
> released immediately, the error is dropped, and the consumer is
> scheduled to drain existing entries.

We don't need to cover error paths in the patch description unless
they are really complex and need explanation. Even then probably belongs
more in comments.

Anyhow you get the idea..

> A subsequent patch wires
> cxl_forward_error() into handle_error_source() where correctable and
> uncorrectable status clearing is left to pci_aer_handle_error().
> 
> Introduce cxl_proto_err_flush() to synchronously wait for the
> consumer worker to drain the kfifo. A subsequent patch wires this
> into handle_error_source() for UCE events so the CXL plane completes
> error handling and panic policy before pci_aer_handle_error() drives
> PCIe recovery.
> 
> Introduce cxl_register_proto_err_work() and
> cxl_unregister_proto_err_work() for cxl_core to register and
> deregister its work handler. On unregistration, pending kfifo entries
> are drained and their pdev references released before
> cancel_work_sync() runs. Export these and for_each_cxl_proto_err()
> via EXPORT_SYMBOL_FOR_MODULES restricted to cxl_core.
> 
> Protect the work pointer with a rwsem to correctly serialize
> registration, deregistration, enqueue, and dequeue against concurrent
> AER IRQ threads. Serialize concurrent kfifo writers with a spinlock.
> 
> Add MAINTAINERS entries for aer_cxl_vh.c and aer_cxl_rch.c under
> the CXL entry so CXL maintainers are CC'd on changes to the AER-CXL
> bridging code.

Various things inline.  The one thing I'd like to highlight is the
assumption that a panic isn't appropriate when we lose track of an
error.  My paranoid hat says always go the other way: if we can't be
sure we didn't lose an uncorrectable error, panic.  Maybe it's worth
us considering when this might happen in a real system?

> 
> Co-developed-by: Dan Williams <djbw@kernel.org>
> Signed-off-by: Dan Williams <djbw@kernel.org>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> 
> ---
> 
> Changes in v17->v18:
> - Remove correctable status clear from cxl_forward_error(); the AER core
>   clears all status bits via pci_aer_handle_error() info->status writeback
> - Schedule consumer on kfifo overflow so existing entries can be drained
> 
> Changes in v16->v17:
> - Reword "kfifo semaphore" to "kfifo spinlock" to match fifo_lock.
> - Defer the handle_error_source() is_cxl_error() switch to the patch that
>   registers the kfifo consumer to keep each commit bisect-safe.
> - Rename rwsema to rwsem
> - Change CPER exports to use EXPORT_SYMBOL_FOR_MODULES.
> - Add work cancel function.
> - Replace kfifo_put() with kfifo_in_spinlocked() for multiple producers
> - Add fifo_lock spinlock for concurrent producer serialisation
> - Initialize the embedded kfifo with INIT_KFIFO() in a subsys_initcall so
>   kfifo->mask, ->esize and ->data are set before first use.
> - Clear PCI_ERR_COR_STATUS in cxl_forward_error() after enqueue so the
>   device is acked for correctable events even when the consumer drops the
>   event. Uncorrectable status is left for cxl_do_recovery() to clear after
>   recovery completes, mirroring the AER core convention.
> - WARN on double-registration in cxl_register_proto_err_work() to make an
>   unintended second consumer visible at runtime.
> - Add direct rwsem.h, cleanup.h and workqueue.h includes for symbols used
>   in aer_cxl_vh.c
> - Add MAINTAINERS entries for drivers/pci/pcie/aer_cxl_*.c
> - Update message
> ---
>  MAINTAINERS                   |   2 +
>  drivers/pci/pcie/Makefile     |   1 +
>  drivers/pci/pcie/aer.c        |  10 --
>  drivers/pci/pcie/aer_cxl_vh.c | 221 ++++++++++++++++++++++++++++++++++
>  drivers/pci/pcie/portdrv.h    |   6 +
>  include/linux/aer.h           |  24 ++++
>  6 files changed, 254 insertions(+), 10 deletions(-)
>  create mode 100644 drivers/pci/pcie/aer_cxl_vh.c
> 


> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
> new file mode 100644
> index 0000000000000..93bed07936100
> --- /dev/null
> +++ b/drivers/pci/pcie/aer_cxl_vh.c
> @@ -0,0 +1,221 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2026 AMD Corporation. All rights reserved. */
> +
> +#include <linux/aer.h>
> +#include <linux/atomic.h>
> +#include <linux/cleanup.h>
> +#include <linux/init.h>
> +#include <linux/kfifo.h>
> +#include <linux/rwsem.h>

Check these.  I'd expect at least spinlock.h, to meet the rough
include-what-you-use aim.

> +#include <linux/wait_bit.h>
> +#include <linux/workqueue.h>
> +#include "../pci.h"
> +#include "portdrv.h"
> +
> +#define CXL_ERROR_SOURCES_MAX          128
> +
> +struct cxl_proto_err_kfifo {
> +	struct work_struct *work;
> +	void (*flush)(void);
> +	struct rw_semaphore rwsem;
> +	spinlock_t fifo_lock;

Ideally add a quick comment to every lock to say what data it covers.
Kind of obvious for this one but in general it is good practice and
reduces chance of later scope confusion.

> +	atomic_t flush_inflight;
> +	DECLARE_KFIFO(fifo, struct cxl_proto_err_work_data,
> +		      CXL_ERROR_SOURCES_MAX);
> +};

> +/**
> + * cxl_forward_error - Forward a CXL protocol error to the CXL subsystem via kfifo
> + * @pdev: PCI device that reported the AER error
> + * @info: AER error info containing severity and status
> + *
> + * Producer side of the AER-CXL kfifo. Enqueues a CXL protocol error work
> + * item and schedules the consumer workqueue. Takes a reference on @pdev
> + * that the consumer releases after handling.
> + *
> + * Return: true if the caller must flush the kfifo before AER recovery,
> + * false if no CXL error handling was initiated due to early return on
> + * error.
> + */
> +bool cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info)
> +{
> +	struct cxl_proto_err_work_data wd = {
> +		.severity = info->severity,
> +		.pdev = pdev,
> +	};
> +
> +	guard(rwsem_read)(&cxl_proto_err_kfifo.rwsem);
> +
> +	if (!cxl_proto_err_kfifo.work) {
> +		dev_err_ratelimited(&pdev->dev, "AER-CXL kfifo reader not registered\n");
> +		return false;
> +	}
> +
> +	/*
> +	 * Reference discipline: the AER caller (handle_error_source())
> +	 * holds a ref on @pdev for the duration of this call and releases
> +	 * it on return. Take a fresh ref here so the pdev stays live while
> +	 * queued in the kfifo; the consumer (for_each_cxl_proto_err())
> +	 * drops that ref after handling. On enqueue failure below, drop
> +	 * the ref we just took to avoid a leak.

Most of this is useful. The description of what we do on the error path
(given it is so local) probably is not.

> +	 */
> +	pci_dev_get(pdev);
> +
> +	/* Serialize concurrent kfifo writers: multiple AER threaded IRQs */
> +	if (!kfifo_in_spinlocked(&cxl_proto_err_kfifo.fifo, &wd, 1,
> +				 &cxl_proto_err_kfifo.fifo_lock)) {
> +		/* Dropped; no panic - UCE unconfirmed without RAS read */

Hmm. That's interesting.  To me it fails the normal RAS rule of assuming
the worst if we lost track. I'd panic. But I'm open to other views on this!

> +		dev_err_ratelimited(&pdev->dev, "AER-CXL kfifo add failed\n");
> +		pci_dev_put(pdev);
> +		schedule_work(cxl_proto_err_kfifo.work);

Shared with below. Maybe just drop out of the if unless this gets more
complex in later patches.
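
Roughly what I have in mind, as a userspace sketch of the control flow
only. Everything here is a hypothetical model rather than the patch's
code: struct item stands in for the pdev and its refcount, fifo_put()
for kfifo_in_spinlocked(), and schedule_calls for schedule_work().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_SLOTS 2	/* hypothetical, tiny so overflow is easy to hit */

struct item {
	int refs;
};

static struct item *fifo[FIFO_SLOTS];
static size_t fifo_len;
static bool consumer_registered;
static int schedule_calls;

static bool fifo_put(struct item *it)
{
	if (fifo_len == FIFO_SLOTS)
		return false;
	fifo[fifo_len++] = it;
	return true;
}

static bool forward_error(struct item *it)
{
	assert(it != NULL);

	if (!consumer_registered)
		return false;

	it->refs++;		/* reference held while queued */
	if (!fifo_put(it))
		it->refs--;	/* entry dropped: undo the reference */

	schedule_calls++;	/* single schedule point for both outcomes */
	return true;
}
```

The failure branch then only carries what is unique to it (the message
and the reference drop), and the schedule plus return are written once.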

> +		return true;
> +	}
> +
> +	schedule_work(cxl_proto_err_kfifo.work);
> +	return true;
> +}


> +/**
> + * for_each_cxl_proto_err - Call a function for each kfifo work item
> + *
> + * Single-consumer invariant: this function is only called from
> + * cxl_proto_err_work_fn() via a single DECLARE_WORK.
> + *
> + * Holds rwsem_read internally; fn() must not call cxl_register_proto_err_work()
> + * or cxl_unregister_proto_err_work().
> + */
> +void for_each_cxl_proto_err(struct cxl_proto_err_work_data *wd,
> +			    cxl_proto_err_fn_t fn)
> +{
> +	guard(rwsem_read)(&cxl_proto_err_kfifo.rwsem);
> +	while (kfifo_get(&cxl_proto_err_kfifo.fifo, wd)) {
> +		fn(wd);

Like the earlier case I raised, I'd like a comment here to say where
the reference we are releasing was taken.

> +		pci_dev_put(wd->pdev);
> +	}
> +}
> +EXPORT_SYMBOL_FOR_MODULES(for_each_cxl_proto_err, "cxl_core");
> +
> +/**
> + * cxl_proto_err_flush - drain pending AER-CXL kfifo work synchronously
> + *
> + * Wait for the consumer worker to finish processing all entries
> + * currently in the kfifo. Used by handle_error_source() for UCE so
> + * the CXL plane can read CXL RAS, apply panic policy, and clear CXL
> + * state before pci_aer_handle_error() drives PCIe recovery.
> + *
> + * Snapshots the flush callback under rwsem_read and releases the rwsem
> + * before calling it.  This avoids holding rwsem_read across flush_work(),
> + * which would deadlock via the rwsem HANDOFF mechanism when a concurrent
> + * rwsem_write waiter (cxl_unregister_proto_err_work) blocks new readers
> + * including the worker's for_each_cxl_proto_err() rwsem_read acquisition.
> + *
> + * The flush_inflight counter prevents cxl_core module unload while a
> + * flush is in progress outside the rwsem. The counter is incremented
> + * under rwsem_read (mutually exclusive with the rwsem_write in
> + * cancel_cxl_proto_err() that NULLs the flush pointer) and decremented
> + * after the flush completes. cxl_unregister_proto_err_work() waits for
> + * the counter to reach zero before proceeding with cancel_work_sync().

I can't see it from just this patch, so it might be useful to say whether
the counter can ever be anything other than 0 or 1.  If it can't, can we
make that explicit in the code?
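
For reference, a userspace sketch of the counting semantics, using C11
atomics as a stand-in for atomic_t. flush_begin(), flush_end() and
wakeups are hypothetical names. If two callers can overlap (which I
can't tell from this patch alone), the counter would reach 2 and only
the last one out should wake the waiter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int flush_inflight;
static int wakeups;	/* counts modelled wake_up_var() calls */

static void flush_begin(void)
{
	atomic_fetch_add(&flush_inflight, 1);
}

/* Returns true when this caller was the last one out. */
static bool flush_end(void)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	int old = atomic_fetch_sub(&flush_inflight, 1);

	assert(old > 0);
	if (old == 1) {
		wakeups++;
		return true;
	}
	return false;
}
```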

> + *
> + * For correctable events the consumer can run asynchronously; AER
> + * does not need to call this helper for AER_CORRECTABLE.

For me this is both too much and likely to rot over time. Some stuff
feels like it belongs with the flags it is talking about rather than here.

> + */
> +void cxl_proto_err_flush(void)
> +{
> +	void (*flush)(void);
> +
> +	scoped_guard(rwsem_read, &cxl_proto_err_kfifo.rwsem) {
> +		flush = cxl_proto_err_kfifo.flush;
> +		if (flush)
> +			atomic_inc(&cxl_proto_err_kfifo.flush_inflight);
> +	}
> +
> +	if (flush) {
> +		flush();
> +		if (atomic_dec_and_test(&cxl_proto_err_kfifo.flush_inflight))
> +			wake_up_var(&cxl_proto_err_kfifo.flush_inflight);
> +	}
> +}




* [PATCH v6] docs/ja_JP: translate submitting-patches.rst (sign-off)
From: Akiyoshi Kurita @ 2026-07-20 22:34 UTC (permalink / raw)
  To: linux-doc; +Cc: linux-kernel, corbet, akiyks, weibu

Translate the "Include PATCH in the subject" and "Sign your work -
the Developer's Certificate of Origin" sections into Japanese.

Keep the DCO text in English as the original certificate text, and add
a Japanese note that the sign-off refers to the English DCO text.

Use a reStructuredText note directive to make it clear that the note is
specific to the Japanese translation.

Signed-off-by: Akiyoshi Kurita <weibu@redadmin.org>
---
Changes in v6:

Remove the extra blank line at EOF introduced during the rebase.

 .../ja_JP/process/submitting-patches.rst      | 77 +++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/Documentation/translations/ja_JP/process/submitting-patches.rst b/Documentation/translations/ja_JP/process/submitting-patches.rst
index d8ee82ba790b..22d54b663051 100644
--- a/Documentation/translations/ja_JP/process/submitting-patches.rst
+++ b/Documentation/translations/ja_JP/process/submitting-patches.rst
@@ -402,3 +402,80 @@ ping したりする前に、少なくとも 1 週間は待ってください。
 を追加しないでください。
 "RESEND" は、前回の投稿から一切変更のないパッチまたはパッチシリーズの
 再送だけに当てはまります。
+
+
+件名に PATCH を含める
+---------------------
+
+Linus と linux-kernel メーリングリストには大量のメールが届くため、
+件名の先頭に ``[PATCH]`` を付けることが一般的な慣例となっています。
+これにより、Linus や他のカーネル開発者は、パッチとその他の議論を
+容易に区別できます。
+
+``git send-email`` は、この指定を自動的に行います。
+
+
+作業への署名 - Developer's Certificate of Origin
+--------------------------------------------------
+
+誰が何を行ったのかを追跡しやすくするため、特にパッチが複数階層の
+メンテナーを経由して最終的にカーネルへ取り込まれる場合に備えて、
+メールでやり取りされるパッチには sign-off の手続きが導入されています。
+
+sign-off は、パッチの説明の末尾に追加する単純な一行です。これは、
+そのパッチを自分で作成したか、オープンソースのパッチとして提出する
+権利を持っていることを証明します。
+
+.. note:: 【訳註】
+
+   ``Signed-off-by`` によって同意する対象は、翻訳文ではなく、
+   以下に示す英語原文の Developer's Certificate of Origin 1.1 です。
+   DCO は法的な性質を持つ文書であるため、本文は翻訳せず、原文のまま
+   掲載します。内容を確認する場合は、必ず英語原文を参照してください。
+
+規則は単純で、以下を証明できる場合です::
+
+        Developer's Certificate of Origin 1.1
+
+        By making a contribution to this project, I certify that:
+
+        (a) The contribution was created in whole or in part by me and I
+            have the right to submit it under the open source license
+            indicated in the file; or
+
+        (b) The contribution is based upon previous work that, to the best
+            of my knowledge, is covered under an appropriate open source
+            license and I have the right under that license to submit that
+            work with modifications, whether created in whole or in part
+            by me, under the same open source license (unless I am
+            permitted to submit under a different license), as indicated
+            in the file; or
+
+        (c) The contribution was provided directly to me by some other
+            person who certified (a), (b) or (c) and I have not modified
+            it.
+
+        (d) I understand and agree that this project and the contribution
+            are public and that a record of the contribution (including all
+            personal information I submit with it, including my sign-off) is
+            maintained indefinitely and may be redistributed consistent with
+            this project or the open source license(s) involved.
+
+上記を証明できる場合は、次のような行を追加します::
+
+        Signed-off-by: Random J Developer <random@developer.example.org>
+
+既知の身元を使用してください。匿名での貢献は認められません。
+``git commit -s`` を使用すると、この行を自動的に追加できます。
+
+revert にも ``Signed-off-by:`` を含める必要があります。
+``git revert -s`` を使用すると、自動的に追加できます。
+
+末尾に追加のタグを付ける人もいます。現時点では無視されますが、
+社内手続きを示したり、sign-off に関する特記事項を記録したりするために
+使用できます。
+
+作者の SoB に続く追加の SoB（``Signed-off-by:``）は、パッチの開発には
+関与せず、その取り扱いや転送を行った人によるものです。SoB の連鎖は、
+パッチがメンテナーを経て最終的に Linus へ届いた実際の経路を反映する
+必要があります。最初の SoB は、単独の主要作者であることを示します。
-- 
2.52.0



* Re: [RFC] cxl: Device protocol AER injection
From: Bowman, Terry @ 2026-07-20 22:27 UTC (permalink / raw)
  To: Jonathan Cameron, Ashok Raj
  Cc: Bjorn Helgaas, Dan Williams, Dave Jiang, Ira Weiny, Len Brown,
	Rafael J . Wysocki, Robert Richter, linux-acpi, linux-cxl,
	linux-doc, linux-kernel, linux-pci, linuxppc-dev,
	Alejandro Lucero, Alison Schofield, Ankit Agrawal, Ard Biesheuvel,
	Ben Cheatham, Borislav Petkov, Breno Leitao, Davidlohr Bueso,
	Fabio M . De Francesco, Gregory Price, Hanjun Guo,
	Jonathan Corbet, Kees Cook, Kuppuswamy Sathyanarayanan, Li Ming,
	Mahesh J Salgaonkar, Mauro Carvalho Chehab, Oliver O'Halloran,
	Shiju Jose, Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck,
	Vishal Verma, linux-cxl@vger.kernel.org, linux-acpi, linux-doc,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linuxppc-dev
In-Reply-To: <20260720220430.30b7e3ef@jic23-huawei>

On 7/20/2026 4:04 PM, Jonathan Cameron wrote:
> On Fri, 17 Jul 2026 17:57:00 -0500
> Terry Bowman <terry.bowman@amd.com> wrote:
> 
>> This patch is intended to provide a method of testing the recently submitted
>> cxl series "cxl: Enable CXL PCIe Port Protocol Error handling and logging" found
>> here:
>>
>> https://lore.kernel.org/linux-cxl/20260717222706.3540281-1-terry.bowman@amd.com/T/#md90ec1fdd1b374bf1e32e7736e2b3e34b328c701
> 
> Hi Terry,
> 
> https://lore.kernel.org/linux-cxl/20260717222706.3540281-1-terry.bowman@amd.com
> 
> Works fine.  Generally you can crop that end bit off the links.
> 

Good to know. I'll use that going forward.

>>
>> The changes in this patch will allow CXL RAS protocol testing by injecting
>> AER errors using AER EINJ. The RAS register block status is updated
>> using a central function to augment RAS register block returned by
>> to_ras_base(). This supports all CXL devices including Root Ports,
>> Upstream Switch Ports, Downstream Switch Ports, Endpoints, and RCH
>> Downstream Ports.
> 
> Why is this an RFC rather than a final proposal?  There should always
> be something to give the reviewer that info in the patch description.
> I'd actually be tempted to throw a cover letter in to have somewhere
> out of the way to put that information.
> 
> Is it simply because it only makes sense once the other series lands?
> 

The immediate priority was providing a testing procedure for the v18 series.
I wasn't sure how the design would be received. For instance, to_einj_ras_base()
could be moved into cxl-test as a mock implementation of to_ras_base(), or
it could remain in ras.c (or maybe even ras_einj.c) outside of cxl-mock. I'm
looking forward to Alison's review and comments.

My preference is to introduce core/ras_einj.c and add these changes there. This
would isolate all the changes except for cxl_ras_einj_init(), cxl_ras_einj_exit(),
and to_einj_ras_base(). I've started making these changes, knowing the
direction could change once this receives more reviews.

Also worth discussing is that the command line takes multiple parameters in
a single sysfs file, which I know isn't acceptable to everyone. I personally
like the interface as-is because it's simpler to use than multiple
files that must be set individually.
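
To illustrate why a single line stays manageable, here is a userspace
sketch of parsing the usage string with sscanf(). This is not the
parser from the patch: struct einj_cmd and parse_einj_cmd() are
hypothetical names, and the sketch does not reject trailing garbage
after the optional RCH token.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct einj_cmd {
	unsigned int domain, bus, dev, fn;
	bool correctable;
	bool is_rch;
	unsigned int aer_status;
	unsigned int ras_status;
};

/* Returns 0 on success, -1 if the line does not match the usage string. */
static int parse_einj_cmd(const char *line, struct einj_cmd *cmd)
{
	char sev[8] = "";
	char rch[8] = "";
	int n;

	assert(line && cmd);

	/* %x accepts an optional 0x prefix; %7s bounds the token buffers. */
	n = sscanf(line, "%x:%x:%x.%x %7s %x %x %7s",
		   &cmd->domain, &cmd->bus, &cmd->dev, &cmd->fn,
		   sev, &cmd->aer_status, &cmd->ras_status, rch);
	if (n < 7)
		return -1;

	if (!strcmp(sev, "CE"))
		cmd->correctable = true;
	else if (!strcmp(sev, "UCE"))
		cmd->correctable = false;
	else
		return -1;

	/* The trailing RCH token is optional. */
	cmd->is_rch = (n == 8);
	if (cmd->is_rch && strcmp(rch, "RCH"))
		return -1;

	return 0;
}
```

A multi-file interface would need the same validation spread over
several write handlers plus some way to commit the combined state.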

>>
>> Add debugfs-based CXL protocol error injection for testing CXL RAS
>> error handling paths. Injects CXL RAS protocol errors using AER internal
>> error inject interface via /sys/kernel/debug/cxl/aer_einj_inject.
>>
>> RAS CXL status is set using to_ras_base() function override when kernel config
>> CONFIG_CXL_PROTO_AER_EINJ is enabled.
>>
>> Usage:
>>   echo "DDDD:BB:DD.F [UCE|CE] AER_STATUS RAS_STATUS [RCH]" > \
>>       /sys/kernel/debug/cxl/aer_einj_inject
>>
>> Move struct aer_error_inj and aer_inject() to linux/aer.h so CXL
>> can invoke AER injection directly. Export aer_inject() with
>> EXPORT_SYMBOL_GPL.
>>
>> Make cxl_debugfs non-static in port.c and declare it extern in
>> core.h so the debugfs file can be created under the existing CXL
>> debugfs root.
>>
>> Co-developed-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> +CC Ashok,
> 
> Various things inline.  Mostly this review is a bit superficial as I'd like
> ideally to see a cleaner separation of this at level of files etc.
> 
> Thanks,
> 
> Jonathan
>> ---
>>  drivers/cxl/Kconfig           |  13 +++
>>  drivers/cxl/core/core.h       |  21 ++++
>>  drivers/cxl/core/port.c       |   2 +-
>>  drivers/cxl/core/ras.c        | 208 ++++++++++++++++++++++++++++++++++
>>  drivers/cxl/core/ras_rch.c    |  12 ++
>>  drivers/pci/pcie/aer_inject.c |  29 ++---
>>  include/linux/aer.h           |  15 +++
>>  7 files changed, 281 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
>> index 80aeb0d556bd7..ef449228b2549 100644
>> --- a/drivers/cxl/Kconfig
>> +++ b/drivers/cxl/Kconfig
>> @@ -238,6 +238,19 @@ config CXL_RAS
>>  	def_bool y
>>  	depends on ACPI_APEI_GHES && PCIEAER && CXL_BUS
>>  
>> +config CXL_PROTO_AER_EINJ
>> +	bool "CXL: RAS Protocol Error Injection using AER EINJ"
>> +	depends on CXL_RAS
>> +	depends on PCIEAER_INJECT
> 
> Do we think anyone who has CXL and PCIEAER_INJECT support will want
> to carefully not build this?  I'm just wondering if we can avoid asking
> the question and base the built or not on the combination of those.
> 

Good question: How do we incorporate this with the existing CXL EINJ
functionality, making them complementary and consistent? The existing
EINJ is true ACPI injection but only supports RPs. The ACPI EINJ callouts
are currently in core/port.c. We should consider moving them into a common file
such as core/ras_einj.c. That move would help force us to merge the interfaces
where/if possible and at least give a central location for RAS error injection.
I think folding the AER injection into the PCIEAER_INJECT kernel config is
reasonable, but I'm looking for others' feedback.

>> +	help
>> +	  Enable debugfs-based CXL protocol error injection. Writes to
>> +	  /sys/kernel/debug/cxl/aer_einj_inject inject CXL RAS protocol
>> +	  errors using the AER internal error inject interface.
>> +
>> +	  This is a debug/test facility. Say N for production kernels.
>> +
>> +	  If unsure say N.
> 
>> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
>> index a55a4e409feda..91910d2bb5d39 100644
>> --- a/drivers/cxl/core/core.h
>> +++ b/drivers/cxl/core/core.h
> ...
> 
>> @@ -244,4 +247,22 @@ int cxl_set_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
>>  
>>  resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
>>  					   struct cxl_dport *dport);
>> +
>> +#ifdef CONFIG_CXL_PROTO_AER_EINJ
> 
> Do we need the ifdefs?  If the option isn't built none of this should get
> used.  So small benefit.
> 

You're right. You pointed that trick/pattern out before, and it results in less
unneeded code. I'll make that change.

>> +
>> +#define AER_REGISTER_SIZE 5
>> +#define RAS_REGISTER_SIZE (CXL_RAS_CAPABILITY_LENGTH / sizeof(u32))
>> +
>> +struct cxl_aer_einj {
>> +	int correctable;
>> +	bool is_rch;
>> +	struct mutex *lock;
>> +	struct device *dev;
>> +	u32 aer_registers[AER_REGISTER_SIZE];
>> +	u32 ras_registers[RAS_REGISTER_SIZE];
>> +};
>> +
>> +extern struct cxl_aer_einj cxl_aer_einj;
>> +#endif
> 
>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>> index d77208af41e03..d41deea899d30 100644
>> --- a/drivers/cxl/core/ras.c
>> +++ b/drivers/cxl/core/ras.c
>> @@ -3,6 +3,7 @@
>>  
>>  #include <linux/pci.h>
>>  #include <linux/aer.h>
>> +#include <linux/debugfs.h>
>>  #include <cxl/event.h>
>>  #include <cxlmem.h>
>>  #include <cxlpci.h>
>> @@ -117,6 +118,195 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
>>  }
>>  static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>>  
>> +#if IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ)
>> +
> 
> Unless very strong reasons for it, generally don't do #if stuff in c files.
> Just have a separate c file for this.  Otherwise it hurts readability and
> we tend to lose the clean separation over time.  A file makes that less
> likely.
> 
Ok.
> 
>> +static DEFINE_MUTEX(cxl_aer_einj_mutex);
> 
> Needs a comment for what data it is protecting.
> 
>> +
>> +struct cxl_aer_einj cxl_aer_einj = {
>> +	.lock = &cxl_aer_einj_mutex,
>> +};
>> +
>> +static const char cxl_aer_einj_usage[] =
>> +	"ssss:bb:dd.f [UCE|CE] AER_STATUS RAS_STATUS [RCH]\n";
>> +
>> +static int cxl_aer_inject_error(struct pci_dev *pdev, bool correctable,
>> +				u32 aer_status, u32 ras_status)
>> +{
>> +	/* RCD errors are signaled as internal errors on the associated RCEC */
>> +	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END) {
>> +		if (!pdev->rcec)
>> +			return -ENODEV;
>> +		pdev = pdev->rcec;
>> +	}
>> +
>> +	struct aer_error_inj einj = {
>> +		.bus = pdev->bus->number,
>> +		.dev = PCI_SLOT(pdev->devfn),
>> +		.fn = PCI_FUNC(pdev->devfn),
>> +		.domain = pci_domain_nr(pdev->bus),
>> +	};
>> +	int ret;
>> +	int aer_offset;
>> +	int ras_offset;
>> +
>> +	if (correctable) {
>> +		einj.cor_status = aer_status | PCI_ERR_COR_INTERNAL;
>> +		aer_offset = PCI_ERR_COR_STATUS / sizeof(u32);
>> +		ras_offset = CXL_RAS_CORRECTABLE_STATUS_OFFSET / sizeof(u32);
> 
> Given these are offsets into cxl_aer_einj.aer_registers / ras_registers
> can we use sizeof(*cxl_aer_einj.aer_registers) etc
> 
We could, but that assumes the CEs are bookending the register blocks. And this would
be inconsistent with the UCE case below, right? Tell me if I misunderstood the question.
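
For reference, one possible reading of the suggestion, as a minimal userspace
sketch: only the divisor changes, from sizeof(u32) to the size of the array
element, and it applies to the CE and UCE branches alike. aer_reg_index() and
aer_shadow_store() are hypothetical names, not part of the patch; the two
offsets are the AER capability register offsets, and AER_REGISTER_SIZE mirrors
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets of the status registers within the AER capability. */
#define PCI_ERR_UNCOR_STATUS 0x04
#define PCI_ERR_COR_STATUS   0x10

/* Mirrors AER_REGISTER_SIZE in the patch: slots 0..4 cover 0x00..0x10. */
#define AER_REGISTER_SIZE 5

static uint32_t aer_registers[AER_REGISTER_SIZE];

/*
 * Hypothetical helper: turn a byte offset into an array index, taking the
 * divisor from the array element instead of a hard-coded type.
 */
static size_t aer_reg_index(size_t byte_offset)
{
	size_t idx = byte_offset / sizeof(*aer_registers);

	assert(idx < AER_REGISTER_SIZE);
	return idx;
}

/* Hypothetical helper: store a status value in the slot for its error class. */
static void aer_shadow_store(int correctable, uint32_t status)
{
	size_t off = correctable ? PCI_ERR_COR_STATUS : PCI_ERR_UNCOR_STATUS;

	aer_registers[aer_reg_index(off)] = status;
}
```

With these offsets the CE status lands in slot 4 and the UCE status in slot 1,
so neither branch depends on where its register sits in the block.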

>> +	} else {
>> +		einj.uncor_status = aer_status | PCI_ERR_UNC_INTN;
>> +		aer_offset = PCI_ERR_UNCOR_STATUS / sizeof(u32);
>> +		ras_offset = CXL_RAS_UNCORRECTABLE_STATUS_OFFSET / sizeof(u32);
>> +	}
>> +
>> +	cxl_aer_einj.correctable = correctable;
>> +	cxl_aer_einj.aer_registers[aer_offset] = aer_status;
>> +	cxl_aer_einj.ras_registers[ras_offset] = ras_status;
>> +
>> +	ret = aer_inject(&einj);
>> +	if (ret) {
>> +		pr_err("cxl-einj: aer_inject failed: %d\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static ssize_t cxl_aer_einj_write(struct file *file,
>> +				    const char __user *ubuf,
>> +				    size_t count, loff_t *ppos)
> 
> Might as well wrap to 80 chars and save a line.
> 
Ok
>> +{
> 
> ...
> 
>> +
>> +	struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_dev(&pdev->dev, &dport);
>> +	if (!port) {
>> +		dev_err(&pdev->dev, "cxl-einj: Failed to find CXL Port.\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	if (!to_ras_base(port, dport)) {
>> +		dev_err(&pdev->dev, "cxl-einj: RAS not initialized.\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	cxl_aer_einj.is_rch = (nargs == 5 && strcmp(topology, "RCH") == 0);
>> +	if (!cxl_aer_einj.is_rch)
>> +		pci_dev_get(pdev);
>> +	cxl_aer_einj.dev = cxl_aer_einj.is_rch ? pdev->dev.parent : &pdev->dev;
> 
> 	cxl_aer_einj.dev = cxl_aer_einj.is_rch ? pdev->dev.parent : pci_dev_get(&pdev->dev);
> 
> Or use an if else for both is_rch based choices.
> Lazy me wonders.... Can we just grab a reference to the dev.parent for is_rch and
> simplify the code?  We don't really need it I think but it is harmless.
> 
> 
Your change is cleaner, and the parent ref increment doesn't hurt, as you mentioned.

>> +	ret = cxl_aer_inject_error(pdev, strcmp(severity, "CE") == 0,
>> +				   aer_status, ras_status);
>> +	if (ret) {
>> +		if (!cxl_aer_einj.is_rch)
>> +			pci_dev_put(pdev);
>> +		cxl_aer_einj.dev = NULL;
>> +		pr_err("cxl-einj: injection failed for %s: %d\n", sbdf, ret);
>> +		return ret;
>> +	}
>> +
>> +	return count;
>> +}
>> +
>> +static ssize_t cxl_aer_einj_read(struct file *file, char __user *ubuf,
>> +				 size_t count, loff_t *ppos)
>> +{
>> +	return simple_read_from_buffer(ubuf, count, ppos,
>> +				       cxl_aer_einj_usage,
> 
> Probably wrap this to push that up a line.
> 

Ok

>> +				       sizeof(cxl_aer_einj_usage) - 1);
>> +}
> 
>> +
>> +static void __iomem *to_einj_ras_base(struct cxl_port *port, struct cxl_dport *dport)
>> +{
>> +	if (dport) {
>> +		if (cxl_aer_einj.is_rch) {
>> +			if (cxl_aer_einj.dev == dport->dport_dev) {
>> +				cxl_aer_einj.dev = NULL;
>> +				return (__force void __iomem *)cxl_aer_einj.ras_registers;
> 
> Given the output of this is always force cast, maybe move that force up to the caller?
> 

I think that works if it remains an internal helper and isn't turned into a mock of to_ras_base().
If to_einj_ras_base() is used as a mock, then it would require changing to_ras_base()
as well.

>> +			}
>> +		} else {
>> +			if (cxl_aer_einj.dev == dport->dport_dev) {
>> +				pci_dev_put(to_pci_dev(cxl_aer_einj.dev));
> 
> Not locally obvious why a thing called to_einj_ras_base should put anything it didn't
> get.  I think this needs a restructure to more obviously be tidying up references
> that were held over the queue. At the very least it needs a comment.
> /* Reference held from X no longer needed so drop */
> 

The ref was incremented in cxl_aer_einj_write() on invoking injection. The ref is
decremented here after its use. RCHs are excluded because they don't have an SBDF.

>> +				cxl_aer_einj.dev = NULL;
>> +				return (__force void __iomem *)cxl_aer_einj.ras_registers;
>> +			}
>> +		}
>> +	} else if (!cxl_aer_einj.is_rch) {
>> +		struct device *dev = is_cxl_endpoint(port) ?
>> +			port->uport_dev->parent : port->uport_dev;
>> +
>> +		if (dev_is_pci(dev) && cxl_aer_einj.dev == dev) {
>> +			pci_dev_put(to_pci_dev(cxl_aer_einj.dev));
>> +			cxl_aer_einj.dev = NULL;
>> +			return (__force void __iomem *)cxl_aer_einj.ras_registers;
>> +		}
>> +	}
>> +
>> +	return NULL;
>> +}
>> +#endif
>> +
>>  static void cxl_unmask_proto_interrupts(struct device *dev)
>>  {
>>  	struct pci_dev *pdev;
>> @@ -238,6 +428,14 @@ void __iomem *to_ras_base(struct cxl_port *port, struct cxl_dport *dport)
>>  	if (!port)
>>  		return NULL;
>>  
>> +#if IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ)
> 
> Wrap with
> 	if (IS_ENABLED()) to keep it visible to the compiler.
> 
Ok
>> +	if (cxl_aer_einj.dev) {
>> +		void __iomem *einj = to_einj_ras_base(port, dport);
>> +		if (einj)
>> +			return einj;
>> +	}
>> +#endif
>> +
>>  	if (dport)
>>  		return dport->regs.ras;
>>  
>> @@ -458,10 +656,20 @@ void cxl_ras_init(void)
>>  	cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
>>  	cxl_register_proto_err_work(&cxl_proto_err_work,
>>  				   cxl_proto_err_do_flush);
>> +#if IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ)
>> +	cxl_ras_create_debugfs(cxl_debugfs);
> stub that in a header.
> 

Actually, during rework today I moved it to cxl_ras_einj_init(),
a new function. Thoughts?

>> +#endif
>>  }
>>  
>>  void cxl_ras_exit(void)
>>  {
>>  	cxl_unregister_proto_err_work();
>>  	cxl_cper_unregister_prot_err_work();
>> +#if IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ)
> 
> As below.  Use 
> 	if (IS_ENABLED())
> 
> and keep everything visible.
> 

Ok

>> +	if (cxl_aer_einj.dev) {
>> +		if (!cxl_aer_einj.is_rch)
>> +			pci_dev_put(to_pci_dev(cxl_aer_einj.dev));
>> +		cxl_aer_einj.dev = NULL;
>> +	}
>> +#endif
>>  }
>> diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c
>> index 14bb3bdb2d092..5071cf86e4a68 100644
>> --- a/drivers/cxl/core/ras_rch.c
>> +++ b/drivers/cxl/core/ras_rch.c
>> @@ -110,6 +110,14 @@ void cxl_handle_rdport_errors(struct pci_dev *pdev)
>>  	if (!dport)
>>  		return;
>>  
>> +#if IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ)
>> +	if (cxl_aer_einj.is_rch && cxl_aer_einj.dev) {
>> +		severity = cxl_aer_einj.correctable ?
>> +			AER_CORRECTABLE : AER_FATAL;
>> +		goto handle_ras;
>> +	}
> 
> Use instead
> 	if (IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ))
> 
> then compiler can see the code (but remove it) which means
> you don't need the dance around the label below.
> 
> In general follow this pattern anyway rather than #if 
> when you can.
> 

Got it.
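
For reference, a minimal userspace sketch of the if (IS_ENABLED()) pattern.
The IS_ENABLED() here is a simplified stand-in for the kernel macro, and
CONFIG_DEMO_EINJ, CONFIG_DEMO_OTHER, einj_override() and handle_error() are
hypothetical names used only for illustration.

```c
#include <assert.h>

/*
 * Simplified stand-in for the kernel's IS_ENABLED(): expands to 1 when the
 * option macro is defined as 1 and to 0 when it is not defined at all.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_ENABLED(option) IS_DEFINED_2(option)

/* Hypothetical option; CONFIG_DEMO_OTHER is deliberately left undefined. */
#define CONFIG_DEMO_EINJ 1

static_assert(IS_ENABLED(CONFIG_DEMO_EINJ) == 1, "defined option reads as 1");
static_assert(IS_ENABLED(CONFIG_DEMO_OTHER) == 0, "undefined option reads as 0");

static int einj_hits;

/* Called only from inside the IS_ENABLED() branch below. */
static void einj_override(void)
{
	einj_hits++;
}

/* Returns 1 when the injection path was taken, 0 for the normal path. */
static int handle_error(void)
{
	/*
	 * A plain if: the branch is parsed and type-checked in every
	 * configuration, and the compiler can drop it as dead code when
	 * the condition is a constant 0. No #if/#endif pair is needed.
	 */
	if (IS_ENABLED(CONFIG_DEMO_EINJ)) {
		einj_override();
		return 1;
	}
	return 0;
}
```

Because a goto placed inside such a branch is visible in every configuration,
its label is always referenced and should not need its own #if guard.
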
>> +#endif
>> +
>>  	if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
>>  		return;
>>  
>> @@ -117,6 +125,10 @@ void cxl_handle_rdport_errors(struct pci_dev *pdev)
>>  		return;
>>  
>>  	pci_print_aer(pdev, severity, &aer_regs);
>> +
>> +#if IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ)
>> +handle_ras:
>> +#endif
>>  	if (severity == AER_CORRECTABLE)
>>  		cxl_handle_cor_ras(dport->port, dport,
>>  				   to_ras_base(port, dport), pdev->dsn);
>> diff --git a/drivers/pci/pcie/aer_inject.c b/drivers/pci/pcie/aer_inject.c
>> index 09bfc7194ef31..b313adef680ae 100644
>> --- a/drivers/pci/pcie/aer_inject.c
>> +++ b/drivers/pci/pcie/aer_inject.c
>> @@ -14,6 +14,7 @@
>>  
>>  #define dev_fmt(fmt) "aer_inject: " fmt
>>  
>> +#include <linux/aer.h>
>>  #include <linux/module.h>
>>  #include <linux/init.h>
>>  #include <linux/interrupt.h>
>> @@ -31,19 +32,6 @@
>>  static bool aer_mask_override;
>>  module_param(aer_mask_override, bool, 0);
>>  
>> -struct aer_error_inj {
>> -	u8 bus;
>> -	u8 dev;
>> -	u8 fn;
>> -	u32 uncor_status;
>> -	u32 cor_status;
>> -	u32 header_log0;
>> -	u32 header_log1;
>> -	u32 header_log2;
>> -	u32 header_log3;
>> -	u32 domain;
>> -};
>> -
>>  struct aer_error {
>>  	struct list_head list;
>>  	u32 domain;
>> @@ -316,7 +304,7 @@ static int pci_bus_set_aer_ops(struct pci_bus *bus)
>>  	return 0;
>>  }
>>  
>> -static int aer_inject(struct aer_error_inj *einj)
>> +int aer_inject(struct aer_error_inj *einj)
>>  {
>>  	struct aer_error *err, *rperr;
>>  	struct aer_error *err_alloc = NULL, *rperr_alloc = NULL;
>> @@ -332,10 +320,14 @@ static int aer_inject(struct aer_error_inj *einj)
>>  	dev = pci_get_domain_bus_and_slot(einj->domain, einj->bus, devfn);
>>  	if (!dev)
>>  		return -ENODEV;
>> -	rpdev = pcie_find_root_port(dev);
>> -	/* If Root Port not found, try to find an RCEC */
>> -	if (!rpdev)
>> -		rpdev = dev->rcec;
>> +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) 
> 
> { }
> as the else is multiline (see coding standard)
> 

Ok

> Maybe need a comment for why it might be an RCEC for injection.
> Is this an RCH specific path where there is nothing else to target?
> 
>> +		rpdev = dev;
>> +	else {
>> +		rpdev = pcie_find_root_port(dev);
>> +		/* If Root Port not found, try to find an RCEC */
>> +		if (!rpdev)
>> +			rpdev = dev->rcec;
>> +	}
>>  	if (!rpdev) {
>>  		pci_err(dev, "Neither Root Port nor RCEC found\n");
>>  		ret = -ENODEV;
>> @@ -482,6 +474,7 @@ static int aer_inject(struct aer_error_inj *einj)
>>  	pci_dev_put(dev);
>>  	return ret;
>>  }
>> +EXPORT_SYMBOL_GPL(aer_inject);
> I wonder if we want to restrict this to specific modules?
> 
> One for Bjorn probably.
> 
Because right now it injects AER to any PCI device. 

Also worth discussing: the command line currently takes multiple parameters
through a single sysfs file. I mentioned this at the top.


Thanks for reviewing Jonathan.

-Terry

>>  
>>  static ssize_t aer_inject_write(struct file *filp, const char __user *ubuf,
>>  				size_t usize, loff_t *off)
>> diff --git a/include/linux/aer.h b/include/linux/aer.h
>> index b3657b80564b9..65c22ba597657 100644
>> --- a/include/linux/aer.h
>> +++ b/include/linux/aer.h
>> @@ -27,6 +27,21 @@
>>  struct pci_dev;
>>  struct work_struct;
>>  
>> +struct aer_error_inj {
>> +	u8 bus;
>> +	u8 dev;
>> +	u8 fn;
>> +	u32 uncor_status;
>> +	u32 cor_status;
>> +	u32 header_log0;
>> +	u32 header_log1;
>> +	u32 header_log2;
>> +	u32 header_log3;
>> +	u32 domain;
>> +};
>> +
>> +int aer_inject(struct aer_error_inj *einj);
>> +
>>  struct pcie_tlp_log {
>>  	union {
>>  		u32 dw[PCIE_STD_MAX_TLP_HEADERLOG];
> 


^ permalink raw reply

* Re: [RFC PATCH 1/2] KVM: x86/pmu: Add CAP to disable SW accounting of emulated instructions
From: Sean Christopherson @ 2026-07-20 22:27 UTC (permalink / raw)
  To: Luka Absandze
  Cc: Paolo Bonzini, kvm, linux-kernel, linux-doc, Alexander Graf,
	David Woodhouse
In-Reply-To: <20260720192221.72912-2-absandze@amazon.de>

On Mon, Jul 20, 2026, Luka Absandze wrote:
> The only functional change for an opted-in VM is reduced accuracy: a
> guest counting instructions-retired or branches-retired undercounts by
> the instructions KVM emulates in host context, i.e. the behavior that
> predates the accounting cited above. Hardware-executed guest
> instructions continue to be counted by the backing perf_event, and its
> overflow/PMI path is unchanged.

Do you *need* per-VM control, or would a module param (or a magic value for
enable_pmu) suffice?  While I mostly buy the "it used to work this way" argument
(just "mostly", because that commit landed 4.5 years ago), I'm not exactly keen
on adding uAPI that is effectively "re-introduce a bug to workaround fundamental
design issues in KVM's emulated PMU implementation".

If we do add proper uAPI, we should extend KVM_CAP_PMU_CAPABILITY, not add a new
CAP entirely.

^ permalink raw reply

* Re: [PATCH v18 09/13] cxl: Update CXL Endpoint AER handler
From: Dave Jiang @ 2026-07-20 22:25 UTC (permalink / raw)
  To: Terry Bowman, Bjorn Helgaas, Dan Williams, Ira Weiny,
	Jonathan Cameron, Len Brown, Rafael J . Wysocki, Robert Richter
  Cc: linux-acpi, linux-cxl, linux-doc, linux-kernel, linux-pci,
	linuxppc-dev, Alejandro Lucero, Alison Schofield, Ankit Agrawal,
	Ard Biesheuvel, Ben Cheatham, Borislav Petkov, Breno Leitao,
	Davidlohr Bueso, Fabio M . De Francesco, Gregory Price,
	Hanjun Guo, Jonathan Corbet, Kees Cook,
	Kuppuswamy Sathyanarayanan, Li Ming, Mahesh J Salgaonkar,
	Mauro Carvalho Chehab, Oliver O'Halloran, Shiju Jose,
	Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck, Vishal Verma
In-Reply-To: <20260717222706.3540281-10-terry.bowman@amd.com>



On 7/17/26 3:27 PM, Terry Bowman wrote:
> Rename cxl_error_detected() to cxl_pci_error_detected() and rename
> the struct pci_error_handlers instance to cxl_pci_error_handlers to
> avoid shadowing the struct type tag.
> 
> Document the unconditional CXL RAS read policy: on a dead link,
> readl() returns 0xFFFFFFFF which is interpreted as UCE bits set and
> triggers a panic. If RAS registers are not mapped the read is
> skipped and the frozen/perm_failure switch cases defer to AER
> recovery for devices without active CXL.mem traffic.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
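
The read policy described in the commit message can be sketched in userspace
as follows. DEMO_UCE_STATUS_MASK is an assumed value used only for
illustration (the driver uses CXL_RAS_UNCORRECTABLE_STATUS_MASK), and
demo_handle_ras() is a hypothetical stand-in for the real handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed mask of uncorrectable status bits, for illustration only. */
#define DEMO_UCE_STATUS_MASK 0x0001cfffu

/* Value a register read yields when the link is dead. */
#define DEMO_ALL_ONES 0xffffffffu

/* An all-ones read sets every bit covered by the mask. */
static_assert((DEMO_ALL_ONES & DEMO_UCE_STATUS_MASK) == DEMO_UCE_STATUS_MASK,
	      "all-ones read sets every UCE bit");

/*
 * Hypothetical stand-in for the handler: returns true when the caller
 * should treat the event as an uncorrectable CXL error.
 */
static bool demo_handle_ras(bool ras_mapped, uint32_t raw_status)
{
	/* RAS registers not mapped: the read is skipped, nothing to report. */
	if (!ras_mapped)
		return false;

	/* Any uncorrectable bit set, including the all-ones case, is fatal. */
	return (raw_status & DEMO_UCE_STATUS_MASK) != 0;
}
```

In this sketch a dead link (all-ones read) and a genuine UCE bit take the same
path, while an unmapped RAS block returns false and leaves the decision to the
channel-state handling.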

> 
> ---
> 
> Changes in v17->v18:
> - Fix cxl_pci_error_detected() to use find_cxl_port_by_uport() and port->uport_dev
> - Read CXL RAS unconditionally; panic on UCE regardless of channel state
> - Document unconditional read policy and 0xFFFFFFFF behavior in comment
> - Drop guard removal paragraph from commit message (not in this diff)
> - Drop Reviewed-by tags pending re-review after message change
> 
> Changes in v16->v17:
> - Rename pci_error_handlers struct instance to cxl_pci_error_handlers to
>   avoid shadowing the struct type tag.
> - Restore scoped_guard(device) and dev->driver check around AER read.
> - NULL-check find_cxl_port_by_dev() before deref of port->uport_dev.
> - Updated commit message. (Terry)
> - Add scope cleanup for port variable in cxl_pci_error_detected() (Terry)
> - Drop cxl_uncor_aer_present(), rely on AER state
> 
> Changes in v15->v16:
> - Update commit message (DaveJ)
> - s/cxl_handle_aer()/cxl_uncor_aer_present()/g (Jonathan)
> - cxl_uncor_aer_present(): Leave original result calculation based on
>   if a UCE is present and the provided state (Terry)
> - Add call to pci_print_aer(). AER fails to log because is upstream
>   link (Terry)
> 
> Changes in v14->v15:
> - Update commit message and title. Added Bjorn's ack.
> - Move CE and UCE handling logic here
> 
> Changes in v13->v14:
> - Add Dave Jiang's review-by
> - Update commit message & headline (Bjorn)
> - Refactor cxl_port_error_detected()/cxl_port_cor_error_detected() to
>   one line (Jonathan)
> - Remove cxl_walk_port() (Dan)
> - Remove cxl_pci_drv_bound(). Check for 'is_cxl' parent port is
>   sufficient (Dan)
> - Remove device_lock_if()
> - Combined CE and UCE here (Terry)
> 
> Changes in v12->v13:
> - Move get_pci_cxl_host_dev() and cxl_handle_proto_error() to Dequeue
>   patch (Terry)
> - Remove EP case in cxl_get_ras_base(), not used. (Terry)
> - Remove check for dport->dport_dev (Dave)
> - Remove whitespace (Terry)
> 
> Changes in v11->v12:
> - Add call to cxl_pci_drv_bound() in cxl_handle_proto_error() and
>   pci_to_cxl_dev()
> - Change cxl_error_detected() -> cxl_cor_error_detected()
> - Remove NULL variable assignments
> - Replace bus_find_device() with find_cxl_port_by_uport() for upstream
>   port searches.
> 
> Changes in v10->v11:
> - None
> ---
>  drivers/cxl/core/ras.c | 24 +++++++++++++++---------
>  drivers/cxl/cxlpci.h   |  8 ++++----
>  drivers/cxl/pci.c      | 12 ++++++------
>  3 files changed, 25 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 6f4a3c1b0bb85..d5dc2c22565da 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -328,10 +328,8 @@ bool cxl_handle_ras(struct cxl_port *port, struct cxl_dport *dport, void __iomem
>  	return true;
>  }
>  
> -
> -
> -pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> -				    pci_channel_state_t state)
> +pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
> +					pci_channel_state_t state)
>  {
>  	struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_uport(&pdev->dev);
>  	bool ue = false;
> @@ -349,10 +347,18 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>  		}
>  
>  		/*
> -		 * A frozen channel indicates an impending reset which is fatal to
> -		 * CXL.mem operation, and will likely crash the system. On the off
> -		 * chance the situation is recoverable dump the status of the RAS
> -		 * capability registers and bounce the active state of the memdev.
> +		 * The CXL RAS read is unconditional regardless of channel
> +		 * state.  Any uncorrectable error bit set in the CXL RAS
> +		 * status register triggers a panic because CXL.mem cache
> +		 * coherency is already lost; continuing risks silent data
> +		 * corruption across interleaved HDM regions.
> +		 *
> +		 * On a dead link readl() returns 0xFFFFFFFF which sets all
> +		 * UCE bits and also triggers the panic - this is intentional.
> +		 * If RAS registers are not mapped the read is skipped, the
> +		 * panic is not reached, and the frozen/perm_failure switch
> +		 * cases below handle AER recovery for devices without active
> +		 * CXL.mem traffic.
>  		 */
>  		ue = cxl_handle_ras(port, NULL, to_ras_base(port, NULL));
>  	}
> @@ -380,7 +386,7 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>  	}
>  	return PCI_ERS_RESULT_NEED_RESET;
>  }
> -EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
> +EXPORT_SYMBOL_NS_GPL(cxl_pci_error_detected, "CXL");
>  
>  static void cxl_handle_proto_error(struct pci_dev *pdev, struct cxl_port *port,
>  				   struct cxl_dport *dport, int severity)
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 06c46adcf0f6c..8aeb80a4e5732 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -89,13 +89,13 @@ struct cxl_dev_state;
>  void read_cdat_data(struct cxl_port *port);
>  
>  #ifdef CONFIG_CXL_RAS
> -pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> -				    pci_channel_state_t state);
> +pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
> +					pci_channel_state_t state);
>  void devm_cxl_dport_rch_ras_setup(struct cxl_dport *dport);
>  void devm_cxl_port_ras_setup(struct cxl_port *port);
>  #else
> -static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> -						  pci_channel_state_t state)
> +static inline pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
> +						      pci_channel_state_t state)
>  {
>  	return PCI_ERS_RESULT_NONE;
>  }
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 5c21db36073fe..6cf1db7b85020 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1000,18 +1000,18 @@ static void cxl_reset_done(struct pci_dev *pdev)
>  	}
>  }
>  
> -static const struct pci_error_handlers cxl_error_handlers = {
> -	.error_detected	= cxl_error_detected,
> -	.slot_reset	= cxl_slot_reset,
> -	.resume		= cxl_error_resume,
> -	.reset_done	= cxl_reset_done,
> +static const struct pci_error_handlers cxl_pci_error_handlers = {
> +	.error_detected		= cxl_pci_error_detected,
> +	.slot_reset		= cxl_slot_reset,
> +	.resume			= cxl_error_resume,
> +	.reset_done		= cxl_reset_done,
>  };
>  
>  static struct pci_driver cxl_pci_driver = {
>  	.name			= KBUILD_MODNAME,
>  	.id_table		= cxl_mem_pci_tbl,
>  	.probe			= cxl_pci_probe,
> -	.err_handler		= &cxl_error_handlers,
> +	.err_handler		= &cxl_pci_error_handlers,
>  	.dev_groups		= cxl_rcd_groups,
>  	.driver	= {
>  		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,


^ permalink raw reply

* Re: [PATCH v18 08/13] cxl/pci: Thread port and dport through RAS handling helpers
From: Dave Jiang @ 2026-07-20 22:15 UTC (permalink / raw)
  To: Terry Bowman, Bjorn Helgaas, Dan Williams, Ira Weiny,
	Jonathan Cameron, Len Brown, Rafael J . Wysocki, Robert Richter
  Cc: linux-acpi, linux-cxl, linux-doc, linux-kernel, linux-pci,
	linuxppc-dev, Alejandro Lucero, Alison Schofield, Ankit Agrawal,
	Ard Biesheuvel, Ben Cheatham, Borislav Petkov, Breno Leitao,
	Davidlohr Bueso, Fabio M . De Francesco, Gregory Price,
	Hanjun Guo, Jonathan Corbet, Kees Cook,
	Kuppuswamy Sathyanarayanan, Li Ming, Mahesh J Salgaonkar,
	Mauro Carvalho Chehab, Oliver O'Halloran, Shiju Jose,
	Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck, Vishal Verma
In-Reply-To: <20260717222706.3540281-9-terry.bowman@amd.com>



On 7/17/26 3:27 PM, Terry Bowman wrote:
> From: Dan Williams <djbw@kernel.org>
> 
> The callers of cxl_handle_ras() and cxl_handle_cor_ras() already hold
> a struct cxl_port * and struct cxl_dport * for the device being
> handled. Passing a generic struct device * requires is_cxl_memdev()
> to distinguish Endpoints from ports at trace emission time. Threading
> port and dport directly enables is_cxl_endpoint(port) and explicit
> dport/port branching for cleaner trace dispatch.
> 
> Refactor cxl_handle_ras() and cxl_handle_cor_ras() to accept struct
> cxl_port * and struct cxl_dport * directly. The CXL RAS trace event
> emission logic is split into three branches: Endpoint events are
> identified via is_cxl_endpoint(port) and emit with the memdev, dport
> events emit with dport->dport_dev, and Upstream Port events fall back
> to port->uport_dev.
> 
> Update cxl_handle_rdport_errors() in ras_rch.c and
> cxl_handle_proto_error() in ras.c to pass port and dport to the
> refactored functions.
> 
> RCH Downstream Port correctable trace events now report the dport
> device (dport->dport_dev) as a consequence of threading port and dport
> through the RAS helpers. The following trace event rework ("cxl: Add
> port and dport identifiers to CXL AER trace events") adds explicit
> memdev, port, dport, and host fields that provide full context for
> all device types.
> 
> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Dan Williams <djbw@kernel.org>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
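
The three-way trace dispatch described in the commit message can be sketched
in userspace as follows. The demo_* types and demo_pick_trace() are
hypothetical stand-ins for illustration, not the driver's structures or
tracepoints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the CXL structures; only the fields used here. */
struct demo_port {
	bool is_endpoint;
	const char *uport_dev;
};

struct demo_dport {
	const char *dport_dev;
};

enum demo_trace_kind { DEMO_TRACE_MEMDEV, DEMO_TRACE_PORT };

struct demo_trace {
	enum demo_trace_kind kind;
	const char *dev;
};

/* Pick the trace flavour and reporting device from port and dport. */
static struct demo_trace demo_pick_trace(const struct demo_port *port,
					 const struct demo_dport *dport)
{
	assert(port);

	/* Endpoint: report against the memdev behind the uport. */
	if (port->is_endpoint)
		return (struct demo_trace){ DEMO_TRACE_MEMDEV, port->uport_dev };

	/* Downstream port: report against the dport device. */
	if (dport)
		return (struct demo_trace){ DEMO_TRACE_PORT, dport->dport_dev };

	/* Otherwise fall back to the upstream port device. */
	return (struct demo_trace){ DEMO_TRACE_PORT, port->uport_dev };
}
```

The endpoint check comes first, so in this sketch an Endpoint is reported via
its memdev regardless of whether a dport was passed.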


> 
> ---
> 
> Changes in v17 -> v18:
> - New patch.
> ---
>  drivers/cxl/core/core.h    | 12 ++++++++----
>  drivers/cxl/core/ras.c     | 29 +++++++++++++++--------------
>  drivers/cxl/core/ras_rch.c |  2 +-
>  3 files changed, 24 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 272634ff2615b..5ca1275fd8f35 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -185,10 +185,12 @@ static inline struct device *dport_to_host(struct cxl_dport *dport)
>  #ifdef CONFIG_CXL_RAS
>  void cxl_ras_init(void);
>  void cxl_ras_exit(void);
> -bool cxl_handle_ras(struct device *dev, void __iomem *ras_base);
> +bool cxl_handle_ras(struct cxl_port *port, struct cxl_dport *dport,
> +		    void __iomem *ras_base);
>  void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port,
>  		     struct cxl_dport *dport);
> -void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base);
> +void cxl_handle_cor_ras(struct cxl_port *port, struct cxl_dport *dport,
> +			void __iomem *ras_base);
>  void cxl_dport_map_rch_aer(struct cxl_dport *dport);
>  void cxl_disable_rch_root_ints(struct cxl_dport *dport);
>  void cxl_handle_rdport_errors(struct pci_dev *pdev);
> @@ -197,13 +199,15 @@ void devm_cxl_dport_ras_setup(struct cxl_dport *dport);
>  #else
>  static inline void cxl_ras_init(void) { }
>  static inline void cxl_ras_exit(void) { }
> -static inline bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
> +static inline bool cxl_handle_ras(struct cxl_port *port, struct cxl_dport *dport,
> +				  void __iomem *ras_base)
>  {
>  	return false;
>  }
>  static inline void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port,
>  				   struct cxl_dport *dport) { }
> -static inline void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base) { }
> +static inline void cxl_handle_cor_ras(struct cxl_port *port, struct cxl_dport *dport,
> +				      void __iomem *ras_base) { }
>  static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
>  static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
>  static inline void cxl_handle_rdport_errors(struct pci_dev *pdev) { }
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 9a142abcf4f8b..6f4a3c1b0bb85 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -232,7 +232,6 @@ void __iomem *to_ras_base(struct cxl_port *port, struct cxl_dport *dport)
>  
>  void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port, struct cxl_dport *dport)
>  {
> -	struct device *dev = dport ? dport->dport_dev : port->uport_dev;
>  	void __iomem *ras_base = to_ras_base(port, dport);
>  
>  	if (!ras_base) {
> @@ -241,14 +240,14 @@ void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port, struct cxl_dpo
>  		return;
>  	}
>  
> -	if (cxl_handle_ras(dev, ras_base))
> +	if (cxl_handle_ras(port, dport, ras_base))
>  		panic("CXL cachemem error");
>  
>  	dev_dbg(&pdev->dev,
>  		"CXL UCE signaled but no CXL RAS status bits set\n");
>  }
>  
> -void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
> +void cxl_handle_cor_ras(struct cxl_port *port, struct cxl_dport *dport, void __iomem *ras_base)
>  {
>  	u32 status;
>  	void __iomem *addr;
> @@ -260,10 +259,12 @@ void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
>  	status = readl(addr);
>  	if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
>  		writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> -		if (is_cxl_memdev(dev))
> -			trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);
> +		if (is_cxl_endpoint(port))
> +			trace_cxl_aer_correctable_error(to_cxl_memdev(port->uport_dev), status);
> +		else if (dport)
> +			trace_cxl_port_aer_correctable_error(dport->dport_dev, status);
>  		else
> -			trace_cxl_port_aer_correctable_error(dev, status);
> +			trace_cxl_port_aer_correctable_error(port->uport_dev, status);
>  	}
>  }
>  
> @@ -288,7 +289,7 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
>   * Log the state of the RAS status registers and prepare them to log the
>   * next error status. Return 1 if reset needed.
>   */
> -bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
> +bool cxl_handle_ras(struct cxl_port *port, struct cxl_dport *dport, void __iomem *ras_base)
>  {
>  	u32 hl[CXL_HEADERLOG_TRACE_SIZE_U32] = {};
>  	void __iomem *addr;
> @@ -315,10 +316,12 @@ bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
>  	}
>  
>  	header_log_copy(ras_base, hl);
> -	if (is_cxl_memdev(dev))
> -		trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);
> +	if (is_cxl_endpoint(port))
> +		trace_cxl_aer_uncorrectable_error(to_cxl_memdev(port->uport_dev), status, fe, hl);
> +	else if (dport)
> +		trace_cxl_port_aer_uncorrectable_error(dport->dport_dev, status, fe, hl);
>  	else
> -		trace_cxl_port_aer_uncorrectable_error(dev, status, fe, hl);
> +		trace_cxl_port_aer_uncorrectable_error(port->uport_dev, status, fe, hl);
>  
>  	writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
>  
> @@ -351,7 +354,7 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>  		 * chance the situation is recoverable dump the status of the RAS
>  		 * capability registers and bounce the active state of the memdev.
>  		 */
> -		ue = cxl_handle_ras(port->uport_dev, to_ras_base(port, NULL));
> +		ue = cxl_handle_ras(port, NULL, to_ras_base(port, NULL));
>  	}
>  
>  	/*
> @@ -382,10 +385,8 @@ EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
>  static void cxl_handle_proto_error(struct pci_dev *pdev, struct cxl_port *port,
>  				   struct cxl_dport *dport, int severity)
>  {
> -	struct device *dev = dport ? dport->dport_dev : port->uport_dev;
> -
>  	if (severity == AER_CORRECTABLE)
> -		cxl_handle_cor_ras(dev, to_ras_base(port, dport));
> +		cxl_handle_cor_ras(port, dport, to_ras_base(port, dport));
>  	else
>  		cxl_do_recovery(pdev, port, dport);
>  }
> diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c
> index f2d2fb83758b9..f4b98f2c11a1c 100644
> --- a/drivers/cxl/core/ras_rch.c
> +++ b/drivers/cxl/core/ras_rch.c
> @@ -118,7 +118,7 @@ void cxl_handle_rdport_errors(struct pci_dev *pdev)
>  
>  	pci_print_aer(pdev, severity, &aer_regs);
>  	if (severity == AER_CORRECTABLE)
> -		cxl_handle_cor_ras(&pdev->dev, to_ras_base(port, dport));
> +		cxl_handle_cor_ras(dport->port, dport, to_ras_base(port, dport));
>  	else
>  		cxl_do_recovery(pdev, dport->port, dport);
>  }


^ permalink raw reply

* Re: [PATCH bpf-next v5 8/8] selftests: net: add test for XDP_PASS skb checksum invalidation
From: Lorenzo Bianconi @ 2026-07-20 22:08 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Andrew Lunn, Tony Nguyen, Przemek Kitszel, Alexander Lobakin,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
	Maciej Fijalkowski, Jonathan Corbet, Shuah Khan,
	Kumar Kartikeya Dwivedi, Emil Tsalapatis, Vladimir Vdovin,
	Jakub Sitnicki, netdev, bpf, intel-wired-lan, linux-kselftest,
	linux-doc
In-Reply-To: <al53WS9FMTnu7tBI@devvm7509.cco0.facebook.com>

> On 07/16, Lorenzo Bianconi wrote:
> > On Jul 16, Stanislav Fomichev wrote:
> > > On 07/15, Lorenzo Bianconi wrote:
> > > > Add a test that verifies skb->ip_summed is set to CHECKSUM_NONE
> > > > when a device running in XDP mode creates an skb from a xdp_buff
> > > > if the attached ebpf program returns an XDP_PASS.
> > > > The test attaches an XDP program returning XDP_PASS, and a TC
> > > > ingress program that runs the bpf_skb_rx_checksum() kfunc to
> > > > inspect the resulting skb. After XDP_PASS the driver must invalidate
> > > > any previously computed hardware RX checksum since XDP may have
> > > > modified the packet data.
> > > > The BPF program counts packets per checksum type in a map, and the
> > > > test runner verifies that after sending traffic the CHECKSUM_NONE
> > > > counter is non-zero while CHECKSUM_UNNECESSARY and CHECKSUM_COMPLETE
> > > > counters are zero.
> > > > 
> > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > > ---
> > > >  Documentation/networking/xdp-rx-metadata.rst       |  5 ++
> > > >  .../selftests/drivers/net/hw/xdp_metadata.py       | 55 +++++++++++++++-
> > > >  .../selftests/net/lib/skb_metadata_csum.bpf.c      | 73 ++++++++++++++++++++++
> > > >  3 files changed, 132 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> > > > index 93918b3769a3..7434ac98242a 100644
> > > > --- a/Documentation/networking/xdp-rx-metadata.rst
> > > > +++ b/Documentation/networking/xdp-rx-metadata.rst
> > > > @@ -90,6 +90,11 @@ conversion, and the XDP metadata is not used by the kernel when building
> > > >  ``skbs``. However, TC-BPF programs can access the XDP metadata area using
> > > >  the ``data_meta`` pointer.
> > > 
> > > [..]
> > > 
> > > > +If a driver is running in XDP mode, any existing hardware RX checksum
> > > > +(``CHECKSUM_UNNECESSARY`` or ``CHECKSUM_COMPLETE``) must be invalidated
> > > > +by setting ``skb->ip_summed`` to ``CHECKSUM_NONE`` before passing the
> > > > +skb to the kernel, since XDP may have modified the packet data.
> > > > +
> > > >  In the future, we'd like to support a case where an XDP program
> > > >  can override some of the metadata used for building ``skbs``.
> > > 
> > > Sorry for keeping nitpicking on this, but I'm still not convinced that
> > > it is what we currently do. From my previous reply:
> > 
> > no worries :)
> > My current take-away from the previous discussion is we just need to document
> > what would be the driver expected behaviour adding a kselftest for it (without
> > modifying any driver).
> > 
> > > 
> > > > > Looking at a few drivers:
> > > > > - bnxt (bnxt_rx_pkt) does UNNECESSARY - ok
> > > > > - mlx5 (mlx5e_handle_csum) does UNNECESSARY and skips COMPLETE if there is
> > > > >   bpf prog attached
> > > > > - fbnic (fbnic_rx_csum) - can do COMPLETE even with xdp attached?
> > > > > - gve (gve_rx) - can do COMPLETE even with xdp attached?
> > > 
> > > (although for gve I might be wrong, there is also gve_rx_skb_csum that only
> > > does UNNECESSARY).
> > > 
> > > I'd wait for Jakub to chime in, but it feels like we should just document
> > > what we currently do as a recommended approach: for the drivers
> > > that support COMPLETE, do not report it when the bpf program is attached.
> > > Both NONE and UNNECESSARY are ok.
> > 
> > I am not completely sure the UNNECESSARY case is different from the COMPLETE
> > one. What are we supposed to do if the driver reports UNNECESSARY and the ebpf
> > program modifies some fields covered by the rx-checksum?
> 
> For unnecessary, I think the safe expectation is that the bpf program
> will update the value of the checksum in the packet if it touches the data?

I do not have a strong opinion on it.
@Jakub: any input on it?

> 
> > > Also, did you run this test on real HW? NIPA now has HW tests, maybe it
> > > makes sense to route this series via net-next to get the real coverage?
> > 
> > What about splitting this series and have two different series:
> > - bpf-next: add xdp rx kfunc and related selftest
> > - net-next: add kselftest for the driver expected behaviour.
> > 
> > What do you think?
> 
> I'd post everything to net-next to get the HW coverage. Once you get all
> the acks we can ask the maintainers' guidance.

ack, I am fine with that.

Regards,
Lorenzo



* Re: [PATCH] power: supply: bd71828: add a terminating table border
From: Sebastian Reichel @ 2026-07-20 22:03 UTC (permalink / raw)
  To: linux-doc, Randy Dunlap; +Cc: Andreas Kemnade, Matti Vaittinen, linux-pm
In-Reply-To: <20260620011821.3568674-1-rdunlap@infradead.org>


On Fri, 19 Jun 2026 18:18:21 -0700, Randy Dunlap wrote:
> Fix a documentation build error by adding a bottom table border:
> 
> Documentation/ABI/testing/sysfs-class-power-bd71828:1: ERROR: Malformed table.
> No bottom table border found.
> ============  ===========================================
> 1             automatic adjustment of input current limit
> 0             no adjustment of input current limit. This
>               helps for more unusual power sources like
>               solar modules. [docutils]
> 
> [...]

Applied, thanks!

[1/1] power: supply: bd71828: add a terminating table border
      commit: b056f21a38276ead20353d71d50a52206609d242

Best regards,
-- 
Sebastian Reichel <sebastian.reichel@collabora.com>



* Re: [PATCH v18 04/13] cxl: Rename find_cxl_port() to find_cxl_port_by_dport()
From: Jonathan Cameron @ 2026-07-20 22:02 UTC (permalink / raw)
  To: Terry Bowman
  Cc: Bjorn Helgaas, Dan Williams, Dave Jiang, Ira Weiny, Len Brown,
	Rafael J . Wysocki, Robert Richter, linux-acpi, linux-cxl,
	linux-doc, linux-kernel, linux-pci, linuxppc-dev,
	Alejandro Lucero, Alison Schofield, Ankit Agrawal, Ard Biesheuvel,
	Ben Cheatham, Borislav Petkov, Breno Leitao, Davidlohr Bueso,
	Fabio M . De Francesco, Gregory Price, Hanjun Guo,
	Jonathan Corbet, Kees Cook, Kuppuswamy Sathyanarayanan, Li Ming,
	Mahesh J Salgaonkar, Mauro Carvalho Chehab, Oliver O'Halloran,
	Shiju Jose, Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck,
	Vishal Verma
In-Reply-To: <20260717222706.3540281-5-terry.bowman@amd.com>

On Fri, 17 Jul 2026 17:26:57 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> From: Dan Williams <djbw@kernel.org>
> 
> find_cxl_port() and find_cxl_port_by_uport() are internal port lookup
> functions that search the CXL bus by dport and uport respectively, but
> their names do not make the lookup method clear.
> 
> Rename find_cxl_port() to find_cxl_port_by_dport() to make the lookup
> method explicit and consistent with find_cxl_port_by_uport(). Both
> functions remain static to port.c; the upcoming patch that adds the
> first cross-file caller will widen their scope.

Could have mentioned the __ one, but it is fairly obvious why you did
that even without saying it.

Nice improvement
Reviewed-by: Jonathan Cameron <jonathan.cameron@oss.qualcomm.com>

> 
> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Dan Williams <djbw@kernel.org>
> 
> ---
> 
> Changes in v17 -> v18:
> - New commit
> ---
>  drivers/cxl/core/port.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index f90f899c31d07..cadb51f70f854 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1379,7 +1379,7 @@ static int match_port_by_dport(struct device *dev, const void *data)
>  	return dport != NULL;
>  }
>  
> -static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
> +static struct cxl_port *__find_cxl_port_by_dport(struct cxl_find_port_ctx *ctx)
>  {
>  	struct device *dev;
>  
> @@ -1392,8 +1392,16 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
>  	return NULL;
>  }
>  
> -static struct cxl_port *find_cxl_port(struct device *dport_dev,
> -				      struct cxl_dport **dport)
> +/**
> + * find_cxl_port_by_dport - find a cxl_port by one of its targets
> + * @dport_dev: device representing the dport target
> + * @dport: optional output of the 'struct cxl_dport' companion of the @dport_dev
> + *
> + * Return a 'struct cxl_port' with an elevated reference if found. Use
> + * __free(put_cxl_port) to release.
> + */
> +static struct cxl_port *find_cxl_port_by_dport(struct device *dport_dev,
> +					       struct cxl_dport **dport)
>  {
>  	struct cxl_find_port_ctx ctx = {
>  		.dport_dev = dport_dev,
> @@ -1401,7 +1409,7 @@ static struct cxl_port *find_cxl_port(struct device *dport_dev,
>  	};
>  	struct cxl_port *port;
>  
> -	port = __find_cxl_port(&ctx);
> +	port = __find_cxl_port_by_dport(&ctx);
>  	return port;
>  }
>  
> @@ -1895,14 +1903,14 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, "CXL");
>  struct cxl_port *cxl_pci_find_port(struct pci_dev *pdev,
>  				   struct cxl_dport **dport)
>  {
> -	return find_cxl_port(pdev->dev.parent, dport);
> +	return find_cxl_port_by_dport(pdev->dev.parent, dport);
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_pci_find_port, "CXL");
>  
>  struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
>  				   struct cxl_dport **dport)
>  {
> -	return find_cxl_port(grandparent(&cxlmd->dev), dport);
> +	return find_cxl_port_by_dport(grandparent(&cxlmd->dev), dport);
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_mem_find_port, "CXL");
>  



* Re: [patch 00/18] entry: Consolidate and rework syscall entry handling
From: Magnus Lindholm @ 2026-07-20 22:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Michael Ellerman, Shrikanth Hegde,
	linuxppc-dev, Kees Cook, Huacai Chen, loongarch, Paul Walmsley,
	Palmer Dabbelt, linux-riscv, Sven Schnelle, linux-s390, x86,
	Mark Rutland, Jinjie Ruan, Andy Lutomirski, Oleg Nesterov,
	Richard Henderson, Russell King, Catalin Marinas, Guo Ren,
	Geert Uytterhoeven, Thomas Bogendoerfer, Helge Deller,
	Yoshinori Sato, Richard Weinberger, Chris Zankel,
	linux-arm-kernel, linux-alpha, linux-csky, linux-m68k, linux-mips,
	linux-parisc, linux-sh, linux-um, Arnd Bergmann, Vineet Gupta,
	Will Deacon, Brian Cain, Michal Simek, Dinh Nguyen,
	David S. Miller, Andreas Larsson, linux-snps-arc, linux-hexagon,
	linux-openrisc, sparclinux, linux-arch, Michal Suchánek,
	Jonathan Corbet, linux-doc
In-Reply-To: <87wluplb6k.ffs@fw13>

Thomas,

On Mon, Jul 20, 2026 at 9:21 PM Thomas Gleixner <tglx@kernel.org> wrote:
>
>
> Similar to what Peter and me suggested in the related discussion
> vs. s390 which has a similar issue, you can just have a dedicated
> syscall_return member in your pt_regs struct, which is preset to -ENOSYS
> and operate on that. You have an unused padding entry there which means
> it won't even change the size.
>

Thanks, that makes sense in principle.

One complication is that my Alpha GENERIC_ENTRY branch already reuses that
former pt_regs padding slot, while keeping the pt_regs size and stack
alignment unchanged. The extra syscall-entry bookkeeping I needed so far lives
in thread_info rather than adding more fields to pt_regs.

So the padding slot is not free in my current branch. I can still look at
whether a dedicated syscall_return scratch value would be cleaner, but the
tested version currently handles this by setting the default -ENOSYS result
only on the no-dispatch path, after generic entry processing, and only if
ptrace/seccomp/BPF did not already provide a return value.
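
A minimal userspace sketch of that ordering, assuming hypothetical names
throughout (none of this is taken from the Alpha branch or from the generic
entry code):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bookkeeping; stands in for whatever lives in thread_info. */
struct sketch_syscall_state {
	long ret;		/* value that will be handed back to user space */
	bool ret_set;		/* ptrace/seccomp/BPF already supplied a result */
	bool dispatched;	/* a syscall handler actually ran */
};

/*
 * Runs after generic entry processing. The -ENOSYS default is applied only
 * on the no-dispatch path, and only when nothing else provided a result.
 */
static long sketch_finish_syscall(struct sketch_syscall_state *st)
{
	if (!st->dispatched && !st->ret_set)
		st->ret = -ENOSYS;

	return st->ret;
}
```

The point of the sketch is only that a result injected by a tracer or filter
survives, while a skipped syscall with no injected result still reports
-ENOSYS.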

>
> You were on CC for the V2, no?
>

Sorry, I must have missed that.

> The whole pile including Jinjie's seccomp bypass fix is in
>
>     git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/entry
>
> and is targeted for the 7.3 merge window.
>

Great!

> If you want to base your stuff on that for 7.3, I can add a tag which
> makes it immutable so it can be pulled into the alpha tree.
>

Yes, please. There is still some ongoing testing/review of my Alpha
GENERIC_ENTRY series, but assuming it is ready in time for the 7.3 merge
window, an immutable tag would be very helpful.

Magnus


* Re: [PATCH v7 03/12] PCI: liveupdate: Track incoming preserved PCI devices
From: David Matlack @ 2026-07-20 22:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kexec, linux-doc, linux-kernel, linux-mm, linux-pci,
	Adithya Jayachandran, Alexander Graf, Bjorn Helgaas, Chris Li,
	David Rientjes, Jacob Pan, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Leon Romanovsky, Lukas Wunner, Mike Rapoport,
	Parav Pandit, Pasha Tatashin, Pranjal Shrivastava, Pratyush Yadav,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	William Tu, Yi Liu
In-Reply-To: <20260717163820.6bc8c13b@shazbot.org>

On Fri, Jul 17, 2026 at 3:38 PM Alex Williamson <alex@shazbot.org> wrote:
>
> On Fri, 10 Jul 2026 21:26:06 +0000
> David Matlack <dmatlack@google.com> wrote:
>
> > @@ -298,6 +377,87 @@ void pci_liveupdate_unpreserve(struct pci_dev *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(pci_liveupdate_unpreserve);
> >
> > +static struct pci_flb_incoming *pci_liveupdate_flb_get_incoming(void)
> > +{
> > +     struct pci_flb_incoming *incoming = NULL;
> > +     int ret;
> > +
> > +     ret = liveupdate_flb_get_incoming(&pci_liveupdate_flb, (void **)&incoming);
> > +
> > +     /* Live Update is not enabled. */
> > +     if (ret == -EOPNOTSUPP)
> > +             return NULL;
> > +
> > +     /* Live Update is enabled, but there is no incoming FLB data. */
> > +     if (ret == -ENODATA)
> > +             return NULL;
> > +
> > +     /*
> > +      * Live Update is enabled and there is incoming FLB data, but none of it
> > +      * matches pci_liveupdate_flb.compatible.
> > +      *
> > +      * This could mean that no PCI FLB data was passed by the previous
> > +      * kernel, but it could also mean the previous kernel used a different
> > +      * compatibility string (i.e. a different ABI).
> > +      */
> > +     if (ret == -ENOENT) {
> > +             pr_info_once("No incoming FLB matched %s\n", pci_liveupdate_flb.compatible);
> > +             return NULL;
> > +     }
> > +
> > +     /*
> > +      * There is incoming FLB data that matches pci_liveupdate_flb.compatible
> > +      * but it cannot be retrieved.
> > +      */
> > +     if (ret)
> > +             panic("Failed to retrieve incoming FLB data (%d)\n", ret);
> > +
> > +     return incoming;
> > +}
>
> I'm having trouble following the error escalation here.  What's
> fundamentally the difference between FLB data being provided and not
> compatible (subtle log message) versus FLB data being provided and
> compatible but we cannot access it (panic!)?

There is no fundamental difference IMO and I think we should handle
them the same way (panic). The problem is the PCI core cannot actually
distinguish between "PCI FLB provided and not compatible" (should
panic) and "no PCI FLB provided" (should log). I would like to propose
changes to LUO so that the FLB handlers can detect the former
situation but that requires some non-trivial work that I didn't want
to block on.

For now, I guess you could say it is up to the user to not perform
Live Update between incompatible versions. I can update the
documentation.

>
> Don't both suggest devices are running but we can't get their FLB data
> to continue letting them run?
>
> The errno interpretation is also slightly different than the comment
> above liveupdate_flb_get_incoming():
>
>  * Return: 0 on success, or a negative errno on failure. -ENODATA means no
>  * incoming FLB data, -ENOENT means specific flb not found in the incoming
>  * data, -ENODEV if the FLB's module is unloading, and -EOPNOTSUPP when
>  * live update is disabled or not configured.

I think the only disagreement is the lack of specific -ENODEV
handling. Is that what you are referring to?
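
For reference, the policy in the quoted hunk can be restated as a small
table. This is only a userspace sketch with hypothetical names, mirroring
the branches of pci_liveupdate_flb_get_incoming() as posted; it is not a
proposed change:

```c
#include <errno.h>

/* Hypothetical outcome labels for the branches in the quoted hunk. */
enum sketch_flb_action {
	SKETCH_FLB_USE,		/* ret == 0: incoming data is usable */
	SKETCH_FLB_NONE,	/* return NULL silently */
	SKETCH_FLB_NONE_LOGGED,	/* return NULL after pr_info_once() */
	SKETCH_FLB_FATAL,	/* panic() */
};

static enum sketch_flb_action sketch_classify_flb_ret(int ret)
{
	switch (ret) {
	case 0:
		return SKETCH_FLB_USE;
	case -EOPNOTSUPP:	/* Live Update is not enabled */
	case -ENODATA:		/* enabled, but no incoming FLB data */
		return SKETCH_FLB_NONE;
	case -ENOENT:		/* nothing matched the compatible string */
		return SKETCH_FLB_NONE_LOGGED;
	default:		/* everything else, -ENODEV included */
		return SKETCH_FLB_FATAL;
	}
}
```

Written this way, -ENODEV has no branch of its own and ends up in the
catch-all case.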

>
> Thanks,
> Alex


* Re: [PATCH v18 03/13] cxl: Tighten CPER kfifo registration API and symbol visibility
From: Jonathan Cameron @ 2026-07-20 21:59 UTC (permalink / raw)
  To: Terry Bowman
  Cc: Bjorn Helgaas, Dan Williams, Dave Jiang, Ira Weiny, Len Brown,
	Rafael J . Wysocki, Robert Richter, linux-acpi, linux-cxl,
	linux-doc, linux-kernel, linux-pci, linuxppc-dev,
	Alejandro Lucero, Alison Schofield, Ankit Agrawal, Ard Biesheuvel,
	Ben Cheatham, Borislav Petkov, Breno Leitao, Davidlohr Bueso,
	Fabio M . De Francesco, Gregory Price, Hanjun Guo,
	Jonathan Corbet, Kees Cook, Kuppuswamy Sathyanarayanan, Li Ming,
	Mahesh J Salgaonkar, Mauro Carvalho Chehab, Oliver O'Halloran,
	Shiju Jose, Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck,
	Vishal Verma
In-Reply-To: <20260717222706.3540281-4-terry.bowman@amd.com>

On Fri, 17 Jul 2026 17:26:56 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> From: Dan Williams <djbw@kernel.org>
> 
Hi Terry, Dan,

There are a few different things in here. Having them all together
may make it a little harder to keep track of the different changes.

> Tighten the CPER protocol error kfifo registration API and symbol
> visibility.
> 
> Use EXPORT_SYMBOL_FOR_MODULES() instead of EXPORT_SYMBOL_NS_GPL() for
> the CPER kfifo registration symbols. This names the consuming module
> explicitly and gives compile-time enforcement.
To me this is one patch.
> 
> Drop the work_struct argument from the unregister path. Change the
> WARN_ONCE condition to a NULL check since there is no caller pointer
> to compare against anymore.

This is one.

> 
> Change register/unregister return types to void. Flag double registration
> with WARN_ONCE() inside the lock instead of returning an error.
Another change.

That covers the why for register; for unregister, maybe mention that the
return value was unchecked anyway.

> 
> Change cxl_ras_init() to void because there is one consumer and one producer
> so the error return was unnecessary. Remove the now-dead error check in
> cxl_core_init().
This is another change.

> 
> Add a diagnostic log when the driver is not bound in
> cxl_cper_handle_prot_err().
Unrelated really to all the rest. 

All nice changes and maybe a few of them can be combined but to me this
is too many in one.  Anyhow I might be letting perfect be enemy of
good and all that so if others think it should go in as one then
I'm fine with that.

Trivial thing inline to do with formatting consistency.

> 
> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Dan Williams <djbw@kernel.org>
> 
> ---
> 
> Changes in v17 -> v18:
> - New patch.
> ---
>  drivers/acpi/apei/ghes.c | 32 +++++++++++++++-----------------
>  drivers/cxl/core/core.h  |  7 ++-----
>  drivers/cxl/core/port.c  |  6 +-----
>  drivers/cxl/core/ras.c   | 12 +++++++-----
>  include/cxl/event.h      | 17 ++++++-----------
>  5 files changed, 31 insertions(+), 43 deletions(-)
> 
> diff --git a/include/cxl/event.h b/include/cxl/event.h
> index ff97fea718d2c..3471d4f75c025 100644
> --- a/include/cxl/event.h
> +++ b/include/cxl/event.h
> @@ -287,10 +287,10 @@ struct cxl_cper_prot_err_work_data {

> -static inline int cxl_cper_register_prot_err_work(struct work_struct *work)
> +static inline void cxl_cper_register_prot_err_work(struct work_struct *work)
>  {
> -	return 0;
>  }

I'd go a bit long and move { } up.  Compact stubs are always nice!
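
Roughly the following, as a sketch of the layout only (the forward
declaration is there just so the fragment compiles on its own, and the
first line deliberately runs past 80 columns):

```c
struct work_struct;

static inline void cxl_cper_register_prot_err_work(struct work_struct *work) { }
static inline void cxl_cper_unregister_prot_err_work(void) { }
```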


> -static inline int cxl_cper_unregister_prot_err_work(struct work_struct *work)
> +static inline void cxl_cper_unregister_prot_err_work(void)

This one is shorter than the one you did it for above!

>  {
> -	return 0;
>  }
>  static inline int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd)
>  {



* Re: [PATCH v7 03/12] PCI: liveupdate: Track incoming preserved PCI devices
From: David Matlack @ 2026-07-20 21:54 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: kexec, linux-doc, linux-kernel, linux-mm, linux-pci,
	Adithya Jayachandran, Alexander Graf, Alex Williamson,
	Bjorn Helgaas, Chris Li, David Rientjes, Jacob Pan,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Leon Romanovsky,
	Lukas Wunner, Mike Rapoport, Parav Pandit, Pranjal Shrivastava,
	Pratyush Yadav, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, William Tu, Yi Liu
In-Reply-To: <178432481253.189683.3297727348836286619.b4-review@b4>

On Fri, Jul 17, 2026 at 2:47 PM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> On Fri, 10 Jul 2026 21:26:06 +0000, David Matlack <dmatlack@google.com> wrote:
> > diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
> > index 03075ce06ac9..df6a02240aa4 100644
> > --- a/drivers/pci/liveupdate.c
> > +++ b/drivers/pci/liveupdate.c
> > @@ -298,6 +377,87 @@ void pci_liveupdate_unpreserve(struct pci_dev *dev)
> > [ ... skip 76 lines ... ]
> > +     /*
> > +      * Hold the ref on the incoming FLB until pci_liveupdate_finish() so
> > +      * that dev->liveupdate.incoming cannot get freed while the PCI core
> > +      * has a pointer to it. It's better to leak the incoming FLB than do a
> > +      * use-after-free if driver does not call pci_liveupdate_finish().
> > +      */
>
> I am confused by this. Each preserved PCI device must have an associated
> FD preserved with it via LUO. I.e., vfiofd would need to be preserved. If
> the vfiofd was not reclaimed, and finish is not possible, that vfiofd
> would still be owned by LUO, and therefore PCI FLB refcount would stay
> positive.
>
> However, if finish is possible, and this is the last vfiofd that is
> finished, FLB will be freed as soon as the reference count reaches zero,
> which I would think is the expected behavior.
>
> What is the point of holding a reference here, instead of only for the
> duration of FLB access, i.e. to make sure we are accessing a valid data?

The duration of the access is from here until pci_liveupdate_finish()
because that is when the pointer (dev->liveupdate.incoming) is
cleared. So that is why the PCI core holds the reference from here
until pci_liveupdate_finish().

We could avoid this by deleting dev->liveupdate.incoming and,
instead, fetching the incoming FLB and doing the xarray lookup every
time the PCI core needs to access the device's incoming ser struct,
but that would be inefficient.
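
A rough userspace model of that lifetime, assuming hypothetical type and
function names that only stand in for the FLB reference and
dev->liveupdate.incoming (this is not the code in the patch):

```c
#include <stddef.h>

/* Hypothetical stand-in for the incoming FLB object and its refcount. */
struct sketch_flb {
	int refs;
};

/* Hypothetical stand-in for the per-device cached pointer. */
struct sketch_dev {
	struct sketch_flb *incoming;
};

/* Take a reference and cache the pointer; held until sketch_finish(). */
static void sketch_track_incoming(struct sketch_dev *dev, struct sketch_flb *flb)
{
	flb->refs++;
	dev->incoming = flb;
}

/* Clear the cached pointer first, then drop the reference taken above. */
static void sketch_finish(struct sketch_dev *dev)
{
	struct sketch_flb *flb = dev->incoming;

	if (!flb)
		return;

	dev->incoming = NULL;
	flb->refs--;
}
```

As long as the cached pointer is non-NULL the count stays elevated, so the
object cannot go away underneath it; the cost is that a device which never
reaches the finish step keeps the reference forever.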


* Re: [PATCH v18 07/13] PCI/CXL: Add RCH support to CXL handlers
From: Dave Jiang @ 2026-07-20 21:47 UTC (permalink / raw)
  To: Terry Bowman, Bjorn Helgaas, Dan Williams, Ira Weiny,
	Jonathan Cameron, Len Brown, Rafael J . Wysocki, Robert Richter
  Cc: linux-acpi, linux-cxl, linux-doc, linux-kernel, linux-pci,
	linuxppc-dev, Alejandro Lucero, Alison Schofield, Ankit Agrawal,
	Ard Biesheuvel, Ben Cheatham, Borislav Petkov, Breno Leitao,
	Davidlohr Bueso, Fabio M . De Francesco, Gregory Price,
	Hanjun Guo, Jonathan Corbet, Kees Cook,
	Kuppuswamy Sathyanarayanan, Li Ming, Mahesh J Salgaonkar,
	Mauro Carvalho Chehab, Oliver O'Halloran, Shiju Jose,
	Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck, Vishal Verma
In-Reply-To: <20260717222706.3540281-8-terry.bowman@amd.com>



On 7/17/26 3:27 PM, Terry Bowman wrote:
> Restricted CXL Host (RCH) error handling is a separate path from the
> new CXL Port error handling flow. Fold RCH error handling into the
> Port flow so both share a common entry point.
> 
> Update cxl_rch_handle_error_iter() to forward RCH protocol errors
> through the AER-CXL kfifo. Change cxl_rch_handle_error() return type
> from void to bool so handle_error_source() can determine whether work
> was enqueued and call cxl_proto_err_flush() before AER recovery
> proceeds.
> 
> For RC_END devices, __cxl_proto_err_work_fn() calls
> cxl_handle_rdport_errors() to process RCH Downstream Port errors,
> then falls through to the VH path for RC_END Endpoint handling.
> 
> An RCD uncorrectable CXL RAS error now panics via cxl_do_recovery().
> Before this patch the RCH Downstream Port UCE path called
> cxl_handle_ras() but ignored its return value - no panic. After this
> patch the same condition calls cxl_do_recovery() which panics on
> confirmed UCE. The Endpoint UCE path already panicked at the parent
> commit. This matches the panic policy added in the common CXL Port
> protocol error flow.
> 
> Remove cxl_cor_error_detected() and its .cor_error_detected
> registration in cxl_error_handlers. Correctable Endpoint errors are
> now routed through the AER-CXL kfifo like all other CXL protocol
> errors.
> 
> Drop the cxlds->rcd / cxl_handle_rdport_errors(cxlds) branches from
> cxl_error_detected(). RCH downstream port error handling is now
> performed by __cxl_proto_err_work_fn() via the kfifo path, which
> calls cxl_handle_rdport_errors(pdev) before the common dispatch.
> 
> Change cxl_handle_rdport_errors() to take a struct pci_dev * instead
> of a struct cxl_dev_state *, matching the new caller context. Re-fetch
> dport under guard() to close the TOCTOU window between
> cxl_pci_find_port()'s lockless xa_load() and the first dereference of
> the returned pointer.
> 
> Change find_cxl_port_by_dev() RC_END lookup from
> find_cxl_port_by_dport(dev->parent) to find_cxl_port_by_uport(dev),
> matching the Endpoint lookup path. RC_END Endpoint port resolution
> uses the uport (the RC_END device itself), while the separate RCH
> Downstream Port lookup is handled by cxl_handle_rdport_errors().
> 
> The RCH Downstream Port and the RCD Endpoint (RC_END) are separate
> devices with independent RAS register blocks. cxl_handle_rdport_errors()
> handles the RCH Downstream Port RAS. RCD Endpoint (RC_END) is handled in
> cxl_handle_proto_error().
> 
> Use to_ras_base() in cxl_handle_rdport_errors() instead of referencing
> dport->regs.ras directly. Make to_ras_base() non-static in ras.c and
> declare it in core.h so ras_rch.c can access it. Route all RAS base address lookups
> through a single helper to prepare for CXL RAS error injection testing
> that follows this series.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>


> 
> ---
> 
> Changes in v17->v18:
> - Pass &pdev->dev instead of dport->port->uport_dev in
>   cxl_handle_rdport_errors() to avoid dropping RCH trace events.
> - Document trace event attribution change.
> - Document removal of cxl_cor_error_detected() and cxlds->rcd branches.
> - Capitalize Endpoint per PCI spec convention.
> - Use to_ras_base() in cxl_handle_rdport_errors() to centralize RAS base
>   address lookup in preparation for error injection testing.
> 
> Changes in v16->v17:
> - Drop now-dead cxlds->rcd branches from cxl_{cor_,}error_detected().
> - Drop duplicate subject line from commit body.
> - Document panic-on-uncorrectable behavior change for RCD path.
> - Document trace event device-name change (memN -> PCI BDF) for RCH path.
> - Rewrite cxl_handle_proto_error() RC_END comment to clarify RCD/RCH shared
>   interrupt relationship
> - Rewrite commit message
> 
> Changes in v16:
> - New commit
> ---
>  drivers/cxl/core/core.h        | 10 +++++--
>  drivers/cxl/core/ras.c         | 50 ++++++++--------------------------
>  drivers/cxl/core/ras_rch.c     | 16 ++++++-----
>  drivers/cxl/cxlpci.h           |  3 --
>  drivers/cxl/pci.c              |  1 -
>  drivers/pci/pcie/aer.c         |  4 +--
>  drivers/pci/pcie/aer_cxl_rch.c | 39 ++++++++++++--------------
>  drivers/pci/pcie/portdrv.h     |  4 +--
>  8 files changed, 48 insertions(+), 79 deletions(-)
> 
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 7c70bea06c2db..272634ff2615b 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -191,7 +191,8 @@ void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port,
>  void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base);
>  void cxl_dport_map_rch_aer(struct cxl_dport *dport);
>  void cxl_disable_rch_root_ints(struct cxl_dport *dport);
> -void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);
> +void cxl_handle_rdport_errors(struct pci_dev *pdev);
> +void __iomem *to_ras_base(struct cxl_port *port, struct cxl_dport *dport);
>  void devm_cxl_dport_ras_setup(struct cxl_dport *dport);
>  #else
>  static inline void cxl_ras_init(void) { }
> @@ -205,7 +206,12 @@ static inline void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port,
>  static inline void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base) { }
>  static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
>  static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
> -static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
> +static inline void cxl_handle_rdport_errors(struct pci_dev *pdev) { }
> +static inline void __iomem *to_ras_base(struct cxl_port *port,
> +					 struct cxl_dport *dport)
> +{
> +	return NULL;
> +}
>  static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport) { }
>  #endif /* CONFIG_CXL_RAS */
>  
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index b190e69c2d415..9a142abcf4f8b 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -218,7 +218,8 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
>  }
>  EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
>  
> -static void __iomem *to_ras_base(struct cxl_port *port, struct cxl_dport *dport)
> +
> +void __iomem *to_ras_base(struct cxl_port *port, struct cxl_dport *dport)
>  {
>  	if (!port)
>  		return NULL;
> @@ -324,37 +325,7 @@ bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
>  	return true;
>  }
>  
> -void cxl_cor_error_detected(struct pci_dev *pdev)
> -{
> -	guard(device)(&pdev->dev);
> -	if (!pdev->dev.driver)
> -		return;
> -
> -	struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_uport(&pdev->dev);
> -	if (!port)
> -		return;
> -
> -	if (is_cxl_restricted(pdev)) {
> -		struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> -		struct cxl_memdev *cxlmd = cxlds->cxlmd;
>  
> -		scoped_guard(device, &cxlmd->dev) {
> -			cxl_handle_rdport_errors(cxlds);
> -		}
> -	}
> -
> -	scoped_guard(device, &port->dev) {
> -		if (!port->dev.driver) {
> -			dev_warn(&pdev->dev,
> -				 "%s: port disabled, abort error handling\n",
> -				 dev_name(&port->dev));
> -			return;
> -		}
> -
> -		cxl_handle_cor_ras(port->uport_dev, to_ras_base(port, NULL));
> -	}
> -}
> -EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
>  
>  pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>  				    pci_channel_state_t state)
> @@ -365,14 +336,6 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>  	if (!port)
>  		return PCI_ERS_RESULT_DISCONNECT;
>  
> -	if (is_cxl_restricted(pdev)) {
> -		struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> -		struct cxl_memdev *cxlmd = cxlds->cxlmd;
> -
> -		scoped_guard(device, &cxlmd->dev) {
> -			cxl_handle_rdport_errors(cxlds);
> -		}
> -	}
>  
>  	scoped_guard(device, &port->dev) {
>  		if (!port->dev.driver) {
> @@ -429,6 +392,15 @@ static void cxl_handle_proto_error(struct pci_dev *pdev, struct cxl_port *port,
>  
>  static void __cxl_proto_err_work_fn(struct cxl_proto_err_work_data *wd)
>  {
> +	/*
> +	 * For RC_END (RCD) devices, handle RCH Downstream Port errors
> +	 * first.  cxl_handle_rdport_errors() does its own port lookup
> +	 * and locking, keeping the Downstream Port lock separate from the
> +	 * Endpoint Port lock taken below.
> +	 */
> +	if (is_cxl_restricted(wd->pdev))
> +		cxl_handle_rdport_errors(wd->pdev);
> +
>  	struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_dev(&wd->pdev->dev, NULL);
>  	if (!port) {
>  		dev_err_ratelimited(&wd->pdev->dev,
> diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c
> index 44b335d560708..f2d2fb83758b9 100644
> --- a/drivers/cxl/core/ras_rch.c
> +++ b/drivers/cxl/core/ras_rch.c
> @@ -1,7 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  /* Copyright(c) 2025 AMD Corporation. All rights reserved. */
>  
> -#include <linux/types.h>
>  #include <linux/aer.h>
>  #include "cxl.h"
>  #include "core.h"
> @@ -96,18 +95,21 @@ static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
>  	return false;
>  }
>  
> -void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
> +void cxl_handle_rdport_errors(struct pci_dev *pdev)
>  {
> -	struct pci_dev *pdev = to_pci_dev(cxlds->dev);
>  	struct aer_capability_regs aer_regs;
>  	struct cxl_dport *dport;
>  	int severity;
>  
> -	struct cxl_port *port __free(put_cxl_port) =
> -		cxl_pci_find_port(pdev, &dport);
> +	struct cxl_port *port __free(put_cxl_port) = cxl_pci_find_port(pdev, NULL);
>  	if (!port)
>  		return;
>  
> +	guard(device)(&port->dev);
> +	dport = cxl_find_dport_by_dev(port, pdev->dev.parent);
> +	if (!dport)
> +		return;
> +
>  	if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
>  		return;
>  
> @@ -116,7 +118,7 @@ void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
>  
>  	pci_print_aer(pdev, severity, &aer_regs);
>  	if (severity == AER_CORRECTABLE)
> -		cxl_handle_cor_ras(&cxlds->cxlmd->dev, dport->regs.ras);
> +		cxl_handle_cor_ras(&pdev->dev, to_ras_base(port, dport));
>  	else
> -		cxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras);
> +		cxl_do_recovery(pdev, dport->port, dport);
>  }
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index b826eb53cf7ba..06c46adcf0f6c 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -89,14 +89,11 @@ struct cxl_dev_state;
>  void read_cdat_data(struct cxl_port *port);
>  
>  #ifdef CONFIG_CXL_RAS
> -void cxl_cor_error_detected(struct pci_dev *pdev);
>  pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>  				    pci_channel_state_t state);
>  void devm_cxl_dport_rch_ras_setup(struct cxl_dport *dport);
>  void devm_cxl_port_ras_setup(struct cxl_port *port);
>  #else
> -static inline void cxl_cor_error_detected(struct pci_dev *pdev) { }
> -
>  static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>  						  pci_channel_state_t state)
>  {
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 7c6faee7f85ed..5c21db36073fe 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1004,7 +1004,6 @@ static const struct pci_error_handlers cxl_error_handlers = {
>  	.error_detected	= cxl_error_detected,
>  	.slot_reset	= cxl_slot_reset,
>  	.resume		= cxl_error_resume,
> -	.cor_error_detected	= cxl_cor_error_detected,
>  	.reset_done	= cxl_reset_done,
>  };
>  
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 2d9d40528e709..0bd23a65e7ebc 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1185,9 +1185,7 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>  
>  static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
>  {
> -	bool cxl_pending = false;
> -
> -	cxl_rch_handle_error(dev, info);
> +	bool cxl_pending = cxl_rch_handle_error(dev, info);
>  
>  	if (is_cxl_error(dev, info))
>  		cxl_pending |= cxl_forward_error(dev, info);
> diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
> index e471eefec9c40..683712fc965ff 100644
> --- a/drivers/pci/pcie/aer_cxl_rch.c
> +++ b/drivers/pci/pcie/aer_cxl_rch.c
> @@ -34,42 +34,37 @@ static bool cxl_error_is_native(struct pci_dev *dev)
>  	return (pcie_ports_native || host->native_aer);
>  }
>  
> +struct cxl_rch_error_ctx {
> +	struct aer_err_info *info;
> +	bool enqueued;
> +};
> +
>  static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
>  {
> -	struct aer_err_info *info = (struct aer_err_info *)data;
> -	const struct pci_error_handlers *err_handler;
> +	struct cxl_rch_error_ctx *ctx = data;
>  
>  	if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
>  		return 0;
>  
> -	guard(device)(&dev->dev);
> -
> -	err_handler = dev->driver ? dev->driver->err_handler : NULL;
> -	if (!err_handler)
> -		return 0;
> -
> -	if (info->severity == AER_CORRECTABLE) {
> -		if (err_handler->cor_error_detected)
> -			err_handler->cor_error_detected(dev);
> -	} else if (err_handler->error_detected) {
> -		if (info->severity == AER_NONFATAL)
> -			err_handler->error_detected(dev, pci_channel_io_normal);
> -		else if (info->severity == AER_FATAL)
> -			err_handler->error_detected(dev, pci_channel_io_frozen);
> -	}
> +	if (cxl_forward_error(dev, ctx->info))
> +		ctx->enqueued = true;
>  	return 0;
>  }
>  
> -void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> +bool cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>  {
> +	struct cxl_rch_error_ctx ctx = { .info = info };
> +
>  	/*
> -	 * Internal errors of an RCEC indicate an AER error in an
> -	 * RCH's downstream port. Check and handle them in the CXL.mem
> -	 * device driver.
> +	 * An RCEC AER internal error indicates an error in an
> +	 * associated RCH Downstream Port or RC_END device or both.
> +	 * Forward to the cxl_core module for handling.
>  	 */
>  	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
>  	    is_aer_internal_error(info))
> -		pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
> +		pcie_walk_rcec(dev, cxl_rch_handle_error_iter, &ctx);
> +
> +	return ctx.enqueued;
>  }
>  
>  static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index fd203010877bf..807bca90dee0e 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -128,14 +128,14 @@ struct aer_err_info;
>  
>  #ifdef CONFIG_CXL_RAS
>  bool is_aer_internal_error(struct aer_err_info *info);
> -void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info);
> +bool cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info);
>  void cxl_rch_enable_rcec(struct pci_dev *rcec);
>  bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info);
>  bool cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info);
>  void cxl_proto_err_flush(void);
>  #else
>  static inline bool is_aer_internal_error(struct aer_err_info *info) { return false; }
> -static inline void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info) { }
> +static inline bool cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info) { return false; }
>  static inline void cxl_rch_enable_rcec(struct pci_dev *rcec) { }
>  static inline bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info) { return false; }
>  static inline bool cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info) { return false; }
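
The cxl_rch_error_ctx change above is an instance of a common pattern: pass a small context struct through a void *data walker so the callback can report a result back to its caller. A minimal userspace sketch follows. Every sketch_* name is hypothetical, and sketch_walk() only stands in for a bus walker such as pcie_walk_rcec(); it is not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record; the kernel walks struct pci_dev instead. */
struct sketch_dev {
	bool is_cxl_mem;
};

/* Context handed to the callback: an input plus an accumulated result. */
struct sketch_ctx {
	int severity;
	bool enqueued;
};

typedef int (*sketch_cb)(struct sketch_dev *dev, void *data);

/* Stand-in for a bus walker: stops early if the callback returns nonzero. */
static void sketch_walk(struct sketch_dev *devs, size_t n, sketch_cb cb,
			void *data)
{
	for (size_t i = 0; i < n; i++)
		if (cb(&devs[i], data))
			break;
}

static int sketch_iter(struct sketch_dev *dev, void *data)
{
	struct sketch_ctx *ctx = data;

	if (!dev->is_cxl_mem)
		return 0;	/* not ours, keep walking */

	/* Models a successful forward of the error to the consumer. */
	ctx->enqueued = true;
	return 0;
}

/* Returns true if at least one device had its error forwarded. */
static bool sketch_handle(struct sketch_dev *devs, size_t n, int severity)
{
	struct sketch_ctx ctx = { .severity = severity };

	sketch_walk(devs, n, sketch_iter, &ctx);
	return ctx.enqueued;
}
```

Returning the flag from the context keeps the walker's signature unchanged while still letting the caller learn whether anything was queued.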



* Re: [PATCH v18 02/13] acpi/apei/ghes: Use raw_spinlock_t for CXL CPER work locks
From: Jonathan Cameron @ 2026-07-20 21:41 UTC (permalink / raw)
  To: Terry Bowman
  Cc: Bjorn Helgaas, Dan Williams, Dave Jiang, Ira Weiny, Len Brown,
	Rafael J . Wysocki, Robert Richter, linux-acpi, linux-cxl,
	linux-doc, linux-kernel, linux-pci, linuxppc-dev,
	Alejandro Lucero, Alison Schofield, Ankit Agrawal, Ard Biesheuvel,
	Ben Cheatham, Borislav Petkov, Breno Leitao, Davidlohr Bueso,
	Fabio M . De Francesco, Gregory Price, Hanjun Guo,
	Jonathan Corbet, Kees Cook, Kuppuswamy Sathyanarayanan, Li Ming,
	Mahesh J Salgaonkar, Mauro Carvalho Chehab, Oliver O'Halloran,
	Shiju Jose, Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck,
	Vishal Verma
In-Reply-To: <20260717222706.3540281-3-terry.bowman@amd.com>

On Fri, 17 Jul 2026 17:26:55 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> The CXL CPER work registration and unregistration helpers in
> drivers/acpi/apei/ghes.c acquire cxl_cper_work_lock and
> cxl_cper_prot_err_work_lock with guard(spinlock), which leaves local
> interrupts enabled. The corresponding post paths
> (cxl_cper_post_event(), cxl_cper_post_prot_err()) execute in hard IRQ
> context (they are called from the GHES error notification path) and
> acquire the same locks via guard(spinlock_irqsave).
> 
> If a CPU is holding one of these locks via guard(spinlock) when a
> GHES interrupt arrives on the same CPU, the IRQ handler spins on the
> held lock waiting for it to release, while the lock holder is
> preempted by the IRQ. The result is a deadlock.
> 
> Convert both locks from spinlock_t to raw_spinlock_t and use
> guard(raw_spinlock_irqsave) at all call sites. On PREEMPT_RT kernels
> spinlock_t is backed by rt_mutex and sleeping from hard IRQ context is
> not permitted; raw_spinlock_t is safe in both contexts.
> 
> Add WARN_ONCE to both register functions to surface double-registration
> bugs at runtime.
> 
> Restructure both unregister functions to clear the global work pointer
> under the lock before calling cancel_work_sync(), closing the window
> where a CPER interrupt could schedule work on a pointer about to be
> freed. Add kfifo_reset() after cancel_work_sync() so stale entries
> are not replayed on next module load.
> 
> Both kfifos are single-consumer: only one work_struct is registered at
> a time, enforced by the WARN_ONCE guard in the register functions.
> kfifo_reset() is safe outside the lock because cancel_work_sync() has
> already quiesced the consumer, and no new consumer can register until
> the current module exit completes and a fresh module init runs.
> 
> Remove the now-redundant cancel_work_sync() call from
> cxl_pci_driver_exit() - cxl_cper_unregister_work() handles quiescing
> internally.
> 
> Reported-by: Sashiko <sashiko@linuxfoundation.org>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Fixes: 5e4a264bf8b5 ("acpi/ghes: Process CXL Component Events")
> Fixes: 36f257e3b0ba ("acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors")
> Cc: stable@vger.kernel.org

Reviewed-by: Jonathan Cameron <jonathan.cameron@oss.qualcomm.com>
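
For readers following the ordering argument in the commit message, here is a minimal userspace sketch of the register/unregister sequence. It assumes a pthread mutex as a stand-in for the raw spinlock (so it does not model interrupt disabling), and all sketch_* names are hypothetical. The point is only the order: check and set under the lock on register, then on unregister clear the pointer under the lock, quiesce, and reset the queue.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct work_struct. */
struct sketch_work {
	int pending;
};

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_work *sketch_registered;	/* protected by sketch_lock */
static int sketch_fifo_len;			/* models queued entries */

/* Check and set under the lock, so two racing callers cannot both win. */
static int sketch_register(struct sketch_work *work)
{
	int ret = 0;

	pthread_mutex_lock(&sketch_lock);
	if (sketch_registered)
		ret = -EINVAL;
	else
		sketch_registered = work;
	pthread_mutex_unlock(&sketch_lock);
	return ret;
}

/* Producer side: only queues while a consumer is registered. */
static int sketch_post(void)
{
	int queued = 0;

	pthread_mutex_lock(&sketch_lock);
	if (sketch_registered) {
		sketch_fifo_len++;
		sketch_registered->pending = 1;
		queued = 1;
	}
	pthread_mutex_unlock(&sketch_lock);
	return queued;
}

static int sketch_unregister(struct sketch_work *work)
{
	pthread_mutex_lock(&sketch_lock);
	if (sketch_registered != work) {
		pthread_mutex_unlock(&sketch_lock);
		return -EINVAL;
	}
	/* Step 1: hide the pointer so producers stop scheduling the work. */
	sketch_registered = NULL;
	pthread_mutex_unlock(&sketch_lock);

	/* Step 2: quiesce the consumer (models cancel_work_sync()). */
	work->pending = 0;

	/* Step 3: drop stale entries (models kfifo_reset()). */
	sketch_fifo_len = 0;
	return 0;
}
```

Because the pointer is cleared before the consumer is quiesced, a producer that runs after step 1 sees NULL and queues nothing, which is the window the patch description says it closes.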

> 
> ---
> 
> Changes in v17 -> v18:
> - New patch.
> ---
>  drivers/acpi/apei/ghes.c | 50 ++++++++++++++++++++++++++--------------
>  drivers/cxl/pci.c        |  1 -
>  2 files changed, 33 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 3236a3ce79d6b..ca7a138c1ff2e 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -749,7 +749,7 @@ static DEFINE_KFIFO(cxl_cper_prot_err_fifo, struct cxl_cper_prot_err_work_data,
>  		    CXL_CPER_PROT_ERR_FIFO_DEPTH);
>  
>  /* Synchronize schedule_work() with cxl_cper_prot_err_work changes */
> -static DEFINE_SPINLOCK(cxl_cper_prot_err_work_lock);
> +static DEFINE_RAW_SPINLOCK(cxl_cper_prot_err_work_lock);
>  struct work_struct *cxl_cper_prot_err_work;
>  
>  static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
> @@ -761,7 +761,7 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
>  	if (cxl_cper_sec_prot_err_valid(prot_err))
>  		return;
>  
> -	guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
> +	guard(raw_spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
>  
>  	if (!cxl_cper_prot_err_work)
>  		return;
> @@ -780,10 +780,11 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
>  
>  int cxl_cper_register_prot_err_work(struct work_struct *work)
>  {
> -	if (cxl_cper_prot_err_work)
> -		return -EINVAL;
> +	guard(raw_spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
>  
> -	guard(spinlock)(&cxl_cper_prot_err_work_lock);
> +	if (WARN_ONCE(cxl_cper_prot_err_work,
> +		      "CPER-CXL kfifo consumer already registered\n"))
> +		return -EINVAL;
>  	cxl_cper_prot_err_work = work;
>  	return 0;
>  }
> @@ -791,11 +792,18 @@ EXPORT_SYMBOL_NS_GPL(cxl_cper_register_prot_err_work, "CXL");
>  
>  int cxl_cper_unregister_prot_err_work(struct work_struct *work)
>  {
> -	if (cxl_cper_prot_err_work != work)
> -		return -EINVAL;
> +	scoped_guard(raw_spinlock_irqsave, &cxl_cper_prot_err_work_lock) {
> +		if (WARN_ONCE(cxl_cper_prot_err_work != work,
> +			      "CPER-CXL kfifo consumer mismatch on unregister\n"))
> +			return -EINVAL;
> +		cxl_cper_prot_err_work = NULL;
> +	}
> +
> +	cancel_work_sync(work);
> +
> +	/* Discard stale entries so they are not replayed on next module load */
> +	kfifo_reset(&cxl_cper_prot_err_fifo);
>  
> -	guard(spinlock)(&cxl_cper_prot_err_work_lock);
> -	cxl_cper_prot_err_work = NULL;
>  	return 0;
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_prot_err_work, "CXL");
> @@ -811,7 +819,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_cper_prot_err_kfifo_get, "CXL");
>  DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, CXL_CPER_FIFO_DEPTH);
>  
>  /* Synchronize schedule_work() with cxl_cper_work changes */
> -static DEFINE_SPINLOCK(cxl_cper_work_lock);
> +static DEFINE_RAW_SPINLOCK(cxl_cper_work_lock);
>  struct work_struct *cxl_cper_work;
>  
>  static void cxl_cper_post_event(enum cxl_event_type event_type,
> @@ -831,7 +839,7 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
>  		return;
>  	}
>  
> -	guard(spinlock_irqsave)(&cxl_cper_work_lock);
> +	guard(raw_spinlock_irqsave)(&cxl_cper_work_lock);
>  
>  	if (!cxl_cper_work)
>  		return;
> @@ -849,10 +857,11 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
>  
>  int cxl_cper_register_work(struct work_struct *work)
>  {
> -	if (cxl_cper_work)
> +	guard(raw_spinlock_irqsave)(&cxl_cper_work_lock);
> +	if (WARN_ONCE(cxl_cper_work,
> +		      "CXL CPER kfifo consumer already registered\n"))
>  		return -EINVAL;
>  
> -	guard(spinlock)(&cxl_cper_work_lock);
>  	cxl_cper_work = work;
>  	return 0;
>  }
> @@ -860,11 +869,18 @@ EXPORT_SYMBOL_NS_GPL(cxl_cper_register_work, "CXL");
>  
>  int cxl_cper_unregister_work(struct work_struct *work)
>  {
> -	if (cxl_cper_work != work)
> -		return -EINVAL;
> +	scoped_guard(raw_spinlock_irqsave, &cxl_cper_work_lock) {
> +		if (WARN_ONCE(cxl_cper_work != work,
> +			      "CXL CPER kfifo consumer mismatch on unregister\n"))
> +			return -EINVAL;
> +		cxl_cper_work = NULL;
> +	}
> +
> +	cancel_work_sync(work);
> +
> +	/* Discard stale entries so they are not replayed on next module load */
> +	kfifo_reset(&cxl_cper_fifo);
>  
> -	guard(spinlock)(&cxl_cper_work_lock);
> -	cxl_cper_work = NULL;
>  	return 0;
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_work, "CXL");
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 267c679b0b3c2..7c6faee7f85ed 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1083,7 +1083,6 @@ static int __init cxl_pci_driver_init(void)
>  static void __exit cxl_pci_driver_exit(void)
>  {
>  	cxl_cper_unregister_work(&cxl_cper_work);
> -	cancel_work_sync(&cxl_cper_work);
>  	pci_unregister_driver(&cxl_pci_driver);
>  }
>  



* Re: [PATCH v18 01/13] cxl/ras: Fix cxl_rch_get_aer_severity() wrong severity register
From: Jonathan Cameron @ 2026-07-20 21:26 UTC (permalink / raw)
  To: Terry Bowman
  Cc: Bjorn Helgaas, Dan Williams, Dave Jiang, Ira Weiny, Len Brown,
	Rafael J . Wysocki, Robert Richter, linux-acpi, linux-cxl,
	linux-doc, linux-kernel, linux-pci, linuxppc-dev,
	Alejandro Lucero, Alison Schofield, Ankit Agrawal, Ard Biesheuvel,
	Ben Cheatham, Borislav Petkov, Breno Leitao, Davidlohr Bueso,
	Fabio M . De Francesco, Gregory Price, Hanjun Guo,
	Jonathan Corbet, Kees Cook, Kuppuswamy Sathyanarayanan, Li Ming,
	Mahesh J Salgaonkar, Mauro Carvalho Chehab, Oliver O'Halloran,
	Shiju Jose, Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck,
	Vishal Verma
In-Reply-To: <20260717222706.3540281-2-terry.bowman@amd.com>

On Fri, 17 Jul 2026 17:26:54 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> cxl_rch_get_aer_severity() classifies RCH Downstream Port uncorrectable
> errors as fatal or non-fatal by ANDing uncorrectable status with
> PCI_ERR_ROOT_FATAL_RCV. This is wrong because PCI_ERR_ROOT_FATAL_RCV is a
> Root Error Status register bit (bit 6), not a severity bit. ANDing it
> against uncorrectable status tests a reserved bit and produces incorrect
> severity classification.
> 
> Fix by ANDing the unmasked uncor_status against uncor_severity. Per
> PCIe Base Spec r6.0 Section 7.8.4.4, each bit in the Uncorrectable
> Error Severity register indicates whether the corresponding error is
> fatal (1) or non-fatal (0).
> 
> Fixes: 6ac07883dbb5 ("cxl/pci: Add RCH downstream port error logging")
> Cc: stable@vger.kernel.org
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> 
One trivial thing below, otherwise LGTM

Either way

Reviewed-by: Jonathan Cameron <jonathan.cameron@oss.qualcomm.com>

> ---
> 
> Changes in v17 -> v18:
> - New patch.
> ---
>  drivers/cxl/core/ras_rch.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c
> index 0a8b3b9b63884..44b335d560708 100644
> --- a/drivers/cxl/core/ras_rch.c
> +++ b/drivers/cxl/core/ras_rch.c
> @@ -80,7 +80,8 @@ static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
>  				     int *severity)
>  {
>  	if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
> -		if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
> +		if ((aer_regs->uncor_status & ~aer_regs->uncor_mask) &

This bit looks familiar (see line above!)  Worth a local variable maybe?


> +		    aer_regs->uncor_severity)
>  			*severity = AER_FATAL;
>  		else
>  			*severity = AER_NONFATAL;
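
The local-variable suggestion above could look roughly like the following userspace sketch. It assumes a trimmed stand-in for struct aer_capability_regs holding only the three fields used here, and hypothetical SKETCH_AER_* values in place of the kernel's severity constants. It covers only the uncorrectable branch shown in the hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's AER severity values. */
enum { SKETCH_AER_NONFATAL, SKETCH_AER_FATAL };

/* Trimmed stand-in for struct aer_capability_regs: only the fields used. */
struct sketch_aer_regs {
	uint32_t uncor_status;
	uint32_t uncor_mask;
	uint32_t uncor_severity;
};

/*
 * Classify unmasked uncorrectable errors. Returns false when no unmasked
 * uncorrectable error is pending, leaving *severity untouched.
 */
static bool sketch_get_uncor_severity(const struct sketch_aer_regs *regs,
				      int *severity)
{
	/* Compute the unmasked status once and reuse it for both tests. */
	uint32_t unmasked = regs->uncor_status & ~regs->uncor_mask;

	if (!unmasked)
		return false;

	/* A set severity bit marks the corresponding error as fatal. */
	*severity = (unmasked & regs->uncor_severity) ?
		SKETCH_AER_FATAL : SKETCH_AER_NONFATAL;
	return true;
}
```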



* Re: [PATCH] Documentation: wmi: lenovo-wmi-other: Document intentional BIOS misspelling
From: Yahya Toubali @ 2026-07-20 21:06 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Yahya Toubali, Weijie Yuan, mpearson-lenovo, linux-doc,
	platform-driver-x86, linux-kernel
In-Reply-To: <87jyqp4yf4.fsf@trenco.lwn.net>

Hi Jon,

Thank you for the link. I have read the policy carefully.

To be completely honest: I used an LLM to identify potential typos and 
whitespace issues across the documentation files, as well as to draft 
the initial commit messages and email syntax. 

I see now why sending automated, low-value patches creates 
unnecessary churn for maintainers. I take full responsibility for 
those submissions. 

Going forward, I will not rely on LLM output for patch generation. Any 
future contributions from me will consist strictly of manual technical 
work and substantive fixes that I thoroughly understand and verify 
myself.

Appreciate your patience and guidance,
Yahya


* Re: [RFC] cxl: Device protocol AER injection
From: Jonathan Cameron @ 2026-07-20 21:04 UTC (permalink / raw)
  To: Terry Bowman, Ashok Raj
  Cc: Bjorn Helgaas, Dan Williams, Dave Jiang, Ira Weiny, Len Brown,
	Rafael J . Wysocki, Robert Richter, linux-acpi, linux-cxl,
	linux-doc, linux-kernel, linux-pci, linuxppc-dev,
	Alejandro Lucero, Alison Schofield, Ankit Agrawal, Ard Biesheuvel,
	Ben Cheatham, Borislav Petkov, Breno Leitao, Davidlohr Bueso,
	Fabio M . De Francesco, Gregory Price, Hanjun Guo,
	Jonathan Corbet, Kees Cook, Kuppuswamy Sathyanarayanan, Li Ming,
	Mahesh J Salgaonkar, Mauro Carvalho Chehab, Oliver O'Halloran,
	Shiju Jose, Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck,
	Vishal Verma
In-Reply-To: <20260717225700.3543801-1-terry.bowman@amd.com>

On Fri, 17 Jul 2026 17:57:00 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> This patch is intended to provide a method of testing the recently submitted
> cxl series "cxl: Enable CXL PCIe Port Protocol Error handling and logging" found
> here:
> 
> https://lore.kernel.org/linux-cxl/20260717222706.3540281-1-terry.bowman@amd.com/T/#md90ec1fdd1b374bf1e32e7736e2b3e34b328c701

Hi Terry,

https://lore.kernel.org/linux-cxl/20260717222706.3540281-1-terry.bowman@amd.com

Works fine.  Generally you can crop that end bit off the links.

> 
> The changes in this patch will allow CXL RAS protocol testing by injecting
> AER errors using AER EINJ. The RAS register block status is updated
> using a central function to augment RAS register block returned by
> to_ras_base(). This supports all CXL devices including Root Ports,
> Upstream Switch Ports, Downstream Switch Ports, Endpoints, and RCH
> Downstream Ports.

Why is this an RFC rather than a final proposal?  There should always
be something to give the reviewer that info in the patch description.
I'd actually be tempted to throw a cover letter in to have somewhere
out of the way to put that information.

Is it simply because it only makes sense once the other series lands?

> 
> Add debugfs-based CXL protocol error injection for testing CXL RAS
> error handling paths. Injects CXL RAS protocol errors using AER internal
> error inject interface via /sys/kernel/debug/cxl/aer_einj_inject.
> 
> RAS CXL status is set using to_ras_base() function override when kernel config
> CONFIG_CXL_PROTO_AER_EINJ is enabled.
> 
> Usage:
>   echo "DDDD:BB:DD.F [UCE|CE] AER_STATUS RAS_STATUS [RCH]" > \
>       /sys/kernel/debug/cxl/aer_einj_inject
> 
> Move struct aer_error_inj and aer_inject() to linux/aer.h so CXL
> can invoke AER injection directly. Export aer_inject() with
> EXPORT_SYMBOL_GPL.
> 
> Make cxl_debugfs non-static in port.c and declare it extern in
> core.h so the debugfs file can be created under the existing CXL
> debugfs root.
> 
> Co-developed-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
+CC Ashok,

Various things inline.  Mostly this review is a bit superficial as I'd ideally
like to see a cleaner separation of this at the level of files etc.

Thanks,

Jonathan
> ---
>  drivers/cxl/Kconfig           |  13 +++
>  drivers/cxl/core/core.h       |  21 ++++
>  drivers/cxl/core/port.c       |   2 +-
>  drivers/cxl/core/ras.c        | 208 ++++++++++++++++++++++++++++++++++
>  drivers/cxl/core/ras_rch.c    |  12 ++
>  drivers/pci/pcie/aer_inject.c |  29 ++---
>  include/linux/aer.h           |  15 +++
>  7 files changed, 281 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 80aeb0d556bd7..ef449228b2549 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -238,6 +238,19 @@ config CXL_RAS
>  	def_bool y
>  	depends on ACPI_APEI_GHES && PCIEAER && CXL_BUS
>  
> +config CXL_PROTO_AER_EINJ
> +	bool "CXL: RAS Protocol Error Injection using AER EINJ"
> +	depends on CXL_RAS
> +	depends on PCIEAER_INJECT

Do we think anyone who has CXL and PCIEAER_INJECT support will want
to carefully not build this?  I'm just wondering if we can avoid asking
the question and decide whether to build it based on the combination of those.

> +	help
> +	  Enable debugfs-based CXL protocol error injection. Writes to
> +	  /sys/kernel/debug/cxl/aer_einj_inject inject CXL RAS protocol
> +	  errors using the AER internal error inject interface.
> +
> +	  This is a debug/test facility. Say N for production kernels.
> +
> +	  If unsure say N.

> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index a55a4e409feda..91910d2bb5d39 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
...

> @@ -244,4 +247,22 @@ int cxl_set_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
>  
>  resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
>  					   struct cxl_dport *dport);
> +
> +#ifdef CONFIG_CXL_PROTO_AER_EINJ

Do we need the ifdefs?  If the option isn't built none of this should get
used.  So small benefit.

> +
> +#define AER_REGISTER_SIZE 5
> +#define RAS_REGISTER_SIZE (CXL_RAS_CAPABILITY_LENGTH / sizeof(u32))
> +
> +struct cxl_aer_einj {
> +	int correctable;
> +	bool is_rch;
> +	struct mutex *lock;
> +	struct device *dev;
> +	u32 aer_registers[AER_REGISTER_SIZE];
> +	u32 ras_registers[RAS_REGISTER_SIZE];
> +};
> +
> +extern struct cxl_aer_einj cxl_aer_einj;
> +#endif

> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index d77208af41e03..d41deea899d30 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -3,6 +3,7 @@
>  
>  #include <linux/pci.h>
>  #include <linux/aer.h>
> +#include <linux/debugfs.h>
>  #include <cxl/event.h>
>  #include <cxlmem.h>
>  #include <cxlpci.h>
> @@ -117,6 +118,195 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
>  }
>  static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>  
> +#if IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ)
> +

Unless there are very strong reasons for it, generally don't do #if stuff in .c files.
Just have a separate .c file for this.  Otherwise it hurts readability and
we tend to lose the clean separation over time.  A file makes that less
likely.


> +static DEFINE_MUTEX(cxl_aer_einj_mutex);

Needs a comment for what data it is protecting.

> +
> +struct cxl_aer_einj cxl_aer_einj = {
> +	.lock = &cxl_aer_einj_mutex,
> +};
> +
> +static const char cxl_aer_einj_usage[] =
> +	"ssss:bb:dd.f [UCE|CE] AER_STATUS RAS_STATUS [RCH]\n";
> +
> +static int cxl_aer_inject_error(struct pci_dev *pdev, bool correctable,
> +				u32 aer_status, u32 ras_status)
> +{
> +	/* RCD errors are signaled as internal errors on the associated RCEC */
> +	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END) {
> +		if (!pdev->rcec)
> +			return -ENODEV;
> +		pdev = pdev->rcec;
> +	}
> +
> +	struct aer_error_inj einj = {
> +		.bus = pdev->bus->number,
> +		.dev = PCI_SLOT(pdev->devfn),
> +		.fn = PCI_FUNC(pdev->devfn),
> +		.domain = pci_domain_nr(pdev->bus),
> +	};
> +	int ret;
> +	int aer_offset;
> +	int ras_offset;
> +
> +	if (correctable) {
> +		einj.cor_status = aer_status | PCI_ERR_COR_INTERNAL;
> +		aer_offset = PCI_ERR_COR_STATUS / sizeof(u32);
> +		ras_offset = CXL_RAS_CORRECTABLE_STATUS_OFFSET / sizeof(u32);

Given these are offsets into cxl_aer_einj.aer_registers / ras_registers,
can we use sizeof(*cxl_aer_einj.aer_registers) etc.?

> +	} else {
> +		einj.uncor_status = aer_status | PCI_ERR_UNC_INTN;
> +		aer_offset = PCI_ERR_UNCOR_STATUS / sizeof(u32);
> +		ras_offset = CXL_RAS_UNCORRECTABLE_STATUS_OFFSET / sizeof(u32);
> +	}
> +
> +	cxl_aer_einj.correctable = correctable;
> +	cxl_aer_einj.aer_registers[aer_offset] = aer_status;
> +	cxl_aer_einj.ras_registers[ras_offset] = ras_status;
> +
> +	ret = aer_inject(&einj);
> +	if (ret) {
> +		pr_err("cxl-einj: aer_inject failed: %d\n", ret);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static ssize_t cxl_aer_einj_write(struct file *file,
> +				    const char __user *ubuf,
> +				    size_t count, loff_t *ppos)

Might as well wrap to 80 chars and save a line.

> +{

...

> +
> +	struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_dev(&pdev->dev, &dport);
> +	if (!port) {
> +		dev_err(&pdev->dev, "cxl-einj: Failed to find CXL Port.\n");
> +		return -ENODEV;
> +	}
> +
> +	if (!to_ras_base(port, dport)) {
> +		dev_err(&pdev->dev, "cxl-einj: RAS not initialized.\n");
> +		return -ENODEV;
> +	}
> +
> +	cxl_aer_einj.is_rch = (nargs == 5 && strcmp(topology, "RCH") == 0);
> +	if (!cxl_aer_einj.is_rch)
> +		pci_dev_get(pdev);
> +	cxl_aer_einj.dev = cxl_aer_einj.is_rch ? pdev->dev.parent : &pdev->dev;

	cxl_aer_einj.dev = cxl_aer_einj.is_rch ? pdev->dev.parent : &pci_dev_get(pdev)->dev;

Or use an if else for both is_rch based choices.
Lazy me wonders.... Can we just grab a reference to the dev.parent for is_rch and
simplify the code?  We don't really need it I think but it is harmless.


> +	ret = cxl_aer_inject_error(pdev, strcmp(severity, "CE") == 0,
> +				   aer_status, ras_status);
> +	if (ret) {
> +		if (!cxl_aer_einj.is_rch)
> +			pci_dev_put(pdev);
> +		cxl_aer_einj.dev = NULL;
> +		pr_err("cxl-einj: injection failed for %s: %d\n", sbdf, ret);
> +		return ret;
> +	}
> +
> +	return count;
> +}
> +
> +static ssize_t cxl_aer_einj_read(struct file *file, char __user *ubuf,
> +				 size_t count, loff_t *ppos)
> +{
> +	return simple_read_from_buffer(ubuf, count, ppos,
> +				       cxl_aer_einj_usage,

Probably wrap this to push that up a line.

> +				       sizeof(cxl_aer_einj_usage) - 1);
> +}

> +
> +static void __iomem *to_einj_ras_base(struct cxl_port *port, struct cxl_dport *dport)
> +{
> +	if (dport) {
> +		if (cxl_aer_einj.is_rch) {
> +			if (cxl_aer_einj.dev == dport->dport_dev) {
> +				cxl_aer_einj.dev = NULL;
> +				return (__force void __iomem *)cxl_aer_einj.ras_registers;

Given the output of this is always force cast, maybe move that force up to the caller?

> +			}
> +		} else {
> +			if (cxl_aer_einj.dev == dport->dport_dev) {
> +				pci_dev_put(to_pci_dev(cxl_aer_einj.dev));

Not locally obvious why a thing called to_einj_ras_base should put anything it didn't
get.  I think this needs a restructure to more obviously be tidying up references
that were held over the queue.  At the very least it needs a comment.
/* Reference held from X no longer needed so drop */

> +				cxl_aer_einj.dev = NULL;
> +				return (__force void __iomem *)cxl_aer_einj.ras_registers;
> +			}
> +		}
> +	} else if (!cxl_aer_einj.is_rch) {
> +		struct device *dev = is_cxl_endpoint(port) ?
> +			port->uport_dev->parent : port->uport_dev;
> +
> +		if (dev_is_pci(dev) && cxl_aer_einj.dev == dev) {
> +			pci_dev_put(to_pci_dev(cxl_aer_einj.dev));
> +			cxl_aer_einj.dev = NULL;
> +			return (__force void __iomem *)cxl_aer_einj.ras_registers;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +#endif
> +
>  static void cxl_unmask_proto_interrupts(struct device *dev)
>  {
>  	struct pci_dev *pdev;
> @@ -238,6 +428,14 @@ void __iomem *to_ras_base(struct cxl_port *port, struct cxl_dport *dport)
>  	if (!port)
>  		return NULL;
>  
> +#if IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ)

Wrap with
	if (IS_ENABLED()) to keep it visible to the compiler.

> +	if (cxl_aer_einj.dev) {
> +		void __iomem *einj = to_einj_ras_base(port, dport);
> +		if (einj)
> +			return einj;
> +	}
> +#endif
> +
>  	if (dport)
>  		return dport->regs.ras;
>  
> @@ -458,10 +656,20 @@ void cxl_ras_init(void)
>  	cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
>  	cxl_register_proto_err_work(&cxl_proto_err_work,
>  				   cxl_proto_err_do_flush);
> +#if IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ)
> +	cxl_ras_create_debugfs(cxl_debugfs);
stub that in a header.

> +#endif
>  }
>  
>  void cxl_ras_exit(void)
>  {
>  	cxl_unregister_proto_err_work();
>  	cxl_cper_unregister_prot_err_work();
> +#if IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ)

As below.  Use 
	if (IS_ENABLED())

and keep everything visible.

> +	if (cxl_aer_einj.dev) {
> +		if (!cxl_aer_einj.is_rch)
> +			pci_dev_put(to_pci_dev(cxl_aer_einj.dev));
> +		cxl_aer_einj.dev = NULL;
> +	}
> +#endif
>  }
> diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c
> index 14bb3bdb2d092..5071cf86e4a68 100644
> --- a/drivers/cxl/core/ras_rch.c
> +++ b/drivers/cxl/core/ras_rch.c
> @@ -110,6 +110,14 @@ void cxl_handle_rdport_errors(struct pci_dev *pdev)
>  	if (!dport)
>  		return;
>  
> +#if IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ)
> +	if (cxl_aer_einj.is_rch && cxl_aer_einj.dev) {
> +		severity = cxl_aer_einj.correctable ?
> +			AER_CORRECTABLE : AER_FATAL;
> +		goto handle_ras;
> +	}

Use instead
	if (IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ))

then compiler can see the code (but remove it) which means
you don't need the dance around the label below.

In general follow this pattern anyway rather than #if 
when you can.
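
As a rough illustration of that pattern outside the kernel, the sketch below uses a hypothetical SKETCH_EINJ_ENABLED constant in place of IS_ENABLED(CONFIG_...) and made-up sketch_* types. It is not the proposed patch, only the shape: the injection branch is ordinary C that is always compiled and type-checked, and the label and goto are not needed.

```c
#include <stdbool.h>

/* Hypothetical compile-time switch standing in for IS_ENABLED(CONFIG_...). */
#ifndef SKETCH_EINJ_ENABLED
#define SKETCH_EINJ_ENABLED 0
#endif

enum { SKETCH_CORRECTABLE, SKETCH_NONFATAL, SKETCH_FATAL };

/* Hypothetical injection state; always declared so the code always compiles. */
struct sketch_einj {
	bool armed;
	bool correctable;
};

/* Returns the severity to handle, or -1 when there is nothing to do. */
static int sketch_pick_severity(const struct sketch_einj *einj,
				bool hw_valid, int hw_severity)
{
	/*
	 * The constant condition lets the compiler type-check this branch in
	 * every configuration and discard it when the switch is 0.
	 */
	if (SKETCH_EINJ_ENABLED && einj->armed)
		return einj->correctable ? SKETCH_CORRECTABLE : SKETCH_FATAL;

	/* Normal path: use the severity read from hardware, if any. */
	if (!hw_valid)
		return -1;

	return hw_severity;
}
```

With the switch at 0 the injected state is ignored even when armed, which mirrors what the preprocessor version does, but breakage in the disabled branch shows up in every build.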

> +#endif
> +
>  	if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
>  		return;
>  
> @@ -117,6 +125,10 @@ void cxl_handle_rdport_errors(struct pci_dev *pdev)
>  		return;
>  
>  	pci_print_aer(pdev, severity, &aer_regs);
> +
> +#if IS_ENABLED(CONFIG_CXL_PROTO_AER_EINJ)
> +handle_ras:
> +#endif
>  	if (severity == AER_CORRECTABLE)
>  		cxl_handle_cor_ras(dport->port, dport,
>  				   to_ras_base(port, dport), pdev->dsn);
> diff --git a/drivers/pci/pcie/aer_inject.c b/drivers/pci/pcie/aer_inject.c
> index 09bfc7194ef31..b313adef680ae 100644
> --- a/drivers/pci/pcie/aer_inject.c
> +++ b/drivers/pci/pcie/aer_inject.c
> @@ -14,6 +14,7 @@
>  
>  #define dev_fmt(fmt) "aer_inject: " fmt
>  
> +#include <linux/aer.h>
>  #include <linux/module.h>
>  #include <linux/init.h>
>  #include <linux/interrupt.h>
> @@ -31,19 +32,6 @@
>  static bool aer_mask_override;
>  module_param(aer_mask_override, bool, 0);
>  
> -struct aer_error_inj {
> -	u8 bus;
> -	u8 dev;
> -	u8 fn;
> -	u32 uncor_status;
> -	u32 cor_status;
> -	u32 header_log0;
> -	u32 header_log1;
> -	u32 header_log2;
> -	u32 header_log3;
> -	u32 domain;
> -};
> -
>  struct aer_error {
>  	struct list_head list;
>  	u32 domain;
> @@ -316,7 +304,7 @@ static int pci_bus_set_aer_ops(struct pci_bus *bus)
>  	return 0;
>  }
>  
> -static int aer_inject(struct aer_error_inj *einj)
> +int aer_inject(struct aer_error_inj *einj)
>  {
>  	struct aer_error *err, *rperr;
>  	struct aer_error *err_alloc = NULL, *rperr_alloc = NULL;
> @@ -332,10 +320,14 @@ static int aer_inject(struct aer_error_inj *einj)
>  	dev = pci_get_domain_bus_and_slot(einj->domain, einj->bus, devfn);
>  	if (!dev)
>  		return -ENODEV;
> -	rpdev = pcie_find_root_port(dev);
> -	/* If Root Port not found, try to find an RCEC */
> -	if (!rpdev)
> -		rpdev = dev->rcec;
> +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) 

{ }
as the else is multiline (see coding standard)

Maybe this needs a comment explaining why it might be an RCEC for injection.
Is this an RCH specific path where there is nothing else to target?

> +		rpdev = dev;
> +	else {
> +		rpdev = pcie_find_root_port(dev);
> +		/* If Root Port not found, try to find an RCEC */
> +		if (!rpdev)
> +			rpdev = dev->rcec;
> +	}
>  	if (!rpdev) {
>  		pci_err(dev, "Neither Root Port nor RCEC found\n");
>  		ret = -ENODEV;
> @@ -482,6 +474,7 @@ static int aer_inject(struct aer_error_inj *einj)
>  	pci_dev_put(dev);
>  	return ret;
>  }
> +EXPORT_SYMBOL_GPL(aer_inject);
I wonder if we want to restrict this to specific modules?

One for Bjorn probably.

>  
>  static ssize_t aer_inject_write(struct file *filp, const char __user *ubuf,
>  				size_t usize, loff_t *off)
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index b3657b80564b9..65c22ba597657 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -27,6 +27,21 @@
>  struct pci_dev;
>  struct work_struct;
>  
> +struct aer_error_inj {
> +	u8 bus;
> +	u8 dev;
> +	u8 fn;
> +	u32 uncor_status;
> +	u32 cor_status;
> +	u32 header_log0;
> +	u32 header_log1;
> +	u32 header_log2;
> +	u32 header_log3;
> +	u32 domain;
> +};
> +
> +int aer_inject(struct aer_error_inj *einj);
> +
>  struct pcie_tlp_log {
>  	union {
>  		u32 dw[PCIE_STD_MAX_TLP_HEADERLOG];



* Re: [PATCH v18 06/13] PCI: Establish common CXL Port protocol error flow
From: Dave Jiang @ 2026-07-20 20:44 UTC (permalink / raw)
  To: Terry Bowman, Bjorn Helgaas, Dan Williams, Ira Weiny,
	Jonathan Cameron, Len Brown, Rafael J . Wysocki, Robert Richter
  Cc: linux-acpi, linux-cxl, linux-doc, linux-kernel, linux-pci,
	linuxppc-dev, Alejandro Lucero, Alison Schofield, Ankit Agrawal,
	Ard Biesheuvel, Ben Cheatham, Borislav Petkov, Breno Leitao,
	Davidlohr Bueso, Fabio M . De Francesco, Gregory Price,
	Hanjun Guo, Jonathan Corbet, Kees Cook,
	Kuppuswamy Sathyanarayanan, Li Ming, Mahesh J Salgaonkar,
	Mauro Carvalho Chehab, Oliver O'Halloran, Shiju Jose,
	Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck, Vishal Verma
In-Reply-To: <20260717222706.3540281-7-terry.bowman@amd.com>



On 7/17/26 3:26 PM, Terry Bowman wrote:
> Add CXL protocol error dispatch in handle_error_source() using
> is_cxl_error() and cxl_forward_error() to route errors through the
> AER-CXL kfifo. Expand is_cxl_error() from Endpoint-only to include
> Root Port, Upstream Port, and Downstream Port device types. The
> producer and consumer go live in the same commit to avoid silently
> dropping CXL errors during bisect.
> 
> For uncorrectable events, call cxl_proto_err_flush() to ensure CXL RAS
> registers are read, panic policy is applied, and CXL state is cleared
> before pci_aer_handle_error() drives PCIe recovery. Without the flush,
> AER recovery can tear down drivers and unmap the CXL RAS iomaps while
> the kfifo consumer is still reading them. Correctable events do not
> need the flush and run asynchronously. RCH kfifo support is added in
> the following patch ("PCI/CXL: Add RCH support to CXL handlers").
> 
> Add cxl_handle_proto_error() to dispatch correctable and uncorrectable
> errors through the CXL RAS helpers. Add cxl_do_recovery() to coordinate
> uncorrectable recovery. Panic when a UCE is confirmed by a successful
> CXL RAS status register read. If the RAS registers cannot be read the
> UCE cannot be confirmed and panic is not triggered. Gate error handling
> on the port driver being bound to avoid processing errors on disabled
> devices.
> 
> The kfifo consumer holds guard(device)(&port->dev) and checks
> port->dev.driver before accessing RAS registers, serializing against
> driver unbind and devm iomap teardown. For UCE, cxl_proto_err_flush()
> runs the worker synchronously before AER recovery, ensuring the device
> is present during RAS register access.
> 
> Add to_ras_base() to centralize RAS base lookup: dport->regs.ras for
> Root/Downstream Ports, port->regs.ras for Upstream Ports and Endpoints.
> Use to_ras_base() to access the CXL devices' RAS registers as it will
> provide an avenue to inject status simulation during testing.
> 
> Add CXL RAS logging in cxl_handle_cor_ras() and cxl_handle_ras(). The
> existing cxl_cor_error_detected() and cxl_error_detected() AER
> callbacks remain for all Endpoints and are reworked to use
> find_cxl_port_by_uport() and to_ras_base(), with UCE now triggering
> panic unconditionally. These callbacks are further updated in the
> following patch ("PCI/CXL: Add RCH support to CXL handlers").
> 
> Fix a pre-existing race for cxlds between cxl_handle_rdport_errors() and
> cxl_memdev_shutdown() by holding a cxlmd device scoped_guard() around
> the rdport call. Release the lock before taking the Port lock to avoid
> the lock inversion.
> 
> Co-developed-by: Dan Williams <djbw@kernel.org>
> Signed-off-by: Dan Williams <djbw@kernel.org>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
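
To restate the ordering described in the commit message, here is a small
userspace sketch. plan_steps() and the step enum are hypothetical names
made up for illustration, and the "queued" argument models the return
value of cxl_forward_error(). The severity values are intended to mirror
the kernel's AER_* constants:

```c
#include <stdbool.h>
#include <stddef.h>

/* Severity values intended to mirror the kernel's AER_* constants. */
enum { AER_NONFATAL = 0, AER_FATAL = 1, AER_CORRECTABLE = 2 };

/* Hypothetical labels for the actions taken per error source. */
enum step { STEP_FORWARD, STEP_FLUSH, STEP_AER_RECOVERY };

/*
 * Record, in order, what the AER path does for one error source.
 * Returns the number of steps written to @out.
 */
static size_t plan_steps(bool is_cxl, bool queued, int severity,
			 enum step out[3])
{
	bool cxl_pending = false;
	size_t n = 0;

	if (is_cxl) {
		out[n++] = STEP_FORWARD;
		cxl_pending = queued;
	}

	/* Only uncorrectable CXL work is drained before recovery. */
	if (cxl_pending && severity != AER_CORRECTABLE)
		out[n++] = STEP_FLUSH;

	/* AER handling always runs, whether or not CXL work was queued. */
	out[n++] = STEP_AER_RECOVERY;

	return n;
}
```

For a queued fatal or non-fatal CXL error this yields forward, flush,
recovery. For a correctable one the flush step is skipped and the worker
runs asynchronously.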

> 
> ---
> 
> Changes in v17->v18:
> - Fix pre-existing race: hold memdev device lock around
>   cxl_handle_rdport_errors(), release before port lock
> - Fix handle_error_source() to call pci_aer_handle_error() unconditionally
>   so AER handling always runs after cxl_forward_error()
> - Add cxl_proto_err_flush() call for CXL UCE to drain kfifo before AER
>   recovery tears down the device
> - Fix NULL dereference of dport->dport_dev in cxl_handle_cor_ras() and
>   cxl_handle_ras() for UPSTREAM/ENDPOINT port types: use dport->dport_dev
>   when dport is non-NULL, else fall back to port->uport_dev
> - Remove duplicate pcie_clear_device_status() call from
>   cxl_handle_proto_error() CE path; pci_aer_handle_error() already clears it
> - Clarify panic policy: panic only on confirmed UCE via RAS status read
> - Document kfifo consumer serialization against driver unbind via
>   guard(device)(&port->dev) and port->dev.driver check
> 
> Changes in v16->v17:
> - get_cxl_port() -> find_cxl_port_by_dev()
> - Simplified find_cxl_port_by_dev()
> - Replace and remove cxl_serial_number() w/ pci_get_dsn()
> - cxl_get_ras_base() -> to_ras_base()
> - Drop dependency on PCI_ERS_RESULT_PANIC; cxl_do_recovery() panics
>   directly. (PANIC enum patch dropped from series.)
> - Clarify panic semantics: panic on any uncorrectable CXL RAS error, not
>   only AER-FATAL severities.
> - Add is_cxl_error() switch in handle_error_source() here, paired with the
>   kfifo consumer registration, to keep each commit bisect-safe.
> - Drop pcie_aer_is_native() guard in cxl_do_recovery() (always native).
> - Swap order with the "Limit" patch for bisectability w/ cxl_ras_exit()
> - Reword for "any uncorrectable" CXL RAS error panics.
> - Restore log messages for port-not-found and port-unbound cases.
> - Whitespace cleanup (Jonathan)
> - Update to get_cxl_port() documentation (Terry)
> - Fix __cxl_proto_err_work_fn() to return 0 for transient errors.
> - Drop !port check in cxl_do_recovery(), caller already validated
> - Fix kerneldoc @pdev -> @dev in find_cxl_port_by_dev()
> - Fix missing space in pr_err_ratelimited()
> - Made pcie_clear_device_status() and pci_aer_clear_fatal_status()
>   EXPORT_SYMBOL_FOR_MODULES("cxl_core") (Dan)
> - Move find_cxl_port_by_dport() and find_cxl_port_by_uport()
>   de-staticisation and core.h declarations from the rename patch to
>   here, where the first cross-file callers in find_cxl_port_by_dev()
>   land.
> 
> Changes in v15->v16:
> - get_ras_base(), initialize dport to NULL (Jonathan)
> - Remove guard(device)(&cxlmd->dev) (Jonathan)
> - Fix dev_warns() (Jonathan)
> - Remove comment in cxl_port_error_detected() (Dan)
> - Update switch-case brackets to follow clang-format (Dan)
> - Add PCI_EXP_TYPE_RC_END for cxl_get_ras_base() (Terry)
> - Add NULL port check in cxl_serial_number() (Terry)
> 
> Changes in v14->v15:
> - Update commit message and title. Added Bjorn's ack.
> - Move CE and UCE handling logic here
> 
> Changes in v13->v14:
> - Add Dave Jiang's review-by
> - Update commit message & headline (Bjorn)
> - Refactor cxl_port_error_detected()/cxl_port_cor_error_detected() to
>   one line (Jonathan)
> - Remove cxl_walk_port() (Dan)
> - Remove cxl_pci_drv_bound(). Check for 'is_cxl' parent port is
>   sufficient (Dan)
> - Remove device_lock_if()
> - Combined CE and UCE here (Terry)
> 
> Changes in v12->v13:
> - Move get_pci_cxl_host_dev() and cxl_handle_proto_error() to Dequeue
>   patch (Terry)
> - Remove EP case in cxl_get_ras_base(), not used. (Terry)
> - Remove check for dport->dport_dev (Dave)
> - Remove whitespace (Terry)
> 
> Changes in v11->v12:
> - Add call to cxl_pci_drv_bound() in cxl_handle_proto_error() and
>   pci_to_cxl_dev()
> - Change cxl_error_detected() -> cxl_cor_error_detected()
> - Remove NULL variable assignments
> - Replace bus_find_device() with find_cxl_port_by_uport() for upstream
>   port searches.
> 
> Changes in v10->v11:
> - None
> ---
>  drivers/cxl/core/core.h       |   7 ++
>  drivers/cxl/core/port.c       |   6 +-
>  drivers/cxl/core/ras.c        | 223 +++++++++++++++++++++++++++-------
>  drivers/pci/pci.h             |   1 -
>  drivers/pci/pcie/aer.c        |  13 ++
>  drivers/pci/pcie/aer_cxl_vh.c |  16 ++-
>  6 files changed, 220 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 23fe40ddf4c6b..7c70bea06c2db 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -186,6 +186,8 @@ static inline struct device *dport_to_host(struct cxl_dport *dport)
>  void cxl_ras_init(void);
>  void cxl_ras_exit(void);
>  bool cxl_handle_ras(struct device *dev, void __iomem *ras_base);
> +void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port,
> +		     struct cxl_dport *dport);
>  void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base);
>  void cxl_dport_map_rch_aer(struct cxl_dport *dport);
>  void cxl_disable_rch_root_ints(struct cxl_dport *dport);
> @@ -198,6 +200,8 @@ static inline bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
>  {
>  	return false;
>  }
> +static inline void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port,
> +				   struct cxl_dport *dport) { }
>  static inline void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base) { }
>  static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
>  static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
> @@ -206,6 +210,9 @@ static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport) { }
>  #endif /* CONFIG_CXL_RAS */
>  
>  int cxl_gpf_port_setup(struct cxl_dport *dport);
> +struct cxl_port *find_cxl_port_by_dport(struct device *dport_dev,
> +					struct cxl_dport **dport);
> +struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev);
>  
>  struct cxl_hdm;
>  int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index cadb51f70f854..a76f3ee05cba8 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1400,8 +1400,8 @@ static struct cxl_port *__find_cxl_port_by_dport(struct cxl_find_port_ctx *ctx)
>   * Return a 'struct cxl_port' with an elevated reference if found. Use
>   * __free(put_cxl_port) to release.
>   */
> -static struct cxl_port *find_cxl_port_by_dport(struct device *dport_dev,
> -					       struct cxl_dport **dport)
> +struct cxl_port *find_cxl_port_by_dport(struct device *dport_dev,
> +					struct cxl_dport **dport)
>  {
>  	struct cxl_find_port_ctx ctx = {
>  		.dport_dev = dport_dev,
> @@ -1596,7 +1596,7 @@ static int match_port_by_uport(struct device *dev, const void *data)
>   * Function takes a device reference on the port device. Caller should do a
>   * put_device() when done.
>   */
> -static struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev)
> +struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev)
>  {
>  	struct device *dev;
>  
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 135f1997e6f4f..b190e69c2d415 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -77,6 +77,36 @@ static int match_memdev_by_parent(struct device *dev, const void *uport)
>  	return 0;
>  }
>  
> +
> +/**
> + * find_cxl_port_by_dev - Use @dev as hint to do a _by_dport or _by_uport lookup
> + * @dev: generic device that may either be a companion of port or target dport
> + * @dport: output parameter; set to the matched dport for dport-class
> + * lookups (Root Port, Downstream Port), NULL otherwise.
> + *
> + * Return a 'struct cxl_port' with an elevated reference if found. Use
> + * __free(put_cxl_port) to release.
> + */
> +static struct cxl_port *find_cxl_port_by_dev(struct device *dev, struct cxl_dport **dport)
> +{
> +	if (dport)
> +		*dport = NULL;
> +	if (!dev_is_pci(dev))
> +		return NULL;
> +
> +	switch (pci_pcie_type(to_pci_dev(dev))) {
> +	case PCI_EXP_TYPE_ROOT_PORT:
> +	case PCI_EXP_TYPE_DOWNSTREAM:
> +		return find_cxl_port_by_dport(dev, dport);
> +	case PCI_EXP_TYPE_UPSTREAM:
> +	case PCI_EXP_TYPE_ENDPOINT:
> +	case PCI_EXP_TYPE_RC_END:
> +		return find_cxl_port_by_uport(dev);
> +	}
> +
> +	return NULL;
> +}
> +
>  void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
>  {
>  	unsigned int devfn = PCI_DEVFN(data->prot_err.agent_addr.device,
> @@ -132,16 +162,6 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
>  }
>  static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>  
> -void cxl_ras_init(void)
> -{
> -	cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
> -}
> -
> -void cxl_ras_exit(void)
> -{
> -	cxl_cper_unregister_prot_err_work();
> -}
> -
>  static void cxl_dport_map_ras(struct cxl_dport *dport)
>  {
>  	struct cxl_register_map *map = &dport->reg_map;
> @@ -198,10 +218,39 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
>  }
>  EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
>  
> +static void __iomem *to_ras_base(struct cxl_port *port, struct cxl_dport *dport)
> +{
> +	if (!port)
> +		return NULL;
> +
> +	if (dport)
> +		return dport->regs.ras;
> +
> +	return port->regs.ras;
> +}
> +
> +void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port, struct cxl_dport *dport)
> +{
> +	struct device *dev = dport ? dport->dport_dev : port->uport_dev;
> +	void __iomem *ras_base = to_ras_base(port, dport);
> +
> +	if (!ras_base) {
> +		dev_err(&pdev->dev,
> +			"CXL UCE signaled but RAS registers not mapped\n");
> +		return;
> +	}
> +
> +	if (cxl_handle_ras(dev, ras_base))
> +		panic("CXL cachemem error");
> +
> +	dev_dbg(&pdev->dev,
> +		"CXL UCE signaled but no CXL RAS status bits set\n");
> +}
> +
>  void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
>  {
> -	void __iomem *addr;
>  	u32 status;
> +	void __iomem *addr;
>  
>  	if (!ras_base)
>  		return;
> @@ -210,7 +259,10 @@ void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
>  	status = readl(addr);
>  	if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
>  		writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> -		trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);
> +		if (is_cxl_memdev(dev))
> +			trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);
> +		else
> +			trace_cxl_port_aer_correctable_error(dev, status);
>  	}
>  }
>  
> @@ -262,7 +314,11 @@ bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
>  	}
>  
>  	header_log_copy(ras_base, hl);
> -	trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);
> +	if (is_cxl_memdev(dev))
> +		trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);
> +	else
> +		trace_cxl_port_aer_uncorrectable_error(dev, status, fe, hl);
> +
>  	writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
>  
>  	return true;
> @@ -270,22 +326,32 @@ bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
>  
>  void cxl_cor_error_detected(struct pci_dev *pdev)
>  {
> -	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> -	struct cxl_memdev *cxlmd = cxlds->cxlmd;
> -	struct device *dev = &cxlds->cxlmd->dev;
> +	guard(device)(&pdev->dev);
> +	if (!pdev->dev.driver)
> +		return;
> +
> +	struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_uport(&pdev->dev);
> +	if (!port)
> +		return;
> +
> +	if (is_cxl_restricted(pdev)) {
> +		struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +		struct cxl_memdev *cxlmd = cxlds->cxlmd;
>  
> -	scoped_guard(device, dev) {
> -		if (!dev->driver) {
> +		scoped_guard(device, &cxlmd->dev) {
> +			cxl_handle_rdport_errors(cxlds);
> +		}
> +	}
> +
> +	scoped_guard(device, &port->dev) {
> +		if (!port->dev.driver) {
>  			dev_warn(&pdev->dev,
> -				 "%s: memdev disabled, abort error handling\n",
> -				 dev_name(dev));
> +				 "%s: port disabled, abort error handling\n",
> +				 dev_name(&port->dev));
>  			return;
>  		}
>  
> -		if (cxlds->rcd)
> -			cxl_handle_rdport_errors(cxlds);
> -
> -		cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlmd->endpoint->regs.ras);
> +		cxl_handle_cor_ras(port->uport_dev, to_ras_base(port, NULL));
>  	}
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
> @@ -293,42 +359,53 @@ EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
>  pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>  				    pci_channel_state_t state)
>  {
> -	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> -	struct cxl_memdev *cxlmd = cxlds->cxlmd;
> -	struct device *dev = &cxlmd->dev;
> -	bool ue;
> +	struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_uport(&pdev->dev);
> +	bool ue = false;
>  
> -	scoped_guard(device, dev) {
> -		if (!dev->driver) {
> +	if (!port)
> +		return PCI_ERS_RESULT_DISCONNECT;
> +
> +	if (is_cxl_restricted(pdev)) {
> +		struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +		struct cxl_memdev *cxlmd = cxlds->cxlmd;
> +
> +		scoped_guard(device, &cxlmd->dev) {
> +			cxl_handle_rdport_errors(cxlds);
> +		}
> +	}
> +
> +	scoped_guard(device, &port->dev) {
> +		if (!port->dev.driver) {
>  			dev_warn(&pdev->dev,
> -				 "%s: memdev disabled, abort error handling\n",
> -				 dev_name(dev));
> +				 "%s: port disabled, abort error handling\n",
> +				 dev_name(&port->dev));
>  			return PCI_ERS_RESULT_DISCONNECT;
>  		}
>  
> -		if (cxlds->rcd)
> -			cxl_handle_rdport_errors(cxlds);
>  		/*
>  		 * A frozen channel indicates an impending reset which is fatal to
>  		 * CXL.mem operation, and will likely crash the system. On the off
>  		 * chance the situation is recoverable dump the status of the RAS
>  		 * capability registers and bounce the active state of the memdev.
>  		 */
> -		ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlmd->endpoint->regs.ras);
> +		ue = cxl_handle_ras(port->uport_dev, to_ras_base(port, NULL));
>  	}
>  
> +	/*
> +	 * CXL.mem UCE means cache coherency is lost. Continuing risks
> +	 * silent data corruption across interleaved HDM regions.
> +	 */
> +	if (ue)
> +		panic("CXL cachemem error");
> +
>  	switch (state) {
>  	case pci_channel_io_normal:
> -		if (ue) {
> -			device_release_driver(dev);
> -			return PCI_ERS_RESULT_NEED_RESET;
> -		}
>  		return PCI_ERS_RESULT_CAN_RECOVER;
>  	case pci_channel_io_frozen:
>  		dev_warn(&pdev->dev,
>  			 "%s: frozen state error detected, disable CXL.mem\n",
> -			 dev_name(dev));
> -		device_release_driver(dev);
> +			 dev_name(port->uport_dev));
> +		device_release_driver(port->uport_dev);
>  		return PCI_ERS_RESULT_NEED_RESET;
>  	case pci_channel_io_perm_failure:
>  		dev_warn(&pdev->dev,
> @@ -338,3 +415,67 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>  	return PCI_ERS_RESULT_NEED_RESET;
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
> +
> +static void cxl_handle_proto_error(struct pci_dev *pdev, struct cxl_port *port,
> +				   struct cxl_dport *dport, int severity)
> +{
> +	struct device *dev = dport ? dport->dport_dev : port->uport_dev;
> +
> +	if (severity == AER_CORRECTABLE)
> +		cxl_handle_cor_ras(dev, to_ras_base(port, dport));
> +	else
> +		cxl_do_recovery(pdev, port, dport);
> +}
> +
> +static void __cxl_proto_err_work_fn(struct cxl_proto_err_work_data *wd)
> +{
> +	struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_dev(&wd->pdev->dev, NULL);
> +	if (!port) {
> +		dev_err_ratelimited(&wd->pdev->dev,
> +				    "Failed to find parent port device in CXL topology\n");
> +		return;
> +	}
> +	guard(device)(&port->dev);
> +	if (!port->dev.driver) {
> +		dev_err_ratelimited(&port->dev,
> +				    "Port device is unbound, abort error handling\n");
> +		return;
> +	}
> +
> +	struct cxl_dport *dport = cxl_find_dport_by_dev(port, &wd->pdev->dev);
> +	if (!dport && (pci_pcie_type(wd->pdev) == PCI_EXP_TYPE_ROOT_PORT ||
> +		       pci_pcie_type(wd->pdev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> +		dev_err_ratelimited(&wd->pdev->dev,
> +				    "Failed to find dport device in CXL topology\n");
> +		return;
> +	}
> +
> +	cxl_handle_proto_error(wd->pdev, port, dport, wd->severity);
> +}
> +
> +static void cxl_proto_err_work_fn(struct work_struct *work)
> +{
> +	struct cxl_proto_err_work_data wd;
> +
> +	for_each_cxl_proto_err(&wd, __cxl_proto_err_work_fn);
> +}
> +
> +static DECLARE_WORK(cxl_proto_err_work, cxl_proto_err_work_fn);
> +
> +static void cxl_proto_err_do_flush(void)
> +{
> +	flush_work(&cxl_proto_err_work);
> +}
> +
> +void cxl_ras_init(void)
> +{
> +	cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
> +	cxl_register_proto_err_work(&cxl_proto_err_work,
> +				   cxl_proto_err_do_flush);
> +}
> +
> +void cxl_ras_exit(void)
> +{
> +	cxl_unregister_proto_err_work();
> +	cxl_cper_unregister_prot_err_work();
> +}
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 4469e1a77f3c1..a83e2aef75912 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -1296,7 +1296,6 @@ void pci_restore_aer_state(struct pci_dev *dev);
>  static inline void pci_no_aer(void) { }
>  static inline void pci_aer_init(struct pci_dev *d) { }
>  static inline void pci_aer_exit(struct pci_dev *d) { }
> -static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
>  static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
>  static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
>  static inline void pci_save_aer_state(struct pci_dev *dev) { }
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index c5bce25df51cb..2d9d40528e709 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1185,7 +1185,20 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>  
>  static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
>  {
> +	bool cxl_pending = false;
> +
>  	cxl_rch_handle_error(dev, info);
> +
> +	if (is_cxl_error(dev, info))
> +		cxl_pending |= cxl_forward_error(dev, info);
> +
> +	/*
> +	 * Wait for UCE CXL work to complete before AER recovery
> +	 * tears down the device. CE can run asynchronously.
> +	 */
> +	if (cxl_pending && info->severity != AER_CORRECTABLE)
> +		cxl_proto_err_flush();
> +
>  	pci_aer_handle_error(dev, info);
>  	pci_dev_put(dev);
>  }
> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
> index 93bed07936100..ecf47bba0c9d4 100644
> --- a/drivers/pci/pcie/aer_cxl_vh.c
> +++ b/drivers/pci/pcie/aer_cxl_vh.c
> @@ -49,8 +49,22 @@ bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
>  	if (!info || !info->is_cxl)
>  		return false;
>  
> -	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
> +	/*
> +	 * RC_END (Restricted CXL Device) is not included here because RC_END
> +	 * reports errors on behalf of upstream RCH Downstream Port and thus
> +	 * requires a unique discovery detailed in CXL4.0 spec (12.2.1.1).
> +	 * The RCH device error discovery and RC_END forwarding flow begins
> +	 * in cxl_rch_handle_error().
> +	 */
> +	switch (pci_pcie_type(pdev)) {
> +	case PCI_EXP_TYPE_ENDPOINT:
> +	case PCI_EXP_TYPE_ROOT_PORT:
> +	case PCI_EXP_TYPE_UPSTREAM:
> +	case PCI_EXP_TYPE_DOWNSTREAM:
> +		break;
> +	default:
>  		return false;
> +	}
>  
>  	return is_aer_internal_error(info);
>  }
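
As a reading aid for the panic policy in cxl_do_recovery(), a userspace
sketch follows. The fake_* types, the mask value, and classify_uce() are
assumptions for illustration only. The real code reads the status through
readl() on an iomapped register and uses
CXL_RAS_UNCORRECTABLE_STATUS_MASK:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask; not the real CXL_RAS_UNCORRECTABLE_STATUS_MASK. */
#define FAKE_UNCOR_STATUS_MASK 0xffffu

/* Hypothetical stand-ins: each holds a pointer to a modeled RAS status. */
struct fake_port { const uint32_t *ras; };
struct fake_dport { const uint32_t *ras; };

enum uce_outcome { UCE_NOT_MAPPED, UCE_CONFIRMED_PANIC, UCE_NO_STATUS };

/* dport registers for Root/Downstream Ports, port registers otherwise. */
static const uint32_t *fake_to_ras_base(const struct fake_port *port,
					const struct fake_dport *dport)
{
	if (!port)
		return NULL;
	if (dport)
		return dport->ras;
	return port->ras;
}

/* Panic only when the UCE is confirmed by a readable, non-zero status. */
static enum uce_outcome classify_uce(const struct fake_port *port,
				     const struct fake_dport *dport)
{
	const uint32_t *ras = fake_to_ras_base(port, dport);

	if (!ras)
		return UCE_NOT_MAPPED;		/* logged, no panic */
	if (*ras & FAKE_UNCOR_STATUS_MASK)
		return UCE_CONFIRMED_PANIC;
	return UCE_NO_STATUS;			/* debug message only */
}
```

The three outcomes correspond to the dev_err(), panic(), and dev_dbg()
paths in the patch.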



* Re: [PATCH v18 02/13] acpi/apei/ghes: Use raw_spinlock_t for CXL CPER work locks
From: Bowman, Terry @ 2026-07-20 20:38 UTC (permalink / raw)
  To: Dave Jiang, Bjorn Helgaas, Dan Williams, Ira Weiny,
	Jonathan Cameron, Len Brown, Rafael J . Wysocki, Robert Richter
  Cc: linux-acpi, linux-cxl, linux-doc, linux-kernel, linux-pci,
	linuxppc-dev, Alejandro Lucero, Alison Schofield, Ankit Agrawal,
	Ard Biesheuvel, Ben Cheatham, Borislav Petkov, Breno Leitao,
	Davidlohr Bueso, Fabio M . De Francesco, Gregory Price,
	Hanjun Guo, Jonathan Corbet, Kees Cook,
	Kuppuswamy Sathyanarayanan, Li Ming, Mahesh J Salgaonkar,
	Mauro Carvalho Chehab, Oliver O'Halloran, Shiju Jose,
	Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck, Vishal Verma,
	linux-pci@vger.kernel.org, linux-acpi,
	linux-kernel@vger.kernel.org, linux-doc, linuxppc-dev
In-Reply-To: <e66c4a00-53e4-4f5a-bc07-edd9c92beb96@intel.com>

On 7/20/2026 3:12 PM, Dave Jiang wrote:
> 
> 
> On 7/17/26 3:26 PM, Terry Bowman wrote:
>> The CXL CPER work registration and unregistration helpers in
>> drivers/acpi/apei/ghes.c acquire cxl_cper_work_lock and
>> cxl_cper_prot_err_work_lock with guard(spinlock), which leaves local
>> interrupts enabled. The corresponding post paths
>> (cxl_cper_post_event(), cxl_cper_post_prot_err()) execute in hard IRQ
>> context (they are called from the GHES error notification path) and
>> acquire the same locks via guard(spinlock_irqsave).
>>
>> If a CPU is holding one of these locks via guard(spinlock) when a
>> GHES interrupt arrives on the same CPU, the IRQ handler spins on the
>> held lock waiting for it to release, while the lock holder is
>> preempted by the IRQ. The result is a deadlock.
>>
>> Convert both locks from spinlock_t to raw_spinlock_t and use
>> guard(raw_spinlock_irqsave) at all call sites. On PREEMPT_RT kernels
>> spinlock_t is backed by rt_mutex and sleeping from hard IRQ context is
>> not permitted; raw_spinlock_t is safe in both contexts.
>>
>> Add WARN_ONCE to both register functions to surface double-registration
>> bugs at runtime.
>>
>> Restructure both unregister functions to clear the global work pointer
>> under the lock before calling cancel_work_sync(), closing the window
>> where a CPER interrupt could schedule work on a pointer about to be
>> freed. Add kfifo_reset() after cancel_work_sync() so stale entries
>> are not replayed on next module load.
>>
>> Both kfifos are single-consumer: only one work_struct is registered at
>> a time, enforced by the WARN_ONCE guard in the register functions.
>> kfifo_reset() is safe outside the lock because cancel_work_sync() has
>> already quiesced the consumer, and no new consumer can register until
>> the current module exit completes and a fresh module init runs.
>>
>> Remove the now-redundant cancel_work_sync() call from
>> cxl_pci_driver_exit() - cxl_cper_unregister_work() handles quiescing
>> internally.
>>
>> Reported-by: Sashiko <sashiko@linuxfoundation.org>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Fixes: 5e4a264bf8b5 ("acpi/ghes: Process CXL Component Events")
>> Fixes: 36f257e3b0ba ("acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors")
>> Cc: stable@vger.kernel.org
> 
> With the minor sashiko issue addressed,
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> 
> This can probably be picked up ahead of the series.
> 
> 

Yes.

-Terry

>>
>> ---
>>
>> Changes in v17 -> v18:
>> - New patch.
>> ---
>>  drivers/acpi/apei/ghes.c | 50 ++++++++++++++++++++++++++--------------
>>  drivers/cxl/pci.c        |  1 -
>>  2 files changed, 33 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 3236a3ce79d6b..ca7a138c1ff2e 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -749,7 +749,7 @@ static DEFINE_KFIFO(cxl_cper_prot_err_fifo, struct cxl_cper_prot_err_work_data,
>>  		    CXL_CPER_PROT_ERR_FIFO_DEPTH);
>>  
>>  /* Synchronize schedule_work() with cxl_cper_prot_err_work changes */
>> -static DEFINE_SPINLOCK(cxl_cper_prot_err_work_lock);
>> +static DEFINE_RAW_SPINLOCK(cxl_cper_prot_err_work_lock);
>>  struct work_struct *cxl_cper_prot_err_work;
>>  
>>  static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
>> @@ -761,7 +761,7 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
>>  	if (cxl_cper_sec_prot_err_valid(prot_err))
>>  		return;
>>  
>> -	guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
>> +	guard(raw_spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
>>  
>>  	if (!cxl_cper_prot_err_work)
>>  		return;
>> @@ -780,10 +780,11 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
>>  
>>  int cxl_cper_register_prot_err_work(struct work_struct *work)
>>  {
>> -	if (cxl_cper_prot_err_work)
>> -		return -EINVAL;
>> +	guard(raw_spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
>>  
>> -	guard(spinlock)(&cxl_cper_prot_err_work_lock);
>> +	if (WARN_ONCE(cxl_cper_prot_err_work,
>> +		      "CPER-CXL kfifo consumer already registered\n"))
>> +		return -EINVAL;
>>  	cxl_cper_prot_err_work = work;
>>  	return 0;
>>  }
>> @@ -791,11 +792,18 @@ EXPORT_SYMBOL_NS_GPL(cxl_cper_register_prot_err_work, "CXL");
>>  
>>  int cxl_cper_unregister_prot_err_work(struct work_struct *work)
>>  {
>> -	if (cxl_cper_prot_err_work != work)
>> -		return -EINVAL;
>> +	scoped_guard(raw_spinlock_irqsave, &cxl_cper_prot_err_work_lock) {
>> +		if (WARN_ONCE(cxl_cper_prot_err_work != work,
>> +			      "CPER-CXL kfifo consumer mismatch on unregister\n"))
>> +			return -EINVAL;
>> +		cxl_cper_prot_err_work = NULL;
>> +	}
>> +
>> +	cancel_work_sync(work);
>> +
>> +	/* Discard stale entries so they are not replayed on next module load */
>> +	kfifo_reset(&cxl_cper_prot_err_fifo);
>>  
>> -	guard(spinlock)(&cxl_cper_prot_err_work_lock);
>> -	cxl_cper_prot_err_work = NULL;
>>  	return 0;
>>  }
>>  EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_prot_err_work, "CXL");
>> @@ -811,7 +819,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_cper_prot_err_kfifo_get, "CXL");
>>  DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, CXL_CPER_FIFO_DEPTH);
>>  
>>  /* Synchronize schedule_work() with cxl_cper_work changes */
>> -static DEFINE_SPINLOCK(cxl_cper_work_lock);
>> +static DEFINE_RAW_SPINLOCK(cxl_cper_work_lock);
>>  struct work_struct *cxl_cper_work;
>>  
>>  static void cxl_cper_post_event(enum cxl_event_type event_type,
>> @@ -831,7 +839,7 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
>>  		return;
>>  	}
>>  
>> -	guard(spinlock_irqsave)(&cxl_cper_work_lock);
>> +	guard(raw_spinlock_irqsave)(&cxl_cper_work_lock);
>>  
>>  	if (!cxl_cper_work)
>>  		return;
>> @@ -849,10 +857,11 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
>>  
>>  int cxl_cper_register_work(struct work_struct *work)
>>  {
>> -	if (cxl_cper_work)
>> +	guard(raw_spinlock_irqsave)(&cxl_cper_work_lock);
>> +	if (WARN_ONCE(cxl_cper_work,
>> +		      "CXL CPER kfifo consumer already registered\n"))
>>  		return -EINVAL;
>>  
>> -	guard(spinlock)(&cxl_cper_work_lock);
>>  	cxl_cper_work = work;
>>  	return 0;
>>  }
>> @@ -860,11 +869,18 @@ EXPORT_SYMBOL_NS_GPL(cxl_cper_register_work, "CXL");
>>  
>>  int cxl_cper_unregister_work(struct work_struct *work)
>>  {
>> -	if (cxl_cper_work != work)
>> -		return -EINVAL;
>> +	scoped_guard(raw_spinlock_irqsave, &cxl_cper_work_lock) {
>> +		if (WARN_ONCE(cxl_cper_work != work,
>> +			      "CXL CPER kfifo consumer mismatch on unregister\n"))
>> +			return -EINVAL;
>> +		cxl_cper_work = NULL;
>> +	}
>> +
>> +	cancel_work_sync(work);
>> +
>> +	/* Discard stale entries so they are not replayed on next module load */
>> +	kfifo_reset(&cxl_cper_fifo);
>>  
>> -	guard(spinlock)(&cxl_cper_work_lock);
>> -	cxl_cper_work = NULL;
>>  	return 0;
>>  }
>>  EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_work, "CXL");
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index 267c679b0b3c2..7c6faee7f85ed 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -1083,7 +1083,6 @@ static int __init cxl_pci_driver_init(void)
>>  static void __exit cxl_pci_driver_exit(void)
>>  {
>>  	cxl_cper_unregister_work(&cxl_cper_work);
>> -	cancel_work_sync(&cxl_cper_work);
>>  	pci_unregister_driver(&cxl_pci_driver);
>>  }
>>  
> 
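
The register/unregister rules from the commit message can be modeled in
userspace as below. This is a single-threaded sketch under stated
assumptions: the slot_* names and struct layout are hypothetical, the
kfifo is reduced to a counter, and the raw spinlock and
cancel_work_sync() appear only as comments:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_work { int id; };

struct consumer_slot {
	struct fake_work *work;	/* registered consumer, or NULL */
	size_t queued;		/* entries waiting in the modeled kfifo */
};

/* Reject a second consumer; the kernel version also WARNs once. */
static int slot_register(struct consumer_slot *slot, struct fake_work *work)
{
	/* kernel: raw spinlock held with interrupts disabled */
	if (slot->work)
		return -EINVAL;
	slot->work = work;
	return 0;
}

/* Producer side: queue an entry only while a consumer is registered. */
static bool slot_post(struct consumer_slot *slot)
{
	/* kernel: same lock, taken from hard IRQ context */
	if (!slot->work)
		return false;
	slot->queued++;
	return true;
}

static int slot_unregister(struct consumer_slot *slot, struct fake_work *work)
{
	if (slot->work != work)
		return -EINVAL;
	/* 1: clear the pointer under the lock so producers stop scheduling */
	slot->work = NULL;
	/* 2: kernel calls cancel_work_sync() here, outside the lock */
	/* 3: discard stale entries so they are not replayed later */
	slot->queued = 0;
	return 0;
}
```

Clearing the pointer before quiescing the worker is what closes the
window where a late CPER interrupt could schedule work that is about to
go away.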



* Re: [PATCH v18 01/13] cxl/ras: Fix cxl_rch_get_aer_severity() wrong severity register
From: Bowman, Terry @ 2026-07-20 20:36 UTC (permalink / raw)
  To: Dave Jiang, Bjorn Helgaas, Dan Williams, Ira Weiny,
	Jonathan Cameron, Len Brown, Rafael J . Wysocki, Robert Richter
  Cc: linux-acpi, linux-cxl, linux-doc, linux-kernel, linux-pci,
	linuxppc-dev, Alejandro Lucero, Alison Schofield, Ankit Agrawal,
	Ard Biesheuvel, Ben Cheatham, Borislav Petkov, Breno Leitao,
	Davidlohr Bueso, Fabio M . De Francesco, Gregory Price,
	Hanjun Guo, Jonathan Corbet, Kees Cook,
	Kuppuswamy Sathyanarayanan, Li Ming, Mahesh J Salgaonkar,
	Mauro Carvalho Chehab, Oliver O'Halloran, Shiju Jose,
	Shuah Khan, Shuai Xue, Smita Koralahalli, Tony Luck, Vishal Verma,
	linux-cxl@vger.kernel.org, linux-acpi, linux-doc,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linuxppc-dev
In-Reply-To: <9c371c59-c871-4988-83d7-2aa780d8e6c4@intel.com>

On 7/20/2026 3:09 PM, Dave Jiang wrote:
> 
> 
> On 7/17/26 3:26 PM, Terry Bowman wrote:
>> cxl_rch_get_aer_severity() classifies RCH Downstream Port uncorrectable
>> errors as fatal or non-fatal by ANDing uncorrectable status with
>> PCI_ERR_ROOT_FATAL_RCV. This is wrong because PCI_ERR_ROOT_FATAL_RCV is a
>> Root Error Status register bit (bit 6), not a severity bit. ANDing it
>> against uncorrectable status tests a reserved bit and produces incorrect
>> severity classification.
>>
>> Fix by ANDing the unmasked uncor_status against uncor_severity. Per
>> PCIe Base Spec r6.0 Section 7.8.4.4, each bit in the Uncorrectable
>> Error Severity register indicates whether the corresponding error is
>> fatal (1) or non-fatal (0).
>>
>> Fixes: 6ac07883dbb5 ("cxl/pci: Add RCH downstream port error logging")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> 
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> 
> Looks like this one can be picked up ahead of the rest of the series?
> 
Yes.

-Terry

>>
>> ---
>>
>> Changes in v17 -> v18:
>> - New patch.
>> ---
>>  drivers/cxl/core/ras_rch.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c
>> index 0a8b3b9b63884..44b335d560708 100644
>> --- a/drivers/cxl/core/ras_rch.c
>> +++ b/drivers/cxl/core/ras_rch.c
>> @@ -80,7 +80,8 @@ static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
>>  				     int *severity)
>>  {
>>  	if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
>> -		if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
>> +		if ((aer_regs->uncor_status & ~aer_regs->uncor_mask) &
>> +		    aer_regs->uncor_severity)
>>  			*severity = AER_FATAL;
>>  		else
>>  			*severity = AER_NONFATAL;
> 
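A minimal standalone sketch of the classification described in the commit message above, for readers who want to try the bit arithmetic outside the kernel. Everything named `sketch_*` or `SKETCH_*` is hypothetical and exists only for this illustration: the struct stands in for the three `aer_capability_regs` fields the patch reads, the bit positions for the Data Link Protocol and Uncorrectable Internal errors are assumed to match the PCIe AER layout, and the enum return value replaces the kernel function's bool-plus-out-parameter shape. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions; assumed to mirror the AER register layout. */
#define SKETCH_UNC_DLP         (1u << 4)   /* Data Link Protocol Error */
#define SKETCH_UNC_INTN        (1u << 22)  /* Uncorrectable Internal Error */
#define SKETCH_ROOT_FATAL_RCV  (1u << 6)   /* Root Error Status bit, per the commit message */

/* Hypothetical stand-in for the three AER registers the patch reads. */
struct sketch_aer_regs {
	uint32_t uncor_status;
	uint32_t uncor_mask;
	uint32_t uncor_severity;
};

enum sketch_severity { SKETCH_NONE, SKETCH_NONFATAL, SKETCH_FATAL };

/* Fixed logic: fatal if any unmasked, asserted error has its severity bit set. */
enum sketch_severity sketch_classify(const struct sketch_aer_regs *r)
{
	uint32_t active = r->uncor_status & ~r->uncor_mask;

	if (!active)
		return SKETCH_NONE;
	return (active & r->uncor_severity) ? SKETCH_FATAL : SKETCH_NONFATAL;
}

/* Old logic, kept only for comparison: tests bit 6 of the status register. */
enum sketch_severity sketch_classify_old(const struct sketch_aer_regs *r)
{
	if (!(r->uncor_status & ~r->uncor_mask))
		return SKETCH_NONE;
	return (r->uncor_status & SKETCH_ROOT_FATAL_RCV) ? SKETCH_FATAL
							  : SKETCH_NONFATAL;
}

/* A few worked cases; call from any test harness. */
void sketch_examples(void)
{
	/* Internal error marked fatal in the severity register. */
	struct sketch_aer_regs fatal = {
		.uncor_status = SKETCH_UNC_INTN,
		.uncor_mask = 0,
		.uncor_severity = SKETCH_UNC_INTN,
	};
	/* Fatal-capable error is masked; only a non-fatal one is active. */
	struct sketch_aer_regs masked_fatal = {
		.uncor_status = SKETCH_UNC_INTN | SKETCH_UNC_DLP,
		.uncor_mask = SKETCH_UNC_INTN,
		.uncor_severity = SKETCH_UNC_INTN,
	};

	assert(sketch_classify(&fatal) == SKETCH_FATAL);
	assert(sketch_classify_old(&fatal) == SKETCH_NONFATAL); /* the misclassification */
	assert(sketch_classify(&masked_fatal) == SKETCH_NONFATAL);
}
```

The second case is why the sketch ANDs the severity register against the unmasked status rather than the raw status: a masked error with its severity bit set should not promote the event to fatal.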



* Re: [PATCH v3] docs: dt: maintainer: Add Devicetree and OF maintainer profile document
From: Frank Li @ 2026-07-20 20:34 UTC (permalink / raw)
  To: Rob Herring
  Cc: Krzysztof Kozlowski, Krzysztof Kozlowski, Conor Dooley,
	Jonathan Corbet, Shuah Khan, devicetree, workflows, linux-doc,
	linux-kernel, Saravana Kannan
In-Reply-To: <CAL_JsqKkxVTVE=o-uar=JEVvqw9NhBAQcBrJ0ukf+RBXCwFZJA@mail.gmail.com>

On Mon, Jul 20, 2026 at 03:12:17PM -0500, Rob Herring wrote:
> On Mon, Jul 20, 2026 at 1:40 PM Krzysztof Kozlowski
> <krzysztof.kozlowski@oss.qualcomm.com> wrote:
> >
> > Document how Devicetree and Open Firmware maintainers handle their
> > subsystem, especially focusing on two caveats:
> >
> > The Devicetree subsystem handles patches slightly differently from
> > other subsystems: while DT maintainers pick up OF code, they only
> > review DT bindings without applying them.
> >
> > All three DT bindings maintainers currently rely on Patchwork and, due
> > to the enormous number of emails per day, cannot read all of them no
> > matter how hard they try.
> >
> > Cc: Rob Herring <robh@kernel.org>
> > Cc: Conor Dooley <conor+dt@kernel.org>
> > Cc: Saravana Kannan <saravanak@kernel.org>
> > Cc: devicetree@vger.kernel.org
> > Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
> >
> > ---
> >
> > I expect the patch to be picked up by Rob after review.
> >
> > Changes in v3:
> > 1. Also add an F: entry.
> >
> > Changes in v2:
> > 1. Correct typos and trailing white spaces.
> > 2. Fix order of P: after C: in maintainers.
> > ---
> >  .../process/maintainer-devicetree.rst         | 70 +++++++++++++++++++
> >  MAINTAINERS                                   |  3 +
> >  2 files changed, 73 insertions(+)
> >  create mode 100644 Documentation/process/maintainer-devicetree.rst
> >
> > diff --git a/Documentation/process/maintainer-devicetree.rst b/Documentation/process/maintainer-devicetree.rst
> > new file mode 100644
> > index 000000000000..d8ffe752bf5d
> > --- /dev/null
> > +++ b/Documentation/process/maintainer-devicetree.rst
> > @@ -0,0 +1,70 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +======================================
> > +Devicetree and Open Firmware Subsystem
> > +======================================
> > +
> > +Other Process Documents
> > +-----------------------
> > +
> > +Please see the documents in Documentation/devicetree/bindings/ for information
> > +on how to write proper Devicetree bindings and how to submit patches.
> > +
> > +Patch Review and Handling
> > +-------------------------
> > +
> > +Patches handled by Devicetree maintainers are processed differently depending
> > +on the patch type:
> > +
> > +1. Core OF driver code, e.g. drivers/of/:
> > +   patches are reviewed and applied by DT maintainers.
> > +
> > +2. Devicetree bindings:
> > +   patches are reviewed by DT maintainers but, except in certain cases, should
> > +   be applied by subsystem maintainers.  See also *For kernel maintainers* in
> > +   Documentation/devicetree/bindings/submitting-patches.rst.
>
> I would reword:
>
> patches are reviewed by DT maintainers, but should be applied by
> subsystem maintainers except in certain cases.

It would be wonderful to have a few examples of such *certain* cases.

Anyway, nice document.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
>
> > +
> > +3. DTS and drivers:
> > +   DT maintainers might provide comments, but review is generally not expected.
>
> For DTS, we expect patches to pass schema checks, or at least not to add new warnings.
>
> I can address these 2 when applying.
>
> Rob

