Linux CXL
 help / color / mirror / Atom feed
From: Alejandro Lucero Palau <alucerop@amd.com>
To: Srirangan Madhavan <smadhavan@nvidia.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: Zhi Wang <zhiw@nvidia.com>, Vishal Aslot <vaslot@nvidia.com>,
	Shanker Donthineni <sdonthineni@nvidia.com>,
	linux-cxl@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH v2 2/2] cxl: add support for cxl reset
Date: Fri, 21 Feb 2025 12:43:16 +0000	[thread overview]
Message-ID: <0e88f930-ac33-4455-88f9-c1b4039d2dd0@amd.com> (raw)
In-Reply-To: <20250221043906.1593189-3-smadhavan@nvidia.com>


On 2/21/25 04:39, Srirangan Madhavan wrote:
> Type 2 devices are being introduced and will require finer-grained
> reset mechanisms beyond bus-wide reset methods.
>
> Add support for CXL reset per CXL v3.2 Section 9.6/9.7


Hi,


This seems too early as there is no plans for supporting CXL.cache yet. 
It is the second part of the current ongoing type2 support though.


I guess starting the discussion about how to proceed is not a problem, 
so my comments below, but my first comment is if the decisions about 
what to do should be generic. I think having some helpers for accel 
drivers will be good, but the invocation should be in the hands of the 
accel driver.


> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
>   drivers/pci/pci.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++
>   include/cxl/pci.h   |  40 ++++++++----
>   include/linux/pci.h |   2 +-
>   3 files changed, 174 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 3ab1871ecf8a..9108daae252b 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5117,6 +5117,151 @@ static int cxl_reset_bus_function(struct pci_dev *dev, bool probe)
>   	return rc;
>   }
>   
> +static int cxl_reset_prepare(struct pci_dev *dev, u16 dvsec)
> +{
> +	u32 timeout_us = 100, timeout_tot_us = 10000;
> +	u16 reg, cap;
> +	int rc;
> +
> +	if (!pci_wait_for_pending_transaction(dev))
> +		pci_err(dev, "timed out waiting for pending transaction; performing cxl reset anyway\n");
> +
> +	/* Check if the device is cache capable. */
> +	rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &cap);
> +	if (rc)
> +		return rc;
> +
> +	if (!(cap & CXL_DVSEC_CACHE_CAPABLE))
> +		return 0;
> +
> +	/* Disable cache. WB and invalidate cache if capability is advertised */


I do not know how safe is this. IMO, this needs to be synchronized by 
the accel driver which could imply to tell user space first. In our case 
it would imply to stop rx queues in the netdev for CXL.cache, and tx 
queues for CXL.mem. Doing it unconditionally could make current CXL 
transactions to stall ... although it could be argued the reset event 
implies something is broken, but let's try to do it properly if there is 
a chance of the system not unreliable.


> +	rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, &reg);
> +	if (rc)
> +		return rc;
> +	reg |= CXL_DVSEC_DISABLE_CACHING;
> +	/*
> +	 * DEVCTL2 bits are written only once. So check WB+I capability while
> +	 * keeping disable caching set.
> +	 */
> +	if (cap & CXL_DVSEC_CACHE_WBI_CAPABLE)
> +		reg |= CXL_DVSEC_INIT_CACHE_WBI;
> +	pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
> +
> +	/*
> +	 * From Section 9.6: "Software may leverage the cache size reported in
> +	 * the DVSEC CXL Capability2 register to compute a suitable timeout
> +	 * value".
> +	 * Given there is no conversion factor for cache size -> timeout,
> +	 * setting timer for default 10ms.
> +	 */
> +	do {
> +		if (timeout_tot_us == 0)
> +			return -ETIMEDOUT;
> +		usleep_range(timeout_us, timeout_us + 1);
> +		timeout_tot_us -= timeout_us;
> +		rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET,
> +					  &reg);
> +		if (rc)
> +			return rc;
> +	} while (!(reg & CXL_DVSEC_CACHE_INVALID));
> +
> +	return 0;
> +}
> +
> +static int cxl_reset_init(struct pci_dev *dev, u16 dvsec)


I think cxl_reset_start after the *_prepare call makes more sense here.


> +{
> +	/*
> +	 * Timeout values ref CXL Spec v3.2 Ch 8 Control and Status Registers,
> +	 * under section 8.1.3.1 DVSEC CXL Capability.
> +	 */
> +	u32 reset_timeouts_ms[] = { 10, 100, 1000, 10000, 100000 };
> +	u16 reg;
> +	u32 timeout_ms;
> +	int rc, ind;
> +
> +	/* Check if CXL Reset MEM CLR is supported. */
> +	rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &reg);
> +	if (rc)
> +		return rc;
> +
> +	if (reg & CXL_DVSEC_CXL_RST_MEM_CLR_CAPABLE) {
> +		rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET,
> +					  &reg);
> +		if (rc)
> +			return rc;
> +
> +		reg |= CXL_DVSEC_CXL_RST_MEM_CLR_ENABLE;
> +		pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
> +	}
> +
> +	/* Read timeout value. */
> +	rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &reg);
> +	if (rc)
> +		return rc;
> +	ind = FIELD_GET(CXL_DVSEC_CXL_RST_TIMEOUT_MASK, reg);
> +	timeout_ms = reset_timeouts_ms[ind];
> +
> +	/* Write reset config. */
> +	rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, &reg);
> +	if (rc)
> +		return rc;
> +
> +	reg |= CXL_DVSEC_INIT_CXL_RESET;
> +	pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
> +
> +	/* Wait till timeout and then check reset status is complete. */
> +	msleep(timeout_ms);
> +	rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_STATUS2_OFFSET, &reg);
> +	if (rc)
> +		return rc;
> +	if (reg & CXL_DVSEC_CXL_RESET_ERR ||
> +	    ~reg & CXL_DVSEC_CXL_RST_COMPLETE)
> +		return -ETIMEDOUT;
> +
> +	rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, &reg);
> +	if (rc)
> +		return rc;
> +	reg &= (~CXL_DVSEC_DISABLE_CACHING);
> +	pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
> +
> +	return 0;
> +}
> +
> +/**
> + * cxl_reset - initiate a cxl reset
> + * @dev: device to reset
> + * @probe: if true, return 0 if device can be reset this way
> + *
> + * Initiate a cxl reset on @dev.
> + */
> +static int cxl_reset(struct pci_dev *dev, bool probe)
> +{
> +	u16 dvsec, reg;
> +	int rc;
> +
> +	dvsec = pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
> +					  CXL_DVSEC_PCIE_DEVICE);
> +	if (!dvsec)
> +		return -ENOTTY;
> +
> +	/* Check if CXL Reset is supported. */
> +	rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &reg);
> +	if (rc)
> +		return -ENOTTY;
> +
> +	if (reg & CXL_DVSEC_CXL_RST_CAPABLE == 0)
> +		return -ENOTTY;
> +
> +	if (probe)
> +		return 0;
> +
> +	rc = cxl_reset_prepare(dev, dvsec);
> +	if (rc)
> +		return rc;
> +
> +	return cxl_reset_init(dev, dvsec);


One thing this does not cover, and I do not know if it should, is the 
fact that the device CXL regs will be reset, so the question is if the 
old values should be restored or the device/driver should go through the 
same initialization, if a hotplug device, or do it specifically if 
already present at boot time and the BIOS doing that first 
initialization. In one case the restoration needs to happen, in the 
other the old values/objects need to be removed. I think the second case 
is more problematic because this is likely involving CXL root complex 
configuration performed by the BIOS ... Not trivial at all IMO.


This is the main concern I expressed some time ago when looking at how 
type2 should support resets.


Anyway, thank you for sending this series which will foster further 
discussions about all this.


  parent reply	other threads:[~2025-02-21 12:43 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-21  4:39 [PATCH v2 0/2] Add CXL Reset Support for CXL Devices Srirangan Madhavan
2025-02-21  4:39 ` [PATCH v2 1/2] cxl: de-duplicate cxl DVSEC reg defines Srirangan Madhavan
2025-02-21  4:39 ` [PATCH v2 2/2] cxl: add support for cxl reset Srirangan Madhavan
2025-02-21 10:45   ` Lukas Wunner
2025-02-22  0:13     ` Bjorn Helgaas
2025-04-07 23:45       ` Srirangan Madhavan
2025-08-22 15:33         ` Alejandro Lucero Palau
2025-02-21 12:43   ` Alejandro Lucero Palau [this message]
2025-02-22  5:45   ` kernel test robot
2025-02-22  7:08   ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0e88f930-ac33-4455-88f9-c1b4039d2dd0@amd.com \
    --to=alucerop@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=ira.weiny@intel.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=sdonthineni@nvidia.com \
    --cc=smadhavan@nvidia.com \
    --cc=vaslot@nvidia.com \
    --cc=vishal.l.verma@intel.com \
    --cc=zhiw@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox