The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: "Dan Williams (nvidia)" <djbw@kernel.org>
To: Srirangan Madhavan <smadhavan@nvidia.com>,
	 linux-cxl@vger.kernel.org,  linux-pci@vger.kernel.org,
	 linux-kernel@vger.kernel.org
Cc: vsethi@nvidia.com,  alwilliamson@nvidia.com,
	 Dan Williams <danwilliams@nvidia.com>,
	 Sai Yashwanth Reddy Kancherla <skancherla@nvidia.com>,
	 Vishal Aslot <vaslot@nvidia.com>,
	 Manish Honap <mhonap@nvidia.com>,  Jiandi An <jan@nvidia.com>,
	 Richard Cheng <icheng@nvidia.com>,
	 linux-tegra@vger.kernel.org,
	 Srirangan Madhavan <smadhavan@nvidia.com>
Subject: Re: [PATCH v6 1/9] cxl/hdm: Add helpers to restore and commit memdev decoders
Date: Wed, 03 Jun 2026 15:35:03 -0700	[thread overview]
Message-ID: <6a20ac17906d8_42b91004f@djbw-dev.notmuch> (raw)
In-Reply-To: <20260528083154.137979-2-smadhavan@nvidia.com>

Srirangan Madhavan wrote:
> Add helpers to restore endpoint decoder programming for a CXL memdev from
> CXL core's cached decoder objects, then commit it as a distinct step.
> Callers are expected to have established reset safety and to hold
> cxl_rwsem.region for write.

In the /sys/bus/.../reset path the PCI device only has cached HDM state,
no 'struct cxl_region' objects. So this needs a simpler method to
determine that HDM is unmapped. It probably also needs account for
userspace mappings. The closest approximation to "this phys addr is not
mapped anywhere, even /dev/mem" is a successful __request_region().

> cxl_restore_memdev_decoders() restores programmable decoder state while
> keeping traffic disabled. For HDM-backed endpoints it programs enabled
> endpoint decoder fields without COMMIT, keeps the HDM Decoder Capability
> disabled, and mirrors matching endpoint DVSEC ranges where possible. For
> endpoints without HDM decoder registers, it restores the legacy DVSEC
> ranges that model endpoint decode.

Ideally just use the same helper for restoring the configuration as
writing the configuration. I.e. I am not sure that writing all the
address ranges first is required since per decoder commit is still
required. Effectively trying to maximize identical flows for the reset
and dynamic configuration paths.

> cxl_commit_memdev_decoders() enables the HDM Decoder Capability and
> commits enabled, unlocked endpoint decoders after safety checks pass. It
> sets COMMIT only after decoder fields have been restored, does not
> re-lock decoders, and does not set DVSEC MEM_ENABLE.
> 
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
>  drivers/cxl/core/hdm.c | 318 ++++++++++++++++++++++++++++++++++++++++-
>  drivers/cxl/cxl.h      |   2 +
>  2 files changed, 317 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 0c80b76a5f9b..f7af1041a9fc 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -679,7 +679,7 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
>  	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
>  }
>  
> -static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
> +static int cxld_set_interleave_fields(struct cxl_decoder *cxld, u32 *ctrl)
>  {
>  	u16 eig;
>  	u8 eiw;
> @@ -690,14 +690,22 @@ static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
>  	 */
>  	if (WARN_ONCE(ways_to_eiw(cxld->interleave_ways, &eiw),
>  		      "invalid interleave_ways: %d\n", cxld->interleave_ways))
> -		return;
> +		return -EINVAL;
>  	if (WARN_ONCE(granularity_to_eig(cxld->interleave_granularity, &eig),
>  		      "invalid interleave_granularity: %d\n",
>  		      cxld->interleave_granularity))
> -		return;
> +		return -EINVAL;
>  
>  	u32p_replace_bits(ctrl, eig, CXL_HDM_DECODER0_CTRL_IG_MASK);
>  	u32p_replace_bits(ctrl, eiw, CXL_HDM_DECODER0_CTRL_IW_MASK);
> +	return 0;
> +}

It is awkward to get all the way down to restoring the interleave to
find that the original interleave settings were invalid to start.

I think it is also impossible. These warnings have never provided any
value and should probably be deleted rather than honored and reflected
up the stack.

> +
> +static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
> +{
> +	if (cxld_set_interleave_fields(cxld, ctrl))
> +		return;
> +
>  	*ctrl |= CXL_HDM_DECODER0_CTRL_COMMIT;
>  }
>  
> @@ -927,6 +935,310 @@ static void cxl_decoder_reset(struct cxl_decoder *cxld)
>  	}
>  }
>  
> +static int cxl_restore_dvsec_range(struct cxl_memdev *cxlmd,
> +				   struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_decoder *cxld = &cxled->cxld;
> +	struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> +	u64 base = cxld->hpa_range.start;
> +	u64 size = range_len(&cxld->hpa_range);
> +	u32 lo;
> +	int dvsec = cxlds->cxl_dvsec;
> +	int id = cxld->id;
> +	int rc;
> +
> +	if (!dvsec)
> +		return 0;
> +
> +	if (id >= CXL_DVSEC_RANGE_MAX)
> +		return 0;
> +
> +	rc = pci_write_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_BASE_HIGH(id),
> +				    upper_32_bits(base));
> +	if (rc)
> +		return rc;
> +
> +	rc = pci_read_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(id),
> +				   &lo);
> +	if (rc)
> +		return rc;
> +	lo &= ~PCI_DVSEC_CXL_MEM_BASE_LOW;
> +	lo |= lower_32_bits(base) & PCI_DVSEC_CXL_MEM_BASE_LOW;
> +
> +	rc = pci_write_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(id),
> +				    lo);
> +	if (rc)
> +		return rc;
> +
> +	rc = pci_write_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_SIZE_HIGH(id),
> +				    upper_32_bits(size));
> +	if (rc)
> +		return rc;
> +
> +	rc = pci_read_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id),
> +				   &lo);
> +	if (rc)
> +		return rc;
> +
> +	/*
> +	 * Preserve MEM_INFO_VALID / MEM_ACTIVE and any reserved bits while
> +	 * restoring only the programmable size bits.
> +	 */
> +	lo &= ~PCI_DVSEC_CXL_MEM_SIZE_LOW;
> +	lo |= lower_32_bits(size) & PCI_DVSEC_CXL_MEM_SIZE_LOW;

No need for lower_32_bits() with the mask.

> +
> +	return pci_write_config_dword(pdev,
> +				      dvsec + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id),
> +				      lo);
> +}
> +
> +static int cxl_restore_hdm_decoder(struct cxl_hdm *cxlhdm,
> +				   struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_decoder *cxld = &cxled->cxld;
> +	void __iomem *hdm;
> +	u64 base, size, skip;
> +	u32 ctrl;
> +	int id;
> +
> +	id = cxld->id;
> +	hdm = cxlhdm->regs.hdm_decoder;
> +	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
> +	if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
> +		return 0;
> +
> +	base = cxld->hpa_range.start;
> +	size = range_len(&cxld->hpa_range);
> +	skip = cxled->skip;
> +
> +	ctrl &= ~(CXL_HDM_DECODER0_CTRL_LOCK |
> +		  CXL_HDM_DECODER0_CTRL_COMMIT |
> +		  CXL_HDM_DECODER0_CTRL_COMMITTED |
> +		  CXL_HDM_DECODER0_CTRL_COMMIT_ERROR);
> +	if (cxld_set_interleave_fields(cxld, &ctrl))
> +		return -EINVAL;
> +	cxld_set_type(cxld, &ctrl);
> +
> +	/* Preserve setup_hw_decoder() programming order, without COMMIT. */
> +	writel(upper_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id));
> +	writel(lower_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id));
> +	writel(upper_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(id));
> +	writel(lower_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(id));
> +	writel(upper_32_bits(skip), hdm + CXL_HDM_DECODER0_SKIP_HIGH(id));
> +	writel(lower_32_bits(skip), hdm + CXL_HDM_DECODER0_SKIP_LOW(id));
> +	wmb();
> +	writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
> +
> +	return 0;
> +}
> +
> +struct cxl_restore_ctx {
> +	struct cxl_memdev *cxlmd;
> +	struct cxl_hdm *cxlhdm;
> +};
> +
> +static int cxl_restore_decoder(struct device *dev, void *data)
> +{
> +	struct cxl_restore_ctx *ctx = data;
> +	struct cxl_endpoint_decoder *cxled;
> +	struct cxl_decoder *cxld;
> +	int rc;
> +
> +	if (!is_endpoint_decoder(dev))
> +		return 0;
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	cxld = &cxled->cxld;
> +	if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
> +		return 0;
> +
> +	if (ctx->cxlhdm->regs.hdm_decoder) {
> +		if (cxld->id >= ctx->cxlhdm->decoder_count)
> +			return -EINVAL;
> +
> +		rc = cxl_restore_hdm_decoder(ctx->cxlhdm, cxled);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	return cxl_restore_dvsec_range(ctx->cxlmd, cxled);
> +}
> +
> +static int cxl_restore_decoders(struct cxl_memdev *cxlmd, struct cxl_hdm *cxlhdm)
> +{
> +	struct cxl_port *port = cxlhdm->port;
> +	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
> +	struct cxl_restore_ctx ctx = {
> +		.cxlmd = cxlmd,
> +		.cxlhdm = cxlhdm,
> +	};
> +	u32 global_ctrl;
> +
> +	if (hdm) {
> +		global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
> +		writel(global_ctrl & ~CXL_HDM_DECODER_ENABLE,
> +		       hdm + CXL_HDM_DECODER_CTRL_OFFSET);

After reset, global control should not require re-disabling.

> +	}
> +
> +	return device_for_each_child(&port->dev, &ctx, cxl_restore_decoder);

This gets cleaner when being able to skip the device_for_each_child()
and just walk the array of cached HDM info in the device.

  parent reply	other threads:[~2026-06-03 22:35 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-28  8:31 [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Srirangan Madhavan
2026-05-28  8:31 ` [PATCH v6 1/9] cxl/hdm: Add helpers to restore and commit memdev decoders Srirangan Madhavan
2026-05-28 11:06   ` Richard Cheng
2026-06-02 18:12     ` Dave Jiang
2026-06-02 18:31   ` Dave Jiang
2026-06-02 20:34   ` Cheatham, Benjamin
2026-06-03 22:35   ` Dan Williams (nvidia) [this message]
2026-05-28  8:31 ` [PATCH v6 2/9] PCI: Export pci_dev_save_and_disable() and pci_dev_restore() Srirangan Madhavan
2026-06-02 20:18   ` Dave Jiang
2026-06-03 22:36   ` Dan Williams (nvidia)
2026-05-28  8:31 ` [PATCH v6 3/9] cxl: Add reset-idle and cache flush helpers Srirangan Madhavan
2026-06-02 20:34   ` Cheatham, Benjamin
2026-06-02 20:36   ` Dave Jiang
2026-06-04  2:49   ` Dan Williams (nvidia)
2026-05-28  8:31 ` [PATCH v6 4/9] PCI/CXL: Add sibling function coordination for reset Srirangan Madhavan
2026-05-28 11:15   ` Richard Cheng
2026-06-02 22:10   ` Dave Jiang
2026-06-04  3:13   ` Dan Williams (nvidia)
2026-05-28  8:31 ` [PATCH v6 5/9] cxl/pci: Add CXL DVSEC reset helper Srirangan Madhavan
2026-06-02 20:34   ` Cheatham, Benjamin
2026-05-28  8:31 ` [PATCH v6 6/9] cxl/pci: Track memdevs affected by CXL reset Srirangan Madhavan
2026-06-02 20:34   ` Cheatham, Benjamin
2026-05-28  8:31 ` [PATCH v6 7/9] cxl/pci: Orchestrate CXL reset for affected memdevs Srirangan Madhavan
2026-06-02 20:34   ` Cheatham, Benjamin
2026-06-04  3:25   ` Dan Williams (nvidia)
2026-05-28  8:31 ` [PATCH v6 8/9] cxl/memdev: Add cxl_reset sysfs attribute Srirangan Madhavan
2026-06-02 21:35   ` Cheatham, Benjamin
2026-06-02 23:50   ` Dave Jiang
2026-05-28  8:31 ` [PATCH v6 9/9] Documentation/ABI: Document CXL memdev cxl_reset Srirangan Madhavan
2026-06-03  0:11   ` Dave Jiang
2026-06-02 20:34 ` [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Cheatham, Benjamin
2026-06-02 21:42 ` Dan Williams (nvidia)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6a20ac17906d8_42b91004f@djbw-dev.notmuch \
    --to=djbw@kernel.org \
    --cc=alwilliamson@nvidia.com \
    --cc=danwilliams@nvidia.com \
    --cc=icheng@nvidia.com \
    --cc=jan@nvidia.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-tegra@vger.kernel.org \
    --cc=mhonap@nvidia.com \
    --cc=skancherla@nvidia.com \
    --cc=smadhavan@nvidia.com \
    --cc=vaslot@nvidia.com \
    --cc=vsethi@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox