public inbox for netdev@vger.kernel.org
From: Dan Williams <dan.j.williams@intel.com>
To: <alejandro.lucero-palau@amd.com>, <linux-cxl@vger.kernel.org>,
	<netdev@vger.kernel.org>, <dan.j.williams@intel.com>,
	<edward.cree@amd.com>, <davem@davemloft.net>, <kuba@kernel.org>,
	<pabeni@redhat.com>, <edumazet@google.com>,
	<dave.jiang@intel.com>
Cc: Alejandro Lucero <alucerop@amd.com>,
	Ben Cheatham <benjamin.cheatham@amd.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>
Subject: Re: [PATCH v16 13/22] cxl: Define a driver interface for DPA allocation
Date: Wed, 21 May 2025 13:23:39 -0700
Message-ID: <682e364b9dc25_1626e100f8@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <20250514132743.523469-14-alejandro.lucero-palau@amd.com>

alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
> 
> Region creation involves finding available DPA (device-physical-address)
> capacity to map into HPA (host-physical-address) space.
> 
> In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
> that tries to allocate the DPA memory the driver requires to operate. The
> memory requested should not be bigger than the max available HPA obtained
> previously with cxl_get_hpa_freespace.
> 
> Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
> 
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/cxl/core/hdm.c | 86 ++++++++++++++++++++++++++++++++++++++++++
>  include/cxl/cxl.h      |  5 +++
>  2 files changed, 91 insertions(+)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 70cae4ebf8a4..500df2deceef 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -3,6 +3,7 @@
>  #include <linux/seq_file.h>
>  #include <linux/device.h>
>  #include <linux/delay.h>
> +#include <cxl/cxl.h>
>  
>  #include "cxlmem.h"
>  #include "core.h"
> @@ -546,6 +547,13 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
>  	return base;
>  }
>  
> +/**
> + * cxl_dpa_free - release DPA (Device Physical Address)
> + *
> + * @cxled: endpoint decoder linked to the DPA
> + *
> + * Returns 0 or error.
> + */
>  int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
>  {
>  	struct cxl_port *port = cxled_to_port(cxled);
> @@ -572,6 +580,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
>  	devm_cxl_dpa_release(cxled);
>  	return 0;
>  }
> +EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
>  
>  int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
>  		     enum cxl_partition_mode mode)
> @@ -686,6 +695,83 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
>  	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
>  }
>  
> +static int find_free_decoder(struct device *dev, const void *data)
> +{
> +	struct cxl_endpoint_decoder *cxled;
> +	struct cxl_port *port;
> +
> +	if (!is_endpoint_decoder(dev))
> +		return 0;
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	port = cxled_to_port(cxled);
> +
> +	if (cxled->cxld.id != port->hdm_end + 1)
> +		return 0;
> +
> +	return 1;
> +}
> +
> +/**
> + * cxl_request_dpa - search and reserve DPA given input constraints
> + * @cxlmd: memdev with an endpoint port with available decoders
> + * @mode: DPA operation mode (ram vs pmem)
> + * @alloc: dpa size required
> + *
> + * Returns a pointer to a cxl_endpoint_decoder struct or an error
> + *
> + * Given that a region needs to allocate from limited HPA capacity it
> + * may be the case that a device has more mappable DPA capacity than
> + * available HPA. The expectation is that @alloc is a driver known
> + * value based on the device capacity, but it may not be available
> + * due to HPA constraints.
> + *
> + * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
> + * reserved, or an error pointer. The caller is also expected to own the
> + * lifetime of the memdev registration associated with the endpoint to
> + * pin the decoder registered as well.
> + */
> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
> +					     enum cxl_partition_mode mode,
> +					     resource_size_t alloc)
> +{
> +	struct cxl_port *endpoint = cxlmd->endpoint;
> +	struct cxl_endpoint_decoder *cxled;
> +	struct device *cxled_dev;
> +	int rc;
> +
> +	if (!IS_ALIGNED(alloc, SZ_256M))
> +		return ERR_PTR(-EINVAL);
> +
> +	down_read(&cxl_dpa_rwsem);
> +	cxled_dev = device_find_child(&endpoint->dev, NULL, find_free_decoder);
> +	up_read(&cxl_dpa_rwsem);

In another effort [1], I am trying to get rid of all explicit unlock
management of cxl_dpa_rwsem and cxl_region_rwsem and, ultimately, of
all "goto" use in the CXL core.

[1]: http://lore.kernel.org/20250507072145.3614298-1-dan.j.williams@intel.com

So that conversion here would be:

DEFINE_FREE(put_cxled, struct cxl_endpoint_decoder *,
	    if (_T) put_device(&_T->cxld.dev))

struct cxl_endpoint_decoder *cxl_find_free_decoder(struct cxl_memdev *cxlmd)
{
	struct cxl_port *endpoint = cxlmd->endpoint;
	struct device *dev = NULL;

	scoped_guard(rwsem_read, &cxl_dpa_rwsem)
		dev = device_find_child(&endpoint->dev, NULL, find_free_decoder);
	if (dev)
		return to_cxl_endpoint_decoder(dev);
	return NULL;
}

...and then:

struct cxl_endpoint_decoder *cxled __free(put_cxled) = cxl_find_free_decoder(cxlmd);

> +
> +	if (!cxled_dev)
> +		return ERR_PTR(-ENXIO);
> +
> +	cxled = to_cxl_endpoint_decoder(cxled_dev);
> +
> +	if (!cxled) {
> +		rc = -ENODEV;
> +		goto err;
> +	}
> +
> +	rc = cxl_dpa_set_part(cxled, mode);
> +	if (rc)
> +		goto err;
> +
> +	rc = cxl_dpa_alloc(cxled, alloc);

The current user of this interface (cxl_dpa_alloc()) is sysfs. The
expectation there is that if two userspace threads race to allocate DPA
space, the kernel protects itself and does not get confused, but one
thread loses the race and has to redo its allocation.

That is not a semantic an in-kernel interface can offer, so there needs
to be some locking to ensure that two threads racing cxl_request_dpa()
each end up with independent allocations. That likely needs to be a
synchronization primitive over the entire process, because CXL requires
in-order allocation of DPA and HPA. Effectively, the entire HPA
allocation, DPA allocation, and decoder programming sequence has to
complete as one atomic unit.

I think, to start, since there is only one Type-2 driver in the kernel
and its only use case is single-threaded setup, this is not yet an
immediate problem.

Thread overview: 84+ messages
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
2025-05-14 13:27 ` [PATCH v16 01/22] cxl: Add type2 " alejandro.lucero-palau
2025-05-20  2:43   ` Alison Schofield
2025-05-20  7:18     ` Alejandro Lucero Palau
2025-05-20 20:06       ` Dave Jiang
2025-05-21  9:30         ` Alejandro Lucero Palau
2025-05-20  7:17   ` dan.j.williams
2025-05-21 10:44     ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 02/22] sfc: add cxl support alejandro.lucero-palau
2025-05-20  7:37   ` dan.j.williams
2025-05-21 10:50     ` Alejandro Lucero Palau
2025-05-21 17:12       ` Dan Williams
2025-05-22  8:49         ` Alejandro Lucero Palau
2025-05-22 19:41           ` Dan Williams
2025-06-04  8:09             ` Jonathan Cameron
2025-05-14 13:27 ` [PATCH v16 03/22] cxl: Move pci generic code alejandro.lucero-palau
2025-05-20  2:42   ` Alison Schofield
2025-05-21 17:44   ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 04/22] cxl: Move register/capability check to driver alejandro.lucero-palau
2025-05-20  2:41   ` Alison Schofield
2025-05-21 18:23   ` Dan Williams
2025-05-22  9:45     ` Alejandro Lucero Palau
2025-05-22 19:51       ` Dan Williams
2025-05-23  9:12         ` Alejandro Lucero Palau
2025-05-23 16:55           ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 05/22] cxl: Add function for type2 cxl regs setup alejandro.lucero-palau
2025-05-20  2:41   ` Alison Schofield
2025-05-21 18:28   ` Dan Williams
2025-05-22  9:52     ` Alejandro Lucero Palau
2025-05-22 20:04       ` Dan Williams
2025-06-06 11:59         ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 06/22] sfc: make regs setup with checking and set media ready alejandro.lucero-palau
2025-05-21 18:34   ` Dan Williams
2025-05-22 10:07     ` Alejandro Lucero Palau
2025-05-22 20:22       ` Dan Williams
2025-05-22 20:53         ` Dan Williams
2025-05-22 21:09           ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 07/22] cxl: Support dpa initialization without a mailbox alejandro.lucero-palau
2025-05-20  2:40   ` Alison Schofield
2025-05-21 18:47   ` Dan Williams
2025-05-22 10:24     ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 08/22] sfc: initialize dpa alejandro.lucero-palau
2025-05-14 13:27 ` [PATCH v16 09/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
2025-05-20  2:40   ` Alison Schofield
2025-05-21 18:49   ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 10/22] sfc: create type2 cxl memdev alejandro.lucero-palau
2025-05-14 13:27 ` [PATCH v16 11/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
2025-05-20  2:36   ` Alison Schofield
2025-05-21 19:31   ` Dan Williams
2025-05-22 10:56     ` Alejandro Lucero Palau
2025-05-22 20:31       ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 12/22] sfc: obtain root decoder with enough HPA free space alejandro.lucero-palau
2025-05-21 19:56   ` Dan Williams
2025-06-06 12:59     ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 13/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
2025-05-20  2:39   ` Alison Schofield
2025-05-21 20:23   ` Dan Williams [this message]
2025-06-06 13:09     ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 14/22] sfc: get endpoint decoder alejandro.lucero-palau
2025-05-21 20:28   ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 15/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
2025-05-20  2:39   ` Alison Schofield
2025-05-14 13:27 ` [PATCH v16 16/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
2025-05-20  2:37   ` Alison Schofield
2025-05-14 13:27 ` [PATCH v16 17/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
2025-05-20  2:38   ` Alison Schofield
2025-05-14 13:27 ` [PATCH v16 18/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
2025-05-20  2:37   ` Alison Schofield
2025-05-21 20:45   ` Dan Williams
2025-06-06 13:27     ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 19/22] cxl: Add region flag for precluding a device memory to be used for dax alejandro.lucero-palau
2025-05-20  2:36   ` Alison Schofield
2025-05-21 20:49   ` Dan Williams
2025-06-06 13:39     ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 20/22] sfc: create cxl region alejandro.lucero-palau
2025-05-21 21:01   ` Dan Williams
2025-06-06 13:44     ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 21/22] cxl: Add function for obtaining region range alejandro.lucero-palau
2025-05-20  2:35   ` Alison Schofield
2025-05-21 21:31   ` Dan Williams
2025-06-06 14:03     ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
2025-05-21 21:48   ` Dan Williams
2025-05-23  1:13     ` Edward Cree