From: Alejandro Lucero Palau <alucerop@amd.com>
To: alejandro.lucero-palau@amd.com, linux-cxl@vger.kernel.org,
netdev@vger.kernel.org, dan.j.williams@intel.com,
edward.cree@amd.com, davem@davemloft.net, kuba@kernel.org,
pabeni@redhat.com, edumazet@google.com, dave.jiang@intel.com
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Subject: Re: [PATCH v23 12/22] cxl: Define a driver interface for HPA free space enumeration
Date: Thu, 26 Feb 2026 16:13:07 +0000
Message-ID: <07ee40eb-39cb-4951-9464-af72b0ae77b2@amd.com>
In-Reply-To: <20260201155438.2664640-13-alejandro.lucero-palau@amd.com>
On 2/1/26 15:54, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> CXL region creation involves allocating capacity from Device Physical
> Address (DPA) and assigning it to decode a given Host Physical Address
> (HPA). Before determining how much DPA to allocate the amount of available
> HPA must be determined. Also, not all HPA is created equal, some HPA
> targets RAM, some targets PMEM, some is prepared for device-memory flows
> like HDM-D and HDM-DB, and some is HDM-H (host-only).
>
> In order to support Type2 CXL devices, wrap all of those concerns into
> an API that retrieves a root decoder (platform CXL window) that fits the
> specified constraints and the capacity available for a new region.
>
> Add a complementary function for releasing the reference to such root
> decoder.
>
> Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/region.c | 164 ++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 3 +
> include/cxl/cxl.h | 6 ++
> 3 files changed, 173 insertions(+)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 954b8fcdbac6..bdefd088f5f1 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -705,6 +705,170 @@ static int free_hpa(struct cxl_region *cxlr)
> return 0;
> }
>
> +struct cxlrd_max_context {
> + struct device * const *host_bridges;
> + int interleave_ways;
> + unsigned long flags;
> + resource_size_t max_hpa;
> + struct cxl_root_decoder *cxlrd;
> +};
> +
> +static int find_max_hpa(struct device *dev, void *data)
> +{
> + struct cxlrd_max_context *ctx = data;
> + struct cxl_switch_decoder *cxlsd;
> + struct cxl_root_decoder *cxlrd;
> + struct resource *res, *prev;
> + struct cxl_decoder *cxld;
> + resource_size_t free = 0;
> + resource_size_t max;
> + int found = 0;
> +
> + if (!is_root_decoder(dev))
> + return 0;
> +
> + cxlrd = to_cxl_root_decoder(dev);
> + cxlsd = &cxlrd->cxlsd;
> + cxld = &cxlsd->cxld;
> +
> + if ((cxld->flags & ctx->flags) != ctx->flags) {
> + dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
> + cxld->flags, ctx->flags);
> + return 0;
> + }
> +
> + for (int i = 0; i < ctx->interleave_ways; i++) {
> + for (int j = 0; j < ctx->interleave_ways; j++) {
> + if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
> + found++;
> + break;
> + }
> + }
> + }
> +
> + if (found != ctx->interleave_ways) {
> + dev_dbg(dev,
> + "Not enough host bridges. Found %d for %d interleave ways requested\n",
> + found, ctx->interleave_ways);
> + return 0;
> + }
> +
> + /*
> + * Walk the root decoder resource range relying on cxl_rwsem.region to
> + * preclude sibling arrival/departure and find the largest free space
> + * gap.
> + */
> + lockdep_assert_held_read(&cxl_rwsem.region);
> + res = cxlrd->res->child;
> +
> + /* With no resource child the whole parent resource is available */
> + if (!res)
> + max = resource_size(cxlrd->res);
> + else
> + max = 0;
> +
> + for (prev = NULL; res; prev = res, res = res->sibling) {
> + if (!prev && res->start == cxlrd->res->start &&
> + res->end == cxlrd->res->end) {
> + max = resource_size(cxlrd->res);
> + break;
> + }
While preparing to send this patch independently, which I am doing to
ease all this Type2 integration, I realized the check above is
completely wrong.
FWIW, I added it in v22, which was a rush job to get a version out
before LPC for PJ to test. Although "it works" when the driver is loaded
a second time after the HDMs are reset during driver unload, it is
embarrassingly wrong: it only "fixes" the initialization for the second
and subsequent driver loads. The real problem (I found it later, but did
not change it here because v23 does not care) was that the driver unload
was not releasing the resources of the first region created.
The check will not be in the coming patch for this functionality.
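For illustration only, here is a minimal user-space sketch of the
largest-free-gap walk without that check. It is not the kernel code:
struct span and largest_free_gap() are hypothetical stand-ins for
struct resource and the loop in find_max_hpa(), and the busy children are
assumed to be sorted, non-overlapping and contained in the window. Since
both ends are inclusive, the sketch computes the gap before a child as
its start minus the first address after the previous child. With a single
child covering the whole window it reports 0 bytes free, whereas the
check above would report the full window size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource: inclusive [start, end]. */
struct span {
	uint64_t start;
	uint64_t end;
};

/*
 * Return the largest free gap, in bytes, inside 'window'.
 *
 * Assumptions of this sketch: 'busy' holds 'n' child spans sorted by
 * start address, non-overlapping and fully inside the window, and no
 * span ends at UINT64_MAX (so end + 1 cannot wrap).
 */
static uint64_t largest_free_gap(struct span window,
				 const struct span *busy, size_t n)
{
	uint64_t cursor = window.start;	/* first address not yet covered */
	uint64_t max = 0;

	for (size_t i = 0; i < n; i++) {
		/* Precondition: children are sorted and do not overlap. */
		assert(busy[i].start >= cursor);

		/* Free space between the cursor and this child, if any. */
		if (busy[i].start > cursor) {
			uint64_t gap = busy[i].start - cursor;

			if (gap > max)
				max = gap;
		}
		cursor = busy[i].end + 1;
	}

	/* Free space between the last child and the end of the window. */
	if (cursor <= window.end) {
		uint64_t gap = window.end - cursor + 1;

		if (gap > max)
			max = gap;
	}

	return max;
}
```

With no children the whole window is reported as free through the tail
case, so no special handling for an empty child list is needed in this
model.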
> + /*
> + * Sanity check for preventing arithmetic problems below as a
> + * resource with size 0 could imply using the end field below
> + * when set to unsigned zero - 1 or all f in hex.
> + */
> + if (prev && !resource_size(prev))
> + continue;
> +
> + if (!prev && res->start > cxlrd->res->start) {
> + free = res->start - cxlrd->res->start;
> + max = max(free, max);
> + }
> + if (prev && res->start > prev->end + 1) {
> + free = res->start - prev->end + 1;
> + max = max(free, max);
> + }
> + }
> +
> + if (prev && prev->end + 1 < cxlrd->res->end + 1) {
> + free = cxlrd->res->end + 1 - prev->end + 1;
> + max = max(free, max);
> + }
> +
> + dev_dbg(cxlrd_dev(cxlrd), "found %pa bytes of free space\n", &max);
> + if (max > ctx->max_hpa) {
> + if (ctx->cxlrd)
> + put_device(cxlrd_dev(ctx->cxlrd));
> + get_device(cxlrd_dev(cxlrd));
> + ctx->cxlrd = cxlrd;
> + ctx->max_hpa = max;
> + }
> + return 0;
> +}
> +
> +/**
> + * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints
> + * @cxlmd: the mem device requiring the HPA
> + * @interleave_ways: number of entries in @host_bridges
> + * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device
> + * @max_avail_contig: output parameter of max contiguous bytes available in the
> + * returned decoder
> + *
> + * Returns a pointer to a struct cxl_root_decoder
> + *
> + * The return tuple of a 'struct cxl_root_decoder' and 'bytes available given
> + * in (@max_avail_contig))' is a point in time snapshot. If by the time the
> + * caller goes to use this decoder and its capacity is reduced then caller needs
> + * to loop and retry.
> + *
> + * The returned root decoder has an elevated reference count that needs to be
> + * put with cxl_put_root_decoder(cxlrd).
> + */
> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> + int interleave_ways,
> + unsigned long flags,
> + resource_size_t *max_avail_contig)
> +{
> + struct cxlrd_max_context ctx = {
> + .flags = flags,
> + .interleave_ways = interleave_ways,
> + };
> + struct cxl_port *root_port;
> + struct cxl_port *endpoint;
> +
> + endpoint = cxlmd->endpoint;
> + if (!endpoint) {
> + dev_dbg(&cxlmd->dev, "endpoint not linked to memdev\n");
> + return ERR_PTR(-ENXIO);
> + }
> +
> + ctx.host_bridges = &endpoint->host_bridge;
> +
> + struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
> + if (!root) {
> + dev_dbg(&endpoint->dev, "endpoint is not related to a root port\n");
> + return ERR_PTR(-ENXIO);
> + }
> +
> + root_port = &root->port;
> + scoped_guard(rwsem_read, &cxl_rwsem.region)
> + device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
> +
> + if (!ctx.cxlrd)
> + return ERR_PTR(-ENOMEM);
> +
> + *max_avail_contig = ctx.max_hpa;
> + return ctx.cxlrd;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
> +
> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
> +{
> + put_device(cxlrd_dev(cxlrd));
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
> +
> static ssize_t size_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t len)
> {
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 944c5d1ccceb..c7d9b2c2908f 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -706,6 +706,9 @@ struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
> struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
> struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
> bool is_root_decoder(struct device *dev);
> +
> +#define cxlrd_dev(cxlrd) (&(cxlrd)->cxlsd.cxld.dev)
> +
> bool is_switch_decoder(struct device *dev);
> bool is_endpoint_decoder(struct device *dev);
> struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 92880c26b2d5..834dc7e78934 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -255,4 +255,10 @@ struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
> struct range;
> int cxl_get_region_range(struct cxl_region *region, struct range *range);
> void cxl_unregister_region(struct cxl_region *cxlr);
> +struct cxl_port;
> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> + int interleave_ways,
> + unsigned long flags,
> + resource_size_t *max);
> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
> #endif /* __CXL_CXL_H__ */