From: Dave Jiang <dave.jiang@intel.com>
To: Alison Schofield <alison.schofield@intel.com>
Cc: linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org,
rafael@kernel.org, bp@alien8.de, dan.j.williams@intel.com,
tony.luck@intel.com, dave@stgolabs.net,
jonathan.cameron@huawei.com, ira.weiny@intel.com,
ming.li@zohomail.com
Subject: Re: [PATCH v3 2/4] acpi/hmat / cxl: Add extended linear cache support for CXL
Date: Fri, 21 Feb 2025 17:13:59 -0700 [thread overview]
Message-ID: <22cc0953-aee2-4008-85fb-fdbf9f4f9110@intel.com> (raw)
In-Reply-To: <Z7fudpMdKOSef8TH@aschofie-mobl2.lan>
On 2/20/25 8:09 PM, Alison Schofield wrote:
> On Fri, Jan 17, 2025 at 10:28:31AM -0700, Dave Jiang wrote:
>> The current cxl region size only indicates the size of the CXL memory
>> region without accounting for the extended linear cache size. Retrieve the
>> cache size from HMAT and append that to the cxl region size for the cxl
>> region range that matches the SRAT range that has extended linear cache
>> enabled.
>>
>> The SRAT defines the whole memory range that includes the extended linear
>> cache and the CXL memory region. The new HMAT ECN/ECR to the Memory Side
>> Cache Information Structure defines the size of the extended linear cache
>> and matches it to the SRAT Memory Affinity Structure by the memory
>> proximity domain. Add a helper to match the cxl range to the SRAT memory
>> range in order to retrieve the cache size.
>>
>> There are several places that check the cxl region range against the
>> decoder range. Use the new helper to compare the two ranges while
>> accounting for the new cache size.
>
> This reads like we are inflating the region size by the cache size, and
> then changing region setup code to account for the inflation. So I'm
> going to question whether we need to do that inflation.
>
> When the new region param p->cache_size is calculated, it is added directly
> to p->res, and that leads to much of the other work in region.c.
>
> Could p->cache_size be used as an addend only when needed, like:
> - Add it to the insert_resource() in construct_region().
> - Add it to the sysfs shows for the region resource start and resource size.
>
> Then when we get to dpa to hpa address translation, the p->res start
> doesn't need adjusting either. As it is now, it's the cache start
> and I think it should be the cxl resource start.
>
> The touchpoints may grow in the direction I'm suggesting, making it a
> poorer choice than what is here now. Maybe it's time for something like
> a cxl_resource and a non_cxl_resource that add together to make the
> region_resource.
>
> I haven't been following this patch set all along, just started looking
> yesterday, so I'm prepared to be way off base. Figure blurting it out
> at this point is the faster path forward.
>
> More comments related below...
>
>
>> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> snip
>
>> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> snip
>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index b98b1ccffd1c..2d8699a86b24 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -824,6 +824,21 @@ static int match_free_decoder(struct device *dev, void *data)
>> return 1;
>> }
>>
>> +static bool region_res_match_cxl_range(struct cxl_region_params *p,
>> + struct range *range)
>> +{
>> + if (!p->res)
>> + return false;
>> +
>> + /*
>> + * If an extended linear cache region then the CXL range is assumed
>> + * to be fronted by the DRAM range in current known implementation.
>> + * This assumption will be made until a variant implementation exists.
>> + */
>> + return p->res->start + p->cache_size == range->start &&
>> + p->res->end == range->end;
>> +}
>> +
>> static int match_auto_decoder(struct device *dev, void *data)
>> {
>> struct cxl_region_params *p = data;
>> @@ -836,7 +851,7 @@ static int match_auto_decoder(struct device *dev, void *data)
>> cxld = to_cxl_decoder(dev);
>> r = &cxld->hpa_range;
>>
>> - if (p->res && p->res->start == r->start && p->res->end == r->end)
>> + if (region_res_match_cxl_range(p, r))
>> return 1;
>
> if we don't change p->res directly, this isn't needed.
It does get changed, so this is needed. A lot of these changes were made after tripping over setup failures during testing and debugging.
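To illustrate, here is a minimal userspace sketch (with hypothetical stand-in types, not the kernel structures) of why the match helper must re-add the cache size once construct_region() has moved p->res->start back:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures, for illustration only. */
struct range { unsigned long long start, end; };
struct params { struct range res; unsigned long long cache_size; };

/*
 * Mirrors the comparison in region_res_match_cxl_range() from the patch:
 * p->res->start was moved back by cache_size, while the decoder's
 * hpa_range still covers only the CXL portion, so the comparison must
 * add cache_size back on the region side.
 */
static bool res_match_cxl_range(const struct params *p, const struct range *r)
{
	return p->res.start + p->cache_size == r->start &&
	       p->res.end == r->end;
}
```

With a 2G cache fronting a 2G CXL range, the region resource starts 2G below the decoder's hpa_range, and only the offset comparison matches.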
>
>> return 0;
>> @@ -1424,8 +1439,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>> if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
>> if (cxld->interleave_ways != iw ||
>> cxld->interleave_granularity != ig ||
>> - cxld->hpa_range.start != p->res->start ||
>> - cxld->hpa_range.end != p->res->end ||
>> + !region_res_match_cxl_range(p, &cxld->hpa_range) ||
>
> similar
>
>> ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
>> dev_err(&cxlr->dev,
>> "%s:%s %s expected iw: %d ig: %d %pr\n",
>> @@ -1949,7 +1963,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>> return -ENXIO;
>> }
>>
>> - if (resource_size(cxled->dpa_res) * p->interleave_ways !=
>> + if (resource_size(cxled->dpa_res) * p->interleave_ways + p->cache_size !=
>> resource_size(p->res)) {
>
> similar
>
>> dev_dbg(&cxlr->dev,
>> "%s:%s: decoder-size-%#llx * ways-%d != region-size-%#llx\n",
>> @@ -3221,6 +3235,45 @@ static int match_region_by_range(struct device *dev, void *data)
>> return rc;
>> }
>>
>> +static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
>> + struct resource *res)
>> +{
>> + struct cxl_region_params *p = &cxlr->params;
>> + int nid = phys_to_target_node(res->start);
>> + resource_size_t size, cache_size;
>> + int rc;
>> +
>> + size = resource_size(res);
>> + if (!size)
>> + return -EINVAL;
>> +
>> + rc = cxl_acpi_get_extended_linear_cache_size(res, nid, &cache_size);
>> + if (rc)
>> + return rc;
>> +
>> + if (!cache_size)
>> + return 0;
>> +
>> + if (size != cache_size) {
>> + dev_warn(&cxlr->dev, "Extended Linear Cache is not 1:1, unsupported!");
>> + return -EOPNOTSUPP;
>> + }
>> +
>> + /*
>> + * Move the start of the range to where the cache range starts. The
>> + * implementation assumes that the cache range is in front of the
>> + * CXL range. This is not dictated by the HMAT spec but is how the
>> + * current known implementation is configured.
>> + *
>> + * The cache range is expected to be within the CFMWS. The adjusted
>> + * res->start should not be less than cxlrd->res->start.
>
> Check for 'cache range is expected to be within the CFMWS' ?
Will add
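Something along these lines, sketched as a userspace model (hypothetical names; in the real code cxlrd->res is the CFMWS window):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the planned check: after moving res->start back
 * by cache_size, the adjusted start must not underflow and must not fall
 * below the root decoder's (CFMWS) window start. */
struct window { unsigned long long start, end; };

static bool cache_within_cfmws(const struct window *cfmws,
			       unsigned long long res_start,
			       unsigned long long cache_size)
{
	return res_start >= cache_size &&
	       res_start - cache_size >= cfmws->start;
}
```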
>
>
>> + */
>> + res->start -= cache_size;
>> + p->cache_size = cache_size;
>> +
>> + return 0;
>> +}
>> +
>> /* Establish an empty region covering the given HPA range */
>> static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>> struct cxl_endpoint_decoder *cxled)
>> @@ -3267,6 +3320,18 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>>
>> *res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
>> dev_name(&cxlr->dev));
>> +
>> + rc = cxl_extended_linear_cache_resize(cxlr, res);
>> + if (rc) {
>> + /*
>> + * Failing to support extended linear cache region resize does not
>> + * prevent the region from functioning. Only causes cxl list showing
>> + * incorrect region size.
>
> Also cxlr_hpa_cache_alias() lookups will fail for cxl events, so no
> hpa_alias in trace events.
Right. But it needs to report the near-memory alias vs the CXL address. hpa_alias is used interchangeably and is not necessarily specific to near or far memory.
>
>> + */
>> + dev_warn(cxlmd->dev.parent,
>> + "Failed to support extended linear cache.\n");
>
> Maybe more specifics of what is/isn't present.
It's just a general catch-all for whatever failures occur when retrieving the cache size and calculating the start address.
>
>> + }
>> +
>> rc = insert_resource(cxlrd->res, res);
>
> Cut off in this diff is the "p->res = res" assignment that follows,
> which then makes all the previous changes regarding matching decoder
> ranges necessary.
Yes.
>
>
>> if (rc) {
>> /*
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> snip
>
>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> snip
>
>> diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> snip
>
2025-01-17 17:28 [PATCH v3 0/4] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
2025-01-17 17:28 ` [PATCH v3 1/4] acpi: numa: Add support to enumerate and store extended linear address mode Dave Jiang
2025-02-21 1:42 ` Alison Schofield
2025-02-21 23:45 ` Dave Jiang
2025-01-17 17:28 ` [PATCH v3 2/4] acpi/hmat / cxl: Add extended linear cache support for CXL Dave Jiang
2025-02-21 3:09 ` Alison Schofield
2025-02-22 0:13 ` Dave Jiang [this message]
2025-01-17 17:28 ` [PATCH v3 3/4] cxl: Add extended linear cache address alias emission for cxl events Dave Jiang
2025-01-21 15:02 ` Jonathan Cameron
2025-02-21 1:30 ` Alison Schofield
2025-02-24 17:32 ` Dave Jiang
2025-01-17 17:28 ` [PATCH v3 4/4] cxl: Add mce notifier to emit aliased address for extended linear cache Dave Jiang
2025-01-20 5:05 ` Li Ming
2025-01-21 15:14 ` Dave Jiang