Re: [PATCH v2] ACPI: NUMA: debug invalid unused PXM value for CFMWs

public inbox for linux-kernel@vger.kernel.org

From: Yuquan Wang <wangyuquan1236@phytium.com.cn>
To: Alison Schofield <alison.schofield@intel.com>
Cc: rafael@kernel.org, lenb@kernel.org, dave@stgolabs.net,
	jonathan.cameron@huawei.com, dave.jiang@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com,
	dan.j.williams@intel.com, rrichter@amd.com, bfaccini@nvidia.com,
	rppt@kernel.org, haibo1.xu@intel.com, chenbaozi@phytium.com.cn,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-cxl@vger.kernel.org
Subject: Re: [PATCH v2] ACPI: NUMA: debug invalid unused PXM value for CFMWs
Date: Fri, 14 Mar 2025 15:55:44 +0800	[thread overview]
Message-ID: <Z9PhAKEFj5ugVuow@phytium.com.cn> (raw)
In-Reply-To: <Z9MHvp6GA_iGwfg0@aschofie-mobl2.lan>

On Thu, Mar 13, 2025 at 09:28:46AM -0700, Alison Schofield wrote:
> On Thu, Mar 13, 2025 at 02:09:07PM +0800, Yuquan Wang wrote:
> > The absence of an SRAT causes fake_pxm to be -1; it is then incremented
> > to 0 and passed to acpi_parse_cfmws(). If there exist CXL memory
> > ranges that are defined in the CFMWS but not already defined in the
> > SRAT, the new node (node0) for the CXL memory would be invalid, as
> > node0 is already in use.
> > 
> > This utilizes disable_srat() & srat_disabled() to fail CXL init.
> 
> Seems like this fixup has drifted from adjusting the fake_pxm to 
> shutting down CXL parsing. More below -
> 
> > 
> > Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
> > ---
> > 
> > Changes in v2:
> > - Add disable_srat() when fake_pxm is invalid
> > - Add srat_disabled() check in cxl_acpi_probe() and acpi_parse_cfmws()
> > 
> > 
> >  drivers/acpi/numa/srat.c | 10 ++++++++++
> >  drivers/cxl/acpi.c       |  4 ++++
> >  2 files changed, 14 insertions(+)
> > 
> > diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> > index 00ac0d7bb8c9..2dac25c9258a 100644
> > --- a/drivers/acpi/numa/srat.c
> > +++ b/drivers/acpi/numa/srat.c
> > @@ -441,6 +441,11 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> >  	start = cfmws->base_hpa;
> >  	end = cfmws->base_hpa + cfmws->window_size;
> >  
> > +	if (srat_disabled()) {
> > +		pr_err("SRAT is missing or bad while processing CFMWS.\n");
> > +		return -EINVAL;
> > +	}
> > +
> 
> This goes too far by shutting down cfmws parsing for lack of SRAT.
>

Actually, I think another patch is needed to fix the following problem,
which the fake node introduces when there is no SRAT.

Detailed description below.

> >  	/*
> >  	 * The SRAT may have already described NUMA details for all,
> >  	 * or a portion of, this CFMWS HPA range. Extend the memblks
> > @@ -646,6 +651,11 @@ int __init acpi_numa_init(void)
> >  		if (node_to_pxm_map[i] > fake_pxm)
> >  			fake_pxm = node_to_pxm_map[i];
> >  	}
> > +
> > +	/* Make sure CFMWs fake nodes start at node[1] */
> > +	if (fake_pxm < 0)
> > +		disable_srat();
> > +
> 
> How does the code above make sure fake node starts at node[1]?
> Would an explicit adjustment like this work?

Thanks for the correction :) Yes, the approach I used here is too implicit.

> 
> -       last_real_pxm = fake_pxm;
> -       fake_pxm++;
> +       fake_pxm = max(fake_pxm, 1);
> +       last_real_pxm = fake_pxm--;

I tried the following adjustment:

	fake_pxm = max(fake_pxm, 0); // 0 because it will increment to 1  
	last_real_pxm = fake_pxm++;

This works, but it only controls the PXM value passed to acpi_parse_cfmws().
acpi_map_pxm_to_node() gives an unmapped PXM the first node that is not yet
set in nodes_found_map, whatever the PXM value is, so although the input
fake_pxm is 1 when there is no SRAT, the returned node is still 0 and the
following nodes are also incorrect.
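A simplified user-space model of the fake_pxm arithmetic (not kernel code) may help make this concrete. It assumes, as in acpi_numa_init(), that node_to_pxm_map[] holds -1 for every unused entry when no SRAT was parsed; NR_NODES and the helper names are hypothetical. The adjusted version moves the first fake PXM from 0 to 1, but that PXM still has to go through acpi_map_pxm_to_node() to become a node.

```c
#include <assert.h>

#define NR_NODES 4 /* hypothetical, small stand-in for MAX_NUMNODES */

/* Scan for the highest real PXM, starting from -1 as acpi_numa_init() does. */
static int max_real_pxm(const int node_to_pxm_map[NR_NODES])
{
	int fake_pxm = -1;

	for (int i = 0; i < NR_NODES; i++)
		if (node_to_pxm_map[i] > fake_pxm)
			fake_pxm = node_to_pxm_map[i];
	return fake_pxm;
}

/* Current code: last_real_pxm = fake_pxm; fake_pxm++; */
static int first_fake_pxm_current(const int map[NR_NODES], int *last_real_pxm)
{
	int fake_pxm = max_real_pxm(map);

	*last_real_pxm = fake_pxm;
	fake_pxm++;
	return fake_pxm;
}

/* Adjusted: fake_pxm = max(fake_pxm, 0); last_real_pxm = fake_pxm++; */
static int first_fake_pxm_adjusted(const int map[NR_NODES], int *last_real_pxm)
{
	int fake_pxm = max_real_pxm(map);

	if (fake_pxm < 0)
		fake_pxm = 0;
	*last_real_pxm = fake_pxm++;
	return fake_pxm;
}
```

With no SRAT (all entries -1) the current code hands PXM 0 to acpi_parse_cfmws() and the adjusted code hands PXM 1; with real PXMs 0 and 1 both versions hand out PXM 2.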

Hence, I tried adding one more line:

	fake_pxm = max(fake_pxm, 0);
	last_real_pxm = fake_pxm++;
	node_set(0, nodes_found_map);

Since node[0] is found and set in every case anyway, I set it up front.
With this setting, acpi_map_pxm_to_node() returns the expected node value
even without an SRAT. :(
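The node allocation described above can be sketched as a user-space model. It is a simplified reading of acpi_map_pxm_to_node(), assuming an unmapped PXM simply receives first_unset_node(nodes_found_map); struct pxm_model, the MODEL_* limits and the helpers are hypothetical stand-ins for the kernel's maps and nodemask helpers.

```c
#include <assert.h>

#define MODEL_MAX_NODES 8  /* hypothetical stand-in for MAX_NUMNODES */
#define MODEL_MAX_PXM   16 /* hypothetical stand-in for MAX_PXM_DOMAINS */
#define NO_NODE (-1)       /* stands in for NUMA_NO_NODE */

struct pxm_model {
	int pxm_to_node[MODEL_MAX_PXM];
	unsigned long nodes_found; /* bit n set: node n already in use */
};

static void model_init(struct pxm_model *m)
{
	for (int i = 0; i < MODEL_MAX_PXM; i++)
		m->pxm_to_node[i] = NO_NODE;
	m->nodes_found = 0;
}

/* Equivalent of first_unset_node(nodes_found_map). */
static int first_unset(const struct pxm_model *m)
{
	for (int n = 0; n < MODEL_MAX_NODES; n++)
		if (!(m->nodes_found & (1UL << n)))
			return n;
	return MODEL_MAX_NODES;
}

/*
 * Simplified acpi_map_pxm_to_node(): an unmapped PXM gets the first
 * node not yet in nodes_found, regardless of the PXM's numeric value.
 */
static int map_pxm_to_node(struct pxm_model *m, int pxm)
{
	int node;

	if (pxm < 0 || pxm >= MODEL_MAX_PXM)
		return NO_NODE;
	node = m->pxm_to_node[pxm];
	if (node == NO_NODE) {
		node = first_unset(m);
		if (node >= MODEL_MAX_NODES)
			return NO_NODE;
		m->pxm_to_node[pxm] = node;
		m->nodes_found |= 1UL << node;
	}
	return node;
}
```

In this model, mapping fake PXM 1 with an empty nodes_found yields node 0; pre-setting bit 0 (the model's equivalent of node_set(0, nodes_found_map)) makes the same PXM map to node 1.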

Unfortunately, when we use "cxl create-region" to enable our CXL memory,
it is still assigned to node[0], because numa_add_memblk() can only add a
numa_memblk to the numa_meminfo list.

If our SRAT is OK, numa_memblks_init() then uses numa_move_tail_memblk()
to move the CFMWS fake node's numa_memblk from numa_meminfo to
numa_reserved_meminfo. If the SRAT is missing or bad, numa_memblks_init()
fails because init_func() fails. As a result, numa_reserved_meminfo
contains no numa_memblk, and the dax and memory_hotplug drivers that run
later cannot online the expected fake node.
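A rough user-space model of the two lists may clarify why the fake node disappears when init fails. It assumes, as a simplification, that the cleanup step of numa_memblks_init() moves every memblk that does not overlap boot RAM to the reserved list with numa_move_tail_memblk() semantics; struct memblk, struct meminfo and all helpers here are hypothetical, not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_BLKS 8 /* hypothetical stand-in for NR_NODE_MEMBLKS */
#define NO_NODE (-1)    /* stands in for NUMA_NO_NODE */

struct memblk { uint64_t start, end; int nid; };
struct meminfo { int nr_blks; struct memblk blk[MODEL_NR_BLKS]; };

static int add_memblk_to(struct meminfo *mi, int nid, uint64_t start, uint64_t end)
{
	if (start >= end || mi->nr_blks >= MODEL_NR_BLKS)
		return -1;
	mi->blk[mi->nr_blks++] = (struct memblk){ start, end, nid };
	return 0;
}

/* Like numa_move_tail_memblk(): append src->blk[idx] to dst, drop it from src. */
static void move_tail_memblk(struct meminfo *dst, int idx, struct meminfo *src)
{
	assert(idx >= 0 && idx < src->nr_blks && dst->nr_blks < MODEL_NR_BLKS);
	dst->blk[dst->nr_blks++] = src->blk[idx];
	for (int i = idx; i < src->nr_blks - 1; i++)
		src->blk[i] = src->blk[i + 1];
	src->nr_blks--;
}

/* Simplified cleanup: blocks outside boot RAM [ram_start, ram_end) go to reserved. */
static void move_unbacked_to_reserved(struct meminfo *mi, struct meminfo *reserved,
				      uint64_t ram_start, uint64_t ram_end)
{
	for (int i = 0; i < mi->nr_blks; i++) {
		const struct memblk *b = &mi->blk[i];

		if (b->end <= ram_start || b->start >= ram_end)
			move_tail_memblk(reserved, i--, mi); /* re-check the shifted slot */
	}
}

/* Find the node owning addr in one list, or NO_NODE. */
static int meminfo_to_nid(const struct meminfo *mi, uint64_t addr)
{
	for (int i = 0; i < mi->nr_blks; i++)
		if (addr >= mi->blk[i].start && addr < mi->blk[i].end)
			return mi->blk[i].nid;
	return NO_NODE;
}
```

If the cleanup step runs, a lookup of a CFMWS address in the reserved list returns the fake node; if init fails before it, the reserved list stays empty and the same lookup returns NO_NODE.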

Based on the above problem, I have an idea for a new patch that introduces
a new function in mm/numa_memblks.c, numa_add_reserved_memblk(), which adds
a numa_memblk directly to numa_reserved_meminfo. We could call it in
acpi_parse_cfmws() if the SRAT is missing.

In mm/numa_memblks.c:

	int __init numa_add_reserved_memblk(int nid, u64 start, u64 end)
	{		
		return numa_add_memblk_to(nid, start, end, &numa_reserved_meminfo);
	}

In drivers/acpi/numa/srat.c:

	if (srat_disabled()) {
		if (numa_add_reserved_memblk(node, start, end) < 0) {
			pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
				node, start, end);
		}
	}

:( But the dax_kmem driver then fails because something goes wrong in
memory_group_register_static(). The good news is that our CXL memory is no
longer assigned to node[0]!

BTW, since papering over these issues does not look easy, I chose to
aggressively fail acpi_parse_cfmws() in srat.c, since it mainly serves to
build CXL fake nodes, and also to fail CXL init in cxl_acpi_probe(), as
Jonathan suggested.

Link: https://lists.nongnu.org/archive/html/qemu-devel/2025-03/msg03668.html

More comments to guide me would be very welcome! I'm a real rookie in the kernel community :P

> >  	last_real_pxm = fake_pxm;
> >  	fake_pxm++;
> >  	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws,
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index cb14829bb9be..e75a8ead99f6 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -829,6 +829,10 @@ static int cxl_acpi_probe(struct platform_device *pdev)
> >  	if (rc)
> >  		return rc;
> >  
> > +	/* CXL must be in a NUMA system */
> > +	if (srat_disabled())
> > +		return -EINVAL;
> > +
> >  	cxl_res = devm_kzalloc(host, sizeof(*cxl_res), GFP_KERNEL);
> >  	if (!cxl_res)
> >  		return -ENOMEM;
> > -- 
> > 2.34.1
> >


Thread overview: 8+ messages
2025-03-13  6:09 [PATCH v2] ACPI: NUMA: debug invalid unused PXM value for CFMWs Yuquan Wang
2025-03-13 15:02 ` Gregory Price
2025-03-14 10:12   ` Jonathan Cameron
2025-03-14 13:38     ` Gregory Price
2025-03-14 16:55       ` Jonathan Cameron
2025-03-13 16:28 ` Alison Schofield
2025-03-14  7:55   ` Yuquan Wang [this message]
2025-03-14 16:55     ` Alison Schofield