* [RFC PATCH v3 0/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs
@ 2025-03-21 2:36 Yuquan Wang
2025-03-21 2:36 ` [RFC PATCH v3 1/2] mm: numa_memblks: introduce numa_add_reserved_memblk Yuquan Wang
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Yuquan Wang @ 2025-03-21 2:36 UTC (permalink / raw)
To: Jonathan.Cameron, dan.j.williams, rppt, akpm, david, bfaccini,
rafael, lenb, dave, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, rrichter, haibo1.xu
Cc: linux-acpi, linux-cxl, linux-kernel, chenbaozi, Yuquan Wang
Changes in v3:
- Introduce numa_add_reserved_memblk()
- Use more explicit way to make sure CFMWs fake node >= 1
- Replace numa_add_memblk() with numa_add_reserved_memblk() in acpi_parse_cfmws()
Problem
-------
The absence of SRAT would cause the fake_pxm to be -1 and increment
to 0, then send to acpi_parse_cfmws(). If there exists CXL memory
ranges that are defined in the CFMWS and not already defined in the
SRAT, the new node (node0) for the CXL memory would be invalid, as
node0 is already in "used", and all CXL memory might be online on
node0.
Control cxl fake node value
---------------------------
This utilizes node_set(0, nodes_found_map) to set pxm&node map. With
this setting, acpi_map_pxm_to_node() could return the expected node
value even if no SRAT.
numa_add_memblk() problem on this issue
---------------------------------------
If SRAT is valid, the numa_memblks_init() would then utilize
numa_move_tail_memblk() to move the numa_memblk from numa_meminfo to
numa_reserved_meminfo in CFMWs fake node situation.
If SRAT is missing or bad, the numa_memblks_init() would fail since
init_func() would fail. And it causes that no numa_memblk in
numa_reserved_meminfo list and the following dax_cxl driver could
find the expected fake node.
numa_add_reserved_memblk()
--------------------------
Use numa_add_reserved_memblk() to replace numa_add_memblk(), since
the cxl numa_memblk added by numa_add_memblk() would finally be moved
to numa_reserved_meminfo, and numa_add_reserved_memblk() here could
add cxl numa_memblk into reserved list directly. Hence, no matter
SRAT is good or not, cxl numa_memblk could be allocated to reserved
list.
dax_kmem problem
----------------
The dax_kmem driver will fail if no SRAT because something wrong in
memory_group_register_static(). The good result is our cxl memory would
not be assigned to node0 anymore!
Discussion
----------
As papering these things looks like not easily, in v2 patch[1] I chose
to aggressively fail the acpi_parse_cfmws() in srat.c since it mainly
works for building cxl fake nodes and also fail the CXL init in
cxl_acpi_probe per the discussion from Jonathan[2]. Actually, I am not
sure if this(v3) is the best way to handle this issue, so hopes more
comments to guide me!
Link:
[1] https://lore.kernel.org/linux-cxl/Z9RfnXQlr9G1qz00@aschofie-mobl2.lan/T/#t
[2] https://lists.nongnu.org/archive/html/qemu-devel/2025-03/msg03668.html
Yuquan Wang (2):
mm: numa_memblks: introduce numa_add_reserved_memblk
ACPI: NUMA: debug invalid unused PXM value for CFMWs
drivers/acpi/numa/srat.c | 11 ++++++++---
include/linux/numa_memblks.h | 1 +
mm/numa_memblks.c | 16 ++++++++++++++++
3 files changed, 25 insertions(+), 3 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [RFC PATCH v3 1/2] mm: numa_memblks: introduce numa_add_reserved_memblk
2025-03-21 2:36 [RFC PATCH v3 0/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs Yuquan Wang
@ 2025-03-21 2:36 ` Yuquan Wang
2025-03-26 23:33 ` Dan Williams
2025-03-21 2:36 ` [RFC PATCH v3 2/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs Yuquan Wang
2025-03-26 23:19 ` [RFC PATCH v3 0/2] " Dan Williams
2 siblings, 1 reply; 8+ messages in thread
From: Yuquan Wang @ 2025-03-21 2:36 UTC (permalink / raw)
To: Jonathan.Cameron, dan.j.williams, rppt, akpm, david, bfaccini,
rafael, lenb, dave, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, rrichter, haibo1.xu
Cc: linux-acpi, linux-cxl, linux-kernel, chenbaozi, Yuquan Wang
With numa_add_reserved_memblk(), kernel could add numa_memblk into
numa_reserved_meminfo directly.
In previous, such process relies on numa_add_memblk() adding to
numa_meminfo list first and then uses numa_move_tail_memblk() to
move one from numa_meminfo to numa_reserved_meminfo.
Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
---
include/linux/numa_memblks.h | 1 +
mm/numa_memblks.c | 16 ++++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/include/linux/numa_memblks.h b/include/linux/numa_memblks.h
index dd85613cdd86..991076cba7c5 100644
--- a/include/linux/numa_memblks.h
+++ b/include/linux/numa_memblks.h
@@ -22,6 +22,7 @@ struct numa_meminfo {
};
int __init numa_add_memblk(int nodeid, u64 start, u64 end);
+int __init numa_add_reserved_memblk(int nid, u64 start, u64 end);
void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi);
int __init numa_cleanup_meminfo(struct numa_meminfo *mi);
diff --git a/mm/numa_memblks.c b/mm/numa_memblks.c
index ff4054f4334d..e70c76cc46dd 100644
--- a/mm/numa_memblks.c
+++ b/mm/numa_memblks.c
@@ -200,6 +200,22 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
return numa_add_memblk_to(nid, start, end, &numa_meminfo);
}
+/**
+ * numa_add_reserved_memblk - Add one numa_memblk to numa_reserved_meminfo
+ * @nid: NUMA node ID of the new memblk
+ * @start: Start address of the new memblk
+ * @end: End address of the new memblk
+ *
+ * Add a new memblk to the numa_reserved_meminfo.
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+int __init numa_add_reserved_memblk(int nid, u64 start, u64 end)
+{
+ return numa_add_memblk_to(nid, start, end, &numa_reserved_meminfo);
+}
+
/**
* numa_cleanup_meminfo - Cleanup a numa_meminfo
* @mi: numa_meminfo to clean up
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC PATCH v3 2/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs
2025-03-21 2:36 [RFC PATCH v3 0/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs Yuquan Wang
2025-03-21 2:36 ` [RFC PATCH v3 1/2] mm: numa_memblks: introduce numa_add_reserved_memblk Yuquan Wang
@ 2025-03-21 2:36 ` Yuquan Wang
2025-03-21 11:51 ` Jonathan Cameron
2025-03-26 23:48 ` Dan Williams
2025-03-26 23:19 ` [RFC PATCH v3 0/2] " Dan Williams
2 siblings, 2 replies; 8+ messages in thread
From: Yuquan Wang @ 2025-03-21 2:36 UTC (permalink / raw)
To: Jonathan.Cameron, dan.j.williams, rppt, akpm, david, bfaccini,
rafael, lenb, dave, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, rrichter, haibo1.xu
Cc: linux-acpi, linux-cxl, linux-kernel, chenbaozi, Yuquan Wang
The absence of SRAT would cause the fake_pxm to be -1 and increment
to 0, then send to acpi_parse_cfmws(). If there exists CXL memory
ranges that are defined in the CFMWS and not already defined in the
SRAT, the new node (node0) for the CXL memory would be invalid, as
node0 is already in "used", and all CXL memory might be online on
node0.
This utilizes node_set(0, nodes_found_map) to set pxm&node map. With
this setting, acpi_map_pxm_to_node() could return the expected node
value even if no SRAT.
If SRAT is valid, the numa_memblks_init() would then utilize
numa_move_tail_memblk() to move the numa_memblk from numa_meminfo to
numa_reserved_meminfo in CFMWs fake node situation.
If SRAT is missing or bad, the numa_memblks_init() would fail since
init_func() would fail. And it causes that no numa_memblk in
numa_reserved_meminfo list and the following dax_cxl driver could
find the expected fake node.
Use numa_add_reserved_memblk() to replace numa_add_memblk(), since
the cxl numa_memblk added by numa_add_memblk() would finally be moved
to numa_reserved_meminfo, and numa_add_reserved_memblk() here could
add cxl numa_memblk into reserved list directly. Hence, no matter
SRAT is good or not, cxl numa_memblk could be allocated to reserved
list.
Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
---
drivers/acpi/numa/srat.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 00ac0d7bb8c9..50bfecfb9c16 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -458,11 +458,12 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
return -EINVAL;
}
- if (numa_add_memblk(node, start, end) < 0) {
+ if (numa_add_reserved_memblk(node, start, end) < 0) {
/* CXL driver must handle the NUMA_NO_NODE case */
pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
node, start, end);
}
+
node_set(node, numa_nodes_parsed);
/* Set the next available fake_pxm value */
@@ -646,8 +647,12 @@ int __init acpi_numa_init(void)
if (node_to_pxm_map[i] > fake_pxm)
fake_pxm = node_to_pxm_map[i];
}
- last_real_pxm = fake_pxm;
- fake_pxm++;
+
+ /* Make sure CFMWs fake node >= 1 */
+ fake_pxm = max(fake_pxm, 0);
+ last_real_pxm = fake_pxm++;
+ node_set(0, nodes_found_map);
+
acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws,
&fake_pxm);
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [RFC PATCH v3 2/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs
2025-03-21 2:36 ` [RFC PATCH v3 2/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs Yuquan Wang
@ 2025-03-21 11:51 ` Jonathan Cameron
2025-03-26 15:00 ` Mike Rapoport
2025-03-26 23:48 ` Dan Williams
1 sibling, 1 reply; 8+ messages in thread
From: Jonathan Cameron @ 2025-03-21 11:51 UTC (permalink / raw)
To: Yuquan Wang
Cc: dan.j.williams, rppt, akpm, david, bfaccini, rafael, lenb, dave,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny, rrichter,
haibo1.xu, linux-acpi, linux-cxl, linux-kernel, chenbaozi
On Fri, 21 Mar 2025 10:36:02 +0800
Yuquan Wang <wangyuquan1236@phytium.com.cn> wrote:
> The absence of SRAT would cause the fake_pxm to be -1 and increment
> to 0, then send to acpi_parse_cfmws(). If there exists CXL memory
> ranges that are defined in the CFMWS and not already defined in the
> SRAT, the new node (node0) for the CXL memory would be invalid, as
> node0 is already in "used", and all CXL memory might be online on
> node0.
>
> This utilizes node_set(0, nodes_found_map) to set pxm&node map. With
> this setting, acpi_map_pxm_to_node() could return the expected node
> value even if no SRAT.
>
> If SRAT is valid, the numa_memblks_init() would then utilize
> numa_move_tail_memblk() to move the numa_memblk from numa_meminfo to
> numa_reserved_meminfo in CFMWs fake node situation.
I would call out that numa_move_tail_memblk() is called in
numa_cleanup_meminfo() which is indeed called by num_memblks_init()
>
> If SRAT is missing or bad, the numa_memblks_init() would fail since
> init_func() would fail. And it causes that no numa_memblk in
> numa_reserved_meminfo list and the following dax_cxl driver could
> find the expected fake node.
>
> Use numa_add_reserved_memblk() to replace numa_add_memblk(), since
> the cxl numa_memblk added by numa_add_memblk() would finally be moved
> to numa_reserved_meminfo, and numa_add_reserved_memblk() here could
> add cxl numa_memblk into reserved list directly. Hence, no matter
> SRAT is good or not, cxl numa_memblk could be allocated to reserved
> list.
>
> Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
This definitely wants input from Mike Rapoport.
Looks fine to me, but there may be some subtle corners I'm missing.
> ---
> drivers/acpi/numa/srat.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 00ac0d7bb8c9..50bfecfb9c16 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -458,11 +458,12 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> return -EINVAL;
> }
>
> - if (numa_add_memblk(node, start, end) < 0) {
> + if (numa_add_reserved_memblk(node, start, end) < 0) {
> /* CXL driver must handle the NUMA_NO_NODE case */
> pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
> node, start, end);
> }
> +
Unrelated change. Always give patches a final look through to spot
things like this. Trivial, but they all add noise to what we are focusing on.
> node_set(node, numa_nodes_parsed);
>
> /* Set the next available fake_pxm value */
> @@ -646,8 +647,12 @@ int __init acpi_numa_init(void)
> if (node_to_pxm_map[i] > fake_pxm)
> fake_pxm = node_to_pxm_map[i];
> }
> - last_real_pxm = fake_pxm;
> - fake_pxm++;
> +
> + /* Make sure CFMWs fake node >= 1 */
> + fake_pxm = max(fake_pxm, 0);
> + last_real_pxm = fake_pxm++;
> + node_set(0, nodes_found_map);
> +
> acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws,
> &fake_pxm);
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH v3 2/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs
2025-03-21 11:51 ` Jonathan Cameron
@ 2025-03-26 15:00 ` Mike Rapoport
0 siblings, 0 replies; 8+ messages in thread
From: Mike Rapoport @ 2025-03-26 15:00 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Yuquan Wang, dan.j.williams, akpm, david, bfaccini, rafael, lenb,
dave, dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
rrichter, haibo1.xu, linux-acpi, linux-cxl, linux-kernel,
chenbaozi
Hi,
On Fri, Mar 21, 2025 at 11:51:16AM +0000, Jonathan Cameron wrote:
> On Fri, 21 Mar 2025 10:36:02 +0800
> Yuquan Wang <wangyuquan1236@phytium.com.cn> wrote:
>
> > The absence of SRAT would cause the fake_pxm to be -1 and increment
> > to 0, then send to acpi_parse_cfmws(). If there exists CXL memory
> > ranges that are defined in the CFMWS and not already defined in the
> > SRAT, the new node (node0) for the CXL memory would be invalid, as
> > node0 is already in "used", and all CXL memory might be online on
> > node0.
> >
> > This utilizes node_set(0, nodes_found_map) to set pxm&node map. With
> > this setting, acpi_map_pxm_to_node() could return the expected node
> > value even if no SRAT.
> >
> > If SRAT is valid, the numa_memblks_init() would then utilize
> > numa_move_tail_memblk() to move the numa_memblk from numa_meminfo to
> > numa_reserved_meminfo in CFMWs fake node situation.
>
> I would call out that numa_move_tail_memblk() is called in
> numa_cleanup_meminfo() which is indeed called by num_memblks_init()
>
> >
> > If SRAT is missing or bad, the numa_memblks_init() would fail since
> > init_func() would fail. And it causes that no numa_memblk in
> > numa_reserved_meminfo list and the following dax_cxl driver could
> > find the expected fake node.
> >
> > Use numa_add_reserved_memblk() to replace numa_add_memblk(), since
> > the cxl numa_memblk added by numa_add_memblk() would finally be moved
> > to numa_reserved_meminfo, and numa_add_reserved_memblk() here could
> > add cxl numa_memblk into reserved list directly. Hence, no matter
> > SRAT is good or not, cxl numa_memblk could be allocated to reserved
> > list.
> >
> > Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
>
> This definitely wants input from Mike Rapoport.
> Looks fine to me, but there may be some subtle corners I'm missing.
I'm fine with exposing numa_add_reserved_memblk(), but I don't understand
CXL discovery enough to say if adding CXL ranges directly to
numa_reserved_meminfo.
If this is always the case that CFMW regions end up on
numa_reserved_meminfo, adding them there in the first place does make
sense.
> > ---
> > drivers/acpi/numa/srat.c | 11 ++++++++---
> > 1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> > index 00ac0d7bb8c9..50bfecfb9c16 100644
> > --- a/drivers/acpi/numa/srat.c
> > +++ b/drivers/acpi/numa/srat.c
> > @@ -458,11 +458,12 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> > return -EINVAL;
> > }
> >
> > - if (numa_add_memblk(node, start, end) < 0) {
> > + if (numa_add_reserved_memblk(node, start, end) < 0) {
> > /* CXL driver must handle the NUMA_NO_NODE case */
> > pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
> > node, start, end);
> > }
> > +
>
> Unrelated change. Always give patches a final look through to spot
> things like this. Trivial, but they all add noise to what we are focusing on.
>
> > node_set(node, numa_nodes_parsed);
> >
> > /* Set the next available fake_pxm value */
> > @@ -646,8 +647,12 @@ int __init acpi_numa_init(void)
> > if (node_to_pxm_map[i] > fake_pxm)
> > fake_pxm = node_to_pxm_map[i];
> > }
> > - last_real_pxm = fake_pxm;
> > - fake_pxm++;
> > +
> > + /* Make sure CFMWs fake node >= 1 */
> > + fake_pxm = max(fake_pxm, 0);
> > + last_real_pxm = fake_pxm++;
I'd make it more explicit:
/*
* Make sure CFMWs fake nodes follow last_real_pxm, even when SRAT
* is invalid
*/
last_real_pxm = max(fake_pxm, 0);
fake_pxm = last_real_pxm + 1;
> > + node_set(0, nodes_found_map);
> > +
> > acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws,
> > &fake_pxm);
> >
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH v3 0/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs
2025-03-21 2:36 [RFC PATCH v3 0/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs Yuquan Wang
2025-03-21 2:36 ` [RFC PATCH v3 1/2] mm: numa_memblks: introduce numa_add_reserved_memblk Yuquan Wang
2025-03-21 2:36 ` [RFC PATCH v3 2/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs Yuquan Wang
@ 2025-03-26 23:19 ` Dan Williams
2 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2025-03-26 23:19 UTC (permalink / raw)
To: Yuquan Wang, Jonathan.Cameron, dan.j.williams, rppt, akpm, david,
bfaccini, rafael, lenb, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, rrichter, haibo1.xu
Cc: linux-acpi, linux-cxl, linux-kernel, chenbaozi, Yuquan Wang
Yuquan Wang wrote:
> Changes in v3:
> - Introduce numa_add_reserved_memblk()
> - Use more explicit way to make sure CFMWs fake node >= 1
> - Replace numa_add_memblk() with numa_add_reserved_memblk() in acpi_parse_cfmws()
>
> Problem
> -------
> The absence of SRAT
Can you say more about CXL ACPI platforms that do not have an SRAT? That
seems like the first problem, and this information helps determine
whether this is a fix that needs backporting, or just a cleaner way to
handle this quirk.
> would cause the fake_pxm to be -1 and increment
Not an issue in your patch but it strikes me that "fake_pxm" is
misnamed. There is nothing "fake" about it. It is simply the first
synthesized CXL proximity domain for future CXL region provisioning.
I would not mind a separate patch renaming "fake_pxm" to "cxl_pxm", just
to not confuse future readers with the simulated nodes produced by
mm/numa_emulation.c.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH v3 1/2] mm: numa_memblks: introduce numa_add_reserved_memblk
2025-03-21 2:36 ` [RFC PATCH v3 1/2] mm: numa_memblks: introduce numa_add_reserved_memblk Yuquan Wang
@ 2025-03-26 23:33 ` Dan Williams
0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2025-03-26 23:33 UTC (permalink / raw)
To: Yuquan Wang, Jonathan.Cameron, dan.j.williams, rppt, akpm, david,
bfaccini, rafael, lenb, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, rrichter, haibo1.xu
Cc: linux-acpi, linux-cxl, linux-kernel, chenbaozi, Yuquan Wang
Yuquan Wang wrote:
> With numa_add_reserved_memblk(), kernel could add numa_memblk into
> numa_reserved_meminfo directly.
>
> In previous, such process relies on numa_add_memblk() adding to
> numa_meminfo list first and then uses numa_move_tail_memblk() to
> move one from numa_meminfo to numa_reserved_meminfo.
I would explicitly state the motivation and the use case for the patch.
---
acpi_parse_cfmws() currently adds empty CFMWS ranges to numa_meminfo
with the expectation that numa_cleanup_meminfo moves them to
numa_reserved_meminfo. There is no need for that indirection when it is
known in advance that these unpopulated ranges are meant for
numa_reserved_meminfo in suppot of future hotplug / CXL provisioning.
---
>
> Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
> ---
> include/linux/numa_memblks.h | 1 +
> mm/numa_memblks.c | 16 ++++++++++++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/include/linux/numa_memblks.h b/include/linux/numa_memblks.h
> index dd85613cdd86..991076cba7c5 100644
> --- a/include/linux/numa_memblks.h
> +++ b/include/linux/numa_memblks.h
> @@ -22,6 +22,7 @@ struct numa_meminfo {
> };
>
> int __init numa_add_memblk(int nodeid, u64 start, u64 end);
> +int __init numa_add_reserved_memblk(int nid, u64 start, u64 end);
> void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi);
>
> int __init numa_cleanup_meminfo(struct numa_meminfo *mi);
> diff --git a/mm/numa_memblks.c b/mm/numa_memblks.c
> index ff4054f4334d..e70c76cc46dd 100644
> --- a/mm/numa_memblks.c
> +++ b/mm/numa_memblks.c
> @@ -200,6 +200,22 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
> return numa_add_memblk_to(nid, start, end, &numa_meminfo);
> }
>
> +/**
> + * numa_add_reserved_memblk - Add one numa_memblk to numa_reserved_meminfo
> + * @nid: NUMA node ID of the new memblk
> + * @start: Start address of the new memblk
> + * @end: End address of the new memblk
> + *
> + * Add a new memblk to the numa_reserved_meminfo.
I would say a bit more here about when to use this function. Something
like:
"numa_cleanup_meminfo() reconciles all numa_memblk instances against
memblock_type information and moves any that intersect reserved ranges
to numa_reserved_meminfo. However, when that information is known ahead
of time add the numa_memblk to numa_reserved_meminfo directly."
With those 2 suggestions you can add:
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH v3 2/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs
2025-03-21 2:36 ` [RFC PATCH v3 2/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs Yuquan Wang
2025-03-21 11:51 ` Jonathan Cameron
@ 2025-03-26 23:48 ` Dan Williams
1 sibling, 0 replies; 8+ messages in thread
From: Dan Williams @ 2025-03-26 23:48 UTC (permalink / raw)
To: Yuquan Wang, Jonathan.Cameron, dan.j.williams, rppt, akpm, david,
bfaccini, rafael, lenb, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, rrichter, haibo1.xu
Cc: linux-acpi, linux-cxl, linux-kernel, chenbaozi, Yuquan Wang
Yuquan Wang wrote:
> The absence of SRAT would cause the fake_pxm to be -1 and increment
> to 0, then send to acpi_parse_cfmws(). If there exists CXL memory
> ranges that are defined in the CFMWS and not already defined in the
> SRAT, the new node (node0) for the CXL memory would be invalid, as
> node0 is already in "used", and all CXL memory might be online on
> node0.
It is still not clear to me why this is a problem. If there is no SRAT
and CXL is the first memory proximity domain in the system then it
should be 0.
In other words, if it is a problem that the kernel is picking node0 for
CXL memory when there is no SRAT, the problem is that there is no SRAT.
> This utilizes node_set(0, nodes_found_map) to set pxm&node map. With
> this setting, acpi_map_pxm_to_node() could return the expected node
> value even if no SRAT.
>
> If SRAT is valid, the numa_memblks_init() would then utilize
> numa_move_tail_memblk() to move the numa_memblk from numa_meminfo to
> numa_reserved_meminfo in CFMWs fake node situation.
>
> If SRAT is missing or bad, the numa_memblks_init() would fail since
> init_func() would fail. And it causes that no numa_memblk in
> numa_reserved_meminfo list and the following dax_cxl driver could
> find the expected fake node.
>
> Use numa_add_reserved_memblk() to replace numa_add_memblk(), since
> the cxl numa_memblk added by numa_add_memblk() would finally be moved
> to numa_reserved_meminfo, and numa_add_reserved_memblk() here could
> add cxl numa_memblk into reserved list directly. Hence, no matter
> SRAT is good or not, cxl numa_memblk could be allocated to reserved
> list.
Do you not have other problems due to numa_register_meminfo() not being
called?
I would really like to say that the platform is buggy without an SRAT
and you should not expect anything useful from a NUMA perspective on
such a platform. Everything showing up in node0 in that case sounds
right.
>
> Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
> ---
> drivers/acpi/numa/srat.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 00ac0d7bb8c9..50bfecfb9c16 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -458,11 +458,12 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> return -EINVAL;
> }
>
> - if (numa_add_memblk(node, start, end) < 0) {
> + if (numa_add_reserved_memblk(node, start, end) < 0) {
This change can move to patch1 with the new justification I suggested.
...then we can have the pxm fixup discussion separately.
> /* CXL driver must handle the NUMA_NO_NODE case */
> pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
> node, start, end);
> }
> +
> node_set(node, numa_nodes_parsed);
>
> /* Set the next available fake_pxm value */
> @@ -646,8 +647,12 @@ int __init acpi_numa_init(void)
> if (node_to_pxm_map[i] > fake_pxm)
> fake_pxm = node_to_pxm_map[i];
> }
> - last_real_pxm = fake_pxm;
> - fake_pxm++;
> +
> + /* Make sure CFMWs fake node >= 1 */
> + fake_pxm = max(fake_pxm, 0);
> + last_real_pxm = fake_pxm++;
> + node_set(0, nodes_found_map);
> +
> acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws,
> &fake_pxm);
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-03-26 23:48 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-21 2:36 [RFC PATCH v3 0/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs Yuquan Wang
2025-03-21 2:36 ` [RFC PATCH v3 1/2] mm: numa_memblks: introduce numa_add_reserved_memblk Yuquan Wang
2025-03-26 23:33 ` Dan Williams
2025-03-21 2:36 ` [RFC PATCH v3 2/2] ACPI: NUMA: debug invalid unused PXM value for CFMWs Yuquan Wang
2025-03-21 11:51 ` Jonathan Cameron
2025-03-26 15:00 ` Mike Rapoport
2025-03-26 23:48 ` Dan Williams
2025-03-26 23:19 ` [RFC PATCH v3 0/2] " Dan Williams
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox