From: Mike Rapoport <rppt@kernel.org>
To: Marc Zyngier <maz@kernel.org>
Cc: linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Zi Yan <ziy@nvidia.com>,
Dan Williams <dan.j.williams@intel.com>,
David Hildenbrand <david@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
stable@vger.kernel.org
Subject: Re: [PATCH] arch_numa: Restore nid checks before registering a memblock with a node
Date: Fri, 29 Nov 2024 11:23:24 +0200 [thread overview]
Message-ID: <Z0mIDBD4KLyxyOCm@kernel.org> (raw)
In-Reply-To: <87v7w6sa5s.wl-maz@kernel.org>
On Fri, Nov 29, 2024 at 08:42:55AM +0000, Marc Zyngier wrote:
> On Fri, 29 Nov 2024 08:24:16 +0000,
> Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Thu, Nov 28, 2024 at 04:52:14PM +0000, Marc Zyngier wrote:
> > > Hi Mike,
> > >
> > > On Thu, 28 Nov 2024 07:03:33 +0000,
> > > Mike Rapoport <rppt@kernel.org> wrote:
> > > >
> > > > Hi Marc,
> > > >
> > > > > diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
> > > > > index e187016764265..5457248eb0811 100644
> > > > > --- a/drivers/base/arch_numa.c
> > > > > +++ b/drivers/base/arch_numa.c
> > > > > @@ -207,7 +207,21 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
> > > > > static int __init numa_register_nodes(void)
> > > > > {
> > > > > int nid;
> > > > > -
> > > > > + struct memblock_region *mblk;
> > > > > +
> > > > > + /* Check that valid nid is set to memblks */
> > > > > + for_each_mem_region(mblk) {
> > > > > + int mblk_nid = memblock_get_region_node(mblk);
> > > > > + phys_addr_t start = mblk->base;
> > > > > + phys_addr_t end = mblk->base + mblk->size - 1;
> > > > > +
> > > > > + if (mblk_nid == NUMA_NO_NODE || mblk_nid >= MAX_NUMNODES) {
> > > > > + pr_warn("Warning: invalid memblk node %d [mem %pap-%pap]\n",
> > > > > + mblk_nid, &start, &end);
> > > > > + return -EINVAL;
> > > > > + }
> > > >
> > > > We have memblock_validate_numa_coverage() that checks that amount of memory
> > > > with unset node id is less than a threshold. The loop here can be replaced
> > > > with something like
> > > >
> > > > if (!memblock_validate_numa_coverage(0))
> > > > return -EINVAL;
> > >
> > > Unfortunately, that doesn't seem to result in something that works
> > > (relevant extract only):
> > >
> > > [ 0.000000] NUMA: no nodes coverage for 9MB of 65516MB RAM
> > > [ 0.000000] NUMA: Faking a node at [mem 0x0000000000500000-0x0000000fff0fffff]
> > > [ 0.000000] NUMA: no nodes coverage for 0MB of 65516MB RAM
> > > [ 0.000000] Unable to handle kernel paging request at virtual address 0000000000001d40
> > >
> > > Any idea?
> >
> > With 0 as the threshold the validation fails for the fake node, but it
> > should be fine with memblock_validate_numa_coverage(1)
>
> Huh, subtle. This indeed seems to work. I'll respin the patch next week.
With the patch below memblock_validate_numa_coverage(0) should also work
and it makes more sense.
@Andrew, I can take both this and Marc's new patch via memblock tree if you
prefer.
From de55af44c02bc9aa43f05a785ac66a5aafa43354 Mon Sep 17 00:00:00 2001
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Date: Fri, 29 Nov 2024 11:13:47 +0200
Subject: [PATCH] memblock: allow zero threshold in validate_numa_converage()
Currently memblock validate_numa_converage() returns false negative when
threshold set to zero.
Make the check if the memory size with invalid node ID is greater than
the threshold exclusive to fix that.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
mm/memblock.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index 0389ce5cd281..095c18b5c430 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -735,7 +735,7 @@ int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
/**
* memblock_validate_numa_coverage - check if amount of memory with
* no node ID assigned is less than a threshold
- * @threshold_bytes: maximal number of pages that can have unassigned node
+ * @threshold_bytes: maximal memory size that can have unassigned node
* ID (in bytes).
*
* A buggy firmware may report memory that does not belong to any node.
@@ -755,7 +755,7 @@ bool __init_memblock memblock_validate_numa_coverage(unsigned long threshold_byt
nr_pages += end_pfn - start_pfn;
}
- if ((nr_pages << PAGE_SHIFT) >= threshold_bytes) {
+ if ((nr_pages << PAGE_SHIFT) > threshold_bytes) {
mem_size_mb = memblock_phys_mem_size() >> 20;
pr_err("NUMA: no nodes coverage for %luMB of %luMB RAM\n",
(nr_pages << PAGE_SHIFT) >> 20, mem_size_mb);
--
2.45.2
> Thanks for your help,
>
> M.
--
Sincerely yours,
Mike.
next prev parent reply other threads:[~2024-11-29 9:24 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-27 19:30 [PATCH] arch_numa: Restore nid checks before registering a memblock with a node Marc Zyngier
2024-11-28 7:03 ` Mike Rapoport
2024-11-28 16:52 ` Marc Zyngier
2024-11-29 8:24 ` Mike Rapoport
2024-11-29 8:42 ` Marc Zyngier
2024-11-29 9:23 ` Mike Rapoport [this message]
2024-11-29 10:41 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z0mIDBD4KLyxyOCm@kernel.org \
--to=rppt@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=catalin.marinas@arm.com \
--cc=dan.j.williams@intel.com \
--cc=david@redhat.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maz@kernel.org \
--cc=stable@vger.kernel.org \
--cc=will@kernel.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.