public inbox for linux-kernel@vger.kernel.org
From: Jan Beulich <jbeulich@suse.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Andrew Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86/NUMA: don't pass MAX_NUMNODES to memblock_set_node()
Date: Wed, 29 May 2024 18:00:58 +0200
Message-ID: <997fcbc7-4e75-4aa2-974c-15d984f02d02@suse.com>
In-Reply-To: <e33ec69b-21e0-46e3-9b70-6d89548a145b@intel.com>

On 29.05.2024 17:36, Dave Hansen wrote:
> On 5/29/24 00:42, Jan Beulich wrote:
>> On an (old) x86 system with SRAT just covering space above 4Gb:
>>
>>     ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0xfffffffff] hotplug
> 
> OK, so you've got a system with buggy NUMA information.  It _used_ to
> "refuse" the NUMA configuration.  Now it tries to move forward and
> eventually does a NULL deref in memmap_init().
> 
> Right?

Yes.

>> the commit referenced below leads to this NUMA configuration no longer
>> being refused by a CONFIG_NUMA=y kernel (previously
>>
>>     NUMA: nodes only cover 6144MB of your 8185MB e820 RAM. Not used.
>>     No NUMA configuration found
>>     Faking a node at [mem 0x0000000000000000-0x000000027fffffff]
>>
>> was seen in the log directly after the message quoted above), because of
>> memblock_validate_numa_coverage() checking for NUMA_NO_NODE (only). This
>> in turn led to memblock_alloc_range_nid()'s warning about MAX_NUMNODES
>> triggering, followed by a NULL deref in memmap_init() when trying to
>> access node 64's (NODE_SHIFT=6) node data.
> 
> This is a really oblique way of saying:
> 
> 	... followed by a NULL deref in memmap_init() of
> 	NODE_DATA(MAX_NUMNODES).
> 
>> To compensate said change, avoid passing MAX_NUMNODES to
>> memblock_set_node(). In turn numa_clear_kernel_node_hotplug()'s check
>> then also needs adjusting.
>>
>> Fixes: ff6c3d81f2e8 ("NUMA: optimize detection of memory with no node id assigned by firmware")
> 
> I was expecting to see MAX_NUMNODES checks in ff6c3d81f2e8 somewhere.
> But I don't see any in the numa_meminfo_cover_memory() or
> __absent_pages_in_range().
> 
> In other words, it's not completely clear why ff6c3d81f2e8 introduced
> this problem.

It is my understanding that said change, by preventing the NUMA
configuration from being rejected, resulted in different code paths
being taken. The observed crash occurs somewhat later than the point
where the "No NUMA configuration found" etc. messages used to appear.
Thus I don't really see a connection between said change lacking any
MAX_NUMNODES check and it having introduced the (only perceived?)
regression.

Jan
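
For readers following the thread, the failure mode described above can be
sketched as a small user-space model. This is not kernel code: struct region,
toy_node_data(), unassigned_bytes() and toy_map are hypothetical names made up
for illustration, and the memory layout is only loosely modelled on the quoted
log (2041MB below 4GB without a node, 6144MB above 4GB on node 0). It assumes
that memory left without a node had been marked with MAX_NUMNODES, as the patch
subject suggests.

```c
#include <assert.h>
#include <stddef.h>

#define NODE_SHIFT   6
#define MAX_NUMNODES (1 << NODE_SHIFT)  /* 64, so valid node ids are 0..63 */
#define NUMA_NO_NODE (-1)
#define MB(x)        ((unsigned long long)(x) << 20)

/* Hypothetical, simplified stand-in for a memory region with a node id. */
struct region {
	unsigned long long base, size;
	int nid;
};

/* Toy stand-in for per-node data: one slot per valid node id. */
static int node_slots[MAX_NUMNODES];

/* Return the slot for nid, or NULL when nid is outside 0..MAX_NUMNODES-1. */
static int *toy_node_data(int nid)
{
	return (nid >= 0 && nid < MAX_NUMNODES) ? &node_slots[nid] : NULL;
}

/*
 * Sum the sizes of regions that count as "no node assigned".
 * strict == 0: only NUMA_NO_NODE counts (the check as described above).
 * strict != 0: MAX_NUMNODES counts as well.
 */
static unsigned long long unassigned_bytes(const struct region *r, size_t n,
					   int strict)
{
	unsigned long long sum = 0;

	for (size_t i = 0; i < n; i++)
		if (r[i].nid == NUMA_NO_NODE ||
		    (strict && r[i].nid == MAX_NUMNODES))
			sum += r[i].size;
	return sum;
}

/* Assumed layout: low memory marked MAX_NUMNODES, high memory on node 0. */
static const struct region toy_map[] = {
	{ 0x0ULL,         MB(2041), MAX_NUMNODES },
	{ 0x100000000ULL, MB(6144), 0 },
};
```

In this model, unassigned_bytes(toy_map, 2, 0) yields 0, so a check looking only
for NUMA_NO_NODE sees nothing missing and would accept the configuration, while
unassigned_bytes(toy_map, 2, 1) yields MB(2041). Likewise toy_node_data(64)
returns NULL, which mirrors why looking up node data for MAX_NUMNODES cannot
succeed with NODE_SHIFT=6.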

Thread overview: 6+ messages
2024-05-29  7:42 [PATCH] x86/NUMA: don't pass MAX_NUMNODES to memblock_set_node() Jan Beulich
2024-05-29 15:36 ` Dave Hansen
2024-05-29 16:00   ` Jan Beulich [this message]
2024-05-29 16:08     ` Dave Hansen
2024-05-31  6:21       ` Jan Beulich
2024-05-31  9:42       ` Mike Rapoport
