linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Donet Tom <donettom@linux.ibm.com>
To: Mike Rapoport <rppt@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>, Zi Yan <ziy@nvidia.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	rafael@kernel.org, Danilo Krummrich <dakr@kernel.org>,
	Ritesh Harjani <ritesh.list@gmail.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Alison Schofield <alison.schofield@intel.com>,
	Yury Norov <yury.norov@gmail.com>,
	Dave Jiang <dave.jiang@intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/3] driver/base: Optimize memory block registration to reduce boot time
Date: Fri, 9 May 2025 21:10:34 +0530	[thread overview]
Message-ID: <d19ad41c-069d-436d-8fea-a05188adcb0e@linux.ibm.com> (raw)
In-Reply-To: <78bc6a1b-164e-4925-a624-a271a4499364@redhat.com>


On 5/8/25 2:48 PM, David Hildenbrand wrote:
> On 05.05.25 15:24, Mike Rapoport wrote:
>> On Mon, May 05, 2025 at 10:18:43AM +0200, David Hildenbrand wrote:
>>> On 05.05.25 09:53, Mike Rapoport wrote:
>>>> On Mon, May 05, 2025 at 09:38:43AM +0200, David Hildenbrand wrote:
>>>>> On 05.05.25 09:28, Oscar Salvador wrote:
>>>>>> On Mon, May 05, 2025 at 09:16:48AM +0200, David Hildenbrand wrote:
>>>>>>> memory hotplug code never calls register_one_node(), unless I am 
>>>>>>> missing
>>>>>>> something.
>>>>>>>
>>>>>>> During add_memory_resource(), we call __try_online_node(nid, 
>>>>>>> false), meaning
>>>>>>> we skip register_one_node().
>>>>>>>
>>>>>>> The only caller of __try_online_node(nid, true) is 
>>>>>>> try_online_node(), called
>>>>>>> from CPU hotplug code, and I *guess* that is not required.
>>>>>>
>>>>>> Well, I guess this is because we need to link the cpus to the node.
>>>>>> register_one_node() has two jobs: 1) register cpus belonging to 
>>>>>> the node
>>>>>> and 2) register memory-blocks belonging to the node (if any).
>>>>>
>>>>> Ah, via __register_one_node() ...
>>>>>
>>>>> I would assume that an offline node
>>>>>
>>>>> (1) has no memory
>>>>> (2) has no CPUs
>>>>>
>>>>> When we *hotplug* either memory or CPUs, and we first online the 
>>>>> node, there
>>>>> is nothing to register. Because if there would be something, the 
>>>>> node would
>>>>> already be online.
>>>>>
>>>>> In particular, try_offline_node() will only offline a node if
>>>>>
>>>>> (A) No present pages: No pages are spanned anymore. This includes
>>>>>       offline memory blocks.
>>>>> (B) No present CPUs.
>>>>>
>>>>> But maybe there is some case that I am missing ...
>>>>
>>>> I actually hoped you and Oscar know how that stuff works :)
>>>
>>> Well, I know how the memory side works, but the CPU side is giving 
>>> me a hard
>>> time :)
>>>
>>>>
>>>> I tried to figure what is going on there and it all looks really 
>>>> convoluted.
>>>
>>> Jap ...
>>>
>>>>
>>>> So, on boot we have
>>>>     cpu_up() ->
>>>>         try_online_node() ->
>>>>                bails out because all nodes are online (at least on
>>>>             x86 AFAIU, see 1ca75fa7f19d ("arch/x86/mm/numa: Do
>>>>                           not initialize nodes twice"))
>>>>     node_dev_init()i ->
>>>>         register_one_node() ->
>>>>             this one can use __register_one_node() and loop
>>>>             over memblock regions.
>>>>
>>>> And for the hotplug/unplug path, it seems that
>>>> register_memory_blocks_under_node(MEMINIT_EARLY) is superfluous, 
>>>> because if
>>>> a node had memory it wouldn't get offlined, and if we are 
>>>> hotplugging an
>>>> node with memory and cpus, memory hotplug anyway calls
>>>> register_memory_blocks_under_node_hotplug().
>>>>
>>>> So, IMHO, register_one_node() should not call
>>>> register_memory_blocks_under_node() at all, but again, I might have 
>>>> missed
>>>> something :)
>>>
>>> Hm, but someone has to create these links for the memory blocks.
>>
>> My understanding that the links for the memory blocks during hotplug 
>> are created in
>>
>> add_memory_resource()
>>    register_memory_blocks_under_node()
>
> Yes, during hotplug it's exactly that, after registering the node + 
> setting it online.
>
>>
>> So register_one_node() only calls register_memory_blocks_under_node() 
>> when
>> there are no actual memory resources under that node, isn't it?
>
> Except in early boot. That's why register_one_node() has:
>
>     register_memory_blocks_under_node(nid, start_pfn, end_pfn,
>                       MEMINIT_EARLY);
>
>                         ^ early :)
> And that is triggered by
>
>     node_dev_init()->register_one_node()
>
>
>>
>> Then we can drop the call to register_memory_blocks_under_node() from
>> register_one_node() and add creation of memory blocks to 
>> node_dev_init(),
>> i.e.
>>
>> node_dev_init()
>>    for_each_node(nid)
>>      __register_one_node(nid)
>>        for_each_mem_region()
>>          /* create memory block if node matches */
>
> Yes exactly, that makes sense.

Hi Andrew and Mike

Based on the discussion so far, it is clear that the patch will work in all cases,
including when CONFIG_ARCH_KEEP_MEMBLOCK  is disabled. Just checking —
would you prefer to take this version, or should I send a v4?

Thanks
Donet



  reply	other threads:[~2025-05-09 15:41 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-03  5:40 [PATCH v3 1/3] driver/base: Optimize memory block registration to reduce boot time Donet Tom
2025-05-03  5:40 ` [PATCH v3 2/3] driver/base: remove register_mem_block_under_node_early() Donet Tom
2025-05-03 13:10   ` Zi Yan
2025-05-03  5:40 ` [PATCH v3 3/3] drivers/base : Rename register_memory_blocks_under_node() and remove context argument Donet Tom
2025-05-03 13:10   ` Zi Yan
2025-05-03 13:10 ` [PATCH v3 1/3] driver/base: Optimize memory block registration to reduce boot time Zi Yan
2025-05-04 11:09 ` Mike Rapoport
2025-05-04 16:34   ` Donet Tom
2025-05-04 20:03     ` Andrew Morton
2025-05-05 14:05       ` Mike Rapoport
2025-05-05  7:16     ` David Hildenbrand
2025-05-05  7:28       ` Oscar Salvador
2025-05-05  7:38         ` David Hildenbrand
2025-05-05  7:53           ` Mike Rapoport
2025-05-05  8:18             ` David Hildenbrand
2025-05-05 13:24               ` Mike Rapoport
2025-05-08  9:18                 ` David Hildenbrand
2025-05-09 15:40                   ` Donet Tom [this message]
2025-05-09 21:10                     ` Andrew Morton
2025-05-11  6:40                       ` Donet Tom
2025-05-11  5:39                     ` Mike Rapoport
2025-05-11 12:33                       ` Donet Tom
2025-05-05  7:57           ` Oscar Salvador
2025-05-05  8:12             ` David Hildenbrand
2025-05-05  9:36               ` Oscar Salvador
2025-05-05 10:36                 ` David Hildenbrand
2025-05-05 12:51                   ` Donet Tom
2025-05-05 13:02                     ` David Hildenbrand
2025-05-05 16:40                       ` Donet Tom
2025-05-05 13:07                   ` Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d19ad41c-069d-436d-8fea-a05188adcb0e@linux.ibm.com \
    --to=donettom@linux.ibm.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alison.schofield@intel.com \
    --cc=dakr@kernel.org \
    --cc=dave.jiang@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=osalvador@suse.de \
    --cc=rafael@kernel.org \
    --cc=ritesh.list@gmail.com \
    --cc=rppt@kernel.org \
    --cc=yury.norov@gmail.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).