From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Donet Tom <donettom@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org,
	David Hildenbrand <david@redhat.com>,
	Ritesh Harjani <ritesh.list@gmail.com>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Danilo Krummrich <dakr@kernel.org>
Subject: Re: [PATCH] driver/base/node.c: Fix softlockups during the initialization of large systems with interleaved memory blocks
Date: Mon, 10 Mar 2025 13:52:28 +0100
Message-ID: <2025031051-gab-viability-e288@gregkh>
In-Reply-To: <20250310115305.13599-1-donettom@linux.ibm.com>

On Mon, Mar 10, 2025 at 06:53:05AM -0500, Donet Tom wrote:
> On large systems with more than 64TB of DRAM, if the memory blocks
> are interleaved, node initialization (node_dev_init()) could take
> a long time since it iterates over each memory block. If the memory
> block belongs to the current iterating node, the first pfn_to_nid
> will provide the correct value. Otherwise, it will iterate over all
> PFNs and check the nid. On non-preemptive kernels, this can result
> in a watchdog softlockup warning. Even though CONFIG_PREEMPT_LAZY
> is enabled in kernels now [1], we may still need to fix older
> stable kernels to avoid encountering these kernel warnings during
> boot.
> 
> This patch adds a cond_resched() call in node_dev_init() to avoid
> this warning.
> 
> node_dev_init()
>     register_one_node
>         register_memory_blocks_under_node
>             walk_memory_blocks()
>                 register_mem_block_under_node_early
>                     get_nid_for_pfn
>                         early_pfn_to_nid
> 
> On my system, node4 owns memory blocks memory30351 through
> memory38524, plus memory128433. The memory blocks between
> memory38524 and memory128433 do not belong to this node.
> 
> In walk_memory_blocks() we iterate over all memory blocks from
> memory38524 to memory128433.
> In register_mem_block_under_node_early(), up to memory38524, the
> first pfn returns the matching nid and the function returns
> immediately. But after memory38524 and until memory128433, the
> loop iterates through every pfn and checks its nid. Since none of
> them match the required nid, the loop runs to the end of each
> block. This causes the soft lockups.
> 
> [1]: https://lore.kernel.org/linuxppc-dev/20241116192306.88217-1-sshegde@linux.ibm.com/
> Fixes: 2848a28b0a60 ("drivers/base/node: consolidate node device subsystem initialization in node_dev_init()")
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> ---
>  drivers/base/node.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 0ea653fa3433..107eb508e28e 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -975,5 +975,6 @@ void __init node_dev_init(void)
>  		ret = register_one_node(i);
>  		if (ret)
>  			panic("%s() failed to add node: %d\n", __func__, ret);
> +		cond_resched();

That's a horrible hack, sorry, but no, we can't sprinkle this around in
random locations, especially as this is actually fixed by using a
different scheduler model as you say.

Why not just make the code faster so as to avoid the long time this
takes?

thanks,

greg k-h
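
For illustration only, the cost pattern described in the quoted commit
message can be sketched as a small user-space toy model. This is not
the code in drivers/base/node.c: every name below (toy_pfn_to_nid,
toy_register_block, toy_walk, struct toy_stats) and both sizes are
hypothetical and chosen only to keep the example small. It shows that a
block owned by the node costs one nid lookup, while a block owned by
another node costs one lookup per pfn.

```c
#include <assert.h>

/* Hypothetical sizes, picked only to keep the toy model small. */
#define PFNS_PER_BLOCK 4UL
#define NR_BLOCKS      8UL

/* Toy layout: node 4 owns blocks 0, 1 and 7; node 0 owns the rest. */
static const int block_nid[NR_BLOCKS] = { 4, 4, 0, 0, 0, 0, 0, 4 };

struct toy_stats {
	unsigned long registered;	/* blocks found to belong to nid */
	unsigned long lookups;		/* pfn -> nid lookups performed  */
};

/* Toy stand-in for a pfn -> nid lookup; counts every call. */
static int toy_pfn_to_nid(unsigned long pfn, unsigned long *lookups)
{
	(*lookups)++;
	return block_nid[pfn / PFNS_PER_BLOCK];
}

/* Returns 1 as soon as one pfn in the block matches nid, else 0. */
static int toy_register_block(unsigned long block, int nid,
			      unsigned long *lookups)
{
	unsigned long start = block * PFNS_PER_BLOCK;
	unsigned long pfn;

	for (pfn = start; pfn < start + PFNS_PER_BLOCK; pfn++) {
		if (toy_pfn_to_nid(pfn, lookups) != nid)
			continue;	/* mismatch: keep scanning this block */
		return 1;		/* match: stop scanning this block */
	}
	return 0;			/* scanned the whole block, no match */
}

/* Walks blocks first..last (inclusive) on behalf of node nid. */
static struct toy_stats toy_walk(unsigned long first, unsigned long last,
				 int nid)
{
	struct toy_stats stats = { 0, 0 };
	unsigned long block;

	assert(first <= last && last < NR_BLOCKS);
	for (block = first; block <= last; block++)
		stats.registered +=
			toy_register_block(block, nid, &stats.lookups);
	return stats;
}
```

With this layout, toy_walk(0, NR_BLOCKS - 1, 4) registers 3 blocks and
performs 23 lookups: one for each of the three matching blocks, and
four for each of the five blocks that belong to the other node. In this
model the mismatching blocks account for nearly all of the work, which
is the behaviour the commit message attributes to the gap between
memory38524 and memory128433.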

Thread overview: 8+ messages
2025-03-10 11:53 [PATCH] driver/base/node.c: Fix softlockups during the initialization of large systems with interleaved memory blocks Donet Tom
2025-03-10 12:52 ` Greg Kroah-Hartman [this message]
2025-03-11  8:56   ` Donet Tom
2025-03-11  9:29     ` David Hildenbrand
2025-03-11 15:00       ` Donet Tom
2025-03-11 19:39         ` David Hildenbrand
2025-03-11  9:22 ` David Hildenbrand
2025-03-11 15:03   ` Donet Tom
