From: Qian Cai <cai@lca.pw>
To: Michal Hocko <mhocko@kernel.org>
Cc: akpm@linux-foundation.org, brho@google.com, kernelfans@gmail.com,
dave.hansen@intel.com, rppt@linux.ibm.com, peterz@infradead.org,
mpe@ellerman.id.au, mingo@elte.hu, osalvador@suse.de,
luto@kernel.org, tglx@linutronix.de, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH -next v2] mm/hotplug: fix a null-ptr-deref during NUMA boot
Date: Fri, 21 Jun 2019 09:17:58 -0400 [thread overview]
Message-ID: <1561123078.5154.41.camel@lca.pw> (raw)
In-Reply-To: <20190513124112.GH24036@dhcp22.suse.cz>
Sigh...
I don't see any benefit to keep the broken commit,
"x86, numa: always initialize all possible nodes"
for so long in linux-next that just prevent x86 NUMA machines with any memory-
less node from booting.
Andrew, maybe it is time to drop this patch until Michal found some time to fix
it properly.
On Mon, 2019-05-13 at 14:41 +0200, Michal Hocko wrote:
> On Sun 12-05-19 01:48:29, Qian Cai wrote:
> > The linux-next commit ("x86, numa: always initialize all possible
> > nodes") introduced a crash below during boot for systems with a
> > memory-less node. This is due to CPUs that get onlined during SMP boot,
> > but that onlining triggers a page fault in bus_add_device() during
> > device registration:
> >
> > error = sysfs_create_link(&bus->p->devices_kset->kobj,
> >
> > bus->p is NULL. That "p" is the subsys_private struct, and it should
> > have been set in,
> >
> > postcore_initcall(register_node_type);
> >
> > but that happens in do_basic_setup() after smp_init().
> >
> > The old code had set this node online via alloc_node_data(), so when it
> > came time to do_cpu_up() -> try_online_node(), the node was already up
> > and nothing happened.
> >
> > Now, it attempts to online the node, which registers the node with
> > sysfs, but that can't happen before the 'node' subsystem is registered.
> >
> > Since kernel_init() is running by a kernel thread that is in
> > SYSTEM_SCHEDULING state, fixed this by skipping registering with sysfs
> > during the early boot in __try_online_node().
>
> Relying on SYSTEM_SCHEDULING looks really hackish. Why cannot we simply
> drop try_online_node from do_cpu_up? Your v2 remark below suggests that
> we need to call node_set_online because something later on depends on
> that. Btw. why do we even allocate a pgdat from this path? This looks
> really messy.
>
> > Call Trace:
> > device_add+0x43e/0x690
> > device_register+0x107/0x110
> > __register_one_node+0x72/0x150
> > __try_online_node+0x8f/0xd0
> > try_online_node+0x2b/0x50
> > do_cpu_up+0x46/0xf0
> > cpu_up+0x13/0x20
> > smp_init+0x6e/0xd0
> > kernel_init_freeable+0xe5/0x21f
> > kernel_init+0xf/0x180
> > ret_from_fork+0x1f/0x30
> >
> > Reported-by: Barret Rhoden <brho@google.com>
> > Signed-off-by: Qian Cai <cai@lca.pw>
> > ---
> >
> > v2: Set the node online as it have CPUs. Otherwise, those memory-less nodes
> > will
> > end up being not in sysfs i.e., /sys/devices/system/node/.
> >
> > mm/memory_hotplug.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index b236069ff0d8..6eb2331fa826 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1037,6 +1037,18 @@ static int __try_online_node(int nid, u64 start, bool
> > set_node_online)
> > if (node_online(nid))
> > return 0;
> >
> > + /*
> > + * Here is called by cpu_up() to online a node without memory from
> > + * kernel_init() which guarantees that "set_node_online" is true
> > which
> > + * will set the node online as it have CPUs but not ready to call
> > + * register_one_node() as "node_subsys" has not been initialized
> > + * properly yet.
> > + */
> > + if (system_state == SYSTEM_SCHEDULING) {
> > + node_set_online(nid);
> > + return 0;
> > + }
> > +
> > pgdat = hotadd_new_pgdat(nid, start);
> > if (!pgdat) {
> > pr_err("Cannot online node %d due to NULL pgdat\n", nid);
> > --
> > 2.20.1 (Apple Git-117)
>
>
next prev parent reply other threads:[~2019-06-21 13:18 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-12 5:48 [PATCH -next v2] mm/hotplug: fix a null-ptr-deref during NUMA boot Qian Cai
2019-05-13 12:41 ` Michal Hocko
2019-05-13 13:43 ` Qian Cai
2019-05-13 14:04 ` Michal Hocko
2019-05-13 15:20 ` Qian Cai
2019-05-13 15:31 ` Michal Hocko
2019-05-22 7:12 ` Pingfan Liu
2019-05-22 11:16 ` Michal Hocko
2019-05-23 3:58 ` Pingfan Liu
2019-05-23 4:00 ` Pingfan Liu
2019-05-28 18:21 ` Michal Hocko
2019-05-30 13:01 ` Pingfan Liu
2019-05-28 18:20 ` Michal Hocko
2019-05-30 12:55 ` Pingfan Liu
2019-05-31 9:03 ` Michal Hocko
2019-06-03 4:17 ` Pingfan Liu
2019-06-21 13:17 ` Qian Cai [this message]
2019-06-21 13:55 ` Michal Hocko
2019-06-24 8:42 ` Pingfan Liu
2019-06-26 13:57 ` Michal Hocko
2019-06-27 3:11 ` Pingfan Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1561123078.5154.41.camel@lca.pw \
--to=cai@lca.pw \
--cc=akpm@linux-foundation.org \
--cc=brho@google.com \
--cc=dave.hansen@intel.com \
--cc=kernelfans@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=mhocko@kernel.org \
--cc=mingo@elte.hu \
--cc=mpe@ellerman.id.au \
--cc=osalvador@suse.de \
--cc=peterz@infradead.org \
--cc=rppt@linux.ibm.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.