From: Peter Zijlstra <peterz@infradead.org>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Laurent Vivier <lvivier@redhat.com>,
linux-kernel@vger.kernel.org,
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
Borislav Petkov <bp@suse.de>,
David Gibson <david@gibson.dropbear.id.au>,
Michael Ellerman <mpe@ellerman.id.au>,
Nathan Fontenot <nfont@linux.vnet.ibm.com>,
Michael Bringmann <mwb@linux.vnet.ibm.com>,
linuxppc-dev@lists.ozlabs.org, Ingo Molnar <mingo@redhat.com>
Subject: Re: [RFC v3] sched/topology: fix kernel crash when a CPU is hotplugged in a memoryless node
Date: Mon, 18 Mar 2019 12:26:50 +0100 [thread overview]
Message-ID: <20190318112650.GO6058@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20190318104730.GA4450@linux.vnet.ibm.com>
On Mon, Mar 18, 2019 at 04:17:30PM +0530, Srikar Dronamraju wrote:
> > > node 0 (because firmware doesn't provide the distance information for
> > > memoryless/cpuless nodes):
> > >
> > > node 0 1 2 3
> > > 0: 10 40 10 10
> > > 1: 40 10 40 40
> > > 2: 10 40 10 10
> > > 3: 10 40 10 10
> >
> > *groan*... what does it do for things like percpu memory? ISTR the
> > per-cpu chunks are all allocated early too. Having them all use memory
> > out of node-0 would seem sub-optimal.
>
> In the specific failing case, there is only one node with memory; all other
> nodes are cpu only nodes.
>
> However, in the generic case, since it's just a CPU hotplug op, the
> per-cpu chunks allocated early would remain.
What do you do in the case where there are multiple nodes with memory, but
only one with CPUs on?
Do you then still allocate the per-cpu memory for the CPUs that will
appear on that second node out of node 0?
> > > We should have:
> > >
> > > node 0 1 2 3
> > > 0: 10 40 40 40
> > > 1: 40 10 40 40
> > > 2: 40 40 10 40
> > > 3: 40 40 40 10
> >
> > Can it happen that it introduces a new distance in the table? One that
> > hasn't been seen before? This example only has 10 and 40, but suppose
> > the new node lands at distance 20 (or 80); can such a thing happen?
> >
> > If not; why not?
>
> Yes, distances can be 20, 40 or 80. There is nothing that forces the node
> distance to always be 40.
This,
> > So you're relying on sched_domain_numa_masks_set/clear() to fix this up,
> > but that in turn relies on the sched_domain_numa_levels thing to stay
> > accurate.
> >
> > This all seems very fragile and unfortunate.
> >
>
> Any reasons why this is fragile?
breaks that patch. The code assumes all the numa distances are known at
boot. If you add distances later, it comes unstuck.
It's not like you're actually changing the interconnects around at
runtime. Node topology really should be known at boot time.
What I _think_ the x86 BIOS does is, for each empty socket, iterate as
many logical CPUs (non-present) as it finds on Socket-0 (or whatever
socket is the boot socket).
Those non-present CPUs are assigned to their respective nodes. And
if/when a physical CPU is placed on the socket and the CPUs onlined, it
all 'works' (see ACPI SRAT).
I'm not entirely sure what happens on x86 when it boots with say a
10-core part and you then fill an empty socket with a 20-core part; I
suspect we simply will not use more than 10, since we'll not have space
reserved in the Linux cpumasks for them anyway.