From: Heiko Carstens <heiko.carstens@de.ibm.com>
To: Tejun Heo <tj@kernel.org>
Cc: Ming Lei <tom.leiming@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Michael Holzheu <holzheu@linux.vnet.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [bisected] "sched: Allow per-cpu kernel threads to run on online && !active" causes warning
Date: Tue, 16 Aug 2016 09:55:05 +0200
Message-ID: <20160816075505.GB3896@osiris>
In-Reply-To: <20160815224801.GA3672@mtj.duckdns.org>

Hi Tejun,

> On Mon, Aug 15, 2016 at 01:19:08PM +0200, Heiko Carstens wrote:
> > I can imagine several ways to fix this for s390, but before doing that I'm
> > wondering if the workqueue code is correct with
> > 
> > a) assuming that the cpu_to_node() mapping is valid for all _possible_ cpus
> >    that early
> 
> This can be debatable and making it "first registration sticks" is
> likely easy enough.
> 
> > and
> > 
> > b) that the cpu_to_node() mapping never changes
> 
> However, this part isn't just from workqueue.  It just hits in a more
> obvious way.  For example, memory allocation has the same problem and
> we would have to synchronize memory allocations against cpu <-> node
> mapping changing.  It'd be silly to add the complexity and overhead of
> making the mapping dynamic when there's nothing inherently
> dynamic about it.  The surface area is pretty big here.
> 
> I have no idea how s390 fakenuma works.  Is that very different from
> x86's?  IIRC, x86's fakenuma isn't all that dynamic.

I'm not asking to make the cpu <-> node mapping completely dynamic. We
already have code in place to keep the cpu <-> node mapping static. Currently
this happens too late, but that can be fixed quite easily.

Unfortunately we do not always know to which node a cpu belongs when we
register it. Currently all cpus are registered to node 0, and this is
corrected only when a cpu is brought online.

The problem is "standby" cpus on s390: we know they are present, but we
can't use them yet. The mechanism is the following:

We detect a standby cpu and register it via register_cpu(). Since the node
isn't known yet for this cpu, the cpu_to_node() function returns 0, so all
standby cpus are registered under node 0.

The new standby cpu has a "configure" sysfs attribute. If somebody writes
"1" to it, we signal the hypervisor that we want to use the cpu, and the
hypervisor allocates one. If this request succeeds, we finally know where the
cpu is located topology-wise and can fix up everything (and can also make the
cpu to node mapping static).
Note: as long as a cpu isn't configured it cannot be brought online.
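
For illustration only, the "write 1 to configure" step could be driven from
userspace roughly as follows. This is a minimal sketch: write_attr() is a
hypothetical helper, not an existing API, and the attribute path is an
assumption (typically something like /sys/devices/system/cpu/cpuN/configure
on s390), so the caller passes it in.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a short string to a sysfs-style attribute
 * file. Returns 0 on success, -1 on any error.
 */
static int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer, so a rejected write surfaces here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

A caller would pass the attribute path of the standby cpu and the string "1";
a failing return would then mean that the configure request was not accepted.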

If the cpu is then finally brought online, the change_cpu_under_node() code
within drivers/base/cpu.c fixes up the node symlinks, so that at least the
sysfs representation is also correct.

If later on the cpu is brought offline, deconfigured, etc. we do not change
the cpu_to_node mapping anymore.
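
To make the sequence above concrete, here is a small userspace model of it.
It is only a sketch under assumptions: all names (model_cpu, model_register(),
and so on) are made up for this example and are not kernel symbols, and the
model pins the mapping at configure time, which is the variant described as
possible above rather than what the code currently does.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8	/* hypothetical size for this model */

/* Toy per-cpu state; none of these names are kernel symbols. */
struct model_cpu {
	bool registered;
	bool configured;
	bool online;
	bool node_pinned;	/* set once the real node has been learned */
	int node;		/* what a cpu_to_node()-like lookup returns */
};

static struct model_cpu cpus[NR_CPUS];

/* A standby cpu is detected: the node is unknown, so it defaults to 0. */
static void model_register(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpus[cpu].registered = true;
	if (!cpus[cpu].node_pinned)
		cpus[cpu].node = 0;
}

/* "1" written to configure: topology is known now, pin the node once. */
static bool model_configure(int cpu, int real_node)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!cpus[cpu].registered)
		return false;
	cpus[cpu].configured = true;
	if (!cpus[cpu].node_pinned) {
		cpus[cpu].node = real_node;
		cpus[cpu].node_pinned = true;
	}
	return true;
}

/* Deconfiguring is refused while online and never touches the node. */
static bool model_deconfigure(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (cpus[cpu].online)
		return false;
	cpus[cpu].configured = false;
	return true;
}

/* A cpu that is not configured cannot be brought online. */
static bool model_online(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!cpus[cpu].configured)
		return false;
	cpus[cpu].online = true;
	return true;
}

static void model_offline(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpus[cpu].online = false;
}

static int model_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpus[cpu].node;
}
```

In this model a cpu reports node 0 between model_register() and the first
successful model_configure(), and reports the pinned node from then on, even
across offline, deconfigure and a later configure with a different node.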

So the question is how to define "first registration sticks". :)
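
To show why the definition matters, here is a sketch of two possible readings
of "first registration sticks". Everything in it (NODE_UNKNOWN, stick_policy,
mapping_report()) is hypothetical and only meant to illustrate the question,
not to propose an implementation.

```c
#include <stdbool.h>

#define NODE_UNKNOWN (-1)	/* hypothetical marker: node not known yet */

/* Two hypothetical readings of "first registration sticks". */
enum stick_policy {
	STICK_AT_REGISTER,	/* whatever is reported first is final */
	STICK_AT_FIRST_KNOWN,	/* the first real node value is final */
};

struct mapping {
	bool pinned;
	int node;
};

/*
 * Report a (possibly unknown) node for a cpu and return the node that a
 * lookup would see afterwards. An unknown node falls back to node 0.
 */
static int mapping_report(struct mapping *m, enum stick_policy policy,
			  int reported)
{
	if (m->pinned)
		return m->node;

	if (reported == NODE_UNKNOWN) {
		m->node = 0;
		if (policy == STICK_AT_REGISTER)
			m->pinned = true;
		return m->node;
	}

	m->node = reported;
	m->pinned = true;
	return m->node;
}
```

With STICK_AT_REGISTER a standby cpu that is registered with an unknown node
stays on node 0 forever. With STICK_AT_FIRST_KNOWN it reports node 0 only
until the real node is first reported, and sticks to that node afterwards.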


Thread overview: 16+ messages
2016-07-27 12:54 [bisected] "sched: Allow per-cpu kernel threads to run on online && !active" causes warning Heiko Carstens
2016-07-27 15:23 ` Thomas Gleixner
2016-07-30 11:25   ` Heiko Carstens
2016-08-08  7:45     ` Ming Lei
2016-08-15 11:19       ` Heiko Carstens
2016-08-15 22:48         ` Tejun Heo
2016-08-16  7:55           ` Heiko Carstens [this message]
2016-08-16 15:20             ` Tejun Heo
2016-08-16 15:29               ` Peter Zijlstra
2016-08-16 15:42                 ` Tejun Heo
2016-08-16 22:19                   ` Heiko Carstens
2016-08-17  9:20                     ` Michael Holzheu
2016-08-17 13:58                     ` Tejun Heo
2016-08-18  9:30                       ` Michael Holzheu
2016-08-18 14:42                         ` Tejun Heo
2016-08-19  9:52                           ` Michael Holzheu
