From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Aug 2016 00:19:53 +0200
From: Heiko Carstens <heiko.carstens@de.ibm.com>
To: Tejun Heo
Cc: Peter Zijlstra, Ming Lei, Thomas Gleixner, LKML, Yasuaki Ishimatsu, Andrew Morton, Lai Jiangshan, Michael Holzheu, Martin Schwidefsky
Subject: Re: [bisected] "sched: Allow per-cpu kernel threads to run on online && !active" causes warning
References: <20160727125412.GB3912@osiris> <20160730112552.GA3744@osiris> <20160815111908.GA3903@osiris> <20160815224801.GA3672@mtj.duckdns.org> <20160816075505.GB3896@osiris> <20160816152027.GD9516@htj.duckdns.org> <20160816152949.GL30192@twins.programming.kicks-ass.net> <20160816154205.GE9516@htj.duckdns.org>
In-Reply-To: <20160816154205.GE9516@htj.duckdns.org>
Message-Id: <20160816221953.GA3373@osiris>
On Tue, Aug 16, 2016 at 11:42:05AM -0400, Tejun Heo wrote:
> Hello, Peter.
>
> On Tue, Aug 16, 2016 at 05:29:49PM +0200, Peter Zijlstra wrote:
> > On Tue, Aug 16, 2016 at 11:20:27AM -0400, Tejun Heo wrote:
> > > As long as the mapping doesn't change after the first onlining of the
> > > CPU, the workqueue side shouldn't be too difficult to fix up. I'll
> > > look into it. For memory allocations, as long as the cpu <-> node
> > > mapping is established before any memory allocation for the cpu takes
> > > place, it should be fine too, I think.
> >
> > Don't we allocate per-cpu memory for 'cpu_possible_map' on boot? There's
> > a whole bunch of per-cpu memory users that do things like:
> >
> > 	for_each_possible_cpu(cpu) {
> > 		struct foo *foo = per_cpu_ptr(&per_cpu_var, cpu);
> >
> > 		/* muck with foo */
> > 	}
> >
> > Which requires a cpu->node map for all possible cpus at boot time.
>
> Ah, right. If the cpu -> node mapping is dynamic, there isn't much
> we can do about allocating per-cpu memory on the wrong node. And it
> is problematic that percpu allocations can race against an onlining
> CPU switching its node association.
>
> One way to keep the mapping stable would be reserving per-node
> possible CPU slots, so that the CPU number assigned to a new CPU is on
> the right node. It'd be a simple solution but would get really
> expensive with an increasing number of nodes.
>
> Heiko, do you have any ideas?

I think the easiest solution would be to simply assign all cpus for
which we do not have any topology information to an arbitrary node,
e.g. round robin.

After all, the case that cpus are added later is rare, and the s390
fake NUMA implementation does not know about the memory topology
anyway. All it does is distribute the memory across several nodes in
order to avoid a single huge node. So that should be sort of ok.

Unless somebody has a better idea? Michael, Martin?