Date: Thu, 28 May 2015 14:29:31 +0200
From: Ingo Molnar
To: Peter Zijlstra
Cc: Mike Galbraith, Josef Bacik, riel@redhat.com, mingo@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE
Message-ID: <20150528122931.GA8592@gmail.com>
In-Reply-To: <20150528121933.GI3644@twins.programming.kicks-ass.net>

* Peter Zijlstra wrote:

> > On Thu, 2015-05-28 at 13:49 +0200, Ingo Molnar wrote:
> >
> > > What's the biggest you've seen?
>
> Wikipedia here: http://en.wikipedia.org/wiki/Haswell_%28microarchitecture%29
>
> Tells us HSW-E[PX] have 18-core/36-thread SKUs.
>
> But yes, what Mike says, it's bound to only get bigger.

So it's starting to get big enough to warrant an optimization of the way we
account and discover idle CPUs:

When a CPU goes idle, it has idle cycles it could spend on registering
itself in either an idle-CPUs bitmap, or in an idle-CPUs queue.
The queue (or bitmap) would strictly be shared only between CPUs within the
same domain, so the cache-bouncing cost from that is still small and
package-local. (We pay remote-access overhead in select_idle_sibling()
already, due to having to access half of all remote rqs on average.)

Such an approach would make select_idle_sibling() independent of the size
of the cores domain: it would make it essentially O(1).

( There's a bit of a complication with rq->wake_list, but I think it would
  be good enough to just register/unregister from the idle handler; if
  something is idle only short-term, it should probably not be considered
  for SMP balancing. )

But I'd definitely not go towards making our macro SMP-balancing
idle-selection decisions poorer, just because our internal implementation
is O(nr_cores_per_package) ...

Agreed?

Thanks,

	Ingo