From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752116AbcEGJF6 (ORCPT <rfc822;w@1wt.eu>);
	Sat, 7 May 2016 05:05:58 -0400
Received: from mga01.intel.com ([192.55.52.88]:7988 "EHLO mga01.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750936AbcEGJF4 (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Sat, 7 May 2016 05:05:56 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.24,589,1455004800"; 
   d="scan'208";a="948135701"
Date: Sat, 7 May 2016 09:24:17 +0800
From: Yuyang Du <yuyang.du@intel.com>
To: Mike Galbraith <mgalbraith@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>, Chris Mason <clm@fb.com>,
        Ingo Molnar <mingo@kernel.org>,
        Matt Fleming <matt@codeblueprint.co.uk>, linux-kernel@vger.kernel.org
Subject: Re: sched: tweak select_idle_sibling to look for idle threads
Message-ID: <20160507012417.GK16093@intel.com>
References: <20160405180822.tjtyyc3qh4leflfj@floor.thefacebook.com>
 <20160409190554.honue3gtian2p6vr@floor.thefacebook.com>
 <20160430124731.GE2975@worktop.cust.blueprintrf.com>
 <1462086753.9717.29.camel@suse.de>
 <20160501085303.GF2975@worktop.cust.blueprintrf.com>
 <1462094425.9717.45.camel@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1462094425.9717.45.camel@suse.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, May 01, 2016 at 11:20:25AM +0200, Mike Galbraith wrote:
> On Sun, 2016-05-01 at 10:53 +0200, Peter Zijlstra wrote:
> > On Sun, May 01, 2016 at 09:12:33AM +0200, Mike Galbraith wrote:
> > > On Sat, 2016-04-30 at 14:47 +0200, Peter Zijlstra wrote:
> > 
> > > > Can you guys have a play with this; I think one and two node tbench are
> > > > good, but I seem to be getting significant run to run variance on that,
> > > > so maybe I'm not doing it right.
> > > 
> > > Nah, tbench is just variance prone.  It got dinged up at clients=cores
> > > on my desktop box, on 4 sockets the high end got seriously dinged up.
> > 
> > Ouch, yeah, big hurt. Lets try that again... :-)
> 
> Yeah, box could use a little bandaid and a hug :)
> 
> Playing with Chris' benchmark, seems the biggest problem is that we
> don't buddy up waker of many and it's wakees in a node.. ie the wake
> wide thing isn't necessarily our friend when there are multiple wakers
> of many.  If I run an instance per node with one mother of all work in
> autobench mode, it works exactly as you'd expect, game over is when
> wakees = socket size. It never get's near that point if I let things
> wander, it beats itself up well before we get there.

Maybe give the criterion a bit of margin? It's not just that wakees tend to
equal llc_size; the numbers are wild enough to easily break such a fragile
condition. Something like:

if (master * 100 < slave * factor * 110)
        return 0;
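
To make the arithmetic concrete, here is a minimal userspace sketch of the
check with the 10% margin. It is an illustration, not the patch under
discussion: wake_wide_margin() is a hypothetical name, the swap and the
slave < factor test are modeled on my reading of the mainline wake_wide(),
and factor stands in for llc_size.

```c
/*
 * Userspace sketch only; wake_wide_margin() is a hypothetical helper.
 * master/slave: wakee-flip counts of waker and wakee (assumed inputs).
 * factor:       stands in for llc_size.
 * Returns 1 for "wake wide", 0 for "stay affine".
 */
static int wake_wide_margin(unsigned int master, unsigned int slave,
			    unsigned int factor)
{
	/* Make master the larger of the two counts. */
	if (master < slave) {
		unsigned int tmp = master;

		master = slave;
		slave = tmp;
	}

	if (slave < factor)
		return 0;

	/* master must exceed slave * factor by 10%; widen to avoid overflow */
	if ((unsigned long long)master * 100 <
	    (unsigned long long)slave * factor * 110)
		return 0;

	return 1;
}
```

With factor = 16 and slave = 16, the plain comparison flips at master = 256,
while the margin version only flips once master reaches 282, so small jitter
around the boundary no longer changes the decision.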

And since you accumulate the wakee number (and decay it at HZ), wouldn't this
check tend to never be satisfied?

if (slave < factor)
	return 0;
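
A rough model of why that could happen, under assumptions that are mine and
not taken from the patch: the counter gains a fixed number of flips each
second and is halved once per second. flips_after() is a hypothetical helper
written for this illustration only.

```c
/*
 * Hypothetical model of an accumulating counter with once-per-second decay.
 * flips_per_sec: new wakee switches counted in each second (assumed constant).
 * seconds:       how many one-second periods to simulate.
 */
static unsigned int flips_after(unsigned int flips_per_sec,
				unsigned int seconds)
{
	unsigned int flips = 0;

	for (unsigned int s = 0; s < seconds; s++) {
		flips >>= 1;		/* decay: halve once per period */
		flips += flips_per_sec;	/* accumulate this period's flips */
	}
	return flips;
}
```

In this model the counter settles just below twice the per-second rate. At
1000 flips per second it sits near 2000 after ten seconds, far above a factor
of 16, so slave < factor would only be true for tasks that rarely switch
wakees (4 flips per second settles at 7).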