Date: Mon, 12 Oct 2015 10:12:31 +0800
From: Yuyang Du
To: Peter Zijlstra
Cc: Mike Galbraith, linux-kernel@vger.kernel.org
Subject: Re: 4.3 group scheduling regression
Message-ID: <20151012021230.GK11102@intel.com>
In-Reply-To: <20151012091206.GK3816@twins.programming.kicks-ass.net>
References: <20151008111959.GM3816@twins.programming.kicks-ass.net>
 <1444483369.2804.9.camel@gmail.com>
 <20151010170142.GI3816@twins.programming.kicks-ass.net>
 <1444530318.3363.40.camel@gmail.com>
 <1444585321.4169.18.camel@gmail.com>
 <20151012072344.GM3604@twins.programming.kicks-ass.net>
 <1444635897.3425.19.camel@gmail.com>
 <20151012080407.GJ3816@twins.programming.kicks-ass.net>
 <20151012005351.GJ11102@intel.com>
 <20151012091206.GK3816@twins.programming.kicks-ass.net>

On Mon, Oct 12, 2015 at 11:12:06AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 12, 2015 at 08:53:51AM +0800, Yuyang Du wrote:
> > Good morning, Peter.
> >
> > On Mon, Oct 12, 2015 at 10:04:07AM +0200, Peter Zijlstra wrote:
> > > On Mon, Oct 12, 2015 at 09:44:57AM +0200, Mike Galbraith wrote:
> > > >
> > > > It's odd to me that things look pretty much the same in the good and
> > > > bad trees with hogs vs hogs or hogs vs tbench (with top anyway, just
> > > > adding up times). Seems Xorg+mplayer more or less playing cross-group
> > > > ping-pong must be the BadThing trigger.
> > >
> > > Ohh, wait, Xorg and mplayer are _not_ in the same group? I was assuming
> > > you had your entire user session in 1 (auto) group and it was competing
> > > against 8 manual cgroups.
> > >
> > > So how exactly are things configured?
> >
> > Hmm... my impression is that the naughty boy mplayer (+Xorg) isn't
> > favored, due to the per-CPU group entity share distribution. Let me dig
> > more.
>
> So in the old code we had 'magic' to deal with the case where a cgroup
> was consuming less than 1 CPU's worth of runtime. For example, a single
> task running in the group.
>
> In that scenario it might be possible that the group entity weight:
>
>   se->weight = (tg->shares * cfs_rq->weight) / tg->weight;
>
> strongly deviates from tg->shares; you want the single task to reflect
> the full group shares to the next level, due to the whole distributed
> approximation stuff.

Yeah, I thought so.

> I see you've deleted all that code; see the former
> __update_group_entity_contrib().

Probably not there; it actually was an icky way to adjust things.

> It could be that we need to bring that back. But let me think a little
> bit more on this.. I'm having a hard time waking :/

I am guessing it is in calc_tg_weight(), and the naughty boys do have to be
made more favored there, what a reality...

Mike, could you please test the following?
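To see the effect in isolation, here is a rough user-space model of the share
distribution above, with made-up numbers (tg->shares = 1024, some residual
load_avg still accounted on other CPUs, and a single just-woken task on this
CPU). It is only a sketch of the approximation, not the kernel code, and every
constant below is invented for illustration:

#include <stdio.h>

/* Made-up numbers, for illustration only. */
#define TG_SHARES		1024	/* configured group shares           */
#define OTHER_CPUS_LOAD		300	/* residual load_avg on other CPUs   */
#define THIS_CPU_LOAD_AVG	150	/* this cfs_rq's load_avg, still low */
#define THIS_CPU_LOAD_WEIGHT	1024	/* this cfs_rq's load.weight, 1 task */

/*
 * Rough model of calc_cfs_shares(): split the group's shares across CPUs
 * in proportion to this CPU's contribution versus the group-wide sum.
 */
static long entity_weight(long this_cpu_load)
{
	long tg_weight = OTHER_CPUS_LOAD + this_cpu_load;
	long shares = TG_SHARES * this_cpu_load;

	if (tg_weight)
		shares /= tg_weight;
	return shares;
}

int main(void)
{
	printf("based on cfs_rq_load_avg(): %ld\n",
	       entity_weight(THIS_CPU_LOAD_AVG));
	printf("based on load.weight      : %ld\n",
	       entity_weight(THIS_CPU_LOAD_WEIGHT));
	return 0;
}

With these made-up numbers the load_avg-based calculation gives the group
entity roughly 341 of its 1024 shares, while the instantaneous load.weight
gives roughly 791; the patch below moves the calculation to the latter.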
--
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4df37a4..b184da0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2370,7 +2370,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
 	 */
 	tg_weight = atomic_long_read(&tg->load_avg);
 	tg_weight -= cfs_rq->tg_load_avg_contrib;
-	tg_weight += cfs_rq_load_avg(cfs_rq);
+	tg_weight += cfs_rq->load.weight;
 
 	return tg_weight;
 }
 
@@ -2380,7 +2380,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
 	long tg_weight, load, shares;
 
 	tg_weight = calc_tg_weight(tg, cfs_rq);
-	load = cfs_rq_load_avg(cfs_rq);
+	load = cfs_rq->load.weight;
 
 	shares = (tg->shares * load);
 	if (tg_weight)
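If I read the hunks right, both calc_tg_weight() and calc_cfs_shares() go
back to using the instantaneous cfs_rq->load.weight instead of the
PELT-averaged cfs_rq_load_avg(), so a just-woken single task immediately
contributes its full weight to the share calculation, which looks like the
behaviour the pre-rewrite code had.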