From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <1489502742.4111.29.camel@gmx.de>
Subject: Re: [PATCHSET for-4.11] cgroup: implement cgroup v2 thread mode
From: Mike Galbraith
To: Tejun Heo
Cc: Peter Zijlstra, lizefan@huawei.com, hannes@cmpxchg.org,
	mingo@redhat.com, pjt@google.com, luto@amacapital.net,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@fb.com, lvenanci@redhat.com, Linus Torvalds,
	Andrew Morton
Date: Tue, 14 Mar 2017 15:45:42 +0100
In-Reply-To: <20170313192621.GD15709@htj.duckdns.org>
References: <20170203202048.GD6515@twins.programming.kicks-ass.net>
	 <20170203205955.GA9886@mtj.duckdns.org>
	 <20170206124943.GJ6515@twins.programming.kicks-ass.net>
	 <20170208230819.GD25826@htj.duckdns.org>
	 <20170209102909.GC6515@twins.programming.kicks-ass.net>
	 <20170210154508.GA16097@mtj.duckdns.org>
	 <20170210175145.GJ6515@twins.programming.kicks-ass.net>
	 <20170212050544.GJ29323@mtj.duckdns.org>
	 <1486882799.24462.25.camel@gmx.de>
	 <1486964707.5912.93.camel@gmx.de>
	 <20170313192621.GD15709@htj.duckdns.org>

On Mon, 2017-03-13 at 15:26 -0400, Tejun Heo wrote:
> Hello, Mike.
>
> Sorry about the long delay.
>
> On Mon, Feb 13, 2017 at 06:45:07AM +0100, Mike Galbraith wrote:
> > > > So, as long as the depth stays reasonable (single digit or lower),
> > > > what we try to do is keeping tree traversal operations aggregated or
> > > > located on slow paths.  There still are places that this overhead
> > > > shows up (e.g. the block controllers aren't too optimized) but it
> > > > isn't particularly difficult to make a handful of layers not matter at
> > > > all.
> > >
> > > A handful of cpu bean counting layers stings considerably.
>
> Hmm... yeah, I was trying to think about ways to avoid full scheduling
> overhead at each layer (the scheduler does a lot per each layer of
> scheduling) but don't think it's possible to circumvent that without
> introducing a whole lot of scheduling artifacts.

Yup.

> In a lot of workloads, the added overhead from several layers of CPU
> controllers doesn't seem to get in the way too much (most threads do
> something other than scheduling after all).

Sure, if a load doesn't schedule a lot, it doesn't hurt much, but there
are plenty of loads that routinely do schedule a LOT, and there it
matters a LOT.. which is why network benchmarks tend to be severely
allergic to scheduler lard.

> The only major issue that
> we're seeing in the fleet is the cgroup iteration in idle rebalancing
> code pushing up the scheduling latency too much but that's a different
> issue.

Hm, I would suspect PELT to be the culprit there.  It helps smooth out
load balancing, but will stack "skinny looking" tasks.

	-Mike
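
P.S. for anyone wanting to eyeball the per-layer cost themselves: below
is a rough pipe ping-pong sketch (an illustrative hack, not anything
from this thread or the tree; loop count and timing method are
arbitrary).  Run it once from the root cgroup, then again from a shell
parked a few cpu-controller levels deep, and compare round trips/sec.
Schedule-heavy loads like this are where the layers bite.

/*
 * pingpong.c - minimal pipe ping-pong in the spirit of
 * "perf bench sched pipe" (an illustrative hack, not that tool).
 * Each round trip forces a sleep/wakeup pair, so per-layer cgroup
 * accounting cost shows up directly in round trips/sec.
 *
 * Build:  gcc -O2 -o pingpong pingpong.c
 */
#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <sys/wait.h>

#define LOOPS	1000000		/* arbitrary iteration count */

static double now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	int ping[2], pong[2];	/* parent->child, child->parent */
	char byte = 0;
	double t0, t1;
	int i;

	if (pipe(ping) || pipe(pong)) {
		perror("pipe");
		return 1;
	}

	if (fork() == 0) {
		/* child: echo every byte straight back */
		for (i = 0; i < LOOPS; i++) {
			if (read(ping[0], &byte, 1) != 1)
				break;
			if (write(pong[1], &byte, 1) != 1)
				break;
		}
		_exit(0);
	}

	t0 = now();
	for (i = 0; i < LOOPS; i++) {
		if (write(ping[1], &byte, 1) != 1)
			break;
		if (read(pong[0], &byte, 1) != 1)
			break;
	}
	t1 = now();
	wait(NULL);

	printf("%d round trips in %.3fs (%.0f/sec)\n",
	       LOOPS, t1 - t0, LOOPS / (t1 - t0));
	return 0;
}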