Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinnede
From: Peter Zijlstra
To: Kamalesh Babulal
Cc: Srivatsa Vaddagiri, Paul Turner, Vladimir Davydov, "linux-kernel@vger.kernel.org", Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan, Ingo Molnar, Pavel Emelianov
Date: Tue, 20 Sep 2011 15:56:09 +0200
In-Reply-To: <20110919175129.GA11164@linux.vnet.ibm.com>
Message-ID: <1316526969.13664.31.camel@twins>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2011-09-19 at 23:21 +0530, Kamalesh Babulal wrote:
> * Peter Zijlstra [2011-09-15 23:48:43]:
>
> > On Thu, 2011-09-15 at 23:25 +0530, Kamalesh Babulal wrote:
> > > 2.6.38          |  1551 |    48 |    62 |    47 |    50 |
> > > ----------------+-------+-------+-------+-------+-------+
> > > 2.6.39          |  3784 |   457 |   722 |  3209 |  1037 |
> >
> > I'd say we wrecked it going from .38 to .39 and only made it worse after
> > that.
>
> Reverting commit 866ab43efd325fae8889ea, one of the patches that went
> in between .38 and .39, reduces the ping-pong of the tasks.
>
> ------------------------+-------+-------+-------+-------+-------+
> Kernel                  | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
> ------------------------+-------+-------+-------+-------+-------+
> 2.6.39                  |  1542 |  2172 |  2727 |   120 |  3681 |
> ------------------------+-------+-------+-------+-------+-------+
> 2.6.39 (with            |       |       |       |       |       |
> 866ab43efd reverted)    |    65 |    78 |    58 |    99 |    62 |
> ------------------------+-------+-------+-------+-------+-------+
> 3.1-rc4+tip             |       |       |       |       |       |
> (e467f18f945c)          |  1219 |  2037 |  1943 |   772 |  1701 |
> ------------------------+-------+-------+-------+-------+-------+
> 3.1-rc4+tip (e467f18f9) |       |       |       |       |       |
> (866ab43efd reverted)   |    64 |    45 |    59 |    59 |    69 |
> ------------------------+-------+-------+-------+-------+-------+

Right, so reverting that breaks the cpuset/cpuaffinity thing again :-(

Now, I'm not quite sure why group_imb gets toggled in this use case at
all. Having put a trace_printk() in, we get:

<...>-1894 [006] 704.056250: find_busiest_group: max: 2048, min: 0, avg: 1024, nr: 2
kworker/1:1-101 [001] 706.305523: find_busiest_group: max: 3072, min: 0, avg: 1024, nr: 3

Which is of course a bad state to be in, but we also get:

migration/17-73 [017] 706.284191: find_busiest_group: max: 1024, min: 0, avg: 512, nr: 2
-0 [003] 706.325435: find_busiest_group: max: 1250, min: 440, avg: 1024, nr: 2

on a CGROUP=n kernel, which I think we can attribute to races.

When I enable tracing I also get some good runs, so it smells like the
load balancer does one bad thing and, instead of correcting it, makes
it worse.

It looks like it's set off by a mass wakeup of random crap that really
shouldn't be waking at all. I mean, who needs automount to wake up, or
whatever the fuck rtkit-daemon is? I'm pretty sure my bash loops don't
do anything remotely related to those.

Anyway, once enough random crap wakes up, the load balancer goes and
shifts stuff around, and once we hit the group_imb condition we seem to
get stuck in a bad state instead of getting out of it.

Bah!