Subject: Re: [PATCH 1/2] Customize sched domain via cpuset
From: Peter Zijlstra
To: Andi Kleen
Cc: Hidetoshi Seto, linux-kernel@vger.kernel.org, Ingo Molnar, Paul Jackson
Date: Tue, 01 Apr 2008 15:38:51 +0200
Message-Id: <1207057131.8514.736.camel@twins>
In-Reply-To: <20080401132924.GI29105@one.firstfloor.org>
References: <47F21BE3.5030705@jp.fujitsu.com> <87zlsdzttp.fsf@basil.nowhere.org>
 <1207050968.8514.721.camel@twins> <20080401132924.GI29105@one.firstfloor.org>

On Tue, 2008-04-01 at 15:29 +0200, Andi Kleen wrote:
> On Tue, Apr 01, 2008 at 01:56:08PM +0200, Peter Zijlstra wrote:
> > On Tue, 2008-04-01 at 13:40 +0200, Andi Kleen wrote:
> > > Hidetoshi Seto writes:
> > >
> > > > Using cpuset, now we can partition the system into multiple sched domains.
> > > > Then, how about providing different characteristics for each domains?
> > >
> > > Did you actually see much improvement in any relevant workload
> > > from tweaking these parameters? If yes what did you change?
> > > And how much did it gain?
> > >
> > > Ideally the kernel should perform well without much tweaking
> > > out of the box, simply because most users won't tweak. Adding a
> > > lot of such parameters would imply giving up on good defaults which
> > > is not a good thing.
> >
> > From what I understand they need very aggressive idle balancing; much
> > more so than what is normally healthy.
> >
> > I can see how something like that can be useful when you have a lot of
> > very short running tasks. These could pile up on a few cpus and leave
> > others idle.
>
> Could the scheduler auto-tune itself to this situation?
>
> e.g. when it sees a row of very high run queue imbalances, increase the
> frequency of the idle balancer?

It's not actually the idle balancer that's addressed here, but that runs
at 1/HZ, so no, we can't do that faster unless you tie it to a hrtimer.

What it does do is more aggressively look for idle cpus on newidle and
fork. Normally we only consider the socket for these lookups; they want
a wider view.

Auto-tune, perhaps, although I'm a bit skeptical of heuristics. We'd
need data on the avg 'atom' length of the tasks, the idle-ness of remote
cpus, and so on.

The thing is, even then it depends on the data footprint of these tasks
and the cost/benefit for your application. By more aggressively
migrating tasks you penalize throughput but get a better worst-case
response time.

I'm just not sure we can make that decision for the user.
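[Editor's note: the throughput vs. worst-case-latency tradeoff described above can be illustrated with a toy queueing model. This is a hypothetical sketch, not the kernel's actual load balancer: `simulate`, the per-task `service` time, and the flat `migration_cost` (standing in for cache-refill cost after a migration) are all invented for illustration.]

```python
# Toy model of the tradeoff: aggressively pulling short tasks to idle
# CPUs improves the worst-case completion time, but each migration pays
# a cache-refill penalty, so total CPU time consumed (throughput cost)
# goes up. All parameters are illustrative, not kernel values.

def simulate(n_tasks, n_cpus, service, migration_cost, aggressive):
    """Return (worst_case_completion, total_cpu_time).

    All tasks are forked on CPU 0. With aggressive balancing each task
    runs on the least-loaded CPU, paying migration_cost when it moves
    off CPU 0; without it, everything piles up on CPU 0.
    """
    if not aggressive:
        # Tasks queue on CPU 0 and run back to back; other CPUs stay idle.
        return service * n_tasks, service * n_tasks

    cpu_free = [0.0] * n_cpus          # time at which each CPU goes idle
    worst = 0.0
    total = 0.0
    for _ in range(n_tasks):
        cpu = cpu_free.index(min(cpu_free))      # pull to least-loaded CPU
        cost = service + (migration_cost if cpu != 0 else 0.0)
        cpu_free[cpu] += cost
        worst = max(worst, cpu_free[cpu])
        total += cost
    return worst, total

if __name__ == "__main__":
    lazy = simulate(8, 4, service=1.0, migration_cost=0.5, aggressive=False)
    aggr = simulate(8, 4, service=1.0, migration_cost=0.5, aggressive=True)
    print("lazy:       worst=%.1f total_cpu=%.1f" % lazy)   # worst=8.0 total_cpu=8.0
    print("aggressive: worst=%.1f total_cpu=%.1f" % aggr)   # worst=3.0 total_cpu=11.0
```

In this toy run the aggressive policy cuts the worst-case completion time from 8.0 to 3.0 while burning 11.0 units of CPU time instead of 8.0, which is the "penalize throughput but get a better worst-case response time" point in concrete numbers; where the crossover lies depends on the migration cost, i.e. the tasks' data footprint.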