Date: Wed, 18 Apr 2007 14:13:04 +0200
From: Nick Piggin
To: Ingo Molnar
Cc: Andy Whitcroft, linux-kernel@vger.kernel.org, Linus Torvalds,
	Andrew Morton, Con Kolivas, Mike Galbraith, Arjan van de Ven,
	Thomas Gleixner, Steve Fox, Nishanth Aravamudan
Subject: Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Message-ID: <20070418121304.GA878@wotan.suse.de>
In-Reply-To: <20070418095334.GA26525@elte.hu>

On Wed, Apr 18, 2007 at 11:53:34AM +0200, Ingo Molnar wrote:
>
> * Nick Piggin wrote:
>
> > So looking at elapsed time, a granularity of 100ms is just behind the
> > mainline score. However it is using slightly less user time and
> > slightly more idle time, which indicates that balancing might have
> > got a bit less aggressive.
> >
> > But anyway, it conclusively shows the efficiency impact of such tiny
> > timeslices.
>
> yeah, the 4% drop in a CPU-cache-sensitive workload like kernbench is
> not unexpected when going to really frequent preemption. Clearly, the
> default preemption granularity needs to be tuned up.
>
> I think you said you measured ~3msec average preemption rate per CPU?

This was just looking at ctxsw numbers from running 2 cpu hogs on the
same runqueue.

> That would suggest the average cache-thrashing cost was 120 usecs per
> every 3 msec window. Taking that as a ballpark figure, to get the
> difference back into the noise range we'd have to either use ~5 msec:
>
>   echo 5000000 > /proc/sys/kernel/sched_granularity
>
> or 15 msec:
>
>   echo 15000000 > /proc/sys/kernel/sched_granularity
>
> (depending on whether it's 5x 3msec or 5x 1msec - i'm still not sure i
> correctly understood your 3msec value. I'd have to know your kernbench
> workload's approximate 'steady state' context-switch rate to do a more
> accurate calculation.)

The kernel compile (make -j8 on a 4-thread system) is doing 1800 total
context switches per second (450/s per runqueue) for cfs, and 670 for
mainline. Going up to 20ms granularity for cfs brings the context-switch
numbers close to mainline's, but user time is still a percent or so
higher.

I'd be more worried about compute-heavy threads which naturally don't
do much context switching.
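(For the 'steady state' rate Ingo asks about: one quick way to sample
it is the ctxt counter in /proc/stat, which counts context switches
since boot. A minimal sketch, assuming a 10 second sampling window:

  # system-wide ctxsw/s: diff the ctxt counter over a 10s window
  c1=$(awk '/^ctxt/ {print $2}' /proc/stat)
  sleep 10
  c2=$(awk '/^ctxt/ {print $2}' /proc/stat)
  echo "$(( (c2 - c1) / 10 )) ctxsw/s"

Divide by the number of runqueues to get a per-runqueue figure like
the 450/s above.)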
Some other numbers on the same system:

Hackbench:        2.6.21-rc7  cfs-v2 1ms[*]  nicksched
 10 groups: Time:  1.332       0.743          0.607
 20 groups: Time:  1.197       1.100          1.241
 30 groups: Time:  1.754       2.376          1.834
 40 groups: Time:  3.451       2.227          2.503
 50 groups: Time:  3.726       3.399          3.220
 60 groups: Time:  3.548       4.567          3.668
 70 groups: Time:  4.206       4.905          4.314
 80 groups: Time:  4.551       6.324          4.879
 90 groups: Time:  7.904       6.962          5.335
100 groups: Time:  7.293       7.799          5.857
110 groups: Time: 10.595       8.728          6.517
120 groups: Time:  7.543       9.304          7.082
130 groups: Time:  8.269      10.639          8.007
140 groups: Time: 11.867       8.250          8.302
150 groups: Time: 14.852       8.656          8.662
160 groups: Time:  9.648       9.313          9.541

Mainline seems pretty inconsistent here.

lmbench 0K ctxsw latency (usec), bound to CPU0:

tasks   2.6.21-rc7  cfs-v2 1ms  nicksched
    2   2.59        3.42        2.50
    4   3.26        3.54        3.09
    8   3.01        3.64        3.22
   16   3.00        3.66        3.50
   32   2.99        3.70        3.49
   64   3.09        4.17        3.50
  128   4.80        5.58        4.74
  256   5.79        6.37        5.76

cfs is noticeably disadvantaged.

[*] 500ms didn't make much difference in either test.
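For reproduction, a sketch of the two benchmark invocations, assuming
the standalone hackbench.c (compiled to ./hackbench) and lmbench's
lat_ctx are available:

  # hackbench: the argument is the number of groups; each group is a
  # set of sender/receiver tasks passing data over sockets
  for g in $(seq 10 10 160); do ./hackbench $g; done

  # lmbench 0K context-switch latency, pinned to CPU0
  taskset -c 0 lat_ctx -s 0 2 4 8 16 32 64 128 256

hackbench reports elapsed seconds; lat_ctx reports per-switch latency
in microseconds.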