From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 26 Mar 2012 23:05:33 +0530
From: Srivatsa Vaddagiri
To: Peter Zijlstra
Cc: Ingo Molnar, Mike Galbraith, Suresh Siddha, linux-kernel, Paul Turner
Subject: Re: sched: Avoid SMT siblings in select_idle_sibling() if possible
Message-ID: <20120326173533.GA4689@linux.vnet.ibm.com>
References: <1329764866.2293.376.camhel@twins> <20120305152443.GE26559@linux.vnet.ibm.com> <20120306091410.GD27238@elte.hu> <20120322153205.GA28570@linux.vnet.ibm.com> <1332750960.16159.81.camel@twins>
In-Reply-To: <1332750960.16159.81.camel@twins>

* Peter Zijlstra [2012-03-26 10:36:00]:

> >                      tip     tip + patch
> >
> > volano               1       1.29 (29% improvement)
> > sysbench [n3]        1       2    (100% improvement)
> > tbench 1 [n4]        1       1.07 (7% improvement)
> > tbench 8 [n5]        1       1.26 (26% improvement)
> > httperf  [n6]        1       1.05 (5% improvement)
> > Trade                1       1.31 (31% improvement)
>
> That smells like there's more to the story, a 100% improvement is too
> much..

Yeah, I have rubbed my eyes several times to make sure it's true, and I
ran the same benchmark (sysbench) again just now. I can recreate that
~100% improvement with the patch even now.

To quickly recap my environment: I have a 16-cpu machine with 5 cgroups.
One cgroup (8192 shares) hosts sysbench inside an 8-vcpu VM, while the
remaining 4 cgroups (1024 shares each) host 4 cpu hogs running on bare
metal.
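(For anyone wanting to reproduce: the cgroup setup above could be
recreated along these lines. This is only a rough sketch; the mount
point and group names are my own, and it assumes the cgroup v1 "cpu"
controller.)

```shell
# Sketch of the 5-cgroup setup: 1 heavy group for the VM, 4 hog groups.
# Mount point /sys/fs/cgroup/cpu and the group names are illustrative.
mount -t cgroup -o cpu none /sys/fs/cgroup/cpu 2>/dev/null || true

# VM cgroup with 8192 shares (hosts the 8-vcpu VM running sysbench).
mkdir -p /sys/fs/cgroup/cpu/vm
echo 8192 > /sys/fs/cgroup/cpu/vm/cpu.shares

# 4 cgroups with 1024 shares each, one cpu hog per group on bare metal.
for i in 1 2 3 4; do
    mkdir -p /sys/fs/cgroup/cpu/hog$i
    echo 1024 > /sys/fs/cgroup/cpu/hog$i/cpu.shares
done
```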
Given this overcommitment, select_idle_sibling() should mostly be a
no-op (i.e. it won't find any idle cores and thus defaults to prev_cpu).
Also, the only tasks that will (sleep and) wake up are the VM tasks.

Since the patch potentially affects (improves) scheduling latencies, I
measured Sum(se.statistics.wait_sum) for the VM tasks over the benchmark
run (5 iterations of sysbench).

	tip         : 987240 ms
	tip + patch : 280275 ms

I will get more comprehensive perf data shortly and post it.

From what I can tell, the huge improvement in benchmark score is coming
from reduced latencies for its VM tasks. The hard part to figure out
(when we are inside select_task_rq_fair()) is whether any potential
improvement in latency (because of waking up on a less loaded cpu) will
offset the cost of potentially more L2-cache misses, for which IMHO we
don't have enough data to make a good decision.

- vatsa
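P.S. In case it helps anyone repeat the measurement: a rough sketch of
how the wait_sum total above can be gathered, by reading
/proc/<pid>/sched for each VM task and summing the
se.statistics.wait_sum field. The helper names and the exact field
layout are my own assumptions, not from any tool.

```python
# Sketch: sum se.statistics.wait_sum (ms) across a set of task pids.
# Assumes /proc/<pid>/sched exposes lines like
#   "se.statistics.wait_sum             :        987240.000000"
# (field name as on kernels of this vintage with schedstats enabled).
import re

def parse_wait_sum(sched_text):
    """Extract se.statistics.wait_sum (in ms) from a /proc/<pid>/sched dump."""
    m = re.search(r"se\.statistics\.wait_sum\s*:\s*([\d.]+)", sched_text)
    return float(m.group(1)) if m else 0.0

def total_wait_sum(pids, read=lambda p: open("/proc/%d/sched" % p).read()):
    # Sum per-task wait time over all vcpu threads; 'read' is injectable
    # so the parsing can be exercised without a live /proc.
    return sum(parse_wait_sum(read(pid)) for pid in pids)
```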