From: Mel Gorman
To: Srikar Dronamraju, Peter Zijlstra
Cc: Ingo Molnar, Rik van Riel, LKML, Mel Gorman
Subject: [PATCH 3/4] sched/numa: Stop comparing tasks for NUMA placement after selecting an idle core
Date: Fri, 7 Sep 2018 11:11:38 +0100
Message-Id: 
<20180907101139.20760-4-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20180907101139.20760-1-mgorman@techsingularity.net>
References: <20180907101139.20760-1-mgorman@techsingularity.net>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

task_numa_migrate is responsible for finding a core on a preferred NUMA
node for a task. As part of this, task_numa_find_cpu iterates through
the CPUs of a node and evaluates CPUs, both idle and with running tasks,
as placement candidates. Generally though, any idle CPU is equivalent in
terms of improving imbalances and a search after finding one is pointless.
This patch stops examining CPUs on a node if an idle CPU is considered
suitable.

While there are some workloads that show minor gains and losses, they are
mostly within the noise with the exception of specjbb, whether running
as one large JVM or one JVM per socket. The following was reported on a
two-socket Haswell machine with 24 cores per socket.
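The early exit described above can be sketched as a small userspace model.
This is only an illustration under stated assumptions, not the kernel code:
struct cpu_model, struct scan_env, compare_cpu() and find_cpu() are
hypothetical stand-ins for the scheduler's structures and for
task_numa_compare()/task_numa_find_cpu(), and the improvement scores are
plain numbers supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a CPU on the preferred node; not a kernel type. */
struct cpu_model {
	bool idle;	/* no task currently running on this CPU */
	long imp;	/* improvement from swapping with the running task */
};

/* Hypothetical, much-reduced analogue of struct task_numa_env. */
struct scan_env {
	int best_cpu;	/* best candidate so far, -1 if none */
	long best_imp;	/* improvement of that candidate */
	int examined;	/* CPUs inspected, kept for illustration only */
};

/*
 * Model of the comparison step: record the CPU if it beats the current
 * best, and return true only when an idle CPU was selected so that the
 * caller can stop scanning.
 */
static bool compare_cpu(struct scan_env *env, const struct cpu_model *cpus,
			int cpu, long moveimp)
{
	bool dst_idle = false;
	long imp;

	env->examined++;
	if (cpus[cpu].idle) {
		/* Moving to an idle CPU is worth moveimp in this model. */
		if (moveimp < env->best_imp)
			return false;
		imp = moveimp;
		dst_idle = true;
	} else {
		imp = cpus[cpu].imp;
		if (imp <= env->best_imp)
			return false;
	}

	env->best_cpu = cpu;
	env->best_imp = imp;
	return dst_idle;
}

/* Model of the scan: stop at the first idle CPU that is accepted. */
static void find_cpu(struct scan_env *env, const struct cpu_model *cpus,
		     int ncpus, long moveimp)
{
	assert(ncpus >= 0);
	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (compare_cpu(env, cpus, cpu, moveimp))
			break;
	}
}
```

With five modelled CPUs where the third is idle and moveimp exceeds the
best swap seen so far, the scan stops after three CPUs and never looks at
the remaining two, even if one of them would have scored higher. That is
the trade-off the patch accepts.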

specjbb, one JVM per socket (2 total)
                               4.19.0-rc1              4.19.0-rc1
                                  vanilla            oneselect-v1
Hmean     tput-1      42258.43 (   0.00%)     43692.10 (   3.39%)
Hmean     tput-2      87811.26 (   0.00%)     93719.52 (   6.73%)
Hmean     tput-3     138100.56 (   0.00%)    143484.08 (   3.90%)
Hmean     tput-4     181061.51 (   0.00%)    191292.99 (   5.65%)
Hmean     tput-5     225577.34 (   0.00%)    233439.58 (   3.49%)
Hmean     tput-6     264763.44 (   0.00%)    270634.50 (   2.22%)
Hmean     tput-7     301458.48 (   0.00%)    314133.32 (   4.20%)
Hmean     tput-8     348364.50 (   0.00%)    358445.76 (   2.89%)
Hmean     tput-9     382129.65 (   0.00%)    403288.75 (   5.54%)
Hmean     tput-10    403566.70 (   0.00%)    444592.51 (  10.17%)
Hmean     tput-11    456967.43 (   0.00%)    483300.45 (   5.76%)
Hmean     tput-12    502295.98 (   0.00%)    526281.53 (   4.78%)
Hmean     tput-13    441284.41 (   0.00%)    535507.75 (  21.35%)
Hmean     tput-14    461478.57 (   0.00%)    542068.97 (  17.46%)
Hmean     tput-15    489725.29 (   0.00%)    545033.17 (  11.29%)
Hmean     tput-16    503726.56 (   0.00%)    549738.23 (   9.13%)
Hmean     tput-17    528650.57 (   0.00%)    550849.00 (   4.20%)
Hmean     tput-18    518065.41 (   0.00%)    550018.29 (   6.17%)
Hmean     tput-19    527412.99 (   0.00%)    550652.26 (   4.41%)
Hmean     tput-20    528166.25 (   0.00%)    545783.85 (   3.34%)
Hmean     tput-21    524669.70 (   0.00%)    544848.37 (   3.85%)
Hmean     tput-22    519010.38 (   0.00%)    539603.70 (   3.97%)
Hmean     tput-23    514947.43 (   0.00%)    534714.32 (   3.84%)
Hmean     tput-24    517953.29 (   0.00%)    531783.24 (   2.67%)

The coefficient of variation is roughly 0-3% depending on the warehouse
count, so these results are generally outside of the noise. Note that the
biggest improvements are when a socket would be roughly half loaded. It's
not especially obvious why this would be true given that without the patch
the socket is scanned anyway, but it may be cache miss related. On a
2-socket Broadwell machine, the same observation was made in that the
biggest benefit was when a socket was half-loaded. If a single JVM is used
for the entire machine, the biggest benefit was also when the machine was
half-utilised.
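
For readers checking the table, the figures can be reproduced with the
usual definitions. The sketch below assumes that "Hmean" is the ordinary
harmonic mean of the per-iteration throughputs and that the bracketed
percentage is the change relative to the vanilla column. hmean() and
gain_pct() are hypothetical helper names used for illustration, not part
of any reporting tool.

```c
#include <assert.h>
#include <stddef.h>

/* Harmonic mean of n positive samples: n / sum(1 / x_i). */
static double hmean(const double *x, size_t n)
{
	double inv_sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		assert(x[i] > 0.0);
		inv_sum += 1.0 / x[i];
	}
	return (double)n / inv_sum;
}

/* Percentage change of the patched result relative to the baseline. */
static double gain_pct(double baseline, double patched)
{
	assert(baseline != 0.0);
	return (patched - baseline) / baseline * 100.0;
}
```

For example, gain_pct(441284.41, 535507.75) evaluates to roughly 21.35,
which matches the tput-13 row. A gain of that size is well outside a 0-3%
coefficient of variation, whereas the 2-3% rows are much closer to it.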

Signed-off-by: Mel Gorman
---
 kernel/sched/fair.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5b2f1684e96e..d59d3e00a480 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1535,7 +1535,7 @@ static bool load_too_imbalanced(long src_load, long dst_load,
  * into account that it might be best if task running on the dst_cpu should
  * be exchanged with the source task
  */
-static void task_numa_compare(struct task_numa_env *env,
+static bool task_numa_compare(struct task_numa_env *env,
 			      long taskimp, long groupimp, bool maymove)
 {
 	struct rq *dst_rq = cpu_rq(env->dst_cpu);
@@ -1545,6 +1545,7 @@ static void task_numa_compare(struct task_numa_env *env,
 	long imp = env->p->numa_group ? groupimp : taskimp;
 	long moveimp = imp;
 	int dist = env->dist;
+	bool dst_idle = false;
 
 	rcu_read_lock();
 	cur = task_rcu_dereference(&dst_rq->curr);
@@ -1638,11 +1639,13 @@ static void task_numa_compare(struct task_numa_env *env,
 		env->dst_cpu = select_idle_sibling(env->p, env->src_cpu,
 						   env->dst_cpu);
 		local_irq_enable();
+		dst_idle = true;
 	}
 
 	task_numa_assign(env, cur, imp);
 unlock:
 	rcu_read_unlock();
+	return dst_idle;
 }
 
 static void task_numa_find_cpu(struct task_numa_env *env,
@@ -1668,7 +1671,8 @@ static void task_numa_find_cpu(struct task_numa_env *env,
 			continue;
 
 		env->dst_cpu = cpu;
-		task_numa_compare(env, taskimp, groupimp, maymove);
+		if (task_numa_compare(env, taskimp, groupimp, maymove))
+			break;
 	}
 }
 
-- 
2.16.4