Date: Thu, 19 Dec 2019 08:41:34 +0000
From: Mel Gorman
To: Rik van Riel
Cc: Vincent Guittot, Ingo Molnar, Peter Zijlstra, pauld@redhat.com,
	valentin.schneider@arm.com, srikar@linux.vnet.ibm.com,
	quentin.perret@arm.com, dietmar.eggemann@arm.com,
	Morten.Rasmussen@arm.com, hdanton@sina.com, parth@linux.ibm.com,
	LKML
Subject: Re: [PATCH] sched, fair: Allow a small degree of load imbalance between SD_NUMA domains
Message-ID: <20191219084134.GH3178@techsingularity.net>
References:
	<20191218154402.GF3178@techsingularity.net>
	<37ec5587dbb4035b883e5a69b56da4cc67f0e5ff.camel@surriel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <37ec5587dbb4035b883e5a69b56da4cc67f0e5ff.camel@surriel.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Dec 18, 2019 at 09:58:01PM -0500, Rik van Riel wrote:
> On Wed, 2019-12-18 at 15:44 +0000, Mel Gorman wrote:
>
> > +			/*
> > +			 * Ignore imbalance unless busiest sd is close to 50%
> > +			 * utilisation. At that point balancing for memory
> > +			 * bandwidth and potentially avoiding unnecessary use
> > +			 * of HT siblings is as relevant as memory locality.
> > +			 */
> > +			imbalance_max = (busiest->group_weight >> 1) - imbalance_adj;
> > +			if (env->imbalance <= imbalance_adj &&
> > +			    busiest->sum_nr_running < imbalance_max) {
> > +				env->imbalance = 0;
> > +			}
> > +		}
> >  		return;
> >  	}
>
> I can see how the 50% point is often great for HT,
> but I wonder if that is also the case for SMT4 and
> SMT8 systems...
>

Maybe, maybe not, but it's not the most important concern. The highlight
in the comment was about memory bandwidth, and HT was simply an
additional concern.

Ideally memory bandwidth and consumption would be taken into account,
but we know nothing about either. Even if peak memory bandwidth were
known, the reference pattern matters a *lot*, which can be readily
illustrated by using STREAM and observing the different bandwidths for
different reference patterns. Similarly, while we might know which pages
were referenced, we do not know the bandwidth consumption without taking
on the additional overhead of a PMU.

Hence, it makes sense to at least hope that the active tasks have
similar memory bandwidth requirements and to load balance as normal when
we are near 50% active tasks/busy CPUs.

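To make the cutoff in the quoted hunk concrete, here is a minimal
userspace sketch of the same check. It is not the kernel code: the
function name is hypothetical, and imbalance_adj is passed in as a plain
parameter because the quoted hunk does not show how it is computed. The
sketch also uses signed arithmetic throughout, which is an assumption
about the types involved.

```c
/*
 * Userspace sketch of the quoted check; not the kernel implementation.
 * numa_adjusted_imbalance() is a hypothetical name. imbalance_adj is
 * set outside the quoted hunk, so it is an ordinary parameter here.
 */
static long numa_adjusted_imbalance(long imbalance,
				    unsigned int group_weight,
				    unsigned int sum_nr_running,
				    long imbalance_adj)
{
	/* Half the CPUs in the busiest group, less the allowed slack. */
	long imbalance_max = (long)(group_weight >> 1) - imbalance_adj;

	/*
	 * A small imbalance is ignored only while the busiest group is
	 * running fewer tasks than the cutoff.
	 */
	if (imbalance <= imbalance_adj &&
	    (long)sum_nr_running < imbalance_max)
		return 0;

	/* Otherwise load balance as normal. */
	return imbalance;
}
```

For an illustrative busiest group of 48 CPUs with imbalance_adj set to
2 (both values made up), the cutoff is 22 running tasks: an imbalance of
2 is dropped to 0 with 2 tasks running, but is kept once 22 tasks are
running, and an imbalance of 4 is always kept.
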
If SMT4 or SMT8 have different requirements, or if it matters for memory
bandwidth, then it would need to be carefully examined by someone with
access to such hardware to determine an arch-specific and maybe even a
per-CPU-family cutoff. In the context of this patch, it unconditionally
makes sense that, in the basic case, two communicating tasks do not
migrate cross-node on wakeup and then again on load balance.

-- 
Mel Gorman
SUSE Labs