From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra, Ingo Molnar
Cc: Vincent Guittot, Valentin Schneider, Juri Lelli, LKML, Mel Gorman
Subject: [PATCH v3 0/4] Revisit NUMA imbalance tolerance and fork balancing
Date: Fri, 20 Nov 2020 09:06:26 +0000
Message-Id: <20201120090630.3286-1-mgorman@techsingularity.net>

Changelog since v2
o Build fix for !NUMA_BALANCING
configurations

Changelog since v1
o Split out the patch that moves the imbalance calculation
o Strongly connect fork imbalance considerations with adjust_numa_imbalance

When NUMA and CPU balancing were reconciled, there was an attempt to allow
a degree of imbalance, but it caused more problems than it solved. Instead,
imbalance was only allowed when a NUMA domain was almost idle. A lot of
the problems have since been addressed, so it's time for a revisit. There
is also an issue with how fork is balanced across nodes. It's mentioned
in this context because patches 3 and 4 should share similar behaviour in
terms of a node's utilisation.

Patch 1 is just a cosmetic rename.

Patch 2 moves an imbalance calculation. It is both a micro-optimisation
and avoids confusing what imbalance means for different group types.

Patch 3 allows a "floating" imbalance to exist so communicating tasks can
remain on the same domain until utilisation is higher. It aims to balance
compute availability against memory bandwidth.

Patch 4 is the interesting one. Currently fork can allow a NUMA node to be
completely utilised as long as there are idle CPUs, until the load balancer
gets involved. This caused serious problems with a real workload that
unfortunately I cannot share many details about, but there is a proxy
reproducer.

Mel Gorman (4):
  sched/numa: Rename nr_running and break out the magic number
  sched: Avoid unnecessary calculation of load imbalance at clone time
  sched/numa: Allow a floating imbalance between NUMA nodes
  sched: Limit the amount of NUMA imbalance that can exist at fork time

 kernel/sched/fair.c | 44 +++++++++++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 13 deletions(-)

-- 
2.26.2