From: Valentin Schneider
To: Barry Song, vincent.guittot@linaro.org, mgorman@suse.de,
	mingo@kernel.org, peterz@infradead.org, dietmar.eggemann@arm.com,
	morten.rasmussen@arm.com, linux-kernel@vger.kernel.org
Cc: linuxarm@openeuler.org, xuwei5@huawei.com, liguozhu@hisilicon.com,
	tiantao6@hisilicon.com, wanghuiqiang@huawei.com,
	prime.zeng@hisilicon.com, jonathan.cameron@huawei.com,
	guodong.xu@linaro.org, Barry Song, Meelis Roos
Subject: Re: [PATCH] sched/topology: fix the issue groups don't span
 domain->span for NUMA diameter > 2
In-Reply-To: <20210201033830.15040-1-song.bao.hua@hisilicon.com>
References:
<20210201033830.15040-1-song.bao.hua@hisilicon.com>
Date: Tue, 02 Feb 2021 15:17:15 +0000

On 01/02/21 16:38, Barry Song wrote:
> @@ -964,6 +941,12 @@ static void init_overlap_sched_group(struct sched_domain *sd,
>
>  	build_balance_mask(sd, sg, mask);
>  	cpu = cpumask_first_and(sched_group_span(sg), mask);
> +	/*
> +	 * for the group generated by grandchild, use the sgc of 2nd cpu
> +	 * because the 1st cpu might be used by another sched_group
> +	 */
> +	if (from_grandchild && cpumask_weight(mask) > 1)
> +		cpu = cpumask_next_and(cpu, sched_group_span(sg), mask);
>
>  	sg->sgc = *per_cpu_ptr(sdd->sgc, cpu);

So you are getting a (hopefully) unique ID for this group span at this
given topology level (i.e. sd->private), but, as I had stated in that
list of issues, this creates an sgc that isn't attached to the local
group of any sched_domain, and thus won't get its capacity values
updated.

This can actually be seen via the capacity values you're getting at
build time:

> [    0.868907] CPU0 attaching sched-domain(s):
...
> [    0.869542]   domain-2: span=0-5 level=NUMA
> [    0.869559]    groups: 0:{ span=0-3 cap=4002 }, 5:{ span=4-5 cap=2048 }
                                                          ^^^^^^^^^^^^^^^^
> [    0.871177] CPU4 attaching sched-domain(s):
...
> [    0.871200]    groups: 4:{ span=4 cap=977 }, 5:{ span=5 cap=1001 }
> [    0.871243]   domain-1: span=4-7 level=NUMA
> [    0.871257]    groups: 4:{ span=4-5 cap=1978 }, 6:{ span=6-7 cap=1968 }
                              ^^^^^^^^^^^^^^^^

IMO what we want to do here is to hook this CPU0-domain-2-group5 to the
sgc of CPU4-domain1-group4. I've done that in the below diff - this
gives us groups with sgc's owned at lower topology levels, but this
will only ever be true for non-local groups.

This has the added benefit of working with single-CPU nodes.
Briefly tested on your topology and the sunfire's (via QEMU), and I
didn't get screamed at.

Before the fun police comes and impounds my keyboard, I'd like to point
out that we could leverage this cross-level sgc referencing hack to
further change the NUMA domains and pretty much get rid of overlapping
groups (that's what I was fumbling with in [1]).

[1]: http://lore.kernel.org/r/jhjwnw11ak2.mognet@arm.com

That is, rather than building overlapping groups and fixing them
whenever that breaks (distance > 2), we could have:
- the local group being the child domain's span (as always)
- all non-local NUMA groups spanning a single node each, with the right
  sgc cross-referencing.

Thoughts?

--->8---
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b748999c9e11..ef43abb6b1fb 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -932,21 +932,15 @@ build_group_from_child_sched_domain(struct sched_domain *sd, int cpu)
 
 static void init_overlap_sched_group(struct sched_domain *sd,
 				     struct sched_group *sg,
-				     int from_grandchild)
+				     struct sched_domain *grandchild)
 {
 	struct cpumask *mask = sched_domains_tmpmask2;
-	struct sd_data *sdd = sd->private;
+	struct sd_data *sdd = grandchild ? grandchild->private : sd->private;
 	struct cpumask *sg_span;
 	int cpu;
 
 	build_balance_mask(sd, sg, mask);
 	cpu = cpumask_first_and(sched_group_span(sg), mask);
-	/*
-	 * for the group generated by grandchild, use the sgc of 2nd cpu
-	 * because the 1st cpu might be used by another sched_group
-	 */
-	if (from_grandchild && cpumask_weight(mask) > 1)
-		cpu = cpumask_next_and(cpu, sched_group_span(sg), mask);
 
 	sg->sgc = *per_cpu_ptr(sdd->sgc, cpu);
 	if (atomic_inc_return(&sg->sgc->ref) == 1)
@@ -979,7 +973,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 
 	for_each_cpu_wrap(i, span, cpu) {
 		struct cpumask *sg_span;
-		int from_grandchild = 0;
+		bool from_grandchild = false;
 
 		if (cpumask_test_cpu(i, covered))
 			continue;
@@ -1033,7 +1027,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 		    !cpumask_subset(sched_domain_span(sibling->child), span)) {
 			sibling = sibling->child;
-			from_grandchild = 1;
+			from_grandchild = true;
 		}
 
 		sg = build_group_from_child_sched_domain(sibling, cpu);
@@ -1043,7 +1037,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 		sg_span = sched_group_span(sg);
 		cpumask_or(covered, covered, sg_span);
 
-		init_overlap_sched_group(sd, sg, from_grandchild);
+		init_overlap_sched_group(sd, sg, from_grandchild ? sibling : NULL);
 
 		if (!first)
 			first = sg;