From: Mel Gorman <mgorman@techsingularity.net>
To: "Gautham R. Shenoy" <gautham.shenoy@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Valentin Schneider <valentin.schneider@arm.com>,
Aubrey Li <aubrey.li@linux.intel.com>,
Barry Song <song.bao.hua@hisilicon.com>,
Mike Galbraith <efault@gmx.de>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans multiple LLCs
Date: Wed, 15 Dec 2021 12:25:50 +0000
Message-ID: <20211215122550.GR3366@techsingularity.net>
In-Reply-To: <YbnW/vLgE8MmQopN@BLR-5CG11610CF.amd.com>

On Wed, Dec 15, 2021 at 05:22:30PM +0530, Gautham R. Shenoy wrote:
> Hello Mel,
>
>
> On Mon, Dec 13, 2021 at 08:17:37PM +0530, Gautham R. Shenoy wrote:
>
> >
> > Thanks for the patch. I will queue this one for tonight.
> >
>
> Getting the numbers took a bit longer than I expected.
>
No worries.
> > > <SNIP>
> > > + /*
> > > + * Set span based on top domain that places
> > > + * tasks in sibling domains.
> > > + */
> > > + top = sd;
> > > + top_p = top->parent;
> > > + while (top_p && (top_p->flags & SD_PREFER_SIBLING)) {
> > > + top = top->parent;
> > > + top_p = top->parent;
> > > + }
> > > + imb_span = top_p ? top_p->span_weight : sd->span_weight;
> > > } else {
> > > - sd->imb_numa_nr = imb * (sd->span_weight / imb_span);
> > > + int factor = max(1U, (sd->span_weight / imb_span));
> > > +
>
>
> So for the first NUMA domain, sd->imb_numa_nr will be imb, which
> turns out to be 2 for Zen2 and Zen3 processors across all Nodes Per
> Socket (NPS) settings.
>
> On a 2 Socket Zen3:
>
> NPS=1
> child=MC, llc_weight=16, sd=DIE. sd->span_weight=128 imb=max(2U, (16*16/128) / 4)=2
> top_p = NUMA, imb_span = 256.
>
> NUMA: sd->span_weight =256; sd->imb_numa_nr = 2 * (256/256) = 2
>
> NPS=2
> child=MC, llc_weight=16, sd=NODE. sd->span_weight=64 imb=max(2U, (16*16/64) / 4) = 2
> top_p = NUMA, imb_span = 128.
>
> NUMA: sd->span_weight =128; sd->imb_numa_nr = 2 * (128/128) = 2
> NUMA: sd->span_weight =256; sd->imb_numa_nr = 2 * (256/128) = 4
>
> NPS=4:
> child=MC, llc_weight=16, sd=NODE. sd->span_weight=32 imb=max(2U, (16*16/32) / 4) = 2
> top_p = NUMA, imb_span = 128.
>
> NUMA: sd->span_weight =128; sd->imb_numa_nr = 2 * (128/128) = 2
> NUMA: sd->span_weight =256; sd->imb_numa_nr = 2 * (256/128) = 4
>
> Again, we will be load balancing more aggressively across the two
> sockets in NPS=1 mode than in NPS=2/4.
>
Yes, but I felt it was reasonable behaviour because we have to strike
some sort of balance: allow a NUMA imbalance up to a point so that
communicating tasks are not pulled apart, which is what v3 broke
completely. There will always be a tradeoff between tasks that want to
remain local to each other and others that prefer to spread as wide as
possible as quickly as possible.
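For reference, the per-NPS arithmetic quoted above can be re-checked
with a minimal standalone sketch. This is plain userspace C, not the
kernel code itself; the topology inputs (llc_weight=16 and the span
values) are assumptions taken from the 2-socket Zen3 figures quoted
above:

	#include <stdio.h>

	/* Standalone sanity check of the imb/imb_numa_nr arithmetic
	 * discussed above. Topology inputs are assumptions from the
	 * quoted 2-socket Zen3 figures, not probed from hardware.
	 */
	static unsigned int max_u(unsigned int a, unsigned int b)
	{
		return a > b ? a : b;
	}

	int main(void)
	{
		const unsigned int llc_weight = 16;	/* CPUs per LLC */
		const struct {
			const char *nps;
			unsigned int sd_span;		/* domain above LLC */
			unsigned int imb_span;		/* divisor span */
			unsigned int numa_span[2];	/* NUMA domain spans */
			int nr_numa;
		} cfg[] = {
			{ "NPS=1", 128, 256, { 256, 0   }, 1 },
			{ "NPS=2",  64, 128, { 128, 256 }, 2 },
			{ "NPS=4",  32, 128, { 128, 256 }, 2 },
		};

		for (int i = 0; i < 3; i++) {
			/* imb = max(2U, (llc^2 / sd->span_weight) / 4) */
			unsigned int imb = max_u(2U,
				(llc_weight * llc_weight / cfg[i].sd_span) / 4);

			printf("%s: imb=%u\n", cfg[i].nps, imb);
			for (int j = 0; j < cfg[i].nr_numa; j++) {
				unsigned int factor = max_u(1U,
					cfg[i].numa_span[j] / cfg[i].imb_span);
				printf("  NUMA span=%u imb_numa_nr=%u\n",
				       cfg[i].numa_span[j], imb * factor);
			}
		}
		return 0;
	}

Its output matches the imb_numa_nr values worked out above for each NPS
mode, which makes the NPS=1 vs NPS=2/4 difference at the top domain easy
to see.
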
> <SNIP>
> If we retain the (2,4) thresholds from v4.1 but use them in
> allow_numa_imbalance() as in v3 we get
>
> NPS=4
> Test: mel-v4.2
> Copy: 225860.12 (498.11%)
> Scale: 227869.07 (572.58%)
> Add: 278365.58 (624.93%)
> Triad: 264315.44 (596.62%)
>
The potential problem with this is that it will probably work for
netperf when it's a single communicating pair, but may not work as well
when there are multiple communicating pairs or a number of communicating
tasks that exceeds imb_numa_nr.
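For anyone following along, the shape of the check being discussed is
roughly the following. This is a simplified sketch of an
allow_numa_imbalance()-style gate, not the exact v3 or v4.2 code, and
the cutoff shown is illustrative:

	/* Sketch: permit a NUMA imbalance only while the number of
	 * running tasks on the destination is below the domain's
	 * allowed imbalance. Once dst_running reaches imb_numa_nr,
	 * normal balancing spreads tasks. The exact threshold and
	 * comparison differ between v3 and the v4.x variants.
	 */
	static inline bool allow_numa_imbalance(unsigned int dst_running,
						unsigned int imb_numa_nr)
	{
		return dst_running < imb_numa_nr;
	}

With a check of this shape, a single netperf pair stays under the
threshold and is kept together, while many pairs quickly push
dst_running past imb_numa_nr so the imbalance is no longer allowed.
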
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> NPS=1
> ======
> Clients: tip-core mel-v3 mel-v4 mel-v4.1
> 1 633.19 619.16 632.94 619.27
> (0.00%) (-2.21%) (-0.03%) (-2.19%)
>
> 2 1152.48 1189.88 1184.82 1189.19
> (0.00%) (3.24%) (2.80%) (3.18%)
>
> 4 1946.46 2177.40 1979.56 2196.09
> (0.00%) (11.86%) (1.70%) (12.82%)
>
> 8 3553.29 3564.50 3678.07 3668.77
> (0.00%) (0.31%) (3.51%) (3.24%)
>
> 16 6217.03 6484.58 6249.29 6534.73
> (0.00%) (4.30%) (0.51%) (5.11%)
>
> 32 11702.59 12185.77 12005.99 11917.57
> (0.00%) (4.12%) (2.59%) (1.83%)
>
> 64 18394.56 19535.11 19080.19 19500.55
> (0.00%) (6.20%) (3.72%) (6.01%)
>
> 128 27231.02 31759.92 27200.52 30358.99
> (0.00%) (16.63%) (-0.11%) (11.48%)
>
> 256 33166.10 24474.30 31639.98 24788.12
> (0.00%) (-26.20%) (-4.60%) (-25.26%)
>
> 512 41605.44 54823.57 46684.48 54559.02
> (0.00%) (31.77%) (12.20%) (31.13%)
>
> 1024 53650.54 56329.39 44422.99 56320.66
> (0.00%) (4.99%) (-17.19%) (4.97%)
>
>
> We see that v4.1 performs better than v4 in most cases, except when
> the number of clients is 256, where the spread strategy seems to be
> hurting as we see degradation in both v3 and v4.1. This is true even
> for the NPS=2 and NPS=4 cases (see below).
>
The 256 client case is a bit of a crapshoot. At that point, the NUMA
imbalancing is disabled and the machine is overloaded.
> NPS=2
> =====
> Clients: tip-core mel-v3 mel-v4 mel-v4.1
> 1 629.76 620.91 629.11 631.95
> (0.00%) (-1.40%) (-0.10%) (0.34%)
>
> 2 1176.96 1203.12 1169.09 1186.74
> (0.00%) (2.22%) (-0.66%) (0.83%)
>
> 4 1990.97 2228.04 1888.19 1995.21
> (0.00%) (11.90%) (-5.16%) (0.21%)
>
> 8 3534.57 3617.16 3660.30 3548.09
> (0.00%) (2.33%) (3.55%) (0.38%)
>
> 16 6294.71 6547.80 6504.13 6470.34
> (0.00%) (4.02%) (3.32%) (2.79%)
>
> 32 12035.73 12143.03 11396.26 11860.91
> (0.00%) (0.89%) (-5.31%) (-1.45%)
>
> 64 18583.39 19439.12 17126.47 18799.54
> (0.00%) (4.60%) (-7.83%) (1.16%)
>
> 128 27811.89 30562.84 28090.29 27468.94
> (0.00%) (9.89%) (1.00%) (-1.23%)
>
> 256 28148.95 26488.57 29117.13 23628.29
> (0.00%) (-5.89%) (3.43%) (-16.05%)
>
> 512 43934.15 52796.38 42603.49 41725.75
> (0.00%) (20.17%) (-3.02%) (-5.02%)
>
> 1024 54391.65 53891.83 48419.09 43913.40
> (0.00%) (-0.91%) (-10.98%) (-19.26%)
>
> In this case, v4.1 performs as well as v4 up to 64 clients. But after
> that we see degradation. The degradation is significant in the 1024
> client case.
>
Kinda the same; it's more likely to be run-to-run variance because the
machine is overloaded.
> NPS=4
> =====
> Clients: tip-core mel-v3 mel-v4 mel-v4.1 mel-v4.2
> 1 622.65 617.83 667.34 644.76 617.58
> (0.00%) (-0.77%) (7.17%) (3.55%) (-0.81%)
>
> 2 1160.62 1182.30 1294.08 1193.88 1182.55
> (0.00%) (1.86%) (11.49%) (2.86%) (1.88%)
>
> 4 1961.14 2171.91 2477.71 1929.56 2116.01
> (0.00%) (10.74%) (26.34%) (-1.61%) (7.89%)
>
> 8 3662.94 3447.98 4067.40 3627.43 3580.32
> (0.00%) (-5.86%) (11.04%) (-0.96%) (-2.25%)
>
> 16 6490.92 5871.93 6924.32 6660.13 6413.34
> (0.00%) (-9.53%) (6.67%) (2.60%) (-1.19%)
>
> 32 11831.81 12004.30 12709.06 12187.78 11767.46
> (0.00%) (1.45%) (7.41%) (3.00%) (-0.54%)
>
> 64 17717.36 18406.79 18785.41 18820.33 18197.86
> (0.00%) (3.89%) (6.02%) (6.22%) (2.71%)
>
> 128 27723.35 27777.34 27939.63 27399.64 24310.93
> (0.00%) (0.19%) (0.78%) (-1.16%) (-12.30%)
>
> 256 30919.69 23937.03 35412.26 26780.37 24642.24
> (0.00%) (-22.58%) (14.52%) (-13.38%) (-20.30%)
>
> 512 43366.03 49570.65 43830.84 43654.42 41031.90
> (0.00%) (14.30%) (1.07%) (0.66%) (-5.38%)
>
> 1024 46960.83 53576.16 50557.19 43743.07 40884.98
> (0.00%) (14.08%) (7.65%) (-6.85%) (-12.93%)
>
>
> In the NPS=4 case, clearly v4 provides the best results.
>
> v4.1 does better than v4.2 since it is able to hold off spreading for
> a longer period.
>
Most likely because v4.2 disables the allowed NUMA imbalance too
soon. This is the trade-off of favouring communicating tasks over
embarrassingly parallel problems.
--
Mel Gorman
SUSE Labs