From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
	Phil Auld <pauld@redhat.com>, Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Quentin Perret <quentin.perret@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	Hillf Danton <hdanton@sina.com>, Parth Shah <parth@linux.ibm.com>,
	Rik van Riel <riel@surriel.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] sched, fair: Allow a small load imbalance between low utilisation SD_NUMA domains v4
Date: Mon, 20 Jan 2020 13:39:35 +0530
Message-ID: <20200120080935.GD20112@linux.vnet.ibm.com>
In-Reply-To: <20200117215853.GS3466@techsingularity.net>

* Mel Gorman <mgorman@techsingularity.net> [2020-01-17 21:58:53]:

> On Fri, Jan 17, 2020 at 11:26:31PM +0530, Srikar Dronamraju wrote:
> > * Mel Gorman <mgorman@techsingularity.net> [2020-01-14 10:13:20]:
> > 
> > We certainly are seeing better results than v1.
> > However, numa02, numa03, numa05, numa09 and numa10 still seem to be
> > regressing, while the others are improving.
> > 
> > While numa04 improves by 14%, numa02 regresses by around 12%.
> > 

> Ok, so it's both a win and a loss. It would be a curiosity if this
> patch were the primary factor, given that the logic only triggers when
> the local group has spare capacity and the busiest group is nearly
> idle. The test cases you describe should have fairly busy local groups.
> 

Right, your code only seems to take effect when the local group has
spare capacity and busiest->sum_nr_running <= 2.
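
As a reference for the discussion, here is a minimal stand-alone C model
of that gate. This is only a sketch: the real check lives in
calculate_imbalance() in kernel/sched/fair.c, and the struct and
function names below are made up for illustration.

#include <stdbool.h>
#include <stdio.h>

struct sg_stats {
	unsigned int sum_nr_running;	/* tasks running in the group */
	bool has_spare;			/* stands in for group_has_spare */
};

/* The imbalance is only tolerated when the local group has spare
 * capacity and the busiest group is nearly idle (<= 2 runners). */
static bool ignore_numa_imbalance(const struct sg_stats *local,
				  const struct sg_stats *busiest)
{
	return local->has_spare && busiest->sum_nr_running <= 2;
}

int main(void)
{
	struct sg_stats local = { .sum_nr_running = 10, .has_spare = true };
	struct sg_stats busiest = { .sum_nr_running = 2, .has_spare = false };

	printf("imbalance ignored: %d\n",
	       ignore_numa_imbalance(&local, &busiest));
	return 0;
}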

> > 
> > numa01 is a set of 2 processes, each running 128 threads;
> > each thread doing 50 loops on 3GB process shared memory operations.
> 
> Are the shared operations shared between the 2 processes? 256 threads
> in total would far exceed the capacity of a local group; even 128
> threads per process would exceed it. In such a situation, much would
> depend on the locality of the accesses as well as any shared accesses.

Except for numa02 and numa07 (both do thread-local memory operations),
all shared operations are within the process, i.e. per-process sharing.
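
As a rough single-process illustration of that per-process sharing (a
scaled-down sketch, not the actual benchmark scripts; the thread count,
buffer size and loop count are far smaller than the real runs):

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NR_THREADS	4		/* real runs use 16-256 threads */
#define BUF_SIZE	(64 * 1024)	/* real runs use 3GB */
#define LOOPS		50

static char *shared_buf;	/* one buffer per process, shared by all threads */

static void *worker(void *arg)
{
	(void)arg;
	/* Every thread writes the same process-wide buffer, so all
	 * sharing stays within the process. */
	for (int i = 0; i < LOOPS; i++)
		memset(shared_buf, i, BUF_SIZE);
	return NULL;
}

int main(void)
{
	pthread_t tid[NR_THREADS];

	shared_buf = malloc(BUF_SIZE);
	for (int i = 0; i < NR_THREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (int i = 0; i < NR_THREADS; i++)
		pthread_join(tid[i], NULL);
	free(shared_buf);
	return 0;
}

(Build with "cc -pthread". The thread-local variants such as numa02
would instead give each worker its own private buffer.)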

> 
> > numa02 is a single process with 256 threads;
> > each thread doing 800 loops on 32MB thread local memory operations.
> > 
> 
> This one is more interesting. False sharing shouldn't be an issue so the
> threads should be independent.
> 
> > numa03 is a single process with 256 threads;
> > each thread doing 50 loops on 3GB process shared memory operations.
> > 
> 
> Similar.

This is similar to numa01, except that now all threads belong to just
one process.

> 
> > numa04 is a set of 8 processes (as many as there are nodes),
> > each running 32 threads;
> > each thread doing 50 loops on 3GB process shared memory operations.
> > 
> 
> Less clear as you don't say what is sharing the memory operations.

All sharing is within the process. In numa04/numa09, I spawn as many
processes as there are nodes; other than that, it is the same as numa02.

> 
> > numa05 is a set of 16 processes (twice as many as there are nodes),
> > each running 16 threads;
> > each thread doing 50 loops on 3GB process shared memory operations.
> > 
> 
> > Details below:
> 
> How many iterations for each test? 

I ran 5 iterations. Do you want me to run with more?

> 
> 
> > ./numa02.sh      Real:  78.87      82.31      80.59      1.72     -12.7187%
> > ./numa02.sh      Sys:   81.18      85.07      83.12      1.94     -35.0337%
> > ./numa02.sh      User:  16303.70   17122.14   16712.92   409.22   -12.5182%
> 
> Before range: 58 to 72
> After range: 78 to 82
> 
> This one is more interesting in general. Can you add trace_printks to
> the check for SD_NUMA the patch introduces and dump the sum_nr_running
> for both local and busiest when the imbalance is ignored please? That
> might give some hint as to the improper conditions where imbalance is
> ignored.

That can be done; I will get back with the results. But do let me know
if you want me to run more iterations or rerun the tests.
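
For what it's worth, the instrumentation could look something like the
sketch below. It is assumed to sit in the group_has_spare branch of
calculate_imbalance() in kernel/sched/fair.c; the exact condition and
placement in the v4 patch may differ:

	if (env->sd->flags & SD_NUMA) {
		/* Dump both groups' runner counts whenever the
		 * small-imbalance path is about to ignore the imbalance. */
		if (busiest->sum_nr_running <= 2) {
			trace_printk("SD_NUMA imbalance ignored: local=%u busiest=%u\n",
				     local->sum_nr_running,
				     busiest->sum_nr_running);
			env->imbalance = 0;
		}
	}

The trace_printk() output lands in /sys/kernel/debug/tracing/trace, so
the number of times the condition fires over a numa02 run falls out of
the same data.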

> 
> However, knowing the number of iterations would be helpful. Can you also
> tell me if this is consistent between boots or is it always roughly 12%
> regression regardless of the number of iterations?
> 

I have only measured 5 iterations, and I haven't repeated the runs to
see if the numbers are consistent.

> > ./numa03.sh      Real:  477.20     528.12     502.66     25.46    -4.85219%
> > ./numa03.sh      Sys:   88.93      115.36     102.15     13.21    -25.629%
> > ./numa03.sh      User:  119120.73  129829.89  124475.31  5354.58  -3.8219%
> 
> Range before: 471 to 485
> Range after: 477 to 528
> 
> > ./numa04.sh      Real:  374.70     414.76     394.73     20.03    14.6708%
> > ./numa04.sh      Sys:   357.14     379.20     368.17     11.03    3.27294%
> > ./numa04.sh      User:  87830.73   88547.21   88188.97   358.24   5.7113%
> 
> Range before: 450 to 454
> Range after:  374 to 414
> 
> Big gain there, but the fact that the range changed so much is a
> concern and makes me wonder if this case is stable from boot to boot.
> 
> > ./numa05.sh      Real:  369.50     401.56     385.53     16.03    -5.64937%
> > ./numa05.sh      Sys:   718.99     741.02     730.00     11.01    -3.76438%
> > ./numa05.sh      User:  84989.07   85271.75   85130.41   141.34   -1.48142%
> > 
> 
> Big range changes again, but the shared memory operations complicate
> matters. I think it's best to focus on numa02 and identify whether
> there is an improper condition where the patch has an impact: the local
> group has high utilisation but spare capacity while the busiest group
> is almost completely idle.
> 
> > vmstat for numa01
> 
> I'm not going to comment in detail on these other than to note that
> NUMA balancing is heavily active in all cases, which may be masking any
> effect of the patch and may make the results unstable in general.
> 
> > <SNIP vmstat>
> > <SNIP description of loads that showed gains>
> >
> > numa09 is a set of 8 processes (as many as there are nodes),
> > each running 4 threads;
> > each thread doing 50 loops on 3GB process shared memory operations.
> > 
> 
> No description of the shared operations, but NUMA balancing is very
> active, so sharing is probably between processes.
> 
> > numa10 is a set of 16 processes (twice as many as there are nodes),
> > each running 2 threads;
> > each thread doing 50 loops on 3GB process shared memory operations.
> > 
> 
> Again, shared accesses without description and heavy NUMA balancing
> activity.
> 
> So, bottom line, a lot of these cases have shared operations where NUMA
> balancing decisions should dominate and make it hard to detect any
> impact from the patch. The exception is numa02, so please add tracing
> and dump out local and busiest sum_nr_running when the imbalance is
> ignored. I want to see if it's as simple as the local group being very
> busy but having spare capacity while the busiest group is almost idle.
> I also want to see how many times over the course of the numa02
> workload the conditions for the patch are even met.
> 

-- 
Thanks and Regards
Srikar Dronamraju


Thread overview: 24+ messages
2020-01-14 10:13 [PATCH] sched, fair: Allow a small load imbalance between low utilisation SD_NUMA domains v4 Mel Gorman
2020-01-16 16:35 ` Mel Gorman
2020-01-17 13:08   ` Vincent Guittot
2020-01-17 14:15     ` Mel Gorman
2020-01-17 14:32       ` Phil Auld
2020-01-17 14:23     ` Phil Auld
2020-01-17 14:37   ` Valentin Schneider
2020-01-17 13:16 ` Vincent Guittot
2020-01-17 14:26   ` Mel Gorman
2020-01-17 14:29     ` Vincent Guittot
2020-01-17 15:09 ` Vincent Guittot
2020-01-17 15:11   ` Peter Zijlstra
2020-01-17 15:21 ` Phil Auld
2020-01-17 17:56 ` Srikar Dronamraju
2020-01-17 21:58   ` Mel Gorman
2020-01-20  8:09     ` Srikar Dronamraju [this message]
2020-01-20  8:33       ` Mel Gorman
2020-01-20 17:27         ` Srikar Dronamraju
2020-01-20 18:21           ` Mel Gorman
2020-01-21  8:55             ` Srikar Dronamraju
2020-01-21  9:11               ` Mel Gorman
2020-01-21 10:42                 ` Peter Zijlstra
2020-01-21  9:59 ` Srikar Dronamraju
2020-01-29 11:32 ` [tip: sched/core] sched/fair: Allow a small load imbalance between low utilisation SD_NUMA domains tip-bot2 for Mel Gorman
