Re: [PATCH] sched, fair: Allow a small load imbalance between low utilisation SD_NUMA domains v4

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
	Phil Auld <pauld@redhat.com>, Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Quentin Perret <quentin.perret@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	Hillf Danton <hdanton@sina.com>, Parth Shah <parth@linux.ibm.com>,
	Rik van Riel <riel@surriel.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] sched, fair: Allow a small load imbalance between low utilisation SD_NUMA domains v4
Date: Mon, 20 Jan 2020 13:39:35 +0530	[thread overview]
Message-ID: <20200120080935.GD20112@linux.vnet.ibm.com> (raw)
In-Reply-To: <20200117215853.GS3466@techsingularity.net>

* Mel Gorman <mgorman@techsingularity.net> [2020-01-17 21:58:53]:

> On Fri, Jan 17, 2020 at 11:26:31PM +0530, Srikar Dronamraju wrote:
> > * Mel Gorman <mgorman@techsingularity.net> [2020-01-14 10:13:20]:
> > 
> > We certainly are seeing better results than v1.
> > However numa02, numa03, numa05, numa09 and numa10 still seem to regressing, while
> > the others are improving.
> > 
> > While numa04 improves by 14%, numa02 regress by around 12%.
> > 

> Ok, so it's both a win and a loss. This is a curiousity that this patch
> may be the primary factor given that the logic only triggers when the
> local group has spare capacity and the busiest group is nearly idle. The
> test cases you describe should have fairly busy local groups.
> 

Right, your code only seems to affect when the local group has spare
capacity and the busiest->sum_nr_running <=2 

> > 
> > numa01 is a set of 2 process each running 128 threads;
> > each thread doing 50 loops on 3GB process shared memory operations.
> 
> Are the shared operations shared between the 2 processes? 256 threads
> in total would more than exceed the capacity of a local group, even 128
> threads per process would exceed the capacity of the local group. In such
> a situation, much would depend on the locality of the accesses as well
> as any shared accesses.

Except for numa02 and numa07, (both handle local memory operations) all
shared operations are within the process. i.e per process sharing.

> 
> > numa02 is a single process with 256 threads;
> > each thread doing 800 loops on 32MB thread local memory operations.
> > 
> 
> This one is more interesting. False sharing shouldn't be an issue so the
> threads should be independent.
> 
> > numa03 is a single process with 256 threads;
> > each thread doing 50 loops on 3GB process shared memory operations.
> > 
> 
> Similar.

This is similar to numa01. Except now all threads belong to just one
process.

> 
> > numa04 is a set of 8 process (as many nodes) each running 32 threads;
> > each thread doing 50 loops on 3GB process shared memory operations.
> > 
> 
> Less clear as you don't say what is sharing the memory operations.

all sharing is within the process. In Numa04/numa09, I try to spawn as many
process as the number of nodes, other than that its same as Numa02.

> 
> > numa05 is a set of 16 process (twice as many nodes) each running 16 threads;
> > each thread doing 50 loops on 3GB process shared memory operations.
> > 
> 
> > Details below:
> 
> How many iterations for each test? 

I run 5 iterations. Want me to run with more iterations?

> 
> 
> > ./numa02.sh      Real:  78.87      82.31      80.59      1.72     -12.7187%
> > ./numa02.sh      Sys:   81.18      85.07      83.12      1.94     -35.0337%
> > ./numa02.sh      User:  16303.70   17122.14   16712.92   409.22   -12.5182%
> 
> Before range: 58 to 72
> After range: 78 to 82
> 
> This one is more interesting in general. Can you add trace_printks to
> the check for SD_NUMA the patch introduces and dump the sum_nr_running
> for both local and busiest when the imbalance is ignored please? That
> might give some hint as to the improper conditions where imbalance is
> ignored.

Can be done. Will get back with the results. But do let me know if you want
to run with more iterations or rerun the tests.

> 
> However, knowing the number of iterations would be helpful. Can you also
> tell me if this is consistent between boots or is it always roughly 12%
> regression regardless of the number of iterations?
> 

I have only measured for 5 iterations and I haven't repeated to see if the
numbers are consistent.

> > ./numa03.sh      Real:  477.20     528.12     502.66     25.46    -4.85219%
> > ./numa03.sh      Sys:   88.93      115.36     102.15     13.21    -25.629%
> > ./numa03.sh      User:  119120.73  129829.89  124475.31  5354.58  -3.8219%
> 
> Range before: 471 to 485
> Range after: 477 to 528
> 
> > ./numa04.sh      Real:  374.70     414.76     394.73     20.03    14.6708%
> > ./numa04.sh      Sys:   357.14     379.20     368.17     11.03    3.27294%
> > ./numa04.sh      User:  87830.73   88547.21   88188.97   358.24   5.7113%
> 
> Range before: 450 -> 454
> Range after:  374 -> 414
> 
> Big gain there but the fact the range changed so much is a concern and
> makes me wonder if this case is stable from boot to boot. 
> 
> > ./numa05.sh      Real:  369.50     401.56     385.53     16.03    -5.64937%
> > ./numa05.sh      Sys:   718.99     741.02     730.00     11.01    -3.76438%
> > ./numa05.sh      User:  84989.07   85271.75   85130.41   141.34   -1.48142%
> > 
> 
> Big range changes again but the shared memory operations complicate
> matters. I think it's best to focus on numa02 for and identify if there
> is an improper condition where the patch has an impact, the local group
> has high utilisation but spare capacity while the busiest group is
> almost completely idle.
> 
> > vmstat for numa01
> 
> I'm not going to comment in detail on these other than noting that NUMA
> balancing is heavily active in all cases which may be masking any effect
> of the patch and may have unstable results in general.
> 
> > <SNIP vmstat>
> > <SNIP description of loads that showed gains>
> >
> > numa09 is a set of 8 process (as many nodes) each running 4 threads;
> > each thread doing 50 loops on 3GB process shared memory operations.
> > 
> 
> No description of shared operations but NUMA balancing is very active so
> sharing is probably between processes.
> 
> > numa10 is a set of 16 process (twice as many nodes) each running 2 threads;
> > each thread doing 50 loops on 3GB process shared memory operations.
> > 
> 
> Again, shared accesses without description and heavy NUMA balancing
> activity.
> 
> So bottom line, a lot of these cases have shared operations where NUMA
> balancing decisions should dominate and make it hard to detect any impact
> from the patch. The exception is numa02 so please add tracing and dump
> out local and busiest sum_nr_running when the imbalance is ignored. I
> want to see if it's as simple as the local group is very busy but has
> capacity where the busiest group is almost idle. I also want to see how
> many times over the course of the numa02 workload that the conditions
> for the patch are even met.
> 

-- 
Thanks and Regards
Srikar Dronamraju

next prev parent reply	other threads:[~2020-01-20  8:09 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-14 10:13 [PATCH] sched, fair: Allow a small load imbalance between low utilisation SD_NUMA domains v4 Mel Gorman
2020-01-16 16:35 ` Mel Gorman
2020-01-17 13:08   ` Vincent Guittot
2020-01-17 14:15     ` Mel Gorman
2020-01-17 14:32       ` Phil Auld
2020-01-17 14:23     ` Phil Auld
2020-01-17 14:37   ` Valentin Schneider
2020-01-17 13:16 ` Vincent Guittot
2020-01-17 14:26   ` Mel Gorman
2020-01-17 14:29     ` Vincent Guittot
2020-01-17 15:09 ` Vincent Guittot
2020-01-17 15:11   ` Peter Zijlstra
2020-01-17 15:21 ` Phil Auld
2020-01-17 17:56 ` Srikar Dronamraju
2020-01-17 21:58   ` Mel Gorman
2020-01-20  8:09     ` Srikar Dronamraju [this message]
2020-01-20  8:33       ` Mel Gorman
2020-01-20 17:27         ` Srikar Dronamraju
2020-01-20 18:21           ` Mel Gorman
2020-01-21  8:55             ` Srikar Dronamraju
2020-01-21  9:11               ` Mel Gorman
2020-01-21 10:42                 ` Peter Zijlstra
2020-01-21  9:59 ` Srikar Dronamraju
2020-01-29 11:32 ` [tip: sched/core] sched/fair: Allow a small load imbalance between low utilisation SD_NUMA domains tip-bot2 for Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200120080935.GD20112@linux.vnet.ibm.com \
    --to=srikar@linux.vnet.ibm.com \
    --cc=Morten.Rasmussen@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=hdanton@sina.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=parth@linux.ibm.com \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=quentin.perret@arm.com \
    --cc=riel@surriel.com \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.