From: Ingo Molnar <mingo@kernel.org>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
linux-kernel@vger.kernel.org, Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH v2 2/4] sched:Consider imbalance_pct when comparing loads in numa_has_capacity
Date: Tue, 23 Jun 2015 10:10:39 +0200
Message-ID: <20150623081038.GA26231@gmail.com>
In-Reply-To: <20150622162958.GB32412@linux.vnet.ibm.com>
* Srikar Dronamraju <srikar@linux.vnet.ibm.com> wrote:
> * Rik van Riel <riel@redhat.com> [2015-06-16 10:39:13]:
>
> > On 06/16/2015 07:56 AM, Srikar Dronamraju wrote:
> > > This is consistent with all other load balancing instances where we
> > > absorb unfairness up to env->imbalance_pct. Absorbing unfairness up to
> > > env->imbalance_pct allows us to pull tasks to, and retain them on, their
> > > preferred nodes.
> > >
> > > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> >
> > How does this work with other workloads, eg.
> > single instance SPECjbb2005, or two SPECjbb2005
> > instances on a four node system?
> >
> > Is the load still balanced evenly between nodes
> > with this patch?
> >
>
> Yes, I have looked at mpstat logs while running SPECjbb2005 for 1 JVM per
> System, 2 JVMs per System and 4 JVMs per System and observed that the
> load spreading was similar with and without this patch.
>
> Also I have visualized using htop when running 0.5X (i.e. 48 threads on
> a 96 cpu system) cpu stress workloads to see that the spread is similar
> before and after the patch.
>
> Please let me know if there are any better ways to observe the
> spread. [...]
There are. I see you are using prehistoric tooling, but see the various NUMA
convergence latency measurement utilities in 'perf bench numa':

  triton:~/tip> perf bench numa mem -h
  # Running 'numa/mem' benchmark:

   # Running main, "perf bench numa numa-mem -h"

   usage: perf bench numa <options>

    -p, --nr_proc <n>     number of processes
    -t, --nr_threads <n>  number of threads per process
    -G, --mb_global <MB>  global memory (MBs)
    -P, --mb_proc <MB>    process memory (MBs)
    -L, --mb_proc_locked <MB>
                          process serialized/locked memory access (MBs), <= process_memory
    -T, --mb_thread <MB>  thread memory (MBs)
    -l, --nr_loops <n>    max number of loops to run
    -s, --nr_secs <n>     max number of seconds to run
    -u, --usleep <n>      usecs to sleep per loop iteration
    -R, --data_reads      access the data via writes (can be mixed with -W)
    -W, --data_writes     access the data via writes (can be mixed with -R)
    -B, --data_backwards  access the data backwards as well
    -Z, --data_zero_memset
                          access the data via glibc bzero only
    -r, --data_rand_walk  access the data with random (32bit LFSR) walk
    -z, --init_zero       bzero the initial allocations
    -I, --init_random     randomize the contents of the initial allocations
    -0, --init_cpu0       do the initial allocations on CPU#0
    -x, --perturb_secs <n>
                          perturb thread 0/0 every X secs, to test convergence stability
    -d, --show_details    Show details
    -a, --all             Run all tests in the suite
    -H, --thp <n>         MADV_NOHUGEPAGE < 0 < MADV_HUGEPAGE
    -c, --show_convergence
                          show convergence details
    -m, --measure_convergence
                          measure convergence latency
    -q, --quiet           quiet mode
    -S, --serialize-startup
                          serialize thread startup
    -C, --cpus <cpu[,cpu2,...cpuN]>
                          bind the first N tasks to these specific cpus (the rest is unbound)
    -M, --memnodes <node[,node2,...nodeN]>
                          bind the first N tasks to these specific memory nodes (the rest is unbound)

'-m' will measure convergence.
'-c' will visualize it.
'--thp' can be used to turn hugepages on/off
For example you can create a 'numa02' work-alike by doing:

  vega:~> cat numa02
  #!/bin/bash
  perf bench numa mem --no-data_rand_walk -p 1 -t 32 -G 0 -P 0 -T 32 -l 800 -zZ0c $@

This 'perf bench numa' command mimics numa02 pretty exactly on a 32-CPU system.
This will run it in a loop:

  vega:~> cat numa02-loop
  while :; do
    ./numa02 2>&1 | grep runtime-max/thread
    sleep 1
  done

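A rough sketch of how the numa02 work-alike could be carried over to a machine with a different CPU count. It only prints the command line instead of running it. The function name `numa02_cmd` is hypothetical, and scaling `-t` with the CPU count while leaving `-T 32` alone is an assumption, not something the numa02 script above does:

```shell
#!/bin/bash
# Hypothetical helper (not part of perf): print a numa02-style command line.
# All options are copied from the numa02 script; only the -t value varies.
numa02_cmd() {
    # Assumption: one thread per online CPU, as reported by nproc(1),
    # unless a thread count is passed explicitly as the first argument.
    local threads="${1:-$(nproc)}"
    echo "perf bench numa mem --no-data_rand_walk -p 1 -t ${threads} -G 0 -P 0 -T 32 -l 800 -zZ0c"
}
```

Something like `$(numa02_cmd 64)` would then run the 64-thread variant, and `numa02_cmd` with no argument sizes it to the local machine.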
Or here are various numa01 work-alikes:

  vega:~> cat numa01
  perf bench numa mem --no-data_rand_walk -p 2 -t 16 -G 0 -P 3072 -T 0 -l 50 -zZ0c $@

  vega:~> cat numa01-hard-bind
  ./numa01 --cpus=0-16_16x16#16 --memnodes=0x16,2x16

Or numa01-THREAD_ALLOC:

  vega:~> cat numa01-THREAD_ALLOC
  perf bench numa mem --no-data_rand_walk -p 2 -t 16 -G 0 -P 0 -T 192 -l 1000 -zZ0c $@

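To compare the work-alikes above, it can help to work out how much memory each one asks for. The sketch below does that arithmetic. The helper name `numa_total_mb` is hypothetical, and the formula (global memory once, process memory per process, thread memory per thread) is inferred only from the option descriptions in the help output, not checked against the perf source:

```shell
#!/bin/bash
# Hypothetical helper: estimate the total memory (in MB) a run requests.
# Assumed formula, inferred from the -G/-P/-T option descriptions:
#   total = G + p * (P + t * T)
numa_total_mb() {
    local p=$1 t=$2 G=$3 P=$4 T=$5
    echo $(( G + p * (P + t * T) ))
}

numa_total_mb 1 32 0 0 32      # numa02              -> 1024
numa_total_mb 2 16 0 3072 0    # numa01              -> 6144
numa_total_mb 2 16 0 0 192     # numa01-THREAD_ALLOC -> 6144
```

Under that assumption numa01 and numa01-THREAD_ALLOC touch the same amount of memory (16 threads x 192 MB = 3072 MB per process). They differ in whether it is shared per process or private per thread.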
You can generate very flexible setups of NUMA access patterns, and measure their
behavior accurately.
It's all so much more capable and more flexible than autonumabench ...
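As an illustration of the convergence options mentioned earlier, here is a minimal sketch that assembles a command line combining '-m', '-c' and '--thp'. The helper name `numa_converge_cmd` and the `--thp=<n>` spelling are assumptions. The sign convention (negative for MADV_NOHUGEPAGE, positive for MADV_HUGEPAGE) is taken from the help text above:

```shell
#!/bin/bash
# Hypothetical helper: print a 'perf bench numa mem' command line that
# measures (-m) and shows (-c) convergence for a given setup.
#   $1 = processes, $2 = threads per process, $3 = process memory in MB,
#   $4 = --thp value (optional; <0 = MADV_NOHUGEPAGE, >0 = MADV_HUGEPAGE)
numa_converge_cmd() {
    local procs=$1 threads=$2 proc_mb=$3 thp=${4:-0}
    echo "perf bench numa mem -p ${procs} -t ${threads} -P ${proc_mb} -m -c --thp=${thp}"
}
```

Printing the command first, e.g. `numa_converge_cmd 2 16 3072 -1`, makes it easy to paste the exact invocation into a report next to the numbers it produced.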
Also, when you are trying to report numbers for multiple runs, please use
something like:

  perf stat --null --repeat 3 ...

This will run the workload 3 times (doing only time measurement) and report the
stddev in a human readable form.
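perf stat already does this summarizing for you. If the runtimes were instead collected by hand (for example out of a loop script), a sketch like the following gives a comparable summary. The helper name `runtime_stats` is hypothetical. It expects one plain number per line and uses the sample standard deviation, which is not necessarily the exact formula perf stat applies:

```shell
#!/bin/bash
# Hypothetical helper: read one runtime (in seconds) per line on stdin and
# print the mean and the sample standard deviation.
runtime_stats() {
    awk '
        { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n < 2) {
                print "need at least 2 samples" > "/dev/stderr"
                exit 1
            }
            mean = sum / n
            var = (sumsq - n * mean * mean) / (n - 1)
            if (var < 0) var = 0      # guard against rounding noise
            printf "mean=%.3f stddev=%.3f\n", mean, sqrt(var)
        }'
}
```

For instance, `printf '1\n2\n3\n' | runtime_stats` prints `mean=2.000 stddev=1.000`.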
Thanks,
Ingo