* Regression in perf bench numa convergence stats
@ 2015-06-24 11:10 Srikar Dronamraju
2015-06-24 12:49 ` Ingo Molnar
2015-06-26 8:43 ` [tip:perf/urgent] perf bench numa: Fix to show proper " tip-bot for Srikar Dronamraju
0 siblings, 2 replies; 4+ messages in thread
From: Srikar Dronamraju @ 2015-06-24 11:10 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa, Vinson Lee, Ingo Molnar
Cc: LKML, Namhyung Kim, Masami Hiramatsu
perf bench numa mem with the -c / -m options on v4.1 and on latest tip isn't
showing correct convergence statistics. I ran git bisect between v4.0 and
v4.1, and I have included the patch that fixed the problem for me.
After the bisect, git bisect visualize shows:
From e1e455f4f4d35850c30235747620d0d078fe9f64 Mon Sep 17 00:00:00 2001
From: Vinson Lee <vlee@twitter.com>
Date: Mon, 23 Mar 2015 12:09:16 -0700
Subject: [PATCH] perf tools: Work around lack of sched_getcpu in glibc < 2.6.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This patch fixes this build error with glibc < 2.6.
CC util/cloexec.o
cc1: warnings being treated as errors
util/cloexec.c: In function 'perf_flag_probe':
util/cloexec.c:24: error: implicit declaration of function 'sched_getcpu'
util/cloexec.c:24: error: nested extern declaration of 'sched_getcpu'
make: *** [util/cloexec.o] Error 1
Signed-off-by: Vinson Lee <vlee@twitter.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: stable@vger.kernel.org # 3.18+
Link: http://lkml.kernel.org/r/1427137761-16119-1-git-send-email-vlee@twopensource.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
# git log --oneline e1e455f
e1e455f perf tools: Work around lack of sched_getcpu in glibc < 2.6.
77cfe38 perf kmem: Print big numbers using thousands' group
929a6bb tools lib traceevent: Factor out allocating and processing args
e6d7c91 perf probe: Fix to get ummapped symbol address on kernel
228f14f perf tools: Remove (null) value of "Sort order" for perf mem report
2c7da8c perf annotate: Allow annotation for decompressed kernel modules
bc84f46 perf tools: Try to lookup kernel module map before creating one
907fb50 perf tools: Remove is_kmodule_extension function
e746b3e perf tools: Remove compressed argument from is_kernel_module
8dee9ff perf tools: Use kmod_path__parse in is_kernel_module
To further verify that commit e1e455f causes the problem, I checked out
e1e455f and then its parent 77cfe38, and ran the benchmark on each.
I see this problem on more than one system.
# rpm -qa | grep glibc-2
glibc-2.17-55.el7.x86_64
# git reset --hard e1e455f
# Running 'numa/mem' benchmark:
# Running main, "perf bench numa numa-mem --no-data_rand_walk -p 1 -t 64 -G 0 -P 0 -T 32 -l 800 -zZ0c"
#
#
###
# 64 tasks will execute (on 4 nodes, 64 CPUs):
# 800x 0MB global shared mem operations
# 800x 0MB process shared mem operations
# 800x 32MB thread local mem operations
###
###
#
# Startup synchronization: ... threads initialized in 0.512908 seconds.
#
# 0.1% [0.0 mins] 0/0 0/0 0/0 0/0 [ 0/0 ] l: -1-0 ( 1) {0-0}
# 0.6% [0.0 mins] 0/0 0/0 0/0 0/0 [ 0/0 ] l: -1-0 ( 1) {0-0}
# 5.1% [0.0 mins] 0/0 0/0 0/0 0/0 [ 0/0 ] l: -1-0 ( 1) {0-0}
# 9.6% [0.1 mins] 0/0 0/0 0/0 0/0 [ 0/0 ] l: -1-0 ( 1) {0-0}
# 14.0% [0.1 mins] 0/0 0/0 0/0 0/0 [ 0/0 ] l: -1-0 ( 1) {0-0}
###
4.903 secs slowest (max) thread-runtime
4.873 secs fastest (min) thread-runtime
4.941 secs average thread-runtime
0.301 % difference between max/avg runtime
4.228 GB data processed, per thread
270.583 GB data processed, total
1.160 nsecs/byte/thread runtime
0.862 GB/sec/thread speed
55.193 GB/sec total speed
And with its parent, 77cfe38:
# git reset --hard 77cfe38
# Running 'numa/mem' benchmark:
# Running main, "perf bench numa numa-mem --no-data_rand_walk -p 1 -t 64 -G 0 -P 0 -T 32 -l 800 -zZ0c"
#
#
###
# 64 tasks will execute (on 4 nodes, 64 CPUs):
# 800x 0MB global shared mem operations
# 800x 0MB process shared mem operations
# 800x 32MB thread local mem operations
###
###
#
# Startup synchronization: ... threads initialized in 0.421336 seconds.
#
# 0.4% [0.0 mins] 16/1 16/1 16/1 16/1 [ 0/4 ] l: 1-20 ( 19) [95.0%] {4-4}
# 2.6% [0.0 mins] 17/1 15/1 16/1 16/1 [ 2/4 ] l: 3-37 ( 34) [91.9%] {4-4}
# 7.1% [0.0 mins] 17/1 15/1 16/1 16/1 [ 2/4 ] l: 32-67 ( 35) [52.2%] {4-4}
# 11.8% [0.1 mins] 17/1 15/1 16/1 16/1 [ 2/4 ] l: 65-103 ( 38) [36.9%] {4-4}
# 15.9% [0.1 mins] 17/1 15/1 16/1 16/1 [ 2/4 ] l: 98-136 ( 38) [27.9%] {4-4}
###
4.970 secs slowest (max) thread-runtime
4.940 secs fastest (min) thread-runtime
4.980 secs average thread-runtime
0.300 % difference between max/avg runtime
4.237 GB data processed, per thread
271.187 GB data processed, total
1.173 nsecs/byte/thread runtime
0.853 GB/sec/thread speed
54.562 GB/sec total speed
Reverting e1e455f on top of tip/master also avoids the problem.
The patch below fixes it.
--
Thanks and Regards
Srikar Dronamraju
---->8--------------------------------------------
From 88199ad8a3d6495080eaa016b87a612bc742b1c4 Mon Sep 17 00:00:00 2001
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Date: Wed, 24 Jun 2015 16:23:22 +0530
Subject: [PATCH] perf tools: Fix perf_bench to show proper convergence
With commit e1e455f (perf tools: Work around lack of sched_getcpu in
glibc < 2.6), perf bench numa mem with the -c or -m option is not able to
correctly calculate convergence. With that commit, sched_getcpu
always seems to return -1. The intention of commit e1e455f was to add a
sched_getcpu for glibc < 2.6. Hence keep the sched_getcpu definition
under an ifdef.
This regression occurred between v4.0 and v4.1.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
tools/perf/util/cloexec.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/perf/util/cloexec.c b/tools/perf/util/cloexec.c
index 85b5238..2babdda 100644
--- a/tools/perf/util/cloexec.c
+++ b/tools/perf/util/cloexec.c
@@ -7,11 +7,15 @@
 
 static unsigned long flag = PERF_FLAG_FD_CLOEXEC;
 
+#ifdef __GLIBC_PREREQ
+#if !__GLIBC_PREREQ(2, 6)
 int __weak sched_getcpu(void)
 {
 	errno = ENOSYS;
 	return -1;
 }
+#endif
+#endif
 
 static int perf_flag_probe(void)
 {
--
1.8.3.1
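
For illustration, the effect of the guard can be sketched in a standalone C file. This is a minimal sketch and not the perf source: `cpu_lookup_works()` is a hypothetical helper, and `__attribute__((weak))` stands in for the kernel tree's `__weak` macro. The likely mechanism behind the regression is an assumption here (the report only says sched_getcpu "always seems to return -1"): an unguarded weak definition inside the executable is probably preferred over the definition in the shared C library, so the stub is called even when glibc provides the real function.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>

/*
 * Fallback stub, compiled only for glibc older than 2.6, which lacks
 * sched_getcpu(). On newer glibc (and on C libraries that do not define
 * __GLIBC_PREREQ) the stub is compiled out and libc's version is used.
 */
#ifdef __GLIBC_PREREQ
#if !__GLIBC_PREREQ(2, 6)
int __attribute__((weak)) sched_getcpu(void)
{
	errno = ENOSYS;
	return -1;
}
#endif
#endif

/* Hypothetical helper (not part of perf): returns 1 if a CPU number is available. */
int cpu_lookup_works(void)
{
	int cpu = sched_getcpu();

	if (cpu < 0) {
		fprintf(stderr, "sched_getcpu failed, errno=%d\n", errno);
		return 0;
	}
	printf("running on CPU %d\n", cpu);
	return 1;
}
```

Calling `cpu_lookup_works()` from a small `main()` on a system with glibc 2.6 or later should report a valid CPU number. Removing the two `#if` lines and their `#endif`s is one way to try to reproduce the always -1 behaviour described above.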
* Re: Regression in perf bench numa convergence stats
From: Ingo Molnar @ 2015-06-24 12:49 UTC (permalink / raw)
To: Srikar Dronamraju
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Vinson Lee, Ingo Molnar,
LKML, Namhyung Kim, Masami Hiramatsu
* Srikar Dronamraju <srikar@linux.vnet.ibm.com> wrote:
> perf bench numa mem with -c / -m options on v4.1 and latest tip aren't
> showing correct convergence statistics. I ran git bisect between v4.0 and
> v4.1. I have included the patch that fixed the problem for me.
> From 88199ad8a3d6495080eaa016b87a612bc742b1c4 Mon Sep 17 00:00:00 2001
> From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Date: Wed, 24 Jun 2015 16:23:22 +0530
> Subject: [PATCH] perf tools:Fix perf_bench to show proper convergence
>
> With commit: e1e455f (perf tools: Work around lack of sched_getcpu in
> glibc < 2.6), perf_bench numa mem with -c or -m option is not able to
> correctly calculate convergence. With the above commit, sched_getcpu
> always seems to return -1. The intention of commit e1e455f was to add a
> sched_getcpu in glibc < 2.6. Hence keep the sched_getcpu definition
> under an ifdef.
>
> This regression occurred between v4.0 and v4.1.
>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
> tools/perf/util/cloexec.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/tools/perf/util/cloexec.c b/tools/perf/util/cloexec.c
> index 85b5238..2babdda 100644
> --- a/tools/perf/util/cloexec.c
> +++ b/tools/perf/util/cloexec.c
> @@ -7,11 +7,15 @@
>
> static unsigned long flag = PERF_FLAG_FD_CLOEXEC;
>
> +#ifdef __GLIBC_PREREQ
> +#if !__GLIBC_PREREQ(2, 6)
> int __weak sched_getcpu(void)
> {
> errno = ENOSYS;
> return -1;
> }
> +#endif
> +#endif
>
Thanks Srikar!
Acked-by: Ingo Molnar <mingo@kernel.org>
Ingo
* Re: Regression in perf bench numa convergence stats
From: Arnaldo Carvalho de Melo @ 2015-06-25 15:30 UTC (permalink / raw)
To: Ingo Molnar
Cc: Srikar Dronamraju, Jiri Olsa, Vinson Lee, Ingo Molnar, LKML,
Namhyung Kim, Masami Hiramatsu
Em Wed, Jun 24, 2015 at 02:49:28PM +0200, Ingo Molnar escreveu:
>
> Thanks Srikar!
>
> Acked-by: Ingo Molnar <mingo@kernel.org>
Thanks, applied to perf/urgent.
- Arnaldo
* [tip:perf/urgent] perf bench numa: Fix to show proper convergence stats
From: tip-bot for Srikar Dronamraju @ 2015-06-26 8:43 UTC (permalink / raw)
To: linux-tip-commits
Cc: acme, mingo, tglx, namhyung, jolsa, srikar, masami.hiramatsu.pt,
vlee, linux-kernel, hpa
Commit-ID: 2b42b09b88c831ba4da2d669581dde371c38c2af
Gitweb: http://git.kernel.org/tip/2b42b09b88c831ba4da2d669581dde371c38c2af
Author: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
AuthorDate: Wed, 24 Jun 2015 16:40:04 +0530
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 25 Jun 2015 12:28:35 -0300
perf bench numa: Fix to show proper convergence stats
With commit: e1e455f4f4d3 (perf tools: Work around lack of sched_getcpu
in glibc < 2.6), perf_bench numa mem with -c or -m option is not able to
correctly calculate convergence.
With the above commit, sched_getcpu always seems to return -1. The
intention of commit e1e455f was to add a sched_getcpu in glibc < 2.6.
Hence keep the sched_getcpu definition under an ifdef.
This regression occurred between v4.0 and v4.1.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vinson Lee <vlee@twitter.com>
Fixes: e1e455f4f4d3 ("perf tools: Work around lack of sched_getcpu in glibc < 2.6")
Link: http://lkml.kernel.org/r/20150624111004.GA5220@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/cloexec.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/perf/util/cloexec.c b/tools/perf/util/cloexec.c
index 85b5238..2babdda 100644
--- a/tools/perf/util/cloexec.c
+++ b/tools/perf/util/cloexec.c
@@ -7,11 +7,15 @@
 
 static unsigned long flag = PERF_FLAG_FD_CLOEXEC;
 
+#ifdef __GLIBC_PREREQ
+#if !__GLIBC_PREREQ(2, 6)
 int __weak sched_getcpu(void)
 {
 	errno = ENOSYS;
 	return -1;
 }
+#endif
+#endif
 
 static int perf_flag_probe(void)
 {