* perf_events: zero time running and enabled, but non-zero count
@ 2009-10-23 19:16 Corey Ashford
[not found] ` <20091023193246.GA3139@elte.hu>
0 siblings, 1 reply; 6+ messages in thread
From: Corey Ashford @ 2009-10-23 19:16 UTC (permalink / raw)
To: Peter Zijlstra, paulus, LKML, Ingo Molnar
Hi folks,
First let me state that I'm working with a Linux kernel which is a couple of
weeks old and came from Linus' post-2.6.31 release tree. It does contain the
name change from perf_counter to perf_event, for example.
I'm working on some optimizations of PAPI's use of perf_events. In particular I
want to speed up the reading of events. To do this, I am doing two things:
1) Not disabling and enabling the counters around reading the counters. In
other words, read them on the fly.
2) Use the PERF_FORMAT_GROUP option so that I can read up an entire group of
counters in one syscall.
This seems to be working fairly well, except it seems that sometimes the "time
running" and "time enabled" values that I read up are zero, even though the
count is non-zero. The case I'm looking at has three groups, and it's the third
group, containing a single event, which is getting the zero enabled/running values.
One experiment that I just now tried was to put the code back in that
disables/re-enables the counters around the code that reads the counters. This
seems to have fixed the problem, but causes a performance degradation, of course.
Is this expected behavior? Should I be able to read the counts, and times on
the fly like I want to be able to do?
Thanks for your consideration of this issue.
- Corey
Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR
cjashfor@us.ibm.com
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: perf_events: zero time running and enabled, but non-zero count
[not found] ` <4AE21655.6040403@linux.vnet.ibm.com>
@ 2009-10-23 20:57 ` Ingo Molnar
2009-10-26 2:37 ` Corey Ashford
0 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2009-10-23 20:57 UTC (permalink / raw)
To: Corey Ashford; +Cc: Peter Zijlstra, paulus, LKML
* Corey Ashford <cjashfor@linux.vnet.ibm.com> wrote:
>> Btw., you might want to consider putting such testcases into a new
>> 'perf test' kind of utility (under tools/perf/builtin-test.c) that
>> does various tests such as reading a count and validating the
>> time-running/time-enabled values - so that we can see any bugs
>> directly.
>
> I'll see if I can put the test case in some sort of acceptable format
> like that. Maybe we can use this one test case as the starting point
> for builtin-test.c.
Yeah. Feel free to add just something minimal that solves your problem.
Others will add their testcases too i'm sure, once the basic command is
there.
Ingo
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: perf_events: zero time running and enabled, but non-zero count
2009-10-23 20:57 ` Ingo Molnar
@ 2009-10-26 2:37 ` Corey Ashford
2009-10-30 0:13 ` Corey Ashford
2009-11-02 21:20 ` Frederic Weisbecker
0 siblings, 2 replies; 6+ messages in thread
From: Corey Ashford @ 2009-10-26 2:37 UTC (permalink / raw)
To: Ingo Molnar, paulus; +Cc: Peter Zijlstra, LKML
[-- Attachment #1: Type: text/plain, Size: 2257 bytes --]
On 10/23/2009 01:57 PM, Ingo Molnar wrote:
>
> * Corey Ashford<cjashfor@linux.vnet.ibm.com> wrote:
>
>>> Btw., you might want to consider putting such testcases into a new
>>> 'perf test' kind of utility (under tools/perf/builtin-test.c) that
>>> does various tests such as reading a count and validating the
>>> time-running/time-enabled values - so that we can see any bugs
>>> directly.
>>
>> I'll see if I can put the test case in some sort of acceptable format
>> like that. Maybe we can use this one test case as the starting point
>> for builtin-test.c.
>
> Yeah. Feel free to add just something minimal that solves your problem.
> Others will add their testcases too i'm sure, once the basic command is
> there.
>
> Ingo
Hi Ingo and Paul,
I followed the directions Ingo provided for getting the -tip tree and
then attempted to build the perf tool that's in it and it doesn't build.
I'm getting this error on a Power5 box:
../../lib/find_next_bit.c: In function ‘ext2_swabp’:
../../lib/find_next_bit.c:168: warning: passing argument 1 of
‘__swab64p’ from incompatible pointer type
make: *** [util/find_next_bit.o] Error 1
I looked at the code and it looks ok to me. There's a cast from an
"unsigned long *" to a "unsigned long long *", but those types should be
of the same size (in -m64 mode), so I don't know why this is failing.
Some good news is that I have created a simple test case which
demonstrates the bug, but I haven't formulated it in the builtin-test.c
format yet, since I have been able to build perf.
If you toss the attached file into the perf source directory and compile
it as follows:
gcc -m64 -o petb perf_events_time_bug.c -lrt
it will show the error:
% ./petb
[0] = 1
[1] = 0 <- time enabled
[2] = 0 <- time running
[3] = 4559d1f8
[4] = 1
test failed!
perf_events_time_bug.c -lrt
If you compile it with:
gcc -DDO_DISABLE -m66 -o petb perf_events_time_bug.c -lrt
This will disable the event before reading it and then you will get the
correct times.
% ./petb
[0] = 1
[1] = 2a50c1c8 <- time enabled
[2] = 2a50c1c8 <- time running
[3] = 460dea97
[4] = 2
test passed!
--
Regards,
- Corey
Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR
cjashfor@us.ibm.com
[-- Attachment #2: perf_events_time_bug.c --]
[-- Type: text/plain, Size: 2754 bytes --]
#include "perf.h"
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <time.h>
#include "perf.h"
static double dummy3(double x, int iters);
long long get_real_usec(void)
{
struct timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
return ((long long) ts.tv_sec * 1000000) + ((long long) ts.tv_nsec
/ 1000);
}
#define NUM_FLOPS 20000000
int main(int argc, char **argv)
{
struct perf_event_attr event_attr;
int iters = NUM_FLOPS;
double x = 1.1, y;
long long t1, t2;
t2 = 1000000; /* Target: 1,000,000 usec */
/* Measure one run */
t1 = get_real_usec();
y = dummy3(x, iters);
t1 = get_real_usec() - t1;
if (t2 > t1) /* Scale up execution time to match t2 */
iters = iters * (int) (t2 / t1);
x = 1.0;
memset(&event_attr, 0, sizeof(struct perf_event_attr));
event_attr.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID
| PERF_FORMAT_TOTAL_TIME_RUNNING
| PERF_FORMAT_TOTAL_TIME_ENABLED;
event_attr.disabled = 1;
event_attr.type = PERF_TYPE_HARDWARE;
event_attr.config = PERF_COUNT_HW_CPU_CYCLES;
/* count this thread, make this the group leader */
int fd = sys_perf_event_open(&event_attr, 0, -1, -1, 0);
if (fd == -1) {
printf("sys_perf_event_open failed: %s\n", strerror(errno));
exit(1);
}
int ret = ioctl(fd, PERF_EVENT_IOC_ENABLE, NULL);
if (ret == -1) {
printf("enable failed: %s\n", strerror(errno));
exit(1);
}
y = dummy3(x, iters);
#ifdef DO_DISABLE
ret = ioctl(fd, PERF_EVENT_IOC_DISABLE, NULL);
if (ret == -1) {
printf("disable failed: %s\n", strerror(errno));
exit (1);
}
#endif
uint64_t buffer[5];
int cnt;
cnt = read(fd, buffer, sizeof(buffer));
if (cnt == -1) {
printf("read failed: %s\n", strerror(errno));
}
int i;
for (i = 0; i < 5; i++) {
printf("[%d] = %llx\n", i, buffer[i]);
}
#define TIME_ENABLED_IDX 1
#define TIME_RUNNING_IDX 2
if (!buffer[TIME_ENABLED_IDX] || !buffer[TIME_RUNNING_IDX]) {
printf("test failed!\n");
exit(1);
} else {
printf("test passed!\n");
exit(0);
}
}
static double dummy3(double x, int iters)
{
int i;
double w, y, z, a, b, c, d, e, f, g, h;
double one;
one = 1.0;
w = x;
y = x;
z = x;
a = x;
b = x;
c = x;
d = x;
e = x;
f = x;
g = x;
h = x;
for (i = 1; i <= iters; i++) {
w = w * 1.000000000001 + one;
y = y * 1.000000000002 + one;
z = z * 1.000000000003 + one;
a = a * 1.000000000004 + one;
b = b * 1.000000000005 + one;
c = c * 0.999999999999 + one;
d = d * 0.999999999998 + one;
e = e * 0.999999999997 + one;
f = f * 0.999999999996 + one;
g = h * 0.999999999995 + one;
h = h * 1.000000000006 + one;
}
return 2.0 * (a + b + c + d + e + f + w + x + y + z + g + h);
}
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: perf_events: zero time running and enabled, but non-zero count
2009-10-26 2:37 ` Corey Ashford
@ 2009-10-30 0:13 ` Corey Ashford
2009-11-02 21:20 ` Frederic Weisbecker
1 sibling, 0 replies; 6+ messages in thread
From: Corey Ashford @ 2009-10-30 0:13 UTC (permalink / raw)
To: Ingo Molnar, paulus; +Cc: Peter Zijlstra, LKML
Anyone had a chance to look at this test case yet?
Regards,
- Corey
Corey Ashford wrote:
> On 10/23/2009 01:57 PM, Ingo Molnar wrote:
>>
>> * Corey Ashford<cjashfor@linux.vnet.ibm.com> wrote:
>>
>>>> Btw., you might want to consider putting such testcases into a new
>>>> 'perf test' kind of utility (under tools/perf/builtin-test.c) that
>>>> does various tests such as reading a count and validating the
>>>> time-running/time-enabled values - so that we can see any bugs
>>>> directly.
>>>
>>> I'll see if I can put the test case in some sort of acceptable format
>>> like that. Maybe we can use this one test case as the starting point
>>> for builtin-test.c.
>>
>> Yeah. Feel free to add just something minimal that solves your problem.
>> Others will add their testcases too i'm sure, once the basic command is
>> there.
>>
>> Ingo
>
> Hi Ingo and Paul,
>
> I followed the directions Ingo provided for getting the -tip tree and
> then attempted to build the perf tool that's in it and it doesn't build.
> I'm getting this error on a Power5 box:
>
> ../../lib/find_next_bit.c: In function ‘ext2_swabp’:
> ../../lib/find_next_bit.c:168: warning: passing argument 1 of
> ‘__swab64p’ from incompatible pointer type
> make: *** [util/find_next_bit.o] Error 1
>
> I looked at the code and it looks ok to me. There's a cast from an
> "unsigned long *" to a "unsigned long long *", but those types should be
> of the same size (in -m64 mode), so I don't know why this is failing.
>
> Some good news is that I have created a simple test case which
> demonstrates the bug, but I haven't formulated it in the builtin-test.c
> format yet, since I have been able to build perf.
>
> If you toss the attached file into the perf source directory and compile
> it as follows:
>
> gcc -m64 -o petb perf_events_time_bug.c -lrt
>
> it will show the error:
> % ./petb
> [0] = 1
> [1] = 0 <- time enabled
> [2] = 0 <- time running
> [3] = 4559d1f8
> [4] = 1
> test failed!
>
> perf_events_time_bug.c -lrt
>
> If you compile it with:
>
> gcc -DDO_DISABLE -m66 -o petb perf_events_time_bug.c -lrt
>
> This will disable the event before reading it and then you will get the
> correct times.
>
> % ./petb
> [0] = 1
> [1] = 2a50c1c8 <- time enabled
> [2] = 2a50c1c8 <- time running
> [3] = 460dea97
> [4] = 2
> test passed!
>
--
Regards,
- Corey
Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR
503-578-3507
cjashfor@us.ibm.com
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: perf_events: zero time running and enabled, but non-zero count
2009-10-26 2:37 ` Corey Ashford
2009-10-30 0:13 ` Corey Ashford
@ 2009-11-02 21:20 ` Frederic Weisbecker
2009-11-03 0:27 ` Corey Ashford
1 sibling, 1 reply; 6+ messages in thread
From: Frederic Weisbecker @ 2009-11-02 21:20 UTC (permalink / raw)
To: Corey Ashford; +Cc: Ingo Molnar, paulus, Peter Zijlstra, LKML
On Sun, Oct 25, 2009 at 07:37:09PM -0700, Corey Ashford wrote:
> I'm getting this error on a Power5 box:
>
> ../../lib/find_next_bit.c: In function ‘ext2_swabp’:
> ../../lib/find_next_bit.c:168: warning: passing argument 1 of
> ‘__swab64p’ from incompatible pointer type
> make: *** [util/find_next_bit.o] Error 1
>
> I looked at the code and it looks ok to me. There's a cast from an
> "unsigned long *" to a "unsigned long long *", but those types should be
> of the same size (in -m64 mode), so I don't know why this is failing.
Yeah, it's ok because we have the BITS_PER_LONG check. But gcc is not
aware of that and just consider the cast as breaking the strict aliasing
rules.
Can you tell me if the following patch works for you? (only for
this build error).
Thanks.
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 800783d..167bc04 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -846,8 +846,9 @@ util/bitmap.o: ../../lib/bitmap.c PERF-CFLAGS
util/hweight.o: ../../lib/hweight.c PERF-CFLAGS
$(QUIET_CC)$(CC) -o util/hweight.o -c $(ALL_CFLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $<
+KFNB_FLAGS = `echo $(ALL_CFLAGS) | sed s/-Wstrict-aliasing=3//`
util/find_next_bit.o: ../../lib/find_next_bit.c PERF-CFLAGS
- $(QUIET_CC)$(CC) -o util/find_next_bit.o -c $(ALL_CFLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $<
+ $(QUIET_CC)$(CC) -o util/find_next_bit.o -c $(KFNB_FLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $<
perf-%$X: %.o $(PERFLIBS)
$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS)
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: perf_events: zero time running and enabled, but non-zero count
2009-11-02 21:20 ` Frederic Weisbecker
@ 2009-11-03 0:27 ` Corey Ashford
0 siblings, 0 replies; 6+ messages in thread
From: Corey Ashford @ 2009-11-03 0:27 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: Ingo Molnar, paulus, Peter Zijlstra, LKML
Frederic Weisbecker wrote:
> On Sun, Oct 25, 2009 at 07:37:09PM -0700, Corey Ashford wrote:
>> I'm getting this error on a Power5 box:
>>
>> ../../lib/find_next_bit.c: In function ‘ext2_swabp’:
>> ../../lib/find_next_bit.c:168: warning: passing argument 1 of
>> ‘__swab64p’ from incompatible pointer type
>> make: *** [util/find_next_bit.o] Error 1
>>
>> I looked at the code and it looks ok to me. There's a cast from an
>> "unsigned long *" to a "unsigned long long *", but those types should be
>> of the same size (in -m64 mode), so I don't know why this is failing.
>
>
> Yeah, it's ok because we have the BITS_PER_LONG check. But gcc is not
> aware of that and just consider the cast as breaking the strict aliasing
> rules.
>
> Can you tell me if the following patch works for you? (only for
> this build error).
Thanks for looking into the build problem, Frederic. With your patch, I'm still
seeing the error:
% patch -p1 < ~/patch
patching file tools/perf/Makefile
Hunk #1 succeeded at 845 (offset -1 lines).
% cd tools/perf
~/linux.trees.git/tools/perf
% make
Makefile:472: No libdwarf.h found or old libdwarf.h found, disables dwarf
support. Please install libdwarf-dev/libdwarf-devel >= 20081231
CC util/find_next_bit.o
cc1: warnings being treated as errors
../../lib/find_next_bit.c: In function 'ext2_swabp':
../../lib/find_next_bit.c:168: warning: passing argument 1 of '__swab64p' from
incompatible pointer type
make: *** [util/find_next_bit.o] Error 1
elm3c4.beaverton.ibm.com:corey-13% cat ~/patch
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 800783d..167bc04 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -846,8 +846,9 @@ util/bitmap.o: ../../lib/bitmap.c PERF-CFLAGS
util/hweight.o: ../../lib/hweight.c PERF-CFLAGS
$(QUIET_CC)$(CC) -o util/hweight.o -c $(ALL_CFLAGS)
-DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $<
+KFNB_FLAGS = `echo $(ALL_CFLAGS) | sed s/-Wstrict-aliasing=3//`
util/find_next_bit.o: ../../lib/find_next_bit.c PERF-CFLAGS
- $(QUIET_CC)$(CC) -o util/find_next_bit.o -c $(ALL_CFLAGS)
-DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $<
+ $(QUIET_CC)$(CC) -o util/find_next_bit.o -c $(KFNB_FLAGS)
-DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $<
perf-%$X: %.o $(PERFLIBS)
$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^)
$(LIBS)
>
>
> diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> index 800783d..167bc04 100644
> --- a/tools/perf/Makefile
> +++ b/tools/perf/Makefile
> @@ -846,8 +846,9 @@ util/bitmap.o: ../../lib/bitmap.c PERF-CFLAGS
> util/hweight.o: ../../lib/hweight.c PERF-CFLAGS
> $(QUIET_CC)$(CC) -o util/hweight.o -c $(ALL_CFLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $<
>
> +KFNB_FLAGS = `echo $(ALL_CFLAGS) | sed s/-Wstrict-aliasing=3//`
> util/find_next_bit.o: ../../lib/find_next_bit.c PERF-CFLAGS
> - $(QUIET_CC)$(CC) -o util/find_next_bit.o -c $(ALL_CFLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $<
> + $(QUIET_CC)$(CC) -o util/find_next_bit.o -c $(KFNB_FLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $<
>
> perf-%$X: %.o $(PERFLIBS)
> $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS)
>
>
--
Regards,
- Corey
Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR
503-578-3507
cjashfor@us.ibm.com
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-11-03 0:27 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-23 19:16 perf_events: zero time running and enabled, but non-zero count Corey Ashford
[not found] ` <20091023193246.GA3139@elte.hu>
[not found] ` <4AE21655.6040403@linux.vnet.ibm.com>
2009-10-23 20:57 ` Ingo Molnar
2009-10-26 2:37 ` Corey Ashford
2009-10-30 0:13 ` Corey Ashford
2009-11-02 21:20 ` Frederic Weisbecker
2009-11-03 0:27 ` Corey Ashford
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox