* perf_events: zero time running and enabled, but non-zero count
@ 2009-10-23 19:16 Corey Ashford
[not found] ` <20091023193246.GA3139@elte.hu>
0 siblings, 1 reply; 6+ messages in thread
From: Corey Ashford @ 2009-10-23 19:16 UTC (permalink / raw)
To: Peter Zijlstra, paulus, LKML, Ingo Molnar
Hi folks,
First let me state that I'm working with a Linux kernel which is a couple of
weeks old and came from Linus' post-2.6.31 release tree. It does contain the
name change from perf_counter to perf_event, for example.
I'm working on some optimizations of PAPI's use of perf_events. In particular I
want to speed up the reading of events. To do this, I am doing two things:
1) Not disabling and enabling the counters around reading the counters. In
other words, read them on the fly.
2) Use the PERF_FORMAT_GROUP option so that I can read up an entire group of
counters in one syscall.
This seems to be working fairly well, except it seems that sometimes the "time
running" and "time enabled" values that I read up are zero, even though the
count is non-zero. The case I'm looking at has three groups, and it's the third
group, containing a single event, which is getting the zero enabled/running values.
One experiment that I just now tried was to put the code back in that
disables/re-enables the counters around the code that reads the counters. This
seems to have fixed the problem, but causes a performance degradation, of course.
Is this expected behavior? Should I be able to read the counts, and times on
the fly like I want to be able to do?
Thanks for your consideration of this issue.
- Corey
Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR
cjashfor@us.ibm.com
^ permalink raw reply [flat|nested] 6+ messages in thread[parent not found: <20091023193246.GA3139@elte.hu>]
[parent not found: <4AE21655.6040403@linux.vnet.ibm.com>]
* Re: perf_events: zero time running and enabled, but non-zero count [not found] ` <4AE21655.6040403@linux.vnet.ibm.com> @ 2009-10-23 20:57 ` Ingo Molnar 2009-10-26 2:37 ` Corey Ashford 0 siblings, 1 reply; 6+ messages in thread From: Ingo Molnar @ 2009-10-23 20:57 UTC (permalink / raw) To: Corey Ashford; +Cc: Peter Zijlstra, paulus, LKML * Corey Ashford <cjashfor@linux.vnet.ibm.com> wrote: >> Btw., you might want to consider putting such testcases into a new >> 'perf test' kind of utility (under tools/perf/builtin-test.c) that >> does various tests such as reading a count and validating the >> time-running/time-enabled values - so that we can see any bugs >> directly. > > I'll see if I can put the test case in some sort of acceptable format > like that. Maybe we can use this one test case as the starting point > for builtin-test.c. Yeah. Feel free to add just something minimal that solves your problem. Others will add their testcases too i'm sure, once the basic command is there. Ingo ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: perf_events: zero time running and enabled, but non-zero count 2009-10-23 20:57 ` Ingo Molnar @ 2009-10-26 2:37 ` Corey Ashford 2009-10-30 0:13 ` Corey Ashford 2009-11-02 21:20 ` Frederic Weisbecker 0 siblings, 2 replies; 6+ messages in thread From: Corey Ashford @ 2009-10-26 2:37 UTC (permalink / raw) To: Ingo Molnar, paulus; +Cc: Peter Zijlstra, LKML [-- Attachment #1: Type: text/plain, Size: 2257 bytes --] On 10/23/2009 01:57 PM, Ingo Molnar wrote: > > * Corey Ashford<cjashfor@linux.vnet.ibm.com> wrote: > >>> Btw., you might want to consider putting such testcases into a new >>> 'perf test' kind of utility (under tools/perf/builtin-test.c) that >>> does various tests such as reading a count and validating the >>> time-running/time-enabled values - so that we can see any bugs >>> directly. >> >> I'll see if I can put the test case in some sort of acceptable format >> like that. Maybe we can use this one test case as the starting point >> for builtin-test.c. > > Yeah. Feel free to add just something minimal that solves your problem. > Others will add their testcases too i'm sure, once the basic command is > there. > > Ingo Hi Ingo and Paul, I followed the directions Ingo provided for getting the -tip tree and then attempted to build the perf tool that's in it and it doesn't build. I'm getting this error on a Power5 box: ../../lib/find_next_bit.c: In function ‘ext2_swabp’: ../../lib/find_next_bit.c:168: warning: passing argument 1 of ‘__swab64p’ from incompatible pointer type make: *** [util/find_next_bit.o] Error 1 I looked at the code and it looks ok to me. There's a cast from an "unsigned long *" to a "unsigned long long *", but those types should be of the same size (in -m64 mode), so I don't know why this is failing. Some good news is that I have created a simple test case which demonstrates the bug, but I haven't formulated it in the builtin-test.c format yet, since I have been able to build perf. If you toss the attached file into the perf source directory and compile it as follows: gcc -m64 -o petb perf_events_time_bug.c -lrt it will show the error: % ./petb [0] = 1 [1] = 0 <- time enabled [2] = 0 <- time running [3] = 4559d1f8 [4] = 1 test failed! perf_events_time_bug.c -lrt If you compile it with: gcc -DDO_DISABLE -m66 -o petb perf_events_time_bug.c -lrt This will disable the event before reading it and then you will get the correct times. % ./petb [0] = 1 [1] = 2a50c1c8 <- time enabled [2] = 2a50c1c8 <- time running [3] = 460dea97 [4] = 2 test passed! -- Regards, - Corey Corey Ashford Software Engineer IBM Linux Technology Center, Linux Toolchain Beaverton, OR cjashfor@us.ibm.com [-- Attachment #2: perf_events_time_bug.c --] [-- Type: text/plain, Size: 2754 bytes --] #include "perf.h" #include <assert.h> #include <errno.h> #include <math.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/ioctl.h> #include <time.h> #include "perf.h" static double dummy3(double x, int iters); long long get_real_usec(void) { struct timespec ts; clock_gettime(CLOCK_REALTIME, &ts); return ((long long) ts.tv_sec * 1000000) + ((long long) ts.tv_nsec / 1000); } #define NUM_FLOPS 20000000 int main(int argc, char **argv) { struct perf_event_attr event_attr; int iters = NUM_FLOPS; double x = 1.1, y; long long t1, t2; t2 = 1000000; /* Target: 1,000,000 usec */ /* Measure one run */ t1 = get_real_usec(); y = dummy3(x, iters); t1 = get_real_usec() - t1; if (t2 > t1) /* Scale up execution time to match t2 */ iters = iters * (int) (t2 / t1); x = 1.0; memset(&event_attr, 0, sizeof(struct perf_event_attr)); event_attr.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID | PERF_FORMAT_TOTAL_TIME_RUNNING | PERF_FORMAT_TOTAL_TIME_ENABLED; event_attr.disabled = 1; event_attr.type = PERF_TYPE_HARDWARE; event_attr.config = PERF_COUNT_HW_CPU_CYCLES; /* count this thread, make this the group leader */ int fd = sys_perf_event_open(&event_attr, 0, -1, -1, 0); if (fd == -1) { printf("sys_perf_event_open failed: %s\n", strerror(errno)); exit(1); } int ret = ioctl(fd, PERF_EVENT_IOC_ENABLE, NULL); if (ret == -1) { printf("enable failed: %s\n", strerror(errno)); exit(1); } y = dummy3(x, iters); #ifdef DO_DISABLE ret = ioctl(fd, PERF_EVENT_IOC_DISABLE, NULL); if (ret == -1) { printf("disable failed: %s\n", strerror(errno)); exit (1); } #endif uint64_t buffer[5]; int cnt; cnt = read(fd, buffer, sizeof(buffer)); if (cnt == -1) { printf("read failed: %s\n", strerror(errno)); } int i; for (i = 0; i < 5; i++) { printf("[%d] = %llx\n", i, buffer[i]); } #define TIME_ENABLED_IDX 1 #define TIME_RUNNING_IDX 2 if (!buffer[TIME_ENABLED_IDX] || !buffer[TIME_RUNNING_IDX]) { printf("test failed!\n"); exit(1); } else { printf("test passed!\n"); exit(0); } } static double dummy3(double x, int iters) { int i; double w, y, z, a, b, c, d, e, f, g, h; double one; one = 1.0; w = x; y = x; z = x; a = x; b = x; c = x; d = x; e = x; f = x; g = x; h = x; for (i = 1; i <= iters; i++) { w = w * 1.000000000001 + one; y = y * 1.000000000002 + one; z = z * 1.000000000003 + one; a = a * 1.000000000004 + one; b = b * 1.000000000005 + one; c = c * 0.999999999999 + one; d = d * 0.999999999998 + one; e = e * 0.999999999997 + one; f = f * 0.999999999996 + one; g = h * 0.999999999995 + one; h = h * 1.000000000006 + one; } return 2.0 * (a + b + c + d + e + f + w + x + y + z + g + h); } ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: perf_events: zero time running and enabled, but non-zero count 2009-10-26 2:37 ` Corey Ashford @ 2009-10-30 0:13 ` Corey Ashford 2009-11-02 21:20 ` Frederic Weisbecker 1 sibling, 0 replies; 6+ messages in thread From: Corey Ashford @ 2009-10-30 0:13 UTC (permalink / raw) To: Ingo Molnar, paulus; +Cc: Peter Zijlstra, LKML Anyone had a chance to look at this test case yet? Regards, - Corey Corey Ashford wrote: > On 10/23/2009 01:57 PM, Ingo Molnar wrote: >> >> * Corey Ashford<cjashfor@linux.vnet.ibm.com> wrote: >> >>>> Btw., you might want to consider putting such testcases into a new >>>> 'perf test' kind of utility (under tools/perf/builtin-test.c) that >>>> does various tests such as reading a count and validating the >>>> time-running/time-enabled values - so that we can see any bugs >>>> directly. >>> >>> I'll see if I can put the test case in some sort of acceptable format >>> like that. Maybe we can use this one test case as the starting point >>> for builtin-test.c. >> >> Yeah. Feel free to add just something minimal that solves your problem. >> Others will add their testcases too i'm sure, once the basic command is >> there. >> >> Ingo > > Hi Ingo and Paul, > > I followed the directions Ingo provided for getting the -tip tree and > then attempted to build the perf tool that's in it and it doesn't build. > I'm getting this error on a Power5 box: > > ../../lib/find_next_bit.c: In function ‘ext2_swabp’: > ../../lib/find_next_bit.c:168: warning: passing argument 1 of > ‘__swab64p’ from incompatible pointer type > make: *** [util/find_next_bit.o] Error 1 > > I looked at the code and it looks ok to me. There's a cast from an > "unsigned long *" to a "unsigned long long *", but those types should be > of the same size (in -m64 mode), so I don't know why this is failing. > > Some good news is that I have created a simple test case which > demonstrates the bug, but I haven't formulated it in the builtin-test.c > format yet, since I have been able to build perf. > > If you toss the attached file into the perf source directory and compile > it as follows: > > gcc -m64 -o petb perf_events_time_bug.c -lrt > > it will show the error: > % ./petb > [0] = 1 > [1] = 0 <- time enabled > [2] = 0 <- time running > [3] = 4559d1f8 > [4] = 1 > test failed! > > perf_events_time_bug.c -lrt > > If you compile it with: > > gcc -DDO_DISABLE -m66 -o petb perf_events_time_bug.c -lrt > > This will disable the event before reading it and then you will get the > correct times. > > % ./petb > [0] = 1 > [1] = 2a50c1c8 <- time enabled > [2] = 2a50c1c8 <- time running > [3] = 460dea97 > [4] = 2 > test passed! > -- Regards, - Corey Corey Ashford Software Engineer IBM Linux Technology Center, Linux Toolchain Beaverton, OR 503-578-3507 cjashfor@us.ibm.com ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: perf_events: zero time running and enabled, but non-zero count 2009-10-26 2:37 ` Corey Ashford 2009-10-30 0:13 ` Corey Ashford @ 2009-11-02 21:20 ` Frederic Weisbecker 2009-11-03 0:27 ` Corey Ashford 1 sibling, 1 reply; 6+ messages in thread From: Frederic Weisbecker @ 2009-11-02 21:20 UTC (permalink / raw) To: Corey Ashford; +Cc: Ingo Molnar, paulus, Peter Zijlstra, LKML On Sun, Oct 25, 2009 at 07:37:09PM -0700, Corey Ashford wrote: > I'm getting this error on a Power5 box: > > ../../lib/find_next_bit.c: In function ‘ext2_swabp’: > ../../lib/find_next_bit.c:168: warning: passing argument 1 of > ‘__swab64p’ from incompatible pointer type > make: *** [util/find_next_bit.o] Error 1 > > I looked at the code and it looks ok to me. There's a cast from an > "unsigned long *" to a "unsigned long long *", but those types should be > of the same size (in -m64 mode), so I don't know why this is failing. Yeah, it's ok because we have the BITS_PER_LONG check. But gcc is not aware of that and just consider the cast as breaking the strict aliasing rules. Can you tell me if the following patch works for you? (only for this build error). Thanks. diff --git a/tools/perf/Makefile b/tools/perf/Makefile index 800783d..167bc04 100644 --- a/tools/perf/Makefile +++ b/tools/perf/Makefile @@ -846,8 +846,9 @@ util/bitmap.o: ../../lib/bitmap.c PERF-CFLAGS util/hweight.o: ../../lib/hweight.c PERF-CFLAGS $(QUIET_CC)$(CC) -o util/hweight.o -c $(ALL_CFLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $< +KFNB_FLAGS = `echo $(ALL_CFLAGS) | sed s/-Wstrict-aliasing=3//` util/find_next_bit.o: ../../lib/find_next_bit.c PERF-CFLAGS - $(QUIET_CC)$(CC) -o util/find_next_bit.o -c $(ALL_CFLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $< + $(QUIET_CC)$(CC) -o util/find_next_bit.o -c $(KFNB_FLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $< perf-%$X: %.o $(PERFLIBS) $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS) ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: perf_events: zero time running and enabled, but non-zero count 2009-11-02 21:20 ` Frederic Weisbecker @ 2009-11-03 0:27 ` Corey Ashford 0 siblings, 0 replies; 6+ messages in thread From: Corey Ashford @ 2009-11-03 0:27 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: Ingo Molnar, paulus, Peter Zijlstra, LKML Frederic Weisbecker wrote: > On Sun, Oct 25, 2009 at 07:37:09PM -0700, Corey Ashford wrote: >> I'm getting this error on a Power5 box: >> >> ../../lib/find_next_bit.c: In function ‘ext2_swabp’: >> ../../lib/find_next_bit.c:168: warning: passing argument 1 of >> ‘__swab64p’ from incompatible pointer type >> make: *** [util/find_next_bit.o] Error 1 >> >> I looked at the code and it looks ok to me. There's a cast from an >> "unsigned long *" to a "unsigned long long *", but those types should be >> of the same size (in -m64 mode), so I don't know why this is failing. > > > Yeah, it's ok because we have the BITS_PER_LONG check. But gcc is not > aware of that and just consider the cast as breaking the strict aliasing > rules. > > Can you tell me if the following patch works for you? (only for > this build error). Thanks for looking into the build problem, Frederic. With your patch, I'm still seeing the error: % patch -p1 < ~/patch patching file tools/perf/Makefile Hunk #1 succeeded at 845 (offset -1 lines). % cd tools/perf ~/linux.trees.git/tools/perf % make Makefile:472: No libdwarf.h found or old libdwarf.h found, disables dwarf support. Please install libdwarf-dev/libdwarf-devel >= 20081231 CC util/find_next_bit.o cc1: warnings being treated as errors ../../lib/find_next_bit.c: In function 'ext2_swabp': ../../lib/find_next_bit.c:168: warning: passing argument 1 of '__swab64p' from incompatible pointer type make: *** [util/find_next_bit.o] Error 1 elm3c4.beaverton.ibm.com:corey-13% cat ~/patch diff --git a/tools/perf/Makefile b/tools/perf/Makefile index 800783d..167bc04 100644 --- a/tools/perf/Makefile +++ b/tools/perf/Makefile @@ -846,8 +846,9 @@ util/bitmap.o: ../../lib/bitmap.c PERF-CFLAGS util/hweight.o: ../../lib/hweight.c PERF-CFLAGS $(QUIET_CC)$(CC) -o util/hweight.o -c $(ALL_CFLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $< +KFNB_FLAGS = `echo $(ALL_CFLAGS) | sed s/-Wstrict-aliasing=3//` util/find_next_bit.o: ../../lib/find_next_bit.c PERF-CFLAGS - $(QUIET_CC)$(CC) -o util/find_next_bit.o -c $(ALL_CFLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $< + $(QUIET_CC)$(CC) -o util/find_next_bit.o -c $(KFNB_FLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $< perf-%$X: %.o $(PERFLIBS) $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS) > > > diff --git a/tools/perf/Makefile b/tools/perf/Makefile > index 800783d..167bc04 100644 > --- a/tools/perf/Makefile > +++ b/tools/perf/Makefile > @@ -846,8 +846,9 @@ util/bitmap.o: ../../lib/bitmap.c PERF-CFLAGS > util/hweight.o: ../../lib/hweight.c PERF-CFLAGS > $(QUIET_CC)$(CC) -o util/hweight.o -c $(ALL_CFLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $< > > +KFNB_FLAGS = `echo $(ALL_CFLAGS) | sed s/-Wstrict-aliasing=3//` > util/find_next_bit.o: ../../lib/find_next_bit.c PERF-CFLAGS > - $(QUIET_CC)$(CC) -o util/find_next_bit.o -c $(ALL_CFLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $< > + $(QUIET_CC)$(CC) -o util/find_next_bit.o -c $(KFNB_FLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $< > > perf-%$X: %.o $(PERFLIBS) > $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS) > > -- Regards, - Corey Corey Ashford Software Engineer IBM Linux Technology Center, Linux Toolchain Beaverton, OR 503-578-3507 cjashfor@us.ibm.com ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-11-03 0:27 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-23 19:16 perf_events: zero time running and enabled, but non-zero count Corey Ashford
[not found] ` <20091023193246.GA3139@elte.hu>
[not found] ` <4AE21655.6040403@linux.vnet.ibm.com>
2009-10-23 20:57 ` Ingo Molnar
2009-10-26 2:37 ` Corey Ashford
2009-10-30 0:13 ` Corey Ashford
2009-11-02 21:20 ` Frederic Weisbecker
2009-11-03 0:27 ` Corey Ashford
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox