* perf: regression with PERF_EVENT_IOC_REFRESH @ 2011-05-23 20:04 Vince Weaver 2011-05-23 20:22 ` Peter Zijlstra 0 siblings, 1 reply; 25+ messages in thread From: Vince Weaver @ 2011-05-23 20:04 UTC (permalink / raw) To: linux-kernel; +Cc: fbuihuu, a.p.zijlstra, mingo, paulus, acme Hello the changeset 2e939d1d perf: Limit event refresh to sampling event changes the behavior of ioctl( , PERF_EVENT_IOC_REFRESH, ) before the changeset, you could have a counter group where only one of the subevents was sampling. PERF_EVENT_IOC_REFRESH would correctly enable sampling for only that subevent. With the changeset applied, this fails with EINVALID. Now you can only sample in a group leader. Was this an intended change? It breaks various of our PAPI regression tests that worked fine before the change was committed. Vince vweaver1@eecs.utk.edu ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-23 20:04 perf: regression with PERF_EVENT_IOC_REFRESH Vince Weaver @ 2011-05-23 20:22 ` Peter Zijlstra 2011-05-24 6:20 ` Vince Weaver 0 siblings, 1 reply; 25+ messages in thread From: Peter Zijlstra @ 2011-05-23 20:22 UTC (permalink / raw) To: Vince Weaver; +Cc: linux-kernel, fbuihuu, mingo, paulus, acme On Mon, 2011-05-23 at 16:04 -0400, Vince Weaver wrote: > Hello > > the changeset 2e939d1d perf: Limit event refresh to sampling event > > changes the behavior of > ioctl( , PERF_EVENT_IOC_REFRESH, ) > > before the changeset, you could have a counter group where only one of the > subevents was sampling. PERF_EVENT_IOC_REFRESH would correctly enable > sampling for only that subevent. But how? it adds to the event_limit of the event you call the ioctl() on, not on any of the group siblings. > With the changeset applied, this fails with EINVALID. Now you can only > sample in a group leader. I'm not quite seeing how group-leaders are relevant here, you can only call this ioctl() on sampling events, regardless of if they're they group leader or a sibling. > Was this an intended change? It breaks various of our PAPI regression > tests that worked fine before the change was committed. I'm not quite seeing what's wrong yet, the change seemed simple enough in that calling that ioctl() on an event that wasn't capable of generating samples doesn't make sense, since it will increase the event limit for the event you call it on. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-23 20:22 ` Peter Zijlstra @ 2011-05-24 6:20 ` Vince Weaver 2011-05-24 10:30 ` Peter Zijlstra 0 siblings, 1 reply; 25+ messages in thread From: Vince Weaver @ 2011-05-24 6:20 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, fbuihuu, mingo, paulus, acme [-- Attachment #1: Type: TEXT/PLAIN, Size: 1451 bytes --] On Mon, 23 May 2011, Peter Zijlstra wrote: > > before the changeset, you could have a counter group where only one of the > > subevents was sampling. PERF_EVENT_IOC_REFRESH would correctly enable > > sampling for only that subevent. > > But how? it adds to the event_limit of the event you call the ioctl() > on, not on any of the group siblings. the old behavior did. > > With the changeset applied, this fails with EINVALID. Now you can only > > sample in a group leader. > > I'm not quite seeing how group-leaders are relevant here, you can only > call this ioctl() on sampling events, regardless of if they're they > group leader or a sibling. previous behavior was that if you called it on a group leader that wasn't sampling, a sibling event that was sampling would be activated. > > Was this an intended change? It breaks various of our PAPI regression > > tests that worked fine before the change was committed. > > I'm not quite seeing what's wrong yet, the change seemed simple enough > in that calling that ioctl() on an event that wasn't capable of > generating samples doesn't make sense, since it will increase the event > limit for the event you call it on. attached is some code that will return a sampled count on older kernels but gives EINVAL on current kernels. The old behavior might have been unintentional, but PAPI unfortunately depends on it (not code I originaly wrote, but code I have to maintain). Vince [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: Type: TEXT/x-csrc; name=sampled_notleader_refresh.c, Size: 3040 bytes --] /* sampled_notleader_refresh.c */ /* by Vince Weaver vweaver1 _at_ eecs.utk.edu */ /* Compile with gcc -O2 -Wall -o sampled_notleader_refresh sampled_notleader_refresh.c */ #define _GNU_SOURCE 1 #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <fcntl.h> #include <errno.h> #include <signal.h> #include <sys/mman.h> #include <sys/ioctl.h> #include <asm/unistd.h> #include <sys/prctl.h> #include <linux/perf_event.h> static int count=0; static void our_handler(int signum,siginfo_t *oh, void *blah) { count++; } int perf_event_open(struct perf_event_attr *hw_event_uptr, pid_t pid, int cpu, int group_fd, unsigned long flags) { return syscall(__NR_perf_event_open, hw_event_uptr, pid, cpu, group_fd,flags); } double busywork(int count) { int i; double sum=0.0012; for(i=0;i<count;i++) { sum+=0.01; } return sum; } int main(int argc, char** argv) { int ret,fd1,fd2; double result; struct perf_event_attr pe; struct sigaction sa; memset(&sa, 0, sizeof(struct sigaction)); sa.sa_sigaction = our_handler; sa.sa_flags = SA_SIGINFO; if (sigaction( SIGIO, &sa, NULL) < 0) { fprintf(stderr,"Error setting up signal handler\n"); exit(1); } memset(&pe,0,sizeof(struct perf_event_attr)); pe.type=PERF_TYPE_HARDWARE; pe.size=sizeof(struct perf_event_attr); pe.config=PERF_COUNT_HW_INSTRUCTIONS; pe.sample_period=000000; pe.sample_type=0; pe.read_format=PERF_FORMAT_GROUP|PERF_FORMAT_ID; pe.disabled=1; pe.pinned=0; pe.exclude_kernel=1; pe.exclude_hv=1; pe.wakeup_events=1; fd1=perf_event_open(&pe,0,-1,-1,0); if (fd1<0) { fprintf(stderr,"Error opening leader %llx\n",pe.config); exit(1); } pe.type=PERF_TYPE_HARDWARE; pe.config=PERF_COUNT_HW_INSTRUCTIONS; pe.sample_period=1000000; pe.sample_type=PERF_SAMPLE_IP; pe.read_format=PERF_FORMAT_GROUP|PERF_FORMAT_ID; pe.disabled=0; pe.pinned=0; pe.exclude_kernel=1; pe.exclude_hv=1; pe.wakeup_events=1; fd2=perf_event_open(&pe,0,-1,fd1,0); if (fd2<0) { fprintf(stderr,"Error opening %llx\n",pe.config); exit(1); } void *blargh; /* important! */ blargh=mmap(NULL, (1+1)*4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd2, 0); fcntl(fd2, F_SETFL, O_RDWR|O_NONBLOCK|O_ASYNC); fcntl(fd2, F_SETSIG, SIGIO); fcntl(fd2,F_SETOWN,getpid()); ioctl(fd1, PERF_EVENT_IOC_RESET, 0); ioctl(fd2, PERF_EVENT_IOC_RESET, 0); ret=ioctl(fd1, PERF_EVENT_IOC_REFRESH,0); if (ret<0) { fprintf(stderr,"Error with PERF_EVENT_IOC_REFRESH of group leader: " "%d %s\n",errno,strerror(errno)); exit(1); } result=busywork(10000000); ret=ioctl(fd2, PERF_EVENT_IOC_DISABLE,0); printf("Count: %d %lf\n",count,result); return 0; } ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-24 6:20 ` Vince Weaver @ 2011-05-24 10:30 ` Peter Zijlstra 2011-05-24 15:04 ` Vince Weaver 2011-05-28 3:38 ` Vince Weaver 0 siblings, 2 replies; 25+ messages in thread From: Peter Zijlstra @ 2011-05-24 10:30 UTC (permalink / raw) To: Vince Weaver; +Cc: linux-kernel, fbuihuu, mingo, paulus, acme On Tue, 2011-05-24 at 02:20 -0400, Vince Weaver wrote: > On Mon, 23 May 2011, Peter Zijlstra wrote: > > > > before the changeset, you could have a counter group where only one of the > > > subevents was sampling. PERF_EVENT_IOC_REFRESH would correctly enable > > > sampling for only that subevent. > > > > But how? it adds to the event_limit of the event you call the ioctl() > > on, not on any of the group siblings. > > the old behavior did. > > > > With the changeset applied, this fails with EINVALID. Now you can only > > > sample in a group leader. > > > > I'm not quite seeing how group-leaders are relevant here, you can only > > call this ioctl() on sampling events, regardless of if they're they > > group leader or a sibling. > > previous behavior was that if you called it on a group leader that wasn't > sampling, a sibling event that was sampling would be activated. > > > > Was this an intended change? It breaks various of our PAPI regression > > > tests that worked fine before the change was committed. > > > > I'm not quite seeing what's wrong yet, the change seemed simple enough > > in that calling that ioctl() on an event that wasn't capable of > > generating samples doesn't make sense, since it will increase the event > > limit for the event you call it on. > > attached is some code that will return a sampled count on older kernels > but gives EINVAL on current kernels. > > The old behavior might have been unintentional, but PAPI unfortunately > depends on it (not code I originaly wrote, but code I have to maintain). Right, so what the code does is create a group of which the leader is disabled, which effectively has the whole group disabled. It then abuses REFRESH,0 to enable the leader. The code should use ENABLE (surprise, surprise!) to enable the leader and get things going. This is very clear abuse of the API and I'm not really inclined to fix it, in fact, I'd be tempted to add an additional test to verify the argument to REFRESH > 0. Then again, I do appreciate you're having a problem there, how soon can you push this trivial fix into all maintained PAPI branches? We could revert this for one release and then properly shut this abuse down the next release. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-24 10:30 ` Peter Zijlstra @ 2011-05-24 15:04 ` Vince Weaver 2011-05-24 15:10 ` Peter Zijlstra 2011-05-24 15:11 ` Peter Zijlstra 2011-05-28 3:38 ` Vince Weaver 1 sibling, 2 replies; 25+ messages in thread From: Vince Weaver @ 2011-05-24 15:04 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, fbuihuu, mingo, paulus, acme On Tue, 24 May 2011, Peter Zijlstra wrote: > > The old behavior might have been unintentional, but PAPI unfortunately > > depends on it (not code I originaly wrote, but code I have to maintain). > > Right, so what the code does is create a group of which the leader is > disabled, which effectively has the whole group disabled. It then abuses > REFRESH,0 to enable the leader. yes. This is currently what PAPI does. > The code should use ENABLE (surprise, surprise!) to enable the leader > and get things going. It was unclear from the documentation that enabling a group leader that had siblings with overflow setup would start them triggering without some sort of REFRESH first. It does seem to work though. But then again, so did the original behavior. > This is very clear abuse of the API and I'm not really inclined to fix > it, in fact, I'd be tempted to add an additional test to verify the > argument to REFRESH > 0. Well, when there's no documentation then people write to the code. As I said before, I didn't write this code. I just get to pick up the pieces when it breaks. As an aside, I also wasted 6 hours last night finding out that you don't get signaled on overflow if you don't have a ring-buffer mmap()ed, even if you never read from the buffer and you only are interested in counting the number of overflows. I should stop complaining though or you'll start telling me to fix the documentation. Which maybe I would do if I didn't spend all my time writing code to reproduce problems in the perf ABI for bisection purposes. > Then again, I do appreciate you're having a problem there, how soon can > you push this trivial fix into all maintained PAPI branches? We could > revert this for one release and then properly shut this abuse down the > next release. well since you apparently aren't going to retroactively revert it for 2.6.37, 2.6.38, or 2.6.39 then it would be silly to do it just for 2.6.40. I'll audit the PAPI code to see how widespread it is. To warn you though, we still have people complain within hours when we break support for 2.6.16 kernels, so it's not like the people who use PAPI are the kind to run out and install a minor stable release. They're going to upgrade their distro or move their binary to a newer machine and suddenly start geting EINVAL returns on their previously working code and then start noisily complaining. Vince ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-24 15:04 ` Vince Weaver @ 2011-05-24 15:10 ` Peter Zijlstra 2011-05-24 15:40 ` Ingo Molnar 2011-05-24 17:53 ` Vince Weaver 2011-05-24 15:11 ` Peter Zijlstra 1 sibling, 2 replies; 25+ messages in thread From: Peter Zijlstra @ 2011-05-24 15:10 UTC (permalink / raw) To: Vince Weaver; +Cc: linux-kernel, fbuihuu, mingo, paulus, acme On Tue, 2011-05-24 at 11:04 -0400, Vince Weaver wrote: > 2.6.37, 2.6.38, or 2.6.39 then it would be silly to do it just for > 2.6.40. Oh, I assumed it was recent and .39/.40 would suffice. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-24 15:10 ` Peter Zijlstra @ 2011-05-24 15:40 ` Ingo Molnar 2011-05-24 20:31 ` Vince Weaver 2011-05-24 17:53 ` Vince Weaver 1 sibling, 1 reply; 25+ messages in thread From: Ingo Molnar @ 2011-05-24 15:40 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Vince Weaver, linux-kernel, fbuihuu, paulus, acme * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > On Tue, 2011-05-24 at 11:04 -0400, Vince Weaver wrote: > > > 2.6.37, 2.6.38, or 2.6.39 then it would be silly to do it just for > > 2.6.40. No, this commit was added in v2.6.38 so v2.6.37 should be fine. > Oh, I assumed it was recent and .39/.40 would suffice. Btw., how did it happen that the PAPI tests did not get run against upstream over the course of about half a year, two full stable kernels released: Date: Mon Mar 14 18:20:32 2011 -0700 Linux 2.6.38 Date: Wed May 18 21:06:34 2011 -0700 Linux 2.6.39 ? I'd suggest periodically running the PAPI tests on the perf development tree: http://people.redhat.com/mingo/tip.git/README doing that would have caught this problem 6 months ago. The upstream policy is that regressions are generally recognized before the next kernel gets released: i.e. in the stabilization period after -rc1, the roughly two months until the final kernel gets released. That is the window when we can still fix regressions relatively cheaply. Yes, there are exceptions, but if a piece of user-space code did not get tested with upstream over months and months then that moves into the 'fix it if we can' category - not a regression per se. So the upstream message is: we can only care about you if you care testing upstream. So if it's easy to fix we can certainly fix this bug and mark it for a -stable backport, but this is not a regression that got reported to us in any timely manner. Btw., to get such assumptions tested more frequently than twice a year i'd suggest moving these usecases into 'perf test' or so - that it gets run every day: $ perf test 1: vmlinux symtab matches kallsyms: FAILED! 2: detect open syscall event: Ok 3: detect open syscall event on all cpus: Ok 4: read samples using the mmap interface: Ok Thanks, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-24 15:40 ` Ingo Molnar @ 2011-05-24 20:31 ` Vince Weaver 2011-05-25 10:39 ` Ingo Molnar 0 siblings, 1 reply; 25+ messages in thread From: Vince Weaver @ 2011-05-24 20:31 UTC (permalink / raw) To: Ingo Molnar; +Cc: Peter Zijlstra, linux-kernel, fbuihuu, paulus, acme On Tue, 24 May 2011, Ingo Molnar wrote: > > Btw., how did it happen that the PAPI tests did not get run against upstream > over the course of about half a year, two full stable kernels released: we run regresion tests nightly. There was a bug in our "create two events and sample on the second" test, where it was actualy sampling on the first counter by mistake. When I fixed the test to do what it claimed to do it found the bug. PAPI runs on at least 5 different operating systems, 3 different Linux perf counter implementations, and on kernels dating back to 2.4. Plus numerous architectures. While we try to test against recent Linux perf_events, we just don't have the hardware to be comprehensive. It doesn't help that our test machines are primarily used for GPGPU work during the day, so we're limited to what kernels we can have installed due to driver issues. > suggest moving these usecases into 'perf test' or so - that it gets run every > day: you can feel free to install PAPI on your test machines and give it a run daily too. It's open source setenv CVSROOT :pserver:anonymous@cvs.eecs.utk.edu:/cvs/homes/papi cvs login cvs co all cd papi ./configure make ./run_tests There are often false negatives on the tests that can be a pain to track down. Welcome to my world. We gladly accept patches. Vince ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-24 20:31 ` Vince Weaver @ 2011-05-25 10:39 ` Ingo Molnar 2011-05-25 21:24 ` Vince Weaver 0 siblings, 1 reply; 25+ messages in thread From: Ingo Molnar @ 2011-05-25 10:39 UTC (permalink / raw) To: Vince Weaver; +Cc: Peter Zijlstra, linux-kernel, fbuihuu, paulus, acme * Vince Weaver <vweaver1@eecs.utk.edu> wrote: > > suggest moving these usecases into 'perf test' or so - that it > > gets run every day: > > you can feel free to install PAPI on your test machines and give it > a run daily too. It's open source Well, i gave you a suggestion of how to prevent such regressions in the future, should you be worried about such regressions. The standing upstream policy is that while we try to do a good effort and obviously fix everything we find ourselves or which gets reported to us, we cannot test everything so if you want us to fix bugs you need to pick one or more of these options: - run our code - or help us build better tests, either to 'perf test' (or to LTP) - or get your users to test recent upstream kernels for you - or ignore us and deal with regressions whenever you get hit by them Your choice really. Thanks, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-25 10:39 ` Ingo Molnar @ 2011-05-25 21:24 ` Vince Weaver 0 siblings, 0 replies; 25+ messages in thread From: Vince Weaver @ 2011-05-25 21:24 UTC (permalink / raw) To: Ingo Molnar; +Cc: Peter Zijlstra, linux-kernel, fbuihuu, paulus, acme On Wed, 25 May 2011, Ingo Molnar wrote: > * Vince Weaver <vweaver1@eecs.utk.edu> wrote: > > Well, i gave you a suggestion of how to prevent such regressions in > the future, should you be worried about such regressions. > > - or help us build better tests, either to 'perf test' (or to LTP) I do have a perf_event validation test suite that I maintain to help when debugging PAPI issues. http://web.eecs.utk.edu/~vweaver1/projects/perf-events/validation.html it's only about 12 tests right now, but feel free to use them if you'd like (they're GPL). I really don't have any interest in merging them into the perf tree though. Vince ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-24 15:10 ` Peter Zijlstra 2011-05-24 15:40 ` Ingo Molnar @ 2011-05-24 17:53 ` Vince Weaver 1 sibling, 0 replies; 25+ messages in thread From: Vince Weaver @ 2011-05-24 17:53 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, fbuihuu, mingo, paulus, acme On Tue, 24 May 2011, Peter Zijlstra wrote: > On Tue, 2011-05-24 at 11:04 -0400, Vince Weaver wrote: > > 2.6.37, 2.6.38, or 2.6.39 then it would be silly to do it just for > > 2.6.40. > > Oh, I assumed it was recent and .39/.40 would suffice. It turns out the problem was only introduced in PAPI in November ( I had assumed it dated back further). So maybe not as dire as it seemed at first. The fix to PAPI does seem to be trivial, though the whole set of bad code was introduced to work around a different problem so I need to verify the fix doesn't break other things. Thanks, Vince vweaver1@eecs.utk.edu ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-24 15:04 ` Vince Weaver 2011-05-24 15:10 ` Peter Zijlstra @ 2011-05-24 15:11 ` Peter Zijlstra 2011-05-24 15:18 ` Peter Zijlstra 1 sibling, 1 reply; 25+ messages in thread From: Peter Zijlstra @ 2011-05-24 15:11 UTC (permalink / raw) To: Vince Weaver; +Cc: linux-kernel, fbuihuu, mingo, paulus, acme On Tue, 2011-05-24 at 11:04 -0400, Vince Weaver wrote: > As an aside, I also wasted 6 hours last night finding out that you don't > get signaled on overflow if you don't have a ring-buffer mmap()ed, even > if you never read from the buffer and you only are interested in counting > the number of overflows. That sounds like something we could fix, let me investigate. > I should stop complaining though or you'll > start telling me to fix the documentation. Which maybe I would do if I > didn't spend all my time writing code to reproduce problems in the perf > ABI for bisection purposes. Both are appreciated. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-24 15:11 ` Peter Zijlstra @ 2011-05-24 15:18 ` Peter Zijlstra 2011-05-24 21:48 ` Vince Weaver 0 siblings, 1 reply; 25+ messages in thread From: Peter Zijlstra @ 2011-05-24 15:18 UTC (permalink / raw) To: Vince Weaver; +Cc: linux-kernel, fbuihuu, mingo, paulus, acme On Tue, 2011-05-24 at 17:11 +0200, Peter Zijlstra wrote: > On Tue, 2011-05-24 at 11:04 -0400, Vince Weaver wrote: > > As an aside, I also wasted 6 hours last night finding out that you don't > > get signaled on overflow if you don't have a ring-buffer mmap()ed, even > > if you never read from the buffer and you only are interested in counting > > the number of overflows. > > That sounds like something we could fix, let me investigate. Does the below cure this? --- kernel/events/core.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index c09767f..bd1ba5e 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5018,9 +5018,12 @@ static int __perf_event_overflow(struct perf_event *event, int nmi, event->pending_kill = POLL_HUP; if (nmi) { event->pending_disable = 1; + event->pending_wakeup = 1; irq_work_queue(&event->pending); - } else + } else { perf_event_disable(event); + perf_event_wakeup(event); + } } if (event->overflow_handler) ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-24 15:18 ` Peter Zijlstra @ 2011-05-24 21:48 ` Vince Weaver 0 siblings, 0 replies; 25+ messages in thread From: Vince Weaver @ 2011-05-24 21:48 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, fbuihuu, mingo, paulus, acme On Tue, 24 May 2011, Peter Zijlstra wrote: > On Tue, 2011-05-24 at 17:11 +0200, Peter Zijlstra wrote: > > On Tue, 2011-05-24 at 11:04 -0400, Vince Weaver wrote: > > > As an aside, I also wasted 6 hours last night finding out that you don't > > > get signaled on overflow if you don't have a ring-buffer mmap()ed, even > > > if you never read from the buffer and you only are interested in counting > > > the number of overflows. > > > > That sounds like something we could fix, let me investigate. > > Does the below cure this? > well that was an interesting interaction between debian-unstable and linus-git. Luckily a udev update seemed to help. Anyway, your patch did not fix the problem. I still don't get overflow signals if the fd doesn't have a ring-buffer mmap()ed to it. For a test case you can take my previous test case and comment out the mmap call. Since this test case is probably the only code in the world trying to do this, maybe it's not that important. I can test further patches, but my fastest build machine here at home takes 3 hours to build a kernel so there will be some latency in the response. Vince ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-24 10:30 ` Peter Zijlstra 2011-05-24 15:04 ` Vince Weaver @ 2011-05-28 3:38 ` Vince Weaver 2011-05-28 10:22 ` Peter Zijlstra 1 sibling, 1 reply; 25+ messages in thread From: Vince Weaver @ 2011-05-28 3:38 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, mingo, paulus, acme On Tue, 24 May 2011, Peter Zijlstra wrote: > > This is very clear abuse of the API and I'm not really inclined to fix > it, in fact, I'd be tempted to add an additional test to verify the > argument to REFRESH > 0. on that note (and while trying to document exactly what the ioctls do) it seems that a PERF_EVENT_IOC_REFRESH with an argument of anything higher than one does not work on kernels 2.6.36 and newer. The behavior acts as if 1 was passed, even if you pass in, say, 3. Is it worth bisecting this, or has this become the new official behavior since no one noticed until now? To reproduce you can grab my tests from here: http://web.eecs.utk.edu/~vweaver1/projects/perf-events/validation.html and run the ./validation/simple_overflow_leader test Vince ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-28 3:38 ` Vince Weaver @ 2011-05-28 10:22 ` Peter Zijlstra 2011-05-28 13:26 ` perf: definition of a "regression" Vince Weaver 2011-05-29 16:54 ` perf: regression with PERF_EVENT_IOC_REFRESH Vince Weaver 0 siblings, 2 replies; 25+ messages in thread From: Peter Zijlstra @ 2011-05-28 10:22 UTC (permalink / raw) To: Vince Weaver; +Cc: linux-kernel, mingo, paulus, acme On Fri, 2011-05-27 at 23:38 -0400, Vince Weaver wrote: > on that note (and while trying to document exactly what the ioctls do) it > seems that a PERF_EVENT_IOC_REFRESH with an argument of anything higher > than one does not work on kernels 2.6.36 and newer. The behavior acts > as if 1 was passed, even if you pass in, say, 3. Urgh, no that should definitely work. Thanks for the test-case, I'll work on that (probably not until Monday though, but who knows). ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: definition of a "regression" 2011-05-28 10:22 ` Peter Zijlstra @ 2011-05-28 13:26 ` Vince Weaver 2011-06-02 7:45 ` Ingo Molnar 2011-05-29 16:54 ` perf: regression with PERF_EVENT_IOC_REFRESH Vince Weaver 1 sibling, 1 reply; 25+ messages in thread From: Vince Weaver @ 2011-05-28 13:26 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, mingo, paulus, acme, torvalds On Sat, 28 May 2011, Peter Zijlstra wrote: > On Fri, 2011-05-27 at 23:38 -0400, Vince Weaver wrote: > > on that note (and while trying to document exactly what the ioctls do) it > > seems that a PERF_EVENT_IOC_REFRESH with an argument of anything higher > > than one does not work on kernels 2.6.36 and newer. The behavior acts > > as if 1 was passed, even if you pass in, say, 3. > > Urgh, no that should definitely work. Thanks for the test-case, I'll > work on that (probably not until Monday though, but who knows). So wait, the two regressions I found in 2.6.37 are WONTFIX because they are too old, even though they break existing userspace code? And this older regression in 2.6.36 is going to be fixed, even though perf, PAPI, and libpfm4 don't trigger the buggy functionality at all? I think it's time to redefine the PERF_EVENT_IOC_REFRESH ioctl to just refresh once (as that's what it actually does on 2.6.36 - 2.6.39) and if we need to refresh multiple we should add a new PERF_EVENT_IOC_REFRESH_COUNT ioctl. I know I am being difficult, but the perf-event ABI is a mess to program for in a backward compatible fashion. Vince ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: definition of a "regression" 2011-05-28 13:26 ` perf: definition of a "regression" Vince Weaver @ 2011-06-02 7:45 ` Ingo Molnar 0 siblings, 0 replies; 25+ messages in thread From: Ingo Molnar @ 2011-06-02 7:45 UTC (permalink / raw) To: Vince Weaver; +Cc: Peter Zijlstra, linux-kernel, paulus, acme, torvalds * Vince Weaver <vweaver1@eecs.utk.edu> wrote: > On Sat, 28 May 2011, Peter Zijlstra wrote: > > > On Fri, 2011-05-27 at 23:38 -0400, Vince Weaver wrote: > > > on that note (and while trying to document exactly what the ioctls do) it > > > seems that a PERF_EVENT_IOC_REFRESH with an argument of anything higher > > > than one does not work on kernels 2.6.36 and newer. The behavior acts > > > as if 1 was passed, even if you pass in, say, 3. > > > > Urgh, no that should definitely work. Thanks for the test-case, I'll > > work on that (probably not until Monday though, but who knows). > > So wait, the two regressions I found in 2.6.37 are WONTFIX because > they are too old, even though they break existing userspace code? > > And this older regression in 2.6.36 is going to be fixed, even > though perf, PAPI, and libpfm4 don't trigger the buggy > functionality at all? Btw., these considerations are flexible and we can reconsider and change the WONTFIX if there's a patch available and doesn't look horrible to backport. We can also mark fixes that havent been marked -stable originally as -stable later on, etc. So please don't feel needlessly bitter about past decisions: when there's some good technical solution to a problem (or we were plain out wrong about a decision) we try hard not to stand in the way. Thanks, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: regression with PERF_EVENT_IOC_REFRESH 2011-05-28 10:22 ` Peter Zijlstra 2011-05-28 13:26 ` perf: definition of a "regression" Vince Weaver @ 2011-05-29 16:54 ` Vince Weaver 2011-05-31 1:33 ` perf: [patch] " Vince Weaver 1 sibling, 1 reply; 25+ messages in thread From: Vince Weaver @ 2011-05-29 16:54 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, mingo, paulus, acme On Sat, 28 May 2011, Peter Zijlstra wrote: > On Fri, 2011-05-27 at 23:38 -0400, Vince Weaver wrote: > > on that note (and while trying to document exactly what the ioctls do) it > > seems that a PERF_EVENT_IOC_REFRESH with an argument of anything higher > > than one does not work on kernels 2.6.36 and newer. The behavior acts > > as if 1 was passed, even if you pass in, say, 3. > > Urgh, no that should definitely work. Thanks for the test-case, I'll > work on that (probably not until Monday though, but who knows). > after a painfully long bisection, it turns out that this problem was in theory introduced by the following commit: d57e34fdd60be7ffd0b1d86bfa1a553df86b7172 perf: Simplify the ring-buffer logic: make perf_buffer_alloc() do everything needed I'll see if I can come up with a patch, but it's a bit non-obvious why this commit is affecting the REFRESH value at all. Vince vweaver1@eecs.utk.edu ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: [patch] regression with PERF_EVENT_IOC_REFRESH 2011-05-29 16:54 ` perf: regression with PERF_EVENT_IOC_REFRESH Vince Weaver @ 2011-05-31 1:33 ` Vince Weaver 2011-05-31 7:17 ` Peter Zijlstra 2011-05-31 7:23 ` Peter Zijlstra 0 siblings, 2 replies; 25+ messages in thread From: Vince Weaver @ 2011-05-31 1:33 UTC (permalink / raw) To: Vince Weaver; +Cc: Peter Zijlstra, linux-kernel, mingo, paulus, acme On Sun, 29 May 2011, Vince Weaver wrote: > On Sat, 28 May 2011, Peter Zijlstra wrote: > > > On Fri, 2011-05-27 at 23:38 -0400, Vince Weaver wrote: > > > on that note (and while trying to document exactly what the ioctls do) it > > > seems that a PERF_EVENT_IOC_REFRESH with an argument of anything higher > > > than one does not work on kernels 2.6.36 and newer. The behavior acts > > > as if 1 was passed, even if you pass in, say, 3. > > > > Urgh, no that should definitely work. Thanks for the test-case, I'll > > work on that (probably not until Monday though, but who knows). > > > > after a painfully long bisection, it turns out that this problem was in > theory introduced by the following commit: > > d57e34fdd60be7ffd0b1d86bfa1a553df86b7172 > > perf: Simplify the ring-buffer logic: make perf_buffer_alloc() do everything needed > > I'll see if I can come up with a patch, but it's a bit non-obvious why > this commit is affecting the REFRESH value at all. the problem was the mentioned commit tried to optimize the use of watermark and wakeup_watermark without taking into account that wakeup_watermark is a union with wakeup_events. The patch below *should* fix it, but something unrelated has broken overflow support between 2.6.39 and 3.0-rc1 which I haven't had time to investigate. The overflow count is suddenly about 10x what it should be though. So the below is semi-untested and I possibly need to do another bisect. *sigh* Vince Signed-off-by: Vince Weaver <vweaver1@cl320.eecs.utk.edu> diff --git a/kernel/events/core.c b/kernel/events/core.c index d863b3c..e3ff283 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -3403,12 +3403,14 @@ unlock: static unsigned long perf_data_size(struct perf_buffer *buffer); static void -perf_buffer_init(struct perf_buffer *buffer, long watermark, int flags) +perf_buffer_init(struct perf_buffer *buffer, + long watermark, + long wakeup_watermark, int flags) { long max_size = perf_data_size(buffer); if (watermark) - buffer->watermark = min(max_size, watermark); + buffer->watermark = min(max_size, wakeup_watermark); if (!buffer->watermark) buffer->watermark = max_size / 2; @@ -3451,7 +3453,8 @@ static void *perf_mmap_alloc_page(int cpu) } static struct perf_buffer * -perf_buffer_alloc(int nr_pages, long watermark, int cpu, int flags) +perf_buffer_alloc(int nr_pages, long watermark, + long wakeup_watermark, int cpu, int flags) { struct perf_buffer *buffer; unsigned long size; @@ -3476,7 +3479,7 @@ perf_buffer_alloc(int nr_pages, long watermark, int cpu, int flags) buffer->nr_pages = nr_pages; - perf_buffer_init(buffer, watermark, flags); + perf_buffer_init(buffer, watermark, wakeup_watermark, flags); return buffer; @@ -3568,7 +3571,8 @@ static void perf_buffer_free(struct perf_buffer *buffer) } static struct perf_buffer * -perf_buffer_alloc(int nr_pages, long watermark, int cpu, int flags) +perf_buffer_alloc(int nr_pages, long watermark, + long wakeup_watermark, int cpu, int flags) { struct perf_buffer *buffer; unsigned long size; @@ -3592,7 +3596,7 @@ perf_buffer_alloc(int nr_pages, long watermark, int cpu, int flags) buffer->page_order = ilog2(nr_pages); buffer->nr_pages = 1; - perf_buffer_init(buffer, watermark, flags); + perf_buffer_init(buffer, watermark, wakeup_watermark, flags); return buffer; @@ -3787,7 +3791,9 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma) if (vma->vm_flags & VM_WRITE) flags |= PERF_BUFFER_WRITABLE; - buffer = perf_buffer_alloc(nr_pages, event->attr.wakeup_watermark, + buffer = perf_buffer_alloc(nr_pages, + event->attr.watermark, + event->attr.wakeup_watermark, event->cpu, flags); if (!buffer) { ret = -ENOMEM; ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: perf: [patch] regression with PERF_EVENT_IOC_REFRESH 2011-05-31 1:33 ` perf: [patch] " Vince Weaver @ 2011-05-31 7:17 ` Peter Zijlstra 2011-05-31 7:23 ` Peter Zijlstra 1 sibling, 0 replies; 25+ messages in thread From: Peter Zijlstra @ 2011-05-31 7:17 UTC (permalink / raw) To: Vince Weaver; +Cc: Vince Weaver, linux-kernel, mingo, paulus, acme On Mon, 2011-05-30 at 21:33 -0400, Vince Weaver wrote: > On Sun, 29 May 2011, Vince Weaver wrote: > > > On Sat, 28 May 2011, Peter Zijlstra wrote: > > > > > On Fri, 2011-05-27 at 23:38 -0400, Vince Weaver wrote: > > > > on that note (and while trying to document exactly what the ioctls do) it > > > > seems that a PERF_EVENT_IOC_REFRESH with an argument of anything higher > > > > than one does not work on kernels 2.6.36 and newer. The behavior acts > > > > as if 1 was passed, even if you pass in, say, 3. > > > > > > Urgh, no that should definitely work. Thanks for the test-case, I'll > > > work on that (probably not until Monday though, but who knows). > > > > > > > after a painfully long bisection, it turns out that this problem was in > > theory introduced by the following commit: > > > > d57e34fdd60be7ffd0b1d86bfa1a553df86b7172 > > > > perf: Simplify the ring-buffer logic: make perf_buffer_alloc() do everything needed > > > > I'll see if I can come up with a patch, but it's a bit non-obvious why > > this commit is affecting the REFRESH value at all. > > the problem was the mentioned commit tried to optimize the use of > watermark and wakeup_watermark without taking into account that > wakeup_watermark is a union with wakeup_events. > > The patch below *should* fix it, Awesome thanks! > but something unrelated has broken > overflow support between 2.6.39 and 3.0-rc1 which I haven't had time to > investigate. The overflow count is suddenly about 10x what it should be > though. So the below is semi-untested and I possibly need to do another > bisect. *sigh* Yeah, I noticed, I was hunting that as well.. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: [patch] regression with PERF_EVENT_IOC_REFRESH 2011-05-31 1:33 ` perf: [patch] " Vince Weaver 2011-05-31 7:17 ` Peter Zijlstra @ 2011-05-31 7:23 ` Peter Zijlstra 2011-05-31 13:49 ` Vince Weaver 1 sibling, 1 reply; 25+ messages in thread From: Peter Zijlstra @ 2011-05-31 7:23 UTC (permalink / raw) To: Vince Weaver; +Cc: Vince Weaver, linux-kernel, mingo, paulus, acme On Mon, 2011-05-30 at 21:33 -0400, Vince Weaver wrote: > the problem was the mentioned commit tried to optimize the use of > watermark and wakeup_watermark without taking into account that > wakeup_watermark is a union with wakeup_events. Note that wake_events isn't related to IOC_REFRESH, wake_events is how much events to buffer in the mmap-buffer before issuing a wakeup. IOC_REFRESH increments event_limit, which is how many events to run before disabling yourself. What I gather is that due to that SIGIO bug (fixed by f506b3dc0e), you had to have both an mmap and a wakeup in order for that signal to arrive. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: [patch] regression with PERF_EVENT_IOC_REFRESH 2011-05-31 7:23 ` Peter Zijlstra @ 2011-05-31 13:49 ` Vince Weaver 2011-05-31 15:52 ` Peter Zijlstra 0 siblings, 1 reply; 25+ messages in thread From: Vince Weaver @ 2011-05-31 13:49 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Vince Weaver, linux-kernel, mingo, paulus, acme On Tue, 31 May 2011, Peter Zijlstra wrote: > On Mon, 2011-05-30 at 21:33 -0400, Vince Weaver wrote: > > the problem was the mentioned commit tried to optimize the use of > > watermark and wakeup_watermark without taking into account that > > wakeup_watermark is a union with wakeup_events. > > Note that wake_events isn't related to IOC_REFRESH, wake_events is how > much events to buffer in the mmap-buffer before issuing a wakeup. > > IOC_REFRESH increments event_limit, which is how many events to run > before disabling yourself. > > What I gather is that due to that SIGIO bug (fixed by f506b3dc0e), you > had to have both an mmap and a wakeup in order for that signal to > arrive. yes, but due to a bug in the mentioned changeset, the buffer watermark value was being set to a low value even if *watermark* was 0. So if you were using IOC_REFRESH to set the *wakeup_events* value, it was also setting the *wakeup_watermark* value (it's a union) and the buffer setup was then unconditionally setting the buffer watermark to the value of the supposedly unrelated *wakeup_watermark*. Normally the wakeup watermark would default to something like 2048, but if you were trying to set the wakeup_events value to something like 3 then wakeup_watermark would be set to that too, causing a lot more overflow events. I verified all the above painfully using a lot of printks. I agree this does seem to be a combination of bugs, as even with a properlyu set value on affected kernels you'd get spurious watermark overflow events if you weren't consuming the ring buffer. In any case, I can provide a cleaner patch than the one before that isn't as intrusive. I'm also bisecting the other problem I mentioned, the one where overflows are 10x too large on 3.0-rc1. I'm at work with a Nehalem machine so the bisect should go faster than the bisect I had to do on an atom machine this weekend. A power outage over the weekend has taken part of the network down here though so my e-mail access is a bit limited, so I apologize if I've been missing comments sent to my other e-mail address. Vince ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: [patch] regression with PERF_EVENT_IOC_REFRESH 2011-05-31 13:49 ` Vince Weaver @ 2011-05-31 15:52 ` Peter Zijlstra 2011-05-31 16:39 ` Vince Weaver 0 siblings, 1 reply; 25+ messages in thread From: Peter Zijlstra @ 2011-05-31 15:52 UTC (permalink / raw) To: Vince Weaver; +Cc: Vince Weaver, linux-kernel, mingo, paulus, acme On Tue, 2011-05-31 at 09:49 -0400, Vince Weaver wrote: > On Tue, 31 May 2011, Peter Zijlstra wrote: > > > On Mon, 2011-05-30 at 21:33 -0400, Vince Weaver wrote: > > > the problem was the mentioned commit tried to optimize the use of > > > watermark and wakeup_watermark without taking into account that > > > wakeup_watermark is a union with wakeup_events. > > > > Note that wake_events isn't related to IOC_REFRESH, wake_events is how > > much events to buffer in the mmap-buffer before issuing a wakeup. > > > > IOC_REFRESH increments event_limit, which is how many events to run > > before disabling yourself. > > > > What I gather is that due to that SIGIO bug (fixed by f506b3dc0e), you > > had to have both an mmap and a wakeup in order for that signal to > > arrive. > > yes, but due to a bug in the mentioned changeset, the buffer watermark > value was being set to a low value even if *watermark* was 0. So if you > were using IOC_REFRESH to set the *wakeup_events* value, IOC_REFRESH sets event->event_limit, not wakeup_events. > it was also > setting the *wakeup_watermark* value (it's a union) and the buffer setup > was then unconditionally setting the buffer watermark to the value of the > supposedly unrelated *wakeup_watermark*. Normally the wakeup watermark > would default to something like 2048, but if you were trying to set the > wakeup_events value to something like 3 then wakeup_watermark would be set > to that too, causing a lot more overflow events. poll() wakeups, which were inadvertly linked to SIGIOs > I verified all the above painfully using a lot of printks. I prefer to use trace_printk() and /debug/tracing/, that doesn't slow stuff down as much. > I agree this does seem to be a combination of bugs, as even with a > properlyu set value on affected kernels you'd get spurious watermark > overflow events if you weren't consuming the ring buffer. *nod* > In any case, I can provide a cleaner patch than the one before that isn't > as intrusive. Appreciated. > I'm also bisecting the other problem I mentioned, the one where overflows > are 10x too large on 3.0-rc1. I'm at work with a Nehalem machine so the > bisect should go faster than the bisect I had to do on an atom machine > this weekend. It wouldn't be the SIGIO fix would it?, with that every overflow generates a SIGIO, not only the poll() wakeups. And ouch at bisecting (or even building a kernel) on an Atom, those things are horridly slow. > A power outage over the weekend has taken part of the > network down here though so my e-mail access is a bit limited, so I > apologize if I've been missing comments sent to my other e-mail address. I'm afraid not, I've been mostly tied up with fixing some scheduler regressions. Also, it looks like I just broke stuff even worse in -tip, am bisecting that now. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: perf: [patch] regression with PERF_EVENT_IOC_REFRESH 2011-05-31 15:52 ` Peter Zijlstra @ 2011-05-31 16:39 ` Vince Weaver 0 siblings, 0 replies; 25+ messages in thread From: Vince Weaver @ 2011-05-31 16:39 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Vince Weaver, linux-kernel, mingo, paulus, acme On Tue, 31 May 2011, Peter Zijlstra wrote: > > IOC_REFRESH sets event->event_limit, not wakeup_events. ahh, yes. So it's a userspace "bug". The test code called a "IOC_REFRESH, 3" whenever it got signalled. It didn't distinguish between whether it was a plain overflow or if it was a ring-buffer overflow (can it distinguish?). Thus the watermark bug was confusing the user-space code into refreshing when it was not necessary. > > I'm also bisecting the other problem I mentioned, the one where overflows > > are 10x too large on 3.0-rc1. I'm at work with a Nehalem machine so the > > bisect should go faster than the bisect I had to do on an atom machine > > this weekend. > > It wouldn't be the SIGIO fix would it?, with that every overflow > generates a SIGIO, not only the poll() wakeups. And ouch at bisecting > (or even building a kernel) on an Atom, those things are horridly slow. Oh, it could be the SIGIO fix. I hadn't realized that got merged already. And yes, bisect on atom is painful, but my alternatives were a 1.7GHz P4 (with small disk, so would have been over NFS), a 600MHz G3 iBook, or an armv6 machine. So atom it was. Vince vweaver1@eecs.utk.edu ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2011-06-02 7:45 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-05-23 20:04 perf: regression with PERF_EVENT_IOC_REFRESH Vince Weaver 2011-05-23 20:22 ` Peter Zijlstra 2011-05-24 6:20 ` Vince Weaver 2011-05-24 10:30 ` Peter Zijlstra 2011-05-24 15:04 ` Vince Weaver 2011-05-24 15:10 ` Peter Zijlstra 2011-05-24 15:40 ` Ingo Molnar 2011-05-24 20:31 ` Vince Weaver 2011-05-25 10:39 ` Ingo Molnar 2011-05-25 21:24 ` Vince Weaver 2011-05-24 17:53 ` Vince Weaver 2011-05-24 15:11 ` Peter Zijlstra 2011-05-24 15:18 ` Peter Zijlstra 2011-05-24 21:48 ` Vince Weaver 2011-05-28 3:38 ` Vince Weaver 2011-05-28 10:22 ` Peter Zijlstra 2011-05-28 13:26 ` perf: definition of a "regression" Vince Weaver 2011-06-02 7:45 ` Ingo Molnar 2011-05-29 16:54 ` perf: regression with PERF_EVENT_IOC_REFRESH Vince Weaver 2011-05-31 1:33 ` perf: [patch] " Vince Weaver 2011-05-31 7:17 ` Peter Zijlstra 2011-05-31 7:23 ` Peter Zijlstra 2011-05-31 13:49 ` Vince Weaver 2011-05-31 15:52 ` Peter Zijlstra 2011-05-31 16:39 ` Vince Weaver
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox