* [PATCH] perf_events: fix time tracking in samples @ 2010-10-19 16:47 Stephane Eranian 2010-10-19 16:52 ` Peter Zijlstra 0 siblings, 1 reply; 9+ messages in thread From: Stephane Eranian @ 2010-10-19 16:47 UTC (permalink / raw) To: linux-kernel Cc: peterz, mingo, paulus, davem, fweisbec, perfmon2-devel, eranian, eranian, robert.richter This patch corrects time tracking in samples. Without this patch both time_enabled and time_running may be reported as zero when user asks for PERF_SAMPLE_READ. You use PERF_SAMPLE_READ when you want to sample the values of other counters in each sample. Because of multiplexing, it is necessary to know both time_enable, time_running to be able to scale counts correctly. We defer updating timing until we know it is really needed, i.e., only when we have PERF_SAMPLE_READ. With this patch, the libpfm4 example task_smpl now reports correct counts (shown on 2.4GHz Core 2): $ task_smpl -p 2400000000 -e unhalted_core_cycles:u,instructions_retired:u,baclears noploop 5 noploop for 5 seconds IIP:0x000000004006d6 PID:5596 TID:5596 TIME:466,210,211,430 STREAM_ID:33 PERIOD:2,400,000,000 ENA=1,010,157,814 RUN=1,010,157,814 NR=3 2,400,000,254 unhalted_core_cycles:u (33) 2,399,273,744 instructions_retired:u (34) 53,340 baclears (35) Signed-off-by: Stephane Eranian <eranian@google.com> --- diff --git a/kernel/perf_event.c b/kernel/perf_event.c index f309e80..04611dd 100644 --- a/kernel/perf_event.c +++ b/kernel/perf_event.c @@ -3494,6 +3494,9 @@ static void perf_output_read_group(struct perf_output_handle *handle, static void perf_output_read(struct perf_output_handle *handle, struct perf_event *event) { + update_context_time(event->ctx); + update_event_times(event); + if (event->attr.read_format & PERF_FORMAT_GROUP) perf_output_read_group(handle, event); else ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] perf_events: fix time tracking in samples 2010-10-19 16:47 [PATCH] perf_events: fix time tracking in samples Stephane Eranian @ 2010-10-19 16:52 ` Peter Zijlstra 2010-10-19 17:01 ` Stephane Eranian 0 siblings, 1 reply; 9+ messages in thread From: Peter Zijlstra @ 2010-10-19 16:52 UTC (permalink / raw) To: eranian Cc: linux-kernel, mingo, paulus, davem, fweisbec, perfmon2-devel, eranian, robert.richter On Tue, 2010-10-19 at 18:47 +0200, Stephane Eranian wrote: > This patch corrects time tracking in samples. Without this patch > both time_enabled and time_running may be reported as zero when > user asks for PERF_SAMPLE_READ. > > You use PERF_SAMPLE_READ when you want to sample the values of > other counters in each sample. Because of multiplexing, it is > necessary to know both time_enable, time_running to be able > to scale counts correctly. > > We defer updating timing until we know it is really needed, i.e., > only when we have PERF_SAMPLE_READ. > > With this patch, the libpfm4 example task_smpl now reports > correct counts (shown on 2.4GHz Core 2): > > $ task_smpl -p 2400000000 -e unhalted_core_cycles:u,instructions_retired:u,baclears noploop 5 > noploop for 5 seconds > IIP:0x000000004006d6 PID:5596 TID:5596 TIME:466,210,211,430 STREAM_ID:33 PERIOD:2,400,000,000 ENA=1,010,157,814 RUN=1,010,157,814 NR=3 > 2,400,000,254 unhalted_core_cycles:u (33) > 2,399,273,744 instructions_retired:u (34) > 53,340 baclears (35) > > Signed-off-by: Stephane Eranian <eranian@google.com> > > --- > > diff --git a/kernel/perf_event.c b/kernel/perf_event.c > index f309e80..04611dd 100644 > --- a/kernel/perf_event.c > +++ b/kernel/perf_event.c > @@ -3494,6 +3494,9 @@ static void perf_output_read_group(struct perf_output_handle *handle, > static void perf_output_read(struct perf_output_handle *handle, > struct perf_event *event) > { > + update_context_time(event->ctx); > + update_event_times(event); > + > if (event->attr.read_format & PERF_FORMAT_GROUP) > perf_output_read_group(handle, event); > else Right, except that this can actually corrupt the time measurements... :/ Usually context times are updated under ctx->lock, and this is called from NMI context, which can interrupt ctx->lock.. I was thinking about updating a local copy of the times, in that case you can only get funny times from samples, but it won't corrupt the actual running data. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] perf_events: fix time tracking in samples 2010-10-19 16:52 ` Peter Zijlstra @ 2010-10-19 17:01 ` Stephane Eranian 2010-10-19 17:09 ` Peter Zijlstra 0 siblings, 1 reply; 9+ messages in thread From: Stephane Eranian @ 2010-10-19 17:01 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, mingo, paulus, davem, fweisbec, perfmon2-devel, eranian, robert.richter On Tue, Oct 19, 2010 at 6:52 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, 2010-10-19 at 18:47 +0200, Stephane Eranian wrote: >> This patch corrects time tracking in samples. Without this patch >> both time_enabled and time_running may be reported as zero when >> user asks for PERF_SAMPLE_READ. >> >> You use PERF_SAMPLE_READ when you want to sample the values of >> other counters in each sample. Because of multiplexing, it is >> necessary to know both time_enable, time_running to be able >> to scale counts correctly. >> >> We defer updating timing until we know it is really needed, i.e., >> only when we have PERF_SAMPLE_READ. >> >> With this patch, the libpfm4 example task_smpl now reports >> correct counts (shown on 2.4GHz Core 2): >> >> $ task_smpl -p 2400000000 -e unhalted_core_cycles:u,instructions_retired:u,baclears noploop 5 >> noploop for 5 seconds >> IIP:0x000000004006d6 PID:5596 TID:5596 TIME:466,210,211,430 STREAM_ID:33 PERIOD:2,400,000,000 ENA=1,010,157,814 RUN=1,010,157,814 NR=3 >> 2,400,000,254 unhalted_core_cycles:u (33) >> 2,399,273,744 instructions_retired:u (34) >> 53,340 baclears (35) >> >> Signed-off-by: Stephane Eranian <eranian@google.com> >> >> --- >> >> diff --git a/kernel/perf_event.c b/kernel/perf_event.c >> index f309e80..04611dd 100644 >> --- a/kernel/perf_event.c >> +++ b/kernel/perf_event.c >> @@ -3494,6 +3494,9 @@ static void perf_output_read_group(struct perf_output_handle *handle, >> static void perf_output_read(struct perf_output_handle *handle, >> struct perf_event *event) >> { >> + update_context_time(event->ctx); >> + update_event_times(event); >> + >> if (event->attr.read_format & PERF_FORMAT_GROUP) >> perf_output_read_group(handle, event); >> else > > > Right, except that this can actually corrupt the time measurements... :/ > > Usually context times are updated under ctx->lock, and this is called > from NMI context, which can interrupt ctx->lock.. > Ok, I missed that. But I don't understand why you need the lock to udpate the time. The lower-level clock is lockless if I recall. Can't you use an atomic ops in update_context_time()? > I was thinking about updating a local copy of the times, in that case > you can only get funny times from samples, but it won't corrupt the > actual running data. > You want time to be correct in every sample How would you detect bogus timing? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] perf_events: fix time tracking in samples 2010-10-19 17:01 ` Stephane Eranian @ 2010-10-19 17:09 ` Peter Zijlstra 2010-10-19 19:03 ` Stephane Eranian 0 siblings, 1 reply; 9+ messages in thread From: Peter Zijlstra @ 2010-10-19 17:09 UTC (permalink / raw) To: Stephane Eranian Cc: linux-kernel, mingo, paulus, davem, fweisbec, perfmon2-devel, eranian, robert.richter On Tue, 2010-10-19 at 19:01 +0200, Stephane Eranian wrote: > On Tue, Oct 19, 2010 at 6:52 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Tue, 2010-10-19 at 18:47 +0200, Stephane Eranian wrote: > >> This patch corrects time tracking in samples. Without this patch > >> both time_enabled and time_running may be reported as zero when > >> user asks for PERF_SAMPLE_READ. > >> > >> You use PERF_SAMPLE_READ when you want to sample the values of > >> other counters in each sample. Because of multiplexing, it is > >> necessary to know both time_enable, time_running to be able > >> to scale counts correctly. > >> > >> We defer updating timing until we know it is really needed, i.e., > >> only when we have PERF_SAMPLE_READ. > >> > >> With this patch, the libpfm4 example task_smpl now reports > >> correct counts (shown on 2.4GHz Core 2): > >> > >> $ task_smpl -p 2400000000 -e unhalted_core_cycles:u,instructions_retired:u,baclears noploop 5 > >> noploop for 5 seconds > >> IIP:0x000000004006d6 PID:5596 TID:5596 TIME:466,210,211,430 STREAM_ID:33 PERIOD:2,400,000,000 ENA=1,010,157,814 RUN=1,010,157,814 NR=3 > >> 2,400,000,254 unhalted_core_cycles:u (33) > >> 2,399,273,744 instructions_retired:u (34) > >> 53,340 baclears (35) > >> > >> Signed-off-by: Stephane Eranian <eranian@google.com> > >> > >> --- > >> > >> diff --git a/kernel/perf_event.c b/kernel/perf_event.c > >> index f309e80..04611dd 100644 > >> --- a/kernel/perf_event.c > >> +++ b/kernel/perf_event.c > >> @@ -3494,6 +3494,9 @@ static void perf_output_read_group(struct perf_output_handle *handle, > >> static void perf_output_read(struct perf_output_handle *handle, > >> struct perf_event *event) > >> { > >> + update_context_time(event->ctx); > >> + update_event_times(event); > >> + > >> if (event->attr.read_format & PERF_FORMAT_GROUP) > >> perf_output_read_group(handle, event); > >> else > > > > > > Right, except that this can actually corrupt the time measurements... :/ > > > > Usually context times are updated under ctx->lock, and this is called > > from NMI context, which can interrupt ctx->lock.. > > > Ok, I missed that. But I don't understand why you need the lock to > udpate the time. The lower-level clock is lockless if I recall. Can't you > use an atomic ops in update_context_time()? atomic ops would slow down those code paths, also, I don't think you can fully get the ordering between ->tstamp_$foo and ->total_time_$foo just right. > > I was thinking about updating a local copy of the times, in that case > > you can only get funny times from samples, but it won't corrupt the > > actual running data. > > > You want time to be correct in every sample How would you detect > bogus timing? Not sure, but barring 64bit atomics for all these, 32bit archs and NMI are going to be 'interesting' ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] perf_events: fix time tracking in samples 2010-10-19 17:09 ` Peter Zijlstra @ 2010-10-19 19:03 ` Stephane Eranian 2010-10-20 11:00 ` Peter Zijlstra 2010-10-20 11:13 ` Peter Zijlstra 0 siblings, 2 replies; 9+ messages in thread From: Stephane Eranian @ 2010-10-19 19:03 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, mingo, paulus, davem, fweisbec, perfmon2-devel, eranian, robert.richter On Tue, Oct 19, 2010 at 7:09 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, 2010-10-19 at 19:01 +0200, Stephane Eranian wrote: >> On Tue, Oct 19, 2010 at 6:52 PM, Peter Zijlstra <peterz@infradead.org> wrote: >> > On Tue, 2010-10-19 at 18:47 +0200, Stephane Eranian wrote: >> >> This patch corrects time tracking in samples. Without this patch >> >> both time_enabled and time_running may be reported as zero when >> >> user asks for PERF_SAMPLE_READ. >> >> >> >> You use PERF_SAMPLE_READ when you want to sample the values of >> >> other counters in each sample. Because of multiplexing, it is >> >> necessary to know both time_enable, time_running to be able >> >> to scale counts correctly. >> >> >> >> We defer updating timing until we know it is really needed, i.e., >> >> only when we have PERF_SAMPLE_READ. >> >> >> >> With this patch, the libpfm4 example task_smpl now reports >> >> correct counts (shown on 2.4GHz Core 2): >> >> >> >> $ task_smpl -p 2400000000 -e unhalted_core_cycles:u,instructions_retired:u,baclears noploop 5 >> >> noploop for 5 seconds >> >> IIP:0x000000004006d6 PID:5596 TID:5596 TIME:466,210,211,430 STREAM_ID:33 PERIOD:2,400,000,000 ENA=1,010,157,814 RUN=1,010,157,814 NR=3 >> >> 2,400,000,254 unhalted_core_cycles:u (33) >> >> 2,399,273,744 instructions_retired:u (34) >> >> 53,340 baclears (35) >> >> >> >> Signed-off-by: Stephane Eranian <eranian@google.com> >> >> >> >> --- >> >> >> >> diff --git a/kernel/perf_event.c b/kernel/perf_event.c >> >> index f309e80..04611dd 100644 >> >> --- a/kernel/perf_event.c >> >> +++ b/kernel/perf_event.c >> >> @@ -3494,6 +3494,9 @@ static void perf_output_read_group(struct perf_output_handle *handle, >> >> static void perf_output_read(struct perf_output_handle *handle, >> >> struct perf_event *event) >> >> { >> >> + update_context_time(event->ctx); >> >> + update_event_times(event); >> >> + >> >> if (event->attr.read_format & PERF_FORMAT_GROUP) >> >> perf_output_read_group(handle, event); >> >> else >> > >> > >> > Right, except that this can actually corrupt the time measurements... :/ >> > >> > Usually context times are updated under ctx->lock, and this is called >> > from NMI context, which can interrupt ctx->lock.. >> > >> Ok, I missed that. But I don't understand why you need the lock to >> udpate the time. The lower-level clock is lockless if I recall. Can't you >> use an atomic ops in update_context_time()? > > atomic ops would slow down those code paths, also, I don't think you can > fully get the ordering between ->tstamp_$foo and ->total_time_$foo just > right. > I don't get that. Could you give an example? >> > I was thinking about updating a local copy of the times, in that case >> > you can only get funny times from samples, but it won't corrupt the >> > actual running data. >> > >> You want time to be correct in every sample How would you detect >> bogus timing? > > Not sure, but barring 64bit atomics for all these, 32bit archs and NMI > are going to be 'interesting' > Every sample needs to be correct, otherwise you run the risk of introducing bias. I think if the tradeoffs is correctness vs. speed, I'd choose correctness. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] perf_events: fix time tracking in samples 2010-10-19 19:03 ` Stephane Eranian @ 2010-10-20 11:00 ` Peter Zijlstra 2010-10-20 11:13 ` Peter Zijlstra 1 sibling, 0 replies; 9+ messages in thread From: Peter Zijlstra @ 2010-10-20 11:00 UTC (permalink / raw) To: Stephane Eranian Cc: linux-kernel, mingo, paulus, davem, fweisbec, perfmon2-devel, eranian, robert.richter On Tue, 2010-10-19 at 21:03 +0200, Stephane Eranian wrote: > >> Ok, I missed that. But I don't understand why you need the lock to > >> udpate the time. The lower-level clock is lockless if I recall. Can't you > >> use an atomic ops in update_context_time()? > > > > atomic ops would slow down those code paths, also, I don't think you can > > fully get the ordering between ->tstamp_$foo and ->total_time_$foo just > > right. > > > > I don't get that. Could you give an example? Take update_context_time(), it has: now = perf_clock(); ctx->time += now - ctx->timestamp; ctx->timestamp = now; If you interleave two of those you get: ctx->timestamp = T0; now = perf_clock(); /* T1 */ ctx->time += now - ctx->timestamp; now = perf_clock(); /* T2 */ ctx->time += now - ctx->timestamp; ctx->timestamp = now; ctx->timestamp = now; So at this point you would expect timestamp = T2 and time += T2-T0. Except that: time += T1 - T0 + T2 - T0 != T2 - T0 and timestamp = T1 You can of course write it as something like x86_perf_event_update(), but then there's trying to keep total_time_running and total_time_enabled in sync. > > Not sure, but barring 64bit atomics for all these, 32bit archs and NMI > > are going to be 'interesting' > > > > Every sample needs to be correct, otherwise you run the risk of introducing > bias. > > I think if the tradeoffs is correctness vs. speed, I'd choose correctness. Well, yes, but it sucks, esp. since its only relevant for PERF_SAMPLE_READ. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] perf_events: fix time tracking in samples 2010-10-19 19:03 ` Stephane Eranian 2010-10-20 11:00 ` Peter Zijlstra @ 2010-10-20 11:13 ` Peter Zijlstra 2010-10-20 12:42 ` Stephane Eranian 1 sibling, 1 reply; 9+ messages in thread From: Peter Zijlstra @ 2010-10-20 11:13 UTC (permalink / raw) To: Stephane Eranian Cc: linux-kernel, mingo, paulus, davem, fweisbec, perfmon2-devel, eranian, robert.richter On Tue, 2010-10-19 at 21:03 +0200, Stephane Eranian wrote: > >> Ok, I missed that. But I don't understand why you need the lock to > >> udpate the time. The lower-level clock is lockless if I recall. Can't you > >> use an atomic ops in update_context_time()? > > > > atomic ops would slow down those code paths, also, I don't think you can > > fully get the ordering between ->tstamp_$foo and ->total_time_$foo just > > right. > > > > I don't get that. Could you give an example? Take update_context_time(), it has: now = perf_clock(); ctx->time += now - ctx->timestamp; ctx->timestamp = now; If you interleave two of those you get: ctx->timestamp = T0; now = perf_clock(); /* T1 */ ctx->time += now - ctx->timestamp; now = perf_clock(); /* T2 */ ctx->time += now - ctx->timestamp; ctx->timestamp = now; ctx->timestamp = now; So at this point you would expect timestamp = T2 and time += T2-T0. Except that: time += T1 - T0 + T2 - T0 != T2 - T0 and timestamp = T1 You can of course write it as something like x86_perf_event_update(), but then there's trying to keep total_time_running and total_time_enabled in sync. > > Not sure, but barring 64bit atomics for all these, 32bit archs and NMI > > are going to be 'interesting' > > > > Every sample needs to be correct, otherwise you run the risk of introducing > bias. > > I think if the tradeoffs is correctness vs. speed, I'd choose correctness. Well, yes, but it sucks, esp. since its only relevant to PERF_SAMPLE_READ. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] perf_events: fix time tracking in samples 2010-10-20 11:13 ` Peter Zijlstra @ 2010-10-20 12:42 ` Stephane Eranian 2010-10-20 13:04 ` Peter Zijlstra 0 siblings, 1 reply; 9+ messages in thread From: Stephane Eranian @ 2010-10-20 12:42 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, mingo, paulus, davem, fweisbec, perfmon2-devel, eranian, robert.richter On Wed, Oct 20, 2010 at 1:13 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, 2010-10-19 at 21:03 +0200, Stephane Eranian wrote: > >> >> Ok, I missed that. But I don't understand why you need the lock to >> >> udpate the time. The lower-level clock is lockless if I recall. Can't you >> >> use an atomic ops in update_context_time()? >> > >> > atomic ops would slow down those code paths, also, I don't think you can >> > fully get the ordering between ->tstamp_$foo and ->total_time_$foo just >> > right. >> > >> >> I don't get that. Could you give an example? > > Take update_context_time(), it has: > > now = perf_clock(); > ctx->time += now - ctx->timestamp; > ctx->timestamp = now; > > If you interleave two of those you get: > > ctx->timestamp = T0; > > > now = perf_clock(); /* T1 */ > ctx->time += now - ctx->timestamp; > now = perf_clock(); /* T2 */ > ctx->time += now - ctx->timestamp; > ctx->timestamp = now; > ctx->timestamp = now; > > > So at this point you would expect timestamp = T2 and time += T2-T0. > > Except that: time += T1 - T0 + T2 - T0 != T2 - T0 and > timestamp = T1 > > You can of course write it as something like x86_perf_event_update(), > but then there's trying to keep total_time_running and > total_time_enabled in sync. > I think it is more complicated than x86_perf_event_update() because, this time, you have to update 2 fields, i.e., ctx->time, ctx->timestamp. The difficulty is to backtrack after you've successfully and atomically set the first one. You cannot just subtract/replace what you've changed because it was visible by whoever interrupted you. So it is already too late. It may be better to try another approach just for PERF_SAMPLE_READ with its own version of ctx->time. What about if on event_sched_in() you were snapshotting ctx->time. Then in the perf_output_read_event(), you'd have to compute the time delta and add it to this private version of ctx->time and store that in the sample. > >> > Not sure, but barring 64bit atomics for all these, 32bit archs and NMI >> > are going to be 'interesting' >> > >> >> Every sample needs to be correct, otherwise you run the risk of introducing >> bias. >> >> I think if the tradeoffs is correctness vs. speed, I'd choose correctness. > > Well, yes, but it sucks, esp. since its only relevant to > PERF_SAMPLE_READ. > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] perf_events: fix time tracking in samples 2010-10-20 12:42 ` Stephane Eranian @ 2010-10-20 13:04 ` Peter Zijlstra 0 siblings, 0 replies; 9+ messages in thread From: Peter Zijlstra @ 2010-10-20 13:04 UTC (permalink / raw) To: Stephane Eranian Cc: linux-kernel, mingo, paulus, davem, fweisbec, perfmon2-devel, eranian, robert.richter On Wed, 2010-10-20 at 14:42 +0200, Stephane Eranian wrote: > It may be better to try another approach just for PERF_SAMPLE_READ > with its own version of ctx->time. What about if on event_sched_in() you > were snapshotting ctx->time. Then in the perf_output_read_event(), you'd > have to compute the time delta and add it to this private version of > ctx->time and store that in the sample. Right, that might work. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-10-20 13:05 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-10-19 16:47 [PATCH] perf_events: fix time tracking in samples Stephane Eranian 2010-10-19 16:52 ` Peter Zijlstra 2010-10-19 17:01 ` Stephane Eranian 2010-10-19 17:09 ` Peter Zijlstra 2010-10-19 19:03 ` Stephane Eranian 2010-10-20 11:00 ` Peter Zijlstra 2010-10-20 11:13 ` Peter Zijlstra 2010-10-20 12:42 ` Stephane Eranian 2010-10-20 13:04 ` Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox