* [PATCH] perf: fix perf_event_context->time
@ 2023-02-28 19:21 Song Liu
2023-03-01 22:29 ` Namhyung Kim
0 siblings, 1 reply; 5+ messages in thread
From: Song Liu @ 2023-02-28 19:21 UTC (permalink / raw)
To: linux-kernel; +Cc: kernel-team, Song Liu, Peter Zijlstra, Namhyung Kim
Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
accurate time_enabled and time_running for an event. The difference between
ctx->timestamp and ctx->time is the among of time when the context is not
enabled. For cpuctx.ctx, time and timestamp should stay the same, however,
it is not the case at the moment. To show this with drgn [1]:
drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile)
For help, type help(drgn).
...
>>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx
>>> ctx.timestamp * 1.0 / ctx.time
(double)1.0385869134111765
>>>
ctx->timestamp is about 4% larger than ctx.time. This issue causes time
read by perf_event_read_local() goes back in some cases.
Fix this by updating the condition for __update_context_time(ctx, false).
Specifically, it should only be called when we enable EVENT_TIME for the
ctx.
[1] drgn: https://github.com/osandov/drgn
Fixes: 09f5e7dc7ad7 ("perf: Fix perf_event_read_local() time")
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
---
kernel/events/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 380476a934e8..67478f43e26e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3872,7 +3872,7 @@ ctx_sched_in(struct perf_event_context *ctx, enum event_type_t event_type)
if (likely(!ctx->nr_events))
return;
- if (is_active ^ EVENT_TIME) {
+ if (!(is_active & EVENT_TIME)) {
/* start ctx time */
__update_context_time(ctx, false);
perf_cgroup_set_timestamp(cpuctx);
--
2.30.2
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] perf: fix perf_event_context->time
2023-02-28 19:21 [PATCH] perf: fix perf_event_context->time Song Liu
@ 2023-03-01 22:29 ` Namhyung Kim
2023-03-01 23:16 ` Song Liu
0 siblings, 1 reply; 5+ messages in thread
From: Namhyung Kim @ 2023-03-01 22:29 UTC (permalink / raw)
To: Song Liu; +Cc: linux-kernel, kernel-team, Peter Zijlstra
Hi Song,
On Tue, Feb 28, 2023 at 11:22 AM Song Liu <song@kernel.org> wrote:
>
> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
> accurate time_enabled and time_running for an event. The difference between
> ctx->timestamp and ctx->time is the among of time when the context is not
> enabled. For cpuctx.ctx, time and timestamp should stay the same, however,
I'm not sure if it's correct. The timestamp can go when the context is disabled
for example, in ctx_resched() even if the NMI watchdog is enabled, right?
> it is not the case at the moment. To show this with drgn [1]:
>
> drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile)
> For help, type help(drgn).
> ...
> >>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx
> >>> ctx.timestamp * 1.0 / ctx.time
> (double)1.0385869134111765
> >>>
>
> ctx->timestamp is about 4% larger than ctx.time. This issue causes time
> read by perf_event_read_local() goes back in some cases.
I don't think the difference between the ctx.time and the timestamp
itself is a problem. I think the problem is it CAN update the timestamp
(and the timeoffset) while the context is enabled. Then depending on
the timing, event times can return smaller values than before.
>
> Fix this by updating the condition for __update_context_time(ctx, false).
> Specifically, it should only be called when we enable EVENT_TIME for the
> ctx.
So this change fixes the condition to update the timestamp only if
it moves from disabled to enabled state.
>
> [1] drgn: https://github.com/osandov/drgn
> Fixes: 09f5e7dc7ad7 ("perf: Fix perf_event_read_local() time")
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Song Liu <song@kernel.org>
Actually I got some similar reports lately but couldn't find time to
dig in deeply. Thanks for the fix!
Acked-by: Namhyung Kim <namhyung@kernel.org>
Thanks,
Namhyung
> ---
> kernel/events/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 380476a934e8..67478f43e26e 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3872,7 +3872,7 @@ ctx_sched_in(struct perf_event_context *ctx, enum event_type_t event_type)
> if (likely(!ctx->nr_events))
> return;
>
> - if (is_active ^ EVENT_TIME) {
> + if (!(is_active & EVENT_TIME)) {
> /* start ctx time */
> __update_context_time(ctx, false);
> perf_cgroup_set_timestamp(cpuctx);
> --
> 2.30.2
>
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] perf: fix perf_event_context->time
2023-03-01 22:29 ` Namhyung Kim
@ 2023-03-01 23:16 ` Song Liu
2023-03-02 21:15 ` Namhyung Kim
0 siblings, 1 reply; 5+ messages in thread
From: Song Liu @ 2023-03-01 23:16 UTC (permalink / raw)
To: Namhyung Kim; +Cc: Song Liu, LKML, Kernel Team, Peter Zijlstra
> On Mar 1, 2023, at 2:29 PM, Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Song,
>
> On Tue, Feb 28, 2023 at 11:22 AM Song Liu <song@kernel.org> wrote:
>>
>> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
>> accurate time_enabled and time_running for an event. The difference between
>> ctx->timestamp and ctx->time is the among of time when the context is not
>> enabled. For cpuctx.ctx, time and timestamp should stay the same, however,
>
> I'm not sure if it's correct. The timestamp can go when the context is disabled
> for example, in ctx_resched() even if the NMI watchdog is enabled, right?
I think we do not disable EVENT_TIME for per cpu ctx?
>
>> it is not the case at the moment. To show this with drgn [1]:
>>
>> drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile)
>> For help, type help(drgn).
>> ...
>>>>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx
>>>>> ctx.timestamp * 1.0 / ctx.time
>> (double)1.0385869134111765
>>>>>
>>
>> ctx->timestamp is about 4% larger than ctx.time. This issue causes time
>> read by perf_event_read_local() goes back in some cases.
>
> I don't think the difference between the ctx.time and the timestamp
> itself is a problem. I think the problem is it CAN update the timestamp
> (and the timeoffset) while the context is enabled. Then depending on
> the timing, event times can return smaller values than before.
For per cpu ctx, I think timeoffset should stay the same (may not be zero
though).
Thanks,
Song
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] perf: fix perf_event_context->time
2023-03-01 23:16 ` Song Liu
@ 2023-03-02 21:15 ` Namhyung Kim
2023-03-03 1:21 ` Song Liu
0 siblings, 1 reply; 5+ messages in thread
From: Namhyung Kim @ 2023-03-02 21:15 UTC (permalink / raw)
To: Song Liu; +Cc: Song Liu, LKML, Kernel Team, Peter Zijlstra
On Wed, Mar 1, 2023 at 3:16 PM Song Liu <songliubraving@meta.com> wrote:
>
>
>
> > On Mar 1, 2023, at 2:29 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Song,
> >
> > On Tue, Feb 28, 2023 at 11:22 AM Song Liu <song@kernel.org> wrote:
> >>
> >> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
> >> accurate time_enabled and time_running for an event. The difference between
> >> ctx->timestamp and ctx->time is the among of time when the context is not
> >> enabled. For cpuctx.ctx, time and timestamp should stay the same, however,
> >
> > I'm not sure if it's correct. The timestamp can go when the context is disabled
> > for example, in ctx_resched() even if the NMI watchdog is enabled, right?
>
> I think we do not disable EVENT_TIME for per cpu ctx?
I can see ctx_sched_out(ctx, EVENT_TIME) in some places.
Also it'd reset EVENT_TIME if both _PINNED and _FLEXIBLE is
cleared.
Thanks,
Namhyung
>
> >
> >> it is not the case at the moment. To show this with drgn [1]:
> >>
> >> drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile)
> >> For help, type help(drgn).
> >> ...
> >>>>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx
> >>>>> ctx.timestamp * 1.0 / ctx.time
> >> (double)1.0385869134111765
> >>>>>
> >>
> >> ctx->timestamp is about 4% larger than ctx.time. This issue causes time
> >> read by perf_event_read_local() goes back in some cases.
> >
> > I don't think the difference between the ctx.time and the timestamp
> > itself is a problem. I think the problem is it CAN update the timestamp
> > (and the timeoffset) while the context is enabled. Then depending on
> > the timing, event times can return smaller values than before.
>
> For per cpu ctx, I think timeoffset should stay the same (may not be zero
> though).
>
> Thanks,
> Song
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] perf: fix perf_event_context->time
2023-03-02 21:15 ` Namhyung Kim
@ 2023-03-03 1:21 ` Song Liu
0 siblings, 0 replies; 5+ messages in thread
From: Song Liu @ 2023-03-03 1:21 UTC (permalink / raw)
To: Namhyung Kim, Peter Zijlstra; +Cc: Song Liu, Song Liu, LKML, Kernel Team
> On Mar 2, 2023, at 1:15 PM, Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Mar 1, 2023 at 3:16 PM Song Liu <songliubraving@meta.com> wrote:
>>
>>
>>
>>> On Mar 1, 2023, at 2:29 PM, Namhyung Kim <namhyung@kernel.org> wrote:
>>>
>>> Hi Song,
>>>
>>> On Tue, Feb 28, 2023 at 11:22 AM Song Liu <song@kernel.org> wrote:
>>>>
>>>> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
>>>> accurate time_enabled and time_running for an event. The difference between
>>>> ctx->timestamp and ctx->time is the among of time when the context is not
>>>> enabled. For cpuctx.ctx, time and timestamp should stay the same, however,
>>>
>>> I'm not sure if it's correct. The timestamp can go when the context is disabled
>>> for example, in ctx_resched() even if the NMI watchdog is enabled, right?
>>
>> I think we do not disable EVENT_TIME for per cpu ctx?
>
> I can see ctx_sched_out(ctx, EVENT_TIME) in some places.
> Also it'd reset EVENT_TIME if both _PINNED and _FLEXIBLE is
> cleared.
Yeah, you are right. I missed this part.
Hi Peter,
Does this fix look good do you?
Thanks,
Song
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-03-03 1:22 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-02-28 19:21 [PATCH] perf: fix perf_event_context->time Song Liu
2023-03-01 22:29 ` Namhyung Kim
2023-03-01 23:16 ` Song Liu
2023-03-02 21:15 ` Namhyung Kim
2023-03-03 1:21 ` Song Liu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox