From: Yeoreum Yun <yeoreum.yun@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@redhat.com, acme@kernel.org, namhyung@kernel.org,
mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
jolsa@kernel.org, irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] events/core: fix acoount failure for event's total_enable_time
Date: Fri, 20 Dec 2024 14:05:39 +0000 [thread overview]
Message-ID: <Z2V5s2JM4W7BRckR@e129823.arm.com> (raw)
In-Reply-To: <20241220133359.GC17537@noisy.programming.kicks-ass.net>
Hi, Peter.
On Fri, Dec 20, 2024 at 02:33:59PM +0100, Peter Zijlstra wrote:
> On Fri, Dec 20, 2024 at 10:02:02AM +0000, Yeoreum Yun wrote:
> > The perf core code fails to account for total_enable_time of event
> > when its state is inactive.
> >
> > Here is the error case for failure to account for total_enable_time for
> > core event:
> >
> > sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 2s
> > ...
> > armv8_pmuv3_0/event=0x08/: 1138698008 2289429840 2174835740
> > armv8_pmuv3_1/event=0x08/: 1826791390 1950025700 847648440
>
> So *that* number is wrong and should have been equal to the number above
> it.
Yes it is.
> >
> > Performance counter stats for 'stress-ng --pthread=2 -t 2s':
> >
> > 1,138,698,008 armv8_pmuv3_0/event=0x08/ (94.99%)
> > 1,826,791,390 armv8_pmuv3_1/event=0x08/ (43.47%)
> >
> > Since above two events are belong to the same task context and
> > mutually-exclusive per cpu (they couldn't be active at the same time on the same cpu),
> > the total_enable_time should be the same (the 2nd number among 3 numbers)
> > and the summation of ratio should be 100%.
>
> So the enable_time for the second event is too low, meaning the ratio
> hsould have been ~15% rather than ~43%.
>
> /me goes try and congest the wall of text below for more clues :/
>
> > This account failure of total_enable_time for event could happen in below sequence.
> >
> > 1. two event opened with:
> > - first event (e0) is opened with pmu0(p0) which could be added on CPU0.
> > - second event (e1) is opened with pmu1(p1) which could be added on CPU1.
> > - these two event belongs to the same task_ctx.
> >
> > at this point:
> > task_ctx->time = 0
> > e0->total_enable_time = 0
> > e0->total_running_time = 0
> > e1->total_enable_time = 0
> > e1->total_running_time = 0
> >
> > 2. the task_ctx is sched in CPU0.
> > - In ctx_sched_in(), the task_ctx->time doesn't updated.
> > - In event_sched_in() e0 is activated so, its state becomes ACTIVE.
> > - In event_sched_in() e1 is activated, but soon becomes !ACTIVE
> > because pmu1 doesn't support CPU1 so it failed in pmu1->add().
>
> This doesn't make sense; e1 should never reach event_sched_in() and it
> should already be INACTIVE.
>
> Notable events are created INACTIVE when !attr->disabled.
But in perf stat code, it via enable_counter(), so it's set with
INACTIVE.
> Also, scheduling should not get beyond merge_sched_in()'s
> event_filter_match(), which will find the CPU is a mismatch and stop
> right there.
>
> This also means the event (e1) does not get to go on flexible_active
> (see below).
No, when perf stat command with above, the cpu sets as == -1,
So, It doesn't filter out in event_filter_match(). so it enter into
merge_sched_in() and get into event_sched_in().
> > at this point:
> > task_ctx->time = 0
> > e0->total_enable_time = 0
> > e0->total_running_time = 0
> > e1->total_enable_time = 0
> > e1->total_running_time = 0
>
> Both have the same state, surely that could've been compressed somewhat.
>
> /me deletes giant wall of text
>
> > 7. the task exit and task_ctx is sched out. and suppose scheulded time as t3.
> > - In ctx_sched_out(), the task_ctx->time updated with t1 + t2 + t3.
> > - e0 was ACTIVE, so In event_sched_out(),
> > e0->total_enable_time and e1->total_running_time are updated with t3.
> > - e1 was !ACTIVE, so it doesn't update
> > e1->total_enable_time and e1->total_running_time.
> >
> > at this point:
> > task_ctx->time = t1 + t2 + t3
> > e0->total_enable_time = t1 + t2 +t3
> > e0->total_running_time = t1 + t3
> > e1->total_enable_time = t1 + t2
> > e1->total_running_time = t2
> >
> > In step 7, amount of t3 enable time is missing in e1.
> > So, the perf prints error ratio for total_running_time/total_enable_time
> > for e1.
> >
> > Here is the summary of change of time as timeline:
> >
> > CPU0 CPU1 CPU0
> > | in | out | | in | out | | in | out (exit) |
> > ------------------------------------------------------------------------------
> > ctx->time | 0 | t1 | | t1 | t1 + t2 | | t1 + t2 | t1 + t2 + t3 |
> > ------------------------------------------------------------------------------
> > e0->ena | 0 | t1 | | t1 | t1 *| | t1 + t2 | t1 + t2 + t3 |
> > ------------------------------------------------------------------------------
> > e0->run | 0 | t1 | | t1 | t1 | | t1 + t2 | t1 + t3 |
> > ------------------------------------------------------------------------------
> > e1->ena | 0 | 0 *| | t1 | t1 + t2 | | t1 + t2 | t1 + t2 XXX |
> > ------------------------------------------------------------------------------
> > e1->run | 0 | 0 | | 0 | t2 | | t1 + t2 | t2 |
> > ------------------------------------------------------------------------------
>
> So just the table is fine (better even than this giant wall of text),
> but you have to explain what's what. Eg, there is no clear progression,
> and having two columns both labeled CPU0 is highly confusing.
>
> Also, a visual indication of where the error happens is useful. So the
> cell marked with XXX is the failure case, right, that's missing '+ t3'.
>
> Except then you should also mark those I stuck a * on.
>
Thanks. I'll rewrite it again.
> > To address this, update total_enable_time in event_sched_out() when event state
> > is PERF_EVENT_STATE_INACTIVE.
>
> This is a giant jump that I'm not following. Notably ctx_sched_out()
> will only iterate pmu_ctx->{pinned,flexible}_active and that list should
> only include ACTIVE events.
> So how does handling INACTIVE in event_sched_out() even begin to help?
the answer is in the perf_event_exit_event()'s
perf_remove_from_context(). in here
event_sched_out() is called via __perf_remove_from_context()
So above case, the enable time is fixed in here.
But I think in case of change of group_ctx_out() is my mistak. Sorry.
>
> Additionally, where does perf-stat get those numbers from; because
> afaict perf_event_read() will do perf_event_update_time() and you should
> be getting the correct numbers, no?
>
> I'm utterly lost in the perf-stat code :-( Also it appears I can't even
> build tools/perf anymore :-(
>
After exit task. in this case, the ctx->is_active doesn't have
EVENT_TIME, it doesn't update ctx->time and the event time.
So, the task_ctx which is child one is callected with missing
total_enable_time.
> > After this patch, this problem is gone like:
>
> Yes, but why? Something smells.
> Looking at group_sched_in(), I'm thinking that that STATE_OFF thing is
> totally wrong, should that not be something like:
>
> if (group_event->state <= PERF_EVENT_STATE_OFF)
> return -EBUSY;
>
> or something !0 at any rate.
>
>
> So what path leads to event_sched_out() with !ACTIVE that's relevant
> here?
>
> It can't have geen that group_sched_out(), because that wasn't in your
> first patch.
>
> How is this a fix and why is it the right fix?
>
>
As I said above, at the time of exit. __perf_remove_from_context()
calls in event_sched_out(). so it fixes the missing enabled time at the
exit.
the middle of event''s total_enable_time() is updated in ctx_sched_in()
(as I above said, it doesn't filter out because it runs with cpu == -1).
So, it's enough to add missing enabled time with exit() and this is
fixed via event_sched_out() in __perf_remove_from_context().
The group_sched_out() is my mistake, But with update in
event_sched_out(). It's enough to fix and I think it's right fix.
next prev parent reply other threads:[~2024-12-20 14:06 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-20 10:02 [PATCH v2] events/core: fix acoount failure for event's total_enable_time Yeoreum Yun
2024-12-20 13:33 ` Peter Zijlstra
2024-12-20 14:05 ` Yeoreum Yun [this message]
2024-12-20 15:14 ` Peter Zijlstra
2024-12-20 15:26 ` Yeoreum Yun
2024-12-20 15:30 ` Peter Zijlstra
2024-12-20 15:45 ` Yeoreum Yun
2024-12-20 16:23 ` Yeoreum Yun
2025-01-10 16:36 ` Peter Zijlstra
2025-01-28 18:57 ` Yeo Reum Yun
2025-02-25 20:36 ` Ingo Molnar
2025-03-06 13:43 ` Yeoreum Yun
2024-12-20 15:28 ` Peter Zijlstra
2024-12-20 15:43 ` Yeoreum Yun
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z2V5s2JM4W7BRckR@e129823.arm.com \
--to=yeoreum.yun@arm.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=kan.liang@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox