Re: [PATCH v2] events/core: fix acoount failure for event's total_enable_time

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Yeoreum Yun <yeoreum.yun@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@redhat.com, acme@kernel.org, namhyung@kernel.org,
	mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
	jolsa@kernel.org, irogers@google.com, adrian.hunter@intel.com,
	kan.liang@linux.intel.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] events/core: fix acoount failure for event's total_enable_time
Date: Fri, 20 Dec 2024 14:05:39 +0000	[thread overview]
Message-ID: <Z2V5s2JM4W7BRckR@e129823.arm.com> (raw)
In-Reply-To: <20241220133359.GC17537@noisy.programming.kicks-ass.net>

Hi, Peter.

On Fri, Dec 20, 2024 at 02:33:59PM +0100, Peter Zijlstra wrote:
> On Fri, Dec 20, 2024 at 10:02:02AM +0000, Yeoreum Yun wrote:
> > The perf core code fails to account for total_enable_time of event
> > when its state is inactive.
> >
> > Here is the error case for failure to account for total_enable_time for
> > core event:
> >
> > sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 2s
> > ...
> > armv8_pmuv3_0/event=0x08/: 1138698008 2289429840 2174835740
> > armv8_pmuv3_1/event=0x08/: 1826791390 1950025700 847648440
>
> So *that* number is wrong and should have been equal to the number above
> it.

Yes it is.

> >
> >  Performance counter stats for 'stress-ng --pthread=2 -t 2s':
> >
> >      1,138,698,008      armv8_pmuv3_0/event=0x08/                                               (94.99%)
> >      1,826,791,390      armv8_pmuv3_1/event=0x08/                                               (43.47%)
> >
> > Since above two events are belong to the same task context and
> > mutually-exclusive per cpu (they couldn't be active at the same time on the same cpu),
> > the total_enable_time should be the same (the 2nd number among 3 numbers)
> > and the summation of ratio should be 100%.
>
> So the enable_time for the second event is too low, meaning the ratio
> hsould have been ~15% rather than ~43%.
>
> /me goes try and congest the wall of text below for more clues :/
>
> > This account failure of total_enable_time for event could happen in below sequence.
> >
> > 1. two event opened with:
> >    - first event (e0) is opened with pmu0(p0) which could be added on CPU0.
> >    - second event (e1) is opened with pmu1(p1) which could be added on CPU1.
> >    - these two event belongs to the same task_ctx.
> >
> >    at this point:
> >      task_ctx->time = 0
> >      e0->total_enable_time = 0
> >      e0->total_running_time = 0
> >      e1->total_enable_time = 0
> >      e1->total_running_time = 0
> >
> > 2. the task_ctx is sched in CPU0.
> >    - In ctx_sched_in(), the task_ctx->time doesn't updated.
> >    - In event_sched_in() e0 is activated so, its state becomes ACTIVE.
> >    - In event_sched_in() e1 is activated, but soon becomes !ACTIVE
> >      because pmu1 doesn't support CPU1 so it failed in pmu1->add().
>
> This doesn't make sense; e1 should never reach event_sched_in() and it
> should already be INACTIVE.
>
> Notable events are created INACTIVE when !attr->disabled.

But in perf stat code, it via enable_counter(), so it's set with
INACTIVE.

> Also, scheduling should not get beyond merge_sched_in()'s
> event_filter_match(), which will find the CPU is a mismatch and stop
> right there.
>
> This also means the event (e1) does not get to go on flexible_active
> (see below).

No, when perf stat command with above, the cpu sets as == -1,
So, It doesn't filter out in event_filter_match(). so it enter into
merge_sched_in() and get into event_sched_in().

> >    at this point:
> >      task_ctx->time = 0
> >      e0->total_enable_time = 0
> >      e0->total_running_time = 0
> >      e1->total_enable_time = 0
> >      e1->total_running_time = 0
>
> Both have the same state, surely that could've been compressed somewhat.
>
> /me deletes giant wall of text
>
> > 7. the task exit and task_ctx is sched out. and suppose scheulded time as t3.
> >    - In ctx_sched_out(), the task_ctx->time updated with t1 + t2 + t3.
> >    - e0 was ACTIVE, so In event_sched_out(),
> >      e0->total_enable_time and e1->total_running_time are updated with t3.
> >    - e1 was !ACTIVE, so it doesn't update
> >      e1->total_enable_time and e1->total_running_time.
> >
> >    at this point:
> >      task_ctx->time = t1 + t2 + t3
> >      e0->total_enable_time = t1 + t2 +t3
> >      e0->total_running_time = t1 + t3
> >      e1->total_enable_time = t1 + t2
> >      e1->total_running_time = t2
> >
> > In step 7, amount of t3 enable time is missing in e1.
> > So, the perf prints error ratio for total_running_time/total_enable_time
> > for e1.
> >
> > Here is the summary of change of time as timeline:
> >
> >                CPU0                  CPU1                     CPU0
> >           |  in | out |       | in | out     |       | in      | out (exit)   |
> > ------------------------------------------------------------------------------
> > ctx->time |  0  | t1  |       | t1 | t1 + t2 |       | t1 + t2 | t1 + t2 + t3 |
> > ------------------------------------------------------------------------------
> > e0->ena   |  0  | t1  |       | t1 | t1     *|       | t1 + t2 | t1 + t2 + t3 |
> > ------------------------------------------------------------------------------
> > e0->run   |  0  | t1  |       | t1 | t1      |       | t1 + t2 | t1 + t3      |
> > ------------------------------------------------------------------------------
> > e1->ena   |  0  | 0  *|       | t1 | t1 + t2 |       | t1 + t2 | t1 + t2  XXX |
> > ------------------------------------------------------------------------------
> > e1->run   |  0  | 0   |       | 0  | t2      |       | t1 + t2 | t2           |
> > ------------------------------------------------------------------------------
>
> So just the table is fine (better even than this giant wall of text),
> but you have to explain what's what. Eg, there is no clear progression,
> and having two columns both labeled CPU0 is highly confusing.
>
> Also, a visual indication of where the error happens is useful. So the
> cell marked with XXX is the failure case, right, that's missing '+ t3'.
>
> Except then you should also mark those I stuck a * on.
>

Thanks. I'll rewrite it again.

> > To address this, update total_enable_time in event_sched_out() when event state
> > is PERF_EVENT_STATE_INACTIVE.
>
> This is a giant jump that I'm not following. Notably ctx_sched_out()
> will only iterate pmu_ctx->{pinned,flexible}_active and that list should
> only include ACTIVE events.
> So how does handling INACTIVE in event_sched_out() even begin to help?

the answer is in the perf_event_exit_event()'s
perf_remove_from_context(). in here
event_sched_out() is called via __perf_remove_from_context()
So above case, the enable time is fixed in here.

But I think in case of change of group_ctx_out() is my mistak. Sorry.

>
> Additionally, where does perf-stat get those numbers from; because
> afaict perf_event_read() will do perf_event_update_time() and you should
> be getting the correct numbers, no?
>
> I'm utterly lost in the perf-stat code :-( Also it appears I can't even
> build tools/perf anymore :-(
>

After exit task. in this case, the ctx->is_active doesn't have
EVENT_TIME, it doesn't update ctx->time and the event time.
So, the task_ctx which is child one is callected with missing
total_enable_time.

> > After this patch, this problem is gone like:
>
> Yes, but why? Something smells.
> Looking at group_sched_in(), I'm thinking that that STATE_OFF thing is
> totally wrong, should that not be something like:
>
> 	if (group_event->state <= PERF_EVENT_STATE_OFF)
> 		return -EBUSY;
>
> or something !0 at any rate.
>
>
> So what path leads to event_sched_out() with !ACTIVE that's relevant
> here?
>
> It can't have geen that group_sched_out(), because that wasn't in your
> first patch.
>
> How is this a fix and why is it the right fix?
>
>

As I said above, at the time of exit. __perf_remove_from_context()
calls in event_sched_out(). so it fixes the missing enabled time at the
exit.

the middle of event''s total_enable_time() is updated in ctx_sched_in()
(as I above said, it doesn't filter out because it runs with cpu == -1).
So, it's enough to add missing enabled time with exit() and this is
fixed via event_sched_out() in __perf_remove_from_context().

The group_sched_out() is my mistake, But with update in
event_sched_out(). It's enough to fix and I think it's right fix.

next prev parent reply	other threads:[~2024-12-20 14:06 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-20 10:02 [PATCH v2] events/core: fix acoount failure for event's total_enable_time Yeoreum Yun
2024-12-20 13:33 ` Peter Zijlstra
2024-12-20 14:05   ` Yeoreum Yun [this message]
2024-12-20 15:14     ` Peter Zijlstra
2024-12-20 15:26       ` Yeoreum Yun
2024-12-20 15:30         ` Peter Zijlstra
2024-12-20 15:45           ` Yeoreum Yun
2024-12-20 16:23           ` Yeoreum Yun
2025-01-10 16:36             ` Peter Zijlstra
2025-01-28 18:57               ` Yeo Reum Yun
2025-02-25 20:36                 ` Ingo Molnar
2025-03-06 13:43                   ` Yeoreum Yun
2024-12-20 15:28       ` Peter Zijlstra
2024-12-20 15:43         ` Yeoreum Yun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z2V5s2JM4W7BRckR@e129823.arm.com \
    --to=yeoreum.yun@arm.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox