From: Yeoreum Yun <yeoreum.yun@arm.com>
To: peterz@infradead.org, mingo@redhat.com, mingo@kernel.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, elver@google.com, leo.yan@arm.com,
james.clark@linaro.org, suzuki.poulose@arm.com,
mike.leach@arm.com
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
Yeoreum Yun <yeoreum.yun@arm.com>
Subject: [PATCH v5 1/1] events/core: fix failure case for accounting child_total_enable_time at task exit
Date: Wed, 26 Mar 2025 08:20:03 +0000 [thread overview]
Message-ID: <20250326082003.1630986-1-yeoreum.yun@arm.com> (raw)
The perf core code fails to account for total_enable_time of event
when its state is inactive.
Here is a failure case for accounting total_enable_time for
CPU PMU events:
sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 2s
...
armv8_pmuv3_0/event=0x08/: 1138698008 2289429840 2174835740
armv8_pmuv3_1/event=0x08/: 1826791390 1950025700 847648440
` ` `
` ` > total_time_running with child
` > total_time_enabled with child
> count with child
Performance counter stats for 'stress-ng --pthread=2 -t 2s':
1,138,698,008 armv8_pmuv3_0/event=0x08/ (94.99%)
1,826,791,390 armv8_pmuv3_1/event=0x08/ (43.47%)
The two events above are opened on two different CPU PMUs, for example,
each event is opened for a cluster in an Arm big.LITTLE system, they
will never run on the same CPU. In theory, the total enabled time should
be same for both events, as two events are opened and closed together.
As the result show, the two events' total enabled time including
child event are different (2289429840 vs 1950025700).
This is because child events are not accounted properly
if a event is INACTIVE state when the task exits:
perf_event_exit_event()
`> perf_remove_from_context()
`> __perf_remove_from_context()
`> perf_child_detach() -> Accumulate child_total_time_enabled
`> list_del_event() -> Update child event's time
The problem is the time accumulation happens prior to child event's time
updating. Thus, it misses to account the last period's time when event
exits.
The perf core layer follows the rule that timekeeping is tied to state
change. To address the issue, make __perf_remove_from_context() to
handle task exit case by passing the 'DETACH_EXIT' to it and
invokes perf_event_state() for state alongside with accouting the time.
Then, perf_child_detach() populates the time into parent's time metrics.
After this patch, this problem is gone like:
sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 10s
...
armv8_pmuv3_0/event=0x08/: 15396770398 32157963940 21898169000
armv8_pmuv3_1/event=0x08/: 22428964974 32157963940 10259794940
Performance counter stats for 'stress-ng --pthread=2 -t 10s':
15,396,770,398 armv8_pmuv3_0/event=0x08/ (68.10%)
22,428,964,974 armv8_pmuv3_1/event=0x08/ (31.90%)
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Fixes: ef54c1a476aef ("perf: Rework perf_event_exit_event()")
---
kernel/events/core.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 823aa0824916..f191e92c2f48 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2407,6 +2407,7 @@ ctx_time_update_event(struct perf_event_context *ctx, struct perf_event *event)
#define DETACH_GROUP 0x01UL
#define DETACH_CHILD 0x02UL
#define DETACH_DEAD 0x04UL
+#define DETACH_EXIT 0x08UL
/*
* Cross CPU call to remove a performance event
@@ -2421,6 +2422,7 @@ __perf_remove_from_context(struct perf_event *event,
void *info)
{
struct perf_event_pmu_context *pmu_ctx = event->pmu_ctx;
+ enum perf_event_state state = PERF_EVENT_STATE_OFF;
unsigned long flags = (unsigned long)info;
ctx_time_update(cpuctx, ctx);
@@ -2429,16 +2431,19 @@ __perf_remove_from_context(struct perf_event *event,
* Ensure event_sched_out() switches to OFF, at the very least
* this avoids raising perf_pending_task() at this time.
*/
- if (flags & DETACH_DEAD)
+ if (flags & DETACH_EXIT)
+ state = PERF_EVENT_STATE_EXIT;
+ if (flags & DETACH_DEAD) {
event->pending_disable = 1;
+ state = PERF_EVENT_STATE_DEAD;
+ }
event_sched_out(event, ctx);
+ perf_event_set_state(event, min(event->state, state));
if (flags & DETACH_GROUP)
perf_group_detach(event);
if (flags & DETACH_CHILD)
perf_child_detach(event);
list_del_event(event, ctx);
- if (flags & DETACH_DEAD)
- event->state = PERF_EVENT_STATE_DEAD;
if (!pmu_ctx->nr_events) {
pmu_ctx->rotate_necessary = 0;
@@ -13448,12 +13453,7 @@ perf_event_exit_event(struct perf_event *event, struct perf_event_context *ctx)
mutex_lock(&parent_event->child_mutex);
}
- perf_remove_from_context(event, detach_flags);
-
- raw_spin_lock_irq(&ctx->lock);
- if (event->state > PERF_EVENT_STATE_EXIT)
- perf_event_set_state(event, PERF_EVENT_STATE_EXIT);
- raw_spin_unlock_irq(&ctx->lock);
+ perf_remove_from_context(event, detach_flags | DETACH_EXIT);
/*
* Child events can be freed.
--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
next reply other threads:[~2025-03-26 8:20 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-26 8:20 Yeoreum Yun [this message]
2025-03-27 11:03 ` [PATCH v5 1/1] events/core: fix failure case for accounting child_total_enable_time at task exit Leo Yan
2025-03-29 11:31 ` Yeo Reum Yun
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250326082003.1630986-1-yeoreum.yun@arm.com \
--to=yeoreum.yun@arm.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=elver@google.com \
--cc=irogers@google.com \
--cc=james.clark@linaro.org \
--cc=jolsa@kernel.org \
--cc=kan.liang@linux.intel.com \
--cc=leo.yan@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mike.leach@arm.com \
--cc=mingo@kernel.org \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=suzuki.poulose@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).