From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>, mingo@kernel.org
Cc: acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/5] perf: Add context time freeze
Date: Wed, 7 Aug 2024 11:17:18 -0400
Message-ID: <b38cc358-8e46-48bd-88c0-ff4b8db6bd15@linux.intel.com>
In-Reply-To: <20240807115550.250637571@infradead.org>
On 2024-08-07 7:29 a.m., Peter Zijlstra wrote:
> Many of the context reschedule users are of the form:
>
> ctx_sched_out(.type = EVENT_TIME);
> ... modify context
> ctx_resched();
>
> With the idea that the whole reschedule happens with a single
> time-stamp, rather than with each ctx_sched_out() advancing time and
> ctx_sched_in() re-starting time, creating a non-atomic experience.
>
> However, Kan noticed that since this completely stops time, it
> actually loses a bit of time between the stop and start. Worse, now
> that we can do partial (per PMU) reschedules, the PMUs that are not
> scheduled out still observe the time glitch.
>
> Replace this with:
>
> ctx_time_freeze();
> ... modify context
> ctx_resched();
>
> With the assumption that this happens in a perf_ctx_lock() /
> perf_ctx_unlock() pair.
>
> The new ctx_time_freeze() updates time, sets EVENT_FROZEN, and
> ensures EVENT_TIME and EVENT_FROZEN remain set. This prevents
> perf_event_time_now() from observing a time wobble from not seeing
> EVENT_TIME for a little while.
>
> Additionally, this avoids losing time between
> ctx_sched_out(EVENT_TIME) and ctx_sched_in(), which would re-set the
> timestamp.
>
> Reported-by: Kan Liang <kan.liang@linux.intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> kernel/events/core.c | 128 ++++++++++++++++++++++++++++++++++-----------------
> 1 file changed, 86 insertions(+), 42 deletions(-)
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -155,20 +155,55 @@ static int cpu_function_call(int cpu, re
> return data.ret;
> }
>
> +enum event_type_t {
> + EVENT_FLEXIBLE = 0x01,
> + EVENT_PINNED = 0x02,
> + EVENT_TIME = 0x04,
> + EVENT_FROZEN = 0x08,
> + /* see ctx_resched() for details */
> + EVENT_CPU = 0x10,
> + EVENT_CGROUP = 0x20,
> +
> + /* compound helpers */
> + EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
> + EVENT_TIME_FROZEN = EVENT_TIME | EVENT_FROZEN,
> +};
> +
> +static inline void __perf_ctx_lock(struct perf_event_context *ctx)
> +{
> + raw_spin_lock(&ctx->lock);
> + WARN_ON_ONCE(ctx->is_active & EVENT_FROZEN);
> +}
> +
> static void perf_ctx_lock(struct perf_cpu_context *cpuctx,
> struct perf_event_context *ctx)
> {
> - raw_spin_lock(&cpuctx->ctx.lock);
> + __perf_ctx_lock(&cpuctx->ctx);
> if (ctx)
> - raw_spin_lock(&ctx->lock);
> + __perf_ctx_lock(ctx);
> +}
> +
> +static inline void __perf_ctx_unlock(struct perf_event_context *ctx)
> +{
> + /*
> + * If ctx_sched_in() didn't again set any ALL flags, clean up
> + * after ctx_sched_out() by clearing is_active.
> + */
> + if (ctx->is_active & EVENT_FROZEN) {
> + if (!(ctx->is_active & EVENT_ALL))
Nit: it may be better to add a macro or inline function to replace all
the (ctx->is_active & EVENT_ALL) checks. For example:

+static inline bool perf_ctx_has_active_events(struct perf_event_context *ctx)
+{
+	return ctx->is_active & EVENT_ALL;
+}
...
+	if (ctx->is_active & EVENT_FROZEN) {
+		if (!perf_ctx_has_active_events(ctx))
+			ctx->is_active = 0;
+		else
+			ctx->is_active &= ~EVENT_FROZEN;

That makes it immediately clear that we want to clear all flags if
there is no active event.

EVENT_ALL may be confusing. It actually means all events, not all
event types. A developer may have to look up the definition to figure
out exactly what EVENT_ALL includes.
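
To make the intended behaviour concrete, here is a rough userspace
sketch, not kernel code. The flag values are copied from the patch;
struct model_ctx and the model_*() names are made up for illustration,
and locking is left out entirely. It only models how the flags are
cleaned up when the context is unfrozen.

```c
#include <stdbool.h>

/* Flag values copied from the quoted patch (EVENT_CPU/EVENT_CGROUP omitted). */
enum event_type_t {
	EVENT_FLEXIBLE	= 0x01,
	EVENT_PINNED	= 0x02,
	EVENT_TIME	= 0x04,
	EVENT_FROZEN	= 0x08,

	/* compound helper: "any events", not "any flag" */
	EVENT_ALL	= EVENT_FLEXIBLE | EVENT_PINNED,
};

/* Hypothetical stand-in for struct perf_event_context; only is_active is modelled. */
struct model_ctx {
	int is_active;
};

/* The suggested helper: true if pinned and/or flexible events are scheduled in. */
static inline bool model_ctx_has_active_events(const struct model_ctx *ctx)
{
	return ctx->is_active & EVENT_ALL;
}

/* Mirrors the flag cleanup done in __perf_ctx_unlock(), minus the spinlock. */
static inline void model_ctx_unfreeze(struct model_ctx *ctx)
{
	if (ctx->is_active & EVENT_FROZEN) {
		if (!model_ctx_has_active_events(ctx))
			ctx->is_active = 0;		/* nothing scheduled: drop TIME too */
		else
			ctx->is_active &= ~EVENT_FROZEN;	/* keep TIME and event flags */
	}
}
```

With this model, a context holding only EVENT_TIME | EVENT_FROZEN ends
up with is_active == 0 after unfreezing, while one that also has
EVENT_PINNED set only loses EVENT_FROZEN.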
Thanks,
Kan
> + ctx->is_active = 0;
> + else
> + ctx->is_active &= ~EVENT_FROZEN;
> + }
> + raw_spin_unlock(&ctx->lock);
> }
>
> static void perf_ctx_unlock(struct perf_cpu_context *cpuctx,
> struct perf_event_context *ctx)
> {
> if (ctx)
> - raw_spin_unlock(&ctx->lock);
> - raw_spin_unlock(&cpuctx->ctx.lock);
> + __perf_ctx_unlock(ctx);
> + __perf_ctx_unlock(&cpuctx->ctx);
> }
>
> #define TASK_TOMBSTONE ((void *)-1L)
> @@ -370,16 +405,6 @@ static void event_function_local(struct
> (PERF_SAMPLE_BRANCH_KERNEL |\
> PERF_SAMPLE_BRANCH_HV)
>
> -enum event_type_t {
> - EVENT_FLEXIBLE = 0x1,
> - EVENT_PINNED = 0x2,
> - EVENT_TIME = 0x4,
> - /* see ctx_resched() for details */
> - EVENT_CPU = 0x8,
> - EVENT_CGROUP = 0x10,
> - EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
> -};
> -
> /*
> * perf_sched_events : >0 events exist
> */
> @@ -2332,18 +2357,39 @@ group_sched_out(struct perf_event *group
> }
>
> static inline void
> -ctx_time_update(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx)
> +__ctx_time_update(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx, bool final)
> {
> if (ctx->is_active & EVENT_TIME) {
> + if (ctx->is_active & EVENT_FROZEN)
> + return;
> update_context_time(ctx);
> - update_cgrp_time_from_cpuctx(cpuctx, false);
> + update_cgrp_time_from_cpuctx(cpuctx, final);
> }
> }
>
> static inline void
> +ctx_time_update(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx)
> +{
> + __ctx_time_update(cpuctx, ctx, false);
> +}
> +
> +/*
> + * To be used inside perf_ctx_lock() / perf_ctx_unlock(). Lasts until perf_ctx_unlock().
> + */
> +static inline void
> +ctx_time_freeze(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx)
> +{
> + ctx_time_update(cpuctx, ctx);
> + if (ctx->is_active & EVENT_TIME)
> + ctx->is_active |= EVENT_FROZEN;
> +}
> +
> +static inline void
> ctx_time_update_event(struct perf_event_context *ctx, struct perf_event *event)
> {
> if (ctx->is_active & EVENT_TIME) {
> + if (ctx->is_active & EVENT_FROZEN)
> + return;
> update_context_time(ctx);
> update_cgrp_time_from_event(event);
> }
> @@ -2822,7 +2868,7 @@ static int __perf_install_in_context(vo
> #endif
>
> if (reprogram) {
> - ctx_sched_out(ctx, NULL, EVENT_TIME);
> + ctx_time_freeze(cpuctx, ctx);
> add_event_to_ctx(event, ctx);
> ctx_resched(cpuctx, task_ctx, event->pmu_ctx->pmu,
> get_event_type(event));
> @@ -2968,8 +3014,7 @@ static void __perf_event_enable(struct p
> event->state <= PERF_EVENT_STATE_ERROR)
> return;
>
> - if (ctx->is_active)
> - ctx_sched_out(ctx, NULL, EVENT_TIME);
> + ctx_time_freeze(cpuctx, ctx);
>
> perf_event_set_state(event, PERF_EVENT_STATE_INACTIVE);
> perf_cgroup_event_enable(event, ctx);
> @@ -2977,19 +3022,15 @@ static void __perf_event_enable(struct p
> if (!ctx->is_active)
> return;
>
> - if (!event_filter_match(event)) {
> - ctx_sched_in(ctx, NULL, EVENT_TIME);
> + if (!event_filter_match(event))
> return;
> - }
>
> /*
> * If the event is in a group and isn't the group leader,
> * then don't put it on unless the group is on.
> */
> - if (leader != event && leader->state != PERF_EVENT_STATE_ACTIVE) {
> - ctx_sched_in(ctx, NULL, EVENT_TIME);
> + if (leader != event && leader->state != PERF_EVENT_STATE_ACTIVE)
> return;
> - }
>
> task_ctx = cpuctx->task_ctx;
> if (ctx->task)
> @@ -3263,7 +3304,7 @@ static void __pmu_ctx_sched_out(struct p
> struct perf_event *event, *tmp;
> struct pmu *pmu = pmu_ctx->pmu;
>
> - if (ctx->task && !ctx->is_active) {
> + if (ctx->task && !(ctx->is_active & EVENT_ALL)) {
> struct perf_cpu_pmu_context *cpc;
>
> cpc = this_cpu_ptr(pmu->cpu_pmu_context);
> @@ -3338,24 +3379,29 @@ ctx_sched_out(struct perf_event_context
> *
> * would only update time for the pinned events.
> */
> - if (is_active & EVENT_TIME) {
> - /* update (and stop) ctx time */
> - update_context_time(ctx);
> - update_cgrp_time_from_cpuctx(cpuctx, ctx == &cpuctx->ctx);
> + __ctx_time_update(cpuctx, ctx, ctx == &cpuctx->ctx);
> +
> + /*
> + * CPU-release for the below ->is_active store,
> + * see __load_acquire() in perf_event_time_now()
> + */
> + barrier();
> + ctx->is_active &= ~event_type;
> +
> + if (!(ctx->is_active & EVENT_ALL)) {
> /*
> - * CPU-release for the below ->is_active store,
> - * see __load_acquire() in perf_event_time_now()
> + * For FROZEN, preserve TIME|FROZEN such that perf_event_time_now()
> + * does not observe a hole. perf_ctx_unlock() will clean up.
> */
> - barrier();
> + if (ctx->is_active & EVENT_FROZEN)
> + ctx->is_active &= EVENT_TIME_FROZEN;
> + else
> + ctx->is_active = 0;
> }
>
> - ctx->is_active &= ~event_type;
> - if (!(ctx->is_active & EVENT_ALL))
> - ctx->is_active = 0;
> -
> if (ctx->task) {
> WARN_ON_ONCE(cpuctx->task_ctx != ctx);
> - if (!ctx->is_active)
> + if (!(ctx->is_active & EVENT_ALL))
> cpuctx->task_ctx = NULL;
> }
>
> @@ -3943,7 +3989,7 @@ ctx_sched_in(struct perf_event_context *
>
> ctx->is_active |= (event_type | EVENT_TIME);
> if (ctx->task) {
> - if (!is_active)
> + if (!(is_active & EVENT_ALL))
> cpuctx->task_ctx = ctx;
> else
> WARN_ON_ONCE(cpuctx->task_ctx != ctx);
> @@ -4424,7 +4470,7 @@ static void perf_event_enable_on_exec(st
>
> cpuctx = this_cpu_ptr(&perf_cpu_context);
> perf_ctx_lock(cpuctx, ctx);
> - ctx_sched_out(ctx, NULL, EVENT_TIME);
> + ctx_time_freeze(cpuctx, ctx);
>
> list_for_each_entry(event, &ctx->event_list, event_entry) {
> enabled |= event_enable_on_exec(event, ctx);
> @@ -4437,8 +4483,6 @@ static void perf_event_enable_on_exec(st
> if (enabled) {
> clone_ctx = unclone_ctx(ctx);
> ctx_resched(cpuctx, ctx, NULL, event_type);
> - } else {
> - ctx_sched_in(ctx, NULL, EVENT_TIME);
> }
> perf_ctx_unlock(cpuctx, ctx);
>
>
>
>