From: Minwoo Ahn <mwahn402@gmail.com>
To: peterz@infradead.org
Cc: acme@kernel.org, adrian.hunter@intel.com,
	alexander.shishkin@linux.intel.com, irogers@google.com,
	james.clark@linaro.org, jinkyu@yonsei.ac.kr, jolsa@kernel.org,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	mark.rutland@arm.com, mingo@redhat.com, mwahn402@gmail.com,
	namhyung@kernel.org
Subject: Re: [PATCH v4] perf/core: Fix sampling period inconsistency across CPU migration
Date: Mon,  4 May 2026 13:52:53 +0000
Message-ID: <20260504135253.1649829-1-mwahn402@gmail.com>
In-Reply-To: <20260504080832.GO3126523@noisy.programming.kicks-ass.net>

On Mon, May 04, 2026 at 10:08:32AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 29, 2026 at 09:51:34AM +0000, Minwoo Ahn wrote:
> > 
> > When per-task software events are sampled, period_left is not
> > managed consistently when task migration happens. The perf_event
> > may observe a different hw_perf_event::period_left on the new CPU,
> > breaking the sampling periodicity. Even if a task was near its
> > sampling point, it would use a stale period_left after migration.
> 
> How? This is just vague words, not actually saying anything of
> substance.
> 
> > Introduce struct perf_task_context as a per-task container to
> 
> How can you propose a solution to a non-defined problem?

Let me describe the problem more concretely.

A common per-task sampling invocation such as
`perf record -e task-clock -c 1000000 ./a.out` or
`perf record -t TID` opens, unless the user constrains it with `-C`,
one perf_event for each online CPU on the system. So a single target
task ends up with N distinct perf_event objects, each carrying its own
hw_perf_event::period_left that only counts down while the task is
running on the CPU that object is bound to.
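As a rough illustration of that setup (a sketch, not the actual perf
tool code), the per-CPU opens look roughly like the following. The
helper names init_task_clock_attr() and open_per_cpu_events() are
hypothetical, and the loop assumes CPU ids 0..nr_cpus-1 are all online.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: a task-clock sampling attr with inheritance on. */
static void init_task_clock_attr(struct perf_event_attr *attr, __u64 period_ns)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_TASK_CLOCK;
	attr->sample_period = period_ns;	/* task-clock counts in ns */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
	attr->inherit = 1;			/* follow child threads */
	attr->disabled = 1;
}

/*
 * Hypothetical helper: open one event per CPU for the same task.
 * Each fd is backed by a separate perf_event, and so by a separate
 * hw_perf_event::period_left. Returns 0 on success, -1 on failure.
 */
static int open_per_cpu_events(pid_t tid, int nr_cpus, int *fds, __u64 period_ns)
{
	struct perf_event_attr attr;
	int cpu;

	init_task_clock_attr(&attr, period_ns);
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		/* pid = tid, cpu = cpu: count the task only while on this CPU */
		fds[cpu] = syscall(__NR_perf_event_open, &attr, tid, cpu, -1, 0);
		if (fds[cpu] < 0) {
			while (--cpu >= 0)
				close(fds[cpu]);
			return -1;
		}
	}
	return 0;
}
```
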

When the task migrates from CPU_X to CPU_Y, the event on CPU_X is
scheduled out and the event on CPU_Y is scheduled in. They are
distinct objects, so the partial period accumulated on CPU_X stays
behind in CPU_X's event and is not carried over. CPU_Y's event resumes
from whatever period_left it last held -- typically a full
sample_period left over from when its hrtimer was last armed.

Timeline (sample_period = 1.0s, migrate at t=0.6s):

  t=0.0s   on CPU0; event_CPU0->period_left = 1.0s
  t=0.6s   migrate CPU0 -> CPU1
           event_CPU0 sched_out, period_left = 0.4s
           event_CPU1 sched_in,  period_left = 1.0s   (never decremented)
  t=1.6s   first sample fires on CPU1   (task expected one at t=1.0s)

So sample intervals seen by the task no longer correspond to
sample_period.
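
The timeline can be reproduced with a small userspace model. This is
only a sketch of the bookkeeping, not kernel code: struct run_segment,
NR_CPUS and both functions are hypothetical names, times are in
nanoseconds, and the task is assumed to run without being preempted so
that task runtime equals wall-clock time.

```c
#include <stdint.h>

#define NR_CPUS 2	/* hypothetical CPU count; seg[].cpu must be < NR_CPUS */

/* One uninterrupted run of the task on a given CPU. */
struct run_segment {
	int cpu;
	uint64_t runtime_ns;
};

/*
 * Current behaviour as described above: every CPU has its own
 * period_left. Returns the task runtime at which the first sample
 * fires, or 0 if none fires within the given segments.
 */
static uint64_t first_sample_per_cpu(const struct run_segment *seg, int nr,
				     uint64_t sample_period)
{
	uint64_t period_left[NR_CPUS];
	uint64_t elapsed = 0;
	int i;

	for (i = 0; i < NR_CPUS; i++)
		period_left[i] = sample_period;

	for (i = 0; i < nr; i++) {
		uint64_t *left = &period_left[seg[i].cpu];

		if (seg[i].runtime_ns >= *left)
			return elapsed + *left;
		*left -= seg[i].runtime_ns;	/* progress stays on this CPU */
		elapsed += seg[i].runtime_ns;
	}
	return 0;
}

/* Same walk, but with a single period_left that follows the task. */
static uint64_t first_sample_per_task(const struct run_segment *seg, int nr,
				      uint64_t sample_period)
{
	uint64_t left = sample_period;
	uint64_t elapsed = 0;
	int i;

	for (i = 0; i < nr; i++) {
		if (seg[i].runtime_ns >= left)
			return elapsed + left;
		left -= seg[i].runtime_ns;	/* progress carried across CPUs */
		elapsed += seg[i].runtime_ns;
	}
	return 0;
}
```

With segments {CPU0, 0.6s} then {CPU1, 2.0s} and sample_period = 1.0s,
the per-CPU model returns 1.6s and the per-task model returns 1.0s,
matching the timeline above.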

The example above uses task-clock for clarity (sample_period is in
nanoseconds and period_left counts down with on-CPU time), but the
same mechanism applies to any per-task software sampling event: each
per-CPU object holds its own period_left and the partial progress on
the source CPU is not transferred to the destination CPU on migration.
What is lost on migration is the partial progress toward the next
sample, measured in the unit the event counts in (occurrences for
non-clock sw events, time for clock events).

`perf record --per-thread` avoids this by opening a single
perf_event with cpu=-1, but it also disables inheritance, so it
cannot sample threads spawned after the run starts. The default
per-CPU mode is the one that supports inheritance, and it is the
mode where the inconsistency above is observed.

The per-task period_left should remain consistent from the task's
point of view across migrations, and that is the motivation behind
this patch.

Thanks,
Minwoo
