public inbox for linux-kernel@vger.kernel.org
From: Alexey Budankov <alexey.budankov@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>, Kan Liang <kan.liang@intel.com>,
	Dmitri Prokhorov <Dmitry.Prohorov@intel.com>,
	Valery Cherepennikov <valery.cherepennikov@intel.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Stephane Eranian <eranian@google.com>,
	David Carrillo-Cisneros <davidcc@google.com>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v6 2/3]: perf/core: use context tstamp_data for skipped events on mux interrupt
Date: Thu, 3 Aug 2017 18:58:41 +0300
Message-ID: <06dc5615-a3ba-cc7b-d172-a13601ba4d4d@linux.intel.com>
In-Reply-To: <20170803140016.otzlyyszgpznksto@hirez.programming.kicks-ass.net>

On 03.08.2017 17:00, Peter Zijlstra wrote:
> On Wed, Aug 02, 2017 at 11:15:39AM +0300, Alexey Budankov wrote:
>> +struct perf_event_tstamp {
>> +	/*
>> +	 * These are timestamps used for computing total_time_enabled
>> +	 * and total_time_running when the event is in INACTIVE or
>> +	 * ACTIVE state, measured in nanoseconds from an arbitrary point
>> +	 * in time.
>> +	 * enabled: the notional time when the event was enabled
>> +	 * running: the notional time when the event was scheduled on
>> +	 * stopped: in INACTIVE state, the notional time when the
>> +	 *    event was scheduled off.
>> +	 */
>> +	u64 enabled;
>> +	u64 running;
>> +	u64 stopped;
>> +};
> 
> 
> So I have the below (untested) patch, also see:
> 
>   https://lkml.kernel.org/r/20170802171051.zlq5rgx3jqkkxpg7@hirez.programming.kicks-ass.net
> 
> And I don't think I fully agree with your description of running.

I copied this comment from its previous location without any changes.

> Despite its name tstamp_running is not in fact a time stamp afaict. It's
> more like an accumulator of running, but with an offset of stopped.

I see tstamp_running as a value that has to be subtracted from the current
timestamp, e.g. when update_context_time() is called, to get the event's
correct total times:

total_time_enabled = timestamp - enabled
total_time_running = timestamp - running

E.g. take a single thread with a single event running for 10 ticks on a
dual-core machine, spending half of that time on each core. Then we have:

For the event instance on the first core:

10 = total_time_enabled = timestamp[110] - enabled[100]
5  = total_time_running = timestamp[110] - running[100 + 1 + 1 + 1 + 1 + 1]

A "+ 1" above is added every time the event instance doesn't get through
perf_event_filter(), in particular when the instance is bound to a CPU
different from the one that schedules it.

So 5/10 = 0.5, i.e. the event was running on the first core 50% of the time.
The same holds for the second core.

Summing up the instances' times gives the value reported to the user:

50% (first core) + 50% (second core) = 100% of event run time, i.e. the
no-multiplexing case.

Without thread migration we would have:

For the thread running on the first core:

10 = total_time_enabled = timestamp[110] - enabled[100]
10 = total_time_running = timestamp[110] - running[100]

10/10 = 1, i.e. 100%

For the second core:

10 = total_time_enabled = timestamp[110] - enabled[100]
0  = total_time_running = timestamp[110] - running[100 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1]

0/10 = 0, i.e. 0%

100% + 0% == 100% of event run time
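
The two scenarios above can be replayed in a small userspace model. This is
only a sketch under assumptions: struct tstamp_model, model_tick() and
model_run() are hypothetical names that do not exist in the kernel, one loop
iteration stands for one tick, and a skipped instance is modelled by bumping
'running' by 1, as in the "+ 1" terms above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of one per-CPU event instance. */
struct tstamp_model {
	uint64_t enabled;	/* context time at which the event was enabled */
	uint64_t running;	/* 'enabled' plus one for every skipped tick */
};

/* One tick passes; a skipped (filtered-out) instance advances 'running'. */
static void model_tick(struct tstamp_model *t, int scheduled)
{
	if (!scheduled)
		t->running += 1;
}

/* total_time_enabled = timestamp - enabled */
static uint64_t model_time_enabled(const struct tstamp_model *t, uint64_t now)
{
	assert(now >= t->enabled);
	return now - t->enabled;
}

/* total_time_running = timestamp - running */
static uint64_t model_time_running(const struct tstamp_model *t, uint64_t now)
{
	assert(now >= t->running);
	return now - t->running;
}

/*
 * Run 'ticks' ticks starting at context time 'start'. The thread sits on
 * CPU 0 for the first 'ticks_on_cpu0' ticks and on CPU 1 afterwards.
 * Returns the final context time.
 */
static uint64_t model_run(struct tstamp_model cpu[2], uint64_t start,
			  unsigned int ticks, unsigned int ticks_on_cpu0)
{
	uint64_t now = start;

	cpu[0].enabled = cpu[0].running = start;
	cpu[1].enabled = cpu[1].running = start;
	for (unsigned int i = 0; i < ticks; i++, now++) {
		int on_cpu0 = i < ticks_on_cpu0;

		model_tick(&cpu[0], on_cpu0);
		model_tick(&cpu[1], !on_cpu0);
	}
	return now;
}
```

In this model, model_run(cpu, 100, 10, 5) corresponds to the migration case
(5 of 10 ticks running on each core) and model_run(cpu, 100, 10, 10) to the
case without migration (10 ticks on the first core, 0 on the second).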

From this perspective the tstamp_running field does indeed accumulate some
time, but it is more like tstamp_eligible_to_run, so:

	total_time_running == elapsed - tstamp_eligible_to_run

> 
> I'm always completely confused by the way this timekeeping is done.
> 
> ---
> Subject: perf: Fix time on IOC_ENABLE
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Thu Aug 3 15:42:09 CEST 2017
> 
> Vince reported that when we do IOC_ENABLE/IOC_DISABLE while the task
> is in SIGSTOP'ed state the timestamps go wobbly.
> 
> It turns out we indeed fail to correctly account time while in 'OFF'
> state and doing IOC_ENABLE without getting scheduled in exposes the
> problem.
> 
> Further thinking about this problem, it occurred to me that we can
> suffer a similar fate when we migrate an uncore event between CPUs.
> The perf_event_install() on the 'new' CPU will do add_event_to_ctx()
> which will reset all the time stamps, causing a subsequent
> update_event_times() to overwrite the total_time_* fields with smaller
> values.
> 
> Reported-by: Vince Weaver <vincent.weaver@maine.edu>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/events/core.c |   36 +++++++++++++++++++++++++++++++-----
>  1 file changed, 31 insertions(+), 5 deletions(-)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2217,6 +2217,33 @@ static int group_can_go_on(struct perf_e
>  	return can_add_hw;
>  }
>  
> +/*
> + * Complement to update_event_times(). This computes the tstamp_* values to
> + * continue 'enabled' state from @now. And effectively discards the time
> + * between the prior tstamp_stopped and now (as we were in the OFF state, or
> + * just switched (context) time base).
> + *
> + * This further assumes '@event->state == INACTIVE' (we just came from OFF) and
> + * cannot have been scheduled in yet. And going into INACTIVE state means
> + * '@event->tstamp_stopped = @now'.
> + *
> + * Thus given the rules of update_event_times():
> + *
> + *   total_time_enabled = tstamp_stopped - tstamp_enabled
> + *   total_time_running = tstamp_stopped - tstamp_running
> + *
> + * We can insert 'tstamp_stopped == now' and reverse them to compute new
> + * tstamp_* values.
> + */
> +static void __perf_event_enable_time(struct perf_event *event, u64 now)
> +{
> +	WARN_ON_ONCE(event->state != PERF_EVENT_STATE_INACTIVE);
> +
> +	event->tstamp_stopped = now;
> +	event->tstamp_enabled = now - event->total_time_enabled;
> +	event->tstamp_running = now - event->total_time_running;
> +}
> +
>  static void add_event_to_ctx(struct perf_event *event,
>  			       struct perf_event_context *ctx)
>  {
> @@ -2224,9 +2251,7 @@ static void add_event_to_ctx(struct perf
>  
>  	list_add_event(event, ctx);
>  	perf_group_attach(event);
> -	event->tstamp_enabled = tstamp;
> -	event->tstamp_running = tstamp;
> -	event->tstamp_stopped = tstamp;
> +	__perf_event_enable_time(event, tstamp);
>  }
>  
>  static void ctx_sched_out(struct perf_event_context *ctx,
> @@ -2471,10 +2496,11 @@ static void __perf_event_mark_enabled(st
>  	u64 tstamp = perf_event_time(event);
>  
>  	event->state = PERF_EVENT_STATE_INACTIVE;
> -	event->tstamp_enabled = tstamp - event->total_time_enabled;
> +	__perf_event_enable_time(event, tstamp);
>  	list_for_each_entry(sub, &event->sibling_list, group_entry) {
> +		/* XXX should not be > INACTIVE if event isn't */
>  		if (sub->state >= PERF_EVENT_STATE_INACTIVE)
> -			sub->tstamp_enabled = tstamp - sub->total_time_enabled;
> +			__perf_event_enable_time(sub, tstamp);
>  	}
>  }
>  
> 
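
The rebasing done by the quoted __perf_event_enable_time() can be illustrated
with a minimal userspace sketch. struct event_model and the model_*() helpers
are hypothetical stand-ins, not kernel code: model_update_times() only encodes
the two rules quoted in the patch comment for an INACTIVE event, and
model_reset_time() mimics the three assignments the patch removes from
add_event_to_ctx().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the time fields of an event. */
struct event_model {
	uint64_t total_time_enabled;
	uint64_t total_time_running;
	uint64_t tstamp_enabled;
	uint64_t tstamp_running;
	uint64_t tstamp_stopped;
};

/* The two rules quoted in the patch comment, for an INACTIVE event. */
static void model_update_times(struct event_model *e)
{
	e->total_time_enabled = e->tstamp_stopped - e->tstamp_enabled;
	e->total_time_running = e->tstamp_stopped - e->tstamp_running;
}

/* Same arithmetic as the quoted __perf_event_enable_time(). */
static void model_enable_time(struct event_model *e, uint64_t now)
{
	/* Assumption of this model: no underflow of the unsigned stamps. */
	assert(now >= e->total_time_enabled);
	assert(now >= e->total_time_running);

	e->tstamp_stopped = now;
	e->tstamp_enabled = now - e->total_time_enabled;
	e->tstamp_running = now - e->total_time_running;
}

/* The assignments removed by the patch: every stamp is reset to 'now'. */
static void model_reset_time(struct event_model *e, uint64_t now)
{
	e->tstamp_enabled = now;
	e->tstamp_running = now;
	e->tstamp_stopped = now;
}
```

In this model, an event with totals of 10 and 5 keeps them across
model_enable_time() followed by model_update_times(), whereas
model_reset_time() followed by model_update_times() overwrites both totals
with 0, which matches the "smaller values" problem described in the changelog.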
