From: Alexey Budankov <alexey.budankov@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>, Kan Liang <kan.liang@intel.com>,
	Dmitri Prokhorov <Dmitry.Prohorov@intel.com>,
	Valery Cherepennikov <valery.cherepennikov@intel.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Stephane Eranian <eranian@google.com>,
	David Carrillo-Cisneros <davidcc@google.com>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v6 2/3]: perf/core: use context tstamp_data for skipped events on mux interrupt
Date: Thu, 10 Aug 2017 18:57:43 +0300	[thread overview]
Message-ID: <c052d929-db1d-8b31-4a62-07563d48063b@linux.intel.com> (raw)
In-Reply-To: <20170803150052.za2vofyqfgarukdr@hirez.programming.kicks-ass.net>

[-- Attachment #1: Type: text/plain, Size: 6163 bytes --]

Well, Ok.

I re-implemented this patch of the patch set and also implemented a unit
test that, IMHO, mimics the case mentioned above, to check whether it solves the issue.

The test is a single-thread application that creates 272 fds for each of
two INSTRUCTIONS_RETIRED events, covering the 272 cores of an Intel Xeon Phi.

The test executes one million instructions several times while the events
are enabled and reads the events' counts, TOTAL_ENABLED and TOTAL_RUNNING,
via the read() system call.
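With this read_format (PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING,
no grouping), each read() returns three u64 values. A minimal sketch of parsing
them (struct and helper names are mine, not from the test; a valid perf fd is assumed):

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Layout of the read() buffer for read_format =
 * PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
 * on a single (non-grouped) event. */
struct read_data {
	uint64_t value;         /* event count */
	uint64_t time_enabled;  /* ns the event was enabled */
	uint64_t time_running;  /* ns the event was actually on the PMU */
};

/* Read one counter and print its scale; returns -1 on a short read. */
static int read_one(int fd)
{
	struct read_data rd;

	if (read(fd, &rd, sizeof(rd)) != (ssize_t)sizeof(rd))
		return -1;

	if (rd.time_enabled)
		printf("count=%llu scale=%.2f%%\n",
		       (unsigned long long)rd.value,
		       100.0 * (double)rd.time_running /
		       (double)rd.time_enabled);
	return 0;
}
```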

The first event is allocated, enabled and disabled at the beginning and
the end of the test execution phase, whereas the second event's allocation and
measurements are included into the measurement interval of the first event.

Below is what I am getting now when running the test on the patched kernel:

EID	CPU	COUNT		T_ENABLED	T_RUNNING	 SCALE
--- 0 enabled ---
--- million instructions ---
0	26	1334		1666671		137366		  8.24
0	94	138637		1735872		527098		 30.37
0	162	874166		1823695		1083785		 59.43
--- million instructions ---
0	26	1334		3328832		137366		  4.13
0	94	1164146		3318599		2109825		 63.58
0	162	874166		3390716		1083785		 31.96
--- million instructions ---
0	26	1334		4835955		137366		  2.84
0	94	2189671		4820750		3611976		 74.93
0	162	874166		4934513		1083785		 21.96
--- 1 enabled ---
--- million instructions ---
0	26	1334		22661310	137366		  0.61
0	94	3248450		22667165	21458391	 94.67
0	162	874166		22742990	1083785		  4.77
1	94	1033387		2150307		2150307		100.00
--- million instructions ---
0	26	1334		24878504	137366		  0.55
0	94	4289784		24869644	23660870	 95.14
0	162	874166		24943564	1083785		  4.34
1	94	2074675		4370708		4370708		100.00
--- 1 disabled ---
--- million instructions ---
0	26	1334		27681611	137366		  0.50
0	94	5337752		27675968	26467194	 95.63
0	162	874166		27749528	1083785		  3.91
1	94	2089278		5024218		5024218		100.00
--- 0 disabled ---
--- million instructions ---
0	26	1334		29004868	137366		  0.47
0	94	5372734		28964075	27755301	 95.83
0	162	874166		29005751	1083785		  3.74
1	94	2089278		5024218		5024218		100.00

The output demonstrates that the test thread migrated twice during execution,
so several fds were employed for measuring the amount of executed instructions.

Also the output shows that the T_RUNNING values of the events are updated and
maintained so that the sum of the SCALE values for every event stays near 100%
(no multiplexing) after every million-instruction run.
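When SCALE does drop below 100% (i.e. the event was multiplexed), the standard
way to estimate the full count from these two timestamps is to scale by
time_enabled/time_running; a hypothetical helper, not part of the patch or test:

```c
#include <stdint.h>

/* Extrapolate a multiplexed count: the PMU only counted while the
 * event was scheduled (time_running), so scale the raw count up to
 * the whole enabled window. */
static uint64_t scaled_count(uint64_t count,
			     uint64_t time_enabled,
			     uint64_t time_running)
{
	if (time_running == 0)
		return 0;
	return (uint64_t)((double)count * (double)time_enabled /
			  (double)time_running);
}
```

For example, a count of 100 observed while running for half of the enabled
window extrapolates to 200.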

Unit test code is attached for convenience.

The key thing in the patch is the explicit updating of the tstamp fields for
INACTIVE events in update_event_times().

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
kernel/events/core.c | 52 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0a4f619..d195fdc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1391,6 +1391,27 @@ static u64 perf_event_time(struct perf_event *event)
 	return ctx ? ctx->time : 0;
 }
 
+void perf_event_tstamp_init(struct perf_event *event)
+{
+	u64 tstamp = perf_event_time(event);
+
+	event->tstamp_enabled = tstamp;
+	event->tstamp_running = tstamp;
+	event->tstamp_stopped = tstamp;
+}
+
+void perf_event_tstamp_update(struct perf_event *event)
+{
+	u64 tstamp, delta;
+
+	tstamp = perf_event_time(event);
+
+	delta = tstamp - event->tstamp_stopped;
+
+	event->tstamp_running += delta;
+	event->tstamp_stopped = tstamp;
+}
+
 /*
  * Update the total_time_enabled and total_time_running fields for a event.
  */
@@ -1405,6 +1426,9 @@ static void update_event_times(struct perf_event *event)
 	    event->group_leader->state < PERF_EVENT_STATE_INACTIVE)
 		return;
 
+	if (event->state == PERF_EVENT_STATE_INACTIVE)
+		perf_event_tstamp_update(event);
+
 	/*
 	 * in cgroup mode, time_enabled represents
 	 * the time the event was enabled AND active
@@ -1430,7 +1454,6 @@ static void update_event_times(struct perf_event *event)
 		run_end = perf_event_time(event);
 
 	event->total_time_running = run_end - event->tstamp_running;
-
 }
 
 /*
@@ -1954,9 +1977,6 @@ event_sched_out(struct perf_event *event,
 		  struct perf_cpu_context *cpuctx,
 		  struct perf_event_context *ctx)
 {
-	u64 tstamp = perf_event_time(event);
-	u64 delta;
-
 	WARN_ON_ONCE(event->ctx != ctx);
 	lockdep_assert_held(&ctx->lock);
 
@@ -1967,18 +1987,15 @@ event_sched_out(struct perf_event *event,
 	 * via read() for time_enabled, time_running:
 	 */
 	if (event->state == PERF_EVENT_STATE_INACTIVE &&
-	    !event_filter_match(event)) {
-		delta = tstamp - event->tstamp_stopped;
-		event->tstamp_running += delta;
-		event->tstamp_stopped = tstamp;
-	}
+	    !event_filter_match(event))
+		perf_event_tstamp_update(event);
 
 	if (event->state != PERF_EVENT_STATE_ACTIVE)
 		return;
 
 	perf_pmu_disable(event->pmu);
 
-	event->tstamp_stopped = tstamp;
+	event->tstamp_stopped = perf_event_time(event);
 	event->pmu->del(event, 0);
 	event->oncpu = -1;
 	event->state = PERF_EVENT_STATE_INACTIVE;
@@ -2294,7 +2311,6 @@ group_sched_in(struct perf_event *group_event,
 {
 	struct perf_event *event, *partial_group = NULL;
 	struct pmu *pmu = ctx->pmu;
-	u64 now = ctx->time;
 	bool simulate = false;
 
 	if (group_event->state == PERF_EVENT_STATE_OFF)
@@ -2340,12 +2356,10 @@ group_sched_in(struct perf_event *group_event,
 		if (event == partial_group)
 			simulate = true;
 
-		if (simulate) {
-			event->tstamp_running += now - event->tstamp_stopped;
-			event->tstamp_stopped = now;
-		} else {
+		if (simulate)
+			perf_event_tstamp_update(event);
+		else
 			event_sched_out(event, cpuctx, ctx);
-		}
 	}
 	event_sched_out(group_event, cpuctx, ctx);
 
@@ -2390,13 +2404,9 @@ static int group_can_go_on(struct perf_event *event,
 static void add_event_to_ctx(struct perf_event *event,
 			       struct perf_event_context *ctx)
 {
-	u64 tstamp = perf_event_time(event);
-
 	list_add_event(event, ctx);
 	perf_group_attach(event);
-	event->tstamp_enabled = tstamp;
-	event->tstamp_running = tstamp;
-	event->tstamp_stopped = tstamp;
+	perf_event_tstamp_init(event);
 }
 
 static void ctx_sched_out(struct perf_event_context *ctx,

[-- Attachment #2: check_multiplexing_read.c --]
[-- Type: text/plain, Size: 3294 bytes --]

/* check_multiplexing_read.c						*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <unistd.h>
#include <errno.h>

#include <sys/mman.h>

#include <sys/ioctl.h>
#include <asm/unistd.h>

#include "perf_event.h"
#include "test_utils.h"
#include "perf_helpers.h"
#include "instructions_testcode.h"

#define NUM_CPUS 272
#define NUM_EVENTS 2

int fd[NUM_EVENTS][NUM_CPUS];
long long events[NUM_EVENTS]={
	PERF_COUNT_HW_CPU_CYCLES,
	PERF_COUNT_HW_INSTRUCTIONS
};

static long long base_results[NUM_CPUS][3];

#define TIME_ENABLED	1
#define TIME_RUNNING	2

static int test_routine(void) {

	int i,result;

	printf("--- million instructions ---\n");

	for(i=0;i<1;i++) {
		result=instructions_million();
	}

	return result;
}

int alloc_events(long long config, int fd[NUM_CPUS])
{
	int j;

	struct perf_event_attr pe;

	for(j=0;j<NUM_CPUS;j++) {
		memset(&pe,0,sizeof(struct perf_event_attr));
		pe.type=PERF_TYPE_HARDWARE;
		pe.size=sizeof(struct perf_event_attr);
		pe.config=config;
		pe.disabled=1;
		pe.exclude_kernel=1;
		pe.exclude_hv=1;
		pe.read_format=PERF_FORMAT_TOTAL_TIME_ENABLED |
				PERF_FORMAT_TOTAL_TIME_RUNNING;

		fd[j]=perf_event_open(&pe,0,j,-1,0);
		if (fd[j]<0) {
			fprintf(stderr,"Failed adding mpx event on cpu %d: %s\n",
				j,strerror(errno));
			return -1;
		}
	}

	return 0;
}

void free_events(int fd[NUM_CPUS])
{
	int j;
	for(j=0;j<NUM_CPUS;j++) {
		close(fd[j]);
	}
}

void enable_events(int fd[NUM_CPUS])
{
	int j,ret;
	for(j=0;j<NUM_CPUS;j++) {
		ret=ioctl(fd[j], PERF_EVENT_IOC_ENABLE,0);
		if (ret<0) {
			fprintf(stderr,"Error starting event fd[%d]\n",j);
		}
	}
}

void disable_events(int fd[NUM_CPUS])
{
	int j,ret;
	for(j=0;j<NUM_CPUS;j++) {
		ret=ioctl(fd[j], PERF_EVENT_IOC_DISABLE,0);
		if (ret<0) {
			fprintf(stderr,"Error stopping event fd[%d]\n",j);
		}
	}
}

void read_events(int i, int fd[NUM_CPUS])
{
	int j, ret;
	for(j=0;j<NUM_CPUS;j++) {
		ret=read(fd[j],&base_results[j],3*sizeof(long long));
		if (ret<(int)(3*sizeof(long long))) {
			fprintf(stderr,"Event fd[%d][%d] unexpected read size %d\n",i,j,ret);
			return;
		}
		if (base_results[j][0])
			printf("%d\t%d\t%lld\t\t\t%lld\t\t\t%lld\t\t\t%.2f\n", i, j,
				base_results[j][0],
				base_results[j][TIME_ENABLED],
				base_results[j][TIME_RUNNING],
				(double)base_results[j][TIME_RUNNING]/
				(double)base_results[j][TIME_ENABLED] * 100.);
	}
}

int main(int argc, char** argv) {

	char test_string[]="Testing ...";

	printf("\nEID\tCPU\tCOUNT\t\t\tT_ENABLED\t\t\tT_RUNNING\t\t\tSCALE\n");

	alloc_events(events[1], fd[0]);
	enable_events(fd[0]);
	printf("--- 0 enabled\n");

	test_routine();
	read_events(0, fd[0]);

	test_routine();
	read_events(0, fd[0]);

	test_routine();
	read_events(0, fd[0]);

	alloc_events(events[1], fd[1]);
	enable_events(fd[1]);
	printf("--- 1 enabled\n");

	test_routine();
	read_events(0, fd[0]);
	read_events(1, fd[1]);

	test_routine();
	read_events(0, fd[0]);
	read_events(1, fd[1]);

	disable_events(fd[1]);
	printf("--- 1 disabled\n");

	test_routine();
	read_events(0,fd[0]);
	read_events(1,fd[1]);

	disable_events(fd[0]);
	printf("--- 0 disabled\n");
	test_routine();

	read_events(0,fd[0]);
	read_events(1,fd[1]);

	free_events(fd[1]);
	free_events(fd[0]);

	test_pass(test_string);

	return 0;
}

