[BUG] perf_event: when events are grouped, the time enabled / running values are incorrect

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect
@ 2010-05-08  1:56 Corey Ashford
  2010-05-08  2:24 ` Corey Ashford
  2010-05-11 14:42 ` Peter Zijlstra
  0 siblings, 2 replies; 15+ messages in thread
From: Corey Ashford @ 2010-05-08  1:56 UTC (permalink / raw)
  To: LKML, Peter Zijlstra, Paul Mackerras; +Cc: Stephane Eranian

[-- Attachment #1: Type: text/plain, Size: 895 bytes --]

Hi,

There appears to be a bug in the kernel related to reading up the time 
running and enabled values for events that are in a group.  The group 
leader's time running and time enable values look correct, but all of 
the other group members have a zero value for their time running and 
time enabled fields.

This happens only when remote monitoring a process (perhaps only after 
it has terminated)... when self monitoring, the time running/enabled 
values come out non-zero, and the values are the same for all of the 
counters (as one would expect since they can be enabled/disabled 
simultaneously).

I've attached a test case which you can place in the tools/perf 
subdirectory and compile with just:

gcc -o show_re_bug show_re_bug.c

Thanks for your consideration,

- Corey

Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR
cjashfor@us.ibm.com

[-- Attachment #2: show_re_bug.c --]
[-- Type: text/plain, Size: 6356 bytes --]

/*
 * task_inherit.c - example of a task counting event in a tree of child processes
 *
 * Copyright (c) 2009 Google, Inc
 * Contributed by Stephane Eranian <eranian@gmail.com>
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 * of the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
 * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * This test case is a cut down of the program perf_examples/task from libpfm4.
 */
#include <sys/types.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdarg.h>

#include <sys/wait.h>
#include "perf.h"

typedef struct {
        struct perf_event_attr hw;
        uint64_t value, prev_value;
        uint64_t enabled, running;
        char *name;
        int fd;
} perf_event_desc_t;

int
child(char **arg)
{
	/*
	 * execute the requested command
	 */
	execvp(arg[0], arg);
	fprintf(stderr, "cannot exec: %s\n", arg[0]);
	exit(1);
	/* not reached */
}

static inline uint64_t
perf_scale(uint64_t *values)
{
	uint64_t res = 0;

	if (values[2])
		res = (uint64_t)((double)values[0] * values[1]/values[2]);
	return res;
}


static void
read_group(perf_event_desc_t *fds, int num)
{
	uint64_t *values;
	size_t sz;
	int i, ret;

	/*
	 * 	{ u64		nr;
	 * 	  { u64		time_enabled; } && PERF_FORMAT_ENABLED
	 * 	  { u64		time_running; } && PERF_FORMAT_RUNNING
	 * 	  { u64		value;
	 * 	    { u64	id;           } && PERF_FORMAT_ID
	 * 	  }		cntr[nr];
	 * 	} && PERF_FORMAT_GROUP
	 *
	 * we do not use FORMAT_ID in this program
	 */
	sz = sizeof(uint64_t) * 3;
	values = malloc(sz);
	if (!values)
		fprintf(stderr, "cannot allocate memory for values\n");

	for (i = 0; i < num; i++) {
		ret = read(fds[i].fd, values, sz);
		if (ret != sz) { /* unsigned */
			if (ret == -1)
				fprintf(stderr, "cannot read values\n");
			else	/* likely pinned and could not be loaded */
				fprintf(stderr, "Error! tried to read %d bytes, but got %d\n", (int)sz, ret);
		}
		printf("values[0] = %llx\n", values[0]);
		printf("values[1] = %llx\n", values[1]);
		printf("values[2] = %llx\n", values[2]);
	}
	free(values);
	return;

	/*
	 * propagate to save area
	 */
	for(i = 0; i < num; i++) {
		values[0] = values[3+i];
		/*
		 * scaling because we may be sharing the PMU and
		 * thus may be multiplexed
		 */
		fds[i].prev_value = fds[i].value;
		fds[i].value = perf_scale(values);
		fds[i].enabled = values[1];
		fds[i].running = values[2];
	}
	free(values);
}


static void
print_counts(perf_event_desc_t *fds, int num)
{
	int i;


	read_group(fds, num);
	for(i = 0; i < num; i++) {
		double ratio;
		uint64_t val;

		val = fds[i].value - fds[i].prev_value;

		ratio = 0.0;
		if (fds[i].enabled)
			ratio = 1.0 * fds[i].running / fds[i].enabled;

		if (ratio == 1.0)
			printf("%'20"PRIu64" %s (%'"PRIu64" : %'"PRIu64")\n", val, fds[i].name, fds[i].enabled, fds[i].running);
		else
			if (ratio == 0.0)
				printf("%'20"PRIu64" %s (did not run: incompatible events, too many events in a group, competing session)\n", val, fds[i].name);
			else
				printf("%'20"PRIu64" %s (scaled from %.2f%% of time)\n", val, fds[i].name, ratio*100.0);

	}
}

int
parent(char **arg)
{
	perf_event_desc_t *fds;
	int status, ret, i, num;
	int ready[2], go[2];
	char buf;
	pid_t pid;

	num = 2;
	fds = calloc(num, sizeof(perf_event_desc_t));

	fds[0].name = "PERF_COUNT_HW_CPU_CYCLES";
	fds[0].hw.type = PERF_TYPE_HARDWARE;
        fds[0].hw.size = sizeof(struct perf_event_attr);
	fds[0].hw.config = PERF_COUNT_HW_CPU_CYCLES;
	fds[1].name = "PERF_COUNT_HW_INSTRUCTIONS";
        fds[1].hw.size = sizeof(struct perf_event_attr);
	fds[1].hw.type = PERF_TYPE_HARDWARE;
	fds[1].hw.config = PERF_COUNT_HW_INSTRUCTIONS;

	ret = pipe(ready);
	if (ret)
		fprintf(stderr, "cannot create pipe ready");

	ret = pipe(go);
	if (ret)
		fprintf(stderr, "cannot create pipe go");


	/*
	 * Create the child task
	 */
	if ((pid = fork()) == -1)
		fprintf(stderr, "Cannot fork process");

	/*
	 * and launch the child code
	 *
	 * The pipe is used to avoid a race condition
	 * between for() and exec(). We need the pid
	 * of the new tak but we want to start measuring
	 * at the first user level instruction. Thus we
	 * need to prevent exec until we have attached
	 * the events.
	 */
	if (pid == 0) {
		close(ready[0]);
		close(go[1]);

		/*
		 * let the parent know we exist
		 */
		close(ready[1]);
		if (read(go[0], &buf, 1) == -1)
			fprintf(stderr, "unable to read go_pipe");


		exit(child(arg));
	}

	close(ready[1]);
	close(go[0]);

	if (read(ready[0], &buf, 1) == -1)
		fprintf(stderr, "unable to read child_ready_pipe");

	close(ready[0]);

	fds[0].fd = -1;
	//fds[0].hw.read_format = PERF_FORMAT_GROUP;
	for(i = 0; i < num; i++) {
		/*
		 * create leader disabled with enable_on-exec
		 */
		fds[i].hw.disabled = (i == 0);
		fds[i].hw.enable_on_exec = (i == 0);
		fds[i].hw.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
			 PERF_FORMAT_TOTAL_TIME_RUNNING;

		fds[i].fd = sys_perf_event_open(&fds[i].hw, pid, -1, fds[0].fd, 0);
		if (fds[i].fd == -1) {
			warn("cannot attach event %d %s", i, fds[i].name);
			goto error;
		}
	}	

	close(go[1]);

	waitpid(pid, &status, 0);

	print_counts(fds, num);

	for(i = 0; i < num; i++)
		close(fds[i].fd);

	free(fds);
	return 0;
error:
	free(fds);
	kill(SIGKILL, pid);
	return -1;
}


int
main(int argc, char **argv)
{
	int c;

	if (argc < 2) {
		fprintf(stderr, "you must specify a command to execute\n");
	} 

	return parent(&argv[1]);
}

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect
  2010-05-08  1:56 [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect Corey Ashford
@ 2010-05-08  2:24 ` Corey Ashford
  2010-05-11 14:42 ` Peter Zijlstra
  1 sibling, 0 replies; 15+ messages in thread
From: Corey Ashford @ 2010-05-08  2:24 UTC (permalink / raw)
  To: LKML, Peter Zijlstra, Paul Mackerras; +Cc: Stephane Eranian

On 05/07/2010 06:56 PM, Corey Ashford wrote:
> Hi,
>
> There appears to be a bug in the kernel related to reading up the time
> running and enabled values for events that are in a group. The group
> leader's time running and time enable values look correct, but all of
> the other group members have a zero value for their time running and
> time enabled fields.
>
> This happens only when remote monitoring a process (perhaps only after
> it has terminated)... when self monitoring, the time running/enabled
> values come out non-zero, and the values are the same for all of the
> counters (as one would expect since they can be enabled/disabled
> simultaneously).
>
> I've attached a test case which you can place in the tools/perf
> subdirectory and compile with just:
>
> gcc -o show_re_bug show_re_bug.c

I forgot to mention that you should run this with some sort of the load 
program, like

./show_re_bug md5sum `which gdb`

- Corey

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect
  2010-05-08  1:56 [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect Corey Ashford
  2010-05-08  2:24 ` Corey Ashford
@ 2010-05-11 14:42 ` Peter Zijlstra
  2010-05-11 15:43   ` [tip:perf/core] perf: Fix exit() vs event-groups tip-bot for Peter Zijlstra
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Peter Zijlstra @ 2010-05-11 14:42 UTC (permalink / raw)
  To: Corey Ashford; +Cc: LKML, Paul Mackerras, Stephane Eranian

On Fri, 2010-05-07 at 18:56 -0700, Corey Ashford wrote:
> Hi,
> 
> There appears to be a bug in the kernel related to reading up the time 
> running and enabled values for events that are in a group.  The group 
> leader's time running and time enable values look correct, but all of 
> the other group members have a zero value for their time running and 
> time enabled fields.
> 
> This happens only when remote monitoring a process (perhaps only after 
> it has terminated)... when self monitoring, the time running/enabled 
> values come out non-zero, and the values are the same for all of the 
> counters (as one would expect since they can be enabled/disabled 
> simultaneously).
> 
> I've attached a test case which you can place in the tools/perf 
> subdirectory and compile with just:
> 
> gcc -o show_re_bug show_re_bug.c

The below seems to fix this for me.

---
Subject: perf: Fix exit() vs event-groups
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Tue May 11 16:19:10 CEST 2010

Corey reported that the value scale times of group siblings are not
updated when the monitored task dies.

The problem appears to be that we only update the group leader's
time values, fix it by updating the whole group.

Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: stable@kernel.org
---
 kernel/perf_event.c |   44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -255,6 +255,18 @@ static void update_event_times(struct pe
 	event->total_time_running = run_end - event->tstamp_running;
 }
 
+/*
+ * Update total_time_enabled and total_time_running for all events in a group.
+ */
+static void update_group_times(struct perf_event *leader)
+{
+	struct perf_event *event;
+
+	update_event_times(leader);
+	list_for_each_entry(event, &leader->sibling_list, group_entry)
+		update_event_times(event);
+}
+
 static struct list_head *
 ctx_group_list(struct perf_event *event, struct perf_event_context *ctx)
 {
@@ -320,7 +332,7 @@ list_del_event(struct perf_event *event,
 	if (event->group_leader != event)
 		event->group_leader->nr_siblings--;
 
-	update_event_times(event);
+	update_group_times(event);
 
 	/*
 	 * If event was in error state, then keep it
@@ -502,18 +514,6 @@ retry:
 }
 
 /*
- * Update total_time_enabled and total_time_running for all events in a group.
- */
-static void update_group_times(struct perf_event *leader)
-{
-	struct perf_event *event;
-
-	update_event_times(leader);
-	list_for_each_entry(event, &leader->sibling_list, group_entry)
-		update_event_times(event);
-}
-
-/*
  * Cross CPU call to disable a performance event
  */
 static void __perf_event_disable(void *info)



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [tip:perf/core] perf: Fix exit() vs event-groups
  2010-05-11 14:42 ` Peter Zijlstra
@ 2010-05-11 15:43   ` tip-bot for Peter Zijlstra
  2010-05-11 19:55   ` [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect stephane eranian
  2010-05-12 17:25   ` Corey Ashford
  2 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Peter Zijlstra @ 2010-05-11 15:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, acme, paulus, hpa, mingo, a.p.zijlstra, efault,
	fweisbec, tglx, cjashfor, mingo

Commit-ID:  96c21a460a37880abfbc8445d5b098dbab958a29
Gitweb:     http://git.kernel.org/tip/96c21a460a37880abfbc8445d5b098dbab958a29
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 11 May 2010 16:19:10 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 11 May 2010 17:08:24 +0200

perf: Fix exit() vs event-groups

Corey reported that the value scale times of group siblings are not
updated when the monitored task dies.

The problem appears to be that we only update the group leader's
time values, fix it by updating the whole group.

Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@kernel.org> # .34.x
LKML-Reference: <1273588935.1810.6.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/perf_event.c |   26 +++++++++++++-------------
 1 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index c97e825..a4fa381 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -255,6 +255,18 @@ static void update_event_times(struct perf_event *event)
 	event->total_time_running = run_end - event->tstamp_running;
 }
 
+/*
+ * Update total_time_enabled and total_time_running for all events in a group.
+ */
+static void update_group_times(struct perf_event *leader)
+{
+	struct perf_event *event;
+
+	update_event_times(leader);
+	list_for_each_entry(event, &leader->sibling_list, group_entry)
+		update_event_times(event);
+}
+
 static struct list_head *
 ctx_group_list(struct perf_event *event, struct perf_event_context *ctx)
 {
@@ -320,7 +332,7 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	if (event->group_leader != event)
 		event->group_leader->nr_siblings--;
 
-	update_event_times(event);
+	update_group_times(event);
 
 	/*
 	 * If event was in error state, then keep it
@@ -502,18 +514,6 @@ retry:
 }
 
 /*
- * Update total_time_enabled and total_time_running for all events in a group.
- */
-static void update_group_times(struct perf_event *leader)
-{
-	struct perf_event *event;
-
-	update_event_times(leader);
-	list_for_each_entry(event, &leader->sibling_list, group_entry)
-		update_event_times(event);
-}
-
-/*
  * Cross CPU call to disable a performance event
  */
 static void __perf_event_disable(void *info)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [BUG] perf_event: when events are grouped, the time enabled /  running values are incorrect
  2010-05-11 14:42 ` Peter Zijlstra
  2010-05-11 15:43   ` [tip:perf/core] perf: Fix exit() vs event-groups tip-bot for Peter Zijlstra
@ 2010-05-11 19:55   ` stephane eranian
  2010-05-11 20:11     ` Peter Zijlstra
  2010-05-12 17:25   ` Corey Ashford
  2 siblings, 1 reply; 15+ messages in thread
From: stephane eranian @ 2010-05-11 19:55 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Corey Ashford, LKML, Paul Mackerras, eranian

Hi,

I believe there is another bug related to timing and I am not
sure if this patch fixes it too.

If you add TIME_ENABLED|TIME_RUNNING in samples, they
will get zero all the time (ENA, RUN):

$ libpfm/perf_examples/task_smpl noploop 4
period=240000000 freq=0
58  PERF_COUNT_HW_CPU_CYCLES
59  PERF_COUNT_HW_INSTRUCTIONS
noploop for 4 seconds

IIP:0x0000000000400651 PID:10613 TID:10613 TIME:27480525003765
STREAM_ID:58 PERIOD:240000000 ENA=0 RUN=0 NR=2
	240000257 PERF_COUNT_HW_CPU_CYCLES (58)
	239740143 PERF_COUNT_HW_INSTRUCTIONS (59)

IP:0x0000000000400651 PID:10613 TID:10613 TIME:27480626252349
STREAM_ID:58 PERIOD:240000000 ENA=0 RUN=0 NR=2
	480000257 PERF_COUNT_HW_CPU_CYCLES (58)
	479688363 PERF_COUNT_HW_INSTRUCTIONS (59)

Events are also grouped in this example. Both this same issue exists
also when only
one event is used. I suspect an update_event_times() or
update_group_times() is also
missing on the sampling path in perf_overflow_handler().


On Tue, May 11, 2010 at 4:42 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2010-05-07 at 18:56 -0700, Corey Ashford wrote:
>> Hi,
>>
>> There appears to be a bug in the kernel related to reading up the time
>> running and enabled values for events that are in a group.  The group
>> leader's time running and time enable values look correct, but all of
>> the other group members have a zero value for their time running and
>> time enabled fields.
>>
>> This happens only when remote monitoring a process (perhaps only after
>> it has terminated)... when self monitoring, the time running/enabled
>> values come out non-zero, and the values are the same for all of the
>> counters (as one would expect since they can be enabled/disabled
>> simultaneously).
>>
>> I've attached a test case which you can place in the tools/perf
>> subdirectory and compile with just:
>>
>> gcc -o show_re_bug show_re_bug.c
>
> The below seems to fix this for me.
>
> ---
> Subject: perf: Fix exit() vs event-groups
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Tue May 11 16:19:10 CEST 2010
>
> Corey reported that the value scale times of group siblings are not
> updated when the monitored task dies.
>
> The problem appears to be that we only update the group leader's
> time values, fix it by updating the whole group.
>
> Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: stable@kernel.org
> ---
>  kernel/perf_event.c |   44 +++++++++++++++++++++++---------------------
>  1 file changed, 23 insertions(+), 21 deletions(-)
>
> Index: linux-2.6/kernel/perf_event.c
> ===================================================================
> --- linux-2.6.orig/kernel/perf_event.c
> +++ linux-2.6/kernel/perf_event.c
> @@ -255,6 +255,18 @@ static void update_event_times(struct pe
>        event->total_time_running = run_end - event->tstamp_running;
>  }
>
> +/*
> + * Update total_time_enabled and total_time_running for all events in a group.
> + */
> +static void update_group_times(struct perf_event *leader)
> +{
> +       struct perf_event *event;
> +
> +       update_event_times(leader);
> +       list_for_each_entry(event, &leader->sibling_list, group_entry)
> +               update_event_times(event);
> +}
> +
>  static struct list_head *
>  ctx_group_list(struct perf_event *event, struct perf_event_context *ctx)
>  {
> @@ -320,7 +332,7 @@ list_del_event(struct perf_event *event,
>        if (event->group_leader != event)
>                event->group_leader->nr_siblings--;
>
> -       update_event_times(event);
> +       update_group_times(event);
>
>        /*
>         * If event was in error state, then keep it
> @@ -502,18 +514,6 @@ retry:
>  }
>
>  /*
> - * Update total_time_enabled and total_time_running for all events in a group.
> - */
> -static void update_group_times(struct perf_event *leader)
> -{
> -       struct perf_event *event;
> -
> -       update_event_times(leader);
> -       list_for_each_entry(event, &leader->sibling_list, group_entry)
> -               update_event_times(event);
> -}
> -
> -/*
>  * Cross CPU call to disable a performance event
>  */
>  static void __perf_event_disable(void *info)
>
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] perf_event: when events are grouped, the time enabled /  running values are incorrect
  2010-05-11 19:55   ` [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect stephane eranian
@ 2010-05-11 20:11     ` Peter Zijlstra
  2010-05-11 20:23       ` Stephane Eranian
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2010-05-11 20:11 UTC (permalink / raw)
  To: eranian; +Cc: Corey Ashford, LKML, Paul Mackerras, eranian

On Tue, 2010-05-11 at 21:55 +0200, stephane eranian wrote:
> Events are also grouped in this example. Both this same issue exists
> also when only
> one event is used. I suspect an update_event_times() or
> update_group_times() is also
> missing on the sampling path in perf_overflow_handler().

Hrmm, you're right, except fixing that is non-trivial.
update_event_times() wants to be called with ctx->lock held and we can't
do that from NMI context.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] perf_event: when events are grouped, the time enabled /  running values are incorrect
  2010-05-11 20:11     ` Peter Zijlstra
@ 2010-05-11 20:23       ` Stephane Eranian
  2010-05-11 20:27         ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Stephane Eranian @ 2010-05-11 20:23 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: eranian, Corey Ashford, LKML, Paul Mackerras

On Tue, May 11, 2010 at 10:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2010-05-11 at 21:55 +0200, stephane eranian wrote:
>> Events are also grouped in this example. Both this same issue exists
>> also when only
>> one event is used. I suspect an update_event_times() or
>> update_group_times() is also
>> missing on the sampling path in perf_overflow_handler().
>
> Hrmm, you're right, except fixing that is non-trivial.
> update_event_times() wants to be called with ctx->lock held and we can't
> do that from NMI context.
>
Why do you need the lock when the event is running on a CPU in NMI context?
What action issued remotely could modify the data used by this function without
the need for IPI?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] perf_event: when events are grouped, the time enabled /  running values are incorrect
  2010-05-11 20:23       ` Stephane Eranian
@ 2010-05-11 20:27         ` Peter Zijlstra
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2010-05-11 20:27 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: eranian, Corey Ashford, LKML, Paul Mackerras

On Tue, 2010-05-11 at 22:23 +0200, Stephane Eranian wrote:
> On Tue, May 11, 2010 at 10:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, 2010-05-11 at 21:55 +0200, stephane eranian wrote:
> >> Events are also grouped in this example. Both this same issue exists
> >> also when only
> >> one event is used. I suspect an update_event_times() or
> >> update_group_times() is also
> >> missing on the sampling path in perf_overflow_handler().
> >
> > Hrmm, you're right, except fixing that is non-trivial.
> > update_event_times() wants to be called with ctx->lock held and we can't
> > do that from NMI context.
> >
> Why do you need the lock when the event is running on a CPU in NMI context?
> What action issued remotely could modify the data used by this function without
> the need for IPI?

Not sure, would at the very least have to verify that. I'll try and have
a look.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect
  2010-05-11 14:42 ` Peter Zijlstra
  2010-05-11 15:43   ` [tip:perf/core] perf: Fix exit() vs event-groups tip-bot for Peter Zijlstra
  2010-05-11 19:55   ` [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect stephane eranian
@ 2010-05-12 17:25   ` Corey Ashford
  2010-05-12 17:50     ` Peter Zijlstra
  2010-05-13  0:10     ` Paul Mackerras
  2 siblings, 2 replies; 15+ messages in thread
From: Corey Ashford @ 2010-05-12 17:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Paul Mackerras, Stephane Eranian

On 05/11/2010 07:42 AM, Peter Zijlstra wrote:
> ---
> Subject: perf: Fix exit() vs event-groups
> From: Peter Zijlstra<a.p.zijlstra@chello.nl>
> Date: Tue May 11 16:19:10 CEST 2010
>
> Corey reported that the value scale times of group siblings are not
> updated when the monitored task dies.
>
> The problem appears to be that we only update the group leader's
> time values, fix it by updating the whole group.
>
> Reported-by: Corey Ashford<cjashfor@linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra<a.p.zijlstra@chello.nl>
> Cc: Paul Mackerras<paulus@samba.org>
> Cc: stable@kernel.org
> ---
>   kernel/perf_event.c |   44 +++++++++++++++++++++++---------------------
>   1 file changed, 23 insertions(+), 21 deletions(-)
>
> Index: linux-2.6/kernel/perf_event.c
> ===================================================================
> --- linux-2.6.orig/kernel/perf_event.c
> +++ linux-2.6/kernel/perf_event.c
> @@ -255,6 +255,18 @@ static void update_event_times(struct pe
>   	event->total_time_running = run_end - event->tstamp_running;
>   }
>
> +/*
> + * Update total_time_enabled and total_time_running for all events in a group.
> + */
...

Peter,

Thanks for looking into this problem!  Unfortunately, I've been unable 
to test your patch.  It doesn't apply to 2.6.33.3 and so I tried the 
-tip tree (which has this patch committed).  The kernel built fine, but 
crashed almost immediately during the boot-up process on my Power5 machine.

I don't have the crash details at hand, but they didn't look related to 
this perf change.

- Corey

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect
  2010-05-12 17:25   ` Corey Ashford
@ 2010-05-12 17:50     ` Peter Zijlstra
  2010-05-12 18:15       ` Corey Ashford
  2010-05-12 18:42       ` Corey Ashford
  2010-05-13  0:10     ` Paul Mackerras
  1 sibling, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2010-05-12 17:50 UTC (permalink / raw)
  To: Corey Ashford; +Cc: LKML, Paul Mackerras, Stephane Eranian

On Wed, 2010-05-12 at 10:25 -0700, Corey Ashford wrote:

> Thanks for looking into this problem!  Unfortunately, I've been unable 
> to test your patch.  It doesn't apply to 2.6.33.3 

Strange the code looks very similar, the below is a diff against a
virgin v2.6.33.3.

---
 kernel/perf_event.c |   26 +++++++++++++-------------
 1 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 32d0ae2..e8ae0c2 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -290,6 +290,18 @@ static void update_event_times(struct perf_event *event)
 }
 
 /*
+ * Update total_time_enabled and total_time_running for all events in a group.
+ */
+static void update_group_times(struct perf_event *leader)
+{
+	struct perf_event *event;
+
+	update_event_times(leader);
+	list_for_each_entry(event, &leader->sibling_list, group_entry)
+		update_event_times(event);
+}
+
+/*
  * Add a event from the lists for its context.
  * Must be called with ctx->mutex and ctx->lock held.
  */
@@ -337,7 +349,7 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	if (event->group_leader != event)
 		event->group_leader->nr_siblings--;
 
-	update_event_times(event);
+	update_group_times(event);
 
 	/*
 	 * If event was in error state, then keep it
@@ -508,18 +520,6 @@ retry:
 }
 
 /*
- * Update total_time_enabled and total_time_running for all events in a group.
- */
-static void update_group_times(struct perf_event *leader)
-{
-	struct perf_event *event;
-
-	update_event_times(leader);
-	list_for_each_entry(event, &leader->sibling_list, group_entry)
-		update_event_times(event);
-}
-
-/*
  * Cross CPU call to disable a performance event
  */
 static void __perf_event_disable(void *info)



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect
  2010-05-12 17:50     ` Peter Zijlstra
@ 2010-05-12 18:15       ` Corey Ashford
  2010-05-12 18:42       ` Corey Ashford
  1 sibling, 0 replies; 15+ messages in thread
From: Corey Ashford @ 2010-05-12 18:15 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Paul Mackerras, Stephane Eranian

Thanks, Peter,

I had copied and pasted the patch from the web gossamer-threads lkml archive (I didn't have access to my email at the time), and had problems with the patch program complaining that the patch was in improper format.  I couldn't see anything wrong with what it was complaining about, so I assumed that it was something to do with the code it was actually patching.  Probably a bad assumption in retrospect.

I will try your patch for 2.6.33.3, and since I have access to my email again, there probably won't be any problems.   Stay tuned :)

- Corey

On 5/12/2010 10:50 AM, Peter Zijlstra wrote:
> On Wed, 2010-05-12 at 10:25 -0700, Corey Ashford wrote:
> 
>> Thanks for looking into this problem!  Unfortunately, I've been unable 
>> to test your patch.  It doesn't apply to 2.6.33.3 
> 
> Strange the code looks very similar, the below is a diff against a
> virgin v2.6.33.3.
> 
> ---
>  kernel/perf_event.c |   26 +++++++++++++-------------
>  1 files changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> index 32d0ae2..e8ae0c2 100644
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -290,6 +290,18 @@ static void update_event_times(struct perf_event *event)
>  }
> 
>  /*
> + * Update total_time_enabled and total_time_running for all events in a group.
> + */
> +static void update_group_times(struct perf_event *leader)
> +{
> +	struct perf_event *event;
> +
> +	update_event_times(leader);
> +	list_for_each_entry(event, &leader->sibling_list, group_entry)
> +		update_event_times(event);
> +}
> +
> +/*
>   * Add a event from the lists for its context.
>   * Must be called with ctx->mutex and ctx->lock held.
>   */
> @@ -337,7 +349,7 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
>  	if (event->group_leader != event)
>  		event->group_leader->nr_siblings--;
> 
> -	update_event_times(event);
> +	update_group_times(event);
> 
>  	/*
>  	 * If event was in error state, then keep it
> @@ -508,18 +520,6 @@ retry:
>  }
> 
>  /*
> - * Update total_time_enabled and total_time_running for all events in a group.
> - */
> -static void update_group_times(struct perf_event *leader)
> -{
> -	struct perf_event *event;
> -
> -	update_event_times(leader);
> -	list_for_each_entry(event, &leader->sibling_list, group_entry)
> -		update_event_times(event);
> -}
> -
> -/*
>   * Cross CPU call to disable a performance event
>   */
>  static void __perf_event_disable(void *info)
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Regards,

- Corey

Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR 
503-578-3507 
cjashfor@us.ibm.com


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect
  2010-05-12 17:50     ` Peter Zijlstra
  2010-05-12 18:15       ` Corey Ashford
@ 2010-05-12 18:42       ` Corey Ashford
  1 sibling, 0 replies; 15+ messages in thread
From: Corey Ashford @ 2010-05-12 18:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Paul Mackerras, Stephane Eranian

Peter,

I just tested your 2.6.33.3-specific patch, and it works great; I was able to remove the work-around in the user-space tool.

Thanks,

- Corey

On 5/12/2010 10:50 AM, Peter Zijlstra wrote:
> On Wed, 2010-05-12 at 10:25 -0700, Corey Ashford wrote:
> 
>> Thanks for looking into this problem!  Unfortunately, I've been unable 
>> to test your patch.  It doesn't apply to 2.6.33.3 
> 
> Strange the code looks very similar, the below is a diff against a
> virgin v2.6.33.3.
> 
> ---
>  kernel/perf_event.c |   26 +++++++++++++-------------
>  1 files changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> index 32d0ae2..e8ae0c2 100644
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -290,6 +290,18 @@ static void update_event_times(struct perf_event *event)
>  }
> 
>  /*
> + * Update total_time_enabled and total_time_running for all events in a group.
> + */
> +static void update_group_times(struct perf_event *leader)
> +{
> +	struct perf_event *event;
> +
> +	update_event_times(leader);
> +	list_for_each_entry(event, &leader->sibling_list, group_entry)
> +		update_event_times(event);
> +}
> +
> +/*
>   * Add a event from the lists for its context.
>   * Must be called with ctx->mutex and ctx->lock held.
>   */
> @@ -337,7 +349,7 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
>  	if (event->group_leader != event)
>  		event->group_leader->nr_siblings--;
> 
> -	update_event_times(event);
> +	update_group_times(event);
> 
>  	/*
>  	 * If event was in error state, then keep it
> @@ -508,18 +520,6 @@ retry:
>  }
> 
>  /*
> - * Update total_time_enabled and total_time_running for all events in a group.
> - */
> -static void update_group_times(struct perf_event *leader)
> -{
> -	struct perf_event *event;
> -
> -	update_event_times(leader);
> -	list_for_each_entry(event, &leader->sibling_list, group_entry)
> -		update_event_times(event);
> -}
> -
> -/*
>   * Cross CPU call to disable a performance event
>   */
>  static void __perf_event_disable(void *info)
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Regards,

- Corey

Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR 
503-578-3507 
cjashfor@us.ibm.com


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect
  2010-05-12 17:25   ` Corey Ashford
  2010-05-12 17:50     ` Peter Zijlstra
@ 2010-05-13  0:10     ` Paul Mackerras
  2010-05-13 10:17       ` Paul Mackerras
  1 sibling, 1 reply; 15+ messages in thread
From: Paul Mackerras @ 2010-05-13  0:10 UTC (permalink / raw)
  To: Corey Ashford; +Cc: Peter Zijlstra, LKML, Stephane Eranian

On Wed, May 12, 2010 at 10:25:10AM -0700, Corey Ashford wrote:

> Thanks for looking into this problem!  Unfortunately, I've been
> unable to test your patch.  It doesn't apply to 2.6.33.3 and so I
> tried the -tip tree (which has this patch committed).  The kernel
> built fine, but crashed almost immediately during the boot-up
> process on my Power5 machine.

Ugh.  I'll take a look today at that.

Paul.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect
  2010-05-13  0:10     ` Paul Mackerras
@ 2010-05-13 10:17       ` Paul Mackerras
  2010-05-13 17:37         ` Corey Ashford
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Mackerras @ 2010-05-13 10:17 UTC (permalink / raw)
  To: Corey Ashford; +Cc: Peter Zijlstra, LKML, Stephane Eranian

On Thu, May 13, 2010 at 10:10:36AM +1000, Paul Mackerras wrote:

> Ugh.  I'll take a look today at that.

Current tip/perf/core boots up just fine for me on a POWER5+ partition
here.  If you're still seeing the problem, could you send me your
config?

Paul.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect
  2010-05-13 10:17       ` Paul Mackerras
@ 2010-05-13 17:37         ` Corey Ashford
  0 siblings, 0 replies; 15+ messages in thread
From: Corey Ashford @ 2010-05-13 17:37 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Peter Zijlstra, LKML, Stephane Eranian

On 05/13/2010 03:17 AM, Paul Mackerras wrote:
> On Thu, May 13, 2010 at 10:10:36AM +1000, Paul Mackerras wrote:
>
>> Ugh.  I'll take a look today at that.
>
> Current tip/perf/core boots up just fine for me on a POWER5+ partition
> here.  If you're still seeing the problem, could you send me your
> config?
>
> Paul.

I'll pull and give it another go today and send you the .config if it fails.

- Corey


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-05-13 17:37 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-08  1:56 [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect Corey Ashford
2010-05-08  2:24 ` Corey Ashford
2010-05-11 14:42 ` Peter Zijlstra
2010-05-11 15:43   ` [tip:perf/core] perf: Fix exit() vs event-groups tip-bot for Peter Zijlstra
2010-05-11 19:55   ` [BUG] perf_event: when events are grouped, the time enabled / running values are incorrect stephane eranian
2010-05-11 20:11     ` Peter Zijlstra
2010-05-11 20:23       ` Stephane Eranian
2010-05-11 20:27         ` Peter Zijlstra
2010-05-12 17:25   ` Corey Ashford
2010-05-12 17:50     ` Peter Zijlstra
2010-05-12 18:15       ` Corey Ashford
2010-05-12 18:42       ` Corey Ashford
2010-05-13  0:10     ` Paul Mackerras
2010-05-13 10:17       ` Paul Mackerras
2010-05-13 17:37         ` Corey Ashford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).