Re: [PATCH 09/10] Define PERF_PMU_TXN_READ interface

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, sparclinux@vger.kernel.org
Subject: Re: [PATCH 09/10] Define PERF_PMU_TXN_READ interface
Date: Tue, 11 Aug 2015 21:14:00 -0700	[thread overview]
Message-ID: <20150812041400.GA28426@us.ibm.com> (raw)
In-Reply-To: <20150806121029.GQ19282@twins.programming.kicks-ass.net>

Peter Zijlstra [peterz@infradead.org] wrote:
| On Sun, Jul 26, 2015 at 10:40:37PM -0700, Sukadev Bhattiprolu wrote:
| > @@ -3743,7 +3762,13 @@ static u64 perf_event_aggregate(struct perf_event *event, u64 *enabled,
| >  	lockdep_assert_held(&event->child_mutex);
| >  
| >  	list_for_each_entry(child, &event->child_list, child_list) {
| > +#if 0
| > +		/*
| > +		 * TODO: Do we need this read() for group events on PMUs that
| > +		 * 	 don't implement PERF_PMU_TXN_READ transactions?
| > +		 */
| >  		(void)perf_event_read(child, false);
| > +#endif
| >  		total += perf_event_count(child);
| >  		*enabled += child->total_time_enabled;
| >  		*running += child->total_time_running;
| 
| Aw gawd, I've been an idiot!!
| 
| I just realized this is a _CHILD_ loop, not a _SIBLING_ loop !!
| 
| We need to flip the loops in perf_read_group(), find attached two
| patches that go on top of 1,2,4.
| 
| After this you can add the perf_event_read() return value (just fold
| patches 6,8) after which you can do patch 10 (which has a broken
| Subject fwiw).

Thanks for the patches. I am building and testing, but have a question
on the second patch below:

<snip>


| Subject: perf: Invert perf_read_group() loops
| From: Peter Zijlstra <peterz@infradead.org>
| Date: Thu Aug 6 13:41:13 CEST 2015
| 
| In order to enable the use of perf_event_read(.group = true), we need
| to invert the sibling-child loop nesting of perf_read_group().
| 
| Currently we iterate the child list for each sibling, this precludes
| using group reads. Flip things around so we iterate each group for
| each child.
| 
| Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| ---
|  kernel/events/core.c |   84 ++++++++++++++++++++++++++++++++-------------------
|  1 file changed, 54 insertions(+), 30 deletions(-)
| 
| --- a/kernel/events/core.c
| +++ b/kernel/events/core.c
| @@ -3809,50 +3809,74 @@ u64 perf_event_read_value(struct perf_ev
|  }
|  EXPORT_SYMBOL_GPL(perf_event_read_value);
| 
| -static int perf_read_group(struct perf_event *event,
| -				   u64 read_format, char __user *buf)
| +static void __perf_read_group_add(struct perf_event *leader, u64 read_format, u64 *values)
|  {
| -	struct perf_event *leader = event->group_leader, *sub;
| -	struct perf_event_context *ctx = leader->ctx;
| -	int n = 0, size = 0, ret;
| -	u64 count, enabled, running;
| -	u64 values[5];
| +	struct perf_event *sub;
| +	int n = 1; /* skip @nr */

This n = 1 is to skip over the values[0] = 1 + nr_siblings in the
caller.

Anyway, in __perf_read_group_add() we always start with n = 1, however
...
| 
| -	lockdep_assert_held(&ctx->mutex);
| +	perf_event_read(leader, true);
| +
| +	/*
| +	 * Since we co-schedule groups, {enabled,running} times of siblings
| +	 * will be identical to those of the leader, so we only publish one
| +	 * set.
| +	 */
| +	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED) {
| +		values[n++] += leader->total_time_enabled +
| +			atomic64_read(leader->child_total_time_enabled);
| +	}
| 
| -	count = perf_event_read_value(leader, &enabled, &running);
| +	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING) {
| +		values[n++] += leader->total_time_running +
| +			atomic64_read(leader->child_total_time_running);
| +	}
| 
| -	values[n++] = 1 + leader->nr_siblings;
| -	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
| -		values[n++] = enabled;
| -	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
| -		values[n++] = running;
| -	values[n++] = count;
| +	/*
| +	 * Write {count,id} tuples for every sibling.
| +	 */
| +	values[n++] += perf_event_count(leader);
|  	if (read_format & PERF_FORMAT_ID)
|  		values[n++] = primary_event_id(leader);
| 
| -	size = n * sizeof(u64);
| +	list_for_each_entry(sub, &leader->sibling_list, group_entry) {
| +		values[n++] += perf_event_count(sub);
| +		if (read_format & PERF_FORMAT_ID)
| +			values[n++] = primary_event_id(sub);
| +	}
| +}
| 
| -	if (copy_to_user(buf, values, size))
| -		return -EFAULT;
| +static int perf_read_group(struct perf_event *event,
| +				   u64 read_format, char __user *buf)
| +{
| +	struct perf_event *leader = event->group_leader, *child;
| +	struct perf_event_context *ctx = leader->ctx;
| +	int ret = leader->read_size;
| +	u64 *values;
| 
| -	ret = size;
| +	lockdep_assert_held(&ctx->mutex);
| 
| -	list_for_each_entry(sub, &leader->sibling_list, group_entry) {
| -		n = 0;
| +	values = kzalloc(event->read_size);
| +	if (!values)
| +		return -ENOMEM;
| 
| -		values[n++] = perf_event_read_value(sub, &enabled, &running);
| -		if (read_format & PERF_FORMAT_ID)
| -			values[n++] = primary_event_id(sub);
| +	values[0] = 1 + leader->nr_siblings;
| 
| -		size = n * sizeof(u64);
| +	/*
| +	 * By locking the child_mutex of the leader we effectively
| +	 * lock the child list of all siblings.. XXX explain how.
| +	 */
| +	mutex_lock(&leader->child_mutex);
| 
| -		if (copy_to_user(buf + ret, values, size)) {
| -			return -EFAULT;
| -		}
| +	__perf_read_group_add(leader, read_format, values);

... we don't copy_to_user() here,

| +	list_for_each_entry(child, &leader->child_list, child_list)
| +		__perf_read_group_add(child, read_format, values);

so won't we overwrite the values[], if we always start at n = 1
in __perf_read_group_add()?

| 

| -		ret += size;
| -	}
| +	mutex_unlock(&leader->child_mutex);
| +
| +	if (copy_to_user(buf, values, event->read_size))
| +		ret = -EFAULT;
| +
| +	kfree(values);
| 
|  	return ret;
|  }

WARNING: multiple messages have this Message-ID (diff)

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, sparclinux@vger.kernel.org
Subject: Re: [PATCH 09/10] Define PERF_PMU_TXN_READ interface
Date: Wed, 12 Aug 2015 04:14:00 +0000	[thread overview]
Message-ID: <20150812041400.GA28426@us.ibm.com> (raw)
In-Reply-To: <20150806121029.GQ19282@twins.programming.kicks-ass.net>

Peter Zijlstra [peterz@infradead.org] wrote:
| On Sun, Jul 26, 2015 at 10:40:37PM -0700, Sukadev Bhattiprolu wrote:
| > @@ -3743,7 +3762,13 @@ static u64 perf_event_aggregate(struct perf_event *event, u64 *enabled,
| >  	lockdep_assert_held(&event->child_mutex);
| >  
| >  	list_for_each_entry(child, &event->child_list, child_list) {
| > +#if 0
| > +		/*
| > +		 * TODO: Do we need this read() for group events on PMUs that
| > +		 * 	 don't implement PERF_PMU_TXN_READ transactions?
| > +		 */
| >  		(void)perf_event_read(child, false);
| > +#endif
| >  		total += perf_event_count(child);
| >  		*enabled += child->total_time_enabled;
| >  		*running += child->total_time_running;
| 
| Aw gawd, I've been an idiot!!
| 
| I just realized this is a _CHILD_ loop, not a _SIBLING_ loop !!
| 
| We need to flip the loops in perf_read_group(), find attached two
| patches that go on top of 1,2,4.
| 
| After this you can add the perf_event_read() return value (just fold
| patches 6,8) after which you can do patch 10 (which has a broken
| Subject fwiw).

Thanks for the patches. I am building and testing, but have a question
on the second patch below:

<snip>


| Subject: perf: Invert perf_read_group() loops
| From: Peter Zijlstra <peterz@infradead.org>
| Date: Thu Aug 6 13:41:13 CEST 2015
| 
| In order to enable the use of perf_event_read(.group = true), we need
| to invert the sibling-child loop nesting of perf_read_group().
| 
| Currently we iterate the child list for each sibling, this precludes
| using group reads. Flip things around so we iterate each group for
| each child.
| 
| Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| ---
|  kernel/events/core.c |   84 ++++++++++++++++++++++++++++++++-------------------
|  1 file changed, 54 insertions(+), 30 deletions(-)
| 
| --- a/kernel/events/core.c
| +++ b/kernel/events/core.c
| @@ -3809,50 +3809,74 @@ u64 perf_event_read_value(struct perf_ev
|  }
|  EXPORT_SYMBOL_GPL(perf_event_read_value);
| 
| -static int perf_read_group(struct perf_event *event,
| -				   u64 read_format, char __user *buf)
| +static void __perf_read_group_add(struct perf_event *leader, u64 read_format, u64 *values)
|  {
| -	struct perf_event *leader = event->group_leader, *sub;
| -	struct perf_event_context *ctx = leader->ctx;
| -	int n = 0, size = 0, ret;
| -	u64 count, enabled, running;
| -	u64 values[5];
| +	struct perf_event *sub;
| +	int n = 1; /* skip @nr */

This n = 1 is to skip over the values[0] = 1 + nr_siblings in the
caller.

Anyway, in __perf_read_group_add() we always start with n = 1, however
...
| 
| -	lockdep_assert_held(&ctx->mutex);
| +	perf_event_read(leader, true);
| +
| +	/*
| +	 * Since we co-schedule groups, {enabled,running} times of siblings
| +	 * will be identical to those of the leader, so we only publish one
| +	 * set.
| +	 */
| +	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED) {
| +		values[n++] += leader->total_time_enabled +
| +			atomic64_read(leader->child_total_time_enabled);
| +	}
| 
| -	count = perf_event_read_value(leader, &enabled, &running);
| +	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING) {
| +		values[n++] += leader->total_time_running +
| +			atomic64_read(leader->child_total_time_running);
| +	}
| 
| -	values[n++] = 1 + leader->nr_siblings;
| -	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
| -		values[n++] = enabled;
| -	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
| -		values[n++] = running;
| -	values[n++] = count;
| +	/*
| +	 * Write {count,id} tuples for every sibling.
| +	 */
| +	values[n++] += perf_event_count(leader);
|  	if (read_format & PERF_FORMAT_ID)
|  		values[n++] = primary_event_id(leader);
| 
| -	size = n * sizeof(u64);
| +	list_for_each_entry(sub, &leader->sibling_list, group_entry) {
| +		values[n++] += perf_event_count(sub);
| +		if (read_format & PERF_FORMAT_ID)
| +			values[n++] = primary_event_id(sub);
| +	}
| +}
| 
| -	if (copy_to_user(buf, values, size))
| -		return -EFAULT;
| +static int perf_read_group(struct perf_event *event,
| +				   u64 read_format, char __user *buf)
| +{
| +	struct perf_event *leader = event->group_leader, *child;
| +	struct perf_event_context *ctx = leader->ctx;
| +	int ret = leader->read_size;
| +	u64 *values;
| 
| -	ret = size;
| +	lockdep_assert_held(&ctx->mutex);
| 
| -	list_for_each_entry(sub, &leader->sibling_list, group_entry) {
| -		n = 0;
| +	values = kzalloc(event->read_size);
| +	if (!values)
| +		return -ENOMEM;
| 
| -		values[n++] = perf_event_read_value(sub, &enabled, &running);
| -		if (read_format & PERF_FORMAT_ID)
| -			values[n++] = primary_event_id(sub);
| +	values[0] = 1 + leader->nr_siblings;
| 
| -		size = n * sizeof(u64);
| +	/*
| +	 * By locking the child_mutex of the leader we effectively
| +	 * lock the child list of all siblings.. XXX explain how.
| +	 */
| +	mutex_lock(&leader->child_mutex);
| 
| -		if (copy_to_user(buf + ret, values, size)) {
| -			return -EFAULT;
| -		}
| +	__perf_read_group_add(leader, read_format, values);

... we don't copy_to_user() here,

| +	list_for_each_entry(child, &leader->child_list, child_list)
| +		__perf_read_group_add(child, read_format, values);

so won't we overwrite the values[], if we always start at n = 1
in __perf_read_group_add()?

| 

| -		ret += size;
| -	}
| +	mutex_unlock(&leader->child_mutex);
| +
| +	if (copy_to_user(buf, values, event->read_size))
| +		ret = -EFAULT;
| +
| +	kfree(values);
| 
|  	return ret;
|  }

next prev parent reply	other threads:[~2015-08-12  4:14 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-27  5:40 [PATCH v4 0/10] Implement group-read of events using txn interface Sukadev Bhattiprolu
2015-07-27  5:40 ` Sukadev Bhattiprolu
2015-07-27  5:40 ` Sukadev Bhattiprolu
2015-07-27  5:40 ` [PATCH 01/10] perf: Add a flags parameter to pmu txn interfaces Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40 ` [PATCH 02/10] perf: Split perf_event_read() and perf_event_count() Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40 ` [PATCH 03/10] perf: Define perf_event_aggregate() Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40 ` [PATCH 04/10] perf: Rename perf_event_read_{one,group}, perf_read_hw Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40 ` [PATCH 05/10] perf: Unroll perf_event_read_value() in perf_read_group() Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40 ` [PATCH 06/10] perf: Add return value for perf_event_read() Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40 ` [PATCH 07/10] perf: Add group parameter to perf_event_read() Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40 ` [PATCH 08/10] perf: Add return value to __perf_event_read() Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40 ` [PATCH 09/10] Define PERF_PMU_TXN_READ interface Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-08-06 12:10   ` Peter Zijlstra
2015-08-06 12:10     ` Peter Zijlstra
2015-08-12  4:14     ` Sukadev Bhattiprolu [this message]
2015-08-12  4:14       ` Sukadev Bhattiprolu
2015-08-12  8:45       ` Peter Zijlstra
2015-08-12  8:45         ` Peter Zijlstra
2015-08-13 20:04         ` Sukadev Bhattiprolu
2015-08-13 20:04           ` Sukadev Bhattiprolu
2015-08-13 20:47           ` Peter Zijlstra
2015-08-13 20:47             ` Peter Zijlstra
2015-07-27  5:40 ` [PATCH 10/10] powerpc/perf/hv-24x7: Use PMU_TXN_READ interface Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu
2015-07-27  5:40   ` Sukadev Bhattiprolu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150812041400.GA28426@us.ibm.com \
    --to=sukadev@linux.vnet.ibm.com \
    --cc=acme@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=peterz@infradead.org \
    --cc=sparclinux@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.