Re: [PATCH 09/10] Define PERF_PMU_TXN_READ interface

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, sparclinux@vger.kernel.org
Subject: Re: [PATCH 09/10] Define PERF_PMU_TXN_READ interface
Date: Thu, 13 Aug 2015 13:04:28 -0700
Message-ID: <20150813200428.GA19057@us.ibm.com>
In-Reply-To: <20150812084545.GR16853@twins.programming.kicks-ass.net>

Peter Zijlstra [peterz@infradead.org] wrote:
| On Tue, Aug 11, 2015 at 09:14:00PM -0700, Sukadev Bhattiprolu wrote:
| > | +static void __perf_read_group_add(struct perf_event *leader, u64 read_format, u64 *values)
| > |  {
| > | +	struct perf_event *sub;
| > | +	int n = 1; /* skip @nr */
| > 
| > This n = 1 is to skip over the values[0] = 1 + nr_siblings in the
| > caller.
| > 
| > Anyway, in __perf_read_group_add() we always start with n = 1, however
| > ...
| > | 
| > | +	perf_event_read(leader, true);
| > | +
| > | +	/*
| > | +	 * Since we co-schedule groups, {enabled,running} times of siblings
| > | +	 * will be identical to those of the leader, so we only publish one
| > | +	 * set.
| > | +	 */
| > | +	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED) {
| > | +		values[n++] += leader->total_time_enabled +
| > | +			atomic64_read(leader->child_total_time_enabled);
| 
| Note how this is an in-place addition,

Ah, yes. Sorry, I missed that. It makes sense now, and my tests seem to
be running fine.

| 
| > | +	}
| > | 
| > | +	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING) {
| > | +		values[n++] += leader->total_time_running +
| > | +			atomic64_read(leader->child_total_time_running);
| 
| and here,
| 
| > | +	}
| > | 
| > | +	/*
| > | +	 * Write {count,id} tuples for every sibling.
| > | +	 */
| > | +	values[n++] += perf_event_count(leader);
| 
| and here,
| 
| 
| > |  	if (read_format & PERF_FORMAT_ID)
| > |  		values[n++] = primary_event_id(leader);
| 
| and this will always assign the same value.
| 
| > | +	list_for_each_entry(sub, &leader->sibling_list, group_entry) {
| > | +		values[n++] += perf_event_count(sub);
| > | +		if (read_format & PERF_FORMAT_ID)
| > | +			values[n++] = primary_event_id(sub);
| 
| Same for these, therefore,
| 
| > | +	}
| > | +}
| > | 
| > | +static int perf_read_group(struct perf_event *event,
| > | +				   u64 read_format, char __user *buf)
| > | +{
| > | +	struct perf_event *leader = event->group_leader, *child;
| > | +	struct perf_event_context *ctx = leader->ctx;
| > | +	int ret = leader->read_size;

One other question: we return leader->read_size, but we allocate and
copy_to_user() using event->read_size, where 'event' may be a sibling.
We consistently use the read_format of the 'event' being read, rather than
that of its 'group_leader', so we are OK in terms of what we copy into
values[] for each event in the group.

But can the leader's read_format (and hence its read_size) differ from
a sibling's? If so, the current code returns the event's read_size,
whereas the new code returns the leader's read_size.
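
To illustrate why the two sizes could disagree, here is a rough userspace
sketch of how a group read buffer size follows from read_format. It assumes
the buffer layout used by the quoted code (nr, optional enabled/running
times, then one {count[,id]} entry per group member). The FMT_* constants
and group_read_size() are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the PERF_FORMAT_* bits tested in the patch. */
enum {
	FMT_TOTAL_TIME_ENABLED = 1u << 0,
	FMT_TOTAL_TIME_RUNNING = 1u << 1,
	FMT_ID                 = 1u << 2,
};

/*
 * Bytes needed for one group read, assuming the layout
 *   values[0] = nr, [time_enabled], [time_running], {count[, id]} * nr
 */
static size_t group_read_size(uint64_t read_format, int nr_siblings)
{
	size_t size = sizeof(uint64_t);		/* values[0] holds nr */
	size_t entry = sizeof(uint64_t);	/* every member has a count */

	assert(nr_siblings >= 0);

	if (read_format & FMT_TOTAL_TIME_ENABLED)
		size += sizeof(uint64_t);
	if (read_format & FMT_TOTAL_TIME_RUNNING)
		size += sizeof(uint64_t);
	if (read_format & FMT_ID)
		entry += sizeof(uint64_t);	/* each member also has an id */

	/* one entry for the leader plus one per sibling */
	return size + entry * (size_t)(1 + nr_siblings);
}
```

Under these assumptions, a group with two siblings needs 56 bytes when
FMT_ID is set and 32 bytes when it is not, so returning one event's size
while copying another's would only be safe if both read_formats match.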

| > | +	u64 *values;
| > | 
| > | +	lockdep_assert_held(&ctx->mutex);
| > | 
| > | +	values = kzalloc(event->read_size);
| > | +	if (!values)
| > | +		return -ENOMEM;
| > | 
| > | +	values[0] = 1 + leader->nr_siblings;
| > | 
| > | +	/*
| > | +	 * By locking the child_mutex of the leader we effectively
| > | +	 * lock the child list of all siblings.. XXX explain how.
| > | +	 */
| > | +	mutex_lock(&leader->child_mutex);
| > | 
| > | +	__perf_read_group_add(leader, read_format, values);
| > 
| > ... we don't copy_to_user() here,
| > 
| > | +	list_for_each_entry(child, &leader->child_list, child_list)
| > | +		__perf_read_group_add(child, read_format, values);
| > 
| > so won't we overwrite the values[], if we always start at n = 1
| > in __perf_read_group_add()?
| 
| yes and no, we have to re-iterate the same values for each child as they
| all have the same group, but we add the time and count fields, we do not
| overwrite. The _add() suffix was supposed to be a hint ;-)
| 
| > | +	mutex_unlock(&leader->child_mutex);
| > | +
| > | +	if (copy_to_user(buf, values, event->read_size))
| > | +		ret = -EFAULT;
| > | +
| > | +	kfree(values);
| > | 
| > |  	return ret;
| > |  }
| 
| Where previously we would iterate the group and for each member
| iterate/sum all the child values together before copying the value out,
| we now, because we need to read groups together, need to first iterate
| the child list and sum whole groups.
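
A minimal userspace sketch of the accumulation described above, assuming
both time fields and the id field are always present. It is not the kernel
code: struct toy_event, struct toy_group, toy_group_add() and
toy_read_group() are made-up names, and locking and copy_to_user() are
left out. The point is only that every call restarts at n = 1, sums the
time and count slots in place, and reassigns the same id each time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SIBLINGS 4

struct toy_event {
	uint64_t count;
	uint64_t id;		/* same id in the parent and in each child */
};

struct toy_group {
	uint64_t time_enabled;
	uint64_t time_running;
	struct toy_event leader;
	struct toy_event sibling[TOY_MAX_SIBLINGS];
	int nr_siblings;
};

/* Add one whole group into values[]; always starts at index 1. */
static void toy_group_add(const struct toy_group *g, uint64_t *values)
{
	int n = 1;	/* skip values[0], which holds nr */
	int i;

	assert(g && values);

	values[n++] += g->time_enabled;		/* summed in place */
	values[n++] += g->time_running;		/* summed in place */

	values[n++] += g->leader.count;		/* summed in place */
	values[n++] = g->leader.id;		/* plain assignment */

	for (i = 0; i < g->nr_siblings; i++) {
		values[n++] += g->sibling[i].count;
		values[n++] = g->sibling[i].id;
	}
}

/* Sum the parent group and all child groups into a zeroed buffer. */
static void toy_read_group(const struct toy_group *parent,
			   const struct toy_group *children, int nr_children,
			   uint64_t *values, size_t nr_values)
{
	size_t k;
	int i;

	assert(nr_values >= (size_t)(5 + 2 * parent->nr_siblings));

	for (k = 0; k < nr_values; k++)		/* stands in for kzalloc() */
		values[k] = 0;

	values[0] = 1 + parent->nr_siblings;

	toy_group_add(parent, values);
	for (i = 0; i < nr_children; i++)
		toy_group_add(&children[i], values);
}
```

Because the buffer starts zeroed and only the id slots use plain
assignment, calling toy_group_add() once per child yields per-slot sums
across the parent and all children rather than the last child's values.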
