Re: [RFC][PATCH] perf: Implement read_group() PMU operation

From: Peter Zijlstra <peterz@infradead.org>
To: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: mingo@kernel.org, Michael Ellerman <mpe@ellerman.id.au>,
	Anton Blanchard <anton@au1.ibm.com>,
	Stephane Eranian <eranian@google.com>,
	Jiri Olsa <jolsa@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] perf: Implement read_group() PMU operation
Date: Thu, 12 Feb 2015 16:58:56 +0100
Message-ID: <20150212155856.GC21418@twins.programming.kicks-ass.net>
In-Reply-To: <20150206025915.GA31650@us.ibm.com>

On Thu, Feb 05, 2015 at 06:59:15PM -0800, Sukadev Bhattiprolu wrote:
> From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> Date: Thu Feb  5 20:56:20 EST 2015 -0300
> Subject: [RFC][PATCH] perf: Implement read_group() PMU operation
> 
> This is a lightly tested, exploratory patch to allow PMUs to return
> several counters at once. Appreciate any comments :-)
> 
> Unlike normal hardware PMCs, the 24x7 counters[1] in Power8 are stored
> in memory and accessed via a hypervisor call (HCALL).  A major aspect
> of the HCALL is that it allows retrieving _SEVERAL_ counters at once
> (unlike regular PMCs, which are read one at a time).
> 
> This patch implements a ->read_group() PMU operation that tries to
> take advantage of this ability to read several counters at once.  A
> PMU that implements the ->read_group() operation would allow users
> to retrieve several counters at once and get a more consistent
> snapshot.
> 
> NOTE: 	This patch has a TODO in h_24x7_event_read_group() in that it
> 	still does multiple HCALLS. I think that can be optimized 
> 	independently, once the pmu->read_group() interface itself is
> 	finalized.
> 
> Appreciate comments on the ->read_group interface and best managing the
> interfaces between the core and PMU layers - eg: Ok for hv-24x7 PMU to
> walk the ->sibling_list ?


> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3549,10 +3549,43 @@ static int perf_event_read_group(struct perf_event *event,

You also want perf_output_read_group().

>  	struct perf_event *leader = event->group_leader, *sub;
>  	int n = 0, size = 0, ret = -EFAULT;
>  	struct perf_event_context *ctx = leader->ctx;
> +	u64 *valuesp;
>  	u64 values[5];
> +	int use_group_read;
>  	u64 count, enabled, running;
> +	struct pmu *pmu = event->pmu;
> +
> +	/*
> +	 * If PMU supports group read and group read is requested,
> +	 * allocate memory before taking the mutex.
> +	 */
> +	use_group_read = 0;
> +	if ((read_format & PERF_FORMAT_GROUP) && pmu->read_group) {
> +		use_group_read++;
> +	}
> +
> +	if (use_group_read) {
> +		valuesp = kzalloc(leader->nr_siblings * sizeof(u64), GFP_KERNEL);
> +		if (!valuesp)
> +			return -ENOMEM;
> +	}

This seems 'sad', the hardware already knows how many it can maximally
use at once and can preallocate, right?

>  
>  	mutex_lock(&ctx->mutex);
> +
> +	if (use_group_read) {
> +		ret = pmu->read_group(leader, valuesp, leader->nr_siblings);
> +		if (ret >= 0) {
> +			size = ret * sizeof(u64);
> +
> +			ret = size;
> +			if (copy_to_user(buf, valuesp, size))
> +				ret = -EFAULT;
> +		}
> +
> +		kfree(valuesp);
> +		goto unlock;
> +	}
> +
>  	count = perf_event_read_value(leader, &enabled, &running);
>  
>  	values[n++] = 1 + leader->nr_siblings;

Since ->read() has a void return value, we can delay its effect, so I'm
currently thinking we might want to extend the transaction interface for
this; give pmu::start_txn() a flags argument to indicate scheduling
(add) or reading (read).

So we'd end up with something like:

	pmu->start_txn(pmu, PMU_TXN_READ);

	leader->read();

	for_each_sibling()
		sibling->read();

	pmu->commit_txn();

after which we can use the values updated by the read calls. The trivial
no-support implementation lets read do its immediate thing like it does
now.
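
To make the flow above concrete, here is a minimal userspace sketch of
the no-support case. It is not kernel code: struct event, struct pmu,
the flag values and every helper name below are stand-ins made up for
illustration, and only PMU_TXN_READ is named in this mail. The
transaction hooks do nothing and ->read() updates the count
immediately, so the core sees fresh values after commit_txn() either
way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the mail only names PMU_TXN_READ. */
#define PMU_TXN_ADD	0x1
#define PMU_TXN_READ	0x2

struct pmu;

/* Simplified stand-in for struct perf_event. */
struct event {
	struct pmu *pmu;
	uint64_t hw_counter;		/* pretend hardware counter */
	uint64_t count;			/* value the core sees after ->read() */
	struct event *next_sibling;
};

/* Simplified stand-in for struct pmu with a flags-taking start_txn(). */
struct pmu {
	void (*start_txn)(struct pmu *pmu, unsigned int flags);
	int (*commit_txn)(struct pmu *pmu);
	void (*read)(struct event *event);
};

/* No-support driver: the transaction hooks are no-ops ... */
static void nop_start_txn(struct pmu *pmu, unsigned int flags)
{
	(void)pmu;
	(void)flags;
}

static int nop_commit_txn(struct pmu *pmu)
{
	(void)pmu;
	return 0;
}

/* ... and ->read() does its immediate thing. */
static void immediate_read(struct event *event)
{
	event->count = event->hw_counter;
}

/* Core side: the same sequence works for any driver. */
int group_read(struct event *leader)
{
	struct pmu *pmu = leader->pmu;
	struct event *sub;

	pmu->start_txn(pmu, PMU_TXN_READ);
	pmu->read(leader);
	for (sub = leader->next_sibling; sub; sub = sub->next_sibling)
		pmu->read(sub);
	return pmu->commit_txn(pmu);
}

/* Builds a leader with two siblings and returns the summed counts. */
uint64_t demo_group_sum(void)
{
	struct pmu pmu = { nop_start_txn, nop_commit_txn, immediate_read };
	struct event s2 = { &pmu, 30, 0, NULL };
	struct event s1 = { &pmu, 20, 0, &s2 };
	struct event leader = { &pmu, 10, 0, &s1 };
	int ret = group_read(&leader);

	assert(ret == 0);
	(void)ret;
	return leader.count + s1.count + s2.count;
}
```

Calling demo_group_sum() from a test main() exercises the whole
sequence; the counts are only inspected after commit_txn() returns,
which is the one rule the core has to follow.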

A more complex driver can then collect the actual counter values and
execute one hypercall using its pre-allocated memory.

So no allocations in the core code, and no sibling iterations in the
driver code.
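
A rough userspace sketch of such a batching driver, under stated
assumptions: struct batch_pmu, MAX_BATCH, fake_hcall() and the flag
value are invented here, the "hypercall" is just a loop that counts how
often it is invoked, and ->read() takes the pmu explicitly to keep the
example short. During a read transaction ->read() only queues the
event in a preallocated array, and commit_txn() fetches everything
with a single call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PMU_TXN_READ	0x2	/* hypothetical value */
#define MAX_BATCH	8	/* hypothetical per-call limit */

/* Simplified stand-in for struct perf_event. */
struct event {
	uint64_t hw_counter;	/* pretend in-memory counter */
	uint64_t count;		/* value the core sees after commit */
};

struct batch_pmu {
	unsigned int txn_flags;
	size_t nr_queued;
	struct event *queued[MAX_BATCH];	/* preallocated, no kzalloc() */
	unsigned int nr_hcalls;			/* how many "hypercalls" we made */
};

/* Stand-in for one hypercall returning all queued counters at once. */
static void fake_hcall(struct batch_pmu *pmu)
{
	size_t i;

	pmu->nr_hcalls++;
	for (i = 0; i < pmu->nr_queued; i++)
		pmu->queued[i]->count = pmu->queued[i]->hw_counter;
	pmu->nr_queued = 0;
}

void batch_start_txn(struct batch_pmu *pmu, unsigned int flags)
{
	pmu->txn_flags = flags;
	pmu->nr_queued = 0;
}

void batch_read(struct batch_pmu *pmu, struct event *event)
{
	if (!(pmu->txn_flags & PMU_TXN_READ)) {
		/* Outside a read transaction: read immediately. */
		pmu->nr_queued = 0;
		pmu->queued[pmu->nr_queued++] = event;
		fake_hcall(pmu);
		return;
	}

	/* Batch full: flush what we have, then keep queueing. */
	if (pmu->nr_queued == MAX_BATCH)
		fake_hcall(pmu);
	pmu->queued[pmu->nr_queued++] = event;
}

int batch_commit_txn(struct batch_pmu *pmu)
{
	if ((pmu->txn_flags & PMU_TXN_READ) && pmu->nr_queued)
		fake_hcall(pmu);
	pmu->txn_flags = 0;
	return 0;
}

/* Reads a three-event group and returns the number of "hypercalls". */
unsigned int demo_batched_hcalls(void)
{
	struct batch_pmu pmu = { 0 };
	struct event ev[3] = { { 10, 0 }, { 20, 0 }, { 30, 0 } };
	size_t i;

	batch_start_txn(&pmu, PMU_TXN_READ);
	for (i = 0; i < 3; i++)
		batch_read(&pmu, &ev[i]);	/* core walks the group */
	batch_commit_txn(&pmu);

	assert(ev[0].count == 10 && ev[1].count == 20 && ev[2].count == 30);
	return pmu.nr_hcalls;
}
```

The sibling walk stays in the caller and the driver never allocates;
it only needs an array sized to whatever it can fetch in one call.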

Would that work for you?

Thread overview: 5+ messages
2015-02-06  2:59 [RFC][PATCH] perf: Implement read_group() PMU operation Sukadev Bhattiprolu
2015-02-12 15:58 ` Peter Zijlstra [this message]
2015-02-17  8:33   ` Sukadev Bhattiprolu
2015-02-17 10:03     ` Peter Zijlstra
2015-02-22 21:04 ` Cody P Schafer