Re: [RFC][PATCH] perf: Implement read_group() PMU operation

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@kernel.org, Michael Ellerman <mpe@ellerman.id.au>,
	Anton Blanchard <anton@au1.ibm.com>,
	Stephane Eranian <eranian@google.com>,
	Jiri Olsa <jolsa@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] perf: Implement read_group() PMU operation
Date: Tue, 17 Feb 2015 00:33:12 -0800	[thread overview]
Message-ID: <20150217083312.GA16221@us.ibm.com> (raw)
In-Reply-To: <20150212155856.GC21418@twins.programming.kicks-ass.net>

Peter Zijlstra [peterz@infradead.org] wrote:
| > --- a/kernel/events/core.c
| > +++ b/kernel/events/core.c
| > @@ -3549,10 +3549,43 @@ static int perf_event_read_group(struct perf_event *event,
| 
| You also want perf_output_read_group().

Ok. Will look into it. We currently don't support sampling with
the 24x7 counters but we should make sure that the new interface
fits correctly.


| 
| >  	struct perf_event *leader = event->group_leader, *sub;
| >  	int n = 0, size = 0, ret = -EFAULT;
| >  	struct perf_event_context *ctx = leader->ctx;
| > +	u64 *valuesp;
| >  	u64 values[5];
| > +	int use_group_read;
| >  	u64 count, enabled, running;
| > +	struct pmu *pmu = event->pmu;
| > +
| > +	/*
| > +	 * If PMU supports group read and group read is requested,
| > +	 * allocate memory before taking the mutex.
| > +	 */
| > +	use_group_read = 0;
| > +	if ((read_format & PERF_FORMAT_GROUP) && pmu->read_group) {
| > +		use_group_read++;
| > +	}
| > +
| > +	if (use_group_read) {
| > +		valuesp = kzalloc(leader->nr_siblings * sizeof(u64), GFP_KERNEL);
| > +		if (!valuesp)
| > +			return -ENOMEM;
| > +	}
| 
| This seems 'sad', the hardware already knows how many it can maximally
| use at once and can preallocate, right?

Yeah :-) In a subsequent version, I got rid of the allocation by moving
the copy_to_user() into the PMU, but still needed to walk the sibling list.
| 
| >  
| >  	mutex_lock(&ctx->mutex);
| > +
| > +	if (use_group_read) {
| > +		ret = pmu->read_group(leader, valuesp, leader->nr_siblings);
| > +		if (ret >= 0) {
| > +			size = ret * sizeof(u64);
| > +
| > +			ret = size;
| > +			if (copy_to_user(buf, valuesp, size))
| > +				ret = -EFAULT;
| > +		}
| > +
| > +		kfree(valuesp);
| > +		goto unlock;
| > +	}
| > +
| >  	count = perf_event_read_value(leader, &enabled, &running);
| >  
| >  	values[n++] = 1 + leader->nr_siblings;
| 
| Since ->read() has a void return value, we can delay its effect, so I'm
| currently thinking we might want to extend the transaction interface for
| this; give pmu::start_txn() a flags argument to indicate scheduling
| (add) or reading (read).
| 
| So we'd end up with something like:
| 
| 	pmu->start_txn(pmu, PMU_TXN_READ);
| 
| 	leader->read();
| 
| 	for_each_sibling()
| 		sibling->read();
| 
| 	pmu->commit_txn();

So, each of the ->read() calls is really "appending a counter" to a
list of counters that the PMU should read and the values for the counters
(results of the read) are only available after the commit_txn() right? 

In which case, perf_event_read_group() would then follow this commit_txn()
with its "normal" read,  and the PMU would return the result cached during
->commit_txn(). If so, we need a way to invalidate the cached result ?

| 
| after which we can use the values updated by the read calls. The trivial
| no-support implementation lets read do its immediate thing like it does
| now.
| 
| A more complex driver can then collect the actual counter values and
| execute one hypercall using its pre-allocated memory.

the hypercall should happen  in the ->commit_txn() right ?

| 
| So no allocations in the core code, and no sibling iterations in the
| driver code.
| 
| Would that work for you?

I think it would, 

I am working on breaking up the underlying code along start/read/commit
lines, and hope to have it done later this week and then can experiment
more with this interface.

Appreciate the input.

Sukadev

next prev parent reply	other threads:[~2015-02-17  8:33 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-06  2:59 [RFC][PATCH] perf: Implement read_group() PMU operation Sukadev Bhattiprolu
2015-02-12 15:58 ` Peter Zijlstra
2015-02-17  8:33   ` Sukadev Bhattiprolu [this message]
2015-02-17 10:03     ` Peter Zijlstra
2015-02-22 21:04 ` Cody P Schafer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150217083312.GA16221@us.ibm.com \
    --to=sukadev@linux.vnet.ibm.com \
    --cc=acme@kernel.org \
    --cc=anton@au1.ibm.com \
    --cc=eranian@google.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.