public inbox for linux-kernel@vger.kernel.org
* [BUG] perf_event: semantic of PERF_SAMPLE_READ unclear
@ 2011-08-25 17:19 Stephane Eranian
  2011-08-25 17:32 ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Stephane Eranian @ 2011-08-25 17:19 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, mingo, Robert Richter, Vince Weaver

Hi,

I was looking at the kernel code dealing with PERF_SAMPLE_READ. You use
this option when you want to capture the values of the other events in
your event group on overflow.

This is similar to what you can do with a read() on an event group
leader: if you've set up PERF_FORMAT_GROUP, then you can read the values
of the other events in your event group.

The issue at stake here is which values you get for the other counters.
In particular, how recent are those values? Ideally, you'd like them to
be as recent as the value of the main event.

In the case of sampling, you'd like to capture the values of the other events at
the time of the overflow or very close to it.

In either case, you'd like to get a consistent view of the events, i.e.,
take their values as close in time to each other as possible.

In the case of read(), the values are all retrieved from the actual
counters if the event group is active, so you get the most recent values
possible. If the group is not active, then its state has been saved, and
the SW counters hold the most recent values.

In the case of sampling, however, it is not clear what you get.

The perf_output_read() routine does not read the actual counters.
Instead, it relies on the SW counter, event->count, which was updated
via x86_perf_event_update() who knows when. I think this could be a
problem, as the 'snapshot' you're getting is not really consistent.

I think the perf_output_read() function must read the actual counters,
or force an update of the SW counters, before saving the counts into the
buffer. Because we are in the interrupt handler, we are guaranteed to
have the events in the actual counters. But the difficulty is that we
cannot grab any locks; I am not sure we need one given the call path.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG] perf_event: semantic of PERF_SAMPLE_READ unclear
  2011-08-25 17:19 [BUG] perf_event: semantic of PERF_SAMPLE_READ unclear Stephane Eranian
@ 2011-08-25 17:32 ` Peter Zijlstra
  2011-08-25 21:24   ` Stephane Eranian
  2011-08-26 12:02   ` Stephane Eranian
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Zijlstra @ 2011-08-25 17:32 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML, mingo, Robert Richter, Vince Weaver

On Thu, 2011-08-25 at 19:19 +0200, Stephane Eranian wrote:
> But the difficulty is that
> we cannot grab any locks, not sure we need one given the call path. 

Nah, we should be able to simply iterate all siblings and update them
in-place; since they are group members, they should all be co-scheduled.
The only difficulty is cross-pmu group members.


* Re: [BUG] perf_event: semantic of PERF_SAMPLE_READ unclear
  2011-08-25 17:32 ` Peter Zijlstra
@ 2011-08-25 21:24   ` Stephane Eranian
  2011-08-26 12:02   ` Stephane Eranian
  1 sibling, 0 replies; 7+ messages in thread
From: Stephane Eranian @ 2011-08-25 21:24 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, Robert Richter, Vince Weaver

On Thu, Aug 25, 2011 at 7:32 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2011-08-25 at 19:19 +0200, Stephane Eranian wrote:
>> But the difficulty is that
>> we cannot grab any locks, not sure we need one given the call path.
>
> Nah we should be able to simply iterate all siblings and update them
> in-place, since its group members they should all be co-scheduled. The
> only difficulty is cross pmu group members..
>
Yes, that's what I thought: we should be able to invoke an arch callback
to update the SW counters, then proceed with perf_event_count().


* Re: [BUG] perf_event: semantic of PERF_SAMPLE_READ unclear
  2011-08-25 17:32 ` Peter Zijlstra
  2011-08-25 21:24   ` Stephane Eranian
@ 2011-08-26 12:02   ` Stephane Eranian
  2011-08-26 12:08     ` Stephane Eranian
  2011-08-26 12:13     ` Peter Zijlstra
  1 sibling, 2 replies; 7+ messages in thread
From: Stephane Eranian @ 2011-08-26 12:02 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, Robert Richter, Vince Weaver

On Thu, Aug 25, 2011 at 7:32 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2011-08-25 at 19:19 +0200, Stephane Eranian wrote:
>> But the difficulty is that
>> we cannot grab any locks, not sure we need one given the call path.
>
> Nah we should be able to simply iterate all siblings and update them
> in-place, since its group members they should all be co-scheduled. The
> only difficulty is cross pmu group members..
>
Are we allowing events from different PMUs to be in the same event group?
If so, is that useful?


* Re: [BUG] perf_event: semantic of PERF_SAMPLE_READ unclear
  2011-08-26 12:02   ` Stephane Eranian
@ 2011-08-26 12:08     ` Stephane Eranian
  2011-08-26 12:13     ` Peter Zijlstra
  1 sibling, 0 replies; 7+ messages in thread
From: Stephane Eranian @ 2011-08-26 12:08 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, Robert Richter, Vince Weaver

Ok, forget what I said, I think the current code is fine.

perf_output_read_group() does call event->pmu->read(), which calls
x86_perf_event_update(), and that routine does read the actual counter.
So you do get the latest value possible.


On Fri, Aug 26, 2011 at 2:02 PM, Stephane Eranian <eranian@google.com> wrote:
> On Thu, Aug 25, 2011 at 7:32 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Thu, 2011-08-25 at 19:19 +0200, Stephane Eranian wrote:
>>> But the difficulty is that
>>> we cannot grab any locks, not sure we need one given the call path.
>>
>> Nah we should be able to simply iterate all siblings and update them
>> in-place, since its group members they should all be co-scheduled. The
>> only difficulty is cross pmu group members..
>>
> Are we allowing event from different PMU to be in the same event group?
> If so, is that useful?
>


* Re: [BUG] perf_event: semantic of PERF_SAMPLE_READ unclear
  2011-08-26 12:02   ` Stephane Eranian
  2011-08-26 12:08     ` Stephane Eranian
@ 2011-08-26 12:13     ` Peter Zijlstra
  2011-08-26 14:02       ` Stephane Eranian
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2011-08-26 12:13 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML, mingo, Robert Richter, Vince Weaver

On Fri, 2011-08-26 at 14:02 +0200, Stephane Eranian wrote:
> On Thu, Aug 25, 2011 at 7:32 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, 2011-08-25 at 19:19 +0200, Stephane Eranian wrote:
> >> But the difficulty is that
> >> we cannot grab any locks, not sure we need one given the call path.
> >
> > Nah we should be able to simply iterate all siblings and update them
> > in-place, since its group members they should all be co-scheduled. The
> > only difficulty is cross pmu group members..
> >
> Are we allowing event from different PMU to be in the same event group?
> If so, is that useful?

We allow software events, which can always be scheduled, to be part of a
hardware group. We don't allow mixing of different hardware pmus.

Allowing a software event is useful if, for example, the hardware pmu
doesn't have a sampling interrupt.




* Re: [BUG] perf_event: semantic of PERF_SAMPLE_READ unclear
  2011-08-26 12:13     ` Peter Zijlstra
@ 2011-08-26 14:02       ` Stephane Eranian
  0 siblings, 0 replies; 7+ messages in thread
From: Stephane Eranian @ 2011-08-26 14:02 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, Robert Richter, Vince Weaver

On Fri, Aug 26, 2011 at 2:13 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2011-08-26 at 14:02 +0200, Stephane Eranian wrote:
>> On Thu, Aug 25, 2011 at 7:32 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, 2011-08-25 at 19:19 +0200, Stephane Eranian wrote:
>> >> But the difficulty is that
>> >> we cannot grab any locks, not sure we need one given the call path.
>> >
>> > Nah we should be able to simply iterate all siblings and update them
>> > in-place, since its group members they should all be co-scheduled. The
>> > only difficulty is cross pmu group members..
>> >
>> Are we allowing event from different PMU to be in the same event group?
>> If so, is that useful?
>
> We allow software events, which can always be scheduled, to be part of a
> hardware group. We don't allow mixing of different hardware pmus.
>
ok, good.

> Allowing a software event is useful if for example the hardware pmu
> doesn't have a sampling interrupt.
>
I think that's a good reason.

Now, there is something else I noticed when using PERF_SAMPLE_READ.
The libpfm4 example, task_smpl, shows the problem very easily.

$ task_smpl -e unhalted_core_cycles:period=2400000 noploop 1

IIP:0x0000000040062d PID:3658 TID:3658 TIME:19141370205834 STREAM_ID:1
PERIOD:2400000 ENA=979088465 RUN=979088465
	2330400873 unhalted_core_cycles:period=2400000

IIP:0x0000000040062d PID:3658 TID:3658 TIME:19141371213765 STREAM_ID:1
PERIOD:2400000 ENA=980096325 RUN=980096325
	2332800869 unhalted_core_cycles:period=2400000

IIP:0x0000000040062d PID:3658 TID:3658 TIME:19141372221662 STREAM_ID:1
PERIOD:2400000 ENA=981104225 RUN=981104225
	2335200869 unhalted_core_cycles:period=2400000

IIP:0x0000000040062d PID:3658 TID:3658 TIME:19141373229569 STREAM_ID:1
PERIOD:2400000 ENA=982112126 RUN=982112126
	2337600869 unhalted_core_cycles:period=2400000

Here each sample captures the value of the sampling event. I would
expect the value to increase by at least the period (2400000) between
two consecutive samples. In reality, that's not always the case. It's
close, though. For instance:

2332800869-2330400873 = 2399996

How do we interpret this? Did the counter overflow before the period
elapsed?

Is it that for the previous sample, it took a few cycles to stop the
PMU, i.e., the counter was not at zero when the PMU was stopped? And
thus, when we add the period, we actually end up with a shorter
effective period. I think the code that implements this is in
x86_perf_event_set_period().

