* perf_events: proposed fix for broken intr throttling (repost)
From: Stephane Eranian @ 2012-01-04 14:39 UTC (permalink / raw)
To: linux-kernel; +Cc: peterz, mingo, gleb, asharma, vince, wcohen
[repost, misspelled Peter's email address]
Hi,
In running some tests with 3.2.0-rc7-tip, I noticed unexpected throttling
notification samples. I was sampling with a fixed period, long enough that
I could not possibly hit the default limit of 100000 samples/sec/cpu.
I investigated the matter and discovered that the following commit
is the culprit:
commit 0f5a2601284237e2ba089389fd75d67f77626cef
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Wed Nov 16 14:38:16 2011 +0100
perf: Avoid a useless pmu_disable() in the perf-tick
The throttling mechanism REQUIRES that the hwc->interrupts counter be reset
at EACH timer tick, regardless of whether the event is in fixed-period or
frequency mode. The optimization introduced in this patch breaks this by no
longer calling perf_ctx_adjust_freq() at each timer tick. For fixed-period
events, that function would not adjust any period at all BUT it would reset
the throttling counter.
Given the way the throttling mechanism is implemented, we cannot avoid doing
some work at each timer tick. Otherwise we lose many samples for no good
reason.
One may also question the motivation behind checking the interrupt rate at
each timer tick rather than every second, i.e., average it out over a longer
period.
I see two solutions short term:
1 - revert the commit above
2 - special case the situation with no frequency-based sampling event
I have implemented solution 2 with the draft fix below. It does not invoke
perf_pmu_enable()/perf_pmu_disable(). I am not clear on whether or not this
is really needed in this case. Please advise.
Signed-off-by: Stephane Eranian <eranian@google.com>
---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 91fb68a..d1fe81a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2325,6 +2325,37 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count)
 	}
 }
 
+static void perf_ctx_adjust_throttle(struct perf_event_context *ctx)
+{
+	struct perf_event *event;
+	struct hw_perf_event *hwc;
+	u64 interrupts;
+
+	raw_spin_lock(&ctx->lock);
+
+	list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
+		if (event->state != PERF_EVENT_STATE_ACTIVE)
+			continue;
+
+		if (!event_filter_match(event))
+			continue;
+
+		hwc = &event->hw;
+
+		interrupts = hwc->interrupts;
+		hwc->interrupts = 0;
+
+		/*
+		 * unthrottle events on the tick
+		 */
+		if (interrupts == MAX_INTERRUPTS) {
+			perf_log_throttle(event, 1);
+			event->pmu->start(event, 0);
+		}
+	}
+	raw_spin_unlock(&ctx->lock);
+}
+
 static void perf_ctx_adjust_freq(struct perf_event_context *ctx, u64 period)
 {
 	struct perf_event *event;
@@ -2445,10 +2476,24 @@ void perf_event_task_tick(void)
 {
 	struct list_head *head = &__get_cpu_var(rotation_list);
 	struct perf_cpu_context *cpuctx, *tmp;
+	struct perf_event_context *ctx;
 
 	WARN_ON(!irqs_disabled());
 
 	list_for_each_entry_safe(cpuctx, tmp, head, rotation_list) {
+
+		/*
+		 * throttling counter must be reset at each tick
+		 * unthrottling must be done at each tick
+		 */
+		ctx = &cpuctx->ctx;
+		if (!ctx->nr_freq)
+			perf_ctx_adjust_throttle(&cpuctx->ctx);
+
+		ctx = cpuctx->task_ctx;
+		if (ctx && !ctx->nr_freq)
+			perf_ctx_adjust_throttle(ctx);
+
 		if (cpuctx->jiffies_interval == 1 ||
 		    !(jiffies % cpuctx->jiffies_interval))
 			perf_rotate_context(cpuctx);
* Re: perf_events: proposed fix for broken intr throttling (repost)
From: Peter Zijlstra @ 2012-01-04 21:24 UTC (permalink / raw)
To: Stephane Eranian; +Cc: linux-kernel, mingo, gleb, asharma, vince, wcohen
On Wed, 2012-01-04 at 15:39 +0100, Stephane Eranian wrote:
> In running some tests with 3.2.0-rc7-tip, I noticed unexpected throttling
> notification samples. I was using fixed period with a long enough period
> that I could not possibly hit the default limit of 100000 samples/sec/cpu.
>
> I investigated the matter and discovered that the following commit
> is the culprit:
>
> commit 0f5a2601284237e2ba089389fd75d67f77626cef
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Wed Nov 16 14:38:16 2011 +0100
>
> perf: Avoid a useless pmu_disable() in the perf-tick
>
>
> The throttling mechanism REQUIRES that the hwc->interrupt counter be reset
> at EACH timer tick. This is regardless of the fact that the counter is in fixed
> period or frequency mode. The optimization introduced in this patch breaks this
> by avoiding calling perf_ctx_adjust_freq() at each timer tick. For events with
> fixed period, it would not adjust any period at all BUT it would reset the
> throttling counter.
>
> Given the way the throttling mechanism is implemented we cannot avoid doing
> some work at each timer tick. Otherwise we lose many samples for no good
> reason.
>
> One may also question the motivation behind checking the interrupt rate at
> each timer tick rather than every second, i.e., average it out over a longer
> period.
That also allows your system to be dead for longer..
> I see two solutions short term:
> 1 - revert the commit above
> 2 - special case the situation with no frequency-based sampling event
>
> I have implemented solution 2 with the draft fix below. It does not invoke
> perf_pmu_enable()/perf_pmu_disable(). I am not clear on whether or not this
> is really needed in this case. Please advise.
I don't think it needs that, I do dislike the unconditional iterate all
events thing though. Maybe we can set some per-cpu state indicating
someone got throttled (rare under normal operation -- you'd hope) and
only iterate to unthrottle when we find this set.
I think the event scheduling resulting from migration will already
re-enable the event, avoiding the loss of unthrottle due to that..
although it would be good to verify that.
* Re: perf_events: proposed fix for broken intr throttling (repost)
From: Stephane Eranian @ 2012-01-04 21:33 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, mingo, gleb, asharma, vince, wcohen
On Wed, Jan 4, 2012 at 9:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2012-01-04 at 15:39 +0100, Stephane Eranian wrote:
>
>> In running some tests with 3.2.0-rc7-tip, I noticed unexpected throttling
>> notification samples. I was using fixed period with a long enough period
>> that I could not possibly hit the default limit of 100000 samples/sec/cpu.
>>
>> I investigated the matter and discovered that the following commit
>> is the culprit:
>>
>> commit 0f5a2601284237e2ba089389fd75d67f77626cef
>> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> Date: Wed Nov 16 14:38:16 2011 +0100
>>
>> perf: Avoid a useless pmu_disable() in the perf-tick
>>
>>
>> The throttling mechanism REQUIRES that the hwc->interrupt counter be reset
>> at EACH timer tick. This is regardless of the fact that the counter is in fixed
>> period or frequency mode. The optimization introduced in this patch breaks this
>> by avoiding calling perf_ctx_adjust_freq() at each timer tick. For events with
>> fixed period, it would not adjust any period at all BUT it would reset the
>> throttling counter.
>>
>> Given the way the throttling mechanism is implemented we cannot avoid doing
>> some work at each timer tick. Otherwise we lose many samples for no good
>> reason.
>>
>> One may also question the motivation behind checking the interrupt rate at
>> each timer tick rather than every second, i.e., average it out over a longer
>> period.
>
> That also allows your system to be dead for longer..
>
Yes, I concur...
>> I see two solutions short term:
>> 1 - revert the commit above
>> 2 - special case the situation with no frequency-based sampling event
>>
>> I have implemented solution 2 with the draft fix below. It does not invoke
>> perf_pmu_enable()/perf_pmu_disable(). I am not clear on whether or not this
>> is really needed in this case. Please advise.
>
> I don't think it needs that, I do dislike the unconditional iterate all
> events thing though. Maybe we can set some per-cpu state indicating
> someone got throttled (rare under normal operation -- you'd hope) and
> only iterate to unthrottle when we find this set.
>
Could try that too.
> I think the event scheduling resulting from migration will already
> re-enable the event, avoiding the loss of unthrottle due to that..
> although it would be good to verify that.
>
Yes, you're not dead forever, but still it is not acceptable as is.
Will code the solution you suggested.
* Re: perf_events: proposed fix for broken intr throttling (repost)
From: Peter Zijlstra @ 2012-01-04 22:49 UTC (permalink / raw)
To: Stephane Eranian; +Cc: linux-kernel, mingo, gleb, asharma, vince, wcohen
On Wed, 2012-01-04 at 21:33 +0000, Stephane Eranian wrote:
> > I don't think it needs that, I do dislike the unconditional iterate all
> > events thing though. Maybe we can set some per-cpu state indicating
> > someone got throttled (rare under normal operation -- you'd hope) and
> > only iterate to unthrottle when we find this set.
> >
> Could try that too.
>
> > I think the event scheduling resulting from migration will already
> > re-enable the event, avoiding the loss of unthrottle due to that..
> > although it would be good to verify that.
> >
> Yes, you're not dead forever, but still it is not acceptable as is.
Oh for sure, I didn't mean it like that. What I was getting at is a
counter getting throttled on one cpu, setting the per-cpu variable,
getting migrated and not getting unthrottled due to now living on
another cpu which doesn't have the per-cpu thing set.
If the scheduling resulting from the migration already unthrottles that
scenario can't happen.
* Re: perf_events: proposed fix for broken intr throttling (repost)
From: Stephane Eranian @ 2012-01-04 23:02 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, mingo, gleb, asharma, vince, wcohen
On Wed, Jan 4, 2012 at 10:49 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2012-01-04 at 21:33 +0000, Stephane Eranian wrote:
>> > I don't think it needs that, I do dislike the unconditional iterate all
>> > events thing though. Maybe we can set some per-cpu state indicating
>> > someone got throttled (rare under normal operation -- you'd hope) and
>> > only iterate to unthrottle when we find this set.
>> >
>> Could try that too.
>>
>> > I think the event scheduling resulting from migration will already
>> > re-enable the event, avoiding the loss of unthrottle due to that..
>> > although it would be good to verify that.
>> >
>> Yes, you're not dead forever, but still it is not acceptable as is.
>
> Oh for sure, I didn't mean it like that. What I was getting at is a
> counter getting throttled on one cpu, setting the per-cpu variable,
> getting migrated and not getting unthrottled due to now living on
> another cpu which doesn't have the per-cpu thing set.
>
Yes, that is true.
I think that throttled counter needs to live in ctx and not per-cpu.
* Re: perf_events: proposed fix for broken intr throttling (repost)
From: Stephane Eranian @ 2012-01-05 13:08 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, mingo, gleb, asharma, vince, wcohen
Peter,
I looked into this some more this morning. I don't think your proposed
scheme can work.

Unless I misunderstood you, you were suggesting that we could perhaps use
a lazy approach in perf_event_task_tick() and walk the event list only when
we have at least one event to unthrottle, i.e., similar to what is done
with nr_freq. That cannot work. The problem is that you'd let all events
get throttled before you'd unthrottle them at the next timer tick. At each
overflow, hwc->interrupts would get incremented until it reached
MAX_INTERRUPTS. Then the event would be stopped (throttled) and you'd set
ctx->nr_throttled = 1. At the next timer tick, perf_event_task_tick()
would unthrottle the event. In that scheme, the event would be throttled
for at most a tick, but the event never generated enough overflows per
tick to justify throttling in the first place.

I think there is no other way than what I suggested in my initial email:
1 - revert the nr_freq optimization
2 - reset hwc->interrupts on all events at each tick

Do you have a better idea?
On Wed, Jan 4, 2012 at 11:02 PM, Stephane Eranian <eranian@google.com> wrote:
> On Wed, Jan 4, 2012 at 10:49 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Wed, 2012-01-04 at 21:33 +0000, Stephane Eranian wrote:
>>> > I don't think it needs that, I do dislike the unconditional iterate all
>>> > events thing though. Maybe we can set some per-cpu state indicating
>>> > someone got throttled (rare under normal operation -- you'd hope) and
>>> > only iterate to unthrottle when we find this set.
>>> >
>>> Could try that too.
>>>
>>> > I think the event scheduling resulting from migration will already
>>> > re-enable the event, avoiding the loss of unthrottle due to that..
>>> > although it would be good to verify that.
>>> >
>>> Yes, you're not dead forever, but still it is not acceptable as is.
>>
>> Oh for sure, I didn't mean it like that. What I was getting at is a
>> counter getting throttled on one cpu, setting the per-cpu variable,
>> getting migrated and not getting unthrottled due to now living on
>> another cpu which doesn't have the per-cpu thing set.
>>
> Yes, that is true.
> I think that throttled counter needs to live in ctx and not per-cpu.
* Re: perf_events: proposed fix for broken intr throttling (repost)
From: Gleb Natapov @ 2012-01-05 13:19 UTC (permalink / raw)
To: Stephane Eranian
Cc: Peter Zijlstra, linux-kernel, mingo, asharma, vince, wcohen
On Thu, Jan 05, 2012 at 01:08:41PM +0000, Stephane Eranian wrote:
> Peter,
>
> I looked into this some more this morning. I don't think your proposed
> scheme can work.
> Unless, I misunderstood you, you were suggesting that we could perhaps
> use a lazy
> approach in perf_event_task_tick() and walk the event list only when
> we have, at least, one
> event to unthrottle, i.e., similar to what is done with nr_freq. That
> cannot work. The problem is
> that you'd let all events get throttled before you'd unthrottle them
> in the next timer tick.
> At each overflow, hwc->interrupt would get incremented until it
> reached MAX_INTERRUPTS.
> Then, the event would be stopped (throttled), you'd do
> ctx->nr_throttled = 1. At the next
> timer tick, perf_event_task_tick() would then unthrottle the event. In
> that scheme, the
> event would be throttled for at most a tick. But in fact, the event
> never generated that
> many overflows/tick to justify throttling.
>
> I think there is no other way than what I suggested in my initial email:
> 1- revert the nr_freq optimization
> 2- reset hwc->interrupt on all events at each tick
>
I think my original patch did that: https://lkml.org/lkml/2011/11/15/114
--
Gleb.
* Re: perf_events: proposed fix for broken intr throttling (repost)
From: Stephane Eranian @ 2012-01-05 13:31 UTC (permalink / raw)
To: Gleb Natapov; +Cc: Peter Zijlstra, linux-kernel, mingo, asharma, vince, wcohen
On Thu, Jan 5, 2012 at 1:19 PM, Gleb Natapov <gleb@redhat.com> wrote:
> On Thu, Jan 05, 2012 at 01:08:41PM +0000, Stephane Eranian wrote:
>> Peter,
>>
>> I looked into this some more this morning. I don't think your proposed
>> scheme can work.
>> Unless, I misunderstood you, you were suggesting that we could perhaps
>> use a lazy
>> approach in perf_event_task_tick() and walk the event list only when
>> we have, at least, one
>> event to unthrottle, i.e., similar to what is done with nr_freq. That
>> cannot work. The problem is
>> that you'd let all events get throttled before you'd unthrottle them
>> in the next timer tick.
>> At each overflow, hwc->interrupt would get incremented until it
>> reached MAX_INTERRUPTS.
>> Then, the event would be stopped (throttled), you'd do
>> ctx->nr_throttled = 1. At the next
>> timer tick, perf_event_task_tick() would then unthrottle the event. In
>> that scheme, the
>> event would be throttled for at most a tick. But in fact, the event
>> never generated that
>> many overflows/tick to justify throttling.
>>
>> I think there is no other way than what I suggested in my initial email:
>> 1- revert the nr_freq optimization
>> 2- reset hwc->interrupt on all events at each tick
>>
> I think my original patch did that: https://lkml.org/lkml/2011/11/15/114
>
Yes, it looks like it, because it systematically calls perf_ctx_adjust_freq(),
which does reset the hwc->interrupts counter on ALL events.
But the pmu_disabled trick is not very pretty. We have to find some
middle-ground solution here.