public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
@ 2011-10-07 20:42 Eric B Munson
  2011-10-10 23:38 ` Eric B Munson
  0 siblings, 1 reply; 14+ messages in thread
From: Eric B Munson @ 2011-10-07 20:42 UTC (permalink / raw)
  To: eranian; +Cc: a.p.zijlstra, mingo, anton, linux-kernel, paulus, hbabu

[-- Attachment #1: Type: text/plain, Size: 500 bytes --]

This commit seems to have caused a regression with oprofile.  It is fairly easy
to trigger, simply run oprofile monitoring an event that will fire (something
frequent like CPU cycles) causes oprofile to fail saying that the PMU is in use.
If I disable CONFIG_CGROUP_PERF, everything goes back to working.  I suspect the
problem is that the PMU is being initialized without being reserved for perf.  I
am not yet sure of the right fix yet so if you have any suggestions I would
appreciate them.

Eric

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
  2011-10-07 20:42 Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC Eric B Munson
@ 2011-10-10 23:38 ` Eric B Munson
  2011-10-11  7:44   ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Eric B Munson @ 2011-10-10 23:38 UTC (permalink / raw)
  To: eranian; +Cc: a.p.zijlstra, mingo, anton, linux-kernel, paulus, hbabu

[-- Attachment #1: Type: text/plain, Size: 1037 bytes --]

On Fri, 07 Oct 2011, Eric B Munson wrote:

> This commit seems to have caused a regression with oprofile.  It is fairly easy
> to trigger, simply run oprofile monitoring an event that will fire (something
> frequent like CPU cycles) causes oprofile to fail saying that the PMU is in use.
> If I disable CONFIG_CGROUP_PERF, everything goes back to working.  I suspect the
> problem is that the PMU is being initialized without being reserved for perf.  I
> am not yet sure of the right fix yet so if you have any suggestions I would
> appreciate them.
> 
> Eric

This isn't the best description of the behavior we see, what happens is at some
point in the profiling session the MMCR register is clobbered by
perf_cgroup_switch() which calls perf_pmu_enable() without reserving the PMC
hardware.  When this happens oprofile stops counting.  It doesn't happen each
time so some runs show event counts that are reasonable, but it can also lead to
event counts that are smaller than expected, or completely missing.

Eric

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
  2011-10-10 23:38 ` Eric B Munson
@ 2011-10-11  7:44   ` Peter Zijlstra
  2011-10-11  7:47     ` Stephane Eranian
                       ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Peter Zijlstra @ 2011-10-11  7:44 UTC (permalink / raw)
  To: Eric B Munson; +Cc: eranian, mingo, anton, linux-kernel, paulus, hbabu

On Mon, 2011-10-10 at 19:38 -0400, Eric B Munson wrote:
> On Fri, 07 Oct 2011, Eric B Munson wrote:
> 
> > This commit seems to have caused a regression with oprofile.  It is fairly easy
> > to trigger, simply run oprofile monitoring an event that will fire (something
> > frequent like CPU cycles) causes oprofile to fail saying that the PMU is in use.
> > If I disable CONFIG_CGROUP_PERF, everything goes back to working.  I suspect the
> > problem is that the PMU is being initialized without being reserved for perf.  I
> > am not yet sure of the right fix yet so if you have any suggestions I would
> > appreciate them.
> > 
> > Eric
> 
> This isn't the best description of the behavior we see, what happens is at some
> point in the profiling session the MMCR register is clobbered by
> perf_cgroup_switch() which calls perf_pmu_enable() without reserving the PMC
> hardware.  When this happens oprofile stops counting.  It doesn't happen each
> time so some runs show event counts that are reasonable, but it can also lead to
> event counts that are smaller than expected, or completely missing.

What kernel are you testing?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
  2011-10-11  7:44   ` Peter Zijlstra
@ 2011-10-11  7:47     ` Stephane Eranian
  2011-10-11 13:34       ` Eric B Munson
  2011-10-11 13:32     ` Eric B Munson
  2011-10-12 16:51     ` Eric B Munson
  2 siblings, 1 reply; 14+ messages in thread
From: Stephane Eranian @ 2011-10-11  7:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Eric B Munson, mingo, anton, linux-kernel, paulus, hbabu

On Tue, Oct 11, 2011 at 9:44 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Mon, 2011-10-10 at 19:38 -0400, Eric B Munson wrote:
>> On Fri, 07 Oct 2011, Eric B Munson wrote:
>>
>> > This commit seems to have caused a regression with oprofile.  It is fairly easy
>> > to trigger, simply run oprofile monitoring an event that will fire (something
>> > frequent like CPU cycles) causes oprofile to fail saying that the PMU is in use.
>> > If I disable CONFIG_CGROUP_PERF, everything goes back to working.  I suspect the
>> > problem is that the PMU is being initialized without being reserved for perf.  I
>> > am not yet sure of the right fix yet so if you have any suggestions I would
>> > appreciate them.
>> >
>> > Eric
>>
>> This isn't the best description of the behavior we see, what happens is at some
>> point in the profiling session the MMCR register is clobbered by
>> perf_cgroup_switch() which calls perf_pmu_enable() without reserving the PMC
>> hardware.  When this happens oprofile stops counting.  It doesn't happen each
>> time so some runs show event counts that are reasonable, but it can also lead to
>> event counts that are smaller than expected, or completely missing.
>
> What kernel are you testing?
>
Looking at 3.1-rc9, I doubt it's coming from perf_cgroup_switch(). The function
is checking for perf context with at least of cgroup event before
calling perf_pmu_disable().
If there is no active perf context, there is no cgroup event, thus the
function is a nop.
Even if you have a competing perf session, it would have to have at
least one cgroup
event for this code to touch HW. The problem must come from somewhere else.

>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
  2011-10-11  7:44   ` Peter Zijlstra
  2011-10-11  7:47     ` Stephane Eranian
@ 2011-10-11 13:32     ` Eric B Munson
  2011-10-12 16:51     ` Eric B Munson
  2 siblings, 0 replies; 14+ messages in thread
From: Eric B Munson @ 2011-10-11 13:32 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: eranian, mingo, anton, linux-kernel, paulus, hbabu

[-- Attachment #1: Type: text/plain, Size: 1445 bytes --]

On Tue, 11 Oct 2011, Peter Zijlstra wrote:

> On Mon, 2011-10-10 at 19:38 -0400, Eric B Munson wrote:
> > On Fri, 07 Oct 2011, Eric B Munson wrote:
> > 
> > > This commit seems to have caused a regression with oprofile.  It is fairly easy
> > > to trigger, simply run oprofile monitoring an event that will fire (something
> > > frequent like CPU cycles) causes oprofile to fail saying that the PMU is in use.
> > > If I disable CONFIG_CGROUP_PERF, everything goes back to working.  I suspect the
> > > problem is that the PMU is being initialized without being reserved for perf.  I
> > > am not yet sure of the right fix yet so if you have any suggestions I would
> > > appreciate them.
> > > 
> > > Eric
> > 
> > This isn't the best description of the behavior we see, what happens is at some
> > point in the profiling session the MMCR register is clobbered by
> > perf_cgroup_switch() which calls perf_pmu_enable() without reserving the PMC
> > hardware.  When this happens oprofile stops counting.  It doesn't happen each
> > time so some runs show event counts that are reasonable, but it can also lead to
> > event counts that are smaller than expected, or completely missing.
> 
> What kernel are you testing?
> 

The tests first showed in Beta testing the new RHEL and SLES kernels, but the
perf cgroup code looks relatively unchanged since.  I have asked that the tests
be re-run with Linus' HEAD.

Eric

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
  2011-10-11  7:47     ` Stephane Eranian
@ 2011-10-11 13:34       ` Eric B Munson
  2011-10-11 13:35         ` Stephane Eranian
  0 siblings, 1 reply; 14+ messages in thread
From: Eric B Munson @ 2011-10-11 13:34 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, mingo, anton, linux-kernel, paulus, hbabu

[-- Attachment #1: Type: text/plain, Size: 1874 bytes --]

On Tue, 11 Oct 2011, Stephane Eranian wrote:

> On Tue, Oct 11, 2011 at 9:44 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > On Mon, 2011-10-10 at 19:38 -0400, Eric B Munson wrote:
> >> On Fri, 07 Oct 2011, Eric B Munson wrote:
> >>
> >> > This commit seems to have caused a regression with oprofile.  It is fairly easy
> >> > to trigger, simply run oprofile monitoring an event that will fire (something
> >> > frequent like CPU cycles) causes oprofile to fail saying that the PMU is in use.
> >> > If I disable CONFIG_CGROUP_PERF, everything goes back to working.  I suspect the
> >> > problem is that the PMU is being initialized without being reserved for perf.  I
> >> > am not yet sure of the right fix yet so if you have any suggestions I would
> >> > appreciate them.
> >> >
> >> > Eric
> >>
> >> This isn't the best description of the behavior we see, what happens is at some
> >> point in the profiling session the MMCR register is clobbered by
> >> perf_cgroup_switch() which calls perf_pmu_enable() without reserving the PMC
> >> hardware.  When this happens oprofile stops counting.  It doesn't happen each
> >> time so some runs show event counts that are reasonable, but it can also lead to
> >> event counts that are smaller than expected, or completely missing.
> >
> > What kernel are you testing?
> >
> Looking at 3.1-rc9, I doubt it's coming from perf_cgroup_switch(). The function
> is checking for perf context with at least of cgroup event before
> calling perf_pmu_disable().
> If there is no active perf context, there is no cgroup event, thus the
> function is a nop.
> Even if you have a competing perf session, it would have to have at
> least one cgroup
> event for this code to touch HW. The problem must come from somewhere else.
> 
> >
> 

Does activating any cgroup setup a cgroup event?

Eric

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
  2011-10-11 13:34       ` Eric B Munson
@ 2011-10-11 13:35         ` Stephane Eranian
  0 siblings, 0 replies; 14+ messages in thread
From: Stephane Eranian @ 2011-10-11 13:35 UTC (permalink / raw)
  To: Eric B Munson; +Cc: Peter Zijlstra, mingo, anton, linux-kernel, paulus, hbabu

On Tue, Oct 11, 2011 at 3:34 PM, Eric B Munson <emunson@mgebm.net> wrote:
> On Tue, 11 Oct 2011, Stephane Eranian wrote:
>
>> On Tue, Oct 11, 2011 at 9:44 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>> > On Mon, 2011-10-10 at 19:38 -0400, Eric B Munson wrote:
>> >> On Fri, 07 Oct 2011, Eric B Munson wrote:
>> >>
>> >> > This commit seems to have caused a regression with oprofile.  It is fairly easy
>> >> > to trigger, simply run oprofile monitoring an event that will fire (something
>> >> > frequent like CPU cycles) causes oprofile to fail saying that the PMU is in use.
>> >> > If I disable CONFIG_CGROUP_PERF, everything goes back to working.  I suspect the
>> >> > problem is that the PMU is being initialized without being reserved for perf.  I
>> >> > am not yet sure of the right fix yet so if you have any suggestions I would
>> >> > appreciate them.
>> >> >
>> >> > Eric
>> >>
>> >> This isn't the best description of the behavior we see, what happens is at some
>> >> point in the profiling session the MMCR register is clobbered by
>> >> perf_cgroup_switch() which calls perf_pmu_enable() without reserving the PMC
>> >> hardware.  When this happens oprofile stops counting.  It doesn't happen each
>> >> time so some runs show event counts that are reasonable, but it can also lead to
>> >> event counts that are smaller than expected, or completely missing.
>> >
>> > What kernel are you testing?
>> >
>> Looking at 3.1-rc9, I doubt it's coming from perf_cgroup_switch(). The function
>> is checking for perf context with at least of cgroup event before
>> calling perf_pmu_disable().
>> If there is no active perf context, there is no cgroup event, thus the
>> function is a nop.
>> Even if you have a competing perf session, it would have to have at
>> least one cgroup
>> event for this code to touch HW. The problem must come from somewhere else.
>>
>> >
>>
>
> Does activating any cgroup setup a cgroup event?
>
No, you need to explicitly request a cgroup event.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
  2011-10-11  7:44   ` Peter Zijlstra
  2011-10-11  7:47     ` Stephane Eranian
  2011-10-11 13:32     ` Eric B Munson
@ 2011-10-12 16:51     ` Eric B Munson
  2011-10-12 20:37       ` Stephane Eranian
  2 siblings, 1 reply; 14+ messages in thread
From: Eric B Munson @ 2011-10-12 16:51 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: eranian, mingo, anton, linux-kernel, paulus, hbabu

[-- Attachment #1: Type: text/plain, Size: 1355 bytes --]

On Tue, 11 Oct 2011, Peter Zijlstra wrote:

> On Mon, 2011-10-10 at 19:38 -0400, Eric B Munson wrote:
> > On Fri, 07 Oct 2011, Eric B Munson wrote:
> > 
> > > This commit seems to have caused a regression with oprofile.  It is fairly easy
> > > to trigger, simply run oprofile monitoring an event that will fire (something
> > > frequent like CPU cycles) causes oprofile to fail saying that the PMU is in use.
> > > If I disable CONFIG_CGROUP_PERF, everything goes back to working.  I suspect the
> > > problem is that the PMU is being initialized without being reserved for perf.  I
> > > am not yet sure of the right fix yet so if you have any suggestions I would
> > > appreciate them.
> > > 
> > > Eric
> > 
> > This isn't the best description of the behavior we see, what happens is at some
> > point in the profiling session the MMCR register is clobbered by
> > perf_cgroup_switch() which calls perf_pmu_enable() without reserving the PMC
> > hardware.  When this happens oprofile stops counting.  It doesn't happen each
> > time so some runs show event counts that are reasonable, but it can also lead to
> > event counts that are smaller than expected, or completely missing.
> 
> What kernel are you testing?
> 

The problem seems to be fixed in 3.1-rc9 so I just need to identify the
patch(es) that fixed it.

Eric

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
  2011-10-12 16:51     ` Eric B Munson
@ 2011-10-12 20:37       ` Stephane Eranian
  2011-10-12 22:20         ` Eric B Munson
  2011-10-18 13:53         ` Eric B Munson
  0 siblings, 2 replies; 14+ messages in thread
From: Stephane Eranian @ 2011-10-12 20:37 UTC (permalink / raw)
  To: Eric B Munson; +Cc: Peter Zijlstra, mingo, anton, linux-kernel, paulus, hbabu

Could be:
   a8d757e perf events: Fix slow and broken cgroup context switch code


On Wed, Oct 12, 2011 at 6:51 PM, Eric B Munson <emunson@mgebm.net> wrote:
> On Tue, 11 Oct 2011, Peter Zijlstra wrote:
>
>> On Mon, 2011-10-10 at 19:38 -0400, Eric B Munson wrote:
>> > On Fri, 07 Oct 2011, Eric B Munson wrote:
>> >
>> > > This commit seems to have caused a regression with oprofile.  It is fairly easy
>> > > to trigger, simply run oprofile monitoring an event that will fire (something
>> > > frequent like CPU cycles) causes oprofile to fail saying that the PMU is in use.
>> > > If I disable CONFIG_CGROUP_PERF, everything goes back to working.  I suspect the
>> > > problem is that the PMU is being initialized without being reserved for perf.  I
>> > > am not yet sure of the right fix yet so if you have any suggestions I would
>> > > appreciate them.
>> > >
>> > > Eric
>> >
>> > This isn't the best description of the behavior we see, what happens is at some
>> > point in the profiling session the MMCR register is clobbered by
>> > perf_cgroup_switch() which calls perf_pmu_enable() without reserving the PMC
>> > hardware.  When this happens oprofile stops counting.  It doesn't happen each
>> > time so some runs show event counts that are reasonable, but it can also lead to
>> > event counts that are smaller than expected, or completely missing.
>>
>> What kernel are you testing?
>>
>
> The problem seems to be fixed in 3.1-rc9 so I just need to identify the
> patch(es) that fixed it.
>
> Eric
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
  2011-10-12 20:37       ` Stephane Eranian
@ 2011-10-12 22:20         ` Eric B Munson
  2011-10-18 13:53         ` Eric B Munson
  1 sibling, 0 replies; 14+ messages in thread
From: Eric B Munson @ 2011-10-12 22:20 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, mingo, anton, linux-kernel, paulus, hbabu

[-- Attachment #1: Type: text/plain, Size: 202 bytes --]

On Wed, 12 Oct 2011, Stephane Eranian wrote:

> Could be:
>    a8d757e perf events: Fix slow and broken cgroup context switch code
> 

Thanks for the pointer, I will have a look at that.

Eric

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
  2011-10-12 20:37       ` Stephane Eranian
  2011-10-12 22:20         ` Eric B Munson
@ 2011-10-18 13:53         ` Eric B Munson
  2011-10-18 14:43           ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Eric B Munson @ 2011-10-18 13:53 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, mingo, anton, linux-kernel, paulus, hbabu

[-- Attachment #1: Type: text/plain, Size: 330 bytes --]

On Wed, 12 Oct 2011, Stephane Eranian wrote:

> Could be:
>    a8d757e perf events: Fix slow and broken cgroup context switch code
> 

Thanks for the pointer, but the fix was in:
    facc4307 perf: Optimize event scheduling locking

This might be a candidate for stable given that oprofile is broken without it.

Eric

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
  2011-10-18 13:53         ` Eric B Munson
@ 2011-10-18 14:43           ` Peter Zijlstra
  2011-10-18 14:50             ` Stephane Eranian
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2011-10-18 14:43 UTC (permalink / raw)
  To: Eric B Munson; +Cc: Stephane Eranian, mingo, anton, linux-kernel, paulus, hbabu

On Tue, 2011-10-18 at 09:53 -0400, Eric B Munson wrote:
> On Wed, 12 Oct 2011, Stephane Eranian wrote:
> 
> > Could be:
> >    a8d757e perf events: Fix slow and broken cgroup context switch code
> > 
> 
> Thanks for the pointer, but the fix was in:
>     facc4307 perf: Optimize event scheduling locking
> 
> This might be a candidate for stable given that oprofile is broken without it.

I might feel much more confident about recommending that if someone
could explain why that patches fixes what exact problem.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
  2011-10-18 14:43           ` Peter Zijlstra
@ 2011-10-18 14:50             ` Stephane Eranian
  2011-10-18 15:20               ` Eric B Munson
  0 siblings, 1 reply; 14+ messages in thread
From: Stephane Eranian @ 2011-10-18 14:50 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Eric B Munson, mingo, anton, linux-kernel, paulus, hbabu

Hi,

I suspect it's because of this chunk:
        rcu_read_lock();

        list_for_each_entry_rcu(pmu, &pmus, entry) {
-
                cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);

-               perf_pmu_disable(cpuctx->ctx.pmu);
-
                /*
                 * perf_cgroup_events says at least one
                 * context on this CPU has cgroup events.
@@ -353,6 +366,8 @@ void perf_cgroup_switch(struct task_struct *task, int mode)
                 * events for a context.
                 */
                if (cpuctx->ctx.nr_cgroups > 0) {
+                       perf_ctx_lock(cpuctx, cpuctx->task_ctx);
+                       perf_pmu_disable(cpuctx->ctx.pmu);


In other words, you don't call perf_pmu_disable() unless you know
you have cgroup events.

Without that, I think you will touch the PMU on cgroup switch and
that night conflict with another subsystem using the PMU, e.g. OProfile.


On Tue, Oct 18, 2011 at 4:43 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Tue, 2011-10-18 at 09:53 -0400, Eric B Munson wrote:
>> On Wed, 12 Oct 2011, Stephane Eranian wrote:
>>
>> > Could be:
>> >    a8d757e perf events: Fix slow and broken cgroup context switch code
>> >
>>
>> Thanks for the pointer, but the fix was in:
>>     facc4307 perf: Optimize event scheduling locking
>>
>> This might be a candidate for stable given that oprofile is broken without it.
>
> I might feel much more confident about recommending that if someone
> could explain why that patches fixes what exact problem.
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC
  2011-10-18 14:50             ` Stephane Eranian
@ 2011-10-18 15:20               ` Eric B Munson
  0 siblings, 0 replies; 14+ messages in thread
From: Eric B Munson @ 2011-10-18 15:20 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, mingo, anton, linux-kernel, paulus, hbabu

[-- Attachment #1: Type: text/plain, Size: 2145 bytes --]

On Tue, 18 Oct 2011, Stephane Eranian wrote:

> Hi,
> 
> I suspect it's because of this chunk:
>         rcu_read_lock();
> 
>         list_for_each_entry_rcu(pmu, &pmus, entry) {
> -
>                 cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
> 
> -               perf_pmu_disable(cpuctx->ctx.pmu);
> -
>                 /*
>                  * perf_cgroup_events says at least one
>                  * context on this CPU has cgroup events.
> @@ -353,6 +366,8 @@ void perf_cgroup_switch(struct task_struct *task, int mode)
>                  * events for a context.
>                  */
>                 if (cpuctx->ctx.nr_cgroups > 0) {
> +                       perf_ctx_lock(cpuctx, cpuctx->task_ctx);
> +                       perf_pmu_disable(cpuctx->ctx.pmu);
> 
> 
> In other words, you don't call perf_pmu_disable() unless you know
> you have cgroup events.
> 
> Without that, I think you will touch the PMU on cgroup switch and
> that night conflict with another subsystem using the PMU, e.g. OProfile.

Exactly, the inital bug was being caused by something in perf touching the MMCR0
register without reserving the hardware first (or noticing that oprofile had
already done so).  So oprofile runs would show abnormally low event counts or
counts that were not there if you were really unlucky.  This patch fixes the
problem because the MMCR0 register will not be touched unless there was there
were cgroup events active.

> 
> 
> On Tue, Oct 18, 2011 at 4:43 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > On Tue, 2011-10-18 at 09:53 -0400, Eric B Munson wrote:
> >> On Wed, 12 Oct 2011, Stephane Eranian wrote:
> >>
> >> > Could be:
> >> >    a8d757e perf events: Fix slow and broken cgroup context switch code
> >> >
> >>
> >> Thanks for the pointer, but the fix was in:
> >>     facc4307 perf: Optimize event scheduling locking
> >>
> >> This might be a candidate for stable given that oprofile is broken without it.
> >
> > I might feel much more confident about recommending that if someone
> > could explain why that patches fixes what exact problem.
> >
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-10-18 15:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-07 20:42 Oprofile Regression Caused by commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 on PPC Eric B Munson
2011-10-10 23:38 ` Eric B Munson
2011-10-11  7:44   ` Peter Zijlstra
2011-10-11  7:47     ` Stephane Eranian
2011-10-11 13:34       ` Eric B Munson
2011-10-11 13:35         ` Stephane Eranian
2011-10-11 13:32     ` Eric B Munson
2011-10-12 16:51     ` Eric B Munson
2011-10-12 20:37       ` Stephane Eranian
2011-10-12 22:20         ` Eric B Munson
2011-10-18 13:53         ` Eric B Munson
2011-10-18 14:43           ` Peter Zijlstra
2011-10-18 14:50             ` Stephane Eranian
2011-10-18 15:20               ` Eric B Munson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox