linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.6.37 kernel warning in perf_events code
@ 2011-02-10  2:50 Arun Sharma
  2011-02-10 19:08 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 8+ messages in thread
From: Arun Sharma @ 2011-02-10  2:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users

[ Not sure if this is known and fixed in newer kernels. My machine
becomes unusable after this warning triggers ]

perf record -g -p <pid> cs -o csw.data -- sleep 3

Feb 7 15:28:36 localhost kernel: WARNING: at kernel/perf_event.c:1302
task_ctx_sched_out+0x45/0x67()
..
Feb 7 15:28:36 localhost kernel: [<ffffffff8103c60d>]
warn_slowpath_null+0x15/0x17
Feb 7 15:28:36 localhost kernel: [<ffffffff8109e3b6>]
task_ctx_sched_out+0x45/0x67
Feb 7 15:28:36 localhost kernel: [<ffffffff8109e3b6>]
task_ctx_sched_out+0x45/0x67
Feb 7 15:28:36 localhost kernel: [<ffffffff810a207f>]
perf_event_exit_task+0xca/0x196
Feb 7 15:28:36 localhost kernel: [<ffffffff810a207f>]
perf_event_exit_task+0xca/0x196
Feb 7 15:28:36 localhost kernel: [<ffffffff8103fb3f>] do_exit+0x2c5/0x6e6
Feb 7 15:28:36 localhost kernel: [<ffffffff8103fb3f>] do_exit+0x2c5/0x6e6
Feb 7 15:28:36 localhost kernel: [<ffffffff81040021>] complete_and_exit+0x0/0x1e
Feb 7 15:28:36 localhost kernel: [<ffffffff81040021>] complete_and_exit+0x0/0x1e
Feb 7 15:28:36 localhost kernel: [<ffffffff8100293b>]
system_call_fastpath+0x16/0x1b
Feb 7 15:28:36 localhost kernel: [<ffffffff8100293b>]
system_call_fastpath+0x16/0x1b


1301
1302         if (WARN_ON_ONCE(ctx != cpuctx->task_ctx))
1303                 return;

 -Arun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.37 kernel warning in perf_events code
  2011-02-10  2:50 2.6.37 kernel warning in perf_events code Arun Sharma
@ 2011-02-10 19:08 ` Arnaldo Carvalho de Melo
  2011-02-10 20:02   ` Arun Sharma
  0 siblings, 1 reply; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2011-02-10 19:08 UTC (permalink / raw)
  To: Arun Sharma; +Cc: Peter Zijlstra, Linux Kernel Mailing List, linux-perf-users

Em Wed, Feb 09, 2011 at 06:50:09PM -0800, Arun Sharma escreveu:
> [ Not sure if this is known and fixed in newer kernels. My machine
> becomes unusable after this warning triggers ]
> 
> perf record -g -p <pid> cs -o csw.data -- sleep 3

What kernel are you using? Peter, rings any bells?

- Arnaldo
 
> Feb 7 15:28:36 localhost kernel: WARNING: at kernel/perf_event.c:1302
> task_ctx_sched_out+0x45/0x67()
> ..
> Feb 7 15:28:36 localhost kernel: [<ffffffff8103c60d>]
> warn_slowpath_null+0x15/0x17
> Feb 7 15:28:36 localhost kernel: [<ffffffff8109e3b6>]
> task_ctx_sched_out+0x45/0x67
> Feb 7 15:28:36 localhost kernel: [<ffffffff8109e3b6>]
> task_ctx_sched_out+0x45/0x67
> Feb 7 15:28:36 localhost kernel: [<ffffffff810a207f>]
> perf_event_exit_task+0xca/0x196
> Feb 7 15:28:36 localhost kernel: [<ffffffff810a207f>]
> perf_event_exit_task+0xca/0x196
> Feb 7 15:28:36 localhost kernel: [<ffffffff8103fb3f>] do_exit+0x2c5/0x6e6
> Feb 7 15:28:36 localhost kernel: [<ffffffff8103fb3f>] do_exit+0x2c5/0x6e6
> Feb 7 15:28:36 localhost kernel: [<ffffffff81040021>] complete_and_exit+0x0/0x1e
> Feb 7 15:28:36 localhost kernel: [<ffffffff81040021>] complete_and_exit+0x0/0x1e
> Feb 7 15:28:36 localhost kernel: [<ffffffff8100293b>]
> system_call_fastpath+0x16/0x1b
> Feb 7 15:28:36 localhost kernel: [<ffffffff8100293b>]
> system_call_fastpath+0x16/0x1b
> 
> 
> 1301
> 1302         if (WARN_ON_ONCE(ctx != cpuctx->task_ctx))
> 1303                 return;
> 
>  -Arun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.37 kernel warning in perf_events code
  2011-02-10 19:08 ` Arnaldo Carvalho de Melo
@ 2011-02-10 20:02   ` Arun Sharma
  2011-02-10 20:20     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 8+ messages in thread
From: Arun Sharma @ 2011-02-10 20:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Linux Kernel Mailing List, linux-perf-users

On Thu, Feb 10, 2011 at 11:08 AM, Arnaldo Carvalho de Melo
<acme@redhat.com> wrote:
> Em Wed, Feb 09, 2011 at 06:50:09PM -0800, Arun Sharma escreveu:
>> [ Not sure if this is known and fixed in newer kernels. My machine
>> becomes unusable after this warning triggers ]
>>
>> perf record -g -p <pid> cs -o csw.data -- sleep 3
>
> What kernel are you using? Peter, rings any bells?

Stock 2.6.37.

 -Arun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.37 kernel warning in perf_events code
  2011-02-10 20:02   ` Arun Sharma
@ 2011-02-10 20:20     ` Arnaldo Carvalho de Melo
  2011-02-10 20:46       ` Arun Sharma
  0 siblings, 1 reply; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2011-02-10 20:20 UTC (permalink / raw)
  To: Arun Sharma; +Cc: Peter Zijlstra, Linux Kernel Mailing List, linux-perf-users

Em Thu, Feb 10, 2011 at 12:02:20PM -0800, Arun Sharma escreveu:
> On Thu, Feb 10, 2011 at 11:08 AM, Arnaldo Carvalho de Melo
> <acme@redhat.com> wrote:
> > Em Wed, Feb 09, 2011 at 06:50:09PM -0800, Arun Sharma escreveu:
> >> [ Not sure if this is known and fixed in newer kernels. My machine
> >> becomes unusable after this warning triggers ]
> >>
> >> perf record -g -p <pid> cs -o csw.data -- sleep 3

Arun, are you shure the above line is right? I guess it should read:

perf record -g -p <pid> -e cs -o csw.data -- sleep 3

To specify the context switches soft event, right?

> > What kernel are you using? Peter, rings any bells?
> 
> Stock 2.6.37.

- Arnaldo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.37 kernel warning in perf_events code
  2011-02-10 20:20     ` Arnaldo Carvalho de Melo
@ 2011-02-10 20:46       ` Arun Sharma
  2011-02-11  0:42         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 8+ messages in thread
From: Arun Sharma @ 2011-02-10 20:46 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Linux Kernel Mailing List, linux-perf-users

On Thu, Feb 10, 2011 at 12:20 PM, Arnaldo Carvalho de Melo
<acme@redhat.com> wrote:
>
>> >> perf record -g -p <pid> cs -o csw.data -- sleep 3
>
> Arun, are you shure the above line is right? I guess it should read:
>
>
> perf record -g -p <pid> -e cs -o csw.data -- sleep 3
>
> To specify the context switches soft event, right?
>

You caught a cut and paste error. I'm pretty sure I had the -e in
there when the warning triggered. I tried this command a few times,
just to verify and here's what I found:

* Under low loads, everything works fine.
* Under a heavy work load - I'm not able to reproduce the warning, but
hitting very similar symptoms:

[ perf record: Captured and wrote 2.282 MB /tmp/junk.data (~99721 samples) ]
[ perf record: Captured and wrote 1.734 MB /tmp/junk.data (~75740 samples) ]
[ perf record: Captured and wrote 0.091 MB /tmp/junk.data (~3975
samples) ]  <--- bad run

The bad run made my shell unresponsive and took around 30-40 seconds
to complete (whereas the good runs completed in less than 5 secs).
Could this be some kind of a feedback loop where what the measurement
machinery is perturbing what's being measured?

 -Arun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.37 kernel warning in perf_events code
  2011-02-10 20:46       ` Arun Sharma
@ 2011-02-11  0:42         ` Arnaldo Carvalho de Melo
  2011-02-11  0:57           ` Arun Sharma
  0 siblings, 1 reply; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2011-02-11  0:42 UTC (permalink / raw)
  To: Arun Sharma; +Cc: Peter Zijlstra, Linux Kernel Mailing List, linux-perf-users

Em Thu, Feb 10, 2011 at 12:46:07PM -0800, Arun Sharma escreveu:
> On Thu, Feb 10, 2011 at 12:20 PM, Arnaldo Carvalho de Melo wrote:
> >> >> perf record -g -p <pid> cs -o csw.data -- sleep 3

> > Arun, are you shure the above line is right? I guess it should read:

> > perf record -g -p <pid> -e cs -o csw.data -- sleep 3

> > To specify the context switches soft event, right?
> 
> You caught a cut and paste error. I'm pretty sure I had the -e in
> there when the warning triggered. I tried this command a few times,
> just to verify and here's what I found:

> * Under low loads, everything works fine.
> * Under a heavy work load - I'm not able to reproduce the warning, but
> hitting very similar symptoms:

> [ perf record: Captured and wrote 2.282 MB /tmp/junk.data (~99721 samples) ]
> [ perf record: Captured and wrote 1.734 MB /tmp/junk.data (~75740 samples) ]
> [ perf record: Captured and wrote 0.091 MB /tmp/junk.data (~3975
> samples) ]  <--- bad run

> The bad run made my shell unresponsive and took around 30-40 seconds
> to complete (whereas the good runs completed in less than 5 secs).
> Could this be some kind of a feedback loop where what the measurement
> machinery is perturbing what's being measured?

Is it possible for you to test this with 2.6.38-rc4? At least the user
level tools, just do:

[acme@felicio linux]$ make help | grep perf
  perf-tar-src-pkg    - Build perf-2.6.38-rc3.tar source tarball
  perf-targz-src-pkg  - Build perf-2.6.38-rc3.tar.gz source tarball
  perf-tarbz2-src-pkg - Build perf-2.6.38-rc3.tar.bz2 source tarball
[acme@felicio linux]$

Pick one of these targets on the source tree for 2.6.38-rc4, move the
tarball to the machine where you need to run the older kernel (.37,
right?) and try building and running it there.

Either or just build the new tools and run it on the older kernel.

There were several changes to better inform about lost events due to
heavy load that may be at play in your case.

- Arnaldo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.37 kernel warning in perf_events code
  2011-02-11  0:42         ` Arnaldo Carvalho de Melo
@ 2011-02-11  0:57           ` Arun Sharma
  2011-02-11 12:16             ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Arun Sharma @ 2011-02-11  0:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Linux Kernel Mailing List, linux-perf-users

On Thu, Feb 10, 2011 at 4:42 PM, Arnaldo Carvalho de Melo
<acme@redhat.com> wrote:
> Is it possible for you to test this with 2.6.38-rc4? At least the user
> level tools, just do:

I tried the 2.6.38rc3 perf binary on two machines:

kernel = 2.6.37: The machine hung solid on the first try.
kernel = 2.6.38rc3: I could run it 10 times in a loop and no bad runs.

Looks like this is already fixed in newer kernels.

 -Arun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.37 kernel warning in perf_events code
  2011-02-11  0:57           ` Arun Sharma
@ 2011-02-11 12:16             ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2011-02-11 12:16 UTC (permalink / raw)
  To: Arun Sharma
  Cc: Arnaldo Carvalho de Melo, Linux Kernel Mailing List,
	linux-perf-users

On Thu, 2011-02-10 at 16:57 -0800, Arun Sharma wrote:
> On Thu, Feb 10, 2011 at 4:42 PM, Arnaldo Carvalho de Melo
> <acme@redhat.com> wrote:
> > Is it possible for you to test this with 2.6.38-rc4? At least the user
> > level tools, just do:
> 
> I tried the 2.6.38rc3 perf binary on two machines:
> 
> kernel = 2.6.37: The machine hung solid on the first try.
> kernel = 2.6.38rc3: I could run it 10 times in a loop and no bad runs.
> 
> Looks like this is already fixed in newer kernels.

I couldn't actually reproduce on .37 with the given details, so while
its already fixed it would be nice to figure out what patch fixed it so
we can make sure it ends up in .37-stable.

Could you provide exact reproduction details or maybe bisect the thing?

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-02-11 12:16 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-10  2:50 2.6.37 kernel warning in perf_events code Arun Sharma
2011-02-10 19:08 ` Arnaldo Carvalho de Melo
2011-02-10 20:02   ` Arun Sharma
2011-02-10 20:20     ` Arnaldo Carvalho de Melo
2011-02-10 20:46       ` Arun Sharma
2011-02-11  0:42         ` Arnaldo Carvalho de Melo
2011-02-11  0:57           ` Arun Sharma
2011-02-11 12:16             ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).