2.6.22.6 + oprofile oops

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* 2.6.22.6 + oprofile oops
@ 2007-09-22 11:23 Sami Farin
  2007-09-29 17:05 ` Sami Farin
  2007-10-01 10:29 ` Andi Kleen
  0 siblings, 2 replies; 7+ messages in thread
From: Sami Farin @ 2007-09-22 11:23 UTC (permalink / raw)
  To: Linux kernel Mailing List

x86_64 SMP kernel v2.6.22.6 (not using callgraph).
sometimes oprofile works for a longer time... but not this time.

2007-09-22 13:53:32.527237777 <1>[ 3372.390188] Unable to handle kernel NULL pointer dereference at 0000000000000650 RIP: 
2007-09-22 13:53:32.527245948 <1>[ 3372.390195]  [<ffffffff80652f44>] _spin_lock+0x4/0x20
2007-09-22 13:53:32.527247694 <4>[ 3372.390211] PGD 13a6c067 PUD 3ad91067 PMD 0 
2007-09-22 13:53:32.527249371 <0>[ 3372.390216] Oops: 0002 [1] SMP 
2007-09-22 13:53:32.527250837 <4>[ 3372.390220] CPU 0 
2007-09-22 13:53:32.527252304 <4>[ 3372.390227] Modules linked in: i915 oprofile xt_CLASSIFY ipt_ECN xt_CONNMARK xt_connlimit xt_length ipt_set xt_multiport ip_set_iphash ip_set_nethash ip_set_portmap sch_esfq ipt_REJECT ip6t_LOG xt_limit ipt_LOG xt_hashlimit ipt_owner nf_conntrack_ipv4 xt_state xt_tcpudp ip6table_filter ip6table_mangle ip6_tables iptable_filter iptable_mangle iptable_raw ip_tables tcp_highspeed tcp_htcp tcp_hybla tcp_scalable tcp_vegas tcp_westwood ip_set sch_netem sch_hfsc sch_htb sch_sfq cls_fw cls_u32 cls_route sch_ingress sch_red sch_tbf sch_teql sch_prio sch_gred cls_rsvp cls_rsvp6 cls_tcindex sch_cbq sch_dsmark nf_conntrack xt_TARPIT x_tables perfctr dccp_diag dccp ioatdma cmtp kernelcapi l2cap bluetooth intelfb i2c_algo_bit i810 lp i2c_dev ftdi_sio usbserial usb_storage eeprom ohci_hcd irlan irda crc_ccitt binfmt_misc loop dm_mod video dock button battery ac tcp_cubic nvram snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss parport_p
2007-09-22 13:53:32.527328363 c snd_pcm parport snd_timer ohci1394 ieee1394 snd i2c_i801 i2c_core soundcore ehci_hcd uhci_hcd e1000 snd_page_alloc iTCO_wdt iTCO_vendor_support
2007-09-22 13:53:32.527330808 <4>[ 3372.390363] Pid: 7, comm: events/0 Not tainted 2.6.22.6-cfs-v20.5-64-2 #7
2007-09-22 13:53:32.527332623 <4>[ 3372.390366] RIP: 0010:[<ffffffff80652f44>]  [<ffffffff80652f44>] _spin_lock+0x4/0x20
2007-09-22 13:53:32.527334439 <4>[ 3372.390373] RSP: 0018:ffff81003e5fddc0  EFLAGS: 00010282
2007-09-22 13:53:32.527336116 <4>[ 3372.390380] RAX: 000000000000000d RBX: 0000000000000004 RCX: 0000000000018000
2007-09-22 13:53:32.527346033 <4>[ 3372.390383] RDX: 00000000000155ae RSI: 0000000000000000 RDI: 0000000000000650
2007-09-22 13:53:32.527347849 <4>[ 3372.390387] RBP: ffff81003e5fddc0 R08: 2222222222222222 R09: 2222222222222222
2007-09-22 13:53:32.527349595 <4>[ 3372.390391] R10: 2222222222222222 R11: 2222222222222222 R12: 0000000000000004
2007-09-22 13:53:32.527351411 <4>[ 3372.390398] R13: ffffffffffffffff R14: ffffffff88321d00 R15: 000000000000000d
2007-09-22 13:53:32.527365100 <4>[ 3372.390402] FS:  0000000000000000(0000) GS:ffffffff80805000(0000) knlGS:0000000000000000
2007-09-22 13:53:32.527367056 <4>[ 3372.390406] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
2007-09-22 13:53:32.527368662 <4>[ 3372.390409] CR2: 0000000000000650 CR3: 0000000011d52000 CR4: 00000000000006e0
2007-09-22 13:53:32.527370478 <4>[ 3372.390417] Process events/0 (pid: 7, threadinfo ffff81003e5fc000, task ffff810001ffb100)
2007-09-22 13:53:32.527380117 <4>[ 3372.390419] Stack:  ffff81003e5fdde0 ffffffff80232b88 ffff81003e5fdeb0 00000000000000ba
2007-09-22 13:53:32.527382002 <4>[ 3372.390427]  ffff81003e5fde60 ffffffff8831abd4 ffffffff88321d70 0000000000000000
2007-09-22 13:53:32.527383818 <4>[ 3372.390437]  0000000000000000 0000000000000000 0000000100000001 00000000000000ba
2007-09-22 13:53:32.527385634 <4>[ 3372.390442] Call Trace:
2007-09-22 13:53:32.527390314 <4>[ 3372.390457]  [<ffffffff80232b88>] get_task_mm+0x18/0x60
2007-09-22 13:53:32.527391990 <4>[ 3372.390484]  [<ffffffff8831abd4>] :oprofile:sync_buffer+0xf4/0x480
2007-09-22 13:53:32.527393666 <4>[ 3372.390544]  [<ffffffff8831a6b0>] :oprofile:wq_sync_buffer+0x0/0x60
2007-09-22 13:53:32.527399673 <4>[ 3372.390576]  [<ffffffff8831a6e3>] :oprofile:wq_sync_buffer+0x33/0x60
2007-09-22 13:53:32.527401419 <4>[ 3372.390602]  [<ffffffff80244bfd>] run_workqueue+0xcd/0x170
2007-09-22 13:53:32.527406238 <4>[ 3372.390635]  [<ffffffff802456b3>] worker_thread+0xb3/0x120
2007-09-22 13:53:32.527421324 <4>[ 3372.390652]  [<ffffffff80249000>] autoremove_wake_function+0x0/0x40
2007-09-22 13:53:32.527423140 <4>[ 3372.390684]  [<ffffffff80245600>] worker_thread+0x0/0x120
2007-09-22 13:53:32.527424816 <4>[ 3372.390697]  [<ffffffff80248bfd>] kthread+0x4d/0x80
2007-09-22 13:53:32.527426423 <4>[ 3372.390730]  [<ffffffff8020abc8>] child_rip+0xa/0x12
2007-09-22 13:53:32.527428099 <4>[ 3372.390810]  [<ffffffff80248bb0>] kthread+0x0/0x80
2007-09-22 13:53:32.527444931 <4>[ 3372.390822]  [<ffffffff8020abbe>] child_rip+0x0/0x12
2007-09-22 13:53:32.527446817 <4>[ 3372.390844] 
2007-09-22 13:53:32.527448144 <4>[ 3372.390846] 
2007-09-22 13:53:32.527449541 <4>[ 3372.390846] Code: f0 ff 0f 79 09 f3 90 83 3f 00 7e f9 eb f2 c9 c3 66 66 66 2e 
2007-09-22 13:53:32.527455757 <1>[ 3372.390866] RIP  [<ffffffff80652f44>] _spin_lock+0x4/0x20
2007-09-22 13:53:32.527457433 <4>[ 3372.390876]  RSP <ffff81003e5fddc0>
2007-09-22 13:53:32.527463090 <0>[ 3372.390878] CR2: 0000000000000650

BTW. does somebody have callgraph working on x86_64 with oprofile?
I get instant freeze and I need to power cycle system... (callgraph worked on IA-32)

-- 
Do what you love because life is too short for anything else.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.6.22.6 + oprofile oops
  2007-09-22 11:23 2.6.22.6 + oprofile oops Sami Farin
@ 2007-09-29 17:05 ` Sami Farin
  2007-10-11 14:36   ` Philippe Elie
  2007-10-01 10:29 ` Andi Kleen
  1 sibling, 1 reply; 7+ messages in thread
From: Sami Farin @ 2007-09-29 17:05 UTC (permalink / raw)
  To: Linux kernel Mailing List; +Cc: oprofile Mailing List

On Sat, Sep 22, 2007 at 14:23:35 +0300, Sami Farin wrote:
> x86_64 SMP kernel v2.6.22.6 (not using callgraph).
> sometimes oprofile works for a longer time... but not this time.
> 
> 2007-09-22 13:53:32.527237777 <1>[ 3372.390188] Unable to handle kernel NULL pointer dereference at 0000000000000650 RIP: 
> 2007-09-22 13:53:32.527245948 <1>[ 3372.390195]  [<ffffffff80652f44>] _spin_lock+0x4/0x20
> 2007-09-22 13:53:32.527247694 <4>[ 3372.390211] PGD 13a6c067 PUD 3ad91067 PMD 0 
> 2007-09-22 13:53:32.527249371 <0>[ 3372.390216] Oops: 0002 [1] SMP 
> 2007-09-22 13:53:32.527250837 <4>[ 3372.390220] CPU 0 
> 2007-09-22 13:53:32.527252304 <4>[ 3372.390227] <snip>
...
> 2007-09-22 13:53:32.527385634 <4>[ 3372.390442] Call Trace:
> 2007-09-22 13:53:32.527390314 <4>[ 3372.390457]  [<ffffffff80232b88>] get_task_mm+0x18/0x60
> 2007-09-22 13:53:32.527391990 <4>[ 3372.390484]  [<ffffffff8831abd4>] :oprofile:sync_buffer+0xf4/0x480
> 2007-09-22 13:53:32.527393666 <4>[ 3372.390544]  [<ffffffff8831a6b0>] :oprofile:wq_sync_buffer+0x0/0x60
> 2007-09-22 13:53:32.527399673 <4>[ 3372.390576]  [<ffffffff8831a6e3>] :oprofile:wq_sync_buffer+0x33/0x60
> 2007-09-22 13:53:32.527401419 <4>[ 3372.390602]  [<ffffffff80244bfd>] run_workqueue+0xcd/0x170
> 2007-09-22 13:53:32.527406238 <4>[ 3372.390635]  [<ffffffff802456b3>] worker_thread+0xb3/0x120
> 2007-09-22 13:53:32.527421324 <4>[ 3372.390652]  [<ffffffff80249000>] autoremove_wake_function+0x0/0x40
> 2007-09-22 13:53:32.527423140 <4>[ 3372.390684]  [<ffffffff80245600>] worker_thread+0x0/0x120
> 2007-09-22 13:53:32.527424816 <4>[ 3372.390697]  [<ffffffff80248bfd>] kthread+0x4d/0x80
> 2007-09-22 13:53:32.527426423 <4>[ 3372.390730]  [<ffffffff8020abc8>] child_rip+0xa/0x12
> 2007-09-22 13:53:32.527428099 <4>[ 3372.390810]  [<ffffffff80248bb0>] kthread+0x0/0x80
> 2007-09-22 13:53:32.527444931 <4>[ 3372.390822]  [<ffffffff8020abbe>] child_rip+0x0/0x12

This is easy to trigger with "ping -f 127.0.0.1" (as root).
Oopses inside 0.1s.  If you can't reproduce, try more events and/or smaller
sample count.

May I suggest something: if system is too slow to process events,
find out which event is triggering too often, double the sample count
for it, reset all counters, and log an informative kernel message.
If that's why it's crashing...   if it's some other reason, tough luck
:)

> 2007-09-22 13:53:32.527446817 <4>[ 3372.390844] 
> 2007-09-22 13:53:32.527448144 <4>[ 3372.390846] 
> 2007-09-22 13:53:32.527449541 <4>[ 3372.390846] Code: f0 ff 0f 79 09 f3 90 83 3f 00 7e f9 eb f2 c9 c3 66 66 66 2e 
> 2007-09-22 13:53:32.527455757 <1>[ 3372.390866] RIP  [<ffffffff80652f44>] _spin_lock+0x4/0x20
> 2007-09-22 13:53:32.527457433 <4>[ 3372.390876]  RSP <ffff81003e5fddc0>
> 2007-09-22 13:53:32.527463090 <0>[ 3372.390878] CR2: 0000000000000650
> 
> BTW. does somebody have callgraph working on x86_64 with oprofile?
> I get instant freeze and I need to power cycle system... (callgraph worked on IA-32)

-- 
Do what you love because life is too short for anything else.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.6.22.6 + oprofile oops
  2007-09-29 17:05 ` Sami Farin
@ 2007-10-11 14:36   ` Philippe Elie
  2007-10-11 18:09     ` Sami Farin
  0 siblings, 1 reply; 7+ messages in thread
From: Philippe Elie @ 2007-10-11 14:36 UTC (permalink / raw)
  To: Linux kernel Mailing List, oprofile Mailing List; +Cc: Sami Farin, Andi Kleen

[-- Attachment #1: Type: text/plain, Size: 1669 bytes --]

On Sat, 29 Sep 2007 at 20:05 +0000, Sami Farin wrote:

> > x86_64 SMP kernel v2.6.22.6 (not using callgraph).
> > sometimes oprofile works for a longer time... but not this time.
> > 
> > 2007-09-22 13:53:32.527237777 <1>[ 3372.390188] Unable to handle kernel NULL pointer dereference at 0000000000000650 RIP: 
> > 2007-09-22 13:53:32.527245948 <1>[ 3372.390195]  [<ffffffff80652f44>] _spin_lock+0x4/0x20
...
> 2007-09-22 13:53:32.527390314 <4>[ 3372.390457]  [<ffffffff80232b88>] get_task_mm+0x18/0x60

On the per cpu buffer writer side oprofile_add_sample() use profile_pc()
to get the eip, profile_pc() can return ~0lu, but an eip == ~0lu is a
magic value = ESCAPE_CODE. The per cpu reader side in buffer_sync.c use
this value to know that the associated data is a task pointer but here
the associated data is a counter number.

This has already been reported two years ago by Jesse Barnes on the same
sort of box, pentium D. This is not reproducible on a duo core nor I was
able to on a P4 box two years ago, I dunno why. Anyway profile_pc() is
broken() on both i386/x86_64, w/o frame pointer. For i386:

000000b0 <_spin_lock_bh>:
  b0:   53                      push   %ebx /* break profile_pc() */
  b1:   89 c3                   mov    %eax,%ebx

On x86_64 it's broken with or w/o frame pointer.

I understand the motivation to get the eip calling a spinlock function,
but that's a cheat and it has a price. Beside that, the trouble is also
on oprofile side, magic value are evil. This bug exists since at least
2.6.13.

Sami can you try the attached patch, the chunk in buffer_sync.c is
here only to avoid oopsing if another problem exists somewhere.


-- 
Phe

[-- Attachment #2: oprof-fix-profile_pc-use.patch --]
[-- Type: text/plain, Size: 1616 bytes --]

--- drivers/oprofile/cpu_buffer.c.orig	2007-10-03 21:36:08.000000000 +0200
+++ drivers/oprofile/cpu_buffer.c	2007-10-11 12:41:57.000000000 +0200
@@ -241,7 +241,7 @@
 void oprofile_add_sample(struct pt_regs * const regs, unsigned long event)
 {
 	int is_kernel = !user_mode(regs);
-	unsigned long pc = profile_pc(regs);
+	unsigned long pc = instruction_pointer(regs);
 
 	oprofile_add_ext_sample(pc, regs, event, is_kernel);
 }
--- drivers/oprofile/buffer_sync.c.orig	2007-10-04 10:15:27.000000000 +0200
+++ drivers/oprofile/buffer_sync.c	2007-10-11 12:49:22.000000000 +0200
@@ -530,11 +530,30 @@
 				/* userspace context switch */
 				new = (struct task_struct *)s->event;
 
-				release_mm(oldmm);
-				mm = take_tasks_mm(new);
-				if (mm != oldmm)
-					cookie = get_exec_dcookie(mm);
-				add_user_ctx_switch(new, cookie);
+				/* Ugly, let says task pointer can't be
+				   in the last page nor in the first page */
+				if (s->event + 0x1000 > 0x2000) {
+					release_mm(oldmm);
+					mm = take_tasks_mm(new);
+					if (mm != oldmm)
+						cookie = get_exec_dcookie(mm);
+					add_user_ctx_switch(new, cookie);
+				} else {
+					static int show_it = 0;
+					if (show_it++ < 5) {
+						printk(KERN_INFO
+						       "oprofile: Invalid "
+						       "event %lu, head %lu, "
+						       "tail %lu, eip %lu, "
+						       "available %lu, i %u "
+						       "state %d %d\n",
+						       s->event,
+						       cpu_buf->head_pos,
+						       cpu_buf->tail_pos,
+						       s->eip, available, i,
+						       state, in_kernel);
+					}
+				}
 			}
 		} else {
 			if (state >= sb_bt_start &&

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.6.22.6 + oprofile oops
  2007-10-11 14:36   ` Philippe Elie
@ 2007-10-11 18:09     ` Sami Farin
  0 siblings, 0 replies; 7+ messages in thread
From: Sami Farin @ 2007-10-11 18:09 UTC (permalink / raw)
  To: Linux kernel Mailing List, oprofile Mailing List

On Thu, Oct 11, 2007 at 16:36:10 +0200, Philippe Elie wrote:
> On Sat, 29 Sep 2007 at 20:05 +0000, Sami Farin wrote:
> 
> > > x86_64 SMP kernel v2.6.22.6 (not using callgraph).
> > > sometimes oprofile works for a longer time... but not this time.
> > > 
> > > 2007-09-22 13:53:32.527237777 <1>[ 3372.390188] Unable to handle kernel NULL pointer dereference at 0000000000000650 RIP: 
> > > 2007-09-22 13:53:32.527245948 <1>[ 3372.390195]  [<ffffffff80652f44>] _spin_lock+0x4/0x20
> ...
> > 2007-09-22 13:53:32.527390314 <4>[ 3372.390457]  [<ffffffff80232b88>] get_task_mm+0x18/0x60
> 
> On the per cpu buffer writer side oprofile_add_sample() use profile_pc()
> to get the eip, profile_pc() can return ~0lu, but an eip == ~0lu is a
> magic value = ESCAPE_CODE. The per cpu reader side in buffer_sync.c use
> this value to know that the associated data is a task pointer but here
> the associated data is a counter number.

I applied this patch and I'm not able to reproduce Oops anymore
with 2.6.22.10 or 2.6.23.
Haven't gotten "oprofile: Invalid event", either.

Thanks a bunch!

-- 
Do what you love because life is too short for anything else.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.6.22.6 + oprofile oops
  2007-09-22 11:23 2.6.22.6 + oprofile oops Sami Farin
  2007-09-29 17:05 ` Sami Farin
@ 2007-10-01 10:29 ` Andi Kleen
  2007-10-01 13:22   ` Sami Farin
  2007-10-01 14:47   ` Ingo Molnar
  1 sibling, 2 replies; 7+ messages in thread
From: Andi Kleen @ 2007-10-01 10:29 UTC (permalink / raw)
  To: Sami Farin; +Cc: Linux kernel Mailing List, mingo

Sami Farin <safari-kernel@safari.iki.fi> writes:

> x86_64 SMP kernel v2.6.22.6 (not using callgraph).
> sometimes oprofile works for a longer time... but not this time.
> 
> 2007-09-22 13:53:32.527237777 <1>[ 3372.390188] Unable to handle kernel NULL pointer dereference at 0000000000000650 RIP: 
> 2007-09-22 13:53:32.527245948 <1>[ 3372.390195]  [<ffffffff80652f44>] _spin_lock+0x4/0x20
> 2007-09-22 13:53:32.527247694 <4>[ 3372.390211] PGD 13a6c067 PUD 3ad91067 PMD 0 
> 2007-09-22 13:53:32.527249371 <0>[ 3372.390216] Oops: 0002 [1] SMP 
> 2007-09-22 13:53:32.527250837 <4>[ 3372.390220] CPU 0 
> 2007-09-22 13:53:32.527252304 <4>[ 3372.390227] Modules linked in: i915 oprofile xt_CLASSIFY ipt_ECN xt_CONNMARK xt_connlimit xt_length ipt_set xt_multiport ip_set_iphash ip_set_nethash ip_set_portmap sch_esfq ipt_REJECT ip6t_LOG xt_limit ipt_LOG xt_hashlimit ipt_owner nf_conntrack_ipv4 xt_state xt_tcpudp ip6table_filter ip6table_mangle ip6_tables iptable_filter iptable_mangle iptable_raw ip_tables tcp_highspeed tcp_htcp tcp_hybla tcp_scalable tcp_vegas tcp_westwood ip_set sch_netem sch_hfsc sch_htb sch_sfq cls_fw cls_u32 cls_route sch_ingress sch_red sch_tbf sch_teql sch_prio sch_gred cls_rsvp cls_rsvp6 cls_tcindex sch_cbq sch_dsmark nf_conntrack xt_TARPIT x_tables perfctr dccp_diag dccp ioatdma cmtp kernelcapi l2cap bluetooth intelfb i2c_algo_bit i810 lp i2c_dev ftdi_sio usbserial usb_st
>  orage eeprom ohci_hcd irlan irda crc_ccitt binfmt_misc loop dm_mod video dock button battery ac tcp_cubic nvram snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss parport_p
> 2007-09-22 13:53:32.527328363 c snd_pcm parport snd_timer ohci1394 ieee1394 snd i2c_i801 i2c_core soundcore ehci_hcd uhci_hcd e1000 snd_page_alloc iTCO_wdt iTCO_vendor_support
> 2007-09-22 13:53:32.527330808 <4>[ 3372.390363] Pid: 7, comm: events/0 Not tainted 2.6.22.6-cfs-v20.5-64-2 #7

Can you reproduce it with a non tainted kernel without any patches like CFS applied?

The oops happens because oprofile tries to reference a NULL task in
its event buffer for a context switch, which might well be a minor
merging mistake or similar in the CFS backport.

-Andi

> 2007-09-22 13:53:32.527332623 <4>[ 3372.390366] RIP: 0010:[<ffffffff80652f44>]  [<ffffffff80652f44>] _spin_lock+0x4/0x20
> 2007-09-22 13:53:32.527334439 <4>[ 3372.390373] RSP: 0018:ffff81003e5fddc0  EFLAGS: 00010282
> 2007-09-22 13:53:32.527336116 <4>[ 3372.390380] RAX: 000000000000000d RBX: 0000000000000004 RCX: 0000000000018000
> 2007-09-22 13:53:32.527346033 <4>[ 3372.390383] RDX: 00000000000155ae RSI: 0000000000000000 RDI: 0000000000000650
> 2007-09-22 13:53:32.527347849 <4>[ 3372.390387] RBP: ffff81003e5fddc0 R08: 2222222222222222 R09: 2222222222222222
> 2007-09-22 13:53:32.527349595 <4>[ 3372.390391] R10: 2222222222222222 R11: 2222222222222222 R12: 0000000000000004
> 2007-09-22 13:53:32.527351411 <4>[ 3372.390398] R13: ffffffffffffffff R14: ffffffff88321d00 R15: 000000000000000d
> 2007-09-22 13:53:32.527365100 <4>[ 3372.390402] FS:  0000000000000000(0000) GS:ffffffff80805000(0000) knlGS:0000000000000000
> 2007-09-22 13:53:32.527367056 <4>[ 3372.390406] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> 2007-09-22 13:53:32.527368662 <4>[ 3372.390409] CR2: 0000000000000650 CR3: 0000000011d52000 CR4: 00000000000006e0
> 2007-09-22 13:53:32.527370478 <4>[ 3372.390417] Process events/0 (pid: 7, threadinfo ffff81003e5fc000, task ffff810001ffb100)
> 2007-09-22 13:53:32.527380117 <4>[ 3372.390419] Stack:  ffff81003e5fdde0 ffffffff80232b88 ffff81003e5fdeb0 00000000000000ba
> 2007-09-22 13:53:32.527382002 <4>[ 3372.390427]  ffff81003e5fde60 ffffffff8831abd4 ffffffff88321d70 0000000000000000
> 2007-09-22 13:53:32.527383818 <4>[ 3372.390437]  0000000000000000 0000000000000000 0000000100000001 00000000000000ba
> 2007-09-22 13:53:32.527385634 <4>[ 3372.390442] Call Trace:
> 2007-09-22 13:53:32.527390314 <4>[ 3372.390457]  [<ffffffff80232b88>] get_task_mm+0x18/0x60
> 2007-09-22 13:53:32.527391990 <4>[ 3372.390484]  [<ffffffff8831abd4>] :oprofile:sync_buffer+0xf4/0x480
> 2007-09-22 13:53:32.527393666 <4>[ 3372.390544]  [<ffffffff8831a6b0>] :oprofile:wq_sync_buffer+0x0/0x60
> 2007-09-22 13:53:32.527399673 <4>[ 3372.390576]  [<ffffffff8831a6e3>] :oprofile:wq_sync_buffer+0x33/0x60
> 2007-09-22 13:53:32.527401419 <4>[ 3372.390602]  [<ffffffff80244bfd>] run_workqueue+0xcd/0x170
> 2007-09-22 13:53:32.527406238 <4>[ 3372.390635]  [<ffffffff802456b3>] worker_thread+0xb3/0x120
> 2007-09-22 13:53:32.527421324 <4>[ 3372.390652]  [<ffffffff80249000>] autoremove_wake_function+0x0/0x40
> 2007-09-22 13:53:32.527423140 <4>[ 3372.390684]  [<ffffffff80245600>] worker_thread+0x0/0x120
> 2007-09-22 13:53:32.527424816 <4>[ 3372.390697]  [<ffffffff80248bfd>] kthread+0x4d/0x80
> 2007-09-22 13:53:32.527426423 <4>[ 3372.390730]  [<ffffffff8020abc8>] child_rip+0xa/0x12
> 2007-09-22 13:53:32.527428099 <4>[ 3372.390810]  [<ffffffff80248bb0>] kthread+0x0/0x80
> 2007-09-22 13:53:32.527444931 <4>[ 3372.390822]  [<ffffffff8020abbe>] child_rip+0x0/0x12

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.6.22.6 + oprofile oops
  2007-10-01 10:29 ` Andi Kleen
@ 2007-10-01 13:22   ` Sami Farin
  2007-10-01 14:47   ` Ingo Molnar
  1 sibling, 0 replies; 7+ messages in thread
From: Sami Farin @ 2007-10-01 13:22 UTC (permalink / raw)
  To: Linux kernel Mailing List; +Cc: Ingo Molnar

On Mon, Oct 01, 2007 at 12:29:18 +0200, Andi Kleen wrote:
> Sami Farin <safari-kernel@safari.iki.fi> writes:
... 
> Can you reproduce it with a non tainted kernel without any patches like CFS applied?

I tried without CFS and it Oopsed just as nicely,
the only difference was that in call trace before
:oprofile:wq_sync_buffer there's now cache_reap.

Call Trace:
 [<ffffffff80231988>] get_task_mm+0x18/0x50
 [<ffffffff88316b64>] :oprofile:sync_buffer+0xf4/0x450
 [<ffffffff88316650>] :oprofile:wq_sync_buffer+0x0/0x60
 [<ffffffff88316683>] :oprofile:wq_sync_buffer+0x33/0x60
 [<ffffffff80289f10>] cache_reap+0x0/0x130
 [<ffffffff80243269>] run_workqueue+0x89/0x120
 [<ffffffff80243cf7>] worker_thread+0xc7/0x130
 [<ffffffff802474d0>] autoremove_wake_function+0x0/0x40
 [<ffffffff80243c30>] worker_thread+0x0/0x130
 [<ffffffff802470cd>] kthread+0x4d/0x80
 [<ffffffff8020aa78>] child_rip+0xa/0x12
 [<ffffffff80247080>] kthread+0x0/0x80
 [<ffffffff8020aa6e>] child_rip+0x0/0x12

> The oops happens because oprofile tries to reference a NULL task in
> its event buffer for a context switch, which might well be a minor
> merging mistake or similar in the CFS backport.

...
> > 2007-09-22 13:53:32.527385634 <4>[ 3372.390442] Call Trace:
> > 2007-09-22 13:53:32.527390314 <4>[ 3372.390457]  [<ffffffff80232b88>] get_task_mm+0x18/0x60
> > 2007-09-22 13:53:32.527391990 <4>[ 3372.390484]  [<ffffffff8831abd4>] :oprofile:sync_buffer+0xf4/0x480
> > 2007-09-22 13:53:32.527393666 <4>[ 3372.390544]  [<ffffffff8831a6b0>] :oprofile:wq_sync_buffer+0x0/0x60
> > 2007-09-22 13:53:32.527399673 <4>[ 3372.390576]  [<ffffffff8831a6e3>] :oprofile:wq_sync_buffer+0x33/0x60
> > 2007-09-22 13:53:32.527401419 <4>[ 3372.390602]  [<ffffffff80244bfd>] run_workqueue+0xcd/0x170
> > 2007-09-22 13:53:32.527406238 <4>[ 3372.390635]  [<ffffffff802456b3>] worker_thread+0xb3/0x120
> > 2007-09-22 13:53:32.527421324 <4>[ 3372.390652]  [<ffffffff80249000>] autoremove_wake_function+0x0/0x40
> > 2007-09-22 13:53:32.527423140 <4>[ 3372.390684]  [<ffffffff80245600>] worker_thread+0x0/0x120
> > 2007-09-22 13:53:32.527424816 <4>[ 3372.390697]  [<ffffffff80248bfd>] kthread+0x4d/0x80
> > 2007-09-22 13:53:32.527426423 <4>[ 3372.390730]  [<ffffffff8020abc8>] child_rip+0xa/0x12
> > 2007-09-22 13:53:32.527428099 <4>[ 3372.390810]  [<ffffffff80248bb0>] kthread+0x0/0x80
> > 2007-09-22 13:53:32.527444931 <4>[ 3372.390822]  [<ffffffff8020abbe>] child_rip+0x0/0x12

-- 
Do what you love because life is too short for anything else.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.6.22.6 + oprofile oops
  2007-10-01 10:29 ` Andi Kleen
  2007-10-01 13:22   ` Sami Farin
@ 2007-10-01 14:47   ` Ingo Molnar
  1 sibling, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2007-10-01 14:47 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Sami Farin, Linux kernel Mailing List


* Andi Kleen <andi@firstfloor.org> wrote:

> > 2007-09-22 13:53:32.527330808 <4>[ 3372.390363] Pid: 7, comm: 
> > events/0 Not tainted 2.6.22.6-cfs-v20.5-64-2 #7
             ^^^^^^^^^^^
> 
> Can you reproduce it with a non tainted kernel without any patches 
> like CFS applied?

the kernel is not tainted, as indicated in the log message above.

> The oops happens because oprofile tries to reference a NULL task in 
> its event buffer for a context switch, which might well be a minor 
> merging mistake or similar in the CFS backport.

CFS does not change that aspect of the oprofile code.

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2007-10-11 19:18 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-09-22 11:23 2.6.22.6 + oprofile oops Sami Farin
2007-09-29 17:05 ` Sami Farin
2007-10-11 14:36   ` Philippe Elie
2007-10-11 18:09     ` Sami Farin
2007-10-01 10:29 ` Andi Kleen
2007-10-01 13:22   ` Sami Farin
2007-10-01 14:47   ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox