* oops in x86/oprofile/dump_stack with 3.4.6
@ 2012-08-02  7:04 wyang1
From: wyang1 @ 2012-08-02  7:04 UTC (permalink / raw)
  To: linux-kernel, robert.richter

Hi all,

A couple of days ago I tried to use oprofile with call graph enabled on
a recent build of 3.4.6. This causes an oops:
"BUG: unable to handle kernel paging request at 636f7270".

The oops can often be reproduced with the following steps on my
Intel Atom based board.

opcontrol --no-vmlinux
opcontrol -c 10
opcontrol -e BR_INST_RETIRED:8000
opcontrol -s

Within a few seconds I get the oops.

I've spent some time investigating the problem and found that the marked
line in the dump_trace function can lead to the oops when the value
loaded into the stack variable is less than CONFIG_PAGE_OFFSET
(0xc0000000).

arch/x86/kernel/dumpstack_32.c
void dump_trace(...)
...
for (;;) {
         struct thread_info *context;
         context = (struct thread_info *)
             ((unsigned long)stack & (~(THREAD_SIZE - 1)));
         bp = ops->walk_stack(context, stack, bp, ops, data, NULL, &graph);

----->  stack = (unsigned long *)context->previous_esp;
         if (!stack)
             break;
         if (ops->stack(data, "IRQ") < 0)
             break;
         touch_nmi_watchdog();
     }
...
Actually I think this scenario should not happen, because the stack
should be either a kernel stack or an IRQ stack.
So I changed this function by adding a simple check to avoid this
scenario. It seems to work on my board.

for (;;) {
                 struct thread_info *context;
+               /*
+                * Somehow stack may be less than CONFIG_PAGE_OFFSET,
+                * so we should temporarily avoid the scenario before
+                * figuring out the root cause.
+                */
+               if ((unsigned long)stack < CONFIG_PAGE_OFFSET)
+                       break;

                 context = (struct thread_info *)
                         ((unsigned long)stack & (~(THREAD_SIZE - 1)));


Any suggestions on this change?

Thanks
Wei

The following is the oops I encountered.
BUG: unable to handle kernel paging request at 636f7270
IP: [<c1005229>] print_context_stack+0x69/0x130
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
LTT NESTING LEVEL : 0
Modules linked in:

Pid: 0, comm: swapper/0 Not tainted 3.4.6-WR5.0+snapshot-20120801_standard #9 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
EIP: 0060:[<c1005229>] EFLAGS: 00010046 CPU: 0
EIP is at print_context_stack+0x69/0x130
EAX: 0000002e EBX: 636f7270 ECX: 00000000 EDX: 04010000
ESI: ffffe000 EDI: 00000000 EBP: f680be98 ESP: f680be68
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: 636f7270 CR3: 0192e000 CR4: 000007d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process swapper/0 (pid: 0, ti=f680a000 task=c183cfa0 task.ti=c1832000)
Stack:
  c1761c2c 636f7270 636f6000 00000000 636f7ffc 636f6000 c1833ed4 ffffe000
  c1833ed4 636f7270 c18888dc 636f6000 f680bec8 c1004218 c18888dc f680bedc
  00000000 f680beb4 c1833ed4 00000000 f680bec0 c1833ed4 f680bfc4 0000000a
Call Trace:
  [<c1004218>] dump_trace+0x68/0xf0
  [<c1554682>] x86_backtrace+0xb2/0xc0
  [<c1552682>] oprofile_add_sample+0xa2/0xc0
  [<c10040df>] ? do_softirq+0x6f/0xa0
  [<c15562c9>] ppro_check_ctrs+0x79/0x100
  [<c1556250>] ? ppro_shutdown+0x60/0x60
  [<c15550af>] profile_exceptions_notify+0x8f/0xb0
  [<c1660d78>] nmi_handle.isra.0+0x48/0x70
  [<c1660e9f>] do_nmi+0xff/0x570
  [<c105bd75>] ? run_rebalance_domains+0x155/0x180
  [<c105428b>] ? get_parent_ip+0xb/0x40
  [<c102e379>] ? __local_bh_enable+0x29/0x70
  [<c102ecd0>] ? ftrace_define_fields_irq_handler_entry+0x80/0x80
  [<c16601c9>] nmi_stack_correct+0x28/0x2d
  [<c102ecd0>] ? ftrace_define_fields_irq_handler_entry+0x80/0x80
  [<c10040df>] ? do_softirq+0x6f/0xa0
<IRQ>
  [<c102f155>] irq_exit+0x65/0x70
  [<c1666441>] do_IRQ+0x51/0xc0
  [<c1666369>] common_interrupt+0x29/0x30
  [<c102007b>] ? amd_get_subcaches+0x4b/0x90
  [<c1330416>] ? intel_idle+0xc6/0x120
  [<c14e4719>] cpuidle_enter+0x19/0x30
  [<c14e4cf0>] cpuidle_idle_call+0xa0/0x320
  [<c1009e8a>] cpu_idle+0x5a/0xc0
  [<c163fa48>] rest_init+0x6c/0x74
  [<c189d703>] start_kernel+0x2fe/0x305
  [<c189d23d>] ? repair_env_string+0x51/0x51
  [<c189d078>] i386_start_kernel+0x78/0x7d
Code: 00 00 3b 5d dc 72 13 8b 45 f0 8d 64 24 24 5b 5e 5f 5d c3 8d b4 26 00 00 00 00 3b 5d ec 72 e8 81 fb ff ff ff bf 0f 86 a3 00 00 00 <8b> 33 89 f0 e8 ae e5 03 00 85 c0 74 1e 8b 45 f0 83 c0 04 39 c3
EIP: [<c1005229>] print_context_stack+0x69/0x130 SS:ESP 0068:f680be68
CR2: 00000000636f7270
---[ end trace d4af25ee5ff6fd8c ]---
