oops in x86/oprofile/dump_stack with 3.4.6


From: wyang1 <Wei.Yang@windriver.com>
To: <linux-kernel@vger.kernel.org>, <robert.richter@amd.com>
Subject: oops in x86/oprofile/dump_stack with 3.4.6
Date: Thu, 2 Aug 2012 15:04:38 +0800
Message-ID: <501A2686.8000605@windriver.com>

Hi all,

A couple of days ago I tried to use oprofile with call graphs enabled on 
a recent build of 3.4.6. This causes an oops:
"BUG: unable to handle kernel paging request at 636f7270".

The oops can often be reproduced with the following steps on my board, 
which is based on an Intel Atom.

opcontrol --no-vmlinux
opcontrol -c 10
opcontrol -e BR_INST_RETIRED:8000
opcontrol -s

Within a few seconds I get the oops.

I've spent some time investigating the problem, and found it is possible 
that the marked line in the dump_trace function leads to the oops, because 
the value it stores in the stack variable is less than 
CONFIG_PAGE_OFFSET (0xc000_0000).
arch/x86/kernel/dumpstack_32.c
void dump_trace(...)
...
for (;;) {
         struct thread_info *context;
         context = (struct thread_info *)
             ((unsigned long)stack & (~(THREAD_SIZE - 1)));
         bp = ops->walk_stack(context, stack, bp, ops, data, NULL, &graph);

----->  stack = (unsigned long *)context->previous_esp;
         if (!stack)
             break;
         if (ops->stack(data, "IRQ") < 0)
             break;
         touch_nmi_watchdog();
     }
...
Actually, I think this scenario should not happen, because the stack 
should be either the kernel stack or the irq stack.
So I changed this function by adding a simplistic check to avoid this 
scenario. It seems to work on my board.

for (;;) {
                 struct thread_info *context;
+               /*
+                * Somehow stack may be less than CONFIG_PAGE_OFFSET,
+                * so temporarily bail out in that case until the
+                * root cause is figured out.
+                */
+               if ((unsigned long)stack < CONFIG_PAGE_OFFSET)
+                       break;

                 context = (struct thread_info *)
                         ((unsigned long)stack & (~(THREAD_SIZE - 1)));


Any suggestions for the change?

Thanks
Wei

The following is the oops I encountered.
BUG: unable to handle kernel paging request at 636f7270
IP: [<c1005229>] print_context_stack+0x69/0x130
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
LTT NESTING LEVEL : 0
Modules linked in:

Pid: 0, comm: swapper/0 Not tainted 3.4.6-WR5.0+snapshot-20120801_standard #9 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
EIP: 0060:[<c1005229>] EFLAGS: 00010046 CPU: 0
EIP is at print_context_stack+0x69/0x130
EAX: 0000002e EBX: 636f7270 ECX: 00000000 EDX: 04010000
ESI: ffffe000 EDI: 00000000 EBP: f680be98 ESP: f680be68
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: 636f7270 CR3: 0192e000 CR4: 000007d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process swapper/0 (pid: 0, ti=f680a000 task=c183cfa0 task.ti=c1832000)
Stack:
  c1761c2c 636f7270 636f6000 00000000 636f7ffc 636f6000 c1833ed4 ffffe000
  c1833ed4 636f7270 c18888dc 636f6000 f680bec8 c1004218 c18888dc f680bedc
  00000000 f680beb4 c1833ed4 00000000 f680bec0 c1833ed4 f680bfc4 0000000a
Call Trace:
  [<c1004218>] dump_trace+0x68/0xf0
  [<c1554682>] x86_backtrace+0xb2/0xc0
  [<c1552682>] oprofile_add_sample+0xa2/0xc0
  [<c10040df>] ? do_softirq+0x6f/0xa0
  [<c15562c9>] ppro_check_ctrs+0x79/0x100
  [<c1556250>] ? ppro_shutdown+0x60/0x60
  [<c15550af>] profile_exceptions_notify+0x8f/0xb0
  [<c1660d78>] nmi_handle.isra.0+0x48/0x70
  [<c1660e9f>] do_nmi+0xff/0x570
  [<c105bd75>] ? run_rebalance_domains+0x155/0x180
  [<c105428b>] ? get_parent_ip+0xb/0x40
  [<c102e379>] ? __local_bh_enable+0x29/0x70
  [<c102ecd0>] ? ftrace_define_fields_irq_handler_entry+0x80/0x80
  [<c16601c9>] nmi_stack_correct+0x28/0x2d
  [<c102ecd0>] ? ftrace_define_fields_irq_handler_entry+0x80/0x80
  [<c10040df>] ? do_softirq+0x6f/0xa0
<IRQ>
  [<c102f155>] irq_exit+0x65/0x70
  [<c1666441>] do_IRQ+0x51/0xc0
  [<c1666369>] common_interrupt+0x29/0x30
  [<c102007b>] ? amd_get_subcaches+0x4b/0x90
  [<c1330416>] ? intel_idle+0xc6/0x120
  [<c14e4719>] cpuidle_enter+0x19/0x30
  [<c14e4cf0>] cpuidle_idle_call+0xa0/0x320
  [<c1009e8a>] cpu_idle+0x5a/0xc0
  [<c163fa48>] rest_init+0x6c/0x74
  [<c189d703>] start_kernel+0x2fe/0x305
  [<c189d23d>] ? repair_env_string+0x51/0x51
  [<c189d078>] i386_start_kernel+0x78/0x7d
Code: 00 00 3b 5d dc 72 13 8b 45 f0 8d 64 24 24 5b 5e 5f 5d c3 8d b4 26 00 00 00 00 3b 5d ec 72 e8 81 fb ff ff ff bf 0f 86 a3 00 00 00 <8b> 33 89 f0 e8 ae e5 03 00 85 c0 74 1e 8b 45 f0 83 c0 04 39 c3
EIP: [<c1005229>] print_context_stack+0x69/0x130 SS:ESP 0068:f680be68
CR2: 00000000636f7270
---[ end trace d4af25ee5ff6fd8c ]---
