From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753879Ab2HBHFv (ORCPT ); Thu, 2 Aug 2012 03:05:51 -0400 Received: from mail1.windriver.com ([147.11.146.13]:41012 "EHLO mail1.windriver.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751248Ab2HBHFu (ORCPT ); Thu, 2 Aug 2012 03:05:50 -0400 Message-ID: <501A2686.8000605@windriver.com> Date: Thu, 2 Aug 2012 15:04:38 +0800 From: wyang1 Reply-To: User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.21) Gecko/20110831 Thunderbird/3.1.13 MIME-Version: 1.0 To: , Subject: oops in x86/oprofile/dump_stack with 3.4.6 Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [128.224.162.190] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi all, A couple of days ago I tried to use oprofile with enabling call graph in a recent build of 3.4.6. this causes a OOPS "BUG: unable to handle kernel paging request at 636f7270". The oops can be often reproduced by the following steps on my board based on Intel Atom. opcontrol --no-vmlinux opcontrol -c 10 opcontrol -e BR_INST_RETIRED:8000 opcontrl -s With a few seconds I get the oops. I've spent some time investigating the problem, and found it is possible that the marked line in dump_strace function results in the oops as the content of stack variable is less than CONFIG_PAGE_OFFSET(0xc000_0000). arch/x86/kernel/dumpstack_32.c int dump_stack() ... for (;;) { struct thread_info *context; context = (struct thread_info *) ((unsigned long)stack & (~(THREAD_SIZE - 1))); bp = ops->walk_stack(context, stack, bp, ops, data, NULL, &graph); -----> stack = (unsigned long *)context->previous_esp; if (!stack) break; if (ops->stack(data, "IRQ") < 0) break; touch_nmi_watchdog(); } ... Actually I think this scenario should not happen, because the stack should be kernel stack or irq stack. So I changed this function by adding simplistic check to avoid this scenario. It seems to work for my board. for (;;) { struct thread_info *context; + /* + * Somehow stack may be less than CONFIG_PAGE_OFFSET, + * So we should temporarily avoid the scenario before + * figuring out the root cause. + */ + if (stack < CONFIG_PAGE_OFFSET) + break; context = (struct thread_info *) ((unsigned long)stack & (~(THREAD_SIZE - 1))); Any suggestion for the change. Thanks Wei The following is the oops I encountered. BUG: unable to handle kernel paging request at 636f7270 IP: [] print_context_stack+0x69/0x130 *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP LTT NESTING LEVEL : 0 Modules linked in: Pid: 0, comm: swapper/0 Not tainted 3.4.6-WR5.0+snapshot-20120801_standard #9 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M. EIP: 0060:[] EFLAGS: 00010046 CPU: 0 EIP is at print_context_stack+0x69/0x130 EAX: 0000002e EBX: 636f7270 ECX: 00000000 EDX: 04010000 ESI: ffffe000 EDI: 00000000 EBP: f680be98 ESP: f680be68 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: 636f7270 CR3: 0192e000 CR4: 000007d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process swapper/0 (pid: 0, ti=f680a000 task=c183cfa0 task.ti=c1832000) Stack: c1761c2c 636f7270 636f6000 00000000 636f7ffc 636f6000 c1833ed4 ffffe000 c1833ed4 636f7270 c18888dc 636f6000 f680bec8 c1004218 c18888dc f680bedc 00000000 f680beb4 c1833ed4 00000000 f680bec0 c1833ed4 f680bfc4 0000000a Call Trace: [] dump_trace+0x68/0xf0 [] x86_backtrace+0xb2/0xc0 [] oprofile_add_sample+0xa2/0xc0 [] ? do_softirq+0x6f/0xa0 [] ppro_check_ctrs+0x79/0x100 [] ? ppro_shutdown+0x60/0x60 [] profile_exceptions_notify+0x8f/0xb0 [] nmi_handle.isra.0+0x48/0x70 [] do_nmi+0xff/0x570 [] ? run_rebalance_domains+0x155/0x180 [] ? get_parent_ip+0xb/0x40 [] ? __local_bh_enable+0x29/0x70 [] ? ftrace_define_fields_irq_handler_entry+0x80/0x80 [] nmi_stack_correct+0x28/0x2d [] ? ftrace_define_fields_irq_handler_entry+0x80/0x80 [] ? do_softirq+0x6f/0xa0 [] irq_exit+0x65/0x70 [] do_IRQ+0x51/0xc0 [] common_interrupt+0x29/0x30 [] ? amd_get_subcaches+0x4b/0x90 [] ? intel_idle+0xc6/0x120 [] cpuidle_enter+0x19/0x30 [] cpuidle_idle_call+0xa0/0x320 [] cpu_idle+0x5a/0xc0 [] rest_init+0x6c/0x74 [] start_kernel+0x2fe/0x305 [] ? repair_env_string+0x51/0x51 [] i386_start_kernel+0x78/0x7d Code: 00 00 3b 5d dc 72 13 8b 45 f0 8d 64 24 24 5b 5e 5f 5d c3 8d b4 26 00 00 00 00 3b 5d ec 72 e8 81 fb ff ff ff bf 0f 86 a3 00 00 00 <8b> 33 89 f0 e8 ae e5 03 00 85 c0 74 1e 8b 45 f0 83 c0 04 39 c3 EIP: [] print_context_stack+0x69/0x130 SS:ESP 0068:f680be68 CR2: 00000000636f7270 ---[ end trace d4af25ee5ff6fd8c ]---