From mboxrd@z Thu Jan 1 00:00:00 1970 Received: with ECARTIS (v1.0.0; list linux-mips); Tue, 05 Aug 2008 22:38:50 +0100 (BST) Received: from zrtps0kn.nortel.com ([47.140.192.55]:48869 "EHLO zrtps0kn.nortel.com") by ftp.linux-mips.org with ESMTP id S20043350AbYHEVio (ORCPT ); Tue, 5 Aug 2008 22:38:44 +0100 Received: from zcarhxs1.corp.nortel.com (zcarhxs1.corp.nortel.com [47.129.230.89]) by zrtps0kn.nortel.com (Switch-2.2.6/Switch-2.2.0) with ESMTP id m75LcVh08078; Tue, 5 Aug 2008 21:38:31 GMT Received: from [47.130.64.132] ([47.130.64.132] RDNS failed) by zcarhxs1.corp.nortel.com with Microsoft SMTPSVC(6.0.3790.3959); Tue, 5 Aug 2008 17:38:16 -0400 Message-ID: <4898C845.1060903@nortel.com> Date: Tue, 05 Aug 2008 15:38:13 -0600 From: "Chris Friesen" User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Kaz Kylheku CC: linux-mips@linux-mips.org, ralf@linux-mips.org Subject: Re: looking for help interpreting softlockup/stack trace References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 05 Aug 2008 21:38:17.0104 (UTC) FILETIME=[92654900:01C8F743] Return-Path: X-Envelope-To: <"|/home/ecartis/ecartis -s linux-mips"> (uid 0) X-Orcpt: rfc822;linux-mips@linux-mips.org Original-Recipient: rfc822;linux-mips@linux-mips.org X-archive-position: 20121 X-ecartis-version: Ecartis v1.0.0 Sender: linux-mips-bounce@linux-mips.org Errors-to: linux-mips-bounce@linux-mips.org X-original-sender: cfriesen@nortel.com Precedence: bulk X-list: linux-mips Kaz Kylheku wrote: > Chris Friesem wrote: >>In the trace below, is "epc" the program counter at the time of the >>timer interrupt? How does "ra" fit into this, given that the function >>whose address it contains isn't seen in the stack trace until quite a >>ways down? > Mind you, these Linux MIPS stack traces are not completely > trustworthy; the routine just walks the stack one word at a time > and prints out anything that looks like the address of code. > So the stack trace may include stale stack data from previous > call chains that have already returned. It also might not > include the full call chain, because the return address is not > always stored on the stack. For instance, here it looks like > RA is pointing into a kernel module. But you don't actually > see this in the fake stack trace. Nowhere in the stack trace > do you see _read_lock being called by c000000001b4ab9c. Right...but it does look like it calls _read_unlock(). > I would proceed by inspecting this vnb module and its use of locks. Yeah...I'll do that. >>scheduler_tick() calls BUG: soft lockup detected on CPU#0! One thing I've noticed, all the softlockups are on cpu0 even though other cpus are also complaining about scheduling delays. Is there something special about cpu0? >>Call Trace: >> [] softlockup_tick+0x12c/0x158 >> [] timer_interrupt+0xd4/0x4c8 >> [] timer_interrupt+0x288/0x4c8 >> [] octeon_main_timer_interrupt+0x78/0x90 >> [] handle_IRQ_event+0x45c/0x1528 >> [] __do_IRQ+0xd4/0x230 >> [] _read_lock+0x0/0x20 > > > This read lock is almost certainly a red herring; it's > stuff left over on the stack that hasn't been overwritten > by new activation chains. Wouldn't it be part of the do_IRQ call below? >> [] do_IRQ+0x440/0x1520 > > > This is the timer interrupt going off, which ends up > detecting the lockup. So the problem is above here. I assume you mean "above" in terms of chronological order, since in terms of the call trace the problem would be below. >> [] neigh_lookup+0xb4/0x128 > > > This may also be garbage. Basically, the stack > trace is not that useful, except that it provides > some additional circumstantial evidence implicating > the vnb module. It's unfortunate that the call trace isn't accurate. Thanks for the help, Chris