From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754440AbYFCQ3A (ORCPT ); Tue, 3 Jun 2008 12:29:00 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751762AbYFCQ2w (ORCPT ); Tue, 3 Jun 2008 12:28:52 -0400 Received: from saeurebad.de ([85.214.36.134]:50775 "EHLO saeurebad.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751753AbYFCQ2v (ORCPT ); Tue, 3 Jun 2008 12:28:51 -0400 From: Johannes Weiner To: "Kevin Winchester" Cc: "Peter Zijlstra" , "Ingo Molnar" , "Linux Kernel Mailing List" Subject: Re: linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151() References: <4844868F.20104@gmail.com> <1212483192.6304.8.camel@twins> Date: Tue, 03 Jun 2008 18:28:09 +0200 In-Reply-To: (Kevin Winchester's message of "Tue, 3 Jun 2008 09:22:28 -0300") Message-ID: <87ve0qh49y.fsf@saeurebad.de> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.1.3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, "Kevin Winchester" writes: > On Tue, Jun 3, 2008 at 5:53 AM, Peter Zijlstra wrote: >> On Mon, 2008-06-02 at 20:47 -0300, Kevin Winchester wrote: >>> In next-20080530 and next-20080602 (and possibly earlier - I can't >>> remember the linux-next tree before that I tried) I get the following: >>> >>> [ 12.885153] ------------[ cut here ]------------ >>> [ 12.885203] WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151() >>> [ 12.885248] Pid: 4, comm: watchdog/0 Not tainted >>> 2.6.26-rc4-next-20080602 #13 >>> [ 12.885292] >>> [ 12.885293] Call Trace: >>> [ 12.885364] [] warn_on_slowpath+0x58/0x8a >>> [ 12.885410] [] ? _spin_unlock_irqrestore+0x51/0x6d >>> [ 12.885455] [] ? debug_locks_off+0x9/0x3c >>> [ 12.885498] [] ? ftrace_record_ip+0x1fa/0x272 >>> [ 12.885542] [] ? watchdog+0xc5/0x1ff >>> [ 12.885584] [] ? mcount_call+0x5/0x35 >>> [ 12.885627] [] ? watchdog+0xc5/0x1ff >>> [ 12.885668] [] check_flags+0x98/0x151 >>> [ 12.885710] [] lock_acquire+0x4a/0xa9 >>> [ 12.885753] [] ? watchdog+0xc5/0x1ff >>> [ 12.885795] [] ? watchdog+0x0/0x1ff >>> [ 12.885837] [] _read_lock+0x37/0x43 >>> [ 12.885879] [] watchdog+0xc5/0x1ff >>> [ 12.885921] [] ? watchdog+0x0/0x1ff >>> [ 12.885963] [] kthread+0x4e/0x7b >>> [ 12.886005] [] child_rip+0xa/0x12 >>> [ 12.886046] [] ? finish_task_switch+0x57/0x92 >>> [ 12.886090] [] ? _spin_unlock_irq+0x3b/0x57 >>> [ 12.886133] [] ? restore_args+0x0/0x30 >>> [ 12.886137] [] ? kthread+0x0/0x7b >>> [ 12.886137] [] ? child_rip+0x0/0x12 >>> [ 12.886137] >>> [ 12.886137] ---[ end trace 60e7f098a6913839 ]--- >>> [ 12.886137] possible reason: unannotated irqs-on. >>> [ 12.886137] irq event stamp: 20 >>> [ 12.886137] hardirqs last enabled at (19): [] >>> trace_hardirqs_on+0xd/0xf >>> [ 12.886137] hardirqs last disabled at (20): [] >>> trace_hardirqs_off+0xd/0xf >>> [ 12.886137] softirqs last enabled at (0): [] >>> copy_process+0x2da/0x109e >>> [ 12.886137] softirqs last disabled at (0): [<0000000000000000>] 0x0 >>> >>> Do I understand this correctly that something enabled irqs in a way that >>> got around lockdep? I assume the problem is not in watchdog, just that >>> the watchdog was the next thing to run that interacted with irqs and >>> thus lockdep picked up the situation then? >>> >>> Is there additional debugging I can do, given some instructions? If the >>> cause is readily apparent to anyone, could they let me know (for my own >>> interest) why it is apparent so that I can investigate things like this >>> further next time? >> >> >> You are correct - someone did sti but failed to call >> trace_hardirqs_on(). Frequently its possible to isolate the code from >> knowing the last recorded event: >> >> [ 12.886137] irq event stamp: 20 >> [ 12.886137] hardirqs last disabled at (20): [] trace_hardirqs_off+0xd/0xf >> >> However your compilation seems to have lost the caller IP: >> >> void trace_hardirqs_off(void) >> { >> trace_hardirqs_off_caller(CALLER_ADDR0); >> } >> EXPORT_SYMBOL(trace_hardirqs_off); >> >> So that is of little help here. (Not sure how that happened, nor how you >> could fix that - perhaps turn on some debugging knobs like build with >> debug info etc..) >> > > Could this be related to CONFIG_CC_OPTIMIZE_INLINING (or whatever it > is called now)? I am pretty sure I have that enabled. I will check > the rest of my config this evening to see if there is anything else I > can turn on/off to help. > >> >>> This is completely reproducible on every boot - should I try to bisect it? >> >> That is a possibility yes - if you can find the offending patch it >> should be relatively straight forward to find the offending sti. >> > > Sure - I can try it this evening. Also, is calling sti the only way > this could have happened? And is linux-next broken out into a single > patch? I wouldn't expect there to be too many calls to sti, so a grep > in the patch file might be quicker than bisection. git-log -p origin/master..next/master showed nothing interesting. Grepping for sti() would also not find indirect calls. Hannes