Date: Wed, 15 Oct 2008 11:54:41 +1100
From: Dave Chinner
Subject: Re: fw: [PATCH] fix instant oops with tracing enabled
Message-ID: <20081015005441.GR10716@disturbed>
References: <20081013223932.GE10716@disturbed> <48F3EA6F.9000209@sgi.com> <20081014131140.GB17351@lst.de> <48F546ED.6050702@sgi.com>
In-Reply-To: <48F546ED.6050702@sgi.com>
List-Id: xfs
To: Lachlan McIlroy
Cc: Christoph Hellwig, Mark Goodwin, xfs@oss.sgi.com

On Wed, Oct 15, 2008 at 11:27:09AM +1000, Lachlan McIlroy wrote:
> Christoph Hellwig wrote:
>> On Tue, Oct 14, 2008 at 10:40:15AM +1000, Mark Goodwin wrote:
>>> Lachlan also saw some regressions after merging these patchsets:
>>> . replace the mount inode list with radix tree traversals
>>> . clean up sync code
>>
>> What exactly? I saw some soft lockup in 042, but when applying Dave's
>> xfs_sync_inodeS_ag fix (or the half of it applying without the del inodes
>> tracking in the radix tree) it goes away.
>
> I saw this panic but I don't think it's related to the above patches:
>
> [252921.307588] BUG: unable to handle kernel <3>BUG: scheduling while atomic: dd/16976/0xf101da90

Isn't there another line in this output that looks like:

    atomic = 1 in_interrupt = 0

to indicate the "atomic" reason?

> [252921.307908] Modules linked in:
> [252921.307911] Pid: 16976, comm: dd Not tainted 2.6.27-rc8 #183
> [252921.307913] [252921.307913] Call Trace:

[ snip exceedingly deep stack that'll blow a 4k ia32 stack completely ]

In summary, the stack is:

    write
      balance_dirty_pages
        xfs_iomap_write_allocate
          try_to_free_pages
            xfs_iomap_write_allocate
              _xfs_trans_commit
                xlog_write
                  xlog_state_get_iclog_space

The question is: why are we running in atomic mode? The only place
I can see a sleep happening in this function is the call to sv_wait(),
which means the atomic state must have come from higher up....

Seems very strange.

> I saw sync get stuck in an infinite loop running test 042 - maybe the same
> problem you saw.

Yes, that's the lockup that the later patch I posted fixes.

> I saw the panic in _xfs_itrace_exit() which has now been fixed.
>
> And I also saw this assertion:
>
> <4>[34770.626472] Assertion failed: (index >= 0) && (index < ktp->kt_nentries), file: fs/xfs/support/ktrace.c, line: 173
> <0>[34770.626511] ------------[ cut here ]------------
> <2>[34770.627419] kernel BUG at fs/xfs/support/debug.c:81!

I can't see how that is related to the changes - it's a trace buffer
index overrun. That implies that the ktrace_t has been corrupted.
Memory corruption of some kind?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com