From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: with ECARTIS (v1.0.0; list xfs); Mon, 19 Mar 2007 23:46:48 -0700 (PDT)
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130])
	by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l2K6kf6p024723
	for ; Mon, 19 Mar 2007 23:46:42 -0700
Date: Tue, 20 Mar 2007 17:46:32 +1100
From: David Chinner
Subject: Re: XFS internal error xfs_da_do_buf(2) at line 2087 of file fs/xfs/xfs_da_btree.c. Caller 0xc01b00bd
Message-ID: <20070320064632.GO32602149@melbourne.sgi.com>
References: <20070316012520.GN5743@melbourne.sgi.com> <20070316195951.GB5743@melbourne.sgi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
List-Id: xfs
To: Marco Berizzi
Cc: David Chinner, linux-kernel@vger.kernel.org, xfs@oss.sgi.com

On Mon, Mar 19, 2007 at 11:32:27AM +0100, Marco Berizzi wrote:
> Marco Berizzi wrote:
> > David Chinner wrote:
> >
> >> Ok, so an ipsec change. And I see from the history below it
> >> really has nothing to do with this problem. it seems the problem
> >> has something to do with changes between 2.6.19.1 and 2.6.19.2.
> >
> > indeed. Yesterday at 13:00 I have switched from 2.6.19.1 to 2.6.19.2
> > (without the ipsec fix) and at about 17:30 linux has crashed again.
> > I have recompiled 2.6.19.2 with all kernel debugging options enabled
> > and rebooted. Now I'm waiting for the crash...
>
> Linux has not been crashed. However here is dmesg output
> with all debugging option enabled: (search for 'INFO:
> possible recursive locking detected'). Is that normal?

.....

> =============================================
> [ INFO: possible recursive locking detected ]
> 2.6.19.2 #1
> ---------------------------------------------
> rm/470 is trying to acquire lock:
> (&(&ip->i_lock)->mr_lock){----}, at: [] xfs_ilock+0x5b/0xa1
>
> but task is already holding lock:
> (&(&ip->i_lock)->mr_lock){----}, at: [] xfs_ilock+0x5b/0xa1
>
> other info that might help us debug this:
> 3 locks held by rm/470:
> #0: (&inode->i_mutex/1){--..}, at: [] do_unlinkat+0x70/0x115
> #1: (&inode->i_mutex){--..}, at: [] mutex_lock+0x1c/0x1f
> #2: (&(&ip->i_lock)->mr_lock){----}, at: []
> xfs_ilock+0x5b/0xa1
>
> stack backtrace:
> [] dump_trace+0x215/0x21a
> [] show_trace_log_lvl+0x1a/0x30
> [] show_trace+0x12/0x14
> [] dump_stack+0x19/0x1b
> [] print_deadlock_bug+0xc0/0xcf
> [] check_deadlock+0x6a/0x79
> [] __lock_acquire+0x350/0x970
> [] lock_acquire+0x75/0x97
> [] down_write+0x3a/0x54
> [] xfs_ilock+0x5b/0xa1
> [] xfs_lock_dir_and_entry+0x105/0x11b
> [] xfs_remove+0x180/0x47f
> [] xfs_vn_unlink+0x22/0x4f
> [] vfs_unlink+0x9e/0xa2
> [] do_unlinkat+0xa8/0x115
> [] sys_unlink+0x10/0x12
> [] syscall_call+0x7/0xb
> [] 0xb7efaa7d
> =======================

That's no problem - lockdep just doesn't know that we can nest
i_lock (we've got to get the annotations for this sorted out).

> Here is the relevant results:
>
> Phase 2 - found root inode chunk
> Phase 3 - ...
> agno = 0
> ...
> agno = 12
> LEAFN node level is 1 inode 1610612918 bno = 8388608

Hmmm - single bit error in the bno - that reminds me of this:

http://oss.sgi.com/projects/xfs/faq.html#dir2

So I'd definitely make sure that is repaired....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
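
[Editor's note] The "single bit" remark about bno = 8388608 can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name set_bits is made up for illustration and is not part of xfs_repair or any XFS tool. It shows that the reported value has exactly one bit set; on its own it says nothing about whether the block number is actually corrupt.

```python
def set_bits(value: int) -> list[int]:
    """Return the positions (0 = least significant) of the bits set in value."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


# bno value taken from the xfs_repair output quoted in the message.
bno = 8388608

print(hex(bno))       # hexadecimal form of the block number
print(set_bits(bno))  # positions of the set bits
```

Running it prints 0x800000 and [23], i.e. the value is exactly 2**23.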