From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15])
	by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id n4K95rX4098250
	for ; Wed, 20 May 2009 04:05:54 -0500
Received: from mail.internode.on.net (localhost [127.0.0.1])
	by cuda.sgi.com (Spam Firewall) with ESMTP id 0E9A719EF6B5
	for ; Wed, 20 May 2009 02:06:01 -0700 (PDT)
Received: from mail.internode.on.net (bld-mail13.adl6.internode.on.net [150.101.137.98])
	by cuda.sgi.com with ESMTP id c9TupWdIcS6QC5Cf
	for ; Wed, 20 May 2009 02:06:01 -0700 (PDT)
Date: Wed, 20 May 2009 19:05:58 +1000
From: Dave Chinner
Subject: Re: Kernel crash with 2.6.29 + nfs + xfs (radix-tree)
Message-ID: <20090520090558.GQ16929@discord.disaster>
References: <20090520003745.GA27491@samad.com.au>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20090520003745.GA27491@samad.com.au>
List-Id: XFS Filesystem from SGI
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: xfs-bounces@oss.sgi.com
Errors-To: xfs-bounces@oss.sgi.com
To: Alex Samad
Cc: linux-kernel@vger.kernel.org, xfs@oss.sgi.com

On Wed, May 20, 2009 at 10:37:45AM +1000, Alex Samad wrote:
> Hi
>
> I have been quit a lot of crashes on my debian amd64 box in the 2.6.29
> series of kernel. Seems for me to be when the system is under load and
> there is network action -> nfsd -> xfs.

Perhaps a use after free or a reference counting problem. Thanks
for reporting it.

> May 5 19:45:38 x kernel: ------------[ cut here ]------------
> May 5 19:45:39 x kernel: kernel BUG at lib/radix-tree.c:485!
> May 5 19:45:39 x kernel: invalid opcode: 0000 [#1] SMP
> May 5 19:45:39 x kernel: last sysfs file:
> /sys/block/sdc/queue/nr_requests
> May 5 19:45:39 x kernel: CPU 0
> May 5 19:45:39 x kernel: Pid: 335, comm: kswapd0 Not tainted 2.6.29.2 #1 S2895
> May 5 19:45:39 x kernel: RIP: 0010:[] [] radix_tree_tag_set+0x86/0xc6
> May 5 19:45:39 x kernel: RSP: 0018:ffff88016e2d1c88 EFLAGS: 00010246
> May 5 19:45:39 x kernel: RAX: 0000000000000004 RBX: 0000000000000000 RCX: 0000000000000000
> May 5 19:45:39 x kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88016a822b58
> May 5 19:45:39 x kernel: RBP: 0000000000000004 R08: 0000000000000000 R09: 8000000000000000
> May 5 19:45:39 x kernel: R10: ffffa5a5a5a5a5a5 R11: ffffffff8037541d R12: 0000000000000001
> May 5 19:45:39 x kernel: R13: 0000000000000000 R14: ffff88016d1bc310 R15: 0000000000000000
> May 5 19:45:39 x kernel: FS: 00007fea1903f6e0(0000) GS:ffffffff80759040(0000) knlGS:0000000000000000
> May 5 19:45:39 x kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> May 5 19:45:39 x kernel: CR2: 00007fd2df5ae8e0 CR3: 000000016bad0000 CR4: 00000000000006e0
> May 5 19:45:39 x kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> May 5 19:45:39 x kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> May 5 19:45:39 x kernel: Process kswapd0 (pid: 335, threadinfo ffff88016e2d0000, task ffff88016f23eac0)
> May 5 19:45:39 x kernel: Stack:
> May 5 19:45:39 x kernel: 000000000069d804 0000000000000000 ffff88016d1bc2d0 ffff88000a8b7400
> May 5 19:45:39 x kernel: ffff88000a8b7400 ffff88016df30000 ffff88000a8b74f8 ffff88016d1bc30c
> May 5 19:45:39 x kernel: ffffffff80376b02 ffff88000a8b7580 0000000000000024 ffff88016e2d1d60
> May 5 19:45:39 x kernel: Call Trace:
> May 5 19:45:39 x kernel: [] ? xfs_inode_set_reclaim_tag+0x69/0x89
> May 5 19:45:39 x kernel: [] ? xfs_reclaim+0x99/0x9f
> May 5 19:45:39 x kernel: [] ? xfs_fs_destroy_inode+0x36/0x54
> May 5 19:45:39 x kernel: [] ? dispose_list+0xcd/0xfb
> May 5 19:45:39 x kernel: [] ? shrink_icache_memory+0x1f4/0x22a
> May 5 19:45:39 x kernel: [] ? shrink_slab+0xe4/0x157
> May 5 19:45:39 x kernel: [] ? kswapd+0x44f/0x5c9
> May 5 19:45:39 x kernel: [] ? isolate_pages_global+0x0/0x231
> May 5 19:45:39 x kernel: [] ? autoremove_wake_function+0x0/0x2e
> May 5 19:45:39 x kernel: [] ? __wake_up_common+0x44/0x73
> May 5 19:45:39 x kernel: [] ? kswapd+0x0/0x5c9
> May 5 19:45:39 x kernel: [] ? kthread+0x47/0x73
> May 5 19:45:39 x kernel: [] ? child_rip+0xa/0x20
> May 5 19:45:39 x kernel: [] ? kthread+0x0/0x73
> May 5 19:45:39 x kernel: [] ? child_rip+0x0/0x20
> May 5 19:45:39 x kernel: Code: 83 e5 3f 89 ea e8 04 fc ff ff 85 c0 75
> 10 48 8b 54 24 08 48 8d 84 13 18 02 00 00 0f ab 28 48 63 c5 48 8b 5c c3
> 18 48 85 db 75 04 <0f> 0b eb fe 41 83 ed 06 41 ff cc 45$
> May 5 19:45:39 x kernel: RIP []
> radix_tree_tag_set+0x86/0xc6
> May 5 19:45:39 x kernel: RSP
> May 5 19:45:39 x kernel: ---[ end trace aed81d6fef80e624 ]---
>
>
> I have logged a bug with debian
> ( more info http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=526406),
> there has been one other to report this problem.
>
> we believe somebody has already reported a similar problem here
> http://groups.google.com/group/linux.kernel/browse_thread/thread/dd00f52e93397c9e/6b6814dab9b41a05?pli=1

Which no-one noticed was related to XFS (not in the subject line)
and so most people (like me) would have simply deleted it without
reading it....

> has any one else seen this problem, who do I need to raise this too ?

I've cc'd the XFS list.

> I am able to reproduce this problem on my machine (amd64 phenomem II 8G
> ram), running virtualbox, I have a vm access the local filesystem via
> nfs (udp) and when I do a rm -fr I see the bug

I run debian, XFS and 2.6.29 on all my machines but I haven't
tripped over the problem - it all appears to be related to
calling dispose_list() during/just after removing a lot of files.
If you have a simple method of reproducing the problem (e.g. a
simple shell script) it would help track down the problem much
faster....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
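
A minimal sketch of the kind of reproducer asked for above, assuming that creating and then removing a large number of files over the NFS mount is enough to exercise the inode reclaim path seen in the trace. The function names, the file counts and the `repro` directory name are illustrative assumptions, and nothing here is confirmed to trigger the bug.

```shell
#!/bin/sh
# Hypothetical reproducer sketch: create many empty files, then remove them
# all in one go, so a large batch of inodes has to be torn down at once.
# Usage (all arguments are assumptions): repro.sh TARGET [COUNT] [ROUNDS]
# TARGET should be a directory on the NFS mount backed by XFS on the server.

# Fill DIR with COUNT empty files, 100 per subdirectory.
populate() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        sub="$dir/d$((i / 100))"
        mkdir -p "$sub"
        : > "$sub/f$i"
        i=$((i + 1))
    done
}

# One create/remove cycle; returns non-zero if either step fails.
cycle() {
    populate "$1" "$2" && rm -rf "$1"
}

# Only run the loop when a target directory is given on the command line.
if [ "$#" -ge 1 ]; then
    TARGET=$1
    COUNT=${2:-100000}
    ROUNDS=${3:-10}
    n=1
    while [ "$n" -le "$ROUNDS" ]; do
        echo "round $n: $COUNT files under $TARGET/repro"
        cycle "$TARGET/repro" "$COUNT" || exit 1
        n=$((n + 1))
    done
fi
```

Running it from the NFS client while the server is short on memory would be the closest match to the setup described in the report; the counts may need raising before kswapd starts shrinking the inode cache.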