From: Simon Kirby
To: xfs@oss.sgi.com, linux-kernel@vger.kernel.org
Date: Tue, 25 Aug 2009 17:46:58 -0700
Subject: [2.6.30.4] XFS-related BUG and hang via shrink_icache_memory
Message-ID: <20090826004658.GA30929@hostway.ca>

On an NFS storage server, we recently started using some XFS filesystems
alongside many existing ext3 filesystems (on LVM on AoE). The following
BUG has occurred twice; both times the machine hung immediately
afterwards, with the console reportedly full of scrolling oopses (I
haven't seen that part myself):

Aug 25 16:16:15 nas03 kernel: kernel BUG at lib/radix-tree.c:485!
Aug 25 16:16:15 nas03 kernel: CPU 1
Aug 25 16:16:15 nas03 kernel: Pid: 417, comm: kswapd0 Not tainted 2.6.30.4-hw #1 PowerEdge 1950
Aug 25 16:16:15 nas03 kernel: RIP: 0010:[] [] radix_tree_tag_set+0xa2/0xb0
Aug 25 16:16:15 nas03 kernel: RSP: 0018:ffff88022fb1dc78 EFLAGS: 00010246
Aug 25 16:16:15 nas03 kernel: RAX: 000000000000001e RBX: 0000000000000000 RCX: ffff8801d2f855c8
Aug 25 16:16:15 nas03 kernel: RDX: 0000000000000000 RSI: 000000000000009e RDI: ffff88022c704530
Aug 25 16:16:15 nas03 kernel: RBP: ffff88022fb1dc80 R08: 000000000000001e R09: 0000000000000000
Aug 25 16:16:15 nas03 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8801385a6180
Aug 25 16:16:15 nas03 kernel: R13: ffff88022cb56800 R14: 000000000000000f R15: 0000000000000080
Aug 25 16:16:15 nas03 kernel: FS: 0000000000000000(0000) GS:ffff88002804d000(0000) knlGS:0000000000000000
Aug 25 16:16:15 nas03 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Aug 25 16:16:15 nas03 kernel: CR2: 00007f759ae1dae0 CR3: 0000000000201000 CR4: 00000000000006e0
Aug 25 16:16:15 nas03 kernel: ffff88022c7044f0 ffff88022fb1dcb0 ffffffff80439198 ffff88022fb1dcc0
Aug 25 16:16:15 nas03 kernel: ffff8801385a6180 ffff8801385a6300 ffff88022fb1dd60 ffff88022fb1dcd0
Aug 25 16:16:15 nas03 kernel: ffffffff80429cbb ffff88022fb1dce0 ffff8801385a6300 ffff88022fb1dcf0
Aug 25 16:16:15 nas03 kernel: Call Trace:
Aug 25 16:16:15 nas03 kernel: [] xfs_inode_set_reclaim_tag+0x78/0xa0
Aug 25 16:16:15 nas03 kernel: [] xfs_reclaim+0x5b/0xb0
Aug 25 16:16:15 nas03 kernel: [] xfs_fs_destroy_inode+0x38/0x60
Aug 25 16:16:15 nas03 kernel: [] destroy_inode+0x2e/0x50
Aug 25 16:16:15 nas03 kernel: [] dispose_list+0x96/0x110
Aug 25 16:16:15 nas03 kernel: [] shrink_icache_memory+0x1be/0x2b0
Aug 25 16:16:15 nas03 kernel: [] shrink_slab+0x125/0x180
Aug 25 16:16:15 nas03 kernel: [] kswapd+0x3c9/0x5c0
Aug 25 16:16:15 nas03 kernel: [] ? isolate_pages_global+0x0/0x290
Aug 25 16:16:15 nas03 kernel: [] ? thread_return+0x3f/0x63e
Aug 25 16:16:15 nas03 kernel: [] ? autoremove_wake_function+0x0/0x40
Aug 25 16:16:15 nas03 kernel: [] ? kswapd+0x0/0x5c0
Aug 25 16:16:15 nas03 kernel: [] ? early_idt_handler+0x0/0x71
Aug 25 16:16:15 nas03 kernel: [] kthread+0x5a/0x90
Aug 25 16:16:15 nas03 kernel: [] child_rip+0xa/0x20
Aug 25 16:16:15 nas03 kernel: [] ? early_idt_handler+0x0/0x71
Aug 25 16:16:15 nas03 kernel: [] ? kthread+0x0/0x90
Aug 25 16:16:15 nas03 kernel: [] ? child_rip+0x0/0x20
Aug 25 16:16:15 nas03 kernel: Code: 4d 85 d2 74 26 41 ff cb 75 c4 4d 85 d2 74 16 8b 47 04 8d 4b 15 ba 01 00 00 00 d3 e2 85 c2 75 05 09 d0 89 47 04 5b c9 4c 89 d0 c3 <0f> 0b eb fe 0f 0b eb fe 66 66 90 66 66 90 55 48 89 e5 41 57 41

>>RIP; ffffffff8046b4f2 <=====
>>RCX; ffff8801d2f855c8
>>RDI; ffff88022c704530
>>RBP; ffff88022fb1dc80
>>R12; ffff8801385a6180
>>R13; ffff88022cb56800

Trace; ffffffff80439198
Trace; ffffffff80429cbb
Trace; ffffffff80437ce8
Trace; ffffffff802c6e0e
Trace; ffffffff802c7206
Trace; ffffffff802c743e
Trace; ffffffff80290d25
Trace; ffffffff802915d9
Trace; ffffffff8028ee80
Trace; ffffffff80702e11
Trace; ffffffff80256790
Trace; ffffffff80291210
Trace; ffffffff8095c140
Trace; ffffffff8025637a
Trace; ffffffff8020ce0a
Trace; ffffffff8095c140
Trace; ffffffff80256320
Trace; ffffffff8020ce00

Code;  ffffffff8046b4c7
0000000000000000 <_RIP>:
Code;  ffffffff8046b4c7
   0:   4d 85 d2                  test   %r10,%r10
Code;  ffffffff8046b4ca
   3:   74 26                     je     2b <_RIP+0x2b>
Code;  ffffffff8046b4cc
   5:   41 ff cb                  dec    %r11d
Code;  ffffffff8046b4cf
   8:   75 c4                     jne    ffffffffffffffce <_RIP+0xffffffffffffffce>
Code;  ffffffff8046b4d1
   a:   4d 85 d2                  test   %r10,%r10
Code;  ffffffff8046b4d4
   d:   74 16                     je     25 <_RIP+0x25>
Code;  ffffffff8046b4d6
   f:   8b 47 04                  mov    0x4(%rdi),%eax
Code;  ffffffff8046b4d9
  12:   8d 4b 15                  lea    0x15(%rbx),%ecx
Code;  ffffffff8046b4dc
  15:   ba 01 00 00 00            mov    $0x1,%edx
Code;  ffffffff8046b4e1
  1a:   d3 e2                     shl    %cl,%edx
Code;  ffffffff8046b4e3
  1c:   85 c2                     test   %eax,%edx
Code;  ffffffff8046b4e5
  1e:   75 05                     jne    25 <_RIP+0x25>
Code;  ffffffff8046b4e7
  20:   09 d0                     or     %edx,%eax
Code;  ffffffff8046b4e9
  22:   89 47 04                  mov    %eax,0x4(%rdi)
Code;  ffffffff8046b4ec
  25:   5b                        pop    %rbx
Code;  ffffffff8046b4ed
  26:   c9                        leaveq
Code;  ffffffff8046b4ee
  27:   4c 89 d0                  mov    %r10,%rax
Code;  ffffffff8046b4f1
  2a:   c3                        retq
Code;  ffffffff8046b4f2   <=====
  2b:   0f 0b                     ud2a      <=====
Code;  ffffffff8046b4f4
  2d:   eb fe                     jmp    2d <_RIP+0x2d>
Code;  ffffffff8046b4f6
  2f:   0f 0b                     ud2a
Code;  ffffffff8046b4f8
  31:   eb fe                     jmp    31 <_RIP+0x31>
Code;  ffffffff8046b4fa
  33:   66 66 90                  xchg   %ax,%ax
Code;  ffffffff8046b4fd
  36:   66 66 90                  xchg   %ax,%ax
Code;  ffffffff8046b500
  39:   55                        push   %rbp
Code;  ffffffff8046b501
  3a:   48 89 e5                  mov    %rsp,%rbp
Code;  ffffffff8046b504
  3d:   41 57                     push   %r15
Code;  ffffffff8046b506
  3f:   41                        rex.B

This is stock 2.6.30.4, x86_64, serving files over NFS. Perhaps something
in the shrink_icache_memory path (which our particular load patterns
happen to hit a lot) isn't safe with XFS? I've put a rough sketch of the
path I suspect below my signature. I'm a bit low on sleep, so I'm sure
I'm missing some info -- please ask. :)

Simon-
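
P.S. To save people a source dive: here is the XFS side of the path as I
read it in my 2.6.30.4 tree. This is a paraphrased, simplified sketch
(locking kept but the __xfs_inode_set_reclaim_tag helper folded in), so
details may be slightly off:

/*
 * fs/xfs/linux-2.6/xfs_sync.c, roughly: called via xfs_reclaim() when
 * the VFS destroys an inode, to tag it in the per-AG inode radix tree
 * so the sync/reclaim code can find it later.
 */
void
xfs_inode_set_reclaim_tag(
	xfs_inode_t	*ip)
{
	xfs_mount_t	*mp = ip->i_mount;
	xfs_perag_t	*pag = xfs_get_perag(mp, ip->i_ino);

	read_lock(&pag->pag_ici_lock);
	spin_lock(&ip->i_flags_lock);
	/*
	 * radix_tree_tag_set() walks pag_ici_root down to the inode's
	 * index. In lib/radix-tree.c it does BUG_ON(slot == NULL) on
	 * the way down, and the disassembly above (test %r10,%r10 /
	 * je to the ud2a) looks like exactly that check firing.
	 */
	radix_tree_tag_set(&pag->pag_ici_root,
			   XFS_INO_TO_AGINO(mp, ip->i_ino),
			   XFS_ICI_RECLAIM_TAG);
	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
	spin_unlock(&ip->i_flags_lock);
	read_unlock(&pag->pag_ici_lock);
	xfs_put_perag(mp, pag);
}

If that reading is right, the BUG should only trip when the inode being
destroyed was never inserted into pag_ici_root (or was already removed
from it).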
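And the VFS side that gets us there from kswapd, again paraphrased and
heavily trimmed from fs/inode.c (shrink_slab() reaches this through the
icache shrinker: shrink_icache_memory() -> prune_icache()):

/*
 * fs/inode.c, roughly: prune_icache() moves unused inodes onto a
 * private list and hands it to dispose_list(), which tears each one
 * down and ends up in the filesystem's ->destroy_inode().
 */
static void dispose_list(struct list_head *head)
{
	while (!list_empty(head)) {
		struct inode *inode;

		inode = list_first_entry(head, struct inode, i_list);
		list_del(&inode->i_list);

		if (inode->i_data.nrpages)
			truncate_inode_pages(&inode->i_data, 0);
		clear_inode(inode);

		/*
		 * For us this is xfs_fs_destroy_inode() ->
		 * xfs_reclaim() -> xfs_inode_set_reclaim_tag(),
		 * matching the trace above.
		 */
		destroy_inode(inode);
	}
}

Nothing here looks XFS-specific, so I suspect the real question is
whether XFS can hand the VFS an inode that is not (or is no longer) in
its per-AG radix tree.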