From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: with ECARTIS (v1.0.0; list xfs); Mon, 14 Aug 2006 07:18:37 -0700 (PDT)
Received: from mx.wurtel.net (xs.wurtel.net [83.68.3.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k7EEIODW007628 for ; Mon, 14 Aug 2006 07:18:26 -0700
Received: from wurtel ([192.168.1.1] helo=wurtel-ws.wurtel.net) by mx.wurtel.net with esmtp (Exim 3.36 #1 (Debian)) id 1GCdFz-0004w0-00 for ; Mon, 14 Aug 2006 16:17:31 +0200
Received: from paul by wurtel-ws.wurtel.net with local (Exim 4.62) (envelope-from ) id 1GCdFz-0008Uv-AR for xfs@oss.sgi.com; Mon, 14 Aug 2006 16:17:31 +0200
Date: Mon, 14 Aug 2006 16:17:31 +0200
From: Paul Slootman
Subject: XFS internal error XFS_WANT_CORRUPTED_GOTO
Message-ID: <20060814141731.GA9098@wurtel.net>
References: <20060810164222.GA16332@wurtel.net> <200608110125.LAA18091@larry.melbourne.sgi.com> <20060811090218.GB22934@wurtel.net> <20060812091451.GA16661@wurtel.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20060812091451.GA16661@wurtel.net>
Sender: xfs-bounce@oss.sgi.com
Errors-To: xfs-bounce@oss.sgi.com
List-Id: xfs
To: xfs@oss.sgi.com

On Sat 12 Aug 2006, Paul Slootman wrote:
>
> I've now zapped that directory with xfs_db, and am running the (daily?!)
> xfs_repair at this moment. As the filesystem is 1.1TB, it takes a couple
> of hours :(

That showed the following message in phase 3 because of the xfs_db
action:

imap claims a free inode 261 is in use, correcting imap and clearing inode

and then in phase 4:

entry "lost+found.x" at block 0 offset 584 in directory inode 256 references free inode 261
        clearing inode number in entry at offset 584...

and in phase 6:

rebuilding directory inode 256

and phase 7:

resetting inode 256 nlinks from 17 to 16

but nothing beyond that.

However, that night:

Aug 13 08:28:00 boes kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 874 of file fs/xfs/xfs_ialloc.c.
Caller 0xffffffff8803be2f
Aug 13 08:28:00 boes kernel:
Aug 13 08:28:00 boes kernel: Call Trace: {:xfs:xfs_dialloc+1958}
Aug 13 08:28:00 boes kernel: {:xfs:_xfs_buf_lookup_pages+711} {:xfs:xlog_state_get_iclog_space+56}
Aug 13 08:28:00 boes kernel: {:xfs:xfs_ialloc+95} {:xfs:kmem_zone_alloc+91}
Aug 13 08:28:00 boes kernel: {:xfs:xfs_dir_ialloc+134} {:xfs:xfs_log_reserve+195}
Aug 13 08:28:00 boes kernel: {:xfs:xfs_mkdir+923} {:xfs:xfs_acl_get_attr+91}
Aug 13 08:28:00 boes kernel: {:xfs:xfs_vn_mknod+465} {d_rehash+112}
Aug 13 08:28:00 boes kernel: {__mutex_unlock_slowpath+415} {real_lookup+157}
Aug 13 08:28:00 boes kernel: {_atomic_dec_and_lock+65} {mntput_no_expire+36}
Aug 13 08:28:00 boes kernel: {__link_path_walk+3576} {__up_read+33}
Aug 13 08:28:00 boes kernel: {:xfs:xfs_iunlock+102} {:xfs:xfs_access+74}
Aug 13 08:28:00 boes kernel: {:xfs:xfs_vn_permission+20} {permission+104}
Aug 13 08:28:00 boes kernel: {__link_path_walk+170} {:xfs:xfs_access+74}
Aug 13 08:28:00 boes kernel: {vfs_mkdir+130} {sys_mkdirat+165}
Aug 13 08:28:00 boes kernel: {system_call+126}
Aug 13 08:28:00 boes kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 874 of file fs/xfs/xfs_ialloc.c.
Caller 0xffffffff8803be2f
Aug 13 08:28:00 boes kernel:
Aug 13 08:28:00 boes kernel: Call Trace: {:xfs:xfs_dialloc+1958}
Aug 13 08:28:00 boes kernel: {__generic_unplug_device+33} {kobject_release+0}
Aug 13 08:28:00 boes kernel: {:xfs:xlog_state_get_iclog_space+56}
Aug 13 08:28:00 boes kernel: {:xfs:xfs_ialloc+95} {:xfs:kmem_zone_alloc+91}
Aug 13 08:28:00 boes kernel: {:xfs:xfs_dir_ialloc+134} {:xfs:xfs_log_reserve+195}
Aug 13 08:28:00 boes kernel: {:xfs:xfs_mkdir+923} {:xfs:xfs_acl_get_attr+91}
Aug 13 08:28:00 boes kernel: {:xfs:xfs_vn_mknod+465} {d_rehash+112}
Aug 13 08:28:00 boes kernel: {__mutex_unlock_slowpath+415} {real_lookup+157}
Aug 13 08:28:00 boes kernel: {_atomic_dec_and_lock+65} {mntput_no_expire+36}
Aug 13 08:28:00 boes kernel: {__link_path_walk+3576} {__up_read+33}
Aug 13 08:28:00 boes kernel: {:xfs:xfs_iunlock+102} {:xfs:xfs_access+74}
Aug 13 08:28:00 boes kernel: {:xfs:xfs_vn_permission+20} {permission+104}
Aug 13 08:28:00 boes kernel: {__link_path_walk+170} {:xfs:xfs_access+74}
Aug 13 08:28:00 boes kernel: {vfs_mkdir+130} {sys_mkdirat+165}
Aug 13 08:28:00 boes kernel: {system_call+126}

Variations of this trace repeat a number of times, and then:

Aug 13 08:31:09 boes kernel: xfs_force_shutdown(md6,0x8) called from line 1151 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff88065ba8
Aug 13 08:31:09 boes kernel: Filesystem "md6": Corruption of in-memory data detected. Shutting down filesystem: md6
Aug 13 08:31:09 boes kernel: Please umount the filesystem, and rectify the problem(s)

The repair after this gave the following messages:

Phase 3:
correcting nblocks for inode 3080162495, was 2034 - counted 4

Phase 7:
resetting inode 256 nlinks from 17 to 16
resetting inode 3080162495 nlinks from 1 to 10

That's all.

Needless to say, the night after that repair it all went pear-shaped
again:

Aug 14 01:00:03 boes kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 874 of file fs/xfs/xfs_ialloc.c.
Caller 0xffffffff8803be2f
Aug 14 01:00:03 boes kernel:
Aug 14 01:00:03 boes kernel: Call Trace: {:xfs:xfs_dialloc+1958}
Aug 14 01:00:03 boes kernel: {:xfs:_xfs_buf_lookup_pages+711} {:xfs:xlog_state_get_iclog_space+56}
Aug 14 01:00:03 boes kernel: {:xfs:xfs_ialloc+95} {:xfs:kmem_zone_alloc+91}
Aug 14 01:00:03 boes kernel: {:xfs:xfs_dir_ialloc+134} {:xfs:xfs_log_reserve+195}
Aug 14 01:00:03 boes kernel: {:xfs:xfs_mkdir+923} {:xfs:xfs_acl_get_attr+91}
Aug 14 01:00:03 boes kernel: {:xfs:xfs_vn_mknod+465} {d_rehash+112}
Aug 14 01:00:03 boes kernel: {__mutex_unlock_slowpath+415} {real_lookup+157}
Aug 14 01:00:03 boes kernel: {_atomic_dec_and_lock+65} {mntput_no_expire+36}
Aug 14 01:00:03 boes kernel: {__link_path_walk+3576} {__up_read+33}
Aug 14 01:00:03 boes kernel: {:xfs:xfs_iunlock+102} {:xfs:xfs_access+74}
Aug 14 01:00:03 boes kernel: {:xfs:xfs_vn_permission+20} {permission+104}
Aug 14 01:00:03 boes kernel: {__link_path_walk+170} {:xfs:xfs_access+74}
Aug 14 01:00:03 boes kernel: {vfs_mkdir+130} {sys_mkdirat+165}
Aug 14 01:00:03 boes kernel: {system_call+126}
Aug 14 01:00:03 boes kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 874 of file fs/xfs/xfs_ialloc.c.
Caller 0xffffffff8803be2f
Aug 14 01:00:03 boes kernel:
Aug 14 01:00:03 boes kernel: Call Trace: {:xfs:xfs_dialloc+1958}
Aug 14 01:00:03 boes kernel: {:xfs:xfs_ialloc+95} {:xfs:kmem_zone_alloc+91}
Aug 14 01:00:03 boes kernel: {:xfs:xfs_dir_ialloc+134} {:xfs:xfs_log_reserve+195}
Aug 14 01:00:03 boes kernel: {:xfs:xfs_mkdir+923} {:xfs:xfs_acl_get_attr+91}
Aug 14 01:00:04 boes kernel: {:xfs:xfs_vn_mknod+465} {d_rehash+112}
Aug 14 01:00:04 boes kernel: {__mutex_unlock_slowpath+415} {real_lookup+157}
Aug 14 01:00:04 boes kernel: {_atomic_dec_and_lock+65} {mntput_no_expire+36}
Aug 14 01:00:04 boes kernel: {__link_path_walk+3576} {__up_read+33}
Aug 14 01:00:04 boes kernel: {:xfs:xfs_trans_unlocked_item+44}
Aug 14 01:00:04 boes kernel: {:xfs:xfs_access+74} {:xfs:xfs_vn_permission+20}
Aug 14 01:00:04 boes kernel: {permission+104} {__link_path_walk+170}
Aug 14 01:00:04 boes kernel: {:xfs:xfs_access+74} {vfs_mkdir+130}
Aug 14 01:00:04 boes kernel: {sys_mkdirat+165} {system_call+126}

etc.

I had umounted and mounted the filesystem after that. I tried removing
a couple of junk directories at this point (probably a bad idea in
retrospect) and when I tried to umount the directory again in
preparation for the repair, the system stopped responding. The kernel
was spewing these messages:

Aug 14 12:23:45 boes kernel: BUG: soft lockup detected on CPU#0!
Aug 14 12:23:45 boes kernel:
Aug 14 12:23:45 boes kernel: Call Trace: {softlockup_tick+233}
Aug 14 12:23:45 boes kernel: {update_process_times+80} {smp_local_timer_interrupt+35}
Aug 14 12:23:45 boes kernel: {smp_apic_timer_interrupt+65} {apic_timer_interrupt+98}
Aug 14 12:23:45 boes kernel: {:xfs:xfs_iextract+264} {debug_mutex_add_waiter+161}
Aug 14 12:23:45 boes kernel: {:xfs:xfs_iflush_all+22} {__mutex_lock_slowpath+767}
Aug 14 12:23:45 boes kernel: {__mutex_lock_slowpath+724} {:xfs:xfs_iflush_all+22}
Aug 14 12:23:45 boes kernel: {:xfs:xfs_unmountfs+19} {:xfs:xfs_unmount+301}
Aug 14 12:23:45 boes kernel: {:xfs:vfs_unmount+40} {:xfs:xfs_fs_put_super+50}
Aug 14 12:23:45 boes kernel: {generic_shutdown_super+159} {kill_block_super+45}
Aug 14 12:23:45 boes kernel: {deactivate_super+79} {sys_umount+137}
Aug 14 12:23:45 boes kernel: {__up_write+34} {error_exit+0}
Aug 14 12:23:45 boes kernel: {system_call+126}
Aug 14 12:23:55 boes kernel: BUG: soft lockup detected on CPU#0!
Aug 14 12:23:55 boes kernel:
Aug 14 12:23:55 boes kernel: Call Trace: {softlockup_tick+233}
Aug 14 12:23:55 boes kernel: {update_process_times+80} {smp_local_timer_interrupt+35}
Aug 14 12:23:55 boes kernel: {smp_apic_timer_interrupt+65} {apic_timer_interrupt+98}
Aug 14 12:23:56 boes kernel: {:xfs:xfs_iflush_all+22} {debug_mutex_add_waiter+161}
Aug 14 12:23:56 boes kernel: {__mutex_lock_slowpath+767} {:xfs:xfs_iflush_all+81}
Aug 14 12:23:56 boes kernel: {__mutex_unlock_slowpath+488} {:xfs:xfs_iflush_all+81}
Aug 14 12:23:56 boes kernel: {:xfs:xfs_unmountfs+19} {:xfs:xfs_unmount+301}
Aug 14 12:23:56 boes kernel: {:xfs:vfs_unmount+40} {:xfs:xfs_fs_put_super+50}
Aug 14 12:23:56 boes kernel: {generic_shutdown_super+159} {kill_block_super+45}
Aug 14 12:23:56 boes kernel: {deactivate_super+79} {sys_umount+137}
Aug 14 12:23:56 boes kernel: {__up_write+34} {error_exit+0}
Aug 14 12:23:56 boes kernel: {system_call+126}

Dumping the locks held via magic-sysrq showed:

Aug 14 12:26:46 boes kernel: #009:
[ffff81013020d488] {alloc_super}
Aug 14 12:26:46 boes kernel: .. held by: umount:18733 [ffff810154498340, 117]
Aug 14 12:26:46 boes kernel: ... acquired at: generic_shutdown_super+0x63/0x150

kernel: 2.6.17.7 x86_64
xfstools: 2.8.11 from CVS last week

I'm now running the "standard" Debian xfs_repair (version 2.6.20) for
kicks, as the 2.8.11 version didn't really seem to help much.
I'm now getting plenty of these errors:

entry "img-050806-090_onlin_81895f.jpg" at block 4 offset 2752 in directory inode 1343503044 references free inode 2511243327
        clearing inode number in entry at offset 2752...
entry "img-050806-090_onlin_81895f.jpg" at block 4 offset 2704 in directory inode 2160247870 references free inode 2511243327
        clearing inode number in entry at offset 2704...
entry "xbase-clients" at block 1 offset 1248 in directory inode 2457926717 references free inode 2511243327
        clearing inode number in entry at offset 1248...
entry "img-050806-090_onlin_81895f.jpg" at block 5 offset 592 in directory inode 2508332587 references free inode 2511243327
        clearing inode number in entry at offset 592...

Phase 6:
rebuilding directory inode 256
rebuilding directory inode 1343503044
rebuilding directory inode 2508332587
rebuilding directory inode 2160247870
rebuilding directory inode 2457926717

Phase 7:
resetting inode 256 nlinks from 17 to 16
resetting inode 2457926717 nlinks from 12 to 2
resetting inode 3080162495 nlinks from 1 to 10

Note the recurring theme of "resetting inode 256 nlinks from 17 to 16".
It seems like xfs_repair 2.8.11 doesn't, in fact, reset the nlinks.
(Or it's the deletion and recreation of lost+found as 256 is the root
dir, but that doesn't explain the other two inode nlinks.)

Help! :-(

Paul Slootman