public inbox for linux-xfs@vger.kernel.org
* cache_purge: shake on cache 0x5880a0 left 8 nodes!?
@ 2006-08-10 16:42 Paul Slootman
  2006-08-11  1:30 ` Barry Naujok
  2006-08-14 14:59 ` Chris Wedgwood
  0 siblings, 2 replies; 10+ messages in thread
From: Paul Slootman @ 2006-08-10 16:42 UTC (permalink / raw)
  To: xfs

Hi,

I had problems with a filesystem due to the well-known endian bug.
I first ran xfs_repair on it using the standard Debian package,
version 2.6.20.  That gave a lot of output (which can be found at
http://www.xs4all.nl/~wurtel2/xfs_repair.out.4 ).

While that was running, I got today's CVS version and built that
(it reports version 2.8.11).  I ran that version against the
just-repaired filesystem, and got a number of additional errors,
notably:

empty data block 10 in directory inode 1343747104: junking block
empty data block 11 in directory inode 1343747104: junking block
empty data block 12 in directory inode 1343747104: junking block
free block 16777216 entry 10 for directory ino 1343747104 bad
rebuilding directory inode 1343747104

This appeared 7 times; additionally, this message was printed a couple of times:

cache_purge: shake on cache 0x5880a0 left 8 nodes!?

In fact, it appeared twice before the final "done.".

The second repair output can be found at
http://www.xs4all.nl/~wurtel2/xfs_repair.out.4b .

Should I be worried about that?  I tried a search for "shake on cache"
and found zero hits...

Additionally, the hash hit rates seem pretty bad. Anything to be
worried about, and could anything be done about it?


thanks,
Paul Slootman (please CC me on responses)


* RE: cache_purge: shake on cache 0x5880a0 left 8 nodes!?
  2006-08-10 16:42 cache_purge: shake on cache 0x5880a0 left 8 nodes!? Paul Slootman
@ 2006-08-11  1:30 ` Barry Naujok
  2006-08-11  9:02   ` Paul Slootman
  2006-08-14 14:59 ` Chris Wedgwood
  1 sibling, 1 reply; 10+ messages in thread
From: Barry Naujok @ 2006-08-11  1:30 UTC (permalink / raw)
  To: 'Paul Slootman', xfs

Hi Paul,

What you are seeing is fine. The current libxfs cache, together with the
directory update code, leaves some blocks referenced, and libxfs prints
out the blocks with outstanding references.

The actual data was written to disk (prior to 2.8.10 it may not have been).
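
Roughly, the purge path works like this (a minimal sketch of the idea in
plain C, not the actual libxfs source): it walks the cache, frees every
node whose reference count has dropped to zero, and counts the nodes it
had to leave behind, which is what produces the message you saw.

    #include <stdio.h>
    #include <stdlib.h>

    /* Minimal sketch, NOT the real libxfs implementation. */
    struct cache_node {
            struct cache_node       *cn_next;
            unsigned int            cn_count;   /* outstanding references */
    };

    struct cache {
            struct cache_node       *c_head;
    };

    void
    cache_purge(struct cache *cache)
    {
            struct cache_node       *node = cache->c_head;
            struct cache_node       *next, **keep = &cache->c_head;
            int                     leftover = 0;

            while (node) {
                    next = node->cn_next;
                    if (node->cn_count == 0) {
                            free(node);             /* unreferenced: discard */
                    } else {
                            *keep = node;           /* still referenced: keep */
                            keep = &node->cn_next;
                            leftover++;
                    }
                    node = next;
            }
            *keep = NULL;
            if (leftover)
                    fprintf(stderr,
                            "cache_purge: shake on cache %p left %d nodes!?\n",
                            (void *)cache, leftover);
    }

In your case the directory rebuild code still held references to 8 blocks
at purge time, hence the "left 8 nodes" complaint; it is harmless.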



* Re: cache_purge: shake on cache 0x5880a0 left 8 nodes!?
  2006-08-11  1:30 ` Barry Naujok
@ 2006-08-11  9:02   ` Paul Slootman
  2006-08-12  9:14     ` Paul Slootman
  0 siblings, 1 reply; 10+ messages in thread
From: Paul Slootman @ 2006-08-11  9:02 UTC (permalink / raw)
  To: Barry Naujok; +Cc: xfs

On Fri 11 Aug 2006, Barry Naujok wrote:
> 
> What you are seeing is fine. The current libxfs cache, together with the
> directory update code, leaves some blocks referenced, and libxfs prints
> out the blocks with outstanding references.
> 
> The actual data was written to disk (prior to 2.8.10 it may not have been).

Hmmm....

Unfortunately, the filesystem panicked again last night.
This was after the double repair, and rebooting into 2.6.17.7, which
shouldn't have the bug, right?
I'll run another repair now :-(


Paul Slootman

The latest kernel message:


rip copy_user_generic_c+0x8/0x26
Filesystem "md6": XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff880577d3

Call Trace: <ffffffff880501d0>{:xfs:xfs_trans_cancel+96}
       <ffffffff880577d3>{:xfs:xfs_create+1587} <ffffffff88062381>{:xfs:xfs_vn_mknod+433}
       <ffffffff8802cff9>{:xfs:xfs_dir2_leafn_lookup_int+89}
       <ffffffff8805076c>{:xfs:xfs_trans_unlocked_item+44}
       <ffffffff8805e1ae>{:xfs:xfs_buf_rele+62} <ffffffff8802ea28>{:xfs:xfs_dir2_node_lookup+184}
       <ffffffff88027bab>{:xfs:xfs_dir2_lookup+267} <ffffffff802892ef>{link_path_walk+415}
       <ffffffff880560aa>{:xfs:xfs_access+74} <ffffffff80289e9e>{vfs_create+142}
       <ffffffff8028a22f>{open_namei+383} <ffffffff80278ff7>{do_filp_open+39}
       <ffffffff80279211>{get_unused_fd+113} <ffffffff80279406>{do_sys_open+70}
       <ffffffff80209b5a>{system_call+126}
xfs_force_shutdown(md6,0x8) called from line 1151 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffff88065ba8
xfs_force_shutdown(md6,0x2) called from line 729 of file fs/xfs/xfs_log.c.  Return address = 0xffffffff88065ba8
Filesystem "md6": Corruption of in-memory data detected.  Shutting down filesystem: md6
Please umount the filesystem, and rectify the problem(s)
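
(For context: the "Corruption of in-memory data" shutdown is triggered
when a transaction that has already dirtied metadata gets cancelled.
Roughly, paraphrasing the check behind the messages above rather than
quoting the exact fs/xfs/xfs_trans.c source:

    /* Paraphrased sketch, not the exact kernel source.  Cancelling a
     * dirty transaction means the in-memory metadata can no longer be
     * trusted, so XFS shuts the filesystem down instead of risking
     * writing garbage to disk. */
    if ((tp->t_flags & XFS_TRANS_DIRTY) && !XFS_FORCED_SHUTDOWN(mp)) {
            XFS_ERROR_REPORT("xfs_trans_cancel", XFS_ERRLEVEL_LOW, mp);
            xfs_force_shutdown(mp, XFS_CORRUPT_INCORE);  /* the 0x8 above */
    }

xfs_create() hit an error partway through and cancelled its transaction,
which is why the trace above starts in the create path.)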


* Re: cache_purge: shake on cache 0x5880a0 left 8 nodes!?
  2006-08-11  9:02   ` Paul Slootman
@ 2006-08-12  9:14     ` Paul Slootman
  2006-08-14 14:17       ` XFS internal error XFS_WANT_CORRUPTED_GOTO Paul Slootman
  2006-08-15  1:49       ` cache_purge: shake on cache 0x5880a0 left 8 nodes!? Barry Naujok
  0 siblings, 2 replies; 10+ messages in thread
From: Paul Slootman @ 2006-08-12  9:14 UTC (permalink / raw)
  To: xfs

On Fri 11 Aug 2006, Paul Slootman wrote:
> 
> Unfortunately, the filesystem panicked again last night.
> This was after the double repair, and rebooting into 2.6.17.7, which
> shouldn't have the bug, right?
> I'll run another repair now :-(

And again.
A curious fact appeared: the lost+found directory from the first run
(which I had moved to lost+found.x) could not be removed ("Directory
not empty"). This would indicate that the CVS repair version (from two
days ago) does not properly build the lost+found directory.

I've now zapped that directory with xfs_db, and am running the (daily?!)
xfs_repair at this moment. As the filesystem is 1.1TB, it takes a couple
of hours :(


Paul Slootman


* XFS internal error XFS_WANT_CORRUPTED_GOTO
  2006-08-12  9:14     ` Paul Slootman
@ 2006-08-14 14:17       ` Paul Slootman
  2006-08-15 11:54         ` Paul Slootman
  2006-08-15  1:49       ` cache_purge: shake on cache 0x5880a0 left 8 nodes!? Barry Naujok
  1 sibling, 1 reply; 10+ messages in thread
From: Paul Slootman @ 2006-08-14 14:17 UTC (permalink / raw)
  To: xfs

On Sat 12 Aug 2006, Paul Slootman wrote:
> 
> I've now zapped that directory with xfs_db, and am running the (daily?!)
> xfs_repair at this moment. As the filesystem is 1.1TB, it takes a couple
> of hours :(

That showed the following message in phase 3 because of the xfs_db action:

    imap claims a free inode 261 is in use, correcting imap and clearing inode

and then in phase 4:

    entry "lost+found.x" at block 0 offset 584 in directory inode 256 references free inode 261
            clearing inode number in entry at offset 584...

and in phase 6:

    rebuilding directory inode 256

and phase 7:

    resetting inode 256 nlinks from 17 to 16

but nothing beyond that.


However, that night:

Aug 13 08:28:00 boes kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 874 of file fs/xfs/xfs_ialloc.c.  Caller 0xffffffff8803be2f
Aug 13 08:28:00 boes kernel: 
Aug 13 08:28:00 boes kernel: Call Trace: <ffffffff880366d6>{:xfs:xfs_dialloc+1958}
Aug 13 08:28:00 boes kernel:        <ffffffff8805d8e7>{:xfs:_xfs_buf_lookup_pages+711} <ffffffff88045858>{:xfs:xlog_state_get_iclog_space+56}
Aug 13 08:28:00 boes kernel:        <ffffffff8803be2f>{:xfs:xfs_ialloc+95} <ffffffff8805b83b>{:xfs:kmem_zone_alloc+91}
Aug 13 08:28:00 boes kernel:        <ffffffff88052116>{:xfs:xfs_dir_ialloc+134} <ffffffff88043913>{:xfs:xfs_log_reserve+195}
Aug 13 08:28:00 boes kernel:        <ffffffff8805867b>{:xfs:xfs_mkdir+923} <ffffffff88007f1b>{:xfs:xfs_acl_get_attr+91}
Aug 13 08:28:00 boes kernel:        <ffffffff880623a1>{:xfs:xfs_vn_mknod+465} <ffffffff80292ab0>{d_rehash+112}
Aug 13 08:28:00 boes kernel:        <ffffffff804a136f>{__mutex_unlock_slowpath+415} <ffffffff80287f9d>{real_lookup+157}
Aug 13 08:28:00 boes kernel:        <ffffffff8033fac1>{_atomic_dec_and_lock+65} <ffffffff80296544>{mntput_no_expire+36}
Aug 13 08:28:00 boes kernel:        <ffffffff80289138>{__link_path_walk+3576} <ffffffff80342cd1>{__up_read+33}
Aug 13 08:28:00 boes kernel:        <ffffffff8803a816>{:xfs:xfs_iunlock+102} <ffffffff880560aa>{:xfs:xfs_access+74}
Aug 13 08:28:00 boes kernel:        <ffffffff88062b44>{:xfs:xfs_vn_permission+20} <ffffffff80287c48>{permission+104}
Aug 13 08:28:00 boes kernel:        <ffffffff802883ea>{__link_path_walk+170} <ffffffff880560aa>{:xfs:xfs_access+74}
Aug 13 08:28:00 boes kernel:        <ffffffff8028ab02>{vfs_mkdir+130} <ffffffff8028abf5>{sys_mkdirat+165}
Aug 13 08:28:00 boes kernel:        <ffffffff80209b5a>{system_call+126}
Aug 13 08:28:00 boes kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 874 of file fs/xfs/xfs_ialloc.c.  Caller 0xffffffff8803be2f
Aug 13 08:28:00 boes kernel: 
Aug 13 08:28:00 boes kernel: Call Trace: <ffffffff880366d6>{:xfs:xfs_dialloc+1958}
Aug 13 08:28:00 boes kernel:        <ffffffff80331a11>{__generic_unplug_device+33} <ffffffff80340aa0>{kobject_release+0}
Aug 13 08:28:00 boes kernel:        <ffffffff88045858>{:xfs:xlog_state_get_iclog_space+56}
Aug 13 08:28:00 boes kernel:        <ffffffff8803be2f>{:xfs:xfs_ialloc+95} <ffffffff8805b83b>{:xfs:kmem_zone_alloc+91}
Aug 13 08:28:00 boes kernel:        <ffffffff88052116>{:xfs:xfs_dir_ialloc+134} <ffffffff88043913>{:xfs:xfs_log_reserve+195}
Aug 13 08:28:00 boes kernel:        <ffffffff8805867b>{:xfs:xfs_mkdir+923} <ffffffff88007f1b>{:xfs:xfs_acl_get_attr+91}
Aug 13 08:28:00 boes kernel:        <ffffffff880623a1>{:xfs:xfs_vn_mknod+465} <ffffffff80292ab0>{d_rehash+112}
Aug 13 08:28:00 boes kernel:        <ffffffff804a136f>{__mutex_unlock_slowpath+415} <ffffffff80287f9d>{real_lookup+157}
Aug 13 08:28:00 boes kernel:        <ffffffff8033fac1>{_atomic_dec_and_lock+65} <ffffffff80296544>{mntput_no_expire+36}
Aug 13 08:28:00 boes kernel:        <ffffffff80289138>{__link_path_walk+3576} <ffffffff80342cd1>{__up_read+33}
Aug 13 08:28:00 boes kernel:        <ffffffff8803a816>{:xfs:xfs_iunlock+102} <ffffffff880560aa>{:xfs:xfs_access+74}
Aug 13 08:28:00 boes kernel:        <ffffffff88062b44>{:xfs:xfs_vn_permission+20} <ffffffff80287c48>{permission+104}
Aug 13 08:28:00 boes kernel:        <ffffffff802883ea>{__link_path_walk+170} <ffffffff880560aa>{:xfs:xfs_access+74}
Aug 13 08:28:00 boes kernel:        <ffffffff8028ab02>{vfs_mkdir+130} <ffffffff8028abf5>{sys_mkdirat+165}
Aug 13 08:28:00 boes kernel:        <ffffffff80209b5a>{system_call+126}

Variations of this trace repeat a number of times, and then:

Aug 13 08:31:09 boes kernel: xfs_force_shutdown(md6,0x8) called from line 1151 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffff88065ba8
Aug 13 08:31:09 boes kernel: Filesystem "md6": Corruption of in-memory data detected.  Shutting down filesystem: md6
Aug 13 08:31:09 boes kernel: Please umount the filesystem, and rectify the problem(s)


The repair after this gave the following messages:

Phase 3: correcting nblocks for inode 3080162495, was 2034 - counted 4
Phase 7: resetting inode 256 nlinks from 17 to 16
         resetting inode 3080162495 nlinks from 1 to 10

That's all.

Needless to say, the night after that repair it all went pear-shaped again:

Aug 14 01:00:03 boes kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 874 of file fs/xfs/xfs_ialloc.c.  Caller 0xffffffff8803be2f
Aug 14 01:00:03 boes kernel: 
Aug 14 01:00:03 boes kernel: Call Trace: <ffffffff880366d6>{:xfs:xfs_dialloc+1958}
Aug 14 01:00:03 boes kernel:        <ffffffff8805d8e7>{:xfs:_xfs_buf_lookup_pages+711} <ffffffff88045858>{:xfs:xlog_state_get_iclog_space+56}
Aug 14 01:00:03 boes kernel:        <ffffffff8803be2f>{:xfs:xfs_ialloc+95} <ffffffff8805b83b>{:xfs:kmem_zone_alloc+91}
Aug 14 01:00:03 boes kernel:        <ffffffff88052116>{:xfs:xfs_dir_ialloc+134} <ffffffff88043913>{:xfs:xfs_log_reserve+195}
Aug 14 01:00:03 boes kernel:        <ffffffff8805867b>{:xfs:xfs_mkdir+923} <ffffffff88007f1b>{:xfs:xfs_acl_get_attr+91}
Aug 14 01:00:03 boes kernel:        <ffffffff880623a1>{:xfs:xfs_vn_mknod+465} <ffffffff80292ab0>{d_rehash+112}
Aug 14 01:00:03 boes kernel:        <ffffffff804a136f>{__mutex_unlock_slowpath+415} <ffffffff80287f9d>{real_lookup+157}
Aug 14 01:00:03 boes kernel:        <ffffffff8033fac1>{_atomic_dec_and_lock+65} <ffffffff80296544>{mntput_no_expire+36}
Aug 14 01:00:03 boes kernel:        <ffffffff80289138>{__link_path_walk+3576} <ffffffff80342cd1>{__up_read+33}
Aug 14 01:00:03 boes kernel:        <ffffffff8803a816>{:xfs:xfs_iunlock+102} <ffffffff880560aa>{:xfs:xfs_access+74}
Aug 14 01:00:03 boes kernel:        <ffffffff88062b44>{:xfs:xfs_vn_permission+20} <ffffffff80287c48>{permission+104}
Aug 14 01:00:03 boes kernel:        <ffffffff802883ea>{__link_path_walk+170} <ffffffff880560aa>{:xfs:xfs_access+74}
Aug 14 01:00:03 boes kernel:        <ffffffff8028ab02>{vfs_mkdir+130} <ffffffff8028abf5>{sys_mkdirat+165}
Aug 14 01:00:03 boes kernel:        <ffffffff80209b5a>{system_call+126}
Aug 14 01:00:03 boes kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 874 of file fs/xfs/xfs_ialloc.c.  Caller 0xffffffff8803be2f
Aug 14 01:00:03 boes kernel: 
Aug 14 01:00:03 boes kernel: Call Trace: <ffffffff880366d6>{:xfs:xfs_dialloc+1958}
Aug 14 01:00:03 boes kernel:        <ffffffff8803be2f>{:xfs:xfs_ialloc+95} <ffffffff8805b83b>{:xfs:kmem_zone_alloc+91}
Aug 14 01:00:03 boes kernel:        <ffffffff88052116>{:xfs:xfs_dir_ialloc+134} <ffffffff88043913>{:xfs:xfs_log_reserve+195}
Aug 14 01:00:03 boes kernel:        <ffffffff8805867b>{:xfs:xfs_mkdir+923} <ffffffff88007f1b>{:xfs:xfs_acl_get_attr+91}
Aug 14 01:00:04 boes kernel:        <ffffffff880623a1>{:xfs:xfs_vn_mknod+465} <ffffffff80292ab0>{d_rehash+112}
Aug 14 01:00:04 boes kernel:        <ffffffff804a136f>{__mutex_unlock_slowpath+415} <ffffffff80287f9d>{real_lookup+157}
Aug 14 01:00:04 boes kernel:        <ffffffff8033fac1>{_atomic_dec_and_lock+65} <ffffffff80296544>{mntput_no_expire+36}
Aug 14 01:00:04 boes kernel:        <ffffffff80289138>{__link_path_walk+3576} <ffffffff80342cd1>{__up_read+33}
Aug 14 01:00:04 boes kernel:        <ffffffff8805076c>{:xfs:xfs_trans_unlocked_item+44}
Aug 14 01:00:04 boes kernel:        <ffffffff880560aa>{:xfs:xfs_access+74} <ffffffff88062b44>{:xfs:xfs_vn_permission+20}
Aug 14 01:00:04 boes kernel:        <ffffffff80287c48>{permission+104} <ffffffff802883ea>{__link_path_walk+170}
Aug 14 01:00:04 boes kernel:        <ffffffff880560aa>{:xfs:xfs_access+74} <ffffffff8028ab02>{vfs_mkdir+130}
Aug 14 01:00:04 boes kernel:        <ffffffff8028abf5>{sys_mkdirat+165} <ffffffff80209b5a>{system_call+126}

etc.
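
(For reference: XFS_WANT_CORRUPTED_GOTO is XFS's consistency-check macro.
When the checked condition is false, it reports corruption and bails out
to the function's error label; a simplified paraphrase of the macro from
fs/xfs/xfs_error.h, not a verbatim copy:

    /* Simplified paraphrase, not the exact kernel source.  The failing
     * check at xfs_ialloc.c line 874 means a record in the on-disk
     * inode allocation btree did not look sane during xfs_dialloc(). */
    #define XFS_WANT_CORRUPTED_GOTO(x, l)                               \
            do {                                                        \
                    if (!(x)) {                                         \
                            XFS_ERROR_REPORT("XFS_WANT_CORRUPTED_GOTO", \
                                             XFS_ERRLEVEL_LOW, NULL);   \
                            error = XFS_ERROR(EFSCORRUPTED);            \
                            goto l;                                     \
                    }                                                   \
            } while (0)

So the mkdir path was repeatedly tripping over a bad inode-btree record
that xfs_repair apparently had not fixed.)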


I had unmounted and remounted the filesystem after that. I tried removing
a couple of junk directories at this point (probably a bad idea in retrospect),
and when I tried to unmount the filesystem again in preparation for the repair,
the system stopped responding. The kernel was spewing these messages:

Aug 14 12:23:45 boes kernel: BUG: soft lockup detected on CPU#0!
Aug 14 12:23:45 boes kernel: 
Aug 14 12:23:45 boes kernel: Call Trace: <IRQ> <ffffffff802511a9>{softlockup_tick+233}
Aug 14 12:23:45 boes kernel:        <ffffffff802367e0>{update_process_times+80} <ffffffff802163e3>{smp_local_timer_interrupt+35}
Aug 14 12:23:45 boes kernel:        <ffffffff80216451>{smp_apic_timer_interrupt+65} <ffffffff8020a69a>{apic_timer_interrupt+98} <EOI>
Aug 14 12:23:45 boes kernel:        <ffffffff8803a578>{:xfs:xfs_iextract+264} <ffffffff80245591>{debug_mutex_add_waiter+161}
Aug 14 12:23:45 boes kernel:        <ffffffff8803e226>{:xfs:xfs_iflush_all+22} <ffffffff804a10df>{__mutex_lock_slowpath+767}
Aug 14 12:23:45 boes kernel:        <ffffffff804a10b4>{__mutex_lock_slowpath+724} <ffffffff8803e226>{:xfs:xfs_iflush_all+22}
Aug 14 12:23:45 boes kernel:        <ffffffff8804c733>{:xfs:xfs_unmountfs+19} <ffffffff8805368d>{:xfs:xfs_unmount+301}
Aug 14 12:23:45 boes kernel:        <ffffffff880659f8>{:xfs:vfs_unmount+40} <ffffffff88065342>{:xfs:xfs_fs_put_super+50}
Aug 14 12:23:45 boes kernel:        <ffffffff802805ff>{generic_shutdown_super+159} <ffffffff802811dd>{kill_block_super+45}
Aug 14 12:23:45 boes kernel:        <ffffffff8028048f>{deactivate_super+79} <ffffffff80296d79>{sys_umount+137}
Aug 14 12:23:45 boes kernel:        <ffffffff80342d82>{__up_write+34} <ffffffff8020a7ed>{error_exit+0}
Aug 14 12:23:45 boes kernel:        <ffffffff80209b5a>{system_call+126}
Aug 14 12:23:55 boes kernel: BUG: soft lockup detected on CPU#0!
Aug 14 12:23:55 boes kernel: 
Aug 14 12:23:55 boes kernel: Call Trace: <IRQ> <ffffffff802511a9>{softlockup_tick+233}
Aug 14 12:23:55 boes kernel:        <ffffffff802367e0>{update_process_times+80} <ffffffff802163e3>{smp_local_timer_interrupt+35}
Aug 14 12:23:55 boes kernel:        <ffffffff80216451>{smp_apic_timer_interrupt+65} <ffffffff8020a69a>{apic_timer_interrupt+98} <EOI>
Aug 14 12:23:56 boes kernel:        <ffffffff8803e226>{:xfs:xfs_iflush_all+22} <ffffffff80245591>{debug_mutex_add_waiter+161}
Aug 14 12:23:56 boes kernel:        <ffffffff804a10df>{__mutex_lock_slowpath+767} <ffffffff8803e261>{:xfs:xfs_iflush_all+81}
Aug 14 12:23:56 boes kernel:        <ffffffff804a13b8>{__mutex_unlock_slowpath+488} <ffffffff8803e261>{:xfs:xfs_iflush_all+81}
Aug 14 12:23:56 boes kernel:        <ffffffff8804c733>{:xfs:xfs_unmountfs+19} <ffffffff8805368d>{:xfs:xfs_unmount+301}
Aug 14 12:23:56 boes kernel:        <ffffffff880659f8>{:xfs:vfs_unmount+40} <ffffffff88065342>{:xfs:xfs_fs_put_super+50}
Aug 14 12:23:56 boes kernel:        <ffffffff802805ff>{generic_shutdown_super+159} <ffffffff802811dd>{kill_block_super+45}
Aug 14 12:23:56 boes kernel:        <ffffffff8028048f>{deactivate_super+79} <ffffffff80296d79>{sys_umount+137}
Aug 14 12:23:56 boes kernel:        <ffffffff80342d82>{__up_write+34} <ffffffff8020a7ed>{error_exit+0}
Aug 14 12:23:56 boes kernel:        <ffffffff80209b5a>{system_call+126}

Dumping the held locks via magic SysRq showed:

Aug 14 12:26:46 boes kernel: #009:             [ffff81013020d488] {alloc_super}
Aug 14 12:26:46 boes kernel: .. held by:            umount:18733 [ffff810154498340, 117]
Aug 14 12:26:46 boes kernel: ... acquired at:               generic_shutdown_super+0x63/0x150
 


kernel: 2.6.17.7 x86_64
xfstools: 2.8.11 from CVS last week

I'm now running the "standard" Debian xfs_repair (version 2.6.20) for
kicks, as the 2.8.11 version didn't really seem to help much, and I'm
getting plenty of these errors:

entry "img-050806-090_onlin_81895f.jpg" at block 4 offset 2752 in directory inode 1343503044 references free inode 2511243327
        clearing inode number in entry at offset 2752...
entry "img-050806-090_onlin_81895f.jpg" at block 4 offset 2704 in directory inode 2160247870 references free inode 2511243327
        clearing inode number in entry at offset 2704...
entry "xbase-clients" at block 1 offset 1248 in directory inode 2457926717 references free inode 2511243327
        clearing inode number in entry at offset 1248...
entry "img-050806-090_onlin_81895f.jpg" at block 5 offset 592 in directory inode 2508332587 references free inode 2511243327
        clearing inode number in entry at offset 592...

Phase 6:
rebuilding directory inode 256
rebuilding directory inode 1343503044
rebuilding directory inode 2508332587
rebuilding directory inode 2160247870
rebuilding directory inode 2457926717

Phase 7:
resetting inode 256 nlinks from 17 to 16
resetting inode 2457926717 nlinks from 12 to 2
resetting inode 3080162495 nlinks from 1 to 10

Note the recurring theme of "resetting inode 256 nlinks from 17 to 16".
It seems that xfs_repair 2.8.11 doesn't, in fact, reset the nlinks.
(Or it's due to the deletion and recreation of lost+found, as 256 is the
root directory, but that doesn't explain the nlinks of the other two inodes.)

Help! :-(


Paul Slootman


* Re: cache_purge: shake on cache 0x5880a0 left 8 nodes!?
  2006-08-10 16:42 cache_purge: shake on cache 0x5880a0 left 8 nodes!? Paul Slootman
  2006-08-11  1:30 ` Barry Naujok
@ 2006-08-14 14:59 ` Chris Wedgwood
  2006-08-14 15:55   ` Paul Slootman
  1 sibling, 1 reply; 10+ messages in thread
From: Chris Wedgwood @ 2006-08-14 14:59 UTC (permalink / raw)
  To: Paul Slootman; +Cc: xfs

On Thu, Aug 10, 2006 at 06:42:22PM +0200, Paul Slootman wrote:

> empty data block 10 in directory inode 1343747104: junking block
> empty data block 11 in directory inode 1343747104: junking block
> empty data block 12 in directory inode 1343747104: junking block
> free block 16777216 entry 10 for directory ino 1343747104 bad
> rebuilding directory inode 1343747104

Do subsequent xfs_repair runs go silent?  If not, then you've probably
hit a case where it's leaving something icky behind and you have to
manually poke it using xfs_db (I had this myself).


* Re: cache_purge: shake on cache 0x5880a0 left 8 nodes!?
  2006-08-14 14:59 ` Chris Wedgwood
@ 2006-08-14 15:55   ` Paul Slootman
  0 siblings, 0 replies; 10+ messages in thread
From: Paul Slootman @ 2006-08-14 15:55 UTC (permalink / raw)
  To: xfs

On Mon 14 Aug 2006, Chris Wedgwood wrote:
> On Thu, Aug 10, 2006 at 06:42:22PM +0200, Paul Slootman wrote:
> 
> > empty data block 10 in directory inode 1343747104: junking block
> > empty data block 11 in directory inode 1343747104: junking block
> > empty data block 12 in directory inode 1343747104: junking block
> > free block 16777216 entry 10 for directory ino 1343747104 bad
> > rebuilding directory inode 1343747104
> 
> Do subsequent xfs_repair runs go silent?  If not, then you've probably
> hit a case where it's leaving something icky behind and you have to
> manually poke it using xfs_db (I had this myself).

No, these particular messages didn't recur, although I've been having
a lot of trouble with that filesystem (see my other message today).


Paul Slootman


* RE: cache_purge: shake on cache 0x5880a0 left 8 nodes!?
  2006-08-12  9:14     ` Paul Slootman
  2006-08-14 14:17       ` XFS internal error XFS_WANT_CORRUPTED_GOTO Paul Slootman
@ 2006-08-15  1:49       ` Barry Naujok
  2006-08-15 11:55         ` Paul Slootman
  1 sibling, 1 reply; 10+ messages in thread
From: Barry Naujok @ 2006-08-15  1:49 UTC (permalink / raw)
  To: 'Paul Slootman', xfs

Paul,

Do you still have the filesystem with the undeletable directory?

If so, can you run xfs_db and email me the contents of that directory?

The commands would be:

xfs_db> blockget -n
xfs_db> ncheck lost+found.x
        <inum> lost+found.x/.
xfs_db> inode <inum>
xfs_db> p

# if it is not a shortform directory, i.e. the above does not output
  "u.sfdir2..." fields, do the following sequence for all the blocks
  in the extent map, or at least offsets 0 and 8388608 if they
  exist:

xfs_db> dblock <n>
xfs_db> p


With this output, I should be able to recreate a similar 
directory and test xfs_repair for problems.

Thanks,
Barry.



* Re: XFS internal error XFS_WANT_CORRUPTED_GOTO
  2006-08-14 14:17       ` XFS internal error XFS_WANT_CORRUPTED_GOTO Paul Slootman
@ 2006-08-15 11:54         ` Paul Slootman
  0 siblings, 0 replies; 10+ messages in thread
From: Paul Slootman @ 2006-08-15 11:54 UTC (permalink / raw)
  To: xfs

On Mon 14 Aug 2006, Paul Slootman wrote:
> 
> kernel: 2.6.17.7 x86_64
> xfstools: 2.8.11 from CVS last week
> 
> I'm now running the "standard" debian xfs_repair (version 2.6.20) for kicks,
> as the 2.8.11 version didn't really seem to help much. I'm now getting
> plenty of these errors:
[snip]

After all that, again, tonight:

Aug 15 01:11:57 boes kernel: Filesystem "md6": XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff880577d3
Aug 15 01:11:57 boes kernel: 
Aug 15 01:11:57 boes kernel: Call Trace: <ffffffff880501d0>{:xfs:xfs_trans_cancel+96}
Aug 15 01:11:57 boes kernel:        <ffffffff880577d3>{:xfs:xfs_create+1587} <ffffffff88062381>{:xfs:xfs_vn_mknod+433}
Aug 15 01:11:57 boes kernel:        <ffffffff8805076c>{:xfs:xfs_trans_unlocked_item+44}
Aug 15 01:11:57 boes kernel:        <ffffffff8805e1ae>{:xfs:xfs_buf_rele+62} <ffffffff880260f1>{:xfs:xfs_da_brelse+129}
Aug 15 01:11:57 boes kernel:        <ffffffff8802c24b>{:xfs:xfs_dir2_leaf_lookup_int+555}
Aug 15 01:11:57 boes kernel:        <ffffffff8802bf93>{:xfs:xfs_dir2_leaf_lookup+35} <ffffffff8802842f>{:xfs:xfs_dir2_isleaf+31}
Aug 15 01:11:57 boes kernel:        <ffffffff88027b9f>{:xfs:xfs_dir2_lookup+255} <ffffffff802892ef>{link_path_walk+415}
Aug 15 01:11:57 boes kernel:        <ffffffff880560aa>{:xfs:xfs_access+74} <ffffffff80289e9e>{vfs_create+142}
Aug 15 01:11:57 boes kernel:        <ffffffff8028a22f>{open_namei+383} <ffffffff80278ff7>{do_filp_open+39}
Aug 15 01:11:57 boes kernel:        <ffffffff80279211>{get_unused_fd+113} <ffffffff80279406>{do_sys_open+70}
Aug 15 01:11:57 boes kernel:        <ffffffff80209b5a>{system_call+126}
Aug 15 01:11:57 boes kernel: xfs_force_shutdown(md6,0x8) called from line 1151 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffff88065ba8
Aug 15 01:11:57 boes kernel: Filesystem "md6": Corruption of in-memory data detected.  Shutting down filesystem: md6
Aug 15 01:11:57 boes kernel: Please umount the filesystem, and rectify the problem(s)


The only thing the (2.8.11) xfs_repair reported was, in phase 3:

    data fork in ino 2995825682 claims free block 251664817


Paul Slootman


* Re: cache_purge: shake on cache 0x5880a0 left 8 nodes!?
  2006-08-15  1:49       ` cache_purge: shake on cache 0x5880a0 left 8 nodes!? Barry Naujok
@ 2006-08-15 11:55         ` Paul Slootman
  0 siblings, 0 replies; 10+ messages in thread
From: Paul Slootman @ 2006-08-15 11:55 UTC (permalink / raw)
  To: Barry Naujok; +Cc: xfs

On Tue 15 Aug 2006, Barry Naujok wrote:
> 
> Do you still have the filesystem with the undeletable directory?

No, as I wrote, I zapped it with xfs_db and the subsequent xfs_repair
cleaned it up.

Thanks for your concern, however.


Paul Slootman

