public inbox for linux-xfs@vger.kernel.org
* Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c
@ 2009-02-24 13:04 Federico Sevilla III
  2009-02-24 22:46 ` Dave Chinner
  0 siblings, 1 reply; 21+ messages in thread
From: Federico Sevilla III @ 2009-02-24 13:04 UTC (permalink / raw)
  To: Linux XFS



Hi,

Recently, we had two file servers crash during periods of increased load
(increased access from workstations in the LAN). After the crash, the
XFS file systems would no longer mount. The mount process would just
stay in state D, with no progress, and no significant disk activity.

The first pass of xfs_repair (without -L) successfully found a secondary
super block. The primary was corrupted for some reason. The file system
could still not be mounted after this first xfs_repair, and xfs_repair
would no longer continue because the log had to be replayed. Running
xfs_repair a second time with the -L option worked.

Unfortunately we don't have the output of these runs of xfs_repair to
share with the list. The above narrative is the same for the crash on
both servers, though.

On one of the servers now, on the same file system that had trouble, we
are having the following messages (the system otherwise remains usable,
though, which is weird):

        attempt to access beyond end of device
        sda7: rw=0, want=154858897362229008, limit=3885978852
        I/O error in filesystem ("sda7") meta-data dev sda7 block 0x2262b58bf959708       ("xfs_trans_read_buf") error 5 buf count 4096

On this server which has been up for ~6 hours, we have 348 of the above
messages, and they are all identical.
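
For reference, the want= value is consistent with the block address plus the buffer length expressed in 512-byte sectors, so it can be compared directly against limit=. A minimal Python sketch of that arithmetic follows. It assumes the block field is a 512-byte sector address (the figures in the log agree with that reading), and the function names are illustrative, not taken from the kernel.

```python
SECTOR_SIZE = 512  # assumption: want= and limit= are counted in 512-byte sectors


def end_sector(block_hex: str, buf_count: int) -> int:
    """Return the sector just past the end of a read.

    block_hex is the 'block 0x...' value from the XFS message and
    buf_count is the 'buf count' value in bytes.
    """
    return int(block_hex, 16) + buf_count // SECTOR_SIZE


def beyond_device(block_hex: str, buf_count: int, limit: int) -> bool:
    """True if the read would run past the last sector of the device."""
    return end_sector(block_hex, buf_count) > limit


# Values copied from the sda7 messages above.
want = end_sector("0x2262b58bf959708", 4096)
print(want)  # 154858897362229008, the want= value in the log
print(beyond_device("0x2262b58bf959708", 4096, 3885978852))  # True
```

The requested sector is tens of millions of times larger than the device, which is more suggestive of a garbage block pointer in the metadata than of a device that is slightly too small.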

Both servers use CentOS5 with the Linux 2.6.18-92.1.22.el5 kernel. For
the server currently spewing the above messages, the underlying storage
(quoted directly from dmesg) is:

        megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
        SCSI subsystem initialized
        megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
        megaraid: probe new device 0x1000:0x1960:0x1000:0x0523: bus 4:slot 3:func 0
        ACPI: PCI Interrupt 0000:04:03.0[A] -> GSI 25 (level, low) -> IRQ 201
        megaraid: fw version:[713S] bios version:[G121]
        scsi0 : LSI Logic MegaRAID driver
        scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
        scsi[0]: scanning scsi channel 1 [virtual] for logical drives
          Vendor: MegaRAID  Model: LD 0 RAID5 1907G  Rev: 713S
          Type:   Direct-Access                      ANSI SCSI revision: 02
        SCSI device sda: 3906242560 512-byte hdwr sectors (1999996 MB)

The information of the file system on /dev/sda7 is as follows:

        meta-data=/dev/sda7              isize=256    agcount=32, agsize=15179616 blks
                 =                       sectsz=512   attr=2
        data     =                       bsize=4096   blocks=485747328, imaxpct=25
                 =                       sunit=32     swidth=160 blks, unwritten=1
        naming   =version 2              bsize=4096  
        log      =internal               bsize=4096   blocks=32768, version=2
                 =                       sectsz=512   sunit=32 blks, lazy-count=0
        realtime =none                   extsz=4096   blocks=0, rtextents=0
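
As a sanity check on the geometry above, the filesystem size can be converted to 512-byte sectors and compared with the limit= value from the error message. A rough Python sketch, using only numbers quoted in this message; the helper name is made up for illustration.

```python
def fs_size_in_sectors(blocks: int, bsize: int, sector_size: int = 512) -> int:
    """Convert a filesystem size in blocks to a count of sectors."""
    return blocks * bsize // sector_size


# Values from the xfs_info output and the "limit=" field of the error message.
blocks, bsize = 485747328, 4096
agcount, agsize = 32, 15179616
limit = 3885978852

size = fs_size_in_sectors(blocks, bsize)
print(size)           # 3885978624
print(size <= limit)  # True: the filesystem fits inside sda7

# The last allocation group may be short, so the block count should lie
# between (agcount - 1) and agcount full groups.
print((agcount - 1) * agsize < blocks <= agcount * agsize)  # True
```

On these numbers the geometry itself fits on sda7, so the out-of-range reads are not explained by the filesystem simply being larger than the partition.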

Mount options are:

        rw,logbufs=8,logbsize=256k,sunit=256,swidth=1280,nobarrier
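
Note that the two sets of stripe numbers use different units: xfs_info reports sunit and swidth in filesystem blocks, while the sunit= and swidth= mount options are given in 512-byte units. A small Python sketch of the conversion, assuming the 4096-byte block size shown above; the function name is illustrative.

```python
def blks_to_512_units(blks: int, bsize: int = 4096) -> int:
    """Convert a stripe value in filesystem blocks to 512-byte units."""
    return blks * bsize // 512


# xfs_info reports sunit=32 and swidth=160 blks.
print(blks_to_512_units(32))   # 256, matches the sunit= mount option
print(blks_to_512_units(160))  # 1280, matches the swidth= mount option
```

So the mount options agree with the on-disk stripe geometry and are unlikely to be a factor here.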

Write cache is disabled by the hardware RAID controller for all the
drives, and caching on the controller is set to write-through. These
servers are not new, and before setting them up we ran multiple passes
of memtest86+ with no issue.

Please help me understand what the cause of this problem could be. I
have been searching and it remains unclear. I would also appreciate
suggestions on what can be done to fix this.

Thanks!

On 2008-01-03 15:55, Jay Sullivan wrote:
> I'm still seeing a lot of the following in my dmesg.  Any ideas?  See
> below for what I have already tried (including moving data to a fresh
> XFS volume).
> 
>  
> 
> Tons of these; sometimes the want= changes, but it is always huge.
> 
> ###
> 
> attempt to access beyond end of device
> 
> dm-0: rw=0, want=68609558288793608, limit=8178892800
> 
> I/O error in filesystem ("dm-0") meta-data dev dm-0 block
> 0xf3c0079e000000       ("xfs_trans_read_buf") error 5 buf count 4096
> 
> ###
> 
>  
> 
> Occasionally some of these:
> 
> ###
> 
> XFS internal error XFS_WANT_CORRUPTED_GOTO at line 4533 of file
> fs/xfs/xfs_bmap.c.  Caller 0xc028c5a2
> [<c026bc58>] xfs_bmap_read_extents+0x3bd/0x498
> [<c028c5a2>] xfs_iread_extents+0x74/0xe1
> [<c028fb02>] xfs_iext_realloc_direct+0xa4/0xe7
> [<c028f3ef>] xfs_iex t;c028c5a2>] xfs_iread_extents+0x74/0xe1
> [<c026befd>] xfs_bmapi+0x1ca/0x173f
> [<c02e2d7e>] elv_rb_add+0x6f/0x88
> [<c02eb843>] as_update_rq+0x32/0x72
> [<c02ec08b>] as_add_request+0x76/0xa4
> [<c02e330c>] elv_insert+0xd5/0x142
> [<c02e70ad>] __make_request+0xc8/0x305
> [<c02e7480>] generic_make_request+0x122/0x1d9
> [<c03ee0e3>] __map_bio+0x33/0xa9
> [<c03ee36c>] __clone_and_map+0xda/0x34c
> [<c0148fce>] mempool_alloc+0x2a/0xdb
> [<c028aa3c>] xfs_ilock+0x58/0xa0
> [<c029168b>] xfs_iomap+0x216/0x4b7
> [<c02b2000>] __xfs_get_blocks+0x6b/0x226
> [<c02f2792>] radix_tree_node_alloc+0x16/0x57
> [<c02f2997>] radix_tree_insert+0xb0/0x126
> [<c02b21e3>] xfs_get_blocks+0x28/0x2d
> [<c0183a32>] block_read_full_page+0x192/0x346
> [<c02b21bb>] xfs_get_blocks+0x0/0x2d
> [<c028a667>] xfs_iget+0x145/0x150
> [<c018982d>] do_mpage_readpag 28aba1>] xfs_iunlock+0x43/0x84
> [<c02a8096>] xfs_vget+0xe1/0xf2
> [<c020a578>] find_exported_dentry+0x71/0x4b6
> [<c014c4a4>] __do_page_cache_readahead+0x88/0x153
> [<c0189aa4>] mpage_readpage+0x4b/0x5e
> [<c02b21bb>] xfs_get_blocks+0x0/0x2d
> [<c014c69d>] blockable_page_cache_readahead+0x4d/0xb9
> [<c014c942>] page_cache_readahead+0x174/0x1a3
> [<c014630f>] find_get_page+0x18/0x3a
> [<c014684e>] do_generic_mapping_read+0x1b5/0x535
> [<c012621a>] __capable+0x8/0x1b
> [<c0146f6c>] generic_file_sendfile+0x68/0x83
> [<c020eff2>] nfsd_read_actor+0x0/0x10f
> [<c02b822f>] xfs_sendfile+0x94/0x164
> [<c020eff2>] nfsd_read_actor+0x0/0x10f
> [<c0211325>] nfsd_permission+0x6e/0x103
> [<c02b4868>] xfs_file_sendfile+0x4c/0x5c
> [<c020eff2>] nfsd_read_actor+0x0/0x10f
> [<c020f445>] nfsd_vfs_read+0x344/0x361
> [<c020eff2>] nfsd_read_actor+0x0/0x ] nfsd_read+0xd8/0xf9
> [<c021548e>] nfsd3_proc_read+0xb0/0x174
> [<c02170b4>] nfs3svc_decode_readargs+0x0/0xf7
> [<c020b535>] nfsd_dispatch+0x8a/0x1f5
> [<c048c43e>] svcauth_unix_set_client+0x11d/0x175
> [<c0488d73>] svc_process+0x4fd/0x681
> [<c020b39b>] nfsd+0x163/0x273
> [<c020b238>] nfsd+0x0/0x273
> [<c01037fb>] kernel_thread_helper+0x7/0x10
> ###
> 
>  
> 
> Thanks!
> 
>  
> 
> ~Jay
> 
>  
> 
> From: Jay Sullivan [mailto:jpspgd@???] 
> Sent: Thursday, December 20, 2007 9:01 PM
> To: xfs@???
> Cc: Jay Sullivan
> Subject: Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c
> 
>  
> 
> I'm still seeing problems.  =(
> 
>  
> 
> Most recently I have copied all of the data off of the suspect XFS
> volume onto another fresh XFS volume.  A few days later I saw the same
> messages show up in dmesg.  I haven't had a catastrophic failure that
> makes the kernel remount the FS RO, but I don't want to wait for that to
> happen.
> 
>  
> 
> Today I upgraded to the latest stable kernel in Gentoo (2.6.23-r3) and
> I'm still on xfsprogs 2.9.4, also the latest stable release.  A few
> hours after rebooting to load the new kernel, I saw the following in
> dmesg:
> 
>  
> 
> ####################
> 
> attempt to access beyond end of device
> 
> dm-0: rw=0, want=68609558288793608, limit=8178892800
> 
> I/O error in filesystem ("dm-0") meta-data dev dm-0 block
> 0xf3c0079e000000       ("xfs_trans_read_buf") error 5 buf count 4096
> 
> attempt to access beyond end of device
> 
> dm-0: rw=0, want=68609558288793608, limit=8178892800
> 
> I/O error in filesystem ("dm-0") meta-data dev dm-0 block
> 0xf3c0079e000000       ("xfs_trans_read_buf") error 5 buf count 4096
> 
> attempt to access beyond end of device
> 
> dm-0: rw=0, want=68609558288793608, limit=8178892800
> 
> I/O error in filesystem ("dm-0") meta-data dev dm-0 block
> 0xf3c0079e000000       ("xfs_trans_read_buf") error 5 buf count 4096
> 
> attempt to access beyond end of device
> 
> dm-0: rw=0, want=68609558288793608, limit=8178892800
> 
> I/O error in filesystem ("dm-0") meta-data dev dm-0 block
> 0xf3c0079e000000       ("xfs_trans_read_buf") error 5 buf count 4096
> 
> ###################
> 
>  
> 
> These are the same messages (trying to access a block that is WAAAY outside of the range
> of my drives) that I was seeing before the last time my FS got remounted
> read-only by the colonel.
> 
>  
> 
> Any ideas?  What other information can I gather that would help with
> troubleshooting?  Here are some more specifics:
> 
>  
> 
> This is a Dell PowerEdge 1850 with a FusionMPT/LSI fibre channel card.
> The XFS volume is a 3.9TB logical volume in LVM.  The volume group is
> spread across LUNs on Apple XServe RAIDs, which are connected o'er FC
> to our fabric.  I just swapped FC switches (to a different brand even)
> and the problem was showing before and after the switch switch, so
> that's not it.  I have also swapped FC cards, upgraded FC card firmware,
> updated BIOSs, etc..  This server sees heavy NFS (v3) and samba
> (currently 3.0.24 until the current regression bug is squashed and
> stable) traffic.   usually sees 200-300Mbps throughput 24/7, although
> sometimes more.
> 
>  
> 
> Far-fetched:  Is there any way that a particular file on my FS, when
> accessed, is causing the problem?  
> 
>  
> 
> I have a very similar system (Dell PE 2650, same FC card, same type of
> RAID, same SFP cables, same GPT scheme, same kernel) but instead with an
> ext3 (full journal) FS in a 5.[something]TB logical volume (LVM) with no
> problems.  Oh, and it sees system load values in the mid-20s just about
> all day.  
> 
>  
> 
> Grasping at straws.  I need XFS to work because we'll soon be requiring
> seriously large filesystems with non-sucky extended attribute and ACL
> support.  Plus it's fast and I like it.
> 
>  
> 
> Can the XFS community help?  I don't want to have to turn to that guy
> that a  =P
> 
>  
> 
> ~Jay
> 
>  
> 
>  
> 
> On Nov 14, 2007, at 10:05 AM, Jay Sullivan wrote:
> 
> 
> 
> 
> 
> Of course this had to happen one more time before my scheduled
> maintenance window...  Anyways, here's all of the good stuff I
> collected.  Can anyone make sense of it?  Oh, and I upgraded to xfsprogs
> 2.9.4 last week, so all output you see is with that version.  
> 
> Thanks!
> 
> ###################
> 
> dmesg output
> 
> ###################
> 
> XFS internal error XFS_WANT_CORRUPTED_GOTO at line 4533 of file
> fs/xfs/xfs_bmap.c.  Caller 0xc028c5a2
> [<c026bc58>] xfs_bmap_read_extents+0x3bd/0x498
> [<c028c5a2>] xfs_iread_extents+0x74/0xe1
> [<c028fb02>] xfs_iext_realloc_direct+0xa4/0xe7
> [<c028f3ef>] xfs_iex t;c028c5a2>] xfs_iread_extents+0x74/0xe1
> [<c026befd>] xfs_bmapi+0x1ca/0x173f
> [<c02e2d7e>] elv_rb_add+0x6f/0x88
> [<c02eb843>] as_update_rq+0x32/0x72
> [<c02ec08b>] as_add_request+0x76/0xa4
> [<c02e330c>] elv_insert+0xd5/0x142
> [<c02e70ad>] __make_request+0xc8/0x305
> [<c02e7480>] generic_make_request+0x122/0x1d9
> [<c03ee0e3>] __map_bio+0x33/0xa9
> [<c03ee36c>] __clone_and_map+0xda/0x34c
> [<c0148fce>] mempool_alloc+0x2a/0xdb
> [<c028aa3c>] xfs_ilock+0x58/0xa0
> [<c029168b>] xfs_iomap+0x216/0x4b7
> [<c02b2000>] __xfs_get_blocks+0x6b/0x226
> [<c02f2792>] radix_tree_node_alloc+0x16/0x57
> [<c02f2997>] radix_tree_insert+0xb0/0x126
> [<c02b21e3>] xfs_get_blocks+0x28/0x2d
> [<c0183a32>] block_read_full_page+0x192/0x346
> [<c02b21bb>] xfs_get_blocks+0x0/0x2d
> [<c028a667>] xfs_iget+0x145/0x150
> [<c018982d>] do_mpage_readpag 28aba1>] xfs_iunlock+0x43/0x84
> [<c02a8096>] xfs_vget+0xe1/0xf2
> [<c020a578>] find_exported_dentry+0x71/0x4b6
> [<c014c4a4>] __do_page_cache_readahead+0x88/0x153
> [<c0189aa4>] mpage_readpage+0x4b/0x5e
> [<c02b21bb>] xfs_get_blocks+0x0/0x2d
> [<c014c69d>] blockable_page_cache_readahead+0x4d/0xb9
> [<c014c942>] page_cache_readahead+0x174/0x1a3
> [<c014630f>] find_get_page+0x18/0x3a
> [<c014684e>] do_generic_mapping_read+0x1b5/0x535
> [<c012621a>] __capable+0x8/0x1b
> [<c0146f6c>] generic_file_sendfile+0x68/0x83
> [<c020eff2>] nfsd_read_actor+0x0/0x10f
> [<c02b822f>] xfs_sendfile+0x94/0x164
> [<c020eff2>] nfsd_read_actor+0x0/0x10f
> [<c0211325>] nfsd_permission+0x6e/0x103
> [<c02b4868>] xfs_file_sendfile+0x4c/0x5c
> [<c020eff2>] nfsd_read_actor+0x0/0x10f
> [<c020f445>] nfsd_vfs_read+0x344/0x361
> [<c020eff2>] nfsd_read_actor+0x0/0x ] nfsd_read+0xd8/0xf9
> [<c021548e>] nfsd3_proc_read+0xb0/0x174
> [<c02170b4>] nfs3svc_decode_readargs+0x0/0xf7
> [<c020b535>] nfsd_dispatch+0x8a/0x1f5
> [<c048c43e>] svcauth_unix_set_client+0x11d/0x175
> [<c0488d73>] svc_process+0x4fd/0x681
> [<c020b39b>] nfsd+0x163/0x273
> [<c020b238>] nfsd+0x0/0x273
> [<c01037fb>] kernel_thread_helper+0x7/0x10
> =======================
> attempt to access beyond end of device
> dm-1: rw=0, want=6763361770196172808, limit=7759462400
> I/O error in filesystem ("dm-1") meta-data dev dm-1 block
> 0x5ddc49b238000000       ("xfs_trans_read_buf") error 5 buf count 4096
> xfs_force_shutdown(dm-1,0x1) called from line 415 of file
> fs/xfs/xfs_trans_buf.c.  Return address = 0xc02baa25
> Filesystem "dm-1": I/O Error Detected.  Shutting down filesystem: dm-1
> Please umount the filesystem, and rectify the problem(s)
> 
> 
> #################### I umount'ed and mount'ed the FS several times, but
> xfs_repair still told me to use -L...  Any ideas?
> 
> #######################
> 
> server-files ~ # umount /mnt/san/
> server-files ~ # mount /mnt/san/
> server-files ~ # umount /mnt/san/
> server-files ~ # xfs_repair
> /dev/server-files-sanvg01/server-files-sanlv01 
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>        - zero log...
> ERROR: The filesystem has valuable metadata changes in a log which needs
> to
> be replayed.  Mount the filesystem to replay the log, and unmount it
> before
> re-running xfs_repair.  If you are unable to mount the filesystem, then
> use
> the -L option to destroy the log and attempt a repair.
> Note that destroying the log may cause corruption -- please attempt a
> mount
> of the filesystem before doing this.
> server-files ~ # xfs_repair -L
> /dev/server-files-sanvg01/server-files-sanlv01 
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>        - zero log...
> ALERT: The filesystem has valuable metadata changes in a log which is
> being
> destroyed because the -L option was used.
>        - scan filesystem freespace and inode maps...
>        - found root inode chunk
> Phase 3 - for each AG...
>        - scan and clear agi unlinked lists...
>        - process known inodes and perform inode discovery...
>        - agno = 0
> 4002: Badness in key lookup (length)
> bp=(bno 2561904, len 16384 bytes) key=(bno 2561904, len 8192 bytes)
> 8003: Badness in key lookup (length)
> bp=(bno 0, len 512 bytes) key=(bno 0, len 4096 bytes)
> bad bmap btree ptr 0x5f808b0400000000 in ino 5123809
> bad data fork in inode 5123809
> cleared inode 5123809
> bad magi 480148 (data fork) bmbt block 0
> bad data fork in inode 7480148
> cleared inode 7480148
>        - agno = 1
>        - agno = 2
>        - agno = 3
>        - agno = 4
>        - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>        - setting up duplicate extent list...
>        - check for inodes claiming duplicate blocks...
>        - agno = 0
>        - agno = 1
>        - agno = 2
>        - agno = 3
>        - agno = 4
> entry "Fuller_RotoscopeCorrected.mov" at block 0 offset 184 in directory
> inode 89923 7480148
>        clearing inode number in entry at offset 184...
> Phase 5 - rebuild AG headers and trees...
>        - reset superblock...
> 4000: Badness in key lookup (length)
> bp=(bno 0, len 4096 bytes) key=(bno 0, len 512 bytes)
> Phase 6 - check inode connectivity...
>        - resetting contents of realtime bitmap and summary inodes
>        - traversing filesystem ...
> bad hash table for directory inode 8992373 (no data entry): rebuilding
> rebuilding directory inode 8992373
> 4000: Badness in key lookup (length)
> bp=(bno 0, len 4096 bytes) key=(bno 0, len 512 bytes)
> 4000: Badness in key lookup (length)
> bp=(bno 0, len 4096 bytes) key=(bno 0, len 512 bytes)
>        - traversal finished ...
>        - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link
> counts...
> 4000: Badness in key lookup (length)
> bp=(bno 0, len 4096 bytes) key=(bno 0, len 512 bytes)
> done
> server-files ~ # mount /mnt/san
> server-files ~ # umount /mnt/san
> server-files ~ # xfs_repair -L
> /dev/server-files-sanvg01/server-files-sanlv01 
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>        - zero log...
> 
> server-files ~ # xfs_repair
> /dev/server-files-sanvg01/server-files-sanlv01 
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>        - zero log...
> XFS: totally zeroed log
>        - scan filesystem freespace and inode maps...
>        - found root inode chunk
> Phase 3 - for each AG...
>        - scan and clear agi unlinked lists...
>        - process known inodes and perform inode discovery...
>        - agno = 0
>        - agno = 1
>        - agno = 2
>        - agno = 3
>        - agno = 4
>        - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>        - setting up duplicate extent list...
>        - check for inodes claiming duplicate blocks...
>        - agno = 0
>        - agno = 1
>        - agno = 2
>        - agno = 3
>        - agno = 4
> Phase 5 - rebuild AG headers and trees...
>        - reset superblock...
> Phase 6 - check inode connectivity...
>        - resetting contents of realtime bitmap and summary inodes
>        - traversing filesystem ...
>        - traversal finished ...
>        - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> done
> 
> ################
> 
> So that's it for now.  Next week I'll be rsyncing all of the data off of
> this volume to another array.  I still want to know what's happening,
> though...  *pout*
> 
> Anyways, thanks a lot for everyone's help.
> 
> ~Jay
> 
> 
> -----Original Message-----
> From: xfs-bounce@??? [mailto:xfs-bounce@???] On Behalf
> Of Jay Sullivan
> Sent: Friday, November 02, 2007 10:49 AM
> To: xfs@???
> Subject: RE: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c
> 
> What can I say about Murphy and his silly laws?  I just had a drive fail
> on my array.  I wonder if this is the root of my problems...  Yay
> parity.  
> 
> ~Jay
> 
> -----Original Message-----
> From: xfs-bounce@??? [mailto:xfs-bounce@???] On Behalf
> Of Jay Sullivan
> Sent: Friday, November 02, 2007 10:00 AM
> To: xfs@???
> Subject: RE: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c
> 
> I lost the xfs_repair output on an xterm with only four lines of
> scrollback...  I'll definitely be more careful to preserve more
> 'evidence' next time.  =(  "Pics or it didn't happen", right?
> 
> I just upgraded xfsprogs and will scan the disk during my next scheduled
> downtime (probably in about 2 weeks).  I'm tempted to just wipe the
> volume and start over:  I have enough ' to copy
> everything out to a fresh XFS volume.
> 
> Regarding "areca":  I'm using hardware RAID built into Apple XServe
> RAIDs o'er LSI FC929X cards.
> 
> Someone else offered the likely explanation that the btree is corrupted.
> Isn't this something xfs_repair should be able to fix?  Would it be
> easier, safer, and faster to move the data to a new volume (and restore
> corrupted files if/as I find them from backup)?  We're talking about
> just less than 4TB of data which used to take about 6 hours to fsck (one
> pass) with ext3.  Restoring the whole shebang from backups would
> probably take the better part of 12 years (waiting for compression,
> resetting ACLs, etc.)...
> 
> FWIW, another (way less important,) much busier and significantly larger
> logical volume on the same array has been totally fine.  Murphy--go
> figure.
> 
> Thanks!
> 
> -----Original Message-----
> From: Eric Sandeen [mail >] 
> Sent: Thursday, November 01, 2007 10:30 PM
> To: Jay Sullivan
> Cc: <mailto:sandeen@???> xfs@???
> Subject: Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c
> 
> Jay Sullivan wrote:
> 
> 
> 
> Good eye:  it wasn't mountable, thus the -L flag.  No recent  
> 
>     (unplanned) power outages.  The machine and the array that holds
> the  
> 
>     disks are both on serious batteries/UPS and the array's cache  
> 
>     batteries are in good health.
> 
> 
> Did you have the xfs_repair output to see what it found?  You might also
> grab the very latest xfsprogs (2.9.4) in case it's catching more cases.
> 
> I hate it when people suggest running memtest86, but I might do that
> anyway.  :)
> 
> What controller are you using?  If you say "areca" I might be on to
> something e seen...
> 
> -Eric
> 
> 
> 
> [[HTML alternate version deleted]]
> 
> 
> 
> 


[parent not found: <B3EDBE0F860AF74BAA82EF17A7CDEDC660BE05A3@svits26.main.ad.rit.edu>]
* xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c
@ 2007-11-02  2:08 Jay Sullivan
  2007-11-02  5:18 ` David Chinner
  0 siblings, 1 reply; 21+ messages in thread
From: Jay Sullivan @ 2007-11-02  2:08 UTC (permalink / raw)
  To: xfs

(Sorry if this is a dupe to the list; it has been a long day.)

I have an XFS filesystem that has had the following happen twice in 3  
months, both times an impossibly large block number was requested.   
Unfortunately my logs don’t go back far enough for me to know if it  
was the _exact_ same block both times…  I’m running xfsprogs 2.8.21.   
Excerpt from syslog (hostname obfuscated to ‘servername’ to protect  
the innocent):

##
Nov  1 14:06:32 servername dm-1: rw=0, want=39943195856896,  
limit=7759462400
Nov  1 14:06:32 servername I/O error in filesystem ("dm-1") meta-data  
dev dm-1 block 0x245400000ff8       ("xfs_trans_read_buf") error 5 buf  
count 4096
Nov  1 14:06:32 servername xfs_force_shutdown(dm-1,0x1) called from  
line 415 of file fs/xfs/xfs_trans_buf.c.  Return address = 0xc02baa25
Nov  1 14:06:32 servername Filesystem "dm-1": I/O Error Detected.   
Shutting down filesystem: dm-1
Nov  1 14:06:32 servername Please umount the filesystem, and rectify  
the problem(s)
###
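
When the same error repeats many times, it can help to tally the distinct want=/limit= pairs rather than read the log by eye. A minimal Python sketch, assuming log lines shaped like the excerpt above (possibly wrapped after the comma); the pattern and function name are illustrative and not part of any XFS tool.

```python
import re
from collections import Counter

# Matches e.g. "dm-1: rw=0, want=39943195856896, limit=7759462400",
# allowing a line wrap between "want=...," and "limit=...".
PATTERN = re.compile(r"(\S+): rw=\d+, want=(\d+),\s*limit=(\d+)")


def summarize(log_text: str) -> Counter:
    """Count occurrences of each (device, want, limit) triple in log_text."""
    counts = Counter()
    for dev, want, limit in PATTERN.findall(log_text):
        counts[(dev, int(want), int(limit))] += 1
    return counts


sample = (
    "Nov  1 14:06:32 servername dm-1: rw=0, want=39943195856896,  \n"
    "limit=7759462400\n"
)
for (dev, want, limit), n in summarize(sample).items():
    print(dev, want, limit, n, want > limit)
```

If every entry shows the same want= value, the same piece of metadata is probably being hit each time, which narrows down where to look.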

I ran xfs_repair -L on the FS and it could be mounted again, but how
long until it happens a third time?  What concerns me is that this is  
a FS smaller than 4TB and 39943195856896 (or 0x245400000ff8) seems  
like a block that I would only have if my FS was muuuuuch larger.  The  
following is output from some pertinent programs:

###
servername ~ # xfs_info /mnt/san
meta-data=/dev/servername-sanvg01/servername-sanlv01 isize=256    agcount=5, agsize=203161600 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=969932800, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
servername ~ # mount
/dev/sda3 on / type ext3 (rw,noatime,acl)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
udev on /dev type tmpfs (rw,nosuid)
devpts on /dev/pts type devpts (rw,nosuid,noexec)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
usbfs on /proc/bus/usb type usbfs  
(rw,noexec,nosuid,devmode=0664,devgid=85)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc  
(rw,noexec,nosuid,nodev)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/mapper/servername--sanvg01-servername--sanlv01 on /mnt/san type  
xfs (rw,noatime,nodiratime,logbufs=8,attr2)
/dev/mapper/servername--sanvg01-servername--rendersharelv01 on /mnt/ 
san/rendershare type xfs (rw,noatime,nodiratime,logbufs=8,attr2)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
servername ~ # uname -a
Linux servername 2.6.20-gentoo-r8 #7 SMP Fri Jun 29 14:46:02 EDT 2007  
i686 Intel(R) Xeon(TM) CPU 3.20GHz GenuineIntel GNU/Linux
###

Does anyone know if this points to a bad block on a disk or if  
something is corrupted and can be fixed with some expert knowledge of  
xfs_db?

~Jay

[[HTML alternate version deleted]]

* xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c
@ 2007-11-01 20:06 Jay Sullivan
  2007-11-02  2:14 ` Eric Sandeen
  0 siblings, 1 reply; 21+ messages in thread
From: Jay Sullivan @ 2007-11-01 20:06 UTC (permalink / raw)
  To: xfs

I have an XFS filesystem that has had the following happen twice in 3
months, both times an impossibly large block number was requested.
Unfortunately my logs don't go back far enough for me to know if it was
the _exact_ same block both times...  I'm running xfsprogs 2.8.21.
Excerpt from syslog (hostname obfuscated to 'servername' to protect the
innocent):

 

##

Nov  1 14:06:32 servername dm-1: rw=0, want=39943195856896,
limit=7759462400

Nov  1 14:06:32 servername I/O error in filesystem ("dm-1") meta-data
dev dm-1 block 0x245400000ff8       ("xfs_trans_read_buf") error 5 buf
count 4096

Nov  1 14:06:32 servername xfs_force_shutdown(dm-1,0x1) called from line
415 of file fs/xfs/xfs_trans_buf.c.  Return address = 0xc02baa25

Nov  1 14:06:32 servername Filesystem "dm-1": I/O Error Detected.
Shutting down filesystem: dm-1

Nov  1 14:06:32 servername Please umount the filesystem, and rectify the
problem(s)

###

 

I ran xfs_repair -L on the FS and it could be mounted again, but how
long until it happens a third time?  What concerns me is that this is a
FS smaller than 4TB and 39943195856896 (or 0x245400000ff8) seems like a
block that I would only have if my FS was muuuuuch larger.  The
following is output from some pertinent programs:

 

###

servername ~ # xfs_info /mnt/san

meta-data=/dev/servername-sanvg01/servername-sanlv01 isize=256
agcount=5, agsize=203161600 blks

         =                       sectsz=512   attr=2

data     =                       bsize=4096   blocks=969932800,
imaxpct=25

         =                       sunit=0      swidth=0 blks, unwritten=1

naming   =version 2              bsize=4096  

log      =internal               bsize=4096   blocks=32768, version=1

         =                       sectsz=512   sunit=0 blks, lazy-count=0

realtime =none                   extsz=4096   blocks=0, rtextents=0

servername ~ # mount

/dev/sda3 on / type ext3 (rw,noatime,acl)

proc on /proc type proc (rw)

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)

udev on /dev type tmpfs (rw,nosuid)

devpts on /dev/pts type devpts (rw,nosuid,noexec)

shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)

usbfs on /proc/bus/usb type usbfs
(rw,noexec,nosuid,devmode=0664,devgid=85)

binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc
(rw,noexec,nosuid,nodev)

nfsd on /proc/fs/nfsd type nfsd (rw)

/dev/mapper/servername--sanvg01-servername--sanlv01 on /mnt/san type xfs
(rw,noatime,nodiratime,logbufs=8,attr2)

/dev/mapper/servername--sanvg01-servername--rendersharelv01 on
/mnt/san/rendershare type xfs (rw,noatime,nodiratime,logbufs=8,attr2)

rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

servername ~ # uname -a

Linux servername 2.6.20-gentoo-r8 #7 SMP Fri Jun 29 14:46:02 EDT 2007
i686 Intel(R) Xeon(TM) CPU 3.20GHz GenuineIntel GNU/Linux

###

 

Does anyone know if this points to a bad block on a disk or if something
is corrupted and can be fixed with some expert knowledge of xfs_db?

 

~Jay



[[HTML alternate version deleted]]


end of thread, other threads:[~2009-02-25 18:48 UTC | newest]

Thread overview: 21+ messages
2009-02-24 13:04 xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c Federico Sevilla III
2009-02-24 22:46 ` Dave Chinner
2009-02-25 10:00   ` Federico Sevilla III
2009-02-25 11:51     ` Michael Monnerie
2009-02-25 18:47     ` Federico Sevilla III
     [not found] <B3EDBE0F860AF74BAA82EF17A7CDEDC660BE05A3@svits26.main.ad.rit.edu>
2007-12-21  2:01 ` Jay Sullivan
2008-01-03 15:55   ` Jay Sullivan
2008-08-04 16:55   ` Richard Freeman
  -- strict thread matches above, loose matches on Subject: below --
2007-11-02  2:08 Jay Sullivan
2007-11-02  5:18 ` David Chinner
2007-11-01 20:06 Jay Sullivan
2007-11-02  2:14 ` Eric Sandeen
2007-11-02  2:22   ` Jay Sullivan
2007-11-02  2:30     ` Eric Sandeen
2007-11-02  9:07       ` Ralf Gross
2007-11-02 16:10         ` Eric Sandeen
2007-11-02 14:00       ` Jay Sullivan
2007-11-02 14:49         ` Jay Sullivan
2007-11-14 15:05           ` Jay Sullivan
2007-11-15  3:26             ` Eric Sandeen
2007-11-02  4:37   ` Timothy Shimmin
