From: Eric Sandeen <sandeen@sandeen.net>
To: Mike Dacre <mike.dacre@gmail.com>, xfs@oss.sgi.com
Subject: Re: Sudden File System Corruption
Date: Mon, 09 Dec 2013 13:04:29 -0600
Message-ID: <52A6143D.7050002@sandeen.net>
In-Reply-To: <CAPd9ww_qT9J_Rt04g7+OApoBeggNOyWNwD+57DiDTuUvz-O-0g@mail.gmail.com>

On 12/4/13, 8:55 PM, Mike Dacre wrote:
> Hi Folks,
> 
> Apologies if this is the wrong place to post or if this has been answered already.
> 
> I have a RAID6 array of 16 2TB drives powered by an LSI 9240-4i.  It has an XFS filesystem and has been online for over a year.  It is accessed by 23 different machines connected via Infiniband over NFS v3.  I haven't had any major problems yet; one drive failed, but it was easily replaced.
> 
> However, today the drive suddenly stopped responding and started returning IO errors when any requests were made.  This happened while it was being accessed by 5 different users; one was doing a very large rm operation (rm *sh on thousands of files in a directory).  Also, about 30 minutes earlier we had connected the Globus Connect endpoint to allow easy file transfers to SDSC.
> 
> I rebooted the machine which hosts it and checked the RAID6 logs; there were no physical problems with the drives at all.  I tried to mount and got the following error:
> 
> XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1510 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0432ba1
> mount: Structure needs cleaning

I've seen a similar problem with a customer on a similar (proper) RHEL6 kernel.

Just to rule something in or out, do you regularly use xfs_fsr on this filesystem?
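
If you're not sure, one quick way to check is to look for xfs_fsr in the cron configuration.  A rough sketch, assuming it would be started from cron at all; the helper name find_fsr_jobs and the default paths are only illustrative and may not match your system:

```shell
# Hypothetical helper: print the files under the given paths that mention xfs_fsr.
# With no arguments it falls back to common cron locations (an assumption;
# adjust the list for your distribution).
find_fsr_jobs() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/crontab /etc/cron.d /etc/cron.daily /etc/cron.weekly /var/spool/cron
    fi
    for path in "$@"; do
        # Skip locations that do not exist on this machine.
        [ -e "$path" ] || continue
        # -r recurses into directories, -l prints only the matching file names.
        grep -rl -- 'xfs_fsr' "$path" 2>/dev/null || true
    done
    return 0
}
```

Running find_fsr_jobs as root with no arguments lists any cron files that reference xfs_fsr; no output means none of those locations mention it, which does not rule out someone running it by hand.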

Is this something you can reliably reproduce?

thanks,
-Eric

> I ran xfs_check and got the following message:
> ERROR: The filesystem has valuable metadata changes in a log which needs to
> be replayed.  Mount the filesystem to replay the log, and unmount it before
> re-running xfs_check.  If you are unable to mount the filesystem, then use
> the xfs_repair -L option to destroy the log and attempt a repair.
> Note that destroying the log may cause corruption -- please attempt a mount
> of the filesystem before doing this.
> 
> 
> I checked the log and found the following message:
> 
> Dec  4 18:26:33 fruster kernel: XFS (sda1): Mounting Filesystem
> Dec  4 18:26:33 fruster kernel: XFS (sda1): Starting recovery (logdev: internal)
> Dec  4 18:26:36 fruster kernel: XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1510 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0432ba1
> Dec  4 18:26:36 fruster kernel: 
> Dec  4 18:26:36 fruster kernel: Pid: 5491, comm: mount Not tainted 2.6.32-358.23.2.el6.x86_64 #1
> Dec  4 18:26:36 fruster kernel: Call Trace:
> Dec  4 18:26:36 fruster kernel: [<ffffffffa045b0ef>] ? xfs_error_report+0x3f/0x50 [xfs]
> Dec  4 18:26:36 fruster kernel: [<ffffffffa0432ba1>] ? xfs_free_extent+0x101/0x130 [xfs]
> Dec  4 18:26:36 fruster kernel: [<ffffffffa0430c2b>] ? xfs_free_ag_extent+0x58b/0x750 [xfs]
> Dec  4 18:26:36 fruster kernel: [<ffffffffa0432ba1>] ? xfs_free_extent+0x101/0x130 [xfs]
> Dec  4 18:26:36 fruster kernel: [<ffffffffa046de2d>] ? xlog_recover_process_efi+0x1bd/0x200 [xfs]
> Dec  4 18:26:36 fruster kernel: [<ffffffffa04796ea>] ? xfs_trans_ail_cursor_set+0x1a/0x30 [xfs]
> Dec  4 18:26:36 fruster kernel: [<ffffffffa046ded2>] ? xlog_recover_process_efis+0x62/0xc0 [xfs]
> Dec  4 18:26:36 fruster kernel: [<ffffffffa0471f34>] ? xlog_recover_finish+0x24/0xd0 [xfs]
> Dec  4 18:26:36 fruster kernel: [<ffffffffa046a3ac>] ? xfs_log_mount_finish+0x2c/0x30 [xfs]
> Dec  4 18:26:36 fruster kernel: [<ffffffffa0475a61>] ? xfs_mountfs+0x421/0x6a0 [xfs]
> Dec  4 18:26:36 fruster kernel: [<ffffffffa048d6f4>] ? xfs_fs_fill_super+0x224/0x2e0 [xfs]
> Dec  4 18:26:36 fruster kernel: [<ffffffff811847ce>] ? get_sb_bdev+0x18e/0x1d0
> Dec  4 18:26:36 fruster kernel: [<ffffffffa048d4d0>] ? xfs_fs_fill_super+0x0/0x2e0 [xfs]
> Dec  4 18:26:36 fruster kernel: [<ffffffffa048b5b8>] ? xfs_fs_get_sb+0x18/0x20 [xfs]
> Dec  4 18:26:36 fruster kernel: [<ffffffff81183c1b>] ? vfs_kern_mount+0x7b/0x1b0
> Dec  4 18:26:36 fruster kernel: [<ffffffff81183dc2>] ? do_kern_mount+0x52/0x130
> Dec  4 18:26:36 fruster kernel: [<ffffffff811a3f22>] ? do_mount+0x2d2/0x8d0
> Dec  4 18:26:36 fruster kernel: [<ffffffff811a45b0>] ? sys_mount+0x90/0xe0
> Dec  4 18:26:36 fruster kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
> Dec  4 18:26:36 fruster kernel: XFS (sda1): Failed to recover EFIs
> Dec  4 18:26:36 fruster kernel: XFS (sda1): log mount finish failed
> 
> 
> I went back and looked at the log from around the time the drive died and found this message:
> Dec  4 17:58:16 fruster kernel: XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1510 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0432ba1
> Dec  4 17:58:16 fruster kernel: 
> Dec  4 17:58:16 fruster kernel: Pid: 4548, comm: nfsd Not tainted 2.6.32-358.23.2.el6.x86_64 #1
> Dec  4 17:58:16 fruster kernel: Call Trace:
> Dec  4 17:58:16 fruster kernel: [<ffffffffa045b0ef>] ? xfs_error_report+0x3f/0x50 [xfs]
> Dec  4 17:58:16 fruster kernel: [<ffffffffa0432ba1>] ? xfs_free_extent+0x101/0x130 [xfs]
> Dec  4 17:58:16 fruster kernel: [<ffffffffa0430c2b>] ? xfs_free_ag_extent+0x58b/0x750 [xfs]
> Dec  4 17:58:16 fruster kernel: [<ffffffffa0432ba1>] ? xfs_free_extent+0x101/0x130 [xfs]
> Dec  4 17:58:16 fruster kernel: [<ffffffffa043c89d>] ? xfs_bmap_finish+0x15d/0x1a0 [xfs]
> Dec  4 17:58:16 fruster kernel: [<ffffffffa04626ff>] ? xfs_itruncate_finish+0x15f/0x320 [xfs]
> Dec  4 17:58:16 fruster kernel: [<ffffffffa047e370>] ? xfs_inactive+0x330/0x480 [xfs]
> Dec  4 17:58:16 fruster kernel: [<ffffffffa04793f4>] ? _xfs_trans_commit+0x214/0x2a0 [xfs]
> Dec  4 17:58:16 fruster kernel: [<ffffffffa048b9a0>] ? xfs_fs_clear_inode+0xa0/0xd0 [xfs]
> Dec  4 17:58:16 fruster kernel: [<ffffffff8119d31c>] ? clear_inode+0xac/0x140
> Dec  4 17:58:16 fruster kernel: [<ffffffff8119dad6>] ? generic_delete_inode+0x196/0x1d0
> Dec  4 17:58:16 fruster kernel: [<ffffffff8119db75>] ? generic_drop_inode+0x65/0x80
> Dec  4 17:58:16 fruster kernel: [<ffffffff8119c9c2>] ? iput+0x62/0x70
> Dec  4 17:58:16 fruster kernel: [<ffffffff81199610>] ? dentry_iput+0x90/0x100
> Dec  4 17:58:16 fruster kernel: [<ffffffff8119c278>] ? d_delete+0xe8/0xf0
> Dec  4 17:58:16 fruster kernel: [<ffffffff8118fe99>] ? vfs_unlink+0xd9/0xf0
> Dec  4 17:58:16 fruster kernel: [<ffffffffa071cf4f>] ? nfsd_unlink+0x1af/0x250 [nfsd]
> Dec  4 17:58:16 fruster kernel: [<ffffffffa0723f03>] ? nfsd3_proc_remove+0x83/0x120 [nfsd]
> Dec  4 17:58:16 fruster kernel: [<ffffffffa071543e>] ? nfsd_dispatch+0xfe/0x240 [nfsd]
> Dec  4 17:58:16 fruster kernel: [<ffffffffa068e624>] ? svc_process_common+0x344/0x640 [sunrpc]
> Dec  4 17:58:16 fruster kernel: [<ffffffff81063990>] ? default_wake_function+0x0/0x20
> Dec  4 17:58:16 fruster kernel: [<ffffffffa068ec60>] ? svc_process+0x110/0x160 [sunrpc]
> Dec  4 17:58:16 fruster kernel: [<ffffffffa0715b62>] ? nfsd+0xc2/0x160 [nfsd]
> Dec  4 17:58:16 fruster kernel: [<ffffffffa0715aa0>] ? nfsd+0x0/0x160 [nfsd]
> Dec  4 17:58:16 fruster kernel: [<ffffffff81096a36>] ? kthread+0x96/0xa0
> Dec  4 17:58:16 fruster kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
> Dec  4 17:58:16 fruster kernel: [<ffffffff810969a0>] ? kthread+0x0/0xa0
> Dec  4 17:58:16 fruster kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
> Dec  4 17:58:16 fruster kernel: XFS (sda1): xfs_do_force_shutdown(0x8) called from line 3863 of file fs/xfs/xfs_bmap.c.  Return address = 0xffffffffa043c8d6
> Dec  4 17:58:16 fruster kernel: XFS (sda1): Corruption of in-memory data detected.  Shutting down filesystem
> Dec  4 17:58:16 fruster kernel: XFS (sda1): Please umount the filesystem and rectify the problem(s)
> Dec  4 17:58:19 fruster kernel: XFS (sda1): xfs_log_force: error 5 returned.
> Dec  4 17:58:49 fruster kernel: XFS (sda1): xfs_log_force: error 5 returned.
> Dec  4 17:59:19 fruster kernel: XFS (sda1): xfs_log_force: error 5 returned.
> Dec  4 17:59:49 fruster kernel: XFS (sda1): xfs_log_force: error 5 returned.
> Dec  4 18:00:19 fruster kernel: XFS (sda1): xfs_log_force: error 5 returned.
> Dec  4 18:00:49 fruster kernel: XFS (sda1): xfs_log_force: error 5 returned.
> Dec  4 18:01:19 fruster kernel: XFS (sda1): xfs_log_force: error 5 returned.
> Dec  4 18:01:49 fruster kernel: XFS (sda1): xfs_log_force: error 5 returned.
> Dec  4 18:02:05 fruster kernel: XFS (sda1): xfs_log_force: error 5 returned.
> Dec  4 18:02:05 fruster kernel: XFS (sda1): xfs_log_force: error 5 returned.
> Dec  4 18:02:05 fruster kernel: XFS (sda1): xfs_do_force_shutdown(0x1) called from line 1061 of file fs/xfs/linux-2.6/xfs_buf.c.  Return address = 0xffffffffa04856e3
> Dec  4 18:02:19 fruster kernel: XFS (sda1): xfs_log_force: error 5 returned.
> 
> 
> I have attached the complete log from the time it died until now.
> 
> In the end, I successfully repaired the filesystem with `xfs_repair -L /dev/sda1`.  However, I am nervous that some files may have been corrupted.
> 
> Do any of you have any idea what could have caused this problem?
> 
> Thanks,
> 
> Mike
> 
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 
