From: Eric Sandeen <sandeen@sandeen.net>
To: Gregory Farnum <gregory.farnum@dreamhost.com>
Cc: xfs@oss.sgi.com
Subject: Re: EFSCORRUPTED on mount?
Date: Tue, 22 Nov 2011 15:53:07 -0600
Message-ID: <4ECC19C3.5070905@sandeen.net>
In-Reply-To: <CAF3hT9CA23aDfYTF__mVEM7jRq=ZEgqhfX49hK9kcBqM_+h0CQ@mail.gmail.com>
On 11/22/11 1:29 PM, Gregory Farnum wrote:
> On Tue, Nov 22, 2011 at 10:52 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
>> Ok, error 5 is EIO:
>>
>> include/asm-generic/errno-base.h:8: #define EIO 5 /* I/O error */
>>
>> So the very first error you saw was "xfs_do_force_shutdown(0x1) called from line..." ?
>> Or the "xfs_log_force error 5 returned?" I'm wondering if there was more
>> before this.
>>
>> It's worth looking carefully to see the very first problem reported by xfs,
>> and possibly from storage before that. (i.e. did your storage go wonky?)
> Oh, we have a few more logs than I'd thought to look for. The xfs
> related messages from bootup after the kernel upgrade:
> Nov 17 16:01:01 cephstore6358 kernel: [ 1.924668] SGI XFS with
> security attributes, large block/inode numbers, no debug enabled
> ...
> Nov 17 16:01:01 cephstore6358 kernel: [ 190.047204] XFS (sdc1):
> Mounting Filesystem
> Nov 17 16:01:01 cephstore6358 kernel: [ 190.198126] XFS (sdc1):
> Starting recovery (logdev: internal)
> Nov 17 16:01:01 cephstore6358 kernel: [ 190.281929] XFS (sdc1):
> Ending recovery (logdev: internal)
> Nov 17 16:01:01 cephstore6358 kernel: [ 190.296303] XFS (sde1):
> Mounting Filesystem
> Nov 17 16:01:01 cephstore6358 kernel: [ 190.430809] XFS (sde1):
> Starting recovery (logdev: internal)
> Nov 17 16:01:01 cephstore6358 kernel: [ 197.486417] XFS (sde1):
> Ending recovery (logdev: internal)
> Nov 17 16:01:01 cephstore6358 kernel: [ 197.492596] XFS (sdg1):
> Mounting Filesystem
> Nov 17 16:01:01 cephstore6358 kernel: [ 197.652085] XFS (sdg1):
> Starting recovery (logdev: internal)
> Nov 17 16:01:01 cephstore6358 kernel: [ 197.724493] XFS (sdg1):
> Ending recovery (logdev: internal)
so by here sdg1 had to go through recovery, but was otherwise happy.
> Nov 17 16:01:01 cephstore6358 kernel: [ 197.730526] XFS (sdi1):
> Mounting Filesystem
> Nov 17 16:01:01 cephstore6358 kernel: [ 197.871074] XFS (sdi1):
> Starting recovery (logdev: internal)
> Nov 17 16:01:01 cephstore6358 kernel: [ 206.570177] XFS (sdi1):
> Ending recovery (logdev: internal)
> Nov 17 16:01:01 cephstore6358 kernel: [ 206.576329] XFS (sdk1):
> Mounting Filesystem
> Nov 17 16:01:01 cephstore6358 kernel: [ 206.738760] XFS (sdk1):
> Starting recovery (logdev: internal)
> Nov 17 16:01:01 cephstore6358 kernel: [ 206.823346] XFS (sdk1):
> Ending recovery (logdev: internal)
> Nov 17 16:01:01 cephstore6358 kernel: [ 206.837938] XFS (sdm1):
> Mounting Filesystem
> Nov 17 16:01:01 cephstore6358 kernel: [ 206.962455] XFS (sdm1):
> Starting recovery (logdev: internal)
> Nov 17 16:01:01 cephstore6358 kernel: [ 207.062120] XFS (sdm1):
> Ending recovery (logdev: internal)
> Nov 17 16:01:01 cephstore6358 kernel: [ 207.078134] XFS (sdo1):
> Mounting Filesystem
> Nov 17 16:01:01 cephstore6358 kernel: [ 207.240052] XFS (sdo1):
> Starting recovery (logdev: internal)
> Nov 17 16:01:01 cephstore6358 kernel: [ 207.321602] XFS (sdo1):
> Ending recovery (logdev: internal)
> ...
All that recovery was a result of the icky shutdown procedure, I guess...
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.214688] XFS: Internal
> error XFS_WANT_CORRUPTED_GOTO at line 1664 of file fs/xfs/xfs_alloc.c.
> Caller 0xffffffff811d6b71
And this was the first indication of trouble.
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.214692]
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.227313] Pid: 11196, comm:
> ceph-osd Not tainted 3.1.0-dho-00004-g1ffcb5c-dirty #1
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.235056] Call Trace:
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.237530]
> [<ffffffff811d606e>] ? xfs_free_ag_extent+0x4e3/0x698
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.243717]
> [<ffffffff811d6b71>] ? xfs_free_extent+0xb6/0xf9
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.249468]
> [<ffffffff811d3034>] ? kmem_zone_alloc+0x58/0x9e
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.255220]
> [<ffffffff812095f9>] ? xfs_trans_get_efd+0x21/0x2a
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.261159]
> [<ffffffff811e2011>] ? xfs_bmap_finish+0xeb/0x160
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.266993]
> [<ffffffff811f8634>] ? xfs_itruncate_extents+0xe8/0x1d0
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.273361]
> [<ffffffff811f879f>] ? xfs_itruncate_data+0x83/0xee
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.279362]
> [<ffffffff811cb0a2>] ? xfs_setattr_size+0x246/0x36c
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.285363]
> [<ffffffff811cb1e3>] ? xfs_vn_setattr+0x1b/0x2f
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.291031]
> [<ffffffff810e7875>] ? notify_change+0x16d/0x23e
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.296776]
> [<ffffffff810d2982>] ? do_truncate+0x68/0x86
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.302172]
> [<ffffffff810d2b11>] ? sys_truncate+0x171/0x173
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.307846]
> [<ffffffff8166c07b>] ? system_call_fastpath+0x16/0x1b
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.314031] XFS (sdg1):
> xfs_do_force_shutdown(0x8) called from line 3864 of file
> fs/xfs/xfs_bmap.c. Return address = 0xffffffff811e2046
By this point it had already shut down, and you were just riding along when
it went kablooey. Were there any non-xfs errors just before this point?
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.340451] XFS (sdg1):
> Corruption of in-memory data detected. Shutting down filesystem
> Nov 17 16:01:01 cephstore6358 kernel: [ 214.348518] XFS (sdg1):
> Please umount the filesystem and rectify the problem(s)
> Nov 17 16:01:01 cephstore6358 kernel: [ 227.789285] XFS (sdg1):
> xfs_log_force: error 5 returned.
> Nov 17 16:01:01 cephstore6358 kernel: [ 229.820255] XFS (sdg1):
> xfs_log_force: error 5 returned.
To be honest I'm not sure offhand if this error 5 (EIO) is a
result of the shutdown, or the cause of it.
-Eric
> Nov 17 16:01:01 cephstore6358 kernel: [ 229.825550] XFS (sdg1):
> xfs_do_force_shutdown(0x1) called from line 1037 of file
> fs/xfs/xfs_buf.c. Return address = 0xffffffff811c2aa8
> Nov 17 16:01:01 cephstore6358 kernel: [ 229.845089] XFS (sdg1):
> xfs_log_force: error 5 returned.
> Nov 17 16:01:01 cephstore6358 kernel: [ 229.850388] XFS (sdg1):
> xfs_do_force_shutdown(0x1) called from line 1037 of file
> fs/xfs/xfs_buf.c. Return address = 0xffffffff811c2aa8
> (etc)
>
> I don't know the xfs code at all, but that looks like a bug to me —
> either the system got itself into a broken state from valid on-disk
> structures, or else the (best I can tell properly-ordered, barriered,
> etc) journal didn't properly protect against brokenness elsewhere.
> Also note that the initial post-reboot mount succeeded (it didn't
> break until after doing a series of truncates), and the subsequent
> ones are failing.
> -Greg
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs