public inbox for linux-xfs@vger.kernel.org
From: Eric Sandeen <sandeen@sandeen.net>
To: John Quigley <jquigley@jquigley.com>
Cc: XFS Development <xfs@oss.sgi.com>
Subject: Re: File system corruption
Date: Thu, 16 Jul 2009 14:20:57 -0500
Message-ID: <4A5F7D99.4010503@sandeen.net>
In-Reply-To: <4A5F6C8C.609@jquigley.com>

John Quigley wrote:
> Hey Folks:
> 
> I'm periodically encountering an issue with XFS that might interest you.  The environment in which this manifests is a CentOS Linux machine (custom 2.6.28.7 kernel) serving the XFS mount point in question via the standard Linux nfsd.  The XFS file system lives on an LVM device in a striping configuration (2-wide stripe), with two iSCSI volumes acting as the constituent physical volumes.  This configuration is somewhat baroque, I know.
> 
> I'm experiencing periodic file system corruption, which manifests as the XFS file system going offline and refusing subsequent mounts.  The only way to recover has been to perform an xfs_repair -L, which has resulted in data loss on each occasion, as expected.

The log corruption might be related to data reordering somewhere along
your IO path, though I wouldn't swear to it.  But this sort of thing
often shows up when write caches are on, barriers are off, and power
is lost.

> Now, here's what I witness in the system logs:
> 
> <snip>
> kernel: XFS: bad magic number
> kernel: XFS: SB validate failed

That's the first error?

> kernel: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> kernel: Filesystem "dm-0": XFS internal error xfs_ialloc_read_agi at line 1408 of file fs/xfs/xfs_ialloc.c.  Caller 0xffffffff8118711a

This means that after the AGI was read, it failed a sanity check:

1403                 be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC &&
1404                 XFS_AGI_GOOD_VERSION(be32_to_cpu(agi->agi_versionnum));

bad magic number, etc.  The "00 00 00 00 ..." is the contents of the
buffer that it thought was the AGI, containing all that wonderful magic
- but it's all 0s.
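
To make the failure mode concrete, here's a rough userspace sketch of
that magic-number test (Python, not the kernel code; XFS_AGI_MAGIC is
the on-disk "XAGI" value, stored big-endian):

```python
import struct

XFS_AGI_MAGIC = 0x58414749  # "XAGI", big-endian on disk

def agi_magic_ok(buf: bytes) -> bool:
    """Mimic the kernel's be32_to_cpu(agi->agi_magicnum) check."""
    (magic,) = struct.unpack_from(">I", buf, 0)
    return magic == XFS_AGI_MAGIC

print(agi_magic_ok(b"XAGI" + bytes(12)))  # a sane AGI header
print(agi_magic_ok(bytes(16)))            # the all-zero buffer from the log
```

An all-zero buffer fails this immediately, which is exactly what the
hexdump in the log shows.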

...

> The resultant stack trace coming from "XFS internal error xfs_ialloc_read_agi" repeats numerous times, at which point the following is seen:
> 
> <snip>
> 
> kernel: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> kernel: Filesystem "dm-0": XFS internal error xfs_alloc_read_agf at line 2194 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffff8115cf09

Similar, but bad info on the AGF:

2184         agf_ok =
2185                 be32_to_cpu(agf->agf_magicnum) == XFS_AGF_MAGIC &&
2186                 XFS_AGF_GOOD_VERSION(be32_to_cpu(agf->agf_versionnum)) &&
2187                 be32_to_cpu(agf->agf_freeblks) <= be32_to_cpu(agf->agf_length) &&
2188                 be32_to_cpu(agf->agf_flfirst) < XFS_AGFL_SIZE(mp) &&
2189                 be32_to_cpu(agf->agf_fllast) < XFS_AGFL_SIZE(mp) &&
2190                 be32_to_cpu(agf->agf_flcount) <= XFS_AGFL_SIZE(mp);

Again w/ the zeros ...

> 
> kernel: Filesystem "dm-0": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff811a9411

...

and then the fs tried to back out of a dirty transaction, which it can't
do, but that's secondary.

> kernel: xfs_force_shutdown(dm-0,0x8) called from line 1165 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffff811a348e
> kernel: Filesystem "dm-0": Corruption of in-memory data detected.  Shutting down filesystem: dm-0
> kernel: Please umount the filesystem, and rectify the problem(s)
> kernel: nfsd: non-standard errno: -117

117 is EFSCORRUPTED, IIRC.

> kernel: Filesystem "dm-0": xfs_log_force: error 5 returned.

Error 5 is EIO.
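
Both numbers are easy to check from userspace (a Python sketch; on
Linux, errno 117 is EUCLEAN, which XFS reuses internally as
EFSCORRUPTED):

```python
import errno
import os

# On Linux, errno 117 is EUCLEAN ("Structure needs cleaning"),
# which XFS defines as EFSCORRUPTED; errno 5 is EIO.
print(errno.errorcode[117], "-", os.strerror(117))
print(errno.errorcode[5], "-", os.strerror(5))
```

On a Linux/glibc box this should print EUCLEAN ("Structure needs
cleaning") and EIO ("Input/output error").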

> </snip>
> 
> I'm somewhat at a loss with this one - it's been experienced on a customer's installation, so I don't have ready access to the machine.  All internal attempts to reproduce it with identical hardware/software configurations have been unfruitful.  I'm concerned about the custom kernel, and may attempt to downgrade to the stock CentOS 5.3 kernel (2.6.18, if I remember correctly).
> 
> Any insight would be hugely appreciated, and of course tell me how I can help further.  Thanks so much.

I'm happy to blame the storage here, given the buffers full of 0s ...
you could modify the messages to print the block numbers in question,
then go back directly to the storage, read them, and see what's there.
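
Reading it back could look something like this sketch (Python; on the
real box you'd open /dev/dm-0 - the scratch image, the 512-byte sector
size, and AG 0's AGI sitting at sector 2 are illustrative assumptions
here, so the sketch runs anywhere):

```python
import os

# AG 0's AGI header lives at byte offset 2 * sectorsize (sb is sector 0,
# AGF sector 1, AGI sector 2); for other AGs, add agno * agblocks *
# blocksize, with the geometry taken from xfs_info or the superblock.
SECTOR = 512
AGI_OFFSET = 2 * SECTOR

# Stand-in for the real device: a scratch image with a valid-looking
# magic number planted where the AGI would be.
path = "scratch.img"
with open(path, "wb") as f:
    f.truncate(4096)
    f.seek(AGI_OFFSET)
    f.write(b"XAGI")

# Read back and dump the first 16 bytes - the same bytes the kernel
# printed as all zeros in the log.
with open(path, "rb") as f:
    f.seek(AGI_OFFSET)
    buf = f.read(16)

print(" ".join(f"{b:02x}" for b in buf))
os.remove(path)
```

Alternatively, xfs_db can do the walking for you: xfs_db -r /dev/dm-0,
then "agi 0" and "p" to print what's actually on disk there.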

Were there no iscsi or other assorted messages before all this?

-Eric

> John Quigley
> jquigley.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
