From: Kamal Dasu <kdasu.kdev@gmail.com>
To: xfs@oss.sgi.com
Subject: Re: xfs filesystem corruption with kernel 2.6.37
Date: Thu, 1 Nov 2012 12:30:13 -0700 (PDT)
Message-ID: <34630253.post@talk.nabble.com>
In-Reply-To: <20121025224713.GF29378@dastard>


Dave,

Thanks for your reply.

I am trying to act on the hints you gave me but I still have a few
questions.

On Thu, Oct 25, 2012 at 6:47 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Oct 25, 2012 at 09:45:10AM -0400, Kamal Dasu wrote:
>> with  "CONFIG_XFS_DEBUG=y" I get the following assertion:
>>
>> Assertion failed: prev.br_state == XFS_EXT_NORM, file:
>> fs/xfs/xfs_bmap.c, line: 5192
>
> Yup, that's pretty clear indication of a corrupted extent record.
>

What is the best way to prevent transactions from recording bad
extent lengths and block numbers?

>> would have cleared inode 6776
>>         - agno = 1
>> 771a3500: Badness in key lookup (length)
>> bp=(bno 16107312, len 16384 bytes) key=(bno 16107312, len 8192 bytes)
>>         - agno = 2
>> bad nblocks 5120 for inode 33701135, would reset to 4096
>> inode 34297761 - bad rt extent start block number 2392537303836672,
>                                                 0x88000001B6800
>
> That's the open, unlinked file at the time the system crashed. That
> may be where your problems are coming from. The RT is mostly
> untested, and we sure as anything don't do any crash resiliency or
> recovery testing on it, so there's a good chance there are bugs in
> it that might show up in situations like this....
>
> You need to detect extents with invalid lengths in them and trigger
> a corruption-based filesystem shutdown.
>
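A minimal user-space sketch of the kind of check Dave describes, to make
the idea concrete. The names extent_rec, extent_is_sane and total_blocks
are hypothetical and are not the kernel's own structures or helpers; the
real check would have to live wherever extent records are read or
modified.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified extent record; not the kernel's in-core type. */
struct extent_rec {
	uint64_t start_block;	/* first block of the extent */
	uint64_t block_count;	/* length of the extent in blocks */
};

/*
 * Return true if the extent lies entirely inside a device that holds
 * total_blocks blocks. The comparisons are ordered so that
 * start_block + block_count is never computed and cannot overflow.
 */
static bool extent_is_sane(const struct extent_rec *ext, uint64_t total_blocks)
{
	if (ext->block_count == 0)
		return false;		/* zero-length extent */
	if (ext->start_block >= total_blocks)
		return false;		/* starts past the end of the device */
	if (ext->block_count > total_blocks - ext->start_block)
		return false;		/* runs past the end of the device */
	return true;
}
```

A caller that gets false back from a check like this is at the point
where a corruption-based shutdown would be triggered, instead of letting
the bad record reach a transaction.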

I looked at the log from one of the filesystem shutdowns that happens
when the I/O error occurs. Is this an indication of an already corrupted
log, caused by corrupted in-memory metadata structures?
===
attempt to access beyond end of device
sda2: rw=0, want=33792081130943048, limit=31471329
I/O error in filesystem ("sda2") meta-data dev sda2 block
0x780db80007f240       ("xfs_trans_read_buf") error 5 buf count 4096
xfs_force_shutdown(sda2,0x1) called from line 395 of file
fs/xfs/xfs_trans_buf.c.  Return address = 0x801f4f88
Filesystem "sda2": I/O Error Detected.  Shutting down filesystem: sda2
Please umount the filesystem, and rectify the problem(s)
====
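The numbers in the log above are at least consistent with each other:
block 0x780db80007f240 plus a 4096-byte buffer gives exactly the "want"
value, which is far beyond the limit of 31471329. A small sketch of that
arithmetic, assuming 512-byte sectors (the helper names are made up for
illustration):

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u	/* assumed sector size in bytes */

/* First sector after a read of `bytes` bytes that starts at `sector`. */
static uint64_t io_end_sector(uint64_t sector, uint32_t bytes)
{
	return sector + bytes / SECTOR_SIZE;
}

/* True if the read would run past a device of `limit` sectors. */
static bool io_beyond_device(uint64_t sector, uint32_t bytes, uint64_t limit)
{
	return io_end_sector(sector, bytes) > limit;
}
```

With the values from the log, io_end_sector(0x780db80007f240, 4096) is
33792081130943048, so the read is rejected because the block number
itself is bad, not because the buffer length is wrong.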

However, the log is already corrupted. Is there a check on a write
to the log?

>> also if there is something that can be done to avoid this situation in
>> the first place.
>
> Track down where those stray upper bits in the block numbers are
> coming from, and you'll have your answer.
>
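To make the "stray upper bits" visible, the bad block numbers from this
thread can be split into 32-bit halves. The 32-bit split is only an
assumption for illustration, and nothing here says the low half is the
value that should have been recorded. Any valid sector on sda2 is below
31471329 and so has a zero upper half.

```c
#include <stdint.h>

/* Upper 32 bits of a 64-bit block number. */
static uint32_t high32(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Lower 32 bits of a 64-bit block number. */
static uint32_t low32(uint64_t v)
{
	return (uint32_t)(v & 0xffffffffu);
}
```

For the rt extent start block 2392537303836672 (0x88000001B6800) this
gives 0x88000 and 0x1B6800; for the block 0x780db80007f240 from the I/O
error it gives 0x780db8 and 0x7f240.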

I have not been able to track this down yet. Could it be memory
corruption that causes the in-memory metadata to become corrupted?

On a similar occurrence of this issue, recovery after a reboot seems
to always go through the evict path:

Filesystem "sda2": XFS internal error xfs_trans_cancel at line 1815
of file fs/xfs/xfs_trans.c.  Caller 0x801f8524

Call Trace:
[<80439d2c>] dump_stack+0x8/0x34
[<801f3bec>] xfs_trans_cancel+0x10c/0x128
[<801f8524>] xfs_inactive+0x2fc/0x450
[<800dcd54>] evict+0x28/0xd0
[<800dd300>] iput+0x19c/0x2d8
[<801e5bcc>] xlog_recover_process_one_iunlink+0xec/0x130
[<801e7b60>] xlog_recover_process_iunlinks.clone.25+0xa8/0x108
[<801eb360>] xlog_recover_finish+0x40/0x100
[<801eedd8>] xfs_mountfs+0x434/0x654
..
.
Filesystem "sda2": Corruption of in-memory data detected.  Shutting
down filesystem: sda2
