From: Dave Chinner <david@fromorbit.com>
To: Amit Sahrawat <amit.sahrawat83@gmail.com>
Cc: xfs@oss.sgi.com
Subject: Re: XFS internal error XFS_WANT_CORRUPTED_GOTO
Date: Sun, 24 Jul 2011 11:34:57 +1000 [thread overview]
Message-ID: <20110724013457.GB31951@dastard> (raw)
In-Reply-To: <CADDb1s1MpajXEHhMVf-h2K1JKQqaDLZ0QQY41seURUpde8D7Sg@mail.gmail.com>
On Fri, Jul 22, 2011 at 04:03:53PM +0530, Amit Sahrawat wrote:
> On Fri, Jul 22, 2011 at 2:04 PM, Dave Chinner <david@fromorbit.com> wrote:
> > [<c0023000>] (dump_backtrace+0x0/0x110)
> > [<c02dd668>] (dump_stack+0x0/0x1c)
> > [<c0176b84>] (xfs_error_report+0x0/0x5c)
> > [<c0150cd4>] (xfs_free_ag_extent+0x0/0x600)
> > [<c0152b78>] (xfs_free_extent+0x0/0xa4)
> > [<c015fea0>] (xfs_bmap_finish+0x0/0x194)
> > [<c017e664>] (xfs_itruncate_finish+0x0/0x30c)
> > [<c0197bbc>] (xfs_inactive+0x0/0x40c)
> > [<c01a3d50>] (xfs_fs_clear_inode+0x0/0x60)
> > [<c00bdcf8>] (clear_inode+0x0/0xe8)
> > [<c00be4a8>] (generic_delete_inode+0x0/0x178)
> > [<c00be620>] (generic_drop_inode+0x0/0x68)
> > [<c00bd2fc>] (iput+0x0/0x7c)
> > [<c00b4b44>] (do_unlinkat+0x0/0x154)
> > [<c00b4c98>] (sys_unlink+0x0/0x1c)
> >
> > So, you powered off an active machine while writing to it, and after
> > it started back up it hit a free space between corruption. And then
> > you couldn't mount it because log replay was trying to replay the
> > last committed transaction to the log. That transaction shows inode
> > 132 being unlinked, added to the AGI unliked list, and then being
> > inactivated. There is an EFI committed for 1 extent. There is no EFD
> > committed, so the shutdown occurred during that operation. Log
> > replay then hits the corruption repeatedly by trying to replay the
> > EFI to complete the extent free operation.
> >
> Yes, it happened exactly as you mentioned above. But the problem
> happened much earlier.
Obviously.
> While writing the 'reset' happened. At the next reboot - the
> filesystem mounted even though there
> was corruption - the file and journal did not match.
You found this how?
> And it did not
> show any error either.
Well, no. If the journal format is intact, then it's contents are
trusted.
> The moment the issue happened and then if checked through xfs_logprint
> - the blocks free in the allocation group does not match up properly.
> I checked by taking the print from xfs_free_ag_extent - it showed the
> wrong blocks contigous to that file.
Yes, which is why I suspect that the journal contained something it
shouldn't have.
> Original file which was to be deleted is of '255' blocks and started
> from physical block number - 12.
Yes, I saw that in the EFI record in the logprint output I quoted.
> While the free space tree has free
> blocks larger than what it should be. it returns the length starting
> from block number and the length which is last committed to the log.
>
> As I mentioned in the last mail - the details about the observation.
> Please check.
Going back to the transaction you quoted first: it doesn't match the
entries in the log print you attached in your second email. I can't
comment on it at all.
> > So, the log and the repair output are useless for determining what
> > caused the problem - you need the log from the mount *before* the
> > first shutdown occurred, and to have run repair *before* you
> > tried to unlink anything.
>
> It is only after 'first shutdown' I came to know about the corruption.
> Since, there was no error shown at the mount time, it did not seemed
> reasonable enough to run "repair" on the disk.
It's standard practice for power failure integrity testing.
> But at the same time I checked with the case(reset while direct-IO),
> when there was no issue - I tried to check the logs the same way as
> you mentioned
> - reset - log_print - check using xfs_repair and then mount - the
> behaviour was as per the logs - The number of commit transactions and
> the files created was ok.
> Logs are attached for xfs_logprint output.
>
> mount after taking the log prints.
>
> #mount /dev/sdb1 /media/b
> XFS mounting filesystem sda1
> Starting XFS recovery on filesystem: sda1 (logdev: internal)
> Ending XFS recovery on filesystem: sda1 (logdev: internal)
>
> #> ls -li /media/b
> 131 -rwxr-xr-x 1 root 0 2887184 Jan 1 00:00 test_code
> 132 -rw-r--r-- 1 root 0 1044480 Jan 1 00:00
> direct_io_file_0
> 133 -rw-r--r-- 1 root 0 819200 Jan 1 00:00
> direct_io_file_1
Last entry in the log for inode 133 is this:
Oper (480): tid: 738ef289 len: 56 clientid: TRANS flags: none
INODE: #regs: 3 ino: 0x85 flags: 0x5 dsize: 16
blkno: 64 len: 16 boff: 1280
Oper (481): tid: 738ef289 len: 96 clientid: TRANS flags: none
INODE CORE
magic 0x494e mode 0100644 version 2 format 2
nlink 1 uid 0 gid 0
atime 0xc mtime 0xd ctime 0xd
size 0xc9400 nblocks 0xcb extsize 0x0 nextents 0x1
naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
flags 0x0 gen 0x0
Oper (482): tid: 738ef289 len: 16 clientid: TRANS flags: none
EXTENTS inode data
The inode size is 0xc9400 = 824,320 bytes, and blocks = 0xcb = 203 =
831,488 bytes. So there's nothing wrong here. Indeed, it appears
this transaction was not replayed because:
----------------------------------------------------------------------------
Oper (477): tid: 44eff979 len: 0 clientid: TRANS flags: COMMIT
Oper (478): tid: 738ef289 len: 0 clientid: TRANS flags: START
Oper (479): tid: 738ef289 len: 16 clientid: TRANS flags: none
Oper (480): tid: 738ef289 len: 56 clientid: TRANS flags: none
Oper (481): tid: 738ef289 len: 96 clientid: TRANS flags: none
Oper (482): tid: 738ef289 len: 16 clientid: TRANS flags: none
Oper (483): tid: 738ef289 len: 24 clientid: TRANS flags: none
Oper (484): tid: 738ef289 len: 128 clientid: TRANS flags: none
Oper (485): tid: 738ef289 len: 28 clientid: TRANS flags: none
Oper (486): tid: 738ef289 len: 128 clientid: TRANS flags: none
Oper (487): tid: 738ef289 len: 20 clientid: TRANS flags: CONTINUE
BUF: #regs: 2 Not printing rest of data
============================================================================
xfs_logprint: skipped 3510 cleared blocks in range: 592 - 4101
xfs_logprint: skipped 61434 zeroed blocks in range: 4102 - 65535
xfs_logprint: physical end of log
============================================================================
xfs_logprint: logical end of log
============================================================================
The transaction was never committed completely in the journal. The
last complete transaction in the journal matches up with the inode
size in your ls -l output about.
Your test code is obviously doing single block allocation (probably
4k DIO append writes). I can't see any inconsistencies with what is
in the journal, with what you see from ls -l, nor any
inconsistencies with the AGF spce accounting. Hence I suspect that
it's either a barrier problem, or perhaps yet another manifestation
of vmap cache aliasing causing random corruption during journal
replay. Can you verify that the transactions being replayed match
the contents of the logprint output (i.e. the entire transactions
including buffer data matches, and not just the transaction IDs).
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2011-07-24 1:35 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-22 4:52 XFS internal error XFS_WANT_CORRUPTED_GOTO Amit Sahrawat
2011-07-22 5:23 ` Amit Sahrawat
2011-07-22 6:59 ` Amit Sahrawat
2011-07-22 8:34 ` Dave Chinner
2011-07-22 10:33 ` Amit Sahrawat
2011-07-24 1:34 ` Dave Chinner [this message]
2011-07-25 5:26 ` Amit Sahrawat
2011-07-26 3:33 ` Dave Chinner
2011-07-26 9:47 ` Ajeet Yadav
2011-07-26 10:27 ` Christoph Hellwig
2011-07-27 4:33 ` Amit Sahrawat
-- strict thread matches above, loose matches on Subject: below --
2009-06-24 11:56 Nitin Arora
2009-06-24 12:40 ` Michael Monnerie
2009-06-24 13:18 ` Nitin Arora
2009-06-24 13:23 ` Eric Sandeen
2009-06-25 10:06 ` Nitin Arora
2009-06-25 13:45 ` Eric Sandeen
2008-05-29 13:48 Louis-David Mitterrand
2007-04-19 13:28 Burbidge, Simon A
2007-04-19 14:18 ` David Chinner
2007-04-19 14:36 ` Burbidge, Simon A
2007-04-19 22:10 ` David Chinner
2006-08-10 16:42 cache_purge: shake on cache 0x5880a0 left 8 nodes!? Paul Slootman
2006-08-11 1:30 ` Barry Naujok
2006-08-11 9:02 ` Paul Slootman
2006-08-12 9:14 ` Paul Slootman
2006-08-14 14:17 ` XFS internal error XFS_WANT_CORRUPTED_GOTO Paul Slootman
2006-08-15 11:54 ` Paul Slootman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110724013457.GB31951@dastard \
--to=david@fromorbit.com \
--cc=amit.sahrawat83@gmail.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox