public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: "Arkadiusz Miśkiewicz" <arekm@maven.pl>
Cc: Alex Elder <elder@kernel.org>, xfs@oss.sgi.com
Subject: Re: xfs_repair 3.2.0 cannot (?) fix fs
Date: Mon, 30 Jun 2014 21:12:14 +1000	[thread overview]
Message-ID: <20140630111214.GE4453@dastard> (raw)
In-Reply-To: <201406300736.24291.arekm@maven.pl>

On Mon, Jun 30, 2014 at 07:36:24AM +0200, Arkadiusz Miśkiewicz wrote:
> On Monday 30 of June 2014, Dave Chinner wrote:
> > [Compendium reply to all 3 emails]
> > 
> > On Sat, Jun 28, 2014 at 01:41:54AM +0200, Arkadiusz Miśkiewicz wrote:
> > > reset bad sb for ag 5
> > >
> > > non-null group quota inode field in superblock 7
> > 
> > OK, so this is indicative of something screwed up a long time ago.
> > Firstly, the primary superblocks shows:
> > 
> > uquotino = 4077961
> > gquotino = 0
> > qflags = 0
> > 
> > i.e. user quota @ inode 4077961, no group quota. The secondary
> > superblocks that are being warned about show:
> > 
> > uquotino = 0
> > gquotino = 4077962
> > qflags = 0
> > 
> > Which is clearly wrong. They should have been overwritten during the
> > growfs operation to match the primary superblock.
> > 
> > The similarity in inode number leads me to believe at some point
> > both user and group/project quotas were enabled on this filesystem,
> 
> Both user and project quotas were enabled on this fs for last few years.
> 
> > but right now only user quotas are enabled.  It's only AGs 1-15 that
> > show this, so this seems to me that it is likely that this
> > filesystem was originally only 16 AGs and it's been grown many times
> > since?
> 
> The quotas were running fine until some repair run (i.e. before and after the 
> first repair, mounting with quota succeeded) - some xfs_repair run later broke this.

Actually, it looks more likely that a quotacheck has failed partway
through, leaving the quota in an indeterminate state, and then repair
has been run, messing things up more...
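For anyone following along at home, the mismatch above is visible directly in the on-disk superblock copies. A rough sketch of the comparison (the field offsets are assumed from the v4 superblock layout and are illustrative only - check xfs_format.h for the real thing):

```python
import struct

# Assumed offsets into the big-endian v4 on-disk superblock:
# sb_uquotino at 160, sb_gquotino at 168, sb_qflags at 176.
# Illustrative, not authoritative.
SB_QUOTA_FIELDS = struct.Struct(">QQH")  # uquotino, gquotino, qflags
SB_QUOTA_OFFSET = 160

def quota_fields(sb_buf):
    """Extract (uquotino, gquotino, qflags) from a raw superblock buffer."""
    return SB_QUOTA_FIELDS.unpack_from(sb_buf, SB_QUOTA_OFFSET)

def find_mismatched_secondaries(primary, secondaries):
    """Return AG numbers of secondary superblocks whose quota fields
    disagree with the primary (AG 0) superblock - the condition that
    repair is warning about above."""
    want = quota_fields(primary)
    return [agno for agno, sb in secondaries
            if quota_fields(sb) != want]
```

Feeding it the values from this report (primary: uquotino 4077961, secondaries: gquotino 4077962) flags exactly the disagreeing AGs.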

> > > Invalid inode number 0xfeffffffffffffff
> > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > done
> > 
> > Not sure what that is yet, but it looks like writing a directory
> > block found entries with invalid inode numbers in it. i.e. it's
> > telling me that there's something not been fixed up.
> > 
> > I'm actually seeing this in phase4:
> > 
> >         - agno = 148
> > Invalid inode number 0xfeffffffffffffff
> > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > Metadata corruption detected at block 0x11fbb698/0x1000
> > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > 
> > Second time around, this does not happen, so the error has been
> > corrected in a later phase of the first pass.
> 
> Here on two runs I got exactly the same report:
> 
> Phase 7 - verify and correct link counts...
> 
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> 
> but there were more of errors like this earlier so repair fixed some but left 
> with these two.

Right, I suspect that I've got a partial fix for this already in
place - I was seeing xfs_repair -n ... SEGV when parsing the
broken directory in phase 6, so I have some code that prevents that
crash which might also be partially fixing this.
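The check that trips here is conceptually simple: a directory entry's inode number must be non-zero, not the "null inode" sentinel, and within the range the filesystem geometry allows. A rough sketch of the idea (not the libxfs code - the upper bound here is a simplification of the real agcount/agblocks-derived limit):

```python
NULLFSINO = 0xFFFFFFFFFFFFFFFF  # "no inode" sentinel value

def dir_ino_valid(ino, max_ino):
    """Rough validity check for an inode number found in a directory
    entry: non-zero, not the null sentinel, and no larger than the
    highest inode number the filesystem geometry can produce
    (max_ino is a stand-in for the real geometry calculation)."""
    if ino == 0 or ino == NULLFSINO:
        return False
    return ino <= max_ino
```

The value in the log, 0xfeffffffffffffff, fails this check on any sanely sized filesystem, which is why the write verifier refuses to let the directory block hit the disk.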

> > > 5)Metadata CRC error detected at block 0x0/0x200
> > > but it is not CRC enabled fs
> > 
> > That's typically caused by junk in the superblock beyond the end
> > of the v4 superblock structure. It should be followed by "zeroing
> > junk ..."
> 
> Shouldn't repair fix superblocks when noticing v4 fs?

It does.

> I mean 3.2.0 repair reports:
> 
> $ xfs_repair -v ./1t-image 
> Phase 1 - find and verify superblock...
>         - reporting progress in intervals of 15 minutes
>         - block cache size set to 748144 entries
> Phase 2 - using internal log
>         - zero log...
> zero_log: head block 2 tail block 2
>         - scan filesystem freespace and inode maps...
> Metadata CRC error detected at block 0x0/0x200
> zeroing unused portion of primary superblock (AG #0)
>         - 07:20:11: scanning filesystem freespace - 391 of 391 allocation 
> groups done
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - 07:20:11: scanning agi unlinked lists - 391 of 391 allocation groups 
> done
>         - process known inodes and perform inode discovery...
>         - agno = 0
> [...]
> 
> but if I run 3.1.11 after running 3.2.0 then superblocks get fixed:
> 
> $ ./xfsprogs/repair/xfs_repair -v ./1t-image 
> Phase 1 - find and verify superblock...
>         - block cache size set to 748144 entries
> Phase 2 - using internal log
>         - zero log...
> zero_log: head block 2 tail block 2
>         - scan filesystem freespace and inode maps...
> zeroing unused portion of primary superblock (AG #0)

...

> Shouldn't these be "unused" for 3.2.0, too (since v4 fs) ?

I'm pretty sure that's indicative of older xfs_repair code
not understanding that sb_badfeatures2 didn't need to be zeroed.
It wasn't until:

cbd7508 xfs_repair: zero out unused parts of superblocks

that xfs_repair correctly sized the unused area of the superblock.
You'll probably find that mounting this filesystem resulted in
"sb_badfeatures2 mismatch detected. Correcting." or something
similar in dmesg because of this (now fixed) repair bug.
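The fix in that commit boils down to zeroing everything in the superblock sector past the last field that is valid for the filesystem's version. A sketch of the idea (the v4 end-of-superblock offset here is an assumption for illustration, not the exact value repair computes):

```python
# Assumed extent of meaningful v4 superblock data; anything past this
# in the 512-byte sector is "unused" and should read back as zeroes.
# Illustrative value - repair sizes this from the version bits.
V4_SB_USED_BYTES = 208

def zero_unused_sb(sector):
    """Return a copy of a superblock sector with the unused tail
    zeroed, which is what "zeroing unused portion of primary
    superblock" amounts to for a v4 filesystem."""
    buf = bytearray(sector)
    buf[V4_SB_USED_BYTES:] = bytes(len(buf) - V4_SB_USED_BYTES)
    return bytes(buf)
```

Once the tail is zeroed, the spurious "Metadata CRC error" warning on the v4 superblock goes away because there is no junk left where v5 CRC-era fields would live.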

> > > Made xfs metadump without file obfuscation and I'm able to reproduce the
> > > problem reliably on the image (if some xfs developer wants metadump image
> > > then please mail me - I don't want to put it for everyone due to obvious
> > > reasons).
> > > 
> > > So additional bug in xfs_metadump where file obfuscation "fixes" some
> > > issues. Does it obfuscate but keep invalid conditions (like keeping "/"
> > > in file name) ? I guess it is not doing that.
> > 
> > I doubt it handles a "/" in a file name properly - that's rather
> > illegal, and the obfuscation code probably doesn't handle it at all.
> 
> Would be nice to keep these bad conditions. obfuscated metadump is behaving 
> differently than non-obfuscated metadump with xfs_repair here (less issues 
> with obfuscated than non-obfuscated), so obfuscation simply hides problems.

Sure, but we didn't even know this was a problem until now, so that
will have to wait....
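If the obfuscation code were ever taught to preserve corruption, the idea would be to randomise only the legal bytes of a name and carry illegal ones (like "/" or NUL) through unchanged. A purely hypothetical sketch - this is not what xfs_metadump currently does:

```python
import hashlib

ILLEGAL = {0x2F, 0x00}  # '/' and NUL can never legally appear in a name

def obfuscate_name(name: bytes, salt: bytes = b"") -> bytes:
    """Replace each legal byte of a directory entry name with a
    deterministic pseudo-random letter, but keep illegal bytes in
    place so the corruption survives obfuscation and xfs_repair
    sees the same problem on the obfuscated image. Hypothetical."""
    digest = hashlib.sha256(salt + name).digest()
    out = bytearray()
    for i, b in enumerate(name):
        if b in ILLEGAL:
            out.append(b)  # preserve the corrupt byte as-is
        else:
            out.append(ord('a') + digest[i % len(digest)] % 26)
    return bytes(out)
```

That would keep the metadump deterministic (same input, same obfuscated output) while no longer hiding the bad-name conditions being discussed here.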

> I assume that you do testing on the non-obfuscated dump I gave on irc?

Yes, but I've been cross checking against the obfuscated one with
xfs_db....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 9+ messages
2014-06-27 23:41 xfs_repair 3.2.0 cannot (?) fix fs Arkadiusz Miśkiewicz
2014-06-28 21:52 ` Arkadiusz Miśkiewicz
2014-06-28 22:01   ` Arkadiusz Miśkiewicz
2014-06-30  3:18 ` Dave Chinner
2014-06-30  3:44   ` Dave Chinner
2014-06-30  5:36   ` Arkadiusz Miśkiewicz
2014-06-30 11:12     ` Dave Chinner [this message]
2014-06-30 11:53       ` Arkadiusz Miśkiewicz
2014-06-30 12:06         ` Dave Chinner
