From: Dave Chinner <david@fromorbit.com>
To: "Arkadiusz Miśkiewicz" <arekm@maven.pl>
Cc: Alex Elder <elder@kernel.org>, xfs@oss.sgi.com
Subject: Re: xfs_repair 3.2.0 cannot (?) fix fs
Date: Mon, 30 Jun 2014 21:12:14 +1000
Message-ID: <20140630111214.GE4453@dastard>
In-Reply-To: <201406300736.24291.arekm@maven.pl>

On Mon, Jun 30, 2014 at 07:36:24AM +0200, Arkadiusz Miśkiewicz wrote:
> On Monday 30 of June 2014, Dave Chinner wrote:
> > [Compendium reply to all 3 emails]
> > 
> > On Sat, Jun 28, 2014 at 01:41:54AM +0200, Arkadiusz Miśkiewicz wrote:
> > > reset bad sb for ag 5
> > >
> > >. non-null group quota inode field in superblock 7
> > 
> > OK, so this is indicative of something screwed up a long time ago.
> > Firstly, the primary superblocks shows:
> > 
> > uquotino = 4077961
> > gquotino = 0
> > qflags = 0
> > 
> > i.e. user quota @ inode 4077961, no group quota. The secondary
> > superblocks that are being warned about show:
> > 
> > uquotino = 0
> > gquotino = 4077962
> > qflags = 0
> > 
> > Which is clearly wrong. They should have been overwritten during the
> > growfs operation to match the primary superblock.
> > 
> > The similarity in inode number leads me to believe at some point
> > both user and group/project quotas were enabled on this filesystem,
> 
> Both user and project quotas were enabled on this fs for last few years.
> 
> > but right now only user quotas are enabled.  It's only AGs 1-15 that
> > show this, so this seems to me that it is likely that this
> > filesystem was originally only 16 AGs and it's been grown many times
> > since?
> 
> The quotas was running fine until some repair run (ie. before and after first 
> repair mounting with quota succeeded) - some xfs_repair run later broke this.

Actually, it looks more likely that a quotacheck has failed part way
through, leaving the quota in an indeterminate state, and then repair
has been run, messing things up more...
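
To make the mismatch quoted above concrete, here is a minimal sketch
that compares the quota fields of a secondary superblock against the
primary. The field values are the ones quoted earlier in this mail; the
function name and the dict layout are purely illustrative and are not
how xfs_repair does it.

```python
from typing import Dict, List

# Quota-related superblock fields discussed in this thread.
QUOTA_FIELDS = ("uquotino", "gquotino", "qflags")


def quota_field_mismatches(primary: Dict[str, int],
                           secondary: Dict[str, int]) -> List[str]:
    """Return the quota fields where the secondary differs from the primary."""
    return [name for name in QUOTA_FIELDS if primary[name] != secondary[name]]


# Values as quoted from the primary and the offending secondary superblocks.
primary = {"uquotino": 4077961, "gquotino": 0, "qflags": 0}
secondary = {"uquotino": 0, "gquotino": 4077962, "qflags": 0}

mismatches = quota_field_mismatches(primary, secondary)
print(mismatches)
```

Both inode fields differ while qflags agrees, which matches the picture
of secondaries that were never rewritten to match the primary.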

> > > Invalid inode number 0xfeffffffffffffff
> > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > done
> > 
> > Not sure what that is yet, but it looks like writing a directory
> > block found entries with invalid inode numbers in it. i.e. it's
> > telling me that there's something not been fixed up.
> > 
> > I'm actually seeing this in phase4:
> > 
> >         - agno = 148
> > Invalid inode number 0xfeffffffffffffff
> > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > Metadata corruption detected at block 0x11fbb698/0x1000
> > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > 
> > Second time around, this does not happen, so the error has been
> > corrected in a later phase of the first pass.
> 
> Here on two runs I got exactly the same report:
> 
> Phase 7 - verify and correct link counts...
> 
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> 
> but there were more of errors like this earlier so repair fixed some but left 
> with these two.

Right, I suspect that I've got a partial fix for this already in
place - I was seeing xfs_repair -n ... SEGV when parsing the
broken directory in phase 6, so I have some code that prevents that
crash which might also be partially fixing this.
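
As a rough illustration of why 0xfeffffffffffffff cannot be a valid
inode number here: an XFS inode number packs the AG number, the block
within the AG and the inode's offset within that block into one
integer. The sketch below splits a number along those lines and checks
only that the AG number is in range. The AG count of 391 comes from the
repair output in this thread; the agblklog and inopblog values are
hypothetical, since the real geometry isn't shown here, and the real
validation does more than this.

```python
from typing import Tuple


def split_inode_number(ino: int, agblklog: int, inopblog: int) -> Tuple[int, int, int]:
    """Split an inode number into (agno, agbno, offset)."""
    offset = ino & ((1 << inopblog) - 1)               # inode index within the block
    agbno = (ino >> inopblog) & ((1 << agblklog) - 1)  # block number within the AG
    agno = ino >> (inopblog + agblklog)                # allocation group number
    return agno, agbno, offset


def inode_number_plausible(ino: int, agcount: int,
                           agblklog: int, inopblog: int) -> bool:
    """Simplified check: non-zero, and the AG number exists in this filesystem."""
    if ino == 0:
        return False
    agno, _, _ = split_inode_number(ino, agblklog, inopblog)
    return agno < agcount


AGCOUNT = 391   # from "391 of 391 allocation groups" in the repair output
AGBLKLOG = 20   # hypothetical geometry, for illustration only
INOPBLOG = 4    # hypothetical geometry, for illustration only

BAD_INO = 0xfeffffffffffffff
print(inode_number_plausible(BAD_INO, AGCOUNT, AGBLKLOG, INOPBLOG))
print(inode_number_plausible(4077961, AGCOUNT, AGBLKLOG, INOPBLOG))
```

With any sane geometry the bad value decodes to an AG number far
beyond 391, so a directory entry carrying it has to be treated as
corrupt.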

> > > 5)Metadata CRC error detected at block 0x0/0x200
> > > but it is not CRC enabled fs
> > 
> > That's typically caused by junk in the superblock beyond the end
> > of the v4 superblock structure. It should be followed by "zeroing
> > junk ..."
> 
> Shouldn't repair fix superblocks when noticing v4 fs?

It does.

> I mean 3.2.0 repair reports:
> 
> $ xfs_repair -v ./1t-image 
> Phase 1 - find and verify superblock...
>         - reporting progress in intervals of 15 minutes
>         - block cache size set to 748144 entries
> Phase 2 - using internal log
>         - zero log...
> zero_log: head block 2 tail block 2
>         - scan filesystem freespace and inode maps...
> Metadata CRC error detected at block 0x0/0x200
> zeroing unused portion of primary superblock (AG #0)
>         - 07:20:11: scanning filesystem freespace - 391 of 391 allocation 
> groups done
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - 07:20:11: scanning agi unlinked lists - 391 of 391 allocation groups 
> done
>         - process known inodes and perform inode discovery...
>         - agno = 0
> [...]
> 
> but if I run 3.1.11 after running 3.2.0 then superblocks get fixed:
> 
> $ ./xfsprogs/repair/xfs_repair -v ./1t-image 
> Phase 1 - find and verify superblock...
>         - block cache size set to 748144 entries
> Phase 2 - using internal log
>         - zero log...
> zero_log: head block 2 tail block 2
>         - scan filesystem freespace and inode maps...
> zeroing unused portion of primary superblock (AG #0)

...

> Shouldn't these be "unused" for 3.2.0, too (since v4 fs) ?

I'm pretty sure that's indicative of older xfs_repair code
not understanding that sb_bad_features2 didn't need to be zeroed.
It wasn't until:

cbd7508 xfs_repair: zero out unused parts of superblocks

that xfs_repair correctly sized the unused area of the superblock.
You'll probably find that mounting this filesystem resulted in
"sb_bad_features2 mismatch detected. Correcting." or something
similar in dmesg because of this (now fixed) repair bug.
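
To show the kind of sizing mistake I mean, here is a small sketch, not
the actual repair code. It zeroes everything in a superblock sector
past the end of the "used" structure and reports whether any junk was
there. The 0x200 sector size is my reading of the "0x0/0x200" in the
log above; the field offsets are hypothetical and only there to show
that sizing the used area one field too short wipes a valid field.

```python
from typing import Tuple

SECTOR_SIZE = 0x200  # 512 bytes, read from the "0x0/0x200" in the log


def zero_unused_tail(sector: bytes, used_len: int) -> Tuple[bytes, bool]:
    """Zero all bytes past used_len; return (cleaned sector, whether junk was found)."""
    if not 0 <= used_len <= len(sector):
        raise ValueError("used_len must lie within the sector")
    tail = sector[used_len:]
    had_junk = any(tail)  # True if any byte past the used area is non-zero
    return sector[:used_len] + bytes(len(tail)), had_junk


# Hypothetical layout: the last valid 4-byte field sits at bytes 204..207.
FIELD_OFFSET = 204
FIELD_END = 208
FIELD_VALUE = (0x8A).to_bytes(4, "big")

raw = bytearray(SECTOR_SIZE)
raw[FIELD_OFFSET:FIELD_END] = FIELD_VALUE
raw[300] = 0xFF  # real junk beyond the structure
sector = bytes(raw)

# Used area sized one field too short: the valid field gets zeroed as well.
too_short, junk_a = zero_unused_tail(sector, FIELD_OFFSET)
# Used area sized correctly: only the junk is cleared.
correct, junk_b = zero_unused_tail(sector, FIELD_END)

print(too_short[FIELD_OFFSET:FIELD_END].hex(), junk_a)
print(correct[FIELD_OFFSET:FIELD_END].hex(), junk_b)
```

In the first case the field reads back as zero and something has to
put it right later; in the second only the junk byte is cleared.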

> > > Made xfs metadump without file obfuscation and I'm able to reproduce the
> > > problem reliably on the image (if some xfs developer wants metadump image
> > > then please mail me - I don't want to put it for everyone due to obvious
> > > reasons).
> > > 
> > > So additional bug in xfs_metadump where file obfuscation "fixes" some
> > > issues. Does it obfuscate but keep invalid conditions (like keeping "/"
> > > in file name) ? I guess it is not doing that.
> > 
> > I doubt it handles a "/" in a file name properly - that's rather
> > illegal, and the obfuscation code probably doesn't handle it at all.
> 
> Would be nice to keep these bad conditions. obfuscated metadump is behaving 
> differently than non-obfuscated metadump with xfs_repair here (less issues 
> with obfuscated than non-obfuscated), so obfuscation simply hides problems.

Sure, but we didn't even know this was a problem until now, so that
will have to wait....

> I assume that you do testing on the non-obfuscated dump I gave on irc?

Yes, but I've been cross checking against the obfuscated one with
xfs_db....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
