From: Dave Chinner <david@fromorbit.com>
To: "Arkadiusz Miśkiewicz" <arekm@maven.pl>
Cc: Alex Elder <elder@kernel.org>, xfs@oss.sgi.com
Subject: Re: xfs_repair 3.2.0 cannot (?) fix fs
Date: Mon, 30 Jun 2014 22:06:13 +1000
Message-ID: <20140630120613.GF4453@dastard>
In-Reply-To: <201406301353.10829.arekm@maven.pl>

On Mon, Jun 30, 2014 at 01:53:10PM +0200, Arkadiusz Miśkiewicz wrote:
> On Monday 30 of June 2014, Dave Chinner wrote:
> > On Mon, Jun 30, 2014 at 07:36:24AM +0200, Arkadiusz Miśkiewicz wrote:
> > > On Monday 30 of June 2014, Dave Chinner wrote:
> > > > but right now only user quotas are enabled.  It's only AGs 1-15 that
> > > > show this, so this seems to me that it is likely that this
> > > > filesystem was originally only 16 AGs and it's been grown many times
> > > > since?
> > > 
> > > The quotas was running fine until some repair run (ie. before and after
> > > first repair mounting with quota succeeded) - some xfs_repair run later
> > > broke this.
> > 
> > Actually, it looks more likely that a quotacheck has failed part way
> > though, leaving the quota in an indeterminate state and then repair
> > has been run, messing things up more...
> 
> Hm, the only quotacheck I see in logs from that day reported "Done". I assume 
> it wouldn't report that if some problem occured in middle?
> 
> Jun 28 00:57:36 web2 kernel: [736161.906626] XFS (dm-1): Quotacheck needed: 
> Please wait.
> Jun 28 01:09:10 web2 kernel: [736855.851555] XFS (dm-1): Quotacheck: Done.

If there was an error, it should report it and say that quotas are
being turned off.

> [...] here were few Internal error xfs_bmap_read_extents(1) while doing 
> xfs_dir_lookup (I assume due to not fixed directory entries problem). 
> xfs_repair was also run few times and then...
> 
> Jun 28 23:16:50 web2 kernel: [816515.898210] XFS (dm-1): Mounting Filesystem
> Jun 28 23:16:50 web2 kernel: [816515.915356] XFS (dm-1): Ending clean mount
> Jun 28 23:16:50 web2 kernel: [816515.940008] XFS (dm-1): Failed to initialize 
> disk quotas.

I haven't tracked down what the error here is yet - I'm still
working on the repair side of things before I even try to mount the
images you sent me. :/

Once I get repair running cleanly, I'll look at why this is failing.

> > > > > Invalid inode number 0xfeffffffffffffff
> > > > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > > > done
> > > > 
> > > > Not sure what that is yet, but it looks like writing a directory
> > > > block found entries with invalid inode numbers in it. i.e. it's
> > > > telling me that there's something not been fixed up.
> > > > 
> > > > I'm actually seeing this in phase4:
> > > >         - agno = 148
> > > > 
> > > > Invalid inode number 0xfeffffffffffffff
> > > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > > 
> > > > Second time around, this does not happen, so the error has been
> > > > corrected in a later phase of the first pass.
> > > 
> > > Here on two runs I got exactly the same report:
> > > 
> > > Phase 7 - verify and correct link counts...
> > > 
> > > Invalid inode number 0xfeffffffffffffff
> > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > Invalid inode number 0xfeffffffffffffff
> > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > 
> > > but there were more of errors like this earlier so repair fixed some but
> > > left with these two.
> > 
> > Right, I suspect that I've got a partial fix for this already in
> > place - i was having xfs_repair -n ... SEGV on when parsing the
> > broken directory in phase 6, so I have some code that prevents that
> > crash which might also be partially fixing this.
> 
> Nice :-) Do you also know why 3.1.11 doesn't have this problem with 
> xfs_dir_ino_validate: XFS_ERROR_REPORT ?

Oh, that's easy: 3.1.11 doesn't have write verifiers, so it would
never know that it wrote a bad inode number to disk. Like the kernel
code, the write verifiers actually check that the modifications
being made result in valid on-disk values, and that's something
we've never had in repair before 3.2.0.

IOWs, 3.1.11 could well be writing inodes with 0xfeffffffffffffff in
them, but there's nothing to catch that in repair or libxfs on read
or write. Hence we could be tripping over an old bug we never knew
existed until now...
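
To make the difference concrete, here is a minimal sketch in Python of
what a write verifier adds. It is a toy model, not xfsprogs code:
write_block(), verify_dir_block() and CorruptionError are hypothetical
names, the "disk" is a plain dict, the second block number is made up,
and the 56-bit upper bound on inode numbers is an assumption used only
to give the check something to test.

```python
# Toy model only: every name and limit below is an assumption for illustration.
MAX_INODE_NUMBER = (1 << 56) - 1  # assumed upper bound on a valid inode number


class CorruptionError(Exception):
    """Raised when a buffer fails verification."""


def verify_dir_block(inode_numbers):
    """Reject a directory block holding an out-of-range inode number."""
    for ino in inode_numbers:
        if ino > MAX_INODE_NUMBER:
            raise CorruptionError(f"Invalid inode number {ino:#x}")


def write_block(disk, blkno, inode_numbers, write_verifier=None):
    """Store a block on the fake disk, running the verifier first if one is set."""
    if write_verifier is not None:
        write_verifier(inode_numbers)
    disk[blkno] = list(inode_numbers)


disk = {}
bad_block = [128, 0xFEFFFFFFFFFFFFFF]

# No verifier (modelling 3.1.11): the bad value reaches the disk silently.
write_block(disk, 0x11FBB698, bad_block)

# With a verifier (modelling 3.2.0): the same contents are caught at write time.
try:
    write_block(disk, 0x11FBB699, bad_block, write_verifier=verify_dir_block)
except CorruptionError as err:
    print(err)
```

In this model the first write succeeds without complaint and the second
one raises, which is the same asymmetry as above: the bad value may
have been there all along, but only a check at write time reports it.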

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com