linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/2] xfs: zero length symlinks are not valid
Date: Thu, 21 Jun 2018 08:59:18 +1000	[thread overview]
Message-ID: <20180620225918.GN19934@dastard> (raw)
In-Reply-To: <20180620115048.GA3241@bfoster>

On Wed, Jun 20, 2018 at 07:50:48AM -0400, Brian Foster wrote:
> On Wed, Jun 20, 2018 at 09:22:42AM +1000, Dave Chinner wrote:
> > On Tue, Jun 19, 2018 at 12:28:40PM -0400, Brian Foster wrote:
> > > On Tue, Jun 19, 2018 at 08:48:10AM -0700, Darrick J. Wong wrote:
> > > > On Tue, Jun 19, 2018 at 07:54:34AM -0400, Brian Foster wrote:
> > > > > On Tue, Jun 19, 2018 at 08:42:59AM +1000, Dave Chinner wrote:
> > > > > Rather, I'm asking about the pre-existing code that we remove. The hunk
> > > > > just above from xfs_inactive_symlink():
> > > > > 
> > > > > -	/*
> > > > > -	 * Zero length symlinks _can_ exist.
> > > > > -	 */
> > > > > -	if (!pathlen) {
> > > > > -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > > > -		return 0;
> > > > > -	}
> > > > > 
> > > > > ... obviously suggests that there could have been a situation where we'd
> > > > > see a zero-length symlink on entry to xfs_inactive_symlink(). This
> > > > > patch, however, focuses on fixing the transient zero-length symlink
> > > > > caused by xfs_inactive_symlink_rmt() (which comes after the above
> > > > > check).
> > 
> > It can't happen through normal VFS paths, and I don't think it can
> > happen through log recovery. That's why I replaced this with an ASSERT.
> > 
> > In the normal behaviour case, zero length symlinks were created
> > /after/ this point in time, and we've always been unable to read
> > such inodes because xfs_dinode_verify() has always rejected zero
> > length symlink inodes.
> > 
> 
> That was my initial understanding as well..
> 
> > However, log recovery doesn't trip over the transient inode state
> > because it does not use xfs_dinode_verify() for inode recovery - it
> > reads the buffer with xfs_inode_buf_ops, and that just checks the
> > inode numbers and then recovers whatever is in the log over the top.
> > It never checks for zero length symlinks on recovery, and it never
> > goes through the dinode verifier (on read or write!) to catch this.
> > 
> > It's not until we then recover the remaining intent chain that we go
> > through xfs_iget/xfs_iread/xfs_inactive/xfs_ifree, and that
> > xfs_iget() call runs xfs_dinode_verify(). If we've already recovered
> > any part of the remote symlink removal intent chain (and we must
> > have to be replaying EFIs!) this should see a zero length symlink
> > inode. AIUI, we have no evidence that the verifier has ever fired at
> > this point in time, even when recovering known transient zero length
> > states.
> > 
> 
> Hmm, not sure we're talking about the same thing. I ran a quick
> experiment to try and clear up my confusion here, at least. I injected a
> shutdown at the point inactive creates the zero sized symlink and
> created/removed a remote symlink. On a remount, I hit the following
> during log recovery:
> 
> [  685.931834] XFS (dm-3): Starting recovery (logdev: internal)
> [  685.993014] XFS (dm-3): Metadata corruption detected at xfs_dinode_verify+0x331/0x490 [xfs], inode 0x85 dinode
> [  685.996287] XFS (dm-3): Unmount and run xfs_repair
> [  685.996911] XFS (dm-3): First 128 bytes of corrupted metadata buffer:
> ...
> [  686.006647] Call Trace:
> [  686.006922]  dump_stack+0x85/0xcb
> [  686.007338]  xfs_iread+0xeb/0x220 [xfs]
> [  686.007820]  xfs_iget+0x4bd/0x1100 [xfs]
> [  686.008344]  xlog_recover_process_one_iunlink+0x4d/0x3c0 [xfs]
> ...
> 
> That seems to show that the verifier can impede unlinked list processing
> if the change is at least present in the log.

Which happens after we process intents - ah, the bmap intents only
occur from swap extents, so the inode is not read via xfs_iget()
in this case until unlinked list processing.

So, yeah, we are talking about the same thing - it's the second
phase of recovery that reads inodes through xfs_iget() (for whatever
reason) that will trip over the xfs_dinode_verifier.

> If I recreate that same dirty log state and mount with this patch
> applied (note that the fs is created without this patch to simulate an
> old kernel that has not changed i_mode in the same transaction that sets
> di_size = 0) along with a hack to avoid the check in
> xfs_dinode_verify(), I now hit the new assert and corruption error
> that's been placed in xfs_inactive_symlink().
> 
> So to Darrick's point, that seems to show that this is a vector to the
> pre-existing len == 0 check in xfs_inactive_symlink(). Given that, it
> seems to me that if we want to handle recovery from this state, we'd
> still need to work around the verifier check and retain the initial
> di_size == 0 check in xfs_inactive_symlink().

I think we should get rid of the transient state, not continue to
work around it. Because the transient state only exists in a log
recovery context, we can change the behaviour and not care about
older kernels still having the problem because all that is needed to
avoid the issue is a clean log when upgrading the kernel.

> > i.e all the cases I've seen where repair complains about symlinks
> > with "dfork format 2 and size 0" it is because the log is dirty and
> > hasn't been replayed. Mounting the filesystem and running log
> > recovery makes the problem go away, and this is what lead to me
> > removing the zeor length handling - the verifier should already
> > be firing if it is recovering an intent on a zero length symlink
> > inode...
> > 
> 
> I haven't grokked all the details here, but the behavior you describe
> does sound like this might be a different scenario from the above.

No, we are talking about the same thing. The only difference is that
I've been working from a bug report and corrupted fuzzer image, not
a real world corruption vector. Hence I'm trying to work backwards
from "here's a corrupt image that crashed" to "this is all the paths
that can lead to that on disk state".

While writing my last reply, I suspected you were correct about the
log recovery state, which is why I said:

> > That said, I'll try some more focussed testing with intentional
> > corruption to see if it's just a case of my testing being (un)lucky
> > and not triggering this issue...

As you've shown, I was being (un)lucky, so more work is needed on
this one.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2018-06-20 22:59 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-18  5:57 [PATCH 0/2] xfs: symlink and inode writeback issues Dave Chinner
2018-06-18  5:57 ` [PATCH 1/2] xfs: zero length symlinks are not valid Dave Chinner
2018-06-18 13:24   ` Brian Foster
2018-06-18 22:42     ` Dave Chinner
2018-06-19 11:54       ` Brian Foster
2018-06-19 15:48         ` Darrick J. Wong
2018-06-19 16:28           ` Brian Foster
2018-06-19 23:22             ` Dave Chinner
2018-06-20 11:50               ` Brian Foster
2018-06-20 22:59                 ` Dave Chinner [this message]
2018-06-21 11:46                   ` Brian Foster
2018-06-21 22:31                     ` Dave Chinner
2018-06-21 22:53                       ` Darrick J. Wong
2018-06-22 10:44                         ` Brian Foster
2018-06-23 17:38                           ` Darrick J. Wong
2018-06-18  5:57 ` [PATCH 2/2] xfs: xfs_iflush_abort() can be called twice on cluster writeback failure Dave Chinner
2018-06-18 13:24   ` Brian Foster
2018-06-19  5:05   ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180620225918.GN19934@dastard \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).