linux-xfs.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfs: symlinks can be zero length during log recovery
Date: Mon, 18 Jun 2018 12:53:12 +1000
Message-ID: <20180618025312.GC19934@dastard>
In-Reply-To: <20180616001034.GR10363@dastard>

On Sat, Jun 16, 2018 at 10:10:34AM +1000, Dave Chinner wrote:
> On Fri, Jun 15, 2018 at 07:31:26AM -0400, Brian Foster wrote:
> > On Fri, Jun 15, 2018 at 11:43:14AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > A log recovery failure has been reproduced where a symlink inode has
> > > a zero length in extent form. It was caused by a shutdown during a
> > > combined fstress+fsmark workload.
> > > 
> > > To fix it, we have to allow zero length symlink inodes through
> > > xfs_dinode_verify() during log recovery. We already specifically
> > > check and allow this case in the shortform symlink fork verifier,
> > > but in this case we don't get that far, and the inode is not in
> > > shortform format.
> > > 
> > > Update the dinode verifier to handle this case, and change the
> > > symlink fork verifier to only allow this case to exist during log
> > > recovery.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > 
> > Seems Ok to me, but before we restrict some of the existing checks to
> > log recovery I am curious about one thing. xfs_inactive_symlink() has
> > this:
> > 
> >         /*
> >          * Zero length symlinks _can_ exist.
> >          */
> >         pathlen = (int)ip->i_d.di_size;
> >         if (!pathlen) {
> >                 xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >                 return 0;
> >         }
> > 
> > I'm not quite sure what case that covers, but it seems slightly
> > inconsistent with the fork verifer change (simply because that path is
> > not exclusive to the read from disk case), at least. Any idea?
> 
> Yeah, that's what I'm trying to chase down right now. I had the
> verifier fire on inode writeback during generic/269. I don't know
> yet where these zero length symlinks are coming from, and none of
> the comments (there's a couple that say the above)
> actually give any hint to their source.

Ok, so there is this comment in fs/namei.c w.r.t. symlink handling
before getname_flags():

* POSIX.1 2.4: an empty pathname is invalid (ENOENT).

So the call chain is

sys_symlink(oldname ....)
  do_symlinkat(oldname ...)
    getname(oldname)
      getname_flags(oldname, 0, NULL)
        len = strncpy_from_user(... oldname ....)
        ....
        if (!len) {
                if (!(flags & LOOKUP_EMPTY))
                        return -ENOENT;
        }

So we should never see a zero length symlink from userspace as flags
is always zero. Hence if we are seeing zero length symlinks on disk,
then that's an XFS implementation issue, not a user API requirement.
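As a rough userspace sketch of that check (not the kernel code): the
function names and MODEL_LOOKUP_EMPTY below are hypothetical, the flag
value is illustrative only, and strlen() stands in for
strncpy_from_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LOOKUP_EMPTY; the value is illustrative. */
#define MODEL_LOOKUP_EMPTY 0x4000

/*
 * Model of the empty-name check described above.
 * Returns 0 if the name is accepted, -ENOENT if an empty name is
 * rejected because the caller did not ask for empty names.
 */
int model_getname_check(const char *name, int flags)
{
	size_t len;

	assert(name != NULL);
	len = strlen(name);	/* stands in for strncpy_from_user() */

	if (!len && !(flags & MODEL_LOOKUP_EMPTY))
		return -ENOENT;
	return 0;
}

/* The symlink path in the call chain always passes flags == 0. */
int model_symlink_target_check(const char *oldname)
{
	return model_getname_check(oldname, 0);
}
```

With flags fixed at zero, model_symlink_target_check("") can only
return -ENOENT, which is the point being made: an empty target never
gets as far as the filesystem.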

There are two issues in the symlink code that can lead to zero length
symlinks firing the verifiers. They are symptoms of the same core
issue in xfs_inactive_symlink(): the inode is unlocked between the
symlink inactivation/truncation and the inode being freed. This
opens a window for the inode to be written to disk before
xfs_ifree() removes it from the unlinked list, marks it free in the
inobt and zeros the mode.

The first, and simplest to solve, issue is the shortform verifier.
This verifier doesn't actually verify on disk state - it verifies
*in memory inode fork state*. Specifically, it checks for a zero
length inode fork (ifp->if_bytes) and explicitly says "this can
happen". The only place it can happen is in the window between
xfs_inactive_symlink() and xfs_ifree() because
xfs_inactive_symlink() tears down the data fork. It doesn't,
however, change the inode size, so if the inode is written back to
disk in this window, it's written with a non-zero size, leaving the
data fork in the inode untouched. i.e. the inode on disk is still a
valid symlink.

Fixing this is easy. xfs_ifree() actually cleans up the in-memory
data and attr fork structures, and so there is absolutely no need to
do it in xfs_inactive_symlink(). With that change, the symlink
verifier error goes away.

Which leaves remote symlink inactivation. This runs a transaction
that truncates away the symlink extent and sets the inode size to
zero. IOWs, it creates an actual path for zero length symlinks on
disk. However, the symlink inode at this point is unreferenced by
userspace and is on the unlinked list, and hence userspace can never
see a zero length symlink inode. This does, however, create a
problem - we can get zero length symlinks on disk in log recovery
because the inode size is set to zero at the same time the EFI
intents are recorded. Hence if log recovery then reads the inode off
disk to replay EFIs or other non-completed intents, it can see a
symlink inode with zero length.

So we have a choice here: either special case log recovery for
symlink inode verification, or prevent zero length extent form
symlink inodes from existing on disk. The former is essentially the
patch I posted, the latter requires discussion.

If we want to avoid zero length extent form symlinks on disk we
either need to make the inactivation and freeing atomic, or we can
make symlink inactivation change the type of inode to something that
allows zero length. The former is complex and a major undertaking
(new deferred op, a bunch of new intents, log recovery work, etc)
while the latter is one line of code. i.e. we simply change the
mode of the inode to a regular file at the same time we set the size
to zero.

If the transaction with the EFIs and zero size goes to disk, we
don't really care what the inode type is. It's on the unlinked list,
can't be seen from userspace, and we just need to run extent removal
and freeing on it in log recovery. Hence if we change it to be a
regular file inode, then we maintain the "no zero length symlinks
on disk" rule, and we get cleanup occurring without any new code or
concerns being created.

Thoughts?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
