linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: NeilBrown <neilb@suse.de>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: Is lockref_get_not_zero() really correct in dget_parent()
Date: Tue, 5 Aug 2014 06:14:39 +0100	[thread overview]
Message-ID: <20140805051439.GQ18016@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFxDQ8fE16uKPfjU-nBnvfjWHz3ib5mo1BDn2=JKXVQQmQ@mail.gmail.com>

On Mon, Aug 04, 2014 at 09:07:32PM -0700, Linus Torvalds wrote:

> So renaming it to "lockref_get_active()", and changing the "not zero"
> test to check for "positive" and change the rtype of "count" to be
> signed, all sound like good things to me.

Fine by me.  I can do that, unless somebody else beats me to that.

> But I don't actually think it's an active bug, it's just an "active
> horribly ugly and subtly working code". I guess in theory if you can
> get lots of CPU's triggering the race at the same time, the magic
> negative number could become zero and positive, but at that point I
> don't think we're really talking reality any more.
> 
> Can somebody pick holes in that? Does somebody want to send in the
> cleanup patch?

FWIW, dget_parent() has been a bloody annoyance - most of the callers used
to be racy and looking through the current set...  Take a look at e.g.
fs/btrfs/tree-log.c:check_parent_dirs_for_sync().  In the loop there we
do
                if (!parent || !parent->d_inode || sb != parent->d_inode->i_sb)
                        break;

                if (IS_ROOT(parent))
                        break;

                parent = dget_parent(parent);
                dput(old_parent);
                old_parent = parent;
                inode = parent->d_inode;
and it's so bogus it's not even funny.  What the hell is that code trying to
check?  And while we are at it, can we race with renames there?

>From my reading of that code, most of it ought to have been under
rcu_read_lock() and sod all that playing with refcounts.  Other
btrfs users are not much nicer (who says, for instance, that result of
check_parent_dirs_for_sync() will remain valid?)

ecryptfs lock_parent(): AFAICS, all callers could do ecryptfs_dentry_to_lower()
on dentry->d_parent instead of doing ecryptfs_dentry_to_lower(dentry) and
then doing dget_parent() - the covering layer dentries have ->d_parent
stabilized by ->i_mutex on said parents and we really, really don't want to
run into the case when tree topologies go out of sync.  I.e. we want to grab
->i_mutex on lower(parent), then check that it's equal to parent(lower) and,
if it's at all possible that some joker has played with rename(2) on underlying
layer mounted somewhere else, drop everything we'd taken and bugger off.

XFS xfs_filestream_get_parent() is clearly bogus -
        parent = dget_parent(dentry);
        if (!parent)
                goto out_dput;
When does dget_parent() return NULL?

And that's just from a quick grep.  The point is, dget_parent() is not a nice
API - historically it's been abused more often than not and we keep playing
whack-a-mole with it.  Sigh...

Speaking of bogosities, apparently nobody has noticed that automagical
turning BSD process accounting off upon r/o remounts of the filesystem
hosting the log got broken several years ago.  Suppose we find an opened
log when remounting.  OK, it gets closed, which means filp_close(...).
Which means that actual closing that sucker will happen just before we
return to userland.  Return with -EBUSY, since the filesystem still has
a file opened for write when we get around to checking that...

I have kinda-sorta fixes for that (basically, restoring the situation we
used to have before delayed fput() and being damn careful about avoiding
deadlocks), but I would really love to just rip that idiocy out.  IOW,
just let it fail with -EBUSY in such cases, same as for any other write-opened
file on that fs.

That, BTW, is the most painful part of the acct series, so being able to
drop it would be very nice.  Comments?

  parent reply	other threads:[~2014-08-05  5:14 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-05  3:17 Is lockref_get_not_zero() really correct in dget_parent() NeilBrown
2014-08-05  4:07 ` Linus Torvalds
2014-08-05  4:54   ` Steven Noonan
2014-08-05  4:56     ` Steven Noonan
2014-08-05 16:53       ` Linus Torvalds
2014-08-05  5:14   ` Al Viro [this message]
2014-08-05 22:58     ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140805051439.GQ18016@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).