linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Nick Piggin <npiggin@kernel.dk>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] vfs: fix race in rcu lookup of pruned dentry
Date: Mon, 18 Jul 2011 03:08:18 +0100	[thread overview]
Message-ID: <20110718020818.GW11013@ZenIV.linux.org.uk> (raw)
In-Reply-To: <alpine.LSU.2.00.1107171740260.1327@sister.anvils>

On Sun, Jul 17, 2011 at 06:13:34PM -0700, Hugh Dickins wrote:

> And in the genuine negative dentry case: at present (with or without
> my or Linus's patches) there is no final nd->seq check, is there?

For stat() - no, there isn't.  We really bail out with -ENOENT, no matter
what.  Racy, which is what you are hitting.

> Having found a negative dentry, we just get out at that point with
> -ENOENT.  Now, it's true that there might be a race in which the
> negative dentry is being made positive just as we're getting out;
> but since we're not digging in any deeper, that seems to me just
> a natural inevitable race, nothing to be guarded against.
>
> Whereas your "if (!*inode) goto unlazy;" is adding an additional
> sequence check, isn't it?  Is that a race you see as important?
> If so, it is a different matter than I was ever considering.

Umm...  I really don't think that trying to avoid that ->d_seq comparison
is worth doing.  I understand your point, but... IMO it's safer to leave
->d_inode cleaning as-is, at least for now.  Hell knows; I'd be a lot
more optimistic if fs/dcache.c had been in better shape.  As it is, *and*
considering that we are within a spitting distance from -final...

It's not as if we'd been talking about a heavy contention or cacheline
bouncing here, let alone doing ->lookup().  I'm not entirely convinced
that it's a valid optimization in the first place (probably is, but I'm
seriously scared by the complexity we already have there), and I'm really
not fond of the idea of dealing with whatever subtle crap we might
discover with Linus' patch.  Again, dcache is not in a healthy shape
right now; at this point dumb and straightforward is, IMO, better than
subtle and risking to step on toes of very odd code out there.

Hell, stuff that walks towards root with just rcu_read_lock, checking
->d_inode as it goes and making assumptions about the lifecycle
stages of the dentries it sees is not even all that weird, compared to
the code actually in there ;-/

Once we are done with code audit, sure, I'm fine with ->d_inode being
kept until dentry is actually freed.  Any code that relies on that thing
being cleared is asking for trouble and should be rewritten anyway.
The only thing is, it needs to be found before we rewrite it...

Speaking of weird code - could somebody explain what the deuce is going
on in dget_parent()?
        rcu_read_lock();
        ret = dentry->d_parent;
        if (!ret) {
                rcu_read_unlock();
                goto out; 
        }
in the very beginning looks wrong - we *never* have NULL ->d_parent on
dentries that might be seen by anyone; d_alloc(NULL, name) callers all
set ->d_parent to dentry itself ASAP and we never put NULL there afterwards.

What the hell is going on there?  Note that callers of dget_parent() will
be really unimpressed by getting NULL from it; most of them will cheerfully
dereference what they get.  It seems to be Nick's change; is there anybody
who would remember reasons for it?

  reply	other threads:[~2011-07-18  2:08 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-17 21:03 [PATCH] vfs: fix race in rcu lookup of pruned dentry Hugh Dickins
2011-07-17 22:00 ` Linus Torvalds
2011-07-17 22:59   ` Linus Torvalds
2011-07-17 23:26     ` Al Viro
2011-07-17 23:16   ` Al Viro
2011-07-17 23:38     ` Linus Torvalds
2011-07-17 23:47       ` Hugh Dickins
2011-07-18  0:25         ` Al Viro
2011-07-18  1:13           ` Hugh Dickins
2011-07-18  2:08             ` Al Viro [this message]
2011-07-18  6:31               ` Linus Torvalds
2011-07-18 14:41                 ` Hugh Dickins
2011-07-18 18:11                 ` Linus Torvalds
2011-07-18 18:20                   ` Al Viro
2011-07-18 19:08                     ` Linus Torvalds
2011-07-18 19:20                       ` Al Viro
2011-07-18 19:23                         ` Al Viro
2011-07-18 19:34                         ` Linus Torvalds
2011-07-18 19:04                   ` Hugh Dickins
2011-07-18 19:33                     ` Linus Torvalds
2011-07-18 19:47                       ` Al Viro
2011-07-18 20:24                         ` Linus Torvalds
2011-07-18 21:19                           ` Hugh Dickins
2011-07-18 21:42                             ` Linus Torvalds
2011-07-18 22:43                               ` Hugh Dickins
2011-07-18 23:17                                 ` Al Viro
2011-07-18 23:21                                   ` Al Viro
2011-07-18 23:27                                     ` Linus Torvalds
2011-07-18 23:40                                       ` Al Viro
2011-07-19  2:07                                         ` Hugh Dickins
2011-07-19  2:14                                           ` Linus Torvalds
2011-07-19  2:17                                             ` Linus Torvalds
2011-07-19  2:23                                               ` Al Viro
2011-07-19  2:37                                                 ` Chris Ball
2011-07-19  4:41                                                 ` Nicolas Pitre
2011-07-19  2:21                                           ` Al Viro
2011-07-19 23:45                               ` Al Viro
2011-07-19 23:52                                 ` Al Viro
2011-07-19 23:55                                   ` Al Viro
2011-07-20  0:47                                     ` NeilBrown
2011-07-20  1:40                                       ` Al Viro
2011-07-20  4:52                                         ` Linus Torvalds
2011-07-19 23:56                                 ` Linus Torvalds
2011-07-20  0:04                                   ` Al Viro
2011-07-17 23:53       ` Al Viro
2011-07-17 23:31   ` Hugh Dickins
2011-07-17 23:52     ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110718020818.GW11013@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=npiggin@kernel.dk \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).