linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Nick Piggin <npiggin@kernel.dk>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] vfs: fix race in rcu lookup of pruned dentry
Date: Sun, 17 Jul 2011 15:00:06 -0700	[thread overview]
Message-ID: <CA+55aFw5LhGq-a6mAhgZbGP2FJ+tnV9HRxKoiFuVYbttOo2tNw@mail.gmail.com> (raw)
In-Reply-To: <alpine.LSU.2.00.1107171356480.3175@sister.anvils>

On Sun, Jul 17, 2011 at 2:03 PM, Hugh Dickins <hughd@google.com> wrote:
>
> That -ENOENT in walk_component: isn't it assuming we found a negative
> dentry, before reaching the read_seqcount_retry which complete_walk
> (or nameidata_drop_rcu_last before 3.0) would use to confirm a successful
> lookup?

Hmm. I think you're right. The ENOENT will basically short-circuit the
full proper checks.

>  And can't memory pressure prune a dentry, coming to dentry_kill
> which __d_drops to unhash before dentry_iput resets d_inode to NULL, but
> the dentry_rcuwalk_barrier between those is ineffective if the other end
> ignores the seqcount?

Yes. However, looking at it, I'm not very happy with your patch. It
doesn't really make sense to me to special-case the NULL inode and
only do a seq_retry for that case.

I kind of see why you do it for that particular bug, but at the same
time, it just makes me go "Eww". If that inode isn't NULL yet, you
then return the dentry that can get a NULL d_inode later. So the only
special thing there is that we happen to check for a NULL inode there.
What protects *later* checks for a NULL d_inode?

So my gut feel is that we should instead

 - either remove the -ENOENT return at that point entirely, and move
it to after we have re-verified the dentry lookup for other reasons.
That looks pretty involved, though, and those paths do end up
accessing inode data structures etc, so it looks less than trivial.

OR

 - simply just not clear d_inode at all in d_kill(), so that when we
prune a dentry due to memory pressure, it doesn't actually change the
state of the dentry.

and I think the second solution is the right one. It's kind of odd:
we'll have called down to the iput() routine, and the inode will be
"gone", but that is already true for the *normal* race of actually
deleting a file too, and we have that whole "inodes are RCU-released",
so the inode allocation will still exist.

So my gut feel is that we should instead just do this:

  --- a/fs/dcache.c
  +++ b/fs/dcache.c
  @@ -187,7 +187,6 @@ static void dentry_iput(struct dentry * dentry)
   {
          struct inode *inode = dentry->d_inode;
          if (inode) {
  -               dentry->d_inode = NULL;
                  list_del_init(&dentry->d_alias);
                  spin_unlock(&dentry->d_lock);
                  spin_unlock(&inode->i_lock);

and see what the fall-out from that would be. Nobody should then *use*
the stale inode, because __d_drop has done that
dentry_rcuwalk_barrier(). So we avoid the NULL inode special case
entirely.

Comments?

The above (whitespace-damaged) patch may look trivial, but it is
*entirely* untested, and maybe my gut feel that the above is the right
way to solve the problem is just wrong.

Al, any reactions? Hugh, does the above patch work for your
stress-test case? Or, indeed, at all?

                       Linus

  reply	other threads:[~2011-07-17 22:00 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-17 21:03 [PATCH] vfs: fix race in rcu lookup of pruned dentry Hugh Dickins
2011-07-17 22:00 ` Linus Torvalds [this message]
2011-07-17 22:59   ` Linus Torvalds
2011-07-17 23:26     ` Al Viro
2011-07-17 23:16   ` Al Viro
2011-07-17 23:38     ` Linus Torvalds
2011-07-17 23:47       ` Hugh Dickins
2011-07-18  0:25         ` Al Viro
2011-07-18  1:13           ` Hugh Dickins
2011-07-18  2:08             ` Al Viro
2011-07-18  6:31               ` Linus Torvalds
2011-07-18 14:41                 ` Hugh Dickins
2011-07-18 18:11                 ` Linus Torvalds
2011-07-18 18:20                   ` Al Viro
2011-07-18 19:08                     ` Linus Torvalds
2011-07-18 19:20                       ` Al Viro
2011-07-18 19:23                         ` Al Viro
2011-07-18 19:34                         ` Linus Torvalds
2011-07-18 19:04                   ` Hugh Dickins
2011-07-18 19:33                     ` Linus Torvalds
2011-07-18 19:47                       ` Al Viro
2011-07-18 20:24                         ` Linus Torvalds
2011-07-18 21:19                           ` Hugh Dickins
2011-07-18 21:42                             ` Linus Torvalds
2011-07-18 22:43                               ` Hugh Dickins
2011-07-18 23:17                                 ` Al Viro
2011-07-18 23:21                                   ` Al Viro
2011-07-18 23:27                                     ` Linus Torvalds
2011-07-18 23:40                                       ` Al Viro
2011-07-19  2:07                                         ` Hugh Dickins
2011-07-19  2:14                                           ` Linus Torvalds
2011-07-19  2:17                                             ` Linus Torvalds
2011-07-19  2:23                                               ` Al Viro
2011-07-19  2:37                                                 ` Chris Ball
2011-07-19  4:41                                                 ` Nicolas Pitre
2011-07-19  2:21                                           ` Al Viro
2011-07-19 23:45                               ` Al Viro
2011-07-19 23:52                                 ` Al Viro
2011-07-19 23:55                                   ` Al Viro
2011-07-20  0:47                                     ` NeilBrown
2011-07-20  1:40                                       ` Al Viro
2011-07-20  4:52                                         ` Linus Torvalds
2011-07-19 23:56                                 ` Linus Torvalds
2011-07-20  0:04                                   ` Al Viro
2011-07-17 23:53       ` Al Viro
2011-07-17 23:31   ` Hugh Dickins
2011-07-17 23:52     ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+55aFw5LhGq-a6mAhgZbGP2FJ+tnV9HRxKoiFuVYbttOo2tNw@mail.gmail.com \
    --to=torvalds@linux-foundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=npiggin@kernel.dk \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).