linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yun Levi <ppbuk5246@gmail.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Guillaume Nault <gnault@redhat.com>,
	Nicolas Dichtel <nicolas.dichtel@6wind.com>,
	Eric Dumazet <edumazet@google.com>,
	Li RongQing <lirongqing@baidu.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Johannes Berg <johannes.berg@intel.com>,
	David Howells <dhowells@redhat.com>,
	daniel@iogearbox.net, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH] netns: dangling pointer on netns bind mount point.
Date: Wed, 8 Apr 2020 14:59:17 +0900	[thread overview]
Message-ID: <CAM7-yPS_xh54H9M7B8-tAmPM4+w0VgnruJhK509upsDgZvcNhg@mail.gmail.com> (raw)
In-Reply-To: <20200407182609.GA23230@ZenIV.linux.org.uk>

Thank you for great comments. Thanks to you I understand what i missed.

I try to generate problem on mainline But, as you explained that
situation isn't happen,

Maybe my other things which I made generate some problem (freeing
network namespace..)

Thanks for great answering and sharing.

If I meet the situation, at that time I'll share. Thank you very much!

P.S. If I have a question, Could I ask via e-mail like this?

On Wed, Apr 8, 2020 at 3:26 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Apr 07, 2020 at 09:53:29PM +0900, Yun Levi wrote:
> > BTW, It's my question.
> >
> > >Look: we call ns_get_path(), which calls ns_get_path_cb(), which
> > >calls the callback passed to it (ns_get_path_task(), in this case),
> > >which grabs a reference to ns.  Then we pass that reference to
> > >__ns_get_path().
> > >
> > >__ns_get_path() looks for dentry stashed in ns.  If there is one
> > >and it's still alive, we grab a reference to that dentry and drop
> > >the reference to ns - no new inodes had been created, so no new
> > >namespace references have appeared.  Existing inode is pinned
> > >by dentry and dentry is pinned by _dentry_ reference we've got.
> >
> > actually ns_get_path is called in unshare(2).
>
> Yes, it does.  Via perf_event_namespaces(), which does
>         perf_fill_ns_link_info(&ns_link_info[NET_NS_INDEX],
>                                task, &netns_operations);
> and there we have
>         error = ns_get_path(&ns_path, task, ns_ops);
>         if (!error) {
>                 ns_inode = ns_path.dentry->d_inode;
>                 ns_link_info->dev = new_encode_dev(ns_inode->i_sb->s_dev);
>                 ns_link_info->ino = ns_inode->i_ino;
>                 path_put(&ns_path);
>         }
> See that path_put()?  Dentry reference is dropped by it.
>
> > and it makes new dentry and
> > inode in __ns_get_path finally (Cuz it create new network namespace)
> >
> > In that case, when I mount with --bind option to this proc/self/ns/net, it
> > only increase dentry refcount on do_loopback->clone_mnt finally (not call
> > netns_operation->get)
> > That means it's not increase previous created network namespace reference
> > count but only increase reference count of _dentry_
> >
> > In that situation, If I exit the child process it definitely frees the
> > net_namespace previous created at the same time it decrease net_namespace's
> > refcnt in exit_task_namespace().
> > It means it's possible that bind mount point can hold the dentry with inode
> > having net_namespace's dangling pointer in another process.
> > In above situation, parent who know that binded mount point calls setns(2)
> > then it sets the net_namespace which is refered by the inode of the dentry
> > increased by do_loopback.
> > That makes set the net_namespace which was freed already.
>
> How?  Netns reference in inode contributes to netns refcount.  And it's held
> for as long as the _inode_ has positive refcount - we only drop it from
> the inode destructor, *NOT* every time we drop a reference to inode.
> In the similar fashion, the inode reference in dentry contributes to inode
> refcount.  And again, that inode reference won't be dropped until the _last_
> reference to dentry gets dropped.
>
> Incrementing refcount of dentry is enough to pin the inode and thus the
> netns the inode refers to.  It's a very common pattern with refcounting;
> a useful way of thinking about it is to consider the refcount of e.g.
> inode as sum of several components, one of them being "number of struct
> dentry instances with ->d_inode pointing to our inode".  And look at e.g.
> __ns_get_path() like this:
>         rcu_read_lock();
>         d = atomic_long_read(&ns->stashed);
>         if (!d)
>                 goto slow;
>         dentry = (struct dentry *)d;
>         if (!lockref_get_not_dead(&dentry->d_lockref))
>                 goto slow;
> other_count(dentry) got incremented by 1.
>         rcu_read_unlock();
>         ns->ops->put(ns);
> other_count(ns) decremented by 1.
> got_it:
>         path->mnt = mntget(mnt);
>         path->dentry = dentry;
> path added to paths(dentry), other_count(dentry) decremented by 1 (getting
> it back to the original value).
>         return 0;
> slow:
>         rcu_read_unlock();
>         inode = new_inode_pseudo(mnt->mnt_sb);
>         if (!inode) {
>                 ns->ops->put(ns);
> subtract 1 from other_count(ns)
>                 return -ENOMEM;
>         }
> dentries(inode) = empty
> other_count(inode) = 1
>         ....
>         inode->i_private = ns;
> add inode to inodes(ns), subtract 1 from other_count(ns); the total
> is unchanged.
>         dentry = d_alloc_anon(mnt->mnt_sb);
>         if (!dentry) {
>                 iput(inode);
> subtract 1 from other_count(inode).  Since now all components of
> inode refcount are zero, inode gets destroyed.  Destructor calls
> nsfs_evict_inode(), which removes the inode from inodes(ns).
> The total effect: inode is destroyed, inodes(ns) is back to what
> it used to be and other_count(ns) is left decremented by 1 compared
> to what we used to have.  IOW, the balance is the same as if inode
> allocation would've failed.
>                 return -ENOMEM;
>         }
> other_count(dentry) = 1
>         d_instantiate(dentry, inode);
> add dentry to dentries(inode), subtract 1 from other_count(inode).
> The total is unchanged.  Now other_count(inode) is 0 and dentries(inode)
> is {dentry}.
>         d = atomic_long_cmpxchg(&ns->stashed, 0, (unsigned long)dentry);
>         if (d) {
> somebody else has gotten there first
>                 d_delete(dentry);       /* make sure ->d_prune() does nothing */
>                 dput(dentry);
> subtract 1 from other_count(dentry) (which will drive it to 0).  Since
> no other references exist, dentry gets destroyed.  Destructor will
> remove dentry from dentries(inode) and since other_count(inode) is already
> zero, trigger destruction of inode.  That, in turn, will remove inode
> from inodes(ns).  Total effect: dentry is destroyed, inode is destroyed,
> inodes(ns) is back to what it used to be, other_count(ns) is left decremented
> by 1 compared to what we used to have.
>                 cpu_relax();
>                 return -EAGAIN;
>         }
>         goto got_it;
> got_it:
>         path->mnt = mntget(mnt);
>         path->dentry = dentry;
> add path to paths(dentry), subtract 1 from other_count(dentry).  At that
> point other_count(dentry) is back to 0, ditto for other_count(inode) and
> other_count(ns) is left decremented by 1 compared to what it used to be.
> inode is added to inodes(ns), dentry - to dentries(inode) and path - to
> paths(dentry).
>         return 0;
> and we are done.
>
> In all cases the total effect is the same as far as "other" counts go:
> other_count(ns) is down by 1 and that's the only change in other_count()
> of *ANY* objects.  Of course we do not keep track of the sets explicitly;
> it would cost too much and we only interested in the sum of their sizes
> anyway.  What we actually store is the sum, so operations like "transfer
> the reference from one component to another" are not immediately obvious
> to be refcounting ones - the sum is unchanged.  Conceptually, though,
> they are refcounting operations.
>         Up to d_instantiate() we are holding a reference to inode;
> after that we are *not* - it has been transferred to dentry.  That's
> why on subsequent failure exits we do not call iput() - the inode
> reference is not ours to discard anymore.
>         In the same way, up to inode->i_private = ns; we are holding
> a reference to ns.  After that we are not - it's been transferred to
> inode.  From that point on it's not ours to discard; it will be
> dropped when inode gets destroyed, whenever that happens.

  reply	other threads:[~2020-04-08  5:59 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-07  2:35 [PATCH] netns: dangling pointer on netns bind mount point Levi
2020-04-07  3:05 ` Al Viro
2020-04-07  3:13   ` Al Viro
2020-04-07  3:29     ` Yun Levi
2020-04-07  4:03       ` Al Viro
     [not found]         ` <CAM7-yPRaQsNgZKjru40nM1N_u8HVLVKmJCAzu20DcPL=jzKjWQ@mail.gmail.com>
2020-04-07 12:57           ` Fwd: " Yun Levi
2020-04-07 18:26           ` Al Viro
2020-04-08  5:59             ` Yun Levi [this message]
2020-04-08 13:48               ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAM7-yPS_xh54H9M7B8-tAmPM4+w0VgnruJhK509upsDgZvcNhg@mail.gmail.com \
    --to=ppbuk5246@gmail.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dhowells@redhat.com \
    --cc=edumazet@google.com \
    --cc=gnault@redhat.com \
    --cc=johannes.berg@intel.com \
    --cc=kuba@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lirongqing@baidu.com \
    --cc=netdev@vger.kernel.org \
    --cc=nicolas.dichtel@6wind.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).