From: Yun Levi <ppbuk5246@gmail.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Guillaume Nault <gnault@redhat.com>,
Nicolas Dichtel <nicolas.dichtel@6wind.com>,
Eric Dumazet <edumazet@google.com>,
Li RongQing <lirongqing@baidu.com>,
Thomas Gleixner <tglx@linutronix.de>,
Johannes Berg <johannes.berg@intel.com>,
David Howells <dhowells@redhat.com>,
daniel@iogearbox.net, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH] netns: dangling pointer on netns bind mount point.
Date: Wed, 8 Apr 2020 14:59:17 +0900 [thread overview]
Message-ID: <CAM7-yPS_xh54H9M7B8-tAmPM4+w0VgnruJhK509upsDgZvcNhg@mail.gmail.com> (raw)
In-Reply-To: <20200407182609.GA23230@ZenIV.linux.org.uk>
Thank you for great comments. Thanks to you I understand what i missed.
I try to generate problem on mainline But, as you explained that
situation isn't happen,
Maybe my other things which I made generate some problem (freeing
network namespace..)
Thanks for great answering and sharing.
If I meet the situation, at that time I'll share. Thank you very much!
P.S. If I have a question, Could I ask via e-mail like this?
On Wed, Apr 8, 2020 at 3:26 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Apr 07, 2020 at 09:53:29PM +0900, Yun Levi wrote:
> > BTW, It's my question.
> >
> > >Look: we call ns_get_path(), which calls ns_get_path_cb(), which
> > >calls the callback passed to it (ns_get_path_task(), in this case),
> > >which grabs a reference to ns. Then we pass that reference to
> > >__ns_get_path().
> > >
> > >__ns_get_path() looks for dentry stashed in ns. If there is one
> > >and it's still alive, we grab a reference to that dentry and drop
> > >the reference to ns - no new inodes had been created, so no new
> > >namespace references have appeared. Existing inode is pinned
> > >by dentry and dentry is pinned by _dentry_ reference we've got.
> >
> > actually ns_get_path is called in unshare(2).
>
> Yes, it does. Via perf_event_namespaces(), which does
> perf_fill_ns_link_info(&ns_link_info[NET_NS_INDEX],
> task, &netns_operations);
> and there we have
> error = ns_get_path(&ns_path, task, ns_ops);
> if (!error) {
> ns_inode = ns_path.dentry->d_inode;
> ns_link_info->dev = new_encode_dev(ns_inode->i_sb->s_dev);
> ns_link_info->ino = ns_inode->i_ino;
> path_put(&ns_path);
> }
> See that path_put()? Dentry reference is dropped by it.
>
> > and it makes new dentry and
> > inode in __ns_get_path finally (Cuz it create new network namespace)
> >
> > In that case, when I mount with --bind option to this proc/self/ns/net, it
> > only increase dentry refcount on do_loopback->clone_mnt finally (not call
> > netns_operation->get)
> > That means it's not increase previous created network namespace reference
> > count but only increase reference count of _dentry_
> >
> > In that situation, If I exit the child process it definitely frees the
> > net_namespace previous created at the same time it decrease net_namespace's
> > refcnt in exit_task_namespace().
> > It means it's possible that bind mount point can hold the dentry with inode
> > having net_namespace's dangling pointer in another process.
> > In above situation, parent who know that binded mount point calls setns(2)
> > then it sets the net_namespace which is refered by the inode of the dentry
> > increased by do_loopback.
> > That makes set the net_namespace which was freed already.
>
> How? Netns reference in inode contributes to netns refcount. And it's held
> for as long as the _inode_ has positive refcount - we only drop it from
> the inode destructor, *NOT* every time we drop a reference to inode.
> In the similar fashion, the inode reference in dentry contributes to inode
> refcount. And again, that inode reference won't be dropped until the _last_
> reference to dentry gets dropped.
>
> Incrementing refcount of dentry is enough to pin the inode and thus the
> netns the inode refers to. It's a very common pattern with refcounting;
> a useful way of thinking about it is to consider the refcount of e.g.
> inode as sum of several components, one of them being "number of struct
> dentry instances with ->d_inode pointing to our inode". And look at e.g.
> __ns_get_path() like this:
> rcu_read_lock();
> d = atomic_long_read(&ns->stashed);
> if (!d)
> goto slow;
> dentry = (struct dentry *)d;
> if (!lockref_get_not_dead(&dentry->d_lockref))
> goto slow;
> other_count(dentry) got incremented by 1.
> rcu_read_unlock();
> ns->ops->put(ns);
> other_count(ns) decremented by 1.
> got_it:
> path->mnt = mntget(mnt);
> path->dentry = dentry;
> path added to paths(dentry), other_count(dentry) decremented by 1 (getting
> it back to the original value).
> return 0;
> slow:
> rcu_read_unlock();
> inode = new_inode_pseudo(mnt->mnt_sb);
> if (!inode) {
> ns->ops->put(ns);
> subtract 1 from other_count(ns)
> return -ENOMEM;
> }
> dentries(inode) = empty
> other_count(inode) = 1
> ....
> inode->i_private = ns;
> add inode to inodes(ns), subtract 1 from other_count(ns); the total
> is unchanged.
> dentry = d_alloc_anon(mnt->mnt_sb);
> if (!dentry) {
> iput(inode);
> subtract 1 from other_count(inode). Since now all components of
> inode refcount are zero, inode gets destroyed. Destructor calls
> nsfs_evict_inode(), which removes the inode from inodes(ns).
> The total effect: inode is destroyed, inodes(ns) is back to what
> it used to be and other_count(ns) is left decremented by 1 compared
> to what we used to have. IOW, the balance is the same as if inode
> allocation would've failed.
> return -ENOMEM;
> }
> other_count(dentry) = 1
> d_instantiate(dentry, inode);
> add dentry to dentries(inode), subtract 1 from other_count(inode).
> The total is unchanged. Now other_count(inode) is 0 and dentries(inode)
> is {dentry}.
> d = atomic_long_cmpxchg(&ns->stashed, 0, (unsigned long)dentry);
> if (d) {
> somebody else has gotten there first
> d_delete(dentry); /* make sure ->d_prune() does nothing */
> dput(dentry);
> subtract 1 from other_count(dentry) (which will drive it to 0). Since
> no other references exist, dentry gets destroyed. Destructor will
> remove dentry from dentries(inode) and since other_count(inode) is already
> zero, trigger destruction of inode. That, in turn, will remove inode
> from inodes(ns). Total effect: dentry is destroyed, inode is destroyed,
> inodes(ns) is back to what it used to be, other_count(ns) is left decremented
> by 1 compared to what we used to have.
> cpu_relax();
> return -EAGAIN;
> }
> goto got_it;
> got_it:
> path->mnt = mntget(mnt);
> path->dentry = dentry;
> add path to paths(dentry), subtract 1 from other_count(dentry). At that
> point other_count(dentry) is back to 0, ditto for other_count(inode) and
> other_count(ns) is left decremented by 1 compared to what it used to be.
> inode is added to inodes(ns), dentry - to dentries(inode) and path - to
> paths(dentry).
> return 0;
> and we are done.
>
> In all cases the total effect is the same as far as "other" counts go:
> other_count(ns) is down by 1 and that's the only change in other_count()
> of *ANY* objects. Of course we do not keep track of the sets explicitly;
> it would cost too much and we only interested in the sum of their sizes
> anyway. What we actually store is the sum, so operations like "transfer
> the reference from one component to another" are not immediately obvious
> to be refcounting ones - the sum is unchanged. Conceptually, though,
> they are refcounting operations.
> Up to d_instantiate() we are holding a reference to inode;
> after that we are *not* - it has been transferred to dentry. That's
> why on subsequent failure exits we do not call iput() - the inode
> reference is not ours to discard anymore.
> In the same way, up to inode->i_private = ns; we are holding
> a reference to ns. After that we are not - it's been transferred to
> inode. From that point on it's not ours to discard; it will be
> dropped when inode gets destroyed, whenever that happens.
next prev parent reply other threads:[~2020-04-08 5:59 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-07 2:35 [PATCH] netns: dangling pointer on netns bind mount point Levi
2020-04-07 3:05 ` Al Viro
2020-04-07 3:13 ` Al Viro
2020-04-07 3:29 ` Yun Levi
2020-04-07 4:03 ` Al Viro
[not found] ` <CAM7-yPRaQsNgZKjru40nM1N_u8HVLVKmJCAzu20DcPL=jzKjWQ@mail.gmail.com>
2020-04-07 12:57 ` Fwd: " Yun Levi
2020-04-07 18:26 ` Al Viro
2020-04-08 5:59 ` Yun Levi [this message]
2020-04-08 13:48 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAM7-yPS_xh54H9M7B8-tAmPM4+w0VgnruJhK509upsDgZvcNhg@mail.gmail.com \
--to=ppbuk5246@gmail.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=dhowells@redhat.com \
--cc=edumazet@google.com \
--cc=gnault@redhat.com \
--cc=johannes.berg@intel.com \
--cc=kuba@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lirongqing@baidu.com \
--cc=netdev@vger.kernel.org \
--cc=nicolas.dichtel@6wind.com \
--cc=tglx@linutronix.de \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).