From mboxrd@z Thu Jan  1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
Subject: Re: [PATCH 1/3] net: Fix possible race in peernet2id_alloc()
Date: Thu, 21 Dec 2017 11:39:44 -0600
Message-ID: <87ind0ou8v.fsf@xmission.com>
References: <151386201910.3724.7199367937841370542.stgit@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain
Cc: netdev@vger.kernel.org, davem@davemloft.net, eric.dumazet@gmail.com
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Return-path: <netdev-owner@vger.kernel.org>
Received: from out02.mta.xmission.com ([166.70.13.232]:59371 "EHLO
        out02.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752483AbdLURkM (ORCPT
        <rfc822;netdev@vger.kernel.org>); Thu, 21 Dec 2017 12:40:12 -0500
In-Reply-To: <151386201910.3724.7199367937841370542.stgit@localhost.localdomain>
        (Kirill Tkhai's message of "Thu, 21 Dec 2017 16:13:52 +0300")
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

Kirill Tkhai <ktkhai@virtuozzo.com> writes:

> peernet2id_alloc() is racy without rtnl_lock() as atomic_read(&peer->count)
> under net->nsid_lock does not guarantee, peer is alive:
>
> rcu_read_lock()
> peernet2id_alloc()                            ..
>   spin_lock_bh(&net->nsid_lock)               ..
>   atomic_read(&peer->count) == 1              ..
>   ..                                          put_net()
>   ..                                            cleanup_net()
>   ..                                              for_each_net(tmp)
>   ..                                                spin_lock_bh(&tmp->nsid_lock)
>   ..                                                __peernet2id(tmp, net) == -1
>   ..                                                    ..
>   ..                                                    ..
>     __peernet2id_alloc(alloc == true)                   ..
>   ..                                                    ..
> rcu_read_unlock()                                       ..
> ..                                                synchronize_rcu()
> ..                                                kmem_cache_free(net)
>
> After the above situation, net::netns_id contains id pointing to freed memory,
> and any other dereferencing by the id will operate with this freed memory.
>
> Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except
> ovs_vport_cmd_fill_info(), and this race can't occur. But peernet2id_alloc()
> is generic interface, and better we fix it before someone really starts
> use it in wrong context.

So it comes down to this piece of code from ovs and just let me say ick.
	if (!net_eq(net, dev_net(vport->dev))) {
		int id = peernet2id_alloc(net, dev_net(vport->dev));

		if (nla_put_s32(skb, OVS_VPORT_ATTR_NETNSID, id))
			goto nla_put_failure;
	}

Without the rtnl lock dev_net can cange between the test and the
call of peernet2id_alloc.

At first glance it looks like the bug is that we are running a control
path of the networking stack without the rtnl lock. So it may be that
ASSERT_RTNL() is the better fix.

Given that it would be nice to reduce the scope of the rtnl lock this
might not be a bad direction.  Let me see.

Is rtnl_notify safe without the rtnl lock?


>
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
>  net/core/net_namespace.c |   23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 60a71be75aea..6a4eab438221 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -221,17 +221,32 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int id);
>   */
>  int peernet2id_alloc(struct net *net, struct net *peer)
>  {
> -	bool alloc;
> +	bool alloc = false, alive = false;
>  	int id;

        ^^^ Perhaps we want "ASSERT_RTNL();" here?
>  
> -	if (atomic_read(&net->count) == 0)
> -		return NETNSA_NSID_NOT_ASSIGNED;

Moving this hunk is of no benefit.  The code must be called with a valid
reference to net.   Which means net->count is a fancy way of testing to
see if the code is in cleanup_net.  In all other cases net->count should
be non-zero and it should remain that way because of our caller must
keep a reference.

>  	spin_lock_bh(&net->nsid_lock);
> -	alloc = atomic_read(&peer->count) == 0 ? false : true;
> +	/* Spinlock guarantees we never hash a peer to net->netns_ids
> +	 * after idr_destroy(&net->netns_ids) occurs in cleanup_net().
> +	 */
> +	if (atomic_read(&net->count) == 0) {
> +		id = NETNSA_NSID_NOT_ASSIGNED;
> +		goto unlock;
> +	}
> +	/*
> +	 * When peer is obtained from RCU lists, we may race with
> +	 * its cleanup. Check whether it's alive, and this guarantees
> +	 * we never hash a peer back to net->netns_ids, after it has
> +	 * just been idr_remove()'d from there in cleanup_net().
> +	 */
> +	if (maybe_get_net(peer))
> +		alive = alloc = true;

Yes this does seem reasonable.  The more obvious looking code which
would return NETNSA_NSID_NOT_ASSIGNED if the peer has a count of 0, is
silly as it makes would make it appear that a peer is momentary outside
of a network namespace when the peer is in fact moving from one network
namespace to another.
        
>  	id = __peernet2id_alloc(net, peer, &alloc);
> +unlock:
>  	spin_unlock_bh(&net->nsid_lock);
>  	if (alloc && id >= 0)
>  		rtnl_net_notifyid(net, RTM_NEWNSID, id);
                ^^^^^^
                Is this safe without the rtnl lock?
> +	if (alive)
> +		put_net(peer);
>  	return id;
>  }
>  EXPORT_SYMBOL_GPL(peernet2id_alloc);

Eric