netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/1] net: race condition when removing virtual net_device
@ 2013-09-09 23:15 Francesco Ruggeri
  2013-09-10  0:57 ` Stephen Hemminger
  0 siblings, 1 reply; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-09 23:15 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev
  Cc: Francesco Ruggeri

There is a race condition when removing a net_device while its net namespace
is also being removed.
This can result in a crash or other unpredictable behavior.
This is a sample scenario with veth, but the same applies to other virtual
devices such as vlan and macvlan.
veth pair v0-v1 is created, with v0 in namespace ns0 and v1 in ns1.
Process p0 deletes v0. v1 is also unregistered (in veth_dellink), so when p0
gets to netdev_run_todo both v0 and v1 are in net_todo_list, and they have both
been unlisted from their respective namespaces. If all references to v1 have not
already been dropped then netdev_run_todo/netdev_wait_allrefs will call netdev
notifiers for v1, releasing the rtnl lock between calls.
Now process p1 removes namespace ns1. v1 will not prevent this from happening
(in default_device_exit_batch) since it was already unlisted by p0.
Next time p0 invokes the notifiers for v1 any notifiers that use dev_net(v1)
will get a pointer to a namespace that has been or is being removed.
Similar scenarios apply with v1 as a vlan or macvlan interface and v0 as its
real device.
We hit this problem in 3.4 with sequence fib_netdev_event, fib_disable_ip,
rt_cache_flush, rt_cache_invalidate, inetpeer_invalidate_tree. That sequence no
longer applies in later kernels, but unless we can guarantee that no
NETDEV_UNREGISTER or NETDEV_UNREGISTER_FINAL handler accesses a net_device's
dev_net(dev) then there is a vulnerability (this happens for example with
NETDEV_UNREGISTER_FINAL in dst_dev_event/dst_ifdown).
Commit 0115e8e3 later made things better by reducing the chances of this
happening, but the underlying problem still seems to be there.
I would like to get some feedback on this patch.
The idea is to take a reference to the loopback_dev of a net_device's
namespace (which is always the last net_device to be removed when a namespace
is destroyed) when the net_device is unlisted, and release it when the
net_device is disposed of.
To avoid deadlocks in cleanup_net all loopback_devs are queued at the end
of net_todo_list.

Tested on Linux 3.4.

Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
---
 net/core/dev.c |   30 +++++++++++++++++++++++++++++-
 1 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 26755dd..da2fd78 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -225,10 +225,14 @@ static void list_netdevice(struct net_device *dev)
 }
 
 /* Device list removal
+ * Take a reference to dev_net(dev)->loopback_dev, so dev_net(dev)
+ * will not be freed until we are done with dev.
  * caller must respect a RCU grace period before freeing/reusing dev
  */
 static void unlist_netdevice(struct net_device *dev)
 {
+	struct net_device *loopback_dev = dev_net(dev)->loopback_dev;
+
 	ASSERT_RTNL();
 
 	/* Unlink dev from the device chain */
@@ -238,9 +242,23 @@ static void unlist_netdevice(struct net_device *dev)
 	hlist_del_rcu(&dev->index_hlist);
 	write_unlock_bh(&dev_base_lock);
 
+	if (dev != loopback_dev)
+		dev_hold(loopback_dev);
+
 	dev_base_seq_inc(dev_net(dev));
 }
 
+/**
+ * Called when a net_device that has been previously unlisted from a net
+ * namespace is disposed of.
+ */
+static inline void unlist_netdevice_done(struct net_device *dev)
+{
+	struct net_device *loopback_dev = dev_net(dev)->loopback_dev;
+	if (dev != loopback_dev)
+		dev_put(loopback_dev);
+}
+
 /*
  *	Our notifier list
  */
@@ -5009,10 +5027,17 @@ static int dev_new_index(struct net *net)
 
 /* Delayed registration/unregisteration */
 static LIST_HEAD(net_todo_list);
+static struct list_head *first_loopback_dev = &net_todo_list;
 
 static void net_set_todo(struct net_device *dev)
 {
-	list_add_tail(&dev->todo_list, &net_todo_list);
+	/* All loopback_devs go to end of net_todo_list. */
+	if (dev == dev_net(dev)->loopback_dev) {
+		list_add_tail(&dev->todo_list, &net_todo_list);
+		if (first_loopback_dev == &net_todo_list)
+			first_loopback_dev = &dev->todo_list;
+	} else
+		list_add_tail(&dev->todo_list, first_loopback_dev);
 }
 
 static void rollback_registered_many(struct list_head *head)
@@ -5641,6 +5666,7 @@ void netdev_run_todo(void)
 
 	/* Snapshot list, allow later requests */
 	list_replace_init(&net_todo_list, &list);
+	first_loopback_dev = &net_todo_list;
 
 	__rtnl_unlock();
 
@@ -5670,6 +5696,7 @@ void netdev_run_todo(void)
 		on_each_cpu(flush_backlog, dev, 1);
 
 		netdev_wait_allrefs(dev);
+		unlist_netdevice_done(dev);
 
 		/* paranoia */
 		BUG_ON(netdev_refcnt_read(dev));
@@ -6076,6 +6103,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
 
 	/* Actually switch the network namespace */
+	unlist_netdevice_done(dev);
 	dev_net_set(dev, net);
 
 	/* If there is an ifindex conflict assign a new one */
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2013-09-28 22:07 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1379008796-2121-1-git-send-email-fruggeri@aristanetworks.com>
2013-09-12 20:06 ` [PATCH 1/1] net: race condition when removing virtual net_device Eric W. Biederman
2013-09-12 21:48   ` Francesco Ruggeri
2013-09-12 22:02     ` Francesco Ruggeri
2013-09-13  5:50     ` Eric W. Biederman
2013-09-13 17:54       ` Francesco Ruggeri
2013-09-14  1:46         ` Eric W. Biederman
2013-09-16  2:54           ` Francesco Ruggeri
2013-09-16 10:45             ` Eric W. Biederman
2013-09-16 20:30               ` Francesco Ruggeri
2013-09-16 23:52                 ` [PATCH net-next] net loopback: Set loopback_dev to NULL when freed Eric W. Biederman
2013-09-17  0:50                   ` Eric Dumazet
2013-09-17  1:34                     ` David Miller
2013-09-17  1:41                       ` Eric Dumazet
2013-09-17  1:52                         ` Eric W. Biederman
2013-09-17 23:05                           ` David Miller
2013-09-17  0:25                 ` [PATCH 1/1] net: race condition when removing virtual net_device Eric W. Biederman
2013-09-17  5:12                   ` Francesco Ruggeri
2013-09-17  3:49                 ` [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering Eric W. Biederman
2013-09-17  6:54                   ` Francesco Ruggeri
2013-09-17  9:38                     ` Eric W. Biederman
2013-09-17 17:14                       ` Francesco Ruggeri
2013-09-17 23:21                   ` David Miller
2013-09-17 23:41                     ` Eric W. Biederman
2013-09-18  0:15                       ` David Miller
2013-09-18  3:50                         ` Francesco Ruggeri
2013-09-18  3:52                           ` David Miller
2013-09-18  8:19                             ` Eric W. Biederman
2013-09-20 16:34                               ` Francesco Ruggeri
2013-09-24  4:19                             ` [PATCH] net: Delay default_device_exit_batch until no devices are unregistering v2 Eric W. Biederman
2013-09-24 17:54                               ` Francesco Ruggeri
2013-09-28 22:14                               ` David Miller
2013-09-14  0:16   ` [PATCH 1/1] net: race condition when removing virtual net_device David Miller
2013-09-14  1:32     ` Eric W. Biederman
2013-09-09 23:15 Francesco Ruggeri
2013-09-10  0:57 ` Stephen Hemminger
2013-09-10  2:03   ` Francesco Ruggeri

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).