netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown
@ 2016-02-04 23:35 Salam Noureddine
  2016-02-04 23:35 ` [PATCH v2 net-next 1/4] net: add event_list to struct net and provide utility functions Salam Noureddine
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Salam Noureddine @ 2016-02-04 23:35 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Julian Anastasov, netdev
  Cc: Salam Noureddine

Added changes suggested by Julian Anastasov in version 2.

fib_flush walks the whole fib in a net_namespace and is called for
each net_device being closed or unregistered. This can be very expensive
when dealing with 100k or more routes in the fib and removal of a lot
of interfaces. These four patches deal with this issue by calling fib_flush
just once for each net namespace and introduce a new function arp_ifdown_all
that does a similar optimization for the neighbour table.

The benchmark tests were run on linux-3.18.

Salam Noureddine (4):
  net: add event_list to struct net and provide utility functions
  net: dev: add batching to net_device notifiers
  net: core: introduce neigh_ifdown_all for all down interfaces
  net: fib: avoid calling fib_flush for each device when doing batch
    close     and unregister

 include/linux/netdevice.h   |  2 ++
 include/net/arp.h           |  1 +
 include/net/neighbour.h     |  1 +
 include/net/net_namespace.h | 22 +++++++++++++++++++++
 include/net/netns/ipv4.h    |  1 +
 net/core/dev.c              | 48 ++++++++++++++++++++++++++++++++++++++++-----
 net/core/neighbour.c        | 38 ++++++++++++++++++++++++++++-------
 net/core/net_namespace.c    |  1 +
 net/ipv4/arp.c              |  4 ++++
 net/ipv4/fib_frontend.c     | 16 +++++++++++++--
 10 files changed, 120 insertions(+), 14 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 net-next 1/4] net: add event_list to struct net and provide utility functions
  2016-02-04 23:35 [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
@ 2016-02-04 23:35 ` Salam Noureddine
  2016-02-04 23:35 ` [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers Salam Noureddine
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Salam Noureddine @ 2016-02-04 23:35 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Julian Anastasov, netdev
  Cc: Salam Noureddine


Signed-off-by: Salam Noureddine <noureddine@arista.com>
---
 include/net/net_namespace.h | 22 ++++++++++++++++++++++
 net/core/net_namespace.c    |  1 +
 2 files changed, 23 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..6dbc0b2 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -58,6 +58,7 @@ struct net {
 	struct list_head	list;		/* list of network namespaces */
 	struct list_head	cleanup_list;	/* namespaces on death row */
 	struct list_head	exit_list;	/* Use only net_mutex */
+	struct list_head	event_list;     /* net_device notifier list */
 
 	struct user_namespace   *user_ns;	/* Owning user namespace */
 	spinlock_t		nsid_lock;
@@ -380,4 +381,25 @@ static inline void fnhe_genid_bump(struct net *net)
 	atomic_inc(&net->fnhe_genid);
 }
 
+#ifdef CONFIG_NET_NS
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+	if (list_empty(&net->event_list))
+		list_add_tail(&net->event_list, head);
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+	list_del_init(&net->event_list);
+}
+#else
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+}
+#endif
+
 #endif /* __NET_NET_NAMESPACE_H */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2c2eb1b..58e84ce 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -282,6 +282,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	net->user_ns = user_ns;
 	idr_init(&net->netns_ids);
 	spin_lock_init(&net->nsid_lock);
+	INIT_LIST_HEAD(&net->event_list);
 
 	list_for_each_entry(ops, &pernet_list, list) {
 		error = ops_init(ops, net);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers
  2016-02-04 23:35 [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
  2016-02-04 23:35 ` [PATCH v2 net-next 1/4] net: add event_list to struct net and provide utility functions Salam Noureddine
@ 2016-02-04 23:35 ` Salam Noureddine
  2016-02-06 18:58   ` Julian Anastasov
  2016-02-04 23:35 ` [PATCH v2 net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces Salam Noureddine
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Salam Noureddine @ 2016-02-04 23:35 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Julian Anastasov, netdev
  Cc: Salam Noureddine

This can be used to optimize bringing down and unregsitering
net_devices by running certain cleanup operations only on the
net namespace instead of on each net_device.

Signed-off-by: Salam Noureddine <noureddine@arista.com>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 48 ++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c20b814..1b12269 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2183,6 +2183,8 @@ struct netdev_lag_lower_state_info {
 #define NETDEV_BONDING_INFO	0x0019
 #define NETDEV_PRECHANGEUPPER	0x001A
 #define NETDEV_CHANGELOWERSTATE	0x001B
+#define NETDEV_UNREGISTER_BATCH	0x001C
+#define NETDEV_DOWN_BATCH	0x001D
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 914b4a2..dbd8995 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1439,6 +1439,8 @@ static int __dev_close(struct net_device *dev)
 int dev_close_many(struct list_head *head, bool unlink)
 {
 	struct net_device *dev, *tmp;
+	struct net *net, *net_tmp;
+	LIST_HEAD(net_head);
 
 	/* Remove the devices that don't need to be closed */
 	list_for_each_entry_safe(dev, tmp, head, close_list)
@@ -1447,13 +1449,22 @@ int dev_close_many(struct list_head *head, bool unlink)
 
 	__dev_close_many(head);
 
-	list_for_each_entry_safe(dev, tmp, head, close_list) {
+	list_for_each_entry(dev, head, close_list) {
 		rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL);
 		call_netdevice_notifiers(NETDEV_DOWN, dev);
+	}
+
+	list_for_each_entry_safe(dev, tmp, head, close_list) {
+		net_add_event_list(&net_head, dev_net(dev));
 		if (unlink)
 			list_del_init(&dev->close_list);
 	}
 
+	list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
+		call_netdevice_notifiers(NETDEV_DOWN_BATCH, net->loopback_dev);
+		net_del_event_list(net);
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL(dev_close_many);
@@ -1572,12 +1583,17 @@ rollback:
 				call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
 							dev);
 				call_netdevice_notifier(nb, NETDEV_DOWN, dev);
+				call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
+							dev);
 			}
 			call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
 		}
+		call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
+					net->loopback_dev);
 	}
 
 outroll:
+	call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last);
 	raw_notifier_chain_unregister(&netdev_chain, nb);
 	goto unlock;
 }
@@ -1614,9 +1630,13 @@ int unregister_netdevice_notifier(struct notifier_block *nb)
 				call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
 							dev);
 				call_netdevice_notifier(nb, NETDEV_DOWN, dev);
+				call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
+							dev);
 			}
 			call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
 		}
+		call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
+					net->loopback_dev);
 	}
 unlock:
 	rtnl_unlock();
@@ -6187,10 +6207,12 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags,
 		rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC);
 
 	if (changes & IFF_UP) {
-		if (dev->flags & IFF_UP)
+		if (dev->flags & IFF_UP) {
 			call_netdevice_notifiers(NETDEV_UP, dev);
-		else
+		} else {
 			call_netdevice_notifiers(NETDEV_DOWN, dev);
+			call_netdevice_notifiers(NETDEV_DOWN_BATCH, dev);
+		}
 	}
 
 	if (dev->flags & IFF_UP &&
@@ -6427,7 +6449,9 @@ static void net_set_todo(struct net_device *dev)
 static void rollback_registered_many(struct list_head *head)
 {
 	struct net_device *dev, *tmp;
+	struct net *net, *net_tmp;
 	LIST_HEAD(close_head);
+	LIST_HEAD(net_head);
 
 	BUG_ON(dev_boot_phase);
 	ASSERT_RTNL();
@@ -6465,8 +6489,6 @@ static void rollback_registered_many(struct list_head *head)
 	synchronize_net();
 
 	list_for_each_entry(dev, head, unreg_list) {
-		struct sk_buff *skb = NULL;
-
 		/* Shutdown queueing discipline. */
 		dev_shutdown(dev);
 
@@ -6475,6 +6497,20 @@ static void rollback_registered_many(struct list_head *head)
 		   this device. They should clean all the things.
 		*/
 		call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+	}
+
+	/* call batch notifiers which act on net namespaces */
+	list_for_each_entry(dev, head, unreg_list) {
+		net_add_event_list(&net_head, dev_net(dev));
+	}
+	list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
+		call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
+					 net->loopback_dev);
+		net_del_event_list(net);
+	}
+
+	list_for_each_entry(dev, head, unreg_list) {
+		struct sk_buff *skb = NULL;
 
 		if (!dev->rtnl_link_ops ||
 		    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
@@ -7065,6 +7101,7 @@ static void netdev_wait_allrefs(struct net_device *dev)
 
 			/* Rebroadcast unregister notification */
 			call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+			call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
 
 			__rtnl_unlock();
 			rcu_barrier();
@@ -7581,6 +7618,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	   the device is just moving and can keep their slaves up.
 	*/
 	call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+	call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
 	rcu_barrier();
 	call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
 	rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces
  2016-02-04 23:35 [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
  2016-02-04 23:35 ` [PATCH v2 net-next 1/4] net: add event_list to struct net and provide utility functions Salam Noureddine
  2016-02-04 23:35 ` [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers Salam Noureddine
@ 2016-02-04 23:35 ` Salam Noureddine
  2016-02-04 23:35 ` [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister Salam Noureddine
  2016-02-05 22:37 ` [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
  4 siblings, 0 replies; 12+ messages in thread
From: Salam Noureddine @ 2016-02-04 23:35 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Julian Anastasov, netdev
  Cc: Salam Noureddine

This cleans up neighbour entries for all interfaces in the down
state, avoiding walking the whole neighbour table for each interface
being brought down.

Signed-off-by: Salam Noureddine <noureddine@arista.com>
---
 include/net/arp.h       |  1 +
 include/net/neighbour.h |  1 +
 net/core/neighbour.c    | 38 +++++++++++++++++++++++++++++++-------
 net/ipv4/arp.c          |  4 ++++
 4 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/include/net/arp.h b/include/net/arp.h
index 5e0f891..0efee66 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -43,6 +43,7 @@ void arp_send(int type, int ptype, __be32 dest_ip,
 	      const unsigned char *src_hw, const unsigned char *th);
 int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir);
 void arp_ifdown(struct net_device *dev);
+void arp_ifdown_all(void);
 
 struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
 			   struct net_device *dev, __be32 src_ip,
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 8b68384..8785d7b 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -318,6 +318,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, u32 flags);
 void __neigh_set_probe_once(struct neighbour *neigh);
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+int neigh_ifdown_all(struct neigh_table *tbl);
 int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_direct_output(struct neighbour *neigh, struct sk_buff *skb);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f18ae91..bfbd97a 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -54,7 +54,8 @@ do {						\
 static void neigh_timer_handler(unsigned long arg);
 static void __neigh_notify(struct neighbour *n, int type, int flags);
 static void neigh_update_notify(struct neighbour *neigh);
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
+			 bool all_down);
 
 #ifdef CONFIG_PROC_FS
 static const struct file_operations neigh_stat_seq_fops;
@@ -192,7 +193,8 @@ static void pneigh_queue_purge(struct sk_buff_head *list)
 	}
 }
 
-static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
+static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
+			    bool all_down)
 {
 	int i;
 	struct neigh_hash_table *nht;
@@ -210,6 +212,12 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
 				np = &n->next;
 				continue;
 			}
+			if (!dev && n->dev && all_down) {
+				if (n->dev->flags & IFF_UP) {
+					np = &n->next;
+					continue;
+				}
+			}
 			rcu_assign_pointer(*np,
 				   rcu_dereference_protected(n->next,
 						lockdep_is_held(&tbl->lock)));
@@ -245,7 +253,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev)
 {
 	write_lock_bh(&tbl->lock);
-	neigh_flush_dev(tbl, dev);
+	neigh_flush_dev(tbl, dev, false);
 	write_unlock_bh(&tbl->lock);
 }
 EXPORT_SYMBOL(neigh_changeaddr);
@@ -253,8 +261,8 @@ EXPORT_SYMBOL(neigh_changeaddr);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
 {
 	write_lock_bh(&tbl->lock);
-	neigh_flush_dev(tbl, dev);
-	pneigh_ifdown(tbl, dev);
+	neigh_flush_dev(tbl, dev, false);
+	pneigh_ifdown(tbl, dev, false);
 	write_unlock_bh(&tbl->lock);
 
 	del_timer_sync(&tbl->proxy_timer);
@@ -263,6 +271,19 @@ int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
 }
 EXPORT_SYMBOL(neigh_ifdown);
 
+int neigh_ifdown_all(struct neigh_table *tbl)
+{
+	write_lock_bh(&tbl->lock);
+	neigh_flush_dev(tbl, NULL, true);
+	pneigh_ifdown(tbl, NULL, true);
+	write_unlock_bh(&tbl->lock);
+
+	del_timer_sync(&tbl->proxy_timer);
+	pneigh_queue_purge(&tbl->proxy_queue);
+	return 0;
+}
+EXPORT_SYMBOL(neigh_ifdown_all);
+
 static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device *dev)
 {
 	struct neighbour *n = NULL;
@@ -645,7 +666,8 @@ int pneigh_delete(struct neigh_table *tbl, struct net *net, const void *pkey,
 	return -ENOENT;
 }
 
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
+static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
+			 bool all_down)
 {
 	struct pneigh_entry *n, **np;
 	u32 h;
@@ -653,7 +675,9 @@ static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
 	for (h = 0; h <= PNEIGH_HASHMASK; h++) {
 		np = &tbl->phash_buckets[h];
 		while ((n = *np) != NULL) {
-			if (!dev || n->dev == dev) {
+			if ((!dev && !all_down) || (all_down && n->dev &&
+						    !(n->dev->flags & IFF_UP)) ||
+			    n->dev == dev) {
 				*np = n->next;
 				if (tbl->pdestructor)
 					tbl->pdestructor(n);
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 59b3e0e..1328244 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1219,6 +1219,10 @@ void arp_ifdown(struct net_device *dev)
 	neigh_ifdown(&arp_tbl, dev);
 }
 
+void arp_ifdown_all(void)
+{
+	neigh_ifdown_all(&arp_tbl);
+}
 
 /*
  *	Called once on startup.
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister
  2016-02-04 23:35 [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
                   ` (2 preceding siblings ...)
  2016-02-04 23:35 ` [PATCH v2 net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces Salam Noureddine
@ 2016-02-04 23:35 ` Salam Noureddine
  2016-02-05 16:04   ` Sergei Shtylyov
  2016-02-05 22:37 ` [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
  4 siblings, 1 reply; 12+ messages in thread
From: Salam Noureddine @ 2016-02-04 23:35 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Julian Anastasov, netdev
  Cc: Salam Noureddine

Call fib_flush at the end when closing or unregistering multiple
devices. This can save walking the fib many times and greatly
reduce rtnl_lock hold time when unregistering many devices with
a fib having hundreds of thousands of routes.

Signed-off-by: Salam Noureddine <noureddine@arista.com>
---
 include/net/netns/ipv4.h |  1 +
 net/ipv4/fib_frontend.c  | 16 ++++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index d75be32..d59a078 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -111,5 +111,6 @@ struct netns_ipv4 {
 #endif
 #endif
 	atomic_t	rt_genid;
+	bool		needs_fib_flush;
 };
 #endif
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4734475..808426e 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1161,11 +1161,22 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 	unsigned int flags;
 
 	if (event == NETDEV_UNREGISTER) {
-		fib_disable_ip(dev, event, true);
+		if (fib_sync_down_dev(dev, event, true))
+			net->ipv4.needs_fib_flush = true;
 		rt_flush_dev(dev);
 		return NOTIFY_DONE;
 	}
 
+	if (event == NETDEV_UNREGISTER_BATCH || event == NETDEV_DOWN_BATCH) {
+		if (net->ipv4.needs_fib_flush) {
+			fib_flush(net);
+			net->ipv4.needs_fib_flush = false;
+		}
+		rt_cache_flush(net);
+		arp_ifdown_all();
+		return NOTIFY_DONE;
+	}
+
 	in_dev = __in_dev_get_rtnl(dev);
 	if (!in_dev)
 		return NOTIFY_DONE;
@@ -1182,7 +1193,8 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 		rt_cache_flush(net);
 		break;
 	case NETDEV_DOWN:
-		fib_disable_ip(dev, event, false);
+		if (fib_sync_down_dev(dev, event, false))
+			net->ipv4.needs_fib_flush = true;
 		break;
 	case NETDEV_CHANGE:
 		flags = dev_get_flags(dev);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister
  2016-02-04 23:35 ` [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister Salam Noureddine
@ 2016-02-05 16:04   ` Sergei Shtylyov
  2016-02-07  6:09     ` Salam Noureddine
  0 siblings, 1 reply; 12+ messages in thread
From: Sergei Shtylyov @ 2016-02-05 16:04 UTC (permalink / raw)
  To: Salam Noureddine, David S. Miller, Eric Dumazet, Jiri Pirko,
	Alexei Starovoitov, Daniel Borkmann, Eric W. Biederman,
	Julian Anastasov, netdev

On 02/05/2016 02:35 AM, Salam Noureddine wrote:

> Call fib_flush at the end when closing or unregistering multiple
> devices. This can save walking the fib many times and greatly
> reduce rtnl_lock hold time when unregistering many devices with
> a fib having hundreds of thousands of routes.
>
> Signed-off-by: Salam Noureddine <noureddine@arista.com>
> ---
>   include/net/netns/ipv4.h |  1 +
>   net/ipv4/fib_frontend.c  | 16 ++++++++++++++--
>   2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index d75be32..d59a078 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
[...]
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 4734475..808426e 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -1161,11 +1161,22 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
>   	unsigned int flags;
>
>   	if (event == NETDEV_UNREGISTER) {
> -		fib_disable_ip(dev, event, true);
> +		if (fib_sync_down_dev(dev, event, true))
> +			net->ipv4.needs_fib_flush = true;
>   		rt_flush_dev(dev);
>   		return NOTIFY_DONE;
>   	}
>
> +	if (event == NETDEV_UNREGISTER_BATCH || event == NETDEV_DOWN_BATCH) {
> +		if (net->ipv4.needs_fib_flush) {
> +			fib_flush(net);
> +			net->ipv4.needs_fib_flush = false;
> +		}
> +		rt_cache_flush(net);
> +		arp_ifdown_all();
> +		return NOTIFY_DONE;
> +	}
> +

    I'd convert to *switch* the above 2 *if*'s...

[...]

MBR, Sergei

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown
  2016-02-04 23:35 [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
                   ` (3 preceding siblings ...)
  2016-02-04 23:35 ` [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister Salam Noureddine
@ 2016-02-05 22:37 ` Salam Noureddine
  4 siblings, 0 replies; 12+ messages in thread
From: Salam Noureddine @ 2016-02-05 22:37 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Julian Anastasov,
	Network Development
  Cc: Salam Noureddine

Forgot to mention the rtnl_lock hold time gain with these changes.

I got the following benchmark results on one of our switches.
Without this patch, deleting 1k interfaces with 100k routes in the fib held
the rtnl_lock for 13 seconds. With the patch, rtnl_lock hold time went down
to 5 seconds. The gain is even more pronounced with 512k routes in the FIB.
In this case, without the patch, rtnl_lock was held for 36 seconds and with
the patch it was held for 5.5 seconds.

On Thu, Feb 4, 2016 at 3:35 PM, Salam Noureddine <noureddine@arista.com> wrote:
> Added changes suggested by Julian Anastasov in version 2.
>
> fib_flush walks the whole fib in a net_namespace and is called for
> each net_device being closed or unregistered. This can be very expensive
> when dealing with 100k or more routes in the fib and removal of a lot
> of interfaces. These four patches deal with this issue by calling fib_flush
> just once for each net namespace and introduce a new function arp_ifdown_all
> that does a similar optimization for the neighbour table.
>
> The benchmark tests were run on linux-3.18.
>
> Salam Noureddine (4):
>   net: add event_list to struct net and provide utility functions
>   net: dev: add batching to net_device notifiers
>   net: core: introduce neigh_ifdown_all for all down interfaces
>   net: fib: avoid calling fib_flush for each device when doing batch
>     close     and unregister
>
>  include/linux/netdevice.h   |  2 ++
>  include/net/arp.h           |  1 +
>  include/net/neighbour.h     |  1 +
>  include/net/net_namespace.h | 22 +++++++++++++++++++++
>  include/net/netns/ipv4.h    |  1 +
>  net/core/dev.c              | 48 ++++++++++++++++++++++++++++++++++++++++-----
>  net/core/neighbour.c        | 38 ++++++++++++++++++++++++++++-------
>  net/core/net_namespace.c    |  1 +
>  net/ipv4/arp.c              |  4 ++++
>  net/ipv4/fib_frontend.c     | 16 +++++++++++++--
>  10 files changed, 120 insertions(+), 14 deletions(-)
>
> --
> 1.8.1.4
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers
  2016-02-04 23:35 ` [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers Salam Noureddine
@ 2016-02-06 18:58   ` Julian Anastasov
  2016-02-07  6:25     ` Salam Noureddine
  0 siblings, 1 reply; 12+ messages in thread
From: Julian Anastasov @ 2016-02-06 18:58 UTC (permalink / raw)
  To: Salam Noureddine
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, netdev


	Hello,

On Thu, 4 Feb 2016, Salam Noureddine wrote:

> @@ -1572,12 +1583,17 @@ rollback:
>  				call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
>  							dev);
>  				call_netdevice_notifier(nb, NETDEV_DOWN, dev);
> +				call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
> +							dev);

	I now see that we should split the loop
here, so that NETDEV_DOWN_BATCH is called only once per net:

		bool down = false;

                for_each_netdev(net, dev) {
                        if (dev == last)
                                break;

                        if (dev->flags & IFF_UP) {
                                call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
                                                        dev);
                                call_netdevice_notifier(nb, NETDEV_DOWN, dev);
				down = true;
                        }
                }

		rt_cache_flush and arp_ifdown_all will be called
		on NETDEV_UNREGISTER_BATCH, so use 'down' flag:

		if (down)
			call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
						net->loopback_dev);
                for_each_netdev(net, dev) {
                        if (dev == last)
                                goto outroll;
                        call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
                }
		call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
					net->loopback_dev);


>  			}
>  			call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
>  		}
> +		call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
> +					net->loopback_dev);
>  	}
>  
>  outroll:
> +	call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last);
>  	raw_notifier_chain_unregister(&netdev_chain, nb);
>  	goto unlock;
>  }
> @@ -1614,9 +1630,13 @@ int unregister_netdevice_notifier(struct notifier_block *nb)
>  				call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
>  							dev);
>  				call_netdevice_notifier(nb, NETDEV_DOWN, dev);
> +				call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
> +							dev);

	Same here, split loop for NETDEV_DOWN_BATCH because
arp_ifdown_all is slow.

>  			}
>  			call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
>  		}
> +		call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
> +					net->loopback_dev);
>  	}
>  unlock:
>  	rtnl_unlock();

>  static void rollback_registered_many(struct list_head *head)
>  {
>  	struct net_device *dev, *tmp;
> +	struct net *net, *net_tmp;
>  	LIST_HEAD(close_head);
> +	LIST_HEAD(net_head);
>  
>  	BUG_ON(dev_boot_phase);
>  	ASSERT_RTNL();
> @@ -6465,8 +6489,6 @@ static void rollback_registered_many(struct list_head *head)
>  	synchronize_net();
>  
>  	list_for_each_entry(dev, head, unreg_list) {
> -		struct sk_buff *skb = NULL;
> -
>  		/* Shutdown queueing discipline. */
>  		dev_shutdown(dev);
>  
> @@ -6475,6 +6497,20 @@ static void rollback_registered_many(struct list_head *head)
>  		   this device. They should clean all the things.
>  		*/
>  		call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
> +	}
> +
> +	/* call batch notifiers which act on net namespaces */
> +	list_for_each_entry(dev, head, unreg_list) {
> +		net_add_event_list(&net_head, dev_net(dev));

	Looks like we can move the above net_add_event_list with
the comment into the previous loop after NETDEV_UNREGISTER,
we will save some cycles.

> +	}
> +	list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
> +		call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
> +					 net->loopback_dev);
> +		net_del_event_list(net);
> +	}
> +
> +	list_for_each_entry(dev, head, unreg_list) {
> +		struct sk_buff *skb = NULL;
>  
>  		if (!dev->rtnl_link_ops ||
>  		    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)

Regards

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister
  2016-02-05 16:04   ` Sergei Shtylyov
@ 2016-02-07  6:09     ` Salam Noureddine
  2016-02-07 10:24       ` Sergei Shtylyov
  0 siblings, 1 reply; 12+ messages in thread
From: Salam Noureddine @ 2016-02-07  6:09 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Julian Anastasov,
	Network Development

On Fri, Feb 5, 2016 at 8:04 AM, Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
> On 02/05/2016 02:35 AM, Salam Noureddine wrote:

>>
>>         if (event == NETDEV_UNREGISTER) {
>> -               fib_disable_ip(dev, event, true);
>> +               if (fib_sync_down_dev(dev, event, true))
>> +                       net->ipv4.needs_fib_flush = true;
>>                 rt_flush_dev(dev);
>>                 return NOTIFY_DONE;
>>         }
>>
>> +       if (event == NETDEV_UNREGISTER_BATCH || event ==
>> NETDEV_DOWN_BATCH) {
>> +               if (net->ipv4.needs_fib_flush) {
>> +                       fib_flush(net);
>> +                       net->ipv4.needs_fib_flush = false;
>> +               }
>> +               rt_cache_flush(net);
>> +               arp_ifdown_all();
>> +               return NOTIFY_DONE;
>> +       }
>> +
>
>
>    I'd convert to *switch* the above 2 *if*'s...
>
> [...]
>
> MBR, Sergei
>

I could do that.

Thanks,

Salam

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers
  2016-02-06 18:58   ` Julian Anastasov
@ 2016-02-07  6:25     ` Salam Noureddine
  2016-02-07 10:09       ` Julian Anastasov
  0 siblings, 1 reply; 12+ messages in thread
From: Salam Noureddine @ 2016-02-07  6:25 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Network Development

On Sat, Feb 6, 2016 at 10:58 AM, Julian Anastasov <ja@ssi.bg> wrote:

>
>         I now see that we should split the loop
> here, so that NETDEV_DOWN_BATCH is called only once per net:
>
>                 bool down = false;
>
>                 for_each_netdev(net, dev) {
>                         if (dev == last)
>                                 break;
>
>                         if (dev->flags & IFF_UP) {
>                                 call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
>                                                         dev);
>                                 call_netdevice_notifier(nb, NETDEV_DOWN, dev);
>                                 down = true;
>                         }
>                 }
>
>                 rt_cache_flush and arp_ifdown_all will be called
>                 on NETDEV_UNREGISTER_BATCH, so use 'down' flag:
>
>                 if (down)
>                         call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
>                                                 net->loopback_dev);
>                 for_each_netdev(net, dev) {
>                         if (dev == last)
>                                 goto outroll;
>                         call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
>                 }
>                 call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
>                                         net->loopback_dev);
>
>
>>                       }
>>                       call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
>>               }
>> +             call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
>> +                                     net->loopback_dev);
>>       }
>>
>>  outroll:
>> +     call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last);
>>       raw_notifier_chain_unregister(&netdev_chain, nb);
>>       goto unlock;
>>  }

I am not sure we need to worry too much about optimizing the failure
code path to justify splitting into two loops.


>>  static void rollback_registered_many(struct list_head *head)
>>  {

>>       list_for_each_entry(dev, head, unreg_list) {
>> -             struct sk_buff *skb = NULL;
>> -
>>               /* Shutdown queueing discipline. */
>>               dev_shutdown(dev);
>>
>> @@ -6475,6 +6497,20 @@ static void rollback_registered_many(struct list_head *head)
>>                  this device. They should clean all the things.
>>               */
>>               call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
>> +     }
>> +
>> +     /* call batch notifiers which act on net namespaces */
>> +     list_for_each_entry(dev, head, unreg_list) {
>> +             net_add_event_list(&net_head, dev_net(dev));
>
>         Looks like we can move the above net_add_event_list with
> the comment into the previous loop after NETDEV_UNREGISTER,
> we will save some cycles.
>

I didn't move it into the previous loop because the NETDEV_UNREGISTER
notifier can
end up calling rollback_registered_many (for example the vlan driver
unregistering
all vlans on top of a device) in which case we would be using the
event_list in the net
namespace.

> Julian Anastasov <ja@ssi.bg>

Thanks,

Salam

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers
  2016-02-07  6:25     ` Salam Noureddine
@ 2016-02-07 10:09       ` Julian Anastasov
  0 siblings, 0 replies; 12+ messages in thread
From: Julian Anastasov @ 2016-02-07 10:09 UTC (permalink / raw)
  To: Salam Noureddine
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Network Development


	Hello,

On Sat, 6 Feb 2016, Salam Noureddine wrote:

> On Sat, Feb 6, 2016 at 10:58 AM, Julian Anastasov <ja@ssi.bg> wrote:

> >> +     /* call batch notifiers which act on net namespaces */
> >> +     list_for_each_entry(dev, head, unreg_list) {
> >> +             net_add_event_list(&net_head, dev_net(dev));
> >
> >         Looks like we can move the above net_add_event_list with
> > the comment into the previous loop after NETDEV_UNREGISTER,
> > we will save some cycles.
> 
> I didn't move it into the previous loop because the NETDEV_UNREGISTER
> notifier can
> end up calling rollback_registered_many (for example the vlan driver
> unregistering
> all vlans on top of a device) in which case we would be using the
> event_list in the net
> namespace.

	ok. May be another safe option is to call it just before
NETDEV_UNREGISTER, so that if rollback_registered_many is called
recursively we will have single NETDEV_UNREGISTER_BATCH call.
Same should work for NETDEV_DOWN_BATCH in dev_close_many. But
these are small optimizations, so it is up to you to decide.

Regards

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister
  2016-02-07  6:09     ` Salam Noureddine
@ 2016-02-07 10:24       ` Sergei Shtylyov
  0 siblings, 0 replies; 12+ messages in thread
From: Sergei Shtylyov @ 2016-02-07 10:24 UTC (permalink / raw)
  To: Salam Noureddine
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Julian Anastasov,
	Network Development

Hello.

On 2/7/2016 9:09 AM, Salam Noureddine wrote:

>>>
>>>          if (event == NETDEV_UNREGISTER) {
>>> -               fib_disable_ip(dev, event, true);
>>> +               if (fib_sync_down_dev(dev, event, true))
>>> +                       net->ipv4.needs_fib_flush = true;
>>>                  rt_flush_dev(dev);
>>>                  return NOTIFY_DONE;
>>>          }
>>>
>>> +       if (event == NETDEV_UNREGISTER_BATCH || event ==
>>> NETDEV_DOWN_BATCH) {
>>> +               if (net->ipv4.needs_fib_flush) {
>>> +                       fib_flush(net);
>>> +                       net->ipv4.needs_fib_flush = false;
>>> +               }
>>> +               rt_cache_flush(net);
>>> +               arp_ifdown_all();
>>> +               return NOTIFY_DONE;
>>> +       }
>>> +
>>
>>
>>     I'd convert to *switch* the above 2 *if*'s...

> I could do that.

    Please do it, in this patch.

> Thanks,
>
> Salam

MBR, Sergei

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-02-07 10:24 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-04 23:35 [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
2016-02-04 23:35 ` [PATCH v2 net-next 1/4] net: add event_list to struct net and provide utility functions Salam Noureddine
2016-02-04 23:35 ` [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers Salam Noureddine
2016-02-06 18:58   ` Julian Anastasov
2016-02-07  6:25     ` Salam Noureddine
2016-02-07 10:09       ` Julian Anastasov
2016-02-04 23:35 ` [PATCH v2 net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces Salam Noureddine
2016-02-04 23:35 ` [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister Salam Noureddine
2016-02-05 16:04   ` Sergei Shtylyov
2016-02-07  6:09     ` Salam Noureddine
2016-02-07 10:24       ` Sergei Shtylyov
2016-02-05 22:37 ` [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).