* [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown
@ 2016-02-04 23:35 Salam Noureddine
2016-02-04 23:35 ` [PATCH v2 net-next 1/4] net: add event_list to struct net and provide utility functions Salam Noureddine
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: Salam Noureddine @ 2016-02-04 23:35 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
Daniel Borkmann, Eric W. Biederman, Julian Anastasov, netdev
Cc: Salam Noureddine
Added changes suggested by Julian Anastasov in version 2.
fib_flush walks the whole fib in a net_namespace and is called for
each net_device being closed or unregistered. This can be very expensive
when dealing with 100k or more routes in the fib and removal of a lot
of interfaces. These four patches deal with this issue by calling fib_flush
just once for each net namespace and introduce a new function arp_ifdown_all
that does a similar optimization for the neighbour table.
The benchmark tests were run on linux-3.18.
Salam Noureddine (4):
net: add event_list to struct net and provide utility functions
net: dev: add batching to net_device notifiers
net: core: introduce neigh_ifdown_all for all down interfaces
net: fib: avoid calling fib_flush for each device when doing batch
close and unregister
include/linux/netdevice.h | 2 ++
include/net/arp.h | 1 +
include/net/neighbour.h | 1 +
include/net/net_namespace.h | 22 +++++++++++++++++++++
include/net/netns/ipv4.h | 1 +
net/core/dev.c | 48 ++++++++++++++++++++++++++++++++++++++++-----
net/core/neighbour.c | 38 ++++++++++++++++++++++++++++-------
net/core/net_namespace.c | 1 +
net/ipv4/arp.c | 4 ++++
net/ipv4/fib_frontend.c | 16 +++++++++++++--
10 files changed, 120 insertions(+), 14 deletions(-)
--
1.8.1.4
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 net-next 1/4] net: add event_list to struct net and provide utility functions
2016-02-04 23:35 [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
@ 2016-02-04 23:35 ` Salam Noureddine
2016-02-04 23:35 ` [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers Salam Noureddine
` (3 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Salam Noureddine @ 2016-02-04 23:35 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
Daniel Borkmann, Eric W. Biederman, Julian Anastasov, netdev
Cc: Salam Noureddine
Signed-off-by: Salam Noureddine <noureddine@arista.com>
---
include/net/net_namespace.h | 22 ++++++++++++++++++++++
net/core/net_namespace.c | 1 +
2 files changed, 23 insertions(+)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..6dbc0b2 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -58,6 +58,7 @@ struct net {
struct list_head list; /* list of network namespaces */
struct list_head cleanup_list; /* namespaces on death row */
struct list_head exit_list; /* Use only net_mutex */
+ struct list_head event_list; /* net_device notifier list */
struct user_namespace *user_ns; /* Owning user namespace */
spinlock_t nsid_lock;
@@ -380,4 +381,25 @@ static inline void fnhe_genid_bump(struct net *net)
atomic_inc(&net->fnhe_genid);
}
+#ifdef CONFIG_NET_NS
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+ if (list_empty(&net->event_list))
+ list_add_tail(&net->event_list, head);
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+ list_del_init(&net->event_list);
+}
+#else
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+}
+#endif
+
#endif /* __NET_NET_NAMESPACE_H */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2c2eb1b..58e84ce 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -282,6 +282,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
net->user_ns = user_ns;
idr_init(&net->netns_ids);
spin_lock_init(&net->nsid_lock);
+ INIT_LIST_HEAD(&net->event_list);
list_for_each_entry(ops, &pernet_list, list) {
error = ops_init(ops, net);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers
2016-02-04 23:35 [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
2016-02-04 23:35 ` [PATCH v2 net-next 1/4] net: add event_list to struct net and provide utility functions Salam Noureddine
@ 2016-02-04 23:35 ` Salam Noureddine
2016-02-06 18:58 ` Julian Anastasov
2016-02-04 23:35 ` [PATCH v2 net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces Salam Noureddine
` (2 subsequent siblings)
4 siblings, 1 reply; 12+ messages in thread
From: Salam Noureddine @ 2016-02-04 23:35 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
Daniel Borkmann, Eric W. Biederman, Julian Anastasov, netdev
Cc: Salam Noureddine
This can be used to optimize bringing down and unregsitering
net_devices by running certain cleanup operations only on the
net namespace instead of on each net_device.
Signed-off-by: Salam Noureddine <noureddine@arista.com>
---
include/linux/netdevice.h | 2 ++
net/core/dev.c | 48 ++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 45 insertions(+), 5 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c20b814..1b12269 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2183,6 +2183,8 @@ struct netdev_lag_lower_state_info {
#define NETDEV_BONDING_INFO 0x0019
#define NETDEV_PRECHANGEUPPER 0x001A
#define NETDEV_CHANGELOWERSTATE 0x001B
+#define NETDEV_UNREGISTER_BATCH 0x001C
+#define NETDEV_DOWN_BATCH 0x001D
int register_netdevice_notifier(struct notifier_block *nb);
int unregister_netdevice_notifier(struct notifier_block *nb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 914b4a2..dbd8995 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1439,6 +1439,8 @@ static int __dev_close(struct net_device *dev)
int dev_close_many(struct list_head *head, bool unlink)
{
struct net_device *dev, *tmp;
+ struct net *net, *net_tmp;
+ LIST_HEAD(net_head);
/* Remove the devices that don't need to be closed */
list_for_each_entry_safe(dev, tmp, head, close_list)
@@ -1447,13 +1449,22 @@ int dev_close_many(struct list_head *head, bool unlink)
__dev_close_many(head);
- list_for_each_entry_safe(dev, tmp, head, close_list) {
+ list_for_each_entry(dev, head, close_list) {
rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL);
call_netdevice_notifiers(NETDEV_DOWN, dev);
+ }
+
+ list_for_each_entry_safe(dev, tmp, head, close_list) {
+ net_add_event_list(&net_head, dev_net(dev));
if (unlink)
list_del_init(&dev->close_list);
}
+ list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
+ call_netdevice_notifiers(NETDEV_DOWN_BATCH, net->loopback_dev);
+ net_del_event_list(net);
+ }
+
return 0;
}
EXPORT_SYMBOL(dev_close_many);
@@ -1572,12 +1583,17 @@ rollback:
call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
dev);
call_netdevice_notifier(nb, NETDEV_DOWN, dev);
+ call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
+ dev);
}
call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
}
+ call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
+ net->loopback_dev);
}
outroll:
+ call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last);
raw_notifier_chain_unregister(&netdev_chain, nb);
goto unlock;
}
@@ -1614,9 +1630,13 @@ int unregister_netdevice_notifier(struct notifier_block *nb)
call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
dev);
call_netdevice_notifier(nb, NETDEV_DOWN, dev);
+ call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
+ dev);
}
call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
}
+ call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
+ net->loopback_dev);
}
unlock:
rtnl_unlock();
@@ -6187,10 +6207,12 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags,
rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC);
if (changes & IFF_UP) {
- if (dev->flags & IFF_UP)
+ if (dev->flags & IFF_UP) {
call_netdevice_notifiers(NETDEV_UP, dev);
- else
+ } else {
call_netdevice_notifiers(NETDEV_DOWN, dev);
+ call_netdevice_notifiers(NETDEV_DOWN_BATCH, dev);
+ }
}
if (dev->flags & IFF_UP &&
@@ -6427,7 +6449,9 @@ static void net_set_todo(struct net_device *dev)
static void rollback_registered_many(struct list_head *head)
{
struct net_device *dev, *tmp;
+ struct net *net, *net_tmp;
LIST_HEAD(close_head);
+ LIST_HEAD(net_head);
BUG_ON(dev_boot_phase);
ASSERT_RTNL();
@@ -6465,8 +6489,6 @@ static void rollback_registered_many(struct list_head *head)
synchronize_net();
list_for_each_entry(dev, head, unreg_list) {
- struct sk_buff *skb = NULL;
-
/* Shutdown queueing discipline. */
dev_shutdown(dev);
@@ -6475,6 +6497,20 @@ static void rollback_registered_many(struct list_head *head)
this device. They should clean all the things.
*/
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+ }
+
+ /* call batch notifiers which act on net namespaces */
+ list_for_each_entry(dev, head, unreg_list) {
+ net_add_event_list(&net_head, dev_net(dev));
+ }
+ list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
+ call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
+ net->loopback_dev);
+ net_del_event_list(net);
+ }
+
+ list_for_each_entry(dev, head, unreg_list) {
+ struct sk_buff *skb = NULL;
if (!dev->rtnl_link_ops ||
dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
@@ -7065,6 +7101,7 @@ static void netdev_wait_allrefs(struct net_device *dev)
/* Rebroadcast unregister notification */
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+ call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
__rtnl_unlock();
rcu_barrier();
@@ -7581,6 +7618,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
the device is just moving and can keep their slaves up.
*/
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+ call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
rcu_barrier();
call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces
2016-02-04 23:35 [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
2016-02-04 23:35 ` [PATCH v2 net-next 1/4] net: add event_list to struct net and provide utility functions Salam Noureddine
2016-02-04 23:35 ` [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers Salam Noureddine
@ 2016-02-04 23:35 ` Salam Noureddine
2016-02-04 23:35 ` [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister Salam Noureddine
2016-02-05 22:37 ` [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
4 siblings, 0 replies; 12+ messages in thread
From: Salam Noureddine @ 2016-02-04 23:35 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
Daniel Borkmann, Eric W. Biederman, Julian Anastasov, netdev
Cc: Salam Noureddine
This cleans up neighbour entries for all interfaces in the down
state, avoiding walking the whole neighbour table for each interface
being brought down.
Signed-off-by: Salam Noureddine <noureddine@arista.com>
---
include/net/arp.h | 1 +
include/net/neighbour.h | 1 +
net/core/neighbour.c | 38 +++++++++++++++++++++++++++++++-------
net/ipv4/arp.c | 4 ++++
4 files changed, 37 insertions(+), 7 deletions(-)
diff --git a/include/net/arp.h b/include/net/arp.h
index 5e0f891..0efee66 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -43,6 +43,7 @@ void arp_send(int type, int ptype, __be32 dest_ip,
const unsigned char *src_hw, const unsigned char *th);
int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir);
void arp_ifdown(struct net_device *dev);
+void arp_ifdown_all(void);
struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
struct net_device *dev, __be32 src_ip,
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 8b68384..8785d7b 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -318,6 +318,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, u32 flags);
void __neigh_set_probe_once(struct neighbour *neigh);
void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+int neigh_ifdown_all(struct neigh_table *tbl);
int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb);
int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb);
int neigh_direct_output(struct neighbour *neigh, struct sk_buff *skb);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f18ae91..bfbd97a 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -54,7 +54,8 @@ do { \
static void neigh_timer_handler(unsigned long arg);
static void __neigh_notify(struct neighbour *n, int type, int flags);
static void neigh_update_notify(struct neighbour *neigh);
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
+ bool all_down);
#ifdef CONFIG_PROC_FS
static const struct file_operations neigh_stat_seq_fops;
@@ -192,7 +193,8 @@ static void pneigh_queue_purge(struct sk_buff_head *list)
}
}
-static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
+static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
+ bool all_down)
{
int i;
struct neigh_hash_table *nht;
@@ -210,6 +212,12 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
np = &n->next;
continue;
}
+ if (!dev && n->dev && all_down) {
+ if (n->dev->flags & IFF_UP) {
+ np = &n->next;
+ continue;
+ }
+ }
rcu_assign_pointer(*np,
rcu_dereference_protected(n->next,
lockdep_is_held(&tbl->lock)));
@@ -245,7 +253,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev)
{
write_lock_bh(&tbl->lock);
- neigh_flush_dev(tbl, dev);
+ neigh_flush_dev(tbl, dev, false);
write_unlock_bh(&tbl->lock);
}
EXPORT_SYMBOL(neigh_changeaddr);
@@ -253,8 +261,8 @@ EXPORT_SYMBOL(neigh_changeaddr);
int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
{
write_lock_bh(&tbl->lock);
- neigh_flush_dev(tbl, dev);
- pneigh_ifdown(tbl, dev);
+ neigh_flush_dev(tbl, dev, false);
+ pneigh_ifdown(tbl, dev, false);
write_unlock_bh(&tbl->lock);
del_timer_sync(&tbl->proxy_timer);
@@ -263,6 +271,19 @@ int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
}
EXPORT_SYMBOL(neigh_ifdown);
+int neigh_ifdown_all(struct neigh_table *tbl)
+{
+ write_lock_bh(&tbl->lock);
+ neigh_flush_dev(tbl, NULL, true);
+ pneigh_ifdown(tbl, NULL, true);
+ write_unlock_bh(&tbl->lock);
+
+ del_timer_sync(&tbl->proxy_timer);
+ pneigh_queue_purge(&tbl->proxy_queue);
+ return 0;
+}
+EXPORT_SYMBOL(neigh_ifdown_all);
+
static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device *dev)
{
struct neighbour *n = NULL;
@@ -645,7 +666,8 @@ int pneigh_delete(struct neigh_table *tbl, struct net *net, const void *pkey,
return -ENOENT;
}
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
+static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
+ bool all_down)
{
struct pneigh_entry *n, **np;
u32 h;
@@ -653,7 +675,9 @@ static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
for (h = 0; h <= PNEIGH_HASHMASK; h++) {
np = &tbl->phash_buckets[h];
while ((n = *np) != NULL) {
- if (!dev || n->dev == dev) {
+ if ((!dev && !all_down) || (all_down && n->dev &&
+ !(n->dev->flags & IFF_UP)) ||
+ n->dev == dev) {
*np = n->next;
if (tbl->pdestructor)
tbl->pdestructor(n);
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 59b3e0e..1328244 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1219,6 +1219,10 @@ void arp_ifdown(struct net_device *dev)
neigh_ifdown(&arp_tbl, dev);
}
+void arp_ifdown_all(void)
+{
+ neigh_ifdown_all(&arp_tbl);
+}
/*
* Called once on startup.
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister
2016-02-04 23:35 [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
` (2 preceding siblings ...)
2016-02-04 23:35 ` [PATCH v2 net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces Salam Noureddine
@ 2016-02-04 23:35 ` Salam Noureddine
2016-02-05 16:04 ` Sergei Shtylyov
2016-02-05 22:37 ` [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
4 siblings, 1 reply; 12+ messages in thread
From: Salam Noureddine @ 2016-02-04 23:35 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
Daniel Borkmann, Eric W. Biederman, Julian Anastasov, netdev
Cc: Salam Noureddine
Call fib_flush at the end when closing or unregistering multiple
devices. This can save walking the fib many times and greatly
reduce rtnl_lock hold time when unregistering many devices with
a fib having hundreds of thousands of routes.
Signed-off-by: Salam Noureddine <noureddine@arista.com>
---
include/net/netns/ipv4.h | 1 +
net/ipv4/fib_frontend.c | 16 ++++++++++++++--
2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index d75be32..d59a078 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -111,5 +111,6 @@ struct netns_ipv4 {
#endif
#endif
atomic_t rt_genid;
+ bool needs_fib_flush;
};
#endif
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4734475..808426e 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1161,11 +1161,22 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
unsigned int flags;
if (event == NETDEV_UNREGISTER) {
- fib_disable_ip(dev, event, true);
+ if (fib_sync_down_dev(dev, event, true))
+ net->ipv4.needs_fib_flush = true;
rt_flush_dev(dev);
return NOTIFY_DONE;
}
+ if (event == NETDEV_UNREGISTER_BATCH || event == NETDEV_DOWN_BATCH) {
+ if (net->ipv4.needs_fib_flush) {
+ fib_flush(net);
+ net->ipv4.needs_fib_flush = false;
+ }
+ rt_cache_flush(net);
+ arp_ifdown_all();
+ return NOTIFY_DONE;
+ }
+
in_dev = __in_dev_get_rtnl(dev);
if (!in_dev)
return NOTIFY_DONE;
@@ -1182,7 +1193,8 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
rt_cache_flush(net);
break;
case NETDEV_DOWN:
- fib_disable_ip(dev, event, false);
+ if (fib_sync_down_dev(dev, event, false))
+ net->ipv4.needs_fib_flush = true;
break;
case NETDEV_CHANGE:
flags = dev_get_flags(dev);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister
2016-02-04 23:35 ` [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister Salam Noureddine
@ 2016-02-05 16:04 ` Sergei Shtylyov
2016-02-07 6:09 ` Salam Noureddine
0 siblings, 1 reply; 12+ messages in thread
From: Sergei Shtylyov @ 2016-02-05 16:04 UTC (permalink / raw)
To: Salam Noureddine, David S. Miller, Eric Dumazet, Jiri Pirko,
Alexei Starovoitov, Daniel Borkmann, Eric W. Biederman,
Julian Anastasov, netdev
On 02/05/2016 02:35 AM, Salam Noureddine wrote:
> Call fib_flush at the end when closing or unregistering multiple
> devices. This can save walking the fib many times and greatly
> reduce rtnl_lock hold time when unregistering many devices with
> a fib having hundreds of thousands of routes.
>
> Signed-off-by: Salam Noureddine <noureddine@arista.com>
> ---
> include/net/netns/ipv4.h | 1 +
> net/ipv4/fib_frontend.c | 16 ++++++++++++++--
> 2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index d75be32..d59a078 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
[...]
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 4734475..808426e 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -1161,11 +1161,22 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
> unsigned int flags;
>
> if (event == NETDEV_UNREGISTER) {
> - fib_disable_ip(dev, event, true);
> + if (fib_sync_down_dev(dev, event, true))
> + net->ipv4.needs_fib_flush = true;
> rt_flush_dev(dev);
> return NOTIFY_DONE;
> }
>
> + if (event == NETDEV_UNREGISTER_BATCH || event == NETDEV_DOWN_BATCH) {
> + if (net->ipv4.needs_fib_flush) {
> + fib_flush(net);
> + net->ipv4.needs_fib_flush = false;
> + }
> + rt_cache_flush(net);
> + arp_ifdown_all();
> + return NOTIFY_DONE;
> + }
> +
I'd convert to *switch* the above 2 *if*'s...
[...]
MBR, Sergei
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown
2016-02-04 23:35 [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
` (3 preceding siblings ...)
2016-02-04 23:35 ` [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister Salam Noureddine
@ 2016-02-05 22:37 ` Salam Noureddine
4 siblings, 0 replies; 12+ messages in thread
From: Salam Noureddine @ 2016-02-05 22:37 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
Daniel Borkmann, Eric W. Biederman, Julian Anastasov,
Network Development
Cc: Salam Noureddine
Forgot to mention the rtnl_lock hold time gain with these changes.
I got the following benchmark results on one of our switches.
Without this patch, deleting 1k interfaces with 100k routes in the fib held
the rtnl_lock for 13 seconds. With the patch, rtnl_lock hold time went down
to 5 seconds. The gain is even more pronounced with 512k routes in the FIB.
In this case, without the patch, rtnl_lock was held for 36 seconds and with
the patch it was held for 5.5 seconds.
On Thu, Feb 4, 2016 at 3:35 PM, Salam Noureddine <noureddine@arista.com> wrote:
> Added changes suggested by Julian Anastasov in version 2.
>
> fib_flush walks the whole fib in a net_namespace and is called for
> each net_device being closed or unregistered. This can be very expensive
> when dealing with 100k or more routes in the fib and removal of a lot
> of interfaces. These four patches deal with this issue by calling fib_flush
> just once for each net namespace and introduce a new function arp_ifdown_all
> that does a similar optimization for the neighbour table.
>
> The benchmark tests were run on linux-3.18.
>
> Salam Noureddine (4):
> net: add event_list to struct net and provide utility functions
> net: dev: add batching to net_device notifiers
> net: core: introduce neigh_ifdown_all for all down interfaces
> net: fib: avoid calling fib_flush for each device when doing batch
> close and unregister
>
> include/linux/netdevice.h | 2 ++
> include/net/arp.h | 1 +
> include/net/neighbour.h | 1 +
> include/net/net_namespace.h | 22 +++++++++++++++++++++
> include/net/netns/ipv4.h | 1 +
> net/core/dev.c | 48 ++++++++++++++++++++++++++++++++++++++++-----
> net/core/neighbour.c | 38 ++++++++++++++++++++++++++++-------
> net/core/net_namespace.c | 1 +
> net/ipv4/arp.c | 4 ++++
> net/ipv4/fib_frontend.c | 16 +++++++++++++--
> 10 files changed, 120 insertions(+), 14 deletions(-)
>
> --
> 1.8.1.4
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers
2016-02-04 23:35 ` [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers Salam Noureddine
@ 2016-02-06 18:58 ` Julian Anastasov
2016-02-07 6:25 ` Salam Noureddine
0 siblings, 1 reply; 12+ messages in thread
From: Julian Anastasov @ 2016-02-06 18:58 UTC (permalink / raw)
To: Salam Noureddine
Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
Daniel Borkmann, Eric W. Biederman, netdev
Hello,
On Thu, 4 Feb 2016, Salam Noureddine wrote:
> @@ -1572,12 +1583,17 @@ rollback:
> call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
> dev);
> call_netdevice_notifier(nb, NETDEV_DOWN, dev);
> + call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
> + dev);
I now see that we should split the loop
here, so that NETDEV_DOWN_BATCH is called only once per net:
bool down = false;
for_each_netdev(net, dev) {
if (dev == last)
break;
if (dev->flags & IFF_UP) {
call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
dev);
call_netdevice_notifier(nb, NETDEV_DOWN, dev);
down = true;
}
}
rt_cache_flush and arp_ifdown_all will be called
on NETDEV_UNREGISTER_BATCH, so use 'down' flag:
if (down)
call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
net->loopback_dev);
for_each_netdev(net, dev) {
if (dev == last)
goto outroll;
call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
}
call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
net->loopback_dev);
> }
> call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
> }
> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
> + net->loopback_dev);
> }
>
> outroll:
> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last);
> raw_notifier_chain_unregister(&netdev_chain, nb);
> goto unlock;
> }
> @@ -1614,9 +1630,13 @@ int unregister_netdevice_notifier(struct notifier_block *nb)
> call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
> dev);
> call_netdevice_notifier(nb, NETDEV_DOWN, dev);
> + call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
> + dev);
Same here, split loop for NETDEV_DOWN_BATCH because
arp_ifdown_all is slow.
> }
> call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
> }
> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
> + net->loopback_dev);
> }
> unlock:
> rtnl_unlock();
> static void rollback_registered_many(struct list_head *head)
> {
> struct net_device *dev, *tmp;
> + struct net *net, *net_tmp;
> LIST_HEAD(close_head);
> + LIST_HEAD(net_head);
>
> BUG_ON(dev_boot_phase);
> ASSERT_RTNL();
> @@ -6465,8 +6489,6 @@ static void rollback_registered_many(struct list_head *head)
> synchronize_net();
>
> list_for_each_entry(dev, head, unreg_list) {
> - struct sk_buff *skb = NULL;
> -
> /* Shutdown queueing discipline. */
> dev_shutdown(dev);
>
> @@ -6475,6 +6497,20 @@ static void rollback_registered_many(struct list_head *head)
> this device. They should clean all the things.
> */
> call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
> + }
> +
> + /* call batch notifiers which act on net namespaces */
> + list_for_each_entry(dev, head, unreg_list) {
> + net_add_event_list(&net_head, dev_net(dev));
Looks like we can move the above net_add_event_list with
the comment into the previous loop after NETDEV_UNREGISTER,
we will save some cycles.
> + }
> + list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
> + call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
> + net->loopback_dev);
> + net_del_event_list(net);
> + }
> +
> + list_for_each_entry(dev, head, unreg_list) {
> + struct sk_buff *skb = NULL;
>
> if (!dev->rtnl_link_ops ||
> dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
Regards
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister
2016-02-05 16:04 ` Sergei Shtylyov
@ 2016-02-07 6:09 ` Salam Noureddine
2016-02-07 10:24 ` Sergei Shtylyov
0 siblings, 1 reply; 12+ messages in thread
From: Salam Noureddine @ 2016-02-07 6:09 UTC (permalink / raw)
To: Sergei Shtylyov
Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
Daniel Borkmann, Eric W. Biederman, Julian Anastasov,
Network Development
On Fri, Feb 5, 2016 at 8:04 AM, Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
> On 02/05/2016 02:35 AM, Salam Noureddine wrote:
>>
>> if (event == NETDEV_UNREGISTER) {
>> - fib_disable_ip(dev, event, true);
>> + if (fib_sync_down_dev(dev, event, true))
>> + net->ipv4.needs_fib_flush = true;
>> rt_flush_dev(dev);
>> return NOTIFY_DONE;
>> }
>>
>> + if (event == NETDEV_UNREGISTER_BATCH || event ==
>> NETDEV_DOWN_BATCH) {
>> + if (net->ipv4.needs_fib_flush) {
>> + fib_flush(net);
>> + net->ipv4.needs_fib_flush = false;
>> + }
>> + rt_cache_flush(net);
>> + arp_ifdown_all();
>> + return NOTIFY_DONE;
>> + }
>> +
>
>
> I'd convert to *switch* the above 2 *if*'s...
>
> [...]
>
> MBR, Sergei
>
I could do that.
Thanks,
Salam
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers
2016-02-06 18:58 ` Julian Anastasov
@ 2016-02-07 6:25 ` Salam Noureddine
2016-02-07 10:09 ` Julian Anastasov
0 siblings, 1 reply; 12+ messages in thread
From: Salam Noureddine @ 2016-02-07 6:25 UTC (permalink / raw)
To: Julian Anastasov
Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
Daniel Borkmann, Eric W. Biederman, Network Development
On Sat, Feb 6, 2016 at 10:58 AM, Julian Anastasov <ja@ssi.bg> wrote:
>
> I now see that we should split the loop
> here, so that NETDEV_DOWN_BATCH is called only once per net:
>
> bool down = false;
>
> for_each_netdev(net, dev) {
> if (dev == last)
> break;
>
> if (dev->flags & IFF_UP) {
> call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
> dev);
> call_netdevice_notifier(nb, NETDEV_DOWN, dev);
> down = true;
> }
> }
>
> rt_cache_flush and arp_ifdown_all will be called
> on NETDEV_UNREGISTER_BATCH, so use 'down' flag:
>
> if (down)
> call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
> net->loopback_dev);
> for_each_netdev(net, dev) {
> if (dev == last)
> goto outroll;
> call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
> }
> call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
> net->loopback_dev);
>
>
>> }
>> call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
>> }
>> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
>> + net->loopback_dev);
>> }
>>
>> outroll:
>> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last);
>> raw_notifier_chain_unregister(&netdev_chain, nb);
>> goto unlock;
>> }
I am not sure we need to worry too much about optimizing the failure
code path to justify splitting into two loops.
>> static void rollback_registered_many(struct list_head *head)
>> {
>> list_for_each_entry(dev, head, unreg_list) {
>> - struct sk_buff *skb = NULL;
>> -
>> /* Shutdown queueing discipline. */
>> dev_shutdown(dev);
>>
>> @@ -6475,6 +6497,20 @@ static void rollback_registered_many(struct list_head *head)
>> this device. They should clean all the things.
>> */
>> call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
>> + }
>> +
>> + /* call batch notifiers which act on net namespaces */
>> + list_for_each_entry(dev, head, unreg_list) {
>> + net_add_event_list(&net_head, dev_net(dev));
>
> Looks like we can move the above net_add_event_list with
> the comment into the previous loop after NETDEV_UNREGISTER,
> we will save some cycles.
>
I didn't move it into the previous loop because the NETDEV_UNREGISTER
notifier can
end up calling rollback_registered_many (for example the vlan driver
unregistering
all vlans on top of a device) in which case we would be using the
event_list in the net
namespace.
> Julian Anastasov <ja@ssi.bg>
Thanks,
Salam
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers
2016-02-07 6:25 ` Salam Noureddine
@ 2016-02-07 10:09 ` Julian Anastasov
0 siblings, 0 replies; 12+ messages in thread
From: Julian Anastasov @ 2016-02-07 10:09 UTC (permalink / raw)
To: Salam Noureddine
Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
Daniel Borkmann, Eric W. Biederman, Network Development
Hello,
On Sat, 6 Feb 2016, Salam Noureddine wrote:
> On Sat, Feb 6, 2016 at 10:58 AM, Julian Anastasov <ja@ssi.bg> wrote:
> >> + /* call batch notifiers which act on net namespaces */
> >> + list_for_each_entry(dev, head, unreg_list) {
> >> + net_add_event_list(&net_head, dev_net(dev));
> >
> > Looks like we can move the above net_add_event_list with
> > the comment into the previous loop after NETDEV_UNREGISTER,
> > we will save some cycles.
>
> I didn't move it into the previous loop because the NETDEV_UNREGISTER
> notifier can
> end up calling rollback_registered_many (for example the vlan driver
> unregistering
> all vlans on top of a device) in which case we would be using the
> event_list in the net
> namespace.
ok. May be another safe option is to call it just before
NETDEV_UNREGISTER, so that if rollback_registered_many is called
recursively we will have single NETDEV_UNREGISTER_BATCH call.
Same should work for NETDEV_DOWN_BATCH in dev_close_many. But
these are small optimizations, so it is up to you to decide.
Regards
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister
2016-02-07 6:09 ` Salam Noureddine
@ 2016-02-07 10:24 ` Sergei Shtylyov
0 siblings, 0 replies; 12+ messages in thread
From: Sergei Shtylyov @ 2016-02-07 10:24 UTC (permalink / raw)
To: Salam Noureddine
Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
Daniel Borkmann, Eric W. Biederman, Julian Anastasov,
Network Development
Hello.
On 2/7/2016 9:09 AM, Salam Noureddine wrote:
>>>
>>> if (event == NETDEV_UNREGISTER) {
>>> - fib_disable_ip(dev, event, true);
>>> + if (fib_sync_down_dev(dev, event, true))
>>> + net->ipv4.needs_fib_flush = true;
>>> rt_flush_dev(dev);
>>> return NOTIFY_DONE;
>>> }
>>>
>>> + if (event == NETDEV_UNREGISTER_BATCH || event ==
>>> NETDEV_DOWN_BATCH) {
>>> + if (net->ipv4.needs_fib_flush) {
>>> + fib_flush(net);
>>> + net->ipv4.needs_fib_flush = false;
>>> + }
>>> + rt_cache_flush(net);
>>> + arp_ifdown_all();
>>> + return NOTIFY_DONE;
>>> + }
>>> +
>>
>>
>> I'd convert to *switch* the above 2 *if*'s...
> I could do that.
Please do it, in this patch.
> Thanks,
>
> Salam
MBR, Sergei
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2016-02-07 10:24 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-04 23:35 [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
2016-02-04 23:35 ` [PATCH v2 net-next 1/4] net: add event_list to struct net and provide utility functions Salam Noureddine
2016-02-04 23:35 ` [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers Salam Noureddine
2016-02-06 18:58 ` Julian Anastasov
2016-02-07 6:25 ` Salam Noureddine
2016-02-07 10:09 ` Julian Anastasov
2016-02-04 23:35 ` [PATCH v2 net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces Salam Noureddine
2016-02-04 23:35 ` [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister Salam Noureddine
2016-02-05 16:04 ` Sergei Shtylyov
2016-02-07 6:09 ` Salam Noureddine
2016-02-07 10:24 ` Sergei Shtylyov
2016-02-05 22:37 ` [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).