* [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device. [not found] ` <m14p2l4v2l.fsf_-_@frodo.ebiederm.org> @ 2008-11-05 23:25 ` Eric W. Biederman 2008-11-05 23:27 ` [PATCH 3/3] net: Don't leak packets when a netns is going down Eric W. Biederman 2008-11-06 0:00 ` [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device David Miller 0 siblings, 2 replies; 12+ messages in thread From: Eric W. Biederman @ 2008-11-05 23:25 UTC (permalink / raw) To: David Miller Cc: Daniel Lezcano, Linux Containers, Denis V. Lunev, Pavel Emelyanov, netdev I was recently hunting a bug that occurred in network namespace cleanup. In looking at the code it became apparrent that we have and will continue to have cases where if we have anything going on in a network namespace there will be assumptions that the loopback device is present. Things like sending igmp unsubscribe messages when we bring down network devices invokes the routing code which assumes that at least the loopback driver is present. Therefore to avoid magic initcall ordering hackery that is hard to follow and hard to get right insert a call to register the loopback device directly from net_dev_init(). This guarantes that the loopback device is the first device registered and the last network device to go away. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- drivers/net/loopback.c | 13 ++----------- include/linux/netdevice.h | 1 + net/core/dev.c | 12 ++++++++++++ 3 files changed, 15 insertions(+), 11 deletions(-) diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c index 91d0858..c4516b5 100644 --- a/drivers/net/loopback.c +++ b/drivers/net/loopback.c @@ -204,17 +204,8 @@ static __net_exit void loopback_net_exit(struct net *net) unregister_netdev(dev); } -static struct pernet_operations __net_initdata loopback_net_ops = { +/* Registered in net/core/dev.c */ +struct pernet_operations __net_initdata loopback_net_ops = { .init = loopback_net_init, .exit = loopback_net_exit, }; - -static int __init loopback_init(void) -{ - return register_pernet_device(&loopback_net_ops); -} - -/* Loopback is special. It should be initialized before any other network - * device and network subsystem. - */ -fs_initcall(loopback_init); diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f1b0dbe..12d7f44 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1766,6 +1766,7 @@ static inline int skb_bond_should_drop(struct sk_buff *skb) return 0; } +extern struct pernet_operations __net_initdata loopback_net_ops; #endif /* __KERNEL__ */ #endif /* _LINUX_DEV_H */ diff --git a/net/core/dev.c b/net/core/dev.c index 9475f3e..811507c 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4904,6 +4904,18 @@ static int __init net_dev_init(void) if (register_pernet_subsys(&netdev_net_ops)) goto out; + /* The loopback device is special if any other network devices + * is present in a network namespace the loopback device must + * be present. Since we now dynamically allocate and free the + * loopback device ensure this invariant is maintained by + * keeping the loopback device as the first device on the + * list of network devices. Ensuring the loopback devices + * is the first device that appears and the last network device + * that disappears. + */ + if (register_pernet_device(&loopback_net_ops)) + goto out; + if (register_pernet_device(&default_device_ops)) goto out; -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 3/3] net: Don't leak packets when a netns is going down 2008-11-05 23:25 ` [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device Eric W. Biederman @ 2008-11-05 23:27 ` Eric W. Biederman 2008-11-06 0:00 ` David Miller 2008-11-06 0:00 ` [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device David Miller 1 sibling, 1 reply; 12+ messages in thread From: Eric W. Biederman @ 2008-11-05 23:27 UTC (permalink / raw) To: David Miller Cc: Daniel Lezcano, Linux Containers, Denis V. Lunev, Pavel Emelyanov, netdev I have been tracking for a while a case where when the network namespace exits the cleanup gets stck in an endless precessess of: unregister_netdevice: waiting for lo to become free. Usage count = 3 unregister_netdevice: waiting for lo to become free. Usage count = 3 unregister_netdevice: waiting for lo to become free. Usage count = 3 unregister_netdevice: waiting for lo to become free. Usage count = 3 unregister_netdevice: waiting for lo to become free. Usage count = 3 unregister_netdevice: waiting for lo to become free. Usage count = 3 unregister_netdevice: waiting for lo to become free. Usage count = 3 It turns out that if you listen on a multicast address an unsubscribe packet is sent when the network device goes down. If you shutdown the network namespace without carefully cleaning up this can trigger the unsubscribe packet to be sent over the loopback interface while the network namespace is going down. All of which is fine except when we drop the packet and forget to free it leaking the skb and the dst entry attached to. As it turns out the dst entry hold a reference to the idev which holds the dev and keeps everything from being cleaned up. Yuck! By fixing my earlier thinko and add the needed kfree_skb and everything cleans up beautifully. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- net/core/dev.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 811507c..a0c6060 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2253,8 +2253,10 @@ int netif_receive_skb(struct sk_buff *skb) rcu_read_lock(); /* Don't receive packets in an exiting network namespace */ - if (!net_alive(dev_net(skb->dev))) + if (!net_alive(dev_net(skb->dev))) { + kfree_skb(skb); goto out; + } #ifdef CONFIG_NET_CLS_ACT if (skb->tc_verd & TC_NCLS) { -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] net: Don't leak packets when a netns is going down 2008-11-05 23:27 ` [PATCH 3/3] net: Don't leak packets when a netns is going down Eric W. Biederman @ 2008-11-06 0:00 ` David Miller 0 siblings, 0 replies; 12+ messages in thread From: David Miller @ 2008-11-06 0:00 UTC (permalink / raw) To: ebiederm; +Cc: dlezcano, containers, den, xemul, netdev From: ebiederm@xmission.com (Eric W. Biederman) Date: Wed, 05 Nov 2008 15:27:34 -0800 > > I have been tracking for a while a case where when the > network namespace exits the cleanup gets stck in an > endless precessess of: > > unregister_netdevice: waiting for lo to become free. Usage count = 3 > unregister_netdevice: waiting for lo to become free. Usage count = 3 > unregister_netdevice: waiting for lo to become free. Usage count = 3 > unregister_netdevice: waiting for lo to become free. Usage count = 3 > unregister_netdevice: waiting for lo to become free. Usage count = 3 > unregister_netdevice: waiting for lo to become free. Usage count = 3 > unregister_netdevice: waiting for lo to become free. Usage count = 3 > > It turns out that if you listen on a multicast address an unsubscribe > packet is sent when the network device goes down. If you shutdown > the network namespace without carefully cleaning up this can trigger > the unsubscribe packet to be sent over the loopback interface while > the network namespace is going down. > > All of which is fine except when we drop the packet and forget to > free it leaking the skb and the dst entry attached to. As it > turns out the dst entry hold a reference to the idev which holds > the dev and keeps everything from being cleaned up. Yuck! > > By fixing my earlier thinko and add the needed kfree_skb and everything > cleans up beautifully. > > Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Applied. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device. 2008-11-05 23:25 ` [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device Eric W. Biederman 2008-11-05 23:27 ` [PATCH 3/3] net: Don't leak packets when a netns is going down Eric W. Biederman @ 2008-11-06 0:00 ` David Miller 2008-11-06 13:02 ` Eric W. Biederman 1 sibling, 1 reply; 12+ messages in thread From: David Miller @ 2008-11-06 0:00 UTC (permalink / raw) To: ebiederm; +Cc: dlezcano, containers, den, xemul, netdev From: ebiederm@xmission.com (Eric W. Biederman) Date: Wed, 05 Nov 2008 15:25:39 -0800 > > I was recently hunting a bug that occurred in network namespace > cleanup. In looking at the code it became apparrent that we have > and will continue to have cases where if we have anything going > on in a network namespace there will be assumptions that the > loopback device is present. Things like sending igmp unsubscribe > messages when we bring down network devices invokes the routing > code which assumes that at least the loopback driver is present. > > Therefore to avoid magic initcall ordering hackery that is hard > to follow and hard to get right insert a call to register the > loopback device directly from net_dev_init(). This guarantes > that the loopback device is the first device registered and > the last network device to go away. > > Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Applied. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device. 2008-11-06 0:00 ` [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device David Miller @ 2008-11-06 13:02 ` Eric W. Biederman 2008-11-06 15:34 ` [PATCH 1/2] net: fib_rules ordering fixes Eric W. Biederman ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Eric W. Biederman @ 2008-11-06 13:02 UTC (permalink / raw) To: David Miller; +Cc: dlezcano, containers, den, xemul, netdev Dave can you please drop this one for the moment. I cleaned up my patch after the basic testing was over and the result is a kernel that won't boot. So if we can prevent this patch from spreading and breaking a git-bisect that would be great. I will follow up in a moment with a properly tested version. My apologies. Eric ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] net: fib_rules ordering fixes. 2008-11-06 13:02 ` Eric W. Biederman @ 2008-11-06 15:34 ` Eric W. Biederman 2008-11-06 15:36 ` [PATCH 2/2] net: Guaranetee the proper ordering of the loopback device. v2 Eric W. Biederman 2008-11-08 6:54 ` [PATCH 1/2] net: fib_rules ordering fixes David Miller 2008-11-06 21:20 ` [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device David Miller 2008-11-08 6:53 ` David Miller 2 siblings, 2 replies; 12+ messages in thread From: Eric W. Biederman @ 2008-11-06 15:34 UTC (permalink / raw) To: David Miller; +Cc: dlezcano, containers, den, xemul, netdev We need to setup the network namespace state before we register the notifier. Otherwise if a network device is already registered we get a nasty NULL pointer dereference. Signed-off-by: Eric W. Biederman <ebiederm@maxwell.aristanetworks.com> --- net/core/fib_rules.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c index 79de3b1..32b3a01 100644 --- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -664,17 +664,18 @@ static int __init fib_rules_init(void) rtnl_register(PF_UNSPEC, RTM_DELRULE, fib_nl_delrule, NULL); rtnl_register(PF_UNSPEC, RTM_GETRULE, NULL, fib_nl_dumprule); - err = register_netdevice_notifier(&fib_rules_notifier); + err = register_pernet_subsys(&fib_rules_net_ops); if (err < 0) goto fail; - err = register_pernet_subsys(&fib_rules_net_ops); + err = register_netdevice_notifier(&fib_rules_notifier); if (err < 0) goto fail_unregister; + return 0; fail_unregister: - unregister_netdevice_notifier(&fib_rules_notifier); + unregister_pernet_subsys(&fib_rules_net_ops); fail: rtnl_unregister(PF_UNSPEC, RTM_NEWRULE); rtnl_unregister(PF_UNSPEC, RTM_DELRULE); -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/2] net: Guaranetee the proper ordering of the loopback device. v2 2008-11-06 15:34 ` [PATCH 1/2] net: fib_rules ordering fixes Eric W. Biederman @ 2008-11-06 15:36 ` Eric W. Biederman 2008-11-08 6:55 ` David Miller 2008-11-08 6:54 ` [PATCH 1/2] net: fib_rules ordering fixes David Miller 1 sibling, 1 reply; 12+ messages in thread From: Eric W. Biederman @ 2008-11-06 15:36 UTC (permalink / raw) To: David Miller; +Cc: dlezcano, containers, den, xemul, netdev I was recently hunting a bug that occurred in network namespace cleanup. In looking at the code it became apparrent that we have and will continue to have cases where if we have anything going on in a network namespace there will be assumptions that the loopback device is present. Things like sending igmp unsubscribe messages when we bring down network devices invokes the routing code which assumes that at least the loopback driver is present. Therefore to avoid magic initcall ordering hackery that is hard to follow and hard to get right insert a call to register the loopback device directly from net_dev_init(). This guarantes that the loopback device is the first device registered and the last network device to go away. But do it carefully so we register the loopback device after we clear dev_boot_phase. Signed-off-by: Eric W. Biederman <ebiederm@maxwell.aristanetworks.com> --- drivers/net/loopback.c | 13 ++----------- include/linux/netdevice.h | 1 + net/core/dev.c | 22 +++++++++++++++++----- 3 files changed, 20 insertions(+), 16 deletions(-) diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c index 3b43bfd..bcc9945 100644 --- a/drivers/net/loopback.c +++ b/drivers/net/loopback.c @@ -215,17 +215,8 @@ static __net_exit void loopback_net_exit(struct net *net) unregister_netdev(dev); } -static struct pernet_operations __net_initdata loopback_net_ops = { +/* Registered in net/core/dev.c */ +struct pernet_operations __net_initdata loopback_net_ops = { .init = loopback_net_init, .exit = loopback_net_exit, }; - -static int __init loopback_init(void) -{ - return register_pernet_device(&loopback_net_ops); -} - -/* Loopback is special. It should be initialized before any other network - * device and network subsystem. - */ -fs_initcall(loopback_init); diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 488c56e..c7004a5 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1726,6 +1726,7 @@ static inline int skb_bond_should_drop(struct sk_buff *skb) return 0; } +extern struct pernet_operations __net_initdata loopback_net_ops; #endif /* __KERNEL__ */ #endif /* _LINUX_DEV_H */ diff --git a/net/core/dev.c b/net/core/dev.c index 3785c4b..cf54670 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4824,9 +4824,6 @@ static int __init net_dev_init(void) if (register_pernet_subsys(&netdev_net_ops)) goto out; - if (register_pernet_device(&default_device_ops)) - goto out; - /* * Initialise the packet receive queues. */ @@ -4843,10 +4840,25 @@ static int __init net_dev_init(void) queue->backlog.weight = weight_p; } - netdev_dma_register(); - dev_boot_phase = 0; + /* The loopback device is special if any other network devices + * is present in a network namespace the loopback device must + * be present. Since we now dynamically allocate and free the + * loopback device ensure this invariant is maintained by + * keeping the loopback device as the first device on the + * list of network devices. Ensuring the loopback devices + * is the first device that appears and the last network device + * that disappears. + */ + if (register_pernet_device(&loopback_net_ops)) + goto out; + + if (register_pernet_device(&default_device_ops)) + goto out; + + netdev_dma_register(); + open_softirq(NET_TX_SOFTIRQ, net_tx_action); open_softirq(NET_RX_SOFTIRQ, net_rx_action); -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] net: Guaranetee the proper ordering of the loopback device. v2 2008-11-06 15:36 ` [PATCH 2/2] net: Guaranetee the proper ordering of the loopback device. v2 Eric W. Biederman @ 2008-11-08 6:55 ` David Miller 0 siblings, 0 replies; 12+ messages in thread From: David Miller @ 2008-11-08 6:55 UTC (permalink / raw) To: ebiederm; +Cc: dlezcano, containers, den, xemul, netdev From: ebiederm@xmission.com (Eric W. Biederman) Date: Thu, 06 Nov 2008 07:36:00 -0800 > I was recently hunting a bug that occurred in network namespace > cleanup. In looking at the code it became apparrent that we have > and will continue to have cases where if we have anything going > on in a network namespace there will be assumptions that the > loopback device is present. Things like sending igmp unsubscribe > messages when we bring down network devices invokes the routing > code which assumes that at least the loopback driver is present. > > Therefore to avoid magic initcall ordering hackery that is hard > to follow and hard to get right insert a call to register the > loopback device directly from net_dev_init(). This guarantes > that the loopback device is the first device registered and > the last network device to go away. > > But do it carefully so we register the loopback device after > we clear dev_boot_phase. > > Signed-off-by: Eric W. Biederman <ebiederm@maxwell.aristanetworks.com> Applied. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] net: fib_rules ordering fixes. 2008-11-06 15:34 ` [PATCH 1/2] net: fib_rules ordering fixes Eric W. Biederman 2008-11-06 15:36 ` [PATCH 2/2] net: Guaranetee the proper ordering of the loopback device. v2 Eric W. Biederman @ 2008-11-08 6:54 ` David Miller 1 sibling, 0 replies; 12+ messages in thread From: David Miller @ 2008-11-08 6:54 UTC (permalink / raw) To: ebiederm; +Cc: dlezcano, containers, den, xemul, netdev From: ebiederm@xmission.com (Eric W. Biederman) Date: Thu, 06 Nov 2008 07:34:28 -0800 > > We need to setup the network namespace state before we register > the notifier. Otherwise if a network device is already registered > we get a nasty NULL pointer dereference. > > Signed-off-by: Eric W. Biederman <ebiederm@maxwell.aristanetworks.com> Applied. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device. 2008-11-06 13:02 ` Eric W. Biederman 2008-11-06 15:34 ` [PATCH 1/2] net: fib_rules ordering fixes Eric W. Biederman @ 2008-11-06 21:20 ` David Miller 2008-11-08 6:53 ` David Miller 2 siblings, 0 replies; 12+ messages in thread From: David Miller @ 2008-11-06 21:20 UTC (permalink / raw) To: ebiederm; +Cc: dlezcano, containers, den, xemul, netdev From: ebiederm@xmission.com (Eric W. Biederman) Date: Thu, 06 Nov 2008 05:02:33 -0800 > > Dave can you please drop this one for the moment. > > I cleaned up my patch after the basic testing was over and the > result is a kernel that won't boot. So if we can prevent this > patch from spreading and breaking a git-bisect that would be great. > > I will follow up in a moment with a properly tested version. > My apologies. It's already in my tree, so I need a relative fixup patch not an entire new one. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device. 2008-11-06 13:02 ` Eric W. Biederman 2008-11-06 15:34 ` [PATCH 1/2] net: fib_rules ordering fixes Eric W. Biederman 2008-11-06 21:20 ` [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device David Miller @ 2008-11-08 6:53 ` David Miller 2008-11-08 7:13 ` Eric W. Biederman 2 siblings, 1 reply; 12+ messages in thread From: David Miller @ 2008-11-08 6:53 UTC (permalink / raw) To: ebiederm; +Cc: dlezcano, containers, den, xemul, netdev From: ebiederm@xmission.com (Eric W. Biederman) Date: Thu, 06 Nov 2008 05:02:33 -0800 > Dave can you please drop this one for the moment. > > I cleaned up my patch after the basic testing was over and the > result is a kernel that won't boot. So if we can prevent this > patch from spreading and breaking a git-bisect that would be great. > > I will follow up in a moment with a properly tested version. > My apologies. I'm putting a revert changeset in there, then your two new patches on top. I'm not screwing up my GIT tree for everyone who pulls from me just because you can't be bothered to test the actual changes you send me. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device. 2008-11-08 6:53 ` David Miller @ 2008-11-08 7:13 ` Eric W. Biederman 0 siblings, 0 replies; 12+ messages in thread From: Eric W. Biederman @ 2008-11-08 7:13 UTC (permalink / raw) To: David Miller; +Cc: dlezcano, containers, den, xemul, netdev David Miller <davem@davemloft.net> writes: > I'm putting a revert changeset in there, then your two new patches on > top. > > I'm not screwing up my GIT tree for everyone who pulls from me just > because you can't be bothered to test the actual changes you send me. Understood. When I sent the email it looked like it might not have landed in your git tree yet. I badly messed up on that one and I apologize. Eric ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2008-11-08 7:15 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <m18wt6v7eb.fsf@frodo.ebiederm.org>
[not found] ` <48EB36FC.4000008@fr.ibm.com>
[not found] ` <48EB3F72.5090201@openvz.org>
[not found] ` <m1d4ic4pbr.fsf@frodo.ebiederm.org>
[not found] ` <48ECA8D2.4090406@openvz.org>
[not found] ` <m14p2l4v2l.fsf_-_@frodo.ebiederm.org>
2008-11-05 23:25 ` [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device Eric W. Biederman
2008-11-05 23:27 ` [PATCH 3/3] net: Don't leak packets when a netns is going down Eric W. Biederman
2008-11-06 0:00 ` David Miller
2008-11-06 0:00 ` [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device David Miller
2008-11-06 13:02 ` Eric W. Biederman
2008-11-06 15:34 ` [PATCH 1/2] net: fib_rules ordering fixes Eric W. Biederman
2008-11-06 15:36 ` [PATCH 2/2] net: Guaranetee the proper ordering of the loopback device. v2 Eric W. Biederman
2008-11-08 6:55 ` David Miller
2008-11-08 6:54 ` [PATCH 1/2] net: fib_rules ordering fixes David Miller
2008-11-06 21:20 ` [PATCH 2/3] net: Guaranetee the proper ordering of the loopback device David Miller
2008-11-08 6:53 ` David Miller
2008-11-08 7:13 ` Eric W. Biederman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).