* [PATCH 0/3] Fix Network namespace shutdown take 2 @ 2009-02-20 15:47 Eric W. Biederman 2009-02-20 15:52 ` [PATCH 1/3] netns: Fix icmp shutdown Eric W. Biederman 2009-02-25 12:43 ` [PATCH 0/3] Fix Network namespace shutdown take 2 Daniel Lezcano 0 siblings, 2 replies; 12+ messages in thread From: Eric W. Biederman @ 2009-02-20 15:47 UTC (permalink / raw) To: David Miller; +Cc: Linux Containers, netdev, Alexey Dobriyan, Denis V. Lunev 6 months ago when I introduced net_alive I fixed the symptoms but I failed to properly fix network namespace shutdown. I realized this when I received a bug report on Tuesday about a failure in icmp_send caused by packets in the arp_gueue. It turns out that the net_alive check in netif_receive_skb is completely unnecessary and just masked the real problem. If we remove all network devices from a network namespace before we shutdown network subsystems and protocols then as designed we cannot have packets in flight causing problems. It turns out that the root cause of these problems is that the icmp code was calling register_pernet_device instead of register_pernet_subsys and so it's cleanup was happening much too early. The following patchset which should work against both 2.6.29-rcX and net-next fixes the registration problems and removes the unncessary net_alive check, making the code simpler and hopefully more comprehensible. Eric ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] netns: Fix icmp shutdown. 2009-02-20 15:47 [PATCH 0/3] Fix Network namespace shutdown take 2 Eric W. Biederman @ 2009-02-20 15:52 ` Eric W. Biederman 2009-02-20 15:53 ` [PATCH 2/3] tcp: Like icmp use register_pernet_subsys Eric W. Biederman [not found] ` <m1bpsxhzqz.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> 2009-02-25 12:43 ` [PATCH 0/3] Fix Network namespace shutdown take 2 Daniel Lezcano 1 sibling, 2 replies; 12+ messages in thread From: Eric W. Biederman @ 2009-02-20 15:52 UTC (permalink / raw) To: David Miller; +Cc: Linux Containers, netdev, Alexey Dobriyan, Denis V. Lunev Recently I had a kernel panic in icmp_send during a network namespace cleanup. There were packets in the arp queue that failed to be sent and we attempted to generate an ICMP host unreachable message, but failed because icmp_sk_exit had already been called. The network devices are removed from a network namespace and their arp queues are flushed before we do attempt to shutdown subsystems so this error should have been impossible. It turns out icmp_init is using register_pernet_device instead of register_pernet_subsys. Which resulted in icmp being shut down while we still had the possibility of packets in flight, making a nasty NULL pointer deference in interrupt context possible. Changing this to register_pernet_subsys fixes the problem in my testing. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> --- net/ipv4/icmp.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 382800a..3f50807 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -1207,7 +1207,7 @@ static struct pernet_operations __net_initdata icmp_sk_ops = { int __init icmp_init(void) { - return register_pernet_device(&icmp_sk_ops); + return register_pernet_subsys(&icmp_sk_ops); } EXPORT_SYMBOL(icmp_err_convert); -- 1.6.1.2.350.g88cc ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/3] tcp: Like icmp use register_pernet_subsys 2009-02-20 15:52 ` [PATCH 1/3] netns: Fix icmp shutdown Eric W. Biederman @ 2009-02-20 15:53 ` Eric W. Biederman [not found] ` <m163j5hzow.fsf_-_-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> 2009-02-20 16:02 ` [PATCH 3/3] netns: Remove net_alive Eric W. Biederman [not found] ` <m1bpsxhzqz.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> 1 sibling, 2 replies; 12+ messages in thread From: Eric W. Biederman @ 2009-02-20 15:53 UTC (permalink / raw) To: David Miller; +Cc: Linux Containers, netdev, Alexey Dobriyan, Denis V. Lunev To remove the possibility of packets flying around when network devices are being cleaned up use reisger_pernet_subsys instead of register_pernet_device. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> --- net/ipv4/tcp_ipv4.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index f6b962f..a738120 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2443,7 +2443,7 @@ static struct pernet_operations __net_initdata tcp_sk_ops = { void __init tcp_v4_init(void) { inet_hashinfo_init(&tcp_hashinfo); - if (register_pernet_device(&tcp_sk_ops)) + if (register_pernet_subsys(&tcp_sk_ops)) panic("Failed to create the TCP control socket.\n"); } -- 1.6.1.2.350.g88cc ^ permalink raw reply related [flat|nested] 12+ messages in thread
[parent not found: <m163j5hzow.fsf_-_-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>]
* Re: [PATCH 2/3] tcp: Like icmp use register_pernet_subsys [not found] ` <m163j5hzow.fsf_-_-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> @ 2009-02-20 15:57 ` Denis V. Lunev 2009-02-22 8:10 ` David Miller 0 siblings, 1 reply; 12+ messages in thread From: Denis V. Lunev @ 2009-02-20 15:57 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, netdev-u79uwXL29TY76Z2rM5mHXA, David Miller, Alexey Dobriyan Acked-by: Denis V. Lunev <den-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> On Fri, 2009-02-20 at 07:53 -0800, Eric W. Biederman wrote: > To remove the possibility of packets flying around when network > devices are being cleaned up use reisger_pernet_subsys instead of > register_pernet_device. > > Signed-off-by: Eric W. Biederman <ebiederm-BGArkANP9klv6pq1l3V1OdBPR1lH4CV8@public.gmane.org> > --- > net/ipv4/tcp_ipv4.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index f6b962f..a738120 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -2443,7 +2443,7 @@ static struct pernet_operations __net_initdata tcp_sk_ops = { > void __init tcp_v4_init(void) > { > inet_hashinfo_init(&tcp_hashinfo); > - if (register_pernet_device(&tcp_sk_ops)) > + if (register_pernet_subsys(&tcp_sk_ops)) > panic("Failed to create the TCP control socket.\n"); > } > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] tcp: Like icmp use register_pernet_subsys 2009-02-20 15:57 ` Denis V. Lunev @ 2009-02-22 8:10 ` David Miller 0 siblings, 0 replies; 12+ messages in thread From: David Miller @ 2009-02-22 8:10 UTC (permalink / raw) To: den; +Cc: ebiederm, containers, netdev, adobriyan From: "Denis V. Lunev" <den@openvz.org> Date: Fri, 20 Feb 2009 18:57:23 +0300 > Acked-by: Denis V. Lunev <den@openvz.org> Applied. ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 3/3] netns: Remove net_alive 2009-02-20 15:53 ` [PATCH 2/3] tcp: Like icmp use register_pernet_subsys Eric W. Biederman [not found] ` <m163j5hzow.fsf_-_-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> @ 2009-02-20 16:02 ` Eric W. Biederman 2009-02-22 8:11 ` David Miller 1 sibling, 1 reply; 12+ messages in thread From: Eric W. Biederman @ 2009-02-20 16:02 UTC (permalink / raw) To: David Miller; +Cc: Linux Containers, netdev, Alexey Dobriyan, Denis V. Lunev It turns out that net_alive is unnecessary, and the original problem that led to it being added was simply that the icmp code thought it was a network device and wound up being unable to handle packets while there were still packets in the network namespace. Now that icmp and tcp have been fixed to properly register themselves this problem is no longer present and we have a stronger guarantee that packets will not arrive in a network namespace then that provided by net_alive in netif_receive_skb. So remove net_alive allowing packet reception run a little faster. Additionally document the strong reason why network namespace cleanup is safe so that if something happens again someone else will have a chance of figuring it out. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> --- include/net/net_namespace.h | 27 +++++++++++++++++---------- net/core/dev.c | 6 ------ net/core/net_namespace.c | 3 --- 3 files changed, 17 insertions(+), 19 deletions(-) diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 6fc13d9..ded434b 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -109,11 +109,6 @@ extern struct list_head net_namespace_list; #ifdef CONFIG_NET_NS extern void __put_net(struct net *net); -static inline int net_alive(struct net *net) -{ - return net && atomic_read(&net->count); -} - static inline struct net *get_net(struct net *net) { atomic_inc(&net->count); @@ -145,11 +140,6 @@ int net_eq(const struct net *net1, const struct net *net2) } #else -static inline int net_alive(struct net *net) -{ - return 1; -} - static inline struct net *get_net(struct net *net) { return net; @@ -234,6 +224,23 @@ struct pernet_operations { void (*exit)(struct net *net); }; +/* + * Use these carefully. If you implement a network device and it + * needs per network namespace operations use device pernet operations, + * otherwise use pernet subsys operations. + * + * This is critically important. Most of the network code cleanup + * runs with the assumption that dev_remove_pack has been called so no + * new packets will arrive during and after the cleanup functions have + * been called. dev_remove_pack is not per namespace so instead the + * guarantee of no more packets arriving in a network namespace is + * provided by ensuring that all network devices and all sockets have + * left the network namespace before the cleanup methods are called. + * + * For the longest time the ipv4 icmp code was registered as a pernet + * device which caused kernel oops, and panics during network + * namespace cleanup. So please don't get this wrong. + */ extern int register_pernet_subsys(struct pernet_operations *); extern void unregister_pernet_subsys(struct pernet_operations *); extern int register_pernet_gen_subsys(int *id, struct pernet_operations *); diff --git a/net/core/dev.c b/net/core/dev.c index 5493394..fd1712d 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2267,12 +2267,6 @@ int netif_receive_skb(struct sk_buff *skb) rcu_read_lock(); - /* Don't receive packets in an exiting network namespace */ - if (!net_alive(dev_net(skb->dev))) { - kfree_skb(skb); - goto out; - } - #ifdef CONFIG_NET_CLS_ACT if (skb->tc_verd & TC_NCLS) { skb->tc_verd = CLR_TC_NCLS(skb->tc_verd); diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 55151fa..516c7b1 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -140,9 +140,6 @@ static void cleanup_net(struct work_struct *work) struct pernet_operations *ops; struct net *net; - /* Be very certain incoming network packets will not find us */ - rcu_barrier(); - net = container_of(work, struct net, work); mutex_lock(&net_mutex); -- 1.6.1.2.350.g88cc ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] netns: Remove net_alive 2009-02-20 16:02 ` [PATCH 3/3] netns: Remove net_alive Eric W. Biederman @ 2009-02-22 8:11 ` David Miller 0 siblings, 0 replies; 12+ messages in thread From: David Miller @ 2009-02-22 8:11 UTC (permalink / raw) To: ebiederm; +Cc: containers, netdev, adobriyan, den From: ebiederm@xmission.com (Eric W. Biederman) Date: Fri, 20 Feb 2009 08:02:57 -0800 > > It turns out that net_alive is unnecessary, and the original problem > that led to it being added was simply that the icmp code thought > it was a network device and wound up being unable to handle packets > while there were still packets in the network namespace. > > Now that icmp and tcp have been fixed to properly register themselves > this problem is no longer present and we have a stronger guarantee > that packets will not arrive in a network namespace then that provided > by net_alive in netif_receive_skb. So remove net_alive allowing > packet reception run a little faster. > > Additionally document the strong reason why network namespace cleanup > is safe so that if something happens again someone else will have > a chance of figuring it out. > > Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Also applied, thanks Eric. ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <m1bpsxhzqz.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>]
* Re: [PATCH 1/3] netns: Fix icmp shutdown. [not found] ` <m1bpsxhzqz.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> @ 2009-02-20 15:57 ` Denis V. Lunev 2009-02-22 8:09 ` David Miller 0 siblings, 1 reply; 12+ messages in thread From: Denis V. Lunev @ 2009-02-20 15:57 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, netdev-u79uwXL29TY76Z2rM5mHXA, David Miller, Alexey Dobriyan Acked-by: Denis V. Lunev <den-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> On Fri, 2009-02-20 at 07:52 -0800, Eric W. Biederman wrote: > Recently I had a kernel panic in icmp_send during a network namespace > cleanup. There were packets in the arp queue that failed to be sent > and we attempted to generate an ICMP host unreachable message, but > failed because icmp_sk_exit had already been called. > > The network devices are removed from a network namespace and their > arp queues are flushed before we do attempt to shutdown subsystems > so this error should have been impossible. > > It turns out icmp_init is using register_pernet_device instead > of register_pernet_subsys. Which resulted in icmp being shut down > while we still had the possibility of packets in flight, making > a nasty NULL pointer deference in interrupt context possible. > > Changing this to register_pernet_subsys fixes the problem in > my testing. > > Signed-off-by: Eric W. Biederman <ebiederm-BGArkANP9klv6pq1l3V1OdBPR1lH4CV8@public.gmane.org> > --- > net/ipv4/icmp.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c > index 382800a..3f50807 100644 > --- a/net/ipv4/icmp.c > +++ b/net/ipv4/icmp.c > @@ -1207,7 +1207,7 @@ static struct pernet_operations __net_initdata icmp_sk_ops = { > > int __init icmp_init(void) > { > - return register_pernet_device(&icmp_sk_ops); > + return register_pernet_subsys(&icmp_sk_ops); > } > > EXPORT_SYMBOL(icmp_err_convert); ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] netns: Fix icmp shutdown. 2009-02-20 15:57 ` [PATCH 1/3] netns: Fix icmp shutdown Denis V. Lunev @ 2009-02-22 8:09 ` David Miller 0 siblings, 0 replies; 12+ messages in thread From: David Miller @ 2009-02-22 8:09 UTC (permalink / raw) To: den; +Cc: ebiederm, containers, netdev, adobriyan From: "Denis V. Lunev" <den@openvz.org> Date: Fri, 20 Feb 2009 18:57:02 +0300 > Acked-by: Denis V. Lunev <den@openvz.org> Applied. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/3] Fix Network namespace shutdown take 2 2009-02-20 15:47 [PATCH 0/3] Fix Network namespace shutdown take 2 Eric W. Biederman 2009-02-20 15:52 ` [PATCH 1/3] netns: Fix icmp shutdown Eric W. Biederman @ 2009-02-25 12:43 ` Daniel Lezcano 2009-03-03 9:07 ` David Miller 1 sibling, 1 reply; 12+ messages in thread From: Daniel Lezcano @ 2009-02-25 12:43 UTC (permalink / raw) To: David Miller Cc: Eric W. Biederman, Linux Containers, netdev, Alexey Dobriyan, Denis V. Lunev Eric W. Biederman wrote: > 6 months ago when I introduced net_alive I fixed the symptoms > but I failed to properly fix network namespace shutdown. > > I realized this when I received a bug report on Tuesday about a > failure in icmp_send caused by packets in the arp_gueue. > > It turns out that the net_alive check in netif_receive_skb > is completely unnecessary and just masked the real problem. > > If we remove all network devices from a network namespace before we > shutdown network subsystems and protocols then as designed we cannot > have packets in flight causing problems. > > It turns out that the root cause of these problems is that the icmp > code was calling register_pernet_device instead of > register_pernet_subsys and so it's cleanup was happening much too > early. > > The following patchset which should work against both 2.6.29-rcX > and net-next fixes the registration problems and removes the > unncessary net_alive check, making the code simpler and hopefully > more comprehensible. > Hi Dave, I don't see these patches in the net-2.6 tree. Shouldn't they be in net-2.6 too ? -- Daniel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/3] Fix Network namespace shutdown take 2 2009-02-25 12:43 ` [PATCH 0/3] Fix Network namespace shutdown take 2 Daniel Lezcano @ 2009-03-03 9:07 ` David Miller 2009-03-05 12:35 ` Daniel Lezcano 0 siblings, 1 reply; 12+ messages in thread From: David Miller @ 2009-03-03 9:07 UTC (permalink / raw) To: daniel.lezcano; +Cc: ebiederm, containers, netdev, adobriyan, den From: Daniel Lezcano <daniel.lezcano@free.fr> Date: Wed, 25 Feb 2009 13:43:29 +0100 > I don't see these patches in the net-2.6 tree. Shouldn't they be in > net-2.6 too ? Ok, I'll think about cherry picking them into net-2.6... ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/3] Fix Network namespace shutdown take 2 2009-03-03 9:07 ` David Miller @ 2009-03-05 12:35 ` Daniel Lezcano 0 siblings, 0 replies; 12+ messages in thread From: Daniel Lezcano @ 2009-03-05 12:35 UTC (permalink / raw) To: David Miller; +Cc: ebiederm, containers, netdev, adobriyan, den David Miller wrote: > From: Daniel Lezcano <daniel.lezcano@free.fr> > Date: Wed, 25 Feb 2009 13:43:29 +0100 > > >> I don't see these patches in the net-2.6 tree. Shouldn't they be in >> net-2.6 too ? >> > > Ok, I'll think about cherry picking them into net-2.6... > Thanks Dave. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2009-03-05 12:36 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-20 15:47 [PATCH 0/3] Fix Network namespace shutdown take 2 Eric W. Biederman
2009-02-20 15:52 ` [PATCH 1/3] netns: Fix icmp shutdown Eric W. Biederman
2009-02-20 15:53 ` [PATCH 2/3] tcp: Like icmp use register_pernet_subsys Eric W. Biederman
[not found] ` <m163j5hzow.fsf_-_-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-02-20 15:57 ` Denis V. Lunev
2009-02-22 8:10 ` David Miller
2009-02-20 16:02 ` [PATCH 3/3] netns: Remove net_alive Eric W. Biederman
2009-02-22 8:11 ` David Miller
[not found] ` <m1bpsxhzqz.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-02-20 15:57 ` [PATCH 1/3] netns: Fix icmp shutdown Denis V. Lunev
2009-02-22 8:09 ` David Miller
2009-02-25 12:43 ` [PATCH 0/3] Fix Network namespace shutdown take 2 Daniel Lezcano
2009-03-03 9:07 ` David Miller
2009-03-05 12:35 ` Daniel Lezcano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).