* [PATCH 0/3] Fix Network namespace shutdown take 2
@ 2009-02-20 15:47 Eric W. Biederman
2009-02-20 15:52 ` [PATCH 1/3] netns: Fix icmp shutdown Eric W. Biederman
2009-02-25 12:43 ` [PATCH 0/3] Fix Network namespace shutdown take 2 Daniel Lezcano
0 siblings, 2 replies; 12+ messages in thread
From: Eric W. Biederman @ 2009-02-20 15:47 UTC (permalink / raw)
To: David Miller; +Cc: Linux Containers, netdev, Alexey Dobriyan, Denis V. Lunev
6 months ago when I introduced net_alive I fixed the symptoms
but I failed to properly fix network namespace shutdown.
I realized this when I received a bug report on Tuesday about a
failure in icmp_send caused by packets in the arp_gueue.
It turns out that the net_alive check in netif_receive_skb
is completely unnecessary and just masked the real problem.
If we remove all network devices from a network namespace before we
shutdown network subsystems and protocols then as designed we cannot
have packets in flight causing problems.
It turns out that the root cause of these problems is that the icmp
code was calling register_pernet_device instead of
register_pernet_subsys and so it's cleanup was happening much too
early.
The following patchset which should work against both 2.6.29-rcX
and net-next fixes the registration problems and removes the
unncessary net_alive check, making the code simpler and hopefully
more comprehensible.
Eric
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] netns: Fix icmp shutdown.
2009-02-20 15:47 [PATCH 0/3] Fix Network namespace shutdown take 2 Eric W. Biederman
@ 2009-02-20 15:52 ` Eric W. Biederman
2009-02-20 15:53 ` [PATCH 2/3] tcp: Like icmp use register_pernet_subsys Eric W. Biederman
[not found] ` <m1bpsxhzqz.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-02-25 12:43 ` [PATCH 0/3] Fix Network namespace shutdown take 2 Daniel Lezcano
1 sibling, 2 replies; 12+ messages in thread
From: Eric W. Biederman @ 2009-02-20 15:52 UTC (permalink / raw)
To: David Miller; +Cc: Linux Containers, netdev, Alexey Dobriyan, Denis V. Lunev
Recently I had a kernel panic in icmp_send during a network namespace
cleanup. There were packets in the arp queue that failed to be sent
and we attempted to generate an ICMP host unreachable message, but
failed because icmp_sk_exit had already been called.
The network devices are removed from a network namespace and their
arp queues are flushed before we do attempt to shutdown subsystems
so this error should have been impossible.
It turns out icmp_init is using register_pernet_device instead
of register_pernet_subsys. Which resulted in icmp being shut down
while we still had the possibility of packets in flight, making
a nasty NULL pointer deference in interrupt context possible.
Changing this to register_pernet_subsys fixes the problem in
my testing.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---
net/ipv4/icmp.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 382800a..3f50807 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -1207,7 +1207,7 @@ static struct pernet_operations __net_initdata icmp_sk_ops = {
int __init icmp_init(void)
{
- return register_pernet_device(&icmp_sk_ops);
+ return register_pernet_subsys(&icmp_sk_ops);
}
EXPORT_SYMBOL(icmp_err_convert);
--
1.6.1.2.350.g88cc
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/3] tcp: Like icmp use register_pernet_subsys
2009-02-20 15:52 ` [PATCH 1/3] netns: Fix icmp shutdown Eric W. Biederman
@ 2009-02-20 15:53 ` Eric W. Biederman
[not found] ` <m163j5hzow.fsf_-_-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-02-20 16:02 ` [PATCH 3/3] netns: Remove net_alive Eric W. Biederman
[not found] ` <m1bpsxhzqz.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
1 sibling, 2 replies; 12+ messages in thread
From: Eric W. Biederman @ 2009-02-20 15:53 UTC (permalink / raw)
To: David Miller; +Cc: Linux Containers, netdev, Alexey Dobriyan, Denis V. Lunev
To remove the possibility of packets flying around when network
devices are being cleaned up use reisger_pernet_subsys instead of
register_pernet_device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---
net/ipv4/tcp_ipv4.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index f6b962f..a738120 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2443,7 +2443,7 @@ static struct pernet_operations __net_initdata tcp_sk_ops = {
void __init tcp_v4_init(void)
{
inet_hashinfo_init(&tcp_hashinfo);
- if (register_pernet_device(&tcp_sk_ops))
+ if (register_pernet_subsys(&tcp_sk_ops))
panic("Failed to create the TCP control socket.\n");
}
--
1.6.1.2.350.g88cc
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] netns: Fix icmp shutdown.
[not found] ` <m1bpsxhzqz.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-02-20 15:57 ` Denis V. Lunev
2009-02-22 8:09 ` David Miller
0 siblings, 1 reply; 12+ messages in thread
From: Denis V. Lunev @ 2009-02-20 15:57 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Linux Containers, netdev-u79uwXL29TY76Z2rM5mHXA, David Miller,
Alexey Dobriyan
Acked-by: Denis V. Lunev <den-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
On Fri, 2009-02-20 at 07:52 -0800, Eric W. Biederman wrote:
> Recently I had a kernel panic in icmp_send during a network namespace
> cleanup. There were packets in the arp queue that failed to be sent
> and we attempted to generate an ICMP host unreachable message, but
> failed because icmp_sk_exit had already been called.
>
> The network devices are removed from a network namespace and their
> arp queues are flushed before we do attempt to shutdown subsystems
> so this error should have been impossible.
>
> It turns out icmp_init is using register_pernet_device instead
> of register_pernet_subsys. Which resulted in icmp being shut down
> while we still had the possibility of packets in flight, making
> a nasty NULL pointer deference in interrupt context possible.
>
> Changing this to register_pernet_subsys fixes the problem in
> my testing.
>
> Signed-off-by: Eric W. Biederman <ebiederm-BGArkANP9klv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
> ---
> net/ipv4/icmp.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 382800a..3f50807 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -1207,7 +1207,7 @@ static struct pernet_operations __net_initdata icmp_sk_ops = {
>
> int __init icmp_init(void)
> {
> - return register_pernet_device(&icmp_sk_ops);
> + return register_pernet_subsys(&icmp_sk_ops);
> }
>
> EXPORT_SYMBOL(icmp_err_convert);
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] tcp: Like icmp use register_pernet_subsys
[not found] ` <m163j5hzow.fsf_-_-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-02-20 15:57 ` Denis V. Lunev
2009-02-22 8:10 ` David Miller
0 siblings, 1 reply; 12+ messages in thread
From: Denis V. Lunev @ 2009-02-20 15:57 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Linux Containers, netdev-u79uwXL29TY76Z2rM5mHXA, David Miller,
Alexey Dobriyan
Acked-by: Denis V. Lunev <den-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
On Fri, 2009-02-20 at 07:53 -0800, Eric W. Biederman wrote:
> To remove the possibility of packets flying around when network
> devices are being cleaned up use reisger_pernet_subsys instead of
> register_pernet_device.
>
> Signed-off-by: Eric W. Biederman <ebiederm-BGArkANP9klv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
> ---
> net/ipv4/tcp_ipv4.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index f6b962f..a738120 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -2443,7 +2443,7 @@ static struct pernet_operations __net_initdata tcp_sk_ops = {
> void __init tcp_v4_init(void)
> {
> inet_hashinfo_init(&tcp_hashinfo);
> - if (register_pernet_device(&tcp_sk_ops))
> + if (register_pernet_subsys(&tcp_sk_ops))
> panic("Failed to create the TCP control socket.\n");
> }
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 3/3] netns: Remove net_alive
2009-02-20 15:53 ` [PATCH 2/3] tcp: Like icmp use register_pernet_subsys Eric W. Biederman
[not found] ` <m163j5hzow.fsf_-_-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-02-20 16:02 ` Eric W. Biederman
2009-02-22 8:11 ` David Miller
1 sibling, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 2009-02-20 16:02 UTC (permalink / raw)
To: David Miller; +Cc: Linux Containers, netdev, Alexey Dobriyan, Denis V. Lunev
It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.
Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace then that provided
by net_alive in netif_receive_skb. So remove net_alive allowing
packet reception run a little faster.
Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---
include/net/net_namespace.h | 27 +++++++++++++++++----------
net/core/dev.c | 6 ------
net/core/net_namespace.c | 3 ---
3 files changed, 17 insertions(+), 19 deletions(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 6fc13d9..ded434b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -109,11 +109,6 @@ extern struct list_head net_namespace_list;
#ifdef CONFIG_NET_NS
extern void __put_net(struct net *net);
-static inline int net_alive(struct net *net)
-{
- return net && atomic_read(&net->count);
-}
-
static inline struct net *get_net(struct net *net)
{
atomic_inc(&net->count);
@@ -145,11 +140,6 @@ int net_eq(const struct net *net1, const struct net *net2)
}
#else
-static inline int net_alive(struct net *net)
-{
- return 1;
-}
-
static inline struct net *get_net(struct net *net)
{
return net;
@@ -234,6 +224,23 @@ struct pernet_operations {
void (*exit)(struct net *net);
};
+/*
+ * Use these carefully. If you implement a network device and it
+ * needs per network namespace operations use device pernet operations,
+ * otherwise use pernet subsys operations.
+ *
+ * This is critically important. Most of the network code cleanup
+ * runs with the assumption that dev_remove_pack has been called so no
+ * new packets will arrive during and after the cleanup functions have
+ * been called. dev_remove_pack is not per namespace so instead the
+ * guarantee of no more packets arriving in a network namespace is
+ * provided by ensuring that all network devices and all sockets have
+ * left the network namespace before the cleanup methods are called.
+ *
+ * For the longest time the ipv4 icmp code was registered as a pernet
+ * device which caused kernel oops, and panics during network
+ * namespace cleanup. So please don't get this wrong.
+ */
extern int register_pernet_subsys(struct pernet_operations *);
extern void unregister_pernet_subsys(struct pernet_operations *);
extern int register_pernet_gen_subsys(int *id, struct pernet_operations *);
diff --git a/net/core/dev.c b/net/core/dev.c
index 5493394..fd1712d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2267,12 +2267,6 @@ int netif_receive_skb(struct sk_buff *skb)
rcu_read_lock();
- /* Don't receive packets in an exiting network namespace */
- if (!net_alive(dev_net(skb->dev))) {
- kfree_skb(skb);
- goto out;
- }
-
#ifdef CONFIG_NET_CLS_ACT
if (skb->tc_verd & TC_NCLS) {
skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 55151fa..516c7b1 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -140,9 +140,6 @@ static void cleanup_net(struct work_struct *work)
struct pernet_operations *ops;
struct net *net;
- /* Be very certain incoming network packets will not find us */
- rcu_barrier();
-
net = container_of(work, struct net, work);
mutex_lock(&net_mutex);
--
1.6.1.2.350.g88cc
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] netns: Fix icmp shutdown.
2009-02-20 15:57 ` [PATCH 1/3] netns: Fix icmp shutdown Denis V. Lunev
@ 2009-02-22 8:09 ` David Miller
0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2009-02-22 8:09 UTC (permalink / raw)
To: den; +Cc: ebiederm, containers, netdev, adobriyan
From: "Denis V. Lunev" <den@openvz.org>
Date: Fri, 20 Feb 2009 18:57:02 +0300
> Acked-by: Denis V. Lunev <den@openvz.org>
Applied.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] tcp: Like icmp use register_pernet_subsys
2009-02-20 15:57 ` Denis V. Lunev
@ 2009-02-22 8:10 ` David Miller
0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2009-02-22 8:10 UTC (permalink / raw)
To: den; +Cc: ebiederm, containers, netdev, adobriyan
From: "Denis V. Lunev" <den@openvz.org>
Date: Fri, 20 Feb 2009 18:57:23 +0300
> Acked-by: Denis V. Lunev <den@openvz.org>
Applied.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] netns: Remove net_alive
2009-02-20 16:02 ` [PATCH 3/3] netns: Remove net_alive Eric W. Biederman
@ 2009-02-22 8:11 ` David Miller
0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2009-02-22 8:11 UTC (permalink / raw)
To: ebiederm; +Cc: containers, netdev, adobriyan, den
From: ebiederm@xmission.com (Eric W. Biederman)
Date: Fri, 20 Feb 2009 08:02:57 -0800
>
> It turns out that net_alive is unnecessary, and the original problem
> that led to it being added was simply that the icmp code thought
> it was a network device and wound up being unable to handle packets
> while there were still packets in the network namespace.
>
> Now that icmp and tcp have been fixed to properly register themselves
> this problem is no longer present and we have a stronger guarantee
> that packets will not arrive in a network namespace then that provided
> by net_alive in netif_receive_skb. So remove net_alive allowing
> packet reception run a little faster.
>
> Additionally document the strong reason why network namespace cleanup
> is safe so that if something happens again someone else will have
> a chance of figuring it out.
>
> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Also applied, thanks Eric.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/3] Fix Network namespace shutdown take 2
2009-02-20 15:47 [PATCH 0/3] Fix Network namespace shutdown take 2 Eric W. Biederman
2009-02-20 15:52 ` [PATCH 1/3] netns: Fix icmp shutdown Eric W. Biederman
@ 2009-02-25 12:43 ` Daniel Lezcano
2009-03-03 9:07 ` David Miller
1 sibling, 1 reply; 12+ messages in thread
From: Daniel Lezcano @ 2009-02-25 12:43 UTC (permalink / raw)
To: David Miller
Cc: Eric W. Biederman, Linux Containers, netdev, Alexey Dobriyan,
Denis V. Lunev
Eric W. Biederman wrote:
> 6 months ago when I introduced net_alive I fixed the symptoms
> but I failed to properly fix network namespace shutdown.
>
> I realized this when I received a bug report on Tuesday about a
> failure in icmp_send caused by packets in the arp_gueue.
>
> It turns out that the net_alive check in netif_receive_skb
> is completely unnecessary and just masked the real problem.
>
> If we remove all network devices from a network namespace before we
> shutdown network subsystems and protocols then as designed we cannot
> have packets in flight causing problems.
>
> It turns out that the root cause of these problems is that the icmp
> code was calling register_pernet_device instead of
> register_pernet_subsys and so it's cleanup was happening much too
> early.
>
> The following patchset which should work against both 2.6.29-rcX
> and net-next fixes the registration problems and removes the
> unncessary net_alive check, making the code simpler and hopefully
> more comprehensible.
>
Hi Dave,
I don't see these patches in the net-2.6 tree. Shouldn't they be in
net-2.6 too ?
-- Daniel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/3] Fix Network namespace shutdown take 2
2009-02-25 12:43 ` [PATCH 0/3] Fix Network namespace shutdown take 2 Daniel Lezcano
@ 2009-03-03 9:07 ` David Miller
2009-03-05 12:35 ` Daniel Lezcano
0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2009-03-03 9:07 UTC (permalink / raw)
To: daniel.lezcano; +Cc: ebiederm, containers, netdev, adobriyan, den
From: Daniel Lezcano <daniel.lezcano@free.fr>
Date: Wed, 25 Feb 2009 13:43:29 +0100
> I don't see these patches in the net-2.6 tree. Shouldn't they be in
> net-2.6 too ?
Ok, I'll think about cherry picking them into net-2.6...
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/3] Fix Network namespace shutdown take 2
2009-03-03 9:07 ` David Miller
@ 2009-03-05 12:35 ` Daniel Lezcano
0 siblings, 0 replies; 12+ messages in thread
From: Daniel Lezcano @ 2009-03-05 12:35 UTC (permalink / raw)
To: David Miller; +Cc: ebiederm, containers, netdev, adobriyan, den
David Miller wrote:
> From: Daniel Lezcano <daniel.lezcano@free.fr>
> Date: Wed, 25 Feb 2009 13:43:29 +0100
>
>
>> I don't see these patches in the net-2.6 tree. Shouldn't they be in
>> net-2.6 too ?
>>
>
> Ok, I'll think about cherry picking them into net-2.6...
>
Thanks Dave.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2009-03-05 12:36 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-20 15:47 [PATCH 0/3] Fix Network namespace shutdown take 2 Eric W. Biederman
2009-02-20 15:52 ` [PATCH 1/3] netns: Fix icmp shutdown Eric W. Biederman
2009-02-20 15:53 ` [PATCH 2/3] tcp: Like icmp use register_pernet_subsys Eric W. Biederman
[not found] ` <m163j5hzow.fsf_-_-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-02-20 15:57 ` Denis V. Lunev
2009-02-22 8:10 ` David Miller
2009-02-20 16:02 ` [PATCH 3/3] netns: Remove net_alive Eric W. Biederman
2009-02-22 8:11 ` David Miller
[not found] ` <m1bpsxhzqz.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-02-20 15:57 ` [PATCH 1/3] netns: Fix icmp shutdown Denis V. Lunev
2009-02-22 8:09 ` David Miller
2009-02-25 12:43 ` [PATCH 0/3] Fix Network namespace shutdown take 2 Daniel Lezcano
2009-03-03 9:07 ` David Miller
2009-03-05 12:35 ` Daniel Lezcano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).