* net_ns cleanup / RCU overhead
From: Simon Kirby @ 2014-08-20  5:58 UTC
  To: linux-kernel, netdev; +Cc: Eric W. Biederman

Hello!

In trying to figure out what happened to a box running lots of vsftpd
since we deployed a CONFIG_NET_NS=y kernel to it, we found that the
(wall) time needed for cleanup_net() to complete, even on an idle box,
can be quite long:

#!/bin/bash

ip netns delete test >&/dev/null
while ip netns add test; do
        echo hi
        ip netns delete test
done

On my desktop and typical hosts, this prints at only around 4 or 6 per
second. While this is happening, "vmstat 1" reports 100% idle, and there
are D-state processes with stacks similar to:

30566 [kworker/u16:1] D wait_rcu_gp+0x48, synchronize_sched+0x2f, cleanup_net+0xdb, process_one_work+0x175, worker_thread+0x119, kthread+0xbb, ret_from_fork+0x7c, 0xffffffffffffffff

32220 ip              D copy_net_ns+0x68, create_new_namespaces+0xfc, unshare_nsproxy_namespaces+0x66, SyS_unshare+0x159, system_call_fastpath+0x16, 0xffffffffffffffff

copy_net_ns() is waiting on net_mutex which is held by cleanup_net().

vsftpd uses CLONE_NEWNET to set up privsep processes. There is a comment
about it being really slow before 2.6.35 (it avoids CLONE_NEWNET in that
case). I didn't find anything that makes 2.6.35 any faster, but on Debian
2.6.36-5-amd64, I notice it does seem to be a bit faster than 3.2, 3.10,
3.16, though still not anything I'd ever want to rely on per connection.

C implementation of the above: http://0x.ca/sim/ref/tools/netnsloop.c
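
(In case that link rots: the gist of it is just a fork/unshare loop,
roughly like this untested sketch -- the linked file is the real thing,
and unshare(CLONE_NEWNET) needs root:)

#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	for (;;) {
		pid_t pid = fork();

		if (pid == 0) {
			/* child: create a netns, exit so it gets torn down */
			if (unshare(CLONE_NEWNET))
				exit(1);
			exit(0);
		}
		waitpid(pid, NULL, 0);
		write(1, "hi\n", 3);	/* same rate indicator as the shell loop */
	}
}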

Kernel stack "top": http://0x.ca/sim/ref/tools/pstack

What's going on here?

Simon-


* Re: net_ns cleanup / RCU overhead
From: Paul E. McKenney @ 2014-08-28 19:24 UTC
  To: Simon Kirby; +Cc: linux-kernel, netdev, Eric W. Biederman

On Tue, Aug 19, 2014 at 10:58:55PM -0700, Simon Kirby wrote:
> Hello!
> 
> In trying to figure out what happened to a box running lots of vsftpd
> since we deployed a CONFIG_NET_NS=y kernel to it, we found that the
> (wall) time needed for cleanup_net() to complete, even on an idle box,
> can be quite long:
> 
> #!/bin/bash
> 
> ip netns delete test >&/dev/null
> while ip netns add test; do
>         echo hi
>         ip netns delete test
> done
> 
> On my desktop and typical hosts, this prints at only around 4 or 6 per
> second. While this is happening, "vmstat 1" reports 100% idle, and there
> are D-state processes with stacks similar to:
> 
> 30566 [kworker/u16:1] D wait_rcu_gp+0x48, synchronize_sched+0x2f, cleanup_net+0xdb, process_one_work+0x175, worker_thread+0x119, kthread+0xbb, ret_from_fork+0x7c, 0xffffffffffffffff
> 
> 32220 ip              D copy_net_ns+0x68, create_new_namespaces+0xfc, unshare_nsproxy_namespaces+0x66, SyS_unshare+0x159, system_call_fastpath+0x16, 0xffffffffffffffff
> 
> copy_net_ns() is waiting on net_mutex which is held by cleanup_net().
> 
> vsftpd uses CLONE_NEWNET to set up privsep processes. There is a comment
> about it being really slow before 2.6.35 (it avoids CLONE_NEWNET in that
> case). I didn't find anything that makes 2.6.35 any faster, but on Debian
> 2.6.36-5-amd64, I notice it does seem to be a bit faster than 3.2, 3.10,
> 3.16, though still not anything I'd ever want to rely on per connection.
> 
> C implementation of the above: http://0x.ca/sim/ref/tools/netnsloop.c
> 
> Kernel stack "top": http://0x.ca/sim/ref/tools/pstack
> 
> What's going on here?

That is a bit slow for many configurations, but there are some exceptions.

So, what is your kernel's .config?

							Thanx, Paul



* Re: net_ns cleanup / RCU overhead
From: Simon Kirby @ 2014-08-28 19:44 UTC
  To: Paul E. McKenney; +Cc: linux-kernel, netdev, Eric W. Biederman

On Thu, Aug 28, 2014 at 12:24:31PM -0700, Paul E. McKenney wrote:

> On Tue, Aug 19, 2014 at 10:58:55PM -0700, Simon Kirby wrote:
> > Hello!
> > 
> > In trying to figure out what happened to a box running lots of vsftpd
> > since we deployed a CONFIG_NET_NS=y kernel to it, we found that the
> > (wall) time needed for cleanup_net() to complete, even on an idle box,
> > can be quite long:
> > 
> > #!/bin/bash
> > 
> > ip netns delete test >&/dev/null
> > while ip netns add test; do
> >         echo hi
> >         ip netns delete test
> > done
> > 
> > On my desktop and typical hosts, this prints at only around 4 or 6 per
> > second. While this is happening, "vmstat 1" reports 100% idle, and there
> > are D-state processes with stacks similar to:
> > 
> > 30566 [kworker/u16:1] D wait_rcu_gp+0x48, synchronize_sched+0x2f, cleanup_net+0xdb, process_one_work+0x175, worker_thread+0x119, kthread+0xbb, ret_from_fork+0x7c, 0xffffffffffffffff
> > 
> > 32220 ip              D copy_net_ns+0x68, create_new_namespaces+0xfc, unshare_nsproxy_namespaces+0x66, SyS_unshare+0x159, system_call_fastpath+0x16, 0xffffffffffffffff
> > 
> > copy_net_ns() is waiting on net_mutex which is held by cleanup_net().
> > 
> > vsftpd uses CLONE_NEWNET to set up privsep processes. There is a comment
> > about it being really slow before 2.6.35 (it avoids CLONE_NEWNET in that
> > case). I didn't find anything that makes 2.6.35 any faster, but on Debian
> > 2.6.36-5-amd64, I notice it does seem to be a bit faster than 3.2, 3.10,
> > 3.16, though still not anything I'd ever want to rely on per connection.
> > 
> > C implementation of the above: http://0x.ca/sim/ref/tools/netnsloop.c
> > 
> > Kernel stack "top": http://0x.ca/sim/ref/tools/pstack
> > 
> > What's going on here?
> 
> That is a bit slow for many configurations, but there are some exceptions.
> 
> So, what is your kernel's .config?

I was unable to find a config (or stock kernel) that behaved any differently,
but here's the one we're using: http://0x.ca/sim/ref/3.10/config-3.10.53

How fast does the above test run for you?

We've been running with the attached, which has helped a little, but it's
still quite slow in our particular use case (vsftpd), and with the above
test. Should I enable RCU_TRACE or STALL_INFO with a low timeout or
something?

Simon-

-- >8 --
Subject: [PATCH] netns: use synchronize_rcu_expedited instead of
 synchronize_rcu

Similar to ef323088, with synchronize_rcu(), we are only able to create
and destroy about 4 or 7 net namespaces per second, which really puts a
dent in the performance of programs attempting to use CLONE_NEWNET for
privilege separation (vsftpd, chromium).
---
 net/core/net_namespace.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 85b6269..6dcb4b3 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -296,7 +296,7 @@ static void cleanup_net(struct work_struct *work)
 	 * This needs to be before calling the exit() notifiers, so
 	 * the rcu_barrier() below isn't sufficient alone.
 	 */
-	synchronize_rcu();
+	synchronize_rcu_expedited();
 
 	/* Run all of the network namespace exit methods */
 	list_for_each_entry_reverse(ops, &pernet_list, list)
-- 
1.7.10.4


* Re: net_ns cleanup / RCU overhead
From: Eric W. Biederman @ 2014-08-28 20:33 UTC
  To: Simon Kirby; +Cc: Paul E. McKenney, linux-kernel, netdev

Simon Kirby <sim@hostway.ca> writes:

> On Thu, Aug 28, 2014 at 12:24:31PM -0700, Paul E. McKenney wrote:
>
>> On Tue, Aug 19, 2014 at 10:58:55PM -0700, Simon Kirby wrote:
>> > Hello!
>> > 
>> > In trying to figure out what happened to a box running lots of vsftpd
>> > since we deployed a CONFIG_NET_NS=y kernel to it, we found that the
>> > (wall) time needed for cleanup_net() to complete, even on an idle box,
>> > can be quite long:
>> > 
>> > #!/bin/bash
>> > 
>> > ip netns delete test >&/dev/null
>> > while ip netns add test; do
>> >         echo hi
>> >         ip netns delete test
>> > done
>> > 
>> > On my desktop and typical hosts, this prints at only around 4 or 6 per
>> > second. While this is happening, "vmstat 1" reports 100% idle, and there
>> > are D-state processes with stacks similar to:
>> > 
>> > 30566 [kworker/u16:1] D wait_rcu_gp+0x48, synchronize_sched+0x2f, cleanup_net+0xdb, process_one_work+0x175, worker_thread+0x119, kthread+0xbb, ret_from_fork+0x7c, 0xffffffffffffffff
>> > 
>> > 32220 ip              D copy_net_ns+0x68, create_new_namespaces+0xfc, unshare_nsproxy_namespaces+0x66, SyS_unshare+0x159, system_call_fastpath+0x16, 0xffffffffffffffff
>> > 
>> > copy_net_ns() is waiting on net_mutex which is held by cleanup_net().
>> > 
>> > vsftpd uses CLONE_NEWNET to set up privsep processes. There is a comment
>> > about it being really slow before 2.6.35 (it avoids CLONE_NEWNET in that
>> > case). I didn't find anything that makes 2.6.35 any faster, but on Debian
>> > 2.6.36-5-amd64, I notice it does seem to be a bit faster than 3.2, 3.10,
>> > 3.16, though still not anything I'd ever want to rely on per connection.
>> > 
>> > C implementation of the above: http://0x.ca/sim/ref/tools/netnsloop.c
>> > 
>> > Kernel stack "top": http://0x.ca/sim/ref/tools/pstack
>> > 
>> > What's going on here?
>> 
>> That is a bit slow for many configurations, but there are some exceptions.
>> 
>> So, what is your kernel's .config?
>
> I was unable to find a config (or stock kernel) that behaved any differently,
> but here's the one we're using: http://0x.ca/sim/ref/3.10/config-3.10.53
>
> How fast does the above test run for you?
>
> We've been running with the attached, which has helped a little, but it's
> still quite slow in our particular use case (vsftpd), and with the above
> test. Should I enable RCU_TRACE or STALL_INFO with a low timeout or
> something?

I just want to add a little bit more analysis to this.

What we desire to be fast is copy_net_ns; cleanup_net is batched and
asynchronous, and nothing really cares how long it takes, except that
cleanup_net holds the net_mutex and thus blocks copy_net_ns.

The puzzle is why, and which, rcu delays Simon is seeing in the network
namespace cleanup path, as it seems like the synchronize_rcu is not
the only one, and in the case of vsftpd with trivial network namespaces
where nothing has been done we should not need to delay.

Eric


> Simon-
>
> -- >8 --
> Subject: [PATCH] netns: use synchronize_rcu_expedited instead of
>  synchronize_rcu
>
> Similar to ef323088, with synchronize_rcu(), we are only able to create
> and destroy about 4 or 7 net namespaces per second, which really puts a
> dent in the performance of programs attempting to use CLONE_NEWNET for
> privilege separation (vsftpd, chromium).
> ---
>  net/core/net_namespace.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 85b6269..6dcb4b3 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -296,7 +296,7 @@ static void cleanup_net(struct work_struct *work)
>  	 * This needs to be before calling the exit() notifiers, so
>  	 * the rcu_barrier() below isn't sufficient alone.
>  	 */
> -	synchronize_rcu();
> +	synchronize_rcu_expedited();
>  
>  	/* Run all of the network namespace exit methods */
>  	list_for_each_entry_reverse(ops, &pernet_list, list)


* Re: net_ns cleanup / RCU overhead
From: Paul E. McKenney @ 2014-08-28 20:46 UTC
  To: Eric W. Biederman; +Cc: Simon Kirby, linux-kernel, netdev

On Thu, Aug 28, 2014 at 03:33:42PM -0500, Eric W. Biederman wrote:
> Simon Kirby <sim@hostway.ca> writes:
> 
> > On Thu, Aug 28, 2014 at 12:24:31PM -0700, Paul E. McKenney wrote:
> >
> >> On Tue, Aug 19, 2014 at 10:58:55PM -0700, Simon Kirby wrote:
> >> > Hello!
> >> > 
> >> > In trying to figure out what happened to a box running lots of vsftpd
> >> > since we deployed a CONFIG_NET_NS=y kernel to it, we found that the
> >> > (wall) time needed for cleanup_net() to complete, even on an idle box,
> >> > can be quite long:
> >> > 
> >> > #!/bin/bash
> >> > 
> >> > ip netns delete test >&/dev/null
> >> > while ip netns add test; do
> >> >         echo hi
> >> >         ip netns delete test
> >> > done
> >> > 
> >> > On my desktop and typical hosts, this prints at only around 4 or 6 per
> >> > second. While this is happening, "vmstat 1" reports 100% idle, and there
> >> > are D-state processes with stacks similar to:
> >> > 
> >> > 30566 [kworker/u16:1] D wait_rcu_gp+0x48, synchronize_sched+0x2f, cleanup_net+0xdb, process_one_work+0x175, worker_thread+0x119, kthread+0xbb, ret_from_fork+0x7c, 0xffffffffffffffff
> >> > 
> >> > 32220 ip              D copy_net_ns+0x68, create_new_namespaces+0xfc, unshare_nsproxy_namespaces+0x66, SyS_unshare+0x159, system_call_fastpath+0x16, 0xffffffffffffffff
> >> > 
> >> > copy_net_ns() is waiting on net_mutex which is held by cleanup_net().
> >> > 
> >> > vsftpd uses CLONE_NEWNET to set up privsep processes. There is a comment
> >> > about it being really slow before 2.6.35 (it avoids CLONE_NEWNET in that
> >> > case). I didn't find anything that makes 2.6.35 any faster, but on Debian
> >> > 2.6.36-5-amd64, I notice it does seem to be a bit faster than 3.2, 3.10,
> >> > 3.16, though still not anything I'd ever want to rely on per connection.
> >> > 
> >> > C implementation of the above: http://0x.ca/sim/ref/tools/netnsloop.c
> >> > 
> >> > Kernel stack "top": http://0x.ca/sim/ref/tools/pstack
> >> > 
> >> > What's going on here?
> >> 
> >> That is a bit slow for many configurations, but there are some exceptions.
> >> 
> >> So, what is your kernel's .config?
> >
> > I was unable to find a config (or stock kernel) that behaved any differently,
> > but here's the one we're using: http://0x.ca/sim/ref/3.10/config-3.10.53
> >
> > How fast does the above test run for you?
> >
> > We've been running with the attached, which has helped a little, but it's
> > still quite slow in our particular use case (vsftpd), and with the above
> > test. Should I enable RCU_TRACE or STALL_INFO with a low timeout or
> > something?
> 
> I just want to add a little bit more analysis to this.
> 
> What we desire to be fast is copy_net_ns; cleanup_net is batched and
> asynchronous, and nothing really cares how long it takes, except that
> cleanup_net holds the net_mutex and thus blocks copy_net_ns.
> 
> The puzzle is why, and which, rcu delays Simon is seeing in the network
> namespace cleanup path, as it seems like the synchronize_rcu is not
> the only one, and in the case of vsftpd with trivial network namespaces
> where nothing has been done we should not need to delay.

Indeed, given the version and .config, I can't see why any individual
RCU grace-period operation would be particularly slow.

I suggest using ftrace on synchronize_rcu() and friends.

							Thanx, Paul

> Eric
> 
> 
> > Simon-
> >
> > -- >8 --
> > Subject: [PATCH] netns: use synchronize_rcu_expedited instead of
> >  synchronize_rcu
> >
> > Similar to ef323088, with synchronize_rcu(), we are only able to create
> > and destroy about 4 or 7 net namespaces per second, which really puts a
> > dent in the performance of programs attempting to use CLONE_NEWNET for
> > privilege separation (vsftpd, chromium).
> > ---
> >  net/core/net_namespace.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> > index 85b6269..6dcb4b3 100644
> > --- a/net/core/net_namespace.c
> > +++ b/net/core/net_namespace.c
> > @@ -296,7 +296,7 @@ static void cleanup_net(struct work_struct *work)
> >  	 * This needs to be before calling the exit() notifiers, so
> >  	 * the rcu_barrier() below isn't sufficient alone.
> >  	 */
> > -	synchronize_rcu();
> > +	synchronize_rcu_expedited();
> >  
> >  	/* Run all of the network namespace exit methods */
> >  	list_for_each_entry_reverse(ops, &pernet_list, list)
> 



* Re: net_ns cleanup / RCU overhead
From: Simon Kirby @ 2014-08-29  0:40 UTC
  To: Paul E. McKenney; +Cc: Eric W. Biederman, linux-kernel, netdev

On Thu, Aug 28, 2014 at 01:46:58PM -0700, Paul E. McKenney wrote:

> On Thu, Aug 28, 2014 at 03:33:42PM -0500, Eric W. Biederman wrote:
> 
> > I just want to add a little bit more analysis to this.
> > 
> > What we desire to be fast is copy_net_ns; cleanup_net is batched and
> > asynchronous, and nothing really cares how long it takes, except that
> > cleanup_net holds the net_mutex and thus blocks copy_net_ns.
> > 
> > The puzzle is why, and which, rcu delays Simon is seeing in the network
> > namespace cleanup path, as it seems like the synchronize_rcu is not
> > the only one, and in the case of vsftpd with trivial network namespaces
> > where nothing has been done we should not need to delay.
> 
> Indeed, given the version and .config, I can't see why any individual
> RCU grace-period operation would be particularly slow.
> 
> I suggest using ftrace on synchronize_rcu() and friends.

I made a parallel net namespace create/destroy benchmark that prints the
progress and time to create and clean up 32 unshare()d child processes:

http://0x.ca/sim/ref/tools/netnsbench.c
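
(Sketch of the idea in case the link dies -- untested; the real tool
interleaves forking and reaping, which is where the mixed +/- patterns
below come from, while this version does all forks, then all reaps:)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

#define NCHILD 32

static double now(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
	for (;;) {
		double t0 = now();
		int i;

		for (i = 0; i < NCHILD; i++) {
			if (fork() == 0) {
				/* child: new netns, then exit; cleanup_net()
				 * runs later and holds net_mutex */
				if (unshare(CLONE_NEWNET))
					exit(1);
				exit(0);
			}
			putchar('+');
			fflush(stdout);
		}
		for (i = 0; i < NCHILD; i++) {
			wait(NULL);
			putchar('-');
			fflush(stdout);
		}
		printf(" %fs\n", now() - t0);
	}
}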

I noticed that if I haven't run it for a while, the first batch often is
fast, followed by slowness from then on:

++++++++++++++++++++++++++++++++-------------------------------- 0.039478s
++++++++++++++++++++-----+----------------+++++++++---------++-- 4.463837s
+++++++++++++++++++++++++------+--------------------++++++------ 3.011882s
+++++++++++++++---+-------------++++++++++++++++---------------- 2.283993s

Fiddling around on a stock kernel, "echo 1 > /sys/kernel/rcu_expedited"
makes behaviour change as it did with my patch:

++-++-+++-+-----+-+-++-+-++--++-+--+-+-++--++-+-+-+-++-+--++---- 0.801406s
+-+-+-++-+-+-+-+-++--+-+-++-+--++-+-+-+-+-+-+-+-+-+-+-+--++-+--- 0.872011s
++--+-++--+-++--+-++--+-+-+-+-++-+--++--+-++-+-+-+-+--++-+-+-+-- 0.946745s

How would I use ftrace on synchronize_rcu() here?

As Eric said, cleanup_net() is batched, but while it is cleaning up,
net_mutex is held. Isn't the issue just that net_mutex is held while
some other things are going on that are meant to be lazy / batched?

What is net_mutex protecting in cleanup_net()?

I noticed that [kworker/u16:0]'s stack is often:

[<ffffffff810942a6>] wait_rcu_gp+0x46/0x50
[<ffffffff8109607e>] synchronize_sched+0x2e/0x50
[<ffffffffa00385ac>] nf_nat_net_exit+0x2c/0x50 [nf_nat]
[<ffffffff81720339>] ops_exit_list.isra.4+0x39/0x60
[<ffffffff817209e0>] cleanup_net+0xf0/0x1a0
[<ffffffff81062997>] process_one_work+0x157/0x440
[<ffffffff81063303>] worker_thread+0x63/0x520
[<ffffffff81068b96>] kthread+0xd6/0xf0
[<ffffffff818d412c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff

and

[<ffffffff81095364>] _rcu_barrier+0x154/0x1f0
[<ffffffff81095450>] rcu_barrier+0x10/0x20
[<ffffffff81102c2c>] kmem_cache_destroy+0x6c/0xb0
[<ffffffffa0089e97>] nf_conntrack_cleanup_net_list+0x167/0x1c0 [nf_conntrack]
[<ffffffffa008aab5>] nf_conntrack_pernet_exit+0x65/0x70 [nf_conntrack]
[<ffffffff81720353>] ops_exit_list.isra.4+0x53/0x60
[<ffffffff817209e0>] cleanup_net+0xf0/0x1a0
[<ffffffff81062997>] process_one_work+0x157/0x440
[<ffffffff81063303>] worker_thread+0x63/0x520
[<ffffffff81068b96>] kthread+0xd6/0xf0
[<ffffffff818d412c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff

So I tried flushing iptables rules and rmmoding netfilter bits:

++++++++++++++++++++-+--------------------+++++++++++----------- 0.179940s
++++++++++++++--+-------------+++++++++++++++++----------------- 0.151988s
++++++++++++++++++++++++++++---+--------------------------+++--- 0.159967s
++++++++++++++++++++++----------------------++++++++++---------- 0.175964s
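
The flush/unload step was along these lines (the exact rmmod list
depends on which modules were loaded):

iptables -F; iptables -t nat -F; iptables -X
rmmod iptable_nat nf_nat_ipv4 nf_nat
rmmod nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack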

Expedited:

++-+--++-+-+-+-+-+-+--++-+-+-++-+-+-+--++-+-+-+-+-+-+-+-+-+-+--- 0.079988s
++-+-+-+-+-+-+-+-+-+-+-+--++-+--++-+--+-++-+-+--++-+-+-+-+-+-+-- 0.089347s
++++--+++--++--+-+++++++-+++++--------------++-+-+--++-+-+--++-- 0.081566s
+++++-+++-------++-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+--- 0.089026s

So, much faster. It seems that just loading nf_conntrack_ipv4 (like by
running iptables -t nat -nvL) is enough to slow it way down. But it is
still capable of being fast, as above.

Simon-


* Re: net_ns cleanup / RCU overhead
From: Julian Anastasov @ 2014-08-29  3:57 UTC
  To: Simon Kirby; +Cc: Paul E. McKenney, Eric W. Biederman, linux-kernel, netdev


	Hello,

On Thu, 28 Aug 2014, Simon Kirby wrote:

> I noticed that [kworker/u16:0]'s stack is often:
> 
> [<ffffffff810942a6>] wait_rcu_gp+0x46/0x50
> [<ffffffff8109607e>] synchronize_sched+0x2e/0x50
> [<ffffffffa00385ac>] nf_nat_net_exit+0x2c/0x50 [nf_nat]

	I guess the problem is in nf_nat_net_exit,
maybe in other nf exit handlers too. pernet-exit handlers
should avoid synchronize_rcu and rcu_barrier.
An RCU callback and rcu_barrier in module-exit is the way
to go. cleanup_net includes rcu_barrier, so pernet-exit
does not need such calls.

> [<ffffffff81720339>] ops_exit_list.isra.4+0x39/0x60
> [<ffffffff817209e0>] cleanup_net+0xf0/0x1a0
> [<ffffffff81062997>] process_one_work+0x157/0x440
> [<ffffffff81063303>] worker_thread+0x63/0x520
> [<ffffffff81068b96>] kthread+0xd6/0xf0
> [<ffffffff818d412c>] ret_from_fork+0x7c/0xb0
> [<ffffffffffffffff>] 0xffffffffffffffff

Regards

--
Julian Anastasov <ja@ssi.bg>


* Re: net_ns cleanup / RCU overhead
From: Eric W. Biederman @ 2014-08-29 21:57 UTC
  To: Julian Anastasov
  Cc: Simon Kirby, Paul E. McKenney, linux-kernel, netdev,
	Florian Westphal, Pablo Neira Ayuso

Julian Anastasov <ja@ssi.bg> writes:

> 	Hello,
>
> On Thu, 28 Aug 2014, Simon Kirby wrote:
>
>> I noticed that [kworker/u16:0]'s stack is often:
>> 
>> [<ffffffff810942a6>] wait_rcu_gp+0x46/0x50
>> [<ffffffff8109607e>] synchronize_sched+0x2e/0x50
>> [<ffffffffa00385ac>] nf_nat_net_exit+0x2c/0x50 [nf_nat]
>
> 	I guess the problem is in nf_nat_net_exit,
> maybe in other nf exit handlers too. pernet-exit handlers
> should avoid synchronize_rcu and rcu_barrier.
> An RCU callback and rcu_barrier in module-exit is the way
> to go. cleanup_net includes rcu_barrier, so pernet-exit
> does not need such calls.

In principle I agree, however in this particular case it looks a bit
tricky because of the separate hash table that tracks nat state per
network namespace.

At the same time all of the packets should be drained before
we get to nf_nat_net_exit, so it doesn't look like the synchronize_rcu
in nf_nat_net_exit is actually protecting anything.

Further, calling an rcu delay function in net_exit methods largely
destroys the batched cleanup of network namespaces, so it is very
unpleasant.

Could someone who knows nf_nat_core.c better than I do look and
see if we can just remove the synchronize_rcu in nf_nat_net_exit?

>> [<ffffffff81720339>] ops_exit_list.isra.4+0x39/0x60
>> [<ffffffff817209e0>] cleanup_net+0xf0/0x1a0
>> [<ffffffff81062997>] process_one_work+0x157/0x440
>> [<ffffffff81063303>] worker_thread+0x63/0x520
>> [<ffffffff81068b96>] kthread+0xd6/0xf0
>> [<ffffffff818d412c>] ret_from_fork+0x7c/0xb0
>> [<ffffffffffffffff>] 0xffffffffffffffff

Eric


* Re: net_ns cleanup / RCU overhead
From: Florian Westphal @ 2014-08-29 23:52 UTC
  To: Eric W. Biederman
  Cc: Julian Anastasov, Simon Kirby, Paul E. McKenney, linux-kernel,
	netdev, Florian Westphal, Pablo Neira Ayuso

Eric W. Biederman <ebiederm@xmission.com> wrote:
> Julian Anastasov <ja@ssi.bg> writes:
> 
> > 	Hello,
> >
> > On Thu, 28 Aug 2014, Simon Kirby wrote:
> >
> >> I noticed that [kworker/u16:0]'s stack is often:
> >> 
> >> [<ffffffff810942a6>] wait_rcu_gp+0x46/0x50
> >> [<ffffffff8109607e>] synchronize_sched+0x2e/0x50
> >> [<ffffffffa00385ac>] nf_nat_net_exit+0x2c/0x50 [nf_nat]
> >
> > 	I guess the problem is in nf_nat_net_exit,
> > maybe in other nf exit handlers too. pernet-exit handlers
> > should avoid synchronize_rcu and rcu_barrier.
> > An RCU callback and rcu_barrier in module-exit is the way
> > to go. cleanup_net includes rcu_barrier, so pernet-exit
> > does not need such calls.
> 
> In principle I agree, however in this particular case it looks a bit
> tricky because of the separate hash table that tracks nat state per
> network namespace.
> 
> At the same time all of the packets should be drained before
> we get to nf_nat_net_exit, so it doesn't look like the synchronize_rcu
> in nf_nat_net_exit is actually protecting anything.

Hmm, the problem is with the conntrack entries living in the netns being
destroyed.

I don't think they are guaranteed to be removed by the time
the nat netns exit function runs.

> Further, calling an rcu delay function in net_exit methods largely
> destroys the batched cleanup of network namespaces, so it is very
> unpleasant.
> 
> Could someone who knows nf_nat_core.c better than I do look and
> see if we can just remove the synchronize_rcu in nf_nat_net_exit?

If I remember correctly it's needed to ensure that
all conntracks with nat extensions that might still be referenced
on other cpus have finished (i.e., nf_conntrack_destroy() has been
called, which calls nf_nat_cleanup_conntrack(), which deletes
the extension from the hash table).

As we remove the ct from that table ourselves EXCEPT in the
case where we cannot steal the timer's reference, we should
be able to avoid that call virtually every time.

Perhaps this is worth a shot (not even compile tested):

diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 4e0b478..80cfe10 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -508,6 +508,7 @@ EXPORT_SYMBOL_GPL(nf_nat_packet);
 struct nf_nat_proto_clean {
 	u8	l3proto;
 	u8	l4proto;
+	bool	need_sync_rcu;
 };
 
 /* kill conntracks with affected NAT section */
@@ -528,23 +529,32 @@ static int nf_nat_proto_remove(struct nf_conn *i, void *data)
 
 static int nf_nat_proto_clean(struct nf_conn *ct, void *data)
 {
+	struct nf_nat_proto_clean *clean = data;
 	struct nf_conn_nat *nat = nfct_nat(ct);
 
-	if (nf_nat_proto_remove(ct, data))
-		return 1;
-
 	if (!nat || !nat->ct)
 		return 0;
 
-	/* This netns is being destroyed, and conntrack has nat null binding.
+	/* This netns is being destroyed, and conntrack has nat binding.
 	 * Remove it from bysource hash, as the table will be freed soon.
 	 *
-	 * Else, when the conntrack is destoyed, nf_nat_cleanup_conntrack()
+	 * Else, when the conntrack is destroyed, nf_nat_cleanup_conntrack()
 	 * will delete entry from already-freed table.
 	 */
-	if (!del_timer(&ct->timeout))
+	if (!del_timer(&ct->timeout)) {
+		/* We have nat binding, but destruction
+		 * might already be in progress.
+		 *
+		 * nat entry is removed only after last
+		 * nf_ct_put().
+		 */
+		clean->need_sync_rcu = true;
 		return 1;
+	}
 
+	/* We stole refcount owned by timer;
+	 * conntrack cannot go away.
+	 */
 	spin_lock_bh(&nf_nat_lock);
 	hlist_del_rcu(&nat->bysource);
 	ct->status &= ~IPS_NAT_DONE_MASK;
@@ -553,6 +563,9 @@ static int nf_nat_proto_clean(struct nf_conn *ct, void *data)
 
 	add_timer(&ct->timeout);
 
+	if (nf_nat_proto_remove(ct, data))
+		return 1;
+
 	/* don't delete conntrack.  Although that would make things a lot
 	 * simpler, we'd end up flushing all conntracks on nat rmmod.
 	 */
@@ -830,7 +843,8 @@ static void __net_exit nf_nat_net_exit(struct net *net)
 	struct nf_nat_proto_clean clean = {};
 
 	nf_ct_iterate_cleanup(net, nf_nat_proto_clean, &clean, 0, 0);
-	synchronize_rcu();
+	if (clean.need_sync_rcu)
+		synchronize_rcu();
 	nf_ct_free_hashtable(net->ct.nat_bysource, net->ct.nat_htable_size);
 }
 


* Re: net_ns cleanup / RCU overhead
From: Paul E. McKenney @ 2014-08-30  2:52 UTC
  To: Simon Kirby; +Cc: Eric W. Biederman, linux-kernel, netdev

On Thu, Aug 28, 2014 at 05:40:29PM -0700, Simon Kirby wrote:
> On Thu, Aug 28, 2014 at 01:46:58PM -0700, Paul E. McKenney wrote:
> 
> > On Thu, Aug 28, 2014 at 03:33:42PM -0500, Eric W. Biederman wrote:
> > 
> > > I just want to add a little bit more analysis to this.
> > > 
> > > What we desire to be fast is copy_net_ns; cleanup_net is batched and
> > > asynchronous, and nothing really cares how long it takes, except that
> > > cleanup_net holds the net_mutex and thus blocks copy_net_ns.
> > > 
> > > The puzzle is why, and which, rcu delays Simon is seeing in the network
> > > namespace cleanup path, as it seems like the synchronize_rcu is not
> > > the only one, and in the case of vsftpd with trivial network namespaces
> > > where nothing has been done we should not need to delay.
> > 
> > Indeed, given the version and .config, I can't see why any individual
> > RCU grace-period operation would be particularly slow.
> > 
> > I suggest using ftrace on synchronize_rcu() and friends.
> 
> I made a parallel net namespace create/destroy benchmark that prints the
> progress and time to create and clean up 32 unshare()d child processes:
> 
> http://0x.ca/sim/ref/tools/netnsbench.c
> 
> I noticed that if I haven't run it for a while, the first batch often is
> fast, followed by slowness from then on:
> 
> ++++++++++++++++++++++++++++++++-------------------------------- 0.039478s
> ++++++++++++++++++++-----+----------------+++++++++---------++-- 4.463837s
> +++++++++++++++++++++++++------+--------------------++++++------ 3.011882s
> +++++++++++++++---+-------------++++++++++++++++---------------- 2.283993s
> 
> Fiddling around on a stock kernel, "echo 1 > /sys/kernel/rcu_expedited"
> makes behaviour change as it did with my patch:
> 
> ++-++-+++-+-----+-+-++-+-++--++-+--+-+-++--++-+-+-+-++-+--++---- 0.801406s
> +-+-+-++-+-+-+-+-++--+-+-++-+--++-+-+-+-+-+-+-+-+-+-+-+--++-+--- 0.872011s
> ++--+-++--+-++--+-++--+-+-+-+-++-+--++--+-++-+-+-+-+--++-+-+-+-- 0.946745s
> 
> How would I use ftrace on synchronize_rcu() here?

http://lwn.net/Articles/370423/ is your friend here.  If your kernel
is built with the needed configuration, you give the command
"echo synchronize_rcu > set_ftrace_filter"

http://lwn.net/Articles/365835/ and http://lwn.net/Articles/366796/
have background info.
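
In full, something like this (a sketch, assuming CONFIG_FUNCTION_TRACER=y
and debugfs mounted at /sys/kernel/debug):

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo function > current_tracer
echo 'synchronize_rcu synchronize_sched' > set_ftrace_filter
echo 1 > tracing_on
# ... run the netns create/delete loop, then:
echo 0 > tracing_on
cat trace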

> As Eric said, cleanup_net() is batched, but while it is cleaning up,
> net_mutex is held. Isn't the issue just that net_mutex is held while
> some other things are going on that are meant to be lazy / batched?
> 
> What is net_mutex protecting in cleanup_net()?
> 
> I noticed that [kworker/u16:0]'s stack is often:
> 
> [<ffffffff810942a6>] wait_rcu_gp+0x46/0x50
> [<ffffffff8109607e>] synchronize_sched+0x2e/0x50
> [<ffffffffa00385ac>] nf_nat_net_exit+0x2c/0x50 [nf_nat]
> [<ffffffff81720339>] ops_exit_list.isra.4+0x39/0x60
> [<ffffffff817209e0>] cleanup_net+0xf0/0x1a0
> [<ffffffff81062997>] process_one_work+0x157/0x440
> [<ffffffff81063303>] worker_thread+0x63/0x520
> [<ffffffff81068b96>] kthread+0xd6/0xf0
> [<ffffffff818d412c>] ret_from_fork+0x7c/0xb0
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> and
> 
> [<ffffffff81095364>] _rcu_barrier+0x154/0x1f0
> [<ffffffff81095450>] rcu_barrier+0x10/0x20
> [<ffffffff81102c2c>] kmem_cache_destroy+0x6c/0xb0
> [<ffffffffa0089e97>] nf_conntrack_cleanup_net_list+0x167/0x1c0 [nf_conntrack]
> [<ffffffffa008aab5>] nf_conntrack_pernet_exit+0x65/0x70 [nf_conntrack]
> [<ffffffff81720353>] ops_exit_list.isra.4+0x53/0x60
> [<ffffffff817209e0>] cleanup_net+0xf0/0x1a0
> [<ffffffff81062997>] process_one_work+0x157/0x440
> [<ffffffff81063303>] worker_thread+0x63/0x520
> [<ffffffff81068b96>] kthread+0xd6/0xf0
> [<ffffffff818d412c>] ret_from_fork+0x7c/0xb0
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> So I tried flushing iptables rules and rmmoding netfilter bits:
> 
> ++++++++++++++++++++-+--------------------+++++++++++----------- 0.179940s
> ++++++++++++++--+-------------+++++++++++++++++----------------- 0.151988s
> ++++++++++++++++++++++++++++---+--------------------------+++--- 0.159967s
> ++++++++++++++++++++++----------------------++++++++++---------- 0.175964s
> 
> Expedited:
> 
> ++-+--++-+-+-+-+-+-+--++-+-+-++-+-+-+--++-+-+-+-+-+-+-+-+-+-+--- 0.079988s
> ++-+-+-+-+-+-+-+-+-+-+-+--++-+--++-+--+-++-+-+--++-+-+-+-+-+-+-- 0.089347s
> ++++--+++--++--+-+++++++-+++++--------------++-+-+--++-+-+--++-- 0.081566s
> +++++-+++-------++-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+--- 0.089026s
> 
> So, much faster. It seems that just loading nf_conntrack_ipv4 (like by
> running iptables -t nat -nvL) is enough to slow it way down. But it is
> still capable of being fast, as above.

My first guess is that this code sequence is calling synchronize_rcu()
quite often.  Would it be possible to consolidate these?

							Thanx, Paul



* Re: net_ns cleanup / RCU overhead
From: Paul E. McKenney @ 2014-08-30  2:56 UTC
  To: Florian Westphal
  Cc: Eric W. Biederman, Julian Anastasov, Simon Kirby, linux-kernel,
	netdev, Pablo Neira Ayuso

On Sat, Aug 30, 2014 at 01:52:06AM +0200, Florian Westphal wrote:
> Eric W. Biederman <ebiederm@xmission.com> wrote:
> > Julian Anastasov <ja@ssi.bg> writes:
> > 
> > > 	Hello,
> > >
> > > On Thu, 28 Aug 2014, Simon Kirby wrote:
> > >
> > >> I noticed that [kworker/u16:0]'s stack is often:
> > >> 
> > >> [<ffffffff810942a6>] wait_rcu_gp+0x46/0x50
> > >> [<ffffffff8109607e>] synchronize_sched+0x2e/0x50
> > >> [<ffffffffa00385ac>] nf_nat_net_exit+0x2c/0x50 [nf_nat]
> > >
> > > 	I guess the problem is in nf_nat_net_exit,
> > > maybe in other nf exit handlers too. pernet-exit handlers
> > > should avoid synchronize_rcu and rcu_barrier.
> > > An RCU callback and rcu_barrier in module-exit is the way
> > > to go. cleanup_net includes rcu_barrier, so pernet-exit
> > > does not need such calls.
> > 
> > In principle I agree, however in this particular case it looks a bit
> > tricky because of the separate hash table that tracks nat state per
> > network namespace.
> > 
> > At the same time all of the packets should be drained before
> > we get to nf_nat_net_exit, so it doesn't look like the synchronize_rcu
> > in nf_nat_net_exit is actually protecting anything.
> 
> Hmm, the problem is with the conntrack entries living in the netns being
> destroyed.
> 
> I don't think they are guaranteed to be removed by the time
> the nat netns exit function runs.
> 
> > Further, calling an rcu delay function in net_exit methods largely
> > destroys the batched cleanup of network namespaces, so it is very
> > unpleasant.
> > 
> > Could someone who knows nf_nat_core.c better than I do look and
> > see if we can just remove the synchronize_rcu in nf_nat_net_exit?
> 
> If I remember correctly it's needed to ensure that
> all conntracks with nat extensions that might still be referenced
> on other cpus have finished (i.e., nf_conntrack_destroy() has been
> called, which calls nf_nat_cleanup_conntrack(), which deletes
> the extension from the hash table).
> 
> As we remove the ct from that table ourselves EXCEPT in the
> case where we cannot steal the timer's reference, we should
> be able to avoid that call virtually every time.
> 
> Perhaps this is worth a shot (not even compile tested):
> 
> diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> index 4e0b478..80cfe10 100644
> --- a/net/netfilter/nf_nat_core.c
> +++ b/net/netfilter/nf_nat_core.c
> @@ -508,6 +508,7 @@ EXPORT_SYMBOL_GPL(nf_nat_packet);
>  struct nf_nat_proto_clean {
>  	u8	l3proto;
>  	u8	l4proto;
> +	bool	need_sync_rcu;
>  };
> 
>  /* kill conntracks with affected NAT section */
> @@ -528,23 +529,32 @@ static int nf_nat_proto_remove(struct nf_conn *i, void *data)
> 
>  static int nf_nat_proto_clean(struct nf_conn *ct, void *data)
>  {
> +	struct nf_nat_proto_clean *clean = data;
>  	struct nf_conn_nat *nat = nfct_nat(ct);
> 
> -	if (nf_nat_proto_remove(ct, data))
> -		return 1;
> -
>  	if (!nat || !nat->ct)
>  		return 0;
> 
> -	/* This netns is being destroyed, and conntrack has nat null binding.
> +	/* This netns is being destroyed, and conntrack has nat binding.
>  	 * Remove it from bysource hash, as the table will be freed soon.
>  	 *
> -	 * Else, when the conntrack is destoyed, nf_nat_cleanup_conntrack()
> +	 * Else, when the conntrack is destroyed, nf_nat_cleanup_conntrack()
>  	 * will delete entry from already-freed table.
>  	 */
> -	if (!del_timer(&ct->timeout))
> +	if (!del_timer(&ct->timeout)) {
> +		/* We have nat binding, but destruction
> +		 * might already be in progress.
> +		 *
> +		 * nat entry is removed only after last
> +		 * nf_ct_put().
> +		 */
> +		clean->need_sync_rcu = true;

So this happens only if we race with the timer handler?  If so, this
patch might give good speedups.  (Can't comment on any other correctness
issues due to unfamiliarity with the code and what it is trying to do.)

							Thanx, Paul

>  		return 1;
> +	}
> 
> +	/* We stole refcount owned by timer;
> +	 * conntrack cannot go away.
> +	 */
>  	spin_lock_bh(&nf_nat_lock);
>  	hlist_del_rcu(&nat->bysource);
>  	ct->status &= ~IPS_NAT_DONE_MASK;
> @@ -553,6 +563,9 @@ static int nf_nat_proto_clean(struct nf_conn *ct, void *data)
> 
>  	add_timer(&ct->timeout);
> 
> +	if (nf_nat_proto_remove(ct, data))
> +		return 1;
> +
>  	/* don't delete conntrack.  Although that would make things a lot
>  	 * simpler, we'd end up flushing all conntracks on nat rmmod.
>  	 */
> @@ -830,7 +843,8 @@ static void __net_exit nf_nat_net_exit(struct net *net)
>  	struct nf_nat_proto_clean clean = {};
> 
>  	nf_ct_iterate_cleanup(net, nf_nat_proto_clean, &clean, 0, 0);
> -	synchronize_rcu();
> +	if (clean.need_sync_rcu)
> +		synchronize_rcu();
>  	nf_ct_free_hashtable(net->ct.nat_bysource, net->ct.nat_htable_size);
>  }
> 
> 



* Re: net_ns cleanup / RCU overhead
From: Julian Anastasov @ 2014-08-30  8:20 UTC
  To: Eric W. Biederman
  Cc: Simon Kirby, Paul E. McKenney, linux-kernel, netdev,
	Florian Westphal, Pablo Neira Ayuso


	Hello,

On Fri, 29 Aug 2014, Eric W. Biederman wrote:

> > 	I guess the problem is in nf_nat_net_exit,
> > maybe in other nf exit handlers too. pernet-exit handlers
> > should avoid synchronize_rcu and rcu_barrier.
> > An RCU callback and rcu_barrier in module-exit is the way
> > to go. cleanup_net includes rcu_barrier, so pernet-exit
> > does not need such calls.
> 
> In principle I agree, however in this particular case it looks a bit
> tricky because of the separate hash table that tracks nat state per
> network namespace.

	It is still possible for the module's pernet-init handler to
attach to net->ct... a special structure with all pointers that
should be freed by an RCU callback for the module, like the hash table.

For example:

struct netns_ct_nat_rcu_allocs {
	struct rcu_head		rcu_head;
	struct hlist_head	*nat_bysource;
	unsigned int		nat_htable_size;
};

- pernet-init:
	- allocate structure, attach it to net->ct.nat_rcu_allocs
	- the original nat_bysource place remains because
	we want to avoid net->ct.nat_rcu_allocs dereference.

- pernet-exit:
	- copy nat_bysource and nat_htable_size to net->ct.nat_rcu_allocs,
	this can be done even in the above pernet-init function
	- call_rcu(&net->ct.nat_rcu_allocs->rcu_head, nat_rcu_free);

- cleanup_net:
	- rcu_barrier()

- RCU callback (nat_rcu_free):
	- call nf_ct_free_hashtable
	- kfree the structure

- cleanup_net:
	- drop netns after rcu_barrier

	Due to the rcu_barrier in cleanup_net it is even
possible to provide a per-module rcu_head instead of using
an allocated structure, for example:

	call_rcu(&net->ct.nat_rcu_head, nat_rcu_free);

	Then the nat_rcu_free function will just call
nf_ct_free_hashtable before cleanup_net drops the netns struct.
In this case the memory price is just one rcu_head for every module
that uses an RCU callback.
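
	In code, the per-module rcu_head variant would be roughly
(an untested sketch; it assumes a new rcu_head field in struct netns_ct):

static void nat_rcu_free(struct rcu_head *head)
{
	struct net *net = container_of(head, struct net, ct.nat_rcu_head);

	nf_ct_free_hashtable(net->ct.nat_bysource, net->ct.nat_htable_size);
}

static void __net_exit nf_nat_net_exit(struct net *net)
{
	struct nf_nat_proto_clean clean = {};

	nf_ct_iterate_cleanup(net, nf_nat_proto_clean, &clean, 0, 0);
	/* no synchronize_rcu() here: cleanup_net() runs rcu_barrier()
	 * after all pernet-exit handlers, so nat_rcu_free finishes
	 * before the netns itself is dropped.
	 */
	call_rcu(&net->ct.nat_rcu_head, nat_rcu_free);
}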

> At the same time all of the packets should be drained before
> we get to nf_nat_net_exit, so it doesn't look like the synchronize_rcu
> in nf_nat_net_exit is actually protecting anything.

	It is true for cleanup_net. I don't remember: can we
see packets while the particular module-exit calls
unregister_pernet_subsys()? Maybe yes?

Regards

--
Julian Anastasov <ja@ssi.bg>


Thread overview: 12+ messages
2014-08-20  5:58 net_ns cleanup / RCU overhead Simon Kirby
2014-08-28 19:24 ` Paul E. McKenney
2014-08-28 19:44   ` Simon Kirby
2014-08-28 20:33     ` Eric W. Biederman
2014-08-28 20:46       ` Paul E. McKenney
2014-08-29  0:40         ` Simon Kirby
2014-08-29  3:57           ` Julian Anastasov
2014-08-29 21:57             ` Eric W. Biederman
2014-08-29 23:52               ` Florian Westphal
2014-08-30  2:56                 ` Paul E. McKenney
2014-08-30  8:20               ` Julian Anastasov
2014-08-30  2:52           ` Paul E. McKenney
