regression: unregister_netdev() unusably slow

netdev.vger.kernel.org archive mirror

* regression: unregister_netdev() unusably slow
From: Benjamin LaHaise @ 2009-05-24 19:21 UTC
  To: netdev

Hi folks,

I just ran a few L2TP tests against 2.6.30-rc7, and it looks like network 
device deletion has become unusably slow.  At least in 2.6.27.10, deleting 
1000 network interfaces takes less than 2 seconds of real time.  The same 
test run under 2.6.30-rc7 is taking hundreds of seconds to delete 1000 
interfaces at a rate of about 5 per second.  The interfaces all share the 
same local IP address, but each has a single route to a unique client 
IP address.
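
As a rough sanity check on those figures, the per-interface cost can be worked out directly. The sketch below is only illustrative arithmetic in C; the helper names are made up for this note and are not part of the L2TP test itself.

```c
#include <assert.h>

/* Seconds needed to delete `count` interfaces at `per_second` deletions/s. */
static double teardown_seconds(unsigned int count, double per_second)
{
	assert(per_second > 0.0);
	return count / per_second;
}

/* Average cost per interface, in milliseconds, for a whole teardown run. */
static double per_interface_ms(double total_seconds, unsigned int count)
{
	assert(count > 0);
	return total_seconds * 1000.0 / count;
}
```

With the numbers above, 1000 interfaces at 5 per second take 200 seconds, or 200 ms each, against less than 2 ms each on 2.6.27.10.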

This is a fairly reasonable use-case, as a single L2TP daemon can be 
terminating thousands of client connections on a single tunnel, and a 
tunnel flap will require tearing down all these interfaces.  I'll work on 
bisecting it, but if someone has any ideas of the source, I'd appreciate 
hearing about it.

		-ben

* Re: regression: unregister_netdev() unusably slow
From: Denys Fedoryschenko @ 2009-05-24 21:23 UTC
  To: Benjamin LaHaise; +Cc: netdev

oprofile once gave me a hint about a similar issue.

* Re: regression: unregister_netdev() unusably slow
From: Benjamin LaHaise @ 2009-05-24 21:37 UTC
  To: Denys Fedoryschenko; +Cc: netdev

On Mon, May 25, 2009 at 12:23:30AM +0300, Denys Fedoryschenko wrote:
> Once oprofile gave me hint about similar issue

I forgot to mention: there is no more than ~1% CPU usage while the interfaces 
are being deleted.  I'm about half way through the bisect now and should have 
the culprit soon. 

		-ben

* Re: regression: unregister_netdev() unusably slow
From: Eric Dumazet @ 2009-05-24 21:42 UTC
  To: Benjamin LaHaise; +Cc: Denys Fedoryschenko, netdev

Benjamin LaHaise wrote:
> On Mon, May 25, 2009 at 12:23:30AM +0300, Denys Fedoryschenko wrote:
>> Once oprofile gave me hint about similar issue
> 
> I forgot to mention: there is no more than ~1% CPU usage while the interfaces 
> are being deleted.  I'm about half way through the bisect now and should have 
> the culprit soon. 
> 
> 		-ben

Unregistering a vlan here costs about 100 ms if CONFIG_NO_HZ is set, 50 ms if not.

(But the vlan case might be a little bit more expensive than your case, since we call
synchronize_net() three times: once in unregister_vlan_dev(), and twice in rollback_registered().)



* Re: regression: unregister_netdev() unusably slow
From: Benjamin LaHaise @ 2009-05-24 21:44 UTC
  To: Eric Dumazet; +Cc: Denys Fedoryschenko, netdev

On Sun, May 24, 2009 at 11:42:17PM +0200, Eric Dumazet wrote:
> (But the vlan case might be a little bit more expensive than your case, since we call
> synchronize_net() three times: once in unregister_vlan_dev(), and twice in rollback_registered().)

I did try commenting out the synchronize_net() calls in rollback_registered(), 
but that had almost no effect on the rate of interface deletion.  50ms is 
still way too expensive.

		-ben

* Re: regression: unregister_netdev() unusably slow
From: Eric Dumazet @ 2009-05-24 22:07 UTC
  To: Benjamin LaHaise; +Cc: Denys Fedoryschenko, netdev

Benjamin LaHaise wrote:
> On Sun, May 24, 2009 at 11:42:17PM +0200, Eric Dumazet wrote:
>> (But the vlan case might be a little bit more expensive than your case, since we call
>> synchronize_net() three times: once in unregister_vlan_dev(), and twice in rollback_registered().)
> 
> I did try commenting out the synchronize_net() calls in rollback_registered(), 
> but that had almost no effect on the rate of interface deletion.  50ms is 
> still way too expensive.
> 
> 		-ben

Maybe your HZ is too low? Switching to HZ=1000 helps a lot.

Also, your hotplug config might do strange things at device removal?

What distro do you use?


* Re: regression: unregister_netdev() unusably slow
From: Benjamin LaHaise @ 2009-05-24 22:12 UTC
  To: Eric Dumazet; +Cc: Denys Fedoryschenko, netdev

On Mon, May 25, 2009 at 12:07:07AM +0200, Eric Dumazet wrote:
> Maybe your HZ is too low? Switching to HZ=1000 helps a lot.

HZ is set to 1000.  I turned off CONFIG_NO_HZ.

> Also, your hotplug config might do strange things at device removal?

If it were CPU bound, maybe, but this behaviour is tied to the kernel 
version in use.

> What distro do you use?

It's an older Fedora 7 install inside kvm on a Fedora 10 host running 
2.6.30-rc7.

		-ben

* Re: regression: unregister_netdev() unusably slow
From: Eric Dumazet @ 2009-05-24 22:47 UTC
  To: Benjamin LaHaise; +Cc: Denys Fedoryschenko, netdev

Benjamin LaHaise wrote:
> On Mon, May 25, 2009 at 12:07:07AM +0200, Eric Dumazet wrote:
>> Maybe your HZ is too low? Switching to HZ=1000 helps a lot.
> 
> HZ is set to 1000.  I turned off CONFIG_NO_HZ.
> 
>> Also, your hotplug config might do strange things at device removal?
> 
> If it were CPU bound, maybe, but this behaviour is tied to the kernel 
> version in use.
> 
>> What distro do you use?
> 
> It's an older Fedora 7 install inside kvm on a Fedora 10 host running 
> 2.6.30-rc7.
> 

OK thanks

I switched HZ from 250 to 1000 and got:

time ip link del vlan.899

real    0m0.011s
user    0m0.000s
sys     0m0.002s

There is a strong dependency on HZ.
BTW, I am using TREE_RCU:

# RCU Subsystem
# CONFIG_CLASSIC_RCU is not set
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
CONFIG_RCU_TRACE=y
CONFIG_RCU_FANOUT=32
# CONFIG_RCU_FANOUT_EXACT is not set
CONFIG_TREE_RCU_TRACE=y
# CONFIG_PREEMPT_RCU_TRACE is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
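
The HZ dependency can be sketched with a simple model: if a grace period needs a fixed number of timer ticks, its wall-clock cost scales with the tick length, 1000/HZ ms. The code below is a back-of-the-envelope sketch, not kernel code. The function names and the `ticks_per_grace_period` parameter are assumptions introduced for illustration, and the real number of ticks depends on the RCU implementation and on system load.

```c
#include <assert.h>

/* Length of one timer tick in milliseconds for a given CONFIG_HZ value. */
static double tick_ms(unsigned int hz)
{
	assert(hz > 0);
	return 1000.0 / hz;
}

/*
 * Modelled time spent waiting during one device unregister:
 * sync_calls blocking waits, each lasting ticks_per_grace_period ticks.
 * ticks_per_grace_period is an assumed input, not a measured value.
 */
static double unregister_wait_ms(unsigned int hz, unsigned int sync_calls,
				 unsigned int ticks_per_grace_period)
{
	return sync_calls * ticks_per_grace_period * tick_ms(hz);
}
```

For example, three waits of three ticks each come to 36 ms at HZ=250 and 9 ms at HZ=1000 in this model.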



* Re: regression: unregister_netdev() unusably slow
From: Benjamin LaHaise @ 2009-05-25  0:00 UTC
  To: Eric Dumazet; +Cc: Denys Fedoryschenko, netdev

On Mon, May 25, 2009 at 12:47:39AM +0200, Eric Dumazet wrote:
> There is a strong dependency on HZ
> BTW, I am using TREE_RCU

I'm using CLASSIC_RCU.  The bisect just completed, and it points to RCU.  
It makes some degree of sense since I'm testing on an otherwise idle 
machine.  That said, where is fixing it going to make sense?  I'm not 
opposed to having device unregister take a few timer ticks, but there 
has to be some way of exposing parallelism to the system, and since the 
synchronize_net() calls are done under rtnl_lock(), none is possible at 
present.  Hrm.
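
To put numbers on the serialization concern: while every wait happens with the lock held, the waits of different devices cannot overlap, so the total grows linearly with the number of devices. The sketch below compares that with a hypothetical scheme in which all devices share each grace period. The function names, the batched variant and the example figures are assumptions for illustration only; nothing in this thread proposes a specific batching interface.

```c
#include <assert.h>

/* Total wait when each device's grace periods run one after another. */
static double serialized_wait_ms(unsigned int devices,
				 unsigned int syncs_per_device,
				 double grace_period_ms)
{
	assert(grace_period_ms >= 0.0);
	return (double)devices * syncs_per_device * grace_period_ms;
}

/* Total wait if all devices could share each grace period (hypothetical). */
static double batched_wait_ms(unsigned int syncs_per_device,
			      double grace_period_ms)
{
	assert(grace_period_ms >= 0.0);
	return syncs_per_device * grace_period_ms;
}
```

With 1000 devices, two waits per device and an assumed 100 ms per grace period, the serialized total is 200 seconds, while the shared variant would wait 200 ms.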

		-ben

bf51935f3e988e0ed6f34b55593e5912f990750a is first bad commit
commit bf51935f3e988e0ed6f34b55593e5912f990750a
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Feb 17 06:01:30 2009 -0800

    x86, rcu: fix strange load average and ksoftirqd behavior
    
    Damien Wyart reported high ksoftirqd CPU usage (20%) on an
    otherwise idle system.
    
    The function-graph trace Damien provided:
...
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index a546f55..bd4da2a 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -104,9 +104,6 @@ void cpu_idle(void)
 			check_pgt_cache();
 			rmb();
 
-			if (rcu_pending(cpu))
-				rcu_check_callbacks(cpu, 0);
-
 			if (cpu_is_offline(cpu))
 				play_dead();
 


* Re: regression: unregister_netdev() unusably slow
From: Eric Dumazet @ 2009-05-25  5:22 UTC
  To: Benjamin LaHaise, Paul E. McKenney
  Cc: Denys Fedoryschenko, netdev, linux kernel, damien.wyart

Benjamin LaHaise wrote:
> On Mon, May 25, 2009 at 12:47:39AM +0200, Eric Dumazet wrote:
>> There is a strong dependency on HZ
>> BTW, I am using TREE_RCU
> 
> I'm using CLASSIC_RCU.  The bisect just completed, and it points to RCU.  
> It makes some degree of sense since I'm testing on an otherwise idle 
> machine.  That said, where is fixing it going to make sense?  I'm not 
> opposed to having device unregister take a few timer ticks, but there 
> has to be some way of exposing parallelism to the system, and since the 
> synchronize_net() calls are done under rtnl_lock(), none is possible at 
> present.  Hrm.

Thanks Ben, this bisection indeed confirms how nasty synchronize_rcu() is :)

Time to include Paul and lkml in the discussion, and find a better solution than 
the one provided in February.

> 
> 		-ben
> 
> bf51935f3e988e0ed6f34b55593e5912f990750a is first bad commit
> commit bf51935f3e988e0ed6f34b55593e5912f990750a
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Tue Feb 17 06:01:30 2009 -0800
> 
>     x86, rcu: fix strange load average and ksoftirqd behavior
>     
>     Damien Wyart reported high ksoftirqd CPU usage (20%) on an
>     otherwise idle system.
>     
>     The function-graph trace Damien provided:
> ...
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> 
> index a546f55..bd4da2a 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -104,9 +104,6 @@ void cpu_idle(void)
>  			check_pgt_cache();
>  			rmb();
>  
> -			if (rcu_pending(cpu))
> -				rcu_check_callbacks(cpu, 0);
> -
>  			if (cpu_is_offline(cpu))
>  				play_dead();
>  
> 
> --

Paul, this commit makes net device unregister very slow (more than 100 ms
 if CONFIG_NO_HZ is set), while it used to be pretty fast in previous kernels.

Quoting Ben:
" I just ran a few L2TP tests against 2.6.30-rc7, and it looks like network 
  device deletion has become unusably slow.  At least in 2.6.27.10, deleting 
  1000 network interfaces takes less than 2 seconds of real time.  The same 
  test run under 2.6.30-rc7 is taking hundreds of seconds to delete 1000 
  interfaces at a rate of about 5 per second.  The interfaces all share the 
  same local ip address, but each have a single route to a unique client 
  ip address."

Device unregister is a synchronize_rcu() abuser (three calls to dismantle
a vlan...) so delaying rcu callbacks can be pretty expensive for it.

I wonder whether the real root of the problem was discovered in the meantime,
by commit 64ca5ab913f1594ef316556e65f5eae63ff50cee
("rcu: increment quiescent state counter in ksoftirqd()").

Maybe that commit solved Damien Wyart's problem as well, and we can revert
commit bf51935f3e988e0ed6f34b55593e5912f990750a?

Thank you

* Re: regression: unregister_netdev() unusably slow
From: Damien Wyart @ 2009-05-25  8:04 UTC
  To: Eric Dumazet
  Cc: Benjamin LaHaise, Paul E. McKenney, Denys Fedoryschenko, netdev,
	linux kernel

Hello,

* Eric Dumazet <dada1@cosmosbay.com> [2009-05-25 07:22]:
> Time to include Paul and lkml in the discussion, and find a better
> solution than the one provided in February.

> > bf51935f3e988e0ed6f34b55593e5912f990750a is first bad commit
> > commit bf51935f3e988e0ed6f34b55593e5912f990750a
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Tue Feb 17 06:01:30 2009 -0800

> >     x86, rcu: fix strange load average and ksoftirqd behavior

> >     Damien Wyart reported high ksoftirqd CPU usage (20%) on an
> >     otherwise idle system.

> >     The function-graph trace Damien provided:
> > ...
> > diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c

> > index a546f55..bd4da2a 100644
> > --- a/arch/x86/kernel/process_32.c
> > +++ b/arch/x86/kernel/process_32.c
> > @@ -104,9 +104,6 @@ void cpu_idle(void)
> >  			check_pgt_cache();
> >  			rmb();

> > -			if (rcu_pending(cpu))
> > -				rcu_check_callbacks(cpu, 0);
> > -
> >  			if (cpu_is_offline(cpu))
> >  				play_dead();

> I wonder if the real root of the problem was not discovered in the meantime,
> by commit 64ca5ab913f1594ef316556e65f5eae63ff50cee
> rcu: increment quiescent state counter in ksoftirqd()

> Maybe this commit solved Damien Wyart problem as well, and we can revert
> commit bf51935f3e988e0ed6f34b55593e5912f990750a ?

Ran some tests on 2.6.30-rc7 with bf51935f reverted, and I am still
seeing the problems I originally reported back in February, so I guess
64ca5ab9 is not enough to fully solve all the issues... Note that I am
using CONFIG_TREE_RCU=y (was already true in my February reports).

Feel free to ask if more testing is needed.

-- 
Damien Wyart

* Re: regression: unregister_netdev() unusably slow
From: Paul E. McKenney @ 2009-05-25 16:21 UTC
  To: Eric Dumazet
  Cc: Benjamin LaHaise, Denys Fedoryschenko, netdev, linux kernel,
	damien.wyart

On Mon, May 25, 2009 at 07:22:02AM +0200, Eric Dumazet wrote:
> Benjamin LaHaise wrote:
> > On Mon, May 25, 2009 at 12:47:39AM +0200, Eric Dumazet wrote:
> >> There is a strong dependency on HZ
> >> BTW, I am using TREE_RCU
> > 
> > I'm using CLASSIC_RCU.  The bisect just completed, and it points to RCU.  
> > It makes some degree of sense since I'm testing on an otherwise idle 
> > machine.  That said, where is fixing it going to make sense?  I'm not 
> > opposed to having device unregister take a few timer ticks, but there 
> > has to be some way of exposing parallelism to the system, and since the 
> > synchronize_net() calls are done under rtnl_lock(), none is possible at 
> > present.  Hrm.
> 
> Thanks Ben, this bisection indeed confirms how nasty synchronize_rcu() is :)

Yet another step in my learning what is required of RCU, it seems!  ;-)

> Time to include Paul and lkml in the discussion, and find a better solution than 
> the one provided in February.

One approach would be to convert the offending synchronize_rcu() to
call_rcu(), but if this were straightforward, I would guess that you would
have already done this.  But if the code following the synchronize_rcu()
does nothing but free up old data structures, this is an easy fix.
If there are statistics or other state involved, then call_rcu() might
not be the right tool for the job.
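
To illustrate the difference between blocking and deferring, here is a small userspace toy in C. It is not kernel code and does not implement RCU: `toy_call_rcu()`, `toy_grace_period_end()` and `struct toy_dev` are invented stand-ins that only mimic the shape of queuing a callback and running it later, so that the teardown path itself never waits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for an RCU callback head: a queue link plus a callback. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

/* Hypothetical object that must outlive any concurrent readers. */
struct toy_dev {
	int id;
	struct toy_rcu_head rcu;
};

static struct toy_rcu_head *toy_pending;	/* callbacks awaiting a grace period */
static int toy_freed;				/* objects released so far */

/* Shaped like call_rcu(): queue the callback and return without waiting. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	assert(head != NULL && func != NULL);
	head->func = func;
	head->next = toy_pending;
	toy_pending = head;
}

/* Stand-in for the end of a grace period: run every queued callback. */
static void toy_grace_period_end(void)
{
	struct toy_rcu_head *head = toy_pending;

	toy_pending = NULL;
	while (head != NULL) {
		/* Save the link first: the callback frees the enclosing object. */
		struct toy_rcu_head *next = head->next;

		head->func(head);
		head = next;
	}
}

/* Callback: recover the enclosing object from its embedded head, free it. */
static void toy_dev_free(struct toy_rcu_head *head)
{
	struct toy_dev *dev =
		(struct toy_dev *)((char *)head - offsetof(struct toy_dev, rcu));

	free(dev);
	toy_freed++;
}

/* Teardown path: defer the free instead of blocking for a grace period. */
static void toy_dev_release(struct toy_dev *dev)
{
	toy_call_rcu(&dev->rcu, toy_dev_free);
}
```

Releasing several objects queues them all immediately, and a single call to `toy_grace_period_end()` then frees the lot. In the kernel the callback would typically use container_of() and kfree(), and would run only after all pre-existing readers have finished.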

Another approach is to apply the patch at:

	http://lkml.org/lkml/2009/5/22/332

Then replace the offending synchronize_rcu() with synchronize_rcu_expedited().
This code is still a bit on the experimental side, but tests have been
going quite well, so, unlike a week or two ago, it is definitely worth
trying out.

Do either of these approaches work for you?

							Thanx, Paul
