* [PATCH/RFC] synchronize_rcu(): high latency on idle system
From: Benjamin LaHaise @ 2008-01-12 1:26 UTC
To: dipankar, Andrew Morton; +Cc: linux-kernel, linux-arch

Hello folks,

I'd like to put the patch below out for comments to see if folks think the
approach is a valid fix to reduce the latency of synchronize_rcu(). The
motivation is that an otherwise idle system takes about 3 ticks per network
interface in unregister_netdev() due to multiple calls to synchronize_rcu(),
which adds up to quite a few seconds for tearing down thousands of
interfaces. By flushing pending rcu callbacks in the idle loop, the system
makes progress hundreds of times faster. If this is indeed a sane thing to do,
it probably needs to be done for other architectures than x86. And yes, the
network stack shouldn't call synchronize_rcu() quite so much, but fixing that
is a little more involved.

		-ben

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 9663c2a..592f6e4 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -188,6 +188,9 @@ void cpu_idle(void)
 			rmb();
 			idle = pm_idle;
 
+			if (rcu_pending(cpu))
+				rcu_check_callbacks(cpu, 0);
+
 			if (!idle)
 				idle = default_idle;
 
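To put the "about 3 ticks per network interface" figure in perspective, the estimate can be worked through with a small helper. This is only a back-of-the-envelope sketch: the function name `teardown_ms` is hypothetical, and the tick rate (HZ) is an assumption, since the message does not say which HZ value the system used.

```c
#include <assert.h>

/*
 * Rough estimate of serial teardown time in milliseconds.
 * interfaces:          number of network interfaces torn down one by one
 * ticks_per_interface: ticks spent waiting per interface (about 3 per the report)
 * hz:                  timer ticks per second (ASSUMED; not stated in the thread)
 */
static long teardown_ms(long interfaces, long ticks_per_interface, long hz)
{
    assert(hz > 0);
    return interfaces * ticks_per_interface * 1000 / hz;
}
```

Under these assumptions, 1000 interfaces at 3 ticks each would take roughly 3 seconds at HZ=1000 and roughly 12 seconds at HZ=250, which is in line with the "quite a few seconds" described for thousands of interfaces.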
* Re: [PATCH/RFC] synchronize_rcu(): high latency on idle system
From: Andi Kleen @ 2008-01-12 2:37 UTC
To: Benjamin LaHaise; +Cc: dipankar, Andrew Morton, linux-kernel, linux-arch

> And yes, the
> network stack shouldn't call synchronize_rcu() quite so much, but fixing that
> is a little more involved.

... but the correct solution.

-Andi
* Re: [PATCH/RFC] synchronize_rcu(): high latency on idle system
From: Benjamin LaHaise @ 2008-01-12 17:51 UTC
To: Andi Kleen; +Cc: dipankar, Andrew Morton, linux-kernel, linux-arch

On Sat, Jan 12, 2008 at 03:37:59AM +0100, Andi Kleen wrote:
> > And yes, the
> > network stack shouldn't call synchronize_rcu() quite so much, but fixing that
> > is a little more involved.
>
> ... but the correct solution.

There has to be at least 1 synchronize_rcu() or equivalent in the
unregister_netdev() path. I suspect the easiest way to fix it might be to
use call_rcu() to actually free the network device, as anything else will
limit performance of single threaded teardown (ie, when an l2tp daemon
gets terminated via kill -9). This means an API change that exposes
rcu for unregister_netdev().

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <zyntrop@kvack.org>.
* Re: [PATCH/RFC] synchronize_rcu(): high latency on idle system
From: Andi Kleen @ 2008-01-12 18:35 UTC
To: Benjamin LaHaise; +Cc: dipankar, Andrew Morton, linux-kernel, linux-arch

On Saturday 12 January 2008 18:51:35 Benjamin LaHaise wrote:
> On Sat, Jan 12, 2008 at 03:37:59AM +0100, Andi Kleen wrote:
> > > And yes, the
> > > network stack shouldn't call synchronize_rcu() quite so much, but fixing that
> > > is a little more involved.
> >
> > ... but the correct solution.
>
> There has to be at least 1 synchronize_rcu() or equivalent in the
> unregister_netdev() path. I suspect the easiest way to fix it might be to
> use call_rcu() to actually free the network device, as anything else will
> limit performance of single threaded teardown (ie, when an l2tp daemon
> gets terminated via kill -9). This means an API change that exposes
> rcu for unregister_netdev().

The call_rcu() could be in free_netdev() couldn't it?

-Andi
* Re: [PATCH/RFC] synchronize_rcu(): high latency on idle system
From: Stephen Hemminger @ 2008-01-13 1:52 UTC
To: Andi Kleen; +Cc: netdev, linux-kerne

On Sat, 12 Jan 2008 19:35:58 +0100
Andi Kleen <ak@suse.de> wrote:

> On Saturday 12 January 2008 18:51:35 Benjamin LaHaise wrote:
> > On Sat, Jan 12, 2008 at 03:37:59AM +0100, Andi Kleen wrote:
> > > > And yes, the
> > > > network stack shouldn't call synchronize_rcu() quite so much, but fixing that
> > > > is a little more involved.
> > >
> > > ... but the correct solution.
> >
> > There has to be at least 1 synchronize_rcu() or equivalent in the
> > unregister_netdev() path. I suspect the easiest way to fix it might be to
> > use call_rcu() to actually free the network device, as anything else will
> > limit performance of single threaded teardown (ie, when an l2tp daemon
> > gets terminated via kill -9). This means an API change that exposes
> > rcu for unregister_netdev().
>
> The call_rcu() could be in free_netdev() couldn't it?

I think it should be in netdev_unregister_kobject(). But that would
only get rid of one of the two calls to synchronize_rcu() in the unregister_netdev.

The other synchronize_rcu() is for qdisc's and not sure if that one can
be removed?

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>
* Re: [PATCH/RFC] synchronize_rcu(): high latency on idle system
From: Andi Kleen @ 2008-01-13 15:34 UTC
To: Stephen Hemminger; +Cc: netdev, linux-kerne

> I think it should be in netdev_unregister_kobject(). But that would
> only get rid of one of the two calls to synchronize_rcu() in the unregister_netdev.

Would be already an improvement.

> The other synchronize_rcu() is for qdisc's and not sure if that one can
> be removed?

The standard way to remove such calls is to set a "deleted" flag in the object,
then check and ignore such objects in the reader and finally remove the object with
call_rcu.

I have not checked if that is really feasible for qdiscs.

-Andi
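The "deleted flag" pattern described above can be illustrated with a small user-space sketch. Everything here is hypothetical: `struct item`, `push()`, `sum_live()`, `mark_deleted()` and `reap_deleted()` are made-up names, not kernel or qdisc code. The sketch is single-threaded and omits the memory barriers and rcu_read_lock()/call_rcu() calls a real implementation would need; `reap_deleted()` merely stands in for the work a call_rcu() callback would do once a grace period has elapsed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical list element; not the kernel's qdisc structure. */
struct item {
    int value;
    bool deleted;        /* set by the updater instead of waiting for readers */
    struct item *next;
};

/* Prepend a new live element and return the new list head. */
static struct item *push(struct item *head, int value)
{
    struct item *it = malloc(sizeof(*it));

    assert(it != NULL);
    it->value = value;
    it->deleted = false;
    it->next = head;
    return it;
}

/* Reader side: walk the list and ignore elements marked deleted. */
static int sum_live(const struct item *head)
{
    int sum = 0;

    for (const struct item *it = head; it != NULL; it = it->next)
        if (!it->deleted)
            sum += it->value;
    return sum;
}

/* Updater side, step 1: flag the element; no blocking wait is needed. */
static void mark_deleted(struct item *it)
{
    it->deleted = true;
}

/*
 * Updater side, step 2: unlink and free flagged elements. In the kernel
 * this would run from a call_rcu() callback after a grace period; here
 * it is simply called later. Returns the number of elements freed.
 */
static int reap_deleted(struct item **head)
{
    struct item **pp = head;
    int freed = 0;

    while (*pp != NULL) {
        struct item *it = *pp;

        if (it->deleted) {
            *pp = it->next;
            free(it);
            freed++;
        } else {
            pp = &it->next;
        }
    }
    return freed;
}
```

The point of the pattern is that the updater never blocks: readers see a consistent view as soon as the flag is set, and the memory is reclaimed asynchronously.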
* Re: [PATCH/RFC] synchronize_rcu(): high latency on idle system
From: Stephen Hemminger @ 2008-01-14 17:19 UTC
To: Andi Kleen; +Cc: netdev, linux-kerne

On Sun, 13 Jan 2008 16:34:17 +0100
Andi Kleen <ak@suse.de> wrote:

>
> > I think it should be in netdev_unregister_kobject(). But that would
> > only get rid of one of the two calls to synchronize_rcu() in the unregister_netdev.
>
> Would be already an improvement.
>
> > The other synchronize_rcu() is for qdisc's and not sure if that one can
> > be removed?
>
> The standard way to remove such calls is to set a "deleted" flag in the object,
> then check and ignore such objects in the reader and finally remove the object with
> call_rcu
>
> I have not checked if that is really feasible for qdiscs.
>
> -Andi

Actually, the synchronize_rcu() is now acting as a barrier between two
sections in the current unregister process. It can't be removed.
But, an alternative unregister_and_free_netdev() could be created
that uses call_rcu. Basically:

void unregister_and_free_netdev()
{
	do stuff before barrier...
	setup rcu callback
	call_rcu();
}

static void netdev_after_rcu()
{
	rtnl_lock();
	do stuff after barrier
	rtnl_unlock();
	free_netdev
}

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>
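The split proposed above (do the first half synchronously, queue the second half as a callback) can be sketched in user space. This is a minimal illustration under stated assumptions, not kernel code: `defer_call()` and `run_deferred()` are hypothetical stand-ins for call_rcu() and for the end of a grace period, and `struct fake_netdev` with its fields is invented for the example. Locking (the rtnl_lock() step in the pseudocode) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head. */
struct cb_head {
    struct cb_head *next;
    void (*func)(struct cb_head *);
};

/* Queue of callbacks waiting for the (simulated) grace period. */
static struct cb_head *pending;
static struct cb_head **pending_tail = &pending;

/* Stand-in for call_rcu(): queue the callback and return immediately. */
static void defer_call(struct cb_head *h, void (*func)(struct cb_head *))
{
    assert(h != NULL && func != NULL);
    h->func = func;
    h->next = NULL;
    *pending_tail = h;
    pending_tail = &h->next;
}

/* Stand-in for "grace period ended": run all queued callbacks in order. */
static int run_deferred(void)
{
    struct cb_head *list = pending;
    int n = 0;

    pending = NULL;
    pending_tail = &pending;
    while (list != NULL) {
        struct cb_head *next = list->next; /* read before func() frees it */

        list->func(list);
        list = next;
        n++;
    }
    return n;
}

/* Hypothetical device object; not the kernel's struct net_device. */
struct fake_netdev {
    int registered;
    int *freed_counter;  /* lets a caller observe when the free happens */
    struct cb_head rcu;
};

static struct fake_netdev *fake_netdev_new(int *freed_counter)
{
    struct fake_netdev *dev = calloc(1, sizeof(*dev));

    assert(dev != NULL);
    dev->registered = 1;
    dev->freed_counter = freed_counter;
    return dev;
}

/* Second half: the work that used to follow synchronize_rcu(). */
static void netdev_after_grace(struct cb_head *h)
{
    struct fake_netdev *dev =
        (struct fake_netdev *)((char *)h - offsetof(struct fake_netdev, rcu));

    (*dev->freed_counter)++;
    free(dev);
}

/* First half: unregister, then queue the rest instead of blocking. */
static void unregister_and_free(struct fake_netdev *dev)
{
    dev->registered = 0;
    defer_call(&dev->rcu, netdev_after_grace);
}
```

With this shape, tearing down many devices in a row queues many callbacks behind a single wait rather than paying one blocking wait per device, which is the latency win the thread is after.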
* Re: [PATCH/RFC] synchronize_rcu(): high latency on idle system
From: Peter Zijlstra @ 2008-01-12 9:23 UTC
To: Benjamin LaHaise
Cc: dipankar, Andrew Morton, linux-kernel, linux-arch, Paul E. McKenney, rostedt, Ingo Molnar, Andi Kleen

On Fri, 2008-01-11 at 20:26 -0500, Benjamin LaHaise wrote:
> Hello folks,
>
> I'd like to put the patch below out for comments to see if folks think the
> approach is a valid fix to reduce the latency of synchronize_rcu(). The
> motivation is that an otherwise idle system takes about 3 ticks per network
> interface in unregister_netdev() due to multiple calls to synchronize_rcu(),
> which adds up to quite a few seconds for tearing down thousands of
> interfaces. By flushing pending rcu callbacks in the idle loop, the system
> makes progress hundreds of times faster. If this is indeed a sane thing to do,
> it probably needs to be done for other architectures than x86. And yes, the
> network stack shouldn't call synchronize_rcu() quite so much, but fixing that
> is a little more involved.

So, instead of only relying on the tick to drive the RCU state machine,
you add the idle loop to it. This seems to make sense, esp because nohz
is held off until rcu is idle too.

Even though Andi is right in that it's not the proper solution to your
problem, I think it's worth doing anyway for the general benefit of RCU.

But let's ask Paul, he is Mr RCU after all :-)

> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 9663c2a..592f6e4 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -188,6 +188,9 @@ void cpu_idle(void)
>  			rmb();
>  			idle = pm_idle;
>  
> +			if (rcu_pending(cpu))
> +				rcu_check_callbacks(cpu, 0);
> +
>  			if (!idle)
>  				idle = default_idle;
>  
* Re: [PATCH/RFC] synchronize_rcu(): high latency on idle system
From: Paul E. McKenney @ 2008-01-12 16:55 UTC
To: Peter Zijlstra
Cc: Benjamin LaHaise, dipankar, Andrew Morton, linux-kernel, linux-arch, rostedt, Ingo Molnar, Andi Kleen

On Sat, Jan 12, 2008 at 10:23:11AM +0100, Peter Zijlstra wrote:
>
> On Fri, 2008-01-11 at 20:26 -0500, Benjamin LaHaise wrote:
> > Hello folks,
> >
> > I'd like to put the patch below out for comments to see if folks think the
> > approach is a valid fix to reduce the latency of synchronize_rcu(). The
> > motivation is that an otherwise idle system takes about 3 ticks per network
> > interface in unregister_netdev() due to multiple calls to synchronize_rcu(),
> > which adds up to quite a few seconds for tearing down thousands of
> > interfaces. By flushing pending rcu callbacks in the idle loop, the system
> > makes progress hundreds of times faster. If this is indeed a sane thing to do,
> > it probably needs to be done for other architectures than x86. And yes, the
> > network stack shouldn't call synchronize_rcu() quite so much, but fixing that
> > is a little more involved.
>
> So, instead of only relying on the tick to drive the RCU state machine,
> you add the idle loop to it. This seems to make sense, esp because nohz
> is held off until rcu is idle too.
>
> Even though Andi is right in that it's not the proper solution to your
> problem, I think it's worth doing anyway for the general benefit of RCU.
>
> But let's ask Paul, he is Mr RCU after all :-)

;-)

At first glance, looks workable!

One concern is how often it gets invoked. If rcu_check_callbacks() is
invoked too often on lots of idle CPUs, it could degrade system performance
due to contention on the RCU internal locks and due to cacheline bouncing.

Now, my guess is that the rcu_pending() call should throttle things
nicely, but it would be good to test. All the testing ideas thus far
have been involved and unlikely to test it well, for example:

CPU 0: lots of synchronize_rcu() calls.
CPU 1: lots of synchronize_rcu() calls.
CPU 2: idle.
CPU 3: CPU-bound workload.

Compare the rate of progress made by CPU 3 with CPUs 0 and 1 active or not.

But this would not test much -- the load that CPUs 0, 1, and 2 might be
placing on the bus/cache/RCU-locks would not be visible to CPU 3.
One could cache-thrash between CPU 3 and 4, but this requires a >=5-CPU
system.

Will think on it some more.

> > diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> > index 9663c2a..592f6e4 100644
> > --- a/arch/x86/kernel/process_32.c
> > +++ b/arch/x86/kernel/process_32.c
> > @@ -188,6 +188,9 @@ void cpu_idle(void)
> >  			rmb();
> >  			idle = pm_idle;
> >  
> > +			if (rcu_pending(cpu))
> > +				rcu_check_callbacks(cpu, 0);

Given that it is not legal to have RCU read-side critical sections in
the idle loop, how about the following?

+				rcu_check_callbacks(cpu, 1);

Perhaps also changing the name of rcu_check_callbacks()'s second parameter
from "user" to something like "in_quiescent_state".

Might speed up grace-period recognition in some cases -- wouldn't need
to wait for the next trip through the scheduler in some cases.

						Thanx, Paul

> > +
> >  			if (!idle)
> >  				idle = default_idle;
> >  
>
* Re: [PATCH/RFC] synchronize_rcu(): high latency on idle system
From: Andi Kleen @ 2008-01-12 17:33 UTC
To: Peter Zijlstra
Cc: Benjamin LaHaise, dipankar, Andrew Morton, linux-kernel, linux-arch, Paul E. McKenney, rostedt, Ingo Molnar

On Saturday 12 January 2008 10:23:11 Peter Zijlstra wrote:
>
> On Fri, 2008-01-11 at 20:26 -0500, Benjamin LaHaise wrote:
> > Hello folks,
> >
> > I'd like to put the patch below out for comments to see if folks think the
> > approach is a valid fix to reduce the latency of synchronize_rcu(). The
> > motivation is that an otherwise idle system takes about 3 ticks per network
> > interface in unregister_netdev() due to multiple calls to synchronize_rcu(),
> > which adds up to quite a few seconds for tearing down thousands of
> > interfaces. By flushing pending rcu callbacks in the idle loop, the system
> > makes progress hundreds of times faster. If this is indeed a sane thing to do,
> > it probably needs to be done for other architectures than x86. And yes, the
> > network stack shouldn't call synchronize_rcu() quite so much, but fixing that
> > is a little more involved.
>
> So, instead of only relying on the tick to drive the RCU state machine,
> you add the idle loop to it. This seems to make sense, esp because nohz
> is held off until rcu is idle too.

For NOHZ I agree it would probably be better to just force a quiescent cycle
than to schedule a one-jiffy tick as is currently done.

For non-NOHZ I'm not so sure.

-Andi
end of thread, other threads:[~2008-01-14 17:21 UTC | newest]

Thread overview: 10+ messages
2008-01-12  1:26 [PATCH/RFC] synchronize_rcu(): high latency on idle system Benjamin LaHaise
2008-01-12  2:37 ` Andi Kleen
2008-01-12 17:51   ` Benjamin LaHaise
2008-01-12 18:35     ` Andi Kleen
2008-01-13  1:52       ` Stephen Hemminger
2008-01-13 15:34         ` Andi Kleen
2008-01-14 17:19           ` Stephen Hemminger
2008-01-12  9:23 ` Peter Zijlstra
2008-01-12 16:55   ` Paul E. McKenney
2008-01-12 17:33   ` Andi Kleen