public inbox for linux-kernel@vger.kernel.org
* [PATCH] improve stop_machine performance
@ 2010-03-04 21:20 Dimitri Sivanich
  2010-03-05  0:22 ` Tejun Heo
  0 siblings, 1 reply; 4+ messages in thread
From: Dimitri Sivanich @ 2010-03-04 21:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: Rusty Russell, Linus Torvalds, Heiko Carstens, Tejun Heo

On systems with large cpu counts, we've been seeing long bootup times
associated with stop_machine operations.  I've noticed that by simply
removing the creation of the workqueue and associated percpu variables
in subsequent stop_machine calls, we can reduce boot times on a
1024 processor SGI UV system from 25-30 (or more) minutes down to 12
minutes.

The attached patch does this in a simple way by removing the
stop_machine_destroy interface, thereby leaving the workqueues and
percpu variables in place for later use once they are created.

If people are against having these areas around after boot, maybe there
are some alternatives that will still allow for this optimization:

 - Set a timer to go off after a configurable number of minutes, at
   which point the workqueue areas will be deleted.

 - Keep the stop_machine_destroy function, but somehow run it at the tail
   end of boot (after modules have loaded), rather than running it at
   every stop_machine call.


Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>

---

 drivers/xen/manage.c         |    1 -
 include/linux/stop_machine.h |    9 ---------
 kernel/cpu.c                 |    2 --
 kernel/module.c              |    1 -
 kernel/stop_machine.c        |   25 +++++--------------------
 5 files changed, 5 insertions(+), 33 deletions(-)

Index: linux/kernel/stop_machine.c
===================================================================
--- linux.orig/kernel/stop_machine.c
+++ linux/kernel/stop_machine.c
@@ -38,10 +38,8 @@ struct stop_machine_data {
 static unsigned int num_threads;
 static atomic_t thread_ack;
 static DEFINE_MUTEX(lock);
-/* setup_lock protects refcount, stop_machine_wq and stop_machine_work. */
+/* setup_lock protects stop_machine_wq and stop_machine_work. */
 static DEFINE_MUTEX(setup_lock);
-/* Users of stop_machine. */
-static int refcount;
 static struct workqueue_struct *stop_machine_wq;
 static struct stop_machine_data active, idle;
 static const struct cpumask *active_cpus;
@@ -115,7 +113,7 @@ static int chill(void *unused)
 int stop_machine_create(void)
 {
 	mutex_lock(&setup_lock);
-	if (refcount)
+	if (stop_machine_wq)
 		goto done;
 	stop_machine_wq = create_rt_workqueue("kstop");
 	if (!stop_machine_wq)
@@ -124,31 +122,19 @@ int stop_machine_create(void)
 	if (!stop_machine_work)
 		goto err_out;
 done:
-	refcount++;
 	mutex_unlock(&setup_lock);
 	return 0;
 
 err_out:
-	if (stop_machine_wq)
+	if (stop_machine_wq) {
 		destroy_workqueue(stop_machine_wq);
+		stop_machine_wq = NULL;
+	}
 	mutex_unlock(&setup_lock);
 	return -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(stop_machine_create);
 
-void stop_machine_destroy(void)
-{
-	mutex_lock(&setup_lock);
-	refcount--;
-	if (refcount)
-		goto done;
-	destroy_workqueue(stop_machine_wq);
-	free_percpu(stop_machine_work);
-done:
-	mutex_unlock(&setup_lock);
-}
-EXPORT_SYMBOL_GPL(stop_machine_destroy);
-
 int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
 {
 	struct work_struct *sm_work;
@@ -193,7 +179,6 @@ int stop_machine(int (*fn)(void *), void
 	get_online_cpus();
 	ret = __stop_machine(fn, data, cpus);
 	put_online_cpus();
-	stop_machine_destroy();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(stop_machine);
Index: linux/drivers/xen/manage.c
===================================================================
--- linux.orig/drivers/xen/manage.c
+++ linux/drivers/xen/manage.c
@@ -138,7 +138,6 @@ out_thaw:
 
 out_destroy_sm:
 #endif
-	stop_machine_destroy();
 
 out:
 	shutting_down = SHUTDOWN_INVALID;
Index: linux/include/linux/stop_machine.h
===================================================================
--- linux.orig/include/linux/stop_machine.h
+++ linux/include/linux/stop_machine.h
@@ -45,14 +45,6 @@ int __stop_machine(int (*fn)(void *), vo
  */
 int stop_machine_create(void);
 
-/**
- * stop_machine_destroy: destroy all stop_machine threads
- *
- * Description: This causes all stop_machine threads which were created with
- * stop_machine_create to be destroyed again.
- */
-void stop_machine_destroy(void);
-
 #else
 
 static inline int stop_machine(int (*fn)(void *), void *data,
@@ -66,7 +58,6 @@ static inline int stop_machine(int (*fn)
 }
 
 static inline int stop_machine_create(void) { return 0; }
-static inline void stop_machine_destroy(void) { }
 
 #endif /* CONFIG_SMP */
 #endif /* _LINUX_STOP_MACHINE */
Index: linux/kernel/cpu.c
===================================================================
--- linux.orig/kernel/cpu.c
+++ linux/kernel/cpu.c
@@ -285,7 +285,6 @@ int __ref cpu_down(unsigned int cpu)
 
 out:
 	cpu_maps_update_done();
-	stop_machine_destroy();
 	return err;
 }
 EXPORT_SYMBOL(cpu_down);
@@ -399,7 +398,6 @@ int disable_nonboot_cpus(void)
 		printk(KERN_ERR "Non-boot CPUs are not disabled\n");
 	}
 	cpu_maps_update_done();
-	stop_machine_destroy();
 	return error;
 }
 
Index: linux/kernel/module.c
===================================================================
--- linux.orig/kernel/module.c
+++ linux/kernel/module.c
@@ -731,7 +731,6 @@ SYSCALL_DEFINE2(delete_module, const cha
  out:
 	mutex_unlock(&module_mutex);
 out_stop:
-	stop_machine_destroy();
 	return ret;
 }
 


* Re: [PATCH] improve stop_machine performance
  2010-03-04 21:20 [PATCH] improve stop_machine performance Dimitri Sivanich
@ 2010-03-05  0:22 ` Tejun Heo
  2010-03-05 14:11   ` Dimitri Sivanich
  0 siblings, 1 reply; 4+ messages in thread
From: Tejun Heo @ 2010-03-05  0:22 UTC (permalink / raw)
  To: Dimitri Sivanich
  Cc: linux-kernel, Rusty Russell, Linus Torvalds, Heiko Carstens

Hello,

On 03/05/2010 06:20 AM, Dimitri Sivanich wrote:
> On systems with large cpu counts, we've been seeing long bootup times
> associated with stop_machine operations.  I've noticed that by simply
> removing the creation of the workqueue and associated percpu variables
> in subsequent stop_machine calls, we can reduce boot times on a
> 1024 processor SGI UV system from 25-30 (or more) minutes down to 12
> minutes.
> 
> The attached patch does this in a simple way by removing the
> stop_machine_destroy interface, thereby leaving the workqueues and
> percpu variables in place for later use once they are created.
> 
> If people are against having these areas around after boot, maybe there
> are some alternatives that will still allow for this optimization:
> 
>  - Set a timer to go off after a configurable number of minutes, at
>    which point the workqueue areas will be deleted.
> 
>  - Keep the stop_machine_destroy function, but somehow run it at the tail
>    end of boot (after modules have loaded), rather than running it at
>    every stop_machine call.

Yeah, I can indeed imagine that creating and destroying all those
workers on every module load during boot would be very costly if there
are lots of CPUs.  How about sharing the migration thread so that it
serves as one-per-cpu uninterruptible RT simple thread pool?  It's not
like these things can run taking their turns anyway.  I'll go ahead
and make something up.

Thanks.

-- 
tejun


* Re: [PATCH] improve stop_machine performance
  2010-03-05  0:22 ` Tejun Heo
@ 2010-03-05 14:11   ` Dimitri Sivanich
  2010-03-05 14:17     ` Tejun Heo
  0 siblings, 1 reply; 4+ messages in thread
From: Dimitri Sivanich @ 2010-03-05 14:11 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Rusty Russell, Linus Torvalds, Heiko Carstens

On Fri, Mar 05, 2010 at 09:22:00AM +0900, Tejun Heo wrote:
> Hello,
> 
> On 03/05/2010 06:20 AM, Dimitri Sivanich wrote:
> > On systems with large cpu counts, we've been seeing long bootup times
> > associated with stop_machine operations.  I've noticed that by simply
> > removing the creation of the workqueue and associated percpu variables
> > in subsequent stop_machine calls, we can reduce boot times on a
> > 1024 processor SGI UV system from 25-30 (or more) minutes down to 12
> > minutes.
> > 
> > The attached patch does this in a simple way by removing the
> > stop_machine_destroy interface, thereby leaving the workqueues and
> > percpu variables in place for later use once they are created.
> > 
> > If people are against having these areas around after boot, maybe there
> > are some alternatives that will still allow for this optimization:
> > 
> >  - Set a timer to go off after a configurable number of minutes, at
> >    which point the workqueue areas will be deleted.
> > 
> >  - Keep the stop_machine_destroy function, but somehow run it at the tail
> >    end of boot (after modules have loaded), rather than running it at
> >    every stop_machine call.
> 
> Yeah, I can indeed imagine that creating and destroying all those
> workers on every module load during boot would be very costly if there
> are lots of CPUs.  How about sharing the migration thread so that it
> serves as one-per-cpu uninterruptible RT simple thread pool?  It's not
> like these things can run taking their turns anyway.  I'll go ahead
> and make something up.
>

It seems reasonable as long as setup is fast enough.  Will that thread
indeed become fully uninterruptible, i.e. unaffected by anything,
including scheduler decisions such as
sched_rt_period_us/sched_rt_runtime_us?


* Re: [PATCH] improve stop_machine performance
  2010-03-05 14:11   ` Dimitri Sivanich
@ 2010-03-05 14:17     ` Tejun Heo
  0 siblings, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2010-03-05 14:17 UTC (permalink / raw)
  To: Dimitri Sivanich
  Cc: linux-kernel, Rusty Russell, Linus Torvalds, Heiko Carstens

Hello,

On 03/05/2010 11:11 PM, Dimitri Sivanich wrote:
> It seems reasonable as long as setup is fast enough.  Will that
> thread indeed become fully uninterruptible, i.e. unaffected by
> anything, including scheduler decisions such as
> sched_rt_period_us/sched_rt_runtime_us?

I think so.  They are basically used to do the same thing: occupying
the CPU.  I'll give it a shot.  Please give me a couple of days.

Thanks.

-- 
tejun

