public inbox for linux-kernel@vger.kernel.org
* [PATCH] sched: missing locking in sched_domains code
@ 2008-04-27 21:12 Heiko Carstens
  2008-04-28  1:39 ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Heiko Carstens @ 2008-04-27 21:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Gautham R Shenoy, Ingo Molnar, Paul Jackson, linux-kernel

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Concurrent calls to detach_destroy_domains and arch_init_sched_domains
were prevented by the old scheduler subsystem cpu hotplug mutex. When
this got converted to get_online_cpus(), the locking got broken.
Unlike before, several processes can now concurrently enter the critical
sections that were protected by the old lock.

So add a new sched_domains_mutex which protects these sections again.

Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 include/linux/sched.h |    2 ++
 kernel/cpuset.c       |    2 ++
 kernel/sched.c        |   11 +++++++++++
 3 files changed, 15 insertions(+)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -7807,14 +7807,23 @@ match2:
 	unlock_doms_cur();
 }
 
+/*
+ * Protects against concurrent calls to detach_destroy_domains
+ * and arch_init_sched_domains.
+ */
+DEFINE_MUTEX(sched_domains_mutex);
+
 #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
 int arch_reinit_sched_domains(void)
 {
+	static DEFINE_MUTEX(arch_reinit_sched_domains_mutex);
 	int err;
 
 	get_online_cpus();
+	mutex_lock(&sched_domains_mutex);
 	detach_destroy_domains(&cpu_online_map);
 	err = arch_init_sched_domains(&cpu_online_map);
+	mutex_unlock(&sched_domains_mutex);
 	put_online_cpus();
 
 	return err;
@@ -7932,10 +7941,12 @@ void __init sched_init_smp(void)
 	BUG_ON(sched_group_nodes_bycpu == NULL);
 #endif
 	get_online_cpus();
+	mutex_lock(&sched_domains_mutex);
 	arch_init_sched_domains(&cpu_online_map);
 	cpus_andnot(non_isolated_cpus, cpu_possible_map, cpu_isolated_map);
 	if (cpus_empty(non_isolated_cpus))
 		cpu_set(smp_processor_id(), non_isolated_cpus);
+	mutex_unlock(&sched_domains_mutex);
 	put_online_cpus();
 	/* XXX: Theoretical race here - CPU may be hotplugged now */
 	hotcpu_notifier(update_sched_domains, 0);
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -809,6 +809,8 @@ struct sched_domain {
 #endif
 };
 
+extern struct mutex sched_domains_mutex;
+
 extern void partition_sched_domains(int ndoms_new, cpumask_t *doms_new,
 				    struct sched_domain_attr *dattr_new);
 extern int arch_reinit_sched_domains(void);
Index: linux-2.6/kernel/cpuset.c
===================================================================
--- linux-2.6.orig/kernel/cpuset.c
+++ linux-2.6/kernel/cpuset.c
@@ -684,7 +684,9 @@ restart:
 rebuild:
 	/* Have scheduler rebuild sched domains */
 	get_online_cpus();
+	mutex_lock(&sched_domains_mutex);
 	partition_sched_domains(ndoms, doms, dattr);
+	mutex_unlock(&sched_domains_mutex);
 	put_online_cpus();
 
 done:

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-27 21:12 [PATCH] sched: missing locking in sched_domains code Heiko Carstens
@ 2008-04-28  1:39 ` Andrew Morton
  2008-04-28  7:09   ` Heiko Carstens
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2008-04-28  1:39 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: Gautham R Shenoy, Ingo Molnar, Paul Jackson, linux-kernel

On Sun, 27 Apr 2008 23:12:24 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> 
> Concurrent calls to detach_destroy_domains and arch_init_sched_domains
> were prevented by the old scheduler subsystem cpu hotplug mutex. When
> this got converted to get_online_cpus(), the locking got broken.
> Unlike before, several processes can now concurrently enter the critical
> sections that were protected by the old lock.
> 
> So add a new sched_domains_mutex which protects these sections again.
> 
> Cc: Gautham R Shenoy <ego@in.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Paul Jackson <pj@sgi.com>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> ---
>  include/linux/sched.h |    2 ++
>  kernel/cpuset.c       |    2 ++
>  kernel/sched.c        |   11 +++++++++++
>  3 files changed, 15 insertions(+)
> 
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -7807,14 +7807,23 @@ match2:
>  	unlock_doms_cur();
>  }
>  
> +/*
> + * Protects against concurrent calls to detach_destroy_domains
> + * and arch_init_sched_domains.
> + */
> +DEFINE_MUTEX(sched_domains_mutex);
> +
>  #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
>  int arch_reinit_sched_domains(void)
>  {
> +	static DEFINE_MUTEX(arch_reinit_sched_domains_mutex);

leftover hunk.

>  	int err;
>  
>  	get_online_cpus();
> +	mutex_lock(&sched_domains_mutex);
>  	detach_destroy_domains(&cpu_online_map);
>  	err = arch_init_sched_domains(&cpu_online_map);
> +	mutex_unlock(&sched_domains_mutex);
>  	put_online_cpus();
>  
>  	return err;
> @@ -7932,10 +7941,12 @@ void __init sched_init_smp(void)
>  	BUG_ON(sched_group_nodes_bycpu == NULL);
>  #endif
>  	get_online_cpus();
> +	mutex_lock(&sched_domains_mutex);
>  	arch_init_sched_domains(&cpu_online_map);
>  	cpus_andnot(non_isolated_cpus, cpu_possible_map, cpu_isolated_map);
>  	if (cpus_empty(non_isolated_cpus))
>  		cpu_set(smp_processor_id(), non_isolated_cpus);
> +	mutex_unlock(&sched_domains_mutex);
>  	put_online_cpus();
>  	/* XXX: Theoretical race here - CPU may be hotplugged now */
>  	hotcpu_notifier(update_sched_domains, 0);
> Index: linux-2.6/include/linux/sched.h
> ===================================================================
> --- linux-2.6.orig/include/linux/sched.h
> +++ linux-2.6/include/linux/sched.h
> @@ -809,6 +809,8 @@ struct sched_domain {
>  #endif
>  };
>  
> +extern struct mutex sched_domains_mutex;
> +
>  extern void partition_sched_domains(int ndoms_new, cpumask_t *doms_new,
>  				    struct sched_domain_attr *dattr_new);
>  extern int arch_reinit_sched_domains(void);
> Index: linux-2.6/kernel/cpuset.c
> ===================================================================
> --- linux-2.6.orig/kernel/cpuset.c
> +++ linux-2.6/kernel/cpuset.c
> @@ -684,7 +684,9 @@ restart:
>  rebuild:
>  	/* Have scheduler rebuild sched domains */
>  	get_online_cpus();
> +	mutex_lock(&sched_domains_mutex);
>  	partition_sched_domains(ndoms, doms, dattr);
> +	mutex_unlock(&sched_domains_mutex);
>  	put_online_cpus();
>  

It seems a bit fragile to take this lock in the caller without even adding
a comment at the callee site which documents the new locking rule.

It would be more robust to take the lock within partition_sched_domains().

partition_sched_domains() already covers itself with lock_doms_cur().  Can
we take that in arch_reinit_sched_domains() rather than adding the new lock?



* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-28  1:39 ` Andrew Morton
@ 2008-04-28  7:09   ` Heiko Carstens
  2008-04-28  7:24     ` Ingo Molnar
  2008-04-28  7:28     ` Andrew Morton
  0 siblings, 2 replies; 15+ messages in thread
From: Heiko Carstens @ 2008-04-28  7:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Gautham R Shenoy, Ingo Molnar, Paul Jackson, linux-kernel

On Sun, Apr 27, 2008 at 06:39:26PM -0700, Andrew Morton wrote:
> On Sun, 27 Apr 2008 23:12:24 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > Index: linux-2.6/kernel/cpuset.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/cpuset.c
> > +++ linux-2.6/kernel/cpuset.c
> > @@ -684,7 +684,9 @@ restart:
> >  rebuild:
> >  	/* Have scheduler rebuild sched domains */
> >  	get_online_cpus();
> > +	mutex_lock(&sched_domains_mutex);
> >  	partition_sched_domains(ndoms, doms, dattr);
> > +	mutex_unlock(&sched_domains_mutex);
> >  	put_online_cpus();
> >  
> 
> It seems a bit fragile to take this lock in the caller without even adding
> a comment at the callee site which documents the new locking rule.
> 
> It would be more robust to take the lock within partition_sched_domains().
> 
> partition_sched_domains() already covers itself with lock_doms_cur().  Can
> we take that in arch_reinit_sched_domains() rather than adding the new lock?

I think you meant taking it in partition_sched_domains? But anyway, I moved
it all over to sched.c. So here's the new patch. Shorter and doesn't export
a new lock :)


Subject: [PATCH] sched: fix sched_domains locking

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Concurrent calls to detach_destroy_domains and arch_init_sched_domains
were prevented by the old scheduler subsystem cpu hotplug mutex. When
this got converted to get_online_cpus(), the locking got broken.
Unlike before, several processes can now concurrently enter the critical
sections that were protected by the old lock.

So add a new sched_domains_mutex which protects these sections again.

Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/sched.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -7730,6 +7730,12 @@ static int dattrs_equal(struct sched_dom
 }
 
 /*
+ * Protects against concurrent calls to detach_destroy_domains
+ * and arch_init_sched_domains.
+ */
+static DEFINE_MUTEX(sched_domains_mutex);
+
+/*
  * Partition sched domains as specified by the 'ndoms_new'
  * cpumasks in the array doms_new[] of cpumasks. This compares
  * doms_new[] to the current sched domain partitioning, doms_cur[].
@@ -7756,7 +7762,8 @@ void partition_sched_domains(int ndoms_n
 	int i, j;
 
 	lock_doms_cur();
-
+	mutex_lock(&sched_domains_mutex);
+
 	/* always unregister in case we don't destroy any domains */
 	unregister_sched_domain_sysctl();
 
@@ -7804,6 +7811,7 @@ match2:
 
 	register_sched_domain_sysctl();
 
+	mutex_unlock(&sched_domains_mutex);
 	unlock_doms_cur();
 }
 
@@ -7813,8 +7821,10 @@ int arch_reinit_sched_domains(void)
 	int err;
 
 	get_online_cpus();
+	mutex_lock(&sched_domains_mutex);
 	detach_destroy_domains(&cpu_online_map);
 	err = arch_init_sched_domains(&cpu_online_map);
+	mutex_unlock(&sched_domains_mutex);
 	put_online_cpus();
 
 	return err;
@@ -7932,10 +7942,12 @@ void __init sched_init_smp(void)
 	BUG_ON(sched_group_nodes_bycpu == NULL);
 #endif
 	get_online_cpus();
+	mutex_lock(&sched_domains_mutex);
 	arch_init_sched_domains(&cpu_online_map);
 	cpus_andnot(non_isolated_cpus, cpu_possible_map, cpu_isolated_map);
 	if (cpus_empty(non_isolated_cpus))
 		cpu_set(smp_processor_id(), non_isolated_cpus);
+	mutex_unlock(&sched_domains_mutex);
 	put_online_cpus();
 	/* XXX: Theoretical race here - CPU may be hotplugged now */
 	hotcpu_notifier(update_sched_domains, 0);


* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-28  7:09   ` Heiko Carstens
@ 2008-04-28  7:24     ` Ingo Molnar
  2008-04-28  7:28     ` Andrew Morton
  1 sibling, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2008-04-28  7:24 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Andrew Morton, Gautham R Shenoy, Paul Jackson, linux-kernel,
	Peter Zijlstra


* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

>  
>  /*
> + * Protects against concurrent calls to detach_destroy_domains
> + * and arch_init_sched_domains.
> + */
> +static DEFINE_MUTEX(sched_domains_mutex);
> +
> +/*
>   * Partition sched domains as specified by the 'ndoms_new'
>   * cpumasks in the array doms_new[] of cpumasks. This compares
>   * doms_new[] to the current sched domain partitioning, doms_cur[].
> @@ -7756,7 +7762,8 @@ void partition_sched_domains(int ndoms_n
>  	int i, j;
>  
>  	lock_doms_cur();
> -
> +	mutex_lock(&sched_domains_mutex);

i might be missing something but why not make doms_cur_mutex locking 
unconditional and extend it to detach_destroy_domains() as well?

	Ingo


* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-28  7:09   ` Heiko Carstens
  2008-04-28  7:24     ` Ingo Molnar
@ 2008-04-28  7:28     ` Andrew Morton
  2008-04-28  7:52       ` Heiko Carstens
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2008-04-28  7:28 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: Gautham R Shenoy, Ingo Molnar, Paul Jackson, linux-kernel

On Mon, 28 Apr 2008 09:09:46 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> On Sun, Apr 27, 2008 at 06:39:26PM -0700, Andrew Morton wrote:
> > On Sun, 27 Apr 2008 23:12:24 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > Index: linux-2.6/kernel/cpuset.c
> > > ===================================================================
> > > --- linux-2.6.orig/kernel/cpuset.c
> > > +++ linux-2.6/kernel/cpuset.c
> > > @@ -684,7 +684,9 @@ restart:
> > >  rebuild:
> > >  	/* Have scheduler rebuild sched domains */
> > >  	get_online_cpus();
> > > +	mutex_lock(&sched_domains_mutex);
> > >  	partition_sched_domains(ndoms, doms, dattr);
> > > +	mutex_unlock(&sched_domains_mutex);
> > >  	put_online_cpus();
> > >  
> > 
> > It seems a bit fragile to take this lock in the caller without even adding
> > a comment at the callee site which documents the new locking rule.
> > 
> > It would be more robust to take the lock within partition_sched_domains().
> > 
> > partition_sched_domains() already covers itself with lock_doms_cur().  Can
> > we take that in arch_reinit_sched_domains() rather than adding the new lock?
> 
> I think you meant taking it in partition_sched_domains?

What I meant was: rather than adding the new sched_domains_mutex, can we
instead call lock_doms_cur() from arch_reinit_sched_domains() and
sched_init_smp()?  Borrow the existing lock?

Whether that makes sense depends upon what lock_doms_cur() semantically
*means*.  As that appears to be somewhat of a secret, we get to decide ;)




* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-28  7:28     ` Andrew Morton
@ 2008-04-28  7:52       ` Heiko Carstens
  2008-04-28  8:11         ` Heiko Carstens
  0 siblings, 1 reply; 15+ messages in thread
From: Heiko Carstens @ 2008-04-28  7:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Gautham R Shenoy, Ingo Molnar, Paul Jackson, linux-kernel

On Mon, Apr 28, 2008 at 12:28:53AM -0700, Andrew Morton wrote:
> On Mon, 28 Apr 2008 09:09:46 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> 
> > On Sun, Apr 27, 2008 at 06:39:26PM -0700, Andrew Morton wrote:
> > > On Sun, 27 Apr 2008 23:12:24 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > > Index: linux-2.6/kernel/cpuset.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/kernel/cpuset.c
> > > > +++ linux-2.6/kernel/cpuset.c
> > > > @@ -684,7 +684,9 @@ restart:
> > > >  rebuild:
> > > >  	/* Have scheduler rebuild sched domains */
> > > >  	get_online_cpus();
> > > > +	mutex_lock(&sched_domains_mutex);
> > > >  	partition_sched_domains(ndoms, doms, dattr);
> > > > +	mutex_unlock(&sched_domains_mutex);
> > > >  	put_online_cpus();
> > > >  
> > > 
> > > It seems a bit fragile to take this lock in the caller without even adding
> > > a comment at the callee site which documents the new locking rule.
> > > 
> > > It would be more robust to take the lock within partition_sched_domains().
> > > 
> > > partition_sched_domains() already covers itself with lock_doms_cur().  Can
> > > we take that in arch_reinit_sched_domains() rather than adding the new lock?
> > 
> > I think you meant taking it in partition_sched_domains?
> 
> What I meant was: rather than adding the new sched_domains_mutex, can we
> instead call lock_doms_cur() from arch_reinit_sched_domains() and
> sched_init_smp()?  Borrow the existing lock?
> 
> Whether that makes sense depends upon what lock_doms_cur() semantically
> *means*.  As that appears to be somewhat of a secret, we get to decide ;)

Oh, I didn't realize that lock_doms_cur() was only used in one function.
So that should work. Will try.


* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-28  7:52       ` Heiko Carstens
@ 2008-04-28  8:11         ` Heiko Carstens
  2008-04-28  8:32           ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Heiko Carstens @ 2008-04-28  8:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Gautham R Shenoy, Ingo Molnar, Paul Jackson, linux-kernel

Subject: [PATCH] sched: fix locking in arch_reinit_sched_domains

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Concurrent calls to detach_destroy_domains and arch_init_sched_domains
were prevented by the old scheduler subsystem cpu hotplug mutex. When
this got converted to get_online_cpus(), the locking got broken.
Unlike before, several processes can now concurrently enter the critical
sections that were protected by the old lock.

So use the already present doms_cur_mutex to protect these sections again.

Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/sched.c |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -311,6 +311,16 @@ static DEFINE_SPINLOCK(task_group_lock);
 /* doms_cur_mutex serializes access to doms_cur[] array */
 static DEFINE_MUTEX(doms_cur_mutex);
 
+static inline void lock_doms_cur(void)
+{
+	mutex_lock(&doms_cur_mutex);
+}
+
+static inline void unlock_doms_cur(void)
+{
+	mutex_unlock(&doms_cur_mutex);
+}
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 #ifdef CONFIG_USER_SCHED
 # define INIT_TASK_GROUP_LOAD	(2*NICE_0_LOAD)
@@ -358,21 +368,9 @@ static inline void set_task_rq(struct ta
 #endif
 }
 
-static inline void lock_doms_cur(void)
-{
-	mutex_lock(&doms_cur_mutex);
-}
-
-static inline void unlock_doms_cur(void)
-{
-	mutex_unlock(&doms_cur_mutex);
-}
-
 #else
 
 static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { }
-static inline void lock_doms_cur(void) { }
-static inline void unlock_doms_cur(void) { }
 
 #endif	/* CONFIG_GROUP_SCHED */
 
@@ -7813,8 +7811,10 @@ int arch_reinit_sched_domains(void)
 	int err;
 
 	get_online_cpus();
+	lock_doms_cur();
 	detach_destroy_domains(&cpu_online_map);
 	err = arch_init_sched_domains(&cpu_online_map);
+	unlock_doms_cur();
 	put_online_cpus();
 
 	return err;


* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-28  8:11         ` Heiko Carstens
@ 2008-04-28  8:32           ` Ingo Molnar
  2008-04-28  8:49             ` Heiko Carstens
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2008-04-28  8:32 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Andrew Morton, Gautham R Shenoy, Paul Jackson, linux-kernel


* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

>  /* doms_cur_mutex serializes access to doms_cur[] array */
>  static DEFINE_MUTEX(doms_cur_mutex);
>  
> +static inline void lock_doms_cur(void)
> +{
> +	mutex_lock(&doms_cur_mutex);
> +}

> @@ -7813,8 +7811,10 @@ int arch_reinit_sched_domains(void)
>  	int err;
>  
>  	get_online_cpus();
> +	lock_doms_cur();

thanks, that looks a lot more clean already. May i ask for another 
thing, if you are hacking on this anyway? Please get rid of the 
lock_doms_cur() complication now that it's not conditional - an open 
coded mutex_lock(&sched_doms_mutex) looks more readable - it gives a 
clear idea about what's happening. Also, please rename sched_doms_mutex 
to something less tongue-twisting - such as sched_domains_mutex. Hm?

	Ingo


* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-28  8:32           ` Ingo Molnar
@ 2008-04-28  8:49             ` Heiko Carstens
  2008-04-28  8:57               ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Heiko Carstens @ 2008-04-28  8:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Gautham R Shenoy, Paul Jackson, linux-kernel

On Mon, Apr 28, 2008 at 10:32:22AM +0200, Ingo Molnar wrote:
> 
> * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> 
> >  /* doms_cur_mutex serializes access to doms_cur[] array */
> >  static DEFINE_MUTEX(doms_cur_mutex);
> >  
> > +static inline void lock_doms_cur(void)
> > +{
> > +	mutex_lock(&doms_cur_mutex);
> > +}
> 
> > @@ -7813,8 +7811,10 @@ int arch_reinit_sched_domains(void)
> >  	int err;
> >  
> >  	get_online_cpus();
> > +	lock_doms_cur();
> 
> thanks, that looks a lot more clean already. May i ask for another 
> thing, if you are hacking on this anyway? Please get rid of the 
> lock_doms_cur() complication now that it's not conditional - an open 
> coded mutex_lock(&sched_doms_mutex) looks more readable - it gives a 
> clear idea about what's happening. Also, please rename sched_doms_mutex 
> to something less tongue-twisting - such as sched_domains_mutex. Hm?

Your wish is my command:

Subject: [PATCH] sched: fix locking in arch_reinit_sched_domains

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Concurrent calls to detach_destroy_domains and arch_init_sched_domains
were prevented by the old scheduler subsystem cpu hotplug mutex. When
this got converted to get_online_cpus(), the locking got broken.
Unlike before, several processes can now concurrently enter the critical
sections that were protected by the old lock.

So use the already present doms_cur_mutex, renamed to sched_domains_mutex,
to protect these sections again.

Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/sched.c |   20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -309,7 +309,7 @@ static DEFINE_PER_CPU(struct rt_rq, init
 static DEFINE_SPINLOCK(task_group_lock);
 
 /* doms_cur_mutex serializes access to doms_cur[] array */
-static DEFINE_MUTEX(doms_cur_mutex);
+static DEFINE_MUTEX(sched_domains_mutex);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 #ifdef CONFIG_USER_SCHED
@@ -358,21 +358,9 @@ static inline void set_task_rq(struct ta
 #endif
 }
 
-static inline void lock_doms_cur(void)
-{
-	mutex_lock(&doms_cur_mutex);
-}
-
-static inline void unlock_doms_cur(void)
-{
-	mutex_unlock(&doms_cur_mutex);
-}
-
 #else
 
 static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { }
-static inline void lock_doms_cur(void) { }
-static inline void unlock_doms_cur(void) { }
 
 #endif	/* CONFIG_GROUP_SCHED */
 
@@ -7755,7 +7743,7 @@ void partition_sched_domains(int ndoms_n
 {
 	int i, j;
 
-	lock_doms_cur();
+	mutex_lock(&sched_domains_mutex);
 
 	/* always unregister in case we don't destroy any domains */
 	unregister_sched_domain_sysctl();
@@ -7804,7 +7792,7 @@ match2:
 
 	register_sched_domain_sysctl();
 
-	unlock_doms_cur();
+	mutex_unlock(&sched_domains_mutex);
 }
 
 #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
@@ -7813,8 +7801,10 @@ int arch_reinit_sched_domains(void)
 	int err;
 
 	get_online_cpus();
+	mutex_lock(&sched_domains_mutex);
 	detach_destroy_domains(&cpu_online_map);
 	err = arch_init_sched_domains(&cpu_online_map);
+	mutex_unlock(&sched_domains_mutex);
 	put_online_cpus();
 
 	return err;


* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-28  8:49             ` Heiko Carstens
@ 2008-04-28  8:57               ` Andrew Morton
  2008-04-28  9:17                 ` Heiko Carstens
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2008-04-28  8:57 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: Ingo Molnar, Gautham R Shenoy, Paul Jackson, linux-kernel

On Mon, 28 Apr 2008 10:49:04 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> On Mon, Apr 28, 2008 at 10:32:22AM +0200, Ingo Molnar wrote:
> > 
> > * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > 
> > >  /* doms_cur_mutex serializes access to doms_cur[] array */
> > >  static DEFINE_MUTEX(doms_cur_mutex);
> > >  
> > > +static inline void lock_doms_cur(void)
> > > +{
> > > +	mutex_lock(&doms_cur_mutex);
> > > +}
> > 
> > > @@ -7813,8 +7811,10 @@ int arch_reinit_sched_domains(void)
> > >  	int err;
> > >  
> > >  	get_online_cpus();
> > > +	lock_doms_cur();
> > 
> > thanks, that looks a lot more clean already. May i ask for another 
> > thing, if you are hacking on this anyway? Please get rid of the 
> > lock_doms_cur() complication now that it's not conditional - an open 
> > coded mutex_lock(&sched_doms_mutex) looks more readable - it gives a 
> > clear idea about what's happening. Also, please rename sched_doms_mutex 
> > to something less tongue-twisting - such as sched_domains_mutex. Hm?
> 
> Your wish is my command:

heh, let's all boss Heiko around.

>  /* doms_cur_mutex serializes access to doms_cur[] array */
> -static DEFINE_MUTEX(doms_cur_mutex);
> +static DEFINE_MUTEX(sched_domains_mutex);

The comment refers to a no-longer-existing lock, and no longer correctly
describes the lock's usage.




* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-28  8:57               ` Andrew Morton
@ 2008-04-28  9:17                 ` Heiko Carstens
  2008-04-28  9:31                   ` Andrew Morton
  2008-04-28  9:33                   ` Heiko Carstens
  0 siblings, 2 replies; 15+ messages in thread
From: Heiko Carstens @ 2008-04-28  9:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, Gautham R Shenoy, Paul Jackson, linux-kernel

On Mon, Apr 28, 2008 at 01:57:23AM -0700, Andrew Morton wrote:
> On Mon, 28 Apr 2008 10:49:04 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> >  /* doms_cur_mutex serializes access to doms_cur[] array */
> > -static DEFINE_MUTEX(doms_cur_mutex);
> > +static DEFINE_MUTEX(sched_domains_mutex);
> 
> The comment refers to a no-longer-existing lock, and no longer correctly
> describes the lock's usage.

version 42. Please feel free to change the comment if you think it could
be better :)

Subject: [PATCH] sched: fix locking in arch_reinit_sched_domains

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Concurrent calls to detach_destroy_domains and arch_init_sched_domains
were prevented by the old scheduler subsystem cpu hotplug mutex. When
this got converted to get_online_cpus(), the locking got broken.
Unlike before, several processes can now concurrently enter the critical
sections that were protected by the old lock.

So use the already present doms_cur_mutex, renamed to sched_domains_mutex,
to protect these sections again.

Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/sched.c |   24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -308,8 +308,10 @@ static DEFINE_PER_CPU(struct rt_rq, init
  */
 static DEFINE_SPINLOCK(task_group_lock);
 
-/* doms_cur_mutex serializes access to doms_cur[] array */
-static DEFINE_MUTEX(doms_cur_mutex);
+/* sched_domains_mutex serializes calls to arch_init_sched_domains,
+ * detach_destroy_domains and partition_sched_domains.
+ */
+static DEFINE_MUTEX(sched_domains_mutex);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 #ifdef CONFIG_USER_SCHED
@@ -358,21 +360,9 @@ static inline void set_task_rq(struct ta
 #endif
 }
 
-static inline void lock_doms_cur(void)
-{
-	mutex_lock(&doms_cur_mutex);
-}
-
-static inline void unlock_doms_cur(void)
-{
-	mutex_unlock(&doms_cur_mutex);
-}
-
 #else
 
 static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { }
-static inline void lock_doms_cur(void) { }
-static inline void unlock_doms_cur(void) { }
 
 #endif	/* CONFIG_GROUP_SCHED */
 
@@ -7755,7 +7745,7 @@ void partition_sched_domains(int ndoms_n
 {
 	int i, j;
 
-	lock_doms_cur();
+	mutex_lock(&sched_domains_mutex);
 
 	/* always unregister in case we don't destroy any domains */
 	unregister_sched_domain_sysctl();
@@ -7804,7 +7794,7 @@ match2:
 
 	register_sched_domain_sysctl();
 
-	unlock_doms_cur();
+	mutex_unlock(&sched_domains_mutex);
 }
 
 #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
@@ -7813,8 +7803,10 @@ int arch_reinit_sched_domains(void)
 	int err;
 
 	get_online_cpus();
+	mutex_lock(&sched_domains_mutex);
 	detach_destroy_domains(&cpu_online_map);
 	err = arch_init_sched_domains(&cpu_online_map);
+	mutex_unlock(&sched_domains_mutex);
 	put_online_cpus();
 
 	return err;


* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-28  9:17                 ` Heiko Carstens
@ 2008-04-28  9:31                   ` Andrew Morton
  2008-04-28  9:33                   ` Heiko Carstens
  1 sibling, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2008-04-28  9:31 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: Ingo Molnar, Gautham R Shenoy, Paul Jackson, linux-kernel

On Mon, 28 Apr 2008 11:17:45 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> On Mon, Apr 28, 2008 at 01:57:23AM -0700, Andrew Morton wrote:
> > On Mon, 28 Apr 2008 10:49:04 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > >  /* doms_cur_mutex serializes access to doms_cur[] array */
> > > -static DEFINE_MUTEX(doms_cur_mutex);
> > > +static DEFINE_MUTEX(sched_domains_mutex);
> > 
> > The comment refers to a no-longer-existing lock, and no longer correctly
> > describes the lock's usage.
> 
> version 42. Please feel free to change the comment if you think it could
> be better :)

Actually, it's a pretty bad comment ;)

> +/* sched_domains_mutex serializes calls to arch_init_sched_domains,
> + * detach_destroy_domains and partition_sched_domains.
> + */

locks protect *data*, not "calls".  This matters.  Which data is actually
protected by this lock??



* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-28  9:17                 ` Heiko Carstens
  2008-04-28  9:31                   ` Andrew Morton
@ 2008-04-28  9:33                   ` Heiko Carstens
  2008-04-28 12:27                     ` Ingo Molnar
  2008-04-28 13:13                     ` Ingo Molnar
  1 sibling, 2 replies; 15+ messages in thread
From: Heiko Carstens @ 2008-04-28  9:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, Gautham R Shenoy, Paul Jackson, linux-kernel

On Mon, Apr 28, 2008 at 11:17:45AM +0200, Heiko Carstens wrote:
> On Mon, Apr 28, 2008 at 01:57:23AM -0700, Andrew Morton wrote:
> > On Mon, 28 Apr 2008 10:49:04 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > >  /* doms_cur_mutex serializes access to doms_cur[] array */
> > > -static DEFINE_MUTEX(doms_cur_mutex);
> > > +static DEFINE_MUTEX(sched_domains_mutex);
> > 
> > The comment refers to a no-longer-existing lock, and no longer correctly
> > describes the lock's usage.
> 
> version 42. Please feel free to change the comment if you think it could
> be better :)

I don't believe this... version 43 ;) I forgot to add the lock in
sched_init_smp(). Not that it would really matter, but it should be
there to avoid (even more) confusion.

Subject: [PATCH] sched: fix locking in arch_reinit_sched_domains

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Concurrent calls to detach_destroy_domains and arch_init_sched_domains
were prevented by the old scheduler subsystem cpu hotplug mutex. When
this got converted to get_online_cpus() the locking was broken.
Unlike before, several processes can now concurrently enter the critical
sections that were protected by the old lock.

So use the already present doms_cur_mutex to protect these sections again.

Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/sched.c |   26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -308,8 +308,10 @@ static DEFINE_PER_CPU(struct rt_rq, init
  */
 static DEFINE_SPINLOCK(task_group_lock);
 
-/* doms_cur_mutex serializes access to doms_cur[] array */
-static DEFINE_MUTEX(doms_cur_mutex);
+/* sched_domains_mutex serializes calls to arch_init_sched_domains,
+ * detach_destroy_domains and partition_sched_domains.
+ */
+static DEFINE_MUTEX(sched_domains_mutex);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 #ifdef CONFIG_USER_SCHED
@@ -358,21 +360,9 @@ static inline void set_task_rq(struct ta
 #endif
 }
 
-static inline void lock_doms_cur(void)
-{
-	mutex_lock(&doms_cur_mutex);
-}
-
-static inline void unlock_doms_cur(void)
-{
-	mutex_unlock(&doms_cur_mutex);
-}
-
 #else
 
 static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { }
-static inline void lock_doms_cur(void) { }
-static inline void unlock_doms_cur(void) { }
 
 #endif	/* CONFIG_GROUP_SCHED */
 
@@ -7755,7 +7745,7 @@ void partition_sched_domains(int ndoms_n
 {
 	int i, j;
 
-	lock_doms_cur();
+	mutex_lock(&sched_domains_mutex);
 
 	/* always unregister in case we don't destroy any domains */
 	unregister_sched_domain_sysctl();
@@ -7804,7 +7794,7 @@ match2:
 
 	register_sched_domain_sysctl();
 
-	unlock_doms_cur();
+	mutex_unlock(&sched_domains_mutex);
 }
 
 #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
@@ -7813,8 +7803,10 @@ int arch_reinit_sched_domains(void)
 	int err;
 
 	get_online_cpus();
+	mutex_lock(&sched_domains_mutex);
 	detach_destroy_domains(&cpu_online_map);
 	err = arch_init_sched_domains(&cpu_online_map);
+	mutex_unlock(&sched_domains_mutex);
 	put_online_cpus();
 
 	return err;
@@ -7932,10 +7924,12 @@ void __init sched_init_smp(void)
 	BUG_ON(sched_group_nodes_bycpu == NULL);
 #endif
 	get_online_cpus();
+	mutex_lock(&sched_domains_mutex);
 	arch_init_sched_domains(&cpu_online_map);
 	cpus_andnot(non_isolated_cpus, cpu_possible_map, cpu_isolated_map);
 	if (cpus_empty(non_isolated_cpus))
 		cpu_set(smp_processor_id(), non_isolated_cpus);
+	mutex_unlock(&sched_domains_mutex);
 	put_online_cpus();
 	/* XXX: Theoretical race here - CPU may be hotplugged now */
 	hotcpu_notifier(update_sched_domains, 0);


* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-28  9:33                   ` Heiko Carstens
@ 2008-04-28 12:27                     ` Ingo Molnar
  2008-04-28 13:13                     ` Ingo Molnar
  1 sibling, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2008-04-28 12:27 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Andrew Morton, Gautham R Shenoy, Paul Jackson, linux-kernel


* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> > version 42. Please feel free to change the comment if you think it 
> > could be better :)
> 
> I don't believe this... version 43 ;) I forgot to add the lock in 
> sched_init_smp(). Not that it would really matter, but it should be 
> there to avoid (even more) confusion.
> 
> Subject: [PATCH] sched: fix locking in arch_reinit_sched_domains

thanks Heiko, applied.

	Ingo


* Re: [PATCH] sched: missing locking in sched_domains code
  2008-04-28  9:33                   ` Heiko Carstens
  2008-04-28 12:27                     ` Ingo Molnar
@ 2008-04-28 13:13                     ` Ingo Molnar
  1 sibling, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2008-04-28 13:13 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Andrew Morton, Gautham R Shenoy, Paul Jackson, linux-kernel


* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> > version 42. Please feel free to change the comment if you think it could
> > be better :)
> 
> I don't believe this... version 43 ;) I forgot to add the lock in 
> sched_init_smp(). Not that it would really matter, but it should be 
> there to avoid (even more) confusion.

hey i just had to iterate it to version 44.

this bit:

> -/* doms_cur_mutex serializes access to doms_cur[] array */
> -static DEFINE_MUTEX(doms_cur_mutex);
> +/* sched_domains_mutex serializes calls to arch_init_sched_domains,
> + * detach_destroy_domains and partition_sched_domains.
> + */
> +static DEFINE_MUTEX(sched_domains_mutex);

was inside an #ifdef section ;-)

	Ingo



Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-27 21:12 [PATCH] sched: missing locking in sched_domains code Heiko Carstens
2008-04-28  1:39 ` Andrew Morton
2008-04-28  7:09   ` Heiko Carstens
2008-04-28  7:24     ` Ingo Molnar
2008-04-28  7:28     ` Andrew Morton
2008-04-28  7:52       ` Heiko Carstens
2008-04-28  8:11         ` Heiko Carstens
2008-04-28  8:32           ` Ingo Molnar
2008-04-28  8:49             ` Heiko Carstens
2008-04-28  8:57               ` Andrew Morton
2008-04-28  9:17                 ` Heiko Carstens
2008-04-28  9:31                   ` Andrew Morton
2008-04-28  9:33                   ` Heiko Carstens
2008-04-28 12:27                     ` Ingo Molnar
2008-04-28 13:13                     ` Ingo Molnar
