* [PATCH] sched: missing locking in sched_domains code
@ 2008-04-27 21:12 Heiko Carstens
2008-04-28 1:39 ` Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: Heiko Carstens @ 2008-04-27 21:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: Gautham R Shenoy, Ingo Molnar, Paul Jackson, linux-kernel
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Concurrent calls to detach_destroy_domains and arch_init_sched_domains
were prevented by the old scheduler subsystem cpu hotplug mutex. When
this got converted to get_online_cpus(), the locking got broken.
Unlike before, several processes can now concurrently enter the critical
sections that were protected by the old lock.
So add a new sched_domains_mutex which protects these sections again.
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
include/linux/sched.h | 2 ++
kernel/cpuset.c | 2 ++
kernel/sched.c | 11 +++++++++++
3 files changed, 15 insertions(+)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -7807,14 +7807,23 @@ match2:
unlock_doms_cur();
}
+/*
+ * Protects against concurrent calls to detach_destroy_domains
+ * and arch_init_sched_domains.
+ */
+DEFINE_MUTEX(sched_domains_mutex);
+
#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
int arch_reinit_sched_domains(void)
{
+ static DEFINE_MUTEX(arch_reinit_sched_domains_mutex);
int err;
get_online_cpus();
+ mutex_lock(&sched_domains_mutex);
detach_destroy_domains(&cpu_online_map);
err = arch_init_sched_domains(&cpu_online_map);
+ mutex_unlock(&sched_domains_mutex);
put_online_cpus();
return err;
@@ -7932,10 +7941,12 @@ void __init sched_init_smp(void)
BUG_ON(sched_group_nodes_bycpu == NULL);
#endif
get_online_cpus();
+ mutex_lock(&sched_domains_mutex);
arch_init_sched_domains(&cpu_online_map);
cpus_andnot(non_isolated_cpus, cpu_possible_map, cpu_isolated_map);
if (cpus_empty(non_isolated_cpus))
cpu_set(smp_processor_id(), non_isolated_cpus);
+ mutex_unlock(&sched_domains_mutex);
put_online_cpus();
/* XXX: Theoretical race here - CPU may be hotplugged now */
hotcpu_notifier(update_sched_domains, 0);
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -809,6 +809,8 @@ struct sched_domain {
#endif
};
+extern struct mutex sched_domains_mutex;
+
extern void partition_sched_domains(int ndoms_new, cpumask_t *doms_new,
struct sched_domain_attr *dattr_new);
extern int arch_reinit_sched_domains(void);
Index: linux-2.6/kernel/cpuset.c
===================================================================
--- linux-2.6.orig/kernel/cpuset.c
+++ linux-2.6/kernel/cpuset.c
@@ -684,7 +684,9 @@ restart:
rebuild:
/* Have scheduler rebuild sched domains */
get_online_cpus();
+ mutex_lock(&sched_domains_mutex);
partition_sched_domains(ndoms, doms, dattr);
+ mutex_unlock(&sched_domains_mutex);
put_online_cpus();
done:
^ permalink raw reply [flat|nested] 15+ messages in thread
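[Editor's sketch] The race the patch closes is easier to see in a standalone userspace analogy (everything below is illustrative stand-in code, not kernel code): get_online_cpus() corresponds to the hotplug read lock and does nothing to serialize two rebuilders against each other, so two tasks could interleave the destroy/init steps; the new mutex restores mutual exclusion.

```c
#include <pthread.h>
#include <stdlib.h>

/* Userspace analogy (all identifiers hypothetical): "domains" stands
 * in for the global sched domain state that is torn down and rebuilt. */
static int *domains;
static pthread_mutex_t sched_domains_mutex = PTHREAD_MUTEX_INITIALIZER;

static void detach_destroy_domains(void)
{
	free(domains);
	domains = NULL;
}

static int arch_init_sched_domains(void)
{
	domains = malloc(sizeof(*domains));
	if (!domains)
		return -1;
	*domains = 42;
	return 0;
}

/* Analogue of arch_reinit_sched_domains(): without the mutex, a
 * concurrent caller could free "domains" between our destroy and
 * init steps, leaking or double-freeing it. */
static void *reinit_loop(void *unused)
{
	(void)unused;
	for (int i = 0; i < 1000; i++) {
		pthread_mutex_lock(&sched_domains_mutex);
		detach_destroy_domains();
		if (arch_init_sched_domains())
			abort();
		pthread_mutex_unlock(&sched_domains_mutex);
	}
	return NULL;
}

/* Run two rebuilders concurrently; returns 0 if the state survived. */
static int run_race(void)
{
	pthread_t a, b;

	if (arch_init_sched_domains())
		return -1;
	if (pthread_create(&a, NULL, reinit_loop, NULL) ||
	    pthread_create(&b, NULL, reinit_loop, NULL))
		return -1;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return (domains && *domains == 42) ? 0 : -1;
}
```

With the two mutex_lock/mutex_unlock lines removed, the same harness crashes or leaks almost immediately, which is exactly the exposure the patch describes.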
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-27 21:12 [PATCH] sched: missing locking in sched_domains code Heiko Carstens
@ 2008-04-28 1:39 ` Andrew Morton
2008-04-28 7:09 ` Heiko Carstens
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2008-04-28 1:39 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Gautham R Shenoy, Ingo Molnar, Paul Jackson, linux-kernel
On Sun, 27 Apr 2008 23:12:24 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
>
> Concurrent calls to detach_destroy_domains and arch_init_sched_domains
> were prevented by the old scheduler subsystem cpu hotplug mutex. When
> this got converted to get_online_cpus(), the locking got broken.
> Unlike before, several processes can now concurrently enter the critical
> sections that were protected by the old lock.
>
> So add a new sched_domains_mutex which protects these sections again.
>
> Cc: Gautham R Shenoy <ego@in.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Paul Jackson <pj@sgi.com>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> ---
> include/linux/sched.h | 2 ++
> kernel/cpuset.c | 2 ++
> kernel/sched.c | 11 +++++++++++
> 3 files changed, 15 insertions(+)
>
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -7807,14 +7807,23 @@ match2:
> unlock_doms_cur();
> }
>
> +/*
> + * Protects against concurrent calls to detach_destroy_domains
> + * and arch_init_sched_domains.
> + */
> +DEFINE_MUTEX(sched_domains_mutex);
> +
> #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
> int arch_reinit_sched_domains(void)
> {
> + static DEFINE_MUTEX(arch_reinit_sched_domains_mutex);
leftover hunk.
> int err;
>
> get_online_cpus();
> + mutex_lock(&sched_domains_mutex);
> detach_destroy_domains(&cpu_online_map);
> err = arch_init_sched_domains(&cpu_online_map);
> + mutex_unlock(&sched_domains_mutex);
> put_online_cpus();
>
> return err;
> @@ -7932,10 +7941,12 @@ void __init sched_init_smp(void)
> BUG_ON(sched_group_nodes_bycpu == NULL);
> #endif
> get_online_cpus();
> + mutex_lock(&sched_domains_mutex);
> arch_init_sched_domains(&cpu_online_map);
> cpus_andnot(non_isolated_cpus, cpu_possible_map, cpu_isolated_map);
> if (cpus_empty(non_isolated_cpus))
> cpu_set(smp_processor_id(), non_isolated_cpus);
> + mutex_unlock(&sched_domains_mutex);
> put_online_cpus();
> /* XXX: Theoretical race here - CPU may be hotplugged now */
> hotcpu_notifier(update_sched_domains, 0);
> Index: linux-2.6/include/linux/sched.h
> ===================================================================
> --- linux-2.6.orig/include/linux/sched.h
> +++ linux-2.6/include/linux/sched.h
> @@ -809,6 +809,8 @@ struct sched_domain {
> #endif
> };
>
> +extern struct mutex sched_domains_mutex;
> +
> extern void partition_sched_domains(int ndoms_new, cpumask_t *doms_new,
> struct sched_domain_attr *dattr_new);
> extern int arch_reinit_sched_domains(void);
> Index: linux-2.6/kernel/cpuset.c
> ===================================================================
> --- linux-2.6.orig/kernel/cpuset.c
> +++ linux-2.6/kernel/cpuset.c
> @@ -684,7 +684,9 @@ restart:
> rebuild:
> /* Have scheduler rebuild sched domains */
> get_online_cpus();
> + mutex_lock(&sched_domains_mutex);
> partition_sched_domains(ndoms, doms, dattr);
> + mutex_unlock(&sched_domains_mutex);
> put_online_cpus();
>
It seems a bit fragile to take this lock in the caller without even adding
a comment at the callee site which documents the new locking rule.
It would be more robust to take the lock within partition_sched_domains().
partition_sched_domains() already covers itself with lock_doms_cur(). Can
we take that in arch_reinit_sched_domains() rather than adding the new lock?
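[Editor's sketch] Andrew's suggestion of taking the lock in the callee can be sketched in userspace terms (all identifiers are illustrative stand-ins, not kernel code): if partition_sched_domains() takes the mutex itself, no caller can forget the locking rule.

```c
#include <pthread.h>

/* Callee-side locking sketch: the function that touches the shared
 * partitioning state takes the mutex itself, so every caller gets the
 * serialization for free. */
static pthread_mutex_t sched_domains_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Protected by sched_domains_mutex (toy stand-in for doms_cur[]). */
static int ndoms_cur;

void partition_sched_domains(int ndoms_new)
{
	pthread_mutex_lock(&sched_domains_mutex);
	/* ... rebuild the current domain partitioning under the lock ... */
	ndoms_cur = ndoms_new;
	pthread_mutex_unlock(&sched_domains_mutex);
}
```

This is the shape the thread converges on: the follow-up patch moves the lock/unlock pair inside partition_sched_domains() rather than requiring each caller to wrap it.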
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-28 1:39 ` Andrew Morton
@ 2008-04-28 7:09 ` Heiko Carstens
2008-04-28 7:24 ` Ingo Molnar
2008-04-28 7:28 ` Andrew Morton
0 siblings, 2 replies; 15+ messages in thread
From: Heiko Carstens @ 2008-04-28 7:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: Gautham R Shenoy, Ingo Molnar, Paul Jackson, linux-kernel
On Sun, Apr 27, 2008 at 06:39:26PM -0700, Andrew Morton wrote:
> On Sun, 27 Apr 2008 23:12:24 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > Index: linux-2.6/kernel/cpuset.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/cpuset.c
> > +++ linux-2.6/kernel/cpuset.c
> > @@ -684,7 +684,9 @@ restart:
> > rebuild:
> > /* Have scheduler rebuild sched domains */
> > get_online_cpus();
> > + mutex_lock(&sched_domains_mutex);
> > partition_sched_domains(ndoms, doms, dattr);
> > + mutex_unlock(&sched_domains_mutex);
> > put_online_cpus();
> >
>
> It seems a bit fragile to take this lock in the caller without even adding
> a comment at the callee site which documents the new locking rule.
>
> It would be more robust to take the lock within partition_sched_domains().
>
> partition_sched_domains() already covers itself with lock_doms_cur(). Can
> we take that in arch_reinit_sched_domains() rather than adding the new lock?
I think you meant taking it in partition_sched_domains? But anyway, I moved
it all over to sched.c. So here's the new patch. Shorter and doesn't export
a new lock :)
Subject: [PATCH] sched: fix sched_domains locking
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Concurrent calls to detach_destroy_domains and arch_init_sched_domains
were prevented by the old scheduler subsystem cpu hotplug mutex. When
this got converted to get_online_cpus(), the locking got broken.
Unlike before, several processes can now concurrently enter the critical
sections that were protected by the old lock.
So add a new sched_domains_mutex which protects these sections again.
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
kernel/sched.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -7730,6 +7730,12 @@ static int dattrs_equal(struct sched_dom
}
/*
+ * Protects against concurrent calls to detach_destroy_domains
+ * and arch_init_sched_domains.
+ */
+static DEFINE_MUTEX(sched_domains_mutex);
+
+/*
* Partition sched domains as specified by the 'ndoms_new'
* cpumasks in the array doms_new[] of cpumasks. This compares
* doms_new[] to the current sched domain partitioning, doms_cur[].
@@ -7756,7 +7762,8 @@ void partition_sched_domains(int ndoms_n
int i, j;
lock_doms_cur();
-
+ mutex_lock(&sched_domains_mutex);
+
/* always unregister in case we don't destroy any domains */
unregister_sched_domain_sysctl();
@@ -7804,6 +7811,7 @@ match2:
register_sched_domain_sysctl();
+ mutex_unlock(&sched_domains_mutex);
unlock_doms_cur();
}
@@ -7813,8 +7821,10 @@ int arch_reinit_sched_domains(void)
int err;
get_online_cpus();
+ mutex_lock(&sched_domains_mutex);
detach_destroy_domains(&cpu_online_map);
err = arch_init_sched_domains(&cpu_online_map);
+ mutex_unlock(&sched_domains_mutex);
put_online_cpus();
return err;
@@ -7932,10 +7942,12 @@ void __init sched_init_smp(void)
BUG_ON(sched_group_nodes_bycpu == NULL);
#endif
get_online_cpus();
+ mutex_lock(&sched_domains_mutex);
arch_init_sched_domains(&cpu_online_map);
cpus_andnot(non_isolated_cpus, cpu_possible_map, cpu_isolated_map);
if (cpus_empty(non_isolated_cpus))
cpu_set(smp_processor_id(), non_isolated_cpus);
+ mutex_unlock(&sched_domains_mutex);
put_online_cpus();
/* XXX: Theoretical race here - CPU may be hotplugged now */
hotcpu_notifier(update_sched_domains, 0);
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-28 7:09 ` Heiko Carstens
@ 2008-04-28 7:24 ` Ingo Molnar
2008-04-28 7:28 ` Andrew Morton
1 sibling, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2008-04-28 7:24 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, Gautham R Shenoy, Paul Jackson, linux-kernel,
Peter Zijlstra
* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
>
> /*
> + * Protects against concurrent calls to detach_destroy_domains
> + * and arch_init_sched_domains.
> + */
> +static DEFINE_MUTEX(sched_domains_mutex);
> +
> +/*
> * Partition sched domains as specified by the 'ndoms_new'
> * cpumasks in the array doms_new[] of cpumasks. This compares
> * doms_new[] to the current sched domain partitioning, doms_cur[].
> @@ -7756,7 +7762,8 @@ void partition_sched_domains(int ndoms_n
> int i, j;
>
> lock_doms_cur();
> -
> + mutex_lock(&sched_domains_mutex);
i might be missing something but why not make doms_cur_mutex locking
unconditional and extend it to detach_destroy_domains() as well?
Ingo
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-28 7:09 ` Heiko Carstens
2008-04-28 7:24 ` Ingo Molnar
@ 2008-04-28 7:28 ` Andrew Morton
2008-04-28 7:52 ` Heiko Carstens
1 sibling, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2008-04-28 7:28 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Gautham R Shenoy, Ingo Molnar, Paul Jackson, linux-kernel
On Mon, 28 Apr 2008 09:09:46 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> On Sun, Apr 27, 2008 at 06:39:26PM -0700, Andrew Morton wrote:
> > On Sun, 27 Apr 2008 23:12:24 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > Index: linux-2.6/kernel/cpuset.c
> > > ===================================================================
> > > --- linux-2.6.orig/kernel/cpuset.c
> > > +++ linux-2.6/kernel/cpuset.c
> > > @@ -684,7 +684,9 @@ restart:
> > > rebuild:
> > > /* Have scheduler rebuild sched domains */
> > > get_online_cpus();
> > > + mutex_lock(&sched_domains_mutex);
> > > partition_sched_domains(ndoms, doms, dattr);
> > > + mutex_unlock(&sched_domains_mutex);
> > > put_online_cpus();
> > >
> >
> > It seems a bit fragile to take this lock in the caller without even adding
> > a comment at the callee site which documents the new locking rule.
> >
> > It would be more robust to take the lock within partition_sched_domains().
> >
> > partition_sched_domains() already covers itself with lock_doms_cur(). Can
> > we take that in arch_reinit_sched_domains() rather than adding the new lock?
>
> I think you meant taking it in partition_sched_domains?
What I meant was: rather than adding the new sched_domains_mutex, can we
instead call lock_doms_cur() from arch_reinit_sched_domains() and
sched_init_smp()? Borrow the existing lock?
Whether that makes sense depends upon what lock_doms_cur() semantically
*means*. As that appears to be somewhat of a secret, we get to decide ;)
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-28 7:28 ` Andrew Morton
@ 2008-04-28 7:52 ` Heiko Carstens
2008-04-28 8:11 ` Heiko Carstens
0 siblings, 1 reply; 15+ messages in thread
From: Heiko Carstens @ 2008-04-28 7:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: Gautham R Shenoy, Ingo Molnar, Paul Jackson, linux-kernel
On Mon, Apr 28, 2008 at 12:28:53AM -0700, Andrew Morton wrote:
> On Mon, 28 Apr 2008 09:09:46 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
>
> > On Sun, Apr 27, 2008 at 06:39:26PM -0700, Andrew Morton wrote:
> > > On Sun, 27 Apr 2008 23:12:24 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > > Index: linux-2.6/kernel/cpuset.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/kernel/cpuset.c
> > > > +++ linux-2.6/kernel/cpuset.c
> > > > @@ -684,7 +684,9 @@ restart:
> > > > rebuild:
> > > > /* Have scheduler rebuild sched domains */
> > > > get_online_cpus();
> > > > + mutex_lock(&sched_domains_mutex);
> > > > partition_sched_domains(ndoms, doms, dattr);
> > > > + mutex_unlock(&sched_domains_mutex);
> > > > put_online_cpus();
> > > >
> > >
> > > It seems a bit fragile to take this lock in the caller without even adding
> > > a comment at the callee site which documents the new locking rule.
> > >
> > > It would be more robust to take the lock within partition_sched_domains().
> > >
> > > partition_sched_domains() already covers itself with lock_doms_cur(). Can
> > > we take that in arch_reinit_sched_domains() rather than adding the new lock?
> >
> > I think you meant taking it in partition_sched_domains?
>
> What I meant was: rather than adding the new sched_domains_mutex, can we
> instead call lock_doms_cur() from arch_reinit_sched_domains() and
> sched_init_smp()? Borrow the existing lock?
>
> Whether that makes sense depends upon what lock_doms_cur() semantically
> *means*. As that appears to be somewhat of a secret, we get to decide ;)
Oh, I didn't realize that lock_doms_cur() was only used in one function.
So that should work. Will try.
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-28 7:52 ` Heiko Carstens
@ 2008-04-28 8:11 ` Heiko Carstens
2008-04-28 8:32 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Heiko Carstens @ 2008-04-28 8:11 UTC (permalink / raw)
To: Andrew Morton; +Cc: Gautham R Shenoy, Ingo Molnar, Paul Jackson, linux-kernel
Subject: [PATCH] sched: fix locking in arch_reinit_sched_domains
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Concurrent calls to detach_destroy_domains and arch_init_sched_domains
were prevented by the old scheduler subsystem cpu hotplug mutex. When
this got converted to get_online_cpus(), the locking got broken.
Unlike before, several processes can now concurrently enter the critical
sections that were protected by the old lock.
So use the already present doms_cur_mutex to protect these sections again.
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
kernel/sched.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -311,6 +311,16 @@ static DEFINE_SPINLOCK(task_group_lock);
/* doms_cur_mutex serializes access to doms_cur[] array */
static DEFINE_MUTEX(doms_cur_mutex);
+static inline void lock_doms_cur(void)
+{
+ mutex_lock(&doms_cur_mutex);
+}
+
+static inline void unlock_doms_cur(void)
+{
+ mutex_unlock(&doms_cur_mutex);
+}
+
#ifdef CONFIG_FAIR_GROUP_SCHED
#ifdef CONFIG_USER_SCHED
# define INIT_TASK_GROUP_LOAD (2*NICE_0_LOAD)
@@ -358,21 +368,9 @@ static inline void set_task_rq(struct ta
#endif
}
-static inline void lock_doms_cur(void)
-{
- mutex_lock(&doms_cur_mutex);
-}
-
-static inline void unlock_doms_cur(void)
-{
- mutex_unlock(&doms_cur_mutex);
-}
-
#else
static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { }
-static inline void lock_doms_cur(void) { }
-static inline void unlock_doms_cur(void) { }
#endif /* CONFIG_GROUP_SCHED */
@@ -7813,8 +7811,10 @@ int arch_reinit_sched_domains(void)
int err;
get_online_cpus();
+ lock_doms_cur();
detach_destroy_domains(&cpu_online_map);
err = arch_init_sched_domains(&cpu_online_map);
+ unlock_doms_cur();
put_online_cpus();
return err;
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-28 8:11 ` Heiko Carstens
@ 2008-04-28 8:32 ` Ingo Molnar
2008-04-28 8:49 ` Heiko Carstens
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2008-04-28 8:32 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, Gautham R Shenoy, Paul Jackson, linux-kernel
* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> /* doms_cur_mutex serializes access to doms_cur[] array */
> static DEFINE_MUTEX(doms_cur_mutex);
>
> +static inline void lock_doms_cur(void)
> +{
> + mutex_lock(&doms_cur_mutex);
> +}
> @@ -7813,8 +7811,10 @@ int arch_reinit_sched_domains(void)
> int err;
>
> get_online_cpus();
> + lock_doms_cur();
thanks, that looks a lot more clean already. May i ask for another
thing, if you are hacking on this anyway? Please get rid of the
lock_doms_cur() complication now that it's not conditional - an open
coded mutex_lock(&sched_doms_mutex) looks more readable - it gives a
clear idea about what's happening. Also, please rename sched_doms_mutex
to something less tongue-twisting - such as sched_domains_mutex. Hm?
Ingo
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-28 8:32 ` Ingo Molnar
@ 2008-04-28 8:49 ` Heiko Carstens
2008-04-28 8:57 ` Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: Heiko Carstens @ 2008-04-28 8:49 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrew Morton, Gautham R Shenoy, Paul Jackson, linux-kernel
On Mon, Apr 28, 2008 at 10:32:22AM +0200, Ingo Molnar wrote:
>
> * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
>
> > /* doms_cur_mutex serializes access to doms_cur[] array */
> > static DEFINE_MUTEX(doms_cur_mutex);
> >
> > +static inline void lock_doms_cur(void)
> > +{
> > + mutex_lock(&doms_cur_mutex);
> > +}
>
> > @@ -7813,8 +7811,10 @@ int arch_reinit_sched_domains(void)
> > int err;
> >
> > get_online_cpus();
> > + lock_doms_cur();
>
> thanks, that looks a lot more clean already. May i ask for another
> thing, if you are hacking on this anyway? Please get rid of the
> lock_doms_cur() complication now that it's not conditional - an open
> coded mutex_lock(&sched_doms_mutex) looks more readable - it gives a
> clear idea about what's happening. Also, please rename sched_doms_mutex
> to something less tongue-twisting - such as sched_domains_mutex. Hm?
Your wish is my order:
Subject: [PATCH] sched: fix locking in arch_reinit_sched_domains
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Concurrent calls to detach_destroy_domains and arch_init_sched_domains
were prevented by the old scheduler subsystem cpu hotplug mutex. When
this got converted to get_online_cpus(), the locking got broken.
Unlike before, several processes can now concurrently enter the critical
sections that were protected by the old lock.
So use the already present doms_cur_mutex to protect these sections again.
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
kernel/sched.c | 20 +++++---------------
1 file changed, 5 insertions(+), 15 deletions(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -309,7 +309,7 @@ static DEFINE_PER_CPU(struct rt_rq, init
static DEFINE_SPINLOCK(task_group_lock);
/* doms_cur_mutex serializes access to doms_cur[] array */
-static DEFINE_MUTEX(doms_cur_mutex);
+static DEFINE_MUTEX(sched_domains_mutex);
#ifdef CONFIG_FAIR_GROUP_SCHED
#ifdef CONFIG_USER_SCHED
@@ -358,21 +358,9 @@ static inline void set_task_rq(struct ta
#endif
}
-static inline void lock_doms_cur(void)
-{
- mutex_lock(&doms_cur_mutex);
-}
-
-static inline void unlock_doms_cur(void)
-{
- mutex_unlock(&doms_cur_mutex);
-}
-
#else
static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { }
-static inline void lock_doms_cur(void) { }
-static inline void unlock_doms_cur(void) { }
#endif /* CONFIG_GROUP_SCHED */
@@ -7755,7 +7743,7 @@ void partition_sched_domains(int ndoms_n
{
int i, j;
- lock_doms_cur();
+ mutex_lock(&sched_domains_mutex);
/* always unregister in case we don't destroy any domains */
unregister_sched_domain_sysctl();
@@ -7804,7 +7792,7 @@ match2:
register_sched_domain_sysctl();
- unlock_doms_cur();
+ mutex_unlock(&sched_domains_mutex);
}
#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
@@ -7813,8 +7801,10 @@ int arch_reinit_sched_domains(void)
int err;
get_online_cpus();
+ mutex_lock(&sched_domains_mutex);
detach_destroy_domains(&cpu_online_map);
err = arch_init_sched_domains(&cpu_online_map);
+ mutex_unlock(&sched_domains_mutex);
put_online_cpus();
return err;
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-28 8:49 ` Heiko Carstens
@ 2008-04-28 8:57 ` Andrew Morton
2008-04-28 9:17 ` Heiko Carstens
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2008-04-28 8:57 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Ingo Molnar, Gautham R Shenoy, Paul Jackson, linux-kernel
On Mon, 28 Apr 2008 10:49:04 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> On Mon, Apr 28, 2008 at 10:32:22AM +0200, Ingo Molnar wrote:
> >
> > * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> >
> > > /* doms_cur_mutex serializes access to doms_cur[] array */
> > > static DEFINE_MUTEX(doms_cur_mutex);
> > >
> > > +static inline void lock_doms_cur(void)
> > > +{
> > > + mutex_lock(&doms_cur_mutex);
> > > +}
> >
> > > @@ -7813,8 +7811,10 @@ int arch_reinit_sched_domains(void)
> > > int err;
> > >
> > > get_online_cpus();
> > > + lock_doms_cur();
> >
> > thanks, that looks a lot more clean already. May i ask for another
> > thing, if you are hacking on this anyway? Please get rid of the
> > lock_doms_cur() complication now that it's not conditional - an open
> > coded mutex_lock(&sched_doms_mutex) looks more readable - it gives a
> > clear idea about what's happening. Also, please rename sched_doms_mutex
> > to something less tongue-twisting - such as sched_domains_mutex. Hm?
>
> Your wish is my order:
heh, let's all boss Heiko around.
> /* doms_cur_mutex serializes access to doms_cur[] array */
> -static DEFINE_MUTEX(doms_cur_mutex);
> +static DEFINE_MUTEX(sched_domains_mutex);
The comment refers to a no-longer-existing lock, and no longer correctly
describes the lock's usage.
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-28 8:57 ` Andrew Morton
@ 2008-04-28 9:17 ` Heiko Carstens
2008-04-28 9:31 ` Andrew Morton
2008-04-28 9:33 ` Heiko Carstens
0 siblings, 2 replies; 15+ messages in thread
From: Heiko Carstens @ 2008-04-28 9:17 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ingo Molnar, Gautham R Shenoy, Paul Jackson, linux-kernel
On Mon, Apr 28, 2008 at 01:57:23AM -0700, Andrew Morton wrote:
> On Mon, 28 Apr 2008 10:49:04 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > /* doms_cur_mutex serializes access to doms_cur[] array */
> > -static DEFINE_MUTEX(doms_cur_mutex);
> > +static DEFINE_MUTEX(sched_domains_mutex);
>
> The comment refers to a no-longer-existing lock, and no longer correctly
> describes the lock's usage.
version 42. Please feel free to change the comment if you think it could
be better :)
Subject: [PATCH] sched: fix locking in arch_reinit_sched_domains
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Concurrent calls to detach_destroy_domains and arch_init_sched_domains
were prevented by the old scheduler subsystem cpu hotplug mutex. When
this got converted to get_online_cpus(), the locking got broken.
Unlike before, several processes can now concurrently enter the critical
sections that were protected by the old lock.
So use the already present doms_cur_mutex to protect these sections again.
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
kernel/sched.c | 24 ++++++++----------------
1 file changed, 8 insertions(+), 16 deletions(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -308,8 +308,10 @@ static DEFINE_PER_CPU(struct rt_rq, init
*/
static DEFINE_SPINLOCK(task_group_lock);
-/* doms_cur_mutex serializes access to doms_cur[] array */
-static DEFINE_MUTEX(doms_cur_mutex);
+/* sched_domains_mutex serializes calls to arch_init_sched_domains,
+ * detach_destroy_domains and partition_sched_domains.
+ */
+static DEFINE_MUTEX(sched_domains_mutex);
#ifdef CONFIG_FAIR_GROUP_SCHED
#ifdef CONFIG_USER_SCHED
@@ -358,21 +360,9 @@ static inline void set_task_rq(struct ta
#endif
}
-static inline void lock_doms_cur(void)
-{
- mutex_lock(&doms_cur_mutex);
-}
-
-static inline void unlock_doms_cur(void)
-{
- mutex_unlock(&doms_cur_mutex);
-}
-
#else
static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { }
-static inline void lock_doms_cur(void) { }
-static inline void unlock_doms_cur(void) { }
#endif /* CONFIG_GROUP_SCHED */
@@ -7755,7 +7745,7 @@ void partition_sched_domains(int ndoms_n
{
int i, j;
- lock_doms_cur();
+ mutex_lock(&sched_domains_mutex);
/* always unregister in case we don't destroy any domains */
unregister_sched_domain_sysctl();
@@ -7804,7 +7794,7 @@ match2:
register_sched_domain_sysctl();
- unlock_doms_cur();
+ mutex_unlock(&sched_domains_mutex);
}
#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
@@ -7813,8 +7803,10 @@ int arch_reinit_sched_domains(void)
int err;
get_online_cpus();
+ mutex_lock(&sched_domains_mutex);
detach_destroy_domains(&cpu_online_map);
err = arch_init_sched_domains(&cpu_online_map);
+ mutex_unlock(&sched_domains_mutex);
put_online_cpus();
return err;
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-28 9:17 ` Heiko Carstens
@ 2008-04-28 9:31 ` Andrew Morton
2008-04-28 9:33 ` Heiko Carstens
1 sibling, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2008-04-28 9:31 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Ingo Molnar, Gautham R Shenoy, Paul Jackson, linux-kernel
On Mon, 28 Apr 2008 11:17:45 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> On Mon, Apr 28, 2008 at 01:57:23AM -0700, Andrew Morton wrote:
> > On Mon, 28 Apr 2008 10:49:04 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > /* doms_cur_mutex serializes access to doms_cur[] array */
> > > -static DEFINE_MUTEX(doms_cur_mutex);
> > > +static DEFINE_MUTEX(sched_domains_mutex);
> >
> > The comment refers to a no-longer-existing lock, and no longer correctly
> > describes the lock's usage.
>
> version 42. Please feel free to change the comment if you think it could
> be better :)
Actually, it's a pretty bad comment ;)
> +/* sched_domains_mutex serializes calls to arch_init_sched_domains,
> + * detach_destroy_domains and partition_sched_domains.
> + */
locks protect *data*, not "calls". This matters. Which data is actually
protected by this lock??
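[Editor's sketch] A lock comment written the way Andrew asks for names the data being guarded rather than the callers. A hedged illustration (the variable names are borrowed from the patch, the toy types and helper are mine):

```c
#include <pthread.h>

/*
 * sched_domains_mutex protects the current domain partitioning state:
 * ndoms_cur and doms_cur[] below.  Comment names *data*, not callers,
 * so the rule survives when new call sites appear.
 */
static pthread_mutex_t sched_domains_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Protected by sched_domains_mutex: */
static int ndoms_cur;		/* number of valid entries in doms_cur[] */
static int doms_cur[8];		/* current domain masks (toy stand-in) */

/* Hypothetical updater: all writes to the guarded state go through
 * the mutex. */
static void set_partition(int n)
{
	pthread_mutex_lock(&sched_domains_mutex);
	ndoms_cur = n;
	for (int i = 0; i < n && i < 8; i++)
		doms_cur[i] = 1 << i;
	pthread_mutex_unlock(&sched_domains_mutex);
}
```

With the comment tied to the fields, a reviewer can check every access of ndoms_cur/doms_cur[] for the lock, which is much harder when the comment only lists functions.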
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-28 9:17 ` Heiko Carstens
2008-04-28 9:31 ` Andrew Morton
@ 2008-04-28 9:33 ` Heiko Carstens
2008-04-28 12:27 ` Ingo Molnar
2008-04-28 13:13 ` Ingo Molnar
1 sibling, 2 replies; 15+ messages in thread
From: Heiko Carstens @ 2008-04-28 9:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ingo Molnar, Gautham R Shenoy, Paul Jackson, linux-kernel
On Mon, Apr 28, 2008 at 11:17:45AM +0200, Heiko Carstens wrote:
> On Mon, Apr 28, 2008 at 01:57:23AM -0700, Andrew Morton wrote:
> > On Mon, 28 Apr 2008 10:49:04 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > /* doms_cur_mutex serializes access to doms_cur[] array */
> > > -static DEFINE_MUTEX(doms_cur_mutex);
> > > +static DEFINE_MUTEX(sched_domains_mutex);
> >
> > The comment refers to a no-longer-existing lock, and no longer correctly
> > describes the lock's usage.
>
> version 42. Please feel free to change the comment if you think it could
> be better :)
I don't believe this... version 43 ;) I forgot to add the lock in
sched_init_smp(). Not that it would really matter, but it should be
there to avoid (even more) confusion.
Subject: [PATCH] sched: fix locking in arch_reinit_sched_domains
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Concurrent calls to detach_destroy_domains and arch_init_sched_domains
were prevented by the old scheduler subsystem cpu hotplug mutex. When
this got converted to get_online_cpus(), the locking got broken.
Unlike before, several processes can now concurrently enter the critical
sections that were protected by the old lock.
So use the already present doms_cur_mutex to protect these sections again.
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
kernel/sched.c | 26 ++++++++++----------------
1 file changed, 10 insertions(+), 16 deletions(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -308,8 +308,10 @@ static DEFINE_PER_CPU(struct rt_rq, init
*/
static DEFINE_SPINLOCK(task_group_lock);
-/* doms_cur_mutex serializes access to doms_cur[] array */
-static DEFINE_MUTEX(doms_cur_mutex);
+/* sched_domains_mutex serializes calls to arch_init_sched_domains,
+ * detach_destroy_domains and partition_sched_domains.
+ */
+static DEFINE_MUTEX(sched_domains_mutex);
#ifdef CONFIG_FAIR_GROUP_SCHED
#ifdef CONFIG_USER_SCHED
@@ -358,21 +360,9 @@ static inline void set_task_rq(struct ta
#endif
}
-static inline void lock_doms_cur(void)
-{
- mutex_lock(&doms_cur_mutex);
-}
-
-static inline void unlock_doms_cur(void)
-{
- mutex_unlock(&doms_cur_mutex);
-}
-
#else
static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { }
-static inline void lock_doms_cur(void) { }
-static inline void unlock_doms_cur(void) { }
#endif /* CONFIG_GROUP_SCHED */
@@ -7755,7 +7745,7 @@ void partition_sched_domains(int ndoms_n
{
int i, j;
- lock_doms_cur();
+ mutex_lock(&sched_domains_mutex);
/* always unregister in case we don't destroy any domains */
unregister_sched_domain_sysctl();
@@ -7804,7 +7794,7 @@ match2:
register_sched_domain_sysctl();
- unlock_doms_cur();
+ mutex_unlock(&sched_domains_mutex);
}
#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
@@ -7813,8 +7803,10 @@ int arch_reinit_sched_domains(void)
int err;
get_online_cpus();
+ mutex_lock(&sched_domains_mutex);
detach_destroy_domains(&cpu_online_map);
err = arch_init_sched_domains(&cpu_online_map);
+ mutex_unlock(&sched_domains_mutex);
put_online_cpus();
return err;
@@ -7932,10 +7924,12 @@ void __init sched_init_smp(void)
BUG_ON(sched_group_nodes_bycpu == NULL);
#endif
get_online_cpus();
+ mutex_lock(&sched_domains_mutex);
arch_init_sched_domains(&cpu_online_map);
cpus_andnot(non_isolated_cpus, cpu_possible_map, cpu_isolated_map);
if (cpus_empty(non_isolated_cpus))
cpu_set(smp_processor_id(), non_isolated_cpus);
+ mutex_unlock(&sched_domains_mutex);
put_online_cpus();
/* XXX: Theoretical race here - CPU may be hotplugged now */
hotcpu_notifier(update_sched_domains, 0);
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-28 9:33 ` Heiko Carstens
@ 2008-04-28 12:27 ` Ingo Molnar
2008-04-28 13:13 ` Ingo Molnar
1 sibling, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2008-04-28 12:27 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, Gautham R Shenoy, Paul Jackson, linux-kernel
* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > version 42. Please feel free to change the comment if you think it
> > could be better :)
>
> I don't believe this... version 43 ;) I forgot to add the lock in
> sched_init_smp(). Not that it would really matter, but it should be
> there to avoid (even more) confusion.
>
> Subject: [PATCH] sched: fix locking in arch_reinit_sched_domains
thanks Heiko, applied.
Ingo
* Re: [PATCH] sched: missing locking in sched_domains code
2008-04-28 9:33 ` Heiko Carstens
2008-04-28 12:27 ` Ingo Molnar
@ 2008-04-28 13:13 ` Ingo Molnar
1 sibling, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2008-04-28 13:13 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, Gautham R Shenoy, Paul Jackson, linux-kernel
* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > version 42. Please feel free to change the comment if you think it could
> > be better :)
>
> I don't believe this... version 43 ;) I forgot to add the lock in
> sched_init_smp(). Not that it would really matter, but it should be
> there to avoid (even more) confusion.
Hey, I just had to iterate it to version 44.
This bit:
> -/* doms_cur_mutex serializes access to doms_cur[] array */
> -static DEFINE_MUTEX(doms_cur_mutex);
> +/* sched_domains_mutex serializes calls to arch_init_sched_domains,
> + * detach_destroy_domains and partition_sched_domains.
> + */
> +static DEFINE_MUTEX(sched_domains_mutex);
was inside an #ifdef section ;-)
Ingo
Thread overview: 15+ messages
2008-04-27 21:12 [PATCH] sched: missing locking in sched_domains code Heiko Carstens
2008-04-28 1:39 ` Andrew Morton
2008-04-28 7:09 ` Heiko Carstens
2008-04-28 7:24 ` Ingo Molnar
2008-04-28 7:28 ` Andrew Morton
2008-04-28 7:52 ` Heiko Carstens
2008-04-28 8:11 ` Heiko Carstens
2008-04-28 8:32 ` Ingo Molnar
2008-04-28 8:49 ` Heiko Carstens
2008-04-28 8:57 ` Andrew Morton
2008-04-28 9:17 ` Heiko Carstens
2008-04-28 9:31 ` Andrew Morton
2008-04-28 9:33 ` Heiko Carstens
2008-04-28 12:27 ` Ingo Molnar
2008-04-28 13:13 ` Ingo Molnar