From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1764187AbYD1BkJ (ORCPT ); Sun, 27 Apr 2008 21:40:09 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755127AbYD1Bj5 (ORCPT ); Sun, 27 Apr 2008 21:39:57 -0400 Received: from smtp1.linux-foundation.org ([140.211.169.13]:60133 "EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753767AbYD1Bj5 (ORCPT ); Sun, 27 Apr 2008 21:39:57 -0400 Date: Sun, 27 Apr 2008 18:39:26 -0700 From: Andrew Morton To: Heiko Carstens Cc: Gautham R Shenoy , Ingo Molnar , Paul Jackson , linux-kernel@vger.kernel.org Subject: Re: [PATCH] sched: missing locking in sched_domains code Message-Id: <20080427183926.acb66fff.akpm@linux-foundation.org> In-Reply-To: <20080427211224.GA21830@osiris.boeblingen.de.ibm.com> References: <20080427211224.GA21830@osiris.boeblingen.de.ibm.com> X-Mailer: Sylpheed 2.4.8 (GTK+ 2.12.5; x86_64-redhat-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, 27 Apr 2008 23:12:24 +0200 Heiko Carstens wrote: > From: Heiko Carstens > > Concurrent calls to detach_destroy_domains and arch_init_sched_domains > were prevented by the old scheduler subsystem cpu hotplug mutex. When > this got converted to get_online_cpus() the locking got broken. > Unlike before now several processes can concurrently enter the critical > sections that were protected by the old lock. > > So add a new sched_domains_mutex which protects these sections again. > > Cc: Gautham R Shenoy > Cc: Ingo Molnar > Cc: Paul Jackson > Signed-off-by: Heiko Carstens > --- > include/linux/sched.h | 2 ++ > kernel/cpuset.c | 2 ++ > kernel/sched.c | 11 +++++++++++ > 3 files changed, 15 insertions(+) > > Index: linux-2.6/kernel/sched.c > =================================================================== > --- linux-2.6.orig/kernel/sched.c > +++ linux-2.6/kernel/sched.c > @@ -7807,14 +7807,23 @@ match2: > unlock_doms_cur(); > } > > +/* > + * Protects against concurrent calls to detach_destroy_domains > + * and arch_init_sched_domains. > + */ > +DEFINE_MUTEX(sched_domains_mutex); > + > #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT) > int arch_reinit_sched_domains(void) > { > + static DEFINE_MUTEX(arch_reinit_sched_domains_mutex); leftover hunk. > int err; > > get_online_cpus(); > + mutex_lock(&sched_domains_mutex); > detach_destroy_domains(&cpu_online_map); > err = arch_init_sched_domains(&cpu_online_map); > + mutex_unlock(&sched_domains_mutex); > put_online_cpus(); > > return err; > @@ -7932,10 +7941,12 @@ void __init sched_init_smp(void) > BUG_ON(sched_group_nodes_bycpu == NULL); > #endif > get_online_cpus(); > + mutex_lock(&sched_domains_mutex); > arch_init_sched_domains(&cpu_online_map); > cpus_andnot(non_isolated_cpus, cpu_possible_map, cpu_isolated_map); > if (cpus_empty(non_isolated_cpus)) > cpu_set(smp_processor_id(), non_isolated_cpus); > + mutex_unlock(&sched_domains_mutex); > put_online_cpus(); > /* XXX: Theoretical race here - CPU may be hotplugged now */ > hotcpu_notifier(update_sched_domains, 0); > Index: linux-2.6/include/linux/sched.h > =================================================================== > --- linux-2.6.orig/include/linux/sched.h > +++ linux-2.6/include/linux/sched.h > @@ -809,6 +809,8 @@ struct sched_domain { > #endif > }; > > +extern struct mutex sched_domains_mutex; > + > extern void partition_sched_domains(int ndoms_new, cpumask_t *doms_new, > struct sched_domain_attr *dattr_new); > extern int arch_reinit_sched_domains(void); > Index: linux-2.6/kernel/cpuset.c > =================================================================== > --- linux-2.6.orig/kernel/cpuset.c > +++ linux-2.6/kernel/cpuset.c > @@ -684,7 +684,9 @@ restart: > rebuild: > /* Have scheduler rebuild sched domains */ > get_online_cpus(); > + mutex_lock(&sched_domains_mutex); > partition_sched_domains(ndoms, doms, dattr); > + mutex_unlock(&sched_domains_mutex); > put_online_cpus(); > It seems a bit fragile to take this lock in the caller without even adding a comment at the callee site which documents the new locking rule. It would be more robust to take the lock within partition_sched_domains(). partition_sched_domains() already covers itself with lock_doms_cur(). Can we take that in arch_reinit_sched_domains() rather than adding the new lock?