Re: [RFC][PATCH] CPUSets: Move most calls to rebuild_sched_domains() to the workqueue

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Gautham R Shenoy <ego@in.ibm.com>
To: Paul Menage <menage@google.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>,
	Paul Jackson <pj@sgi.com>,
	a.p.zijlstra@chello.nl, maxk@qualcomm.com,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] CPUSets: Move most calls to rebuild_sched_domains() to the workqueue
Date: Fri, 27 Jun 2008 08:52:28 +0530	[thread overview]
Message-ID: <20080627032228.GA3419@in.ibm.com> (raw)
In-Reply-To: <48634BC1.8@google.com>

On Thu, Jun 26, 2008 at 12:56:49AM -0700, Paul Menage wrote:
> CPUsets: Move most calls to rebuild_sched_domains() to the workqueue
>
> In the current cpusets code the lock nesting between cgroup_mutex and
> cpuhotplug.lock when calling rebuild_sched_domains is inconsistent -
> in the CPU hotplug path cpuhotplug.lock nests outside cgroup_mutex,
> and in all other paths that call rebuild_sched_domains() it nests
> inside.
>
> This patch makes most calls to rebuild_sched_domains() asynchronous
> via the workqueue, which removes the nesting of the two locks in that
> case. In the case of an actual hotplug event, cpuhotplug.lock nests
> outside cgroup_mutex as now.
>
> Signed-off-by: Paul Menage <menage@google.com>
>
> ---
>
> Note that all I've done with this patch is verify that it compiles
> without warnings; I'm not sure how to trigger a hotplug event to test
> the lock dependencies or verify that scheduler domain support is still
> behaving correctly. Vegard, does this fix the problems that you were
> seeing? Paul/Max, does this still seem sane with regard to scheduler domains?
>

Hi Paul, 

Using a multithreaded workqueue(kevent here) for this is not such a
great idea this,since currently we cannot call
get_online_cpus() from a workitem executed by a multithreaded workqueue.

Can one use a single threaded workqueue here instead ?

Or, better, I think we can ask Oleg to re-submit the patch he had to make
get_online_cpus() safe to be called from within the workqueue. It does
require a special post CPU_DEAD notification, but as it does work the
last time I checked.

>
> kernel/cpuset.c |   35 +++++++++++++++++++++++------------
> 1 file changed, 23 insertions(+), 12 deletions(-)
>
> Index: lockfix-2.6.26-rc5-mm3/kernel/cpuset.c
> ===================================================================
> --- lockfix-2.6.26-rc5-mm3.orig/kernel/cpuset.c
> +++ lockfix-2.6.26-rc5-mm3/kernel/cpuset.c
> @@ -522,13 +522,9 @@ update_domain_attr(struct sched_domain_a
>  * domains when operating in the severe memory shortage situations
>  * that could cause allocation failures below.
>  *
> - * Call with cgroup_mutex held.  May take callback_mutex during
> - * call due to the kfifo_alloc() and kmalloc() calls.  May nest
> - * a call to the get_online_cpus()/put_online_cpus() pair.
> - * Must not be called holding callback_mutex, because we must not
> - * call get_online_cpus() while holding callback_mutex.  Elsewhere
> - * the kernel nests callback_mutex inside get_online_cpus() calls.
> - * So the reverse nesting would risk an ABBA deadlock.
> + * Call with cgroup_mutex held, and inside get_online_cpus().  May
> + * take callback_mutex during call due to the kfifo_alloc() and
> + * kmalloc() calls.
>  *
>  * The three key local variables below are:
>  *    q  - a kfifo queue of cpuset pointers, used to implement a
> @@ -689,9 +685,7 @@ restart:
>
> rebuild:
> 	/* Have scheduler rebuild sched domains */
> -	get_online_cpus();
> 	partition_sched_domains(ndoms, doms, dattr);
> -	put_online_cpus();
>
> done:
> 	if (q && !IS_ERR(q))
> @@ -701,6 +695,21 @@ done:
> 	/* Don't kfree(dattr) -- partition_sched_domains() does that. */
> }
>
> +/*
> + * Due to the need to nest cgroup_mutex inside cpuhotplug.lock, most
> + * of our invocations of rebuild_sched_domains() are done
> + * asynchronously via the workqueue
> + */
> +static void delayed_rebuild_sched_domains(struct work_struct *work)
> +{
> +	get_online_cpus();
> +	cgroup_lock();
> +	rebuild_sched_domains();
> +	cgroup_unlock();
> +	put_online_cpus();
> +}
> +static DECLARE_WORK(rebuild_sched_domains_work, delayed_rebuild_sched_domains);
> +
> static inline int started_after_time(struct task_struct *t1,
> 				     struct timespec *time,
> 				     struct task_struct *t2)
> @@ -853,7 +862,7 @@ static int update_cpumask(struct cpuset 		return 
> retval;
>
> 	if (is_load_balanced)
> -		rebuild_sched_domains();
> +		schedule_work(&rebuild_sched_domains_work);
> 	return 0;
> }
>
> @@ -1080,7 +1089,7 @@ static int update_relax_domain_level(str
>
> 	if (val != cs->relax_domain_level) {
> 		cs->relax_domain_level = val;
> -		rebuild_sched_domains();
> +		schedule_work(&rebuild_sched_domains_work);
> 	}
>
> 	return 0;
> @@ -1121,7 +1130,7 @@ static int update_flag(cpuset_flagbits_t
> 	mutex_unlock(&callback_mutex);
>
> 	if (cpus_nonempty && balance_flag_changed)
> -		rebuild_sched_domains();
> +		schedule_work(&rebuild_sched_domains_work);
>
> 	return 0;
> }
> @@ -1929,6 +1938,7 @@ static void scan_for_empty_cpusets(const
>
> static void common_cpu_mem_hotplug_unplug(void)
> {
> +	get_online_cpus();
> 	cgroup_lock();
>
> 	top_cpuset.cpus_allowed = cpu_online_map;
> @@ -1942,6 +1952,7 @@ static void common_cpu_mem_hotplug_unplu
> 	rebuild_sched_domains();
>
> 	cgroup_unlock();
> +	put_online_cpus();
> }
>
> /*
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

-- 
Thanks and Regards
gautham

next prev parent reply	other threads:[~2008-06-27  3:21 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-26  7:56 [RFC][PATCH] CPUSets: Move most calls to rebuild_sched_domains() to the workqueue Paul Menage
2008-06-26  9:34 ` Vegard Nossum
2008-06-26  9:50   ` Paul Menage
2008-06-26 18:49     ` Max Krasnyansky
2008-06-26 19:19       ` Peter Zijlstra
2008-06-26 20:34       ` Paul Menage
2008-06-26 21:17         ` Paul Menage
2008-06-27  5:10           ` Max Krasnyansky
2008-06-27  5:51             ` Paul Menage
2008-06-27 17:31               ` Max Krasnyansky
2008-06-27  3:22 ` Gautham R Shenoy [this message]
2008-06-27  3:23   ` Gautham R Shenoy
2008-06-27  4:53     ` Max Krasnyansky
2008-06-27 16:42     ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080627032228.GA3419@in.ibm.com \
    --to=ego@in.ibm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maxk@qualcomm.com \
    --cc=menage@google.com \
    --cc=pj@sgi.com \
    --cc=vegard.nossum@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.