From: Ingo Molnar <mingo@elte.hu>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	menage@google.com, miaox@cn.fujitsu.com, maxk@qualcomm.com,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH 2/3] cgroup: introduce cgroup_queue_deferred_work()
Date: Sun, 18 Jan 2009 10:04:26 +0100
Message-ID: <20090118090426.GA27144@elte.hu>
In-Reply-To: <4972E30A.6080107@cn.fujitsu.com>


* Lai Jiangshan <laijs@cn.fujitsu.com> wrote:

> Sometimes we need to acquire a lock to prevent something,
> but this lock cannot nest inside cgroup_lock. So this work
> should be moved out of cgroup_lock's critical region.
> 
> Using schedule_work() can move this work out of cgroup_lock's
> critical region. But it is overkill to move the work to
> another process. And if we need flush_work() with cgroup_lock
> held, schedule_work() cannot be used, because flush_work() will
> cause a deadlock.
> 
> Another solution is to defer the work and process it after
> cgroup_lock is released. This patch introduces
> cgroup_queue_deferred_work() for queueing a cgroup_deferred_work.
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Cc: Max Krasnyansky <maxk@qualcomm.com>
> Cc: Miao Xie <miaox@cn.fujitsu.com>
> ---
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index e267e62..6d3e6dc 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -437,6 +437,19 @@ void cgroup_iter_end(struct cgroup *cgrp, struct cgroup_iter *it);
>  int cgroup_scan_tasks(struct cgroup_scanner *scan);
>  int cgroup_attach_task(struct cgroup *, struct task_struct *);
>  
> +struct cgroup_deferred_work {
> +	struct list_head list;
> +	void (*func)(struct cgroup_deferred_work *);
> +};
> +
> +#define CGROUP_DEFERRED_WORK(name, function)		\
> +	struct cgroup_deferred_work name = {		\
> +		.list = LIST_HEAD_INIT((name).list),	\
> +		.func = (function),			\
> +	};
> +
> +int cgroup_queue_deferred_work(struct cgroup_deferred_work *deferred_work);
> +
>  #else /* !CONFIG_CGROUPS */
>  
>  static inline int cgroup_init_early(void) { return 0; }
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index c298310..75a352b 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -540,6 +540,7 @@ void cgroup_lock(void)
>  	mutex_lock(&cgroup_mutex);
>  }
>  
> +static void cgroup_flush_deferred_work_locked(void);
>  /**
>   * cgroup_unlock - release lock on cgroup changes
>   *
> @@ -547,9 +548,80 @@ void cgroup_lock(void)
>   */
>  void cgroup_unlock(void)
>  {
> +	cgroup_flush_deferred_work_locked();
>  	mutex_unlock(&cgroup_mutex);

So in cgroup_unlock() [which is called all over the place] we first call 
cgroup_flush_deferred_work_locked(), then drop the cgroup_mutex. Then:

>  }
>  
> +/* deferred_work_list is protected by cgroup_mutex */
> +static LIST_HEAD(deferred_work_list);
> +
> +/* flush deferred works with cgroup_lock released */
> +static void cgroup_flush_deferred_work_locked(void)
> +{
> +	static bool running_deferred_work;
> +
> +	if (likely(list_empty(&deferred_work_list)))
> +		return;

we check whether there's any deferred work queued, then:

> +
> +	/*
> +	 * Ensure it's not recursive and also
> +	 * ensure deferred works are run orderly.
> +	 */
> +	if (running_deferred_work)
> +		return;
> +	running_deferred_work = true;

we set a recursion flag, then:

> +
> +	for ( ; ; ) {

 [ please change this to the standard 'for (;;)' style. ]

> +		struct cgroup_deferred_work *deferred_work;
> +
> +		/* dequeue the first work, and mark it dequeued */
> +		deferred_work = list_first_entry(&deferred_work_list,
> +				struct cgroup_deferred_work, list);
> +		list_del_init(&deferred_work->list);
> +
> +		mutex_unlock(&cgroup_mutex);

we drop the cgroup_mutex and start processing deferred work, then:

> +
> +		/*
> +		 * cgroup_mutex is released. The callback function can use
> +		 * cgroup_lock()/cgroup_unlock(). This behavior is safe
> +		 * for running_deferred_work is set to 'true'.
> +		 */
> +		deferred_work->func(deferred_work);
> +
> +		/*
> +		 * regain cgroup_mutex to access deferred_work_list
> +		 * and running_deferred_work.
> +		 */
> +		mutex_lock(&cgroup_mutex);

then we re-take the mutex and:

> +
> +		if (list_empty(&deferred_work_list))
> +			break;
> +	}
> +
> +	running_deferred_work = false;

clear the recursion flag.

So this is already a high-complexity, high-overhead codepath for the 
deferred work case.
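
To make the walk-through above concrete, here is a rough userspace model of
the unlock-time flush. It is only a sketch under stated assumptions: a pthread
mutex stands in for cgroup_mutex, a singly linked queue replaces the kernel
list, and model_lock(), model_unlock(), model_queue() and count_call() are
hypothetical names, not anything from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct deferred_work {
	struct deferred_work *next;
	void (*func)(struct deferred_work *);
};

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static struct deferred_work *pending_head;	/* protected by big_lock */
static struct deferred_work **pending_tail = &pending_head;
static bool flushing;				/* protected by big_lock */
static int model_calls;				/* protected by big_lock */

static void model_lock(void)
{
	pthread_mutex_lock(&big_lock);
}

/* Caller must hold big_lock. */
static void model_queue(struct deferred_work *w)
{
	assert(w->func != NULL);
	w->next = NULL;
	*pending_tail = w;
	pending_tail = &w->next;
}

static void model_unlock(void)
{
	/* Every unlock pays for this check, even with nothing queued. */
	if (pending_head && !flushing) {
		flushing = true;	/* nested unlocks skip the loop */
		while (pending_head) {
			struct deferred_work *w = pending_head;

			pending_head = w->next;
			if (!pending_head)
				pending_tail = &pending_head;

			pthread_mutex_unlock(&big_lock);
			w->func(w);	/* runs unlocked, may lock/unlock */
			pthread_mutex_lock(&big_lock);
		}
		flushing = false;
	}
	pthread_mutex_unlock(&big_lock);
}

/* Sample callback that takes and releases the lock itself. */
static void count_call(struct deferred_work *w)
{
	(void)w;
	model_lock();
	model_calls++;
	model_unlock();
}
```

Queueing two items under model_lock() and then calling model_unlock() runs
both callbacks from the unlocking thread, which is the cost being objected to
here: the extra work hangs off every unlock path.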

Why isn't this in a workqueue? That way there's no overhead for the normal 
fastpath _at all_ - the deferred wakeup would be handled as a side-effect of 
the mutex unlock in essence. Nor would you duplicate core kernel 
infrastructure that way.
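
As a rough illustration of the workqueue shape, here is a userspace sketch
with a single worker thread. It is not the kernel workqueue API: wq_start(),
wq_schedule(), wq_stop(), wq_demo_func() and outer_lock (standing in for
cgroup_mutex) are all hypothetical names. The point it tries to show is that
queueing is safe with the outer lock held and the outer unlock path stays
untouched, while the callback simply blocks until that lock is free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct wq_item {
	struct wq_item *next;
	void (*func)(struct wq_item *);
};

static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wq_wake = PTHREAD_COND_INITIALIZER;
static struct wq_item *wq_head;			/* protected by wq_lock */
static struct wq_item **wq_tail = &wq_head;
static bool wq_stopping;			/* protected by wq_lock */
static pthread_t wq_thread;

static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static int wq_runs;				/* protected by outer_lock */

/* Worker: runs items in its own context, holding none of the caller's locks. */
static void *wq_worker(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&wq_lock);
	for (;;) {
		struct wq_item *item;

		while (!wq_head && !wq_stopping)
			pthread_cond_wait(&wq_wake, &wq_lock);
		if (!wq_head)
			break;		/* stopping and fully drained */

		item = wq_head;
		wq_head = item->next;
		if (!wq_head)
			wq_tail = &wq_head;

		pthread_mutex_unlock(&wq_lock);
		item->func(item);
		pthread_mutex_lock(&wq_lock);
	}
	pthread_mutex_unlock(&wq_lock);
	return NULL;
}

static int wq_start(void)
{
	return pthread_create(&wq_thread, NULL, wq_worker, NULL);
}

/* Only wq_lock is taken, briefly, so this may be called under outer_lock. */
static void wq_schedule(struct wq_item *item)
{
	assert(item->func != NULL);
	pthread_mutex_lock(&wq_lock);
	item->next = NULL;
	*wq_tail = item;
	wq_tail = &item->next;
	pthread_cond_signal(&wq_wake);
	pthread_mutex_unlock(&wq_lock);
}

/* Drain everything queued so far, then stop the worker. */
static void wq_stop(void)
{
	pthread_mutex_lock(&wq_lock);
	wq_stopping = true;
	pthread_cond_signal(&wq_wake);
	pthread_mutex_unlock(&wq_lock);
	pthread_join(wq_thread, NULL);
}

/* Sample callback that needs the outer lock, like the cpuset case. */
static void wq_demo_func(struct wq_item *item)
{
	(void)item;
	pthread_mutex_lock(&outer_lock);
	wq_runs++;
	pthread_mutex_unlock(&outer_lock);
}
```

Note that waiting for an item to finish (here, calling wq_stop()) while still
holding outer_lock would deadlock in this model too, which is the
flush_work() concern raised in the quoted changelog.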

Plus:

> +int cgroup_queue_deferred_work(struct cgroup_deferred_work *deferred_work)
> +{
> +	int ret = 0;
> +
> +	if (list_empty(&deferred_work->list)) {
> +		list_add_tail(&deferred_work->list, &deferred_work_list);
> +		ret = 1;
> +	}
> +
> +	return ret;

Why is the addition of work dependent on whether it's queued up already? 
Callers should know whether it's queued or not - and if they don't then 
this is hiding a code structure problem elsewhere.
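
For reference, the queue-only-if-not-queued behaviour relies on a node that
points at itself counting as "empty". A minimal userspace sketch of that
idiom follows, assuming a hand-rolled stand-in for struct list_head; the
node_*() helpers and queue_once() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void node_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* A node linked only to itself is "empty", i.e. not on any list. */
static bool node_unlinked(const struct list_node *n)
{
	return n->next == n;
}

static void node_add_tail(struct list_node *n, struct list_node *head)
{
	assert(node_unlinked(n));
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and re-initialise, so node_unlinked() is true again. */
static void node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Mirrors the quoted logic: 1 if newly queued, 0 if already pending. */
static int queue_once(struct list_node *work, struct list_node *pending)
{
	if (!node_unlinked(work))
		return 0;
	node_add_tail(work, pending);
	return 1;
}
```

A second queue_once() on the same node returns 0 until node_del_init() has
dequeued it, so a caller that does not track its own state silently loses the
second request.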

	Ingo

Thread overview: 17+ messages
2009-01-16  2:24 [PATCH] cpuset: fix possible deadlock in async_rebuild_sched_domains Miao Xie
2009-01-16  3:33 ` Lai Jiangshan
2009-01-16 20:57   ` Andrew Morton
2009-01-18  8:06     ` [PATCH 1/3] cgroup: convert open-coded mutex_lock(&cgroup_mutex) calls into cgroup_lock() calls Lai Jiangshan
2009-01-18  9:10       ` Ingo Molnar
2009-01-19  1:37         ` Paul Menage
2009-01-19  1:41           ` Ingo Molnar
2009-01-20  1:28             ` Paul Menage
2009-01-20 18:22               ` Peter Zijlstra
2009-01-20  1:18       ` Paul Menage
2009-01-18  8:06     ` [PATCH 2/3] cgroup: introduce cgroup_queue_deferred_work() Lai Jiangshan
2009-01-18  9:04       ` Ingo Molnar [this message]
2009-01-19  1:55         ` Lai Jiangshan
2009-01-20  1:26       ` Paul Menage
2009-01-18  8:06     ` [PATCH 3/3] cpuset: fix possible deadlock in async_rebuild_sched_domains Lai Jiangshan
2009-01-18  9:06       ` Ingo Molnar
2009-01-19  1:40         ` Lai Jiangshan
