From: Andrew Morton <akpm@linux-foundation.org>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: menage@google.com, miaox@cn.fujitsu.com, maxk@qualcomm.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] cpuset: fix possible deadlock in async_rebuild_sched_domains
Date: Fri, 16 Jan 2009 12:57:38 -0800 [thread overview]
Message-ID: <20090116125738.22c21bf2.akpm@linux-foundation.org> (raw)
In-Reply-To: <4970000E.7040902@cn.fujitsu.com>
On Fri, 16 Jan 2009 11:33:34 +0800
Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>
> But queuing a work to an other thread is adding some overhead for cpuset.
> And a new separate workqueue thread is wasteful, this thread is sleeping
> at most time.
>
> This is an effective fix:
>
> This patch add cgroup_queue_defer_work(). And the works will be deferring
> processed with cgroup_mutex released. And this patch just add very very
> little overhead for cgroup_unlock()'s fast path.
>
> Lai
>
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>
> Lockdep reported some possible circular locking info when we tested cpuset on
> NUMA/fake NUMA box.
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.29-rc1-00224-ga652504 #111
> -------------------------------------------------------
> bash/2968 is trying to acquire lock:
> (events){--..}, at: [<ffffffff8024c8cd>] flush_work+0x24/0xd8
>
> but task is already holding lock:
> (cgroup_mutex){--..}, at: [<ffffffff8026ad1e>] cgroup_lock_live_group+0x12/0x29
>
> which lock already depends on the new lock.
> ......
> -------------------------------------------------------
>
> Steps to reproduce:
> # mkdir /dev/cpuset
> # mount -t cpuset xxx /dev/cpuset
> # mkdir /dev/cpuset/0
> # echo 0 > /dev/cpuset/0/cpus
> # echo 0 > /dev/cpuset/0/mems
> # echo 1 > /dev/cpuset/0/memory_migrate
> # cat /dev/zero > /dev/null &
> # echo $! > /dev/cpuset/0/tasks
>
> This is because async_rebuild_sched_domains has the following lock sequence:
> run_workqueue(async_rebuild_sched_domains)
> -> do_rebuild_sched_domains -> cgroup_lock
>
> But, attaching tasks when memory_migrate is set has following:
> cgroup_lock_live_group(cgroup_tasks_write)
> -> do_migrate_pages -> flush_work
>
> This can be fixed by using a separate workqueue thread.
>
> But queuing a work to an other thread is adding some overhead for cpuset.
> And a new separate workqueue thread is wasteful, this thread is sleeping
> at most time.
>
> This patch add cgroup_queue_defer_work(). And the works will be deferring
> processed with cgroup_mutex released. And this patch just add very very
> little overhead for cgroup_unlock()'s fast path.
hm. There's little discussion for such a large-looking patch?
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -437,6 +437,19 @@ void cgroup_iter_end(struct cgroup *cgrp, struct cgroup_iter *it);
> int cgroup_scan_tasks(struct cgroup_scanner *scan);
> int cgroup_attach_task(struct cgroup *, struct task_struct *);
>
> +struct cgroup_defer_work {
> + struct list_head list;
> + void (*func)(struct cgroup_defer_work *);
> +};
> +
> +#define CGROUP_DEFER_WORK(name, function) \
> + struct cgroup_defer_work name = { \
> + .list = LIST_HEAD_INIT((name).list), \
> + .func = (function), \
> + };
> +
> +int cgroup_queue_defer_work(struct cgroup_defer_work *defer_work);
These should be called "cgroup_deferred_work" and
"queue_deferred_work", I think? Maybe not.
> +static void cgroup_flush_defer_work_locked(void);
cgroup_flush_deferred_work_locked()?
> /**
> * cgroup_unlock - release lock on cgroup changes
> *
> @@ -547,9 +548,67 @@ void cgroup_lock(void)
> */
> void cgroup_unlock(void)
> {
> + cgroup_flush_defer_work_locked();
> mutex_unlock(&cgroup_mutex);
> }
>
> +static LIST_HEAD(defer_work_list);
Please add a comment telling readers what the locking protocol is for
this list.
> +/* flush deferred works with cgroup_mutex released */
> +static void cgroup_flush_defer_work_locked(void)
> +{
> + static bool running_dely_work;
"running_delayed_work"
> + if (likely(list_empty(&defer_work_list)))
> + return;
> +
> + /*
> + * Insure it's not recursive and also
> + * insure deferred works are run orderly.
"ensure"
> + */
> + if (running_dely_work)
> + return;
> + running_dely_work = true;
> +
> + for ( ; ; ) {
> + struct cgroup_defer_work *defer_work;
> +
> + defer_work = list_first_entry(&defer_work_list,
> + struct cgroup_defer_work, list);
> + list_del_init(&defer_work->list);
> + mutex_unlock(&cgroup_mutex);
> +
> + defer_work->func(defer_work);
> +
> + mutex_lock(&cgroup_mutex);
> + if (list_empty(&defer_work_list))
> + break;
> + }
> +
> + running_dely_work = false;
> +}
> +
> +/**
> + * cgroup_queue_defer_work - queue a deferred work
> + * @defer_work: work to queue
> + *
> + * Returns 0 if @defer_work was already on the queue, non-zero otherwise.
> + *
> + * Must called when cgroup_mutex held.
"must be"
> + * The defered work will be run after cgroup_mutex released.
"deferred"
> + */
> +int cgroup_queue_defer_work(struct cgroup_defer_work *defer_work)
> +{
> + int ret = 0;
> +
> + if (list_empty(&defer_work->list)) {
> + list_add_tail(&defer_work->list, &defer_work_list);
> + ret = 1;
> + }
> +
> + return ret;
> +}
> +
> /*
> * A couple of forward declarations required, due to cyclic reference loop:
> * cgroup_mkdir -> cgroup_create -> cgroup_populate_dir ->
> @@ -616,7 +675,7 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
> * agent */
> synchronize_rcu();
>
> - mutex_lock(&cgroup_mutex);
> + cgroup_lock();
All these changes add rather a lot of noise. It would be better to do
this as two patches:
1: "convert open-coded mutex_lock(&cgroup_mutex) calls into cgroup_lock() calls"
2: "cpuset: fix possible deadlock in async_rebuild_sched_domains"
Although it doesn't matter a lot.
Also, as it now appears to be compulsory to use cgroup_lock(), you
should rename cgroup_mutex to something else, to prevent accidental
direct usage of the lock.
next prev parent reply other threads:[~2009-01-16 20:58 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-16 2:24 [PATCH] cpuset: fix possible deadlock in async_rebuild_sched_domains Miao Xie
2009-01-16 3:33 ` Lai Jiangshan
2009-01-16 20:57 ` Andrew Morton [this message]
2009-01-18 8:06 ` [PATCH 1/3] cgroup: convert open-coded mutex_lock(&cgroup_mutex) calls into cgroup_lock() calls Lai Jiangshan
2009-01-18 9:10 ` Ingo Molnar
2009-01-19 1:37 ` Paul Menage
2009-01-19 1:41 ` Ingo Molnar
2009-01-20 1:28 ` Paul Menage
2009-01-20 18:22 ` Peter Zijlstra
2009-01-20 1:18 ` Paul Menage
2009-01-18 8:06 ` [PATCH 2/3] cgroup: introduce cgroup_queue_deferred_work() Lai Jiangshan
2009-01-18 9:04 ` Ingo Molnar
2009-01-19 1:55 ` Lai Jiangshan
2009-01-20 1:26 ` Paul Menage
2009-01-18 8:06 ` [PATCH 3/3] cpuset: fix possible deadlock in async_rebuild_sched_domains Lai Jiangshan
2009-01-18 9:06 ` Ingo Molnar
2009-01-19 1:40 ` Lai Jiangshan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090116125738.22c21bf2.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maxk@qualcomm.com \
--cc=menage@google.com \
--cc=miaox@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.