From: Tejun Heo <tj@kernel.org>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
Date: Fri, 7 Sep 2012 16:41:58 -0700
Message-ID: <20120907234158.GL9426@google.com>
In-Reply-To: <20120907230746.GK9426@google.com>
I think this should do it. Can you spot any holes in the following
patch?
Thanks.
Index: work/kernel/workqueue.c
===================================================================
--- work.orig/kernel/workqueue.c
+++ work/kernel/workqueue.c
@@ -66,6 +66,7 @@ enum {
/* pool flags */
POOL_MANAGE_WORKERS = 1 << 0, /* need to manage workers */
+ POOL_MANAGING_WORKERS = 1 << 1, /* managing workers */
/* worker flags */
WORKER_STARTED = 1 << 0, /* started */
@@ -165,7 +166,7 @@ struct worker_pool {
struct timer_list idle_timer; /* L: worker idle timeout */
struct timer_list mayday_timer; /* L: SOS timer for workers */
- struct mutex manager_mutex; /* mutex manager should hold */
+ struct mutex manager_mutex; /* manager <-> CPU hotplug */
struct ida worker_ida; /* L: for worker IDs */
};
@@ -480,6 +481,7 @@ static atomic_t unbound_pool_nr_running[
};
static int worker_thread(void *__worker);
+static void process_scheduled_works(struct worker *worker);
static int worker_pool_pri(struct worker_pool *pool)
{
@@ -652,7 +654,7 @@ static bool need_to_manage_workers(struc
/* Do we have too many workers and should some go away? */
static bool too_many_workers(struct worker_pool *pool)
{
- bool managing = mutex_is_locked(&pool->manager_mutex);
+ bool managing = pool->flags & POOL_MANAGING_WORKERS;
int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
int nr_busy = pool->nr_workers - nr_idle;
@@ -1820,14 +1822,43 @@ static bool maybe_destroy_workers(struct
* some action was taken.
*/
static bool manage_workers(struct worker *worker)
+ __releases(&gcwq->lock) __acquires(&gcwq->lock)
{
struct worker_pool *pool = worker->pool;
+ struct global_cwq *gcwq = pool->gcwq;
bool ret = false;
- if (!mutex_trylock(&pool->manager_mutex))
- return ret;
+ if (pool->flags & POOL_MANAGING_WORKERS)
+ return ret;
pool->flags &= ~POOL_MANAGE_WORKERS;
+ pool->flags |= POOL_MANAGING_WORKERS;
+
+ /*
+ * To simplify both worker management and CPU hotplug, hold off
+ * management while hotplug is in progress. CPU hotplug path can't
+ * grab %POOL_MANAGING_WORKERS to achieve this because that can
+ * lead to idle worker depletion (all become busy thinking someone
+ * else is managing) which in turn can result in deadlock under
+ * extreme circumstances.
+ *
+ * manager_mutex would always be free unless CPU hotplug is in
+ * progress. trylock first without dropping gcwq->lock.
+ */
+ if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
+ spin_unlock_irq(&gcwq->lock);
+ mutex_lock(&pool->manager_mutex);
+ spin_lock_irq(&gcwq->lock);
+
+ /*
+ * CPU hotplug could have scheduled rebind_work while we're
+ * waiting for manager_mutex. Rebind before doing anything
+ * else. This has to be handled here. worker_thread()
+ * will be confused by the unexpected work item.
+ */
+ process_scheduled_works(worker);
+ ret = true;
+ }
/*
* Destroy and then create so that may_start_working() is true
@@ -1836,7 +1867,9 @@ static bool manage_workers(struct worker
ret |= maybe_destroy_workers(pool);
ret |= maybe_create_worker(pool);
+ pool->flags &= ~POOL_MANAGING_WORKERS;
mutex_unlock(&pool->manager_mutex);
+
return ret;
}
@@ -3393,7 +3426,7 @@ EXPORT_SYMBOL_GPL(work_busy);
* cpu comes back online.
*/
-/* claim manager positions of all pools */
+/* claim manager positions of all pools, see manage_workers() for details */
static void gcwq_claim_management_and_lock(struct global_cwq *gcwq)
{
struct worker_pool *pool;
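
The manager handoff in manage_workers() above can be modeled in user
space. What follows is a minimal sketch, not kernel code: it assumes
pthread mutexes can stand in for gcwq->lock and manager_mutex, and
struct pool, pool_init(), nr_passes and this simplified
manage_workers() are illustrative names. It shows the flag being
claimed under the lock, the trylock fast path, and the slow path that
drops the lock before sleeping on the mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define POOL_MANAGING_WORKERS (1u << 1)	/* someone holds the manager role */

/* Hypothetical user-space stand-in for struct worker_pool. */
struct pool {
	pthread_mutex_t lock;		/* models gcwq->lock */
	pthread_mutex_t manager_mutex;	/* manager <-> hotplug exclusion */
	unsigned int flags;		/* protected by lock */
	int nr_passes;			/* management passes done (illustrative) */
};

void pool_init(struct pool *pool)
{
	pthread_mutex_init(&pool->lock, NULL);
	pthread_mutex_init(&pool->manager_mutex, NULL);
	pool->flags = 0;
	pool->nr_passes = 0;
}

/*
 * Called with pool->lock held, returns with it held.
 * Returns true if pool->lock was dropped and reacquired, in which case
 * the caller has to recheck its state.
 */
bool manage_workers(struct pool *pool)
{
	bool ret = false;

	/* Another worker is already the manager; just go back to work. */
	if (pool->flags & POOL_MANAGING_WORKERS)
		return ret;

	/* Claim the manager role under pool->lock. */
	pool->flags |= POOL_MANAGING_WORKERS;

	/*
	 * Fast path: the mutex is free unless the hotplug side holds it.
	 * Slow path: drop pool->lock before sleeping on the mutex. The
	 * flag stays set, so other workers still see a manager.
	 */
	if (pthread_mutex_trylock(&pool->manager_mutex) != 0) {
		pthread_mutex_unlock(&pool->lock);
		pthread_mutex_lock(&pool->manager_mutex);
		pthread_mutex_lock(&pool->lock);
		ret = true;
	}

	/* Nobody else may clear the flag while we own the role. */
	assert(pool->flags & POOL_MANAGING_WORKERS);

	/* Worker creation and destruction would happen here. */
	pool->nr_passes++;

	pool->flags &= ~POOL_MANAGING_WORKERS;
	pthread_mutex_unlock(&pool->manager_mutex);
	return ret;
}
```

In this model a hotplug thread would take only manager_mutex and never
set the flag, which mirrors the reasoning in the patch comment: a
would-be manager that finds the mutex held waits for it instead of
returning to work under the belief that someone else is managing.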