From: Tejun Heo <tj@kernel.org>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
Date: Fri, 7 Sep 2012 16:07:46 -0700
Message-ID: <20120907230746.GK9426@google.com>
In-Reply-To: <20120907230556.GJ9426@google.com>
On Fri, Sep 07, 2012 at 04:05:56PM -0700, Tejun Heo wrote:
> I got it down to the following but it creates a problem where CPU
> hotplug queues a work item on worker->scheduled before the execution
> loop starts. :(

Oops, wrong patch. This is the right one.

Index: work/kernel/workqueue.c
===================================================================
--- work.orig/kernel/workqueue.c
+++ work/kernel/workqueue.c
@@ -66,6 +66,7 @@ enum {
 
 	/* pool flags */
 	POOL_MANAGE_WORKERS = 1 << 0, /* need to manage workers */
+	POOL_MANAGING_WORKERS = 1 << 1, /* managing workers */
 
 	/* worker flags */
 	WORKER_STARTED = 1 << 0, /* started */
@@ -165,7 +166,7 @@ struct worker_pool {
 	struct timer_list idle_timer; /* L: worker idle timeout */
 	struct timer_list mayday_timer; /* L: SOS timer for workers */
 
-	struct mutex manager_mutex; /* mutex manager should hold */
+	struct mutex manager_mutex; /* manager <-> CPU hotplug */
 	struct ida worker_ida; /* L: for worker IDs */
 };
 
@@ -652,7 +653,7 @@ static bool need_to_manage_workers(struc
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
-	bool managing = mutex_is_locked(&pool->manager_mutex);
+	bool managing = pool->flags & POOL_MANAGING_WORKERS;
 	int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
 	int nr_busy = pool->nr_workers - nr_idle;
 
@@ -1820,14 +1821,35 @@ static bool maybe_destroy_workers(struct
  * some action was taken.
  */
 static bool manage_workers(struct worker *worker)
+	__releases(&gcwq->lock) __acquires(&gcwq->lock)
 {
 	struct worker_pool *pool = worker->pool;
+	struct global_cwq *gcwq = pool->gcwq;
 	bool ret = false;
 
-	if (!mutex_trylock(&pool->manager_mutex))
-		return ret;
+	if (pool->flags & POOL_MANAGING_WORKERS)
+		return ret;
 
 	pool->flags &= ~POOL_MANAGE_WORKERS;
+	pool->flags |= POOL_MANAGING_WORKERS;
+
+	/*
+	 * To simplify both worker management and CPU hotplug, hold off
+	 * management while hotplug is in progress.  CPU hotplug path can't
+	 * grab %POOL_MANAGING_WORKERS to achieve this because that can
+	 * lead to idle worker depletion (all become busy thinking someone
+	 * else is managing) which in turn can result in deadlock under
+	 * extreme circumstances.
+	 *
+	 * manager_mutex would always be free unless CPU hotplug is in
+	 * progress.  trylock first without dropping gcwq->lock.
+	 */
+	if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
+		spin_unlock_irq(&gcwq->lock);
+		mutex_lock(&pool->manager_mutex);
+		spin_lock_irq(&gcwq->lock);
+		ret = true;
+	}
 
 	/*
 	 * Destroy and then create so that may_start_working() is true
@@ -1836,6 +1858,7 @@ static bool manage_workers(struct worker
 	ret |= maybe_destroy_workers(pool);
 	ret |= maybe_create_worker(pool);
 
+	pool->flags &= ~POOL_MANAGING_WORKERS;
 	mutex_unlock(&pool->manager_mutex);
 	return ret;
 }
@@ -3393,7 +3416,7 @@ EXPORT_SYMBOL_GPL(work_busy);
  * cpu comes back online.
  */
 
-/* claim manager positions of all pools */
+/* claim manager positions of all pools, see manage_workers() for details */
 static void gcwq_claim_management_and_lock(struct global_cwq *gcwq)
 {
 	struct worker_pool *pool;
--
tejun
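
The locking scheme described in the manage_workers() comment can be
sketched in userspace C with pthreads. This is only an illustration
under stated assumptions, not the kernel code: struct pool, manage(),
hotplug_claim(), hotplug_release() and nr_passes are hypothetical names,
a pthread mutex stands in for the gcwq->lock spinlock, and a plain bool
stands in for POOL_MANAGING_WORKERS.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct worker_pool. */
struct pool {
	pthread_mutex_t lock;		/* plays the role of gcwq->lock */
	pthread_mutex_t manager_mutex;	/* manager <-> hotplug exclusion */
	bool managing;			/* plays POOL_MANAGING_WORKERS; protected by lock */
	int nr_passes;			/* completed management passes (illustration only) */
};

/*
 * Called with p->lock held and returns with it held.  Returns true if
 * p->lock was dropped and reacquired, so the caller has to recheck any
 * state it read earlier.
 */
bool manage(struct pool *p)
{
	bool ret = false;

	/* Arbitration among workers uses the flag only, never the mutex. */
	if (p->managing)
		return ret;
	p->managing = true;

	/*
	 * Fast path: the mutex is free unless hotplug holds it, so try it
	 * without dropping p->lock.  Slow path: drop p->lock, block on the
	 * mutex, retake p->lock.  The flag stays set the whole time, so
	 * this thread keeps the manager role while it waits.
	 */
	if (pthread_mutex_trylock(&p->manager_mutex) != 0) {
		pthread_mutex_unlock(&p->lock);
		pthread_mutex_lock(&p->manager_mutex);
		pthread_mutex_lock(&p->lock);
		ret = true;
	}

	assert(p->managing);	/* nobody else may have cleared our claim */
	p->nr_passes++;		/* stand-in for destroying/creating workers */

	p->managing = false;
	pthread_mutex_unlock(&p->manager_mutex);
	return ret;
}

/* Hotplug side: takes manager_mutex first, then lock; never sets the flag. */
void hotplug_claim(struct pool *p)
{
	pthread_mutex_lock(&p->manager_mutex);
	pthread_mutex_lock(&p->lock);
}

void hotplug_release(struct pool *p)
{
	pthread_mutex_unlock(&p->lock);
	pthread_mutex_unlock(&p->manager_mutex);
}
```

In this sketch the flag is only ever set by a worker, so a worker that
sees it set can rely on another worker being the manager; the hotplug
side merely delays that manager by holding manager_mutex. Both paths
block on manager_mutex only while not holding lock, which keeps the
lock order consistent.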
Thread overview: 17+ messages
2012-09-06 20:06 [PATCH wq/for-3.6-fixes 1/3] workqueue: break out gcwq->lock locking from gcwq_claim/release_management_and_[un]lock() Tejun Heo
2012-09-06 20:07 ` [PATCH wq/for-3.6-fixes 2/3] workqueue: rename rebind_workers() to gcwq_associate() and let it handle locking and DISASSOCIATED clearing Tejun Heo
2012-09-06 20:08 ` [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE Tejun Heo
2012-09-07 1:53 ` Lai Jiangshan
2012-09-07 19:25 ` Tejun Heo
2012-09-07 3:10 ` Lai Jiangshan
2012-09-07 19:29 ` Tejun Heo
2012-09-07 20:22 ` Tejun Heo
2012-09-07 20:34 ` Tejun Heo
2012-09-07 23:05 ` Tejun Heo
2012-09-07 23:07 ` Tejun Heo [this message]
2012-09-07 23:41 ` Tejun Heo
2012-09-08 17:18 ` Lai Jiangshan
2012-09-08 17:29 ` Tejun Heo
2012-09-08 17:32 ` Tejun Heo
2012-09-08 17:40 ` Lai Jiangshan
2012-09-08 17:41 ` Tejun Heo