public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] workqueue: fix possible bug which may silence the pool
@ 2013-03-02 15:55 Lai Jiangshan
  2013-03-04 19:20 ` Tejun Heo
  2013-03-08 23:15 ` [PATCH wq/for-3.9-fixes] workqueue: fix possible pool stall bug in wq_unbind_fn() Tejun Heo
  0 siblings, 2 replies; 7+ messages in thread
From: Lai Jiangshan @ 2013-03-02 15:55 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

After we introduce multiple pools for cpu pools, a part of the comments
in wq_unbind_fn() becomes wrong.

It said that "current worker would trigger unbound chain execution".
It is wrong. current worker only belongs to one of the multiple pools.

If wq_unbind_fn() does unbind the normal_pri pool(not the pool of the current
worker), the current worker is not the available worker to trigger unbound
chain execution of the normal_pri pool, and if all the workers of
the normal_pri goto sleep after they were set %WORKER_UNBOUND but before
they finish their current work, unbound chain execution is not triggered
totally. The pool is stopped!

We can change wq_unbind_fn() only does unbind one pool and we launch multiple
wq_unbind_fn()s, one for each pool to solve the problem.
But this change will add much latency to hotplug path unnecessarily.

So we choice to wake up a worker directly to trigger unbound chain execution.

current worker may sleep on &second_pool->assoc_mutex, so we also move
the wakeup code into the loop to avoid second_pool silences the first_pool.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   45 ++++++++++++++++++++++++++-------------------
 1 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 81f2457..03159c2 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3446,28 +3446,35 @@ static void wq_unbind_fn(struct work_struct *work)
 
 		spin_unlock_irq(&pool->lock);
 		mutex_unlock(&pool->assoc_mutex);
-	}
 
-	/*
-	 * Call schedule() so that we cross rq->lock and thus can guarantee
-	 * sched callbacks see the %WORKER_UNBOUND flag.  This is necessary
-	 * as scheduler callbacks may be invoked from other cpus.
-	 */
-	schedule();
+		/*
+		 * Call schedule() so that we cross rq->lock and thus can
+		 * guarantee sched callbacks see the %WORKER_UNBOUND flag.
+		 * This is necessary as scheduler callbacks may be invoked
+		 * from other cpus.
+		 */
+		schedule();
 
-	/*
-	 * Sched callbacks are disabled now.  Zap nr_running.  After this,
-	 * nr_running stays zero and need_more_worker() and keep_working()
-	 * are always true as long as the worklist is not empty.  Pools on
-	 * @cpu now behave as unbound (in terms of concurrency management)
-	 * pools which are served by workers tied to the CPU.
-	 *
-	 * On return from this function, the current worker would trigger
-	 * unbound chain execution of pending work items if other workers
-	 * didn't already.
-	 */
-	for_each_std_worker_pool(pool, cpu)
+		/*
+		 * Sched callbacks are disabled now.  Zap nr_running.
+		 * After this, nr_running stays zero and need_more_worker()
+		 * and keep_working() are always true as long as the worklist
+		 * is not empty.  This pool now behave as unbound (in terms of
+		 * concurrency management) pool which are served by workers
+		 * tied to the pool.
+		 */
 		atomic_set(&pool->nr_running, 0);
+
+		/* The current busy workers of this pool may goto sleep without
+		 * wake up any other worker after they were set %WORKER_UNBOUND
+		 * flag. Here we wake up another possible worker to start
+		 * the unbound chain execution of pending work items in this
+		 * case.
+		 */
+		spin_lock_irq(&pool->lock);
+		wake_up_worker(pool);
+		spin_unlock_irq(&pool->lock);
+	}
 }
 
 /*
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-03-11 16:25 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-02 15:55 [PATCH] workqueue: fix possible bug which may silence the pool Lai Jiangshan
2013-03-04 19:20 ` Tejun Heo
2013-03-05  1:17   ` Lai Jiangshan
2013-03-05 18:04     ` Tejun Heo
2013-03-08 23:15 ` [PATCH wq/for-3.9-fixes] workqueue: fix possible pool stall bug in wq_unbind_fn() Tejun Heo
2013-03-11 16:09   ` Lai Jiangshan
2013-03-11 16:24     ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox