public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Eric Sandeen <sandeen@sandeen.net>,
	Dave Chinner <david@fromorbit.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>
Subject: [PATCH 3.14 32/33] workqueue: fix subtle pool management issue which can stall whole worker_pool
Date: Tue,  3 Feb 2015 15:14:55 -0800	[thread overview]
Message-ID: <20150203231240.603122066@linuxfoundation.org> (raw)
In-Reply-To: <20150203231235.843316867@linuxfoundation.org>

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tejun Heo <tj@kernel.org>

commit 29187a9eeaf362d8422e62e17a22a6e115277a49 upstream.

A worker_pool's forward progress is guaranteed by the fact that the
last idle worker assumes the manager role to create more workers and
summon the rescuers if creating workers doesn't succeed in timely
manner before proceeding to execute work items.

This manager role is implemented in manage_workers(), which indicates
whether the worker may proceed to work item execution with its return
value.  This is necessary because multiple workers may contend for the
manager role, and, if there already is a manager, others should
proceed to work item execution.

Unfortunately, the function also indicates that the worker may proceed
to work item execution if need_to_create_worker() is false at the head
of the function.  need_to_create_worker() tests the following
conditions.

	pending work items && !nr_running && !nr_idle

The first and third conditions are protected by pool->lock and thus
won't change while holding pool->lock; however, nr_running can change
asynchronously as other workers block and resume and while it's likely
to be zero, as someone woke this worker up in the first place, some
other workers could have become runnable inbetween making it non-zero.

If this happens, manage_worker() could return false even with zero
nr_idle making the worker, the last idle one, proceed to execute work
items.  If then all workers of the pool end up blocking on a resource
which can only be released by a work item which is pending on that
pool, the whole pool can deadlock as there's no one to create more
workers or summon the rescuers.

This patch fixes the problem by removing the early exit condition from
maybe_create_worker() and making manage_workers() return false iff
there's already another manager, which ensures that the last worker
doesn't start executing work items.

We can leave the early exit condition alone and just ignore the return
value but the only reason it was put there is because the
manage_workers() used to perform both creations and destructions of
workers and thus the function may be invoked while the pool is trying
to reduce the number of workers.  Now that manage_workers() is called
only when more workers are needed, the only case this early exit
condition is triggered is rare race conditions rendering it pointless.

Tested with simulated workload and modified workqueue code which
trigger the pool deadlock reliably without this patch.

tj: Updated to v3.14 where manage_workers() is responsible not only
    for creating more workers but also destroying surplus ones.
    maybe_create_worker() needs to keep its early exit condition to
    avoid creating a new worker when manage_workers() is called to
    destroy surplus ones.  Other than that, the adaptabion is
    straight-forward.  Both maybe_{create|destroy}_worker() functions
    are converted to return void and manage_workers() returns %false
    iff it lost manager arbitration.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Eric Sandeen <sandeen@sandeen.net>
Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net
Cc: Dave Chinner <david@fromorbit.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 kernel/workqueue.c |   42 +++++++++++++-----------------------------
 1 file changed, 13 insertions(+), 29 deletions(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1962,17 +1962,13 @@ static void pool_mayday_timeout(unsigned
  * spin_lock_irq(pool->lock) which may be released and regrabbed
  * multiple times.  Does GFP_KERNEL allocations.  Called only from
  * manager.
- *
- * Return:
- * %false if no action was taken and pool->lock stayed locked, %true
- * otherwise.
  */
-static bool maybe_create_worker(struct worker_pool *pool)
+static void maybe_create_worker(struct worker_pool *pool)
 __releases(&pool->lock)
 __acquires(&pool->lock)
 {
 	if (!need_to_create_worker(pool))
-		return false;
+		return;
 restart:
 	spin_unlock_irq(&pool->lock);
 
@@ -1989,7 +1985,7 @@ restart:
 			start_worker(worker);
 			if (WARN_ON_ONCE(need_to_create_worker(pool)))
 				goto restart;
-			return true;
+			return;
 		}
 
 		if (!need_to_create_worker(pool))
@@ -2006,7 +2002,7 @@ restart:
 	spin_lock_irq(&pool->lock);
 	if (need_to_create_worker(pool))
 		goto restart;
-	return true;
+	return;
 }
 
 /**
@@ -2019,15 +2015,9 @@ restart:
  * LOCKING:
  * spin_lock_irq(pool->lock) which may be released and regrabbed
  * multiple times.  Called only from manager.
- *
- * Return:
- * %false if no action was taken and pool->lock stayed locked, %true
- * otherwise.
  */
-static bool maybe_destroy_workers(struct worker_pool *pool)
+static void maybe_destroy_workers(struct worker_pool *pool)
 {
-	bool ret = false;
-
 	while (too_many_workers(pool)) {
 		struct worker *worker;
 		unsigned long expires;
@@ -2041,10 +2031,7 @@ static bool maybe_destroy_workers(struct
 		}
 
 		destroy_worker(worker);
-		ret = true;
 	}
-
-	return ret;
 }
 
 /**
@@ -2064,16 +2051,14 @@ static bool maybe_destroy_workers(struct
  * multiple times.  Does GFP_KERNEL allocations.
  *
  * Return:
- * %false if the pool don't need management and the caller can safely start
- * processing works, %true indicates that the function released pool->lock
- * and reacquired it to perform some management function and that the
- * conditions that the caller verified while holding the lock before
- * calling the function might no longer be true.
+ * %false if the pool doesn't need management and the caller can safely
+ * start processing works, %true if management function was performed and
+ * the conditions that the caller verified before calling the function may
+ * no longer be true.
  */
 static bool manage_workers(struct worker *worker)
 {
 	struct worker_pool *pool = worker->pool;
-	bool ret = false;
 
 	/*
 	 * Managership is governed by two mutexes - manager_arb and
@@ -2097,7 +2082,7 @@ static bool manage_workers(struct worker
 	 * manager_mutex.
 	 */
 	if (!mutex_trylock(&pool->manager_arb))
-		return ret;
+		return false;
 
 	/*
 	 * With manager arbitration won, manager_mutex would be free in
@@ -2107,7 +2092,6 @@ static bool manage_workers(struct worker
 		spin_unlock_irq(&pool->lock);
 		mutex_lock(&pool->manager_mutex);
 		spin_lock_irq(&pool->lock);
-		ret = true;
 	}
 
 	pool->flags &= ~POOL_MANAGE_WORKERS;
@@ -2116,12 +2100,12 @@ static bool manage_workers(struct worker
 	 * Destroy and then create so that may_start_working() is true
 	 * on return.
 	 */
-	ret |= maybe_destroy_workers(pool);
-	ret |= maybe_create_worker(pool);
+	maybe_destroy_workers(pool);
+	maybe_create_worker(pool);
 
 	mutex_unlock(&pool->manager_mutex);
 	mutex_unlock(&pool->manager_arb);
-	return ret;
+	return true;
 }
 
 /**



  parent reply	other threads:[~2015-02-03 23:22 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-03 23:14 [PATCH 3.14 00/33] 3.14.32-stable review Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 01/33] x86, build: replace Perl script with Shell script Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 02/33] spi: dw-mid: fix FIFO size Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 03/33] ASoC: wm8960: Fix capture sample rate from 11250 to 11025 Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 04/33] ASoC: fsl_esai: Fix incorrect xDC field width of xCCR registers Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 05/33] ASoC: soc-compress.c: fix NULL dereference Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 06/33] ASoC: omap-mcbsp: Correct CBM_CFS dai format configuration Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 07/33] can: kvaser_usb: Do not sleep in atomic context Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 08/33] can: kvaser_usb: Send correct context to URB completion Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 09/33] can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 10/33] can: kvaser_usb: Fix state handling upon BUS_ERROR events Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 11/33] powerpc/xmon: Fix another endiannes issue in RTAS call from xmon Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 12/33] ALSA: seq-dummy: remove deadlock-causing events on close Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 13/33] rbd: drop parent_ref in rbd_dev_unprobe() unconditionally Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 14/33] i2c: s3c2410: fix ABBA deadlock by keeping clock prepared Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 15/33] Input: synaptics - adjust min/max for Lenovo ThinkPad X1 Carbon 2nd Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 16/33] Input: i8042 - add noloop quirk for Medion Akoya E7225 (MD98857) Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 17/33] nfs: fix dio deadlock when O_DIRECT flag is flipped Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 18/33] NFSv4.1: Fix an Oops in nfs41_walk_client_list Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 19/33] mac80211: properly set CCK flag in radiotap Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 20/33] nl80211: fix per-station group key get/del and memory leak Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 21/33] dm thin: dont allow messages to be sent to a pool target in READ_ONLY or FAIL mode Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 22/33] dm cache: fix missing ERR_PTR returns and handling Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 23/33] spi/pxa2xx: Clear cur_chip pointer before starting next message Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 24/33] regulator: core: fix race condition in regulator_put() Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 25/33] drivers: net: cpsw: discard dual emac default vlan configuration Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 26/33] drm/i915: Only fence tiled region of object Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 27/33] ARM: DMA: ensure that old section mappings are flushed from the TLB Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 28/33] pstore: clarify clearing of _read_cnt in ramoops_context Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 29/33] pstore: skip zero size persistent ram buffer in traverse Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 30/33] pstore: Fix NULL pointer fault if get NULL prz in ramoops_get_next_prz Greg Kroah-Hartman
2015-02-03 23:14 ` [PATCH 3.14 31/33] rbd: fix rbd_dev_parent_get() when parent_overlap == 0 Greg Kroah-Hartman
2015-02-03 23:14 ` Greg Kroah-Hartman [this message]
2015-02-03 23:14 ` [PATCH 3.14 33/33] target: Drop arbitrary maximum I/O size limit Greg Kroah-Hartman
2015-02-04 14:02 ` [PATCH 3.14 00/33] 3.14.32-stable review Guenter Roeck
2015-02-04 17:30 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150203231240.603122066@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=david@fromorbit.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sandeen@sandeen.net \
    --cc=stable@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox