* [PATCH] workqueue: ensure @task is valid across kthread_stop()
@ 2014-02-15 14:02 Lai Jiangshan
[not found] ` <1392654243-2829-1-git-send-email-laijs@cn.fujitsu.com>
2014-02-18 21:37 ` [PATCH wq/for-3.14-fixes] workqueue: ensure @task is valid across kthread_stop() Tejun Heo
0 siblings, 2 replies; 10+ messages in thread
From: Lai Jiangshan @ 2014-02-15 14:02 UTC (permalink / raw)
To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan, stable
When a worker has WORKER_DIE set, it may die very quickly,
and kthread_stop() may then access a stale task struct/stack.
To avoid this, use get_task_struct() to ensure @task stays valid.
See the comments in kthread_create_on_node() and kthread_stop() for details.
Note: the comments in kthread_create_on_node() do not describe
a use case like this patch's, but this is a valid way to use
kthread_stop().
CC: stable@vger.kernel.org
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
kernel/workqueue.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 82ef9f3..783d5f2 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1856,9 +1856,12 @@ static void destroy_worker(struct worker *worker)
idr_remove(&pool->worker_idr, worker->id);
+ /* Ensure the @worker->task is valid across kthread_stop() */
+ get_task_struct(worker->task);
spin_unlock_irq(&pool->lock);
kthread_stop(worker->task);
+ put_task_struct(worker->task);
kfree(worker);
spin_lock_irq(&pool->lock);
--
1.7.7.6
* [PATCH 1/3] workqueue: free worker earlier in worker_thread()
[not found] ` <1392654243-2829-1-git-send-email-laijs@cn.fujitsu.com>
@ 2014-02-17 16:24 ` Lai Jiangshan
2014-02-17 16:24 ` [PATCH 2/3] workqueue: async worker destruction Lai Jiangshan
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Lai Jiangshan @ 2014-02-17 16:24 UTC (permalink / raw)
To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel
When @worker has %WORKER_DIE set, it has been removed from the
idle_list and the idr, so no one can access it except via kthread_data().
And once worker_thread() clears %PF_WQ_WORKER from its task,
no one can access @worker via kthread_data() either,
so we can safely free it.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
kernel/workqueue.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 783d5f2..fc05700 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1862,7 +1862,6 @@ static void destroy_worker(struct worker *worker)
kthread_stop(worker->task);
put_task_struct(worker->task);
- kfree(worker);
spin_lock_irq(&pool->lock);
}
@@ -2297,6 +2296,8 @@ woke_up:
spin_unlock_irq(&pool->lock);
WARN_ON_ONCE(!list_empty(&worker->entry));
worker->task->flags &= ~PF_WQ_WORKER;
+ /* No one can access @worker now, free it. */
+ kfree(worker);
return 0;
}
--
1.7.7.6
* [PATCH 2/3] workqueue: async worker destruction
[not found] ` <1392654243-2829-1-git-send-email-laijs@cn.fujitsu.com>
2014-02-17 16:24 ` [PATCH 1/3] workqueue: free worker earlier in worker_thread() Lai Jiangshan
@ 2014-02-17 16:24 ` Lai Jiangshan
2014-02-18 22:29 ` Tejun Heo
2014-02-17 16:24 ` [PATCH 3/3] workqueue: kick worker to die directly in idle timeout handler Lai Jiangshan
2014-02-18 1:39 ` [PATCH 0/3] workqueue: async worker destruction Lai Jiangshan
3 siblings, 1 reply; 10+ messages in thread
From: Lai Jiangshan @ 2014-02-17 16:24 UTC (permalink / raw)
To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel
This patchset moves part of worker destruction into worker_thread(),
so a worker told to die performs self-destruction.
This async worker destruction gives us room to reduce the manager's invocations
and to simplify the idle-worker-timeout handler later.
put_unbound_pool() still needs to synchronize with the destructions to ensure
every worker accesses a valid pool while performing self-destruction,
so this patch adds special sync code to put_unbound_pool().
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
kernel/workqueue.c | 47 ++++++++++++++++++++++++++++++++++-------------
1 files changed, 34 insertions(+), 13 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fc05700..6634326 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1853,17 +1853,7 @@ static void destroy_worker(struct worker *worker)
list_del_init(&worker->entry);
worker->flags |= WORKER_DIE;
-
- idr_remove(&pool->worker_idr, worker->id);
-
- /* Ensure the @worker->task is valid across kthread_stop() */
- get_task_struct(worker->task);
- spin_unlock_irq(&pool->lock);
-
- kthread_stop(worker->task);
- put_task_struct(worker->task);
-
- spin_lock_irq(&pool->lock);
+ wake_up_process(worker->task);
}
static void idle_worker_timeout(unsigned long __pool)
@@ -2295,9 +2285,16 @@ woke_up:
if (unlikely(worker->flags & WORKER_DIE)) {
spin_unlock_irq(&pool->lock);
WARN_ON_ONCE(!list_empty(&worker->entry));
+
+ /* perform worker self-destruction */
+ mutex_lock(&pool->manager_mutex);
+ spin_lock_irq(&pool->lock);
+ idr_remove(&pool->worker_idr, worker->id);
worker->task->flags &= ~PF_WQ_WORKER;
/* No one can access @worker now, free it. */
kfree(worker);
+ spin_unlock_irq(&pool->lock);
+ mutex_unlock(&pool->manager_mutex);
return 0;
}
@@ -3553,6 +3550,7 @@ static void rcu_free_pool(struct rcu_head *rcu)
static void put_unbound_pool(struct worker_pool *pool)
{
struct worker *worker;
+ int wi;
lockdep_assert_held(&wq_pool_mutex);
@@ -3576,13 +3574,36 @@ static void put_unbound_pool(struct worker_pool *pool)
*/
mutex_lock(&pool->manager_arb);
mutex_lock(&pool->manager_mutex);
- spin_lock_irq(&pool->lock);
+ spin_lock_irq(&pool->lock);
while ((worker = first_worker(pool)))
destroy_worker(worker);
WARN_ON(pool->nr_workers || pool->nr_idle);
-
spin_unlock_irq(&pool->lock);
+
+ /* wait until all workers have died */
+ for_each_pool_worker(worker, wi, pool) {
+ /*
+ * Although @worker->task was kicked to die, it cannot exit
+ * while we hold ->manager_mutex, so take a reference to the
+ * task before dropping ->manager_mutex, and then wait until
+ * it has died.
+ */
+ get_task_struct(worker->task);
+
+ /*
+ * First, for_each_pool_worker() iterates by ID (@wi), so it is
+ * safe even though both ->manager_mutex and ->lock are dropped
+ * inside the loop.
+ * Second, no worker can be added now, so the loop is guaranteed
+ * to visit every remaining worker and wait for each to die.
+ */
+ mutex_unlock(&pool->manager_mutex);
+
+ kthread_stop(worker->task);
+ put_task_struct(worker->task);
+ mutex_lock(&pool->manager_mutex);
+ }
+
mutex_unlock(&pool->manager_mutex);
mutex_unlock(&pool->manager_arb);
--
1.7.7.6
* [PATCH 3/3] workqueue: kick worker to die directly in idle timeout handler
[not found] ` <1392654243-2829-1-git-send-email-laijs@cn.fujitsu.com>
2014-02-17 16:24 ` [PATCH 1/3] workqueue: free worker earlier in worker_thread() Lai Jiangshan
2014-02-17 16:24 ` [PATCH 2/3] workqueue: async worker destruction Lai Jiangshan
@ 2014-02-17 16:24 ` Lai Jiangshan
2014-02-18 22:31 ` Tejun Heo
2014-02-18 1:39 ` [PATCH 0/3] workqueue: async worker destruction Lai Jiangshan
3 siblings, 1 reply; 10+ messages in thread
From: Lai Jiangshan @ 2014-02-17 16:24 UTC (permalink / raw)
To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel
Since the original destroy_worker() no longer does the full destruction,
and its main work is just to strip the worker thread of its worker duty
and kick it to die, rename it to kick_worker_to_die().
And since kick_worker_to_die() needs neither manager_mutex nor sleeping,
move maybe_destroy_worker() out of the manager and kick workers to die
directly in the idle timeout handler. We also remove %POOL_MANAGE_WORKERS,
which lets us remove a branch in worker_thread().
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
kernel/workqueue.c | 85 ++++++++--------------------------------------------
1 files changed, 13 insertions(+), 72 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6634326..1a672c5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -68,7 +68,6 @@ enum {
* manager_mutex to avoid changing binding state while
* create_worker() is in progress.
*/
- POOL_MANAGE_WORKERS = 1 << 0, /* need to manage workers */
POOL_DISASSOCIATED = 1 << 2, /* cpu can't serve workers */
POOL_FREEZING = 1 << 3, /* freeze in progress */
@@ -756,13 +755,6 @@ static bool need_to_create_worker(struct worker_pool *pool)
return need_more_worker(pool) && !may_start_working(pool);
}
-/* Do I need to be the manager? */
-static bool need_to_manage_workers(struct worker_pool *pool)
-{
- return need_to_create_worker(pool) ||
- (pool->flags & POOL_MANAGE_WORKERS);
-}
-
/* Do we have too many workers and should some go away? */
static bool too_many_workers(struct worker_pool *pool)
{
@@ -1698,7 +1690,7 @@ static struct worker *alloc_worker(void)
*
* Create a new worker which is bound to @pool. The returned worker
* can be started by calling start_worker() or destroyed using
- * destroy_worker().
+ * kick_worker_to_die().
*
* CONTEXT:
* Might sleep. Does GFP_KERNEL allocations.
@@ -1826,19 +1818,19 @@ static int create_and_start_worker(struct worker_pool *pool)
}
/**
- * destroy_worker - destroy a workqueue worker
- * @worker: worker to be destroyed
+ * kick_worker_to_die - kick a workqueue worker to die
+ * @worker: worker to die
*
- * Destroy @worker and adjust @pool stats accordingly.
+ * Kick @worker to die and adjust @pool stats accordingly.
+ * The @worker will then finish dying on its own.
*
* CONTEXT:
- * spin_lock_irq(pool->lock) which is released and regrabbed.
+ * spin_lock_irq(pool->lock).
*/
-static void destroy_worker(struct worker *worker)
+static void kick_worker_to_die(struct worker *worker)
{
struct worker_pool *pool = worker->pool;
- lockdep_assert_held(&pool->manager_mutex);
lockdep_assert_held(&pool->lock);
/* sanity check frenzy */
@@ -1861,8 +1853,7 @@ static void idle_worker_timeout(unsigned long __pool)
struct worker_pool *pool = (void *)__pool;
spin_lock_irq(&pool->lock);
-
- if (too_many_workers(pool)) {
+ while (too_many_workers(pool)) {
struct worker *worker;
unsigned long expires;
@@ -1870,15 +1861,13 @@ static void idle_worker_timeout(unsigned long __pool)
worker = list_entry(pool->idle_list.prev, struct worker, entry);
expires = worker->last_active + IDLE_WORKER_TIMEOUT;
- if (time_before(jiffies, expires))
+ if (time_before(jiffies, expires)) {
mod_timer(&pool->idle_timer, expires);
- else {
- /* it's been idle for too long, wake up manager */
- pool->flags |= POOL_MANAGE_WORKERS;
- wake_up_worker(pool);
+ break;
}
- }
+ kick_worker_to_die(worker);
+ }
spin_unlock_irq(&pool->lock);
}
@@ -1989,44 +1978,6 @@ restart:
}
/**
- * maybe_destroy_worker - destroy workers which have been idle for a while
- * @pool: pool to destroy workers for
- *
- * Destroy @pool workers which have been idle for longer than
- * IDLE_WORKER_TIMEOUT.
- *
- * LOCKING:
- * spin_lock_irq(pool->lock) which may be released and regrabbed
- * multiple times. Called only from manager.
- *
- * Return:
- * %false if no action was taken and pool->lock stayed locked, %true
- * otherwise.
- */
-static bool maybe_destroy_workers(struct worker_pool *pool)
-{
- bool ret = false;
-
- while (too_many_workers(pool)) {
- struct worker *worker;
- unsigned long expires;
-
- worker = list_entry(pool->idle_list.prev, struct worker, entry);
- expires = worker->last_active + IDLE_WORKER_TIMEOUT;
-
- if (time_before(jiffies, expires)) {
- mod_timer(&pool->idle_timer, expires);
- break;
- }
-
- destroy_worker(worker);
- ret = true;
- }
-
- return ret;
-}
-
-/**
* manage_workers - manage worker pool
* @worker: self
*
@@ -2089,13 +2040,6 @@ static bool manage_workers(struct worker *worker)
ret = true;
}
- pool->flags &= ~POOL_MANAGE_WORKERS;
-
- /*
- * Destroy and then create so that may_start_working() is true
- * on return.
- */
- ret |= maybe_destroy_workers(pool);
ret |= maybe_create_worker(pool);
mutex_unlock(&pool->manager_mutex);
@@ -2342,9 +2286,6 @@ recheck:
worker_set_flags(worker, WORKER_PREP, false);
sleep:
- if (unlikely(need_to_manage_workers(pool)) && manage_workers(worker))
- goto recheck;
-
/*
* pool->lock is held and there's no work to process and no need to
* manage, sleep. Workers are woken up only while holding
@@ -3577,7 +3518,7 @@ static void put_unbound_pool(struct worker_pool *pool)
spin_lock_irq(&pool->lock);
while ((worker = first_worker(pool)))
- destroy_worker(worker);
+ kick_worker_to_die(worker);
WARN_ON(pool->nr_workers || pool->nr_idle);
spin_unlock_irq(&pool->lock);
--
1.7.7.6
* Re: [PATCH 0/3] workqueue: async worker destruction
[not found] ` <1392654243-2829-1-git-send-email-laijs@cn.fujitsu.com>
` (2 preceding siblings ...)
2014-02-17 16:24 ` [PATCH 3/3] workqueue: kick worker to die directly in idle timeout handler Lai Jiangshan
@ 2014-02-18 1:39 ` Lai Jiangshan
3 siblings, 0 replies; 10+ messages in thread
From: Lai Jiangshan @ 2014-02-18 1:39 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: Tejun Heo, LKML
Sorry, I forgot to send the cover letter to LKML.
On 02/18/2014 12:24 AM, Lai Jiangshan wrote:
> This patchset moves part of worker destruction into worker_thread(),
> so a worker told to die performs self-destruction.
>
> This async worker destruction helps us to reduce the manager's invocations
> and simplify the idle-worker-timeout handler.
>
> This patchset requires earlier patch
> "workqueue: ensure @task is valid across kthread_stop()".
>
> Lai Jiangshan (3):
> workqueue: free worker earlier in worker_thread()
> workqueue: async worker destruction
> workqueue: kick worker to die directly in idle timeout handler
>
> kernel/workqueue.c | 135 +++++++++++++++++++---------------------------------
> 1 files changed, 49 insertions(+), 86 deletions(-)
>
* [PATCH wq/for-3.14-fixes] workqueue: ensure @task is valid across kthread_stop()
2014-02-15 14:02 [PATCH] workqueue: ensure @task is valid across kthread_stop() Lai Jiangshan
[not found] ` <1392654243-2829-1-git-send-email-laijs@cn.fujitsu.com>
@ 2014-02-18 21:37 ` Tejun Heo
2014-02-19 3:39 ` Lai Jiangshan
1 sibling, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2014-02-18 21:37 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: linux-kernel, stable
Hello, Lai.
I massaged the patch a bit and applied it to wq/for-3.14-fixes.
Thanks.
-------- 8< --------
From 5bdfff96c69a4d5ab9c49e60abf9e070ecd2acbb Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Sat, 15 Feb 2014 22:02:28 +0800
When a kworker should die, the kworker is notified through the WORKER_DIE
flag instead of kthread_should_stop(). This, IIRC, is primarily to
keep the test synchronized inside worker_pool lock. WORKER_DIE is
first set while holding pool->lock, the lock is dropped and
kthread_stop() is called.
Unfortunately, this means that there's a slight chance that the target
kworker may see WORKER_DIE before kthread_stop() finishes, exit, and
have its task freed before or during kthread_stop().
Fix it by pinning the target task before setting WORKER_DIE and
putting it after kthread_stop() is done.
tj: Improved patch description and comment. Moved pinning above
WORKER_DIE to better signify what it's protecting.
CC: stable@vger.kernel.org
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/workqueue.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 82ef9f3..193e977 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1851,6 +1851,12 @@ static void destroy_worker(struct worker *worker)
if (worker->flags & WORKER_IDLE)
pool->nr_idle--;
+ /*
+ * Once WORKER_DIE is set, the kworker may destroy itself at any
+ * point. Pin to ensure the task stays until we're done with it.
+ */
+ get_task_struct(worker->task);
+
list_del_init(&worker->entry);
worker->flags |= WORKER_DIE;
@@ -1859,6 +1865,7 @@ static void destroy_worker(struct worker *worker)
spin_unlock_irq(&pool->lock);
kthread_stop(worker->task);
+ put_task_struct(worker->task);
kfree(worker);
spin_lock_irq(&pool->lock);
--
1.8.5.3
* Re: [PATCH 2/3] workqueue: async worker destruction
2014-02-17 16:24 ` [PATCH 2/3] workqueue: async worker destruction Lai Jiangshan
@ 2014-02-18 22:29 ` Tejun Heo
0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2014-02-18 22:29 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: linux-kernel
Hello, Lai.
On Tue, Feb 18, 2014 at 12:24:02AM +0800, Lai Jiangshan wrote:
> @@ -2295,9 +2285,16 @@ woke_up:
> if (unlikely(worker->flags & WORKER_DIE)) {
> spin_unlock_irq(&pool->lock);
> WARN_ON_ONCE(!list_empty(&worker->entry));
> +
> + /* perform worker self-destruction */
> + mutex_lock(&pool->manager_mutex);
> + spin_lock_irq(&pool->lock);
> + idr_remove(&pool->worker_idr, worker->id);
> worker->task->flags &= ~PF_WQ_WORKER;
> /* No one can access @worker now, free it. */
> kfree(worker);
> + spin_unlock_irq(&pool->lock);
> + mutex_unlock(&pool->manager_mutex);
Hmm... is manager_mutex necessary for synchronization with
put_unbound_pool() path?
> @@ -3576,13 +3574,36 @@ static void put_unbound_pool(struct worker_pool *pool)
> */
> mutex_lock(&pool->manager_arb);
> mutex_lock(&pool->manager_mutex);
> - spin_lock_irq(&pool->lock);
>
> + spin_lock_irq(&pool->lock);
> while ((worker = first_worker(pool)))
> destroy_worker(worker);
> WARN_ON(pool->nr_workers || pool->nr_idle);
> -
> spin_unlock_irq(&pool->lock);
> +
> + /* wait until all workers have died */
> + for_each_pool_worker(worker, wi, pool) {
> + /*
> + * Although @worker->task was kicked to die, it cannot exit
> + * while we hold ->manager_mutex, so take a reference to the
> + * task before dropping ->manager_mutex, and then wait until
> + * it has died.
> + */
> + get_task_struct(worker->task);
> +
> + /*
> + * First, for_each_pool_worker() iterates by ID (@wi), so it is
> + * safe even though both ->manager_mutex and ->lock are dropped
> + * inside the loop.
> + * Second, no worker can be added now, so the loop is guaranteed
> + * to visit every remaining worker and wait for each to die.
> + */
> + mutex_unlock(&pool->manager_mutex);
> +
> + kthread_stop(worker->task);
> + put_task_struct(worker->task);
> + mutex_lock(&pool->manager_mutex);
> + }
I can't say I'm a fan of the above. It's icky. Why can't we do a
simple set-WORKER_DIE-and-wake-up here?
Thanks.
--
tejun
* Re: [PATCH 3/3] workqueue: kick worker to die directly in idle timeout handler
2014-02-17 16:24 ` [PATCH 3/3] workqueue: kick worker to die directly in idle timeout handler Lai Jiangshan
@ 2014-02-18 22:31 ` Tejun Heo
0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2014-02-18 22:31 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: linux-kernel
On Tue, Feb 18, 2014 at 12:24:03AM +0800, Lai Jiangshan wrote:
> Since the original destroy_worker() no longer does the full destruction,
> and its main work is just to strip the worker thread of its worker duty
> and kick it to die, rename it to kick_worker_to_die().
>
> And since kick_worker_to_die() needs neither manager_mutex nor sleeping,
> move maybe_destroy_worker() out of the manager and kick workers to die
> directly in the idle timeout handler. We also remove %POOL_MANAGE_WORKERS,
> which lets us remove a branch in worker_thread().
>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Heh, this is awesome. :)
Thanks.
--
tejun
* Re: [PATCH wq/for-3.14-fixes] workqueue: ensure @task is valid across kthread_stop()
2014-02-18 21:37 ` [PATCH wq/for-3.14-fixes] workqueue: ensure @task is valid across kthread_stop() Tejun Heo
@ 2014-02-19 3:39 ` Lai Jiangshan
2014-02-20 0:13 ` Tejun Heo
0 siblings, 1 reply; 10+ messages in thread
From: Lai Jiangshan @ 2014-02-19 3:39 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-kernel, stable
On 02/19/2014 05:37 AM, Tejun Heo wrote:
> Hello, Lai.
>
> I massaged the patch a bit and applied it to wq/for-3.14-fixes.
>
> Thanks.
> -------- 8< --------
> From 5bdfff96c69a4d5ab9c49e60abf9e070ecd2acbb Mon Sep 17 00:00:00 2001
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date: Sat, 15 Feb 2014 22:02:28 +0800
>
> When a kworker should die, the kworker is notified through the WORKER_DIE
> flag instead of kthread_should_stop(). This, IIRC, is primarily to
> keep the test synchronized inside worker_pool lock. WORKER_DIE is
> first set while holding pool->lock, the lock is dropped and
> kthread_stop() is called.
>
> Unfortunately, this means that there's a slight chance that the target
> kworker may see WORKER_DIE before kthread_stop() finishes, exit, and
> have its task freed before or during kthread_stop().
>
> Fix it by pinning the target task before setting WORKER_DIE and
> putting it after kthread_stop() is done.
>
> tj: Improved patch description and comment. Moved pinning above
> WORKER_DIE to better signify what it's protecting.
>
> CC: stable@vger.kernel.org
I think no one has hit this bug. So should I add the stable tag?
(Jason's bug report drove me to review the workqueue code harder,
and I found this possible bug, but I think it is unrelated
to Jason's report.)
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> kernel/workqueue.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 82ef9f3..193e977 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1851,6 +1851,12 @@ static void destroy_worker(struct worker *worker)
> if (worker->flags & WORKER_IDLE)
> pool->nr_idle--;
>
> + /*
> + * Once WORKER_DIE is set, the kworker may destroy itself at any
> + * point. Pin to ensure the task stays until we're done with it.
> + */
> + get_task_struct(worker->task);
> +
> list_del_init(&worker->entry);
> worker->flags |= WORKER_DIE;
>
> @@ -1859,6 +1865,7 @@ static void destroy_worker(struct worker *worker)
> spin_unlock_irq(&pool->lock);
>
> kthread_stop(worker->task);
> + put_task_struct(worker->task);
> kfree(worker);
>
> spin_lock_irq(&pool->lock);
* Re: [PATCH wq/for-3.14-fixes] workqueue: ensure @task is valid across kthread_stop()
2014-02-19 3:39 ` Lai Jiangshan
@ 2014-02-20 0:13 ` Tejun Heo
0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2014-02-20 0:13 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: linux-kernel, stable
Hello, Lai.
On Wed, Feb 19, 2014 at 11:39:42AM +0800, Lai Jiangshan wrote:
> > CC: stable@vger.kernel.org
>
> I think no one hit this bug. So I add this stable TAG?
>
> (Jason's bug-report drives me to review the workqueue harder,
> and I found this possible bug, but I think it is irrespective
> with Jason's bug-report.)
Hmm.... the issue won't happen frequently but it can happen and the
change is on the safer side. I think it'd be better to cc stable.
Thanks.
--
tejun