Re: [PATCH wq/for-3.9-fixes] workqueue: fix possible pool stall bug in wq_unbind_fn()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH wq/for-3.9-fixes] workqueue: fix possible pool stall bug in wq_unbind_fn()
Date: Tue, 12 Mar 2013 00:09:47 +0800	[thread overview]
Message-ID: <513E01CB.7060103@cn.fujitsu.com> (raw)
In-Reply-To: <20130308231517.GS14556@mtj.dyndns.org>

Hi, Tejun,

Forgot to send a pull-request?
Add CC Linus.


Thanks,
Lai


On 09/03/13 07:15, Tejun Heo wrote:
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> 
> Since multiple pools per cpu have been introduced, wq_unbind_fn() has
> a subtle bug which may theoretically stall work item processing.  The
> problem is two-fold.
> 
> * wq_unbind_fn() depends on the worker executing wq_unbind_fn() itself
>   to start unbound chain execution, which works fine when there was
>   only single pool.  With multiple pools, only the pool which is
>   running wq_unbind_fn() - the highpri one - is guaranteed to have
>   such kick-off.  The other pool could stall when its busy workers
>   block.
> 
> * The current code is setting WORKER_UNBIND / POOL_DISASSOCIATED of
>   the two pools in succession without initiating work execution
>   inbetween.  Because setting the flags requires grabbing assoc_mutex
>   which is held while new workers are created, this could lead to
>   stalls if a pool's manager is waiting for the previous pool's work
>   items to release memory.  This is almost purely theoretical tho.
> 
> Update wq_unbind_fn() such that it sets WORKER_UNBIND /
> POOL_DISASSOCIATED, goes over schedule() and explicitly kicks off
> execution for a pool and then moves on to the next one.
> 
> tj: Updated comments and description.
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: stable@vger.kernel.org
> ---
> As you seemingly has disappeared, I just fixed up this patch and
> applied it to wq/for-3.9-fixes.
> 
> Thanks.
> 
>  kernel/workqueue.c |   44 +++++++++++++++++++++++++-------------------
>  1 file changed, 25 insertions(+), 19 deletions(-)
> 
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3446,28 +3446,34 @@ static void wq_unbind_fn(struct work_str
>  
>  		spin_unlock_irq(&pool->lock);
>  		mutex_unlock(&pool->assoc_mutex);
> -	}
>  
> -	/*
> -	 * Call schedule() so that we cross rq->lock and thus can guarantee
> -	 * sched callbacks see the %WORKER_UNBOUND flag.  This is necessary
> -	 * as scheduler callbacks may be invoked from other cpus.
> -	 */
> -	schedule();
> +		/*
> +		 * Call schedule() so that we cross rq->lock and thus can
> +		 * guarantee sched callbacks see the %WORKER_UNBOUND flag.
> +		 * This is necessary as scheduler callbacks may be invoked
> +		 * from other cpus.
> +		 */
> +		schedule();
>  
> -	/*
> -	 * Sched callbacks are disabled now.  Zap nr_running.  After this,
> -	 * nr_running stays zero and need_more_worker() and keep_working()
> -	 * are always true as long as the worklist is not empty.  Pools on
> -	 * @cpu now behave as unbound (in terms of concurrency management)
> -	 * pools which are served by workers tied to the CPU.
> -	 *
> -	 * On return from this function, the current worker would trigger
> -	 * unbound chain execution of pending work items if other workers
> -	 * didn't already.
> -	 */
> -	for_each_std_worker_pool(pool, cpu)
> +		/*
> +		 * Sched callbacks are disabled now.  Zap nr_running.
> +		 * After this, nr_running stays zero and need_more_worker()
> +		 * and keep_working() are always true as long as the
> +		 * worklist is not empty.  This pool now behaves as an
> +		 * unbound (in terms of concurrency management) pool which
> +		 * are served by workers tied to the pool.
> +		 */
>  		atomic_set(&pool->nr_running, 0);
> +
> +		/*
> +		 * With concurrency management just turned off, a busy
> +		 * worker blocking could lead to lengthy stalls.  Kick off
> +		 * unbound chain execution of currently pending work items.
> +		 */
> +		spin_lock_irq(&pool->lock);
> +		wake_up_worker(pool);
> +		spin_unlock_irq(&pool->lock);
> +	}
>  }
>  
>  /*
>

next prev parent reply	other threads:[~2013-03-11 16:07 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-02 15:55 [PATCH] workqueue: fix possible bug which may silence the pool Lai Jiangshan
2013-03-04 19:20 ` Tejun Heo
2013-03-05  1:17   ` Lai Jiangshan
2013-03-05 18:04     ` Tejun Heo
2013-03-08 23:15 ` [PATCH wq/for-3.9-fixes] workqueue: fix possible pool stall bug in wq_unbind_fn() Tejun Heo
2013-03-11 16:09   ` Lai Jiangshan [this message]
2013-03-11 16:24     ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=513E01CB.7060103@cn.fujitsu.com \
    --to=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.