[PATCH] destroy_workqueue() can livelock

From: Oleg Nesterov @ 2007-07-13 13:16 UTC
  To: Andrew Morton; +Cc: Michal Schmidt, Srivatsa Vaddagiri, stable, linux-kernel

Pointed out by Michal Schmidt <mschmidt@redhat.com>.

The bug was introduced in 2.6.22 by me.

cleanup_workqueue_thread() calls flush_cpu_workqueue(cwq) in a loop until
->worklist becomes empty. This can livelock: a re-niced caller can get the
CPU after wake_up() and insert a new barrier before the lower-priority
cwq->thread has a chance to clear ->current_work.

Change cleanup_workqueue_thread() to call flush_cpu_workqueue(cwq) only once.
We can rely on the fact that run_workqueue() won't return until it has run
all pending works. So it is safe to call kthread_stop() after that: the
"should stop" request won't be noticed until run_workqueue() returns.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- t/kernel/workqueue.c~LIVELOCK	2007-06-13 18:26:56.000000000 +0400
+++ t/kernel/workqueue.c	2007-07-13 16:46:27.000000000 +0400
@@ -739,18 +739,17 @@ static void cleanup_workqueue_thread(str
 	if (cwq->thread == NULL)
 		return;
 
+	flush_cpu_workqueue(cwq);
 	/*
-	 * If the caller is CPU_DEAD the single flush_cpu_workqueue()
-	 * is not enough, a concurrent flush_workqueue() can insert a
-	 * barrier after us.
+	 * If the caller is CPU_DEAD and cwq->worklist was not empty,
+	 * a concurrent flush_workqueue() can insert a barrier after us.
+	 * However, in that case run_workqueue() won't return and check
+	 * kthread_should_stop() until it flushes all work_struct's.
 	 * When ->worklist becomes empty it is safe to exit because no
 	 * more work_structs can be queued on this cwq: flush_workqueue
 	 * checks list_empty(), and a "normal" queue_work() can't use
 	 * a dead CPU.
 	 */
-	while (flush_cpu_workqueue(cwq))
-		;
-
 	kthread_stop(cwq->thread);
 	cwq->thread = NULL;
 }
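
For readers who want to see the invariant the fix relies on, here is a
minimal user-space sketch. It is not kernel code: the toy_* names, the
fixed-size array standing in for ->worklist, and the way a concurrent
flusher is simulated are all assumptions made for illustration. It only
models the claim above, namely that a stop request raised during a drain
is not acted on until the list is empty, and that no barrier is inserted
when the list was already empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WORKS 32

/* Hypothetical stand-in for a per-CPU workqueue; not the kernel struct. */
struct toy_cwq {
	int worklist[TOY_MAX_WORKS];	/* FIFO of pending work ids */
	size_t head, tail;
	bool should_stop;		/* models kthread_should_stop() */
	int late_barriers;		/* barriers a concurrent flusher may still add */
	int executed;			/* number of entries that were run */
};

static bool toy_list_empty(const struct toy_cwq *cwq)
{
	return cwq->head == cwq->tail;
}

static void toy_queue(struct toy_cwq *cwq, int id)
{
	assert(cwq->tail < TOY_MAX_WORKS);
	cwq->worklist[cwq->tail++] = id;
}

/* Models the property from the changelog: return only when the list is empty. */
static void toy_run_workqueue(struct toy_cwq *cwq)
{
	while (!toy_list_empty(cwq)) {
		cwq->head++;		/* dequeue the oldest entry */
		cwq->executed++;	/* "run" it */
		/*
		 * Simulated race: while we drain, a concurrent flusher sees
		 * a non-empty list and appends one more barrier (id -1).
		 */
		if (cwq->late_barriers > 0) {
			cwq->late_barriers--;
			toy_queue(cwq, -1);
		}
	}
}

/*
 * Models the moment after the single flush: the stop request is already
 * set, but the worker is still inside its drain loop. Returns how many
 * entries were run before the worker could notice the stop request.
 */
int toy_cleanup(int pending, int late_barriers)
{
	struct toy_cwq cwq = { .late_barriers = late_barriers };
	int i;

	for (i = 0; i < pending; i++)
		toy_queue(&cwq, i);

	cwq.should_stop = true;		/* stop requested mid-drain */
	toy_run_workqueue(&cwq);	/* flag is not consulted in here */

	/* Only now would the worker check the flag; nothing is left behind. */
	assert(cwq.should_stop && toy_list_empty(&cwq));
	return cwq.executed;
}
```

In this model toy_cleanup(2, 3) runs the two pending entries plus the three
late barriers before the stop request can take effect, while
toy_cleanup(0, 3) runs nothing, because a flusher that finds an empty list
inserts no barrier.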



Re: [PATCH] destroy_workqueue() can livelock
From: Michal Schmidt @ 2007-07-13 17:03 UTC
  To: Oleg Nesterov; +Cc: Andrew Morton, Srivatsa Vaddagiri, stable, linux-kernel

Oleg Nesterov wrote:
> Pointed out by Michal Schmidt <mschmidt@redhat.com>.
> 
> The bug was introduced in 2.6.22 by me.
> 
> cleanup_workqueue_thread() calls flush_cpu_workqueue(cwq) in a loop until
> ->worklist becomes empty. This can livelock: a re-niced caller can get the
> CPU after wake_up() and insert a new barrier before the lower-priority
> cwq->thread has a chance to clear ->current_work.
> 
> Change cleanup_workqueue_thread() to call flush_cpu_workqueue(cwq) only once.
> We can rely on the fact that run_workqueue() won't return until it has run
> all pending works. So it is safe to call kthread_stop() after that: the
> "should stop" request won't be noticed until run_workqueue() returns.
> 
> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

I confirm that the patch fixes the bug I was seeing.

Michal

