destroy_workqueue can livelock

public inbox for linux-kernel@vger.kernel.org

* destroy_workqueue can livelock
@ 2007-07-11 17:59 Michal Schmidt
From: Michal Schmidt @ 2007-07-11 17:59 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1973 bytes --]

Hi,

While using SystemTap I noticed an interesting situation. When my stap
probe was exiting, there was a delay of several seconds, during which
the CPU was 100% loaded. I narrowed the problem down to destroy_workqueue().

The attached module is a minimized test case. To reproduce the problem,
load the module and then rmmod it from a higher-priority process:
 nice -n -10 rmmod wqtest.ko  # that's how SystemTap's staprun behaves
or:
 chrt -f 90 rmmod wqtest.ko   # this may reproduce it more reliably

I tested it (with "nice") on Linux 2.6.22. The rmmod process took about
55% of the CPU and the workqueue thread consumed the rest. This situation
can last for minutes. As soon as the rmmod process is reniced to 0, the
workqueue is destroyed successfully and the module is unloaded.

Here's what happens in detail:

When rmmod executes cancel_rearming_delayed_work() ->
wait_on_work() -> wait_on_cpu_work(), the work is cwq->current_work on
the workqueue (it is sleeping in ssleep(1)). So wait_on_cpu_work()
inserts a wq_barrier on the workqueue and waits for its completion. As
soon as wq_barrier_func() signals the completion, the workqueue thread
is most likely preempted by the higher-priority rmmod process. At this
moment the worklist is already empty, but cwq->current_work still points
to the barrier: run_workqueue() hasn't had the chance to reset it to NULL yet.
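
The window described above can be illustrated with a deliberately tiny, single-threaded model. This is a hypothetical sketch, not kernel code: `struct model_cwq`, `insert_barrier()` and `worker_slice_preempted()` are invented stand-ins for `struct cpu_workqueue_struct`, `insert_wq_barrier()` and the relevant part of `run_workqueue()`. Work items are integer ids rather than pointers, and preemption is modeled simply by returning right after the barrier's completion fires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code. Work items are nonzero integer
 * ids; current_work == 0 plays the role of cwq->current_work == NULL. */
struct model_cwq {
	int pending[16];   /* FIFO standing in for cwq->worklist */
	int head, tail;    /* head == tail means the worklist is empty */
	int current_work;  /* item run_workqueue() is currently handling */
	int done_id;       /* last barrier whose completion has fired */
};

static bool worklist_empty(const struct model_cwq *cwq)
{
	return cwq->head == cwq->tail;
}

/* Stand-in for insert_wq_barrier(): queue a barrier with this id. */
static void insert_barrier(struct model_cwq *cwq, int id)
{
	cwq->pending[cwq->tail++ % 16] = id;
}

/* One time slice of the worker while a higher-priority waiter sleeps
 * on the barrier: finish the previous item, dequeue the barrier, run
 * it (complete()), and get preempted immediately afterwards, before
 * run_workqueue() clears current_work again. */
static void worker_slice_preempted(struct model_cwq *cwq)
{
	cwq->current_work = 0;            /* end of the previous item */
	if (worklist_empty(cwq))
		return;
	int id = cwq->pending[cwq->head++ % 16];
	cwq->current_work = id;           /* run_workqueue() marks it current */
	cwq->done_id = id;                /* wq_barrier_func(): complete() */
	/* preempted here: the waiter wakes up, current_work is still id */
}
```

Starting with `current_work = 1` (the real work sleeping in `ssleep(1)`) and one queued barrier, a single preempted slice leaves the barrier completed and the worklist empty while `current_work` still holds the barrier's id, which is the state `flush_cpu_workqueue()` then observes.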

Now rmmod calls destroy_workqueue() -> cleanup_workqueue_thread() ->
flush_cpu_workqueue(). Because cwq->current_work != NULL, it decides to
insert another wq_barrier and wait for it to complete. But the same
preemption repeats for every new barrier: rmmod always checks
cwq->current_work before the workqueue thread has reset it, so
cleanup_workqueue_thread() keeps calling flush_cpu_workqueue()
indefinitely, inserting wq_barriers and waiting for them.

If rmmod's priority is lowered, run_workqueue() is no longer preempted by
it and manages to reset cwq->current_work. This ends the livelock.
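
The resulting loop can be sketched with the same kind of hypothetical model (again not kernel code; every name below is invented for illustration). `flush_model()` follows the behaviour described above: it inserts a barrier whenever the worklist is non-empty or `current_work` is set, then waits for it. `cleanup_loop_model()` plays the role of the `while (flush_cpu_workqueue(cwq))` loop in `cleanup_workqueue_thread()`, capped at `max_iters` so the livelock shows up as an iteration count instead of a hang. The `waiter_preempts` flag selects between a higher-priority and a same-priority rmmod.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: nonzero ints are work ids and
 * current_work == 0 stands for cwq->current_work == NULL. */
struct model_cwq {
	int pending[16];   /* FIFO standing in for cwq->worklist */
	int head, tail;
	int current_work;
	int next_id;       /* source of fresh barrier ids */
	int done_id;       /* last barrier whose completion fired */
};

/* Run one dequeued barrier. If the worker is preempted right after
 * complete(), current_work is left pointing at the barrier. */
static void run_one(struct model_cwq *cwq, bool preempted_after_complete)
{
	int id = cwq->pending[cwq->head++ % 16];
	cwq->current_work = id;
	cwq->done_id = id;                /* wq_barrier_func(): complete() */
	if (!preempted_after_complete)
		cwq->current_work = 0;    /* run_workqueue() clears it */
}

/* The worker gets the CPU: it finishes the previous item, then runs
 * queued barriers. A higher-priority waiter preempts it right after
 * the first complete(); otherwise it drains the whole worklist. */
static void worker_gets_cpu(struct model_cwq *cwq, bool waiter_preempts)
{
	cwq->current_work = 0;
	while (cwq->head != cwq->tail) {
		run_one(cwq, waiter_preempts);
		if (waiter_preempts)
			return;
	}
}

/* Returns true if it had to insert a barrier and wait for it. */
static bool flush_model(struct model_cwq *cwq, bool waiter_preempts)
{
	if (cwq->head == cwq->tail && cwq->current_work == 0)
		return false;
	int id = ++cwq->next_id;
	cwq->pending[cwq->tail++ % 16] = id;   /* insert the barrier */
	while (cwq->done_id != id)             /* wait for completion */
		worker_gets_cpu(cwq, waiter_preempts);
	return true;
}

/* The "while (flush_cpu_workqueue(cwq))" loop, capped at max_iters. */
static int cleanup_loop_model(struct model_cwq *cwq, bool waiter_preempts,
			      int max_iters)
{
	int iters = 0;
	while (iters < max_iters && flush_model(cwq, waiter_preempts))
		iters++;
	return iters;
}

/* State left by the cancel: barrier 1 has completed, but current_work
 * still refers to it. */
static int iterations_after_cancel(bool waiter_preempts, int max_iters)
{
	struct model_cwq cwq = { .current_work = 1, .next_id = 1, .done_id = 1 };
	return cleanup_loop_model(&cwq, waiter_preempts, max_iters);
}
```

With `waiter_preempts` set, every call inserts a fresh barrier and the loop only stops at the cap; without it, the worker clears `current_work` after the first barrier and the second flush finds nothing to wait for.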

Can this be fixed? Or is it just a case of "Don't do that then!"?
("that" meaning destroying workqueues from negatively reniced processes)

Michal

[-- Attachment #2: wqtest.c --]
[-- Type: text/x-csrc, Size: 978 bytes --]

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/errno.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Michal Schmidt");

static void wq_func(struct work_struct *w);
static DECLARE_DELAYED_WORK(wq_work, wq_func);
static struct workqueue_struct *wq;

static void wq_func(struct work_struct *w)
{
	/*
	 * So that this work is most likely cwq->current_work
	 * when destroy_workqueue() is called...
	 */
	ssleep(1);

	/* Rearm: this is a self-rearming delayed work. */
	queue_delayed_work(wq, &wq_work, HZ/100);
}

static int __init wqtest_start(void)
{
	wq = create_workqueue("wqtest");
	if (!wq)
		return -ENOMEM;

	queue_delayed_work(wq, &wq_work, HZ/100);

	return 0;
}

static void __exit wqtest_stop(void)
{
	printk(KERN_CRIT "wqtest: cancelling the work\n");
	cancel_rearming_delayed_work(&wq_work);
	printk(KERN_CRIT "wqtest: destroying the wq\n");
	/* Can livelock here when rmmod outranks the workqueue thread. */
	destroy_workqueue(wq);
	printk(KERN_CRIT "wqtest: done\n");
}

module_init(wqtest_start);
module_exit(wqtest_stop);

* Re: destroy_workqueue can livelock
@ 2007-07-11 22:26 Oleg Nesterov
From: Oleg Nesterov @ 2007-07-11 22:26 UTC
  To: Michal Schmidt; +Cc: linux-kernel

Michal Schmidt wrote:
>
> [...]
>
> Now rmmod calls destroy_workqueue() -> cleanup_workqueue_thread() ->
> flush_cpu_workqueue(). Because cwq->current_work!=NULL it decides to
> insert another wq_barrier and wait for it to complete. But
> cwq->current_work will never be reset to NULL, so
> cleanup_workqueue_thread() keeps trying flush_cpu_workqueue()
> indefinitely, inserting wq_barriers and waiting for them.

In short: "while (flush_cpu_workqueue(cwq))" can livelock because a re-niced
caller can add a new barrier before the lower-priority cwq->thread clears
->current_work.

> Can this be fixed?

Yes, and the fix is very simple. In fact, cleanup_workqueue_thread() doesn't
need the "while" loop. I wrote it that way to avoid a subtle dependency on
run_workqueue(), and because I failed to come up with a good comment explaining
why it is safe to call flush_cpu_workqueue() only once.

Briefly: if another barrier has been queued by the time flush_cpu_workqueue()
returns, cwq->thread must be "inside" run_workqueue(), which can't return until
cwq->worklist becomes empty. This means we can call kthread_stop() right away:
kthread_should_stop() won't be checked until run_workqueue() returns.
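
The argument for a single flush can be checked against a toy model as well. This is a hypothetical sketch, not the actual patch: `cleanup_fixed_model()` calls the flush exactly once and then sets a `should_stop` flag in place of `kthread_stop()`, and `worker_exit_model()` encodes the claim that the worker, still inside `run_workqueue()`, drains the worklist before `kthread_should_stop()` is checked. The `extra_barrier` argument (0 for none) stands in for a barrier queued by someone else after our flush returns. All names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code; current_work == 0 means NULL. */
struct model_cwq {
	int pending[16];    /* FIFO standing in for cwq->worklist */
	int head, tail;
	int current_work;
	int next_id;
	int done_id;        /* last barrier whose completion fired */
	bool should_stop;   /* set by the kthread_stop() stand-in */
	bool exited;
};

static void queue_barrier(struct model_cwq *cwq, int id)
{
	cwq->pending[cwq->tail++ % 16] = id;
}

/* Worker slice under a higher-priority waiter: finish the previous
 * item, run one barrier, get preempted right after complete(). */
static void worker_slice_preempted(struct model_cwq *cwq)
{
	cwq->current_work = 0;
	if (cwq->head == cwq->tail)
		return;
	int id = cwq->pending[cwq->head++ % 16];
	cwq->current_work = id;
	cwq->done_id = id;
}

/* A single flush_cpu_workqueue()-like call; the waiter always preempts. */
static bool flush_once(struct model_cwq *cwq)
{
	if (cwq->head == cwq->tail && cwq->current_work == 0)
		return false;
	int id = ++cwq->next_id;
	queue_barrier(cwq, id);
	while (cwq->done_id != id)
		worker_slice_preempted(cwq);
	return true;
}

/* After the stop request the worker is still inside run_workqueue(),
 * which runs items until the worklist is empty; only then is the stop
 * flag checked. Returns how many items were drained. */
static int worker_exit_model(struct model_cwq *cwq)
{
	int drained = 0;

	cwq->current_work = 0;                 /* finish the preempted item */
	while (cwq->head != cwq->tail) {
		int id = cwq->pending[cwq->head++ % 16];
		cwq->current_work = id;
		cwq->done_id = id;             /* the late barrier completes */
		cwq->current_work = 0;
		drained++;
	}
	if (cwq->should_stop)                  /* kthread_should_stop() */
		cwq->exited = true;
	return drained;
}

/* Proposed cleanup: one flush, no loop, then stop the thread. */
static int cleanup_fixed_model(struct model_cwq *cwq, int extra_barrier)
{
	flush_once(cwq);
	if (extra_barrier)
		queue_barrier(cwq, extra_barrier);  /* concurrent flusher */
	cwq->should_stop = true;                 /* kthread_stop() */
	return worker_exit_model(cwq);
}
```

Even with the higher-priority caller always preempting the worker, the single flush returns, and a late extra barrier is still completed before the thread exits.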

(Another option is to clear cwq->current_work in wq_barrier_func(), before
 complete(). This is possible because nobody can "see" this barrier except
 flush_cpu_workqueue()).
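
The alternative fix can be sketched in the same hypothetical model (names invented for illustration, not kernel code). The only change from the livelocking behaviour is inside the worker: the barrier's function clears `current_work` before signalling completion, so even a worker preempted immediately after `complete()` leaves `current_work == 0` behind, and the unchanged `cleanup_loop_model()` loop terminates. The scenario starts with an ordinary work item still marked as current, so that at least one barrier is needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code; current_work == 0 means NULL. */
struct model_cwq {
	int pending[16];    /* FIFO standing in for cwq->worklist */
	int head, tail;
	int current_work;
	int next_id;
	int done_id;        /* last barrier whose completion fired */
};

/* Worker slice under a higher-priority waiter. With the alternative
 * fix, the barrier function clears current_work before complete(),
 * so the preemption point no longer leaves it set. */
static void worker_slice_preempted(struct model_cwq *cwq)
{
	cwq->current_work = 0;            /* end of the previous item */
	if (cwq->head == cwq->tail)
		return;
	int id = cwq->pending[cwq->head++ % 16];
	cwq->current_work = id;           /* run_workqueue() marks it current */
	cwq->current_work = 0;            /* barrier func: clear first... */
	cwq->done_id = id;                /* ...then complete() */
	/* preempted here, but current_work is already 0 */
}

static bool flush_model(struct model_cwq *cwq)
{
	if (cwq->head == cwq->tail && cwq->current_work == 0)
		return false;
	int id = ++cwq->next_id;
	cwq->pending[cwq->tail++ % 16] = id;   /* insert the barrier */
	while (cwq->done_id != id)             /* wait for completion */
		worker_slice_preempted(cwq);
	return true;
}

/* The original "while (flush_cpu_workqueue(cwq))" loop, capped. */
static int cleanup_loop_model(struct model_cwq *cwq, int max_iters)
{
	int iters = 0;
	while (iters < max_iters && flush_model(cwq))
		iters++;
	return iters;
}
```

Here the loop needs exactly one barrier and then stops, even though the caller preempts the worker at the same point as in the livelocking case.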

I'll re-check my thinking and send a patch tomorrow.

Thanks a lot, Michal.

Oleg.

