public inbox for linux-kernel@vger.kernel.org
From: Oleg Nesterov <oleg@tv-sign.ru>
To: Michal Schmidt <mschmidt@redhat.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: destroy_workqueue can livelock
Date: Thu, 12 Jul 2007 02:26:53 +0400
Message-ID: <20070711222652.GA287@tv-sign.ru>

Michal Schmidt wrote:
>
> While using SystemTap I noticed an interesting situation. When my stap
> probe was exiting, there was a several seconds long delay, during which
> the CPU was 100% loaded. I narrowed the problem down to destroy_workqueue.
>
> The attached module is a minimized testcase. To reproduce it, load the
> module and then try to rmmod it from a higher priority process:
>  nice -n -10 rmmod wqtest.ko  # that's how SystemTap's staprun behaves
> or:
>  chrt -f  90 rmmod wqtest.ko  # this may be more reliably reproducible
>
> I tested it (with "nice") on Linux 2.6.22. The rmmod process took about
> 55% CPU, the workqueue thread consumed the rest. This situation can last
> for minutes. As soon as the rmmod process is reniced to 0, the workqueue
> is destroyed successfully and the module is unloaded.
>
> Here's what happens in detail:
>
> When rmmod executes cancel_rearming_delayed_workqueue() ->
> wait_on_work() -> wait_on_cpu_work(), the work is the current_work on
> the workqueue (it's in ssleep(1)). So wait_on_cpu_work() inserts a
> wq_barrier on the workqueue and waits for the completion. As soon as
> wq_barrier_func signals the completion, it is most likely preempted by
> the rmmod process. At this moment, the worklist is already empty, but
> cwq->current_work still points to the barrier. run_workqueue() didn't
> get to reset it to NULL yet.
>
> Now rmmod calls destroy_workqueue() -> cleanup_workqueue_thread() ->
> flush_cpu_workqueue(). Because cwq->current_work!=NULL it decides to
> insert another wq_barrier and wait for it to complete. But
> cwq->current_work will never be reset to NULL, so
> cleanup_workqueue_thread() keeps trying flush_cpu_workqueue()
> indefinitely, inserting wq_barriers and waiting for them.

In short: "while (flush_cpu_workqueue(cwq))" can livelock because a
higher-priority caller can insert a new barrier before the lower-priority
cwq->thread gets a chance to clear ->current_work.

> Can this be fixed?

Yes, and the fix is very simple. In fact cleanup_workqueue_thread() doesn't
need the "while" loop. I did it that way to avoid a subtle dependency with
run_workqueue(), and because I failed to invent a good comment which explains
why it is safe to do flush_cpu_workqueue() once.

In short, if we have another barrier when flush_cpu_workqueue() returns,
cwq->thread must be "inside" run_workqueue() which can't return until
cwq->worklist becomes empty. This means we can do kthread_stop() right now:
kthread_should_stop() won't be checked until run_workqueue() returns.
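
To make the ordering argument concrete, here is a small userspace toy model.
It is not kernel code. The struct and every toy_* name are made up for
illustration, and it only assumes what is stated above: run_workqueue() does
not return while the worklist is non-empty, and the stop flag is looked at
only between run_workqueue() calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the worker's state; not a kernel structure. */
struct toy_thread {
	int  worklist;     /* entries still queued (models cwq->worklist) */
	int  executed;     /* entries that have been run */
	bool should_stop;  /* models kthread_should_stop() */
};

/* Models run_workqueue(): keeps going until the list is empty. */
static void toy_run_workqueue(struct toy_thread *t)
{
	while (t->worklist > 0) {
		t->worklist--;
		t->executed++;
	}
}

/*
 * The worker is already "inside" run_workqueue() with `pending` entries
 * queued when the stop request arrives. Returns how many entries ran
 * before the worker exited.
 */
static int toy_stop_while_running(int pending)
{
	struct toy_thread t = { .worklist = pending, .executed = 0,
				.should_stop = false };

	assert(pending >= 0);

	t.should_stop = true;      /* the flusher requests the stop */
	toy_run_workqueue(&t);     /* worker was mid-run: it drains first */

	while (!t.should_stop)     /* the flag is only checked here */
		toy_run_workqueue(&t);

	return t.executed;
}
```

In this model toy_stop_while_running(n) returns n for any n >= 0: the
leftover barrier still runs before the thread exits, even though the stop
was requested first.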

(Another option is to clear cwq->current_work in wq_barrier_func(), before
 complete(). This is possible because nobody can "see" this barrier except
 flush_cpu_workqueue()).
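
The livelock, and the effect of clearing ->current_work before complete(),
can be replayed in a deterministic userspace toy model. Again this is only a
sketch: the toy_* names are hypothetical, the scheduler is reduced to "the
waiter preempts the worker at complete()", and the real locking is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one per-CPU queue; not a kernel structure. */
struct toy_cwq {
	int  worklist;      /* barriers queued on ->worklist */
	bool current_work;  /* models cwq->current_work != NULL */
};

/*
 * The worker resumes, finishes the previous iteration by clearing the
 * stale ->current_work, then runs one queued barrier up to complete().
 */
static void toy_worker_run_barrier(struct toy_cwq *cwq, bool clear_in_barrier)
{
	cwq->current_work = false;   /* tail of the previous iteration */
	cwq->worklist--;             /* dequeue the barrier ... */
	cwq->current_work = true;    /* ... and mark it as running */
	if (clear_in_barrier)
		cwq->current_work = false;  /* the alternative fix */
	/* complete() goes here; the higher-priority waiter preempts us. */
}

/* Models flush_cpu_workqueue(): true if a barrier had to be inserted. */
static bool toy_flush(struct toy_cwq *cwq, bool clear_in_barrier)
{
	if (cwq->worklist == 0 && !cwq->current_work)
		return false;        /* idle, nothing to wait for */
	cwq->worklist++;             /* insert a barrier */
	toy_worker_run_barrier(cwq, clear_in_barrier);  /* wait for it */
	return true;
}

/* Counts flush calls made by "while (flush(cwq))", giving up at `cap`. */
static int toy_flush_loop(bool clear_in_barrier, int cap)
{
	/* Start as in the report: list empty, ->current_work still set. */
	struct toy_cwq cwq = { .worklist = 0, .current_work = true };
	int calls = 0;

	assert(cap > 0);
	do {
		calls++;
	} while (toy_flush(&cwq, clear_in_barrier) && calls < cap);

	return calls;
}
```

With clear_in_barrier == false the loop never terminates by itself and
toy_flush_loop() returns whatever cap it was given. With clear_in_barrier ==
true it returns 2: one flush that waits, and one that finds the queue idle.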

I'll re-check my thinking and send a patch tomorrow.

Thanks a lot, Michal.

Oleg.

