From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andrew Morton <akpm@osdl.org>,
David Howells <dhowells@redhat.com>,
Christoph Hellwig <hch@infradead.org>,
Ingo Molnar <mingo@elte.hu>, Linus Torvalds <torvalds@osdl.org>,
linux-kernel@vger.kernel.org, Gautham shenoy <ego@in.ibm.com>
Subject: Re: [PATCH] fix-flush_workqueue-vs-cpu_dead-race-update
Date: Mon, 8 Jan 2007 22:01:40 +0530
Message-ID: <20070108163140.GC31263@in.ibm.com>
In-Reply-To: <20070108155638.GA156@tv-sign.ru>

On Mon, Jan 08, 2007 at 06:56:38PM +0300, Oleg Nesterov wrote:
> > Spotted atleast these problems:
> >
> > 1. run_workqueue()->work.func()->flush_work()->mutex_lock(workqueue_mutex)
> > deadlocks if we are blocked in cleanup_workqueue_thread()->kthread_stop()
> > for the same worker thread to exit.
> >
> > Looks possible in practice to me.
>
> Yes, this is the same (old) problem as we have/had with flush_workqueue().
> We can convert flush_work() to use preempt_disable (this is not straightforward,
> but easy), or forbid to call flush_work() from work.func().

I think I noticed other problems with avoiding workqueue_mutex in
flush_work(), but I don't recall exactly what they were.

> > 2.
> >
> > CPU_DEAD->cleanup_workqueue_thread->(cwq->thread = NULL)->kthread_stop() ..
> > ^^^^^^^^^^^^^^^^^^^^
> > |___ Problematic
>
> Hmm... This should not be possible? cwq->thread != NULL on CPU_DEAD event.

Sure, cwq->thread != NULL at the CPU_DEAD event. However,
cleanup_workqueue_thread() will set it to NULL and then block in
kthread_stop(), waiting for the kthread to finish run_workqueue() and
exit. If one of the work functions being run by run_workqueue() calls
flush_workqueue()->flush_cpu_workqueue() in that window,
flush_cpu_workqueue() can fail to recognize that "keventd is trying to
flush its own queue" (cwq->thread is already NULL, so it no longer equals
current), which can cause deadlocks.
> > Now while we are blocked here, if a work->func() calls
> > flush_workqueue->flush_cpu_workqueue, we clearly cant identify that event
> > thread is trying to flush its own queue (cwq->thread == current test
> > fails) and hence we will deadlock.
>
> Could you clarify? I believe cwq->thread == current test always works, we never
> "substitute" cwq->thread.
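
The test fails in the window described above, i.e. after
cleanup_workqueue_thread() has set cwq->thread to NULL and before the
worker thread has exited.
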
> > A lock_cpu_hotplug(), or any other ability to block concurrent hotplug
> > operations from happening, in run_workqueue would have avoided both the above
> > races.
>
> I still don't think this is a good idea. We also need
> is_cpu_down_waits_for_lock_cpu_hotplug()
>
> helper, otherwise we have a deadlock if work->func() sleeps and re-queues itself.

Can you elaborate on this a bit?

> > Alternatively, for the second race, I guess we can avoid setting
> > cwq->thread = NULL in cleanup_workqueue_thread() till the thread has exited,
>
> Yes, http://marc.theaimsgroup.com/?l=linux-kernel&m=116818097927685, I believe
> we can do this later. This way workqueue will have almost zero interaction
> with cpu-hotplug, and cpu UP/DOWN event won't be delayed by sleeping work.func().
> take_over_work() can go away, this also allows us to simplify things.

I agree it minimizes the interactions. It may be worth attempting. However, I
suspect it may not be as simple as it appears :)
--
Regards,
vatsa