From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andrew Morton <akpm@osdl.org>,
	David Howells <dhowells@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	Ingo Molnar <mingo@elte.hu>, Linus Torvalds <torvalds@osdl.org>,
	linux-kernel@vger.kernel.org, Gautham shenoy <ego@in.ibm.com>
Subject: Re: [PATCH] fix-flush_workqueue-vs-cpu_dead-race-update
Date: Mon, 8 Jan 2007 20:52:11 +0530
Message-ID: <20070108152211.GA31263@in.ibm.com>
In-Reply-To: <20070107215103.GA7960@tv-sign.ru>

On Mon, Jan 08, 2007 at 12:51:03AM +0300, Oleg Nesterov wrote:
> Change flush_workqueue() to use for_each_possible_cpu(). This means that
> flush_cpu_workqueue() may hit CPU which is already dead. However in that
> case
> 
> 	if (!list_empty(&cwq->worklist) || cwq->current_work != NULL)
> 
> means that CPU_DEAD in progress, it will do kthread_stop() + take_over_work()
> so we can proceed and insert a barrier. We hold cwq->lock, so we are safe.
> 
> This patch replaces fix-flush_workqueue-vs-cpu_dead-race.patch which was
> broken by switching to preempt_disable (now we don't need locking at all).
> Note that migrate_sequence (was hotplug_sequence) is incremented under
> cwq->lock. Since flush_workqueue does lock/unlock of cwq->lock on all CPUs,
> it must see the new value if take_over_work() happened before we checked
> this cwq, and this is the case we should worry about: otherwise we added
> a barrier.
> 
> Srivatsa?

This is head-spinning :)

Spotted at least these problems:

1. run_workqueue()->work.func()->flush_work()->mutex_lock(workqueue_mutex)
   deadlocks if we are blocked in cleanup_workqueue_thread()->kthread_stop(),
   waiting for the same worker thread to exit.

   Looks possible in practice to me.

2. CPU_DEAD->cleanup_workqueue_thread->(cwq->thread = NULL)->kthread_stop() ..
                                       ^^^^^^^^^^^^^^^^^^^^
                                                   |___ Problematic

Now while we are blocked here, if a work->func() calls
flush_workqueue->flush_cpu_workqueue, we clearly can't identify that the event
thread is trying to flush its own queue (the cwq->thread == current test
fails) and hence we will deadlock.

A lock_cpu_hotplug(), or any other way of blocking concurrent hotplug
operations, in run_workqueue() would have avoided both of the above
races.
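
To make the first deadlock concrete, here is a rough userspace sketch of it
as a wait-for cycle. It is only a toy model under my own assumptions: the
actors, struct state and the exclude_hotplug flag are made up for
illustration and are not kernel code. It assumes the hotplug path holds
workqueue_mutex while it sits in kthread_stop().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these actors and fields are illustrative, not kernel types. */
enum actor { NOBODY, WORKER, HOTPLUG };

#define NUM_ACTORS 2

struct state {
	bool hotplug_in_cpu_dead; /* holds workqueue_mutex, waits in kthread_stop() */
	bool worker_wants_mutex;  /* work->func() blocked in mutex_lock(workqueue_mutex) */
};

/* Return the actor that 'a' is blocked on, or NOBODY if 'a' can run. */
static enum actor waits_for(const struct state *s, enum actor a)
{
	if (a == WORKER && s->worker_wants_mutex && s->hotplug_in_cpu_dead)
		return HOTPLUG;	/* the mutex is held by the hotplug path */
	if (a == HOTPLUG && s->hotplug_in_cpu_dead)
		return WORKER;	/* kthread_stop() returns only after the worker exits */
	return NOBODY;
}

/* Follow the wait-for chain from 'start'; coming back to 'start' is a deadlock. */
static bool has_cycle(const struct state *s, enum actor start)
{
	enum actor a;
	int steps;

	assert(start != NOBODY);
	a = waits_for(s, start);
	for (steps = 0; steps < NUM_ACTORS && a != NOBODY; steps++) {
		if (a == start)
			return true;
		a = waits_for(s, a);
	}
	return false;
}

/*
 * Replay the scenario: a work item is running, hotplug tries to enter
 * CPU_DEAD, then the work item asks for workqueue_mutex.
 * exclude_hotplug models (hypothetically) run_workqueue() blocking hotplug
 * for as long as work->func() runs.
 */
static bool scenario_deadlocks(bool exclude_hotplug)
{
	struct state s = { false, false };
	bool worker_in_work = true;

	if (!(exclude_hotplug && worker_in_work))
		s.hotplug_in_cpu_dead = true;	/* CPU_DEAD proceeds concurrently */

	s.worker_wants_mutex = true;
	return has_cycle(&s, WORKER);
}
```

In this model scenario_deadlocks(false) finds the cycle worker -> hotplug ->
worker, while scenario_deadlocks(true) does not, because CPU_DEAD never
overlaps with the running work item. It says nothing about what such
exclusion would cost or whether it introduces other problems.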

Alternatively, for the second race, I guess we can avoid setting 
cwq->thread = NULL in cleanup_workqueue_thread() till the thread has exited, 
but I am not sure if that opens up any other race. The first race seems
harder to fix ..
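
A small userspace sketch of the second race and of the ordering change,
again only as a toy model: struct task, struct cwq_model and the
clear_before_stop flag are hypothetical stand-ins, and the worker name is
just an example. It covers only the window where cleanup is blocked in
kthread_stop(); it does not show that the reordering is free of other races.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; not the kernel's task_struct or cpu_workqueue_struct. */
struct task {
	const char *name;
};

struct cwq_model {
	struct task *thread;	/* models cwq->thread */
};

/* Models the cwq->thread == current test: is the caller this queue's worker? */
static bool is_self_flush(const struct cwq_model *cwq,
			  const struct task *current_task)
{
	return cwq->thread == current_task;
}

/*
 * Models the window in which cleanup is blocked in kthread_stop() and the
 * worker, still inside work->func(), flushes its own queue.
 *
 * clear_before_stop == true:  cwq->thread = NULL before kthread_stop()
 * clear_before_stop == false: assumed alternative, clear after the thread exits
 *
 * Returns true if the flush is not recognised as a self-flush, i.e. the
 * worker would wait on a barrier that only it can run.
 */
static bool self_flush_deadlocks(bool clear_before_stop)
{
	struct task worker = { "events/1" };
	struct cwq_model cwq = { &worker };
	bool deadlock;

	if (clear_before_stop)
		cwq.thread = NULL;

	/* cleanup now sits in kthread_stop(); the worker calls flush */
	deadlock = !is_self_flush(&cwq, &worker);

	if (!clear_before_stop)
		cwq.thread = NULL;	/* worker has exited, now clear it */

	assert(cwq.thread == NULL);	/* both orderings end in the same state */
	return deadlock;
}
```

Here self_flush_deadlocks(true) reports the deadlock and
self_flush_deadlocks(false) does not, since the pointer still matches the
worker while kthread_stop() is waiting.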

I wonder if SPIN (spinroot.com) or some other formal model checker could make
this job of spotting races easier for us ..

-- 
Regards,
vatsa
