public inbox for linux-kernel@vger.kernel.org
* [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race
@ 2006-12-30 16:10 Oleg Nesterov
  2007-01-03  0:27 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2006-12-30 16:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, David Howells, Christoph Hellwig, Gautham R Shenoy,
	linux-kernel

"[PATCH 1/2] reimplement flush_workqueue()" fixed one race when CPU goes down
while flush_cpu_workqueue() plays with it. But there is another problem, CPU
can die before flush_workqueue() has a chance to call flush_cpu_workqueue().
In that case pending work_structs can migrate to CPU which was already checked,
so we should redo the "for_each_online_cpu(cpu)" loop.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- mm-6.20-rc2/kernel/workqueue.c~3_race	2006-12-29 18:37:31.000000000 +0300
+++ mm-6.20-rc2/kernel/workqueue.c	2006-12-30 18:09:07.000000000 +0300
@@ -65,6 +65,7 @@ struct workqueue_struct {
 
 /* All the per-cpu workqueues on the system, for hotplug cpu to add/remove
    threads to each one as cpus come/go. */
+static long hotplug_sequence __read_mostly;
 static DEFINE_MUTEX(workqueue_mutex);
 static LIST_HEAD(workqueues);
 
@@ -454,10 +455,16 @@ void fastcall flush_workqueue(struct wor
 		/* Always use first cpu's area. */
 		flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, singlethread_cpu));
 	} else {
+		long sequence;
 		int cpu;
+again:
+		sequence = hotplug_sequence;
 
 		for_each_online_cpu(cpu)
 			flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
+
+		if (unlikely(sequence != hotplug_sequence))
+			goto again;
 	}
 	mutex_unlock(&workqueue_mutex);
 }
@@ -874,6 +881,7 @@ static int __devinit workqueue_cpu_callb
 			cleanup_workqueue_thread(wq, hotcpu);
 		list_for_each_entry(wq, &workqueues, list)
 			take_over_work(wq, hotcpu);
+		hotplug_sequence++;
 		break;
 
 	case CPU_LOCK_RELEASE:



* Re: [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race
  2006-12-30 16:10 [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race Oleg Nesterov
@ 2007-01-03  0:27 ` Andrew Morton
  2007-01-03 14:04   ` Gautham R Shenoy
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2007-01-03  0:27 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, David Howells, Christoph Hellwig, Gautham R Shenoy,
	linux-kernel

On Sat, 30 Dec 2006 19:10:31 +0300
Oleg Nesterov <oleg@tv-sign.ru> wrote:

> "[PATCH 1/2] reimplement flush_workqueue()" fixed one race when CPU goes down
> while flush_cpu_workqueue() plays with it. But there is another problem, CPU
> can die before flush_workqueue() has a chance to call flush_cpu_workqueue().
> In that case pending work_structs can migrate to CPU which was already checked,
> so we should redo the "for_each_online_cpu(cpu)" loop.
> 

I have a mental note that these:

extend-notifier_call_chain-to-count-nr_calls-made.patch
extend-notifier_call_chain-to-count-nr_calls-made-fixes.patch
extend-notifier_call_chain-to-count-nr_calls-made-fixes-2.patch
define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release.patch
define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release-fix.patch
eliminate-lock_cpu_hotplug-in-kernel-schedc.patch
eliminate-lock_cpu_hotplug-in-kernel-schedc-fix.patch
handle-cpu_lock_acquire-and-cpu_lock_release-in-workqueue_cpu_callback.patch

should be scrapped.  But really I forget what their status is.  Gautham,
can you please remind us where we're at?



* Re: [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race
  2007-01-03  0:27 ` Andrew Morton
@ 2007-01-03 14:04   ` Gautham R Shenoy
  2007-01-03 15:17     ` Gautham R Shenoy
  0 siblings, 1 reply; 6+ messages in thread
From: Gautham R Shenoy @ 2007-01-03 14:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Oleg Nesterov, Ingo Molnar, David Howells, Christoph Hellwig,
	Gautham R Shenoy, linux-kernel, dipankar, vatsa

Hi Andrew,

Sorry, I have yet to check out Venki's and Oleg's patches as I
just returned from vacation.

On Tue, Jan 02, 2007 at 04:27:27PM -0800, Andrew Morton wrote:
> 
> I have a mental note that these:
> 
> extend-notifier_call_chain-to-count-nr_calls-made.patch
> extend-notifier_call_chain-to-count-nr_calls-made-fixes.patch
> extend-notifier_call_chain-to-count-nr_calls-made-fixes-2.patch

These patches are needed because they allow us to send the "failed"
notifications only to those subsystems that received the "prepare"
notifications earlier.

> define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release.patch
> define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release-fix.patch

These were posted in order to have a common place where the subsystems
could lock their per-subsystem hotplug mutexes/semaphores from within the
cpu-hotplug callback function. Hence they are needed, IMO.

> eliminate-lock_cpu_hotplug-in-kernel-schedc.patch
> eliminate-lock_cpu_hotplug-in-kernel-schedc-fix.patch

These patches define and use a mutex to handle cpu-hotplug and eliminate
the use of lock_cpu_hotplug in sched.c. Hence they are still needed.

> handle-cpu_lock_acquire-and-cpu_lock_release-in-workqueue_cpu_callback.patch

Again, this one ensures that workqueue_mutex is taken/released on
CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE events in the cpuhotplug callback
function. So this one is required, unless it conflicts with what Oleg
has posted. Will check that out tonite.

> 
> should be scrapped.  But really I forget what their status is.  Gautham,
> can you please remind us where we're at?
> 

If all goes fine (w.r.t. cpufreq and workqueue), eliminating
lock_cpu_hotplug from kernel/*.c should be relatively easy. <fingers crossed>

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race
  2007-01-03 14:04   ` Gautham R Shenoy
@ 2007-01-03 15:17     ` Gautham R Shenoy
  2007-01-03 17:26       ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Gautham R Shenoy @ 2007-01-03 15:17 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, David Howells,
	Christoph Hellwig, linux-kernel, dipankar, vatsa

On Wed, Jan 03, 2007 at 07:34:59PM +0530, Gautham R Shenoy wrote:
> 
> > handle-cpu_lock_acquire-and-cpu_lock_release-in-workqueue_cpu_callback.patch
> 
> Again, this one ensures that workqueue_mutex is taken/released on
> CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE events in the cpu-hotplug callback
> function. So this one is required, unless it conflicts with what Oleg
> has posted. Will check that out tonight.

We would still be needing this patch as it's complementing what Oleg has
posted.

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race
  2007-01-03 15:17     ` Gautham R Shenoy
@ 2007-01-03 17:26       ` Oleg Nesterov
  2007-01-04  4:30         ` Gautham R Shenoy
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2007-01-03 17:26 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Andrew Morton, Ingo Molnar, David Howells, Christoph Hellwig,
	linux-kernel, dipankar, vatsa

On 01/03, Gautham R Shenoy wrote:
>
> On Wed, Jan 03, 2007 at 07:34:59PM +0530, Gautham R Shenoy wrote:
> > 
> > > handle-cpu_lock_acquire-and-cpu_lock_release-in-workqueue_cpu_callback.patch
> > 
> > Again, this one ensures that workqueue_mutex is taken/released on
> > CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE events in the cpu-hotplug callback
> > function. So this one is required, unless it conflicts with what Oleg
> > has posted. Will check that out tonight.
> 
> We would still be needing this patch as it's complementing what Oleg has
> posted.

I thought that these patches don't depend on each other: flush_work/flush_workqueue
don't care where cpu-hotplug takes workqueue_mutex, in the CPU_LOCK_ACQUIRE or in
the CPU_UP_PREPARE case (or CPU_DEAD/CPU_LOCK_RELEASE for unlock).

Could you clarify? Just curious.

Oleg.



* Re: [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race
  2007-01-03 17:26       ` Oleg Nesterov
@ 2007-01-04  4:30         ` Gautham R Shenoy
  0 siblings, 0 replies; 6+ messages in thread
From: Gautham R Shenoy @ 2007-01-04  4:30 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, Andrew Morton, Ingo Molnar, David Howells,
	Christoph Hellwig, linux-kernel, dipankar, vatsa

On Wed, Jan 03, 2007 at 08:26:57PM +0300, Oleg Nesterov wrote:
> 
> I thought that these patches don't depend on each other: flush_work/flush_workqueue
> don't care where cpu-hotplug takes workqueue_mutex, in the CPU_LOCK_ACQUIRE or in
> the CPU_UP_PREPARE case (or CPU_DEAD/CPU_LOCK_RELEASE for unlock).
> 
> Could you clarify? Just curious.

You are right. They don't depend on each other. 

The intention behind introducing CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE
was to have a standard place where the subsystems could acquire/release
the "cpu hotplug protection" mutex in the cpu_hotplug callback function.

The same can be achieved by acquiring these mutexes in
CPU_UP_PREPARE/CPU_DOWN_PREPARE etc.

This is true for every subsystem that is cpu-hotplug aware.

> Oleg.
> 

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

