* [RFC v2][PATCH] create workqueue threads only when needed
From: Frederic Weisbecker @ 2009-01-28 0:27 UTC
To: Ingo Molnar
Cc: linux-kernel, Andrew Morton, Lai Jiangshan, Peter Zijlstra,
Steven Rostedt, Oleg Nesterov, Alasdair G Kergon,
Arjan van de Ven, fweisbec
While looking at the statistics from the workqueue tracer, I've been surprised by the
number of useless workqueues I had:
CPU  INSERTED  EXECUTED   NAME
 |      |         |         |
 *      0          0       kpsmoused
 *      0          0       ata_aux
 *      0          0       cqueue
 *      0          0       kacpi_notify
 *      0          0       kacpid
 *    998        998       khelper
 *      0          0       cpuset
 1      0          0       hda0/1
 1     42         42       reiserfs/1
 1      0          0       scsi_tgtd/1
 1      0          0       aio/1
 1      0          0       ata/1
 1    193        193       kblockd/1
 1      0          0       kintegrityd/1
 1      4          4       work_on_cpu/1
 1   1244       1244       events/1
 0      0          0       hda0/0
 0     63         63       reiserfs/0
 0      0          0       scsi_tgtd/0
 0      0          0       aio/0
 0      0          0       ata/0
 0    188        188       kblockd/0
 0      0          0       kintegrityd/0
 0     16         16       work_on_cpu/0
 0   1360       1360       events/0
All of the workqueues with 0 works inserted do nothing, for several reasons:
_ Drivers built in but unneeded on my system, which create their workqueue(s) when they init
_ Services which need their own workqueue, for various reasons, but which receive jobs
very rarely (often never)
_ ...?
And the result of git-grep create_singlethread_workqueue is even more surprising.
So I've started a patch which, by default, creates the workqueues without their thread,
except for the kevents one.
Their thread will be created and started only when a workqueue receives its first work
to do. This is performed by submitting a thread-creation work to the kevent workqueues,
which are always there and are the only ones whose threads are started on creation.
The result after this patch:
# CPU  INSERTED  EXECUTED   NAME
#  |      |         |         |
   *    999       1000       khelper
   1      5          6       reiserfs/1
   1      0          2       work_on_cpu/1
   1     86         87       kblockd/1
   1     14         16       work_on_cpu/1
   1    149        149       events/1
   0     15         16       reiserfs/0
   0     85         86       kblockd/0
   0    146        146       events/0
This drops 16 useless kernel threads on my system.
The workqueues created without a thread at start are called "shadowed", whereas the others,
like keventd, are called "unshadowed".
Tested successfully, even after hibernation.
Changelog in v2:
_ Fix problems pointed out by Oleg Nesterov, Ingo Molnar and Alasdair G Kergon
_ Change the names of structures/functions to convey the notion of "shadowed" workqueues
_ Fix the race where two works could concurrently create a thread for the same cpu workqueue
_ Fix the race where a work is pending to create a thread for a workqueue while that
workqueue is concurrently being cleaned up
_ If we fail to create a thread for a shadowed workqueue, retry in 1 sec (and warn!)
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
include/linux/workqueue.h | 44 +++++++------
kernel/workqueue.c | 154 ++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 162 insertions(+), 36 deletions(-)
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 3cd51e5..7aba6b3 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -162,33 +162,35 @@ struct execute_work {
extern struct workqueue_struct *
__create_workqueue_key(const char *name, int singlethread,
int freezeable, int rt, struct lock_class_key *key,
- const char *lock_name);
+ const char *lock_name, bool shadowed);
#ifdef CONFIG_LOCKDEP
-#define __create_workqueue(name, singlethread, freezeable, rt) \
-({ \
- static struct lock_class_key __key; \
- const char *__lock_name; \
- \
- if (__builtin_constant_p(name)) \
- __lock_name = (name); \
- else \
- __lock_name = #name; \
- \
- __create_workqueue_key((name), (singlethread), \
- (freezeable), (rt), &__key, \
- __lock_name); \
+#define __create_workqueue(name, singlethread, freezeable, rt, shadowed) \
+({ \
+ static struct lock_class_key __key; \
+ const char *__lock_name; \
+ \
+ if (__builtin_constant_p(name)) \
+ __lock_name = (name); \
+ else \
+ __lock_name = #name; \
+ \
+ __create_workqueue_key((name), (singlethread), \
+ (freezeable), (rt), &__key, \
+ __lock_name, shadowed); \
})
#else
-#define __create_workqueue(name, singlethread, freezeable, rt) \
- __create_workqueue_key((name), (singlethread), (freezeable), (rt), \
- NULL, NULL)
+#define __create_workqueue(name, singlethread, freezeable, rt, shadowed) \
+ __create_workqueue_key((name), (singlethread), (freezeable), (rt), \
+ NULL, NULL, shadowed)
#endif
-#define create_workqueue(name) __create_workqueue((name), 0, 0, 0)
-#define create_rt_workqueue(name) __create_workqueue((name), 0, 0, 1)
-#define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1, 0)
-#define create_singlethread_workqueue(name) __create_workqueue((name), 1, 0, 0)
+#define create_workqueue(name) __create_workqueue((name), 0, 0, 0, 1)
+#define create_rt_workqueue(name) __create_workqueue((name), 0, 0, 1, 0)
+#define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1, 0, 0)
+#define create_singlethread_workqueue(name) \
+ __create_workqueue((name), 1, 0, 0, 1)
+#define create_kevents(name) __create_workqueue((name), 0, 0, 0, 0)
extern void destroy_workqueue(struct workqueue_struct *wq);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e53ee18..6e0a38e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -49,6 +49,8 @@ struct cpu_workqueue_struct {
struct workqueue_struct *wq;
struct task_struct *thread;
+ atomic_t unshadowed; /* 1 if the thread creation is pending or done */
+ wait_queue_head_t thread_creation;
int run_depth; /* Detect run_workqueue() recursion depth */
} ____cacheline_aligned;
@@ -69,6 +71,17 @@ struct workqueue_struct {
#endif
};
+/*
+ * This structure embeds a work to create the thread of a given cpu workqueue.
+ * Most of the cpu workqueues (except keventd) are created "shadowed", which
+ * means their thread will be created when the first work comes to them,
+ * making the cpu workqueue "unshadowed".
+ */
+struct workqueue_shadow {
+ struct cpu_workqueue_struct *cwq;
+ struct delayed_work work;
+};
+
/* Serializes the accesses to the list of workqueues. */
static DEFINE_SPINLOCK(workqueue_lock);
static LIST_HEAD(workqueues);
@@ -126,12 +139,46 @@ struct cpu_workqueue_struct *get_wq_data(struct work_struct *work)
return (void *) (atomic_long_read(&work->data) & WORK_STRUCT_WQ_DATA_MASK);
}
+static void workqueue_unshadow_work(struct work_struct *work);
+
+/*
+ * Called when a first work is inserted on a workqueue.
+ * Insert a work to kevent to create a thread for the given cwq
+ */
+static void workqueue_unshadow(struct cpu_workqueue_struct *cwq)
+{
+ struct workqueue_shadow *ws;
+
+ /* Prevent from concurrent unshadowing */
+ if (unlikely(atomic_inc_return(&cwq->unshadowed) != 1))
+ goto already_unshadowed;
+
+ /*
+ * The work can be inserted whatever is the context.
+ * But such atomic allocation will be rare and freed soon.
+ */
+ ws = kmalloc(sizeof(*ws), GFP_ATOMIC);
+ if (!ws) {
+ WARN_ON_ONCE(1);
+ goto already_unshadowed;
+ }
+ INIT_DELAYED_WORK(&ws->work, workqueue_unshadow_work);
+ ws->cwq = cwq;
+ schedule_delayed_work(&ws->work, 0);
+
+ return;
+
+already_unshadowed:
+ atomic_dec(&cwq->unshadowed);
+}
+
+
DEFINE_TRACE(workqueue_insertion);
static void insert_work(struct cpu_workqueue_struct *cwq,
struct work_struct *work, struct list_head *head)
{
- trace_workqueue_insertion(cwq->thread, work);
+ trace_workqueue_insertion(cwq->thread, work, cwq->wq->singlethread);
set_wq_data(work, cwq);
/*
@@ -148,6 +195,9 @@ static void __queue_work(struct cpu_workqueue_struct *cwq,
{
unsigned long flags;
+ if (unlikely(!cwq->thread))
+ workqueue_unshadow(cwq);
+
spin_lock_irqsave(&cwq->lock, flags);
insert_work(cwq, work, &cwq->worklist);
spin_unlock_irqrestore(&cwq->lock, flags);
@@ -291,7 +341,9 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq)
*/
struct lockdep_map lockdep_map = work->lockdep_map;
#endif
- trace_workqueue_execution(cwq->thread, work);
+ struct workqueue_struct *wq = cwq->wq;
+
+ trace_workqueue_execution(cwq->thread, work, wq->singlethread);
cwq->current_work = work;
list_del_init(cwq->worklist.next);
spin_unlock_irq(&cwq->lock);
@@ -387,6 +439,9 @@ static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
} else {
struct wq_barrier barr;
+ if (unlikely(!cwq->thread))
+ workqueue_unshadow(cwq);
+
active = 0;
spin_lock_irq(&cwq->lock);
if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
@@ -759,6 +814,11 @@ int current_is_keventd(void)
}
+static inline void init_unshadowed_state(struct cpu_workqueue_struct *cwq)
+{
+ atomic_set(&cwq->unshadowed, 1);
+}
+
static struct cpu_workqueue_struct *
init_cpu_workqueue(struct workqueue_struct *wq, int cpu)
{
@@ -768,6 +828,7 @@ init_cpu_workqueue(struct workqueue_struct *wq, int cpu)
spin_lock_init(&cwq->lock);
INIT_LIST_HEAD(&cwq->worklist);
init_waitqueue_head(&cwq->more_work);
+ init_waitqueue_head(&cwq->thread_creation);
return cwq;
}
@@ -796,7 +857,8 @@ static int create_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
cwq->thread = p;
- trace_workqueue_creation(cwq->thread, cpu);
+ trace_workqueue_creation(cwq->thread, wq->singlethread ?
+ SINGLETHREAD_CPU : cpu);
return 0;
}
@@ -812,12 +874,45 @@ static void start_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
}
}
+static void workqueue_unshadow_work(struct work_struct *work)
+{
+ struct workqueue_shadow *l;
+ struct delayed_work *delayed;
+ struct cpu_workqueue_struct *cwq;
+ int cpu = smp_processor_id();
+ int err = 0;
+
+ delayed = container_of(work, struct delayed_work, work);
+ l = container_of(delayed, struct workqueue_shadow, work);
+ cwq = l->cwq;
+
+ if (is_wq_single_threaded(cwq->wq)) {
+ err = create_workqueue_thread(cwq, singlethread_cpu);
+ start_workqueue_thread(cwq, -1);
+ } else {
+ err = create_workqueue_thread(cwq, cpu);
+ start_workqueue_thread(cwq, cpu);
+ }
+
+ /* If we can't create the thread, retry 1 sec later */
+ if (err) {
+ WARN_ON_ONCE(err);
+ PREPARE_DELAYED_WORK(delayed, workqueue_unshadow_work);
+ schedule_delayed_work(delayed, HZ);
+ return;
+ }
+
+ kfree(l);
+ wake_up(&cwq->thread_creation);
+}
+
struct workqueue_struct *__create_workqueue_key(const char *name,
int singlethread,
int freezeable,
int rt,
struct lock_class_key *key,
- const char *lock_name)
+ const char *lock_name,
+ bool shadowed)
{
struct workqueue_struct *wq;
struct cpu_workqueue_struct *cwq;
@@ -842,8 +937,12 @@ struct workqueue_struct *__create_workqueue_key(const char *name,
if (singlethread) {
cwq = init_cpu_workqueue(wq, singlethread_cpu);
- err = create_workqueue_thread(cwq, singlethread_cpu);
- start_workqueue_thread(cwq, -1);
+ if (!shadowed) {
+ err = create_workqueue_thread(cwq, singlethread_cpu);
+ start_workqueue_thread(cwq, -1);
+ if (!err)
+ init_unshadowed_state(cwq);
+ }
} else {
cpu_maps_update_begin();
/*
@@ -863,10 +962,14 @@ struct workqueue_struct *__create_workqueue_key(const char *name,
*/
for_each_possible_cpu(cpu) {
cwq = init_cpu_workqueue(wq, cpu);
- if (err || !cpu_online(cpu))
+ if (err || !cpu_online(cpu) || shadowed)
continue;
+
err = create_workqueue_thread(cwq, cpu);
start_workqueue_thread(cwq, cpu);
+ if (!err)
+ init_unshadowed_state(cwq);
+
}
cpu_maps_update_done();
}
@@ -883,12 +986,25 @@ DEFINE_TRACE(workqueue_destruction);
static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
{
+ DEFINE_WAIT(wait);
+ long timeout = 0;
+ int unshadowed = atomic_read(&cwq->unshadowed);
+
+ /* Shadowed => no thread has been created */
+ if (!unshadowed)
+ return;
+
/*
- * Our caller is either destroy_workqueue() or CPU_POST_DEAD,
- * cpu_add_remove_lock protects cwq->thread.
+ * If it's unshadowed, we want to ensure the thread creation
+ * has been completed.
*/
- if (cwq->thread == NULL)
- return;
+ prepare_to_wait(&cwq->thread_creation, &wait, TASK_UNINTERRUPTIBLE);
+ if (!cwq->thread)
+ timeout = schedule_timeout_interruptible(HZ * 3);
+ finish_wait(&cwq->thread_creation, &wait);
+
+ /* We waited for 3 seconds, this is likely a soft lockup */
+ WARN_ON(timeout);
lock_map_acquire(&cwq->wq->lockdep_map);
lock_map_release(&cwq->wq->lockdep_map);
@@ -904,9 +1020,12 @@ static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
* checks list_empty(), and a "normal" queue_work() can't use
* a dead CPU.
*/
- trace_workqueue_destruction(cwq->thread);
- kthread_stop(cwq->thread);
- cwq->thread = NULL;
+
+ if (!timeout) {
+ trace_workqueue_destruction(cwq->thread, cwq->wq->singlethread);
+ kthread_stop(cwq->thread);
+ cwq->thread = NULL;
+ }
}
/**
@@ -955,6 +1074,9 @@ undo:
switch (action) {
case CPU_UP_PREPARE:
+ /* Will be created during the first work insertion */
+ if (!atomic_read(&cwq->unshadowed))
+ break;
if (!create_workqueue_thread(cwq, cpu))
break;
printk(KERN_ERR "workqueue [%s] for %i failed\n",
@@ -964,6 +1086,8 @@ undo:
goto undo;
case CPU_ONLINE:
+ if (!atomic_read(&cwq->unshadowed))
+ break;
start_workqueue_thread(cwq, cpu);
break;
@@ -1033,7 +1157,7 @@ void __init init_workqueues(void)
singlethread_cpu = cpumask_first(cpu_possible_mask);
cpu_singlethread_map = cpumask_of(singlethread_cpu);
hotcpu_notifier(workqueue_cpu_callback, 0);
- keventd_wq = create_workqueue("events");
+ keventd_wq = create_kevents("events");
BUG_ON(!keventd_wq);
#ifdef CONFIG_SMP
work_on_cpu_wq = create_workqueue("work_on_cpu");
* Re: [RFC v2][PATCH] create workqueue threads only when needed
From: Oleg Nesterov @ 2009-01-28 3:02 UTC
To: Frederic Weisbecker
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Lai Jiangshan,
Peter Zijlstra, Steven Rostedt, Alasdair G Kergon,
Arjan van de Ven
On 01/28, Frederic Weisbecker wrote:
>
> +static void workqueue_unshadow(struct cpu_workqueue_struct *cwq)
> +{
> + struct workqueue_shadow *ws;
> +
> + /* Prevent from concurrent unshadowing */
> + if (unlikely(atomic_inc_return(&cwq->unshadowed) != 1))
> + goto already_unshadowed;
> +
> + /*
> + * The work can be inserted whatever is the context.
> + * But such atomic allocation will be rare and freed soon.
> + */
> + ws = kmalloc(sizeof(*ws), GFP_ATOMIC);
> + if (!ws) {
> + WARN_ON_ONCE(1);
> + goto already_unshadowed;
> + }
> + INIT_DELAYED_WORK(&ws->work, workqueue_unshadow_work);
> + ws->cwq = cwq;
> + schedule_delayed_work(&ws->work, 0);
> +
> + return;
> +
> +already_unshadowed:
> + atomic_dec(&cwq->unshadowed);
> +}
Can't understand why you use a delayed work...
I must admit, I don't like this patch. Perhaps I am wrong, mostly I
dislike the complications it adds.
Anybody else please vote for this change?
Hmm. We never reset cwq->unshadowed, so cwq->thread becomes "non-lazy"
after cpu_down() + cpu_up().
And. Of course it is not good that queue_work() can silently fail just
because GFP_ATOMIC fails. This is not acceptable, imho. But fixable.
What is not fixable is that this patch adds a subtle lock-ordering
problem. With this patch any flush_work() or flush_workqueue() or
destroy_workqueue() depend on keventd, and can deadlock if the caller
shares the lock with any work_struct on keventd.
Or. let's suppose keventd has a sleeping work_struct which waits
for the event. Now we queue the work which should "implement"
this event on !unshadowed wq - deadlock.
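To illustrate, with hypothetical names (a sketch, assuming both works
end up on the same CPU's events/N thread):

	static DECLARE_COMPLETION(evt);

	static void keventd_waiter(struct work_struct *unused)
	{
		/* runs on keventd, sleeps until someone completes evt */
		wait_for_completion(&evt);
	}

	static void evt_signaller(struct work_struct *unused)
	{
		complete(&evt);
	}

keventd_waiter() blocks the events/N thread. evt_signaller() is queued
on a still-shadowed wq, so its thread-creation work lands behind the
sleeping keventd_waiter() and never runs, evt is never completed:
deadlock.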
Another problem. workqueue_unshadow_work() populates cwq->thread and
binds it to smp_processor_id(). This is not safe, CPU can go away
after smp_processor_id() but before wake_up_process().
Oh, and schedule_delayed_work() is not right, think about queue_work_on().
> static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
> {
> + DEFINE_WAIT(wait);
> + long timeout = 0;
> + int unshadowed = atomic_read(&cwq->unshadowed);
> +
> + /* Shadowed => no thread has been created */
> + if (!unshadowed)
> + return;
This is not right, if the previous workqueue_unshadow() failed, we
can return with the pending works.
> +
> /*
> - * Our caller is either destroy_workqueue() or CPU_POST_DEAD,
> - * cpu_add_remove_lock protects cwq->thread.
> + * If it's unshadowed, we want to ensure the thread creation
> + * has been completed.
> */
> - if (cwq->thread == NULL)
> - return;
> + prepare_to_wait(&cwq->thread_creation, &wait, TASK_UNINTERRUPTIBLE);
> + if (!cwq->thread)
> + timeout = schedule_timeout_interruptible(HZ * 3);
> + finish_wait(&cwq->thread_creation, &wait);
> +
> + /* We waited for 3 seconds, this is likely a soft lockup */
> + WARN_ON(timeout);
Can't understand... If timeout != 0, then we were woken by
workqueue_unshadow_work() ?
Anyway. We should not proceed if we failed to create cwq->thread.
The kernel can crash. And of course this is not good too. Yes,
you modified flush_cpu_workqueue() to call workqueue_unshadow(),
but this can fail too. And if another thread cancels the pending
works, flush_cpu_workqueue() just returns, and we crash. Or we
can hang forever.
Also. Please note that cleanup_workqueue_thread() can also be
called by CPU_UP_CANCELED when cwq->thread == NULL because it
was never created. We should do nothing in this case, but we
will hang if cwq->unshadowed != 0.
> switch (action) {
> case CPU_UP_PREPARE:
> + /* Will be created during the first work insertion */
> + if (!atomic_read(&cwq->unshadowed))
> + break;
> if (!create_workqueue_thread(cwq, cpu))
> break;
> printk(KERN_ERR "workqueue [%s] for %i failed\n",
> @@ -964,6 +1086,8 @@ undo:
> goto undo;
>
> case CPU_ONLINE:
> + if (!atomic_read(&cwq->unshadowed))
> + break;
> start_workqueue_thread(cwq, cpu);
> break;
Suppose that we have some strange cpu_callback(action, cpu)
which does:
case CPU_UP_PREPARE:
queue_work_on(cpu, my_wq, percpu_work);
break;
case CPU_UP_CANCELED:
cancel_work_sync(percpu_work);
Currently this works. But with this patch, queue_work_on() above
can leak workqueue_shadow.
Oleg.
* Re: [RFC v2][PATCH] create workqueue threads only when needed
From: Peter Zijlstra @ 2009-01-28 8:15 UTC
To: Oleg Nesterov
Cc: Frederic Weisbecker, Ingo Molnar, linux-kernel, Andrew Morton,
Lai Jiangshan, Steven Rostedt, Alasdair G Kergon,
Arjan van de Ven
On Wed, 2009-01-28 at 04:02 +0100, Oleg Nesterov wrote:
>
> I must admit, I don't like this patch. Perhaps I am wrong, mostly I
> dislike the complications it adds.
>
> Anybody else please vote for this change?
Because you asked ;-)
I too am not particularly fond of it, for much the same reasons you
mentioned.
While I think it's good to reduce the number of random kernel threads,
I'm afraid this isn't a good way to do so.
Why does everybody and his pony create their own workqueue? Is that
really because it deadlocks with keventd, or are there other reasons?
If so, which, and can we solve those?
* Re: [RFC v2][PATCH] create workqueue threads only when needed
From: Frédéric Weisbecker @ 2009-01-28 11:24 UTC
To: Oleg Nesterov
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Lai Jiangshan,
Peter Zijlstra, Steven Rostedt, Alasdair G Kergon,
Arjan van de Ven
2009/1/28 Oleg Nesterov <oleg@redhat.com>:
> On 01/28, Frederic Weisbecker wrote:
>>
>> +static void workqueue_unshadow(struct cpu_workqueue_struct *cwq)
>> +{
>> + struct workqueue_shadow *ws;
>> +
>> + /* Prevent from concurrent unshadowing */
>> + if (unlikely(atomic_inc_return(&cwq->unshadowed) != 1))
>> + goto already_unshadowed;
>> +
>> + /*
>> + * The work can be inserted whatever is the context.
>> + * But such atomic allocation will be rare and freed soon.
>> + */
>> + ws = kmalloc(sizeof(*ws), GFP_ATOMIC);
>> + if (!ws) {
>> + WARN_ON_ONCE(1);
>> + goto already_unshadowed;
>> + }
>> + INIT_DELAYED_WORK(&ws->work, workqueue_unshadow_work);
>> + ws->cwq = cwq;
>> + schedule_delayed_work(&ws->work, 0);
>> +
>> + return;
>> +
>> +already_unshadowed:
>> + atomic_dec(&cwq->unshadowed);
>> +}
>
> Can't understand why you use a delayed work...
Because if it fails to create the thread, we want to requeue the work
within a delay.
> I must admit, I don't like this patch. Perhaps I am wrong, mostly I
> dislike the complications it adds.
>
> Anybody else please vote for this change?
>
> Hmm. We never reset cwq->unshadowed, so cwq->thread becomes "non-lazy"
> after cpu_down() + cpu_up().
No need to reset it. If it is unshadowed, then the thread was created
and we want to recreate
it on cpu_up(). And if it is shadowed, then we don't need to create it yet.
> And. Of course it is not good that queue_work() can silently fail just
> because GFP_ATOMIC fails. This is not acceptable, imho. But fixable.
In my opinion too. I just haven't found a proper way to handle it yet.
> What is not fixable is that this patch adds a subtle lock-ordering
> problem. With this patch any flush_work() or flush_workqueue() or
> destroy_workqueue() depend on keventd, and can deadlock if the caller
> shares the lock with any work_struct on keventd.
>
> Or. let's suppose keventd has a sleeping work_struct which waits
> for the event. Now we queue the work which should "implement"
> this event on !unshadowed wq - deadlock.
I hadn't thought about such deadlocks... :-/
> Another problem. workqueue_unshadow_work() populates cwq->thread and
> binds it to smp_processor_id(). This is not safe, CPU can go away
> after smp_processor_id() but before wake_up_process().
So it should use get_cpu/put_cpu, right?
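Maybe something in this direction (untested sketch; get_cpu() alone
wouldn't do anyway, since create_workqueue_thread() can sleep):

	get_online_cpus();	/* pin cpu hotplug, may sleep */
	cpu = raw_smp_processor_id();
	err = create_workqueue_thread(cwq, cpu);
	if (!err)
		start_workqueue_thread(cwq, cpu);	/* cpu can't go away */
	put_online_cpus();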
> Oh, and schedule_delayed_work() is not right, think about queue_work_on().
>
>> static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
>> {
>> + DEFINE_WAIT(wait);
>> + long timeout = 0;
>> + int unshadowed = atomic_read(&cwq->unshadowed);
>> +
>> + /* Shadowed => no thread has been created */
>> + if (!unshadowed)
>> + return;
>
> This is not right, if the previous workqueue_unshadow() failed, we
> can return with the pending works.
Yeah, looks like this bit is not a good synchronization solution. Only a
spinlock can solve this: synchronize the accesses to cwq->unshadowed and
hold the lock while we are trying to allocate the work.
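A rough sketch, with a hypothetical cwq->unshadow_lock field and
unshadowed becoming a plain int protected by it:

	spin_lock_irqsave(&cwq->unshadow_lock, flags);
	if (!cwq->unshadowed) {
		/* GFP_ATOMIC: we can be called from any context */
		ws = kmalloc(sizeof(*ws), GFP_ATOMIC);
		if (ws) {
			cwq->unshadowed = 1;
			INIT_DELAYED_WORK(&ws->work, workqueue_unshadow_work);
			ws->cwq = cwq;
			schedule_delayed_work(&ws->work, 0);
		}
	}
	spin_unlock_irqrestore(&cwq->unshadow_lock, flags);

cleanup_workqueue_thread() would take the same lock before reading
cwq->unshadowed, so it could no longer observe a half-done unshadowing.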
>> +
>> /*
>> - * Our caller is either destroy_workqueue() or CPU_POST_DEAD,
>> - * cpu_add_remove_lock protects cwq->thread.
>> + * If it's unshadowed, we want to ensure the thread creation
>> + * has been completed.
>> */
>> - if (cwq->thread == NULL)
>> - return;
>> + prepare_to_wait(&cwq->thread_creation, &wait, TASK_UNINTERRUPTIBLE);
>> + if (!cwq->thread)
>> + timeout = schedule_timeout_interruptible(HZ * 3);
>> + finish_wait(&cwq->thread_creation, &wait);
>> +
>> + /* We waited for 3 seconds, this is likely a soft lockup */
>> + WARN_ON(timeout);
>
> Can't understand... If timeout != 0, then we were woken by
> workqueue_unshadow_work() ?
Oops, I was confused by the return value of schedule_timeout().
> Anyway. We should not proceed if we failed to create cwq->thread.
> The kernel can crash. And of course this is not good too. Yes,
> you modified flush_cpu_workqueue() to call workqueue_unshadow(),
> but this can fail too. And if another thread cancels the pending
> works, flush_cpu_workqueue() just returns, and we crash. Or we
> can hang forever.
Right.
> Also. Please note that cleanup_workqueue_thread() can also be
> called by CPU_UP_CANCELED when cwq->thread == NULL because it
> was never created. We should do nothing in this case, but we
> will hang if cwq->unshadowed != 0.
Not sure I understand your point here... unless... if it hasn't been
created yet, the keventd which was about to do it may itself have been
cleaned up... Is that what you are talking about?
Anyway, yes, this is a race condition.
>> switch (action) {
>> case CPU_UP_PREPARE:
>> + /* Will be created during the first work insertion */
>> + if (!atomic_read(&cwq->unshadowed))
>> + break;
>> if (!create_workqueue_thread(cwq, cpu))
>> break;
>> printk(KERN_ERR "workqueue [%s] for %i failed\n",
>> @@ -964,6 +1086,8 @@ undo:
>> goto undo;
>>
>> case CPU_ONLINE:
>> + if (!atomic_read(&cwq->unshadowed))
>> + break;
>> start_workqueue_thread(cwq, cpu);
>> break;
>
> Suppose that we have some strange cpu_callback(action, cpu)
> which does:
>
> case CPU_UP_PREPARE:
> queue_work_on(cpu, my_wq, percpu_work);
> break;
> case CPU_UP_CANCELED:
> cancel_work_sync(percpu_work);
>
> Currently this works. But with this patch, queue_work_on() above
> can leak workqueue_shadow.
But the work of task creation is pending on kevent, whose works are not
cancelled here.
Anyway, you are right, the approach is not good. Even if the races are
fixed, when the caller of a workqueue holds a lock needed by a work queued
before the thread-creation work on kevent, we have an unfixable deadlock.
The only solution would be a dedicated thread/workqueue to unshadow the
other workqueues. But isn't that solving the problem of random tasks by
creating another one? Yes and no: creating one task to avoid creating
several pointless ones can be a solution.
Hm?
Thanks for your review Oleg!
* Re: [RFC v2][PATCH] create workqueue threads only when needed
From: Frédéric Weisbecker @ 2009-01-28 11:33 UTC
To: Peter Zijlstra
Cc: Oleg Nesterov, Ingo Molnar, linux-kernel, Andrew Morton,
Lai Jiangshan, Steven Rostedt, Alasdair G Kergon,
Arjan van de Ven
2009/1/28 Peter Zijlstra <a.p.zijlstra@chello.nl>:
> On Wed, 2009-01-28 at 04:02 +0100, Oleg Nesterov wrote:
>>
>> I must admit, I don't like this patch. Perhaps I am wrong, mostly I
>> dislike the complications it adds.
>>
>> Anybody else please vote for this change?
>
> Because you asked ;-)
>
> I too am not particularly fond of it, for much the same reasons you
> mentioned.
>
> While I think it's good to reduce the number of random kernel threads,
> I'm afraid this isn't a good way to do so.
>
> Why does everybody and his pony create their own workqueue? Is that
> really because it deadlocks with keventd, or are there other reasons?
> If so, which, and can we solve those?
Even with the shadow workqueue approach, the call sites that create
pointless workqueues are still a problem that should be fixed, since they
consume memory for their workqueue.
The reasons behind these callsites are not easy to guess: they are rarely
commented, and it's hard to tell whether the dedicated workqueue exists
because of deadlocks with kevents, because of long works which could
starve other kevent works, or for no good reason at all...
Depending on the reason, I guess most of them could become threads created
and destroyed on the fly, or async functions as Arjan suggested, or simple
kevent works.
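For example, the slow init that justifies a private workqueue in some
drivers could become something like this with the new async API
(hypothetical driver names, sketch only):

	#include <linux/async.h>

	static void mydrv_init_async(void *data, async_cookie_t cookie)
	{
		struct mydrv_device *dev = data;

		mydrv_slow_hw_init(dev);	/* the long-running part */
	}

	/* in the probe path, instead of queueing on a private wq: */
	async_schedule(mydrv_init_async, dev);

No thread exists for this until there is actually something to run.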
* Re: [RFC v2][PATCH] create workqueue threads only when needed
From: Frederic Weisbecker @ 2009-01-28 23:31 UTC
To: Oleg Nesterov
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Lai Jiangshan,
Peter Zijlstra, Steven Rostedt, Alasdair G Kergon,
Arjan van de Ven
On Wed, Jan 28, 2009 at 04:02:24AM +0100, Oleg Nesterov wrote:
> On 01/28, Frederic Weisbecker wrote:
> >
> > +static void workqueue_unshadow(struct cpu_workqueue_struct *cwq)
> > +{
> > + struct workqueue_shadow *ws;
> > +
> > + /* Prevent from concurrent unshadowing */
> > + if (unlikely(atomic_inc_return(&cwq->unshadowed) != 1))
> > + goto already_unshadowed;
> > +
> > + /*
> > + * The work can be inserted whatever is the context.
> > + * But such atomic allocation will be rare and freed soon.
> > + */
> > + ws = kmalloc(sizeof(*ws), GFP_ATOMIC);
> > + if (!ws) {
> > + WARN_ON_ONCE(1);
> > + goto already_unshadowed;
> > + }
> > + INIT_DELAYED_WORK(&ws->work, workqueue_unshadow_work);
> > + ws->cwq = cwq;
> > + schedule_delayed_work(&ws->work, 0);
> > +
> > + return;
> > +
> > +already_unshadowed:
> > + atomic_dec(&cwq->unshadowed);
> > +}
>
> Can't understand why you use a delayed work...
>
> I must admit, I don't like this patch. Perhaps I am wrong, mostly I
> dislike the complications it adds.
>
> Anybody else please vote for this change?
>
> Hmm. We never reset cwq->unshadowed, so cwq->thread becomes "non-lazy"
> after cpu_down() + cpu_up().
>
> And. Of course it is not good that queue_work() can silently fail just
> because GFP_ATOMIC fails. This is not acceptable, imho. But fixable.
>
> What is not fixable is that this patch adds a subtle lock-ordering
> problem. With this patch any flush_work() or flush_workqueue() or
> destroy_workqueue() depend on keventd, and can deadlock if the caller
> shares the lock with any work_struct on keventd.
>
> Or. let's suppose keventd has a sleeping work_struct which waits
> for the event. Now we queue the work which should "implement"
> this event on !unshadowed wq - deadlock.
>
> Another problem. workqueue_unshadow_work() populates cwq->thread and
> binds it to smp_processor_id(). This is not safe, CPU can go away
> after smp_processor_id() but before wake_up_process().
>
> Oh, and schedule_delayed_work() is not right, think about queue_work_on().
Hi Oleg,
I was about to retry the same approach but through async functions (kernel/async.c)
which would have solved the possible deadlock you described and would have made
the synchronizations easier.
But actually this is only a half solution:
Pointless workqueues stay pointless, even if their threads don't appear
until they run their first work.
And moreover this is hiding the real problem: parts of the kernel use
dedicated workqueues while kevent is sufficient most of the time.
And if there are problems with using kevent for them, because of deadlocks
or slow works, async functions are a good solution.
I think (more precisely, I agree with you) these callsites should be fixed
individually. I shall fix some of them if I can, given the limited hardware
I can test on.
Thanks.
> > static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
> > {
> > + DEFINE_WAIT(wait);
> > + long timeout = 0;
> > + int unshadowed = atomic_read(&cwq->unshadowed);
> > +
> > + /* Shadowed => no thread has been created */
> > + if (!unshadowed)
> > + return;
>
> This is not right, if the previous workqueue_unshadow() failed, we
> can return with the pending works.
>
> > +
> > /*
> > - * Our caller is either destroy_workqueue() or CPU_POST_DEAD,
> > - * cpu_add_remove_lock protects cwq->thread.
> > + * If it's unshadowed, we want to ensure the thread creation
> > + * has been completed.
> > */
> > - if (cwq->thread == NULL)
> > - return;
> > + prepare_to_wait(&cwq->thread_creation, &wait, TASK_UNINTERRUPTIBLE);
> > + if (!cwq->thread)
> > + timeout = schedule_timeout_interruptible(HZ * 3);
> > + finish_wait(&cwq->thread_creation, &wait);
> > +
> > + /* We waited for 3 seconds, this is likely a soft lockup */
> > + WARN_ON(timeout);
>
> Can't understand... If timeout != 0, then we were woken by
> workqueue_unshadow_work() ?
>
> Anyway. We should not proceed if we failed to create cwq->thread.
> The kernel can crash. And of course this is not good too. Yes,
> you modified flush_cpu_workqueue() to call workqueue_unshadow(),
> but this can fail too. And if another thread cancels the pending
> works, flush_cpu_workqueue() just returns, and we crash. Or we
> can hang forever.
>
> Also. Please note that cleanup_workqueue_thread() can also be
> called by CPU_UP_CANCELED when cwq->thread == NULL because it
> was never created. We should do nothing in this case, but we
> will hang if cwq->unshadowed != 0.
>
> > switch (action) {
> > case CPU_UP_PREPARE:
> > + /* Will be created during the first work insertion */
> > + if (!atomic_read(&cwq->unshadowed))
> > + break;
> > if (!create_workqueue_thread(cwq, cpu))
> > break;
> > printk(KERN_ERR "workqueue [%s] for %i failed\n",
> > @@ -964,6 +1086,8 @@ undo:
> > goto undo;
> >
> > case CPU_ONLINE:
> > + if (!atomic_read(&cwq->unshadowed))
> > + break;
> > start_workqueue_thread(cwq, cpu);
> > break;
>
> Suppose that we have some strange cpu_callback(action, cpu)
> which does:
>
> case CPU_UP_PREPARE:
> queue_work_on(cpu, my_wq, percpu_work);
> break;
> case CPU_UP_CANCELED:
> cancel_work_sync(percpu_work);
>
> Currently this works. But with this patch, queue_work_on() above
> can leak workqueue_shadow.
>
> Oleg.
>
* Re: [RFC v2][PATCH] create workqueue threads only when needed
From: Oleg Nesterov @ 2009-01-29 4:58 UTC
To: Frederic Weisbecker
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Lai Jiangshan,
Peter Zijlstra, Steven Rostedt, Alasdair G Kergon,
Arjan van de Ven
On 01/29, Frederic Weisbecker wrote:
>
> I was about to retry the same approach but through async functions (kernel/async.c)
> which would have solved the possible deadlock you described and would have made
> the synchronizations easier.
>
> But actually this is only a half solution:
>
> Pointless workqueues stay pointless, even if they don't appear before they run once.
> And moreover this is hiding the real problem: parts of the kernel use dedicated workqueues
> while kevent is sufficient most of the time.
>
> And if there are problems with using the workqueues, because of deadlocks or slow
> works, async functions are a good solution.
Completely agreed.
And. Even if some subsystem really needs its own workqueue, probably it
can use the single-threaded one, but we have a lot of multithreaded wqs.
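I.e., for subsystems with rare works:

	/* one thread per CPU, mostly idle: */
	wq = create_workqueue("mydrv");

	/* vs. a single thread shared by all CPUs: */
	wq = create_singlethread_workqueue("mydrv");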
Oleg.