* Screwing with the concurrency limit
@ 2011-01-08 14:55 Kent Overstreet
2011-01-08 16:18 ` Tejun Heo
0 siblings, 1 reply; 7+ messages in thread
From: Kent Overstreet @ 2011-01-08 14:55 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-kernel
First off, wild applause for cmwq. The limitations of the old workqueues
were a major irritation; I think your new implementation is fabulous.
However, when merging bcache with mainline, I ran into a bit of a thorny
issue. Bcache relies heavily on workqueues: updates to the cache's btree
have to be done after every relevant IO completes. Additionally, btree
insertions can involve sleeping on IO while the root of the tree isn't
write locked - so we'd like to not block other work items from
completing if we don't have to.
So, one might expect the way to get the best performance would be
alloc_workqueue("bcache", WQ_HIGHPRI|WQ_MEM_RECLAIM, 0)
Trouble is, sometimes we do write lock the root of the btree, blocking
everything else from getting anything done - the end result is
root@moria:~# ps ax|grep kworker|wc -l
1550
(running dbench in a VM with disks in tmpfs). Performance is fine (I
think, haven't been trying to rigorously benchmark) but that's annoying.
I think the best way I can express it is that bcache normally wants a
concurrency limit of 1, except when we're blocking and we aren't write
locking the root of the btree.
So, do you think there might be some sane way of doing this with cmwq?
Some way to say "don't count the work item I'm in right now against
the workqueue's concurrency limit anymore". If such a thing could be
done, I think it'd be the perfect solution (and I'll owe you a case of
your choice of beer :)
* Re: Screwing with the concurrency limit
From: Tejun Heo @ 2011-01-08 16:18 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-kernel
Hello,
On Sat, Jan 08, 2011 at 06:55:41AM -0800, Kent Overstreet wrote:
> First off, wild applause for cmwq. The limitations of the old
> workqueues were a major irritation, I think your new implementation
> is fabulous.
Heh, that's flattering. Thanks.
> However, when merging bcache with mainline, I ran into a bit of a
> thorny issue. Bcache relies heavily on workqueues, updates to the
> cache's btree have to be done after every relevant IO completes.
> Additionally, btree insertions can involve sleeping on IO while the
> root of the tree isn't write locked - so we'd like to not block
> other work items from completing if we don't have to.
>
> So, one might expect the way to get the best performance would be
> alloc_workqueue("bcache", WQ_HIGHPRI|WQ_MEM_RECLAIM, 0)
A bit tangential but is WQ_HIGHPRI really necessary? Is the use case
very latency sensitive?
> Trouble is, sometimes we do write lock the root of the btree,
> blocking everything else from getting anything done - the end result
> is
> root@moria:~# ps ax|grep kworker|wc -l
> 1550
Yeah, that will happen.
> (running dbench in a VM with disks in tmpfs). Performance is fine (I
> think, haven't been trying to rigorously benchmark) but that's
> annoying.
>
> I think the best way I can express it is that bcache normally wants
> a concurrency limit of 1, except when we're blocking and we aren't
> write locking the root of the btree.
>
> So, do you think there might be some sane way of doing this with
> cmwq? Some way to say "don't count the work item I'm in right now
> against the workqueue's concurrency limit anymore". If such a
> thing could be done, I think it'd be the perfect solution (and I'll
> owe you a case of your choice of beer :)
Hmmm... workqueue allows adjusting @max_active on the fly with
workqueue_set_max_active(). I think what you can do is to wrap the
write locks with max_active clamping, i.e. set_max_active(1); write_lock();
do the stuff; write_unlock(); set_max_active(orig_max_active);
workqueue_set_max_active() would need a bit of an update to make it behave
under such dynamic usage (so that it returns the original max_active
after applying the new one and kicks the delayed work items when
max_active gets increased), but if that sounds like a plan which could
work, I'll be happy to update it.
Good luck.
--
tejun
* Re: Screwing with the concurrency limit
From: Tejun Heo @ 2011-01-08 16:37 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-kernel
On Sat, Jan 08, 2011 at 11:18:40AM -0500, Tejun Heo wrote:
> workqueue_set_max_active() would need a bit update to make it behave
> under such dynamic usage (so that it returns the original max_active
> after applying the new one and kicks the delayed work items when
> max_active gets increased) but if that sounds like a plan which could
> work, I'll be happy to update it.
So, something like the following. It's only compile tested but the
idea is to update workqueue_set_max_active() such that
* If 0 is specified as the new @max_active, the default concurrency
level is used.
* The old max_active is returned so that it can be restored later.
* When max_active changes, if it becomes higher, the delayed work
items will be activated immediately.
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 0c0771f..bea79ed 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -363,8 +363,8 @@ extern bool flush_delayed_work(struct delayed_work *dwork);
extern bool flush_delayed_work_sync(struct delayed_work *work);
extern bool cancel_delayed_work_sync(struct delayed_work *dwork);
-extern void workqueue_set_max_active(struct workqueue_struct *wq,
- int max_active);
+extern int workqueue_set_max_active(struct workqueue_struct *wq,
+ int max_active);
extern bool workqueue_congested(unsigned int cpu, struct workqueue_struct *wq);
extern unsigned int work_cpu(struct work_struct *work);
extern unsigned int work_busy(struct work_struct *work);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e785b0f..0a45072 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2814,11 +2814,13 @@ static void free_cwqs(struct workqueue_struct *wq)
}
}
-static int wq_clamp_max_active(int max_active, unsigned int flags,
- const char *name)
+static int wq_determine_max_active(int max_active, unsigned int flags,
+ const char *name)
{
int lim = flags & WQ_UNBOUND ? WQ_UNBOUND_MAX_ACTIVE : WQ_MAX_ACTIVE;
+ max_active = max_active ?: WQ_DFL_ACTIVE;
+
if (max_active < 1 || max_active > lim)
printk(KERN_WARNING "workqueue: max_active %d requested for %s "
"is out of range, clamping between %d and %d\n",
@@ -2827,6 +2829,15 @@ static int wq_clamp_max_active(int max_active, unsigned int flags,
return clamp_val(max_active, 1, lim);
}
+static void cwq_set_max_active(struct cpu_workqueue_struct *cwq, int max_active)
+{
+ cwq->max_active = max_active;
+
+ while (!list_empty(&cwq->delayed_works) &&
+ cwq->nr_active < cwq->max_active)
+ cwq_activate_first_delayed(cwq);
+}
+
struct workqueue_struct *__alloc_workqueue_key(const char *name,
unsigned int flags,
int max_active,
@@ -2850,8 +2861,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *name,
if (flags & WQ_UNBOUND)
flags |= WQ_HIGHPRI;
- max_active = max_active ?: WQ_DFL_ACTIVE;
- max_active = wq_clamp_max_active(max_active, flags, name);
+ max_active = wq_determine_max_active(max_active, flags, name);
wq = kzalloc(sizeof(*wq), GFP_KERNEL);
if (!wq)
@@ -2879,8 +2889,8 @@ struct workqueue_struct *__alloc_workqueue_key(const char *name,
cwq->gcwq = gcwq;
cwq->wq = wq;
cwq->flush_color = -1;
- cwq->max_active = max_active;
INIT_LIST_HEAD(&cwq->delayed_works);
+ cwq_set_max_active(cwq, max_active);
}
if (flags & WQ_RESCUER) {
@@ -2910,7 +2920,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *name,
if (workqueue_freezing && wq->flags & WQ_FREEZEABLE)
for_each_cwq_cpu(cpu, wq)
- get_cwq(cpu, wq)->max_active = 0;
+ cwq_set_max_active(get_cwq(cpu, wq), 0);
list_add(&wq->list, &workqueues);
@@ -2981,29 +2991,31 @@ EXPORT_SYMBOL_GPL(destroy_workqueue);
* CONTEXT:
* Don't call from IRQ context.
*/
-void workqueue_set_max_active(struct workqueue_struct *wq, int max_active)
+int workqueue_set_max_active(struct workqueue_struct *wq, int max_active)
{
+ int old_max_active;
unsigned int cpu;
- max_active = wq_clamp_max_active(max_active, wq->flags, wq->name);
+ max_active = wq_determine_max_active(max_active, wq->flags, wq->name);
spin_lock(&workqueue_lock);
+ old_max_active = wq->saved_max_active;
wq->saved_max_active = max_active;
for_each_cwq_cpu(cpu, wq) {
struct global_cwq *gcwq = get_gcwq(cpu);
+ struct cpu_workqueue_struct *cwq = get_cwq(gcwq->cpu, wq);
spin_lock_irq(&gcwq->lock);
-
- if (!(wq->flags & WQ_FREEZEABLE) ||
- !(gcwq->flags & GCWQ_FREEZING))
- get_cwq(gcwq->cpu, wq)->max_active = max_active;
-
+ if (cwq->max_active)
+ cwq_set_max_active(cwq, max_active);
spin_unlock_irq(&gcwq->lock);
}
spin_unlock(&workqueue_lock);
+
+ return old_max_active;
}
EXPORT_SYMBOL_GPL(workqueue_set_max_active);
@@ -3533,7 +3545,7 @@ void freeze_workqueues_begin(void)
struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq);
if (cwq && wq->flags & WQ_FREEZEABLE)
- cwq->max_active = 0;
+ cwq_set_max_active(cwq, 0);
}
spin_unlock_irq(&gcwq->lock);
@@ -3622,11 +3634,7 @@ void thaw_workqueues(void)
continue;
/* restore max_active and repopulate worklist */
- cwq->max_active = wq->saved_max_active;
-
- while (!list_empty(&cwq->delayed_works) &&
- cwq->nr_active < cwq->max_active)
- cwq_activate_first_delayed(cwq);
+ cwq_set_max_active(cwq, wq->saved_max_active);
}
wake_up_worker(gcwq);
* Re: Screwing with the concurrency limit
From: Kent Overstreet @ 2011-01-08 18:06 UTC (permalink / raw)
To: linux-kernel
On 01/08/2011 08:37 AM, Tejun Heo wrote:
> On Sat, Jan 08, 2011 at 11:18:40AM -0500, Tejun Heo wrote:
>> workqueue_set_max_active() would need a bit update to make it behave
>> under such dynamic usage (so that it returns the original max_active
>> after applying the new one and kicks the delayed work items when
>> max_active gets increased) but if that sounds like a plan which could
>> work, I'll be happy to update it.
>
> So, something like the following. It's only compile tested but the
> idea is to update workqueue_set_max_active() such that
>
> * If 0 is specified as the new @max_active, the default concurrency
> level is used.
>
> * The old max_active is returned so that it can be restored later.
>
> * When max_active changes, if it becomes higher, the delayed work
> items will be activated immediately.
Well, that doesn't quite do it; I'd need workqueue_inc_max_active() and
workqueue_dec_max_active()... set_max_active() would be racy. But those
would be trivial now that I've seen your patch (I missed
workqueue_set_max_active() when I was reading through the code earlier).
But also there's no point in adjusting max_active on every CPU's
workqueue; adjusting just the one on the local CPU would do exactly what
I want and be more efficient too... Can you see any issues in doing it
that way?
What I was really hoping for was something like... maybe
move_work_to_workqueue() - if you could do that on the work item you're
executing, move it from the workqueue that has max_active = 1 to a
different one - it's stateless from the caller's perspective. But I
suspect that'd be more complicated than your way of doing it, and
inc()/dec() is probably just as good...
* Re: Screwing with the concurrency limit
From: Tejun Heo @ 2011-01-09 14:41 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-kernel
Hello,
On Sat, Jan 08, 2011 at 10:06:04AM -0800, Kent Overstreet wrote:
> Well, that doesn't quite do it, I'd need workqueue_inc_max_active()
> and workqueue_dec_max_active()... set_max_active() would be racy.
You'll of course need to grab an outer mutex around max_active
updates.
> But also there's no point in adjusting max_active on every cpu's
> workqueue, adjusting just the one on the local cpu would do exactly
> what I want and be more efficient too... Can you see any issues in
> doing it that way?
Can you please explain the use case a bit more? Is something per-cpu?
ie. Are your write locks per-cpu? How frequent do you expect the
write locking to be? I think adjusting max_active per-cpu should be
doable but I'd rather stay away from that.
> What I was really hoping for was something like... maybe
> move_work_to_workqueue() - if you could do that on the work item
> you're executing, move it from the workqueue that has max_active = 1
> to a different one - it's stateless from the caller's perspective.
I don't think that's gonna be a good idea. It's too specialized a
solution, which is likely to bite our asses down the road.
> But I suspect that'd be more complicated than your way of doing it,
> and inc()/dec() is probably just as good...
So, I think it would be better to make max_active manipulation work
somehow, but again I want to stay away from being too specialized.
Thank you.
--
tejun
* Re: Screwing with the concurrency limit
From: Kent Overstreet @ 2011-01-11 19:38 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-kernel
On 01/09/2011 06:41 AM, Tejun Heo wrote:
> Hello,
>
> On Sat, Jan 08, 2011 at 10:06:04AM -0800, Kent Overstreet wrote:
>> Well, that doesn't quite do it, I'd need workqueue_inc_max_active()
>> and workqueue_dec_max_active()... set_max_active() would be racy.
>
> You'll of course need to grab an outer mutex around max_active
> updates.
>
>> But also there's no point in adjusting max_active on every cpu's
>> workqueue, adjusting just the one on the local cpu would do exactly
>> what I want and be more efficient too... Can you see any issues in
>> doing it that way?
>
> Can you please explain the use case a bit more? Is something per-cpu?
> ie. Are your write locks per-cpu? How frequent do you expect the
> write locking to be? I think adjusting max_active per-cpu should be
> doable but I'd rather stay away from that.
Hm, I must not be explaining myself very well.
Forget about the write locks for the moment.
So, high rate of work items, latency sensitive and _usually_ execute
without blocking.
We would like a concurrency limit of 1 when they don't block - otherwise
we're just scheduling for no reason. But sometimes they do block, and
it's impossible to know whether they will or won't ahead of time.
That's the catch, if we have to block and we have a concurrency limit of
1, we've got latency sensitive jobs queued on this CPU that are waiting
around for no reason.
The write locks are the reason the concurrency limit pretty much has to
be 1, because if it's not we'll sometimes just be trying to execute
everything pointlessly.
So I'm trying to have my cake and eat it too. If a work item is
executing, right before it blocks on IO it would like to do something to
say "hey, start running whatever is available for this cpu". And it's
only blocking the other work items on the cpu it's on - that's why I
suggested adjusting only the local max_active.
>
>> What I was really hoping for was something like... maybe
>> move_work_to_workqueue() - if you could do that on the work item
>> you're executing, move it from the workqueue that has max_active = 1
>> to a different one - it's stateless from the caller's perspective.
>
> I don't think that's gonna be a good idea. It's too specialized a
> solution, which is likely to bite our asses down the road.
Well, I'm hoping I figure out the right way to convey what I'm trying to
do, because I don't _think_ it's actually as specialized as it sounds.
But as far as keeping the code sane, I agree with you there.
>> But I suspect that'd be more complicated than your way of doing it,
>> and inc()/dec() is probably just as good...
>
> So, I think it would be better to make max_active manipulation work
> somehow, but again I want to stay away from being too specialized.
Yeah, it'll work. Does what I'm trying to do make any sense now?
* Re: Screwing with the concurrency limit
From: Tejun Heo @ 2011-01-12 13:08 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-kernel
Hello,
On Tue, Jan 11, 2011 at 11:38:25AM -0800, Kent Overstreet wrote:
> >Can you please explain the use case a bit more? Is something per-cpu?
> >ie. Are your write locks per-cpu? How frequent do you expect the
> >write locking to be? I think adjusting max_active per-cpu should be
> >doable but I'd rather stay away from that.
>
> Hm, I must not be explaining myself very well.
Heh, you were doing just fine. I was in between flights, so I probably
was the one to blame. I don't believe we're misunderstanding the
problem, but we are talking about approaches from the opposite
directions.
> Forget about the write locks for the moment.
>
> So, high rate of work items, latency sensitive and _usually_ execute
> without blocking.
>
> We would like a concurrency limit of 1 when they don't block -
> otherwise we're just scheduling for no reason. But sometimes they do
> block, and it's impossible to know whether they will or won't ahead
> of time.
Might be a bit of a tangent, but cmwq is aimed at minimizing scheduling
unless necessary to maintain execution bandwidth, so its users
shouldn't usually be concerned about that.
> That's the catch, if we have to block and we have a concurrency
> limit of 1, we've got latency sensitive jobs queued on this CPU that
> are waiting around for no reason.
>
> The write locks are the reason the concurrency limit pretty much has
> to be 1, because if it's not we'll sometimes just be trying to
> execute everything pointlessly.
But the situation is different here and there's this quite disruptive
condition where the usual behavior doesn't result in a very desirable
outcome.
> So I'm trying to have my cake and eat it to. If a work item is
> executing, right before it blocks on IO it would like to do
> something to say "hey, start running whatever is available for this
> cpu". And it's only blocking the other work items on the cpu it's
> on, that's why I suggested adjusting only the local max_active.
Yeap, understood. I just think it would be better to approach it from
the other direction. Instead of trying to micro manage concurrency in
hot paths, I think it would be better to somehow communicate the
exceptional condition above to the workqueue code. Another thing to
consider is that marking those exceptional conditions is necessary for
other purposes too and already in use, so I'd like to head that way
instead of introducing something completely different.
> >>What I was really hoping for was something like... maybe
> >>move_work_to_workqueue() - if you could do that on the work item
> >>you're executing, move it from the workqueue that has max_active = 1
> >>to a different one - it's stateless from the caller's perspective.
> >
> >I don't think that's gonna be a good idea. It's too specialized
> >soultion which is likely to bite our asses down the road.
>
> Well, I'm hoping I figure out the right way to convey what I'm
> trying to do, because I don't _think_ it's actually as specialized
> as it sounds. But as far as keeping the code sane, I agree with you
> there.
I don't know. In this case, you might know that you will sleep, but
that usually isn't a piece of information available until you actually
sleep. A lot of sleep conditions are buried deeply under various APIs
which might sleep (memory allocation, competing for resources which may
be shared with various different users, and so on). So, the schedule()
itself is the only reliable indication that we're gonna lose execution
concurrency, and that is what cmwq implements.
I'm not (yet) quite convinced extending micro control upwards is a
good idea.
So, can you please explain a bit more about your use case? How often
do you expect the write operations would be? Maybe we can implement a
per-workqueue inhibit hint if the existing max_active adjustment is
insufficient or too expensive. If the micro management from API users
is the only way to go, maybe we'll go there but let's talk about it
first.
Thanks.
--
tejun