From: Tejun Heo <tj@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org, awalls@radix.net,
linux-kernel@vger.kernel.org, jeff@garzik.org, mingo@elte.hu,
akpm@linux-foundation.org, jens.axboe@oracle.com,
rusty@rustcorp.com.au, cl@linux-foundation.org,
dhowells@redhat.com, arjan@linux.intel.com, avi@redhat.com,
johannes@sipsolutions.net, andi@firstfloor.org
Subject: Re: workqueue thing
Date: Wed, 23 Dec 2009 13:18:40 +0900 [thread overview]
Message-ID: <4B319A20.9010305@kernel.org> (raw)
In-Reply-To: <1261480001.4937.21.camel@laptop>
Hello,
On 12/22/2009 08:06 PM, Peter Zijlstra wrote:
> On Tue, 2009-12-22 at 08:50 +0900, Tejun Heo wrote:
>>
>>> 3) gets fragile at memory-pressure/reclaim
>>
>> Shared dynamic pool is going to be affected by memory pressure no
>> matter how you implement it. cmwq tries to maintain stable level of
>> workers and has forward progress guarantee. If you're gonna do shared
>> pool, it can't get much better.
>
> And here I'm questioning the very need for shared stuff, I don't see
> any. That is, I'm not seeing it being worth the hassle.
Then you see the situation pretty different from the way I do. Maybe
it's caused by the different things we work on. Whenever I want to
create something which would need async context, I'm always faced with
these tradeoffs that I think is silly to worry about at that layer.
It ends up scattering partial solutions all over the place.
libata has two workqueues just because one may depend on the other.
The workqueue used for polling is MT to increase parallelism in case
there are multiple devices which would require polling but it's both
wasteful and not enough - they won't be used most of the time but they
aren't enough when there are multiple pollers on the same CPU. libata
just had to make a rather mediocre in-the-middle tradeoff between
having one poller for each device and sharing single poller for all
devices.
The same goes for EH threads. How often they are used heavily depends
on the system configuration. For example, libata handles ATAPI CHECK
CONDITION as an exception and acquire sense data from the exception
handler and it happens pretty frequently. So, I want to have
per-device EHs and have ideas on how to escalate from device level EH
to host level EH. The problem here again is how to maintain the
concurrency because having a single kthread for each block device
won't be acceptable from scalability POV.
Another similar but less severe problem is in-kernel media presence
pollers. Here, I think I can have a single poller for each device
without having too many scalability issues but it just isn't efficient
because most of the time one poller would be enough. It's only when
you get to the corner cases or error conditions when you would need
more than one. So, again, I can implement a special poller pool for
this one.
And there are slow work and async both of which are there just to
provide process context to tasks which may take quite some time to
complete waiting for IOs and quite a few ST workqueues which got
separated out because they somehow got involved in some obscure
deadlock condition and the only reason they're ST is because MT would
create too many threads. CPU affinity would work better for them but
they have to make these tradeoffs.
So, if we can have a mehanism which can solve these issues, it's an
obvious plus. Shifting complexity out of peripheral code to better
crafted and managed core code is the right thing to do and it will
shift a lot of complexity out of peripheral codes.
Thanks.
--
tejun
next prev parent reply other threads:[~2009-12-23 4:17 UTC|newest]
Thread overview: 104+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-18 12:57 Tejun Heo
2009-12-18 12:57 ` [PATCH 01/27] sched: rename preempt_notifiers to sched_notifiers and refactor implementation Tejun Heo
2009-12-18 12:57 ` [PATCH 02/27] sched: refactor try_to_wake_up() Tejun Heo
2009-12-18 12:57 ` [PATCH 03/27] sched: implement __set_cpus_allowed() Tejun Heo
2009-12-18 12:57 ` [PATCH 04/27] sched: make sched_notifiers unconditional Tejun Heo
2009-12-18 12:57 ` [PATCH 05/27] sched: add wakeup/sleep sched_notifiers and allow NULL notifier ops Tejun Heo
2009-12-18 12:57 ` [PATCH 06/27] sched: implement try_to_wake_up_local() Tejun Heo
2009-12-18 12:57 ` [PATCH 07/27] acpi: use queue_work_on() instead of binding workqueue worker to cpu0 Tejun Heo
2009-12-18 12:57 ` [PATCH 08/27] stop_machine: reimplement without using workqueue Tejun Heo
2009-12-18 12:57 ` [PATCH 09/27] workqueue: misc/cosmetic updates Tejun Heo
2009-12-18 12:57 ` [PATCH 10/27] workqueue: merge feature parameters into flags Tejun Heo
2009-12-18 12:57 ` [PATCH 11/27] workqueue: define both bit position and mask for work flags Tejun Heo
2009-12-18 12:57 ` [PATCH 12/27] workqueue: separate out process_one_work() Tejun Heo
2009-12-18 12:57 ` [PATCH 13/27] workqueue: temporarily disable workqueue tracing Tejun Heo
2009-12-18 12:57 ` [PATCH 14/27] workqueue: kill cpu_populated_map Tejun Heo
2009-12-18 12:57 ` [PATCH 15/27] workqueue: update cwq alignement Tejun Heo
2009-12-18 12:57 ` [PATCH 16/27] workqueue: reimplement workqueue flushing using color coded works Tejun Heo
2009-12-18 12:57 ` [PATCH 17/27] workqueue: introduce worker Tejun Heo
2009-12-18 12:57 ` [PATCH 18/27] workqueue: reimplement work flushing using linked works Tejun Heo
2009-12-18 12:58 ` [PATCH 19/27] workqueue: implement per-cwq active work limit Tejun Heo
2009-12-18 12:58 ` [PATCH 20/27] workqueue: reimplement workqueue freeze using max_active Tejun Heo
2009-12-18 12:58 ` [PATCH 21/27] workqueue: introduce global cwq and unify cwq locks Tejun Heo
2009-12-18 12:58 ` [PATCH 22/27] workqueue: implement worker states Tejun Heo
2009-12-18 12:58 ` [PATCH 23/27] workqueue: reimplement CPU hotplugging support using trustee Tejun Heo
2009-12-18 12:58 ` [PATCH 24/27] workqueue: make single thread workqueue shared worker pool friendly Tejun Heo
2009-12-18 12:58 ` [PATCH 25/27] workqueue: use shared worklist and pool all workers per cpu Tejun Heo
2009-12-18 12:58 ` [PATCH 26/27] workqueue: implement concurrency managed dynamic worker pool Tejun Heo
2009-12-18 12:58 ` [PATCH 27/27] workqueue: increase max_active of keventd and kill current_is_keventd() Tejun Heo
2009-12-18 13:00 ` SUBJ: [RFC PATCHSET] concurrency managed workqueue, take#2 Tejun Heo
2009-12-18 13:03 ` Tejun Heo
2009-12-18 13:45 ` workqueue thing Peter Zijlstra
2009-12-18 13:50 ` Andi Kleen
2009-12-18 15:01 ` Arjan van de Ven
2009-12-21 3:19 ` Tejun Heo
2009-12-21 9:17 ` Jens Axboe
2009-12-21 10:35 ` Peter Zijlstra
2009-12-21 11:09 ` Andi Kleen
2009-12-21 11:17 ` Arjan van de Ven
2009-12-21 11:33 ` Andi Kleen
2009-12-21 13:18 ` Tejun Heo
2009-12-21 11:11 ` Arjan van de Ven
2009-12-21 13:22 ` Tejun Heo
2009-12-21 13:53 ` Arjan van de Ven
2009-12-21 14:19 ` Tejun Heo
2009-12-21 15:19 ` Arjan van de Ven
2009-12-22 0:00 ` Tejun Heo
2009-12-22 11:10 ` Peter Zijlstra
2009-12-22 17:20 ` Linus Torvalds
2009-12-22 17:47 ` Peter Zijlstra
2009-12-22 18:07 ` Andi Kleen
2009-12-22 18:20 ` Peter Zijlstra
2009-12-23 8:17 ` Stijn Devriendt
2009-12-23 8:43 ` Peter Zijlstra
2009-12-23 9:01 ` Stijn Devriendt
2009-12-22 18:28 ` Linus Torvalds
2009-12-23 8:06 ` Johannes Berg
2009-12-23 3:37 ` Tejun Heo
2009-12-23 6:52 ` Herbert Xu
2009-12-23 8:00 ` Steffen Klassert
2009-12-23 8:01 ` [PATCH 0/2] Parallel crypto/IPsec v7 Steffen Klassert
2009-12-23 8:03 ` [PATCH 1/2] padata: generic parallelization/serialization interface Steffen Klassert
2009-12-23 8:04 ` [PATCH 2/2] crypto: pcrypt - Add pcrypt crypto parallelization wrapper Steffen Klassert
2010-01-07 5:39 ` [PATCH 0/2] Parallel crypto/IPsec v7 Herbert Xu
2010-01-16 9:44 ` David Miller
2009-12-18 15:30 ` workqueue thing Linus Torvalds
2009-12-18 15:39 ` Ingo Molnar
2009-12-18 15:39 ` Peter Zijlstra
2009-12-18 15:47 ` Linus Torvalds
2009-12-18 15:53 ` Peter Zijlstra
2009-12-21 3:04 ` Tejun Heo
2009-12-21 9:22 ` Peter Zijlstra
2009-12-21 13:30 ` Tejun Heo
2009-12-21 14:26 ` Peter Zijlstra
2009-12-21 23:50 ` Tejun Heo
2009-12-22 11:00 ` Peter Zijlstra
2009-12-22 11:03 ` Peter Zijlstra
2009-12-23 3:43 ` Tejun Heo
2009-12-22 11:04 ` Peter Zijlstra
2009-12-23 3:48 ` Tejun Heo
2009-12-22 11:06 ` Peter Zijlstra
2009-12-23 4:18 ` Tejun Heo [this message]
2009-12-23 4:42 ` Linus Torvalds
2009-12-23 6:02 ` Ingo Molnar
2009-12-23 6:13 ` Jeff Garzik
2009-12-23 7:53 ` Ingo Molnar
2009-12-23 8:41 ` Peter Zijlstra
2009-12-23 10:25 ` Jeff Garzik
2009-12-23 13:33 ` Stefan Richter
2009-12-23 14:20 ` Mark Brown
2009-12-23 7:09 ` Tejun Heo
2009-12-23 8:01 ` Ingo Molnar
2009-12-23 8:12 ` Ingo Molnar
2009-12-23 8:32 ` Tejun Heo
2009-12-23 8:42 ` Ingo Molnar
2009-12-23 8:27 ` Tejun Heo
2009-12-23 8:37 ` Ingo Molnar
2009-12-23 8:49 ` Tejun Heo
2009-12-23 8:49 ` Ingo Molnar
2009-12-23 9:03 ` Tejun Heo
2009-12-23 13:40 ` Stefan Richter
2009-12-23 13:43 ` Stefan Richter
2009-12-23 8:25 ` Arjan van de Ven
2009-12-23 13:00 ` Stefan Richter
2009-12-23 8:31 ` Stijn Devriendt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B319A20.9010305@kernel.org \
--to=tj@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=arjan@linux.intel.com \
--cc=avi@redhat.com \
--cc=awalls@radix.net \
--cc=cl@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=jeff@garzik.org \
--cc=jens.axboe@oracle.com \
--cc=johannes@sipsolutions.net \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
--cc=rusty@rustcorp.com.au \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox