* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-20 22:39 UTC
To: David Howells
Cc: Arjan van de Ven, Frederic Weisbecker, torvalds, mingo, linux-kernel,
    jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

Hello, David.

"David Howells" <dhowells@redhat.com> wrote:
> Does this mean you don't get reentrancy guarantees with unbounded work
> queues?

It means that unbound wq behaves like a generic worker pool.  Bound wq
limits concurrency to a minimal level, but an unbound one executes works
as long as resources are available.  I'll continue below.

> I can't work out how you're achieving it with unbounded queues.  I presume
> with CPU-bound workqueues you're doing it by binding the work item to the
> current CPU still...

Unbound works are served by a dedicated gcwq whose workers are not
affine to any particular CPU.  As all unbound works are served by the
same gcwq, non-reentrancy is automatically guaranteed.

> Btw, how does this fare in an RT system, where work items bound to a CPU
> can't get executed because their CPU is busy with an RT thread, even
> though there are other, idle CPUs?

Sure, there's nothing special about unbound workers.  They're just
normal kthreads.

>> Oh, and Frederic suggested that we would be better off with something
>> based on the tracing API and I agree, so the debugfs thing is currently
>> dropped from the tree.  What do you think?
>
> I probably disagree.  I just want to be able to cat a file and see the
> current runqueue state.  I don't want to have to write and distribute a
> special program to do this.  Of course, I don't know that much about the
> tracing API, so cat'ing a file to get the runqueue listed nicely may be
> possible with that.

I'm relatively sure we can do that.  Frederic?

Thanks.

--
tejun

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: David Howells @ 2010-07-21 13:08 UTC
To: Tejun Heo
Cc: dhowells, Arjan van de Ven, Frederic Weisbecker, torvalds, mingo,
    linux-kernel, jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr,
    florian, andi, mst, randy.dunlap, Arjan van de Ven

Tejun Heo <tj@kernel.org> wrote:

> As all unbound works are served by the same gcwq, non-reentrancy is
> automatically guaranteed.

That doesn't actually explain _how_ it's non-reentrant.  The gcwq
includes a collection of threads that can execute from it, right?  If
so, what mechanism prevents two threads from executing the same work
item, if that work item isn't bound to a CPU?  I've been trying to
figure this out from the code, but I don't see it offhand.

>> Btw, how does this fare in an RT system, where work items bound to a CPU
>> can't get executed because their CPU is busy with an RT thread, even
>> though there are other, idle CPUs?
>
> Sure, there's nothing special about unbound workers.  They're just normal
> kthreads.

I should've been clearer: As I understand it, normal (unbound) work
items are bound to the CPU on which they were queued, and will be
executed there only (barring CPU removal).  If that's the case, isn't it
possible that work items can be prevented from getting execution time by
an RT thread that's hogging a CPU and won't let go?

David

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-21 14:59 UTC
To: David Howells
Cc: Arjan van de Ven, Frederic Weisbecker, torvalds, mingo, linux-kernel,
    jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

Hello,

On 07/21/2010 03:08 PM, David Howells wrote:
> Tejun Heo <tj@kernel.org> wrote:
>
>> As all unbound works are served by the same gcwq, non-reentrancy is
>> automatically guaranteed.
>
> That doesn't actually explain _how_ it's non-reentrant.  The gcwq
> includes a collection of threads that can execute from it, right?  If
> so, what mechanism prevents two threads from executing the same work
> item, if that work item isn't bound to a CPU?  I've been trying to
> figure this out from the code, but I don't see it offhand.

Sharing the same gcwq is why workqueues bound to one CPU have
non-reentrancy, so they're using the same mechanism.  If it doesn't
work for unbound workqueues, the normal ones are broken too.

Each gcwq keeps track of currently running works in a hash table and
checks whether the work in question is already executing before
starting to execute it.  It's a bit complex, but as a work_struct may
be freed once execution starts, the status needs to be tracked outside.

>>> Btw, how does this fare in an RT system, where work items bound to a CPU
>>> can't get executed because their CPU is busy with an RT thread, even
>>> though there are other, idle CPUs?
>>
>> Sure, there's nothing special about unbound workers.  They're just normal
>> kthreads.
>
> I should've been clearer: As I understand it, normal (unbound) work
> items are bound to the CPU on which they were queued, and will be
> executed there only (barring CPU removal).  If that's the case, isn't it
> possible that work items can be prevented from getting execution time by
> an RT thread that's hogging a CPU and won't let go?

Yeah, for bound workqueues, sure.  That's exactly the same as the
original workqueue implementation.  For unbound workqueues, it doesn't
matter.

Thanks.

--
tejun

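A rough sketch of the busy-works tracking Tejun describes, with
approximate names and a simplified hash function (the real code lives in
kernel/workqueue.c and differs in detail; struct worker is assumed to
carry the hentry hash node and a current_work pointer):

#define BUSY_WORKER_HASH_ORDER	5
#define BUSY_WORKER_HASH_SIZE	(1 << BUSY_WORKER_HASH_ORDER)

struct global_cwq {
	spinlock_t		lock;	/* protects hash and worklist */
	struct hlist_head	busy_hash[BUSY_WORKER_HASH_SIZE];
	/* ... worklist, idle/busy worker pools, etc ... */
};

/* hash on the work_struct address; the work itself may already be freed */
static struct hlist_head *busy_worker_head(struct global_cwq *gcwq,
					   struct work_struct *work)
{
	unsigned long v = (unsigned long)work >> BUSY_WORKER_HASH_ORDER;

	return &gcwq->busy_hash[v & (BUSY_WORKER_HASH_SIZE - 1)];
}

/* called with gcwq->lock held */
static struct worker *find_worker_executing_work(struct global_cwq *gcwq,
						 struct work_struct *work)
{
	struct worker *worker;
	struct hlist_node *pos;

	hlist_for_each_entry(worker, pos, busy_worker_head(gcwq, work),
			     hentry)
		if (worker->current_work == work)
			return worker;
	return NULL;
}
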
* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-21 15:03 UTC
To: David Howells
Cc: Arjan van de Ven, Frederic Weisbecker, torvalds, mingo, linux-kernel,
    jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

Just a bit of clarification.

On 07/21/2010 04:59 PM, Tejun Heo wrote:
>> I should've been clearer: As I understand it, normal (unbound) work
>> items

In workqueue land, normal workqueues would be bound to CPUs while
workers for WQ_UNBOUND workqueues aren't affined to any specific CPU.

Thanks.

--
tejun

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: David Howells @ 2010-07-21 15:25 UTC
To: Tejun Heo
Cc: dhowells, Arjan van de Ven, Frederic Weisbecker, torvalds, mingo,
    linux-kernel, jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr,
    florian, andi, mst, randy.dunlap, Arjan van de Ven

Tejun Heo <tj@kernel.org> wrote:

> Each gcwq keeps track of currently running works in a hash table and
> checks whether the work in question is already executing before
> starting to execute it.  It's a bit complex, but as a work_struct may
> be freed once execution starts, the status needs to be tracked outside.

Thanks, that's what I wanted to know.

I presume this survives an executing work_struct being freed,
reallocated and requeued before the address of the work_struct is
removed from the hash table?

I can see at least one way of doing this: marking the work_struct
address in the hash when the address becomes pending again so that the
process of hash removal will cause the work_struct to be requeued
automatically.

David

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-21 15:31 UTC
To: David Howells
Cc: Arjan van de Ven, Frederic Weisbecker, torvalds, mingo, linux-kernel,
    jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

Hello,

On 07/21/2010 05:25 PM, David Howells wrote:
>> Each gcwq keeps track of currently running works in a hash table and
>> checks whether the work in question is already executing before
>> starting to execute it.  It's a bit complex, but as a work_struct may
>> be freed once execution starts, the status needs to be tracked outside.
>
> Thanks, that's what I wanted to know.
>
> I presume this survives an executing work_struct being freed,
> reallocated and requeued before the address of the work_struct is
> removed from the hash table?

It will unnecessarily stall the execution of the new work if the last
work is still running, but nothing will be broken correctness-wise.

> I can see at least one way of doing this: marking the work_struct
> address in the hash when the address becomes pending again so that the
> process of hash removal will cause the work_struct to be requeued
> automatically.

If I'm correctly understanding what you're saying, the code already
does about the same thing.

Thanks.

--
tejun

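Concretely, the stall Tejun mentions falls out of the dispatch path,
which can be sketched like this (approximate names again; the collision
check sits at the top of the real process_one_work(), and
move_linked_works() and the busy-hash helpers are the ones sketched
earlier in the thread):

/* called with gcwq->lock held; drops and reacquires it around work->func */
static void process_one_work(struct worker *worker, struct work_struct *work)
{
	struct global_cwq *gcwq = worker->gcwq;
	struct worker *collision;

	/*
	 * A work with the same address is still running on this gcwq:
	 * queue the new instance on that worker's scheduled list so it
	 * runs after the current one finishes, never in parallel.  If
	 * the address was freed, reallocated and requeued, this is what
	 * stalls the new work until the old instance is done.
	 */
	collision = find_worker_executing_work(gcwq, work);
	if (unlikely(collision)) {
		move_linked_works(work, &collision->scheduled, NULL);
		return;
	}

	/* no collision: advertise ourselves and run the callback */
	hlist_add_head(&worker->hentry, busy_worker_head(gcwq, work));
	worker->current_work = work;
	/* ... unlock, call the work function, relock, unhash ... */
}
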
* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: David Howells @ 2010-07-21 15:38 UTC
To: Tejun Heo
Cc: dhowells, Arjan van de Ven, Frederic Weisbecker, torvalds, mingo,
    linux-kernel, jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr,
    florian, andi, mst, randy.dunlap, Arjan van de Ven

Tejun Heo <tj@kernel.org> wrote:

> If I'm correctly understanding what you're saying, the code already
> does about the same thing.

Cool.

Btw, it seems to work for fscache.  Feel free to add my Acked-by to
your patches.

David

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-21 15:42 UTC
To: David Howells
Cc: Arjan van de Ven, Frederic Weisbecker, torvalds, mingo, linux-kernel,
    jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

Hello,

On 07/21/2010 05:38 PM, David Howells wrote:
> Tejun Heo <tj@kernel.org> wrote:
>
>> If I'm correctly understanding what you're saying, the code already
>> does about the same thing.
>
> Cool.
>
> Btw, it seems to work for fscache.  Feel free to add my Acked-by to
> your patches.

Great, I'll start working on the debugging stuff once things settle
down a bit.

Thank you.

--
tejun

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: David Howells @ 2010-07-21 15:45 UTC
To: Tejun Heo
Cc: dhowells, Arjan van de Ven, Frederic Weisbecker, torvalds, mingo,
    linux-kernel, jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr,
    florian, andi, mst, randy.dunlap, Arjan van de Ven

Tejun Heo <tj@kernel.org> wrote:

> It will unnecessarily stall the execution of the new work if the last
> work is still running, but nothing will be broken correctness-wise.

That's fine.  Better that than risk unexpected reentrance.  You could
add a function to allow an executing work item to yield the hash entry,
to indicate that the work_struct that invoked it has been destroyed,
but it's probably not worth it, and it has scope for mucking things up
horribly if used at the wrong time.

I presume also that if a work item being executed on one work queue is
queued on another work queue, then there is no non-reentrancy guarantee
(which is fine; if you don't like that, don't do it).

David

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-21 15:51 UTC
To: David Howells
Cc: Arjan van de Ven, Frederic Weisbecker, torvalds, mingo, linux-kernel,
    jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

Hello,

On 07/21/2010 05:45 PM, David Howells wrote:
> That's fine.  Better that than risk unexpected reentrance.  You could
> add a function to allow an executing work item to yield the hash entry,
> to indicate that the work_struct that invoked it has been destroyed,
> but it's probably not worth it, and it has scope for mucking things up
> horribly if used at the wrong time.

Yeah, I agree, it's going too far and can be easily misused.  Given
that there are very few users which actually do that, I think it would
be best to leave it alone.

> I presume also that if a work item being executed on one work queue is
> queued on another work queue, then there is no non-reentrancy guarantee
> (which is fine; if you don't like that, don't do it).

Right, there is no non-reentrancy guarantee.

Thanks.

--
tejun

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Tejun Heo @ 2010-06-29 16:59 UTC
To: Arjan van de Ven
Cc: Frederic Weisbecker, torvalds, mingo, linux-kernel, jeff, akpm,
rusty, cl, dhowells, oleg, axboe, dwalker, stefanr, florian, andi,
mst, randy.dunlap, Arjan van de Ven

Hello, Arjan.
On 06/29/2010 06:40 PM, Arjan van de Ven wrote:
> uh? clearly the assumption is that if I have a 16 CPU machine, and 12
> items of work get scheduled,
> that we get all 12 running in parallel. All the smarts of cmwq surely
> only kick in once you've reached the
> "one work item per cpu" threshold ???
Hmmm... workqueue workers are bound to a certain CPU, so if you schedule
a work on a specific CPU, it will run there.  Once a CPU gets
saturated, the issuing thread will be moved elsewhere.  I don't think
it matters to any of the current async users one way or the other,
would it?
Thanks.
--
tejun
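
In API terms, the distinction Tejun draws above maps onto the two
queueing entry points.  A rough sketch (the my_* names are invented;
system_wq is the default cmwq workqueue):

static void my_fn(struct work_struct *work)
{
	/* ... */
}
static DECLARE_WORK(my_work, my_fn);

static void queueing_examples(void)
{
	/* served by the gcwq of whichever CPU executes this call */
	queue_work(system_wq, &my_work);

	/* explicitly bound to CPU 2's gcwq regardless of the caller */
	queue_work_on(2, system_wq, &my_work);
}
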
* [PATCHSET] workqueue: concurrency managed workqueue, take#6
From: Tejun Heo @ 2010-06-28 21:03 UTC
To: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, fweisbec, dwalker, stefanr, florian, andi, mst,
    randy.dunlap

Hello, all.

This is the sixth take of the cmwq (concurrency managed workqueue)
patchset.  It's on top of v2.6.35-rc3 + the sched/core branch.  Git
tree is available at

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-cmwq

Linus, please read the merge plan section.

Thanks.

Table of contents
=================

A. This take

   A-1. Merge plan
   A-2. Changes from the last take[L]
   A-3. TODOs
   A-4. Patches and diffstat

B. General documentation of Concurrency Managed Workqueue (cmwq)

   B-1. Why?
   B-2. Overview
   B-3. Unified worklist
   B-4. Concurrency managed shared worker pool
   B-5. Performance test results

A. This take
============

== A-1. Merge plan

Until now, cmwq patches haven't been fixed into permanent commits,
mainly because the sched patches they depend on made it into the
sched/core tree only recently.  After review, I'll put this take into
permanent commits.  Further developments or fixes will be done on top.

I believe that the expected users of cmwq are generally in favor of the
flexibility it adds.  In the last take, the following issues were
raised.

* Andi Kleen wanted to use high priority dispatching for memory fault
  handlers.  WQ_HIGHPRI is implemented to deal with this and padata
  integration.

* Andrew Morton raised two issues - workqueue users which use RT
  priority setting (ivtv) and padata integration.

  kthread_worker, which provides a simple work-based interface on top
  of kthread, is added for cases where fixed association with a
  specific kthread is required for priority setting, cpuset and other
  task attribute adjustments.  This will also be used by virtnet.

  WQ_CPU_INTENSIVE is added to address padata integration.  When
  combined with WQ_HIGHPRI, all concurrency management logic is
  bypassed, cmwq works as a (conceptually) simple context provider and
  padata should operate without any noticeable difference.

* Daniel Walker objected on the ground that cmwq would make it
  impossible to adjust priorities of workqueue threads, which can be
  useful as an ad-hoc optimization.  I don't plan to address this
  concern (the suggested solution is to add userland visible knobs to
  adjust workqueue priorities) at this point because it is an
  implementation detail that userspace shouldn't diddle with in the
  first place.  If anyone is interested in the details of the
  discussion, please read the discussion thread on the last take[L].

Unless there are fundamental objections, I'll push the patchset out to
linux-next and proceed with the following.

* integrating with other subsystems

* auditing all the workqueue users to better suit cmwq

* implementing features which will depend on cmwq (in-kernel media
  presence polling is the first target)

I expect there to be some, hopefully not too many, cross tree pulls in
the process and it will be a bit messy to back out later, so if you
have any fundamental concerns, please speak sooner rather than later.

Linus, it would be great if you let me know whether you agree with the
merge plan.

== A-2. Changes from the last take

* kthread_worker is added.  kthread_worker is a minimal work execution
  wrapper around kthread.
  This is to ease using kthreads for users which require control over
  thread attributes like priority, cpuset or whatever.  kthreads can be
  created with kthread_worker_fn() directly, or kthread_worker_fn() can
  be called after running any code the kthread needs to run for
  initialization.  The kthread can be treated the same way as any other
  kthread (a usage sketch follows at the end of this section).

  - ivtv, which used a single threaded workqueue and bumped the
    priority of the worker to RT, is converted to use kthread_worker.

* WQ_HIGHPRI and WQ_CPU_INTENSIVE are implemented.

  Works queued to a high priority workqueue are queued at the head of
  the global worklist and don't get blocked by other works.  They're
  dispatched to a worker as soon as possible.

  Works queued to a CPU intensive workqueue don't participate in
  concurrency management and thus don't block other works from
  executing.  This is to be used by works which are expected to burn
  considerable amounts of CPU cycles.

  Workqueues w/ both WQ_HIGHPRI and WQ_CPU_INTENSIVE set don't get
  affected by or participate in concurrency management.  Works queued
  on such workqueues are dispatched immediately and don't affect other
  works.

  - pcrypt, which creates workqueues and uses them for padata, is
    converted to use high priority cpu intensive workqueues with
    max_active of 1, which should behave about the same as the original
    implementation.  Going forward, as workqueues themselves don't cost
    anything to have around anymore, it would be better to make padata
    directly create workqueues for its users.

* To implement HIGHPRI and CPU_INTENSIVE, handling of worker flags
  which affect the running state for concurrency management has been
  updated.  worker_{set|clr}_flags() are added which manage the
  nr_running count according to worker state transitions.  This also
  makes nr_running counting easier to follow and verify.

* __create_workqueue() is renamed to alloc_workqueue() and is now a
  public interface.  It now interprets 0 max_active as the default
  max_active.  In the long run, all create*_workqueue() calls will be
  replaced with alloc_workqueue().

* Custom workqueue instrumentation via debugfs is removed.  The plan is
  to implement proper tracing API based instrumentation as suggested by
  Frederic Weisbecker.

* The original workqueue tracer code is removed, as suggested by
  Frederic Weisbecker.

* Comments updated/added.
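
As a usage sketch of the kthread_worker interface added above (an
illustration only: the my_* names and the priority value are invented,
and the real API may differ in detail):

static struct kthread_worker my_worker;
static struct kthread_work my_work;

static void my_work_fn(struct kthread_work *work)
{
	/* runs in the dedicated kthread created below */
}

static int __init my_init(void)
{
	struct sched_param param = { .sched_priority = 1 };
	struct task_struct *task;

	init_kthread_worker(&my_worker);
	init_kthread_work(&my_work, my_work_fn);

	task = kthread_run(kthread_worker_fn, &my_worker, "my-worker");
	if (IS_ERR(task))
		return PTR_ERR(task);

	/*
	 * The caller owns the thread, so task attributes can be
	 * adjusted directly, e.g. an RT priority as ivtv needs.
	 */
	sched_setscheduler(task, SCHED_FIFO, &param);

	queue_kthread_work(&my_worker, &my_work);
	return 0;
}
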
== A-3. TODOs

* fscache/slow-work conversion is not in this series.  It needs to be
  performance tested and acked by David Howells.

* Audit each of the workqueue users and

  - make them use the system workqueue instead if possible.
  - drop emergency workers if possible.
  - make them use alloc_workqueue() instead.

* Improve lockdep annotations.

* Implement workqueue tracer.

== A-4. Patches and diffstat

 0001-kthread-implement-kthread_worker.patch
 0002-ivtv-use-kthread_worker-instead-of-workqueue.patch
 0003-kthread-implement-kthread_data.patch
 0004-acpi-use-queue_work_on-instead-of-binding-workqueue-.patch
 0005-workqueue-kill-RT-workqueue.patch
 0006-workqueue-misc-cosmetic-updates.patch
 0007-workqueue-merge-feature-parameters-into-flags.patch
 0008-workqueue-define-masks-for-work-flags-and-conditiona.patch
 0009-workqueue-separate-out-process_one_work.patch
 0010-workqueue-temporarily-remove-workqueue-tracing.patch
 0011-workqueue-kill-cpu_populated_map.patch
 0012-workqueue-update-cwq-alignement.patch
 0013-workqueue-reimplement-workqueue-flushing-using-color.patch
 0014-workqueue-introduce-worker.patch
 0015-workqueue-reimplement-work-flushing-using-linked-wor.patch
 0016-workqueue-implement-per-cwq-active-work-limit.patch
 0017-workqueue-reimplement-workqueue-freeze-using-max_act.patch
 0018-workqueue-introduce-global-cwq-and-unify-cwq-locks.patch
 0019-workqueue-implement-worker-states.patch
 0020-workqueue-reimplement-CPU-hotplugging-support-using-.patch
 0021-workqueue-make-single-thread-workqueue-shared-worker.patch
 0022-workqueue-add-find_worker_executing_work-and-track-c.patch
 0023-workqueue-carry-cpu-number-in-work-data-once-executi.patch
 0024-workqueue-implement-WQ_NON_REENTRANT.patch
 0025-workqueue-use-shared-worklist-and-pool-all-workers-p.patch
 0026-workqueue-implement-worker_-set-clr-_flags.patch
 0027-workqueue-implement-concurrency-managed-dynamic-work.patch
 0028-workqueue-increase-max_active-of-keventd-and-kill-cu.patch
 0029-workqueue-s-__create_workqueue-alloc_workqueue-and-a.patch
 0030-workqueue-implement-several-utility-APIs.patch
 0031-workqueue-implement-high-priority-workqueue.patch
 0032-workqueue-implement-cpu-intensive-workqueue.patch
 0033-libata-take-advantage-of-cmwq-and-remove-concurrency.patch
 0034-async-use-workqueue-for-worker-pool.patch
 0035-pcrypt-use-HIGHPRI-and-CPU_INTENSIVE-workqueues-for-.patch

 arch/ia64/kernel/smpboot.c             |    2
 arch/x86/kernel/smpboot.c              |    2
 crypto/pcrypt.c                        |    4
 drivers/acpi/osl.c                     |   40
 drivers/ata/libata-core.c              |   20
 drivers/ata/libata-eh.c                |    4
 drivers/ata/libata-scsi.c              |   10
 drivers/ata/libata-sff.c               |    9
 drivers/ata/libata.h                   |    1
 drivers/media/video/ivtv/ivtv-driver.c |   26
 drivers/media/video/ivtv/ivtv-driver.h |    8
 drivers/media/video/ivtv/ivtv-irq.c    |   15
 drivers/media/video/ivtv/ivtv-irq.h    |    2
 include/linux/cpu.h                    |    2
 include/linux/kthread.h                |   65
 include/linux/libata.h                 |    1
 include/linux/workqueue.h              |  135 +
 include/trace/events/workqueue.h       |   92
 kernel/async.c                         |  140 -
 kernel/kthread.c                       |  164 +
 kernel/power/process.c                 |   21
 kernel/trace/Kconfig                   |   11
 kernel/workqueue.c                     | 3260 +++++++++++++++++++++++++++------
 kernel/workqueue_sched.h               |   13
 24 files changed, 3202 insertions(+), 845 deletions(-)

B. General documentation of Concurrency Managed Workqueue (cmwq)
================================================================

== B-1. Why?

cmwq brings the following benefits.

* By using a shared pool of workers for each cpu, cmwq uses resources
  more efficiently and the system no longer ends up with a lot of
  kernel threads which sit mostly idle.  The separate dedicated per-cpu
  workers of the current workqueue implementation are already becoming
  an actual scalability issue, and with an increasing number of cpus it
  will only get worse.

* cmwq can provide a flexible level of concurrency on demand.  While
  the current workqueue implementation keeps a lot of worker threads
  around, it can still only provide a very limited level of
  concurrency.
* cmwq makes obtaining and using execution contexts easy, which results
  in fewer complexities and awkward compromises in its users.  IOW, it
  transfers complexity from its users to the core code.  This will also
  allow implementation of things which need a flexible async mechanism
  but aren't important enough to have dedicated worker pools for.

* Work execution latencies are shorter and more predictable.  They are
  no longer affected by how long random previous works might take to
  finish but are, for the most part, regulated only by processing cycle
  availability.

* Much less to worry about causing deadlocks around execution
  resources.

* All the above while maintaining behavior compatibility with the
  original workqueue and without any noticeable run time overhead.

== B-2. Overview

There are many cases where an execution context is needed and there
already are several mechanisms for them.  The most commonly used one is
workqueue (wq) and there also are slow_work, async and some others.
Although wq has been serving the kernel well for quite some time, it
has certain limitations which are becoming more apparent.

There are two types of wq, single and multi threaded.  A multi threaded
(MT) wq keeps a bound thread for each online CPU, while a single
threaded (ST) wq uses a single unbound thread.  The number of CPU cores
is continuously rising and there already are systems which saturate the
default 32k PID space during boot up.

Frustratingly, although MT wq end up spending a lot of resources, the
level of concurrency provided is unsatisfactory.  The limitation is
common to both ST and MT wq, although it's less severe on MT ones.
Worker pools of wq are separate from each other.  A MT wq provides one
execution context per CPU while a ST wq provides one for the whole
system, which leads to various problems.

One of the problems is possible deadlock through dependency on the same
execution resource.  These can be detected reliably with lockdep these
days but in most cases the only solution is to create a dedicated wq
for one of the parties involved in the deadlock, which feeds back into
the waste-of-resources problem.  Also, when creating such a dedicated
wq to avoid deadlock, in an attempt to avoid wasting a large number of
threads just for that work, ST wq are often used, but in most cases ST
wq are suboptimal compared to MT wq.

The tension between the provided level of concurrency and resource
usage forces wq users to make unnecessary tradeoffs, like libata
choosing to use ST wq for polling PIOs and accepting the silly
limitation that no two polling PIOs can progress at the same time.  As
MT wq don't provide much better concurrency, users which require a
higher level of concurrency, like async or fscache, end up having to
implement their own worker pool.

Concurrency managed workqueue (cmwq) extends wq with focus on the
following goals.

* Maintain compatibility with the current workqueue API while removing
  the above mentioned limitations.

* Provide a single unified worker pool per cpu which can be shared by
  all users.  The worker pool and level of concurrency should be
  regulated automatically so that the API users don't need to worry
  about such details.

* Use what's necessary and allocate resources lazily on demand while
  guaranteeing forward progress where necessary.

== B-3. Unified worklist

There's a single global cwq (gcwq) per possible cpu which actually
serves out execution contexts.  The cpu_workqueues (cwq) of each wq are
mostly simple frontends to the associated gcwq.
Under normal operation, when a work is queued, it's queued to the gcwq
of the cpu.  Each gcwq has its own pool of workers which is used to
process all the works queued on the cpu.  Works mostly don't care which
wq they're queued to and using a unified worklist is straightforward,
but there are a couple of areas where things become more complicated.

First, when queueing works from different wq on the same worklist,
ordering of works needs some care.  Originally, a MT wq allows a work
to be executed simultaneously on multiple cpus, although it doesn't
allow the same one to execute simultaneously on the same cpu
(reentrant).  A ST wq allows only a single work to be executed on any
cpu, which guarantees both non-reentrancy and single-threadedness.

cmwq provides three different ordering modes - reentrant (default
mode), non-reentrant and single-cpu.  Single-cpu can be used to achieve
single-threadedness and full ordering if combined with max_active of 1.
The default mode (reentrant) is the same as the original MT wq.  The
distinction between non-reentrancy and single-cpu is made because some
of the current ST wq users don't need single-threadedness but only
non-reentrancy.

Another area where things are more involved is wq flushing, because wq
act as flushing domains.  cmwq implements it by coloring works and
tracking how many times each color is used.  When a work is queued to a
cwq, it's assigned a color and each cwq maintains counters for each
work color.  The color assignment changes on each wq flush attempt.  A
cwq can tell that all works queued before a certain wq flush attempt
have finished by waiting for all the colors up to that point to drain.
This maintains the original wq flush semantics without adding
unscalable overhead.

== B-4. Concurrency managed shared worker pool

For any worker pool, managing the concurrency level (how many workers
are executing simultaneously) is an important issue.  cmwq tries to
keep the concurrency at a minimal but sufficient level.

Concurrency management is implemented by hooking into the scheduler
(see the sketch below).  The gcwq is notified whenever a busy worker
wakes up or sleeps and keeps track of the level of concurrency.
Generally, works aren't supposed to be cpu cycle hogs and maintaining
just enough concurrency to prevent work processing from stalling is
optimal.  As long as there are one or more workers running on the cpu,
no new worker is scheduled, but, when the last running worker blocks,
the gcwq immediately schedules a new worker so that the cpu doesn't sit
idle while there are pending works.  This allows using a minimal number
of workers without losing execution bandwidth.

Keeping idle workers around doesn't cost anything other than the memory
space for kthreads, so cmwq holds onto idle ones for a while before
killing them.

As multiple execution contexts are available for each wq, deadlocks
around execution contexts are much harder to create.  The default wq,
system_wq, has a maximum concurrency level of 256 and unless there is a
scenario which can result in a dependency loop involving more than 254
workers, it won't deadlock.

Such forward progress guarantee relies on the fact that workers can be
created when more execution contexts are necessary.  This is guaranteed
by using emergency workers.  All wq which can be used in the memory
allocation path are required to have emergency workers which are
reserved for execution of that specific wq so that memory allocation
for worker creation doesn't deadlock on workers.
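
A heavily simplified sketch of those scheduler hooks follows.  The hook
names (wq_worker_waking_up(), wq_worker_sleeping()) match the patchset,
but the helpers and fields used here are approximations rather than the
literal source:

/* called from the scheduler when a worker of a gcwq wakes up */
void wq_worker_waking_up(struct task_struct *task, unsigned int cpu)
{
	struct worker *worker = kthread_data(task);

	if (!(worker->flags & WORKER_NOT_RUNNING))
		atomic_inc(get_gcwq_nr_running(cpu));
}

/* called from the scheduler when a busy worker is about to block */
struct task_struct *wq_worker_sleeping(struct task_struct *task,
				       unsigned int cpu)
{
	struct worker *worker = kthread_data(task);
	struct global_cwq *gcwq = get_gcwq(cpu);
	struct task_struct *to_wakeup = NULL;

	if (worker->flags & WORKER_NOT_RUNNING)
		return NULL;

	/*
	 * The last running worker is going to sleep: if works are
	 * still pending, hand an idle worker back to the scheduler to
	 * wake up, so the cpu doesn't sit idle.
	 */
	if (atomic_dec_and_test(get_gcwq_nr_running(cpu)) &&
	    !list_empty(&gcwq->worklist))
		to_wakeup = first_idle_worker(gcwq)->task;

	return to_wakeup;
}
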
== B-5. Performance test results

NOTE: This is with the third take[3] but nothing which could affect
performance noticeably has changed since then.

The wq workload is generated by the perf-wq.c module, which is a very
simple synthetic wq load generator.  A work is described by four
parameters - burn_usecs, mean_sleep_msecs, mean_resched_msecs and
factor.  It randomly splits burn_usecs into two, burns the first part,
sleeps for 0 - 2 * mean_sleep_msecs, burns what's left of burn_usecs
and then reschedules itself in 0 - 2 * mean_resched_msecs.  factor is
used to tune the number of cycles to match execution duration.

It issues three types of works - short, medium and long, each with two
burn durations L and S.

          burn/L(us)  burn/S(us)  mean_sleep(ms)  mean_resched(ms)  cycles
 short        50           1            1                10            454
 medium       50           2           10                50            125
 long         50           4          100               250             42

And then these works are put into the following workloads.  The lower
numbered workloads have more short/medium works.

 workload 0
 * 12 wq with 4 short works
 *  2 wq with 2 short and 2 medium works
 *  4 wq with 2 medium and 1 long works
 *  8 wq with 1 long work

 workload 1
 *  8 wq with 4 short works
 *  2 wq with 2 short and 2 medium works
 *  4 wq with 2 medium and 1 long works
 *  8 wq with 1 long work

 workload 2
 *  4 wq with 4 short works
 *  2 wq with 2 short and 2 medium works
 *  4 wq with 2 medium and 1 long works
 *  8 wq with 1 long work

 workload 3
 *  2 wq with 4 short works
 *  2 wq with 2 short and 2 medium works
 *  4 wq with 2 medium and 1 long works
 *  8 wq with 1 long work

 workload 4
 *  2 wq with 4 short works
 *  2 wq with 2 medium works
 *  4 wq with 2 medium and 1 long works
 *  8 wq with 1 long work

 workload 5
 *  2 wq with 2 medium works
 *  4 wq with 2 medium and 1 long works
 *  8 wq with 1 long work

The above wq loads are run in parallel with mencoder converting a 76M
mjpeg file into mpeg4, which takes 25.59 seconds with a standard
deviation of 0.19 without wq loading.  The CPU was an intel netburst
celeron running at 2.66GHz, which was chosen for its small cache size
and slowness.  wl0 and 1 are only tested for burn/S.  Each test case
was run 11 times and the first run was discarded ("d" below is the
standard deviation).

        vanilla/L     cmwq/L        vanilla/S     cmwq/S
 wl0                                26.18 d0.24   26.27 d0.29
 wl1                                26.50 d0.45   26.52 d0.23
 wl2    26.62 d0.35   26.53 d0.23   26.14 d0.22   26.12 d0.32
 wl3    26.30 d0.25   26.29 d0.26   25.94 d0.25   26.17 d0.30
 wl4    26.26 d0.23   25.93 d0.24   25.90 d0.23   25.91 d0.29
 wl5    25.81 d0.33   25.88 d0.25   25.63 d0.27   25.59 d0.26

There is no significant difference between the two.  Maybe the code
overhead and the benefits coming from context sharing are canceling
each other out nicely.  With longer burns, cmwq looks better, but it's
nothing significant.  With shorter burns, other than wl3 spiking up for
vanilla (which would probably go away if the test were repeated), the
two are performing virtually identically.

The above is an exaggerated synthetic test result and the performance
difference will be even less noticeable in either direction under
realistic workloads.

--
tejun

[L] http://thread.gmane.org/gmane.linux.kernel/998652
[3] http://thread.gmane.org/gmane.linux.kernel/939353

* [PATCH 34/35] async: use workqueue for worker pool
From: Tejun Heo @ 2010-06-28 21:04 UTC
To: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, fweisbec, dwalker, stefanr, florian, andi, mst,
    randy.dunlap
Cc: Tejun Heo, Arjan van de Ven

Replace private worker pool with system_long_wq.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Arjan van de Ven <arjan@infradead.org>
---
 kernel/async.c |  140 ++++++-----------------------------------------------
 1 files changed, 21 insertions(+), 119 deletions(-)

diff --git a/kernel/async.c b/kernel/async.c
index 15319d6..c285258 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -49,40 +49,32 @@ asynchronous and synchronous parts of the kernel.
  */

 #include <linux/async.h>
-#include <linux/bug.h>
 #include <linux/module.h>
 #include <linux/wait.h>
 #include <linux/sched.h>
-#include <linux/init.h>
-#include <linux/kthread.h>
-#include <linux/delay.h>
 #include <linux/slab.h>
 #include <asm/atomic.h>

 static async_cookie_t next_cookie = 1;

-#define MAX_THREADS	256
 #define MAX_WORK	32768

 static LIST_HEAD(async_pending);
 static LIST_HEAD(async_running);
 static DEFINE_SPINLOCK(async_lock);

-static int async_enabled = 0;
-
 struct async_entry {
-	struct list_head list;
-	async_cookie_t   cookie;
-	async_func_ptr	 *func;
-	void             *data;
-	struct list_head *running;
+	struct list_head	list;
+	struct work_struct	work;
+	async_cookie_t		cookie;
+	async_func_ptr		*func;
+	void			*data;
+	struct list_head	*running;
 };

 static DECLARE_WAIT_QUEUE_HEAD(async_done);
-static DECLARE_WAIT_QUEUE_HEAD(async_new);

 static atomic_t entry_count;
-static atomic_t thread_count;

 extern int initcall_debug;
@@ -117,27 +109,23 @@ static async_cookie_t lowest_in_progress(struct list_head *running)
 	spin_unlock_irqrestore(&async_lock, flags);
 	return ret;
 }
+
 /*
  * pick the first pending entry and run it
  */
-static void run_one_entry(void)
+static void async_run_entry_fn(struct work_struct *work)
 {
+	struct async_entry *entry =
+		container_of(work, struct async_entry, work);
 	unsigned long flags;
-	struct async_entry *entry;
 	ktime_t calltime, delta, rettime;

-	/* 1) pick one task from the pending queue */
-
+	/* 1) move self to the running queue */
 	spin_lock_irqsave(&async_lock, flags);
-	if (list_empty(&async_pending))
-		goto out;
-	entry = list_first_entry(&async_pending, struct async_entry, list);
-
-	/* 2) move it to the running queue */
 	list_move_tail(&entry->list, entry->running);
 	spin_unlock_irqrestore(&async_lock, flags);

-	/* 3) run it (and print duration)*/
+	/* 2) run (and print duration) */
 	if (initcall_debug && system_state == SYSTEM_BOOTING) {
 		printk("calling %lli_%pF @ %i\n", (long long)entry->cookie,
 			entry->func, task_pid_nr(current));
@@ -153,31 +141,25 @@ static void run_one_entry(void)
 			(long long)ktime_to_ns(delta) >> 10);
 	}

-	/* 4) remove it from the running queue */
+	/* 3) remove self from the running queue */
 	spin_lock_irqsave(&async_lock, flags);
 	list_del(&entry->list);

-	/* 5) free the entry */
+	/* 4) free the entry */
 	kfree(entry);
 	atomic_dec(&entry_count);
 	spin_unlock_irqrestore(&async_lock, flags);

-	/* 6) wake up any waiters. */
+	/* 5) wake up any waiters */
 	wake_up(&async_done);
-	return;
-
-out:
-	spin_unlock_irqrestore(&async_lock, flags);
 }

-
 static async_cookie_t __async_schedule(async_func_ptr *ptr, void *data, struct list_head *running)
 {
 	struct async_entry *entry;
 	unsigned long flags;
 	async_cookie_t newcookie;
-
 	/* allow irq-off callers */
 	entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC);
@@ -186,7 +168,7 @@ static async_cookie_t __async_schedule(async_func_ptr *ptr, void *data, struct l
 	 * If we're out of memory or if there's too much work
 	 * pending already, we execute synchronously.
 	 */
-	if (!async_enabled || !entry || atomic_read(&entry_count) > MAX_WORK) {
+	if (!entry || atomic_read(&entry_count) > MAX_WORK) {
 		kfree(entry);
 		spin_lock_irqsave(&async_lock, flags);
 		newcookie = next_cookie++;
@@ -196,6 +178,7 @@ static async_cookie_t __async_schedule(async_func_ptr *ptr, void *data, struct l
 		ptr(data, newcookie);
 		return newcookie;
 	}
+	INIT_WORK(&entry->work, async_run_entry_fn);
 	entry->func = ptr;
 	entry->data = data;
 	entry->running = running;
@@ -205,7 +188,10 @@ static async_cookie_t __async_schedule(async_func_ptr *ptr, void *data, struct l
 	list_add_tail(&entry->list, &async_pending);
 	atomic_inc(&entry_count);
 	spin_unlock_irqrestore(&async_lock, flags);
-	wake_up(&async_new);
+
+	/* schedule for execution */
+	queue_work(system_long_wq, &entry->work);
+
 	return newcookie;
 }

@@ -312,87 +298,3 @@ void async_synchronize_cookie(async_cookie_t cookie)
 	async_synchronize_cookie_domain(cookie, &async_running);
 }
 EXPORT_SYMBOL_GPL(async_synchronize_cookie);
-
-
-static int async_thread(void *unused)
-{
-	DECLARE_WAITQUEUE(wq, current);
-	add_wait_queue(&async_new, &wq);
-
-	while (!kthread_should_stop()) {
-		int ret = HZ;
-		set_current_state(TASK_INTERRUPTIBLE);
-		/*
-		 * check the list head without lock.. false positives
-		 * are dealt with inside run_one_entry() while holding
-		 * the lock.
-		 */
-		rmb();
-		if (!list_empty(&async_pending))
-			run_one_entry();
-		else
-			ret = schedule_timeout(HZ);
-
-		if (ret == 0) {
-			/*
-			 * we timed out, this means we as thread are redundant.
-			 * we sign off and die, but we to avoid any races there
-			 * is a last-straw check to see if work snuck in.
-			 */
-			atomic_dec(&thread_count);
-			wmb(); /* manager must see our departure first */
-			if (list_empty(&async_pending))
-				break;
-			/*
-			 * woops work came in between us timing out and us
-			 * signing off; we need to stay alive and keep working.
-			 */
-			atomic_inc(&thread_count);
-		}
-	}
-	remove_wait_queue(&async_new, &wq);
-
-	return 0;
-}
-
-static int async_manager_thread(void *unused)
-{
-	DECLARE_WAITQUEUE(wq, current);
-	add_wait_queue(&async_new, &wq);
-
-	while (!kthread_should_stop()) {
-		int tc, ec;
-
-		set_current_state(TASK_INTERRUPTIBLE);
-
-		tc = atomic_read(&thread_count);
-		rmb();
-		ec = atomic_read(&entry_count);
-
-		while (tc < ec && tc < MAX_THREADS) {
-			if (IS_ERR(kthread_run(async_thread, NULL, "async/%i",
-					       tc))) {
-				msleep(100);
-				continue;
-			}
-			atomic_inc(&thread_count);
-			tc++;
-		}
-
-		schedule();
-	}
-	remove_wait_queue(&async_new, &wq);
-
-	return 0;
-}
-
-static int __init async_init(void)
-{
-	async_enabled =
-		!IS_ERR(kthread_run(async_manager_thread, NULL, "async/mgr"));
-
-	WARN_ON(!async_enabled);
-	return 0;
-}
-
-core_initcall(async_init);
--
1.6.4.2

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Frederic Weisbecker @ 2010-06-28 22:55 UTC
To: Tejun Heo
Cc: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

On Mon, Jun 28, 2010 at 11:04:22PM +0200, Tejun Heo wrote:
> Replace private worker pool with system_long_wq.

It appeared to me that async is meant to parallelize as much as
possible, to probe devices faster on boot for example, while cmwq
seems to do the opposite: trying to execute in batches as much as
possible, and fork when a work goes to sleep voluntarily.

That said, I haven't checked that deeply, so it's fairly possible I
missed something obvious :)

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Tejun Heo @ 2010-06-29 7:25 UTC
To: Frederic Weisbecker
Cc: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

Hello,

On 06/29/2010 12:55 AM, Frederic Weisbecker wrote:
> On Mon, Jun 28, 2010 at 11:04:22PM +0200, Tejun Heo wrote:
>> Replace private worker pool with system_long_wq.
>
> It appeared to me that async is meant to parallelize as much as
> possible, to probe devices faster on boot for example, while cmwq
> seems to do the opposite: trying to execute in batches as much as
> possible, and fork when a work goes to sleep voluntarily.

Yeah, well, that's kind of the whole point of cmwq.  It would try to
minimize the number of used workers but the provided concurrency will
still be enough.  No async probe will be stalled due to lack of an
execution context and the timings should be about the same between the
original async implementation and the cmwq based one.

Thanks.

--
tejun

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Frederic Weisbecker @ 2010-06-29 12:18 UTC
To: Tejun Heo
Cc: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

On Tue, Jun 29, 2010 at 09:25:12AM +0200, Tejun Heo wrote:
> Hello,
>
> On 06/29/2010 12:55 AM, Frederic Weisbecker wrote:
>> On Mon, Jun 28, 2010 at 11:04:22PM +0200, Tejun Heo wrote:
>>> Replace private worker pool with system_long_wq.
>>
>> It appeared to me that async is meant to parallelize as much as
>> possible, to probe devices faster on boot for example, while cmwq
>> seems to do the opposite: trying to execute in batches as much as
>> possible, and fork when a work goes to sleep voluntarily.
>
> Yeah, well, that's kind of the whole point of cmwq.  It would try to
> minimize the number of used workers but the provided concurrency will
> still be enough.  No async probe will be stalled due to lack of an
> execution context and the timings should be about the same between the
> original async implementation and the cmwq based one.
>
> Thanks.

Right.  I just don't know what is supposed to be slow on boot that
needs to use async.  Is that because reading some ports is slow, or
because we need to do something and wait for some time to get the
result?

If there is a question of slow ports to probe, then cmwq wouldn't seem the
right thing here, as it only forks when we go to sleep.

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Tejun Heo @ 2010-06-29 15:46 UTC
To: Frederic Weisbecker
Cc: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

Hello,

On 06/29/2010 02:18 PM, Frederic Weisbecker wrote:
>> Yeah, well, that's kind of the whole point of cmwq.  It would try to
>> minimize the number of used workers but the provided concurrency will
>> still be enough.  No async probe will be stalled due to lack of an
>> execution context and the timings should be about the same between the
>> original async implementation and the cmwq based one.
>
> Right.  I just don't know what is supposed to be slow on boot that
> needs to use async.  Is that because reading some ports is slow, or
> because we need to do something and wait for some time to get the
> result?

It's things like ATA bus resetting and probing.  They're usually
composed of short CPU activities and rather long sleeps.

> If there is a question of slow ports to probe, then cmwq wouldn't seem the
> right thing here, as it only forks when we go to sleep.

I lost you here.  If something during boot has to burn cpu cycles
(which it shouldn't, really), it has to burn cpu cycles and having
multiple concurrent threads won't help anything.  If something doesn't
burn cpu cycles but takes long, it's gotta sleep and cmwq will start a
new thread immediately.  So, can you please elaborate why cmwq would
be problematic?

Thanks.

--
tejun

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Frederic Weisbecker @ 2010-06-29 15:52 UTC
To: Tejun Heo
Cc: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

On Tue, Jun 29, 2010 at 05:46:32PM +0200, Tejun Heo wrote:
> Hello,
>
> On 06/29/2010 02:18 PM, Frederic Weisbecker wrote:
>>> Yeah, well, that's kind of the whole point of cmwq.  It would try to
>>> minimize the number of used workers but the provided concurrency will
>>> still be enough.  No async probe will be stalled due to lack of an
>>> execution context and the timings should be about the same between the
>>> original async implementation and the cmwq based one.
>>
>> Right.  I just don't know what is supposed to be slow on boot that
>> needs to use async.  Is that because reading some ports is slow, or
>> because we need to do something and wait for some time to get the
>> result?
>
> It's things like ATA bus resetting and probing.  They're usually
> composed of short CPU activities and rather long sleeps.

Ok.

>> If there is a question of slow ports to probe, then cmwq wouldn't seem the
>> right thing here, as it only forks when we go to sleep.
>
> I lost you here.  If something during boot has to burn cpu cycles
> (which it shouldn't, really), it has to burn cpu cycles and having
> multiple concurrent threads won't help anything.

It would on SMP.

> If something doesn't
> burn cpu cycles but takes long, it's gotta sleep and cmwq will start a
> new thread immediately.  So, can you please elaborate why cmwq would
> be problematic?

No, in this case it's not problematic; as long as the things that were
using async have a small cpu burn and a long sleep wait, it looks like
cmwq fits :)

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Tejun Heo @ 2010-06-29 15:55 UTC
To: Frederic Weisbecker
Cc: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

Hello,

On 06/29/2010 05:52 PM, Frederic Weisbecker wrote:
>>> If there is a question of slow ports to probe, then cmwq wouldn't seem the
>>> right thing here, as it only forks when we go to sleep.
>>
>> I lost you here.  If something during boot has to burn cpu cycles
>> (which it shouldn't, really), it has to burn cpu cycles and having
>> multiple concurrent threads won't help anything.
>
> It would on SMP.

Oh, I see.  Parallel cpu hogs.  We don't have such users for async and
I think using padata would be the right solution for those situations.

Thanks.

--
tejun

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Arjan van de Ven @ 2010-06-29 16:40 UTC
To: Tejun Heo
Cc: Frederic Weisbecker, torvalds, mingo, linux-kernel, jeff, akpm,
    rusty, cl, dhowells, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

On 6/29/2010 8:55 AM, Tejun Heo wrote:
> Hello,
>
> On 06/29/2010 05:52 PM, Frederic Weisbecker wrote:
>>>> If there is a question of slow ports to probe, then cmwq wouldn't seem the
>>>> right thing here, as it only forks when we go to sleep.
>>>
>>> I lost you here.  If something during boot has to burn cpu cycles
>>> (which it shouldn't, really), it has to burn cpu cycles and having
>>> multiple concurrent threads won't help anything.
>>
>> It would on SMP.
>
> Oh, I see.  Parallel cpu hogs.  We don't have such users for async and
> I think using padata would be the right solution for those situations.

uh? clearly the assumption is that if I have a 16 CPU machine, and 12
items of work get scheduled,
that we get all 12 running in parallel. All the smarts of cmwq surely
only kick in once you've reached the
"one work item per cpu" threshold ???

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: David Howells @ 2010-06-29 21:37 UTC
To: Tejun Heo
Cc: dhowells, Arjan van de Ven, Frederic Weisbecker, torvalds, mingo,
    linux-kernel, jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr,
    florian, andi, mst, randy.dunlap, Arjan van de Ven

Tejun Heo <tj@kernel.org> wrote:

> Hmmm... workqueue workers are bound to a certain CPU, so if you schedule
> a work on a specific CPU, it will run there.

That's my main problem with using cmwq to replace slow-work.

> Once a CPU gets saturated, the issuing thread will be moved elsewhere.

Assuming that the issuing thread isn't bound by the condition specified
in the previous sentence...

David

* [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-02 9:17 UTC
To: David Howells, Arjan van de Ven
Cc: Frederic Weisbecker, torvalds, mingo, linux-kernel, jeff, akpm,
    rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

Hello, David, Arjan.

These four patches implement unbound workqueues which can be used as a
simple execution context provider.  I changed async to use it and will
also make fscache use it.  This can be used by setting WQ_UNBOUND on
workqueue creation.  Works queued to unbound workqueues are implicitly
HIGHPRI and dispatched to unbound workers as soon as resources are
available; the only limitation applied by workqueue code is
@max_active.  IOW, for both async and fscache, things will stay about
the same.

WQ_UNBOUND can serve the role of WQ_SINGLE_CPU.  WQ_SINGLE_CPU is
dropped and replaced by WQ_UNBOUND.

Arjan, I still think we'll be better off using bound workqueues for
async but let's first convert without causing behavior difference.
Either way isn't gonna result in any noticeable difference anyway.  If
you're okay with the conversion, please ack it.

David, this should work for fscache/slow-work the same way too.  That
should relieve your concern, right?  Oh, and Frederic suggested that we
would be better off with something based on the tracing API and I
agree, so the debugfs thing is currently dropped from the tree.  What
do you think?

Thanks.

--
tejun

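For illustration, creating and using such a workqueue would look
roughly like this (a sketch only; the workqueue name, the my_*
identifiers and the max_active value of 4 are made up):

static void my_fn(struct work_struct *work)
{
	/* may sleep for a long time without blocking other works */
}
static DECLARE_WORK(my_work, my_fn);

static struct workqueue_struct *my_wq;

static int __init my_setup(void)
{
	/* workers aren't bound to any CPU; up to 4 works in flight */
	my_wq = alloc_workqueue("my_unbound_wq", WQ_UNBOUND, 4);
	if (!my_wq)
		return -ENOMEM;

	/* dispatched to an unbound worker as soon as one is available */
	queue_work(my_wq, &my_work);
	return 0;
}
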
* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-02 9:32 UTC
To: David Howells, Arjan van de Ven
Cc: Frederic Weisbecker, torvalds, mingo, linux-kernel, jeff, akpm,
    rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

On 07/02/2010 11:17 AM, Tejun Heo wrote:
> Hello, David, Arjan.
>
> These four patches implement unbound workqueues which can be used as a
> simple execution context provider.  I changed async to use it and will
> also make fscache use it.  This can be used by setting WQ_UNBOUND on
> workqueue creation.  Works queued to unbound workqueues are implicitly
> HIGHPRI and dispatched to unbound workers as soon as resources are
> available; the only limitation applied by workqueue code is
> @max_active.  IOW, for both async and fscache, things will stay about
> the same.
>
> WQ_UNBOUND can serve the role of WQ_SINGLE_CPU.  WQ_SINGLE_CPU is
> dropped and replaced by WQ_UNBOUND.
>
> Arjan, I still think we'll be better off using bound workqueues for
> async but let's first convert without causing behavior difference.
> Either way isn't gonna result in any noticeable difference anyway.  If
> you're okay with the conversion, please ack it.
>
> David, this should work for fscache/slow-work the same way too.  That
> should relieve your concern, right?  Oh, and Frederic suggested that we
> would be better off with something based on the tracing API and I
> agree, so the debugfs thing is currently dropped from the tree.  What
> do you think?

Oops, forgot something.  These four patches are on top of the
wq#for-next-candidate branch, which is cmwq take#6 + four fix patches,

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-next-candidate

and available in the following git tree.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-cmwq

Thanks.

--
tejun

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-07 5:41 UTC
To: David Howells, Arjan van de Ven
Cc: Frederic Weisbecker, torvalds, mingo, linux-kernel, jeff, akpm,
    rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

On 07/02/2010 11:17 AM, Tejun Heo wrote:
> Arjan, I still think we'll be better off using bound workqueues for
> async but let's first convert without causing behavior difference.
> Either way isn't gonna result in any noticeable difference anyway.  If
> you're okay with the conversion, please ack it.

Ping, Arjan.

Thanks.

--
tejun

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-14 9:39 UTC
To: David Howells, Arjan van de Ven
Cc: Frederic Weisbecker, torvalds, mingo, linux-kernel, jeff, akpm,
    rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

Hello,

On 07/07/2010 07:41 AM, Tejun Heo wrote:
> On 07/02/2010 11:17 AM, Tejun Heo wrote:
>> Arjan, I still think we'll be better off using bound workqueues for
>> async but let's first convert without causing behavior difference.
>> Either way isn't gonna result in any noticeable difference anyway.  If
>> you're okay with the conversion, please ack it.
>
> Ping, Arjan.

Just for the record, I pinged Arjan again offlist and he acked the
conversion in his reply.  I added the Acked-by and pushed the
conversion to for-next-candidate, which will be pushed into linux-next
next week.

Thanks.

--
tejun

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: David Howells @ 2010-07-20 22:01 UTC
To: Tejun Heo
Cc: dhowells, Arjan van de Ven, Frederic Weisbecker, torvalds, mingo,
    linux-kernel, jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr,
    florian, andi, mst, randy.dunlap, Arjan van de Ven

Tejun Heo <tj@kernel.org> wrote:

> David, this should work for fscache/slow-work the same way too.  That
> should relieve your concern, right?

Not at the moment.  What does this mean:

	 * Unbound workqueues aren't concurrency managed and should be
	 * dispatched to workers immediately.

Does this mean you don't get reentrancy guarantees with unbounded work
queues?  I can't work out how you're achieving it with unbounded
queues.  I presume with CPU-bound workqueues you're doing it by binding
the work item to the current CPU still...

Btw, how does this fare in an RT system, where work items bound to a
CPU can't get executed because their CPU is busy with an RT thread,
even though there are other, idle CPUs?

> Oh, and Frederic suggested that we would be better off with something
> based on the tracing API and I agree, so the debugfs thing is currently
> dropped from the tree.  What do you think?

I probably disagree.  I just want to be able to cat a file and see the
current runqueue state.  I don't want to have to write and distribute a
special program to do this.  Of course, I don't know that much about
the tracing API, so cat'ing a file to get the runqueue listed nicely
may be possible with that.

David