From: Florian Mickler <florian@mickler.org>
To: Tejun Heo <tj@kernel.org>
Cc: mingo@elte.hu, awalls@radix.net, linux-kernel@vger.kernel.org,
jeff@garzik.org, akpm@linux-foundation.org,
rusty@rustcorp.com.au, cl@linux-foundation.org,
dhowells@redhat.com, arjan@linux.intel.com,
johannes@sipsolutions.net, oleg@redhat.com, axboe@kernel.dk
Subject: Re: Overview of concurrency managed workqueue
Date: Wed, 16 Jun 2010 08:55:47 +0200 [thread overview]
Message-ID: <20100616085547.77f9169e@schatten.dmk.lab> (raw)
In-Reply-To: <4C17C598.7070303@kernel.org>
Hi!
On Tue, 15 Jun 2010 20:25:28 +0200 Tejun Heo <tj@kernel.org> wrote:
> Hello, all.
>
> So, here's the overview I wrote up today. If anything needs more
> clarification, just ask. Thanks.
Nice writeup! I think it is sufficient already and I probably wouldn't
bother, but here are a little comments if you want to polish it up...
Also, feel free to ignore :)
As a genereal rule, every abbreviation should be written out at least
once and if you are going to abbreviate it from then on, the
abbreviation goes in parenthesis after that. That helps the reader a
lot.
For example:
>
> == Overview
>
> There are many cases where an execution context is needed and there
> already are several mechanisms for them. The most commonly used one
> is workqueue and there are slow_work, async and a few other. Although
The most commonly used one is workqueue (wq) and there are ...
> workqueue has been serving the kernel for quite some time now, it has
> some limitations.
here you can then already use "wq". That makes it shorter, and if you
use it consistently the reader doesn't wonder if wq and worqueue are
different things.
>
> There are two types of workqueues, single and multi threaded. MT wq
... multi threaded (MT). MT wq keeps a bound ...
> keeps a bound thread for each online CPU, while ST wq uses single
... while single threaded (ST) wq uses single ...
> unbound thread. With the quickly rising number of CPU cores, there
> already are systems in which just booting up saturates the default 32k
> PID space.
CPU and PID are well defined in the kernel, so no need to explain these.
>
> Frustratingly, although MT wqs end up spending a lot of resources, the
> level of concurrency provided is unsatisfactory. The concurrency
> limitation is common to both ST and MT wqs although it's less severe
I don't know what the english rules for plural of abbreviated word. But
I would probably just drop the plural s and let the reader add it when
he decodes the abbreviation. (ie replace wqs with wq) Or introduce it
properly: "... workqueues (wqs) ... ", Or don't abbreviate it in the
plural.
> on MT ones. Worker pools of wqs are completely separate from each
> other. A MT wq provides one execution context per CPU while a ST wq
> one for the whole system. This leads to various problems.
>
> One such problem is possible deadlock through dependency on the same
> execution resource. These can be detected quite reliably with lockdep
> these days but in most cases the only solution is to create a
> dedicated wq for one of the parties involved in the deadlock, which
> feeds back into the waste of resources. Also, when creating such
> dedicated wq to avoid deadlock, to avoid wasting large number of
> threads just for that work, ST wqs are often used but in most cases ST
> wqs are suboptimal compared to MT wqs.
>
> The tension between the provided level of concurrency and resource
> usage force its users to make unnecessary tradeoffs like libata
> choosing to use ST wq for polling PIOs and accepting a silly
> limitation that no two polling PIOs can be in progress at the same
> time. As MT wqs don't provide much better concurrency, users which
> require higher level of concurrency, like async or fscache, end up
> having to implement their own worker pool.
>
> cmwq extends workqueue with focus on the following goals.
first mentioning of cmwq as an abbreviation is not nice for the reader.
Better:
Concurrency managed wq (cmwq) ... goals:
Concurrency managed workqueue (cmwq) ... goals:
>
> * Workqueue is already very widely used. Maintain compatibility with
> the current API while removing limitations of the current
> implementation.
* Because the current wq implementation is already very widely used we
maintain compatibility with the API while removing above
mentioned limitations.
>
> * Provide single unified worker pool per cpu which can be shared by
> all users. The worker pool and level of concurrency should be
> regulated automatically so that the API users don't need to worry
> about that.
>
> * Use what's necessary and allocate resources lazily on demand while
> still maintaining forward progress guarantee where necessary.
>
>
> == Unified worklist
>
> There's a single global cwq, or gcwq, per each possible cpu which
... global cwq (gcwq) per each possible cpu
> actually serves out the execution contexts. cpu_workqueues or cwqs of
cpu_workqueues (cwqs)
> each wq are mostly simple frontends to the associated gcwq. Under
> normal operation, when a work is queued, it's queued to the gcwq on
> the same cpu. Each gcwq has its own pool of workers bound to the gcwq
> which will be used to process all the works queued on the cpu. For
> the most part, works don't care to which wqs they're queued to and
> using a unified worklist is pretty straight forward. There are a
> couple of areas where things are a bit more complicated.
>
> First, when queueing works from different wqs on the same queue,
> ordering of works needs special care. Originally, a MT wq allows a
> work to be executed simultaneously on multiple cpus although it
> doesn't allow the same one to execute simultaneously on the same cpu
> (reentrant). A ST wq allows only single work to be executed on any
> cpu which guarantees both non-reentrancy and single-threadedness.
>
> cmwq provides three different ordering modes - reentrant (default),
... (default mode)...
> non-reentrant and single-cpu, where single-cpu can be used to achieve
> single-threadedness and full ordering combined with in-flight work
> limit of 1. The default mode is basically the same as the original
The default mode (reentrant) is basically...
> implementation. The distinction between non-reentrancy and single-cpu
> were made because some ST wq users didn't really need single
> threadedness but just non-reentrancy.
> Another area where things get more involved is workqueue flushing as
> for flushing to which wq a work is queued matters. cmwq tracks this
> using colors. When a work is queued to a cwq, it's assigned a color
> and each cwq maintains counters for each work color. The color
> assignment changes on each wq flush attempt. A cwq can tell that all
> works queued before a certain wq flush attempt have finished by
> waiting for all the colors upto that point to drain. This maintains
> the original workqueue flush semantics without adding unscalable
> overhead.
[nice solution, btw]
>
>
> == Automatically regulated shared worker pool
>
> For any worker pool, managing the concurrency level (how many workers
> are executing simultaneously) is an important issue. cmwq tries to
> keep the concurrency at minimum but sufficient level.
>
> Concurrency management is implemented by hooking into the scheduler.
> gcwq is notified whenever a busy worker wakes up or sleeps and thus
There is only one gcwq?
Then maybe better:
_The_ gcwq is notified...
> can keep track of the current level of concurrency. Works aren't
> supposed to be cpu cycle hogs and maintaining just enough concurrency
> to prevent work processing from stalling due to lack of processing
> context should be optimal. gcwq keeps the number of concurrent active
> workers to minimum but no less.
also:
... The gcwq keeps the number of concurrent ...
> As long as there's one or more
> running workers on the cpu, no new worker is scheduled so that works
> can be processed in batch as much as possible but when the last
> running worker blocks, gcwq immediately schedules new worker so that
> the cpu doesn't sit idle while there are works to be processed.
here too: ..., the gcwq immediately schedules ...
> This allows using minimal number of workers without losing execution
> bandwidth. Keeping idle workers around doesn't cost much other than
> the memory space, so cmwq holds onto idle ones for a while before
> killing them.
>
> As multiple execution contexts are available for each wq, deadlocks
> around execution contexts is much harder to create. The default
> workqueue, system_wq, has maximum concurrency level of 256 and unless
> there is a use case which can result in a dependency loop involving
> more than 254 workers, it won't deadlock.
>
> Such forward progress guarantee relies on that workers can be created
> when more execution contexts are necessary. This is guaranteed by
> using emergency workers. All wqs which can be used in allocation path
> are required to have emergency workers which are reserved for
> execution of that specific workqueue so that allocation needed for
> worker creation doesn't deadlock on workers.
>
>
> == Benefits
>
> * Less to worry about causing deadlocks around execution resources.
>
> * Far fewer number of kthreads.
>
> * More flexibility without runtime overhead.
>
> * As concurrency is no longer a problem, workloads which needed
> separate mechanisms can now use generic workqueue instead. This
> easy access to concurrency also allows stuff which wasn't worth
> implementing a dedicated mechanism for but still needed flexible
> concurrency.
* improved latency for current schedule_work() users, i.e. the work
get's executed in a more timely fashion?
> == Numbers (this is with the third take but nothing which could affect
> performance has changed since then. Eh well, very little has
> changed since then in fact.)
>
> wq workload is generated by perf-wq.c module which is a very simple
> synthetic wq load generator (I'll attach it to this message). A work
> is described by five parameters - burn_usecs, mean_sleep_msecs,
> mean_resched_msecs and factor. It randomly splits burn_usecs into
> two, burns the first part, sleeps for 0 - 2 * mean_sleep_msecs, burns
> what's left of burn_usecs and then reschedules itself in 0 - 2 *
> mean_resched_msecs. factor is used to tune the number of cycles to
> match execution duration.
>
> It issues three types of works - short, medium and long, each with two
> burn durations L and S.
>
> burn/L(us) burn/S(us) mean_sleep(ms) mean_resched(ms) cycles
> short 50 1 1 10 454
> medium 50 2 10 50 125
> long 50 4 100 250 42
>
> And then these works are put into the following workloads. The lower
> numbered workloads have more short/medium works.
>
> workload 0
> * 12 wqs with 4 short works
> * 2 wqs with 2 short and 2 medium works
> * 4 wqs with 2 medium and 1 long works
> * 8 wqs with 1 long work
>
> workload 1
> * 8 wqs with 4 short works
> * 2 wqs with 2 short and 2 medium works
> * 4 wqs with 2 medium and 1 long works
> * 8 wqs with 1 long work
>
> workload 2
> * 4 wqs with 4 short works
> * 2 wqs with 2 short and 2 medium works
> * 4 wqs with 2 medium and 1 long works
> * 8 wqs with 1 long work
>
> workload 3
> * 2 wqs with 4 short works
> * 2 wqs with 2 short and 2 medium works
> * 4 wqs with 2 medium and 1 long works
> * 8 wqs with 1 long work
>
> workload 4
> * 2 wqs with 4 short works
> * 2 wqs with 2 medium works
> * 4 wqs with 2 medium and 1 long works
> * 8 wqs with 1 long work
>
> workload 5
> * 2 wqs with 2 medium works
> * 4 wqs with 2 medium and 1 long works
> * 8 wqs with 1 long work
>
> The above wq loads are run in parallel with mencoder converting 76M
> mjpeg file into mpeg4 which takes 25.59 seconds with standard
> deviation of 0.19 without wq loading. The CPU was intel netburst
> celeron running at 2.66GHz (chosen for its small cache size and
> slowness). wl0 and 1 are only tested for burn/S. Each test case was
> run 11 times and the first run was discarded.
>
> vanilla/L cmwq/L vanilla/S cmwq/S
> wl0 26.18 d0.24 26.27 d0.29
> wl1 26.50 d0.45 26.52 d0.23
> wl2 26.62 d0.35 26.53 d0.23 26.14 d0.22 26.12 d0.32
> wl3 26.30 d0.25 26.29 d0.26 25.94 d0.25 26.17 d0.30
> wl4 26.26 d0.23 25.93 d0.24 25.90 d0.23 25.91 d0.29
> wl5 25.81 d0.33 25.88 d0.25 25.63 d0.27 25.59 d0.26
>
> There is no significant difference between the two. Maybe the code
> overhead and benefits coming from context sharing are canceling each
> other nicely. With longer burns, cmwq looks better but it's nothing
> significant. With shorter burns, other than wl3 spiking up for
> vanilla which probably would go away if the test is repeated, the two
> are performing virtually identically.
>
> The above is exaggerated synthetic test result and the performance
> difference will be even less noticeable in either direction under
> realistic workloads.
>
> cmwq extends workqueue such that it can serve as robust async
> mechanism which can be used (mostly) universally without introducing
> any noticeable performance degradation.
>
> Thanks.
>
> --
> tejun
>
Cheers,
Flo
next prev parent reply other threads:[~2010-06-16 6:56 UTC|newest]
Thread overview: 129+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-06-14 21:37 [PATCHSET] workqueue: concurrency managed workqueue, take#5 Tejun Heo
2010-06-14 21:37 ` [PATCH 01/30] kthread: implement kthread_data() Tejun Heo
2010-06-14 21:37 ` [PATCH 02/30] acpi: use queue_work_on() instead of binding workqueue worker to cpu0 Tejun Heo
2010-06-14 21:37 ` [PATCH 03/30] workqueue: kill RT workqueue Tejun Heo
2010-06-14 21:37 ` [PATCH 04/30] workqueue: misc/cosmetic updates Tejun Heo
2010-06-14 21:37 ` [PATCH 05/30] workqueue: merge feature parameters into flags Tejun Heo
2010-06-14 21:37 ` [PATCH 06/30] workqueue: define masks for work flags and conditionalize STATIC flags Tejun Heo
2010-06-14 21:37 ` [PATCH 07/30] workqueue: separate out process_one_work() Tejun Heo
2010-06-14 21:37 ` [PATCH 08/30] workqueue: temporarily disable workqueue tracing Tejun Heo
2010-06-15 13:29 ` Frederic Weisbecker
2010-06-15 16:37 ` Tejun Heo
2010-06-14 21:37 ` [PATCH 09/30] workqueue: kill cpu_populated_map Tejun Heo
2010-06-14 21:37 ` [PATCH 10/30] workqueue: update cwq alignement Tejun Heo
2010-06-14 21:37 ` [PATCH 11/30] workqueue: reimplement workqueue flushing using color coded works Tejun Heo
2010-06-14 21:37 ` [PATCH 12/30] workqueue: introduce worker Tejun Heo
2010-06-14 21:37 ` [PATCH 13/30] workqueue: reimplement work flushing using linked works Tejun Heo
2010-06-14 21:37 ` [PATCH 14/30] workqueue: implement per-cwq active work limit Tejun Heo
2010-06-14 21:37 ` [PATCH 15/30] workqueue: reimplement workqueue freeze using max_active Tejun Heo
2010-06-14 21:37 ` [PATCH 16/30] workqueue: introduce global cwq and unify cwq locks Tejun Heo
2010-06-14 21:37 ` [PATCH 17/30] workqueue: implement worker states Tejun Heo
2010-06-14 21:37 ` [PATCH 18/30] workqueue: reimplement CPU hotplugging support using trustee Tejun Heo
2010-06-14 21:37 ` [PATCH 19/30] workqueue: make single thread workqueue shared worker pool friendly Tejun Heo
2010-06-14 21:37 ` [PATCH 20/30] workqueue: add find_worker_executing_work() and track current_cwq Tejun Heo
2010-06-14 21:37 ` [PATCH 21/30] workqueue: carry cpu number in work data once execution starts Tejun Heo
2010-06-14 21:37 ` [PATCH 22/30] workqueue: implement WQ_NON_REENTRANT Tejun Heo
2010-06-14 21:37 ` [PATCH 23/30] workqueue: use shared worklist and pool all workers per cpu Tejun Heo
2010-06-14 21:37 ` [PATCH 24/30] workqueue: implement concurrency managed dynamic worker pool Tejun Heo
2010-06-14 21:37 ` [PATCH 25/30] workqueue: increase max_active of keventd and kill current_is_keventd() Tejun Heo
2010-06-14 21:37 ` [PATCH 26/30] workqueue: add system_wq, system_long_wq and system_nrt_wq Tejun Heo
2010-06-14 21:37 ` [PATCH 27/30] workqueue: implement DEBUGFS/workqueue Tejun Heo
2010-06-15 13:54 ` Frederic Weisbecker
2010-06-15 16:42 ` Tejun Heo
2010-06-14 21:37 ` [PATCH 28/30] workqueue: implement several utility APIs Tejun Heo
2010-06-14 21:37 ` [PATCH 29/30] libata: take advantage of cmwq and remove concurrency limitations Tejun Heo
2010-06-14 21:37 ` [PATCH 30/30] async: use workqueue for worker pool Tejun Heo
2010-06-14 21:58 ` [PATCHSET] workqueue: concurrency managed workqueue, take#5 Andrew Morton
2010-06-14 22:17 ` Tejun Heo
2010-06-14 22:31 ` Daniel Walker
2010-06-14 22:33 ` Tejun Heo
2010-06-14 22:35 ` Daniel Walker
2010-06-14 22:44 ` Tejun Heo
2010-06-14 22:49 ` Daniel Walker
2010-06-14 22:52 ` Tejun Heo
2010-06-14 22:35 ` Andrew Morton
2010-06-14 22:43 ` Tejun Heo
2010-06-14 23:06 ` Andrew Morton
2010-06-15 12:53 ` tytso
2010-06-15 16:15 ` [PATCH] SubmittingPatches: add more about patch descriptions Randy Dunlap
2010-06-15 16:33 ` Christoph Lameter
2010-06-15 18:15 ` [PATCHSET] workqueue: concurrency managed workqueue, take#5 Stefan Richter
2010-06-15 19:39 ` Tejun Heo
2010-06-15 1:20 ` Jeff Garzik
2010-06-15 18:25 ` Overview of concurrency managed workqueue Tejun Heo
2010-06-15 18:40 ` Christoph Lameter
2010-06-15 18:44 ` Tejun Heo
2010-06-15 19:43 ` Daniel Walker
2010-06-16 12:10 ` Tejun Heo
2010-06-16 13:27 ` Daniel Walker
2010-06-16 13:30 ` Tejun Heo
2010-06-16 13:41 ` Daniel Walker
2010-06-16 13:45 ` Tejun Heo
2010-06-16 14:05 ` Daniel Walker
2010-06-16 14:15 ` Tejun Heo
2010-06-16 14:34 ` Daniel Walker
2010-06-16 14:50 ` Tejun Heo
2010-06-16 15:11 ` Daniel Walker
2010-06-16 15:50 ` Tejun Heo
2010-06-16 16:30 ` Daniel Walker
2010-06-16 16:55 ` Tejun Heo
2010-06-16 18:22 ` Daniel Walker
2010-06-16 18:46 ` Tejun Heo
2010-06-16 19:20 ` Tejun Heo
2010-06-16 19:46 ` Daniel Walker
2010-06-16 19:58 ` Tejun Heo
2010-06-17 5:29 ` Florian Mickler
2010-06-17 6:21 ` Florian Mickler
2010-06-17 8:28 ` Tejun Heo
2010-06-17 18:03 ` Daniel Walker
2010-06-18 6:36 ` Florian Mickler
2010-06-18 16:38 ` Daniel Walker
2010-06-16 19:36 ` Daniel Walker
2010-06-16 19:52 ` Tejun Heo
2010-06-16 20:19 ` Daniel Walker
2010-06-16 20:24 ` Tejun Heo
2010-06-16 20:40 ` Daniel Walker
2010-06-16 21:41 ` Tejun Heo
2010-06-17 23:15 ` Andrew Morton
2010-06-18 8:03 ` Tejun Heo
2010-06-18 8:22 ` Tejun Heo
2010-06-18 17:29 ` Daniel Walker
2010-06-16 18:31 ` Stefan Richter
2010-06-16 18:41 ` Daniel Walker
2010-06-17 12:01 ` Andy Walls
2010-06-17 16:56 ` Daniel Walker
2010-06-17 23:16 ` Andrew Morton
2010-06-18 7:16 ` Tejun Heo
2010-06-18 7:31 ` Andrew Morton
2010-06-18 8:09 ` Tejun Heo
2010-06-18 17:02 ` Andrew Morton
2010-06-18 17:28 ` Tejun Heo
2010-06-19 15:53 ` [PATCH] kthread: implement kthread_worker Tejun Heo
2010-06-21 20:33 ` Randy Dunlap
2010-06-22 7:31 ` Tejun Heo
2010-06-19 8:38 ` Overview of concurrency managed workqueue Andi Kleen
2010-06-19 8:40 ` Tejun Heo
2010-06-19 8:55 ` Andi Kleen
2010-06-19 9:01 ` Tejun Heo
2010-06-19 9:08 ` Andi Kleen
2010-06-19 9:12 ` Tejun Heo
2010-06-19 9:15 ` Andi Kleen
2010-06-19 9:17 ` Tejun Heo
2010-06-19 9:27 ` Andi Kleen
2010-06-19 9:42 ` Tejun Heo
2010-06-19 12:20 ` Andi Kleen
2010-06-19 12:48 ` Tejun Heo
2010-06-17 22:28 ` Daniel Walker
2010-06-16 6:55 ` Florian Mickler [this message]
2010-06-16 12:22 ` Tejun Heo
2010-06-16 13:37 ` Johannes Berg
2010-06-16 13:39 ` Tejun Heo
2010-06-16 13:42 ` Johannes Berg
2010-06-17 23:14 ` Andrew Morton
2010-06-17 23:25 ` Joel Becker
2010-06-17 23:56 ` Andrew Morton
2010-06-18 7:15 ` Tejun Heo
2010-06-18 7:31 ` Tejun Heo
2010-06-15 18:29 ` [PATCHSET] workqueue: concurrency managed workqueue, take#5 Stefan Richter
2010-06-15 18:40 ` Tejun Heo
2010-06-15 20:29 ` Stefan Richter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100616085547.77f9169e@schatten.dmk.lab \
--to=florian@mickler.org \
--cc=akpm@linux-foundation.org \
--cc=arjan@linux.intel.com \
--cc=awalls@radix.net \
--cc=axboe@kernel.dk \
--cc=cl@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=jeff@garzik.org \
--cc=johannes@sipsolutions.net \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=oleg@redhat.com \
--cc=rusty@rustcorp.com.au \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.