* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-20 22:39 UTC
To: David Howells
Cc: Arjan van de Ven, Frederic Weisbecker, torvalds, mingo, linux-kernel,
    jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

Hello, David.

"David Howells" <dhowells@redhat.com> wrote:
> Does this mean you don't get reentrancy guarantees with unbounded work
> queues?

It means that unbound wq behaves like a generic worker pool.  Bound wq
limits concurrency to a minimal level, but an unbound one executes works
as long as resources are available.  I'll continue below.

> I can't work out how you're achieving it with unbounded queues.  I presume
> with CPU-bound workqueues you're doing it by binding the work item to the
> current CPU still...

Unbound works are served by a dedicated gcwq whose workers are not
affine to any particular CPU.  As all unbound works are served by the
same gcwq, non-reentrancy is automatically guaranteed.

> Btw, how does this fare in an RT system, where work items bound to a CPU
> can't get executed because their CPU is busy with an RT thread, even
> though there are other, idle CPUs?

Sure, there's nothing special about unbound workers.  They're just
normal kthreads.

>> Oh, and Frederic suggested that we would be better off with something
>> based on the tracing API and I agree, so the debugfs thing is currently
>> dropped from the tree.  What do you think?
>
> I probably disagree.  I just want to be able to cat a file and see the
> current runqueue state.  I don't want to have to write and distribute a
> special program to do this.  Of course, I don't know that much about the
> tracing API, so cat'ing a file to get the runqueue listed nicely may be
> possible with that.

I'm relatively sure we can do that.  Frederic?

Thanks.

--
tejun

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: David Howells @ 2010-07-21 13:08 UTC
To: Tejun Heo
Cc: dhowells, Arjan van de Ven, Frederic Weisbecker, torvalds, mingo,
    linux-kernel, jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr,
    florian, andi, mst, randy.dunlap, Arjan van de Ven

Tejun Heo <tj@kernel.org> wrote:

> As all unbound works are served by the same gcwq, non-reentrancy is
> automatically guaranteed.

That doesn't actually explain _how_ it's non-reentrant.  The gcwq
includes a collection of threads that can execute from it, right?  If
so, what mechanism prevents two threads from executing the same work
item, if that work item isn't bound to a CPU?  I've been trying to
figure this out from the code, but I don't see it offhand.

>> Btw, how does this fare in an RT system, where work items bound to a CPU
>> can't get executed because their CPU is busy with an RT thread, even
>> though there are other, idle CPUs?
>
> Sure, there's nothing special about unbound workers.  They're just normal
> kthreads.

I should've been clearer: As I understand it, normal (unbound) work
items are bound to the CPU on which they were queued, and will be
executed there only (barring CPU removal).  If that's the case, isn't it
possible that work items can be prevented from getting execution time by
an RT thread that's hogging a CPU and won't let go?

David

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-21 14:59 UTC
To: David Howells
Cc: Arjan van de Ven, Frederic Weisbecker, torvalds, mingo, linux-kernel,
    jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

Hello,

On 07/21/2010 03:08 PM, David Howells wrote:
> Tejun Heo <tj@kernel.org> wrote:
>
>> As all unbound works are served by the same gcwq, non-reentrancy is
>> automatically guaranteed.
>
> That doesn't actually explain _how_ it's non-reentrant.  The gcwq
> includes a collection of threads that can execute from it, right?  If
> so, what mechanism prevents two threads from executing the same work
> item, if that work item isn't bound to a CPU?  I've been trying to
> figure this out from the code, but I don't see it offhand.

Sharing the same gcwq is why workqueues bound to one CPU have
non-reentrancy, so they're using the same mechanism.  If it doesn't
work for unbound workqueues, the normal ones are broken too.

Each gcwq keeps track of currently running works in a hash table and
checks whether the work in question is already executing before
starting to execute it.  It's a bit complex, but as a work_struct may
be freed once execution starts, the status needs to be tracked outside.

>>> Btw, how does this fare in an RT system, where work items bound to a CPU
>>> can't get executed because their CPU is busy with an RT thread, even
>>> though there are other, idle CPUs?
>>
>> Sure, there's nothing special about unbound workers.  They're just normal
>> kthreads.
>
> I should've been clearer: As I understand it, normal (unbound) work
> items are bound to the CPU on which they were queued, and will be
> executed there only (barring CPU removal).  If that's the case, isn't it
> possible that work items can be prevented from getting execution time by
> an RT thread that's hogging a CPU and won't let go?

Yeah, for bound workqueues, sure.  That's exactly the same as the
original workqueue implementation.  For unbound workqueues, it doesn't
matter.

Thanks.

--
tejun

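A rough sketch of the busy-works tracking Tejun describes, with
approximate names and a simplified hash function (the real code lives in
kernel/workqueue.c and differs in detail; struct worker is assumed to
carry the hentry hash node and a current_work pointer):

#define BUSY_WORKER_HASH_ORDER	5
#define BUSY_WORKER_HASH_SIZE	(1 << BUSY_WORKER_HASH_ORDER)

struct global_cwq {
	spinlock_t		lock;	/* protects hash and worklist */
	struct hlist_head	busy_hash[BUSY_WORKER_HASH_SIZE];
	/* ... worklist, idle/busy worker pools, etc ... */
};

/* hash on the work_struct address; the work itself may already be freed */
static struct hlist_head *busy_worker_head(struct global_cwq *gcwq,
					   struct work_struct *work)
{
	unsigned long v = (unsigned long)work >> BUSY_WORKER_HASH_ORDER;

	return &gcwq->busy_hash[v & (BUSY_WORKER_HASH_SIZE - 1)];
}

/* called with gcwq->lock held */
static struct worker *find_worker_executing_work(struct global_cwq *gcwq,
						 struct work_struct *work)
{
	struct worker *worker;
	struct hlist_node *pos;

	hlist_for_each_entry(worker, pos, busy_worker_head(gcwq, work),
			     hentry)
		if (worker->current_work == work)
			return worker;
	return NULL;
}
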
* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-21 15:03 UTC
To: David Howells
Cc: Arjan van de Ven, Frederic Weisbecker, torvalds, mingo, linux-kernel,
    jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

Just a bit of clarification.

On 07/21/2010 04:59 PM, Tejun Heo wrote:
>> I should've been clearer: As I understand it, normal (unbound) work
>> items

In workqueue land, normal workqueues would be bound to CPUs while
workers for WQ_UNBOUND workqueues aren't affined to any specific CPU.

Thanks.

--
tejun

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: David Howells @ 2010-07-21 15:25 UTC
To: Tejun Heo
Cc: dhowells, Arjan van de Ven, Frederic Weisbecker, torvalds, mingo,
    linux-kernel, jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr,
    florian, andi, mst, randy.dunlap, Arjan van de Ven

Tejun Heo <tj@kernel.org> wrote:

> Each gcwq keeps track of currently running works in a hash table and
> checks whether the work in question is already executing before
> starting to execute it.  It's a bit complex, but as a work_struct may
> be freed once execution starts, the status needs to be tracked outside.

Thanks, that's what I wanted to know.

I presume this survives an executing work_struct being freed,
reallocated and requeued before the address of the work_struct is
removed from the hash table?

I can see at least one way of doing this: marking the work_struct
address in the hash when the address becomes pending again so that the
process of hash removal will cause the work_struct to be requeued
automatically.

David

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-21 15:31 UTC
To: David Howells
Cc: Arjan van de Ven, Frederic Weisbecker, torvalds, mingo, linux-kernel,
    jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

Hello,

On 07/21/2010 05:25 PM, David Howells wrote:
>> Each gcwq keeps track of currently running works in a hash table and
>> checks whether the work in question is already executing before
>> starting to execute it.  It's a bit complex, but as a work_struct may
>> be freed once execution starts, the status needs to be tracked outside.
>
> Thanks, that's what I wanted to know.
>
> I presume this survives an executing work_struct being freed,
> reallocated and requeued before the address of the work_struct is
> removed from the hash table?

It will unnecessarily stall the execution of the new work if the last
work is still running, but nothing will be broken correctness-wise.

> I can see at least one way of doing this: marking the work_struct
> address in the hash when the address becomes pending again so that the
> process of hash removal will cause the work_struct to be requeued
> automatically.

If I'm correctly understanding what you're saying, the code already
does about the same thing.

Thanks.

--
tejun

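Concretely, the stall Tejun mentions falls out of the dispatch path,
which can be sketched like this (approximate names again; the collision
check sits at the top of the real process_one_work(), and
move_linked_works() and the busy-hash helpers are the ones sketched
earlier in the thread):

/* called with gcwq->lock held; drops and reacquires it around work->func */
static void process_one_work(struct worker *worker, struct work_struct *work)
{
	struct global_cwq *gcwq = worker->gcwq;
	struct worker *collision;

	/*
	 * A work with the same address is still running on this gcwq:
	 * queue the new instance on that worker's scheduled list so it
	 * runs after the current one finishes, never in parallel.  If
	 * the address was freed, reallocated and requeued, this is what
	 * stalls the new work until the old instance is done.
	 */
	collision = find_worker_executing_work(gcwq, work);
	if (unlikely(collision)) {
		move_linked_works(work, &collision->scheduled, NULL);
		return;
	}

	/* no collision: advertise ourselves and run the callback */
	hlist_add_head(&worker->hentry, busy_worker_head(gcwq, work));
	worker->current_work = work;
	/* ... unlock, call the work function, relock, unhash ... */
}
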
* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: David Howells @ 2010-07-21 15:38 UTC
To: Tejun Heo
Cc: dhowells, Arjan van de Ven, Frederic Weisbecker, torvalds, mingo,
    linux-kernel, jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr,
    florian, andi, mst, randy.dunlap, Arjan van de Ven

Tejun Heo <tj@kernel.org> wrote:

> If I'm correctly understanding what you're saying, the code already
> does about the same thing.

Cool.

Btw, it seems to work for fscache.  Feel free to add my Acked-by to
your patches.

David

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-21 15:42 UTC
To: David Howells
Cc: Arjan van de Ven, Frederic Weisbecker, torvalds, mingo, linux-kernel,
    jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

Hello,

On 07/21/2010 05:38 PM, David Howells wrote:
> Tejun Heo <tj@kernel.org> wrote:
>
>> If I'm correctly understanding what you're saying, the code already
>> does about the same thing.
>
> Cool.
>
> Btw, it seems to work for fscache.  Feel free to add my Acked-by to
> your patches.

Great, I'll start working on the debugging stuff once things settle
down a bit.

Thank you.

--
tejun

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: David Howells @ 2010-07-21 15:45 UTC
To: Tejun Heo
Cc: dhowells, Arjan van de Ven, Frederic Weisbecker, torvalds, mingo,
    linux-kernel, jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr,
    florian, andi, mst, randy.dunlap, Arjan van de Ven

Tejun Heo <tj@kernel.org> wrote:

> It will unnecessarily stall the execution of the new work if the last
> work is still running, but nothing will be broken correctness-wise.

That's fine.  Better that than risk unexpected reentrance.  You could
add a function to allow an executing work item to yield the hash entry,
to indicate that the work_struct that invoked it has been destroyed,
but it's probably not worth it, and it has scope for mucking things up
horribly if used at the wrong time.

I presume also that if a work item being executed on one work queue is
queued on another work queue, then there is no non-reentrancy guarantee
(which is fine; if you don't like that, don't do it).

David

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-21 15:51 UTC
To: David Howells
Cc: Arjan van de Ven, Frederic Weisbecker, torvalds, mingo, linux-kernel,
    jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

Hello,

On 07/21/2010 05:45 PM, David Howells wrote:
> That's fine.  Better that than risk unexpected reentrance.  You could
> add a function to allow an executing work item to yield the hash entry,
> to indicate that the work_struct that invoked it has been destroyed,
> but it's probably not worth it, and it has scope for mucking things up
> horribly if used at the wrong time.

Yeah, I agree, it's going too far and can be easily misused.  Given
that there are very few users which actually do that, I think it would
be best to leave it alone.

> I presume also that if a work item being executed on one work queue is
> queued on another work queue, then there is no non-reentrancy guarantee
> (which is fine; if you don't like that, don't do it).

Right, there is no non-reentrancy guarantee.

Thanks.

--
tejun

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Tejun Heo @ 2010-06-29 16:59 UTC
To: Arjan van de Ven
Cc: Frederic Weisbecker, torvalds, mingo, linux-kernel, jeff, akpm,
rusty, cl, dhowells, oleg, axboe, dwalker, stefanr, florian, andi,
mst, randy.dunlap, Arjan van de Ven

Hello, Arjan.
On 06/29/2010 06:40 PM, Arjan van de Ven wrote:
> uh? clearly the assumption is that if I have a 16 CPU machine, and 12
> items of work get scheduled,
> that we get all 12 running in parallel. All the smarts of cmwq surely
> only kick in once you've reached the
> "one work item per cpu" threshold ???
Hmmm... workqueue workers are bound to a certain CPU, so if you schedule
a work on a specific CPU, it will run there.  Once a CPU gets
saturated, the issuing thread will be moved elsewhere.  I don't think
it matters to any of the current async users one way or the other,
would it?
Thanks.
--
tejun
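
In API terms, the distinction Tejun draws above maps onto the two
queueing entry points.  A rough sketch (the my_* names are invented;
system_wq is the default cmwq workqueue):

static void my_fn(struct work_struct *work)
{
	/* ... */
}
static DECLARE_WORK(my_work, my_fn);

static void queueing_examples(void)
{
	/* served by the gcwq of whichever CPU executes this call */
	queue_work(system_wq, &my_work);

	/* explicitly bound to CPU 2's gcwq regardless of the caller */
	queue_work_on(2, system_wq, &my_work);
}
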
* [PATCHSET] workqueue: concurrency managed workqueue, take#6
From: Tejun Heo @ 2010-06-28 21:03 UTC
To: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, fweisbec, dwalker, stefanr, florian, andi, mst,
    randy.dunlap

Hello, all.

This is the sixth take of the cmwq (concurrency managed workqueue)
patchset.  It's on top of v2.6.35-rc3 + the sched/core branch.  Git
tree is available at

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-cmwq

Linus, please read the merge plan section.

Thanks.

Table of contents
=================

A. This take

   A-1. Merge plan
   A-2. Changes from the last take[L]
   A-3. TODOs
   A-4. Patches and diffstat

B. General documentation of Concurrency Managed Workqueue (cmwq)

   B-1. Why?
   B-2. Overview
   B-3. Unified worklist
   B-4. Concurrency managed shared worker pool
   B-5. Performance test results

A. This take
============

== A-1. Merge plan

Until now, cmwq patches haven't been fixed into permanent commits,
mainly because the sched patches they depend on made it into the
sched/core tree only recently.  After review, I'll put this take into
permanent commits.  Further developments or fixes will be done on top.

I believe that the expected users of cmwq are generally in favor of the
flexibility it adds.  In the last take, the following issues were
raised.

* Andi Kleen wanted to use high priority dispatching for memory fault
  handlers.  WQ_HIGHPRI is implemented to deal with this and padata
  integration.

* Andrew Morton raised two issues - workqueue users which use RT
  priority setting (ivtv) and padata integration.

  kthread_worker, which provides a simple work-based interface on top
  of kthread, is added for cases where fixed association with a
  specific kthread is required for priority setting, cpuset and other
  task attribute adjustments.  This will also be used by virtnet.

  WQ_CPU_INTENSIVE is added to address padata integration.  When
  combined with WQ_HIGHPRI, all concurrency management logic is
  bypassed, cmwq works as a (conceptually) simple context provider and
  padata should operate without any noticeable difference.

* Daniel Walker objected on the ground that cmwq would make it
  impossible to adjust priorities of workqueue threads, which can be
  useful as an ad-hoc optimization.  I don't plan to address this
  concern (the suggested solution is to add userland visible knobs to
  adjust workqueue priorities) at this point because it is an
  implementation detail that userspace shouldn't diddle with in the
  first place.  If anyone is interested in the details of the
  discussion, please read the discussion thread on the last take[L].

Unless there are fundamental objections, I'll push the patchset out to
linux-next and proceed with the following.

* integrating with other subsystems

* auditing all the workqueue users to better suit cmwq

* implementing features which will depend on cmwq (in-kernel media
  presence polling is the first target)

I expect there to be some, hopefully not too many, cross tree pulls in
the process and it will be a bit messy to back out later, so if you
have any fundamental concerns, please speak sooner rather than later.

Linus, it would be great if you let me know whether you agree with the
merge plan.

== A-2. Changes from the last take

* kthread_worker is added.  kthread_worker is a minimal work execution
  wrapper around kthread.
  This is to ease using kthreads for users which require control over
  thread attributes like priority, cpuset or whatever.  kthreads can be
  created with kthread_worker_fn() directly, or kthread_worker_fn() can
  be called after running any code the kthread needs to run for
  initialization.  The kthread can be treated the same way as any other
  kthread (a usage sketch follows at the end of this section).

  - ivtv, which used a single threaded workqueue and bumped the
    priority of the worker to RT, is converted to use kthread_worker.

* WQ_HIGHPRI and WQ_CPU_INTENSIVE are implemented.

  Works queued to a high priority workqueue are queued at the head of
  the global worklist and don't get blocked by other works.  They're
  dispatched to a worker as soon as possible.

  Works queued to a CPU intensive workqueue don't participate in
  concurrency management and thus don't block other works from
  executing.  This is to be used by works which are expected to burn
  considerable amounts of CPU cycles.

  Workqueues w/ both WQ_HIGHPRI and WQ_CPU_INTENSIVE set don't get
  affected by or participate in concurrency management.  Works queued
  on such workqueues are dispatched immediately and don't affect other
  works.

  - pcrypt, which creates workqueues and uses them for padata, is
    converted to use high priority cpu intensive workqueues with
    max_active of 1, which should behave about the same as the original
    implementation.  Going forward, as workqueues themselves don't cost
    anything to have around anymore, it would be better to make padata
    directly create workqueues for its users.

* To implement HIGHPRI and CPU_INTENSIVE, handling of worker flags
  which affect the running state for concurrency management has been
  updated.  worker_{set|clr}_flags() are added which manage the
  nr_running count according to worker state transitions.  This also
  makes nr_running counting easier to follow and verify.

* __create_workqueue() is renamed to alloc_workqueue() and is now a
  public interface.  It now interprets 0 max_active as the default
  max_active.  In the long run, all create*_workqueue() calls will be
  replaced with alloc_workqueue().

* Custom workqueue instrumentation via debugfs is removed.  The plan is
  to implement proper tracing API based instrumentation as suggested by
  Frederic Weisbecker.

* The original workqueue tracer code is removed, as suggested by
  Frederic Weisbecker.

* Comments updated/added.
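
As a usage sketch of the kthread_worker interface added above (an
illustration only: the my_* names and the priority value are invented,
and the real API may differ in detail):

static struct kthread_worker my_worker;
static struct kthread_work my_work;

static void my_work_fn(struct kthread_work *work)
{
	/* runs in the dedicated kthread created below */
}

static int __init my_init(void)
{
	struct sched_param param = { .sched_priority = 1 };
	struct task_struct *task;

	init_kthread_worker(&my_worker);
	init_kthread_work(&my_work, my_work_fn);

	task = kthread_run(kthread_worker_fn, &my_worker, "my-worker");
	if (IS_ERR(task))
		return PTR_ERR(task);

	/*
	 * The caller owns the thread, so task attributes can be
	 * adjusted directly, e.g. an RT priority as ivtv needs.
	 */
	sched_setscheduler(task, SCHED_FIFO, &param);

	queue_kthread_work(&my_worker, &my_work);
	return 0;
}
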
== A-3. TODOs

* fscache/slow-work conversion is not in this series.  It needs to be
  performance tested and acked by David Howells.

* Audit each of the workqueue users and

  - make them use the system workqueue instead if possible.
  - drop emergency workers if possible.
  - make them use alloc_workqueue() instead.

* Improve lockdep annotations.

* Implement workqueue tracer.

== A-4. Patches and diffstat

 0001-kthread-implement-kthread_worker.patch
 0002-ivtv-use-kthread_worker-instead-of-workqueue.patch
 0003-kthread-implement-kthread_data.patch
 0004-acpi-use-queue_work_on-instead-of-binding-workqueue-.patch
 0005-workqueue-kill-RT-workqueue.patch
 0006-workqueue-misc-cosmetic-updates.patch
 0007-workqueue-merge-feature-parameters-into-flags.patch
 0008-workqueue-define-masks-for-work-flags-and-conditiona.patch
 0009-workqueue-separate-out-process_one_work.patch
 0010-workqueue-temporarily-remove-workqueue-tracing.patch
 0011-workqueue-kill-cpu_populated_map.patch
 0012-workqueue-update-cwq-alignement.patch
 0013-workqueue-reimplement-workqueue-flushing-using-color.patch
 0014-workqueue-introduce-worker.patch
 0015-workqueue-reimplement-work-flushing-using-linked-wor.patch
 0016-workqueue-implement-per-cwq-active-work-limit.patch
 0017-workqueue-reimplement-workqueue-freeze-using-max_act.patch
 0018-workqueue-introduce-global-cwq-and-unify-cwq-locks.patch
 0019-workqueue-implement-worker-states.patch
 0020-workqueue-reimplement-CPU-hotplugging-support-using-.patch
 0021-workqueue-make-single-thread-workqueue-shared-worker.patch
 0022-workqueue-add-find_worker_executing_work-and-track-c.patch
 0023-workqueue-carry-cpu-number-in-work-data-once-executi.patch
 0024-workqueue-implement-WQ_NON_REENTRANT.patch
 0025-workqueue-use-shared-worklist-and-pool-all-workers-p.patch
 0026-workqueue-implement-worker_-set-clr-_flags.patch
 0027-workqueue-implement-concurrency-managed-dynamic-work.patch
 0028-workqueue-increase-max_active-of-keventd-and-kill-cu.patch
 0029-workqueue-s-__create_workqueue-alloc_workqueue-and-a.patch
 0030-workqueue-implement-several-utility-APIs.patch
 0031-workqueue-implement-high-priority-workqueue.patch
 0032-workqueue-implement-cpu-intensive-workqueue.patch
 0033-libata-take-advantage-of-cmwq-and-remove-concurrency.patch
 0034-async-use-workqueue-for-worker-pool.patch
 0035-pcrypt-use-HIGHPRI-and-CPU_INTENSIVE-workqueues-for-.patch

 arch/ia64/kernel/smpboot.c             |    2
 arch/x86/kernel/smpboot.c              |    2
 crypto/pcrypt.c                        |    4
 drivers/acpi/osl.c                     |   40
 drivers/ata/libata-core.c              |   20
 drivers/ata/libata-eh.c                |    4
 drivers/ata/libata-scsi.c              |   10
 drivers/ata/libata-sff.c               |    9
 drivers/ata/libata.h                   |    1
 drivers/media/video/ivtv/ivtv-driver.c |   26
 drivers/media/video/ivtv/ivtv-driver.h |    8
 drivers/media/video/ivtv/ivtv-irq.c    |   15
 drivers/media/video/ivtv/ivtv-irq.h    |    2
 include/linux/cpu.h                    |    2
 include/linux/kthread.h                |   65
 include/linux/libata.h                 |    1
 include/linux/workqueue.h              |  135 +
 include/trace/events/workqueue.h       |   92
 kernel/async.c                         |  140 -
 kernel/kthread.c                       |  164 +
 kernel/power/process.c                 |   21
 kernel/trace/Kconfig                   |   11
 kernel/workqueue.c                     | 3260 +++++++++++++++++++++++++++------
 kernel/workqueue_sched.h               |   13
 24 files changed, 3202 insertions(+), 845 deletions(-)

B. General documentation of Concurrency Managed Workqueue (cmwq)
================================================================

== B-1. Why?

cmwq brings the following benefits.

* By using a shared pool of workers for each cpu, cmwq uses resources
  more efficiently and the system no longer ends up with a lot of
  kernel threads which sit mostly idle.  The separate dedicated per-cpu
  workers of the current workqueue implementation are already becoming
  an actual scalability issue, and with an increasing number of cpus it
  will only get worse.

* cmwq can provide a flexible level of concurrency on demand.  While
  the current workqueue implementation keeps a lot of worker threads
  around, it can still only provide a very limited level of
  concurrency.
* cmwq makes obtaining and using execution contexts easy, which results
  in fewer complexities and awkward compromises in its users.  IOW, it
  transfers complexity from its users to the core code.  This will also
  allow implementation of things which need a flexible async mechanism
  but aren't important enough to have dedicated worker pools for.

* Work execution latencies are shorter and more predictable.  They are
  no longer affected by how long random previous works might take to
  finish but are, for the most part, regulated only by processing cycle
  availability.

* Much less to worry about causing deadlocks around execution
  resources.

* All the above while maintaining behavior compatibility with the
  original workqueue and without any noticeable run time overhead.

== B-2. Overview

There are many cases where an execution context is needed and there
already are several mechanisms for them.  The most commonly used one is
workqueue (wq) and there also are slow_work, async and some others.
Although wq has been serving the kernel well for quite some time, it
has certain limitations which are becoming more apparent.

There are two types of wq, single and multi threaded.  A multi threaded
(MT) wq keeps a bound thread for each online CPU, while a single
threaded (ST) wq uses a single unbound thread.  The number of CPU cores
is continuously rising and there already are systems which saturate the
default 32k PID space during boot up.

Frustratingly, although MT wq end up spending a lot of resources, the
level of concurrency provided is unsatisfactory.  The limitation is
common to both ST and MT wq, although it's less severe on MT ones.
Worker pools of wq are separate from each other.  A MT wq provides one
execution context per CPU while a ST wq provides one for the whole
system, which leads to various problems.

One of the problems is possible deadlock through dependency on the same
execution resource.  These can be detected reliably with lockdep these
days but in most cases the only solution is to create a dedicated wq
for one of the parties involved in the deadlock, which feeds back into
the waste-of-resources problem.  Also, when creating such a dedicated
wq to avoid deadlock, in an attempt to avoid wasting a large number of
threads just for that work, ST wq are often used, but in most cases ST
wq are suboptimal compared to MT wq.

The tension between the provided level of concurrency and resource
usage forces wq users to make unnecessary tradeoffs, like libata
choosing to use ST wq for polling PIOs and accepting the silly
limitation that no two polling PIOs can progress at the same time.  As
MT wq don't provide much better concurrency, users which require a
higher level of concurrency, like async or fscache, end up having to
implement their own worker pool.

Concurrency managed workqueue (cmwq) extends wq with focus on the
following goals.

* Maintain compatibility with the current workqueue API while removing
  the above mentioned limitations.

* Provide a single unified worker pool per cpu which can be shared by
  all users.  The worker pool and level of concurrency should be
  regulated automatically so that the API users don't need to worry
  about such details.

* Use what's necessary and allocate resources lazily on demand while
  guaranteeing forward progress where necessary.

== B-3. Unified worklist

There's a single global cwq (gcwq) per possible cpu which actually
serves out execution contexts.  The cpu_workqueues (cwq) of each wq are
mostly simple frontends to the associated gcwq.
Under normal operation, when a work is queued, it's queued to the gcwq
of the cpu.  Each gcwq has its own pool of workers which is used to
process all the works queued on the cpu.  Works mostly don't care which
wq they're queued to and using a unified worklist is straightforward,
but there are a couple of areas where things become more complicated.

First, when queueing works from different wq on the same worklist,
ordering of works needs some care.  Originally, a MT wq allows a work
to be executed simultaneously on multiple cpus, although it doesn't
allow the same one to execute simultaneously on the same cpu
(reentrant).  A ST wq allows only a single work to be executed on any
cpu, which guarantees both non-reentrancy and single-threadedness.

cmwq provides three different ordering modes - reentrant (default
mode), non-reentrant and single-cpu.  Single-cpu can be used to achieve
single-threadedness and full ordering if combined with max_active of 1.
The default mode (reentrant) is the same as the original MT wq.  The
distinction between non-reentrancy and single-cpu is made because some
of the current ST wq users don't need single-threadedness but only
non-reentrancy.

Another area where things are more involved is wq flushing, because wq
act as flushing domains.  cmwq implements it by coloring works and
tracking how many times each color is used.  When a work is queued to a
cwq, it's assigned a color and each cwq maintains counters for each
work color.  The color assignment changes on each wq flush attempt.  A
cwq can tell that all works queued before a certain wq flush attempt
have finished by waiting for all the colors up to that point to drain.
This maintains the original wq flush semantics without adding
unscalable overhead.

== B-4. Concurrency managed shared worker pool

For any worker pool, managing the concurrency level (how many workers
are executing simultaneously) is an important issue.  cmwq tries to
keep the concurrency at a minimal but sufficient level.

Concurrency management is implemented by hooking into the scheduler
(see the sketch below).  The gcwq is notified whenever a busy worker
wakes up or sleeps and keeps track of the level of concurrency.
Generally, works aren't supposed to be cpu cycle hogs and maintaining
just enough concurrency to prevent work processing from stalling is
optimal.  As long as there are one or more workers running on the cpu,
no new worker is scheduled, but, when the last running worker blocks,
the gcwq immediately schedules a new worker so that the cpu doesn't sit
idle while there are pending works.  This allows using a minimal number
of workers without losing execution bandwidth.

Keeping idle workers around doesn't cost anything other than the memory
space for kthreads, so cmwq holds onto idle ones for a while before
killing them.

As multiple execution contexts are available for each wq, deadlocks
around execution contexts are much harder to create.  The default wq,
system_wq, has a maximum concurrency level of 256 and unless there is a
scenario which can result in a dependency loop involving more than 254
workers, it won't deadlock.

Such forward progress guarantee relies on the fact that workers can be
created when more execution contexts are necessary.  This is guaranteed
by using emergency workers.  All wq which can be used in the memory
allocation path are required to have emergency workers which are
reserved for execution of that specific wq so that memory allocation
for worker creation doesn't deadlock on workers.
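
A heavily simplified sketch of those scheduler hooks follows.  The hook
names (wq_worker_waking_up(), wq_worker_sleeping()) match the patchset,
but the helpers and fields used here are approximations rather than the
literal source:

/* called from the scheduler when a worker of a gcwq wakes up */
void wq_worker_waking_up(struct task_struct *task, unsigned int cpu)
{
	struct worker *worker = kthread_data(task);

	if (!(worker->flags & WORKER_NOT_RUNNING))
		atomic_inc(get_gcwq_nr_running(cpu));
}

/* called from the scheduler when a busy worker is about to block */
struct task_struct *wq_worker_sleeping(struct task_struct *task,
				       unsigned int cpu)
{
	struct worker *worker = kthread_data(task);
	struct global_cwq *gcwq = get_gcwq(cpu);
	struct task_struct *to_wakeup = NULL;

	if (worker->flags & WORKER_NOT_RUNNING)
		return NULL;

	/*
	 * The last running worker is going to sleep: if works are
	 * still pending, hand an idle worker back to the scheduler to
	 * wake up, so the cpu doesn't sit idle.
	 */
	if (atomic_dec_and_test(get_gcwq_nr_running(cpu)) &&
	    !list_empty(&gcwq->worklist))
		to_wakeup = first_idle_worker(gcwq)->task;

	return to_wakeup;
}
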
== B-5. Performance test results

NOTE: This is with the third take[3] but nothing which could affect
performance noticeably has changed since then.

The wq workload is generated by the perf-wq.c module, which is a very
simple synthetic wq load generator.  A work is described by four
parameters - burn_usecs, mean_sleep_msecs, mean_resched_msecs and
factor.  It randomly splits burn_usecs into two, burns the first part,
sleeps for 0 - 2 * mean_sleep_msecs, burns what's left of burn_usecs
and then reschedules itself in 0 - 2 * mean_resched_msecs.  factor is
used to tune the number of cycles to match execution duration.

It issues three types of works - short, medium and long, each with two
burn durations L and S.

          burn/L(us)  burn/S(us)  mean_sleep(ms)  mean_resched(ms)  cycles
 short        50           1            1                10            454
 medium       50           2           10                50            125
 long         50           4          100               250             42

And then these works are put into the following workloads.  The lower
numbered workloads have more short/medium works.

 workload 0
 * 12 wq with 4 short works
 *  2 wq with 2 short and 2 medium works
 *  4 wq with 2 medium and 1 long works
 *  8 wq with 1 long work

 workload 1
 *  8 wq with 4 short works
 *  2 wq with 2 short and 2 medium works
 *  4 wq with 2 medium and 1 long works
 *  8 wq with 1 long work

 workload 2
 *  4 wq with 4 short works
 *  2 wq with 2 short and 2 medium works
 *  4 wq with 2 medium and 1 long works
 *  8 wq with 1 long work

 workload 3
 *  2 wq with 4 short works
 *  2 wq with 2 short and 2 medium works
 *  4 wq with 2 medium and 1 long works
 *  8 wq with 1 long work

 workload 4
 *  2 wq with 4 short works
 *  2 wq with 2 medium works
 *  4 wq with 2 medium and 1 long works
 *  8 wq with 1 long work

 workload 5
 *  2 wq with 2 medium works
 *  4 wq with 2 medium and 1 long works
 *  8 wq with 1 long work

The above wq loads are run in parallel with mencoder converting a 76M
mjpeg file into mpeg4, which takes 25.59 seconds with a standard
deviation of 0.19 without wq loading.  The CPU was an intel netburst
celeron running at 2.66GHz, which was chosen for its small cache size
and slowness.  wl0 and 1 are only tested for burn/S.  Each test case
was run 11 times and the first run was discarded ("d" below is the
standard deviation).

        vanilla/L     cmwq/L        vanilla/S     cmwq/S
 wl0                                26.18 d0.24   26.27 d0.29
 wl1                                26.50 d0.45   26.52 d0.23
 wl2    26.62 d0.35   26.53 d0.23   26.14 d0.22   26.12 d0.32
 wl3    26.30 d0.25   26.29 d0.26   25.94 d0.25   26.17 d0.30
 wl4    26.26 d0.23   25.93 d0.24   25.90 d0.23   25.91 d0.29
 wl5    25.81 d0.33   25.88 d0.25   25.63 d0.27   25.59 d0.26

There is no significant difference between the two.  Maybe the code
overhead and the benefits coming from context sharing are canceling
each other out nicely.  With longer burns, cmwq looks better, but it's
nothing significant.  With shorter burns, other than wl3 spiking up for
vanilla (which would probably go away if the test were repeated), the
two are performing virtually identically.

The above is an exaggerated synthetic test result and the performance
difference will be even less noticeable in either direction under
realistic workloads.

--
tejun

[L] http://thread.gmane.org/gmane.linux.kernel/998652
[3] http://thread.gmane.org/gmane.linux.kernel/939353

* [PATCH 34/35] async: use workqueue for worker pool
From: Tejun Heo @ 2010-06-28 21:04 UTC
To: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, fweisbec, dwalker, stefanr, florian, andi, mst,
    randy.dunlap
Cc: Tejun Heo, Arjan van de Ven

Replace private worker pool with system_long_wq.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Arjan van de Ven <arjan@infradead.org>
---
 kernel/async.c |  140 ++++++-----------------------------------------------
 1 files changed, 21 insertions(+), 119 deletions(-)

diff --git a/kernel/async.c b/kernel/async.c
index 15319d6..c285258 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -49,40 +49,32 @@ asynchronous and synchronous parts of the kernel.
  */

 #include <linux/async.h>
-#include <linux/bug.h>
 #include <linux/module.h>
 #include <linux/wait.h>
 #include <linux/sched.h>
-#include <linux/init.h>
-#include <linux/kthread.h>
-#include <linux/delay.h>
 #include <linux/slab.h>
 #include <asm/atomic.h>

 static async_cookie_t next_cookie = 1;

-#define MAX_THREADS	256
 #define MAX_WORK	32768

 static LIST_HEAD(async_pending);
 static LIST_HEAD(async_running);
 static DEFINE_SPINLOCK(async_lock);

-static int async_enabled = 0;
-
 struct async_entry {
-	struct list_head list;
-	async_cookie_t   cookie;
-	async_func_ptr	 *func;
-	void             *data;
-	struct list_head *running;
+	struct list_head	list;
+	struct work_struct	work;
+	async_cookie_t		cookie;
+	async_func_ptr		*func;
+	void			*data;
+	struct list_head	*running;
 };

 static DECLARE_WAIT_QUEUE_HEAD(async_done);
-static DECLARE_WAIT_QUEUE_HEAD(async_new);

 static atomic_t entry_count;
-static atomic_t thread_count;

 extern int initcall_debug;
@@ -117,27 +109,23 @@ static async_cookie_t lowest_in_progress(struct list_head *running)
 	spin_unlock_irqrestore(&async_lock, flags);
 	return ret;
 }
+
 /*
  * pick the first pending entry and run it
  */
-static void run_one_entry(void)
+static void async_run_entry_fn(struct work_struct *work)
 {
+	struct async_entry *entry =
+		container_of(work, struct async_entry, work);
 	unsigned long flags;
-	struct async_entry *entry;
 	ktime_t calltime, delta, rettime;

-	/* 1) pick one task from the pending queue */
-
+	/* 1) move self to the running queue */
 	spin_lock_irqsave(&async_lock, flags);
-	if (list_empty(&async_pending))
-		goto out;
-	entry = list_first_entry(&async_pending, struct async_entry, list);
-
-	/* 2) move it to the running queue */
 	list_move_tail(&entry->list, entry->running);
 	spin_unlock_irqrestore(&async_lock, flags);

-	/* 3) run it (and print duration)*/
+	/* 2) run (and print duration) */
 	if (initcall_debug && system_state == SYSTEM_BOOTING) {
 		printk("calling %lli_%pF @ %i\n", (long long)entry->cookie,
 			entry->func, task_pid_nr(current));
@@ -153,31 +141,25 @@ static void run_one_entry(void)
 			(long long)ktime_to_ns(delta) >> 10);
 	}

-	/* 4) remove it from the running queue */
+	/* 3) remove self from the running queue */
 	spin_lock_irqsave(&async_lock, flags);
 	list_del(&entry->list);

-	/* 5) free the entry */
+	/* 4) free the entry */
 	kfree(entry);
 	atomic_dec(&entry_count);
 	spin_unlock_irqrestore(&async_lock, flags);

-	/* 6) wake up any waiters. */
+	/* 5) wake up any waiters */
 	wake_up(&async_done);
-	return;
-
-out:
-	spin_unlock_irqrestore(&async_lock, flags);
 }

-
 static async_cookie_t __async_schedule(async_func_ptr *ptr, void *data, struct list_head *running)
 {
 	struct async_entry *entry;
 	unsigned long flags;
 	async_cookie_t newcookie;
-
 	/* allow irq-off callers */
 	entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC);
@@ -186,7 +168,7 @@ static async_cookie_t __async_schedule(async_func_ptr *ptr, void *data, struct l
 	 * If we're out of memory or if there's too much work
 	 * pending already, we execute synchronously.
 	 */
-	if (!async_enabled || !entry || atomic_read(&entry_count) > MAX_WORK) {
+	if (!entry || atomic_read(&entry_count) > MAX_WORK) {
 		kfree(entry);
 		spin_lock_irqsave(&async_lock, flags);
 		newcookie = next_cookie++;
@@ -196,6 +178,7 @@ static async_cookie_t __async_schedule(async_func_ptr *ptr, void *data, struct l
 		ptr(data, newcookie);
 		return newcookie;
 	}
+	INIT_WORK(&entry->work, async_run_entry_fn);
 	entry->func = ptr;
 	entry->data = data;
 	entry->running = running;
@@ -205,7 +188,10 @@ static async_cookie_t __async_schedule(async_func_ptr *ptr, void *data, struct l
 	list_add_tail(&entry->list, &async_pending);
 	atomic_inc(&entry_count);
 	spin_unlock_irqrestore(&async_lock, flags);
-	wake_up(&async_new);
+
+	/* schedule for execution */
+	queue_work(system_long_wq, &entry->work);
+
 	return newcookie;
 }

@@ -312,87 +298,3 @@ void async_synchronize_cookie(async_cookie_t cookie)
 	async_synchronize_cookie_domain(cookie, &async_running);
 }
 EXPORT_SYMBOL_GPL(async_synchronize_cookie);
-
-
-static int async_thread(void *unused)
-{
-	DECLARE_WAITQUEUE(wq, current);
-	add_wait_queue(&async_new, &wq);
-
-	while (!kthread_should_stop()) {
-		int ret = HZ;
-		set_current_state(TASK_INTERRUPTIBLE);
-		/*
-		 * check the list head without lock.. false positives
-		 * are dealt with inside run_one_entry() while holding
-		 * the lock.
-		 */
-		rmb();
-		if (!list_empty(&async_pending))
-			run_one_entry();
-		else
-			ret = schedule_timeout(HZ);
-
-		if (ret == 0) {
-			/*
-			 * we timed out, this means we as thread are redundant.
-			 * we sign off and die, but we to avoid any races there
-			 * is a last-straw check to see if work snuck in.
-			 */
-			atomic_dec(&thread_count);
-			wmb(); /* manager must see our departure first */
-			if (list_empty(&async_pending))
-				break;
-			/*
-			 * woops work came in between us timing out and us
-			 * signing off; we need to stay alive and keep working.
-			 */
-			atomic_inc(&thread_count);
-		}
-	}
-	remove_wait_queue(&async_new, &wq);
-
-	return 0;
-}
-
-static int async_manager_thread(void *unused)
-{
-	DECLARE_WAITQUEUE(wq, current);
-	add_wait_queue(&async_new, &wq);
-
-	while (!kthread_should_stop()) {
-		int tc, ec;
-
-		set_current_state(TASK_INTERRUPTIBLE);
-
-		tc = atomic_read(&thread_count);
-		rmb();
-		ec = atomic_read(&entry_count);
-
-		while (tc < ec && tc < MAX_THREADS) {
-			if (IS_ERR(kthread_run(async_thread, NULL, "async/%i",
-					       tc))) {
-				msleep(100);
-				continue;
-			}
-			atomic_inc(&thread_count);
-			tc++;
-		}
-
-		schedule();
-	}
-	remove_wait_queue(&async_new, &wq);
-
-	return 0;
-}
-
-static int __init async_init(void)
-{
-	async_enabled =
-		!IS_ERR(kthread_run(async_manager_thread, NULL, "async/mgr"));
-
-	WARN_ON(!async_enabled);
-	return 0;
-}
-
-core_initcall(async_init);
--
1.6.4.2

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Frederic Weisbecker @ 2010-06-28 22:55 UTC
To: Tejun Heo
Cc: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

On Mon, Jun 28, 2010 at 11:04:22PM +0200, Tejun Heo wrote:
> Replace private worker pool with system_long_wq.

It appeared to me that async is meant to parallelize as much as
possible, to probe devices faster on boot for example, while cmwq
seems to do the opposite: trying to execute in batches as much as
possible, and fork when a work goes to sleep voluntarily.

That said, I haven't checked that deeply, so it's fairly possible I
missed something obvious :)

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Tejun Heo @ 2010-06-29 7:25 UTC
To: Frederic Weisbecker
Cc: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

Hello,

On 06/29/2010 12:55 AM, Frederic Weisbecker wrote:
> On Mon, Jun 28, 2010 at 11:04:22PM +0200, Tejun Heo wrote:
>> Replace private worker pool with system_long_wq.
>
> It appeared to me that async is meant to parallelize as much as
> possible, to probe devices faster on boot for example, while cmwq
> seems to do the opposite: trying to execute in batches as much as
> possible, and fork when a work goes to sleep voluntarily.

Yeah, well, that's kind of the whole point of cmwq.  It would try to
minimize the number of used workers but the provided concurrency will
still be enough.  No async probe will be stalled due to lack of an
execution context and the timings should be about the same between the
original async implementation and the cmwq based one.

Thanks.

--
tejun

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Frederic Weisbecker @ 2010-06-29 12:18 UTC
To: Tejun Heo
Cc: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

On Tue, Jun 29, 2010 at 09:25:12AM +0200, Tejun Heo wrote:
> Hello,
>
> On 06/29/2010 12:55 AM, Frederic Weisbecker wrote:
>> On Mon, Jun 28, 2010 at 11:04:22PM +0200, Tejun Heo wrote:
>>> Replace private worker pool with system_long_wq.
>>
>> It appeared to me that async is meant to parallelize as much as
>> possible, to probe devices faster on boot for example, while cmwq
>> seems to do the opposite: trying to execute in batches as much as
>> possible, and fork when a work goes to sleep voluntarily.
>
> Yeah, well, that's kind of the whole point of cmwq.  It would try to
> minimize the number of used workers but the provided concurrency will
> still be enough.  No async probe will be stalled due to lack of an
> execution context and the timings should be about the same between the
> original async implementation and the cmwq based one.
>
> Thanks.

Right.  I just don't know what is supposed to be slow on boot that
needs to use async.  Is that because reading some ports is slow, or
because we need to do something and wait for some time to get the
result?

If there is a question of slow ports to probe, then cmwq wouldn't seem the
right thing here, as it only forks when we go to sleep.

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Tejun Heo @ 2010-06-29 15:46 UTC
To: Frederic Weisbecker
Cc: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

Hello,

On 06/29/2010 02:18 PM, Frederic Weisbecker wrote:
>> Yeah, well, that's kind of the whole point of cmwq.  It would try to
>> minimize the number of used workers but the provided concurrency will
>> still be enough.  No async probe will be stalled due to lack of an
>> execution context and the timings should be about the same between the
>> original async implementation and the cmwq based one.
>
> Right.  I just don't know what is supposed to be slow on boot that
> needs to use async.  Is that because reading some ports is slow, or
> because we need to do something and wait for some time to get the
> result?

It's things like ATA bus resetting and probing.  They're usually
composed of short CPU activities and rather long sleeps.

> If there is a question of slow ports to probe, then cmwq wouldn't seem the
> right thing here, as it only forks when we go to sleep.

I lost you here.  If something during boot has to burn cpu cycles
(which it shouldn't, really), it has to burn cpu cycles and having
multiple concurrent threads won't help anything.  If something doesn't
burn cpu cycles but takes long, it's gotta sleep and cmwq will start a
new thread immediately.  So, can you please elaborate why cmwq would
be problematic?

Thanks.

--
tejun

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Frederic Weisbecker @ 2010-06-29 15:52 UTC
To: Tejun Heo
Cc: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

On Tue, Jun 29, 2010 at 05:46:32PM +0200, Tejun Heo wrote:
> Hello,
>
> On 06/29/2010 02:18 PM, Frederic Weisbecker wrote:
>>> Yeah, well, that's kind of the whole point of cmwq.  It would try to
>>> minimize the number of used workers but the provided concurrency will
>>> still be enough.  No async probe will be stalled due to lack of an
>>> execution context and the timings should be about the same between the
>>> original async implementation and the cmwq based one.
>>
>> Right.  I just don't know what is supposed to be slow on boot that
>> needs to use async.  Is that because reading some ports is slow, or
>> because we need to do something and wait for some time to get the
>> result?
>
> It's things like ATA bus resetting and probing.  They're usually
> composed of short CPU activities and rather long sleeps.

Ok.

>> If there is a question of slow ports to probe, then cmwq wouldn't seem the
>> right thing here, as it only forks when we go to sleep.
>
> I lost you here.  If something during boot has to burn cpu cycles
> (which it shouldn't, really), it has to burn cpu cycles and having
> multiple concurrent threads won't help anything.

It would on SMP.

> If something doesn't
> burn cpu cycles but takes long, it's gotta sleep and cmwq will start a
> new thread immediately.  So, can you please elaborate why cmwq would
> be problematic?

No, in this case it's not problematic; as long as the things that were
using async have a small cpu burn and a long sleep wait, it looks like
cmwq fits :)

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Tejun Heo @ 2010-06-29 15:55 UTC
To: Frederic Weisbecker
Cc: torvalds, mingo, linux-kernel, jeff, akpm, rusty, cl, dhowells,
    arjan, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

Hello,

On 06/29/2010 05:52 PM, Frederic Weisbecker wrote:
>>> If there is a question of slow ports to probe, then cmwq wouldn't seem the
>>> right thing here, as it only forks when we go to sleep.
>>
>> I lost you here.  If something during boot has to burn cpu cycles
>> (which it shouldn't, really), it has to burn cpu cycles and having
>> multiple concurrent threads won't help anything.
>
> It would on SMP.

Oh, I see.  Parallel cpu hogs.  We don't have such users for async and
I think using padata would be the right solution for those situations.

Thanks.

--
tejun

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: Arjan van de Ven @ 2010-06-29 16:40 UTC
To: Tejun Heo
Cc: Frederic Weisbecker, torvalds, mingo, linux-kernel, jeff, akpm,
    rusty, cl, dhowells, oleg, axboe, dwalker, stefanr, florian, andi,
    mst, randy.dunlap, Arjan van de Ven

On 6/29/2010 8:55 AM, Tejun Heo wrote:
> Hello,
>
> On 06/29/2010 05:52 PM, Frederic Weisbecker wrote:
>>>> If there is a question of slow ports to probe, then cmwq wouldn't seem the
>>>> right thing here, as it only forks when we go to sleep.
>>>
>>> I lost you here.  If something during boot has to burn cpu cycles
>>> (which it shouldn't, really), it has to burn cpu cycles and having
>>> multiple concurrent threads won't help anything.
>>
>> It would on SMP.
>
> Oh, I see.  Parallel cpu hogs.  We don't have such users for async and
> I think using padata would be the right solution for those situations.

uh? clearly the assumption is that if I have a 16 CPU machine, and 12
items of work get scheduled,
that we get all 12 running in parallel. All the smarts of cmwq surely
only kick in once you've reached the
"one work item per cpu" threshold ???

* Re: [PATCH 34/35] async: use workqueue for worker pool
From: David Howells @ 2010-06-29 21:37 UTC
To: Tejun Heo
Cc: dhowells, Arjan van de Ven, Frederic Weisbecker, torvalds, mingo,
    linux-kernel, jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr,
    florian, andi, mst, randy.dunlap, Arjan van de Ven

Tejun Heo <tj@kernel.org> wrote:

> Hmmm... workqueue workers are bound to a certain CPU, so if you schedule
> a work on a specific CPU, it will run there.

That's my main problem with using cmwq to replace slow-work.

> Once a CPU gets saturated, the issuing thread will be moved elsewhere.

Assuming that the issuing thread isn't bound by the condition specified
in the previous sentence...

David

* [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-02 9:17 UTC
To: David Howells, Arjan van de Ven
Cc: Frederic Weisbecker, torvalds, mingo, linux-kernel, jeff, akpm,
    rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

Hello, David, Arjan.

These four patches implement unbound workqueues which can be used as a
simple execution context provider.  I changed async to use it and will
also make fscache use it.  This can be used by setting WQ_UNBOUND on
workqueue creation.  Works queued to unbound workqueues are implicitly
HIGHPRI and dispatched to unbound workers as soon as resources are
available; the only limitation applied by workqueue code is
@max_active.  IOW, for both async and fscache, things will stay about
the same.

WQ_UNBOUND can serve the role of WQ_SINGLE_CPU.  WQ_SINGLE_CPU is
dropped and replaced by WQ_UNBOUND.

Arjan, I still think we'll be better off using bound workqueues for
async but let's first convert without causing behavior difference.
Either way isn't gonna result in any noticeable difference anyway.  If
you're okay with the conversion, please ack it.

David, this should work for fscache/slow-work the same way too.  That
should relieve your concern, right?  Oh, and Frederic suggested that we
would be better off with something based on the tracing API and I
agree, so the debugfs thing is currently dropped from the tree.  What
do you think?

Thanks.

--
tejun

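For illustration, creating and using such a workqueue would look
roughly like this (a sketch only; the workqueue name, the my_*
identifiers and the max_active value of 4 are made up):

static void my_fn(struct work_struct *work)
{
	/* may sleep for a long time without blocking other works */
}
static DECLARE_WORK(my_work, my_fn);

static struct workqueue_struct *my_wq;

static int __init my_setup(void)
{
	/* workers aren't bound to any CPU; up to 4 works in flight */
	my_wq = alloc_workqueue("my_unbound_wq", WQ_UNBOUND, 4);
	if (!my_wq)
		return -ENOMEM;

	/* dispatched to an unbound worker as soon as one is available */
	queue_work(my_wq, &my_work);
	return 0;
}
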
* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-02 9:32 UTC
To: David Howells, Arjan van de Ven
Cc: Frederic Weisbecker, torvalds, mingo, linux-kernel, jeff, akpm,
    rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

On 07/02/2010 11:17 AM, Tejun Heo wrote:
> Hello, David, Arjan.
>
> These four patches implement unbound workqueues which can be used as a
> simple execution context provider.  I changed async to use it and will
> also make fscache use it.  This can be used by setting WQ_UNBOUND on
> workqueue creation.  Works queued to unbound workqueues are implicitly
> HIGHPRI and dispatched to unbound workers as soon as resources are
> available; the only limitation applied by workqueue code is
> @max_active.  IOW, for both async and fscache, things will stay about
> the same.
>
> WQ_UNBOUND can serve the role of WQ_SINGLE_CPU.  WQ_SINGLE_CPU is
> dropped and replaced by WQ_UNBOUND.
>
> Arjan, I still think we'll be better off using bound workqueues for
> async but let's first convert without causing behavior difference.
> Either way isn't gonna result in any noticeable difference anyway.  If
> you're okay with the conversion, please ack it.
>
> David, this should work for fscache/slow-work the same way too.  That
> should relieve your concern, right?  Oh, and Frederic suggested that we
> would be better off with something based on the tracing API and I
> agree, so the debugfs thing is currently dropped from the tree.  What
> do you think?

Oops, forgot something.  These four patches are on top of the
wq#for-next-candidate branch, which is cmwq take#6 + four fix patches,

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-next-candidate

and available in the following git tree.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-cmwq

Thanks.

--
tejun

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-07 5:41 UTC
To: David Howells, Arjan van de Ven
Cc: Frederic Weisbecker, torvalds, mingo, linux-kernel, jeff, akpm,
    rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

On 07/02/2010 11:17 AM, Tejun Heo wrote:
> Arjan, I still think we'll be better off using bound workqueues for
> async but let's first convert without causing behavior difference.
> Either way isn't gonna result in any noticeable difference anyway.  If
> you're okay with the conversion, please ack it.

Ping, Arjan.

Thanks.

--
tejun

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: Tejun Heo @ 2010-07-14 9:39 UTC
To: David Howells, Arjan van de Ven
Cc: Frederic Weisbecker, torvalds, mingo, linux-kernel, jeff, akpm,
    rusty, cl, oleg, axboe, dwalker, stefanr, florian, andi, mst,
    randy.dunlap, Arjan van de Ven

Hello,

On 07/07/2010 07:41 AM, Tejun Heo wrote:
> On 07/02/2010 11:17 AM, Tejun Heo wrote:
>> Arjan, I still think we'll be better off using bound workqueues for
>> async but let's first convert without causing behavior difference.
>> Either way isn't gonna result in any noticeable difference anyway.  If
>> you're okay with the conversion, please ack it.
>
> Ping, Arjan.

Just for the record, I pinged Arjan again offlist and he acked the
conversion in his reply.  I added the Acked-by and pushed the
conversion to for-next-candidate, which will be pushed into linux-next
next week.

Thanks.

--
tejun

* Re: [PATCHSET] workqueue: implement and use WQ_UNBOUND
From: David Howells @ 2010-07-20 22:01 UTC
To: Tejun Heo
Cc: dhowells, Arjan van de Ven, Frederic Weisbecker, torvalds, mingo,
    linux-kernel, jeff, akpm, rusty, cl, oleg, axboe, dwalker, stefanr,
    florian, andi, mst, randy.dunlap, Arjan van de Ven

Tejun Heo <tj@kernel.org> wrote:

> David, this should work for fscache/slow-work the same way too.  That
> should relieve your concern, right?

Not at the moment.  What does this mean:

	 * Unbound workqueues aren't concurrency managed and should be
	 * dispatched to workers immediately.

Does this mean you don't get reentrancy guarantees with unbounded work
queues?  I can't work out how you're achieving it with unbounded
queues.  I presume with CPU-bound workqueues you're doing it by binding
the work item to the current CPU still...

Btw, how does this fare in an RT system, where work items bound to a
CPU can't get executed because their CPU is busy with an RT thread,
even though there are other, idle CPUs?

> Oh, and Frederic suggested that we would be better off with something
> based on the tracing API and I agree, so the debugfs thing is currently
> dropped from the tree.  What do you think?

I probably disagree.  I just want to be able to cat a file and see the
current runqueue state.  I don't want to have to write and distribute a
special program to do this.  Of course, I don't know that much about
the tracing API, so cat'ing a file to get the runqueue listed nicely
may be possible with that.

David