linux-ide.vger.kernel.org archive mirror
From: Tejun Heo <htejun@gmail.com>
To: Jeff Garzik <jeff@garzik.org>
Cc: Jens Axboe <jens.axboe@oracle.com>, Mark Lord <liml@rtr.ca>,
	linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [PATCH] libata: use single threaded work queue
Date: Wed, 19 Aug 2009 23:11:44 +0900	[thread overview]
Message-ID: <4A8C0820.8060200@gmail.com> (raw)
In-Reply-To: <4A8BFDE2.1010904@garzik.org>

Hello, guys.

Jeff Garzik wrote:
>> Let people complain with code :) libata has two basic needs in this area:
>> (1) specifying a thread count other than "1" or "nr-cpus"
>> (2) don't start unneeded threads / idle out unused threads
> 
> To be even more general,
> 
> libata needs a workqueue or thread pool that can
> 
> (a) scale up to nr-drives-that-use-pio threads, on demand
> (b) scale down to zero threads, with lack of demand
> 
> That handles the worst case of each PIO-polling drive needing to sleep
> (thus massively impacting latency, if any other PIO-polling drive must
> wait for a free thread).
> 
> That also handles the best case of not needing any threads at all.

Heh... I've been trying to implement in-kernel media presence polling
and hit about the same problem, and the problem is quite widespread.
The choice of a multithreaded workqueue was intentional, as Jeff
explained.  Many workqueues are created out of fear of blocking, or
being blocked by, other works, although in most cases that shouldn't
be a problem.  Then there's the newly added async mechanism, which I
don't quite get: it runs the callback in a different environment
depending on resource availability - the callback might end up being
executed synchronously, where it can see a different context
w.r.t. locking or whatever.

So, I've spent some time thinking about an alternative that could
unify all of these.

* Per-cpu binding is good.

* Managing the right level of concurrency isn't easy.  If we schedule
  works too eagerly, we can end up wasting resources and slowing
  things down compared to the current, somewhat confined work
  processing.  If works are scheduled too late, resources will be
  underutilized.

* Some workqueues are there to guarantee forward progress and avoid
  deadlocks around the work execution resource (workqueue threads);
  the small example after this list shows the classic case.  A
  similar mechanism needs to be in place.

* It would be nice to implement async execution in terms of
  workqueues, or even to replace it with them.
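
As an aside, the classic case behind the forward-progress point above
can be shown in a few lines of userspace C.  This is only an
illustration and every name in it is made up; a two second timeout
stands in for the actual hang.  Work A runs on the only worker thread
and waits for work B, which sits behind it on the same list and so
can never run:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int b_done;

/* work B would let A make progress, but it sits behind A on the list */
static void work_b(void)
{
	pthread_mutex_lock(&lock);
	b_done = 1;
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
}

/* work A waits for B; with a single worker this can never succeed */
static void work_a(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += 2;			/* report instead of hanging forever */
	pthread_mutex_lock(&lock);
	while (!b_done)
		if (pthread_cond_timedwait(&cond, &lock, &ts)) {
			printf("deadlock: B never ran, the only worker is stuck in A\n");
			break;
		}
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	/* a single-threaded "workqueue": works simply run back to back */
	void (*queue[2])(void) = { work_a, work_b };

	for (int i = 0; i < 2; i++)
		queue[i]();
	return 0;
}

Reserving a thread for such a group of works (or giving it its own
workqueue, which is what drivers do today) is what breaks this
dependency.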

My somewhat crazy idea goes like the following.

* All works get queued on a single unified per-cpu work list.

* The perfect level of concurrency can be maintained by hooking into
  the scheduler and kicking a new worker thread iff the currently
  running worker is about to be scheduled out for whatever reason and
  there is no other worker ready to run (a rough userspace sketch of
  this follows the list).

* A thread pool of a few idle threads is always maintained per cpu,
  and those threads are what the above scheduler hook kicks.  When
  the pool gets exhausted, a manager thread is scheduled instead and
  replenishes it.  When there are too many idle threads, the pool
  size is slowly reduced.

* Forward progress can be guaranteed by reserving a single thread for
  each such group of works.  When such works are pending and the
  manager is invoked to replenish the worker pool, all such works on
  the queue are dispatched to their respective reserved threads.
  Note that this will happen only rarely, as the worker pool size
  will be kept sufficient and stable most of the time.

* Async can be reimplemented as works which get assigned to cpus in a
  round-robin manner (a tiny sketch is appended at the end of this
  mail).  This wouldn't be perfect but should be enough.
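
To make the scheduler-hook and thread-pool points above more
concrete, here is a rough userspace approximation using POSIX
threads.  It is only a sketch and all the names are invented: in the
real thing the "about to block" notification would come from a
scheduler hook rather than from the explicit
pool_will_block()/pool_woke_up() calls placed inside the work
function, and the per-cpu separation, the manager thread and the
idle-pool trimming are all left out.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct work {
	void (*func)(void *arg);
	void *arg;
	struct work *next;
};

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t more_work = PTHREAD_COND_INITIALIZER;
static struct work *work_head;		/* the single unified work list */
static int nr_idle;			/* workers waiting for work */
static int nr_running;			/* workers running a work, not blocked */

static void *worker_fn(void *arg);

static void spawn_worker(void)
{
	pthread_t tid;

	nr_idle++;			/* counted as idle until it grabs a work */
	pthread_create(&tid, NULL, worker_fn, NULL);
	pthread_detach(tid);
}

/* stand-in for the scheduler hook: the last runnable worker is about to
 * block, so make sure somebody else can keep draining the list */
static void pool_will_block(void)
{
	pthread_mutex_lock(&pool_lock);
	if (--nr_running == 0 && work_head) {
		if (nr_idle)
			pthread_cond_signal(&more_work);
		else
			spawn_worker();
	}
	pthread_mutex_unlock(&pool_lock);
}

static void pool_woke_up(void)
{
	pthread_mutex_lock(&pool_lock);
	nr_running++;
	pthread_mutex_unlock(&pool_lock);
}

static void *worker_fn(void *arg)
{
	pthread_mutex_lock(&pool_lock);
	for (;;) {
		while (!work_head)
			pthread_cond_wait(&more_work, &pool_lock);
		struct work *w = work_head;
		work_head = w->next;
		nr_idle--;
		nr_running++;
		pthread_mutex_unlock(&pool_lock);
		w->func(w->arg);	/* may block via pool_will_block() */
		free(w);
		pthread_mutex_lock(&pool_lock);
		nr_running--;
		nr_idle++;
	}
}

void queue_work(void (*func)(void *), void *arg)
{
	struct work *w = malloc(sizeof(*w));

	w->func = func;
	w->arg = arg;
	pthread_mutex_lock(&pool_lock);
	w->next = work_head;
	work_head = w;
	if (nr_idle)
		pthread_cond_signal(&more_work);
	else if (!nr_running)
		spawn_worker();
	pthread_mutex_unlock(&pool_lock);
}

/* a PIO-polling style work which has to sleep in the middle */
static void slow_work(void *arg)
{
	pool_will_block();
	sleep(1);			/* stands in for waiting on slow hardware */
	pool_woke_up();
	printf("done: %s\n", (char *)arg);
}

int main(void)
{
	queue_work(slow_work, "first");
	queue_work(slow_work, "second");  /* a 2nd worker kicks in only because the 1st sleeps */
	sleep(3);
	return 0;
}

The second work gets its own worker only because the first one went
to sleep; as long as works don't block, a single worker keeps
draining the list.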

Managing the perfect level of concurrency would have benefits in
resource usage, cache footprint, bandwidth and responsiveness.  I
haven't actually tried to implement the above yet and am still
wondering whether the complexity is justified.
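
Going back to the async point: the round-robin assignment itself is
trivial.  A tiny self-contained sketch (again every name is invented
and the per-cpu pools are reduced to plain counters to keep it
short):

#include <stdatomic.h>
#include <stdio.h>

#define NR_FAKE_CPUS 4

static atomic_uint next_cpu;
static int queued_on[NR_FAKE_CPUS];	/* stand-in for per-cpu work lists */

/* stand-in for queueing a work on one cpu's pool; a real version
 * would append to that cpu's work list as in the sketch above */
static void queue_work_on_cpu(int cpu, void (*func)(void *), void *arg)
{
	queued_on[cpu]++;
	func(arg);		/* run inline just to keep the sketch tiny */
}

/* async submission: no dedicated async thread machinery, just works
 * spread across the per-cpu pools in round-robin order */
static void async_schedule_sketch(void (*func)(void *), void *arg)
{
	int cpu = atomic_fetch_add(&next_cpu, 1) % NR_FAKE_CPUS;

	queue_work_on_cpu(cpu, func, arg);
}

static void probe_one(void *arg)
{
	printf("probing %s\n", (char *)arg);
}

int main(void)
{
	async_schedule_sketch(probe_one, "sda");
	async_schedule_sketch(probe_one, "sdb");
	async_schedule_sketch(probe_one, "sdc");
	for (int i = 0; i < NR_FAKE_CPUS; i++)
		printf("cpu%d got %d works\n", i, queued_on[i]);
	return 0;
}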

Thanks.

-- 
tejun

Thread overview: 20+ messages
2009-08-19 11:25 [PATCH] libata: use single threaded work queue Jens Axboe
2009-08-19 11:59 ` Jeff Garzik
2009-08-19 12:04   ` Jens Axboe
2009-08-19 12:14     ` Mark Lord
2009-08-19 12:23       ` Jens Axboe
2009-08-19 13:22         ` Jeff Garzik
2009-08-19 13:28           ` Jeff Garzik
2009-08-19 14:11             ` Tejun Heo [this message]
2009-08-19 15:21               ` Alan Cox
2009-08-19 15:53                 ` Tejun Heo
2009-08-19 16:15                   ` Alan Cox
2009-08-19 16:58                     ` Tejun Heo
2009-08-19 17:23                       ` Alan Cox
2009-08-20 12:46                         ` Tejun Heo
2009-08-20 11:39                 ` Stefan Richter
2009-08-20 12:11                   ` Stefan Richter
2009-08-19 22:22         ` Benjamin Herrenschmidt
2009-08-20 12:47           ` Tejun Heo
2009-08-20 12:48             ` Tejun Heo
2009-08-20 14:28               ` James Bottomley
