public inbox for linux-kernel@vger.kernel.org
* Delayed interrupt work, thread pools
@ 2008-07-01 12:45 Benjamin Herrenschmidt
  2008-07-01 12:53 ` [Ksummit-2008-discuss] " Matthew Wilcox
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-01 12:45 UTC (permalink / raw)
  To: ksummit-2008-discuss; +Cc: Linux Kernel list

Here's something that's been running in the back of my mind for some
time that could be a good topic of discussion at KS.

In various areas (I'll come up with some examples later), kernel code
such as drivers wants to defer some processing to "task level", for
various reasons such as locking (taking mutexes), memory allocation,
interrupt latency, or simply doing things that may take more time than
is reasonable at interrupt time or that may block.

Currently, the main mechanism we provide to do that is workqueues. They
somewhat solve the problem, but at the same time can somewhat make it
worse.

The problem is that delaying a potentially long/sleeping task to a work
queue will have the effect of delaying everything else waiting on that
work queue.

The ability to have per-cpu work queues helps in areas where the problem
scope is mostly per-cpu, but doesn't necessarily cover the case where
the problem scope depends on the driver's activity, which isn't
necessarily tied to one CPU.

Let's take some examples: the main one (which triggered my email) is
spufs, ie, the management of the SPU "co-processors" on the Cell
processor, though the same thing mostly applies to any similar
co-processor architecture that would require servicing page
faults to access user memory.

In this case, various contexts running on the device may need to service
long operations (ie. handle_mm_fault in this case), but using the main
work queue or even a dedicated per-cpu one would let one context
potentially starve other contexts or other drivers trying to do the same
while the first one is blocked in the page fault code waiting for IOs...

The basic interface that such drivers want is still about the same as
workqueues tho: "call that function at task level as soon as possible".

Thus the idea of turning workqueues into some kind of pool of threads. 

At a given point in time, if none are available (idle) and work stacks
up, the kernel can allocate a new bunch and dispatch more work. Of
course, we would have to fine-tune the actual algorithm that
decides whether to allocate new threads or just wait / throttle for
current delayed work to complete. But I believe the basic premise still
stands.

So how about we allocate a "pool" of task structs, initially blocked,
ready to service jobs dispatched from interrupt time, with some
mechanism, possibly based on the existing base work queue, that can
allocate more if too much work stacks up or (via some scheduler
feedback) too many of the current ones are blocked (ie. waiting for IOs
for example).

For the specific SPU management issue we've been thinking about, we
could just implement an ad-hoc mechanism locally, but it occurs to me
that maybe this is a more generic problem and thus some kind of
extension to workqueues would be a good idea here.

Any comments ?

Cheers,
Ben.
 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-01 12:45 Delayed interrupt work, thread pools Benjamin Herrenschmidt
@ 2008-07-01 12:53 ` Matthew Wilcox
  2008-07-01 13:38   ` Benjamin Herrenschmidt
  2008-07-01 13:02 ` Robin Holt
  2008-07-02  4:22 ` Arjan van de Ven
  2 siblings, 1 reply; 25+ messages in thread
From: Matthew Wilcox @ 2008-07-01 12:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: ksummit-2008-discuss, Linux Kernel list

On Tue, Jul 01, 2008 at 10:45:35PM +1000, Benjamin Herrenschmidt wrote:
> In various areas (I'll come up with some examples later), kernel code
> such as drivers want to defer some processing to "task level", for
> various reasons such as locking (taking mutexes), memory allocation,
> interrupt latency, or simply doing things that may take more time than
> is reasonable to do at interrupt time or do things that may block.
> 
> Currently, the main mechanism we provide to do that is workqueues. They
> somewhat solve the problem, but at the same time, somewhat can make it
> worse.

Why not just use a dedicated thread?  The API to start / stop threads is
now pretty easy to use.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: Delayed interrupt work, thread pools
  2008-07-01 12:45 Delayed interrupt work, thread pools Benjamin Herrenschmidt
  2008-07-01 12:53 ` [Ksummit-2008-discuss] " Matthew Wilcox
@ 2008-07-01 13:02 ` Robin Holt
  2008-07-02  1:39   ` Dean Nelson
  2008-07-02  4:22 ` Arjan van de Ven
  2 siblings, 1 reply; 25+ messages in thread
From: Robin Holt @ 2008-07-01 13:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Dean Nelson
  Cc: ksummit-2008-discuss, Linux Kernel list

Adding Dean Nelson to this discussion.  I don't think he actively
follows lkml.  We do something similar to this in xpc by managing our
own pool of threads.  I know he has talked about this type thing in the
past.

Thanks,
Robin


On Tue, Jul 01, 2008 at 10:45:35PM +1000, Benjamin Herrenschmidt wrote:
> Here's something that's been running in the back of my mind for some
> time that could be a good topic of discussion at KS.
.....


* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-01 12:53 ` [Ksummit-2008-discuss] " Matthew Wilcox
@ 2008-07-01 13:38   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 25+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-01 13:38 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: ksummit-2008-discuss, Linux Kernel list

On Tue, 2008-07-01 at 06:53 -0600, Matthew Wilcox wrote:
> On Tue, Jul 01, 2008 at 10:45:35PM +1000, Benjamin Herrenschmidt wrote:
> > In various areas (I'll come up with some examples later), kernel code
> > such as drivers want to defer some processing to "task level", for
> > various reasons such as locking (taking mutexes), memory allocation,
> > interrupt latency, or simply doing things that may take more time than
> > is reasonable to do at interrupt time or do things that may block.
> > 
> > Currently, the main mechanism we provide to do that is workqueues. They
> > somewhat solve the problem, but at the same time, somewhat can make it
> > worse.
> 
> Why not just use a dedicated thread?  The API to start / stop threads is
> now pretty easy to use.

A dedicated thread isn't far from a dedicated workqueue. The thread can
be blocked servicing a page fault and that will delay any further work.

In the case of spufs, we could solve that by having a dedicated thread
per context. That's probably what we'll do for our proof-of-concept
implementation of our new ideas. But that sounds like overkill; there
shouldn't be -that- many page faults. The same goes for gfx cards with
MMUs, etc.. we'd end up with a shitload of dedicated threads mostly
sitting there sleeping and wasting kernel resources.

Another option I thought about would be something akin to some of the
threadlet discussions (or whatever we call those nowadays). ie, have the
workqueue fork when it blocks basically. That would require some API
changes as current drivers may rely on the fact that all workqueues
tasks are serialized though.

Cheers,
Ben.




* Re: Delayed interrupt work, thread pools
  2008-07-01 13:02 ` Robin Holt
@ 2008-07-02  1:39   ` Dean Nelson
  2008-07-02  2:38     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 25+ messages in thread
From: Dean Nelson @ 2008-07-02  1:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Robin Holt, ksummit-2008-discuss, Linux Kernel list

On Tue, Jul 01, 2008 at 08:02:40AM -0500, Robin Holt wrote:
> Adding Dean Nelson to this discussion.  I don't think he actively
> follows lkml.  We do something similar to this in xpc by managing our
> own pool of threads.  I know he has talked about this type thing in the
> past.
> 
> Thanks,
> Robin
> 
> 
> On Tue, Jul 01, 2008 at 10:45:35PM +1000, Benjamin Herrenschmidt wrote:
> > 
> > For the specific SPU management issue we've been thinking about, we
> > could just implement an ad-hoc mechanism locally, but it occurs to me
> > that maybe this is a more generic problem and thus some kind of
> > extension to workqueues would be a good idea here.
> > 
> > Any comments ?

As Robin mentioned, XPC manages a pool of kthreads that can (for performance
reasons) be quickly awakened by an interrupt handler and that are able to
block for indefinite periods of time.

In drivers/misc/sgi-xp/xpc_main.c you'll find a rather simplistic attempt
at maintaining this pool of kthreads. 

The kthreads are activated by calling xpc_activate_kthreads(). Either idle
kthreads are awakened or new kthreads are created if a sufficient number of
idle kthreads are not available.

Once finished with current 'work' a kthread waits for new work by calling
wait_event_interruptible_exclusive(). (The call is found in
xpc_kthread_waitmsgs().)

The number of idle kthreads is limited as is the total number of kthreads
allowed to exist concurrently.

It's certainly not optimal in the way it maintains the number of kthreads
in the pool over time, but I've not had the time to spare to make it better.

I'd love it if a general mechanism were provided so that XPC could get out
of maintaining its own pool.

Thanks,
Dean


* Re: Delayed interrupt work, thread pools
  2008-07-02  1:39   ` Dean Nelson
@ 2008-07-02  2:38     ` Benjamin Herrenschmidt
  2008-07-02  2:47       ` Dave Chinner
  2008-07-02 14:27       ` [Ksummit-2008-discuss] " Hugh Dickins
  0 siblings, 2 replies; 25+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-02  2:38 UTC (permalink / raw)
  To: Dean Nelson; +Cc: Robin Holt, ksummit-2008-discuss, Linux Kernel list

On Tue, 2008-07-01 at 20:39 -0500, Dean Nelson wrote:
> As Robin, mentioned XPC manages a pool of kthreads that can (for performance
> reasons) be quickly awakened by an interrupt handler and that are able to
> block for indefinite periods of time.
> 
> In drivers/misc/sgi-xp/xpc_main.c you'll find a rather simplistic attempt
> at maintaining this pool of kthreads. 
> 
> The kthreads are activated by calling xpc_activate_kthreads(). Either idle
> kthreads are awakened or new kthreads are created if a sufficent number of
> idle kthreads are not available.
> 
> Once finished with current 'work' a kthread waits for new work by calling
> wait_event_interruptible_exclusive(). (The call is found in
> xpc_kthread_waitmsgs().)
> 
> The number of idle kthreads is limited as is the total number of kthreads
> allowed to exist concurrently.
> 
> It's certainly not optimal in the way it maintains the number of kthreads
> in the pool over time, but I've not had the time to spare to make it better.
> 
> I'd love it if a general mechanism were provided so that XPC could get out
> of maintaining its own pool.

Thanks. That makes one existing in-tree user and one likely WIP user,
probably enough to move forward :-)

I'll look at your implementation and discuss internally to see what our
specific needs in terms of number of threads etc. look like.

I might come up with something simple first (ie, generalizing your
current implementation for example) and then look at some smarter
management of the thread pools.

Cheers,
Ben.




* Re: Delayed interrupt work, thread pools
  2008-07-02  2:38     ` Benjamin Herrenschmidt
@ 2008-07-02  2:47       ` Dave Chinner
  2008-07-02 14:27       ` [Ksummit-2008-discuss] " Hugh Dickins
  1 sibling, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2008-07-02  2:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Dean Nelson, Robin Holt, ksummit-2008-discuss, Linux Kernel list

On Wed, Jul 02, 2008 at 12:38:52PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2008-07-01 at 20:39 -0500, Dean Nelson wrote:
> > As Robin, mentioned XPC manages a pool of kthreads that can (for performance
> > reasons) be quickly awakened by an interrupt handler and that are able to
> > block for indefinite periods of time.
> > 
> > In drivers/misc/sgi-xp/xpc_main.c you'll find a rather simplistic attempt
> > at maintaining this pool of kthreads. 
.....
> > I'd love it if a general mechanism were provided so that XPC could get out
> > of maintaining its own pool.
> 
> Thanks. That makes one existing in-tree user and a one likely WIP user,
> probably enough to move forward :-)

FWIW, the NFS server has a fairly sophisticated thread pool
implementation that allows interesting control of pool
affinity. Look up struct svc_pool in your local tree ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-01 12:45 Delayed interrupt work, thread pools Benjamin Herrenschmidt
  2008-07-01 12:53 ` [Ksummit-2008-discuss] " Matthew Wilcox
  2008-07-01 13:02 ` Robin Holt
@ 2008-07-02  4:22 ` Arjan van de Ven
  2008-07-02  5:44   ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 25+ messages in thread
From: Arjan van de Ven @ 2008-07-02  4:22 UTC (permalink / raw)
  To: benh; +Cc: ksummit-2008-discuss, Linux Kernel list

Benjamin Herrenschmidt wrote:
> Here's something that's been running in the back of my mind for some
> time that could be a good topic of discussion at KS.
> 
> In various areas (I'll come up with some examples later), kernel code
> such as drivers want to defer some processing to "task level", for
> various reasons such as locking (taking mutexes), memory allocation,
> interrupt latency, or simply doing things that may take more time than
> is reasonable to do at interrupt time or do things that may block.
> 
> Currently, the main mechanism we provide to do that is workqueues. They
> somewhat solve the problem, but at the same time, somewhat can make it
> worse.
> 
> The problem is that delaying a potentially long/sleeping task to a work
> queue will have the effect of delaying everything else waiting on that
> work queue.
> 

how much of this would be obsoleted if we had irqthreads ?


* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02  4:22 ` Arjan van de Ven
@ 2008-07-02  5:44   ` Benjamin Herrenschmidt
  2008-07-02 11:02     ` Andi Kleen
  2008-07-02 14:11     ` James Bottomley
  0 siblings, 2 replies; 25+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-02  5:44 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: ksummit-2008-discuss, Linux Kernel list, Jeremy Kerr


> how much of this would be obsoleted if we had irqthreads ?

I'm not sure irqthreads is what I want...

First, can they call handle_mm_fault ? (ie, I'm not sure precisely what
kind of context those operate into).

But even if that's ok, it doesn't quite satisfy my primary needs unless
we can fire off an irqthread per interrupt -occurrence- rather than
having an irqthread per source.

There are two aspects to the problem. The less important one is that I need
to be able to service other interrupts from that source
after firing off the "job".

For example, the GFX chip or the SPU in my case takes a page fault when
accessing the user mm context it's attached to, I fire off a thread to
handle it (which I attach/detach from the mm, catch signals, etc...),
but that doesn't stop execution. Transfers to/from main memory on the
SPU (and to some extent on graphic chips) are asynchronous and thus the
SPU can still run and emit other interrupts representing different
conditions (though not other page faults).

The second aspect which is more important in the SPU case is that they
context switch. While an SPU context causes a page fault, and I fire off
that thread to service it, I want to be able to context switch some
other context on the SPU which will itself emit interrupts etc... on
that same source.

I could get away with simply allocating a kernel thread per SPU context,
and that's what we're going to do in our proof-of-concept
implementation, but I was hoping to avoid that with the thread pools in
the long run, thus saving a few resources left and right and not loading
the main scheduler lists with huge amounts of mostly idle threads.

Now regarding the other usage scenarios mentioned here (XPC and the NFS
server) that already have thread pools, how much of these would also be
replaced by irqthreads ? I don't think much offhand but I can't say for
sure until I have a look ... Again, that may be me just not
understanding what irqthreads are, but it looks to me like they are one
thread per IRQ source or so, without the ability for a single IRQ source
to fire off multiple threads. Maybe if irqthreads could fork() that
would be an option...

In any case, Dave's message implies we have at least two existing
in-tree thread pool implementations for two users, and possibly spufs
being a third one (I'm keeping graphics at bay for now as I see that as
a more long-term scenario). Probably worth looking at some
consolidation.

Anyway, time for me to go look at the XPC and NFS code and see if there
is anything worth putting in common in there. Might take me a little
while, there is nothing urgent (which is why I was thinking about a KS
chat but the list is fine too), we are doing a proof-of-concept
implementation using per-context threads in the meantime anyway.

Cheers,
Ben.




* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02  5:44   ` Benjamin Herrenschmidt
@ 2008-07-02 11:02     ` Andi Kleen
  2008-07-02 11:19       ` Leon Woestenberg
  2008-07-02 20:57       ` Benjamin Herrenschmidt
  2008-07-02 14:11     ` James Bottomley
  1 sibling, 2 replies; 25+ messages in thread
From: Andi Kleen @ 2008-07-02 11:02 UTC (permalink / raw)
  To: benh; +Cc: Arjan van de Ven, ksummit-2008-discuss, Linux Kernel list,
	Jeremy Kerr

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

>> how much of this would be obsoleted if we had irqthreads ?
>
> I'm not sure irqthreads is what I want...
>
> First, can they call handle_mm_fault ? (ie, I'm not sure precisely what
> kind of context those operate into).

Interrupt threads would be kernel threads, and kernel threads
run with a lazy (= random) mm; calling handle_mm_fault on that
wouldn't be very useful because you would affect a random mm.

Ok, you could force them to run with a specific mm, but that would
first cause lifetime issues with the original mm (how could you
ever free it?) and also increase the interrupt handling latency
because the interrupt would then be a nearly full-blown VM context
switch.

I also think interrupt threads are a bad idea in many cases because
their whole "advantage" over classical interrupts is that they can
block. Now blocking can take an unbounded, potentially long
time.

What do you do when there are more interrupts in that unbounded time?

Create more interrupt threads?  At some point you'll have hundreds
of threads doing nothing when you're unlucky.

Keep the interrupt event blocked? Then you'll not be handling
any events for a long time.

The usual Linux design that you design that interrupt to be fast
and run in a bounded time seems far more sane to me.

RT Linux has interrupt threads, and they work around this problem
by assuming that the blocking events are only short (only
what would normally be spinlocks).  If someone held a lock there
long enough then the bad things described above would happen.

Given that a non-RT kernel is usually also not too happy
if a spinlock is held for too long, it's usually ensured
that this is not the case.

But for your case these guarantees (only short lock regions
block) would not hold (handle_mm_fault can block for
a long time in some cases) and then all hell could break loose.

So in short I don't think interrupt threads are a solution to your
problem, or that they even solve any problem in a non-RT kernel.

-Andi 



* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02 11:02     ` Andi Kleen
@ 2008-07-02 11:19       ` Leon Woestenberg
  2008-07-02 11:24         ` Andi Kleen
  2008-07-02 20:57       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 25+ messages in thread
From: Leon Woestenberg @ 2008-07-02 11:19 UTC (permalink / raw)
  To: Andi Kleen
  Cc: benh, Arjan van de Ven, ksummit-2008-discuss, Linux Kernel list,
	Jeremy Kerr, RT

Hello,

(including linux-rt-users in the CC:, irqthreads are on-topic there)

On Wed, Jul 2, 2008 at 1:02 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>
>>> how much of this would be obsoleted if we had irqthreads ?
>>
>> I'm not sure irqthreads is what I want...
>>
> I also think interrupts threads are a bad idea in many cases because
> their whole "advantage" over classical interrupts is that they can
> block. Now blocking can be usually take a unbounded potentially long
> time.
>
> What do you do when there are more interrupts in that unbounded time?
>
If by irqthreads the -rt implementation is meant, isn't this what happens:

the irq kernel handler masks the source interrupt
the irq handler wakes the matching irqthread (they are always present)
the irqthread is scheduled, does the work and returns
the irq kernel handler unmasks the source interrupt

> Create more interrupt threads?  At some point you'll have hundreds
> of threads doing nothing when you're unlucky.
>
Each irqthread handles one irq.
So no new irq thread would spawn for any interrupt.

Regards,
-- 
Leon


* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02 11:19       ` Leon Woestenberg
@ 2008-07-02 11:24         ` Andi Kleen
  0 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2008-07-02 11:24 UTC (permalink / raw)
  To: Leon Woestenberg
  Cc: benh, Arjan van de Ven, ksummit-2008-discuss, Linux Kernel list,
	Jeremy Kerr, RT

Leon Woestenberg wrote:
> Hello,
> 
> (including linux-rt-users in the CC:, irqthreads are on-topic there)

Actually it's probably not interesting for this case.

> 
> On Wed, Jul 2, 2008 at 1:02 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>>
>>>> how much of this would be obsoleted if we had irqthreads ?
>>> I'm not sure irqthreads is what I want...
>>>
>> I also think interrupts threads are a bad idea in many cases because
>> their whole "advantage" over classical interrupts is that they can
>> block. Now blocking can be usually take a unbounded potentially long
>> time.
>>
>> What do you do when there are more interrupts in that unbounded time?
>>
> If by irqthreads the -rt implementation is meant, isn't this what happens:
> 
> irq kernel handler masks the source interrupt
> irq handler awakes the matching irqthread (they always are present)
> irqthread is scheduled, does work and returns
> irq kernel unmasks the source interrupt

I described this case. If the interrupt handler blocks for a long
time (as Ben asked for) then interrupts will not be handled
for a long time. Probably not what you want.

BTW this was not a criticism of RT Linux (in whose context
irqthreads make sense, as I explained), just an explanation of why
they IMHO don't make sense in a non hard-RT kernel
and especially don't solve Ben's issue.

> 
>> Create more interrupt threads?  At some point you'll have hundreds
>> of threads doing nothing when you're unlucky.
>>
> Each irqthread handles one irq.
> So now new irq thread would spawn for any interrupt.

It was a general description of all possible irqthreads.

-Andi


* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02  5:44   ` Benjamin Herrenschmidt
  2008-07-02 11:02     ` Andi Kleen
@ 2008-07-02 14:11     ` James Bottomley
  2008-07-02 20:00       ` Steven Rostedt
                         ` (2 more replies)
  1 sibling, 3 replies; 25+ messages in thread
From: James Bottomley @ 2008-07-02 14:11 UTC (permalink / raw)
  To: benh; +Cc: Arjan van de Ven, ksummit-2008-discuss, Linux Kernel list,
	Jeremy Kerr

On Wed, 2008-07-02 at 15:44 +1000, Benjamin Herrenschmidt wrote:
> > how much of this would be obsoleted if we had irqthreads ?
> 
> I'm not sure irqthreads is what I want...
.....

If you really need the full scheduling capabilities of threads, then it
sounds like a threadpool is all you need (and we should just provide a
unified interface).

Initially you were implying you'd prefer some type of non-blocking
workqueue (i.e. a workqueue that shifts to the next work item when an
earlier item blocks).  I can see this construct being useful because it
would have easier-to-use semantics and be more lightweight than a full
thread spawn.  It strikes me we could use some of the syslets work to do
this ... all the queue needs is a "next activation head", which will be
the next job in the queue in the absence of blocking.  When a job
blocks, syslets informs the workqueue and it moves on to the work on the
"next activation head".  If a prior job unblocks, syslets informs the
queue and it moves the "next activation head" to the unblocked job.
What this is doing is implementing a really simple scheduler within a
single workqueue, which I'm unsure is actually a good idea since
schedulers are complex and tricky things, but it is probably worthy of
discussion.
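
The "next activation head" scheme is easy to model outside the kernel.
The sketch below is a hypothetical userspace Python toy (the names
run_queue and work are invented, and blocking is simulated by generators
yielding True rather than by real scheduler events), not syslets code:

```python
from collections import deque

def work(pattern):
    """A work item as a generator: each yielded value answers
    "would I block at this step?" (True = blocked)."""
    for step in pattern:
        yield step

def run_queue(jobs):
    """Run (name, generator) pairs with a moving activation head:
    a job that reports blocking is parked at the back of the queue
    and the head moves on to the next runnable job."""
    done = []
    queue = deque(jobs)
    while queue:
        name, job = queue.popleft()            # current activation head
        try:
            if next(job):                      # job would block here
                queue.append((name, job))      # park it, move the head on
            else:
                queue.appendleft((name, job))  # still runnable, keep going
        except StopIteration:
            done.append(name)                  # job ran to completion
    return done

# "a" blocks once, "b" finishes immediately, "c" runs one step then
# finishes: non-blocking jobs overtake the blocked one.
print(run_queue([("a", work([True])), ("b", work([])), ("c", work([False]))]))
# -> ['b', 'c', 'a']
```

A real implementation would be driven by block/unblock notifications from
the scheduler rather than by cooperative yields, which is exactly where
the complexity James mentions comes in.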

James



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02  2:38     ` Benjamin Herrenschmidt
  2008-07-02  2:47       ` Dave Chinner
@ 2008-07-02 14:27       ` Hugh Dickins
  1 sibling, 0 replies; 25+ messages in thread
From: Hugh Dickins @ 2008-07-02 14:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Dean Nelson, ksummit-2008-discuss, Robin Holt, Linux Kernel list

On Wed, 2 Jul 2008, Benjamin Herrenschmidt wrote:
> On Tue, 2008-07-01 at 20:39 -0500, Dean Nelson wrote:
> > As Robin mentioned, XPC manages a pool of kthreads that can (for performance
> > reasons) be quickly awakened by an interrupt handler and that are able to
> > block for indefinite periods of time.
> > 
> > In drivers/misc/sgi-xp/xpc_main.c you'll find a rather simplistic attempt
> > at maintaining this pool of kthreads. 
> > 
> > The kthreads are activated by calling xpc_activate_kthreads(). Either idle
> > kthreads are awakened or new kthreads are created if a sufficient number of
> > idle kthreads are not available.
> > 
> > Once finished with its current 'work', a kthread waits for new work by calling
> > wait_event_interruptible_exclusive(). (The call is found in
> > xpc_kthread_waitmsgs().)
> > 
> > The number of idle kthreads is limited as is the total number of kthreads
> > allowed to exist concurrently.
> > 
> > It's certainly not optimal in the way it maintains the number of kthreads
> > in the pool over time, but I've not had the time to spare to make it better.
> > 
> > I'd love it if a general mechanism were provided so that XPC could get out
> > of maintaining its own pool.
> 
> Thanks. That makes one existing in-tree user and one likely WIP user,
> probably enough to move forward :-)
> 
> I'll look at your implementation and discuss internally to see what our
> specific needs in terms of number of threads etc. look like.
> 
> I might come up with something simple first (ie, generalizing your
> current implementation for example) and then look at some smarter
> management of the thread pools.

Do the pdflush daemons (from mm/pdflush.c) provide another example?

Hugh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02 14:11     ` James Bottomley
@ 2008-07-02 20:00       ` Steven Rostedt
  2008-07-02 20:22         ` James Bottomley
  2008-07-02 21:02         ` Benjamin Herrenschmidt
  2008-07-02 21:00       ` Benjamin Herrenschmidt
  2008-07-07 14:09       ` Chris Mason
  2 siblings, 2 replies; 25+ messages in thread
From: Steven Rostedt @ 2008-07-02 20:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: benh, Arjan van de Ven, ksummit-2008-discuss, Linux Kernel list,
	Jeremy Kerr

On Wed, Jul 02, 2008 at 09:11:36AM -0500, James Bottomley wrote:
> 
> If you really need the full scheduling capabilities of threads, then it
> sounds like a threadpool is all you need (and we should just provide a
> unified interface).

Something like this may be useful for the RT kernel as well. Being
able to push off tasks that we could prioritize would be greatly
beneficial.

Too bad we don't have a lighter task. Looking at the task_struct, it
looks quite heavy for storing lots of threads. Perhaps we can clean it
up some time and remove anything that is only useful for userspace
threads. Not sure how much that would save us.

As for interrupt threads, those would help for some non-RT issues
(having a better desktop feel) but not for the issue that Ben has been
describing. I would be interested in knowing exactly what is needed to
handle a page fault inside the kernel.  If we need to do something for a
user-space task, then as soon as that task is found the work should be
passed to that thread.

> 
> Initially you were implying you'd prefer some type of non-blocking
> workqueue (i.e. a workqueue that shifts to the next work item when an
> earlier item blocks).   I can see this construct being useful because it
> would have easier-to-use semantics and be more lightweight than a full
> thread spawn.  It strikes me we could use some of the syslets work to do
> this ... all the queue needs is a "next activation head", which will be
> the next job in the queue in the absence of blocking.  When a job
> blocks, syslets inform the workqueue and it moves on to the work at the
> "next activation head".  If a prior job unblocks, syslets inform the
> queue and it moves the "next activation head" to the unblocked job.
> What this is doing is implementing a really simple scheduler within a
> single workqueue, which I'm unsure is actually a good idea since
> schedulers are complex and tricky things, but it is probably worthy of
> discussion.

I think doing a "mini scheduler" inside a workqueue thread would be a
major hack.  We would have to have hooks into the normal scheduler to
let the mini-scheduler know something is blocking, and then have that
scheduler do some work. Not to mention that we need to handle
preemption.

Having a thread pool sounds much more reasonable and easier to
implement.

BTW, if something like this is implemented, I think that it should be a
replacement for softirqs and tasklets.

-- Steve


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02 20:00       ` Steven Rostedt
@ 2008-07-02 20:22         ` James Bottomley
  2008-07-02 20:28           ` Arjan van de Ven
  2008-07-02 20:40           ` Steven Rostedt
  2008-07-02 21:02         ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 25+ messages in thread
From: James Bottomley @ 2008-07-02 20:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: benh, Arjan van de Ven, ksummit-2008-discuss, Linux Kernel list,
	Jeremy Kerr

On Wed, 2008-07-02 at 16:00 -0400, Steven Rostedt wrote:
> On Wed, Jul 02, 2008 at 09:11:36AM -0500, James Bottomley wrote:
> > 
> > If you really need the full scheduling capabilities of threads, then it
> > sounds like a threadpool is all you need (and we should just provide a
> > unified interface).
> 
> Something like this may be useful for the RT kernel as well. Being
> able to push off tasks that we could prioritize would be greatly
> beneficial.
> 
> Too bad we don't have a lighter task. Looking at the task_struct, it
> looks quite heavy for storing lots of threads. Perhaps we can clean it
> up some time and remove anything that is only useful for userspace
> threads. Not sure how much that would save us.
> 
> As for interrupt threads, those would help for some non-RT issues
> (having a better desktop feel) but not for the issue that Ben has been
> describing. I would be interested in knowing exactly what is needed to
> handle a page fault inside the kernel.  If we need to do something for a
> user-space task, then as soon as that task is found the work should be
> passed to that thread.
> 
> > 
> > Initially you were implying you'd prefer some type of non-blocking
> > workqueue (i.e. a workqueue that shifts to the next work item when an
> > earlier item blocks).   I can see this construct being useful because it
> > would have easier-to-use semantics and be more lightweight than a full
> > thread spawn.  It strikes me we could use some of the syslets work to do
> > this ... all the queue needs is a "next activation head", which will be
> > the next job in the queue in the absence of blocking.  When a job
> > blocks, syslets inform the workqueue and it moves on to the work at the
> > "next activation head".  If a prior job unblocks, syslets inform the
> > queue and it moves the "next activation head" to the unblocked job.
> > What this is doing is implementing a really simple scheduler within a
> > single workqueue, which I'm unsure is actually a good idea since
> > schedulers are complex and tricky things, but it is probably worthy of
> > discussion.
> 
> I think doing a "mini scheduler" inside a workqueue thread would be a
> major hack.  We would have to have hooks into the normal scheduler to
> let the mini-scheduler know something is blocking, and then have that
> scheduler do some work. Not to mention that we need to handle
> preemption.

Not necessarily ... a simplistic round robin is fine.

The work to detect the "am I being blocked" has already been done for
some of the aio patches, so I'm merely suggesting another use for it.

Isn't preemption an orthogonal problem ... it will surely exist even in
the threadpool approach?

> Having a thread pool sounds much more reasonable and easier to
> implement.

Easier to implement, yes.  Easier to program, unlikely, and coming with
a large amount of overhead, definitely.

> BTW, if something like this is implemented, I think that it should be a
> replacement for softirqs and tasklets.

James



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02 20:22         ` James Bottomley
@ 2008-07-02 20:28           ` Arjan van de Ven
  2008-07-02 20:40           ` Steven Rostedt
  1 sibling, 0 replies; 25+ messages in thread
From: Arjan van de Ven @ 2008-07-02 20:28 UTC (permalink / raw)
  To: James Bottomley
  Cc: Steven Rostedt, benh, ksummit-2008-discuss, Linux Kernel list,
	Jeremy Kerr

James Bottomley wrote:
> Easier to implement, yes.  Easier to program, unlikely, and coming with
> a large amount of overhead, definitely.
> 
>> BTW, if something like this is implemented, I think that it should be a
>> replacement for softirqs and tasklets.
> 
Under the "better steal right (from other open source) than invent wrong"
mantra, it's worth looking at what glib does here; I've used its thread
pools before and they worked really well for me... we could learn a lot
from that.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02 20:22         ` James Bottomley
  2008-07-02 20:28           ` Arjan van de Ven
@ 2008-07-02 20:40           ` Steven Rostedt
  1 sibling, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2008-07-02 20:40 UTC (permalink / raw)
  To: James Bottomley
  Cc: benh, Arjan van de Ven, ksummit-2008-discuss, Linux Kernel list,
	Jeremy Kerr


On Wed, 2 Jul 2008, James Bottomley wrote:
> >
> > I think doing a "mini scheduler" inside a workqueue thread would be a
> > major hack.  We would have to have hooks into the normal scheduler to
> > let the mini-scheduler know something is blocking, and then have that
> > scheduler do some work. Not to mention that we need to handle
> > preemption.
>
> Not necessarily ... a simplistic round robin is fine.

Coming from the RT world, I was hoping for something where we could have
better control over prioritizing the tasks ;-)

>
> The work to detect the "am I being blocked" has already been done for
> some of the aio patches, so I'm merely suggesting another use for it.

Hmm, I didn't realize this. I'll have to go look at that code.

>
> Isn't preemption an orthogonal problem ... it will surely exist even in
> the threadpool approach?

I was just thinking that the scheduler would need to differentiate between
being blocked and being preempted. Seems that anytime a task would sleep
(outside preemption) the mini-scheduler would need to schedule the next
task.

>
> > Having a thread pool sounds much more reasonable and easier to
> > implement.
>
> Easier to implement, yes.  Easier to program, unlikely, and coming with
> a large amount of overhead, definitely.

Hmm, I'd argue about the "easier to program" part, but on the overhead
I, unfortunately, have to agree with you.

>
> > BTW, if something like this is implemented, I think that it should be a
> > replacement for softirqs and tasklets.


-- Steve


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02 11:02     ` Andi Kleen
  2008-07-02 11:19       ` Leon Woestenberg
@ 2008-07-02 20:57       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 25+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-02 20:57 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arjan van de Ven, ksummit-2008-discuss, Linux Kernel list,
	Jeremy Kerr

On Wed, 2008-07-02 at 13:02 +0200, Andi Kleen wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> 
> >> how much of this would be obsoleted if we had irqthreads ?
> >
> > I'm not sure irqthreads is what I want...
> >
> > First, can they call handle_mm_fault ? (ie, I'm not sure precisely what
> > kind of context those operate into).
> 
> Interrupt threads would be kernel threads, and kernel threads
> run with a lazy (= random) mm; calling handle_mm_fault on that
> wouldn't be very useful because you would affect a random mm.

That isn't a big issue. handle_mm_fault() takes the mm as an argument
(like when called from get_user_pages()) and if there's anything fishy I
can always attach/detach the mm to the thread. Been done before, works
fine.

> Ok you could force them to run with a specific MM, but that would
> first cause lifetime issues with the original MM (how could you
> ever free it?) and also increase the interrupt handling latency
> because the interrupt would be a nearly full blown VM context
> switch then.

handle_mm_fault() shouldn't need an mm context switch. I can just hold a
refcount while I have a reference to the mm in my queue. I can deal with
the lifetime; that isn't a big issue.

> I also think interrupt threads are a bad idea in many cases because
> their whole "advantage" over classical interrupts is that they can
> block. Now blocking can take an unbounded, potentially long time.

Yes, that's what I explain in the rest of my mail. That plus the fact
that I need to context switch the SPU to other contexts while we block.

 .../...

I agree with most of your points, which is why I believe interrupt
threads aren't a good option for me.

Interrupts for "normal" events will be handled in a short/bounded time.

Interrupts coming from SPU page faults will be deferred to a thread from
a pool (which can take more time if none is available, i.e. queue the
work and allocate more, or just wait for one to free up; the strategy
here is yet to be defined).

It's not a problem to have them delayed. I can context switch a faulting
SPU to some other task and switch it back later when the fault is
serviced. Anything time critical shouldn't operate on fault-able memory
in the first place :-)

So I need at most one kernel thread per SPU context for handling the
faults. The idea of the thread pools is that most of the time, I don't
take faults, and thus I don't need nearly as many threads in practice.
So having a pool that can dynamically grow or shrink based on pressure
would make sense.
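
A pool with that grow/shrink behaviour can be sketched in a few lines.
This is a hypothetical userspace Python model (the class name GrowingPool
and all parameters are invented for illustration), not the eventual
kernel interface: a new worker is spawned only when work arrives, nobody
is idle, and the pool is below its cap; idle workers time out and exit.

```python
import queue
import threading

class GrowingPool:
    """Toy thread pool that grows under pressure and shrinks when idle."""

    def __init__(self, max_threads, idle_timeout=0.5):
        self.max_threads = max_threads
        self.idle_timeout = idle_timeout  # seconds before an idle worker exits
        self.work = queue.Queue()
        self.lock = threading.Lock()
        self.nthreads = 0                 # workers currently alive
        self.nidle = 0                    # workers currently waiting for work

    def submit(self, fn):
        self.work.put(fn)
        with self.lock:
            # Spawn only if nobody is idle and we are below the cap;
            # otherwise an existing idle worker will pick the item up.
            if self.nidle == 0 and self.nthreads < self.max_threads:
                self.nthreads += 1
                threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            with self.lock:
                self.nidle += 1
            try:
                fn = self.work.get(timeout=self.idle_timeout)
            except queue.Empty:           # idled out: shrink the pool
                with self.lock:
                    self.nidle -= 1
                    self.nthreads -= 1
                return
            with self.lock:
                self.nidle -= 1
            fn()                          # may block for a long time
```

Most of the time such a pool sits at zero or one thread; a burst of
faults grows it toward max_threads, and it drains back down once the
burst passes.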

Ben.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02 14:11     ` James Bottomley
  2008-07-02 20:00       ` Steven Rostedt
@ 2008-07-02 21:00       ` Benjamin Herrenschmidt
  2008-07-03 10:12         ` Eric W. Biederman
  2008-07-07 14:09       ` Chris Mason
  2 siblings, 1 reply; 25+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-02 21:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: Arjan van de Ven, ksummit-2008-discuss, Linux Kernel list,
	Jeremy Kerr


> If you really need the full scheduling capabilities of threads, then it
> sounds like a threadpool is all you need (and we should just provide a
> unified interface).

That's my thinking nowadays.

> Initially you were implying you'd prefer some type of non-blocking
> workqueue (i.e. a workqueue that shifts to the next work item when an
> earlier item blocks).

That's also something I had in mind, I was tossing ideas around and
collecting feedback :-)

>    I can see this construct being useful because it
> would have easier-to-use semantics and be more lightweight than a full
> thread spawn.  It strikes me we could use some of the syslets work to do
> this ... 

Precisely what I had in mind.

> all the queue needs is a "next activation head", which will be
> the next job in the queue in the absence of blocking.  When a job
> blocks, syslets inform the workqueue and it moves on to the work at the
> "next activation head".  If a prior job unblocks, syslets inform the
> queue and it moves the "next activation head" to the unblocked job.
> What this is doing is implementing a really simple scheduler within a
> single workqueue, which I'm unsure is actually a good idea since
> schedulers are complex and tricky things, but it is probably worthy of
> discussion.

The question is: is that significantly less overhead than just spawning
a new full-blown kernel thread? Enough to justify the complexity? At the
end of the day, it means allocating a stack (which on ppc64 is still
16K, I know it sucks)... 

Ben.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02 20:00       ` Steven Rostedt
  2008-07-02 20:22         ` James Bottomley
@ 2008-07-02 21:02         ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 25+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-02 21:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: James Bottomley, Arjan van de Ven, ksummit-2008-discuss,
	Linux Kernel list, Jeremy Kerr

On Wed, 2008-07-02 at 16:00 -0400, Steven Rostedt wrote:
> 
> As for interrupt threads, those would help for some non-RT issues
> (having a better desktop feel) but not for the issue that Ben has been
> describing. I would be interested in knowing exactly what is needed to
> handle a page fault inside the kernel.  If we need to do something for a
> user-space task, then as soon as that task is found the work should be
> passed to that thread.

Not much is needed, as the mm is passed as an argument to
handle_mm_fault(). Page faults can already be handled by 'other'
processes (get_user_pages() doesn't have to be called in the context of
the target mm). I need to check that we don't get into funky issues down
at the VFS level when using a kernel thread without files, and I may
need to double-check whether anything in that path tries to signal, but
that's about it afaik.

Ben.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02 21:00       ` Benjamin Herrenschmidt
@ 2008-07-03 10:12         ` Eric W. Biederman
  2008-07-03 10:31           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2008-07-03 10:12 UTC (permalink / raw)
  To: benh; +Cc: James Bottomley, ksummit-2008-discuss, Linux Kernel list,
	Jeremy Kerr

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> The question is: is that significantly less overhead than just spawning
> a new full-blown kernel thread? Enough to justify the complexity? At the
> end of the day, it means allocating a stack (which on ppc64 is still
> 16K, I know it sucks)... 

I looked at this a while ago.  And right  now kernel_thread is fairly light.
kthread_create has latency issues because we need to queue up a task on
our kernel thread spawning daemon, and let it fork the child.  Needing
to go via the kthread spawning daemon didn't look fundamental, just something
that was a challenge to sort out.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-03 10:12         ` Eric W. Biederman
@ 2008-07-03 10:31           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 25+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-03 10:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: James Bottomley, ksummit-2008-discuss, Linux Kernel list,
	Jeremy Kerr

On Thu, 2008-07-03 at 03:12 -0700, Eric W. Biederman wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> 
> > The question is: is that significantly less overhead than just spawning
> > a new full-blown kernel thread? Enough to justify the complexity? At the
> > end of the day, it means allocating a stack (which on ppc64 is still
> > 16K, I know it sucks)... 
> 
> I looked at this a while ago.  And right  now kernel_thread is fairly light.
> kthread_create has latency issues because we need to queue up a task on
> our kernel thread spawning daemon, and let it fork the child.  Needing
> to go via the kthread spawning daemon didn't look fundamental, just something
> that was a challenge to sort out.

Yes. I was thinking that if it becomes an issue, we could special case
something in the scheduler to pop them.

Ben.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-02 14:11     ` James Bottomley
  2008-07-02 20:00       ` Steven Rostedt
  2008-07-02 21:00       ` Benjamin Herrenschmidt
@ 2008-07-07 14:09       ` Chris Mason
  2008-07-07 23:03         ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 25+ messages in thread
From: Chris Mason @ 2008-07-07 14:09 UTC (permalink / raw)
  To: James Bottomley
  Cc: benh, ksummit-2008-discuss, Linux Kernel list, Jeremy Kerr

On Wed, 2008-07-02 at 09:11 -0500, James Bottomley wrote:

> If you really need the full scheduling capabilities of threads, then it
> sounds like a threadpool is all you need (and we should just provide a
> unified interface).
> 

Workqueues weren't quite right for btrfs either, where I need to be able
to verify checksums after IO completes (among other things).  So I also
ended up with a simple thread pool system that can add kthreads on
demand.

So, it sounds like we'd have a number of users for a unified interface.

> Initially you were implying you'd prefer some type of non-blocking
> workqueue (i.e. a workqueue that shifts to the next work item when an
> earlier item blocks).   I can see this construct being useful because it
> would have easier-to-use semantics and be more lightweight than a full
> thread spawn.  It strikes me we could use some of the syslets work to do
> this ... all the queue needs is a "next activation head", which will be
> the next job in the queue in the absence of blocking.  When a job
> blocks, syslets inform the workqueue and it moves on to the work at the
> "next activation head".  If a prior job unblocks, syslets inform the
> queue and it moves the "next activation head" to the unblocked job.
> What this is doing is implementing a really simple scheduler within a
> single workqueue, which I'm unsure is actually a good idea since
> schedulers are complex and tricky things, but it is probably worthy of
> discussion.

I have a few different users of the thread pools, and I ended up having
to create a number of pools to avoid deadlocks between different types
of operations on the same work list.  Ideas like the next activation
head really sound cool, but the simplicity of just making dedicated
pools for dedicated tasks is much, much easier to debug.

If the pools are able to resize themselves sanely, it should perform
about the same as the fancy stuff ;)
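
The dedicated-pools point can be shown concretely. The sketch below is a
hypothetical userspace Python illustration (using concurrent.futures
rather than any kernel interface, with invented names): work of one type
waits on work of another type, so giving each type its own pool
guarantees the second type always has a worker, whereas one shared
single-thread pool would deadlock on itself.

```python
from concurrent.futures import ThreadPoolExecutor

# One dedicated pool per operation type: "io-completion" work may block
# waiting on "checksum" work, so checksums must never be starved by it.
pools = {
    "io-completion": ThreadPoolExecutor(max_workers=1),
    "checksum": ThreadPoolExecutor(max_workers=1),
}

def verify_checksum(data):
    return sum(data) % 256                 # stand-in for real verification

def complete_io(data):
    # Blocks until the checksum job finishes.  With a single shared
    # one-thread pool this wait would deadlock: the only worker would
    # be busy right here, and the checksum job could never run.
    return pools["checksum"].submit(verify_checksum, data).result()

print(pools["io-completion"].submit(complete_io, bytes([1, 2, 3])).result())
# -> 6
```

Resizable pools per type keep this simplicity while still adapting to
load, which is the point made above.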

-chris



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Ksummit-2008-discuss] Delayed interrupt work, thread pools
  2008-07-07 14:09       ` Chris Mason
@ 2008-07-07 23:03         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 25+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-07 23:03 UTC (permalink / raw)
  To: Chris Mason
  Cc: James Bottomley, ksummit-2008-discuss, Linux Kernel list,
	Jeremy Kerr

On Mon, 2008-07-07 at 10:09 -0400, Chris Mason wrote:
> I have a few different users of the thread pools, and I ended up having
> to create a number of pools to avoid deadlocks between different types
> of operations on the same work list.  Ideas like the next activation
> head really sound cool, but the simplicity of just making dedicated
> pools for dedicated tasks is much, much easier to debug.
> 
> If the pools are able to resize themselves sanely, it should perform
> about the same as the fancy stuff ;)

Could be just like workqueues: a "default" common pool and the ability
to create specialized pools with possibly configurable constraints on
size etc.

I'm a bit too busy preparing for the merge window right now, along with
a few other things, so I haven't looked in detail at the existing
implementations yet. I'm in no hurry though, so somebody feel free to
beat me to it; else I'll dig into it later this month or so.

Ben.


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2008-07-07 23:04 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2008-07-01 12:45 Delayed interrupt work, thread pools Benjamin Herrenschmidt
2008-07-01 12:53 ` [Ksummit-2008-discuss] " Matthew Wilcox
2008-07-01 13:38   ` Benjamin Herrenschmidt
2008-07-01 13:02 ` Robin Holt
2008-07-02  1:39   ` Dean Nelson
2008-07-02  2:38     ` Benjamin Herrenschmidt
2008-07-02  2:47       ` Dave Chinner
2008-07-02 14:27       ` [Ksummit-2008-discuss] " Hugh Dickins
2008-07-02  4:22 ` Arjan van de Ven
2008-07-02  5:44   ` Benjamin Herrenschmidt
2008-07-02 11:02     ` Andi Kleen
2008-07-02 11:19       ` Leon Woestenberg
2008-07-02 11:24         ` Andi Kleen
2008-07-02 20:57       ` Benjamin Herrenschmidt
2008-07-02 14:11     ` James Bottomley
2008-07-02 20:00       ` Steven Rostedt
2008-07-02 20:22         ` James Bottomley
2008-07-02 20:28           ` Arjan van de Ven
2008-07-02 20:40           ` Steven Rostedt
2008-07-02 21:02         ` Benjamin Herrenschmidt
2008-07-02 21:00       ` Benjamin Herrenschmidt
2008-07-03 10:12         ` Eric W. Biederman
2008-07-03 10:31           ` Benjamin Herrenschmidt
2008-07-07 14:09       ` Chris Mason
2008-07-07 23:03         ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox