* Re: kvm aio wishlist
       [not found]       ` <492BC5CB.6000609@redhat.com>
@ 2008-11-25 10:19         ` Suparna Bhattacharya
  2008-11-25 10:48           ` Avi Kivity
  0 siblings, 1 reply; 10+ messages in thread
From: Suparna Bhattacharya @ 2008-11-25 10:19 UTC
  To: Avi Kivity
  Cc: Zach Brown, linux-aio, Jeff Moyer, Anthony Liguori, linux-kernel,
	mingo


[cc'ing lkml as well]

On Tue, Nov 25, 2008 at 11:30:51AM +0200, Avi Kivity wrote:
> Zach Brown wrote:
>>> I'm also worried about introducing threads.  With direct I/O, we know
>>> we're going to block.  The easiest thing is to slap the request onto a
>>> queue (blockdev or netdev) and unplug it.
>>>     
>>
>> Is it really that easy?  There's a non-trivial number of places it can
>> block before submitting the IO and making it to the async completion
>> phase.  They show up as latency spikes in real-world loads.
>>
>> DIO is a good example.  Using a kernel thread lets the entire path be
>> async.  We don't have to go in and fold an async state machine under
>> pinning user space pages, performing file system block mapping lookups,
>> allocating block layer requests, on and on.
>>
>>   
>
> Certainly, filesystem backed storage is much harder.  Maybe we can use one 
> of the fork-on-demand proposals to make the block mapping async, then queue 
> the request+pinned pages.
>
>>> IIRC, the idea behind the *lets/*rils was that the calls are usually
>>> nonblocking, so you fork on block, no?  I don't see that here.  Of
>>> course, that's not the case in my wishlist; all requests will block
>>> without exception.
>>>     
>>
>> Yeah.  My thinking is that if someone wants to experiment with syslets
>> it'll be pretty easy for them to add a flag to the submission struct and
>> re-use most of the submission and completion framework.  That's not my
>> priority.  I want posix aio in glibc to work.
>>   
>
> Why not extend io_submit() to use a thread pool when going through a 
> non-aio-ready path?  Yet a new interface, with another round of integrating 
> to the previous interfaces, is not a comforting thought.  I still haven't 
> got used to the fact that aio can work with fd polling.

Even paths that provide fop->aio_read/write can be synchronous underneath
(like non-O_DIRECT filesystem reads/writes), and then there could be multiple
blocking points.

BTW, Ben had implemented a fallback approach that spawned kernel threads
- it was an initial patch and didn't do any thread pooling at that time.

I had a fallback path for pollable fds which did not require thread pools:
http://lwn.net/Articles/216443/
(limited to fds which support non-blocking semantics)
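
A minimal userspace sketch of that retry-on-readiness idea, for illustration
only. It is not the patch linked above; it assumes a non-blocking fd and uses
Python's `selectors` module, and the names `read_when_ready`, `run_pending`
and `demo` are hypothetical. The request is tried once without blocking and,
on EAGAIN, parked until the fd becomes readable, so no thread pool is needed.

```python
import os
import selectors


def read_when_ready(fd, nbytes, sel, on_complete):
    """Try a non-blocking read; if it would block, retry once the fd is readable."""
    try:
        data = os.read(fd, nbytes)
    except BlockingIOError:
        # EAGAIN: nothing to read yet, so park the request on the selector.
        def retry():
            sel.unregister(fd)
            on_complete(os.read(fd, nbytes))

        sel.register(fd, selectors.EVENT_READ, retry)
        return False  # request was queued
    on_complete(data)
    return True  # request completed inline


def run_pending(sel, timeout=1.0):
    """Run the retry callback of every fd that has become ready."""
    for key, _events in sel.select(timeout):
        key.data()


def demo():
    r, w = os.pipe()
    os.set_blocking(r, False)
    sel = selectors.DefaultSelector()
    results = []
    inline = read_when_ready(r, 16, sel, results.append)  # pipe is empty: queued
    os.write(w, b"hello")
    run_pending(sel)
    os.close(r)
    os.close(w)
    sel.close()
    return inline, results
```

This only works for fds that honour non-blocking semantics, which is the
limitation noted above; a regular file would never report EAGAIN.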

OR

Maybe we could use a very simple version of syslets to do an io_submit
in libaio :) 

Does the syslet approach of continuing in a different thread (different
thread id) affect kvm?

Regards
Suparna

>
>>> Actually without preadv/pwritev (and without changes in qemu; that has
>>> its own wishlist) we can't really make good use of this now.
>>>     
>>
>> I could trivially add preadv and pwritev to the patch series.  The vfs
>> paths already support it, it's just that we don't have a syscall entry
>> point which takes the file position from an argument instead of from the
>> file struct behind the fd.
>>
>> Would that make it an interesting experiment for you to work with?
>>   
>
> Not really -- it doesn't add anything (at the moment) that a userspace 
> thread pool doesn't have.
>
> The key here is in the richer interface to the scheduler.  If we can get 
> the async exec thread to stay on the same cpu as the user thread that 
> launched it, and to start executing on the userspace thread's return to 
> userspace, then I guess many of the problems of threads are eliminated.
>
> -- 
> error compiling committee.c: too many arguments to function

* Re: kvm aio wishlist
  2008-11-25 10:19         ` kvm aio wishlist Suparna Bhattacharya
@ 2008-11-25 10:48           ` Avi Kivity
  2008-11-25 14:59             ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2008-11-25 10:48 UTC
  To: suparna
  Cc: Zach Brown, linux-aio, Jeff Moyer, Anthony Liguori, linux-kernel,
	mingo

Suparna Bhattacharya wrote:
>> Why not extend io_submit() to use a thread pool when going through a 
>> non-aio-ready path?  Yet a new interface, with another round of integrating 
>> to the previous interfaces, is not a comforting thought.  I still haven't 
>> got used to the fact that aio can work with fd polling.
>>     
>
> Even paths that provide fop->aio_read/write can be synchronous (like non
> O_DIRECT filesystem read/writes) underneath, and then there could be multiple
> blocking points.
>   

If they are known to be synchronous when execution starts, they could 
just return -ENOSYS and fall back to threads, until someone implements a 
truly async path.
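
A minimal userspace sketch of that fallback, assuming the async path signals
"not implemented" with ENOSYS. This is an analogy in Python, not the kernel's
io_submit() code; `submit`, `no_async_support`, `blocking_read` and the pool
size are hypothetical.

```python
import errno
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)  # hypothetical fallback pool


def submit(request, async_path, sync_path):
    """Use async_path if it exists; on ENOSYS run sync_path on a worker thread."""
    try:
        return async_path(request)  # expected to return a Future-like object
    except OSError as exc:
        if exc.errno != errno.ENOSYS:
            raise  # a real error, not "no async path"
        return _pool.submit(sync_path, request)


def no_async_support(request):
    # Stands in for a path known to be synchronous when execution starts.
    raise OSError(errno.ENOSYS, "no truly async path for this request")


def blocking_read(request):
    # Stands in for the ordinary blocking implementation.
    return "read:" + request
```

For example, `submit("blk0", no_async_support, blocking_read).result()`
returns `"read:blk0"`, computed on a pool thread rather than on the caller.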

> BTW, Ben had implemented a fallback approach that spawned kernel threads
> - it was an initial patch and didn't do any thread pooling at that time.
>
> I had a fallback path for pollable fds which did not require thread pools
> http://lwn.net/Articles/216443/ 
> (limited to fds which support non blocking semantics)
>   

These are good solutions for the complex-blocking and never-blocking cases.

> OR
>
> Maybe we could use a very simple version of syslets to do an io_submit
> in libaio :) 
>
> Does the syslet approach of continuing in a different thread (different
> thread id) affect kvm ?
>   

Yes, we like to pthread_kill() threads from time to time, and even 
expose the thread IDs to management tools so they can control pinning.
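
To illustrate why a changed thread ID matters, here is a small userspace
sketch (Python 3.8+, not kernel code). It only shows that work continued on a
different thread observes a different kernel thread ID, which is what a
pthread_kill() caller or a management tool pinning by tid would trip over.
The helper name `tid_seen_by_continuation` is hypothetical.

```python
import threading


def tid_seen_by_continuation():
    """Return (submitter_tid, continuation_tid) when work resumes on another thread."""
    seen = []
    worker = threading.Thread(
        target=lambda: seen.append(threading.get_native_id())
    )
    worker.start()
    worker.join()
    return threading.get_native_id(), seen[0]
```

The two values differ, so anything that recorded the submitter's tid would
target the wrong thread once execution continues elsewhere.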

Perhaps a variant of syslet, that is kernel-only, and does:

- always allocate a new kernel stack at io_submit() time, but not a new 
thread
- start executing the rarely-blocking path of the request (like block 
mapping and get_user_pages_fast) on the new stack
- if we block here, clone a new thread and graft the stack onto it
- start the always-blocking portion of the call (enqueuing a bio)
- exit the new thread if we hit the slowpath, or deallocate the stack and 
longjmp back to the main stack if we did not
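
The control flow of the steps above can be sketched in userspace as
"run the fast path on the caller, move to a thread only if it would block".
This is a loose analogy in Python: it cannot graft a stack onto a new thread,
so the slow path simply reruns the preparation on a worker. `WouldBlock`,
`submit` and the `may_block` parameter are hypothetical names.

```python
import threading


class WouldBlock(Exception):
    """Raised by the fast path when it cannot finish without sleeping."""


def submit(prepare, enqueue, completions):
    """Run prepare() on the caller's thread; spawn a thread only if it would block."""

    def slow_path():
        # Slow path: redo the preparation with blocking allowed, off the caller.
        completions.append(("thread", enqueue(prepare(may_block=True))))

    try:
        prepared = prepare(may_block=False)  # e.g. block mapping, page pinning
    except WouldBlock:
        worker = threading.Thread(target=slow_path)
        worker.start()
        return worker  # caller continues; the worker completes the request
    completions.append(("inline", enqueue(prepared)))
    return None


def cached_mapping(may_block):
    return "mapped"  # fast path never needs to sleep


def cold_mapping(may_block):
    if not may_block:
        raise WouldBlock
    return "mapped-after-wait"


def enqueue(prepared):
    return prepared + ":queued"
```

With `cached_mapping` the request completes on the caller and no thread is
created; with `cold_mapping` the caller gets a worker back and can keep going.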

This does not expose any new semantics to userspace.  It does twist the 
guts of the kernel in that we have to duplicate thread_info, but if 
thread_info is only accessed from current, I think that is manageable.

(I think I just described fibrils, no?  I think that was a good idea.  
Why can't we go back to it?)

-- 
error compiling committee.c: too many arguments to function


* Re: kvm aio wishlist
  2008-11-25 10:48           ` Avi Kivity
@ 2008-11-25 14:59             ` Ingo Molnar
  2008-11-25 15:10               ` Jens Axboe
  2008-11-25 16:51               ` Avi Kivity
  0 siblings, 2 replies; 10+ messages in thread
From: Ingo Molnar @ 2008-11-25 14:59 UTC
  To: Avi Kivity
  Cc: suparna, Zach Brown, linux-aio, Jeff Moyer, Anthony Liguori,
	linux-kernel, Peter Zijlstra, Thomas Gleixner


* Avi Kivity <avi@redhat.com> wrote:

> Perhaps a variant of syslet, that is kernel-only, and does:
>
> - always allocate a new kernel stack at io_submit() time, but not a 
>   new thread

such an N:M threading design is a loss - sooner or later we arrive at a 
point where people actually start using it and then we want to 
load-balance and schedule these entities.

So i'd suggest the kthread based async engine i wrote for syslets. It 
worked well and for kernel-only entities it schedules super-fast - it 
can do up to 20 million events per second on a 16-way box i'm testing 
on. The objections about syslets were not related to the scheduling of 
it but were mostly about the userspace API/ABI: you dont have to use 
that.

	Ingo

* Re: kvm aio wishlist
  2008-11-25 14:59             ` Ingo Molnar
@ 2008-11-25 15:10               ` Jens Axboe
  2008-11-25 15:25                 ` Zach Brown
  2008-11-25 16:51               ` Avi Kivity
  1 sibling, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2008-11-25 15:10 UTC
  To: Ingo Molnar
  Cc: Avi Kivity, suparna, Zach Brown, linux-aio, Jeff Moyer,
	Anthony Liguori, linux-kernel, Peter Zijlstra, Thomas Gleixner

On Tue, Nov 25 2008, Ingo Molnar wrote:
> 
> * Avi Kivity <avi@redhat.com> wrote:
> 
> > Perhaps a variant of syslet, that is kernel-only, and does:
> >
> > - always allocate a new kernel stack at io_submit() time, but not a 
> >   new thread
> 
> such an N:M threading design is a loss - sooner or later we arrive at a 
> point where people actually start using it and then we want to 
> load-balance and schedule these entities.
> 
> So i'd suggest the kthread based async engine i wrote for syslets. It 
> worked well and for kernel-only entities it schedules super-fast - it 
> can do up to 20 million events per second on a 16-way box i'm testing 
> on. The objections about syslets were not related to the scheduling of 
> it but were mostly about the userspace API/ABI: you dont have to use 
> that.

Still unsure why that stuff never got anywhere. Do you have a pointer to
the latest posting?

-- 
Jens Axboe


* Re: kvm aio wishlist
  2008-11-25 15:10               ` Jens Axboe
@ 2008-11-25 15:25                 ` Zach Brown
  2008-11-25 15:57                   ` Ingo Molnar
  2008-11-25 16:55                   ` Avi Kivity
  0 siblings, 2 replies; 10+ messages in thread
From: Zach Brown @ 2008-11-25 15:25 UTC
  To: Jens Axboe
  Cc: Ingo Molnar, Avi Kivity, suparna, linux-aio, Jeff Moyer,
	Anthony Liguori, linux-kernel, Peter Zijlstra, Thomas Gleixner


> Still unsure why that stuff never got anywhere.

Changing the tid of submitting tasks makes it unsuitable for sys_io_*()
or posix aio users as it stands.  Maybe we could swap tids on the
switch, but we'd probably then have to audit the life time of tid ->
task_struct users in the kernel.

And there's still the question of what ptrace is supposed to do.

- z

* Re: kvm aio wishlist
  2008-11-25 15:25                 ` Zach Brown
@ 2008-11-25 15:57                   ` Ingo Molnar
  2008-11-25 16:55                   ` Avi Kivity
  1 sibling, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2008-11-25 15:57 UTC
  To: Zach Brown
  Cc: Jens Axboe, Avi Kivity, suparna, linux-aio, Jeff Moyer,
	Anthony Liguori, linux-kernel, Peter Zijlstra, Thomas Gleixner


* Zach Brown <zach.brown@oracle.com> wrote:

> > Still unsure why that stuff never got anywhere.
> 
> Changing the tid of submitting tasks makes it unsuitable for 
> sys_io_*() or posix aio users as it stands.  Maybe we could swap 
> tids on the switch, but we'd probably then have to audit the life 
> time of tid -> task_struct users in the kernel.

doesnt look like a big thing affecting the fastpath materially.

> And there's still the question of what ptrace is supposed to do.

debug-only, we sure can work something out.

	Ingo

* Re: kvm aio wishlist
  2008-11-25 14:59             ` Ingo Molnar
  2008-11-25 15:10               ` Jens Axboe
@ 2008-11-25 16:51               ` Avi Kivity
  2008-11-25 16:56                 ` Ingo Molnar
  1 sibling, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2008-11-25 16:51 UTC
  To: Ingo Molnar
  Cc: suparna, Zach Brown, linux-aio, Jeff Moyer, Anthony Liguori,
	linux-kernel, Peter Zijlstra, Thomas Gleixner

Ingo Molnar wrote:
>   
>> Perhaps a variant of syslet, that is kernel-only, and does:
>>
>> - always allocate a new kernel stack at io_submit() time, but not a 
>>   new thread
>>     
>
> such an N:M threading design is a loss - sooner or later we arrive at a 
> point where people actually start using it and then we want to 
> load-balance and schedule these entities.
>   

It's only N:M as long as it's nonblocking.  If it blocks it becomes 1:1 
again.  If it doesn't, it's probably faster to do things on the same 
cache as the caller.

> So i'd suggest the kthread based async engine i wrote for syslets. It 
> worked well and for kernel-only entities it schedules super-fast - it 
> can do up to 20 million events per second on a 16-way box i'm testing 
> on. The objections about syslets were not related to the scheduling of 
> it but were mostly about the userspace API/ABI: you dont have to use 
> that.

I'd love to have something :)

I guess any cache and latency considerations could be fixed if
- we schedule a syslet for the first time when the thread that launched 
it exits to userspace
- we queue it on the current cpu's runqueue

In that case, for the nonblocking case syslets and fibrils would have 
very similar performance.

-- 
error compiling committee.c: too many arguments to function


* Re: kvm aio wishlist
  2008-11-25 15:25                 ` Zach Brown
  2008-11-25 15:57                   ` Ingo Molnar
@ 2008-11-25 16:55                   ` Avi Kivity
  2008-11-25 16:57                     ` Ingo Molnar
  1 sibling, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2008-11-25 16:55 UTC
  To: Zach Brown
  Cc: Jens Axboe, Ingo Molnar, suparna, linux-aio, Jeff Moyer,
	Anthony Liguori, linux-kernel, Peter Zijlstra, Thomas Gleixner

Zach Brown wrote:
> And there's still the question of what ptrace is supposed to do.
>   

If it's kernel-only (which I think is a good start for something like 
this), then is ptrace relevant at all?

-- 
error compiling committee.c: too many arguments to function


* Re: kvm aio wishlist
  2008-11-25 16:51               ` Avi Kivity
@ 2008-11-25 16:56                 ` Ingo Molnar
  0 siblings, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2008-11-25 16:56 UTC
  To: Avi Kivity
  Cc: suparna, Zach Brown, linux-aio, Jeff Moyer, Anthony Liguori,
	linux-kernel, Peter Zijlstra, Thomas Gleixner


* Avi Kivity <avi@redhat.com> wrote:

> Ingo Molnar wrote:
>>   
>>> Perhaps a variant of syslet, that is kernel-only, and does:
>>>
>>> - always allocate a new kernel stack at io_submit() time, but not a   
>>> new thread
>>>     
>>
>> such an N:M threading design is a loss - sooner or later we arrive at a  
>> point where people actually start using it and then we want to  
>> load-balance and schedule these entities.
>>   
>
> It's only N:M as long as it's nonblocking.  If it blocks it becomes 1:1  
> again.  If it doesn't, it's probably faster to do things on the same  
> cache as the caller.
>
>> So i'd suggest the kthread based async engine i wrote for syslets. It  
>> worked well and for kernel-only entities it schedules super-fast - it  
>> can do up to 20 million events per second on a 16-way box i'm testing  
>> on. The objections about syslets were not related to the scheduling of  
>> it but were mostly about the userspace API/ABI: you dont have to use  
>> that.
>
> I'd love to have something :)
>
> I guess any cache and latency considerations could be fixed if
> - we schedule a syslet for the first time when the thread that launched  
> it exits to userspace
> - we queue it on the current cpu's runqueue
>
> In that case, for the nonblocking case syslets and fibrils would 
> have very similar performance.

yes. Hence given that fibrils have various tradeoffs, we should do 
the syslet thread pool. The code is there and it works :)

	Ingo

* Re: kvm aio wishlist
  2008-11-25 16:55                   ` Avi Kivity
@ 2008-11-25 16:57                     ` Ingo Molnar
  0 siblings, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2008-11-25 16:57 UTC
  To: Avi Kivity
  Cc: Zach Brown, Jens Axboe, suparna, linux-aio, Jeff Moyer,
	Anthony Liguori, linux-kernel, Peter Zijlstra, Thomas Gleixner


* Avi Kivity <avi@redhat.com> wrote:

> Zach Brown wrote:
>> And there's still the question of what ptrace is supposed to do.
>
> If it's kernel-only (which I think is a good start for something 
> like this), then is ptrace relevant at all?

it's relevant wrt. details: to make sure that it's all transparent and 
the ptrace engine is not confused by thread switching tricks.

	Ingo

Thread overview: 10+ messages
     [not found] <492B0CDD.7080000@redhat.com>
     [not found] ` <492B2348.9090008@oracle.com>
     [not found]   ` <492B2976.3010209@redhat.com>
     [not found]     ` <492B3912.3030707@oracle.com>
     [not found]       ` <492BC5CB.6000609@redhat.com>
2008-11-25 10:19         ` kvm aio wishlist Suparna Bhattacharya
2008-11-25 10:48           ` Avi Kivity
2008-11-25 14:59             ` Ingo Molnar
2008-11-25 15:10               ` Jens Axboe
2008-11-25 15:25                 ` Zach Brown
2008-11-25 15:57                   ` Ingo Molnar
2008-11-25 16:55                   ` Avi Kivity
2008-11-25 16:57                     ` Ingo Molnar
2008-11-25 16:51               ` Avi Kivity
2008-11-25 16:56                 ` Ingo Molnar
