* Re: kvm aio wishlist
       [not found] ` <492BC5CB.6000609@redhat.com>
@ 2008-11-25 10:19   ` Suparna Bhattacharya
  2008-11-25 10:48     ` Avi Kivity
  0 siblings, 1 reply; 10+ messages in thread
From: Suparna Bhattacharya @ 2008-11-25 10:19 UTC
To: Avi Kivity
Cc: Zach Brown, linux-aio, Jeff Moyer, Anthony Liguori, linux-kernel, mingo

[cc'ing lkml as well ]

On Tue, Nov 25, 2008 at 11:30:51AM +0200, Avi Kivity wrote:
> Zach Brown wrote:
>>> I'm also worried about introducing threads. With direct I/O, we know
>>> we're going to block. The easiest thing is to slap the request onto a
>>> queue (blockdev or netdev) and unplug it.
>>>
>>
>> Is it really that easy? There's a non-trivial number of places it can
>> block before submitting the IO and making it to the async completion
>> phase. They show up as latency spikes in real-world loads.
>>
>> DIO is a good example. Using a kernel thread lets the entire path be
>> async. We don't have to go in and fold an async state machine under
>> pinning user space pages, performing file system block mapping lookups,
>> allocating block layer requests, on and on.
>>
>>
>
> Certainly, filesystem backed storage is much harder. Maybe we can use one
> of the fork-on-demand proposals to make the block mapping async, then queue
> the request+pinned pages.
>
>>> IIRC, the idea behind the *lets/*rils was that the calls are usually
>>> nonblocking, so you fork on block, no? I don't see that here. Of
>>> course, that's not the case in my wishlist; all requests will block
>>> without exception.
>>>
>>
>> Yeah. My thinking is that if someone wants to experiment with syslets
>> it'll be pretty easy for them to add a flag to the submission struct and
>> re-use most of the submission and completion framework. That's not my
>> priority. I want posix aio in glibc to work.
>>
>
> Why not extend io_submit() to use a thread pool when going through a
> non-aio-ready path? Yet a new interface, with another round of integrating
> to the previous interfaces, is not a comforting thought. I still haven't
> got used to the fact that aio can work with fd polling.

Even paths that provide fop->aio_read/write can be synchronous (like non
O_DIRECT filesystem read/writes) underneath, and then there could be multiple
blocking points.

BTW, Ben had implemented a fallback approach that spawned kernel threads
- it was an initial patch and didn't do any thread pooling at that time.

I had a fallback path for pollable fds which did not require thread pools
http://lwn.net/Articles/216443/
(limited to fds which support non blocking semantics)

OR

Maybe we could use a very simple version of syslets to do an io_submit
in libaio :)

Does the syslet approach of continuing in a different thread (different
thread id) affect kvm ?

Regards
Suparna

>
>>> Actually without preadv/pwritev (and without changes in qemu; that has
>>> its own wishlist) we can't really make good use of this now.
>>>
>>
>> I could trivially add preadv and pwritev to the patch series. The vfs
>> paths already support it, it's just that we don't have a syscall entry
>> point which takes the file position from an argument instead of from the
>> file struct behind the fd.
>>
>> Would that make it an interesting experiment for you to work with?
>>
>
> Not really -- it doesn't add anything (at the moment) that a userspace
> thread pool doesn't have.
>
> The key here is in the richer interface to the scheduler. If we can get
> the async exec thread to stay on the same cpu as the user thread that
> launched it, and to start executing on the userspace thread's return to
> userspace, then I guess many of the problems of threads are eliminated.
>
> --
> error compiling committee.c: too many arguments to function
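The quoted exchange above turns on two things: a read call that takes the
file position as an argument (preadv), and the userspace thread pool that
Avi says already covers the same ground. The following is a minimal sketch
of that combination, written in Python for brevity and assuming a system
that provides preadv(2). The helper name `submit_preadv` and its argument
layout are illustrative assumptions; nothing here is taken from qemu or
from the patch series under discussion.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def submit_preadv(pool, fd, sizes, offset):
    """Queue a positional vectored read on `pool` (hypothetical helper).

    Returns a Future whose result is (bytes_read, buffers).
    """
    def work():
        # One buffer per iovec-style segment.
        bufs = [bytearray(n) for n in sizes]
        # The offset is passed as an argument, so the file position stored
        # behind the fd is neither consulted nor updated. The call may
        # block, but it blocks the pool worker, not the submitter.
        nread = os.preadv(fd, bufs, offset)
        return nread, bufs

    return pool.submit(work)
```

A caller would create a `ThreadPoolExecutor`, call
`submit_preadv(pool, fd, [4096, 4096], 0)` and collect the result later
with `.result()`. Because no request touches the shared file position,
several requests against the same fd can be in flight at once.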
* Re: kvm aio wishlist
  2008-11-25 10:19 ` kvm aio wishlist Suparna Bhattacharya
@ 2008-11-25 10:48   ` Avi Kivity
  2008-11-25 14:59     ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2008-11-25 10:48 UTC
To: suparna
Cc: Zach Brown, linux-aio, Jeff Moyer, Anthony Liguori, linux-kernel, mingo

Suparna Bhattacharya wrote:
>> Why not extend io_submit() to use a thread pool when going through a
>> non-aio-ready path? Yet a new interface, with another round of integrating
>> to the previous interfaces, is not a comforting thought. I still haven't
>> got used to the fact that aio can work with fd polling.
>>
>
> Even paths that provide fop->aio_read/write can be synchronous (like non
> O_DIRECT filesystem read/writes) underneath, and then there could be multiple
> blocking points.
>

If they are known to be synchronous when execution starts, they could
just return -ENOSYS and fall back to threads, until someone implements a
truly async path.

> BTW, Ben had implemented a fallback approach that spawned kernel threads
> - it was an initial patch and didn't do any thread pooling at that time.
>
> I had a fallback path for pollable fds which did not require thread pools
> http://lwn.net/Articles/216443/
> (limited to fds which support non blocking semantics)
>

These are good solutions for the complex-blocking and never blocking
cases.

> OR
>
> Maybe we could use a very simple version of syslets to do an io_submit
> in libaio :)
>
> Does the syslet approach of continuing in a different thread (different
> thread id) affect kvm ?
>

Yes, we like to pthread_kill() threads from time to time, and even
expose the thread IDs to management tools so they can control pinning.

Perhaps a variant of syslet, that is kernel-only, and does:

- always allocate a new kernel stack at io_submit() time, but not a new
  thread
- start executing the rarely-blocking path of the request (like block
  mapping and get_user_pages_fast) on the new stack
- if we block here, clone a new thread and graft the stack onto it
- start the always-blocking portion of the call (enqueuing a bio)
- exit the new thread if we hit the slowpath, or deallocate the stack
  and longjmp back to the main stack if we did not

This does not expose any new semantics to userspace. It does twist the
guts of the kernel in that we have to duplicate thread_info, but if
thread_info is only accessed from current, I think that is manageable.

(I think I just described fibrils, no? I think that was a good idea.
Why can't we go back to it?)

--
error compiling committee.c: too many arguments to function
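The control flow of the steps listed above can be modelled in userspace.
The following is only a rough Python sketch of the "new thread only if we
block" idea, not kernel code. `WouldBlock`, `submit`, and the
`may_block` parameter are hypothetical names introduced for illustration,
and the model restarts the first stage on the helper thread instead of
grafting a half-used stack onto it, which is the part the proposal
actually depends on.

```python
import threading


class WouldBlock(Exception):
    """Signals that a stage cannot finish without sleeping (hypothetical)."""


def submit(fast_stage, slow_stage, on_complete):
    """Run fast_stage on the caller; move to a new thread only if it blocks.

    Returns the helper thread if the slow path was taken, else None.
    """
    try:
        # Rarely-blocking part (block mapping, page pinning in the proposal).
        prepared = fast_stage(may_block=False)
    except WouldBlock:
        def slow_path():
            # Simplification: rerun the stage with blocking allowed rather
            # than continuing from the point where it would have slept.
            on_complete(slow_stage(fast_stage(may_block=True)))

        helper = threading.Thread(target=slow_path)
        helper.start()
        return helper
    # Fast path: everything ran on the submitter, no thread was created.
    on_complete(slow_stage(prepared))
    return None
```

In this model the common case costs no thread creation and the request
runs entirely on the submitting thread, which is the property the message
argues for. Only requests that hit `WouldBlock` pay for a new thread.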
* Re: kvm aio wishlist
  2008-11-25 10:48 ` Avi Kivity
@ 2008-11-25 14:59   ` Ingo Molnar
  2008-11-25 15:10     ` Jens Axboe
  2008-11-25 16:51     ` Avi Kivity
  0 siblings, 2 replies; 10+ messages in thread
From: Ingo Molnar @ 2008-11-25 14:59 UTC
To: Avi Kivity
Cc: suparna, Zach Brown, linux-aio, Jeff Moyer, Anthony Liguori,
    linux-kernel, Peter Zijlstra, Thomas Gleixner

* Avi Kivity <avi@redhat.com> wrote:

> Perhaps a variant of syslet, that is kernel-only, and does:
>
> - always allocate a new kernel stack at io_submit() time, but not a
>   new thread

such a N:M threading design is a loss - sooner or later we arrive to a
point where people actually start using it and then we want to
load-balance and schedule these entities.

So i'd suggest the kthread based async engine i wrote for syslets. It
worked well and for kernel-only entities it schedules super-fast - it
can do up to 20 million events per second on a 16-way box i'm testing
on. The objections about syslets were not related to the scheduling of
it but were mostly about the userspace API/ABI: you don't have to use
that.

	Ingo
* Re: kvm aio wishlist
  2008-11-25 14:59 ` Ingo Molnar
@ 2008-11-25 15:10   ` Jens Axboe
  2008-11-25 15:25     ` Zach Brown
  2008-11-25 16:51   ` Avi Kivity
  1 sibling, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2008-11-25 15:10 UTC
To: Ingo Molnar
Cc: Avi Kivity, suparna, Zach Brown, linux-aio, Jeff Moyer,
    Anthony Liguori, linux-kernel, Peter Zijlstra, Thomas Gleixner

On Tue, Nov 25 2008, Ingo Molnar wrote:
>
> * Avi Kivity <avi@redhat.com> wrote:
>
> > Perhaps a variant of syslet, that is kernel-only, and does:
> >
> > - always allocate a new kernel stack at io_submit() time, but not a
> >   new thread
>
> such a N:M threading design is a loss - sooner or later we arrive to a
> point where people actually start using it and then we want to
> load-balance and schedule these entities.
>
> So i'd suggest the kthread based async engine i wrote for syslets. It
> worked well and for kernel-only entities it schedules super-fast - it
> can do up to 20 million events per second on a 16-way box i'm testing
> on. The objections about syslets were not related to the scheduling of
> it but were mostly about the userspace API/ABI: you don't have to use
> that.

Still unsure why that stuff never got anywhere. Do you have a pointer to
the latest posting?

--
Jens Axboe
* Re: kvm aio wishlist
  2008-11-25 15:10 ` Jens Axboe
@ 2008-11-25 15:25   ` Zach Brown
  2008-11-25 15:57     ` Ingo Molnar
  2008-11-25 16:55     ` Avi Kivity
  0 siblings, 2 replies; 10+ messages in thread
From: Zach Brown @ 2008-11-25 15:25 UTC
To: Jens Axboe
Cc: Ingo Molnar, Avi Kivity, suparna, linux-aio, Jeff Moyer,
    Anthony Liguori, linux-kernel, Peter Zijlstra, Thomas Gleixner

> Still unsure why that stuff never got anywhere.

Changing the tid of submitting tasks makes it unsuitable for sys_io_*()
or posix aio users as it stands. Maybe we could swap tids on the switch,
but we'd probably then have to audit the life time of tid -> task_struct
users in the kernel.

And there's still the question of what ptrace is supposed to do.

- z
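The tid concern above is visible from userspace: code that continues on
another thread observes a different kernel thread id, which is what
callers relying on pthread_kill() or on pinning by thread ID would notice.
A small sketch, assuming Linux and Python 3.8 or later for
`threading.get_native_id()`; the function name `tid_seen_by_worker` is
made up for this illustration.

```python
import threading


def tid_seen_by_worker():
    """Return (submitter_tid, worker_tid) for work handed to another thread."""
    seen = {}

    def worker():
        # Kernel-assigned thread id of whichever thread runs this code.
        seen["tid"] = threading.get_native_id()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return threading.get_native_id(), seen["tid"]
```

The two values differ, so anything that recorded the submitter's tid
before the hand-off would no longer be addressing the thread that is
running the request.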
* Re: kvm aio wishlist
  2008-11-25 15:25 ` Zach Brown
@ 2008-11-25 15:57   ` Ingo Molnar
  2008-11-25 16:55   ` Avi Kivity
  1 sibling, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2008-11-25 15:57 UTC
To: Zach Brown
Cc: Jens Axboe, Avi Kivity, suparna, linux-aio, Jeff Moyer,
    Anthony Liguori, linux-kernel, Peter Zijlstra, Thomas Gleixner

* Zach Brown <zach.brown@oracle.com> wrote:

> > Still unsure why that stuff never got anywhere.
>
> Changing the tid of submitting tasks makes it unsuitable for
> sys_io_*() or posix aio users as it stands. Maybe we could swap
> tids on the switch, but we'd probably then have to audit the life
> time of tid -> task_struct users in the kernel.

doesn't look like a big thing affecting the fastpath materially.

> And there's still the question of what ptrace is supposed to do.

debug-only, we sure can work something out.

	Ingo
* Re: kvm aio wishlist
  2008-11-25 15:25 ` Zach Brown
  2008-11-25 15:57   ` Ingo Molnar
@ 2008-11-25 16:55   ` Avi Kivity
  2008-11-25 16:57     ` Ingo Molnar
  1 sibling, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2008-11-25 16:55 UTC
To: Zach Brown
Cc: Jens Axboe, Ingo Molnar, suparna, linux-aio, Jeff Moyer,
    Anthony Liguori, linux-kernel, Peter Zijlstra, Thomas Gleixner

Zach Brown wrote:
> And there's still the question of what ptrace is supposed to do.
>

If it's kernel-only (which I think is a good start for something like
this), then is ptrace relevant at all?

--
error compiling committee.c: too many arguments to function
* Re: kvm aio wishlist
  2008-11-25 16:55 ` Avi Kivity
@ 2008-11-25 16:57   ` Ingo Molnar
  0 siblings, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2008-11-25 16:57 UTC
To: Avi Kivity
Cc: Zach Brown, Jens Axboe, suparna, linux-aio, Jeff Moyer,
    Anthony Liguori, linux-kernel, Peter Zijlstra, Thomas Gleixner

* Avi Kivity <avi@redhat.com> wrote:

> Zach Brown wrote:
>> And there's still the question of what ptrace is supposed to do.
>
> If it's kernel-only (which I think is a good start for something
> like this), then is ptrace relevant at all?

it's relevant wrt. details: to make sure that it's all transparent and
the ptrace engine is not confused by thread switching tricks.

	Ingo
* Re: kvm aio wishlist
  2008-11-25 14:59 ` Ingo Molnar
  2008-11-25 15:10   ` Jens Axboe
@ 2008-11-25 16:51   ` Avi Kivity
  2008-11-25 16:56     ` Ingo Molnar
  1 sibling, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2008-11-25 16:51 UTC
To: Ingo Molnar
Cc: suparna, Zach Brown, linux-aio, Jeff Moyer, Anthony Liguori,
    linux-kernel, Peter Zijlstra, Thomas Gleixner

Ingo Molnar wrote:
>
>> Perhaps a variant of syslet, that is kernel-only, and does:
>>
>> - always allocate a new kernel stack at io_submit() time, but not a
>>   new thread
>>
>
> such a N:M threading design is a loss - sooner or later we arrive to a
> point where people actually start using it and then we want to
> load-balance and schedule these entities.
>

It's only N:M as long as it's nonblocking. If it blocks it becomes 1:1
again. If it doesn't, it's probably faster to do things on the same
cache as the caller.

> So i'd suggest the kthread based async engine i wrote for syslets. It
> worked well and for kernel-only entities it schedules super-fast - it
> can do up to 20 million events per second on a 16-way box i'm testing
> on. The objections about syslets were not related to the scheduling of
> it but were mostly about the userspace API/ABI: you don't have to use
> that.

I'd love to have something :)

I guess any cache and latency considerations could be fixed if
- we schedule a syslet for the first time when the thread that launched
  it exits to userspace
- we queue it on the current cpu's runqueue

In that case, for the nonblocking case syslets and fibrils would have
very similar performance.

--
error compiling committee.c: too many arguments to function
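The "same cpu as the caller" idea above is a scheduler-side change, but a
userspace thread pool can approximate it with CPU affinity. The following
Python sketch is Linux-only and makes one loud simplification: it does not
ask which CPU the submitter is currently running on, and instead takes the
lowest CPU in the submitter's affinity mask as a stand-in. The function
name `run_pinned_like_submitter` is hypothetical.

```python
import os
import threading


def run_pinned_like_submitter(work):
    """Run work() in a helper thread pinned to one CPU the submitter may use.

    Returns (cpu, result). Relies on os.sched_setaffinity (Linux).
    """
    # Stand-in for "the current cpu": lowest CPU in our affinity mask.
    cpu = min(os.sched_getaffinity(0))
    out = {}

    def helper():
        # pid 0 refers to the calling thread here, so only the helper is
        # pinned; the submitter keeps its original affinity mask.
        os.sched_setaffinity(0, {cpu})
        out["result"] = work()

    t = threading.Thread(target=helper)
    t.start()
    t.join()
    return cpu, out["result"]
```

This only constrains where the helper may run. It does not give the other
half of the wish, namely starting the helper exactly when the submitter
returns to userspace, which needs the richer scheduler interface discussed
in the thread.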
* Re: kvm aio wishlist
  2008-11-25 16:51 ` Avi Kivity
@ 2008-11-25 16:56   ` Ingo Molnar
  0 siblings, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2008-11-25 16:56 UTC
To: Avi Kivity
Cc: suparna, Zach Brown, linux-aio, Jeff Moyer, Anthony Liguori,
    linux-kernel, Peter Zijlstra, Thomas Gleixner

* Avi Kivity <avi@redhat.com> wrote:

> Ingo Molnar wrote:
>>
>>> Perhaps a variant of syslet, that is kernel-only, and does:
>>>
>>> - always allocate a new kernel stack at io_submit() time, but not a
>>>   new thread
>>>
>>
>> such a N:M threading design is a loss - sooner or later we arrive to a
>> point where people actually start using it and then we want to
>> load-balance and schedule these entities.
>>
>
> It's only N:M as long as it's nonblocking. If it blocks it becomes 1:1
> again. If it doesn't, it's probably faster to do things on the same
> cache as the caller.
>
>> So i'd suggest the kthread based async engine i wrote for syslets. It
>> worked well and for kernel-only entities it schedules super-fast - it
>> can do up to 20 million events per second on a 16-way box i'm testing
>> on. The objections about syslets were not related to the scheduling of
>> it but were mostly about the userspace API/ABI: you don't have to use
>> that.
>
> I'd love to have something :)
>
> I guess any cache and latency considerations could be fixed if
> - we schedule a syslet for the first time when the thread that launched
>   it exits to userspace
> - we queue it on the current cpu's runqueue
>
> In that case, for the nonblocking case syslets and fibrils would
> have very similar performance.

yes. Hence given that fibrils have various tradeoffs, we should do the
syslet thread pool. The code is there and it works :)

	Ingo
Thread overview: 10+ messages
     [not found] <492B0CDD.7080000@redhat.com>
     [not found] ` <492B2348.9090008@oracle.com>
     [not found]   ` <492B2976.3010209@redhat.com>
     [not found]     ` <492B3912.3030707@oracle.com>
     [not found]       ` <492BC5CB.6000609@redhat.com>
2008-11-25 10:19         ` kvm aio wishlist Suparna Bhattacharya
2008-11-25 10:48           ` Avi Kivity
2008-11-25 14:59             ` Ingo Molnar
2008-11-25 15:10               ` Jens Axboe
2008-11-25 15:25                 ` Zach Brown
2008-11-25 15:57                   ` Ingo Molnar
2008-11-25 16:55                   ` Avi Kivity
2008-11-25 16:57                     ` Ingo Molnar
2008-11-25 16:51               ` Avi Kivity
2008-11-25 16:56                 ` Ingo Molnar