* Re: [Qemu-devel] Combining synchronous and asynchronous IO
From: Sergio Lopez @ 2019-04-11 11:41 UTC
To: Kevin Wolf
Cc: Stefan Hajnoczi, qemu-devel, qemu-block, mreitz@redhat.com, fam@euphon.net, Paolo Bonzini, jusual

Kevin Wolf <kwolf@redhat.com> writes:

> On 15.03.2019 at 16:33, Sergio Lopez wrote:
>>
>> Stefan Hajnoczi writes:
>>
>> > On Thu, Mar 14, 2019 at 06:31:34PM +0100, Sergio Lopez wrote:
>> >> Our current AIO path does a great job at unloading the work from the VM,
>> >> and combined with IOThreads provides a good performance in most
>> >> scenarios. But it also comes with its costs, in both a longer execution
>> >> path and the need of the intervention of the scheduler at various
>> >> points.
>> >>
>> >> There's one particular workload that suffers from this cost, and that's
>> >> when you have just 1 or 2 cores on the Guest issuing synchronous
>> >> requests. This happens to be a pretty common workload for some DBs and,
>> >> in a general sense, on small VMs.
>> >>
>> >> I did a quick'n'dirty implementation on top of virtio-blk to get some
>> >> numbers. This comes from a VM with 4 CPUs running on an idle server,
>> >> with a secondary virtio-blk disk backed by a null_blk device with a
>> >> simulated latency of 30us.
>> >
>> > Can you describe the implementation in more detail? Does "synchronous"
>> > mean that hw/block/virtio_blk.c makes a blocking preadv()/pwritev() call
>> > instead of calling blk_aio_preadv/pwritev()? If so, then you are also
>> > bypassing the QEMU block layer (coroutines, request tracking, etc) and
>> > that might explain some of the latency.
>>
>> The first implementation, the one I've used for getting these numbers,
>> it's just preadv/pwrite from virtio_blk.c, as you correctly guessed. I
>> know it's unfair, but I wanted to take a look at the best possible
>> scenario, and then measure the cost of the other layers.
>>
>> I'm working now on writing non-coroutine counterparts for
>> blk_co_[preadv|pwrite], so we have SIO without bypassing the block layer.
>
> Maybe try to keep the change local to file-posix.c? I think you would
> only have to modify raw_thread_pool_submit() so that it doesn't go
> through the thread pool, but just calls func directly.
>
> I don't think avoiding coroutines is possible without bypassing the block
> layer altogether because everything is really expecting to be run in
> coroutine context.

It turns out that what I initially thought was a cost induced by the AIO
nature of our block layer is actually a bug in which polling mode works
against aio=threads, delaying the execution of the request completions.

This has been fixed by Paolo's "aio-posix: ensure poll mode is left when
aio_notify is called":

https://lists.gnu.org/archive/html/qemu-devel/2019-04/msg01426.html

So we can throw away the idea of combining synchronous and asynchronous
requests, as it doesn't provide a significant improvement that would
justify the added complexity.

Thanks,
Sergio.
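For illustration, Kevin's suggestion above (calling func directly instead of going through the thread pool) can be modelled outside QEMU roughly as follows. This is a minimal sketch, not QEMU code: io_func, struct job, submit_request and fake_io are hypothetical names invented here, and a freshly created POSIX thread stands in for the thread pool. It does not reproduce raw_thread_pool_submit() or its coroutine context.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical worker signature: does the blocking I/O, returns a status. */
typedef int (*io_func)(void *arg);

/* Hypothetical job record handed to the worker thread. */
struct job {
    io_func func;
    void *arg;
    int ret;
};

/* Thread entry point: run the job and store its result. */
static void *worker_entry(void *opaque)
{
    struct job *j = opaque;
    j->ret = j->func(j->arg);
    return NULL;
}

/* Offloaded path: another thread runs func, the caller waits for it.
 * This costs a thread handoff and scheduler involvement per request. */
static int submit_threaded(io_func func, void *arg)
{
    struct job j = { func, arg, 0 };
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker_entry, &j);
    assert(rc == 0);
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    return j.ret;
}

/* Direct path: the caller runs func itself and blocks until it returns. */
static int submit_inline(io_func func, void *arg)
{
    return func(arg);
}

/* Single switch point: callers do not change, only the dispatch does. */
static int submit_request(io_func func, void *arg, bool run_inline)
{
    return run_inline ? submit_inline(func, arg)
                      : submit_threaded(func, arg);
}

/* Stand-in for a blocking preadv()/pwritev(): counts calls, returns 42. */
static int fake_io(void *arg)
{
    int *calls = arg;
    (*calls)++;
    return 42;
}
```

Calling submit_request(fake_io, &calls, true) runs the request in the calling thread, while passing false hands it to a worker and waits for the result. Both paths return the same value, so the only difference the caller sees is latency, which is the quantity the thread was trying to measure.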