From: Avi Kivity
Date: Tue, 09 Oct 2012 13:55:41 +0200
Subject: Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
Message-ID: <507410BD.6050901@redhat.com>
In-Reply-To: <507405B5.4060108@redhat.com>
To: Paolo Bonzini
Cc: Kevin Wolf, Stefan Hajnoczi, Anthony Liguori, Ping Fan Liu, qemu-devel@nongnu.org

On 10/09/2012 01:08 PM, Paolo Bonzini wrote:
>>
>> That's not strictly a coroutine issue.  Switching to ordinary threads
>> may make the problem worse, since there will clearly be contention.
>
> The point is you don't need either coroutines or userspace threads if
> you use native AIO.  longjmp/setjmp is probably a smaller overhead
> compared to the many syscalls involved in poll+eventfd
> reads+io_submit+io_getevents, but it's also not cheap.  Also, if you
> process AIO in batches you risk overflowing the pool of free
> coroutines, which gets expensive real fast (allocate/free the stack,
> do the expensive getcontext/swapcontext instead of the cheaper
> longjmp/setjmp, etc.).
>
> It seems better to sidestep the issue completely; it's a small amount
> of work.

Oh, I agree 100%: raw + native AIO wants to bypass coroutines/threads
completely.

>> What is the I/O processing time we have?  If it's, say, 10
>> microseconds, then we'll have 100,000 context switches per second,
>> assuming a device lock and a saturated iothread (split into multiple
>> threads).
>
> Hopefully with a saturated dedicated iothread you would not have any
> context switches, and a single CPU would just be dedicated to virtio
> processing.

I meant: if you break that saturated thread into multiple threads (in
order to break the one-core limit), then you start to context-switch
badly.

>
>> The coroutine work may have laid the groundwork for fine-grained
>> locking.  I'm doubtful we should use qcow when we want >100K IOPS,
>> though.
>
> Yep.  Going away from coroutines is a solution in search of a problem;
> it will introduce several new variables (kernel scheduling, more
> expensive lock contention, starving the thread pool with locked
> threads, ...), all for a case where performance hardly matters.
>
>>>>> I'm also curious about virtqueue_pop()/virtqueue_push() outside the
>>>>> QEMU mutex, although that might be blocked by the current work on
>>>>> MMIO/PIO dispatch outside the global mutex.
>>>>
>>>> It is, yes.
>>>
>>> It should only require unlocked memory map/unmap, not MMIO dispatch.
>>> The MMIO/PIO bits are taken care of by ioeventfd.
>>
>> The ring, or indirect descriptors, or the data, can all be in MMIO.
>> IIRC the virtio spec forbids that, but the APIs have to be general.
>> We don't have cpu_physical_memory_map_nommio() (or
>> address_space_map_nommio(), as soon as the coding style committee
>> ratifies struct literals).
>
> cpu_physical_memory_map could still take the QEMU lock in the slow
> bounce-buffer case.

You're right.  In fact this is a good opportunity to introduce lockless
lookups where the only optimized path is RAM -- ioeventfd provides a
lockless lookup of its own.

We could perhaps even avoid refcounting by shutting down the device
thread as part of hotunplug.

[could we also avoid refcounting by doing the equivalent of
stop_machine() during hotunplug?]
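For concreteness, here is roughly the syscall sequence Paolo counts
above, as a minimal, untested libaio sketch (link with -laio; the file
path and sizes are arbitrary placeholders, and error handling is mostly
elided):

    #define _GNU_SOURCE                 /* for O_DIRECT */
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        io_context_t ctx = 0;
        if (io_setup(128, &ctx) < 0) {           /* one-time setup */
            return 1;
        }

        int fd = open("/tmp/scratch.img", O_RDWR | O_DIRECT);
        void *buf;
        posix_memalign(&buf, 512, 4096);         /* O_DIRECT alignment */

        struct iocb cb, *cbs[1] = { &cb };
        io_prep_pread(&cb, fd, buf, 4096, 0);    /* 4k read at offset 0 */
        io_submit(ctx, 1, cbs);                  /* syscall: submit batch */

        struct io_event ev;
        io_getevents(ctx, 1, 1, &ev, NULL);      /* syscall: reap result */
        printf("res=%lu\n", (unsigned long)ev.res);

        close(fd);
        free(buf);
        io_destroy(ctx);
        return 0;
    }

In the real device model the submit and reap sides would hang off the
event loop rather than run back to back, but the per-batch cost is
those two syscalls plus the poll+eventfd wakeup.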
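And a minimal, untested sketch of the dedicated-iothread side we're
discussing: draining an ioeventfd-style kick with a plain eventfd and
poll, never touching the global mutex.  process_virtqueue() is a
hypothetical placeholder, not a QEMU API:

    #include <poll.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/eventfd.h>

    static void process_virtqueue(void)
    {
        /* pop/push the rings here, using only lockless RAM lookups */
    }

    static void *iothread_fn(void *opaque)
    {
        /* in the real thing, the fd registered via KVM_IOEVENTFD */
        int kick_fd = *(int *)opaque;
        struct pollfd pfd = { .fd = kick_fd, .events = POLLIN };

        for (;;) {
            if (poll(&pfd, 1, -1) <= 0) {
                continue;               /* EINTR etc.; handling elided */
            }
            uint64_t n;
            if (read(kick_fd, &n, sizeof(n)) == sizeof(n)) {
                process_virtqueue();    /* guest kicked n times */
            }
        }
        return NULL;
    }

    int main(void)
    {
        int kick_fd = eventfd(0, 0);
        pthread_t tid;
        pthread_create(&tid, NULL, iothread_fn, &kick_fd);
        pthread_join(tid, NULL);        /* sketch runs forever */
        return 0;
    }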
> BTW the block layer has been using struct literals
> for a long time and we're just as happy as you are about them. :)

So do upstream memory.c and the JSON tests.

-- 
error compiling committee.c: too many arguments to function