From: Avi Kivity
Date: Tue, 09 Oct 2012 12:52:08 +0200
Message-ID: <507401D8.8090203@redhat.com>
In-Reply-To: <5073FE3A.1090903@redhat.com>
Subject: Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
To: Paolo Bonzini
Cc: Kevin Wolf, Stefan Hajnoczi, Anthony Liguori, Ping Fan Liu, qemu-devel@nongnu.org

On 10/09/2012 12:36 PM, Paolo Bonzini wrote:
> On 09/10/2012 11:26, Avi Kivity wrote:
>> On 10/09/2012 11:08 AM, Stefan Hajnoczi wrote:
>>> Here are the steps that have been mentioned:
>>>
>>> 1. aio fastpath - for raw-posix and other aio block drivers, can we
>>>    reduce I/O request latency by skipping block layer coroutines?
>>
>> Is coroutine overhead noticeable?
>
> I'm thinking more about throughput than latency.  If the iothread
> becomes CPU-bound, then everything is noticeable.

That's not strictly a coroutine issue.  Switching to ordinary threads
may make the problem worse, since then there will clearly be lock
contention.

What is our per-request I/O processing time?  If it's, say, 10
microseconds, then a saturated iothread split into multiple threads
sharing a device lock means about 100,000 context switches per second
(1 s / 10 us = 100,000 requests/s, with at least one lock handoff per
request).

The coroutine work may have laid the groundwork for fine-grained
locking.  I'm doubtful we should use qcow when we want >100K IOPS,
though.

>>> I'm also curious about virtqueue_pop()/virtqueue_push() outside the
>>> QEMU mutex, although that might be blocked by the current work on
>>> MMIO/PIO dispatch outside the global mutex.
>>
>> It is, yes.
>
> It should only require unlocked memory map/unmap, not MMIO dispatch.
> The MMIO/PIO bits are taken care of by ioeventfd.

The ring, the indirect descriptors, or the data can all be in MMIO.
IIRC the virtio spec forbids that, but the APIs have to be general.  We
don't have cpu_physical_memory_map_nommio() (or
address_space_map_nommio(), as soon as the coding style committee
ratifies struct literals).

--
error compiling committee.c: too many arguments to function
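
P.S. For concreteness, here is a toy model of the missing primitive.
This is standalone C, not QEMU's real memory API -- the type names and
the two-entry memory map below are made up for illustration.  The point
is only the contract: mapping succeeds iff the whole guest-physical
range is ordinary RAM, so an unlocked virtqueue path never dispatches
MMIO callbacks and instead falls back to a locked slow path.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t hwaddr_t;              /* stand-in for QEMU's hwaddr */

enum region_type { REGION_RAM, REGION_MMIO };

struct region {
    hwaddr_t start, size;
    enum region_type type;
    void *host_ptr;                     /* host mapping; RAM only */
};

static uint8_t guest_ram[0x4000];

/* Toy memory map: RAM at 0x0000, an MMIO hole at 0x4000. */
static struct region memory_map[] = {
    { 0x0000, sizeof(guest_ram), REGION_RAM,  guest_ram },
    { 0x4000, 0x1000,            REGION_MMIO, NULL      },
};

/* Return a host pointer for [addr, addr+len) iff the range lies
 * entirely inside one RAM region; return NULL for MMIO or unassigned
 * addresses so the caller takes the locked slow path.  A real
 * implementation would also have to pin the region so it cannot
 * disappear while the pointer is used without the global mutex. */
static void *map_nommio(hwaddr_t addr, hwaddr_t len)
{
    for (size_t i = 0; i < sizeof(memory_map) / sizeof(memory_map[0]); i++) {
        struct region *r = &memory_map[i];
        hwaddr_t off = addr - r->start;
        if (addr >= r->start && off < r->size && len <= r->size - off) {
            return r->type == REGION_RAM
                   ? (uint8_t *)r->host_ptr + off
                   : NULL;
        }
    }
    return NULL;
}

int main(void)
{
    printf("RAM map:  %p\n", map_nommio(0x100, 64));    /* non-NULL */
    printf("MMIO map: %p\n", map_nommio(0x4000, 64));   /* NULL */
    return 0;
}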