From: Kevin Wolf
Date: Thu, 11 Oct 2012 14:28:33 +0200
Subject: Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
To: Anthony Liguori
Cc: Stefan Hajnoczi, Avi Kivity, Ping Fan Liu, qemu-devel@nongnu.org, Paolo Bonzini

On 09.10.2012 17:02, Anthony Liguori wrote:
> Stefan Hajnoczi writes:
>
>> On Mon, Oct 08, 2012 at 03:00:04PM +0200, Paolo Bonzini wrote:
>>> Another important step would be to add bdrv_drain. Kevin pointed out
>>> to me that only ->file and ->backing_hd need to be drained. Well,
>>> there may be other BlockDriverStates for vmdk extents or similar
>>> cases (Benoit's quorum device for example)... these need to be
>>> handled the same way for bdrv_flush, bdrv_reopen and bdrv_drain, so
>>> perhaps it is useful to add a common way to get them.
>>>
>>> And you need a lock on the AioContext, too. Then the block device
>>> can use the AioContext lock in order to synchronize multiple threads
>>> working on the block device. The lock will effectively block the
>>> ioeventfd thread, so that bdrv_lock+bdrv_drain+...+bdrv_unlock is a
>>> replacement for the current usage of bdrv_drain_all within the QEMU
>>> lock.
>>>
>>>> I'm starting to work on these steps and will send RFCs. This series
>>>> looks good to me.
>>>
>>> Thanks! A lot of the next steps can be done in parallel and, more
>>> importantly, none of them blocks the others (roughly)... so I'm
>>> eager to look at your stuff! :)
>>
>> Some notes on moving virtio-blk processing out of the QEMU global
>> mutex:
>>
>> 1. Dedicated thread for virtio ioeventfd processing outside the QEMU
>>    global mutex. The point of this thread is to process requests
>>    without the QEMU global mutex, using only fine-grained locks. (In
>>    the future this thread can be integrated back into the QEMU
>>    iothread once the global mutex has been eliminated.)
>>
>>    The dedicated thread must hold a reference to the virtio-blk
>>    device so it will not be destroyed. Hot unplug requires asking
>>    ioeventfd processing threads to release their references.
>>
>> 2. Versions of virtqueue_pop() and virtqueue_push() that execute
>>    outside the global QEMU mutex. Look at the memory API and threaded
>>    device dispatch.
>>
>>    The virtio device itself must have a lock so its vring-related
>>    state can be modified safely.
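
To make Paolo's bdrv_lock+bdrv_drain+...+bdrv_unlock pattern above
concrete, a minimal sketch. The lock doesn't exist yet, so
aio_context_acquire()/aio_context_release(), bs->aio_context and the
single-device bdrv_drain() are assumed names, not current API:

  /* Sketch: replace "bdrv_drain_all() under the global mutex" with a
   * per-context lock plus a drain of just this device. */
  static void bdrv_do_something_locked(BlockDriverState *bs)
  {
      aio_context_acquire(bs->aio_context); /* blocks the ioeventfd thread */
      bdrv_drain(bs);                       /* only this device's requests */
      /* ... modify the device while no requests are in flight ... */
      aio_context_release(bs->aio_context);
  }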
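
For point 1, the lifecycle handshake could look roughly like this.
Everything here is invented for illustration; object_ref()/object_unref()
stand in for whatever reference-counting mechanism the device ends up
using:

  /* Sketch: the dedicated thread pins the device with a reference so
   * hot unplug cannot destroy it mid-request; unplug sets a stop flag
   * and the thread drops its reference on the way out. */
  static void *ioeventfd_thread_fn(void *opaque)
  {
      VirtIOBlock *s = opaque;

      object_ref(OBJECT(s));               /* keep the device alive */
      while (!ioeventfd_thread_should_stop(s)) {
          process_ioeventfd(s);            /* handle virtqueue notifications */
      }
      object_unref(OBJECT(s));             /* unplug may now finalize it */
      return NULL;
  }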
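
For point 2, a BQL-free virtqueue_pop() could simply serialize on a
per-device lock; vring_lock and the _locked wrapper are assumptions, not
existing code:

  /* Sketch: guard the vring-related state (avail/used indices, inuse
   * count) with a device-level mutex instead of the global mutex. */
  static int virtqueue_pop_locked(VirtIODevice *vdev, VirtQueue *vq,
                                  VirtQueueElement *elem)
  {
      int ret;

      qemu_mutex_lock(&vdev->vring_lock);
      ret = virtqueue_pop(vq, elem);    /* existing helper, now serialized */
      qemu_mutex_unlock(&vdev->vring_lock);
      return ret;
  }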

>> Here are the steps that have been mentioned:
>>
>> 1. aio fastpath - for raw-posix and other aio block drivers, can we
>>    reduce I/O request latency by skipping block layer coroutines?
>>    This can be prototyped (hacked) easily to scope out how much
>>    benefit we get. It's completely independent from the global mutex
>>    related work.
>
> We've previously discussed having an additional layer on top of the
> block API.
>
> One problem with the block API today is that it doesn't distinguish
> between device access and internal access. I think this is an
> opportunity to introduce a device-only API.
>
> In the very short term, I can imagine an aio fastpath that is
> implemented only in terms of the device API. We could have a slow
> path that acquires the BQL.

FWIW, I think we'll automatically get two APIs with the
BlockDriverState/BlockBackend separation. However, I'm not entirely
sure if it's exactly the thing you're imagining, because BlockBackend
(the "device API") wouldn't only be used by devices, but also by
qemu-img/io, libqblock and probably block jobs, too.
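
For illustration, the split might look like this from the device side.
This is a sketch of the idea only; blk_read()/blk_write() and the
BlockBackend handle are assumed names, not an existing API:

  /* Sketch: devices get an opaque BlockBackend handle and never touch
   * the BlockDriverState directly; the block layer is then free to do
   * locking, draining and fastpath dispatch behind this boundary. */
  typedef struct BlockBackend BlockBackend;

  int blk_read(BlockBackend *blk, int64_t sector_num,
               uint8_t *buf, int nb_sectors);
  int blk_write(BlockBackend *blk, int64_t sector_num,
                const uint8_t *buf, int nb_sectors);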
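
Stefan's aio fastpath from step 1 could then live behind such an API.
Roughly the kind of shortcut he means, purely illustrative and not a
real patch (the bypass condition is incomplete on purpose):

  /* Sketch: skip the coroutine machinery when the driver has native
   * AIO and no block-layer features are in use for this device. */
  static BlockDriverAIOCB *aio_readv_fast(BlockDriverState *bs,
                                          int64_t sector_num,
                                          QEMUIOVector *qiov,
                                          int nb_sectors,
                                          BlockDriverCompletionFunc *cb,
                                          void *opaque)
  {
      if (bs->drv->bdrv_aio_readv && !bs->backing_hd) {
          /* fast: straight into raw-posix/linux-aio, no coroutine */
          return bs->drv->bdrv_aio_readv(bs, sector_num, qiov,
                                         nb_sectors, cb, opaque);
      }
      /* slow: the normal coroutine-based path */
      return bdrv_aio_readv(bs, sector_num, qiov, nb_sectors, cb, opaque);
  }

Kevin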