Date: Wed, 27 Nov 2013 19:15:13 -0700
From: Jens Axboe
Message-ID: <20131128021513.GJ2360@kernel.dk>
Subject: Re: [Qemu-devel] Linux multiqueue block layer thoughts
To: Stefan Hajnoczi
Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Michael Roth

On Wed, Nov 27 2013, Stefan Hajnoczi wrote:
> I finally got around to reading the Linux multiqueue block layer paper
> and wanted to share some thoughts about how it relates to QEMU and
> dataplane/QContext:
> http://kernel.dk/blk-mq.pdf
>
> I think Jens has virtio-blk multiqueue patches. So let's imagine that
> the virtio-blk device has multiple virtqueues. (virtio-scsi is
> already multiqueue BTW.)
>
> The paper focusses on two queue mappings: 1 queue per core and 1 queue
> per node. In both cases the idea is to keep the block I/O code path
> localized. This makes block I/O scale as the number of CPUs
> increases.
>
> In QEMU we'd want to set up a mapping for the virtio-blk mq device:
> each guest vcpu or guest node has a virtio-blk virtqueue which is
> serviced by a dataplane/QContext thread.
>
> QEMU would then process requests across these queues in parallel,
> although currently BlockDriverState is not thread-safe. At least for
> raw we should be able to submit requests in parallel from QEMU.
>
> Unfortunately there are some complications in the QEMU block layer:
> QEMU's own accounting, request tracking, and throttling features are
> global. We'd need to eventually do something similar to the
> multiqueue block layer changes in the kernel to detangle this state.
>
> Doing multiqueue for image formats is much more challenging - we'd
> have to tackle thread-safety in qcow2 and friends. For network block
> drivers like Gluster or NBD it's also not 100% clear what the best
> approach is. But I think the target here is local SSDs that are
> capable of high IOPs together with an SMP guest.
>
> At the end of all this we'd arrive at the following architecture:
> 1. Guest virtio device has multiple queues (1 per node or vcpu).
> 2. QEMU has multiple dataplane/QContext threads that process virtqueue
>    kicks; they are bound to host CPUs/nodes.
> 3. Linux kernel has multiqueue block I/O.

I think that sounds very reasonable. Let me know if there's anything
you need help or advice with.

> Jens: when experimenting with multiqueue virtio-blk, how far did you
> modify QEMU to eliminate global request processing state from block.c?

I did very little scaling testing on virtio-blk; it was more a demo
case for conversion than anything else. So probably not of much use to
what you are looking for...

--
Jens Axboe
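
As a rough illustration of the queue-per-vcpu mapping discussed above, here is
a minimal sketch using plain pthreads. It is not QEMU's actual dataplane code;
the struct and function names are made up for illustration. The point is only
that each virtqueue gets its own service thread, pinned to the host CPU (or
node) that the corresponding guest vcpu runs on, so submission and completion
stay local to that core.

/*
 * Sketch, not QEMU code: one I/O thread per virtqueue, pinned to the
 * host CPU of the matching guest vcpu (the 1-queue-per-core mapping).
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

struct queue_thread {
    pthread_t thread;
    int queue_index;   /* which virtqueue this thread services */
    int host_cpu;      /* host CPU the matching vcpu is bound to */
};

static void *queue_worker(void *opaque)
{
    struct queue_thread *qt = opaque;

    /* ...poll the virtqueue, submit I/O, complete requests... */
    printf("queue %d serviced on cpu %d\n", qt->queue_index, qt->host_cpu);
    return NULL;
}

static int start_queue_thread(struct queue_thread *qt)
{
    cpu_set_t mask;
    int ret;

    ret = pthread_create(&qt->thread, NULL, queue_worker, qt);
    if (ret)
        return ret;

    /* Keep the I/O path on the same CPU/node as the vcpu that kicks it */
    CPU_ZERO(&mask);
    CPU_SET(qt->host_cpu, &mask);
    return pthread_setaffinity_np(qt->thread, sizeof(mask), &mask);
}

int main(void)
{
    struct queue_thread qt = { .queue_index = 0, .host_cpu = 0 };

    if (start_queue_thread(&qt))
        return 1;
    pthread_join(qt.thread, NULL);
    return 0;
}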
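
Along the same lines, here is a minimal sketch of the kind of per-queue
accounting that could replace the global state mentioned above, mirroring
blk-mq's per-context counters: each dataplane thread updates only its own
queue's counters on the fast path, and the totals are summed only on the slow
path when statistics are queried. The names are illustrative, not QEMU APIs.

/*
 * Sketch, not QEMU APIs: lock-free per-queue I/O accounting instead of
 * one global counter shared by all queues.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_QUEUES 16

static struct queue_stats {
    atomic_uint_fast64_t rd_ops;
    atomic_uint_fast64_t wr_ops;
} stats[MAX_QUEUES];

/* Fast path, called only by the dataplane thread that owns queue q */
static void account_request(int q, bool is_write)
{
    atomic_fetch_add_explicit(is_write ? &stats[q].wr_ops : &stats[q].rd_ops,
                              1, memory_order_relaxed);
}

/* Slow path: aggregate across all queues only when somebody asks */
static uint64_t total_ops(void)
{
    uint64_t sum = 0;

    for (int i = 0; i < MAX_QUEUES; i++) {
        sum += atomic_load_explicit(&stats[i].rd_ops, memory_order_relaxed);
        sum += atomic_load_explicit(&stats[i].wr_ops, memory_order_relaxed);
    }
    return sum;
}

int main(void)
{
    account_request(0, false);
    account_request(1, true);
    printf("total ops: %llu\n", (unsigned long long)total_ops());
    return 0;
}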