From: Jens Axboe <axboe@kernel.dk>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>,
Michael Roth <mdroth@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] Linux multiqueue block layer thoughts
Date: Wed, 27 Nov 2013 19:15:13 -0700
Message-ID: <20131128021513.GJ2360@kernel.dk>
In-Reply-To: <CAJSP0QXLfmq_dPvh90s-5VfaMN48u66+teca5_qYCq+ROpO=tw@mail.gmail.com>
On Wed, Nov 27 2013, Stefan Hajnoczi wrote:
> I finally got around to reading the Linux multiqueue block layer paper
> and wanted to share some thoughts about how it relates to QEMU and
> dataplane/QContext:
> http://kernel.dk/blk-mq.pdf
>
> I think Jens has virtio-blk multiqueue patches. So let's imagine that
> the virtio-blk device has multiple virtqueues. (virtio-scsi is
> already multiqueue BTW.)
>
> The paper focusses on two queue mappings: 1 queue per core and 1 queue
> per node. In both cases the idea is to keep the block I/O code path
> localized. This makes block I/O scale as the number of CPUs
> increases.
>
> In QEMU we'd want to set up a mapping for the virtio-blk mq device:
> each guest vcpu or guest node has a virtio-blk virtqueue which is
> serviced by a dataplane/QContext thread.
>
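The two mappings from the paper, applied to the vcpu/node-to-virtqueue mapping proposed for QEMU, can be sketched as a pair of lookup functions. This is a minimal illustration under stated assumptions: the function names and the `node_of_cpu[]` table are hypothetical, and the modulo fallback used when there are fewer queues than CPUs is an assumed policy, not necessarily what blk-mq or QEMU does.

```c
#include <assert.h>

/* Hypothetical helpers, not a blk-mq or QEMU API. CPUs are numbered
 * 0..nr_cpus-1; node_of_cpu[cpu] gives each CPU's NUMA node. */

/* 1 queue per core: each (v)CPU submits to its own queue. If there are
 * fewer queues than CPUs, CPUs share queues round-robin (an assumption). */
unsigned map_queue_per_core(unsigned cpu, unsigned nr_queues)
{
    assert(nr_queues > 0);
    return cpu % nr_queues;
}

/* 1 queue per node: every CPU on a node submits to that node's queue,
 * so the submission path stays local to the node. */
unsigned map_queue_per_node(unsigned cpu, const unsigned *node_of_cpu)
{
    return node_of_cpu[cpu];
}
```

For a 4-vcpu guest with two nodes (vcpus 0-1 on node 0, vcpus 2-3 on node 1), vcpu 2 would use virtqueue 2 under the per-core mapping and virtqueue 1 under the per-node mapping.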
> QEMU would then process requests across these queues in parallel,
> although currently BlockDriverState is not thread-safe. At least for
> raw we should be able to submit requests in parallel from QEMU.
>
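Why raw is the easy case can be illustrated with plain POSIX I/O: ignoring options such as a start offset, a raw image maps guest sector N to byte offset N * 512 in the file, and `pwrite()` takes an explicit offset without touching the shared file position, so several threads can submit writes on one file descriptor without a lock. This sketch assumes 512-byte sectors and one thread per request purely for illustration; `struct raw_req` and `submit_parallel()` are hypothetical names, not QEMU's raw driver.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512

/* Hypothetical request: write one sector filled with `fill` at guest
 * sector `sector`. In a raw image that is byte offset sector * 512. */
struct raw_req {
    int fd;
    unsigned long sector;
    char fill;
    int result;              /* 0 on success, -1 on failure */
};

static void *raw_write_worker(void *opaque)
{
    struct raw_req *req = opaque;
    char buf[SECTOR_SIZE];

    memset(buf, req->fill, sizeof(buf));
    /* pwrite() uses an explicit offset and leaves the fd's file position
     * alone, so concurrent callers on the same fd do not interfere. */
    ssize_t n = pwrite(req->fd, buf, sizeof(buf),
                       (off_t)req->sector * SECTOR_SIZE);
    req->result = (n == (ssize_t)sizeof(buf)) ? 0 : -1;
    return NULL;
}

/* Issue all requests in parallel (one thread each) and wait for them.
 * Requests should target distinct sectors; overlapping writes would
 * land in an unspecified order. Returns 0 if every write succeeded. */
int submit_parallel(struct raw_req *reqs, int nr)
{
    pthread_t *tids;
    int started = 0, ret = 0;

    if (nr <= 0)
        return 0;
    tids = malloc(sizeof(*tids) * (size_t)nr);
    assert(tids != NULL);
    for (; started < nr; started++) {
        if (pthread_create(&tids[started], NULL, raw_write_worker,
                           &reqs[started]) != 0) {
            ret = -1;
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (reqs[i].result != 0)
            ret = -1;
    }
    free(tids);
    return ret;
}
```

An image format such as qcow2 breaks this picture because a guest write may also have to update shared metadata (for example, allocate a cluster), which is exactly the thread-safety problem raised further down.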
> Unfortunately there are some complications in the QEMU block layer:
> QEMU's own accounting, request tracking, and throttling features are
> global. We'd need to eventually do something similar to the
> multiqueue block layer changes in the kernel to detangle this state.
>
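One way to take global accounting state off the hot path, in the spirit of blk-mq's per-CPU software queues, is to give each queue its own statistics slot that only its dataplane thread writes, and to sum the slots only when statistics are read. This is a minimal sketch with hypothetical names (`struct queue_stats`, `account_done()`, `stats_total()`); it is not QEMU's accounting code, and it leaves request tracking and throttling aside.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_QUEUES 64

/* Hypothetical per-queue accounting slot. Only the dataplane thread that
 * owns the queue writes it, so the fast path needs no lock. A real
 * implementation would also pad each slot to a cache line to avoid
 * false sharing between neighbouring queues. */
struct queue_stats {
    uint64_t nr_reads, nr_writes;
    uint64_t bytes_read, bytes_written;
};

struct dev_stats {
    unsigned nr_queues;
    struct queue_stats q[MAX_QUEUES];
};

/* Fast path: called by the owning thread when a request completes. */
void account_done(struct dev_stats *s, unsigned qidx, int is_write,
                  uint64_t bytes)
{
    assert(qidx < s->nr_queues);
    struct queue_stats *qs = &s->q[qidx];

    if (is_write) {
        qs->nr_writes++;
        qs->bytes_written += bytes;
    } else {
        qs->nr_reads++;
        qs->bytes_read += bytes;
    }
}

/* Slow path (e.g. a monitor query): sum every queue's slot. This assumes
 * the queues are quiesced; reading counters that other threads are still
 * updating would need atomics (e.g. C11 relaxed loads/stores) instead. */
struct queue_stats stats_total(const struct dev_stats *s)
{
    struct queue_stats t = {0, 0, 0, 0};

    for (unsigned i = 0; i < s->nr_queues; i++) {
        t.nr_reads += s->q[i].nr_reads;
        t.nr_writes += s->q[i].nr_writes;
        t.bytes_read += s->q[i].bytes_read;
        t.bytes_written += s->q[i].bytes_written;
    }
    return t;
}
```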
> Doing multiqueue for image formats is much more challenging - we'd
> have to tackle thread-safety in qcow2 and friends. For network block
> drivers like Gluster or NBD it's also not 100% clear what the best
> approach is. But I think the target here is local SSDs that are
> capable of high IOPS together with an SMP guest.
>
> At the end of all this we'd arrive at the following architecture:
> 1. Guest virtio device has multiple queues (1 per node or vcpu).
> 2. QEMU has multiple dataplane/QContext threads that process virtqueue
> kicks and are bound to host CPUs/nodes.
> 3. Linux kernel has multiqueue block I/O.
I think that sounds very reasonable. Let me know if there's anything you
need help or advice with.
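Points 1 and 2 of the architecture quoted above (one queue per vcpu or node, each serviced by its own thread bound to a host CPU) can be sketched with POSIX threads. This is a hedged illustration, not QEMU's dataplane: `struct dp_queue`, `dataplane_thread()` and `run_dataplane()` are hypothetical, the binding uses the GNU/Linux extension `pthread_setaffinity_np()` on a best-effort basis, and each thread drains one pre-filled batch instead of waiting for virtqueue kicks (for example via an eventfd) in a loop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

#define QUEUE_DEPTH 128

/* Hypothetical stand-in for one virtqueue and its dataplane thread.
 * After startup only the owning thread touches the queue. */
struct dp_queue {
    unsigned host_cpu;
    unsigned nr_pending;
    unsigned pending[QUEUE_DEPTH];   /* request sizes in bytes */
    unsigned long long bytes_done;
    pthread_t thread;
};

static void *dataplane_thread(void *opaque)
{
    struct dp_queue *q = opaque;
    cpu_set_t set;

    /* Best effort: bind to this queue's host CPU. If the CPU is outside
     * the allowed set the call fails and the thread simply runs unbound. */
    CPU_ZERO(&set);
    if (q->host_cpu < CPU_SETSIZE) {
        CPU_SET(q->host_cpu, &set);
        (void)pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    /* A real dataplane thread would loop, waiting for virtqueue kicks.
     * Here it drains one pre-filled batch so the sketch terminates.
     * No lock is needed: no other thread touches this queue. */
    for (unsigned i = 0; i < q->nr_pending; i++)
        q->bytes_done += q->pending[i];
    q->nr_pending = 0;
    return NULL;
}

/* Start one thread per queue (queue i on host CPU i mod ncpus), then wait
 * for all of them. Returns 0 on success, -1 if a thread failed to start. */
int run_dataplane(struct dp_queue *qs, unsigned nr_queues)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    unsigned started = 0;
    int ret = 0;

    assert(qs != NULL || nr_queues == 0);
    if (ncpus < 1)
        ncpus = 1;
    for (; started < nr_queues; started++) {
        qs[started].host_cpu = started % (unsigned)ncpus;
        if (pthread_create(&qs[started].thread, NULL, dataplane_thread,
                           &qs[started]) != 0) {
            ret = -1;
            break;
        }
    }
    for (unsigned i = 0; i < started; i++)
        pthread_join(qs[i].thread, NULL);
    return ret;
}
```

Point 3 (multiqueue block I/O in the host kernel) is what lets these threads submit to the device without funnelling through a single host request queue.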
> Jens: when experimenting with multiqueue virtio-blk, how far did you
> modify QEMU to eliminate global request processing state from block.c?
I did very little scaling testing on virtio-blk; it was more a demo case
for conversion than anything else. So it is probably not of much use for
what you are looking for...
--
Jens Axboe