From: Jens Axboe <axboe@kernel.dk>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Michael Roth <mdroth@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] Linux multiqueue block layer thoughts
Date: Wed, 27 Nov 2013 19:15:13 -0700
Message-ID: <20131128021513.GJ2360@kernel.dk>
In-Reply-To: <CAJSP0QXLfmq_dPvh90s-5VfaMN48u66+teca5_qYCq+ROpO=tw@mail.gmail.com>

On Wed, Nov 27 2013, Stefan Hajnoczi wrote:
> I finally got around to reading the Linux multiqueue block layer paper
> and wanted to share some thoughts about how it relates to QEMU and
> dataplane/QContext:
> http://kernel.dk/blk-mq.pdf
> 
> I think Jens has virtio-blk multiqueue patches.  So let's imagine that
> the virtio-blk device has multiple virtqueues.  (virtio-scsi is
> already multiqueue BTW.)
> 
> The paper focusses on two queue mappings: 1 queue per core and 1 queue
> per node.  In both cases the idea is to keep the block I/O code path
> localized.  This makes block I/O scale as the number of CPUs
> increases.
> 
> In QEMU we'd want to set up a mapping for the virtio-blk mq device:
> each guest vcpu or guest node has a virtio-blk virtqueue which is
> serviced by a dataplane/QContext thread.
> 
> QEMU would then process requests across these queues in parallel,
> although currently BlockDriverState is not thread-safe.  At least for
> raw we should be able to submit requests in parallel from QEMU.
> 
> Unfortunately there are some complications in the QEMU block layer:
> QEMU's own accounting, request tracking, and throttling features are
> global.  We'd need to eventually do something similar to the
> multiqueue block layer changes in the kernel to detangle this state.
> 
> Doing multiqueue for image formats is much more challenging - we'd
> have to tackle thread-safety in qcow2 and friends.  For network block
> drivers like Gluster or NBD it's also not 100% clear what the best
> approach is.  But I think the target here is local SSDs that are
> capable of high IOPS together with an SMP guest.
> 
> At the end of all this we'd arrive at the following architecture:
> 1. Guest virtio device has multiple queues (1 per node or vcpu).
> 2. QEMU has multiple dataplane/QContext threads that process virtqueue
> kicks; they are bound to host CPUs/nodes.
> 3. Linux kernel has multiqueue block I/O.

I think that sounds very reasonable. Let me know if there's anything you
need help or advice with.
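
For reference, the two queue mappings discussed in the quoted text (one
queue per vcpu, one queue per node) can be sketched roughly as below.
This is only an illustration under stated assumptions: the function names
and the vcpu-to-node dictionary are hypothetical and are not taken from
QEMU, virtio-blk, or the kernel's blk-mq code.

```python
from collections import defaultdict


def queue_per_vcpu(num_vcpus):
    """One queue per vcpu: vcpu i submits on queue i (hypothetical helper)."""
    return {vcpu: vcpu for vcpu in range(num_vcpus)}


def queue_per_node(vcpu_to_node):
    """One queue per node: all vcpus on a node share that node's queue.

    vcpu_to_node is an assumed input, e.g. {0: 0, 1: 0, 2: 1, 3: 1}.
    Queue indices are assigned to nodes in ascending node order.
    """
    nodes = sorted(set(vcpu_to_node.values()))
    node_to_queue = {node: queue for queue, node in enumerate(nodes)}
    return {vcpu: node_to_queue[node] for vcpu, node in vcpu_to_node.items()}


def vcpus_by_queue(mapping):
    """Invert a vcpu -> queue mapping into queue -> sorted list of vcpus."""
    groups = defaultdict(list)
    for vcpu, queue in sorted(mapping.items()):
        groups[queue].append(vcpu)
    return dict(groups)
```

In the per-node case each queue would be serviced by one thread bound to
that node, so requests from vcpus on the same node stay local to it.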

> Jens: when experimenting with multiqueue virtio-blk, how far did you
> modify QEMU to eliminate global request processing state from block.c?

I did very little scaling testing on virtio-blk; it was more a demo case
for conversion than anything else. So probably not of much use to what
you are looking for...

-- 
Jens Axboe

