From: Chris Friesen <chris.friesen@windriver.com>
To: "Benoît Canet" <benoit.canet@irqsave.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] is there a limit on the number of in-flight I/O operations?
Date: Tue, 26 Aug 2014 23:43:55 -0600	[thread overview]
Message-ID: <53FD701B.9030402@windriver.com> (raw)
In-Reply-To: <53FBAF8A.3050005@windriver.com>

On 08/25/2014 03:50 PM, Chris Friesen wrote:

> I think I might have a glimmering of what's going on.  Someone please
> correct me if I get something wrong.
>
> I think that VIRTIO_PCI_QUEUE_MAX doesn't really mean anything with
> respect to max inflight operations, and neither does virtio-blk calling
> virtio_add_queue() with a queue size of 128.
>
> I think what's happening is that virtio_blk_handle_output() spins,
> pulling data off the 128-entry queue and calling
> virtio_blk_handle_request().  At this point that queue entry can be
> reused, so the queue size isn't really relevant.
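
To make that concrete, here's a toy standalone model (not the actual QEMU
code) of why the ring size never becomes the limit: the slot is recycled as
soon as the request is handed off to the block layer, while the I/O itself
stays outstanding.

/* Toy model, not QEMU code: a 128-slot ring whose slots are recycled as
 * soon as each request is handed to the block layer. */
#include <stdio.h>

#define QUEUE_SIZE 128

int main(void)
{
    int ring_used = 0;   /* ring slots currently occupied */
    int in_flight = 0;   /* requests started but not yet completed */

    for (int submitted = 0; submitted < 1000; submitted++) {
        ring_used++;     /* guest posts a request into the 128-entry ring */
        /* handler pops it and starts the asynchronous I/O ... */
        ring_used--;     /* ... so the slot is immediately reusable ... */
        in_flight++;     /* ... while the I/O itself is still pending */
    }

    printf("ring slots in use: %d, requests in flight: %d\n",
           ring_used, in_flight);
    return 0;
}

Running it reports zero ring slots in use but 1000 requests in flight.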
>
> In virtio_blk_handle_write() we add the request to a MultiReqBuffer and
> every 32 writes we'll call virtio_submit_multiwrite() which calls down
> into bdrv_aio_multiwrite().  That tries to merge requests and then for
> each resulting request calls bdrv_aio_writev() which ends up calling
> qemu_rbd_aio_writev(), which calls rbd_start_aio().
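
The batching then works roughly like the sketch below.  This is simplified,
and the names (MAX_MERGE, queue_write(), submit_batch()) are my own
stand-ins, not QEMU's:

/* Simplified sketch of the "batch up to 32 writes, then submit" behaviour. */
#include <stdio.h>

#define MAX_MERGE 32

struct pending_write {
    long sector;
    int  nb_sectors;
};

struct multireq {
    struct pending_write reqs[MAX_MERGE];
    int num;
};

static void submit_batch(struct multireq *mrb)
{
    /* Stand-in for the multiwrite submission; the real code also tries to
     * merge adjacent requests before issuing them to the driver. */
    printf("submitting batch of %d writes\n", mrb->num);
    mrb->num = 0;
}

static void queue_write(struct multireq *mrb, long sector, int nb_sectors)
{
    mrb->reqs[mrb->num].sector = sector;
    mrb->reqs[mrb->num].nb_sectors = nb_sectors;
    if (++mrb->num == MAX_MERGE) {
        submit_batch(mrb);
    }
}

int main(void)
{
    struct multireq mrb = { .num = 0 };

    for (long i = 0; i < 100; i++) {
        queue_write(&mrb, i * 8, 8);
    }
    if (mrb.num) {       /* flush whatever is left once the ring is drained */
        submit_batch(&mrb);
    }
    return 0;
}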
>
> rbd_start_aio() allocates a buffer and converts from iovec to a single
> buffer.  This buffer stays allocated until the request is acked, which
> is where the bulk of the memory overhead with rbd is coming from (has
> anyone considered adding iovec support to rbd to avoid this extra copy?).
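
As I understand it the copy looks something like this -- a sketch only, and
flatten_iov() is my own name rather than anything in QEMU or librbd.  The
point is that the malloc'd bounce buffer lives until the ack, so every
outstanding write pins a full copy of its data:

#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

static char *flatten_iov(const struct iovec *iov, int iovcnt, size_t *out_len)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++) {
        total += iov[i].iov_len;
    }

    char *buf = malloc(total);
    if (!buf) {
        return NULL;
    }

    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }

    *out_len = total;
    return buf;              /* caller frees it in the completion callback */
}

int main(void)
{
    char a[4096] = {0}, b[4096] = {0};
    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = sizeof(a) },
        { .iov_base = b, .iov_len = sizeof(b) },
    };
    size_t len;

    char *bounce = flatten_iov(iov, 2, &len);
    /* ... hand 'bounce' to the rbd write call, free it when the ack arrives */
    free(bounce);
    return 0;
}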
>
> The only limit I see in the whole call chain from
> virtio_blk_handle_request() on down is the call to
> bdrv_io_limits_intercept() in bdrv_co_do_writev().  However, that
> doesn't provide any limit on the absolute number of inflight operations,
> only on operations/sec.  If the ceph server cluster can't keep up with
> the aggregate load, then the number of inflight operations can still
> grow indefinitely.
>
> Chris
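
To spell out why a rate limit alone doesn't bound the in-flight count,
here's a toy calculation with made-up numbers: whenever the throttle admits
more requests per second than the backend completes, the difference just
accumulates.

/* Toy numbers, just to show the arithmetic: a throttle on operations/sec
 * does nothing to cap the number of operations outstanding at once. */
#include <stdio.h>

int main(void)
{
    const int admitted_per_sec  = 500;   /* what the throttle lets through */
    const int completed_per_sec = 300;   /* what the ceph cluster keeps up with */

    int in_flight = 0;
    for (int second = 1; second <= 10; second++) {
        in_flight += admitted_per_sec - completed_per_sec;
        printf("after %2ds: %d requests in flight\n", second, in_flight);
    }
    return 0;
}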

I was a bit concerned that I'd need to extend the IO throttling code to 
support a limit on total inflight bytes, but it doesn't look like that 
will be necessary.

It seems that using mallopt() to set the malloc trim and mmap thresholds to 
128K is enough to keep the increase in RSS small and to let it drop back 
down after an I/O burst.  For now this looks like it should be sufficient 
for our purposes.
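
Concretely it's just a couple of mallopt() calls early in startup, something
along these lines (128K is simply the value that worked in our testing,
nothing more scientific than that):

/* glibc-specific knobs set at program startup. */
#include <malloc.h>
#include <stdlib.h>

int main(void)
{
    /* Once the free memory at the top of the heap exceeds M_TRIM_THRESHOLD,
     * free() gives it back to the OS; allocations of at least
     * M_MMAP_THRESHOLD bytes are served by mmap() and so are unmapped as
     * soon as they are freed.  mallopt() returns 0 on error. */
    if (mallopt(M_TRIM_THRESHOLD, 128 * 1024) == 0 ||
        mallopt(M_MMAP_THRESHOLD, 128 * 1024) == 0) {
        return EXIT_FAILURE;
    }

    /* ... the rest of the (allocation-heavy) program ... */
    return 0;
}

The same thresholds can also be set without touching the code via the
MALLOC_TRIM_THRESHOLD_ and MALLOC_MMAP_THRESHOLD_ environment variables,
which might be more convenient for experimenting.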

I'm actually a bit surprised I didn't have to go lower, but it seems to work 
for both the "dd" and dbench test cases, so we'll give it a try.

Chris


Thread overview: 40+ messages
2014-07-18 14:58 [Qemu-devel] is there a limit on the number of in-flight I/O operations? Chris Friesen
2014-07-18 15:24 ` Paolo Bonzini
2014-07-18 16:22   ` Chris Friesen
2014-07-18 20:13     ` Paolo Bonzini
2014-07-18 22:48       ` Chris Friesen
2014-07-19  5:49         ` Paolo Bonzini
2014-07-19  6:27           ` Chris Friesen
2014-07-19  7:23             ` Paolo Bonzini
2014-07-19  8:45               ` Benoît Canet
2014-07-21 14:59                 ` Chris Friesen
2014-07-21 15:15                   ` Benoît Canet
2014-07-21 15:35                     ` Chris Friesen
2014-07-21 15:54                       ` Benoît Canet
2014-07-21 16:10                       ` Benoît Canet
2014-08-23  0:59                         ` Chris Friesen
2014-08-23  7:56                           ` Benoît Canet
2014-08-25 15:12                             ` Chris Friesen
2014-08-25 17:43                               ` Chris Friesen
2015-08-27 16:37                                 ` Stefan Hajnoczi
2015-08-27 16:33                               ` Stefan Hajnoczi
2014-08-25 21:50                             ` Chris Friesen
2014-08-27  5:43                               ` Chris Friesen [this message]
2015-05-14 13:42                                 ` Andrey Korolyov
2015-08-26 17:10                                   ` Andrey Korolyov
2015-08-26 23:31                                     ` Josh Durgin
2015-08-26 23:47                                       ` Andrey Korolyov
2015-08-27  0:56                                         ` Josh Durgin
2015-08-27 16:48                               ` Stefan Hajnoczi
2015-08-27 17:05                                 ` Stefan Hajnoczi
2015-08-27 16:49                               ` Stefan Hajnoczi
2015-08-28  0:31                                 ` Josh Durgin
2015-08-28  8:31                                   ` Andrey Korolyov
2014-07-21 19:47                       ` Benoît Canet
2014-07-21 21:12                         ` Chris Friesen
2014-07-21 22:04                           ` Benoît Canet
2014-07-18 15:54 ` Andrey Korolyov
2014-07-18 16:26   ` Chris Friesen
2014-07-18 16:30     ` Andrey Korolyov
2014-07-18 16:46       ` Chris Friesen
     [not found] <1000957815.25879188.1441820902018.JavaMail.zimbra@redhat.com>
2015-09-09 18:51 ` Jason Dillaman
