From: Chris Friesen <chris.friesen@windriver.com>
To: "Benoît Canet" <benoit.canet@irqsave.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] is there a limit on the number of in-flight I/O operations?
Date: Tue, 26 Aug 2014 23:43:55 -0600
Message-ID: <53FD701B.9030402@windriver.com>
In-Reply-To: <53FBAF8A.3050005@windriver.com>

On 08/25/2014 03:50 PM, Chris Friesen wrote:
> I think I might have a glimmering of what's going on. Someone please
> correct me if I get something wrong.
>
> I think that VIRTIO_PCI_QUEUE_MAX doesn't really mean anything with
> respect to max inflight operations, and neither does virtio-blk calling
> virtio_add_queue() with a queue size of 128.
>
> I think what's happening is that virtio_blk_handle_output() spins,
> pulling data off the 128-entry queue and calling
> virtio_blk_handle_request(). At this point that queue entry can be
> reused, so the queue size isn't really relevant.
>
> In virtio_blk_handle_write() we add the request to a MultiReqBuffer and
> every 32 writes we'll call virtio_submit_multiwrite() which calls down
> into bdrv_aio_multiwrite(). That tries to merge requests and then for
> each resulting request calls bdrv_aio_writev() which ends up calling
> qemu_rbd_aio_writev(), which calls rbd_start_aio().
>
> rbd_start_aio() allocates a buffer and converts from iovec to a single
> buffer. This buffer stays allocated until the request is acked, which
> is where the bulk of the memory overhead with rbd is coming from (has
> anyone considered adding iovec support to rbd to avoid this extra copy?).
>
> The only limit I see in the whole call chain from
> virtio_blk_handle_request() on down is the call to
> bdrv_io_limits_intercept() in bdrv_co_do_writev(). However, that
> doesn't provide any limit on the absolute number of inflight operations,
> only on operations/sec. If the ceph server cluster can't keep up with
> the aggregate load, then the number of inflight operations can still
> grow indefinitely.
>
> Chris
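The extra copy in rbd_start_aio() described above amounts to linearizing a scatter/gather list into one contiguous heap buffer that must stay alive until the request is acked. The sketch below only illustrates that step; it is not QEMU's code, and the helper name flatten_iov() is hypothetical. It shows why memory use scales with the total size of all in-flight writes: each request holds a full private copy of its payload.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not QEMU's API): copy a scatter/gather list into
 * one contiguous heap buffer. The caller owns the result and must free()
 * it, e.g. once the backend acknowledges the write. Returns NULL if the
 * allocation fails.
 */
static char *flatten_iov(const struct iovec *iov, int iovcnt, size_t *out_len)
{
    assert(out_len != NULL);

    /* First pass: total payload size across all segments. */
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;

    /* malloc(0) may return NULL, so always ask for at least one byte. */
    char *buf = malloc(total ? total : 1);
    if (!buf)
        return NULL;

    /* Second pass: copy each segment to its offset in the bounce buffer. */
    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len) {
            memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
            off += iov[i].iov_len;
        }
    }

    *out_len = total;
    return buf;
}
```

A backend that accepted an iovec directly could skip this allocation and copy entirely, which is the point of the parenthetical question above.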

I was a bit concerned that I'd need to extend the I/O throttling code to
support a limit on total in-flight bytes, but it doesn't look like that
will be necessary.
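For illustration only, since the conclusion here is that it isn't needed: a cap on total in-flight bytes differs from the existing operations/sec throttling in that it bounds how much memory outstanding requests can pin, regardless of how slowly the backend completes them. The minimal sketch below assumes a single thread (no locking), and the struct and function names are hypothetical rather than anything in QEMU's throttling code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-flight byte budget; not part of QEMU. */
struct inflight_limit {
    size_t max_bytes;   /* cap on bytes submitted but not yet completed */
    size_t cur_bytes;   /* bytes currently in flight */
};

/*
 * Charge the budget and return true if the request may be submitted now.
 * Returns false if it would exceed the cap; the caller should then queue
 * the request and retry after a completion. A request larger than the
 * whole cap is still admitted when nothing else is in flight, so an
 * oversized request cannot stall forever.
 */
static bool inflight_try_admit(struct inflight_limit *l, size_t bytes)
{
    if (l->cur_bytes > 0 && l->cur_bytes + bytes > l->max_bytes)
        return false;
    l->cur_bytes += bytes;
    return true;
}

/* Release the budget when the backend acknowledges the request. */
static void inflight_complete(struct inflight_limit *l, size_t bytes)
{
    assert(bytes <= l->cur_bytes);
    l->cur_bytes -= bytes;
}
```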

It seems that using mallopt() to set the trim and mmap thresholds
(M_TRIM_THRESHOLD and M_MMAP_THRESHOLD) to 128K is enough to minimize
the increase in RSS and also to drop it back down after an I/O burst.
For now this looks like it should be sufficient for our purposes.
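A minimal sketch of the mallopt() change, assuming glibc (mallopt() and these constants come from glibc's <malloc.h>) and assuming it is called once early in process startup, before any I/O is issued; where exactly this call went in our build isn't stated above, and the function name is hypothetical.

```c
#include <assert.h>
#include <malloc.h>   /* glibc-specific: mallopt(), M_TRIM_THRESHOLD, M_MMAP_THRESHOLD */
#include <stdio.h>

/* 128 KiB, the threshold that worked for the "dd" and dbench tests. */
#define IO_MALLOC_THRESHOLD (128 * 1024)

/*
 * Hypothetical startup hook. With both thresholds pinned at 128 KiB:
 *  - large requests (such as per-request bounce buffers) that can't be
 *    satisfied from the free list are served by mmap() and unmapped on
 *    free(), so they don't stay in the heap after a burst;
 *  - once more than 128 KiB of contiguous free memory sits at the top of
 *    the heap, free() returns it to the kernel.
 * Setting these options explicitly also disables glibc's dynamic
 * adjustment of the mmap threshold. Returns 1 on success, 0 on failure.
 */
static int configure_malloc_for_io_bursts(void)
{
    if (mallopt(M_TRIM_THRESHOLD, IO_MALLOC_THRESHOLD) != 1) {
        fprintf(stderr, "mallopt(M_TRIM_THRESHOLD) failed\n");
        return 0;
    }
    if (mallopt(M_MMAP_THRESHOLD, IO_MALLOC_THRESHOLD) != 1) {
        fprintf(stderr, "mallopt(M_MMAP_THRESHOLD) failed\n");
        return 0;
    }
    return 1;
}
```

glibc also reads the MALLOC_TRIM_THRESHOLD_ and MALLOC_MMAP_THRESHOLD_ environment variables (note the trailing underscore), which allows trying the same values without rebuilding.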

I'm actually a bit surprised I didn't have to go lower, but it seems to
work for both the "dd" and dbench test cases, so we'll give it a try.

Chris