From: Chris Friesen <chris.friesen@windriver.com>
To: "Benoît Canet" <benoit.canet@irqsave.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] is there a limit on the number of in-flight I/O operations?
Date: Tue, 26 Aug 2014 23:43:55 -0600
Message-ID: <53FD701B.9030402@windriver.com>
In-Reply-To: <53FBAF8A.3050005@windriver.com>
On 08/25/2014 03:50 PM, Chris Friesen wrote:
> I think I might have a glimmering of what's going on. Someone please
> correct me if I get something wrong.
>
> I think that VIRTIO_PCI_QUEUE_MAX doesn't really mean anything with
> respect to max inflight operations, and neither does virtio-blk calling
> virtio_add_queue() with a queue size of 128.
>
> I think what's happening is that virtio_blk_handle_output() spins,
> pulling data off the 128-entry queue and calling
> virtio_blk_handle_request(). At this point that queue entry can be
> reused, so the queue size isn't really relevant.
>
> In virtio_blk_handle_write() we add the request to a MultiReqBuffer and
> every 32 writes we'll call virtio_submit_multiwrite() which calls down
> into bdrv_aio_multiwrite(). That tries to merge requests and then for
> each resulting request calls bdrv_aio_writev() which ends up calling
> qemu_rbd_aio_writev(), which calls rbd_start_aio().
>
> rbd_start_aio() allocates a buffer and converts from iovec to a single
> buffer. This buffer stays allocated until the request is acked, which
> is where the bulk of the memory overhead with rbd is coming from (has
> anyone considered adding iovec support to rbd to avoid this extra copy?).
>
> The only limit I see in the whole call chain from
> virtio_blk_handle_request() on down is the call to
> bdrv_io_limits_intercept() in bdrv_co_do_writev(). However, that
> doesn't provide any limit on the absolute number of inflight operations,
> only on operations/sec. If the ceph server cluster can't keep up with
> the aggregate load, then the number of inflight operations can still
> grow indefinitely.
>
> Chris
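
To make the copy described in the quoted text concrete, here is a minimal sketch of flattening an iovec array into one contiguous buffer. It is not the actual rbd_start_aio() code; flatten_iovec() and its signature are hypothetical names chosen for illustration. The point is only that each in-flight request holds a full private copy of its payload until the buffer is freed.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/*
 * Hypothetical helper (not the real rbd_start_aio() code): copy a
 * scatter/gather list into one newly allocated contiguous buffer.
 * The caller owns the returned buffer and must free() it, which in the
 * scenario above would only happen once the request has been acked.
 * Returns NULL if the allocation fails.
 */
char *flatten_iovec(const struct iovec *iov, int iovcnt, size_t *out_len)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;

    /* Allocate at least one byte so an empty list still yields a valid pointer. */
    char *buf = malloc(total ? total : 1);
    if (buf == NULL)
        return NULL;

    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len > 0) {
            memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
            off += iov[i].iov_len;
        }
    }

    *out_len = total;
    return buf;
}
```

With N requests in flight, the memory held by such copies is roughly the sum of the N payload sizes, which is why an unbounded number of in-flight writes shows up as RSS growth.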
I was a bit concerned that I'd need to extend the IO throttling code to
support a limit on total inflight bytes, but it doesn't look like that
will be necessary.
It seems that using mallopt() to set the trim and mmap thresholds
(M_TRIM_THRESHOLD and M_MMAP_THRESHOLD) to 128K is enough to limit the
increase in RSS and also to let it drop back down after an I/O burst.
For now this looks like it should be sufficient for our purposes.
I'm actually a bit surprised I didn't have to go lower, but it seems to
work for both the "dd" and dbench test cases, so we'll give it a try.
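
For reference, a minimal sketch of the mallopt() tuning described above, assuming glibc. The helper name tune_malloc_thresholds() and the macro HEAP_THRESHOLD_BYTES are made up for illustration, and where such a call would go in QEMU's startup path is not something this sketch tries to answer.

```c
#include <malloc.h>   /* mallopt(), M_TRIM_THRESHOLD, M_MMAP_THRESHOLD (glibc) */

/* 128K, the value that was enough in the tests described above. */
#define HEAP_THRESHOLD_BYTES (128 * 1024)

/*
 * Hypothetical helper (not a QEMU function): set both thresholds.
 * Returns 0 on success, -1 if either mallopt() call was rejected.
 */
int tune_malloc_thresholds(void)
{
    /* mallopt() returns 1 on success and 0 on error. */
    if (mallopt(M_TRIM_THRESHOLD, HEAP_THRESHOLD_BYTES) != 1)
        return -1;
    if (mallopt(M_MMAP_THRESHOLD, HEAP_THRESHOLD_BYTES) != 1)
        return -1;
    return 0;
}
```

According to mallopt(3), 128K is also glibc's initial default for both parameters, but setting either one explicitly disables glibc's dynamic adjustment of the thresholds, which may be why a lower value was not needed. The same man page documents the MALLOC_TRIM_THRESHOLD_ and MALLOC_MMAP_THRESHOLD_ environment variables as a way to try these settings without rebuilding.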
Chris