From: Chris Friesen <chris.friesen@windriver.com>
To: "Benoît Canet" <benoit.canet@irqsave.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] is there a limit on the number of in-flight I/O operations?
Date: Mon, 25 Aug 2014 15:50:02 -0600	[thread overview]
Message-ID: <53FBAF8A.3050005@windriver.com> (raw)
In-Reply-To: <20140823075658.GA6687@irqsave.net>

On 08/23/2014 01:56 AM, Benoît Canet wrote:
> The Friday 22 Aug 2014 à 18:59:38 (-0600), Chris Friesen wrote :
>> On 07/21/2014 10:10 AM, Benoît Canet wrote:
>>> The Monday 21 Jul 2014 à 09:35:29 (-0600), Chris Friesen wrote :
>>>> On 07/21/2014 09:15 AM, Benoît Canet wrote:
>>>>> The Monday 21 Jul 2014 à 08:59:45 (-0600), Chris Friesen wrote :
>>>>>> On 07/19/2014 02:45 AM, Benoît Canet wrote:
>>>>>>
>>>>>>> I think in the throttling case the number of in-flight operations is limited by
>>>>>>> the emulated hardware queue.  Otherwise requests would pile up and throttling would be
>>>>>>> ineffective.
>>>>>>>
>>>>>>> So this number should be around: #define VIRTIO_PCI_QUEUE_MAX 64 or something like that.
>>>>>>
>>>>>> Okay, that makes sense.  Do you know how much data can be written as part of
>>>>>> a single operation?  We're using 2MB hugepages for the guest memory, and we
>>>>>> saw the qemu RSS numbers jump from 25-30MB during normal operation up to
>>>>>> 120-180MB when running dbench.  I'd like to know what the worst-case would
>>>
>>> Sorry, I didn't understand this part at first read.
>>>
>>> In the Linux guest, can you monitor:
>>> benoit@Laure:~$ cat /sys/class/block/xyz/inflight ?
>>>
>>> This would give us a fairly precise count of the requests actually in flight between the guest and qemu.
>>
>>
>> After a bit of a break I'm looking at this again.
>>
>
> Strange.
>
> I would use dd with the flag oflag=nocache to make sure the write request
> does not just sit in the guest cache, though.
>
> Best regards
>
> Benoît
>
>> While doing "dd if=/dev/zero of=testfile bs=1M count=700" in the guest, I
>> got a max "inflight" value of 181.  This seems quite a bit higher than
>> VIRTIO_PCI_QUEUE_MAX.
>>
>> I've seen throughput as high as ~210 MB/sec, which also kicked the RSS
>> numbers up above 200MB.
>>
>> I tried dropping VIRTIO_PCI_QUEUE_MAX down to 32 (it didn't seem to work at
>> all for values much less than that, though I didn't bother getting an exact
>> value) and it didn't really make any difference; I saw inflight values as
>> high as 177.

I think I might have a glimmering of what's going on.  Someone please 
correct me if I get something wrong.

I think that VIRTIO_PCI_QUEUE_MAX doesn't really mean anything with 
respect to max inflight operations, and neither does virtio-blk calling 
virtio_add_queue() with a queue size of 128.
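For reference, virtio-blk sets that queue up in its realize function with
something roughly like this (paraphrased from the 2.1-era
hw/block/virtio-blk.c, so the surrounding context may differ):

    /* The "128" is the size of the descriptor ring the guest sees, not a
     * cap on how many requests qemu itself has outstanding. */
    s->vq = virtio_add_queue(vdev, 128, virtio_blk_handle_output);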

I think what's happening is that virtio_blk_handle_output() spins, 
pulling data off the 128-entry queue and calling 
virtio_blk_handle_request().  At this point that queue entry can be 
reused, so the queue size isn't really relevant.
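As far as I can tell the loop looks roughly like this (again paraphrased
from memory, so details may be off):

    static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
    {
        VirtIOBlock *s = VIRTIO_BLK(vdev);
        VirtIOBlockReq *req;
        MultiReqBuffer mrb = { .num_writes = 0 };

        /* Pop every available descriptor; each pop frees its ring slot,
         * so the guest can immediately queue more work. */
        while ((req = virtio_blk_get_request(s))) {
            virtio_blk_handle_request(req, &mrb);
        }

        /* Push down whatever writes handle_request batched up. */
        virtio_submit_multiwrite(s->bs, &mrb);
    }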

In virtio_blk_handle_write() we add the request to a MultiReqBuffer and 
every 32 writes we'll call virtio_submit_multiwrite() which calls down 
into bdrv_aio_multiwrite().  That tries to merge requests and then for 
each resulting request calls bdrv_aio_writev() which ends up calling 
qemu_rbd_aio_writev(), which calls rbd_start_aio().
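The batching point in virtio_blk_handle_write() is roughly the following
(paraphrased; the literal 32 is just the size of the blkreq array inside
MultiReqBuffer):

    if (mrb->num_writes == 32) {
        /* Batch is full: hand the accumulated writes to the block layer. */
        virtio_submit_multiwrite(req->dev->bs, mrb);
    }

    /* Stash this request in the batch.  Nothing here counts or caps what
     * is already in flight below bdrv_aio_multiwrite(). */
    mrb->blkreq[mrb->num_writes].qiov = &req->qiov;
    mrb->num_writes++;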

rbd_start_aio() allocates a single contiguous buffer and copies the data 
out of the iovec into it.  That buffer stays allocated until the request is 
acked, which is where the bulk of the memory overhead with rbd comes from 
(has anyone considered adding iovec support to rbd to avoid this extra copy?).
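The allocation and copy in block/rbd.c look roughly like this (paraphrased;
rbd_aio_write() is the librbd call that only takes a plain char buffer):

    /* One contiguous bounce buffer per request, sized to the whole iovec. */
    acb->bounce = qemu_blockalign(bs, qiov->size);
    if (cmd == RBD_AIO_WRITE) {
        qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
    }

    /* ... set up the rbd completion ... */

    /* The bounce buffer can't be freed until librbd acks the request. */
    rbd_aio_write(s->image, off, size, acb->bounce, c);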

The only limit I see in the whole call chain from 
virtio_blk_handle_request() on down is the call to 
bdrv_io_limits_intercept() in bdrv_co_do_writev().  However, that 
doesn't provide any limit on the absolute number of inflight operations, 
only on operations/sec.  If the ceph server cluster can't keep up with 
the aggregate load, then the number of inflight operations can still 
grow indefinitely.
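If we wanted a hard cap, I imagine it would look something like the sketch
below, somewhere in the coroutine write path (purely hypothetical, not
existing code; max_inflight, inflight_reqs and inflight_queue are names I
just made up):

    /* Hypothetical: make the coroutine wait once too many requests are
     * outstanding for this BlockDriverState. */
    while (bs->inflight_reqs >= bs->max_inflight) {
        qemu_co_queue_wait(&bs->inflight_queue);
    }
    bs->inflight_reqs++;

    /* ... issue the request as today ... */

    /* And in the completion path: */
    bs->inflight_reqs--;
    qemu_co_queue_next(&bs->inflight_queue);

That would at least bound qemu's memory use even when the ceph cluster 
can't keep up.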

Chris
