qemu-devel.nongnu.org archive mirror
From: Anthony Liguori <anthony@codemonkey.ws>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Juan Quintela <quintela@trasno.org>,
	qemu-devel@nongnu.org, kvm-devel <kvm@vger.kernel.org>,
	Juan Quintela <quintela@redhat.com>
Subject: [Qemu-devel] Re: [PATCH 09/10] Exit loop if we have been there too long
Date: Tue, 30 Nov 2010 09:00:09 -0600
Message-ID: <4CF51179.9070306@codemonkey.ws>
In-Reply-To: <4CF5063B.7020504@redhat.com>

On 11/30/2010 08:12 AM, Paolo Bonzini wrote:
> On 11/30/2010 02:47 PM, Anthony Liguori wrote:
>> On 11/30/2010 01:15 AM, Paolo Bonzini wrote:
>>> On 11/30/2010 03:11 AM, Anthony Liguori wrote:
>>>>
>>>> BufferedFile should hit the qemu_file_rate_limit check when the socket
>>>> buffer gets filled up.
>>>
>>> The problem is that the file rate limit is not hit because work is
>>> done elsewhere. The rate can limit the bandwidth used and makes QEMU
>>> aware that socket operations may block (because that's what the
>>> buffered file freeze/unfreeze logic does); but it cannot be used to
>>> limit the _time_ spent in the migration code.
>>
>> Yes, it can, if you set the rate limit sufficiently low.
>
> You mean, just like you can drive a car without brakes by keeping the 
> speed sufficiently low.
>
>> [..] accounting zero pages as full sized
>> pages should "fix" the problem.
>
> I know you used quotes, but it's a very very generous definition of 
> fix.  Both these proposed "fixes" are nothing more than workarounds, 
> and even particularly ugly ones.  The worst thing about them is that 
> there is no guarantee of migration finishing in a reasonable time, or 
> at all.
>
> If you account zero pages as full, you don't make effective use of the
> bandwidth that was allotted to you; you use only 0.2% of it (8/4096).
> It then takes an exaggerated amount of time to start iterating on pages
> that matter.  If you set the bandwidth low, instead, you do not have
> the bandwidth you need in order to converge.
>
> Even from an aesthetic point of view, if there is such a thing, I 
> don't understand why you advocate conflating network bandwidth and CPU 
> usage into a single measurement.  Nobody disagrees that all you 
> propose is nice to have, and that what Juan sent is a stopgap measure 
> (though a very effective one).  However, this doesn't negate that 
> Juan's accounting patches make a lot of sense in the current design.

Juan's patch, IIUC, does the following: If you've been iterating in a 
tight loop, return to the main loop for *one* iteration every 50ms.
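
A minimal sketch of that behaviour, assuming a hypothetical run_bounded()
helper and a 50ms budget taken from the figure above. None of these names
come from Juan's patch or from QEMU, and the real code may well be
structured differently:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical budget, taken from the 50ms figure in the message. */
#define MAX_LOOP_NS (50 * 1000 * 1000LL)

/* Monotonic clock reading in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical unit of work: returns false once nothing is left to do. */
typedef bool (*work_fn)(void *opaque);

/* Example work item: counts an unsigned long down to zero. */
static bool count_down(void *opaque)
{
    unsigned long *left = opaque;
    if (*left == 0) {
        return false;
    }
    (*left)--;
    return true;
}

/*
 * Call step() repeatedly until it reports no more work or until more
 * than budget_ns has elapsed.  Returns the number of completed steps.
 * The caller is expected to go back to its main loop afterwards.
 */
static unsigned long run_bounded(work_fn step, void *opaque, int64_t budget_ns)
{
    assert(budget_ns > 0);
    int64_t start = now_ns();
    unsigned long done = 0;

    while (step(opaque)) {
        done++;
        if (now_ns() - start > budget_ns) {
            break;      /* been here too long: yield to the main loop */
        }
    }
    return done;
}
```

A caller would return to its main loop whenever run_bounded() comes back
with work still pending, and call it again on a later iteration.  Note that
the budget only caps each burst; it says nothing about how long the main
loop runs in between.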

But this means that during this 50ms period of time, a VCPU may be 
blocked from running.  If the guest isn't doing a lot of device I/O 
*and* you're on a relatively low link speed, then this will mean that 
you don't hold qemu_mutex for more than 50ms at a time.

But in the degenerate case where you have a high speed link and you have 
a guest doing a lot of device I/O, you'll see the guest VCPU being 
blocked for 50ms, then getting to run for a very brief period of time, 
followed by another block for 50ms.  The guest's execution will be 
extremely sporadic.

This isn't fixable with this approach.  The only way to really fix this
is to say that over a given period of time, migration may only consume
XX amount of CPU time, which guarantees the VCPUs get the qemu_mutex for
the rest of the time.

This is exactly what rate limiting does.  Yes, it results in a longer 
migration time but that's the trade-off we have to make if we want 
deterministic VCPU execution until we can implement threading properly.
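
As a rough illustration of a per-window budget, here is a sketch in which
every page scanned is charged at full size (the zero-page accounting
discussed earlier in the thread), so the work done per window is bounded
regardless of how many bytes reach the wire.  The struct and function
names are hypothetical and are not QEMU's buffered-file or rate-limit
interfaces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page size matching the 8/4096 figure quoted in the thread. */
#define PAGE_SIZE_BYTES 4096u

/* Hypothetical per-window budget; not QEMU's actual data structure. */
struct window_budget {
    uint64_t limit_bytes;   /* bytes that may be charged per window */
    uint64_t used_bytes;    /* bytes charged so far in this window  */
};

static bool budget_exhausted(const struct window_budget *b)
{
    return b->used_bytes >= b->limit_bytes;
}

/* Would be called from a periodic timer at the start of each window. */
static void budget_new_window(struct window_budget *b)
{
    b->used_bytes = 0;
}

/*
 * Process pages until the window's budget is used up or no pages remain.
 * Each page is charged as a full page even if it would compress to a few
 * bytes on the wire.  Returns the number of pages processed.
 */
static uint64_t process_window(struct window_budget *b, uint64_t pages_left)
{
    assert(b->limit_bytes > 0);
    uint64_t done = 0;

    while (pages_left > 0 && !budget_exhausted(b)) {
        b->used_bytes += PAGE_SIZE_BYTES;
        pages_left--;
        done++;
    }
    return done;
}
```

With a limit of ten pages per window, 25 pending pages take three windows
(10, 10, 5); the rest of each window is time in which the migration code
is not doing work.  This is the longer migration time mentioned above.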

If you want a simple example, do I/O with the rtl8139 adapter while
doing your migration test and run a tight loop in the guest calling
gettimeofday().  Graph the results to see how much execution time the
guest is actually getting.
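
One possible shape for that guest-side loop, as a sketch only: the
measure_gaps() helper, the gap_stats struct and the stall threshold are
illustrative names, not something from the patch series.  It spins on
gettimeofday() and records the gaps between consecutive readings; large
gaps are periods in which the VCPU did not get to run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Wall-clock time in microseconds, as reported by gettimeofday(). */
static int64_t now_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

/* Hypothetical summary of one measurement run. */
struct gap_stats {
    int64_t  max_gap_us;   /* largest gap between two readings        */
    uint64_t samples;      /* number of gettimeofday() readings taken */
    uint64_t stalls;       /* gaps at or above the stall threshold    */
};

/*
 * Spin for roughly duration_us, calling gettimeofday() back to back.
 * A gap of stall_threshold_us or more between two consecutive readings
 * is counted as a stall.
 */
static struct gap_stats measure_gaps(int64_t duration_us,
                                     int64_t stall_threshold_us)
{
    assert(duration_us > 0 && stall_threshold_us > 0);
    struct gap_stats s = { 0, 0, 0 };
    int64_t start = now_us();
    int64_t prev = start;

    for (;;) {
        int64_t cur = now_us();
        int64_t gap = cur - prev;

        if (gap > s.max_gap_us) {
            s.max_gap_us = gap;
        }
        if (gap >= stall_threshold_us) {
            s.stalls++;
        }
        s.samples++;
        prev = cur;
        if (cur - start >= duration_us) {
            break;
        }
    }
    return s;
}
```

Running this in the guest for the length of the migration, for example
with a threshold of a few milliseconds, and logging each stall instead of
only counting it would give the data to graph.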


>> In the long term, we need a new dirty bit interface from kvm.ko that
>> uses a multi-level table. That should dramatically improve scan
>> performance. We also need to implement live migration in a separate
>> thread that doesn't carry qemu_mutex while it runs.
>
> This may be a good way to fix it, but it's also basically a rewrite.

The only correct short-term solution I can see is rate limiting,
unfortunately.

Regards,

Anthony Liguori

> Paolo
