Re: [Qemu-devel] migration: broken ram_save_pending

From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Paolo Bonzini <pbonzini@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Cc: Alex Graf <agraf@suse.de>
Subject: Re: [Qemu-devel] migration: broken ram_save_pending
Date: Tue, 04 Feb 2014 22:59:13 +1100
Message-ID: <52F0D611.7070105@ozlabs.ru>
In-Reply-To: <52F0C523.30102@redhat.com>

On 02/04/2014 09:46 PM, Paolo Bonzini wrote:
> On 04/02/2014 08:15, Alexey Kardashevskiy wrote:
>> So. migration_thread() gets dirty pages number, tries to send them in a
>> loop but every iteration resets the number of pages to 96 and we start
>> again. After several tries we cross BUFFER_DELAY timeout and calculate new
>> @max_size and if the host machine is fast enough it is bigger than 393216
>> and next loop will finally finish the migration.
> 
> This should have happened pretty much immediately, because it's not while
> (pending()) but rather
> 
>             while (pending_size && pending_size >= max_size)
> 
> (it's an "if" in the code, but the idea is the same).  And max_size is the
> following:
> 
>             max_size = bandwidth * migrate_max_downtime() / 1000000;
> 
> With the default throttling of 32 MiB/s, bandwidth must be something like
> 33000 (expressed in bytes/ms) with the default settings, and then max_size
> should be 33000*3*10^9 / 10^6 = 6000000.  Where is my computation wrong?


migrate_max_downtime() = 30000000 = 3*10^7.

While the migration is in the iterating stage, bandwidth is the speed over
the last 100ms, which is usually 5 blocks of 250KB each, so it is
1250000/100=12500 bytes/ms and max_size=12500*30000000/10^6=375000, which is
less than the last chunk (96 pages = 393216 bytes).
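
To make the arithmetic above concrete, here is a minimal standalone sketch
in C. It is not the migration_thread() code: the three helper names are
made up for illustration, and it assumes the downtime value is in
nanoseconds (which is what makes the units come out as bytes). The formula
and the loop condition are the ones Paolo quoted, and the inputs are the
numbers from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average speed over one measurement window,
 * in bytes per millisecond. */
static uint64_t bandwidth_bytes_per_ms(uint64_t bytes, uint64_t ms)
{
    assert(ms > 0);
    return bytes / ms;
}

/* Hypothetical helper mirroring the quoted formula
 *   max_size = bandwidth * migrate_max_downtime() / 1000000
 * with bandwidth in bytes/ms and the downtime assumed to be in ns. */
static uint64_t max_size_bytes(uint64_t bandwidth, uint64_t max_downtime_ns)
{
    return bandwidth * max_downtime_ns / 1000000;
}

/* Hypothetical helper mirroring the quoted condition
 *   pending_size && pending_size >= max_size
 * Nonzero means another iteration is needed before completion. */
static int keeps_iterating(uint64_t pending_size, uint64_t max_size)
{
    return pending_size && pending_size >= max_size;
}
```

Feeding it 1250000 bytes over 100 ms and a downtime of 30000000 gives a
max_size of 375000, and keeps_iterating(96 * 4096, 375000) is nonzero
because 393216 >= 375000, so the loop does not finish.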


> 
> Also, did you profile it to find the hotspot?  Perhaps the bitmap
> operations are taking a lot of time.  How big is the guest?

1024MB.


>  Juan's patches
> were optimizing the bitmaps but not all of them apply to your case because
> of hpratio.

This I had to disable :)


>> I can only think of something simple like below and not sure it does not
>> break other things. I would expect ram_save_pending() to return correct
>> number of bytes QEMU is going to send rather than number of pages
>> multiplied by 4096 but checking if all these pages are really empty is not
>> too cheap.
> 
> If you use qemu_update_position you will use very little bandwidth in the
> case where a lot of pages are zero.

My guest migrates in a second or so. I guess in this case
qemu_file_rate_limit() limits the speed and it does not look at QEMUFile::pos.
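
A toy model of that guess, in C. This is not QEMU code: the ToyFile struct,
its field names and all three helpers are invented for illustration, and
the behaviour it models (the rate limiter counts only bytes really written,
while qemu_update_position() only advances the position) is the assumption
stated above, not something verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for QEMUFile; the field names are assumptions. */
typedef struct ToyFile {
    uint64_t pos;         /* logical stream position */
    uint64_t bytes_xfer;  /* bytes counted against the rate limit */
    uint64_t xfer_limit;  /* bytes allowed per rate-limit window */
} ToyFile;

/* Really writing data advances both counters. */
static void toy_put_bytes(ToyFile *f, size_t size)
{
    f->pos += size;
    f->bytes_xfer += size;
}

/* Models qemu_update_position(): only the position moves. */
static void toy_update_position(ToyFile *f, size_t size)
{
    f->pos += size;
}

/* Models qemu_file_rate_limit(): nonzero means "stop sending for now". */
static int toy_rate_limit(const ToyFile *f)
{
    assert(f != NULL);
    return f->xfer_limit > 0 && f->bytes_xfer > f->xfer_limit;
}
```

In this model a zero page sent as a short header plus one byte, followed by
toy_update_position(&f, 4096), moves pos by a full page while bytes_xfer
barely changes, so the limiter never notices the zero pages.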


> What you mention in ram_save_pending() is not problematic just because of
> finding if the pages are empty, but also because you have to find the
> nonzero spots in the bitmap!

Sure.


> 
> Paolo
> 
>> Thanks!
>>
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 2ba297e..90949b0 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -537,16 +537,17 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>>                          acct_info.dup_pages++;
>>                      }
>>                  }
>>              } else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
>>                  acct_info.dup_pages++;
>>                  bytes_sent = save_block_hdr(f, block, offset, cont,
>>                                              RAM_SAVE_FLAG_COMPRESS);
>>                  qemu_put_byte(f, 0);
>> +                qemu_update_position(f, TARGET_PAGE_SIZE);
>>                  bytes_sent++;
>>              } else if (!ram_bulk_stage && migrate_use_xbzrle()) {
>>                  current_addr = block->offset + offset;
>>                  bytes_sent = save_xbzrle_page(f, p, current_addr, block,
>>                                                offset, cont, last_stage);
>>                  if (!last_stage) {
>>                      p = get_cached_data(XBZRLE.cache, current_addr);
>>                  }
>>
>>
> 
> 


-- 
Alexey
