Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()


From: Balamuruhan S <bala24@linux.vnet.ibm.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, amit.shah@redhat.com, quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
Date: Wed, 04 Apr 2018 11:55:14 +0530
Message-ID: <bb088a1ba2e344273db32402f7c9fae4@linux.vnet.ibm.com>

On 2018-04-04 07:29, Peter Xu wrote:
> On Tue, Apr 03, 2018 at 11:00:00PM +0530, bala24 wrote:
>> On 2018-04-03 11:40, Peter Xu wrote:
>> > On Sun, Apr 01, 2018 at 12:25:36AM +0530, Balamuruhan S wrote:
>> > > expected_downtime value is not accurate with dirty_pages_rate *
>> > > page_size; using ram_bytes_remaining() yields the correct value.
>> > >
>> > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
>> > > ---
>> > >  migration/migration.c | 3 +--
>> > >  1 file changed, 1 insertion(+), 2 deletions(-)
>> > >
>> > > diff --git a/migration/migration.c b/migration/migration.c
>> > > index 58bd382730..4e43dc4f92 100644
>> > > --- a/migration/migration.c
>> > > +++ b/migration/migration.c
>> > > @@ -2245,8 +2245,7 @@ static void migration_update_counters(MigrationState *s,
>> > >       * recalculate. 10000 is a small enough number for our purposes
>> > >       */
>> > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
>> > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
>> > > -            qemu_target_page_size() / bandwidth;
>> > > +        s->expected_downtime = ram_bytes_remaining() / bandwidth;
>> >
>> > This field was removed in e4ed1541ac ("savevm: New save live migration
>> > method: pending", 2012-12-20), in which remaining RAM was used.
>> >
>> > And it was added back in 90f8ae724a ("migration: calculate
>> > expected_downtime", 2013-02-22), in which dirty rate was used.
>> >
>> > However, I couldn't find any clue as to why we changed from using
>> > remaining RAM to using the dirty rate...  So I'll leave this question
>> > to Juan.
>> >
>> > Besides, I'm a bit confused about when we'd want such a value.  AFAIU
>> > precopy is mostly used by setting up the target downtime beforehand,
>> > so we should already know the downtime beforehand.  Then why would we
>> > want to observe such a thing?
>> 
>> Thanks Peter Xu for reviewing,
>> 
>> I tested precopy migration with a 16M hugepage-backed ppc guest.  The
>> page granularity in migration is 4K, so any dirtied page results in
>> 4096 pages being transmitted again, and this led to the migration
>> continuing endlessly.
>> 
>> default migrate_parameters:
>> downtime-limit: 300 milliseconds
>> 
>> info migrate:
>> expected downtime: 1475 milliseconds
>> 
>> Migration status: active
>> total time: 130874 milliseconds
>> expected downtime: 1475 milliseconds
>> setup: 3475 milliseconds
>> transferred ram: 18197383 kbytes
>> throughput: 866.83 mbps
>> remaining ram: 376892 kbytes
>> total ram: 8388864 kbytes
>> duplicate: 1678265 pages
>> skipped: 0 pages
>> normal: 4536795 pages
>> normal bytes: 18147180 kbytes
>> dirty sync count: 6
>> page size: 4 kbytes
>> dirty pages rate: 39044 pages
>> 
>> In order to complete the migration I configured downtime-limit to 1475
>> milliseconds, but the migration was still endless.  I then calculated
>> the expected downtime from the remaining ram: 376892 kbytes / 866.83 mbps
>> yielded 3478.34 milliseconds, and configuring that as downtime-limit
>> allowed the migration to complete.  This led to the conclusion that the
>> expected downtime is not accurate.
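The two estimates can be compared with a small C sketch (not QEMU code; all helper names are hypothetical). It assumes that `bandwidth` in migration_update_counters() is measured in bytes per millisecond, and that the "throughput" line of `info migrate` reports megabits per second over the same window, so bandwidth = mbps * 125.

```c
#include <stdint.h>

/* Hypothetical helper: convert the HMP "throughput" figure (Mbit/s,
 * 10^6 bits per second) to bytes per millisecond, the unit assumed for
 * `bandwidth` in migration_update_counters().
 * 10^6 bits/s / 8 bits per byte / 1000 ms per s = 125 bytes/ms per Mbit/s. */
static double mbps_to_bytes_per_ms(double mbps)
{
    return mbps * 125.0;
}

/* Existing estimate: dirty_pages_rate * page_size / bandwidth.
 * Truncated to whole milliseconds, assuming an integer expected_downtime. */
static int64_t downtime_from_dirty_rate_ms(uint64_t dirty_pages_rate,
                                           uint64_t page_size, double mbps)
{
    double dirty_bytes = (double)(dirty_pages_rate * page_size);
    return (int64_t)(dirty_bytes / mbps_to_bytes_per_ms(mbps));
}

/* Patched estimate: ram_bytes_remaining() / bandwidth.
 * `bytes_per_kbyte` selects how the "remaining ram: ... kbytes" figure
 * is scaled back to bytes (1000 or 1024). */
static double downtime_from_remaining_ms(uint64_t remaining_kbytes,
                                         uint64_t bytes_per_kbyte, double mbps)
{
    double remaining_bytes = (double)(remaining_kbytes * bytes_per_kbyte);
    return remaining_bytes / mbps_to_bytes_per_ms(mbps);
}
```

With the numbers above, downtime_from_dirty_rate_ms(39044, 4096, 866.83) returns 1475, matching the reported expected downtime, and downtime_from_remaining_ms(376892, 1000, 866.83) returns about 3478.35 ms, reproducing the figure quoted above. Taking the kbytes as 1024-byte units instead gives about 3561.8 ms.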
> 
> Hmm, thanks for the information.  I'd say your calculation seems
> reasonable to me: it shows how long it would take if we stopped the
> VM on the source immediately and migrated the rest.  However, Juan might
> have an explanation of the existing algorithm, which I would like to know
Sure, I agree

> too.  So I'll still put aside the "which one is better" question.
> 
> For your use case, you can look at either of the approaches below to
> get a converging migration:
> 
> - auto-converge: that's a migration capability that throttles CPU
>   usage of guests

I had already tried the auto-converge option, and it still doesn't help
the migration to complete.

> 
> - postcopy: that'll let you start the destination VM even without
>   transferring all the RAM beforehand

I am seeing an issue with postcopy migration from POWER8 (16M) to
POWER9 (1G), where the hugepage sizes differ.  I am trying to enable it,
but the host start address has to be aligned to the 1G page size in
ram_block_discard_range(), which I am debugging further to fix.
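To illustrate the alignment constraint, here is a hypothetical C check in the spirit of the one described for ram_block_discard_range(): an address that is aligned to a 16M source page is not necessarily aligned to a 1G destination page. The function and macro names are assumptions for illustration, not QEMU's API.

```c
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)

/* Hypothetical helper: true if `addr` is a multiple of `page_size`.
 * Assumes `page_size` is a power of two, as hugepage sizes are, so the
 * low bits below the page size must all be zero. */
static bool host_addr_is_aligned(uint64_t addr, uint64_t page_size)
{
    return (addr & (page_size - 1)) == 0;
}
```

For example, an offset of 16 MiB passes the check for a 16M page size but fails it for a 1G page size, which is the kind of mismatch a POWER8 (16M) to POWER9 (1G) postcopy discard can run into.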

Regards,
Balamuruhan S

> 
> Either technique can be enabled via the "migrate_set_capability" HMP
> command or the "migrate-set-capabilities" QMP command (some googling
> would show detailed steps).  Either of the above should help you
> migrate successfully in this hard-to-converge scenario, instead of
> your current approach (observing the downtime, then setting it).
> 
> Meanwhile, I'm wondering whether, instead of observing the downtime in
> real time, we should introduce a command to stop the VM immediately
> and migrate the rest whenever we want, or a new parameter to the
> current "migrate" command.


Thread overview: 15+ messages
2018-04-04  6:25 Balamuruhan S [this message]
2018-04-04  8:06 ` [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining() Peter Xu
2018-04-04  8:49   ` Balamuruhan S
2018-04-09 18:57     ` Dr. David Alan Gilbert
2018-04-10  1:22       ` David Gibson
2018-04-10 10:02         ` Dr. David Alan Gilbert
2018-04-11  1:28           ` David Gibson
  -- strict thread matches above, loose matches on Subject: below --
2018-03-31 18:55 Balamuruhan S
2018-04-03  6:10 ` Peter Xu
2018-04-03 17:30   ` bala24
2018-04-04  1:59     ` Peter Xu
2018-04-04  9:02   ` Juan Quintela
2018-04-04  9:04 ` Juan Quintela
2018-04-10  9:52   ` Balamuruhan S
2018-04-10 10:52     ` Balamuruhan S
