Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Balamuruhan S <bala24@linux.vnet.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>,
	amit.shah@redhat.com, mdroth@linux.vnet.ibm.com,
	quintela@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
Date: Wed, 18 Apr 2018 09:36:33 +0100
Message-ID: <20180418083632.GB2710@work-vm>
In-Reply-To: <20180418064641.GA12871@9.122.211.20>

* Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> On Wed, Apr 18, 2018 at 10:57:26AM +1000, David Gibson wrote:
> > On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> > > On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > > > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > > > using ram_bytes_remaining would yield it correct.
> > > 
> > > This commit message hasn't been changed since v1, but the patch is
> > > doing something completely different.  I think most of the info from
> > > your cover letter needs to be in here.
> > > 
> > > > 
> > > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > > ---
> > > >  migration/migration.c | 6 +++---
> > > >  migration/migration.h | 1 +
> > > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > index 52a5092add..4d866bb920 100644
> > > > --- a/migration/migration.c
> > > > +++ b/migration/migration.c
> > > > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> > > >      }
> > > >  
> > > >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > > > -        info->ram->remaining = ram_bytes_remaining();
> > > > +        info->ram->remaining = s->ram_bytes_remaining;
> > > >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> > > >      }
> > > >  }
> > > > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> > > >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> > > >      time_spent = current_time - s->iteration_start_time;
> > > >      bandwidth = (double)transferred / time_spent;
> > > > +    s->ram_bytes_remaining = ram_bytes_remaining();
> > > >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> > > >  
> > > >      s->mbps = (((double) transferred * 8.0) /
> > > > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> > > >       * recalculate. 10000 is a small enough number for our purposes
> > > >       */
> > > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > > > -            qemu_target_page_size() / bandwidth;
> > > > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> > > >      }
> > 
> > ..but more importantly, I still think this change is bogus.  expected
> > downtime is not the same thing as remaining ram / bandwidth.
> 
> I tested precopy migration of 16M HP backed P8 guest from P8 to 1G P9 host
> and observed precopy migration was infinite with expected_downtime set as
> downtime-limit.

Did you debug why it was infinite? Which component of the calculation
had gone wrong and why?

> During the discussion for Bug RH1560562, Michael Roth noted:
> 
> One thing to note: in my testing I found that the "expected downtime" value
> seems inaccurate in this scenario. To find a max downtime that allowed
> migration to complete I had to divide "remaining ram" by "throughput" from
> "info migrate" (after the initial pre-copy pass through ram, i.e. once
> "dirty pages" value starts getting reported and we're just sending dirtied
> pages).
> 
> Later, when I tried it, precopy migration was able to complete with this
> approach.
> 
> Adding Michael Roth in Cc.

We should try and _understand_ the rationale for the change, not just go
with it.  Now, remember that whatever we do is just an estimate and
there will be lots of cases where it's bad - so be careful what you're
using it for - you definitely should NOT use the value in any automated
system.

My problem with just using ram_bytes_remaining is that it doesn't take
into account the rate at which the guest is changing RAM - which feels
like it's the important measure for expected downtime.
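
To make the difference between the two estimates concrete, here is a
minimal sketch that puts them side by side. It is not QEMU code: the
function names are hypothetical, and it assumes the dirty rate is in pages
per second and the bandwidth in bytes per millisecond (the latter matching
transferred / time_spent in the quoted hunk, if time_spent is in ms).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the expression the patch removes:
 * time (ms) needed to send the pages the guest dirties in one second.
 * Assumes dirty_pages_per_sec is pages/s and bandwidth is bytes/ms.
 */
static double downtime_from_dirty_rate(uint64_t dirty_pages_per_sec,
                                       uint64_t page_size,
                                       double bandwidth_bytes_per_ms)
{
    assert(bandwidth_bytes_per_ms > 0.0);
    return (double)dirty_pages_per_sec * (double)page_size /
           bandwidth_bytes_per_ms;
}

/*
 * Hypothetical helper mirroring the expression the patch adds:
 * time (ms) needed to send everything still outstanding right now.
 * It ignores how quickly the guest keeps dirtying RAM.
 */
static double downtime_from_remaining(uint64_t remaining_bytes,
                                      double bandwidth_bytes_per_ms)
{
    assert(bandwidth_bytes_per_ms > 0.0);
    return (double)remaining_bytes / bandwidth_bytes_per_ms;
}
```

With the same bandwidth the two can diverge widely: the first depends only
on how fast the guest writes, the second only on a snapshot of what is
left to send, so neither is a substitute for understanding which input
was off in the failing case.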

Dave

> Regards,
> Bala
> 
> > 
> > > >  
> > > >      qemu_file_reset_rate_limit(s->to_dst_file);
> > > > diff --git a/migration/migration.h b/migration/migration.h
> > > > index 8d2f320c48..8584f8e22e 100644
> > > > --- a/migration/migration.h
> > > > +++ b/migration/migration.h
> > > > @@ -128,6 +128,7 @@ struct MigrationState
> > > >      int64_t downtime_start;
> > > >      int64_t downtime;
> > > >      int64_t expected_downtime;
> > > > +    int64_t ram_bytes_remaining;
> > > >      bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
> > > >      int64_t setup_time;
> > > >      /*
> > > 
> > 
> > 
> > 
> > -- 
> > David Gibson			| I'll have my music baroque, and my code
> > david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> > 				| _way_ _around_!
> > http://www.ozlabs.org/~dgibson
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
