Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()


From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Balamuruhan S <bala24@linux.vnet.ibm.com>,
	quintela@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
Date: Fri, 20 Apr 2018 11:28:04 +0100
Message-ID: <20180420102803.GF2533@work-vm>
In-Reply-To: <20180420054712.GH2434@umbus.fritz.box>

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Thu, Apr 19, 2018 at 12:24:04PM +0100, Dr. David Alan Gilbert wrote:
> > * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > > On Wed, Apr 18, 2018 at 09:36:33AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > > > > On Wed, Apr 18, 2018 at 10:57:26AM +1000, David Gibson wrote:
> > > > > > On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> > > > > > > On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > > > > > > > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > > > > > > > > using ram_bytes_remaining would yield the correct value.
> > > > > > > 
> > > > > > > This commit message hasn't been changed since v1, but the patch is
> > > > > > > doing something completely different.  I think most of the info from
> > > > > > > your cover letter needs to be in here.
> > > > > > > 
> > > > > > > > 
> > > > > > > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > > > > > > ---
> > > > > > > >  migration/migration.c | 6 +++---
> > > > > > > >  migration/migration.h | 1 +
> > > > > > > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > > index 52a5092add..4d866bb920 100644
> > > > > > > > --- a/migration/migration.c
> > > > > > > > +++ b/migration/migration.c
> > > > > > > > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> > > > > > > >      }
> > > > > > > >  
> > > > > > > >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > > > > > > > -        info->ram->remaining = ram_bytes_remaining();
> > > > > > > > +        info->ram->remaining = s->ram_bytes_remaining;
> > > > > > > >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> > > > > > > >      }
> > > > > > > >  }
> > > > > > > > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > > > >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> > > > > > > >      time_spent = current_time - s->iteration_start_time;
> > > > > > > >      bandwidth = (double)transferred / time_spent;
> > > > > > > > +    s->ram_bytes_remaining = ram_bytes_remaining();
> > > > > > > >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> > > > > > > >  
> > > > > > > >      s->mbps = (((double) transferred * 8.0) /
> > > > > > > > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > > > >       * recalculate. 10000 is a small enough number for our purposes
> > > > > > > >       */
> > > > > > > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > > > > > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > > > > > > > -            qemu_target_page_size() / bandwidth;
> > > > > > > > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> > > > > > > >      }
> > > > > > 
> > > > > > ..but more importantly, I still think this change is bogus.  expected
> > > > > > downtime is not the same thing as remaining ram / bandwidth.
> > > > > 
> > > > > I tested precopy migration of 16M HP backed P8 guest from P8 to 1G P9 host
> > > > > and observed precopy migration was infinite with expected_downtime set as
> > > > > downtime-limit.
> > > > 
> > > > Did you debug why it was infinite? Which component of the calculation
> > > > had gone wrong and why?
> > > > 
> > > > > During the discussion for Bug RH1560562, Michael Roth quoted that
> > > > > 
> > > > > One thing to note: in my testing I found that the "expected downtime" value
> > > > > seems inaccurate in this scenario. To find a max downtime that allowed
> > > > > migration to complete I had to divide "remaining ram" by "throughput" from
> > > > > "info migrate" (after the initial pre-copy pass through ram, i.e. once
> > > > > "dirty pages" value starts getting reported and we're just sending dirtied
> > > > > pages).
> > > > > 
> > > > > > Later, when I tried this approach, precopy migration was able to
> > > > > > complete.
> > > > > 
> > > > > adding Michael Roth in cc.
> > > > 
> > > > > We should try and _understand_ the rationale for the change, not just go
> > > > with it.  Now, remember that whatever we do is just an estimate and
> > > 
> > > I have made the change based on my understanding,
> > > 
> > > Currently the calculation is,
> > > 
> > > expected_downtime = (dirty_pages_rate * qemu_target_page_size) / bandwidth
> > > 
> > > dirty_pages_rate = No of dirty pages / time =>  its unit (1 / seconds)
> > > qemu_target_page_size => its unit (bytes)
> > > 
> > > dirty_pages_rate * qemu_target_page_size => bytes/seconds
> > > 
> > > bandwidth = bytes transferred / time => bytes/seconds
> > > 
> > > dividing this would not be a measurement of time.
> > 
> > OK, that argument makes sense to me about why it feels broken; but see
> > below.
> > 
> > > > there will be lots of cases where it's bad - so be careful what you're
> > > > using it for - you definitely should NOT use the value in any automated
> > > > system.
> > > 
> > > > I agree with that, and I would not use it in an automated system.
> > > 
> > > > My problem with just using ram_bytes_remaining is that it doesn't take
> > > > into account the rate at which the guest is changing RAM - which feels
> > > > like it's the important measure for expected downtime.
> > > 
> > > ram_bytes_remaining = ram_state->migration_dirty_pages * TARGET_PAGE_SIZE
> > > 
> > > This means ram_bytes_remaining is proportional to guest changing RAM, so
> > > we can consider this change would yield expected_downtime
> > 
> > ram_bytes_remaining comes from the *current* number of dirty pages, so it
> > tells you how much you have to transmit, but if the guest wasn't
> > changing RAM, then that just tells you how much longer you have to keep
> > going - not the amount of downtime required.  e.g. right at the start of
> > migration you might have 16G of dirty-pages, but you don't need downtime
> > to transmit them all.
> > 
> > It's actually slightly different, because migration_update_counters is
> > called in the main iteration loop after an iteration and I think that
> > means it only ends up there either at the end of migration OR when
> > qemu_file_rate_limit(f) causes ram_save_iterate to return to the main
> > loop; so you've got the number of dirty pages when it's interrupted by
> > rate limiting.
> > 
> > So I don't think the use of ram_bytes_remaining is right either.
> > 
> > What is the right answer?
> > I'm not sure; but:
> > 
> >    a) If the bandwidth is lower then you can see the downtime should be
> > longer; so  having x/bandwidth  makes sense
> >    b) If the guest is dirtying RAM faster then you can see the downtime
> > should be longer;  so having  dirty_pages_rate on the top seems right.
> > 
> > So you can kind of see where the calculation above comes from.
> > 
> > I can't convince myself of any calculation that actually works!
> > 
> > > Let's imagine a setup with a guest dirtying memory at 'Dr' Bytes/s
> > with the bandwidth (Bw), and we enter an iteration with
> > 'Db' bytes dirty:
> > 
> >   The time for that iteration is:
> >      It   = Db / Bw
> > 
> >   during that time we've dirtied 'Dr' more RAM, so at the end of
> > it we have:
> >      Db' = Dr * It
> >          = Dr * Db
> >            -------
> >               Bw
> > 
> > > But then if you follow that, in any case where Dr < Bw that iterates
> > > down to Db' being ~0  irrespective of what that ratio is - but that
> > > makes no sense.
> 
> So, as per our IRC discussion, this is pretty hard.
> 
> That said, I think Bala's proposed patch is better than what we have
> > now.  It will initially be a gross over-estimate, but for
> non-converging migrations it should approach a reasonable estimate
> later on.  What we have now can never really be right.
> 
> So while it would be nice to have some better modelling of this long
> term, in the short term I think it makes sense to apply Bala's patch.
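
As a rough illustration of the iteration model quoted above
(Db' = Dr * Db / Bw), here is a minimal C sketch.  The function names
next_dirty_bytes and dirty_bytes_after are hypothetical, not anything in
QEMU, and the model assumes constant Dr and Bw, which a real guest and
network will not give you.  It shows the dirty byte count shrinking
geometrically whenever Dr < Bw and growing whenever Dr > Bw:

```c
#include <assert.h>
#include <stddef.h>

/*
 * One step of the quoted model (hypothetical helper, not a QEMU function).
 * Entering an iteration with db bytes dirty, sending them takes
 * It = db / bw; during that time the guest dirties dr * It bytes.
 * Returns Db' = dr * db / bw.
 */
double next_dirty_bytes(double db, double dr, double bw)
{
    assert(bw > 0.0);
    double iteration_time = db / bw;
    return dr * iteration_time;
}

/* Apply the model for n iterations, assuming dr and bw stay constant. */
double dirty_bytes_after(double db, double dr, double bw, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        db = next_dirty_bytes(db, dr, bw);
    }
    return db;
}
```

With Db = 16, Dr = 1 and Bw = 2, four iterations give 16, 8, 4, 2, 1, so
under this model any Dr < Bw eventually converges towards zero, which is
the result the quoted text finds implausible.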

I'd like to see where the original one was going wrong for Bala; my
problem is that for me, the old code (which logically is wrong) is
giving sensible results here, within a factor of 2 of the actual
downtime I needed to set.  The code may be wrong, but the results are
reasonably right.
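
To compare the two candidate estimates side by side, a small sketch
follows.  The helper names are hypothetical; the arithmetic mirrors the
removed and added lines of the patch, and the units of each argument are
left to the caller (an assumption here, since they are not spelled out
in this thread):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate used by the existing code:
 *   dirty_pages_rate * page_size / bandwidth
 * Hypothetical standalone helper, not a QEMU function.
 */
double downtime_from_dirty_rate(uint64_t dirty_pages_rate,
                                uint64_t page_size,
                                double bandwidth)
{
    assert(bandwidth > 0.0);
    return (double)dirty_pages_rate * (double)page_size / bandwidth;
}

/*
 * Estimate proposed by the patch:
 *   ram_bytes_remaining / bandwidth
 * Hypothetical standalone helper, not a QEMU function.
 */
double downtime_from_remaining(uint64_t ram_bytes_remaining,
                               double bandwidth)
{
    assert(bandwidth > 0.0);
    return (double)ram_bytes_remaining / bandwidth;
}
```

For example, with a dirty rate of 1000 pages per unit time, 4096-byte
pages and a bandwidth of 4096000 bytes per unit time, the first helper
returns 1.0; with 8192000 bytes remaining at the same bandwidth, the
second returns 2.0.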

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

