From: Balamuruhan S <bala24@linux.vnet.ibm.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: amit.shah@redhat.com, quintela@redhat.com, qemu-devel@nongnu.org,
david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
Date: Thu, 19 Apr 2018 10:14:52 +0530 [thread overview]
Message-ID: <20180419044452.GA11708@9.122.211.20> (raw)
In-Reply-To: <20180418083632.GB2710@work-vm>
On Wed, Apr 18, 2018 at 09:36:33AM +0100, Dr. David Alan Gilbert wrote:
> * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > On Wed, Apr 18, 2018 at 10:57:26AM +1000, David Gibson wrote:
> > > On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> > > > On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > > > > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > > > > using ram_bytes_remaining would yield it correct.
> > > >
> > > > This commit message hasn't been changed since v1, but the patch is
> > > > doing something completely different. I think most of the info from
> > > > your cover letter needs to be in here.
> > > >
> > > > >
> > > > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > > > ---
> > > > > migration/migration.c | 6 +++---
> > > > > migration/migration.h | 1 +
> > > > > 2 files changed, 4 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > index 52a5092add..4d866bb920 100644
> > > > > --- a/migration/migration.c
> > > > > +++ b/migration/migration.c
> > > > > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> > > > > }
> > > > >
> > > > > if (s->state != MIGRATION_STATUS_COMPLETED) {
> > > > > - info->ram->remaining = ram_bytes_remaining();
> > > > > + info->ram->remaining = s->ram_bytes_remaining;
> > > > > info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> > > > > }
> > > > > }
> > > > > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> > > > > time_spent = current_time - s->iteration_start_time;
> > > > > bandwidth = (double)transferred / time_spent;
> > > > > + s->ram_bytes_remaining = ram_bytes_remaining();
> > > > > s->threshold_size = bandwidth * s->parameters.downtime_limit;
> > > > >
> > > > > s->mbps = (((double) transferred * 8.0) /
> > > > > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > * recalculate. 10000 is a small enough number for our purposes
> > > > > */
> > > > > if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > > > - s->expected_downtime = ram_counters.dirty_pages_rate *
> > > > > - qemu_target_page_size() / bandwidth;
> > > > > + s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> > > > > }
> > >
> > > ..but more importantly, I still think this change is bogus. expected
> > > downtime is not the same thing as remaining ram / bandwidth.
> >
> > I tested precopy migration of a 16M hugepage-backed P8 guest from a P8
> > host to a P9 host backed by 1G hugepages, and observed that precopy
> > migration ran indefinitely with expected_downtime set as the downtime-limit.
>
> Did you debug why it was infinite? Which component of the calculation
> had gone wrong and why?
>
> > During the discussion for Bug RH1560562, Michael Roth quoted that
> >
> > One thing to note: in my testing I found that the "expected downtime" value
> > seems inaccurate in this scenario. To find a max downtime that allowed
> > migration to complete I had to divide "remaining ram" by "throughput" from
> > "info migrate" (after the initial pre-copy pass through ram, i.e. once
> > "dirty pages" value starts getting reported and we're just sending dirtied
> > pages).
> >
> > Later, trying this approach, precopy migration was able to complete.
> >
> > adding Michael Roth in cc.
>
> We should try and _understand_ the rational for the change, not just go
> with it. Now, remember that whatever we do is just an estimate and
I have made the change based on my understanding.

Currently the calculation is:

  expected_downtime = (dirty_pages_rate * qemu_target_page_size) / bandwidth

where:

  dirty_pages_rate = number of dirty pages / time    => unit: pages/second
  qemu_target_page_size                              => unit: bytes/page
  dirty_pages_rate * qemu_target_page_size           => unit: bytes/second
  bandwidth = bytes transferred / time               => unit: bytes/second

Dividing bytes/second by bytes/second yields a dimensionless ratio, not a
measurement of time.
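The unit mismatch above can be sketched with two hypothetical helpers
(these are not QEMU functions, only the arithmetic of the two formulas;
the names are mine):

```c
#include <assert.h>

/* Old formula: (pages/sec * bytes/page) / (bytes/sec) -> dimensionless.
 * The result has no time unit, which is the problem being described. */
double old_estimate(double dirty_pages_rate, double page_size,
                    double bandwidth)
{
    return dirty_pages_rate * page_size / bandwidth;
}

/* Proposed formula: bytes / (bytes/sec) -> seconds.
 * Remaining bytes divided by transfer rate gives a time. */
double new_estimate(double ram_bytes_remaining, double bandwidth)
{
    return ram_bytes_remaining / bandwidth;
}
```

For example, with 8 MB remaining at 1 MB/s, new_estimate gives 8 seconds,
whereas old_estimate with the same bandwidth gives a bare ratio.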
> there will be lots of cases where it's bad - so be careful what you're
> using it for - you definitely should NOT use the value in any automated
> system.
I agree, and I would not use it in an automated system.
> My problem with just using ram_bytes_remaining is that it doesn't take
> into account the rate at which the guest is changing RAM - which feels
> like it's the important measure for expected downtime.
  ram_bytes_remaining = ram_state->migration_dirty_pages * TARGET_PAGE_SIZE

This means ram_bytes_remaining is proportional to the RAM the guest is
changing (the count of dirty pages), so dividing it by bandwidth should
yield the expected_downtime.
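A minimal sketch of that proportionality (TARGET_PAGE_SIZE is
target-dependent in QEMU; 4096 here is only an assumed value for the
sketch, and the helper name is mine):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for illustration; the real value varies per target. */
#define SKETCH_PAGE_SIZE 4096ULL

/* ram_bytes_remaining scales linearly with migration_dirty_pages, i.e.
 * with how much RAM the guest is still changing. */
uint64_t bytes_remaining_sketch(uint64_t migration_dirty_pages)
{
    return migration_dirty_pages * SKETCH_PAGE_SIZE;
}
```

Doubling the dirty page count doubles the estimate, which is the
proportionality claimed above.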
Regards,
Bala
>
> Dave
>
> > Regards,
> > Bala
> >
> > >
> > > > >
> > > > > qemu_file_reset_rate_limit(s->to_dst_file);
> > > > > diff --git a/migration/migration.h b/migration/migration.h
> > > > > index 8d2f320c48..8584f8e22e 100644
> > > > > --- a/migration/migration.h
> > > > > +++ b/migration/migration.h
> > > > > @@ -128,6 +128,7 @@ struct MigrationState
> > > > > int64_t downtime_start;
> > > > > int64_t downtime;
> > > > > int64_t expected_downtime;
> > > > > + int64_t ram_bytes_remaining;
> > > > > bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
> > > > > int64_t setup_time;
> > > > > /*
> > > >
> > >
> > >
> > >
> > > --
> > > David Gibson | I'll have my music baroque, and my code
> > > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
> > > | _way_ _around_!
> > > http://www.ozlabs.org/~dgibson
> >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
Thread overview: 17+ messages
2018-04-17 13:23 [Qemu-devel] [PATCH v2 0/1] migration: calculate expected_downtime with ram_bytes_remaining() Balamuruhan S
2018-04-17 13:23 ` [Qemu-devel] [PATCH v2 1/1] " Balamuruhan S
2018-04-18 0:55 ` David Gibson
2018-04-18 0:57 ` David Gibson
2018-04-18 6:46 ` Balamuruhan S
2018-04-18 8:36 ` Dr. David Alan Gilbert
2018-04-19 4:44 ` Balamuruhan S [this message]
2018-04-19 11:24 ` Dr. David Alan Gilbert
2018-04-20 5:47 ` David Gibson
2018-04-20 10:28 ` Dr. David Alan Gilbert
2018-04-21 19:24 ` Balamuruhan S
2018-04-19 11:48 ` David Gibson
2018-04-20 18:57 ` Dr. David Alan Gilbert
2018-05-03 2:08 ` David Gibson
2018-04-21 19:12 ` Balamuruhan S
2018-05-03 2:14 ` David Gibson
2018-04-18 6:52 ` Balamuruhan S