From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Igor Redko <redkoi@virtuozzo.com>
Cc: Amit Shah <amit.shah@redhat.com>,
"Denis V. Lunev" <den@openvz.org>,
qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/1] migration: fix expected_downtime
Date: Wed, 7 Oct 2015 11:56:38 +0100
Message-ID: <20151007105637.GB2710@work-vm>
In-Reply-To: <560D4430.3090203@virtuozzo.com>
* Igor Redko (redkoi@virtuozzo.com) wrote:
> On 28.09.2015 22:22, Dr. David Alan Gilbert wrote:
> >* Denis V. Lunev (den@openvz.org) wrote:
> >>From: Igor Redko <redkoi@virtuozzo.com>
> >>
> >>To get this estimate we must divide pending_size by bandwidth,
> >>according to the description of expected-downtime ("qmp-commands.hx:3246"):
> >> "expected-downtime": only present while migration is active
> >> total amount in ms for downtime that was calculated on
> >> the last bitmap round (json-int)
> >>
> >>The previous version was simply wrong, because dirty_bytes_rate and
> >>bandwidth are both measured in bytes/ms, so dividing the first by the
> >>second gives a dimensionless quantity.
> >>As the description above says, this value is shown during the active
> >>migration phase and recalculated only after all memory has been
> >>transferred, and only if that took more than 1 sec. So maybe nobody
> >>noticed the bug.
> >
> >While I agree the existing code looks wrong, I don't see how this is
> >any more correct.
>
> This patch aims to fix the units of expected_downtime. It is reasonable
> that expected_downtime should be measured in milliseconds, while the
> existing implementation has no units at all.
I agree your units are correct where the old ones aren't, and I agree
it needs fixing.
However, I'm worried about whether the value in your fix is correct.
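To make the units explicit (a minimal sketch; the variable names are the
ones used in the patch below, but the per-variable units are my reading
of the code rather than anything documented):

    /* bandwidth           : bytes transferred per ms          */
    /* s->dirty_bytes_rate : bytes dirtied by the guest per ms */
    /* pending_size        : bytes still left to transfer      */

    /* old: (bytes/ms) / (bytes/ms) -> a dimensionless ratio */
    s->expected_downtime = s->dirty_bytes_rate / bandwidth;

    /* patched: bytes / (bytes/ms) -> ms, at least the right dimension */
    s->expected_downtime = pending_size / bandwidth;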
> > I think 'pending_size' is an estimate of the number of bytes left
> >to transfer, the intention being that most of those are transferred
> >prior to pausing the machine; if those are transferred before pausing
> >then they aren't part of the downtime.
> >
> Yes, 'pending_size' is indeed an estimate of the number of bytes left to
> transfer.
> But the condition:
> > if (s->dirty_bytes_rate && transferred_bytes > 10000) {
> slightly modifies the meaning of pending_size.
> dirty_bytes_rate is set in migration_sync(), which is called when
> pending_size < max_downtime * bandwidth. This estimate is higher than
> max_downtime by design.
I don't think that check really modifies the meaning of pending_size;
it's just a sanity check that means we don't start trying to predict downtime
when we've not transmitted much yet.
> >It feels that:
> > * If the guest wasn't dirtying pages, then you wouldn't have to
> > pause the guest; if it was just dirtying them a little then you
> > wouldn't have much to transfer after the pages you'd already
> > sent; so if the guest dirties pages fast then the estimate should be
> > larger; so 'dirty_bytes_rate' being on top of the fraction feels right.
> >
> > * If the bandwidth is higher then the estimate should be smaller; so
> > 'bandwidth' being on the bottom of the fraction feels right.
> >
> >Dave
> >
> The 'expected_downtime' in the existing code takes two types of values:
> * positive - dirty_bytes_rate is higher than bandwidth. In this
> case migration doesn't complete.
> * zero - bandwidth is higher than dirty_bytes_rate. In this case
> migration is possible, but we don’t have the downtime value.
OK, I don't think that should give zero; my argument being that since
each pass through RAM takes some time, you're always going to have some
proportion of RAM that's dirty.
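(Roughly: if one pass over RAM takes T ms, then at least
dirty_bytes_rate * T bytes are dirty by the time you pause, so the
downtime is at least dirty_bytes_rate * T / bandwidth; that only reaches
zero for a guest that dirties nothing at all.)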
> This patch has some imperfections. But if we look back into history, it
> seems that this patch just restores the logic that was broken.
> The existing code was introduced by commit
> https://github.com/qemu/qemu/commit/90f8ae724a575861f093fbdbfd49a925bcfec327
> which claims that it merely restores the mistakenly deleted estimation
> (removed in commit
> https://github.com/qemu/qemu/commit/e4ed1541ac9413eac494a03532e34beaf8a7d1c5).
> However, the estimation changed during this restore operation. The
> estimation before the removal (before
> e4ed1541ac9413eac494a03532e34beaf8a7d1c5) was just like the one in my patch.
Yes; remember I believe that the current code is wrong - I'm just not
sure your suggested fix is right.
> So maybe we should think about improving this estimation.
> I suggest using something like:
> expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth
>
> In my opinion this is more correct than the existing approach, since the
> last step of the migration process (before the pause) is transferring
> max_size bytes (max_size = bandwidth * migrate_max_downtime() / 1000000).
> So the bytes that are dirtied during this step will have to be transferred
> during the downtime. That count is dirty_bytes_rate * max_size / bandwidth,
> i.e. migrate_max_downtime * dirty_bytes_rate, and dividing by the bandwidth
> gives the formula:
> expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth
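Just to spell your arithmetic out (a sketch only; max_size is the
expression from your own mail, and I'm glossing over the ns-to-ms
conversion and integer truncation):

    max_size = bandwidth * migrate_max_downtime() / 1000000;   /* bytes */
    /* bytes dirtied while those last max_size bytes go out: */
    dirtied = dirty_bytes_rate * (max_size / bandwidth);
    /* time to push the dirtied bytes during the pause: */
    expected_downtime = dirtied / bandwidth;

which does reduce to your formula, give or take the conversion factor
hidden in the 1000000.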
Are you sure, though? I thought migrate_max_downtime governs the time
*after* the pause, and it doesn't make sense to me for the system's
expectation to depend on the user's expectation: with your formula the
prediction can never exceed migrate_max_downtime as long as
dirty_bytes_rate < bandwidth, whatever the guest is doing.
Dave
> Igor
>
> >>Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
> >>Reviewed-by: Anna Melekhova <annam@virtuozzo.com>
> >>Signed-off-by: Denis V. Lunev <den@openvz.org>
> >>CC: Juan Quintela <quintela@redhat.com>
> >>CC: Amit Shah <amit.shah@redhat.com>
> >>---
> >> migration/migration.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >>diff --git a/migration/migration.c b/migration/migration.c
> >>index 662e77e..d55d545 100644
> >>--- a/migration/migration.c
> >>+++ b/migration/migration.c
> >>@@ -994,7 +994,7 @@ static void *migration_thread(void *opaque)
> >> /* if we haven't sent anything, we don't want to recalculate
> >> 10000 is a small enough number for our purposes */
> >> if (s->dirty_bytes_rate && transferred_bytes > 10000) {
> >>- s->expected_downtime = s->dirty_bytes_rate / bandwidth;
> >>+ s->expected_downtime = pending_size / bandwidth;
> >> }
> >>
> >> qemu_file_reset_rate_limit(s->file);
> >>--
> >>2.1.4
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK