From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Igor Redko <redkoi@virtuozzo.com>
Cc: Amit Shah <amit.shah@redhat.com>,
"Denis V. Lunev" <den@openvz.org>,
qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/1] migration: fix expected_downtime
Date: Wed, 7 Oct 2015 11:56:38 +0100
Message-ID: <20151007105637.GB2710@work-vm>
In-Reply-To: <560D4430.3090203@virtuozzo.com>
* Igor Redko (redkoi@virtuozzo.com) wrote:
> On 28.09.2015 22:22, Dr. David Alan Gilbert wrote:
> >* Denis V. Lunev (den@openvz.org) wrote:
> >>From: Igor Redko <redkoi@virtuozzo.com>
> >>
> >>To get this estimate we must divide pending_size by bandwidth,
> >>according to the description of expected-downtime ("qmp-commands.hx:3246"):
> >> "expected-downtime": only present while migration is active
> >> total amount in ms for downtime that was calculated on
> >> the last bitmap round (json-int)
> >>
> >>The previous version was simply wrong, because dirty_bytes_rate and
> >>bandwidth are both measured in bytes/ms, so dividing the first by the
> >>second gives a dimensionless quantity.
> >>As the description above says, this value is shown during the active
> >>migration phase and recalculated only after all memory has been
> >>transferred, and only if that took more than 1 sec. So maybe nobody
> >>noticed the bug.
> >
> >While I agree the existing code looks wrong, I don't see how this is
> >any more correct.
>
> This patch aims to fix the units of expected_downtime. It is reasonable
> that expected_downtime should be measured in milliseconds, while the
> existing implementation has no units at all.
I agree your units are correct where the old ones aren't, and I agree
it needs fixing.
However, I'm worried about whether the value in your fix is correct.
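To make the units explicit (a minimal sketch; the variable names are the
ones used in the patch below, but the per-variable units are my reading
of the code rather than anything documented):

    /* bandwidth           : bytes transferred per ms          */
    /* s->dirty_bytes_rate : bytes dirtied by the guest per ms */
    /* pending_size        : bytes still left to transfer      */

    /* old: (bytes/ms) / (bytes/ms) -> a dimensionless ratio */
    s->expected_downtime = s->dirty_bytes_rate / bandwidth;

    /* patched: bytes / (bytes/ms) -> ms, at least the right dimension */
    s->expected_downtime = pending_size / bandwidth;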
> > I think 'pending_size' is an estimate of the number of bytes left
> >to transfer, the intention being that most of those are transferred
> >prior to pausing the machine; if those are transferred before pausing
> >then they aren't part of the downtime.
> >
> Yes, 'pending_size' is indeed an estimate of the number of bytes left to
> transfer.
> But the condition:
> > if (s->dirty_bytes_rate && transferred_bytes > 10000) {
> slightly modifies the meaning of pending_size.
> dirty_bytes_rate is set in migration_sync(), which is called when
> pending_size < max_downtime * bandwidth. This estimate is higher than
> max_downtime by design.
I don't think that check really modifies the meaning of pending_size;
it's just a sanity check that means we don't start trying to predict downtime
when we've not transmitted much yet.
> >It feels that:
> > * If the guest wasn't dirtying pages, then you wouldn't have to
> > pause the guest; if it was just dirtying them a little then you
> > wouldn't have much to transfer after the pages you'd already
> > sent; so if the guest dirties pages fast then the estimate should be
> > larger; so 'dirty_bytes_rate' being on top of the fraction feels right.
> >
> > * If the bandwidth is higher then the estimate should be smaller; so
> > 'bandwidth' being on the bottom of the fraction feels right.
> >
> >Dave
> >
> The 'expected_downtime' in the existing code takes two types of values:
> * positive - dirty_bytes_rate is higher than bandwidth. In this
> case migration doesn't complete.
> * zero - bandwidth is higher than dirty_bytes_rate. In this case
> migration is possible, but we don’t have the downtime value.
OK, I don't think that should give zero; my argument being that since
each pass through RAM takes some time, you're always going to have some
proportion of RAM that's dirty.
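(Roughly: if one pass over RAM takes T ms, then at least
dirty_bytes_rate * T bytes are dirty by the time you pause, so the
downtime is at least dirty_bytes_rate * T / bandwidth; that only reaches
zero for a guest that dirties nothing at all.)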
> This patch has some imperfections. But if we look back into history, it
> seems that this patch just restores the logic that was broken.
> The existing code was introduced by commit
> https://github.com/qemu/qemu/commit/90f8ae724a575861f093fbdbfd49a925bcfec327
> which claims that it merely restores the mistakenly deleted estimation
> (removed in commit
> https://github.com/qemu/qemu/commit/e4ed1541ac9413eac494a03532e34beaf8a7d1c5).
> However, the estimation changed during this restore operation. The
> estimation before the removal (before
> e4ed1541ac9413eac494a03532e34beaf8a7d1c5) was just like the one in my patch.
Yes; remember I believe that the current code is wrong - I'm just not
sure your suggested fix is right.
> So maybe we should think about improving this estimation.
> I suggest using something like:
> expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth
>
> In my opinion this is more correct than the existing approach, since the
> last step of the migration process (before the pause) is transferring
> max_size bytes (max_size = bandwidth * migrate_max_downtime() / 1000000).
> So the bytes that are dirtied during this step will have to be transferred
> during the downtime. That count is dirty_bytes_rate * max_size / bandwidth,
> i.e. migrate_max_downtime * dirty_bytes_rate, and dividing by the bandwidth
> gives the formula:
> expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth
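Just to spell your arithmetic out (a sketch only; max_size is the
expression from your own mail, and I'm glossing over the ns-to-ms
conversion and integer truncation):

    max_size = bandwidth * migrate_max_downtime() / 1000000;   /* bytes */
    /* bytes dirtied while those last max_size bytes go out: */
    dirtied = dirty_bytes_rate * (max_size / bandwidth);
    /* time to push the dirtied bytes during the pause: */
    expected_downtime = dirtied / bandwidth;

which does reduce to your formula, give or take the conversion factor
hidden in the 1000000.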
Are you sure, though? I thought migrate_max_downtime governs the time
*after* the pause, and it doesn't make sense to me for the system's
expectation to depend on the user's expectation: with your formula the
prediction can never exceed migrate_max_downtime as long as
dirty_bytes_rate < bandwidth, whatever the guest is doing.
Dave
> Igor
>
> >>Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
> >>Reviewed-by: Anna Melekhova <annam@virtuozzo.com>
> >>Signed-off-by: Denis V. Lunev <den@openvz.org>
> >>CC: Juan Quintela <quintela@redhat.com>
> >>CC: Amit Shah <amit.shah@redhat.com>
> >>---
> >> migration/migration.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >>diff --git a/migration/migration.c b/migration/migration.c
> >>index 662e77e..d55d545 100644
> >>--- a/migration/migration.c
> >>+++ b/migration/migration.c
> >>@@ -994,7 +994,7 @@ static void *migration_thread(void *opaque)
> >> /* if we haven't sent anything, we don't want to recalculate
> >> 10000 is a small enough number for our purposes */
> >> if (s->dirty_bytes_rate && transferred_bytes > 10000) {
> >>- s->expected_downtime = s->dirty_bytes_rate / bandwidth;
> >>+ s->expected_downtime = pending_size / bandwidth;
> >> }
> >>
> >> qemu_file_reset_rate_limit(s->file);
> >>--
> >>2.1.4
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK