Date: Wed, 7 Oct 2015 11:56:38 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20151007105637.GB2710@work-vm>
In-Reply-To: <560D4430.3090203@virtuozzo.com>
Subject: Re: [Qemu-devel] [PATCH 1/1] migration: fix expected_downtime
To: Igor Redko
Cc: Amit Shah, "Denis V. Lunev", qemu-devel@nongnu.org, Juan Quintela

* Igor Redko (redkoi@virtuozzo.com) wrote:
> On 28.09.2015 22:22, Dr. David Alan Gilbert wrote:
> >* Denis V. Lunev (den@openvz.org) wrote:
> >>From: Igor Redko
> >>
> >>To get this estimate we must divide pending_size by bandwidth,
> >>according to the description of expected-downtime ("qmp-commands.hx:3246"):
> >>    "expected-downtime": only present while migration is active
> >>        total amount in ms for downtime that was calculated on
> >>        the last bitmap round (json-int)
> >>
> >>The previous version was simply wrong because dirty_bytes_rate and bandwidth
> >>are both measured in bytes/ms, so dividing the first by the second gives a
> >>dimensionless quantity.
> >>As the description above says, this value is shown during the active
> >>migration phase and is recalculated only after all memory has been
> >>transferred, and only if that process took more than 1 sec. So maybe
> >>nobody has noticed the bug.
> >
> >While I agree the existing code looks wrong, I don't see how this is
> >any more correct.
>
> This patch is aimed at fixing the units of expected_downtime. It is
> reasonable that expected_downtime should be measured in milliseconds,
> while the existing implementation has no units at all.

I agree your units are correct where the old ones aren't, and I agree it
needs fixing.  However, I'm worried about whether the value in your fix
is correct.

> >I think 'pending_size' is an estimate of the number of bytes left
> >to transfer, the intention being that most of those are transferred
> >prior to pausing the machine; if those are transferred before pausing
> >then they aren't part of the downtime.
> >
> Yes, 'pending_size' is an estimate of the number of bytes left to transfer,
> indeed.
> But the condition:
>     if (s->dirty_bytes_rate && transferred_bytes > 10000) {
> slightly modifies the meaning of pending_size.
> dirty_bytes_rate is set in migration_sync(), which is called when
> pending_size < max_downtime * bandwidth. This estimate is higher than
> max_downtime by design.

I don't think that check really modifies the meaning of pending_size;
it's just a sanity check that means we don't start trying to predict
downtime when we've not transmitted much yet.
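(A minimal, self-contained sketch of the unit argument being made here; the
variable names follow the quoted code, but the values are invented and this
is not the actual migration_thread() code:)

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Both rates are in bytes per millisecond, as in migration.c. */
        double bandwidth        = 100000.0;  /* invented transfer rate, bytes/ms */
        double dirty_bytes_rate =  40000.0;  /* invented dirtying rate, bytes/ms */
        uint64_t pending_size   = 50000000;  /* invented bytes left to send */

        /* Old code: (bytes/ms) / (bytes/ms) - the units cancel, so this is
         * a dimensionless ratio, not a time. */
        double old_estimate = dirty_bytes_rate / bandwidth;

        /* Patched code: bytes / (bytes/ms) - this does come out in ms. */
        double new_estimate = pending_size / bandwidth;

        printf("old (unitless): %f\n", old_estimate);
        printf("new (ms):       %f\n", new_estimate);
        return 0;
    }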
> >It feels that:
> >  * If the guest wasn't dirtying pages, then you wouldn't have to
> >    pause the guest; if it was just dirtying them a little then you
> >    wouldn't have much to transfer after the pages you'd already
> >    sent; so if the guest dirties pages fast then the estimate should be
> >    larger; so 'dirty_bytes_rate' being on top of the fraction feels right.
> >
> >  * If the bandwidth is higher then the estimate should be smaller; so
> >    'bandwidth' being on the bottom of the fraction feels right.
> >
> >Dave
> >
> The 'expected_downtime' in the existing code takes two types of values:
>     * positive - dirty_bytes_rate is higher than bandwidth. In this
>       case migration doesn't complete.
>     * zero - bandwidth is higher than dirty_bytes_rate. In this case
>       migration is possible, but we don't have the downtime value.

OK, I don't think that should give zero; my argument being that, given
that each pass through RAM takes some time, you're always going to have
some proportion of RAM that's dirty.

> This patch has some imperfections. But if we look back into history,
> it seems that this patch just restores the broken logic.
> The existing code was introduced by commit
> https://github.com/qemu/qemu/commit/90f8ae724a575861f093fbdbfd49a925bcfec327
> which claims that it just restores the mistakenly deleted estimation
> (commit
> https://github.com/qemu/qemu/commit/e4ed1541ac9413eac494a03532e34beaf8a7d1c5).
> Meanwhile, the estimation changed during this restore operation. The
> estimation before the removal (before
> e4ed1541ac9413eac494a03532e34beaf8a7d1c5) was just like the one in my patch.

Yes; remember I believe that the current code is wrong - I'm just not
sure your suggested fix is right.

> So maybe we should think about improving this estimation.
> I suggest using something like:
>     expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth
>
> In my opinion this is more correct than the existing approach, since the
> last step of the migration process (before the pause) is transferring
> max_size bytes (max_size = bandwidth * migrate_max_downtime() / 1000000).
> So the bytes that were dirtied during this step will be transferred during
> downtime. The transferred byte count is dirty_bytes_rate * max_size / bandwidth,
> or migrate_max_downtime * dirty_bytes_rate, and dividing by the bandwidth
> gives the formula:
>     expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth

Are you sure?  I thought migrate_max_downtime governs the time *after*
the pause; and it doesn't make sense to me for the expectation of the
system to be related to the expectation of the user.

Dave
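(Similarly, a small sketch of the arithmetic behind the proposed estimate;
the names echo the quoted formula, the values are invented, and the maximum
downtime is taken directly in milliseconds here rather than via the
nanosecond conversion implied by the quoted max_size expression:)

    #include <stdio.h>

    int main(void)
    {
        double bandwidth        = 100000.0; /* invented, bytes/ms */
        double dirty_bytes_rate =  40000.0; /* invented, bytes/ms */
        double max_downtime_ms  =    300.0; /* invented user limit, ms */

        /* Bytes left for the final pre-pause pass: everything that can be
         * pushed within the permitted downtime. */
        double max_size = bandwidth * max_downtime_ms;

        /* Bytes dirtied while that final pre-pause pass is being sent. */
        double dirtied = dirty_bytes_rate * (max_size / bandwidth);

        /* Time to push those dirtied bytes after the pause; this is the
         * proposed estimate and simplifies to
         * max_downtime_ms * dirty_bytes_rate / bandwidth. */
        double expected_downtime_ms = dirtied / bandwidth;

        printf("expected downtime: %.1f ms\n", expected_downtime_ms);
        return 0;
    }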
> Igor
>
> >>Signed-off-by: Igor Redko
> >>Reviewed-by: Anna Melekhova
> >>Signed-off-by: Denis V. Lunev
> >>CC: Juan Quintela
> >>CC: Amit Shah
> >>---
> >> migration/migration.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >>diff --git a/migration/migration.c b/migration/migration.c
> >>index 662e77e..d55d545 100644
> >>--- a/migration/migration.c
> >>+++ b/migration/migration.c
> >>@@ -994,7 +994,7 @@ static void *migration_thread(void *opaque)
> >>             /* if we haven't sent anything, we don't want to recalculate
> >>                10000 is a small enough number for our purposes */
> >>             if (s->dirty_bytes_rate && transferred_bytes > 10000) {
> >>-                s->expected_downtime = s->dirty_bytes_rate / bandwidth;
> >>+                s->expected_downtime = pending_size / bandwidth;
> >>             }
> >>
> >>             qemu_file_reset_rate_limit(s->file);
> >>--
> >>2.1.4
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK