From: Igor Redko
Subject: Re: [Qemu-devel] [PATCH 1/1] migration: fix expected_downtime
Date: Thu, 1 Oct 2015 17:33:20 +0300
Message-ID: <560D4430.3090203@virtuozzo.com>
In-Reply-To: <20150928192258.GE5468@work-vm>
References: <1443103624-17230-1-git-send-email-den@openvz.org> <20150928192258.GE5468@work-vm>
To: "Dr. David Alan Gilbert" , qemu-devel@nongnu.org
Cc: Amit Shah , "Denis V. Lunev" , Juan Quintela

On 28.09.2015 22:22, Dr. David Alan Gilbert wrote:
> * Denis V. Lunev (den@openvz.org) wrote:
>> From: Igor Redko
>>
>> To get this estimation we must divide pending_size by bandwidth,
>> according to the description of expected-downtime ("qmp-commands.hx:3246"):
>>   "expected-downtime": only present while migration is active
>>       total amount in ms for downtime that was calculated on
>>       the last bitmap round (json-int)
>>
>> The previous version was just wrong because dirty_bytes_rate and
>> bandwidth are both measured in Bytes/ms, so dividing the first by the
>> second gives a dimensionless quantity.
>> As said in the description above, this value is shown during the
>> active migration phase and recalculated only after transferring all
>> memory, and only if that process took more than 1 sec. So maybe that
>> is why nobody noticed the bug.
>
> While I agree the existing code looks wrong, I don't see how this is
> any more correct.

This patch aims to fix the units of expected_downtime. It is reasonable
for expected_downtime to be measured in milliseconds, while the value
computed by the existing code has no units at all.

> I think 'pending_size' is an estimate of the number of bytes left
> to transfer, the intention being that most of those are transferred
> prior to pausing the machine; if those are transferred before pausing
> then they aren't part of the downtime.
>

Yes, 'pending_size' is indeed an estimate of the number of bytes left
to transfer. But the condition:

> if (s->dirty_bytes_rate && transferred_bytes > 10000) {

slightly changes the meaning of pending_size here: dirty_bytes_rate is
set in migration_sync(), which is called when
pending_size < max_downtime * bandwidth. This estimation is higher than
max_downtime by design.

> It feels that:
>   * If the guest wasn't dirtying pages, then you wouldn't have to
>     pause the guest; if it was just dirtying them a little then you
>     wouldn't have much to transfer after the pages you'd already
>     sent; so if the guest dirties pages fast then the estimate should
>     be larger; so 'dirty_bytes_rate' being on top of the fraction
>     feels right.
>
>   * If the bandwidth is higher then the estimate should be smaller;
>     so 'bandwidth' being on the bottom of the fraction feels right.
>
> Dave
>
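Just to make the units point concrete, here is a small standalone
illustration (made-up numbers, not QEMU code; I am assuming the stored
expected_downtime is an integer, which is what the json-int in the QMP
description above implies):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    double bandwidth = 100 * 1024;              /* bytes/ms, ~100 MB/s link */
    double dirty_bytes_rate = 50 * 1024;        /* bytes/ms dirtied by guest */
    uint64_t pending_size = 200 * 1024 * 1024;  /* bytes left to transfer */

    /* existing code: (bytes/ms) / (bytes/ms) is a unitless ratio,
     * truncated to 0 whenever the dirty rate is below the bandwidth */
    int64_t old_estimate = dirty_bytes_rate / bandwidth;

    /* patched code: bytes / (bytes/ms) is a time in milliseconds */
    int64_t new_estimate = pending_size / bandwidth;

    printf("old: %" PRId64 ", new: %" PRId64 " ms\n",
           old_estimate, new_estimate);         /* old: 0, new: 2048 ms */
    return 0;
}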
The 'expected_downtime' in the existing code takes two kinds of values:
 * positive - dirty_bytes_rate is higher than bandwidth. In this case
   migration doesn't complete.
 * zero - bandwidth is higher than dirty_bytes_rate. In this case
   migration is possible, but we don't get a downtime value.

This patch has some imperfections. But if we look back into the
history, it seems that the patch simply restores logic that was broken
along the way. The existing code was introduced by commit
https://github.com/qemu/qemu/commit/90f8ae724a575861f093fbdbfd49a925bcfec327
which claims that it merely restores the estimation mistakenly deleted
by commit
https://github.com/qemu/qemu/commit/e4ed1541ac9413eac494a03532e34beaf8a7d1c5
Meanwhile, the estimation changed during that restore operation: before
the removal (before e4ed1541ac9413eac494a03532e34beaf8a7d1c5) it was
exactly the one in my patch.

So maybe we should think about improving this estimation. I suggest
using something like:

expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth

In my opinion this is more correct than the existing approach: the last
step of the migration process (just before the pause) transfers
max_size bytes (max_size = bandwidth * migrate_max_downtime() / 1000000),
so the bytes dirtied during that step have to be transferred during the
downtime. Their count is dirty_bytes_rate * max_size / bandwidth, i.e.
migrate_max_downtime * dirty_bytes_rate, and dividing that by the
bandwidth gives the formula:

expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth

(A rough sketch of this is appended after the quoted patch below.)

Igor

>> Signed-off-by: Igor Redko
>> Reviewed-by: Anna Melekhova
>> Signed-off-by: Denis V. Lunev
>> CC: Juan Quintela
>> CC: Amit Shah
>> ---
>>  migration/migration.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 662e77e..d55d545 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -994,7 +994,7 @@ static void *migration_thread(void *opaque)
>>              /* if we haven't sent anything, we don't want to recalculate
>>                 10000 is a small enough number for our purposes */
>>              if (s->dirty_bytes_rate && transferred_bytes > 10000) {
>> -                s->expected_downtime = s->dirty_bytes_rate / bandwidth;
>> +                s->expected_downtime = pending_size / bandwidth;
>>              }
>>
>>              qemu_file_reset_rate_limit(s->file);
>> --
>> 2.1.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
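P.S. Below is a rough standalone sketch of the suggested estimation. It
is an illustration only, not part of the patch; the helper name is made
up, and the /1000000 in the quoted max_size formula suggests that
migrate_max_downtime() is not in milliseconds, so a unit conversion
would be needed in the real code:

#include <stdio.h>

/* expected downtime in ms, assuming bandwidth and dirty_bytes_rate are
 * both in bytes/ms and the max downtime has already been converted to ms */
static double expected_downtime_ms(double max_downtime_ms,
                                   double dirty_bytes_rate,
                                   double bandwidth)
{
    /* The last pre-pause step sends max_size = bandwidth * max_downtime
     * bytes, which takes max_downtime ms; during that time the guest
     * dirties dirty_bytes_rate * max_downtime bytes, and sending those
     * while paused takes that amount divided by the bandwidth. */
    return max_downtime_ms * dirty_bytes_rate / bandwidth;
}

int main(void)
{
    /* e.g. 300 ms allowed downtime, guest dirtying at half the link rate */
    printf("%.0f ms\n",
           expected_downtime_ms(300.0, 50 * 1024.0, 100 * 1024.0));
    return 0;                                   /* prints "150 ms" */
}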