From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:33137)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1WAhNZ-0008H8-IQ
	for qemu-devel@nongnu.org; Tue, 04 Feb 2014 09:53:43 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1WAhNT-00015t-D7
	for qemu-devel@nongnu.org; Tue, 04 Feb 2014 09:53:37 -0500
Received: from mx1.redhat.com ([209.132.183.28]:60149)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1WAhNT-00015n-5K
	for qemu-devel@nongnu.org; Tue, 04 Feb 2014 09:53:31 -0500
Message-ID: <52F0F26A.5020304@redhat.com>
Date: Tue, 04 Feb 2014 15:00:10 +0100
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <52F0938F.2040102@ozlabs.ru> <52F0C523.30102@redhat.com>
	<52F0D611.7070105@ozlabs.ru> <52F0D810.4070806@redhat.com>
	<52F0DA04.9040003@ozlabs.ru>
In-Reply-To: <52F0DA04.9040003@ozlabs.ru>
Content-Type: text/plain; charset=KOI8-R; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] migration: broken ram_save_pending
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alexey Kardashevskiy <aik@ozlabs.ru>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Cc: Alex Graf <agraf@suse.de>

On 04/02/2014 13:16, Alexey Kardashevskiy wrote:
> On 02/04/2014 11:07 PM, Paolo Bonzini wrote:
>> On 04/02/2014 12:59, Alexey Kardashevskiy wrote:
>>>>> With the default throttling of 32 MiB/s, bandwidth must be something like
>>>>> 33000 (expressed in bytes/ms) with the default settings, and then max_size
>>>>> should be 33000*3*10^9 / 10^6 = 6000000.  Where is my computation wrong?
>>>
>>> migrate_max_downtime() = 30000000 = 3*10^7.
>>
>> Oops, that's the mistake.
>
> Make a patch? :)

I mean, my mistake. :)  I assumed the default was 3000 ms = 3*10^9 ns.

30 ms is too little, but 3000 ms is probably too much for a default.
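
For reference, here is a minimal sketch of the arithmetic in this thread. It
assumes max_size = bandwidth * max_downtime / 10^6, with bandwidth in bytes/ms
and max_downtime in ns, as in the computation quoted in this thread. The
function and constant names are made up for illustration; this is not the
QEMU code.

```python
NS_PER_MS = 10**6

def max_size(bandwidth_bytes_per_ms, max_downtime_ns):
    """Bytes that can be sent within the allowed downtime at the given rate."""
    return bandwidth_bytes_per_ms * max_downtime_ns // NS_PER_MS

# Default downtime as reported in this thread: 30000000 ns = 30 ms.
DEFAULT_DOWNTIME_NS = 30_000_000

# Rate reported for the iterating stage: 5 blocks of 250 KB per 100 ms.
observed = 5 * 250_000 // 100  # bytes/ms

print(max_size(observed, DEFAULT_DOWNTIME_NS))  # 30 ms
print(max_size(observed, 300_000_000))          # 300 ms
print(max_size(observed, 3_000_000_000))        # 3000 ms
```

With the rate reported for the iterating stage, each factor of ten in the
downtime gives a factor of ten in max_size, which is why the choice of default
matters so much here.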

>>> When the migration is in the iterating stage, bandwidth is the speed over
>>> the last 100 ms, which is usually 5 blocks of 250 KB each, so it is
>>> 1250000/100 = 12500 bytes/ms and max_size = 12500*30000000/10^6 = 375000,
>>> which is less than the last chunk.
>>
>> Perhaps our default maximum downtime is too low.  30 ms doesn't seem
>> achievable in practice with 32 MiB/s bandwidth.  Just making it 300 ms or
>> so should fix your problem.
>
> Well, it will fix it in my particular case, but in the long run this does
> not feel like a fix - there should be a way for migration_thread() to know
> that ram_save_iterate() sent all the dirty pages it had to send, no?

No, because new pages might be dirtied while ram_save_iterate() was running.
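
To illustrate, here is a toy model of the iteration loop. It is only a sketch:
the function name and the per-round send and dirty figures are hypothetical,
and this is not how migration_thread() is actually written. Each round sends
some data while the guest dirties more, and the loop can stop only once the
pending amount fits in max_size.

```python
def rounds_until_converged(pending, sent_per_round, dirtied_per_round,
                           max_size, max_rounds=1000):
    """Return the number of rounds until pending <= max_size, or None."""
    for rounds in range(max_rounds):
        if pending <= max_size:
            return rounds
        # One pass sends data, but the guest keeps running and dirties
        # more pages in the meantime, so pending never simply hits zero.
        pending = max(pending - sent_per_round, 0) + dirtied_per_round
    return None

# Hypothetical guest: 10 MB pending, 1.25 MB sent and 0.5 MB dirtied per round.
print(rounds_until_converged(10_000_000, 1_250_000, 500_000, 375_000))
print(rounds_until_converged(10_000_000, 1_250_000, 500_000, 3_750_000))
```

In this model the pending amount never drops below what is dirtied in one
round, so a max_size smaller than that never lets the loop finish, while a
larger one does.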

Paolo