From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <50989E83.3030008@ozlabs.ru>
Date: Tue, 06 Nov 2012 16:22:11 +1100
From: Alexey Kardashevskiy <aik@ozlabs.ru>
MIME-Version: 1.0
References: <20121102031011.GM27695@truffula.fritz.box>
	<5093B8A9.4060501@redhat.com>
In-Reply-To: <5093B8A9.4060501@redhat.com>
Content-Type: text/plain; charset=KOI8-R; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Testing migration under stress
List-Id: <qemu-devel.nongnu.org>
To: Orit Wasserman <owasserm@redhat.com>
Cc: quintela@redhat.com, qemu-devel@nongnu.org, David Gibson <david@gibson.dropbear.id.au>

On 02/11/12 23:12, Orit Wasserman wrote:
> On 11/02/2012 05:10 AM, David Gibson wrote:
>> Asking for some advice on the list.
>>
>> I have prototype savevm and migration support ready for the pseries
>> machine.  They seem to work under simple circumstances (idle guest).
>> To test them more extensively I've been attempting to perform live
>> migrations (just over tcp->localhost) while the guest is active with
>> something.  In particular I've tried while using octave to do matrix
>> multiply (so exercising the FP unit) and my colleague Alexey has tried
>> during some video encoding.
>>
> As you are doing local migration, one option is to set the speed higher
> than line speed, as we don't actually send the data; another is to set a high downtime.
>
>> However, in each of these cases, we've found that the migration only
>> completes and the source instance only stops after the intensive
>> workload has (just) completed.  What I surmise is happening is that
>> the workload is touching memory pages fast enough that the ram
>> migration code is never getting below the threshold to complete the
>> migration until the guest is idle again.
>>
> The workload you chose is really bad for live migration, as all the guest does is
> dirty its memory. I recommend looking for a workload that does some networking or disk IO.
> Vinod succeeded in running the SwingBench and SLOB benchmarks, which converged OK. I don't
> know if they run on pseries, but a similar workload should be OK (small database/warehouse).
> We found that SpecJbb, on the other hand, is hard to converge.
> Web workloads or video streaming also do the trick.
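
The convergence problem described in the quoted text can be sketched with a
toy model. This is not QEMU's actual code: the function name, the numbers and
the assumption that the guest dirties memory at a constant rate are all
hypothetical, chosen only to show why a higher speed or a higher downtime
helps. All sizes are in MiB and all times in seconds.

```python
def iterations_to_converge(ram, dirty_rate, bandwidth, max_downtime,
                           max_iters=100):
    """Toy pre-copy model (hypothetical, not QEMU code).

    Returns the iteration at which the remaining data could be sent
    within max_downtime, or None if that never happens.
    """
    remaining = ram  # the first pass has to send all of RAM
    for i in range(1, max_iters + 1):
        send_time = remaining / bandwidth
        if send_time <= max_downtime:
            return i  # the guest can be stopped and the rest sent
        # Whatever the guest dirties while we are sending becomes the
        # work for the next pass (capped at the size of RAM).
        remaining = min(ram, dirty_rate * send_time)
    return None


# Made-up numbers: 1 GiB guest, 30 ms allowed downtime.
print(iterations_to_converge(1024, dirty_rate=40, bandwidth=32,
                             max_downtime=0.03))
print(iterations_to_converge(1024, dirty_rate=40, bandwidth=1024,
                             max_downtime=0.03))
print(iterations_to_converge(1024, dirty_rate=8, bandwidth=32,
                             max_downtime=1.0))
```

In this model the migration never completes while the dirty rate is at or
above the bandwidth, which matches the behaviour David describes.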


My ffmpeg workload is a simple transcode from h263+ac3 to h263+ac3 at 64x36 
pixels, so it should not be dirtying much memory. Or does it?

(qemu) info migrate
capabilities: xbzrle: off
Migration status: completed
total time: 14538 milliseconds
downtime: 1273 milliseconds
transferred ram: 389961 kbytes
remaining ram: 0 kbytes
total ram: 1065024 kbytes
duplicate: 181949 pages
normal: 97446 pages
normal bytes: 389784 kbytes

How many bytes were actually transferred? "duplicate" * 4K = 745MB?
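
A quick cross-check of the counters above, as a Python sketch. It assumes 4K
pages and that "kbytes" means units of 1024 bytes; both are my assumptions,
as are the helper and key names. It only does the arithmetic and does not
say what QEMU really puts on the wire for a "duplicate" page.

```python
PAGE_KIB = 4  # assumption: 4K target pages

# Counters copied from the "info migrate" output above.
stats = {
    "transferred_kib": 389961,
    "total_kib": 1065024,
    "duplicate_pages": 181949,
    "normal_pages": 97446,
    "normal_kib": 389784,
}


def summarize(s, page_kib=PAGE_KIB):
    """Derive a few figures from the raw counters."""
    guest_pages = s["total_kib"] // page_kib
    pages_counted = s["normal_pages"] + s["duplicate_pages"]
    return {
        # Should match the reported "normal bytes" if pages are 4K.
        "normal_kib": s["normal_pages"] * page_kib,
        # Size of the duplicate pages if they were sent in full.
        "duplicate_bytes": s["duplicate_pages"] * page_kib * 1024,
        # What "transferred ram" contains beyond the normal pages.
        "non_normal_kib": s["transferred_kib"] - s["normal_kib"],
        # Pages counted beyond the size of guest RAM.
        "extra_pages": pages_counted - guest_pages,
    }


for name, value in summarize(stats).items():
    print(name, value)
```

So normal * 4K matches "normal bytes" exactly, and "transferred ram" is only
177 kbytes more than that, which suggests the 745MB of duplicate pages were
not sent in full. Also, normal + duplicate is 13139 pages more than the
guest has, so at least that many pages were counted more than once.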

Is there any tool in QEMU to see how many pages are used, dirty, etc.?
"info" does not seem to offer any such statistic.

By the way, the new guest did not resume (QEMU still responds to monitor 
commands), but this is probably a problem in our "pseries" platform code. 
What is strange is that "info migrate" on the new guest shows nothing:

(qemu) info migrate
(qemu)


> Cheers,
> Orit
>
>> Does anyone have some ideas for testing this better: workloads that
>> are less likely to trigger this behaviour, or settings to tweak in the
>> migration itself to make it more likely to complete migration while
>> the workload is still active.
>>
>


-- 
Alexey