Date: Fri, 02 Nov 2012 14:12:25 +0200
From: Orit Wasserman
To: David Gibson
Cc: aik@ozlabs.ru, qemu-devel@nongnu.org, quintela@redhat.com
Subject: Re: [Qemu-devel] Testing migration under stress
Message-ID: <5093B8A9.4060501@redhat.com>
In-Reply-To: <20121102031011.GM27695@truffula.fritz.box>

On 11/02/2012 05:10 AM, David Gibson wrote:
> Asking for some advice on the list.
>
> I have prototype savevm and migration support ready for the pseries
> machine. They seem to work under simple circumstances (idle guest).
> To test them more extensively I've been attempting to perform live
> migrations (just over tcp->localhost) while the guest is active with
> something. In particular I've tried while using octave to do a matrix
> multiply (so exercising the FP unit), and my colleague Alexey has tried
> during some video encoding.
>
As you are doing local migration, one option is to set the migration speed
higher than the line speed, since the data isn't really sent over a network
link; another is to allow a high downtime.

> However, in each of these cases, we've found that the migration only
> completes and the source instance only stops after the intensive
> workload has (just) completed. What I surmise is happening is that
> the workload is touching memory pages fast enough that the ram
> migration code is never getting below the threshold to complete the
> migration until the guest is idle again.
>
The workloads you chose are really bad for live migration, because all the
guest does is dirty its memory. I recommend looking for a workload that does
some networking or disk I/O. Vinod had success with the SwingBench and SLOB
benchmarks, which converged fine; I don't know whether they run on pseries,
but a similar workload (a small database/warehouse) should be OK. We found
that SpecJBB, on the other hand, is hard to converge. Web workloads or video
streaming also do the trick.

Cheers,
Orit

> Does anyone have some ideas for testing this better: workloads that
> are less likely to trigger this behaviour, or settings to tweak in the
> migration itself to make it more likely to complete migration while
> the workload is still active?
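
For the tuning side, a minimal sketch of the monitor commands I mean
(assuming the destination was started with "-incoming tcp:localhost:4444";
the port and the values below are only examples, pick whatever suits your
setup):

    (qemu) migrate_set_speed 10g
    (qemu) migrate_set_downtime 2
    (qemu) migrate -d tcp:localhost:4444
    (qemu) info migrate

migrate_set_speed raises the bandwidth cap (in bytes/s; setting it far above
any real line speed is harmless for a localhost migration),
migrate_set_downtime allows a longer stop-and-copy phase (in seconds), and
info migrate lets you watch whether the remaining dirty memory actually
shrinks while the workload is running.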