Date: Mon, 05 Nov 2012 14:21:37 +0200
From: Orit Wasserman
To: David Gibson
Cc: aik@ozlabs.ru, qemu-devel@nongnu.org, quintela@redhat.com
Subject: Re: [Qemu-devel] Testing migration under stress
Message-ID: <5097AF51.9010703@redhat.com>
In-Reply-To: <20121105003006.GW27695@truffula.fritz.box>

On 11/05/2012 02:30 AM, David Gibson wrote:
> On Fri, Nov 02, 2012 at 02:12:25PM +0200, Orit Wasserman wrote:
>> On 11/02/2012 05:10 AM, David Gibson wrote:
>>> Asking for some advice on the list.
>>>
>>> I have prototype savevm and migration support ready for the pseries
>>> machine. They seem to work under simple circumstances (idle guest).
>>> To test them more extensively I've been attempting to perform live
>>> migrations (just over tcp->localhost) while the guest is active with
>>> something. In particular I've tried while using octave to do matrix
>>> multiply (so exercising the FP unit) and my colleague Alexey has tried
>>> during some video encoding.
>
>> As you are doing local migration, one option is to set the migration
>> speed higher than line speed, since we don't actually send the data
>> over a real link; another is to set a high downtime (see the example
>> monitor commands at the end of this mail).
>
> I'm not entirely sure what you mean by that. But I do have suspicions
> based on this and other factors that the default bandwidth limit is
> horribly, horribly low.
>
>>> However, in each of these cases, we've found that the migration only
>>> completes and the source instance only stops after the intensive
>>> workload has (just) completed. What I surmise is happening is that
>>> the workload is touching memory pages fast enough that the RAM
>>> migration code never gets below the threshold needed to complete the
>>> migration until the guest is idle again.
>>>
>> The workload you chose is really bad for live migration, as all the
>> guest does is dirty its memory.
>
> Well, I realised that was true of the matrix multiply. For video
> encode though, the output data should be much, much smaller than the
> input, so I wouldn't expect it to be dirtying memory that fast.
>
>> I recommend looking for a workload that does some networking or disk
>> I/O. Vinod succeeded in running the SwingBench and SLOB benchmarks,
>> which converged OK. I don't know whether they run on pseries, but a
>> similar workload (small database/warehouse) should be fine. We found
>> that SpecJBB, on the other hand, is hard to converge. A web workload
>> or video streaming also does the trick.
>
> Hrm. As something really simple and stupid, I did try migrating an
> ls -lR /, but even that didn't converge :/.

That is strange; it should converge even with the defaults. Anything
special about your storage setup?
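
For reference, the tuning I had in mind looks roughly like this from the
HMP monitor (the port and the values here are only examples, pick
whatever suits your setup):

    (qemu) migrate_set_speed 1g
    (qemu) migrate_set_downtime 2
    (qemu) migrate -d tcp:localhost:4444

If I remember correctly the default bandwidth cap is only 32MB/s and the
default maximum downtime is 30ms, which matches your suspicion that the
defaults are very low for this kind of test.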
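
By the way, the threshold you mention works essentially like the
following (a simplified sketch, not the actual QEMU code; the names and
types are illustrative):

    /* Simplified sketch of the live-migration convergence test.
     * Not the actual QEMU code; names and types are illustrative.
     * Migration enters the final stop-and-copy stage only when the
     * RAM that is still dirty can be sent within max_downtime.
     * Assumes bandwidth > 0. */
    #include <stdbool.h>
    #include <stdint.h>

    static bool migration_can_converge(uint64_t dirty_bytes, /* RAM still to send */
                                       uint64_t bandwidth,   /* measured bytes/sec */
                                       double max_downtime)  /* allowed pause, seconds */
    {
        double expected_downtime = (double)dirty_bytes / (double)bandwidth;
        return expected_downtime <= max_downtime;
    }

If the guest dirties pages faster than they can be sent, dirty_bytes
never drops below bandwidth * max_downtime and the test never passes,
which is exactly the behaviour you are seeing until the workload goes
idle.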