From: Juan Quintela
Subject: Re: [Qemu-devel] Testing migration under stress
Date: Fri, 02 Nov 2012 14:07:45 +0100
Message-ID: <87sj8sgnku.fsf@elfo.mitica>
In-Reply-To: <20121102031011.GM27695@truffula.fritz.box> (David Gibson's message of "Fri, 2 Nov 2012 14:10:11 +1100")
References: <20121102031011.GM27695@truffula.fritz.box>
Reply-To: quintela@redhat.com
To: David Gibson
Cc: aik@ozlabs.ru, qemu-devel@nongnu.org

David Gibson wrote:
> Asking for some advice on the list.
>
> I have prototype savevm and migration support ready for the pseries
> machine. They seem to work under simple circumstances (idle guest).
> To test them more extensively I've been attempting to perform live
> migrations (just over tcp to localhost) while the guest is active with
> something. In particular I've tried while using octave to do matrix
> multiplies (so exercising the FP unit), and my colleague Alexey has
> tried during some video encoding.
>
> However, in each of these cases, we've found that the migration only
> completes, and the source instance only stops, after the intensive
> workload has (just) completed. What I surmise is happening is that
> the workload is touching memory pages fast enough that the ram
> migration code never gets below the threshold needed to complete the
> migration until the guest is idle again.
>
> Does anyone have ideas for testing this better: workloads that are
> less likely to trigger this behaviour, or settings to tweak in the
> migration itself to make it more likely to complete while the
> workload is still active?

You can:

migrate_set_downtime 2s (or so)

I normally run stress, and you keep moving the memory that it dirties
until the migration converges (how long that takes depends a lot on
your networking).

Doing anything that is really memory intensive is basically never going
to converge.

Later, Juan.
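
P.S. Roughly what I mean, as an untested sketch (the port number is
arbitrary, and raising the bandwidth cap with migrate_set_speed is
optional, depending on your link). In the guest, start a workload that
keeps dirtying memory:

  $ stress --vm 2 --vm-bytes 512M --timeout 600s

Then, with the destination QEMU started with -incoming tcp:localhost:4444,
on the source monitor relax the downtime limit before kicking off the
migration and watch its progress:

  (qemu) migrate_set_downtime 2
  (qemu) migrate_set_speed 1g
  (qemu) migrate -d tcp:localhost:4444
  (qemu) info migrate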