From: Juan Quintela
Subject: Re: [Qemu-devel] Testing migration under stress
Date: Fri, 02 Nov 2012 14:07:45 +0100
Message-ID: <87sj8sgnku.fsf@elfo.mitica>
In-Reply-To: <20121102031011.GM27695@truffula.fritz.box> (David Gibson's message of "Fri, 2 Nov 2012 14:10:11 +1100")
References: <20121102031011.GM27695@truffula.fritz.box>
Reply-To: quintela@redhat.com
To: David Gibson
Cc: aik@ozlabs.ru, qemu-devel@nongnu.org

David Gibson wrote:
> Asking for some advice on the list.
>
> I have prototype savevm and migration support ready for the pseries
> machine. They seem to work under simple circumstances (idle guest).
> To test them more extensively I've been attempting to perform live
> migrations (just over tcp to localhost) while the guest is active with
> something. In particular I've tried while using octave to do matrix
> multiplies (so exercising the FP unit), and my colleague Alexey has
> tried during some video encoding.
>
> However, in each of these cases, we've found that the migration only
> completes, and the source instance only stops, after the intensive
> workload has (just) completed. What I surmise is happening is that
> the workload is touching memory pages fast enough that the ram
> migration code never gets below the threshold needed to complete the
> migration until the guest is idle again.
>
> Does anyone have ideas for testing this better: workloads that are
> less likely to trigger this behaviour, or settings to tweak in the
> migration itself to make it more likely to complete while the
> workload is still active?

You can:

migrate_set_downtime 2s (or so)

I normally run stress, and you keep moving the memory that it dirties
until the migration converges (how long that takes depends a lot on
your networking).

Doing anything that is really memory intensive is basically never going
to converge.

Later, Juan.
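
P.S. Roughly what I mean, as an untested sketch (the port number is
arbitrary, and raising the bandwidth cap with migrate_set_speed is
optional, depending on your link). In the guest, start a workload that
keeps dirtying memory:

  $ stress --vm 2 --vm-bytes 512M --timeout 600s

Then, with the destination QEMU started with -incoming tcp:localhost:4444,
on the source monitor relax the downtime limit before kicking off the
migration and watch its progress:

  (qemu) migrate_set_downtime 2
  (qemu) migrate_set_speed 1g
  (qemu) migrate -d tcp:localhost:4444
  (qemu) info migrate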