Date: Mon, 05 Nov 2012 14:21:37 +0200
From: Orit Wasserman
To: David Gibson
Cc: aik@ozlabs.ru, qemu-devel@nongnu.org, quintela@redhat.com
Subject: Re: [Qemu-devel] Testing migration under stress
Message-ID: <5097AF51.9010703@redhat.com>
In-Reply-To: <20121105003006.GW27695@truffula.fritz.box>

On 11/05/2012 02:30 AM, David Gibson wrote:
> On Fri, Nov 02, 2012 at 02:12:25PM +0200, Orit Wasserman wrote:
>> On 11/02/2012 05:10 AM, David Gibson wrote:
>>> Asking for some advice on the list.
>>>
>>> I have prototype savevm and migration support ready for the pseries
>>> machine. They seem to work under simple circumstances (idle guest).
>>> To test them more extensively I've been attempting to perform live
>>> migrations (just over tcp->localhost) while the guest is active with
>>> something. In particular I've tried while using octave to do matrix
>>> multiply (so exercising the FP unit) and my colleague Alexey has tried
>>> during some video encoding.
>
>> As you are doing local migration, one option is to set the migration
>> speed higher than line speed, since we don't actually send the data
>> over a real link; another is to set a high downtime (see the example
>> monitor commands at the end of this mail).
>
> I'm not entirely sure what you mean by that. But I do have suspicions
> based on this and other factors that the default bandwidth limit is
> horribly, horribly low.
>
>>> However, in each of these cases, we've found that the migration only
>>> completes and the source instance only stops after the intensive
>>> workload has (just) completed. What I surmise is happening is that
>>> the workload is touching memory pages fast enough that the RAM
>>> migration code never gets below the threshold needed to complete the
>>> migration until the guest is idle again.
>>>
>> The workload you chose is really bad for live migration, as all the
>> guest does is dirty its memory.
>
> Well, I realised that was true of the matrix multiply. For video
> encode though, the output data should be much, much smaller than the
> input, so I wouldn't expect it to be dirtying memory that fast.
>
>> I recommend looking for a workload that does some networking or disk
>> I/O. Vinod succeeded in running the SwingBench and SLOB benchmarks,
>> which converged OK. I don't know whether they run on pseries, but a
>> similar workload (small database/warehouse) should be fine. We found
>> that SpecJBB, on the other hand, is hard to converge. A web workload
>> or video streaming also does the trick.
>
> Hrm. As something really simple and stupid, I did try migrating an
> ls -lR /, but even that didn't converge :/.

That is strange; it should converge even with the defaults. Anything
special about your storage setup?
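
For reference, the tuning I had in mind looks roughly like this from the
HMP monitor (the port and the values here are only examples, pick
whatever suits your setup):

    (qemu) migrate_set_speed 1g
    (qemu) migrate_set_downtime 2
    (qemu) migrate -d tcp:localhost:4444

If I remember correctly the default bandwidth cap is only 32MB/s and the
default maximum downtime is 30ms, which matches your suspicion that the
defaults are very low for this kind of test.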
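
By the way, the threshold you mention works essentially like the
following (a simplified sketch, not the actual QEMU code; the names and
types are illustrative):

    /* Simplified sketch of the live-migration convergence test.
     * Not the actual QEMU code; names and types are illustrative.
     * Migration enters the final stop-and-copy stage only when the
     * RAM that is still dirty can be sent within max_downtime.
     * Assumes bandwidth > 0. */
    #include <stdbool.h>
    #include <stdint.h>

    static bool migration_can_converge(uint64_t dirty_bytes, /* RAM still to send */
                                       uint64_t bandwidth,   /* measured bytes/sec */
                                       double max_downtime)  /* allowed pause, seconds */
    {
        double expected_downtime = (double)dirty_bytes / (double)bandwidth;
        return expected_downtime <= max_downtime;
    }

If the guest dirties pages faster than they can be sent, dirty_bytes
never drops below bandwidth * max_downtime and the test never passes,
which is exactly the behaviour you are seeing until the workload goes
idle.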