Message-ID: <5098EC6F.30007@redhat.com>
Date: Tue, 06 Nov 2012 12:54:39 +0200
From: Orit Wasserman
References: <20121102031011.GM27695@truffula.fritz.box> <5093B8A9.4060501@redhat.com> <50989E83.3030008@ozlabs.ru>
In-Reply-To: <50989E83.3030008@ozlabs.ru>
Subject: Re: [Qemu-devel] Testing migration under stress
To: Alexey Kardashevskiy
Cc: quintela@redhat.com, qemu-devel@nongnu.org, David Gibson

On 11/06/2012 07:22 AM, Alexey Kardashevskiy wrote:
> On 02/11/12 23:12, Orit Wasserman wrote:
>> On 11/02/2012 05:10 AM, David Gibson wrote:
>>> Asking for some advice on the list.
>>>
>>> I have prototype savevm and migration support ready for the pseries
>>> machine. They seem to work under simple circumstances (idle guest).
>>> To test them more extensively I've been attempting to perform live
>>> migrations (just over tcp->localhost) while the guest is active with
>>> something. In particular I've tried while using octave to do a matrix
>>> multiply (so exercising the FP unit) and my colleague Alexey has tried
>>> during some video encoding.
>>>
>> As you are doing local migration, one option is to set the migration speed
>> higher than the line speed, as we don't actually send the data over a real
>> link; another is to set a high downtime.
>>
>>> However, in each of these cases, we've found that the migration only
>>> completes and the source instance only stops after the intensive
>>> workload has (just) completed. What I surmise is happening is that
>>> the workload is touching memory pages fast enough that the ram
>>> migration code is never getting below the threshold to complete the
>>> migration until the guest is idle again.
>>>
>> The workload you chose is really bad for live migration, as all the guest
>> does is dirty its memory. I recommend looking for a workload that does some
>> networking or disk IO. Vinod succeeded in running the SwingBench and SLOB
>> benchmarks, which converged ok; I don't know if they run on pseries, but a
>> similar workload should be fine (small database/warehouse). We found that
>> SpecJbb, on the other hand, is hard to converge. Web workloads or video
>> streaming also do the trick.
>
>
> My ffmpeg workload is simple encoding of h263+ac3 to h263+ac3, 64*36 pixels,
> so it should not be dirtying memory too much. Or is it?
>
> (qemu) info migrate
> capabilities: xbzrle: off
> Migration status: completed
> total time: 14538 milliseconds
> downtime: 1273 milliseconds
> transferred ram: 389961 kbytes
> remaining ram: 0 kbytes
> total ram: 1065024 kbytes
> duplicate: 181949 pages
> normal: 97446 pages
> normal bytes: 389784 kbytes
>
> How many bytes were actually transferred? "duplicate" * 4K = 745MB?

A duplicate page is usually a zero page; for it we send only the page header
plus a single byte instead of the full 4K. "transferred ram" is the actual
amount of data sent, so here around 389M went over the wire.
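To make that concrete, here is a minimal sketch of the idea (illustrative only,
not the actual QEMU migration code; the names and the 8-byte header size are
assumptions): a page whose bytes are all identical counts as a duplicate and
only its fill byte follows the per-page header, while a normal page is sent in
full.

/*
 * Minimal sketch of duplicate-page handling (illustrative, not QEMU source).
 * A page whose bytes are all identical is treated as a duplicate: only the
 * per-page header plus one fill byte go on the wire, instead of the header
 * plus the full 4K payload.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE   4096
#define HEADER_SIZE 8        /* page offset + flags; size is illustrative */

static bool is_duplicate_page(const uint8_t *page)
{
    /* A duplicate page is filled with a single repeated byte value. */
    for (size_t i = 1; i < PAGE_SIZE; i++) {
        if (page[i] != page[0]) {
            return false;
        }
    }
    return true;
}

static size_t wire_cost(const uint8_t *page)
{
    return is_duplicate_page(page) ? HEADER_SIZE + 1          /* fill byte  */
                                   : HEADER_SIZE + PAGE_SIZE; /* whole page */
}

int main(void)
{
    uint8_t zero_page[PAGE_SIZE] = { 0 };
    uint8_t busy_page[PAGE_SIZE] = { 0 };

    busy_page[123] = 0xab;   /* one differing byte makes it a normal page */

    printf("zero page costs %zu bytes, normal page costs %zu bytes\n",
           wire_cost(zero_page), wire_cost(busy_page));
    return 0;
}

So with 181949 duplicate pages the per-page overhead stays small, which is why
"transferred ram" ends up close to "normal bytes" rather than anywhere near
duplicate * 4K.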
>
> Is there any tool in QEMU to see how many pages are used/dirty/etc?

Sadly, no.

> "info" does not seem to have any kind of such statistic.
>
> btw the new guest did not resume (qemu still responds to commands) but this
> is probably our problem within the "pseries" platform. What is strange is
> that "info migrate" on the new guest shows nothing:
>
> (qemu) info migrate
> (qemu)
>

The "info migrate" command displays outgoing migration information, not
incoming, so on the destination side it shows nothing.

>
>
>> Cheers,
>> Orit
>>
>>> Does anyone have some ideas for testing this better: workloads that
>>> are less likely to trigger this behaviour, or settings to tweak in the
>>> migration itself to make it more likely to complete migration while
>>> the workload is still active.
>>>
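For the settings question quoted at the end: before starting the migration you
can raise the bandwidth limit and allow a larger downtime from the source
monitor. The values below are only an example for a local tcp migration (with
the destination QEMU started with -incoming tcp:0:4444, for instance):

(qemu) migrate_set_speed 10g
(qemu) migrate_set_downtime 2
(qemu) migrate -d tcp:localhost:4444
(qemu) info migrate

A higher speed limit and a larger allowed downtime make it easier for the
migration to converge while the guest is still dirtying memory.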