Message-ID: <50989E83.3030008@ozlabs.ru>
Date: Tue, 06 Nov 2012 16:22:11 +1100
From: Alexey Kardashevskiy
References: <20121102031011.GM27695@truffula.fritz.box> <5093B8A9.4060501@redhat.com>
In-Reply-To: <5093B8A9.4060501@redhat.com>
Subject: Re: [Qemu-devel] Testing migration under stress
To: Orit Wasserman
Cc: quintela@redhat.com, qemu-devel@nongnu.org, David Gibson

On 02/11/12 23:12, Orit Wasserman wrote:
> On 11/02/2012 05:10 AM, David Gibson wrote:
>> Asking for some advice on the list.
>>
>> I have prototype savevm and migration support ready for the pseries
>> machine. They seem to work under simple circumstances (idle guest).
>> To test them more extensively I've been attempting to perform live
>> migrations (just over tcp->localhost) while the guest is active with
>> something. In particular I've tried while using octave to do matrix
>> multiplication (so exercising the FP unit) and my colleague Alexey has
>> tried during some video encoding.
>>
> As you are doing local migration, one option is to set the speed higher
> than line speed, as we don't actually send the data; another is to set a
> high downtime.
>
>> However, in each of these cases, we've found that the migration only
>> completes and the source instance only stops after the intensive
>> workload has (just) completed. What I surmise is happening is that
>> the workload is touching memory pages fast enough that the ram
>> migration code is never getting below the threshold to complete the
>> migration until the guest is idle again.
>>
> The workload you chose is really bad for live migration, as all the guest does is
> dirty its memory. I recommend looking for a workload that does some networking or disk IO.
> Vinod succeeded in running SwingBench and SLOB benchmarks that converged OK. I don't
> know if they run on pseries, but similar workloads should be OK (small database/warehouse).
> We found out that SpecJbb, on the other hand, is hard to converge.
> Web workloads or video streaming also do the trick.

My ffmpeg workload is a simple h263+ac3 to h263+ac3 encode at 64*36 pixels,
so it should not be dirtying memory too much. Or should it?

(qemu) info migrate
capabilities: xbzrle: off
Migration status: completed
total time: 14538 milliseconds
downtime: 1273 milliseconds
transferred ram: 389961 kbytes
remaining ram: 0 kbytes
total ram: 1065024 kbytes
duplicate: 181949 pages
normal: 97446 pages
normal bytes: 389784 kbytes

How many bytes were actually transferred? "duplicate" * 4K = 745MB?

Is there any tool in QEMU to see how many pages are used/dirty/etc.? "info"
does not seem to have any statistics of that kind.

BTW, the new guest did not resume (QEMU still responds to commands), but
this is probably our problem within the "pseries" platform.
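A rough breakdown of the "info migrate" figures above may help with the "how many bytes" question. This is only a sketch in Python: the numbers are copied from the output, but the idea that each duplicate page adds just one byte to "transferred ram" is my guess from how closely the figures match, not something I have confirmed in the QEMU sources or docs.

```python
# Decode the "(qemu) info migrate" figures quoted above.
# Assumption: "kbytes" in the monitor output means KiB (1024 bytes).
stats = {
    "total_time_ms": 14538,
    "transferred_kib": 389961,
    "total_ram_kib": 1065024,
    "duplicate_pages": 181949,
    "normal_pages": 97446,
    "normal_kib": 389784,
}

# Page size implied by the "normal" counters (389784 KiB / 97446 pages).
page_kib = stats["normal_kib"] // stats["normal_pages"]

# Guest memory covered by duplicate pages: the "duplicate * 4K" figure.
duplicate_bytes = stats["duplicate_pages"] * page_kib * 1024

# Hypothesis (inferred from the numbers, not from QEMU documentation):
# a duplicate page adds only one byte of payload to "transferred ram",
# while a normal page adds a full page.
modelled_kib = (stats["normal_kib"] * 1024 + stats["duplicate_pages"]) // 1024

# Pages sent beyond one copy of guest RAM, i.e. pages re-sent because they
# were dirtied again during migration (assumes every page is sent once in
# the first pass and "total ram" counts exactly those pages).
guest_pages = stats["total_ram_kib"] // page_kib
pages_sent = stats["duplicate_pages"] + stats["normal_pages"]
resent_pages = pages_sent - guest_pages

# Average payload rate over the whole migration.
rate_mib_s = stats["transferred_kib"] / 1024 / (stats["total_time_ms"] / 1000)

print(f"page size:             {page_kib} KiB")
print(f"duplicate pages cover: {duplicate_bytes / 1e6:.0f} MB of guest RAM")
print(f"modelled transferred:  {modelled_kib} KiB "
      f"(reported {stats['transferred_kib']} KiB)")
print(f"pages re-sent:         {resent_pages} "
      f"({resent_pages * page_kib / 1024:.1f} MiB)")
print(f"average rate:          {rate_mib_s:.1f} MiB/s")
```

If the one-byte reading holds, about 381 MiB (389961 KiB) of page data went over the wire, plus per-page headers that this counter appears not to include, rather than 745 MB: the duplicate pages (pages filled with a single repeated byte value, usually zeros) are not sent in full. The roughly 13,000 re-sent pages (about 51 MiB) give a crude measure of how much memory the ffmpeg run re-dirtied. The average of about 26 MiB/s also suggests the bandwidth limit, not just the workload, may be what delayed convergence; raising it with the monitor's migrate_set_speed command, as Orit suggests, would be one way to check.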
What is strange is that "info migrate" on the new guest shows nothing:

(qemu) info migrate
(qemu)

> Cheers,
> Orit
>
>> Does anyone have some ideas for testing this better: workloads that
>> are less likely to trigger this behaviour, or settings to tweak in the
>> migration itself to make it more likely to complete migration while
>> the workload is still active.
>>
>

--
Alexey