Message-ID: <5098C276.9010905@ozlabs.ru>
Date: Tue, 06 Nov 2012 18:55:34 +1100
From: Alexey Kardashevskiy
References: <20121102031011.GM27695@truffula.fritz.box> <5093B8A9.4060501@redhat.com> <50989E83.3030008@ozlabs.ru> <20121106065519.GC23553@truffula.fritz.box>
In-Reply-To: <20121106065519.GC23553@truffula.fritz.box>
Subject: Re: [Qemu-devel] Testing migration under stress
To: David Gibson
Cc: Orit Wasserman, qemu-devel@nongnu.org, quintela@redhat.com

On 06/11/12 17:55, David Gibson wrote:
> On Tue, Nov 06, 2012 at 04:22:11PM +1100, Alexey Kardashevskiy wrote:
>> On 02/11/12 23:12, Orit Wasserman wrote:
>>> On 11/02/2012 05:10 AM, David Gibson wrote:
>>>> Asking for some advice on the list.
>>>>
>>>> I have prototype savevm and migration support ready for the pseries
>>>> machine. They seem to work under simple circumstances (idle guest).
>>>> To test them more extensively I've been attempting to perform live
>>>> migrations (just over tcp->localhost) while the guest is active with
>>>> something.
>>>> In particular, I've tried while using octave to do matrix
>>>> multiply (so exercising the FP unit), and my colleague Alexey has tried
>>>> during some video encoding.
>>>>
>>> As you are doing local migration, one option is to set the speed higher
>>> than line speed, as we don't actually send the data; another is to set a
>>> high downtime.
>>>
>>>> However, in each of these cases, we've found that the migration only
>>>> completes and the source instance only stops after the intensive
>>>> workload has (just) completed. What I surmise is happening is that
>>>> the workload is touching memory pages fast enough that the ram
>>>> migration code is never getting below the threshold to complete the
>>>> migration until the guest is idle again.
>>>>
>>> The workload you chose is really bad for live migration, as all the guest
>>> does is dirty its memory. I recommend looking for a workload that does
>>> some networking or disk IO. Vinod succeeded in running the SwingBench and
>>> SLOB benchmarks, which converged OK; I don't know if they run on pseries,
>>> but a similar workload (small database/warehouse) should be OK. We found
>>> that SpecJbb, on the other hand, is hard to converge. A web workload or
>>> video streaming also does the trick.
>>
>>
>> My ffmpeg workload is simply encoding h263+ac3 to h263+ac3, 64*36
>> pixels. So it should not be dirtying memory too much. Or is it?
>
> Oh.. if you're encoding the same format to the same format it may well
> be optimized and therefore memory limited.

No, it is not optimized: it still decodes and encodes, because I inserted
a filter into the chain.

> I was envisaging encoding
> an uncompressed format to a highly compressed format, which should be
> compute limited rather than memory bandwidth limited.
> The size and
> resolution of the input doesn't really matter as long as:
> * the output size is much smaller than the input size

That is a different scenario; I run both.
I just tried reducing memory consumption, as was recommended here, to see
whether anything changes. Originally it was 1280*720 to 64*36, but I am not
sure that doesn't use a lot of memory, as (I suspect, at least sometimes)
ffmpeg may decode a series of full-size frames to do motion detection or
something.

> and
> * it takes several minutes for the full encode to give a
> reasonable amount of time for the migrate to converge.

Each file takes 90 seconds. If I run a script that encodes in a loop, the
pause between encodings is not long enough to finish migration anyway when
I encode a big video to a small one. However, if it is 64*36, migration
finishes (the first qemu succeeds and stops) but the new guest does not
resume.

>>
>> (qemu) info migrate
>> capabilities: xbzrle: off
>> Migration status: completed
>> total time: 14538 milliseconds
>> downtime: 1273 milliseconds
>> transferred ram: 389961 kbytes
>> remaining ram: 0 kbytes
>> total ram: 1065024 kbytes
>> duplicate: 181949 pages
>> normal: 97446 pages
>> normal bytes: 389784 kbytes
>>
>> How many bytes were actually transferred? "duplicate" * 4K = 745MB?
>>
>> Is there any tool in QEMU to see how many pages are used/dirty/etc?
>> "info" does not seem to have any kind of such statistic.
>>
>> btw the new guest did not resume (qemu still responds to commands)
>> but this is probably our problem within the "pseries" platform. What is
>
> Uh, that's a bug, and I'm not sure when it broke. If the migrate
> isn't even working we're premature in attempting to work out why it
> isn't happening when we expect.

Here I wanted to emphasize that I would like to find some way to get
information about how the migration is doing (or has done) in the new
guest; there is no statistic for it.

-- 
Alexey
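The question of how many bytes were actually transferred can be checked with a little arithmetic on the "info migrate" counters quoted above. This is only an illustrative sketch, not anything QEMU provides: it assumes 4 KiB target pages and that "kbytes" means KiB (1024 bytes), and the names stats and analyse are hypothetical. It says nothing about how QEMU itself encodes duplicate pages on the wire; it only derives figures from the printed numbers.

```python
# Derive figures from the "info migrate" output quoted in this thread.
# Assumptions (hypothetical, not stated by QEMU): 4 KiB pages, "kbytes" = KiB.
PAGE_SIZE = 4096
KIB = 1024

stats = {
    "transferred_kib": 389961,   # "transferred ram"
    "total_kib": 1065024,        # "total ram"
    "duplicate_pages": 181949,   # "duplicate"
    "normal_pages": 97446,       # "normal"
    "normal_kib": 389784,        # "normal bytes"
}


def analyse(s):
    """Return figures computed only from the printed counters."""
    dup_bytes = s["duplicate_pages"] * PAGE_SIZE
    total_pages = s["total_kib"] * KIB // PAGE_SIZE
    sent_pages = s["duplicate_pages"] + s["normal_pages"]
    return {
        # Should reproduce "normal bytes" if pages really are 4 KiB.
        "normal_kib_calc": s["normal_pages"] * PAGE_SIZE // KIB,
        # Guest memory covered by duplicate pages (not necessarily bytes sent).
        "duplicate_bytes": dup_bytes,
        "duplicate_mb": dup_bytes / 1e6,
        "duplicate_mib": dup_bytes / 2**20,
        # Everything transferred beyond the full normal pages.
        "overhead_kib": s["transferred_kib"] - s["normal_kib"],
        "total_pages": total_pages,
        "sent_pages": sent_pages,
        # Extra sends, assuming every guest page is sent at least once.
        "resent_pages": sent_pages - total_pages,
    }


if __name__ == "__main__":
    r = analyse(stats)
    print(f"normal pages * 4 KiB = {r['normal_kib_calc']} KiB "
          f"(printed: {stats['normal_kib']} KiB)")
    print(f"duplicate pages cover {r['duplicate_mb']:.1f} MB "
          f"({r['duplicate_mib']:.1f} MiB) of guest RAM")
    print(f"transferred minus normal bytes = {r['overhead_kib']} KiB")
    print(f"pages sent = {r['sent_pages']}, guest pages = {r['total_pages']}, "
          f"extra sends = {r['resent_pages']}")
```

Under those assumptions, normal * 4 KiB reproduces the printed "normal bytes" exactly, and "transferred ram" exceeds it by only 177 KiB, so the duplicate pages do not appear to have cost anything close to 4 KiB each; the roughly 745 MB is the guest memory they represent. The 279395 pages sent against 266256 guest pages would also suggest about 13139 sends were re-sends of dirtied pages.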
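The convergence problem described in this thread (the workload dirtying pages faster than they can be sent, so the remaining set never falls below the threshold for stopping the guest) can be illustrated with a toy pre-copy model. This is not QEMU's code; it only loosely follows the idea that the source stops the guest once the remaining dirty memory could be sent within the allowed downtime. All parameters are assumptions: 8192 pages/s (32 MiB/s with 4 KiB pages), a 30 ms downtime, and a hypothetical working set and dirty rates chosen purely for illustration.

```python
# Toy model of pre-copy live migration convergence (illustrative only, not QEMU code).
# Assumptions: constant bandwidth and dirty rate in pages/second; pages dirtied
# during a round are capped by a fixed working set; the guest is stopped once
# the remaining pages could be sent within max_downtime seconds.


def precopy_rounds(ram_pages, bandwidth, dirty_rate, max_downtime,
                   working_set, max_rounds=30):
    """Return (converged, rounds, remaining_pages) for the toy model."""
    remaining = float(ram_pages)          # the first pass has to send all of RAM
    threshold = bandwidth * max_downtime  # pages that fit into the allowed downtime
    for rnd in range(1, max_rounds + 1):
        if remaining <= threshold:
            # Small enough: stop the guest and send the rest (stop-and-copy).
            return True, rnd, remaining
        duration = remaining / bandwidth  # seconds spent sending this round
        # Pages dirtied while this round was on the wire, capped by the working set.
        remaining = min(float(working_set), dirty_rate * duration)
    return False, max_rounds, remaining


RAM_PAGES = 266256    # 1065024 KiB of guest RAM / 4 KiB pages
BANDWIDTH = 8192      # assumed: 32 MiB/s expressed in 4 KiB pages per second
DOWNTIME = 0.03       # assumed: 30 ms maximum downtime
WORKING_SET = 50000   # hypothetical: pages the workload keeps rewriting

if __name__ == "__main__":
    cases = {
        "busy guest": dict(dirty_rate=20000),
        "nearly idle guest": dict(dirty_rate=100),
        "busy, 10 s downtime": dict(dirty_rate=20000, max_downtime=10.0),
        "busy, 1 GiB/s link": dict(dirty_rate=20000, bandwidth=262144),
    }
    for name, overrides in cases.items():
        params = dict(ram_pages=RAM_PAGES, bandwidth=BANDWIDTH,
                      max_downtime=DOWNTIME, working_set=WORKING_SET)
        params.update(overrides)
        converged, rounds, left = precopy_rounds(**params)
        print(f"{name}: converged={converged} after {rounds} rounds, "
              f"{left:.0f} pages left")
```

In the busy case the remaining set stays at the working-set size every round, which matches the symptom of migration completing only once the workload goes idle; raising the downtime or the bandwidth (the two suggestions made for local migration) lets the same toy workload converge.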