From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:46132)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <aik@ozlabs.ru>) id 1TVe0K-0006gM-SI
	for qemu-devel@nongnu.org; Tue, 06 Nov 2012 02:55:31 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <aik@ozlabs.ru>) id 1TVe0E-0003iN-PP
	for qemu-devel@nongnu.org; Tue, 06 Nov 2012 02:55:24 -0500
Received: from mail-pa0-f45.google.com ([209.85.220.45]:33451)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <aik@ozlabs.ru>) id 1TVe0E-0003iI-FH
	for qemu-devel@nongnu.org; Tue, 06 Nov 2012 02:55:18 -0500
Received: by mail-pa0-f45.google.com with SMTP id fb10so132372pad.4
	for <qemu-devel@nongnu.org>; Mon, 05 Nov 2012 23:55:17 -0800 (PST)
Message-ID: <5098C276.9010905@ozlabs.ru>
Date: Tue, 06 Nov 2012 18:55:34 +1100
From: Alexey Kardashevskiy <aik@ozlabs.ru>
MIME-Version: 1.0
References: <20121102031011.GM27695@truffula.fritz.box>
	<5093B8A9.4060501@redhat.com> <50989E83.3030008@ozlabs.ru>
	<20121106065519.GC23553@truffula.fritz.box>
In-Reply-To: <20121106065519.GC23553@truffula.fritz.box>
Content-Type: text/plain; charset=KOI8-R; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Testing migration under stress
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Orit Wasserman <owasserm@redhat.com>, qemu-devel@nongnu.org, quintela@redhat.com

On 06/11/12 17:55, David Gibson wrote:
> On Tue, Nov 06, 2012 at 04:22:11PM +1100, Alexey Kardashevskiy wrote:
>> On 02/11/12 23:12, Orit Wasserman wrote:
>>> On 11/02/2012 05:10 AM, David Gibson wrote:
>>>> Asking for some advice on the list.
>>>>
>>>> I have prototype savevm and migration support ready for the pseries
>>>> machine.  They seem to work under simple circumstances (idle guest).
>>>> To test them more extensively I've been attempting to perform live
>>>> migrations (just over tcp->localhost) while the guest is active with
>>>> something.  In particular I've tried while using octave to do matrix
>>>> multiply (so exercising the FP unit) and my colleague Alexey has tried
>>>> during some video encoding.
>>>>
>>> As you are doing local migration, one option is to set the speed higher
>>> than line speed, as we don't actually send the data; another is to set a high downtime.
>>>
>>>> However, in each of these cases, we've found that the migration only
>>>> completes and the source instance only stops after the intensive
>>>> workload has (just) completed.  What I surmise is happening is that
>>>> the workload is touching memory pages fast enough that the ram
>>>> migration code is never getting below the threshold to complete the
>>>> migration until the guest is idle again.
>>>>
>>> The workload you chose is really bad for live migration, as all the guest does is
>>> dirty its memory. I recommend looking for a workload that does some networking or disk IO.
>>> Vinod succeeded in running SwingBench and SLOB benchmarks that converged OK; I don't
>>> know if they run on pseries, but a similar workload should be OK (small database/warehouse).
>>> We found that SpecJbb, on the other hand, is hard to converge.
>>> Web workloads or video streaming also do the trick.
>>
>>
>> My ffmpeg workload is simple encoding h263+ac3 to h263+ac3, 64*36
>> pixels. So it should not be dirtying memory too much. Or is it?
>
> Oh.. if you're encoding the same format to the same format it may well
> be optimized and therefore memory limited.

No, it is not optimized; it still decodes and re-encodes, because I inserted 
a filter into the chain.

> I was envisaging encoding
> an uncompressed format to a highly compressed format, which should be
> compute limited rather than memory bandwidth limited.

> The size and
> resolution of the input doesn't really matter as long as:
> 	   * the output size is much smaller than the input size

That is a different scenario; I run both. I just tried to reduce memory 
consumption, as was recommended here, to see if anything changes.

Originally it was 1280*720 to 64*36, but I am not sure that case uses little 
memory: I suspect that (at least sometimes) ffmpeg decodes a series of 
full-size frames to do motion detection or something similar.


> and	   * it takes several minutes for the full encode to give a
> 	     reasonable amount of  time for the migrate to converge.

Each file takes 90 seconds. If I run a script that encodes in a loop, the 
pause between encodings is not long enough for the migration to finish when 
I encode a big video to a small one.

However, with the 64*36 video, migration finishes (the first qemu succeeds 
and stops) but the new guest does not resume.
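
To illustrate why the short pause between encodings is not enough, here is a 
toy model of pre-copy convergence. It is only a sketch under my own 
assumptions, not QEMU's actual code: the function name, the constant dirty 
rate and the fixed bandwidth are all made up for illustration. Each round 
sends whatever is pending; while that is in flight the guest dirties more 
memory, and migration can complete only once the pending data fits into the 
allowed downtime.

```python
def precopy_rounds(ram_bytes, dirty_rate, bandwidth, max_downtime, max_rounds=100):
    """Toy pre-copy model (hypothetical, not QEMU's implementation).

    ram_bytes    -- guest RAM size in bytes
    dirty_rate   -- bytes dirtied by the guest per second (assumed constant)
    bandwidth    -- bytes sent per second (assumed constant)
    max_downtime -- allowed downtime in seconds

    Returns the number of full rounds sent before the remaining data fits
    into the downtime budget, or None if that never happens.
    """
    threshold = bandwidth * max_downtime  # what can be sent while the guest is stopped
    pending = ram_bytes                   # the first round has to send everything
    for rounds in range(max_rounds):
        if pending <= threshold:
            return rounds
        send_time = pending / bandwidth
        # Pages dirtied while this round was being sent must be sent again.
        pending = min(ram_bytes, dirty_rate * send_time)
    return None


MB = 1_000_000

# Guest dirties memory slower than we can send it: converges.
slow_dirtier = precopy_rounds(1000 * MB, 10 * MB, 100 * MB, 0.03)

# Guest dirties memory faster than we can send it: never converges.
fast_dirtier = precopy_rounds(1000 * MB, 200 * MB, 100 * MB, 0.03)

# Same fast dirtier, but with a very large allowed downtime.
big_downtime = precopy_rounds(1000 * MB, 200 * MB, 100 * MB, 20)
```

In this model a workload that dirties memory faster than the link can carry 
it only completes if the allowed downtime is raised, which matches the 
suggestion above to set a higher speed or a high downtime.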


>>
>> (qemu) info migrate
>> capabilities: xbzrle: off
>> Migration status: completed
>> total time: 14538 milliseconds
>> downtime: 1273 milliseconds
>> transferred ram: 389961 kbytes
>> remaining ram: 0 kbytes
>> total ram: 1065024 kbytes
>> duplicate: 181949 pages
>> normal: 97446 pages
>> normal bytes: 389784 kbytes
>>
>> How many bytes were actually transferred? "duplicate" * 4K = 745MB?
>>
>> Is there any tool in QEMU to see how many pages are used/dirty/etc?
>> "info" does not seem to have any kind of such statistic.
>>
>> btw the new guest did not resume (qemu still responds to commands)
>> but this is probably our problem within "pseries" platform. What is
>
> Uh, that's a bug, and I'm not sure when it broke.  If the migrate
> isn't even working we're premature in attempting to work out why it
> isn't happening when we expect.

What I wanted to emphasize here is that I would like some way to get 
information about how the migration is going (or went) in the new guest; 
there are no statistics for it.
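
For the source side, the "info migrate" numbers quoted above can at least be 
cross-checked by hand. The sketch below assumes 4K pages (which matches 
"normal bytes" = "normal" * 4) and assumes that "duplicate" counts pages 
which were not sent in full; I have not verified the latter in the code. The 
dictionary keys and the function name are mine, not anything QEMU exports.

```python
PAGE_KB = 4  # assumption: 4K pages, consistent with normal bytes = normal * 4

# Values copied from the "info migrate" output quoted above.
stats = {
    "transferred_ram_kb": 389961,
    "total_ram_kb": 1065024,
    "duplicate_pages": 181949,
    "normal_pages": 97446,
    "normal_kb": 389784,
}


def summarize(stats, page_kb=PAGE_KB):
    """Derive a few figures from the counters (illustrative helper only)."""
    normal_kb = stats["normal_pages"] * page_kb
    duplicate_kb = stats["duplicate_pages"] * page_kb
    total_pages = stats["total_ram_kb"] // page_kb
    handled_pages = stats["normal_pages"] + stats["duplicate_pages"]
    return {
        # Guest memory covered by pages sent in full.
        "normal_kb": normal_kb,
        # Guest memory covered by "duplicate" pages (not bytes on the wire,
        # if the assumption about "duplicate" holds).
        "duplicate_kb": duplicate_kb,
        # Pages handled beyond one full pass over RAM, i.e. handled again.
        "extra_pages": handled_pages - total_pages,
        # Transferred bytes not accounted for by full pages.
        "overhead_kb": stats["transferred_ram_kb"] - normal_kb,
    }


summary = summarize(stats)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions almost all of "transferred ram" is the normal pages, 
and "duplicate" * 4K describes how much guest memory was covered rather than 
how many bytes went over the wire.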


-- 
Alexey