From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:55909)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <rhod@redhat.com>) id 1Rctms-0003T5-ER
	for qemu-devel@nongnu.org; Tue, 20 Dec 2011 02:06:59 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <rhod@redhat.com>) id 1Rctmr-0007pG-4D
	for qemu-devel@nongnu.org; Tue, 20 Dec 2011 02:06:58 -0500
Received: from mx1.redhat.com ([209.132.183.28]:26377)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <rhod@redhat.com>) id 1Rctmq-0007p8-TJ
	for qemu-devel@nongnu.org; Tue, 20 Dec 2011 02:06:57 -0500
Received: from int-mx02.intmail.prod.int.phx2.redhat.com
	(int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id pBK76s8v010511
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for <qemu-devel@nongnu.org>; Tue, 20 Dec 2011 02:06:55 -0500
Received: from dhcp-1-73.tlv.redhat.com (vpn-202-127.tlv.redhat.com
	[10.35.202.127])
	by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP
	id pBK76rZ8011699
	for <qemu-devel@nongnu.org>; Tue, 20 Dec 2011 02:06:54 -0500
Message-ID: <4EF0340C.9000005@redhat.com>
Date: Tue, 20 Dec 2011 09:06:52 +0200
From: Ronen Hod <rhod@redhat.com>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="------------030405050701000000060909"
Subject: [Qemu-devel] [RFC] Migration convergence - a suggestion
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org

This is a multi-part message in MIME format.
--------------030405050701000000060909
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Well, the issue is not new; anyhow, following a conversation with Orit ...

Since we want the migration to finish, I believe that the "migration 
speed" parameter alone cannot do the job.
I suggest using two distinct parameters:
1. Migration speed - limits network resource utilization.
2. aggressionLevel - A number between 0.0 and 1.0, where low values 
imply minimal interruption to the guest, and 1.0 means that the guest 
will be completely stalled.

In any case, the migration has to do its work and finish at whatever 
migration speed is actually achieved, so even low aggressionLevel values 
will sometimes mean that the guest is throttled substantially.

The algorithm:
The aggressionLevel should determine the targetGuest%CPU (how much CPU 
time we want to allocate to the guest).
With aggressionLevel = 1.0, the guest gets no CPU-resources (stalled).
With aggressionLevel = 0.0, the guest gets minGuest%CPU, such that 
migrationRate == dirtyPagesRate. This minGuest%CPU is continuously 
updated based on the running average of the recent samples (more below).

Note that the targetGuest%CPU allocation is continuously updated due to 
changes in guest behavior, network congestion, and the like.
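
A minimal sketch of how aggressionLevel could map to targetGuest%CPU. 
Only the two endpoints (0.0 and 1.0) are specified above; the linear 
interpolation between them, the clamping of minGuest%CPU to [0.0, 1.0], 
and the function name target_guest_cpu are assumptions made for 
illustration, not part of the proposal.

```python
def target_guest_cpu(aggression_level: float, min_guest_cpu: float) -> float:
    """Return the guest CPU share (0.0..1.0) for a given aggressionLevel.

    min_guest_cpu is the current minGuest%CPU estimate, expressed as a
    fraction of full CPU time.
    """
    if not 0.0 <= aggression_level <= 1.0:
        raise ValueError("aggression_level must be within [0.0, 1.0]")

    # Assumption: the estimate can drift outside [0, 1] (e.g. when the guest
    # dirties pages slowly), so clamp it to a valid CPU share.
    ceiling = min(max(min_guest_cpu, 0.0), 1.0)

    # Assumption: linear interpolation between the two specified endpoints:
    #   aggression_level == 0.0 -> minGuest%CPU
    #   aggression_level == 1.0 -> 0.0 (guest stalled)
    return (1.0 - aggression_level) * ceiling
```

For example, with minGuest%CPU at 0.6, an aggressionLevel of 0.5 would 
give the guest a 0.3 share under this interpolation. Any other monotonic 
curve between the same endpoints would fit the description equally well.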

Some more details:
- minGuest%CPU (i.e., for dirtyPagesRate == migrationRate) is easy to 
calculate as a running average of
   (migrationRate / dirtyPagesRate * guest%CPU)
- There are several methods to calculate the running average; my 
favorite is IIR, where, roughly speaking,
   newVal = 0.99 * oldVal + 0.01 * newSample
- I would use two measures to ensure that there are more migrated pages 
than "dirty" pages.
   1. The running average (based on recent samples) of the migrated 
pages is larger than that of the new dirty pages.
   2. The total number of migrated pages so far is larger than the total 
number of new dirty pages.
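
The details above could be sketched roughly as follows. This is only an 
illustration under stated assumptions: samples are taken at a fixed 
interval, so per-interval page counts stand in for migrationRate and 
dirtyPagesRate; the names iir_update and ConvergenceTracker, the initial 
values, and the skipping of samples with no dirty pages are all 
hypothetical choices, not part of the proposal.

```python
from dataclasses import dataclass

ALPHA = 0.01  # weight of the newest sample: newVal = 0.99 * oldVal + 0.01 * newSample


def iir_update(old_val: float, new_sample: float, alpha: float = ALPHA) -> float:
    """One step of the first-order IIR running average."""
    return (1.0 - alpha) * old_val + alpha * new_sample


@dataclass
class ConvergenceTracker:
    min_guest_cpu: float = 1.0  # assumed starting estimate (guest unthrottled)
    avg_migrated: float = 0.0   # running average of pages migrated per interval
    avg_dirtied: float = 0.0    # running average of pages dirtied per interval
    total_migrated: int = 0
    total_dirtied: int = 0

    def add_sample(self, migrated_pages: int, dirtied_pages: int,
                   guest_cpu: float) -> None:
        """Record one sampling interval; guest_cpu is the share (0.0..1.0)
        the guest actually received during that interval."""
        if dirtied_pages > 0:
            # Running average of (migrationRate / dirtyPagesRate * guest%CPU).
            sample = migrated_pages / dirtied_pages * guest_cpu
            self.min_guest_cpu = iir_update(self.min_guest_cpu, sample)
        self.avg_migrated = iir_update(self.avg_migrated, migrated_pages)
        self.avg_dirtied = iir_update(self.avg_dirtied, dirtied_pages)
        self.total_migrated += migrated_pages
        self.total_dirtied += dirtied_pages

    def is_converging(self) -> bool:
        """Both measures: recent averages and running totals."""
        return (self.avg_migrated > self.avg_dirtied
                and self.total_migrated > self.total_dirtied)
```

A caller would feed add_sample() once per interval and throttle the 
guest further whenever is_converging() returns False.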

And yes, many details are still missing.

Ronen.


--------------030405050701000000060909--