From: Thomas Treutner
Date: Wed, 09 Feb 2011 19:13:49 +0100
Subject: [Qemu-devel] Migration speed throttling, max_throttle in migration.c
To: qemu-devel@nongnu.org
Message-ID: <4D52D95D.3030300@scripty.at>

Hi,

I was reading qemu's (qemu-kvm-0.13.0's, to be specific) live migration code to understand how the iterative dirty page transfer is implemented. While doing so I noticed that ram_save_live in arch_init.c is called quite often, more often than I expected (approx. 200 times for an idle 500 MiB VM). I found out that this is because of the loop condition while (!qemu_file_rate_limit(f)): qemu_file_rate_limit(f) very often evaluates to true, so the loop exits, and as there are still remaining dirty pages, ram_save_live is called again (a sketch of this loop follows below).

As I had set no bandwidth limit in the libvirt call, I dug deeper and found a hard-coded maximum bandwidth in migration.c:

/* Migration speed throttling */
static uint32_t max_throttle = (32 << 20);

Using a packet sniffer I verified that max_throttle is in bytes per second, here of course 32 MiB/s. Additionally, it translates directly to network bandwidth - I was not sure about that beforehand, as the bandwidth measured in ram_save_live seems to be buffer/memory subsystem bandwidth.

Anyway, I'm wondering why exactly *this* value was chosen as a hard-coded limit. 32 MiB/s is ~250 Mbit/s, which is *both* much more than 100 Mbit/s Ethernet can handle and much less than Gbit Ethernet can cope with. So in the first case TCP congestion control takes over anyway, and in the second case 3/4 of the bandwidth is thrown away.

As I'm using Gbit Ethernet, I experimented with different values. With max_throttle = (112 << 20); - which is ~900 Mbit/s - my Gbit network is nicely saturated, and live migrations of a rather idle 700 MiB VM take ~5 s instead of ~15 s, without any problems, which is very nice. Much more important is the fact that VMs with higher memory activity, and therefore higher rates of page dirtying, are migrated more easily without additional manual action: the default maxdowntime is 30 ms, which is often unreachable in such situations, and there is no evasive action built in, like a maximum number of iterations with a forced last iteration, or aborting the migration when that limit is reached.
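To illustrate the loop I mentioned above, here is a rough sketch of how I read it (simplified, not verbatim qemu-kvm-0.13.0 code; the helper name ram_save_block and its return convention reflect my reading of arch_init.c and may differ in detail):

    /* Sketch: send dirty RAM blocks until the per-call budget derived
     * from max_throttle (bytes/s) is used up.  As long as dirty pages
     * remain, ram_save_live() is invoked again on the next iteration,
     * which is why it runs ~200 times even for an idle 500 MiB VM. */
    while (!qemu_file_rate_limit(f)) {
        int bytes_sent = ram_save_block(f);  /* one dirty block, 0 if none left */

        bytes_transferred += bytes_sent;
        if (bytes_sent == 0) {               /* nothing left to send this pass */
            break;
        }
    }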
So, I'm asking if there is a good reason why *not* to change max_throttle to a value that aims at saturating a Gbit network, given that 100 Mbit networks will be "flooded" anyway by the current setting?

thanks & regards,
-t