From: Thomas Treutner
Date: Wed, 09 Feb 2011 19:13:49 +0100
Subject: [Qemu-devel] Migration speed throttling, max_throttle in migration.c
To: qemu-devel@nongnu.org
Message-ID: <4D52D95D.3030300@scripty.at>

Hi,

I was reading qemu's (qemu-kvm-0.13.0's, to be specific) live migration code to understand how the iterative dirty page transfer is implemented. While doing so I noticed that ram_save_live in arch_init.c is called quite often, more often than I expected (approx. 200 times for an idle 500 MiB VM). I found out that this is because of the loop condition while (!qemu_file_rate_limit(f)): qemu_file_rate_limit(f) very often evaluates to true, so the loop exits, and as there are still remaining dirty pages, ram_save_live is called again (a sketch of this loop follows below).

As I had set no bandwidth limit in the libvirt call, I dug deeper and found a hard-coded maximum bandwidth in migration.c:

/* Migration speed throttling */
static uint32_t max_throttle = (32 << 20);

Using a packet sniffer I verified that max_throttle is in bytes per second, here of course 32 MiB/s. Additionally, it translates directly to network bandwidth - I was not sure about that beforehand, as the bandwidth measured in ram_save_live seems to be buffer/memory subsystem bandwidth.

Anyway, I'm wondering why exactly *this* value was chosen as a hard-coded limit. 32 MiB/s is ~250 Mbit/s, which is *both* much more than 100 Mbit/s Ethernet can handle and much less than Gbit Ethernet can cope with. So in the first case TCP congestion control takes over anyway, and in the second case 3/4 of the bandwidth is thrown away.

As I'm using Gbit Ethernet, I experimented with different values. With max_throttle = (112 << 20); - which is ~900 Mbit/s - my Gbit network is nicely saturated, and live migrations of a rather idle 700 MiB VM take ~5 s instead of ~15 s, without any problems, which is very nice. Much more important is the fact that VMs with higher memory activity, and therefore higher rates of page dirtying, are migrated more easily without additional manual action: the default maxdowntime is 30 ms, which is often unreachable in such situations, and there is no evasive action built in, like a maximum number of iterations with a forced last iteration, or aborting the migration when that limit is reached.
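To illustrate the loop I mentioned above, here is a rough sketch of how I read it (simplified, not verbatim qemu-kvm-0.13.0 code; the helper name ram_save_block and its return convention reflect my reading of arch_init.c and may differ in detail):

    /* Sketch: send dirty RAM blocks until the per-call budget derived
     * from max_throttle (bytes/s) is used up.  As long as dirty pages
     * remain, ram_save_live() is invoked again on the next iteration,
     * which is why it runs ~200 times even for an idle 500 MiB VM. */
    while (!qemu_file_rate_limit(f)) {
        int bytes_sent = ram_save_block(f);  /* one dirty block, 0 if none left */

        bytes_transferred += bytes_sent;
        if (bytes_sent == 0) {               /* nothing left to send this pass */
            break;
        }
    }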
So, I'm asking if there is a good reason why *not* to change max_throttle to a value that aims at saturating a Gbit network, given that 100 Mbit networks will be "flooded" anyway by the current setting?

thanks & regards,
-t