Subject: [Qemu-devel] virtio-serial-pci very expensive during live migration
From: Chris Friesen
Date: Tue, 6 May 2014 14:01:40 -0600
To: qemu-devel@nongnu.org

Hi,

I recently made the unfortunate discovery that virtio-serial-pci is quite
expensive to stop/start during live migration.

By default we support 32 ports, each of which uses 2 queues. In my case it
takes 2-3ms per queue to disconnect on the source host, and another 2-3ms
per queue to connect on the target host, for a total cost of >300ms. In our
case this roughly tripled the outage time of a libvirt-based live migration,
from 150ms to almost 500ms.

It seems like the main problem is that we loop over all the queues, calling
virtio_pci_set_host_notifier_internal() on each of them. That in turn calls
memory_region_add_eventfd(), which calls memory_region_transaction_commit(),
which scans over all the address spaces; that scan seems to take the vast
majority of the time.

Yes, setting the max_ports value to something smaller does help, but each
port still adds 10-12ms to the overall live migration time, which is crazy.

Is there anything that could be done to make this code more efficient?
Could we tweak the API so that we add all the eventfds and then do a single
commit at the end? Do we really need to scan the entire address space?

I don't know the code well enough to answer that sort of question, but I'm
hoping that one of you does.

Thanks,
Chris
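
P.S. To make the "single commit at the end" suggestion concrete, here is the
rough shape of the batching I have in mind. It is untested and the wrapper
function is made up; I'm assuming memory_region_transaction_begin()/commit()
from exec/memory.h nest the way they appear to in memory.c, so the expensive
address-space rescan would only run once when the outermost commit happens:

/* Untested sketch, not a real patch: register all of the per-queue
 * ioeventfds inside one memory transaction so that the address-space
 * rescan in memory_region_transaction_commit() happens once per device
 * instead of once per queue.  The wrapper name is hypothetical;
 * virtio_pci_set_host_notifier_internal() is the existing helper in
 * hw/virtio/virtio-pci.c, so this would live in that file.
 */
static int virtio_pci_set_host_notifiers_batched(VirtIOPCIProxy *proxy,
                                                 int nvqs, bool assign)
{
    int n, r = 0;

    memory_region_transaction_begin();     /* defer commits from here on */

    for (n = 0; n < nvqs; n++) {
        /* Ends up in memory_region_add/del_eventfd(); with the transaction
         * open, the commit (and its address-space scan) is postponed. */
        r = virtio_pci_set_host_notifier_internal(proxy, n, assign, true);
        if (r < 0) {
            break;
        }
    }

    memory_region_transaction_commit();    /* single rescan for all queues */
    return r;
}

Error handling is hand-waved here (a real patch would need to unwind the
notifiers that were already assigned before the final commit), but hopefully
it shows the kind of API usage I was asking about.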