Subject: [Qemu-devel] virtio-serial-pci very expensive during live migration
From: Chris Friesen
Date: Tue, 6 May 2014 14:01:40 -0600
To: qemu-devel@nongnu.org

Hi,

I recently made the unfortunate discovery that virtio-serial-pci is quite
expensive to stop/start during live migration.

By default we support 32 ports, each of which uses 2 queues. In my case it
takes 2-3ms per queue to disconnect on the source host, and another 2-3ms
per queue to connect on the target host, for a total cost of >300ms. In our
case this roughly tripled the outage time of a libvirt-based live migration,
from 150ms to almost 500ms.

It seems like the main problem is that we loop over all the queues, calling
virtio_pci_set_host_notifier_internal() on each of them. That in turn calls
memory_region_add_eventfd(), which calls memory_region_transaction_commit(),
which scans over all the address spaces; that scan seems to take the vast
majority of the time.

Yes, setting the max_ports value to something smaller does help, but each
port still adds 10-12ms to the overall live migration time, which is crazy.

Is there anything that could be done to make this code more efficient?
Could we tweak the API so that we add all the eventfds and then do a single
commit at the end? Do we really need to scan the entire address space?

I don't know the code well enough to answer that sort of question, but I'm
hoping that one of you does.

Thanks,
Chris
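
P.S. To make the "single commit at the end" suggestion concrete, here is the
rough shape of the batching I have in mind. It is untested and the wrapper
function is made up; I'm assuming memory_region_transaction_begin()/commit()
from exec/memory.h nest the way they appear to in memory.c, so the expensive
address-space rescan would only run once when the outermost commit happens:

/* Untested sketch, not a real patch: register all of the per-queue
 * ioeventfds inside one memory transaction so that the address-space
 * rescan in memory_region_transaction_commit() happens once per device
 * instead of once per queue.  The wrapper name is hypothetical;
 * virtio_pci_set_host_notifier_internal() is the existing helper in
 * hw/virtio/virtio-pci.c, so this would live in that file.
 */
static int virtio_pci_set_host_notifiers_batched(VirtIOPCIProxy *proxy,
                                                 int nvqs, bool assign)
{
    int n, r = 0;

    memory_region_transaction_begin();     /* defer commits from here on */

    for (n = 0; n < nvqs; n++) {
        /* Ends up in memory_region_add/del_eventfd(); with the transaction
         * open, the commit (and its address-space scan) is postponed. */
        r = virtio_pci_set_host_notifier_internal(proxy, n, assign, true);
        if (r < 0) {
            break;
        }
    }

    memory_region_transaction_commit();    /* single rescan for all queues */
    return r;
}

Error handling is hand-waved here (a real patch would need to unwind the
notifiers that were already assigned before the final commit), but hopefully
it shows the kind of API usage I was asking about.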