From: Avi Kivity
Date: Sat, 07 Nov 2009 11:27:40 +0200
Subject: Re: [Qemu-devel] [PATCH 0/4] net-bridge: rootless bridge support for qemu
Message-ID: <4AF53D8C.7080708@redhat.com>
In-Reply-To: <4AF44A33.6010602@us.ibm.com>
To: Anthony Liguori
Cc: Mark McLoughlin, Arnd Bergmann, agl@linux.vnet.ibm.com, Juan Quintela, Dustin Kirkland, qemu-devel@nongnu.org, Michael Tsirkin

On 11/06/2009 06:09 PM, Anthony Liguori wrote:
>> No, it's an argument against fork() of large programs.
>
> After putting together a workaround, I'm starting to have my doubts
> about how real a problem this is.
>
> You're only write protecting memory, correct?

The kernel write protects qemu memory, and kvm write protects shadow
page table entries.  Live migration only does the second.

> So it's equivalent to enabling dirty tracking during live migration.
> In my mind, if the cost associated with hotplug is a fraction of the
> cost of live migration, we're in good shape.

I don't see why.  Live migration is pretty expensive, but that doesn't
mean NIC hotplug should be.  Deployments where live migration isn't a
concern (for example, performance-critical guests that use device
assignment) don't suffer the live migration penalty, so they shouldn't
expect a gratuitous NIC hotplug penalty that is a fraction of it either.

Come to think of it, we probably have some fork() breakage with device
assignment, since we can't write protect pages assigned to the iommu.

> It's not likely that a 16GB guest is going to write-fault in its
> entire memory range immediately.  In fact, it's likely to be
> amortized over a very long period of time, so I have a hard time
> believing this is really an issue in practice.

It depends on the workload.  With large pages in both host and guest
you can touch 10M pages/sec without difficulty.  Once you write protect
them this drops to maybe 0.3M pages/sec.  The right workload will
suffer pretty badly from this.

> Arguably, it's a much bigger problem for live migration.

It is.  I once considered switching live migration to shadow pte dirty
bit tracking instead of write protection, but ept doesn't have dirty
bits, so this will only help a minority of deployments by the time it
reaches users.
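To make that cost easy to reproduce, here is a rough standalone
microbenchmark (illustrative only, not qemu code; the 1 GB size and the
output format are arbitrary).  It touches a large anonymous private
mapping, forks, and touches it again, so the last pass pays one COW
write fault per page:

/* cow-cost.c -- rough illustration of the write-fault penalty fork()
 * imposes on a large private mapping.  Illustrative only, not qemu code.
 * Build: gcc -O2 cow-cost.c -o cow-cost   (needs ~2 GB of free RAM)
 */
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

#define SIZE (1UL << 30)                /* 1 GB of anonymous memory */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

/* Write one byte per page and report the rate in M pages/sec. */
static void touch(unsigned char *p, long page, const char *what)
{
    double t0 = now();
    for (unsigned long off = 0; off < SIZE; off += page)
        p[off]++;
    double t1 = now();
    printf("%-24s %6.2f M pages/sec\n", what,
           SIZE / (double)page / (t1 - t0) / 1e6);
}

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    touch(p, page, "first touch (fault in)");
    touch(p, page, "warm, before fork");

    /* fork() write-protects every private page for copy-on-write; the
     * child just sits there so the parent's next pass COW-faults on
     * every page. */
    pid_t child = fork();
    if (child == 0) {
        pause();
        _exit(0);
    }

    touch(p, page, "after fork (COW)");

    kill(child, SIGTERM);
    waitpid(child, NULL, 0);
    munmap(p, SIZE);
    return 0;
}

With 4K pages the absolute rates will differ from the large-page
numbers above, but the before/after-fork ratio is the interesting part.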
So vfork() is required, or, in light of its man page and the glowing
recommendations from the security people, we can mark guest memory as
shared on fork and use plain fork(), like we do for pre-mmu-notifier
kernels.

-- 
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.
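P.S. A minimal sketch of the "shared guest memory plus plain fork()"
variant, in case it helps the discussion.  The alloc_guest_ram() helper
and the /bin/true stand-in for the bridge helper are made up for
illustration; this is not the actual qemu RAM allocator or helper
invocation:

#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

static void *alloc_guest_ram(size_t size, int shared)
{
    /* MAP_SHARED keeps fork() from write-protecting these pages for
     * COW; parent and child simply share them, as on pre-mmu-notifier
     * kernels. */
    int flags = MAP_ANONYMOUS | (shared ? MAP_SHARED : MAP_PRIVATE);
    void *ram = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return ram == MAP_FAILED ? NULL : ram;
}

int main(void)
{
    size_t size = 256UL << 20;          /* stand-in for guest RAM */
    unsigned char *ram = alloc_guest_ram(size, 1);

    if (!ram) {
        perror("mmap");
        return 1;
    }

    for (size_t off = 0; off < size; off += 4096)   /* fault pages in */
        ram[off] = 1;

    pid_t pid = fork();                 /* no COW write-protection here */
    if (pid == 0) {
        /* Child: this is where a setuid bridge helper would be exec'd;
         * /bin/true is only a placeholder. */
        execl("/bin/true", "true", (char *)NULL);
        _exit(127);
    }
    waitpid(pid, NULL, 0);

    ram[0] = 2;                         /* shared mapping: no COW fault */
    munmap(ram, size);
    return 0;
}

Shared anonymous memory is shmem-backed, so it changes how guest RAM
behaves under swap; treat this as a sketch of the idea rather than a
recommendation.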