From: Avi Kivity
Date: Sat, 07 Nov 2009 11:27:40 +0200
Subject: Re: [Qemu-devel] [PATCH 0/4] net-bridge: rootless bridge support for qemu
Message-ID: <4AF53D8C.7080708@redhat.com>
In-Reply-To: <4AF44A33.6010602@us.ibm.com>
To: Anthony Liguori
Cc: Mark McLoughlin, Arnd Bergmann, agl@linux.vnet.ibm.com, Juan Quintela, Dustin Kirkland, qemu-devel@nongnu.org, Michael Tsirkin

On 11/06/2009 06:09 PM, Anthony Liguori wrote:
>> No, it's an argument against fork() of large programs.
>
> After putting together a workaround, I'm starting to have my doubts
> about how real a problem this is.
>
> You're only write protecting memory, correct?

The kernel write protects qemu memory, and kvm write protects shadow
page table entries.  Live migration only does the second.

> So it's equivalent to enabling dirty tracking during live migration.
> In my mind, if the cost associated with hotplug is a fraction of the
> cost of live migration, we're in good shape.

I don't see why.  Live migration is pretty expensive, but that doesn't
mean NIC hotplug should be.  Deployments where live migration isn't a
concern (for example, performance-critical guests that use device
assignment) don't suffer the live migration penalty, so they shouldn't
expect a gratuitous NIC hotplug penalty that is a fraction of it either.

Come to think of it, we probably have some fork() breakage with device
assignment, since we can't write protect pages assigned to the iommu.

> It's not likely that a 16GB guest is going to write-fault in its
> entire memory range immediately.  In fact, it's likely to be
> amortized over a very long period of time, so I have a hard time
> believing this is really an issue in practice.

It depends on the workload.  With large pages in both host and guest
you can touch 10M pages/sec without difficulty.  Once you write protect
them this drops to maybe 0.3M pages/sec.  The right workload will
suffer pretty badly from this.

> Arguably, it's a much bigger problem for live migration.

It is.  I once considered switching live migration to shadow pte dirty
bit tracking instead of write protection, but ept doesn't have dirty
bits, so this will only help a minority of deployments by the time it
reaches users.
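To make that cost easy to reproduce, here is a rough standalone
microbenchmark (illustrative only, not qemu code; the 1 GB size and the
output format are arbitrary).  It touches a large anonymous private
mapping, forks, and touches it again, so the last pass pays one COW
write fault per page:

/* cow-cost.c -- rough illustration of the write-fault penalty fork()
 * imposes on a large private mapping.  Illustrative only, not qemu code.
 * Build: gcc -O2 cow-cost.c -o cow-cost   (needs ~2 GB of free RAM)
 */
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

#define SIZE (1UL << 30)                /* 1 GB of anonymous memory */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

/* Write one byte per page and report the rate in M pages/sec. */
static void touch(unsigned char *p, long page, const char *what)
{
    double t0 = now();
    for (unsigned long off = 0; off < SIZE; off += page)
        p[off]++;
    double t1 = now();
    printf("%-24s %6.2f M pages/sec\n", what,
           SIZE / (double)page / (t1 - t0) / 1e6);
}

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    touch(p, page, "first touch (fault in)");
    touch(p, page, "warm, before fork");

    /* fork() write-protects every private page for copy-on-write; the
     * child just sits there so the parent's next pass COW-faults on
     * every page. */
    pid_t child = fork();
    if (child == 0) {
        pause();
        _exit(0);
    }

    touch(p, page, "after fork (COW)");

    kill(child, SIGTERM);
    waitpid(child, NULL, 0);
    munmap(p, SIZE);
    return 0;
}

With 4K pages the absolute rates will differ from the large-page
numbers above, but the before/after-fork ratio is the interesting part.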
So vfork() is required, or, in light of its man page and the glowing
recommendations from the security people, we can mark guest memory as
shared on fork and use plain fork(), like we do for pre-mmu-notifier
kernels.

-- 
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.
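P.S. A minimal sketch of the "shared guest memory plus plain fork()"
variant, in case it helps the discussion.  The alloc_guest_ram() helper
and the /bin/true stand-in for the bridge helper are made up for
illustration; this is not the actual qemu RAM allocator or helper
invocation:

#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

static void *alloc_guest_ram(size_t size, int shared)
{
    /* MAP_SHARED keeps fork() from write-protecting these pages for
     * COW; parent and child simply share them, as on pre-mmu-notifier
     * kernels. */
    int flags = MAP_ANONYMOUS | (shared ? MAP_SHARED : MAP_PRIVATE);
    void *ram = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return ram == MAP_FAILED ? NULL : ram;
}

int main(void)
{
    size_t size = 256UL << 20;          /* stand-in for guest RAM */
    unsigned char *ram = alloc_guest_ram(size, 1);

    if (!ram) {
        perror("mmap");
        return 1;
    }

    for (size_t off = 0; off < size; off += 4096)   /* fault pages in */
        ram[off] = 1;

    pid_t pid = fork();                 /* no COW write-protection here */
    if (pid == 0) {
        /* Child: this is where a setuid bridge helper would be exec'd;
         * /bin/true is only a placeholder. */
        execl("/bin/true", "true", (char *)NULL);
        _exit(127);
    }
    waitpid(pid, NULL, 0);

    ram[0] = 2;                         /* shared mapping: no COW fault */
    munmap(ram, size);
    return 0;
}

Shared anonymous memory is shmem-backed, so it changes how guest RAM
behaves under swap; treat this as a sketch of the idea rather than a
recommendation.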