From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1MInBP-0002EO-KL
	for qemu-devel@nongnu.org; Mon, 22 Jun 2009 13:19:51 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1MInBK-00026s-6l
	for qemu-devel@nongnu.org; Mon, 22 Jun 2009 13:19:50 -0400
Received: from [199.232.76.173] (port=38547 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1MInBJ-00026f-Vu
	for qemu-devel@nongnu.org; Mon, 22 Jun 2009 13:19:46 -0400
Received: from mx2.redhat.com ([66.187.237.31]:49778)
	by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from )
	id 1MInBJ-0003FU-Fj
	for qemu-devel@nongnu.org; Mon, 22 Jun 2009 13:19:45 -0400
Message-ID: <4A3FBD61.8030109@redhat.com>
Date: Mon, 22 Jun 2009 20:20:33 +0300
From: Avi Kivity
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [Qemu-commits] [COMMIT 3086844] Instead of
 writing a zero page, madvise it away
References: <200906221549.n5MFn3Qd015389@d03av02.boulder.ibm.com>
 <4A3FAD69.60507@redhat.com> <4A3FB077.4040607@codemonkey.ws>
 <4A3FB390.4060809@redhat.com> <4A3FB95D.3060404@us.ibm.com>
In-Reply-To: <4A3FB95D.3060404@us.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: Anthony Liguori
Cc: qemu-devel

On 06/22/2009 08:03 PM, Anthony Liguori wrote:
>>>
>>> 1) Start a guest with 1024, balloon down to 128MB. RSS size is now
>>> ~128MB
>>> 2) Live migrate to a different node
>>> 3) RSS on different node jumps to ~1GB
>>
>> 3.5) RSS on source node jumps to ~1GB, since reading the page
>> instantiates the pte
> I don't follow. In this case, the issue is:
>
> Surely we can do better here...
>
> For TCG, we always know when memory is dirty and we can check it
> atomically. So we know whether a page has changed since we knew it
> was last zero. We basically need a ZERO_DIRTY bit. All memory
> initially carries this bit and ballooning also sets the bit. During
> live migration, we can check the dirty bit first.

You mean, a NONZERO bit which is cleared by ballooning and set on any
write. This will work naturally with the qemu dirty bytemap.

>
> For KVM, we would have to enable dirty tracking always to keep
> ZERO_DIRTY up to date. Since write faults are going to happen anyway
> at start up, perhaps the cost of doing this wouldn't be so bad?

You need to do this on the source node. Unfortunately, there's no way
to initialize the values racelessly when you start live migration
without introducing a new ioctl.

I'd like a more general solution rather than something that targets
this specific problem.

-- 
error compiling committee.c: too many arguments to function
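
For readers of the archive, the NONZERO bit discussed above (cleared by ballooning, set on any write) could be sketched roughly as follows. This is only an illustration under stated assumptions, not QEMU code: PageTracker and the tracker_* functions are hypothetical names, one byte per page is used to mirror the "dirty bytemap" wording, and pages are assumed to start out zero because fresh anonymous memory is zero-filled. It does not address the KVM initialization race mentioned in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tracker: one byte per guest page, 1 = page may be nonzero. */
typedef struct {
    uint8_t *nonzero;
    size_t   npages;
} PageTracker;

/* Assumption: every page starts as known-zero, so calloc's zero fill fits. */
static bool tracker_init(PageTracker *t, size_t npages)
{
    t->nonzero = calloc(npages, 1);
    t->npages = t->nonzero ? npages : 0;
    return t->nonzero != NULL;
}

static void tracker_free(PageTracker *t)
{
    free(t->nonzero);
    t->nonzero = NULL;
    t->npages = 0;
}

/* Any guest write sets the bit: the page can no longer be assumed zero. */
static void tracker_note_write(PageTracker *t, size_t page)
{
    assert(page < t->npages);
    t->nonzero[page] = 1;
}

/* Ballooning a page out clears the bit: the page is known to be zero. */
static void tracker_note_balloon(PageTracker *t, size_t page)
{
    assert(page < t->npages);
    t->nonzero[page] = 0;
}

/* Migration consults the bit before touching the page, so a known-zero
 * page is never read and its pte is never instantiated on the source. */
static bool tracker_may_be_nonzero(const PageTracker *t, size_t page)
{
    assert(page < t->npages);
    return t->nonzero[page] != 0;
}

/* Number of pages a migration pass would actually have to read. */
static size_t tracker_count_pages_to_read(const PageTracker *t)
{
    size_t count = 0;
    for (size_t i = 0; i < t->npages; i++) {
        if (t->nonzero[i]) {
            count++;
        }
    }
    return count;
}
```

In this sketch a migration loop would send a "zero page" marker for every page whose bit is clear and read guest memory only for the rest.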