From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1MImr6-0003Vw-P9 for qemu-devel@nongnu.org; Mon, 22 Jun 2009 12:58:52 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1MImr1-0003Uy-3W for qemu-devel@nongnu.org; Mon, 22 Jun 2009 12:58:51 -0400
Received: from [199.232.76.173] (port=47428 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1MImr0-0003Uv-TA for qemu-devel@nongnu.org; Mon, 22 Jun 2009 12:58:46 -0400
Received: from mx20.gnu.org ([199.232.41.8]:16039) by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1MImr0-0004WT-As for qemu-devel@nongnu.org; Mon, 22 Jun 2009 12:58:46 -0400
Received: from e33.co.us.ibm.com ([32.97.110.151]) by mx20.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1MImqw-0006y1-7h for qemu-devel@nongnu.org; Mon, 22 Jun 2009 12:58:42 -0400
Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227]) by e33.co.us.ibm.com (8.13.1/8.13.1) with ESMTP id n5MGuTK4016011 for ; Mon, 22 Jun 2009 10:56:29 -0600
Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v9.2) with ESMTP id n5MGwLSt220416 for ; Mon, 22 Jun 2009 10:58:23 -0600
Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n5MGwKGb017826 for ; Mon, 22 Jun 2009 10:58:20 -0600
Message-ID: <4A3FB829.10203@us.ibm.com>
Date: Mon, 22 Jun 2009 11:58:17 -0500
From: Anthony Liguori
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [Qemu-commits] [COMMIT 3086844] Instead of writing a zero page, madvise it away
References: <200906221549.n5MFn3Qd015389@d03av02.boulder.ibm.com> <4A3FAD69.60507@redhat.com> <4A3FB077.4040607@codemonkey.ws> <4A3FB390.4060809@redhat.com>
In-Reply-To: <4A3FB390.4060809@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: Avi Kivity
Cc: qemu-devel

Avi Kivity wrote:
> On 06/22/2009 07:25 PM, Anthony Liguori wrote:
>>
>> Xen had a similar issue. This ends up biting people who overcommit
>> their VMs via ballooning, live migration, and badness ensues. At
>> least for us, the error is swapping but madvise also avoids the issue
>> by never consuming that memory to begin with.
>
> Right. I'd love to do madvise() on the source node as well if we
> fault in a page and find out it's zero, but the guest (and aio) is
> still running and we might drop live data. We need a
> madvise(MADV_DONTNEED_IFZERO), or a mincore() flag that tells us if
> the page exists (vs. swapped). ksm would also do this, but it is
> overkill for some applications.
>
> Note that the patch contains a small bug -- the kernel is allowed to
> ignore the advise according to the manual page, so it's better to
> memset() the memory before dropping it.

Hrm, that's not quite how I interpreted the man page.

"This call does not influence the semantics of the application (except in
the case of MADV_DONTNEED), but may influence its performance. The
kernel is free to ignore the advice."

MADV_DONTNEED is called out as changing the application semantics.
Specifically, I think the kernel has to zero-fill even if it chooses to
ignore the advice.

I limited the guard to Linux specifically because I was unsure about
that behavior but it would be good to clarify if anyone knows how.

--
Regards,

Anthony Liguori
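
The open question above (whether a page reads back as zeroes after madvise(MADV_DONTNEED)) can at least be probed empirically. What follows is a minimal sketch, not code from the patch under discussion: it assumes Linux and a private anonymous mapping, and the function name dontneed_zero_fills is hypothetical. A passing result shows what one particular kernel does; it does not establish that the behavior is guaranteed, and other operating systems may behave differently.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS and madvise() on glibc */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of QEMU).
 * Returns  1 if a dirtied private anonymous page reads back as all
 *            zeroes after madvise(MADV_DONTNEED),
 *          0 if any of the old data is still visible,
 *         -1 if a system call failed.
 */
int dontneed_zero_fills(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;
    size_t len = (size_t)page_size;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Dirty the page so there is real data for the kernel to drop. */
    memset(p, 0xAA, len);
    assert(p[0] == 0xAA);

    int result;
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        result = -1;
    } else {
        /* Touch every byte again and check whether it was zero-filled. */
        result = 1;
        for (size_t i = 0; i < len; i++) {
            if (p[i] != 0) {
                result = 0;
                break;
            }
        }
    }

    munmap(p, len);
    return result;
}
```

Calling dontneed_zero_fills() from a small main() and printing the return value is enough to try this on a given host. The check deliberately uses a private anonymous mapping; file-backed or shared mappings would be expected to repopulate from the backing object rather than read as zero, so they are out of scope for this sketch.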