From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:56128) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZzOyl-00067n-HG for qemu-devel@nongnu.org; Thu, 19 Nov 2015 08:10:27 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZzOyf-0006GG-Of for qemu-devel@nongnu.org; Thu, 19 Nov 2015 08:10:23 -0500 Received: from mx1.redhat.com ([209.132.183.28]:51652) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZzOyf-0006G5-HK for qemu-devel@nongnu.org; Thu, 19 Nov 2015 08:10:17 -0500 Date: Thu, 19 Nov 2015 13:10:11 +0000 From: "Dr. David Alan Gilbert" Message-ID: <20151119131011.GD2653@work-vm> References: <1447341516-18076-1-git-send-email-jjherne@linux.vnet.ibm.com> <564C7DCA.8010400@suse.cz> <564D86AE.1010305@de.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: <564D86AE.1010305@de.ibm.com> Subject: Re: [Qemu-devel] [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390 List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Christian Borntraeger Cc: aarcange@redhat.com, Juan Quintela , qemu-devel , "Jason J. Herne" , akpm@linux-foundation.org, Vlastimil Babka * Christian Borntraeger (borntraeger@de.ibm.com) wrote: > On 11/18/2015 02:31 PM, Vlastimil Babka wrote: > > [CC +=3D linux-api@vger.kernel.org] > >=20 > > Since this is a kernel-user-space API change, please CC linux-api@. The= kernel > > source file Documentation/SubmitChecklist notes that all Linux kernel p= atches > > that change userspace interfaces should be CCed to linux-api@vger.kerne= l.org, so > > that the various parties who are interested in API changes are informed= =2E For > > further information, see https://www.kernel.org/doc/man-pages/linux-api= -ml.html > >=20 > > On 11/12/2015 04:18 PM, Jason J. Herne wrote: > >> MADV_NOHUGEPAGE processing is too restrictive. kvm already disables > >> hugepage but hugepage_madvise() takes the error path when we ask to tu= rn > >> on the MADV_NOHUGEPAGE bit and the bit is already on. This causes Qemu= 's > >> new postcopy migration feature to fail on s390 because its first actio= n is > >> to madvise the guest address space as NOHUGEPAGE. This patch modifies = the > >> code so that the operation succeeds without error now. > >> > >> Signed-off-by: Jason J. Herne > >> Reviewed-by: Andrea Arcangeli > >=20 > > Looks like the manpage should be fine, as it wasn't very specific wrt t= hese > > madvise flags. The only thing that potentially applies is: > >=20 > > "EINVAL advice is not a valid." > >=20 > > which itself looks like it needs fixing. Valid what, value? As in compl= etely > > unknown flags, or flags not valid for the given vma? > >=20 > > Anyway, I agree that it doesn't make sense to fail madvise when the giv= en flag > > is already set. On the other hand, I don't think the userspace app shou= ld fail > > just because of madvise failing? It should in general be an advice that= the > > kernel is also strictly speaking free to ignore as it shouldn't affect > > correctnes, just performance. Yeah, there are exceptions today like > > MADV_DONTNEED, but that shouldn't apply to hugepages? > > So I think Qemu needs fixing too. >=20 > yes, I agree. David, Juan. I think The postcopy code should not fail if t= he madvise. > Can you fix that?=20 (cutting down cc list a bit) Can you tell me if the following works for you: =46rom 545809a18fa768eccdaafe9bd842490c3390b00c Mon Sep 17 00:00:00 2001 =46rom: "Dr. David Alan Gilbert" Date: Thu, 19 Nov 2015 12:05:36 +0000 Subject: [PATCH] Assume madvise for (no)hugepage works madvise() returns EINVAL in the case of many failures, but also returns it in cases where the host kernel doesn't have THP enabled. Postcopy only really cares that THP is off before it detects faults, and turns it back on afterwards; so we're going to have to assume that if the madvise fails then the host just doesn't do THP and we can carry on with the postcopy. Signed-off-by: Dr. David Alan Gilbert --- migration/postcopy-ram.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index 22d6b18..3946aa9 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -241,10 +241,7 @@ static int cleanup_range(const char *block_name, void = *host_addr, * We turned off hugepage for the precopy stage with postcopy enabled * we can turn it back on now. */ - if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) { - error_report("%s HUGEPAGE: %s", __func__, strerror(errno)); - return -1; - } + qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE); =20 /* * We can also turn off userfault now since we should have all the @@ -345,10 +342,7 @@ static int nhp_range(const char *block_name, void *hos= t_addr, * do delete areas of the page, even if THP thinks a hugepage would * be a good idea, so force hugepages off. */ - if (qemu_madvise(host_addr, length, QEMU_MADV_NOHUGEPAGE)) { - error_report("%s: NOHUGEPAGE: %s", __func__, strerror(errno)); - return -1; - } + qemu_madvise(host_addr, length, QEMU_MADV_NOHUGEPAGE); =20 return 0; } --=20 2.5.0 -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK