From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:42994) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZtgO7-0005L3-Et for qemu-devel@nongnu.org; Tue, 03 Nov 2015 13:33:01 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZtgO4-0007FL-Mb for qemu-devel@nongnu.org; Tue, 03 Nov 2015 13:32:55 -0500 Received: from mx1.redhat.com ([209.132.183.28]:55904) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZtgO4-0007FG-FV for qemu-devel@nongnu.org; Tue, 03 Nov 2015 13:32:52 -0500 From: Juan Quintela In-Reply-To: <20151103175902.GB31958@work-vm> (David Alan Gilbert's message of "Tue, 3 Nov 2015 17:59:02 +0000") References: <1443515898-3594-1-git-send-email-dgilbert@redhat.com> <1443515898-3594-34-git-send-email-dgilbert@redhat.com> <87d1w8bow4.fsf@neno.neno> <20151103175902.GB31958@work-vm> Date: Tue, 03 Nov 2015 19:32:50 +0100 Message-ID: <87r3k7lyr1.fsf@emacs.mitica> MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [Qemu-devel] [PATCH v8 33/54] postcopy: Incoming initialisation Reply-To: quintela@redhat.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Dr. David Alan Gilbert" Cc: aarcange@redhat.com, liang.z.li@intel.com, qemu-devel@nongnu.org, luis@cs.umu.se, bharata@linux.vnet.ibm.com, amit.shah@redhat.com, pbonzini@redhat.com "Dr. David Alan Gilbert" wrote: > * Juan Quintela (quintela@redhat.com) wrote: >> "Dr. David Alan Gilbert (git)" wrote: >> > From: "Dr. David Alan Gilbert" >> > >> > Signed-off-by: Dr. David Alan Gilbert >> > Reviewed-by: David Gibson >> > Reviewed-by: Amit Shah >> >> > +/* >> > + * At the end of migration, undo the effects of init_range >> > + * opaque should be the MIS. >> > + */ >> > +static int cleanup_range(const char *block_name, void *host_addr, >> > + ram_addr_t offset, ram_addr_t length, void *opaque) >> > +{ >> > + MigrationIncomingState *mis = opaque; >> > + struct uffdio_range range_struct; >> > + trace_postcopy_cleanup_range(block_name, host_addr, offset, length); >> > + >> > + /* >> > + * We turned off hugepage for the precopy stage with postcopy enabled >> > + * we can turn it back on now. >> > + */ >> > +#ifdef MADV_HUGEPAGE >> > + if (madvise(host_addr, length, MADV_HUGEPAGE)) { >> > + error_report("%s HUGEPAGE: %s", __func__, strerror(errno)); >> > + return -1; >> > + } >> > +#endif >> >> this should be the same than: >> >> qemu_madvise(host_addr, lenght, QEMU_MADV_HUGEPAGE); >> >> Only problem I can see, is that there is no way to differentiate that >> madvise() has given one error or that MADV_HUGEPAGE is not defined. >> >> If we really want that: >> >> if (QEMU_MADV_HUGEPAGE != QEM_MADV_INVALID) { >> if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) { >> error_report("%s HUGEPAGE: %s", __func__, strerror(errno)); >> return -1; >> } >> >> But I am not sure if we want it. > > Yes, so what I've currently got is: > > if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) { > error_report("%s HUGEPAGE: %s", __func__, strerror(errno)); > return -1; > } > > I'm tempted to add that if check, but the other similar case > is where you have headers that define HUGEPAGE, but a kernel built without > it, and in that case the madvise fails, which is a shame, since if the > kernel hasn't actually got transparent hugepages, then we don't > care if it failed to turn them on/off - but there doesn't seem > to be a good way to tell that. > >> > + /* >> > + * We can also turn off userfault now since we should have all the >> > + * pages. It can be useful to leave it on to debug postcopy >> > + * if you're not sure it's always getting every page. >> > + */ >> > + range_struct.start = (uintptr_t)host_addr; >> > + range_struct.len = length; >> > + >> > + if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) { >> > + error_report("%s: userfault unregister %s", __func__, >> > strerror(errno)); >> > + >> > + return -1; >> > + } >> > + >> > + return 0; >> > +} >> >> >> I still think that exposing the userfault API all around is a bad idea, >> that it would be easier to just export: >> >> qemu_userfault_register_range(addr, lenght); >> qemu_userfault_unregister_range(addr, lenght); >> >> And hide the details on a header file. > > That only hides a tiny bit of the detail; > for example the ioctl's for UFFDIO_COPY and UFFDIO_ZEROPAGE, have semantics > associated with them (that they also wake the waiting process for example) > it's not obvious that another OS would implement it in a similar way > or what the constraints on it would be. Indeed the previous kernel API > we had, meant I had to do a lot more work with a similar set of calls > in userspace. Most of the places where we pull this out into separate > headers/libraries is where we have an interface that's the same across > a bunch of different OSs but the detail is different. Currently we only > have one interaface and no idea what the commonality would be, or how > much of the semantics that's in postcopy-ram.c would need to move with > that interface as well. ok, if it is too difficult (I didn't knew about the associated semantics), we will wait until something else implemented a similar interface. Thanks, Juan.