From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:59329) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZtfrV-0003EC-3L for qemu-devel@nongnu.org; Tue, 03 Nov 2015 12:59:14 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZtfrQ-0004hk-S1 for qemu-devel@nongnu.org; Tue, 03 Nov 2015 12:59:13 -0500 Received: from mx1.redhat.com ([209.132.183.28]:8656) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZtfrQ-0004he-KO for qemu-devel@nongnu.org; Tue, 03 Nov 2015 12:59:08 -0500 Date: Tue, 3 Nov 2015 17:59:02 +0000 From: "Dr. David Alan Gilbert" Message-ID: <20151103175902.GB31958@work-vm> References: <1443515898-3594-1-git-send-email-dgilbert@redhat.com> <1443515898-3594-34-git-send-email-dgilbert@redhat.com> <87d1w8bow4.fsf@neno.neno> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87d1w8bow4.fsf@neno.neno> Subject: Re: [Qemu-devel] [PATCH v8 33/54] postcopy: Incoming initialisation List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Juan Quintela Cc: aarcange@redhat.com, liang.z.li@intel.com, qemu-devel@nongnu.org, luis@cs.umu.se, bharata@linux.vnet.ibm.com, amit.shah@redhat.com, pbonzini@redhat.com * Juan Quintela (quintela@redhat.com) wrote: > "Dr. David Alan Gilbert (git)" wrote: > > From: "Dr. David Alan Gilbert" > > > > Signed-off-by: Dr. David Alan Gilbert > > Reviewed-by: David Gibson > > Reviewed-by: Amit Shah > > > +/* > > + * At the end of migration, undo the effects of init_range > > + * opaque should be the MIS. > > + */ > > +static int cleanup_range(const char *block_name, void *host_addr, > > + ram_addr_t offset, ram_addr_t length, void *opaque) > > +{ > > + MigrationIncomingState *mis = opaque; > > + struct uffdio_range range_struct; > > + trace_postcopy_cleanup_range(block_name, host_addr, offset, length); > > + > > + /* > > + * We turned off hugepage for the precopy stage with postcopy enabled > > + * we can turn it back on now. > > + */ > > +#ifdef MADV_HUGEPAGE > > + if (madvise(host_addr, length, MADV_HUGEPAGE)) { > > + error_report("%s HUGEPAGE: %s", __func__, strerror(errno)); > > + return -1; > > + } > > +#endif > > this should be the same than: > > qemu_madvise(host_addr, lenght, QEMU_MADV_HUGEPAGE); > > Only problem I can see, is that there is no way to differentiate that > madvise() has given one error or that MADV_HUGEPAGE is not defined. > > If we really want that: > > if (QEMU_MADV_HUGEPAGE != QEM_MADV_INVALID) { > if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) { > error_report("%s HUGEPAGE: %s", __func__, strerror(errno)); > return -1; > } > > But I am not sure if we want it. Yes, so what I've currently got is: if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) { error_report("%s HUGEPAGE: %s", __func__, strerror(errno)); return -1; } I'm tempted to add that if check, but the other similar case is where you have headers that define HUGEPAGE, but a kernel built without it, and in that case the madvise fails, which is a shame, since if the kernel hasn't actually got transparent hugepages, then we don't care if it failed to turn them on/off - but there doesn't seem to be a good way to tell that. > > + /* > > + * We can also turn off userfault now since we should have all the > > + * pages. It can be useful to leave it on to debug postcopy > > + * if you're not sure it's always getting every page. > > + */ > > + range_struct.start = (uintptr_t)host_addr; > > + range_struct.len = length; > > + > > + if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) { > > + error_report("%s: userfault unregister %s", __func__, strerror(errno)); > > + > > + return -1; > > + } > > + > > + return 0; > > +} > > > I still think that exposing the userfault API all around is a bad idea, > that it would be easier to just export: > > qemu_userfault_register_range(addr, lenght); > qemu_userfault_unregister_range(addr, lenght); > > And hide the details on a header file. That only hides a tiny bit of the detail; for example the ioctl's for UFFDIO_COPY and UFFDIO_ZEROPAGE, have semantics associated with them (that they also wake the waiting process for example) it's not obvious that another OS would implement it in a similar way or what the constraints on it would be. Indeed the previous kernel API we had, meant I had to do a lot more work with a similar set of calls in userspace. Most of the places where we pull this out into separate headers/libraries is where we have an interface that's the same across a bunch of different OSs but the detail is different. Currently we only have one interaface and no idea what the commonality would be, or how much of the semantics that's in postcopy-ram.c would need to move with that interface as well. Dave > > Later, Juan. -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK