From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52122) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YFy75-0006Rl-Ng for qemu-devel@nongnu.org; Mon, 26 Jan 2015 23:50:57 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YFy72-00033v-D6 for qemu-devel@nongnu.org; Mon, 26 Jan 2015 23:50:55 -0500 Received: from ozlabs.org ([103.22.144.67]:50517) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YFy71-00033Y-RW for qemu-devel@nongnu.org; Mon, 26 Jan 2015 23:50:52 -0500 Date: Tue, 27 Jan 2015 15:33:12 +1100 From: David Gibson Message-ID: <20150127043312.GA19081@voom.fritz.box> References: <1412358473-31398-1-git-send-email-dgilbert@redhat.com> <1412358473-31398-45-git-send-email-dgilbert@redhat.com> <20141113032306.GF7291@voom.fritz.box> <20150105171349.GA18097@work-vm> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="82I3+IH0IqGh5yIs" Content-Disposition: inline In-Reply-To: <20150105171349.GA18097@work-vm> Subject: Re: [Qemu-devel] [PATCH v4 44/47] Postcopy; Handle userfault requests List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Dr. David Alan Gilbert" Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, quintela@redhat.com, cristian.klein@cs.umu.se, qemu-devel@nongnu.org, amit.shah@redhat.com, yanghy@cn.fujitsu.com --82I3+IH0IqGh5yIs Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Jan 05, 2015 at 05:13:50PM +0000, Dr. David Alan Gilbert wrote: > * David Gibson (david@gibson.dropbear.id.au) wrote: > > On Fri, Oct 03, 2014 at 06:47:50PM +0100, Dr. David Alan Gilbert (git) = wrote: > > > From: "Dr. David Alan Gilbert" > > >=20 > > > userfaultfd is a Linux syscall that gives an fd that receives a stream > > > of notifications of accesses to pages marked as MADV_USERFAULT, and > > > allows the program to acknowledge those stalls and tell the accessing > > > thread to carry on. > > >=20 > > > Signed-off-by: Dr. David Alan Gilbert > >=20 > > [snip] > > > /* > > > + * Tell the kernel that we've now got some memory it previously aske= d for. > > > + * Note: We're not allowed to ack a page which wasn't requested. > > > + */ > > > +static int ack_userfault(MigrationIncomingState *mis, void *start, s= ize_t len) > > > +{ > > > + uint64_t tmp[2]; > > > + > > > + /* > > > + * Kernel wants the range that's now safe to access > > > + * Note it always takes 64bit values, even on a 32bit host. > > > + */ > > > + tmp[0] =3D (uint64_t)(uintptr_t)start; > > > + tmp[1] =3D (uint64_t)(uintptr_t)start + (uint64_t)len; > > > + > > > + if (write(mis->userfault_fd, tmp, 16) !=3D 16) { > > > + int e =3D errno; > >=20 > > Is an EOF (i.e. write() returns 0) ever possible here? If so errno > > may not have a meaningful value. >=20 > I don't think so; I think any !=3D16 case is an error; however if I under= stand > correctly the safe thing to do is for me to do an=20 >=20 > errno =3D 0 >=20 > before the call. Either that, or handle unexpected EOF / short write as a different case. >=20 > >=20 > > > + if (e =3D=3D ENOENT) { > > > + /* Kernel said it wasn't waiting - one case where this c= an > > > + * happen is where two threads triggered the userfault > > > + * and we receive the page and ack it just after we rece= ived > > > + * the 2nd request and that ends up deciding it should a= ck it > > > + * We could optimise it out, but it's rare. > > > + */ > > > + /*fprintf(stderr, "ack_userfault: %p/%zx ENOENT\n", star= t, len); */ > > > + return 0; > > > + } > > > + error_report("postcopy_ram: Failed to notify kernel for %p/%= zx (%d)", > > > + start, len, e); > > > + return -errno; >=20 > Hmm, and made that return -e Ah, yes, otherwise it's very likely that error_report() will clobber the value. > > > +/* > > > * Handle faults detected by the USERFAULT markings > > > */ > > > static void *postcopy_ram_fault_thread(void *opaque) > > > { > > > MigrationIncomingState *mis =3D (MigrationIncomingState *)opaque; > > > + void *hostaddr; > > > + int ret; > > > + size_t hostpagesize =3D getpagesize(); > > > + RAMBlock *rb =3D NULL; > > > + RAMBlock *last_rb =3D NULL; /* last RAMBlock we sent part of */ > > > =20 > > > - fprintf(stderr, "postcopy_ram_fault_thread\n"); > > > - /* TODO: In later patch */ > > > + DPRINTF("%s", __func__); > > > qemu_sem_post(&mis->fault_thread_sem); > > > - while (1) { > > > - /* TODO: In later patch */ > > > - } > > > + while (true) { > > > + PostcopyPMIState old_state, tmp_state; > > > + ram_addr_t rb_offset; > > > + ram_addr_t in_raspace; > > > + unsigned long bitmap_index; > > > + struct pollfd pfd[2]; > > > + > > > + /* > > > + * We're mainly waiting for the kernel to give us a faulting= HVA, > > > + * however we can be told to quit via userfault_quit_fd whic= h is > > > + * an eventfd > > > + */ > > > + pfd[0].fd =3D mis->userfault_fd; > > > + pfd[0].events =3D POLLIN; > > > + pfd[0].revents =3D 0; > > > + pfd[1].fd =3D mis->userfault_quit_fd; > > > + pfd[1].events =3D POLLIN; /* Waiting for eventfd to go posit= ive */ > > > + pfd[1].revents =3D 0; > > > + > > > + if (poll(pfd, 2, -1 /* Wait forever */) =3D=3D -1) { > > > + perror("userfault poll"); > > > + break; > > > + } > > > =20 > > > + if (pfd[1].revents) { > > > + DPRINTF("%s got quit event", __func__); > > > + break; > >=20 > > I don't see any cleanup path in the userfault thread. So wouldn't it > > be simpler to just pthread_cancel() it rather than using an extra fd > > for quit notifications. >=20 > But it does call functions that take locks (both the pmi and the > return path qemu-file), so I don't feel comfortable just cancelling the > thread. Ah, good point. Use of an event restrict the points at which the thread can exit, which is significant. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --82I3+IH0IqGh5yIs Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJUxxUHAAoJEGw4ysog2bOSPXMQAI2xkJGeJETC9JbVsAvHswCY RzcY7TZ1iaKSRFLawS0VJZ2NZuMvfzEpzs/GwtoGoSoEStS+gxWwmo2LskrpsRuw +7M0gol7arP6cxoHV81GMeYcmmEJotstXZZR7X/VFSyNxUsa17F8G70PKzjnRvkW zNiM9QP+ABk1Xgu3Cd8hwBDXysY/OTmDnDqKgWCvEvuuHwfdmlQAZG5AhIqKqZ79 gcX9xlG3UcUuW5mGYimVecN2G2FHc6n/3G5NuLRkYlEbX7skvA+yFYH8Z3tMuZ+i Q83E4eai9ZnDWX09WG5zp34km5fgltUkWyO5eNFl1MzAaAK0Gi3OTqXjBwwujaPu 6+SRLlgSANb4RugQ8nQy+6CiG/l8uUZRKyZ1Ois3ORniYsdbFwYaVaB1RhmcE6LM ZStttlpJj8Xpsva0n2rgC65b5ERr45dl+9JwfHaBl0XMX7MNZb+18F/4rdL31We0 i/qKecG2B8JN8YuI37BG+5HU1QqE+vXLh0wrlvSGZ7ynAtZEYG8sngdvLxgT+0jG AEyIKw2MRQq7lGM7Lb0oYN41MRvzxOCL1cgRPqXJFH6pLG680tSRfYgQ1ty9FGMh OHeQ5L205daKKFjUOlUdPdzSSeY8lv1grcuUP1UMjwtvbg/QCObLIIkww08AcNlg AMeekabvqi15SNFqgUxp =LhB4 -----END PGP SIGNATURE----- --82I3+IH0IqGh5yIs--