Date: Mon, 9 Feb 2015 11:37:18 +1100
From: David Gibson
Message-ID: <20150209113718.5974a1a9@voom.fritz.box>
In-Reply-To: <54D473B0.3020201@suse.de>
Subject: Re: [Qemu-devel] [RFC] pseries: Enable in-kernel H_LOGICAL_CI_{LOAD, STORE} implementations
To: Alexander Graf
Cc: mdroth@us.ibm.com, aik@ozlabs.ru, qemu-devel@nongnu.org,
 qemu-ppc@nongnu.org, Stefan Hajnoczi, Paul Mackerras

On Fri, 06 Feb 2015 08:56:32 +0100
Alexander Graf wrote:

> 
> 
> On 06.02.15 03:54, David Gibson wrote:
> > On Thu, Feb 05, 2015 at 12:55:45PM +0100, Alexander Graf wrote:
> >>
> >>
> >> On 05.02.15 12:30, David Gibson wrote:
> >>> On Thu, Feb 05, 2015 at 11:22:13AM +0100, Alexander Graf wrote:
> > [snip]
> >>>>>>>>>> [snip]
> >>>>>>>>>>
> >>>>>>>>>>> +    ret1 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_LOAD);
> >>>>>>>>>>> +    if (ret1 != 0) {
> >>>>>>>>>>> +        fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_LOAD in KVM:"
> >>>>>>>>>>> +                " %s\n", strerror(errno));
> >>>>>>>>>>> +    }
> >>>>>>>>>>> +
> >>>>>>>>>>> +    ret2 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_STORE);
> >>>>>>>>>>> +    if (ret2 != 0) {
> >>>>>>>>>>> +        fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_STORE in KVM:"
> >>>>>>>>>>> +                " %s\n", strerror(errno));
> >>>>>>>>>>> +    }
> >>>>>>>>>>> +
> >>>>>>>>>>> +    if ((ret1 != 0) || (ret2 != 0)) {
> >>>>>>>>>>> +        fprintf(stderr, "Warning: Couldn't enable H_LOGICAL_CI_* in KVM, SLOF"
> >>>>>>>>>>> +                " may be unable to operate devices with in-kernel emulation\n");
> >>>>>>>>>>> +    }
> >>>>>>>>>>
> >>>>>>>>>> You'll always get these warnings if you're running on an old (meaning
> >>>>>>>>>> current upstream) kernel, which could be annoying.
> >>>>>>>>>
> >>>>>>>>> True.
> >>>>>>>>>
> >>>>>>>>>> Is there any way
> >>>>>>>>>> to tell whether you have configured any devices which need the
> >>>>>>>>>> in-kernel MMIO emulation and only warn if you have?
> >>>>>>>>>
> >>>>>>>>> In theory, I guess so.  In practice I can't see how you'd enumerate
> >>>>>>>>> all devices that might require kernel intervention without something
> >>>>>>>>> horribly invasive.
> >>>>>>>>
> >>>>>>>> We could WARN_ONCE in QEMU if we emulate such a hypercall, but its
> >>>>>>>> handler is io_mem_unassigned (or we add another minimum priority huge
> >>>>>>>> memory region on all 64bits of address space that reports the breakage).
> >>>>>>>
> >>>>>>> Would that work for the virtio+iothread case?  I had the impression
> >>>>>>> the kernel handled notification region was layered over the qemu
> >>>>>>> emulated region in that case.
> >>>>>>
> >>>>>> IIRC we don't have a way to call back into kvm saying "please write to
> >>>>>> this in-kernel device". But we could at least defer the warning to a
> >>>>>> point where we know that we actually hit it.
> >>>>>
> >>>>> Right, but I'm saying we might miss the warning in cases where we want
> >>>>> it, because the KVM device is shadowed by a qemu device, so qemu won't
> >>>>> see the IO as unassigned or unhandled.
> >>>>>
> >>>>> In particular, I think that will happen in the case of virtio-blk with
> >>>>> iothread, which is the simplest case in which to observe the problem.
> >>>>> The virtio-blk device exists in qemu and is functional, but we rely on
> >>>>> KVM catching the queue notification MMIO before it reaches the qemu
> >>>>> implementation of the rest of the device's IO space.
> >>>>
> >>>> But in that case the VM stays functional and will merely see a
> >>>> performance hit when using virtio in SLOF, no? I don't think that's
> >>>> a problem worth worrying users about.
> >>>
> >>> Alas, no.  The iothread stuff *relies* on the in-kernel notification,
> >>> so it will not work if the IO gets punted to qemu.  This is the whole
> >>> reason for the in-kernel hcall implementation.
> >>
> >> So at least with vhost-net the in-kernel trapping is optional. If we
> >> happen to get MMIO into QEMU, we'll just handle it there.
> >>
> >> Enlighten me why the iothread stuff can't handle it that way too.
> >
> > So, as I understand it, it could, but it doesn't.  Working out how to
> > fix it properly requires a better understanding of the dataplane code
> > than I currently possess.
> >
> > So, using virtio-blk as the example case.  Normally the queue notify
> > mmio will get routed by the general virtio code to
> > virtio_blk_handle_output().
> >
> > In the case of dataplane, that just calls
> > virtio_blk_data_plane_start().  So the first time we get a vq notify,
> > the dataplane is started.  That sets up the host notifier
> > (VirtioBusClass::set_host_notifier -> virtio_pci_set_host_notifier ->
> > virtio_pci_set_host_notifier_internal -> memory_region_add_eventfd()
> > -> memory_region_transaction_commit() ->
> > address_space_update_ioeventfds -> address_space_add_del_ioeventfds ->
> > kvm_mem_ioeventfd_add -> kvm_set_ioeventfd_mmio -> KVM_IOEVENTFD
> > ioctl).
> >
> > From this point on, further calls to virtio_blk_handle_output() are
> > IIUC a "can't happen", because vq notifies should go to the eventfd
> > instead, where they will kick the iothread.
> >
> > So, with SLOF, the first request is ok - it hits
> > virtio_blk_handle_output(), which starts the iothread, which goes on
> > to process the request.
> >
> > On the second request, however, we get back into
> > virtio_blk_data_plane_start(), which sees the iothread is already
> > running and aborts.  I think it is assuming that this must be the
> > result of a race with another vcpu starting the dataplane, and so
> > assumes the racing thread will have woken the dataplane, which will
> > then handle this vcpu's request as well.
> >
> > In our case, however, the IO hcalls go through to
> > virtio_blk_handle_output() when the dataplane is already going, and
> > become no-ops without waking it up again to handle the new request.
> >
> > Enlightened enough yet?
> 
> So reading this, it sounds like we could just add logic in the virtio
> dataplane code that allows for a graceful fallback to QEMU-based MMIO by
> triggering the eventfd itself in the MMIO handler. When going via this
> slow path, we should of course emit a warning (once) to the user ;).
> 
> Stefan, what do you think?

So, as I understand it, this should be possible.  I even had a draft
which did this.  However, I don't know the dataplane code well enough
to know what race-related gotchas there might be, and therefore how to
do this quite right.

Note that this doesn't remove the need for the in-kernel
H_LOGICAL_CI_* hcalls, because those will still be necessary if we get
real in-kernel emulated devices in future.

-- 
David Gibson
Senior Software Engineer, Virtualization, Red Hat
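
As a rough sketch of the fallback discussed above, i.e. kicking the vq's
host-notifier eventfd from the QEMU MMIO path instead of dropping the
notify, and warning once about the slow path.  virtio_queue_get_host_notifier(),
event_notifier_set() and error_report() are existing QEMU APIs; the
function name, the dataplane_already_started() helper and the exact hook
point are placeholders for illustration, not the actual dataplane code.

#include <stdbool.h>
#include "hw/virtio/virtio.h"
#include "qemu/event_notifier.h"
#include "qemu/error-report.h"

/* Placeholder: however the dataplane tracks that it has been started. */
static bool dataplane_already_started(VirtIODevice *vdev);

/*
 * Sketch only: if a vq notify reaches the QEMU MMIO path after the
 * dataplane has already been started (e.g. a SLOF H_LOGICAL_CI_STORE
 * that bypassed the KVM ioeventfd), turn it into an eventfd kick
 * instead of silently dropping it.
 */
static void virtio_blk_notify_fallback_sketch(VirtIODevice *vdev,
                                              VirtQueue *vq)
{
    static bool warned;

    if (dataplane_already_started(vdev)) {
        if (!warned) {
            error_report("virtio-blk: vq notify took the slow MMIO path;"
                         " kicking the iothread via its host notifier");
            warned = true;
        }
        /* Wake the iothread the same way KVM's ioeventfd would have. */
        event_notifier_set(virtio_queue_get_host_notifier(vq));
        return;
    }

    /* ... otherwise fall through to the normal start/handle path ... */
}

The intent is that the iothread receives the same eventfd wakeup it
would normally get from KVM's ioeventfd, just with the extra latency of
a round trip through QEMU, which should be tolerable for the handful of
requests SLOF issues at boot.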