Subject: Re: [Qemu-devel] [RFC] pseries: Enable in-kernel H_LOGICAL_CI_{LOAD, STORE} implementations
Date: Fri, 06 Feb 2015 08:56:32 +0100
From: Alexander Graf
To: David Gibson
Cc: "mdroth@us.ibm.com", "aik@ozlabs.ru", "qemu-devel@nongnu.org", "qemu-ppc@nongnu.org", Stefan Hajnoczi, Paul Mackerras
Message-ID: <54D473B0.3020201@suse.de>
In-Reply-To: <20150206025415.GZ25675@voom.fritz.box>

On 06.02.15 03:54, David Gibson wrote:
> On Thu, Feb 05, 2015 at 12:55:45PM +0100, Alexander Graf wrote:
>>
>>
>> On 05.02.15 12:30, David Gibson wrote:
>>> On Thu, Feb 05, 2015 at 11:22:13AM +0100, Alexander Graf wrote:
> [snip]
>>>>>>>>>> [snip]
>>>>>>>>>>
>>>>>>>>>>> +        ret1 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_LOAD);
>>>>>>>>>>> +        if (ret1 != 0) {
>>>>>>>>>>> +            fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_LOAD in KVM:"
>>>>>>>>>>> +                    " %s\n", strerror(errno));
>>>>>>>>>>> +        }
>>>>>>>>>>> +
>>>>>>>>>>> +        ret2 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_STORE);
>>>>>>>>>>> +        if (ret2 != 0) {
>>>>>>>>>>> +            fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_STORE in KVM:"
>>>>>>>>>>> +                    " %s\n", strerror(errno));
>>>>>>>>>>> +        }
>>>>>>>>>>> +
>>>>>>>>>>> +        if ((ret1 != 0) || (ret2 != 0)) {
>>>>>>>>>>> +            fprintf(stderr, "Warning: Couldn't enable H_LOGICAL_CI_* in KVM, SLOF"
>>>>>>>>>>> +                    " may be unable to operate devices with in-kernel emulation\n");
>>>>>>>>>>> +        }
>>>>>>>>>>
>>>>>>>>>> You'll always get these warnings if you're running on an old (meaning
>>>>>>>>>> current upstream) kernel, which could be annoying.
>>>>>>>>>
>>>>>>>>> True.
>>>>>>>>>
>>>>>>>>>> Is there any way
>>>>>>>>>> to tell whether you have configured any devices which need the
>>>>>>>>>> in-kernel MMIO emulation and only warn if you have?
>>>>>>>>>
>>>>>>>>> In theory, I guess so.  In practice I can't see how you'd enumerate
>>>>>>>>> all devices that might require kernel intervention without something
>>>>>>>>> horribly invasive.
>>>>>>>>
>>>>>>>> We could WARN_ONCE in QEMU if we emulate such a hypercall, but its
>>>>>>>> handler is io_mem_unassigned (or we add another minimum priority huge
>>>>>>>> memory region on all 64bits of address space that reports the breakage).
>>>>>>>
>>>>>>> Would that work for the virtio+iothread case?  I had the impression
>>>>>>> the kernel handled notification region was layered over the qemu
>>>>>>> emulated region in that case.
>>>>>>
>>>>>> IIRC we don't have a way to call back into kvm saying "please write to
>>>>>> this in-kernel device". But we could at least defer the warning to a
>>>>>> point where we know that we actually hit it.
>>>>>
>>>>> Right, but I'm saying we might miss the warning in cases where we want
>>>>> it, because the KVM device is shadowed by a qemu device, so qemu won't
>>>>> see the IO as unassigned or unhandled.
>>>>>
>>>>> In particular, I think that will happen in the case of virtio-blk with
>>>>> iothread, which is the simplest case in which to observe the problem.
>>>>> The virtio-blk device exists in qemu and is functional, but we rely on
>>>>> KVM catching the queue notification MMIO before it reaches the qemu
>>>>> implementation of the rest of the device's IO space.
>>>>
>>>> But in that case the VM stays functional and will merely see a
>>>> performance hit when using virtio in SLOF, no? I don't think that's
>>>> a problem worth worrying users about.
>>>
>>> Alas, no.  The iothread stuff *relies* on the in-kernel notification,
>>> so it will not work if the IO gets punted to qemu.  This is the whole
>>> reason for the in-kernel hcall implementation.
>>
>> So at least with vhost-net the in-kernel trapping is optional. If we
>> happen to get MMIO into QEMU, we'll just handle it there.
>>
>> Enlighten me why the iothread stuff can't handle it that way too.
>
> So, as I understand it, it could, but it doesn't.  Working out how to
> fix it properly requires better understanding of the dataplane code
> than I currently possess.
>
> So, using virtio-blk as the example case.  Normally the queue notify
> mmio will get routed by the general virtio code to
> virtio_blk_handle_output().
>
> In the case of dataplane, that just calls
> virtio_blk_data_plane_start().  So the first time we get a vq notify,
> the dataplane is started.  That sets up the host notifier
> (VirtioBusClass::set_host_notifier -> virtio_pci_set_host_notifier ->
> virtio_pci_set_host_notifier_internal -> memory_region_add_eventfd()
> -> memory_region_transaction_commit() ->
> address_space_update_ioeventfds -> address_space_add_del_ioeventfds ->
> kvm_mem_ioeventfd_add -> kvm_set_ioeventfd_mmio -> KVM_IOEVENTFD
> ioctl)
>
> From this point on further calls to virtio_blk_handle_output() are
> IIUC a "can't happen", because vq notifies should go to the eventfd
> instead, where they will kick the iothread.
>
> So, with SLOF, the first request is ok - it hits
> virtio_blk_handle_output() which starts the iothread which goes on to
> process the request.
>
> On the second request, however, we get back into
> virtio_blk_data_plane_start() which sees the iothread is already
> running and aborts. I think it is assuming that this must be the
> result of a race with another vcpu starting the dataplane, and so
> assumes the racing thread will have woken the dataplane which will
> then handle this vcpu's request as well.
>
> In our case, however, the IO hcalls go through to
> virtio_blk_handle_output() when the dataplane is already going, and
> become no-ops without waking it up again to handle the new request.
>
> Enlightened enough yet?

So reading this, it sounds like we could just add logic in the virtio
dataplane code that allows for a graceful fallback to QEMU-based MMIO
by triggering the eventfd itself in the MMIO handler (a rough sketch
of what that could look like is below). When going via this slow path,
we should of course emit a warning (once) to the user ;).

Stefan, what do you think?


Alex
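
For illustration, a rough and untested sketch of that kind of fallback: if a
queue notify reaches the QEMU handler while the dataplane is already running
(meaning the in-kernel ioeventfd was bypassed, as with H_LOGICAL_CI_* emulated
in userspace), signal the host notifier by hand instead of letting the notify
become a no-op. The field names (s->dataplane_started in particular) and the
dataplane include path are assumptions about the QEMU tree of that era, not a
verified patch against it:

/*
 * Sketch only: function and field names approximate the QEMU 2.x
 * virtio-blk code and may not match the actual tree.
 */
#include "qemu/osdep.h"                /* bool, basic types */
#include "qemu/error-report.h"
#include "qemu/event_notifier.h"
#include "hw/virtio/virtio.h"
#include "hw/virtio/virtio-blk.h"
#include "dataplane/virtio-blk.h"      /* as included from hw/block/virtio-blk.c */

static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
{
    VirtIOBlock *s = VIRTIO_BLK(vdev);
    static bool warned;

    if (s->dataplane) {
        if (!s->dataplane_started) {
            /* First notify: start the iothread, as the code does today. */
            virtio_blk_data_plane_start(s->dataplane);
            return;
        }
        /*
         * The dataplane is running but the notify still reached QEMU,
         * so the in-kernel ioeventfd was bypassed (e.g. H_LOGICAL_CI_*
         * handled in userspace).  Fall back gracefully: kick the same
         * EventNotifier the ioeventfd would have signalled, and warn
         * once about the slow path.
         */
        if (!warned) {
            error_report("virtio-blk: queue notify took the userspace "
                         "slow path; in-kernel MMIO acceleration "
                         "appears to be unavailable");
            warned = true;
        }
        event_notifier_set(virtio_queue_get_host_notifier(vq));
        return;
    }

    /* ... non-dataplane request handling continues unchanged ... */
}

Kicking the EventNotifier rather than calling into the dataplane request
handling directly keeps all dataplane work on the iothread, which is
presumably why a fallback would want to go through the notifier instead of
processing the request from the vcpu thread.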