From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:58606) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YKdME-0002pf-T4 for qemu-devel@nongnu.org; Sun, 08 Feb 2015 20:41:52 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YKdMB-0002UL-Jc for qemu-devel@nongnu.org; Sun, 08 Feb 2015 20:41:50 -0500 Message-ID: <54D81056.1090108@suse.de> Date: Mon, 09 Feb 2015 02:41:42 +0100 From: Alexander Graf MIME-Version: 1.0 References: <1422943851-25836-1-git-send-email-david@gibson.dropbear.id.au> <20150203211906.GA13992@iris.ozlabs.ibm.com> <20150204013211.GU28703@voom.fritz.box> <54D23872.90007@suse.de> <20150205004812.GD25675@voom.fritz.box> <54D2BF4F.1030609@suse.de> <20150205025556.GH25675@voom.fritz.box> <6EFB0F0E-BB1D-4DAE-8BA4-367B05E88553@suse.de> <20150205113007.GT25675@voom.fritz.box> <54D35A41.6020907@suse.de> <20150206025415.GZ25675@voom.fritz.box> <54D473B0.3020201@suse.de> <20150209113718.5974a1a9@voom.fritz.box> In-Reply-To: <20150209113718.5974a1a9@voom.fritz.box> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [RFC] pseries: Enable in-kernel H_LOGICAL_CI_{LOAD, STORE} implementations List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: "mdroth@us.ibm.com" , "aik@ozlabs.ru" , "qemu-devel@nongnu.org" , "qemu-ppc@nongnu.org" , Stefan Hajnoczi , Paul Mackerras On 09.02.15 01:37, David Gibson wrote: > On Fri, 06 Feb 2015 08:56:32 +0100 > Alexander Graf wrote: > >> >> >> On 06.02.15 03:54, David Gibson wrote: >>> On Thu, Feb 05, 2015 at 12:55:45PM +0100, Alexander Graf wrote: >>>> >>>> >>>> On 05.02.15 12:30, David Gibson wrote: >>>>> On Thu, Feb 05, 2015 at 11:22:13AM +0100, Alexander Graf wrote: >>> [snip] >>>>>>>>>>>> [snip] >>>>>>>>>>>> >>>>>>>>>>>>> + ret1 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_LOAD); >>>>>>>>>>>>> + if (ret1 != 0) { >>>>>>>>>>>>> + fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_LOAD in KVM:" >>>>>>>>>>>>> + " %s\n", strerror(errno)); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + >>>>>>>>>>>>> + ret2 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_STORE); >>>>>>>>>>>>> + if (ret2 != 0) { >>>>>>>>>>>>> + fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_STORE in KVM:" >>>>>>>>>>>>> + " %s\n", strerror(errno)); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + >>>>>>>>>>>>> + if ((ret1 != 0) || (ret2 != 0)) { >>>>>>>>>>>>> + fprintf(stderr, "Warning: Couldn't enable H_LOGICAL_CI_* in KVM, SLOF" >>>>>>>>>>>>> + " may be unable to operate devices with in-kernel emulation\n"); >>>>>>>>>>>>> + } >>>>>>>>>>>> >>>>>>>>>>>> You'll always get these warnings if you're running on an old (meaning >>>>>>>>>>>> current upstream) kernel, which could be annoying. >>>>>>>>>>> >>>>>>>>>>> True. >>>>>>>>>>> >>>>>>>>>>>> Is there any way >>>>>>>>>>>> to tell whether you have configured any devices which need the >>>>>>>>>>>> in-kernel MMIO emulation and only warn if you have? >>>>>>>>>>> >>>>>>>>>>> In theory, I guess so. In practice I can't see how you'd enumerate >>>>>>>>>>> all devices that might require kernel intervention without something >>>>>>>>>>> horribly invasive. >>>>>>>>>> >>>>>>>>>> We could WARN_ONCE in QEMU if we emulate such a hypercall, but its >>>>>>>>>> handler is io_mem_unassigned (or we add another minimum priority huge >>>>>>>>>> memory region on all 64bits of address space that reports the breakage). >>>>>>>>> >>>>>>>>> Would that work for the virtio+iothread case? I had the impression >>>>>>>>> the kernel handled notification region was layered over the qemu >>>>>>>>> emulated region in that case. >>>>>>>> >>>>>>>> IIRC we don't have a way to call back into kvm saying "please write to >>>>>>>> this in-kernel device". But we could at least defer the warning to a >>>>>>>> point where we know that we actually hit it. >>>>>>> >>>>>>> Right, but I'm saying we might miss the warning in cases where we want >>>>>>> it, because the KVM device is shadowed by a qemu device, so qemu won't >>>>>>> see the IO as unassigned or unhandled. >>>>>>> >>>>>>> In particular, I think that will happen in the case of virtio-blk with >>>>>>> iothread, which is the simplest case in which to observe the problem. >>>>>>> The virtio-blk device exists in qemu and is functional, but we rely on >>>>>>> KVM catching the queue notification MMIO before it reaches the qemu >>>>>>> implementation of the rest of the device's IO space. >>>>>> >>>>>> But in that case the VM stays functional and will merely see a >>>>>> performance hit when using virtio in SLOF, no? I don't think that's >>>>>> a problem worth worrying users about. >>>>> >>>>> Alas, no. The iothread stuff *relies* on the in-kernel notification, >>>>> so it will not work if the IO gets punted to qemu. This is the whole >>>>> reason for the in-kernel hcall implementation. >>>> >>>> So at least with vhost-net the in-kernel trapping is optional. If we >>>> happen to get MMIO into QEMU, we'll just handle it there. >>>> >>>> Enlighten me why the iothread stuff can't handle it that way too. >>> >>> So, as I understand it, it could, but it doesn't. Working out how to >>> fix it properly requires better understanding of the dataplane code >>> than I currently possess, >>> >>> So, using virtio-blk as the example case. Normally the queue notify >>> mmio will get routed by the general virtio code to >>> virtio_blk_handle_output(). >>> >>> In the case of dataplane, that just calls >>> virtio_blk_data_plane_start(). So the first time we get a vq notify, >>> the dataplane is started. That sets up the host notifier >>> (VirtioBusClass::set_host_notifier -> virtio_pci_set_host_notifier -> >>> virtio_pci_set_host_notifier_internal -> memory_region_add_eventfd() >>> -> memory_region_transaction_commit() -> >>> address_space_update_ioeventfds - >address_space_add_del_ioeventfds -> >>> kvm_mem_ioeventfd_add -> kvm_set_ioeventfd_mmio -> KVM_IOEVENTFD >>> ioctl) >>> >>> From this point on further calls to virtio_blk_handle_output() are >>> IIUC a "can't happen", because vq notifies should go to the eventfd >>> instead, where they will kick the iothread. >>> >>> So, with SLOF, the first request is ok - it hits >>> virtio_blk_handle_output() which starts the iothread which goes on to >>> process the request. >>> >>> On the second request, however, we get back into >>> virtio_blk_data_plane_start() which sees the iothread is already >>> running and aborts. I think it is assuming that this must be the >>> result of a race with another vcpu starting the dataplane, and so >>> assumes the racing thread will have woken the dataplane which will >>> then handle this vcpu's request as well. >>> >>> In our case, however, the IO hcalls go through to >>> virtio_blk_handle_output() when the dataplane already going, and >>> become no-ops without waking it up again to handle the new request. >>> >>> Enlightened enough yet? >> >> So reading this, it sounds like we could just add logic in the virtio >> dataplane code that allows for a graceful fallback to QEMU based MMIO by >> triggering the eventfd itself in the MMIO handler. When going via this >> slow path, we should of course emit a warning (once) to the user ;). >> >> Stefan, what do you think? > > So, as I understand it this should be possible. I did even have a > draft which did this. However, I don't know the dataplane well enough > to know what gotchas there might be in terms of races, and therefore > how to do this quite right. I'm sure Stefan knows :). > Note that this doesn't remove the need for the in-kernel H_LOGICAL_CI_* > hcalls, because those will still be necessary if we get real in-kernel > emulated devices in future. Oh, I definitely agree on that part. I just want to make sure we gracefully run on older host kernels with newer QEMU versions. Alex