From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:36923)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1WshCt-0003lZ-Ub for qemu-devel@nongnu.org;
	Thu, 05 Jun 2014 19:36:34 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1WshCn-0000QI-Kj for qemu-devel@nongnu.org;
	Thu, 05 Jun 2014 19:36:27 -0400
Message-ID: <5390FEF3.4080108@suse.de>
Date: Fri, 06 Jun 2014 01:36:19 +0200
From: Alexander Graf
MIME-Version: 1.0
References: <1401947401-21329-1-git-send-email-aik@ozlabs.ru>
	<1401947401-21329-2-git-send-email-aik@ozlabs.ru>
	<5390119D.8040201@ozlabs.ru> <53906B56.3080007@suse.de>
	<53906C50.50308@ozlabs.ru> <53906D54.4030105@suse.de>
	<5390718C.4020005@ozlabs.ru> <53907267.1090000@suse.de>
	<53907FBA.8060604@ozlabs.ru> <5390A01D.7020004@suse.de>
	<5390FA95.2090509@ozlabs.ru>
In-Reply-To: <5390FA95.2090509@ozlabs.ru>
Content-Type: text/plain; charset=KOI8-R; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v7 1/4] spapr_iommu: Make in-kernel TCE table optional
To: Alexey Kardashevskiy, qemu-devel@nongnu.org
Cc: Alex Williamson, qemu-ppc@nongnu.org, Gavin Shan

On 06.06.14 01:17, Alexey Kardashevskiy wrote:
> On 06/06/2014 02:51 AM, Alexander Graf wrote:
>> On 05.06.14 16:33, Alexey Kardashevskiy wrote:
>>> On 06/05/2014 11:36 PM, Alexander Graf wrote:
>>>> On 05.06.14 15:33, Alexey Kardashevskiy wrote:
>>>>> On 06/05/2014 11:15 PM, Alexander Graf wrote:
>>>>>> On 05.06.14 15:10, Alexey Kardashevskiy wrote:
>>>>>>> On 06/05/2014 11:06 PM, Alexander Graf wrote:
>>>>>>>> On 05.06.14 08:43, Alexey Kardashevskiy wrote:
>>>>>>>>> On 06/05/2014 03:49 PM, Alexey Kardashevskiy wrote:
>>>>>>>>>> POWER KVM supports a KVM_CAP_SPAPR_TCE capability which allows
>>>>>>>>>> allocating TCE tables in the host kernel memory and handling
>>>>>>>>>> H_PUT_TCE requests targeted to a specific LIOBN (logical bus
>>>>>>>>>> number) right in the host without switching to QEMU. At the
>>>>>>>>>> moment this is used for emulated devices only and the handler
>>>>>>>>>> only puts the TCE into the table. If the in-kernel H_PUT_TCE
>>>>>>>>>> handler finds a LIOBN and corresponding table, it will put a
>>>>>>>>>> TCE into the table and complete hypercall execution. The user
>>>>>>>>>> space will not be notified.
>>>>>>>>>>
>>>>>>>>>> Upcoming VFIO support is going to use the same sPAPRTCETable
>>>>>>>>>> device class so KVM_CAP_SPAPR_TCE is going to be used as well.
>>>>>>>>>> That means that TCE tables for VFIO are going to be allocated
>>>>>>>>>> in the host as well. However VFIO operates with real IOMMU
>>>>>>>>>> tables and simple copying of a TCE to the real hardware TCE
>>>>>>>>>> table will not work as guest physical to host physical address
>>>>>>>>>> translation is required.
>>>>>>>>>>
>>>>>>>>>> So until the host kernel gets VFIO support for H_PUT_TCE, we
>>>>>>>>>> had better not register VFIO's TCE in the host.
>>>>>>>>>>
>>>>>>>>>> This adds a bool @kvm_accel flag to the sPAPRTCETable device
>>>>>>>>>> telling that sPAPRTCETable should not try allocating TCE table
>>>>>>>>>> in the host kernel. Instead, the table will be created in QEMU.
>>>>>>>>>>
>>>>>>>>>> This adds a kvm_accel parameter to spapr_tce_new_table() to let
>>>>>>>>>> users choose whether to use acceleration or not. At the moment
>>>>>>>>>> it is enabled for VIO and emulated PCI. Upcoming VFIO support
>>>>>>>>>> will set it to false.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> This is a workaround but it lets me have one IOMMU device for
>>>>>>>>>> VIO, emulated PCI and VFIO which is a good thing.
>>>>>>>>>>
>>>>>>>>>> The other way around would be a new KVM_CAP_SPAPR_TCE_VFIO
>>>>>>>>>> capability but this needs kernel update.
>>>>>>>>> Never mind, I'll make it a capability. I'll post capability
>>>>>>>>> reservation patch separately.
>>>>>>>> Just rename the flag from "kvm_accel" to "vfio_accel", set it to
>>>>>>>> true for vfio and false for emulated devices. Then the
>>>>>>>> spapr_iommu file can check on the capability (and default to
>>>>>>>> false for now, since it doesn't exist yet).
>>>>>>> Is that ok if the flag does not have to do anything with VFIO per se? :)
>>>>>> The flag means "use in-kernel acceleration if the vfio coupling
>>>>>> capability is available", no?
>>>>> It is a flag of sPAPRTCETable which is not supposed to know about
>>>>> VFIO at all, it is just an IOMMU. But if you are ok with it, I have
>>>>> no reason to be unhappy either :)
>>>>>
>>>>>
>>>>>
>>>>>>>> That way you don't have to reserve a CAP today.
>>>>>>> Why exactly cannot we do that today?
>>>>>> Because the CAP namespace isn't a garbage bin we can just throw IDs at.
>>>>>> Maybe we realize during patch review that we need completely different
>>>>>> CAPs.
>>>>> That was my first plan - to wait for KVM_CAP_SPAPR_TCE_64 to be
>>>>> available in the kernel.
>>>> So all you need are 64bit TCEs with bus_offset?
>>> No. I need 64bit IOBAs a.k.a. PCI bus addresses. The default DMA window is
>>> just 1 or 2GB and it is mapped at 0 on PCI bus.
>>>
>>> TCEs are 64 bit already.
>> Ok, so the guest has to tell the PCI device to write to a specific window.
>> That's a shame :).
> No. Guest tells the device some address, that's it. Guest allocates those
> addresses from some window which host, guest and PHB know about but not the
> device. What is a shame here?

It would be nicer if the guest had full control over the virtual address
range of a PCI device.

>
>
>>>> What about the missing
>>>> in-kernel modification of the shadow TCEs on H_PUT_TCE? I thought that's
>>>> what this is really about.
>>> This I do not understand :(
>> How does real mode H_PUT_TCE emulation know that it needs to notify user
>> space to establish the map?
> If it wants to pass control to the user space, it returns H_TOO_HARD. This
> happens, for example, if LIOBN was not registered in KVM.

So how does KVM_CAP_SPAPR_TCE_64 help here? With KVM_CAP_SPAPR_TCE_64 we
still cannot map VFIO devices' TCE tables because we're missing all the
magic to link the virtual TCE table to a physical TCE table.


Alex
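
For readers following the thread, the H_TOO_HARD fallback discussed above can be sketched as a small toy model in C. This is not the kernel or QEMU code: every name prefixed with toy_ or Toy is hypothetical, the 4 KiB page shift is an assumption, and the numeric return codes are written from memory of the Linux powerpc hypercall definitions and should be treated as illustrative. The sketch only shows the dispatch rule: a LIOBN that was registered with the (toy) kernel is handled in place, while an unregistered one is handed back to user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative return codes; the values are assumptions for this sketch. */
#define H_SUCCESS     0L
#define H_PARAMETER (-4L)
#define H_TOO_HARD    9999L   /* "let user space handle this hypercall" */

#define TOY_PAGE_SHIFT 12     /* assumed 4 KiB IOMMU page size */
#define TOY_MAX_TABLES 4

/* Hypothetical stand-in for a TCE table registered with the host kernel. */
typedef struct {
    bool      in_use;
    uint32_t  liobn;
    size_t    nb_entries;
    uint64_t *entries;
} ToyTceTable;

static ToyTceTable toy_tables[TOY_MAX_TABLES];

/* Look up a registered table by LIOBN; NULL if none was registered. */
static ToyTceTable *toy_find_table(uint32_t liobn)
{
    for (size_t i = 0; i < TOY_MAX_TABLES; i++) {
        if (toy_tables[i].in_use && toy_tables[i].liobn == liobn) {
            return &toy_tables[i];
        }
    }
    return NULL;
}

/* Models what a table created with acceleration enabled would do:
 * register its LIOBN so the in-kernel handler can find it.
 * Returns false when no free slot is left. */
bool toy_kernel_register(uint32_t liobn, uint64_t *entries, size_t nb_entries)
{
    assert(entries != NULL);
    for (size_t i = 0; i < TOY_MAX_TABLES; i++) {
        if (!toy_tables[i].in_use) {
            toy_tables[i].in_use = true;
            toy_tables[i].liobn = liobn;
            toy_tables[i].nb_entries = nb_entries;
            toy_tables[i].entries = entries;
            return true;
        }
    }
    return false;
}

/* Toy in-kernel H_PUT_TCE: store the TCE if the LIOBN is known,
 * otherwise return H_TOO_HARD so the caller exits to user space. */
long toy_kernel_h_put_tce(uint32_t liobn, uint64_t ioba, uint64_t tce)
{
    ToyTceTable *t = toy_find_table(liobn);
    if (t == NULL) {
        return H_TOO_HARD;
    }
    uint64_t index = ioba >> TOY_PAGE_SHIFT;
    if (index >= t->nb_entries) {
        return H_PARAMETER;
    }
    t->entries[index] = tce;
    return H_SUCCESS;
}
```

In this model, a table that is never passed to toy_kernel_register() (the analogue of creating it with the acceleration flag set to false) always yields H_TOO_HARD, which is the property the patch relies on for VFIO. What the model does not cover is the point raised at the end of the mail: linking a virtual TCE table to a physical one.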