From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46943) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aiDwF-0003Fs-Ez for qemu-devel@nongnu.org; Tue, 22 Mar 2016 00:29:04 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aiDwC-0008BZ-6A for qemu-devel@nongnu.org; Tue, 22 Mar 2016 00:29:03 -0400 Received: from mail-pf0-x241.google.com ([2607:f8b0:400e:c00::241]:33082) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aiDwB-0008BS-PM for qemu-devel@nongnu.org; Tue, 22 Mar 2016 00:29:00 -0400 Received: by mail-pf0-x241.google.com with SMTP id x3so33575084pfb.0 for ; Mon, 21 Mar 2016 21:28:59 -0700 (PDT) References: <1458546426-26222-1-git-send-email-aik@ozlabs.ru> <1458546426-26222-2-git-send-email-aik@ozlabs.ru> <20160322004956.GS23586@voom.redhat.com> <56F0B81E.9080701@ozlabs.ru> <20160322032626.GA23586@voom.redhat.com> From: Alexey Kardashevskiy Message-ID: <56F0CA04.4010109@ozlabs.ru> Date: Tue, 22 Mar 2016 15:28:52 +1100 MIME-Version: 1.0 In-Reply-To: <20160322032626.GA23586@voom.redhat.com> Content-Type: text/plain; charset=koi8-r; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH qemu v14 01/18] memory: Fix IOMMU replay base address List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: Paolo Bonzini , Alex Williamson , qemu-ppc@nongnu.org, qemu-devel@nongnu.org On 03/22/2016 02:26 PM, David Gibson wrote: > On Tue, Mar 22, 2016 at 02:12:30PM +1100, Alexey Kardashevskiy wrote: >> On 03/22/2016 11:49 AM, David Gibson wrote: >>> On Mon, Mar 21, 2016 at 06:46:49PM +1100, Alexey Kardashevskiy wrote: >>>> Since a788f227 "memory: Allow replay of IOMMU mapping notifications" >>>> when new VFIO listener is added, all existing IOMMU mappings are >>>> replayed. However there is a problem that the base address of >>>> an IOMMU memory region (IOMMU MR) is ignored which is not a problem >>>> for the existing user (which is pseries) with its default 32bit DMA >>>> window starting at 0 but it is if there is another DMA window. >>>> >>>> This stores the IOMMU's offset_within_address_space and adjusts >>>> the IOVA before calling vfio_dma_map/vfio_dma_unmap. >>>> >>>> As the IOMMU notifier expects IOVA offset rather than the absolute >>>> address, this also adjusts IOVA in sPAPR H_PUT_TCE handler before >>>> calling notifier(s). >>>> >>>> Signed-off-by: Alexey Kardashevskiy >>>> Reviewed-by: David Gibson >>> >>> On a closer look, I realised this still isn't quite correct, although >>> I don't think any cases which would break it exist or are planned. >>> >>>> --- >>>> hw/ppc/spapr_iommu.c | 2 +- >>>> hw/vfio/common.c | 14 ++++++++------ >>>> include/hw/vfio/vfio-common.h | 1 + >>>> 3 files changed, 10 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c >>>> index 7dd4588..277f289 100644 >>>> --- a/hw/ppc/spapr_iommu.c >>>> +++ b/hw/ppc/spapr_iommu.c >>>> @@ -277,7 +277,7 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba, >>>> tcet->table[index] = tce; >>>> >>>> entry.target_as = &address_space_memory, >>>> - entry.iova = ioba & page_mask; >>>> + entry.iova = (ioba - tcet->bus_offset) & page_mask; >>>> entry.translated_addr = tce & page_mask; >>>> entry.addr_mask = ~page_mask; >>>> entry.perm = spapr_tce_iommu_access_flags(tce); >>> >>> This bit's right/ >>> >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>>> index fb588d8..d45e2db 100644 >>>> --- a/hw/vfio/common.c >>>> +++ b/hw/vfio/common.c >>>> @@ -257,14 +257,14 @@ static void vfio_iommu_map_notify(Notifier *n, void *data) >>>> VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n); >>>> VFIOContainer *container = giommu->container; >>>> IOMMUTLBEntry *iotlb = data; >>>> + hwaddr iova = iotlb->iova + giommu->offset_within_address_space; >>> >>> This bit might be right, depending on how you define giommu->offset_within_address_space. >>> >>>> MemoryRegion *mr; >>>> hwaddr xlat; >>>> hwaddr len = iotlb->addr_mask + 1; >>>> void *vaddr; >>>> int ret; >>>> >>>> - trace_vfio_iommu_map_notify(iotlb->iova, >>>> - iotlb->iova + iotlb->addr_mask); >>>> + trace_vfio_iommu_map_notify(iova, iova + iotlb->addr_mask); >>>> >>>> /* >>>> * The IOMMU TLB entry we have just covers translation through >>>> @@ -291,21 +291,21 @@ static void vfio_iommu_map_notify(Notifier *n, void *data) >>>> >>>> if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) { >>>> vaddr = memory_region_get_ram_ptr(mr) + xlat; >>>> - ret = vfio_dma_map(container, iotlb->iova, >>>> + ret = vfio_dma_map(container, iova, >>>> iotlb->addr_mask + 1, vaddr, >>>> !(iotlb->perm & IOMMU_WO) || mr->readonly); >>>> if (ret) { >>>> error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", " >>>> "0x%"HWADDR_PRIx", %p) = %d (%m)", >>>> - container, iotlb->iova, >>>> + container, iova, >>>> iotlb->addr_mask + 1, vaddr, ret); >>>> } >>>> } else { >>>> - ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1); >>>> + ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1); >>>> if (ret) { >>>> error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", " >>>> "0x%"HWADDR_PRIx") = %d (%m)", >>>> - container, iotlb->iova, >>>> + container, iova, >>>> iotlb->addr_mask + 1, ret); >>>> } >>>> } >>> >>> This is fine. >>> >>>> @@ -377,6 +377,8 @@ static void vfio_listener_region_add(MemoryListener *listener, >>>> */ >>>> giommu = g_malloc0(sizeof(*giommu)); >>>> giommu->iommu = section->mr; >>>> + giommu->offset_within_address_space = >>>> + section->offset_within_address_space; >>> >>> But here there's a problem. The iova in IOMMUTLBEntry is relative to >>> the IOMMU MemoryRegion, but - at least in theory - only a subsection >>> of that MemoryRegion could be mapped into the AddressSpace. >> >> But the IOMMU MR stays the same - size, offset, and iova will be relative to >> its start, why does it matter if only portion is mapped? > > Because the portion mapped may not sit at the start of the MR. For > example if you had a 2G MR, and the second half is mapped at address 0 > in the AS, My imagination fails here. How could you do this in practice? address_space_init(&as, &root) memory_region_init(&mr, 2GB) memory_region_add_subregion(&root, -1GB, &mr) But offsets are unsigned. In general, how to map only a half, what memory_region_add_xxx() does that? > then an IOMMUTLBEntry iova of 1G would translated to AS > address 0. > > >>> So, to find the IOVA within the AddressSpace, from the IOVA within the >>> MemoryRegion, you need to first subtract the section's offset within >>> the MemoryRegion, then add the section's offset within the >>> AddressSpace. >>> >>> You could precalculate the combined delta here, but... >>> >>> >>>> giommu->container = container; >>>> giommu->n.notify = vfio_iommu_map_notify; >>>> QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next); >>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h >>>> index eb0e1b0..5341e05 100644 >>>> --- a/include/hw/vfio/vfio-common.h >>>> +++ b/include/hw/vfio/vfio-common.h >>>> @@ -90,6 +90,7 @@ typedef struct VFIOContainer { >>>> typedef struct VFIOGuestIOMMU { >>>> VFIOContainer *container; >>>> MemoryRegion *iommu; >>>> + hwaddr offset_within_address_space; >>> >>> ...it might be simpler to replace both the iommu and >>> offset_within_address_space fields here with a pointer to the >>> MemoryRegionSection instead, which should give you all the info you >>> need. >> >> >> MemoryRegionSection is allocated on stack in listener_add_address_space() >> and seems to be in general some sort of temporary object. > > Ah, right, I guess you'll have to store the delta, then. > >> >> >>> >>> It might also be worth adding Paolo to the CC for this patch, since he >>> knows the MemoryRegion stuff better than anyone. >> >> >> Right, I added him in cc: now. >> >>> >>>> Notifier n; >>>> QLIST_ENTRY(VFIOGuestIOMMU) giommu_next; >>>> } VFIOGuestIOMMU; >>> >> >> > -- Alexey