From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:33315) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ctqo0-00062U-8b for qemu-devel@nongnu.org; Fri, 31 Mar 2017 03:17:09 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ctqnx-0005Uw-1d for qemu-devel@nongnu.org; Fri, 31 Mar 2017 03:17:08 -0400 Received: from mx1.redhat.com ([209.132.183.28]:57914) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ctqnw-0005SK-PX for qemu-devel@nongnu.org; Fri, 31 Mar 2017 03:17:04 -0400 References: <1486456099-7345-1-git-send-email-peterx@redhat.com> <1486456099-7345-15-git-send-email-peterx@redhat.com> <20170327091208.GG11497@pxdev.xzpeter.org> <9c3cda64-b4a3-6f9b-f951-bf73f6613faa@redhat.com> From: Jason Wang Message-ID: <81e79982-4af0-3b46-552c-eea4db05a362@redhat.com> Date: Fri, 31 Mar 2017 15:16:44 +0800 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add MemoryRegionIOMMUOps.replay() callback List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Liu, Yi L" , 'Peter Xu' Cc: "Lan, Tianyu" , "Tian, Kevin" , "'mst@redhat.com'" , "'jan.kiszka@siemens.com'" , "'bd.aviv@gmail.com'" , "'qemu-devel@nongnu.org'" , "'alex.williamson@redhat.com'" , 'David Gibson' On 2017=E5=B9=B403=E6=9C=8831=E6=97=A5 13:34, Liu, Yi L wrote: >> -----Original Message----- >> From: Jason Wang [mailto:jasowang@redhat.com] >> Sent: Thursday, March 30, 2017 7:58 PM >> To: Liu, Yi L ; 'Peter Xu' >> Cc: 'alex.williamson@redhat.com' ; Lan, Ti= anyu >> ; Tian, Kevin ; 'mst@redha= t.com' >> ; 'jan.kiszka@siemens.com' ; >> 'bd.aviv@gmail.com' ; 'David Gibson' >> ; 'qemu-devel@nongnu.org' > devel@nongnu.org> >> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add >> MemoryRegionIOMMUOps.replay() callback >> >> >> >> On 2017=E5=B9=B403=E6=9C=8830=E6=97=A5 19:06, Liu, Yi L wrote: >>>> -----Original Message----- >>>> From: Liu, Yi L >>>> Sent: Monday, March 27, 2017 5:22 PM >>>> To: Peter Xu >>>> Cc: alex.williamson@redhat.com; Lan, Tianyu ; >>>> Tian, Kevin ; mst@redhat.com; >>>> jan.kiszka@siemens.com; jasowang@redhat.com; bd.aviv@gmail.com; Davi= d >>>> Gibson ; qemu-devel@nongnu.org >>>> Subject: RE: [Qemu-devel] [PATCH v7 14/17] memory: add >>>> MemoryRegionIOMMUOps.replay() callback >>>> >>>>> -----Original Message----- >>>>> From: Peter Xu [mailto:peterx@redhat.com] >>>>> Sent: Monday, March 27, 2017 5:12 PM >>>>> To: Liu, Yi L >>>>> Cc: alex.williamson@redhat.com; Lan, Tianyu ; >>>>> Tian, Kevin ; mst@redhat.com; >>>>> jan.kiszka@siemens.com; jasowang@redhat.com; bd.aviv@gmail.com; >>>>> David Gibson ; qemu-devel@nongnu.org >>>>> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add >>>>> MemoryRegionIOMMUOps.replay() callback >>>>> >>>>> On Mon, Mar 27, 2017 at 08:35:05AM +0000, Liu, Yi L wrote: >>>>>>> -----Original Message----- >>>>>>> From: Qemu-devel >>>>>>> [mailto:qemu-devel-bounces+yi.l.liu=3Dintel.com@nongnu.org] On >>>>>>> Behalf Of Peter Xu >>>>>>> Sent: Tuesday, February 7, 2017 4:28 PM >>>>>>> To: qemu-devel@nongnu.org >>>>>>> Cc: Lan, Tianyu ; Tian, Kevin >>>>>>> ; mst@redhat.com; jan.kiszka@siemens.com; >>>>>>> jasowang@redhat.com; peterx@redhat.com; >>>>>>> alex.williamson@redhat.com; bd.aviv@gmail.com; David Gibson >>>>>>> >>>>>>> Subject: [Qemu-devel] [PATCH v7 14/17] memory: add >>>>>>> MemoryRegionIOMMUOps.replay() callback >>>>>>> >>>>>>> Originally we have one memory_region_iommu_replay() function, >>>>>>> which is the default behavior to replay the translations of the >>>>>>> whole IOMMU region. However, on some platform like x86, we may >>>>>>> want our own >>>>> replay logic for IOMMU regions. >>>>>>> This patch add one more hook for IOMMUOps for the callback, and >>>>>>> it'll override the default if set. >>>>>>> >>>>>>> Signed-off-by: Peter Xu >>>>>>> --- >>>>>>> include/exec/memory.h | 2 ++ >>>>>>> memory.c | 6 ++++++ >>>>>>> 2 files changed, 8 insertions(+) >>>>>>> >>>>>>> diff --git a/include/exec/memory.h b/include/exec/memory.h index >>>>>>> 0767888..30b2a74 100644 >>>>>>> --- a/include/exec/memory.h >>>>>>> +++ b/include/exec/memory.h >>>>>>> @@ -191,6 +191,8 @@ struct MemoryRegionIOMMUOps { >>>>>>> void (*notify_flag_changed)(MemoryRegion *iommu, >>>>>>> IOMMUNotifierFlag old_flags, >>>>>>> IOMMUNotifierFlag new_flags); >>>>>>> + /* Set this up to provide customized IOMMU replay function *= / >>>>>>> + void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier)= ; >>>>>>> }; >>>>>>> >>>>>>> typedef struct CoalescedMemoryRange CoalescedMemoryRange; diff >>>>>>> --git a/memory.c b/memory.c index 7a4f2f9..9c253cc 100644 >>>>>>> --- a/memory.c >>>>>>> +++ b/memory.c >>>>>>> @@ -1630,6 +1630,12 @@ void >>>>>>> memory_region_iommu_replay(MemoryRegion >>>>>>> *mr, IOMMUNotifier *n, >>>>>>> hwaddr addr, granularity; >>>>>>> IOMMUTLBEntry iotlb; >>>>>>> + /* If the IOMMU has its own replay callback, override */ >>>>>>> + if (mr->iommu_ops->replay) { >>>>>>> + mr->iommu_ops->replay(mr, n); >>>>>>> + return; >>>>>>> + } >>>>>> Hi Alex, Peter, >>>>>> >>>>>> Will all the other vendors(e.g. PPC, s390, ARM) add their own >>>>>> replay callback as well? I guess it depends on whether the origina= l >>>>>> replay algorithm work well for them? Do you have such knowledge? >>>>> I guess so. At least for VT-d we had this callback since the defaul= t >>>>> replay mechanism did not work well on x86 due to its extremely larg= e >>>>> memory region size. Thanks, >>>> thx. that would make sense. >>> Peter, >>> >>> Just come to mind that there may be a corner case here. >>> >>> Intel VT-d actually has a "pt" mode which allows device use physical >>> address even when VT-d is enabled. In kernel, there is a iommu_identi= ty_mapping. >>> If a device is in this map, then it would use "pt" mode. So that IOMM= U >>> driver would not build second-level page table for it. >> Yes, but qemu does not support ECAP_PT now, so guest will still have a= page table in >> this case. > That's true. Without ECAP_PT, IOMMU driver would create a 1:1 map. So t= his solution > can work well even a device is in identify_map. > >>> Back to the virtual IOVA implementation, if an assigned device is in >>> the iommu_identity_mapping(e.g. VGA controller), it uses GPA directly= to do DMA. >>> So it demands a GPA->HPA mapping in host. However, the >>> iommu->ops.replay is not able to build it when guest SL page table is= empty. >>> >>> So I think building an entire guest PA->HPA mapping before guest >>> kernel boot would be recommended. Any thoughts? >> We plan to add PT in 2.10, a possible rough idea is disabled iommu dma= r region and >> use another region without iommu_ops. Then >> vfio_listener_region_add() will just do the correct mappings. > Good to know it. Actually, I also need to expose ECAP_PT for vSVM. So j= ust comes to > realize that the current replay solution may not work well when I expos= e ECAP_PT to guest. > I also have a rough idea here. The current listener in container listen= s to address space > named with devfn if virtual VTd is added. How about adding one more lis= tener to listen > memory address space. So that the listener can build entire guest PA->H= PA mapping. This is only needed for PT. So looks like current code is sufficient to=20 do this I think. See the else part of if (memory_region_is_iommu()) of=20 vfio_listener_region_add(). Thanks > Also, > the vfio notifier is registered when changes happen in device address s= pace. However, I > didn=E2=80=99t check if all the layout changes in memory address space = happen before the first > dynamic map/unmap request from guest. If not, this solution is not prac= tical. > > Thanks, > Yi L