From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Sender: List-Post: List-Help: List-Unsubscribe: List-Subscribe: Received: from lists.oasis-open.org (oasis-open.org [10.110.1.242]) by lists.oasis-open.org (Postfix) with ESMTP id ACA279867F8 for ; Fri, 15 Jul 2022 14:18:40 +0000 (UTC) From: "Afsa, Baptiste" Date: Fri, 15 Jul 2022 14:18:32 +0000 Message-ID: <1657894712677.5057@harman.com> References: <6f4f611e95cf49b0b16b37efba6bf78a@HIMDWSMB01.ad.harman.com>, In-Reply-To: MIME-Version: 1.0 Subject: Re: [virtio-dev] VM memory protection and zero-copy transfers. Content-Language: fr-FR Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable To: Stefan Hajnoczi Cc: "virtio-dev@lists.oasis-open.org" , Jean-Philippe Brucker , "Michael S. Tsirkin" List-ID: > Did you try virtio-pci with an IOMMU? The advantage compared to both=0A> = your proposal and swiotlb is that workloads that reuse buffers have no=0A> = performance overhead because the IOMMU mappings remain in place across=0A> = virtqueue requests.=0A>=0A> I have CCed Jean-Philippe Brucker who=0A> designed the virtio-iommu device.=0A>=0A> Using an IO= MMU can be slower than the approach you are proposing when=0A> each request= requires new mappings. That's because your approach=0A> combines the virtq= ueue kick processing with the page granting whereas=0A> programming an IOMM= U with map/unmap commands is a separate vmexit from=0A> the virtqueue kick.= It's probably easier to make your approach faster in=0A> the dynamic mappi= ngs case for this reason.=0A>=0A> A page-table based IOMMU (doesn't require= explicit map/unamp commands=0A> because it reads mappings on-demand from a= page table structure) might=0A> perform better than one that needs to be p= rogrammed for each each=0A> map/unmap operation. It still needs a kick (vme= xit) for invalidation but=0A> it might be possible for a design of this typ= e to avoid vmexits in the=0A> common case.=0A=0AWe considered it at some po= int with the idea of moving the memory granting to=0Athe driver's side but = we did not go in this direction for the reason that you=0Amentioned, plus t= he fact that doing it from the hypervisor allows us to validate=0Athe descr= iptors while granting the memory and moving them to the shadow queue.=0A=0A= However, we did not consider workloads where the buffers are reused. I am= =0Acurious to see how much this is used in practice and the sort of gain we= are=0Alooking at.=0A=0AThank you for pointing this out, this looks like so= mething we should investigate=0Afurther.=0A=0A> > Although the shadow virtq= ueue concept looks fairly simple, there is still one=0A> > point that has n= ot been covered yet: indirect descriptors.=0A> >=0A> > To support indirect = descriptors, the following two options were considered=0A> > initially:=0A>= >=0A> > 1. Grant the indirect descriptor as-is to the host while it is o= n the used=0A> > ring. This introduces a security issue because a comp= romised guest OS can=0A> > modify the indirect descriptor after it has= been pushed to the available=0A> > ring. This would cause the device = to fault while trying to access any=0A> > arbitrary memory that was no= t actually granted.=0A> >=0A> > Note that in the shadow virtqueue mode= l, there is no need for the device to=0A> > validate the descriptors i= n the available rings, because the hypervisor=0A> > already performed = such checks before granting the memory.=0A>=0A> Assuming that the driver ca= n trust the device isn't possible in all use=0A> cases. Hardware VIRTIO dev= ice implementations, VDUSE=0A> (https://docs.kernel.org/userspace-api/vduse= .html), and Confidential=0A> Computing are 3 use cases where the device is = untrusted. If you make the=0A> assumption then it's important to clearly ma= rk the code so it won't be=0A> reused in a context where that would be a se= curity problem.=0A>=0A> >=0A> > 2. Follow the same logic that is used for= the "normal" descriptors and=0A> > introduce shadow indirect descript= ors. This would require the hypervisor to=0A> > provision a memory poo= l to allocate these shadow indirect descriptors and=0A> > determining = the size of this pool may not be trivial.=0A> >=0A> > Additionally, in= direct descriptors can be as large as the driver wants them=0A> > to b= e, something that can cause the hypervisor to copy an arbitrary large=0A> >= amount of data.=0A>=0A> I agree that it's unfortunate that indirect d= escriptors would require=0A> some kind of dynamic memory in the hypervisor.= However, the statement=0A> about indirect descriptor size is incorrect. Th= ey are limited by Queue=0A> Size:=0A>=0A> VIRTIO 1.2 2.7.5.3.1 Driver Req= uirements: Indirect Descriptors=0A>=0A> A driver MUST NOT create a descri= ptor chain longer than the Queue Size=0A> of the device.=0A>=0A> >=0A> > = An alternative approach consists in introducing a new virtio feature bit. T= his=0A> > feature bit, when set by the device, instructs the driver to allo= cate indirect=0A> > descriptors using dedicated memory pages. These pages s= hall hold no other data=0A> > than the indirect descriptors. Since a correc= t virtio driver implementation does=0A> > not modify an indirect descriptor= once it has been pushed to the device, the=0A> > pages where the indirect = descriptors lies can later on be remapped as read-only=0A> > in the guest a= ddress space.=0A> >=0A> > This allows the hypervisor to validate the conten= t of the indirect descriptor,=0A> > grant it to the host (along with all th= e buffers referenced by this descriptor)=0A> > and remap the indirect descr= iptor read-only in the guest address space as long=0A> > as it is granted t= o the host (i.e. until the indirect descriptor is returned=0A> > through th= e used ring).=0A>=0A> That sounds very slow (2 page table updates per reque= st).=0A=0AYes it is. As I mentioned, overall this approach is about 40% slo= wer on the set=0Aof benchmarks we used when compared to running virtio devi= ces without memory=0Aisolation.=0A=0A> > The present proposal has some obvi= ous drawbacks, but we believe that memory=0A> > protection will not come fo= r free. We know that there are other folks out there=0A> > which try to add= ress this issue of memory sharing between VMs, so we would be=0A> > pleased= to hear what you guys think about this approach.=0A> >=0A> > Additionally,= we would like to know whether a feature bit similar to the one=0A> > that = was discussed here could be considered for addition to the virtio standard.= =0A>=0A> Memory isolation is hard to do efficiently. It would be great to d= iscuss=0A> your proposal more with the VIRTIO community and then send a spe= c patch=0A> for detailed review and voting.=0A=0AThis is exactly why we are= here!=0A=0AI can indeed prepare a spec patch but before doing that I wante= d to get some=0Afeedback on this approach and see if anyone in the communit= y would see any=0Areason not to introduce such a feature bit.=0A=0A> One th= ing I didn't see in your proposal was a copying vs zero-copy=0A> threshold.= Maybe it helps to look at the size of requests and copy data=0A> instead o= f granting pages when descriptors are small? On the other hand,=0A> a 4 KB = page size means that many descriptors won't be larger than 4 KB=0A> anyway = due to guest physical memory fragmentation. This is basically=0A> hybrid of= swiotlb and your proposal - zero-copy when it pays off,=0A> copying when i= t's cheap.=0A>=0A> As I mentioned, I think IOMMUs are worth investigating, = in particular=0A> for the case where mappings are rarely changed. They are = fast in that=0A> case.=0A=0AI agree there is much likely a point until whic= h copying is cheaper. However, we=0Ahave not considered this in our initial= investigation.=0A=0AThank you Stefan for taking the time to read this prop= osal and for your=0Afeedback.=0A=0ABaptiste --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org