From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jan Kiszka Subject: Re: [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space Date: Fri, 08 Jun 2012 16:43:30 +0200 Message-ID: <4FD20F92.9080805@siemens.com> References: <4FACB581.2050609@ozlabs.ru> <6A22E211-BC82-49BD-A335-02D3BAA14A17@suse.de> <4FAD0A4F.2050506@ozlabs.ru> <4FB080CE.3030703@ozlabs.ru> <4FB5DA43.90907@ozlabs.ru> <1337652170.2779.143.camel@pasglop> <6C472F5B-B8C3-48DE-B19B-00973AF6AC56@suse.de> <4FBB0B95.8050901@ozlabs.ru> <82643009-4F43-407F-B26C-C36537825BFD@suse.de> <4FBB2E25.2030206@ozlabs.ru> <584A5E54-2119-415C-93B4-BB91A08CA729@suse.de> <4FD1BC14.6030900@ozlabs.ru> <4FD1DA5C.5020900@siemens.com> <4FD1DF29.1050303@ozlabs.ru> <4FD1E25A.2010900@siemens.com> <4FD2058F.7030903@ozlabs.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Alexander Graf , "kvm@vger.kernel.org" , "qemu-devel@nongnu.org" , Alex Williamson , "anthony@codemonkey.ws" , David Gibson To: Alexey Kardashevskiy Return-path: Received: from thoth.sbs.de ([192.35.17.2]:19086 "EHLO thoth.sbs.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932212Ab2FHOnv (ORCPT ); Fri, 8 Jun 2012 10:43:51 -0400 In-Reply-To: <4FD2058F.7030903@ozlabs.ru> Sender: kvm-owner@vger.kernel.org List-ID: On 2012-06-08 16:00, Alexey Kardashevskiy wrote: > 08.06.2012 21:30, Jan Kiszka =D0=C9=DB=C5=D4: >> On 2012-06-08 13:16, Alexey Kardashevskiy wrote: >>> 08.06.2012 20:56, Jan Kiszka =CE=C1=D0=C9=D3=C1=CC: >>>> On 2012-06-08 10:47, Alexey Kardashevskiy wrote: >>>>> Yet another try :) >>>>> >>>>> Normally the pci_add_capability is called on devices to add new >>>>> capability. This is ok for emulated devices which capabilities li= st >>>>> is being built by QEMU. >>>>> >>>>> In the case of VFIO the capability may already exist and adding n= ew >>>> >>>> Why does it exit? VFIO should build the virtual capability list fr= om >>>> scratch (just like classic device assignment does), recreating the >>>> layout of the physical device (except for masked out caps). In tha= t >>>> case, this conflict should become impossible, no? >>> >>> Normally capabilities in emulated devices are created by calling >>> msi_init or msix_init - just when emulated device wants to advertis= e it >>> to the guest. >>> >>> In the case of VFIO, there is a lot of capabilities which QEMU does= not >>> know and does not want to know about. They are read from the host k= ernel >>> as is. And we definitely want to pass these capabilities to the gue= st as >>> is, i.e. on the same position and the same number of them. Just for= some >>> we call pci_add_capability (indirectly!) if we want QEMU to support= them >>> somehow. >>> >>> If we invent some function which "readds" all the capabilities we g= ot >>> from the host to keep internal QEMU's PCIDevice data in sync, then = we'll >>> need to change every piece of code which adds capabilities. >> >> I can't follow. What is different in VFIO from device-assignment.c, >> assigned_device_pci_cap_init (except that it already uses msi[x]_ini= t, >> something we need to fix in device-assignment.c)? >=20 > What are device-assignment.c and assigned_device_pci_cap_init? Cannot > find them in QEMU tree. "Old-style" KVM device assignment is not yet upstream. You can find it in qemu-kvm, hopefully in upstream soon as well. >=20 > Ah, anyway. The main difference is QEMU does not emulate VFIO devices= , > it just a proxy to the host system. Or I do not understand the questi= on. >=20 >>> I noticed, >>> this is very common approach here to change a lot for a very small = thing >>> or rare case but I'd like to avoid this :) >>> >>>> But if pci_*add*_capability should actually be used like this (I d= oubt >>>> this), >>> >>> MSI/MSIX use it. To enable MSI/MSIX on VFIO PCIDevice, we call >>> msi_init/msix_init and they call pci_add_capability. >> >> You can't blame msi_init/msix_init for the fact that VFIO creates a >> capability list with an existing MSI/MSI-X entry beforehand. >=20 > VFIO does not create any capability. It gets them all from the host > kernel and passes to the guest as is. VFIO only needs MSIX to be enab= led > in VFIO. Just like any device in QEMU, also VFIO need to set up a virtual config space when it registers with the PCI core layer. Even if the virtual on= e is modeled after the real one, it is still _created_ by the VFIO userspace part. And this creation process is obviously a bit messed up so far. Fix this, but not by adding workarounds in the MSI or PCI layer= =2E Rather add all capabilities you want to expose to the guest via pci_add_capability or, indirectly, via msi[x]_init at the right position. Do not just copy the real config space over, that breaks the core layer as we see. Jan --=20 Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux