From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:52312)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jan.kiszka@siemens.com>) id 1Sd0Po-0001Tx-0F
	for qemu-devel@nongnu.org; Fri, 08 Jun 2012 10:43:56 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <jan.kiszka@siemens.com>) id 1Sd0Ph-0002Vk-I9
	for qemu-devel@nongnu.org; Fri, 08 Jun 2012 10:43:51 -0400
Received: from thoth.sbs.de ([192.35.17.2]:25559)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jan.kiszka@siemens.com>) id 1Sd0Ph-0002V4-8C
	for qemu-devel@nongnu.org; Fri, 08 Jun 2012 10:43:45 -0400
Message-ID: <4FD20F92.9080805@siemens.com>
Date: Fri, 08 Jun 2012 16:43:30 +0200
From: Jan Kiszka <jan.kiszka@siemens.com>
MIME-Version: 1.0
References: <4FACB581.2050609@ozlabs.ru>
	<6A22E211-BC82-49BD-A335-02D3BAA14A17@suse.de>
	<4FAD0A4F.2050506@ozlabs.ru>
	<E3C5DF35-72FF-4247-BC86-F3374E2B40E6@suse.de>
	<4FB080CE.3030703@ozlabs.ru> <4FB5DA43.90907@ozlabs.ru>
	<1337652170.2779.143.camel@pasglop>
	<6C472F5B-B8C3-48DE-B19B-00973AF6AC56@suse.de>
	<4FBB0B95.8050901@ozlabs.ru>
	<82643009-4F43-407F-B26C-C36537825BFD@suse.de>
	<4FBB2E25.2030206@ozlabs.ru>
	<584A5E54-2119-415C-93B4-BB91A08CA729@suse.de>
	<4FD1BC14.6030900@ozlabs.ru> <4FD1DA5C.5020900@siemens.com>
	<4FD1DF29.1050303@ozlabs.ru> <4FD1E25A.2010900@siemens.com>
	<4FD2058F.7030903@ozlabs.ru>
In-Reply-To: <4FD2058F.7030903@ozlabs.ru>
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability
 enhancement to prevent damaging config space
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>, Alexander Graf <agraf@suse.de>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, Alex Williamson <alex.williamson@redhat.com>, "anthony@codemonkey.ws" <anthony@codemonkey.ws>, David Gibson <david@gibson.dropbear.id.au>

On 2012-06-08 16:00, Alexey Kardashevskiy wrote:
> 08.06.2012 21:30, Jan Kiszka wrote:
>> On 2012-06-08 13:16, Alexey Kardashevskiy wrote:
>>> 08.06.2012 20:56, Jan Kiszka wrote:
>>>> On 2012-06-08 10:47, Alexey Kardashevskiy wrote:
>>>>> Yet another try :)
>>>>>
>>>>> Normally pci_add_capability is called on devices to add a new
>>>>> capability. This is fine for emulated devices, whose capability list
>>>>> is built by QEMU.
>>>>>
>>>>> In the case of VFIO the capability may already exist and adding new
>>>>
>>>> Why does it exist? VFIO should build the virtual capability list from
>>>> scratch (just like classic device assignment does), recreating the
>>>> layout of the physical device (except for masked out caps). In that
>>>> case, this conflict should become impossible, no?
>>>
>>> Normally capabilities in emulated devices are created by calling
>>> msi_init or msix_init - just when the emulated device wants to advertise
>>> them to the guest.
>>>
>>> In the case of VFIO, there are a lot of capabilities which QEMU does not
>>> know and does not want to know about. They are read from the host kernel
>>> as is. And we definitely want to pass these capabilities to the guest as
>>> is, i.e. at the same positions and in the same number. Only for some of
>>> them do we call pci_add_capability (indirectly!), if we want QEMU to
>>> support them somehow.
>>>
>>> If we invent some function which "re-adds" all the capabilities we got
>>> from the host to keep QEMU's internal PCIDevice data in sync, then we'll
>>> need to change every piece of code which adds capabilities.
>>
>> I can't follow. What is different in VFIO from device-assignment.c,
>> assigned_device_pci_cap_init (except that it already uses msi[x]_init,
>> something we need to fix in device-assignment.c)?
>
> What are device-assignment.c and assigned_device_pci_cap_init? Cannot
> find them in QEMU tree.

"Old-style" KVM device assignment is not yet upstream. You can find it
in qemu-kvm, and hopefully upstream soon as well.

>
> Ah, anyway. The main difference is that QEMU does not emulate VFIO devices;
> it is just a proxy to the host system. Or I do not understand the question.
>
>>> I noticed
>>> this is a very common approach here, to change a lot for a very small
>>> thing or a rare case, but I'd like to avoid this :)
>>>
>>>> But if pci_*add*_capability should actually be used like this (I doubt
>>>> this),
>>>
>>> MSI/MSIX use it. To enable MSI/MSIX on VFIO PCIDevice, we call
>>> msi_init/msix_init and they call pci_add_capability.
>>
>> You can't blame msi_init/msix_init for the fact that VFIO creates a
>> capability list with an existing MSI/MSI-X entry beforehand.
>
> VFIO does not create any capability. It gets them all from the host
> kernel and passes them to the guest as is. VFIO only needs MSIX to be
> enabled in VFIO.

Just like any device in QEMU, VFIO also needs to set up a virtual config
space when it registers with the PCI core layer. Even if the virtual one
is modeled after the real one, it is still _created_ by the VFIO
userspace part. And this creation process is obviously a bit messed up
so far. Fix this, but not by adding workarounds in the MSI or PCI layer.
Rather, add all capabilities you want to expose to the guest via
pci_add_capability or, indirectly, via msi[x]_init at the right
position. Do not just copy the real config space over; that breaks the
core layer, as we see.
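
To illustrate the idea, here is a minimal, self-contained sketch. It is
not QEMU code: struct vdev, toy_add_capability, toy_cap_size and
build_virtual_caps are hypothetical names, and the capability sizes are
simplified assumptions. It only shows the pattern of walking the
physical capability list and registering each entry at the same offset
through a single "add" function that tracks which bytes are in use,
instead of copying the raw config space.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE        256
#define PCI_STATUS      0x06   /* status register (low byte) */
#define PCI_STATUS_CAP  0x10   /* "capabilities list present" bit */
#define PCI_CAP_PTR     0x34   /* offset of the first capability */
#define CAP_ID_PM       0x01   /* power management */
#define CAP_ID_MSI      0x05   /* MSI */

/* Hypothetical stand-in for a device's virtual config space. */
struct vdev {
    uint8_t config[CFG_SIZE];
    uint8_t used[CFG_SIZE];    /* bytes already claimed by a capability */
};

/* Toy analogue of an "add capability" helper: claim [offset, offset+size),
 * link the entry at the head of the list. Returns offset, or -1 if the
 * range is invalid or overlaps an already registered capability. */
static int toy_add_capability(struct vdev *d, uint8_t id,
                              uint8_t offset, uint8_t size)
{
    if (offset < 0x40 || size < 2 || offset + size > CFG_SIZE) {
        return -1;
    }
    for (int i = offset; i < offset + size; i++) {
        if (d->used[i]) {
            return -1;
        }
    }
    d->config[offset] = id;
    d->config[offset + 1] = d->config[PCI_CAP_PTR];  /* old head is next */
    d->config[PCI_CAP_PTR] = offset;
    d->config[PCI_STATUS] |= PCI_STATUS_CAP;
    memset(&d->used[offset], 1, size);
    return offset;
}

/* Assumed sizes for the sketch; real code derives them per capability. */
static uint8_t toy_cap_size(uint8_t id)
{
    switch (id) {
    case CAP_ID_PM:  return 8;
    case CAP_ID_MSI: return 10;  /* 32-bit address, no per-vector masking */
    default:         return 2;   /* header only */
    }
}

/* Walk the physical list and re-create each entry in the virtual space.
 * Returns the number of capabilities added, or -1 on a conflict. */
static int build_virtual_caps(struct vdev *d, const uint8_t *phys)
{
    int count = 0;
    int guard = 48;              /* bound the walk against looped lists */

    if (!(phys[PCI_STATUS] & PCI_STATUS_CAP)) {
        return 0;
    }
    for (uint8_t pos = phys[PCI_CAP_PTR]; pos >= 0x40 && guard-- > 0; ) {
        uint8_t id = phys[pos];
        uint8_t size = toy_cap_size(id);

        if (toy_add_capability(d, id, pos, size) < 0) {
            return -1;
        }
        /* Copy the payload only; the header was written by the helper. */
        memcpy(&d->config[pos + 2], &phys[pos + 2], size - 2);
        count++;
        pos = phys[pos + 1];
    }
    return count;
}
```

In this sketch every capability keeps its physical offset, so the guest
sees the same layout, while the list order is reversed because entries
are linked at the head. A second attempt to add a capability at an
already claimed offset fails instead of silently damaging the list.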

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux