From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent
 damaging config space
Date: Fri, 08 Jun 2012 16:43:30 +0200
Message-ID: <4FD20F92.9080805@siemens.com>
References: <4FACB581.2050609@ozlabs.ru> <6A22E211-BC82-49BD-A335-02D3BAA14A17@suse.de> <4FAD0A4F.2050506@ozlabs.ru> <E3C5DF35-72FF-4247-BC86-F3374E2B40E6@suse.de> <4FB080CE.3030703@ozlabs.ru> <4FB5DA43.90907@ozlabs.ru> <1337652170.2779.143.camel@pasglop> <6C472F5B-B8C3-48DE-B19B-00973AF6AC56@suse.de> <4FBB0B95.8050901@ozlabs.ru> <82643009-4F43-407F-B26C-C36537825BFD@suse.de> <4FBB2E25.2030206@ozlabs.ru> <584A5E54-2119-415C-93B4-BB91A08CA729@suse.de> <4FD1BC14.6030900@ozlabs.ru> <4FD1DA5C.5020900@siemens.com> <4FD1DF29.1050303@ozlabs.ru> <4FD1E25A.2010900@siemens.com> <4FD2058F.7030903@ozlabs.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Alexander Graf <agraf@suse.de>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	"anthony@codemonkey.ws" <anthony@codemonkey.ws>,
	David Gibson <david@gibson.dropbear.id.au>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Return-path: <kvm-owner@vger.kernel.org>
Received: from thoth.sbs.de ([192.35.17.2]:19086 "EHLO thoth.sbs.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932212Ab2FHOnv (ORCPT <rfc822;kvm@vger.kernel.org>);
	Fri, 8 Jun 2012 10:43:51 -0400
In-Reply-To: <4FD2058F.7030903@ozlabs.ru>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 2012-06-08 16:00, Alexey Kardashevskiy wrote:
> 08.06.2012 21:30, Jan Kiszka =D0=C9=DB=C5=D4:
>> On 2012-06-08 13:16, Alexey Kardashevskiy wrote:
>>> 08.06.2012 20:56, Jan Kiszka =CE=C1=D0=C9=D3=C1=CC:
>>>> On 2012-06-08 10:47, Alexey Kardashevskiy wrote:
>>>>> Yet another try :)
>>>>>
>>>>> Normally the pci_add_capability is called on devices to add new
>>>>> capability. This is ok for emulated devices which capabilities li=
st
>>>>> is being built by QEMU.
>>>>>
>>>>> In the case of VFIO the capability may already exist and adding n=
ew
>>>>
>>>> Why does it exit? VFIO should build the virtual capability list fr=
om
>>>> scratch (just like classic device assignment does), recreating the
>>>> layout of the physical device (except for masked out caps). In tha=
t
>>>> case, this conflict should become impossible, no?
>>>
>>> Normally capabilities in emulated devices are created by calling
>>> msi_init or msix_init - just when emulated device wants to advertis=
e it
>>> to the guest.
>>>
>>> In the case of VFIO, there is a lot of capabilities which QEMU does=
 not
>>> know and does not want to know about. They are read from the host k=
ernel
>>> as is. And we definitely want to pass these capabilities to the gue=
st as
>>> is, i.e. on the same position and the same number of them. Just for=
 some
>>> we call pci_add_capability (indirectly!) if we want QEMU to support=
 them
>>> somehow.
>>>
>>> If we invent some function which "readds" all the capabilities we g=
ot
>>> from the host to keep internal QEMU's PCIDevice data in sync, then =
we'll
>>> need to change every piece of code which adds capabilities.
>>
>> I can't follow. What is different in VFIO from device-assignment.c,
>> assigned_device_pci_cap_init (except that it already uses msi[x]_ini=
t,
>> something we need to fix in device-assignment.c)?
>=20
> What are device-assignment.c and assigned_device_pci_cap_init? Cannot
> find them in QEMU tree.

"Old-style" KVM device assignment is not yet upstream. You can find it
in qemu-kvm, hopefully in upstream soon as well.

>=20
> Ah, anyway. The main difference is QEMU does not emulate VFIO devices=
,
> it just a proxy to the host system. Or I do not understand the questi=
on.
>=20
>>> I noticed,
>>> this is very common approach here to change a lot for a very small =
thing
>>> or rare case but I'd like to avoid this :)
>>>
>>>> But if pci_*add*_capability should actually be used like this (I d=
oubt
>>>> this),
>>>
>>> MSI/MSIX use it. To enable MSI/MSIX on VFIO PCIDevice, we call
>>> msi_init/msix_init and they call pci_add_capability.
>>
>> You can't blame msi_init/msix_init for the fact that VFIO creates a
>> capability list with an existing MSI/MSI-X entry beforehand.
>=20
> VFIO does not create any capability. It gets them all from the host
> kernel and passes to the guest as is. VFIO only needs MSIX to be enab=
led
> in VFIO.

Just like any device in QEMU, also VFIO need to set up a virtual config
space when it registers with the PCI core layer. Even if the virtual on=
e
is modeled after the real one, it is still _created_ by the VFIO
userspace part. And this creation process is obviously a bit messed up
so far. Fix this, but not by adding workarounds in the MSI or PCI layer=
=2E
Rather add all capabilities you want to expose to the guest via
pci_add_capability or, indirectly, via msi[x]_init at the right
position. Do not just copy the real config space over, that breaks the
core layer as we see.

Jan

--=20
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux