From mboxrd@z Thu Jan  1 00:00:00 1970
From: Alexey Kardashevskiy <aik@ozlabs.ru>
Subject: Re: [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent
 damaging config space
Date: Sat, 09 Jun 2012 00:00:47 +1000
Message-ID: <4FD2058F.7030903@ozlabs.ru>
References: <4FACB581.2050609@ozlabs.ru> <6A22E211-BC82-49BD-A335-02D3BAA14A17@suse.de> <4FAD0A4F.2050506@ozlabs.ru> <E3C5DF35-72FF-4247-BC86-F3374E2B40E6@suse.de> <4FB080CE.3030703@ozlabs.ru> <4FB5DA43.90907@ozlabs.ru> <1337652170.2779.143.camel@pasglop> <6C472F5B-B8C3-48DE-B19B-00973AF6AC56@suse.de> <4FBB0B95.8050901@ozlabs.ru> <82643009-4F43-407F-B26C-C36537825BFD@suse.de> <4FBB2E25.2030206@ozlabs.ru> <584A5E54-2119-415C-93B4-BB91A08CA729@suse.de> <4FD1BC14.6030900@ozlabs.ru> <4FD1DA5C.5020900@siemens.com> <4FD1DF29.1050303@ozlabs.ru> <4FD1E25A.2010900@siemens.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Alexander Graf <agraf@suse.de>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	"anthony@codemonkey.ws" <anthony@codemonkey.ws>,
	David Gibson <david@gibson.dropbear.id.au>
To: Jan Kiszka <jan.kiszka@siemens.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mail-pz0-f46.google.com ([209.85.210.46]:38075 "EHLO
	mail-pz0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755230Ab2FHOAs (ORCPT <rfc822;kvm@vger.kernel.org>);
	Fri, 8 Jun 2012 10:00:48 -0400
Received: by dady13 with SMTP id y13so2391905dad.19
        for <kvm@vger.kernel.org>; Fri, 08 Jun 2012 07:00:47 -0700 (PDT)
In-Reply-To: <4FD1E25A.2010900@siemens.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

08.06.2012 21:30, Jan Kiszka =D0=C9=DB=C5=D4:
> On 2012-06-08 13:16, Alexey Kardashevskiy wrote:
>> 08.06.2012 20:56, Jan Kiszka =CE=C1=D0=C9=D3=C1=CC:
>>> On 2012-06-08 10:47, Alexey Kardashevskiy wrote:
>>>> Yet another try :)
>>>>
>>>> Normally the pci_add_capability is called on devices to add new
>>>> capability. This is ok for emulated devices which capabilities lis=
t
>>>> is being built by QEMU.
>>>>
>>>> In the case of VFIO the capability may already exist and adding ne=
w
>>>
>>> Why does it exit? VFIO should build the virtual capability list fro=
m
>>> scratch (just like classic device assignment does), recreating the
>>> layout of the physical device (except for masked out caps). In that
>>> case, this conflict should become impossible, no?
>>
>> Normally capabilities in emulated devices are created by calling
>> msi_init or msix_init - just when emulated device wants to advertise=
 it
>> to the guest.
>>
>> In the case of VFIO, there is a lot of capabilities which QEMU does =
not
>> know and does not want to know about. They are read from the host ke=
rnel
>> as is. And we definitely want to pass these capabilities to the gues=
t as
>> is, i.e. on the same position and the same number of them. Just for =
some
>> we call pci_add_capability (indirectly!) if we want QEMU to support =
them
>> somehow.
>>
>> If we invent some function which "readds" all the capabilities we go=
t
>> from the host to keep internal QEMU's PCIDevice data in sync, then w=
e'll
>> need to change every piece of code which adds capabilities.
>=20
> I can't follow. What is different in VFIO from device-assignment.c,
> assigned_device_pci_cap_init (except that it already uses msi[x]_init=
,
> something we need to fix in device-assignment.c)?

What are device-assignment.c and assigned_device_pci_cap_init? Cannot
find them in QEMU tree.

Ah, anyway. The main difference is QEMU does not emulate VFIO devices,
it just a proxy to the host system. Or I do not understand the question=
=2E

>> I noticed,
>> this is very common approach here to change a lot for a very small t=
hing
>> or rare case but I'd like to avoid this :)
>>
>>> But if pci_*add*_capability should actually be used like this (I do=
ubt
>>> this),
>>
>> MSI/MSIX use it. To enable MSI/MSIX on VFIO PCIDevice, we call
>> msi_init/msix_init and they call pci_add_capability.
>=20
> You can't blame msi_init/msix_init for the fact that VFIO creates a
> capability list with an existing MSI/MSI-X entry beforehand.

VFIO does not create any capability. It gets them all from the host
kernel and passes to the guest as is. VFIO only needs MSIX to be enable=
d
in VFIO.

>>> some renaming would be required. "Add" sound like "append" to me,
>>> not "update".
>>
>> It is "add" for all the cases but VFIO. VFIO is the very special cas=
e
>> and I do not see another one doing the same soon.
>=20
> PCI device assignment may have some special requirements. Then it is
> either required to generalize common services properly or keep the
> specialty local. So far, this proposal does not fall in any of those =
two
> categories.

It is a common patch. It does not know about VFIO and lets
pci_add_capability handle one more situation when the capability alread=
y
exists.

The only "common" solution I see here is
1) to add pci_fixup_capabilities() which would mark all the bytes of
existing capabilities as "used", we will call it once we fetched the
config space from the host kernel
2) to fix pci_add_capabilities not to fail and simply return (0?) if we
add a capability which already exists.

Will it be ok?


--=20
With best regards

Alexey Kardashevskiy -- icq: 52150396