[Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

* [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
@ 2012-05-11  6:45 Alexey Kardashevskiy
  2012-05-11 10:52 ` Alexander Graf
  2012-05-11 19:20 ` Jason Baron
  0 siblings, 2 replies; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-05-11  6:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: kvm, Alex Graf, Alex Williamson, anthony, David Gibson

Normally pci_add_capability() is called on a device to add a new
capability. This is fine for emulated devices, whose capability list
is built by QEMU.

In the case of VFIO, the capability may already exist, and adding it
again at the beginning of the linked list may create a loop.

For example, the old code corrupts the following config space
of a PCIe Intel E1000E:

before adding PCI_CAP_ID_MSI (0x05):
0x34: 0xC8
0xC8: 0x01 0xD0
0xD0: 0x05 0xE0
0xE0: 0x10 0x00

after:
0x34: 0xD0
0xC8: 0x01 0xD0
0xD0: 0x05 0xC8
0xE0: 0x10 0x00

As a result, capabilities 0x01 and 0x05 point to each other.

The proposed patch leaves the capability pointers unchanged when
the capability ID already stored at the given offset matches the
one being added.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/pci.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index aa0c0b8..1f7c924 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
     }

     config = pdev->config + offset;
-    config[PCI_CAP_LIST_ID] = cap_id;
-    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
-    pdev->config[PCI_CAPABILITY_LIST] = offset;
-    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+    if (config[PCI_CAP_LIST_ID] != cap_id) {
+        config[PCI_CAP_LIST_ID] = cap_id;
+        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
+        pdev->config[PCI_CAPABILITY_LIST] = offset;
+        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+    }
     memset(pdev->used + offset, 0xFF, size);
     /* Make capability read-only by default */
     memset(pdev->wmask + offset, 0, size);


-- 
Alexey
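
The loop described in the patch description can be reproduced outside
QEMU with a small standalone model. The sketch below is not QEMU code:
the model_* helpers are hypothetical names, config space is reduced to a
plain 256-byte array, and only the three pointer writes of the pre-patch
pci_add_capability() are mimicked. The walker masks the two reserved low
bits of each pointer and gives up after 48 hops, which is the number of
dword-aligned slots between 0x40 and 0xFF.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard PCI config space offsets. */
#define PCI_CAPABILITY_LIST 0x34  /* head pointer of the capability list */
#define PCI_CAP_LIST_ID     0     /* capability ID, relative to cap start */
#define PCI_CAP_LIST_NEXT   1     /* next pointer, relative to cap start */
#define MODEL_CAP_TTL       48    /* max dword-aligned caps in 0x40..0xFF */

/* Fill cfg with the E1000E layout quoted in the patch description. */
static void model_e1000e_config(uint8_t cfg[256])
{
    memset(cfg, 0, 256);
    cfg[PCI_CAPABILITY_LIST] = 0xC8;
    cfg[0xC8] = 0x01; cfg[0xC9] = 0xD0;  /* 0x01 -> next at 0xD0 */
    cfg[0xD0] = 0x05; cfg[0xD1] = 0xE0;  /* 0x05 -> next at 0xE0 */
    cfg[0xE0] = 0x10; cfg[0xE1] = 0x00;  /* 0x10 -> end of list  */
}

/* Model of the unconditional prepend done by the pre-patch code. */
static void model_prepend(uint8_t cfg[256], uint8_t cap_id, uint8_t offset)
{
    assert(offset >= 0x40);  /* capabilities live above the standard header */
    cfg[offset + PCI_CAP_LIST_ID] = cap_id;
    cfg[offset + PCI_CAP_LIST_NEXT] = cfg[PCI_CAPABILITY_LIST];
    cfg[PCI_CAPABILITY_LIST] = offset;
}

/* Walk the list; return 1 if it reaches a null pointer, 0 if it loops. */
static int model_list_terminates(const uint8_t cfg[256])
{
    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xFC;
    int steps;

    for (steps = 0; steps < MODEL_CAP_TTL && pos != 0; steps++) {
        pos = cfg[pos + PCI_CAP_LIST_NEXT] & 0xFC;
    }
    return pos == 0;
}
```

With the E1000E layout loaded, model_list_terminates() returns 1. After
model_prepend(cfg, 0x05, 0xD0) it returns 0, because 0xD0 now points to
0xC8 while 0xC8 still points to 0xD0.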


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-11  6:45 [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space Alexey Kardashevskiy
@ 2012-05-11 10:52 ` Alexander Graf
  2012-05-11 12:47   ` Alexey Kardashevskiy
  2012-05-11 19:20 ` Jason Baron
  1 sibling, 1 reply; 29+ messages in thread
From: Alexander Graf @ 2012-05-11 10:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, qemu-devel, Alex Williamson, anthony, David Gibson


On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:

> Normally the pci_add_capability is called on devices to add new
> capability. This is ok for emulated devices which capabilities list
> is being built by QEMU.
> 
> In the case of VFIO the capability may already exist and adding new
> capability into the beginning of the linked list may create a loop.
> 
> For example, the old code destroys the following config
> of PCIe Intel E1000E:
> 
> before adding PCI_CAP_ID_MSI (0x05):
> 0x34: 0xC8
> 0xC8: 0x01 0xD0
> 0xD0: 0x05 0xE0
> 0xE0: 0x10 0x00
> 
> after:
> 0x34: 0xD0
> 0xC8: 0x01 0xD0
> 0xD0: 0x05 0xC8
> 0xE0: 0x10 0x00
> 
> As result capabilities 0x01 and 0x05 point to each other.
> 
> The proposed patch does not change capability pointers when
> the same type capability is about to add.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> hw/pci.c |   10 ++++++----
> 1 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index aa0c0b8..1f7c924 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>     }
> 
>     config = pdev->config + offset;
> -    config[PCI_CAP_LIST_ID] = cap_id;
> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> +    if (config[PCI_CAP_LIST_ID] != cap_id) {

This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either

  * replace the existing one or
  * drop out and not write the new one in.

I'm not sure which way would be more natural.
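
The lookup suggested here can be sketched as a bounded walk over the
capability list. This is a standalone model under stated assumptions,
not QEMU's implementation: config space is a plain byte array, and
model_find_capability(), model_classify_add() and the enum are
hypothetical names used only to make the two options (plus a conflict
case) concrete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CAPABILITY_LIST 0x34
#define PCI_CAP_LIST_ID     0
#define PCI_CAP_LIST_NEXT   1
#define MODEL_CAP_TTL       48  /* guards against a looping list */

/* Return the offset of the first capability with cap_id, or 0 if absent. */
static uint8_t model_find_capability(const uint8_t *cfg, uint8_t cap_id)
{
    uint8_t pos;
    int ttl = MODEL_CAP_TTL;

    assert(cfg != NULL);
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC;
    while (pos != 0 && ttl-- > 0) {
        if (cfg[pos + PCI_CAP_LIST_ID] == cap_id) {
            return pos;
        }
        pos = cfg[pos + PCI_CAP_LIST_NEXT] & 0xFC;
    }
    return 0;
}

/* Hypothetical outcomes of a request to add cap_id at offset. */
enum model_add_result {
    MODEL_ADD_NEW,        /* not in the list yet: link it in */
    MODEL_KEEP_EXISTING,  /* already present where requested: do not relink */
    MODEL_CONFLICT        /* already present at a different offset */
};

/* offset == 0 means "caller does not care where the capability lives". */
static enum model_add_result
model_classify_add(const uint8_t *cfg, uint8_t cap_id, uint8_t offset)
{
    uint8_t existing = model_find_capability(cfg, cap_id);

    if (existing == 0) {
        return MODEL_ADD_NEW;
    }
    if (offset == 0 || offset == existing) {
        return MODEL_KEEP_EXISTING;
    }
    return MODEL_CONFLICT;
}
```

Whether MODEL_CONFLICT should replace the existing entry, be ignored, or
be treated as an error is exactly the open question raised above; the
sketch only separates the cases.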

> +        config[PCI_CAP_LIST_ID] = cap_id;
> +        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> +        pdev->config[PCI_CAPABILITY_LIST] = offset;
> +        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> +    }
>     memset(pdev->used + offset, 0xFF, size);
>     /* Make capability read-only by default */
>     memset(pdev->wmask + offset, 0, size);
> 
> 
> -- 
> Alexey


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-11 10:52 ` Alexander Graf
@ 2012-05-11 12:47   ` Alexey Kardashevskiy
  2012-05-11 14:13     ` Alexander Graf
  0 siblings, 1 reply; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-05-11 12:47 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, qemu-devel, Alex Williamson, anthony, David Gibson

11.05.2012 20:52, Alexander Graf wrote:
> 
> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
> 
>> Normally the pci_add_capability is called on devices to add new
>> capability. This is ok for emulated devices which capabilities list
>> is being built by QEMU.
>>
>> In the case of VFIO the capability may already exist and adding new
>> capability into the beginning of the linked list may create a loop.
>>
>> For example, the old code destroys the following config
>> of PCIe Intel E1000E:
>>
>> before adding PCI_CAP_ID_MSI (0x05):
>> 0x34: 0xC8
>> 0xC8: 0x01 0xD0
>> 0xD0: 0x05 0xE0
>> 0xE0: 0x10 0x00
>>
>> after:
>> 0x34: 0xD0
>> 0xC8: 0x01 0xD0
>> 0xD0: 0x05 0xC8
>> 0xE0: 0x10 0x00
>>
>> As result capabilities 0x01 and 0x05 point to each other.
>>
>> The proposed patch does not change capability pointers when
>> the same type capability is about to add.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> hw/pci.c |   10 ++++++----
>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/pci.c b/hw/pci.c
>> index aa0c0b8..1f7c924 100644
>> --- a/hw/pci.c
>> +++ b/hw/pci.c
>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>     }
>>
>>     config = pdev->config + offset;
>> -    config[PCI_CAP_LIST_ID] = cap_id;
>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
> 
> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either
>   * replace the existing one or
>   * drop out and not write the new one in.
> 
> I'm not sure which way would be more natural.

There is a third option: add another function, let's call it
pci_fixup_capability(), which would do whatever pci_add_capability() does
but would not touch the list pointers.

With VFIO, pci_add_capability() is called from code which knows for
certain that the capability exists and where it is, and which calls
pci_add_capability() based on that knowledge. Doing additional loops
just for imaginary scalability is a bit weird, no?
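
A rough idea of what such a pci_fixup_capability() might look like,
written as a standalone model rather than against QEMU's headers. The
struct only mirrors the config, used and wmask arrays that appear in the
patch; the struct name, the function name model_fixup_capability(), the
bounds checks and the -1 error return are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_CFG_SIZE 256

/* Minimal stand-in for the PCIDevice fields touched by the patch. */
struct model_pci_dev {
    uint8_t config[MODEL_CFG_SIZE];
    uint8_t used[MODEL_CFG_SIZE];
    uint8_t wmask[MODEL_CFG_SIZE];
};

/*
 * Claim [offset, offset + size) for a capability that is already present
 * in config space. The ID byte, the next pointer and the list head are
 * left exactly as they are. Returns offset on success, -1 otherwise.
 */
static int model_fixup_capability(struct model_pci_dev *d, uint8_t cap_id,
                                  uint8_t offset, uint8_t size)
{
    assert(d != NULL);
    if (offset < 0x40 || offset + size > MODEL_CFG_SIZE) {
        return -1;  /* outside the capability area */
    }
    if (d->config[offset] != cap_id) {
        return -1;  /* nothing with this ID to fix up at this offset */
    }
    memset(d->used + offset, 0xFF, size);
    /* Make capability read-only by default, as pci_add_capability() does */
    memset(d->wmask + offset, 0, size);
    return offset;
}
```

The design trade-off is the one debated in this thread: the function
trusts the caller about where the capability is, and checks only the ID
byte at that offset rather than walking the list.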


>> +        config[PCI_CAP_LIST_ID] = cap_id;
>> +        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>> +        pdev->config[PCI_CAPABILITY_LIST] = offset;
>> +        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>> +    }
>>     memset(pdev->used + offset, 0xFF, size);
>>     /* Make capability read-only by default */
>>     memset(pdev->wmask + offset, 0, size);


-- 
With best regards

Alexey Kardashevskiy


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-11 12:47   ` Alexey Kardashevskiy
@ 2012-05-11 14:13     ` Alexander Graf
  2012-05-14  3:49       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 29+ messages in thread
From: Alexander Graf @ 2012-05-11 14:13 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, qemu-devel, Alex Williamson, anthony, David Gibson


On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:

> 11.05.2012 20:52, Alexander Graf wrote:
>> 
>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>> 
>>> Normally the pci_add_capability is called on devices to add new
>>> capability. This is ok for emulated devices which capabilities list
>>> is being built by QEMU.
>>> 
>>> In the case of VFIO the capability may already exist and adding new
>>> capability into the beginning of the linked list may create a loop.
>>> 
>>> For example, the old code destroys the following config
>>> of PCIe Intel E1000E:
>>> 
>>> before adding PCI_CAP_ID_MSI (0x05):
>>> 0x34: 0xC8
>>> 0xC8: 0x01 0xD0
>>> 0xD0: 0x05 0xE0
>>> 0xE0: 0x10 0x00
>>> 
>>> after:
>>> 0x34: 0xD0
>>> 0xC8: 0x01 0xD0
>>> 0xD0: 0x05 0xC8
>>> 0xE0: 0x10 0x00
>>> 
>>> As result capabilities 0x01 and 0x05 point to each other.
>>> 
>>> The proposed patch does not change capability pointers when
>>> the same type capability is about to add.
>>> 
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>> hw/pci.c |   10 ++++++----
>>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/hw/pci.c b/hw/pci.c
>>> index aa0c0b8..1f7c924 100644
>>> --- a/hw/pci.c
>>> +++ b/hw/pci.c
>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>    }
>>> 
>>>    config = pdev->config + offset;
>>> -    config[PCI_CAP_LIST_ID] = cap_id;
>>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>> 
>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either
>>  * replace the existing one or
>>  * drop out and not write the new one in.

  * hw_error :)

>> 
>> I'm not sure which way would be more natural.
> 
> There is a third option - add another function, lets call it
> pci_fixup_capability() which would do whatever pci_add_capability() does
> but won't touch list pointers.

What good is a function that breaks internal consistency?

> When vfio, pci_add_capability() is called from the code which knows
> exactly that the capability exists and where it is and it calls
> pci_add_capability() based on this knowledge so doing additional loops
> just for imaginery scalability is a bit weird, no?

Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability.


Alex


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-11 14:13     ` Alexander Graf
@ 2012-05-14  3:49       ` Alexey Kardashevskiy
  2012-05-18  5:12         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-05-14  3:49 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, qemu-devel, Alex Williamson, anthony, David Gibson

On 12/05/12 00:13, Alexander Graf wrote:
> 
> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
> 
>> 11.05.2012 20:52, Alexander Graf wrote:
>>>
>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>>>
>>>> Normally the pci_add_capability is called on devices to add new
>>>> capability. This is ok for emulated devices which capabilities list
>>>> is being built by QEMU.
>>>>
>>>> In the case of VFIO the capability may already exist and adding new
>>>> capability into the beginning of the linked list may create a loop.
>>>>
>>>> For example, the old code destroys the following config
>>>> of PCIe Intel E1000E:
>>>>
>>>> before adding PCI_CAP_ID_MSI (0x05):
>>>> 0x34: 0xC8
>>>> 0xC8: 0x01 0xD0
>>>> 0xD0: 0x05 0xE0
>>>> 0xE0: 0x10 0x00
>>>>
>>>> after:
>>>> 0x34: 0xD0
>>>> 0xC8: 0x01 0xD0
>>>> 0xD0: 0x05 0xC8
>>>> 0xE0: 0x10 0x00
>>>>
>>>> As result capabilities 0x01 and 0x05 point to each other.
>>>>
>>>> The proposed patch does not change capability pointers when
>>>> the same type capability is about to add.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>> hw/pci.c |   10 ++++++----
>>>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/hw/pci.c b/hw/pci.c
>>>> index aa0c0b8..1f7c924 100644
>>>> --- a/hw/pci.c
>>>> +++ b/hw/pci.c
>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>    }
>>>>
>>>>    config = pdev->config + offset;
>>>> -    config[PCI_CAP_LIST_ID] = cap_id;
>>>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>>>
>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either
>>>  * replace the existing one or
>>>  * drop out and not write the new one in.
> 
>   * hw_error :)
> 
>>>
>>> I'm not sure which way would be more natural.
>>
>> There is a third option - add another function, lets call it
>> pci_fixup_capability() which would do whatever pci_add_capability() does
>> but won't touch list pointers.
> 
> What good is a function that breaks internal consistency?


It is already broken by having the PCIDevice.used field. Normally pci_add_capability() would go
through the whole list and add a capability only if it does not exist. Emulated devices which care
about having a capability at some fixed offset would have initialized their config space before
calling this capabilities API (as VFIO does).

If we really want to support emulated devices which want some capabilities at fixed offsets and
others at random offsets (strange, but ok), I do not see what is wrong with restoring this
consistency through a special function (pci_fixup_capability()), which avoids rewriting the
capability at a different location, since a guest driver may care about its offset.



>> When vfio, pci_add_capability() is called from the code which knows
>> exactly that the capability exists and where it is and it calls
>> pci_add_capability() based on this knowledge so doing additional loops
>> just for imaginery scalability is a bit weird, no?
> 
> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability.
> 
> 
> Alex
> 


-- 
Alexey


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-14  3:49       ` Alexey Kardashevskiy
@ 2012-05-18  5:12         ` Alexey Kardashevskiy
  2012-05-22  2:02           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-05-18  5:12 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, qemu-devel, Alex Williamson, anthony, David Gibson

Alexander,

Is that any better? :)


@@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
  * in pci config space */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
                        uint8_t offset, uint8_t size)
 {
-    uint8_t *config;
+    uint8_t *config, existing;
     int i, overlapping_cap;

+    existing = pci_find_capability(pdev, cap_id);
+    if (existing) {
+        if (offset && (existing != offset)) {
+            return -EEXIST;
+        }
+        for (i = existing; i < size; ++i) {
+            if (pdev->used[i]) {
+                return -EFAULT;
+            }
+        }
+        memset(pdev->used + offset, 0xFF, size);
+        /* Make capability read-only by default */
+        memset(pdev->wmask + offset, 0, size);
+        /* Check capability by default */
+        memset(pdev->cmask + offset, 0xFF, size);
+        return existing;
+    }
+
     if (!offset) {
         offset = pci_find_space(pdev, size);
         if (!offset) {
             return -ENOSPC;







-- 
Alexey


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-18  5:12         ` Alexey Kardashevskiy
@ 2012-05-22  2:02           ` Benjamin Herrenschmidt
  2012-05-22  3:21             ` Alexander Graf
  0 siblings, 1 reply; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2012-05-22  2:02 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, Alexander Graf, qemu-devel, Alex Williamson, anthony,
	David Gibson

On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
> Alexander,
> 
> Is that any better? :)

Alex (Graf, that is), ping?

The original patch from Alexey was fine, btw.

VFIO will always call this with the existing capability offset, so
there's no real risk of doing the wrong thing, breaking the list, or
anything like that.

I.e., a small, simple patch that addresses the problem :-)

The new patch is a bit more "robust", I believe, but I don't think we
need to go too far to fix a problem we don't have. We do need a fix for
the real issue, and the simple patch does that neatly from what I can
understand.

Cheers,
Ben.

> 
> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>   * in pci config space */
>  int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>                         uint8_t offset, uint8_t size)
>  {
> -    uint8_t *config;
> +    uint8_t *config, existing;
>      int i, overlapping_cap;
> 
> +    existing = pci_find_capability(pdev, cap_id);
> +    if (existing) {
> +        if (offset && (existing != offset)) {
> +            return -EEXIST;
> +        }
> +        for (i = existing; i < size; ++i) {
> +            if (pdev->used[i]) {
> +                return -EFAULT;
> +            }
> +        }
> +        memset(pdev->used + offset, 0xFF, size);
> +        /* Make capability read-only by default */
> +        memset(pdev->wmask + offset, 0, size);
> +        /* Check capability by default */
> +        memset(pdev->cmask + offset, 0xFF, size);
> +        return existing;
> +    }
> +
>      if (!offset) {
>          offset = pci_find_space(pdev, size);
>          if (!offset) {
>              return -ENOSPC;


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-22  2:02           ` Benjamin Herrenschmidt
@ 2012-05-22  3:21             ` Alexander Graf
  2012-05-22  3:44               ` Alexey Kardashevskiy
  0 siblings, 1 reply; 29+ messages in thread
From: Alexander Graf @ 2012-05-22  3:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: kvm@vger.kernel.org, Alexey Kardashevskiy, qemu-devel@nongnu.org,
	Alex Williamson, anthony@codemonkey.ws, David Gibson



On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>> Alexander,
>> 
>> Is that any better? :)
> 
> Alex (Graf that is), ping ?
> 
> The original patch from Alexey was fine btw.
> 
> VFIO will always call things with the existing capability offset so
> there's no real risk of doing the wrong thing or break the list or
> anything.
> 
> IE. A small simple patch that addresses the problem :-)
> 
> The new patch is a bit more "robust" I believe, I don't think we need to
> go too far to fix a problem we don't have. But we need a fix for the
> real issue and the simple patch does it neatly from what I can
> understand.
> 
> Cheers,
> Ben.
> 
>> 
>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>  * in pci config space */
>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>                        uint8_t offset, uint8_t size)
>> {
>> -    uint8_t *config;
>> +    uint8_t *config, existing;

Existing is a pointer to the target dev's config space, right?

>>     int i, overlapping_cap;
>> 
>> +    existing = pci_find_capability(pdev, cap_id);
>> +    if (existing) {
>> +        if (offset && (existing != offset)) {
>> +            return -EEXIST;
>> +        }
>> +        for (i = existing; i < size; ++i) {

So how does this possibly make sense?

>> +            if (pdev->used[i]) {
>> +                return -EFAULT;
>> +            }
>> +        }
>> +        memset(pdev->used + offset, 0xFF, size);

Why?

>> +        /* Make capability read-only by default */
>> +        memset(pdev->wmask + offset, 0, size);

Why?

>> +        /* Check capability by default */
>> +        memset(pdev->cmask + offset, 0xFF, size);

I don't understand this part either.


Alex
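
One possible reading of the objection to the loop bound, which the
thread itself does not spell out: in the posted hunk i starts at the
capability offset (0x40 or above) but is compared against size, a small
byte count, so the body would normally never run. The sketch below just
counts iterations for the bound as posted and for an assumed intended
bound of existing + size; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Iterations of the loop as posted: for (i = existing; i < size; ++i). */
static int model_iterations_as_posted(uint8_t existing, uint8_t size)
{
    int i, n = 0;

    for (i = existing; i < size; ++i) {
        n++;
    }
    return n;
}

/* Iterations if the bound were existing + size (assumed intent). */
static int model_iterations_over_range(uint8_t existing, uint8_t size)
{
    int i, n = 0;

    assert(existing + size <= 256);  /* range must stay inside config space */
    for (i = existing; i < existing + size; ++i) {
        n++;
    }
    return n;
}
```

For the MSI capability from the example (offset 0xD0) and an assumed
size of 0x0E bytes, the posted bound gives zero iterations, while the
range-based bound visits all 14 bytes.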

>> +        return existing;
>> +    }
>> +
>>     if (!offset) {
>>         offset = pci_find_space(pdev, size);
>>         if (!offset) {
>>             return -ENOSPC;


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-22  3:21             ` Alexander Graf
@ 2012-05-22  3:44               ` Alexey Kardashevskiy
  2012-05-22  5:52                 ` Alexander Graf
  0 siblings, 1 reply; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-05-22  3:44 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson,
	anthony@codemonkey.ws, David Gibson

On 22/05/12 13:21, Alexander Graf wrote:
> 
> 
> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>>> Alexander,
>>>
>>> Is that any better? :)
>>
>> Alex (Graf that is), ping ?
>>
>> The original patch from Alexey was fine btw.
>>
>> VFIO will always call things with the existing capability offset so
>> there's no real risk of doing the wrong thing or break the list or
>> anything.
>>
>> IE. A small simple patch that addresses the problem :-)
>>
>> The new patch is a bit more "robust" I believe, I don't think we need to
>> go too far to fix a problem we don't have. But we need a fix for the
>> real issue and the simple patch does it neatly from what I can
>> understand.
>>
>> Cheers,
>> Ben.
>>
>>>
>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>  * in pci config space */
>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>                        uint8_t offset, uint8_t size)
>>> {
>>> -    uint8_t *config;
>>> +    uint8_t *config, existing;
> 
> Existing is a pointer to the target dev's config space, right?

Yes.

>>>     int i, overlapping_cap;
>>>
>>> +    existing = pci_find_capability(pdev, cap_id);
>>> +    if (existing) {
>>> +        if (offset && (existing != offset)) {
>>> +            return -EEXIST;
>>> +        }
>>> +        for (i = existing; i < size; ++i) {
> 
> So how does this possibly make sense?

Although I do not expect VFIO to add capabilities (that would not make sense), I still want to
double-check that nobody else has already tried to use this space.

>>> +            if (pdev->used[i]) {
>>> +                return -EFAULT;
>>> +            }
>>> +        }
>>> +        memset(pdev->used + offset, 0xFF, size);
> Why?

Because I am marking the space this capability takes as used.

>>> +        /* Make capability read-only by default */
>>> +        memset(pdev->wmask + offset, 0, size);
> Why?

Because the pci_add_capability() does it for a new capability by default.


>>> +        /* Check capability by default */
>>> +        memset(pdev->cmask + offset, 0xFF, size);
> 
> I don't understand this part either.

The pci_add_capability() does it for a new capability by default.



> 
> Alex
> 
>>> +        return existing;
>>> +    }
>>> +
>>>     if (!offset) {
>>>         offset = pci_find_space(pdev, size);
>>>         if (!offset) {
>>>             return -ENOSPC;
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote:
>>>> On 12/05/12 00:13, Alexander Graf wrote:
>>>>>
>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
>>>>>
>>>>>> 11.05.2012 20:52, Alexander Graf wrote:
>>>>>>>
>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>>>>>>>
>>>>>>>> Normally the pci_add_capability is called on devices to add new
>>>>>>>> capability. This is ok for emulated devices which capabilities list
>>>>>>>> is being built by QEMU.
>>>>>>>>
>>>>>>>> In the case of VFIO the capability may already exist and adding new
>>>>>>>> capability into the beginning of the linked list may create a loop.
>>>>>>>>
>>>>>>>> For example, the old code destroys the following config
>>>>>>>> of PCIe Intel E1000E:
>>>>>>>>
>>>>>>>> before adding PCI_CAP_ID_MSI (0x05):
>>>>>>>> 0x34: 0xC8
>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>> 0xD0: 0x05 0xE0
>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>
>>>>>>>> after:
>>>>>>>> 0x34: 0xD0
>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>> 0xD0: 0x05 0xC8
>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>
>>>>>>>> As result capabilities 0x01 and 0x05 point to each other.
>>>>>>>>
>>>>>>>> The proposed patch does not change capability pointers when
>>>>>>>> the same type capability is about to add.
>>>>>>>>
>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>> ---
>>>>>>>> hw/pci.c |   10 ++++++----
>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c
>>>>>>>> index aa0c0b8..1f7c924 100644
>>>>>>>> --- a/hw/pci.c
>>>>>>>> +++ b/hw/pci.c
>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   config = pdev->config + offset;
>>>>>>>> -    config[PCI_CAP_LIST_ID] = cap_id;
>>>>>>>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>>>>>>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>>>>>>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>>>>>>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>>>>>>>
>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either
>>>>>>> * replace the existing one or
>>>>>>> * drop out and not write the new one in.
>>>>>
>>>>>  * hw_error :)
>>>>>
>>>>>>>
>>>>>>> I'm not sure which way would be more natural.
>>>>>>
>>>>>> There is a third option - add another function, lets call it
>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does
>>>>>> but won't touch list pointers.
>>>>>
>>>>> What good is a function that breaks internal consistency?
>>>>
>>>>
>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through
>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a
>>>> capability at some fixed offset would have initialized their config space before calling this
>>>> capabilities API (as VFIO does).
>>>>
>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and
>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency
>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest
>>>> driver may care about its offset.
>>>>
>>>>
>>>>
>>>>>> When vfio, pci_add_capability() is called from the code which knows
>>>>>> exactly that the capability exists and where it is and it calls
>>>>>> pci_add_capability() based on this knowledge so doing additional loops
>>>>>> just for imaginery scalability is a bit weird, no?
>>>>>
>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability.
>>>>>
>>>>>
>>>>> Alex
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>


-- 
Alexey


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-22  3:44               ` Alexey Kardashevskiy
@ 2012-05-22  5:52                 ` Alexander Graf
  2012-05-22  6:11                   ` Alexey Kardashevskiy
  0 siblings, 1 reply; 29+ messages in thread
From: Alexander Graf @ 2012-05-22  5:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson,
	anthony@codemonkey.ws, David Gibson



On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 22/05/12 13:21, Alexander Graf wrote:
>> 
>> 
>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>> 
>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>>>> Alexander,
>>>> 
>>>> Is that any better? :)
>>> 
>>> Alex (Graf that is), ping ?
>>> 
>>> The original patch from Alexey was fine btw.
>>> 
>>> VFIO will always call things with the existing capability offset so
>>> there's no real risk of doing the wrong thing or break the list or
>>> anything.
>>> 
>>> IE. A small simple patch that addresses the problem :-)
>>> 
>>> The new patch is a bit more "robust" I believe, I don't think we need to
>>> go too far to fix a problem we don't have. But we need a fix for the
>>> real issue and the simple patch does it neatly from what I can
>>> understand.
>>> 
>>> Cheers,
>>> Ben.
>>> 
>>>> 
>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>> * in pci config space */
>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>                       uint8_t offset, uint8_t size)
>>>> {
>>>> -    uint8_t *config;
>>>> +    uint8_t *config, existing;
>> 
>> Existing is a pointer to the target dev's config space, right?
> 
> Yes.
> 
>>>>    int i, overlapping_cap;
>>>> 
>>>> +    existing = pci_find_capability(pdev, cap_id);
>>>> +    if (existing) {
>>>> +        if (offset && (existing != offset)) {
>>>> +            return -EEXIST;
>>>> +        }
>>>> +        for (i = existing; i < size; ++i) {
>> 
>> So how does this possibly make sense?
> 
> Although I do not expect VFIO to add capabilities (that would not make sense), I still want to
> double-check that nobody else has already tried to use this space.

i is an int. existing is a uint8_t*.

> 
>>>> +            if (pdev->used[i]) {
>>>> +                return -EFAULT;
>>>> +            }
>>>> +        }
>>>> +        memset(pdev->used + offset, 0xFF, size);
>> Why?
> 
> Because I am marking the space this capability takes as used.

But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no?

> 
>>>> +        /* Make capability read-only by default */
>>>> +        memset(pdev->wmask + offset, 0, size);
>> Why?
> 
> Because the pci_add_capability() does it for a new capability by default.

Hrm. So you're copying code? Can't you merge the overwrite and write cases?

Alex

> 
> 
>>>> +        /* Check capability by default */
>>>> +        memset(pdev->cmask + offset, 0xFF, size);
>> 
>> I don't understand this part either.
> 
> The pci_add_capability() does it for a new capability by default.
> 
> 
> 
>> 
>> Alex
>> 
>>>> +        return existing;
>>>> +    }
>>>> +
>>>>    if (!offset) {
>>>>        offset = pci_find_space(pdev, size);
>>>>        if (!offset) {
>>>>            return -ENOSPC;
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote:
>>>>> On 12/05/12 00:13, Alexander Graf wrote:
>>>>>> 
>>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
>>>>>> 
>>>>>>> 11.05.2012 20:52, Alexander Graf wrote:
>>>>>>>> 
>>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>>>>>>>> 
>>>>>>>>> Normally the pci_add_capability is called on devices to add new
>>>>>>>>> capability. This is ok for emulated devices which capabilities list
>>>>>>>>> is being built by QEMU.
>>>>>>>>> 
>>>>>>>>> In the case of VFIO the capability may already exist and adding new
>>>>>>>>> capability into the beginning of the linked list may create a loop.
>>>>>>>>> 
>>>>>>>>> For example, the old code destroys the following config
>>>>>>>>> of PCIe Intel E1000E:
>>>>>>>>> 
>>>>>>>>> before adding PCI_CAP_ID_MSI (0x05):
>>>>>>>>> 0x34: 0xC8
>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>> 0xD0: 0x05 0xE0
>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>> 
>>>>>>>>> after:
>>>>>>>>> 0x34: 0xD0
>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>> 0xD0: 0x05 0xC8
>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>> 
>>>>>>>>> As result capabilities 0x01 and 0x05 point to each other.
>>>>>>>>> 
>>>>>>>>> The proposed patch does not change capability pointers when
>>>>>>>>> the same type capability is about to add.
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>> ---
>>>>>>>>> hw/pci.c |   10 ++++++----
>>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>>>> 
>>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c
>>>>>>>>> index aa0c0b8..1f7c924 100644
>>>>>>>>> --- a/hw/pci.c
>>>>>>>>> +++ b/hw/pci.c
>>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>>>>  }
>>>>>>>>> 
>>>>>>>>>  config = pdev->config + offset;
>>>>>>>>> -    config[PCI_CAP_LIST_ID] = cap_id;
>>>>>>>>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>>>>>>>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>>>>>>>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>>>>>>>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>>>>>>>> 
>>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either
>>>>>>>> * replace the existing one or
>>>>>>>> * drop out and not write the new one in.
>>>>>> 
>>>>>> * hw_error :)
>>>>>> 
>>>>>>>> 
>>>>>>>> I'm not sure which way would be more natural.
>>>>>>> 
>>>>>>> There is a third option - add another function, lets call it
>>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does
>>>>>>> but won't touch list pointers.
>>>>>> 
>>>>>> What good is a function that breaks internal consistency?
>>>>> 
>>>>> 
>>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through
>>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a
>>>>> capability at some fixed offset would have initialized their config space before calling this
>>>>> capabilities API (as VFIO does).
>>>>> 
>>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and
>>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency
>>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest
>>>>> driver may care about its offset.
>>>>> 
>>>>> 
>>>>> 
>>>>>>> When vfio, pci_add_capability() is called from the code which knows
>>>>>>> exactly that the capability exists and where it is and it calls
>>>>>>> pci_add_capability() based on this knowledge so doing additional loops
>>>>>>> just for imaginery scalability is a bit weird, no?
>>>>>> 
>>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability.
>>>>>> 
>>>>>> 
>>>>>> Alex
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
> 
> 
> -- 
> Alexey


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-22  5:52                 ` Alexander Graf
@ 2012-05-22  6:11                   ` Alexey Kardashevskiy
  2012-05-22  6:31                     ` Alexander Graf
  2012-05-22  6:38                     ` Alexander Graf
  0 siblings, 2 replies; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-05-22  6:11 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson,
	anthony@codemonkey.ws, David Gibson

On 22/05/12 15:52, Alexander Graf wrote:
> 
> 
> On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 22/05/12 13:21, Alexander Graf wrote:
>>>
>>>
>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>>
>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>>>>> Alexander,
>>>>>
>>>>> Is that any better? :)
>>>>
>>>> Alex (Graf that is), ping ?
>>>>
>>>> The original patch from Alexey was fine btw.
>>>>
>>>> VFIO will always call things with the existing capability offset so
>>>> there's no real risk of doing the wrong thing or break the list or
>>>> anything.
>>>>
>>>> IE. A small simple patch that addresses the problem :-)
>>>>
>>>> The new patch is a bit more "robust" I believe, I don't think we need to
>>>> go too far to fix a problem we don't have. But we need a fix for the
>>>> real issue and the simple patch does it neatly from what I can
>>>> understand.
>>>>
>>>> Cheers,
>>>> Ben.
>>>>
>>>>>
>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>>> * in pci config space */
>>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>                       uint8_t offset, uint8_t size)
>>>>> {
>>>>> -    uint8_t *config;
>>>>> +    uint8_t *config, existing;
>>>
>>> Existing is a pointer to the target dev's config space, right?
>>
>> Yes.
>>
>>>>>    int i, overlapping_cap;
>>>>>
>>>>> +    existing = pci_find_capability(pdev, cap_id);
>>>>> +    if (existing) {
>>>>> +        if (offset && (existing != offset)) {
>>>>> +            return -EEXIST;
>>>>> +        }
>>>>> +        for (i = existing; i < size; ++i) {
>>>
>>> So how does this possibly make sense?
>>
>> Although I do not expect VFIO to add capabilities (that would not make sense), I still want to
>> double-check that nobody else has already tried to use this space.
> 
> i is an int. existing is a uint8_t*.


It was there before me. This function already does a loop, and this is how it was coded in the first place.


>>>>> +            if (pdev->used[i]) {
>>>>> +                return -EFAULT;
>>>>> +            }
>>>>> +        }
>>>>> +        memset(pdev->used + offset, 0xFF, size);
>>> Why?
>>
>> Because I am marking the space this capability takes as used.
> 
> But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no?


No, it does not exist for VFIO: VFIO reads the config space from the host kernel first and then calls msi_init(), msix_init() or whatever_else_init(), depending on what it got from the host kernel. These xxx_init() functions eventually call pci_add_capability().

Sure, we can either implement our own msi_init/msix_init (and maybe others in the future) for VFIO, which would do everything the other QEMU devices do except touch the capabilities, OR hack msi_init/msix_init not to touch capabilities if they already exist.
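
To make the failure concrete, here is a minimal standalone sketch of what happens when the unconditional head insertion runs on a config space that was already populated from the host. It is not QEMU code: the flat 256-byte array and the helper names load_e1000e_caps(), add_at_head() and cap_list_has_loop() are made up for illustration; only the register offsets and the E1000E byte values come from the original report.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CAP_PTR   0x34  /* PCI_CAPABILITY_LIST */
#define CAP_ID    0     /* PCI_CAP_LIST_ID */
#define CAP_NEXT  1     /* PCI_CAP_LIST_NEXT */

/* Fill cfg with the E1000E capability layout quoted in the report. */
static void load_e1000e_caps(uint8_t cfg[256])
{
    memset(cfg, 0, 256);
    cfg[CAP_PTR] = 0xC8;
    cfg[0xC8 + CAP_ID] = 0x01; cfg[0xC8 + CAP_NEXT] = 0xD0;
    cfg[0xD0 + CAP_ID] = 0x05; cfg[0xD0 + CAP_NEXT] = 0xE0;
    cfg[0xE0 + CAP_ID] = 0x10; cfg[0xE0 + CAP_NEXT] = 0x00;
}

/* The unconditional head insertion done by the old code path. */
static void add_at_head(uint8_t cfg[256], uint8_t cap_id, uint8_t offset)
{
    /* Capabilities live past the 64-byte standard header. */
    assert(offset >= 0x40 && offset < 0xFF);
    cfg[offset + CAP_ID] = cap_id;
    cfg[offset + CAP_NEXT] = cfg[CAP_PTR];
    cfg[CAP_PTR] = offset;
}

/* Walk the list; return 1 if it has not terminated after 48 hops. */
static int cap_list_has_loop(const uint8_t cfg[256])
{
    uint8_t pos = cfg[CAP_PTR];
    int hops;

    for (hops = 0; pos != 0 && hops < 48; hops++) {
        pos = cfg[(pos + CAP_NEXT) & 0xFF];
    }
    return pos != 0;
}
```

Calling load_e1000e_caps() and then add_at_head(cfg, 0x05, 0xD0) reproduces the "after" dump from the report: the list head becomes 0xD0, whose next pointer is 0xC8, whose next pointer is 0xD0 again.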



>>>>> +        /* Make capability read-only by default */
>>>>> +        memset(pdev->wmask + offset, 0, size);
>>> Why?
>>
>> Because the pci_add_capability() does it for a new capability by default.
> 
> Hrm. So you're copying code? Can't you merge the overwrite and write cases?

I am trying to keep it a single chunk that is as small as possible.


If it helps, below is the same patch with extended context to see what is going on in that function.






hw/pci.c |   20 +++++++++++++++++++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 63a8219..7008a42 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom)
     ptr = memory_region_get_ram_ptr(&pdev->rom);
     load_image(path, ptr);
     g_free(path);
 
     if (is_default_rom) {
         /* Only the default rom images will be patched (if needed). */
         pci_patch_ids(pdev, ptr, size);
     }
 
     qemu_put_ram_ptr(ptr);
 
     pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
 
     return 0;
 }
 
 static void pci_del_option_rom(PCIDevice *pdev)
 {
     if (!pdev->has_rom)
         return;
 
     vmstate_unregister_ram(&pdev->rom, &pdev->qdev);
     memory_region_destroy(&pdev->rom);
     pdev->has_rom = false;
 }
 
 /*
  * if !offset
  * Reserve space and add capability to the linked list in pci config space
  *
  * if offset = 0,
  * Find and reserve space and add capability to the linked list
  * in pci config space */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
                        uint8_t offset, uint8_t size)
 {
-    uint8_t *config;
+    uint8_t *config, existing;
     int i, overlapping_cap;
 
+    existing = pci_find_capability(pdev, cap_id);
+    if (existing) {
+        if (offset && (existing != offset)) {
+            return -EEXIST;
+        }
+        for (i = existing; i < size; ++i) {
+            if (pdev->used[i]) {
+                return -EFAULT;
+            }
+        }
+        memset(pdev->used + offset, 0xFF, size);
+        /* Make capability read-only by default */
+        memset(pdev->wmask + offset, 0, size);
+        /* Check capability by default */
+        memset(pdev->cmask + offset, 0xFF, size);
+        return existing;
+    }
+
     if (!offset) {
         offset = pci_find_space(pdev, size);
         if (!offset) {
             return -ENOSPC;
         }
     } else {
         /* Verify that capabilities don't overlap.  Note: device assignment
          * depends on this check to verify that the device is not broken.
          * Should never trigger for emulated devices, but it's helpful
          * for debugging these. */
         for (i = offset; i < offset + size; i++) {
             overlapping_cap = pci_find_capability_at_offset(pdev, i);
             if (overlapping_cap) {
                 fprintf(stderr, "ERROR: %04x:%02x:%02x.%x "
                         "Attempt to add PCI capability %x at offset "
                         "%x overlaps existing capability %x at offset %x\n",
                         pci_find_domain(pdev->bus), pci_bus_num(pdev->bus),
                         PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
                         cap_id, offset, overlapping_cap, i);
                 return -EINVAL;
             }
         }
     }
 
     config = pdev->config + offset;
     config[PCI_CAP_LIST_ID] = cap_id;
     config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
     pdev->config[PCI_CAPABILITY_LIST] = offset;
     pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
     memset(pdev->used + offset, 0xFF, size);
     /* Make capability read-only by default */
     memset(pdev->wmask + offset, 0, size);
     /* Check capability by default */
     memset(pdev->cmask + offset, 0xFF, size);
     return offset;
 }
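
A side note on the used[] scan in the hunk above: as posted, the loop starts at existing but stops at size, so for a capability at a typical offset such as 0xD0 the body never runs. The sketch below only counts iterations to show the difference between the posted bound and a bound that covers the capability's own bytes. The helper names and the size value 0x0A are assumptions for illustration, and the sketch says nothing about what the check should do once it does run.

```c
#include <assert.h>
#include <stdint.h>

/* Count how many bytes a scan over [start, end) would visit. */
static int bytes_scanned(int start, int end)
{
    int i, n = 0;

    for (i = start; i < end; ++i) {
        n++;
    }
    return n;
}

/* Bound as written in the patch: i runs from existing up to size. */
static int scan_as_posted(uint8_t existing, uint8_t size)
{
    return bytes_scanned(existing, size);
}

/* Bound that covers the capability itself: [existing, existing + size). */
static int scan_whole_capability(uint8_t existing, uint8_t size)
{
    assert(existing + size <= 256);
    return bytes_scanned(existing, existing + size);
}
```

With existing = 0xD0 and size = 0x0A, scan_as_posted() visits no bytes at all, while scan_whole_capability() visits the ten bytes the capability occupies.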




>>>>> +        /* Check capability by default */
>>>>> +        memset(pdev->cmask + offset, 0xFF, size);
>>>
>>> I don't understand this part either.
>>
>> The pci_add_capability() does it for a new capability by default.
>>
>>
>>
>>>
>>> Alex
>>>
>>>>> +        return existing;
>>>>> +    }
>>>>> +
>>>>>    if (!offset) {
>>>>>        offset = pci_find_space(pdev, size);
>>>>>        if (!offset) {
>>>>>            return -ENOSPC;
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote:
>>>>>> On 12/05/12 00:13, Alexander Graf wrote:
>>>>>>>
>>>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
>>>>>>>
>>>>>>>> 11.05.2012 20:52, Alexander Graf wrote:
>>>>>>>>>
>>>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>>>>>>>>>
>>>>>>>>>> Normally the pci_add_capability is called on devices to add new
>>>>>>>>>> capability. This is ok for emulated devices which capabilities list
>>>>>>>>>> is being built by QEMU.
>>>>>>>>>>
>>>>>>>>>> In the case of VFIO the capability may already exist and adding new
>>>>>>>>>> capability into the beginning of the linked list may create a loop.
>>>>>>>>>>
>>>>>>>>>> For example, the old code destroys the following config
>>>>>>>>>> of PCIe Intel E1000E:
>>>>>>>>>>
>>>>>>>>>> before adding PCI_CAP_ID_MSI (0x05):
>>>>>>>>>> 0x34: 0xC8
>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>> 0xD0: 0x05 0xE0
>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>
>>>>>>>>>> after:
>>>>>>>>>> 0x34: 0xD0
>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>> 0xD0: 0x05 0xC8
>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>
>>>>>>>>>> As result capabilities 0x01 and 0x05 point to each other.
>>>>>>>>>>
>>>>>>>>>> The proposed patch does not change capability pointers when
>>>>>>>>>> the same type capability is about to add.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>> ---
>>>>>>>>>> hw/pci.c |   10 ++++++----
>>>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c
>>>>>>>>>> index aa0c0b8..1f7c924 100644
>>>>>>>>>> --- a/hw/pci.c
>>>>>>>>>> +++ b/hw/pci.c
>>>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>>>>>  }
>>>>>>>>>>
>>>>>>>>>>  config = pdev->config + offset;
>>>>>>>>>> -    config[PCI_CAP_LIST_ID] = cap_id;
>>>>>>>>>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>>>>>>>>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>>>>>>>>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>>>>>>>>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>>>>>>>>>
>>>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either
>>>>>>>>> * replace the existing one or
>>>>>>>>> * drop out and not write the new one in.
>>>>>>>
>>>>>>> * hw_error :)
>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm not sure which way would be more natural.
>>>>>>>>
>>>>>>>> There is a third option - add another function, lets call it
>>>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does
>>>>>>>> but won't touch list pointers.
>>>>>>>
>>>>>>> What good is a function that breaks internal consistency?
>>>>>>
>>>>>>
>>>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through
>>>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a
>>>>>> capability at some fixed offset would have initialized their config space before calling this
>>>>>> capabilities API (as VFIO does).
>>>>>>
>>>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and
>>>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency
>>>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest
>>>>>> driver may care about its offset.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> When vfio, pci_add_capability() is called from the code which knows
>>>>>>>> exactly that the capability exists and where it is and it calls
>>>>>>>> pci_add_capability() based on this knowledge so doing additional loops
>>>>>>>> just for imaginery scalability is a bit weird, no?
>>>>>>>
>>>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability.
>>>>>>>
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>> -- 
>> Alexey


-- 
Alexey


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-22  6:11                   ` Alexey Kardashevskiy
@ 2012-05-22  6:31                     ` Alexander Graf
  2012-05-22  7:01                       ` Alexey Kardashevskiy
  2012-06-08  8:47                       ` Alexey Kardashevskiy
  2012-05-22  6:38                     ` Alexander Graf
  1 sibling, 2 replies; 29+ messages in thread
From: Alexander Graf @ 2012-05-22  6:31 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson,
	anthony@codemonkey.ws, David Gibson



On 22.05.2012, at 08:11, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 22/05/12 15:52, Alexander Graf wrote:
>> 
>> 
>> On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> 
>>> On 22/05/12 13:21, Alexander Graf wrote:
>>>> 
>>>> 
>>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>>> 
>>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>>>>>> Alexander,
>>>>>> 
>>>>>> Is that any better? :)
>>>>> 
>>>>> Alex (Graf that is), ping ?
>>>>> 
>>>>> The original patch from Alexey was fine btw.
>>>>> 
>>>>> VFIO will always call things with the existing capability offset so
>>>>> there's no real risk of doing the wrong thing or break the list or
>>>>> anything.
>>>>> 
>>>>> IE. A small simple patch that addresses the problem :-)
>>>>> 
>>>>> The new patch is a bit more "robust" I believe, I don't think we need to
>>>>> go too far to fix a problem we don't have. But we need a fix for the
>>>>> real issue and the simple patch does it neatly from what I can
>>>>> understand.
>>>>> 
>>>>> Cheers,
>>>>> Ben.
>>>>> 
>>>>>> 
>>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>>>> * in pci config space */
>>>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>                      uint8_t offset, uint8_t size)
>>>>>> {
>>>>>> -    uint8_t *config;
>>>>>> +    uint8_t *config, existing;
>>>> 
>>>> Existing is a pointer to the target dev's config space, right?
>>> 
>>> Yes.
>>> 
>>>>>>   int i, overlapping_cap;
>>>>>> 
>>>>>> +    existing = pci_find_capability(pdev, cap_id);
>>>>>> +    if (existing) {
>>>>>> +        if (offset && (existing != offset)) {
>>>>>> +            return -EEXIST;
>>>>>> +        }
>>>>>> +        for (i = existing; i < size; ++i) {
>>>> 
>>>> So how does this possibly make sense?
>>> 
>>> Although I do not expect VFIO to add capabilities (that would not make sense), I still want to
>>> double-check that nobody else has already tried to use this space.
>> 
>> i is an int. existing is a uint8_t*.
> 
> 
> It was there before me. This function already does a loop, and this is how it was coded in the first place.

Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax...
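
For anyone else tripping over the same declaration: in C the '*' belongs to the declarator, not to the type, so in "uint8_t *config, existing;" only config is a pointer. A tiny standalone sketch follows; the function name offset_via_pointer() is made up for illustration, while the two variable names mirror the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Store an offset in `existing` and read it back through `config`. */
static uint8_t offset_via_pointer(void)
{
    uint8_t *config, existing;       /* same shape as in the patch */

    existing = 0xD0;                 /* a config-space offset, not an address */
    config = &existing;              /* only `config` can hold an address */
    assert(sizeof(existing) == 1);   /* one byte, so not a pointer */
    return *config;
}
```

Writing the two declarations on separate lines avoids the ambiguity for the reader.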

> 
> 
>>>>>> +            if (pdev->used[i]) {
>>>>>> +                return -EFAULT;
>>>>>> +            }
>>>>>> +        }
>>>>>> +        memset(pdev->used + offset, 0xFF, size);
>>>> Why?
>>> 
>>> Because I am marking the space this capability takes as used.
>> 
>> But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no?
> 
> 
> No, it does not exist for VFIO: VFIO reads the config space from the host kernel first and then calls msi_init(), msix_init() or whatever_else_init(), depending on what it got from the host kernel. These xxx_init() functions eventually call pci_add_capability().

So why would the function that populates the config space initially not set the used flag correctly?

> 
> Sure, we can either implement our own msi_init/msix_init (and maybe others in the future) for VFIO, which would do everything the other QEMU devices do except touch the capabilities, OR hack msi_init/msix_init not to touch capabilities if they already exist.

No, calling the internal functions sounds fine to me. It's the step before that irritates me. VFIO shouldn't differ too much from an emulated device wrt its config space population really.

> 
> 
> 
>>>>>> +        /* Make capability read-only by default */
>>>>>> +        memset(pdev->wmask + offset, 0, size);
>>>> Why?
>>> 
>>> Because the pci_add_capability() does it for a new capability by default.
>> 
>> Hrm. So you're copying code? Can't you merge the overwrite and write cases?
> 
> I am trying to make it as a single chunk which is as small as possible.

No, you're needlessly duplicating code which is a bad idea :). Please reuse as much of the existing function as possible, unless it really doesn't make sense.
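
One way the two cases might share code, as a rough sketch only: mock_dev, mock_find_cap() and mock_add_cap() are hypothetical stand-ins for PCIDevice, pci_find_capability() and pci_add_capability(), and the pci_find_space(), overlap-check and PCI_STATUS parts are left out to keep it short.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE  256
#define CAP_PTR   0x34  /* PCI_CAPABILITY_LIST */
#define CAP_ID    0     /* PCI_CAP_LIST_ID */
#define CAP_NEXT  1     /* PCI_CAP_LIST_NEXT */

/* Hypothetical stand-in for the PCIDevice fields used here. */
struct mock_dev {
    uint8_t config[CFG_SIZE];
    uint8_t used[CFG_SIZE];
    uint8_t wmask[CFG_SIZE];
    uint8_t cmask[CFG_SIZE];
};

/* Return the offset of cap_id in the list, or 0 if it is absent. */
uint8_t mock_find_cap(const struct mock_dev *d, uint8_t cap_id)
{
    uint8_t pos = d->config[CAP_PTR] & 0xFC;
    int hops;

    /* At most 64 dword-aligned positions exist, so bound the walk. */
    for (hops = 0; pos && hops < CFG_SIZE / 4; hops++) {
        if (d->config[pos + CAP_ID] == cap_id) {
            return pos;
        }
        pos = d->config[pos + CAP_NEXT] & 0xFC;
    }
    return 0;
}

/* Merged add: only the list linking depends on whether the capability
 * already exists; the used/wmask/cmask setup is shared by both cases. */
int mock_add_cap(struct mock_dev *d, uint8_t cap_id,
                 uint8_t offset, uint8_t size)
{
    uint8_t existing = mock_find_cap(d, cap_id);

    if (existing) {
        if (offset && existing != offset) {
            return -EEXIST;
        }
        offset = existing;      /* reuse the slot the device already has */
    } else if (!offset) {
        return -EINVAL;         /* pci_find_space() path not sketched */
    }
    if (offset + size > CFG_SIZE) {
        return -EINVAL;
    }

    if (!existing) {
        d->config[offset + CAP_ID] = cap_id;
        d->config[offset + CAP_NEXT] = d->config[CAP_PTR];
        d->config[CAP_PTR] = offset;
    }

    memset(d->used + offset, 0xFF, size);
    memset(d->wmask + offset, 0, size);     /* read-only by default */
    memset(d->cmask + offset, 0xFF, size);  /* checked by default */
    return offset;
}
```

With a capability that is already linked, mock_add_cap() leaves the list pointers alone and only fills in the masks; with a new one it links at the head as before.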

> 
> 
> If it helps, below is the same patch with extended context to see what is going on in that function.
> 
> 
> 
> 
> 
> 
> hw/pci.c |   20 +++++++++++++++++++-
> 1 files changed, 19 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 63a8219..7008a42 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom)
>     ptr = memory_region_get_ram_ptr(&pdev->rom);
>     load_image(path, ptr);
>     g_free(path);
> 
>     if (is_default_rom) {
>         /* Only the default rom images will be patched (if needed). */
>         pci_patch_ids(pdev, ptr, size);
>     }
> 
>     qemu_put_ram_ptr(ptr);
> 
>     pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
> 
>     return 0;
> }
> 
> static void pci_del_option_rom(PCIDevice *pdev)
> {
>     if (!pdev->has_rom)
>         return;
> 
>     vmstate_unregister_ram(&pdev->rom, &pdev->qdev);
>     memory_region_destroy(&pdev->rom);
>     pdev->has_rom = false;
> }
> 
> /*
>  * if !offset
>  * Reserve space and add capability to the linked list in pci config space
>  *
>  * if offset = 0,
>  * Find and reserve space and add capability to the linked list
>  * in pci config space */
> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>                        uint8_t offset, uint8_t size)
> {
> -    uint8_t *config;
> +    uint8_t *config, existing;
>     int i, overlapping_cap;
> 
> +    existing = pci_find_capability(pdev, cap_id);
> +    if (existing) {
> +        if (offset && (existing != offset)) {
> +            return -EEXIST;
> +        }
> +        for (i = existing; i < size; ++i) {
> +            if (pdev->used[i]) {
> +                return -EFAULT;
> +            }
> +        }

}

> +        memset(pdev->used + offset, 0xFF, size);
> +        /* Make capability read-only by default */
> +        memset(pdev->wmask + offset, 0, size);
> +        /* Check capability by default */
> +        memset(pdev->cmask + offset, 0xFF, size);
> +        return existing;
> +    }
> +
>     if (!offset) {

&& !existing maybe?

>         offset = pci_find_space(pdev, size);
>         if (!offset) {
>             return -ENOSPC;
>         }
>     } else {
>         /* Verify that capabilities don't overlap.  Note: device assignment
>          * depends on this check to verify that the device is not broken.
>          * Should never trigger for emulated devices, but it's helpful
>          * for debugging these. */
>         for (i = offset; i < offset + size; i++) {
>             overlapping_cap = pci_find_capability_at_offset(pdev, i);
>             if (overlapping_cap) {
>                 fprintf(stderr, "ERROR: %04x:%02x:%02x.%x "
>                         "Attempt to add PCI capability %x at offset "
>                         "%x overlaps existing capability %x at offset %x\n",
>                         pci_find_domain(pdev->bus), pci_bus_num(pdev->bus),
>                         PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
>                         cap_id, offset, overlapping_cap, i);
>                 return -EINVAL;
>             }
>         }
>     }
> 

if (!existing) {

>     config = pdev->config + offset;
>     config[PCI_CAP_LIST_ID] = cap_id;
>     config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>     pdev->config[PCI_CAPABILITY_LIST] = offset;
>     pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;

}

which poses the question why the above wouldn't apply for the existing case. It would work just as well to leave that in, no?
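
A quick way to check is to replay the list linking on the E1000E layout from the original posting. The sketch below assumes that layout and uses hypothetical helper names (e1000e_caps, link_at_head, cap_list_loops); it is not the QEMU code.

```c
#include <stdint.h>
#include <string.h>

#define CAP_PTR   0x34  /* PCI_CAPABILITY_LIST */
#define CAP_ID    0     /* PCI_CAP_LIST_ID */
#define CAP_NEXT  1     /* PCI_CAP_LIST_NEXT */

/* Fill cfg with the E1000E capability chain from the patch description:
 * 0x34 -> 0xC8 (PM) -> 0xD0 (MSI) -> 0xE0 (PCIe) -> end. */
void e1000e_caps(uint8_t cfg[256])
{
    memset(cfg, 0, 256);
    cfg[CAP_PTR] = 0xC8;
    cfg[0xC8 + CAP_ID] = 0x01; cfg[0xC8 + CAP_NEXT] = 0xD0;
    cfg[0xD0 + CAP_ID] = 0x05; cfg[0xD0 + CAP_NEXT] = 0xE0;
    cfg[0xE0 + CAP_ID] = 0x10; cfg[0xE0 + CAP_NEXT] = 0x00;
}

/* The unconditional head insertion from pci_add_capability(). */
void link_at_head(uint8_t cfg[256], uint8_t cap_id, uint8_t offset)
{
    cfg[offset + CAP_ID] = cap_id;
    cfg[offset + CAP_NEXT] = cfg[CAP_PTR];
    cfg[CAP_PTR] = offset;
}

/* Walk the chain; return 1 if it does not terminate within 256 hops. */
int cap_list_loops(const uint8_t cfg[256])
{
    uint8_t pos = cfg[CAP_PTR] & 0xFC;
    int hops;

    for (hops = 0; pos != 0 && hops < 256; hops++) {
        pos = cfg[pos + CAP_NEXT] & 0xFC;
    }
    return pos != 0;
}
```

Calling link_at_head(cfg, 0x05, 0xD0) on that layout leaves 0xD0 pointing at 0xC8 while 0xC8 still points at 0xD0, so cap_list_loops() reports a loop. That suggests the linking step does need to be skipped when the capability is already in the list.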

Alex

>     memset(pdev->used + offset, 0xFF, size);
>     /* Make capability read-only by default */
>     memset(pdev->wmask + offset, 0, size);
>     /* Check capability by default */
>     memset(pdev->cmask + offset, 0xFF, size);
>     return offset;
> }
> 
> 
> 
> 
>>>>>> +        /* Check capability by default */
>>>>>> +        memset(pdev->cmask + offset, 0xFF, size);
>>>> 
>>>> I don't understand this part either.
>>> 
>>> The pci_add_capability() does it for a new capability by default.
>>> 
>>> 
>>> 
>>>> 
>>>> Alex
>>>> 
>>>>>> +        return existing;
>>>>>> +    }
>>>>>> +
>>>>>>   if (!offset) {
>>>>>>       offset = pci_find_space(pdev, size);
>>>>>>       if (!offset) {
>>>>>>           return -ENOSPC;
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote:
>>>>>>> On 12/05/12 00:13, Alexander Graf wrote:
>>>>>>>> 
>>>>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
>>>>>>>> 
>>>>>>>>> 11.05.2012 20:52, Alexander Graf wrote:
>>>>>>>>>> 
>>>>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>>>>>>>>>> 
>>>>>>>>>>> Normally the pci_add_capability is called on devices to add new
>>>>>>>>>>> capability. This is ok for emulated devices which capabilities list
>>>>>>>>>>> is being built by QEMU.
>>>>>>>>>>> 
>>>>>>>>>>> In the case of VFIO the capability may already exist and adding new
>>>>>>>>>>> capability into the beginning of the linked list may create a loop.
>>>>>>>>>>> 
>>>>>>>>>>> For example, the old code destroys the following config
>>>>>>>>>>> of PCIe Intel E1000E:
>>>>>>>>>>> 
>>>>>>>>>>> before adding PCI_CAP_ID_MSI (0x05):
>>>>>>>>>>> 0x34: 0xC8
>>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>>> 0xD0: 0x05 0xE0
>>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>> 
>>>>>>>>>>> after:
>>>>>>>>>>> 0x34: 0xD0
>>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>>> 0xD0: 0x05 0xC8
>>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>> 
>>>>>>>>>>> As result capabilities 0x01 and 0x05 point to each other.
>>>>>>>>>>> 
>>>>>>>>>>> The proposed patch does not change capability pointers when
>>>>>>>>>>> the same type capability is about to add.
>>>>>>>>>>> 
>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>>> ---
>>>>>>>>>>> hw/pci.c |   10 ++++++----
>>>>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>>>>>> 
>>>>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c
>>>>>>>>>>> index aa0c0b8..1f7c924 100644
>>>>>>>>>>> --- a/hw/pci.c
>>>>>>>>>>> +++ b/hw/pci.c
>>>>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> config = pdev->config + offset;
>>>>>>>>>>> -    config[PCI_CAP_LIST_ID] = cap_id;
>>>>>>>>>>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>>>>>>>>>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>>>>>>>>>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>>>>>>>>>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>>>>>>>>>> 
>>>>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either
>>>>>>>>>> * replace the existing one or
>>>>>>>>>> * drop out and not write the new one in.
>>>>>>>> 
>>>>>>>> * hw_error :)
>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I'm not sure which way would be more natural.
>>>>>>>>> 
>>>>>>>>> There is a third option - add another function, lets call it
>>>>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does
>>>>>>>>> but won't touch list pointers.
>>>>>>>> 
>>>>>>>> What good is a function that breaks internal consistency?
>>>>>>> 
>>>>>>> 
>>>>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through
>>>>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a
>>>>>>> capability at some fixed offset would have initialized their config space before calling this
>>>>>>> capabilities API (as VFIO does).
>>>>>>> 
>>>>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and
>>>>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency
>>>>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest
>>>>>>> driver may care about its offset.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>>> When vfio, pci_add_capability() is called from the code which knows
>>>>>>>>> exactly that the capability exists and where it is and it calls
>>>>>>>>> pci_add_capability() based on this knowledge so doing additional loops
>>>>>>>>> just for imaginery scalability is a bit weird, no?
>>>>>>>> 
>>>>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Alex
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>>> -- 
>>> Alexey
> 
> 
> -- 
> Alexey


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-22  6:31                     ` Alexander Graf
@ 2012-05-22  7:01                       ` Alexey Kardashevskiy
  2012-05-22  7:13                         ` Alexander Graf
  2012-06-08  8:47                       ` Alexey Kardashevskiy
  1 sibling, 1 reply; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-05-22  7:01 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson,
	anthony@codemonkey.ws, David Gibson

On 22/05/12 16:31, Alexander Graf wrote:
> 
> 
> On 22.05.2012, at 08:11, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 22/05/12 15:52, Alexander Graf wrote:
>>>
>>>
>>> On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>
>>>> On 22/05/12 13:21, Alexander Graf wrote:
>>>>>
>>>>>
>>>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>>>>
>>>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>>>>>>> Alexander,
>>>>>>>
>>>>>>> Is that any better? :)
>>>>>>
>>>>>> Alex (Graf that is), ping ?
>>>>>>
>>>>>> The original patch from Alexey was fine btw.
>>>>>>
>>>>>> VFIO will always call things with the existing capability offset so
>>>>>> there's no real risk of doing the wrong thing or break the list or
>>>>>> anything.
>>>>>>
>>>>>> IE. A small simple patch that addresses the problem :-)
>>>>>>
>>>>>> The new patch is a bit more "robust" I believe, I don't think we need to
>>>>>> go too far to fix a problem we don't have. But we need a fix for the
>>>>>> real issue and the simple patch does it neatly from what I can
>>>>>> understand.
>>>>>>
>>>>>> Cheers,
>>>>>> Ben.
>>>>>>
>>>>>>>
>>>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>>>>> * in pci config space */
>>>>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>>                      uint8_t offset, uint8_t size)
>>>>>>> {
>>>>>>> -    uint8_t *config;
>>>>>>> +    uint8_t *config, existing;
>>>>>
>>>>> Existing is a pointer to the target dev's config space, right?
>>>>
>>>> Yes.
>>>>
>>>>>>>   int i, overlapping_cap;
>>>>>>>
>>>>>>> +    existing = pci_find_capability(pdev, cap_id);
>>>>>>> +    if (existing) {
>>>>>>> +        if (offset && (existing != offset)) {
>>>>>>> +            return -EEXIST;
>>>>>>> +        }
>>>>>>> +        for (i = existing; i < size; ++i) {
>>>>>
>>>>> So how does this possibly make sense?
>>>>
>>>> Although I do not expect VFIO to add capabilities (does not make sense), I still want to double
>>>> check that this space has not been tried to use by someone else.
>>>
>>> i is an int. existing is a uint8_t*.
>>
>>
>> It was there before me. This function already does a loop and this is how it was coded at the first place.
> 
> Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax...


Well, it still does not make much sense to have "int i" rather than "uint8_t i" :)
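
One thing that may explain the int: in "i < offset + size" the right-hand side is promoted to int and can reach 256 for a capability that ends at the very end of config space, and a uint8_t index wraps to 0 before the comparison ever fails. A minimal sketch with hypothetical helper names; the cap argument exists only so the demonstration terminates:

```c
#include <stdint.h>

/* Count iterations of "for (i = offset; i < offset + size; i++)" with a
 * uint8_t index. "offset + size" is evaluated as int, so it can be 256,
 * a value a uint8_t i never reaches. The cap stops the runaway case. */
int iterations_with_u8_index(uint8_t offset, uint8_t size, int cap)
{
    uint8_t i;
    int n = 0;

    for (i = offset; i < offset + size && n < cap; i++) {
        n++;
    }
    return n;
}

/* The same loop with an int index, as in the existing function. */
int iterations_with_int_index(uint8_t offset, uint8_t size)
{
    int i, n = 0;

    for (i = offset; i < offset + size; i++) {
        n++;
    }
    return n;
}
```

For offset 0xF0 and size 0x10 the int version runs 16 times, while the uint8_t version only stops when it hits the cap.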


>>>>>>> +            if (pdev->used[i]) {
>>>>>>> +                return -EFAULT;
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +        memset(pdev->used + offset, 0xFF, size);
>>>>> Why?
>>>>
>>>> Because I am marking the space this capability takes as used.
>>>
>>> But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no?
>>
>>
>> No, it does not exist for VFIO - VFIO read the config space from the host kernel first and then calls msi_init or msix_init or whatever_else_init depending on what it got from the host kernel. And these xxx_init() functions eventually call pci_add_capability().
> So why would the function that populates the config space initially not set the used flag correctly?


These are PCIDevice internals, which I do not want to touch from anywhere but pci.c. And
there is no "fixup_capability" or anything similar.


>> Sure we can either implement own msi_init/msix_init (and may be others in the future) for VFIO (which would do all the same as other QEMU devices except touching the capabilities)  OR  hack msi_init/msix_init not to touch capabilities if they exist.
> No, calling the internal functions sounds fine to me. It's the step before that irritates me. VFIO shouldn't differ too much from an emulated device wrt its config space population really.


The last thing we want for a VFIO device is to change its capabilities list.


>>>>>>> +        /* Make capability read-only by default */
>>>>>>> +        memset(pdev->wmask + offset, 0, size);
>>>>> Why?
>>>>
>>>> Because the pci_add_capability() does it for a new capability by default.
>>>
>>> Hrm. So you're copying code? Can't you merge the overwrite and write cases?
>>
>> I am trying to make it as a single chunk which is as small as possible.
> 
> No, you're needlessly duplicating code which is a bad idea :). Please reuse as much of the existing function as possible, unless it really doesn't make sense.

I actually duplicated 4 (four) lines and did it just once. This is too little to be called
"duplicating" :) And I get a very special case that is visually separated and easy to remove
if we find a better solution later.

But - no problemo, I'll rework it.


[no further comments]



>> If it helps, below is the same patch with extended context to see what is going on in that function.
>>
>>
>>
>>
>>
>>
>> hw/pci.c |   20 +++++++++++++++++++-
>> 1 files changed, 19 insertions(+), 1 deletions(-)
>>
>> diff --git a/hw/pci.c b/hw/pci.c
>> index 63a8219..7008a42 100644
>> --- a/hw/pci.c
>> +++ b/hw/pci.c
>> @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom)
>>     ptr = memory_region_get_ram_ptr(&pdev->rom);
>>     load_image(path, ptr);
>>     g_free(path);
>>
>>     if (is_default_rom) {
>>         /* Only the default rom images will be patched (if needed). */
>>         pci_patch_ids(pdev, ptr, size);
>>     }
>>
>>     qemu_put_ram_ptr(ptr);
>>
>>     pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
>>
>>     return 0;
>> }
>>
>> static void pci_del_option_rom(PCIDevice *pdev)
>> {
>>     if (!pdev->has_rom)
>>         return;
>>
>>     vmstate_unregister_ram(&pdev->rom, &pdev->qdev);
>>     memory_region_destroy(&pdev->rom);
>>     pdev->has_rom = false;
>> }
>>
>> /*
>>  * if !offset
>>  * Reserve space and add capability to the linked list in pci config space
>>  *
>>  * if offset = 0,
>>  * Find and reserve space and add capability to the linked list
>>  * in pci config space */
>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>                        uint8_t offset, uint8_t size)
>> {
>> -    uint8_t *config;
>> +    uint8_t *config, existing;
>>     int i, overlapping_cap;
>>
>> +    existing = pci_find_capability(pdev, cap_id);
>> +    if (existing) {
>> +        if (offset && (existing != offset)) {
>> +            return -EEXIST;
>> +        }
>> +        for (i = existing; i < size; ++i) {
>> +            if (pdev->used[i]) {
>> +                return -EFAULT;
>> +            }
>> +        }
> 
> }
> 
>> +        memset(pdev->used + offset, 0xFF, size);
>> +        /* Make capability read-only by default */
>> +        memset(pdev->wmask + offset, 0, size);
>> +        /* Check capability by default */
>> +        memset(pdev->cmask + offset, 0xFF, size);
>> +        return existing;
>> +    }
>> +
>>     if (!offset) {
> 
> && !existing maybe?
> 
>>         offset = pci_find_space(pdev, size);
>>         if (!offset) {
>>             return -ENOSPC;
>>         }
>>     } else {
>>         /* Verify that capabilities don't overlap.  Note: device assignment
>>          * depends on this check to verify that the device is not broken.
>>          * Should never trigger for emulated devices, but it's helpful
>>          * for debugging these. */
>>         for (i = offset; i < offset + size; i++) {
>>             overlapping_cap = pci_find_capability_at_offset(pdev, i);
>>             if (overlapping_cap) {
>>                 fprintf(stderr, "ERROR: %04x:%02x:%02x.%x "
>>                         "Attempt to add PCI capability %x at offset "
>>                         "%x overlaps existing capability %x at offset %x\n",
>>                         pci_find_domain(pdev->bus), pci_bus_num(pdev->bus),
>>                         PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
>>                         cap_id, offset, overlapping_cap, i);
>>                 return -EINVAL;
>>             }
>>         }
>>     }
>>
> 
> if (!existing) {
> 
>>     config = pdev->config + offset;
>>     config[PCI_CAP_LIST_ID] = cap_id;
>>     config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>     pdev->config[PCI_CAPABILITY_LIST] = offset;
>>     pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> 
> }
> 
> which poses the question why the above wouldn't apply for the existing case. It would work just as well to leave that in, no?
> 
> Alex
> 
>>     memset(pdev->used + offset, 0xFF, size);
>>     /* Make capability read-only by default */
>>     memset(pdev->wmask + offset, 0, size);
>>     /* Check capability by default */
>>     memset(pdev->cmask + offset, 0xFF, size);
>>     return offset;
>> }
>>
>>
>>
>>
>>>>>>> +        /* Check capability by default */
>>>>>>> +        memset(pdev->cmask + offset, 0xFF, size);
>>>>>
>>>>> I don't understand this part either.
>>>>
>>>> The pci_add_capability() does it for a new capability by default.
>>>>
>>>>
>>>>
>>>>>
>>>>> Alex
>>>>>
>>>>>>> +        return existing;
>>>>>>> +    }
>>>>>>> +
>>>>>>>   if (!offset) {
>>>>>>>       offset = pci_find_space(pdev, size);
>>>>>>>       if (!offset) {
>>>>>>>           return -ENOSPC;
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote:
>>>>>>>> On 12/05/12 00:13, Alexander Graf wrote:
>>>>>>>>>
>>>>>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
>>>>>>>>>
>>>>>>>>>> 11.05.2012 20:52, Alexander Graf wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Normally the pci_add_capability is called on devices to add new
>>>>>>>>>>>> capability. This is ok for emulated devices which capabilities list
>>>>>>>>>>>> is being built by QEMU.
>>>>>>>>>>>>
>>>>>>>>>>>> In the case of VFIO the capability may already exist and adding new
>>>>>>>>>>>> capability into the beginning of the linked list may create a loop.
>>>>>>>>>>>>
>>>>>>>>>>>> For example, the old code destroys the following config
>>>>>>>>>>>> of PCIe Intel E1000E:
>>>>>>>>>>>>
>>>>>>>>>>>> before adding PCI_CAP_ID_MSI (0x05):
>>>>>>>>>>>> 0x34: 0xC8
>>>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>>>> 0xD0: 0x05 0xE0
>>>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>>>
>>>>>>>>>>>> after:
>>>>>>>>>>>> 0x34: 0xD0
>>>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>>>> 0xD0: 0x05 0xC8
>>>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>>>
>>>>>>>>>>>> As result capabilities 0x01 and 0x05 point to each other.
>>>>>>>>>>>>
>>>>>>>>>>>> The proposed patch does not change capability pointers when
>>>>>>>>>>>> the same type capability is about to add.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>>>> ---
>>>>>>>>>>>> hw/pci.c |   10 ++++++----
>>>>>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c
>>>>>>>>>>>> index aa0c0b8..1f7c924 100644
>>>>>>>>>>>> --- a/hw/pci.c
>>>>>>>>>>>> +++ b/hw/pci.c
>>>>>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> config = pdev->config + offset;
>>>>>>>>>>>> -    config[PCI_CAP_LIST_ID] = cap_id;
>>>>>>>>>>>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>>>>>>>>>>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>>>>>>>>>>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>>>>>>>>>>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>>>>>>>>>>>
>>>>>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either
>>>>>>>>>>> * replace the existing one or
>>>>>>>>>>> * drop out and not write the new one in.
>>>>>>>>>
>>>>>>>>> * hw_error :)
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'm not sure which way would be more natural.
>>>>>>>>>>
>>>>>>>>>> There is a third option - add another function, lets call it
>>>>>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does
>>>>>>>>>> but won't touch list pointers.
>>>>>>>>>
>>>>>>>>> What good is a function that breaks internal consistency?
>>>>>>>>
>>>>>>>>
>>>>>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through
>>>>>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a
>>>>>>>> capability at some fixed offset would have initialized their config space before calling this
>>>>>>>> capabilities API (as VFIO does).
>>>>>>>>
>>>>>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and
>>>>>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency
>>>>>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest
>>>>>>>> driver may care about its offset.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> When vfio, pci_add_capability() is called from the code which knows
>>>>>>>>>> exactly that the capability exists and where it is and it calls
>>>>>>>>>> pci_add_capability() based on this knowledge so doing additional loops
>>>>>>>>>> just for imaginery scalability is a bit weird, no?
>>>>>>>>>
>>>>>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability.




-- 
Alexey


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-22  7:01                       ` Alexey Kardashevskiy
@ 2012-05-22  7:13                         ` Alexander Graf
  2012-05-22  7:37                           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Alexander Graf @ 2012-05-22  7:13 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson,
	anthony@codemonkey.ws, David Gibson


On 22.05.2012, at 09:01, Alexey Kardashevskiy wrote:

> On 22/05/12 16:31, Alexander Graf wrote:
>> 
>> 
>> On 22.05.2012, at 08:11, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> 
>>> On 22/05/12 15:52, Alexander Graf wrote:
>>>> 
>>>> 
>>>> On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>> 
>>>>> On 22/05/12 13:21, Alexander Graf wrote:
>>>>>> 
>>>>>> 
>>>>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>>>>> 
>>>>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>>>>>>>> Alexander,
>>>>>>>> 
>>>>>>>> Is that any better? :)
>>>>>>> 
>>>>>>> Alex (Graf that is), ping ?
>>>>>>> 
>>>>>>> The original patch from Alexey was fine btw.
>>>>>>> 
>>>>>>> VFIO will always call things with the existing capability offset so
>>>>>>> there's no real risk of doing the wrong thing or break the list or
>>>>>>> anything.
>>>>>>> 
>>>>>>> IE. A small simple patch that addresses the problem :-)
>>>>>>> 
>>>>>>> The new patch is a bit more "robust" I believe, I don't think we need to
>>>>>>> go too far to fix a problem we don't have. But we need a fix for the
>>>>>>> real issue and the simple patch does it neatly from what I can
>>>>>>> understand.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Ben.
>>>>>>> 
>>>>>>>> 
>>>>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>>>>>> * in pci config space */
>>>>>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>>>                     uint8_t offset, uint8_t size)
>>>>>>>> {
>>>>>>>> -    uint8_t *config;
>>>>>>>> +    uint8_t *config, existing;
>>>>>> 
>>>>>> Existing is a pointer to the target dev's config space, right?
>>>>> 
>>>>> Yes.
>>>>> 
>>>>>>>>  int i, overlapping_cap;
>>>>>>>> 
>>>>>>>> +    existing = pci_find_capability(pdev, cap_id);
>>>>>>>> +    if (existing) {
>>>>>>>> +        if (offset && (existing != offset)) {
>>>>>>>> +            return -EEXIST;
>>>>>>>> +        }
>>>>>>>> +        for (i = existing; i < size; ++i) {
>>>>>> 
>>>>>> So how does this possibly make sense?
>>>>> 
>>>>> Although I do not expect VFIO to add capabilities (does not make sense), I still want to double
>>>>> check that this space has not been tried to use by someone else.
>>>> 
>>>> i is an int. existing is a uint8_t*.
>>> 
>>> 
>>> It was there before me. This function already does a loop and this is how it was coded at the first place.
>> 
>> Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax...
> 
> 
> Well, it still does not make much sense to have "int i" rather than "uint8_t i" :)
> 
> 
>>>>>>>> +            if (pdev->used[i]) {
>>>>>>>> +                return -EFAULT;
>>>>>>>> +            }
>>>>>>>> +        }
>>>>>>>> +        memset(pdev->used + offset, 0xFF, size);
>>>>>> Why?
>>>>> 
>>>>> Because I am marking the space this capability takes as used.
>>>> 
>>>> But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no?
>>> 
>>> 
>>> No, it does not exist for VFIO - VFIO read the config space from the host kernel first and then calls msi_init or msix_init or whatever_else_init depending on what it got from the host kernel. And these xxx_init() functions eventually call pci_add_capability().
>> So why would the function that populates the config space initially not set the used flag correctly?
> 
> 
> These are PCIDevice internals, which I do not want to touch from anywhere but pci.c. And
> there is no "fixup_capability" or anything similar.

Hrm. Maybe we should have one? :) Or instead of populating the config space with the exact data from the host device, loop through the host device capabilities and populate them using this function as we go. That should maintain the offsets, but ensure that all internal flags are set, no?
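
Roughly what I have in mind for the walking part, as a sketch under assumptions: cap_entry and collect_caps() are hypothetical names, the input is a raw 256-byte copy of the host device's config space, and the per-capability size needed by pci_add_capability() is not derived here.

```c
#include <stdint.h>

#define CAP_PTR   0x34  /* PCI_CAPABILITY_LIST */
#define CAP_ID    0     /* PCI_CAP_LIST_ID */
#define CAP_NEXT  1     /* PCI_CAP_LIST_NEXT */

/* One capability found in the host config space (hypothetical type). */
struct cap_entry {
    uint8_t id;
    uint8_t offset;
};

/* Collect (id, offset) pairs from a raw config image, in list order.
 * Returns the number of entries, or -1 if more than max entries are
 * found, which also covers a chain that loops. */
int collect_caps(const uint8_t cfg[256], struct cap_entry *out, int max)
{
    uint8_t pos = cfg[CAP_PTR] & 0xFC;
    int n = 0;

    while (pos) {
        if (n == max) {
            return -1;
        }
        out[n].id = cfg[pos + CAP_ID];
        out[n].offset = pos;
        n++;
        pos = cfg[pos + CAP_NEXT] & 0xFC;
    }
    return n;
}
```

Each entry could then be handed to pci_add_capability() with its host offset. Since that function links at the head of the list, the entries would probably have to be added last to first to keep the host's list order.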

> 
> 
>>> Sure we can either implement own msi_init/msix_init (and may be others in the future) for VFIO (which would do all the same as other QEMU devices except touching the capabilities)  OR  hack msi_init/msix_init not to touch capabilities if they exist.
>> No, calling the internal functions sounds fine to me. It's the step before that irritates me. VFIO shouldn't differ too much from an emulated device wrt its config space population really.
> 
> 
> The last thing we want for a VFIO device is to change its capabilities list.

Well - we want it to look the same. The population should go through the same methods as emulated devices have to go through, no? :)

> 
> 
>>>>>>>> +        /* Make capability read-only by default */
>>>>>>>> +        memset(pdev->wmask + offset, 0, size);
>>>>>> Why?
>>>>> 
>>>>> Because the pci_add_capability() does it for a new capability by default.
>>>> 
>>>> Hrm. So you're copying code? Can't you merge the overwrite and write cases?
>>> 
>>> I am trying to make it as a single chunk which is as small as possible.
>> 
>> No, you're needlessly duplicating code which is a bad idea :). Please reuse as much of the existing function as possible, unless it really doesn't make sense.
> 
> I actually duplicated 4 (four) lines and did it just once. This is too little to be called
> "duplicating" :) And I get a very special case that is visually separated and easy to remove
> if we find a better solution later.

Yeah, but the special case really shouldn't be all that special - that's why I was irritated.

> But - no problemo, I'll rework it.

Thanks!

> [no further comments]

What about mine below? ;)


Alex

> 
> 
> 
>>> If it helps, below is the same patch with extended context to see what is going on in that function.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> hw/pci.c |   20 +++++++++++++++++++-
>>> 1 files changed, 19 insertions(+), 1 deletions(-)
>>> 
>>> diff --git a/hw/pci.c b/hw/pci.c
>>> index 63a8219..7008a42 100644
>>> --- a/hw/pci.c
>>> +++ b/hw/pci.c
>>> @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom)
>>>    ptr = memory_region_get_ram_ptr(&pdev->rom);
>>>    load_image(path, ptr);
>>>    g_free(path);
>>> 
>>>    if (is_default_rom) {
>>>        /* Only the default rom images will be patched (if needed). */
>>>        pci_patch_ids(pdev, ptr, size);
>>>    }
>>> 
>>>    qemu_put_ram_ptr(ptr);
>>> 
>>>    pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
>>> 
>>>    return 0;
>>> }
>>> 
>>> static void pci_del_option_rom(PCIDevice *pdev)
>>> {
>>>    if (!pdev->has_rom)
>>>        return;
>>> 
>>>    vmstate_unregister_ram(&pdev->rom, &pdev->qdev);
>>>    memory_region_destroy(&pdev->rom);
>>>    pdev->has_rom = false;
>>> }
>>> 
>>> /*
>>> * if !offset
>>> * Reserve space and add capability to the linked list in pci config space
>>> *
>>> * if offset = 0,
>>> * Find and reserve space and add capability to the linked list
>>> * in pci config space */
>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>                       uint8_t offset, uint8_t size)
>>> {
>>> -    uint8_t *config;
>>> +    uint8_t *config, existing;
>>>    int i, overlapping_cap;
>>> 
>>> +    existing = pci_find_capability(pdev, cap_id);
>>> +    if (existing) {
>>> +        if (offset && (existing != offset)) {
>>> +            return -EEXIST;
>>> +        }
>>> +        for (i = existing; i < size; ++i) {
>>> +            if (pdev->used[i]) {
>>> +                return -EFAULT;
>>> +            }
>>> +        }
>> 
>> }
>> 
>>> +        memset(pdev->used + offset, 0xFF, size);
>>> +        /* Make capability read-only by default */
>>> +        memset(pdev->wmask + offset, 0, size);
>>> +        /* Check capability by default */
>>> +        memset(pdev->cmask + offset, 0xFF, size);
>>> +        return existing;
>>> +    }
>>> +
>>>    if (!offset) {
>> 
>> && !existing maybe?
>> 
>>>        offset = pci_find_space(pdev, size);
>>>        if (!offset) {
>>>            return -ENOSPC;
>>>        }
>>>    } else {
>>>        /* Verify that capabilities don't overlap.  Note: device assignment
>>>         * depends on this check to verify that the device is not broken.
>>>         * Should never trigger for emulated devices, but it's helpful
>>>         * for debugging these. */
>>>        for (i = offset; i < offset + size; i++) {
>>>            overlapping_cap = pci_find_capability_at_offset(pdev, i);
>>>            if (overlapping_cap) {
>>>                fprintf(stderr, "ERROR: %04x:%02x:%02x.%x "
>>>                        "Attempt to add PCI capability %x at offset "
>>>                        "%x overlaps existing capability %x at offset %x\n",
>>>                        pci_find_domain(pdev->bus), pci_bus_num(pdev->bus),
>>>                        PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
>>>                        cap_id, offset, overlapping_cap, i);
>>>                return -EINVAL;
>>>            }
>>>        }
>>>    }
>>> 
>> 
>> If (!existing) {
>> 
>>>    config = pdev->config + offset;
>>>    config[PCI_CAP_LIST_ID] = cap_id;
>>>    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>>    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>>    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>> 
>> }
>> 
>> which poses the question why the above wouldn't apply for the existing case. It would work just as well to leave that in, no?
>> 
>> Alex
>> 
>>>    memset(pdev->used + offset, 0xFF, size);
>>>    /* Make capability read-only by default */
>>>    memset(pdev->wmask + offset, 0, size);
>>>    /* Check capability by default */
>>>    memset(pdev->cmask + offset, 0xFF, size);
>>>    return offset;
>>> }
>>> 
>>> 
>>> 
>>> 
>>>>>>>> +        /* Check capability by default */
>>>>>>>> +        memset(pdev->cmask + offset, 0xFF, size);
>>>>>> 
>>>>>> I don't understand this part either.
>>>>> 
>>>>> The pci_add_capability() does it for a new capability by default.
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Alex
>>>>>> 
>>>>>>>> +        return existing;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>  if (!offset) {
>>>>>>>>      offset = pci_find_space(pdev, size);
>>>>>>>>      if (!offset) {
>>>>>>>>          return -ENOSPC;
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote:
>>>>>>>>> On 12/05/12 00:13, Alexander Graf wrote:
>>>>>>>>>> 
>>>>>>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
>>>>>>>>>> 
>>>>>>>>>>> 11.05.2012 20:52, Alexander Graf wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Normally the pci_add_capability is called on devices to add new
>>>>>>>>>>>>> capability. This is ok for emulated devices which capabilities list
>>>>>>>>>>>>> is being built by QEMU.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In the case of VFIO the capability may already exist and adding new
>>>>>>>>>>>>> capability into the beginning of the linked list may create a loop.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For example, the old code destroys the following config
>>>>>>>>>>>>> of PCIe Intel E1000E:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> before adding PCI_CAP_ID_MSI (0x05):
>>>>>>>>>>>>> 0x34: 0xC8
>>>>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>>>>> 0xD0: 0x05 0xE0
>>>>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>>>> 
>>>>>>>>>>>>> after:
>>>>>>>>>>>>> 0x34: 0xD0
>>>>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>>>>> 0xD0: 0x05 0xC8
>>>>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>>>> 
>>>>>>>>>>>>> As result capabilities 0x01 and 0x05 point to each other.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The proposed patch does not change capability pointers when
>>>>>>>>>>>>> the same type capability is about to add.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> hw/pci.c |   10 ++++++----
>>>>>>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c
>>>>>>>>>>>>> index aa0c0b8..1f7c924 100644
>>>>>>>>>>>>> --- a/hw/pci.c
>>>>>>>>>>>>> +++ b/hw/pci.c
>>>>>>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> config = pdev->config + offset;
>>>>>>>>>>>>> -    config[PCI_CAP_LIST_ID] = cap_id;
>>>>>>>>>>>>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>>>>>>>>>>>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>>>>>>>>>>>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>>>>>>>>>>>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>>>>>>>>>>>> 
>>>>>>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either
>>>>>>>>>>>> * replace the existing one or
>>>>>>>>>>>> * drop out and not write the new one in.
>>>>>>>>>> 
>>>>>>>>>> * hw_error :)
>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm not sure which way would be more natural.
>>>>>>>>>>> 
>>>>>>>>>>> There is a third option - add another function, let's call it
>>>>>>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does
>>>>>>>>>>> but won't touch list pointers.
>>>>>>>>>> 
>>>>>>>>>> What good is a function that breaks internal consistency?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through
>>>>>>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a
>>>>>>>>> capability at some fixed offset would have initialized their config space before calling this
>>>>>>>>> capabilities API (as VFIO does).
>>>>>>>>> 
>>>>>>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and
>>>>>>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency
>>>>>>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest
>>>>>>>>> driver may care about its offset.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>> When vfio, pci_add_capability() is called from the code which knows
>>>>>>>>>>> exactly that the capability exists and where it is and it calls
>>>>>>>>>>> pci_add_capability() based on this knowledge so doing additional loops
>>>>>>>>>>> just for imaginary scalability is a bit weird, no?
>>>>>>>>>> 
>>>>>>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability.
> 
> 
> 
> 
> -- 
> Alexey

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-22  7:13                         ` Alexander Graf
@ 2012-05-22  7:37                           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2012-05-22  7:37 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm@vger.kernel.org, Alexey Kardashevskiy, qemu-devel@nongnu.org,
	Alex Williamson, anthony@codemonkey.ws, David Gibson

On Tue, 2012-05-22 at 09:13 +0200, Alexander Graf wrote:
> On 22.05.2012, at 09:01, Alexey Kardashevskiy wrote:

> > This is internal kitchen of PCIDevice which I do not want to touch
> > from anywhere but pci.c. And there is no "fixup_capability" or something.
> 
> Hrm. Maybe we should have one? :) Or instead of populating the config
> space with the exact data from the host device, loop through the host
> device capabilities and populate them using this function as we go.
> That should maintain the offsets, but ensure that all internal flags
> are set, no?

That actually sounds reasonable, though it might be more efficient
to have something like pci_parse_config() or similar called once
with a pre-cooked config space.

Internally, pci.c can do the same loop you mention. The advantage is
that it can also do whatever else we might need in the future. If, for
some reason, something wants to cache a cap pointer, it can be done
there. Likewise, whatever is normally initialized as fields used to
generate the config space can instead be initialized from that one
function by reading the config space.

Cheers,
Ben.
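
A minimal sketch of the pci_parse_config() idea above, assuming a
simplified stand-in for PCIDevice. The names model_dev and
model_parse_config are hypothetical and are not QEMU API. A generic
walker does not know the size of each capability, so this sketch marks
only the two-byte header (ID and next pointer) as used; the list
pointers themselves are never rewritten.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE 256
#define CAP_PTR  0x34   /* same value as PCI_CAPABILITY_LIST */

/* Hypothetical stand-in for the relevant PCIDevice fields. */
struct model_dev {
    uint8_t config[CFG_SIZE];
    uint8_t used[CFG_SIZE];
};

/*
 * Walk the capability list already present in dev->config and mark each
 * capability header in dev->used. The config bytes are left untouched.
 *
 * Returns the number of capabilities found, or -1 if the list is broken
 * (a pointer into the standard header, or an entry visited twice).
 * Assumes dev->used is zeroed for the capability area on entry.
 */
int model_parse_config(struct model_dev *dev)
{
    uint8_t pos;
    int count = 0;

    assert(dev != NULL);

    /* The low two bits of capability pointers are reserved. */
    pos = dev->config[CAP_PTR] & 0xFC;
    while (pos) {
        if (pos < 0x40) {
            return -1;              /* points into the standard header */
        }
        if (dev->used[pos]) {
            return -1;              /* seen before: the list loops */
        }
        memset(dev->used + pos, 0xFF, 2);   /* ID byte + next pointer */
        count++;
        pos = dev->config[pos + 1] & 0xFC;
    }
    return count;
}
```

With the E1000E layout quoted elsewhere in this thread (0x34 -> 0xC8 ->
0xD0 -> 0xE0), the walk finds three capabilities; with the damaged
layout (0xC8 and 0xD0 pointing at each other) it reports an error
instead of spinning forever.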

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-22  6:31                     ` Alexander Graf
  2012-05-22  7:01                       ` Alexey Kardashevskiy
@ 2012-06-08  8:47                       ` Alexey Kardashevskiy
  2012-06-08 10:56                         ` Jan Kiszka
  1 sibling, 1 reply; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-06-08  8:47 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson,
	anthony@codemonkey.ws, David Gibson

Yet another try :)

Normally pci_add_capability() is called on a device to add a new
capability. This is fine for emulated devices, whose capability list
is built by QEMU.

In the case of VFIO the capability may already exist, and adding the
same capability at the beginning of the linked list may create a loop,
because the existing code ignores capabilities which already point to
the capability being added.

For example, the old code destroys the following config
of PCIe Intel E1000E:

before adding PCI_CAP_ID_MSI (0x05) at offset 0xD0:
0x34: 0xC8
0xC8: 0x01 0xD0
0xD0: 0x05 0xE0
0xE0: 0x10 0x00

after:
0x34: 0xD0
0xC8: 0x01 0xD0
0xD0: 0x05 0xC8
0xE0: 0x10 0x00

As a result, the capabilities at 0xC8 and 0xD0 point to each other.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/pci.c |   25 ++++++++++++++++++++-----
 1 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 63a8219..cd22caa 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1791,61 +1791,76 @@ static void pci_del_option_rom(PCIDevice *pdev)
         return;
 
     vmstate_unregister_ram(&pdev->rom, &pdev->qdev);
     memory_region_destroy(&pdev->rom);
     pdev->has_rom = false;
 }
 
 /*
  * if !offset
  * Reserve space and add capability to the linked list in pci config space
  *
  * if offset = 0,
  * Find and reserve space and add capability to the linked list
- * in pci config space */
+ * in pci config space
+ * Also does not change the capability list pointers
+ * if a capability already exists (actual for device with pre-cooked config
+ * space such as VFIO)
+ */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
                        uint8_t offset, uint8_t size)
 {
     uint8_t *config;
     int i, overlapping_cap;
+    uint8_t existing_offset;
+
+    existing_offset = pci_find_capability(pdev, cap_id);
+    if (existing_offset) {
+        if (offset && (existing_offset != offset)) {
+            return -EEXIST;
+        }
+        offset = existing_offset;
+    }
 
     if (!offset) {
         offset = pci_find_space(pdev, size);
         if (!offset) {
             return -ENOSPC;
         }
     } else {
         /* Verify that capabilities don't overlap.  Note: device assignment
          * depends on this check to verify that the device is not broken.
          * Should never trigger for emulated devices, but it's helpful
          * for debugging these. */
         for (i = offset; i < offset + size; i++) {
             overlapping_cap = pci_find_capability_at_offset(pdev, i);
             if (overlapping_cap) {
                 fprintf(stderr, "ERROR: %04x:%02x:%02x.%x "
                         "Attempt to add PCI capability %x at offset "
                         "%x overlaps existing capability %x at offset %x\n",
                         pci_find_domain(pdev->bus), pci_bus_num(pdev->bus),
                         PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
                         cap_id, offset, overlapping_cap, i);
                 return -EINVAL;
             }
         }
     }
 
-    config = pdev->config + offset;
-    config[PCI_CAP_LIST_ID] = cap_id;
-    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
-    pdev->config[PCI_CAPABILITY_LIST] = offset;
+    if (!existing_offset) {
+        config = pdev->config + offset;
+        config[PCI_CAP_LIST_ID] = cap_id;
+        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
+        pdev->config[PCI_CAPABILITY_LIST] = offset;
+    }
     pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
     memset(pdev->used + offset, 0xFF, size);
     /* Make capability read-only by default */
     memset(pdev->wmask + offset, 0, size);
     /* Check capability by default */
     memset(pdev->cmask + offset, 0xFF, size);
     return offset;
 }
 
 /* Unlink capability from the pci config space. */
 void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
 {
     uint8_t prev, offset = pci_find_capability_list(pdev, cap_id, &prev);
-- 
1.7.7.3
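
The loop described in the commit message can be reproduced with a small
standalone model of config space. This is only a sketch under
assumptions: load_e1000e, find_cap, list_terminates, add_cap_old and
add_cap_new are illustrative names, not QEMU functions. The model
ignores the used/wmask/cmask bookkeeping and the offset == 0 "find
space" path, so a new capability must be given an explicit offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE 256
#define CAP_PTR  0x34   /* same value as PCI_CAPABILITY_LIST */
#define CAP_ID   0      /* same value as PCI_CAP_LIST_ID */
#define CAP_NEXT 1      /* same value as PCI_CAP_LIST_NEXT */

/* Fill cfg with the E1000E capability layout from the commit message. */
void load_e1000e(uint8_t *cfg)
{
    memset(cfg, 0, CFG_SIZE);
    cfg[CAP_PTR] = 0xC8;
    cfg[0xC8 + CAP_ID] = 0x01; cfg[0xC8 + CAP_NEXT] = 0xD0;
    cfg[0xD0 + CAP_ID] = 0x05; cfg[0xD0 + CAP_NEXT] = 0xE0;
    cfg[0xE0 + CAP_ID] = 0x10; cfg[0xE0 + CAP_NEXT] = 0x00;
}

/* Return the offset of cap_id, or 0 if absent. The walk is bounded so a
 * cyclic list cannot hang the caller. */
uint8_t find_cap(const uint8_t *cfg, uint8_t cap_id)
{
    uint8_t pos = cfg[CAP_PTR] & 0xFC;
    int hops;

    for (hops = 0; pos && hops < CFG_SIZE / 4; hops++) {
        if (cfg[pos + CAP_ID] == cap_id) {
            return pos;
        }
        pos = cfg[pos + CAP_NEXT] & 0xFC;
    }
    return 0;
}

/* Return 1 if the list reaches a null next pointer, 0 if it loops. */
int list_terminates(const uint8_t *cfg)
{
    uint8_t pos = cfg[CAP_PTR] & 0xFC;
    int hops;

    for (hops = 0; hops < CFG_SIZE / 4; hops++) {
        if (!pos) {
            return 1;
        }
        pos = cfg[pos + CAP_NEXT] & 0xFC;
    }
    return 0;
}

/* Old behaviour: always prepend, even if the entry is already linked. */
void add_cap_old(uint8_t *cfg, uint8_t cap_id, uint8_t offset)
{
    assert(offset >= 0x40);
    cfg[offset + CAP_ID] = cap_id;
    cfg[offset + CAP_NEXT] = cfg[CAP_PTR];
    cfg[CAP_PTR] = offset;
}

/* Patched behaviour: leave the pointers alone if cap_id already exists.
 * Returns the offset, or -1 where the patch returns -EEXIST. */
int add_cap_new(uint8_t *cfg, uint8_t cap_id, uint8_t offset)
{
    uint8_t existing = find_cap(cfg, cap_id);

    if (existing) {
        if (offset && existing != offset) {
            return -1;
        }
        return existing;
    }
    add_cap_old(cfg, cap_id, offset);
    return offset;
}
```

Calling add_cap_old(cfg, 0x05, 0xD0) on the E1000E layout produces the
"after" dump from the commit message, where list_terminates() reports a
loop. add_cap_new() with the same arguments leaves 0x34 at 0xC8 and the
list intact.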




On 22/05/12 16:31, Alexander Graf wrote:
> 
> 
> On 22.05.2012, at 08:11, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 22/05/12 15:52, Alexander Graf wrote:
>>>
>>>
>>> On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>
>>>> On 22/05/12 13:21, Alexander Graf wrote:
>>>>>
>>>>>
>>>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>>>>
>>>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>>>>>>> Alexander,
>>>>>>>
>>>>>>> Is that any better? :)
>>>>>>
>>>>>> Alex (Graf that is), ping ?
>>>>>>
>>>>>> The original patch from Alexey was fine btw.
>>>>>>
>>>>>> VFIO will always call things with the existing capability offset so
>>>>>> there's no real risk of doing the wrong thing or break the list or
>>>>>> anything.
>>>>>>
>>>>>> IE. A small simple patch that addresses the problem :-)
>>>>>>
>>>>>> The new patch is a bit more "robust" I believe, I don't think we need to
>>>>>> go too far to fix a problem we don't have. But we need a fix for the
>>>>>> real issue and the simple patch does it neatly from what I can
>>>>>> understand.
>>>>>>
>>>>>> Cheers,
>>>>>> Ben.
>>>>>>
>>>>>>>
>>>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>>>>> * in pci config space */
>>>>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>>                      uint8_t offset, uint8_t size)
>>>>>>> {
>>>>>>> -    uint8_t *config;
>>>>>>> +    uint8_t *config, existing;
>>>>>
>>>>> Existing is a pointer to the target dev's config space, right?
>>>>
>>>> Yes.
>>>>
>>>>>>>   int i, overlapping_cap;
>>>>>>>
>>>>>>> +    existing = pci_find_capability(pdev, cap_id);
>>>>>>> +    if (existing) {
>>>>>>> +        if (offset && (existing != offset)) {
>>>>>>> +            return -EEXIST;
>>>>>>> +        }
>>>>>>> +        for (i = existing; i < size; ++i) {
>>>>>
>>>>> So how does this possibly make sense?
>>>>
>>>> Although I do not expect VFIO to add capabilities (does not make sense), I still want to double
>>>> check that this space has not been tried to use by someone else.
>>>
>>> i is an int. existing is a uint8_t*.
>>
>>
>> It was there before me. This function already does a loop and this is how it was coded in the first place.
> 
> Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax...
> 
>>
>>
>>>>>>> +            if (pdev->used[i]) {
>>>>>>> +                return -EFAULT;
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +        memset(pdev->used + offset, 0xFF, size);
>>>>> Why?
>>>>
>>>> Because I am marking the space this capability takes as used.
>>>
>>> But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no?
>>
>>
>> No, it does not exist for VFIO - VFIO reads the config space from the host kernel first and then calls msi_init or msix_init or whatever_else_init depending on what it got from the host kernel. And these xxx_init() functions eventually call pci_add_capability().
> 
> So why would the function that populates the config space initially not set the used flag correctly?
> 
>>
>> Sure, we can either implement our own msi_init/msix_init (and maybe others in the future) for VFIO (which would do the same as for other QEMU devices except touching the capabilities) OR hack msi_init/msix_init not to touch capabilities if they exist.
> 
> No, calling the internal functions sounds fine to me. It's the step before that irritates me. VFIO shouldn't differ too much from an emulated device wrt its config space population really.
> 
>>
>>
>>
>>>>>>> +        /* Make capability read-only by default */
>>>>>>> +        memset(pdev->wmask + offset, 0, size);
>>>>> Why?
>>>>
>>>> Because the pci_add_capability() does it for a new capability by default.
>>>
>>> Hrm. So you're copying code? Can't you merge the overwrite and write cases?
>>
>> I am trying to make it as a single chunk which is as small as possible.
> 
> No, you're needlessly duplicating code which is a bad idea :). Please reuse as much of the existing function as possible, unless it really doesn't make sense.
> 
>>
>>
>> If it helps, below is the same patch with extended context to see what is going on in that function.
>>
>>
>>
>>
>>
>>
>> hw/pci.c |   20 +++++++++++++++++++-
>> 1 files changed, 19 insertions(+), 1 deletions(-)
>>
>> diff --git a/hw/pci.c b/hw/pci.c
>> index 63a8219..7008a42 100644
>> --- a/hw/pci.c
>> +++ b/hw/pci.c
>> @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom)
>>     ptr = memory_region_get_ram_ptr(&pdev->rom);
>>     load_image(path, ptr);
>>     g_free(path);
>>
>>     if (is_default_rom) {
>>         /* Only the default rom images will be patched (if needed). */
>>         pci_patch_ids(pdev, ptr, size);
>>     }
>>
>>     qemu_put_ram_ptr(ptr);
>>
>>     pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
>>
>>     return 0;
>> }
>>
>> static void pci_del_option_rom(PCIDevice *pdev)
>> {
>>     if (!pdev->has_rom)
>>         return;
>>
>>     vmstate_unregister_ram(&pdev->rom, &pdev->qdev);
>>     memory_region_destroy(&pdev->rom);
>>     pdev->has_rom = false;
>> }
>>
>> /*
>>  * if !offset
>>  * Reserve space and add capability to the linked list in pci config space
>>  *
>>  * if offset = 0,
>>  * Find and reserve space and add capability to the linked list
>>  * in pci config space */
>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>                        uint8_t offset, uint8_t size)
>> {
>> -    uint8_t *config;
>> +    uint8_t *config, existing;
>>     int i, overlapping_cap;
>>
>> +    existing = pci_find_capability(pdev, cap_id);
>> +    if (existing) {
>> +        if (offset && (existing != offset)) {
>> +            return -EEXIST;
>> +        }
>> +        for (i = existing; i < size; ++i) {
>> +            if (pdev->used[i]) {
>> +                return -EFAULT;
>> +            }
>> +        }
> 
> }
> 
>> +        memset(pdev->used + offset, 0xFF, size);
>> +        /* Make capability read-only by default */
>> +        memset(pdev->wmask + offset, 0, size);
>> +        /* Check capability by default */
>> +        memset(pdev->cmask + offset, 0xFF, size);
>> +        return existing;
>> +    }
>> +
>>     if (!offset) {
> 
> && !existing maybe?
> 
>>         offset = pci_find_space(pdev, size);
>>         if (!offset) {
>>             return -ENOSPC;
>>         }
>>     } else {
>>         /* Verify that capabilities don't overlap.  Note: device assignment
>>          * depends on this check to verify that the device is not broken.
>>          * Should never trigger for emulated devices, but it's helpful
>>          * for debugging these. */
>>         for (i = offset; i < offset + size; i++) {
>>             overlapping_cap = pci_find_capability_at_offset(pdev, i);
>>             if (overlapping_cap) {
>>                 fprintf(stderr, "ERROR: %04x:%02x:%02x.%x "
>>                         "Attempt to add PCI capability %x at offset "
>>                         "%x overlaps existing capability %x at offset %x\n",
>>                         pci_find_domain(pdev->bus), pci_bus_num(pdev->bus),
>>                         PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
>>                         cap_id, offset, overlapping_cap, i);
>>                 return -EINVAL;
>>             }
>>         }
>>     }
>>
> 
> If (!existing) {
> 
>>     config = pdev->config + offset;
>>     config[PCI_CAP_LIST_ID] = cap_id;
>>     config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>     pdev->config[PCI_CAPABILITY_LIST] = offset;
>>     pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> 
> }
> 
> which poses the question why the above wouldn't apply for the existing case. It would work just as well to leave that in, no?
> 
> Alex
> 
>>     memset(pdev->used + offset, 0xFF, size);
>>     /* Make capability read-only by default */
>>     memset(pdev->wmask + offset, 0, size);
>>     /* Check capability by default */
>>     memset(pdev->cmask + offset, 0xFF, size);
>>     return offset;
>> }
>>
>>
>>
>>
>>>>>>> +        /* Check capability by default */
>>>>>>> +        memset(pdev->cmask + offset, 0xFF, size);
>>>>>
>>>>> I don't understand this part either.
>>>>
>>>> The pci_add_capability() does it for a new capability by default.
>>>>
>>>>
>>>>
>>>>>
>>>>> Alex
>>>>>
>>>>>>> +        return existing;
>>>>>>> +    }
>>>>>>> +
>>>>>>>   if (!offset) {
>>>>>>>       offset = pci_find_space(pdev, size);
>>>>>>>       if (!offset) {
>>>>>>>           return -ENOSPC;
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote:
>>>>>>>> On 12/05/12 00:13, Alexander Graf wrote:
>>>>>>>>>
>>>>>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
>>>>>>>>>
>>>>>>>>>> 11.05.2012 20:52, Alexander Graf wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Normally the pci_add_capability is called on devices to add new
>>>>>>>>>>>> capability. This is ok for emulated devices which capabilities list
>>>>>>>>>>>> is being built by QEMU.
>>>>>>>>>>>>
>>>>>>>>>>>> In the case of VFIO the capability may already exist and adding new
>>>>>>>>>>>> capability into the beginning of the linked list may create a loop.
>>>>>>>>>>>>
>>>>>>>>>>>> For example, the old code destroys the following config
>>>>>>>>>>>> of PCIe Intel E1000E:
>>>>>>>>>>>>
>>>>>>>>>>>> before adding PCI_CAP_ID_MSI (0x05):
>>>>>>>>>>>> 0x34: 0xC8
>>>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>>>> 0xD0: 0x05 0xE0
>>>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>>>
>>>>>>>>>>>> after:
>>>>>>>>>>>> 0x34: 0xD0
>>>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>>>> 0xD0: 0x05 0xC8
>>>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>>>
>>>>>>>>>>>> As result capabilities 0x01 and 0x05 point to each other.
>>>>>>>>>>>>
>>>>>>>>>>>> The proposed patch does not change capability pointers when
>>>>>>>>>>>> the same type capability is about to add.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>>>> ---
>>>>>>>>>>>> hw/pci.c |   10 ++++++----
>>>>>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c
>>>>>>>>>>>> index aa0c0b8..1f7c924 100644
>>>>>>>>>>>> --- a/hw/pci.c
>>>>>>>>>>>> +++ b/hw/pci.c
>>>>>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> config = pdev->config + offset;
>>>>>>>>>>>> -    config[PCI_CAP_LIST_ID] = cap_id;
>>>>>>>>>>>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>>>>>>>>>>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>>>>>>>>>>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>>>>>>>>>>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>>>>>>>>>>>
>>>>>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either
>>>>>>>>>>> * replace the existing one or
>>>>>>>>>>> * drop out and not write the new one in.
>>>>>>>>>
>>>>>>>>> * hw_error :)
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'm not sure which way would be more natural.
>>>>>>>>>>
>>>>>>>>>> There is a third option - add another function, let's call it
>>>>>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does
>>>>>>>>>> but won't touch list pointers.
>>>>>>>>>
>>>>>>>>> What good is a function that breaks internal consistency?
>>>>>>>>
>>>>>>>>
>>>>>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through
>>>>>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a
>>>>>>>> capability at some fixed offset would have initialized their config space before calling this
>>>>>>>> capabilities API (as VFIO does).
>>>>>>>>
>>>>>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and
>>>>>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency
>>>>>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest
>>>>>>>> driver may care about its offset.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> When vfio, pci_add_capability() is called from the code which knows
>>>>>>>>>> exactly that the capability exists and where it is and it calls
>>>>>>>>>> pci_add_capability() based on this knowledge so doing additional loops
>>>>>>>>>> just for imaginary scalability is a bit weird, no?
>>>>>>>>>
>>>>>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability.


-- 
Alexey

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-06-08  8:47                       ` Alexey Kardashevskiy
@ 2012-06-08 10:56                         ` Jan Kiszka
  2012-06-08 11:16                           ` Alexey Kardashevskiy
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2012-06-08 10:56 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm@vger.kernel.org, Alexander Graf, qemu-devel@nongnu.org,
	Alex Williamson, anthony@codemonkey.ws, David Gibson

On 2012-06-08 10:47, Alexey Kardashevskiy wrote:
> Yet another try :)
> 
> Normally the pci_add_capability is called on devices to add new
> capability. This is ok for emulated devices which capabilities list
> is being built by QEMU.
> 
> In the case of VFIO the capability may already exist and adding new

Why does it exist? VFIO should build the virtual capability list from
scratch (just like classic device assignment does), recreating the
layout of the physical device (except for masked out caps). In that
case, this conflict should become impossible, no?

But if pci_*add*_capability should actually be used like this (I doubt
this), some renaming would be required. "Add" sounds like "append" to me,
not "update".

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
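
The "build the list from scratch" approach suggested above could look
roughly like the sketch below. It is a model under assumptions:
model_add_cap and model_rebuild_caps are hypothetical names, only the
two-byte capability headers are handled (capability bodies and the
used/wmask/cmask arrays are ignored), and a single masked_id parameter
stands in for whatever masking policy the caller wants. Because adding
a capability prepends it to the list, the host offsets are collected
first and then added in reverse, so the guest list keeps both the
offsets and the link order of the host.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_SIZE 256
#define CAP_PTR  0x34   /* same value as PCI_CAPABILITY_LIST */

/* Prepend a capability header at a fixed offset, which is what
 * pci_add_capability() does for a capability not yet in the list. */
void model_add_cap(uint8_t *cfg, uint8_t cap_id, uint8_t offset)
{
    assert(offset >= 0x40 && (offset & 3) == 0);
    cfg[offset] = cap_id;
    cfg[offset + 1] = cfg[CAP_PTR];
    cfg[CAP_PTR] = offset;
}

/*
 * Rebuild the guest capability list from a host config snapshot,
 * skipping every capability whose ID equals masked_id.
 *
 * Returns the number of capabilities kept, or -1 if the host list is
 * broken (loops, or points into the standard header). The guest buffer
 * is only modified once the host list has been validated.
 */
int model_rebuild_caps(const uint8_t *host, uint8_t *guest,
                       uint8_t masked_id)
{
    uint8_t offs[CFG_SIZE / 4];
    uint8_t pos = host[CAP_PTR] & 0xFC;
    int n = 0, kept = 0, i;

    /* Pass 1: collect host offsets in list order. */
    while (pos && n < CFG_SIZE / 4) {
        if (pos < 0x40) {
            return -1;
        }
        offs[n++] = pos;
        pos = host[pos + 1] & 0xFC;
    }
    if (pos) {
        return -1;      /* more entries than fit in config space: a loop */
    }

    /* Pass 2: add in reverse so that prepending restores host order. */
    guest[CAP_PTR] = 0;
    for (i = n - 1; i >= 0; i--) {
        uint8_t id = host[offs[i]];

        if (id == masked_id) {
            continue;
        }
        model_add_cap(guest, id, offs[i]);
        kept++;
    }
    return kept;
}
```

In this model, a capability left out of the rebuilt list (for example
MSI, ID 0x05) is simply absent from the guest config, so a later add
for that ID starts from a clean state and cannot find a stale entry.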

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-06-08 10:56                         ` Jan Kiszka
@ 2012-06-08 11:16                           ` Alexey Kardashevskiy
  2012-06-08 11:30                             ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-06-08 11:16 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: kvm@vger.kernel.org, Alexander Graf, qemu-devel@nongnu.org,
	Alex Williamson, anthony@codemonkey.ws, David Gibson

08.06.2012 20:56, Jan Kiszka wrote:
> On 2012-06-08 10:47, Alexey Kardashevskiy wrote:
>> Yet another try :)
>>
>> Normally the pci_add_capability is called on devices to add new
>> capability. This is ok for emulated devices which capabilities list
>> is being built by QEMU.
>>
>> In the case of VFIO the capability may already exist and adding new
> 
> Why does it exist? VFIO should build the virtual capability list from
> scratch (just like classic device assignment does), recreating the
> layout of the physical device (except for masked out caps). In that
> case, this conflict should become impossible, no?

Normally, capabilities in emulated devices are created by calling
msi_init or msix_init, just when the emulated device wants to advertise
them to the guest.

In the case of VFIO, there are a lot of capabilities which QEMU does not
know and does not want to know about. They are read from the host kernel
as is. And we definitely want to pass these capabilities to the guest as
is, i.e. at the same positions and in the same number. Only for some of
them do we call pci_add_capability (indirectly!), when we want QEMU to
support them somehow.

If we invent some function which re-adds all the capabilities we got
from the host to keep QEMU's internal PCIDevice data in sync, then we'll
need to change every piece of code which adds capabilities. I have
noticed that it is a very common approach here to change a lot for a
very small thing or a rare case, but I'd like to avoid this :)

> But if pci_*add*_capability should actually be used like this (I doubt
> this),

MSI/MSIX use it. To enable MSI/MSIX on VFIO PCIDevice, we call
msi_init/msix_init and they call pci_add_capability.

> some renaming would be required. "Add" sounds like "append" to me,
> not "update".

It is "add" for all the cases but VFIO. VFIO is a very special case
and I do not see another one doing the same soon.

-- 
With best regards

Alexey Kardashevskiy -- icq: 52150396

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-06-08 11:16                           ` Alexey Kardashevskiy
@ 2012-06-08 11:30                             ` Jan Kiszka
  2012-06-08 14:00                               ` Alexey Kardashevskiy
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2012-06-08 11:30 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm@vger.kernel.org, Alexander Graf, qemu-devel@nongnu.org,
	Alex Williamson, anthony@codemonkey.ws, David Gibson

On 2012-06-08 13:16, Alexey Kardashevskiy wrote:
> 08.06.2012 20:56, Jan Kiszka wrote:
>> On 2012-06-08 10:47, Alexey Kardashevskiy wrote:
>>> Yet another try :)
>>>
>>> Normally the pci_add_capability is called on devices to add new
>>> capability. This is ok for emulated devices which capabilities list
>>> is being built by QEMU.
>>>
>>> In the case of VFIO the capability may already exist and adding new
>>
>> Why does it exist? VFIO should build the virtual capability list from
>> scratch (just like classic device assignment does), recreating the
>> layout of the physical device (except for masked out caps). In that
>> case, this conflict should become impossible, no?
> 
> Normally capabilities in emulated devices are created by calling
> msi_init or msix_init - just when emulated device wants to advertise it
> to the guest.
> 
> In the case of VFIO, there is a lot of capabilities which QEMU does not
> know and does not want to know about. They are read from the host kernel
> as is. And we definitely want to pass these capabilities to the guest as
> is, i.e. on the same position and the same number of them. Just for some
> we call pci_add_capability (indirectly!) if we want QEMU to support them
> somehow.
> 
> If we invent some function which "readds" all the capabilities we got
> from the host to keep internal QEMU's PCIDevice data in sync, then we'll
> need to change every piece of code which adds capabilities.

I can't follow. What is different in VFIO from device-assignment.c,
assigned_device_pci_cap_init (except that it already uses msi[x]_init,
something we need to fix in device-assignment.c)?

> I noticed,
> this is very common approach here to change a lot for a very small thing
> or rare case but I'd like to avoid this :)
> 
>> But if pci_*add*_capability should actually be used like this (I doubt
>> this),
> 
> MSI/MSIX use it. To enable MSI/MSIX on VFIO PCIDevice, we call
> msi_init/msix_init and they call pci_add_capability.

You can't blame msi_init/msix_init for the fact that VFIO creates a
capability list with an existing MSI/MSI-X entry beforehand.

> 
>> some renaming would be required. "Add" sounds like "append" to me,
>> not "update".
> 
> It is "add" for all the cases but VFIO. VFIO is the very special case
> and I do not see another one doing the same soon.

PCI device assignment may have some special requirements. Then it is
either required to generalize common services properly or keep the
specialty local. So far, this proposal does not fall in any of those two
categories.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-06-08 11:30                             ` Jan Kiszka
@ 2012-06-08 14:00                               ` Alexey Kardashevskiy
  2012-06-08 14:43                                 ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-06-08 14:00 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: kvm@vger.kernel.org, Alexander Graf, qemu-devel@nongnu.org,
	Alex Williamson, anthony@codemonkey.ws, David Gibson

08.06.2012 21:30, Jan Kiszka wrote:
> On 2012-06-08 13:16, Alexey Kardashevskiy wrote:
>> 08.06.2012 20:56, Jan Kiszka wrote:
>>> On 2012-06-08 10:47, Alexey Kardashevskiy wrote:
>>>> Yet another try :)
>>>>
>>>> Normally the pci_add_capability is called on devices to add new
>>>> capability. This is ok for emulated devices which capabilities list
>>>> is being built by QEMU.
>>>>
>>>> In the case of VFIO the capability may already exist and adding new
>>>
>>> Why does it exist? VFIO should build the virtual capability list from
>>> scratch (just like classic device assignment does), recreating the
>>> layout of the physical device (except for masked out caps). In that
>>> case, this conflict should become impossible, no?
>>
>> Normally capabilities in emulated devices are created by calling
>> msi_init or msix_init - just when emulated device wants to advertise it
>> to the guest.
>>
>> In the case of VFIO, there is a lot of capabilities which QEMU does not
>> know and does not want to know about. They are read from the host kernel
>> as is. And we definitely want to pass these capabilities to the guest as
>> is, i.e. on the same position and the same number of them. Just for some
>> we call pci_add_capability (indirectly!) if we want QEMU to support them
>> somehow.
>>
>> If we invent some function which "readds" all the capabilities we got
>> from the host to keep internal QEMU's PCIDevice data in sync, then we'll
>> need to change every piece of code which adds capabilities.
> 
> I can't follow. What is different in VFIO from device-assignment.c,
> assigned_device_pci_cap_init (except that it already uses msi[x]_init,
> something we need to fix in device-assignment.c)?

What are device-assignment.c and assigned_device_pci_cap_init? I cannot
find them in the QEMU tree.

Anyway, the main difference is that QEMU does not emulate VFIO devices;
it is just a proxy to the host system. Or I do not understand the question.

>> I noticed,
>> this is very common approach here to change a lot for a very small thing
>> or rare case but I'd like to avoid this :)
>>
>>> But if pci_*add*_capability should actually be used like this (I doubt
>>> this),
>>
>> MSI/MSIX use it. To enable MSI/MSIX on VFIO PCIDevice, we call
>> msi_init/msix_init and they call pci_add_capability.
> 
> You can't blame msi_init/msix_init for the fact that VFIO creates a
> capability list with an existing MSI/MSI-X entry beforehand.

VFIO does not create any capability. It gets them all from the host
kernel and passes them to the guest as is. VFIO only needs MSIX to be
enabled in VFIO.

>>> some renaming would be required. "Add" sound like "append" to me,
>>> not "update".
>>
>> It is "add" for all the cases but VFIO. VFIO is the very special case
>> and I do not see another one doing the same soon.
> 
> PCI device assignment may have some special requirements. Then it is
> either required to generalize common services properly or keep the
> specialty local. So far, this proposal does not fall in any of those two
> categories.

It is a generic patch. It knows nothing about VFIO and lets
pci_add_capability handle one more situation: the capability already
exists.

The only "common" solution I see here is:
1) to add pci_fixup_capabilities(), which would mark all the bytes of
existing capabilities as "used"; we would call it once we have fetched
the config space from the host kernel;
2) to fix pci_add_capability so that it does not fail but simply returns
(0?) if we add a capability which already exists.

Will it be ok?
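
A rough sketch of how the two steps could fit together, assuming a
simplified device model rather than the real PCIDevice. The names
struct fake_pci_dev, find_cap, fixup_capabilities and add_capability are
hypothetical, and marking only the two header bytes in step 1 is an
assumption made here, since the true length depends on each capability.
The -EEXIST case follows the check in the later revision of the patch
quoted in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE 256
#define CAP_PTR  0x34   /* PCI_CAPABILITY_LIST */
#define CAP_HDR  2      /* ID byte + next-pointer byte */
#define MAX_CAPS 48     /* (256 - 64) / 4: upper bound on list length */

/* Hypothetical stand-in for the relevant parts of PCIDevice. */
struct fake_pci_dev {
    uint8_t config[CFG_SIZE];
    uint8_t used[CFG_SIZE];
};

/* Bounded walk; returns the offset of cap_id, or 0 if it is not listed. */
static uint8_t find_cap(const struct fake_pci_dev *d, uint8_t cap_id)
{
    /* The low two bits of a capability pointer are reserved. */
    uint8_t pos = d->config[CAP_PTR] & 0xFC;
    for (int n = 0; pos && n < MAX_CAPS; n++) {
        if (d->config[pos] == cap_id) {
            return pos;
        }
        pos = d->config[pos + 1] & 0xFC;
    }
    return 0;
}

/* Step 1: mark the headers of host-provided capabilities as used. */
static void fixup_capabilities(struct fake_pci_dev *d)
{
    uint8_t pos = d->config[CAP_PTR] & 0xFC;
    for (int n = 0; pos && n < MAX_CAPS; n++) {
        memset(d->used + pos, 0xFF, CAP_HDR);
        pos = d->config[pos + 1] & 0xFC;
    }
}

/* Step 2: succeed without relinking if the capability is already there. */
static int add_capability(struct fake_pci_dev *d, uint8_t cap_id,
                          uint8_t offset, uint8_t size)
{
    uint8_t existing = find_cap(d, cap_id);

    assert(offset >= 0x40 && size >= CAP_HDR && offset + size <= CFG_SIZE);
    if (existing) {
        if (existing != offset) {
            return -EEXIST;
        }
        memset(d->used + offset, 0xFF, size);  /* claim the full body */
        return 0;                              /* pointers untouched */
    }
    for (int i = offset; i < offset + size; i++) {
        if (d->used[i]) {
            return -EINVAL;                    /* overlaps something else */
        }
    }
    d->config[offset] = cap_id;
    d->config[offset + 1] = d->config[CAP_PTR];
    d->config[CAP_PTR] = offset;
    memset(d->used + offset, 0xFF, size);
    return 0;
}
```

With the E1000E layout from the patch description, calling
fixup_capabilities() and then add_capability(&d, 0x05, 0xD0, 10) leaves
0x34 and the next pointer at 0xD1 as they were.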


-- 
With best regards

Alexey Kardashevskiy -- icq: 52150396


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-06-08 14:00                               ` Alexey Kardashevskiy
@ 2012-06-08 14:43                                 ` Jan Kiszka
  2012-06-08 14:56                                   ` Alex Williamson
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2012-06-08 14:43 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm@vger.kernel.org, Alexander Graf, qemu-devel@nongnu.org,
	Alex Williamson, anthony@codemonkey.ws, David Gibson

On 2012-06-08 16:00, Alexey Kardashevskiy wrote:
> 08.06.2012 21:30, Jan Kiszka wrote:
>> On 2012-06-08 13:16, Alexey Kardashevskiy wrote:
>>> 08.06.2012 20:56, Jan Kiszka wrote:
>>>> On 2012-06-08 10:47, Alexey Kardashevskiy wrote:
>>>>> Yet another try :)
>>>>>
>>>>> Normally the pci_add_capability is called on devices to add new
>>>>> capability. This is ok for emulated devices which capabilities list
>>>>> is being built by QEMU.
>>>>>
>>>>> In the case of VFIO the capability may already exist and adding new
>>>>
>>>> Why does it exist? VFIO should build the virtual capability list from
>>>> scratch (just like classic device assignment does), recreating the
>>>> layout of the physical device (except for masked out caps). In that
>>>> case, this conflict should become impossible, no?
>>>
>>> Normally capabilities in emulated devices are created by calling
>>> msi_init or msix_init - just when emulated device wants to advertise it
>>> to the guest.
>>>
>>> In the case of VFIO, there is a lot of capabilities which QEMU does not
>>> know and does not want to know about. They are read from the host kernel
>>> as is. And we definitely want to pass these capabilities to the guest as
>>> is, i.e. on the same position and the same number of them. Just for some
>>> we call pci_add_capability (indirectly!) if we want QEMU to support them
>>> somehow.
>>>
>>> If we invent some function which "readds" all the capabilities we got
>>> from the host to keep internal QEMU's PCIDevice data in sync, then we'll
>>> need to change every piece of code which adds capabilities.
>>
>> I can't follow. What is different in VFIO from device-assignment.c,
>> assigned_device_pci_cap_init (except that it already uses msi[x]_init,
>> something we need to fix in device-assignment.c)?
> 
> What are device-assignment.c and assigned_device_pci_cap_init? Cannot
> find them in QEMU tree.

"Old-style" KVM device assignment is not yet upstream. You can find it
in qemu-kvm, and hopefully upstream soon as well.

> 
> Ah, anyway. The main difference is QEMU does not emulate VFIO devices,
> it just a proxy to the host system. Or I do not understand the question.
> 
>>> I noticed,
>>> this is very common approach here to change a lot for a very small thing
>>> or rare case but I'd like to avoid this :)
>>>
>>>> But if pci_*add*_capability should actually be used like this (I doubt
>>>> this),
>>>
>>> MSI/MSIX use it. To enable MSI/MSIX on VFIO PCIDevice, we call
>>> msi_init/msix_init and they call pci_add_capability.
>>
>> You can't blame msi_init/msix_init for the fact that VFIO creates a
>> capability list with an existing MSI/MSI-X entry beforehand.
> 
> VFIO does not create any capability. It gets them all from the host
> kernel and passes to the guest as is. VFIO only needs MSIX to be enabled
> in VFIO.

Just like any device in QEMU, VFIO also needs to set up a virtual config
space when it registers with the PCI core layer. Even if the virtual one
is modeled after the real one, it is still _created_ by the VFIO
userspace part. And this creation process is obviously a bit messed up
so far. Fix this, but not by adding workarounds in the MSI or PCI layer.
Rather, add all capabilities you want to expose to the guest via
pci_add_capability or, indirectly, via msi[x]_init at the right
position. Do not just copy the real config space over; that breaks the
core layer, as we see.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-06-08 14:43                                 ` Jan Kiszka
@ 2012-06-08 14:56                                   ` Alex Williamson
  2012-06-08 15:05                                     ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2012-06-08 14:56 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: kvm@vger.kernel.org, Alexey Kardashevskiy, Alexander Graf,
	qemu-devel@nongnu.org, anthony@codemonkey.ws, David Gibson

On Fri, 2012-06-08 at 16:43 +0200, Jan Kiszka wrote:
> On 2012-06-08 16:00, Alexey Kardashevskiy wrote:
> > 08.06.2012 21:30, Jan Kiszka wrote:
> >> On 2012-06-08 13:16, Alexey Kardashevskiy wrote:
> >>> 08.06.2012 20:56, Jan Kiszka wrote:
> >>>> On 2012-06-08 10:47, Alexey Kardashevskiy wrote:
> >>>>> Yet another try :)
> >>>>>
> >>>>> Normally the pci_add_capability is called on devices to add new
> >>>>> capability. This is ok for emulated devices which capabilities list
> >>>>> is being built by QEMU.
> >>>>>
> >>>>> In the case of VFIO the capability may already exist and adding new
> >>>>
> >>>> Why does it exist? VFIO should build the virtual capability list from
> >>>> scratch (just like classic device assignment does), recreating the
> >>>> layout of the physical device (except for masked out caps). In that
> >>>> case, this conflict should become impossible, no?
> >>>
> >>> Normally capabilities in emulated devices are created by calling
> >>> msi_init or msix_init - just when emulated device wants to advertise it
> >>> to the guest.
> >>>
> >>> In the case of VFIO, there is a lot of capabilities which QEMU does not
> >>> know and does not want to know about. They are read from the host kernel
> >>> as is. And we definitely want to pass these capabilities to the guest as
> >>> is, i.e. on the same position and the same number of them. Just for some
> >>> we call pci_add_capability (indirectly!) if we want QEMU to support them
> >>> somehow.
> >>>
> >>> If we invent some function which "readds" all the capabilities we got
> >>> from the host to keep internal QEMU's PCIDevice data in sync, then we'll
> >>> need to change every piece of code which adds capabilities.
> >>
> >> I can't follow. What is different in VFIO from device-assignment.c,
> >> assigned_device_pci_cap_init (except that it already uses msi[x]_init,
> >> something we need to fix in device-assignment.c)?
> > 
> > What are device-assignment.c and assigned_device_pci_cap_init? Cannot
> > find them in QEMU tree.
> 
> "Old-style" KVM device assignment is not yet upstream. You can find it
> in qemu-kvm, hopefully in upstream soon as well.
> 
> > 
> > Ah, anyway. The main difference is QEMU does not emulate VFIO devices,
> > it just a proxy to the host system. Or I do not understand the question.
> > 
> >>> I noticed,
> >>> this is very common approach here to change a lot for a very small thing
> >>> or rare case but I'd like to avoid this :)
> >>>
> >>>> But if pci_*add*_capability should actually be used like this (I doubt
> >>>> this),
> >>>
> >>> MSI/MSIX use it. To enable MSI/MSIX on VFIO PCIDevice, we call
> >>> msi_init/msix_init and they call pci_add_capability.
> >>
> >> You can't blame msi_init/msix_init for the fact that VFIO creates a
> >> capability list with an existing MSI/MSI-X entry beforehand.
> > 
> > VFIO does not create any capability. It gets them all from the host
> > kernel and passes to the guest as is. VFIO only needs MSIX to be enabled
> > in VFIO.
> 
> Just like any device in QEMU, also VFIO need to set up a virtual config
> space when it registers with the PCI core layer. Even if the virtual one
> is modeled after the real one, it is still _created_ by the VFIO
> userspace part. And this creation process is obviously a bit messed up
> so far. Fix this, but not by adding workarounds in the MSI or PCI layer.
> Rather add all capabilities you want to expose to the guest via
> pci_add_capability or, indirectly, via msi[x]_init at the right
> position. Do not just copy the real config space over, that breaks the
> core layer as we see.

The difference between VFIO and kvm device assignment is that VFIO
emulates a lot of config space for us, so most things are passed
through.  MSI and MSIX are unique in that we actually do want the qemu
support to help us manage them.  So we're basically not telling
qemu about anything other than these, and for the most part, that works
since qemu never handles access to the other capabilities.  However, I
think you're probably right, VFIO should just walk the capabilities
list, registering each with qemu.  It's a little "unnecessary" overhead
from the VFIO perspective, but it makes the VFIO device less unique.
I'll work on adding this.  Thanks,

Alex
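
For illustration only, walking the list and registering each entry
could look roughly like the sketch below. It works on plain byte arrays
instead of PCIDevice, and register_cap and rebuild_cap_list are made-up
names, not existing qemu functions. It assumes registration prepends to
the list (as the current pci_add_capability code in the quoted diff
does), so the entries are registered tail-first to end up in the same
order as on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE 256
#define CAP_PTR  0x34   /* PCI_CAPABILITY_LIST */
#define MAX_CAPS 48     /* (256 - 64) / 4: upper bound on list length */

/* Prepend-style registration into the virtual config space. */
static void register_cap(uint8_t *virt, uint8_t cap_id, uint8_t offset)
{
    assert(offset >= 0x40 && offset <= 0xFC);
    virt[offset] = cap_id;
    virt[offset + 1] = virt[CAP_PTR];
    virt[CAP_PTR] = offset;
}

/*
 * Collect the capability offsets from the host config space, then
 * register them into virt in reverse so the prepends restore the
 * original order.  virt must start with an empty list (CAP_PTR == 0).
 * Returns the number of capabilities registered.
 */
static int rebuild_cap_list(const uint8_t *host, uint8_t *virt)
{
    uint8_t offsets[MAX_CAPS];
    int count = 0;
    /* The low two bits of a capability pointer are reserved. */
    uint8_t pos = host[CAP_PTR] & 0xFC;

    while (pos && count < MAX_CAPS) {
        offsets[count++] = pos;
        pos = host[pos + 1] & 0xFC;
    }
    for (int i = count - 1; i >= 0; i--) {
        register_cap(virt, host[offsets[i]], offsets[i]);
    }
    return count;
}
```

Only the two header bytes are handled here; sizes, write masks and any
filtering of capabilities are left out on purpose.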


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-06-08 14:56                                   ` Alex Williamson
@ 2012-06-08 15:05                                     ` Jan Kiszka
  2012-06-08 15:22                                       ` Alex Williamson
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2012-06-08 15:05 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm@vger.kernel.org, Alexey Kardashevskiy, Alexander Graf,
	qemu-devel@nongnu.org, anthony@codemonkey.ws, David Gibson

On 2012-06-08 16:56, Alex Williamson wrote:
> The difference between VFIO and kvm device assignment is that VFIO
> emulates a lot of config space for us, so most things are passed
> through.

That's not different from current device assignment, is it? I think the
major difference is that VFIO filters and potentially post-processes the
direct writes in kernel space.

>  MSI and MSIX are unique that we actually do want the qemu
> support for helping us to manage them.  So we're basically not telling
> qemu about anything other than these, and for the most part, that works
> since qemu never handles access to the other capabilities.  However, I
> think you're probably right, VFIO should just walk the capabilities
> list, registering each with qemu.  It's a little "unnecessary" overhead
> from the VFIO perspective, but it makes the VFIO device less unique.
> I'll work on adding this.  Thanks,

Great, thanks!
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-06-08 15:05                                     ` Jan Kiszka
@ 2012-06-08 15:22                                       ` Alex Williamson
  0 siblings, 0 replies; 29+ messages in thread
From: Alex Williamson @ 2012-06-08 15:22 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: kvm@vger.kernel.org, Alexey Kardashevskiy, Alexander Graf,
	qemu-devel@nongnu.org, anthony@codemonkey.ws, David Gibson

On Fri, 2012-06-08 at 17:05 +0200, Jan Kiszka wrote:
> On 2012-06-08 16:56, Alex Williamson wrote:
> > The difference between VFIO and kvm device assignment is that VFIO
> > emulates a lot of config space for us, so most things are passed
> > through.
> 
> That's not different from current device assignment, is it? I think the
> major difference is that VFIO filters and potentially post-processes the
> direct writes in kernel space.

Right, and having the filtering/virtualization in the kernel means that
qemu only handles a very small subset of PCI config space.  That's made
us lax in even telling qemu about the areas that it'll never see
accesses to.  For current device assignment, since we are doing the
emulation in qemu, it's a little more beneficial to register everything.
Thanks,

Alex


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-22  6:11                   ` Alexey Kardashevskiy
  2012-05-22  6:31                     ` Alexander Graf
@ 2012-05-22  6:38                     ` Alexander Graf
  1 sibling, 0 replies; 29+ messages in thread
From: Alexander Graf @ 2012-05-22  6:38 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson,
	anthony@codemonkey.ws, David Gibson



On 22.05.2012, at 08:11, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 22/05/12 15:52, Alexander Graf wrote:
>> 
>> 
>> On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> 
>>> On 22/05/12 13:21, Alexander Graf wrote:
>>>> 
>>>> 
>>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>>> 
>>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>>>>>> Alexander,
>>>>>> 
>>>>>> Is that any better? :)
>>>>> 
>>>>> Alex (Graf that is), ping ?
>>>>> 
>>>>> The original patch from Alexey was fine btw.
>>>>> 
>>>>> VFIO will always call things with the existing capability offset so
>>>>> there's no real risk of doing the wrong thing or break the list or
>>>>> anything.
>>>>> 
>>>>> IE. A small simple patch that addresses the problem :-)
>>>>> 
>>>>> The new patch is a bit more "robust" I believe, I don't think we need to
>>>>> go too far to fix a problem we don't have. But we need a fix for the
>>>>> real issue and the simple patch does it neatly from what I can
>>>>> understand.
>>>>> 
>>>>> Cheers,
>>>>> Ben.
>>>>> 
>>>>>> 
>>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>>>> * in pci config space */
>>>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>                      uint8_t offset, uint8_t size)
>>>>>> {
>>>>>> -    uint8_t *config;
>>>>>> +    uint8_t *config, existing;
>>>> 
>>>> Existing is a pointer to the target dev's config space, right?
>>> 
>>> Yes.
>>> 
>>>>>>   int i, overlapping_cap;
>>>>>> 
>>>>>> +    existing = pci_find_capability(pdev, cap_id);
>>>>>> +    if (existing) {
>>>>>> +        if (offset && (existing != offset)) {
>>>>>> +            return -EEXIST;
>>>>>> +        }
>>>>>> +        for (i = existing; i < size; ++i) {
>>>> 
>>>> So how does this possibly make sense?
>>> 
>>> Although I do not expect VFIO to add capabilities (does not make sense), I still want to double
>>> check that this space has not been tried to use by someone else.
>> 
>> i is an int. existing is a uint8_t*.
> 
> 
> It was there before me. This function already does a loop and this is how it was coded at the first place.

Also, while at it, please add some comments at least for the code you add that explain why you do the things you do :).

Alex


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-11  6:45 [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space Alexey Kardashevskiy
  2012-05-11 10:52 ` Alexander Graf
@ 2012-05-11 19:20 ` Jason Baron
  2012-05-12  0:27   ` Alexey Kardashevskiy
  1 sibling, 1 reply; 29+ messages in thread
From: Jason Baron @ 2012-05-11 19:20 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, qemu-devel, Alex Graf, Alex Williamson, anthony,
	David Gibson

On Fri, May 11, 2012 at 04:45:21PM +1000, Alexey Kardashevskiy wrote:
> Normally the pci_add_capability is called on devices to add new
> capability. This is ok for emulated devices which capabilities list
> is being built by QEMU.
> 
> In the case of VFIO the capability may already exist and adding new
> capability into the beginning of the linked list may create a loop.

Hi,

I don't quite understand how we get a loop. If 'offset' is supplied to
'pci_add_capability' and there is an overlap, we get -EINVAL. Otherwise,
we are adding the capability in a new empty space. So, I see how we
could get the capability in the list twice, but not how there is a loop.
What am I missing?

Thanks,

-Jason

> 
> For example, the old code destroys the following config
> of PCIe Intel E1000E:
> 
> before adding PCI_CAP_ID_MSI (0x05):
> 0x34: 0xC8
> 0xC8: 0x01 0xD0
> 0xD0: 0x05 0xE0
> 0xE0: 0x10 0x00
> 
> after:
> 0x34: 0xD0
> 0xC8: 0x01 0xD0
> 0xD0: 0x05 0xC8
> 0xE0: 0x10 0x00
> 
> As result capabilities 0x01 and 0x05 point to each other.
> 
> The proposed patch does not change capability pointers when
> the same type capability is about to add.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/pci.c |   10 ++++++----
>  1 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index aa0c0b8..1f7c924 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>      }
> 
>      config = pdev->config + offset;
> -    config[PCI_CAP_LIST_ID] = cap_id;
> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
> +        config[PCI_CAP_LIST_ID] = cap_id;
> +        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> +        pdev->config[PCI_CAPABILITY_LIST] = offset;
> +        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> +    }
>      memset(pdev->used + offset, 0xFF, size);
>      /* Make capability read-only by default */
>      memset(pdev->wmask + offset, 0, size);
> 
> 
> -- 
> Alexey
> 


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-11 19:20 ` Jason Baron
@ 2012-05-12  0:27   ` Alexey Kardashevskiy
  2012-05-14  2:37     ` Alex Williamson
  0 siblings, 1 reply; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-05-12  0:27 UTC (permalink / raw)
  To: Jason Baron
  Cc: kvm, qemu-devel, Alex Graf, Alex Williamson, anthony,
	David Gibson

12.05.2012 5:20, Jason Baron wrote:
> On Fri, May 11, 2012 at 04:45:21PM +1000, Alexey Kardashevskiy wrote:
>> Normally the pci_add_capability is called on devices to add new
>> capability. This is ok for emulated devices which capabilities list
>> is being built by QEMU.
>>
>> In the case of VFIO the capability may already exist and adding new
>> capability into the beginning of the linked list may create a loop.
> 
> Hi,
> 
> I don't quite understand how we get a loop, if 'offset' is supplied to
> 'pci_add_capability' and there is an overlap we get -EINVAL. Otherwise,
> we are adding the capability in a new empty space. So, I see how we
> could get the capability in the list twice, but not how there is a loop.
> what am I missing?


This happens only with VFIO.

The capability already exists in the config space as it is fetched from
the host kernel _before_ msi_init is called. Furthermore, msi_init() is
called when VFIO sees this capability in the config space.

We probably want to re-add all capabilities, do not know...
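
To make the loop concrete, here is a minimal sketch (not QEMU code)
that replays the E1000E layout from the patch description on a plain
byte array. The helper names load_e1000e_caps, prepend_cap and
cap_list_has_loop are made up for illustration; only the offsets and
IDs come from the example in the patch description, and prepend_cap
mirrors the three pointer writes removed by the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE 256
#define CAP_PTR  0x34   /* PCI_CAPABILITY_LIST */
#define CAP_ID   0      /* PCI_CAP_LIST_ID */
#define CAP_NEXT 1      /* PCI_CAP_LIST_NEXT */
#define MAX_CAPS 48     /* (256 - 64) / 4: upper bound on list length */

/* Fill in the capability chain as fetched from the host. */
static void load_e1000e_caps(uint8_t *cfg)
{
    memset(cfg, 0, CFG_SIZE);
    cfg[CAP_PTR] = 0xC8;
    cfg[0xC8 + CAP_ID] = 0x01; cfg[0xC8 + CAP_NEXT] = 0xD0;
    cfg[0xD0 + CAP_ID] = 0x05; cfg[0xD0 + CAP_NEXT] = 0xE0;
    cfg[0xE0 + CAP_ID] = 0x10; cfg[0xE0 + CAP_NEXT] = 0x00;
}

/* Unconditional prepend, as done before the proposed patch. */
static void prepend_cap(uint8_t *cfg, uint8_t cap_id, uint8_t offset)
{
    assert(offset >= 0x40 && offset <= 0xFC);
    cfg[offset + CAP_ID] = cap_id;
    cfg[offset + CAP_NEXT] = cfg[CAP_PTR];
    cfg[CAP_PTR] = offset;
}

/* Returns 1 if the walk does not reach a zero next pointer in time. */
static int cap_list_has_loop(const uint8_t *cfg)
{
    int steps = 0;
    /* The low two bits of a capability pointer are reserved. */
    uint8_t pos = cfg[CAP_PTR] & 0xFC;

    while (pos) {
        if (++steps > MAX_CAPS) {
            return 1;
        }
        pos = cfg[pos + CAP_NEXT] & 0xFC;
    }
    return 0;
}
```

Calling prepend_cap(cfg, 0x05, 0xD0) on the loaded layout makes 0xD0
point at 0xC8 while 0xC8 still points at 0xD0, so the walk never sees
0xE0 or the terminating zero.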



> Thanks,
> 
> -Jason
> 
>>
>> For example, the old code destroys the following config
>> of PCIe Intel E1000E:
>>
>> before adding PCI_CAP_ID_MSI (0x05):
>> 0x34: 0xC8
>> 0xC8: 0x01 0xD0
>> 0xD0: 0x05 0xE0
>> 0xE0: 0x10 0x00
>>
>> after:
>> 0x34: 0xD0
>> 0xC8: 0x01 0xD0
>> 0xD0: 0x05 0xC8
>> 0xE0: 0x10 0x00
>>
>> As result capabilities 0x01 and 0x05 point to each other.
>>
>> The proposed patch does not change capability pointers when
>> the same type capability is about to add.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  hw/pci.c |   10 ++++++----
>>  1 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/pci.c b/hw/pci.c
>> index aa0c0b8..1f7c924 100644
>> --- a/hw/pci.c
>> +++ b/hw/pci.c
>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>      }
>>
>>      config = pdev->config + offset;
>> -    config[PCI_CAP_LIST_ID] = cap_id;
>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>> +        config[PCI_CAP_LIST_ID] = cap_id;
>> +        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>> +        pdev->config[PCI_CAPABILITY_LIST] = offset;
>> +        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>> +    }
>>      memset(pdev->used + offset, 0xFF, size);
>>      /* Make capability read-only by default */
>>      memset(pdev->wmask + offset, 0, size);



-- 
With best regards

Alexey Kardashevskiy -- icq: 52150396


* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-12  0:27   ` Alexey Kardashevskiy
@ 2012-05-14  2:37     ` Alex Williamson
  0 siblings, 0 replies; 29+ messages in thread
From: Alex Williamson @ 2012-05-14  2:37 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, Jason Baron, qemu-devel, Alex Graf, anthony, David Gibson

On Sat, 2012-05-12 at 10:27 +1000, Alexey Kardashevskiy wrote:
> 12.05.2012 5:20, Jason Baron wrote:
> > On Fri, May 11, 2012 at 04:45:21PM +1000, Alexey Kardashevskiy wrote:
> >> Normally the pci_add_capability is called on devices to add new
> >> capability. This is ok for emulated devices which capabilities list
> >> is being built by QEMU.
> >>
> >> In the case of VFIO the capability may already exist and adding new
> >> capability into the beginning of the linked list may create a loop.
> > 
> > Hi,
> > 
> > I don't quite understand how we get a loop, if 'offset' is supplied to
> > 'pci_add_capability' and there is an overlap we get -EINVAL. Otherwise,
> > we are adding the capability in a new empty space. So, I see how we
> > could get the capability in the list twice, but not how there is a loop.
> > what am I missing?
> 
> 
> This happens only with VFIO.
> 
> The capability already exists in the config space as it is fetched from
> the host kernel _before_ msi_init is called. Furthermore, msi_init() is
> called when VFIO sees this capability in the config space.
> 
> We probably want to re-add all capabilities, do not know...

Yep, I've had msi[1] and msix[2] patches in my vfio tree for a long
time; we really want to support this generically for all capabilities,
though.  We either need to detect or allow the caller to specify that
the config space is already programmed.  Note that even if we don't
create a loop, particularly finicky drivers may balk at just changing
the order of the capabilities list.  Thanks,

Alex

[1]https://github.com/awilliam/qemu-vfio/commit/a9f04351610ab69e22d90a76dc85be3269000a9f
[2]https://github.com/awilliam/qemu-vfio/commit/b4de3d0436b0260fbc6fcd40787c1c92ffca2980
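
One way the "caller specifies" variant might look, as a sketch on a
plain byte array. The enum cap_state, add_cap and cap_order are
hypothetical and are not taken from the patches linked above. The idea
is that a pre-programmed entry is only verified, never relinked, so the
order seen by the guest stays the same.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define CAP_PTR  0x34   /* PCI_CAPABILITY_LIST */
#define MAX_CAPS 48     /* (256 - 64) / 4: upper bound on list length */

/* Hypothetical flag: does the config space already hold this entry? */
enum cap_state { CAP_NEW, CAP_ALREADY_PROGRAMMED };

static int add_cap(uint8_t *cfg, uint8_t cap_id, uint8_t offset,
                   enum cap_state state)
{
    assert(offset >= 0x40 && offset <= 0xFC);
    if (state == CAP_ALREADY_PROGRAMMED) {
        /* Trust but verify: the ID must match, pointers stay as is. */
        return cfg[offset] == cap_id ? 0 : -EINVAL;
    }
    cfg[offset] = cap_id;
    cfg[offset + 1] = cfg[CAP_PTR];
    cfg[CAP_PTR] = offset;
    return 0;
}

/* Record capability IDs in list order; ids needs MAX_CAPS entries. */
static int cap_order(const uint8_t *cfg, uint8_t *ids)
{
    int count = 0;
    /* The low two bits of a capability pointer are reserved. */
    uint8_t pos = cfg[CAP_PTR] & 0xFC;

    while (pos && count < MAX_CAPS) {
        ids[count++] = cfg[pos];
        pos = cfg[pos + 1] & 0xFC;
    }
    return count;
}
```

Comparing cap_order() before and after a call is a cheap way to check
that neither a loop nor a reordering was introduced.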

> >>
> >> For example, the old code destroys the following config
> >> of PCIe Intel E1000E:
> >>
> >> before adding PCI_CAP_ID_MSI (0x05):
> >> 0x34: 0xC8
> >> 0xC8: 0x01 0xD0
> >> 0xD0: 0x05 0xE0
> >> 0xE0: 0x10 0x00
> >>
> >> after:
> >> 0x34: 0xD0
> >> 0xC8: 0x01 0xD0
> >> 0xD0: 0x05 0xC8
> >> 0xE0: 0x10 0x00
> >>
> >> As result capabilities 0x01 and 0x05 point to each other.
> >>
> >> The proposed patch does not change capability pointers when
> >> the same type capability is about to add.
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >>  hw/pci.c |   10 ++++++----
> >>  1 files changed, 6 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/hw/pci.c b/hw/pci.c
> >> index aa0c0b8..1f7c924 100644
> >> --- a/hw/pci.c
> >> +++ b/hw/pci.c
> >> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
> >>      }
> >>
> >>      config = pdev->config + offset;
> >> -    config[PCI_CAP_LIST_ID] = cap_id;
> >> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> >> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
> >> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> >> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
> >> +        config[PCI_CAP_LIST_ID] = cap_id;
> >> +        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> >> +        pdev->config[PCI_CAPABILITY_LIST] = offset;
> >> +        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> >> +    }
> >>      memset(pdev->used + offset, 0xFF, size);
> >>      /* Make capability read-only by default */
> >>      memset(pdev->wmask + offset, 0, size);
> 
> 
> 


* [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
@ 2012-05-11  6:59 Alexey Kardashevskiy
  0 siblings, 0 replies; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-05-11  6:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, kvm

Normally pci_add_capability is called on a device to add a new
capability. This is fine for emulated devices, whose capability list
is built by QEMU.

In the case of VFIO the capability may already exist, and adding a new
capability at the beginning of the linked list may create a loop.

For example, the old code destroys the following config
of PCIe Intel E1000E:

before adding PCI_CAP_ID_MSI (0x05):
0x34: 0xC8
0xC8: 0x01 0xD0
0xD0: 0x05 0xE0
0xE0: 0x10 0x00

after:
0x34: 0xD0
0xC8: 0x01 0xD0
0xD0: 0x05 0xC8
0xE0: 0x10 0x00

As a result, capabilities 0x01 and 0x05 point to each other.

The proposed patch does not change the capability pointers when
a capability of the same type is about to be added.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/pci.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index aa0c0b8..1f7c924 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
     }

     config = pdev->config + offset;
-    config[PCI_CAP_LIST_ID] = cap_id;
-    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
-    pdev->config[PCI_CAPABILITY_LIST] = offset;
-    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+    if (config[PCI_CAP_LIST_ID] != cap_id) {
+        config[PCI_CAP_LIST_ID] = cap_id;
+        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
+        pdev->config[PCI_CAPABILITY_LIST] = offset;
+        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+    }
     memset(pdev->used + offset, 0xFF, size);
     /* Make capability read-only by default */
     memset(pdev->wmask + offset, 0, size);


-- 
Alexey
