qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v5 0/2] PCI-e device multi-function hot-add support
@ 2015-10-26  3:29 Cao jin
  2015-10-26  3:29 ` [Qemu-devel] [PATCH v5 1/2] remove function during multi-function hot-add Cao jin
  2015-10-26  3:29 ` [Qemu-devel] [PATCH v5 2/2] enable " Cao jin
  0 siblings, 2 replies; 9+ messages in thread
From: Cao jin @ 2015-10-26  3:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: izumi.taku, alex.williamson, mst

Support PCI-e device hot-add multi-function via device_add, just ensure
add the function 0 is added last. While allow user to roll back in the
middle via device_del, in case user cancle the operation.

changelog v5:
1. change pci_is_function_0() to pci_get_function_0(),
   and use it according to v4 comments.
2. reimplement the content of pci_get_function_0()

changelog v4:
1. take ari into account, add api: pci_is_function_0()
2. reorder the patch
3. other minor fix according to the v3 comment

changelog v3:
1. Flag device as unexposed when func 0 doesn`t exist, via return 0xFF
   in case of gratuitous pci bus scan.
2. Since device is unexposed to guest, can remove function individually,
   without interaction with the guest.

Cao jin (2):
  remove function during multi-function hot-add
  enable multi-function hot-add

 hw/pci/pci.c         | 31 ++++++++++++++++++++++++++++++-
 hw/pci/pci_host.c    | 13 +++++++++++--
 hw/pci/pcie.c        | 39 ++++++++++++++++++++++++++-------------
 include/hw/pci/pci.h |  1 +
 4 files changed, 68 insertions(+), 16 deletions(-)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v5 1/2] remove function during multi-function hot-add
  2015-10-26  3:29 [Qemu-devel] [PATCH v5 0/2] PCI-e device multi-function hot-add support Cao jin
@ 2015-10-26  3:29 ` Cao jin
  2015-10-26  3:29 ` [Qemu-devel] [PATCH v5 2/2] enable " Cao jin
  1 sibling, 0 replies; 9+ messages in thread
From: Cao jin @ 2015-10-26  3:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: izumi.taku, alex.williamson, mst

In case user regret when hot-add multi-function, we should roll back,
device_del the function added but still not worked.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 hw/pci/pcie.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 6e28985..b1adeaf 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -261,13 +261,31 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
                         PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
 }
 
+static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque)
+{
+    object_unparent(OBJECT(dev));
+}
+
 void pcie_cap_slot_hot_unplug_request_cb(HotplugHandler *hotplug_dev,
                                          DeviceState *dev, Error **errp)
 {
     uint8_t *exp_cap;
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+    PCIBus *bus = pci_dev->bus;
 
     pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, &exp_cap, errp);
 
+    /* In case user cancel the operation of multi-function hot-add,
+     * remove the function that is unexposed to guest individually,
+     * without interaction with guest.
+     */
+    if (PCI_FUNC(pci_dev->devfn) &&
+        !bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 0)]) {
+        pcie_unplug_device(bus, pci_dev, NULL);
+
+        return;
+    }
+
     pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
 }
 
@@ -378,11 +396,6 @@ void pcie_cap_slot_reset(PCIDevice *dev)
     hotplug_event_update_event_status(dev);
 }
 
-static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque)
-{
-    object_unparent(OBJECT(dev));
-}
-
 void pcie_cap_slot_write_config(PCIDevice *dev,
                                 uint32_t addr, uint32_t val, int len)
 {
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v5 2/2] enable multi-function hot-add
  2015-10-26  3:29 [Qemu-devel] [PATCH v5 0/2] PCI-e device multi-function hot-add support Cao jin
  2015-10-26  3:29 ` [Qemu-devel] [PATCH v5 1/2] remove function during multi-function hot-add Cao jin
@ 2015-10-26  3:29 ` Cao jin
  2015-10-26  8:29   ` Michael S. Tsirkin
  1 sibling, 1 reply; 9+ messages in thread
From: Cao jin @ 2015-10-26  3:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: izumi.taku, alex.williamson, mst

Enable pcie device multifunction hot, just ensure the function 0
added last, then driver will got the notification to scan all the
function in the slot.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 hw/pci/pci.c         | 31 ++++++++++++++++++++++++++++++-
 hw/pci/pci_host.c    | 13 +++++++++++--
 hw/pci/pcie.c        | 18 +++++++++---------
 include/hw/pci/pci.h |  1 +
 4 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index b0bf540..6f43b12 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -847,6 +847,9 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
     PCIConfigWriteFunc *config_write = pc->config_write;
     Error *local_err = NULL;
     AddressSpace *dma_as;
+    DeviceState *dev = DEVICE(pci_dev);
+
+    pci_dev->bus = bus;
 
     if (devfn < 0) {
         for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices);
@@ -864,9 +867,17 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
                    PCI_SLOT(devfn), PCI_FUNC(devfn), name,
                    bus->devices[devfn]->name);
         return NULL;
+    } else if (dev->hotplugged &&
+               pci_get_function_0(pci_dev)) {
+        error_setg(errp, "PCI: slot %d function 0 already ocuppied by %s,"
+                   " new func %s cannot be exposed to guest.",
+                   PCI_SLOT(devfn),
+                   bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)]->name,
+                   name);
+
+       return NULL;
     }
 
-    pci_dev->bus = bus;
     pci_dev->devfn = devfn;
     dma_as = pci_device_iommu_address_space(pci_dev);
 
@@ -2454,6 +2465,24 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range)
     pci_for_each_device_under_bus(bus, pci_dev_get_w64, range);
 }
 
+/* ARI device function number range is 0-255, means has only 1 function0;
+ * while non-ARI device has 1 function0 in each slot. non-ARI device could
+ * be PCI or PCIe, and there is up to 32 slots for PCI */
+PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
+{
+    PCIDevice *parent_dev;
+
+    parent_dev = pci_bridge_get_device(pci_dev->bus);
+    if (pcie_cap_is_arifwd_enabled(parent_dev) &&
+        pci_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI)) {
+        /* ARI enabled */
+        return pci_dev->bus->devices[0];
+    } else {
+        /* no ARI */
+        return pci_dev->bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 0)];
+    }
+}
+
 static const TypeInfo pci_device_type_info = {
     .name = TYPE_PCI_DEVICE,
     .parent = TYPE_DEVICE,
diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
index 3e26f92..63d7d2f 100644
--- a/hw/pci/pci_host.c
+++ b/hw/pci/pci_host.c
@@ -20,6 +20,7 @@
 
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_host.h"
+#include "hw/pci/pci_bus.h"
 #include "trace.h"
 
 /* debug PCI */
@@ -75,7 +76,11 @@ void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, int len)
     PCIDevice *pci_dev = pci_dev_find_by_addr(s, addr);
     uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
 
-    if (!pci_dev) {
+    /* non-zero functions are only exposed when function 0 is present,
+     * allowing direct removal of unexposed functions.
+     */
+    if (!pci_dev ||
+        (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
         return;
     }
 
@@ -91,7 +96,11 @@ uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len)
     uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
     uint32_t val;
 
-    if (!pci_dev) {
+    /* non-zero functions are only exposed when function 0 is present,
+     * allowing direct removal of unexposed functions.
+     */
+    if (!pci_dev ||
+        (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
         return ~0x0;
     }
 
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index b1adeaf..4ba9501 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -249,16 +249,16 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
         return;
     }
 
-    /* TODO: multifunction hot-plug.
-     * Right now, only a device of function = 0 is allowed to be
-     * hot plugged/unplugged.
+    /* To enable multifunction hot-plug, we just ensure the function
+     * 0 added last. When function 0 is added, we set the sltsta and
+     * inform OS via event notification.
      */
-    assert(PCI_FUNC(pci_dev->devfn) == 0);
-
-    pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
-                               PCI_EXP_SLTSTA_PDS);
-    pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
-                        PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
+    if (pci_get_function_0(pci_dev)) {
+        pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
+                                   PCI_EXP_SLTSTA_PDS);
+        pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
+                            PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
+    }
 }
 
 static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque)
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index f5e7fd8..379b6e1 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -397,6 +397,7 @@ void pci_for_each_bus_depth_first(PCIBus *bus,
                                   void *(*begin)(PCIBus *bus, void *parent_state),
                                   void (*end)(PCIBus *bus, void *state),
                                   void *parent_state);
+PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
 
 /* Use this wrapper when specific scan order is not required. */
 static inline
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/2] enable multi-function hot-add
  2015-10-26  3:29 ` [Qemu-devel] [PATCH v5 2/2] enable " Cao jin
@ 2015-10-26  8:29   ` Michael S. Tsirkin
  2015-10-26 11:57     ` Cao jin
  0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2015-10-26  8:29 UTC (permalink / raw)
  To: Cao jin; +Cc: izumi.taku, alex.williamson, qemu-devel

On Mon, Oct 26, 2015 at 11:29:18AM +0800, Cao jin wrote:
> Enable pcie device multifunction hot, just ensure the function 0
> added last, then driver will got the notification to scan all the
> function in the slot.
> 
> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---
>  hw/pci/pci.c         | 31 ++++++++++++++++++++++++++++++-
>  hw/pci/pci_host.c    | 13 +++++++++++--
>  hw/pci/pcie.c        | 18 +++++++++---------
>  include/hw/pci/pci.h |  1 +
>  4 files changed, 51 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index b0bf540..6f43b12 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -847,6 +847,9 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>      PCIConfigWriteFunc *config_write = pc->config_write;
>      Error *local_err = NULL;
>      AddressSpace *dma_as;
> +    DeviceState *dev = DEVICE(pci_dev);
> +
> +    pci_dev->bus = bus;
>  
>      if (devfn < 0) {
>          for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices);
> @@ -864,9 +867,17 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>                     PCI_SLOT(devfn), PCI_FUNC(devfn), name,
>                     bus->devices[devfn]->name);
>          return NULL;
> +    } else if (dev->hotplugged &&
> +               pci_get_function_0(pci_dev)) {
> +        error_setg(errp, "PCI: slot %d function 0 already ocuppied by %s,"
> +                   " new func %s cannot be exposed to guest.",
> +                   PCI_SLOT(devfn),
> +                   bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)]->name,
> +                   name);
> +
> +       return NULL;
>      }
>  
> -    pci_dev->bus = bus;
>      pci_dev->devfn = devfn;
>      dma_as = pci_device_iommu_address_space(pci_dev);
>  
> @@ -2454,6 +2465,24 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range)
>      pci_for_each_device_under_bus(bus, pci_dev_get_w64, range);
>  }
>  
> +/* ARI device function number range is 0-255, means has only 1 function0;
> + * while non-ARI device has 1 function0 in each slot. non-ARI device could
> + * be PCI or PCIe, and there is up to 32 slots for PCI */
> +PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
> +{
> +    PCIDevice *parent_dev;
> +
> +    parent_dev = pci_bridge_get_device(pci_dev->bus);
> +    if (pcie_cap_is_arifwd_enabled(parent_dev) &&
> +        pci_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI)) {
> +        /* ARI enabled */
> +        return pci_dev->bus->devices[0];

That's wrong I think since software might enable ARI after hotplug.
And I'm not sure all functions must have the ARI capability.

I don't see why don't you just go by spec:

static
bool pcie_has_upstream_port(PCIDevice *dev)
{
    PCIDevice *parent_dev = pci_bridge_get_device(pci_dev->bus);

    /* 
     * Device associated with an upstream port.
     * As there are several types of these, it's easier to check the
     * parent device: upstream ports are always connected to
     * root or downstream ports.
     */
    return parent_dev &&
	    pci_is_express(parent_dev) &&
		parent_dev->exp.exp_cap &&
		(pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_ROOT_PORT ||
		pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_DOWNSTREAM);
}


PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
{
	if (pcie_has_upstream_port(dev)) {
	    /* With an upstream PCIE port, we only support 1 device at slot 0 */
            return pci_dev->bus->devices[0];
	} else {
	    /* Other bus types might support multiple devices at slots 0-31 */
            return pci_dev->bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 0)];
	}
}

> +    } else {
> +        /* no ARI */
> +        return pci_dev->bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 0)];
> +    }
> +}
> +
>  static const TypeInfo pci_device_type_info = {
>      .name = TYPE_PCI_DEVICE,
>      .parent = TYPE_DEVICE,
> diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
> index 3e26f92..63d7d2f 100644
> --- a/hw/pci/pci_host.c
> +++ b/hw/pci/pci_host.c
> @@ -20,6 +20,7 @@
>  
>  #include "hw/pci/pci.h"
>  #include "hw/pci/pci_host.h"
> +#include "hw/pci/pci_bus.h"
>  #include "trace.h"
>  
>  /* debug PCI */
> @@ -75,7 +76,11 @@ void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, int len)
>      PCIDevice *pci_dev = pci_dev_find_by_addr(s, addr);
>      uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
>  
> -    if (!pci_dev) {
> +    /* non-zero functions are only exposed when function 0 is present,
> +     * allowing direct removal of unexposed functions.
> +     */
> +    if (!pci_dev ||
> +        (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
>          return;
>      }
>  
> @@ -91,7 +96,11 @@ uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len)
>      uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
>      uint32_t val;
>  
> -    if (!pci_dev) {
> +    /* non-zero functions are only exposed when function 0 is present,
> +     * allowing direct removal of unexposed functions.
> +     */
> +    if (!pci_dev ||
> +        (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
>          return ~0x0;
>      }
>  
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index b1adeaf..4ba9501 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -249,16 +249,16 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>          return;
>      }
>  
> -    /* TODO: multifunction hot-plug.
> -     * Right now, only a device of function = 0 is allowed to be
> -     * hot plugged/unplugged.
> +    /* To enable multifunction hot-plug, we just ensure the function
> +     * 0 added last. When function 0 is added, we set the sltsta and
> +     * inform OS via event notification.
>       */
> -    assert(PCI_FUNC(pci_dev->devfn) == 0);
> -
> -    pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> -                               PCI_EXP_SLTSTA_PDS);
> -    pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> -                        PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> +    if (pci_get_function_0(pci_dev)) {
> +        pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> +                                   PCI_EXP_SLTSTA_PDS);
> +        pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> +                            PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> +    }
>  }
>  
>  static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque)
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index f5e7fd8..379b6e1 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -397,6 +397,7 @@ void pci_for_each_bus_depth_first(PCIBus *bus,
>                                    void *(*begin)(PCIBus *bus, void *parent_state),
>                                    void (*end)(PCIBus *bus, void *state),
>                                    void *parent_state);
> +PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
>  
>  /* Use this wrapper when specific scan order is not required. */
>  static inline
> -- 
> 2.1.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/2] enable multi-function hot-add
  2015-10-26  8:29   ` Michael S. Tsirkin
@ 2015-10-26 11:57     ` Cao jin
  2015-10-26 12:14       ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Cao jin @ 2015-10-26 11:57 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: izumi.taku, alex.williamson, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 10230 bytes --]

Hi,
     warning, there is a long story below O:-)

On 10/26/2015 04:29 PM, Michael S. Tsirkin wrote:
> On Mon, Oct 26, 2015 at 11:29:18AM +0800, Cao jin wrote:
>> Enable pcie device multifunction hot, just ensure the function 0
>> added last, then driver will got the notification to scan all the
>> function in the slot.
>>
>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>> ---
>>   hw/pci/pci.c         | 31 ++++++++++++++++++++++++++++++-
>>   hw/pci/pci_host.c    | 13 +++++++++++--
>>   hw/pci/pcie.c        | 18 +++++++++---------
>>   include/hw/pci/pci.h |  1 +
>>   4 files changed, 51 insertions(+), 12 deletions(-)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index b0bf540..6f43b12 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -847,6 +847,9 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>>       PCIConfigWriteFunc *config_write = pc->config_write;
>>       Error *local_err = NULL;
>>       AddressSpace *dma_as;
>> +    DeviceState *dev = DEVICE(pci_dev);
>> +
>> +    pci_dev->bus = bus;
>>
>>       if (devfn < 0) {
>>           for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices);
>> @@ -864,9 +867,17 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>>                      PCI_SLOT(devfn), PCI_FUNC(devfn), name,
>>                      bus->devices[devfn]->name);
>>           return NULL;
>> +    } else if (dev->hotplugged &&
>> +               pci_get_function_0(pci_dev)) {
>> +        error_setg(errp, "PCI: slot %d function 0 already ocuppied by %s,"
>> +                   " new func %s cannot be exposed to guest.",
>> +                   PCI_SLOT(devfn),
>> +                   bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)]->name,
>> +                   name);
>> +
>> +       return NULL;
>>       }
>>
>> -    pci_dev->bus = bus;
>>       pci_dev->devfn = devfn;
>>       dma_as = pci_device_iommu_address_space(pci_dev);
>>
>> @@ -2454,6 +2465,24 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range)
>>       pci_for_each_device_under_bus(bus, pci_dev_get_w64, range);
>>   }
>>
>> +/* ARI device function number range is 0-255, means has only 1 function0;
>> + * while non-ARI device has 1 function0 in each slot. non-ARI device could
>> + * be PCI or PCIe, and there is up to 32 slots for PCI */
>> +PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
>> +{
>> +    PCIDevice *parent_dev;
>> +
>> +    parent_dev = pci_bridge_get_device(pci_dev->bus);
>> +    if (pcie_cap_is_arifwd_enabled(parent_dev) &&
>> +        pci_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI)) {
>> +        /* ARI enabled */
>> +        return pci_dev->bus->devices[0];
>
> That's wrong I think since software might enable ARI after hotplug.

According to the spec, ARI feature is enabled only based on the 
following 2 conditions:
1. For an ARI Downstream Port, the capability is communicated through 
the Device Capabilities 2 register.
2. For an ARI Device, the capability is communicated through the ARI 
Capability structure.

also according to the driver code pci_configure_ari(), I think my 
implementation does follows spec?

And as you know, only ARI feature is enabled, we return 
pci_dev->bus->devices[0]

> And I'm not sure all functions must have the ARI capability.
>

do you means there maybe the following condition: ARI forwarding bit is 
enabled in downstream port, but the functions below, some have ARI 
Capability structure while the others don`t. Shouldn`t we forbid the 
condition? Because If this condition happens, I am not sure whether the 
device could work normally.

In the IMPLEMENTATION NOTE: ARI Forwarding Enable Being Set 
Inappropriately, It seems the spec don`t want that complicated 
condition? While actually, qemu calls pcie_cap_arifwd_init() in root 
port/downstream port initialization first.

> I don't see why don't you just go by spec:
>

I do read the spec, and also referred the pcie driver code(see 
pci_configure_ari()). I think it is my inaccurate understanding about 
"upstream port" results in my implementation(I also consult PCISIG 
support team for this question, see attachment). The concept of 
"upstream port" in ARI device definition confuse me a lot.

Talking about the definition of ARI device, I always thought the 
"upstream port" should on the ARI device itself(like the figure I drew 
before), or else why the definition add the words "with an upstream 
port"? It seems to me that the emphasis is on "with an upstream port", 
and implies to me that the non-ARI device doesn`t have an upstream port. 
Seeing their replies in attachment, says that I am wrong at this point, 
both of them have an upstream port.(their saying actually make me more 
confused at that time, but now I think I am clear about this concept 
after reading your implementation below)

After reading your implementation, and the PCISIG support team replies 
again, finally I figure out that, "upstream port" in ARI device 
definition is just a port whose position is closer to root complex, and 
the point is, the "upstream port" doesn`t need to exist on the ARI 
device itself, which also means, take Figure 6-13 in PCIe spec 3.1, the 
Root Port A is the "upstream port" for ARI Device X, and the Downstream 
Port D in Switch is the "upstream port" for ARI Device Y. Now, do I 
understand it right?

Hope I can get you understood from the long description above. If I 
still don`t understand it right, please point it out, after all, it 
confused me for a long time.

> static
> bool pcie_has_upstream_port(PCIDevice *dev)
> {
>      PCIDevice *parent_dev = pci_bridge_get_device(pci_dev->bus);
>
>      /*
>       * Device associated with an upstream port.
>       * As there are several types of these, it's easier to check the
>       * parent device: upstream ports are always connected to
>       * root or downstream ports.
>       */
>      return parent_dev &&
> 	    pci_is_express(parent_dev) &&
> 		parent_dev->exp.exp_cap &&
> 		(pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_ROOT_PORT ||
> 		pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_DOWNSTREAM);
> }
>

Assume my understanding is right, which means both ARI and non-ARI 
device have the upstream port(root port or downstream port), could the 
existence of upstream port be the judgment condition?

>
> PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
> {
> 	if (pcie_has_upstream_port(dev)) {
> 	    /* With an upstream PCIE port, we only support 1 device at slot 0 */
>              return pci_dev->bus->devices[0];
> 	} else {
> 	    /* Other bus types might support multiple devices at slots 0-31 */
>              return pci_dev->bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 0)];
> 	}
> }
>

>> +    } else {
>> +        /* no ARI */
>> +        return pci_dev->bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 0)];
>> +    }
>> +}
>> +
>>   static const TypeInfo pci_device_type_info = {
>>       .name = TYPE_PCI_DEVICE,
>>       .parent = TYPE_DEVICE,
>> diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
>> index 3e26f92..63d7d2f 100644
>> --- a/hw/pci/pci_host.c
>> +++ b/hw/pci/pci_host.c
>> @@ -20,6 +20,7 @@
>>
>>   #include "hw/pci/pci.h"
>>   #include "hw/pci/pci_host.h"
>> +#include "hw/pci/pci_bus.h"
>>   #include "trace.h"
>>
>>   /* debug PCI */
>> @@ -75,7 +76,11 @@ void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, int len)
>>       PCIDevice *pci_dev = pci_dev_find_by_addr(s, addr);
>>       uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
>>
>> -    if (!pci_dev) {
>> +    /* non-zero functions are only exposed when function 0 is present,
>> +     * allowing direct removal of unexposed functions.
>> +     */
>> +    if (!pci_dev ||
>> +        (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
>>           return;
>>       }
>>
>> @@ -91,7 +96,11 @@ uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len)
>>       uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
>>       uint32_t val;
>>
>> -    if (!pci_dev) {
>> +    /* non-zero functions are only exposed when function 0 is present,
>> +     * allowing direct removal of unexposed functions.
>> +     */
>> +    if (!pci_dev ||
>> +        (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
>>           return ~0x0;
>>       }
>>
>> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>> index b1adeaf..4ba9501 100644
>> --- a/hw/pci/pcie.c
>> +++ b/hw/pci/pcie.c
>> @@ -249,16 +249,16 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>>           return;
>>       }
>>
>> -    /* TODO: multifunction hot-plug.
>> -     * Right now, only a device of function = 0 is allowed to be
>> -     * hot plugged/unplugged.
>> +    /* To enable multifunction hot-plug, we just ensure the function
>> +     * 0 added last. When function 0 is added, we set the sltsta and
>> +     * inform OS via event notification.
>>        */
>> -    assert(PCI_FUNC(pci_dev->devfn) == 0);
>> -
>> -    pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
>> -                               PCI_EXP_SLTSTA_PDS);
>> -    pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
>> -                        PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
>> +    if (pci_get_function_0(pci_dev)) {
>> +        pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
>> +                                   PCI_EXP_SLTSTA_PDS);
>> +        pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
>> +                            PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
>> +    }
>>   }
>>
>>   static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque)
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index f5e7fd8..379b6e1 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -397,6 +397,7 @@ void pci_for_each_bus_depth_first(PCIBus *bus,
>>                                     void *(*begin)(PCIBus *bus, void *parent_state),
>>                                     void (*end)(PCIBus *bus, void *state),
>>                                     void *parent_state);
>> +PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
>>
>>   /* Use this wrapper when specific scan order is not required. */
>>   static inline
>> --
>> 2.1.0
> .
>

-- 
Yours Sincerely,

Cao Jin

[-- Attachment #2: Re: [Tech Support] question about ARI Device definition.eml --]
[-- Type: message/rfc822, Size: 11437 bytes --]

[-- Attachment #2.1.1: Type: text/plain, Size: 1639 bytes --]

Please see the response in your appended note.  The response is based on the PCIe 3.1 spec.

Regards,
Richard
(Please send responses to this inquiry (and future technical support inquiries) to pcitech@pcisig.com)
Sending technical questions to administration@pcisig.com delays the response to the question.
----------------------------------2242  613
From: Cao jin [mailto:caoj.fnst@cn.fujitsu.com] 
Sent: Friday, October 23, 2015 4:23 AM
To: pcitech@pcisig.com
Subject: [Tech Support] question about ARI Device definition

hello there,

     thanks for quick response to my last mail. I have a new question here.

     the definition of ARI Device in spec: A Device associated with an Upstream Port.
If I understand right, ARI device has an upstream port on the device, while non-ARI device don`t have. 
***********************2242 RESPONSE:  An ARI Endpoint and a non-ARI Endpoint both have an Upstream Port.

But in figure 6-13 of PCIe spec 3.1, on ARI device X & Y, don`t see any notation of upstream port on it, should they have a notation of upstream port?
***********************613 RESPONSE:  it should be obvious that a device connected to a Root Port or switch Downstream Port must be a device with an Upstream Port.

and also, If I understand right, ARI device has a upstream port associated, so the main functionality of upstream port is routing msg to the functions immediately below the port?
***********************2242 RESPONSE:  The main feature of an ARI device is a Function Number field of 8 bits. 
If you have any additional questions, please let us know.
 
--
Yours Sincerely,

Cao Jin

[-- Attachment #2.1.2: Type: text/html, Size: 5932 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/2] enable multi-function hot-add
  2015-10-26 11:57     ` Cao jin
@ 2015-10-26 12:14       ` Michael S. Tsirkin
  2015-10-27  9:39         ` Cao jin
  0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2015-10-26 12:14 UTC (permalink / raw)
  To: Cao jin; +Cc: izumi.taku, alex.williamson, qemu-devel

On Mon, Oct 26, 2015 at 07:57:16PM +0800, Cao jin wrote:
> Hi,
>     warning, there is a long story below O:-)
> 
> On 10/26/2015 04:29 PM, Michael S. Tsirkin wrote:
> >On Mon, Oct 26, 2015 at 11:29:18AM +0800, Cao jin wrote:
> >>Enable pcie device multifunction hot, just ensure the function 0
> >>added last, then driver will got the notification to scan all the
> >>function in the slot.
> >>
> >>Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>


In my opinion, what's confusing you is that you keep
thinking about ARI.
ARI is just a way where PCI_SLOT(devfn) can be != 0
when you are actually just part of the same device.

But with or without ARI, PCIE devices with upstream ports
can only occupy slot 0.

So let's check for that.


> >>---
> >>  hw/pci/pci.c         | 31 ++++++++++++++++++++++++++++++-
> >>  hw/pci/pci_host.c    | 13 +++++++++++--
> >>  hw/pci/pcie.c        | 18 +++++++++---------
> >>  include/hw/pci/pci.h |  1 +
> >>  4 files changed, 51 insertions(+), 12 deletions(-)
> >>
> >>diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>index b0bf540..6f43b12 100644
> >>--- a/hw/pci/pci.c
> >>+++ b/hw/pci/pci.c
> >>@@ -847,6 +847,9 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
> >>      PCIConfigWriteFunc *config_write = pc->config_write;
> >>      Error *local_err = NULL;
> >>      AddressSpace *dma_as;
> >>+    DeviceState *dev = DEVICE(pci_dev);
> >>+
> >>+    pci_dev->bus = bus;
> >>
> >>      if (devfn < 0) {
> >>          for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices);
> >>@@ -864,9 +867,17 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
> >>                     PCI_SLOT(devfn), PCI_FUNC(devfn), name,
> >>                     bus->devices[devfn]->name);
> >>          return NULL;
> >>+    } else if (dev->hotplugged &&
> >>+               pci_get_function_0(pci_dev)) {
> >>+        error_setg(errp, "PCI: slot %d function 0 already ocuppied by %s,"
> >>+                   " new func %s cannot be exposed to guest.",
> >>+                   PCI_SLOT(devfn),
> >>+                   bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)]->name,
> >>+                   name);
> >>+
> >>+       return NULL;
> >>      }
> >>
> >>-    pci_dev->bus = bus;
> >>      pci_dev->devfn = devfn;
> >>      dma_as = pci_device_iommu_address_space(pci_dev);
> >>
> >>@@ -2454,6 +2465,24 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range)
> >>      pci_for_each_device_under_bus(bus, pci_dev_get_w64, range);
> >>  }
> >>
> >>+/* ARI device function number range is 0-255, means has only 1 function0;
> >>+ * while non-ARI device has 1 function0 in each slot. non-ARI device could
> >>+ * be PCI or PCIe, and there is up to 32 slots for PCI */
> >>+PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
> >>+{
> >>+    PCIDevice *parent_dev;
> >>+
> >>+    parent_dev = pci_bridge_get_device(pci_dev->bus);
> >>+    if (pcie_cap_is_arifwd_enabled(parent_dev) &&
> >>+        pci_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI)) {
> >>+        /* ARI enabled */
> >>+        return pci_dev->bus->devices[0];
> >
> >That's wrong I think since software might enable ARI after hotplug.
> 
> According to the spec, ARI feature is enabled only based on the following 2
> conditions:
> 1. For an ARI Downstream Port, the capability is communicated through the
> Device Capabilities 2 register.
> 2. For an ARI Device, the capability is communicated through the ARI
> Capability structure.
> 
> also according to the driver code pci_configure_ari(), I think my
> implementation does follows spec?
> 
> And as you know, only ARI feature is enabled, we return
> pci_dev->bus->devices[0]


Yes but that's not the point.
The point is whether a device can occupy slot != 0.

> >And I'm not sure all functions must have the ARI capability.
> >
> 
> do you means there maybe the following condition: ARI forwarding bit is
> enabled in downstream port, but the functions below, some have ARI
> Capability structure while the others don`t. Shouldn`t we forbid the
> condition? Because If this condition happens, I am not sure whether the
> device could work normally.
> 
> In the IMPLEMENTATION NOTE: ARI Forwarding Enable Being Set Inappropriately,
> It seems the spec don`t want that complicated condition? While actually,
> qemu calls pcie_cap_arifwd_init() in root port/downstream port
> initialization first.
> 
> >I don't see why don't you just go by spec:
> >
> 
> I do read the spec, and also referred the pcie driver code(see
> pci_configure_ari()). I think it is my inaccurate understanding about
> "upstream port" results in my implementation(I also consult PCISIG support
> team for this question, see attachment). The concept of "upstream port" in
> ARI device definition confuse me a lot.
> 
> Talking about the definition of ARI device, I always thought the "upstream
> port" should on the ARI device itself(like the figure I drew before), or
> else why the definition add the words "with an upstream port"? It seems to
> me that the emphasis is on "with an upstream port", and implies to me that
> the non-ARI device doesn`t have an upstream port. Seeing their replies in
> attachment, says that I am wrong at this point, both of them have an
> upstream port.(their saying actually make me more confused at that time, but
> now I think I am clear about this concept after reading your implementation
> below)
> 
> After reading your implementation, and the PCISIG support team replies
> again, finally I figure out that, "upstream port" in ARI device definition
> is just a port whose position is closer to root complex, and the point is,
> the "upstream port" doesn`t need to exist on the ARI device itself, which
> also means, take Figure 6-13 in PCIe spec 3.1, the Root Port A is the
> "upstream port" for ARI Device X, and the Downstream Port D in Switch is the
> "upstream port" for ARI Device Y. Now, do I understand it right?
> 
> Hope I can get you understood from the long description above. If I still
> don`t understand it right, please point it out, after all, it confused me
> for a long time.
> 
> >static
> >bool pcie_has_upstream_port(PCIDevice *dev)
> >{
> >     PCIDevice *parent_dev = pci_bridge_get_device(pci_dev->bus);
> >
> >     /*
> >      * Device associated with an upstream port.
> >      * As there are several types of these, it's easier to check the
> >      * parent device: upstream ports are always connected to
> >      * root or downstream ports.
> >      */
> >     return parent_dev &&
> >	    pci_is_express(parent_dev) &&
> >		parent_dev->exp.exp_cap &&
> >		(pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_ROOT_PORT ||
> >		pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_DOWNSTREAM);
> >}
> >
> 
> Assume my understanding is right, which means both ARI and non-ARI device
> have the upstream port(root port or downstream port), could the existence of
> upstream port be the judgment condition?

This tells us whether we are behind a port that
can address devices in slot != 0.


> >
> >PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
> >{
> >	if (pcie_has_upstream_port(dev)) {
> >	    /* With an upstream PCIE port, we only support 1 device at slot 0 */
> >             return pci_dev->bus->devices[0];
> >	} else {
> >	    /* Other bus types might support multiple devices at slots 0-31 */
> >             return pci_dev->bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 0)];
> >	}
> >}
> >
> 
> >>+    } else {
> >>+        /* no ARI */
> >>+        return pci_dev->bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 0)];
> >>+    }
> >>+}
> >>+
> >>  static const TypeInfo pci_device_type_info = {
> >>      .name = TYPE_PCI_DEVICE,
> >>      .parent = TYPE_DEVICE,
> >>diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
> >>index 3e26f92..63d7d2f 100644
> >>--- a/hw/pci/pci_host.c
> >>+++ b/hw/pci/pci_host.c
> >>@@ -20,6 +20,7 @@
> >>
> >>  #include "hw/pci/pci.h"
> >>  #include "hw/pci/pci_host.h"
> >>+#include "hw/pci/pci_bus.h"
> >>  #include "trace.h"
> >>
> >>  /* debug PCI */
> >>@@ -75,7 +76,11 @@ void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, int len)
> >>      PCIDevice *pci_dev = pci_dev_find_by_addr(s, addr);
> >>      uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
> >>
> >>-    if (!pci_dev) {
> >>+    /* non-zero functions are only exposed when function 0 is present,
> >>+     * allowing direct removal of unexposed functions.
> >>+     */
> >>+    if (!pci_dev ||
> >>+        (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
> >>          return;
> >>      }
> >>
> >>@@ -91,7 +96,11 @@ uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len)
> >>      uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
> >>      uint32_t val;
> >>
> >>-    if (!pci_dev) {
> >>+    /* non-zero functions are only exposed when function 0 is present,
> >>+     * allowing direct removal of unexposed functions.
> >>+     */
> >>+    if (!pci_dev ||
> >>+        (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
> >>          return ~0x0;
> >>      }
> >>
> >>diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> >>index b1adeaf..4ba9501 100644
> >>--- a/hw/pci/pcie.c
> >>+++ b/hw/pci/pcie.c
> >>@@ -249,16 +249,16 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> >>          return;
> >>      }
> >>
> >>-    /* TODO: multifunction hot-plug.
> >>-     * Right now, only a device of function = 0 is allowed to be
> >>-     * hot plugged/unplugged.
> >>+    /* To enable multifunction hot-plug, we just ensure the function
> >>+     * 0 added last. When function 0 is added, we set the sltsta and
> >>+     * inform OS via event notification.
> >>       */
> >>-    assert(PCI_FUNC(pci_dev->devfn) == 0);
> >>-
> >>-    pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> >>-                               PCI_EXP_SLTSTA_PDS);
> >>-    pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> >>-                        PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> >>+    if (pci_get_function_0(pci_dev)) {
> >>+        pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> >>+                                   PCI_EXP_SLTSTA_PDS);
> >>+        pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> >>+                            PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> >>+    }
> >>  }
> >>
> >>  static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque)
> >>diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >>index f5e7fd8..379b6e1 100644
> >>--- a/include/hw/pci/pci.h
> >>+++ b/include/hw/pci/pci.h
> >>@@ -397,6 +397,7 @@ void pci_for_each_bus_depth_first(PCIBus *bus,
> >>                                    void *(*begin)(PCIBus *bus, void *parent_state),
> >>                                    void (*end)(PCIBus *bus, void *state),
> >>                                    void *parent_state);
> >>+PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
> >>
> >>  /* Use this wrapper when specific scan order is not required. */
> >>  static inline
> >>--
> >>2.1.0
> >.
> >
> 
> -- 
> Yours Sincerely,
> 
> Cao Jin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/2] enable multi-function hot-add
  2015-10-26 12:14       ` Michael S. Tsirkin
@ 2015-10-27  9:39         ` Cao jin
  2015-10-27  9:47           ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Cao jin @ 2015-10-27  9:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: izumi.taku, alex.williamson, qemu-devel

hello Michael,
     there is still a long story with many personal understanding below, 
hope to get directions. If all my understanding are right, I think maybe 
I could say: finally I understand your implementation:)

On 10/26/2015 08:14 PM, Michael S. Tsirkin wrote:
> On Mon, Oct 26, 2015 at 07:57:16PM +0800, Cao jin wrote:
>> Hi,
>>      warning, there is a long story below O:-)
>>
>> On 10/26/2015 04:29 PM, Michael S. Tsirkin wrote:
>>> On Mon, Oct 26, 2015 at 11:29:18AM +0800, Cao jin wrote:
>>>> Enable pcie device multifunction hot, just ensure the function 0
>>>> added last, then driver will got the notification to scan all the
>>>> function in the slot.
>>>>
>>>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>
>
> In my opinion, what's confusing you is that you keep
> thinking about ARI.
> ARI is just a way where PCI_SLOT(devfn) can be != 0
> when you are actually just part of the same device.
>
> But with or without ARI, PCIE devices with upstream ports
> can only occupy slot 0.
>

Yes, I think I got you, please see my understanding and correct me 
please if I am wrong:
In the time of PCI bus, there are as many as 21 slots(device) under a 
PCI-PCI bridge, as IDSEL pin can be only connected to AD[31:11], but 
system software still need so scan the bus to enumerate device from 0 to 
31, and each function has 8 functions at most, so there is 
devices[PCI_SLOT_MAX * PCI_FUNC_MAX] under struct PCIBus. But PCIe bus 
is composed of point-to-point Links that interconnect a set of 
components, so physically, there is only 1 slot(device) at the leaf tip 
of the topology, and so PCI_SLOT(devfn) must be 0, and the old(PCI) spec 
says there is 8 functions at most in a device, if there are totally only 
8 functions under a downstream port, that would be a big restriction. 
For backward compatible and other reasons, PCIe designed the ARI 
feature, to extend device up to 256 functions, so we now could have 
PCI_SLOT(devfn) can be != 0.
I think it also could be said that, ARI give the standard that vendors 
could follow to product a PCIe device which could has up to 256 
functions(in reality, at least now, I never heard of a 256 functions 
device, yes I am a newbie about this:P)

summary:
1. PCI device, each slot has 8 functions at most, 32 slots at 
most(actually 21 slots) below a PCI-PCI bridge
2. PCIe device without ARI, only 1 slot with 8 functions at most below 
the downstream port
3. PCIe device with ARI, only 1 slot with 256 functions at most


I did the test as following, start vm with: ./qemu-system-x86_64 -M q35 
-device ioh3420,bus=pcie.0,addr=1c.0,port=1,id=br1,chassis=1 -hda 
/var/lib/libvirt/images/fedora21.qcow2 -smp 2 --enable-kvm -m 1024 
--monitor stdio

then: device_add e1000,multifunction=on,bus=br1,id=net0,addr=2.0

find that guest cannot find net0 at addr=2.0 indeed, and net0 also flash 
and gone via "info pci", so the ARI feature in the future is going to 
fix this kind issue?

Is the understanding above right?

> So let's check for that.
>
>
>>>> ---
>>>>   hw/pci/pci.c         | 31 ++++++++++++++++++++++++++++++-
>>>>   hw/pci/pci_host.c    | 13 +++++++++++--
>>>>   hw/pci/pcie.c        | 18 +++++++++---------
>>>>   include/hw/pci/pci.h |  1 +
>>>>   4 files changed, 51 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>> index b0bf540..6f43b12 100644
>>>> --- a/hw/pci/pci.c
>>>> +++ b/hw/pci/pci.c
>>>> @@ -847,6 +847,9 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>>>>       PCIConfigWriteFunc *config_write = pc->config_write;
>>>>       Error *local_err = NULL;
>>>>       AddressSpace *dma_as;
>>>> +    DeviceState *dev = DEVICE(pci_dev);
>>>> +
>>>> +    pci_dev->bus = bus;
>>>>
>>>>       if (devfn < 0) {
>>>>           for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices);
>>>> @@ -864,9 +867,17 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>>>>                      PCI_SLOT(devfn), PCI_FUNC(devfn), name,
>>>>                      bus->devices[devfn]->name);
>>>>           return NULL;
>>>> +    } else if (dev->hotplugged &&
>>>> +               pci_get_function_0(pci_dev)) {
>>>> +        error_setg(errp, "PCI: slot %d function 0 already ocuppied by %s,"
>>>> +                   " new func %s cannot be exposed to guest.",
>>>> +                   PCI_SLOT(devfn),
>>>> +                   bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)]->name,
>>>> +                   name);
>>>> +
>>>> +       return NULL;
>>>>       }
>>>>
>>>> -    pci_dev->bus = bus;
>>>>       pci_dev->devfn = devfn;
>>>>       dma_as = pci_device_iommu_address_space(pci_dev);
>>>>
>>>> @@ -2454,6 +2465,24 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range)
>>>>       pci_for_each_device_under_bus(bus, pci_dev_get_w64, range);
>>>>   }
>>>>
>>>> +/* ARI device function number range is 0-255, means has only 1 function0;
>>>> + * while non-ARI device has 1 function0 in each slot. non-ARI device could
>>>> + * be PCI or PCIe, and there is up to 32 slots for PCI */
>>>> +PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
>>>> +{
>>>> +    PCIDevice *parent_dev;
>>>> +
>>>> +    parent_dev = pci_bridge_get_device(pci_dev->bus);
>>>> +    if (pcie_cap_is_arifwd_enabled(parent_dev) &&
>>>> +        pci_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI)) {
>>>> +        /* ARI enabled */
>>>> +        return pci_dev->bus->devices[0];
>>>
>>> That's wrong I think since software might enable ARI after hotplug.
>>

I read pci_configure_ari() in linux driver again, find something new to 
me: driver enable ARI only by seeing the ari forwarding bit in upstream 
bridge and the ARI capability register in function 0. if it is like 
this, I think, if we do function hotplug, while want to enable ARI, we 
must ensure function0 has ARI capability register, but driver won`t see 
the other functions who don`t have ARI capability register

>> According to the spec, ARI feature is enabled only based on the following 2
>> conditions:
>> 1. For an ARI Downstream Port, the capability is communicated through the
>> Device Capabilities 2 register.
>> 2. For an ARI Device, the capability is communicated through the ARI
>> Capability structure.
>>
>> also according to the driver code pci_configure_ari(), I think my
>> implementation does follows spec?
>>
>> And as you know, only ARI feature is enabled, we return
>> pci_dev->bus->devices[0]
>

I think I am also wrong here yesterday, the saying may should be: it is 
only non root complex integrated device who return 
pci_dev->bus->devices[0].

>
> Yes but that's not the point.
> The point is whether a device can occupy slot != 0.
>
>>> And I'm not sure all functions must have the ARI capability.
>>>
>>

Based on the understanding above, to enable ARI, function0 must have ARI 
capability, but if the other non-zero functions don`t have, linux driver 
won`t see it

>> do you means there maybe the following condition: ARI forwarding bit is
>> enabled in downstream port, but the functions below, some have ARI
>> Capability structure while the others don`t. Shouldn`t we forbid the
>> condition? Because If this condition happens, I am not sure whether the
>> device could work normally.
>>
>> In the IMPLEMENTATION NOTE: ARI Forwarding Enable Being Set Inappropriately,
>> It seems the spec don`t want that complicated condition? While actually,
>> qemu calls pcie_cap_arifwd_init() in root port/downstream port
>> initialization first.
>>
>>> I don't see why don't you just go by spec:
>>>
>>
>> I do read the spec, and also referred the pcie driver code(see
>> pci_configure_ari()). I think it is my inaccurate understanding about
>> "upstream port" results in my implementation(I also consult PCISIG support
>> team for this question, see attachment). The concept of "upstream port" in
>> ARI device definition confuse me a lot.
>>
>> Talking about the definition of ARI device, I always thought the "upstream
>> port" should on the ARI device itself(like the figure I drew before), or
>> else why the definition add the words "with an upstream port"? It seems to
>> me that the emphasis is on "with an upstream port", and implies to me that
>> the non-ARI device doesn`t have an upstream port. Seeing their replies in
>> attachment, says that I am wrong at this point, both of them have an
>> upstream port.(their saying actually make me more confused at that time, but
>> now I think I am clear about this concept after reading your implementation
>> below)
>>
>> After reading your implementation, and the PCISIG support team replies
>> again, finally I figure out that, "upstream port" in ARI device definition
>> is just a port whose position is closer to root complex, and the point is,
>> the "upstream port" doesn`t need to exist on the ARI device itself, which
>> also means, take Figure 6-13 in PCIe spec 3.1, the Root Port A is the
>> "upstream port" for ARI Device X, and the Downstream Port D in Switch is the
>> "upstream port" for ARI Device Y. Now, do I understand it right?
>>
>> Hope I can get you understood from the long description above. If I still
>> don`t understand it right, please point it out, after all, it confused me
>> for a long time.
>>
>>> static
>>> bool pcie_has_upstream_port(PCIDevice *dev)
>>> {
>>>      PCIDevice *parent_dev = pci_bridge_get_device(pci_dev->bus);
>>>
>>>      /*
>>>       * Device associated with an upstream port.
>>>       * As there are several types of these, it's easier to check the
>>>       * parent device: upstream ports are always connected to
>>>       * root or downstream ports.
>>>       */
>>>      return parent_dev &&
>>> 	    pci_is_express(parent_dev) &&
>>> 		parent_dev->exp.exp_cap &&
>>> 		(pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_ROOT_PORT ||
>>> 		pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_DOWNSTREAM);
>>> }
>>>
>>
>> Assume my understanding is right, which means both ARI and non-ARI device
>> have the upstream port(root port or downstream port), could the existence of
>> upstream port be the judgment condition?
>
> This tells us whether we are behind a port that
> can address devices in slot != 0.
>
>

Seems I find something, according to spec: Endpoints are classified as 
either legacy, PCI Express, or Root Complex Integrated Endpoints. I 
think this is also what you means in comment "As there are several types 
of these"
And "we are behind a port" means the device is not a Root Complex 
Integrated, only the other two types can be behind a port.

Am I right about this?

>>>
>>> PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
>>> {
>>> 	if (pcie_has_upstream_port(dev)) {
>>> 	    /* With an upstream PCIE port, we only support 1 device at slot 0 */
>>>              return pci_dev->bus->devices[0];
>>> 	} else {
>>> 	    /* Other bus types might support multiple devices at slots 0-31 */
>>>              return pci_dev->bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 0)];
>>> 	}
>>> }
>>>
>>
>>>> +    } else {
>>>> +        /* no ARI */
>>>> +        return pci_dev->bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 0)];
>>>> +    }
>>>> +}
>>>> +
>>>>   static const TypeInfo pci_device_type_info = {
>>>>       .name = TYPE_PCI_DEVICE,
>>>>       .parent = TYPE_DEVICE,
>>>> diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
>>>> index 3e26f92..63d7d2f 100644
>>>> --- a/hw/pci/pci_host.c
>>>> +++ b/hw/pci/pci_host.c
>>>> @@ -20,6 +20,7 @@
>>>>
>>>>   #include "hw/pci/pci.h"
>>>>   #include "hw/pci/pci_host.h"
>>>> +#include "hw/pci/pci_bus.h"
>>>>   #include "trace.h"
>>>>
>>>>   /* debug PCI */
>>>> @@ -75,7 +76,11 @@ void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, int len)
>>>>       PCIDevice *pci_dev = pci_dev_find_by_addr(s, addr);
>>>>       uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
>>>>
>>>> -    if (!pci_dev) {
>>>> +    /* non-zero functions are only exposed when function 0 is present,
>>>> +     * allowing direct removal of unexposed functions.
>>>> +     */
>>>> +    if (!pci_dev ||
>>>> +        (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
>>>>           return;
>>>>       }
>>>>
>>>> @@ -91,7 +96,11 @@ uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len)
>>>>       uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
>>>>       uint32_t val;
>>>>
>>>> -    if (!pci_dev) {
>>>> +    /* non-zero functions are only exposed when function 0 is present,
>>>> +     * allowing direct removal of unexposed functions.
>>>> +     */
>>>> +    if (!pci_dev ||
>>>> +        (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
>>>>           return ~0x0;
>>>>       }
>>>>
>>>> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>>>> index b1adeaf..4ba9501 100644
>>>> --- a/hw/pci/pcie.c
>>>> +++ b/hw/pci/pcie.c
>>>> @@ -249,16 +249,16 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>           return;
>>>>       }
>>>>
>>>> -    /* TODO: multifunction hot-plug.
>>>> -     * Right now, only a device of function = 0 is allowed to be
>>>> -     * hot plugged/unplugged.
>>>> +    /* To enable multifunction hot-plug, we just ensure the function
>>>> +     * 0 added last. When function 0 is added, we set the sltsta and
>>>> +     * inform OS via event notification.
>>>>        */
>>>> -    assert(PCI_FUNC(pci_dev->devfn) == 0);
>>>> -
>>>> -    pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
>>>> -                               PCI_EXP_SLTSTA_PDS);
>>>> -    pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
>>>> -                        PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
>>>> +    if (pci_get_function_0(pci_dev)) {
>>>> +        pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
>>>> +                                   PCI_EXP_SLTSTA_PDS);
>>>> +        pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
>>>> +                            PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
>>>> +    }
>>>>   }
>>>>
>>>>   static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque)
>>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>>>> index f5e7fd8..379b6e1 100644
>>>> --- a/include/hw/pci/pci.h
>>>> +++ b/include/hw/pci/pci.h
>>>> @@ -397,6 +397,7 @@ void pci_for_each_bus_depth_first(PCIBus *bus,
>>>>                                     void *(*begin)(PCIBus *bus, void *parent_state),
>>>>                                     void (*end)(PCIBus *bus, void *state),
>>>>                                     void *parent_state);
>>>> +PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
>>>>
>>>>   /* Use this wrapper when specific scan order is not required. */
>>>>   static inline
>>>> --
>>>> 2.1.0
>>> .
>>>
>>
>> --
>> Yours Sincerely,
>>
>> Cao Jin
>
> .
>

-- 
Yours Sincerely,

Cao Jin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/2] enable multi-function hot-add
  2015-10-27  9:39         ` Cao jin
@ 2015-10-27  9:47           ` Michael S. Tsirkin
  2015-10-27 10:39             ` Cao jin
  0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2015-10-27  9:47 UTC (permalink / raw)
  To: Cao jin; +Cc: izumi.taku, alex.williamson, qemu-devel

On Tue, Oct 27, 2015 at 05:39:01PM +0800, Cao jin wrote:
> >>>static
> >>>bool pcie_has_upstream_port(PCIDevice *dev)
> >>>{
> >>>     PCIDevice *parent_dev = pci_bridge_get_device(pci_dev->bus);
> >>>
> >>>     /*
> >>>      * Device associated with an upstream port.
> >>>      * As there are several types of these, it's easier to check the
> >>>      * parent device: upstream ports are always connected to
> >>>      * root or downstream ports.
> >>>      */
> >>>     return parent_dev &&
> >>>	    pci_is_express(parent_dev) &&
> >>>		parent_dev->exp.exp_cap &&
> >>>		(pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_ROOT_PORT ||
> >>>		pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_DOWNSTREAM);
> >>>}
> >>>
> >>
> >>Assume my understanding is right, which means both ARI and non-ARI device
> >>have the upstream port(root port or downstream port), could the existence of
> >>upstream port be the judgment condition?
> >
> >This tells us whether we are behind a port that
> >can address devices in slot != 0.
> >
> >
> 
> Seems I find something, according to spec: Endpoints are classified as
> either legacy, PCI Express, or Root Complex Integrated Endpoints. I think
> this is also what you means in comment "As there are several types of these"
> And "we are behind a port" means the device is not a Root Complex
> Integrated, only the other two types can be behind a port.
> 
> Am I right about this?

No, device can be a bridge itself. It's simple: root and downstream
ports only have 1 slot (and it's a physical one) so you know there's
always 1 device there. It could have multiple functions.

-- 
MST

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/2] enable multi-function hot-add
  2015-10-27  9:47           ` Michael S. Tsirkin
@ 2015-10-27 10:39             ` Cao jin
  0 siblings, 0 replies; 9+ messages in thread
From: Cao jin @ 2015-10-27 10:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: izumi.taku, alex.williamson, qemu-devel

Hi Michael
     Thanks for your quick response

On 10/27/2015 05:47 PM, Michael S. Tsirkin wrote:
> On Tue, Oct 27, 2015 at 05:39:01PM +0800, Cao jin wrote:
>>>>> static
>>>>> bool pcie_has_upstream_port(PCIDevice *dev)
>>>>> {
>>>>>      PCIDevice *parent_dev = pci_bridge_get_device(pci_dev->bus);
>>>>>
>>>>>      /*
>>>>>       * Device associated with an upstream port.
>>>>>       * As there are several types of these, it's easier to check the
>>>>>       * parent device: upstream ports are always connected to
>>>>>       * root or downstream ports.
>>>>>       */
>>>>>      return parent_dev &&
>>>>> 	    pci_is_express(parent_dev) &&
>>>>> 		parent_dev->exp.exp_cap &&
>>>>> 		(pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_ROOT_PORT ||
>>>>> 		pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_DOWNSTREAM);
>>>>> }
>>>>>
>>>>
>>>> Assume my understanding is right, which means both ARI and non-ARI device
>>>> have the upstream port(root port or downstream port), could the existence of
>>>> upstream port be the judgment condition?
>>>
>>> This tells us whether we are behind a port that
>>> can address devices in slot != 0.
>>>
>>>
>>
>> Seems I find something, according to spec: Endpoints are classified as
>> either legacy, PCI Express, or Root Complex Integrated Endpoints. I think
>> this is also what you means in comment "As there are several types of these"
>> And "we are behind a port" means the device is not a Root Complex
>> Integrated, only the other two types can be behind a port.
>>
>> Am I right about this?
>
> No, device can be a bridge itself.

Ah yes, you are saying a PCIe-->PCI(PCI-X) bridge, right?

> It's simple: root and downstream
> ports only have 1 slot (and it's a physical one) so you know there's
> always 1 device there. It could have multiple functions.
>
Yes, I am aware.
It seems there is no big and obvious error about other understandings in 
last mail:) I just want to get myself totally understood about ARI and 
your implementation before using your code. Thank you very much for your 
time and patience:)

And the v6 is on the way

-- 
Yours Sincerely,

Cao Jin

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-10-27 10:39 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-26  3:29 [Qemu-devel] [PATCH v5 0/2] PCI-e device multi-function hot-add support Cao jin
2015-10-26  3:29 ` [Qemu-devel] [PATCH v5 1/2] remove function during multi-function hot-add Cao jin
2015-10-26  3:29 ` [Qemu-devel] [PATCH v5 2/2] enable " Cao jin
2015-10-26  8:29   ` Michael S. Tsirkin
2015-10-26 11:57     ` Cao jin
2015-10-26 12:14       ` Michael S. Tsirkin
2015-10-27  9:39         ` Cao jin
2015-10-27  9:47           ` Michael S. Tsirkin
2015-10-27 10:39             ` Cao jin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).