From: Akihiko Odaki <akihiko.odaki@daynix.com>
To: Ani Sinha <anisinha@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>,
"Michael S. Tsirkin" <mst@redhat.com>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Julia Suvorova <jusual@redhat.com>
Subject: Re: [PATCH v7 5/6] hw/pci: ensure PCIE devices are plugged into only slot 0 of PCIE port
Date: Wed, 5 Jul 2023 19:42:51 +0900
Message-ID: <cec1bb4e-813a-fd27-25a2-4d547b91613e@daynix.com>
In-Reply-To: <C3053F47-2C39-4CB4-BEBD-9EC95CF1C4BC@redhat.com>
On 2023/07/05 14:43, Ani Sinha wrote:
>
>
>> On 05-Jul-2023, at 7:09 AM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>
>>
>>
>> On 2023/07/05 0:07, Ani Sinha wrote:
>>>> On 04-Jul-2023, at 7:58 PM, Igor Mammedov <imammedo@redhat.com> wrote:
>>>>
>>>> On Tue, 4 Jul 2023 19:20:00 +0530
>>>> Ani Sinha <anisinha@redhat.com> wrote:
>>>>
>>>>>> On 04-Jul-2023, at 6:18 PM, Igor Mammedov <imammedo@redhat.com> wrote:
>>>>>>
>>>>>> On Tue, 4 Jul 2023 21:02:09 +0900
>>>>>> Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>>>
>>>>>>> On 2023/07/04 20:59, Ani Sinha wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 04-Jul-2023, at 5:24 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>>>>>>
>>>>>>>>> On 2023/07/04 20:25, Ani Sinha wrote:
>>>>>>>>>> PCI Express ports only have one slot, so PCI Express devices can only be
>>>>>>>>>> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
>>>>>>>>>> invalid configuration is used. We may enforce this more strongly later on once
>>>>>>>>>> we get more clarity on whether we are introducing a bad regression for users
>>>>>>>>>> currently using the wrong configuration.
>>>>>>>>>> The change has been tested to not break or alter behaviors of ARI capable
>>>>>>>>>> devices by instantiating seven vfs on an emulated igb device (the maximum
>>>>>>>>>> number of vfs the linux igb driver supports). The vfs instantiated correctly
>>>>>>>>>> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
>>>>>>>>>> representation.
>>>>>>>>>> CC: jusual@redhat.com
>>>>>>>>>> CC: imammedo@redhat.com
>>>>>>>>>> CC: mst@redhat.com
>>>>>>>>>> CC: akihiko.odaki@daynix.com
>>>>>>>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>>>>>>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>>>>>>>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
>>>>>>>>>> ---
>>>>>>>>>> hw/pci/pci.c | 15 +++++++++++++++
>>>>>>>>>> 1 file changed, 15 insertions(+)
>>>>>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>>>>>> index e2eb4c3b4a..47517ba3db 100644
>>>>>>>>>> --- a/hw/pci/pci.c
>>>>>>>>>> +++ b/hw/pci/pci.c
>>>>>>>>>> @@ -65,6 +65,7 @@ bool pci_available = true;
>>>>>>>>>> static char *pcibus_get_dev_path(DeviceState *dev);
>>>>>>>>>> static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>>>>>>>>> static void pcibus_reset(BusState *qbus);
>>>>>>>>>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>>>>>>>>>> static Property pci_props[] = {
>>>>>>>>>> DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>>>>>>>>> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>> +    /*
>>>>>>>>>> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
>>>>>>>>>> +     * PCI interpretation as all five bits reserved for slot addresses are
>>>>>>>>>> +     * also used for function bits for the various vfs. Ignore that case.
>>>>>>>>>
>>>>>>>>> You don't have to mention SR-IOV; it affects all ARI-capable devices. A PF can also have a non-zero slot number in the conventional interpretation, so you shouldn't call it a VF either.
>>>>>>>>
>>>>>>>> Can you please help write a comment that explains this properly for all cases - ARI/non-ARI, PFs and VFs? Once everyone agrees that it's clear and correct, I will re-spin.
>>>>>>>
>>>>>>> Simply, you can say:
>>>>>>> With ARI, the slot number field in the conventional PCI interpretation
>>>>>>> can have a non-zero value as the field bits are reused to extend the
>>>>>>> function number bits. Ignore that case.
>>>>>>
>>>>>> mentioning 'conventional PCI interpretation' in comment and then immediately
>>>>>> checking 'pci_is_express(pci_dev)' is confusing. Since comment belongs
>>>>>> only to the PCIe branch, it would be better to talk only about PCIe stuff
>>>>>> and refer to the relevant portions of the spec.
>>>>>
>>>>> Ok so how about this?
>>>>>
>>>>> * With ARI, devices can have non-zero slot in the traditional BDF
>>>>> * representation as all five bits reserved for slot addresses are
>>>>> * also used for function bits. Ignore that case.
>>>>
>>>> you still refer to traditional (which I misread as 'conventional'),
>>>> steal the Linux comment and augment it with ARI if necessary,
>>>> something like this (probably needs some more massaging):
>>> The comment messaging in these patches seems to exceed the value of the patch itself :-)
>>> How about this?
>>> /*
>>> * A PCIe Downstream Port normally leads to a Link with only Device
>>> * 0 on it (PCIe spec r3.1, sec 7.3.1).
>>> * With ARI, PCI_SLOT() can return non-zero value as all five bits
>>> * reserved for slot addresses are also used for function bits.
>>> * Hence, ignore ARI capable devices.
>>> */
>>
>> Perhaps: s/normally leads to/must lead to/
>>
>> From the kernel perspective, they may need to deal with quirky hardware that does not conform to the specification, but from the QEMU perspective, it is what we *must* conform to.
>
> PCI base spec 4.0, rev 3, section 7.3.1 says:
>
> "
> Downstream Ports that do not have ARI Forwarding enabled must associate only Device 0 with the device attached to the Logical Bus representing the Link from the Port. Configuration Requests targeting the Bus Number associated with a Link specifying Device Number 0 are delivered to the device attached to the Link; Configuration Requests specifying all other Device Numbers (1-31) must be terminated by the Switch Downstream Port or the Root Port with an Unsupported Request Completion Status (equivalent to Master Abort in PCI). Non-ARI Devices must not assume that Device Number 0 is associated with their Upstream Port, but must capture their assigned Device Number as discussed in Section 2.2.6.2. Non-ARI Devices must respond to all Type 0 Configuration Read Requests, regardless of the Device Number specified in the Request.
>
> …
>
> With an ARI Device, its Device Number is implied to be 0 rather than specified by a field within an ID. The traditional 5-bit Device Number and 3-bit Function Number fields in its associated Routing IDs, Requester IDs, and Completer IDs are interpreted as a single 8-bit Function Number. See Section 6.13. Any Type 0 Configuration Request targeting an unimplemented Function in an ARI Device must be handled as an Unsupported Request.
>
> “
>
> So it seems they do indeed use the “must” clause. I prefer to quote the spec as verbatim as possible. Hence, this is what I am going with, and then I will be done with this patchset:
>
> /*
> * A PCIe Downstream Port that does not have ARI Forwarding enabled must
> * associate only Device 0 with the device attached to the bus
> * representing the Link from the Port (PCIe base spec rev 4.0 ver 0.3,
> * sec 7.3.1).
> * With ARI, PCI_SLOT() can return a non-zero value as the traditional
> * 5-bit Device Number and 3-bit Function Number fields in its associated
> * Routing IDs, Requester IDs and Completer IDs are interpreted as a
> * single 8-bit Function Number. Hence, ignore ARI capable devices.
> */
Looks perfect.
>
>
>>
>> Otherwise looks good to me.
>>
>>>>
>>>>
>>>> /*
>>>> * A PCIe Downstream Port normally leads to a Link with only Device
>>>> * 0 on it (PCIe spec r3.1, sec 7.3.1).
>>>> * However PCI_SLOT() is broken if ARI is enabled, hence work around it
>>>> * by skipping the check if the latter cap is present.
>>>> */
>>>>
>>>>>
>>>>>
>>>>>> (for example, see how it's done in kernel code: only_one_child(...))
>>>>>>
>>>>>> PS:
>>>>>> kernel can be forced to scan for !0 device numbers, but that's rather
>>>>>> a hack, so we shouldn't really care about that.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> +     */
>>>>>>>>>> +    if (pci_is_express(pci_dev) &&
>>>>>>>>>> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
>>>>>>>>>> +        pcie_has_upstream_port(pci_dev) &&
>>>>>>>>>> +        PCI_SLOT(pci_dev->devfn)) {
>>>>>>>>>> +        warn_report("PCI: slot %d is not valid for %s,"
>>>>>>>>>> +                    " parent device only allows plugging into slot 0.",
>>>>>>>>>> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> if (pci_dev->failover_pair_id) {
>>>>>>>>>> if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>>>>>>>>>> error_setg(errp, "failover primary device must be on "
>>>>>
>>>>
>>
>
Thread overview: 23+ messages
2023-07-04 11:25 [PATCH v7 0/6] test and QEMU fixes to ensure proper PCIE device usage Ani Sinha
2023-07-04 11:25 ` [PATCH v7 1/6] tests/acpi: allow changes in DSDT.noacpihp table blob Ani Sinha
2023-07-04 11:25 ` [PATCH v7 2/6] tests/acpi/bios-tables-test: use the correct slot on the pcie-root-port Ani Sinha
2023-07-04 11:25 ` [PATCH v7 3/6] tests/acpi/bios-tables-test: update acpi blob q35/DSDT.noacpihp Ani Sinha
2023-07-04 11:25 ` [PATCH v7 4/6] tests/qtest/hd-geo-test: fix incorrect pcie-root-port usage and simplify test Ani Sinha
2023-07-04 11:25 ` [PATCH v7 5/6] hw/pci: ensure PCIE devices are plugged into only slot 0 of PCIE port Ani Sinha
2023-07-04 11:38 ` Ani Sinha
2023-07-04 11:54 ` Akihiko Odaki
2023-07-04 11:59 ` Ani Sinha
2023-07-04 12:02 ` Akihiko Odaki
2023-07-04 12:08 ` Ani Sinha
2023-07-04 12:09 ` Akihiko Odaki
2023-07-04 12:28 ` Ani Sinha
2023-07-04 12:48 ` Igor Mammedov
2023-07-04 13:50 ` Ani Sinha
2023-07-04 14:28 ` Igor Mammedov
2023-07-04 15:07 ` Ani Sinha
2023-07-05 1:39 ` Akihiko Odaki
2023-07-05 5:43 ` Ani Sinha
2023-07-05 10:42 ` Akihiko Odaki [this message]
2023-07-04 11:25 ` [PATCH v7 6/6] hw/pci: add comment explaining the reason for checking function 0 in hotplug Ani Sinha
2023-07-04 12:15 ` Igor Mammedov
2023-07-04 12:31 ` Ani Sinha