* [PATCH v2] PCI: IOV: read SRIOV_NUM_VF after enabling ARI
@ 2015-10-08 15:20 Ben Shelton
2015-10-15 17:58 ` Bjorn Helgaas
2015-10-15 19:31 ` Bjorn Helgaas
0 siblings, 2 replies; 9+ messages in thread
From: Ben Shelton @ 2015-10-08 15:20 UTC (permalink / raw)
To: helgaas; +Cc: bhelgaas, linux-pci, linux-kernel, Ben Shelton
For some SR-IOV devices, the number of available virtual functions increases
after enabling ARI. Currently, SRIOV_NUM_VF is read and saved off before the
ARI control bit is enabled in SRIOV_CTRL. This causes an issue when VFs are
enabled.
At device init, SRIOV_INITIAL_VF and SRIOV_NUM_VF are specified to contain the
number of available VFs for the device. sriov_enable() does a sanity check
that SRIOV_INITIAL_VF is not greater than iov->total_VFs, the saved-off value
of SRIOV_NUM_VF. Since the value of both SRIOV_INITIAL_VF and SRIOV_NUM_VF has
increased after enabling the ARI bit, the check fails, and the VFs cannot be
enabled.
To fix the issue, write SRIOV_CTRL first, and then read SRIOV_NUM_VF.
Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
---
Changes since v1:
* Moved read of SRIOV_NUM_VF rather than re-reading it if ARI was enabled.
drivers/pci/iov.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ee0ebff..0174044 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -399,10 +399,6 @@ static int sriov_init(struct pci_dev *dev, int pos)
ssleep(1);
}
- pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &total);
- if (!total)
- return 0;
-
ctrl = 0;
list_for_each_entry(pdev, &dev->bus->devices, bus_list)
if (pdev->is_physfn)
@@ -414,6 +410,11 @@ static int sriov_init(struct pci_dev *dev, int pos)
found:
pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
+
+ pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &total);
+ if (!total)
+ return 0;
+
pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, 0);
pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, &offset);
pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, &stride);
--
1.9.5
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] PCI: IOV: read SRIOV_NUM_VF after enabling ARI
2015-10-08 15:20 [PATCH v2] PCI: IOV: read SRIOV_NUM_VF after enabling ARI Ben Shelton
@ 2015-10-15 17:58 ` Bjorn Helgaas
2015-10-15 20:00 ` Alexander Duyck
2015-10-15 19:31 ` Bjorn Helgaas
1 sibling, 1 reply; 9+ messages in thread
From: Bjorn Helgaas @ 2015-10-15 17:58 UTC (permalink / raw)
To: Ben Shelton; +Cc: bhelgaas, linux-pci, linux-kernel
Hi Ben,
On Thu, Oct 08, 2015 at 10:20:17AM -0500, Ben Shelton wrote:
> For some SR-IOV devices, the number of available virtual functions increases
> after enabling ARI. Currently, SRIOV_NUM_VF is read and saved off before the
> ARI control bit is enabled in SRIOV_CTRL. This causes an issue when VFs are
> enabled.
>
> At device init, SRIOV_INITIAL_VF and SRIOV_NUM_VF are specified to contain the
> number of available VFs for the device. sriov_enable() does a sanity check
> that SRIOV_INITIAL_VF is not greater than iov->total_VFs, the saved-off value
> of SRIOV_NUM_VF. Since the value of both SRIOV_INITIAL_VF and SRIOV_NUM_VF has
> increased after enabling the ARI bit, the check fails, and the VFs cannot be
> enabled.
>
> To fix the issue, write SRIOV_CTRL first, and then read SRIOV_NUM_VF.
I think you mean PCI_SRIOV_TOTAL_VR (not NUM_VF), right?
This is interesting because the spec says TotalVFs is HwInit, which
means it's read-only, and it doesn't mention anything about it
changing when ARIis enabled. I can see why it would change in that
case, so maybe this is just a goof in the spec.
Bjorn
> Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
> ---
>
> Changes since v1:
> * Moved read of SRIOV_NUM_VF rather than re-reading it if ARI was enabled.
>
> drivers/pci/iov.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index ee0ebff..0174044 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -399,10 +399,6 @@ static int sriov_init(struct pci_dev *dev, int pos)
> ssleep(1);
> }
>
> - pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &total);
> - if (!total)
> - return 0;
> -
> ctrl = 0;
> list_for_each_entry(pdev, &dev->bus->devices, bus_list)
> if (pdev->is_physfn)
> @@ -414,6 +410,11 @@ static int sriov_init(struct pci_dev *dev, int pos)
>
> found:
> pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
> +
> + pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &total);
> + if (!total)
> + return 0;
> +
> pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, 0);
> pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, &offset);
> pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, &stride);
> --
> 1.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] PCI: IOV: read SRIOV_NUM_VF after enabling ARI
2015-10-15 17:58 ` Bjorn Helgaas
@ 2015-10-15 20:00 ` Alexander Duyck
2015-10-15 21:36 ` Bjorn Helgaas
0 siblings, 1 reply; 9+ messages in thread
From: Alexander Duyck @ 2015-10-15 20:00 UTC (permalink / raw)
To: Bjorn Helgaas, Ben Shelton; +Cc: bhelgaas, linux-pci, linux-kernel
On 10/15/2015 10:58 AM, Bjorn Helgaas wrote:
> Hi Ben,
>
> On Thu, Oct 08, 2015 at 10:20:17AM -0500, Ben Shelton wrote:
>> For some SR-IOV devices, the number of available virtual functions increases
>> after enabling ARI. Currently, SRIOV_NUM_VF is read and saved off before the
>> ARI control bit is enabled in SRIOV_CTRL. This causes an issue when VFs are
>> enabled.
>>
>> At device init, SRIOV_INITIAL_VF and SRIOV_NUM_VF are specified to contain the
>> number of available VFs for the device. sriov_enable() does a sanity check
>> that SRIOV_INITIAL_VF is not greater than iov->total_VFs, the saved-off value
>> of SRIOV_NUM_VF. Since the value of both SRIOV_INITIAL_VF and SRIOV_NUM_VF has
>> increased after enabling the ARI bit, the check fails, and the VFs cannot be
>> enabled.
>>
>> To fix the issue, write SRIOV_CTRL first, and then read SRIOV_NUM_VF.
>
> I think you mean PCI_SRIOV_TOTAL_VR (not NUM_VF), right?
>
> This is interesting because the spec says TotalVFs is HwInit, which
> means it's read-only, and it doesn't mention anything about it
> changing when ARIis enabled. I can see why it would change in that
> case, so maybe this is just a goof in the spec.
>
> Bjorn
I think it is supposed to be HwInit because changing the value can cause
issues with resource allocation for the VFs. Specifically if the number
of VFs increases after the BIOS has come through and assigned MMIO
resources it is possible that there may not be resources available.
I suspect we are going to end up having to quirk a number of devices in
the future because of this as I can see this easily causing issues.
- Alex
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] PCI: IOV: read SRIOV_NUM_VF after enabling ARI
2015-10-15 20:00 ` Alexander Duyck
@ 2015-10-15 21:36 ` Bjorn Helgaas
2015-10-15 22:14 ` Alexander Duyck
2015-10-16 16:56 ` Ben Shelton
0 siblings, 2 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2015-10-15 21:36 UTC (permalink / raw)
To: Alexander Duyck; +Cc: Ben Shelton, bhelgaas, linux-pci, linux-kernel
On Thu, Oct 15, 2015 at 01:00:55PM -0700, Alexander Duyck wrote:
> On 10/15/2015 10:58 AM, Bjorn Helgaas wrote:
> >Hi Ben,
> >
> >On Thu, Oct 08, 2015 at 10:20:17AM -0500, Ben Shelton wrote:
> >>For some SR-IOV devices, the number of available virtual functions increases
> >>after enabling ARI. Currently, SRIOV_NUM_VF is read and saved off before the
> >>ARI control bit is enabled in SRIOV_CTRL. This causes an issue when VFs are
> >>enabled.
> >>
> >>At device init, SRIOV_INITIAL_VF and SRIOV_NUM_VF are specified to contain the
> >>number of available VFs for the device. sriov_enable() does a sanity check
> >>that SRIOV_INITIAL_VF is not greater than iov->total_VFs, the saved-off value
> >>of SRIOV_NUM_VF. Since the value of both SRIOV_INITIAL_VF and SRIOV_NUM_VF has
> >>increased after enabling the ARI bit, the check fails, and the VFs cannot be
> >>enabled.
> >>
> >>To fix the issue, write SRIOV_CTRL first, and then read SRIOV_NUM_VF.
> >
> >I think you mean PCI_SRIOV_TOTAL_VR (not NUM_VF), right?
> >
> >This is interesting because the spec says TotalVFs is HwInit, which
> >means it's read-only, and it doesn't mention anything about it
> >changing when ARIis enabled. I can see why it would change in that
> >case, so maybe this is just a goof in the spec.
>
> I think it is supposed to be HwInit because changing the value can
> cause issues with resource allocation for the VFs. Specifically if
> the number of VFs increases after the BIOS has come through and
> assigned MMIO resources it is possible that there may not be
> resources available.
Maybe, although sufficiently smart software could deal with that by
reassigning resources. Theoretically, anyway.
> I suspect we are going to end up having to quirk a number of devices
> in the future because of this as I can see this easily causing
> issues.
I guess the issue if we made this change would be:
- BIOS sees "ARI Capable Hierarchy" is zero
- BIOS sees TotalVFs = X
- BIOS allocates space for X VFs (size = "S * X")
- Linux sets ARI Capable Hierarchy
- Linux sees TotalVFs = X + Y
- Linux reads SR-IOV BAR, computes size as "S * (X + Y)"
- Linux tries to claim SR-IOV BAR, but fails because size is now too
large to fit where BIOS put it
Right? What sort of quirk would you envision? Something to keep us
from increasing "total" beyond what it was before we turned on ARI
Capable?
What problem does this patch solve, Ben? I assume you have devices
that do change TotalVFs when ARI is enabled, and you do want the new
value?
Or is the problem something like the following:
- ...
- Linux PCI core sees TotalVFs = X (saved as iov->total_VFs)
- Linux sets ARI Capable Hierarchy
- Device changes TotalVFs to X + Y (but PCI core doesn't notice)
- Driver reads TotalVFs and sees X + Y
- Driver attempts pci_enable_sriov(dev, X + Y), which fails because
sriov_enable() sees "X + Y > iov->total_VFs"
I'm a little dubious about drivers reading the SRIOV capability
directly, so maybe this is a symptom of deeper problems.
Bjorn
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] PCI: IOV: read SRIOV_NUM_VF after enabling ARI
2015-10-15 21:36 ` Bjorn Helgaas
@ 2015-10-15 22:14 ` Alexander Duyck
2015-10-16 16:56 ` Ben Shelton
1 sibling, 0 replies; 9+ messages in thread
From: Alexander Duyck @ 2015-10-15 22:14 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: Ben Shelton, bhelgaas, linux-pci, linux-kernel
On 10/15/2015 02:36 PM, Bjorn Helgaas wrote:
> On Thu, Oct 15, 2015 at 01:00:55PM -0700, Alexander Duyck wrote:
>> On 10/15/2015 10:58 AM, Bjorn Helgaas wrote:
>>> Hi Ben,
>>>
>>> On Thu, Oct 08, 2015 at 10:20:17AM -0500, Ben Shelton wrote:
>>>> For some SR-IOV devices, the number of available virtual functions increases
>>>> after enabling ARI. Currently, SRIOV_NUM_VF is read and saved off before the
>>>> ARI control bit is enabled in SRIOV_CTRL. This causes an issue when VFs are
>>>> enabled.
>>>>
>>>> At device init, SRIOV_INITIAL_VF and SRIOV_NUM_VF are specified to contain the
>>>> number of available VFs for the device. sriov_enable() does a sanity check
>>>> that SRIOV_INITIAL_VF is not greater than iov->total_VFs, the saved-off value
>>>> of SRIOV_NUM_VF. Since the value of both SRIOV_INITIAL_VF and SRIOV_NUM_VF has
>>>> increased after enabling the ARI bit, the check fails, and the VFs cannot be
>>>> enabled.
>>>>
>>>> To fix the issue, write SRIOV_CTRL first, and then read SRIOV_NUM_VF.
>>> I think you mean PCI_SRIOV_TOTAL_VR (not NUM_VF), right?
>>>
>>> This is interesting because the spec says TotalVFs is HwInit, which
>>> means it's read-only, and it doesn't mention anything about it
>>> changing when ARIis enabled. I can see why it would change in that
>>> case, so maybe this is just a goof in the spec.
>> I think it is supposed to be HwInit because changing the value can
>> cause issues with resource allocation for the VFs. Specifically if
>> the number of VFs increases after the BIOS has come through and
>> assigned MMIO resources it is possible that there may not be
>> resources available.
> Maybe, although sufficiently smart software could deal with that by
> reassigning resources. Theoretically, anyway.
>
>> I suspect we are going to end up having to quirk a number of devices
>> in the future because of this as I can see this easily causing
>> issues.
> I guess the issue if we made this change would be:
>
> - BIOS sees "ARI Capable Hierarchy" is zero
> - BIOS sees TotalVFs = X
> - BIOS allocates space for X VFs (size = "S * X")
> - Linux sets ARI Capable Hierarchy
> - Linux sees TotalVFs = X + Y
> - Linux reads SR-IOV BAR, computes size as "S * (X + Y)"
> - Linux tries to claim SR-IOV BAR, but fails because size is now too
> large to fit where BIOS put it
>
> Right? What sort of quirk would you envision? Something to keep us
> from increasing "total" beyond what it was before we turned on ARI
> Capable?
The thing we would have to do in such a situation is force a
reallocation of the BARs in the SR-IOV area. Maybe instead of adding a
quirk we could just add code here so that if totalVFs increases after we
set ARI we clear the BAR registers and force reallocation. If I am not
mistaken the reallocation for unassigned bars would take place after
this code is run so it is probably the right place to do it.
> What problem does this patch solve, Ben? I assume you have devices
> that do change TotalVFs when ARI is enabled, and you do want the new
> value?
>
> Or is the problem something like the following:
>
> - ...
> - Linux PCI core sees TotalVFs = X (saved as iov->total_VFs)
> - Linux sets ARI Capable Hierarchy
> - Device changes TotalVFs to X + Y (but PCI core doesn't notice)
> - Driver reads TotalVFs and sees X + Y
> - Driver attempts pci_enable_sriov(dev, X + Y), which fails because
> sriov_enable() sees "X + Y > iov->total_VFs"
>
> I'm a little dubious about drivers reading the SRIOV capability
> directly, so maybe this is a symptom of deeper problems.
I don't think the issue is the drivers reading the SR-IOV config, it is
likely the end users. They will want to get full use of the device and
they would see the config lists something like 64 VFs being available
via lspci, but the kernel would have them capped at 7.
I think instead of just moving the read we should read it before and
after. If the value increases we should just drop the contents of the
base address registers so that they can be reallocated now that the
memory footprint has changed.
- Alex
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] PCI: IOV: read SRIOV_NUM_VF after enabling ARI
2015-10-15 21:36 ` Bjorn Helgaas
2015-10-15 22:14 ` Alexander Duyck
@ 2015-10-16 16:56 ` Ben Shelton
2015-10-16 18:07 ` Bjorn Helgaas
1 sibling, 1 reply; 9+ messages in thread
From: Ben Shelton @ 2015-10-16 16:56 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: Alexander Duyck, bhelgaas, linux-pci, linux-kernel
Hi Bjorn,
> What problem does this patch solve, Ben? I assume you have devices
> that do change TotalVFs when ARI is enabled, and you do want the new
> value?
>
> Or is the problem something like the following:
>
> - ...
> - Linux PCI core sees TotalVFs = X (saved as iov->total_VFs)
> - Linux sets ARI Capable Hierarchy
> - Device changes TotalVFs to X + Y (but PCI core doesn't notice)
> - Driver reads TotalVFs and sees X + Y
> - Driver attempts pci_enable_sriov(dev, X + Y), which fails because
> sriov_enable() sees "X + Y > iov->total_VFs"
Here's a short snippet from the databook for the PCI Express controller we're
using:
"Supports two sets of VF Stride, First VF Offset, InitialVFs, and TotalVFs
registers per PF—one each for ARI and non-ARI hierarchies. Selection is
performed by host software through the ARI Capable Hierarchy bit of the Control
register in the PF0 SR-IOV capability structure."
The values in InitialVFs and TotalVFs are HWinit for each set of registers.
So the issue this is intended to fix is the following:
- Linux PCI core sees TotalVFs = X (saved as iov->total_VFs).
- Linux sets ARI Capable Hierarchy.
- Device switches to exposing the second set of registers, where
InitialVFs = TotalVFs = Y (where Y > X).
- User enables one or more VFs on the device, e.g. by writing a value to
sriov_numvfs in the sysfs.
- Driver calls pci_enable_sriov() for the device, which then calls
sriov_enable(). sriov_enable() reads InitialVFs (= Y) and then checks if it's
greater than iov->total_VFs (= X). Since Y > X, the comparison is true, so
sriov_enable() fails out and returns -EIO.
>
> I'm a little dubious about drivers reading the SRIOV capability
> directly, so maybe this is a symptom of deeper problems.
I agree that the driver should not be reading the capability directly, but from
what I understand, it's intended for the device itself to do this. From the PCI
SR-IOV spec revision 1.1:
"ARI Capable Hierarchy is a hint to the Device that ARI has been enabled in the
Root Port or Switch Downstream Port immediately above the Device."
Ben
>
> Bjorn
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] PCI: IOV: read SRIOV_NUM_VF after enabling ARI
2015-10-16 16:56 ` Ben Shelton
@ 2015-10-16 18:07 ` Bjorn Helgaas
0 siblings, 0 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2015-10-16 18:07 UTC (permalink / raw)
To: Ben Shelton; +Cc: Alexander Duyck, bhelgaas, linux-pci, linux-kernel
Hi Ben,
Thanks for the detailed response!
On Fri, Oct 16, 2015 at 11:56:28AM -0500, Ben Shelton wrote:
> Hi Bjorn,
>
> > What problem does this patch solve, Ben? I assume you have devices
> > that do change TotalVFs when ARI is enabled, and you do want the new
> > value?
> >
> > Or is the problem something like the following:
> >
> > - ...
> > - Linux PCI core sees TotalVFs = X (saved as iov->total_VFs)
> > - Linux sets ARI Capable Hierarchy
> > - Device changes TotalVFs to X + Y (but PCI core doesn't notice)
> > - Driver reads TotalVFs and sees X + Y
> > - Driver attempts pci_enable_sriov(dev, X + Y), which fails because
> > sriov_enable() sees "X + Y > iov->total_VFs"
>
> Here's a short snippet from the databook for the PCI Express controller we're
> using:
>
> "Supports two sets of VF Stride, First VF Offset, InitialVFs, and TotalVFs
> registers per PF—one each for ARI and non-ARI hierarchies. Selection is
> performed by host software through the ARI Capable Hierarchy bit of the Control
> register in the PF0 SR-IOV capability structure."
>
> The values in InitialVFs and TotalVFs are HWinit for each set of registers.
The HwInit description says "bits are read-only after initialization
and can only be reset (for write-once by firmware) with 'Power Good
Reset'." I don't see any provision for different values based on a
control register bit, so I think this device is actually out of spec.
We should be able to deal with it, so it's not that big a deal, but we
will have to keep it in mind and probably mention it in a comment in
the code.
> So the issue this is intended to fix is the following:
>
> - Linux PCI core sees TotalVFs = X (saved as iov->total_VFs).
> - Linux sets ARI Capable Hierarchy.
> - Device switches to exposing the second set of registers, where
> InitialVFs = TotalVFs = Y (where Y > X).
> - User enables one or more VFs on the device, e.g. by writing a value to
> sriov_numvfs in the sysfs.
> - Driver calls pci_enable_sriov() for the device, which then calls
> sriov_enable(). sriov_enable() reads InitialVFs (= Y) and then checks if it's
> greater than iov->total_VFs (= X). Since Y > X, the comparison is true, so
> sriov_enable() fails out and returns -EIO.
I think there are two problems here:
1) We should be reading some registers together to make sure we get
consistent values. For example, we always read VFOffset and
VFStride immediately after writing NumVFs. I think we should
read InitialVFs and TotalVFs together. I don't see the point of
reading TotalVFs in sriov_init() and reading InitialVFs in
sriov_enable(). If we read them both in sriov_init(), I don't
think we'd have this problem of sriov_enable() returning -EIO.
2) To work around this non-compliant device, we should read InitialVFs
and TotalVFs after setting the ARI bit.
Ideally, I think this would be two patches: one to move the InitialVFs
read from sriov_enable() to sriov_init(), and a second to move the
pair from before setting ARI to after.
> > I'm a little dubious about drivers reading the SRIOV capability
> > directly, so maybe this is a symptom of deeper problems.
>
> I agree that the driver should not be reading the capability directly, but from
> what I understand, it's intended for the device itself to do this. From the PCI
> SR-IOV spec revision 1.1:
>
> "ARI Capable Hierarchy is a hint to the Device that ARI has been enabled in the
> Root Port or Switch Downstream Port immediately above the Device."
Sure, of course, the device should behave differently based on how the
registers are programmed; that's the whole point of having writable
registers. I think the particular case of the device changing HwInit
registers is out of spec, but changing things like VFOffset and
VFStride is completely expected.
Bjorn
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] PCI: IOV: read SRIOV_NUM_VF after enabling ARI
2015-10-08 15:20 [PATCH v2] PCI: IOV: read SRIOV_NUM_VF after enabling ARI Ben Shelton
2015-10-15 17:58 ` Bjorn Helgaas
@ 2015-10-15 19:31 ` Bjorn Helgaas
2015-10-21 20:52 ` Bjorn Helgaas
1 sibling, 1 reply; 9+ messages in thread
From: Bjorn Helgaas @ 2015-10-15 19:31 UTC (permalink / raw)
To: Ben Shelton; +Cc: bhelgaas, linux-pci, linux-kernel
On Thu, Oct 08, 2015 at 10:20:17AM -0500, Ben Shelton wrote:
> For some SR-IOV devices, the number of available virtual functions increases
> after enabling ARI. Currently, SRIOV_NUM_VF is read and saved off before the
> ARI control bit is enabled in SRIOV_CTRL. This causes an issue when VFs are
> enabled.
>
> At device init, SRIOV_INITIAL_VF and SRIOV_NUM_VF are specified to contain the
> number of available VFs for the device. sriov_enable() does a sanity check
> that SRIOV_INITIAL_VF is not greater than iov->total_VFs, the saved-off value
> of SRIOV_NUM_VF. Since the value of both SRIOV_INITIAL_VF and SRIOV_NUM_VF has
> increased after enabling the ARI bit, the check fails, and the VFs cannot be
> enabled.
>
> To fix the issue, write SRIOV_CTRL first, and then read SRIOV_NUM_VF.
>
> Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
I applied this as follows to pci/virtualization for v4.4, thanks, Ben!
This is on top of a NumVFs-related patch, so the diff looks slightly
different, but I think it's functionally equivalent.
commit 3aa71da412fedaee133b4b6e4be4b801c59d6c91
Author: Ben Shelton <benjamin.h.shelton@intel.com>
Date: Thu Oct 15 12:35:17 2015 -0500
PCI: Enable SR-IOV ARI Capable Hierarchy before reading TotalVFs
For some SR-IOV devices, the number of available virtual functions, i.e.,
TotalVFs, increases after setting the ARI Capable Hierarchy bit in the
SR-IOV Control register. This violates the SR-IOV spec, r1.1, sec 3.3.6,
which says TotalVFs is HwInit, but we don't need TotalVFs before setting
the ARI Capable bit anyway.
Set the ARI Capable Hierarchy bit (if ARI is enabled in the upstream
bridge) before reading TotalVFs.
[bhelgaas: changelog]
Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 0202ab0..f8bfc1d 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -399,10 +399,6 @@ static int sriov_init(struct pci_dev *dev, int pos)
ssleep(1);
}
- pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &total);
- if (!total)
- return 0;
-
ctrl = 0;
list_for_each_entry(pdev, &dev->bus->devices, bus_list)
if (pdev->is_physfn)
@@ -415,6 +411,10 @@ static int sriov_init(struct pci_dev *dev, int pos)
found:
pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
+ pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &total);
+ if (!total)
+ return 0;
+
pci_read_config_dword(dev, pos + PCI_SRIOV_SUP_PGSIZE, &pgsz);
i = PAGE_SHIFT > 12 ? PAGE_SHIFT - 12 : 0;
pgsz &= ~((1 << i) - 1);
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] PCI: IOV: read SRIOV_NUM_VF after enabling ARI
2015-10-15 19:31 ` Bjorn Helgaas
@ 2015-10-21 20:52 ` Bjorn Helgaas
0 siblings, 0 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2015-10-21 20:52 UTC (permalink / raw)
To: Ben Shelton; +Cc: bhelgaas, linux-pci, linux-kernel
On Thu, Oct 15, 2015 at 02:31:16PM -0500, Bjorn Helgaas wrote:
> On Thu, Oct 08, 2015 at 10:20:17AM -0500, Ben Shelton wrote:
> > For some SR-IOV devices, the number of available virtual functions increases
> > after enabling ARI. Currently, SRIOV_NUM_VF is read and saved off before the
> > ARI control bit is enabled in SRIOV_CTRL. This causes an issue when VFs are
> > enabled.
> >
> > At device init, SRIOV_INITIAL_VF and SRIOV_NUM_VF are specified to contain the
> > number of available VFs for the device. sriov_enable() does a sanity check
> > that SRIOV_INITIAL_VF is not greater than iov->total_VFs, the saved-off value
> > of SRIOV_NUM_VF. Since the value of both SRIOV_INITIAL_VF and SRIOV_NUM_VF has
> > increased after enabling the ARI bit, the check fails, and the VFs cannot be
> > enabled.
> >
> > To fix the issue, write SRIOV_CTRL first, and then read SRIOV_NUM_VF.
> >
> > Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
>
> I applied this as follows to pci/virtualization for v4.4, thanks, Ben!
I dropped this one for now, pending the resolution of my questions in
http://lkml.kernel.org/r/20151016180701.GB21346@localhost
> commit 3aa71da412fedaee133b4b6e4be4b801c59d6c91
> Author: Ben Shelton <benjamin.h.shelton@intel.com>
> Date: Thu Oct 15 12:35:17 2015 -0500
>
> PCI: Enable SR-IOV ARI Capable Hierarchy before reading TotalVFs
>
> For some SR-IOV devices, the number of available virtual functions, i.e.,
> TotalVFs, increases after setting the ARI Capable Hierarchy bit in the
> SR-IOV Control register. This violates the SR-IOV spec, r1.1, sec 3.3.6,
> which says TotalVFs is HwInit, but we don't need TotalVFs before setting
> the ARI Capable bit anyway.
>
> Set the ARI Capable Hierarchy bit (if ARI is enabled in the upstream
> bridge) before reading TotalVFs.
>
> [bhelgaas: changelog]
> Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 0202ab0..f8bfc1d 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -399,10 +399,6 @@ static int sriov_init(struct pci_dev *dev, int pos)
> ssleep(1);
> }
>
> - pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &total);
> - if (!total)
> - return 0;
> -
> ctrl = 0;
> list_for_each_entry(pdev, &dev->bus->devices, bus_list)
> if (pdev->is_physfn)
> @@ -415,6 +411,10 @@ static int sriov_init(struct pci_dev *dev, int pos)
> found:
> pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
>
> + pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &total);
> + if (!total)
> + return 0;
> +
> pci_read_config_dword(dev, pos + PCI_SRIOV_SUP_PGSIZE, &pgsz);
> i = PAGE_SHIFT > 12 ? PAGE_SHIFT - 12 : 0;
> pgsz &= ~((1 << i) - 1);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-10-21 20:52 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-08 15:20 [PATCH v2] PCI: IOV: read SRIOV_NUM_VF after enabling ARI Ben Shelton
2015-10-15 17:58 ` Bjorn Helgaas
2015-10-15 20:00 ` Alexander Duyck
2015-10-15 21:36 ` Bjorn Helgaas
2015-10-15 22:14 ` Alexander Duyck
2015-10-16 16:56 ` Ben Shelton
2015-10-16 18:07 ` Bjorn Helgaas
2015-10-15 19:31 ` Bjorn Helgaas
2015-10-21 20:52 ` Bjorn Helgaas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).