* [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon probe
2016-06-08 21:33 ` Alexander Duyck
@ 2016-06-08 21:46 ` Keller, Jacob E
2016-06-09 17:24 ` Keller, Jacob E
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Keller, Jacob E @ 2016-06-08 21:46 UTC (permalink / raw)
To: intel-wired-lan
> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> Sent: Wednesday, June 08, 2016 2:34 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon
> probe
>
> On Wed, Jun 8, 2016 at 2:23 PM, Keller, Jacob E
> <jacob.e.keller@intel.com> wrote:
> >> -----Original Message-----
> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> Sent: Wednesday, June 08, 2016 2:23 PM
> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
> upon
> >> probe
> >>
> >> On Wed, Jun 8, 2016 at 2:18 PM, Keller, Jacob E
> >> <jacob.e.keller@intel.com> wrote:
> >> >> -----Original Message-----
> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> Sent: Wednesday, June 08, 2016 2:02 PM
> >> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
> >> upon
> >> >> probe
> >> >>
> >> >> On Wed, Jun 8, 2016 at 12:11 PM, Keller, Jacob E
> >> >> <jacob.e.keller@intel.com> wrote:
> >> >> >> -----Original Message-----
> >> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> >> Sent: Tuesday, June 07, 2016 8:54 PM
> >> >> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> >> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power
> state
> >> >> upon
> >> >> >> probe
> >> >> >>
> >> >> >> On Tue, Jun 7, 2016 at 5:20 PM, Jacob Keller
> >> <jacob.e.keller@intel.com>
> >> >> >> wrote:
> >> >> >> > It seems that under some circumstances, such as after
> performing
> >> an
> >> >> >> > unbind following an AER injection recovery, the power state of a
> VF
> >> >> >> > isn't set to PCI_D0. Following the unbind if we attempt to bind
> the
> >> >> >> > device, the MSI-X initialization fails due to incorrect power state.
> >> >> >>
> >> >> >> That doesn't make much sense. I didn't think the VFs had their own
> >> >> >> power state control. After all it isn't as if they could switch off
> >> >> >> while the PF is active on the host.
> >> >> >>
> >> >> >> > Fix this by always calling pci_set_power_state(pdev, PCI_D0) on
> >> driver
> >> >> >> > device probe.
> >> >> >>
> >> >> >> That should be redundant. There is already code that is called to
> >> >> >> transition the power state to D0 in pci_enable_device_mem. As
> long
> >> as
> >> >> >> nobody else has enabled the device it will be ran.
> >> >> >>
> >> >> >
> >> >> > That's weird. In this case, after doing aer-inject on the PF, if we
> unbind
> >> >> then bind the VF device, it fails in pci_enable_msix_range which I
> traced
> >> into
> >> >> eventually a call that fails because the state isn't PCI_D0. If I add a
> >> >> "pci_set_power_state(pdev, PCI_D0)" call in the probe routine it
> works
> >> >> fine...?
> >> >> >
> >> >> > I don't really know why that'd be the case... thoughts?
> >> >>
> >> >> Does the VF even advertise support for PCI power management
> >> >> capability? If I recall a VF shouldn't. As such I think a call to
> >> >> pci_set_power_state should be returning an error and not doing
> >> >> anything so I would think this code only has any effect on the PF. So
> >> >> I am not certain what this patch is even doing if you aren't reloading
> >> >> the PF driver to trigger it.
> >> >>
> >> >> - Alex
> >> >
> >> > I'm not sure either but without it we're ending up with the failure in
> >> pci_msi_supported failing with -EINVAL because dev->current_state !=
> >> PCI_D0...
> >> >
> >> > It only happens after an unbind of the VF device then a bind. I think
> >> somehow the dev->current_state gets messed up but I don't know how
> this
> >> happens.
> >> >
> >> > Thanks,
> >> > Jake
> >>
> >> When you say unbind/bind are you talking about using the sysfs or is
> >> this via some other mechanism?
> >>
> >> - Alex
> >
> > Using sysfs. We echo the PCI address to fm10k's unbind, then echo it to
> bind.
> >
> > Thanks,
> > Jake
>
> I'm wondering if we have a reference count bug being created
> somewhere. One thing you might try tracking is the dev->enable_cnt
> value for the VF through probe and after a remove to see if we are
> leaking that somewhere. The other thing to look for is to figure out
> where we are setting dev->current_state to something other than
> PCI_D0. I'd be interested in knowing where we are messing on this
> with a VF and what we are setting it to.
>
> - Alex
I will try to get this information.
Thanks,
Jake
^ permalink raw reply [flat|nested] 15+ messages in thread* [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon probe
2016-06-08 21:33 ` Alexander Duyck
2016-06-08 21:46 ` Keller, Jacob E
@ 2016-06-09 17:24 ` Keller, Jacob E
2016-06-09 17:31 ` Keller, Jacob E
2016-06-09 17:38 ` Keller, Jacob E
3 siblings, 0 replies; 15+ messages in thread
From: Keller, Jacob E @ 2016-06-09 17:24 UTC (permalink / raw)
To: intel-wired-lan
> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> Sent: Wednesday, June 08, 2016 2:34 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon
> probe
>
> On Wed, Jun 8, 2016 at 2:23 PM, Keller, Jacob E
> <jacob.e.keller@intel.com> wrote:
> >> -----Original Message-----
> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> Sent: Wednesday, June 08, 2016 2:23 PM
> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
> upon
> >> probe
> >>
> >> On Wed, Jun 8, 2016 at 2:18 PM, Keller, Jacob E
> >> <jacob.e.keller@intel.com> wrote:
> >> >> -----Original Message-----
> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> Sent: Wednesday, June 08, 2016 2:02 PM
> >> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
> >> upon
> >> >> probe
> >> >>
> >> >> On Wed, Jun 8, 2016 at 12:11 PM, Keller, Jacob E
> >> >> <jacob.e.keller@intel.com> wrote:
> >> >> >> -----Original Message-----
> >> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> >> Sent: Tuesday, June 07, 2016 8:54 PM
> >> >> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> >> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power
> state
> >> >> upon
> >> >> >> probe
> >> >> >>
> >> >> >> On Tue, Jun 7, 2016 at 5:20 PM, Jacob Keller
> >> <jacob.e.keller@intel.com>
> >> >> >> wrote:
> >> >> >> > It seems that under some circumstances, such as after
> performing
> >> an
> >> >> >> > unbind following an AER injection recovery, the power state of a
> VF
> >> >> >> > isn't set to PCI_D0. Following the unbind if we attempt to bind
> the
> >> >> >> > device, the MSI-X initialization fails due to incorrect power state.
> >> >> >>
> >> >> >> That doesn't make much sense. I didn't think the VFs had their own
> >> >> >> power state control. After all it isn't as if they could switch off
> >> >> >> while the PF is active on the host.
> >> >> >>
> >> >> >> > Fix this by always calling pci_set_power_state(pdev, PCI_D0) on
> >> driver
> >> >> >> > device probe.
> >> >> >>
> >> >> >> That should be redundant. There is already code that is called to
> >> >> >> transition the power state to D0 in pci_enable_device_mem. As
> long
> >> as
> >> >> >> nobody else has enabled the device it will be ran.
> >> >> >>
> >> >> >
> >> >> > That's weird. In this case, after doing aer-inject on the PF, if we
> unbind
> >> >> then bind the VF device, it fails in pci_enable_msix_range which I
> traced
> >> into
> >> >> eventually a call that fails because the state isn't PCI_D0. If I add a
> >> >> "pci_set_power_state(pdev, PCI_D0)" call in the probe routine it
> works
> >> >> fine...?
> >> >> >
> >> >> > I don't really know why that'd be the case... thoughts?
> >> >>
> >> >> Does the VF even advertise support for PCI power management
> >> >> capability? If I recall a VF shouldn't. As such I think a call to
> >> >> pci_set_power_state should be returning an error and not doing
> >> >> anything so I would think this code only has any effect on the PF. So
> >> >> I am not certain what this patch is even doing if you aren't reloading
> >> >> the PF driver to trigger it.
> >> >>
> >> >> - Alex
> >> >
> >> > I'm not sure either but without it we're ending up with the failure in
> >> pci_msi_supported failing with -EINVAL because dev->current_state !=
> >> PCI_D0...
> >> >
> >> > It only happens after an unbind of the VF device then a bind. I think
> >> somehow the dev->current_state gets messed up but I don't know how
> this
> >> happens.
> >> >
> >> > Thanks,
> >> > Jake
> >>
> >> When you say unbind/bind are you talking about using the sysfs or is
> >> this via some other mechanism?
> >>
> >> - Alex
> >
> > Using sysfs. We echo the PCI address to fm10k's unbind, then echo it to
> bind.
> >
> > Thanks,
> > Jake
>
> I'm wondering if we have a reference count bug being created
> somewhere. One thing you might try tracking is the dev->enable_cnt
> value for the VF through probe and after a remove to see if we are
> leaking that somewhere. The other thing to look for is to figure out
> where we are setting dev->current_state to something other than
> PCI_D0. I'd be interested in knowing where we are messing on this
> with a VF and what we are setting it to.
>
> - Alex
After a fresh bind, the current_state is PCI_UNKNOWN even after (a successful) pci_enable_device_mem...
It stays unknown all the way through until the pci_enable_msix_range function is called and failed.
I'm adding code to check the enable count now.
Thanks,
Jake
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon probe
2016-06-08 21:33 ` Alexander Duyck
2016-06-08 21:46 ` Keller, Jacob E
2016-06-09 17:24 ` Keller, Jacob E
@ 2016-06-09 17:31 ` Keller, Jacob E
2016-06-09 17:38 ` Keller, Jacob E
3 siblings, 0 replies; 15+ messages in thread
From: Keller, Jacob E @ 2016-06-09 17:31 UTC (permalink / raw)
To: intel-wired-lan
> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> Sent: Wednesday, June 08, 2016 2:34 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon
> probe
>
> On Wed, Jun 8, 2016 at 2:23 PM, Keller, Jacob E
> <jacob.e.keller@intel.com> wrote:
> >> -----Original Message-----
> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> Sent: Wednesday, June 08, 2016 2:23 PM
> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
> upon
> >> probe
> >>
> >> On Wed, Jun 8, 2016 at 2:18 PM, Keller, Jacob E
> >> <jacob.e.keller@intel.com> wrote:
> >> >> -----Original Message-----
> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> Sent: Wednesday, June 08, 2016 2:02 PM
> >> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
> >> upon
> >> >> probe
> >> >>
> >> >> On Wed, Jun 8, 2016 at 12:11 PM, Keller, Jacob E
> >> >> <jacob.e.keller@intel.com> wrote:
> >> >> >> -----Original Message-----
> >> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> >> Sent: Tuesday, June 07, 2016 8:54 PM
> >> >> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> >> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power
> state
> >> >> upon
> >> >> >> probe
> >> >> >>
> >> >> >> On Tue, Jun 7, 2016 at 5:20 PM, Jacob Keller
> >> <jacob.e.keller@intel.com>
> >> >> >> wrote:
> >> >> >> > It seems that under some circumstances, such as after
> performing
> >> an
> >> >> >> > unbind following an AER injection recovery, the power state of a
> VF
> >> >> >> > isn't set to PCI_D0. Following the unbind if we attempt to bind
> the
> >> >> >> > device, the MSI-X initialization fails due to incorrect power state.
> >> >> >>
> >> >> >> That doesn't make much sense. I didn't think the VFs had their own
> >> >> >> power state control. After all it isn't as if they could switch off
> >> >> >> while the PF is active on the host.
> >> >> >>
> >> >> >> > Fix this by always calling pci_set_power_state(pdev, PCI_D0) on
> >> driver
> >> >> >> > device probe.
> >> >> >>
> >> >> >> That should be redundant. There is already code that is called to
> >> >> >> transition the power state to D0 in pci_enable_device_mem. As
> long
> >> as
> >> >> >> nobody else has enabled the device it will be ran.
> >> >> >>
> >> >> >
> >> >> > That's weird. In this case, after doing aer-inject on the PF, if we
> unbind
> >> >> then bind the VF device, it fails in pci_enable_msix_range which I
> traced
> >> into
> >> >> eventually a call that fails because the state isn't PCI_D0. If I add a
> >> >> "pci_set_power_state(pdev, PCI_D0)" call in the probe routine it
> works
> >> >> fine...?
> >> >> >
> >> >> > I don't really know why that'd be the case... thoughts?
> >> >>
> >> >> Does the VF even advertise support for PCI power management
> >> >> capability? If I recall a VF shouldn't. As such I think a call to
> >> >> pci_set_power_state should be returning an error and not doing
> >> >> anything so I would think this code only has any effect on the PF. So
> >> >> I am not certain what this patch is even doing if you aren't reloading
> >> >> the PF driver to trigger it.
> >> >>
> >> >> - Alex
> >> >
> >> > I'm not sure either but without it we're ending up with the failure in
> >> pci_msi_supported failing with -EINVAL because dev->current_state !=
> >> PCI_D0...
> >> >
> >> > It only happens after an unbind of the VF device then a bind. I think
> >> somehow the dev->current_state gets messed up but I don't know how
> this
> >> happens.
> >> >
> >> > Thanks,
> >> > Jake
> >>
> >> When you say unbind/bind are you talking about using the sysfs or is
> >> this via some other mechanism?
> >>
> >> - Alex
> >
> > Using sysfs. We echo the PCI address to fm10k's unbind, then echo it to
> bind.
> >
> > Thanks,
> > Jake
>
> I'm wondering if we have a reference count bug being created
> somewhere. One thing you might try tracking is the dev->enable_cnt
> value for the VF through probe and after a remove to see if we are
> leaking that somewhere. The other thing to look for is to figure out
> where we are setting dev->current_state to something other than
> PCI_D0. I'd be interested in knowing where we are messing on this
> with a VF and what we are setting it to.
>
> - Alex
It appears that after an AER injection error we end up leaking a pci device reference count. I'm currently investigating that now.
Thanks,
Jake
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon probe
2016-06-08 21:33 ` Alexander Duyck
` (2 preceding siblings ...)
2016-06-09 17:31 ` Keller, Jacob E
@ 2016-06-09 17:38 ` Keller, Jacob E
2016-06-09 17:54 ` Alexander Duyck
3 siblings, 1 reply; 15+ messages in thread
From: Keller, Jacob E @ 2016-06-09 17:38 UTC (permalink / raw)
To: intel-wired-lan
> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> Sent: Wednesday, June 08, 2016 2:34 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon
> probe
>
> On Wed, Jun 8, 2016 at 2:23 PM, Keller, Jacob E
> <jacob.e.keller@intel.com> wrote:
> >> -----Original Message-----
> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> Sent: Wednesday, June 08, 2016 2:23 PM
> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
> upon
> >> probe
> >>
> >> On Wed, Jun 8, 2016 at 2:18 PM, Keller, Jacob E
> >> <jacob.e.keller@intel.com> wrote:
> >> >> -----Original Message-----
> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> Sent: Wednesday, June 08, 2016 2:02 PM
> >> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
> >> upon
> >> >> probe
> >> >>
> >> >> On Wed, Jun 8, 2016 at 12:11 PM, Keller, Jacob E
> >> >> <jacob.e.keller@intel.com> wrote:
> >> >> >> -----Original Message-----
> >> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> >> Sent: Tuesday, June 07, 2016 8:54 PM
> >> >> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> >> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power
> state
> >> >> upon
> >> >> >> probe
> >> >> >>
> >> >> >> On Tue, Jun 7, 2016 at 5:20 PM, Jacob Keller
> >> <jacob.e.keller@intel.com>
> >> >> >> wrote:
> >> >> >> > It seems that under some circumstances, such as after
> performing
> >> an
> >> >> >> > unbind following an AER injection recovery, the power state of a
> VF
> >> >> >> > isn't set to PCI_D0. Following the unbind if we attempt to bind
> the
> >> >> >> > device, the MSI-X initialization fails due to incorrect power state.
> >> >> >>
> >> >> >> That doesn't make much sense. I didn't think the VFs had their own
> >> >> >> power state control. After all it isn't as if they could switch off
> >> >> >> while the PF is active on the host.
> >> >> >>
> >> >> >> > Fix this by always calling pci_set_power_state(pdev, PCI_D0) on
> >> driver
> >> >> >> > device probe.
> >> >> >>
> >> >> >> That should be redundant. There is already code that is called to
> >> >> >> transition the power state to D0 in pci_enable_device_mem. As
> long
> >> as
> >> >> >> nobody else has enabled the device it will be ran.
> >> >> >>
> >> >> >
> >> >> > That's weird. In this case, after doing aer-inject on the PF, if we
> unbind
> >> >> then bind the VF device, it fails in pci_enable_msix_range which I
> traced
> >> into
> >> >> eventually a call that fails because the state isn't PCI_D0. If I add a
> >> >> "pci_set_power_state(pdev, PCI_D0)" call in the probe routine it
> works
> >> >> fine...?
> >> >> >
> >> >> > I don't really know why that'd be the case... thoughts?
> >> >>
> >> >> Does the VF even advertise support for PCI power management
> >> >> capability? If I recall a VF shouldn't. As such I think a call to
> >> >> pci_set_power_state should be returning an error and not doing
> >> >> anything so I would think this code only has any effect on the PF. So
> >> >> I am not certain what this patch is even doing if you aren't reloading
> >> >> the PF driver to trigger it.
> >> >>
> >> >> - Alex
> >> >
> >> > I'm not sure either but without it we're ending up with the failure in
> >> pci_msi_supported failing with -EINVAL because dev->current_state !=
> >> PCI_D0...
> >> >
> >> > It only happens after an unbind of the VF device then a bind. I think
> >> somehow the dev->current_state gets messed up but I don't know how
> this
> >> happens.
> >> >
> >> > Thanks,
> >> > Jake
> >>
> >> When you say unbind/bind are you talking about using the sysfs or is
> >> this via some other mechanism?
> >>
> >> - Alex
> >
> > Using sysfs. We echo the PCI address to fm10k's unbind, then echo it to
> bind.
> >
> > Thanks,
> > Jake
>
> I'm wondering if we have a reference count bug being created
> somewhere. One thing you might try tracking is the dev->enable_cnt
> value for the VF through probe and after a remove to see if we are
> leaking that somewhere. The other thing to look for is to figure out
> where we are setting dev->current_state to something other than
> PCI_D0. I'd be interested in knowing where we are messing on this
> with a VF and what we are setting it to.
>
> - Alex
Yep, when we call pci_enable_device_mem() in fm10k_io_slot_reset() we leak an enable count and I think that?s at least partially responsible for the failure since I bet VFs don't actually expose power state, so once enabled it won't change back from unknown?
Thanks,
Jake
^ permalink raw reply [flat|nested] 15+ messages in thread* [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon probe
2016-06-09 17:38 ` Keller, Jacob E
@ 2016-06-09 17:54 ` Alexander Duyck
2016-06-09 18:30 ` Keller, Jacob E
2016-06-09 18:57 ` Keller, Jacob E
0 siblings, 2 replies; 15+ messages in thread
From: Alexander Duyck @ 2016-06-09 17:54 UTC (permalink / raw)
To: intel-wired-lan
On Thu, Jun 9, 2016 at 10:38 AM, Keller, Jacob E
<jacob.e.keller@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
>> Sent: Wednesday, June 08, 2016 2:34 PM
>> To: Keller, Jacob E <jacob.e.keller@intel.com>
>> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
>> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon
>> probe
>>
>> On Wed, Jun 8, 2016 at 2:23 PM, Keller, Jacob E
>> <jacob.e.keller@intel.com> wrote:
>> >> -----Original Message-----
>> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
>> >> Sent: Wednesday, June 08, 2016 2:23 PM
>> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
>> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
>> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
>> upon
>> >> probe
>> >>
>> >> On Wed, Jun 8, 2016 at 2:18 PM, Keller, Jacob E
>> >> <jacob.e.keller@intel.com> wrote:
>> >> >> -----Original Message-----
>> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
>> >> >> Sent: Wednesday, June 08, 2016 2:02 PM
>> >> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
>> >> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
>> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
>> >> upon
>> >> >> probe
>> >> >>
>> >> >> On Wed, Jun 8, 2016 at 12:11 PM, Keller, Jacob E
>> >> >> <jacob.e.keller@intel.com> wrote:
>> >> >> >> -----Original Message-----
>> >> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
>> >> >> >> Sent: Tuesday, June 07, 2016 8:54 PM
>> >> >> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
>> >> >> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
>> >> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power
>> state
>> >> >> upon
>> >> >> >> probe
>> >> >> >>
>> >> >> >> On Tue, Jun 7, 2016 at 5:20 PM, Jacob Keller
>> >> <jacob.e.keller@intel.com>
>> >> >> >> wrote:
>> >> >> >> > It seems that under some circumstances, such as after
>> performing
>> >> an
>> >> >> >> > unbind following an AER injection recovery, the power state of a
>> VF
>> >> >> >> > isn't set to PCI_D0. Following the unbind if we attempt to bind
>> the
>> >> >> >> > device, the MSI-X initialization fails due to incorrect power state.
>> >> >> >>
>> >> >> >> That doesn't make much sense. I didn't think the VFs had their own
>> >> >> >> power state control. After all it isn't as if they could switch off
>> >> >> >> while the PF is active on the host.
>> >> >> >>
>> >> >> >> > Fix this by always calling pci_set_power_state(pdev, PCI_D0) on
>> >> driver
>> >> >> >> > device probe.
>> >> >> >>
>> >> >> >> That should be redundant. There is already code that is called to
>> >> >> >> transition the power state to D0 in pci_enable_device_mem. As
>> long
>> >> as
>> >> >> >> nobody else has enabled the device it will be ran.
>> >> >> >>
>> >> >> >
>> >> >> > That's weird. In this case, after doing aer-inject on the PF, if we
>> unbind
>> >> >> then bind the VF device, it fails in pci_enable_msix_range which I
>> traced
>> >> into
>> >> >> eventually a call that fails because the state isn't PCI_D0. If I add a
>> >> >> "pci_set_power_state(pdev, PCI_D0)" call in the probe routine it
>> works
>> >> >> fine...?
>> >> >> >
>> >> >> > I don't really know why that'd be the case... thoughts?
>> >> >>
>> >> >> Does the VF even advertise support for PCI power management
>> >> >> capability? If I recall a VF shouldn't. As such I think a call to
>> >> >> pci_set_power_state should be returning an error and not doing
>> >> >> anything so I would think this code only has any effect on the PF. So
>> >> >> I am not certain what this patch is even doing if you aren't reloading
>> >> >> the PF driver to trigger it.
>> >> >>
>> >> >> - Alex
>> >> >
>> >> > I'm not sure either but without it we're ending up with the failure in
>> >> pci_msi_supported failing with -EINVAL because dev->current_state !=
>> >> PCI_D0...
>> >> >
>> >> > It only happens after an unbind of the VF device then a bind. I think
>> >> somehow the dev->current_state gets messed up but I don't know how
>> this
>> >> happens.
>> >> >
>> >> > Thanks,
>> >> > Jake
>> >>
>> >> When you say unbind/bind are you talking about using the sysfs or is
>> >> this via some other mechanism?
>> >>
>> >> - Alex
>> >
>> > Using sysfs. We echo the PCI address to fm10k's unbind, then echo it to
>> bind.
>> >
>> > Thanks,
>> > Jake
>>
>> I'm wondering if we have a reference count bug being created
>> somewhere. One thing you might try tracking is the dev->enable_cnt
>> value for the VF through probe and after a remove to see if we are
>> leaking that somewhere. The other thing to look for is to figure out
>> where we are setting dev->current_state to something other than
>> PCI_D0. I'd be interested in knowing where we are messing on this
>> with a VF and what we are setting it to.
>>
>> - Alex
>
> Yep, when we call pci_enable_device_mem() in fm10k_io_slot_reset() we leak an enable count and I think that?s at least partially responsible for the failure since I bet VFs don't actually expose power state, so once enabled it won't change back from unknown?
No, it will but you have to actually be re-enabling the device to trigger it.
You might try replacing the call to pci_enable_device_mem with
pci_reenable_device and see if that works for you. I suspect this is
probably a bug that is common in many of the drivers out there.
- Alex
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon probe
2016-06-09 17:54 ` Alexander Duyck
@ 2016-06-09 18:30 ` Keller, Jacob E
2016-06-09 18:57 ` Keller, Jacob E
1 sibling, 0 replies; 15+ messages in thread
From: Keller, Jacob E @ 2016-06-09 18:30 UTC (permalink / raw)
To: intel-wired-lan
> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> Sent: Thursday, June 09, 2016 10:55 AM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon
> probe
>
> On Thu, Jun 9, 2016 at 10:38 AM, Keller, Jacob E
> <jacob.e.keller@intel.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> Sent: Wednesday, June 08, 2016 2:34 PM
> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
> upon
> >> probe
> >>
> >> On Wed, Jun 8, 2016 at 2:23 PM, Keller, Jacob E
> >> <jacob.e.keller@intel.com> wrote:
> >> >> -----Original Message-----
> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> Sent: Wednesday, June 08, 2016 2:23 PM
> >> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
> >> upon
> >> >> probe
> >> >>
> >> >> On Wed, Jun 8, 2016 at 2:18 PM, Keller, Jacob E
> >> >> <jacob.e.keller@intel.com> wrote:
> >> >> >> -----Original Message-----
> >> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> >> Sent: Wednesday, June 08, 2016 2:02 PM
> >> >> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> >> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power
> state
> >> >> upon
> >> >> >> probe
> >> >> >>
> >> >> >> On Wed, Jun 8, 2016 at 12:11 PM, Keller, Jacob E
> >> >> >> <jacob.e.keller@intel.com> wrote:
> >> >> >> >> -----Original Message-----
> >> >> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> >> >> Sent: Tuesday, June 07, 2016 8:54 PM
> >> >> >> >> To: Keller, Jacob E <jacob.e.keller@intel.com>
> >> >> >> >> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> >> >> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power
> >> state
> >> >> >> upon
> >> >> >> >> probe
> >> >> >> >>
> >> >> >> >> On Tue, Jun 7, 2016 at 5:20 PM, Jacob Keller
> >> >> <jacob.e.keller@intel.com>
> >> >> >> >> wrote:
> >> >> >> >> > It seems that under some circumstances, such as after
> >> performing
> >> >> an
> >> >> >> >> > unbind following an AER injection recovery, the power state
> of a
> >> VF
> >> >> >> >> > isn't set to PCI_D0. Following the unbind if we attempt to bind
> >> the
> >> >> >> >> > device, the MSI-X initialization fails due to incorrect power
> state.
> >> >> >> >>
> >> >> >> >> That doesn't make much sense. I didn't think the VFs had their
> own
> >> >> >> >> power state control. After all it isn't as if they could switch off
> >> >> >> >> while the PF is active on the host.
> >> >> >> >>
> >> >> >> >> > Fix this by always calling pci_set_power_state(pdev, PCI_D0)
> on
> >> >> driver
> >> >> >> >> > device probe.
> >> >> >> >>
> >> >> >> >> That should be redundant. There is already code that is called
> to
> >> >> >> >> transition the power state to D0 in pci_enable_device_mem. As
> >> long
> >> >> as
> >> >> >> >> nobody else has enabled the device it will be ran.
> >> >> >> >>
> >> >> >> >
> >> >> >> > That's weird. In this case, after doing aer-inject on the PF, if we
> >> unbind
> >> >> >> then bind the VF device, it fails in pci_enable_msix_range which I
> >> traced
> >> >> into
> >> >> >> eventually a call that fails because the state isn't PCI_D0. If I add a
> >> >> >> "pci_set_power_state(pdev, PCI_D0)" call in the probe routine it
> >> works
> >> >> >> fine...?
> >> >> >> >
> >> >> >> > I don't really know why that'd be the case... thoughts?
> >> >> >>
> >> >> >> Does the VF even advertise support for PCI power management
> >> >> >> capability? If I recall a VF shouldn't. As such I think a call to
> >> >> >> pci_set_power_state should be returning an error and not doing
> >> >> >> anything so I would think this code only has any effect on the PF.
> So
> >> >> >> I am not certain what this patch is even doing if you aren't
> reloading
> >> >> >> the PF driver to trigger it.
> >> >> >>
> >> >> >> - Alex
> >> >> >
> >> >> > I'm not sure either but without it we're ending up with the failure in
> >> >> pci_msi_supported failing with -EINVAL because dev->current_state
> !=
> >> >> PCI_D0...
> >> >> >
> >> >> > It only happens after an unbind of the VF device then a bind. I think
> >> >> somehow the dev->current_state gets messed up but I don't know
> how
> >> this
> >> >> happens.
> >> >> >
> >> >> > Thanks,
> >> >> > Jake
> >> >>
> >> >> When you say unbind/bind are you talking about using the sysfs or is
> >> >> this via some other mechanism?
> >> >>
> >> >> - Alex
> >> >
> >> > Using sysfs. We echo the PCI address to fm10k's unbind, then echo it to
> >> bind.
> >> >
> >> > Thanks,
> >> > Jake
> >>
> >> I'm wondering if we have a reference count bug being created
> >> somewhere. One thing you might try tracking is the dev->enable_cnt
> >> value for the VF through probe and after a remove to see if we are
> >> leaking that somewhere. The other thing to look for is to figure out
> >> where we are setting dev->current_state to something other than
> >> PCI_D0. I'd be interested in knowing where we are messing on this
> >> with a VF and what we are setting it to.
> >>
> >> - Alex
> >
> > Yep, when we call pci_enable_device_mem() in fm10k_io_slot_reset() we
> leak an enable count and I think that?s at least partially responsible for the
> failure since I bet VFs don't actually expose power state, so once enabled it
> won't change back from unknown?
>
> No, it will but you have to actually be re-enabling the device to trigger it.
>
> You might try replacing the call to pci_enable_device_mem with
> pci_reenable_device and see if that works for you. I suspect this is
> probably a bug that is common in many of the drivers out there.
>
> - Alex
The other possible fix is to put pci_device_disable into the .io_error_detected handler, which used to be there in the past. The downside to this call is that we have to check when we remove if we've already disabled to avoid disabling the device twice.
I'll look into using reenable instead and see how that goes.
Thanks,
Jake
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon probe
2016-06-09 17:54 ` Alexander Duyck
2016-06-09 18:30 ` Keller, Jacob E
@ 2016-06-09 18:57 ` Keller, Jacob E
1 sibling, 0 replies; 15+ messages in thread
From: Keller, Jacob E @ 2016-06-09 18:57 UTC (permalink / raw)
To: intel-wired-lan
On Thu, 2016-06-09 at 10:54 -0700, Alexander Duyck wrote:
> No, it will but you have to actually be re-enabling the device to
> trigger it.
>
> You might try replacing the call to pci_enable_device_mem with
> pci_reenable_device and see if that works for you.??I suspect this is
> probably a bug that is common in many of the drivers out there.
>
> - Alex
Changing the pci_enable_device_mem() to a pci_reenable_device() inside
the .io_slot_reset seems to have worked, and is much cleaner than my
alternative approach. I think this is the right way since we aren't
really wanting to disable the device in this section and matches up
with what the .slot_reset wants without breaking other flows.
I'll have a patch for that in a bit.
Thanks,
Jake
^ permalink raw reply [flat|nested] 15+ messages in thread