linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: bug in pci_try_reset_bus
       [not found]       ` <2f0fb394-f1c9-d782-ffec-dac04b1ac0a9@intel.com>
@ 2018-08-27 19:08         ` Sinan Kaya
  2018-08-27 19:52           ` Dennis Dalessandro
  0 siblings, 1 reply; 5+ messages in thread
From: Sinan Kaya @ 2018-08-27 19:08 UTC (permalink / raw)
  To: Dennis Dalessandro, bhelgaas, linux-pci

On 8/27/2018 2:59 PM, Dennis Dalessandro wrote:
> On 8/27/2018 2:21 PM, Sinan Kaya wrote:
>> [+linux-pci]
> 
> I'm not seeing a CC?

sorry, I fat fingered it. fixing now.

> 
>> On 8/27/2018 2:12 PM, Dennis Dalessandro wrote:
>>> On 8/27/2018 1:27 PM, Sinan Kaya wrote:
>>>> On 8/27/2018 1:21 PM, Dennis Dalessandro wrote:
>>>>> + */
>>>>> +int pci_try_reset_bus(struct pci_dev *pdev)
>>>>> +{
>>>>> +       return pci_probe_reset_slot(pdev->slot) ?
>>>>> +           __pci_try_reset_slot(pdev->slot) : __pci_try_reset_bus(pdev->bus);
>>>>> +}
>>>>
>>>> Code needs to go to __pci_try_reset_bus() when pci_probe_reset_slot() fails.
>>>>
>>>> It looks like we are missing a ! in front of pci_probe_reset_slot().
>>>>
>>>> Can you try this instead?
>>>>
>>>>        return !pci_probe_reset_slot(pdev->slot) ?
>>>>             __pci_try_reset_slot(pdev->slot) : __pci_try_reset_bus(pdev->bus);
>>>
>>> Right so I did that and it lands where I want it to, but the hang I get seems 
>>> to happen after it bails out of pci_bus_reset() because probe is set to 1.
>>
>> Is it returning from here?
>>
>> static int __pci_reset_bus(struct pci_bus *bus)
>> {
>>      int rc;
>>
>>      rc = pci_bus_reset(bus, 1);
>>      if (rc)
>>          return rc;
>>
>> ...
>> }
>>
>> If yes, this means that your bus is not resettable or your bus pointer
>> is null?
>>
>> Can you find out which one it is?
>>
>> https://elixir.bootlin.com/linux/v4.19-rc1/source/drivers/pci/pci.c#L5138
>>
>> It might be also good to capture the return value of pci_bus_reset() to
>> see if it returns 0 or not.
> 
> It is returning 0 from there because pci_bus_reset is being called with probe = 1.
> 
> The call stack from the driver is:
> 
> pci_reset_bus(pdev)
>      __pci_reset_bus(pdev->bus) <-- after fixing first issue
>          pci_bus_reset(bus, 1)
>              Checks ptr and if resetable.
>              if (probe)
>                  return 0 <-- here is where it returns
> 

Probe is there to query if this particular device supports reset or not. Which
is the very first thing this code does to ensure we are allowed to touch it
and it is correct.

If probe is returning 0, this code won't return from here. It will continue
to trylock piece.

	rc = pci_bus_reset(bus, 1);
	if (rc)
		return rc;

My guess is you are failing below with -EAGAIN because lock fails:

if (pci_bus_trylock(bus)) {

} else {
	rc = -EAGAIN;
}

can you please confirm?


> I tried removing the whole if (probe) then return thing but still get a hang 
> when it calls pci_bus_lock().
> 
>>> What is the right way to do an SBR while our driver is being probed? The 
>>> reason we are doing all of this is because we need to do a bump up to gen3 
>>> speed when the driver loads and the last thing we need to do is issue an SBR.
>>
>> I don't think you can achieve what you want without doing an SBR. Another
>> possibility is retrain if your hardware supports.
> 
> We do want to do an SBR. I'm just wondering if we are going about it the right 
> way. Because things worked before this was hidden from the drivers and we called 
> right into the bus reset.

That's where the problem is. It is responsibility of the PCI core to check that
you are allowed to reset.

> 
> -Denny
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: bug in pci_try_reset_bus
  2018-08-27 19:08         ` bug in pci_try_reset_bus Sinan Kaya
@ 2018-08-27 19:52           ` Dennis Dalessandro
  2018-08-27 20:18             ` Sinan Kaya
  0 siblings, 1 reply; 5+ messages in thread
From: Dennis Dalessandro @ 2018-08-27 19:52 UTC (permalink / raw)
  To: Sinan Kaya, bhelgaas, linux-pci

On 8/27/2018 3:08 PM, Sinan Kaya wrote:
>> The call stack from the driver is:
>>
>> pci_reset_bus(pdev)
>>      __pci_reset_bus(pdev->bus) <-- after fixing first issue
>>          pci_bus_reset(bus, 1)
>>              Checks ptr and if resetable.
>>              if (probe)
>>                  return 0 <-- here is where it returns
>>
> 
> Probe is there to query if this particular device supports reset or not. 
> Which
> is the very first thing this code does to ensure we are allowed to touch it
> and it is correct.
> 
> If probe is returning 0, this code won't return from here. It will continue
> to trylock piece.

The probe arg is hard coded to 1 in this path. Actually both calls in 
pci.c it is hard coded to 1. Maybe we just need to get rid of that arg 
all together?

> 
>      rc = pci_bus_reset(bus, 1);
>      if (rc)
>          return rc;
> 
> My guess is you are failing below with -EAGAIN because lock fails:
> 
> if (pci_bus_trylock(bus)) {
> 
> } else {
>      rc = -EAGAIN;
> }
> 
> can you please confirm?

Ah yes, silly me. pci_bus_reset() returns 0 and it does go on but 
doesn't make it to the trylock, it gets hung calling 
pci_bus_save_and_disable().

-Denny

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: bug in pci_try_reset_bus
  2018-08-27 19:52           ` Dennis Dalessandro
@ 2018-08-27 20:18             ` Sinan Kaya
  2018-08-27 20:48               ` Dennis Dalessandro
  0 siblings, 1 reply; 5+ messages in thread
From: Sinan Kaya @ 2018-08-27 20:18 UTC (permalink / raw)
  To: Dennis Dalessandro, bhelgaas, linux-pci

On 8/27/2018 3:52 PM, Dennis Dalessandro wrote:
>>
>> can you please confirm?
> 
> Ah yes, silly me. pci_bus_reset() returns 0 and it does go on but doesn't make 
> it to the trylock, it gets hung calling pci_bus_save_and_disable().
>

OK. That makes sense now. pci_bus_save_and_disable() is also trying to
obtain a device lock via pci_dev_lock().

Since you are calling this from probe time, you are getting dead lock
because device is locked.

Is it possible to defer this secondary bus reset operation to post probe?

Possible solutions are:
1. introduce a locked reset API
2. skip lock during probe
3. bring back raw reset API even though it is undesirable.

Other opinions?

BTW, please file a bugzilla and capture your email details there so that we
can have record of what we are doing?

> -Denny

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: bug in pci_try_reset_bus
  2018-08-27 20:18             ` Sinan Kaya
@ 2018-08-27 20:48               ` Dennis Dalessandro
  2018-08-28 13:25                 ` Sinan Kaya
  0 siblings, 1 reply; 5+ messages in thread
From: Dennis Dalessandro @ 2018-08-27 20:48 UTC (permalink / raw)
  To: Sinan Kaya, bhelgaas, linux-pci

On 8/27/2018 4:18 PM, Sinan Kaya wrote:
> On 8/27/2018 3:52 PM, Dennis Dalessandro wrote:
>>>
>>> can you please confirm?
>>
>> Ah yes, silly me. pci_bus_reset() returns 0 and it does go on but 
>> doesn't make it to the trylock, it gets hung calling 
>> pci_bus_save_and_disable().
>>
> 
> OK. That makes sense now. pci_bus_save_and_disable() is also trying to
> obtain a device lock via pci_dev_lock().
> 
> Since you are calling this from probe time, you are getting dead lock
> because device is locked.
> 
> Is it possible to defer this secondary bus reset operation to post probe?

I don't think so. We need to do a gen3 bump at probe. I don't know that 
there is any other hardware that does is so probably why it hasn't been 
noticed.

> Possible solutions are:
> 1. introduce a locked reset API > 2. skip lock during probe
> 3. bring back raw reset API even though it is undesirable.

I think the first option is the cleanest. I can put together a patch and 
post it soon.

> Other opinions?
> 
> BTW, please file a bugzilla and capture your email details there so that we
> can have record of what we are doing?

I can certainly do that.

Thanks for your help on this!

-Denny

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: bug in pci_try_reset_bus
  2018-08-27 20:48               ` Dennis Dalessandro
@ 2018-08-28 13:25                 ` Sinan Kaya
  0 siblings, 0 replies; 5+ messages in thread
From: Sinan Kaya @ 2018-08-28 13:25 UTC (permalink / raw)
  To: Dennis Dalessandro, bhelgaas, linux-pci

On 8/27/2018 4:48 PM, Dennis Dalessandro wrote:
>> Possible solutions are:
>> 1. introduce a locked reset API > 2. skip lock during probe
>> 3. bring back raw reset API even though it is undesirable.
> 
> I think the first option is the cleanest. I can put together a patch and 
> post it soon.

I was thinking about this. These locked API are getting out of control. It could
be a simpler solution to change the mutex type of device_lock such that it can
be locked twice. I think I have seen some examples of how to do this in the
NVMe driver.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-08-28 17:17 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <3d3d78fb-f1f2-ec4f-38f9-0b0d14564f53@intel.com>
     [not found] ` <323b3bbe-d255-8a5d-2a9c-d64e1f86cb64@codeaurora.org>
     [not found]   ` <bd7dbd5a-eebc-9142-f9a2-3b5ff8de9b8b@intel.com>
     [not found]     ` <d437400c-7eed-19d3-9cf2-b8a42217136b@codeaurora.org>
     [not found]       ` <2f0fb394-f1c9-d782-ffec-dac04b1ac0a9@intel.com>
2018-08-27 19:08         ` bug in pci_try_reset_bus Sinan Kaya
2018-08-27 19:52           ` Dennis Dalessandro
2018-08-27 20:18             ` Sinan Kaya
2018-08-27 20:48               ` Dennis Dalessandro
2018-08-28 13:25                 ` Sinan Kaya

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).