Linux PCI subsystem development
 help / color / mirror / Atom feed
* Re: pci_bus_distribute_available_resources() is wrong?
       [not found] ` <CAErSpo7WrAg5D4xyv0SycoDc1etSspU_TL6XMAK4STYrXDrGNQ@mail.gmail.com>
@ 2022-12-12 21:10   ` Alexander Motin
  2022-12-13  5:49     ` Mika Westerberg
  0 siblings, 1 reply; 6+ messages in thread
From: Alexander Motin @ 2022-12-12 21:10 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Nicholas Johnson, Nick Wolff, Bjorn Helgaas,
	linux-pci

On 12.12.2022 15:32, Bjorn Helgaas wrote:
> On Mon, Dec 12, 2022 at 1:36 PM Alexander Motin <mav@ixsystems.com> wrote:
>> Hi,
>>
>> I am writing to you three as the authors of Linux
>> drivers/pci/setup-bus.c pci_bus_distribute_available_resources()
>> function.  Trying to debug PCI hot-plug issue on passive side of AMD NTB
>> I hit this function, behavior of which I looks very suspicious to me,
>> which I believe cause resource allocation problems we observe.
>>
>> As I see, this function distributes extra size of parent memory window
>> of hot-plug PCI bridge between memory windows of child bridges.  It
>> probably makes some sense, but I see a problem in the fact that the
>> function only looks on children bridge memory windows, but not any other
>> resources (of bridges or other devices that may be there).
>>
>> In my AMD NTB case PCI topology looks this way:
>>
>> +-[0000:80]-+-00.0
>> |           +-01.1-[81-83]----00.0-[82-83]----00.0-[83]--+-00.0 Dummy
>> |           |                                            \-00.1 NTB
>>
>> 80:01.1 is the root bridge where the hot-plug happens.  The 81:00.0
>> bridge in addition to memory windows has small 16KB BAR.  But since it
>> is the only bridge on the bus, the function passes all available
>> resources down to its children.  As result, that BAR fails to allocate.
>>    And while that BAR seems not really needed, in some cases the
>> allocation error makes whole memory window to be disabled, that ends up
>> in NTB device driver attach failure.
> 
> Mika is working on what sounds like the same problem.  His current
> patch series is at
> https://lore.kernel.org/linux-pci/20221130112221.66612-1-mika.westerberg@linux.intel.com/
> 
> We would appreciate your comments and testing as that series is developed.

Thank you, Bjorn.  This definitely looks related, but as you've already 
noted in your review there, present patch does not handle BARs of the 
bridge itself, that I have in my case.  I'd be happy to test the updated 
patch.  Please keep me in a loop.

I also agree with your comment that the same should be done in case of 
multiple bridges.  I am generally not sure the cases of single bridge or 
not having hot-plug on this level should be any specific.

>> It may be rare to see PCI bridges with BARs, but I know that at least
>> some PLX bridges also have BARs for configuration purposes.  Also I
>> suppose the same problem would happen if there are other device on the
>> bus aside of the bridge.
>>
>> I've tried to disable pci_bus_distribute_available_resources(), and it
>> fixed the problem.  But I suppose it may cause problems in cases for
>> which this function was developed (hot-plug of JBOD supporting
>> hot-plug?).  Am I right?
>>
>> I would appreciate your feedback on this issue.  I am new to this code
>> area of Linux and not sure how to better fix this, but the way the code
>> is written now looks very wrong to me.
>>
>> Thanks.
>>
>> --
>> Alexander Motin

-- 
Alexander Motin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: pci_bus_distribute_available_resources() is wrong?
  2022-12-12 21:10   ` pci_bus_distribute_available_resources() is wrong? Alexander Motin
@ 2022-12-13  5:49     ` Mika Westerberg
  2022-12-13 14:11       ` Alexander Motin
  0 siblings, 1 reply; 6+ messages in thread
From: Mika Westerberg @ 2022-12-13  5:49 UTC (permalink / raw)
  To: Alexander Motin
  Cc: Bjorn Helgaas, Nicholas Johnson, Nick Wolff, Bjorn Helgaas,
	linux-pci

Hi,

On Mon, Dec 12, 2022 at 04:10:16PM -0500, Alexander Motin wrote:
> On 12.12.2022 15:32, Bjorn Helgaas wrote:
> > On Mon, Dec 12, 2022 at 1:36 PM Alexander Motin <mav@ixsystems.com> wrote:
> > > Hi,
> > > 
> > > I am writing to you three as the authors of Linux
> > > drivers/pci/setup-bus.c pci_bus_distribute_available_resources()
> > > function.  Trying to debug PCI hot-plug issue on passive side of AMD NTB
> > > I hit this function, behavior of which I looks very suspicious to me,
> > > which I believe cause resource allocation problems we observe.
> > > 
> > > As I see, this function distributes extra size of parent memory window
> > > of hot-plug PCI bridge between memory windows of child bridges.  It
> > > probably makes some sense, but I see a problem in the fact that the
> > > function only looks on children bridge memory windows, but not any other
> > > resources (of bridges or other devices that may be there).

Right the idea was that we allocate the spare resources for the possible
hotplug downstream ports so that it is possible to extend that topology
without running out of resources. This is mostly used with
Thunderbolt/USB4 PCIe tunneling.

However, like many have noticed, it does not handle the more generic PCI
case well. Sorry about that.

> > > In my AMD NTB case PCI topology looks this way:
> > > 
> > > +-[0000:80]-+-00.0
> > > |           +-01.1-[81-83]----00.0-[82-83]----00.0-[83]--+-00.0 Dummy
> > > |           |                                            \-00.1 NTB
> > > 
> > > 80:01.1 is the root bridge where the hot-plug happens.  The 81:00.0
> > > bridge in addition to memory windows has small 16KB BAR.  But since it
> > > is the only bridge on the bus, the function passes all available
> > > resources down to its children.  As result, that BAR fails to allocate.
> > >    And while that BAR seems not really needed, in some cases the
> > > allocation error makes whole memory window to be disabled, that ends up
> > > in NTB device driver attach failure.

Just out of the curiosity, is this PCIe or PCI topology?

> > Mika is working on what sounds like the same problem.  His current
> > patch series is at
> > https://lore.kernel.org/linux-pci/20221130112221.66612-1-mika.westerberg@linux.intel.com/
> > 
> > We would appreciate your comments and testing as that series is developed.
> 
> Thank you, Bjorn.  This definitely looks related, but as you've already
> noted in your review there, present patch does not handle BARs of the bridge
> itself, that I have in my case.  I'd be happy to test the updated patch.
> Please keep me in a loop.
> 
> I also agree with your comment that the same should be done in case of
> multiple bridges.  I am generally not sure the cases of single bridge or not
> having hot-plug on this level should be any specific.

Yeah, I'm working on a new version of the patch series that should take
these into consideration. The challenge is that the code has been used
with the Thunderbolt/USB4 PCIe tunneling for some time already and we
don't want to break that either.

I'm also more than happy to test any patches regarding this if someone
else wants to work on it ;-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: pci_bus_distribute_available_resources() is wrong?
  2022-12-13  5:49     ` Mika Westerberg
@ 2022-12-13 14:11       ` Alexander Motin
  2022-12-13 14:48         ` Mika Westerberg
  0 siblings, 1 reply; 6+ messages in thread
From: Alexander Motin @ 2022-12-13 14:11 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Nicholas Johnson, Nick Wolff, Bjorn Helgaas,
	linux-pci

On 13.12.2022 00:49, Mika Westerberg wrote:
> On Mon, Dec 12, 2022 at 04:10:16PM -0500, Alexander Motin wrote:
>> On 12.12.2022 15:32, Bjorn Helgaas wrote:
>>> On Mon, Dec 12, 2022 at 1:36 PM Alexander Motin <mav@ixsystems.com> wrote:
>>>> In my AMD NTB case PCI topology looks this way:
>>>>
>>>> +-[0000:80]-+-00.0
>>>> |           +-01.1-[81-83]----00.0-[82-83]----00.0-[83]--+-00.0 Dummy
>>>> |           |                                            \-00.1 NTB
>>>>
>>>> 80:01.1 is the root bridge where the hot-plug happens.  The 81:00.0
>>>> bridge in addition to memory windows has small 16KB BAR.  But since it
>>>> is the only bridge on the bus, the function passes all available
>>>> resources down to its children.  As result, that BAR fails to allocate.
>>>>     And while that BAR seems not really needed, in some cases the
>>>> allocation error makes whole memory window to be disabled, that ends up
>>>> in NTB device driver attach failure.
> 
> Just out of the curiosity, is this PCIe or PCI topology?

All PCIe: hot-plug root <-> upstream <-> downstream <-> two endpoints.

>>> Mika is working on what sounds like the same problem.  His current
>>> patch series is at
>>> https://lore.kernel.org/linux-pci/20221130112221.66612-1-mika.westerberg@linux.intel.com/
>>>
>>> We would appreciate your comments and testing as that series is developed.
>>
>> Thank you, Bjorn.  This definitely looks related, but as you've already
>> noted in your review there, present patch does not handle BARs of the bridge
>> itself, that I have in my case.  I'd be happy to test the updated patch.
>> Please keep me in a loop.
>>
>> I also agree with your comment that the same should be done in case of
>> multiple bridges.  I am generally not sure the cases of single bridge or not
>> having hot-plug on this level should be any specific.
> 
> Yeah, I'm working on a new version of the patch series that should take
> these into consideration. The challenge is that the code has been used
> with the Thunderbolt/USB4 PCIe tunneling for some time already and we
> don't want to break that either.
> 
> I'm also more than happy to test any patches regarding this if someone
> else wants to work on it ;-)

I was kind of ready to dive in, I hate hacks and tunables to workaround 
bugs.  But as I have told, I see this code first time, so my solutions 
may appear not right.  But I'll help as I can, if needed.

-- 
Alexander Motin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: pci_bus_distribute_available_resources() is wrong?
  2022-12-13 14:11       ` Alexander Motin
@ 2022-12-13 14:48         ` Mika Westerberg
  2022-12-13 15:00           ` Alexander Motin
  0 siblings, 1 reply; 6+ messages in thread
From: Mika Westerberg @ 2022-12-13 14:48 UTC (permalink / raw)
  To: Alexander Motin
  Cc: Bjorn Helgaas, Nicholas Johnson, Nick Wolff, Bjorn Helgaas,
	linux-pci

Hi,

On Tue, Dec 13, 2022 at 09:11:12AM -0500, Alexander Motin wrote:
> > I'm also more than happy to test any patches regarding this if someone
> > else wants to work on it ;-)
> 
> I was kind of ready to dive in, I hate hacks and tunables to workaround
> bugs.  But as I have told, I see this code first time, so my solutions may
> appear not right.  But I'll help as I can, if needed.

That's good to hear :) Okay feel free to use any of my previous patches
as base if needed. I can test the Thunderbolt/USB4 and the QEMU PCI/PCIe
topologies so please keep me in the loop when submitting.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: pci_bus_distribute_available_resources() is wrong?
  2022-12-13 14:48         ` Mika Westerberg
@ 2022-12-13 15:00           ` Alexander Motin
  2022-12-13 15:10             ` Mika Westerberg
  0 siblings, 1 reply; 6+ messages in thread
From: Alexander Motin @ 2022-12-13 15:00 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Nicholas Johnson, Nick Wolff, Bjorn Helgaas,
	linux-pci

On 13.12.2022 09:48, Mika Westerberg wrote:
> On Tue, Dec 13, 2022 at 09:11:12AM -0500, Alexander Motin wrote:
>>> I'm also more than happy to test any patches regarding this if someone
>>> else wants to work on it ;-)
>>
>> I was kind of ready to dive in, I hate hacks and tunables to workaround
>> bugs.  But as I have told, I see this code first time, so my solutions may
>> appear not right.  But I'll help as I can, if needed.
> 
> That's good to hear :) Okay feel free to use any of my previous patches
> as base if needed. I can test the Thunderbolt/USB4 and the QEMU PCI/PCIe
> topologies so please keep me in the loop when submitting.

It is quite a quick change of roles, don't you find?  It is you created 
the problem.  It is your email includes "@linux.intel.com" here.  Don't 
you feel some responsibility? ;)

Where are the current result of "I'm working on a new version of the 
patch series that should take these into consideration"?

-- 
Alexander Motin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: pci_bus_distribute_available_resources() is wrong?
  2022-12-13 15:00           ` Alexander Motin
@ 2022-12-13 15:10             ` Mika Westerberg
  0 siblings, 0 replies; 6+ messages in thread
From: Mika Westerberg @ 2022-12-13 15:10 UTC (permalink / raw)
  To: Alexander Motin
  Cc: Bjorn Helgaas, Nicholas Johnson, Nick Wolff, Bjorn Helgaas,
	linux-pci

Hi,

On Tue, Dec 13, 2022 at 10:00:42AM -0500, Alexander Motin wrote:
> On 13.12.2022 09:48, Mika Westerberg wrote:
> > On Tue, Dec 13, 2022 at 09:11:12AM -0500, Alexander Motin wrote:
> > > > I'm also more than happy to test any patches regarding this if someone
> > > > else wants to work on it ;-)
> > > 
> > > I was kind of ready to dive in, I hate hacks and tunables to workaround
> > > bugs.  But as I have told, I see this code first time, so my solutions may
> > > appear not right.  But I'll help as I can, if needed.
> > 
> > That's good to hear :) Okay feel free to use any of my previous patches
> > as base if needed. I can test the Thunderbolt/USB4 and the QEMU PCI/PCIe
> > topologies so please keep me in the loop when submitting.
> 
> It is quite a quick change of roles, don't you find?  It is you created the
> problem.  It is your email includes "@linux.intel.com" here.  Don't you feel
> some responsibility? ;)

Sorry I must have misunderstood what you meant then. I was under
impression that you were ready to go and fix the issue. Okay no problem
I will work on this myself then and keep you guys updated.

> Where are the current result of "I'm working on a new version of the patch
> series that should take these into consideration"?

Still early "draft" I will submit it once it is in better shape (and I
have validated it works in the known cases). Anyway it will happen after
the merge window is closed.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-12-13 15:09 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <2ec11223-edb3-5f5c-62cd-3532d92de0a4@ixsystems.com>
     [not found] ` <CAErSpo7WrAg5D4xyv0SycoDc1etSspU_TL6XMAK4STYrXDrGNQ@mail.gmail.com>
2022-12-12 21:10   ` pci_bus_distribute_available_resources() is wrong? Alexander Motin
2022-12-13  5:49     ` Mika Westerberg
2022-12-13 14:11       ` Alexander Motin
2022-12-13 14:48         ` Mika Westerberg
2022-12-13 15:00           ` Alexander Motin
2022-12-13 15:10             ` Mika Westerberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox