* Re: arm64: juno-r2: SSD detect failed on mainline and next
From: Robin Murphy @ 2025-04-11 19:11 UTC
To: Naresh Kamboju
Cc: Linux ARM, iommu, open list, lkft-triage, Linux Regressions,
Lorenzo Pieralisi, Bjorn Helgaas, Rob Herring, Dan Carpenter,
Arnd Bergmann, Anders Roxell, linux-pci@vger.kernel.org
On 10/04/2025 4:36 pm, Robin Murphy wrote:
> On 09/04/2025 4:56 pm, Naresh Kamboju wrote:
>> On Wed, 2 Apr 2025 at 21:04, Robin Murphy <robin.murphy@arm.com> wrote:
>>>
>>> On 31/03/2025 5:03 am, Naresh Kamboju wrote:
>>>> Regression on arm64 Juno-r2 devices: the SSD detection tests fail on
>>>> Linux next and Linux mainline.
>>>>
>>>> First seen on the v6.14-7245-g5c2a430e8599
>>>> Good: v6.14
>>>> Bad: v6.14-7422-gacb4f33713b9
>>>
>>> Sorry, I can't seem to reproduce this on my end, both today's mainline
>>> and acb4f33713b9 with my config, and even acb4f33713b9 with the linked
>>> LKFT config, all work OK on my Juno r2 (using a SATA SSD and PCIe
>>> networking). The only thing which stands out in your log is that PCI
>>> seems to give up probing and assigning resources beyond the switch
>>> downstream ports (so SATA and ethernet are never discovered), whereas on
>>> mine it does[2]. However that all happens before the first IOMMU
>>> instance probes (which conveniently is the PCIe one), so it's hard to
>>> imagine how that could have an effect anyway...
>>>
>>> The only obvious difference is that I'm using EDK2 rather than U-Boot,
>>> so that's done all the PCIe configuration once already, but it doesn't
>>> seem like that's significant - looking back at a random older log[1],
>>> the on-board endpoints were still being picked up right after
>>> reconfiguring the switch, well before the IOMMU comes into the picture.
>>>
>>
>> Since it is still an issue on mainline and next:
>>
>> I bisected it to the patch below. Reverting it brings back the SSD
>> drive, but causes kernel warnings at boot time:
>>
>> [bcb81ac6ae3c2ef95b44e7b54c3c9522364a245c]
>> iommu: Get DT/ACPI parsing into the proper probe path
>>
>> pcieport 0000:00:00.0: late IOMMU probe at driver bind, something
>> fishy here!
>> WARNING: at drivers/iommu/iommu.c:559 __iommu_probe_device
>>
>> I see boot warnings [1]
>> I am happy to test debug patches if you have any.
>
> Seeing the warning after reverting the commit which introduced the
> warning mostly just means the conflict resolution in the revert wasn't
> right (there were some subsequent fixups...)
>
> Anyway, I have now managed to get my Juno booting with the same antique
> version of U-Boot and finally reproduce the issue. It seems to be
> somehow connected to bus->dma_configure() being called in the
> device_add() notifier (even though the rest of the IOMMU setup doesn't
> run at that point since the driver hasn't registered yet), but how and
> why that prevents the buses behind the switch downstream ports being
> probed, and why *that* only happens when the switch isn't already
> configured, remains a mystery so far. I'm still digging...

OK, I found it, but I'm still not sure what exactly to make of it - it's
the pci_request_acs() in of_iommu_configure(), now being called early
enough to actually have an effect. Booting with EDK2 already using PCI
prior to Linux, here's what I get for `sudo lspci -vv | grep ACSCtl`
with 6.15-rc1:

ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-

whereas with the 6.14 behaviour they are all '-'. I don't have a working
root filesystem with the U-Boot setup, but if I boot it with
"pci=config_acs=000000@pci:0:0" then the kernel does assign the bridge
windows and discover the ethernet/SATA endpoints again. I can spend some
time getting NFS working next week, but if you're able to get lspci
output off a machine in the "broken" state easily that would be handy to
compare.
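
The ACSCtl lines above are the ACS Control register decoded bit by bit. As a minimal sketch for comparing such dumps (the helper name `decode_acs_ctrl` is hypothetical), the bit positions below follow the ACS Control register layout in the PCIe specification, which the kernel mirrors as `PCI_ACS_SV` (0x0001) through `PCI_ACS_DT` (0x0040):

```python
# Bits 0-6 of the PCIe ACS Control register, in the order lspci prints them.
ACS_CTRL_BITS = [
    (0x0001, "SrcValid"),     # ACS Source Validation Enable
    (0x0002, "TransBlk"),     # ACS Translation Blocking Enable
    (0x0004, "ReqRedir"),     # ACS P2P Request Redirect Enable
    (0x0008, "CmpltRedir"),   # ACS P2P Completion Redirect Enable
    (0x0010, "UpstreamFwd"),  # ACS Upstream Forwarding Enable
    (0x0020, "EgressCtrl"),   # ACS P2P Egress Control Enable
    (0x0040, "DirectTrans"),  # ACS Direct Translated P2P Enable
]


def decode_acs_ctrl(value: int) -> str:
    """Render a raw 16-bit ACS Control value in lspci's "Name+/Name-" style.

    Hypothetical helper for illustration; lspci itself separates fields
    with a tab after the colon, which is simplified to a space here.
    """
    flags = [name + ("+" if value & bit else "-") for bit, name in ACS_CTRL_BITS]
    return "ACSCtl: " + " ".join(flags)
```

Under this layout the 6.15-rc1 state (SrcValid, ReqRedir, CmpltRedir and UpstreamFwd set) corresponds to the raw value 0x001d, and the all-'-' 6.14 state to 0x0000.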
So at this point it would seem to be something about how Linux
configures ACS when doing it from scratch. What I don't really know is
where to go from there. I do know Juno's possibly a bit odd in that the
switch supports ACS, but both the root port and endpoints either side of
it don't. Could this be tickling some subtle bug in the PCI layer, and
what is EDK2 doing that makes it not happen?
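
Which devices on the path implement ACS at all can be cross-checked from raw config space. PCIe extended capabilities start at offset 0x100; each 32-bit header holds a 16-bit capability ID, a 4-bit version and a 12-bit next pointer, and ACS is extended capability ID 0x000D, with its Capability and Control registers at +4 and +6. The sketch below walks that list on a config-space dump; `find_ext_cap` and `read_acs_regs` are hypothetical names, not an existing library API.

```python
import struct

PCI_EXT_CAP_START = 0x100   # extended capabilities begin here in PCIe config space
PCI_EXT_CAP_ID_ACS = 0x000D


def find_ext_cap(cfg: bytes, cap_id: int):
    """Return the offset of extended capability `cap_id` in `cfg`, or None."""
    offset = PCI_EXT_CAP_START
    seen = set()  # guard against a malformed list that loops back on itself
    while PCI_EXT_CAP_START <= offset and offset + 4 <= len(cfg) and offset not in seen:
        seen.add(offset)
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):
            return None  # no extended capabilities, or the read failed
        if header & 0xFFFF == cap_id:
            return offset
        offset = (header >> 20) & 0xFFC  # next pointer, dword aligned; 0 ends the list
    return None


def read_acs_regs(cfg: bytes):
    """Return (ACS Capability, ACS Control) as ints, or None if ACS is absent."""
    pos = find_ext_cap(cfg, PCI_EXT_CAP_ID_ACS)
    if pos is None or pos + 8 > len(cfg):
        return None
    return struct.unpack_from("<HH", cfg, pos + 4)
```

On a live system, the bytes of `/sys/bus/pci/devices/<domain:bus:dev.fn>/config` can be passed to `read_acs_regs()`; this needs root, since unprivileged reads stop after the first 64 bytes. A `None` result means the device exposes no ACS capability. This only duplicates what `lspci -vv` already reports, but in a form that is easy to script across all devices.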
Thanks,
Robin.

* Re: arm64: juno-r2: SSD detect failed on mainline and next
From: Robin Murphy @ 2025-04-25 15:18 UTC
To: Naresh Kamboju
Cc: Linux ARM, iommu, open list, lkft-triage, Linux Regressions,
Lorenzo Pieralisi, Bjorn Helgaas, Rob Herring, Dan Carpenter,
Arnd Bergmann, Anders Roxell, linux-pci@vger.kernel.org
On 11/04/2025 8:11 pm, Robin Murphy wrote:
[...]
> OK, I found it, but I'm still not sure what exactly to make of it - it's
> the pci_request_acs() in of_iommu_configure(), now being called early
> enough to actually have an effect. Booting with EDK2 already using PCI
> prior to Linux, here's what I get for `sudo lspci -vv | grep ACSCtl`
> with 6.15-rc1:
>
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
>
> whereas with the 6.14 behaviour they are all '-'. I don't have a working
> root filesystem with the U-Boot setup, but if I boot it with
> "pci=config_acs=000000@pci:0:0" then the kernel does assign the bridge
> windows and discover the ethernet/SATA endpoints again. I can spend some
> time getting NFS working next week, but if you're able to get lspci
> output off a machine in the "broken" state easily that would be handy to
> compare.
>
> So at this point it would seem to be something about how Linux
> configures ACS when doing it from scratch. What I don't really know is
> where to go from there. I do know Juno's possibly a bit odd in that the
> switch supports ACS, but both the root port and endpoints either side of
> it don't. Could this be tickling some subtle bug in the PCI layer, and
> what is EDK2 doing that makes it not happen?

Just following up on where I ran out of ideas. I poked at things a
little more, and from a process of elimination, the culprit appears to
be that we enable ACS source validation on the downstream port while
its secondary bus is still 0, *then* we get to the "bridge configuration
invalid" bit and reconfigure the bus numbers, but after that, config
space accesses to the secondary bus still apparently fail to work as
expected.
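
For context, the PCIe specification describes ACS Source Validation as the port checking the bus number in the Requester ID of upstream requests against its own [secondary, subordinate] bus range. The toy model below (hypothetical class and method names) captures only that spec-level check, not whatever this particular switch does internally: it shows that while the bridge still has secondary = subordinate = 0, no real downstream bus number is inside the window.

```python
from dataclasses import dataclass


@dataclass
class DownstreamPort:
    """Toy model of a switch downstream port's bus-number window (hypothetical)."""
    secondary: int = 0
    subordinate: int = 0
    acs_sv: bool = False  # ACS Source Validation enable

    def accepts_upstream_request(self, requester_bus: int) -> bool:
        # With SV enabled, the requester's bus number must fall inside
        # [secondary, subordinate]; anything outside is rejected.
        if not self.acs_sv:
            return True
        return self.secondary <= requester_bus <= self.subordinate


port = DownstreamPort()
port.acs_sv = True                       # ACS enabled first...
print(port.accepts_upstream_request(3))  # False: the window is still [0, 0]
port.secondary = port.subordinate = 3    # ...then the bus numbers are reassigned
print(port.accepts_upstream_request(3))  # True in this idealised model
```

In this idealised model, renumbering restores the expected behaviour, which is what makes the observed failure after renumbering look like it might be specific to this switch. Note also that host config reads are downstream requests whose completions travel upstream, so the model is an illustration of the ordering problem, not an explanation of it.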
What's now beyond me is whether this is just an ACS quirk of this
particular switch, and/or whether there's something we could or should
be doing in the PCI layer.

All I can suggest at this point is that you could at least sidestep the
problem on the LKFT boards by updating them to a less-ancient version of
U-Boot which supports PCIe for Juno (looks like that landed in 2020.10),
which should then configure the switch at boot such that the bus
numbering doesn't need to change when Linux probes it - that appears to
be the only "magic" thing that EDK2 is doing.

Thanks,
Robin.

* Re: arm64: juno-r2: SSD detect failed on mainline and next
From: Naresh Kamboju @ 2025-05-30 9:48 UTC
To: Robin Murphy
Cc: Linux ARM, iommu, open list, lkft-triage, Linux Regressions,
Lorenzo Pieralisi, Bjorn Helgaas, Rob Herring, Dan Carpenter,
Arnd Bergmann, Anders Roxell, linux-pci@vger.kernel.org
Hi Robin,
On Fri, 25 Apr 2025 at 20:48, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 11/04/2025 8:11 pm, Robin Murphy wrote:
> [...]
> > OK, I found it, but I'm still not sure what exactly to make of it - it's
> > the pci_request_acs() in of_iommu_configure(), now being called early
> > enough to actually have an effect. Booting with EDK2 already using PCI
> > prior to Linux, here's what I get for `sudo lspci -vv | grep ACSCtl`

Linux version 6.14.9-rc1
# lspci
00:00.0 PCI bridge: PLDA PCI Express Core Reference Design
01:00.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:01.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:02.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:03.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:0c.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:10.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:1f.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
03:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
08:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8057 PCI-E Gigabit Ethernet Controller

Linux version 6.15.0
# lspci
00:00.0 PCI bridge: PLDA PCI Express Core Reference Design
01:00.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:01.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:02.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:03.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:0c.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:10.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:1f.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)

> > with 6.15-rc1:
> >
> > ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> > ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> > ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> > ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> > ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> > ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> >
> > whereas with the 6.14 behaviour they are all '-'. I don't have a working
> > root filesystem with the U-Boot setup, but if I boot it with
> > "pci=config_acs=000000@pci:0:0" then the kernel does assign the bridge
> > windows and discover the ethernet/SATA endpoints again. I can spend some
> > time getting NFS working next week, but if you're able to get lspci
> > output off a machine in the "broken" state easily that would be handy to
> > compare.

On 6.15, after adding this to the kernel boot args:

pci=config_acs=000000@pci:0:0

the SSD was detected and mounted successfully.

Linux version 6.15.0 + pci=config_acs=000000@pci:0:0
# lspci
00:00.0 PCI bridge: PLDA PCI Express Core Reference Design
01:00.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:01.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:02.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:03.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:0c.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:10.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
02:1f.0 PCI bridge: Microsemi / PMC / IDT 89HPES16NT16G2 16-Lane 16-Port PCIe Gen2 System Interconnect Switch (rev 02)
03:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
08:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8057 PCI-E Gigabit Ethernet Controller
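
The `pci=config_acs=` workaround takes a flag string before the `@`. According to the kernel-parameters documentation, each character is '0' (force disable), '1' (force enable) or 'x' (leave unchanged), with the rightmost character controlling bit 0 (Source Validation). Read that way, `000000` forces SrcValid through EgressCtrl off and leaves DirectTrans as it was. The sketch below re-implements that reading in userspace purely for illustration; it is not the kernel's parser, and `parse_acs_flags`/`apply_acs_flags` are hypothetical names.

```python
def parse_acs_flags(spec: str) -> tuple[int, int]:
    """Turn a config_acs flag string (e.g. "000000" or "10x") into (mask, flags).

    The rightmost character is bit 0; '0' forces a bit off, '1' forces it on,
    and 'x'/'X' leaves it unchanged (so the bit is excluded from the mask).
    """
    mask = flags = 0
    for shift, ch in enumerate(reversed(spec)):
        if ch in "01":
            mask |= 1 << shift
            if ch == "1":
                flags |= 1 << shift
        elif ch not in "xX":
            raise ValueError(f"invalid ACS flag character {ch!r}")
    if mask & ~0x7F:
        raise ValueError("only ACS Control bits 0-6 can be configured")
    return mask, flags


def apply_acs_flags(ctrl: int, mask: int, flags: int) -> int:
    """Apply (mask, flags) to an ACS Control value: forced bits replaced, others kept."""
    return (ctrl & ~mask) | (flags & mask)
```

With these helpers, `parse_acs_flags("000000")` gives `(0x3f, 0)`, and applying it to the 0x001d state seen on 6.15 yields 0, the same all-'-' ACSCtl that 6.14 left in place. That is consistent with the SATA and Ethernet endpoints reappearing in the listing above.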
> >
> > So at this point it would seem to be something about how Linux
> > configures ACS when doing it from scratch. What I don't really know is
> > where to go from there. I do know Juno's possibly a bit odd in that the
> > switch supports ACS, but both the root port and endpoints either side of
> > it don't. Could this be tickling some subtle bug in the PCI layer, and
> > what is EDK2 doing that makes it not happen?
>
> Just following up on where I ran out of ideas. I poked at things a
> little more, and from a process of elimination, the culprit appears to
> be that we enable ACS source validation on the downstream port while
> its secondary bus is still 0, *then* we get to the "bridge configuration
> invalid" bit and reconfigure the bus numbers, but after that, config
> space accesses to the secondary bus still apparently fail to work as
> expected.
>
> What's now beyond me is whether this is just an ACS quirk of this
> particular switch, and/or whether there's something we could or should
> be doing in the PCI layer.
>
> All I can suggest at this point is that you could at least sidestep the
> problem on the LKFT boards by updating them to a less-ancient version of
> U-Boot which supports PCIe for Juno (looks like that landed in 2020.10),
> which should then configure the switch at boot such that the bus
> numbering doesn't need to change when Linux probes it - that appears to
> be the only "magic" thing that EDK2 is doing.
I will work with my team to address these issues.
Thanks for your suggestions.
>
> Thanks,
> Robin.
- Naresh