From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: "Pali Rohár" <pali@kernel.org>, "Bjorn Helgaas" <helgaas@kernel.org>
Cc: vtolkm@gmail.com, linux-pci@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
"Rob Herring" <robh@kernel.org>,
"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
"Marek Behún" <marek.behun@nic.cz>,
"Thomas Petazzoni" <thomas.petazzoni@bootlin.com>,
"Jason Cooper" <jason@lakedaemon.net>
Subject: Re: PCI trouble on mvebu (Turris Omnia)
Date: Fri, 30 Oct 2020 14:02:22 +0100
Message-ID: <87k0v7n9y9.fsf@toke.dk>
In-Reply-To: <20201030112331.meqg6lvultyn6v54@pali>

Pali Rohár <pali@kernel.org> writes:
> On Wednesday 28 October 2020 18:16:26 Bjorn Helgaas wrote:
>> [+cc Pali, Marek, Thomas, Jason]
>>
>> On Wed, Oct 28, 2020 at 04:40:00PM +0000, ™֟☻̭҇ Ѽ ҉ ® wrote:
>> > On 28/10/2020 16:08, Toke Høiland-Jørgensen wrote:
>> > > Bjorn Helgaas <helgaas@kernel.org> writes:
>> > > > On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
>> > > > > Toke Høiland-Jørgensen <toke@redhat.com> writes:
>> > > > > > Bjorn Helgaas <helgaas@kernel.org> writes:
>> > > > > >
>> > > > > > > [+cc vtolkm]
>> > > > > > >
>> > > > > > > On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
>> > > > > > > > Hi everyone
>> > > > > > > >
>> > > > > > > > I'm trying to get a mainline kernel to run on my Turris Omnia, and am
>> > > > > > > > having some trouble getting the PCI bus to work correctly. Specifically,
>> > > > > > > > I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
>> > > > > > > > the resource request fix[0] applied on top.
>> > > > > > > >
>> > > > > > > > The kernel boots fine, and the patch in [0] makes the PCI devices show
>> > > > > > > > up. But I'm still getting initialisation errors like these:
>> > > > > > > >
>> > > > > > > > [ 1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>> > > > > > > > [ 1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> > > > > > > > [ 1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
>> > > > > > > > [ 1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> > > > > > > >
>> > > > > > > > and the WiFi drivers fail to initialise with what appears to me to be
>> > > > > > > > errors related to the bus rather than to the drivers themselves:
>> > > > > > > >
>> > > > > > > > [ 3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>> > > > > > > > [ 3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
>> > > > > > > > [ 3.524473] ath9k 0000:01:00.0: Failed to initialize device
>> > > > > > > > [ 3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
>> > > > > > > > [ 3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
>> > > > > > > > [ 3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
>> > > > > > > > [ 3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
>> > > > > > > > [ 3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
>> > > > > > > > [ 3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>> > > > > > > >
>> > > > > > > > lspci looks OK, though:
>> > > > > > > >
>> > > > > > > > # lspci
>> > > > > > > > 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> > > > > > > > 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> > > > > > > > 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> > > > > > > > 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
>> > > > > > > > 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
>> > > > > > > >
>> > > > > > > > Does anyone have any clue what could be going on here? Is this a bug, or
>> > > > > > > > did I miss something in my config or other initialisation? I've tried
>> > > > > > > > with both the stock u-boot distributed with the board, and with an
>> > > > > > > > upstream u-boot from latest master; doesn't seem to make any difference.
>> > > > > > > Can you try turning off CONFIG_PCIEASPM? We had a similar recent
>> > > > > > > report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
>> > > > > > > don't think we have a fix yet.
>> > > > > > Yes! Turning that off does indeed help! Thanks a bunch :)
>> > > > > >
>> > > > > > You mention that bisecting this would be helpful - I can try that
>> > > > > > tomorrow; any idea when this was last working?
>> > > > > OK, so I tried to bisect this, but, erm, I couldn't find a working
>> > > > > revision to start from? I went all the way back to 4.10 (which is the
>> > > > > first version to include the device tree file for the Omnia), and even
>> > > > > on that, the wireless cards were failing to initialise with ASPM
>> > > > > enabled...
>> > > > I have no personal experience with this device; all I know is that the
>> > > > bugzilla suggests that it worked in v5.4, which isn't much help.
>> > > >
>> > > > Possibly the apparent regression was really a .config change, i.e.,
>> > > > CONFIG_PCIEASPM was disabled in the v5.4 kernel vtolkm@ tested and it
>> > > > "worked" but got enabled later and it started failing?
>> > > Yeah, I suspect so. The OpenWrt config disables CONFIG_PCIEASPM by
>> > > default and only turns it on for specific targets. So I guess that it's
>> > > most likely that this has never worked...
>> > >
>> > > > Maybe the debug patch below would be worth trying to see if it makes
>> > > > any difference? If it *does* help, try omitting the first hunk to see
>> > > > if we just need to apply the quirk_enable_clear_retrain_link() quirk.
>> > > Tried, doesn't help...
>> > >
>> > > -Toke
>> >
>> > Found this patch
>> >
>> > https://github.com/openwrt/openwrt/blob/7c0496f29bed87326f1bf591ca25ace82373cfc7/target/linux/mvebu/patches-5.4/405-PCI-aardvark-Improve-link-training.patch
>> >
>> > that mentions the Compex WLE900VX card, which reading the lspci verbose
>> > output from the bugtracker seems to be the device being troubled.
>>
>> Interesting. Indeed, the Compex WLE900VX card seems to have the
>> Qualcomm Atheros QCA9880 on it, and it looks like Toke's system has
>> the same device in it.
>>
>> The patch you mention (https://git.kernel.org/linus/43fc679ced18) is
>> for aardvark, so of course doesn't help mvebu.
>>
>> PCIe hardware is supposed to automatically negotiate the highest link
>> speed supported by both ends. But software *is* allowed to set an
>> upper limit (the Target Link Speed in Link Control 2). If we initiate
>> a retrain and the link doesn't come back up, I wonder if we should try
>> to help the hardware out by using Target Link Speed to limit to a
>> lower speed and attempting another retrain, something like this hacky
>> patch: (please collect the dmesg log if you try this)
>
> My experience with that WLE900VX card, aardvark driver and aspm code:
>
> Link training in GEN2 mode for this card succeeds only once after reset.
> Repeated link retraining fails and it fails even when aardvark is
> reconfigured to GEN1 mode. Reset via PERST# signal is required to have
> working link training.
>
> What I did in aardvark driver: Set mode to GEN2, do link training. On
> success, read "negotiated link speed" from "Link Control Status Register"
> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then
> retrain link again (for WLE900VX now it would be at GEN1). After that
> card is stable and all future retraining (e.g. from aspm.c) also passes.
>
> If I do not change aardvark mode from GEN2 to GEN1 the second link
> training fails. And if I change mode to GEN1 after this failed link
> training then nothing happens; link training does not succeed.
>
> So just speculation now... In the current setup, initialization of the card
> does one link training at GEN2. Then aspm.c is called, which does a second
> link retraining at GEN2. And if it fails, then the patch below issues a third
> link retraining at GEN1. If A38x/pci-mvebu has the same problem as aardvark,
> then the second link retraining must be at GEN1 (not GEN2) to work around
> this issue.
>
> Bjorn, Toke: what about trying to hack aspm.c code to never do link
> retraining at GEN2 speed? And always force GEN1 speed prior to link
> training?
Sounds like a plan. I poked around in aspm.c and must confess to being a
bit lost in the soup of registers ;)

So if one of you can cook up a patch, that would be most helpful!
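
For anyone else following along, here is a rough sketch of the register
fields the suggestion above seems to involve, as far as I can tell. The
bit masks are the ones defined in include/uapi/linux/pci_regs.h; the helper
names are made up for illustration and operate on plain 16-bit values, so
this is not a patch against aspm.c, just the bit manipulation that "force
GEN1, then retrain" would boil down to.

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKCTL_RL          0x0020 /* Link Control: Retrain Link */
#define PCI_EXP_LNKSTA_CLS         0x000f /* Link Status: Current Link Speed */
#define PCI_EXP_LNKSTA_LT          0x0800 /* Link Status: Link Training */
#define PCI_EXP_LNKCTL2_TLS        0x000f /* Link Control 2: Target Link Speed */
#define PCI_EXP_LNKCTL2_TLS_2_5GT  0x0001 /* 2.5 GT/s, i.e. GEN1 */
#define PCI_EXP_LNKCTL2_TLS_5_0GT  0x0002 /* 5.0 GT/s, i.e. GEN2 */

/*
 * Hypothetical helper: return the Link Control 2 value with the Target
 * Link Speed field replaced by 'tls', leaving all other bits untouched.
 */
static uint16_t lnkctl2_limit_speed(uint16_t lnkctl2, uint16_t tls)
{
	assert((tls & ~PCI_EXP_LNKCTL2_TLS) == 0);
	return (uint16_t)((lnkctl2 & ~PCI_EXP_LNKCTL2_TLS) | tls);
}

/* Hypothetical helper: return the Link Control value with Retrain Link set. */
static uint16_t lnkctl_request_retrain(uint16_t lnkctl)
{
	return (uint16_t)(lnkctl | PCI_EXP_LNKCTL_RL);
}

/* Hypothetical helper: extract Current Link Speed (1 = GEN1, 2 = GEN2). */
static unsigned int lnksta_current_speed(uint16_t lnksta)
{
	return lnksta & PCI_EXP_LNKSTA_CLS;
}

/* Hypothetical helper: 1 while the Link Training bit is still set, else 0. */
static int lnksta_training(uint16_t lnksta)
{
	return (lnksta & PCI_EXP_LNKSTA_LT) != 0;
}
```

In the kernel the values would of course come from the bridge's PCIe
capability rather than from plain integers, and the order would presumably
be: write the limited Link Control 2 value first, then set Retrain Link,
then poll Link Status until the training bit clears.
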
-Toke