From: Marek Vasut <marek.vasut@mailbox.org>
To: Bjorn Helgaas <helgaas@kernel.org>,
Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-pci@vger.kernel.org,
"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
"Magnus Damm" <magnus.damm@gmail.com>,
"Manivannan Sadhasivam" <mani@kernel.org>,
"Rob Herring" <robh@kernel.org>,
"Yoshihiro Shimoda" <yoshihiro.shimoda.uh@renesas.com>,
linux-renesas-soc@vger.kernel.org
Subject: Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
Date: Wed, 17 Sep 2025 00:09:53 +0200 [thread overview]
Message-ID: <643c9b19-dda4-43c5-bb6d-aa0705053d43@mailbox.org> (raw)
In-Reply-To: <20250916181558.GA1810654@bhelgaas>
[-- Attachment #1: Type: text/plain, Size: 5892 bytes --]
On 9/16/25 8:15 PM, Bjorn Helgaas wrote:
> On Tue, Sep 16, 2025 at 07:39:07PM +0200, Marek Vasut wrote:
>> On 9/16/25 7:13 PM, Bjorn Helgaas wrote:
>>> On Tue, Sep 16, 2025 at 06:31:31PM +0200, Marek Vasut wrote:
>>>> On 9/16/25 3:57 PM, Geert Uytterhoeven wrote:
>>>>> On Tue, 16 Sept 2025 at 15:39, Marek Vasut <marek.vasut@mailbox.org> wrote:
>>>>>> On 9/16/25 11:59 AM, Geert Uytterhoeven wrote:
>>>
>>>>>>> This change looks correct, and fixes the timeout seen on White Hawk
>>>>>>> with CONFIG_DEBUG_LOCK_ALLOC=y.
>>>>>>> However, it causes a crash when CONFIG_DEBUG_LOCK_ALLOC=n:
>>>>>>>
>>>>>>> SError Interrupt on CPU0, code 0x00000000be000011 -- SError
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>> el1h_64_error_handler+0x2c/0x40
>>>>>>> el1h_64_error+0x70/0x74
>>>>>>> dw_pcie_read+0x20/0x74 (P)
>>>>>>> rcar_gen4_pcie_additional_common_init+0x1c/0x6c
>>>>>>
>>>>>> SError in rcar_gen4_pcie_additional_common_init , this is unrelated to
>>>>>> this fix.
>>>>>>
>>>>>> Does the following patch make this SError go away ?
>>>>>
>>>>>> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>>>>>> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>>>>>> @@ -204,6 +204,8 @@ static int rcar_gen4_pcie_common_init(struct
>>>>>> rcar_gen4_pcie *rcar)
>>>>>> if (ret)
>>>>>> goto err_unprepare;
>>>>>>
>>>>>> +mdelay(1);
>>>>>> +
>>>>>> if (rcar->drvdata->additional_common_init)
>>>>>> rcar->drvdata->additional_common_init(rcar);
>>>>>>
>>>>>
>>>>> Yes it does, thanks!
>>>>
>>>> So with this one extra mdelay(1), the PCIe is fully good on your side, or is
>>>> there still anything weird/flaky/malfunctioning ?
>>>
>>> Do we have a theory about why this delay is needed? I assume it's
>>> something specific to the rcar hardware (not something required by the
>>> PCIe base spec)?
>>>
>>> I'm seeing SError interrupts and external aborts on several arm64
>>> systems and I'd like to understand better why they happen and when/if
>>> we can recover from them.
>>
>> The SError here happens when the PWR RST is released and DBI is
>> accessed rapidly right after that. The current hypothesis is, that
>> the controller core needs a bit of time to initialize itself after
>> the reset is released, before DBI access can be performed ; if the
>> DBI access happens too soon after the reset was released, the core
>> reject the access and CPU interprets that as SError.
>
> Sounds like this SError is for something on the CPU side of the host
> bridge.
Since this only happens when accessing DBI registers shortly after the
reset was deasserted, it seems that way.
> I've also seen errors that seem to be related to errors on the PCIe
> side, e.g., a PCIe Completion Timeout or Unsupported Request. On most
> platforms, those result in writes being silently dropped or ~0 data
> being synthesized for reads.
>
> E.g., this SError that seems related to a BAR programming issue:
> https://lore.kernel.org/linux-pci/86plf0lgit.fsf@scott-ph-mail.amperecomputing.com/
The SError is reported for basically almost any AXI bus error, some IO
operation happened and it didn't complete or some such, so it is
expected to see a lot of disparate SErrors like that.
In this case, it isn't anything on the bridge side yet, the bridge isn't
even fully configured and the link is down when the SError happens here.
>> The reference manual however does not list any such delay, which
>> makes me concerned. I can send such a patch which adds the mdelay(1)
>> as a temporary stopgap fix, until some better explanation maybe
>> sometimes gets uncovered, if that would be OK ?
>
> Yeah, if it's some rcar-specific thing, I don't see a better
> alternative.
I have one more hypothesis.
I noticed in V4H RM rev.1.30 page 798 that CPG (reset) and PCIe are on
different busses. From CA76, the CPG reset is reachable via "CCI->Slave
Access BUS->CPG" and PCIe is reachable via "CCI->Slave Access BUS->HSC".
I wonder if a sequence like this:
- writel(CPG un-reset register)
- readl(PCI DBI register)
can, due to bus behavior, lead to readl(PCI DBI register) taking effect
on the PCIe IP _before_ the writel(CPG un-reset register) takes effect
on the CPG IP, which trigger the SError (coming from PCIe IP).
Or is it guaranteed that writel(some IP) followed by readl(another IP)
are strictly ordered in this manner even on the IP end at which they
arrive, even if the two IPs are on different busses ? Please pardon my
ignorance.
Does the attached diff also help Geert with the SError ?
Same diff is also inline below:
"
diff --git a/drivers/clk/renesas/renesas-cpg-mssr.c
b/drivers/clk/renesas/renesas-cpg-mssr.c
index be9f59e6975d..cb380ba20141 100644
--- a/drivers/clk/renesas/renesas-cpg-mssr.c
+++ b/drivers/clk/renesas/renesas-cpg-mssr.c
@@ -688,12 +688,14 @@ static int cpg_mssr_reset(struct
reset_controller_dev *rcdev,
/* Reset module */
writel(bitmask, priv->pub.base0 + priv->reset_regs[reg]);
+ readl(priv->pub.base0 + priv->reset_regs[reg]);
/* Wait for at least one cycle of the RCLK clock (@ ca. 32 kHz) */
udelay(35);
/* Release module from reset state */
writel(bitmask, priv->pub.base0 + priv->reset_clear_regs[reg]);
+ readl(priv->pub.base0 + priv->reset_clear_regs[reg]);
return 0;
}
@@ -708,6 +710,7 @@ static int cpg_mssr_assert(struct
reset_controller_dev *rcdev, unsigned long id)
dev_dbg(priv->dev, "assert %u%02u\n", reg, bit);
writel(bitmask, priv->pub.base0 + priv->reset_regs[reg]);
+ readl(priv->pub.base0 + priv->reset_regs[reg]);
return 0;
}
@@ -722,6 +725,7 @@ static int cpg_mssr_deassert(struct
reset_controller_dev *rcdev,
dev_dbg(priv->dev, "deassert %u%02u\n", reg, bit);
writel(bitmask, priv->pub.base0 + priv->reset_clear_regs[reg]);
+ readl(priv->pub.base0 + priv->reset_clear_regs[reg]);
return 0;
}
"
[-- Attachment #2: reset.diff --]
[-- Type: text/x-patch, Size: 1266 bytes --]
diff --git a/drivers/clk/renesas/renesas-cpg-mssr.c b/drivers/clk/renesas/renesas-cpg-mssr.c
index be9f59e6975d..cb380ba20141 100644
--- a/drivers/clk/renesas/renesas-cpg-mssr.c
+++ b/drivers/clk/renesas/renesas-cpg-mssr.c
@@ -688,12 +688,14 @@ static int cpg_mssr_reset(struct reset_controller_dev *rcdev,
/* Reset module */
writel(bitmask, priv->pub.base0 + priv->reset_regs[reg]);
+ readl(priv->pub.base0 + priv->reset_regs[reg]);
/* Wait for at least one cycle of the RCLK clock (@ ca. 32 kHz) */
udelay(35);
/* Release module from reset state */
writel(bitmask, priv->pub.base0 + priv->reset_clear_regs[reg]);
+ readl(priv->pub.base0 + priv->reset_clear_regs[reg]);
return 0;
}
@@ -708,6 +710,7 @@ static int cpg_mssr_assert(struct reset_controller_dev *rcdev, unsigned long id)
dev_dbg(priv->dev, "assert %u%02u\n", reg, bit);
writel(bitmask, priv->pub.base0 + priv->reset_regs[reg]);
+ readl(priv->pub.base0 + priv->reset_regs[reg]);
return 0;
}
@@ -722,6 +725,7 @@ static int cpg_mssr_deassert(struct reset_controller_dev *rcdev,
dev_dbg(priv->dev, "deassert %u%02u\n", reg, bit);
writel(bitmask, priv->pub.base0 + priv->reset_clear_regs[reg]);
+ readl(priv->pub.base0 + priv->reset_clear_regs[reg]);
return 0;
}
next prev parent reply other threads:[~2025-09-16 22:10 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-15 23:58 [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization Marek Vasut
2025-09-16 9:59 ` Geert Uytterhoeven
2025-09-16 13:39 ` Marek Vasut
2025-09-16 13:57 ` Geert Uytterhoeven
2025-09-16 16:31 ` Marek Vasut
2025-09-16 17:13 ` Bjorn Helgaas
2025-09-16 17:39 ` Marek Vasut
2025-09-16 18:15 ` Bjorn Helgaas
2025-09-16 22:09 ` Marek Vasut [this message]
2025-09-17 8:00 ` Geert Uytterhoeven
2025-09-17 13:44 ` Marek Vasut
2025-09-17 7:23 ` Geert Uytterhoeven
2025-09-18 3:16 ` Marek Vasut
2025-09-22 10:10 ` Geert Uytterhoeven
2025-09-22 15:17 ` Marek Vasut
2025-09-22 15:33 ` Geert Uytterhoeven
2025-09-22 15:49 ` Marek Vasut
2025-09-23 7:04 ` Geert Uytterhoeven
2025-09-25 16:36 ` Manivannan Sadhasivam
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=643c9b19-dda4-43c5-bb6d-aa0705053d43@mailbox.org \
--to=marek.vasut@mailbox.org \
--cc=bhelgaas@google.com \
--cc=geert@linux-m68k.org \
--cc=helgaas@kernel.org \
--cc=kwilczynski@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linux-renesas-soc@vger.kernel.org \
--cc=lpieralisi@kernel.org \
--cc=magnus.damm@gmail.com \
--cc=mani@kernel.org \
--cc=robh@kernel.org \
--cc=yoshihiro.shimoda.uh@renesas.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox