Linux PCI subsystem development
 help / color / mirror / Atom feed
* [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
@ 2025-09-15 23:58 Marek Vasut
  2025-09-16  9:59 ` Geert Uytterhoeven
  2025-09-25 16:36 ` Manivannan Sadhasivam
  0 siblings, 2 replies; 19+ messages in thread
From: Marek Vasut @ 2025-09-15 23:58 UTC (permalink / raw)
  To: linux-pci
  Cc: Marek Vasut, Krzysztof Wilczyński, Bjorn Helgaas,
	Geert Uytterhoeven, Lorenzo Pieralisi, Magnus Damm,
	Manivannan Sadhasivam, Rob Herring, Yoshihiro Shimoda,
	linux-renesas-soc

R-Car V4H Reference Manual R19UH0186EJ0130 Rev.1.30 Apr. 21, 2025 page 4581
Figure 104.3b Initial Setting of PCIEC(example), third quarter of the figure
indicates that register 0xf8 should be polled until bit 18 becomes set to 1.

Register 0xf8 bit 18 is 0 immediately after write to PCIERSTCTRL1 and is set
to 1 in less than 1 ms afterward. The current readl_poll_timeout() break
condition is inverted and returns when register 0xf8 bit 18 is set to 0,
which in most cases means immediately. In case CONFIG_DEBUG_LOCK_ALLOC=y ,
the timing changes just enough for the first readl_poll_timeout() poll to
already read register 0xf8 bit 18 as 1 and afterward never read register
0xf8 bit 18 as 0, which leads to timeout and failure to start the PCIe
controller.

Fix this by inverting the poll condition to match the reference manual
initialization sequence.

Fixes: faf5a975ee3b ("PCI: rcar-gen4: Add support for R-Car V4H")
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
---
Cc: "Krzysztof Wilczyński" <kwilczynski@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Manivannan Sadhasivam <mani@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: linux-pci@vger.kernel.org
Cc: linux-renesas-soc@vger.kernel.org
---
 drivers/pci/controller/dwc/pcie-rcar-gen4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/controller/dwc/pcie-rcar-gen4.c b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
index 5932f83996f0..dafec70837fe 100644
--- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
+++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
@@ -711,7 +711,7 @@ static int rcar_gen4_pcie_ltssm_control(struct rcar_gen4_pcie *rcar, bool enable
 	val &= ~APP_HOLD_PHY_RST;
 	writel(val, rcar->base + PCIERSTCTRL1);
 
-	ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, !(val & BIT(18)), 100, 10000);
+	ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, val & BIT(18), 100, 10000);
 	if (ret < 0)
 		return ret;
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-15 23:58 [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization Marek Vasut
@ 2025-09-16  9:59 ` Geert Uytterhoeven
  2025-09-16 13:39   ` Marek Vasut
  2025-09-25 16:36 ` Manivannan Sadhasivam
  1 sibling, 1 reply; 19+ messages in thread
From: Geert Uytterhoeven @ 2025-09-16  9:59 UTC (permalink / raw)
  To: Marek Vasut
  Cc: linux-pci, Krzysztof Wilczyński, Bjorn Helgaas,
	Lorenzo Pieralisi, Magnus Damm, Manivannan Sadhasivam,
	Rob Herring, Yoshihiro Shimoda, linux-renesas-soc

Hi Marek,

On Tue, 16 Sept 2025 at 01:59, Marek Vasut
<marek.vasut+renesas@mailbox.org> wrote:
> R-Car V4H Reference Manual R19UH0186EJ0130 Rev.1.30 Apr. 21, 2025 page 4581
> Figure 104.3b Initial Setting of PCIEC(example), third quarter of the figure
> indicates that register 0xf8 should be polled until bit 18 becomes set to 1.
>
> Register 0xf8 bit 18 is 0 immediately after write to PCIERSTCTRL1 and is set
> to 1 in less than 1 ms afterward. The current readl_poll_timeout() break
> condition is inverted and returns when register 0xf8 bit 18 is set to 0,
> which in most cases means immediately. In case CONFIG_DEBUG_LOCK_ALLOC=y ,
> the timing changes just enough for the first readl_poll_timeout() poll to
> already read register 0xf8 bit 18 as 1 and afterward never read register
> 0xf8 bit 18 as 0, which leads to timeout and failure to start the PCIe
> controller.
>
> Fix this by inverting the poll condition to match the reference manual
> initialization sequence.
>
> Fixes: faf5a975ee3b ("PCI: rcar-gen4: Add support for R-Car V4H")
> Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>

Thanks for your patch!

> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> @@ -711,7 +711,7 @@ static int rcar_gen4_pcie_ltssm_control(struct rcar_gen4_pcie *rcar, bool enable
>         val &= ~APP_HOLD_PHY_RST;
>         writel(val, rcar->base + PCIERSTCTRL1);
>
> -       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, !(val & BIT(18)), 100, 10000);
> +       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, val & BIT(18), 100, 10000);
>         if (ret < 0)
>                 return ret;
>

This change looks correct, and fixes the timeout seen on White Hawk
with CONFIG_DEBUG_LOCK_ALLOC=y.
However, it causes a crash when CONFIG_DEBUG_LOCK_ALLOC=n:

    SError Interrupt on CPU0, code 0x00000000be000011 -- SError
    CPU: 0 UID: 0 PID: 14 Comm: kworker/u16:1 Tainted: G   M
     6.17.0-rc5-rcar3-06070-gdadc15c5e4ba #523 NONE
    Tainted: [M]=MACHINE_CHECK
    Hardware name: Renesas White Hawk CPU and Breakout boards based on
r8a779g0 (DT)
    Workqueue: async async_run_entry_fn
    pstate: 60000009 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : dw_pcie_read+0x20/0x74
    lr : dw_pcie_read_dbi+0x74/0x9c
    sp : ffffffc086123ae0
    x29: ffffffc086123af0 x28: ffffff84403454e0 x27: ffffff8440343405
    x26: 61c8864680b583eb x25: ffffff8441769010 x24: ffffff86bedb7f80
    x23: ffffff8441769010 x22: ffffff8442b890d8 x21: ffffff8442b89650
    x20: 0000000000000000 x19: ffffff8442b89080 x18: 00000000e3b92ec8
    x17: ffffffc08002bcc8 x16: ffffffc08021d3c8 x15: ffffffc0801f10b8
    x14: ffffffc0861238d0 x13: 1ffffff0883aa021 x12: ffffffc080b46d14
    x11: 0000000000000000 x10: ffffffffffffffff x9 : 0000000000000001
    x8 : ffffffc0861239f0 x7 : 0000000000000000 x6 : ffffffc086616000
    x5 : 0000000000000714 x4 : ffffff8442b89080 x3 : 0000000000000003
    x2 : ffffffc086123ae4 x1 : 0000000000000000 x0 : ffffffc086616714
    Kernel panic - not syncing: Asynchronous SError Interrupt
    CPU: 0 UID: 0 PID: 14 Comm: kworker/u16:1 Tainted: G   M
     6.17.0-rc5-rcar3-06070-gdadc15c5e4ba #523 NONE
    Tainted: [M]=MACHINE_CHECK
    Hardware name: Renesas White Hawk CPU and Breakout boards based on
r8a779g0 (DT)
    Workqueue: async async_run_entry_fn
    Call trace:
     show_stack+0x14/0x1c (C)
     dump_stack_lvl+0x64/0x8c
     dump_stack+0x14/0x1c
     vpanic+0x10c/0x2f0
     nmi_panic+0x0/0x64
     nmi_panic+0x50/0x64
     arm64_serror_panic+0x68/0x74
     do_serror+0x2c/0x48
     el1h_64_error_handler+0x2c/0x40
     el1h_64_error+0x70/0x74
     dw_pcie_read+0x20/0x74 (P)
     rcar_gen4_pcie_additional_common_init+0x1c/0x6c
     rcar_gen4_pcie_host_init+0xbc/0x17c
     dw_pcie_host_init+0x258/0x428
     rcar_gen4_pcie_probe+0x198/0x1b4
     platform_probe+0x58/0x88
     really_probe+0x130/0x260
     __driver_probe_device+0xec/0x104
     driver_probe_device+0x3c/0xc0
     __device_attach_driver+0x58/0xcc
     bus_for_each_drv+0xb8/0xe0
     __device_attach_async_helper+0x70/0xc4
     async_run_entry_fn+0x34/0xe0
     process_scheduled_works+0x204/0x308
     worker_thread+0x144/0x1e0
     kthread+0x198/0x1a8
     ret_from_fork+0x10/0x20
    Kernel Offset: disabled
    CPU features: 0x000000,00007000,24003041,0400700b
    Memory Limit: none
    ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-16  9:59 ` Geert Uytterhoeven
@ 2025-09-16 13:39   ` Marek Vasut
  2025-09-16 13:57     ` Geert Uytterhoeven
  0 siblings, 1 reply; 19+ messages in thread
From: Marek Vasut @ 2025-09-16 13:39 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-pci, Krzysztof Wilczyński, Bjorn Helgaas,
	Lorenzo Pieralisi, Magnus Damm, Manivannan Sadhasivam,
	Rob Herring, Yoshihiro Shimoda, linux-renesas-soc

On 9/16/25 11:59 AM, Geert Uytterhoeven wrote:
> Hi Marek,

Hello Geert,

> On Tue, 16 Sept 2025 at 01:59, Marek Vasut
> <marek.vasut+renesas@mailbox.org> wrote:
>> R-Car V4H Reference Manual R19UH0186EJ0130 Rev.1.30 Apr. 21, 2025 page 4581
>> Figure 104.3b Initial Setting of PCIEC(example), third quarter of the figure
>> indicates that register 0xf8 should be polled until bit 18 becomes set to 1.
>>
>> Register 0xf8 bit 18 is 0 immediately after write to PCIERSTCTRL1 and is set
>> to 1 in less than 1 ms afterward. The current readl_poll_timeout() break
>> condition is inverted and returns when register 0xf8 bit 18 is set to 0,
>> which in most cases means immediately. In case CONFIG_DEBUG_LOCK_ALLOC=y ,
>> the timing changes just enough for the first readl_poll_timeout() poll to
>> already read register 0xf8 bit 18 as 1 and afterward never read register
>> 0xf8 bit 18 as 0, which leads to timeout and failure to start the PCIe
>> controller.
>>
>> Fix this by inverting the poll condition to match the reference manual
>> initialization sequence.
>>
>> Fixes: faf5a975ee3b ("PCI: rcar-gen4: Add support for R-Car V4H")
>> Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
> 
> Thanks for your patch!
> 
>> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>> @@ -711,7 +711,7 @@ static int rcar_gen4_pcie_ltssm_control(struct rcar_gen4_pcie *rcar, bool enable
>>          val &= ~APP_HOLD_PHY_RST;
>>          writel(val, rcar->base + PCIERSTCTRL1);
>>
>> -       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, !(val & BIT(18)), 100, 10000);
>> +       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, val & BIT(18), 100, 10000);
>>          if (ret < 0)
>>                  return ret;
>>
> 
> This change looks correct, and fixes the timeout seen on White Hawk
> with CONFIG_DEBUG_LOCK_ALLOC=y.
> However, it causes a crash when CONFIG_DEBUG_LOCK_ALLOC=n:
> 
>      SError Interrupt on CPU0, code 0x00000000be000011 -- SError

...

>       el1h_64_error_handler+0x2c/0x40
>       el1h_64_error+0x70/0x74
>       dw_pcie_read+0x20/0x74 (P)
>       rcar_gen4_pcie_additional_common_init+0x1c/0x6c

SError in rcar_gen4_pcie_additional_common_init , this is unrelated to 
this fix.

Does the following patch make this SError go away ?

"
diff --git a/drivers/pci/controller/dwc/pcie-rcar-gen4.c 
b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
index d787f2a364d5..5932f83996f0 100644
--- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
+++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
@@ -204,6 +204,8 @@ static int rcar_gen4_pcie_common_init(struct 
rcar_gen4_pcie *rcar)
         if (ret)
                 goto err_unprepare;

+mdelay(1);
+
         if (rcar->drvdata->additional_common_init)
                 rcar->drvdata->additional_common_init(rcar);

"

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-16 13:39   ` Marek Vasut
@ 2025-09-16 13:57     ` Geert Uytterhoeven
  2025-09-16 16:31       ` Marek Vasut
  0 siblings, 1 reply; 19+ messages in thread
From: Geert Uytterhoeven @ 2025-09-16 13:57 UTC (permalink / raw)
  To: Marek Vasut
  Cc: linux-pci, Krzysztof Wilczyński, Bjorn Helgaas,
	Lorenzo Pieralisi, Magnus Damm, Manivannan Sadhasivam,
	Rob Herring, Yoshihiro Shimoda, linux-renesas-soc

Hi Marek,

On Tue, 16 Sept 2025 at 15:39, Marek Vasut <marek.vasut@mailbox.org> wrote:
> On 9/16/25 11:59 AM, Geert Uytterhoeven wrote:
> > On Tue, 16 Sept 2025 at 01:59, Marek Vasut
> > <marek.vasut+renesas@mailbox.org> wrote:
> >> R-Car V4H Reference Manual R19UH0186EJ0130 Rev.1.30 Apr. 21, 2025 page 4581
> >> Figure 104.3b Initial Setting of PCIEC(example), third quarter of the figure
> >> indicates that register 0xf8 should be polled until bit 18 becomes set to 1.
> >>
> >> Register 0xf8 bit 18 is 0 immediately after write to PCIERSTCTRL1 and is set
> >> to 1 in less than 1 ms afterward. The current readl_poll_timeout() break
> >> condition is inverted and returns when register 0xf8 bit 18 is set to 0,
> >> which in most cases means immediately. In case CONFIG_DEBUG_LOCK_ALLOC=y ,
> >> the timing changes just enough for the first readl_poll_timeout() poll to
> >> already read register 0xf8 bit 18 as 1 and afterward never read register
> >> 0xf8 bit 18 as 0, which leads to timeout and failure to start the PCIe
> >> controller.
> >>
> >> Fix this by inverting the poll condition to match the reference manual
> >> initialization sequence.
> >>
> >> Fixes: faf5a975ee3b ("PCI: rcar-gen4: Add support for R-Car V4H")
> >> Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
> >
> > Thanks for your patch!
> >
> >> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> >> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> >> @@ -711,7 +711,7 @@ static int rcar_gen4_pcie_ltssm_control(struct rcar_gen4_pcie *rcar, bool enable
> >>          val &= ~APP_HOLD_PHY_RST;
> >>          writel(val, rcar->base + PCIERSTCTRL1);
> >>
> >> -       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, !(val & BIT(18)), 100, 10000);
> >> +       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, val & BIT(18), 100, 10000);
> >>          if (ret < 0)
> >>                  return ret;
> >>
> >
> > This change looks correct, and fixes the timeout seen on White Hawk
> > with CONFIG_DEBUG_LOCK_ALLOC=y.
> > However, it causes a crash when CONFIG_DEBUG_LOCK_ALLOC=n:
> >
> >      SError Interrupt on CPU0, code 0x00000000be000011 -- SError
>
> ...
>
> >       el1h_64_error_handler+0x2c/0x40
> >       el1h_64_error+0x70/0x74
> >       dw_pcie_read+0x20/0x74 (P)
> >       rcar_gen4_pcie_additional_common_init+0x1c/0x6c
>
> SError in rcar_gen4_pcie_additional_common_init , this is unrelated to
> this fix.
>
> Does the following patch make this SError go away ?

> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> @@ -204,6 +204,8 @@ static int rcar_gen4_pcie_common_init(struct
> rcar_gen4_pcie *rcar)
>          if (ret)
>                  goto err_unprepare;
>
> +mdelay(1);
> +
>          if (rcar->drvdata->additional_common_init)
>                  rcar->drvdata->additional_common_init(rcar);
>

Yes it does, thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-16 13:57     ` Geert Uytterhoeven
@ 2025-09-16 16:31       ` Marek Vasut
  2025-09-16 17:13         ` Bjorn Helgaas
  2025-09-17  7:23         ` Geert Uytterhoeven
  0 siblings, 2 replies; 19+ messages in thread
From: Marek Vasut @ 2025-09-16 16:31 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-pci, Krzysztof Wilczyński, Bjorn Helgaas,
	Lorenzo Pieralisi, Magnus Damm, Manivannan Sadhasivam,
	Rob Herring, Yoshihiro Shimoda, linux-renesas-soc

On 9/16/25 3:57 PM, Geert Uytterhoeven wrote:

Hello Geert,

> On Tue, 16 Sept 2025 at 15:39, Marek Vasut <marek.vasut@mailbox.org> wrote:
>> On 9/16/25 11:59 AM, Geert Uytterhoeven wrote:
>>> On Tue, 16 Sept 2025 at 01:59, Marek Vasut
>>> <marek.vasut+renesas@mailbox.org> wrote:
>>>> R-Car V4H Reference Manual R19UH0186EJ0130 Rev.1.30 Apr. 21, 2025 page 4581
>>>> Figure 104.3b Initial Setting of PCIEC(example), third quarter of the figure
>>>> indicates that register 0xf8 should be polled until bit 18 becomes set to 1.
>>>>
>>>> Register 0xf8 bit 18 is 0 immediately after write to PCIERSTCTRL1 and is set
>>>> to 1 in less than 1 ms afterward. The current readl_poll_timeout() break
>>>> condition is inverted and returns when register 0xf8 bit 18 is set to 0,
>>>> which in most cases means immediately. In case CONFIG_DEBUG_LOCK_ALLOC=y ,
>>>> the timing changes just enough for the first readl_poll_timeout() poll to
>>>> already read register 0xf8 bit 18 as 1 and afterward never read register
>>>> 0xf8 bit 18 as 0, which leads to timeout and failure to start the PCIe
>>>> controller.
>>>>
>>>> Fix this by inverting the poll condition to match the reference manual
>>>> initialization sequence.
>>>>
>>>> Fixes: faf5a975ee3b ("PCI: rcar-gen4: Add support for R-Car V4H")
>>>> Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
>>>
>>> Thanks for your patch!
>>>
>>>> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>>>> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>>>> @@ -711,7 +711,7 @@ static int rcar_gen4_pcie_ltssm_control(struct rcar_gen4_pcie *rcar, bool enable
>>>>           val &= ~APP_HOLD_PHY_RST;
>>>>           writel(val, rcar->base + PCIERSTCTRL1);
>>>>
>>>> -       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, !(val & BIT(18)), 100, 10000);
>>>> +       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, val & BIT(18), 100, 10000);
>>>>           if (ret < 0)
>>>>                   return ret;
>>>>
>>>
>>> This change looks correct, and fixes the timeout seen on White Hawk
>>> with CONFIG_DEBUG_LOCK_ALLOC=y.
>>> However, it causes a crash when CONFIG_DEBUG_LOCK_ALLOC=n:
>>>
>>>       SError Interrupt on CPU0, code 0x00000000be000011 -- SError
>>
>> ...
>>
>>>        el1h_64_error_handler+0x2c/0x40
>>>        el1h_64_error+0x70/0x74
>>>        dw_pcie_read+0x20/0x74 (P)
>>>        rcar_gen4_pcie_additional_common_init+0x1c/0x6c
>>
>> SError in rcar_gen4_pcie_additional_common_init , this is unrelated to
>> this fix.
>>
>> Does the following patch make this SError go away ?
> 
>> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>> @@ -204,6 +204,8 @@ static int rcar_gen4_pcie_common_init(struct
>> rcar_gen4_pcie *rcar)
>>           if (ret)
>>                   goto err_unprepare;
>>
>> +mdelay(1);
>> +
>>           if (rcar->drvdata->additional_common_init)
>>                   rcar->drvdata->additional_common_init(rcar);
>>
> 
> Yes it does, thanks!
So with this one extra mdelay(1), the PCIe is fully good on your side, 
or is there still anything weird/flaky/malfunctioning ?

If you could give me a RB/TB on this fix, it would be nice.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-16 16:31       ` Marek Vasut
@ 2025-09-16 17:13         ` Bjorn Helgaas
  2025-09-16 17:39           ` Marek Vasut
  2025-09-17  7:23         ` Geert Uytterhoeven
  1 sibling, 1 reply; 19+ messages in thread
From: Bjorn Helgaas @ 2025-09-16 17:13 UTC (permalink / raw)
  To: Marek Vasut
  Cc: Geert Uytterhoeven, linux-pci, Krzysztof Wilczyński,
	Bjorn Helgaas, Lorenzo Pieralisi, Magnus Damm,
	Manivannan Sadhasivam, Rob Herring, Yoshihiro Shimoda,
	linux-renesas-soc

On Tue, Sep 16, 2025 at 06:31:31PM +0200, Marek Vasut wrote:
> On 9/16/25 3:57 PM, Geert Uytterhoeven wrote:
> > On Tue, 16 Sept 2025 at 15:39, Marek Vasut <marek.vasut@mailbox.org> wrote:
> > > On 9/16/25 11:59 AM, Geert Uytterhoeven wrote:

> > > > This change looks correct, and fixes the timeout seen on White Hawk
> > > > with CONFIG_DEBUG_LOCK_ALLOC=y.
> > > > However, it causes a crash when CONFIG_DEBUG_LOCK_ALLOC=n:
> > > > 
> > > >       SError Interrupt on CPU0, code 0x00000000be000011 -- SError
> > > 
> > > ...
> > > 
> > > >        el1h_64_error_handler+0x2c/0x40
> > > >        el1h_64_error+0x70/0x74
> > > >        dw_pcie_read+0x20/0x74 (P)
> > > >        rcar_gen4_pcie_additional_common_init+0x1c/0x6c
> > > 
> > > SError in rcar_gen4_pcie_additional_common_init , this is unrelated to
> > > this fix.
> > > 
> > > Does the following patch make this SError go away ?
> > 
> > > --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> > > +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> > > @@ -204,6 +204,8 @@ static int rcar_gen4_pcie_common_init(struct
> > > rcar_gen4_pcie *rcar)
> > >           if (ret)
> > >                   goto err_unprepare;
> > > 
> > > +mdelay(1);
> > > +
> > >           if (rcar->drvdata->additional_common_init)
> > >                   rcar->drvdata->additional_common_init(rcar);
> > > 
> > 
> > Yes it does, thanks!
>
> So with this one extra mdelay(1), the PCIe is fully good on your side, or is
> there still anything weird/flaky/malfunctioning ?

Do we have a theory about why this delay is needed?  I assume it's
something specific to the rcar hardware (not something required by the
PCIe base spec)?

I'm seeing SError interrupts and external aborts on several arm64
systems and I'd like to understand better why they happen and when/if
we can recover from them.

Bjorn

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-16 17:13         ` Bjorn Helgaas
@ 2025-09-16 17:39           ` Marek Vasut
  2025-09-16 18:15             ` Bjorn Helgaas
  0 siblings, 1 reply; 19+ messages in thread
From: Marek Vasut @ 2025-09-16 17:39 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Geert Uytterhoeven, linux-pci, Krzysztof Wilczyński,
	Bjorn Helgaas, Lorenzo Pieralisi, Magnus Damm,
	Manivannan Sadhasivam, Rob Herring, Yoshihiro Shimoda,
	linux-renesas-soc

On 9/16/25 7:13 PM, Bjorn Helgaas wrote:
> On Tue, Sep 16, 2025 at 06:31:31PM +0200, Marek Vasut wrote:
>> On 9/16/25 3:57 PM, Geert Uytterhoeven wrote:
>>> On Tue, 16 Sept 2025 at 15:39, Marek Vasut <marek.vasut@mailbox.org> wrote:
>>>> On 9/16/25 11:59 AM, Geert Uytterhoeven wrote:
> 
>>>>> This change looks correct, and fixes the timeout seen on White Hawk
>>>>> with CONFIG_DEBUG_LOCK_ALLOC=y.
>>>>> However, it causes a crash when CONFIG_DEBUG_LOCK_ALLOC=n:
>>>>>
>>>>>        SError Interrupt on CPU0, code 0x00000000be000011 -- SError
>>>>
>>>> ...
>>>>
>>>>>         el1h_64_error_handler+0x2c/0x40
>>>>>         el1h_64_error+0x70/0x74
>>>>>         dw_pcie_read+0x20/0x74 (P)
>>>>>         rcar_gen4_pcie_additional_common_init+0x1c/0x6c
>>>>
>>>> SError in rcar_gen4_pcie_additional_common_init , this is unrelated to
>>>> this fix.
>>>>
>>>> Does the following patch make this SError go away ?
>>>
>>>> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>>>> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>>>> @@ -204,6 +204,8 @@ static int rcar_gen4_pcie_common_init(struct
>>>> rcar_gen4_pcie *rcar)
>>>>            if (ret)
>>>>                    goto err_unprepare;
>>>>
>>>> +mdelay(1);
>>>> +
>>>>            if (rcar->drvdata->additional_common_init)
>>>>                    rcar->drvdata->additional_common_init(rcar);
>>>>
>>>
>>> Yes it does, thanks!
>>
>> So with this one extra mdelay(1), the PCIe is fully good on your side, or is
>> there still anything weird/flaky/malfunctioning ?
> 
> Do we have a theory about why this delay is needed?  I assume it's
> something specific to the rcar hardware (not something required by the
> PCIe base spec)?
> 
> I'm seeing SError interrupts and external aborts on several arm64
> systems and I'd like to understand better why they happen and when/if
> we can recover from them.
The SError here happens when the PWR RST is released and DBI is accessed 
rapidly right after that. The current hypothesis is, that the controller 
core needs a bit of time to initialize itself after the reset is 
released, before DBI access can be performed ; if the DBI access happens 
too soon after the reset was released, the core reject the access and 
CPU interprets that as SError.

The reference manual however does not list any such delay, which makes 
me concerned. I can send such a patch which adds the mdelay(1) as a 
temporary stopgap fix, until some better explanation maybe sometimes 
gets uncovered, if that would be OK ?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-16 17:39           ` Marek Vasut
@ 2025-09-16 18:15             ` Bjorn Helgaas
  2025-09-16 22:09               ` Marek Vasut
  0 siblings, 1 reply; 19+ messages in thread
From: Bjorn Helgaas @ 2025-09-16 18:15 UTC (permalink / raw)
  To: Marek Vasut
  Cc: Geert Uytterhoeven, linux-pci, Krzysztof Wilczyński,
	Bjorn Helgaas, Lorenzo Pieralisi, Magnus Damm,
	Manivannan Sadhasivam, Rob Herring, Yoshihiro Shimoda,
	linux-renesas-soc

On Tue, Sep 16, 2025 at 07:39:07PM +0200, Marek Vasut wrote:
> On 9/16/25 7:13 PM, Bjorn Helgaas wrote:
> > On Tue, Sep 16, 2025 at 06:31:31PM +0200, Marek Vasut wrote:
> > > On 9/16/25 3:57 PM, Geert Uytterhoeven wrote:
> > > > On Tue, 16 Sept 2025 at 15:39, Marek Vasut <marek.vasut@mailbox.org> wrote:
> > > > > On 9/16/25 11:59 AM, Geert Uytterhoeven wrote:
> > 
> > > > > > This change looks correct, and fixes the timeout seen on White Hawk
> > > > > > with CONFIG_DEBUG_LOCK_ALLOC=y.
> > > > > > However, it causes a crash when CONFIG_DEBUG_LOCK_ALLOC=n:
> > > > > > 
> > > > > >        SError Interrupt on CPU0, code 0x00000000be000011 -- SError
> > > > > 
> > > > > ...
> > > > > 
> > > > > >         el1h_64_error_handler+0x2c/0x40
> > > > > >         el1h_64_error+0x70/0x74
> > > > > >         dw_pcie_read+0x20/0x74 (P)
> > > > > >         rcar_gen4_pcie_additional_common_init+0x1c/0x6c
> > > > > 
> > > > > SError in rcar_gen4_pcie_additional_common_init , this is unrelated to
> > > > > this fix.
> > > > > 
> > > > > Does the following patch make this SError go away ?
> > > > 
> > > > > --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> > > > > +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> > > > > @@ -204,6 +204,8 @@ static int rcar_gen4_pcie_common_init(struct
> > > > > rcar_gen4_pcie *rcar)
> > > > >            if (ret)
> > > > >                    goto err_unprepare;
> > > > > 
> > > > > +mdelay(1);
> > > > > +
> > > > >            if (rcar->drvdata->additional_common_init)
> > > > >                    rcar->drvdata->additional_common_init(rcar);
> > > > > 
> > > > 
> > > > Yes it does, thanks!
> > > 
> > > So with this one extra mdelay(1), the PCIe is fully good on your side, or is
> > > there still anything weird/flaky/malfunctioning ?
> > 
> > Do we have a theory about why this delay is needed?  I assume it's
> > something specific to the rcar hardware (not something required by the
> > PCIe base spec)?
> > 
> > I'm seeing SError interrupts and external aborts on several arm64
> > systems and I'd like to understand better why they happen and when/if
> > we can recover from them.
>
> The SError here happens when the PWR RST is released and DBI is
> accessed rapidly right after that. The current hypothesis is, that
> the controller core needs a bit of time to initialize itself after
> the reset is released, before DBI access can be performed ; if the
> DBI access happens too soon after the reset was released, the core
> reject the access and CPU interprets that as SError.

Sounds like this SError is for something on the CPU side of the host
bridge.

I've also seen errors that seem to be related to errors on the PCIe
side, e.g., a PCIe Completion Timeout or Unsupported Request.  On most
platforms, those result in writes being silently dropped or ~0 data
being synthesized for reads.

E.g., this SError that seems related to a BAR programming issue:
https://lore.kernel.org/linux-pci/86plf0lgit.fsf@scott-ph-mail.amperecomputing.com/

> The reference manual however does not list any such delay, which
> makes me concerned. I can send such a patch which adds the mdelay(1)
> as a temporary stopgap fix, until some better explanation maybe
> sometimes gets uncovered, if that would be OK ?

Yeah, if it's some rcar-specific thing, I don't see a better
alternative.

Bjorn

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-16 18:15             ` Bjorn Helgaas
@ 2025-09-16 22:09               ` Marek Vasut
  2025-09-17  8:00                 ` Geert Uytterhoeven
  0 siblings, 1 reply; 19+ messages in thread
From: Marek Vasut @ 2025-09-16 22:09 UTC (permalink / raw)
  To: Bjorn Helgaas, Geert Uytterhoeven
  Cc: linux-pci, Krzysztof Wilczyński, Bjorn Helgaas,
	Lorenzo Pieralisi, Magnus Damm, Manivannan Sadhasivam,
	Rob Herring, Yoshihiro Shimoda, linux-renesas-soc

[-- Attachment #1: Type: text/plain, Size: 5892 bytes --]

On 9/16/25 8:15 PM, Bjorn Helgaas wrote:
> On Tue, Sep 16, 2025 at 07:39:07PM +0200, Marek Vasut wrote:
>> On 9/16/25 7:13 PM, Bjorn Helgaas wrote:
>>> On Tue, Sep 16, 2025 at 06:31:31PM +0200, Marek Vasut wrote:
>>>> On 9/16/25 3:57 PM, Geert Uytterhoeven wrote:
>>>>> On Tue, 16 Sept 2025 at 15:39, Marek Vasut <marek.vasut@mailbox.org> wrote:
>>>>>> On 9/16/25 11:59 AM, Geert Uytterhoeven wrote:
>>>
>>>>>>> This change looks correct, and fixes the timeout seen on White Hawk
>>>>>>> with CONFIG_DEBUG_LOCK_ALLOC=y.
>>>>>>> However, it causes a crash when CONFIG_DEBUG_LOCK_ALLOC=n:
>>>>>>>
>>>>>>>         SError Interrupt on CPU0, code 0x00000000be000011 -- SError
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>>          el1h_64_error_handler+0x2c/0x40
>>>>>>>          el1h_64_error+0x70/0x74
>>>>>>>          dw_pcie_read+0x20/0x74 (P)
>>>>>>>          rcar_gen4_pcie_additional_common_init+0x1c/0x6c
>>>>>>
>>>>>> SError in rcar_gen4_pcie_additional_common_init , this is unrelated to
>>>>>> this fix.
>>>>>>
>>>>>> Does the following patch make this SError go away ?
>>>>>
>>>>>> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>>>>>> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>>>>>> @@ -204,6 +204,8 @@ static int rcar_gen4_pcie_common_init(struct
>>>>>> rcar_gen4_pcie *rcar)
>>>>>>             if (ret)
>>>>>>                     goto err_unprepare;
>>>>>>
>>>>>> +mdelay(1);
>>>>>> +
>>>>>>             if (rcar->drvdata->additional_common_init)
>>>>>>                     rcar->drvdata->additional_common_init(rcar);
>>>>>>
>>>>>
>>>>> Yes it does, thanks!
>>>>
>>>> So with this one extra mdelay(1), the PCIe is fully good on your side, or is
>>>> there still anything weird/flaky/malfunctioning ?
>>>
>>> Do we have a theory about why this delay is needed?  I assume it's
>>> something specific to the rcar hardware (not something required by the
>>> PCIe base spec)?
>>>
>>> I'm seeing SError interrupts and external aborts on several arm64
>>> systems and I'd like to understand better why they happen and when/if
>>> we can recover from them.
>>
>> The SError here happens when the PWR RST is released and DBI is
>> accessed rapidly right after that. The current hypothesis is, that
>> the controller core needs a bit of time to initialize itself after
>> the reset is released, before DBI access can be performed ; if the
>> DBI access happens too soon after the reset was released, the core
>> reject the access and CPU interprets that as SError.
> 
> Sounds like this SError is for something on the CPU side of the host
> bridge.

Since this only happens when accessing DBI registers shortly after the 
reset was deasserted, it seems that way.

> I've also seen errors that seem to be related to errors on the PCIe
> side, e.g., a PCIe Completion Timeout or Unsupported Request.  On most
> platforms, those result in writes being silently dropped or ~0 data
> being synthesized for reads.
> 
> E.g., this SError that seems related to a BAR programming issue:
> https://lore.kernel.org/linux-pci/86plf0lgit.fsf@scott-ph-mail.amperecomputing.com/

The SError is reported for basically almost any AXI bus error, some IO 
operation happened and it didn't complete or some such, so it is 
expected to see a lot of disparate SErrors like that.

In this case, it isn't anything on the bridge side yet, the bridge isn't 
even fully configured and the link is down when the SError happens here.

>> The reference manual however does not list any such delay, which
>> makes me concerned. I can send such a patch which adds the mdelay(1)
>> as a temporary stopgap fix, until some better explanation maybe
>> sometimes gets uncovered, if that would be OK ?
> 
> Yeah, if it's some rcar-specific thing, I don't see a better
> alternative.
I have one more hypothesis.

I noticed in V4H RM rev.1.30 page 798 that CPG (reset) and PCIe are on 
different busses. From CA76, the CPG reset is reachable via "CCI->Slave 
Access BUS->CPG" and PCIe is reachable via "CCI->Slave Access BUS->HSC".

I wonder if a sequence like this:
- writel(CPG un-reset register)
- readl(PCI DBI register)
can, due to bus behavior, lead to readl(PCI DBI register) taking effect 
on the PCIe IP _before_ the writel(CPG un-reset register) takes effect 
on the CPG IP, which trigger the SError (coming from PCIe IP).

Or is it guaranteed that writel(some IP) followed by readl(another IP) 
are strictly ordered in this manner even on the IP end at which they 
arrive, even if the two IPs are on different busses ? Please pardon my 
ignorance.

Does the attached diff also help Geert with the SError ?

Same diff is also inline below:

"
diff --git a/drivers/clk/renesas/renesas-cpg-mssr.c 
b/drivers/clk/renesas/renesas-cpg-mssr.c
index be9f59e6975d..cb380ba20141 100644
--- a/drivers/clk/renesas/renesas-cpg-mssr.c
+++ b/drivers/clk/renesas/renesas-cpg-mssr.c
@@ -688,12 +688,14 @@ static int cpg_mssr_reset(struct 
reset_controller_dev *rcdev,

  	/* Reset module */
  	writel(bitmask, priv->pub.base0 + priv->reset_regs[reg]);
+	readl(priv->pub.base0 + priv->reset_regs[reg]);

  	/* Wait for at least one cycle of the RCLK clock (@ ca. 32 kHz) */
  	udelay(35);

  	/* Release module from reset state */
  	writel(bitmask, priv->pub.base0 + priv->reset_clear_regs[reg]);
+	readl(priv->pub.base0 + priv->reset_clear_regs[reg]);

  	return 0;
  }
@@ -708,6 +710,7 @@ static int cpg_mssr_assert(struct 
reset_controller_dev *rcdev, unsigned long id)
  	dev_dbg(priv->dev, "assert %u%02u\n", reg, bit);

  	writel(bitmask, priv->pub.base0 + priv->reset_regs[reg]);
+	readl(priv->pub.base0 + priv->reset_regs[reg]);
  	return 0;
  }

@@ -722,6 +725,7 @@ static int cpg_mssr_deassert(struct 
reset_controller_dev *rcdev,
  	dev_dbg(priv->dev, "deassert %u%02u\n", reg, bit);

  	writel(bitmask, priv->pub.base0 + priv->reset_clear_regs[reg]);
+	readl(priv->pub.base0 + priv->reset_clear_regs[reg]);
  	return 0;
  }

"

[-- Attachment #2: reset.diff --]
[-- Type: text/x-patch, Size: 1266 bytes --]

diff --git a/drivers/clk/renesas/renesas-cpg-mssr.c b/drivers/clk/renesas/renesas-cpg-mssr.c
index be9f59e6975d..cb380ba20141 100644
--- a/drivers/clk/renesas/renesas-cpg-mssr.c
+++ b/drivers/clk/renesas/renesas-cpg-mssr.c
@@ -688,12 +688,14 @@ static int cpg_mssr_reset(struct reset_controller_dev *rcdev,
 
 	/* Reset module */
 	writel(bitmask, priv->pub.base0 + priv->reset_regs[reg]);
+	readl(priv->pub.base0 + priv->reset_regs[reg]);
 
 	/* Wait for at least one cycle of the RCLK clock (@ ca. 32 kHz) */
 	udelay(35);
 
 	/* Release module from reset state */
 	writel(bitmask, priv->pub.base0 + priv->reset_clear_regs[reg]);
+	readl(priv->pub.base0 + priv->reset_clear_regs[reg]);
 
 	return 0;
 }
@@ -708,6 +710,7 @@ static int cpg_mssr_assert(struct reset_controller_dev *rcdev, unsigned long id)
 	dev_dbg(priv->dev, "assert %u%02u\n", reg, bit);
 
 	writel(bitmask, priv->pub.base0 + priv->reset_regs[reg]);
+	readl(priv->pub.base0 + priv->reset_regs[reg]);
 	return 0;
 }
 
@@ -722,6 +725,7 @@ static int cpg_mssr_deassert(struct reset_controller_dev *rcdev,
 	dev_dbg(priv->dev, "deassert %u%02u\n", reg, bit);
 
 	writel(bitmask, priv->pub.base0 + priv->reset_clear_regs[reg]);
+	readl(priv->pub.base0 + priv->reset_clear_regs[reg]);
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-16 16:31       ` Marek Vasut
  2025-09-16 17:13         ` Bjorn Helgaas
@ 2025-09-17  7:23         ` Geert Uytterhoeven
  2025-09-18  3:16           ` Marek Vasut
  1 sibling, 1 reply; 19+ messages in thread
From: Geert Uytterhoeven @ 2025-09-17  7:23 UTC (permalink / raw)
  To: Marek Vasut
  Cc: linux-pci, Krzysztof Wilczyński, Bjorn Helgaas,
	Lorenzo Pieralisi, Magnus Damm, Manivannan Sadhasivam,
	Rob Herring, Yoshihiro Shimoda, linux-renesas-soc

Hi Marek,

On Tue, 16 Sept 2025 at 18:31, Marek Vasut <marek.vasut@mailbox.org> wrote:
> On 9/16/25 3:57 PM, Geert Uytterhoeven wrote:
> > On Tue, 16 Sept 2025 at 15:39, Marek Vasut <marek.vasut@mailbox.org> wrote:
> >> On 9/16/25 11:59 AM, Geert Uytterhoeven wrote:
> >>> On Tue, 16 Sept 2025 at 01:59, Marek Vasut
> >>> <marek.vasut+renesas@mailbox.org> wrote:
> >>>> R-Car V4H Reference Manual R19UH0186EJ0130 Rev.1.30 Apr. 21, 2025 page 4581
> >>>> Figure 104.3b Initial Setting of PCIEC(example), third quarter of the figure
> >>>> indicates that register 0xf8 should be polled until bit 18 becomes set to 1.
> >>>>
> >>>> Register 0xf8 bit 18 is 0 immediately after write to PCIERSTCTRL1 and is set
> >>>> to 1 in less than 1 ms afterward. The current readl_poll_timeout() break
> >>>> condition is inverted and returns when register 0xf8 bit 18 is set to 0,
> >>>> which in most cases means immediately. In case CONFIG_DEBUG_LOCK_ALLOC=y ,
> >>>> the timing changes just enough for the first readl_poll_timeout() poll to
> >>>> already read register 0xf8 bit 18 as 1 and afterward never read register
> >>>> 0xf8 bit 18 as 0, which leads to timeout and failure to start the PCIe
> >>>> controller.
> >>>>
> >>>> Fix this by inverting the poll condition to match the reference manual
> >>>> initialization sequence.
> >>>>
> >>>> Fixes: faf5a975ee3b ("PCI: rcar-gen4: Add support for R-Car V4H")
> >>>> Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
> >>>
> >>> Thanks for your patch!
> >>>
> >>>> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> >>>> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> >>>> @@ -711,7 +711,7 @@ static int rcar_gen4_pcie_ltssm_control(struct rcar_gen4_pcie *rcar, bool enable
> >>>>           val &= ~APP_HOLD_PHY_RST;
> >>>>           writel(val, rcar->base + PCIERSTCTRL1);
> >>>>
> >>>> -       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, !(val & BIT(18)), 100, 10000);
> >>>> +       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, val & BIT(18), 100, 10000);
> >>>>           if (ret < 0)
> >>>>                   return ret;
> >>>>
> >>>
> >>> This change looks correct, and fixes the timeout seen on White Hawk
> >>> with CONFIG_DEBUG_LOCK_ALLOC=y.
> >>> However, it causes a crash when CONFIG_DEBUG_LOCK_ALLOC=n:
> >>>
> >>>       SError Interrupt on CPU0, code 0x00000000be000011 -- SError
> >>
> >> ...
> >>
> >>>        el1h_64_error_handler+0x2c/0x40
> >>>        el1h_64_error+0x70/0x74
> >>>        dw_pcie_read+0x20/0x74 (P)
> >>>        rcar_gen4_pcie_additional_common_init+0x1c/0x6c
> >>
> >> SError in rcar_gen4_pcie_additional_common_init , this is unrelated to
> >> this fix.
> >>
> >> Does the following patch make this SError go away ?
> >
> >> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> >> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> >> @@ -204,6 +204,8 @@ static int rcar_gen4_pcie_common_init(struct
> >> rcar_gen4_pcie *rcar)
> >>           if (ret)
> >>                   goto err_unprepare;
> >>
> >> +mdelay(1);
> >> +
> >>           if (rcar->drvdata->additional_common_init)
> >>                   rcar->drvdata->additional_common_init(rcar);
> >>
> >
> > Yes it does, thanks!
> So with this one extra mdelay(1), the PCIe is fully good on your side,
> or is there still anything weird/flaky/malfunctioning ?
>
> If you could give me a RB/TB on this fix, it would be nice.

You can have my
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
but only for the combination of both (A) "[PATCH] PCI: rcar-gen4: Fix
inverted break condition in PHY initialization" and (B) the addition
of the mdelay.

  - (A) without (B) causes an SError if CONFIG_DEBUG_LOCK_ALLOC=n,

  - (B) without (A) causes a timeout if CONFIG_DEBUG_LOCK_ALLOC=n
    (i.e. same behavior as with CONFIG_DEBUG_LOCK_ALLOC=y).

Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-16 22:09               ` Marek Vasut
@ 2025-09-17  8:00                 ` Geert Uytterhoeven
  2025-09-17 13:44                   ` Marek Vasut
  0 siblings, 1 reply; 19+ messages in thread
From: Geert Uytterhoeven @ 2025-09-17  8:00 UTC (permalink / raw)
  To: Marek Vasut
  Cc: Bjorn Helgaas, linux-pci, Krzysztof Wilczyński,
	Bjorn Helgaas, Lorenzo Pieralisi, Magnus Damm,
	Manivannan Sadhasivam, Rob Herring, Yoshihiro Shimoda,
	linux-renesas-soc

Hi Marek,

On Wed, 17 Sept 2025 at 00:09, Marek Vasut <marek.vasut@mailbox.org> wrote:
> On 9/16/25 8:15 PM, Bjorn Helgaas wrote:
> > On Tue, Sep 16, 2025 at 07:39:07PM +0200, Marek Vasut wrote:
> >> On 9/16/25 7:13 PM, Bjorn Helgaas wrote:
> >>> On Tue, Sep 16, 2025 at 06:31:31PM +0200, Marek Vasut wrote:
> >>>> On 9/16/25 3:57 PM, Geert Uytterhoeven wrote:
> >>>>> On Tue, 16 Sept 2025 at 15:39, Marek Vasut <marek.vasut@mailbox.org> wrote:
> >>>>>> On 9/16/25 11:59 AM, Geert Uytterhoeven wrote:
> >>>
> >>>>>>> This change looks correct, and fixes the timeout seen on White Hawk
> >>>>>>> with CONFIG_DEBUG_LOCK_ALLOC=y.
> >>>>>>> However, it causes a crash when CONFIG_DEBUG_LOCK_ALLOC=n:
> >>>>>>>
> >>>>>>>         SError Interrupt on CPU0, code 0x00000000be000011 -- SError
> >>>>>>
> >>>>>> ...
> >>>>>>
> >>>>>>>          el1h_64_error_handler+0x2c/0x40
> >>>>>>>          el1h_64_error+0x70/0x74
> >>>>>>>          dw_pcie_read+0x20/0x74 (P)
> >>>>>>>          rcar_gen4_pcie_additional_common_init+0x1c/0x6c
> >>>>>>
> >>>>>> SError in rcar_gen4_pcie_additional_common_init , this is unrelated to
> >>>>>> this fix.
> >>>>>>
> >>>>>> Does the following patch make this SError go away ?
> >>>>>
> >>>>>> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> >>>>>> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> >>>>>> @@ -204,6 +204,8 @@ static int rcar_gen4_pcie_common_init(struct
> >>>>>> rcar_gen4_pcie *rcar)
> >>>>>>             if (ret)
> >>>>>>                     goto err_unprepare;
> >>>>>>
> >>>>>> +mdelay(1);
> >>>>>> +
> >>>>>>             if (rcar->drvdata->additional_common_init)
> >>>>>>                     rcar->drvdata->additional_common_init(rcar);
> >>>>>>
> >>>>>
> >>>>> Yes it does, thanks!
> >>>>
> >>>> So with this one extra mdelay(1), the PCIe is fully good on your side, or is
> >>>> there still anything weird/flaky/malfunctioning ?
> >>>
> >>> Do we have a theory about why this delay is needed?  I assume it's
> >>> something specific to the rcar hardware (not something required by the
> >>> PCIe base spec)?

> > Yeah, if it's some rcar-specific thing, I don't see a better
> > alternative.
> I have one more hypothesis.
>
> I noticed in V4H RM rev.1.30 page 798 that CPG (reset) and PCIe are on
> different busses. From CA76, the CPG reset is reachable via "CCI->Slave
> Access BUS->CPG" and PCIe is reachable via "CCI->Slave Access BUS->HSC".
>
> I wonder if a sequence like this:
> - writel(CPG un-reset register)
> - readl(PCI DBI register)
> can, due to bus behavior, lead to readl(PCI DBI register) taking effect
> on the PCIe IP _before_ the writel(CPG un-reset register) takes effect
> on the CPG IP, which trigger the SError (coming from PCIe IP).
>
> Or is it guaranteed that writel(some IP) followed by readl(another IP)
> are strictly ordered in this manner even on the IP end at which they
> arrive, even if the two IPs are on different busses ? Please pardon my
> ignorance.
>
> Does the attached diff also help Geert with the SError ?
>
> Same diff is also inline below:

> --- a/drivers/clk/renesas/renesas-cpg-mssr.c
> +++ b/drivers/clk/renesas/renesas-cpg-mssr.c
> @@ -688,12 +688,14 @@ static int cpg_mssr_reset(struct
> reset_controller_dev *rcdev,
>
>         /* Reset module */
>         writel(bitmask, priv->pub.base0 + priv->reset_regs[reg]);
> +       readl(priv->pub.base0 + priv->reset_regs[reg]);
>
>         /* Wait for at least one cycle of the RCLK clock (@ ca. 32 kHz) */
>         udelay(35);
>
>         /* Release module from reset state */
>         writel(bitmask, priv->pub.base0 + priv->reset_clear_regs[reg]);
> +       readl(priv->pub.base0 + priv->reset_clear_regs[reg]);
>
>         return 0;
>   }
> @@ -708,6 +710,7 @@ static int cpg_mssr_assert(struct
> reset_controller_dev *rcdev, unsigned long id)
>         dev_dbg(priv->dev, "assert %u%02u\n", reg, bit);
>
>         writel(bitmask, priv->pub.base0 + priv->reset_regs[reg]);
> +       readl(priv->pub.base0 + priv->reset_regs[reg]);
>         return 0;
>   }
>
> @@ -722,6 +725,7 @@ static int cpg_mssr_deassert(struct
> reset_controller_dev *rcdev,
>         dev_dbg(priv->dev, "deassert %u%02u\n", reg, bit);
>
>         writel(bitmask, priv->pub.base0 + priv->reset_clear_regs[reg]);
> +       readl(priv->pub.base0 + priv->reset_clear_regs[reg]);
>         return 0;
>   }
>

Yes, reading the reset registers after writing works, too.

I just noticed the module reset operation in the R-Car V4H Hardware
User's Manual has changed from R-Car Gen2/Gen3, and is now more complex,
with three different reset types, depending on the module to be reset
(see Section 9.3 "Operation" subsection (2) "Software Reset").
The most striking difference is that there is no more mention of
a single delay of 1 RCLK cycle (cfr. the udelay(35) above), but of
much longer delays of 1 ms.

As drivers/pci/controller/dwc/pcie-rcar-gen4.c does not call the
combined reset_control_reset() , but uses separate
reset_control_assert() and reset_control_deassert() calls, the PCIe
driver itself is responsible for enforcing this 1 ms delay.
Which is exactly what your introduction of mdelay(1) (just after the
out-of-context call to reset_control_deassert()) does, so I believe
we're all set?

For enhance reliability, I would also add such a delay after the
(conditional) call to reset_control_assert(), like (whitespace-damaged):

--- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
+++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
@@ -184,8 +184,10 @@ static int rcar_gen4_pcie_common_init(struct
rcar_gen4_pcie *rcar)
                return ret;
        }

-       if (!reset_control_status(dw->core_rsts[DW_PCIE_PWR_RST].rstc))
+       if (!reset_control_status(dw->core_rsts[DW_PCIE_PWR_RST].rstc)) {
                reset_control_assert(dw->core_rsts[DW_PCIE_PWR_RST].rstc);
+               mdelay(1);
+       }

        val = readl(rcar->base + PCIEMSR0);
        if (rcar->drvdata->mode == DW_PCIE_RC_TYPE) {

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-17  8:00                 ` Geert Uytterhoeven
@ 2025-09-17 13:44                   ` Marek Vasut
  0 siblings, 0 replies; 19+ messages in thread
From: Marek Vasut @ 2025-09-17 13:44 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Bjorn Helgaas, linux-pci, Krzysztof Wilczyński,
	Bjorn Helgaas, Lorenzo Pieralisi, Magnus Damm,
	Manivannan Sadhasivam, Rob Herring, Yoshihiro Shimoda,
	linux-renesas-soc

On 9/17/25 10:00 AM, Geert Uytterhoeven wrote:

Hello Geert,

>> --- a/drivers/clk/renesas/renesas-cpg-mssr.c
>> +++ b/drivers/clk/renesas/renesas-cpg-mssr.c
>> @@ -688,12 +688,14 @@ static int cpg_mssr_reset(struct
>> reset_controller_dev *rcdev,
>>
>>          /* Reset module */
>>          writel(bitmask, priv->pub.base0 + priv->reset_regs[reg]);
>> +       readl(priv->pub.base0 + priv->reset_regs[reg]);
>>
>>          /* Wait for at least one cycle of the RCLK clock (@ ca. 32 kHz) */
>>          udelay(35);
>>
>>          /* Release module from reset state */
>>          writel(bitmask, priv->pub.base0 + priv->reset_clear_regs[reg]);
>> +       readl(priv->pub.base0 + priv->reset_clear_regs[reg]);
>>
>>          return 0;
>>    }
>> @@ -708,6 +710,7 @@ static int cpg_mssr_assert(struct
>> reset_controller_dev *rcdev, unsigned long id)
>>          dev_dbg(priv->dev, "assert %u%02u\n", reg, bit);
>>
>>          writel(bitmask, priv->pub.base0 + priv->reset_regs[reg]);
>> +       readl(priv->pub.base0 + priv->reset_regs[reg]);
>>          return 0;
>>    }
>>
>> @@ -722,6 +725,7 @@ static int cpg_mssr_deassert(struct
>> reset_controller_dev *rcdev,
>>          dev_dbg(priv->dev, "deassert %u%02u\n", reg, bit);
>>
>>          writel(bitmask, priv->pub.base0 + priv->reset_clear_regs[reg]);
>> +       readl(priv->pub.base0 + priv->reset_clear_regs[reg]);
>>          return 0;
>>    }
>>
> 
> Yes, reading the reset registers after writing works, too.
> 
> I just noticed the module reset operation in the R-Car V4H Hardware
> User's Manual has changed from R-Car Gen2/Gen3, and is now more complex,
> with three different reset types, depending on the module to be reset
> (see Section 9.3 "Operation" subsection (2) "Software Reset").
> The most striking difference is that there is no more mention of
> a single delay of 1 RCLK cycle (cfr. the udelay(35) above), but of
> much longer delays of 1 ms.
> 
> As drivers/pci/controller/dwc/pcie-rcar-gen4.c does not call the
> combined reset_control_reset() , but uses separate
> reset_control_assert() and reset_control_deassert() calls, the PCIe
> driver itself is responsible for enforcing this 1 ms delay.

Shouldn't we patch the reset driver and insert unconditional 1ms delay 
into reset_assert() callback ?

> Which is exactly what your introduction of mdelay(1) (just after the
> out-of-context call to reset_control_deassert()) does, so I believe
> we're all set?

No, I do not think so. Figure 9.3.2 Software Reset flow (B) on page 585 
does NOT describe 1ms delay after write into SRSTCLR register. It does 
describe 1ms delay after write into SRCR register. That means, we need 
1ms delay after reset_control_assert(), but it does NOT mean we need 1ms 
delay after reset_control_deassert() . That means the issue remains 
unexplained ?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-17  7:23         ` Geert Uytterhoeven
@ 2025-09-18  3:16           ` Marek Vasut
  2025-09-22 10:10             ` Geert Uytterhoeven
  0 siblings, 1 reply; 19+ messages in thread
From: Marek Vasut @ 2025-09-18  3:16 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-pci, Krzysztof Wilczyński, Bjorn Helgaas,
	Lorenzo Pieralisi, Magnus Damm, Manivannan Sadhasivam,
	Rob Herring, Yoshihiro Shimoda, linux-renesas-soc

On 9/17/25 9:23 AM, Geert Uytterhoeven wrote:

Hello Geert,

> On Tue, 16 Sept 2025 at 18:31, Marek Vasut <marek.vasut@mailbox.org> wrote:
>> On 9/16/25 3:57 PM, Geert Uytterhoeven wrote:
>>> On Tue, 16 Sept 2025 at 15:39, Marek Vasut <marek.vasut@mailbox.org> wrote:
>>>> On 9/16/25 11:59 AM, Geert Uytterhoeven wrote:
>>>>> On Tue, 16 Sept 2025 at 01:59, Marek Vasut
>>>>> <marek.vasut+renesas@mailbox.org> wrote:
>>>>>> R-Car V4H Reference Manual R19UH0186EJ0130 Rev.1.30 Apr. 21, 2025 page 4581
>>>>>> Figure 104.3b Initial Setting of PCIEC(example), third quarter of the figure
>>>>>> indicates that register 0xf8 should be polled until bit 18 becomes set to 1.
>>>>>>
>>>>>> Register 0xf8 bit 18 is 0 immediately after write to PCIERSTCTRL1 and is set
>>>>>> to 1 in less than 1 ms afterward. The current readl_poll_timeout() break
>>>>>> condition is inverted and returns when register 0xf8 bit 18 is set to 0,
>>>>>> which in most cases means immediately. In case CONFIG_DEBUG_LOCK_ALLOC=y ,
>>>>>> the timing changes just enough for the first readl_poll_timeout() poll to
>>>>>> already read register 0xf8 bit 18 as 1 and afterward never read register
>>>>>> 0xf8 bit 18 as 0, which leads to timeout and failure to start the PCIe
>>>>>> controller.
>>>>>>
>>>>>> Fix this by inverting the poll condition to match the reference manual
>>>>>> initialization sequence.
>>>>>>
>>>>>> Fixes: faf5a975ee3b ("PCI: rcar-gen4: Add support for R-Car V4H")
>>>>>> Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
>>>>>
>>>>> Thanks for your patch!
>>>>>
>>>>>> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>>>>>> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>>>>>> @@ -711,7 +711,7 @@ static int rcar_gen4_pcie_ltssm_control(struct rcar_gen4_pcie *rcar, bool enable
>>>>>>            val &= ~APP_HOLD_PHY_RST;
>>>>>>            writel(val, rcar->base + PCIERSTCTRL1);
>>>>>>
>>>>>> -       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, !(val & BIT(18)), 100, 10000);
>>>>>> +       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, val & BIT(18), 100, 10000);
>>>>>>            if (ret < 0)
>>>>>>                    return ret;
>>>>>>
>>>>>
>>>>> This change looks correct, and fixes the timeout seen on White Hawk
>>>>> with CONFIG_DEBUG_LOCK_ALLOC=y.
>>>>> However, it causes a crash when CONFIG_DEBUG_LOCK_ALLOC=n:
>>>>>
>>>>>        SError Interrupt on CPU0, code 0x00000000be000011 -- SError
>>>>
>>>> ...
>>>>
>>>>>         el1h_64_error_handler+0x2c/0x40
>>>>>         el1h_64_error+0x70/0x74
>>>>>         dw_pcie_read+0x20/0x74 (P)
>>>>>         rcar_gen4_pcie_additional_common_init+0x1c/0x6c
>>>>
>>>> SError in rcar_gen4_pcie_additional_common_init , this is unrelated to
>>>> this fix.
>>>>
>>>> Does the following patch make this SError go away ?
>>>
>>>> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>>>> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
>>>> @@ -204,6 +204,8 @@ static int rcar_gen4_pcie_common_init(struct
>>>> rcar_gen4_pcie *rcar)
>>>>            if (ret)
>>>>                    goto err_unprepare;
>>>>
>>>> +mdelay(1);
>>>> +
>>>>            if (rcar->drvdata->additional_common_init)
>>>>                    rcar->drvdata->additional_common_init(rcar);
>>>>
>>>
>>> Yes it does, thanks!
>> So with this one extra mdelay(1), the PCIe is fully good on your side,
>> or is there still anything weird/flaky/malfunctioning ?
>>
>> If you could give me a RB/TB on this fix, it would be nice.
> 
> You can have my
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
> but only for the combination of both (A) "[PATCH] PCI: rcar-gen4: Fix
> inverted break condition in PHY initialization" and (B) the addition
> of the mdelay.
> 
>    - (A) without (B) causes an SError if CONFIG_DEBUG_LOCK_ALLOC=n,
> 
>    - (B) without (A) causes a timeout if CONFIG_DEBUG_LOCK_ALLOC=n
>      (i.e. same behavior as with CONFIG_DEBUG_LOCK_ALLOC=y).
I have instead posted what I think are proper fixes for that SError:

PCI: rcar-gen4: Add missing 1ms delay after PWR reset assertion
https://patchwork.kernel.org/project/linux-pci/patch/20250918030058.330960-1-marek.vasut+renesas@mailbox.org/

clk: renesas: cpg-mssr: Add missing 1ms delay into reset toggle callback
https://patchwork.kernel.org/project/linux-clk/patch/20250918030552.331389-1-marek.vasut+renesas@mailbox.org/

clk: renesas: cpg-mssr: Read back reset registers to assure values latched
https://patchwork.kernel.org/project/linux-clk/patch/20250918030723.331634-1-marek.vasut+renesas@mailbox.org/

I hope those help. Can you please let me know if they do help ?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-18  3:16           ` Marek Vasut
@ 2025-09-22 10:10             ` Geert Uytterhoeven
  2025-09-22 15:17               ` Marek Vasut
  0 siblings, 1 reply; 19+ messages in thread
From: Geert Uytterhoeven @ 2025-09-22 10:10 UTC (permalink / raw)
  To: Marek Vasut
  Cc: linux-pci, Krzysztof Wilczyński, Bjorn Helgaas,
	Lorenzo Pieralisi, Magnus Damm, Manivannan Sadhasivam,
	Rob Herring, Yoshihiro Shimoda, linux-renesas-soc

Hi Marek,

On Thu, 18 Sept 2025 at 05:16, Marek Vasut <marek.vasut@mailbox.org> wrote:
> On 9/17/25 9:23 AM, Geert Uytterhoeven wrote:
> > On Tue, 16 Sept 2025 at 18:31, Marek Vasut <marek.vasut@mailbox.org> wrote:
> >> On 9/16/25 3:57 PM, Geert Uytterhoeven wrote:
> >>> On Tue, 16 Sept 2025 at 15:39, Marek Vasut <marek.vasut@mailbox.org> wrote:
> >>>> On 9/16/25 11:59 AM, Geert Uytterhoeven wrote:
> >>>>> On Tue, 16 Sept 2025 at 01:59, Marek Vasut
> >>>>> <marek.vasut+renesas@mailbox.org> wrote:
> >>>>>> R-Car V4H Reference Manual R19UH0186EJ0130 Rev.1.30 Apr. 21, 2025 page 4581
> >>>>>> Figure 104.3b Initial Setting of PCIEC(example), third quarter of the figure
> >>>>>> indicates that register 0xf8 should be polled until bit 18 becomes set to 1.
> >>>>>>
> >>>>>> Register 0xf8 bit 18 is 0 immediately after write to PCIERSTCTRL1 and is set
> >>>>>> to 1 in less than 1 ms afterward. The current readl_poll_timeout() break
> >>>>>> condition is inverted and returns when register 0xf8 bit 18 is set to 0,
> >>>>>> which in most cases means immediately. In case CONFIG_DEBUG_LOCK_ALLOC=y ,
> >>>>>> the timing changes just enough for the first readl_poll_timeout() poll to
> >>>>>> already read register 0xf8 bit 18 as 1 and afterward never read register
> >>>>>> 0xf8 bit 18 as 0, which leads to timeout and failure to start the PCIe
> >>>>>> controller.
> >>>>>>
> >>>>>> Fix this by inverting the poll condition to match the reference manual
> >>>>>> initialization sequence.
> >>>>>>
> >>>>>> Fixes: faf5a975ee3b ("PCI: rcar-gen4: Add support for R-Car V4H")
> >>>>>> Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
> >>>>>
> >>>>> Thanks for your patch!
> >>>>>
> >>>>>> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> >>>>>> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> >>>>>> @@ -711,7 +711,7 @@ static int rcar_gen4_pcie_ltssm_control(struct rcar_gen4_pcie *rcar, bool enable
> >>>>>>            val &= ~APP_HOLD_PHY_RST;
> >>>>>>            writel(val, rcar->base + PCIERSTCTRL1);
> >>>>>>
> >>>>>> -       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, !(val & BIT(18)), 100, 10000);
> >>>>>> +       ret = readl_poll_timeout(rcar->phy_base + 0x0f8, val, val & BIT(18), 100, 10000);
> >>>>>>            if (ret < 0)
> >>>>>>                    return ret;
> >>>>>>
> >>>>>
> >>>>> This change looks correct, and fixes the timeout seen on White Hawk
> >>>>> with CONFIG_DEBUG_LOCK_ALLOC=y.
> >>>>> However, it causes a crash when CONFIG_DEBUG_LOCK_ALLOC=n:
> >>>>>
> >>>>>        SError Interrupt on CPU0, code 0x00000000be000011 -- SError
> >>>>
> >>>> ...
> >>>>
> >>>>>         el1h_64_error_handler+0x2c/0x40
> >>>>>         el1h_64_error+0x70/0x74
> >>>>>         dw_pcie_read+0x20/0x74 (P)
> >>>>>         rcar_gen4_pcie_additional_common_init+0x1c/0x6c
> >>>>
> >>>> SError in rcar_gen4_pcie_additional_common_init , this is unrelated to
> >>>> this fix.
> >>>>
> >>>> Does the following patch make this SError go away ?
> >>>
> >>>> --- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> >>>> +++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
> >>>> @@ -204,6 +204,8 @@ static int rcar_gen4_pcie_common_init(struct
> >>>> rcar_gen4_pcie *rcar)
> >>>>            if (ret)
> >>>>                    goto err_unprepare;
> >>>>
> >>>> +mdelay(1);
> >>>> +
> >>>>            if (rcar->drvdata->additional_common_init)
> >>>>                    rcar->drvdata->additional_common_init(rcar);
> >>>>
> >>>
> >>> Yes it does, thanks!
> >> So with this one extra mdelay(1), the PCIe is fully good on your side,
> >> or is there still anything weird/flaky/malfunctioning ?
> >>
> >> If you could give me a RB/TB on this fix, it would be nice.
> >
> > You can have my
> > Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > but only for the combination of both (A) "[PATCH] PCI: rcar-gen4: Fix
> > inverted break condition in PHY initialization" and (B) the addition
> > of the mdelay.
> >
> >    - (A) without (B) causes an SError if CONFIG_DEBUG_LOCK_ALLOC=n,
> >
> >    - (B) without (A) causes a timeout if CONFIG_DEBUG_LOCK_ALLOC=n
> >      (i.e. same behavior as with CONFIG_DEBUG_LOCK_ALLOC=y).
> I have instead posted what I think are proper fixes for that SError:
>
> PCI: rcar-gen4: Add missing 1ms delay after PWR reset assertion
> https://patchwork.kernel.org/project/linux-pci/patch/20250918030058.330960-1-marek.vasut+renesas@mailbox.org/

I used v3 instead.
While that patch seems to fix the SError after a hard reset (hardware
reset), it is not sufficient after a soft reset (typing "reboot").

> clk: renesas: cpg-mssr: Add missing 1ms delay into reset toggle callback
> https://patchwork.kernel.org/project/linux-clk/patch/20250918030552.331389-1-marek.vasut+renesas@mailbox.org/

This does not fix the SError, as expected (pcie-rcar-gen4.c does not
call reset_control_reset(), but reset_control_{,de}assert()).

> clk: renesas: cpg-mssr: Read back reset registers to assure values latched
> https://patchwork.kernel.org/project/linux-clk/patch/20250918030723.331634-1-marek.vasut+renesas@mailbox.org/

I used v2 instead, which seems to fix the SError.

> I hope those help. Can you please let me know if they do help ?

See above.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-22 10:10             ` Geert Uytterhoeven
@ 2025-09-22 15:17               ` Marek Vasut
  2025-09-22 15:33                 ` Geert Uytterhoeven
  0 siblings, 1 reply; 19+ messages in thread
From: Marek Vasut @ 2025-09-22 15:17 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-pci, Krzysztof Wilczyński, Bjorn Helgaas,
	Lorenzo Pieralisi, Magnus Damm, Manivannan Sadhasivam,
	Rob Herring, Yoshihiro Shimoda, linux-renesas-soc

On 9/22/25 12:10 PM, Geert Uytterhoeven wrote:

Hello Geert,

>> I have instead posted what I think are proper fixes for that SError:
>>
>> PCI: rcar-gen4: Add missing 1ms delay after PWR reset assertion
>> https://patchwork.kernel.org/project/linux-pci/patch/20250918030058.330960-1-marek.vasut+renesas@mailbox.org/
> 
> I used v3 instead.
> While that patch seems to fix the SError after a hard reset (hardware
> reset), it is not sufficient after a soft reset (typing "reboot").
> 
>> clk: renesas: cpg-mssr: Add missing 1ms delay into reset toggle callback
>> https://patchwork.kernel.org/project/linux-clk/patch/20250918030552.331389-1-marek.vasut+renesas@mailbox.org/
> 
> This does not fix the SError, as expected (pcie-rcar-gen4.c does not
> call reset_control_reset(), but reset_control_{,de}assert()).
> 
>> clk: renesas: cpg-mssr: Read back reset registers to assure values latched
>> https://patchwork.kernel.org/project/linux-clk/patch/20250918030723.331634-1-marek.vasut+renesas@mailbox.org/
> 
> I used v2 instead, which seems to fix the SError.

Those three patches have to be used together, and this inverted break 
condition fix should be applied too.

The first two are corrections which align the code behavior with 
reference manual. This inverted break fix is another correction. The 
last patch in the list above actually fixes the asynchronized reset 
behavior and turns it into synchronized reset behavior, therefore fixing 
the SError in the process.

-- 
Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-22 15:17               ` Marek Vasut
@ 2025-09-22 15:33                 ` Geert Uytterhoeven
  2025-09-22 15:49                   ` Marek Vasut
  0 siblings, 1 reply; 19+ messages in thread
From: Geert Uytterhoeven @ 2025-09-22 15:33 UTC (permalink / raw)
  To: Marek Vasut
  Cc: linux-pci, Krzysztof Wilczyński, Bjorn Helgaas,
	Lorenzo Pieralisi, Magnus Damm, Manivannan Sadhasivam,
	Rob Herring, Yoshihiro Shimoda, linux-renesas-soc

Hi Marek,

On Mon, 22 Sept 2025 at 17:17, Marek Vasut <marek.vasut@mailbox.org> wrote:
> On 9/22/25 12:10 PM, Geert Uytterhoeven wrote:
> >> I have instead posted what I think are proper fixes for that SError:
> >>
> >> PCI: rcar-gen4: Add missing 1ms delay after PWR reset assertion
> >> https://patchwork.kernel.org/project/linux-pci/patch/20250918030058.330960-1-marek.vasut+renesas@mailbox.org/
> >
> > I used v3 instead.
> > While that patch seems to fix the SError after a hard reset (hardware
> > reset), it is not sufficient after a soft reset (typing "reboot").
> >
> >> clk: renesas: cpg-mssr: Add missing 1ms delay into reset toggle callback
> >> https://patchwork.kernel.org/project/linux-clk/patch/20250918030552.331389-1-marek.vasut+renesas@mailbox.org/
> >
> > This does not fix the SError, as expected (pcie-rcar-gen4.c does not
> > call reset_control_reset(), but reset_control_{,de}assert()).
> >
> >> clk: renesas: cpg-mssr: Read back reset registers to assure values latched
> >> https://patchwork.kernel.org/project/linux-clk/patch/20250918030723.331634-1-marek.vasut+renesas@mailbox.org/
> >
> > I used v2 instead, which seems to fix the SError.
>
> Those three patches have to be used together, and this inverted break
> condition fix should be applied too.
>
> The first two are corrections which align the code behavior with
> reference manual. This inverted break fix is another correction. The
> last patch in the list above actually fixes the asynchronized reset
> behavior and turns it into synchronized reset behavior, therefore fixing
> the SError in the process.

FTR, I always had the inverted break condition fix applied.
All 3 fixes on top should be fine.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-22 15:33                 ` Geert Uytterhoeven
@ 2025-09-22 15:49                   ` Marek Vasut
  2025-09-23  7:04                     ` Geert Uytterhoeven
  0 siblings, 1 reply; 19+ messages in thread
From: Marek Vasut @ 2025-09-22 15:49 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-pci, Krzysztof Wilczyński, Bjorn Helgaas,
	Lorenzo Pieralisi, Magnus Damm, Manivannan Sadhasivam,
	Rob Herring, Yoshihiro Shimoda, linux-renesas-soc

On 9/22/25 5:33 PM, Geert Uytterhoeven wrote:

Hello Geert,

> On Mon, 22 Sept 2025 at 17:17, Marek Vasut <marek.vasut@mailbox.org> wrote:
>> On 9/22/25 12:10 PM, Geert Uytterhoeven wrote:
>>>> I have instead posted what I think are proper fixes for that SError:
>>>>
>>>> PCI: rcar-gen4: Add missing 1ms delay after PWR reset assertion
>>>> https://patchwork.kernel.org/project/linux-pci/patch/20250918030058.330960-1-marek.vasut+renesas@mailbox.org/
>>>
>>> I used v3 instead.
>>> While that patch seems to fix the SError after a hard reset (hardware
>>> reset), it is not sufficient after a soft reset (typing "reboot").
>>>
>>>> clk: renesas: cpg-mssr: Add missing 1ms delay into reset toggle callback
>>>> https://patchwork.kernel.org/project/linux-clk/patch/20250918030552.331389-1-marek.vasut+renesas@mailbox.org/
>>>
>>> This does not fix the SError, as expected (pcie-rcar-gen4.c does not
>>> call reset_control_reset(), but reset_control_{,de}assert()).
>>>
>>>> clk: renesas: cpg-mssr: Read back reset registers to assure values latched
>>>> https://patchwork.kernel.org/project/linux-clk/patch/20250918030723.331634-1-marek.vasut+renesas@mailbox.org/
>>>
>>> I used v2 instead, which seems to fix the SError.
>>
>> Those three patches have to be used together, and this inverted break
>> condition fix should be applied too.
>>
>> The first two are corrections which align the code behavior with
>> reference manual. This inverted break fix is another correction. The
>> last patch in the list above actually fixes the asynchronized reset
>> behavior and turns it into synchronized reset behavior, therefore fixing
>> the SError in the process.
> 
> FTR, I always had the inverted break condition fix applied.
> All 3 fixes on top should be fine.
Maybe I can finally properly deserve your TB on this patch with this 
option (C) , all three patches applied.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-22 15:49                   ` Marek Vasut
@ 2025-09-23  7:04                     ` Geert Uytterhoeven
  0 siblings, 0 replies; 19+ messages in thread
From: Geert Uytterhoeven @ 2025-09-23  7:04 UTC (permalink / raw)
  To: Marek Vasut
  Cc: linux-pci, Krzysztof Wilczyński, Bjorn Helgaas,
	Lorenzo Pieralisi, Magnus Damm, Manivannan Sadhasivam,
	Rob Herring, Yoshihiro Shimoda, linux-renesas-soc

On Mon, 22 Sept 2025 at 17:49, Marek Vasut <marek.vasut@mailbox.org> wrote:
> On 9/22/25 5:33 PM, Geert Uytterhoeven wrote:
> > On Mon, 22 Sept 2025 at 17:17, Marek Vasut <marek.vasut@mailbox.org> wrote:
> >> On 9/22/25 12:10 PM, Geert Uytterhoeven wrote:
> >>>> I have instead posted what I think are proper fixes for that SError:
> >>>>
> >>>> PCI: rcar-gen4: Add missing 1ms delay after PWR reset assertion
> >>>> https://patchwork.kernel.org/project/linux-pci/patch/20250918030058.330960-1-marek.vasut+renesas@mailbox.org/
> >>>
> >>> I used v3 instead.
> >>> While that patch seems to fix the SError after a hard reset (hardware
> >>> reset), it is not sufficient after a soft reset (typing "reboot").
> >>>
> >>>> clk: renesas: cpg-mssr: Add missing 1ms delay into reset toggle callback
> >>>> https://patchwork.kernel.org/project/linux-clk/patch/20250918030552.331389-1-marek.vasut+renesas@mailbox.org/
> >>>
> >>> This does not fix the SError, as expected (pcie-rcar-gen4.c does not
> >>> call reset_control_reset(), but reset_control_{,de}assert()).
> >>>
> >>>> clk: renesas: cpg-mssr: Read back reset registers to assure values latched
> >>>> https://patchwork.kernel.org/project/linux-clk/patch/20250918030723.331634-1-marek.vasut+renesas@mailbox.org/
> >>>
> >>> I used v2 instead, which seems to fix the SError.
> >>
> >> Those three patches have to be used together, and this inverted break
> >> condition fix should be applied too.
> >>
> >> The first two are corrections which align the code behavior with
> >> reference manual. This inverted break fix is another correction. The
> >> last patch in the list above actually fixes the asynchronized reset
> >> behavior and turns it into synchronized reset behavior, therefore fixing
> >> the SError in the process.
> >
> > FTR, I always had the inverted break condition fix applied.
> > All 3 fixes on top should be fine.
>
> Maybe I can finally properly deserve your TB on this patch with this
> option (C) , all three patches applied.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  2025-09-15 23:58 [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization Marek Vasut
  2025-09-16  9:59 ` Geert Uytterhoeven
@ 2025-09-25 16:36 ` Manivannan Sadhasivam
  1 sibling, 0 replies; 19+ messages in thread
From: Manivannan Sadhasivam @ 2025-09-25 16:36 UTC (permalink / raw)
  To: linux-pci, Marek Vasut
  Cc: Krzysztof Wilczyński, Bjorn Helgaas, Geert Uytterhoeven,
	Lorenzo Pieralisi, Magnus Damm, Rob Herring, Yoshihiro Shimoda,
	linux-renesas-soc


On Tue, 16 Sep 2025 01:58:40 +0200, Marek Vasut wrote:
> R-Car V4H Reference Manual R19UH0186EJ0130 Rev.1.30 Apr. 21, 2025 page 4581
> Figure 104.3b Initial Setting of PCIEC(example), third quarter of the figure
> indicates that register 0xf8 should be polled until bit 18 becomes set to 1.
> 
> Register 0xf8 bit 18 is 0 immediately after write to PCIERSTCTRL1 and is set
> to 1 in less than 1 ms afterward. The current readl_poll_timeout() break
> condition is inverted and returns when register 0xf8 bit 18 is set to 0,
> which in most cases means immediately. In case CONFIG_DEBUG_LOCK_ALLOC=y ,
> the timing changes just enough for the first readl_poll_timeout() poll to
> already read register 0xf8 bit 18 as 1 and afterward never read register
> 0xf8 bit 18 as 0, which leads to timeout and failure to start the PCIe
> controller.
> 
> [...]

Applied, thanks!

[1/1] PCI: rcar-gen4: Fix inverted break condition in PHY initialization
      commit: d0bf8864a2fe2120a5da51e4bca3e11747a8e797

Best regards,
-- 
Manivannan Sadhasivam <mani@kernel.org>


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-09-25 16:36 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-15 23:58 [PATCH] PCI: rcar-gen4: Fix inverted break condition in PHY initialization Marek Vasut
2025-09-16  9:59 ` Geert Uytterhoeven
2025-09-16 13:39   ` Marek Vasut
2025-09-16 13:57     ` Geert Uytterhoeven
2025-09-16 16:31       ` Marek Vasut
2025-09-16 17:13         ` Bjorn Helgaas
2025-09-16 17:39           ` Marek Vasut
2025-09-16 18:15             ` Bjorn Helgaas
2025-09-16 22:09               ` Marek Vasut
2025-09-17  8:00                 ` Geert Uytterhoeven
2025-09-17 13:44                   ` Marek Vasut
2025-09-17  7:23         ` Geert Uytterhoeven
2025-09-18  3:16           ` Marek Vasut
2025-09-22 10:10             ` Geert Uytterhoeven
2025-09-22 15:17               ` Marek Vasut
2025-09-22 15:33                 ` Geert Uytterhoeven
2025-09-22 15:49                   ` Marek Vasut
2025-09-23  7:04                     ` Geert Uytterhoeven
2025-09-25 16:36 ` Manivannan Sadhasivam

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox