* Probe failure of usb controller @11290000 on MT8195 after next-20231221
From: Nícolas F. R. A. Prado @ 2024-01-18 18:36 UTC
  To: AngeloGioacchino Del Regno, Matthias Brugger
  Cc: devicetree, linux-kernel, linux-arm-kernel, linux-mediatek,
	kernel, Macpaul Lin, Chunfeng Yun, Chen-Yu Tsai

Hi,

KernelCI has identified a failure in the probe of one of the USB controllers on
the MT8195-Tomato Chromebook [1]:

[   16.336840] xhci-mtk 11290000.usb: uwk - reg:0x400, version:104
[   16.337081] xhci-mtk 11290000.usb: xHCI Host Controller
[   16.337093] xhci-mtk 11290000.usb: new USB bus registered, assigned bus number 5
[   16.357114] xhci-mtk 11290000.usb: clocks are not stable (0x1003d0f)
[   16.357119] xhci-mtk 11290000.usb: can't setup: -110
[   16.357128] xhci-mtk 11290000.usb: USB bus 5 deregistered
[   16.359484] xhci-mtk: probe of 11290000.usb failed with error -110
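
For readers decoding the log above: -110 is -ETIMEDOUT, so "clocks are not
stable (0x1003d0f)" reads as a status-register poll that gave up before the
expected bits were set. The sketch below only illustrates that generic
poll-until-mask-or-timeout pattern in user space. The reader callback, the mask
and the step/timeout values are hypothetical and are not taken from the
xhci-mtk driver, which has its own register layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical register reader: a real driver would use readl() on MMIO. */
typedef uint32_t (*status_reader_fn)(void *ctx);

/*
 * Poll a status register until every bit in 'mask' is set, or until
 * 'timeout_us' of (simulated) time has elapsed. Returns 0 on success or
 * -ETIMEDOUT on timeout; the last value read is stored in *last.
 * The sleep between reads is omitted to keep the sketch portable.
 */
static int poll_mask_set(status_reader_fn read_status, void *ctx,
			 uint32_t mask, unsigned int step_us,
			 unsigned int timeout_us, uint32_t *last)
{
	unsigned int waited = 0;

	assert(step_us > 0); /* a zero step would never reach the timeout */

	for (;;) {
		*last = read_status(ctx);
		if ((*last & mask) == mask)
			return 0;
		if (waited >= timeout_us)
			return -ETIMEDOUT;
		waited += step_us;
	}
}

/* Simulated register that always returns the same fixed value. */
static uint32_t fixed_status(void *ctx)
{
	return *(const uint32_t *)ctx;
}
```

With v = 0x1003d0f, a hypothetical mask of (1u << 0) | (1u << 16) times out
because bit 16 is clear in that value, while a mask of 0xf succeeds on the
first read. Which bits the real driver actually checks is not shown here.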

A previous message [2] suggests that a force-mode phy property, which has since
been merged, might help address the issue. However, it's not clear to me how,
given that the controller at 11290000 uses a USB2 phy and the phy driver patch
only looks for the property on USB3 phys.

Worth noting that the issue doesn't always happen. For instance, the test
passed for next-20240110 and then failed again on today's next [3]. But it does
seem that the issue was introduced, or at least became much more likely, between
next-20231221 and next-20240103: it never happened in 10 runs before that, and
it has happened in 5 out of 7 runs since.
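
A rough way to quantify the "introduced, or at least became much more likely"
reading, assuming independent runs and a constant per-run failure rate close to
the observed 5/7 (both simplifications): the probability of 10 clean runs at
that rate would be (2/7)^10 = 1024/282475249, roughly 3.6e-6. A tiny helper
for the arithmetic:

```c
#include <assert.h>

/*
 * Probability that n independent runs all pass when each run fails with
 * probability p. Computed by repeated multiplication to avoid needing libm.
 */
static double prob_all_pass(double p, int n)
{
	double q = 1.0;

	assert(p >= 0.0 && p <= 1.0 && n >= 0);
	for (int i = 0; i < n; i++)
		q *= 1.0 - p;
	return q;
}
```

prob_all_pass(5.0 / 7.0, 10) gives about 3.6e-6. With samples this small the
5/7 estimate itself is loose, but the gap is large enough that the 10 clean
runs are unlikely to be luck alone.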

Note: On the Tomato Chromebook specifically this USB controller is not connected
to anything.

[1] https://linux.kernelci.org/test/case/id/659ce3506673076a8c52a428/
[2] https://lore.kernel.org/all/239def9b-437b-9211-7844-af4332651df0@mediatek.com/
[3] https://linux.kernelci.org/test/case/id/65a8c66ee89acb56ac52a405/

Thanks,
Nícolas


* Re: Probe failure of usb controller @11290000 on MT8195 after next-20231221
From: AngeloGioacchino Del Regno @ 2024-01-19 9:12 UTC
  To: Nícolas F. R. A. Prado, Matthias Brugger
  Cc: devicetree, linux-kernel, linux-arm-kernel, linux-mediatek,
	kernel, Macpaul Lin, Chunfeng Yun, Chen-Yu Tsai

On 18/01/24 19:36, Nícolas F. R. A. Prado wrote:
> [...]

Hey Nícolas,

I wonder if this is happening because of async probe... I have seen this happen
once in a (long) while on MT8186 as well, with the same kind of flakiness, and I'm
not even able to reproduce it anymore.

For MT8195 Tomato, I guess we can simply disable that controller without any side
effects but, at the same time, I'm not sure that this would be the right thing to
do in this case.

Besides, the controller at 11290000 is the only one that doesn't live behind MTU3,
but I don't know if that rings any bells...

Cheers,
Angelo


* Re: Probe failure of usb controller @11290000 on MT8195 after next-20231221
From: Nícolas F. R. A. Prado @ 2024-07-10 19:15 UTC
  To: AngeloGioacchino Del Regno
  Cc: Matthias Brugger, devicetree, linux-kernel, linux-arm-kernel,
	linux-mediatek, kernel, Macpaul Lin, Chunfeng Yun, Chen-Yu Tsai

On Fri, Jan 19, 2024 at 10:12:07AM +0100, AngeloGioacchino Del Regno wrote:
> [...]
> 
> Hey Nícolas,
> 
> I wonder if this is happening because of async probe... I have seen those happening
> once in a (long) while on MT8186 as well with the same kind of flakiness and I am
> not even able to reproduce anymore.
> 
> For MT8195 Tomato, I guess we can simply disable that controller without any side
> effects but, at the same time, I'm not sure that this would be the right thing to
> do in this case.
> 
> Besides, the controller at 11290000 is the only one that doesn't live behind MTU3,
> but I don't know if that can ring any bell....

An update on this issue: it looks like it only happens if "xhci-mtk
11290000.usb" probes before "mtk-pcie-gen3 112f8000.pcie". What they have in
common is that both of those nodes use phys that share the same t-phy block:
pcie uses the usb3 phy while xhci uses the usb2 phy. So it seems that some of
the initialization done by the pcie controller might be implicitly needed by the
usb controller.
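
The ordering dependency described above can be illustrated with a toy model.
It assumes, as the message hypothesizes, that the PCIe probe path turns on
something in the shared t-phy block that the xHCI setup only checks for; every
name below is hypothetical. The xHCI check then passes or fails purely
depending on which probe runs first:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical shared hardware state for one t-phy block. */
struct shared_block {
	bool pcie_side_clock_on; /* enabled only by the PCIe probe path */
};

static int pcie_probe(struct shared_block *blk)
{
	blk->pcie_side_clock_on = true; /* side effect xHCI implicitly needs */
	return 0;
}

/* xHCI never enables the clock itself; it only checks for it. */
static int xhci_probe(const struct shared_block *blk)
{
	return blk->pcie_side_clock_on ? 0 : -ETIMEDOUT;
}

/* Run both probes in the given order and return the xHCI result. */
static int probe_in_order(bool pcie_first)
{
	struct shared_block blk = { .pcie_side_clock_on = false };
	int ret;

	if (pcie_first) {
		pcie_probe(&blk);
		ret = xhci_probe(&blk);
	} else {
		ret = xhci_probe(&blk);
		pcie_probe(&blk);
	}
	return ret;
}
```

If probe order varies between boots (for example with asynchronous probing, as
suggested earlier in the thread), a dependency like this would surface as the
intermittent failure rate seen in KernelCI.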

This should help to narrow down the issue and find a proper fix for it.

Thanks,
Nícolas


* Re: Probe failure of usb controller @11290000 on MT8195 after next-20231221
From: Macpaul Lin @ 2024-07-11 4:13 UTC
  To: Nícolas F. R. A. Prado, AngeloGioacchino Del Regno,
	Chunfeng Yun
  Cc: Matthias Brugger, devicetree, linux-kernel, linux-arm-kernel,
	linux-mediatek, kernel, Chen-Yu Tsai, Bear Wang, Pablo Sun



On 7/11/24 03:15, Nícolas F. R. A. Prado wrote:
> [...]
> 
> An update on this issue: it looks like it only happens if "xhci-mtk
> 11290000.usb" probes before "mtk-pcie-gen3 112f8000.pcie". What they have in
> common is that both of those nodes use phys that share the same t-phy block:
> pcie uses the usb3 phy while xhci uses the usb2 phy. So it seems that some of
> the initialization done by the pcie controller might be implicitly needed by the
> usb controller.
> 
> This should help to narrow down the issue and find a proper fix for it.
> 
> Thanks,
> Nícolas

'force-mode' should only be applied to boards which require the XHCI
function instead of a PCIe port.

For example, mt8395-genio-1200-evk.dts requires the 'force-mode' property
to fix the probe issue for USBC @11290000.

https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux.git/commit/?h=v6.10-next/dts64&id=666e6f39faff05fe12bfc64c64aa9015135ce783

'force-mode' should not be needed for tomato boards, and the behavior
should be the same as before.

Another possibility is a firmware change on tomato boards. I'm not
sure whether there have been any recent changes to tomato's firmware
for the tphy of this port, which could also cause this kind of failure.
I don't have tomato boards on hand.

Thanks
Macpaul Lin


* Re: Probe failure of usb controller @11290000 on MT8195 after next-20231221
From: AngeloGioacchino Del Regno @ 2024-07-11 9:14 UTC
  To: Nícolas F. R. A. Prado
  Cc: Matthias Brugger, devicetree, linux-kernel, linux-arm-kernel,
	linux-mediatek, kernel, Macpaul Lin, Chunfeng Yun, Chen-Yu Tsai

On 10/07/24 21:15, Nícolas F. R. A. Prado wrote:
> [...]
> 
> An update on this issue: it looks like it only happens if "xhci-mtk
> 11290000.usb" probes before "mtk-pcie-gen3 112f8000.pcie". What they have in
> common is that both of those nodes use phys that share the same t-phy block:
> pcie uses the usb3 phy while xhci uses the usb2 phy. So it seems that some of
> the initialization done by the pcie controller might be implicitly needed by the
> usb controller.
> 
> This should help to narrow down the issue and find a proper fix for it.
> 

This gave me a couple of ideas to try... and it looks like I have resolved this issue.

A commit will follow soon.

Thank you!
Angelo


* Re: Probe failure of usb controller @11290000 on MT8195 after next-20231221
From: AngeloGioacchino Del Regno @ 2024-07-11 9:21 UTC
  To: Macpaul Lin, Nícolas F. R. A. Prado, Chunfeng Yun
  Cc: Matthias Brugger, devicetree, linux-kernel, linux-arm-kernel,
	linux-mediatek, kernel, Chen-Yu Tsai, Bear Wang, Pablo Sun

On 11/07/24 06:13, Macpaul Lin wrote:
> 
> [...]
> 
> 'force-mode' should only applied to the boards which require XHCI function instead 
> of a PCIE port.
> 
> For example, mt8395-genio-1200-evk.dts requires property 'force-mode' to fix probe 
> issue for USBC @11290000.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux.git/commit/?h=v6.10-next/dts64&id=666e6f39faff05fe12bfc64c64aa9015135ce783
> 
> 'force-mode' should be no need for tomato boards and the behavior should be the 
> same as before.
> 
> Another possibility is the firmware change on tomato boards. I'm not sure if there 
> is any changes on tomato's recent firmware for tphy of this port, which could also 
> be a reason causes this kind of failure.
> I don't have tomato boards on hand.
> 

Hello Macpaul,

It's just about the usb node missing a power domain: as the PCIE_MAC_P1 domain
seems to be shared between USB and PCIe, adding it to the USB node fixes the
setup phase.
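
Listing a shared power domain in both consumer nodes means each consumer holds
its own reference, so the domain stays powered while either one needs it,
independent of probe order. A minimal use-count sketch of that idea, with
hypothetical types rather than the kernel's generic power domain (genpd) API.
As the thread goes on to show, this change alone did not resolve the failure
on every board, so the sketch only illustrates the mechanism:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shared power domain with a simple use count. */
struct power_domain {
	unsigned int users;
	bool powered;
};

static void pd_get(struct power_domain *pd)
{
	if (pd->users++ == 0)
		pd->powered = true;  /* first user powers the domain on */
}

static void pd_put(struct power_domain *pd)
{
	assert(pd->users > 0);   /* an unbalanced put is a bug */
	if (--pd->users == 0)
		pd->powered = false; /* last user powers it off */
}
```

Because each consumer takes and drops its own reference, it no longer matters
whether the USB or the PCIe controller probes first.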

I'll send a devicetree fix soon.

Cheers,
Angelo

> Thanks
> Macpaul Lin



* Re: Probe failure of usb controller @11290000 on MT8195 after next-20231221
From: Nícolas F. R. A. Prado @ 2024-07-11 16:33 UTC
  To: AngeloGioacchino Del Regno
  Cc: Macpaul Lin, Chunfeng Yun, Matthias Brugger, devicetree,
	linux-kernel, linux-arm-kernel, linux-mediatek, kernel,
	Chen-Yu Tsai, Bear Wang, Pablo Sun

On Thu, Jul 11, 2024 at 11:21:14AM +0200, AngeloGioacchino Del Regno wrote:
> On 11/07/24 06:13, Macpaul Lin wrote:
> > [...]
> 
> Hello Macpaul,
> 
> it's just about the usb node missing a power domain: as the PCIE_MAC_P1 domain
> seems to be shared between USB and PCIe, adding it to the USB node fixes the
> setup phase.
> 
> I'll send a devicetree fix soon.

Hi,

As I replied to that patch
(https://lore.kernel.org/all/20240711093230.118534-1-angelogioacchino.delregno@collabora.com)
it didn't fix the issue for me, but I have more updates:

I confirmed that the pcie controller was doing some required setup, since
disabling the pcie1 node made the issue happen every time; that also made it
easier to test.

I was able to track the issue down to the following clock:
<&infracfg_ao CLK_INFRA_AO_PCIE_P1_TL_96M>

Adding it to the clocks property of the xhci1 node fixed the issue.
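
One way to picture why adding that clock to the xhci1 node helps: assuming the
driver enables the clock once it is listed in the node (the message does not
say which clock-names entry was used), the xHCI side takes its own reference
instead of relying on the PCIe probe having enabled it first. A toy model in
which a plain counter stands in for the clock framework's enable count; all
names are hypothetical:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared clock's enable count. */
struct shared_clk {
	unsigned int enable_count;
};

static void clk_take_ref(struct shared_clk *clk)
{
	clk->enable_count++;
}

static int pcie_probe(struct shared_clk *clk)
{
	clk_take_ref(clk);
	return 0;
}

/* Fixed consumer: enables the clock it needs before relying on it. */
static int xhci_probe(struct shared_clk *clk)
{
	clk_take_ref(clk);
	return clk->enable_count > 0 ? 0 : -ETIMEDOUT;
}

/* Run both probes in the given order and return the xHCI result. */
static int probe_in_order(bool pcie_first)
{
	struct shared_clk clk = { 0 };
	int ret;

	if (pcie_first) {
		pcie_probe(&clk);
		ret = xhci_probe(&clk);
	} else {
		ret = xhci_probe(&clk);
		pcie_probe(&clk);
	}
	return ret;
}
```

In this model the xHCI result no longer depends on probe order, which is the
property the DT change is meant to provide.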

I'm just not sure what the right way is to describe this clock from a DT
perspective. The node doesn't have the frmcnt_ck; is this that clock? Or is it
another clock that currently isn't described in the dt-bindings and driver?

Thanks,
Nícolas


* Re: Probe failure of usb controller @11290000 on MT8195 after next-20231221
From: AngeloGioacchino Del Regno @ 2024-07-12 8:12 UTC
  To: Nícolas F. R. A. Prado
  Cc: Macpaul Lin, Chunfeng Yun, Matthias Brugger, devicetree,
	linux-kernel, linux-arm-kernel, linux-mediatek, kernel,
	Chen-Yu Tsai, Bear Wang, Pablo Sun

On 11/07/24 18:33, Nícolas F. R. A. Prado wrote:
> On Thu, Jul 11, 2024 at 11:21:14AM +0200, AngeloGioacchino Del Regno wrote:
>> [...]
> 
> Hi,
> 
> As I replied to that patch
> (https://lore.kernel.org/all/20240711093230.118534-1-angelogioacchino.delregno@collabora.com)
> it didn't fix the issue for me, but I have more updates:
> 
> I confirmed the pcie was doing some required setup since disabling the pcie1
> node made the issue always happen, and that also made it easier to test.
> 
> I was able to track the issue down to the following clock:
> <&infracfg_ao CLK_INFRA_AO_PCIE_P1_TL_96M>
> 
> Adding it to the clocks property of the xhci1 node fixed the issue.
> 

Clocks are what I tried first, and that didn't do anything for me...

...anyway, at this point can you try running that solution on the multiple
devices that we have in the lab through KernelCI?

That would help validate that you're not hitting the same false positive
I had yesterday...

> I'm just not sure from a DT perspective what's the right way to describe this
> clock. The node doesn't have the frmcnt_ck; is this that clock? Or is it
> another clock that currently isn't described in the dt-bindings and driver?
> 

That's the PCI-Express Root Port (RP) Transaction Layer (TL) clock... and I have
no idea why this has anything to do with USB.

MediaTek, is that a hardware quirk? What is the relation between this clock and
the USB controller at 11290000?

Thanks,
Angelo

> Thanks,
> Nícolas



* Re: Probe failure of usb controller @11290000 on MT8195 after next-20231221
  2024-07-12  8:12           ` AngeloGioacchino Del Regno
@ 2024-07-12 15:58             ` Nícolas F. R. A. Prado
  2024-07-15 12:04               ` AngeloGioacchino Del Regno
  0 siblings, 1 reply; 12+ messages in thread
From: Nícolas F. R. A. Prado @ 2024-07-12 15:58 UTC (permalink / raw)
  To: AngeloGioacchino Del Regno
  Cc: Macpaul Lin, Chunfeng Yun, Matthias Brugger, devicetree,
	linux-kernel, linux-arm-kernel, linux-mediatek, kernel,
	Chen-Yu Tsai, Bear Wang, Pablo Sun

On Fri, Jul 12, 2024 at 10:12:39AM +0200, AngeloGioacchino Del Regno wrote:
> On 11/07/24 18:33, Nícolas F. R. A. Prado wrote:
> > On Thu, Jul 11, 2024 at 11:21:14AM +0200, AngeloGioacchino Del Regno wrote:
> > > On 11/07/24 06:13, Macpaul Lin wrote:
> > > > 
> > > > 
> > > > On 7/11/24 03:15, Nícolas F. R. A. Prado wrote:
> > > > > On Fri, Jan 19, 2024 at 10:12:07AM +0100, AngeloGioacchino Del Regno wrote:
> > > > > > On 18/01/24 19:36, Nícolas F. R. A. Prado wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > KernelCI has identified a failure in the probe of one of the USB controllers on
> > > > > > > the MT8195-Tomato Chromebook [1]:
> > > > > > > 
> > > > > > > [   16.336840] xhci-mtk 11290000.usb: uwk - reg:0x400, version:104
> > > > > > > [   16.337081] xhci-mtk 11290000.usb: xHCI Host Controller
> > > > > > > [   16.337093] xhci-mtk 11290000.usb: new USB bus registered, assigned bus number 5
> > > > > > > [   16.357114] xhci-mtk 11290000.usb: clocks are not stable (0x1003d0f)
> > > > > > > [   16.357119] xhci-mtk 11290000.usb: can't setup: -110
> > > > > > > [   16.357128] xhci-mtk 11290000.usb: USB bus 5 deregistered
> > > > > > > [   16.359484] xhci-mtk: probe of 11290000.usb failed with error -110
> > > > > > > 
> > > > > > > A previous message [2] suggests that a force-mode phy property that has been
> > > > > > > merged might help with addressing the issue; however, it's not clear to me how,
> > > > > > > given that the controller at 11290000 uses a USB2 phy and the phy driver patch
> > > > > > > only looks for the property on USB3 phys.
> > > > > > > 
> > > > > > > It's worth noting that the issue doesn't always happen. For instance, the test
> > > > > > > passed for next-20240110 and then failed again on today's next [3]. But it does
> > > > > > > seem that the issue was introduced, or at least became much more likely, between
> > > > > > > next-20231221 and next-20240103, given that it never happened in 10 runs
> > > > > > > before that, and has happened 5 out of 7 times since.
> > > > > > > 
> > > > > > > Note: On the Tomato Chromebook specifically this USB controller is not connected
> > > > > > > to anything.
> > > > > > > 
> > > > > > > [1] https://linux.kernelci.org/test/case/id/659ce3506673076a8c52a428/
> > > > > > > [2] https://lore.kernel.org/all/239def9b-437b-9211-7844-af4332651df0@mediatek.com/
> > > > > > > [3] https://linux.kernelci.org/test/case/id/65a8c66ee89acb56ac52a405/
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Nícolas
> > > > > > 
> > > > > > Hey Nícolas,
> > > > > > 
> > > > > > I wonder if this is happening because of async probe... I have seen this happen
> > > > > > once in a (long) while on MT8186 as well, with the same kind of flakiness, and I am
> > > > > > no longer even able to reproduce it.
> > > > > > 
> > > > > > For MT8195 Tomato, I guess we can simply disable that controller without any side
> > > > > > effects but, at the same time, I'm not sure that this would be the right thing to
> > > > > > do in this case.
> > > > > > 
> > > > > > Besides, the controller at 11290000 is the only one that doesn't live behind MTU3,
> > > > > > but I don't know if that rings any bells...
> > > > > 
> > > > > An update on this issue: it looks like it only happens if "xhci-mtk
> > > > > 11290000.usb" probes before "mtk-pcie-gen3 112f8000.pcie". What they have in
> > > > > common is that both of those nodes use phys that share the same t-phy block:
> > > > > pcie uses the usb3 phy while xhci uses the usb2 phy. So it seems that some of
> > > > > the initialization done by the pcie controller might be implicitly needed by the
> > > > > usb controller.
> > > > > 
> > > > > This should help to narrow down the issue and find a proper fix for it.
> > > > > 
> > > > > Thanks,
> > > > > Nícolas
> > > > 
> > > > 'force-mode' should only be applied to boards that require the XHCI
> > > > function instead of a PCIe port.
> > > > 
> > > > For example, mt8395-genio-1200-evk.dts requires the 'force-mode' property
> > > > to fix the probe issue for USBC @11290000.
> > > > 
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux.git/commit/?h=v6.10-next/dts64&id=666e6f39faff05fe12bfc64c64aa9015135ce783
> > > > 
> > > > There should be no need for 'force-mode' on tomato boards, and the behavior
> > > > should be the same as before.
> > > > 
> > > > Another possibility is a firmware change on tomato boards. I'm not
> > > > sure whether there are any changes in tomato's recent firmware for the tphy
> > > > of this port, which could also cause this kind of failure.
> > > > I don't have tomato boards on hand.
> > > > 
> > > 
> > > Hello Macpaul,
> > > 
> > > it's just about the usb node missing a power domain: as the PCIE_MAC_P1 domain
> > > seems to be shared between USB and PCIe, adding it to the USB node fixes the
> > > setup phase.
> > > 
> > > I'll send a devicetree fix soon.
> > 
> > Hi,
> > 
> > As I replied to that patch
> > (https://lore.kernel.org/all/20240711093230.118534-1-angelogioacchino.delregno@collabora.com)
> > it didn't fix the issue for me, but I have more updates:
> > 
> > I confirmed the pcie was doing some required setup since disabling the pcie1
> > node made the issue always happen, and that also made it easier to test.
> > 
> > I was able to track the issue down to the following clock:
> > <&infracfg_ao CLK_INFRA_AO_PCIE_P1_TL_96M>
> > 
> > Adding it to the clocks property of the xhci1 node fixed the issue.
> > 
> 
> Clocks are what I tried first, and that didn't do anything for me...
> 
> ...anyway, at this point can you try running that solution on the multiple
> devices that we have in the lab through KernelCI?
> 
> That would help validate that you're not hitting the same false positive
> I had yesterday...

Hi,

I've run 10 times with and 10 times without the following patch:

  diff --git a/arch/arm64/boot/dts/mediatek/mt8195.dtsi b/arch/arm64/boot/dts/mediatek/mt8195.dtsi
  index 2ee45752583c..611afe4de968 100644
  --- a/arch/arm64/boot/dts/mediatek/mt8195.dtsi
  +++ b/arch/arm64/boot/dts/mediatek/mt8195.dtsi
  @@ -1453,9 +1453,10 @@ xhci1: usb@11290000 {
                                   <&topckgen CLK_TOP_SSUSB_P1_REF>,
                                   <&apmixedsys CLK_APMIXED_USB1PLL>,
                                   <&clk26m>,
  -                                <&pericfg_ao CLK_PERI_AO_SSUSB_1P_XHCI>;
  +                                <&pericfg_ao CLK_PERI_AO_SSUSB_1P_XHCI>,
  +                                <&infracfg_ao CLK_INFRA_AO_PCIE_P1_TL_96M>;
                          clock-names = "sys_ck", "ref_ck", "mcu_ck", "dma_ck",
  -                                     "xhci_ck";
  +                                     "xhci_ck", "frmcnt_ck";
                          mediatek,syscon-wakeup = <&pericfg 0x400 104>;
                          wakeup-source;
                          status = "disabled";

In both cases I also had

  diff --git a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
  index fe5400e17b0f..e50be8a82d49 100644
  --- a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
  +++ b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
  @@ -613,7 +613,7 @@ flash@0 {
   };
  
   &pcie1 {
  -       status = "okay";
  +       /* status = "okay"; */
  
          pinctrl-names = "default";
          pinctrl-0 = <&pcie1_pins_default>;

to make the issue always happen.

For reproducibility purposes, this was tested on next-20240703 with the
following config: http://0x0.st/XMGM.txt

And the results confirm that every run (10/10) with the patch didn't experience
the issue:

   https://lava.collabora.dev/scheduler/job/14805738
   https://lava.collabora.dev/scheduler/job/14805757
   https://lava.collabora.dev/scheduler/job/14805759
   https://lava.collabora.dev/scheduler/job/14805789
   https://lava.collabora.dev/scheduler/job/14805791
   https://lava.collabora.dev/scheduler/job/14805792
   https://lava.collabora.dev/scheduler/job/14805795
   https://lava.collabora.dev/scheduler/job/14805799
   https://lava.collabora.dev/scheduler/job/14805816
   https://lava.collabora.dev/scheduler/job/14805820

While every run (10/10) without the patch experienced the issue:

   https://lava.collabora.dev/scheduler/job/14805740
   https://lava.collabora.dev/scheduler/job/14805758
   https://lava.collabora.dev/scheduler/job/14805787
   https://lava.collabora.dev/scheduler/job/14805790
   https://lava.collabora.dev/scheduler/job/14805793
   https://lava.collabora.dev/scheduler/job/14805796
   https://lava.collabora.dev/scheduler/job/14805803
   https://lava.collabora.dev/scheduler/job/14805818
   https://lava.collabora.dev/scheduler/job/14805822
   https://lava.collabora.dev/scheduler/job/14805876

These runs are across different units of tomato-r2. I also tried on tomato-r3
with the same result:
without clock, fail: https://lava.collabora.dev/scheduler/job/14806546
with clock, pass: https://lava.collabora.dev/scheduler/job/14806547

So this definitely fixes it. Whether or not this is the right fix, or how to
describe this clock, I'll need your and MediaTek's help to figure out.
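
As a rough sanity check on how unlikely that 10/10 vs 10/10 split would be by
chance, here is a minimal Python sketch of a one-sided Fisher exact test. It
assumes every run is independent and interchangeable, which is only
approximately true since the jobs were spread over a handful of tomato-r2
units, and `fisher_one_sided` is a hypothetical helper, not part of any lab
tooling. The same test is also applied to the 0/10 vs 5/7 numbers from the
original January report.

```python
from math import comb


def fisher_one_sided(fail_a: int, runs_a: int, fail_b: int, runs_b: int) -> float:
    """One-sided Fisher exact test (hypothetical helper, for illustration).

    Returns the probability, under the null hypothesis that failures are
    spread at random across all runs, that group A collects at least
    `fail_a` of the total failures (hypergeometric upper tail).
    """
    total_runs = runs_a + runs_b
    total_fails = fail_a + fail_b
    ways = 0
    # Count every outcome at least as extreme as the observed one.
    # math.comb(n, k) returns 0 when k > n, so impossible tables add nothing.
    for k in range(fail_a, min(total_fails, runs_a) + 1):
        ways += comb(runs_a, k) * comb(runs_b, total_fails - k)
    return ways / comb(total_runs, total_fails)


# July runs: without the clock 10/10 failed, with the clock 0/10 failed.
p_clock = fisher_one_sided(10, 10, 0, 10)
# January report: 5/7 failures from next-20240103 on vs 0/10 before.
p_regression = fisher_one_sided(5, 7, 0, 10)

print(f"clock fix:  p = {p_clock:.2e}")       # 1 / C(20, 10)
print(f"regression: p = {p_regression:.4f}")  # 21 / C(17, 5)
```

This prints p = 5.41e-06 for the clock runs and p = 0.0034 for the January
numbers. Both are small enough that chance alone is an unlikely explanation,
though a test like this says nothing about why the PCIE_P1_TL_96M clock
matters to the USB controller.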

Thanks,
Nícolas

> 
> > I'm just not sure from a DT perspective what's the right way to describe this
> > clock. The node doesn't have the frmcnt_ck; is this that clock? Or is it
> > another clock that currently isn't described in the dt-bindings and driver?
> > 
> 
> That's the PCI-Express Root Port (RP) Transaction Layer (TL) clock... and I have
> no idea why this has anything to do with USB.
> 
> MediaTek, is that a hardware quirk? What is the relation between this clock and
> the USB controller at 11290000?
> 
> Thanks,
> Angelo
> 
> > Thanks,
> > Nícolas
> 


* Re: Probe failure of usb controller @11290000 on MT8195 after next-20231221
  2024-07-12 15:58             ` Nícolas F. R. A. Prado
@ 2024-07-15 12:04               ` AngeloGioacchino Del Regno
  2024-07-15 14:00                 ` Nícolas F. R. A. Prado
  2024-07-22 14:53                 ` Nícolas F. R. A. Prado
  0 siblings, 2 replies; 12+ messages in thread
From: AngeloGioacchino Del Regno @ 2024-07-15 12:04 UTC (permalink / raw)
  To: Nícolas F. R. A. Prado
  Cc: Macpaul Lin, Chunfeng Yun, Matthias Brugger, devicetree,
	linux-kernel, linux-arm-kernel, linux-mediatek, kernel,
	Chen-Yu Tsai, Bear Wang, Pablo Sun

On 12/07/24 17:58, Nícolas F. R. A. Prado wrote:
> On Fri, Jul 12, 2024 at 10:12:39AM +0200, AngeloGioacchino Del Regno wrote:
>> On 11/07/24 18:33, Nícolas F. R. A. Prado wrote:
>>> On Thu, Jul 11, 2024 at 11:21:14AM +0200, AngeloGioacchino Del Regno wrote:
>>>> On 11/07/24 06:13, Macpaul Lin wrote:
>>>>>
>>>>>
>>>>> On 7/11/24 03:15, Nícolas F. R. A. Prado wrote:
>>>>>> On Fri, Jan 19, 2024 at 10:12:07AM +0100, AngeloGioacchino Del Regno wrote:
>>>>>>> On 18/01/24 19:36, Nícolas F. R. A. Prado wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> KernelCI has identified a failure in the probe of one of the USB controllers on
>>>>>>>> the MT8195-Tomato Chromebook [1]:
>>>>>>>>
>>>>>>>> [   16.336840] xhci-mtk 11290000.usb: uwk - reg:0x400, version:104
>>>>>>>> [   16.337081] xhci-mtk 11290000.usb: xHCI Host Controller
>>>>>>>> [   16.337093] xhci-mtk 11290000.usb: new USB bus registered, assigned bus number 5
>>>>>>>> [   16.357114] xhci-mtk 11290000.usb: clocks are not stable (0x1003d0f)
>>>>>>>> [   16.357119] xhci-mtk 11290000.usb: can't setup: -110
>>>>>>>> [   16.357128] xhci-mtk 11290000.usb: USB bus 5 deregistered
>>>>>>>> [   16.359484] xhci-mtk: probe of 11290000.usb failed with error -110
>>>>>>>>
>>>>>>>> A previous message [2] suggests that a force-mode phy property that has been
>>>>>>>> merged might help with addressing the issue; however, it's not clear to me how,
>>>>>>>> given that the controller at 11290000 uses a USB2 phy and the phy driver patch
>>>>>>>> only looks for the property on USB3 phys.
>>>>>>>>
>>>>>>>> It's worth noting that the issue doesn't always happen. For instance, the test
>>>>>>>> passed for next-20240110 and then failed again on today's next [3]. But it does
>>>>>>>> seem that the issue was introduced, or at least became much more likely, between
>>>>>>>> next-20231221 and next-20240103, given that it never happened in 10 runs
>>>>>>>> before that, and has happened 5 out of 7 times since.
>>>>>>>>
>>>>>>>> Note: On the Tomato Chromebook specifically this USB controller is not connected
>>>>>>>> to anything.
>>>>>>>>
>>>>>>>> [1] https://linux.kernelci.org/test/case/id/659ce3506673076a8c52a428/
>>>>>>>> [2] https://lore.kernel.org/all/239def9b-437b-9211-7844-af4332651df0@mediatek.com/
>>>>>>>> [3] https://linux.kernelci.org/test/case/id/65a8c66ee89acb56ac52a405/
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Nícolas
>>>>>>>
>>>>>>> Hey Nícolas,
>>>>>>>
>>>>>>> I wonder if this is happening because of async probe... I have seen this happen
>>>>>>> once in a (long) while on MT8186 as well, with the same kind of flakiness, and I am
>>>>>>> no longer even able to reproduce it.
>>>>>>>
>>>>>>> For MT8195 Tomato, I guess we can simply disable that controller without any side
>>>>>>> effects but, at the same time, I'm not sure that this would be the right thing to
>>>>>>> do in this case.
>>>>>>>
>>>>>>> Besides, the controller at 11290000 is the only one that doesn't live behind MTU3,
>>>>>>> but I don't know if that rings any bells...
>>>>>>
>>>>>> An update on this issue: it looks like it only happens if "xhci-mtk
>>>>>> 11290000.usb" probes before "mtk-pcie-gen3 112f8000.pcie". What they have in
>>>>>> common is that both of those nodes use phys that share the same t-phy block:
>>>>>> pcie uses the usb3 phy while xhci uses the usb2 phy. So it seems that some of
>>>>>> the initialization done by the pcie controller might be implicitly needed by the
>>>>>> usb controller.
>>>>>>
>>>>>> This should help to narrow down the issue and find a proper fix for it.
>>>>>>
>>>>>> Thanks,
>>>>>> Nícolas
>>>>>
>>>>> 'force-mode' should only be applied to boards that require the XHCI
>>>>> function instead of a PCIe port.
>>>>>
>>>>> For example, mt8395-genio-1200-evk.dts requires the 'force-mode' property
>>>>> to fix the probe issue for USBC @11290000.
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux.git/commit/?h=v6.10-next/dts64&id=666e6f39faff05fe12bfc64c64aa9015135ce783
>>>>>
>>>>> There should be no need for 'force-mode' on tomato boards, and the behavior
>>>>> should be the same as before.
>>>>>
>>>>> Another possibility is a firmware change on tomato boards. I'm not
>>>>> sure whether there are any changes in tomato's recent firmware for the tphy
>>>>> of this port, which could also cause this kind of failure.
>>>>> I don't have tomato boards on hand.
>>>>>
>>>>
>>>> Hello Macpaul,
>>>>
>>>> it's just about the usb node missing a power domain: as the PCIE_MAC_P1 domain
>>>> seems to be shared between USB and PCIe, adding it to the USB node fixes the
>>>> setup phase.
>>>>
>>>> I'll send a devicetree fix soon.
>>>
>>> Hi,
>>>
>>> As I replied to that patch
>>> (https://lore.kernel.org/all/20240711093230.118534-1-angelogioacchino.delregno@collabora.com)
>>> it didn't fix the issue for me, but I have more updates:
>>>
>>> I confirmed the pcie was doing some required setup since disabling the pcie1
>>> node made the issue always happen, and that also made it easier to test.
>>>
>>> I was able to track the issue down to the following clock:
>>> <&infracfg_ao CLK_INFRA_AO_PCIE_P1_TL_96M>
>>>
>>> Adding it to the clocks property of the xhci1 node fixed the issue.
>>>
>>
>> Clocks are what I tried first, and that didn't do anything for me...
>>
>> ...anyway, at this point can you try running that solution on the multiple
>> devices that we have in the lab through KernelCI?
>>
>> That would help validate that you're not hitting the same false positive
>> I had yesterday...
> 
> Hi,
> 
> I've run 10 times with and 10 times without the following patch:
> 
>    diff --git a/arch/arm64/boot/dts/mediatek/mt8195.dtsi b/arch/arm64/boot/dts/mediatek/mt8195.dtsi
>    index 2ee45752583c..611afe4de968 100644
>    --- a/arch/arm64/boot/dts/mediatek/mt8195.dtsi
>    +++ b/arch/arm64/boot/dts/mediatek/mt8195.dtsi
>    @@ -1453,9 +1453,10 @@ xhci1: usb@11290000 {
>                                     <&topckgen CLK_TOP_SSUSB_P1_REF>,
>                                     <&apmixedsys CLK_APMIXED_USB1PLL>,
>                                     <&clk26m>,
>    -                                <&pericfg_ao CLK_PERI_AO_SSUSB_1P_XHCI>;
>    +                                <&pericfg_ao CLK_PERI_AO_SSUSB_1P_XHCI>,
>    +                                <&infracfg_ao CLK_INFRA_AO_PCIE_P1_TL_96M>;
>                            clock-names = "sys_ck", "ref_ck", "mcu_ck", "dma_ck",
>    -                                     "xhci_ck";
>    +                                     "xhci_ck", "frmcnt_ck";
>                            mediatek,syscon-wakeup = <&pericfg 0x400 104>;
>                            wakeup-source;
>                            status = "disabled";
> 
> In both cases I also had
> 
>    diff --git a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
>    index fe5400e17b0f..e50be8a82d49 100644
>    --- a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
>    +++ b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
>    @@ -613,7 +613,7 @@ flash@0 {
>     };
>    
>     &pcie1 {
>    -       status = "okay";
>    +       /* status = "okay"; */
>    
>            pinctrl-names = "default";
>            pinctrl-0 = <&pcie1_pins_default>;
> 
> to make the issue always happen.
> 
> For reproducibility purposes, this was tested on next-20240703 with the
> following config: http://0x0.st/XMGM.txt
> 
> And the results confirm that every run (10/10) with the patch didn't experience
> the issue:
> 
>     https://lava.collabora.dev/scheduler/job/14805738
>     https://lava.collabora.dev/scheduler/job/14805757
>     https://lava.collabora.dev/scheduler/job/14805759
>     https://lava.collabora.dev/scheduler/job/14805789
>     https://lava.collabora.dev/scheduler/job/14805791
>     https://lava.collabora.dev/scheduler/job/14805792
>     https://lava.collabora.dev/scheduler/job/14805795
>     https://lava.collabora.dev/scheduler/job/14805799
>     https://lava.collabora.dev/scheduler/job/14805816
>     https://lava.collabora.dev/scheduler/job/14805820
> 
> While every run (10/10) without the patch experienced the issue:
> 
>     https://lava.collabora.dev/scheduler/job/14805740
>     https://lava.collabora.dev/scheduler/job/14805758
>     https://lava.collabora.dev/scheduler/job/14805787
>     https://lava.collabora.dev/scheduler/job/14805790
>     https://lava.collabora.dev/scheduler/job/14805793
>     https://lava.collabora.dev/scheduler/job/14805796
>     https://lava.collabora.dev/scheduler/job/14805803
>     https://lava.collabora.dev/scheduler/job/14805818
>     https://lava.collabora.dev/scheduler/job/14805822
>     https://lava.collabora.dev/scheduler/job/14805876
> 
> These runs are across different units of tomato-r2. I also tried on tomato-r3
> with the same result:
> without clock, fail: https://lava.collabora.dev/scheduler/job/14806546
> with clock, pass: https://lava.collabora.dev/scheduler/job/14806547
> 
> So this definitely fixes it. Whether or not this is the right fix, or how to
> describe this clock, I'll need your and MediaTek's help to figure out.
> 

I analyzed the situation and...
well, that's right: this clock does indeed resolve the issue (I also tested it locally),
but apparently there is no reference anywhere explaining why it resolves it.

So, after some fairly extensive research, the only realistic explanation here is that
there is some sort of hardware bug/quirk in the clocking of the secondary XHCI
controller.
Whether that lies in the clock controller, in the internal paths or somewhere else
would be interesting to know, but I suspect that researching it would take MediaTek
a lot of time.

What counts is that MediaTek is aware of this situation, so that they can internally
understand what is going on and resolve it at the hardware level on new
SoC models.

As for what we can do about this: since this is a one-off, we can add that clock as
the frmcnt_ck one, with a comment in DT saying that this is a bug and, possibly,
that we don't know whether this has anything to do with the frame counter.

Besides, I also noticed that CLK_APMIXED_PLL_SSUSB26M is missing from u2port1,
and the reason it works anyway is that the other phy (u3phy0) should be enabling
that clock before u3phy1 inits and/or before the USB controller using U3P1 tries
to initialize. So, while you're at it, I'd appreciate it if you could also add
that to u3p1.

			u2port1: usb-phy@0 {
				reg = <0x0 0x700>;
				clocks = <&apmixedsys CLK_APMIXED_PLL_SSUSB26M>,
					 <&topckgen CLK_TOP_SSUSB_PHY_P1_REF>;
				clock-names = "ref", "da_ref";
				#phy-cells = <1>;
			};

Anyway, nice catch! Waiting for your patch :-)

Cheers,
Angelo

> Thanks,
> Nícolas
> 
>>
>>> I'm just not sure from a DT perspective what's the right way to describe this
>>> clock. The node doesn't have the frmcnt_ck; is this that clock? Or is it
>>> another clock that currently isn't described in the dt-bindings and driver?
>>>
>>
>> That's the PCI-Express Root Port (RP) Transaction Layer (TL) clock... and I have
>> no idea why this has anything to do with USB.
>>
>> MediaTek, is that a hardware quirk? What is the relation between this clock and
>> the USB controller at 11290000?
>>
>> Thanks,
>> Angelo
>>
>>> Thanks,
>>> Nícolas
>>



* Re: Probe failure of usb controller @11290000 on MT8195 after next-20231221
  2024-07-15 12:04               ` AngeloGioacchino Del Regno
@ 2024-07-15 14:00                 ` Nícolas F. R. A. Prado
  2024-07-22 14:53                 ` Nícolas F. R. A. Prado
  1 sibling, 0 replies; 12+ messages in thread
From: Nícolas F. R. A. Prado @ 2024-07-15 14:00 UTC (permalink / raw)
  To: AngeloGioacchino Del Regno
  Cc: Macpaul Lin, Chunfeng Yun, Matthias Brugger, devicetree,
	linux-kernel, linux-arm-kernel, linux-mediatek, kernel,
	Chen-Yu Tsai, Bear Wang, Pablo Sun

On Mon, Jul 15, 2024 at 02:04:54PM +0200, AngeloGioacchino Del Regno wrote:
> On 12/07/24 17:58, Nícolas F. R. A. Prado wrote:
> > On Fri, Jul 12, 2024 at 10:12:39AM +0200, AngeloGioacchino Del Regno wrote:
> > > On 11/07/24 18:33, Nícolas F. R. A. Prado wrote:
> > > > On Thu, Jul 11, 2024 at 11:21:14AM +0200, AngeloGioacchino Del Regno wrote:
> > > > > On 11/07/24 06:13, Macpaul Lin wrote:
> > > > > > 
> > > > > > 
> > > > > > On 7/11/24 03:15, Nícolas F. R. A. Prado wrote:
> > > > > > > On Fri, Jan 19, 2024 at 10:12:07AM +0100, AngeloGioacchino Del Regno wrote:
> > > > > > > > On 18/01/24 19:36, Nícolas F. R. A. Prado wrote:
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > KernelCI has identified a failure in the probe of one of the USB controllers on
> > > > > > > > > the MT8195-Tomato Chromebook [1]:
> > > > > > > > > 
> > > > > > > > > [   16.336840] xhci-mtk 11290000.usb: uwk - reg:0x400, version:104
> > > > > > > > > [   16.337081] xhci-mtk 11290000.usb: xHCI Host Controller
> > > > > > > > > [   16.337093] xhci-mtk 11290000.usb: new USB bus registered, assigned bus number 5
> > > > > > > > > [   16.357114] xhci-mtk 11290000.usb: clocks are not stable (0x1003d0f)
> > > > > > > > > [   16.357119] xhci-mtk 11290000.usb: can't setup: -110
> > > > > > > > > [   16.357128] xhci-mtk 11290000.usb: USB bus 5 deregistered
> > > > > > > > > [   16.359484] xhci-mtk: probe of 11290000.usb failed with error -110
> > > > > > > > > 
> > > > > > > > > A previous message [2] suggests that a force-mode phy property that has been
> > > > > > > > > merged might help with addressing the issue; however, it's not clear to me how,
> > > > > > > > > given that the controller at 11290000 uses a USB2 phy and the phy driver patch
> > > > > > > > > only looks for the property on USB3 phys.
> > > > > > > > > 
> > > > > > > > > It's worth noting that the issue doesn't always happen. For instance, the test
> > > > > > > > > passed for next-20240110 and then failed again on today's next [3]. But it does
> > > > > > > > > seem that the issue was introduced, or at least became much more likely, between
> > > > > > > > > next-20231221 and next-20240103, given that it never happened in 10 runs
> > > > > > > > > before that, and has happened 5 out of 7 times since.
> > > > > > > > > 
> > > > > > > > > Note: On the Tomato Chromebook specifically this USB controller is not connected
> > > > > > > > > to anything.
> > > > > > > > > 
> > > > > > > > > [1] https://linux.kernelci.org/test/case/id/659ce3506673076a8c52a428/
> > > > > > > > > [2] https://lore.kernel.org/all/239def9b-437b-9211-7844-af4332651df0@mediatek.com/
> > > > > > > > > [3] https://linux.kernelci.org/test/case/id/65a8c66ee89acb56ac52a405/
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > Nícolas
> > > > > > > > 
> > > > > > > > Hey Nícolas,
> > > > > > > > 
> > > > > > > > I wonder if this is happening because of async probe... I have seen this happen
> > > > > > > > once in a (long) while on MT8186 as well, with the same kind of flakiness, and I am
> > > > > > > > no longer even able to reproduce it.
> > > > > > > > 
> > > > > > > > For MT8195 Tomato, I guess we can simply disable that controller without any side
> > > > > > > > effects but, at the same time, I'm not sure that this would be the right thing to
> > > > > > > > do in this case.
> > > > > > > > 
> > > > > > > > Besides, the controller at 11290000 is the only one that doesn't live behind MTU3,
> > > > > > > > but I don't know if that rings any bells...
> > > > > > > 
> > > > > > > An update on this issue: it looks like it only happens if "xhci-mtk
> > > > > > > 11290000.usb" probes before "mtk-pcie-gen3 112f8000.pcie". What they have in
> > > > > > > common is that both of those nodes use phys that share the same t-phy block:
> > > > > > > pcie uses the usb3 phy while xhci uses the usb2 phy. So it seems that some of
> > > > > > > the initialization done by the pcie controller might be implicitly needed by the
> > > > > > > usb controller.
> > > > > > > 
> > > > > > > This should help to narrow down the issue and find a proper fix for it.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Nícolas
> > > > > > 
> > > > > > 'force-mode' should only be applied to boards that require the XHCI
> > > > > > function instead of a PCIe port.
> > > > > > 
> > > > > > For example, mt8395-genio-1200-evk.dts requires the 'force-mode' property
> > > > > > to fix the probe issue for USBC @11290000.
> > > > > > 
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux.git/commit/?h=v6.10-next/dts64&id=666e6f39faff05fe12bfc64c64aa9015135ce783
> > > > > > 
> > > > > > There should be no need for 'force-mode' on tomato boards, and the behavior
> > > > > > should be the same as before.
> > > > > > 
> > > > > > Another possibility is a firmware change on tomato boards. I'm not
> > > > > > sure whether there are any changes in tomato's recent firmware for the tphy
> > > > > > of this port, which could also cause this kind of failure.
> > > > > > I don't have tomato boards on hand.
> > > > > > 
> > > > > 
> > > > > Hello Macpaul,
> > > > > 
> > > > > it's just about the usb node missing a power domain: as the PCIE_MAC_P1 domain
> > > > > seems to be shared between USB and PCIe, adding it to the USB node fixes the
> > > > > setup phase.
> > > > > 
> > > > > I'll send a devicetree fix soon.
> > > > 
> > > > Hi,
> > > > 
> > > > As I replied to that patch
> > > > (https://lore.kernel.org/all/20240711093230.118534-1-angelogioacchino.delregno@collabora.com)
> > > > it didn't fix the issue for me, but I have more updates:
> > > > 
> > > > I confirmed the pcie was doing some required setup since disabling the pcie1
> > > > node made the issue always happen, and that also made it easier to test.
> > > > 
> > > > I was able to track the issue down to the following clock:
> > > > <&infracfg_ao CLK_INFRA_AO_PCIE_P1_TL_96M>
> > > > 
> > > > Adding it to the clocks property of the xhci1 node fixed the issue.
> > > > 
> > > 
> > > Clocks is what I tried first, and didn't do anything for me...
> > > 
> > > ..anyway, can you at this point try to run that solution on the multiple
> > > devices that we have in the lab through KernelCI?
> > > 
> > > That would help validate that you're not facing the same false positive
> > > that I hit yesterday...
> > 
> > Hi,
> > 
> > I've run this 10 times with and 10 times without the following patch:
> > 
> >    diff --git a/arch/arm64/boot/dts/mediatek/mt8195.dtsi b/arch/arm64/boot/dts/mediatek/mt8195.dtsi
> >    index 2ee45752583c..611afe4de968 100644
> >    --- a/arch/arm64/boot/dts/mediatek/mt8195.dtsi
> >    +++ b/arch/arm64/boot/dts/mediatek/mt8195.dtsi
> >    @@ -1453,9 +1453,10 @@ xhci1: usb@11290000 {
> >                                     <&topckgen CLK_TOP_SSUSB_P1_REF>,
> >                                     <&apmixedsys CLK_APMIXED_USB1PLL>,
> >                                     <&clk26m>,
> >    -                                <&pericfg_ao CLK_PERI_AO_SSUSB_1P_XHCI>;
> >    +                                <&pericfg_ao CLK_PERI_AO_SSUSB_1P_XHCI>,
> >    +                                <&infracfg_ao CLK_INFRA_AO_PCIE_P1_TL_96M>;
> >                            clock-names = "sys_ck", "ref_ck", "mcu_ck", "dma_ck",
> >    -                                     "xhci_ck";
> >    +                                     "xhci_ck", "frmcnt_ck";
> >                            mediatek,syscon-wakeup = <&pericfg 0x400 104>;
> >                            wakeup-source;
> >                            status = "disabled";
> > 
> > In both cases I also had
> > 
> >    diff --git a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
> >    index fe5400e17b0f..e50be8a82d49 100644
> >    --- a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
> >    +++ b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
> >    @@ -613,7 +613,7 @@ flash@0 {
> >     };
> >     &pcie1 {
> >    -       status = "okay";
> >    +       /* status = "okay"; */
> >            pinctrl-names = "default";
> >            pinctrl-0 = <&pcie1_pins_default>;
> > 
> > to make the issue always happen.
> > 
> > For reproducibility purposes, this was tested on next-20240703 with the
> > following config: http://0x0.st/XMGM.txt
> > 
> > And the results confirm that every run (10/10) with the patch didn't experience
> > the issue:
> > 
> >     https://lava.collabora.dev/scheduler/job/14805738
> >     https://lava.collabora.dev/scheduler/job/14805757
> >     https://lava.collabora.dev/scheduler/job/14805759
> >     https://lava.collabora.dev/scheduler/job/14805789
> >     https://lava.collabora.dev/scheduler/job/14805791
> >     https://lava.collabora.dev/scheduler/job/14805792
> >     https://lava.collabora.dev/scheduler/job/14805795
> >     https://lava.collabora.dev/scheduler/job/14805799
> >     https://lava.collabora.dev/scheduler/job/14805816
> >     https://lava.collabora.dev/scheduler/job/14805820
> > 
> > While every run (10/10) without the patch experienced the issue:
> > 
> >     https://lava.collabora.dev/scheduler/job/14805740
> >     https://lava.collabora.dev/scheduler/job/14805758
> >     https://lava.collabora.dev/scheduler/job/14805787
> >     https://lava.collabora.dev/scheduler/job/14805790
> >     https://lava.collabora.dev/scheduler/job/14805793
> >     https://lava.collabora.dev/scheduler/job/14805796
> >     https://lava.collabora.dev/scheduler/job/14805803
> >     https://lava.collabora.dev/scheduler/job/14805818
> >     https://lava.collabora.dev/scheduler/job/14805822
> >     https://lava.collabora.dev/scheduler/job/14805876
> > 
> > These runs are across different units of tomato-r2. I also tried on tomato-r3
> > with the same result:
> > without clock, fail: https://lava.collabora.dev/scheduler/job/14806546
> > with clock, pass: https://lava.collabora.dev/scheduler/job/14806547
> > 
> > So this definitely fixes it. Whether or not this is the right fix, or how to
> > describe this clock, I'll need your and MediaTek's help to figure out.
> > 
> 
> I analyzed the situation and...
> well, you're right: this clock does indeed resolve the issue (I also tested it
> locally), but apparently nothing documents why it resolves it.
> 
> So, after some extensive research, the only realistic explanation is that
> there is some sort of hardware bug/quirk in the clocking of the secondary XHCI
> controller.
> It would be interesting to know whether that is in the clock controller, in the
> internal paths or somewhere else, but I suspect that researching it would take
> MediaTek a lot of time.
> 
> What counts is that MediaTek is aware of this situation so that they can internally
> understand what is going on with this and resolve that at a hardware level on new
> SoC models.
> 
> As for what we can do about this, since this is a one-off, we can add it as
> the frmcnt_ck clock, with a comment in DT saying that this is a bug, and possibly
> noting that we don't know whether it has anything to do with the frame counter.
> 
> Besides, I also noticed that CLK_APMIXED_PLL_SSUSB26M is missing from u2port1,
> and the reason it works is that u3phy0 should be enabling it before u3phy1
> initializes and/or before the USB controller using U3P1 tries to initialize, so
> while you're at it... if you could please also add that to u3p1, I'd appreciate it.
> 
> 			u2port1: usb-phy@0 {
> 				reg = <0x0 0x700>;
> 				clocks = <&apmixedsys CLK_APMIXED_PLL_SSUSB26M>,
> 					 <&topckgen CLK_TOP_SSUSB_PHY_P1_REF>;
> 				clock-names = "ref", "da_ref";
> 				#phy-cells = <1>;
> 			};
> 
> Anyway, nice catch! Waiting for your patch :-)

Sure thing, will do. I'll just wait a couple days to give MediaTek a chance to
comment on this. Then I'll send the patch(es).

Thanks,
Nícolas
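For illustration, the approach suggested above (keep the tested clock under the frmcnt_ck name and document it as a hardware quirk in DT) might look like the fragment below. This is a hypothetical sketch, not the merged patch: the clock order and names are taken from the diff tested earlier in this thread, the comment wording is illustrative, and the first (sys_ck) entry, which the diff context does not show, is left as a placeholder comment rather than filled in.

```dts
/* Hypothetical fragment of the xhci1: usb@11290000 node in mt8195.dtsi.
 * Not valid on its own: the sys_ck entry is deliberately omitted. */
			clocks = /* sys_ck entry unchanged, omitted here */
				 <&topckgen CLK_TOP_SSUSB_P1_REF>,
				 <&apmixedsys CLK_APMIXED_USB1PLL>,
				 <&clk26m>,
				 <&pericfg_ao CLK_PERI_AO_SSUSB_1P_XHCI>,
				 /*
				  * Hardware bug/quirk: without this clock, setup
				  * fails with "clocks are not stable". It is listed
				  * as frmcnt_ck, but it is unknown whether it has
				  * anything to do with the frame counter.
				  */
				 <&infracfg_ao CLK_INFRA_AO_PCIE_P1_TL_96M>;
			clock-names = "sys_ck", "ref_ck", "mcu_ck", "dma_ck",
				      "xhci_ck", "frmcnt_ck";
```

Putting the explanation in the DT comment, rather than only in the commit message, keeps the reason for the otherwise unexplained PCIe clock visible to anyone editing the node later.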


* Re: Probe failure of usb controller @11290000 on MT8195 after next-20231221
  2024-07-15 12:04               ` AngeloGioacchino Del Regno
  2024-07-15 14:00                 ` Nícolas F. R. A. Prado
@ 2024-07-22 14:53                 ` Nícolas F. R. A. Prado
  1 sibling, 0 replies; 12+ messages in thread
From: Nícolas F. R. A. Prado @ 2024-07-22 14:53 UTC (permalink / raw)
  To: AngeloGioacchino Del Regno
  Cc: Macpaul Lin, Chunfeng Yun, Matthias Brugger, devicetree,
	linux-kernel, linux-arm-kernel, linux-mediatek, kernel,
	Chen-Yu Tsai, Bear Wang, Pablo Sun

On Mon, Jul 15, 2024 at 02:04:54PM +0200, AngeloGioacchino Del Regno wrote:
[..]
> Besides, I also noticed that CLK_APMIXED_PLL_SSUSB26M is missing from u2port1,
> and the reason it works is that u3phy0 should be enabling it before u3phy1
> initializes and/or before the USB controller using U3P1 tries to initialize, so
> while you're at it... if you could please also add that to u3p1, I'd appreciate it.
> 
> 			u2port1: usb-phy@0 {
> 				reg = <0x0 0x700>;
> 				clocks = <&apmixedsys CLK_APMIXED_PLL_SSUSB26M>,
> 					 <&topckgen CLK_TOP_SSUSB_PHY_P1_REF>;
> 				clock-names = "ref", "da_ref";
> 				#phy-cells = <1>;
> 			};

I'm not familiar with the clock topology on MT8195, but I noticed the
CLK_APMIXED_PLL_SSUSB26M clock is currently only present in the USB3 phy nodes:
u3port1 and u3port0. You're suggesting adding it to a USB2 phy node here. Is it
needed by all USB2 phy nodes (u2port0, u2port1, u2port2, u2port3), then?

Thanks,
Nícolas


end of thread, other threads:[~2024-07-22 14:53 UTC | newest]

Thread overview: 12+ messages
2024-01-18 18:36 Probe failure of usb controller @11290000 on MT8195 after next-20231221 Nícolas F. R. A. Prado
2024-01-19  9:12 ` AngeloGioacchino Del Regno
2024-07-10 19:15   ` Nícolas F. R. A. Prado
2024-07-11  4:13     ` Macpaul Lin
2024-07-11  9:21       ` AngeloGioacchino Del Regno
2024-07-11 16:33         ` Nícolas F. R. A. Prado
2024-07-12  8:12           ` AngeloGioacchino Del Regno
2024-07-12 15:58             ` Nícolas F. R. A. Prado
2024-07-15 12:04               ` AngeloGioacchino Del Regno
2024-07-15 14:00                 ` Nícolas F. R. A. Prado
2024-07-22 14:53                 ` Nícolas F. R. A. Prado
2024-07-11  9:14     ` AngeloGioacchino Del Regno
