Re:Re: failed to power up MHI controller issue

Linux MHI Development
 help / color / mirror / Atom feed

From: "Slark Xiao" <slark_xiao@163.com>
To: "Manivannan Sadhasivam" <mani@kernel.org>
Cc: "Qiang Yu" <quic_qianyu@quicinc.com>, mhi@lists.linux.dev
Subject: Re:Re: failed to power up MHI controller issue
Date: Tue, 16 Apr 2024 11:56:24 +0800 (CST)	[thread overview]
Message-ID: <61e13e10.612a.18ee50c7dca.Coremail.slark_xiao@163.com> (raw)
In-Reply-To: <20240415103316.GE7537@thinkpad>


Hi Mani, QiangYu,
Just confirmed it with attached modification, and issue was gone.
So you can help update it into kernel side. And I will follow this
reference debug experience for other potential platform.
Thank you guys a lot!

At 2024-04-15 18:33:16, "Manivannan Sadhasivam" <mani@kernel.org> wrote:
>On Mon, Apr 15, 2024 at 02:09:35PM +0800, Qiang Yu wrote:
>> 
>> On 4/13/2024 5:10 PM, Slark Xiao wrote:
>> > Hi QiangYu, Mani,
>> > My case has a bit difference with yours.
>> > In my local case, device can't be recovered except a reboot(mainly a cold
>> > reboot). Workaround remove and rescan can't solve it.
>> > Also, this issue could be occured in some platform in the first bootup.
>> > May I know if this issue related with the power sequence of host or
>> > pin settings of the connector?
>> Hi Slark
>> 
>> I don't think it's a mhi host driver issue. From the log you shared, there
>> is a
>> PERST deassertion followed by a PERST assertion closely, leading to device
>> run
>> out of order.
>> 
>> // Here, suppose reboot is triggered on host, so mhi_pci_shutdown is invoked
>> 91.474100278 [0x0 mhi_sm_pcie_event_manager] Handling
>> EP_PCIE_PM_D3_HOT_EVENT
>> event, current states: READY and D0_STATE
>> 
>> 91.747734185 ep_pcie_handle_perst_irq: PCIe V1711211: No. 1 PERST assertion
>> 91.747784342 [0x0 mhi_dev_sm_pcie_handler] received:
>> EP_PCIE_PM_D3_COLD_EVENT
>> 91.748044394 [0x0 mhi_sm_pcie_event_manager] Handling
>> EP_PCIE_PM_D3_COLD_EVENT
>> event, current states: READY and D3_HOT_STATE
>> 
>> //Host should power up, so it deassert perst prepare to do link train.
>> 92.475207677 ep_pcie_handle_perst_irq: PCIe V1711211: No. 1 PERST
>> deassertion
>> 92.475269968 ep_pcie_notify_event: PCIe V1711211: Callback client for event
>> 8
>> 92.475317729 ep_pcie_core_enable_endpoint: PCIe V1711211: options input are
>> 0x2
>> 92.475321427 ep_pcie_vreg_init: PCIe V1711211
>> 92.475323823 ep_pcie_vreg_init: PCIe V1711211: Vreg vreg-1p8 is being
>> enabled
>> 
>> //Process of perst deassert has not completed, but deassert happen, which is
>> not expected
>> 92.475366010 ep_pcie_handle_perst_irq: PCIe V1711211: No. 2 PERST assertion
>
>Ok. I think I know what is going on. It is related to how PERST# is handled in
>the host.
>
>Rockpro64 SoC defines the PERST# GPIO as below in
>arch/arm64/boot/dts/rockchip/rk3399-rockpro64.dtsi:
>
>        ep-gpios = <&gpio2 RK_PD4 GPIO_ACTIVE_HIGH>;
>
>Here, the PERST# GPIO is configured as ACTIVE_HIGH. So whatever the value driver
>sets as the logical output for the GPIO, it will be reflected as it is on the
>physical line.
>
>In the drivers/pci/controller/pcie-rockchip-host.c driver:
>
>	gpiod_set_value_cansleep(rockchip->ep_gpio, 0); # PERST# assert
>	gpiod_set_value_cansleep(rockchip->ep_gpio, 1); # PERST# deassert
>
>So when driver asserts the PERST# GPIO, the physical line will output "low"
>corresponding to the driver logical value "0". And vice versa for the deassert.
>
>When host reboot happens, the driver is not doing anything specific for PERST#.
>So after the SoC reboot, the physical line goes to "low" state corresponding to
>PERST# assert (default state of the GPIO) and this is reflected in the endpoint
>log as:
>
>	[    91.747734185] ep_pcie_handle_perst_irq: PCIe V1711211: No. 1 PERST assertion
>
>Then, when the host controller probes, the PERST# GPIO is requested as below:
>
>	rockchip->ep_gpio = devm_gpiod_get_optional(dev, "ep",
>                                                            GPIOD_OUT_HIGH);
>
>Here, the GPIO is requested with the initial state of GPIOD_OUT_HIGH. Which
>means, the driver sets the logical value of the PERST# GPIO to "1" and the
>physical value will be "high" and this is reflected in the endpoint log as:
>
>	[    92.475207677] ep_pcie_handle_perst_irq: PCIe V1711211: No. 1 PERST deassertion	
>
>Then during rockchip_pcie_host_init_port() of driver probe, PERST# is asserted
>again to perform register initialization and this is also reflected in the
>endpoint log as:
>
>	[    92.475366010] ep_pcie_handle_perst_irq: PCIe V1711211: No. 2 PERST assertion
>
>Once the register initilization is completed, then the PERST# is deasserted:
>
>	[    92.503568354] ep_pcie_handle_perst_irq: PCIe V1711211: No. 2 PERST deassertion
>
>The issue here happens due to the very short time between the first PERST#
>deassert during devm_gpiod_get_optional() and then successive PERST# assert
>during rockchip_pcie_host_init_port() as Qiang noted.
>
>But the GPIO flag (GPIOD_OUT_HIGH) is what actually culprit. It is supposed to
>be GPIOD_OUT_LOW as the driver should not deassert PERST# before configuring the
>controller.
>
>@Slark: If you can modify the host platform, then try changing the flag from
>GPIOD_OUT_HIGH to GPIOD_OUT_LOW in [1] and see if that fixes the issue during
>reboot.
>
>- Mani
>
>[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/pci/controller/pcie-rockchip.c?h=v6.6.3#n123
>
>> > At 2024-04-09 12:35:41, "Qiang Yu" <quic_qianyu@quicinc.com> wrote:
>> > >
>> > >On 4/8/2024 5:59 PM, Slark Xiao wrote:
>> > >> Hi Mani,
>> > >> Please see attached log files(both kernel log and IPC log).
>> > >>
>> > >Hi Mani, Slark
>> > >
>> > >I ever met similar issue like this, where device treated D3cold,ready -> D0
>> > >as illegal transition.
>> > >
>> > >"EP_PCIE_PM_D0_EVENT: illegal in current MHI state: READY and D3_COLD_STATE"
>> > >
>> > >In MHI spec, there is not transition path from D3cold,ready->D0, but
>> > >in fact, we have this transition in some cases.
>> > >
>> > >For example, when we remove and reinstall pci generic driver. During
>> > >remove,
>> > >mhi will reset device and pci framework will put it into D3cold
>> > directly >when
>> > >rootport driver runtime suspend. When we reinstall driver, device will
>> > >see D0
>> > >event but its current state is D3cold,ready. The whole state transition is
>> > >like:
>> > >D0,M0 -> D0,reset -> D0,ready-> D3cold,ready -> D0,ready.
>> > >
>> > >During reboot, if device doesn't reboot with host, look like we will
>> > >also meet
>> > >similar transition.
>> > >
>> > >This illegal state error log may not root cause to this issue. Even
>> > >process it
>> > >as syserr, we can still recovery and go back to M0.
>> > >> At 2024-04-02 12:56:47, "Manivannan Sadhasivam" <mani@kernel.org> wrote:
>> > >> >+ MHI list
>> > >> >
>> > >> >On Thu, Mar 28, 2024 at 08:02:20PM +0800, Slark Xiao wrote:
>> > >> >> Hi Mani,
>> > >> >>  Hope you are doing well! I got a problem with our sdx65 device in some
>> > >> >> >> >> specific platform. MHI driver would report "failed to power
>> > up MHI controller"
>> > >> >> when device bootup. Actually, I can reproduce this error when host doing a
>> > >> >> reboot. It's Rockpro64 with SDX65, and kernel 6.6.3 or 6.7.10.
>> > >> >> >> >> So I add some logs and change dbg level log to info for more
>> > print. You can
>> > >> >> see my attachments for reference.
>> > >> >> It seems the host didn't receive the event of "MISSION MODE" and then
>> > >> >> power down the device.
>> > >> >> BTW, there are some extra log prints were added in function
>> > >> >> mhi_sync_power_up(). You can find it with mask "##shawn##".
>> > >> >> Do you have any idea to debug it?
>> > >> >> >> >
>> > >> >Looks like something gone wrong with the device firmware. Is it possible to get
>> > >> >the logs from the device?
>> > >> >
>> > >> >This could be due to the way the PCIe controller driver on the host handling
>> > >> >reboot.
>> > >> >
>> > >> >But let's get the device logs first to debug further.
>> > >> >
>> > >> >- Mani
>> > >> >
>> > >> >-- >> >மணிவண்ணன் சதாசிவம்
>> 
>
>-- 
>மணிவண்ணன் சதாசிவம்

next prev parent reply	other threads:[~2024-04-16  3:56 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <38c7997c.10212.18e84f08a4f.Coremail.slark_xiao@163.com>
2024-04-02  4:56 ` failed to power up MHI controller issue Manivannan Sadhasivam
2024-04-08  9:59   ` Slark Xiao
2024-04-09  4:35     ` Qiang Yu
     [not found]       ` <40210cbb.3949.18ed6b904d6.Coremail.slark_xiao@163.com>
2024-04-15  6:09         ` Qiang Yu
2024-04-15 10:33           ` Manivannan Sadhasivam
2024-04-16  3:56             ` Slark Xiao [this message]
2024-04-17  2:30               ` Qiang Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=61e13e10.612a.18ee50c7dca.Coremail.slark_xiao@163.com \
    --to=slark_xiao@163.com \
    --cc=mani@kernel.org \
    --cc=mhi@lists.linux.dev \
    --cc=quic_qianyu@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox