Linux PCI subsystem development
From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: lpieralisi@kernel.org, kw@linux.com, bhelgaas@google.com,
	robh@kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org,
	Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Subject: Re: [PATCH v2] PCI: qcom-ep: Enable controller resources like PHY only after refclk is available
Date: Fri, 30 Aug 2024 13:42:05 +0530
Message-ID: <20240830081205.x3ucsausk5znk27e@thinkpad>
In-Reply-To: <20240829170624.GA67120@bhelgaas>

On Thu, Aug 29, 2024 at 12:06:24PM -0500, Bjorn Helgaas wrote:
> On Thu, Aug 29, 2024 at 10:14:55PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Aug 29, 2024 at 07:38:08AM -0500, Bjorn Helgaas wrote:
> > > On Thu, Aug 29, 2024 at 11:07:20AM +0530, Manivannan Sadhasivam wrote:
> > > > On Wed, Aug 28, 2024 at 03:59:45PM -0500, Bjorn Helgaas wrote:
> > > > > On Wed, Aug 28, 2024 at 07:31:08PM +0530, Manivannan Sadhasivam wrote:
> > > > > > qcom_pcie_enable_resources() is called by qcom_pcie_ep_probe() and it
> > > > > > enables the controller resources like clocks, regulator, PHY. On one of the
> > > > > > new unreleased Qcom SoCs, PHY enablement depends on the active refclk. And
> > > > > > on all of the supported Qcom endpoint SoCs, refclk comes from the host
> > > > > > (RC). So calling qcom_pcie_enable_resources() without refclk causes the
> > > > > > whole SoC to crash on the new SoC.
> > > > > > 
> > > > > > qcom_pcie_enable_resources() is already called by
> > > > > > qcom_pcie_perst_deassert() when PERST# is deasserted, and refclk is
> > > > > > available at that time.
> > > > > > 
> > > > > > Hence, remove the unnecessary call to qcom_pcie_enable_resources() from
> > > > > > qcom_pcie_ep_probe() to prevent the crash.
> > > > > > 
> > > > > > Fixes: 869bc5253406 ("PCI: dwc: ep: Fix DBI access failure for drivers requiring refclk from host")
> > > > > > Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > > > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > > ---
> > > > > > 
> > > > > > Changes in v2:
> > > > > > 
> > > > > > - Changed the patch description to mention the crash clearly as suggested by
> > > > > >   Bjorn
> > > > > 
> > > > > Clearly mentioning the crash as rationale for the change is *part* of
> > > > > what I was looking for.
> > > > > 
> > > > > The rest, just as important, is information about what sort of crash
> > > > > this is, because I hope and suspect the crash is recoverable, and we
> > > > > *should* recover from it because PERST# may occur at arbitrary times,
> > > > > so trying to avoid it is never going to be reliable.
> > > > 
> > > > I did mention 'whole SoC crash' which typically means unrecoverable
> > > > state as the SoC would crash (not just the driver). On Qcom SoCs,
> > > > this will also lead the SoC to boot into EDL (Emergency Download)
> > > > mode so that the users can collect dumps on the crash.
> > > 
> > > IIUC we're talking about an access to a PHY register, and the access
> > > requires Refclk from the host.  I assume the SoC accesses the register
> > > by doing an MMIO load.  If nothing responds, I assume the SoC would
> > > take a machine check or similar because there's no data to complete
> > > the load instruction.  So I assume again that the Linux on the SoC
> > > doesn't know how to recover from such a machine check?  If that's the
> > > scenario, is the machine check unrecoverable in principle, or is it
> > > potentially recoverable but nobody has done the work to do it?  My
> > > guess would be the latter, because the former would mean that it's
> > > impossible to build a robust endpoint around this SoC.  But obviously
> > > this is all complete speculation on my part.
> > 
> > At least on Qcom SoCs, doing an MMIO read without enabling the
> > resources would result in a NoC (Network On Chip) error, which then
> > ends up as an exception to TrustZone, and TrustZone will finally
> > convert it to a SoC crash so that the users could take a crash dump
> > and analyze why the crash happened.
> > 
> > I know that it may sound strange to developers coming from x86 world
> > :)
> 
> It's only strange if the system design forces a crash for events that
> happen in normal operation.  Sounds like part of the problem here is
> the non-SRIS mode that depends on Refclk from the host.  That and the
> fact that operating in non-SRIS mode has an unavoidable race where
> PERST# from the host at the wrong time can crash the endpoint.
> 

Precisely.

> I think users of non-SRIS mode need to be aware of this issue, and
> this patch to narrow the race window, but not close it completely, is
> one good place to mention it.
> 

Okay. I'll mention it in the patch description.

> > But this NoC error is something NVidia has also reported before, so
> > I wouldn't assume that this is a Qcom specific issue but rather for
> > SoCs depending on refclk from host.
> 
> Are there other drivers that need a similar band-aid?
> 

AFAIK, only the tegra194 and qcom-pcie-ep drivers require refclk from the host.
The rest of the endpoint drivers seem to have an independent clock.
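
For readers following the thread, the ordering under discussion can be
sketched as a toy userspace model. This is only an illustration, not the
driver code: struct ep_model and every model_*() function are hypothetical
names, and the "crash" is just a flag standing in for the NoC error path
described above. It assumes the host provides refclk before it deasserts
PERST#.

```c
#include <stdbool.h>

/* Hypothetical model of the endpoint state; not taken from the driver. */
struct ep_model {
	bool refclk_present;	/* refclk is supplied by the host (RC) */
	bool resources_enabled;	/* clocks, regulator, PHY */
	bool crashed;		/* stands in for the whole-SoC crash */
};

/* Models enabling the PHY: touching it without refclk is fatal. */
static int model_enable_resources(struct ep_model *ep)
{
	if (!ep->refclk_present) {
		ep->crashed = true;
		return -1;
	}
	ep->resources_enabled = true;
	return 0;
}

/* Before the patch: probe enables resources whether or not refclk is up. */
static int model_probe_old(struct ep_model *ep)
{
	return model_enable_resources(ep);
}

/* After the patch: probe leaves the resources alone. */
static int model_probe_new(struct ep_model *ep)
{
	(void)ep;
	return 0;
}

/* PERST# deassert: assumed to happen only once the host drives refclk. */
static int model_perst_deassert(struct ep_model *ep)
{
	return model_enable_resources(ep);
}
```

In this model, model_probe_old() on an endpoint without refclk sets the
crashed flag, while model_probe_new() followed by model_perst_deassert()
(once refclk_present is true) enables the resources cleanly. The model does
not cover refclk going away later, which is the remaining race in non-SRIS
mode.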

- Mani

-- 
மணிவண்ணன் சதாசிவம்
