From: Niklas Cassel <cassel@kernel.org>
To: Manivannan Sadhasivam <mani@kernel.org>
Cc: "Lorenzo Pieralisi" <lpieralisi@kernel.org>,
"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
"Rob Herring" <robh@kernel.org>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Thierry Reding" <thierry.reding@gmail.com>,
"Jonathan Hunter" <jonathanh@nvidia.com>,
"Vidya Sagar" <vidyas@nvidia.com>,
"Shin'ichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
linux-pci@vger.kernel.org, linux-tegra@vger.kernel.org
Subject: Re: [PATCH] PCI: tegra194: Handle errors in BPMP response
Date: Mon, 22 Sep 2025 15:36:50 +0200 [thread overview]
Message-ID: <aNFQ8hCznucs-CZw@ryzen> (raw)
In-Reply-To: <lvchydppqdxm4hy4kogkzinz4w2hllvsihg2ehvueth25sxi53@feqxeedvrs2o>
On Sat, Sep 20, 2025 at 09:01:25PM +0530, Manivannan Sadhasivam wrote:
> On Thu, Sep 11, 2025 at 02:27:29PM +0200, Niklas Cassel wrote:
> > From: Vidya Sagar <vidyas@nvidia.com>
> >
> > The return value from tegra_bpmp_transfer() indicates the success or
> > failure of the IPC transaction with BPMP. If the transaction
> > succeeded, we also need to check the actual command's result code.
> >
> > If a host deasserts PERST without providing a refclock, enabling the PHY
> > (via a tegra_bpmp_transfer() call) will silently fail, however, because
> > we are lacking error handling, pex_ep_event_pex_rst_deassert() will still
> > set pcie->ep_state = EP_STATE_ENABLED.
> >
>
> How can a host deassert PERST# without providing refclk? As per the CEM spec
> r4.0, sec 2.2, refclk should be active at least TPERST-CLK before PERST# is
> deasserted.
>
> So does this controller violate the spec? Even so, I don't know how an endpoint
> could function if it relies on the host for refclk.
Hello Mani,
I apologize for my poor commit message.
What happens is this:
I use a rock5b as host. I use a jetson orin nano (Tegra234 based board).
When uboot starts on rock5b, it initializes the PCIe root complex, this
asserts+deasserts PERST#, and I can see the PERST GPIO handler being
triggered on the jetson board.
However, for some unknown reason, the PHY init fails.
I have no idea why. My guess is that perhaps the uboot driver does not
provide the refclock for very long after deasserting PERST, before deasserting
PERST again and booting linux.
The problem is that the tegra PCIe endpoint driver has no error handling for
the PHY init (which is done using a tegra_bpmp_transfer() command to firmware).
So the tegra PCIe endpoint driver will incorrectly set pcie->ep_state to
EP_STATE_ENABLED.
So when linux eventually boots on the rock5b, and the DWC based PCIe driver
asserts+deasserts PERST, it will be a no-op, because the tegra driver will
think that the pcie->ep_state is already EP_STATE_ENABLED.
This means that the driver will not try to initialize the PHY during this
second PERST assert+deassert.
By simply adding error handling to tegra_bpmp_transfer(), when the first
PERST assert+deassert (make by uboot) fails, the pcie->ep_state will not
incorrectly be set to EP_STATE_ENABLED.
Thus, when the second PERST assert+deassert (made by the linux driver) is done,
we will try to initialize the PHY again. For some reason, the PHY init works
this second time. Why? I don't really know, but we know that the DWC based
driver has the proper sleeps before/after asserting PERST, and that it will
not disable the reflock suddently, because it is loading some other OS.
I will try to improve the commit message to be more specific in V2.
Kind regards,
Niklas
>
> > Because of this, any succeeding PERST deassertion will incorrectly be a
> > no-op (because of the pcie->ep_state == EP_STATE_ENABLED check in
> > pex_ep_event_pex_rst_deassert()), even if the host does provide a refclock
> > during the succeeding PERST deassertion.
> >
> > Add error handling to tegra_bpmp_transfer(), such that the pcie->ep_state
> > can not get out of sync with reality, which will incorrectly cause the
> > driver to think that it has been successfully initialized, which
> > incorrectly makes future calls to pex_ep_event_pex_rst_deassert() a no-op.
> >
> > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > [cassel: improve commit log]
> > Signed-off-by: Niklas Cassel <cassel@kernel.org>
> > ---
> > drivers/pci/controller/dwc/pcie-tegra194.c | 18 ++++++++++++++++--
> > 1 file changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> > index 0c0734aa14b68..8c5c370dbba5e 100644
> > --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> > +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> > @@ -1214,6 +1214,7 @@ static int tegra_pcie_bpmp_set_ctrl_state(struct tegra_pcie_dw *pcie,
> > struct mrq_uphy_response resp;
> > struct tegra_bpmp_message msg;
> > struct mrq_uphy_request req;
> > + int err;
> >
> > /*
> > * Controller-5 doesn't need to have its state set by BPMP-FW in
> > @@ -1236,7 +1237,13 @@ static int tegra_pcie_bpmp_set_ctrl_state(struct tegra_pcie_dw *pcie,
> > msg.rx.data = &resp;
> > msg.rx.size = sizeof(resp);
> >
> > - return tegra_bpmp_transfer(pcie->bpmp, &msg);
> > + err = tegra_bpmp_transfer(pcie->bpmp, &msg);
> > + if (err)
> > + return err;
> > + if (msg.rx.ret)
> > + return -EINVAL;
> > +
> > + return 0;
> > }
> >
> > static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
> > @@ -1245,6 +1252,7 @@ static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
> > struct mrq_uphy_response resp;
> > struct tegra_bpmp_message msg;
> > struct mrq_uphy_request req;
> > + int err;
> >
> > memset(&req, 0, sizeof(req));
> > memset(&resp, 0, sizeof(resp));
> > @@ -1264,7 +1272,13 @@ static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
> > msg.rx.data = &resp;
> > msg.rx.size = sizeof(resp);
> >
> > - return tegra_bpmp_transfer(pcie->bpmp, &msg);
> > + err = tegra_bpmp_transfer(pcie->bpmp, &msg);
> > + if (err)
> > + return err;
> > + if (msg.rx.ret)
> > + return -EINVAL;
> > +
> > + return 0;
> > }
> >
> > static void tegra_pcie_downstream_dev_to_D0(struct tegra_pcie_dw *pcie)
> > --
> > 2.51.0
> >
>
> --
> மணிவண்ணன் சதாசிவம்
prev parent reply other threads:[~2025-09-22 13:36 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-11 12:27 [PATCH] PCI: tegra194: Handle errors in BPMP response Niklas Cassel
2025-09-17 13:15 ` Jon Hunter
2025-09-20 15:31 ` Manivannan Sadhasivam
2025-09-22 13:36 ` Niklas Cassel [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aNFQ8hCznucs-CZw@ryzen \
--to=cassel@kernel.org \
--cc=bhelgaas@google.com \
--cc=jonathanh@nvidia.com \
--cc=kwilczynski@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linux-tegra@vger.kernel.org \
--cc=lpieralisi@kernel.org \
--cc=mani@kernel.org \
--cc=robh@kernel.org \
--cc=shinichiro.kawasaki@wdc.com \
--cc=thierry.reding@gmail.com \
--cc=vidyas@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).