* [PATCH] PCI: tegra194: Handle errors in BPMP response
@ 2025-09-11 12:27 Niklas Cassel
2025-09-17 13:15 ` Jon Hunter
2025-09-20 15:31 ` Manivannan Sadhasivam
0 siblings, 2 replies; 4+ messages in thread
From: Niklas Cassel @ 2025-09-11 12:27 UTC (permalink / raw)
To: Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Thierry Reding,
Jonathan Hunter
Cc: Vidya Sagar, Shin'ichiro Kawasaki, Niklas Cassel, linux-pci,
linux-tegra
From: Vidya Sagar <vidyas@nvidia.com>
The return value from tegra_bpmp_transfer() indicates the success or
failure of the IPC transaction with BPMP. If the transaction
succeeded, we also need to check the actual command's result code.
If a host deasserts PERST without providing a refclock, enabling the PHY
(via a tegra_bpmp_transfer() call) will silently fail, however, because
we are lacking error handling, pex_ep_event_pex_rst_deassert() will still
set pcie->ep_state = EP_STATE_ENABLED.
Because of this, any succeeding PERST deassertion will incorrectly be a
no-op (because of the pcie->ep_state == EP_STATE_ENABLED check in
pex_ep_event_pex_rst_deassert()), even if the host does provide a refclock
during the succeeding PERST deassertion.
Add error handling to tegra_bpmp_transfer(), such that the pcie->ep_state
can not get out of sync with reality, which will incorrectly cause the
driver to think that it has been successfully initialized, which
incorrectly makes future calls to pex_ep_event_pex_rst_deassert() a no-op.
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
[cassel: improve commit log]
Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
drivers/pci/controller/dwc/pcie-tegra194.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
index 0c0734aa14b68..8c5c370dbba5e 100644
--- a/drivers/pci/controller/dwc/pcie-tegra194.c
+++ b/drivers/pci/controller/dwc/pcie-tegra194.c
@@ -1214,6 +1214,7 @@ static int tegra_pcie_bpmp_set_ctrl_state(struct tegra_pcie_dw *pcie,
struct mrq_uphy_response resp;
struct tegra_bpmp_message msg;
struct mrq_uphy_request req;
+ int err;
/*
* Controller-5 doesn't need to have its state set by BPMP-FW in
@@ -1236,7 +1237,13 @@ static int tegra_pcie_bpmp_set_ctrl_state(struct tegra_pcie_dw *pcie,
msg.rx.data = &resp;
msg.rx.size = sizeof(resp);
- return tegra_bpmp_transfer(pcie->bpmp, &msg);
+ err = tegra_bpmp_transfer(pcie->bpmp, &msg);
+ if (err)
+ return err;
+ if (msg.rx.ret)
+ return -EINVAL;
+
+ return 0;
}
static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
@@ -1245,6 +1252,7 @@ static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
struct mrq_uphy_response resp;
struct tegra_bpmp_message msg;
struct mrq_uphy_request req;
+ int err;
memset(&req, 0, sizeof(req));
memset(&resp, 0, sizeof(resp));
@@ -1264,7 +1272,13 @@ static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
msg.rx.data = &resp;
msg.rx.size = sizeof(resp);
- return tegra_bpmp_transfer(pcie->bpmp, &msg);
+ err = tegra_bpmp_transfer(pcie->bpmp, &msg);
+ if (err)
+ return err;
+ if (msg.rx.ret)
+ return -EINVAL;
+
+ return 0;
}
static void tegra_pcie_downstream_dev_to_D0(struct tegra_pcie_dw *pcie)
--
2.51.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] PCI: tegra194: Handle errors in BPMP response
2025-09-11 12:27 [PATCH] PCI: tegra194: Handle errors in BPMP response Niklas Cassel
@ 2025-09-17 13:15 ` Jon Hunter
2025-09-20 15:31 ` Manivannan Sadhasivam
1 sibling, 0 replies; 4+ messages in thread
From: Jon Hunter @ 2025-09-17 13:15 UTC (permalink / raw)
To: Niklas Cassel, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Thierry Reding
Cc: Vidya Sagar, Shin'ichiro Kawasaki, linux-pci, linux-tegra
On 11/09/2025 13:27, Niklas Cassel wrote:
> From: Vidya Sagar <vidyas@nvidia.com>
>
> The return value from tegra_bpmp_transfer() indicates the success or
> failure of the IPC transaction with BPMP. If the transaction
> succeeded, we also need to check the actual command's result code.
>
> If a host deasserts PERST without providing a refclock, enabling the PHY
> (via a tegra_bpmp_transfer() call) will silently fail, however, because
> we are lacking error handling, pex_ep_event_pex_rst_deassert() will still
> set pcie->ep_state = EP_STATE_ENABLED.
>
> Because of this, any succeeding PERST deassertion will incorrectly be a
> no-op (because of the pcie->ep_state == EP_STATE_ENABLED check in
> pex_ep_event_pex_rst_deassert()), even if the host does provide a refclock
> during the succeeding PERST deassertion.
>
> Add error handling to tegra_bpmp_transfer(), such that the pcie->ep_state
> can not get out of sync with reality, which will incorrectly cause the
> driver to think that it has been successfully initialized, which
> incorrectly makes future calls to pex_ep_event_pex_rst_deassert() a no-op.
>
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> [cassel: improve commit log]
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
> ---
> drivers/pci/controller/dwc/pcie-tegra194.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> index 0c0734aa14b68..8c5c370dbba5e 100644
> --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> @@ -1214,6 +1214,7 @@ static int tegra_pcie_bpmp_set_ctrl_state(struct tegra_pcie_dw *pcie,
> struct mrq_uphy_response resp;
> struct tegra_bpmp_message msg;
> struct mrq_uphy_request req;
> + int err;
>
> /*
> * Controller-5 doesn't need to have its state set by BPMP-FW in
> @@ -1236,7 +1237,13 @@ static int tegra_pcie_bpmp_set_ctrl_state(struct tegra_pcie_dw *pcie,
> msg.rx.data = &resp;
> msg.rx.size = sizeof(resp);
>
> - return tegra_bpmp_transfer(pcie->bpmp, &msg);
> + err = tegra_bpmp_transfer(pcie->bpmp, &msg);
> + if (err)
> + return err;
> + if (msg.rx.ret)
> + return -EINVAL;
> +
> + return 0;
> }
>
> static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
> @@ -1245,6 +1252,7 @@ static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
> struct mrq_uphy_response resp;
> struct tegra_bpmp_message msg;
> struct mrq_uphy_request req;
> + int err;
>
> memset(&req, 0, sizeof(req));
> memset(&resp, 0, sizeof(resp));
> @@ -1264,7 +1272,13 @@ static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
> msg.rx.data = &resp;
> msg.rx.size = sizeof(resp);
>
> - return tegra_bpmp_transfer(pcie->bpmp, &msg);
> + err = tegra_bpmp_transfer(pcie->bpmp, &msg);
> + if (err)
> + return err;
> + if (msg.rx.ret)
> + return -EINVAL;
> +
> + return 0;
> }
>
> static void tegra_pcie_downstream_dev_to_D0(struct tegra_pcie_dw *pcie)
Looks good to me!
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Cheers
Jon
--
nvpublic
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] PCI: tegra194: Handle errors in BPMP response
2025-09-11 12:27 [PATCH] PCI: tegra194: Handle errors in BPMP response Niklas Cassel
2025-09-17 13:15 ` Jon Hunter
@ 2025-09-20 15:31 ` Manivannan Sadhasivam
2025-09-22 13:36 ` Niklas Cassel
1 sibling, 1 reply; 4+ messages in thread
From: Manivannan Sadhasivam @ 2025-09-20 15:31 UTC (permalink / raw)
To: Niklas Cassel
Cc: Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
Bjorn Helgaas, Thierry Reding, Jonathan Hunter, Vidya Sagar,
Shin'ichiro Kawasaki, linux-pci, linux-tegra
On Thu, Sep 11, 2025 at 02:27:29PM +0200, Niklas Cassel wrote:
> From: Vidya Sagar <vidyas@nvidia.com>
>
> The return value from tegra_bpmp_transfer() indicates the success or
> failure of the IPC transaction with BPMP. If the transaction
> succeeded, we also need to check the actual command's result code.
>
> If a host deasserts PERST without providing a refclock, enabling the PHY
> (via a tegra_bpmp_transfer() call) will silently fail, however, because
> we are lacking error handling, pex_ep_event_pex_rst_deassert() will still
> set pcie->ep_state = EP_STATE_ENABLED.
>
How can a host deassert PERST# without providing refclk? As per the CEM spec
r4.0, sec 2.2, refclk should be active at least TPERST-CLK before PERST# is
deasserted.
So does this controller violate the spec? Even so, I don't know how an endpoint
could function if it relies on the host for refclk.
- Mani
> Because of this, any succeeding PERST deassertion will incorrectly be a
> no-op (because of the pcie->ep_state == EP_STATE_ENABLED check in
> pex_ep_event_pex_rst_deassert()), even if the host does provide a refclock
> during the succeeding PERST deassertion.
>
> Add error handling to tegra_bpmp_transfer(), such that the pcie->ep_state
> can not get out of sync with reality, which will incorrectly cause the
> driver to think that it has been successfully initialized, which
> incorrectly makes future calls to pex_ep_event_pex_rst_deassert() a no-op.
>
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> [cassel: improve commit log]
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
> ---
> drivers/pci/controller/dwc/pcie-tegra194.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> index 0c0734aa14b68..8c5c370dbba5e 100644
> --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> @@ -1214,6 +1214,7 @@ static int tegra_pcie_bpmp_set_ctrl_state(struct tegra_pcie_dw *pcie,
> struct mrq_uphy_response resp;
> struct tegra_bpmp_message msg;
> struct mrq_uphy_request req;
> + int err;
>
> /*
> * Controller-5 doesn't need to have its state set by BPMP-FW in
> @@ -1236,7 +1237,13 @@ static int tegra_pcie_bpmp_set_ctrl_state(struct tegra_pcie_dw *pcie,
> msg.rx.data = &resp;
> msg.rx.size = sizeof(resp);
>
> - return tegra_bpmp_transfer(pcie->bpmp, &msg);
> + err = tegra_bpmp_transfer(pcie->bpmp, &msg);
> + if (err)
> + return err;
> + if (msg.rx.ret)
> + return -EINVAL;
> +
> + return 0;
> }
>
> static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
> @@ -1245,6 +1252,7 @@ static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
> struct mrq_uphy_response resp;
> struct tegra_bpmp_message msg;
> struct mrq_uphy_request req;
> + int err;
>
> memset(&req, 0, sizeof(req));
> memset(&resp, 0, sizeof(resp));
> @@ -1264,7 +1272,13 @@ static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
> msg.rx.data = &resp;
> msg.rx.size = sizeof(resp);
>
> - return tegra_bpmp_transfer(pcie->bpmp, &msg);
> + err = tegra_bpmp_transfer(pcie->bpmp, &msg);
> + if (err)
> + return err;
> + if (msg.rx.ret)
> + return -EINVAL;
> +
> + return 0;
> }
>
> static void tegra_pcie_downstream_dev_to_D0(struct tegra_pcie_dw *pcie)
> --
> 2.51.0
>
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] PCI: tegra194: Handle errors in BPMP response
2025-09-20 15:31 ` Manivannan Sadhasivam
@ 2025-09-22 13:36 ` Niklas Cassel
0 siblings, 0 replies; 4+ messages in thread
From: Niklas Cassel @ 2025-09-22 13:36 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
Bjorn Helgaas, Thierry Reding, Jonathan Hunter, Vidya Sagar,
Shin'ichiro Kawasaki, linux-pci, linux-tegra
On Sat, Sep 20, 2025 at 09:01:25PM +0530, Manivannan Sadhasivam wrote:
> On Thu, Sep 11, 2025 at 02:27:29PM +0200, Niklas Cassel wrote:
> > From: Vidya Sagar <vidyas@nvidia.com>
> >
> > The return value from tegra_bpmp_transfer() indicates the success or
> > failure of the IPC transaction with BPMP. If the transaction
> > succeeded, we also need to check the actual command's result code.
> >
> > If a host deasserts PERST without providing a refclock, enabling the PHY
> > (via a tegra_bpmp_transfer() call) will silently fail, however, because
> > we are lacking error handling, pex_ep_event_pex_rst_deassert() will still
> > set pcie->ep_state = EP_STATE_ENABLED.
> >
>
> How can a host deassert PERST# without providing refclk? As per the CEM spec
> r4.0, sec 2.2, refclk should be active at least TPERST-CLK before PERST# is
> deasserted.
>
> So does this controller violate the spec? Even so, I don't know how an endpoint
> could function if it relies on the host for refclk.
Hello Mani,
I apologize for my poor commit message.
What happens is this:
I use a rock5b as host. I use a jetson orin nano (Tegra234 based board).
When uboot starts on rock5b, it initializes the PCIe root complex, this
asserts+deasserts PERST#, and I can see the PERST GPIO handler being
triggered on the jetson board.
However, for some unknown reason, the PHY init fails.
I have no idea why. My guess is that perhaps the uboot driver does not
provide the refclock for very long after deasserting PERST, before deasserting
PERST again and booting linux.
The problem is that the tegra PCIe endpoint driver has no error handling for
the PHY init (which is done using a tegra_bpmp_transfer() command to firmware).
So the tegra PCIe endpoint driver will incorrectly set pcie->ep_state to
EP_STATE_ENABLED.
So when linux eventually boots on the rock5b, and the DWC based PCIe driver
asserts+deasserts PERST, it will be a no-op, because the tegra driver will
think that the pcie->ep_state is already EP_STATE_ENABLED.
This means that the driver will not try to initialize the PHY during this
second PERST assert+deassert.
By simply adding error handling to tegra_bpmp_transfer(), when the first
PERST assert+deassert (make by uboot) fails, the pcie->ep_state will not
incorrectly be set to EP_STATE_ENABLED.
Thus, when the second PERST assert+deassert (made by the linux driver) is done,
we will try to initialize the PHY again. For some reason, the PHY init works
this second time. Why? I don't really know, but we know that the DWC based
driver has the proper sleeps before/after asserting PERST, and that it will
not disable the reflock suddently, because it is loading some other OS.
I will try to improve the commit message to be more specific in V2.
Kind regards,
Niklas
>
> > Because of this, any succeeding PERST deassertion will incorrectly be a
> > no-op (because of the pcie->ep_state == EP_STATE_ENABLED check in
> > pex_ep_event_pex_rst_deassert()), even if the host does provide a refclock
> > during the succeeding PERST deassertion.
> >
> > Add error handling to tegra_bpmp_transfer(), such that the pcie->ep_state
> > can not get out of sync with reality, which will incorrectly cause the
> > driver to think that it has been successfully initialized, which
> > incorrectly makes future calls to pex_ep_event_pex_rst_deassert() a no-op.
> >
> > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > [cassel: improve commit log]
> > Signed-off-by: Niklas Cassel <cassel@kernel.org>
> > ---
> > drivers/pci/controller/dwc/pcie-tegra194.c | 18 ++++++++++++++++--
> > 1 file changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> > index 0c0734aa14b68..8c5c370dbba5e 100644
> > --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> > +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> > @@ -1214,6 +1214,7 @@ static int tegra_pcie_bpmp_set_ctrl_state(struct tegra_pcie_dw *pcie,
> > struct mrq_uphy_response resp;
> > struct tegra_bpmp_message msg;
> > struct mrq_uphy_request req;
> > + int err;
> >
> > /*
> > * Controller-5 doesn't need to have its state set by BPMP-FW in
> > @@ -1236,7 +1237,13 @@ static int tegra_pcie_bpmp_set_ctrl_state(struct tegra_pcie_dw *pcie,
> > msg.rx.data = &resp;
> > msg.rx.size = sizeof(resp);
> >
> > - return tegra_bpmp_transfer(pcie->bpmp, &msg);
> > + err = tegra_bpmp_transfer(pcie->bpmp, &msg);
> > + if (err)
> > + return err;
> > + if (msg.rx.ret)
> > + return -EINVAL;
> > +
> > + return 0;
> > }
> >
> > static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
> > @@ -1245,6 +1252,7 @@ static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
> > struct mrq_uphy_response resp;
> > struct tegra_bpmp_message msg;
> > struct mrq_uphy_request req;
> > + int err;
> >
> > memset(&req, 0, sizeof(req));
> > memset(&resp, 0, sizeof(resp));
> > @@ -1264,7 +1272,13 @@ static int tegra_pcie_bpmp_set_pll_state(struct tegra_pcie_dw *pcie,
> > msg.rx.data = &resp;
> > msg.rx.size = sizeof(resp);
> >
> > - return tegra_bpmp_transfer(pcie->bpmp, &msg);
> > + err = tegra_bpmp_transfer(pcie->bpmp, &msg);
> > + if (err)
> > + return err;
> > + if (msg.rx.ret)
> > + return -EINVAL;
> > +
> > + return 0;
> > }
> >
> > static void tegra_pcie_downstream_dev_to_D0(struct tegra_pcie_dw *pcie)
> > --
> > 2.51.0
> >
>
> --
> மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-09-22 13:36 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-11 12:27 [PATCH] PCI: tegra194: Handle errors in BPMP response Niklas Cassel
2025-09-17 13:15 ` Jon Hunter
2025-09-20 15:31 ` Manivannan Sadhasivam
2025-09-22 13:36 ` Niklas Cassel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox