* [RFC PATCH v3 0/3] PCI: rockchip-host: Support quirky devices
From: Geraldo Nascimento @ 2025-06-10 19:05 UTC
To: linux-rockchip
Cc: Hugh Cole-Baker, Shawn Lin, Lorenzo Pieralisi, Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Heiko Stuebner, linux-pci, linux-arm-kernel, linux-kernel

Hi folks,

while I understand there are lots of already-working PCIe devices on
RK3399 there are also many quirky devices which fail link training and
refuse to enumerate. This RFC series is meant to alleviate this problem
and has been tested on my Rock Pi N10.

Note that with these patches, link will train for quirky devices but
with Gen1 only and only one lane (x1). I have separate patches for
improving to Gen2 and all four lanes (x4). They don't depend on this
fix however and since I predict the present patches are bound to be
controversial, I decided to send the quality improvements separately.

---
V2 -> V3: separated commit for reordering function as per Bjorn's suggestion
V1 -> V2: adjusted commit message to be more clear about change

Geraldo Nascimento (3):
  PCI: rockchip-host: reorder rockchip_pcie_set_vpcie()
  PCI: rockchip-host: Retry link training on failure without PERST#
  arm64: dts: rockchip: drop PCIe 3v3 always-on and boot-on

 .../dts/rockchip/rk3399pro-vmarc-som.dtsi   |   2 -
 drivers/pci/controller/pcie-rockchip-host.c | 141 +++++++++++-------
 2 files changed, 87 insertions(+), 56 deletions(-)

--
2.49.0

* [RFC PATCH v3 1/3] PCI: rockchip-host: reorder rockchip_pcie_set_vpcie()
From: Geraldo Nascimento @ 2025-06-10 19:05 UTC
To: linux-rockchip
Cc: Hugh Cole-Baker, Shawn Lin, Lorenzo Pieralisi, Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Heiko Stuebner, linux-pci, linux-arm-kernel, linux-kernel

rockchip_pcie_set_vpcie() is needed for re-enabling power regulators
after disabling them, if link training fails. This permits quirky
endpoint devices to complete link training, enumerate successfully on
the PCI bus and generally work with RK3399 PCIe.

Reorder the function - no functional change intended.

Signed-off-by: Geraldo Nascimento <geraldogabriel@gmail.com> --- drivers/pci/controller/pcie-rockchip-host.c | 94 ++++++++++----------- 1 file changed, 47 insertions(+), 47 deletions(-) diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c index b9e7a8710cf0..2a1071cd3241 100644 --- a/drivers/pci/controller/pcie-rockchip-host.c +++ b/drivers/pci/controller/pcie-rockchip-host.c @@ -284,6 +284,53 @@ static void rockchip_pcie_set_power_limit(struct rockchip_pcie *rockchip) rockchip_pcie_write(rockchip, status, PCIE_RC_CONFIG_DCR); } +static int rockchip_pcie_set_vpcie(struct rockchip_pcie *rockchip) +{ + struct device *dev = rockchip->dev; + int err; + + if (!IS_ERR(rockchip->vpcie12v)) { + err = regulator_enable(rockchip->vpcie12v); + if (err) { + dev_err(dev, "fail to enable vpcie12v regulator\n"); + goto err_out; + } + } + + if (!IS_ERR(rockchip->vpcie3v3)) { + err = regulator_enable(rockchip->vpcie3v3); + if (err) { + dev_err(dev, "fail to enable vpcie3v3 regulator\n"); + goto err_disable_12v; + } + } + + err = regulator_enable(rockchip->vpcie1v8); + if (err) { + dev_err(dev, "fail to enable vpcie1v8 regulator\n"); + goto err_disable_3v3; + } + + err = regulator_enable(rockchip->vpcie0v9); + if (err) { + dev_err(dev, "fail to enable vpcie0v9 regulator\n"); + goto err_disable_1v8; + } + + return 0; + +err_disable_1v8: + regulator_disable(rockchip->vpcie1v8); +err_disable_3v3: + if (!IS_ERR(rockchip->vpcie3v3)) + regulator_disable(rockchip->vpcie3v3); +err_disable_12v: + if (!IS_ERR(rockchip->vpcie12v)) + regulator_disable(rockchip->vpcie12v); +err_out: + return err; +} + /** * rockchip_pcie_host_init_port - Initialize hardware * @rockchip: PCIe port information @@ -613,53 +660,6 @@ static int rockchip_pcie_parse_host_dt(struct rockchip_pcie *rockchip) return 0; } -static int rockchip_pcie_set_vpcie(struct rockchip_pcie *rockchip) -{ - struct device *dev = rockchip->dev; - int err; - - if (!IS_ERR(rockchip->vpcie12v)) 
{ - err = regulator_enable(rockchip->vpcie12v); - if (err) { - dev_err(dev, "fail to enable vpcie12v regulator\n"); - goto err_out; - } - } - - if (!IS_ERR(rockchip->vpcie3v3)) { - err = regulator_enable(rockchip->vpcie3v3); - if (err) { - dev_err(dev, "fail to enable vpcie3v3 regulator\n"); - goto err_disable_12v; - } - } - - err = regulator_enable(rockchip->vpcie1v8); - if (err) { - dev_err(dev, "fail to enable vpcie1v8 regulator\n"); - goto err_disable_3v3; - } - - err = regulator_enable(rockchip->vpcie0v9); - if (err) { - dev_err(dev, "fail to enable vpcie0v9 regulator\n"); - goto err_disable_1v8; - } - - return 0; - -err_disable_1v8: - regulator_disable(rockchip->vpcie1v8); -err_disable_3v3: - if (!IS_ERR(rockchip->vpcie3v3)) - regulator_disable(rockchip->vpcie3v3); -err_disable_12v: - if (!IS_ERR(rockchip->vpcie12v)) - regulator_disable(rockchip->vpcie12v); -err_out: - return err; -} - static void rockchip_pcie_enable_interrupts(struct rockchip_pcie *rockchip) { rockchip_pcie_write(rockchip, (PCIE_CLIENT_INT_CLI << 16) & -- 2.49.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
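The enable-then-unwind shape of rockchip_pcie_set_vpcie() can be sketched outside the kernel as follows. This is a minimal userspace model, not the driver code: struct fake_regulator, struct fake_rails, fake_enable(), fake_disable() and fake_set_vpcie() are hypothetical stand-ins for the regulator API, and the `present` flag models the driver's !IS_ERR() checks on the optional 12V and 3V3 rails. The point it illustrates is that a failure on any rail leaves every earlier rail disabled again, in reverse order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel regulator handle (not struct regulator). */
struct fake_regulator {
	const char *name;
	bool present;	/* models !IS_ERR(): optional rails may be absent */
	bool fail;	/* force the enable call to fail, for demonstration */
	int enabled;	/* enable reference count */
};

static int fake_enable(struct fake_regulator *r)
{
	if (r->fail)
		return -1;
	r->enabled++;
	return 0;
}

static void fake_disable(struct fake_regulator *r)
{
	assert(r->enabled > 0);	/* unbalanced disable would be a bug */
	r->enabled--;
}

/* Hypothetical container for the four rails the driver handles. */
struct fake_rails {
	struct fake_regulator v12, v3v3, v1v8, v0v9;
};

/* Enable rails in order; on error, disable the ones already enabled, in reverse. */
static int fake_set_vpcie(struct fake_rails *p)
{
	int err = 0;

	if (p->v12.present) {
		err = fake_enable(&p->v12);
		if (err)
			goto err_out;
	}

	if (p->v3v3.present) {
		err = fake_enable(&p->v3v3);
		if (err)
			goto err_disable_12v;
	}

	err = fake_enable(&p->v1v8);
	if (err)
		goto err_disable_3v3;

	err = fake_enable(&p->v0v9);
	if (err)
		goto err_disable_1v8;

	return 0;

err_disable_1v8:
	fake_disable(&p->v1v8);
err_disable_3v3:
	if (p->v3v3.present)
		fake_disable(&p->v3v3);
err_disable_12v:
	if (p->v12.present)
		fake_disable(&p->v12);
err_out:
	return err;
}
```

Calling fake_set_vpcie() on a set of rails where the last one is forced to fail returns the error and leaves all enable counts at zero, which is the property the retry path in the next patch relies on when it calls the real function a second time.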
* [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST#
From: Geraldo Nascimento @ 2025-06-10 19:05 UTC
To: linux-rockchip
Cc: Hugh Cole-Baker, Shawn Lin, Lorenzo Pieralisi, Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Heiko Stuebner, linux-pci, linux-arm-kernel, linux-kernel

After almost 30 days of battling with RK3399 buggy PCIe on my Rock Pi
N10 through trial-and-error debugging, I finally got positive results
with enumeration on the PCI bus for both a Realtek 8111E NIC and a
Samsung PM981a SSD.

The NIC was connected to a M.2->PCIe x4 riser card and it would get
stuck on Polling.Compliance, without breaking electrical idle on the
Host RX side. The Samsung PM981a SSD is directly connected to M.2
connector and that SSD is known to be quirky (OEM... no support)
and non-functional on the RK3399 platform.

The Samsung SSD was even worse than the NIC - it would get stuck on
Detect.Active like a bricked card, even though it was fully functional
via USB adapter.

It seems both devices benefit from retrying Link Training if - big if
here - PERST# is not toggled during retry.

For retry to work, flow must be exactly as handled by present patch,
that is, we must cut power, disable the clocks, then re-enable
both clocks and power regulators and go through initialization
without touching PERST#. Then quirky devices are able to successfully
enumerate.

No functional change intended for already working devices.

Signed-off-by: Geraldo Nascimento <geraldogabriel@gmail.com> --- drivers/pci/controller/pcie-rockchip-host.c | 47 ++++++++++++++++++--- 1 file changed, 40 insertions(+), 7 deletions(-) diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c index 2a1071cd3241..67b3b379d277 100644 --- a/drivers/pci/controller/pcie-rockchip-host.c +++ b/drivers/pci/controller/pcie-rockchip-host.c @@ -338,11 +338,14 @@ static int rockchip_pcie_set_vpcie(struct rockchip_pcie *rockchip) static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip) { struct device *dev = rockchip->dev; - int err, i = MAX_LANE_NUM; + int err, i = MAX_LANE_NUM, is_reinit = 0; u32 status; - gpiod_set_value_cansleep(rockchip->perst_gpio, 0); + if (!is_reinit) { + gpiod_set_value_cansleep(rockchip->perst_gpio, 0); + } +reinit: err = rockchip_pcie_init_port(rockchip); if (err) return err; @@ -369,16 +372,46 @@ static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip) rockchip_pcie_write(rockchip, PCIE_CLIENT_LINK_TRAIN_ENABLE, PCIE_CLIENT_CONFIG); - msleep(PCIE_T_PVPERL_MS); - gpiod_set_value_cansleep(rockchip->perst_gpio, 1); - - msleep(PCIE_T_RRS_READY_MS); + if (!is_reinit) { + msleep(PCIE_T_PVPERL_MS); + gpiod_set_value_cansleep(rockchip->perst_gpio, 1); + msleep(PCIE_T_RRS_READY_MS); + } /* 500ms timeout value should be enough for Gen1/2 training */ err = readl_poll_timeout(rockchip->apb_base + PCIE_CLIENT_BASIC_STATUS1, status, PCIE_LINK_UP(status), 20, 500 * USEC_PER_MSEC); - if (err) { + + if (err && !is_reinit) { + while (i--) + phy_power_off(rockchip->phys[i]); + i = MAX_LANE_NUM; + while (i--) + phy_exit(rockchip->phys[i]); + i = MAX_LANE_NUM; + is_reinit = 1; + dev_dbg(dev, "Will reinit PCIe without toggling PERST#"); + if (!IS_ERR(rockchip->vpcie12v)) + regulator_disable(rockchip->vpcie12v); + if (!IS_ERR(rockchip->vpcie3v3)) + regulator_disable(rockchip->vpcie3v3); + regulator_disable(rockchip->vpcie1v8); + 
regulator_disable(rockchip->vpcie0v9); + rockchip_pcie_disable_clocks(rockchip); + err = rockchip_pcie_enable_clocks(rockchip); + if (err) + return err; + err = rockchip_pcie_set_vpcie(rockchip); + if (err) { + dev_err(dev, "failed to set vpcie regulator\n"); + rockchip_pcie_disable_clocks(rockchip); + return err; + } + goto reinit; + } + + else if (err) { dev_err(dev, "PCIe link training gen1 timeout!\n"); goto err_power_off_phy; } -- 2.49.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
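The control flow this patch adds can be sketched in userspace as follows. This is a minimal model under stated assumptions, not the driver: struct fake_host, fake_link_up() and fake_host_init_port() are hypothetical names, the link-training outcome is injected through `succeeds_on_attempt` rather than derived from any real device behaviour, and a loop stands in for the patch's `goto reinit`. It shows only the ordering described in the commit message: PERST# is asserted and released once, and a single retry follows a power cycle without touching PERST# again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one Root Port bring-up. */
struct fake_host {
	int succeeds_on_attempt;	/* 1-based attempt at which the link trains; 0 = never */
	int attempts;			/* link-training attempts made */
	int perst_asserts;		/* times PERST# was driven low */
	int perst_deasserts;		/* times PERST# was released */
	int power_cycles;		/* regulator + clock off/on cycles */
};

/* Injected training result; stands in for polling PCIE_LINK_UP(status). */
static bool fake_link_up(const struct fake_host *h)
{
	return h->succeeds_on_attempt && h->attempts >= h->succeeds_on_attempt;
}

static int fake_host_init_port(struct fake_host *h)
{
	bool is_reinit = false;

	assert(h);
	h->perst_asserts++;		/* models gpiod_set_value_cansleep(perst, 0) */

	for (;;) {
		h->attempts++;		/* models port init + enabling link training */

		if (!is_reinit)
			h->perst_deasserts++;	/* only the first pass releases PERST# */

		if (fake_link_up(h))
			return 0;

		if (is_reinit)
			return -1;	/* second failure: report the timeout */

		/*
		 * First failure: PHYs off, regulators off, clocks off, then
		 * clocks and regulators back on. PERST# is left untouched.
		 */
		h->power_cycles++;
		is_reinit = true;
	}
}
```

A host whose link trains on the second attempt ends up with one power cycle and exactly one PERST# assert and one deassert; a host that never trains returns an error after two attempts. Note that in the model, as in the patch, the initial PERST# assert sits before the retry point, so it runs only once regardless of the `is_reinit` check around it.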
* Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST# 2025-06-10 19:05 ` [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST# Geraldo Nascimento @ 2025-06-23 11:29 ` Manivannan Sadhasivam 2025-06-23 11:44 ` Geraldo Nascimento 2025-07-18 1:55 ` Shawn Lin 1 sibling, 1 reply; 12+ messages in thread From: Manivannan Sadhasivam @ 2025-06-23 11:29 UTC (permalink / raw) To: Geraldo Nascimento Cc: linux-rockchip, Hugh Cole-Baker, Shawn Lin, Lorenzo Pieralisi, Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Heiko Stuebner, linux-pci, linux-arm-kernel, linux-kernel On Tue, Jun 10, 2025 at 04:05:40PM -0300, Geraldo Nascimento wrote: > After almost 30 days of battling with RK3399 buggy PCIe on my Rock Pi > N10 through trial-and-error debugging, I finally got positive results > with enumeration on the PCI bus for both a Realtek 8111E NIC and a > Samsung PM981a SSD. > > The NIC was connected to a M.2->PCIe x4 riser card and it would get > stuck on Polling.Compliance, without breaking electrical idle on the > Host RX side. The Samsung PM981a SSD is directly connected to M.2 > connector and that SSD is known to be quirky (OEM... no support) > and non-functional on the RK3399 platform. > > The Samsung SSD was even worse than the NIC - it would get stuck on > Detect.Active like a bricked card, even though it was fully functional > via USB adapter. > > It seems both devices benefit from retrying Link Training if - big if > here - PERST# is not toggled during retry. > > For retry to work, flow must be exactly as handled by present patch, > that is, we must cut power, disable the clocks, then re-enable > both clocks and power regulators and go through initialization > without touching PERST#. Then quirky devices are able to sucessfully > enumerate. > This sounds weird. PERST# is just an indication to the device that the power and refclk are applied or going to be removed. 
The devices use PERST# to prepare for the power removal during assert
and start functioning after deassert.

It looks like the PERST# polarity is inverted in your case. Could you please
change the 'ep-gpios' polarity to GPIO_ACTIVE_LOW and see if it fixes the issue
without this patch?

If that didn't work, could you please drop the 'ep-gpios' property and check?

> No functional change intended for already working devices.
>
> Signed-off-by: Geraldo Nascimento <geraldogabriel@gmail.com>
> ---
>  drivers/pci/controller/pcie-rockchip-host.c | 47 ++++++++++++++++++---
>  1 file changed, 40 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
> index 2a1071cd3241..67b3b379d277 100644
> --- a/drivers/pci/controller/pcie-rockchip-host.c
> +++ b/drivers/pci/controller/pcie-rockchip-host.c
> @@ -338,11 +338,14 @@ static int rockchip_pcie_set_vpcie(struct rockchip_pcie *rockchip)
>  static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip)
>  {
>  	struct device *dev = rockchip->dev;
> -	int err, i = MAX_LANE_NUM;
> +	int err, i = MAX_LANE_NUM, is_reinit = 0;
>  	u32 status;
>
> -	gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
> +	if (!is_reinit) {
> +		gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
> +	}
>
> +reinit:

So this reinit part only skips the PERST# assert, but calls
rockchip_pcie_init_port() which resets the Root Port including PHY. I don't
think it is safe to do it if PERST# is wired.

- Mani

--
மணிவண்ணன் சதாசிவம்

* Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST# 2025-06-23 11:29 ` Manivannan Sadhasivam @ 2025-06-23 11:44 ` Geraldo Nascimento 2025-07-17 12:29 ` Manivannan Sadhasivam 0 siblings, 1 reply; 12+ messages in thread From: Geraldo Nascimento @ 2025-06-23 11:44 UTC (permalink / raw) To: Manivannan Sadhasivam Cc: linux-rockchip, Hugh Cole-Baker, Shawn Lin, Lorenzo Pieralisi, Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Heiko Stuebner, linux-pci, linux-arm-kernel, linux-kernel On Mon, Jun 23, 2025 at 05:29:46AM -0600, Manivannan Sadhasivam wrote: > On Tue, Jun 10, 2025 at 04:05:40PM -0300, Geraldo Nascimento wrote: > > After almost 30 days of battling with RK3399 buggy PCIe on my Rock Pi > > N10 through trial-and-error debugging, I finally got positive results > > with enumeration on the PCI bus for both a Realtek 8111E NIC and a > > Samsung PM981a SSD. > > > > The NIC was connected to a M.2->PCIe x4 riser card and it would get > > stuck on Polling.Compliance, without breaking electrical idle on the > > Host RX side. The Samsung PM981a SSD is directly connected to M.2 > > connector and that SSD is known to be quirky (OEM... no support) > > and non-functional on the RK3399 platform. > > > > The Samsung SSD was even worse than the NIC - it would get stuck on > > Detect.Active like a bricked card, even though it was fully functional > > via USB adapter. > > > > It seems both devices benefit from retrying Link Training if - big if > > here - PERST# is not toggled during retry. > > > > For retry to work, flow must be exactly as handled by present patch, > > that is, we must cut power, disable the clocks, then re-enable > > both clocks and power regulators and go through initialization > > without touching PERST#. Then quirky devices are able to sucessfully > > enumerate. > > > > This sounds weird. PERST# is just an indication to the device that the power and > refclk are applied or going to be removed. 
> The devices use PERST# to prepare
> for the power removal during assert and start functioning after deassert.

Hi Mani! Thank you for looking into this.

Yeah, tell me about it, it is beyond weird. I posted this RFC patch in
the hope that someone with access to a PCIe analyzer could have a deeper
look at what the heck is going on here - because it does work, but I
don't claim to understand how.

>
> It looks like the PERST# polarity is inverted in your case. Could you please
> change the 'ep-gpios' polarity to GPIO_ACTIVE_LOW and see if it fixes the issue
> without this patch?
>
> If that didn't work, could you please drop the 'ep-gpios' property and check?

Sorry to decline your request, but I assure you I have tried many
other combinations before reaching the present patch, including your
suggestion. It does nothing: it will not make an SSD that refuses to
work with RK3399 start working. Note that this isn't specific to my
board - RK3399 is infamous for being picky about devices.

>
> > No functional change intended for already working devices.
> > > > Signed-off-by: Geraldo Nascimento <geraldogabriel@gmail.com> > > --- > > drivers/pci/controller/pcie-rockchip-host.c | 47 ++++++++++++++++++--- > > 1 file changed, 40 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c > > index 2a1071cd3241..67b3b379d277 100644 > > --- a/drivers/pci/controller/pcie-rockchip-host.c > > +++ b/drivers/pci/controller/pcie-rockchip-host.c > > @@ -338,11 +338,14 @@ static int rockchip_pcie_set_vpcie(struct rockchip_pcie *rockchip) > > static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip) > > { > > struct device *dev = rockchip->dev; > > - int err, i = MAX_LANE_NUM; > > + int err, i = MAX_LANE_NUM, is_reinit = 0; > > u32 status; > > > > - gpiod_set_value_cansleep(rockchip->perst_gpio, 0); > > + if (!is_reinit) { > > + gpiod_set_value_cansleep(rockchip->perst_gpio, 0); > > + } > > > > +reinit: > > So this reinit part only skips the PERST# assert, but calls > rockchip_pcie_init_port() which resets the Root Port including PHY. I don't > think it is safe to do it if PERST# is wired. I don't understand, could you be a bit more verbose on why do you think this is dangerous? Thanks, Geraldo Nascimento > > - Mani > > -- > மணிவண்ணன் சதாசிவம் ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST# 2025-06-23 11:44 ` Geraldo Nascimento @ 2025-07-17 12:29 ` Manivannan Sadhasivam 2025-07-17 13:50 ` Geraldo Nascimento 0 siblings, 1 reply; 12+ messages in thread From: Manivannan Sadhasivam @ 2025-07-17 12:29 UTC (permalink / raw) To: Geraldo Nascimento, Shawn Lin Cc: linux-rockchip, Hugh Cole-Baker, Lorenzo Pieralisi, Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Heiko Stuebner, linux-pci, linux-arm-kernel, linux-kernel On Mon, Jun 23, 2025 at 08:44:49AM GMT, Geraldo Nascimento wrote: > On Mon, Jun 23, 2025 at 05:29:46AM -0600, Manivannan Sadhasivam wrote: > > On Tue, Jun 10, 2025 at 04:05:40PM -0300, Geraldo Nascimento wrote: > > > After almost 30 days of battling with RK3399 buggy PCIe on my Rock Pi > > > N10 through trial-and-error debugging, I finally got positive results > > > with enumeration on the PCI bus for both a Realtek 8111E NIC and a > > > Samsung PM981a SSD. > > > > > > The NIC was connected to a M.2->PCIe x4 riser card and it would get > > > stuck on Polling.Compliance, without breaking electrical idle on the > > > Host RX side. The Samsung PM981a SSD is directly connected to M.2 > > > connector and that SSD is known to be quirky (OEM... no support) > > > and non-functional on the RK3399 platform. > > > > > > The Samsung SSD was even worse than the NIC - it would get stuck on > > > Detect.Active like a bricked card, even though it was fully functional > > > via USB adapter. > > > > > > It seems both devices benefit from retrying Link Training if - big if > > > here - PERST# is not toggled during retry. > > > > > > For retry to work, flow must be exactly as handled by present patch, > > > that is, we must cut power, disable the clocks, then re-enable > > > both clocks and power regulators and go through initialization > > > without touching PERST#. Then quirky devices are able to sucessfully > > > enumerate. 
> > > > > > > This sounds weird. PERST# is just an indication to the device that the power and > > refclk are applied or going to be removed. The devices uses PERST# to prepare > > for the power removal during assert and start functioning after deassert. > > Hi Mani! Thank you for looking into this. > > Yeah, tell me about it, it is beyond weird. I posted RFC Patch in the > hopes someone with access to PCIe Analyzer could have deeper look > at what the heck is going on here - because it does work, but I don't > claim to understand how. > I was hoping that the Rockchip folks would chime in, but no reply from them so far. @Shawn: Could you please shed some light here? > > > > It looks like the PERST# polarity is inverted in your case. Could you please > > change the 'ep-gpios' polarity to GPIO_ACTIVE_LOW and see if it fixes the issue > > without this patch? > > > > If that didn't work, could you please drop the 'ep-gpios' property and check? > > Sorry to decline your request, but I assure you I have tried many > other combinations before reaching present patch, including your > suggestion. It will do nothing. It won't work, won't make SSD that > refuse to work with RK3399, working. Note that this isn't specific > to my board - RK3399 is infamous for being picky about devices. > > > > > > No functional change intended for already working devices. 
> > > > > > Signed-off-by: Geraldo Nascimento <geraldogabriel@gmail.com> > > > --- > > > drivers/pci/controller/pcie-rockchip-host.c | 47 ++++++++++++++++++--- > > > 1 file changed, 40 insertions(+), 7 deletions(-) > > > > > > diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c > > > index 2a1071cd3241..67b3b379d277 100644 > > > --- a/drivers/pci/controller/pcie-rockchip-host.c > > > +++ b/drivers/pci/controller/pcie-rockchip-host.c > > > @@ -338,11 +338,14 @@ static int rockchip_pcie_set_vpcie(struct rockchip_pcie *rockchip) > > > static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip) > > > { > > > struct device *dev = rockchip->dev; > > > - int err, i = MAX_LANE_NUM; > > > + int err, i = MAX_LANE_NUM, is_reinit = 0; > > > u32 status; > > > > > > - gpiod_set_value_cansleep(rockchip->perst_gpio, 0); > > > + if (!is_reinit) { > > > + gpiod_set_value_cansleep(rockchip->perst_gpio, 0); > > > + } > > > > > > +reinit: > > > > So this reinit part only skips the PERST# assert, but calls > > rockchip_pcie_init_port() which resets the Root Port including PHY. I don't > > think it is safe to do it if PERST# is wired. > > I don't understand, could you be a bit more verbose on why do you > think this is dangerous? > When the Root Port and PHY gets reset, there is a good chance that the refclk would also be cutoff. So if that happens without PERST# assert, then the device has no chance to clean its state machine. If the device gets its own refclk, then it is a different story, but we should not make assumptions. - Mani -- மணிவண்ணன் சதாசிவம் ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST# 2025-07-17 12:29 ` Manivannan Sadhasivam @ 2025-07-17 13:50 ` Geraldo Nascimento 0 siblings, 0 replies; 12+ messages in thread From: Geraldo Nascimento @ 2025-07-17 13:50 UTC (permalink / raw) To: Manivannan Sadhasivam Cc: Shawn Lin, linux-rockchip, Hugh Cole-Baker, Lorenzo Pieralisi, Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Heiko Stuebner, linux-pci, linux-arm-kernel, linux-kernel On Thu, Jul 17, 2025 at 05:59:32PM +0530, Manivannan Sadhasivam wrote: > On Mon, Jun 23, 2025 at 08:44:49AM GMT, Geraldo Nascimento wrote: > > On Mon, Jun 23, 2025 at 05:29:46AM -0600, Manivannan Sadhasivam wrote: > > > On Tue, Jun 10, 2025 at 04:05:40PM -0300, Geraldo Nascimento wrote: > > > > +reinit: > > > > > > So this reinit part only skips the PERST# assert, but calls > > > rockchip_pcie_init_port() which resets the Root Port including PHY. I don't > > > think it is safe to do it if PERST# is wired. > > > > I don't understand, could you be a bit more verbose on why do you > > think this is dangerous? > > > > When the Root Port and PHY gets reset, there is a good chance that the refclk > would also be cutoff. So if that happens without PERST# assert, then the device > has no chance to clean its state machine. If the device gets its own refclk, > then it is a different story, but we should not make assumptions. Hi Mani, thank you for your time spent looking into this! I'm not sure if the following information helps, but patch 2 of this series disables the PCIe 3.3V always-on/boot-on through DT. That was not incidental, and in fact it is required for patch 1 to work. Then, if you follow the proposed code change, you will see that power is effectively cut via disabling the power regulators, even before disabling the clocks. So there's effectively zero chance of corrupting the endpoint device state machine, since the device is power-cycled. 
While I understand we should not make assumptions in kernel work, and
that the patch is unmergeable in its current form (it's a goddamn hack),
it does empirically alleviate a very real, reported problem, that of
known-good working devices refusing to cooperate with Rockchip-IP PCIe.

I agree we should wait on Shawn Lin's feedback.

Thank you,
Geraldo Nascimento

>
> - Mani
>
> --
> மணிவண்ணன் சதாசிவம்

* Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST#
From: Shawn Lin @ 2025-07-18 1:55 UTC
To: Geraldo Nascimento
Cc: shawn.lin, Hugh Cole-Baker, Lorenzo Pieralisi, Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Heiko Stuebner, linux-pci, linux-arm-kernel, linux-kernel, linux-rockchip

Hi Geraldo,

On 2025/06/11 (Wednesday) 3:05, Geraldo Nascimento wrote:
> After almost 30 days of battling with RK3399 buggy PCIe on my Rock Pi
> N10 through trial-and-error debugging, I finally got positive results
> with enumeration on the PCI bus for both a Realtek 8111E NIC and a
> Samsung PM981a SSD.
>
> The NIC was connected to a M.2->PCIe x4 riser card and it would get
> stuck on Polling.Compliance, without breaking electrical idle on the
> Host RX side. The Samsung PM981a SSD is directly connected to M.2
> connector and that SSD is known to be quirky (OEM... no support)
> and non-functional on the RK3399 platform.
>
> The Samsung SSD was even worse than the NIC - it would get stuck on
> Detect.Active like a bricked card, even though it was fully functional
> via USB adapter.
>
> It seems both devices benefit from retrying Link Training if - big if
> here - PERST# is not toggled during retry.
>

I didn't see this error before, especially given that the RTL8111 NIC is
widely used by customers.

Could you help try this?

[1] apply your patch 3 first [2] apply below changes --- a/drivers/pci/controller/pcie-rockchip-host.c +++ b/drivers/pci/controller/pcie-rockchip-host.c @@ -314,7 +314,7 @@ static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip) rockchip_pcie_write(rockchip, PCIE_CLIENT_LINK_TRAIN_ENABLE, PCIE_CLIENT_CONFIG); - msleep(PCIE_T_PVPERL_MS); + msleep(500); gpiod_set_value_cansleep(rockchip->perst_gpio, 1); msleep(PCIE_RESET_CONFIG_WAIT_MS); @@ -322,7 +322,7 @@ static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip) /* 500ms timeout value should be enough for Gen1/2 training */ err = readl_poll_timeout(rockchip->apb_base + PCIE_CLIENT_BASIC_STATUS1, status, PCIE_LINK_UP(status), 20, - 500 * USEC_PER_MSEC); + 5000 * USEC_PER_MSEC); if (err) { dev_err(dev, "PCIe link training gen1 timeout!\n"); goto err_power_off_phy; @@ -951,6 +951,8 @@ static int rockchip_pcie_probe(struct platform_device *pdev) if (err) return err; + gpiod_set_value_cansleep(rockchip->perst_gpio, 0); + err = rockchip_pcie_set_vpcie(rockchip); if (err) { dev_err(dev, "failed to set vpcie regulator\n"); > For retry to work, flow must be exactly as handled by present patch, > that is, we must cut power, disable the clocks, then re-enable > both clocks and power regulators and go through initialization > without touching PERST#. Then quirky devices are able to sucessfully > enumerate. > > No functional change intended for already working devices. 
> > Signed-off-by: Geraldo Nascimento <geraldogabriel@gmail.com> > --- > drivers/pci/controller/pcie-rockchip-host.c | 47 ++++++++++++++++++--- > 1 file changed, 40 insertions(+), 7 deletions(-) > > diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c > index 2a1071cd3241..67b3b379d277 100644 > --- a/drivers/pci/controller/pcie-rockchip-host.c > +++ b/drivers/pci/controller/pcie-rockchip-host.c > @@ -338,11 +338,14 @@ static int rockchip_pcie_set_vpcie(struct rockchip_pcie *rockchip) > static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip) > { > struct device *dev = rockchip->dev; > - int err, i = MAX_LANE_NUM; > + int err, i = MAX_LANE_NUM, is_reinit = 0; > u32 status; > > - gpiod_set_value_cansleep(rockchip->perst_gpio, 0); > + if (!is_reinit) { > + gpiod_set_value_cansleep(rockchip->perst_gpio, 0); > + } > > +reinit: > err = rockchip_pcie_init_port(rockchip); > if (err) > return err; > @@ -369,16 +372,46 @@ static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip) > rockchip_pcie_write(rockchip, PCIE_CLIENT_LINK_TRAIN_ENABLE, > PCIE_CLIENT_CONFIG); > > - msleep(PCIE_T_PVPERL_MS); > - gpiod_set_value_cansleep(rockchip->perst_gpio, 1); > - > - msleep(PCIE_T_RRS_READY_MS); > + if (!is_reinit) { > + msleep(PCIE_T_PVPERL_MS); > + gpiod_set_value_cansleep(rockchip->perst_gpio, 1); > + msleep(PCIE_T_RRS_READY_MS); > + } > > /* 500ms timeout value should be enough for Gen1/2 training */ > err = readl_poll_timeout(rockchip->apb_base + PCIE_CLIENT_BASIC_STATUS1, > status, PCIE_LINK_UP(status), 20, > 500 * USEC_PER_MSEC); > - if (err) { > + > + if (err && !is_reinit) { > + while (i--) > + phy_power_off(rockchip->phys[i]); > + i = MAX_LANE_NUM; > + while (i--) > + phy_exit(rockchip->phys[i]); > + i = MAX_LANE_NUM; > + is_reinit = 1; > + dev_dbg(dev, "Will reinit PCIe without toggling PERST#"); > + if (!IS_ERR(rockchip->vpcie12v)) > + regulator_disable(rockchip->vpcie12v); > + if 
(!IS_ERR(rockchip->vpcie3v3)) > + regulator_disable(rockchip->vpcie3v3); > + regulator_disable(rockchip->vpcie1v8); > + regulator_disable(rockchip->vpcie0v9); > + rockchip_pcie_disable_clocks(rockchip); > + err = rockchip_pcie_enable_clocks(rockchip); > + if (err) > + return err; > + err = rockchip_pcie_set_vpcie(rockchip); > + if (err) { > + dev_err(dev, "failed to set vpcie regulator\n"); > + rockchip_pcie_disable_clocks(rockchip); > + return err; > + } > + goto reinit; > + } > + > + else if (err) { > dev_err(dev, "PCIe link training gen1 timeout!\n"); > goto err_power_off_phy; > } ^ permalink raw reply [flat|nested] 12+ messages in thread
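The effect of widening the link-up timeout from 500 ms to 5000 ms, as in the experiment suggested in this message, can be sketched with a simulated clock. This is a minimal userspace model in the spirit of readl_poll_timeout(), not its implementation: struct fake_link, fake_link_is_up() and fake_poll_timeout() are hypothetical, no real sleeping happens, and the 1.2 s training time used in the usage note is an invented figure chosen only to fall between the two timeouts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated link: comes up link_up_at_us after training starts. */
struct fake_link {
	uint64_t now_us;	/* simulated clock, in microseconds */
	uint64_t link_up_at_us;	/* simulated time at which the link trains */
};

/* Stands in for reading the status register and testing PCIE_LINK_UP(). */
static bool fake_link_is_up(const struct fake_link *l)
{
	return l->now_us >= l->link_up_at_us;
}

/*
 * Check the condition every sleep_us until it holds or timeout_us has
 * elapsed. Returns 0 on success, -ETIMEDOUT otherwise.
 */
static int fake_poll_timeout(struct fake_link *l, uint64_t sleep_us,
			     uint64_t timeout_us)
{
	uint64_t deadline = l->now_us + timeout_us;

	assert(sleep_us > 0);	/* a zero step would never advance the clock */

	for (;;) {
		if (fake_link_is_up(l))
			return 0;
		if (l->now_us >= deadline)
			return -ETIMEDOUT;
		l->now_us += sleep_us;	/* stands in for sleeping between reads */
	}
}
```

With a link that needs 1,200,000 us to train, polling every 20 us with a 500 * 1000 us timeout returns -ETIMEDOUT, while the same poll with a 5000 * 1000 us timeout returns 0. Whether the devices discussed in this thread actually train given more time is exactly what the requested test is meant to find out; the model does not answer that.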
* Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST#
From: Geraldo Nascimento @ 2025-07-18 3:33 UTC
To: Shawn Lin
Cc: Hugh Cole-Baker, Lorenzo Pieralisi, Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Heiko Stuebner, linux-pci, linux-arm-kernel, linux-kernel, linux-rockchip

On Fri, Jul 18, 2025 at 09:55:42AM +0800, Shawn Lin wrote:
> Hi Geraldo,
>
> On 2025/06/11 (Wednesday) 3:05, Geraldo Nascimento wrote:
> > After almost 30 days of battling with RK3399 buggy PCIe on my Rock Pi
> > N10 through trial-and-error debugging, I finally got positive results
> > with enumeration on the PCI bus for both a Realtek 8111E NIC and a
> > Samsung PM981a SSD.
> >
> > The NIC was connected to a M.2->PCIe x4 riser card and it would get
> > stuck on Polling.Compliance, without breaking electrical idle on the
> > Host RX side. The Samsung PM981a SSD is directly connected to M.2
> > connector and that SSD is known to be quirky (OEM... no support)
> > and non-functional on the RK3399 platform.
> >
> > The Samsung SSD was even worse than the NIC - it would get stuck on
> > Detect.Active like a bricked card, even though it was fully functional
> > via USB adapter.
> >
> > It seems both devices benefit from retrying Link Training if - big if
> > here - PERST# is not toggled during retry.
> >
>
> I didn't see this error before, especially given that the RTL8111 NIC is
> widely used by customers.

Hi Shawn, great to hear from you!

Notice that my board exposes PCIe only via an NVMe connector, and not
directly via a proper PCIe connector, so it is necessary for me to
adapt with an inexpensive riser card that exposes a proper PCIe connector.

I say this because while I don't doubt that the RTL8111 NIC works out-of-the-box for boards that directly expose PCIe connector, the combination of riser card plus NIC has a similar effect - though not entirely equal, as described above - of connecting known good SSDs that simply refuse to work with Rockchip-IP PCIe. I admit that patch 1 looks a little crazy, but is has the effect of enabling use of presently non-working devices or combination of devices on this IP, at least on the board I have access to. > > Could you help tried this? > [1] apply your patch 3 first Sure, I'm always open for testing, but could you clarify the patch 3 part? AFAIK this series of mine only has 2 patches, so I'm a little confused about exactly which patch to apply as a preliminary step. Also, since you're asking me to test some code, I think it is only fair if I ask you to test my code, too. It shouldn't be too hard for you to find a otherwise working NVMe SSD that refuses to complete link training with current code. Connect this SSD please to a RK3399 board and let us know if my proposed code change does anything to ameliorate the long-standing issue of SSD that refuses to cooperate. Thank you, Geraldo Nascimento ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST#
  2025-07-18  3:33 ` Geraldo Nascimento
@ 2025-07-18  3:46 ` Shawn Lin
  2025-07-18 17:06 ` Geraldo Nascimento
  0 siblings, 1 reply; 12+ messages in thread
From: Shawn Lin @ 2025-07-18  3:46 UTC
To: Geraldo Nascimento
Cc: shawn.lin, Hugh Cole-Baker, Lorenzo Pieralisi, Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Heiko Stuebner, linux-pci, linux-arm-kernel, linux-kernel, linux-rockchip

On Fri, 2025/07/18 11:33, Geraldo Nascimento wrote:
> On Fri, Jul 18, 2025 at 09:55:42AM +0800, Shawn Lin wrote:
>> Hi Geraldo,
>>
>> On Wed, 2025/06/11 3:05, Geraldo Nascimento wrote:
>>> After almost 30 days of battling with RK3399 buggy PCIe on my Rock Pi
>>> N10 through trial-and-error debugging, I finally got positive results
>>> with enumeration on the PCI bus for both a Realtek 8111E NIC and a
>>> Samsung PM981a SSD.
>>>
>>> The NIC was connected to a M.2->PCIe x4 riser card and it would get
>>> stuck on Polling.Compliance, without breaking electrical idle on the
>>> Host RX side. The Samsung PM981a SSD is directly connected to M.2
>>> connector and that SSD is known to be quirky (OEM... no support)
>>> and non-functional on the RK3399 platform.
>>>
>>> The Samsung SSD was even worse than the NIC - it would get stuck on
>>> Detect.Active like a bricked card, even though it was fully functional
>>> via USB adapter.
>>>
>>> It seems both devices benefit from retrying Link Training if - big if
>>> here - PERST# is not toggled during retry.
>>>
>>
>> I didn't see this error before especially given RTL8111 NIC is widely
>> used by customers.
>
> Hi Shawn, great to hear from you!
>
> Notice that my board exposes PCIe only via an NVMe connector, and not
> directly via a proper PCIe connector, so it is necessary for me to
> adapt with an inexpensive riser card that exposes a proper PCIe connector.
>
> I say this because while I don't doubt that the RTL8111 NIC works
> out-of-the-box for boards that directly expose a PCIe connector, the
> combination of riser card plus NIC has a similar effect - though not
> entirely equal, as described above - to connecting known good SSDs
> that simply refuse to work with Rockchip-IP PCIe.
>
> I admit that patch 1 looks a little crazy, but it has the effect of
> enabling use of presently non-working devices or combinations of devices
> on this IP, at least on the board I have access to.
>
>>
>> Could you help try this?
>> [1] apply your patch 3 first
>
> Sure, I'm always open for testing, but could you clarify the patch 3
> part? AFAIK this series of mine only has 2 patches, so I'm a little
> confused about exactly which patch to apply as a preliminary step.

Patch 3 refers to "arm64: dts: rockchip: drop PCIe 3v3 always-on and
boot-on", which lets the kernel fully control the power in case firmware
enabled it in advance.

>
> Also, since you're asking me to test some code, I think it is only fair
> if I ask you to test my code, too. It shouldn't be too hard for you to
> find an otherwise working NVMe SSD that refuses to complete link training
> with current code. Please connect this SSD to a RK3399 board and let us
> know if my proposed code change does anything to ameliorate the
> long-standing issue of SSDs that refuse to cooperate.

Sure, I don't have a Samsung PM981a SSD now, but I could test all my
SSDs to see whether I can find one that won't work.

>
> Thank you,
> Geraldo Nascimento
>
* Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST#
  2025-07-18  3:46 ` Shawn Lin
@ 2025-07-18 17:06 ` Geraldo Nascimento
  0 siblings, 0 replies; 12+ messages in thread
From: Geraldo Nascimento @ 2025-07-18 17:06 UTC
To: Shawn Lin
Cc: Hugh Cole-Baker, Lorenzo Pieralisi, Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Heiko Stuebner, linux-pci, linux-arm-kernel, linux-kernel, linux-rockchip

On Fri, Jul 18, 2025 at 11:46:33AM +0800, Shawn Lin wrote:
> On Fri, 2025/07/18 11:33, Geraldo Nascimento wrote:
> > On Fri, Jul 18, 2025 at 09:55:42AM +0800, Shawn Lin wrote:
> >> Could you help try this?
> >> [1] apply your patch 3 first
> >
> > Sure, I'm always open for testing, but could you clarify the patch 3
> > part? AFAIK this series of mine only has 2 patches, so I'm a little
> > confused about exactly which patch to apply as a preliminary step.
>
> Patch 3 refers to "arm64: dts: rockchip: drop PCIe 3v3 always-on and
> boot-on", which lets the kernel fully control the power in case firmware
> enabled it in advance.

Hi Shawn,

I tested your patch but unfortunately it does not work: the PM981a SSD
"plays dead" and 2.5 GT/s training never completes, even with the
bigger timeout.

I hope you get the chance to test my patch soon, because once you share
your results there could be two possible scenarios:

1) Patch does not alleviate the problem for you:

If this is the case, then there's little I can do further and this
becomes a wild goose chase, so no chance of upstreaming anything and
I'll just move on to more useful work and leave everybody else to do
their useful work too.

2) Patch works and a previously non-working SSD is now working:

In this case there's something serious going on and it is our mission
to find a way to correctly upstream a fix.

Thanks,
Geraldo Nascimento
* [RFC PATCH v3 3/3] arm64: dts: rockchip: drop PCIe 3v3 always-on and boot-on
  2025-06-10 19:05 [RFC PATCH v3 0/3] PCI: rockchip-host: Support quirky devices Geraldo Nascimento
  2025-06-10 19:05 ` [RFC PATCH v3 1/3] PCI: rockchip-host: reorder rockchip_pcie_set_vpcie() Geraldo Nascimento
  2025-06-10 19:05 ` [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST# Geraldo Nascimento
@ 2025-06-10 19:05 ` Geraldo Nascimento
  2 siblings, 0 replies; 12+ messages in thread
From: Geraldo Nascimento @ 2025-06-10 19:05 UTC
To: linux-rockchip
Cc: Hugh Cole-Baker, Shawn Lin, Lorenzo Pieralisi, Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas, Heiko Stuebner, linux-pci, linux-arm-kernel, linux-kernel

Example commit dropping the regulator always-on/boot-on declarations.
This is needed so that quirky devices known not to work on RK3399 are
able to enumerate on the second try, without assertion/deassertion of
the PERST# sideband PCIe reset signal.

One example only, to avoid patch-bomb.

Signed-off-by: Geraldo Nascimento <geraldogabriel@gmail.com>
---
 arch/arm64/boot/dts/rockchip/rk3399pro-vmarc-som.dtsi | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399pro-vmarc-som.dtsi b/arch/arm64/boot/dts/rockchip/rk3399pro-vmarc-som.dtsi
index 8ce7cee92af0..d31fd3d34cda 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399pro-vmarc-som.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399pro-vmarc-som.dtsi
@@ -25,8 +25,6 @@ vcc3v3_pcie: regulator-vcc-pcie {
 		pinctrl-names = "default";
 		pinctrl-0 = <&pcie_pwr>;
 		regulator-name = "vcc3v3_pcie";
-		regulator-always-on;
-		regulator-boot-on;
 		vin-supply = <&vcc5v0_sys>;
 	};
 
-- 
2.49.0
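A plausible reading of why the always-on declaration matters for the retry path is that a regulator pinned on by its constraints stays enabled even when its consumer drops the last reference, so the "disable, then re-enable" step would not actually cut power. The sketch below models that idea in plain user-space C. It is an assumption-laden illustration, not the regulator framework: `toy_regulator` and the `toy_*` helpers are hypothetical names, and the real core has considerably more state.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for one supply rail. */
struct toy_regulator {
	bool always_on;  /* mirrors a regulator-always-on constraint */
	bool enabled;    /* whether the rail is currently powered */
	int use_count;   /* outstanding enable requests */
};

static void toy_regulator_enable(struct toy_regulator *r)
{
	r->use_count++;
	r->enabled = true;
}

static void toy_regulator_disable(struct toy_regulator *r)
{
	if (r->use_count > 0)
		r->use_count--;

	/* An always-on rail keeps running even with no users left. */
	if (r->use_count == 0 && !r->always_on)
		r->enabled = false;
}

/*
 * Disable then re-enable the rail, as the retry path does.
 * Returns true only if the rail really lost power in between.
 */
static bool toy_power_cycle(struct toy_regulator *r)
{
	bool dropped;

	toy_regulator_disable(r);
	dropped = !r->enabled;
	toy_regulator_enable(r);

	return dropped;
}
```

In this model a rail with `always_on` set reports that power never dropped, while the same sequence on a rail without the flag does drop it. That is consistent with the commit message's claim that the declarations have to go before the second attempt can help.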