linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Niklas Cassel <cassel@kernel.org>
To: Manivannan Sadhasivam <mani@kernel.org>
Cc: manivannan.sadhasivam@oss.qualcomm.com,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Mahesh J Salgaonkar" <mahesh@linux.ibm.com>,
	"Oliver O'Halloran" <oohall@gmail.com>,
	"Will Deacon" <will@kernel.org>,
	"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
	"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
	"Rob Herring" <robh@kernel.org>,
	"Heiko Stuebner" <heiko@sntech.de>,
	"Philipp Zabel" <p.zabel@pengutronix.de>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org,
	linux-arm-msm@vger.kernel.org,
	linux-rockchip@lists.infradead.org,
	"Wilfred Mallawa" <wilfred.mallawa@wdc.com>,
	"Krishna Chaitanya Chundru" <krishna.chundru@oss.qualcomm.com>,
	"Lukas Wunner" <lukas@wunner.de>
Subject: Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way
Date: Thu, 4 Sep 2025 16:03:39 +0200	[thread overview]
Message-ID: <aLmcO8ukT-CDZMuT@ryzen> (raw)
In-Reply-To: <lakgphb7ym3cybwmpdqyipzi4tlkwbfijzhd4r6hvhho3pc7iu@6ludgw6wqkjh>

Hello Mani,

On Fri, Aug 29, 2025 at 09:44:08PM +0530, Manivannan Sadhasivam wrote:
> On Fri, Aug 15, 2025 at 11:07:42AM GMT, Niklas Cassel wrote:

(snip)

> > > > > ## On EP side:
> > > > > # echo 0 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start && \
> > > > >   sleep 0.1 && echo 1 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start
> > > > > 
> > > > > Basically all tests timeout
> > > > > # FAILED: 1 / 16 tests passed.
> > > > > 
> > > > > Which is the same as before this patch series.
> > > 
> > > This is kind of expected since the pci_endpoint_test driver doesn't have the AER
> > > err_handlers defined.
> > 
> > I see.
> > Would be nice if we could add them then, so that we can verify that this
> > series is working as intended.

(snip)

> Ok, thanks for the logs. I guess what is happening here is that we are not
> saving/restoring the config space of the devices under the Root Port if linkdown
> is happens. TBH, we cannot do that from the PCI core since once linkdown
> happens, we cannot access any devices underneath the Root Port. But if
> err_handlers are available for drivers for all devices, they could do something
> smart like below:
> 
> diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
> index c4e5e2c977be..9aabf1fe902e 100644
> --- a/drivers/misc/pci_endpoint_test.c
> +++ b/drivers/misc/pci_endpoint_test.c
> @@ -989,6 +989,8 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
>  
>         pci_set_drvdata(pdev, test);
>  
> +       pci_save_state(pdev);
> +
>         id = ida_alloc(&pci_endpoint_test_ida, GFP_KERNEL);
>         if (id < 0) {
>                 ret = id;
> @@ -1140,12 +1142,31 @@ static const struct pci_device_id pci_endpoint_test_tbl[] = {
>  };
>  MODULE_DEVICE_TABLE(pci, pci_endpoint_test_tbl);
>  
> +static pci_ers_result_t pci_endpoint_test_error_detected(struct pci_dev *pdev,
> +                                              pci_channel_state_t state)
> +{
> +       return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +static pci_ers_result_t pci_endpoint_test_slot_reset(struct pci_dev *pdev)
> +{
> +       pci_restore_state(pdev);
> +
> +       return PCI_ERS_RESULT_RECOVERED;
> +}
> +
> +static const struct pci_error_handlers pci_endpoint_test_err_handler = {
> +       .error_detected = pci_endpoint_test_error_detected,
> +       .slot_reset = pci_endpoint_test_slot_reset,
> +};
> +
>  static struct pci_driver pci_endpoint_test_driver = {
>         .name           = DRV_MODULE_NAME,
>         .id_table       = pci_endpoint_test_tbl,
>         .probe          = pci_endpoint_test_probe,
>         .remove         = pci_endpoint_test_remove,
>         .sriov_configure = pci_sriov_configure_simple,
> +       .err_handler    = &pci_endpoint_test_err_handler,
>  };
>  module_pci_driver(pci_endpoint_test_driver);
> 
> This essentially saves the good known config space during probe and restores it
> during the slot_reset callback. Ofc, the state would've been overwritten if
> suspend/resume happens in-between, but the point I'm making is that unless all
> device drivers restore their known config space, devices cannot be resumed
> properly post linkdown recovery.
> 
> I can add a patch based on the above diff in next revision if that helps. Right
> now, I do not have access to my endpoint test setup. So can't test anything.

I tested your patch series + your suggested change above, and after a:

## On EP side:
# echo 0 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start && \
  sleep 0.1 && echo 1 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start

Instead of:

# FAILED: 1 / 16 tests passed.

I now get:
# FAILED: 7 / 16 tests passed.

Test cases 1-7 now passes (the test cases related to BARs),
all other test cases still fail:

# /pcitest 
TAP version 13
1..16
# Starting 16 tests from 9 test cases.
#  RUN           pci_ep_bar.BAR0.BAR_TEST ...
#            OK  pci_ep_bar.BAR0.BAR_TEST
ok 1 pci_ep_bar.BAR0.BAR_TEST
#  RUN           pci_ep_bar.BAR1.BAR_TEST ...
#            OK  pci_ep_bar.BAR1.BAR_TEST
ok 2 pci_ep_bar.BAR1.BAR_TEST
#  RUN           pci_ep_bar.BAR2.BAR_TEST ...
#            OK  pci_ep_bar.BAR2.BAR_TEST
ok 3 pci_ep_bar.BAR2.BAR_TEST
#  RUN           pci_ep_bar.BAR3.BAR_TEST ...
#            OK  pci_ep_bar.BAR3.BAR_TEST
ok 4 pci_ep_bar.BAR3.BAR_TEST
#  RUN           pci_ep_bar.BAR4.BAR_TEST ...
#      SKIP      BAR is disabled
#            OK  pci_ep_bar.BAR4.BAR_TEST
ok 5 pci_ep_bar.BAR4.BAR_TEST # SKIP BAR is disabled
#  RUN           pci_ep_bar.BAR5.BAR_TEST ...
#            OK  pci_ep_bar.BAR5.BAR_TEST
ok 6 pci_ep_bar.BAR5.BAR_TEST
#  RUN           pci_ep_basic.CONSECUTIVE_BAR_TEST ...
#            OK  pci_ep_basic.CONSECUTIVE_BAR_TEST
ok 7 pci_ep_basic.CONSECUTIVE_BAR_TEST
#  RUN           pci_ep_basic.LEGACY_IRQ_TEST ...
# pci_endpoint_test.c:106:LEGACY_IRQ_TEST:Expected 0 (0) == ret (-110)
# pci_endpoint_test.c:106:LEGACY_IRQ_TEST:Test failed for Legacy IRQ
# LEGACY_IRQ_TEST: Test failed
#          FAIL  pci_ep_basic.LEGACY_IRQ_TEST
not ok 8 pci_ep_basic.LEGACY_IRQ_TEST
#  RUN           pci_ep_basic.MSI_TEST ...
# pci_endpoint_test.c:118:MSI_TEST:Expected 0 (0) == ret (-110)
# pci_endpoint_test.c:118:MSI_TEST:Test failed for MSI1
# pci_endpoint_test.c:118:MSI_TEST:Expected 0 (0) == ret (-110)
# pci_endpoint_test.c:118:MSI_TEST:Test failed for MSI2
# pci_endpoint_test.c:118:MSI_TEST:Expected 0 (0) == ret (-110)
# pci_endpoint_test.c:118:MSI_TEST:Test failed for MSI3
...


I think I know the reason.. you save the state before the IRQs have been allocated.

Perhaps we need to save the state after enabling IRQs?

I tried this patch on top of your patch:
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -851,6 +851,8 @@ static int pci_endpoint_test_set_irq(struct pci_endpoint_test *test,
                return ret;
        }
 
+       pci_save_state(pdev);
+
        return 0;
 }


But still:
# FAILED: 7 / 16 tests passed.

So... apparently that did not help...

I tried with the following change as well (on top of my patch above):

+static pci_ers_result_t pci_endpoint_test_slot_reset(struct pci_dev *pdev)
+{
+       struct pci_endpoint_test *test = pci_get_drvdata(pdev);
+       int irq_type = test->irq_type;
+
+       pci_restore_state(pdev);
+
+       if (irq_type != PCITEST_IRQ_TYPE_UNDEFINED) {
+               pci_endpoint_test_clear_irq(test);
+               pci_endpoint_test_set_irq(test, irq_type);
+       }
+
+       return PCI_ERS_RESULT_RECOVERED;
+}

But still only:
# FAILED: 7 / 16 tests passed.

Do you have any suggestions?


Kind regards,
Niklas


  reply	other threads:[~2025-09-04 18:32 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-15 14:21 [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` [PATCH v6 1/4] PCI/ERR: " Manivannan Sadhasivam via B4 Relay
2025-07-17 18:28   ` [PATCH v6 1/4] PCI/ERR: Add support for resetting the Root Ports in a platform specific wayy Frank Li
2025-07-15 14:21 ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports Manivannan Sadhasivam via B4 Relay
2025-07-17 18:31   ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Portsy Frank Li
2025-08-28 20:25   ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports Brian Norris
     [not found]     ` <aLFmSFe5iyYDrIjt@wunner.de>
2025-08-29 23:58       ` Brian Norris
2025-07-15 14:21 ` [PATCH v6 3/4] PCI: qcom: Add support for resetting the Root Port due to link down event Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` [PATCH v6 4/4] PCI: dw-rockchip: Add support to reset Root Port upon " Manivannan Sadhasivam via B4 Relay
2025-07-18  3:58 ` [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way Krishna Chaitanya Chundru
2025-07-18 10:28 ` Niklas Cassel
2025-07-18 10:39   ` Niklas Cassel
2025-07-24  5:30     ` Manivannan Sadhasivam
2025-08-15  9:07       ` Niklas Cassel
2025-08-29 16:14         ` Manivannan Sadhasivam
2025-09-04 14:03           ` Niklas Cassel [this message]
2025-07-24  9:28 ` Hongxing Zhu
2025-08-28 20:01 ` Brian Norris
2025-08-29 13:56   ` Manivannan Sadhasivam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aLmcO8ukT-CDZMuT@ryzen \
    --to=cassel@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=heiko@sntech.de \
    --cc=krishna.chundru@oss.qualcomm.com \
    --cc=kwilczynski@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rockchip@lists.infradead.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lpieralisi@kernel.org \
    --cc=lukas@wunner.de \
    --cc=mahesh@linux.ibm.com \
    --cc=mani@kernel.org \
    --cc=manivannan.sadhasivam@oss.qualcomm.com \
    --cc=oohall@gmail.com \
    --cc=p.zabel@pengutronix.de \
    --cc=robh@kernel.org \
    --cc=wilfred.mallawa@wdc.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).