* [PATCH v2 net] qede: fix firmware halt over suspend and resume
@ 2023-08-09 13:43 Manish Chopra
2023-08-10 18:02 ` Simon Horman
2023-08-11 0:47 ` Jakub Kicinski
0 siblings, 2 replies; 7+ messages in thread
From: Manish Chopra @ 2023-08-09 13:43 UTC
To: kuba
Cc: netdev, aelior, palok, njavali, skashyap, jmeneghi, yuval.mintz,
skalluru, pabeni, edumazet, horms, David Miller
While performing certain power-off sequences, PCI drivers are
called to suspend and resume their underlying devices through the
PCI PM (power management) interface. However, this NIC hardware
does not support PCI PM suspend/resume operations, so a system-wide
suspend/resume leaves the MFW (management firmware) in a bad state,
which causes various follow-up errors in the driver when it
communicates with the device/firmware afterwards.

To fix this, the driver implements a PCI PM suspend handler that
explicitly reports the operation as unsupported to the PCI subsystem,
thus preventing the system from entering suspended/standby mode.

Fixes: 2950219d87b0 ("qede: Add basic network device support")
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
---
V1->V2:
* Replace SIMPLE_DEV_PM_OPS with DEFINE_SIMPLE_DEV_PM_OPS
---
drivers/net/ethernet/qlogic/qede/qede_main.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index d57e52a97f85..18ae7af1764c 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -177,6 +177,18 @@ static int qede_sriov_configure(struct pci_dev *pdev, int num_vfs_param)
 }
 #endif
 
+static int __maybe_unused qede_suspend(struct device *dev)
+{
+	if (!dev)
+		return -ENODEV;
+
+	dev_info(dev, "Device does not support suspend operation\n");
+
+	return -EOPNOTSUPP;
+}
+
+static DEFINE_SIMPLE_DEV_PM_OPS(qede_pm_ops, qede_suspend, NULL);
+
 static const struct pci_error_handlers qede_err_handler = {
 	.error_detected = qede_io_error_detected,
 };
@@ -191,6 +203,7 @@ static struct pci_driver qede_pci_driver = {
 	.sriov_configure = qede_sriov_configure,
 #endif
 	.err_handler = &qede_err_handler,
+	.driver.pm = &qede_pm_ops,
 };
 
 static struct qed_eth_cb_ops qede_ll_ops = {
--
2.27.0
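
For readers unfamiliar with the mechanism the commit message relies on, here is a minimal userspace model of it, not kernel code. It assumes a simplified PM core that calls each device's suspend callback in order and, on the first error, resumes whatever it already suspended and aborts the transition. All names (struct mock_dev, mock_system_suspend, ok_suspend, refusing_suspend) are hypothetical and exist only for this illustration.

```c
/* Userspace model only: none of these types or functions are the kernel's. */
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

struct mock_dev {
	const char *name;
	int (*suspend)(struct mock_dev *dev);	/* NULL means nothing to do */
	int suspended;
};

/* A device that suspends without complaint. */
static int ok_suspend(struct mock_dev *dev)
{
	(void)dev;
	return 0;
}

/* Mirrors the idea of the patch: refuse suspend instead of attempting it. */
static int refusing_suspend(struct mock_dev *dev)
{
	printf("%s: Device does not support suspend operation\n", dev->name);
	return -EOPNOTSUPP;
}

/*
 * Try to suspend every device in order. On the first error, mark the
 * devices already suspended as resumed again and return that error,
 * so the modeled system never enters the suspended state.
 */
static int mock_system_suspend(struct mock_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = devs[i].suspend ? devs[i].suspend(&devs[i]) : 0;

		if (err) {
			while (i-- > 0)
				devs[i].suspended = 0;
			return err;
		}
		devs[i].suspended = 1;
	}
	return 0;
}
```

Calling mock_system_suspend() on an array that contains one device using refusing_suspend returns -EOPNOTSUPP and leaves every device awake, which is the behaviour the patch aims for on real hardware.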
* Re: [PATCH v2 net] qede: fix firmware halt over suspend and resume
2023-08-09 13:43 [PATCH v2 net] qede: fix firmware halt over suspend and resume Manish Chopra
@ 2023-08-10 18:02 ` Simon Horman
2023-08-11 0:47 ` Jakub Kicinski
1 sibling, 0 replies; 7+ messages in thread
From: Simon Horman @ 2023-08-10 18:02 UTC
To: Manish Chopra
Cc: kuba, netdev, aelior, palok, njavali, skashyap, jmeneghi,
yuval.mintz, skalluru, pabeni, edumazet, horms, David Miller
On Wed, Aug 09, 2023 at 07:13:39PM +0530, Manish Chopra wrote:
> While performing certain power-off sequences, PCI drivers are
> called to suspend and resume their underlying devices through
> PCI PM (power management) interface. However this NIC hardware
> does not support PCI PM suspend/resume operations so system wide
> suspend/resume leads to bad MFW (management firmware) state which
> causes various follow-up errors in driver when communicating with
> the device/firmware afterwards.
>
> To fix this driver implements PCI PM suspend handler to indicate
> unsupported operation to the PCI subsystem explicitly, thus avoiding
> system to go into suspended/standby mode.
>
> Fixes: 2950219d87b0 ("qede: Add basic network device support")
> Cc: David Miller <davem@davemloft.net>
> Signed-off-by: Manish Chopra <manishc@marvell.com>
> Signed-off-by: Alok Prasad <palok@marvell.com>
> ---
> V1->V2:
> * Replace SIMPLE_DEV_PM_OPS with DEFINE_SIMPLE_DEV_PM_OPS
Thanks!
Reviewed-by: Simon Horman <horms@kernel.org>
* Re: [PATCH v2 net] qede: fix firmware halt over suspend and resume
2023-08-09 13:43 [PATCH v2 net] qede: fix firmware halt over suspend and resume Manish Chopra
2023-08-10 18:02 ` Simon Horman
@ 2023-08-11 0:47 ` Jakub Kicinski
2023-08-11 9:31 ` [EXT] " Manish Chopra
1 sibling, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2023-08-11 0:47 UTC
To: Manish Chopra
Cc: netdev, aelior, palok, njavali, skashyap, jmeneghi, yuval.mintz,
skalluru, pabeni, edumazet, horms, David Miller
On Wed, 9 Aug 2023 19:13:39 +0530 Manish Chopra wrote:
> While performing certain power-off sequences, PCI drivers are
> called to suspend and resume their underlying devices through
> PCI PM (power management) interface. However this NIC hardware
> does not support PCI PM suspend/resume operations so system wide
> suspend/resume leads to bad MFW (management firmware) state which
> causes various follow-up errors in driver when communicating with
> the device/firmware afterwards.
Does the FW end up recovering? That could still be preferable
to rejecting suspend altogether. Reject is a big hammer,
I'm a bit worried it will cause a regression in stable.
> To fix this driver implements PCI PM suspend handler to indicate
> unsupported operation to the PCI subsystem explicitly, thus avoiding
> system to go into suspended/standby mode.
>
> Fixes: 2950219d87b0 ("qede: Add basic network device support")
> Cc: David Miller <davem@davemloft.net>
> Signed-off-by: Manish Chopra <manishc@marvell.com>
> Signed-off-by: Alok Prasad <palok@marvell.com>
> ---
> V1->V2:
> * Replace SIMPLE_DEV_PM_OPS with DEFINE_SIMPLE_DEV_PM_OPS
> ---
> drivers/net/ethernet/qlogic/qede/qede_main.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
> index d57e52a97f85..18ae7af1764c 100644
> --- a/drivers/net/ethernet/qlogic/qede/qede_main.c
> +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
> @@ -177,6 +177,18 @@ static int qede_sriov_configure(struct pci_dev *pdev, int num_vfs_param)
> }
> #endif
>
> +static int __maybe_unused qede_suspend(struct device *dev)
> +{
> + if (!dev)
> + return -ENODEV;
Can dev really be NULL here? That wouldn't make sense, what's the
driver supposed to do in such case?
--
pw-bot: cr
* RE: [EXT] Re: [PATCH v2 net] qede: fix firmware halt over suspend and resume
2023-08-11 0:47 ` Jakub Kicinski
@ 2023-08-11 9:31 ` Manish Chopra
2023-08-11 21:45 ` Jakub Kicinski
0 siblings, 1 reply; 7+ messages in thread
From: Manish Chopra @ 2023-08-11 9:31 UTC
To: Jakub Kicinski
Cc: netdev@vger.kernel.org, Ariel Elior, Alok Prasad, Nilesh Javali,
Saurav Kashyap, jmeneghi@redhat.com, yuval.mintz@qlogic.com,
Sudarsana Reddy Kalluru, pabeni@redhat.com, edumazet@google.com,
horms@kernel.org, David Miller
> On Wed, 9 Aug 2023 19:13:39 +0530 Manish Chopra wrote:
> > While performing certain power-off sequences, PCI drivers are called
> > to suspend and resume their underlying devices through PCI PM (power
> > management) interface. However this NIC hardware does not support PCI
> > PM suspend/resume operations so system wide suspend/resume leads to
> > bad MFW (management firmware) state which causes various follow-up
> > errors in driver when communicating with the device/firmware
> > afterwards.
>
> Does the FW end up recovering? That could still be preferable to rejecting
> suspend altogether. Reject is a big hammer, I'm a bit worried it will cause a
> regression in stable.
Yes. Adding a suspend handler to the driver that returns an explicit error
to the PCI subsystem prevents the system-wide suspend and does not impact
the device/FW at all. They remain operational as they were before.
>
> > To fix this driver implements PCI PM suspend handler to indicate
> > unsupported operation to the PCI subsystem explicitly, thus avoiding
> > system to go into suspended/standby mode.
> >
> > Fixes: 2950219d87b0 ("qede: Add basic network device support")
> > Cc: David Miller <davem@davemloft.net>
> > Signed-off-by: Manish Chopra <manishc@marvell.com>
> > Signed-off-by: Alok Prasad <palok@marvell.com>
> > ---
> > V1->V2:
> > * Replace SIMPLE_DEV_PM_OPS with DEFINE_SIMPLE_DEV_PM_OPS
> > ---
> > drivers/net/ethernet/qlogic/qede/qede_main.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
> > index d57e52a97f85..18ae7af1764c 100644
> > --- a/drivers/net/ethernet/qlogic/qede/qede_main.c
> > +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
> > @@ -177,6 +177,18 @@ static int qede_sriov_configure(struct pci_dev *pdev, int num_vfs_param)
> > }
> > #endif
> >
> > +static int __maybe_unused qede_suspend(struct device *dev)
> > +{
> > + if (!dev)
> > + return -ENODEV;
>
> Can dev really be NULL here? That wouldn't make sense, what's the driver
> supposed to do in such case?
It is not supposed to be NULL here, assuming the caller validates it well
before this point. I added the check only for sanity. I will remove it.
> --
> pw-bot: cr
* Re: [EXT] Re: [PATCH v2 net] qede: fix firmware halt over suspend and resume
2023-08-11 9:31 ` [EXT] " Manish Chopra
@ 2023-08-11 21:45 ` Jakub Kicinski
2023-08-14 10:24 ` Manish Chopra
0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2023-08-11 21:45 UTC
To: Manish Chopra
Cc: netdev@vger.kernel.org, Ariel Elior, Alok Prasad, Nilesh Javali,
Saurav Kashyap, jmeneghi@redhat.com, yuval.mintz@qlogic.com,
Sudarsana Reddy Kalluru, pabeni@redhat.com, edumazet@google.com,
horms@kernel.org, David Miller
On Fri, 11 Aug 2023 09:31:15 +0000 Manish Chopra wrote:
> > Does the FW end up recovering? That could still be preferable to rejecting
> > suspend altogether. Reject is a big hammer, I'm a bit worried it will cause a
> > regression in stable.
>
> Yes, By adding the driver's suspend handler with explicit error returned
> to PCI subsystem prevents the system wide suspend and does not impact the
> device/FW at all. It keeps them operational as they were before.
I'm asking about recovery without this patch, not with it.
That should be evident from the text I'm replying under.
* RE: [EXT] Re: [PATCH v2 net] qede: fix firmware halt over suspend and resume
2023-08-11 21:45 ` Jakub Kicinski
@ 2023-08-14 10:24 ` Manish Chopra
2023-08-14 15:17 ` Jakub Kicinski
0 siblings, 1 reply; 7+ messages in thread
From: Manish Chopra @ 2023-08-14 10:24 UTC
To: Jakub Kicinski
Cc: netdev@vger.kernel.org, Ariel Elior, Alok Prasad, Nilesh Javali,
Saurav Kashyap, jmeneghi@redhat.com, yuval.mintz@qlogic.com,
Sudarsana Reddy Kalluru, pabeni@redhat.com, edumazet@google.com,
horms@kernel.org, David Miller
> On Fri, 11 Aug 2023 09:31:15 +0000 Manish Chopra wrote:
> > > Does the FW end up recovering? That could still be preferable to
> > > rejecting suspend altogether. Reject is a big hammer, I'm a bit
> > > worried it will cause a regression in stable.
> >
> > > Yes, By adding the driver's suspend handler with explicit error
> > > returned to PCI subsystem prevents the system wide suspend and does
> > > not impact the device/FW at all. It keeps them operational as they were
> > > before.
>
> I'm asking about recovery without this patch, not with it.
> That should be evident from the text I'm replying under.
No, it does not recover. We have to power-cycle the system to recover.
* Re: [EXT] Re: [PATCH v2 net] qede: fix firmware halt over suspend and resume
2023-08-14 10:24 ` Manish Chopra
@ 2023-08-14 15:17 ` Jakub Kicinski
0 siblings, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2023-08-14 15:17 UTC
To: Manish Chopra
Cc: netdev@vger.kernel.org, Ariel Elior, Alok Prasad, Nilesh Javali,
Saurav Kashyap, jmeneghi@redhat.com, yuval.mintz@qlogic.com,
Sudarsana Reddy Kalluru, pabeni@redhat.com, edumazet@google.com,
horms@kernel.org, David Miller
On Mon, 14 Aug 2023 10:24:52 +0000 Manish Chopra wrote:
> > I'm asking about recovery without this patch, not with it.
> > That should be evident from the text I'm replying under.
>
> Nope, It does not recover. We have to power cycle the system to recover.
Alright, please state that in the commit message and drop the
unnecessary NULL check for v2.
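
As a rough illustration of the requested change, the handler with the NULL check dropped might look like the sketch below. This is a userspace stand-in, not the revision actually posted: struct device and dev_info() are mocked here (an assumption made purely so the fragment compiles outside the kernel), whereas in the driver they come from the kernel headers and the function would keep its __maybe_unused annotation and its DEFINE_SIMPLE_DEV_PM_OPS() registration.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Stand-ins so the sketch builds in userspace. In the driver these are
 * provided by the kernel (struct device, dev_info()).
 */
struct device {
	const char *name;
};
#define dev_info(dev, fmt) fprintf(stderr, "%s: " fmt, (dev)->name)

/*
 * The PM core only invokes this callback for a bound device, so no NULL
 * check is needed: log the reason and refuse the transition.
 */
static int qede_suspend(struct device *dev)
{
	dev_info(dev, "Device does not support suspend operation\n");

	return -EOPNOTSUPP;
}
```

The only behavioural difference from the posted patch is the removal of the unreachable -ENODEV path; the callback still returns -EOPNOTSUPP unconditionally.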