netdev.vger.kernel.org archive mirror
* [PATCH v2 net] qede: fix firmware halt over suspend and resume
@ 2023-08-09 13:43 Manish Chopra
  2023-08-10 18:02 ` Simon Horman
  2023-08-11  0:47 ` Jakub Kicinski
  0 siblings, 2 replies; 7+ messages in thread
From: Manish Chopra @ 2023-08-09 13:43 UTC (permalink / raw)
  To: kuba
  Cc: netdev, aelior, palok, njavali, skashyap, jmeneghi, yuval.mintz,
	skalluru, pabeni, edumazet, horms, David Miller

While performing certain power-off sequences, PCI drivers are
called to suspend and resume their underlying devices through
the PCI PM (power management) interface. However, this NIC hardware
does not support PCI PM suspend/resume operations, so a system-wide
suspend/resume leaves the MFW (management firmware) in a bad state,
which causes various follow-up errors in the driver when it
communicates with the device/firmware afterwards.

To fix this, the driver implements a PCI PM suspend handler that
explicitly reports the operation as unsupported to the PCI subsystem,
thus preventing the system from entering suspended/standby mode.

Fixes: 2950219d87b0 ("qede: Add basic network device support")
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
---
V1->V2:
* Replace SIMPLE_DEV_PM_OPS with DEFINE_SIMPLE_DEV_PM_OPS
---
 drivers/net/ethernet/qlogic/qede/qede_main.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index d57e52a97f85..18ae7af1764c 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -177,6 +177,18 @@ static int qede_sriov_configure(struct pci_dev *pdev, int num_vfs_param)
 }
 #endif
 
+static int __maybe_unused qede_suspend(struct device *dev)
+{
+	if (!dev)
+		return -ENODEV;
+
+	dev_info(dev, "Device does not support suspend operation\n");
+
+	return -EOPNOTSUPP;
+}
+
+static DEFINE_SIMPLE_DEV_PM_OPS(qede_pm_ops, qede_suspend, NULL);
+
 static const struct pci_error_handlers qede_err_handler = {
 	.error_detected = qede_io_error_detected,
 };
@@ -191,6 +203,7 @@ static struct pci_driver qede_pci_driver = {
 	.sriov_configure = qede_sriov_configure,
 #endif
 	.err_handler = &qede_err_handler,
+	.driver.pm = &qede_pm_ops,
 };
 
 static struct qed_eth_cb_ops qede_ll_ops = {
-- 
2.27.0

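The mechanism the commit message relies on can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: struct
model_dev, model_ok_suspend(), model_reject_suspend() and
model_system_suspend() are hypothetical names invented for illustration and
are not kernel APIs. The model assumes that a system-wide suspend calls each
device's suspend callback in turn and gives up on the first error, undoing
the devices it has already suspended.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with an optional suspend callback. */
struct model_dev {
	const char *name;
	int (*suspend)(struct model_dev *dev);
	int suspended;
};

/* A device that supports suspend: mark it suspended and report success. */
static int model_ok_suspend(struct model_dev *dev)
{
	dev->suspended = 1;
	return 0;
}

/* Mirrors the idea of the patch: refuse the suspend request outright. */
static int model_reject_suspend(struct model_dev *dev)
{
	(void)dev;
	return -EOPNOTSUPP;
}

/*
 * Simplified model of a system-wide suspend: call every suspend callback
 * in order; on the first error, resume the devices suspended so far and
 * return that error so the whole transition is abandoned.
 */
static int model_system_suspend(struct model_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err;

		if (!devs[i].suspend)
			continue;

		err = devs[i].suspend(&devs[i]);
		if (err) {
			/* Roll back devices 0..i-1. */
			while (i-- > 0)
				devs[i].suspended = 0;
			return err;
		}
	}
	return 0;
}
```

In this model, a device list containing one rejecting device makes
model_system_suspend() return -EOPNOTSUPP and leaves every other device
running, which is the behavior the patch aims for.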


* Re: [PATCH v2 net] qede: fix firmware halt over suspend and resume
  2023-08-09 13:43 [PATCH v2 net] qede: fix firmware halt over suspend and resume Manish Chopra
@ 2023-08-10 18:02 ` Simon Horman
  2023-08-11  0:47 ` Jakub Kicinski
  1 sibling, 0 replies; 7+ messages in thread
From: Simon Horman @ 2023-08-10 18:02 UTC (permalink / raw)
  To: Manish Chopra
  Cc: kuba, netdev, aelior, palok, njavali, skashyap, jmeneghi,
	yuval.mintz, skalluru, pabeni, edumazet, horms, David Miller

On Wed, Aug 09, 2023 at 07:13:39PM +0530, Manish Chopra wrote:
> While performing certain power-off sequences, PCI drivers are
> called to suspend and resume their underlying devices through
> PCI PM (power management) interface. However this NIC hardware
> does not support PCI PM suspend/resume operations so system wide
> suspend/resume leads to bad MFW (management firmware) state which
> causes various follow-up errors in driver when communicating with
> the device/firmware afterwards.
> 
> To fix this driver implements PCI PM suspend handler to indicate
> unsupported operation to the PCI subsystem explicitly, thus avoiding
> system to go into suspended/standby mode.
> 
> Fixes: 2950219d87b0 ("qede: Add basic network device support")
> Cc: David Miller <davem@davemloft.net>
> Signed-off-by: Manish Chopra <manishc@marvell.com>
> Signed-off-by: Alok Prasad <palok@marvell.com>
> ---
> V1->V2:
> * Replace SIMPLE_DEV_PM_OPS with DEFINE_SIMPLE_DEV_PM_OPS

Thanks!

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [PATCH v2 net] qede: fix firmware halt over suspend and resume
  2023-08-09 13:43 [PATCH v2 net] qede: fix firmware halt over suspend and resume Manish Chopra
  2023-08-10 18:02 ` Simon Horman
@ 2023-08-11  0:47 ` Jakub Kicinski
  2023-08-11  9:31   ` [EXT] " Manish Chopra
  1 sibling, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2023-08-11  0:47 UTC (permalink / raw)
  To: Manish Chopra
  Cc: netdev, aelior, palok, njavali, skashyap, jmeneghi, yuval.mintz,
	skalluru, pabeni, edumazet, horms, David Miller

On Wed, 9 Aug 2023 19:13:39 +0530 Manish Chopra wrote:
> While performing certain power-off sequences, PCI drivers are
> called to suspend and resume their underlying devices through
> PCI PM (power management) interface. However this NIC hardware
> does not support PCI PM suspend/resume operations so system wide
> suspend/resume leads to bad MFW (management firmware) state which
> causes various follow-up errors in driver when communicating with
> the device/firmware afterwards.

Does the FW end up recovering? That could still be preferable
to rejecting suspend altogether. Reject is a big hammer,
I'm a bit worried it will cause a regression in stable.

> To fix this driver implements PCI PM suspend handler to indicate
> unsupported operation to the PCI subsystem explicitly, thus avoiding
> system to go into suspended/standby mode.
> 
> Fixes: 2950219d87b0 ("qede: Add basic network device support")
> Cc: David Miller <davem@davemloft.net>
> Signed-off-by: Manish Chopra <manishc@marvell.com>
> Signed-off-by: Alok Prasad <palok@marvell.com>
> ---
> V1->V2:
> * Replace SIMPLE_DEV_PM_OPS with DEFINE_SIMPLE_DEV_PM_OPS
> ---
>  drivers/net/ethernet/qlogic/qede/qede_main.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
> index d57e52a97f85..18ae7af1764c 100644
> --- a/drivers/net/ethernet/qlogic/qede/qede_main.c
> +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
> @@ -177,6 +177,18 @@ static int qede_sriov_configure(struct pci_dev *pdev, int num_vfs_param)
>  }
>  #endif
>  
> +static int __maybe_unused qede_suspend(struct device *dev)
> +{
> +	if (!dev)
> +		return -ENODEV;

Can dev really be NULL here? That wouldn't make sense, what's the
driver supposed to do in such case?
-- 
pw-bot: cr


* RE: [EXT] Re: [PATCH v2 net] qede: fix firmware halt over suspend and resume
  2023-08-11  0:47 ` Jakub Kicinski
@ 2023-08-11  9:31   ` Manish Chopra
  2023-08-11 21:45     ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Manish Chopra @ 2023-08-11  9:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev@vger.kernel.org, Ariel Elior, Alok Prasad, Nilesh Javali,
	Saurav Kashyap, jmeneghi@redhat.com, yuval.mintz@qlogic.com,
	Sudarsana Reddy Kalluru, pabeni@redhat.com, edumazet@google.com,
	horms@kernel.org, David Miller

> On Wed, 9 Aug 2023 19:13:39 +0530 Manish Chopra wrote:
> > While performing certain power-off sequences, PCI drivers are called
> > to suspend and resume their underlying devices through PCI PM (power
> > management) interface. However this NIC hardware does not support PCI
> > PM suspend/resume operations so system wide suspend/resume leads to
> > bad MFW (management firmware) state which causes various follow-up
> > errors in driver when communicating with the device/firmware
> > afterwards.
> 
> Does the FW end up recovering? That could still be preferable to rejecting
> suspend altogether. Reject is a big hammer, I'm a bit worried it will cause a
> regression in stable.

Yes. Adding a suspend handler to the driver that returns an explicit error
to the PCI subsystem prevents the system-wide suspend and does not impact the
device/FW at all. It keeps them as operational as they were before.

> 
> > To fix this driver implements PCI PM suspend handler to indicate
> > unsupported operation to the PCI subsystem explicitly, thus avoiding
> > system to go into suspended/standby mode.
> >
> > Fixes: 2950219d87b0 ("qede: Add basic network device support")
> > Cc: David Miller <davem@davemloft.net>
> > Signed-off-by: Manish Chopra <manishc@marvell.com>
> > Signed-off-by: Alok Prasad <palok@marvell.com>
> > ---
> > V1->V2:
> > * Replace SIMPLE_DEV_PM_OPS with DEFINE_SIMPLE_DEV_PM_OPS
> > ---
> >  drivers/net/ethernet/qlogic/qede/qede_main.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
> > index d57e52a97f85..18ae7af1764c 100644
> > --- a/drivers/net/ethernet/qlogic/qede/qede_main.c
> > +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
> > @@ -177,6 +177,18 @@ static int qede_sriov_configure(struct pci_dev *pdev, int num_vfs_param)
> >  }
> >  #endif
> >
> > +static int __maybe_unused qede_suspend(struct device *dev)
> > +{
> > +	if (!dev)
> > +		return -ENODEV;
> 
> Can dev really be NULL here? That wouldn't make sense, what's the driver
> supposed to do in such case?

It's not supposed to be NULL here, since the caller should have validated it
well before this point. I only added the check as a sanity measure. I will
remove it.

> --
> pw-bot: cr


* Re: [EXT] Re: [PATCH v2 net] qede: fix firmware halt over suspend and resume
  2023-08-11  9:31   ` [EXT] " Manish Chopra
@ 2023-08-11 21:45     ` Jakub Kicinski
  2023-08-14 10:24       ` Manish Chopra
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2023-08-11 21:45 UTC (permalink / raw)
  To: Manish Chopra
  Cc: netdev@vger.kernel.org, Ariel Elior, Alok Prasad, Nilesh Javali,
	Saurav Kashyap, jmeneghi@redhat.com, yuval.mintz@qlogic.com,
	Sudarsana Reddy Kalluru, pabeni@redhat.com, edumazet@google.com,
	horms@kernel.org, David Miller

On Fri, 11 Aug 2023 09:31:15 +0000 Manish Chopra wrote:
> > Does the FW end up recovering? That could still be preferable to rejecting
> > suspend altogether. Reject is a big hammer, I'm a bit worried it will cause a
> > regression in stable.  
> 
> Yes, By adding the driver's suspend handler with explicit error returned 
> to PCI subsystem prevents the system wide suspend and does not impact the
> device/FW at all. It keeps them operational as they were before.

I'm asking about recovery without this patch, not with it.
That should be evident from the text I'm replying under.


* RE: [EXT] Re: [PATCH v2 net] qede: fix firmware halt over suspend and resume
  2023-08-11 21:45     ` Jakub Kicinski
@ 2023-08-14 10:24       ` Manish Chopra
  2023-08-14 15:17         ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Manish Chopra @ 2023-08-14 10:24 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev@vger.kernel.org, Ariel Elior, Alok Prasad, Nilesh Javali,
	Saurav Kashyap, jmeneghi@redhat.com, yuval.mintz@qlogic.com,
	Sudarsana Reddy Kalluru, pabeni@redhat.com, edumazet@google.com,
	horms@kernel.org, David Miller

> On Fri, 11 Aug 2023 09:31:15 +0000 Manish Chopra wrote:
> > > Does the FW end up recovering? That could still be preferable to
> > > rejecting suspend altogether. Reject is a big hammer, I'm a bit
> > > worried it will cause a regression in stable.
> >
> > Yes, By adding the driver's suspend handler with explicit error
> > returned to PCI subsystem prevents the system wide suspend and does
> > not impact the device/FW at all. It keeps them operational as they were
> > before.
> 
> I'm asking about recovery without this patch, not with it.
> That should be evident from the text I'm replying under.

No, it does not recover. We have to power cycle the system to recover.


* Re: [EXT] Re: [PATCH v2 net] qede: fix firmware halt over suspend and resume
  2023-08-14 10:24       ` Manish Chopra
@ 2023-08-14 15:17         ` Jakub Kicinski
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2023-08-14 15:17 UTC (permalink / raw)
  To: Manish Chopra
  Cc: netdev@vger.kernel.org, Ariel Elior, Alok Prasad, Nilesh Javali,
	Saurav Kashyap, jmeneghi@redhat.com, yuval.mintz@qlogic.com,
	Sudarsana Reddy Kalluru, pabeni@redhat.com, edumazet@google.com,
	horms@kernel.org, David Miller

On Mon, 14 Aug 2023 10:24:52 +0000 Manish Chopra wrote:
> > I'm asking about recovery without this patch, not with it.
> > That should be evident from the text I'm replying under.  
> 
> Nope, It does not recover. We have to power cycle the system to recover.

Alright, please state that in the commit message and drop the
unnecessary NULL check for v3.
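
For illustration only, the handler with the NULL check dropped might look
roughly like the sketch below. This is not the patch that was actually
posted as a follow-up. To keep the sketch compilable outside the kernel
tree, struct device, __maybe_unused and dev_info() are replaced by
simplified userspace stand-ins; those stand-ins are assumptions and do not
reflect the real kernel definitions.

```c
#include <errno.h>
#include <stdio.h>

/* Userspace stand-ins (assumptions) for the kernel definitions. */
struct device {
	const char *name;
};
#define __maybe_unused __attribute__((unused))
#define dev_info(dev, ...) \
	(fprintf(stderr, "%s: ", (dev)->name), fprintf(stderr, __VA_ARGS__))

/*
 * Suspend handler without the NULL check: the caller is expected to pass
 * a valid device, so the handler only logs and rejects the request.
 */
static int __maybe_unused qede_suspend(struct device *dev)
{
	dev_info(dev, "Device does not support suspend operation\n");

	return -EOPNOTSUPP;
}
```

In the driver itself the function body would be the same, with the real
kernel headers in place of the stand-ins above.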
