netdev.vger.kernel.org archive mirror
* [PATCH net] ice: fix memory leak of aRFS after resuming from suspend
@ 2021-03-18  8:15 Yongxin Liu
  2021-03-18 22:20 ` Creeley, Brett
  0 siblings, 1 reply; 3+ messages in thread
From: Yongxin Liu @ 2021-03-18  8:15 UTC
  To: brett.creeley, madhu.chittim, anthony.l.nguyen, andrewx.bowers,
	jeffrey.t.kirsher
  Cc: netdev

In ice_suspend(), ice_clear_interrupt_scheme() is called, and
irq_free_descs() is eventually called to free the irqs and their descriptors.

In ice_resume(), ice_init_interrupt_scheme() is called to allocate new irqs.
However, in ice_rebuild_arfs(), struct irq_glue and struct cpu_rmap may not
be freed if the irqs that were released in ice_suspend() have been reassigned
to other devices, because the irq descriptor's affinity_notify is then lost.

So move ice_remove_arfs() before ice_clear_interrupt_scheme(), which
ensures that all irq_glue and cpu_rmap objects are released before the
corresponding irqs and descriptors are freed.
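
The ordering problem described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions:
toy_irq_desc, toy_notify and the toy_* functions are hypothetical stand-ins
invented for illustration, not kernel types or APIs, and the model assumes
the teardown path can reach the notifier object only through the descriptor.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT kernel structures. */
struct toy_notify { int irq; };                     /* plays the role of irq_glue */
struct toy_irq_desc { struct toy_notify *notify; }; /* plays the role of an irq descriptor */

/* Allocate a notifier and hang it off the descriptor. */
static struct toy_notify *toy_notify_attach(struct toy_irq_desc *desc, int *live)
{
	struct toy_notify *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->irq = 0;
	desc->notify = n;
	(*live)++;
	return n;
}

/* Free the notifier, but only if it is still reachable via the descriptor. */
static void toy_notify_detach(struct toy_irq_desc *desc, int *live)
{
	if (!desc->notify)
		return; /* pointer already lost: nothing reachable to free */
	free(desc->notify);
	desc->notify = NULL;
	(*live)--;
}

/* Model the descriptor being freed or handed to another device. */
static void toy_desc_release(struct toy_irq_desc *desc)
{
	desc->notify = NULL; /* the link to the notifier is dropped, not freed */
}

/* Returns the number of notifier objects left behind after teardown. */
static int toy_run(int detach_first)
{
	int live = 0;
	int leaked;
	struct toy_irq_desc desc = { NULL };
	struct toy_notify *shadow = toy_notify_attach(&desc, &live);

	if (!shadow)
		return -1;

	if (detach_first) {
		toy_notify_detach(&desc, &live); /* order proposed in this patch */
		toy_desc_release(&desc);
	} else {
		toy_desc_release(&desc);         /* original suspend order */
		toy_notify_detach(&desc, &live);
	}

	leaked = live;
	if (leaked)
		free(shadow); /* demo hygiene only; real code has no such copy */
	return leaked;
}
```

In this model toy_run(0) returns 1, because the object is no longer
reachable when the detach step runs, while toy_run(1) returns 0.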

Fix the following memory leak.

unreferenced object 0xffff95bd951afc00 (size 512):
  comm "kworker/0:1", pid 134, jiffies 4294684283 (age 13051.958s)
  hex dump (first 32 bytes):
    18 00 00 00 18 00 18 00 70 fc 1a 95 bd 95 ff ff  ........p.......
    00 00 ff ff 01 00 ff ff 02 00 ff ff 03 00 ff ff  ................
  backtrace:
    [<0000000072e4b914>] __kmalloc+0x336/0x540
    [<0000000054642a87>] alloc_cpu_rmap+0x3b/0xb0
    [<00000000f220deec>] ice_set_cpu_rx_rmap+0x6a/0x110 [ice]
    [<000000002370a632>] ice_probe+0x941/0x1180 [ice]
    [<00000000d692edba>] local_pci_probe+0x47/0xa0
    [<00000000503934f0>] work_for_cpu_fn+0x1a/0x30
    [<00000000555a9e4a>] process_one_work+0x1dd/0x410
    [<000000002c4b414a>] worker_thread+0x221/0x3f0
    [<00000000bb2b556b>] kthread+0x14c/0x170
    [<00000000ad2cf1cd>] ret_from_fork+0x1f/0x30
unreferenced object 0xffff95bd81b0a2a0 (size 96):
  comm "kworker/0:1", pid 134, jiffies 4294684283 (age 13051.958s)
  hex dump (first 32 bytes):
    38 00 00 00 01 00 00 00 e0 ff ff ff 0f 00 00 00  8...............
    b0 a2 b0 81 bd 95 ff ff b0 a2 b0 81 bd 95 ff ff  ................
  backtrace:
    [<00000000582dd5c5>] kmem_cache_alloc_trace+0x31f/0x4c0
    [<000000002659850d>] irq_cpu_rmap_add+0x25/0xe0
    [<00000000495a3055>] ice_set_cpu_rx_rmap+0xb4/0x110 [ice]
    [<000000002370a632>] ice_probe+0x941/0x1180 [ice]
    [<00000000d692edba>] local_pci_probe+0x47/0xa0
    [<00000000503934f0>] work_for_cpu_fn+0x1a/0x30
    [<00000000555a9e4a>] process_one_work+0x1dd/0x410
    [<000000002c4b414a>] worker_thread+0x221/0x3f0
    [<00000000bb2b556b>] kthread+0x14c/0x170
    [<00000000ad2cf1cd>] ret_from_fork+0x1f/0x30

Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
---
 drivers/net/ethernet/intel/ice/ice_arfs.c | 1 -
 drivers/net/ethernet/intel/ice/ice_main.c | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_arfs.c b/drivers/net/ethernet/intel/ice/ice_arfs.c
index 6560acd76c94..c748d0a5c7d4 100644
--- a/drivers/net/ethernet/intel/ice/ice_arfs.c
+++ b/drivers/net/ethernet/intel/ice/ice_arfs.c
@@ -654,7 +654,6 @@ void ice_rebuild_arfs(struct ice_pf *pf)
 	if (!pf_vsi)
 		return;
 
-	ice_remove_arfs(pf);
 	if (ice_set_cpu_rx_rmap(pf_vsi)) {
 		dev_err(ice_pf_to_dev(pf), "Failed to rebuild aRFS\n");
 		return;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 2c23c8f468a5..dba901bf2b9b 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4568,6 +4568,9 @@ static int __maybe_unused ice_suspend(struct device *dev)
 			continue;
 		ice_vsi_free_q_vectors(pf->vsi[v]);
 	}
+	if (test_bit(ICE_FLAG_FD_ENA, pf->flags)) {
+		ice_remove_arfs(pf);
+	}
 	ice_clear_interrupt_scheme(pf);
 
 	pci_save_state(pdev);
-- 
2.14.5



* Re: [PATCH net] ice: fix memory leak of aRFS after resuming from suspend
  2021-03-18  8:15 [PATCH net] ice: fix memory leak of aRFS after resuming from suspend Yongxin Liu
@ 2021-03-18 22:20 ` Creeley, Brett
  2021-03-19  2:33   ` Liu, Yongxin
  0 siblings, 1 reply; 3+ messages in thread
From: Creeley, Brett @ 2021-03-18 22:20 UTC
  To: yongxin.liu@windriver.com, jeffrey.t.kirsher@intel.com,
	Chittim, Madhu, Nguyen, Anthony L, andrewx.bowers@intel.com
  Cc: netdev@vger.kernel.org

On Thu, 2021-03-18 at 16:15 +0800, Yongxin Liu wrote:
> In ice_suspend(), ice_clear_interrupt_scheme() is called, and then
> irq_free_descs() will be eventually called to free irq and its
> descriptor.
> 
> In ice_resume(), ice_init_interrupt_scheme() is called to allocate
> new irqs.
> However, in ice_rebuild_arfs(), struct irq_glue and struct cpu_rmap
> maybe
> cannot be freed, if the irqs that released in ice_suspend() were
> reassigned
> to other devices, which makes irq descriptor's affinity_notify lost.
> 
> So move ice_remove_arfs() before ice_clear_interrupt_scheme(), which
> can
> make sure all irq_glue and cpu_rmap can be correctly released before
> corresponding irq and descriptor are released.
> 
> Fix the following memeory leak.

s/memeory/memory

<snip>

> diff --git a/drivers/net/ethernet/intel/ice/ice_arfs.c
> b/drivers/net/ethernet/intel/ice/ice_arfs.c
> index 6560acd76c94..c748d0a5c7d4 100644
> --- a/drivers/net/ethernet/intel/ice/ice_arfs.c
> +++ b/drivers/net/ethernet/intel/ice/ice_arfs.c
> @@ -654,7 +654,6 @@ void ice_rebuild_arfs(struct ice_pf *pf)
>  	if (!pf_vsi)
>  		return;
>  
> -	ice_remove_arfs(pf);

This should not be removed. Removing this would break the
reset flows outside of the suspend/remove case.

>  	if (ice_set_cpu_rx_rmap(pf_vsi)) {
>  		dev_err(ice_pf_to_dev(pf), "Failed to rebuild aRFS\n");
>  		return;
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c
> b/drivers/net/ethernet/intel/ice/ice_main.c
> index 2c23c8f468a5..dba901bf2b9b 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -4568,6 +4568,9 @@ static int __maybe_unused ice_suspend(struct
> device *dev)
>  			continue;
>  		ice_vsi_free_q_vectors(pf->vsi[v]);
>  	}
> +	if (test_bit(ICE_FLAG_FD_ENA, pf->flags)) {
> +		ice_remove_arfs(pf);
> +	}

Braces aren't needed around a single if statement like this.

Also, I don't think this is the right solution. I think a better
approach would be to call ice_free_rx_cpu_map() here. With this,
it seems like no other changes are necessary. It also isn't
necessary to check the ICE_FLAG_FD_ENA bit with this change.
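
One plausible reading of the last point is that a free helper which
tolerates a missing map can be called unconditionally, so the caller needs
no feature-flag check. The thread does not show the helper's body, so this
is an assumption; the user-space sketch below uses hypothetical names
(toy_vsi, toy_free_rmap) and is not the driver's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a VSI that may or may not own an rmap. */
struct toy_vsi { int *rmap; };

/*
 * Free the map if there is one. Returns 1 if something was freed and
 * 0 if there was nothing to do, so it is safe to call unconditionally.
 */
static int toy_free_rmap(struct toy_vsi *vsi)
{
	if (!vsi || !vsi->rmap)
		return 0;
	free(vsi->rmap);
	vsi->rmap = NULL; /* a second call becomes a no-op */
	return 1;
}
```

With a helper shaped like this, the suspend path could call it before
tearing down interrupts whether or not the feature was ever enabled.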

>  	ice_clear_interrupt_scheme(pf);
>  
>  	pci_save_state(pdev);


* RE: [PATCH net] ice: fix memory leak of aRFS after resuming from suspend
  2021-03-18 22:20 ` Creeley, Brett
@ 2021-03-19  2:33   ` Liu, Yongxin
  0 siblings, 0 replies; 3+ messages in thread
From: Liu, Yongxin @ 2021-03-19  2:33 UTC
  To: Creeley, Brett
  Cc: netdev@vger.kernel.org, jeffrey.t.kirsher@intel.com,
	Chittim, Madhu, Nguyen, Anthony L, andrewx.bowers@intel.com


> -----Original Message-----
> From: Creeley, Brett <brett.creeley@intel.com>
> Sent: Friday, March 19, 2021 06:20
> To: Liu, Yongxin <Yongxin.Liu@windriver.com>; jeffrey.t.kirsher@intel.com;
> Chittim, Madhu <madhu.chittim@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; andrewx.bowers@intel.com
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH net] ice: fix memory leak of aRFS after resuming from
> suspend
> 
> 
> On Thu, 2021-03-18 at 16:15 +0800, Yongxin Liu wrote:
> > In ice_suspend(), ice_clear_interrupt_scheme() is called, and then
> > irq_free_descs() will be eventually called to free irq and its
> > descriptor.
> >
> > In ice_resume(), ice_init_interrupt_scheme() is called to allocate new
> > irqs.
> > However, in ice_rebuild_arfs(), struct irq_glue and struct cpu_rmap
> > maybe cannot be freed, if the irqs that released in ice_suspend() were
> > reassigned to other devices, which makes irq descriptor's
> > affinity_notify lost.
> >
> > So move ice_remove_arfs() before ice_clear_interrupt_scheme(), which
> > can make sure all irq_glue and cpu_rmap can be correctly released
> > before corresponding irq and descriptor are released.
> >
> > Fix the following memeory leak.
> 
> s/memeory/memory
> 
> <snip>
> 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_arfs.c
> > b/drivers/net/ethernet/intel/ice/ice_arfs.c
> > index 6560acd76c94..c748d0a5c7d4 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_arfs.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_arfs.c
> > @@ -654,7 +654,6 @@ void ice_rebuild_arfs(struct ice_pf *pf)
> >       if (!pf_vsi)
> >               return;
> >
> > -     ice_remove_arfs(pf);
> 
> This should not be removed. Removing this would break the reset flows
> outside of the suspend/remove case.
> 
> >       if (ice_set_cpu_rx_rmap(pf_vsi)) {
> >               dev_err(ice_pf_to_dev(pf), "Failed to rebuild aRFS\n");
> >               return;
> > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c
> > b/drivers/net/ethernet/intel/ice/ice_main.c
> > index 2c23c8f468a5..dba901bf2b9b 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > @@ -4568,6 +4568,9 @@ static int __maybe_unused ice_suspend(struct
> > device *dev)
> >                       continue;
> >               ice_vsi_free_q_vectors(pf->vsi[v]);
> >       }
> > +     if (test_bit(ICE_FLAG_FD_ENA, pf->flags)) {
> > +             ice_remove_arfs(pf);
> > +     }
> 
> Braces aren't needed around a single if statement like this.
> 
> Also, I don't think this is the right solution. I think a better approach
> would be to call ice_free_rx_cpu_map() here. With this, it seems like no
> other changes are necessary. It also isn't necessary to check the
> ICE_FLAG_FD_ENA bit with this change.

Thanks for your valuable review. I will send V2.

--Yongxin

> 
> >       ice_clear_interrupt_scheme(pf);
> >
> >       pci_save_state(pdev);
