linux-rdma.vger.kernel.org archive mirror
* Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
@ 2015-10-07 15:51 Sagi Grimberg
       [not found] ` <56153F71.2010801-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2015-10-07 15:51 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: Jason Gunthorpe, Erez Shitrit, Doug Ledford

This started popping up (not sure if it's new to 4.3-rc1).

Happens when unloading the provider driver (mlx4/mlx5 in my case).

Has anyone seen this?

kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 2 PID: 6012 at drivers/infiniband/core/verbs.c:283 ib_dealloc_pd+0x5b/0xa0 [ib_core]()
kernel: Modules linked in: rpcrdma ib_srp scsi_transport_srp ib_iser 
rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_umad ib_uverbs ib_ipoib 
ib_cm mlx4_ib ib_sa ib_mad mlx4_core mlx5_ib(-) mlx5_core ib_core 
ib_addr mst_pciconf(O) mst_pci(O) nfsv3 nfs af_packet coretemp 
x86_pkg_temp_thermal crct10dif_pclmul crc32c_intel aesni_intel 
aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd microcode 
ipmi_ssif pcspkr lpc_ich i2c_i801 mfd_core ioatdma wmi ipmi_si 
ipmi_msghandler processor button nfsd auth_rpcgss oid_registry nfs_acl 
lockd grace sunrpc ip_tables x_tables ext4 crc16 mbcache jbd2 sd_mod 
hid_generic usbhid hid ahci libahci libata igb ehci_pci hwmon ehci_hcd 
ptp usbcore pps_core scsi_mod i2c_algo_bit usb_common i2c_core dca 
autofs4 [last unloaded: mlx4_core] 

kernel: CPU: 2 PID: 6012 Comm: modprobe Tainted: G           O L 4.3.0-rc3-debug+ #67
kernel: Hardware name: Supermicro SYS-1027R-WRF/X9DRW, BIOS 3.0a 08/08/2013
kernel:  000000000000011b ffff8807a99afbe8 ffffffff8129915b 0000000000000009
kernel:  0000000000000000 ffff8807a99afc28 ffffffff810752b5 ffff880827d7c2a0
kernel:  ffff8807b0d03260 ffff880827d7c2a0 ffff880827d7cc60 0000000000000000
kernel: Call Trace:
kernel:  [<ffffffff8129915b>] dump_stack+0x4f/0x74
kernel:  [<ffffffff810752b5>] warn_slowpath_common+0x95/0xe0
kernel:  [<ffffffff8107531a>] warn_slowpath_null+0x1a/0x20
kernel:  [<ffffffffa001bd4b>] ib_dealloc_pd+0x5b/0xa0 [ib_core]
kernel:  [<ffffffffa047adce>] ipoib_transport_dev_cleanup+0x9e/0xf0 [ib_ipoib]
kernel:  [<ffffffffa047712e>] ipoib_ib_dev_cleanup+0x5e/0x80 [ib_ipoib]
kernel:  [<ffffffffa0473984>] ipoib_dev_cleanup+0x2a4/0x3b0 [ib_ipoib]
kernel:  [<ffffffff8107a11d>] ? __local_bh_enable_ip+0x6d/0xd0
kernel:  [<ffffffffa0473a9e>] ipoib_uninit+0xe/0x10 [ib_ipoib]
kernel:  [<ffffffff8141ba17>] rollback_registered_many+0x1a7/0x2c0
kernel:  [<ffffffff8141bbd1>] rollback_registered+0x31/0x40
kernel:  [<ffffffff8141bc38>] unregister_netdevice_queue+0x58/0xb0
kernel:  [<ffffffff8141be00>] unregister_netdev+0x20/0x30
kernel:  [<ffffffffa04721a1>] ipoib_remove_one+0xa1/0xe0 [ib_ipoib]
kernel:  [<ffffffffa001e0d1>] ib_unregister_device+0xc1/0x160 [ib_core]
kernel:  [<ffffffffa05231f9>] mlx5_ib_remove+0x19/0x50 [mlx5_ib]
kernel:  [<ffffffffa04e5068>] mlx5_remove_device+0x68/0x80 [mlx5_core]
kernel:  [<ffffffffa04e50be>] mlx5_unregister_interface+0x3e/0x70 [mlx5_core]
kernel:  [<ffffffffa053397c>] mlx5_ib_cleanup+0x10/0x694 [mlx5_ib]
kernel:  [<ffffffff810f67aa>] SyS_delete_module+0x17a/0x1c0
kernel:  [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
kernel:  [<ffffffff811e80b0>] ? generic_show_options+0x180/0x180
kernel:  [<ffffffff8151a1f2>] entry_SYSCALL_64_fastpath+0x12/0x76
kernel: ---[ end trace 31339c7283574ccb ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
       [not found] ` <56153F71.2010801-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-10-07 16:11   ` Doug Ledford
       [not found]     ` <5615442D.2020007-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-10-07 16:22   ` santosh.shilimkar-QHcLZuEGTsvQT0dZR+AlfA
  2015-10-07 19:12   ` Or Gerlitz
  2 siblings, 1 reply; 8+ messages in thread
From: Doug Ledford @ 2015-10-07 16:11 UTC (permalink / raw)
  To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: Jason Gunthorpe, Erez Shitrit

On 10/07/2015 11:51 AM, Sagi Grimberg wrote:
> This started popping up (not sure if it's new to 4.3-rc1).
> 
> Happens when unloading the provider driver (mlx4/mlx5 in my case).
> 
> Has anyone seen this?

Yes.  I'm seeing this too.  The last time this popped up I fixed it by
adding the code for reaping AHs.  I suspect that the new code to time out
sendonly multicast joins combined with us now creating and joining what
used to be sendonly groups is the likely culprit here.
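
For context, the AH reaping mentioned above can be pictured with a small
userspace model. This is only a sketch under stated assumptions, not the
ipoib code: the names (model_ah, model_priv, model_ah_retire,
model_reap_ah) are hypothetical, and the rule that a dropped AH is
destroyed only once the sends that used it have completed is an
assumption made for illustration. Each AH still on the dead list counts
against the PD in this model.

```c
#include <stdlib.h>

/* Hypothetical userspace model of deferred AH reaping; not ipoib code. */
struct model_ah {
	struct model_ah *next;
	unsigned int last_send;	/* index of the last send that used this AH */
};

struct model_priv {
	struct model_ah *dead_ahs;	/* AHs dropped but not yet destroyed */
	unsigned int tx_tail;		/* number of sends completed so far */
	int pd_usecnt;			/* each undestroyed AH counts against the PD */
};

/* Create an AH already parked on the dead list; returns 0 on success. */
int model_ah_retire(struct model_priv *priv, unsigned int last_send)
{
	struct model_ah *ah = malloc(sizeof(*ah));

	if (!ah)
		return -1;
	ah->last_send = last_send;
	ah->next = priv->dead_ahs;
	priv->dead_ahs = ah;
	priv->pd_usecnt++;
	return 0;
}

/* Destroy dead AHs whose last send has completed; returns how many. */
int model_reap_ah(struct model_priv *priv)
{
	struct model_ah **link = &priv->dead_ahs;
	int reaped = 0;

	while (*link) {
		struct model_ah *ah = *link;

		/* Signed difference so the comparison survives counter wraparound. */
		if ((int)(priv->tx_tail - ah->last_send) >= 0) {
			*link = ah->next;	/* unlink before freeing */
			free(ah);
			priv->pd_usecnt--;
			reaped++;
		} else {
			link = &ah->next;	/* still in flight, keep it */
		}
	}
	return reaped;
}
```

In this model, teardown would have to keep reaping until dead_ahs is
empty before releasing the PD. Anything that keeps an AH alive, for
example a multicast entry that is never freed, leaves pd_usecnt above
zero.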

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD




* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
       [not found] ` <56153F71.2010801-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-10-07 16:11   ` Doug Ledford
@ 2015-10-07 16:22   ` santosh.shilimkar-QHcLZuEGTsvQT0dZR+AlfA
  2015-10-07 19:12   ` Or Gerlitz
  2 siblings, 0 replies; 8+ messages in thread
From: santosh.shilimkar-QHcLZuEGTsvQT0dZR+AlfA @ 2015-10-07 16:22 UTC (permalink / raw)
  To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: Jason Gunthorpe, Erez Shitrit, Doug Ledford

Sagi,

On 10/7/15 8:51 AM, Sagi Grimberg wrote:
> This started popping up (not sure if it's new to 4.3-rc1).
>
> Happens when unloading the provider driver (mlx4/mlx5 in my case).
>
> Has anyone seen this?
>
Not sure if this is useful, but yes, I have seen a similar dump with
RDS on 4.3-rc1. I later found that the RDS code had MR leak(s) in
normal operation, which led to WARNs on module cleanup.
I believe the leaks kept the PD 'usecnt' from dropping to zero.
Once I fixed those leaks, I stopped seeing it.
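
The accounting described above can be sketched in userspace. This is a
model under stated assumptions, not the ib_core implementation: the
names (model_pd, model_mr, model_reg_mr, model_dereg_mr,
model_dealloc_pd) are hypothetical, and the model simply assumes that
releasing a PD warns whenever its usage count is still non-zero.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are ib_core symbols. */
struct model_pd {
	int usecnt;	/* number of objects still referencing this PD */
};

struct model_mr {
	struct model_pd *pd;
};

static struct model_mr *model_reg_mr(struct model_pd *pd)
{
	struct model_mr *mr = malloc(sizeof(*mr));

	if (!mr)
		return NULL;
	mr->pd = pd;
	pd->usecnt++;		/* the MR pins the PD */
	return mr;
}

static void model_dereg_mr(struct model_mr *mr)
{
	mr->pd->usecnt--;	/* drop the reference taken at registration */
	free(mr);
}

/* Returns 1 if a warning would fire (PD still in use), 0 otherwise. */
static int model_dealloc_pd(struct model_pd *pd)
{
	if (pd->usecnt != 0) {
		fprintf(stderr, "WARNING: PD still has %d user(s)\n", pd->usecnt);
		return 1;
	}
	return 0;
}

/* One MR is forgotten before teardown; returns the warning flag. */
int model_leaky_teardown(void)
{
	struct model_pd pd = { 0 };
	struct model_mr *a = model_reg_mr(&pd);
	struct model_mr *b = model_reg_mr(&pd);
	int warned;

	if (!a || !b) {
		if (a)
			model_dereg_mr(a);
		if (b)
			model_dereg_mr(b);
		return -1;
	}
	model_dereg_mr(a);
	warned = model_dealloc_pd(&pd);	/* b is still registered here */
	model_dereg_mr(b);		/* tidy up so the demo itself is clean */
	return warned;
}

/* Every MR is deregistered first; returns the warning flag. */
int model_clean_teardown(void)
{
	struct model_pd pd = { 0 };
	struct model_mr *a = model_reg_mr(&pd);

	if (!a)
		return -1;
	model_dereg_mr(a);
	return model_dealloc_pd(&pd);
}
```

In this model the leaky path warns and the clean path does not, which
matches the behaviour described above: the warning at teardown is a
symptom, and the leak happened earlier during normal operation.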



* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
       [not found] ` <56153F71.2010801-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-10-07 16:11   ` Doug Ledford
  2015-10-07 16:22   ` santosh.shilimkar-QHcLZuEGTsvQT0dZR+AlfA
@ 2015-10-07 19:12   ` Or Gerlitz
  2 siblings, 0 replies; 8+ messages in thread
From: Or Gerlitz @ 2015-10-07 19:12 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Jason Gunthorpe, Erez Shitrit, Doug Ledford

On Wed, Oct 7, 2015 at 6:51 PM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> This started popping up (not sure if it's new to 4.3-rc1).
> Happens when unloading the provider driver (mlx4/mlx5 in my case).
> Has anyone seen this?

Yes, I think I have been seeing it over the last 1-2 years.

Or.

* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
       [not found]     ` <5615442D.2020007-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-10-11 15:51       ` Sagi Grimberg
       [not found]         ` <561A8591.4020608-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2015-10-11 15:51 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: Jason Gunthorpe, Erez Shitrit, Christoph Lameter


> Yes.  I'm seeing this too.  The last time this popped up I fixed it by
> adding the code for reaping AHs.  I suspect that the new code to time out
> sendonly multicast joins combined with us now creating and joining what
> used to be sendonly groups is the likely culprit here.
>

Is someone looking at this? It really should be fixed before 4.3
final...

* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
       [not found]         ` <561A8591.4020608-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-10-11 23:49           ` Christoph Lameter
       [not found]             ` <alpine.DEB.2.20.1510111848560.10812-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Lameter @ 2015-10-11 23:49 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Jason Gunthorpe, Erez Shitrit

On Sun, 11 Oct 2015, Sagi Grimberg wrote:

> Is someone looking at this? It really should be fixed before 4.3
> final...

The following fixup patch is needed:



Subject: ipoib: For sendonly join free the multicast group on leave

When we leave the multicast group on expiration of a neighbor we
do not free the mcast structure. This results in a memory leak.

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

Index: linux/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- linux.orig/drivers/infiniband/ulp/ipoib/ipoib.h
+++ linux/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -495,6 +495,7 @@ void ipoib_dev_cleanup(struct net_device
 void ipoib_mcast_join_task(struct work_struct *work);
 void ipoib_mcast_carrier_on_task(struct work_struct *work);
 void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb);
+void ipoib_mcast_free(struct ipoib_mcast *mc);

 void ipoib_mcast_restart_task(struct work_struct *work);
 int ipoib_mcast_start_thread(struct net_device *dev);
Index: linux/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- linux.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1207,8 +1207,10 @@ static void __ipoib_reap_neigh(struct ip

 out_unlock:
 	spin_unlock_irqrestore(&priv->lock, flags);
-	list_for_each_entry_safe(mcast, tmcast, &remove_list, list)
+	list_for_each_entry_safe(mcast, tmcast, &remove_list, list) {
 		ipoib_mcast_leave(dev, mcast);
+		ipoib_mcast_free(mcast);
+	}
 }

 static void ipoib_reap_neigh(struct work_struct *work)
Index: linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- linux.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -106,7 +106,7 @@ static void __ipoib_mcast_schedule_join_
 		queue_delayed_work(priv->wq, &priv->mcast_task, 0);
 }

-static void ipoib_mcast_free(struct ipoib_mcast *mcast)
+void ipoib_mcast_free(struct ipoib_mcast *mcast)
 {
 	struct net_device *dev = mcast->dev;
 	int tx_dropped = 0;
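
The leave-then-free loop in the patch can be illustrated with a
userspace model. This is a sketch, not the ipoib code: model_mcast,
model_mcast_leave, model_mcast_free, model_reap and model_reap_demo are
hypothetical names, and a plain singly linked list stands in for the
kernel list. The free_entries flag switches between the behaviour
before the patch (leave only) and after it (leave and free).

```c
#include <stdlib.h>

/* Hypothetical userspace model; these are not the ipoib structures. */
struct model_mcast {
	struct model_mcast *next;
	int joined;
};

static int live_mcasts;	/* allocations not yet freed */

static struct model_mcast *model_mcast_alloc(struct model_mcast *next)
{
	struct model_mcast *mc = malloc(sizeof(*mc));

	if (!mc)
		return NULL;
	mc->next = next;
	mc->joined = 1;
	live_mcasts++;
	return mc;
}

static void model_mcast_leave(struct model_mcast *mc)
{
	mc->joined = 0;	/* leaving the group does not release the structure */
}

static void model_mcast_free(struct model_mcast *mc)
{
	live_mcasts--;
	free(mc);
}

/*
 * Walk a private remove list. The next pointer is saved before the
 * entry may be freed, which is the job list_for_each_entry_safe()
 * does in the kernel loop.
 */
static void model_reap(struct model_mcast *remove_list, int free_entries)
{
	struct model_mcast *mc, *tmp;

	for (mc = remove_list; mc; mc = tmp) {
		tmp = mc->next;
		model_mcast_leave(mc);
		if (free_entries)
			model_mcast_free(mc);
	}
}

/* Builds n entries, reaps them, and returns how many remain allocated. */
int model_reap_demo(int n, int free_entries)
{
	struct model_mcast *head = NULL, *mc, *tmp;
	int i, leaked;

	live_mcasts = 0;
	for (i = 0; i < n; i++) {
		mc = model_mcast_alloc(head);
		if (!mc)
			break;
		head = mc;
	}
	model_reap(head, free_entries);
	leaked = live_mcasts;
	/* Tidy up the deliberately leaked entries so the demo itself is clean. */
	if (!free_entries) {
		for (mc = head; mc; mc = tmp) {
			tmp = mc->next;
			model_mcast_free(mc);
		}
	}
	return leaked;
}
```

With free_entries set to 0 every entry on the remove list stays
allocated after the walk; with it set to 1 nothing is left behind. In
the real driver a leaked entry may also pin other resources, which is
how a leak of this kind could surface as a warning at teardown.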

* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
       [not found]             ` <alpine.DEB.2.20.1510111848560.10812-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-10-12  7:53               ` Sagi Grimberg
  2015-10-12 12:35               ` Doug Ledford
  1 sibling, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2015-10-12  7:53 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Jason Gunthorpe, Erez Shitrit


> The following fixup patch is needed:

Hey Christoph,

Thanks for the quick patch. When you re-spin this as
a proper patch you can add my:

Tested-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
       [not found]             ` <alpine.DEB.2.20.1510111848560.10812-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  2015-10-12  7:53               ` Sagi Grimberg
@ 2015-10-12 12:35               ` Doug Ledford
  1 sibling, 0 replies; 8+ messages in thread
From: Doug Ledford @ 2015-10-12 12:35 UTC (permalink / raw)
  To: Christoph Lameter, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Jason Gunthorpe, Erez Shitrit

On 10/11/2015 07:49 PM, Christoph Lameter wrote:
> On Sun, 11 Oct 2015, Sagi Grimberg wrote:
> 
>> Is someone looking at this? It really should be fixed before 4.3
>> final...
> 
> The following fixup patch is needed:

Thanks Christoph.  I figured the issue had to have come from the new
code, but I hadn't had a chance to track it down yet.


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD




end of thread, other threads:[~2015-10-12 12:35 UTC | newest]

Thread overview: 8+ messages
2015-10-07 15:51 Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug Sagi Grimberg
     [not found] ` <56153F71.2010801-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-10-07 16:11   ` Doug Ledford
     [not found]     ` <5615442D.2020007-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-10-11 15:51       ` Sagi Grimberg
     [not found]         ` <561A8591.4020608-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-10-11 23:49           ` Christoph Lameter
     [not found]             ` <alpine.DEB.2.20.1510111848560.10812-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-10-12  7:53               ` Sagi Grimberg
2015-10-12 12:35               ` Doug Ledford
2015-10-07 16:22   ` santosh.shilimkar-QHcLZuEGTsvQT0dZR+AlfA
2015-10-07 19:12   ` Or Gerlitz
