* Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
@ 2015-10-07 15:51 Sagi Grimberg
From: Sagi Grimberg @ 2015-10-07 15:51 UTC
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Jason Gunthorpe, Erez Shitrit, Doug Ledford
This started popping up (not sure if it's new to 4.3-rc1).
Happens when unloading the provider driver (mlx4/mlx5 in my case).
Has anyone seen this?
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 2 PID: 6012 at drivers/infiniband/core/verbs.c:283
ib_dealloc_pd+0x5b/0xa0 [ib_core]()
kernel: Modules linked in: rpcrdma ib_srp scsi_transport_srp ib_iser
rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_umad ib_uverbs ib_ipoib
ib_cm mlx4_ib ib_sa ib_mad mlx4_core mlx5_ib(-) mlx5_core ib_core
ib_addr mst_pciconf(O) mst_pci(O) nfsv3 nfs af_packet coretemp
x86_pkg_temp_thermal crct10dif_pclmul crc32c_intel aesni_intel
aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd microcode
ipmi_ssif pcspkr lpc_ich i2c_i801 mfd_core ioatdma wmi ipmi_si
ipmi_msghandler processor button nfsd auth_rpcgss oid_registry nfs_acl
lockd grace sunrpc ip_tables x_tables ext4 crc16 mbcache jbd2 sd_mod
hid_generic usbhid hid ahci libahci libata igb ehci_pci hwmon ehci_hcd
ptp usbcore pps_core scsi_mod i2c_algo_bit usb_common i2c_core dca
autofs4 [last unloaded: mlx4_core]
kernel: CPU: 2 PID: 6012 Comm: modprobe Tainted: G O L
4.3.0-rc3-debug+ #67
kernel: Hardware name: Supermicro SYS-1027R-WRF/X9DRW, BIOS 3.0a 08/08/2013
kernel: 000000000000011b ffff8807a99afbe8 ffffffff8129915b 0000000000000009
kernel: 0000000000000000 ffff8807a99afc28 ffffffff810752b5 ffff880827d7c2a0
kernel: ffff8807b0d03260 ffff880827d7c2a0 ffff880827d7cc60 0000000000000000
kernel: Call Trace:
kernel: [<ffffffff8129915b>] dump_stack+0x4f/0x74
kernel: [<ffffffff810752b5>] warn_slowpath_common+0x95/0xe0
kernel: [<ffffffff8107531a>] warn_slowpath_null+0x1a/0x20
kernel: [<ffffffffa001bd4b>] ib_dealloc_pd+0x5b/0xa0 [ib_core]
kernel: [<ffffffffa047adce>] ipoib_transport_dev_cleanup+0x9e/0xf0
[ib_ipoib]
kernel: [<ffffffffa047712e>] ipoib_ib_dev_cleanup+0x5e/0x80 [ib_ipoib]
kernel: [<ffffffffa0473984>] ipoib_dev_cleanup+0x2a4/0x3b0 [ib_ipoib]
kernel: [<ffffffff8107a11d>] ? __local_bh_enable_ip+0x6d/0xd0
kernel: [<ffffffffa0473a9e>] ipoib_uninit+0xe/0x10 [ib_ipoib]
kernel: [<ffffffff8141ba17>] rollback_registered_many+0x1a7/0x2c0
kernel: [<ffffffff8141bbd1>] rollback_registered+0x31/0x40
kernel: [<ffffffff8141bc38>] unregister_netdevice_queue+0x58/0xb0
kernel: [<ffffffff8141be00>] unregister_netdev+0x20/0x30
kernel: [<ffffffffa04721a1>] ipoib_remove_one+0xa1/0xe0 [ib_ipoib]
kernel: [<ffffffffa001e0d1>] ib_unregister_device+0xc1/0x160 [ib_core]
kernel: [<ffffffffa05231f9>] mlx5_ib_remove+0x19/0x50 [mlx5_ib]
kernel: [<ffffffffa04e5068>] mlx5_remove_device+0x68/0x80 [mlx5_core]
kernel: [<ffffffffa04e50be>] mlx5_unregister_interface+0x3e/0x70
[mlx5_core]
kernel: [<ffffffffa053397c>] mlx5_ib_cleanup+0x10/0x694 [mlx5_ib]
kernel: [<ffffffff810f67aa>] SyS_delete_module+0x17a/0x1c0
kernel: [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
kernel: [<ffffffff811e80b0>] ? generic_show_options+0x180/0x180
kernel: [<ffffffff8151a1f2>] entry_SYSCALL_64_fastpath+0x12/0x76
kernel: ---[ end trace 31339c7283574ccb ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
From: Doug Ledford @ 2015-10-07 16:11 UTC
To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Jason Gunthorpe, Erez Shitrit
On 10/07/2015 11:51 AM, Sagi Grimberg wrote:
> This started popping up (not sure if it's new to 4.3-rc1).
>
> Happens when unloading the provider driver (mlx4/mlx5 in my case).
>
> Has anyone seen this?
>
> kernel: ------------[ cut here ]------------
> kernel: WARNING: CPU: 2 PID: 6012 at drivers/infiniband/core/verbs.c:283
> ib_dealloc_pd+0x5b/0xa0 [ib_core]()
> [...]
Yes. I'm seeing this too. The last time this popped up I fixed it by
adding the code for reaping ahs. I suspect that the new code to timeout
sendonly multicast joins combined with us now creating and joining what
used to be sendonly groups is the likely culprit here.
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: 0E572FDD
* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
From: Sagi Grimberg @ 2015-10-11 15:51 UTC
To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Jason Gunthorpe, Erez Shitrit, Christoph Lameter
> Yes. I'm seeing this too. The last time this popped up I fixed it by
> adding the code for reaping ahs. I suspect that the new code to timeout
> sendonly multicast joins combined with us now creating and joining what
> used to be sendonly groups is the likely culprit here.
>
Is someone looking at this? It really should be fixed before 4.3
final...
* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
From: Christoph Lameter @ 2015-10-11 23:49 UTC
To: Sagi Grimberg
Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jason Gunthorpe, Erez Shitrit
On Sun, 11 Oct 2015, Sagi Grimberg wrote:
> Is someone looking at this? It really should be fixed before 4.3
> final...
The following fixup patch is needed:

Subject: ipoib: For sendonly join free the multicast group on leave

When we leave the multicast group on expiration of a neighbor we
do not free the mcast structure. This results in a memory leak.

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

Index: linux/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- linux.orig/drivers/infiniband/ulp/ipoib/ipoib.h
+++ linux/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -495,6 +495,7 @@ void ipoib_dev_cleanup(struct net_device
 void ipoib_mcast_join_task(struct work_struct *work);
 void ipoib_mcast_carrier_on_task(struct work_struct *work);
 void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb);
+void ipoib_mcast_free(struct ipoib_mcast *mc);
 
 void ipoib_mcast_restart_task(struct work_struct *work);
 int ipoib_mcast_start_thread(struct net_device *dev);
Index: linux/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- linux.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1207,8 +1207,10 @@ static void __ipoib_reap_neigh(struct ip
 
 out_unlock:
 	spin_unlock_irqrestore(&priv->lock, flags);
-	list_for_each_entry_safe(mcast, tmcast, &remove_list, list)
+	list_for_each_entry_safe(mcast, tmcast, &remove_list, list) {
 		ipoib_mcast_leave(dev, mcast);
+		ipoib_mcast_free(mcast);
+	}
 }
 
 static void ipoib_reap_neigh(struct work_struct *work)
Index: linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- linux.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -106,7 +106,7 @@ static void __ipoib_mcast_schedule_join_
 	queue_delayed_work(priv->wq, &priv->mcast_task, 0);
 }
 
-static void ipoib_mcast_free(struct ipoib_mcast *mcast)
+void ipoib_mcast_free(struct ipoib_mcast *mcast)
 {
 	struct net_device *dev = mcast->dev;
 	int tx_dropped = 0;
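The leave-then-free loop in the ipoib_main.c hunk can be illustrated outside the kernel. What follows is only a userspace sketch, not the IPoIB code: the `demo_*` names, the singly linked list, and the `demo_live_entries` counter are all hypothetical stand-ins. It shows the two points the patch relies on: each reaped entry is freed after it is left, and the next pointer is fetched before the current entry is freed (the job list_for_each_entry_safe does in the kernel).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a multicast group entry. */
struct demo_mcast {
	struct demo_mcast *next;
	int joined;
};

/* Counts entries allocated but not yet freed, so a leak is observable. */
static int demo_live_entries;

static struct demo_mcast *demo_mcast_alloc(struct demo_mcast *next)
{
	struct demo_mcast *mc = malloc(sizeof(*mc));

	if (!mc)
		return NULL;
	mc->next = next;
	mc->joined = 1;
	demo_live_entries++;
	return mc;
}

/* Leaving only changes group state; it does not release the memory. */
static void demo_mcast_leave(struct demo_mcast *mc)
{
	mc->joined = 0;
}

static void demo_mcast_free(struct demo_mcast *mc)
{
	demo_live_entries--;
	free(mc);
}

/* Leave and free every entry on remove_list; returns how many were reaped. */
static int demo_reap(struct demo_mcast *remove_list)
{
	struct demo_mcast *mc, *tmp;
	int reaped = 0;

	for (mc = remove_list; mc; mc = tmp) {
		tmp = mc->next;		/* fetch next before mc is freed */
		assert(mc->joined);	/* entries on the list are still joined */
		demo_mcast_leave(mc);
		demo_mcast_free(mc);	/* without this call the entry leaks */
		reaped++;
	}
	return reaped;
}
```

Building a three-entry list with `demo_mcast_alloc()` and passing it to `demo_reap()` should leave `demo_live_entries` at zero; dropping the `demo_mcast_free()` call leaves it at three, which is the kind of leftover state the thread is chasing.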
* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
From: Sagi Grimberg @ 2015-10-12 7:53 UTC
To: Christoph Lameter
Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jason Gunthorpe, Erez Shitrit
> The following fixup patch is needed:
>
> Subject: ipoib: For sendonly join free the multicast group on leave
>
> When we leave the multicast group on expiration of a neighbor we
> do not free the mcast structure. This results in a memory leak.
>
> Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
> [...]
Hey Christoph,

Thanks for the quick patch.

When you re-spin this as a proper patch you can add my:
Tested-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
From: Doug Ledford @ 2015-10-12 12:35 UTC
To: Christoph Lameter, Sagi Grimberg
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jason Gunthorpe, Erez Shitrit
On 10/11/2015 07:49 PM, Christoph Lameter wrote:
> On Sun, 11 Oct 2015, Sagi Grimberg wrote:
>
>> Is someone looking at this? It really should be fixed before 4.3
>> final...
>
> The following fixup patch is needed:
Thanks Christoph. I figured the issue had to have come from the new
code, but I hadn't had a chance to track it down yet.
>
> Subject: ipoib: For sendonly join free the multicast group on leave
>
> When we leave the multicast group on expiration of a neighbor we
> do not free the mcast structure. This results in a memory leak.
>
> Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
> [...]
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: 0E572FDD
* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
From: santosh.shilimkar-QHcLZuEGTsvQT0dZR+AlfA @ 2015-10-07 16:22 UTC
To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Jason Gunthorpe, Erez Shitrit, Doug Ledford
Sagi,

On 10/7/15 8:51 AM, Sagi Grimberg wrote:
> This started popping up (not sure if it's new to 4.3-rc1).
>
> Happens when unloading the provider driver (mlx4/mlx5 in my case).
>
> Has anyone seen this?
>
Not sure if this is useful, but yes, I have seen a similar dump with RDS
on 4.3-rc1. I later found that the RDS code had MR leak(s) in normal
operation, which led to WARNs on module cleanup. I believe the leaks
led to the pd 'usecnt' getting messed up. Once I avoided that, I stopped
seeing it.

> kernel: ------------[ cut here ]------------
> kernel: WARNING: CPU: 2 PID: 6012 at drivers/infiniband/core/verbs.c:283
> ib_dealloc_pd+0x5b/0xa0 [ib_core]()
> [...]
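The mechanism described in this reply (a leaked child object leaves the PD's use count non-zero, and the dealloc path then warns) can be sketched in userspace. This is an illustration under stated assumptions, not the ib_core implementation: the `demo_*` types and functions are hypothetical, and the sketch assumes the warning in question is a check on a non-zero use count, as the reply suggests.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a protection domain. */
struct demo_pd {
	int usecnt;	/* number of child objects still referencing the PD */
};

/* Hypothetical stand-in for an object allocated on the PD (an AH, an MR, ...). */
struct demo_ah {
	struct demo_pd *pd;
};

static struct demo_ah *demo_create_ah(struct demo_pd *pd)
{
	struct demo_ah *ah = malloc(sizeof(*ah));

	if (!ah)
		return NULL;
	ah->pd = pd;
	pd->usecnt++;		/* the child takes a reference on its parent */
	return ah;
}

static void demo_destroy_ah(struct demo_ah *ah)
{
	assert(ah->pd->usecnt > 0);
	ah->pd->usecnt--;	/* drop the reference before freeing */
	free(ah);
}

/* Returns the number of leaked references; 0 means a clean teardown. */
static int demo_dealloc_pd(struct demo_pd *pd)
{
	if (pd->usecnt)
		fprintf(stderr, "WARNING: dealloc with usecnt=%d\n", pd->usecnt);
	return pd->usecnt;
}
```

Creating two children and destroying only one makes `demo_dealloc_pd()` report one outstanding reference; destroying the second brings it back to zero. The point of the sketch is that the warning fires at teardown, far from the code path that actually leaked the object.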
* Re: Seeing WARN_ON in ib_dealloc_pd from ipoib in kernel 4.3-rc1-debug
From: Or Gerlitz @ 2015-10-07 19:12 UTC
To: Sagi Grimberg
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jason Gunthorpe, Erez Shitrit, Doug Ledford
On Wed, Oct 7, 2015 at 6:51 PM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> This started popping up (not sure if it's new to 4.3-rc1).
> Happens when unloading the provider driver (mlx4/mlx5 in my case).
> Has anyone seen this?
Yes, I think I have been seeing it over the last 1-2 years.

Or.