From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steve Wise
Subject: Re: [PATCH] iw_cm: reject connect requests if cmid is not in LISTEN
Date: Thu, 23 Feb 2012 13:55:13 -0600
Message-ID: <4F4699A1.7030402@opengridcomputing.com>
References: <20120222214307.23921.83903.stgit@build.ogc.int> <4F465A46.3060301@opengridcomputing.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <4F465A46.3060301-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Roland Dreier
Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: linux-rdma@vger.kernel.org

On 02/23/2012 09:24 AM, Steve Wise wrote:
> On 02/23/2012 01:46 AM, Roland Dreier wrote:
>> On Wed, Feb 22, 2012 at 1:43 PM, Steve Wise wrote:
>>> diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
>>> index 1a696f7..6847d76 100644
>>> --- a/drivers/infiniband/core/iwcm.c
>>> +++ b/drivers/infiniband/core/iwcm.c
>>> @@ -631,6 +631,8 @@ static void cm_conn_req_handler(struct iwcm_id_private *listen_id_priv,
>>>  	spin_lock_irqsave(&listen_id_priv->lock, flags);
>>>  	if (listen_id_priv->state != IW_CM_STATE_LISTEN) {
>>>  		spin_unlock_irqrestore(&listen_id_priv->lock, flags);
>>> +		iw_cm_reject(cm_id, NULL, 0);
>>> +		iw_destroy_cm_id(cm_id);
>>>  		goto out;
>>>  	}
>>>  	spin_unlock_irqrestore(&listen_id_priv->lock, flags);
>> Thanks, this makes more sense to my brain at least.
>>
>
> Yes, this is the best fix methinks. Thanks for the review!
>
>> I assume this works just as well in your testing? :)
>
> Yes, I've run some large NP MPI tests that tickle this condition and all the connections get cleaned up now. I also
> ran some other MPI regression tests with this fix.
>

Hrm. I just hit this after more testing. Debugging now. Just hold off on this patch until I root cause this.

Unable to handle kernel paging request at 0000000000200200 RIP:
 [<0000000000200200>]
PGD 183c984067 PUD 0
Oops: 0010 [1] SMP
last sysfs file: /class/infiniband/cxgb4_0/node_guid
CPU 10
Modules linked in: nfs fscache nfs_acl cxgb3(U) iw_cxgb4(U) kretprobes(U) autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc be2iscsi iscsi_tcp bnx2i cnic uio libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6 xfrm_nalgo crypto_api ib_uverbs(U) ib_umad(U) iw_nes(U) ib_qib(U) dca mlx4_ib(U) mlx4_en(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev cxgb4(U) tpm_tis tpm e1000e tpm_bios sr_mod shpchp i7core_edac edac_mc cdrom i2c_i801 i2c_core serio_raw 8021q sg pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 5708, comm: iw_cm_wq Tainted: G 2.6.18-238.el5 #1
RIP: 0010:[<0000000000200200>] [<0000000000200200>]
RSP: 0018:ffff81183e0cfcf8 EFLAGS: 00010097
RAX: ffff810c3cf3ca58 RBX: 0c30100000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff81012aad6a58
RBP: ffff81183e0cfd30 R08: ffff81012aad6a70 R09: 0000000000000282
R10: 0000000000000000 R11: 0000000000000280 R12: 0000000000000000
R13: 0000000000003c15 R14: ffff810c3cf3ca50 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff810c6a3c42c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000200200 CR3: 0000000c3d5a4000 CR4: 00000000000006e0
Process iw_cm_wq (pid: 5708, threadinfo ffff81183e0ce000, task ffff810c3ea79080)
Stack: ffffffff8008c846 0000000300000000 ffff810c3cf3ca50 0000000000000000
 0000000000000000 0000000000000282 0000000000000003 ffff81183e0cfd70
 ffffffff8002e261 0000000000000000 ffff810c3cf3c9c0 ffff810c3cf3c900
Call Trace:
 [] __wake_up_common+0x3e/0x68
 [] __wake_up+0x38/0x4f
 [] :iw_cm:iw_cm_reject+0x5a/0xa7
 [] :iw_cm:cm_work_handler+0x15e/0x424
 [] :iw_cm:cm_work_handler+0x0/0x424
 [] run_workqueue+0x99/0xf6
 [] worker_thread+0x0/0x122
 [] keventd_create_kthread+0x0/0xc4
 [] worker_thread+0xf0/0x122
 [] default_wake_function+0x0/0xe
 [] keventd_create_kthread+0x0/0xc4
 [] kthread+0xfe/0x132
 [] child_rip+0xa/0x11
 [] keventd_create_kthread+0x0/0xc4
 [] kthread+0x0/0x132
 [] child_rip+0x0/0x11

Code: Bad RIP value.
RIP [<0000000000200200>]
 RSP
crash>