From: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] iw_cm: reject connect requests if cmid is not in LISTEN
Date: Thu, 23 Feb 2012 13:55:13 -0600
Message-ID: <4F4699A1.7030402@opengridcomputing.com>
In-Reply-To: <4F465A46.3060301-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>

On 02/23/2012 09:24 AM, Steve Wise wrote:
> On 02/23/2012 01:46 AM, Roland Dreier wrote:
>> On Wed, Feb 22, 2012 at 1:43 PM, Steve Wise<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>  wrote:
>>> diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
>>> index 1a696f7..6847d76 100644
>>> --- a/drivers/infiniband/core/iwcm.c
>>> +++ b/drivers/infiniband/core/iwcm.c
>>> @@ -631,6 +631,8 @@ static void cm_conn_req_handler(struct iwcm_id_private *listen_id_priv,
>>>         spin_lock_irqsave(&listen_id_priv->lock, flags);
>>>         if (listen_id_priv->state != IW_CM_STATE_LISTEN) {
>>>                 spin_unlock_irqrestore(&listen_id_priv->lock, flags);
>>> +               iw_cm_reject(cm_id, NULL, 0);
>>> +               iw_destroy_cm_id(cm_id);
>>>                 goto out;
>>>         }
>>>         spin_unlock_irqrestore(&listen_id_priv->lock, flags);
>> Thanks, this makes more sense to my brain at least.
>>
>
> Yes, this is the best fix methinks.  Thanks for the review!
>
>> I assume this works just as well in your testing? :)
>
> Yes, I've run some large NP MPI tests that tickle this condition and all the connections get cleaned up now.  I also 
> ran some other MPI regression tests with this fix.
>

Hrm.  I just hit this after more testing.  Debugging now.  Please hold off on this patch until I root-cause this.


Unable to handle kernel paging request at 0000000000200200 RIP:
  [<0000000000200200>]
PGD 183c984067 PUD 0
Oops: 0010 [1] SMP
last sysfs file: /class/infiniband/cxgb4_0/node_guid
CPU 10
Modules linked in: nfs fscache nfs_acl cxgb3(U) iw_cxgb4(U) kretprobes(U) autofs4 hidp rfcomm l2cap bluetooth lockd 
sunrpc be2iscsi iscsi_tcp bnx2i cnic uio libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi rdma_ucm(U) 
ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6 xfrm_nalgo crypto_api 
ib_uverbs(U) ib_umad(U) iw_nes(U) ib_qib(U) dca mlx4_ib(U) mlx4_en(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) 
dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi 
acpi_memhotplug ac parport_pc lp parport joydev cxgb4(U) tpm_tis tpm e1000e tpm_bios sr_mod shpchp i7core_edac edac_mc 
cdrom i2c_i801 i2c_core serio_raw 8021q sg pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci 
libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 5708, comm: iw_cm_wq Tainted: G      2.6.18-238.el5 #1
RIP: 0010:[<0000000000200200>]  [<0000000000200200>]
RSP: 0018:ffff81183e0cfcf8  EFLAGS: 00010097
RAX: ffff810c3cf3ca58 RBX: 0c30100000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff81012aad6a58
RBP: ffff81183e0cfd30 R08: ffff81012aad6a70 R09: 0000000000000282
R10: 0000000000000000 R11: 0000000000000280 R12: 0000000000000000
R13: 0000000000003c15 R14: ffff810c3cf3ca50 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff810c6a3c42c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000200200 CR3: 0000000c3d5a4000 CR4: 00000000000006e0
Process iw_cm_wq (pid: 5708, threadinfo ffff81183e0ce000, task ffff810c3ea79080)
Stack:  ffffffff8008c846 0000000300000000 ffff810c3cf3ca50 0000000000000000
  0000000000000000 0000000000000282 0000000000000003 ffff81183e0cfd70
  ffffffff8002e261 0000000000000000 ffff810c3cf3c9c0 ffff810c3cf3c900
Call Trace:
  [<ffffffff8008c846>] __wake_up_common+0x3e/0x68
  [<ffffffff8002e261>] __wake_up+0x38/0x4f
  [<ffffffff8867410b>] :iw_cm:iw_cm_reject+0x5a/0xa7
  [<ffffffff88674baa>] :iw_cm:cm_work_handler+0x15e/0x424
  [<ffffffff88674a4c>] :iw_cm:cm_work_handler+0x0/0x424
  [<ffffffff8004d7ae>] run_workqueue+0x99/0xf6
  [<ffffffff80049ff6>] worker_thread+0x0/0x122
  [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
  [<ffffffff8004a0e6>] worker_thread+0xf0/0x122
  [<ffffffff8008e40a>] default_wake_function+0x0/0xe
  [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
  [<ffffffff80032974>] kthread+0xfe/0x132
  [<ffffffff8005dfb1>] child_rip+0xa/0x11
  [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
  [<ffffffff80032876>] kthread+0x0/0x132
  [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code:  Bad RIP value.
RIP  [<0000000000200200>]
  RSP <ffff81183e0cfcf8>
crash>
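
A note on reading the oops above: the faulting address 0x0000000000200200 matches LIST_POISON2, the value that list_del() writes into the prev pointer of a removed list entry in kernels of this era. That is consistent with, but does not prove, code touching a list or wait-queue entry after it was unlinked or freed; the message does not state a root cause. The sketch below is a userspace re-creation of that poisoning behavior, assuming the poison constants from include/linux/poison.h. The helper prev_after_del() is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Poison values as defined in include/linux/poison.h of 2.6-era kernels. */
#define LIST_POISON1 ((void *)0x00100100)
#define LIST_POISON2 ((void *)0x00200200)

/* Minimal userspace copy of the kernel's doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert new_entry right after head. */
static void list_add(struct list_head *new_entry, struct list_head *head)
{
    new_entry->next = head->next;
    new_entry->prev = head;
    head->next->prev = new_entry;
    head->next = new_entry;
}

/* Same shape as the kernel's list_del(): unlink, then poison both pointers
 * so that any later use of the stale entry faults at a recognizable address. */
static void list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
}

/* Hypothetical helper: returns the prev pointer of an entry after deletion. */
uintptr_t prev_after_del(void)
{
    struct list_head head, entry;

    list_init(&head);
    list_add(&entry, &head);
    list_del(&entry);

    /* The list itself is empty and intact again. */
    assert(head.next == &head && head.prev == &head);

    return (uintptr_t)entry.prev;
}
```

If a stale entry like this were later treated as live data, a pointer loaded from it would carry the poison value, which is one way an address such as the RIP and CR2 values in the dump could arise.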



