From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pradeep Satyanarayana
Subject: dat_ep_disconnect() with ABRUPT
Date: Thu, 18 Nov 2010 09:40:28 -0800
Message-ID: <4CE5650C.1090706@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: "Davis, Arlin R"
Cc: linux-rdma
List-Id: linux-rdma@vger.kernel.org

Hi Arlin,

We are seeing some issues with dat_ep_disconnect() with ABRUPT flag. In fact
it appears that the ABRUPT flag seems to behave like the GRACEFUL flag. One
difference between the DAT1.2 and DAT2.0 appears to be the following:

In dapls_ib_disconnect()

    /* ABRUPT close, wait for callback and DISCONNECTED state */
    if (close_flags == DAT_CLOSE_ABRUPT_FLAG) {
        dapl_os_lock(&ep_ptr->header.lock);
        while (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
            dapl_os_unlock(&ep_ptr->header.lock);
            dapl_os_sleep_usec(10000);
            dapl_os_lock(&ep_ptr->header.lock);
        }
        dapl_os_unlock(&ep_ptr->header.lock);
    }

this loop exists in DAT2.0 and has been removed in DAT1.2. I am not sure why
this is leading to different behaviors in DAT1.2 and DAT2.0.

One thought is that both DAT1.2 and DAT2.0 have a missing check for ABRUPT
flag in dapl_ep_disconnect().

    if ( ep_ptr->param.ep_state == DAT_EP_STATE_ACTIVE_CONNECTION_PENDING ||
         ep_ptr->param.ep_state == DAT_EP_STATE_COMPLETION_PENDING )
    {
        /*
         * Beginning or waiting on a connection: abort and reset the
         * state
         */
        ep_ptr->param.ep_state = DAT_EP_STATE_DISCONNECTED;
        dapl_os_unlock ( &ep_ptr->header.lock );

        /* disconnect and make sure we get no callbacks */
        (void) dapls_ib_disconnect (ep_ptr, DAT_CLOSE_ABRUPT_FLAG);

        /* clean up connection state */
        dapl_sp_remove_ep (ep_ptr);

        evd_ptr = (DAPL_EVD *) ep_ptr->param.connect_evd_handle;
        dapls_evd_post_connection_event (evd_ptr,
                                         DAT_CONNECTION_EVENT_DISCONNECTED,
                                         (DAT_HANDLE) ep_ptr,
                                         0,
                                         0);
        dat_status = DAT_SUCCESS;
        goto bail;
    }

The if condition above should also check for disconnect_flags == ABRUPT.
If the EP is in a CONNECTED state, the remote end crashes, and this node
calls dat_ep_disconnect() with ABRUPT, the condition is false and the call
is treated as though it were a GRACEFUL disconnect.

If you agree with this assessment, I can build a patch.

Thanks,
Pradeep
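
The additional check proposed above for dapl_ep_disconnect() could be sketched roughly as follows. This is only an illustration of the condition, not a patch: the enum and function names (sketch_ep_state, sketch_close_flags, takes_abort_path_*) are hypothetical stand-ins for the DAT types, and whether the existing abort path is safe as-is for an EP that is still CONNECTED has not been verified here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the DAT endpoint states and close flags.
 * The real definitions live in the DAT headers and are not reproduced here. */
typedef enum {
    SKETCH_EP_STATE_ACTIVE_CONNECTION_PENDING,
    SKETCH_EP_STATE_COMPLETION_PENDING,
    SKETCH_EP_STATE_CONNECTED,
    SKETCH_EP_STATE_DISCONNECTED
} sketch_ep_state;

typedef enum {
    SKETCH_CLOSE_ABRUPT_FLAG,
    SKETCH_CLOSE_GRACEFUL_FLAG
} sketch_close_flags;

/* Condition as quoted in this mail: only the two pending states take the
 * abort-and-reset path; the caller's disconnect flags are not consulted. */
static bool takes_abort_path_current(sketch_ep_state state,
                                     sketch_close_flags flags)
{
    (void) flags; /* ignored by the existing condition */
    return state == SKETCH_EP_STATE_ACTIVE_CONNECTION_PENDING ||
           state == SKETCH_EP_STATE_COMPLETION_PENDING;
}

/* Proposed condition: additionally take the abort path whenever the caller
 * asked for an ABRUPT disconnect, regardless of the endpoint state. */
static bool takes_abort_path_proposed(sketch_ep_state state,
                                      sketch_close_flags flags)
{
    return takes_abort_path_current(state, flags) ||
           flags == SKETCH_CLOSE_ABRUPT_FLAG;
}
```

With the existing condition, a CONNECTED endpoint disconnected with the ABRUPT flag does not take the abort path; with the proposed one it does, while a GRACEFUL disconnect of a CONNECTED endpoint is unaffected.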