linux-nfs.vger.kernel.org archive mirror
* [PATCH] Do not hold clnt_fd_lock mutex during connect
@ 2016-05-18 17:54 Paulo Andrade
  2016-05-19  3:43 ` [Libtirpc-devel] " Ian Kent
  2016-05-19  5:19 ` Ian Kent
  0 siblings, 2 replies; 4+ messages in thread
From: Paulo Andrade @ 2016-05-18 17:54 UTC
  To: libtirpc-devel; +Cc: linux-nfs, Paulo Andrade

  A user reports that their application connects to multiple servers
through an RPC interface using libtirpc. When one of the servers misbehaves
(goes down ungracefully or has a delay of a few seconds in the traffic
flow), traffic from the client to the other servers is also affected by
the anomaly on the failing server, i.e. traffic decreases or drops to 0
on all the servers.

  Further investigation of libtirpc's behavior at the time of the issue
showed that all of the application threads interacting with libtirpc were
blocked on one single lock inside the library. This was a race condition
which had resulted in a deadlock, hence the dip or stoppage of traffic.

  As an experiment, the user removed libtirpc from the application build
and used the standard glibc RPC implementation instead. In that case,
everything worked perfectly even while server nodes were misbehaving.

  This patch releases clnt_fd_lock before the getpeername() and connect()
calls in clnt_vc_create(), so that a slow connect to one server should no
longer block threads that are talking to other servers.

Signed-off-by: Paulo Andrade <pcpa@gnu.org>
---
 src/clnt_vc.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/src/clnt_vc.c b/src/clnt_vc.c
index a72f9f7..2396f34 100644
--- a/src/clnt_vc.c
+++ b/src/clnt_vc.c
@@ -229,27 +229,23 @@ clnt_vc_create(fd, raddr, prog, vers, sendsz, recvsz)
 	} else
 		assert(vc_cv != (cond_t *) NULL);
 
-	/*
-	 * XXX - fvdl connecting while holding a mutex?
-	 */
+	mutex_unlock(&clnt_fd_lock);
+
 	slen = sizeof ss;
 	if (getpeername(fd, (struct sockaddr *)&ss, &slen) < 0) {
 		if (errno != ENOTCONN) {
 			rpc_createerr.cf_stat = RPC_SYSTEMERROR;
 			rpc_createerr.cf_error.re_errno = errno;
-			mutex_unlock(&clnt_fd_lock);
 			thr_sigsetmask(SIG_SETMASK, &(mask), NULL);
 			goto err;
 		}
 		if (connect(fd, (struct sockaddr *)raddr->buf, raddr->len) < 0){
 			rpc_createerr.cf_stat = RPC_SYSTEMERROR;
 			rpc_createerr.cf_error.re_errno = errno;
-			mutex_unlock(&clnt_fd_lock);
 			thr_sigsetmask(SIG_SETMASK, &(mask), NULL);
 			goto err;
 		}
 	}
-	mutex_unlock(&clnt_fd_lock);
 	if (!__rpc_fd2sockinfo(fd, &si))
 		goto err;
 	thr_sigsetmask(SIG_SETMASK, &(mask), NULL);
-- 
1.8.3.1
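
The stall described in the commit message can be illustrated outside of
libtirpc. The following is a minimal sketch, not the library's code: fd_lock
is a hypothetical stand-in for clnt_fd_lock, fake_connect() replaces a slow
connect() with a 300 ms sleep, and wait_for_lock_ms() is an illustrative
helper that measures how long a second thread waits for the shared lock.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for clnt_fd_lock; this is not libtirpc code. */
static pthread_mutex_t fd_lock = PTHREAD_MUTEX_INITIALIZER;

/* Handshake so that timing starts only once the slow client is under way. */
static pthread_mutex_t start_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t start_cv = PTHREAD_COND_INITIALIZER;
static int started;

/* Simulates a connect() to a misbehaving server by sleeping 300 ms. */
static void fake_connect(void)
{
	struct timespec ts = { 0, 300L * 1000L * 1000L };

	nanosleep(&ts, NULL);
}

static void announce_start(void)
{
	pthread_mutex_lock(&start_lock);
	started = 1;
	pthread_cond_signal(&start_cv);
	pthread_mutex_unlock(&start_lock);
}

/* arg points to an int: nonzero keeps fd_lock held across fake_connect(). */
static void *slow_client(void *arg)
{
	int hold = *(int *)arg;

	pthread_mutex_lock(&fd_lock);
	/* Shared bookkeeping that really needs the lock would go here. */
	if (!hold)
		pthread_mutex_unlock(&fd_lock);	/* patched ordering */
	announce_start();
	fake_connect();
	if (hold)
		pthread_mutex_unlock(&fd_lock);	/* pre-patch ordering */
	return NULL;
}

/*
 * Returns how many milliseconds the calling thread waited to take fd_lock
 * while slow_client() was inside fake_connect(), or -1 on thread creation
 * failure.
 */
long wait_for_lock_ms(int hold)
{
	pthread_t tid;
	struct timespec t0, t1;

	pthread_mutex_lock(&start_lock);
	started = 0;
	pthread_mutex_unlock(&start_lock);

	if (pthread_create(&tid, NULL, slow_client, &hold) != 0)
		return -1;

	/* Wait until the slow client has reached its blocking call. */
	pthread_mutex_lock(&start_lock);
	while (!started)
		pthread_cond_wait(&start_cv, &start_lock);
	pthread_mutex_unlock(&start_lock);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	pthread_mutex_lock(&fd_lock);
	pthread_mutex_unlock(&fd_lock);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	pthread_join(tid, NULL);
	return (t1.tv_sec - t0.tv_sec) * 1000L +
	       (t1.tv_nsec - t0.tv_nsec) / 1000000L;
}
```

Under these assumptions, wait_for_lock_ms(1) should report a wait close to
the simulated connect delay, while wait_for_lock_ms(0) should return almost
immediately, because the lock is dropped before the blocking call. Build
with -pthread where the platform requires it.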



end of thread, other threads:[~2016-05-19 23:53 UTC | newest]

Thread overview: 4+ messages
2016-05-18 17:54 [PATCH] Do not hold clnt_fd_lock mutex during connect Paulo Andrade
2016-05-19  3:43 ` [Libtirpc-devel] " Ian Kent
2016-05-19  5:19 ` Ian Kent
2016-05-19 23:53   ` Ian Kent
