From: Ilya Maximets
Subject: Re: [PATCH] vhost: fix connect hang in client mode
Date: Thu, 21 Jul 2016 15:10:15 +0300
Message-ID: <5790BBA7.6070202@samsung.com>
References: <1469089275-15209-1-git-send-email-i.maximets@samsung.com>
 <20160721093714.GD28708@yliu-dev.sh.intel.com>
 <579099BC.9050603@samsung.com>
 <20160721101311.GE28708@yliu-dev.sh.intel.com>
 <5790A5D4.1090703@samsung.com>
 <5790AEB3.2010708@samsung.com>
 <20160721114016.GF28708@yliu-dev.sh.intel.com>
In-reply-to: <20160721114016.GF28708@yliu-dev.sh.intel.com>
To: Yuanhan Liu
Cc: dev@dpdk.org, Huawei Xie, Dyasly Sergey, Heetae Ahn, Thomas Monjalon
List-Id: patches and discussions about DPDK

On 21.07.2016 14:40, Yuanhan Liu wrote:
> On Thu, Jul 21, 2016 at 02:14:59PM +0300, Ilya Maximets wrote:
>>> Hmm, how about this fixup:
>>> ------------------------------------------------------------------------------
>>> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
>>> index 8626d13..b0f45e6 100644
>>> --- a/lib/librte_vhost/vhost_user/vhost-net-user.c
>>> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
>>> @@ -537,18 +537,7 @@ vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
>>>          errno = EINVAL;
>>>
>>>      ret = connect(fd, un, sz);
>>> -    if (ret == -1 && errno != EINPROGRESS)
>>> -        return -1;
>>> -    if (ret == 0)
>>> -        goto connected;
>>> -
>>> -    FD_ZERO(&fdset);
>>> -    FD_SET(fd, &fdset);
>>> -
>>> -    ret = select(fd + 1, NULL, &fdset, NULL, &tv);
>>> -    if (!ret)
>>> -        errno = ETIMEDOUT;
>>> -    if (ret != 1)
>>> +    if (ret < 0 && errno != EISCONN)
>>>          return -1;
>>>
>>>      ret = getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len);
>>> @@ -558,7 +547,6 @@ vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
>>>          return -1;
>>>      }
>>>
>>> -connected:
>>>      flags = fcntl(fd, F_GETFL, 0);
>>>      if (flags < 0) {
>>>          RTE_LOG(ERR, VHOST_CONFIG,
>>> ------------------------------------------------------------------------------
>>> ?
>>>
>>> We will not check for EINPROGRESS, but a subsequent 'connect()' will return
>>> EISCONN if the connection is already established. getsockopt() is kept just
>>> in case. The subsequent 'connect()' will happen on the next iteration of the
>>> reconnection cycle (1 second sleep).
>>
>> I've sent v2 with these changes.
>
> Thanks. But still, it doesn't look clean to me. I was thinking the following
> might be cleaner?
>
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c
> b/lib/librte_vhost/vhost_user/vhost-net-user.
> index f0f92f8..c0ef290 100644
> --- a/lib/librte_vhost/vhost_user/vhost-net-user.c
> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -532,6 +532,10 @@ vhost_user_client_reconnect(void *arg __rte_unused)
>               reconn != NULL; reconn = next) {
>              next = TAILQ_NEXT(reconn, next);
>
> +            if (reconn->conn_inprogress) {
> +                /* do connect check here */
> +            }
> +
>              if (connect(reconn->fd, (struct sockaddr *)&reconn->un,
>                      sizeof(reconn->un)) < 0)
>                  continue;
> @@ -605,6 +609,7 @@ vhost_user_create_client(struct vhost_user_socket *vsocket)
>      reconn->un = un;
>      reconn->fd = fd;
>      reconn->vsocket = vsocket;
> +    reconn->conn_inprogress = errno == EINPROGRESS;
>      pthread_mutex_lock(&reconn_list.mutex);
>      TAILQ_INSERT_TAIL(&reconn_list.head, reconn, next);
>      pthread_mutex_unlock(&reconn_list.mutex);
>
> It's just a rough diff, hopefully it shows my idea clearly. And of
> course, we should not call connect() anymore when conn_inprogress
> is set.
>
> What do you think of it?

I found that we can't check the connection status without calling
select/poll on the socket. 'getsockopt()' returns 0 with no error when the
connection is not yet established, just as it does when it is.

So I think the first version of this patch is the only acceptable
solution.

Best regards, Ilya Maximets.