From: Ilya Maximets
Subject: Re: [PATCH] vhost: fix connect hang in client mode
Date: Thu, 21 Jul 2016 12:45:32 +0300
Message-ID: <579099BC.9050603@samsung.com>
In-reply-to: <20160721093714.GD28708@yliu-dev.sh.intel.com>
To: Yuanhan Liu
Cc: dev@dpdk.org, Huawei Xie, Dyasly Sergey, Heetae Ahn, Thomas Monjalon

On 21.07.2016 12:37, Yuanhan Liu wrote:
> On Thu, Jul 21, 2016 at 11:21:15AM +0300, Ilya Maximets wrote:
>> If something abnormal happened to QEMU, 'connect()' can block calling
>> thread (e.g. main thread of OVS) forever or for a really long time.
>> This can break whole application or block the reconnection thread.
>>
>> Example with OVS:
>>
>>   ovs_rcu(urcu2)|WARN|blocked 512000 ms waiting for main to quiesce
>>   (gdb) bt
>>   #0 connect () from /lib64/libpthread.so.0
>>   #1 vhost_user_create_client (vsocket=0xa816e0)
>>   #2 rte_vhost_driver_register
>>   #3 netdev_dpdk_vhost_user_construct
>>   #4 netdev_open (name=0xa664b0 "vhost1")
>>   [...]
>>   #11 main
>>
>> Fix that by setting non-blocking mode for client sockets for connection.
>>
>> Fixes: 64ab701c3d1e ("vhost: add vhost-user client mode")
>
> Thanks for spotting and fixing yet another bug!
>
>>
>> +static int
>> +vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
>
> I don't quite understand why this is needed: connect() with O_NONBLOCK
> flag set is not enough?

There is a small catch with a non-blocking connect() call. Connection
establishment may have started, yet connect() returns '-1' with
'errno = EINPROGRESS'. In that case we must wait on the fd until it
becomes available for writing, and then check the current status of the
connection using getsockopt().

I'm not sure that we can actually get into this situation here, but it is
documented, and I think we should handle it. See 'man connect' for details.

Best regards, Ilya Maximets.
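For reference, the EINPROGRESS handling described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: the names connect_nonblock() and unix_connect_nonblock(), the timeout_ms parameter, and the use of poll() for the wait are all assumptions made for the example. Note also that on Linux a non-blocking connect() on a Unix-domain socket may fail with EAGAIN rather than EINPROGRESS when it cannot complete immediately; this sketch simply reports that as a failure, and a caller could retry later.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the function from the patch): switch fd to
 * non-blocking mode and connect, waiting at most timeout_ms for completion.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int
connect_nonblock(int fd, const struct sockaddr *addr, socklen_t len,
		 int timeout_ms)
{
	struct pollfd pfd;
	socklen_t optlen;
	int flags, so_error, ret;

	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;

	if (connect(fd, addr, len) == 0)
		return 0;		/* connected immediately */
	if (errno != EINPROGRESS)
		return -1;		/* real failure (or EAGAIN, see above) */

	/* Connection is in progress: wait until the fd becomes writable. */
	pfd.fd = fd;
	pfd.events = POLLOUT;
	pfd.revents = 0;
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);
	if (ret == 0)
		errno = ETIMEDOUT;
	if (ret <= 0)
		return -1;

	/* Writable does not mean connected: read the final status. */
	optlen = sizeof(so_error);
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &optlen) < 0)
		return -1;
	if (so_error != 0) {
		errno = so_error;
		return -1;
	}
	return 0;
}

/*
 * Hypothetical wrapper for a Unix-domain socket path, as a vhost-user
 * client would use. Returns a connected fd, or -1 with errno set.
 */
static int
unix_connect_nonblock(const char *path, int timeout_ms)
{
	struct sockaddr_un un;
	int fd, saved;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&un, 0, sizeof(un));
	un.sun_family = AF_UNIX;
	strncpy(un.sun_path, path, sizeof(un.sun_path) - 1);

	if (connect_nonblock(fd, (struct sockaddr *)&un, sizeof(un),
			     timeout_ms) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

In this sketch the fd is left in non-blocking mode after a successful connect; a caller that wants blocking I/O afterwards would have to restore the original flags with fcntl().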