From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yuanhan Liu Subject: Re: [PATCH v3] lib/librte_vhost: move fdset_del out of conn_mutex Date: Thu, 18 Jan 2018 22:03:30 +0800 Message-ID: <20180118140330.GI29540@yliu-mob> References: <1514887716-26343-1-git-send-email-wangzhike@jd.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: dev@dpdk.org To: zhike wang Return-path: Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by dpdk.org (Postfix) with ESMTP id 685641B30C for ; Thu, 18 Jan 2018 15:03:36 +0100 (CET) Content-Disposition: inline In-Reply-To: <1514887716-26343-1-git-send-email-wangzhike@jd.com> List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi, Apologize for late review. On Tue, Jan 02, 2018 at 02:08:36AM -0800, zhike wang wrote: > From: wang zhike > > v3: > * Fix duplicate variable name, which leads to unexpected memory write. > v2: > * Move fdset_del before conn destroy. > * Fix coding style. Note that we prefer to put the change logs after "---" below Signed-off-by, so that those change logs won't be tracked in the git log history. > This patch fixes below race condition: > 1. one thread calls: rte_vhost_driver_unregister->lock conn_mutex > ->fdset_del->loop to check fd.busy. > 2. another thread calls fdset_event_dispatch, and the busy flag is > changed AFTER handling on the fd, i.e, rcb(). However, the rcb, > such as vhost_user_read_cb() would try to retrieve the conn_mutex. > > So issue is that the 1st thread will loop check the flag while holding > the mutex, while the 2nd thread would be blocked by mutex and can not > change the flag. Then dead lock is observed. I then would change the title to "vhost: fix deadlock". I'm also keen to know how do you reproduce this issue with real-life APP (say ovs) and how easy it is for reproduce. > Signed-off-by: zhike wang Again, you need fix your git config file about your name. > --- > lib/librte_vhost/socket.c | 18 +++++++++++++++++- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c > index 422da00..ea01327 100644 > --- a/lib/librte_vhost/socket.c > +++ b/lib/librte_vhost/socket.c > @@ -749,6 +749,9 @@ struct vhost_user_reconnect_list { > struct vhost_user_socket *vsocket = vhost_user.vsockets[i]; > > if (!strcmp(vsocket->path, path)) { > + int del_fds[MAX_FDS]; > + int num_of_fds = 0, fd_index; > + I think the naming could be a bit shorter, like "fds, nr_fds (or nb_fds), fd_idx". > if (vsocket->is_server) { > fdset_del(&vhost_user.fdset, vsocket->socket_fd); > close(vsocket->socket_fd); > @@ -757,13 +760,26 @@ struct vhost_user_reconnect_list { > vhost_user_remove_reconnect(vsocket); > } > > + /* fdset_del() must be called without conn_mutex. */ > + pthread_mutex_lock(&vsocket->conn_mutex); > + for (conn = TAILQ_FIRST(&vsocket->conn_list); > + conn != NULL; > + conn = next) { > + next = TAILQ_NEXT(conn, next); > + > + del_fds[num_of_fds++] = conn->connfd; > + } > + pthread_mutex_unlock(&vsocket->conn_mutex); > + > + for (fd_index = 0; fd_index < num_of_fds; fd_index++) > + fdset_del(&vhost_user.fdset, del_fds[fd_index]); > + > pthread_mutex_lock(&vsocket->conn_mutex); > for (conn = TAILQ_FIRST(&vsocket->conn_list); > conn != NULL; > conn = next) { > next = TAILQ_NEXT(conn, next); > > - fdset_del(&vhost_user.fdset, conn->connfd); If you log the fd here and invoke fdset_del() and close() after the loop, you then could avoid one extra loop as you did above. --yliu > RTE_LOG(INFO, VHOST_CONFIG, > "free connfd = %d for device '%s'\n", > conn->connfd, path); > -- > 1.8.3.1