From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:35473)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1WUmnq-0004M0-OI for qemu-devel@nongnu.org;
    Mon, 31 Mar 2014 20:43:54 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1WUmnj-0004Df-3L for qemu-devel@nongnu.org;
    Mon, 31 Mar 2014 20:43:46 -0400
Received: from e8.ny.us.ibm.com ([32.97.182.138]:40472)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1WUmni-0004Cx-UL for qemu-devel@nongnu.org;
    Mon, 31 Mar 2014 20:43:39 -0400
Received: from /spool/local by e8.ny.us.ibm.com
    with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted
    for from ; Mon, 31 Mar 2014 20:43:34 -0400
Received: from b01cxnp22035.gho.pok.ibm.com (b01cxnp22035.gho.pok.ibm.com [9.57.198.25])
    by d01dlp01.pok.ibm.com (Postfix) with ESMTP id 526D838C8027
    for ; Mon, 31 Mar 2014 20:43:32 -0400 (EDT)
Received: from d01av01.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215])
    by b01cxnp22035.gho.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id s310hWqi9634088
    for ; Tue, 1 Apr 2014 00:43:32 GMT
Received: from d01av01.pok.ibm.com (localhost [127.0.0.1])
    by d01av01.pok.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s310hV6F016516
    for ; Mon, 31 Mar 2014 20:43:31 -0400
Message-ID: <533A0B67.6010308@linux.vnet.ibm.com>
Date: Tue, 01 Apr 2014 08:42:15 +0800
From: "Michael R. Hines"
MIME-Version: 1.0
References: <1396078745-5584-1-git-send-email-arei.gonglei@huawei.com>
In-Reply-To: <1396078745-5584-1-git-send-email-arei.gonglei@huawei.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] rdma: Fix block during rdma migration
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: arei.gonglei@huawei.com, qemu-devel@nongnu.org
Cc: weidong.huang@huawei.com, quintela@redhat.com, dgilbert@redhat.com,
    owasserm@redhat.com, mrhines@us.ibm.com, pbonzini@redhat.com, Mo Yuxiang

On 03/29/2014 03:39 PM, arei.gonglei@huawei.com wrote:
> From: Mo Yuxiang
>
> If the networking break or there's something wrong with rdma
> device(ib0 with no IP) during rdma migration, the main_loop of
> qemu will be blocked in rdma_destroy_id. I add rdma_ack_cm_event
> to fix this bug.
>
> Signed-off-by: Mo Yuxiang
> Signed-off-by: Gonglei
> ---
>  migration-rdma.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/migration-rdma.c b/migration-rdma.c
> index eeb4302..f60749b 100644
> --- a/migration-rdma.c
> +++ b/migration-rdma.c
> @@ -949,6 +949,7 @@ route:
>          ERROR(errp, "result not equal to event_addr_resolved %s",
>                  rdma_event_str(cm_event->event));
>          perror("rdma_resolve_addr");
> +        rdma_ack_cm_event(cm_event);
>          ret = -EINVAL;
>          goto err_resolve_get_addr;
>      }

Reviewed-by: Michael R. Hines

Good catch. =) That's an obvious bug.

It looks like I need to do a much better job of "kill -9" inside
the regression testing scripts. I should probably try killing the
migration prematurely at different points, just to be sure there are
no more places where the connection state is not getting cleaned up.

- Michael
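
For anyone reading this thread later, here is a rough sketch of why the
missing ack matters. It is a toy model only, not the librdmacm API and not
the code in migration-rdma.c: every toy_* name below is made up for
illustration. It assumes, as the patch description says, that destroying
an id waits until every event fetched for it has been acknowledged, so an
error path that returns without acking leaves the destroy call stuck.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model, NOT librdmacm: an id counts events that were fetched
 * but not yet acknowledged. All toy_* names are hypothetical. */
struct toy_cm_id {
    int unacked_events;
};

struct toy_cm_event {
    struct toy_cm_id *id;
    int type;
};

enum { TOY_ADDR_RESOLVED = 0, TOY_ADDR_ERROR = 1 };

/* Stand-in for rdma_get_cm_event(): fetching an event pins the id. */
static void toy_get_event(struct toy_cm_id *id, int type,
                          struct toy_cm_event *ev)
{
    ev->id = id;
    ev->type = type;
    id->unacked_events++;
}

/* Stand-in for rdma_ack_cm_event(): releases the pin taken above. */
static void toy_ack_event(struct toy_cm_event *ev)
{
    assert(ev->id->unacked_events > 0);
    ev->id->unacked_events--;
}

/* Stand-in for rdma_destroy_id(). The real call would wait here;
 * the model only reports whether it would have to wait. */
static bool toy_destroy_would_block(const struct toy_cm_id *id)
{
    return id->unacked_events > 0;
}

/* Shape of the address-resolution step touched by the patch.
 * ack_on_error == false models the code before the fix. */
static int toy_resolve(struct toy_cm_id *id, int delivered_type,
                       bool ack_on_error)
{
    struct toy_cm_event ev;

    toy_get_event(id, delivered_type, &ev);
    if (ev.type != TOY_ADDR_RESOLVED) {
        if (ack_on_error) {
            toy_ack_event(&ev);   /* the one-line fix */
        }
        return -EINVAL;
    }
    toy_ack_event(&ev);           /* success path always acked */
    return 0;
}
```

Calling toy_resolve() with TOY_ADDR_ERROR and ack_on_error set to false
leaves toy_destroy_would_block() returning true, which corresponds to the
hang reported above; with ack_on_error set to true it returns false. The
same check could be repeated for every early-exit path that fetches an
event.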