From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:43542) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yaiiy-00051m-Ii for qemu-devel@nongnu.org; Wed, 25 Mar 2015 06:39:49 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Yaiit-0004Bz-MA for qemu-devel@nongnu.org; Wed, 25 Mar 2015 06:39:48 -0400 Received: from mx1.redhat.com ([209.132.183.28]:54261) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yaiit-0004B3-Eq for qemu-devel@nongnu.org; Wed, 25 Mar 2015 06:39:43 -0400 Date: Wed, 25 Mar 2015 10:39:37 +0000 From: "Dr. David Alan Gilbert" Message-ID: <20150325103937.GC2313@work-vm> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4bc35b66-3d20-411a-a6d1-6f3263a13758@CMEXHTCAS2.ad.emulex.com> Subject: Re: [Qemu-devel] [PATCH] rdma: Fix cleanup in error paths List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Padmanabh Ratnakar Cc: Meghana Cheripady , quintela@redhat.com, qemu-devel@nongnu.org, mrhines@us.ibm.com, amit.shah@redhat.com > As part of commit e325b49a320b493cc5d69e263751ff716dc458fe, > order in which resources are destroyed was changed for fixing > a seg fault. Due to this change, CQ will never get destroyed as > CQ should be destroyed after QP destruction. Seg fault is caused > improper cleanup when connection fails. Fixing cleanup after > connection failure and order in which resources are destroyed > in qemu_rdma_cleanup() routine. Do you have a test case that triggers the seg fault? > Signed-off-by: Meghana Cheripady > Signed-off-by: Padmanabh Ratnakar > --- > migration/rdma.c | 22 ++++++++-------------- > 1 files changed, 8 insertions(+), 14 deletions(-) > > diff --git a/migration/rdma.c b/migration/rdma.c > index e6c3a67..77e3444 100644 > --- a/migration/rdma.c > +++ b/migration/rdma.c > @@ -2194,6 +2194,10 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) > } > } > > + if (rdma->qp) { > + rdma_destroy_qp(rdma->cm_id); > + rdma->qp = NULL; > + } OK, that makes sense, the manpage for rdma_destroy_qp says 'Users must destroy any QP associated with an rdma_cm_id before destroying the ID.' so it's probably good to have this early. > if (rdma->cq) { > ibv_destroy_cq(rdma->cq); > rdma->cq = NULL; > @@ -2206,18 +2210,14 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) > ibv_dealloc_pd(rdma->pd); > rdma->pd = NULL; > } > - if (rdma->listen_id) { > - rdma_destroy_id(rdma->listen_id); > - rdma->listen_id = NULL; > - } > if (rdma->cm_id) { > - if (rdma->qp) { > - rdma_destroy_qp(rdma->cm_id); > - rdma->qp = NULL; > - } > rdma_destroy_id(rdma->cm_id); > rdma->cm_id = NULL; > } > + if (rdma->listen_id) { > + rdma_destroy_id(rdma->listen_id); > + rdma->listen_id = NULL; > + } Can you explain this reorder - why is the order of listen_id and cm_id important? is that a failure on the destination? > if (rdma->channel) { > rdma_destroy_event_channel(rdma->channel); > rdma->channel = NULL; > @@ -2309,8 +2309,6 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp) > if (ret) { > perror("rdma_connect"); > ERROR(errp, "connecting to destination!"); > - rdma_destroy_id(rdma->cm_id); > - rdma->cm_id = NULL; > goto err_rdma_source_connect; > } > > @@ -2319,8 +2317,6 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp) > perror("rdma_get_cm_event after rdma_connect"); > ERROR(errp, "connecting to destination!"); > rdma_ack_cm_event(cm_event); > - rdma_destroy_id(rdma->cm_id); > - rdma->cm_id = NULL; > goto err_rdma_source_connect; > } > > @@ -2328,8 +2324,6 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp) > perror("rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect"); > ERROR(errp, "connecting to destination!"); > rdma_ack_cm_event(cm_event); > - rdma_destroy_id(rdma->cm_id); > - rdma->cm_id = NULL; > goto err_rdma_source_connect; > } > rdma->connected = true; > -- > 1.7.1 Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK