From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43542)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1Yaiiy-00051m-Ii
	for qemu-devel@nongnu.org; Wed, 25 Mar 2015 06:39:49 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1Yaiit-0004Bz-MA
	for qemu-devel@nongnu.org; Wed, 25 Mar 2015 06:39:48 -0400
Received: from mx1.redhat.com ([209.132.183.28]:54261)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1Yaiit-0004B3-Eq
	for qemu-devel@nongnu.org; Wed, 25 Mar 2015 06:39:43 -0400
Date: Wed, 25 Mar 2015 10:39:37 +0000
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20150325103937.GC2313@work-vm>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4bc35b66-3d20-411a-a6d1-6f3263a13758@CMEXHTCAS2.ad.emulex.com>
Subject: Re: [Qemu-devel] [PATCH] rdma: Fix cleanup in error paths
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Padmanabh Ratnakar <padmanabh.ratnakar@Emulex.Com>
Cc: Meghana Cheripady <meghana.cheripady@Emulex.Com>, quintela@redhat.com, qemu-devel@nongnu.org, mrhines@us.ibm.com, amit.shah@redhat.com

> As part of commit e325b49a320b493cc5d69e263751ff716dc458fe,
> order in which resources are destroyed was changed for fixing
> a seg fault. Due to this change, CQ will never get destroyed as
> CQ should be destroyed after QP destruction. Seg fault is caused
> improper cleanup when connection fails. Fixing cleanup after
> connection failure and order in which resources are destroyed
> in qemu_rdma_cleanup() routine.

Do you have a test case that triggers the seg fault?

> Signed-off-by: Meghana Cheripady <meghana.cheripady@emulex.com>
> Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>
> ---
>  migration/rdma.c |   22 ++++++++--------------
>  1 files changed, 8 insertions(+), 14 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index e6c3a67..77e3444 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2194,6 +2194,10 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>          }
>      }
>  
> +    if (rdma->qp) {
> +        rdma_destroy_qp(rdma->cm_id);
> +        rdma->qp = NULL;
> +    }

OK, that makes sense, the manpage for rdma_destroy_qp says
'Users must destroy any QP associated with an rdma_cm_id before destroying the ID.'
so it's probably good to have this early.

>      if (rdma->cq) {
>          ibv_destroy_cq(rdma->cq);
>          rdma->cq = NULL;
> @@ -2206,18 +2210,14 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>          ibv_dealloc_pd(rdma->pd);
>          rdma->pd = NULL;
>      }
> -    if (rdma->listen_id) {
> -        rdma_destroy_id(rdma->listen_id);
> -        rdma->listen_id = NULL;
> -    }
>      if (rdma->cm_id) {
> -        if (rdma->qp) {
> -            rdma_destroy_qp(rdma->cm_id);
> -            rdma->qp = NULL;
> -        }
>          rdma_destroy_id(rdma->cm_id);
>          rdma->cm_id = NULL;
>      }
> +    if (rdma->listen_id) {
> +        rdma_destroy_id(rdma->listen_id);
> +        rdma->listen_id = NULL;
> +    }

Can you explain this reorder - why is the order of listen_id and cm_id important?
is that a failure on the destination?

>      if (rdma->channel) {
>          rdma_destroy_event_channel(rdma->channel);
>          rdma->channel = NULL;
> @@ -2309,8 +2309,6 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp)
>      if (ret) {
>          perror("rdma_connect");
>          ERROR(errp, "connecting to destination!");
> -        rdma_destroy_id(rdma->cm_id);
> -        rdma->cm_id = NULL;
>          goto err_rdma_source_connect;
>      }
>  
> @@ -2319,8 +2317,6 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp)
>          perror("rdma_get_cm_event after rdma_connect");
>          ERROR(errp, "connecting to destination!");
>          rdma_ack_cm_event(cm_event);
> -        rdma_destroy_id(rdma->cm_id);
> -        rdma->cm_id = NULL;
>          goto err_rdma_source_connect;
>      }
>  
> @@ -2328,8 +2324,6 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp)
>          perror("rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect");
>          ERROR(errp, "connecting to destination!");
>          rdma_ack_cm_event(cm_event);
> -        rdma_destroy_id(rdma->cm_id);
> -        rdma->cm_id = NULL;
>          goto err_rdma_source_connect;
>      }
>      rdma->connected = true;
> -- 
> 1.7.1

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK