From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39301) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UrYCH-0005y0-4k for qemu-devel@nongnu.org; Tue, 25 Jun 2013 14:42:41 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1UrYC9-0003nv-9B for qemu-devel@nongnu.org; Tue, 25 Jun 2013 14:42:33 -0400 Received: from e31.co.us.ibm.com ([32.97.110.149]:33560) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UrYC9-0003np-0O for qemu-devel@nongnu.org; Tue, 25 Jun 2013 14:42:25 -0400 Received: from /spool/local by e31.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 25 Jun 2013 12:42:22 -0600 Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227]) by d03dlp03.boulder.ibm.com (Postfix) with ESMTP id 7ED3C19D805F for ; Tue, 25 Jun 2013 12:38:23 -0600 (MDT) Received: from d03av06.boulder.ibm.com (d03av06.boulder.ibm.com [9.17.195.245]) by d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id r5PIcRVt029574 for ; Tue, 25 Jun 2013 12:38:30 -0600 Received: from d03av06.boulder.ibm.com (loopback [127.0.0.1]) by d03av06.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id r5PIes9x007070 for ; Tue, 25 Jun 2013 12:40:54 -0600 Message-ID: <51C9E3A1.9060107@linux.vnet.ibm.com> Date: Tue, 25 Jun 2013 14:38:25 -0400 From: "Michael R. Hines" MIME-Version: 1.0 References: <1372125485-11795-1-git-send-email-mrhines@linux.vnet.ibm.com> <1372125485-11795-12-git-send-email-mrhines@linux.vnet.ibm.com> <20130625163149.GA10901@dhcp-192-168-178-175.profitbricks.localdomain> In-Reply-To: <20130625163149.GA10901@dhcp-192-168-178-175.profitbricks.localdomain> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v11 11/15] rdma: core logic List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vasilis Liaskovitis Cc: aliguori@us.ibm.com, quintela@redhat.com, qemu-devel@nongnu.org, owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, pbonzini@redhat.com, chegu_vinod@hp.com, knoel@redhat.com On 06/25/2013 12:31 PM, Vasilis Liaskovitis wrote: > Hi, > > On Mon, Jun 24, 2013 at 09:58:01PM -0400, mrhines@linux.vnet.ibm.com wrote: >> From: "Michael R. Hines" >> > [...] >> +/* >> + * Put in the log file which RDMA device was opened and the details >> + * associated with that device. >> + */ >> +static void qemu_rdma_dump_id(const char *who, struct ibv_context *verbs) >> +{ >> + printf("%s RDMA Device opened: kernel name %s " >> + "uverbs device name %s, " >> + "infiniband_verbs class device path %s," >> + " infiniband class device path %s\n", >> + who, >> + verbs->device->name, >> + verbs->device->dev_name, >> + verbs->device->dev_path, >> + verbs->device->ibdev_path); >> +} > see below > >> +static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp) >> +{ >> + int ret = -EINVAL, idx; >> + struct sockaddr_in sin; >> + struct rdma_cm_id *listen_id; >> + char ip[40] = "unknown"; >> + >> + for (idx = 0; idx < RDMA_CONTROL_MAX_WR; idx++) { >> + rdma->wr_data[idx].control_len = 0; >> + rdma->wr_data[idx].control_curr = NULL; >> + } >> + >> + if (rdma->host == NULL) { >> + ERROR(errp, "RDMA host is not set!\n"); >> + rdma->error_state = -EINVAL; >> + return -1; >> + } >> + /* create CM channel */ >> + rdma->channel = rdma_create_event_channel(); >> + if (!rdma->channel) { >> + ERROR(errp, "could not create rdma event channel\n"); >> + rdma->error_state = -EINVAL; >> + return -1; >> + } >> + >> + /* create CM id */ >> + ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP); >> + if (ret) { >> + ERROR(errp, "could not create cm_id!\n"); >> + goto err_dest_init_create_listen_id; >> + } >> + >> + memset(&sin, 0, sizeof(sin)); >> + sin.sin_family = AF_INET; >> + sin.sin_port = htons(rdma->port); >> + >> + if (rdma->host && strcmp("", rdma->host)) { >> + struct hostent *dest_addr; >> + dest_addr = gethostbyname(rdma->host); >> + if (!dest_addr) { >> + ERROR(errp, "migration could not gethostbyname!\n"); >> + ret = -EINVAL; >> + goto err_dest_init_bind_addr; >> + } >> + memcpy(&sin.sin_addr.s_addr, dest_addr->h_addr, >> + dest_addr->h_length); >> + inet_ntop(AF_INET, dest_addr->h_addr, ip, sizeof ip); >> + } else { >> + sin.sin_addr.s_addr = INADDR_ANY; >> + } >> + >> + DPRINTF("%s => %s\n", rdma->host, ip); >> + >> + ret = rdma_bind_addr(listen_id, (struct sockaddr *)&sin); >> + if (ret) { >> + ERROR(errp, "Error: could not rdma_bind_addr!\n"); >> + goto err_dest_init_bind_addr; >> + } >> + >> + rdma->listen_id = listen_id; >> + if (listen_id->verbs) { >> + rdma->verbs = listen_id->verbs; >> + } >> + qemu_rdma_dump_id("dest_init", rdma->verbs); > > I wonder if you have ever hit the case where rdma_bind_addr() does not set the > verbs structure in listen_id because we are binding to the loopback device (also > see linux kernel commit 8523c048). > I keep hitting this case on my destination VM ("incoming x-rdma:host:port) > > Then I think qemu_rdma_dump_id can segfault trying to dereference a null verbs > structure. The dump_id function should check for non-NULL verbs argument, > or the dump should be made only in the (verbs != NULL) if clause. > > Disabling the dump_id above, I have rdma_resolve_addr() problems on the source > VM side (getting RDMA_CM_EVENT_ADDR_ERROR instead of > RDMA_CM_EVENT_ADDR_RESOLVED). > > I assume that is because of the null verbs structure destination problem above. > qemu_rdma_dest_prepare() will always fail with a NULL verbs argument: Good catch, thank you. I'll fix this immediately in the next version. I never tried binding to the localhost before...... >> + >> +static int qemu_rdma_dest_prepare(RDMAContext *rdma, Error **errp) >> +{ >> + int ret; >> + int idx; >> + >> + if (!rdma->verbs) { >> + ERROR(errp, "no verbs context!\n"); >> + return 0; >> + } > It is first called from rdma_start_incoming_migration() and will fail with the > loopback binding case (rdma->verbs == NULL). > > however later qemu_rdma_accept() will check against the incoming cm_event > verbs structure and set the RDMAContext's verb struct, calling > qemu_rdma_dest_prepare with that struct: > > [...] >> +static int qemu_rdma_accept(RDMAContext *rdma) > [...] >> + if (!rdma->verbs) { >> + rdma->verbs = verbs; >> + /* >> + * Cannot propagate errp, as there is no error pointer >> + * to be propagated. >> + */ >> + ret = qemu_rdma_dest_prepare(rdma, NULL); >> + if (ret) { >> + fprintf(stderr, "rdma migration: error preparing dest!\n"); >> + goto err_rdma_dest_wait; >> + } > Are these two cases intentionally different? Another good catch. We should definitely allow loopback RDMA without any issues. I will fix this immediately as well in the next version. - Michael