From: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: yamahata@private.email.ne.jp, quintela@redhat.com,
hinesmr@cn.ibm.com, qemu-devel@nongnu.org, mrhines@us.ibm.com
Subject: Re: [Qemu-devel] qemu_rdma_cleanup seg - related to 5a91337?
Date: Tue, 18 Feb 2014 09:47:30 +0800
Message-ID: <5302BBB2.4010704@linux.vnet.ibm.com>
In-Reply-To: <20140217090602.GA2978@work-vm>
On 02/17/2014 05:06 PM, Dr. David Alan Gilbert wrote:
> * Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
>> On 02/06/2014 08:26 PM, Dr. David Alan Gilbert wrote:
>>> Hi Isaku,
>>> I hit a seg in qemu_rdma_cleanup in the code changed by your
>>> '[PATCH] rdma: clean up of qemu_rdma_cleanup()'
>>>
>>> migration-rdma.c ~ 2241
>>>
>>> if (rdma->qp) {
>>> rdma_destroy_qp(rdma->cm_id);
>>> rdma->qp = NULL;
>>> }
>>>
>>> Your patch changed that to free cm_id at that point rather than
>>> qp; but in my case cm_id is NULL and so rdma_destroy_qp segs.
>>>
>>> given that there is a :
>>>
>>> if (rdma->cm_id) {
>>> rdma_destroy_id(rdma->cm_id);
>>> rdma->cm_id = NULL;
>>> }
>>>
>>> later down, and there is now no longer any destroy of rdma->qp
>>> I don't understand your change.
>>>
>>> Your change text says:
>>> '- RDMAContext::qp is created by rdma_create_qp() so that it should be destroyed
>>> by rdma_destroy_qp(). not ibv_destroy_qp()'
>>>
>>> but the diff is:
>>> if (rdma->qp) {
>>> - ibv_destroy_qp(rdma->qp);
>>> + rdma_destroy_qp(rdma->cm_id);
>>> rdma->qp = NULL;
>>>
>>> should that have been rdma_destroy_qp(rdma->qp)?
>>>
>>> Dave (who doesn't yet know enough RDMA to be dangerous)
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> Hi Michael,
>
>> Responding for Isaku..... Thanks for reporting the bug, but I need some help
>> in tracking down the cause of the bug, see below.
>
>> Actually, the parameter "rdma->cm_id" to the function is correct, it's just
>> that the variable never got initialized in the first place, which means
>> that either the connection never got established or an early error
>> happened during the migration that required cleaning up the identifier.
>>
>> Can you describe the conditions of the migration and the environment?
>> 1. Did you migrate only one VM? Was the host under heavy load?
>> 2. Did your migration lose connectivity? Did one of the hosts crash?
>> 3. Was the connection abruptly broken for some reason?
>> 4. Did you ever cancel the migration at some point and restart?
>> 5. Did you use libvirt?
> This is my 1st attempt with RDMA and I'm using softiwarp and
> getting an early error, I've not tracked down why yet, hence why I'm
> only really reporting the cleanup seg.
>
> outgoing:
> rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect: No such file or directory
> RDMA ERROR: connecting to destination!
> migration: setting error state
> migration: setting error state
> migrate: RDMA ERROR: connecting to destination!
> (qemu)
>
> incoming:
> ibv_poll_cq wc.status=5 Work Request Flushed Error!
> ibv_poll_cq wrid=CONTROL RECV!
> messages from qemu_rdma_poll
>
> I'm not 100% sure which side fails first yet, but I believe that
> incoming fails after outgoing calls rdma_connect but before it calls
> rdma_get_cm_event; but as I say I'm new to RDMA and it's my 1st time
> trying to debug it.
OK, yes, that explains it. That means the cm_id was never successfully
connected to begin with, so I'll just go ahead with a patch to check for
NULL properly.
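
A minimal sketch of what such a NULL check could look like, not the actual
patch. To keep it buildable without librdmacm, the two struct types and the
`*_stub` functions below are hypothetical stand-ins for rdma_destroy_qp() and
rdma_destroy_id(), and `cleanup_sketch` is an illustrative name; only the
`cm_id` and `qp` fields come from the code quoted above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the librdmacm/libibverbs types (assumption). */
struct ibv_qp { int unused; };
struct rdma_cm_id { struct ibv_qp *qp; };

/* Only the two fields discussed in this thread. */
typedef struct RDMAContext {
    struct rdma_cm_id *cm_id;
    struct ibv_qp *qp;
} RDMAContext;

/* Call counters so the sketch can be exercised without real hardware. */
static int destroy_qp_calls;
static int destroy_id_calls;

/* Stub for rdma_destroy_qp(): it dereferences id, so a NULL id would crash,
 * matching the seg reported above. */
static void rdma_destroy_qp_stub(struct rdma_cm_id *id)
{
    id->qp = NULL;
    destroy_qp_calls++;
}

/* Stub for rdma_destroy_id(). */
static void rdma_destroy_id_stub(struct rdma_cm_id *id)
{
    (void)id;
    destroy_id_calls++;
}

/* Tear down the queue pair only when the cm_id it hangs off exists. */
static void cleanup_sketch(RDMAContext *rdma)
{
    if (rdma->cm_id && rdma->qp) {
        rdma_destroy_qp_stub(rdma->cm_id);
        rdma->qp = NULL;
    }
    if (rdma->cm_id) {
        rdma_destroy_id_stub(rdma->cm_id);
        rdma->cm_id = NULL;
    }
}
```

With cm_id == NULL both branches are skipped, so the early-error path no
longer dereferences a NULL pointer. What should happen to a non-NULL
rdma->qp in that case is left open here.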

And regarding softiwarp - I recommend making sure that the standard RDMA
helper utilities from OFED are working cleanly first, like 'ucmatose' and
rdma_read/write and so forth, between the two machines you're trying to use.

I've successfully migrated over softiwarp before - but only after making
sure the utilities were working.

- Michael