From: Jeff Roberson <jroberson-gUAg20sWgfgcWVvVuXF20w@public.gmane.org>
To: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: LID reconfiguration
Date: Mon, 16 Nov 2009 18:38:40 -1000 (HST)
Message-ID: <alpine.BSF.2.00.0911161835230.1226@desktop>
In-Reply-To: <20091110002047.GJ6188-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On Mon, 9 Nov 2009, Jason Gunthorpe wrote:
> On Mon, Nov 09, 2009 at 01:56:49PM -1000, Jeff Roberson wrote:
>>>> Is there anything I can do other than restart the discovery and
>>>> connection process? Shouldn't we have enough information with the GID to
>>>> retain and reroute the connection?
>>>
>>> With a GID you can go back to the SM and get an updated set of
>>> path records with the new LID data.
>>
>> Ok, so the QPs will be held in an error state but I can restart them once
>> I re-initialize the paths, right? I can query the path using umad and get
>> a path record? So we'll have a minor hiccup in communication but previously
>> buffered data will be sent as soon as the QP is valid again?
>
> I've never heard of someone recovering QPs once they reach the error
> state, I think they are pretty much done at that point. You have to
> start again.
>
> To get hitless switching to the passive backup path you need to use
> the IB APM feature.
Is there an open source example of using APM? When I call ib_cm_send_lap()
the QP goes to some error state and my connections die. Do I need to set
some QP attributes first? I found a paper that describes the process but
does not contain sample code or any real details.
Thanks,
Jeff
>
> Otherwise, you could detect failure of the QP and issue a new PR query
> for the GID using umad and then try again to connect - depending on
> how your home-grown connection process works I guess..
>
>> We are not using IPoIB at the moment. This is for an appliance type
>> device and the customers will be responsible for their own switches. At
>> present everything simply stops working when we re-lid so I just need to
>> add the correct failure handling code.
>
> Detect failure and start again from scratch is what pretty much
> everyone does today, AFAIK.
>
>>> rdmacm when combined with IPoIB bonding will give you a kind of
>>> active/passive HA type multi-path.
>>
>> That is essentially what we're looking for. We discover the devices
>> automatically but transparent multi-path would've saved a lot of work.
>
> Yes, you probably could have used the bonding feature, but note it
> does not save you from errored QPs in the failover case and I've had
> problems with IPoIB PR caching in LID-change cases in the past..
>
> Jason
>
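The detect-and-reconnect flow described in the quoted text above (notice the
QP failure, issue a new PR query for the GID, then connect again) can be
sketched as a small simulation. This is only an illustration under stated
assumptions: it does not call verbs, CM, or umad, and FakeFabric, Connection,
query_path_record(), and send_with_reconnect() are hypothetical names made up
for this sketch. The one point it tries to capture is that the errored QP is
replaced rather than recovered, and that the path record is fetched again
instead of being reused from a cache.

```python
# Sketch only: simulates "detect failure, re-query the path record by GID,
# reconnect". Nothing here talks to real InfiniBand hardware or libraries.

class FakeFabric:
    """Hypothetical stand-in for the SM: maps a GID to its current LID."""

    def __init__(self):
        self.lid_by_gid = {}

    def query_path_record(self, gid):
        # A real implementation would send an SA PathRecord query here.
        return {"dgid": gid, "dlid": self.lid_by_gid[gid]}


class Connection:
    """Hypothetical connection; the QP is modelled as a state string."""

    def __init__(self, fabric, gid):
        self.fabric = fabric
        self.gid = gid
        self.path = None
        self.qp_state = "RESET"
        self.generation = 0  # how many QPs have been created for this peer

    def connect(self):
        # Always fetch a fresh path record; a cached one may hold a stale LID.
        self.path = self.fabric.query_path_record(self.gid)
        self.generation += 1  # a new QP, not a recovered one
        self.qp_state = "RTS"

    def send(self, payload):
        # Model a re-LID: the DLID we connected with no longer matches.
        stale = (self.path is None
                 or self.path["dlid"] != self.fabric.lid_by_gid[self.gid])
        if stale:
            self.qp_state = "ERR"
        if self.qp_state != "RTS":
            raise ConnectionError("QP in error state")
        return (self.path["dlid"], payload)


def send_with_reconnect(conn, payload, max_attempts=3):
    """Send; on QP failure rebuild the connection from scratch and retry."""
    for _ in range(max_attempts):
        try:
            return conn.send(payload)
        except ConnectionError:
            conn.connect()
    raise ConnectionError("gave up after %d attempts" % max_attempts)
```

In this model any data queued on the old QP is simply lost; the caller has to
resend it, which matches the "start again" behaviour described above rather
than the hitless failover that APM is meant to provide.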