From: Jeff Roberson <jroberson-gUAg20sWgfgcWVvVuXF20w@public.gmane.org>
To: Jason Gunthorpe
	<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: LID reconfiguration
Date: Mon, 16 Nov 2009 18:38:40 -1000 (HST)
Message-ID: <alpine.BSF.2.00.0911161835230.1226@desktop>
In-Reply-To: <20091110002047.GJ6188-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Mon, 9 Nov 2009, Jason Gunthorpe wrote:

> On Mon, Nov 09, 2009 at 01:56:49PM -1000, Jeff Roberson wrote:
>>>> Is there anything I can do other than restart the discovery and
>>>> connection process?  Shouldn't we have enough information with the GID to
>>>> retain and reroute the connection?
>>>
>>> With a GID you can go back to the SM and get an updated set of
>>> path records with the new LID data.
>>
>> Ok, so the QPs will be held in an error state but I can restart them once
>> I re-initialize the paths, right?  I can query the path using umad and get
>> a path record?  So we'll have a minor hiccup in communication but previously
>> buffered data will be sent as soon as the QP is valid again?
>
> I've never heard of someone recovering QPs once they reach the error
> state, I think they are pretty much done at that point. You have to
> start again.
>
> To get hitless switching to the passive backup path you need to use
> the IB APM feature.

Is there an open source example of using APM?  When I call ib_cm_send_lap() 
the QP goes to some error state and my connections die.  Do I need to set 
some QP attributes first?  I found a paper that describes the process but 
does not contain sample code or any real details.
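
For orientation, here is a simplified model of the three APM migration
states as I understand them from the IB specification. It is a sketch,
not working verbs or CM code: the state names mirror IB_MIG_MIGRATED,
IB_MIG_REARM and IB_MIG_ARMED, but struct qp_model and every function
below are hypothetical. The assumption it illustrates is that a QP only
survives a path failure if an alternate path was loaded and the QP was
armed beforehand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the APM migration states; not a real API. */
enum mig_state { MIG_MIGRATED, MIG_REARM, MIG_ARMED };

struct qp_model {
    enum mig_state mig;
    bool alt_loaded;    /* has an alternate path been programmed? */
    bool on_alternate;  /* true once traffic moved to the former alternate */
    bool in_error;      /* true once the QP has entered the error state */
};

/* Software step: program the alternate path and request re-arming.
 * Only allowed from the MIGRATED state on a healthy QP. */
static bool qp_load_alt_and_rearm(struct qp_model *qp)
{
    assert(qp != NULL);
    if (qp->in_error || qp->mig != MIG_MIGRATED)
        return false;
    qp->alt_loaded = true;
    qp->mig = MIG_REARM;
    return true;
}

/* Hardware step: once both sides are ready, the QP becomes armed. */
static void qp_hw_armed(struct qp_model *qp)
{
    assert(qp != NULL);
    if (qp->mig == MIG_REARM && qp->alt_loaded)
        qp->mig = MIG_ARMED;
}

/* Path failure: an armed QP migrates and keeps running; a QP in any
 * other migration state ends up in the error state. */
static bool qp_path_failed(struct qp_model *qp)
{
    assert(qp != NULL);
    if (qp->mig == MIG_ARMED) {
        qp->on_alternate = true;
        qp->alt_loaded = false;  /* the alternate is now the path in use */
        qp->mig = MIG_MIGRATED;
        return true;
    }
    qp->in_error = true;
    return false;
}
```

In this model a QP that was never re-armed goes straight to the error
state on a path failure, and it cannot be re-armed afterwards.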

Thanks,
Jeff

>
> Otherwise, you could detect failure of the QP and issue a new PR query
> for the GID using umad and then try again to connect - depending on
> how your home-grown connection process works I guess..
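
The recovery flow described above (notice the dead QP, re-query the
path for the same GID, connect again on a fresh QP) can be sketched as
a small retry loop. This is only an outline under stated assumptions:
struct peer, struct path_info, peer_recover() and the two callbacks are
hypothetical names, and the demo callbacks stand in for the real umad
path record query and for the application's own connection setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; none come from umad, verbs or ib_cm. */
struct path_info {
    uint16_t dlid;          /* destination LID reported by the SM */
};

struct peer {
    uint8_t gid[16];        /* stable identity, survives a re-LID */
    struct path_info path;  /* last known path */
    int connected;
};

/* Callbacks standing in for the PR query and the connection setup. */
typedef int (*resolve_fn)(const uint8_t gid[16], struct path_info *out);
typedef int (*connect_fn)(struct peer *p);

/* Re-resolve the path for the peer's GID and reconnect from scratch.
 * Returns 0 on success, -1 if every attempt failed. */
static int peer_recover(struct peer *p, resolve_fn resolve,
                        connect_fn connect, int max_tries)
{
    assert(p != NULL && resolve != NULL && connect != NULL);
    p->connected = 0;       /* the old QP is treated as gone */
    for (int i = 0; i < max_tries; i++) {
        struct path_info fresh;
        if (resolve(p->gid, &fresh) != 0)
            continue;       /* SM has no answer yet, try again */
        p->path = fresh;    /* adopt the new LID data */
        if (connect(p) == 0) {
            p->connected = 1;
            return 0;
        }
    }
    return -1;
}

/* Demo stand-ins: the SM answers on the second query with a new LID. */
static int demo_queries;

static int demo_resolve(const uint8_t gid[16], struct path_info *out)
{
    (void)gid;
    if (++demo_queries < 2)
        return -1;
    out->dlid = 0x2a;
    return 0;
}

static int demo_connect(struct peer *p)
{
    return p->path.dlid != 0 ? 0 : -1;
}
```

A real implementation would likely add a delay between attempts and
create a new QP inside the connect step rather than reuse the old one.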
>
>> We are not using IPoIB at the moment.  This is for an appliance type
>> device and the customers will be responsible for their own switches.  At
>> present everything simply stops working when we re-lid so I just need to
>> add the correct failure handling code.
>
> Detect failure and start again from scratch is what pretty much
> everyone does today, AFAIK.
>
>>> rdmacm when combined with IPoIB bonding will give you a kind of
>>> active/passive HA type multi-path.
>>
>> That is essentially what we're looking for.  We discover the devices
>> automatically but transparent multi-path would've saved a lot of work.
>
> Yes, you probably could have used the bonding feature, but note it
> does not save you from errored QPs in the failover case and I've had
> problems with IPoIB PR caching in LID-change cases in the past..
>
> Jason
>