From: Jeff Roberson
Subject: RE: LID reconfiguration
Date: Tue, 24 Nov 2009 09:54:38 -1000 (HST)
To: Sean Hefty
Cc: Jason Gunthorpe
List-Id: linux-rdma@vger.kernel.org

On Tue, 24 Nov 2009, Sean Hefty wrote:

>> Thank you. This worked for me. However, there seems to be some kind of
>> race when the connection is first set up. On the client if I call
>> ib_cm_send_lap() immediately after ib_cm_send_rtu() returns successfully I
>> get an EINVAL error. If I delay for one second it works just fine.
>
> ib_cm_send_lap() returning EINVAL should indicate an immediate error, so this
> should be an issue with the local side. It sounds like a possible bug in the
> code, but I didn't see anything obvious from a quick look at the code.

That's what I suspected. I wonder if the connection state isn't set
properly until later? I'm really not sure. Without a kernel debugger it'll
be hard to determine. I guess I can throw some printfs in to track this
down unless there are better suggestions.

>
>> According to the spec the passive/server side can not send the lap so I
>> can't send it in the rtu handler. Presumably the call fails immediately
>> after send_rtu because the server hasn't received that message yet? Is
>> this right? Is there a way to do this cleanly without a delay?
>
> I don't know that the code enforces that the passive side not send a LAP, (and
> can't think of a reason why the protocol should have such a restriction.) It
> may work. But, rather than sending a separate LAP immediately after connecting,
> why not include the alternate path in the original REQ?

This creates a race for me. We have a discovery process that finds nodes
and paths to nodes. If it discovers a new path while the connection is in
the process of being created it won't see an existing connection and we
won't add the alternate path. To close this race I have to check for an
alternate path when the connection is complete anyway.

>
>> I notice that if I create the initial attributes for the connection
>> request with an alternate path specified the alt_path_state is still
>> MIGRATED when I send rtu. If I load a path after the connection is
>> established I can fail back and forth without issue.
>
> Can you clarify this a little more? What specific field are you looking at and
> what state are you seeing it set to?
>

This turned out to be a bug in my code. I'm very confident in the
post-connection alternate path code however.

Thanks,
Jeff