public inbox for linux-rdma@vger.kernel.org
* ibstat stuck in state initialized after reboot
@ 2010-03-24 16:26 Michael Robbert
From: Michael Robbert @ 2010-03-24 16:26 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

I hope this is the right place to ask for help with this problem. I have an IB fabric running on a Cisco SFS switch, with a 7000D as the subnet manager. The whole thing has been running well for over a year, but today I noticed that after any node is rebooted, its IB link does not finish initializing. This has now happened on 4 hosts. Here is what I see:

[root@compute-2-7 ~]# ibstat
CA 'mthca0'
       CA type: MT25204
       Number of ports: 1
       Firmware version: 1.2.917
       Hardware version: 20
       Node GUID: 0x0005ad00000c0990
       System image GUID: 0x0005ad000100d050
       Port 1:
               State: Initializing
               Physical state: LinkUp
               Rate: 20
               Base lid: 0
               LMC: 0
               SM lid: 0
               Capability mask: 0x02510a68
               Port GUID: 0x0005ad00000c0991
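
For anyone who wants to check many nodes for the same symptom, the following is a minimal sketch that parses `ibstat` text like the output above and flags ports whose physical link is up but which have no LID assigned. The function names `parse_ibstat_ports` and `is_waiting_for_sm` are hypothetical, and the parsing assumes the exact "Key: value" layout shown in this post; other `ibstat` versions may format their output differently.

```python
import re

# Sample taken from the ibstat output quoted in this post.
SAMPLE = """\
CA 'mthca0'
        CA type: MT25204
        Number of ports: 1
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 20
                Base lid: 0
                LMC: 0
                SM lid: 0
"""


def parse_ibstat_ports(text):
    """Parse ibstat text output into a list of per-port dicts (hypothetical helper)."""
    ports = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("CA '"):
            # A new adapter begins; stop attaching fields to the previous port.
            current = None
            continue
        match = re.match(r"Port (\d+):$", stripped)
        if match:
            current = {"port": int(match.group(1))}
            ports.append(current)
            continue
        if current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return ports


def is_waiting_for_sm(port):
    """True when the link is physically up but the port is not Active and has LID 0."""
    return (
        port.get("Physical state") == "LinkUp"
        and port.get("State") != "Active"
        and int(port.get("Base lid", "0")) == 0
    )


if __name__ == "__main__":
    for port in parse_ibstat_ports(SAMPLE):
        status = "waiting for subnet manager" if is_waiting_for_sm(port) else "ok"
        print(f"port {port['port']}: {status}")
```

In practice the text passed to `parse_ibstat_ports` would come from running `ibstat` on each node; how that output is collected is left open here.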

I don't know much about subnet managers, since ours runs in hardware and we have never had to configure anything on it, but I can log in to the device and it isn't showing any errors. On a node that has not been rebooted recently and is still working, I can see what appears to be a working subnet manager:

[root@compute-2-10 ~]# sminfo 
sminfo: sm lid 2 sm guid 0x5ad00001df2a0, activity count 2146213408 priority 10 state 3 SMINFO_MASTER

The same command on a non-working node shows this:

[root@compute-2-7 ~]# sminfo 
sminfo: sm lid 0 sm guid 0x0, activity count 0 priority 0 state 2 SMINFO_STANDBY
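
The difference between the two `sminfo` results can also be checked mechanically. The sketch below parses the one-line format shown in the two samples above; the regular expression is inferred only from those two lines, and the names `parse_sminfo` and `sees_master` are hypothetical. Treating "LID 0 with a zero GUID" as "no master found" is an assumption based on these samples, not a documented rule.

```python
import re

# Pattern inferred from the two sminfo lines quoted in this post.
SMINFO_RE = re.compile(
    r"sm lid (?P<lid>\d+) sm guid (?P<guid>0x[0-9a-fA-F]+), "
    r"activity count (?P<activity>\d+) priority (?P<priority>\d+) "
    r"state (?P<state>\d+) (?P<state_name>\S+)"
)

WORKING = ("sminfo: sm lid 2 sm guid 0x5ad00001df2a0, activity count 2146213408 "
           "priority 10 state 3 SMINFO_MASTER")
BROKEN = ("sminfo: sm lid 0 sm guid 0x0, activity count 0 "
          "priority 0 state 2 SMINFO_STANDBY")


def parse_sminfo(line):
    """Return the fields of one sminfo output line as a dict, or None if it does not match."""
    match = SMINFO_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("lid", "activity", "priority", "state"):
        info[key] = int(info[key])
    info["guid"] = int(info["guid"], 16)
    return info


def sees_master(info):
    """Assumption: a usable answer has a nonzero LID and GUID and reports SMINFO_MASTER."""
    return (
        info is not None
        and info["lid"] != 0
        and info["guid"] != 0
        and info["state_name"] == "SMINFO_MASTER"
    )


if __name__ == "__main__":
    for name, line in (("compute-2-10", WORKING), ("compute-2-7", BROKEN)):
        print(name, "sees master SM:", sees_master(parse_sminfo(line)))
```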

So far I have reseated all the cables involved at both ends and moved the cables on the switch end to new ports. None of that has made a difference, even after reboots. I am hoping to find a node that I can take offline tomorrow so I can actually test the cables, but since this seems to happen to any host that reboots, it does not look like a cabling problem.

Can anybody suggest where I should go from here? Is there anything I can do from a working or non-working host to diagnose the problem? Should I try rebooting the subnet manager switch, and would that affect the rest of the fabric?

Thanks,
Mike Robbert
