public inbox for linux-rdma@vger.kernel.org
From: Ira Weiny <weiny2-i2BcT+NCU+M@public.gmane.org>
To: Michael Robbert <mrobbert-/qOHPfZA4H6HXe+LvDLADg@public.gmane.org>
Cc: "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: ibstat stuck in state initialized after reboot
Date: Wed, 24 Mar 2010 09:38:05 -0700
Message-ID: <20100324093805.4c7c1034.weiny2@llnl.gov>
In-Reply-To: <E25E098F-AFCA-4FEA-BE46-5AF59C408293-/qOHPfZA4H6HXe+LvDLADg@public.gmane.org>

On Wed, 24 Mar 2010 10:26:02 -0600
Michael Robbert <mrobbert-/qOHPfZA4H6HXe+LvDLADg@public.gmane.org> wrote:

> I hope this is the correct place to get help with the problem I have. I have
> an IB fabric running on a Cisco SFS switch with a 7000D as the subnet
> manager and the whole thing has been running great for well over a year now,
> but today I noticed that after any node gets rebooted its IB link doesn't
> initialize. This has happened on 4 hosts now. What I see is as follows:
> 
> [root@compute-2-7 ~]# ibstat
> CA 'mthca0'
>        CA type: MT25204
>        Number of ports: 1
>        Firmware version: 1.2.917
>        Hardware version: 20
>        Node GUID: 0x0005ad00000c0990
>        System image GUID: 0x0005ad000100d050
>        Port 1:
>                State: Initializing
>                Physical state: LinkUp
>                Rate: 20
>                Base lid: 0
>                LMC: 0
>                SM lid: 0
>                Capability mask: 0x02510a68
>                Port GUID: 0x0005ad00000c0991
> 
> I don't know much about subnet managers, since ours is in hardware and we've
> never had to configure anything on it, but I can login to the device and it
> isn't showing any errors. On a node that hasn't been rebooted recently and
> is still working I can see what appears to be a working subnet manager:
> 
> [root@compute-2-10 ~]# sminfo 
> sminfo: sm lid 2 sm guid 0x5ad00001df2a0, activity count 2146213408 priority 10 state 3 SMINFO_MASTER
> 
> The same command on a non-working node shows this:
> 
> [root@compute-2-7 ~]# sminfo 
> sminfo: sm lid 0 sm guid 0x0, activity count 0 priority 0 state 2 SMINFO_STANDBY
> 
> So far I have reseated all the cables involved on both ends and I have moved
> the cables on the switch end to new ports and none of that has made a
> difference even after reboots. I am hoping to find a node that I can take
> offline tomorrow so I can actually test the cables, but since this seems to
> be happening to any host that reboots it doesn't appear to be a cabling
> problem. Can anybody suggest where I should go from here? Is there anything
> I can do from a working or non-working host to diagnose the problem? Should
> I try rebooting the subnet manager switch? Will that affect the rest of the
> fabric? 

Have you spoken to Cisco about the problem?  You say you can log into the
"device" (the SM switch?).  If so, ask Cisco how you can restart the SM
there.

It does sound like the SM on the switch is failing to transition the links.
If you can restart the SM on the switch, I would try that first.  Otherwise,
yes, rebooting the switch is probably your best bet, and yes, it will affect
the fabric, although I can't say how much without knowing the topology.
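
To see which nodes show the same symptom, the ibstat output quoted above could
be checked with a small script. The following is only a sketch: the function
name ports_waiting_for_sm is made up for illustration, the parsing assumes the
text layout shown in the quoted ibstat output, and the test it applies
(physical state LinkUp, state Initializing, base LID 0) is my assumption about
what "not yet configured by the SM" looks like, not an official diagnostic.

```python
import re


def ports_waiting_for_sm(ibstat_text):
    """Return (ca, port) pairs that are physically up but have no LID yet.

    Hypothetical helper: assumes the ibstat text layout quoted in this thread.
    """
    stuck = []
    ca = None
    port = None
    fields = {}

    def flush():
        # Record the port collected so far if it matches the symptom.
        if port is not None and (
            fields.get("Physical state") == "LinkUp"
            and fields.get("State") == "Initializing"
            and fields.get("Base lid") == "0"
        ):
            stuck.append((ca, port))

    for raw in ibstat_text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'$", line)
        if m:
            flush()
            ca, port, fields = m.group(1), None, {}
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            flush()
            port, fields = int(m.group(1)), {}
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    flush()
    return stuck


# Abbreviated copy of the output from the non-working node.
SAMPLE = """CA 'mthca0'
    CA type: MT25204
    Number of ports: 1
    Port 1:
        State: Initializing
        Physical state: LinkUp
        Rate: 20
        Base lid: 0
        LMC: 0
        SM lid: 0
"""

print(ports_waiting_for_sm(SAMPLE))  # [('mthca0', 1)]
```

A port reported this way has a good physical link but has not been assigned a
LID, which points at the SM rather than the cabling.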

Ira

> 
> Thanks,
> Mike Robbert
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org

