From: Gerben Roest <g.roest-99SnrGqf+M9mR6Xm/wNWPw@public.gmane.org>
Cc: "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: Problems with link, opensm complains IB_SA_MAD_STATUS_REQ_INVALID
Date: Fri, 16 Dec 2011 11:46:05 +0100
Message-ID: <4EEB216D.2010407@grepit.nl>
In-Reply-To: <20111216091416.GA3448@calypso>

On 16-12-2011 10:14, Alex Netes wrote:
> Hi Gerben,
> 
> It's complaining about the link rate:
> 
> Dec 15 23:35:05 792236 [46B9F940] 0x04 -> validate_port_caps: Port's RATE 2 is less than 3
> 
> Probably, the host that is trying to join is connected via 1x cable.
> The rate is defined by the capabilities of the host that opened a group, so
> you see this problem only when the host with higher rate created the MC group.
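
For reference, the two numbers in that log line are encoded rate values, not Gb/s. Below is a minimal Python sketch of how they map, assuming the standard InfiniBand rate encoding (2 = 2.5 Gb/s, 3 = 10 Gb/s, and so on). The names IB_RATE_GBPS, rate_gbps and port_can_join are hypothetical helpers for illustration, not OpenSM functions, and the comparison is done in Gb/s because the encoded values are not ordered by speed.

```python
# Encoded InfiniBand rate values -> Gb/s.
# Assumption: standard IB rate encoding; this table is not taken from the thread.
IB_RATE_GBPS = {
    2: 2.5,    # e.g. 1x SDR
    3: 10.0,   # e.g. 4x SDR
    4: 30.0,
    5: 5.0,    # e.g. 1x DDR
    6: 20.0,   # e.g. 4x DDR
    7: 40.0,
    8: 60.0,
    9: 80.0,
    10: 120.0,
}


def rate_gbps(code: int) -> float:
    """Translate an encoded rate value into Gb/s."""
    return IB_RATE_GBPS[code]


def port_can_join(port_rate: int, group_rate: int) -> bool:
    """Sketch of the check: the port must be at least as fast as the group."""
    return rate_gbps(port_rate) >= rate_gbps(group_rate)


# Values from the log line "Port's RATE 2 is less than 3":
print(rate_gbps(2), rate_gbps(3), port_can_join(2, 3))  # 2.5 10.0 False
```

Read this way, the log says the joining port runs at 2.5 Gb/s while the group requires 10 Gb/s, which fits the 1x link explanation above.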

Is it possible to force the ports to a specific link speed?

The strange thing is that both hosts show this problem when they start
opensm; they have the same errors in /var/log/opensm.log. Both hosts
have the same HCA:

[root@titus ~]# lspci -v |grep Infini
0a:00.0 InfiniBand: Mellanox Technologies MT26418 [ConnectX VPI PCIe 2.0
5GT/s - IB DDR / 10GigE] (rev a0)

[root@vespasianus ~]# lspci -v |grep Infini
0a:00.0 InfiniBand: Mellanox Technologies MT26418 [ConnectX VPI PCIe 2.0
5GT/s - IB DDR / 10GigE] (rev a0)

The hosts are connected to each other's single port via one IB cable.

[root@vespasianus ~]# grep -A1 -B1 INVALID /var/log/opensm.log| tail

Dec 16 11:35:10 041359 [483D2940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
from port 0x001e8c0000c84b62 (titus HCA-1), sending
IB_SA_MAD_STATUS_REQ_INVALID
Dec 16 11:35:10 041365 [483D2940] 0x10 -> osm_sa_send_error: [
--
Dec 16 11:35:17 351591 [429C9940] 0x04 -> validate_port_caps: Port's
RATE 2 is less than 3
Dec 16 11:35:17 351598 [429C9940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
from port 0x001e8c0000b90641 (vespasianus HCA-1), sending
IB_SA_MAD_STATUS_REQ_INVALID
Dec 16 11:35:17 351604 [429C9940] 0x10 -> osm_sa_send_error: [
--
Dec 16 11:35:18 042907 [43DCB940] 0x04 -> validate_port_caps: Port's
RATE 2 is less than 3
Dec 16 11:35:18 042914 [43DCB940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
from port 0x001e8c0000c84b62 (titus HCA-1), sending
IB_SA_MAD_STATUS_REQ_INVALID
Dec 16 11:35:18 042920 [43DCB940] 0x10 -> osm_sa_send_error: [
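
To summarise a log like the one above without reading it line by line, something along these lines could be used. It is only a sketch: the regular expressions are my own guess at the message layout, based solely on the lines shown here, and summarize is a hypothetical helper, not part of OpenSM.

```python
import re
from collections import Counter

# Patterns inferred from the excerpt above (assumption, not an OpenSM spec).
REJECT_RE = re.compile(r"ERR 1B12:.*?from port (0x[0-9a-fA-F]+) \(([^)]*)\)")
RATE_RE = re.compile(r"Port's RATE (\d+) is less than (\d+)")


def summarize(log_text: str):
    """Count rejected joins per node and collect (port_rate, group_rate) pairs."""
    # Collapse all whitespace so entries wrapped by a mail client still match.
    text = " ".join(log_text.split())
    rejects = Counter(m.group(2) for m in REJECT_RE.finditer(text))
    rates = {(int(a), int(b)) for a, b in RATE_RE.findall(text)}
    return rejects, rates


sample = """\
Dec 16 11:35:17 351591 [429C9940] 0x04 -> validate_port_caps: Port's
RATE 2 is less than 3
Dec 16 11:35:17 351598 [429C9940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
from port 0x001e8c0000b90641 (vespasianus HCA-1), sending
IB_SA_MAD_STATUS_REQ_INVALID
"""

rejects, rates = summarize(sample)
print(rejects)
print(rates)
```

On the full log this would show how often each host is rejected and whether the rate mismatch is always the same pair of values.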

Gerben


> 
> On 09:56 Fri 16 Dec     , Gerben Roest wrote:
>> On 16-12-2011 1:06, Ira Weiny wrote:
>>> On Thu, 15 Dec 2011 15:17:24 -0800
>>> Gerben Roest <g.roest-99SnrGqf+M9mR6Xm/wNWPw@public.gmane.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> Starting opensm from OFED 1.5.1, 1.5.3.2, 1.5.4 on a Scientific Linux 5
>>>> machine, directly linked to its neighbour (a twin 1U setup) gives me no
>>>> connection but lots of errors in /var/log/opensm.log, like these:
>>>>
>>>> Dec 15 22:38:35 685651 [45AFD940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
>>>> validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
>>>> from port 0x001e8c0000b90641 (vespasianus HCA-1), sending
>>>> IB_SA_MAD_STATUS_REQ_INVALID
>>>> Dec 15 22:38:35 686174 [464FE940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
>>>> validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
>>>> from port 0x001e8c0000c84b62 (titus HCA-1), sending
>>>> IB_SA_MAD_STATUS_REQ_INVALID
>>>>
>>>> Does anyone know what happens here? Another twin node has no problems,
>>>> that one uses OFED-1.5.1.
>>>>
>>>> I can send a "-V" log of opensm or any config files if you like,
>>>
>>> Just set -D 0x7 which adds VERBOSE and send the snippet around the above errors.
>>
>> Dec 15 23:35:05 791001 [4399A940] 0x10 -> osm_vendor_send: [
>> Dec 15 23:35:05 791008 [4399A940] 0x04 -> osm_vendor_send: RMPP 0 length 256
>> Dec 15 23:35:05 791021 [4399A940] 0x10 -> osm_vendor_put: [
>> Dec 15 23:35:05 791028 [4399A940] 0x08 -> osm_vendor_put: Retiring UMAD
>> 0x3dd9290
>> Dec 15 23:35:05 791034 [4399A940] 0x10 -> osm_vendor_put: ]
>> Dec 15 23:35:05 791040 [4399A940] 0x08 -> osm_vendor_send: Completed
>> sending response or unsolicited p_madw = 0x3ddf5c0
>> Dec 15 23:35:05 791046 [4399A940] 0x10 -> osm_vendor_send: ]
>> Dec 15 23:35:05 791051 [4399A940] 0x10 -> osm_sa_send_error: ]
>> Dec 15 23:35:05 791057 [4399A940] 0x10 -> mcmr_rcv_join_mgrp: ]
>> Dec 15 23:35:05 791062 [4399A940] 0x10 -> osm_mcmr_rcv_process: ]
>> Dec 15 23:35:05 791068 [4399A940] 0x10 -> sa_mad_ctrl_disp_done_callback: [
>> Dec 15 23:35:05 791073 [4399A940] 0x10 -> osm_vendor_put: [
>> Dec 15 23:35:05 791079 [4399A940] 0x08 -> osm_vendor_put: Retiring UMAD
>> 0x3dd7290
>> Dec 15 23:35:05 791084 [4399A940] 0x10 -> osm_vendor_put: ]
>> Dec 15 23:35:05 791090 [4399A940] 0x10 -> sa_mad_ctrl_disp_done_callback: ]
>> Dec 15 23:35:05 792086 [4B1A6940] 0x10 -> osm_vendor_get: [
>> Dec 15 23:35:05 792106 [4B1A6940] 0x08 -> osm_vendor_get: Acquiring UMAD
>> for p_madw = 0x3ddf5d8, size = 256
>> Dec 15 23:35:05 792117 [4B1A6940] 0x08 -> osm_vendor_get: Acquired UMAD
>> 0x3dd7290, size = 256
>> Dec 15 23:35:05 792126 [4B1A6940] 0x10 -> osm_vendor_get: ]
>> Dec 15 23:35:05 792132 [4B1A6940] 0x10 -> sa_mad_ctrl_rcv_callback: [
>> Dec 15 23:35:05 792139 [4B1A6940] 0x08 -> sa_mad_ctrl_rcv_callback: 4 SA
>> MADs received
>> Dec 15 23:35:05 792152 [4B1A6940] 0x20 -> SA MAD dump:
>>                                 base_ver................0x1
>>                                 mgmt_class..............0x3
>>                                 class_ver...............0x2
>>                                 method..................0x2 (SubnAdmSet)
>>                                 status..................0x0
>>                                 resv....................0x0
>>                                 trans_id................0x53bf6d21e
>>                                 attr_id.................0x38
>> (MCMemberRecord)
>>                                 resv1...................0x0
>>                                 attr_mod................0x0
>>                                 rmpp_version............0x0
>>                                 rmpp_type...............0x0
>>                                 rmpp_flags..............0x0
>>                                 rmpp_status.............0x0
>>                                 seg_num.................0x0
>>                                 payload_len/new_win.....0x0
>>                                 sm_key..................0x0000000000000000
>>                                 attr_offset.............0x0
>>                                 resv2...................0x0
>>                                 comp_mask...............0x0000000000010083
>>
>>
>> Dec 15 23:35:05 792158 [4B1A6940] 0x10 -> sa_mad_ctrl_process: [
>> Dec 15 23:35:05 792165 [4B1A6940] 0x08 -> sa_mad_ctrl_process: Posting
>> Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD
>> Dec 15 23:35:05 792187 [4B1A6940] 0x10 -> sa_mad_ctrl_process: ]
>> Dec 15 23:35:05 792194 [4B1A6940] 0x10 -> sa_mad_ctrl_rcv_callback: ]
>> Dec 15 23:35:05 792204 [46B9F940] 0x10 -> osm_mcmr_rcv_process: [
>> Dec 15 23:35:05 792211 [46B9F940] 0x10 -> mcmr_rcv_join_mgrp: [
>> Dec 15 23:35:05 792216 [46B9F940] 0x08 -> mcmr_rcv_join_mgrp: Dump of
>> incoming record
>> Dec 15 23:35:05 792228 [46B9F940] 0x08 -> MCMember Record dump:
>>
>> MGID....................ff12:401b:ffff::ffff:ffff
>>                                 PortGid.................fe80::1e:8c00:b9:641
>>                                 qkey....................0x0
>>                                 mlid....................0x0
>>                                 mtu.....................0x0
>>                                 TClass..................0x0
>>                                 pkey....................0xFFFF
>>                                 rate....................0x0
>>                                 pkt_life................0x0
>>                                 SLFlowLabelHopLimit.....0x0
>>                                 ScopeState..............0x1
>>                                 ProxyJoin...............0x0
>> Dec 15 23:35:05 792236 [46B9F940] 0x04 -> validate_port_caps: Port's
>> RATE 2 is less than 3
>> Dec 15 23:35:05 792243 [46B9F940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
>> validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
>> from port 0x001e8c0000b90641 (vespasianus HCA-1), sending
>> IB_SA_MAD_STATUS_REQ_INVALID
>> Dec 15 23:35:05 792253 [46B9F940] 0x10 -> osm_sa_send_error: [
>> Dec 15 23:35:05 792260 [46B9F940] 0x10 -> osm_vendor_get: [
>> Dec 15 23:35:05 792266 [46B9F940] 0x08 -> osm_vendor_get: Acquiring UMAD
>> for p_madw = 0x3dd73f8, size = 256
>> Dec 15 23:35:05 792273 [46B9F940] 0x08 -> osm_vendor_get: Acquired UMAD
>> 0x3dd9290, size = 256
>> Dec 15 23:35:05 792279 [46B9F940] 0x10 -> osm_vendor_get: ]
>> Dec 15 23:35:05 792291 [46B9F940] 0x20 -> SA MAD dump:
>>                                 base_ver................0x1
>>                                 mgmt_class..............0x3
>>                                 class_ver...............0x2
>>                                 method..................0x81
>> (SubnAdmGetResp)
>>                                 status..................0x200
>>                                 resv....................0x0
>>                                 trans_id................0x53bf6d21e
>>                                 attr_id.................0x38
>> (MCMemberRecord)
>>                                 resv1...................0x0
>>                                 attr_mod................0x0
>>                                 rmpp_version............0x0
>>                                 rmpp_type...............0x0
>>                                 rmpp_flags..............0x0
>>                                 rmpp_status.............0x0
>>                                 seg_num.................0x0
>>                                 payload_len/new_win.....0x0
>>                                 sm_key..................0x0000000000000000
>>                                 attr_offset.............0x0
>>                                 resv2...................0x0
>>                                 comp_mask...............0x0000000000010083
>>
>>
>> Dec 15 23:35:05 792298 [46B9F940] 0x10 -> osm_vendor_send: [
>> Dec 15 23:35:05 792304 [46B9F940] 0x04 -> osm_vendor_send: RMPP 0 length 256
>> Dec 15 23:35:05 792318 [46B9F940] 0x10 -> osm_vendor_put: [
>> Dec 15 23:35:05 792325 [46B9F940] 0x08 -> osm_vendor_put: Retiring UMAD
>> 0x3dd9290
>> Dec 15 23:35:05 792331 [46B9F940] 0x10 -> osm_vendor_put: ]
>> Dec 15 23:35:05 792337 [46B9F940] 0x08 -> osm_vendor_send: Completed
>> sending response or unsolicited p_madw = 0x3dd73e0
>> Dec 15 23:35:05 792343 [46B9F940] 0x10 -> osm_vendor_send: ]
>> Dec 15 23:35:05 792360 [46B9F940] 0x10 -> osm_sa_send_error: ]
>> Dec 15 23:35:05 792366 [46B9F940] 0x10 -> mcmr_rcv_join_mgrp: ]
>> Dec 15 23:35:05 792371 [46B9F940] 0x10 -> osm_mcmr_rcv_process: ]
>> Dec 15 23:35:05 792377 [46B9F940] 0x10 -> sa_mad_ctrl_disp_done_callback: [
>> Dec 15 23:35:05 792383 [46B9F940] 0x10 -> osm_vendor_put: [
>> Dec 15 23:35:05 792388 [46B9F940] 0x08 -> osm_vendor_put: Retiring UMAD
>> 0x3dd7e40
>> Dec 15 23:35:05 792394 [46B9F940] 0x10 -> osm_vendor_put: ]
>> Dec 15 23:35:05 792400 [46B9F940] 0x10 -> sa_mad_ctrl_disp_done_callback: ]
>> Dec 15 23:35:09 759207 [4A7A5940] 0x08 -> sm_sweeper: Off schedule sweep
>> signalled
>> Dec 15 23:35:09 759229 [4A7A5940] 0x10 -> osm_state_mgr_process: [
>> Dec 15 23:35:09 759240 [4A7A5940] 0x08 -> osm_state_mgr_process:
>> Received signal OSM_SIGNAL_SWEEP in state MASTER
>> Dec 15 23:35:09 759249 [4A7A5940] 0x10 -> state_mgr_sweep_hop_0: [
>> Dec 15 23:35:09 759258 [4A7A5940] 0x04 -> state_mgr_sweep_hop_0:
>>
>>
>>
>> thanks,
>>
>> Gerben
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 

Grep IT                      tel: 0252-769005
Egelantier 3                 fax: 0252-769006
2211 NN Noordwijkerhout     g.roest-99SnrGqf+M9mR6Xm/wNWPw@public.gmane.org
The Netherlands
