From: Bob Ciotti <Bob.Ciotti-NSQ8wuThN14@public.gmane.org>
To: Hal Rosenstock <hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Cc: "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: watchdog timer
Date: Fri, 18 May 2012 09:49:37 -0700
Message-ID: <4FB67DA1.606@nasa.gov>
In-Reply-To: <4FB66817.6050106-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
On 05/18/2012 08:17 AM, Hal Rosenstock wrote:
> On 5/18/2012 10:35 AM, Bob Ciotti wrote:
>> On 05/18/2012 06:07 AM, Hal Rosenstock wrote:
>>> On 5/18/2012 2:05 AM, Bob Ciotti wrote:
>>>>
>>>>
>>>> I'm seeing lots of these messages in SM log:
>>>>
>>>> May 17 22:36:04 947774 [DA234710] 0x01 -> log_trap_info: Received
>>>> Generic Notice type:1 num:131 (Flow Control Update watchdog timer
>>>> expired) Producer:2 (Switch) from LID:444 Port 5 TID:0x0000000000000025
>>>>
>>>> the referenced port is a switch to HCA link.
>>>>
>>>> I've seen this in cases where there was bad hardware. The spec says it
>>>> indicates a failure in the flow control state machine on the other end.
>>>> But let's assume the hardware is good. When could this occur?
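
To see which links report the trap most often, the SM log can be tallied per LID and port. The following is a minimal sketch, not part of the original exchange; the regular expression is modeled only on the single log line quoted above (rejoined onto one line), and other OpenSM versions may format it differently. The names `TRAP_RE` and `count_traps` are hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this thread (assumption:
# the real log keeps the whole message on a single line).
TRAP_RE = re.compile(
    r"Generic Notice type:(?P<type>\d+) num:(?P<num>\d+) "
    r"\((?P<desc>[^)]*)\) Producer:(?P<producer>\d+) \((?P<kind>\w+)\) "
    r"from LID:(?P<lid>\d+) Port (?P<port>\d+) TID:(?P<tid>0x[0-9a-fA-F]+)"
)


def count_traps(lines, trap_num=131):
    """Count generic notices with the given trap number per (LID, port)."""
    counts = Counter()
    for line in lines:
        m = TRAP_RE.search(line)
        if m and int(m.group("num")) == trap_num:
            counts[(int(m.group("lid")), int(m.group("port")))] += 1
    return counts


# The log line from the report, rejoined after email wrapping.
sample = (
    "May 17 22:36:04 947774 [DA234710] 0x01 -> log_trap_info: Received "
    "Generic Notice type:1 num:131 (Flow Control Update watchdog timer "
    "expired) Producer:2 (Switch) from LID:444 Port 5 "
    "TID:0x0000000000000025"
)

if __name__ == "__main__":
    for (lid, port), n in count_traps([sample]).most_common():
        print(f"LID {lid} port {port}: {n} watchdog trap(s)")
```

Feeding the function the lines of the full SM log instead of `[sample]` would give a per-link count, which may help separate a few bad links from a fabric-wide pattern.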
>>>
>>> Do OperationalVLs match on both sides of the link? Are you
>>> using/configuring QoS?
>>>
>>
>>
>> There are two separate fabrics, one on each port of a 2-port HCA.
>> The issue is seen on both fabrics.
>
> So these are dual-homed HCAs on disjoint IB subnets.
yes
>
>> Normally we use QoS on both fabrics. QoS is now disabled on
>> ib0 (HCA port 1):
>
> Is the watchdog timeout still observed on the fabric to which HCA port 1
> is attached?
>
Yes, on both fabrics. It seems to occur more often on the one without QoS enabled.
>>
>> r327i7n0 ~ # smpquery portinfo 248 | grep VL
>> VLCap:...........................VL0-7
>> VLHighLimit:.....................4
>> VLArbHighCap:....................8
>> VLArbLowCap:.....................8
>> VLStallCount:....................0
>> OperVLs:.........................VL0-7
>> r327i7n0 ~ # smpquery -D portinfo 0 1 | grep VL
>> VLCap:...........................VL0-7
>> VLHighLimit:.....................4
>> VLArbHighCap:....................8
>> VLArbLowCap:.....................8
>> VLStallCount:....................0
>> OperVLs:.........................VL0-7
>> r327i7n0 ~ # smpquery -D portinfo 0,1 1 | grep VL
>> VLCap:...........................VL0-7
>> VLHighLimit:.....................4
>> VLArbHighCap:....................8
>> VLArbLowCap:.....................8
>> VLStallCount:....................7
>> OperVLs:.........................VL0-7
>
> It's not an OperVLs mismatch issue.
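
The comparison behind that conclusion can be sketched in a few lines. This is an illustrative sketch only: it works on captured `smpquery portinfo` text like the output quoted above rather than invoking the tool, and the helper names `parse_portinfo` and `vl_mismatches` are hypothetical.

```python
def parse_portinfo(text):
    """Return {field: value} from 'Name:.....value' lines of portinfo output."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(":")
        if sep:
            # Values are padded with dots after the colon; drop the padding.
            fields[name] = rest.lstrip(".")
    return fields


def vl_mismatches(a, b, keys=("VLCap", "OperVLs")):
    """Return the fields in `keys` whose values differ between two ports."""
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Values copied from the output above: DR path 0 (local port 1)
# and DR path 0,1 (switch port 1).
local_port = """\
VLCap:...........................VL0-7
VLHighLimit:.....................4
VLStallCount:....................0
OperVLs:.........................VL0-7
"""
switch_port = """\
VLCap:...........................VL0-7
VLHighLimit:.....................4
VLStallCount:....................7
OperVLs:.........................VL0-7
"""

if __name__ == "__main__":
    a, b = parse_portinfo(local_port), parse_portinfo(switch_port)
    print(vl_mismatches(a, b) or "VLCap and OperVLs match")
```

Passing `keys=("VLStallCount",)` would flag the one field that does differ between the two outputs (0 versus 7); whether that difference matters here is not established in this thread.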
>
>> r327i7n0 ~ # ibstat
>> CA 'mlx4_0'
>> CA type: MT4099
>> Number of ports: 2
>> Firmware version: 2.10.4350
>> Hardware version: 0
>> Node GUID: 0x0002c90300336b20
>> System image GUID: 0x0002c90300336b23
>> Port 1:
>> State: Active
>> Physical state: LinkUp
>> Rate: 56
>> Base lid: 248
>> LMC: 0
>> SM lid: 1
>> Capability mask: 0x02514868
>> Port GUID: 0x0002c90300336b21
>> Link layer: InfiniBand
>> Port 2:
>> State: Active
>> Physical state: LinkUp
>> Rate: 56
>> Base lid: 1971
>> LMC: 0
>> SM lid: 1685
>> Capability mask: 0x02514868
>> Port GUID: 0x0002c90300336b22
>> Link layer: InfiniBand
>>
>> r327i7n0 ~ # smpquery -D nodeinfo 0,1 1
>> # Node info: DR path slid 65535; dlid 65535; 0,1
>> BaseVers:........................1
>> ClassVers:.......................1
>> NodeType:........................Switch
>> NumPorts:........................36
>> SystemGuid:......................0x080069000000a4db
>> Guid:............................0x080069000000a4d8
>> PortGuid:........................0x080069000000a4d8
>> PartCap:.........................8
>> DevId:...........................0xc738
>> Revision:........................0x000000a1
>> LocalPort:.......................1
>> VendorId:........................0x0002c9
>>
>> r327i7n0 ~ # smpquery -D nodedesc 0,1
>> Node Description:.SwitchX - Mellanox Technologies
>
> What does vendstat -N to this switch say? Do you know what firmware is
> running there?
r327i7n0 ~ # vendstat -N -G 0x080069000000a4d8
hw_dev_rev: 0x0001
hw_dev_id: 0xc738
hw_uptime: 0x00038410
fw_version: 09.01.58
fw_build_id: 0x2fb8
fw_date: 04/18/2012
fw_psid: '030_2617_00X_SX1'
fw_ini_ver: 0
sw_version: 00.00.09
bob
>
> -- Hal
>
>> r327i7n0 ~ # smpquery -D sl2vl 0,1 1
>> # SL2VL table: DR path slid 65535; dlid 65535; 0,1
>> # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
>> ports: in 0, out 1: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
>> ports: in 1, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 2, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 3, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 4, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 5, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 6, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 7, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 8, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 9, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 10, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 11, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 12, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 13, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 14, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 15, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 16, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 17, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 18, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 19, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 20, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 21, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 22, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 23, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 24, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 25, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 26, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 27, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 28, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 29, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 30, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 31, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 32, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 33, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 34, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 35, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>> ports: in 36, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>>
>> r327i7n0 ~ # smpquery -D sl2vl 0 1
>> # SL2VL table: DR path slid 65535; dlid 65535; 0
>> # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
>> ports: in 0, out 0: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
>>
>> r327i7n0 ~ # smpquery -D vlarb 0,1 1
>> # VLArbitration tables: DR path slid 65535; dlid 65535; 0,1 port 1
>> LowCap 8 HighCap 8
>> # Low priority VL Arbitration Table:
>> VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
>> WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |
>> # High priority VL Arbitration Table:
>> VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
>> WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |
>>
>> r327i7n0 ~ # smpquery -D vlarb 0 1
>> # VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 1 LowCap
>> 8 HighCap 8
>> # Low priority VL Arbitration Table:
>> VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
>> WEIGHT: |0x20|0x20|0x20|0x20|0x20|0x20|0x20|0x20|
>> # High priority VL Arbitration Table:
>> VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
>> WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
>>
>>
>> On ib1 (HCA port 2), QoS is enabled:
>>
>> r327i7n0 ~ # smpquery -P2 -D sl2vl 0 2
>> # SL2VL table: DR path slid 65535; dlid 65535; 0
>> # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
>> ports: in 0, out 0: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>>
>> r327i7n0 ~ # smpquery -P2 -D sl2vl 0,2 1
>> # SL2VL table: DR path slid 65535; dlid 65535; 0,2
>> # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
>> ports: in 0, out 1: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
>> ports: in 1, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 2, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 3, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 4, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 5, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 6, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 7, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 8, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 9, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 10, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 11, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 12, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 13, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 14, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 15, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 16, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 17, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 18, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 19, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 20, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 21, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 22, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 23, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 24, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 25, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 26, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 27, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 28, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 29, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 30, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 31, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 32, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 33, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 34, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 35, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
>> ports: in 36, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
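
One sanity check on a table like this is that every SL maps to a VL that is actually operational on the port. The sketch below is an illustration under stated assumptions, not something run in this thread: it parses one row of the `smpquery sl2vl` output shown above, assumes 8 data VLs (OperVLs VL0-7), and treats a mapping to VL15 as "discard", which is my understanding of the IBA convention. `parse_sl2vl_row` and `vls_outside` are hypothetical helper names.

```python
def parse_sl2vl_row(line):
    """Return {SL: VL} from one 'ports: in X, out Y: | 0| 1| ...' row."""
    cells = [c.strip() for c in line.split("|")[1:]]
    return {sl: int(vl) for sl, vl in enumerate(c for c in cells if c)}


def vls_outside(mapping, num_data_vls=8):
    """Return SLs mapped to a VL beyond the operational data VLs.

    VL15 is skipped: a mapping to VL15 is assumed to mean 'discard'.
    """
    return {sl: vl for sl, vl in mapping.items()
            if vl >= num_data_vls and vl != 15}


# Row copied from the QoS-enabled fabric above.
row = ("ports: in 1, out 1: | 0| 1| 2| 3| 4| 5| 6| 7|"
       " 3| 4| 5| 6| 7| 3| 4| 5|")

if __name__ == "__main__":
    mapping = parse_sl2vl_row(row)
    print("SL8 ->", "VL%d" % mapping[8])
    print("out of range:", vls_outside(mapping) or "none")
```

For the row shown, SL8 through SL15 fold back onto VL3 through VL7, so every SL lands on an operational VL.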
>>
>> r327i7n0 ~ # smpquery -P2 -D vlarb 0,2 1
>> # VLArbitration tables: DR path slid 65535; dlid 65535; 0,2 port 1
>> LowCap 8 HighCap 8
>> # Low priority VL Arbitration Table:
>> VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
>> WEIGHT: |0x0 |0x0 |0x0 |0x40|0x40|0x40|0x40|0x40|
>> # High priority VL Arbitration Table:
>> VL : |0x0 |0x1 |0x2 |0x0 |0x0 |0x0 |0x0 |0x0 |
>> WEIGHT: |0x80|0x40|0x40|0x0 |0x0 |0x0 |0x0 |0x0 |
>>
>> r327i7n0 ~ # smpquery -P2 -D vlarb 0 2
>> # VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 2 LowCap
>> 8 HighCap 8
>> # Low priority VL Arbitration Table:
>> VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
>> WEIGHT: |0x0 |0x0 |0x0 |0x40|0x40|0x40|0x40|0x40|
>> # High priority VL Arbitration Table:
>> VL : |0x0 |0x1 |0x2 |0x0 |0x0 |0x0 |0x0 |0x0 |
>> WEIGHT: |0x80|0x40|0x40|0x0 |0x0 |0x0 |0x0 |0x0 |
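
To read these tables at a glance, the weights can be turned into each VL's relative share within its table. This is a rough sketch added for illustration: it parses the `VL` and `WEIGHT` rows as printed above, skips zero-weight entries, and ignores the interaction between the high and low tables (VLHighLimit), so it says nothing about actual link bandwidth. `parse_row` and `weight_shares` are hypothetical names.

```python
def parse_row(line):
    """Return the hex cells of a 'VL : |0x0 |...' or 'WEIGHT: |...' row."""
    _, _, cells = line.partition(":")
    return [int(c.strip(), 16) for c in cells.split("|") if c.strip()]


def weight_shares(vl_line, weight_line):
    """Return {VL: fraction of the table's total weight}."""
    totals = {}
    for vl, weight in zip(parse_row(vl_line), parse_row(weight_line)):
        if weight:  # zero-weight entries contribute nothing
            totals[vl] = totals.get(vl, 0) + weight
    total = sum(totals.values())
    return {vl: w / total for vl, w in totals.items()} if total else {}


# Rows copied from the QoS-enabled fabric (HCA port 2) above.
high_vl = "VL : |0x0 |0x1 |0x2 |0x0 |0x0 |0x0 |0x0 |0x0 |"
high_wt = "WEIGHT: |0x80|0x40|0x40|0x0 |0x0 |0x0 |0x0 |0x0 |"
low_vl = "VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |"
low_wt = "WEIGHT: |0x0 |0x0 |0x0 |0x40|0x40|0x40|0x40|0x40|"

if __name__ == "__main__":
    for name, vl, wt in (("high", high_vl, high_wt), ("low", low_vl, low_wt)):
        shares = weight_shares(vl, wt)
        print(name, {f"VL{v}": f"{s:.0%}" for v, s in shares.items()})
```

With the QoS tables above, VL0 through VL2 appear only in the high-priority table and VL3 through VL7 only in the low-priority one.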
>>
>>>> Only in the case of a FW bug?
>>>
>>> I don't think flow control is performed by FW.
>>>
>>>> Any tunables that might affect this?
>>>
>>> No IBA standard ones AFAIK. Who's the HCA vendor?
>>>
>>> -- Hal
>>>
>>>> bob
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html