From: Bob Ciotti <Bob.Ciotti-NSQ8wuThN14@public.gmane.org>
To: Hal Rosenstock <hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Cc: "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: watchdog timer
Date: Fri, 18 May 2012 07:35:28 -0700
Message-ID: <4FB65E30.4070805@nasa.gov>
In-Reply-To: <4FB649A6.2060602-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>

On 05/18/2012 06:07 AM, Hal Rosenstock wrote:
 > On 5/18/2012 2:05 AM, Bob Ciotti wrote:
 >>
 >>
 >> I'm seeing lots of these messages in SM log:
 >>
 >> May 17 22:36:04 947774 [DA234710] 0x01 ->  log_trap_info: Received
 >> Generic Notice type:1 num:131 (Flow Control Update watchdog timer
 >> expired) Producer:2 (Switch) from LID:444 Port 5 TID:0x0000000000000025
 >>
 >> The referenced port is on a switch-to-HCA link.
 >>
 >> I've seen this in cases where there was bad hardware. The spec says it
 >> indicates a failure in the flow control machine on the other end. But
 >> let's assume the hardware is good. When could this occur?
 >
 > Do OperationalVLs match on both sides of the link ? Are you
 > using/configuring QoS ?
 >
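
To see which links raise the trap most often, the SM log lines quoted above can be tallied per (LID, port). This is only a minimal sketch: it assumes each log entry sits on one line in the format quoted above, and `tally_traps` and `TRAP_RE` are illustrative names, not part of OpenSM.

```python
import re
from collections import Counter

# Fields of a log_trap_info line, in the format quoted above (assumed).
TRAP_RE = re.compile(
    r"Generic Notice type:(\d+) num:(\d+) \((.*?)\).*?from LID:(\d+) Port (\d+)"
)


def tally_traps(lines, trap_num=131):
    """Count occurrences of one trap number per (LID, port)."""
    counts = Counter()
    for line in lines:
        m = TRAP_RE.search(line)
        if m and int(m.group(2)) == trap_num:
            counts[(int(m.group(4)), int(m.group(5)))] += 1
    return counts


# The log entry from this thread, joined onto a single line.
sample = (
    "May 17 22:36:04 947774 [DA234710] 0x01 ->  log_trap_info: Received "
    "Generic Notice type:1 num:131 (Flow Control Update watchdog timer "
    "expired) Producer:2 (Switch) from LID:444 Port 5 TID:0x0000000000000025"
)

for (lid, port), n in tally_traps([sample]).most_common():
    print(f"LID {lid} port {port}: {n} trap(s)")
```

Fed the whole SM log, the same function would show whether the trap is confined to a few links or spread across the fabric.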


There are two separate fabrics, one on each port of a 2-port HCA.
The issue is seen on both fabrics.
Normally we use QoS on both fabrics. QoS is now disabled on
ib0 (HCA port 1):

r327i7n0 ~ # smpquery portinfo 248 | grep VL
VLCap:...........................VL0-7
VLHighLimit:.....................4
VLArbHighCap:....................8
VLArbLowCap:.....................8
VLStallCount:....................0
OperVLs:.........................VL0-7
r327i7n0 ~ # smpquery -D portinfo 0 1 | grep VL
VLCap:...........................VL0-7
VLHighLimit:.....................4
VLArbHighCap:....................8
VLArbLowCap:.....................8
VLStallCount:....................0
OperVLs:.........................VL0-7
r327i7n0 ~ # smpquery -D portinfo 0,1 1 | grep VL
VLCap:...........................VL0-7
VLHighLimit:.....................4
VLArbHighCap:....................8
VLArbLowCap:.....................8
VLStallCount:....................7
OperVLs:.........................VL0-7
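
The check Hal asked about (do the operational VLs match on both ends of the link?) can be scripted against saved `smpquery portinfo` output. This is a rough sketch, assuming the `Name:....value` layout shown above and taking the second and third outputs as the HCA and switch ends of the link. `parse_portinfo` and `vl_mismatches` are made-up helper names.

```python
import re


def parse_portinfo(text):
    """Parse 'Name:.....value' lines of smpquery portinfo output into a dict."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^(\w+):\.*(.*)$", line.strip())
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return fields


def vl_mismatches(local, remote, keys=("VLCap", "OperVLs")):
    """Return {field: (local, remote)} for each field that differs."""
    return {
        k: (local.get(k), remote.get(k))
        for k in keys
        if local.get(k) != remote.get(k)
    }


# Output copied from the transcript above.
HCA_PORT = """\
VLCap:...........................VL0-7
VLHighLimit:.....................4
VLArbHighCap:....................8
VLArbLowCap:.....................8
VLStallCount:....................0
OperVLs:.........................VL0-7
"""

SWITCH_PORT = """\
VLCap:...........................VL0-7
VLHighLimit:.....................4
VLArbHighCap:....................8
VLArbLowCap:.....................8
VLStallCount:....................7
OperVLs:.........................VL0-7
"""

hca = parse_portinfo(HCA_PORT)
switch = parse_portinfo(SWITCH_PORT)
print("VL mismatches:", vl_mismatches(hca, switch))
```

With the values above, `VLCap` and `OperVLs` agree on both ends; the only field that differs is `VLStallCount` (0 on the HCA, 7 on the switch).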

r327i7n0 ~ # ibstat
CA 'mlx4_0'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.10.4350
	Hardware version: 0
	Node GUID: 0x0002c90300336b20
	System image GUID: 0x0002c90300336b23
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 56
		Base lid: 248
		LMC: 0
		SM lid: 1
		Capability mask: 0x02514868
		Port GUID: 0x0002c90300336b21
		Link layer: InfiniBand
	Port 2:
		State: Active
		Physical state: LinkUp
		Rate: 56
		Base lid: 1971
		LMC: 0
		SM lid: 1685
		Capability mask: 0x02514868
		Port GUID: 0x0002c90300336b22
		Link layer: InfiniBand

r327i7n0 ~ # smpquery -D nodeinfo 0,1 1
# Node info: DR path slid 65535; dlid 65535; 0,1
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Switch
NumPorts:........................36
SystemGuid:......................0x080069000000a4db
Guid:............................0x080069000000a4d8
PortGuid:........................0x080069000000a4d8
PartCap:.........................8
DevId:...........................0xc738
Revision:........................0x000000a1
LocalPort:.......................1
VendorId:........................0x0002c9

r327i7n0 ~ # smpquery -D nodedesc 0,1
Node Description:.SwitchX -  Mellanox Technologies

r327i7n0 ~ # smpquery -D sl2vl 0,1 1
# SL2VL table: DR path slid 65535; dlid 65535; 0,1
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  1: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
ports: in  1, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  2, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  3, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  4, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  5, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  6, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  7, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  8, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  9, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 10, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 11, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 12, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 13, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 14, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 15, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 16, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 17, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 18, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 19, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 20, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 21, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 22, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 23, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 24, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 25, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 26, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 27, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 28, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 29, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 30, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 31, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 32, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 33, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 34, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 35, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 36, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
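
One way to sanity-check an SL2VL dump like the one above is to parse each row and confirm that no SL maps to a VL outside the operational range (VL0-7 here). This is a sketch under the assumption that rows follow the `ports: in N, out M: |..|` layout shown above; `parse_sl2vl` and `rows_out_of_range` are hypothetical helpers.

```python
import re

ROW_RE = re.compile(r"ports: in\s+(\d+), out\s+(\d+): \|(.*)\|")


def parse_sl2vl(text):
    """Map (in_port, out_port) -> list of 16 VLs, indexed by SL."""
    table = {}
    for line in text.splitlines():
        m = ROW_RE.search(line)
        if m:
            vls = [int(cell) for cell in m.group(3).split("|")]
            table[(int(m.group(1)), int(m.group(2)))] = vls
    return table


def rows_out_of_range(table, max_vl):
    """Return the (in, out) pairs that map some SL to a VL above max_vl."""
    return sorted(key for key, vls in table.items() if max(vls) > max_vl)


# Two rows copied from the switch table above.
SWITCH_SL2VL = """\
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  1: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
ports: in  1, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
"""

table = parse_sl2vl(SWITCH_SL2VL)
print("rows above VL7:", rows_out_of_range(table, max_vl=7))
```

For in-ports 1 through 36 the switch maps SL n to VL n mod 8, so every row stays within VL0-7.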

r327i7n0 ~ # smpquery -D sl2vl 0 1
# SL2VL table: DR path slid 65535; dlid 65535; 0
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|

r327i7n0 ~ # smpquery -D vlarb 0,1 1
# VLArbitration tables: DR path slid 65535; dlid 65535; 0,1 port 1 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |
# High priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |

r327i7n0 ~ # smpquery -D vlarb 0 1
# VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 1 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x20|0x20|0x20|0x20|0x20|0x20|0x20|0x20|
# High priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |


On ib1 (HCA port 2), QoS is enabled:

r327i7n0 ~ # smpquery -P2 -D sl2vl 0 2
# SL2VL table: DR path slid 65535; dlid 65535; 0
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|

r327i7n0 ~ # smpquery -P2 -D sl2vl 0,2 1
# SL2VL table: DR path slid 65535; dlid 65535; 0,2
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  1: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
ports: in  1, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in  2, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in  3, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in  4, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in  5, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in  6, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in  7, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in  8, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in  9, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 10, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 11, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 12, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 13, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 14, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 15, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 16, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 17, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 18, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 19, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 20, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 21, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 22, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 23, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 24, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 25, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 26, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 27, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 28, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 29, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 30, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 31, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 32, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 33, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 34, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 35, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
ports: in 36, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|

r327i7n0 ~ # smpquery -P2 -D vlarb 0,2 1
# VLArbitration tables: DR path slid 65535; dlid 65535; 0,2 port 1 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x0 |0x0 |0x0 |0x40|0x40|0x40|0x40|0x40|
# High priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x0 |0x0 |0x0 |0x0 |0x0 |
WEIGHT: |0x80|0x40|0x40|0x0 |0x0 |0x0 |0x0 |0x0 |

r327i7n0 ~ # smpquery -P2 -D vlarb 0 2
# VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 2 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x0 |0x0 |0x0 |0x40|0x40|0x40|0x40|0x40|
# High priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x0 |0x0 |0x0 |0x0 |0x0 |
WEIGHT: |0x80|0x40|0x40|0x0 |0x0 |0x0 |0x0 |0x0 |
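
With QoS enabled it is worth confirming that every VL reachable through SL2VL has a non-zero weight in at least one arbitration table, since a zero-weight entry is skipped by the arbiter. The sketch below does that for the ib1 tables above. It assumes the `VL`/`WEIGHT` row layout shown above, and `parse_vlarb` and `starved_vls` are made-up names.

```python
def parse_vlarb(text):
    """Return (vl, weight) pairs from every table in smpquery vlarb output."""
    entries = []
    vls = None
    for line in text.splitlines():
        label, _, rest = line.partition(":")
        label = label.strip()
        if label not in ("VL", "WEIGHT"):
            continue  # skip comment and header lines
        cells = [int(c, 16) for c in rest.split("|") if c.strip()]
        if label == "VL":
            vls = cells
        elif vls is not None:
            entries.extend(zip(vls, cells))
            vls = None
    return entries


def starved_vls(sl2vl_row, vlarb_entries):
    """VLs that SL2VL can select but that have no non-zero arbitration weight."""
    served = {vl for vl, weight in vlarb_entries if weight > 0}
    return sorted(set(sl2vl_row) - served)


# ib1 values copied from the output above.
SL2VL_ROW = [0, 1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 6, 7, 3, 4, 5]
VLARB = """\
# Low priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x0 |0x0 |0x0 |0x40|0x40|0x40|0x40|0x40|
# High priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x0 |0x0 |0x0 |0x0 |0x0 |
WEIGHT: |0x80|0x40|0x40|0x0 |0x0 |0x0 |0x0 |0x0 |
"""

print("starved VLs:", starved_vls(SL2VL_ROW, parse_vlarb(VLARB)))
```

For these tables VL0-2 are served from the high-priority table and VL3-7 from the low-priority table, so no VL in use is left without a weight.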



>> Only in the case of FW bug?
>
> I don't think flow control is performed by FW.
>
>> Any tunables that might impact this?
>
> No IBA standard ones AFAIK. Who's the HCA vendor ?
>
> -- Hal
>
>> bob
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>

