public inbox for linux-rdma@vger.kernel.org
* Re: [ewg] ibcheckerrors "Port All FAILED" reported
       [not found] ` <382A478CAD40FA4FB46605CF81FE39F45685DEAD-osO9UTpF0URzLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2010-05-06  1:09   ` Ira Weiny
       [not found]     ` <20100505180943.a9bbb74e.weiny2-i2BcT+NCU+M@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Ira Weiny @ 2010-05-06  1:09 UTC (permalink / raw)
  To: Woodruff, Robert J,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: EWG, tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org

Interesting...

I have a switch which does this as well.  Tracing through the scripts shows
that the perfquery command is failing like this.

14:29:03 > ./perfquery 40 255
./perfquery: iberror: failed: AllPortSelect not supported

It seems there is an issue with the CapabilityMask value...

14:43:32 > ./perfquery 40 255
cap_mask 0x400  <=== my debug output
./perfquery: iberror: failed: AllPortSelect not supported

14:43:38 > ./saquery CPI 40
SA ClassPortInfo:
...
                Capability mask..........0x2602
...

Those don't match because...  perfquery has a bug...

perfquery is issuing a PMA query when it should be issuing an SA query.  It
just so happens that on some switches the result of that PMA query indicates
that AllPortSelect is available.  Patch to follow.
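
For anyone following along, the failing check can be sketched as a bit test on
the PMA ClassPortInfo CapabilityMask.  This is only an illustration: it assumes
AllPortSelect is bit 8 (0x100) of that mask, and the constant and function
names are made up here rather than taken from perfquery.

```python
# Assumption: AllPortSelect is bit 8 (0x100) of the PMA ClassPortInfo
# CapabilityMask. The names below are illustrative, not perfquery's own.
PM_ALL_PORT_SELECT = 1 << 8


def supports_all_port_select(cap_mask: int) -> bool:
    """Return True if the AllPortSelect bit is set in cap_mask."""
    return bool(cap_mask & PM_ALL_PORT_SELECT)


if __name__ == "__main__":
    # 0x400 is the value printed by the debug output in this message.
    print(hex(0x400), supports_all_port_select(0x400))
```

Under that assumption a mask of 0x400 has the bit clear, which would be
consistent with the "AllPortSelect not supported" message.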

Ira


On Wed, 5 May 2010 13:47:54 -0700
"Woodruff, Robert J" <robert.j.woodruff-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:

> 
> Hi guys,
> 
> When I run ibcheckerrors on my Mellanox switch,
> it is reporting that Port all FAILED. 
> 
> From what I can tell, the switch is working fine and
> I think that this is a bogus error from the program.
> 
> If this is indeed not a real problem, can the diagnostic
> be fixed to not report this as an error ?
> 
> 
> ibcheckerrors -nocolor -v -t 100
> 
> # Checking Switch: nodeguid 0x0002c902004046a0
> Node check lid 7: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port all: FAILED   <------------
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 2: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 3: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 7: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 8: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 9: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 10: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 17: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 18: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 20: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 25: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 26: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 27: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 28: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 34: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 35: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 36: OK
> 
> # Checking Ca: nodeguid 0x0002c9030002628a
> Node check lid 14: OK
> Error check on lid 14 (cstnh-2 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c90300025e0a
> Node check lid 12: OK
> Error check on lid 12 (cstnh-3 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030002615e
> Node check lid 15: OK
> Error check on lid 15 (cstnh-4 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e442
> Node check lid 11: OK
> Error check on lid 11 (cstnh-8 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e44e
> Node check lid 8: OK
> Error check on lid 8 (cstnh-11 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e3e6
> Node check lid 2: OK
> Error check on lid 2 (cstnh-13 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e44a
> Node check lid 18: OK
> Error check on lid 18 (cstnh-9 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c90300044fb4
> Node check lid 13: OK
> Error check on lid 13 (cstnh-7 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c90300044fbc
> Node check lid 10: OK
> Error check on lid 10 (cstnh-1 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e3ee
> Node check lid 9: OK
> Error check on lid 9 (cstnh-10 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e446
> Node check lid 4: OK
> Error check on lid 4 (cstnh-12 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e22e
> Node check lid 1: OK
> Error check on lid 1 (cstnh-14 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e43e
> Node check lid 19: OK
> Error check on lid 19 (cstnh-15 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0090270002000345
> Node check lid 6: OK
> Error check on lid 6 (cstnh-5 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0090270002000335
> Node check lid 5: OK
> Error check on lid 5 (cstnh-6 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c90300028238
> Node check lid 3: OK
> Error check on lid 3 (cst-linux HCA-1) port 1: OK
> 
> ## Summary: 17 nodes checked, 0 bad nodes found
> ##          32 ports checked, 0 ports have errors beyond threshold
> _______________________________________________
> ewg mailing list
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
> 


-- 
Ira Weiny <weiny2-i2BcT+NCU+M@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] ibcheckerrors "Port All FAILED" reported
       [not found]     ` <20100505180943.a9bbb74e.weiny2-i2BcT+NCU+M@public.gmane.org>
@ 2010-05-06  1:57       ` Ira Weiny
  2010-05-06 13:26       ` Mike Heinz
  2010-05-06 21:11       ` Sasha Khapyorsky
  2 siblings, 0 replies; 7+ messages in thread
From: Ira Weiny @ 2010-05-06  1:57 UTC (permalink / raw)
  To: Woodruff, Robert J,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, EWG,
	tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org

Never mind, I was wrong in my previous message.

However, there is an option to "emulate" the all-ports query when it is not
supported.

I believe that is a way to fix this.
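
A rough sketch of what such emulation could look like: when AllPortSelect is
not supported, query each port individually and add the counters up.
Everything here is hypothetical.  `read_port_counters` stands in for whatever
actually issues the per-port query, and plain summing ignores counter
saturation.

```python
from typing import Callable, Dict

ALL_PORTS = 255  # port number passed for "all ports" in the perfquery runs in this thread

Counters = Dict[str, int]


def query_counters(
    port: int,
    num_ports: int,
    all_port_select: bool,
    read_port_counters: Callable[[int], Counters],  # hypothetical per-port reader
) -> Counters:
    """Return counters for one port, or an aggregate when port == ALL_PORTS."""
    if port != ALL_PORTS or all_port_select:
        # Either a single port, or the device can answer the all-ports query itself.
        return read_port_counters(port)

    # Emulation: visit ports 1..num_ports and sum each counter by name.
    total: Counters = {}
    for p in range(1, num_ports + 1):
        for name, value in read_port_counters(p).items():
            total[name] = total.get(name, 0) + value
    return total


if __name__ == "__main__":
    # Fake reader for demonstration only: every port reports the same values.
    fake = lambda p: {"SymbolErrors": 1, "LinkDowned": 0}
    print(query_counters(ALL_PORTS, 3, False, fake))
```
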
Ira



-- 
Ira Weiny <weiny2-i2BcT+NCU+M@public.gmane.org>


* RE: [ewg] ibcheckerrors "Port All FAILED" reported
       [not found]     ` <20100505180943.a9bbb74e.weiny2-i2BcT+NCU+M@public.gmane.org>
  2010-05-06  1:57       ` Ira Weiny
@ 2010-05-06 13:26       ` Mike Heinz
       [not found]         ` <4C2744E8AD2982428C5BFE523DF8CDCB49A4740C58-amwN6d8PyQWXx9kJd3VG2h2eb7JE58TQ@public.gmane.org>
  2010-05-06 21:11       ` Sasha Khapyorsky
  2 siblings, 1 reply; 7+ messages in thread
From: Mike Heinz @ 2010-05-06 13:26 UTC (permalink / raw)
  To: Ira Weiny, Woodruff, Robert J,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: EWG, tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org

Ira, 

I'm pretty sure I already fixed this problem. I submitted a patch to Sasha back in April.




* Re: [ewg] ibcheckerrors "Port All FAILED" reported
       [not found]         ` <4C2744E8AD2982428C5BFE523DF8CDCB49A4740C58-amwN6d8PyQWXx9kJd3VG2h2eb7JE58TQ@public.gmane.org>
@ 2010-05-06 15:34           ` Ira Weiny
       [not found]             ` <20100506083455.951377af.weiny2-i2BcT+NCU+M@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Ira Weiny @ 2010-05-06 15:34 UTC (permalink / raw)
  To: Mike Heinz, Sasha Khapyorsky
  Cc: Woodruff, Robert J,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, EWG,
	tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org

On Thu, 6 May 2010 06:26:55 -0700
Mike Heinz <michael.heinz-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org> wrote:

> Ira, 
> 
> I'm pretty sure I already fixed this problem. I submitted a patch to Sasha
> back in April.

The tests below are with the current master.

git://git.openfabrics.org/~sashak/management


Ira



-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org


* RE: [ewg] ibcheckerrors "Port All FAILED" reported
       [not found]             ` <20100506083455.951377af.weiny2-i2BcT+NCU+M@public.gmane.org>
@ 2010-05-06 15:41               ` Mike Heinz
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Heinz @ 2010-05-06 15:41 UTC (permalink / raw)
  To: Ira Weiny, Sasha Khapyorsky
  Cc: Woodruff, Robert J,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, EWG,
	tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org

Yup - I've also sent a note to Sasha asking what happened to the patch.

-----Original Message-----
From: Ira Weiny [mailto:weiny2-i2BcT+NCU+M@public.gmane.org] 
Sent: Thursday, May 06, 2010 11:35 AM
To: Mike Heinz; Sasha Khapyorsky
Cc: Woodruff, Robert J; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; EWG; tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org
Subject: Re: [ewg] ibcheckerrors "Port All FAILED" reported

On Thu, 6 May 2010 06:26:55 -0700
Mike Heinz <michael.heinz-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org> wrote:

> Ira, 
> 
> I'm pretty sure I already fixed this problem. I submitted a patch to Sasha
> back in April.

The tests below are with the current master.

git://git.openfabrics.org/~sashak/management


Ira

> 
> 
> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Ira Weiny
> Sent: Wednesday, May 05, 2010 9:10 PM
> To: Woodruff, Robert J; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Cc: EWG; tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org
> Subject: Re: [ewg] ibcheckerrors "Port All FAILED" reported
> 
> Interesting...
> 
> I have a switch which does this as well.  Tracing through the scripts shows
> that the perfquery command is failing like this.
> 
> 14:29:03 > ./perfquery 40 255
> ./perfquery: iberror: failed: AllPortSelect not supported
> 
> It seems there is an issue with the CapabilityMask value...
> 
> 14:43:32 > ./perfquery 40 255
> cap_mask 0x400  <=== my debug output
> ./perfquery: iberror: failed: AllPortSelect not supported
> 
> 14:43:38 > ./saquery CPI 40
> SA ClassPortInfo:
> ...
>                 Capability mask..........0x2602
> ...
> 
> Those don't match because...  perfquery has a bug...
> 
> perfquery is issuing a PMA query when it should be issuing a SA query.  It
> just so happens that on some switches the result of that PMA query indicates
> AllPortSelect is available.  Patch to follow.
> 
> Ira
> 
> 
> On Wed, 5 May 2010 13:47:54 -0700
> "Woodruff, Robert J" <robert.j.woodruff-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> 
> > 
> > Hi guys,
> > 
> > When I run ibcheckerrors on my Mellanox switch,
> > it is reporting that Port all FAILED. 
> > 
> > From what I can tell, the switch is working fine and
> > I think that this is a bogus error from the program.
> > 
> > If this is indeed not a real problem, can the diagnostic
> > be fixed to not report this as an error ?
> > 
> > 
> > ibcheckerrors -nocolor -v -t 100
> > 
> > # Checking Switch: nodeguid 0x0002c902004046a0
> > Node check lid 7: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port all: FAILED   <------------
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 2: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 3: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 7: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 8: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 9: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 10: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 17: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 18: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 20: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 25: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 26: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 27: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 28: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 34: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 35: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 36: OK
> > 
> >  Checking Ca: nodeguid 0x0002c9030002628a
> > Node check lid 14: OK
> > Error check on lid 14 (cstnh-2 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c90300025e0a
> > Node check lid 12: OK
> > Error check on lid 12 (cstnh-3 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030002615e
> > Node check lid 15: OK
> > Error check on lid 15 (cstnh-4 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e442
> > Node check lid 11: OK
> > Error check on lid 11 (cstnh-8 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e44e
> > Node check lid 8: OK
> > Error check on lid 8 (cstnh-11 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e3e6
> > Node check lid 2: OK
> > Error check on lid 2 (cstnh-13 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e44a
> > Node check lid 18: OK
> > Error check on lid 18 (cstnh-9 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c90300044fb4
> > Node check lid 13: OK
> > Error check on lid 13 (cstnh-7 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c90300044fbc
> > Node check lid 10: OK
> > Error check on lid 10 (cstnh-1 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e3ee
> > Node check lid 9: OK
> > Error check on lid 9 (cstnh-10 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e446
> > Node check lid 4: OK
> > Error check on lid 4 (cstnh-12 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e22e
> > Node check lid 1: OK
> > Error check on lid 1 (cstnh-14 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e43e
> > Node check lid 19: OK
> > Error check on lid 19 (cstnh-15 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0090270002000345
> > Node check lid 6: OK
> > Error check on lid 6 (cstnh-5 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0090270002000335
> > Node check lid 5: OK
> > Error check on lid 5 (cstnh-6 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c90300028238
> > Node check lid 3: OK
> > Error check on lid 3 (cst-linux HCA-1) port 1: OK
> > 
> > ## Summary: 17 nodes checked, 0 bad nodes found
> > ##          32 ports checked, 0 ports have errors beyond threshold
> > _______________________________________________
> > ewg mailing list
> > ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
> > 
> 
> 
> -- 
> Ira Weiny <weiny2-i2BcT+NCU+M@public.gmane.org>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] ibcheckerrors "Port All FAILED" reported
       [not found]         ` <20100506211124.GH7099-o14lFNPAa+WKTadZzrrH2Q@public.gmane.org>
@ 2010-05-06 21:08           ` Ira Weiny
  0 siblings, 0 replies; 7+ messages in thread
From: Ira Weiny @ 2010-05-06 21:08 UTC (permalink / raw)
  To: Sasha Khapyorsky
  Cc: Woodruff, Robert J,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, EWG,
	tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org

On Thu, 6 May 2010 14:11:24 -0700
Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> wrote:

> On 18:09 Wed 05 May     , Ira Weiny wrote:
> > 
> > 14:29:03 > ./perfquery 40 255
> > ./perfquery: iberror: failed: AllPortSelect not supported
> > 
> > It seems there is an issue with the CapabilityMask value...
> > 
> > 14:43:32 > ./perfquery 40 255
> > cap_mask 0x400  <=== my debug output
> > ./perfquery: iberror: failed: AllPortSelect not supported
> > 
> > 14:43:38 > ./saquery CPI 40
> > SA ClassPortInfo:
> > ...
> >                 Capability mask..........0x2602
> > ...
> > 
> > Those don't match because...  perfquery has a bug...
> > 
> > perfquery is issuing a PMA query when it should be issuing a SA query.
> 
> I'm not following. How are the SA and PM ClassPortInfo(s) related to
> each other?

It's not, I was confused...  :-D

Ira

> 
> Sasha


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org


* Re: ibcheckerrors "Port All FAILED" reported
       [not found]     ` <20100505180943.a9bbb74e.weiny2-i2BcT+NCU+M@public.gmane.org>
  2010-05-06  1:57       ` Ira Weiny
  2010-05-06 13:26       ` Mike Heinz
@ 2010-05-06 21:11       ` Sasha Khapyorsky
       [not found]         ` <20100506211124.GH7099-o14lFNPAa+WKTadZzrrH2Q@public.gmane.org>
  2 siblings, 1 reply; 7+ messages in thread
From: Sasha Khapyorsky @ 2010-05-06 21:11 UTC (permalink / raw)
  To: Ira Weiny; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, EWG

On 18:09 Wed 05 May     , Ira Weiny wrote:
> 
> 14:29:03 > ./perfquery 40 255
> ./perfquery: iberror: failed: AllPortSelect not supported
> 
> It seems there is an issue with the CapabilityMask value...
> 
> 14:43:32 > ./perfquery 40 255
> cap_mask 0x400  <=== my debug output
> ./perfquery: iberror: failed: AllPortSelect not supported
> 
> 14:43:38 > ./saquery CPI 40
> SA ClassPortInfo:
> ...
>                 Capability mask..........0x2602
> ...
> 
> Those don't match because...  perfquery has a bug...
> 
> perfquery is issuing a PMA query when it should be issuing a SA query.

I'm not following. How are the SA and PM ClassPortInfo(s) related to
each other?

Sasha
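
For readers following the capability mask discussion above, here is a
minimal sketch of the kind of check that produces "AllPortSelect not
supported". It assumes AllPortSelect is bit 8 (0x100) of the PerfMgt
ClassPortInfo CapabilityMask; verify that against the IBA specification
or your libibmad headers. The helper names and the per-port fallback are
hypothetical and are not taken from perfquery.

```python
# Assumption: AllPortSelect is bit 8 (0x100) of the PerfMgt (PM)
# ClassPortInfo CapabilityMask. Check your spec/headers before relying on it.
PM_ALL_PORT_SELECT = 1 << 8

# Port number meaning "all ports", as used in "perfquery 40 255".
ALL_PORTS = 255


def supports_all_port_select(pm_cap_mask: int) -> bool:
    """Return True if the PM capability mask advertises AllPortSelect."""
    return bool(pm_cap_mask & PM_ALL_PORT_SELECT)


def ports_to_query(pm_cap_mask: int, port: int, num_ports: int) -> list[int]:
    """Hypothetical helper: choose which port numbers to query.

    If "all ports" is requested but the mask lacks AllPortSelect, fall
    back to querying each port individually instead of failing.
    """
    if port != ALL_PORTS:
        return [port]
    if supports_all_port_select(pm_cap_mask):
        return [ALL_PORTS]
    return list(range(1, num_ports + 1))


if __name__ == "__main__":
    # 0x400 is the PM mask from the debug output in this thread.
    print(supports_all_port_select(0x400))
    print(ports_to_query(0x400, ALL_PORTS, 3))
```

Under that assumption, the 0x400 reported by the debug output does not
have the AllPortSelect bit set, which is consistent with the error
message. The SA mask (0x2602) belongs to a different management class,
so its bits should not be interpreted with the PM bit definitions.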


end of thread, other threads:[~2010-05-06 21:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <382A478CAD40FA4FB46605CF81FE39F45685DEAD@orsmsx507.amr.corp.intel.com>
     [not found] ` <382A478CAD40FA4FB46605CF81FE39F45685DEAD-osO9UTpF0URzLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2010-05-06  1:09   ` [ewg] ibcheckerrors "Port All FAILED" reported Ira Weiny
     [not found]     ` <20100505180943.a9bbb74e.weiny2-i2BcT+NCU+M@public.gmane.org>
2010-05-06  1:57       ` Ira Weiny
2010-05-06 13:26       ` Mike Heinz
     [not found]         ` <4C2744E8AD2982428C5BFE523DF8CDCB49A4740C58-amwN6d8PyQWXx9kJd3VG2h2eb7JE58TQ@public.gmane.org>
2010-05-06 15:34           ` Ira Weiny
     [not found]             ` <20100506083455.951377af.weiny2-i2BcT+NCU+M@public.gmane.org>
2010-05-06 15:41               ` Mike Heinz
2010-05-06 21:11       ` Sasha Khapyorsky
     [not found]         ` <20100506211124.GH7099-o14lFNPAa+WKTadZzrrH2Q@public.gmane.org>
2010-05-06 21:08           ` [ewg] " Ira Weiny
