* Re: [ewg] ibcheckerrors "Port All FAILED" reported
From: Ira Weiny @ 2010-05-06 1:09 UTC
To: Woodruff, Robert J, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: EWG, tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org

Interesting...

I have a switch which does this as well. Tracing through the scripts shows
that the perfquery command is failing like this.

14:29:03 > ./perfquery 40 255
./perfquery: iberror: failed: AllPortSelect not supported

It seems there is an issue with the CapabilityMask value...

14:43:32 > ./perfquery 40 255
cap_mask 0x400 <=== my debug output
./perfquery: iberror: failed: AllPortSelect not supported

14:43:38 > ./saquery CPI 40
SA ClassPortInfo:
...
Capability mask..........0x2602
...

Those don't match because... perfquery has a bug...

perfquery is issuing a PMA query when it should be issuing a SA query. It
just so happens that on some switches the result of that PMA query indicates
AllPortSelect is available. Patch to follow.

Ira

On Wed, 5 May 2010 13:47:54 -0700
"Woodruff, Robert J" <robert.j.woodruff-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:

>
> Hi guys,
>
> When I run ibcheckerrors on my Mellanox switch,
> it is reporting that Port all FAILED.
>
> From what I can tell, the switch is working fine and
> I think that this is a bogus error from the program.
>
> If this is indeed not a real problem, can the diagnostic
> be fixed to not report this as an error ?
>
>
> ibcheckerrors -nocolor -v -t 100
>
> # Checking Switch: nodeguid 0x0002c902004046a0
> Node check lid 7: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port all: FAILED <------------
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 2: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 3: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 7: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 8: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 9: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 10: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 17: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 18: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 20: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 25: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 26: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 27: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 28: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 34: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 35: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 36: OK
>
> Checking Ca: nodeguid 0x0002c9030002628a
> Node check lid 14: OK
> Error check on lid 14 (cstnh-2 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0002c90300025e0a
> Node check lid 12: OK
> Error check on lid 12 (cstnh-3 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0002c9030002615e
> Node check lid 15: OK
> Error check on lid 15 (cstnh-4 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0002c9030008e442
> Node check lid 11: OK
> Error check on lid 11 (cstnh-8 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0002c9030008e44e
> Node check lid 8: OK
> Error check on lid 8 (cstnh-11 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0002c9030008e3e6
> Node check lid 2: OK
> Error check on lid 2 (cstnh-13 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0002c9030008e44a
> Node check lid 18: OK
> Error check on lid 18 (cstnh-9 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0002c90300044fb4
> Node check lid 13: OK
> Error check on lid 13 (cstnh-7 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0002c90300044fbc
> Node check lid 10: OK
> Error check on lid 10 (cstnh-1 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0002c9030008e3ee
> Node check lid 9: OK
> Error check on lid 9 (cstnh-10 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0002c9030008e446
> Node check lid 4: OK
> Error check on lid 4 (cstnh-12 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0002c9030008e22e
> Node check lid 1: OK
> Error check on lid 1 (cstnh-14 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0002c9030008e43e
> Node check lid 19: OK
> Error check on lid 19 (cstnh-15 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0090270002000345
> Node check lid 6: OK
> Error check on lid 6 (cstnh-5 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0090270002000335
> Node check lid 5: OK
> Error check on lid 5 (cstnh-6 HCA-1) port 1: OK
>
> # Checking Ca: nodeguid 0x0002c90300028238
> Node check lid 3: OK
> Error check on lid 3 (cst-linux HCA-1) port 1: OK
>
> ## Summary: 17 nodes checked, 0 bad nodes found
> ## 32 ports checked, 0 ports have errors beyond threshold
> _______________________________________________
> ewg mailing list
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>

--
Ira Weiny <weiny2-i2BcT+NCU+M@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

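The "AllPortSelect not supported" message above comes down to a single bit test on the PerfMgt ClassPortInfo CapabilityMask. The following is a minimal sketch only, assuming AllPortSelect is advertised in bit 8 (0x100) of that mask; that bit position is recalled from the InfiniBand specification and is not stated in this thread. The constant and function names are hypothetical and are not taken from perfquery. The second helper simply lists which bits are set in each of the two masks quoted above.

```python
# Assumption (not from the thread): bit 8 (0x100) of the PerfMgt
# ClassPortInfo CapabilityMask advertises AllPortSelect.
PM_CAP_ALL_PORT_SELECT = 0x100  # hypothetical constant name


def supports_all_port_select(pm_cap_mask):
    """Return True if the assumed AllPortSelect bit is set in a PM mask."""
    return bool(pm_cap_mask & PM_CAP_ALL_PORT_SELECT)


def set_bits(mask):
    """List the positions of the bits that are set, lowest first."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


pm_mask = 0x400   # "cap_mask 0x400" from the perfquery debug output
sa_mask = 0x2602  # "Capability mask" from the saquery CPI output

print(supports_all_port_select(pm_mask))  # False
print(set_bits(pm_mask))                  # [10]
print(set_bits(sa_mask))                  # [1, 9, 10, 13]
```

Under that assumption, a PM mask of 0x400 has only bit 10 set, so the AllPortSelect test fails regardless of what the SA mask contains.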
* Re: [ewg] ibcheckerrors "Port All FAILED" reported
From: Ira Weiny @ 2010-05-06 1:57 UTC
To: Woodruff, Robert J, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, EWG, tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org

Nevermind, I am wrong about the below. However, there is an option to
"emulate" the all ports when it is not supported. That is a way to fix this
I believe.

Ira

On Wed, 5 May 2010 18:09:43 -0700
Ira Weiny <weiny2-i2BcT+NCU+M@public.gmane.org> wrote:

> Interesting...
>
> I have a switch which does this as well. Tracing through the scripts shows
> that the perfquery command is failing like this.
>
> 14:29:03 > ./perfquery 40 255
> ./perfquery: iberror: failed: AllPortSelect not supported
>
> It seems there is an issue with the CapabilityMask value...
>
> 14:43:32 > ./perfquery 40 255
> cap_mask 0x400 <=== my debug output
> ./perfquery: iberror: failed: AllPortSelect not supported
>
> 14:43:38 > ./saquery CPI 40
> SA ClassPortInfo:
> ...
> Capability mask..........0x2602
> ...
>
> Those don't match because... perfquery has a bug...
>
> perfquery is issuing a PMA query when it should be issuing a SA query. It
> just so happens that on some switches the result of that PMA query indicates
> AllPortSelect is available. Patch to follow.
>
> Ira

--
Ira Weiny <weiny2-i2BcT+NCU+M@public.gmane.org>

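The "emulate" approach mentioned above can be sketched as a loop over the individual ports that sums each counter. This is only an illustration under stated assumptions: `query_port` is a hypothetical callable standing in for a per-port counter query, the counter names and values are made-up sample data, and real port counters saturate at their field width, which this sketch ignores. It is not perfquery's actual code.

```python
from collections import Counter


def emulate_all_ports(query_port, ports):
    """Aggregate per-port counters when the device cannot do it itself.

    query_port: hypothetical callable, port number -> {counter name: value}
    ports:      iterable of port numbers to visit
    """
    totals = Counter()
    for port in ports:
        # Counter.update adds the values for matching counter names.
        totals.update(query_port(port))
    return dict(totals)


# Made-up sample data standing in for real per-port query results.
sample_counters = {
    2: {"SymbolErrors": 1, "LinkDowned": 0},
    3: {"SymbolErrors": 2, "LinkDowned": 1},
}

totals = emulate_all_ports(sample_counters.__getitem__, [2, 3])
print(totals)
```

Passing the query as a callable keeps the aggregation separate from however the per-port values are actually obtained.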
* RE: [ewg] ibcheckerrors "Port All FAILED" reported
From: Mike Heinz @ 2010-05-06 13:26 UTC
To: Ira Weiny, Woodruff, Robert J, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: EWG, tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org

Ira,

I'm pretty sure I already fixed this problem. I submitted a patch to Sasha
back in April.

* Re: [ewg] ibcheckerrors "Port All FAILED" reported
From: Ira Weiny @ 2010-05-06 15:34 UTC
To: Mike Heinz, Sasha Khapyorsky
Cc: Woodruff, Robert J, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, EWG, tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org

On Thu, 6 May 2010 06:26:55 -0700
Mike Heinz <michael.heinz-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org> wrote:

> Ira,
>
> I'm pretty sure I already fixed this problem. I submitted a patch to Sasha
> back in April.

The tests below are with the current master.

git://git.openfabrics.org/~sashak/management

Ira

> Interesting...
>
> I have a switch which does this as well. Tracing through the scripts shows
> that the perfquery command is failing like this.
>
> 14:29:03 > ./perfquery 40 255
> ./perfquery: iberror: failed: AllPortSelect not supported
>
> It seems there is an issue with the CapabilityMask value...
>
> 14:43:32 > ./perfquery 40 255
> cap_mask 0x400 <=== my debug output
> ./perfquery: iberror: failed: AllPortSelect not supported
>
> 14:43:38 > ./saquery CPI 40
> SA ClassPortInfo:
> ...
> Capability mask..........0x2602
> ...
>
> Those don't match because... perfquery has a bug...
>
> perfquery is issuing a PMA query when it should be issuing a SA query. It
> just so happens that on some switches the result of that PMA query indicates
> AllPortSelect is available. Patch to follow.
>
> Ira

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org

* RE: [ewg] ibcheckerrors "Port All FAILED" reported
From: Mike Heinz @ 2010-05-06 15:41 UTC
To: Ira Weiny, Sasha Khapyorsky
Cc: Woodruff, Robert J, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, EWG, tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org

Yup - I've also sent a note to Sasha what happened to the patch.

* Re: ibcheckerrors "Port All FAILED" reported
  [not found] ` <20100505180943.a9bbb74e.weiny2-i2BcT+NCU+M@public.gmane.org>
  2010-05-06  1:57 ` Ira Weiny
  2010-05-06 13:26 ` Mike Heinz
@ 2010-05-06 21:11 ` Sasha Khapyorsky
  [not found] ` <20100506211124.GH7099-o14lFNPAa+WKTadZzrrH2Q@public.gmane.org>
  2 siblings, 1 reply; 7+ messages in thread
From: Sasha Khapyorsky @ 2010-05-06 21:11 UTC (permalink / raw)
To: Ira Weiny; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, EWG

On 18:09 Wed 05 May , Ira Weiny wrote:
>
> 14:29:03 > ./perfquery 40 255
> ./perfquery: iberror: failed: AllPortSelect not supported
>
> It seems there is an issue with the CapabilityMask value...
>
> 14:43:32 > ./perfquery 40 255
> cap_mask 0x400 <=== my debug output
> ./perfquery: iberror: failed: AllPortSelect not supported
>
> 14:43:38 > ./saquery CPI 40
> SA ClassPortInfo:
> ...
> Capability mask..........0x2602
> ...
>
> Those don't match because... perfquery has a bug...
>
> perfquery is issuing a PMA query when it should be issuing a SA query.

I'm not following. How should it be related to each other SA and PM
ClassPortInfo(s)?

Sasha
[parent not found: <20100506211124.GH7099-o14lFNPAa+WKTadZzrrH2Q@public.gmane.org>]
* Re: [ewg] ibcheckerrors "Port All FAILED" reported
  [not found] ` <20100506211124.GH7099-o14lFNPAa+WKTadZzrrH2Q@public.gmane.org>
@ 2010-05-06 21:08 ` Ira Weiny
  0 siblings, 0 replies; 7+ messages in thread
From: Ira Weiny @ 2010-05-06 21:08 UTC (permalink / raw)
To: Sasha Khapyorsky
Cc: Woodruff, Robert J, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, EWG, tziporet-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org

On Thu, 6 May 2010 14:11:24 -0700
Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> wrote:

> On 18:09 Wed 05 May , Ira Weiny wrote:
> >
> > 14:29:03 > ./perfquery 40 255
> > ./perfquery: iberror: failed: AllPortSelect not supported
> >
> > It seems there is an issue with the CapabilityMask value...
> >
> > 14:43:32 > ./perfquery 40 255
> > cap_mask 0x400 <=== my debug output
> > ./perfquery: iberror: failed: AllPortSelect not supported
> >
> > 14:43:38 > ./saquery CPI 40
> > SA ClassPortInfo:
> > ...
> > Capability mask..........0x2602
> > ...
> >
> > Those don't match because... perfquery has a bug...
> >
> > perfquery is issuing a PMA query when it should be issuing a SA query.
>
> I'm not following. How should it be related to each other SA and PM
> ClassPortInfo(s)?

It's not, I was confused... :-D

Ira

>
> Sasha

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
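[Editorial sketch, not part of the original messages.] The exchange above turns on the point that the SA and PM ClassPortInfo capability masks belong to different management classes, so 0x2602 and 0x400 are not expected to match. Whether "perfquery <lid> 255" can work depends only on the PM mask. The short Python sketch below illustrates that check. It assumes that bit 8 (0x100) of the PM ClassPortInfo CapabilityMask is the AllPortSelect bit, which is my reading of the InfiniBand specification and not something stated in this thread, and that the debug value 0x400 was printed in host byte order. The names PM_ALL_PORT_SELECT and supports_all_port_select are hypothetical and are not taken from perfquery.

```python
# Assumption: bit 8 of the PM ClassPortInfo CapabilityMask is AllPortSelect.
# This constant and the helper below are illustrative names, not perfquery code.
PM_ALL_PORT_SELECT = 1 << 8  # 0x100


def supports_all_port_select(pm_cap_mask: int) -> bool:
    """Return True if the PM capability mask advertises AllPortSelect."""
    return bool(pm_cap_mask & PM_ALL_PORT_SELECT)


# The PM mask from the debug output in the thread, assumed host byte order.
pm_mask_from_thread = 0x400
print(hex(pm_mask_from_thread), supports_all_port_select(pm_mask_from_thread))
```

Under those assumptions the switch's PMA simply does not advertise AllPortSelect, which would be consistent with the "AllPortSelect not supported" message and with ibcheckerrors reporting "port all: FAILED" while every individual port checks OK.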
end of thread, other threads:[~2010-05-06 21:11 UTC | newest]
Thread overview: 7+ messages
[not found] <382A478CAD40FA4FB46605CF81FE39F45685DEAD@orsmsx507.amr.corp.intel.com>
[not found] ` <382A478CAD40FA4FB46605CF81FE39F45685DEAD-osO9UTpF0URzLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2010-05-06 1:09 ` [ewg] ibcheckerrors "Port All FAILED" reported Ira Weiny
[not found] ` <20100505180943.a9bbb74e.weiny2-i2BcT+NCU+M@public.gmane.org>
2010-05-06 1:57 ` Ira Weiny
2010-05-06 13:26 ` Mike Heinz
[not found] ` <4C2744E8AD2982428C5BFE523DF8CDCB49A4740C58-amwN6d8PyQWXx9kJd3VG2h2eb7JE58TQ@public.gmane.org>
2010-05-06 15:34 ` Ira Weiny
[not found] ` <20100506083455.951377af.weiny2-i2BcT+NCU+M@public.gmane.org>
2010-05-06 15:41 ` Mike Heinz
2010-05-06 21:11 ` Sasha Khapyorsky
[not found] ` <20100506211124.GH7099-o14lFNPAa+WKTadZzrrH2Q@public.gmane.org>
2010-05-06 21:08 ` [ewg] " Ira Weiny