* ibnetdiscover issue with multiported CA (or router) with multiple ports on same subnet
From: Hal Rosenstock @ 2010-08-25 17:27 UTC
To: Sasha Khapyorsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Sasha,
I'm seeing an issue with ibnetdiscover from a CA port where it appears
to extend a path at a "remote" CA port (it's actually another port on
the same CA) to query NodeInfo of the next hop beyond it. I get the
following error message:
src/query_smp.c:188; umad (DR path slid 0; dlid 0; 0,1,20,2 Attr 0x11:0) bad status 110; Connection timed out
where smpquery -D nodeinfo of 0,1,20 reports a CA, which can also be seen
in the topology.
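As a reading aid for the path notation: the "0,1,20,2" in the error is a directed-route path, a leading 0 followed by the port to exit at each hop. So 0,1,20 leaves the local CA on port 1 and the switch on port 20, and the failing query adds one more hop out of port 2. The sketch below shows what appending a hop amounts to. It is only an illustration: struct dr_path, dr_path_extend() and the 64-entry limit are assumptions made here, not the libibnetdisc extend_dpath() code.

```c
#include <stdint.h>

#define DR_PATH_MAX_HOPS 64	/* assumed limit for this sketch */

/* Hypothetical stand-in for a directed-route path: hops[0] is the
 * leading 0, hops[1..cnt] are the egress ports taken at each hop. */
struct dr_path {
	int cnt;
	uint8_t hops[DR_PATH_MAX_HOPS];
};

/* Append one egress port; returns the new hop count, or -1 if full. */
static int dr_path_extend(struct dr_path *path, uint8_t port_num)
{
	if (path->cnt + 1 >= DR_PATH_MAX_HOPS)
		return -1;
	path->hops[++path->cnt] = port_num;
	return path->cnt;
}
```

Starting from cnt = 2 with hops 0,1,20, calling dr_path_extend(&path, 2) yields the 0,1,20,2 path from the error message.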
It appears to stem from the following code snippet from
libibnetdisc/src/ibnetdisc.c:recv_port_info
    if (port_num && mad_get_field(port->info, 0, IB_PORT_PHYS_STATE_F)
        == IB_PORT_PHYS_STATE_LINKUP
        && ((node->type == IB_NODE_SWITCH && port_num != local_port) ||
            (node == fabric->from_node && port_num == local_port))) {
            ib_portid_t path = smp->path;
            if (extend_dpath(engine, &path, port_num) > 0)
                    query_node_info(engine, &path, node);
    }
that was introduced by:
commit fcb8d5e7588e38508a8e354c37009d73c0a3889f
Author: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
Date: Sat Apr 10 02:43:24 2010 +0300
libibnetdisc: no backward NodeInfo queries
Then switch is reached via port N we don't need to query back via this
port - source node is discovered already. Finally this saves some amount
of unnecessary MADs.
Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
and subsequently modified by:
commit 49d149c63a44d99259f516a15af53d8cf3f0e7c9
Author: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
Date: Tue Apr 13 19:54:45 2010 +0300
libibnetdisc: don't try to cross discovery over CA
When discovery is running from CA node it shouldn't try to cross over
all ports, but only via local one (send over non-local ports will fail
since CA doesn't route MADs).
Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
due to the (node == fabric->from_node && port_num == local_port)
clause being TRUE.
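To make that last point concrete, the condition can be reduced to a standalone predicate and fed the values from this report. This is a simplified sketch, not the library code: should_query_beyond() and its boolean parameters are hypothetical stand-ins for the mad_get_field() link-state check and the node comparisons.

```c
#include <stdbool.h>

/* Simplified mirror of the quoted test. The link-up check is folded
 * into a flag and the node comparisons into booleans (assumptions
 * made for this sketch). */
static bool should_query_beyond(bool link_up, bool is_switch,
				bool is_from_node, int port_num,
				int local_port)
{
	return port_num && link_up &&
	       ((is_switch && port_num != local_port) ||
		(is_from_node && port_num == local_port));
}
```

For the CA reached over 0,1,20, the node is the same CA discovery started from, and the query arrives on port 2, so both port_num and local_port are 2 (assuming the PortInfo local port matches the LocalPort of 2 that smpquery reports for this path). should_query_beyond(true, false, true, 2, 2) is then true, which is what sends the query on to 0,1,20,2.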
ibnetdiscover
src/query_smp.c:188; umad (DR path slid 0; dlid 0; 0,1,20,2 Attr 0x11:0) bad status 110; Connection timed out
#
# Topology file: generated on Wed Aug 25 18:52:16 2010
#
# Initiated from node 0002c9020020ee0c port 0002c9020020ee0d
vendid=0x2c9
devid=0xb924
sysimgguid=0xb8cffff00438b
switchguid=0xb8cffff00438b(b8cffff00438b)
Switch 24 "S-000b8cffff00438b" # "MT47396 Infiniscale-III Mellanox Technologies" base port 0 lid 4 lmc 0
[5] "H-0002c903000010e0"[1](2c903000010e1) # "sw124 HCA-1" lid 5 4xDDR
[6] "H-0002c9030000d1c8"[1](2c9030000d1c9) # "sw123 HCA-1" lid 0 4xDDR
[7] "H-0002c9020020ee0c"[1](2c9020020ee0d) # "sw075 HCA-1" lid 2 4xDDR
[20] "H-0002c9020020ee0c"[2](2c9020020ee0e) # "sw075 HCA-1" lid 3 4xDDR
...
vendid=0x2c9
devid=0x6278
sysimgguid=0x2c9020020ee0f
caguid=0x2c9020020ee0c
Ca 2 "H-0002c9020020ee0c" # "sw075 HCA-1"
[1](2c9020020ee0d) "S-000b8cffff00438b"[7] # lid 2 lmc 0 "MT47396 Infiniscale-III Mellanox Technologies" lid 4 4xDDR
[2](2c9020020ee0e) "S-000b8cffff00438b"[20] # lid 3 lmc 0 "MT47396 Infiniscale-III Mellanox Technologies" lid 4 4xDDR
smpquery -D nodeinfo 0,1,20
# Node info: DR path slid 65535; dlid 65535; 0,1,20
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Channel Adapter
NumPorts:........................2
SystemGuid:......................0x0002c9020020ee0f
Guid:............................0x0002c9020020ee0c
PortGuid:........................0x0002c9020020ee0e
PartCap:.........................64
DevId:...........................0x6278
Revision:........................0x000000a0
LocalPort:.......................2
VendorId:........................0x0002c9
I don't think the local port part of the test above (node ==
fabric->from_node && port_num == local_port) is correct where:
    local_port = (uint8_t) mad_get_field(port_info, 0,
                                         IB_PORT_LOCAL_PORT_F);
Instead, shouldn't port_num be checked against the local port that
initiated the ibnetdiscover (which in this case is port 1) ? If so, a
"from_portnum" could be added/saved in the fabric structure and used
for this check. Do you concur with this approach ?
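A rough sketch of that idea, with the caveat that nothing here is taken from an actual patch: struct fabric_sketch, its from_portnum field, and ca_port_may_extend() are hypothetical names used only to show the comparison against the initiating port rather than against the port the response arrived on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down fabric state holding only what the check
 * needs. "from_portnum" is the field proposed above; it is an
 * assumption of this sketch, not a confirmed libibnetdisc member. */
struct fabric_sketch {
	const void *from_node;	/* node discovery started from */
	uint8_t from_portnum;	/* port discovery started from */
};

/* True if discovery may extend the path through this CA port. */
static bool ca_port_may_extend(const struct fabric_sketch *fabric,
			       const void *node, uint8_t port_num)
{
	return node == fabric->from_node &&
	       port_num == fabric->from_portnum;
}
```

With from_portnum set to 1, the check passes for port 1 of the starting CA and fails for port 2, so the path would no longer be extended to 0,1,20,2.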
-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: ibnetdiscover issue with multiported CA (or router) with multiple ports on same subnet
From: Sasha Khapyorsky @ 2010-09-01 13:43 UTC
To: Hal Rosenstock; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Hal,
On 13:27 Wed 25 Aug , Hal Rosenstock wrote:
>
> I'm seeing an issue with ibnetdiscover from a CA port where it appears
> to extend a path at a "remote" CA port (it's actually another port on
> the same CA) to query NodeInfo of the next hop beyond it. I get the
> following error message:
>
> src/query_smp.c:188; umad (DR path slid 0; dlid 0; 0,1,20,2 Attr
> 0x11:0) bad status 110; Connection timed out
>
> where smpquery -D nodeinfo of 0,1,20 is a CA which can also be seen
> from the topology.
>
> It appears to stem from the following code snippet from
> libibnetdisc/src/ibnetdisc.c:recv_port_info
>
>     if (port_num && mad_get_field(port->info, 0, IB_PORT_PHYS_STATE_F)
>         == IB_PORT_PHYS_STATE_LINKUP
>         && ((node->type == IB_NODE_SWITCH && port_num != local_port) ||
>             (node == fabric->from_node && port_num == local_port))) {
>             ib_portid_t path = smp->path;
>             if (extend_dpath(engine, &path, port_num) > 0)
>                     query_node_info(engine, &path, node);
>     }

This makes sense for me.

>
> that was introduced by:
> commit fcb8d5e7588e38508a8e354c37009d73c0a3889f
> Author: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
> Date: Sat Apr 10 02:43:24 2010 +0300
>
> libibnetdisc: no backward NodeInfo queries
>
> Then switch is reached via port N we don't need to query back via this
> port - source node is discovered already. Finally this saves some amount
> of unnecessary MADs.
>
> Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
>
> and subsequently modified by:
> commit 49d149c63a44d99259f516a15af53d8cf3f0e7c9
> Author: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
> Date: Tue Apr 13 19:54:45 2010 +0300
>
> libibnetdisc: don't try to cross discovery over CA
>
> When discovery is running from CA node it shouldn't try to cross over
> all ports, but only via local one (send over non-local ports will fail
> since CA doesn't route MADs).
>
> Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
>
> due to the (node == fabric->from_node && port_num == local_port)
> clause being TRUE.

But I don't see how those patches are actually related to the story. An
original (before patches) condition was:

    if (port_num && mad_get_field(port->info, 0, IB_PORT_PHYS_STATE_F)
        == IB_PORT_PHYS_STATE_LINKUP
        && (node->type == IB_NODE_SWITCH || node == fabric->from_node))

, which has the described bug as I can understand this.
Sasha
* Re: ibnetdiscover issue with multiported CA (or router) with multiple ports on same subnet
From: Hal Rosenstock @ 2010-09-01 13:47 UTC
To: Sasha Khapyorsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Sasha,
On Wed, Sep 1, 2010 at 9:43 AM, Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> wrote:
> Hi Hal,
>
> On 13:27 Wed 25 Aug , Hal Rosenstock wrote:
>>
>> I'm seeing an issue with ibnetdiscover from a CA port where it appears
>> to extend a path at a "remote" CA port (it's actually another port on
>> the same CA) to query NodeInfo of the next hop beyond it. I get the
>> following error message:
>>
>> src/query_smp.c:188; umad (DR path slid 0; dlid 0; 0,1,20,2 Attr
>> 0x11:0) bad status 110; Connection timed out
>>
>> where smpquery -D nodeinfo of 0,1,20 is a CA which can also be seen
>> from the topology.
>>
>> It appears to stem from the following code snippet from
>> libibnetdisc/src/ibnetdisc.c:recv_port_info
>>
>>     if (port_num && mad_get_field(port->info, 0, IB_PORT_PHYS_STATE_F)
>>         == IB_PORT_PHYS_STATE_LINKUP
>>         && ((node->type == IB_NODE_SWITCH && port_num != local_port) ||
>>             (node == fabric->from_node && port_num == local_port))) {
>>             ib_portid_t path = smp->path;
>>             if (extend_dpath(engine, &path, port_num) > 0)
>>                     query_node_info(engine, &path, node);
>>     }
>
> This makes sense for me.
>
>>
>> that was introduced by:
>> commit fcb8d5e7588e38508a8e354c37009d73c0a3889f
>> Author: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
>> Date: Sat Apr 10 02:43:24 2010 +0300
>>
>> libibnetdisc: no backward NodeInfo queries
>>
>> Then switch is reached via port N we don't need to query back via this
>> port - source node is discovered already. Finally this saves some amount
>> of unnecessary MADs.
>>
>> Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
>>
>> and subsequently modified by:
>> commit 49d149c63a44d99259f516a15af53d8cf3f0e7c9
>> Author: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
>> Date: Tue Apr 13 19:54:45 2010 +0300
>>
>> libibnetdisc: don't try to cross discovery over CA
>>
>> When discovery is running from CA node it shouldn't try to cross over
>> all ports, but only via local one (send over non-local ports will fail
>> since CA doesn't route MADs).
>>
>> Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
>>
>> due to the (node == fabric->from_node && port_num == local_port)
>> clause being TRUE.
>
> But I don't see how those patches are actually related to the story. An
> original (before patches) condition was:
>
>     if (port_num && mad_get_field(port->info, 0, IB_PORT_PHYS_STATE_F)
>         == IB_PORT_PHYS_STATE_LINKUP
>         && (node->type == IB_NODE_SWITCH || node == fabric->from_node))
>
> , which has the described bug as I can understand this.

I thought this used to work and those changes looked related to me.
Maybe the fix is right but that part of the problem description isn't.
Do you want a revised patch without that part of the description ?
-- Hal
>
> Sasha
* Re: ibnetdiscover issue with multiported CA (or router) with multiple ports on same subnet
From: Sasha Khapyorsky @ 2010-09-01 17:06 UTC
To: Hal Rosenstock; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 09:47 Wed 01 Sep , Hal Rosenstock wrote:
>
> I thought this used to work and those changes looked related to me.
> Maybe the fix is right but that part of the problem description isn't.
> Do you want a revised patch without that part of the description ?

No needs - I applied this already. Thanks.
Sasha