* cm_process_routed_req() does not resonate well with RoCE systems
@ 2021-03-18 19:21 Haakon Bugge
2021-03-19 15:07 ` Haakon Bugge
0 siblings, 1 reply; 4+ messages in thread
From: Haakon Bugge @ 2021-03-18 19:21 UTC (permalink / raw)
To: OFED mailing list, Jason Gunthorpe, Leon Romanovsky
With the introduction of RoCE systems, a CM REQ message will contain (pasted from Wireshark):
Primary Hop Limit: 0x40
.... 0... = Primary Subnet Local: 0x0
This because cma_resolve_iboe_route() has:
if (((struct sockaddr *)&id_priv->id.route.addr.dst_addr)->sa_family != AF_IB)
/* TODO: get the hoplimit from the inet/inet6 device */
route->path_rec->hop_limit = addr->dev_addr.hoplimit;
else
route->path_rec->hop_limit = 1;
The addr->dev_addr.hoplimit is coming from addr4_resolve(), which has:
addr->hoplimit = ip4_dst_hoplimit(&rt->dst);
ip4_dst_hoplimit() returns the value of the sysctl net.ipv4.ip_default_ttl in this case (64).
For the purpose of this case, consider the CM REQ to have the Primary SL != 0.
When this REQ gets processed by cm_req_handler(), the cm_process_routed_req() function is called.
Since the Primary Subnet Local value is zero in the request, and since this is RoCE (Primary Local LID is permissive), the following statement will be executed:
IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
At least on the system I ran on, which was equipped with a Mellanox CX-5 HCA, the wc->sl is zero. Hence, the request to setup a connection using an SL != zero, will not be honoured, and a connection using SL zero will be created instead.
As a side note, in cm_process_routed_req(), we have:
IBA_SET(CM_REQ_PRIMARY_REMOTE_PORT_LID, req_msg, wc->dlid_path_bits);
which is strange, since a LID is 16 bits, whereas dlid_path_bits is only eight.
I am uncertain about the correct fix here. Any input appreciated.
Thxs, Håkon
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: cm_process_routed_req() does not resonate well with RoCE systems 2021-03-18 19:21 cm_process_routed_req() does not resonate well with RoCE systems Haakon Bugge @ 2021-03-19 15:07 ` Haakon Bugge 2021-03-23 18:08 ` Jason Gunthorpe 0 siblings, 1 reply; 4+ messages in thread From: Haakon Bugge @ 2021-03-19 15:07 UTC (permalink / raw) To: OFED mailing list, Jason Gunthorpe, Leon Romanovsky > On 18 Mar 2021, at 20:21, Haakon Bugge <haakon.bugge@oracle.com> wrote: > > With the introduction of RoCE systems, a CM REQ message will contain (pasted from Wireshark): > > Primary Hop Limit: 0x40 > .... 0... = Primary Subnet Local: 0x0 > > This because cma_resolve_iboe_route() has: > > if (((struct sockaddr *)&id_priv->id.route.addr.dst_addr)->sa_family != AF_IB) > /* TODO: get the hoplimit from the inet/inet6 device */ > route->path_rec->hop_limit = addr->dev_addr.hoplimit; > else > route->path_rec->hop_limit = 1; > > The addr->dev_addr.hoplimit is coming from addr4_resolve(), which has: > > addr->hoplimit = ip4_dst_hoplimit(&rt->dst); > > ip4_dst_hoplimit() returns the value of the sysctl net.ipv4.ip_default_ttl in this case (64). > > For the purpose of this case, consider the CM REQ to have the Primary SL != 0. > > When this REQ gets processed by cm_req_handler(), the cm_process_routed_req() function is called. > > Since the Primary Subnet Local value is zero in the request, and since this is RoCE (Primary Local LID is permissive), the following statement will be executed: > > IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl); > > At least on the system I ran on, which was equipped with a Mellanox CX-5 HCA, the wc->sl is zero. Hence, the request to setup a connection using an SL != zero, will not be honoured, and a connection using SL zero will be created instead. > > As a side note, in cm_process_routed_req(), we have: > > IBA_SET(CM_REQ_PRIMARY_REMOTE_PORT_LID, req_msg, wc->dlid_path_bits); > > which is strange, since a LID is 16 bits, whereas dlid_path_bits is only eight. > > I am uncertain about the correct fix here. Any input appreciated. I intend to send a patch doing: --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work) goto destroy; } - cm_process_routed_req(req_msg, work->mad_recv_wc->wc); + if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE) + cm_process_routed_req(req_msg, work->mad_recv_wc->wc); memset(&work->path[0], 0, sizeof(work->path[0])); if (cm_req_has_alt_path(req_msg)) if I do not get a push back. Thxs, Håkon ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: cm_process_routed_req() does not resonate well with RoCE systems 2021-03-19 15:07 ` Haakon Bugge @ 2021-03-23 18:08 ` Jason Gunthorpe 2021-03-23 18:38 ` Haakon Bugge 0 siblings, 1 reply; 4+ messages in thread From: Jason Gunthorpe @ 2021-03-23 18:08 UTC (permalink / raw) To: Haakon Bugge; +Cc: OFED mailing list, Leon Romanovsky On Fri, Mar 19, 2021 at 03:07:07PM +0000, Haakon Bugge wrote: > > > > On 18 Mar 2021, at 20:21, Haakon Bugge <haakon.bugge@oracle.com> wrote: > > > > With the introduction of RoCE systems, a CM REQ message will contain (pasted from Wireshark): > > > > Primary Hop Limit: 0x40 > > .... 0... = Primary Subnet Local: 0x0 > > > > This because cma_resolve_iboe_route() has: > > > > if (((struct sockaddr *)&id_priv->id.route.addr.dst_addr)->sa_family != AF_IB) > > /* TODO: get the hoplimit from the inet/inet6 device */ > > route->path_rec->hop_limit = addr->dev_addr.hoplimit; > > else > > route->path_rec->hop_limit = 1; > > > > The addr->dev_addr.hoplimit is coming from addr4_resolve(), which has: > > > > addr->hoplimit = ip4_dst_hoplimit(&rt->dst); > > > > ip4_dst_hoplimit() returns the value of the sysctl net.ipv4.ip_default_ttl in this case (64). > > > > For the purpose of this case, consider the CM REQ to have the Primary SL != 0. > > > > When this REQ gets processed by cm_req_handler(), the cm_process_routed_req() function is called. > > > > Since the Primary Subnet Local value is zero in the request, and since this is RoCE (Primary Local LID is permissive), the following statement will be executed: > > > > IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl); > > > > At least on the system I ran on, which was equipped with a > > Mellanox CX-5 HCA, the wc->sl is zero. Hence, the request to setup > > a connection using an SL != zero, will not be honoured, and a > > connection using SL zero will be created instead. > > > > As a side note, in cm_process_routed_req(), we have: > > > > IBA_SET(CM_REQ_PRIMARY_REMOTE_PORT_LID, req_msg, wc->dlid_path_bits); > > > > which is strange, since a LID is 16 bits, whereas dlid_path_bits is only eight. > > > > I am uncertain about the correct fix here. Any input appreciated. > > I intend to send a patch doing: > > +++ b/drivers/infiniband/core/cm.c > @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work) > goto destroy; > } > > - cm_process_routed_req(req_msg, work->mad_recv_wc->wc); > + if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE) > + cm_process_routed_req(req_msg, work->mad_recv_wc->wc); > > memset(&work->path[0], 0, sizeof(work->path[0])); > if (cm_req_has_alt_path(req_msg)) > > > if I do not get a push back. This does seem reasonable, but I don't understand the underlying issue, why is anything in roce land touching the SL? Are you trying to use the SL as a proxy for the TOS bits? Jason ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: cm_process_routed_req() does not resonate well with RoCE systems 2021-03-23 18:08 ` Jason Gunthorpe @ 2021-03-23 18:38 ` Haakon Bugge 0 siblings, 0 replies; 4+ messages in thread From: Haakon Bugge @ 2021-03-23 18:38 UTC (permalink / raw) To: Jason Gunthorpe; +Cc: OFED mailing list, Leon Romanovsky > On 23 Mar 2021, at 19:08, Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Fri, Mar 19, 2021 at 03:07:07PM +0000, Haakon Bugge wrote: >> >> >>> On 18 Mar 2021, at 20:21, Haakon Bugge <haakon.bugge@oracle.com> wrote: >>> >>> With the introduction of RoCE systems, a CM REQ message will contain (pasted from Wireshark): >>> >>> Primary Hop Limit: 0x40 >>> .... 0... = Primary Subnet Local: 0x0 >>> >>> This because cma_resolve_iboe_route() has: >>> >>> if (((struct sockaddr *)&id_priv->id.route.addr.dst_addr)->sa_family != AF_IB) >>> /* TODO: get the hoplimit from the inet/inet6 device */ >>> route->path_rec->hop_limit = addr->dev_addr.hoplimit; >>> else >>> route->path_rec->hop_limit = 1; >>> >>> The addr->dev_addr.hoplimit is coming from addr4_resolve(), which has: >>> >>> addr->hoplimit = ip4_dst_hoplimit(&rt->dst); >>> >>> ip4_dst_hoplimit() returns the value of the sysctl net.ipv4.ip_default_ttl in this case (64). >>> >>> For the purpose of this case, consider the CM REQ to have the Primary SL != 0. >>> >>> When this REQ gets processed by cm_req_handler(), the cm_process_routed_req() function is called. >>> >>> Since the Primary Subnet Local value is zero in the request, and since this is RoCE (Primary Local LID is permissive), the following statement will be executed: >>> >>> IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl); >>> >>> At least on the system I ran on, which was equipped with a >>> Mellanox CX-5 HCA, the wc->sl is zero. Hence, the request to setup >>> a connection using an SL != zero, will not be honoured, and a >>> connection using SL zero will be created instead. >>> >>> As a side note, in cm_process_routed_req(), we have: >>> >>> IBA_SET(CM_REQ_PRIMARY_REMOTE_PORT_LID, req_msg, wc->dlid_path_bits); >>> >>> which is strange, since a LID is 16 bits, whereas dlid_path_bits is only eight. >>> >>> I am uncertain about the correct fix here. Any input appreciated. >> >> I intend to send a patch doing: >> >> +++ b/drivers/infiniband/core/cm.c >> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work) >> goto destroy; >> } >> >> - cm_process_routed_req(req_msg, work->mad_recv_wc->wc); >> + if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE) >> + cm_process_routed_req(req_msg, work->mad_recv_wc->wc); >> >> memset(&work->path[0], 0, sizeof(work->path[0])); >> if (cm_req_has_alt_path(req_msg)) >>> >> if I do not get a push back. > > This does seem reasonable, but I don't understand the underlying > issue, why is anything in roce land touching the SL? Are you trying to > use the SL as a proxy for the TOS bits? We want to control the DSCP in the encapsulating IP packet to select different TCs. As per the RoCE Annex: <quote> The SL component in the Address Vector is used to determine the Ethernet Priority of generated RoCEv2 packets. SL 0-7 are mapped directly to Priorities 0-7, respectively. SL 8-15 are reserved. </quote> Quite similar to how IB Link-layer translates the SL to an VL. Thxs, Håkon ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-03-23 18:39 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-03-18 19:21 cm_process_routed_req() does not resonate well with RoCE systems Haakon Bugge 2021-03-19 15:07 ` Haakon Bugge 2021-03-23 18:08 ` Jason Gunthorpe 2021-03-23 18:38 ` Haakon Bugge
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.