* rdma_connect() "timeout"
@ 2012-07-18 15:12 Yann Droneaud
From: Yann Droneaud @ 2012-07-18 15:12 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA
Hi,
Is there a way to set the timeout used by rdma_connect()?
I'm testing a somewhat artificial case: to trigger a connection error,
I try to connect to the network address or the broadcast address of the
subnet, for example 10.0.0.0/8 or 10.255.255.255/8.
I'm creating an RDMA_CM identifier for the RDMA_PS_TCP port space.
rdma_resolve_addr() succeeds, and I get the RDMA_CM_EVENT_ADDR_RESOLVED
event.
rdma_resolve_route() succeeds, and I get the RDMA_CM_EVENT_ROUTE_RESOLVED
event.
rdma_connect() also returns successfully, but I only get the
RDMA_CM_EVENT_UNREACHABLE event about 98 seconds after calling it.
That is quite a bit longer than I expected.
Is there a way to change the CM parameters, e.g. the "Service Timeout"
that governs the move from the "REP wait" state to the "Timeout" state,
and the number of "REQ" send retries (from section 12.9.5,
"Communication Establishment and Release - Active")?
Is struct rdma_conn_param.retry_count the number of "REQ" retries?
According to the man page, it does not seem to apply to the CM.
Regards.
--
Yann Droneaud
OPTEYA
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* RE: rdma_connect() "timeout"
@ 2012-07-18 15:49 ` Hefty, Sean
From: Hefty, Sean @ 2012-07-18 15:49 UTC (permalink / raw)
To: Yann Droneaud, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Is there a way to set the timeout used by rdma_connect()?
For IB, the timeout is based on the packet lifetime in the path record returned by the SA. The rdma_cm will retry a CM REQ the maximum number of times (15).
> Is there a way to change the CM parameters, e.g. the "Service Timeout"
> that governs the move from the "REP wait" state to the "Timeout" state,
> and the number of "REQ" send retries (from section 12.9.5,
> "Communication Establishment and Release - Active")?
There is no direct way to change the timeout parameter. You would need to adjust the subnet timeout values at the SA.
> Is struct rdma_conn_param.retry_count the number of "REQ" retries?
> According to the man page, it does not seem to apply to the CM.
The retry_count applies to the QP and is not associated with the CM timeout. That is, it maps to the REQ:retry_count field rather than to REQ:max_cm_retries.
- Sean
* Re: rdma_connect() "timeout"
@ 2012-07-18 17:16 ` Yann Droneaud
From: Yann Droneaud @ 2012-07-18 17:16 UTC (permalink / raw)
To: Hefty, Sean
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA
Hi,
On Wednesday, 18 July 2012 at 15:49 +0000, Hefty, Sean wrote:
> > Is there a way to set the timeout used by rdma_connect()?
>
> For IB, the timeout is based on the packet lifetime in the path record returned by the SA.
> The rdma_cm will retry a CM REQ the maximum number of times (15).
>
According to the OpenSM default configuration (/usr/sbin/opensm
--create-config <config>) :
# The subnet_timeout code that will be set for all the ports
# The actual timeout is 4.096usec * 2^<subnet_timeout>
subnet_timeout 18
# The code of maximal time a packet can live in a switch
# The actual time is 4.096usec * 2^<packet_life_time>
# The value 0x14 disables this mechanism
packet_life_time 0x12
Despite the different notation (decimal 18 versus hexadecimal 0x12), the two settings have the same value.
This gives me:
4.096 * 10^-6 s * 2^18 = 1.074 s
15 * (subnet timeout or packet life time) = 16.106 s
This is a lot less than 98 s.
Where does the difference come from?
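The arithmetic above can be checked with a short sketch. It assumes only the formula from the OpenSM comments (4.096 usec * 2^code) and the retry count of 15 mentioned earlier in the thread. The names ib_timeout_seconds and IB_TIME_UNIT_S are made up for illustration and are not part of any library.

```python
# Decode an InfiniBand timeout code: time = 4.096 usec * 2^code
# (formula taken from the OpenSM config comments quoted above).
IB_TIME_UNIT_S = 4.096e-6  # hypothetical constant name, 4.096 usec in seconds


def ib_timeout_seconds(code: int) -> float:
    """Return the duration in seconds encoded by an IB timeout code."""
    return IB_TIME_UNIT_S * (2 ** code)


subnet_timeout = 18        # decimal, from the OpenSM defaults
packet_life_time = 0x12    # hexadecimal notation, same value as 18
max_cm_retries = 15        # retry count mentioned earlier in the thread

per_try = ib_timeout_seconds(subnet_timeout)
total = max_cm_retries * per_try

print(f"one timeout: {per_try:.3f} s")
print(f"{max_cm_retries} timeouts: {total:.3f} s")
```

Both settings decode to the same duration, so the estimate is the same whichever one is used.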
> > Is there a way to change the CM parameters, e.g. the "Service Timeout"
> > that governs the move from the "REP wait" state to the "Timeout" state,
> > and the number of "REQ" send retries (from section 12.9.5,
> > "Communication Establishment and Release - Active")?
>
> There is no direct way to change the timeout parameter. You would need to adjust the subnet timeout values at the SA.
Is the value to adjust subnet_timeout or packet_life_time?
Thanks for your answers.
Regards.
--
Yann Droneaud
OPTEYA
* RE: rdma_connect() "timeout"
@ 2012-07-18 17:30 ` Hefty, Sean
From: Hefty, Sean @ 2012-07-18 17:30 UTC (permalink / raw)
To: Yann Droneaud; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> According to the OpenSM default configuration (/usr/sbin/opensm
> --create-config <config>) :
>
> # The subnet_timeout code that will be set for all the ports
> # The actual timeout is 4.096usec * 2^<subnet_timeout>
> subnet_timeout 18
>
> # The code of maximal time a packet can live in a switch
> # The actual time is 4.096usec * 2^<packet_life_time>
> # The value 0x14 disables this mechanism
> packet_life_time 0x12
>
> Despite the different notation (decimal 18 versus hexadecimal 0x12), the two settings have the same value.
>
> It gives me:
>
> 4.096 * 10^-6 * 2^18 = 1.074 s
>
> 15 * (subnet timeout or packet life time) = 16.106 s
>
> This is a lot less than 98 s.
>
> Where does the difference come from?
The IB CM calculates the timeout as:
packet lifetime * 2 + remote CM response timeout
The rdma_cm has a hard-coded value of 20 for the remote CM response timeout, which ends up accounting for the majority of the time... :/
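As a rough check, the formula can be worked through numerically. This is a sketch under stated assumptions, not the actual CM code. It assumes that the value 20 is an IB timeout code decoded the same way as the OpenSM values (4.096 usec * 2^code), that the REQ is sent once and then retried 15 times (16 timeouts in total), and, for the second estimate, that the timeout is rounded to whole milliseconds as 2^(code - 8) ms. None of the helper names come from rdma_cm.

```python
IB_TIME_UNIT_S = 4.096e-6  # 4.096 usec, the IB timeout unit (hypothetical name)


def ib_timeout_seconds(code: int) -> float:
    """Exact decoding: 4.096 usec * 2^code, in seconds."""
    return IB_TIME_UNIT_S * (2 ** code)


def approx_ms(code: int) -> int:
    """Assumed rounding: 4.096 usec * 2^8 is about 1 ms, so use 2^(code - 8) ms."""
    return 1 << max(code - 8, 0)


packet_life_time = 0x12          # from the OpenSM defaults quoted above
remote_cm_response_timeout = 20  # hard-coded value mentioned above
max_cm_retries = 15              # from earlier in the thread
attempts = max_cm_retries + 1    # assumption: initial REQ plus 15 retries

# Formula from the message: packet lifetime * 2 + remote CM response timeout
per_attempt_s = (2 * ib_timeout_seconds(packet_life_time)
                 + ib_timeout_seconds(remote_cm_response_timeout))
per_attempt_ms = (2 * approx_ms(packet_life_time)
                  + approx_ms(remote_cm_response_timeout))

# Fraction of each attempt spent on the remote CM response timeout
response_share = ib_timeout_seconds(remote_cm_response_timeout) / per_attempt_s

print(f"exact:  {per_attempt_s:.3f} s per attempt, "
      f"{attempts * per_attempt_s:.1f} s total")
print(f"approx: {per_attempt_ms} ms per attempt, "
      f"{attempts * per_attempt_ms / 1000:.1f} s total")
print(f"response timeout share: {response_share:.0%}")
```

Under these assumptions the exact decoding gives about 6.4 s per attempt and about 103 s overall, while the millisecond approximation gives 6.144 s per attempt and about 98.3 s overall, which is close to the 98 seconds reported at the start of the thread. In both cases the response timeout makes up two thirds of each attempt.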