* rdma_connect() "timeout"
@ 2012-07-18 15:12 Yann Droneaud
[not found] ` <1342624372.19395.35.camel-sQn2kEGNn0pFevvuwOF9vF6hYfS7NtTn@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Yann Droneaud @ 2012-07-18 15:12 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA
Hi,
Is there a way to set the timeout in rdma_connect()?
I'm testing something not really useful: to trigger a connection error,
I'm trying to connect to the network address or the broadcast address,
for example 10.0.0.0/8 or 10.255.255.255/8.
I'm creating an RDMA_CM identifier for the RDMA_PS_TCP port space.
rdma_resolve_addr() is OK, I'm getting RDMA_CM_EVENT_ADDR_RESOLVED
event.
rdma_resolve_route() is OK, I'm getting RDMA_CM_EVENT_ROUTE_RESOLVED
event.
rdma_connect() is OK ... but I only get the RDMA_CM_EVENT_UNREACHABLE
event about 98 seconds after calling rdma_connect().
That is a bit longer than I expected.
Is there a way to change the CM parameters? E.g. the "Service Timeout"
waited before moving from the "REP wait" state to the "Timeout" state, and
the number of "REQ" retries sent (from 12.9.5 "Communication Establishment
and Release - Active")?
Is struct rdma_conn_param.retry_count the number of "REQ" retries?
According to the manpage, it seems it doesn't apply to the CM.
Regards.
--
Yann Droneaud
OPTEYA
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[parent not found: <1342624372.19395.35.camel-sQn2kEGNn0pFevvuwOF9vF6hYfS7NtTn@public.gmane.org>]
* RE: rdma_connect() "timeout"
[not found] ` <1342624372.19395.35.camel-sQn2kEGNn0pFevvuwOF9vF6hYfS7NtTn@public.gmane.org>
@ 2012-07-18 15:49 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A6A5AA-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Hefty, Sean @ 2012-07-18 15:49 UTC (permalink / raw)
To: Yann Droneaud, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> Is there a way to setup the timeout in rdma_connect() ?

For IB, the timeout is based on the packet lifetime in the path record returned by the SA.
The rdma_cm will retry a CM REQ the maximum number of times (15).

> Is there a way to change the CM parameters ? e.g. "Service Timeout" to
> wait for moving from "REP wait" state to "Timeout" state, and the number
> of send "REQ" retries (From 12.9.5 "Communication Establishement and
> Release - Active") ?

There is no direct way to change the timeout parameter. You would need to adjust the subnet timeout values at the SA.

> Is struct rdma_conn_param.retry_count the number of "REQ" retries ?
> According to the manpage, it seems it doesn't apply to CM.

The retry_count applies to the QP and is not associated with the CM timeout. I.e. it maps to REQ:retry_count, versus REQ:max_cm_retries.

- Sean
[parent not found: <1828884A29C6694DAF28B7E6B8A8237346A6A5AA-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: rdma_connect() "timeout"
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A6A5AA-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-07-18 17:16 ` Yann Droneaud
[not found] ` <1342631766.19395.48.camel-sQn2kEGNn0pFevvuwOF9vF6hYfS7NtTn@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Yann Droneaud @ 2012-07-18 17:16 UTC (permalink / raw)
To: Hefty, Sean
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA

Hi,

On Wednesday 18 July 2012 at 15:49 +0000, Hefty, Sean wrote:
> > Is there a way to setup the timeout in rdma_connect() ?
>
> For IB, the timeout is based on the packet lifetime in the path record returned by the SA.
> The rdma_cm will retry a CM REQ the maximum number of times (15).
>

According to the OpenSM default configuration
(/usr/sbin/opensm --create-config <config>):

# The subnet_timeout code that will be set for all the ports
# The actual timeout is 4.096usec * 2^<subnet_timeout>
subnet_timeout 18

# The code of maximal time a packet can live in a switch
# The actual time is 4.096usec * 2^<packet_life_time>
# The value 0x14 disables this mechanism
packet_life_time 0x12

Despite the different notation, both are the same value.

It gives me:

4.096 * 10^-6 * 2^18 = 1.074 s

15 * subnet timeout / packet life time = 16.106 s ...

This is a lot less than 98 s.

Where does the difference come from?

> > Is there a way to change the CM parameters ? e.g. "Service Timeout" to
> > wait for moving from "REP wait" state to "Timeout" state, and the number
> > of send "REQ" retries (From 12.9.5 "Communication Establishement and
> > Release - Active") ?
>
> There is no direct way to change the timeout parameter. You would need to adjust the subnet timeout values at the SA.

Is it subnet_timeout or packet_life_time?

Thanks for your answers.

Regards.
--
Yann Droneaud
OPTEYA
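
The arithmetic in the message above can be checked with a short sketch. Python is used purely for illustration; `ib_timeout_seconds` is a hypothetical helper, not part of OpenSM or librdmacm. It assumes only the encoding quoted from the OpenSM configuration comments (4.096 usec * 2^code) and the 15 CM REQ retries mentioned earlier in the thread.

```python
# Hypothetical helper: decode an IB timeout code into seconds,
# using the encoding quoted from the OpenSM config comments:
#   actual time = 4.096 usec * 2^code
def ib_timeout_seconds(code: int) -> float:
    return 4.096e-6 * (2 ** code)

# OpenSM defaults quoted in the message (decimal 18 and hex 0x12).
subnet_timeout = 18
packet_life_time = 0x12

# Both notations denote the same code.
same_code = subnet_timeout == packet_life_time

# Time for a single timeout period, and for 15 of them back to back.
per_try = ib_timeout_seconds(subnet_timeout)
total = 15 * per_try

print(f"same code: {same_code}")
print(f"one period: {per_try:.3f} s")
print(f"15 periods: {total:.3f} s")
```

This reproduces the 1.074 s and 16.106 s figures from the message; it says nothing yet about why the observed delay is about 98 s.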
[parent not found: <1342631766.19395.48.camel-sQn2kEGNn0pFevvuwOF9vF6hYfS7NtTn@public.gmane.org>]
* RE: rdma_connect() "timeout"
[not found] ` <1342631766.19395.48.camel-sQn2kEGNn0pFevvuwOF9vF6hYfS7NtTn@public.gmane.org>
@ 2012-07-18 17:30 ` Hefty, Sean
0 siblings, 0 replies; 4+ messages in thread
From: Hefty, Sean @ 2012-07-18 17:30 UTC (permalink / raw)
To: Yann Droneaud; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> According to the OpenSM default configuration (/usr/sbin/opensm
> --create-config <config>) :
>
> # The subnet_timeout code that will be set for all the ports
> # The actual timeout is 4.096usec * 2^<subnet_timeout>
> subnet_timeout 18
>
> # The code of maximal time a packet can live in a switch
> # The actual time is 4.096usec * 2^<packet_life_time>
> # The value 0x14 disables this mechanism
> packet_life_time 0x12
>
> Despite the notation, they are the same values.
>
> It gives me:
>
> 4.096 * 10^-6 * 2^18 = 1.074 s
>
> 15 * subnet timeout / packet life time = 16.106 s ...
>
> This is a lot less than 98 s.
>
> Where does come the difference ?

The IB CM calculates the timeout as:

packet lifetime * 2 + remote cm response timeout

The rdma_cm has a hard coded value of 20 for the remote cm response timeout, which ends up accounting for the majority of the time... :/
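
As a rough cross-check, the formula above can be plugged into a small sketch. This is an estimate under stated assumptions, not a model of the kernel code: the helper names are hypothetical, the packet lifetime code returned by the SA is assumed to equal the OpenSM default of 18 (the thread does not confirm this), the value 20 is assumed to use the same 4.096 usec * 2^code encoding, and each of the 15 attempts is assumed to wait the full per-attempt timeout. Any rounding done by the CM implementation is ignored.

```python
IB_TIME_UNIT_S = 4.096e-6  # base unit of the IB timeout encoding, in seconds


def ib_time(code: int) -> float:
    """Decode an IB timeout code: 4.096 usec * 2^code (hypothetical helper)."""
    return IB_TIME_UNIT_S * (2 ** code)


def cm_req_timeout(packet_lifetime_code: int,
                   remote_cm_response_code: int = 20) -> float:
    """Per-attempt timeout as stated in the thread:
    packet lifetime * 2 + remote cm response timeout.
    The default of 20 is the hard-coded rdma_cm value mentioned above."""
    return 2 * ib_time(packet_lifetime_code) + ib_time(remote_cm_response_code)


# Assumption: the path record carries packet lifetime code 18.
per_attempt = cm_req_timeout(18)

# Assumption: 15 attempts, each waiting the full timeout.
estimate = 15 * per_attempt

print(f"per attempt: {per_attempt:.3f} s")  # per attempt: 6.442 s
print(f"15 attempts: {estimate:.3f} s")     # 15 attempts: 96.637 s
```

Under these assumptions the estimate lands near 97 s, the same order as the roughly 98 s observed, with the response timeout term (about 4.3 s per attempt) contributing most of it.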