public inbox for linux-rdma@vger.kernel.org
* rdma_connect() "timeout"
@ 2012-07-18 15:12 Yann Droneaud
       [not found] ` <1342624372.19395.35.camel-sQn2kEGNn0pFevvuwOF9vF6hYfS7NtTn@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Yann Droneaud @ 2012-07-18 15:12 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA

Hi,

Is there a way to set the timeout in rdma_connect()?

I'm testing a somewhat artificial case: to trigger a connection error,
I try to connect to the network address or the broadcast address,
for example 10.0.0.0/8 or 10.255.255.255/8.

I'm creating an RDMA_CM identifier for the RDMA_PS_TCP port space.

rdma_resolve_addr() is OK, I'm getting RDMA_CM_EVENT_ADDR_RESOLVED
event.
rdma_resolve_route() is OK, I'm getting RDMA_CM_EVENT_ROUTE_RESOLVED
event.
rdma_connect() is OK ... but I'm getting RDMA_CM_EVENT_UNREACHABLE event
about 98 seconds after calling rdma_connect().

And 98 seconds is a bit longer than I expected.

Is there a way to change the CM parameters, e.g. the "Service Timeout"
that governs the move from the "REP wait" state to the "Timeout" state,
and the number of "REQ" retries sent (from 12.9.5 "Communication
Establishment and Release - Active")?

Is struct rdma_conn_param.retry_count the number of "REQ" retries?
According to the manpage, it seems it doesn't apply to the CM.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: rdma_connect() "timeout"
       [not found] ` <1342624372.19395.35.camel-sQn2kEGNn0pFevvuwOF9vF6hYfS7NtTn@public.gmane.org>
@ 2012-07-18 15:49   ` Hefty, Sean
       [not found]     ` <1828884A29C6694DAF28B7E6B8A8237346A6A5AA-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Hefty, Sean @ 2012-07-18 15:49 UTC (permalink / raw)
  To: Yann Droneaud, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> Is there a way to set the timeout in rdma_connect()?

For IB, the timeout is based on the packet lifetime in the path record returned by the SA.  The rdma_cm will retry a CM REQ the maximum number of times (15).
 
> Is there a way to change the CM parameters, e.g. the "Service Timeout"
> that governs the move from the "REP wait" state to the "Timeout" state,
> and the number of "REQ" retries sent (from 12.9.5 "Communication
> Establishment and Release - Active")?

There is no direct way to change the timeout parameter.  You would need to adjust the subnet timeout values at the SA.
 
> Is struct rdma_conn_param.retry_count the number of "REQ" retries?
> According to the manpage, it seems it doesn't apply to the CM.

The retry_count applies to the QP and is not associated with the CM timeout.  That is, it maps to REQ:retry_count rather than REQ:max_cm_retries.

- Sean


* Re: rdma_connect() "timeout"
       [not found]     ` <1828884A29C6694DAF28B7E6B8A8237346A6A5AA-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-07-18 17:16       ` Yann Droneaud
       [not found]         ` <1342631766.19395.48.camel-sQn2kEGNn0pFevvuwOF9vF6hYfS7NtTn@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Yann Droneaud @ 2012-07-18 17:16 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA

Hi,

On Wednesday, 18 July 2012 at 15:49 +0000, Hefty, Sean wrote:
> > Is there a way to set the timeout in rdma_connect()?
> 
> For IB, the timeout is based on the packet lifetime in the path record returned by the SA.  
> The rdma_cm will retry a CM REQ the maximum number of times (15).
>  

According to the OpenSM default configuration (/usr/sbin/opensm
--create-config <config>) :

  # The subnet_timeout code that will be set for all the ports
  # The actual timeout is 4.096usec * 2^<subnet_timeout>
  subnet_timeout 18

  # The code of maximal time a packet can live in a switch
  # The actual time is 4.096usec * 2^<packet_life_time>
  # The value 0x14 disables this mechanism
  packet_life_time 0x12

Despite the different notation (decimal 18 vs. hexadecimal 0x12), they
are the same value.

It gives me:

 4.096 * 10^-6 * 2^18 = 1.074 s

 15 * 1.074 s (subnet timeout or packet life time) = 16.106 s ...

This is a lot less than 98 s.
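
The arithmetic above can be reproduced with a minimal Python sketch. The
helper name ib_timeout_seconds is hypothetical; the only inputs are the
OpenSM defaults and the retry count of 15 quoted in this thread.

```python
def ib_timeout_seconds(code: int) -> float:
    """Decode a timeout code using the rule quoted from the OpenSM config:
    actual time = 4.096 usec * 2^code."""
    return 4.096e-6 * (2 ** code)


subnet_timeout = 18        # OpenSM default, written in decimal
packet_life_time = 0x12    # OpenSM default, written in hexadecimal (also 18)
max_cm_retries = 15        # number of CM REQ retries given earlier in the thread

per_timeout = ib_timeout_seconds(subnet_timeout)

print(f"one timeout : {per_timeout:.3f} s")
print(f"15 timeouts : {max_cm_retries * per_timeout:.3f} s")
```

Both settings decode to the same duration, so it does not matter for this
estimate which of the two is used.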

Where does the difference come from?

> > Is there a way to change the CM parameters, e.g. the "Service Timeout"
> > that governs the move from the "REP wait" state to the "Timeout" state,
> > and the number of "REQ" retries sent (from 12.9.5 "Communication
> > Establishment and Release - Active")?
> 
> There is no direct way to change the timeout parameter.  You would need to adjust the subnet timeout values at the SA.

Which one should be adjusted: subnet_timeout or packet_life_time?

Thanks for your answers.

Regards.

-- 
Yann Droneaud
OPTEYA




* RE: rdma_connect() "timeout"
       [not found]         ` <1342631766.19395.48.camel-sQn2kEGNn0pFevvuwOF9vF6hYfS7NtTn@public.gmane.org>
@ 2012-07-18 17:30           ` Hefty, Sean
  0 siblings, 0 replies; 4+ messages in thread
From: Hefty, Sean @ 2012-07-18 17:30 UTC (permalink / raw)
  To: Yann Droneaud; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> According to the OpenSM default configuration (/usr/sbin/opensm
> --create-config <config>) :
> 
>   # The subnet_timeout code that will be set for all the ports
>   # The actual timeout is 4.096usec * 2^<subnet_timeout>
>   subnet_timeout 18
> 
>   # The code of maximal time a packet can live in a switch
>   # The actual time is 4.096usec * 2^<packet_life_time>
>   # The value 0x14 disables this mechanism
>   packet_life_time 0x12
> 
> Despite the different notation (decimal 18 vs. hexadecimal 0x12), they
> are the same value.
> 
> It gives me:
> 
>  4.096 * 10^-6 * 2^18 = 1.074 s
> 
>  15 * 1.074 s (subnet timeout or packet life time) = 16.106 s ...
> 
> This is a lot less than 98 s.
> 
> Where does the difference come from?

The IB CM calculates the timeout as:

packet lifetime * 2 + remote cm response timeout

The rdma_cm has a hard coded value of 20 for the remote cm response timeout, which ends up accounting for the majority of the time... :/
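
As a rough check of this formula against the observed 98 seconds, here is a
minimal Python sketch. It rests on several assumptions that are not stated
in the thread: that the hard-coded 20 is a timeout code in the same
4.096 usec * 2^n encoding, that the path record's packet lifetime equals the
OpenSM default of 18, that every attempt waits the full timeout, and that an
implementation may round the conversion to whole power-of-two milliseconds
(2^(n-8) ms). All function and constant names are hypothetical.

```python
PACKET_LIFETIME_CODE = 18            # assumed: path record reports the OpenSM default
REMOTE_CM_RESPONSE_TIMEOUT_CODE = 20  # hard-coded rdma_cm value from the thread
MAX_CM_RETRIES = 15                  # rdma_cm retry count from the thread


def ib_timeout_seconds(code: int) -> float:
    """Exact decoding, assuming the 4.096 usec * 2^code encoding."""
    return 4.096e-6 * (2 ** code)


def cm_timeout_seconds(pkt_code: int, resp_code: int) -> float:
    """Formula from the thread: packet lifetime * 2 + remote CM response timeout."""
    return 2 * ib_timeout_seconds(pkt_code) + ib_timeout_seconds(resp_code)


def approx_ms(code: int) -> int:
    """Assumed coarse conversion: 4.096 usec * 2^code is roughly 2^(code - 8) ms."""
    return 1 << max(code - 8, 0)


per_attempt = cm_timeout_seconds(PACKET_LIFETIME_CODE,
                                 REMOTE_CM_RESPONSE_TIMEOUT_CODE)
approx_attempt_ms = (2 * approx_ms(PACKET_LIFETIME_CODE)
                     + approx_ms(REMOTE_CM_RESPONSE_TIMEOUT_CODE))

print(f"per attempt (exact)        : {per_attempt:.3f} s")
print(f"15 attempts (exact)        : {MAX_CM_RETRIES * per_attempt:.3f} s")
print(f"1 + 15 attempts (exact)    : {(MAX_CM_RETRIES + 1) * per_attempt:.3f} s")
print(f"1 + 15 attempts (rounded)  : {(MAX_CM_RETRIES + 1) * approx_attempt_ms / 1000:.3f} s")
```

Under these assumptions the exact totals fall on either side of the
observed 98 seconds, and the rounded variant lands close to it. The
response timeout term contributes about two thirds of each attempt, which
is consistent with it accounting for the majority of the time.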
