* RDMA_CM_EVENT_REJECTED and resources release
From: Yann Droneaud @ 2012-06-20 14:19 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA
Hi,
I'm trying to make my InfiniBand verbs/RDMA application more reliable
when handling RDMA CM error events.
In particular, I'm trying to release verbs resources correctly.
Here's a scenario from the client's point of view:
rdma_resolve_addr()
=> event RDMA_CM_EVENT_ADDR_RESOLVED
rdma_resolve_route()
=> event RDMA_CM_EVENT_ROUTE_RESOLVED
ibv_reg_mr()
ibv_create_cq()
rdma_create_qp()
rdma_connect()
=> event RDMA_CM_EVENT_REJECTED !
In the RDMA_CM_EVENT_REJECTED handler, I could deal with this in two
different ways:
- call rdma_disconnect(): even if the connection is not established,
rdma_disconnect() can be called.
In this case, all posted receive WRs come back in error.
But there's no RDMA_CM_EVENT_TIMEWAIT_EXIT event to handle,
where the program could call rdma_destroy_qp(), ibv_destroy_cq(),
ibv_dereg_mr(), and rdma_destroy_id().
Note that there's no RDMA_CM_EVENT_DISCONNECTED event either (as expected).
- call rdma_destroy_qp(), ibv_destroy_cq(), ibv_dereg_mr(), and
rdma_destroy_id().
Before calling ibv_destroy_cq(), the program calls ibv_poll_cq() to
flush the CQ (but the function returns -2 when called on the CQ used
to hold receive WCs, while it works without problem on the one used to
hold send WCs).
The completion channel which was registered with the CQ is
notified of an event. ibv_get_cq_event() then returns a pointer
to the destroyed CQ and ibv_poll_cq() returns 0 (no WC).
(Currently my code calls ibv_ack_cq_events() and then
ibv_req_notify_cq() on the CQ returned by ibv_get_cq_event().)
Neither solution seems really suitable to me.
Do you have any tips or hints for handling this situation?
Regards.
--
Yann Droneaud
OPTEYA
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: RDMA_CM_EVENT_REJECTED and resources release
From: Yann Droneaud @ 2012-06-21 6:11 UTC (permalink / raw)
To: Veerendra Allada
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA
On Wednesday, 20 June 2012 at 21:27 -0400, Veerendra Allada wrote:
> Is it possible to delay the memory registrations (ibv_reg_mr) and the
> ibv_post_recv() calls until the connection is established? Based on what
> I understood, I am suggesting the following sequence of calls on
> Client and Server. See if it helps.
>
> Client
> rdma_create_event_channel()
> rdma_create_id()
> rdma_resolve_addr()
> ibv_alloc_pd()
> ibv_create_comp_channel()
> ibv_create_cq()
> rdma_create_qp()
> rdma_resolve_route()
> rdma_connect()
>
> If the connection is not successful, call the corresponding destroy
> calls in the reverse order.
> If the connection is successful, register memory and post the receive
> calls.
>
> Server
> rdma_create_event_channel()
> rdma_create_id()
> rdma_bind_addr()
> rdma_listen()
>
> Upon a new incoming connection request, do the following sequence.
>
> ibv_alloc_pd()
> ibv_create_comp_channel()
> ibv_create_cq()
> rdma_create_qp()
> rdma_accept()
>
> After the connection is successful, register memory and then post
> receive requests.
> At this point you should be able to post ibv_post_send requests and
> poll your completion queues, etc.
>
Thanks for your answer, but it's not applicable in my case:
I have to post receive work requests before calling
rdma_accept()/rdma_connect() so that both sides are ready to receive
messages sent by the other peer.
Not doing so leads to races: RNR errors will be triggered on the
sending side if it's faster than the receiving side.
If the protocol implemented above the RDMA connection is
request/response based and initiated by the client, the client could
follow your proposed scheme, but the server side would still have to
post receive work requests before calling rdma_accept().
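The race described above can be illustrated with a toy model. This is only a sketch of the reasoning, not verbs code: `struct recv_queue_model`, `model_post_recv()` and `model_incoming_send()` are hypothetical names invented for the illustration, and the model only counts posted receive work requests. In a real program the posting step would be ibv_post_recv() on the QP.

```c
#include <assert.h>

/* Hypothetical model of a receive queue: it only counts the receive
 * work requests that are currently posted. Not a libibverbs type. */
struct recv_queue_model {
    int posted;
};

enum delivery { DELIVERED, RNR_NAK };

/* Models posting one receive WR (ibv_post_recv() in a real program). */
static void model_post_recv(struct recv_queue_model *rq)
{
    assert(rq);
    rq->posted++;
}

/* Models an incoming SEND: it consumes one posted receive, or is
 * refused as "receiver not ready" when none is available. */
static enum delivery model_incoming_send(struct recv_queue_model *rq)
{
    assert(rq);
    if (rq->posted == 0)
        return RNR_NAK;
    rq->posted--;
    return DELIVERED;
}
```

If the peer's first SEND arrives while `posted` is still zero, the model reports RNR_NAK, which is the case that posting receives before rdma_accept()/rdma_connect() is meant to avoid.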
Regards.
--
Yann Droneaud
OPTEYA
* RE: RDMA_CM_EVENT_REJECTED and resources release
From: Hefty, Sean @ 2012-06-21 16:04 UTC (permalink / raw)
To: Yann Droneaud, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> - call rdma_disconnect(): even if the connection is not established,
> rdma_disconnect() can be called.
>
> In this case, all posted receive WRs come back in error.
On processing a reject event, librdmacm should transition the QP into the error state, which should flush all posted work requests. At that point, you should only need to pull the failed WRs off the CQ, then destroy everything.
> - call rdma_destroy_qp(), ibv_destroy_cq(), ibv_dereg_mr(), and
> rdma_destroy_id().
>
> Before calling ibv_destroy_cq(), the program calls ibv_poll_cq() to
> flush the CQ (but the function returns -2 when called on the CQ used
> to hold receive WCs, while it works without problem on the one used to
> hold send WCs).
I have no idea what the -2 return code is indicating or why it occurs. But try polling the CQ before attempting to destroy the QP.
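The order suggested here (poll the CQ until it is empty, and only then destroy the QP and the rest) could be sketched as follows. This is a model of the control flow under stated assumptions, written so that it runs without RDMA hardware: `struct wc_stub`, `poll_fn`, `drain_cq()`, `struct fake_cq` and `fake_poll()` are hypothetical stand-ins, not libibverbs or librdmacm names. In a real program the poll callback would be a thin wrapper around ibv_poll_cq(), which uses the same return convention (number of completions, 0 when empty, negative on failure).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct ibv_wc used here. */
struct wc_stub {
    unsigned long wr_id;
    int status;            /* 0 = success, non-zero = flushed or failed */
};

/* Hypothetical poll callback following the ibv_poll_cq() convention:
 * number of completions written, 0 if the CQ is empty, < 0 on failure. */
typedef int (*poll_fn)(void *cq, int max_entries, struct wc_stub *wc);

/* Drain a CQ until it reports empty.
 * Returns the number of completions removed, or -1 if polling failed. */
static int drain_cq(poll_fn poll, void *cq)
{
    struct wc_stub wc[16];
    int total = 0;

    assert(poll != NULL);
    for (;;) {
        int n = poll(cq, 16, wc);
        if (n < 0)
            return -1;     /* polling itself failed; caller decides */
        if (n == 0)
            return total;  /* CQ empty: teardown can proceed */
        total += n;        /* a real handler would check wc[i].status */
    }
}

/* Fake CQ for the sketch: 'pending' flushed completions are waiting;
 * a negative value makes the fake poll report a failure. */
struct fake_cq {
    int pending;
};

static int fake_poll(void *cq, int max_entries, struct wc_stub *wc)
{
    struct fake_cq *f = (struct fake_cq *)cq;
    int n, i;

    if (f->pending < 0)
        return -1;
    n = f->pending < max_entries ? f->pending : max_entries;
    for (i = 0; i < n; i++) {
        wc[i].wr_id = (unsigned long)i;
        wc[i].status = 1;  /* pretend every WR was flushed in error */
    }
    f->pending -= n;
    return n;
}
```

Once drain_cq() has returned a non-negative count, the teardown would continue with the calls already listed in the quoted message: rdma_destroy_qp(), ibv_destroy_cq(), ibv_dereg_mr(), and rdma_destroy_id().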
- Sean