* rdma_get_cm_event() vs ibv_async_event()
From: Yann Droneaud @ 2011-03-29 9:37 UTC
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

[This question was already asked on ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org]

Hi,

I'm using RDMA CM to manage connections over InfiniBand.

On the server side, RDMA CM events are monitored through an event channel
created by rdma_create_event_channel(): the file descriptor
rdma_event_channel->fd is polled, and pending events are then retrieved
with rdma_get_cm_event().

After a connection is established, IB async events are polled using the
file descriptor reachable from the connection id returned by the RDMA CM
layer (ibv_context->async_fd) and processed using ibv_get_async_event().

The only event I catch and handle is IBV_EVENT_COMM_EST, because it
sometimes happens under high load. In this case, rdma_notify() is used to
send the event back to the RDMA CM layer.

But sometimes rdma_notify() returns -1 and errno is set to EISCONN
("Transport endpoint is already connected"). It happens mostly when I'm
running my test program under strace.

I suspect there is some kind of race between RDMA event channel
processing and IB async event processing.

So here's my question: should I monitor async events using
ibv_get_async_event(), or is this fully managed by the RDMA CM layer?

Regards.

--
Yann Droneaud
OPTEYA

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: rdma_get_cm_event() vs ibv_async_event()
From: Or Gerlitz @ 2011-03-29 10:14 UTC
To: Yann Droneaud; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hefty, Sean

Yann Droneaud wrote:
> [...] I catch and handle is IBV_EVENT_COMM_EST, because it
> sometimes happens under high load. In this case, rdma_notify() is used to
> send the event back to the RDMA CM layer. But sometimes rdma_notify()
> returns -1 and errno is set to EISCONN ("Transport endpoint is already
> connected"). It happens mostly when I'm running my test program under strace.

EISCONN means that between the time you got the comm established async
event and the time you reported it by calling rdma_notify, the kernel CM
managed to establish the connection. Note that the man page says that
rdma_notify is there to "handle the rare situation where the connection
never forms on its own".

So, as for your questions: the errno you see isn't a failure (see the
patch I sent to Sean), and you should listen for IB async events and
report the comm established event, for the rare case where the kernel
can't establish the connection because of repeated CM packet loss.

Or.
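The point made above, that EISCONN after rdma_notify() only means the kernel CM won the race, can be sketched as a small classification helper. This is an illustration, not code from the thread or from librdmacm: the name comm_est_notify_failed is hypothetical, and the helper takes the return value and errno as plain integers so that it compiles without the RDMA headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of librdmacm): decide whether the outcome
 * of rdma_notify(id, IBV_EVENT_COMM_EST) should be treated as an error.
 *
 * ret: value returned by rdma_notify() (0 on success, -1 on error)
 * err: value of errno saved immediately after the call
 *
 * Intended use (assumes librdmacm is available):
 *     int ret = rdma_notify(id, IBV_EVENT_COMM_EST);
 *     int err = errno;
 *     if (comm_est_notify_failed(ret, err)) { ... report the error ... }
 */
static bool comm_est_notify_failed(int ret, int err)
{
    /* rdma_notify() is documented to return only 0 or -1. */
    assert(ret == 0 || ret == -1);

    if (ret == 0)
        return false;   /* the CM accepted the notification */
    if (err == EISCONN)
        return false;   /* the CM had already connected: benign race */
    return true;        /* any other errno is a genuine failure */
}
```

Saving errno into a local variable right after the call matters here: C does not specify the order in which function arguments are evaluated, so passing rdma_notify(...) and errno directly as two arguments of one call could read errno before rdma_notify() has run.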
* Re: rdma_get_cm_event() vs ibv_async_event()
From: Yann Droneaud @ 2011-03-29 10:33 UTC
To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hefty, Sean

On Tuesday 29 March 2011 at 12:14 +0200, Or Gerlitz wrote:
> Yann Droneaud wrote:
> > [...] I catch and handle is IBV_EVENT_COMM_EST, because it
> > sometimes happens under high load. In this case, rdma_notify() is used to
> > send the event back to the RDMA CM layer. But sometimes rdma_notify()
> > returns -1 and errno is set to EISCONN ("Transport endpoint is already
> > connected"). It happens mostly when I'm running my test program under strace.
>
> EISCONN means that between the time you got the comm established
> async event and the time you reported it by calling rdma_notify,
> the kernel CM managed to establish the connection. Note that the man
> page says that rdma_notify is there to "handle the rare situation where
> the connection never forms on its own", so as for your questions, the
> errno you see isn't a failure (see the patch I sent to Sean), and you
> should listen for IB async events and report the comm established event,
> for the rare case where the kernel can't establish the connection
> because of repeated CM packet loss.
>

Thanks for the information.

So ignoring the IBV_EVENT_COMM_EST async event seems to be the best
option when using RDMA CM to establish connections?

Regards.

--
Yann Droneaud
OPTEYA
* Re: rdma_get_cm_event() vs ibv_async_event()
From: Or Gerlitz @ 2011-03-29 10:45 UTC
To: Yann Droneaud, Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Yann Droneaud wrote:
> So ignoring the IBV_EVENT_COMM_EST async event seems to be the best
> option when using RDMA CM to establish connections?

No! Handling it is a must for the rare case of repeated CM packet loss.

Sean, do you agree with that assertion? Should the man page be clearer
about that?

Or.
* RE: rdma_get_cm_event() vs ibv_async_event()
From: Hefty, Sean @ 2011-03-29 14:51 UTC
To: Or Gerlitz, Yann Droneaud
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> > So ignoring the IBV_EVENT_COMM_EST async event seems to be the best
> > option when using RDMA CM to establish connections?
>
> No! Handling it is a must for the rare case of repeated CM packet loss.
> Sean, do you agree with that assertion? Should the man page be clearer
> about that?

Handling that event is needed, IMO, in an extremely rare case where:

  The remote side sends a CM request message.
  The CM request message is received and a reply is generated.
  The CM reply message is received by the remote side.
  The last CM message (ready to use) from the remote side is lost or
    delayed 15 times, but somehow a message on the connected QP arrives.
  The app sees a COMM_EST event and calls rdma_notify.
  The processing of rdma_notify occurs before the CM can time out waiting
    for the RTU.

(Note that in theory, the path that a CM message takes through the fabric
may differ from that taken by a message posted on the QP. In practice,
this is not the case given the current implementation.)

So... handling the COMM_EST event does handle the rare condition listed
above. There's no harm in handling the event, but deciding how essential
it is to handle is left as an exercise to the reader. :)

- Sean