* identify the race condition in this code and win the respect of linux-rdma developers!
@ 2011-09-15 23:58 Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B2B1-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Hefty, Sean @ 2011-09-15 23:58 UTC (permalink / raw)
To: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
I have a ping-pong test application that loops doing: send, wait for send completion, wait for receive completion. The test occasionally hangs in the following code at ibv_get_cq_event() (error handling removed):
Case 1:
ret = ibv_poll_cq(id->recv_cq, 1, wc);
ret = ibv_req_notify_cq(id->recv_cq, 0);
while (!(ret = ibv_poll_cq(id->recv_cq, 1, wc))) {
ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context);
ibv_ack_cq_events(id->recv_cq, 1);
}
The test does NOT hang if the code is restructured as either of the below:
Case 2:
ret = ibv_poll_cq(id->recv_cq, 1, wc);
ret = ibv_req_notify_cq(id->recv_cq, 0);
do {
ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context);
ibv_ack_cq_events(id->recv_cq, 1);
} while (!(ret = ibv_poll_cq(id->recv_cq, 1, wc)));
or
Case 3:
ret = ibv_poll_cq(id->recv_cq, 1, wc);
while (1)
ret = ibv_req_notify_cq(id->recv_cq, 0);
ret = ibv_poll_cq(id->recv_cq, 1, wc);
if (ret)
break;
ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context);
ibv_ack_cq_events(id->recv_cq, 1);
}
Personally, I would have expected case 2 to hang and cases 1 and 3 to work. Does anyone see why case 1 should hang? This is with mlx4.
- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread[parent not found: <1828884A29C6694DAF28B7E6B8A8237316E5B2B1-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: identify the race condition in this code and win the respect of linux-rdma developers! [not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B2B1-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2011-09-16 0:50 ` Jason Gunthorpe [not found] ` <20110916005008.GC6020-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 8+ messages in thread From: Jason Gunthorpe @ 2011-09-16 0:50 UTC (permalink / raw) To: Hefty, Sean Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org) On Thu, Sep 15, 2011 at 11:58:34PM +0000, Hefty, Sean wrote: > I have a ping-pong test application that loops doing: send, wait for > send completion, wait for receive completion. The test occasionally > hangs in the following code at ibv_get_cq_event() (error handling > removed): > Case 1: > ret = ibv_poll_cq(id->recv_cq, 1, wc); > ret = ibv_req_notify_cq(id->recv_cq, 0); > > while (!(ret = ibv_poll_cq(id->recv_cq, 1, wc))) { > ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context); > ibv_ack_cq_events(id->recv_cq, 1); > } If ibv_get_cq_event returns here, because the CQ notify triggers but poll_cq goes not return a WC then you go back into get_cq_event without requesting notification. I'm pretty sure ibv_req_notify_cq does nothing if there is already a notification pending. > Case 3: > ret = ibv_poll_cq(id->recv_cq, 1, wc); > while (1) > ret = ibv_req_notify_cq(id->recv_cq, 0); > > ret = ibv_poll_cq(id->recv_cq, 1, wc); > if (ret) > break; > ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context); > ibv_ack_cq_events(id->recv_cq, 1); > } Where as this, is correct. The sequence must always be ibv_req_notify_cq, ibv_poll_cq, ibv_get_cq_event Case 2 will also fail, but it will be statistically less likely since you are looking for a WC being added between the poll_cq and req_notify_cq. While case 1 only requires a run through the loop with a pending notification but no WC entries. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20110916005008.GC6020-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: identify the race condition in this code and win the respect of linux-rdma developers! [not found] ` <20110916005008.GC6020-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2011-09-16 14:51 ` Hefty, Sean [not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B320-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org> 0 siblings, 1 reply; 8+ messages in thread From: Hefty, Sean @ 2011-09-16 14:51 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org) > > Case 1: > > ret = ibv_poll_cq(id->recv_cq, 1, wc); > > ret = ibv_req_notify_cq(id->recv_cq, 0); > > > > while (!(ret = ibv_poll_cq(id->recv_cq, 1, wc))) { > > ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context); > > ibv_ack_cq_events(id->recv_cq, 1); > > } > > If ibv_get_cq_event returns here, because the CQ notify triggers but > poll_cq goes not return a WC then you go back into get_cq_event > without requesting notification. I understand this, but I still don't quite see how the situation fully occurs. > I'm pretty sure ibv_req_notify_cq does nothing if there is already a > notification pending. This last statement might be it, but I couldn't see that in the code. I was under the assumption that once an event occurred, the CQ would be 'un-armed'. ibv_req_notify_cq() would rearm it. In the above code then, ibv_get_cq_event() would pull the old event, but then a new one should eventually show up with a corresponding completion. The flow would be: poll rearm -- completion -- while (poll) -> returns cq entry, but leaves event, cq is not armed? next test iteration poll rearm -> should arm cq while (poll) get_event -> returns previous event -- loop back around waiting for new event -- this, of course, isn't happening. Maybe there's a significant delay between the device writing the CQ entry and generating the event, such that the 2nd rearm doesn't do anything, as you suggest.. - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <1828884A29C6694DAF28B7E6B8A8237316E5B320-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: identify the race condition in this code and win the respect of linux-rdma developers! [not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B320-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2011-09-16 15:59 ` Jason Gunthorpe [not found] ` <20110916155955.GA18548-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 8+ messages in thread From: Jason Gunthorpe @ 2011-09-16 15:59 UTC (permalink / raw) To: Hefty, Sean Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org) On Fri, Sep 16, 2011 at 02:51:04PM +0000, Hefty, Sean wrote: > > > Case 1: > > > ret = ibv_poll_cq(id->recv_cq, 1, wc); > > > ret = ibv_req_notify_cq(id->recv_cq, 0); > > > > > > while (!(ret = ibv_poll_cq(id->recv_cq, 1, wc))) { > > > ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context); > > > ibv_ack_cq_events(id->recv_cq, 1); > > > } > > > > If ibv_get_cq_event returns here, because the CQ notify triggers but > > poll_cq goes not return a WC then you go back into get_cq_event > > without requesting notification. > > I understand this, but I still don't quite see how the situation fully occurs. > > > I'm pretty sure ibv_req_notify_cq does nothing if there is already a > > notification pending. > > This last statement might be it, but I couldn't see that in the > code. I was under the assumption that once an event occurred, the > CQ would be 'un-armed'. ibv_req_notify_cq() would rearm it. In the > above code then, ibv_get_cq_event() would pull the old event, but > then a new one should eventually show up with a corresponding > completion. The flow would be: As I see it the problem flow would be this: poll ibv_req_notify_cq // HCA Write CQ Entry // Trigger EVENT poll -> return CQ poll ibv_req_notify_cq // does nothing, event is already triggered poll ibv_get_cq_event // returns immediately due to prior Trigger event poll ibv_get_cq_event // blocks because the CQ was never rearmed. Which would happen easily and frequently without having to invoke any spooky infrequent racing. poll and cq_events are totally independent, I belive the implementation of CQ events is like: enum {DISABLED,ARMED,TRIGGERED} flag; And the only sane, race free implementation of req_notify must thus be if (flag == DISABLED) flag = ARMED And for CQ entry add: if (flag == ARMED) flag = TRIGGERED And for get_cq_event: if (flag == TRIGGERED) {flag = DISABLED; return;} What you are imagining is that ibv_req_notify_cq does this: if (flag == ARMED || flag == TRIGGERED) flag = ARMED which is not a good choice because it will destroy a triggered state and that creates all sorts of bad race conditions, again because of the independence of poll and cq_event. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20110916155955.GA18548-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: identify the race condition in this code and win the respect of linux-rdma developers! [not found] ` <20110916155955.GA18548-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2011-09-16 16:57 ` Hefty, Sean [not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B3B8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org> 2011-09-16 19:36 ` Hefty, Sean 1 sibling, 1 reply; 8+ messages in thread From: Hefty, Sean @ 2011-09-16 16:57 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org) > poll and cq_events are totally independent, I belive the implementation > of CQ events is like: > > enum {DISABLED,ARMED,TRIGGERED} flag; I was suggesting that there were only 2 states, not 3, with ibv_get_cq_event() simply retrieving a queued event from the kernel without touching the CQ state. - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <1828884A29C6694DAF28B7E6B8A8237316E5B3B8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* RE: identify the race condition in this code and win the respect of linux-rdma developers! [not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B3B8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2011-09-16 17:21 ` Hefty, Sean 0 siblings, 0 replies; 8+ messages in thread From: Hefty, Sean @ 2011-09-16 17:21 UTC (permalink / raw) To: Hefty, Sean, Jason Gunthorpe Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org) Maybe someone at Mellanox can provide some guidance here. Here are the arm/get_event calls from the libmlx4. int mlx4_arm_cq(struct ibv_cq *ibvcq, int solicited) { ... sn = cq->arm_sn & 3; ci = cq->cons_index & 0xffffff; cmd = solicited ? MLX4_CQ_DB_REQ_NOT_SOL : MLX4_CQ_DB_REQ_NOT; *cq->arm_db = htonl(sn << 28 | cmd | ci); wmb(); doorbell[0] = htonl(sn << 28 | cmd | cq->cqn); doorbell[1] = htonl(ci); mlx4_write64(doorbell, to_mctx(ibvcq->context), MLX4_CQ_DOORBELL); return 0; } void mlx4_cq_event(struct ibv_cq *cq) { to_mcq(cq)->arm_sn++; } What exactly is the role of arm_sn? - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: identify the race condition in this code and win the respect of linux-rdma developers! [not found] ` <20110916155955.GA18548-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2011-09-16 16:57 ` Hefty, Sean @ 2011-09-16 19:36 ` Hefty, Sean [not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B55E-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org> 1 sibling, 1 reply; 8+ messages in thread From: Hefty, Sean @ 2011-09-16 19:36 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org) > As I see it the problem flow would be this: > > poll > ibv_req_notify_cq > // HCA Write CQ Entry > // Trigger EVENT Since the trigger event must be separate from writing the cq entry, if it doesn't happen until... > poll -> return CQ > > poll > ibv_req_notify_cq // does nothing, event is already triggered here, then the situation you describe can occur based on the spec. And if we allow an implementation to make ibv_req_notify_cq a no-op if there is an event outstanding, the race condition is easier to hit. > poll > ibv_get_cq_event // returns immediately due to prior Trigger event > poll > ibv_get_cq_event // blocks because the CQ was never rearmed. I changed the code to do this: do { poll rearm poll get_event } while (1); which I believe fixes the issue. Thanks for the help. You are now the winner of my respect. :) - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <1828884A29C6694DAF28B7E6B8A8237316E5B55E-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: identify the race condition in this code and win the respect of linux-rdma developers! [not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B55E-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2011-09-16 19:57 ` Jason Gunthorpe 0 siblings, 0 replies; 8+ messages in thread From: Jason Gunthorpe @ 2011-09-16 19:57 UTC (permalink / raw) To: Hefty, Sean Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org) On Fri, Sep 16, 2011 at 07:36:17PM +0000, Hefty, Sean wrote: > I was suggesting that there were only 2 states, not 3, with > ibv_get_cq_event() simply retrieving a queued event from the kernel > without touching the CQ state. But that isn't true, as you point out in your next message ibv_get_cq_event calls mlx4_cq_event which does alter the CQ state, particularly in how it behaves WRT to mlx4_arm_cq. Hard to say what the arm_sn does, but my guess is that it is creating a one-shot that is sensitive to ibv_get_cq_event, eg you will never get more than one event queued in the kernel queue, which creates the 3 state system I described.. (at least that matches my experience) Certainly, considering that arm_sn is a 3 bit counter, calling mlx4_cq_event many times then calling arm_cq will alias the counter from the point of view of the chip and that can't possibly be good. > I changed the code to do this: > > do { > poll > rearm > poll > get_event > } while (1); > > which I believe fixes the issue. In the past I looked at all this and concluded this was the only possible correct way to write the loop. You have to guarentee rearm is called before every get_event and you have to do a poll after every rearm. > Thanks for the help. You are now the winner of my respect. :) NP :) Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2011-09-16 19:57 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-15 23:58 identify the race condition in this code and win the respect of linux-rdma developers! Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B2B1-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2011-09-16 0:50 ` Jason Gunthorpe
[not found] ` <20110916005008.GC6020-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2011-09-16 14:51 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B320-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2011-09-16 15:59 ` Jason Gunthorpe
[not found] ` <20110916155955.GA18548-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2011-09-16 16:57 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B3B8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2011-09-16 17:21 ` Hefty, Sean
2011-09-16 19:36 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B55E-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2011-09-16 19:57 ` Jason Gunthorpe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox