* identify the race condition in this code and win the respect of linux-rdma developers!
@ 2011-09-15 23:58 Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B2B1-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Hefty, Sean @ 2011-09-15 23:58 UTC (permalink / raw)
To: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
I have a ping-pong test application that loops doing: send, wait for send completion, wait for receive completion. The test occasionally hangs in the following code at ibv_get_cq_event() (error handling removed):
Case 1:
ret = ibv_poll_cq(id->recv_cq, 1, wc);
ret = ibv_req_notify_cq(id->recv_cq, 0);
while (!(ret = ibv_poll_cq(id->recv_cq, 1, wc))) {
ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context);
ibv_ack_cq_events(id->recv_cq, 1);
}
The test does NOT hang if the code is restructured as either of the below:
Case 2:
ret = ibv_poll_cq(id->recv_cq, 1, wc);
ret = ibv_req_notify_cq(id->recv_cq, 0);
do {
ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context);
ibv_ack_cq_events(id->recv_cq, 1);
} while (!(ret = ibv_poll_cq(id->recv_cq, 1, wc)));
or
Case 3:
ret = ibv_poll_cq(id->recv_cq, 1, wc);
while (1)
ret = ibv_req_notify_cq(id->recv_cq, 0);
ret = ibv_poll_cq(id->recv_cq, 1, wc);
if (ret)
break;
ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context);
ibv_ack_cq_events(id->recv_cq, 1);
}
Personally, I would have expected case 2 to hang and cases 1 and 3 to work. Does anyone see why case 1 should hang? This is with mlx4.
- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: identify the race condition in this code and win the respect of linux-rdma developers!
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B2B1-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2011-09-16 0:50 ` Jason Gunthorpe
[not found] ` <20110916005008.GC6020-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2011-09-16 0:50 UTC (permalink / raw)
To: Hefty, Sean
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
On Thu, Sep 15, 2011 at 11:58:34PM +0000, Hefty, Sean wrote:
> I have a ping-pong test application that loops doing: send, wait for
> send completion, wait for receive completion. The test occasionally
> hangs in the following code at ibv_get_cq_event() (error handling
> removed):
> Case 1:
> ret = ibv_poll_cq(id->recv_cq, 1, wc);
> ret = ibv_req_notify_cq(id->recv_cq, 0);
>
> while (!(ret = ibv_poll_cq(id->recv_cq, 1, wc))) {
> ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context);
> ibv_ack_cq_events(id->recv_cq, 1);
> }
If ibv_get_cq_event returns here, because the CQ notify triggers but
poll_cq goes not return a WC then you go back into get_cq_event
without requesting notification.
I'm pretty sure ibv_req_notify_cq does nothing if there is already a
notification pending.
> Case 3:
> ret = ibv_poll_cq(id->recv_cq, 1, wc);
> while (1)
> ret = ibv_req_notify_cq(id->recv_cq, 0);
>
> ret = ibv_poll_cq(id->recv_cq, 1, wc);
> if (ret)
> break;
> ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context);
> ibv_ack_cq_events(id->recv_cq, 1);
> }
Where as this, is correct. The sequence must always be
ibv_req_notify_cq, ibv_poll_cq, ibv_get_cq_event
Case 2 will also fail, but it will be statistically less likely since
you are looking for a WC being added between the poll_cq and
req_notify_cq. While case 1 only requires a run through the loop with
a pending notification but no WC entries.
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: identify the race condition in this code and win the respect of linux-rdma developers!
[not found] ` <20110916005008.GC6020-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2011-09-16 14:51 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B320-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Hefty, Sean @ 2011-09-16 14:51 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
> > Case 1:
> > ret = ibv_poll_cq(id->recv_cq, 1, wc);
> > ret = ibv_req_notify_cq(id->recv_cq, 0);
> >
> > while (!(ret = ibv_poll_cq(id->recv_cq, 1, wc))) {
> > ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context);
> > ibv_ack_cq_events(id->recv_cq, 1);
> > }
>
> If ibv_get_cq_event returns here, because the CQ notify triggers but
> poll_cq goes not return a WC then you go back into get_cq_event
> without requesting notification.
I understand this, but I still don't quite see how the situation fully occurs.
> I'm pretty sure ibv_req_notify_cq does nothing if there is already a
> notification pending.
This last statement might be it, but I couldn't see that in the code. I was under the assumption that once an event occurred, the CQ would be 'un-armed'. ibv_req_notify_cq() would rearm it. In the above code then, ibv_get_cq_event() would pull the old event, but then a new one should eventually show up with a corresponding completion. The flow would be:
poll
rearm
-- completion --
while (poll) -> returns cq entry, but leaves event, cq is not armed?
next test iteration
poll
rearm -> should arm cq
while (poll)
get_event -> returns previous event
-- loop back around waiting for new event --
this, of course, isn't happening. Maybe there's a significant delay between the device writing the CQ entry and generating the event, such that the 2nd rearm doesn't do anything, as you suggest..
- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: identify the race condition in this code and win the respect of linux-rdma developers!
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B320-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2011-09-16 15:59 ` Jason Gunthorpe
[not found] ` <20110916155955.GA18548-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2011-09-16 15:59 UTC (permalink / raw)
To: Hefty, Sean
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
On Fri, Sep 16, 2011 at 02:51:04PM +0000, Hefty, Sean wrote:
> > > Case 1:
> > > ret = ibv_poll_cq(id->recv_cq, 1, wc);
> > > ret = ibv_req_notify_cq(id->recv_cq, 0);
> > >
> > > while (!(ret = ibv_poll_cq(id->recv_cq, 1, wc))) {
> > > ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context);
> > > ibv_ack_cq_events(id->recv_cq, 1);
> > > }
> >
> > If ibv_get_cq_event returns here, because the CQ notify triggers but
> > poll_cq goes not return a WC then you go back into get_cq_event
> > without requesting notification.
>
> I understand this, but I still don't quite see how the situation fully occurs.
>
> > I'm pretty sure ibv_req_notify_cq does nothing if there is already a
> > notification pending.
>
> This last statement might be it, but I couldn't see that in the
> code. I was under the assumption that once an event occurred, the
> CQ would be 'un-armed'. ibv_req_notify_cq() would rearm it. In the
> above code then, ibv_get_cq_event() would pull the old event, but
> then a new one should eventually show up with a corresponding
> completion. The flow would be:
As I see it the problem flow would be this:
poll
ibv_req_notify_cq
// HCA Write CQ Entry
// Trigger EVENT
poll -> return CQ
poll
ibv_req_notify_cq // does nothing, event is already triggered
poll
ibv_get_cq_event // returns immediately due to prior Trigger event
poll
ibv_get_cq_event // blocks because the CQ was never rearmed.
Which would happen easily and frequently without having to invoke any
spooky infrequent racing.
poll and cq_events are totally independent, I belive the implementation
of CQ events is like:
enum {DISABLED,ARMED,TRIGGERED} flag;
And the only sane, race free implementation of req_notify must thus be
if (flag == DISABLED) flag = ARMED
And for CQ entry add:
if (flag == ARMED) flag = TRIGGERED
And for get_cq_event:
if (flag == TRIGGERED) {flag = DISABLED; return;}
What you are imagining is that ibv_req_notify_cq does this:
if (flag == ARMED || flag == TRIGGERED) flag = ARMED
which is not a good choice because it will destroy a triggered state
and that creates all sorts of bad race conditions, again because of
the independence of poll and cq_event.
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: identify the race condition in this code and win the respect of linux-rdma developers!
[not found] ` <20110916155955.GA18548-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2011-09-16 16:57 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B3B8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2011-09-16 19:36 ` Hefty, Sean
1 sibling, 1 reply; 8+ messages in thread
From: Hefty, Sean @ 2011-09-16 16:57 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
> poll and cq_events are totally independent, I belive the implementation
> of CQ events is like:
>
> enum {DISABLED,ARMED,TRIGGERED} flag;
I was suggesting that there were only 2 states, not 3, with ibv_get_cq_event() simply retrieving a queued event from the kernel without touching the CQ state.
- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: identify the race condition in this code and win the respect of linux-rdma developers!
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B3B8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2011-09-16 17:21 ` Hefty, Sean
0 siblings, 0 replies; 8+ messages in thread
From: Hefty, Sean @ 2011-09-16 17:21 UTC (permalink / raw)
To: Hefty, Sean, Jason Gunthorpe
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
Maybe someone at Mellanox can provide some guidance here. Here are the arm/get_event calls from the libmlx4.
int mlx4_arm_cq(struct ibv_cq *ibvcq, int solicited)
{
...
sn = cq->arm_sn & 3;
ci = cq->cons_index & 0xffffff;
cmd = solicited ? MLX4_CQ_DB_REQ_NOT_SOL : MLX4_CQ_DB_REQ_NOT;
*cq->arm_db = htonl(sn << 28 | cmd | ci);
wmb();
doorbell[0] = htonl(sn << 28 | cmd | cq->cqn);
doorbell[1] = htonl(ci);
mlx4_write64(doorbell, to_mctx(ibvcq->context), MLX4_CQ_DOORBELL);
return 0;
}
void mlx4_cq_event(struct ibv_cq *cq)
{
to_mcq(cq)->arm_sn++;
}
What exactly is the role of arm_sn?
- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: identify the race condition in this code and win the respect of linux-rdma developers!
[not found] ` <20110916155955.GA18548-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2011-09-16 16:57 ` Hefty, Sean
@ 2011-09-16 19:36 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B55E-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
1 sibling, 1 reply; 8+ messages in thread
From: Hefty, Sean @ 2011-09-16 19:36 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
> As I see it the problem flow would be this:
>
> poll
> ibv_req_notify_cq
> // HCA Write CQ Entry
> // Trigger EVENT
Since the trigger event must be separate from writing the cq entry, if it doesn't happen until...
> poll -> return CQ
>
> poll
> ibv_req_notify_cq // does nothing, event is already triggered
here, then the situation you describe can occur based on the spec. And if we allow an implementation to make ibv_req_notify_cq a no-op if there is an event outstanding, the race condition is easier to hit.
> poll
> ibv_get_cq_event // returns immediately due to prior Trigger event
> poll
> ibv_get_cq_event // blocks because the CQ was never rearmed.
I changed the code to do this:
do {
poll
rearm
poll
get_event
} while (1);
which I believe fixes the issue. Thanks for the help. You are now the winner of my respect. :)
- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: identify the race condition in this code and win the respect of linux-rdma developers!
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B55E-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2011-09-16 19:57 ` Jason Gunthorpe
0 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2011-09-16 19:57 UTC (permalink / raw)
To: Hefty, Sean
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
On Fri, Sep 16, 2011 at 07:36:17PM +0000, Hefty, Sean wrote:
> I was suggesting that there were only 2 states, not 3, with
> ibv_get_cq_event() simply retrieving a queued event from the kernel
> without touching the CQ state.
But that isn't true, as you point out in your next message
ibv_get_cq_event calls mlx4_cq_event which does alter the CQ state,
particularly in how it behaves WRT to mlx4_arm_cq.
Hard to say what the arm_sn does, but my guess is that it is creating
a one-shot that is sensitive to ibv_get_cq_event, eg you will never
get more than one event queued in the kernel queue, which creates the
3 state system I described.. (at least that matches my experience)
Certainly, considering that arm_sn is a 3 bit counter, calling
mlx4_cq_event many times then calling arm_cq will alias the counter
from the point of view of the chip and that can't possibly be good.
> I changed the code to do this:
>
> do {
> poll
> rearm
> poll
> get_event
> } while (1);
>
> which I believe fixes the issue.
In the past I looked at all this and concluded this was the only
possible correct way to write the loop. You have to guarentee rearm is
called before every get_event and you have to do a poll after every
rearm.
> Thanks for the help. You are now the winner of my respect. :)
NP :)
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2011-09-16 19:57 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-15 23:58 identify the race condition in this code and win the respect of linux-rdma developers! Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B2B1-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2011-09-16 0:50 ` Jason Gunthorpe
[not found] ` <20110916005008.GC6020-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2011-09-16 14:51 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B320-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2011-09-16 15:59 ` Jason Gunthorpe
[not found] ` <20110916155955.GA18548-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2011-09-16 16:57 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B3B8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2011-09-16 17:21 ` Hefty, Sean
2011-09-16 19:36 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237316E5B55E-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2011-09-16 19:57 ` Jason Gunthorpe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox