* [PATCH] IB/rds: use IB_CQ_REPORT_MISSED_EVENTS to avoid missed CQ callbacks
@ 2010-03-30 22:24 Ralph Campbell
[not found] ` <1269987841.4192.228.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Ralph Campbell @ 2010-03-30 22:24 UTC (permalink / raw)
To: Andy Grover; +Cc: linux-rdma
ib_req_notify_cq(IB_CQ_NEXT_COMP) is not guaranteed to generate
a callback for the next completion entered since there is a race
between arming the callback and another CQE being added to the queue.
OpenFabrics added a IB_CQ_REPORT_MISSED_EVENTS flag to detect this
race and allow the verbs consumer to call ib_poll_cq() and
ib_req_notify_cq() again to avoid delays in processing the CQE.
Signed-off-by: Ralph Campbell <ralph.campbell-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org>
---
This patch by itself should improve RDS performance over QLogic
HCAs. I was running "qperf rds_bw" tests and would sometimes
see "unexpected opcode 0xdead" messages from
rds_ib_send_cq_comp_handler(). I have looked very closely at the
code and I didn't see any race conditions which would explain
why delaying the completion callback would cause the message but
with this patch, I don't see the error message.
net/rds/ib_send.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index a10fab6..609ae66 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -187,10 +187,8 @@ void rds_ib_send_cq_comp_handler(struct ib_cq *cq, void *context)
rdsdebug("cq %p conn %p\n", cq, conn);
rds_ib_stats_inc(s_ib_tx_cq_call);
- ret = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
- if (ret)
- rdsdebug("ib_req_notify_cq send failed: %d\n", ret);
+again:
while (ib_poll_cq(cq, 1, &wc) > 0) {
rdsdebug("wc wr_id 0x%llx status %u byte_len %u imm_data %u\n",
(unsigned long long)wc.wr_id, wc.status, wc.byte_len,
@@ -264,6 +262,12 @@ void rds_ib_send_cq_comp_handler(struct ib_cq *cq, void *context)
&conn->c_faddr, wc.status);
}
}
+ ret = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
+ IB_CQ_REPORT_MISSED_EVENTS);
+ if (ret > 0)
+ goto again;
+ if (ret < 0)
+ rdsdebug("ib_req_notify_cq send failed: %d\n", ret);
}
/*
--
1.6.6.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] IB/rds: use IB_CQ_REPORT_MISSED_EVENTS to avoid missed CQ callbacks
[not found] ` <1269987841.4192.228.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
@ 2010-03-30 23:30 ` Roland Dreier
[not found] ` <adaeij1v14z.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Roland Dreier @ 2010-03-30 23:30 UTC (permalink / raw)
To: Ralph Campbell; +Cc: Andy Grover, linux-rdma
> OpenFabrics added a IB_CQ_REPORT_MISSED_EVENTS flag to detect this
OpenFabrics? Funny, I thought I added that flag.
(This is not a serious objection ... I'm just being annoying. Being
serious, I'm happy to see this used outside of IPoIB and this looks like
a good optimization)
> This patch by itself should improve RDS performance over QLogic HCAs
Out of curiousity have you measured an improvement?
--
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] IB/rds: use IB_CQ_REPORT_MISSED_EVENTS to avoid missed CQ callbacks
[not found] ` <adaeij1v14z.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
@ 2010-03-30 23:50 ` Ralph Campbell
[not found] ` <1269993053.4192.235.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Ralph Campbell @ 2010-03-30 23:50 UTC (permalink / raw)
To: Roland Dreier; +Cc: Andy Grover, linux-rdma
On Tue, 2010-03-30 at 16:30 -0700, Roland Dreier wrote:
> > OpenFabrics added a IB_CQ_REPORT_MISSED_EVENTS flag to detect this
>
> OpenFabrics? Funny, I thought I added that flag.
You did. I can change the comment if you like.
> (This is not a serious objection ... I'm just being annoying. Being
> serious, I'm happy to see this used outside of IPoIB and this looks like
> a good optimization)
>
> > This patch by itself should improve RDS performance over QLogic HCAs
>
> Out of curiousity have you measured an improvement?
It is a great improvement over a bunch of printks and then
hitting BUG_ON() in __rds_ib_ring_used().
An actual performance difference will have to wait until
the underlying race is better understood. I thought this
patch might spark a discussion since the code in
rds_ib_send_cq_comp_handler() calls printk_ratelimit()
so the 0xdead message must have been seen before enough
to warrant adding the call and the comment:
/* In the error case, wc.opcode sometimes contains garbage */
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] IB/rds: use IB_CQ_REPORT_MISSED_EVENTS to avoid missed CQ callbacks
[not found] ` <1269993053.4192.235.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
@ 2010-03-31 21:59 ` Roland Dreier
0 siblings, 0 replies; 4+ messages in thread
From: Roland Dreier @ 2010-03-31 21:59 UTC (permalink / raw)
To: Ralph Campbell; +Cc: Andy Grover, linux-rdma
> > Out of curiousity have you measured an improvement?
>
> It is a great improvement over a bunch of printks and then
> hitting BUG_ON() in __rds_ib_ring_used().
I think it would be better to understand that problem, since this patch
is not really fixing anything, just changing timing. So at best you're
hiding the problem until it pops up again.
--
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2010-03-31 21:59 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-30 22:24 [PATCH] IB/rds: use IB_CQ_REPORT_MISSED_EVENTS to avoid missed CQ callbacks Ralph Campbell
[not found] ` <1269987841.4192.228.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
2010-03-30 23:30 ` Roland Dreier
[not found] ` <adaeij1v14z.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-03-30 23:50 ` Ralph Campbell
[not found] ` <1269993053.4192.235.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
2010-03-31 21:59 ` Roland Dreier
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox