* possible core cq bug
@ 2017-12-02 19:19 Steve Wise
2017-12-03 11:51 ` Sagi Grimberg
0 siblings, 1 reply; 7+ messages in thread
From: Steve Wise @ 2017-12-02 19:19 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
If an application creates its cq for DIRECT poll mode using ib_create_cq()
instead of ib_alloc_cq(), and then uses ib_drain_qp() to drain its qp,
ib_drain_sq/rq() will hang forever because cq->wc is NULL. That is,
ib_create_cq() doesn't allocate cq->wc, while ib_alloc_cq() does. Yet
__ib_process_cq() requires cq->wc in order to actually process any
completions and call each cqe's ->done handler.
Is this a bug in the CQ core code or the application?
Thanks,
Steve.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: possible core cq bug
2017-12-02 19:19 possible core cq bug Steve Wise
@ 2017-12-03 11:51 ` Sagi Grimberg
[not found] ` <be658098-28a6-e5a8-cebf-5865096f99ce-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2017-12-03 11:51 UTC (permalink / raw)
To: Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Steve,
> If an application creates its cq for DIRECT poll mode using ib_create_cq()
> instead of ib_alloc_cq(), and then uses ib_drain_qp() to drain its qp,
> ib_drain_sq/rq() will always hang forever because cq->wc is NULL. IE
> ib_create_cq() doesn't allocate cq->wc, and ib_alloc_cq() does. Yet the
> __ib_process_cq() requires cq->wc to actually complete any completions and
> calling the cqe_done function.
>
> Is this a bug in the CQ core code or the application?
Take a look in __ib_drain_rq/__ib_drain_sq for
cq->poll_ctx == IB_POLL_DIRECT. The drain routine polls
the completion queue from time to time...
* RE: possible core cq bug
[not found] ` <be658098-28a6-e5a8-cebf-5865096f99ce-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2017-12-03 15:24 ` Steve Wise
2017-12-03 16:53 ` Sagi Grimberg
0 siblings, 1 reply; 7+ messages in thread
From: Steve Wise @ 2017-12-03 15:24 UTC (permalink / raw)
To: 'Sagi Grimberg', linux-rdma-u79uwXL29TY76Z2rM5mHXA
>
> > If an application creates its cq for DIRECT poll mode using ib_create_cq()
> > instead of ib_alloc_cq(), and then uses ib_drain_qp() to drain its qp,
> > ib_drain_sq/rq() will always hang forever because cq->wc is NULL. IE
> > ib_create_cq() doesn't allocate cq->wc, and ib_alloc_cq() does. Yet the
> > __ib_process_cq() requires cq->wc to actually complete any completions
> > and calling the cqe_done function.
> >
> > Is this a bug in the CQ core code or the application?
>
> Take a look in __ib_drain_rq/__ib_drain_sq for
> cq->poll_ctx == IB_POLL_DIRECT. The drain routine polls
> the completion queue from time to time...
Yes, but it ends up calling __ib_process_cq(), which doesn't actually poll the CQ because cq->wc is NULL.
Steve.
* Re: possible core cq bug
2017-12-03 15:24 ` Steve Wise
@ 2017-12-03 16:53 ` Sagi Grimberg
[not found] ` <e7782921-b55a-a628-02d4-874d88049ec5-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2017-12-03 16:53 UTC (permalink / raw)
To: Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA
>>> If an application creates its cq for DIRECT poll mode using ib_create_cq()
>>> instead of ib_alloc_cq(), and then uses ib_drain_qp() to drain its qp,
>>> ib_drain_sq/rq() will always hang forever because cq->wc is NULL. IE
>>> ib_create_cq() doesn't allocate cq->wc, and ib_alloc_cq() does. Yet the
>>> __ib_process_cq() requires cq->wc to actually complete any completions
>>> and calling the cqe_done function.
>>>
>>> Is this a bug in the CQ core code or the application?
>>
>> Take a look in __ib_drain_rq/__ib_drain_sq for
>> cq->poll_ctx == IB_POLL_DIRECT. The drain routine polls
>> the completion queue from time to time...
>
> Yes, but it ends up calling __ib_process_cq() which doesn't actually poll the CQ because cq->wc is NULL.
Do you mean that the CQ allocation wasn't done with ib_alloc_cq? That
indeed would be a bug. We can WARN on it as well so the application
will know to allocate its CQ with ib_alloc_cq.
Does something like this make sense?
--
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index f2ae75fa3128..90eac56b5f1a 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -69,7 +69,7 @@ static int __ib_process_cq(struct ib_cq *cq, int budget)
*/
int ib_process_cq_direct(struct ib_cq *cq, int budget)
{
- WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT);
+ WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT || !cq->wc);
return __ib_process_cq(cq, budget);
}
--
* RE: possible core cq bug
[not found] ` <e7782921-b55a-a628-02d4-874d88049ec5-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2017-12-03 17:17 ` Steve Wise
2017-12-04 18:41 ` Sagi Grimberg
0 siblings, 1 reply; 7+ messages in thread
From: Steve Wise @ 2017-12-03 17:17 UTC (permalink / raw)
To: 'Sagi Grimberg', linux-rdma-u79uwXL29TY76Z2rM5mHXA
> > Yes, but it ends up calling __ib_process_cq() which doesn't actually
> > poll the CQ because cq->wc is NULL.
>
> Do you mean that the CQ allocation wasn't done with ib_alloc_cq? That
> indeed would be a bug. We can WARN on it as well so the application
> will know to allocate its CQ with ib_alloc_cq.
Yes, the application used ib_create_cq().
>
> Does something like this makes sense?
>
That would at least log the issue, but the thread in ib_drain_qp() will still be stuck forever, alternately blocking for 1/10 sec and then polling. Perhaps the drain logic should detect this case and return?
Is there a reason we don't get rid of ib_create_cq()? Or just make it call ib_alloc_cq()...
> --
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index f2ae75fa3128..90eac56b5f1a 100644
> --- a/drivers/infiniband/core/cq.c
> +++ b/drivers/infiniband/core/cq.c
> @@ -69,7 +69,7 @@ static int __ib_process_cq(struct ib_cq *cq, int budget)
> */
> int ib_process_cq_direct(struct ib_cq *cq, int budget)
> {
> - WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT);
> + WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT || !cq->wc);
>
> return __ib_process_cq(cq, budget);
> }
> --
* Re: possible core cq bug
2017-12-03 17:17 ` Steve Wise
@ 2017-12-04 18:41 ` Sagi Grimberg
[not found] ` <456aef15-a5a7-6dc8-478b-9fc7f75ff1fb-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2017-12-04 18:41 UTC (permalink / raw)
To: Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA
> That would at least log the issue, but the thread in ib_drain_qp() will be stuck forever continually blocking for 1/10sec and then polling. Perhaps the drain logic should detect this, and then return?
I don't think returning is a better choice here...
> Is there a reason we don't get rid of ib_create_cq()? Or just make it call ib_alloc_cq()...
We could do that, it also makes sense to me. Thoughts from others?
* RE: possible core cq bug
[not found] ` <456aef15-a5a7-6dc8-478b-9fc7f75ff1fb-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2017-12-07 16:00 ` Steve Wise
0 siblings, 0 replies; 7+ messages in thread
From: Steve Wise @ 2017-12-07 16:00 UTC (permalink / raw)
To: 'Sagi Grimberg', linux-rdma-u79uwXL29TY76Z2rM5mHXA
>
> > That would at least log the issue, but the thread in ib_drain_qp() will be
> > stuck forever continually blocking for 1/10sec and then polling. Perhaps
> > the drain logic should detect this, and then return?
>
> I don't think returning is a better choice here..
>
Why not?
> > Is there a reason we don't get rid of ib_create_cq()? Or just make it
> > call ib_alloc_cq()...
>
> We could do that, it also makes sense to me. Thoughts from others?
>