* possible core cq bug @ 2017-12-02 19:19 Steve Wise 2017-12-03 11:51 ` Sagi Grimberg 0 siblings, 1 reply; 7+ messages in thread From: Steve Wise @ 2017-12-02 19:19 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA If an application creates its cq for DIRECT poll mode using ib_create_cq() instead of ib_alloc_cq(), and then uses ib_drain_qp() to drain its qp, ib_drain_sq/rq() will always hang forever because cq->wc is NULL. IE ib_create_cq() doesn't allocate cq->wc, and ib_alloc_cq() does. Yet the __ib_process_cq() requires cq->wc to actually complete any completions and calling the cqe_done function. Is this a bug in the CQ core code or the application? Thanks, Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: possible core cq bug 2017-12-02 19:19 possible core cq bug Steve Wise @ 2017-12-03 11:51 ` Sagi Grimberg [not found] ` <be658098-28a6-e5a8-cebf-5865096f99ce-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Sagi Grimberg @ 2017-12-03 11:51 UTC (permalink / raw) To: Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA Hi Steve, > If an application creates its cq for DIRECT poll mode using ib_create_cq() > instead of ib_alloc_cq(), and then uses ib_drain_qp() to drain its qp, > ib_drain_sq/rq() will always hang forever because cq->wc is NULL. IE > ib_create_cq() doesn't allocate cq->wc, and ib_alloc_cq() does. Yet the > __ib_process_cq() requires cq->wc to actually complete any completions and > calling the cqe_done function. > > Is this a bug in the CQ core code or the application? Take a look in __ib_drain_rq/__ib_drain_sq for cq->poll_ctx == IB_POLL_DIRECT. The drain routine polls the completion queue from time to time... -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <be658098-28a6-e5a8-cebf-5865096f99ce-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>]
* RE: possible core cq bug [not found] ` <be658098-28a6-e5a8-cebf-5865096f99ce-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> @ 2017-12-03 15:24 ` Steve Wise 2017-12-03 16:53 ` Sagi Grimberg 0 siblings, 1 reply; 7+ messages in thread From: Steve Wise @ 2017-12-03 15:24 UTC (permalink / raw) To: 'Sagi Grimberg', linux-rdma-u79uwXL29TY76Z2rM5mHXA > > > If an application creates its cq for DIRECT poll mode using ib_create_cq() > > instead of ib_alloc_cq(), and then uses ib_drain_qp() to drain its qp, > > ib_drain_sq/rq() will always hang forever because cq->wc is NULL. IE > > ib_create_cq() doesn't allocate cq->wc, and ib_alloc_cq() does. Yet the > > __ib_process_cq() requires cq->wc to actually complete any completions > and > > calling the cqe_done function. > > > > Is this a bug in the CQ core code or the application? > > Take a look in __ib_drain_rq/__ib_drain_sq for > cq->poll_ctx == IB_POLL_DIRECT. The drain routine polls > the completion queue from time to time... Yes, but it ends up calling __ib_process_cq() which doesn't actually poll the CQ because cq->wc is NULL. Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: possible core cq bug 2017-12-03 15:24 ` Steve Wise @ 2017-12-03 16:53 ` Sagi Grimberg [not found] ` <e7782921-b55a-a628-02d4-874d88049ec5-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Sagi Grimberg @ 2017-12-03 16:53 UTC (permalink / raw) To: Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA >>> If an application creates its cq for DIRECT poll mode using ib_create_cq() >>> instead of ib_alloc_cq(), and then uses ib_drain_qp() to drain its qp, >>> ib_drain_sq/rq() will always hang forever because cq->wc is NULL. IE >>> ib_create_cq() doesn't allocate cq->wc, and ib_alloc_cq() does. Yet the >>> __ib_process_cq() requires cq->wc to actually complete any completions >> and >>> calling the cqe_done function. >>> >>> Is this a bug in the CQ core code or the application? >> >> Take a look in __ib_drain_rq/__ib_drain_sq for >> cq->poll_ctx == IB_POLL_DIRECT. The drain routine polls >> the completion queue from time to time... > > Yes, but it ends up calling __ib_process_cq() which doesn't actually poll the CQ because cq->wc is NULL. Do you mean that the CQ allocation wasn't done with ib_alloc_cq? That indeed would be a bug. We can WARN on it as well so the application will know to allocate its CQ with ib_alloc_cq. Does something like this makes sense? -- diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c index f2ae75fa3128..90eac56b5f1a 100644 --- a/drivers/infiniband/core/cq.c +++ b/drivers/infiniband/core/cq.c @@ -69,7 +69,7 @@ static int __ib_process_cq(struct ib_cq *cq, int budget) */ int ib_process_cq_direct(struct ib_cq *cq, int budget) { - WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT); + WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT || !cq->wc); return __ib_process_cq(cq, budget); } -- -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 7+ messages in thread
[parent not found: <e7782921-b55a-a628-02d4-874d88049ec5-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>]
* RE: possible core cq bug [not found] ` <e7782921-b55a-a628-02d4-874d88049ec5-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> @ 2017-12-03 17:17 ` Steve Wise 2017-12-04 18:41 ` Sagi Grimberg 0 siblings, 1 reply; 7+ messages in thread From: Steve Wise @ 2017-12-03 17:17 UTC (permalink / raw) To: 'Sagi Grimberg', linux-rdma-u79uwXL29TY76Z2rM5mHXA > > Yes, but it ends up calling __ib_process_cq() which doesn't actually poll > the CQ because cq->wc is NULL. > > Do you mean that the CQ allocation wasn't done with ib_alloc_cq? That > indeed would be a bug. We can WARN on it as well so the application > will know to allocate its CQ with ib_alloc_cq. Yes, the application used ib_create_cq(). > > Does something like this makes sense? > That would at least log the issue, but the thread in ib_drain_qp() will be stuck forever continually blocking for 1/10sec and then polling. Perhaps the drain logic should detect this, and then return? Is there a reason we don't get rid of ib_create_cq()? Or just make it call ib_alloc_cq()... > -- > diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c > index f2ae75fa3128..90eac56b5f1a 100644 > --- a/drivers/infiniband/core/cq.c > +++ b/drivers/infiniband/core/cq.c > @@ -69,7 +69,7 @@ static int __ib_process_cq(struct ib_cq *cq, int budget) > */ > int ib_process_cq_direct(struct ib_cq *cq, int budget) > { > - WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT); > + WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT || !cq->wc); > > return __ib_process_cq(cq, budget); > } > -- -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: possible core cq bug 2017-12-03 17:17 ` Steve Wise @ 2017-12-04 18:41 ` Sagi Grimberg [not found] ` <456aef15-a5a7-6dc8-478b-9fc7f75ff1fb-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Sagi Grimberg @ 2017-12-04 18:41 UTC (permalink / raw) To: Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA > That would at least log the issue, but the thread in ib_drain_qp() will be stuck forever continually blocking for 1/10sec and then polling. Perhaps the drain logic should detect this, and then return? I don't think returning is a better choice here.. > Is there a reason we don't get rid of ib_create_cq()? Or just make it call ib_alloc_cq()... We could do that, it also makes sense to me. Thoughts from others? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <456aef15-a5a7-6dc8-478b-9fc7f75ff1fb-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>]
* RE: possible core cq bug [not found] ` <456aef15-a5a7-6dc8-478b-9fc7f75ff1fb-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> @ 2017-12-07 16:00 ` Steve Wise 0 siblings, 0 replies; 7+ messages in thread From: Steve Wise @ 2017-12-07 16:00 UTC (permalink / raw) To: 'Sagi Grimberg', linux-rdma-u79uwXL29TY76Z2rM5mHXA > > > That would at least log the issue, but the thread in ib_drain_qp() will be > stuck forever continually blocking for 1/10sec and then polling. Perhaps the > drain logic should detect this, and then return? > > I don't think returning is a better choice here.. > Why not? > > Is there a reason we don't get rid of ib_create_cq()? Or just make it call > ib_alloc_cq()... > > We could do that, it also makes sense to me. Thoughts from others? > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-12-07 16:00 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-12-02 19:19 possible core cq bug Steve Wise
2017-12-03 11:51 ` Sagi Grimberg
[not found] ` <be658098-28a6-e5a8-cebf-5865096f99ce-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-12-03 15:24 ` Steve Wise
2017-12-03 16:53 ` Sagi Grimberg
[not found] ` <e7782921-b55a-a628-02d4-874d88049ec5-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-12-03 17:17 ` Steve Wise
2017-12-04 18:41 ` Sagi Grimberg
[not found] ` <456aef15-a5a7-6dc8-478b-9fc7f75ff1fb-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-12-07 16:00 ` Steve Wise
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).