linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* possible core cq bug
@ 2017-12-02 19:19 Steve Wise
  2017-12-03 11:51 ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Steve Wise @ 2017-12-02 19:19 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

If an application creates its cq for DIRECT poll mode using ib_create_cq()
instead of ib_alloc_cq(),  and then uses ib_drain_qp() to drain its qp,
ib_drain_sq/rq() will always hang forever because cq->wc is NULL.  IE
ib_create_cq() doesn't allocate cq->wc, and ib_alloc_cq() does.  Yet the
__ib_process_cq() requires cq->wc to actually complete any completions and
calling the cqe_done function.

Is this a bug in the CQ core code or the application?

Thanks,

Steve.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: possible core cq bug
  2017-12-02 19:19 possible core cq bug Steve Wise
@ 2017-12-03 11:51 ` Sagi Grimberg
       [not found]   ` <be658098-28a6-e5a8-cebf-5865096f99ce-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2017-12-03 11:51 UTC (permalink / raw)
  To: Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Steve,

> If an application creates its cq for DIRECT poll mode using ib_create_cq()
> instead of ib_alloc_cq(),  and then uses ib_drain_qp() to drain its qp,
> ib_drain_sq/rq() will always hang forever because cq->wc is NULL.  IE
> ib_create_cq() doesn't allocate cq->wc, and ib_alloc_cq() does.  Yet the
> __ib_process_cq() requires cq->wc to actually complete any completions and
> calling the cqe_done function.
> 
> Is this a bug in the CQ core code or the application?

Take a look in __ib_drain_rq/__ib_drain_sq for
cq->poll_ctx == IB_POLL_DIRECT. The drain routine polls
the completion queue from time to time...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: possible core cq bug
       [not found]   ` <be658098-28a6-e5a8-cebf-5865096f99ce-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2017-12-03 15:24     ` Steve Wise
  2017-12-03 16:53       ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Steve Wise @ 2017-12-03 15:24 UTC (permalink / raw)
  To: 'Sagi Grimberg', linux-rdma-u79uwXL29TY76Z2rM5mHXA

> 
> > If an application creates its cq for DIRECT poll mode using ib_create_cq()
> > instead of ib_alloc_cq(),  and then uses ib_drain_qp() to drain its qp,
> > ib_drain_sq/rq() will always hang forever because cq->wc is NULL.  IE
> > ib_create_cq() doesn't allocate cq->wc, and ib_alloc_cq() does.  Yet the
> > __ib_process_cq() requires cq->wc to actually complete any completions
> and
> > calling the cqe_done function.
> >
> > Is this a bug in the CQ core code or the application?
> 
> Take a look in __ib_drain_rq/__ib_drain_sq for
> cq->poll_ctx == IB_POLL_DIRECT. The drain routine polls
> the completion queue from time to time...

Yes, but it ends up calling __ib_process_cq() which doesn't actually poll the CQ because cq->wc is NULL. 

Steve.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: possible core cq bug
  2017-12-03 15:24     ` Steve Wise
@ 2017-12-03 16:53       ` Sagi Grimberg
       [not found]         ` <e7782921-b55a-a628-02d4-874d88049ec5-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2017-12-03 16:53 UTC (permalink / raw)
  To: Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA


>>> If an application creates its cq for DIRECT poll mode using ib_create_cq()
>>> instead of ib_alloc_cq(),  and then uses ib_drain_qp() to drain its qp,
>>> ib_drain_sq/rq() will always hang forever because cq->wc is NULL.  IE
>>> ib_create_cq() doesn't allocate cq->wc, and ib_alloc_cq() does.  Yet the
>>> __ib_process_cq() requires cq->wc to actually complete any completions
>> and
>>> calling the cqe_done function.
>>>
>>> Is this a bug in the CQ core code or the application?
>>
>> Take a look in __ib_drain_rq/__ib_drain_sq for
>> cq->poll_ctx == IB_POLL_DIRECT. The drain routine polls
>> the completion queue from time to time...
> 
> Yes, but it ends up calling __ib_process_cq() which doesn't actually poll the CQ because cq->wc is NULL.

Do you mean that the CQ allocation wasn't done with ib_alloc_cq? That
indeed would be a bug. We can WARN on it as well so the application
will know to allocate its CQ with ib_alloc_cq.

Does something like this makes sense?
--
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index f2ae75fa3128..90eac56b5f1a 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -69,7 +69,7 @@ static int __ib_process_cq(struct ib_cq *cq, int budget)
   */
  int ib_process_cq_direct(struct ib_cq *cq, int budget)
  {
-       WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT);
+       WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT || !cq->wc);

         return __ib_process_cq(cq, budget);
  }
--
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* RE: possible core cq bug
       [not found]         ` <e7782921-b55a-a628-02d4-874d88049ec5-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2017-12-03 17:17           ` Steve Wise
  2017-12-04 18:41             ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Steve Wise @ 2017-12-03 17:17 UTC (permalink / raw)
  To: 'Sagi Grimberg', linux-rdma-u79uwXL29TY76Z2rM5mHXA

> > Yes, but it ends up calling __ib_process_cq() which doesn't actually poll
> the CQ because cq->wc is NULL.
> 
> Do you mean that the CQ allocation wasn't done with ib_alloc_cq? That
> indeed would be a bug. We can WARN on it as well so the application
> will know to allocate its CQ with ib_alloc_cq.

Yes, the application used ib_create_cq().

> 
> Does something like this makes sense?
>

That would at least log the issue, but the thread in ib_drain_qp() will be stuck forever continually blocking for 1/10sec and then polling.  Perhaps the drain logic should detect this, and then return?

Is there a reason we don't get rid of ib_create_cq()?   Or just make it call ib_alloc_cq()...

> --
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index f2ae75fa3128..90eac56b5f1a 100644
> --- a/drivers/infiniband/core/cq.c
> +++ b/drivers/infiniband/core/cq.c
> @@ -69,7 +69,7 @@ static int __ib_process_cq(struct ib_cq *cq, int budget)
>    */
>   int ib_process_cq_direct(struct ib_cq *cq, int budget)
>   {
> -       WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT);
> +       WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT || !cq->wc);
> 
>          return __ib_process_cq(cq, budget);
>   }
> --

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: possible core cq bug
  2017-12-03 17:17           ` Steve Wise
@ 2017-12-04 18:41             ` Sagi Grimberg
       [not found]               ` <456aef15-a5a7-6dc8-478b-9fc7f75ff1fb-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2017-12-04 18:41 UTC (permalink / raw)
  To: Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA


> That would at least log the issue, but the thread in ib_drain_qp() will be stuck forever continually blocking for 1/10sec and then polling.  Perhaps the drain logic should detect this, and then return?

I don't think returning is a better choice here..

> Is there a reason we don't get rid of ib_create_cq()?   Or just make it call ib_alloc_cq()...

We could do that, it also makes sense to me. Thoughts from others?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: possible core cq bug
       [not found]               ` <456aef15-a5a7-6dc8-478b-9fc7f75ff1fb-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2017-12-07 16:00                 ` Steve Wise
  0 siblings, 0 replies; 7+ messages in thread
From: Steve Wise @ 2017-12-07 16:00 UTC (permalink / raw)
  To: 'Sagi Grimberg', linux-rdma-u79uwXL29TY76Z2rM5mHXA

> 
> > That would at least log the issue, but the thread in ib_drain_qp() will be
> stuck forever continually blocking for 1/10sec and then polling.  Perhaps the
> drain logic should detect this, and then return?
> 
> I don't think returning is a better choice here..
> 

Why not?

> > Is there a reason we don't get rid of ib_create_cq()?   Or just make it call
> ib_alloc_cq()...
> 
> We could do that, it also makes sense to me. Thoughts from others?
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-12-07 16:00 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-12-02 19:19 possible core cq bug Steve Wise
2017-12-03 11:51 ` Sagi Grimberg
     [not found]   ` <be658098-28a6-e5a8-cebf-5865096f99ce-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-12-03 15:24     ` Steve Wise
2017-12-03 16:53       ` Sagi Grimberg
     [not found]         ` <e7782921-b55a-a628-02d4-874d88049ec5-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-12-03 17:17           ` Steve Wise
2017-12-04 18:41             ` Sagi Grimberg
     [not found]               ` <456aef15-a5a7-6dc8-478b-9fc7f75ff1fb-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-12-07 16:00                 ` Steve Wise

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).