linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ.
@ 2010-10-20 19:28 Steve Wise
       [not found] ` <20101020192859.1431.68877.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Wise @ 2010-10-20 19:28 UTC (permalink / raw)
  To: sean.hefty-ral2JQCrhuEAvxtiuMwx3w; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

It is possible for the CQ event thread to poll the CQ after it has been
destroyed which can result in a seg fault on T3 interfaces.  This patch
cancels the thread and waits for it to exit before destroying the CQ.

Signed-off-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---

 examples/rping.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/examples/rping.c b/examples/rping.c
index 91952e7..e603d3b 100644
--- a/examples/rping.c
+++ b/examples/rping.c
@@ -800,10 +800,10 @@ static void *rping_persistent_server_thread(void *arg)
 
 	rping_test_server(cb);
 	rdma_disconnect(cb->child_cm_id);
-	rping_free_buffers(cb);
-	rping_free_qp(cb);
 	pthread_cancel(cb->cqthread);
 	pthread_join(cb->cqthread, NULL);
+	rping_free_buffers(cb);
+	rping_free_qp(cb);
 	rdma_destroy_id(cb->child_cm_id);
 	free_cb(cb);
 	return NULL;
@@ -888,6 +888,8 @@ static int rping_run_server(struct rping_cb *cb)
 
 	rping_test_server(cb);
 	rdma_disconnect(cb->child_cm_id);
+	pthread_cancel(cb->cqthread);
+	pthread_join(cb->cqthread, NULL);
 	rdma_destroy_id(cb->child_cm_id);
 err2:
 	rping_free_buffers(cb);
@@ -1055,6 +1057,8 @@ static int rping_run_client(struct rping_cb *cb)
 
 	rping_test_client(cb);
 	rdma_disconnect(cb->cm_id);
+	pthread_cancel(cb->cqthread);
+	pthread_join(cb->cqthread, NULL);
 err2:
 	rping_free_buffers(cb);
 err1:

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] RPING: Remove printf for FLUSH completion.
       [not found] ` <20101020192859.1431.68877.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>
@ 2010-10-20 19:29   ` Steve Wise
       [not found]     ` <20101020192905.1431.40267.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>
  2010-10-20 20:05   ` [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ Bart Van Assche
  1 sibling, 1 reply; 15+ messages in thread
From: Steve Wise @ 2010-10-20 19:29 UTC (permalink / raw)
  To: sean.hefty-ral2JQCrhuEAvxtiuMwx3w; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Signed-off-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---

 examples/rping.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/examples/rping.c b/examples/rping.c
index e603d3b..5eb71e2 100644
--- a/examples/rping.c
+++ b/examples/rping.c
@@ -280,10 +280,12 @@ static int rping_cq_event_handler(struct rping_cb *cb)
 		ret = 0;
 
 		if (wc.status) {
-			fprintf(stderr, "cq completion failed status %d\n",
-				wc.status);
-			if (wc.status != IBV_WC_WR_FLUSH_ERR)
+			if (wc.status != IBV_WC_WR_FLUSH_ERR) {
+				fprintf(stderr, 
+					"cq completion failed status %d\n",
+					wc.status);
 				ret = -1;
+			}
 			goto error;
 		}
 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ.
       [not found] ` <20101020192859.1431.68877.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>
  2010-10-20 19:29   ` [PATCH 2/2] RPING: Remove printf for FLUSH completion Steve Wise
@ 2010-10-20 20:05   ` Bart Van Assche
       [not found]     ` <AANLkTi=uAuXT1RMEQ+vsA4FkN2828mZ2q0KPKXxpb6H0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2010-10-20 20:05 UTC (permalink / raw)
  To: Steve Wise
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Oct 20, 2010 at 9:28 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+v7I7tHvgBF7@public.gmane.orgm> wrote:
>
> It is possible for the CQ event thread to poll the CQ after it has been
> destroyed which can result in a seg fault on T3 interfaces.  This patch
> cancels the thread and waits for it to exit before destroying the CQ.
>
> Signed-off-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
> ---
>
>  examples/rping.c |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/examples/rping.c b/examples/rping.c
> index 91952e7..e603d3b 100644
> --- a/examples/rping.c
> +++ b/examples/rping.c
> @@ -800,10 +800,10 @@ static void *rping_persistent_server_thread(void *arg)
>
>        rping_test_server(cb);
>        rdma_disconnect(cb->child_cm_id);
> -       rping_free_buffers(cb);
> -       rping_free_qp(cb);
>        pthread_cancel(cb->cqthread);
>        pthread_join(cb->cqthread, NULL);
> +       rping_free_buffers(cb);
> +       rping_free_qp(cb);
>        rdma_destroy_id(cb->child_cm_id);
>        free_cb(cb);
>        return NULL;
> @@ -888,6 +888,8 @@ static int rping_run_server(struct rping_cb *cb)
>
>        rping_test_server(cb);
>        rdma_disconnect(cb->child_cm_id);
> +       pthread_cancel(cb->cqthread);
> +       pthread_join(cb->cqthread, NULL);
>        rdma_destroy_id(cb->child_cm_id);
>  err2:
>        rping_free_buffers(cb);
> @@ -1055,6 +1057,8 @@ static int rping_run_client(struct rping_cb *cb)
>
>        rping_test_client(cb);
>        rdma_disconnect(cb->cm_id);
> +       pthread_cancel(cb->cqthread);
> +       pthread_join(cb->cqthread, NULL);
>  err2:
>        rping_free_buffers(cb);
>  err1:

Hello Steve,

Are you aware that in general it is easy to trigger a deadlock or
other undesired behavior by invoking pthread_cancel() ? If a thread
e.g. gets canceled after having obtained a mutex lock and before that
mutex is unlocked, this will cause trouble for any other thread that
tries to grab the mutex.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ.
       [not found]     ` <AANLkTi=uAuXT1RMEQ+vsA4FkN2828mZ2q0KPKXxpb6H0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-10-20 20:12       ` Steve Wise
       [not found]         ` <4CBF4D30.3050500-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  2010-10-20 20:31       ` Jason Gunthorpe
  1 sibling, 1 reply; 15+ messages in thread
From: Steve Wise @ 2010-10-20 20:12 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 10/20/2010 03:05 PM, Bart Van Assche wrote:
> On Wed, Oct 20, 2010 at 9:28 PM, Steve Wise<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>  wrote:
>    
>> It is possible for the CQ event thread to poll the CQ after it has been
>> destroyed which can result in a seg fault on T3 interfaces.  This patch
>> cancels the thread and waits for it to exit before destroying the CQ.
>>
>> Signed-off-by: Steve Wise<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>> ---
>>
>>   examples/rping.c |    8 ++++++--
>>   1 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/examples/rping.c b/examples/rping.c
>> index 91952e7..e603d3b 100644
>> --- a/examples/rping.c
>> +++ b/examples/rping.c
>> @@ -800,10 +800,10 @@ static void *rping_persistent_server_thread(void *arg)
>>
>>         rping_test_server(cb);
>>         rdma_disconnect(cb->child_cm_id);
>> -       rping_free_buffers(cb);
>> -       rping_free_qp(cb);
>>         pthread_cancel(cb->cqthread);
>>         pthread_join(cb->cqthread, NULL);
>> +       rping_free_buffers(cb);
>> +       rping_free_qp(cb);
>>         rdma_destroy_id(cb->child_cm_id);
>>         free_cb(cb);
>>         return NULL;
>> @@ -888,6 +888,8 @@ static int rping_run_server(struct rping_cb *cb)
>>
>>         rping_test_server(cb);
>>         rdma_disconnect(cb->child_cm_id);
>> +       pthread_cancel(cb->cqthread);
>> +       pthread_join(cb->cqthread, NULL);
>>         rdma_destroy_id(cb->child_cm_id);
>>   err2:
>>         rping_free_buffers(cb);
>> @@ -1055,6 +1057,8 @@ static int rping_run_client(struct rping_cb *cb)
>>
>>         rping_test_client(cb);
>>         rdma_disconnect(cb->cm_id);
>> +       pthread_cancel(cb->cqthread);
>> +       pthread_join(cb->cqthread, NULL);
>>   err2:
>>         rping_free_buffers(cb);
>>   err1:
>>      
> Hello Steve,
>
> Are you aware that in general it is easy to trigger a deadlock or
> other undesired behavior by invoking pthread_cancel() ? If a thread
> e.g. gets canceled after having obtained a mutex lock and before that
> mutex is unlocked, this will cause trouble for any other thread that
> tries to grab the mutex.
>
>    

I was under the impression that the thread would only be canceled at 
precise cancellation points.  Like in system calls, or if the thread 
calls pthread_testcancel(), which is what rping does.   There are no 
mutexes held by the thread being canceled either.  I think its ok.  Do 
you see some other issue with the rping CQ event thread that would cause 
these problems?

Steve.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ.
       [not found]         ` <4CBF4D30.3050500-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-10-20 20:23           ` Bart Van Assche
       [not found]             ` <AANLkTimTpM_h7P8tacZic2aP7i=dBiAquDN5S_D881ae-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2010-10-20 20:23 UTC (permalink / raw)
  To: Steve Wise
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Oct 20, 2010 at 10:12 PM, Steve Wise
<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
> On 10/20/2010 03:05 PM, Bart Van Assche wrote:
>>
>> On Wed, Oct 20, 2010 at 9:28 PM, Steve Wise<swise@opengridcomputing.com>
>>  wrote:
>>
>>>
>>> It is possible for the CQ event thread to poll the CQ after it has been
>>> destroyed which can result in a seg fault on T3 interfaces.  This patch
>>> cancels the thread and waits for it to exit before destroying the CQ.
>>>
>>> Signed-off-by: Steve Wise<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>>> ---
>>>
>>>  examples/rping.c |    8 ++++++--
>>>  1 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/examples/rping.c b/examples/rping.c
>>> index 91952e7..e603d3b 100644
>>> --- a/examples/rping.c
>>> +++ b/examples/rping.c
>>> @@ -800,10 +800,10 @@ static void *rping_persistent_server_thread(void
>>> *arg)
>>>
>>>        rping_test_server(cb);
>>>        rdma_disconnect(cb->child_cm_id);
>>> -       rping_free_buffers(cb);
>>> -       rping_free_qp(cb);
>>>        pthread_cancel(cb->cqthread);
>>>        pthread_join(cb->cqthread, NULL);
>>> +       rping_free_buffers(cb);
>>> +       rping_free_qp(cb);
>>>        rdma_destroy_id(cb->child_cm_id);
>>>        free_cb(cb);
>>>        return NULL;
>>> @@ -888,6 +888,8 @@ static int rping_run_server(struct rping_cb *cb)
>>>
>>>        rping_test_server(cb);
>>>        rdma_disconnect(cb->child_cm_id);
>>> +       pthread_cancel(cb->cqthread);
>>> +       pthread_join(cb->cqthread, NULL);
>>>        rdma_destroy_id(cb->child_cm_id);
>>>  err2:
>>>        rping_free_buffers(cb);
>>> @@ -1055,6 +1057,8 @@ static int rping_run_client(struct rping_cb *cb)
>>>
>>>        rping_test_client(cb);
>>>        rdma_disconnect(cb->cm_id);
>>> +       pthread_cancel(cb->cqthread);
>>> +       pthread_join(cb->cqthread, NULL);
>>>  err2:
>>>        rping_free_buffers(cb);
>>>  err1:
>>>
>>
>> Hello Steve,
>>
>> Are you aware that in general it is easy to trigger a deadlock or
>> other undesired behavior by invoking pthread_cancel() ? If a thread
>> e.g. gets canceled after having obtained a mutex lock and before that
>> mutex is unlocked, this will cause trouble for any other thread that
>> tries to grab the mutex.
>
> I was under the impression that the thread would only be canceled at precise
> cancellation points.  Like in system calls, or if the thread calls
> pthread_testcancel(), which is what rping does.   There are no mutexes held
> by the thread being canceled either.  I think its ok.  Do you see some other
> issue with the rping CQ event thread that would cause these problems?

Please keep in mind that glibc uses locking internally for many file
I/O functions, e.g. printf() and fprintf().

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ.
       [not found]             ` <AANLkTimTpM_h7P8tacZic2aP7i=dBiAquDN5S_D881ae-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-10-20 20:26               ` Steve Wise
  2010-10-20 20:35               ` Jason Gunthorpe
  1 sibling, 0 replies; 15+ messages in thread
From: Steve Wise @ 2010-10-20 20:26 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 10/20/2010 03:23 PM, Bart Van Assche wrote:
> On Wed, Oct 20, 2010 at 10:12 PM, Steve Wise
> <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>  wrote:
>    
>> On 10/20/2010 03:05 PM, Bart Van Assche wrote:
>>      
>>> On Wed, Oct 20, 2010 at 9:28 PM, Steve Wise<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>>>   wrote:
>>>
>>>        
>>>> It is possible for the CQ event thread to poll the CQ after it has been
>>>> destroyed which can result in a seg fault on T3 interfaces.  This patch
>>>> cancels the thread and waits for it to exit before destroying the CQ.
>>>>
>>>> Signed-off-by: Steve Wise<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>>>> ---
>>>>
>>>>   examples/rping.c |    8 ++++++--
>>>>   1 files changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/examples/rping.c b/examples/rping.c
>>>> index 91952e7..e603d3b 100644
>>>> --- a/examples/rping.c
>>>> +++ b/examples/rping.c
>>>> @@ -800,10 +800,10 @@ static void *rping_persistent_server_thread(void
>>>> *arg)
>>>>
>>>>         rping_test_server(cb);
>>>>         rdma_disconnect(cb->child_cm_id);
>>>> -       rping_free_buffers(cb);
>>>> -       rping_free_qp(cb);
>>>>         pthread_cancel(cb->cqthread);
>>>>         pthread_join(cb->cqthread, NULL);
>>>> +       rping_free_buffers(cb);
>>>> +       rping_free_qp(cb);
>>>>         rdma_destroy_id(cb->child_cm_id);
>>>>         free_cb(cb);
>>>>         return NULL;
>>>> @@ -888,6 +888,8 @@ static int rping_run_server(struct rping_cb *cb)
>>>>
>>>>         rping_test_server(cb);
>>>>         rdma_disconnect(cb->child_cm_id);
>>>> +       pthread_cancel(cb->cqthread);
>>>> +       pthread_join(cb->cqthread, NULL);
>>>>         rdma_destroy_id(cb->child_cm_id);
>>>>   err2:
>>>>         rping_free_buffers(cb);
>>>> @@ -1055,6 +1057,8 @@ static int rping_run_client(struct rping_cb *cb)
>>>>
>>>>         rping_test_client(cb);
>>>>         rdma_disconnect(cb->cm_id);
>>>> +       pthread_cancel(cb->cqthread);
>>>> +       pthread_join(cb->cqthread, NULL);
>>>>   err2:
>>>>         rping_free_buffers(cb);
>>>>   err1:
>>>>
>>>>          
>>> Hello Steve,
>>>
>>> Are you aware that in general it is easy to trigger a deadlock or
>>> other undesired behavior by invoking pthread_cancel() ? If a thread
>>> e.g. gets canceled after having obtained a mutex lock and before that
>>> mutex is unlocked, this will cause trouble for any other thread that
>>> tries to grab the mutex.
>>>        
>> I was under the impression that the thread would only be canceled at precise
>> cancellation points.  Like in system calls, or if the thread calls
>> pthread_testcancel(), which is what rping does.   There are no mutexes held
>> by the thread being canceled either.  I think its ok.  Do you see some other
>> issue with the rping CQ event thread that would cause these problems?
>>      
> Please keep in mind that glibc uses locking internally for many file
> I/O functions, e.g. printf() and fprintf().
>
> Bart.
>    

Ok, so if pthread_cancel() is broken, then what should I be using?

Steve.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ.
       [not found]     ` <AANLkTi=uAuXT1RMEQ+vsA4FkN2828mZ2q0KPKXxpb6H0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2010-10-20 20:12       ` Steve Wise
@ 2010-10-20 20:31       ` Jason Gunthorpe
  1 sibling, 0 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2010-10-20 20:31 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Steve Wise, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Oct 20, 2010 at 10:05:36PM +0200, Bart Van Assche wrote:
> Are you aware that in general it is easy to trigger a deadlock or
> other undesired behavior by invoking pthread_cancel() ? If a thread
> e.g. gets canceled after having obtained a mutex lock and before that
> mutex is unlocked, this will cause trouble for any other thread that
> tries to grab the mutex.

Steve's specific case looks OK to me, at least considering how mlx4
and ibverbs operates, but I'm not sure that verbs and all of the
drivers have been carefully designed with cancellation safety in mind?

ie what happens to a ibv_poll_cq version that calls out to the kernel
like soft-iwarp? That will probably embed a cancelation point. Should
ibv_poll_cq be a cancellation point? etc..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ.
       [not found]             ` <AANLkTimTpM_h7P8tacZic2aP7i=dBiAquDN5S_D881ae-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2010-10-20 20:26               ` Steve Wise
@ 2010-10-20 20:35               ` Jason Gunthorpe
       [not found]                 ` <20101020203551.GO10362-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2010-10-20 20:35 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Steve Wise, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Oct 20, 2010 at 10:23:25PM +0200, Bart Van Assche wrote:

> Please keep in mind that glibc uses locking internally for many file
> I/O functions, e.g. printf() and fprintf().

POSIX strictly defines what functions are, aren't and could be
cancellation points, yes it is a surprising list. But, glibc won't
break internally if you cancel any of its function calls. It has
cleanup handlers and what not to protect itself.

rping doesn't hold locks, or allocate memory in the cq_thread, so it
seems OK to me. My main question would be what happens internally to
ibverbs, and does it call any possible cancellation point while holding
locks - is it missing cleanup handlers? Etc.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ.
       [not found]                 ` <20101020203551.GO10362-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-10-20 20:56                   ` Steve Wise
       [not found]                     ` <4CBF5786.2020203-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Wise @ 2010-10-20 20:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bart Van Assche, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 10/20/2010 03:35 PM, Jason Gunthorpe wrote:
> On Wed, Oct 20, 2010 at 10:23:25PM +0200, Bart Van Assche wrote:
>
>    
>> Please keep in mind that glibc uses locking internally for many file
>> I/O functions, e.g. printf() and fprintf().
>>      
> POSIX strictly defines what functions are, aren't and could be
> cancellation points, yes it is a surprising list. But, glibc won't
> break internally if you cancel any of its function calls. It has
> cleanup handlers and what not to protect itself.
>
> rping doesn't hold locks, or allocate memory in the cq_thread, so it
> seems OK to me. My main question would be what happens internally to
> ibverbs, and does it call any possible cancellation point while holding
> locks - is it missing cleanup handlers? Etc.
>
> Jason
>    

Hey Jason, do you have a pointer to the list of pthread cancellation 
points for Linux?

Steve.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ.
       [not found]                     ` <4CBF5786.2020203-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-10-20 21:05                       ` Robert D. Russell
  2010-10-20 21:16                       ` Jason Gunthorpe
  1 sibling, 0 replies; 15+ messages in thread
From: Robert D. Russell @ 2010-10-20 21:05 UTC (permalink / raw)
  To: Steve Wise
  Cc: Jason Gunthorpe, Bart Van Assche,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, 20 Oct 2010, Steve Wise wrote:

> On 10/20/2010 03:35 PM, Jason Gunthorpe wrote:
>> On Wed, Oct 20, 2010 at 10:23:25PM +0200, Bart Van Assche wrote:
>>
>> 
>>> Please keep in mind that glibc uses locking internally for many file
>>> I/O functions, e.g. printf() and fprintf().
>>> 
>> POSIX strictly defines what functions are, aren't and could be
>> cancellation points, yes it is a surprising list. But, glibc won't
>> break internally if you cancel any of its function calls. It has
>> cleanup handlers and what not to protect itself.
>> 
>> rping doesn't hold locks, or allocate memory in the cq_thread, so it
>> seems OK to me. My main question would be what happens internally to
>> ibverbs, and does it call any possible cancellation point while holding
>> locks - is it missing cleanup handlers? Etc.
>> 
>> Jason
>> 
>
> Hey Jason, do you have a pointer to the list of pthread cancellation points 
> for Linux?
>
> Steve.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Steve:

On a linux system, just try
man 7 pthreads
and scroll down to the cancellation points lists.

Bob

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 2/2] RPING: Remove printf for FLUSH completion.
       [not found]     ` <20101020192905.1431.40267.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>
@ 2010-10-20 21:12       ` Hefty, Sean
       [not found]         ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25B801FB15-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Hefty, Sean @ 2010-10-20 21:12 UTC (permalink / raw)
  To: Steve Wise; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

I applied this, but am waiting on the other patch until the discussion finishes up.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ.
       [not found]                     ` <4CBF5786.2020203-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  2010-10-20 21:05                       ` Robert D. Russell
@ 2010-10-20 21:16                       ` Jason Gunthorpe
       [not found]                         ` <20101020211617.GP10362-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2010-10-20 21:16 UTC (permalink / raw)
  To: Steve Wise
  Cc: Bart Van Assche, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Oct 20, 2010 at 03:56:38PM -0500, Steve Wise wrote:

> Hey Jason, do you have a pointer to the list of pthread cancellation  
> points for Linux?

Sure do

http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html

POSIX:2008 is a more authoritative source, the above is cribbed from
there. You can see the list of cancelation points is pretty wild :(

IMHO, it would be a useful job for someone to sort this out for
verbs and the drivers - document what ibv interfaces are not
cancelation points and make all the ibv apis cancelation-safe.
I bet there isn't much coding, just lots of auditing. Just now I
looked through all the functions cq_thread called and decided
everything but ibv_poll_cq was certainly not a cancelation point for
mlx4/mtcha, while ibv_poll_cq was too complex to tell right away.

In glibc if a function is not cancelable it will be marked with __THROW.
I wonder if gcc (or sparse?) could check that functions
marked __THROW don't call other functions not marked __THROW ?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ.
       [not found]                         ` <20101020211617.GP10362-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-10-21 10:16                           ` Bart Van Assche
       [not found]                             ` <AANLkTi=eJdjgQ4MKPxddUA2eBOswf-5fo5wm=JzX=Hib-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2010-10-21 10:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Steve Wise, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Oct 20, 2010 at 11:16 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Wed, Oct 20, 2010 at 03:56:38PM -0500, Steve Wise wrote:
>
>> Hey Jason, do you have a pointer to the list of pthread cancellation
>> points for Linux?
>
> Sure do
>
> http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html
>
> POSIX:2008 is a more authoritative source, the above is cribbed from
> there.
> [ ... ]

The IEEE Std 1003.1, 2004 Edition list of cancellation points is available here:
http://www.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_09.html

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ.
       [not found]                             ` <AANLkTi=eJdjgQ4MKPxddUA2eBOswf-5fo5wm=JzX=Hib-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-10-21 10:40                               ` Bart Van Assche
  0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2010-10-21 10:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Steve Wise, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Oct 21, 2010 at 12:16 PM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
>
> On Wed, Oct 20, 2010 at 11:16 PM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> > On Wed, Oct 20, 2010 at 03:56:38PM -0500, Steve Wise wrote:
> >
> >> Hey Jason, do you have a pointer to the list of pthread cancellation
> >> points for Linux?
> >
> > Sure do
> >
> > http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html
> >
> > POSIX:2008 is a more authoritative source, the above is cribbed from
> > there.
> > [ ... ]
>
> The IEEE Std 1003.1, 2004 Edition list of cancellation points is available here:
> http://www.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_09.html

Just found out that the POSIX:2008 list is also available online:
http://www.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] RPING: Remove printf for FLUSH completion.
       [not found]         ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25B801FB15-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2010-10-21 12:30           ` Steve Wise
  0 siblings, 0 replies; 15+ messages in thread
From: Steve Wise @ 2010-10-21 12:30 UTC (permalink / raw)
  To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 10/20/2010 4:12 PM, Hefty, Sean wrote:
> I applied this, but am waiting on the other patch until the discussion finishes up.

Since pthread_cancel() seems to be iffy for user verbs, maybe I'll try 
another approach:  Since the rping main thread issues an 
rdma_disconnect(), we can assume that the CQ thread will definitely get 
an event and then poll the FLUSHED recv wr and pthread_exit().  So the 
main thread only needs to do a pthread_join() after issuing the 
disconnect.  I'll try this approach on iWARP and IB transports to make 
sure my assumptions are true...

Steve.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-10-21 12:30 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-20 19:28 [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ Steve Wise
     [not found] ` <20101020192859.1431.68877.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>
2010-10-20 19:29   ` [PATCH 2/2] RPING: Remove printf for FLUSH completion Steve Wise
     [not found]     ` <20101020192905.1431.40267.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>
2010-10-20 21:12       ` Hefty, Sean
     [not found]         ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25B801FB15-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2010-10-21 12:30           ` Steve Wise
2010-10-20 20:05   ` [PATCH 1/2] RPING: Make sure CQ event thread exits before destroying the CQ Bart Van Assche
     [not found]     ` <AANLkTi=uAuXT1RMEQ+vsA4FkN2828mZ2q0KPKXxpb6H0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-10-20 20:12       ` Steve Wise
     [not found]         ` <4CBF4D30.3050500-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-10-20 20:23           ` Bart Van Assche
     [not found]             ` <AANLkTimTpM_h7P8tacZic2aP7i=dBiAquDN5S_D881ae-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-10-20 20:26               ` Steve Wise
2010-10-20 20:35               ` Jason Gunthorpe
     [not found]                 ` <20101020203551.GO10362-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-10-20 20:56                   ` Steve Wise
     [not found]                     ` <4CBF5786.2020203-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-10-20 21:05                       ` Robert D. Russell
2010-10-20 21:16                       ` Jason Gunthorpe
     [not found]                         ` <20101020211617.GP10362-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-10-21 10:16                           ` Bart Van Assche
     [not found]                             ` <AANLkTi=eJdjgQ4MKPxddUA2eBOswf-5fo5wm=JzX=Hib-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-10-21 10:40                               ` Bart Van Assche
2010-10-20 20:31       ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).