public inbox for linux-rdma@vger.kernel.org
* Bad behavior by rdma-core ?
@ 2021-10-14 14:57 Bob Pearson
  2021-10-14 16:14 ` Bob Pearson
  0 siblings, 1 reply; 5+ messages in thread
From: Bob Pearson @ 2021-10-14 14:57 UTC
  To: Leon Romanovsky, Jason Gunthorpe, Zhu Yanjun,
	linux-rdma@vger.kernel.org

I have been chasing a bug in the rxe driver seen in the Python tests (test_cq_events_ud).
The following occurs:

	The first time I execute this test it creates two AHs which are allocated by
	rdma-core and passed to rxe_create_ah. The test attempts to destroy them
	(i.e. rxe_destroy_ah is called in the provider driver) but rdma-core does not
	destroy them (i.e. rxe_destroy_ah is not called in the kernel).

	The rxe driver saves the AV state and some metadata for these AHs and keeps it
	since it thinks they are still active.

	The second or third time I execute this test two new AHs are created by
	rxe_create_ah but the memory passed in from rdma-core is the same as the first
	test. I.e. it has recycled them but they are still active in the driver so
	the result is chaos.

Somehow rdma-core thinks it has destroyed the AHs but it does not call down to the
driver. This only occurs for AHs AFAIK.
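
The symptom described above can be reproduced in miniature. The following is only a toy sketch, not rxe or rdma-core code: every name in it (toy_ah, slot, toy_create_ah, toy_destroy_ah, toy_run_twice) is hypothetical, and it assumes the driver keeps an "active" marker inside memory that the allocator later hands out again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-AH state a driver keeps in the object. */
struct toy_ah {
	bool active;	/* set by create, cleared only by the destroy callback */
	int av;		/* stand-in for the saved address vector */
};

/* Hypothetical one-slot allocator: it always hands back the same memory. */
static struct toy_ah slot;

/* Driver create callback: refuses memory it still believes is in use. */
static int toy_create_ah(struct toy_ah *ah, int av)
{
	if (ah->active)
		return -1;	/* stale state: an earlier destroy never arrived */
	ah->active = true;
	ah->av = av;
	return 0;
}

/* Driver destroy callback: the only place the state is released. */
static void toy_destroy_ah(struct toy_ah *ah)
{
	ah->active = false;
}

/*
 * Create an AH, release it, then create again in the recycled slot.
 * When call_driver_destroy is false the object is dropped without
 * telling the driver, which is the reported symptom.
 * Returns the result of the second create.
 */
static int toy_run_twice(bool call_driver_destroy)
{
	slot = (struct toy_ah){0};
	if (toy_create_ah(&slot, 1))
		return -2;
	if (call_driver_destroy)
		toy_destroy_ah(&slot);
	return toy_create_ah(&slot, 2);
}
```

Calling toy_run_twice(true) lets the second create succeed, while toy_run_twice(false) makes it fail on the stale marker. In the real driver nothing rejects the reused memory, so the outcome is corruption rather than a clean error.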

Bob 

* Re: Bad behavior by rdma-core ?
  2021-10-14 14:57 Bad behavior by rdma-core ? Bob Pearson
@ 2021-10-14 16:14 ` Bob Pearson
  2021-10-14 16:43   ` Bob Pearson
  2021-10-14 18:32   ` Jason Gunthorpe
  0 siblings, 2 replies; 5+ messages in thread
From: Bob Pearson @ 2021-10-14 16:14 UTC
  To: Leon Romanovsky, Jason Gunthorpe, Zhu Yanjun,
	linux-rdma@vger.kernel.org

On 10/14/21 9:57 AM, Bob Pearson wrote:
> I have been chasing a bug in the rxe driver seen in the python tests (test_cq_events_ud).
> The following occurs
> 
> 	The first time I execute this test it creates two AHs which are allocated by
> 	rdma-core and passed to rxe_create_ah. The test attempts to destroy them
> 	(i.e. rxe_destroy_ah is called in the provider driver) but rdma-core does not
> 	destroy them (i.e. rxe_destroy_ah is not called in the kernel).
> 
> 	The rxe driver saves the AV state and some metadata for these AHs and keeps it
> 	since it thinks they are still active.
> 
> 	The second or third time I execute this test two new AHs are created by
> 	rxe_create_ah but the memory passed in from rdma-core is the same as the first
> 	test. I.e. it has recycled them but they are still active in the driver so
> 	the result is chaos.
> 
> Somehow rdma-core thinks it has destroyed the AHs but it does not call down to the
> driver. This only occurs for AHs AFAIK.
> 
> Bob 
> 

The cause seems simple enough.

In uverbs_cmd.c ib_uverbs_create_ah() calls rdma_create_user_ah() which
eventually calls device->ops.create_user_ah() or device->ops.create_ah().

But ib_uverbs_destroy_ah() does *not* call rdma_uverbs_destroy_ah(); it just
deletes the object.

* Re: Bad behavior by rdma-core ?
  2021-10-14 16:14 ` Bob Pearson
@ 2021-10-14 16:43   ` Bob Pearson
  2021-10-14 18:32   ` Jason Gunthorpe
  1 sibling, 0 replies; 5+ messages in thread
From: Bob Pearson @ 2021-10-14 16:43 UTC
  To: Leon Romanovsky, Jason Gunthorpe, Zhu Yanjun,
	linux-rdma@vger.kernel.org

On 10/14/21 11:14 AM, Bob Pearson wrote:
> The cause seems simple enough.
> 
> In uverbs_cmd.c ib_uverbs_create_ah() calls rdma_create_user_ah() which
> eventually calls device->ops.create_user_ah() or device->ops.create_ah().
> 
> But ib_uverbs_destroy_ah does *not* call rdma_uverbs_destroy_ah() it just
> deletes the object.

Correction: the function meant above is rdma_destroy_ah_user(), not
rdma_uverbs_destroy_ah().

* Re: Bad behavior by rdma-core ?
  2021-10-14 16:14 ` Bob Pearson
  2021-10-14 16:43   ` Bob Pearson
@ 2021-10-14 18:32   ` Jason Gunthorpe
  2021-10-14 20:08     ` Bob Pearson
  1 sibling, 1 reply; 5+ messages in thread
From: Jason Gunthorpe @ 2021-10-14 18:32 UTC
  To: Bob Pearson; +Cc: Leon Romanovsky, Zhu Yanjun, linux-rdma@vger.kernel.org

On Thu, Oct 14, 2021 at 11:14:57AM -0500, Bob Pearson wrote:

> But ib_uverbs_destroy_ah does *not* call rdma_uverbs_destroy_ah() it just
> deletes the object.

ib_uverbs_destroy_ah
 uobj_perform_destroy
  __uobj_perform_destroy
   __uobj_get_destroy
    uobj_destroy
     uverbs_destroy_uobject:

	} else if (uobj->object) {
		ret = uobj->uapi_object->type_class->destroy_hw(uobj, reason,
								attrs);

Which calls 

destroy_hw_idr_uobject
  	int ret = idr_type->destroy_object(uobj, why, attrs);

Which links to this:

DECLARE_UVERBS_NAMED_OBJECT(UVERBS_OBJECT_AH,
			    UVERBS_TYPE_ALLOC_IDR(uverbs_free_ah),
			    &UVERBS_METHOD(UVERBS_METHOD_AH_DESTROY));

And thus calls

static int uverbs_free_ah(struct ib_uobject *uobject,
			  enum rdma_remove_reason why,
			  struct uverbs_attr_bundle *attrs)
{
	return rdma_destroy_ah_user((struct ib_ah *)uobject->object,
				    RDMA_DESTROY_AH_SLEEPABLE,
				    &attrs->driver_udata);
}

So, look along that path and find out where it goes wrong?
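
The shape of that path, two levels of function-pointer dispatch guarded by a check on uobj->object, can be sketched in standalone C. This is a simplified illustration and not the kernel code: all toy_* names are hypothetical, the reason/attrs arguments are dropped, and a counter stands in for the call that would reach the driver.

```c
#include <stddef.h>

struct toy_uobject;

/* Per-object-type hook, playing the role of destroy_object in the trace. */
struct toy_obj_type {
	int (*destroy_object)(struct toy_uobject *uobj);
};

/* Class-level hook, playing the role of type_class->destroy_hw. */
struct toy_type_class {
	int (*destroy_hw)(struct toy_uobject *uobj);
};

struct toy_uobject {
	const struct toy_type_class *type_class;
	const struct toy_obj_type *type;
	void *object;	/* NULL once the underlying object is gone */
};

/* Counts how often the "driver" destroy would have been reached. */
static int toy_driver_destroy_calls;

/* Stand-in for the AH-specific free routine (the role of uverbs_free_ah). */
static int toy_free_ah(struct toy_uobject *uobj)
{
	(void)uobj;
	toy_driver_destroy_calls++;
	return 0;
}

/* Stand-in for destroy_hw_idr_uobject: forwards to the per-type hook. */
static int toy_destroy_hw_idr(struct toy_uobject *uobj)
{
	return uobj->type->destroy_object(uobj);
}

static const struct toy_obj_type toy_ah_type = {
	.destroy_object = toy_free_ah,
};

static const struct toy_type_class toy_idr_class = {
	.destroy_hw = toy_destroy_hw_idr,
};

/* Generic destroy step: only dispatches while an object is attached. */
static int toy_destroy_uobject(struct toy_uobject *uobj)
{
	int ret = 0;

	if (uobj->object) {
		ret = uobj->type_class->destroy_hw(uobj);
		if (ret)
			return ret;
		uobj->object = NULL;
	}
	return ret;
}

/* Destroy the same uobject twice; returns how often the driver was reached. */
static int toy_demo(void)
{
	int dummy = 0;
	struct toy_uobject u = {
		.type_class = &toy_idr_class,
		.type = &toy_ah_type,
		.object = &dummy,
	};

	toy_driver_destroy_calls = 0;
	toy_destroy_uobject(&u);
	toy_destroy_uobject(&u);	/* object already cleared: no dispatch */
	return toy_driver_destroy_calls;
}
```

In this sketch the driver-level routine is reached exactly once, and only while the object pointer is set. When debugging the real path, each of these hops (the object check, the class hook, the per-type hook) is a place to confirm that the call is actually forwarded.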

Jason

* Re: Bad behavior by rdma-core ?
  2021-10-14 18:32   ` Jason Gunthorpe
@ 2021-10-14 20:08     ` Bob Pearson
  0 siblings, 0 replies; 5+ messages in thread
From: Bob Pearson @ 2021-10-14 20:08 UTC
  To: Jason Gunthorpe; +Cc: Leon Romanovsky, Zhu Yanjun, linux-rdma@vger.kernel.org

On 10/14/21 1:32 PM, Jason Gunthorpe wrote:
> On Thu, Oct 14, 2021 at 11:14:57AM -0500, Bob Pearson wrote:
> 
>> But ib_uverbs_destroy_ah does *not* call rdma_uverbs_destroy_ah() it just
>> deletes the object.
> 
> ib_uverbs_destroy_ah
>  uobj_perform_destroy
>   __uobj_perform_destroy
>    __uobj_get_destroy
>     uobj_destroy
>      uverbs_destroy_uobject:
> 
>     	} else if (uobj->object) {
> 		ret = uobj->uapi_object->type_class->destroy_hw(uobj, reason,
> 								attrs);
> 
> Which calls 
> 
> destroy_hw_idr_uobject
>   	int ret = idr_type->destroy_object(uobj, why, attrs);
> 
> Which links to this:
> 
> DECLARE_UVERBS_NAMED_OBJECT(UVERBS_OBJECT_AH,
> 			    UVERBS_TYPE_ALLOC_IDR(uverbs_free_ah),
> 			    &UVERBS_METHOD(UVERBS_METHOD_AH_DESTROY));
> 
> And thus calls
> 
> static int uverbs_free_ah(struct ib_uobject *uobject,
> 			  enum rdma_remove_reason why,
> 			  struct uverbs_attr_bundle *attrs)
> {
> 	return rdma_destroy_ah_user((struct ib_ah *)uobject->object,
> 				    RDMA_DESTROY_AH_SLEEPABLE,
> 				    &attrs->driver_udata);
> }
> 
> So, look along that path and find out where it goes wrong?
> 
> Jason
> 
Thanks

I had more or less figured that out. I looked at other objects and saw a similar pattern.
I think I've traced the problem back to myself.

Bob
