All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/ttm: update bulk move object of ghost BO
@ 2022-09-01  9:29 ZhenGuo Yin
  2022-09-01  9:55 ` JingWen Chen
  2022-09-01 11:11 ` Christian König
  0 siblings, 2 replies; 6+ messages in thread
From: ZhenGuo Yin @ 2022-09-01  9:29 UTC (permalink / raw)
  To: amd-gfx; +Cc: ZhenGuo Yin, jingwen.chen2, christian.koenig, dri-devel

[Why]
Ghost BO is released with non-empty bulk move object. There is a
warning trace:
WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
Call Trace:
  amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
  amdttm_bo_put+0x28/0x30 [amdttm]
  amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
  amdgpu_bo_move+0x1a8/0x770 [amdgpu]
  ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
  amdttm_bo_validate+0xbf/0x100 [amdttm]

[How]
The resource of ghost BO should be moved to LRU directly, instead of
using bulk move. The bulk move object of ghost BO should set to NULL
before function ttm_bo_move_to_lru_tail_unlocked.

Fixed:·5b951e487fd6bf5f·("drm/ttm:·fix·bulk·move·handling·v2")
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 1cbfb00c1d65..a90bbbd91910 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -238,6 +238,7 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 
 	if (fbo->base.resource) {
 		ttm_resource_set_bo(fbo->base.resource, &fbo->base);
+		ttm_bo_set_bulk_move(&fbo->base, NULL);
 		bo->resource = NULL;
 	}
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/ttm: update bulk move object of ghost BO
  2022-09-01  9:29 [PATCH] drm/ttm: update bulk move object of ghost BO ZhenGuo Yin
@ 2022-09-01  9:55 ` JingWen Chen
  2022-09-01 11:11 ` Christian König
  1 sibling, 0 replies; 6+ messages in thread
From: JingWen Chen @ 2022-09-01  9:55 UTC (permalink / raw)
  To: ZhenGuo Yin, amd-gfx; +Cc: jingwen.chen2, christian.koenig, dri-devel

Acked-by: Jingwen Chen <Jingwen.Chen2@amd.com>

still need confirmation from Christian

On 9/1/22 5:29 PM, ZhenGuo Yin wrote:
> [Why]
> Ghost BO is released with non-empty bulk move object. There is a
> warning trace:
> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
> Call Trace:
>   amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
>   amdttm_bo_put+0x28/0x30 [amdttm]
>   amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
>   amdgpu_bo_move+0x1a8/0x770 [amdgpu]
>   ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
>   amdttm_bo_validate+0xbf/0x100 [amdttm]
>
> [How]
> The resource of ghost BO should be moved to LRU directly, instead of
> using bulk move. The bulk move object of ghost BO should set to NULL
> before function ttm_bo_move_to_lru_tail_unlocked.
>
> Fixed:·5b951e487fd6bf5f·("drm/ttm:·fix·bulk·move·handling·v2")
> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
> ---
>  drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index 1cbfb00c1d65..a90bbbd91910 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -238,6 +238,7 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
>  
>  	if (fbo->base.resource) {
>  		ttm_resource_set_bo(fbo->base.resource, &fbo->base);
> +		ttm_bo_set_bulk_move(&fbo->base, NULL);
>  		bo->resource = NULL;
>  	}
>  

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/ttm: update bulk move object of ghost BO
  2022-09-01  9:29 [PATCH] drm/ttm: update bulk move object of ghost BO ZhenGuo Yin
  2022-09-01  9:55 ` JingWen Chen
@ 2022-09-01 11:11 ` Christian König
  2022-09-01 11:13   ` Christian König
  1 sibling, 1 reply; 6+ messages in thread
From: Christian König @ 2022-09-01 11:11 UTC (permalink / raw)
  To: ZhenGuo Yin, amd-gfx; +Cc: jingwen.chen2, christian.koenig, dri-devel

Am 01.09.22 um 11:29 schrieb ZhenGuo Yin:
> [Why]
> Ghost BO is released with non-empty bulk move object. There is a
> warning trace:
> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
> Call Trace:
>    amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
>    amdttm_bo_put+0x28/0x30 [amdttm]
>    amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
>    amdgpu_bo_move+0x1a8/0x770 [amdgpu]
>    ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
>    amdttm_bo_validate+0xbf/0x100 [amdttm]
>
> [How]
> The resource of ghost BO should be moved to LRU directly, instead of
> using bulk move. The bulk move object of ghost BO should set to NULL
> before function ttm_bo_move_to_lru_tail_unlocked.
>
> Fixed:·5b951e487fd6bf5f·("drm/ttm:·fix·bulk·move·handling·v2")
> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>

Good catch, but the fix is not 100% correct. Please rather just NULL the 
member while initializing the BO structure.

E.g. something like this:

  ....
  fbo->base.pin_count = 0;
+fbo->base.bulk_move= NULL;
  if (bo->type != ttm_bo_type_sg)
  ....

Thanks,
Christian.

> ---
>   drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index 1cbfb00c1d65..a90bbbd91910 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -238,6 +238,7 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
>   
>   	if (fbo->base.resource) {
>   		ttm_resource_set_bo(fbo->base.resource, &fbo->base);
> +		ttm_bo_set_bulk_move(&fbo->base, NULL);
>   		bo->resource = NULL;
>   	}
>   


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/ttm: update bulk move object of ghost BO
  2022-09-01 11:11 ` Christian König
@ 2022-09-01 11:13   ` Christian König
  2022-09-05  7:59     ` Yin, ZhenGuo (Chris)
  0 siblings, 1 reply; 6+ messages in thread
From: Christian König @ 2022-09-01 11:13 UTC (permalink / raw)
  To: ZhenGuo Yin, amd-gfx; +Cc: jingwen.chen2, christian.koenig, dri-devel

Am 01.09.22 um 13:11 schrieb Christian König:
> Am 01.09.22 um 11:29 schrieb ZhenGuo Yin:
>> [Why]
>> Ghost BO is released with non-empty bulk move object. There is a
>> warning trace:
>> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 
>> ttm_bo_release+0x2e1/0x2f0 [amdttm]
>> Call Trace:
>>    amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
>>    amdttm_bo_put+0x28/0x30 [amdttm]
>>    amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
>>    amdgpu_bo_move+0x1a8/0x770 [amdgpu]
>>    ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
>>    amdttm_bo_validate+0xbf/0x100 [amdttm]
>>
>> [How]
>> The resource of ghost BO should be moved to LRU directly, instead of
>> using bulk move. The bulk move object of ghost BO should set to NULL
>> before function ttm_bo_move_to_lru_tail_unlocked.
>>
>> Fixed:·5b951e487fd6bf5f·("drm/ttm:·fix·bulk·move·handling·v2")
>> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
>
> Good catch, but the fix is not 100% correct. Please rather just NULL 
> the member while initializing the BO structure.
>
> E.g. something like this:
>
>  ....
>  fbo->base.pin_count = 0;
> +fbo->base.bulk_move= NULL;
>  if (bo->type != ttm_bo_type_sg)
>  ....

On the other hand thinking about it that won't work either.

You need to set bulk_move to NULL manually in an else clauses or 
something like this.

Regards,
Christian.

>
> Thanks,
> Christian.
>
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
>> b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> index 1cbfb00c1d65..a90bbbd91910 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> @@ -238,6 +238,7 @@ static int ttm_buffer_object_transfer(struct 
>> ttm_buffer_object *bo,
>>         if (fbo->base.resource) {
>>           ttm_resource_set_bo(fbo->base.resource, &fbo->base);
>> +        ttm_bo_set_bulk_move(&fbo->base, NULL);
>>           bo->resource = NULL;
>>       }
>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/ttm: update bulk move object of ghost BO
  2022-09-01 11:13   ` Christian König
@ 2022-09-05  7:59     ` Yin, ZhenGuo (Chris)
  2022-09-05 11:05       ` Christian König
  0 siblings, 1 reply; 6+ messages in thread
From: Yin, ZhenGuo (Chris) @ 2022-09-05  7:59 UTC (permalink / raw)
  To: Christian König, ZhenGuo Yin, amd-gfx
  Cc: jingwen.chen2, christian.koenig, dri-devel

Inside the function ttm_bo_set_bulk_move, it calls 
ttm_resource_del_bulk_move to remove the old resource from the bulk_move 
list.

If we set the bulk_move to NULL manually as suggested, the old resource 
attached in the ghost BO seems won't be removed from the bulk_move.

On 9/1/2022 7:13 PM, Christian König wrote:
> Am 01.09.22 um 13:11 schrieb Christian König:
>> Am 01.09.22 um 11:29 schrieb ZhenGuo Yin:
>>> [Why]
>>> Ghost BO is released with non-empty bulk move object. There is a
>>> warning trace:
>>> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 
>>> ttm_bo_release+0x2e1/0x2f0 [amdttm]
>>> Call Trace:
>>>    amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
>>>    amdttm_bo_put+0x28/0x30 [amdttm]
>>>    amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
>>>    amdgpu_bo_move+0x1a8/0x770 [amdgpu]
>>>    ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
>>>    amdttm_bo_validate+0xbf/0x100 [amdttm]
>>>
>>> [How]
>>> The resource of ghost BO should be moved to LRU directly, instead of
>>> using bulk move. The bulk move object of ghost BO should set to NULL
>>> before function ttm_bo_move_to_lru_tail_unlocked.
>>>
>>> Fixed:·5b951e487fd6bf5f·("drm/ttm:·fix·bulk·move·handling·v2")
>>> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
>>
>> Good catch, but the fix is not 100% correct. Please rather just NULL 
>> the member while initializing the BO structure.
>>
>> E.g. something like this:
>>
>>  ....
>>  fbo->base.pin_count = 0;
>> +fbo->base.bulk_move= NULL;
>>  if (bo->type != ttm_bo_type_sg)
>>  ....
>
> On the other hand thinking about it that won't work either.
>
> You need to set bulk_move to NULL manually in an else clauses or 
> something like this.
>
> Regards,
> Christian.
>
>>
>> Thanks,
>> Christian.
>>
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
>>> b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> index 1cbfb00c1d65..a90bbbd91910 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> @@ -238,6 +238,7 @@ static int ttm_buffer_object_transfer(struct 
>>> ttm_buffer_object *bo,
>>>         if (fbo->base.resource) {
>>>           ttm_resource_set_bo(fbo->base.resource, &fbo->base);
>>> +        ttm_bo_set_bulk_move(&fbo->base, NULL);
>>>           bo->resource = NULL;
>>>       }
>>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/ttm: update bulk move object of ghost BO
  2022-09-05  7:59     ` Yin, ZhenGuo (Chris)
@ 2022-09-05 11:05       ` Christian König
  0 siblings, 0 replies; 6+ messages in thread
From: Christian König @ 2022-09-05 11:05 UTC (permalink / raw)
  To: Yin, ZhenGuo (Chris), ZhenGuo Yin, amd-gfx
  Cc: jingwen.chen2, christian.koenig, dri-devel

Yeah, I realized that as well after sending the first mail.

The problem is that we keep the bulk move around when there currently 
isn't any resource associated with the template.

So the correct code should look something like this:

if (fbo->base.resource) {
     ttm_resource_set_bo(fbo->base.resource, &fbo->base);
     bo->resource = NULL;
     ttm_bo_set_bulk_move(&fbo->base, NULL);
} else {
     fbo->bulk_move = NULL;
}

Regards,
Christian.

Am 05.09.22 um 09:59 schrieb Yin, ZhenGuo (Chris):
> Inside the function ttm_bo_set_bulk_move, it calls 
> ttm_resource_del_bulk_move to remove the old resource from the 
> bulk_move list.
>
> If we set the bulk_move to NULL manually as suggested, the old 
> resource attached in the ghost BO seems won't be removed from the 
> bulk_move.
>
> On 9/1/2022 7:13 PM, Christian König wrote:
>> Am 01.09.22 um 13:11 schrieb Christian König:
>>> Am 01.09.22 um 11:29 schrieb ZhenGuo Yin:
>>>> [Why]
>>>> Ghost BO is released with non-empty bulk move object. There is a
>>>> warning trace:
>>>> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 
>>>> ttm_bo_release+0x2e1/0x2f0 [amdttm]
>>>> Call Trace:
>>>>    amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
>>>>    amdttm_bo_put+0x28/0x30 [amdttm]
>>>>    amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
>>>>    amdgpu_bo_move+0x1a8/0x770 [amdgpu]
>>>>    ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
>>>>    amdttm_bo_validate+0xbf/0x100 [amdttm]
>>>>
>>>> [How]
>>>> The resource of ghost BO should be moved to LRU directly, instead of
>>>> using bulk move. The bulk move object of ghost BO should set to NULL
>>>> before function ttm_bo_move_to_lru_tail_unlocked.
>>>>
>>>> Fixed:·5b951e487fd6bf5f·("drm/ttm:·fix·bulk·move·handling·v2")
>>>> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
>>>
>>> Good catch, but the fix is not 100% correct. Please rather just NULL 
>>> the member while initializing the BO structure.
>>>
>>> E.g. something like this:
>>>
>>>  ....
>>>  fbo->base.pin_count = 0;
>>> +fbo->base.bulk_move= NULL;
>>>  if (bo->type != ttm_bo_type_sg)
>>>  ....
>>
>> On the other hand thinking about it that won't work either.
>>
>> You need to set bulk_move to NULL manually in an else clauses or 
>> something like this.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Thanks,
>>> Christian.
>>>
>>>> ---
>>>>   drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
>>>>   1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
>>>> b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>> index 1cbfb00c1d65..a90bbbd91910 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>> @@ -238,6 +238,7 @@ static int ttm_buffer_object_transfer(struct 
>>>> ttm_buffer_object *bo,
>>>>         if (fbo->base.resource) {
>>>>           ttm_resource_set_bo(fbo->base.resource, &fbo->base);
>>>> +        ttm_bo_set_bulk_move(&fbo->base, NULL);
>>>>           bo->resource = NULL;
>>>>       }
>>>
>>


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-09-05 11:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-09-01  9:29 [PATCH] drm/ttm: update bulk move object of ghost BO ZhenGuo Yin
2022-09-01  9:55 ` JingWen Chen
2022-09-01 11:11 ` Christian König
2022-09-01 11:13   ` Christian König
2022-09-05  7:59     ` Yin, ZhenGuo (Chris)
2022-09-05 11:05       ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.