* [PATCH] drm/ttm: update bulk move object of ghost BO
@ 2022-09-01 9:29 ZhenGuo Yin
2022-09-01 9:55 ` JingWen Chen
2022-09-01 11:11 ` Christian König
0 siblings, 2 replies; 6+ messages in thread
From: ZhenGuo Yin @ 2022-09-01 9:29 UTC (permalink / raw)
To: amd-gfx; +Cc: ZhenGuo Yin, jingwen.chen2, christian.koenig, dri-devel
[Why]
Ghost BO is released with non-empty bulk move object. There is a
warning trace:
WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
Call Trace:
amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
amdttm_bo_put+0x28/0x30 [amdttm]
amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
amdgpu_bo_move+0x1a8/0x770 [amdgpu]
ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
amdttm_bo_validate+0xbf/0x100 [amdttm]
[How]
The resource of ghost BO should be moved to LRU directly, instead of
using bulk move. The bulk move object of ghost BO should set to NULL
before function ttm_bo_move_to_lru_tail_unlocked.
Fixed:·5b951e487fd6bf5f·("drm/ttm:·fix·bulk·move·handling·v2")
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
---
drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 1cbfb00c1d65..a90bbbd91910 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -238,6 +238,7 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
if (fbo->base.resource) {
ttm_resource_set_bo(fbo->base.resource, &fbo->base);
+ ttm_bo_set_bulk_move(&fbo->base, NULL);
bo->resource = NULL;
}
--
2.35.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH] drm/ttm: update bulk move object of ghost BO
2022-09-01 9:29 [PATCH] drm/ttm: update bulk move object of ghost BO ZhenGuo Yin
@ 2022-09-01 9:55 ` JingWen Chen
2022-09-01 11:11 ` Christian König
1 sibling, 0 replies; 6+ messages in thread
From: JingWen Chen @ 2022-09-01 9:55 UTC (permalink / raw)
To: ZhenGuo Yin, amd-gfx; +Cc: jingwen.chen2, christian.koenig, dri-devel
Acked-by: Jingwen Chen <Jingwen.Chen2@amd.com>
still need confirmation from Christian
On 9/1/22 5:29 PM, ZhenGuo Yin wrote:
> [Why]
> Ghost BO is released with non-empty bulk move object. There is a
> warning trace:
> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
> Call Trace:
> amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
> amdttm_bo_put+0x28/0x30 [amdttm]
> amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
> amdgpu_bo_move+0x1a8/0x770 [amdgpu]
> ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
> amdttm_bo_validate+0xbf/0x100 [amdttm]
>
> [How]
> The resource of ghost BO should be moved to LRU directly, instead of
> using bulk move. The bulk move object of ghost BO should set to NULL
> before function ttm_bo_move_to_lru_tail_unlocked.
>
> Fixed:·5b951e487fd6bf5f·("drm/ttm:·fix·bulk·move·handling·v2")
> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
> ---
> drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index 1cbfb00c1d65..a90bbbd91910 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -238,6 +238,7 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
>
> if (fbo->base.resource) {
> ttm_resource_set_bo(fbo->base.resource, &fbo->base);
> + ttm_bo_set_bulk_move(&fbo->base, NULL);
> bo->resource = NULL;
> }
>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] drm/ttm: update bulk move object of ghost BO
2022-09-01 9:29 [PATCH] drm/ttm: update bulk move object of ghost BO ZhenGuo Yin
2022-09-01 9:55 ` JingWen Chen
@ 2022-09-01 11:11 ` Christian König
2022-09-01 11:13 ` Christian König
1 sibling, 1 reply; 6+ messages in thread
From: Christian König @ 2022-09-01 11:11 UTC (permalink / raw)
To: ZhenGuo Yin, amd-gfx; +Cc: jingwen.chen2, christian.koenig, dri-devel
Am 01.09.22 um 11:29 schrieb ZhenGuo Yin:
> [Why]
> Ghost BO is released with non-empty bulk move object. There is a
> warning trace:
> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
> Call Trace:
> amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
> amdttm_bo_put+0x28/0x30 [amdttm]
> amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
> amdgpu_bo_move+0x1a8/0x770 [amdgpu]
> ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
> amdttm_bo_validate+0xbf/0x100 [amdttm]
>
> [How]
> The resource of ghost BO should be moved to LRU directly, instead of
> using bulk move. The bulk move object of ghost BO should set to NULL
> before function ttm_bo_move_to_lru_tail_unlocked.
>
> Fixed:·5b951e487fd6bf5f·("drm/ttm:·fix·bulk·move·handling·v2")
> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Good catch, but the fix is not 100% correct. Please rather just NULL the
member while initializing the BO structure.
E.g. something like this:
....
fbo->base.pin_count = 0;
+fbo->base.bulk_move= NULL;
if (bo->type != ttm_bo_type_sg)
....
Thanks,
Christian.
> ---
> drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index 1cbfb00c1d65..a90bbbd91910 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -238,6 +238,7 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
>
> if (fbo->base.resource) {
> ttm_resource_set_bo(fbo->base.resource, &fbo->base);
> + ttm_bo_set_bulk_move(&fbo->base, NULL);
> bo->resource = NULL;
> }
>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] drm/ttm: update bulk move object of ghost BO
2022-09-01 11:11 ` Christian König
@ 2022-09-01 11:13 ` Christian König
2022-09-05 7:59 ` Yin, ZhenGuo (Chris)
0 siblings, 1 reply; 6+ messages in thread
From: Christian König @ 2022-09-01 11:13 UTC (permalink / raw)
To: ZhenGuo Yin, amd-gfx; +Cc: jingwen.chen2, christian.koenig, dri-devel
Am 01.09.22 um 13:11 schrieb Christian König:
> Am 01.09.22 um 11:29 schrieb ZhenGuo Yin:
>> [Why]
>> Ghost BO is released with non-empty bulk move object. There is a
>> warning trace:
>> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366
>> ttm_bo_release+0x2e1/0x2f0 [amdttm]
>> Call Trace:
>> amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
>> amdttm_bo_put+0x28/0x30 [amdttm]
>> amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
>> amdgpu_bo_move+0x1a8/0x770 [amdgpu]
>> ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
>> amdttm_bo_validate+0xbf/0x100 [amdttm]
>>
>> [How]
>> The resource of ghost BO should be moved to LRU directly, instead of
>> using bulk move. The bulk move object of ghost BO should set to NULL
>> before function ttm_bo_move_to_lru_tail_unlocked.
>>
>> Fixed:·5b951e487fd6bf5f·("drm/ttm:·fix·bulk·move·handling·v2")
>> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
>
> Good catch, but the fix is not 100% correct. Please rather just NULL
> the member while initializing the BO structure.
>
> E.g. something like this:
>
> ....
> fbo->base.pin_count = 0;
> +fbo->base.bulk_move= NULL;
> if (bo->type != ttm_bo_type_sg)
> ....
On the other hand thinking about it that won't work either.
You need to set bulk_move to NULL manually in an else clauses or
something like this.
Regards,
Christian.
>
> Thanks,
> Christian.
>
>> ---
>> drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c
>> b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> index 1cbfb00c1d65..a90bbbd91910 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> @@ -238,6 +238,7 @@ static int ttm_buffer_object_transfer(struct
>> ttm_buffer_object *bo,
>> if (fbo->base.resource) {
>> ttm_resource_set_bo(fbo->base.resource, &fbo->base);
>> + ttm_bo_set_bulk_move(&fbo->base, NULL);
>> bo->resource = NULL;
>> }
>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] drm/ttm: update bulk move object of ghost BO
2022-09-01 11:13 ` Christian König
@ 2022-09-05 7:59 ` Yin, ZhenGuo (Chris)
2022-09-05 11:05 ` Christian König
0 siblings, 1 reply; 6+ messages in thread
From: Yin, ZhenGuo (Chris) @ 2022-09-05 7:59 UTC (permalink / raw)
To: Christian König, ZhenGuo Yin, amd-gfx
Cc: jingwen.chen2, christian.koenig, dri-devel
Inside the function ttm_bo_set_bulk_move, it calls
ttm_resource_del_bulk_move to remove the old resource from the bulk_move
list.
If we set the bulk_move to NULL manually as suggested, the old resource
attached in the ghost BO seems won't be removed from the bulk_move.
On 9/1/2022 7:13 PM, Christian König wrote:
> Am 01.09.22 um 13:11 schrieb Christian König:
>> Am 01.09.22 um 11:29 schrieb ZhenGuo Yin:
>>> [Why]
>>> Ghost BO is released with non-empty bulk move object. There is a
>>> warning trace:
>>> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366
>>> ttm_bo_release+0x2e1/0x2f0 [amdttm]
>>> Call Trace:
>>> amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
>>> amdttm_bo_put+0x28/0x30 [amdttm]
>>> amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
>>> amdgpu_bo_move+0x1a8/0x770 [amdgpu]
>>> ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
>>> amdttm_bo_validate+0xbf/0x100 [amdttm]
>>>
>>> [How]
>>> The resource of ghost BO should be moved to LRU directly, instead of
>>> using bulk move. The bulk move object of ghost BO should set to NULL
>>> before function ttm_bo_move_to_lru_tail_unlocked.
>>>
>>> Fixed:·5b951e487fd6bf5f·("drm/ttm:·fix·bulk·move·handling·v2")
>>> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
>>
>> Good catch, but the fix is not 100% correct. Please rather just NULL
>> the member while initializing the BO structure.
>>
>> E.g. something like this:
>>
>> ....
>> fbo->base.pin_count = 0;
>> +fbo->base.bulk_move= NULL;
>> if (bo->type != ttm_bo_type_sg)
>> ....
>
> On the other hand thinking about it that won't work either.
>
> You need to set bulk_move to NULL manually in an else clauses or
> something like this.
>
> Regards,
> Christian.
>
>>
>> Thanks,
>> Christian.
>>
>>> ---
>>> drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> index 1cbfb00c1d65..a90bbbd91910 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> @@ -238,6 +238,7 @@ static int ttm_buffer_object_transfer(struct
>>> ttm_buffer_object *bo,
>>> if (fbo->base.resource) {
>>> ttm_resource_set_bo(fbo->base.resource, &fbo->base);
>>> + ttm_bo_set_bulk_move(&fbo->base, NULL);
>>> bo->resource = NULL;
>>> }
>>
>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] drm/ttm: update bulk move object of ghost BO
2022-09-05 7:59 ` Yin, ZhenGuo (Chris)
@ 2022-09-05 11:05 ` Christian König
0 siblings, 0 replies; 6+ messages in thread
From: Christian König @ 2022-09-05 11:05 UTC (permalink / raw)
To: Yin, ZhenGuo (Chris), ZhenGuo Yin, amd-gfx
Cc: jingwen.chen2, christian.koenig, dri-devel
Yeah, I realized that as well after sending the first mail.
The problem is that we keep the bulk move around when there currently
isn't any resource associated with the template.
So the correct code should look something like this:
if (fbo->base.resource) {
ttm_resource_set_bo(fbo->base.resource, &fbo->base);
bo->resource = NULL;
ttm_bo_set_bulk_move(&fbo->base, NULL);
} else {
fbo->bulk_move = NULL;
}
Regards,
Christian.
Am 05.09.22 um 09:59 schrieb Yin, ZhenGuo (Chris):
> Inside the function ttm_bo_set_bulk_move, it calls
> ttm_resource_del_bulk_move to remove the old resource from the
> bulk_move list.
>
> If we set the bulk_move to NULL manually as suggested, the old
> resource attached in the ghost BO seems won't be removed from the
> bulk_move.
>
> On 9/1/2022 7:13 PM, Christian König wrote:
>> Am 01.09.22 um 13:11 schrieb Christian König:
>>> Am 01.09.22 um 11:29 schrieb ZhenGuo Yin:
>>>> [Why]
>>>> Ghost BO is released with non-empty bulk move object. There is a
>>>> warning trace:
>>>> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366
>>>> ttm_bo_release+0x2e1/0x2f0 [amdttm]
>>>> Call Trace:
>>>> amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
>>>> amdttm_bo_put+0x28/0x30 [amdttm]
>>>> amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
>>>> amdgpu_bo_move+0x1a8/0x770 [amdgpu]
>>>> ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
>>>> amdttm_bo_validate+0xbf/0x100 [amdttm]
>>>>
>>>> [How]
>>>> The resource of ghost BO should be moved to LRU directly, instead of
>>>> using bulk move. The bulk move object of ghost BO should set to NULL
>>>> before function ttm_bo_move_to_lru_tail_unlocked.
>>>>
>>>> Fixed:·5b951e487fd6bf5f·("drm/ttm:·fix·bulk·move·handling·v2")
>>>> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
>>>
>>> Good catch, but the fix is not 100% correct. Please rather just NULL
>>> the member while initializing the BO structure.
>>>
>>> E.g. something like this:
>>>
>>> ....
>>> fbo->base.pin_count = 0;
>>> +fbo->base.bulk_move= NULL;
>>> if (bo->type != ttm_bo_type_sg)
>>> ....
>>
>> On the other hand thinking about it that won't work either.
>>
>> You need to set bulk_move to NULL manually in an else clauses or
>> something like this.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Thanks,
>>> Christian.
>>>
>>>> ---
>>>> drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>> b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>> index 1cbfb00c1d65..a90bbbd91910 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>> @@ -238,6 +238,7 @@ static int ttm_buffer_object_transfer(struct
>>>> ttm_buffer_object *bo,
>>>> if (fbo->base.resource) {
>>>> ttm_resource_set_bo(fbo->base.resource, &fbo->base);
>>>> + ttm_bo_set_bulk_move(&fbo->base, NULL);
>>>> bo->resource = NULL;
>>>> }
>>>
>>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-09-05 11:06 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-09-01 9:29 [PATCH] drm/ttm: update bulk move object of ghost BO ZhenGuo Yin
2022-09-01 9:55 ` JingWen Chen
2022-09-01 11:11 ` Christian König
2022-09-01 11:13 ` Christian König
2022-09-05 7:59 ` Yin, ZhenGuo (Chris)
2022-09-05 11:05 ` Christian König
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.