All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
@ 2017-04-27  7:20 Chunming Zhou
       [not found] ` <1493277617-14966-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Chunming Zhou @ 2017-04-27  7:20 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Chunming Zhou, Monk.Liu-5C7GfCeVMHo

the case could happen when gpu reset:
1. when gpu reset, cs can be continue until sw queue is full, then push job will wait with holding pd reservation.
2. gpu_reset routine will also need pd reservation to restore page table from their shadow.
3. cs is waiting for gpu_reset complete, but gpu reset is waiting for cs releases reservation.

Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 9edb1a4..8c6d036 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1076,7 +1076,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	amdgpu_job_free_resources(job);
 
 	trace_amdgpu_cs_ioctl(job);
-	amd_sched_entity_push_job(&job->base);
 
 	return 0;
 }
@@ -1132,6 +1131,8 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 
 out:
 	amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
+	if (!r)
+		amd_sched_entity_push_job(&parser.job->base);
 	return r;
 }
 
-- 
1.9.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
       [not found] ` <1493277617-14966-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
@ 2017-04-27  8:01   ` zhoucm1
  0 siblings, 0 replies; 9+ messages in thread
From: zhoucm1 @ 2017-04-27  8:01 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk.Liu-5C7GfCeVMHo

pls ignore this one, will send another new one.

On 2017年04月27日 15:20, Chunming Zhou wrote:
> the case could happen when gpu reset:
> 1. when gpu reset, cs can be continue until sw queue is full, then push job will wait with holding pd reservation.
> 2. gpu_reset routine will also need pd reservation to restore page table from their shadow.
> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting for cs releases reservation.
>
> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 9edb1a4..8c6d036 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1076,7 +1076,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>   	amdgpu_job_free_resources(job);
>   
>   	trace_amdgpu_cs_ioctl(job);
> -	amd_sched_entity_push_job(&job->base);
>   
>   	return 0;
>   }
> @@ -1132,6 +1131,8 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>   
>   out:
>   	amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
> +	if (!r)
> +		amd_sched_entity_push_job(&parser.job->base);
>   	return r;
>   }
>   

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
@ 2017-04-27  8:25 Chunming Zhou
       [not found] ` <1493281554-9898-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Chunming Zhou @ 2017-04-27  8:25 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Chunming Zhou, Monk.Liu-5C7GfCeVMHo

the case could happen when gpu reset:
1. when gpu reset, cs can be continue until sw queue is full, then push job will wait with holding pd reservation.
2. gpu_reset routine will also need pd reservation to restore page table from their shadow.
3. cs is waiting for gpu_reset complete, but gpu reset is waiting for cs releases reservation.

Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 9edb1a4..a6722a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	amdgpu_job_free_resources(job);
 
 	trace_amdgpu_cs_ioctl(job);
+	amdgpu_cs_parser_fini(p, 0, true);
 	amd_sched_entity_push_job(&job->base);
 
 	return 0;
@@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 
 	r = amdgpu_cs_submit(&parser, cs);
 
+	return r;
 out:
 	amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
 	return r;
-- 
1.9.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
       [not found] ` <1493281554-9898-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
@ 2017-04-27  9:08   ` Zhang, Jerry (Junwei)
       [not found]     ` <5901B526.6080305-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Zhang, Jerry (Junwei) @ 2017-04-27  9:08 UTC (permalink / raw)
  To: Chunming Zhou, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Monk.Liu-5C7GfCeVMHo

On 04/27/2017 04:25 PM, Chunming Zhou wrote:
> the case could happen when gpu reset:
> 1. when gpu reset, cs can be continue until sw queue is full, then push job will wait with holding pd reservation.
> 2. gpu_reset routine will also need pd reservation to restore page table from their shadow.
> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting for cs releases reservation.
>
> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 9edb1a4..a6722a7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>   	amdgpu_job_free_resources(job);
>
>   	trace_amdgpu_cs_ioctl(job);
> +	amdgpu_cs_parser_fini(p, 0, true);

Please confirm:
It will not free/release more things that may be accessed in job working 
process, as cs_parse_fini free so many things related to parser.

Or we can just release the pd reservation here?

Jerry

>   	amd_sched_entity_push_job(&job->base);
>
>   	return 0;
> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>
>   	r = amdgpu_cs_submit(&parser, cs);
>
> +	return r;
>   out:
>   	amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>   	return r;
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
       [not found]     ` <5901B526.6080305-5C7GfCeVMHo@public.gmane.org>
@ 2017-04-27 10:11       ` Christian König
       [not found]         ` <62336e5c-571a-196f-6c07-be25c928cdd3-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2017-04-27 10:11 UTC (permalink / raw)
  To: Zhang, Jerry (Junwei), Chunming Zhou,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Monk.Liu-5C7GfCeVMHo

Am 27.04.2017 um 11:08 schrieb Zhang, Jerry (Junwei):
> On 04/27/2017 04:25 PM, Chunming Zhou wrote:
>> the case could happen when gpu reset:
>> 1. when gpu reset, cs can be continue until sw queue is full, then 
>> push job will wait with holding pd reservation.
>> 2. gpu_reset routine will also need pd reservation to restore page 
>> table from their shadow.
>> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting for 
>> cs releases reservation.
>>
>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 9edb1a4..a6722a7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct 
>> amdgpu_cs_parser *p,
>>       amdgpu_job_free_resources(job);
>>
>>       trace_amdgpu_cs_ioctl(job);
>> +    amdgpu_cs_parser_fini(p, 0, true);
>
> Please confirm:
> It will not free/release more things that may be accessed in job 
> working process, as cs_parse_fini free so many things related to parser.

Yeah, that indeed won't work. amdgpu_cs_parser_fini() does:

         if (parser->job)
                 amdgpu_job_free(parser->job);

And amdgpu_cs_submit() sets the fence, so it must be called before 
amdgpu_cs_parser_fini().

But apart from that this is a rather nifty idea. What was the problem 
with the initial patch?

>
> Or we can just release the pd reservation here?

We could run into problem with the reservation ticked with that. So I 
would want to avoid that.

Christian.

>
> Jerry
>
>> amd_sched_entity_push_job(&job->base);
>>
>>       return 0;
>> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, 
>> void *data, struct drm_file *filp)
>>
>>       r = amdgpu_cs_submit(&parser, cs);
>>
>> +    return r;
>>   out:
>>       amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>>       return r;
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
       [not found]         ` <62336e5c-571a-196f-6c07-be25c928cdd3-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2017-04-27 10:23           ` zhoucm1
       [not found]             ` <5901C6BD.1070303-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: zhoucm1 @ 2017-04-27 10:23 UTC (permalink / raw)
  To: Christian König, Zhang, Jerry (Junwei),
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Monk.Liu-5C7GfCeVMHo



On 2017年04月27日 18:11, Christian König wrote:
> Am 27.04.2017 um 11:08 schrieb Zhang, Jerry (Junwei):
>> On 04/27/2017 04:25 PM, Chunming Zhou wrote:
>>> the case could happen when gpu reset:
>>> 1. when gpu reset, cs can be continue until sw queue is full, then 
>>> push job will wait with holding pd reservation.
>>> 2. gpu_reset routine will also need pd reservation to restore page 
>>> table from their shadow.
>>> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting 
>>> for cs releases reservation.
>>>
>>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>>> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index 9edb1a4..a6722a7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct 
>>> amdgpu_cs_parser *p,
>>>       amdgpu_job_free_resources(job);
>>>
>>>       trace_amdgpu_cs_ioctl(job);
>>> +    amdgpu_cs_parser_fini(p, 0, true);
>>
>> Please confirm:
>> It will not free/release more things that may be accessed in job 
>> working process, as cs_parse_fini free so many things related to parser.
>
> Yeah, that indeed won't work. amdgpu_cs_parser_fini() does:
>
No, with my patch, parser->job already is NULL here, same as previous 
sequence, just put push_job action after amdgpu_cs_parser_fini()
>         if (parser->job)
>                 amdgpu_job_free(parser->job);
>
> And amdgpu_cs_submit() sets the fence, so it must be called before 
> amdgpu_cs_parser_fini().
yes, fence is already generated, so amdgpu_cs_parser_fini before pushing 
job is completely safe here.
>
> But apart from that this is a rather nifty idea. What was the problem 
> with the initial patch?
Just as comments in patch, which is found by Monk.

Regards,
David Zhou
>
>>
>> Or we can just release the pd reservation here?
>
> We could run into problem with the reservation ticked with that. So I 
> would want to avoid that.
>
> Christian.
>
>>
>> Jerry
>>
>>> amd_sched_entity_push_job(&job->base);
>>>
>>>       return 0;
>>> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, 
>>> void *data, struct drm_file *filp)
>>>
>>>       r = amdgpu_cs_submit(&parser, cs);
>>>
>>> +    return r;
>>>   out:
>>>       amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>>>       return r;
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
       [not found]             ` <5901C6BD.1070303-5C7GfCeVMHo@public.gmane.org>
@ 2017-04-27 12:04               ` Christian König
       [not found]                 ` <b6e45c8f-40a1-4185-89fb-1f4f7719c0fb-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  2017-04-27 14:02               ` Liu, Monk
  1 sibling, 1 reply; 9+ messages in thread
From: Christian König @ 2017-04-27 12:04 UTC (permalink / raw)
  To: zhoucm1, Zhang, Jerry (Junwei),
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Monk.Liu-5C7GfCeVMHo

Am 27.04.2017 um 12:23 schrieb zhoucm1:
>
>
> On 2017年04月27日 18:11, Christian König wrote:
>> Am 27.04.2017 um 11:08 schrieb Zhang, Jerry (Junwei):
>>> On 04/27/2017 04:25 PM, Chunming Zhou wrote:
>>>> the case could happen when gpu reset:
>>>> 1. when gpu reset, cs can be continue until sw queue is full, then 
>>>> push job will wait with holding pd reservation.
>>>> 2. gpu_reset routine will also need pd reservation to restore page 
>>>> table from their shadow.
>>>> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting 
>>>> for cs releases reservation.
>>>>
>>>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>>>> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>>>   1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> index 9edb1a4..a6722a7 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct 
>>>> amdgpu_cs_parser *p,
>>>>       amdgpu_job_free_resources(job);
>>>>
>>>>       trace_amdgpu_cs_ioctl(job);
>>>> +    amdgpu_cs_parser_fini(p, 0, true);
>>>
>>> Please confirm:
>>> It will not free/release more things that may be accessed in job 
>>> working process, as cs_parse_fini free so many things related to 
>>> parser.
>>
>> Yeah, that indeed won't work. amdgpu_cs_parser_fini() does:
>>
> No, with my patch, parser->job already is NULL here, same as previous 
> sequence, just put push_job action after amdgpu_cs_parser_fini()

Ah! You add that into amdgpu_cs_submit! That was the point I was missing.

>   	trace_amdgpu_cs_ioctl(job);
> +	amdgpu_cs_parser_fini(p, 0, true);
>   	amd_sched_entity_push_job(&job->base);
Can we keep the trace_amdgpu_cs_ioctl(); directly before the 
amd_sched_entity_push_job()?

With that changed, the patch is Reviewed-by: Christian König 
<christian.koenig@amd.com>.

Regards,
Christian.

>>         if (parser->job)
>>                 amdgpu_job_free(parser->job);
>>
>> And amdgpu_cs_submit() sets the fence, so it must be called before 
>> amdgpu_cs_parser_fini().
> yes, fence is already generated, so amdgpu_cs_parser_fini before 
> pushing job is completely safe here.
>>
>> But apart from that this is a rather nifty idea. What was the problem 
>> with the initial patch?
> Just as comments in patch, which is found by Monk.
>
> Regards,
> David Zhou
>>
>>>
>>> Or we can just release the pd reservation here?
>>
>> We could run into problem with the reservation ticked with that. So I 
>> would want to avoid that.
>>
>> Christian.
>>
>>>
>>> Jerry
>>>
>>>> amd_sched_entity_push_job(&job->base);
>>>>
>>>>       return 0;
>>>> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, 
>>>> void *data, struct drm_file *filp)
>>>>
>>>>       r = amdgpu_cs_submit(&parser, cs);
>>>>
>>>> +    return r;
>>>>   out:
>>>>       amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>>>>       return r;
>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
       [not found]             ` <5901C6BD.1070303-5C7GfCeVMHo@public.gmane.org>
  2017-04-27 12:04               ` Christian König
@ 2017-04-27 14:02               ` Liu, Monk
  1 sibling, 0 replies; 9+ messages in thread
From: Liu, Monk @ 2017-04-27 14:02 UTC (permalink / raw)
  To: Zhou, David(ChunMing), Christian König, Zhang, Jerry,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org

A really good improvement 

Reviewed-by: Monk Liu <monk.liu@amd.com>

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of zhoucm1
Sent: Thursday, April 27, 2017 6:24 PM
To: Christian König <deathsimple@vodafone.de>; Zhang, Jerry <Jerry.Zhang@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Liu, Monk <Monk.Liu@amd.com>
Subject: Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset



On 2017年04月27日 18:11, Christian König wrote:
> Am 27.04.2017 um 11:08 schrieb Zhang, Jerry (Junwei):
>> On 04/27/2017 04:25 PM, Chunming Zhou wrote:
>>> the case could happen when gpu reset:
>>> 1. when gpu reset, cs can be continue until sw queue is full, then 
>>> push job will wait with holding pd reservation.
>>> 2. gpu_reset routine will also need pd reservation to restore page 
>>> table from their shadow.
>>> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting 
>>> for cs releases reservation.
>>>
>>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>>> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index 9edb1a4..a6722a7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct 
>>> amdgpu_cs_parser *p,
>>>       amdgpu_job_free_resources(job);
>>>
>>>       trace_amdgpu_cs_ioctl(job);
>>> +    amdgpu_cs_parser_fini(p, 0, true);
>>
>> Please confirm:
>> It will not free/release more things that may be accessed in job 
>> working process, as cs_parse_fini free so many things related to parser.
>
> Yeah, that indeed won't work. amdgpu_cs_parser_fini() does:
>
No, with my patch, parser->job already is NULL here, same as previous sequence, just put push_job action after amdgpu_cs_parser_fini()
>         if (parser->job)
>                 amdgpu_job_free(parser->job);
>
> And amdgpu_cs_submit() sets the fence, so it must be called before 
> amdgpu_cs_parser_fini().
yes, fence is already generated, so amdgpu_cs_parser_fini before pushing job is completely safe here.
>
> But apart from that this is a rather nifty idea. What was the problem 
> with the initial patch?
Just as comments in patch, which is found by Monk.

Regards,
David Zhou
>
>>
>> Or we can just release the pd reservation here?
>
> We could run into problem with the reservation ticked with that. So I 
> would want to avoid that.
>
> Christian.
>
>>
>> Jerry
>>
>>> amd_sched_entity_push_job(&job->base);
>>>
>>>       return 0;
>>> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, 
>>> void *data, struct drm_file *filp)
>>>
>>>       r = amdgpu_cs_submit(&parser, cs);
>>>
>>> +    return r;
>>>   out:
>>>       amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>>>       return r;
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
       [not found]                 ` <b6e45c8f-40a1-4185-89fb-1f4f7719c0fb-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2017-04-28  2:37                   ` Zhang, Jerry (Junwei)
  0 siblings, 0 replies; 9+ messages in thread
From: Zhang, Jerry (Junwei) @ 2017-04-28  2:37 UTC (permalink / raw)
  To: Christian König, zhoucm1,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Monk.Liu-5C7GfCeVMHo

On 04/27/2017 08:04 PM, Christian König wrote:
> Am 27.04.2017 um 12:23 schrieb zhoucm1:
>>
>>
>> On 2017年04月27日 18:11, Christian König wrote:
>>> Am 27.04.2017 um 11:08 schrieb Zhang, Jerry (Junwei):
>>>> On 04/27/2017 04:25 PM, Chunming Zhou wrote:
>>>>> the case could happen when gpu reset:
>>>>> 1. when gpu reset, cs can be continue until sw queue is full, then push
>>>>> job will wait with holding pd reservation.
>>>>> 2. gpu_reset routine will also need pd reservation to restore page table
>>>>> from their shadow.
>>>>> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting for cs
>>>>> releases reservation.
>>>>>
>>>>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>>>>> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>>>>   1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> index 9edb1a4..a6722a7 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>>>>       amdgpu_job_free_resources(job);
>>>>>
>>>>>       trace_amdgpu_cs_ioctl(job);
>>>>> +    amdgpu_cs_parser_fini(p, 0, true);
>>>>
>>>> Please confirm:
>>>> It will not free/release more things that may be accessed in job working
>>>> process, as cs_parse_fini free so many things related to parser.
>>>
>>> Yeah, that indeed won't work. amdgpu_cs_parser_fini() does:
>>>
>> No, with my patch, parser->job already is NULL here, same as previous
>> sequence, just put push_job action after amdgpu_cs_parser_fini()
>
> Ah! You add that into amdgpu_cs_submit! That was the point I was missing.

Thanks your all input.
See it too.

Feel free to add my RB, if need.
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>

Jerry

>
>>       trace_amdgpu_cs_ioctl(job);
>> +    amdgpu_cs_parser_fini(p, 0, true);
>>       amd_sched_entity_push_job(&job->base);
> Can we keep the trace_amdgpu_cs_ioctl(); directly before the
> amd_sched_entity_push_job()?
>
> With that changed, the patch is Reviewed-by: Christian König
> <christian.koenig@amd.com>.
>
> Regards,
> Christian.
>
>>>         if (parser->job)
>>>                 amdgpu_job_free(parser->job);
>>>
>>> And amdgpu_cs_submit() sets the fence, so it must be called before
>>> amdgpu_cs_parser_fini().
>> yes, fence is already generated, so amdgpu_cs_parser_fini before pushing job
>> is completely safe here.
>>>
>>> But apart from that this is a rather nifty idea. What was the problem with
>>> the initial patch?
>> Just as comments in patch, which is found by Monk.
>>
>> Regards,
>> David Zhou
>>>
>>>>
>>>> Or we can just release the pd reservation here?
>>>
>>> We could run into problem with the reservation ticked with that. So I would
>>> want to avoid that.
>>>
>>> Christian.
>>>
>>>>
>>>> Jerry
>>>>
>>>>> amd_sched_entity_push_job(&job->base);
>>>>>
>>>>>       return 0;
>>>>> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void
>>>>> *data, struct drm_file *filp)
>>>>>
>>>>>       r = amdgpu_cs_submit(&parser, cs);
>>>>>
>>>>> +    return r;
>>>>>   out:
>>>>>       amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>>>>>       return r;
>>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>>
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-04-28  2:37 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-04-27  8:25 [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset Chunming Zhou
     [not found] ` <1493281554-9898-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
2017-04-27  9:08   ` Zhang, Jerry (Junwei)
     [not found]     ` <5901B526.6080305-5C7GfCeVMHo@public.gmane.org>
2017-04-27 10:11       ` Christian König
     [not found]         ` <62336e5c-571a-196f-6c07-be25c928cdd3-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-04-27 10:23           ` zhoucm1
     [not found]             ` <5901C6BD.1070303-5C7GfCeVMHo@public.gmane.org>
2017-04-27 12:04               ` Christian König
     [not found]                 ` <b6e45c8f-40a1-4185-89fb-1f4f7719c0fb-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-04-28  2:37                   ` Zhang, Jerry (Junwei)
2017-04-27 14:02               ` Liu, Monk
  -- strict thread matches above, loose matches on Subject: below --
2017-04-27  7:20 Chunming Zhou
     [not found] ` <1493277617-14966-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
2017-04-27  8:01   ` zhoucm1

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.