* [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
@ 2017-04-27 7:20 Chunming Zhou
[not found] ` <1493277617-14966-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Chunming Zhou @ 2017-04-27 7:20 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Chunming Zhou, Monk.Liu-5C7GfCeVMHo
the case could happen when gpu reset:
1. when gpu reset, cs can be continue until sw queue is full, then push job will wait with holding pd reservation.
2. gpu_reset routine will also need pd reservation to restore page table from their shadow.
3. cs is waiting for gpu_reset complete, but gpu reset is waiting for cs releases reservation.
Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 9edb1a4..8c6d036 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1076,7 +1076,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
amdgpu_job_free_resources(job);
trace_amdgpu_cs_ioctl(job);
- amd_sched_entity_push_job(&job->base);
return 0;
}
@@ -1132,6 +1131,8 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
out:
amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
+ if (!r)
+ amd_sched_entity_push_job(&parser.job->base);
return r;
}
--
1.9.1
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
[not found] ` <1493277617-14966-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
@ 2017-04-27 8:01 ` zhoucm1
0 siblings, 0 replies; 9+ messages in thread
From: zhoucm1 @ 2017-04-27 8:01 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk.Liu-5C7GfCeVMHo
pls ignore this one, will send another new one.
On 2017年04月27日 15:20, Chunming Zhou wrote:
> the case could happen when gpu reset:
> 1. when gpu reset, cs can be continue until sw queue is full, then push job will wait with holding pd reservation.
> 2. gpu_reset routine will also need pd reservation to restore page table from their shadow.
> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting for cs releases reservation.
>
> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 9edb1a4..8c6d036 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1076,7 +1076,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> amdgpu_job_free_resources(job);
>
> trace_amdgpu_cs_ioctl(job);
> - amd_sched_entity_push_job(&job->base);
>
> return 0;
> }
> @@ -1132,6 +1131,8 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>
> out:
> amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
> + if (!r)
> + amd_sched_entity_push_job(&parser.job->base);
> return r;
> }
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
@ 2017-04-27 8:25 Chunming Zhou
[not found] ` <1493281554-9898-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Chunming Zhou @ 2017-04-27 8:25 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Chunming Zhou, Monk.Liu-5C7GfCeVMHo
the case could happen when gpu reset:
1. when gpu reset, cs can be continue until sw queue is full, then push job will wait with holding pd reservation.
2. gpu_reset routine will also need pd reservation to restore page table from their shadow.
3. cs is waiting for gpu_reset complete, but gpu reset is waiting for cs releases reservation.
Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 9edb1a4..a6722a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
amdgpu_job_free_resources(job);
trace_amdgpu_cs_ioctl(job);
+ amdgpu_cs_parser_fini(p, 0, true);
amd_sched_entity_push_job(&job->base);
return 0;
@@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
r = amdgpu_cs_submit(&parser, cs);
+ return r;
out:
amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
return r;
--
1.9.1
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
[not found] ` <1493281554-9898-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
@ 2017-04-27 9:08 ` Zhang, Jerry (Junwei)
[not found] ` <5901B526.6080305-5C7GfCeVMHo@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Zhang, Jerry (Junwei) @ 2017-04-27 9:08 UTC (permalink / raw)
To: Chunming Zhou, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Monk.Liu-5C7GfCeVMHo
On 04/27/2017 04:25 PM, Chunming Zhou wrote:
> the case could happen when gpu reset:
> 1. when gpu reset, cs can be continue until sw queue is full, then push job will wait with holding pd reservation.
> 2. gpu_reset routine will also need pd reservation to restore page table from their shadow.
> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting for cs releases reservation.
>
> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 9edb1a4..a6722a7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> amdgpu_job_free_resources(job);
>
> trace_amdgpu_cs_ioctl(job);
> + amdgpu_cs_parser_fini(p, 0, true);
Please confirm:
It will not free/release more things that may be accessed in job working
process, as cs_parse_fini free so many things related to parser.
Or we can just release the pd reservation here?
Jerry
> amd_sched_entity_push_job(&job->base);
>
> return 0;
> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>
> r = amdgpu_cs_submit(&parser, cs);
>
> + return r;
> out:
> amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
> return r;
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
[not found] ` <5901B526.6080305-5C7GfCeVMHo@public.gmane.org>
@ 2017-04-27 10:11 ` Christian König
[not found] ` <62336e5c-571a-196f-6c07-be25c928cdd3-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2017-04-27 10:11 UTC (permalink / raw)
To: Zhang, Jerry (Junwei), Chunming Zhou,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Monk.Liu-5C7GfCeVMHo
Am 27.04.2017 um 11:08 schrieb Zhang, Jerry (Junwei):
> On 04/27/2017 04:25 PM, Chunming Zhou wrote:
>> the case could happen when gpu reset:
>> 1. when gpu reset, cs can be continue until sw queue is full, then
>> push job will wait with holding pd reservation.
>> 2. gpu_reset routine will also need pd reservation to restore page
>> table from their shadow.
>> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting for
>> cs releases reservation.
>>
>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 9edb1a4..a6722a7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct
>> amdgpu_cs_parser *p,
>> amdgpu_job_free_resources(job);
>>
>> trace_amdgpu_cs_ioctl(job);
>> + amdgpu_cs_parser_fini(p, 0, true);
>
> Please confirm:
> It will not free/release more things that may be accessed in job
> working process, as cs_parse_fini free so many things related to parser.
Yeah, that indeed won't work. amdgpu_cs_parser_fini() does:
if (parser->job)
amdgpu_job_free(parser->job);
And amdgpu_cs_submit() sets the fence, so it must be called before
amdgpu_cs_parser_fini().
But apart from that this is a rather nifty idea. What was the problem
with the initial patch?
>
> Or we can just release the pd reservation here?
We could run into problem with the reservation ticked with that. So I
would want to avoid that.
Christian.
>
> Jerry
>
>> amd_sched_entity_push_job(&job->base);
>>
>> return 0;
>> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev,
>> void *data, struct drm_file *filp)
>>
>> r = amdgpu_cs_submit(&parser, cs);
>>
>> + return r;
>> out:
>> amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>> return r;
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
[not found] ` <62336e5c-571a-196f-6c07-be25c928cdd3-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2017-04-27 10:23 ` zhoucm1
[not found] ` <5901C6BD.1070303-5C7GfCeVMHo@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: zhoucm1 @ 2017-04-27 10:23 UTC (permalink / raw)
To: Christian König, Zhang, Jerry (Junwei),
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Monk.Liu-5C7GfCeVMHo
On 2017年04月27日 18:11, Christian König wrote:
> Am 27.04.2017 um 11:08 schrieb Zhang, Jerry (Junwei):
>> On 04/27/2017 04:25 PM, Chunming Zhou wrote:
>>> the case could happen when gpu reset:
>>> 1. when gpu reset, cs can be continue until sw queue is full, then
>>> push job will wait with holding pd reservation.
>>> 2. gpu_reset routine will also need pd reservation to restore page
>>> table from their shadow.
>>> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting
>>> for cs releases reservation.
>>>
>>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>>> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index 9edb1a4..a6722a7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct
>>> amdgpu_cs_parser *p,
>>> amdgpu_job_free_resources(job);
>>>
>>> trace_amdgpu_cs_ioctl(job);
>>> + amdgpu_cs_parser_fini(p, 0, true);
>>
>> Please confirm:
>> It will not free/release more things that may be accessed in job
>> working process, as cs_parse_fini free so many things related to parser.
>
> Yeah, that indeed won't work. amdgpu_cs_parser_fini() does:
>
No, with my patch, parser->job already is NULL here, same as previous
sequence, just put push_job action after amdgpu_cs_parser_fini()
> if (parser->job)
> amdgpu_job_free(parser->job);
>
> And amdgpu_cs_submit() sets the fence, so it must be called before
> amdgpu_cs_parser_fini().
yes, fence is already generated, so amdgpu_cs_parser_fini before pushing
job is completely safe here.
>
> But apart from that this is a rather nifty idea. What was the problem
> with the initial patch?
Just as comments in patch, which is found by Monk.
Regards,
David Zhou
>
>>
>> Or we can just release the pd reservation here?
>
> We could run into problem with the reservation ticked with that. So I
> would want to avoid that.
>
> Christian.
>
>>
>> Jerry
>>
>>> amd_sched_entity_push_job(&job->base);
>>>
>>> return 0;
>>> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev,
>>> void *data, struct drm_file *filp)
>>>
>>> r = amdgpu_cs_submit(&parser, cs);
>>>
>>> + return r;
>>> out:
>>> amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>>> return r;
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
[not found] ` <5901C6BD.1070303-5C7GfCeVMHo@public.gmane.org>
@ 2017-04-27 12:04 ` Christian König
[not found] ` <b6e45c8f-40a1-4185-89fb-1f4f7719c0fb-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-04-27 14:02 ` Liu, Monk
1 sibling, 1 reply; 9+ messages in thread
From: Christian König @ 2017-04-27 12:04 UTC (permalink / raw)
To: zhoucm1, Zhang, Jerry (Junwei),
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Monk.Liu-5C7GfCeVMHo
Am 27.04.2017 um 12:23 schrieb zhoucm1:
>
>
> On 2017年04月27日 18:11, Christian König wrote:
>> Am 27.04.2017 um 11:08 schrieb Zhang, Jerry (Junwei):
>>> On 04/27/2017 04:25 PM, Chunming Zhou wrote:
>>>> the case could happen when gpu reset:
>>>> 1. when gpu reset, cs can be continue until sw queue is full, then
>>>> push job will wait with holding pd reservation.
>>>> 2. gpu_reset routine will also need pd reservation to restore page
>>>> table from their shadow.
>>>> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting
>>>> for cs releases reservation.
>>>>
>>>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>>>> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>>> 1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> index 9edb1a4..a6722a7 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct
>>>> amdgpu_cs_parser *p,
>>>> amdgpu_job_free_resources(job);
>>>>
>>>> trace_amdgpu_cs_ioctl(job);
>>>> + amdgpu_cs_parser_fini(p, 0, true);
>>>
>>> Please confirm:
>>> It will not free/release more things that may be accessed in job
>>> working process, as cs_parse_fini free so many things related to
>>> parser.
>>
>> Yeah, that indeed won't work. amdgpu_cs_parser_fini() does:
>>
> No, with my patch, parser->job already is NULL here, same as previous
> sequence, just put push_job action after amdgpu_cs_parser_fini()
Ah! You add that into amdgpu_cs_submit! That was the point I was missing.
> trace_amdgpu_cs_ioctl(job);
> + amdgpu_cs_parser_fini(p, 0, true);
> amd_sched_entity_push_job(&job->base);
Can we keep the trace_amdgpu_cs_ioctl(); directly before the
amd_sched_entity_push_job()?
With that changed, the patch is Reviewed-by: Christian König
<christian.koenig@amd.com>.
Regards,
Christian.
>> if (parser->job)
>> amdgpu_job_free(parser->job);
>>
>> And amdgpu_cs_submit() sets the fence, so it must be called before
>> amdgpu_cs_parser_fini().
> yes, fence is already generated, so amdgpu_cs_parser_fini before
> pushing job is completely safe here.
>>
>> But apart from that this is a rather nifty idea. What was the problem
>> with the initial patch?
> Just as comments in patch, which is found by Monk.
>
> Regards,
> David Zhou
>>
>>>
>>> Or we can just release the pd reservation here?
>>
>> We could run into problem with the reservation ticked with that. So I
>> would want to avoid that.
>>
>> Christian.
>>
>>>
>>> Jerry
>>>
>>>> amd_sched_entity_push_job(&job->base);
>>>>
>>>> return 0;
>>>> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev,
>>>> void *data, struct drm_file *filp)
>>>>
>>>> r = amdgpu_cs_submit(&parser, cs);
>>>>
>>>> + return r;
>>>> out:
>>>> amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>>>> return r;
>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
[not found] ` <5901C6BD.1070303-5C7GfCeVMHo@public.gmane.org>
2017-04-27 12:04 ` Christian König
@ 2017-04-27 14:02 ` Liu, Monk
1 sibling, 0 replies; 9+ messages in thread
From: Liu, Monk @ 2017-04-27 14:02 UTC (permalink / raw)
To: Zhou, David(ChunMing), Christian König, Zhang, Jerry,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
A really good improvement
Reviewed-by: Monk Liu <monk.liu@amd.com>
-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of zhoucm1
Sent: Thursday, April 27, 2017 6:24 PM
To: Christian König <deathsimple@vodafone.de>; Zhang, Jerry <Jerry.Zhang@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Liu, Monk <Monk.Liu@amd.com>
Subject: Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
On 2017年04月27日 18:11, Christian König wrote:
> Am 27.04.2017 um 11:08 schrieb Zhang, Jerry (Junwei):
>> On 04/27/2017 04:25 PM, Chunming Zhou wrote:
>>> the case could happen when gpu reset:
>>> 1. when gpu reset, cs can be continue until sw queue is full, then
>>> push job will wait with holding pd reservation.
>>> 2. gpu_reset routine will also need pd reservation to restore page
>>> table from their shadow.
>>> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting
>>> for cs releases reservation.
>>>
>>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>>> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index 9edb1a4..a6722a7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct
>>> amdgpu_cs_parser *p,
>>> amdgpu_job_free_resources(job);
>>>
>>> trace_amdgpu_cs_ioctl(job);
>>> + amdgpu_cs_parser_fini(p, 0, true);
>>
>> Please confirm:
>> It will not free/release more things that may be accessed in job
>> working process, as cs_parse_fini free so many things related to parser.
>
> Yeah, that indeed won't work. amdgpu_cs_parser_fini() does:
>
No, with my patch, parser->job already is NULL here, same as previous sequence, just put push_job action after amdgpu_cs_parser_fini()
> if (parser->job)
> amdgpu_job_free(parser->job);
>
> And amdgpu_cs_submit() sets the fence, so it must be called before
> amdgpu_cs_parser_fini().
yes, fence is already generated, so amdgpu_cs_parser_fini before pushing job is completely safe here.
>
> But apart from that this is a rather nifty idea. What was the problem
> with the initial patch?
Just as comments in patch, which is found by Monk.
Regards,
David Zhou
>
>>
>> Or we can just release the pd reservation here?
>
> We could run into problem with the reservation ticked with that. So I
> would want to avoid that.
>
> Christian.
>
>>
>> Jerry
>>
>>> amd_sched_entity_push_job(&job->base);
>>>
>>> return 0;
>>> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev,
>>> void *data, struct drm_file *filp)
>>>
>>> r = amdgpu_cs_submit(&parser, cs);
>>>
>>> + return r;
>>> out:
>>> amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>>> return r;
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset
[not found] ` <b6e45c8f-40a1-4185-89fb-1f4f7719c0fb-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2017-04-28 2:37 ` Zhang, Jerry (Junwei)
0 siblings, 0 replies; 9+ messages in thread
From: Zhang, Jerry (Junwei) @ 2017-04-28 2:37 UTC (permalink / raw)
To: Christian König, zhoucm1,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Monk.Liu-5C7GfCeVMHo
On 04/27/2017 08:04 PM, Christian König wrote:
> Am 27.04.2017 um 12:23 schrieb zhoucm1:
>>
>>
>> On 2017年04月27日 18:11, Christian König wrote:
>>> Am 27.04.2017 um 11:08 schrieb Zhang, Jerry (Junwei):
>>>> On 04/27/2017 04:25 PM, Chunming Zhou wrote:
>>>>> the case could happen when gpu reset:
>>>>> 1. when gpu reset, cs can be continue until sw queue is full, then push
>>>>> job will wait with holding pd reservation.
>>>>> 2. gpu_reset routine will also need pd reservation to restore page table
>>>>> from their shadow.
>>>>> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting for cs
>>>>> releases reservation.
>>>>>
>>>>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>>>>> Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>>>> 1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> index 9edb1a4..a6722a7 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>>>> amdgpu_job_free_resources(job);
>>>>>
>>>>> trace_amdgpu_cs_ioctl(job);
>>>>> + amdgpu_cs_parser_fini(p, 0, true);
>>>>
>>>> Please confirm:
>>>> It will not free/release more things that may be accessed in job working
>>>> process, as cs_parse_fini free so many things related to parser.
>>>
>>> Yeah, that indeed won't work. amdgpu_cs_parser_fini() does:
>>>
>> No, with my patch, parser->job already is NULL here, same as previous
>> sequence, just put push_job action after amdgpu_cs_parser_fini()
>
> Ah! You add that into amdgpu_cs_submit! That was the point I was missing.
Thanks your all input.
See it too.
Feel free to add my RB, if need.
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Jerry
>
>> trace_amdgpu_cs_ioctl(job);
>> + amdgpu_cs_parser_fini(p, 0, true);
>> amd_sched_entity_push_job(&job->base);
> Can we keep the trace_amdgpu_cs_ioctl(); directly before the
> amd_sched_entity_push_job()?
>
> With that changed, the patch is Reviewed-by: Christian König
> <christian.koenig@amd.com>.
>
> Regards,
> Christian.
>
>>> if (parser->job)
>>> amdgpu_job_free(parser->job);
>>>
>>> And amdgpu_cs_submit() sets the fence, so it must be called before
>>> amdgpu_cs_parser_fini().
>> yes, fence is already generated, so amdgpu_cs_parser_fini before pushing job
>> is completely safe here.
>>>
>>> But apart from that this is a rather nifty idea. What was the problem with
>>> the initial patch?
>> Just as comments in patch, which is found by Monk.
>>
>> Regards,
>> David Zhou
>>>
>>>>
>>>> Or we can just release the pd reservation here?
>>>
>>> We could run into problem with the reservation ticked with that. So I would
>>> want to avoid that.
>>>
>>> Christian.
>>>
>>>>
>>>> Jerry
>>>>
>>>>> amd_sched_entity_push_job(&job->base);
>>>>>
>>>>> return 0;
>>>>> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void
>>>>> *data, struct drm_file *filp)
>>>>>
>>>>> r = amdgpu_cs_submit(&parser, cs);
>>>>>
>>>>> + return r;
>>>>> out:
>>>>> amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>>>>> return r;
>>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>>
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-04-28 2:37 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-04-27 7:20 [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset Chunming Zhou
[not found] ` <1493277617-14966-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
2017-04-27 8:01 ` zhoucm1
-- strict thread matches above, loose matches on Subject: below --
2017-04-27 8:25 Chunming Zhou
[not found] ` <1493281554-9898-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
2017-04-27 9:08 ` Zhang, Jerry (Junwei)
[not found] ` <5901B526.6080305-5C7GfCeVMHo@public.gmane.org>
2017-04-27 10:11 ` Christian König
[not found] ` <62336e5c-571a-196f-6c07-be25c928cdd3-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-04-27 10:23 ` zhoucm1
[not found] ` <5901C6BD.1070303-5C7GfCeVMHo@public.gmane.org>
2017-04-27 12:04 ` Christian König
[not found] ` <b6e45c8f-40a1-4185-89fb-1f4f7719c0fb-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-04-28 2:37 ` Zhang, Jerry (Junwei)
2017-04-27 14:02 ` Liu, Monk
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.