From: zhoucm1 <david1.zhou-5C7GfCeVMHo@public.gmane.org>
To: "Christian König"
<deathsimple-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>,
"Michel Dänzer" <michel-otUistvHUpPR7s880joybQ@public.gmane.org>
Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Subject: Re: [PATCH 4/4] drm/amdgpu: reset fpriv vram_lost_counter
Date: Wed, 17 May 2017 16:46:51 +0800
Message-ID: <591C0DFB.8030604@amd.com>
In-Reply-To: <7a302ebe-1de1-734f-fb21-aadcc7904d37-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
On 2017-05-17 16:40, Christian König wrote:
> On 17.05.2017 at 10:01, Michel Dänzer wrote:
>> On 17/05/17 04:13 PM, zhoucm1 wrote:
>>> On 2017-05-17 14:57, Michel Dänzer wrote:
>>>> On 17/05/17 01:28 PM, zhoucm1 wrote:
>>>>> On 2017-05-17 11:15, Michel Dänzer wrote:
>>>>>> On 17/05/17 12:04 PM, zhoucm1 wrote:
>>>>>>> On 2017-05-17 09:18, Michel Dänzer wrote:
>>>>>>>> On 16/05/17 06:25 PM, Chunming Zhou wrote:
>>>>>>>>> Change-Id: I8eb6d7f558da05510e429d3bf1d48c8cec6c1977
>>>>>>>>> Signed-off-by: Chunming Zhou <David1.Zhou-5C7GfCeVMHo@public.gmane.org>
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>> index bca1fb5..f3e7525 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>> @@ -2547,6 +2547,9 @@ int amdgpu_vm_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>>>>>>>>>      case AMDGPU_VM_OP_UNRESERVE_VMID:
>>>>>>>>>          amdgpu_vm_free_reserved_vmid(adev, &fpriv->vm,
>>>>>>>>>                                       AMDGPU_GFXHUB);
>>>>>>>>>          break;
>>>>>>>>> +    case AMDGPU_VM_OP_RESET:
>>>>>>>>> +        fpriv->vram_lost_counter = atomic_read(&adev->vram_lost_counter);
>>>>>>>>> +        break;
>>>>>>>> How do you envision the UMDs using this? I can mostly think of them
>>>>>>>> calling this ioctl when a context is created or destroyed. But that
>>>>>>>> would also allow any other remaining contexts using the same DRM file
>>>>>>>> descriptor to use all ioctls again. So, I think there needs to be a
>>>>>>>> vram_lost_counter in struct amdgpu_ctx instead of in struct
>>>>>>>> amdgpu_fpriv.
>>>>>>> struct amdgpu_fpriv is the proper place for vram_lost_counter,
>>>>>>> especially for the ioctl return value.
>>>>>>> If you need to reset contexts one by one, we can mark all contexts of
>>>>>>> that VM, and then userspace can reset them.
>>>>>> I'm not following. With vram_lost_counter in amdgpu_fpriv, if any
>>>>>> context calls this ioctl, all other contexts using the same file
>>>>>> descriptor will also be considered safe again, right?
>>>>> Yes, but it really depends on the userspace requirement. If you need to
>>>>> reset contexts one by one, we can mark all contexts of that VM as
>>>>> guilty, and then userspace can reset one context at a time.
>>>> Still not sure what you mean by that.
>>>>
>>>> E.g. what do you mean by "guilty"? I thought that refers to the context
>>>> which caused a hang. But it seems like you're using it to refer to any
>>>> context which hasn't reacted yet to VRAM contents being lost.
>>> When VRAM is lost, we treat all contexts as needing a reset.
>> Essentially, your patches only track VRAM contents being lost per file
>> descriptor, not per context. I'm not sure (rather skeptical) that this
>> is suitable for OpenGL UMDs, since state is usually tracked per context.
>> Marek / Nicolai?
>
> Oh, yeah that's a good point.
>
> The problem with tracking it per context is that Vulkan also wants the
> ENODEV on amdgpu_gem_va_ioctl() and amdgpu_info_ioctl(), which are
> contextless.
>
> But thinking more about this, blocking those two doesn't make much
> sense. The VM content can be restored, and why should we disallow
> reading GPU info?
I can re-paste the list of Vulkan APIs that require ENODEV:
"
The Vulkan APIs listed below can return VK_ERROR_DEVICE_LOST according
to the spec.
I tried to provide a list of the u/k interfaces that could be called for
each vk API.
vkCreateDevice
  - amdgpu_device_initialize
  - amdgpu_query_gpu_info
vkQueueSubmit
  - amdgpu_cs_submit
vkWaitForFences
  - amdgpu_cs_wait_fences
vkGetEventStatus
vkQueueWaitIdle
vkDeviceWaitIdle
vkGetQueryPoolResults
  - amdgpu_cs_query_fence_status
vkQueueBindSparse
  - amdgpu_bo_va_op
  - amdgpu_bo_va_op_raw
vkCreateSwapchainKHR
vkAcquireNextImageKHR
vkQueuePresentKHR
  - Not related to the u/k interface.

Besides those listed above, I think
amdgpu_cs_signal_sem/amdgpu_cs_wait_sem should respond to GPU reset as
well."
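
For illustration only (this is not code from the patch series or from any
UMD): a minimal sketch of how a userspace driver might fold the return
value of one of the u/k interfaces above into a device-lost result. It
assumes the call returns 0 or a negative errno and that lost VRAM is
reported as -ENODEV, as discussed in this thread. The enum and the
function sketch_translate_error() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-ins for the VkResult codes a UMD would return.
 * A real driver would use the definitions from <vulkan/vulkan.h>.
 */
enum sketch_result {
	SKETCH_SUCCESS,
	SKETCH_ERROR_OUT_OF_HOST_MEMORY,
	SKETCH_ERROR_DEVICE_LOST,	/* stands in for VK_ERROR_DEVICE_LOST */
	SKETCH_ERROR_OTHER,
};

/*
 * Translate the return value of a u/k interface call into a result code.
 * Assumption: the call returns 0 on success and a negative errno on
 * failure, and the kernel reports lost VRAM as -ENODEV.
 */
static enum sketch_result sketch_translate_error(int ret)
{
	assert(ret <= 0);

	switch (ret) {
	case 0:
		return SKETCH_SUCCESS;
	case -ENODEV:
		return SKETCH_ERROR_DEVICE_LOST;
	case -ENOMEM:
		return SKETCH_ERROR_OUT_OF_HOST_MEMORY;
	default:
		return SKETCH_ERROR_OTHER;
	}
}
```

Each Vulkan entry point in the list would pass the return value of its
underlying call through such a helper, so -ENODEV from any of them
surfaces as device lost.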
>
> Christian.
>
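
To make the per-file-descriptor versus per-context question above
concrete, here is a small userspace model, offered only as a sketch. The
model_* structures and functions are hypothetical and are not the amdgpu
structures; only the idea of comparing a saved vram_lost_counter with the
device-wide counter is taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model, NOT the real amdgpu_device/amdgpu_fpriv/amdgpu_ctx. */
struct model_device { atomic_uint vram_lost_counter; };
struct model_file { struct model_device *dev; unsigned int vram_lost_counter; };
struct model_ctx { struct model_file *file; unsigned int vram_lost_counter; };

/* Snapshot the device counter when the file descriptor is opened. */
static void model_file_open(struct model_file *f, struct model_device *dev)
{
	assert(f != NULL && dev != NULL);
	f->dev = dev;
	f->vram_lost_counter = atomic_load(&dev->vram_lost_counter);
}

/* Snapshot the device counter when a context is created on that file. */
static void model_ctx_create(struct model_ctx *c, struct model_file *f)
{
	assert(c != NULL && f != NULL);
	c->file = f;
	c->vram_lost_counter = atomic_load(&f->dev->vram_lost_counter);
}

/* A GPU reset that lost VRAM contents bumps the device-wide counter. */
static void model_vram_lost(struct model_device *dev)
{
	atomic_fetch_add(&dev->vram_lost_counter, 1);
}

/* Per-fd tracking: one snapshot shared by every context on the file. */
static bool model_file_stale(const struct model_file *f)
{
	return f->vram_lost_counter != atomic_load(&f->dev->vram_lost_counter);
}

/* Models the effect of the proposed reset op on the file's snapshot. */
static void model_file_reset(struct model_file *f)
{
	f->vram_lost_counter = atomic_load(&f->dev->vram_lost_counter);
}

/* Per-context tracking: each context keeps and resets its own snapshot. */
static bool model_ctx_stale(const struct model_ctx *c)
{
	return c->vram_lost_counter !=
	       atomic_load(&c->file->dev->vram_lost_counter);
}

static void model_ctx_reset(struct model_ctx *c)
{
	c->vram_lost_counter = atomic_load(&c->file->dev->vram_lost_counter);
}
```

In this model, one model_file_reset() call clears the stale state for
everything sharing the file, while model_ctx_reset() on one context
leaves its siblings stale until each of them reacts.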