From: "Christian König" <ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Monk Liu <Monk.Liu-5C7GfCeVMHo@public.gmane.org>,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Subject: Re: [PATCH 2/4] drm/amdgpu: cleanups for vram lost handling
Date: Wed, 28 Feb 2018 13:22:34 +0100
Message-ID: <7b2cc0f2-fa4e-758e-1395-24d7bd48c898@gmail.com>
In-Reply-To: <1519802463-9090-2-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
On 28.02.2018 at 08:21, Monk Liu wrote:
> 1) Create a routine "handle_vram_lost" to do the VRAM
> recovery, and call it from amdgpu_device_reset/reset_sriov.
> This way there is no need for the extra parameter to hold the
> VRAM LOST information, and the related macros can be removed.
>
> 3) Show a vram_recover failure on timeout, and set TMO equal to
> lockup_timeout if vram_recover runs in SR-IOV runtime mode.
>
> 4) Report an error if any IP reset fails for SR-IOV.
>
> Change-Id: I686e2b6133844c14948c206a2315c064a78c1d9c
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Looks good to me, but I'm not 100% sure that all of this is technically correct.
Patch is Acked-by: Christian König <christian.koenig@amd.com> for now.
Andrey and/or David should probably take a look as well.
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 -
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 136 +++++++++++++++--------------
> 2 files changed, 72 insertions(+), 68 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 5bddfc1..abbc3f1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -181,10 +181,6 @@ extern int amdgpu_cik_support;
> #define CIK_CURSOR_WIDTH 128
> #define CIK_CURSOR_HEIGHT 128
>
> -/* GPU RESET flags */
> -#define AMDGPU_RESET_INFO_VRAM_LOST (1 << 0)
> -#define AMDGPU_RESET_INFO_FULLRESET (1 << 1)
> -
> struct amdgpu_device;
> struct amdgpu_ib;
> struct amdgpu_cs_parser;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e9d81a8..39ece7f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1591,6 +1591,8 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
>
> r = block->version->funcs->hw_init(adev);
> DRM_INFO("RE-INIT: %s %s\n", block->version->funcs->name, r?"failed":"successed");
> + if (r)
> + return r;
> }
> }
>
> @@ -1624,6 +1626,8 @@ static int amdgpu_device_ip_reinit_late_sriov(struct amdgpu_device *adev)
>
> r = block->version->funcs->hw_init(adev);
> DRM_INFO("RE-INIT: %s %s\n", block->version->funcs->name, r?"failed":"successed");
> + if (r)
> + return r;
> }
> }
>
> @@ -2471,17 +2475,71 @@ static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
> return r;
> }
>
> +static int amdgpu_handle_vram_lost(struct amdgpu_device *adev)
> +{
> + struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> + struct amdgpu_bo *bo, *tmp;
> + struct dma_fence *fence = NULL, *next = NULL;
> + long r = 1;
> + int i = 0;
> + long tmo;
> +
> + if (amdgpu_sriov_runtime(adev))
> + tmo = msecs_to_jiffies(amdgpu_lockup_timeout);
> + else
> + tmo = msecs_to_jiffies(100);
> +
> + DRM_INFO("recover vram bo from shadow start\n");
> + mutex_lock(&adev->shadow_list_lock);
> + list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
> + next = NULL;
> + amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
> + if (fence) {
> + r = dma_fence_wait_timeout(fence, false, tmo);
> + if (r == 0)
> + pr_err("wait fence %p[%d] timeout\n", fence, i);
> + else if (r < 0)
> + pr_err("wait fence %p[%d] interrupted\n", fence, i);
> + if (r < 1) {
> + dma_fence_put(fence);
> + fence = next;
> + break;
> + }
> + i++;
> + }
> +
> + dma_fence_put(fence);
> + fence = next;
> + }
> + mutex_unlock(&adev->shadow_list_lock);
> +
> + if (fence) {
> + r = dma_fence_wait_timeout(fence, false, tmo);
> + if (r == 0)
> + pr_err("wait fence %p[%d] timeout\n", fence, i);
> + else if (r < 0)
> + pr_err("wait fence %p[%d] interrupted\n", fence, i);
> +
> + }
> + dma_fence_put(fence);
> +
> + if (r > 0)
> + DRM_INFO("recover vram bo from shadow done\n");
> + else
> + DRM_ERROR("recover vram bo from shadow failed\n");
> +
> + return (r > 0?0:1);
> +}
> +
> /*
> * amdgpu_device_reset - reset ASIC/GPU for bare-metal or passthrough
> *
> * @adev: amdgpu device pointer
> - * @reset_flags: output param tells caller the reset result
> *
> * attempt to do soft-reset or full-reset and reinitialize Asic
> * return 0 means successed otherwise failed
> */
> -static int amdgpu_device_reset(struct amdgpu_device *adev,
> - uint64_t* reset_flags)
> +static int amdgpu_device_reset(struct amdgpu_device *adev)
> {
> bool need_full_reset, vram_lost = 0;
> int r;
> @@ -2496,7 +2554,6 @@ static int amdgpu_device_reset(struct amdgpu_device *adev,
> DRM_INFO("soft reset failed, will fallback to full reset!\n");
> need_full_reset = true;
> }
> -
> }
>
> if (need_full_reset) {
> @@ -2545,13 +2602,8 @@ static int amdgpu_device_reset(struct amdgpu_device *adev,
> }
> }
>
> - if (reset_flags) {
> - if (vram_lost)
> - (*reset_flags) |= AMDGPU_RESET_INFO_VRAM_LOST;
> -
> - if (need_full_reset)
> - (*reset_flags) |= AMDGPU_RESET_INFO_FULLRESET;
> - }
> + if (!r && ((need_full_reset && !(adev->flags & AMD_IS_APU)) || vram_lost))
> + r = amdgpu_handle_vram_lost(adev);
>
> return r;
> }
> @@ -2560,14 +2612,11 @@ static int amdgpu_device_reset(struct amdgpu_device *adev,
> * amdgpu_device_reset_sriov - reset ASIC for SR-IOV vf
> *
> * @adev: amdgpu device pointer
> - * @reset_flags: output param tells caller the reset result
> *
> * do VF FLR and reinitialize Asic
> * return 0 means successed otherwise failed
> */
> -static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
> - uint64_t *reset_flags,
> - bool from_hypervisor)
> +static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, bool from_hypervisor)
> {
> int r;
>
> @@ -2588,28 +2637,20 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
>
> /* now we are okay to resume SMC/CP/SDMA */
> r = amdgpu_device_ip_reinit_late_sriov(adev);
> + amdgpu_virt_release_full_gpu(adev, true);
> if (r)
> goto error;
>
> amdgpu_irq_gpu_reset_resume_helper(adev);
> r = amdgpu_ib_ring_tests(adev);
> - if (r)
> - dev_err(adev->dev, "[GPU_RESET] ib ring test failed (%d).\n", r);
>
> -error:
> - /* release full control of GPU after ib test */
> - amdgpu_virt_release_full_gpu(adev, true);
> -
> - if (reset_flags) {
> - if (adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
> - (*reset_flags) |= AMDGPU_RESET_INFO_VRAM_LOST;
> - atomic_inc(&adev->vram_lost_counter);
> - }
> -
> - /* VF FLR or hotlink reset is always full-reset */
> - (*reset_flags) |= AMDGPU_RESET_INFO_FULLRESET;
> + if (!r && adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
> + atomic_inc(&adev->vram_lost_counter);
> + r = amdgpu_handle_vram_lost(adev);
> }
>
> +error:
> +
> return r;
> }
>
> @@ -2673,42 +2714,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
> }
>
> if (amdgpu_sriov_vf(adev))
> - r = amdgpu_device_reset_sriov(adev, &reset_flags, job ? false : true);
> + r = amdgpu_device_reset_sriov(adev, job ? false : true);
> else
> - r = amdgpu_device_reset(adev, &reset_flags);
> -
> - if (!r) {
> - if (((reset_flags & AMDGPU_RESET_INFO_FULLRESET) && !(adev->flags & AMD_IS_APU)) ||
> - (reset_flags & AMDGPU_RESET_INFO_VRAM_LOST)) {
> - struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> - struct amdgpu_bo *bo, *tmp;
> - struct dma_fence *fence = NULL, *next = NULL;
> -
> - DRM_INFO("recover vram bo from shadow\n");
> - mutex_lock(&adev->shadow_list_lock);
> - list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
> - next = NULL;
> - amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
> - if (fence) {
> - r = dma_fence_wait(fence, false);
> - if (r) {
> - WARN(r, "recovery from shadow isn't completed\n");
> - break;
> - }
> - }
> -
> - dma_fence_put(fence);
> - fence = next;
> - }
> - mutex_unlock(&adev->shadow_list_lock);
> - if (fence) {
> - r = dma_fence_wait(fence, false);
> - if (r)
> - WARN(r, "recovery from shadow isn't completed\n");
> - }
> - dma_fence_put(fence);
> - }
> - }
> + r = amdgpu_device_reset(adev);
>
> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> struct amdgpu_ring *ring = adev->rings[i];
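
For readers following the return-value handling in amdgpu_handle_vram_lost above, here is a minimal user-space sketch of the convention the patch relies on: dma_fence_wait_timeout() returns a positive remaining timeout on success, 0 on timeout, and a negative value on error, and the new routine maps that to 0 (success) or 1 (failure). Everything named here (struct fake_fence, fake_fence_wait_timeout, pick_timeout_ms, recover_all) is a hypothetical stand-in and not part of the driver. The sketch works in milliseconds rather than jiffies, stops at the first failed wait, and does not model the pipelining of copies or the final wait after the loop.

```c
#include <stdio.h>

/* Hypothetical stand-in for a dma_fence: only records how long the copy takes. */
struct fake_fence {
	long ticks_needed;	/* time until the fence signals; negative models an interrupted wait */
};

/*
 * Models the return convention of dma_fence_wait_timeout():
 *   > 0   signaled, value is the remaining timeout
 *   == 0  timed out
 *   < 0   error (for example, the wait was interrupted)
 */
static long fake_fence_wait_timeout(const struct fake_fence *f, long tmo)
{
	if (f->ticks_needed < 0)
		return -1;			/* interrupted */
	if (f->ticks_needed >= tmo)
		return 0;			/* timed out */
	return tmo - f->ticks_needed;		/* remaining time, at least 1 */
}

/*
 * Mirrors the timeout choice in the patch: the (longer) lockup timeout in
 * SR-IOV runtime mode, otherwise a fixed 100 ms. Both arguments are
 * assumptions standing in for amdgpu_sriov_runtime() and amdgpu_lockup_timeout.
 */
static long pick_timeout_ms(int sriov_runtime, long lockup_timeout_ms)
{
	return sriov_runtime ? lockup_timeout_ms : 100;
}

/*
 * Waits on each fence in order and stops at the first failure.
 * Returns 0 on success and 1 on failure, the same mapping as
 * "return (r > 0?0:1);" in the patch.
 */
static int recover_all(const struct fake_fence *fences, int n, long tmo)
{
	long r = 1;	/* an empty list counts as success */
	int i;

	for (i = 0; i < n; i++) {
		r = fake_fence_wait_timeout(&fences[i], tmo);
		if (r == 0)
			fprintf(stderr, "wait fence [%d] timeout\n", i);
		else if (r < 0)
			fprintf(stderr, "wait fence [%d] interrupted\n", i);
		if (r < 1)
			break;
	}

	return r > 0 ? 0 : 1;
}
```

Calling recover_all() with a timeout from pick_timeout_ms() on an array of fake fences returns 0 only if every wait finishes inside the timeout. Initializing r to 1 is what makes an empty shadow list report success, which matches the "long r = 1;" initializer in the patch.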