AMD-GFX Archive on lore.kernel.org
From: "Sharma, Shashank" <shashank.sharma@amd.com>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>,
	amd-gfx@lists.freedesktop.org
Cc: Alex Deucher <alexander.deucher@amd.com>,
	Christian Koenig <christian.koenig@amd.com>,
	Arvind Yadav <arvind.yadav@amd.com>
Subject: Re: [PATCH v11 24/28] drm/amdgpu: resume gfx userqueues
Date: Wed, 25 Sep 2024 11:15:50 +0200
Message-ID: <d9110bdb-06f5-4951-b09a-2fbdd6d7f516@amd.com>
In-Reply-To: <8a6fc562-277b-4162-ad0d-3ee0f42a55c4@gmail.com>


On 17/09/2024 14:30, Christian König wrote:
> On 09.09.24 at 22:06, Shashank Sharma wrote:
>> This patch adds support for resuming userqueues. Specifically, it:
>> - adds a new delayed work item that resumes all the queues.
>> - schedules this delayed work from the suspend worker.
>> - validates the BOs and replaces the eviction fence before resuming all
>>    the queues running under this userq manager instance.
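For readers following the flow above, here is a minimal userspace sketch of the ordering it describes: the suspend worker stops the queues and queues the resume work with no delay, and the resume worker restarts the queues only after BO validation and the new eviction fence succeed. Everything in it (model_uq_mgr, model_suspend_worker(), model_resume_worker(), the fail_validation knob) is hypothetical and single-threaded; it does not model userq_mutex, drm_exec, or any real amdgpu type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of the suspend/resume ordering.
 * None of these names exist in amdgpu. */
#define MODEL_MAX_QUEUES 8

enum model_queue_state { MODEL_QUEUE_ACTIVE, MODEL_QUEUE_SUSPENDED };

struct model_uq_mgr {
	enum model_queue_state queues[MODEL_MAX_QUEUES];
	int num_queues;
	bool bos_valid;            /* "all BOs validated" after eviction */
	unsigned int fence_seqno;  /* bumped whenever a new eviction fence is attached */
	bool resume_pending;       /* stands in for the queued resume_work */
	bool fail_validation;      /* test knob: make BO validation fail */
};

static void model_uq_mgr_init(struct model_uq_mgr *m, int num_queues)
{
	assert(num_queues >= 0 && num_queues <= MODEL_MAX_QUEUES);
	*m = (struct model_uq_mgr){ .num_queues = num_queues, .bos_valid = true };
	for (int i = 0; i < num_queues; i++)
		m->queues[i] = MODEL_QUEUE_ACTIVE;
}

/* Like the suspend worker: stop every queue, then queue the resume work
 * with zero delay (the patch uses schedule_delayed_work(..., 0)). */
static void model_suspend_worker(struct model_uq_mgr *m)
{
	for (int i = 0; i < m->num_queues; i++)
		m->queues[i] = MODEL_QUEUE_SUSPENDED;
	m->bos_valid = false;      /* eviction may have moved BOs */
	m->resume_pending = true;
}

/* Like the validate step: make BOs valid, then attach a fresh fence. */
static int model_validate_bos(struct model_uq_mgr *m)
{
	if (m->fail_validation)
		return -1;         /* generic failure for the model */
	m->bos_valid = true;
	m->fence_seqno++;
	return 0;
}

/* Like the resume worker: queues restart only if validation succeeded. */
static int model_resume_worker(struct model_uq_mgr *m)
{
	int ret;

	m->resume_pending = false;
	ret = model_validate_bos(m);
	if (ret)
		return ret;        /* queues stay suspended */
	for (int i = 0; i < m->num_queues; i++)
		m->queues[i] = MODEL_QUEUE_ACTIVE;
	return 0;
}
```

The point of the ordering is that a failed validation leaves the queues suspended rather than running them against BOs that may not be resident.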
>>
>> V2: Addressed Christian's review comments:
>>      - declare local variables like ret at the bottom.
>>      - lock all the objects first, then start attaching the new fence.
>>      - don't replace the old eviction fence, just attach a new eviction fence.
>>      - no error logs for drm_exec_lock failures
>>      - no need to reserve BOs after drm_exec_locked
>>      - schedule the resume worker immediately (not after 100 ms)
>>      - check for NULL BO (Arvind)
>>
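The V2 note about locking all objects before attaching the new fence follows the usual drm_exec pattern: if any lock reports contention, drop everything, take the contended lock first, and restart. The sketch below models that loop in userspace under stated assumptions: model_obj, model_lock_obj(), and model_lock_all_then_fence() are hypothetical, contention is simulated with a counter that yields -EDEADLK (the code ww_mutex uses for contention), and the slow-path lock is assumed to always succeed. It is not the drm_exec implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lockable object; not a real GEM/TTM object. */
struct model_obj {
	bool locked;          /* held by this transaction */
	int contention_left;  /* simulated: attempts that will report -EDEADLK */
	int fence;            /* 0 = no fence attached */
};

static void model_unlock_all(struct model_obj *objs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		objs[i].locked = false;
}

/* Try-lock one object; already-held objects are skipped, similar in
 * spirit to DRM_EXEC_IGNORE_DUPLICATES. */
static int model_lock_obj(struct model_obj *obj, struct model_obj **contended)
{
	if (obj->locked)
		return 0;
	if (obj->contention_left > 0) {
		obj->contention_left--;
		*contended = obj;
		return -EDEADLK;
	}
	obj->locked = true;
	return 0;
}

/* Lock every object (restarting on contention), then attach the fence. */
static int model_lock_all_then_fence(struct model_obj *objs, size_t n,
				     int fence, int *retries)
{
	struct model_obj *contended = NULL;
	int ret;

	*retries = 0;
retry:
	if (contended) {
		/* Slow path, assumed to always succeed: wait for the contended
		 * object and hold it before retrying the rest. */
		contended->contention_left = 0;
		contended->locked = true;
		contended = NULL;
	}
	for (size_t i = 0; i < n; i++) {
		ret = model_lock_obj(&objs[i], &contended);
		if (ret == -EDEADLK) {
			model_unlock_all(objs, n);  /* drop everything, restart */
			(*retries)++;
			goto retry;
		}
		if (ret) {
			model_unlock_all(objs, n);
			return ret;
		}
	}

	/* Every object is held: only now attach the new fence. */
	for (size_t i = 0; i < n; i++)
		objs[i].fence = fence;

	model_unlock_all(objs, n);
	return 0;
}
```

The property being illustrated is that the fence loop runs only once the whole set is held, so no object gets the new fence while another object in the set is still unlocked.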
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 120 ++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |   1 +
>>   2 files changed, 121 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 979174f80993..e7f7354e0c0e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -405,6 +405,122 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>>       return r;
>>   }
>>
>> +static int
>> +amdgpu_userqueue_resume_all(struct amdgpu_userq_mgr *uq_mgr)
>> +{
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    const struct amdgpu_userq_funcs *userq_funcs;
>> +    struct amdgpu_usermode_queue *queue;
>> +    int queue_id, ret;
>> +
>> +    userq_funcs = adev->userq_funcs[AMDGPU_HW_IP_GFX];
>> +
>> +    /* Resume all the queues for this process */
>> +    idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
>> +        ret = userq_funcs->resume(uq_mgr, queue);
>> +        if (ret)
>> +            DRM_ERROR("Failed to resume queue %d\n", queue_id);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static int
>> +amdgpu_userqueue_validate_bos(struct amdgpu_userq_mgr *uq_mgr)
>> +{
>> +    struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
>> +    struct amdgpu_vm *vm = &fpriv->vm;
>> +    struct amdgpu_bo_va *bo_va, *tmp;
>> +    struct drm_exec exec;
>> +    struct amdgpu_bo *bo;
>> +    int ret;
>> +
>> +    drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
>> +    drm_exec_until_all_locked(&exec) {
>> +        ret = amdgpu_vm_lock_pd(vm, &exec, 2);
>> +        drm_exec_retry_on_contention(&exec);
>> +        if (unlikely(ret)) {
>> +            DRM_ERROR("Failed to lock PD\n");
>
> I would drop those error messages in the low level function.
>
> Apart from contention, the most likely reason locking a BO fails
> is that we were interrupted, and in that case we don't want to
> print anything.
>
> Apart from that, I really need to wrap my head around the VM code once
> more, but this should probably work for now.

Noted, I will remove the error messages.

- Shashank
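Christian's point can be illustrated with a small sketch: the low-level helper only returns an error code, and the caller decides whether a failure is worth a message, skipping interruptions. The names below (model_lock_pd(), model_validate(), model_should_log(), model_report()) are hypothetical, a counter stands in for DRM_ERROR(), and only -EINTR is treated as an interruption; in-kernel paths may also see -ERESTARTSYS, which userspace headers do not define.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical low-level helper: reports failure only through its
 * return value, never by printing. */
static int model_lock_pd(int simulated_err)
{
	return simulated_err;
}

/* Hypothetical validate step: propagates the error silently. */
static int model_validate(int simulated_err)
{
	return model_lock_pd(simulated_err);
}

/* An interrupted lock (e.g. a pending signal) is expected; don't log it. */
static bool model_should_log(int err)
{
	return err != 0 && err != -EINTR;
}

/* Caller side: logs at most once; *messages stands in for DRM_ERROR(). */
static int model_report(int simulated_err, int *messages)
{
	int ret = model_validate(simulated_err);

	if (model_should_log(ret))
		(*messages)++;
	return ret;
}
```

Keeping the decision in one place also avoids the same failure being reported at several levels of the call chain.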

>
> Regards,
> Christian.
>
>> +            goto unlock_all;
>> +        }
>> +
>> +        /* Lock the done list */
>> +        list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status) {
>> +            bo = bo_va->base.bo;
>> +            if (!bo)
>> +                continue;
>> +
>> +            ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
>> +            drm_exec_retry_on_contention(&exec);
>> +            if (unlikely(ret))
>> +                goto unlock_all;
>> +        }
>> +
>> +        /* Lock the invalidated list */
>> +        list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) {
>> +            bo = bo_va->base.bo;
>> +            if (!bo)
>> +                continue;
>> +
>> +            ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
>> +            drm_exec_retry_on_contention(&exec);
>> +            if (unlikely(ret))
>> +                goto unlock_all;
>> +        }
>> +    }
>> +
>> +    /* Now validate BOs */
>> +    list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) {
>> +        bo = bo_va->base.bo;
>> +        if (!bo)
>> +            continue;
>> +
>> +        ret = amdgpu_userqueue_validate_vm_bo(NULL, bo);
>> +        if (ret) {
>> +            DRM_ERROR("Failed to validate BO\n");
>> +            goto unlock_all;
>> +        }
>> +    }
>> +
>> +    /* Handle the moved BOs */
>> +    ret = amdgpu_vm_handle_moved(uq_mgr->adev, vm, &exec.ticket);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to handle moved BOs\n");
>> +        goto unlock_all;
>> +    }
>> +
>> +    ret = amdgpu_eviction_fence_replace_fence(fpriv);
>> +    if (ret)
>> +        DRM_ERROR("Failed to replace eviction fence\n");
>> +
>> +unlock_all:
>> +    drm_exec_fini(&exec);
>> +    return ret;
>> +}
>> +
>> +static void amdgpu_userqueue_resume_worker(struct work_struct *work)
>> +{
>> +    struct amdgpu_userq_mgr *uq_mgr = work_to_uq_mgr(work, resume_work.work);
>> +    int ret;
>> +
>> +    mutex_lock(&uq_mgr->userq_mutex);
>> +
>> +    ret = amdgpu_userqueue_validate_bos(uq_mgr);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to validate BOs to restore\n");
>> +        goto unlock;
>> +    }
>> +
>> +    ret = amdgpu_userqueue_resume_all(uq_mgr);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to resume all queues\n");
>> +        goto unlock;
>> +    }
>> +
>> +unlock:
>> +    mutex_unlock(&uq_mgr->userq_mutex);
>> +}
>> +
>>   static int
>>   amdgpu_userqueue_suspend_all(struct amdgpu_userq_mgr *uq_mgr)
>>   {
>> @@ -486,6 +602,9 @@ amdgpu_userqueue_suspend_worker(struct work_struct *work)
>>       /* Cleanup old eviction fence entry */
>>       amdgpu_eviction_fence_destroy(evf_mgr);
>>
>> +    /* Schedule a work to restore userqueue */
>> +    schedule_delayed_work(&uq_mgr->resume_work, 0);
>> +
>>   unlock:
>>       mutex_unlock(&uq_mgr->userq_mutex);
>>   }
>> @@ -508,6 +627,7 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
>>       /* This reference is required for suspend work */
>>       fpriv->evf_mgr.ev_fence->uq_mgr = userq_mgr;
>>       INIT_DELAYED_WORK(&userq_mgr->suspend_work, amdgpu_userqueue_suspend_worker);
>> +    INIT_DELAYED_WORK(&userq_mgr->resume_work, amdgpu_userqueue_resume_worker);
>>       return 0;
>>   }
>>
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 8b3b50fa8b5b..d035b5c2b14b 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -76,6 +76,7 @@ struct amdgpu_userq_mgr {
>>       struct amdgpu_device        *adev;
>>         struct delayed_work        suspend_work;
>> +    struct delayed_work        resume_work;
>>       int num_userqs;
>>   };
>
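One detail in the quoted amdgpu_userqueue_resume_all(): `ret` is only assigned inside idr_for_each_entry(), so with no queues the function returns an uninitialized value, and with several queues only the last queue's result is returned. Below is a minimal sketch of one way to make the result well-defined, assuming a plain array of queue IDs instead of an IDR and a hypothetical resume callback (model_resume_all() and model_fail_odd() are illustrative only): start from 0, keep the first error, and still attempt every queue.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for userq_funcs->resume(). */
typedef int (*model_resume_fn)(int queue_id);

/* Demo callback: pretend odd-numbered queues fail to resume. */
static int model_fail_odd(int queue_id)
{
	return (queue_id & 1) ? -EIO : 0;
}

/* Resume every queue; return 0 or the first error, and count failures. */
static int model_resume_all(const int *queue_ids, size_t n,
			    model_resume_fn resume, int *failed)
{
	int ret = 0;  /* well-defined even when n == 0 */

	*failed = 0;
	for (size_t i = 0; i < n; i++) {
		int r = resume(queue_ids[i]);

		if (r) {
			(*failed)++;
			if (!ret)
				ret = r;  /* remember the first error, keep going */
		}
	}
	return ret;
}
```

Whether to stop at the first failure or keep going is a policy choice; continuing matches what the quoted loop already does, it just makes the reported result deterministic.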


Thread overview: 38+ messages
2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 01/28] drm/amdgpu: UAPI for user queue management Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 02/28] drm/amdgpu: add usermode queue base code Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 03/28] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 04/28] drm/amdgpu: add helpers to create userqueue object Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 05/28] drm/amdgpu: create MES-V11 usermode queue for GFX Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 06/28] drm/amdgpu: create context space for usermode queue Shashank Sharma
2024-10-18 17:39   ` Alex Deucher
2024-09-09 20:05 ` [PATCH v11 07/28] drm/amdgpu: map usermode queue into MES Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 08/28] drm/amdgpu: map wptr BO into GART Shashank Sharma
2024-09-16 12:39   ` Christian König
2024-09-09 20:06 ` [PATCH v11 09/28] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 10/28] drm/amdgpu: cleanup leftover queues Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 11/28] drm/amdgpu: enable GFX-V11 userqueue support Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 12/28] drm/amdgpu: enable SDMA usermode queues Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 13/28] drm/amdgpu: enable compute/gfx usermode queue Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 14/28] drm/amdgpu: update userqueue BOs and PDs Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 15/28] drm/amdgpu: add kernel config for gfx-userqueue Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 21/28] drm/amdgpu: add gfx eviction fence helpers Shashank Sharma
2024-09-16 14:14   ` Christian König
2024-09-25  9:08     ` Sharma, Shashank
2024-09-09 20:06 ` [PATCH v11 22/28] drm/amdgpu: add userqueue suspend/resume functions Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 23/28] drm/amdgpu: suspend gfx userqueues Shashank Sharma
2024-09-17 11:58   ` Christian König
2024-09-25  9:13     ` Sharma, Shashank
2024-09-09 20:06 ` [PATCH v11 24/28] drm/amdgpu: resume " Shashank Sharma
2024-09-17 12:30   ` Christian König
2024-09-25  9:15     ` Sharma, Shashank [this message]
2024-09-09 20:06 ` [PATCH v11 25/28] drm/amdgpu: Add input fence to sync bo unmap Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 26/28] drm/amdgpu: fix MES GFX mask Shashank Sharma
2024-09-17 12:21   ` Christian König
2024-09-09 20:06 ` [PATCH v11 27/28] Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV" Shashank Sharma
2024-09-09 20:31   ` Alex Deucher
2024-09-11  9:20     ` Sharma, Shashank
2024-09-09 20:06 ` [PATCH v11 28/28] Revert "drm/amdgpu: don't allow userspace to create a doorbell BO" Shashank Sharma
2024-09-17 12:25   ` Christian König
2024-09-19 16:59 ` [PATCH v11 00/28] AMDGPU usermode queues Alex Deucher
2024-09-25  9:14   ` Sharma, Shashank
