All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: Felix Kuehling <felix.kuehling@amd.com>, amd-gfx@lists.freedesktop.org
Cc: xiaogang.chen@amd.com, ramesh.errabolu@amd.com, christian.koenig@amd.com
Subject: Re: [PATCH 2/6] drm/amdgpu: New VM state for evicted user BOs
Date: Wed, 8 Nov 2023 13:28:59 +0100	[thread overview]
Message-ID: <c3a50dd5-d621-46ac-8cd3-d1fa5bed30bd@gmail.com> (raw)
In-Reply-To: <8b63ff39-b1f2-4dc5-970c-29d96c63b18b@amd.com>

Not necessary objections to this patch here, but rather how this new 
state is used later on.

The fundamental problem is that re-validating things in 
amdgpu_vm_handle_moved() won't work in all cases.

The general approach for both classic CS IOCTL as well as user queues is 
the following:

1. Grab all the necessary locks
     The VM lock + either everything inside the VM (user queues) or just 
the BOs in the submission (CS IOCTL).

2. Validate everything locked
     It can be that additional BO locks are grabbed for eviction and 
even locked BOs are shuffled around during that.

3. Update the PDEs and PTEs
     This can be done only after validating everything because things 
can still shuffle around during validation.

4. Do your CS or make the userqueu / process runable etc...

5. Attach fences to the locked BOs.

6. Unlock everything again.

I think that is part of the problem why handling all updates in 
amdgpu_vm_handle_moved() for the CS doesn't work and sooner or later you 
might run into the same issue during process restore as well.

If I'm not completely mistaken this should be solvable by moving all 
validation to amdgpu_vm_validate_pt_bos() instead (but let us rename 
that function).

Regards,
Christian.

Am 07.11.23 um 23:11 schrieb Felix Kuehling:
> Hi Christian,
>
> I know you have objected to this patch before. I still think this is 
> the best solution for what I need. I can talk you through my reasoning 
> by email or offline. If I can't convince you, I will need your 
> guidance for a better solution.
>
> Thanks,
>   Felix
>
>
> On 2023-11-07 11:58, Felix Kuehling wrote:
>> Create a new VM state to track user BOs that are in the system domain.
>> In the next patch this will be used do conditionally re-validate them in
>> amdgpu_vm_handle_moved.
>>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 +++++++++++++++++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  5 ++++-
>>   2 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 1442d97ddd0f..0d685577243c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -232,6 +232,22 @@ static void amdgpu_vm_bo_invalidated(struct 
>> amdgpu_vm_bo_base *vm_bo)
>>       spin_unlock(&vm_bo->vm->status_lock);
>>   }
>>   +/**
>> + * amdgpu_vm_bo_evicted_user - vm_bo is evicted
>> + *
>> + * @vm_bo: vm_bo which is evicted
>> + *
>> + * State for BOs used by user mode queues which are not at the 
>> location they
>> + * should be.
>> + */
>> +static void amdgpu_vm_bo_evicted_user(struct amdgpu_vm_bo_base *vm_bo)
>> +{
>> +    vm_bo->moved = true;
>> +    spin_lock(&vm_bo->vm->status_lock);
>> +    list_move(&vm_bo->vm_status, &vm_bo->vm->evicted_user);
>> +    spin_unlock(&vm_bo->vm->status_lock);
>> +}
>> +
>>   /**
>>    * amdgpu_vm_bo_relocated - vm_bo is reloacted
>>    *
>> @@ -2110,6 +2126,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, 
>> struct amdgpu_vm *vm, int32_t xcp
>>       for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
>>           vm->reserved_vmid[i] = NULL;
>>       INIT_LIST_HEAD(&vm->evicted);
>> +    INIT_LIST_HEAD(&vm->evicted_user);
>>       INIT_LIST_HEAD(&vm->relocated);
>>       INIT_LIST_HEAD(&vm->moved);
>>       INIT_LIST_HEAD(&vm->idle);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> index 12cac06fa289..939d0c2219c0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> @@ -286,9 +286,12 @@ struct amdgpu_vm {
>>       /* Lock to protect vm_bo add/del/move on all lists of vm */
>>       spinlock_t        status_lock;
>>   -    /* BOs who needs a validation */
>> +    /* Per VM and PT BOs who needs a validation */
>>       struct list_head    evicted;
>>   +    /* BOs for user mode queues that need a validation */
>> +    struct list_head    evicted_user;
>> +
>>       /* PT BOs which relocated and their parent need an update */
>>       struct list_head    relocated;


  reply	other threads:[~2023-11-08 12:29 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-07 16:58 [PATCH 1/6] drm/amdgpu: Fix possible null pointer dereference Felix Kuehling
2023-11-07 16:58 ` [PATCH 2/6] drm/amdgpu: New VM state for evicted user BOs Felix Kuehling
2023-11-07 22:11   ` Felix Kuehling
2023-11-08 12:28     ` Christian König [this message]
2023-11-08 21:23       ` Felix Kuehling
2023-11-09  8:12         ` Christian König
2023-11-14 22:26           ` Felix Kuehling
2023-11-07 16:58 ` [PATCH 3/6] drm/amdgpu: Auto-validate DMABuf imports in compute VMs Felix Kuehling
2023-11-07 20:24   ` Joshi, Mukul
2023-11-07 16:58 ` [PATCH 4/6] drm/amdkfd: Export DMABufs from KFD using GEM handles Felix Kuehling
2023-11-07 19:44   ` Errabolu, Ramesh
2023-11-07 19:56     ` Felix Kuehling
2023-11-08  2:08       ` Errabolu, Ramesh
2023-11-16 21:53   ` Felix Kuehling
2023-11-17 16:30     ` Christian König
2023-11-07 16:58 ` [PATCH 5/6] drm/amdkfd: Import DMABufs for interop through DRM Felix Kuehling
2023-11-08  0:01   ` Errabolu, Ramesh
2023-11-08 23:20   ` Chen, Xiaogang
2023-11-08 23:26     ` Felix Kuehling
2023-11-08 23:41       ` Chen, Xiaogang
2023-11-09  8:16   ` Christian König
2023-11-07 16:58 ` [PATCH 6/6] drm/amdkfd: Bump KFD ioctl version Felix Kuehling
2023-11-08 12:11 ` [PATCH 1/6] drm/amdgpu: Fix possible null pointer dereference Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c3a50dd5-d621-46ac-8cd3-d1fa5bed30bd@gmail.com \
    --to=ckoenig.leichtzumerken@gmail.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=felix.kuehling@amd.com \
    --cc=ramesh.errabolu@amd.com \
    --cc=xiaogang.chen@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.