Re: [PATCH] drm/amdgpu: Fix potential race processing vm->freed

From: "Christian König" <christian.koenig@amd.com>
To: Rob Clark <robdclark@gmail.com>, dri-devel@lists.freedesktop.org
Cc: Rob Clark <robdclark@chromium.org>,
	Philip Yang <Philip.Yang@amd.com>,
	Jammy Zhou <Jammy.Zhou@amd.com>,
	Felix Kuehling <Felix.Kuehling@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:RADEON and AMDGPU DRM DRIVERS"
	<amd-gfx@lists.freedesktop.org>, Qiang Yu <qiang.yu@amd.com>,
	Daniel Vetter <daniel@ffwll.ch>,
	Alex Deucher <alexander.deucher@amd.com>,
	David Airlie <airlied@gmail.com>
Subject: Re: [PATCH] drm/amdgpu: Fix potential race processing vm->freed
Date: Mon, 6 Feb 2023 11:14:49 +0100
Message-ID: <2d5fc6f8-2247-8a8b-1174-eccdc2b08064@amd.com>
In-Reply-To: <20230203181005.4129175-1-robdclark@gmail.com>

On 03.02.23 at 19:10, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
>
> If userspace calls the AMDGPU_CS ioctl from multiple threads, because
> the vm is global to the drm_file, you can end up with multiple threads
> racing in amdgpu_vm_clear_freed().  So the freed list should be
> protected with the status_lock, similar to other vm lists.

Well, this is nonsense. To process the freed list, the VM root PD lock
must be held anyway.

If we have a call path where this isn't true, then we have a major bug
somewhere else.

Regards,
Christian.

>
> Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 33 ++++++++++++++++++++++----
>   1 file changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index b9441ab457ea..aeed7bc1512f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1240,10 +1240,19 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>   	struct amdgpu_bo_va_mapping *mapping;
>   	uint64_t init_pte_value = 0;
>   	struct dma_fence *f = NULL;
> +	struct list_head freed;
>   	int r;
>   
> -	while (!list_empty(&vm->freed)) {
> -		mapping = list_first_entry(&vm->freed,
> +	/*
> +	 * Move the contents of the VM's freed list to a local list
> +	 * that we can iterate without racing against other threads:
> +	 */
> +	spin_lock(&vm->status_lock);
> +	list_replace_init(&vm->freed, &freed);
> +	spin_unlock(&vm->status_lock);
> +
> +	while (!list_empty(&freed)) {
> +		mapping = list_first_entry(&freed,
>   			struct amdgpu_bo_va_mapping, list);
>   		list_del(&mapping->list);
>   
> @@ -1258,6 +1267,15 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>   		amdgpu_vm_free_mapping(adev, vm, mapping, f);
>   		if (r) {
>   			dma_fence_put(f);
> +
> +			/*
> +			 * Move any unprocessed mappings back to the freed
> +			 * list:
> +			 */
> +			spin_lock(&vm->status_lock);
> +			list_splice_tail(&freed, &vm->freed);
> +			spin_unlock(&vm->status_lock);
> +
>   			return r;
>   		}
>   	}
> @@ -1583,11 +1601,14 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>   	mapping->bo_va = NULL;
>   	trace_amdgpu_vm_bo_unmap(bo_va, mapping);
>   
> -	if (valid)
> +	if (valid) {
> +		spin_lock(&vm->status_lock);
>   		list_add(&mapping->list, &vm->freed);
> -	else
> +		spin_unlock(&vm->status_lock);
> +	} else {
>   		amdgpu_vm_free_mapping(adev, vm, mapping,
>   				       bo_va->last_pt_update);
> +	}
>   
>   	return 0;
>   }
> @@ -1671,7 +1692,9 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>   		    tmp->last = eaddr;
>   
>   		tmp->bo_va = NULL;
> +		spin_lock(&vm->status_lock);
>   		list_add(&tmp->list, &vm->freed);
> +		spin_unlock(&vm->status_lock);
>   		trace_amdgpu_vm_bo_unmap(NULL, tmp);
>   	}
>   
> @@ -1788,7 +1811,9 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
>   		amdgpu_vm_it_remove(mapping, &vm->va);
>   		mapping->bo_va = NULL;
>   		trace_amdgpu_vm_bo_unmap(bo_va, mapping);
> +		spin_lock(&vm->status_lock);
>   		list_add(&mapping->list, &vm->freed);
> +		spin_unlock(&vm->status_lock);
>   	}
>   	list_for_each_entry_safe(mapping, next, &bo_va->invalids, list) {
>   		list_del(&mapping->list);
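
For readers unfamiliar with the pattern in the quoted patch, here is a minimal
userspace sketch in C. It is not amdgpu code: pthread_mutex_t stands in for the
kernel spinlock, and struct vm, struct mapping, list_take_all() (a stand-in for
the kernel's list_replace_init()) and the other helpers are hypothetical names
invented for this illustration. The idea is to detach the whole shared list
while holding the lock, then walk the private copy with the lock dropped. As
the reply above points out, this extra locking only matters if callers are not
already serialized by another lock, which in the driver is expected to be the
VM root PD lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static int list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_add_front(struct list_node *entry, struct list_node *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_remove(struct list_node *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = entry;
}

/* Move all entries from 'from' to 'to' and leave 'from' empty. */
static void list_take_all(struct list_node *from, struct list_node *to)
{
	if (list_is_empty(from)) {
		list_init(to);
		return;
	}
	to->next = from->next;
	to->prev = from->prev;
	to->next->prev = to;
	to->prev->next = to;
	list_init(from);
}

/* Hypothetical stand-ins for the driver structures. */
struct mapping {
	struct list_node link;	/* first member, so a node pointer casts back */
	int id;
};

struct vm {
	pthread_mutex_t status_lock;
	struct list_node freed;
};

static void vm_init(struct vm *vm)
{
	pthread_mutex_init(&vm->status_lock, NULL);
	list_init(&vm->freed);
}

/* Producer side: queue a mapping on the shared list under the lock. */
static void vm_queue_freed(struct vm *vm, struct mapping *m)
{
	pthread_mutex_lock(&vm->status_lock);
	list_add_front(&m->link, &vm->freed);
	pthread_mutex_unlock(&vm->status_lock);
}

/* Consumer side: detach everything under the lock, process it unlocked.
 * Returns the number of mappings processed. */
static int vm_clear_freed(struct vm *vm)
{
	struct list_node local;
	int processed = 0;

	pthread_mutex_lock(&vm->status_lock);
	list_take_all(&vm->freed, &local);
	pthread_mutex_unlock(&vm->status_lock);

	while (!list_is_empty(&local)) {
		struct mapping *m = (struct mapping *)local.next;

		list_remove(&m->link);
		processed++;	/* real code would update page tables here */
	}
	return processed;
}
```

Because the lock is held only for the detach, producers are never blocked
behind the potentially slow per-mapping work. The cost, which the quoted patch
handles with list_splice_tail(), is that an error path has to put any
unprocessed entries back on the shared list.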
