* [PATCH] drm/i915: Fix a race between vma / object destruction and unbinding
@ 2022-01-27 11:56 Thomas Hellström
2022-01-27 11:57 ` Maarten Lankhorst
2022-01-28 22:32 ` [Intel-gfx] " Tvrtko Ursulin
0 siblings, 2 replies; 4+ messages in thread
From: Thomas Hellström @ 2022-01-27 11:56 UTC (permalink / raw)
To: intel-gfx, dri-devel; +Cc: Thomas Hellström
The vma destruction code was using an unlocked advisory check for
drm_mm_node_allocated() to avoid racing with eviction code unbinding
the vma.
This is very fragile and prohibits the dereference of non-refcounted
pointers of dying vmas after a call to __i915_vma_unbind(). It also
prohibits the dereference of vma->obj of refcounted pointers of
dying vmas after a call to __i915_vma_unbind(), since even if a
refcount is held on the vma, that won't guarantee that its backing
object doesn't get destroyed.
So introduce an unbind under the vm mutex at object destroy time,
removing all weak references of the vma and its object from the
object vma list and from the vm bound list.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 1a9e1f940a7d..e03e362d320b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -280,6 +280,12 @@ void __i915_gem_object_pages_fini(struct drm_i915_gem_object *obj)
 			GEM_BUG_ON(vma->obj != obj);
 			spin_unlock(&obj->vma.lock);
 
+			/* Verify that the vma is unbound under the vm mutex. */
+			mutex_lock(&vma->vm->mutex);
+			atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
+			__i915_vma_unbind(vma);
+			mutex_unlock(&vma->vm->mutex);
+
 			__i915_vma_put(vma);
 
 			spin_lock(&obj->vma.lock);
--
2.34.1
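
The locking pattern described in the commit message can be sketched in userspace as follows. This is only an illustrative analogue built on pthreads, not the i915 code: `struct toy_vm`, `struct toy_vma`, `toy_unbind_locked()`, `toy_evict()`, `toy_destroy()` and `toy_race_once()` are hypothetical names invented for this sketch. The point it tries to show is that the destroy path does not peek at the bound state without the lock; it takes the mutex and calls an unbind that is harmless if eviction got there first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; these are not i915 types. */
struct toy_vm {
	pthread_mutex_t mutex;	/* plays the role of the vm mutex */
	int nr_bound;		/* size of the toy "bound list" */
};

struct toy_vma {
	struct toy_vm *vm;
	bool bound;		/* plays the role of the node-allocated state */
};

/* Caller must hold vm->mutex. Does nothing if the vma is already unbound. */
static void toy_unbind_locked(struct toy_vma *vma)
{
	if (!vma->bound)
		return;
	vma->bound = false;
	vma->vm->nr_bound--;
}

/* Eviction path: unbinds under the mutex. */
static void *toy_evict(void *arg)
{
	struct toy_vma *vma = arg;

	pthread_mutex_lock(&vma->vm->mutex);
	toy_unbind_locked(vma);
	pthread_mutex_unlock(&vma->vm->mutex);
	return NULL;
}

/*
 * Destroy path: no unlocked "if (vma->bound)" check. It always takes the
 * mutex, so after it returns the vma is guaranteed to be off the bound list.
 */
static void toy_destroy(struct toy_vma *vma)
{
	pthread_mutex_lock(&vma->vm->mutex);
	toy_unbind_locked(vma);
	pthread_mutex_unlock(&vma->vm->mutex);
}

/*
 * Race one evictor thread against destruction. Returns the final bound
 * count, or -1 if the thread could not be created.
 */
int toy_race_once(void)
{
	struct toy_vm vm = { .nr_bound = 1 };
	struct toy_vma vma = { .vm = &vm, .bound = true };
	pthread_t evictor;
	int left;

	pthread_mutex_init(&vm.mutex, NULL);
	if (pthread_create(&evictor, NULL, toy_evict, &vma)) {
		pthread_mutex_destroy(&vm.mutex);
		return -1;
	}
	toy_destroy(&vma);
	pthread_join(evictor, NULL);

	assert(!vma.bound);
	left = vm.nr_bound;
	pthread_mutex_destroy(&vm.mutex);
	return left;
}
```

Whichever thread wins the mutex performs the unbind and the other finds nothing left to do, so `toy_race_once()` returns 0 regardless of scheduling. Build with `-pthread` and call it from a `main()` of your own.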
* Re: [PATCH] drm/i915: Fix a race between vma / object destruction and unbinding
From: Maarten Lankhorst @ 2022-01-27 11:57 UTC (permalink / raw)
To: Thomas Hellström, intel-gfx, dri-devel
On 27-01-2022 at 12:56, Thomas Hellström wrote:
> The vma destruction code was using an unlocked advisory check for
> drm_mm_node_allocated() to avoid racing with eviction code unbinding
> the vma.
>
> This is very fragile and prohibits the dereference of non-refcounted
> pointers of dying vmas after a call to __i915_vma_unbind(). It also
> prohibits the dereference of vma->obj of refcounted pointers of
> dying vmas after a call to __i915_vma_unbind(), since even if a
> refcount is held on the vma, that won't guarantee that its backing
> object doesn't get destroyed.
>
> So introduce an unbind under the vm mutex at object destroy time,
> removing all weak references of the vma and its object from the
> object vma list and from the vm bound list.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 1a9e1f940a7d..e03e362d320b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -280,6 +280,12 @@ void __i915_gem_object_pages_fini(struct drm_i915_gem_object *obj)
> GEM_BUG_ON(vma->obj != obj);
> spin_unlock(&obj->vma.lock);
>
> + /* Verify that the vma is unbound under the vm mutex. */
> + mutex_lock(&vma->vm->mutex);
> + atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
> + __i915_vma_unbind(vma);
> + mutex_unlock(&vma->vm->mutex);
> +
> __i915_vma_put(vma);
>
> spin_lock(&obj->vma.lock);
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* Re: [Intel-gfx] [PATCH] drm/i915: Fix a race between vma / object destruction and unbinding
From: Tvrtko Ursulin @ 2022-01-28 22:32 UTC (permalink / raw)
To: Thomas Hellström, intel-gfx, dri-devel
On 27/01/2022 11:56, Thomas Hellström wrote:
> The vma destruction code was using an unlocked advisory check for
> drm_mm_node_allocated() to avoid racing with eviction code unbinding
> the vma.
>
> This is very fragile and prohibits the dereference of non-refcounted
> pointers of dying vmas after a call to __i915_vma_unbind(). It also
> prohibits the dereference of vma->obj of refcounted pointers of
> dying vmas after a call to __i915_vma_unbind(), since even if a
> refcount is held on the vma, that won't guarantee that its backing
> object doesn't get destroyed.
>
> So introduce an unbind under the vm mutex at object destroy time,
> removing all weak references of the vma and its object from the
> object vma list and from the vm bound list.
Maarten suggested this fixes an oops like the one seen in
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22133/shard-snb6/igt@gem_softpin@softpin.html.
If that is so, what would be the Fixes: tag to put here? It is too late
to add one now, so hopefully the bug was introduced in something not yet
released.
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 1a9e1f940a7d..e03e362d320b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -280,6 +280,12 @@ void __i915_gem_object_pages_fini(struct drm_i915_gem_object *obj)
> GEM_BUG_ON(vma->obj != obj);
> spin_unlock(&obj->vma.lock);
>
> + /* Verify that the vma is unbound under the vm mutex. */
> + mutex_lock(&vma->vm->mutex);
> + atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
> + __i915_vma_unbind(vma);
> + mutex_unlock(&vma->vm->mutex);
Hm, I am not up to speed with the latest design, but how do the verb
"verify" and the absence of conditionals reconcile here? Does the comment
need improving?
Regards,
Tvrtko
> +
> __i915_vma_put(vma);
>
> spin_lock(&obj->vma.lock);
>
* Re: [Intel-gfx] [PATCH] drm/i915: Fix a race between vma / object destruction and unbinding
From: Thomas Hellström @ 2022-01-29 11:38 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx, dri-devel
On 1/28/22 23:32, Tvrtko Ursulin wrote:
>
> On 27/01/2022 11:56, Thomas Hellström wrote:
>> The vma destruction code was using an unlocked advisory check for
>> drm_mm_node_allocated() to avoid racing with eviction code unbinding
>> the vma.
>>
>> This is very fragile and prohibits the dereference of non-refcounted
>> pointers of dying vmas after a call to __i915_vma_unbind(). It also
>> prohibits the dereference of vma->obj of refcounted pointers of
>> dying vmas after a call to __i915_vma_unbind(), since even if a
>> refcount is held on the vma, that won't guarantee that its backing
>> object doesn't get destroyed.
>>
>> So introduce an unbind under the vm mutex at object destroy time,
>> removing all weak references of the vma and its object from the
>> object vma list and from the vm bound list.
>
> Maarten suggested this fixes an oops like the one seen in
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22133/shard-snb6/igt@gem_softpin@softpin.html.
> If that is so, what would be the Fixes: tag to put here? It is too late
> to add one now, so hopefully the bug was introduced in something not yet
> released.
Yes, this should have had a Fixes: tag. Luckily it fixes a very recent
commit, which shouldn't have had time to be released yet.
>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>> drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 1a9e1f940a7d..e03e362d320b 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -280,6 +280,12 @@ void __i915_gem_object_pages_fini(struct drm_i915_gem_object *obj)
>> GEM_BUG_ON(vma->obj != obj);
>> spin_unlock(&obj->vma.lock);
>> + /* Verify that the vma is unbound under the vm mutex. */
>> + mutex_lock(&vma->vm->mutex);
>> + atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
>> + __i915_vma_unbind(vma);
>> + mutex_unlock(&vma->vm->mutex);
>
> Hm, I am not up to speed with the latest design, but how do the verb
> "verify" and the absence of conditionals reconcile here? Does the comment
> need improving?
Yes. "Ensure" would have been better here. Some rework of the vma
destruction is still needed, though, so I'll update or remove the comment
with those patches.
Thanks,
Thomas
>
> Regards,
>
> Tvrtko
>
>> +
>> __i915_vma_put(vma);
>> spin_lock(&obj->vma.lock);
>>