* [PATCH 1/3] drm/panthor: Don't use the racy drm_gem_lru_remove() helper
2026-05-06 12:16 [PATCH 0/3] drm/panthor: Fix a race in the shrinker logic Boris Brezillon
@ 2026-05-06 12:16 ` Boris Brezillon
2026-05-06 15:40 ` Steven Price
2026-05-06 12:16 ` [PATCH 2/3] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release() Boris Brezillon
2026-05-06 12:16 ` [PATCH 3/3] drm/gem: Stop exposing the racy/unsafe drm_gem_lru_remove() helper Boris Brezillon
2 siblings, 1 reply; 9+ messages in thread
From: Boris Brezillon @ 2026-05-06 12:16 UTC (permalink / raw)
To: Steven Price, Liviu Dudau, Boris Brezillon, Dmitry Osipenko
Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Akash Goel, Chia-I Wu, Rob Clark, Dmitry Baryshkov,
Abhinav Kumar, Jessica Zhang, Sean Paul, Marijn Suijten,
linux-arm-msm, freedreno, dri-devel, linux-kernel
drm_gem_lru_remove() dereference stores drm_gem_object::lru in a local
variable that's then dereferenced to acquire the LRU lock. Because this
assignment in done without the LRU lock held, it can race with
drm_gem_lru_scan() where drm_gem_object::lru is temporarily assigned
a stack-allcated LRU that goes away when leaving the function. By
the time we dereference this local lru variable, the object might already
be gone.
It feels like drm_gem_lru_move_tail() was never meant to be used this
way, because there's no easy way we can avoid this race unless we defer
the locking to the caller. Let's add an explicit LRU for unreclaimable
BOs instead, and have all BOs added to this LRU at creation time.
Fixes: fb42964e2a76 ("drm/panthor: Add a GEM shrinker")
Reported-by: Chia-I Wu <olvaffe@gmail.com>
Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/86
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
---
drivers/gpu/drm/panthor/panthor_device.h | 10 ++++++++++
drivers/gpu/drm/panthor/panthor_gem.c | 5 ++++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 4e4607bca7cc..45b71546f83c 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -190,6 +190,16 @@ struct panthor_device {
/** @reclaim.lock: Lock protecting all LRUs */
struct mutex lock;
+ /**
+ * @reclaim.unreclaimable: unreclaimable BOs
+ *
+ * Either the BO is unreclaimable because it has no pages allocated,
+ * or it's unreclaimable because pages are pinned.
+ *
+ * All BOs start in that list at creation time.
+ */
+ struct drm_gem_lru unreclaimable;
+
/**
* @reclaim.unused: BOs with unused pages
*
diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 13295d7a593d..8e31740126e7 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -204,7 +204,7 @@ void panthor_gem_update_reclaim_state_locked(struct panthor_gem_object *bo,
drm_gem_lru_move_tail(&ptdev->reclaim.gpu_mapped_shared, &bo->base);
break;
case PANTHOR_GEM_UNRECLAIMABLE:
- drm_gem_lru_remove(&bo->base);
+ drm_gem_lru_move_tail(&ptdev->reclaim.unreclaimable, &bo->base);
break;
default:
drm_WARN(&ptdev->base, true, "invalid GEM reclaim state (%d)\n", new_state);
@@ -994,6 +994,7 @@ static struct panthor_gem_object *
panthor_gem_create(struct drm_device *dev, size_t size, uint32_t flags,
struct panthor_vm *exclusive_vm, u32 usage_flags)
{
+ struct panthor_device *ptdev = container_of(dev, struct panthor_device, base);
struct panthor_gem_object *bo;
int ret;
@@ -1026,6 +1027,7 @@ panthor_gem_create(struct drm_device *dev, size_t size, uint32_t flags,
}
panthor_gem_debugfs_set_usage_flags(bo, usage_flags);
+ drm_gem_lru_move_tail(&ptdev->reclaim.unreclaimable, &bo->base);
return bo;
err_put:
@@ -1551,6 +1553,7 @@ int panthor_gem_shrinker_init(struct panthor_device *ptdev)
return ret;
INIT_LIST_HEAD(&ptdev->reclaim.vms);
+ drm_gem_lru_init(&ptdev->reclaim.unreclaimable, &ptdev->reclaim.lock);
drm_gem_lru_init(&ptdev->reclaim.unused, &ptdev->reclaim.lock);
drm_gem_lru_init(&ptdev->reclaim.mmapped, &ptdev->reclaim.lock);
drm_gem_lru_init(&ptdev->reclaim.gpu_mapped_shared, &ptdev->reclaim.lock);
--
2.54.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH 1/3] drm/panthor: Don't use the racy drm_gem_lru_remove() helper
2026-05-06 12:16 ` [PATCH 1/3] drm/panthor: Don't use the racy drm_gem_lru_remove() helper Boris Brezillon
@ 2026-05-06 15:40 ` Steven Price
2026-05-06 16:25 ` Boris Brezillon
0 siblings, 1 reply; 9+ messages in thread
From: Steven Price @ 2026-05-06 15:40 UTC (permalink / raw)
To: Boris Brezillon, Liviu Dudau, Dmitry Osipenko
Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Akash Goel, Chia-I Wu, Rob Clark, Dmitry Baryshkov,
Abhinav Kumar, Jessica Zhang, Sean Paul, Marijn Suijten,
linux-arm-msm, freedreno, dri-devel, linux-kernel
On 06/05/2026 13:16, Boris Brezillon wrote:
> drm_gem_lru_remove() dereference stores drm_gem_object::lru in a local
> variable that's then dereferenced to acquire the LRU lock. Because this
> assignment in done without the LRU lock held, it can race with
s/in/is/ ^^
> drm_gem_lru_scan() where drm_gem_object::lru is temporarily assigned
> a stack-allcated LRU that goes away when leaving the function. By
> the time we dereference this local lru variable, the object might already
> be gone.
>
> It feels like drm_gem_lru_move_tail() was never meant to be used this
> way, because there's no easy way we can avoid this race unless we defer
> the locking to the caller. Let's add an explicit LRU for unreclaimable
> BOs instead, and have all BOs added to this LRU at creation time.
>
> Fixes: fb42964e2a76 ("drm/panthor: Add a GEM shrinker")
> Reported-by: Chia-I Wu <olvaffe@gmail.com>
> Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/86
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
With minor typos fixed
Reviewed-by: Steven Price <steven.price@arm.com>
Although an alternative would be to expose drm_gem_lru_remove_locked()
in some form (maybe a wrapper which requires passing in the lock?)
> ---
> drivers/gpu/drm/panthor/panthor_device.h | 10 ++++++++++
> drivers/gpu/drm/panthor/panthor_gem.c | 5 ++++-
> 2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 4e4607bca7cc..45b71546f83c 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -190,6 +190,16 @@ struct panthor_device {
> /** @reclaim.lock: Lock protecting all LRUs */
> struct mutex lock;
>
> + /**
> + * @reclaim.unreclaimable: unreclaimable BOs
> + *
> + * Either the BO is unreclaimable because it has no pages allocated,
> + * or it's unreclaimable because pages are pinned.
> + *
> + * All BOs start in that list at creation time.
s/that/this/ ^^^^
Thanks,
Steve
> + */
> + struct drm_gem_lru unreclaimable;
> +
> /**
> * @reclaim.unused: BOs with unused pages
> *
> diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
> index 13295d7a593d..8e31740126e7 100644
> --- a/drivers/gpu/drm/panthor/panthor_gem.c
> +++ b/drivers/gpu/drm/panthor/panthor_gem.c
> @@ -204,7 +204,7 @@ void panthor_gem_update_reclaim_state_locked(struct panthor_gem_object *bo,
> drm_gem_lru_move_tail(&ptdev->reclaim.gpu_mapped_shared, &bo->base);
> break;
> case PANTHOR_GEM_UNRECLAIMABLE:
> - drm_gem_lru_remove(&bo->base);
> + drm_gem_lru_move_tail(&ptdev->reclaim.unreclaimable, &bo->base);
> break;
> default:
> drm_WARN(&ptdev->base, true, "invalid GEM reclaim state (%d)\n", new_state);
> @@ -994,6 +994,7 @@ static struct panthor_gem_object *
> panthor_gem_create(struct drm_device *dev, size_t size, uint32_t flags,
> struct panthor_vm *exclusive_vm, u32 usage_flags)
> {
> + struct panthor_device *ptdev = container_of(dev, struct panthor_device, base);
> struct panthor_gem_object *bo;
> int ret;
>
> @@ -1026,6 +1027,7 @@ panthor_gem_create(struct drm_device *dev, size_t size, uint32_t flags,
> }
>
> panthor_gem_debugfs_set_usage_flags(bo, usage_flags);
> + drm_gem_lru_move_tail(&ptdev->reclaim.unreclaimable, &bo->base);
> return bo;
>
> err_put:
> @@ -1551,6 +1553,7 @@ int panthor_gem_shrinker_init(struct panthor_device *ptdev)
> return ret;
>
> INIT_LIST_HEAD(&ptdev->reclaim.vms);
> + drm_gem_lru_init(&ptdev->reclaim.unreclaimable, &ptdev->reclaim.lock);
> drm_gem_lru_init(&ptdev->reclaim.unused, &ptdev->reclaim.lock);
> drm_gem_lru_init(&ptdev->reclaim.mmapped, &ptdev->reclaim.lock);
> drm_gem_lru_init(&ptdev->reclaim.gpu_mapped_shared, &ptdev->reclaim.lock);
>
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH 1/3] drm/panthor: Don't use the racy drm_gem_lru_remove() helper
2026-05-06 15:40 ` Steven Price
@ 2026-05-06 16:25 ` Boris Brezillon
0 siblings, 0 replies; 9+ messages in thread
From: Boris Brezillon @ 2026-05-06 16:25 UTC (permalink / raw)
To: Steven Price
Cc: Liviu Dudau, Dmitry Osipenko, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Akash Goel,
Chia-I Wu, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
Jessica Zhang, Sean Paul, Marijn Suijten, linux-arm-msm,
freedreno, dri-devel, linux-kernel
On Wed, 6 May 2026 16:40:22 +0100
Steven Price <steven.price@arm.com> wrote:
> On 06/05/2026 13:16, Boris Brezillon wrote:
> > drm_gem_lru_remove() dereference stores drm_gem_object::lru in a local
> > variable that's then dereferenced to acquire the LRU lock. Because this
> > assignment in done without the LRU lock held, it can race with
> s/in/is/ ^^
> > drm_gem_lru_scan() where drm_gem_object::lru is temporarily assigned
> > a stack-allcated LRU that goes away when leaving the function. By
> > the time we dereference this local lru variable, the object might already
> > be gone.
> >
> > It feels like drm_gem_lru_move_tail() was never meant to be used this
> > way, because there's no easy way we can avoid this race unless we defer
> > the locking to the caller. Let's add an explicit LRU for unreclaimable
> > BOs instead, and have all BOs added to this LRU at creation time.
> >
> > Fixes: fb42964e2a76 ("drm/panthor: Add a GEM shrinker")
> > Reported-by: Chia-I Wu <olvaffe@gmail.com>
> > Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/86
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
>
> With minor typos fixed
>
> Reviewed-by: Steven Price <steven.price@arm.com>
>
> Although an alternative would be to expose drm_gem_lru_remove_locked()
> in some form (maybe a wrapper which requires passing in the lock?)
I considered that too, but I thought it was less invasive to just have
a default LRU to start in at creation time, and end in there's nothing
left to reclaim. It's also what MSM does, so I figured I'd do that too.
>
> > ---
> > drivers/gpu/drm/panthor/panthor_device.h | 10 ++++++++++
> > drivers/gpu/drm/panthor/panthor_gem.c | 5 ++++-
> > 2 files changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > index 4e4607bca7cc..45b71546f83c 100644
> > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > @@ -190,6 +190,16 @@ struct panthor_device {
> > /** @reclaim.lock: Lock protecting all LRUs */
> > struct mutex lock;
> >
> > + /**
> > + * @reclaim.unreclaimable: unreclaimable BOs
> > + *
> > + * Either the BO is unreclaimable because it has no pages allocated,
> > + * or it's unreclaimable because pages are pinned.
> > + *
> > + * All BOs start in that list at creation time.
> s/that/this/ ^^^^
>
> Thanks,
> Steve
>
> > + */
> > + struct drm_gem_lru unreclaimable;
> > +
> > /**
> > * @reclaim.unused: BOs with unused pages
> > *
> > diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
> > index 13295d7a593d..8e31740126e7 100644
> > --- a/drivers/gpu/drm/panthor/panthor_gem.c
> > +++ b/drivers/gpu/drm/panthor/panthor_gem.c
> > @@ -204,7 +204,7 @@ void panthor_gem_update_reclaim_state_locked(struct panthor_gem_object *bo,
> > drm_gem_lru_move_tail(&ptdev->reclaim.gpu_mapped_shared, &bo->base);
> > break;
> > case PANTHOR_GEM_UNRECLAIMABLE:
> > - drm_gem_lru_remove(&bo->base);
> > + drm_gem_lru_move_tail(&ptdev->reclaim.unreclaimable, &bo->base);
> > break;
> > default:
> > drm_WARN(&ptdev->base, true, "invalid GEM reclaim state (%d)\n", new_state);
> > @@ -994,6 +994,7 @@ static struct panthor_gem_object *
> > panthor_gem_create(struct drm_device *dev, size_t size, uint32_t flags,
> > struct panthor_vm *exclusive_vm, u32 usage_flags)
> > {
> > + struct panthor_device *ptdev = container_of(dev, struct panthor_device, base);
> > struct panthor_gem_object *bo;
> > int ret;
> >
> > @@ -1026,6 +1027,7 @@ panthor_gem_create(struct drm_device *dev, size_t size, uint32_t flags,
> > }
> >
> > panthor_gem_debugfs_set_usage_flags(bo, usage_flags);
> > + drm_gem_lru_move_tail(&ptdev->reclaim.unreclaimable, &bo->base);
> > return bo;
> >
> > err_put:
> > @@ -1551,6 +1553,7 @@ int panthor_gem_shrinker_init(struct panthor_device *ptdev)
> > return ret;
> >
> > INIT_LIST_HEAD(&ptdev->reclaim.vms);
> > + drm_gem_lru_init(&ptdev->reclaim.unreclaimable, &ptdev->reclaim.lock);
> > drm_gem_lru_init(&ptdev->reclaim.unused, &ptdev->reclaim.lock);
> > drm_gem_lru_init(&ptdev->reclaim.mmapped, &ptdev->reclaim.lock);
> > drm_gem_lru_init(&ptdev->reclaim.gpu_mapped_shared, &ptdev->reclaim.lock);
> >
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/3] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release()
2026-05-06 12:16 [PATCH 0/3] drm/panthor: Fix a race in the shrinker logic Boris Brezillon
2026-05-06 12:16 ` [PATCH 1/3] drm/panthor: Don't use the racy drm_gem_lru_remove() helper Boris Brezillon
@ 2026-05-06 12:16 ` Boris Brezillon
2026-05-06 13:21 ` Rob Clark
2026-05-06 12:16 ` [PATCH 3/3] drm/gem: Stop exposing the racy/unsafe drm_gem_lru_remove() helper Boris Brezillon
2 siblings, 1 reply; 9+ messages in thread
From: Boris Brezillon @ 2026-05-06 12:16 UTC (permalink / raw)
To: Steven Price, Liviu Dudau, Boris Brezillon, Dmitry Osipenko
Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Akash Goel, Chia-I Wu, Rob Clark, Dmitry Baryshkov,
Abhinav Kumar, Jessica Zhang, Sean Paul, Marijn Suijten,
linux-arm-msm, freedreno, dri-devel, linux-kernel
The following race can currently happen:
| Thread 0 in `drm_gem_lru_scan` | Thread 1 in `drm_gem_object_release` |
| - | - |
| move obj1 with refcount==0 to `still_in_lru` | |
| move obj2 with refcount!=0 to `still_in_lru` | |
| mutex_unlock | |
| shrink obj2 | |
| | lru = obj1->lru; // `still_in_lru` |
| mutex_lock | |
| move obj1 back to the original lru | |
| mutex_unlock | |
| return | |
| | dereference `still_in_lru` |
Move the drm_gem_lru_move_tail_locked() after the
kref_get_unless_zero() check so that we don't end up with a
vanishing LRU when we hit drm_gem_object_release(). We also need to
remove the skipped object from its LRU, otherwise we'll keep hitting
it on subsequent loop iterations until it's actually removed from the
list in the drm_gem_release().
Fixes: e7c2af13f811 ("drm/gem: Add LRU/shrinker helper")
Reported-by: Chia-I Wu <olvaffe@gmail.com>
Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/86
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
---
drivers/gpu/drm/drm_gem.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index fca42949eb2b..97cf63de0112 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1660,15 +1660,19 @@ drm_gem_lru_scan(struct drm_gem_lru *lru,
if (!obj)
break;
- drm_gem_lru_move_tail_locked(&still_in_lru, obj);
-
/*
* If it's in the process of being freed, gem_object->free()
- * may be blocked on lock waiting to remove it. So just
- * skip it.
+ * may be blocked on lock waiting to remove it. So just remove
+ * it from its current LRU and skip it.
*/
- if (!kref_get_unless_zero(&obj->refcount))
+ if (!kref_get_unless_zero(&obj->refcount)) {
+ if (obj->lru)
+ drm_gem_lru_remove_locked(obj);
+
continue;
+ }
+
+ drm_gem_lru_move_tail_locked(&still_in_lru, obj);
/*
* Now that we own a reference, we can drop the lock for the
--
2.54.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH 2/3] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release()
2026-05-06 12:16 ` [PATCH 2/3] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release() Boris Brezillon
@ 2026-05-06 13:21 ` Rob Clark
2026-05-06 14:33 ` Boris Brezillon
0 siblings, 1 reply; 9+ messages in thread
From: Rob Clark @ 2026-05-06 13:21 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, Liviu Dudau, Dmitry Osipenko, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Akash Goel, Chia-I Wu, Dmitry Baryshkov, Abhinav Kumar,
Jessica Zhang, Sean Paul, Marijn Suijten, linux-arm-msm,
freedreno, dri-devel, linux-kernel
On Wed, May 6, 2026 at 5:16 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> The following race can currently happen:
>
> | Thread 0 in `drm_gem_lru_scan` | Thread 1 in `drm_gem_object_release` |
> | - | - |
> | move obj1 with refcount==0 to `still_in_lru` | |
> | move obj2 with refcount!=0 to `still_in_lru` | |
> | mutex_unlock | |
> | shrink obj2 | |
> | | lru = obj1->lru; // `still_in_lru` |
> | mutex_lock | |
> | move obj1 back to the original lru | |
> | mutex_unlock | |
> | return | |
> | | dereference `still_in_lru` |
>
> Move the drm_gem_lru_move_tail_locked() after the
> kref_get_unless_zero() check so that we don't end up with a
> vanishing LRU when we hit drm_gem_object_release(). We also need to
> remove the skipped object from its LRU, otherwise we'll keep hitting
> it on subsequent loop iterations until it's actually removed from the
> list in the drm_gem_release().
>
> Fixes: e7c2af13f811 ("drm/gem: Add LRU/shrinker helper")
> Reported-by: Chia-I Wu <olvaffe@gmail.com>
> Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/86
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
> ---
> drivers/gpu/drm/drm_gem.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index fca42949eb2b..97cf63de0112 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1660,15 +1660,19 @@ drm_gem_lru_scan(struct drm_gem_lru *lru,
> if (!obj)
> break;
>
> - drm_gem_lru_move_tail_locked(&still_in_lru, obj);
> -
> /*
> * If it's in the process of being freed, gem_object->free()
> - * may be blocked on lock waiting to remove it. So just
> - * skip it.
> + * may be blocked on lock waiting to remove it. So just remove
> + * it from its current LRU and skip it.
> */
> - if (!kref_get_unless_zero(&obj->refcount))
> + if (!kref_get_unless_zero(&obj->refcount)) {
> + if (obj->lru)
> + drm_gem_lru_remove_locked(obj);
if we are iterating a LRU.. and lru->lock is held, shouldn't obj->lru
always be non-null?
BR,
-R
> +
> continue;
> + }
> +
> + drm_gem_lru_move_tail_locked(&still_in_lru, obj);
>
> /*
> * Now that we own a reference, we can drop the lock for the
>
> --
> 2.54.0
>
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH 2/3] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release()
2026-05-06 13:21 ` Rob Clark
@ 2026-05-06 14:33 ` Boris Brezillon
0 siblings, 0 replies; 9+ messages in thread
From: Boris Brezillon @ 2026-05-06 14:33 UTC (permalink / raw)
To: Rob Clark
Cc: Steven Price, Liviu Dudau, Dmitry Osipenko, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Akash Goel, Chia-I Wu, Dmitry Baryshkov, Abhinav Kumar,
Jessica Zhang, Sean Paul, Marijn Suijten, linux-arm-msm,
freedreno, dri-devel, linux-kernel
On Wed, 6 May 2026 06:21:42 -0700
Rob Clark <rob.clark@oss.qualcomm.com> wrote:
> On Wed, May 6, 2026 at 5:16 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > The following race can currently happen:
> >
> > | Thread 0 in `drm_gem_lru_scan` | Thread 1 in `drm_gem_object_release` |
> > | - | - |
> > | move obj1 with refcount==0 to `still_in_lru` | |
> > | move obj2 with refcount!=0 to `still_in_lru` | |
> > | mutex_unlock | |
> > | shrink obj2 | |
> > | | lru = obj1->lru; // `still_in_lru` |
> > | mutex_lock | |
> > | move obj1 back to the original lru | |
> > | mutex_unlock | |
> > | return | |
> > | | dereference `still_in_lru` |
> >
> > Move the drm_gem_lru_move_tail_locked() after the
> > kref_get_unless_zero() check so that we don't end up with a
> > vanishing LRU when we hit drm_gem_object_release(). We also need to
> > remove the skipped object from its LRU, otherwise we'll keep hitting
> > it on subsequent loop iterations until it's actually removed from the
> > list in the drm_gem_release().
> >
> > Fixes: e7c2af13f811 ("drm/gem: Add LRU/shrinker helper")
> > Reported-by: Chia-I Wu <olvaffe@gmail.com>
> > Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/86
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
> > ---
> > drivers/gpu/drm/drm_gem.c | 14 +++++++++-----
> > 1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > index fca42949eb2b..97cf63de0112 100644
> > --- a/drivers/gpu/drm/drm_gem.c
> > +++ b/drivers/gpu/drm/drm_gem.c
> > @@ -1660,15 +1660,19 @@ drm_gem_lru_scan(struct drm_gem_lru *lru,
> > if (!obj)
> > break;
> >
> > - drm_gem_lru_move_tail_locked(&still_in_lru, obj);
> > -
> > /*
> > * If it's in the process of being freed, gem_object->free()
> > - * may be blocked on lock waiting to remove it. So just
> > - * skip it.
> > + * may be blocked on lock waiting to remove it. So just remove
> > + * it from its current LRU and skip it.
> > */
> > - if (!kref_get_unless_zero(&obj->refcount))
> > + if (!kref_get_unless_zero(&obj->refcount)) {
> > + if (obj->lru)
> > + drm_gem_lru_remove_locked(obj);
>
> if we are iterating a LRU.. and lru->lock is held, shouldn't obj->lru
> always be non-null?
Right, the != NULL check is not needed here.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/3] drm/gem: Stop exposing the racy/unsafe drm_gem_lru_remove() helper
2026-05-06 12:16 [PATCH 0/3] drm/panthor: Fix a race in the shrinker logic Boris Brezillon
2026-05-06 12:16 ` [PATCH 1/3] drm/panthor: Don't use the racy drm_gem_lru_remove() helper Boris Brezillon
2026-05-06 12:16 ` [PATCH 2/3] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release() Boris Brezillon
@ 2026-05-06 12:16 ` Boris Brezillon
2026-05-06 15:40 ` Steven Price
2 siblings, 1 reply; 9+ messages in thread
From: Boris Brezillon @ 2026-05-06 12:16 UTC (permalink / raw)
To: Steven Price, Liviu Dudau, Boris Brezillon, Dmitry Osipenko
Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Akash Goel, Chia-I Wu, Rob Clark, Dmitry Baryshkov,
Abhinav Kumar, Jessica Zhang, Sean Paul, Marijn Suijten,
linux-arm-msm, freedreno, dri-devel, linux-kernel
The only place where it's safe to call drm_gem_lru_remove() is when
we know the drm_gem_object::lru field can't be concurrently updated,
which we know is the case when the drm_gem_object is destroyed.
Rather than trying to make that safe, let's kill the function and inline
its content in drm_gem_object_release().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
---
drivers/gpu/drm/drm_gem.c | 49 +++++++++++++++++------------------------------
include/drm/drm_gem.h | 1 -
2 files changed, 18 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 97cf63de0112..b6df4f62f7b5 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1108,6 +1108,15 @@ drm_gem_release(struct drm_device *dev, struct drm_file *file_private)
idr_destroy(&file_private->object_idr);
}
+static void
+drm_gem_lru_remove_locked(struct drm_gem_object *obj)
+{
+ obj->lru->count -= obj->size >> PAGE_SHIFT;
+ WARN_ON(obj->lru->count < 0);
+ list_del(&obj->lru_node);
+ obj->lru = NULL;
+}
+
/**
* drm_gem_object_release - release GEM buffer object resources
* @obj: GEM buffer object
@@ -1124,7 +1133,15 @@ drm_gem_object_release(struct drm_gem_object *obj)
drm_gem_private_object_fini(obj);
drm_gem_free_mmap_offset(obj);
- drm_gem_lru_remove(obj);
+
+ /* If the object refcount drops to zero, it means no one can change
+ * the LRU it's inserted into, so it's safe to dereference
+ * drm_gem_object::lru without the drm_gem_lru::lock held.
+ */
+ if (obj->lru) {
+ guard(mutex)(obj->lru->lock);
+ drm_gem_lru_remove_locked(obj);
+ }
}
EXPORT_SYMBOL(drm_gem_object_release);
@@ -1552,36 +1569,6 @@ drm_gem_lru_init(struct drm_gem_lru *lru, struct mutex *lock)
}
EXPORT_SYMBOL(drm_gem_lru_init);
-static void
-drm_gem_lru_remove_locked(struct drm_gem_object *obj)
-{
- obj->lru->count -= obj->size >> PAGE_SHIFT;
- WARN_ON(obj->lru->count < 0);
- list_del(&obj->lru_node);
- obj->lru = NULL;
-}
-
-/**
- * drm_gem_lru_remove - remove object from whatever LRU it is in
- *
- * If the object is currently in any LRU, remove it.
- *
- * @obj: The GEM object to remove from current LRU
- */
-void
-drm_gem_lru_remove(struct drm_gem_object *obj)
-{
- struct drm_gem_lru *lru = obj->lru;
-
- if (!lru)
- return;
-
- mutex_lock(lru->lock);
- drm_gem_lru_remove_locked(obj);
- mutex_unlock(lru->lock);
-}
-EXPORT_SYMBOL(drm_gem_lru_remove);
-
/**
* drm_gem_lru_move_tail_locked - move the object to the tail of the LRU
*
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 86f5846154f7..d527df98d142 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -611,7 +611,6 @@ int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
u32 handle, u64 *offset);
void drm_gem_lru_init(struct drm_gem_lru *lru, struct mutex *lock);
-void drm_gem_lru_remove(struct drm_gem_object *obj);
void drm_gem_lru_move_tail_locked(struct drm_gem_lru *lru, struct drm_gem_object *obj);
void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj);
unsigned long
--
2.54.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH 3/3] drm/gem: Stop exposing the racy/unsafe drm_gem_lru_remove() helper
2026-05-06 12:16 ` [PATCH 3/3] drm/gem: Stop exposing the racy/unsafe drm_gem_lru_remove() helper Boris Brezillon
@ 2026-05-06 15:40 ` Steven Price
0 siblings, 0 replies; 9+ messages in thread
From: Steven Price @ 2026-05-06 15:40 UTC (permalink / raw)
To: Boris Brezillon, Liviu Dudau, Dmitry Osipenko
Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Akash Goel, Chia-I Wu, Rob Clark, Dmitry Baryshkov,
Abhinav Kumar, Jessica Zhang, Sean Paul, Marijn Suijten,
linux-arm-msm, freedreno, dri-devel, linux-kernel
On 06/05/2026 13:16, Boris Brezillon wrote:
> The only place where it's safe to call drm_gem_lru_remove() is when
> we know the drm_gem_object::lru field can't be concurrently updated,
> which we know is the case when the drm_gem_object is destroyed.
>
> Rather than trying to make that safe, let's kill the function and inline
> its content in drm_gem_object_release().
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Steven Price <steven.price@arm.com>
> ---
> drivers/gpu/drm/drm_gem.c | 49 +++++++++++++++++------------------------------
> include/drm/drm_gem.h | 1 -
> 2 files changed, 18 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 97cf63de0112..b6df4f62f7b5 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1108,6 +1108,15 @@ drm_gem_release(struct drm_device *dev, struct drm_file *file_private)
> idr_destroy(&file_private->object_idr);
> }
>
> +static void
> +drm_gem_lru_remove_locked(struct drm_gem_object *obj)
> +{
> + obj->lru->count -= obj->size >> PAGE_SHIFT;
> + WARN_ON(obj->lru->count < 0);
> + list_del(&obj->lru_node);
> + obj->lru = NULL;
> +}
> +
> /**
> * drm_gem_object_release - release GEM buffer object resources
> * @obj: GEM buffer object
> @@ -1124,7 +1133,15 @@ drm_gem_object_release(struct drm_gem_object *obj)
> drm_gem_private_object_fini(obj);
>
> drm_gem_free_mmap_offset(obj);
> - drm_gem_lru_remove(obj);
> +
> + /* If the object refcount drops to zero, it means no one can change
> + * the LRU it's inserted into, so it's safe to dereference
> + * drm_gem_object::lru without the drm_gem_lru::lock held.
> + */
> + if (obj->lru) {
> + guard(mutex)(obj->lru->lock);
> + drm_gem_lru_remove_locked(obj);
> + }
> }
> EXPORT_SYMBOL(drm_gem_object_release);
>
> @@ -1552,36 +1569,6 @@ drm_gem_lru_init(struct drm_gem_lru *lru, struct mutex *lock)
> }
> EXPORT_SYMBOL(drm_gem_lru_init);
>
> -static void
> -drm_gem_lru_remove_locked(struct drm_gem_object *obj)
> -{
> - obj->lru->count -= obj->size >> PAGE_SHIFT;
> - WARN_ON(obj->lru->count < 0);
> - list_del(&obj->lru_node);
> - obj->lru = NULL;
> -}
> -
> -/**
> - * drm_gem_lru_remove - remove object from whatever LRU it is in
> - *
> - * If the object is currently in any LRU, remove it.
> - *
> - * @obj: The GEM object to remove from current LRU
> - */
> -void
> -drm_gem_lru_remove(struct drm_gem_object *obj)
> -{
> - struct drm_gem_lru *lru = obj->lru;
> -
> - if (!lru)
> - return;
> -
> - mutex_lock(lru->lock);
> - drm_gem_lru_remove_locked(obj);
> - mutex_unlock(lru->lock);
> -}
> -EXPORT_SYMBOL(drm_gem_lru_remove);
> -
> /**
> * drm_gem_lru_move_tail_locked - move the object to the tail of the LRU
> *
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 86f5846154f7..d527df98d142 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -611,7 +611,6 @@ int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
> u32 handle, u64 *offset);
>
> void drm_gem_lru_init(struct drm_gem_lru *lru, struct mutex *lock);
> -void drm_gem_lru_remove(struct drm_gem_object *obj);
> void drm_gem_lru_move_tail_locked(struct drm_gem_lru *lru, struct drm_gem_object *obj);
> void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj);
> unsigned long
>
^ permalink raw reply [flat|nested] 9+ messages in thread