* [PATCH v2 0/4] drm/panthor: Fix a race in the shrinker logic
@ 2026-05-08 10:40 Boris Brezillon
2026-05-08 10:40 ` [PATCH v2 1/4] drm/panthor: Don't use the racy drm_gem_lru_remove() helper Boris Brezillon
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Boris Brezillon @ 2026-05-08 10:40 UTC (permalink / raw)
To: Steven Price, Liviu Dudau, Boris Brezillon
Cc: Dmitry Osipenko, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Akash Goel,
Chia-I Wu, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
Jessica Zhang, Sean Paul, Marijn Suijten, linux-arm-msm,
freedreno, dri-devel, linux-kernel
As reported by Chia-I [1], a race exists between drm_gem_lru_remove()
and drm_gem_lru_scan(), causing a UAF on a stack-allocated object.
The first patch fixes the problem at the panthor level by making
sure we never use drm_gem_lru_remove(). The second one fixes an
undetected race between drm_gem_lru_scan() and
drm_gem_object_release(). The third one kills drm_gem_lru_remove()
so no one else relying on the drm_gem_lru infra gets bitten by this
race again. And the last one tries to simplify the locking around
LRU updates so we can solve the chicken/egg problem where the lock
that needs to be acquired is under gem->lru->lock, and gem->lru is
also supposed to be accessed with the lru->lock held.
Note that patches 1, 2 and 3 could be skipped if we go directly for
the approach in patch 4. Panthor wouldn't be impacted because the
shrinker support hasn't landed in Linus' tree yet, so there's no fix
to backport there. We might still want patch 2 so it can easily be
backported (if the bug is deemed important for MSM).
Rob, I'll leave it up to you, but no matter what we decide, I'd really
like to have some fix in before the next merge window.
Liviu, Chia-I, Steve, I've intentionally dropped your R-b on patch 2
and 3 because they changed a bit.
[1] https://gitlab.freedesktop.org/panfrost/linux/-/work_items/86
---
Changes in v2:
- Collect R-b
- Drop a useless obj->lru != NULL check in drm_gem_lru_scan()
- Fix another race introduced in patch 2
- Document why the lru != NULL check done without the lru lock held
in drm_gem_lru_remove() is safe
- Add a patch to sanitize the GEM LRU locking: lock is now part of
drm_device, meaning we don't have this chicken/egg problem where
the lock that needs to be acquired is under gem->lru->lock, and
gem->lru is also supposed to be accessed with the lru->lock held
- Fix typos in commit messages and comments
- Link to v1: https://lore.kernel.org/r/20260506-panthor-shrinker-fixes-v1-0-e7721526de96@collabora.com
---
Boris Brezillon (4):
drm/panthor: Don't use the racy drm_gem_lru_remove() helper
drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release()
drm/gem: Stop exposing the racy/unsafe drm_gem_lru_remove() helper
drm/gem: Make the GEM LRU lock part of drm_device
drivers/gpu/drm/drm_drv.c | 2 +
drivers/gpu/drm/drm_gem.c | 79 +++++++++++++-------------------
drivers/gpu/drm/msm/msm_drv.c | 11 ++---
drivers/gpu/drm/msm/msm_drv.h | 7 ---
drivers/gpu/drm/msm/msm_gem.c | 32 ++++++-------
drivers/gpu/drm/msm/msm_gem_shrinker.c | 4 +-
drivers/gpu/drm/msm/msm_gem_submit.c | 6 +--
drivers/gpu/drm/msm/msm_gem_vma.c | 12 ++---
drivers/gpu/drm/msm/msm_ringbuffer.c | 6 +--
drivers/gpu/drm/panthor/panthor_device.h | 11 ++++-
drivers/gpu/drm/panthor/panthor_gem.c | 24 +++++-----
drivers/gpu/drm/panthor/panthor_mmu.c | 29 ++++++------
include/drm/drm_device.h | 7 +++
include/drm/drm_gem.h | 21 ++++-----
14 files changed, 120 insertions(+), 131 deletions(-)
---
base-commit: c006978163fd001fbca55e5fa57bddcf49f47ad9
change-id: 20260506-panthor-shrinker-fixes-58c1f45cfc41
Best regards,
--
Boris Brezillon <boris.brezillon@collabora.com>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 1/4] drm/panthor: Don't use the racy drm_gem_lru_remove() helper
2026-05-08 10:40 [PATCH v2 0/4] drm/panthor: Fix a race in the shrinker logic Boris Brezillon
@ 2026-05-08 10:40 ` Boris Brezillon
2026-05-08 10:40 ` [PATCH v2 2/4] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release() Boris Brezillon
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Boris Brezillon @ 2026-05-08 10:40 UTC (permalink / raw)
To: Steven Price, Liviu Dudau, Boris Brezillon
Cc: Dmitry Osipenko, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Akash Goel,
Chia-I Wu, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
Jessica Zhang, Sean Paul, Marijn Suijten, linux-arm-msm,
freedreno, dri-devel, linux-kernel
drm_gem_lru_remove() stores drm_gem_object::lru in a local variable
that's then dereferenced to acquire the LRU lock. Because this
assignment is done without the LRU lock held, it can race with
drm_gem_lru_scan() where drm_gem_object::lru is temporarily assigned
a stack-allocated LRU that goes away when leaving the function. By
the time we dereference this local lru variable, the stack-allocated
LRU it points to might already be gone.
It feels like drm_gem_lru_remove() was never meant to be used this
way, because there's no easy way to avoid this race unless we defer
the locking to the caller. Let's add an explicit LRU for unreclaimable
BOs instead, and have all BOs added to this LRU at creation time.
Fixes: fb42964e2a76 ("drm/panthor: Add a GEM shrinker")
Reported-by: Chia-I Wu <olvaffe@gmail.com>
Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/86
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
drivers/gpu/drm/panthor/panthor_device.h | 10 ++++++++++
drivers/gpu/drm/panthor/panthor_gem.c | 5 ++++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 4e4607bca7cc..dcdce75b683b 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -190,6 +190,16 @@ struct panthor_device {
/** @reclaim.lock: Lock protecting all LRUs */
struct mutex lock;
+ /**
+ * @reclaim.unreclaimable: unreclaimable BOs
+ *
+ * Either the BO is unreclaimable because it has no pages allocated,
+ * or it's unreclaimable because pages are pinned.
+ *
+ * All BOs start in this list at creation time.
+ */
+ struct drm_gem_lru unreclaimable;
+
/**
* @reclaim.unused: BOs with unused pages
*
diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 13295d7a593d..8e31740126e7 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -204,7 +204,7 @@ void panthor_gem_update_reclaim_state_locked(struct panthor_gem_object *bo,
drm_gem_lru_move_tail(&ptdev->reclaim.gpu_mapped_shared, &bo->base);
break;
case PANTHOR_GEM_UNRECLAIMABLE:
- drm_gem_lru_remove(&bo->base);
+ drm_gem_lru_move_tail(&ptdev->reclaim.unreclaimable, &bo->base);
break;
default:
drm_WARN(&ptdev->base, true, "invalid GEM reclaim state (%d)\n", new_state);
@@ -994,6 +994,7 @@ static struct panthor_gem_object *
panthor_gem_create(struct drm_device *dev, size_t size, uint32_t flags,
struct panthor_vm *exclusive_vm, u32 usage_flags)
{
+ struct panthor_device *ptdev = container_of(dev, struct panthor_device, base);
struct panthor_gem_object *bo;
int ret;
@@ -1026,6 +1027,7 @@ panthor_gem_create(struct drm_device *dev, size_t size, uint32_t flags,
}
panthor_gem_debugfs_set_usage_flags(bo, usage_flags);
+ drm_gem_lru_move_tail(&ptdev->reclaim.unreclaimable, &bo->base);
return bo;
err_put:
@@ -1551,6 +1553,7 @@ int panthor_gem_shrinker_init(struct panthor_device *ptdev)
return ret;
INIT_LIST_HEAD(&ptdev->reclaim.vms);
+ drm_gem_lru_init(&ptdev->reclaim.unreclaimable, &ptdev->reclaim.lock);
drm_gem_lru_init(&ptdev->reclaim.unused, &ptdev->reclaim.lock);
drm_gem_lru_init(&ptdev->reclaim.mmapped, &ptdev->reclaim.lock);
drm_gem_lru_init(&ptdev->reclaim.gpu_mapped_shared, &ptdev->reclaim.lock);
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 2/4] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release()
2026-05-08 10:40 [PATCH v2 0/4] drm/panthor: Fix a race in the shrinker logic Boris Brezillon
2026-05-08 10:40 ` [PATCH v2 1/4] drm/panthor: Don't use the racy drm_gem_lru_remove() helper Boris Brezillon
@ 2026-05-08 10:40 ` Boris Brezillon
2026-05-08 13:49 ` Liviu Dudau
2026-05-08 10:40 ` [PATCH v2 3/4] drm/gem: Stop exposing the racy/unsafe drm_gem_lru_remove() helper Boris Brezillon
2026-05-08 10:40 ` [PATCH v2 4/4] drm/gem: Make the GEM LRU lock part of drm_device Boris Brezillon
3 siblings, 1 reply; 7+ messages in thread
From: Boris Brezillon @ 2026-05-08 10:40 UTC (permalink / raw)
To: Steven Price, Liviu Dudau, Boris Brezillon
Cc: Dmitry Osipenko, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Akash Goel,
Chia-I Wu, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
Jessica Zhang, Sean Paul, Marijn Suijten, linux-arm-msm,
freedreno, dri-devel, linux-kernel
The following race can currently happen:
| Thread 0 in `drm_gem_lru_scan` | Thread 1 in `drm_gem_object_release` |
| - | - |
| move obj1 with refcount==0 to `still_in_lru` | |
| move obj2 with refcount!=0 to `still_in_lru` | |
| mutex_unlock | |
| shrink obj2 | |
| | lru = obj1->lru; // `still_in_lru` |
| mutex_lock | |
| move obj1 back to the original lru | |
| mutex_unlock | |
| return | |
| | dereference `still_in_lru` |
Move the drm_gem_lru_move_tail_locked() call after the
kref_get_unless_zero() check so that we don't end up with a
vanishing LRU when we hit drm_gem_object_release(). We also need to
remove the skipped object from its LRU, otherwise we'll keep hitting
it on subsequent loop iterations until it's actually removed from the
list in drm_gem_object_release().
Fixes: e7c2af13f811 ("drm/gem: Add LRU/shrinker helper")
Reported-by: Chia-I Wu <olvaffe@gmail.com>
Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/86
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
drivers/gpu/drm/drm_gem.c | 34 ++++++++++++++++++++++++++++------
1 file changed, 28 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index fca42949eb2b..0e087c770883 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1573,11 +1573,31 @@ drm_gem_lru_remove(struct drm_gem_object *obj)
{
struct drm_gem_lru *lru = obj->lru;
+ /*
+ * We do the lru != NULL check without the lru->lock held, which
+ * means we might end up with a stale lru value by the time the
+ * lock is acquired.
+ *
+ * This is deemed safe because:
+ * 1. the LRU is assumed to outlive any GEM object it was attached
+ * to (LRUs are usually bound to a drm_device). So even if obj->lru
+ * has since become NULL, the value we read still points to a valid
+ * object that can safely be dereferenced to get the lock.
+ *
+ * 2. all LRUs a GEM object might be attached to must share the same
+ * lock (a lock that's usually part of the driver-specific device
+ * object), so taking the lock on the 'old' LRU is equivalent
+ * to taking it on the new one (if any).
+ */
if (!lru)
return;
mutex_lock(lru->lock);
- drm_gem_lru_remove_locked(obj);
+ /* Check a second time with the lock held to make sure we're not racing
+ * with another drm_gem_lru_remove[_locked]() call.
+ */
+ if (obj->lru)
+ drm_gem_lru_remove_locked(obj);
mutex_unlock(lru->lock);
}
EXPORT_SYMBOL(drm_gem_lru_remove);
@@ -1660,15 +1680,17 @@ drm_gem_lru_scan(struct drm_gem_lru *lru,
if (!obj)
break;
- drm_gem_lru_move_tail_locked(&still_in_lru, obj);
-
/*
* If it's in the process of being freed, gem_object->free()
- * may be blocked on lock waiting to remove it. So just
- * skip it.
+ * may be blocked on lock waiting to remove it. So just remove
+ * it from its current LRU and skip it.
*/
- if (!kref_get_unless_zero(&obj->refcount))
+ if (!kref_get_unless_zero(&obj->refcount)) {
+ drm_gem_lru_remove_locked(obj);
continue;
+ }
+
+ drm_gem_lru_move_tail_locked(&still_in_lru, obj);
/*
* Now that we own a reference, we can drop the lock for the
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 3/4] drm/gem: Stop exposing the racy/unsafe drm_gem_lru_remove() helper
2026-05-08 10:40 [PATCH v2 0/4] drm/panthor: Fix a race in the shrinker logic Boris Brezillon
2026-05-08 10:40 ` [PATCH v2 1/4] drm/panthor: Don't use the racy drm_gem_lru_remove() helper Boris Brezillon
2026-05-08 10:40 ` [PATCH v2 2/4] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release() Boris Brezillon
@ 2026-05-08 10:40 ` Boris Brezillon
2026-05-08 15:00 ` Liviu Dudau
2026-05-08 10:40 ` [PATCH v2 4/4] drm/gem: Make the GEM LRU lock part of drm_device Boris Brezillon
3 siblings, 1 reply; 7+ messages in thread
From: Boris Brezillon @ 2026-05-08 10:40 UTC (permalink / raw)
To: Steven Price, Liviu Dudau, Boris Brezillon
Cc: Dmitry Osipenko, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Akash Goel,
Chia-I Wu, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
Jessica Zhang, Sean Paul, Marijn Suijten, linux-arm-msm,
freedreno, dri-devel, linux-kernel
The only place where it's safe to call drm_gem_lru_remove() is when
we know the drm_gem_object::lru field can't be concurrently updated,
which is the case when the drm_gem_object is being destroyed.
Rather than trying to make the helper generally safe, let's kill the
function and inline its content in drm_gem_object_release().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
drivers/gpu/drm/drm_gem.c | 90 ++++++++++++++++++++---------------------------
include/drm/drm_gem.h | 1 -
2 files changed, 39 insertions(+), 52 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 0e087c770883..c85a39b8b163 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1108,6 +1108,15 @@ drm_gem_release(struct drm_device *dev, struct drm_file *file_private)
idr_destroy(&file_private->object_idr);
}
+static void
+drm_gem_lru_remove_locked(struct drm_gem_object *obj)
+{
+ obj->lru->count -= obj->size >> PAGE_SHIFT;
+ WARN_ON(obj->lru->count < 0);
+ list_del(&obj->lru_node);
+ obj->lru = NULL;
+}
+
/**
* drm_gem_object_release - release GEM buffer object resources
* @obj: GEM buffer object
@@ -1118,13 +1127,42 @@ drm_gem_release(struct drm_device *dev, struct drm_file *file_private)
void
drm_gem_object_release(struct drm_gem_object *obj)
{
+ struct drm_gem_lru *lru;
+
if (obj->filp)
fput(obj->filp);
drm_gem_private_object_fini(obj);
drm_gem_free_mmap_offset(obj);
- drm_gem_lru_remove(obj);
+
+ /*
+ * We do the lru != NULL check without the lru->lock held, which
+ * means we might end up with a stale lru value by the time the
+ * lock is acquired.
+ *
+ * This is deemed safe because:
+ * 1. the LRU is assumed to outlive any GEM object it was attached
+ * to (LRUs are usually bound to a drm_device). So even if obj->lru
+ * has since become NULL, the value we read still points to a valid
+ * object that can safely be dereferenced to get the lock.
+ *
+ * 2. all LRUs a GEM object might be attached to must share the same
+ * lock (a lock that's usually part of the driver-specific device
+ * object), so taking the lock on the 'old' LRU is equivalent
+ * to taking it on the new one (if any).
+ */
+ lru = obj->lru;
+ if (lru) {
+ guard(mutex)(lru->lock);
+
+ /* Check a second time with the lock held to make sure we're
+ * not racing with the drm_gem_lru_remove_locked() call in
+ * drm_gem_lru_scan().
+ */
+ if (obj->lru)
+ drm_gem_lru_remove_locked(obj);
+ }
}
EXPORT_SYMBOL(drm_gem_object_release);
@@ -1552,56 +1590,6 @@ drm_gem_lru_init(struct drm_gem_lru *lru, struct mutex *lock)
}
EXPORT_SYMBOL(drm_gem_lru_init);
-static void
-drm_gem_lru_remove_locked(struct drm_gem_object *obj)
-{
- obj->lru->count -= obj->size >> PAGE_SHIFT;
- WARN_ON(obj->lru->count < 0);
- list_del(&obj->lru_node);
- obj->lru = NULL;
-}
-
-/**
- * drm_gem_lru_remove - remove object from whatever LRU it is in
- *
- * If the object is currently in any LRU, remove it.
- *
- * @obj: The GEM object to remove from current LRU
- */
-void
-drm_gem_lru_remove(struct drm_gem_object *obj)
-{
- struct drm_gem_lru *lru = obj->lru;
-
- /*
- * We do the lru != NULL check without the lru->lock held, which
- * means we might end up with a stale lru value by the time the
- * lock is acquired.
- *
- * This is deemed safe because:
- * 1. the LRU is assumed to outlive any GEM object it was attached
- * to (LRUs are usually bound to a drm_device). So even if obj->lru
- * has since become NULL, the value we read still points to a valid
- * object that can safely be dereferenced to get the lock.
- *
- * 2. all LRUs a GEM object might be attached to must share the same
- * lock (a lock that's usually part of the driver-specific device
- * object), so taking the lock on the 'old' LRU is equivalent
- * to taking it on the new one (if any).
- */
- if (!lru)
- return;
-
- mutex_lock(lru->lock);
- /* Check a second time with the lock held to make sure we're not racing
- * with another drm_gem_lru_remove[_locked]() call.
- */
- if (obj->lru)
- drm_gem_lru_remove_locked(obj);
- mutex_unlock(lru->lock);
-}
-EXPORT_SYMBOL(drm_gem_lru_remove);
-
/**
* drm_gem_lru_move_tail_locked - move the object to the tail of the LRU
*
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 86f5846154f7..d527df98d142 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -611,7 +611,6 @@ int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
u32 handle, u64 *offset);
void drm_gem_lru_init(struct drm_gem_lru *lru, struct mutex *lock);
-void drm_gem_lru_remove(struct drm_gem_object *obj);
void drm_gem_lru_move_tail_locked(struct drm_gem_lru *lru, struct drm_gem_object *obj);
void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj);
unsigned long
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 4/4] drm/gem: Make the GEM LRU lock part of drm_device
2026-05-08 10:40 [PATCH v2 0/4] drm/panthor: Fix a race in the shrinker logic Boris Brezillon
` (2 preceding siblings ...)
2026-05-08 10:40 ` [PATCH v2 3/4] drm/gem: Stop exposing the racy/unsafe drm_gem_lru_remove() helper Boris Brezillon
@ 2026-05-08 10:40 ` Boris Brezillon
3 siblings, 0 replies; 7+ messages in thread
From: Boris Brezillon @ 2026-05-08 10:40 UTC (permalink / raw)
To: Steven Price, Liviu Dudau, Boris Brezillon
Cc: Dmitry Osipenko, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Akash Goel,
Chia-I Wu, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
Jessica Zhang, Sean Paul, Marijn Suijten, linux-arm-msm,
freedreno, dri-devel, linux-kernel
Recently, a few races have been discovered in the GEM LRU logic, all
of them caused by the fact that the LRU lock is reached through
gem->lru->lock, while that same lock also protects changes to
gem->lru. This leads to situations where gem->lru must first be read
without the lock held just to find the lock, which is only then
acquired to do the expected operation.
Currently, the two drivers making use of this API declare device-wide
locks, and there's no indication that we will ever have a driver that
wants different pools of LRUs protected by different locks under the
same drm_device. So we're better off moving this lock to drm_device
and always locking it through obj->dev->gem_lru_mutex, or directly
through dev->gem_lru_mutex.
If anyone ever needs more fine-grained locking, this can be revisited
by passing some drm_gem_lru_pool object representing the pool of LRUs
under a specific lock, but for now, the per-device lock seems to be
enough.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
drivers/gpu/drm/drm_drv.c | 2 ++
drivers/gpu/drm/drm_gem.c | 49 ++++++++------------------------
drivers/gpu/drm/msm/msm_drv.c | 11 ++++---
drivers/gpu/drm/msm/msm_drv.h | 7 -----
drivers/gpu/drm/msm/msm_gem.c | 32 ++++++++++-----------
drivers/gpu/drm/msm/msm_gem_shrinker.c | 4 +--
drivers/gpu/drm/msm/msm_gem_submit.c | 6 ++--
drivers/gpu/drm/msm/msm_gem_vma.c | 12 ++++----
drivers/gpu/drm/msm/msm_ringbuffer.c | 6 ++--
drivers/gpu/drm/panthor/panthor_device.h | 3 --
drivers/gpu/drm/panthor/panthor_gem.c | 21 ++++++--------
drivers/gpu/drm/panthor/panthor_mmu.c | 29 ++++++++++---------
include/drm/drm_device.h | 7 +++++
include/drm/drm_gem.h | 20 ++++++-------
14 files changed, 88 insertions(+), 121 deletions(-)
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 985c283cf59f..675675480da4 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -697,6 +697,7 @@ static void drm_dev_init_release(struct drm_device *dev, void *res)
mutex_destroy(&dev->master_mutex);
mutex_destroy(&dev->clientlist_mutex);
mutex_destroy(&dev->filelist_mutex);
+ mutex_destroy(&dev->gem_lru_mutex);
}
static int drm_dev_init(struct drm_device *dev,
@@ -738,6 +739,7 @@ static int drm_dev_init(struct drm_device *dev,
INIT_LIST_HEAD(&dev->vblank_event_list);
spin_lock_init(&dev->event_lock);
+ mutex_init(&dev->gem_lru_mutex);
mutex_init(&dev->filelist_mutex);
mutex_init(&dev->clientlist_mutex);
mutex_init(&dev->master_mutex);
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index c85a39b8b163..a0e6668e93f2 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1127,8 +1127,6 @@ drm_gem_lru_remove_locked(struct drm_gem_object *obj)
void
drm_gem_object_release(struct drm_gem_object *obj)
{
- struct drm_gem_lru *lru;
-
if (obj->filp)
fput(obj->filp);
@@ -1136,30 +1134,7 @@ drm_gem_object_release(struct drm_gem_object *obj)
drm_gem_free_mmap_offset(obj);
- /*
- * We do the lru != NULL check without the lru->lock held, which
- * means we might end up with a stale lru value by the time the
- * lock is acquired.
- *
- * This is deemed safe because:
- * 1. the LRU is assumed to outlive any GEM object it was attached
- * to (LRUs are usually bound to a drm_device). So even if obj->lru
- * has since become NULL, the value we read still points to a valid
- * object that can safely be dereferenced to get the lock.
- *
- * 2. all LRUs a GEM object might be attached to must share the same
- * lock (a lock that's usually part of the driver-specific device
- * object), so taking the lock on the 'old' LRU is equivalent
- * to taking it on the new one (if any).
- */
- lru = obj->lru;
- if (lru) {
- guard(mutex)(lru->lock);
-
- /* Check a second time with the lock held to make sure we're
- * not racing with the drm_gem_lru_remove_locked() call in
- * drm_gem_lru_scan().
- */
+ scoped_guard(mutex, &obj->dev->gem_lru_mutex) {
if (obj->lru)
drm_gem_lru_remove_locked(obj);
}
@@ -1582,9 +1557,8 @@ EXPORT_SYMBOL(drm_gem_unlock_reservations);
* @lock: The lock protecting the LRU
*/
void
-drm_gem_lru_init(struct drm_gem_lru *lru, struct mutex *lock)
+drm_gem_lru_init(struct drm_gem_lru *lru)
{
- lru->lock = lock;
lru->count = 0;
INIT_LIST_HEAD(&lru->list);
}
@@ -1601,7 +1575,7 @@ EXPORT_SYMBOL(drm_gem_lru_init);
void
drm_gem_lru_move_tail_locked(struct drm_gem_lru *lru, struct drm_gem_object *obj)
{
- lockdep_assert_held_once(lru->lock);
+ lockdep_assert_held_once(&obj->dev->gem_lru_mutex);
if (obj->lru)
drm_gem_lru_remove_locked(obj);
@@ -1625,9 +1599,9 @@ EXPORT_SYMBOL(drm_gem_lru_move_tail_locked);
void
drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj)
{
- mutex_lock(lru->lock);
+ mutex_lock(&obj->dev->gem_lru_mutex);
drm_gem_lru_move_tail_locked(lru, obj);
- mutex_unlock(lru->lock);
+ mutex_unlock(&obj->dev->gem_lru_mutex);
}
EXPORT_SYMBOL(drm_gem_lru_move_tail);
@@ -1648,7 +1622,8 @@ EXPORT_SYMBOL(drm_gem_lru_move_tail);
* @ticket: Optional ww_acquire_ctx context to use for locking
*/
unsigned long
-drm_gem_lru_scan(struct drm_gem_lru *lru,
+drm_gem_lru_scan(struct drm_device *dev,
+ struct drm_gem_lru *lru,
unsigned int nr_to_scan,
unsigned long *remaining,
bool (*shrink)(struct drm_gem_object *obj, struct ww_acquire_ctx *ticket),
@@ -1658,9 +1633,9 @@ drm_gem_lru_scan(struct drm_gem_lru *lru,
struct drm_gem_object *obj;
unsigned freed = 0;
- drm_gem_lru_init(&still_in_lru, lru->lock);
+ drm_gem_lru_init(&still_in_lru);
- mutex_lock(lru->lock);
+ mutex_lock(&dev->gem_lru_mutex);
while (freed < nr_to_scan) {
obj = list_first_entry_or_null(&lru->list, typeof(*obj), lru_node);
@@ -1685,7 +1660,7 @@ drm_gem_lru_scan(struct drm_gem_lru *lru,
* rest of the loop body, to reduce contention with other
* code paths that need the LRU lock
*/
- mutex_unlock(lru->lock);
+ mutex_unlock(&dev->gem_lru_mutex);
if (ticket)
ww_acquire_init(ticket, &reservation_ww_class);
@@ -1729,7 +1704,7 @@ drm_gem_lru_scan(struct drm_gem_lru *lru,
tail:
drm_gem_object_put(obj);
- mutex_lock(lru->lock);
+ mutex_lock(&dev->gem_lru_mutex);
}
/*
@@ -1741,7 +1716,7 @@ drm_gem_lru_scan(struct drm_gem_lru *lru,
list_splice_tail(&still_in_lru.list, &lru->list);
lru->count += still_in_lru.count;
- mutex_unlock(lru->lock);
+ mutex_unlock(&dev->gem_lru_mutex);
return freed;
}
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 195f40e331e5..cc2bcd14b1c2 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -128,11 +128,10 @@ static int msm_drm_init(struct device *dev, const struct drm_driver *drv,
/*
* Initialize the LRUs:
*/
- mutex_init(&priv->lru.lock);
- drm_gem_lru_init(&priv->lru.unbacked, &priv->lru.lock);
- drm_gem_lru_init(&priv->lru.pinned, &priv->lru.lock);
- drm_gem_lru_init(&priv->lru.willneed, &priv->lru.lock);
- drm_gem_lru_init(&priv->lru.dontneed, &priv->lru.lock);
+ drm_gem_lru_init(&priv->lru.unbacked);
+ drm_gem_lru_init(&priv->lru.pinned);
+ drm_gem_lru_init(&priv->lru.willneed);
+ drm_gem_lru_init(&priv->lru.dontneed);
/* Initialize stall-on-fault */
spin_lock_init(&priv->fault_stall_lock);
@@ -140,7 +139,7 @@ static int msm_drm_init(struct device *dev, const struct drm_driver *drv,
/* Teach lockdep about lock ordering wrt. shrinker: */
fs_reclaim_acquire(GFP_KERNEL);
- might_lock(&priv->lru.lock);
+ might_lock(&ddev->gem_lru_mutex);
fs_reclaim_release(GFP_KERNEL);
if (priv->kms_init) {
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 76ac61df0b35..c3fb3205f683 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -150,13 +150,6 @@ struct msm_drm_private {
* DONTNEED state (ie. can be purged)
*/
struct drm_gem_lru dontneed;
-
- /**
- * lock:
- *
- * Protects manipulation of all of the LRUs.
- */
- struct mutex lock;
} lru;
struct notifier_block vmap_notifier;
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 2cb3ab04f125..070f5fc4bc17 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -177,11 +177,11 @@ static void update_lru_locked(struct drm_gem_object *obj)
static void update_lru(struct drm_gem_object *obj)
{
- struct msm_drm_private *priv = obj->dev->dev_private;
+ struct drm_device *dev = obj->dev;
- mutex_lock(&priv->lru.lock);
+ mutex_lock(&dev->gem_lru_mutex);
update_lru_locked(obj);
- mutex_unlock(&priv->lru.lock);
+ mutex_unlock(&dev->gem_lru_mutex);
}
static struct page **get_pages(struct drm_gem_object *obj)
@@ -292,11 +292,11 @@ void msm_gem_pin_obj_locked(struct drm_gem_object *obj)
static void pin_obj_locked(struct drm_gem_object *obj)
{
- struct msm_drm_private *priv = obj->dev->dev_private;
+ struct drm_device *dev = obj->dev;
- mutex_lock(&priv->lru.lock);
+ mutex_lock(&dev->gem_lru_mutex);
msm_gem_pin_obj_locked(obj);
- mutex_unlock(&priv->lru.lock);
+ mutex_unlock(&dev->gem_lru_mutex);
}
struct page **msm_gem_pin_pages_locked(struct drm_gem_object *obj)
@@ -487,16 +487,16 @@ int msm_gem_pin_vma_locked(struct drm_gem_object *obj, struct drm_gpuva *vma)
void msm_gem_unpin_locked(struct drm_gem_object *obj)
{
- struct msm_drm_private *priv = obj->dev->dev_private;
+ struct drm_device *dev = obj->dev;
struct msm_gem_object *msm_obj = to_msm_bo(obj);
msm_gem_assert_locked(obj);
- mutex_lock(&priv->lru.lock);
+ mutex_lock(&dev->gem_lru_mutex);
msm_obj->pin_count--;
GEM_WARN_ON(msm_obj->pin_count < 0);
update_lru_locked(obj);
- mutex_unlock(&priv->lru.lock);
+ mutex_unlock(&dev->gem_lru_mutex);
}
/* Special unpin path for use in fence-signaling path, avoiding the need
@@ -507,10 +507,10 @@ void msm_gem_unpin_locked(struct drm_gem_object *obj)
*/
void msm_gem_unpin_active(struct drm_gem_object *obj)
{
- struct msm_drm_private *priv = obj->dev->dev_private;
+ struct drm_device *dev = obj->dev;
struct msm_gem_object *msm_obj = to_msm_bo(obj);
- GEM_WARN_ON(!mutex_is_locked(&priv->lru.lock));
+ GEM_WARN_ON(!mutex_is_locked(&dev->gem_lru_mutex));
msm_obj->pin_count--;
GEM_WARN_ON(msm_obj->pin_count < 0);
@@ -797,12 +797,12 @@ void msm_gem_put_vaddr(struct drm_gem_object *obj)
*/
int msm_gem_madvise(struct drm_gem_object *obj, unsigned madv)
{
- struct msm_drm_private *priv = obj->dev->dev_private;
+ struct drm_device *dev = obj->dev;
struct msm_gem_object *msm_obj = to_msm_bo(obj);
msm_gem_lock(obj);
- mutex_lock(&priv->lru.lock);
+ mutex_lock(&dev->gem_lru_mutex);
if (msm_obj->madv != __MSM_MADV_PURGED)
msm_obj->madv = madv;
@@ -814,7 +814,7 @@ int msm_gem_madvise(struct drm_gem_object *obj, unsigned madv)
*/
update_lru_locked(obj);
- mutex_unlock(&priv->lru.lock);
+ mutex_unlock(&dev->gem_lru_mutex);
msm_gem_unlock(obj);
@@ -839,10 +839,10 @@ void msm_gem_purge(struct drm_gem_object *obj)
put_pages(obj);
- mutex_lock(&priv->lru.lock);
+ mutex_lock(&dev->gem_lru_mutex);
/* A one-way transition: */
msm_obj->madv = __MSM_MADV_PURGED;
- mutex_unlock(&priv->lru.lock);
+ mutex_unlock(&dev->gem_lru_mutex);
drm_gem_free_mmap_offset(obj);
diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index 31fa51a44f86..c07af9602fee 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -186,7 +186,7 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
if (!stages[i].cond)
continue;
stages[i].freed =
- drm_gem_lru_scan(stages[i].lru, nr,
+ drm_gem_lru_scan(priv->dev, stages[i].lru, nr,
&stages[i].remaining,
stages[i].shrink,
&ticket);
@@ -255,7 +255,7 @@ msm_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr)
unsigned long remaining = 0;
for (idx = 0; lrus[idx] && unmapped < vmap_shrink_limit; idx++) {
- unmapped += drm_gem_lru_scan(lrus[idx],
+ unmapped += drm_gem_lru_scan(priv->dev, lrus[idx],
vmap_shrink_limit - unmapped,
&remaining,
vmap_shrink,
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 26ea8a28be47..3c6bc90c3d48 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -352,7 +352,7 @@ static int submit_fence_sync(struct msm_gem_submit *submit)
static int submit_pin_objects(struct msm_gem_submit *submit)
{
- struct msm_drm_private *priv = submit->dev->dev_private;
+ struct drm_device *dev = submit->dev;
int i, ret = 0;
for (i = 0; i < submit->nr_bos; i++) {
@@ -381,11 +381,11 @@ static int submit_pin_objects(struct msm_gem_submit *submit)
* get_pages() which could trigger reclaim.. and if we held the LRU lock
* could trigger deadlock with the shrinker).
*/
- mutex_lock(&priv->lru.lock);
+ mutex_lock(&dev->gem_lru_mutex);
for (i = 0; i < submit->nr_bos; i++) {
msm_gem_pin_obj_locked(submit->bos[i].obj);
}
- mutex_unlock(&priv->lru.lock);
+ mutex_unlock(&dev->gem_lru_mutex);
submit->bos_pinned = true;
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c
index 271691ae32c3..3ed05ab0eeef 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -702,7 +702,7 @@ static struct dma_fence *
msm_vma_job_run(struct drm_sched_job *_job)
{
struct msm_vm_bind_job *job = to_msm_vm_bind_job(_job);
- struct msm_drm_private *priv = job->vm->drm->dev_private;
+ struct drm_device *dev = job->vm->drm;
struct msm_gem_vm *vm = to_msm_vm(job->vm);
struct drm_gem_object *obj;
int ret = vm->unusable ? -EINVAL : 0;
@@ -745,13 +745,13 @@ msm_vma_job_run(struct drm_sched_job *_job)
if (ret)
msm_gem_vm_unusable(job->vm);
- mutex_lock(&priv->lru.lock);
+ mutex_lock(&dev->gem_lru_mutex);
job_foreach_bo (obj, job) {
msm_gem_unpin_active(obj);
}
- mutex_unlock(&priv->lru.lock);
+ mutex_unlock(&dev->gem_lru_mutex);
/* VM_BIND ops are synchronous, so no fence to wait on: */
return NULL;
@@ -1304,7 +1304,7 @@ vm_bind_job_pin_objects(struct msm_vm_bind_job *job)
return PTR_ERR(pages);
}
- struct msm_drm_private *priv = job->vm->drm->dev_private;
+ struct drm_device *dev = job->vm->drm;
/*
* A second loop while holding the LRU lock (a) avoids acquiring/dropping
@@ -1313,10 +1313,10 @@ vm_bind_job_pin_objects(struct msm_vm_bind_job *job)
* get_pages() which could trigger reclaim.. and if we held the LRU lock
* could trigger deadlock with the shrinker).
*/
- mutex_lock(&priv->lru.lock);
+ mutex_lock(&dev->gem_lru_mutex);
job_foreach_bo (obj, job)
msm_gem_pin_obj_locked(obj);
- mutex_unlock(&priv->lru.lock);
+ mutex_unlock(&dev->gem_lru_mutex);
job->bos_pinned = true;
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index a7dafa7ab4b1..0d14c31bd4e4 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -16,13 +16,13 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
struct msm_gem_submit *submit = to_msm_submit(job);
struct msm_fence_context *fctx = submit->ring->fctx;
struct msm_gpu *gpu = submit->gpu;
- struct msm_drm_private *priv = gpu->dev->dev_private;
+ struct drm_device *dev = gpu->dev;
unsigned nr_cmds = submit->nr_cmds;
int i;
msm_fence_init(submit->hw_fence, fctx);
- mutex_lock(&priv->lru.lock);
+ mutex_lock(&dev->gem_lru_mutex);
for (i = 0; i < submit->nr_bos; i++) {
struct drm_gem_object *obj = submit->bos[i].obj;
@@ -32,7 +32,7 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
submit->bos_pinned = false;
- mutex_unlock(&priv->lru.lock);
+ mutex_unlock(&dev->gem_lru_mutex);
/* TODO move submit path over to using a per-ring lock.. */
mutex_lock(&gpu->lock);
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index dcdce75b683b..cc5720312fa9 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -187,9 +187,6 @@ struct panthor_device {
/** @reclaim.shrinker: Shrinker instance */
struct shrinker *shrinker;
- /** @reclaim.lock: Lock protecting all LRUs */
- struct mutex lock;
-
/**
* @reclaim.unreclaimable: unreclaimable BOs
*
diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 8e31740126e7..450516d55faa 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -1497,13 +1497,13 @@ panthor_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
if (!can_swap())
goto out;
- freed += drm_gem_lru_scan(&ptdev->reclaim.unused,
+ freed += drm_gem_lru_scan(&ptdev->base, &ptdev->reclaim.unused,
sc->nr_to_scan - freed, &remaining,
panthor_gem_try_evict_no_resv_wait, NULL);
if (freed >= sc->nr_to_scan)
goto out;
- freed += drm_gem_lru_scan(&ptdev->reclaim.mmapped,
+ freed += drm_gem_lru_scan(&ptdev->base, &ptdev->reclaim.mmapped,
sc->nr_to_scan - freed, &remaining,
panthor_gem_try_evict_no_resv_wait, NULL);
if (freed >= sc->nr_to_scan)
@@ -1517,7 +1517,7 @@ panthor_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
if (freed >= sc->nr_to_scan)
goto out;
- freed += drm_gem_lru_scan(&ptdev->reclaim.gpu_mapped_shared,
+ freed += drm_gem_lru_scan(&ptdev->base, &ptdev->reclaim.gpu_mapped_shared,
sc->nr_to_scan - freed, &remaining,
panthor_gem_try_evict, NULL);
@@ -1546,22 +1546,17 @@ panthor_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
int panthor_gem_shrinker_init(struct panthor_device *ptdev)
{
struct shrinker *shrinker;
- int ret;
-
- ret = drmm_mutex_init(&ptdev->base, &ptdev->reclaim.lock);
- if (ret)
- return ret;
INIT_LIST_HEAD(&ptdev->reclaim.vms);
- drm_gem_lru_init(&ptdev->reclaim.unreclaimable, &ptdev->reclaim.lock);
- drm_gem_lru_init(&ptdev->reclaim.unused, &ptdev->reclaim.lock);
- drm_gem_lru_init(&ptdev->reclaim.mmapped, &ptdev->reclaim.lock);
- drm_gem_lru_init(&ptdev->reclaim.gpu_mapped_shared, &ptdev->reclaim.lock);
+ drm_gem_lru_init(&ptdev->reclaim.unreclaimable);
+ drm_gem_lru_init(&ptdev->reclaim.unused);
+ drm_gem_lru_init(&ptdev->reclaim.mmapped);
+ drm_gem_lru_init(&ptdev->reclaim.gpu_mapped_shared);
ptdev->reclaim.gpu_mapped_count = 0;
/* Teach lockdep about lock ordering wrt. shrinker: */
fs_reclaim_acquire(GFP_KERNEL);
- might_lock(&ptdev->reclaim.lock);
+ might_lock(&ptdev->base.gem_lru_mutex);
fs_reclaim_release(GFP_KERNEL);
shrinker = shrinker_alloc(0, "drm-panthor-gem");
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index 452d0b6d4668..9d4500850561 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -715,10 +715,10 @@ int panthor_vm_active(struct panthor_vm *vm)
* never became active in the first place will be reclaimed last, but
* that's an acceptable trade-off.
*/
- mutex_lock(&ptdev->reclaim.lock);
+ mutex_lock(&ptdev->base.gem_lru_mutex);
if (vm->reclaim.lru.count)
list_move_tail(&vm->reclaim.lru_node, &ptdev->reclaim.vms);
- mutex_unlock(&ptdev->reclaim.lock);
+ mutex_unlock(&ptdev->base.gem_lru_mutex);
/* Make sure we don't race with lock/unlock_region() calls
* happening around VM bind operations.
@@ -1962,9 +1962,9 @@ static void panthor_vm_free(struct drm_gpuvm *gpuvm)
struct panthor_vm *vm = container_of(gpuvm, struct panthor_vm, base);
struct panthor_device *ptdev = vm->ptdev;
- mutex_lock(&ptdev->reclaim.lock);
+ mutex_lock(&ptdev->base.gem_lru_mutex);
list_del_init(&vm->reclaim.lru_node);
- mutex_unlock(&ptdev->reclaim.lock);
+ mutex_unlock(&ptdev->base.gem_lru_mutex);
mutex_lock(&vm->heaps.lock);
if (drm_WARN_ON(&ptdev->base, vm->heaps.pool))
@@ -2360,11 +2360,11 @@ void panthor_vm_update_bo_reclaim_lru_locked(struct panthor_gem_object *bo)
drm_WARN_ON(&ptdev->base, vm);
vm = container_of(vm_bo->vm, struct panthor_vm, base);
- mutex_lock(&ptdev->reclaim.lock);
+ mutex_lock(&ptdev->base.gem_lru_mutex);
drm_gem_lru_move_tail_locked(&vm->reclaim.lru, &bo->base);
if (list_empty(&vm->reclaim.lru_node))
list_move(&vm->reclaim.lru_node, &ptdev->reclaim.vms);
- mutex_unlock(&ptdev->reclaim.lock);
+ mutex_unlock(&ptdev->base.gem_lru_mutex);
}
}
@@ -2774,7 +2774,7 @@ panthor_vm_create(struct panthor_device *ptdev, bool for_mcu,
vm->kernel_auto_va.start = auto_kernel_va_start;
vm->kernel_auto_va.end = vm->kernel_auto_va.start + auto_kernel_va_size - 1;
- drm_gem_lru_init(&vm->reclaim.lru, &ptdev->reclaim.lock);
+ drm_gem_lru_init(&vm->reclaim.lru);
INIT_LIST_HEAD(&vm->reclaim.lru_node);
INIT_LIST_HEAD(&vm->node);
INIT_LIST_HEAD(&vm->as.lru_node);
@@ -3140,7 +3140,7 @@ panthor_mmu_reclaim_priv_bos(struct panthor_device *ptdev,
LIST_HEAD(remaining_vms);
LIST_HEAD(vms);
- mutex_lock(&ptdev->reclaim.lock);
+ mutex_lock(&ptdev->base.gem_lru_mutex);
list_splice_init(&ptdev->reclaim.vms, &vms);
while (freed < nr_to_scan) {
@@ -3156,12 +3156,13 @@ panthor_mmu_reclaim_priv_bos(struct panthor_device *ptdev,
continue;
}
- mutex_unlock(&ptdev->reclaim.lock);
+ mutex_unlock(&ptdev->base.gem_lru_mutex);
- freed += drm_gem_lru_scan(&vm->reclaim.lru, nr_to_scan - freed,
+ freed += drm_gem_lru_scan(&ptdev->base, &vm->reclaim.lru,
+ nr_to_scan - freed,
remaining, shrink, NULL);
- mutex_lock(&ptdev->reclaim.lock);
+ mutex_lock(&ptdev->base.gem_lru_mutex);
/* If the VM is still in the temporary list, remove it so we
* can proceed with the next VM.
@@ -3177,11 +3178,11 @@ panthor_mmu_reclaim_priv_bos(struct panthor_device *ptdev,
list_add_tail(&vm->reclaim.lru_node, &remaining_vms);
}
- mutex_unlock(&ptdev->reclaim.lock);
+ mutex_unlock(&ptdev->base.gem_lru_mutex);
panthor_vm_put(vm);
- mutex_lock(&ptdev->reclaim.lock);
+ mutex_lock(&ptdev->base.gem_lru_mutex);
}
/* Re-insert VMs with remaining data to reclaim at the beginning of
@@ -3192,7 +3193,7 @@ panthor_mmu_reclaim_priv_bos(struct panthor_device *ptdev,
*/
list_splice_tail(&vms, &remaining_vms);
list_splice(&remaining_vms, &ptdev->reclaim.vms);
- mutex_unlock(&ptdev->reclaim.lock);
+ mutex_unlock(&ptdev->base.gem_lru_mutex);
return freed;
}
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index bc78fb77cc27..768a8dae83c5 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -375,6 +375,13 @@ struct drm_device {
* Root directory for debugfs files.
*/
struct dentry *debugfs_root;
+
+ /**
+ * @gem_lru_mutex:
+ *
+ * Lock protecting movement of GEM objects between LRUs.
+ */
+ struct mutex gem_lru_mutex;
};
void drm_dev_set_dma_dev(struct drm_device *dev, struct device *dma_dev);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index d527df98d142..dd1a9cd559be 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -245,17 +245,11 @@ struct drm_gem_object_funcs {
* for lockless &shrinker.count_objects, and provides
* &drm_gem_lru_scan for driver's &shrinker.scan_objects
* implementation.
+ *
+ * Any access to a drm_gem_lru object must be done with
+ * drm_device::gem_lru_mutex held.
*/
struct drm_gem_lru {
- /**
- * @lock:
- *
- * Lock protecting movement of GEM objects between LRUs. All
- * LRUs that the object can move between should be protected
- * by the same lock.
- */
- struct mutex *lock;
-
/**
* @count:
*
@@ -453,6 +447,9 @@ struct drm_gem_object {
* @lru:
*
* The current LRU list that the GEM object is on.
+ *
+ * Access to this field must be done with drm_device::gem_lru_mutex
+ * held.
*/
struct drm_gem_lru *lru;
};
@@ -610,11 +607,12 @@ void drm_gem_unlock_reservations(struct drm_gem_object **objs, int count,
int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
u32 handle, u64 *offset);
-void drm_gem_lru_init(struct drm_gem_lru *lru, struct mutex *lock);
+void drm_gem_lru_init(struct drm_gem_lru *lru);
void drm_gem_lru_move_tail_locked(struct drm_gem_lru *lru, struct drm_gem_object *obj);
void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj);
unsigned long
-drm_gem_lru_scan(struct drm_gem_lru *lru,
+drm_gem_lru_scan(struct drm_device *dev,
+ struct drm_gem_lru *lru,
unsigned int nr_to_scan,
unsigned long *remaining,
bool (*shrink)(struct drm_gem_object *obj, struct ww_acquire_ctx *ticket),
--
2.54.0
* Re: [PATCH v2 2/4] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release()
2026-05-08 10:40 ` [PATCH v2 2/4] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release() Boris Brezillon
@ 2026-05-08 13:49 ` Liviu Dudau
0 siblings, 0 replies; 7+ messages in thread
From: Liviu Dudau @ 2026-05-08 13:49 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, Dmitry Osipenko, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Akash Goel,
Chia-I Wu, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
Jessica Zhang, Sean Paul, Marijn Suijten, linux-arm-msm,
freedreno, dri-devel, linux-kernel
On Fri, May 08, 2026 at 12:40:48PM +0200, Boris Brezillon wrote:
> The following race can currently happen:
>
> | Thread 0 in `drm_gem_lru_scan` | Thread 1 in `drm_gem_object_release` |
> | - | - |
> | move obj1 with refcount==0 to `still_in_lru` | |
> | move obj2 with refcount!=0 to `still_in_lru` | |
> | mutex_unlock | |
> | shrink obj2 | |
> | | lru = obj1->lru; // `still_in_lru` |
> | mutex_lock | |
> | move obj1 back to the original lru | |
> | mutex_unlock | |
> | return | |
> | | dereference `still_in_lru` |
>
> Move the drm_gem_lru_move_tail_locked() call after the
> kref_get_unless_zero() check so that we don't end up with a
> vanishing LRU when we hit drm_gem_object_release(). We also need to
> remove the skipped object from its LRU, otherwise we'll keep hitting
> it on subsequent loop iterations until it's actually removed from the
> list in drm_gem_object_release().
>
> Fixes: e7c2af13f811 ("drm/gem: Add LRU/shrinker helper")
> Reported-by: Chia-I Wu <olvaffe@gmail.com>
> Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/86
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
> ---
> drivers/gpu/drm/drm_gem.c | 34 ++++++++++++++++++++++++++++------
> 1 file changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index fca42949eb2b..0e087c770883 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1573,11 +1573,31 @@ drm_gem_lru_remove(struct drm_gem_object *obj)
> {
> struct drm_gem_lru *lru = obj->lru;
>
> + /*
> + * We do the lru != NULL check without the lru->lock held, which
> + * means we might end up with a stale lru value by the time the
> + * lock is acquired.
> + *
> + * This is deemed safe because:
> + * 1. the LRU is assumed to outlive any GEM object it was attached
> + * to (LRUs are usually bound to a drm_device). So even if obj->lru
> + * has become NULL, the stale value still points to a valid object
> + * that can safely be dereferenced to get the lock.
> + *
> + * 2. all LRUs a GEM object might be attached to must share the same
> + * lock (a lock that's usually part of the driver-specific device
> + * object), so taking the lock on the 'old' LRU is equivalent
> + * to taking it on the new one (if any)
I like the description, but I think it's worth merging the later comment about
the second check into this one, as that is basically the whole "belt and braces"
mechanism for ensuring correctness.
Best regards,
Liviu
> + */
> if (!lru)
> return;
>
> mutex_lock(lru->lock);
> - drm_gem_lru_remove_locked(obj);
> + /* Check a second time with the lock held to make sure we're not racing
> + * with another drm_gem_lru_remove[_locked]() call.
> + */
> + if (obj->lru)
> + drm_gem_lru_remove_locked(obj);
> mutex_unlock(lru->lock);
> }
> EXPORT_SYMBOL(drm_gem_lru_remove);
> @@ -1660,15 +1680,17 @@ drm_gem_lru_scan(struct drm_gem_lru *lru,
> if (!obj)
> break;
>
> - drm_gem_lru_move_tail_locked(&still_in_lru, obj);
> -
> /*
> * If it's in the process of being freed, gem_object->free()
> - * may be blocked on lock waiting to remove it. So just
> - * skip it.
> + * may be blocked on lock waiting to remove it. So just remove
> + * it from its current LRU and skip it.
> */
> - if (!kref_get_unless_zero(&obj->refcount))
> + if (!kref_get_unless_zero(&obj->refcount)) {
> + drm_gem_lru_remove_locked(obj);
> continue;
> + }
> +
> + drm_gem_lru_move_tail_locked(&still_in_lru, obj);
>
> /*
> * Now that we own a reference, we can drop the lock for the
>
> --
> 2.54.0
>
--
====================
| I would like to |
| fix the world, |
| but they're not |
| giving me the |
\ source code! /
---------------
¯\_(ツ)_/¯
* Re: [PATCH v2 3/4] drm/gem: Stop exposing the racy/unsafe drm_gem_lru_remove() helper
2026-05-08 10:40 ` [PATCH v2 3/4] drm/gem: Stop exposing the racy/unsafe drm_gem_lru_remove() helper Boris Brezillon
@ 2026-05-08 15:00 ` Liviu Dudau
0 siblings, 0 replies; 7+ messages in thread
From: Liviu Dudau @ 2026-05-08 15:00 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, Dmitry Osipenko, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Akash Goel,
Chia-I Wu, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
Jessica Zhang, Sean Paul, Marijn Suijten, linux-arm-msm,
freedreno, dri-devel, linux-kernel
On Fri, May 08, 2026 at 12:40:49PM +0200, Boris Brezillon wrote:
> The only place where it's safe to call drm_gem_lru_remove() is when
> we know the drm_gem_object::lru field can't be concurrently updated,
> which we know is the case when the drm_gem_object is destroyed.
>
> Rather than trying to make that safe, let's kill the function and inline
> its content in drm_gem_object_release().
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Best regards,
Liviu
> ---
> drivers/gpu/drm/drm_gem.c | 90 ++++++++++++++++++++---------------------------
> include/drm/drm_gem.h | 1 -
> 2 files changed, 39 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 0e087c770883..c85a39b8b163 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1108,6 +1108,15 @@ drm_gem_release(struct drm_device *dev, struct drm_file *file_private)
> idr_destroy(&file_private->object_idr);
> }
>
> +static void
> +drm_gem_lru_remove_locked(struct drm_gem_object *obj)
> +{
> + obj->lru->count -= obj->size >> PAGE_SHIFT;
> + WARN_ON(obj->lru->count < 0);
> + list_del(&obj->lru_node);
> + obj->lru = NULL;
> +}
> +
> /**
> * drm_gem_object_release - release GEM buffer object resources
> * @obj: GEM buffer object
> @@ -1118,13 +1127,42 @@ drm_gem_release(struct drm_device *dev, struct drm_file *file_private)
> void
> drm_gem_object_release(struct drm_gem_object *obj)
> {
> + struct drm_gem_lru *lru;
> +
> if (obj->filp)
> fput(obj->filp);
>
> drm_gem_private_object_fini(obj);
>
> drm_gem_free_mmap_offset(obj);
> - drm_gem_lru_remove(obj);
> +
> + /*
> + * We do the lru != NULL check without the lru->lock held, which
> + * means we might end up with a stale lru value by the time the
> + * lock is acquired.
> + *
> + * This is deemed safe because:
> + * 1. the LRU is assumed to outlive any GEM object it was attached
> + * to (LRUs are usually bound to a drm_device). So even if obj->lru
> + * has become NULL, the stale value still points to a valid object
> + * that can safely be dereferenced to get the lock.
> + *
> + * 2. all LRUs a GEM object might be attached to must share the same
> + * lock (a lock that's usually part of the driver-specific device
> + * object), so taking the lock on the 'old' LRU is equivalent
> + * to taking it on the new one (if any)
> + */
> + lru = obj->lru;
> + if (lru) {
> + guard(mutex)(lru->lock);
> +
> + /* Check a second time with the lock held to make sure we're
> + * not racing with the drm_gem_lru_remove_locked() call in
> + * drm_gem_lru_scan().
> + */
> + if (obj->lru)
> + drm_gem_lru_remove_locked(obj);
> + }
> }
> EXPORT_SYMBOL(drm_gem_object_release);
>
> @@ -1552,56 +1590,6 @@ drm_gem_lru_init(struct drm_gem_lru *lru, struct mutex *lock)
> }
> EXPORT_SYMBOL(drm_gem_lru_init);
>
> -static void
> -drm_gem_lru_remove_locked(struct drm_gem_object *obj)
> -{
> - obj->lru->count -= obj->size >> PAGE_SHIFT;
> - WARN_ON(obj->lru->count < 0);
> - list_del(&obj->lru_node);
> - obj->lru = NULL;
> -}
> -
> -/**
> - * drm_gem_lru_remove - remove object from whatever LRU it is in
> - *
> - * If the object is currently in any LRU, remove it.
> - *
> - * @obj: The GEM object to remove from current LRU
> - */
> -void
> -drm_gem_lru_remove(struct drm_gem_object *obj)
> -{
> - struct drm_gem_lru *lru = obj->lru;
> -
> - /*
> - * We do the lru != NULL check without the lru->lock held, which
> - * means we might end up with a stale lru value by the time the
> - * lock is acquired.
> - *
> - * This is deemed safe because:
> - * 1. the LRU is assumed to outlive any GEM object it was attached
> - * to (LRUs are usually bound to a drm_device). So even if obj->lru
> - * has become NULL, the stale value still points to a valid object
> - * that can safely be dereferenced to get the lock.
> - *
> - * 2. all LRUs a GEM object might be attached to must share the same
> - * lock (a lock that's usually part of the driver-specific device
> - * object), so taking the lock on the 'old' LRU is equivalent
> - * to taking it on the new one (if any)
> - */
> - if (!lru)
> - return;
> -
> - mutex_lock(lru->lock);
> - /* Check a second time with the lock held to make sure we're not racing
> - * with another drm_gem_lru_remove[_locked]() call.
> - */
> - if (obj->lru)
> - drm_gem_lru_remove_locked(obj);
> - mutex_unlock(lru->lock);
> -}
> -EXPORT_SYMBOL(drm_gem_lru_remove);
> -
> /**
> * drm_gem_lru_move_tail_locked - move the object to the tail of the LRU
> *
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 86f5846154f7..d527df98d142 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -611,7 +611,6 @@ int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
> u32 handle, u64 *offset);
>
> void drm_gem_lru_init(struct drm_gem_lru *lru, struct mutex *lock);
> -void drm_gem_lru_remove(struct drm_gem_object *obj);
> void drm_gem_lru_move_tail_locked(struct drm_gem_lru *lru, struct drm_gem_object *obj);
> void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj);
> unsigned long
>
> --
> 2.54.0
>
Thread overview: 7+ messages
2026-05-08 10:40 [PATCH v2 0/4] drm/panthor: Fix a race in the shrinker logic Boris Brezillon
2026-05-08 10:40 ` [PATCH v2 1/4] drm/panthor: Don't use the racy drm_gem_lru_remove() helper Boris Brezillon
2026-05-08 10:40 ` [PATCH v2 2/4] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release() Boris Brezillon
2026-05-08 13:49 ` Liviu Dudau
2026-05-08 10:40 ` [PATCH v2 3/4] drm/gem: Stop exposing the racy/unsafe drm_gem_lru_remove() helper Boris Brezillon
2026-05-08 15:00 ` Liviu Dudau
2026-05-08 10:40 ` [PATCH v2 4/4] drm/gem: Make the GEM LRU lock part of drm_device Boris Brezillon