All of lore.kernel.org
 help / color / mirror / Atom feed
From: Liviu Dudau <liviu.dudau@arm.com>
To: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Steven Price <steven.price@arm.com>,
	Dmitry Osipenko <dmitry.osipenko@collabora.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
	Akash Goel <akash.goel@arm.com>, Chia-I Wu <olvaffe@gmail.com>,
	Rob Clark <robin.clark@oss.qualcomm.com>,
	Dmitry Baryshkov <lumag@kernel.org>,
	Abhinav Kumar <abhinav.kumar@linux.dev>,
	Jessica Zhang <jesszhan0024@gmail.com>,
	Sean Paul <sean@poorly.run>,
	Marijn Suijten <marijn.suijten@somainline.org>,
	linux-arm-msm@vger.kernel.org, freedreno@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/4] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release()
Date: Fri, 8 May 2026 14:49:16 +0100	[thread overview]
Message-ID: <af3p3GsJAskjVMWQ@e142607> (raw)
In-Reply-To: <20260508-panthor-shrinker-fixes-v2-2-39cdb7d577c9@collabora.com>

On Fri, May 08, 2026 at 12:40:48PM +0200, Boris Brezillon wrote:
> The following race can currently happen:
> 
> | Thread 0 in `drm_gem_lru_scan`               | Thread 1 in `drm_gem_object_release` |
> | -                                            | -                                    |
> | move obj1 with refcount==0 to `still_in_lru` |                                      |
> | move obj2 with refcount!=0 to `still_in_lru` |                                      |
> | mutex_unlock                                 |                                      |
> | shrink obj2                                  |                                      |
> |                                              | lru = obj1->lru; // `still_in_lru`   |
> | mutex_lock                                   |                                      |
> | move obj1 back to the original lru           |                                      |
> | mutex_unlock                                 |                                      |
> | return                                       |                                      |
> |                                              | dereference `still_in_lru`           |
> 
> Move the drm_gem_lru_move_tail_locked() after the
> kref_get_unless_zero() check so that we don't end up with a
> vanishing LRU when we hit drm_gem_object_release(). We also need to
> remove the skipped object from its LRU, otherwise we'll keep hitting
> it on subsequent loop iterations until it's actually removed from the
> list in the drm_gem_release().
> 
> Fixes: e7c2af13f811 ("drm/gem: Add LRU/shrinker helper")
> Reported-by: Chia-I Wu <olvaffe@gmail.com>
> Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/86
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>

> ---
>  drivers/gpu/drm/drm_gem.c | 34 ++++++++++++++++++++++++++++------
>  1 file changed, 28 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index fca42949eb2b..0e087c770883 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1573,11 +1573,31 @@ drm_gem_lru_remove(struct drm_gem_object *obj)
>  {
>  	struct drm_gem_lru *lru = obj->lru;
>  
> +	/*
> +	 * We do the lru != NULL check without the lru->lock held, which
> +	 * means we might end up with a stale lru value by the time the
> +	 * lock is acquired.
> +	 *
> +	 * This is deemed safe because:
> +	 * 1. the LRU is assumed to outlive any GEM object it was attached
> +	 *    (LRUs are usually bound to a drm_device). So even if obj->lru
> +	 *    has become NULL, it still point to a valid object that can
> +	 *    safely be dereferenced to get the lock.
> +	 *
> +	 * 2. all LRUs a GEM object might be attached to must share the same
> +	 *    lock (lock that's usually part of the driver-specific device
> +	 *    object), so taking the lock on the 'old' LRU is equivalent
> +	 *    to taking it on the new one (if any)

I like the description, but I think it's worth merging the later comment around
the second check here as that is basically the whole "belt and braces" mechanism
for ensuring correctness.

Best regards,
Liviu

> +	 */
>  	if (!lru)
>  		return;
>  
>  	mutex_lock(lru->lock);
> -	drm_gem_lru_remove_locked(obj);
> +	/* Check a second time with the lock held to make sure we're not racing
> +	 * with another drm_gem_lru_remove[_locked]() call.
> +	 */
> +	if (obj->lru)
> +		drm_gem_lru_remove_locked(obj);
>  	mutex_unlock(lru->lock);
>  }
>  EXPORT_SYMBOL(drm_gem_lru_remove);
> @@ -1660,15 +1680,17 @@ drm_gem_lru_scan(struct drm_gem_lru *lru,
>  		if (!obj)
>  			break;
>  
> -		drm_gem_lru_move_tail_locked(&still_in_lru, obj);
> -
>  		/*
>  		 * If it's in the process of being freed, gem_object->free()
> -		 * may be blocked on lock waiting to remove it.  So just
> -		 * skip it.
> +		 * may be blocked on lock waiting to remove it.  So just remove
> +		 * it from its current LRU and skip it.
>  		 */
> -		if (!kref_get_unless_zero(&obj->refcount))
> +		if (!kref_get_unless_zero(&obj->refcount)) {
> +			drm_gem_lru_remove_locked(obj);
>  			continue;
> +		}
> +
> +		drm_gem_lru_move_tail_locked(&still_in_lru, obj);
>  
>  		/*
>  		 * Now that we own a reference, we can drop the lock for the
> 
> -- 
> 2.54.0
> 

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯

  reply	other threads:[~2026-05-08 13:49 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-08 10:40 [PATCH v2 0/4] drm/panthor: Fix a race in the shrinker logic Boris Brezillon
2026-05-08 10:40 ` [PATCH v2 1/4] drm/panthor: Don't use the racy drm_gem_lru_remove() helper Boris Brezillon
2026-05-08 10:40 ` [PATCH v2 2/4] drm/gem: Fix a race between drm_gem_lru_scan() and drm_gem_object_release() Boris Brezillon
2026-05-08 13:49   ` Liviu Dudau [this message]
2026-05-08 10:40 ` [PATCH v2 3/4] drm/gem: Stop exposing the racy/unsafe drm_gem_lru_remove() helper Boris Brezillon
2026-05-08 15:00   ` Liviu Dudau
2026-05-08 10:40 ` [PATCH v2 4/4] drm/gem: Make the GEM LRU lock part of drm_device Boris Brezillon
2026-05-11  9:11   ` Liviu Dudau
2026-05-11 13:18   ` Rob Clark
2026-05-11 15:44     ` Boris Brezillon
2026-05-11 16:16       ` Rob Clark
2026-05-11 16:26         ` Boris Brezillon
2026-05-11 16:27         ` Liviu Dudau
2026-05-11 16:32           ` Boris Brezillon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=af3p3GsJAskjVMWQ@e142607 \
    --to=liviu.dudau@arm.com \
    --cc=abhinav.kumar@linux.dev \
    --cc=airlied@gmail.com \
    --cc=akash.goel@arm.com \
    --cc=boris.brezillon@collabora.com \
    --cc=dmitry.osipenko@collabora.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=freedreno@lists.freedesktop.org \
    --cc=jesszhan0024@gmail.com \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lumag@kernel.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=marijn.suijten@somainline.org \
    --cc=mripard@kernel.org \
    --cc=olvaffe@gmail.com \
    --cc=robin.clark@oss.qualcomm.com \
    --cc=sean@poorly.run \
    --cc=simona@ffwll.ch \
    --cc=steven.price@arm.com \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.