Date: Fri, 8 May 2026 14:49:16 +0100
From: Liviu Dudau
To: Boris Brezillon
Cc: Steven Price, Dmitry Osipenko, Maarten Lankhorst, Maxime Ripard,
 Thomas Zimmermann, David Airlie, Simona Vetter, Akash Goel, Chia-I Wu,
 Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
 Marijn Suijten, linux-arm-msm@vger.kernel.org,
 freedreno@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/4] drm/gem: Fix a race between drm_gem_lru_scan()
 and drm_gem_object_release()
References: <20260508-panthor-shrinker-fixes-v2-0-39cdb7d577c9@collabora.com>
 <20260508-panthor-shrinker-fixes-v2-2-39cdb7d577c9@collabora.com>
In-Reply-To: <20260508-panthor-shrinker-fixes-v2-2-39cdb7d577c9@collabora.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 08, 2026 at 12:40:48PM +0200, Boris Brezillon wrote:
> The following race can currently happen:
>
> | Thread 0 in `drm_gem_lru_scan` | Thread 1 in `drm_gem_object_release` |
> | - | - |
> | move obj1 with refcount==0 to `still_in_lru` | |
> | move obj2 with refcount!=0 to `still_in_lru` | |
> | mutex_unlock | |
> | shrink obj2 | |
> | | lru = obj1->lru; // `still_in_lru` |
> | mutex_lock | |
> | move obj1 back to the original lru | |
> | mutex_unlock | |
> | return | |
> | | dereference `still_in_lru` |
>
> Move the drm_gem_lru_move_tail_locked() after the
> kref_get_unless_zero() check so that we don't end up with a
> vanishing LRU when we hit
> drm_gem_object_release(). We also need to
> remove the skipped object from its LRU, otherwise we'll keep hitting
> it on subsequent loop iterations until it's actually removed from the
> list in the drm_gem_release().
>
> Fixes: e7c2af13f811 ("drm/gem: Add LRU/shrinker helper")
> Reported-by: Chia-I Wu
> Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/86
> Signed-off-by: Boris Brezillon

Reviewed-by: Liviu Dudau

> ---
>  drivers/gpu/drm/drm_gem.c | 34 ++++++++++++++++++++++++++++------
>  1 file changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index fca42949eb2b..0e087c770883 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1573,11 +1573,31 @@ drm_gem_lru_remove(struct drm_gem_object *obj)
>  {
>  	struct drm_gem_lru *lru = obj->lru;
>
> +	/*
> +	 * We do the lru != NULL check without the lru->lock held, which
> +	 * means we might end up with a stale lru value by the time the
> +	 * lock is acquired.
> +	 *
> +	 * This is deemed safe because:
> +	 * 1. the LRU is assumed to outlive any GEM object it was attached
> +	 *    (LRUs are usually bound to a drm_device). So even if obj->lru
> +	 *    has become NULL, it still point to a valid object that can
> +	 *    safely be dereferenced to get the lock.
> +	 *
> +	 * 2. all LRUs a GEM object might be attached to must share the same
> +	 *    lock (lock that's usually part of the driver-specific device
> +	 *    object), so taking the lock on the 'old' LRU is equivalent
> +	 *    to taking it on the new one (if any)

I like the description, but I think it's worth merging the later comment
around the second check here as that is basically the whole "belt and
braces" mechanism for ensuring correctness.

Best regards,
Liviu

> +	 */
>  	if (!lru)
>  		return;
>
>  	mutex_lock(lru->lock);
> -	drm_gem_lru_remove_locked(obj);
> +	/* Check a second time with the lock held to make sure we're not racing
> +	 * with another drm_gem_lru_remove[_locked]() call.
> +	 */
> +	if (obj->lru)
> +		drm_gem_lru_remove_locked(obj);
>  	mutex_unlock(lru->lock);
>  }
>  EXPORT_SYMBOL(drm_gem_lru_remove);
> @@ -1660,15 +1680,17 @@ drm_gem_lru_scan(struct drm_gem_lru *lru,
>  		if (!obj)
>  			break;
>
> -		drm_gem_lru_move_tail_locked(&still_in_lru, obj);
> -
>  		/*
>  		 * If it's in the process of being freed, gem_object->free()
> -		 * may be blocked on lock waiting to remove it. So just
> -		 * skip it.
> +		 * may be blocked on lock waiting to remove it. So just remove
> +		 * it from its current LRU and skip it.
>  		 */
> -		if (!kref_get_unless_zero(&obj->refcount))
> +		if (!kref_get_unless_zero(&obj->refcount)) {
> +			drm_gem_lru_remove_locked(obj);
>  			continue;
> +		}
> +
> +		drm_gem_lru_move_tail_locked(&still_in_lru, obj);
>
>  		/*
>  		 * Now that we own a reference, we can drop the lock for the
>
> --
> 2.54.0
>

--
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯