Linux kernel -stable discussions
 help / color / mirror / Atom feed
From: Thomas Zimmermann <tzimmermann@suse.de>
To: Simona Vetter <simona.vetter@ffwll.ch>
Cc: DRI Development <dri-devel@lists.freedesktop.org>,
	intel-xe@lists.freedesktop.org,
	Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>,
	stable@vger.kernel.org,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
	Simona Vetter <simona.vetter@intel.com>
Subject: Re: [PATCH 1/8] drm/gem: Fix race in drm_gem_handle_create_tail()
Date: Tue, 3 Jun 2025 14:40:38 +0200	[thread overview]
Message-ID: <35b01b87-6345-4d59-99be-e39daa226dd5@suse.de> (raw)
In-Reply-To: <aD7gcvEaZzoDRRc1@phenom.ffwll.local>

Hi

Am 03.06.25 um 13:45 schrieb Simona Vetter:
> On Mon, Jun 02, 2025 at 05:15:58PM +0200, Thomas Zimmermann wrote:
>> Hi
>>
>> Am 28.05.25 um 11:12 schrieb Simona Vetter:
>>> Object creation is a careful dance where we must guarantee that the
>>> object is fully constructed before it is visible to other threads, and
>>> GEM buffer objects are no difference.
>>>
>>> Final publishing happens by calling drm_gem_handle_create(). After
>>> that the only allowed thing to do is call drm_gem_object_put() because
>>> a concurrent call to the GEM_CLOSE ioctl with a correctly guessed id
>>> (which is trivial since we have a linear allocator) can already tear
>>> down the object again.
>>>
>>> Luckily most drivers get this right, the very few exceptions I've
>>> pinged the relevant maintainers for. Unfortunately we also need
>>> drm_gem_handle_create() when creating additional handles for an
>>> already existing object (e.g. GETFB ioctl or the various bo import
>>> ioctl), and hence we cannot have a drm_gem_handle_create_and_put() as
>>> the only exported function to stop these issues from happening.
>>>
>>> Now unfortunately the implementation of drm_gem_handle_create() isn't
>>> living up to standards: It does correctly finishe object
>>> initialization at the global level, and hence is safe against a
>>> concurrent tear down. But it also sets up the file-private aspects of
>>> the handle, and that part goes wrong: We fully register the object in
>>> the drm_file.object_idr before calling drm_vma_node_allow() or
>>> obj->funcs->open, which opens up races against concurrent removal of
>>> that handle in drm_gem_handle_delete().
>>>
>>> Fix this with the usual two-stage approach of first reserving the
>>> handle id, and then only registering the object after we've completed
>>> the file-private setup.
>>>
>>> Jacek reported this with a testcase of concurrently calling GEM_CLOSE
>>> on a freshly-created object (which also destroys the object), but it
>>> should be possible to hit this with just additional handles created
>>> through import or GETFB without completed destroying the underlying
>>> object with the concurrent GEM_CLOSE ioctl calls.
>>>
>>> Note that the close-side of this race was fixed in f6cd7daecff5 ("drm:
>>> Release driver references to handle before making it available
>>> again"), which means a cool 9 years have passed until someone noticed
>>> that we need to make this symmetry or there's still gaps left :-/
>>> Without the 2-stage close approach we'd still have a race, therefore
>>> that's an integral part of this bugfix.
>>>
>>> More importantly, this means we can have NULL pointers behind
>>> allocated id in our drm_file.object_idr. We need to check for that
>>> now:
>>>
>>> - drm_gem_handle_delete() checks for ERR_OR_NULL already
>>>
>>> - drm_gem.c:object_lookup() also chekcs for NULL
>>>
>>> - drm_gem_release() should never be called if there's another thread
>>>     still existing that could call into an IOCTL that creates a new
>>>     handle, so cannot race. For paranoia I added a NULL check to
>>>     drm_gem_object_release_handle() though.
>>>
>>> - most drivers (etnaviv, i915, msm) are find because they use
>>>     idr_find, which maps both ENOENT and NULL to NULL.
>>>
>>> - vmgfx is already broken vmw_debugfs_gem_info_show() because NULL
>>>     pointers might exist due to drm_gem_handle_delete(). This needs a
>>>     separate patch. This is because idr_for_each_entry terminates on the
>>>     first NULL entry and so might not iterate over everything.
>>>
>>> - similar for amd in amdgpu_debugfs_gem_info_show() and
>>>     amdgpu_gem_force_release(). The latter is really questionable though
>>>     since it's a best effort hack and there's no way to close all the
>>>     races. Needs separate patches.
>>>
>>> - xe is really broken because it not uses idr_for_each_entry() but
>>>     also drops the drm_file.table_lock, which can wreak the idr iterator
>>>     state if you're unlucky enough. Maybe another reason to look into
>>>     the drm fdinfo memory stats instead of hand-rolling too much.
>>>
>>> - drm_show_memory_stats() is also broken since it uses
>>>     idr_for_each_entry. But since that's a preexisting bug I'll follow
>>>     up with a separate patch.
>>>
>>> Reported-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
>>> Cc: stable@vger.kernel.org
>>> Cc: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>> Cc: Maxime Ripard <mripard@kernel.org>
>>> Cc: Thomas Zimmermann <tzimmermann@suse.de>
>>> Cc: David Airlie <airlied@gmail.com>
>>> Cc: Simona Vetter <simona@ffwll.ch>
>>> Signed-off-by: Simona Vetter <simona.vetter@intel.com>
>>> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
>>> ---
>>>    drivers/gpu/drm/drm_gem.c | 10 +++++++++-
>>>    include/drm/drm_file.h    |  3 +++
>>>    2 files changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
>>> index 1e659d2660f7..e4e20dda47b1 100644
>>> --- a/drivers/gpu/drm/drm_gem.c
>>> +++ b/drivers/gpu/drm/drm_gem.c
>>> @@ -279,6 +279,9 @@ drm_gem_object_release_handle(int id, void *ptr, void *data)
>>>    	struct drm_file *file_priv = data;
>>>    	struct drm_gem_object *obj = ptr;
>>> +	if (WARN_ON(!data))
>>> +		return 0;
>>> +
>>>    	if (obj->funcs->close)
>>>    		obj->funcs->close(obj, file_priv);
>>> @@ -399,7 +402,7 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
>>>    	idr_preload(GFP_KERNEL);
>>>    	spin_lock(&file_priv->table_lock);
>>> -	ret = idr_alloc(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT);
>>> +	ret = idr_alloc(&file_priv->object_idr, NULL, 1, 0, GFP_NOWAIT);
>>>    	spin_unlock(&file_priv->table_lock);
>>>    	idr_preload_end();
>>> @@ -420,6 +423,11 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
>>>    			goto err_revoke;
>>>    	}
>>> +	/* mirrors drm_gem_handle_delete to avoid races */
>>> +	spin_lock(&file_priv->table_lock);
>>> +	obj = idr_replace(&file_priv->object_idr, obj, handle);
>>> +	WARN_ON(obj != NULL);
>> A DRM print function would be preferable. The obj here is an errno pointer.
>> Should the errno code be part of the error message?
>>
>> If it fails, why does the function still succeed?
> This is an internal error that should never happen, at that point just
> bailing out is the way to go.
>
> Also note that the error code here is just to satisfy the function
> signature that id_for_each expects, we don't look at it ever (since if
> there's no bugs, it should never fail). I learned this because I actually
> removed the int return value and stuff didn't compile :-)

I see.

>
> I can use drm_WARN_ON if you want me to though?

If you use drm_WARN_ON, you can add

Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>

Best regards
Thomas

>
> I'll also explain this in the commit message for the next round.
> -Sima
>
>> Best regards
>> Thomas
>>
>>> +	spin_unlock(&file_priv->table_lock);
>>>    	*handlep = handle;
>>>    	return 0;
>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>> index 5c3b2aa3e69d..d344d41e6cfe 100644
>>> --- a/include/drm/drm_file.h
>>> +++ b/include/drm/drm_file.h
>>> @@ -300,6 +300,9 @@ struct drm_file {
>>>    	 *
>>>    	 * Mapping of mm object handles to object pointers. Used by the GEM
>>>    	 * subsystem. Protected by @table_lock.
>>> +	 *
>>> +	 * Note that allocated entries might be NULL as a transient state when
>>> +	 * creating or deleting a handle.
>>>    	 */
>>>    	struct idr object_idr;
>> -- 
>> --
>> Thomas Zimmermann
>> Graphics Driver Developer
>> SUSE Software Solutions Germany GmbH
>> Frankenstrasse 146, 90461 Nuernberg, Germany
>> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
>> HRB 36809 (AG Nuernberg)
>>

-- 
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)


  reply	other threads:[~2025-06-03 12:40 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20250528091307.1894940-1-simona.vetter@ffwll.ch>
2025-05-28  9:12 ` [PATCH 1/8] drm/gem: Fix race in drm_gem_handle_create_tail() Simona Vetter
2025-05-28  9:26   ` Simona Vetter
2025-05-28 13:20   ` Jacek Lawrynowicz
2025-06-02 15:15   ` Thomas Zimmermann
2025-06-03 11:45     ` Simona Vetter
2025-06-03 12:40       ` Thomas Zimmermann [this message]
2025-06-04  9:02       ` Simona Vetter
2025-05-28  9:13 ` [PATCH 2/8] drm/fdinfo: Switch to idr_for_each() in drm_show_memory_stats() Simona Vetter
2025-05-28  9:22   ` Simona Vetter
2025-05-28 20:10   ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=35b01b87-6345-4d59-99be-e39daa226dd5@suse.de \
    --to=tzimmermann@suse.de \
    --cc=airlied@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jacek.lawrynowicz@linux.intel.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mripard@kernel.org \
    --cc=simona.vetter@ffwll.ch \
    --cc=simona.vetter@intel.com \
    --cc=simona@ffwll.ch \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox