From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6006DCA0FE0 for ; Fri, 30 Aug 2024 10:44:37 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D9DFD10EA56; Fri, 30 Aug 2024 10:44:36 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="glUIKdlh"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id C801110EA56 for ; Fri, 30 Aug 2024 10:44:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1725014674; bh=NdhqZXbWQ6r1rlPXjHmrhS3VLCyFIRId/uQPtpKpALE=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=glUIKdlhn2FSyXdwyfD1v8YSrRnb06knRBHxGEKh0tY9cZBERxIGrLrikCYD+jgva bGrvWarqbg6taKCvB4GkUD1sbPx0ZMiNiXZUBPTiNTCGX9gxYAcmq3eyF0VOjHm9hR 76XTnuwCoL9sJXk+OZR8Lcy/9ePeuu3P5I74oR9u6PZjnKjMerlAABsL5OmenX7zEa IaaTymKHqR0GGL7dbrclSiRxmTx7v3dT7XyfudfVXUehluOEetEGEyfWHOoxBauFn/ AiAV8RxKC3W/XkpL4h95KRfH9oRsSIlpF6FSfyUc5De1h/zFrYGsvMrrY3nKfCJ9MB 09TsI+43auZqA== Received: from localhost (unknown [IPv6:2a01:e0a:2c:6930:5cf4:84a1:2763:fe0d]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id D734817E0FAA; Fri, 30 Aug 2024 12:44:33 +0200 (CEST) Date: Fri, 30 Aug 2024 12:44:29 +0200 From: Boris Brezillon To: Christian =?UTF-8?B?S8O2bmln?= Cc: dri-devel@lists.freedesktop.org, Steven Price , Liviu Dudau , =?UTF-8?B?QWRyacOhbg==?= Larumbe , kernel@collabora.com, Luben Tuikov , Matthew Brost , Danilo Krummrich Subject: Re: [RFC PATCH] drm/sched: Fix a UAF on drm_sched_fence::sched Message-ID: <20240830124429.5699c9bf@collabora.com> In-Reply-To: <20240830113721.6174f3d9@collabora.com> References: <20240829171238.609481-1-boris.brezillon@collabora.com> <20240830113721.6174f3d9@collabora.com> Organization: Collabora X-Mailer: Claws Mail 4.3.0 (GTK 3.24.43; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" On Fri, 30 Aug 2024 11:37:21 +0200 Boris Brezillon wrote: > > > With the introduction of a new model where each entity has its own > > > drm_gpu_scheduler instance, this situation is likely to happen every time > > > a GPU context is destroyed and some of its fences remain attached to > > > dma_buf objects still owned by other drivers/processes. > > > > > > In order to make drm_sched_fence_get_timeline_name() safe, we need to > > > copy the scheduler name into our own refcounted object that's only > > > destroyed when both the scheduler and all its fences are gone. > > > > > > The fact drm_sched_fence might have a reference to the drm_gpu_scheduler > > > even after it's been released is worrisome though, but I'd rather > > > discuss that with everyone than come up with a solution that's likely > > > to end up being rejected. > > > > > > Note that the bug was found while repeatedly reading dma_buf's debugfs > > > file, which, at some point, calls dma_resv_describe() on a resv that > > > contains signalled fences coming from a destroyed GPU context. > > > AFAIK, there's nothing invalid there. > > > > Yeah but reading debugfs is not guaranteed to crash the kernel. > > > > On the other hand the approach with a kref'ed string looks rather sane > > to me. One comment on this below. > > There's still the problem I mentioned above (unloading drm_sched can > make things crash). Are there any plans to fix that? The simple option > would be to prevent compiling drm_sched as a module, but that's not an > option because it depends on DRM which is a tristate too. Maybe we > could have drm_sched_fence.o linked statically, just like dma-fence.c > is linked statically to prevent the stub ops from disappearing. > Not sure if drm_sched_fence.c depends on symbols defined in > sched_{main,entity}.c or other parts of the DRM subsystem though. For the record, I gave it a try, and linking drm_sched_fence.o statically while leaving the rest of drm_sched as a module seems to work. I just sent an RFC for that [1]. [1]https://lore.kernel.org/dri-devel/20240830104057.1479321-1-boris.brezillon@collabora.com/T/#u