From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 91DD5CA0EE4 for ; Fri, 15 Aug 2025 19:59:16 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id DFACD10E997; Fri, 15 Aug 2025 19:59:15 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=igalia.com header.i=@igalia.com header.b="sn4SaiDp"; dkim-atps=neutral Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by gabe.freedesktop.org (Postfix) with ESMTPS id E2F6A10E98C for ; Fri, 15 Aug 2025 19:59:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Cc:To:In-Reply-To:References:Message-Id: Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date:From:Sender: Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender :Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=3gm6b83pGzrxj/nuiONN4hrSdXUbITAwHw86LF5KIiA=; b=sn4SaiDp2vOIAK4R6K4LUeWiVG /5pkGiJhE7PiMMWC7dpbbHto/7ksR2tM4joqZJaUbMjF8wEpWBE2Aa6dTQvr51GhhZGTmmZA/hhmE MCydBQziHD4xm/WwssVvK7z3DsA7xv+tIQF0keyFUSdpc5Q0kHTBPC6evCYH5T0XqmxyHah6vhkQr 4XUh1xxdbe7yPtn5yIiC/8CbuEgzUVKkT/MVePaiI0QRItE6oiX+jN0q/S5TVKwB3RJMHXnpo+DNa 8maNED3P0yel628aX1ydhHQgUab8/sK6096sjZiTnb5eNIP/cWO5O22vJ+Aq+vnjgDRzvEhxRwIxX lFKmhSFg==; Received: from [177.191.196.76] (helo=[192.168.0.165]) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1un0Zz-00EoI4-AD; Fri, 15 Aug 2025 21:59:11 +0200 From: =?utf-8?q?Ma=C3=ADra_Canal?= Date: Fri, 15 Aug 2025 16:58:43 -0300 Subject: [PATCH v2 4/6] drm/v3d: Address race-condition between per-fd GPU stats and fd release MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit Message-Id: <20250815-v3d-queue-lock-v2-4-ce37258ffb53@igalia.com> References: <20250815-v3d-queue-lock-v2-0-ce37258ffb53@igalia.com> In-Reply-To: <20250815-v3d-queue-lock-v2-0-ce37258ffb53@igalia.com> To: Melissa Wen , Iago Toral Quiroga , Jose Maria Casanova Crespo , David Airlie , Simona Vetter Cc: kernel-dev@igalia.com, dri-devel@lists.freedesktop.org, =?utf-8?q?Ma=C3=ADra_Canal?= X-Mailer: b4 0.14.2 X-Developer-Signature: v=1; a=openpgp-sha256; l=6106; i=mcanal@igalia.com; h=from:subject:message-id; bh=iKE4WivXjO4z8L1VLPRwQevXtwDu0ubTLLULrbSIKFY=; b=owEBbQGS/pANAwAKAT/zDop2iPqqAcsmYgBon5F/3U4E/xIXg1Hgkcv8uQPgngY/Lo9KAkGft QDQbfQezTuJATMEAAEKAB0WIQT45F19ARZ3Bymmd9E/8w6Kdoj6qgUCaJ+RfwAKCRA/8w6Kdoj6 qqLzB/9hkCbawM7zuo4EYZLjEDP6Rlv80o0nU1TYtK0RxytAJ7fcewtylNvARAqj1L6nV27Wp7o f68gj507r/olGUSFn07A5r9XwsMXymqmO8KvlfVaOi4sANSGtZryz2G92PgzerwuODLCj6K8faS huE1AZopyZT0NOEDlV/UrzV1dVVKaZNyNq1ZaFcMCjRbW2jPB/Q3Qg0QPxp73jFuUEwg1gZKCi3 UbFlBj4epyys0FtHsCuTj4QcwkUf4F/LxJ0XXRFQsWcyP9/IZ0xrZC4zA0d1e0NCiJLEDjdNcqA dRa0rnIvBrTf4Fcjc6bqr9xsmO/j+jY33fo4hTVRviAy1pRq X-Developer-Key: i=mcanal@igalia.com; a=openpgp; fpr=F8E45D7D0116770729A677D13FF30E8A7688FAAA X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" When the file descriptor is closed while a job is still running, there's a race condition between the job completion callback and the file descriptor cleanup. This can lead to accessing freed memory when updating per-fd GPU stats, such as the following example: [56120.512903] Unable to handle kernel paging request at virtual address 0000330a92b9688a [56120.520881] Mem abort info: [56120.523687] ESR = 0x0000000096000005 [56120.527454] EC = 0x25: DABT (current EL), IL = 32 bits [56120.532785] SET = 0, FnV = 0 [56120.535847] EA = 0, S1PTW = 0 [56120.538995] FSC = 0x05: level 1 translation fault [56120.543891] Data abort info: [56120.546778] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 [56120.552289] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [56120.557362] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [56120.562690] user pgtable: 16k pages, 47-bit VAs, pgdp=0000000023f54000 [56120.569239] [0000330a92b9688a] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 [56120.577975] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP CPU: 0 UID: 1000 PID: 1497409 Comm: mpv Not tainted 6.12.37-ncvm5+ #1 Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT) pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : v3d_job_update_stats+0x64/0x168 [v3d] lr : v3d_job_update_stats+0x40/0x168 [v3d] sp : ffffc00080003e60 x29: ffffc00080003e60 x28: ffff800002860000 x27: 0000000000000000 x26: 0000000000000000 x25: ffff800002860000 x24: ffff800002630800 x23: ffff800060786000 x22: 0000330a933c31fb x21: 0000000000000001 x20: 0000330a92b96302 x19: ffff800060786b10 x18: 0000000000000000 x17: ffffaf90506a0000 x16: ffffd06fce57c360 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : ffffd06f5d0fec40 x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000002978dbd535a x5 : 00ffffffffffffff x4 : 0000000000000015 x3 : 0000300001fddf88 x2 : 0000000000000020 x1 : 0000000000010001 x0 : 0000330a92b96872 Call trace: v3d_job_update_stats+0x64/0x168 [v3d] v3d_irq+0x118/0x2e0 [v3d] __handle_irq_event_percpu+0x60/0x220 Fix such an issue by protecting all accesses to `job->file_priv` with the queue's lock. With that, we can clear `job->file_priv` before the V3D per-fd structure is freed and assure that `job->file_priv` exists during the per-fd GPU stats updates. Fixes: e1bc3a13bd77 ("drm/v3d: Avoid NULL pointer dereference in `v3d_job_update_stats()`") Signed-off-by: MaĆ­ra Canal Reviewed-by: Iago Toral Quiroga --- drivers/gpu/drm/v3d/v3d_drv.c | 14 +++++++++++++- drivers/gpu/drm/v3d/v3d_drv.h | 2 +- drivers/gpu/drm/v3d/v3d_sched.c | 14 +++++++------- 3 files changed, 21 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 2def155ce496ec5f159f6bda9929aeaae141d1a6..c5a3bbbc74c5c6c9a34a6e6111b4e4e21bee0b34 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -157,12 +157,24 @@ v3d_open(struct drm_device *dev, struct drm_file *file) static void v3d_postclose(struct drm_device *dev, struct drm_file *file) { + struct v3d_dev *v3d = to_v3d_dev(dev); struct v3d_file_priv *v3d_priv = file->driver_priv; + unsigned long irqflags; enum v3d_queue q; - for (q = 0; q < V3D_MAX_QUEUES; q++) + for (q = 0; q < V3D_MAX_QUEUES; q++) { + struct v3d_queue_state *queue = &v3d->queue[q]; + struct v3d_job *job = queue->active_job; + drm_sched_entity_destroy(&v3d_priv->sched_entity[q]); + if (job && job->base.entity == &v3d_priv->sched_entity[q]) { + spin_lock_irqsave(&queue->queue_lock, irqflags); + job->file_priv = NULL; + spin_unlock_irqrestore(&queue->queue_lock, irqflags); + } + } + v3d_perfmon_close_file(v3d_priv); kfree(v3d_priv); } diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index cfc2f9c123aa99f6f1875b297eaf6c226b03d4ec..0317f3d7452a3f01b91bfdc691b5a98e54b3a4ec 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -608,7 +608,7 @@ void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info, unsigned int count); void v3d_performance_query_info_free(struct v3d_performance_query_info *query_info, unsigned int count); -void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue); +void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue q); int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index e348816b691ef05909828accbe15399816e69369..60e251367f42170b30de968682deb6370604db44 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -194,11 +194,11 @@ v3d_stats_update(struct v3d_stats *stats, u64 now) } void -v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue) +v3d_job_update_stats(struct v3d_job *job, enum v3d_queue q) { struct v3d_dev *v3d = job->v3d; - struct v3d_file_priv *file = job->file_priv; - struct v3d_stats *global_stats = &v3d->queue[queue].stats; + struct v3d_queue_state *queue = &v3d->queue[q]; + struct v3d_stats *global_stats = &queue->stats; u64 now = local_clock(); unsigned long flags; @@ -209,10 +209,10 @@ v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue) preempt_disable(); /* Don't update the local stats if the file context has already closed */ - if (file) - v3d_stats_update(&file->stats[queue], now); - else - drm_dbg(&v3d->drm, "The file descriptor was closed before job completion\n"); + spin_lock(&queue->queue_lock); + if (job->file_priv) + v3d_stats_update(&job->file_priv->stats[q], now); + spin_unlock(&queue->queue_lock); v3d_stats_update(global_stats, now); -- 2.50.1