Message-ID: <50debbe5a4856a744b8ac89048405be89f936a6a.camel@mailbox.org>
Subject: Re: [PATCH 1/8] drm/sched: Allow drivers to skip the reset and keep on running
From: Philipp Stanner
To: Maíra Canal, phasta@kernel.org, Matthew Brost
Cc: Danilo Krummrich, Christian König, Tvrtko Ursulin, Simona Vetter,
 Melissa Wen, Lucas Stach, Russell King, Christian Gmeiner,
 Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Boris Brezillon,
 Rob Herring, Steven Price, kernel-dev@igalia.com,
 dri-devel@lists.freedesktop.org, etnaviv@lists.freedesktop.org,
 intel-xe@lists.freedesktop.org
Date: Mon, 12 May 2025 16:09:21 +0200
In-Reply-To: <22fdf8aa-437a-4d28-886a-fe10e957edfa@igalia.com>
References: <20250503-sched-skip-reset-v1-0-ed0d6701a3fe@igalia.com>
 <20250503-sched-skip-reset-v1-1-ed0d6701a3fe@igalia.com>
 <22fdf8aa-437a-4d28-886a-fe10e957edfa@igalia.com>
Reply-To: phasta@kernel.org

On Mon, 2025-05-12 at 11:04 -0300, Maíra Canal wrote:
> Hi Philipp,
>
> On 12/05/25 08:13, Philipp Stanner wrote:
> > On Tue, 2025-05-06 at 07:32 -0700, Matthew Brost wrote:
> > > On Mon, May 05, 2025 at 07:41:09PM -0700, Matthew Brost wrote:
> > > > On Sat, May 03, 2025 at 05:59:52PM -0300, Maíra Canal wrote:
> > > > > When the DRM scheduler times out, it's possible that the GPU isn't
> > > > > hung; instead, a job may still be running, and there may be no valid
> > > > > reason to reset the hardware. This can occur in two situations:
> > > > >
> > > > >   1. The GPU exposes some mechanism that ensures the GPU is still
> > > > >      making progress. By checking this mechanism, we can safely skip
> > > > >      the reset, rearm the timeout, and allow the job to continue
> > > > >      running until completion. This is the case for v3d and Etnaviv.
> > > > >   2. TDR has fired before the IRQ that signals the fence.
> > > > >      Consequently, the job actually finishes, but it triggers a
> > > > >      timeout before signaling the completion fence.
> > > > >
> > > >
> > > > We have both of these cases in Xe too. We implement the requeuing in
> > > > Xe via driver side function - xe_sched_add_pending_job but this looks
> > > > better and will make use of this.
> > > >
> > > > > These two scenarios are problematic because we remove the job from
> > > > > the `sched->pending_list` before calling
> > > > > `sched->ops->timedout_job()`. This means that when the job finally
> > > > > signals completion (e.g. in the IRQ handler), the scheduler won't
> > > > > call `sched->ops->free_job()`. As a result, the job and its
> > > > > resources won't be freed, leading to a memory leak.
> > > > >
> > > > > To resolve this issue, we create a new `drm_gpu_sched_stat` that
> > > > > allows a driver to skip the reset. This new status will indicate
> > > > > that the job should be reinserted into the pending list, and the
> > > > > driver will still signal its completion.
> > > > >
> > > > > Signed-off-by: Maíra Canal
> > > >
> > > > Reviewed-by: Matthew Brost
> > > >
> > >
> > > Wait - nevermind I think one issue is below.
> > >
> > > > > ---
> > > > >  drivers/gpu/drm/scheduler/sched_main.c | 14 ++++++++++++++
> > > > >  include/drm/gpu_scheduler.h            |  2 ++
> > > > >  2 files changed, 16 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > index 829579c41c6b5d8b2abce5ad373c7017469b7680..68ca827d77e32187a034309f881135dbc639a9b4 100644
> > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > @@ -568,6 +568,17 @@ static void drm_sched_job_timedout(struct work_struct *work)
> > > > >  			job->sched->ops->free_job(job);
> > > > >  			sched->free_guilty = false;
> > > > >  		}
> > > > > +
> > > > > +		/*
> > > > > +		 * If the driver indicated that the GPU is still running and wants to skip
> > > > > +		 * the reset, reinsert the job back into the pending list and realarm the
> > > > > +		 * timeout.
> > > > > +		 */
> > > > > +		if (status == DRM_GPU_SCHED_STAT_RUNNING) {
> > > > > +			spin_lock(&sched->job_list_lock);
> > > > > +			list_add(&job->list, &sched->pending_list);
> > > > > +			spin_unlock(&sched->job_list_lock);
> > > > > +		}
> > >
> > > I think you need to requeue free_job wq here. It is possible the
> > > free_job wq ran, didn't find a job, goes to sleep, then we add a
> > > signaled job here which will never get freed.
> >
> > I wonder if that could be solved by holding job_list_lock a bit longer.
> > free_job_work will try to check the list for the next signaled job, but
> > will wait for the lock.
> >
> > If that works, we could completely rely on the standard mechanism
> > without requeuing, which would be neat.
>
> I believe it works. However, the tradeoff would be holding the lock for
> the entire reset of the GPU (in the cases the GPU actually hanged),
> which looks like a lot of time.
>
> Do you think it's reasonable to do so?

The scheduler only has three distinct work items: run_job, free_job and
timeout. The timeout work item only ever runs serially, so it is not
relevant here, and run_job() and free_job() should be halted in the
timeout handler through drm_sched_stop() anyway. Moreover, timeouts
should be rare events.

So I'd say yes, the clarity of the code wins here.

Cheers,
P.

>
> Best Regards,
> - Maíra
>
> >
> > P.
> >
> > >
> > > Matt
> > >
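
For illustration only, the ordering argument discussed in this thread can
be sketched as a single-threaded toy model in userspace C. This is not
scheduler code: every toy_* name is hypothetical, the "pending list"
holds a single job, and job_list_lock is modelled as a plain flag whose
waiter is replayed on unlock. The sketch only shows why a free worker
that runs between removal and reinsertion misses the job, and why holding
the lock across that window would avoid the miss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for scheduler state; none of this is kernel API. */
struct toy_job {
	bool signaled;	/* the hardware fence has signaled */
	bool freed;	/* the free path has run for this job */
};

struct toy_sched {
	struct toy_job *pending;	/* one-entry "pending list" */
	bool lock_held;			/* models job_list_lock */
	bool free_work_waiting;		/* free worker is waiting for the lock */
};

/* Free worker: frees the pending job if it has signaled. */
static void toy_free_job_work(struct toy_sched *s)
{
	if (s->lock_held) {
		/* A real worker would wait for the lock; record that here. */
		s->free_work_waiting = true;
		return;
	}
	if (s->pending && s->pending->signaled) {
		s->pending->freed = true;
		s->pending = NULL;
	}
}

/* Drop the lock and let a waiting free worker proceed. */
static void toy_unlock(struct toy_sched *s)
{
	assert(s->lock_held);
	s->lock_held = false;
	if (s->free_work_waiting) {
		s->free_work_waiting = false;
		toy_free_job_work(s);
	}
}

/*
 * Timeout path in which the driver decides to skip the reset.
 * hold_lock selects whether the lock stays held from removal to
 * reinsertion. Returns true if the job ended up freed.
 */
bool toy_timeout_skip_reset(bool hold_lock)
{
	struct toy_job job = { false, false };
	struct toy_sched s = { &job, false, false };

	/* The timeout handler removes the job from the pending list. */
	s.lock_held = true;
	s.pending = NULL;
	if (!hold_lock)
		toy_unlock(&s);

	/* Meanwhile the job completes and the free worker gets kicked. */
	job.signaled = true;
	toy_free_job_work(&s);

	/* The driver reported "still running": reinsert the job. */
	if (!hold_lock)
		s.lock_held = true;
	s.pending = &job;
	toy_unlock(&s);

	return job.freed;
}
```

In this model, toy_timeout_skip_reset(true) reports the job as freed,
while toy_timeout_skip_reset(false) reports it as never freed unless the
free work is queued again after the reinsertion, which is the issue Matt
raised.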