From: Greg Kroah-Hartman
To: stable@vger.kernel.org
Cc:
	Greg Kroah-Hartman, patches@lists.linux.dev, Erico Nunes, Qiang Yu, Sasha Levin
Subject: [PATCH 5.10 124/290] drm/lima: mask irqs in timeout path before hard reset
Date: Wed, 3 Jul 2024 12:38:25 +0200
Message-ID: <20240703102908.873195394@linuxfoundation.org>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20240703102904.170852981@linuxfoundation.org>
References: <20240703102904.170852981@linuxfoundation.org>
User-Agent: quilt/0.67
X-stable: review
X-Patchwork-Hint: ignore
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

5.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Erico Nunes

[ Upstream commit a421cc7a6a001b70415aa4f66024fa6178885a14 ]

There is a race condition in which a rendering job might take just long
enough to trigger the drm sched job timeout handler but also still
complete before the hard reset is done by the timeout handler.
This runs into race conditions not expected by the timeout handler.
In some very specific cases it currently may result in a refcount
imbalance on lima_pm_idle, with a stack dump such as:

[10136.669170] WARNING: CPU: 0 PID: 0 at drivers/gpu/drm/lima/lima_devfreq.c:205 lima_devfreq_record_idle+0xa0/0xb0
...
[10136.669459] pc : lima_devfreq_record_idle+0xa0/0xb0
...
[10136.669628] Call trace:
[10136.669634]  lima_devfreq_record_idle+0xa0/0xb0
[10136.669646]  lima_sched_pipe_task_done+0x5c/0xb0
[10136.669656]  lima_gp_irq_handler+0xa8/0x120
[10136.669666]  __handle_irq_event_percpu+0x48/0x160
[10136.669679]  handle_irq_event+0x4c/0xc0

We can prevent that race condition entirely by masking the irqs at the
beginning of the timeout handler, at which point we give up on waiting
for that job entirely.
The irqs will be enabled again at the next hard reset which is already
done as a recovery by the timeout handler.

Signed-off-by: Erico Nunes
Reviewed-by: Qiang Yu
Signed-off-by: Qiang Yu
Link: https://patchwork.freedesktop.org/patch/msgid/20240405152951.1531555-4-nunes.erico@gmail.com
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/lima/lima_sched.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index f6e7a88a56f1b..290f875c28598 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -419,6 +419,13 @@ static void lima_sched_timedout_job(struct drm_sched_job *job)
 	struct lima_sched_task *task = to_lima_task(job);
 	struct lima_device *ldev = pipe->ldev;
 
+	/*
+	 * The task might still finish while this timeout handler runs.
+	 * To prevent a race condition on its completion, mask all irqs
+	 * on the running core until the next hard reset completes.
+	 */
+	pipe->task_mask_irq(pipe);
+
 	if (!pipe->error)
 		DRM_ERROR("lima job timeout\n");
 
-- 
2.43.0
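
For readers who want to see the refcount imbalance in isolation, the race
described in the commit message can be approximated by a small user-space
model. This is only a sketch under stated assumptions: every name below
(model_pipe, model_irq_done, model_timeout, run_scenario) is hypothetical and
is not part of the lima driver, and the "late completion" is simulated
deterministically rather than with a real interrupt. It assumes that both the
completion irq path and the timeout recovery path record the core as idle
once, which is what the stack dump above suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are lima driver API. */
struct model_pipe {
	bool irq_masked;	/* set once the timeout path has masked irqs */
	int busy_count;		/* stands in for the busy/idle refcount */
	int underflows;		/* "idle recorded while not busy" events */
};

/* Record one idle transition; an extra call is counted as an underflow. */
static void model_record_idle(struct model_pipe *p)
{
	if (p->busy_count == 0) {
		p->underflows++;	/* the kernel would WARN here */
		return;
	}
	p->busy_count--;
	assert(p->busy_count >= 0);
}

/* A job is started on the core: the core becomes busy. */
static void model_submit(struct model_pipe *p)
{
	p->irq_masked = false;
	p->busy_count++;
}

/* Completion irq: dropped while irqs are masked. */
static void model_irq_done(struct model_pipe *p)
{
	if (p->irq_masked)
		return;
	model_record_idle(p);
}

/* Timeout handler, with the job completing late in the middle of it. */
static void model_timeout(struct model_pipe *p, bool mask_first)
{
	if (mask_first)
		p->irq_masked = true;	/* the step this patch adds */

	model_irq_done(p);		/* late completion races with us */

	model_record_idle(p);		/* recovery path gives up on the job */
	p->irq_masked = false;		/* hard reset re-enables irqs */
}

/* Returns the number of refcount underflows seen in one timeout. */
int run_scenario(bool mask_first)
{
	struct model_pipe p = { 0 };

	model_submit(&p);
	model_timeout(&p, mask_first);
	return p.underflows;
}
```

In this model, run_scenario(false) records the job as idle twice and reports
one underflow, while run_scenario(true) drops the late completion and reports
none. The real driver has more state than a single counter, so this should be
read as an illustration of the ordering problem only.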