Date: Wed, 1 Apr 2026 23:26:07 +0200
From: Andi Shyti
To: Sebastian Brzezinka
Cc: intel-gfx@lists.freedesktop.org, andi.shyti@linux.intel.com, krzysztof.karas@intel.com
Subject: Re: [PATCH] drm/i915/gt: fix refcount
 underflow in intel_engine_park_heartbeat

Hi Sebastian,

...

> Fix this by replacing the non-atomic pointer read + separate clear with
> xchg() in both racing paths. xchg() is a single indivisible hardware
> instruction that atomically reads the old pointer and writes NULL. This
> guarantees only one of the two concurrent callers obtains the non-NULL
> pointer and performs the put, the other gets NULL and skips it.
>
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/15880
> Fixes: 058179e72e09 ("drm/i915/gt: Replace hangcheck by heartbeats")

Cc: # v5.5+

> Signed-off-by: Sebastian Brzezinka
> ---
>  .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 26 +++++++++++++------
>  1 file changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> index b279878dca29..a3830627ef81 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> @@ -148,10 +148,12 @@ static void heartbeat(struct work_struct *wrk)
>  	/* Just in case everything has gone horribly wrong, give it a kick */
>  	intel_engine_flush_submission(engine);
>
> -	rq = engine->heartbeat.systole;
> -	if (rq && i915_request_completed(rq)) {
> -		i915_request_put(rq);
> -		engine->heartbeat.systole = NULL;
> +	rq = xchg(&engine->heartbeat.systole, NULL);
> +	if (rq) {
> +		if (i915_request_completed(rq))
> +			i915_request_put(rq);
> +		else
> +			engine->heartbeat.systole = rq;

Well spotted, Sebastian!

>  	}
>
>  	if (!intel_engine_pm_get_if_awake(engine))
> @@ -232,8 +234,11 @@ static void heartbeat(struct work_struct *wrk)
>  unlock:
>  	mutex_unlock(&ce->timeline->mutex);
>  out:
> -	if (!engine->i915->params.enable_hangcheck || !next_heartbeat(engine))
> -		i915_request_put(fetch_and_zero(&engine->heartbeat.systole));
> +	if (!engine->i915->params.enable_hangcheck || !next_heartbeat(engine)) {
> +		rq = xchg(&engine->heartbeat.systole, NULL);
> +		if (rq)
> +			i915_request_put(rq);
> +	}
>  	intel_engine_pm_put(engine);
>  }
>
> @@ -247,8 +252,13 @@ void intel_engine_unpark_heartbeat(struct intel_engine_cs *engine)
>
>  void intel_engine_park_heartbeat(struct intel_engine_cs *engine)
>  {
> -	if (cancel_delayed_work(&engine->heartbeat.work))
> -		i915_request_put(fetch_and_zero(&engine->heartbeat.systole));
> +	struct i915_request *rq;

nit: this should go inside the if statement.

Reviewed-by: Andi Shyti

Nice patch, Sebastian. Two very little nitpicks that I can take care of
before merging.

Thank you,
Andi

> +	if (cancel_delayed_work(&engine->heartbeat.work)) {
> +		rq = xchg(&engine->heartbeat.systole, NULL);
> +		if (rq)
> +			i915_request_put(rq);
> +	}
>  }
>
>  void intel_gt_unpark_heartbeats(struct intel_gt *gt)
> --
> 2.52.0
>
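
For readers following the thread, the claim in the commit message (only one
caller can obtain the non-NULL pointer) can be sketched in userspace. This is
an illustration only, not the driver code: it uses C11 atomic_exchange() as a
stand-in for the kernel's xchg(), and struct request, slot, request_put(),
claim_and_put() and demo_puts() are hypothetical names invented for the
sketch. The two callers run one after the other here; the argument carries
over to concurrent callers because the exchange is a single atomic step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted request; not the i915 type. */
struct request {
	int refcount;
};

/* Slot playing the role of engine->heartbeat.systole in this sketch. */
static _Atomic(struct request *) slot;

/* Drop one reference; the assert would trip on a second put (underflow). */
static void request_put(struct request *rq)
{
	assert(rq->refcount > 0);
	rq->refcount--;
}

/*
 * Atomically take whatever is in the slot and leave NULL behind.
 * Returns 1 if this caller obtained the pointer and dropped the reference,
 * 0 if the slot was already empty.
 */
static int claim_and_put(void)
{
	struct request *rq = atomic_exchange(&slot, NULL);

	if (!rq)
		return 0;
	request_put(rq);
	return 1;
}

/* Two callers try in turn to release one reference; returns the put count. */
static int demo_puts(struct request *rq)
{
	rq->refcount = 1;
	atomic_store(&slot, rq);
	return claim_and_put() + claim_and_put();
}
```

Calling demo_puts() on a fresh struct request performs exactly one put: the
second claim_and_put() sees NULL and skips it, which is the behaviour the
patch relies on in both heartbeat() and intel_engine_park_heartbeat().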