public inbox for intel-gfx@lists.freedesktop.org
From: Andi Shyti <andi.shyti@linux.intel.com>
To: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Cc: intel-gfx@lists.freedesktop.org, andi.shyti@linux.intel.com,
	krzysztof.karas@intel.com
Subject: Re: [PATCH] drm/i915/gt: fix refcount underflow in intel_engine_park_heartbeat
Date: Tue, 7 Apr 2026 18:06:52 +0200
Message-ID: <adUrnBEu08c27e6S@ashyti-mobl2.lan>
In-Reply-To: <d4c1c14255688dd07cc8044973c4f032a8d1559e.1775038106.git.sebastian.brzezinka@intel.com>

Hi Sebastian,

On Wed, Apr 01, 2026 at 12:10:07PM +0200, Sebastian Brzezinka wrote:
> A use-after-free / refcount underflow is possible when the heartbeat
> worker and intel_engine_park_heartbeat() race to release the same
> engine->heartbeat.systole request.
> 
> The heartbeat worker reads engine->heartbeat.systole and calls
> i915_request_put() on it when the request is complete, but clears
> the pointer in a separate, non-atomic step. Concurrently, a request
> retirement on another CPU can drop the engine wakeref to zero, triggering
> __engine_park() -> intel_engine_park_heartbeat(). If the heartbeat
> timer is pending at that point, cancel_delayed_work() returns true and
> intel_engine_park_heartbeat() reads the stale non-NULL systole pointer
> and calls i915_request_put() on it again, causing a refcount underflow:
> 
> ```
> <4> [487.221889] Workqueue: i915-unordered engine_retire [i915]
> <4> [487.222640] RIP: 0010:refcount_warn_saturate+0x68/0xb0
> ...
> <4> [487.222707] Call Trace:
> <4> [487.222711]  <TASK>
> <4> [487.222716]  intel_engine_park_heartbeat.part.0+0x6f/0x80 [i915]
> <4> [487.223115]  intel_engine_park_heartbeat+0x25/0x40 [i915]
> <4> [487.223566]  __engine_park+0xb9/0x650 [i915]
> <4> [487.223973]  ____intel_wakeref_put_last+0x2e/0xb0 [i915]
> <4> [487.224408]  __intel_wakeref_put_last+0x72/0x90 [i915]
> <4> [487.224797]  intel_context_exit_engine+0x7c/0x80 [i915]
> <4> [487.225238]  intel_context_exit+0xf1/0x1b0 [i915]
> <4> [487.225695]  i915_request_retire.part.0+0x1b9/0x530 [i915]
> <4> [487.226178]  i915_request_retire+0x1c/0x40 [i915]
> <4> [487.226625]  engine_retire+0x122/0x180 [i915]
> <4> [487.227037]  process_one_work+0x239/0x760
> <4> [487.227060]  worker_thread+0x200/0x3f0
> <4> [487.227068]  ? __pfx_worker_thread+0x10/0x10
> <4> [487.227075]  kthread+0x10d/0x150
> <4> [487.227083]  ? __pfx_kthread+0x10/0x10
> <4> [487.227092]  ret_from_fork+0x3d4/0x480
> <4> [487.227099]  ? __pfx_kthread+0x10/0x10
> <4> [487.227107]  ret_from_fork_asm+0x1a/0x30
> <4> [487.227141]  </TASK>
> ```
> 
> Fix this by replacing the non-atomic pointer read + separate clear with
> xchg() in both racing paths. xchg() atomically reads the old pointer and
> writes NULL as a single indivisible operation. This guarantees that only
> one of the two concurrent callers obtains the non-NULL pointer and
> performs the put; the other gets NULL and skips it.
> 
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/15880
> Fixes: 058179e72e09 ("drm/i915/gt: Replace hangcheck by heartbeats")
> Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
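
For readers following the thread, here is a minimal userspace sketch of the
exchange-then-put pattern the commit message describes. It is not the i915
code: struct request, struct heartbeat, request_put() and release_systole()
are hypothetical stand-ins, and C11 atomic_exchange() is used in place of the
kernel's xchg(). The two callers are run one after the other, which only shows
the outcome the exchange guarantees (one put, one skip), not a real race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct i915_request: only a reference count. */
struct request {
	atomic_int refcount;
};

/* Hypothetical stand-in for engine->heartbeat.systole. */
struct heartbeat {
	_Atomic(struct request *) systole;
};

/* Stand-in for i915_request_put(): drops one reference. */
static void request_put(struct request *rq)
{
	atomic_fetch_sub(&rq->refcount, 1);
}

/*
 * Atomically take the pointer and leave NULL behind, then drop the
 * reference only if a request was actually obtained.
 * Returns 1 if this caller performed the put, 0 if it found NULL.
 */
static int release_systole(struct heartbeat *hb)
{
	struct request *rq =
		atomic_exchange(&hb->systole, (struct request *)NULL);

	if (!rq)
		return 0;

	request_put(rq);
	return 1;
}

/* Two callers release the same systole; only the first one does the put. */
static void demo_two_callers(void)
{
	struct request rq;
	struct heartbeat hb;

	atomic_init(&rq.refcount, 1);
	atomic_init(&hb.systole, &rq);

	assert(release_systole(&hb) == 1);	/* e.g. the heartbeat worker */
	assert(release_systole(&hb) == 0);	/* e.g. the park path */
	assert(atomic_load(&rq.refcount) == 0);	/* dropped once, no underflow */
}
```

With a plain read followed by a separate store of NULL, both callers could
observe the same non-NULL pointer and each drop a reference. With the
exchange, whichever caller runs second can only ever see NULL.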

Merged to drm-intel-gt-next.

Thanks,
Andi
