* [PATCH v2] accel/habanalabs: fix kref underflow in hl_cs_signal_sob_wraparound_handler
@ 2026-06-28 11:30 WenTao Liang
  2026-06-28 11:48 ` sashiko-bot
  0 siblings, 1 reply; 2+ messages in thread
From: WenTao Liang @ 2026-06-28 11:30 UTC
  To: dri-devel
  Cc: ogabbay, koby.elbaz, konstantin.sinyuk, kees, linux-kernel,
	stable, WenTao Liang, Greg KH

When other_sob->need_reset is true and encaps_sig is false,
hw_sob_put(other_sob) decrements the kref to 0, but the matching
hw_sob_get(other_sob) is skipped because it is inside the encaps_sig
block. The function returns other_sob with kref=0, causing a subsequent
kref_put to underflow. Fix by moving hw_sob_get(other_sob) out of the
encaps_sig block so that it is called unconditionally.

Suggested-by: Greg KH <gregkh@linuxfoundation.org>
Fixes: dadf17abb724 ("habanalabs: add support for encapsulated signals reservation")
Cc: stable@vger.kernel.org
Signed-off-by: WenTao Liang <vulab@iscas.ac.cn>
---
Changes in v2:
- Correct Fixes hash based on reviewer feedback
---
 drivers/accel/habanalabs/common/command_submission.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/accel/habanalabs/common/command_submission.c b/drivers/accel/habanalabs/common/command_submission.c
index ba4257bda77b..675301dfc0ef 100644
--- a/drivers/accel/habanalabs/common/command_submission.c
+++ b/drivers/accel/habanalabs/common/command_submission.c
@@ -1860,11 +1860,10 @@ int hl_cs_signal_sob_wraparound_handler(struct hl_device *hdev, u32 q_idx,
 		if (other_sob->need_reset)
 			hw_sob_put(other_sob);
 
-		if (encaps_sig) {
+		if (encaps_sig)
 			/* set reset indication for the sob */
 			sob->need_reset = true;
-			hw_sob_get(other_sob);
-		}
+		hw_sob_get(other_sob);
 
 		dev_dbg(hdev->dev, "switched to SOB %d, q_idx: %d\n",
 				prop->curr_sob_offset, q_idx);
-- 
2.39.5 (Apple Git-154)



* Re: [PATCH v2] accel/habanalabs: fix kref underflow in hl_cs_signal_sob_wraparound_handler
  2026-06-28 11:30 [PATCH v2] accel/habanalabs: fix kref underflow in hl_cs_signal_sob_wraparound_handler WenTao Liang
@ 2026-06-28 11:48 ` sashiko-bot
  0 siblings, 0 replies; 2+ messages in thread
From: sashiko-bot @ 2026-06-28 11:48 UTC
  To: WenTao Liang; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:

New issues:
- [High] Unconditionally incrementing the reference count of the new SOB during a wraparound causes a permanent reference leak when `encaps_sig` is false, eventually causing a denial of service.

Pre-existing issues:
- [High] Pre-existing reference underflow on the old SOB during a wraparound in `init_signal_cs()`.
- [High] Pre-existing mismatch between the hardware signal generation and the SOB address returned to userspace, leading to an indefinite userspace hang.
--

commit c04dffdbf27c2cf3b267257e4a97f77648c44492
Author: WenTao Liang <vulab@iscas.ac.cn>

accel/habanalabs: fix kref underflow in hl_cs_signal_sob_wraparound_handler

When other_sob->need_reset is true and encaps_sig is false,
hw_sob_put(other_sob) decrements the kref to 0, but the matching
hw_sob_get(other_sob) is skipped because it is inside the encaps_sig
block. The function returns other_sob with kref=0, causing a subsequent
kref_put to underflow. Fix by moving hw_sob_get(other_sob) out of the
encaps_sig block so that it is called unconditionally.

Suggested-by: Greg KH <gregkh@linuxfoundation.org>
Fixes: dadf17abb724 ("habanalabs: add support for encapsulated signals reservation")
Cc: stable@vger.kernel.org
Signed-off-by: WenTao Liang <vulab@iscas.ac.cn>

> diff --git a/drivers/accel/habanalabs/common/command_submission.c b/drivers/accel/habanalabs/common/command_submission.c
> index d51b4cf332b2a..ffcdf29b7e070 100644
> --- a/drivers/accel/habanalabs/common/command_submission.c
> +++ b/drivers/accel/habanalabs/common/command_submission.c

[ ... ]

> @@ -1869,11 +1869,10 @@ int hl_cs_signal_sob_wraparound_handler(struct hl_device *hdev, u32 q_idx,
>  		if (other_sob->need_reset)
>  			hw_sob_put(other_sob);
>  
> -		if (encaps_sig) {
> +		if (encaps_sig)
>  			/* set reset indication for the sob */
>  			sob->need_reset = true;
> -			hw_sob_get(other_sob);
> -		}
> +		hw_sob_get(other_sob);

[Severity: High]
Does this unconditional get leak the reference count when encaps_sig is false?

If encaps_sig is false, init_signal_cs() retains the old SOB in
cs_cmpl->hw_sob and ignores the new SOB returned by this wraparound handler.

Since no object takes ownership of this newly acquired reference, does it
permanently leak the other_sob reference?

Could this inflated refcount cause the kref_read(&other_sob->kref) != 1
check to fail during future wraparounds, leading to a denial of service?
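
The concern can be sketched with a small userspace model. This is not the
driver code: struct sob_model, switch_sob_patched(), other_sob_is_free() and
refs_after_unowned_switch() are hypothetical names, and a plain int stands in
for struct kref. It assumes the idle SOB starts with a count of 1 and that no
caller ever drops the reference taken in the non-encaps case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct hl_hw_sob; an int replaces struct kref. */
struct sob_model {
	int refs;
	bool need_reset;
};

/* Models the SOB switch step as it reads after this patch. */
static void switch_sob_patched(struct sob_model *sob, struct sob_model *other,
			       bool encaps_sig)
{
	assert(sob != NULL && other != NULL);

	if (other->need_reset)
		other->refs--;		/* hw_sob_put(other_sob) */

	if (encaps_sig)
		sob->need_reset = true;

	other->refs++;			/* unconditional hw_sob_get(other_sob) */
}

/* Mirrors the kref_read(&other_sob->kref) != 1 check mentioned above. */
static bool other_sob_is_free(const struct sob_model *other)
{
	return other->refs == 1;
}

/* Count on the new SOB after one switch with encaps_sig == false. */
static int refs_after_unowned_switch(void)
{
	struct sob_model sob = { .refs = 1, .need_reset = false };
	struct sob_model other = { .refs = 1, .need_reset = false };

	switch_sob_patched(&sob, &other, false);
	return other.refs;
}
```

In this model the new SOB ends the switch with a count of 2 and nothing is
left to put it, so a later other_sob_is_free() check on that SOB would fail.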

[ ... ]

[Severity: High]
This is a pre-existing issue, but can init_signal_cs() cause a reference
underflow on the old SOB during a wraparound?

In drivers/accel/habanalabs/common/hw_queue.c:init_signal_cs(), the
completion object retains a pointer to the old SOB before calling the
wraparound handler:

    cs_cmpl->hw_sob = hw_sob;
    ...
    rc = hl_cs_signal_sob_wraparound_handler(hdev, q_idx, &hw_sob, 1,
                                             false);

If a wraparound occurs, the handler calls hw_sob_put_err(sob) on the old
SOB. Later, when the job completes, cs_do_release() calls
cs_release_sob_reset_handler() which unconditionally calls
hw_sob_put(hl_cs_cmpl->hw_sob).

Since the completion object still points to the old SOB, does this decrement
the old SOB's refcount without a corresponding increment, leading to premature
release?
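
A toy model of the sequence described above, for illustration only. The
names struct sob_model, struct cmpl_model, wraparound_model(),
release_model() and puts_on_old_sob() are hypothetical, and the model only
tallies the two puts named in this comment. Whether a matching get exists
elsewhere in the driver is exactly the open question, so gets are
deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct hl_hw_sob: counts the puts it receives. */
struct sob_model {
	int puts;
};

/* Hypothetical stand-in for the completion object holding cs_cmpl->hw_sob. */
struct cmpl_model {
	struct sob_model *hw_sob;
};

/* Models a wraparound: put the old SOB, then redirect the caller's pointer. */
static void wraparound_model(struct sob_model **hw_sob, struct sob_model *other)
{
	assert(hw_sob != NULL && *hw_sob != NULL && other != NULL);

	(*hw_sob)->puts++;	/* hw_sob_put_err(sob) on the old SOB */
	*hw_sob = other;
}

/* Models the release path: put whatever the completion object points at. */
static void release_model(struct cmpl_model *cmpl)
{
	assert(cmpl != NULL && cmpl->hw_sob != NULL);

	cmpl->hw_sob->puts++;	/* hw_sob_put(hl_cs_cmpl->hw_sob) */
}

/* Number of puts the old SOB receives across one submit and one release. */
static int puts_on_old_sob(void)
{
	struct sob_model old_sob = { 0 }, new_sob = { 0 };
	struct sob_model *hw_sob = &old_sob;
	/* The completion object is filled in before the handler runs. */
	struct cmpl_model cmpl = { .hw_sob = hw_sob };

	wraparound_model(&hw_sob, &new_sob);
	release_model(&cmpl);
	return old_sob.puts;
}
```

Because the completion object captured the pointer before the switch, both
puts in this model land on the old SOB and none on the new one.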

[Severity: High]
This is a pre-existing issue, but is there a mismatch between the
hardware signal generation and the SOB address returned to userspace?

In drivers/accel/habanalabs/common/hw_queue.c:init_signal_cs(), the hardware
command is generated using the old SOB's ID:

    hdev->asic_funcs->gen_signal_cb(hdev, job->patched_cb,
                                    cs_cmpl->hw_sob->sob_id, 0, true);

Immediately after, the wraparound handler executes and updates the local hw_sob
pointer to the new SOB:

    rc = hl_cs_signal_sob_wraparound_handler(hdev, q_idx, &hw_sob, 1,
                                             false);

The function then assigns the new SOB's address to the job:

    job->cs->sob_addr_offset = hw_sob->sob_addr;

If userspace waits on this exported offset, it waits on the new SOB, but
the hardware was programmed to signal the old SOB. Can this cause an
indefinite userspace hang?
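
The ordering problem can be shown with a minimal sketch. It is a userspace
model under stated assumptions, not the driver: struct sob_model,
struct signal_model, init_signal_model() and targets_match() are
hypothetical names, and the SOB ids and addresses used with it are
arbitrary illustration values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sob_id and sob_addr fields of a SOB. */
struct sob_model {
	unsigned int sob_id;
	unsigned int sob_addr;
};

struct signal_model {
	unsigned int signalled_sob_id;	/* SOB the hardware command targets */
	unsigned int exported_sob_addr;	/* address handed back to userspace */
};

/* Models the order of operations described above for a two-SOB queue. */
static struct signal_model init_signal_model(const struct sob_model *sobs,
					     bool wraparound)
{
	const struct sob_model *hw_sob = &sobs[0];
	const struct sob_model *cmpl_hw_sob = hw_sob;	/* cs_cmpl->hw_sob */
	struct signal_model out;

	assert(sobs != NULL);

	/* gen_signal_cb() is given the id of the SOB held by cs_cmpl. */
	out.signalled_sob_id = cmpl_hw_sob->sob_id;

	/* The wraparound handler may redirect the local pointer. */
	if (wraparound)
		hw_sob = &sobs[1];

	/* job->cs->sob_addr_offset = hw_sob->sob_addr */
	out.exported_sob_addr = hw_sob->sob_addr;
	return out;
}

/* True when the signalled SOB and the exported address are the same SOB. */
static bool targets_match(const struct sob_model *sobs, struct signal_model s)
{
	for (size_t i = 0; i < 2; i++) {
		if (sobs[i].sob_id == s.signalled_sob_id)
			return sobs[i].sob_addr == s.exported_sob_addr;
	}
	return false;
}
```

Without a wraparound the two values refer to the same SOB. With one, the
model signals sobs[0] while exporting the address of sobs[1], which is the
mismatch this comment asks about.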

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260628113020.43942-1-vulab@iscas.ac.cn?part=1

