From: Oded Gabbay <ogabbay@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Tomer Tayar <ttayar@habana.ai>
Subject: [PATCH 3/8] habanalabs: print context refcount value if hard reset fails
Date: Wed, 23 Nov 2022 16:57:56 +0200
Message-ID: <20221123145801.542029-3-ogabbay@kernel.org>
In-Reply-To: <20221123145801.542029-1-ogabbay@kernel.org>

From: Tomer Tayar <ttayar@habana.ai>

Failing to kill a user process during a hard reset can be due to a
reference to the user context which isn't released.
To make it easier to understand if this is the reason for the failure and
not something else, add a print of the context refcount value.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/misc/habanalabs/common/device.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)
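
The "minus one" adjustment in the new print can be illustrated outside the
kernel. The following is a minimal userspace sketch, not driver code: it
assumes, as the comment in the patch implies, that hl_get_compute_ctx() takes
a temporary reference that hl_ctx_put() later drops. The names mock_ctx,
mock_ctx_get(), mock_ctx_put() and mock_ctx_users() are hypothetical
stand-ins, and C11 atomics replace the kernel's kref.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct hl_ctx; only the refcount matters here. */
struct mock_ctx {
	atomic_uint refcount;
};

/* Take a temporary reference (assumed to mirror hl_get_compute_ctx()). */
static struct mock_ctx *mock_ctx_get(struct mock_ctx *ctx)
{
	if (!ctx)
		return NULL;
	atomic_fetch_add(&ctx->refcount, 1);
	return ctx;
}

/* Drop a reference (assumed to mirror hl_ctx_put(), without the release path). */
static void mock_ctx_put(struct mock_ctx *ctx)
{
	unsigned int old = atomic_fetch_sub(&ctx->refcount, 1);

	assert(old > 0);	/* dropping a reference that was never taken is a bug */
	(void)old;
}

/*
 * Return the number of references held by everyone else. The read happens
 * while our own temporary reference is held, so that reference is
 * subtracted from the value that is read.
 */
static unsigned int mock_ctx_users(struct mock_ctx *ctx)
{
	struct mock_ctx *held = mock_ctx_get(ctx);
	unsigned int users;

	if (!held)
		return 0;

	users = atomic_load(&held->refcount) - 1;
	mock_ctx_put(held);
	return users;
}
```

With a context whose refcount is 2, mock_ctx_users() reports 2 and leaves the
refcount unchanged, which is the value the new message is meant to show.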

diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index f5864893237c..926f230def56 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -696,10 +696,22 @@ static void device_hard_reset_pending(struct work_struct *work)
 	flags = device_reset_work->flags | HL_DRV_RESET_FROM_RESET_THR;
 
 	rc = hl_device_reset(hdev, flags);
+
 	if ((rc == -EBUSY) && !hdev->device_fini_pending) {
-		dev_info(hdev->dev,
-			"Could not reset device. will try again in %u seconds",
-			HL_PENDING_RESET_PER_SEC);
+		struct hl_ctx *ctx = hl_get_compute_ctx(hdev);
+
+		if (ctx) {
+			/* The read refcount value should be reduced by one, because the read
+			 * is protected with hl_get_compute_ctx().
+			 */
+			dev_info(hdev->dev,
+				"Could not reset device (compute_ctx refcount %u). will try again in %u seconds",
+				kref_read(&ctx->refcount) - 1, HL_PENDING_RESET_PER_SEC);
+			hl_ctx_put(ctx);
+		} else {
+			dev_info(hdev->dev, "Could not reset device. will try again in %u seconds",
+				HL_PENDING_RESET_PER_SEC);
+		}
 
 		queue_delayed_work(hdev->reset_wq, &device_reset_work->reset_work,
 					msecs_to_jiffies(HL_PENDING_RESET_PER_SEC * 1000));
-- 
2.25.1

