From: Oded Gabbay <ogabbay@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Tal Cohen <talcohen@habana.ai>
Subject: [PATCH 05/17] habanalabs: send an event notification when CS timeout occurs
Date: Mon, 20 Jun 2022 16:04:20 +0300
Message-ID: <20220620130432.1180451-5-ogabbay@kernel.org>
In-Reply-To: <20220620130432.1180451-1-ogabbay@kernel.org>
From: Tal Cohen <talcohen@habana.ai>
The driver needs to inform the user process whenever one of its
command submissions (CS) times out. The driver shall detect the CS
timeout and send an eventfd notification to user space whenever the
timeout of a CS expires.
Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
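For readers on the user-space side, the following is a minimal sketch of how a received event mask might be decoded once the eventfd has fired. The bit values are copied from the uapi hunk in this patch; the helper name describe_events() is hypothetical and is not part of the driver or its uapi, and how the mask is actually retrieved from the driver is outside the scope of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit values copied from include/uapi/misc/habanalabs.h as of this patch. */
#define HL_NOTIFIER_EVENT_TPC_ASSERT		(1ULL << 0)
#define HL_NOTIFIER_EVENT_UNDEFINED_OPCODE	(1ULL << 1)
#define HL_NOTIFIER_EVENT_DEVICE_RESET		(1ULL << 2)
#define HL_NOTIFIER_EVENT_CS_TIMEOUT		(1ULL << 3)

/*
 * Hypothetical helper (an assumption for illustration only).
 * Writes a '|'-separated list of the known event names set in 'mask'
 * into 'buf' and returns how many names were written.
 */
static int describe_events(uint64_t mask, char *buf, size_t len)
{
	static const struct {
		uint64_t bit;
		const char *name;
	} known[] = {
		{ HL_NOTIFIER_EVENT_TPC_ASSERT,       "TPC_ASSERT" },
		{ HL_NOTIFIER_EVENT_UNDEFINED_OPCODE, "UNDEFINED_OPCODE" },
		{ HL_NOTIFIER_EVENT_DEVICE_RESET,     "DEVICE_RESET" },
		{ HL_NOTIFIER_EVENT_CS_TIMEOUT,       "CS_TIMEOUT" },
	};
	size_t used = 0;
	size_t i;
	int count = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
		int n;

		if (!(mask & known[i].bit))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     count ? "|" : "", known[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			/* Out of space: drop the partial name and stop. */
			buf[used] = '\0';
			break;
		}
		used += (size_t)n;
		count++;
	}

	return count;
}
```

With this patch applied, a CS timeout that also triggers a reset would be reported with both HL_NOTIFIER_EVENT_CS_TIMEOUT and HL_NOTIFIER_EVENT_DEVICE_RESET set, so a consumer should test each bit independently rather than compare the mask against a single value.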
.../habanalabs/common/command_submission.c | 26 ++++++++++++-------
include/uapi/misc/habanalabs.h | 2 ++
2 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index 47b49cbf67ab..cbb7c29966ff 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -797,10 +797,11 @@ static void cs_do_release(struct kref *ref)
static void cs_timedout(struct work_struct *work)
{
struct hl_device *hdev;
+ u64 event_mask;
int rc;
struct hl_cs *cs = container_of(work, struct hl_cs,
work_tdr.work);
- bool skip_reset_on_timeout = cs->skip_reset_on_timeout;
+ bool skip_reset_on_timeout = cs->skip_reset_on_timeout, device_reset = false;
rc = cs_get_unless_zero(cs);
if (!rc)
@@ -811,9 +812,15 @@ static void cs_timedout(struct work_struct *work)
return;
}
- /* Mark the CS is timed out so we won't try to cancel its TDR */
- if (likely(!skip_reset_on_timeout))
+ if (likely(!skip_reset_on_timeout)) {
+ if (hdev->reset_on_lockup)
+ device_reset = true;
+ else
+ hdev->reset_info.needs_reset = true;
+
+ /* Mark the CS is timed out so we won't try to cancel its TDR */
cs->timedout = true;
+ }
hdev = cs->ctx->hdev;
@@ -822,6 +829,11 @@ static void cs_timedout(struct work_struct *work)
if (rc) {
hdev->last_error.cs_timeout.timestamp = ktime_get();
hdev->last_error.cs_timeout.seq = cs->sequence;
+
+ event_mask = device_reset ? (HL_NOTIFIER_EVENT_CS_TIMEOUT |
+ HL_NOTIFIER_EVENT_DEVICE_RESET) : HL_NOTIFIER_EVENT_CS_TIMEOUT;
+
+ hl_notifier_event_send_all(hdev, event_mask);
}
switch (cs->type) {
@@ -856,12 +868,8 @@ static void cs_timedout(struct work_struct *work)
cs_put(cs);
- if (likely(!skip_reset_on_timeout)) {
- if (hdev->reset_on_lockup)
- hl_device_reset(hdev, HL_DRV_RESET_TDR);
- else
- hdev->reset_info.needs_reset = true;
- }
+ if (device_reset)
+ hl_device_reset(hdev, HL_DRV_RESET_TDR);
}
static int allocate_cs(struct hl_device *hdev, struct hl_ctx *ctx,
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index 5f9a6097f5f3..18f86d259421 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -1435,10 +1435,12 @@ struct hl_debug_args {
* HL_NOTIFIER_EVENT_TPC_ASSERT - Indicates TPC assert event
* HL_NOTIFIER_EVENT_UNDEFINED_OPCODE - Indicates undefined operation code
* HL_NOTIFIER_EVENT_DEVICE_RESET - Indicates device requires a reset
+ * HL_NOTIFIER_EVENT_CS_TIMEOUT - Indicates CS timeout error
*/
#define HL_NOTIFIER_EVENT_TPC_ASSERT (1ULL << 0)
#define HL_NOTIFIER_EVENT_UNDEFINED_OPCODE (1ULL << 1)
#define HL_NOTIFIER_EVENT_DEVICE_RESET (1ULL << 2)
+#define HL_NOTIFIER_EVENT_CS_TIMEOUT (1ULL << 3)
/*
* Various information operations such as:
--
2.25.1
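The mask selection in cs_timedout() can be summarized outside the kernel as a small pure function. This is only a sketch of the logic visible in the hunks above: the function name cs_timeout_event_mask() and its two boolean parameters are assumptions standing in for cs->skip_reset_on_timeout and hdev->reset_on_lockup, and the surrounding condition that decides whether a notification is sent at all is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the uapi hunk in this patch. */
#define HL_NOTIFIER_EVENT_DEVICE_RESET	(1ULL << 2)
#define HL_NOTIFIER_EVENT_CS_TIMEOUT	(1ULL << 3)

/*
 * Hypothetical stand-alone model of the event mask computed in
 * cs_timedout(). A device reset is reported only when the CS does not
 * ask to skip the reset and the device is configured to reset on lockup.
 */
static uint64_t cs_timeout_event_mask(bool skip_reset_on_timeout,
				      bool reset_on_lockup)
{
	bool device_reset = !skip_reset_on_timeout && reset_on_lockup;
	uint64_t event_mask = HL_NOTIFIER_EVENT_CS_TIMEOUT;

	if (device_reset)
		event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;

	return event_mask;
}
```

In this model the CS timeout bit is always present, and the device reset bit is added only in the one combination that leads to hl_device_reset() being called at the end of the function.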
Thread overview: 17+ messages
2022-06-20 13:04 [PATCH 01/17] habanalabs/gaudi: collect undefined opcode error info Oded Gabbay
2022-06-20 13:04 ` [PATCH 02/17] habanalabs: expose undefined opcode status via info ioctl Oded Gabbay
2022-06-20 13:04 ` [PATCH 03/17] habanalabs/gaudi: invoke device reset from one code block Oded Gabbay
2022-06-20 13:04 ` [PATCH 04/17] habanalabs/gaudi: send device reset notification Oded Gabbay
2022-06-20 13:04 ` [PATCH 05/17] habanalabs: send an event notification when CS timeout occurs Oded Gabbay
2022-06-20 13:04 ` [PATCH 06/17] habanalabs: avoid unnecessary error print Oded Gabbay
2022-06-20 13:04 ` [PATCH 07/17] habanalabs/gaudi: fix incorrect MME offset calculation Oded Gabbay
2022-06-20 13:04 ` [PATCH 08/17] habanalabs: add validity check for cq counter offset Oded Gabbay
2022-06-20 13:04 ` [PATCH 09/17] habanalabs/gaudi: fix shift out of bounds Oded Gabbay
2022-06-20 13:04 ` [PATCH 10/17] habanalabs: fix NULL dereference on cs timeout Oded Gabbay
2022-06-20 13:04 ` [PATCH 11/17] habanalabs: remove unused get_dma_desc_list_size Oded Gabbay
2022-06-20 13:04 ` [PATCH 12/17] habanalabs/gaudi: notify user process on device unavailable Oded Gabbay
2022-06-20 13:04 ` [PATCH 13/17] habanalabs: add critical indication in sram ecc Oded Gabbay
2022-06-20 13:04 ` [PATCH 14/17] habanalabs: check fence pointer before use Oded Gabbay
2022-06-20 13:04 ` [PATCH 15/17] habanalabs: print pointer with correct modifier Oded Gabbay
2022-06-20 13:04 ` [PATCH 16/17] habanalabs: use kvcalloc when possible Oded Gabbay
2022-06-20 13:04 ` [PATCH 17/17] habanalabs: fix comment style Oded Gabbay