From: Oded Gabbay <ogabbay@kernel.org>
To: dri-devel@lists.freedesktop.org
Subject: [PATCH 05/12] accel/habanalabs: print max timeout value on CS stuck
Date: Tue, 16 May 2023 12:30:23 +0300 [thread overview]
Message-ID: <20230516093030.1220526-5-ogabbay@kernel.org> (raw)
In-Reply-To: <20230516093030.1220526-1-ogabbay@kernel.org>
If a workload got stuck, we print an error to the kernel log about it.
Add to that print the configured max timeout value, as that value is
not fixed between ASICs and in addition it can be configured using
a kernel module parameter.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
.../habanalabs/common/command_submission.c | 26 +++++++++++--------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/drivers/accel/habanalabs/common/command_submission.c b/drivers/accel/habanalabs/common/command_submission.c
index ccf68f482948..4ec28af3ed78 100644
--- a/drivers/accel/habanalabs/common/command_submission.c
+++ b/drivers/accel/habanalabs/common/command_submission.c
@@ -804,12 +804,14 @@ static void cs_do_release(struct kref *ref)
static void cs_timedout(struct work_struct *work)
{
+ struct hl_cs *cs = container_of(work, struct hl_cs, work_tdr.work);
+ bool skip_reset_on_timeout, device_reset = false;
struct hl_device *hdev;
u64 event_mask = 0x0;
+ uint timeout_sec;
int rc;
- struct hl_cs *cs = container_of(work, struct hl_cs,
- work_tdr.work);
- bool skip_reset_on_timeout = cs->skip_reset_on_timeout, device_reset = false;
+
+ skip_reset_on_timeout = cs->skip_reset_on_timeout;
rc = cs_get_unless_zero(cs);
if (!rc)
@@ -840,29 +842,31 @@ static void cs_timedout(struct work_struct *work)
event_mask |= HL_NOTIFIER_EVENT_CS_TIMEOUT;
}
+ timeout_sec = jiffies_to_msecs(hdev->timeout_jiffies) / 1000;
+
switch (cs->type) {
case CS_TYPE_SIGNAL:
dev_err(hdev->dev,
- "Signal command submission %llu has not finished in time!\n",
- cs->sequence);
+ "Signal command submission %llu has not finished in %u seconds!\n",
+ cs->sequence, timeout_sec);
break;
case CS_TYPE_WAIT:
dev_err(hdev->dev,
- "Wait command submission %llu has not finished in time!\n",
- cs->sequence);
+ "Wait command submission %llu has not finished in %u seconds!\n",
+ cs->sequence, timeout_sec);
break;
case CS_TYPE_COLLECTIVE_WAIT:
dev_err(hdev->dev,
- "Collective Wait command submission %llu has not finished in time!\n",
- cs->sequence);
+ "Collective Wait command submission %llu has not finished in %u seconds!\n",
+ cs->sequence, timeout_sec);
break;
default:
dev_err(hdev->dev,
- "Command submission %llu has not finished in time!\n",
- cs->sequence);
+ "Command submission %llu has not finished in %u seconds!\n",
+ cs->sequence, timeout_sec);
break;
}
--
2.40.1
next prev parent reply other threads:[~2023-05-16 9:30 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-16 9:30 [PATCH 01/12] accel/habanalabs: rename security functions related arguments Oded Gabbay
2023-05-16 9:30 ` [PATCH 02/12] accel/habanalabs: set unused bit as reserved Oded Gabbay
2023-05-17 18:03 ` Ofir Bitton
2023-05-16 9:30 ` [PATCH 03/12] accel/habanalabs: fix mem leak in capture user mappings Oded Gabbay
2023-05-16 9:30 ` [PATCH 04/12] accel/habanalabs: align to latest firmware specs Oded Gabbay
2023-05-17 18:03 ` Ofir Bitton
2023-05-16 9:30 ` Oded Gabbay [this message]
2023-05-17 18:01 ` [PATCH 05/12] accel/habanalabs: print max timeout value on CS stuck Ofir Bitton
2023-05-16 9:30 ` [PATCH 06/12] accel/habanalabs: upon DMA errors, use FW-extracted error cause Oded Gabbay
2023-05-16 9:30 ` [PATCH 07/12] accel/habanalabs: remove support for mmu disable Oded Gabbay
2023-05-16 9:30 ` [PATCH 08/12] accel/habanalabs: use binning info when handling razwi Oded Gabbay
2023-05-16 9:30 ` [PATCH 09/12] accel/habanalabs: use lower QM in QM errors handling Oded Gabbay
2023-05-16 9:30 ` [PATCH 10/12] accel/habanalabs: print qman data on error only for lower qman Oded Gabbay
2023-05-16 9:30 ` [PATCH 11/12] accel/habanalabs: update state when loading boot fit Oded Gabbay
2023-05-16 9:30 ` [PATCH 12/12] accel/habanalabs: mask part of hmmu page fault captured address Oded Gabbay
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230516093030.1220526-5-ogabbay@kernel.org \
--to=ogabbay@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.