From: Oded Gabbay <ogabbay@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Tomer Tayar <ttayar@habana.ai>
Subject: [PATCH 29/30] habanalabs: prevent false heartbeat failure during soft-reset
Date: Sat, 22 Jan 2022 21:57:30 +0200
Message-ID: <20220122195731.934494-29-ogabbay@kernel.org>
In-Reply-To: <20220122195731.934494-1-ogabbay@kernel.org>
From: Tomer Tayar <ttayar@habana.ai>
The heartbeat thread is active during soft-reset, and it tries to send
messages to the CPU-CP core.
During soft-reset, in the time window in which the device is marked
as disabled, any CPU-CP command is "silently" skipped and a success
value is returned.
However, in addition to the return value, the heartbeat function also
checks the F/W result, but because no command is sent in this time
window, the result variable won't hold the expected value and we will
have a false heartbeat failure.
To avoid this, perform the "silent" skip only during hard-reset.
The CPU-CP should be able to handle messages during soft-reset.
Besides the heartbeat problem, this should also fix other flows that
send messages during soft-reset and use the F/W result as is, without
being aware of the reset.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/firmware_if.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/misc/habanalabs/common/firmware_if.c b/drivers/misc/habanalabs/common/firmware_if.c
index 39de9d86ee6c..11957d36c6a9 100644
--- a/drivers/misc/habanalabs/common/firmware_if.c
+++ b/drivers/misc/habanalabs/common/firmware_if.c
@@ -214,7 +214,7 @@ int hl_fw_send_cpu_message(struct hl_device *hdev, u32 hw_queue_id, u32 *msg,
dma_addr_t pkt_dma_addr;
struct hl_bd *sent_bd;
u32 tmp, expected_ack_val, pi;
- int rc = 0;
+ int rc;
pkt = hdev->asic_funcs->cpu_accessible_dma_pool_alloc(hdev, len,
&pkt_dma_addr);
@@ -228,8 +228,11 @@ int hl_fw_send_cpu_message(struct hl_device *hdev, u32 hw_queue_id, u32 *msg,
mutex_lock(&hdev->send_cpu_message_lock);
- if (hdev->disabled)
+ /* CPU-CP messages can be sent during soft-reset */
+ if (hdev->disabled && !hdev->reset_info.is_in_soft_reset) {
+ rc = 0;
goto out;
+ }
if (hdev->device_cpu_disabled) {
rc = -EIO;
--
2.25.1