From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Farah Kassabri <fkassabri@habana.ai>,
Oded Gabbay <ogabbay@kernel.org>, Sasha Levin <sashal@kernel.org>,
ttayar@habana.ai, stanislaw.gruszka@linux.intel.com,
kelbaz@habana.ai, dhirschfeld@habana.ai,
dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.7 64/88] accel/habanalabs: fix EQ heartbeat mechanism
Date: Mon, 22 Jan 2024 09:51:37 -0500
Message-ID: <20240122145608.990137-64-sashal@kernel.org>
In-Reply-To: <20240122145608.990137-1-sashal@kernel.org>

From: Farah Kassabri <fkassabri@habana.ai>

[ Upstream commit d1958dce5ab6a3e089c60cf474e8c9b7e96e70ad ]

Stop rescheduling another heartbeat check when EQ heartbeat check fails
as it generates confusing logs in dmesg that the heartbeat fails.

Signed-off-by: Farah Kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/accel/habanalabs/common/device.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index 9e461c03e705..9290d4374551 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -1044,18 +1044,19 @@ static bool is_pci_link_healthy(struct hl_device *hdev)
 	return (vendor_id == PCI_VENDOR_ID_HABANALABS);
 }

-static void hl_device_eq_heartbeat(struct hl_device *hdev)
+static int hl_device_eq_heartbeat_check(struct hl_device *hdev)
 {
-	u64 event_mask = HL_NOTIFIER_EVENT_DEVICE_RESET | HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE;
 	struct asic_fixed_properties *prop = &hdev->asic_prop;

 	if (!prop->cpucp_info.eq_health_check_supported)
-		return;
+		return 0;

 	if (hdev->eq_heartbeat_received)
 		hdev->eq_heartbeat_received = false;
 	else
-		hl_device_cond_reset(hdev, HL_DRV_RESET_HARD, event_mask);
+		return -EIO;
+
+	return 0;
 }

 static void hl_device_heartbeat(struct work_struct *work)
@@ -1072,10 +1073,9 @@ static void hl_device_heartbeat(struct work_struct *work)
 	/*
 	 * For EQ health check need to check if driver received the heartbeat eq event
 	 * in order to validate the eq is working.
+	 * Only if both the EQ is healthy and we managed to send the next heartbeat reschedule.
 	 */
-	hl_device_eq_heartbeat(hdev);
-
-	if (!hdev->asic_funcs->send_heartbeat(hdev))
+	if ((!hl_device_eq_heartbeat_check(hdev)) && (!hdev->asic_funcs->send_heartbeat(hdev)))
 		goto reschedule;

 	if (hl_device_operational(hdev, NULL))
--
2.43.0
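
The control flow introduced by the patch can be sketched outside the kernel as
follows. This is a minimal user-space model, not driver code: struct mock_dev,
its send_ok field, and all mock_* functions are hypothetical stand-ins for the
hl_device fields and the asic_funcs->send_heartbeat() callback seen in the
diff. It only illustrates that the periodic work is rescheduled when the EQ
check returns 0 and sending the next heartbeat also returns 0; whatever the
driver does on the non-rescheduled path lies outside the quoted hunks and is
not modeled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the hl_device fields touched by the patch. */
struct mock_dev {
	bool eq_health_check_supported;
	bool eq_heartbeat_received;
	bool send_ok; /* mock only: does sending the next heartbeat succeed? */
};

/*
 * Models hl_device_eq_heartbeat_check(): returns 0 when the check is
 * unsupported or an EQ heartbeat event was seen, -EIO otherwise.
 */
static int mock_eq_heartbeat_check(struct mock_dev *dev)
{
	assert(dev != NULL);

	if (!dev->eq_health_check_supported)
		return 0;

	if (!dev->eq_heartbeat_received)
		return -EIO;

	/* Consume the flag; the next EQ heartbeat event must set it again. */
	dev->eq_heartbeat_received = false;
	return 0;
}

/* Models asic_funcs->send_heartbeat(): 0 on success, nonzero on failure. */
static int mock_send_heartbeat(struct mock_dev *dev)
{
	return dev->send_ok ? 0 : -EIO;
}

/*
 * Models the patched condition in hl_device_heartbeat(): true means the
 * work would take the "goto reschedule" branch. Because of && short-circuit
 * evaluation, no heartbeat is sent once the EQ check has failed.
 */
static bool mock_heartbeat_tick(struct mock_dev *dev)
{
	return !mock_eq_heartbeat_check(dev) && !mock_send_heartbeat(dev);
}
```

Before the patch, the EQ check could trigger a reset and the work could still
be rescheduled if send_heartbeat() succeeded; in this model a failed EQ check
always ends the rescheduling.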