* [PATCH 1/9] habanalabs: pass reset flags to reset thread
@ 2021-12-05 15:42 Oded Gabbay
2021-12-05 15:42 ` [PATCH 2/9] habanalabs: add missing kernel-doc comments for hl_device fields Oded Gabbay
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Oded Gabbay @ 2021-12-05 15:42 UTC (permalink / raw)
To: linux-kernel; +Cc: Tomer Tayar
From: Tomer Tayar <ttayar@habana.ai>
The reset flags used by the reset thread are currently a mix of
hard-coded values and a specific flag which is passed from the context
that initiates the reset.
To make it easier to pass more flags in future from this context to the
reset thread, modify it to pass all the original reset flags to the
thread.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/device.c | 10 +++-------
drivers/misc/habanalabs/common/habanalabs.h | 4 ++--
2 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index 720eea0b7e9c..db4168f35c18 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -324,16 +324,12 @@ static void device_cdev_sysfs_del(struct hl_device *hdev)
static void device_hard_reset_pending(struct work_struct *work)
{
struct hl_device_reset_work *device_reset_work =
- container_of(work, struct hl_device_reset_work,
- reset_work.work);
+ container_of(work, struct hl_device_reset_work, reset_work.work);
struct hl_device *hdev = device_reset_work->hdev;
u32 flags;
int rc;
- flags = HL_DRV_RESET_HARD | HL_DRV_RESET_FROM_RESET_THR;
-
- if (device_reset_work->fw_reset)
- flags |= HL_DRV_RESET_BYPASS_REQ_TO_FW;
+ flags = device_reset_work->flags | HL_DRV_RESET_FROM_RESET_THR;
rc = hl_device_reset(hdev, flags);
if ((rc == -EBUSY) && !hdev->device_fini_pending) {
@@ -1040,7 +1036,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
hdev->process_kill_trial_cnt = 0;
- hdev->device_reset_work.fw_reset = fw_reset;
+ hdev->device_reset_work.flags = flags;
/*
* Because the reset function can't run from heartbeat work,
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 93d0a85265be..722fc8e69fd6 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -2214,13 +2214,13 @@ struct hwmon_chip_info;
* @wq: work queue for device reset procedure.
* @reset_work: reset work to be done.
* @hdev: habanalabs device structure.
- * @fw_reset: whether f/w will do the reset without us sending them a message to do it.
+ * @flags: reset flags.
*/
struct hl_device_reset_work {
struct workqueue_struct *wq;
struct delayed_work reset_work;
struct hl_device *hdev;
- bool fw_reset;
+ u32 flags;
};
/**
--
2.25.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/9] habanalabs: add missing kernel-doc comments for hl_device fields
2021-12-05 15:42 [PATCH 1/9] habanalabs: pass reset flags to reset thread Oded Gabbay
@ 2021-12-05 15:42 ` Oded Gabbay
2021-12-05 15:42 ` [PATCH 3/9] habanalabs: free signal handle on failure Oded Gabbay
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Oded Gabbay @ 2021-12-05 15:42 UTC (permalink / raw)
To: linux-kernel; +Cc: Tomer Tayar
From: Tomer Tayar <ttayar@habana.ai>
Add missing kernel-doc comments for the "last_error" and
"stream_master_qid_arr" fields of the "hl_device" structure".
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/habanalabs.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 722fc8e69fd6..57bc55c2ddac 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -2512,6 +2512,8 @@ struct last_error_session_info {
* @state_dump_specs: constants and dictionaries needed to dump system state.
* @multi_cs_completion: array of multi-CS completion.
* @clk_throttling: holds information about current/previous clock throttling events
+ * @last_error: holds information about last session in which CS timeout or razwi error occurred.
+ * @stream_master_qid_arr: pointer to array with QIDs of master streams.
* @dram_used_mem: current DRAM memory consumption.
* @timeout_jiffies: device CS timeout value.
* @max_power: the max power of the device, as configured by the sysadmin. This
--
2.25.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 3/9] habanalabs: free signal handle on failure
2021-12-05 15:42 [PATCH 1/9] habanalabs: pass reset flags to reset thread Oded Gabbay
2021-12-05 15:42 ` [PATCH 2/9] habanalabs: add missing kernel-doc comments for hl_device fields Oded Gabbay
@ 2021-12-05 15:42 ` Oded Gabbay
2021-12-05 15:42 ` [PATCH 4/9] habanalabs: remove redundant check on ctx_fini Oded Gabbay
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Oded Gabbay @ 2021-12-05 15:42 UTC (permalink / raw)
To: linux-kernel
Fix a bug where in case of failure to allocate idr, the handle's
memory wasn't freed as part of the error handling code.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/command_submission.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index 8be547b0926c..d169418197c0 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -1838,7 +1838,7 @@ static int cs_ioctl_reserve_signals(struct hl_fpriv *hpriv,
if (hdl_id < 0) {
dev_err(hdev->dev, "Failed to allocate IDR for a new signal reservation\n");
rc = -EINVAL;
- goto out;
+ goto free_handle;
}
handle->id = hdl_id;
@@ -1891,7 +1891,9 @@ static int cs_ioctl_reserve_signals(struct hl_fpriv *hpriv,
idr_remove(&mgr->handles, hdl_id);
spin_unlock(&mgr->lock);
+free_handle:
kfree(handle);
+
out:
return rc;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 4/9] habanalabs: remove redundant check on ctx_fini
2021-12-05 15:42 [PATCH 1/9] habanalabs: pass reset flags to reset thread Oded Gabbay
2021-12-05 15:42 ` [PATCH 2/9] habanalabs: add missing kernel-doc comments for hl_device fields Oded Gabbay
2021-12-05 15:42 ` [PATCH 3/9] habanalabs: free signal handle on failure Oded Gabbay
@ 2021-12-05 15:42 ` Oded Gabbay
2021-12-05 15:42 ` [PATCH 5/9] habanalabs: save ctx inside encaps signal Oded Gabbay
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Oded Gabbay @ 2021-12-05 15:42 UTC (permalink / raw)
To: linux-kernel
The driver supports only a single context. Therefore, no need to check
if the user context that is closed is the compute context. The user
context, if exists, is always the compute context.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/context.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/misc/habanalabs/common/context.c b/drivers/misc/habanalabs/common/context.c
index d0aaccd4df2c..4f7d39a29a42 100644
--- a/drivers/misc/habanalabs/common/context.c
+++ b/drivers/misc/habanalabs/common/context.c
@@ -97,10 +97,8 @@ static void hl_ctx_fini(struct hl_ctx *ctx)
/* The engines are stopped as there is no executing CS, but the
* Coresight might be still working by accessing addresses
* related to the stopped engines. Hence stop it explicitly.
- * Stop only if this is the compute context, as there can be
- * only one compute context
*/
- if ((hdev->in_debug) && (hdev->compute_ctx == ctx))
+ if (hdev->in_debug)
hl_device_set_debug_mode(hdev, false);
hdev->asic_funcs->ctx_fini(ctx);
--
2.25.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 5/9] habanalabs: save ctx inside encaps signal
2021-12-05 15:42 [PATCH 1/9] habanalabs: pass reset flags to reset thread Oded Gabbay
` (2 preceding siblings ...)
2021-12-05 15:42 ` [PATCH 4/9] habanalabs: remove redundant check on ctx_fini Oded Gabbay
@ 2021-12-05 15:42 ` Oded Gabbay
2021-12-05 15:42 ` [PATCH 6/9] habanalabs: fix etr asid configuration Oded Gabbay
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Oded Gabbay @ 2021-12-05 15:42 UTC (permalink / raw)
To: linux-kernel
Compute context pointer in hdev shouldn't be used for fetching the
context's pointer.
If an object needs the context's pointer, it should get it while
incrementing its kref, and when the object is released, put it.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/command_submission.c | 11 ++++++++---
drivers/misc/habanalabs/common/context.c | 10 +++++-----
drivers/misc/habanalabs/common/habanalabs.h | 2 ++
drivers/misc/habanalabs/common/hw_queue.c | 2 +-
4 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index d169418197c0..a63ebbc04787 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * Copyright 2016-2019 HabanaLabs, Ltd.
+ * Copyright 2016-2021 HabanaLabs, Ltd.
* All Rights Reserved.
*/
@@ -1829,6 +1829,9 @@ static int cs_ioctl_reserve_signals(struct hl_fpriv *hpriv,
}
handle->count = count;
+
+ hl_ctx_get(hdev, hpriv->ctx);
+ handle->ctx = hpriv->ctx;
mgr = &hpriv->ctx->sig_mgr;
spin_lock(&mgr->lock);
@@ -1838,7 +1841,7 @@ static int cs_ioctl_reserve_signals(struct hl_fpriv *hpriv,
if (hdl_id < 0) {
dev_err(hdev->dev, "Failed to allocate IDR for a new signal reservation\n");
rc = -EINVAL;
- goto free_handle;
+ goto put_ctx;
}
handle->id = hdl_id;
@@ -1891,7 +1894,8 @@ static int cs_ioctl_reserve_signals(struct hl_fpriv *hpriv,
idr_remove(&mgr->handles, hdl_id);
spin_unlock(&mgr->lock);
-free_handle:
+put_ctx:
+ hl_ctx_put(handle->ctx);
kfree(handle);
out:
@@ -1953,6 +1957,7 @@ static int cs_ioctl_unreserve_signals(struct hl_fpriv *hpriv, u32 handle_id)
/* Release the id and free allocated memory of the handle */
idr_remove(&mgr->handles, handle_id);
+ hl_ctx_put(encaps_sig_hdl->ctx);
kfree(encaps_sig_hdl);
} else {
rc = -EINVAL;
diff --git a/drivers/misc/habanalabs/common/context.c b/drivers/misc/habanalabs/common/context.c
index 4f7d39a29a42..8291151948ef 100644
--- a/drivers/misc/habanalabs/common/context.c
+++ b/drivers/misc/habanalabs/common/context.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * Copyright 2016-2019 HabanaLabs, Ltd.
+ * Copyright 2016-2021 HabanaLabs, Ltd.
* All Rights Reserved.
*/
@@ -13,13 +13,13 @@ void hl_encaps_handle_do_release(struct kref *ref)
{
struct hl_cs_encaps_sig_handle *handle =
container_of(ref, struct hl_cs_encaps_sig_handle, refcount);
- struct hl_ctx *ctx = handle->hdev->compute_ctx;
- struct hl_encaps_signals_mgr *mgr = &ctx->sig_mgr;
+ struct hl_encaps_signals_mgr *mgr = &handle->ctx->sig_mgr;
spin_lock(&mgr->lock);
idr_remove(&mgr->handles, handle->id);
spin_unlock(&mgr->lock);
+ hl_ctx_put(handle->ctx);
kfree(handle);
}
@@ -27,8 +27,7 @@ static void hl_encaps_handle_do_release_sob(struct kref *ref)
{
struct hl_cs_encaps_sig_handle *handle =
container_of(ref, struct hl_cs_encaps_sig_handle, refcount);
- struct hl_ctx *ctx = handle->hdev->compute_ctx;
- struct hl_encaps_signals_mgr *mgr = &ctx->sig_mgr;
+ struct hl_encaps_signals_mgr *mgr = &handle->ctx->sig_mgr;
/* if we're here, then there was a signals reservation but cs with
* encaps signals wasn't submitted, so need to put refcount
@@ -40,6 +39,7 @@ static void hl_encaps_handle_do_release_sob(struct kref *ref)
idr_remove(&mgr->handles, handle->id);
spin_unlock(&mgr->lock);
+ hl_ctx_put(handle->ctx);
kfree(handle);
}
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 57bc55c2ddac..0ad08fdc89ea 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -2757,6 +2757,7 @@ struct hl_device {
* wait cs are used to wait of the reserved encaps signals.
* @hdev: pointer to habanalabs device structure.
* @hw_sob: pointer to H/W SOB used in the reservation.
+ * @ctx: pointer to the user's context data structure
* @cs_seq: staged cs sequence which contains encapsulated signals
* @id: idr handler id to be used to fetch the handler info
* @q_idx: stream queue index
@@ -2767,6 +2768,7 @@ struct hl_cs_encaps_sig_handle {
struct kref refcount;
struct hl_device *hdev;
struct hl_hw_sob *hw_sob;
+ struct hl_ctx *ctx;
u64 cs_seq;
u32 id;
u32 q_idx;
diff --git a/drivers/misc/habanalabs/common/hw_queue.c b/drivers/misc/habanalabs/common/hw_queue.c
index fc841d651210..6103e479e855 100644
--- a/drivers/misc/habanalabs/common/hw_queue.c
+++ b/drivers/misc/habanalabs/common/hw_queue.c
@@ -574,7 +574,7 @@ static int encaps_sig_first_staged_cs_handler
struct hl_encaps_signals_mgr *mgr;
int rc = 0;
- mgr = &hdev->compute_ctx->sig_mgr;
+ mgr = &cs->ctx->sig_mgr;
spin_lock(&mgr->lock);
encaps_sig_hdl = idr_find(&mgr->handles, cs->encaps_sig_hdl_id);
--
2.25.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 6/9] habanalabs: fix etr asid configuration
2021-12-05 15:42 [PATCH 1/9] habanalabs: pass reset flags to reset thread Oded Gabbay
` (3 preceding siblings ...)
2021-12-05 15:42 ` [PATCH 5/9] habanalabs: save ctx inside encaps signal Oded Gabbay
@ 2021-12-05 15:42 ` Oded Gabbay
2021-12-05 15:42 ` [PATCH 7/9] habanalabs: add helper to get compute context Oded Gabbay
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Oded Gabbay @ 2021-12-05 15:42 UTC (permalink / raw)
To: linux-kernel
Pass the user's context pointer into the etr configuration function
to extract its ASID.
Using the compute_ctx pointer is an error as it is just an indication
of whether a user has opened the compute device.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/context.c | 2 +-
drivers/misc/habanalabs/common/device.c | 4 ++--
drivers/misc/habanalabs/common/habanalabs.h | 6 +++---
drivers/misc/habanalabs/common/habanalabs_ioctl.c | 13 +++++++------
drivers/misc/habanalabs/gaudi/gaudiP.h | 4 ++--
drivers/misc/habanalabs/gaudi/gaudi_coresight.c | 4 ++--
drivers/misc/habanalabs/goya/goyaP.h | 4 ++--
drivers/misc/habanalabs/goya/goya_coresight.c | 4 ++--
8 files changed, 21 insertions(+), 20 deletions(-)
diff --git a/drivers/misc/habanalabs/common/context.c b/drivers/misc/habanalabs/common/context.c
index 8291151948ef..8de1217b2ed2 100644
--- a/drivers/misc/habanalabs/common/context.c
+++ b/drivers/misc/habanalabs/common/context.c
@@ -99,7 +99,7 @@ static void hl_ctx_fini(struct hl_ctx *ctx)
* related to the stopped engines. Hence stop it explicitly.
*/
if (hdev->in_debug)
- hl_device_set_debug_mode(hdev, false);
+ hl_device_set_debug_mode(hdev, ctx, false);
hdev->asic_funcs->ctx_fini(ctx);
hl_cb_va_pool_fini(ctx);
diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index db4168f35c18..bc5736ae6b70 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -622,7 +622,7 @@ int hl_device_utilization(struct hl_device *hdev, u32 *utilization)
return 0;
}
-int hl_device_set_debug_mode(struct hl_device *hdev, bool enable)
+int hl_device_set_debug_mode(struct hl_device *hdev, struct hl_ctx *ctx, bool enable)
{
int rc = 0;
@@ -637,7 +637,7 @@ int hl_device_set_debug_mode(struct hl_device *hdev, bool enable)
}
if (!hdev->hard_reset_pending)
- hdev->asic_funcs->halt_coresight(hdev);
+ hdev->asic_funcs->halt_coresight(hdev, ctx);
hdev->in_debug = 0;
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 0ad08fdc89ea..670fad9b4ca0 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -1288,7 +1288,7 @@ struct hl_asic_funcs {
int (*send_heartbeat)(struct hl_device *hdev);
void (*set_clock_gating)(struct hl_device *hdev);
void (*disable_clock_gating)(struct hl_device *hdev);
- int (*debug_coresight)(struct hl_device *hdev, void *data);
+ int (*debug_coresight)(struct hl_device *hdev, struct hl_ctx *ctx, void *data);
bool (*is_device_idle)(struct hl_device *hdev, u64 *mask_arr,
u8 mask_len, struct seq_file *s);
int (*non_hard_reset_late_init)(struct hl_device *hdev);
@@ -1303,7 +1303,7 @@ struct hl_asic_funcs {
int (*init_iatu)(struct hl_device *hdev);
u32 (*rreg)(struct hl_device *hdev, u32 reg);
void (*wreg)(struct hl_device *hdev, u32 reg, u32 val);
- void (*halt_coresight)(struct hl_device *hdev);
+ void (*halt_coresight)(struct hl_device *hdev, struct hl_ctx *ctx);
int (*ctx_init)(struct hl_ctx *ctx);
void (*ctx_fini)(struct hl_ctx *ctx);
int (*get_clk_rate)(struct hl_device *hdev, u32 *cur_clk, u32 *max_clk);
@@ -2867,7 +2867,7 @@ int hl_device_open_ctrl(struct inode *inode, struct file *filp);
bool hl_device_operational(struct hl_device *hdev,
enum hl_device_status *status);
enum hl_device_status hl_device_status(struct hl_device *hdev);
-int hl_device_set_debug_mode(struct hl_device *hdev, bool enable);
+int hl_device_set_debug_mode(struct hl_device *hdev, struct hl_ctx *ctx, bool enable);
int hl_hw_queues_create(struct hl_device *hdev);
void hl_hw_queues_destroy(struct hl_device *hdev);
int hl_hw_queue_send_cb_no_cmpl(struct hl_device *hdev, u32 hw_queue_id,
diff --git a/drivers/misc/habanalabs/common/habanalabs_ioctl.c b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
index 6c7339978bae..9210114beefe 100644
--- a/drivers/misc/habanalabs/common/habanalabs_ioctl.c
+++ b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
@@ -158,7 +158,7 @@ static int hw_idle(struct hl_device *hdev, struct hl_info_args *args)
min((size_t) max_size, sizeof(hw_idle))) ? -EFAULT : 0;
}
-static int debug_coresight(struct hl_device *hdev, struct hl_debug_args *args)
+static int debug_coresight(struct hl_device *hdev, struct hl_ctx *ctx, struct hl_debug_args *args)
{
struct hl_debug_params *params;
void *input = NULL, *output = NULL;
@@ -200,7 +200,7 @@ static int debug_coresight(struct hl_device *hdev, struct hl_debug_args *args)
params->output_size = args->output_size;
}
- rc = hdev->asic_funcs->debug_coresight(hdev, params);
+ rc = hdev->asic_funcs->debug_coresight(hdev, ctx, params);
if (rc) {
dev_err(hdev->dev,
"debug coresight operation failed %d\n", rc);
@@ -738,13 +738,14 @@ static int hl_debug_ioctl(struct hl_fpriv *hpriv, void *data)
"Rejecting debug configuration request because device not in debug mode\n");
return -EFAULT;
}
- args->input_size =
- min(args->input_size, hl_debug_struct_size[args->op]);
- rc = debug_coresight(hdev, args);
+ args->input_size = min(args->input_size, hl_debug_struct_size[args->op]);
+ rc = debug_coresight(hdev, hpriv->ctx, args);
break;
+
case HL_DEBUG_OP_SET_MODE:
- rc = hl_device_set_debug_mode(hdev, (bool) args->enable);
+ rc = hl_device_set_debug_mode(hdev, hpriv->ctx, (bool) args->enable);
break;
+
default:
dev_err(hdev->dev, "Invalid request %d\n", args->op);
rc = -ENOTTY;
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index f325e36a71e6..8ac16a9b7d15 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -357,8 +357,8 @@ void gaudi_init_security(struct hl_device *hdev);
void gaudi_ack_protection_bits_errors(struct hl_device *hdev);
void gaudi_add_device_attr(struct hl_device *hdev,
struct attribute_group *dev_attr_grp);
-int gaudi_debug_coresight(struct hl_device *hdev, void *data);
-void gaudi_halt_coresight(struct hl_device *hdev);
+int gaudi_debug_coresight(struct hl_device *hdev, struct hl_ctx *ctx, void *data);
+void gaudi_halt_coresight(struct hl_device *hdev, struct hl_ctx *ctx);
void gaudi_mmu_prepare_reg(struct hl_device *hdev, u64 reg, u32 asid);
#endif /* GAUDIP_H_ */
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_coresight.c b/drivers/misc/habanalabs/gaudi/gaudi_coresight.c
index 5349c1be13f9..08108f5fed67 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_coresight.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_coresight.c
@@ -848,7 +848,7 @@ static int gaudi_config_spmu(struct hl_device *hdev,
return 0;
}
-int gaudi_debug_coresight(struct hl_device *hdev, void *data)
+int gaudi_debug_coresight(struct hl_device *hdev, struct hl_ctx *ctx, void *data)
{
struct hl_debug_params *params = data;
int rc = 0;
@@ -887,7 +887,7 @@ int gaudi_debug_coresight(struct hl_device *hdev, void *data)
return rc;
}
-void gaudi_halt_coresight(struct hl_device *hdev)
+void gaudi_halt_coresight(struct hl_device *hdev, struct hl_ctx *ctx)
{
struct hl_debug_params params = {};
int i, rc;
diff --git a/drivers/misc/habanalabs/goya/goyaP.h b/drivers/misc/habanalabs/goya/goyaP.h
index f0c3c6df04d5..3740fd25bf84 100644
--- a/drivers/misc/habanalabs/goya/goyaP.h
+++ b/drivers/misc/habanalabs/goya/goyaP.h
@@ -220,8 +220,8 @@ void goya_set_pll_profile(struct hl_device *hdev, enum hl_pll_frequency freq);
void goya_add_device_attr(struct hl_device *hdev,
struct attribute_group *dev_attr_grp);
int goya_cpucp_info_get(struct hl_device *hdev);
-int goya_debug_coresight(struct hl_device *hdev, void *data);
-void goya_halt_coresight(struct hl_device *hdev);
+int goya_debug_coresight(struct hl_device *hdev, struct hl_ctx *ctx, void *data);
+void goya_halt_coresight(struct hl_device *hdev, struct hl_ctx *ctx);
int goya_suspend(struct hl_device *hdev);
int goya_resume(struct hl_device *hdev);
diff --git a/drivers/misc/habanalabs/goya/goya_coresight.c b/drivers/misc/habanalabs/goya/goya_coresight.c
index c55c100fdd24..2c5133cfae65 100644
--- a/drivers/misc/habanalabs/goya/goya_coresight.c
+++ b/drivers/misc/habanalabs/goya/goya_coresight.c
@@ -652,7 +652,7 @@ static int goya_config_spmu(struct hl_device *hdev,
return 0;
}
-int goya_debug_coresight(struct hl_device *hdev, void *data)
+int goya_debug_coresight(struct hl_device *hdev, struct hl_ctx *ctx, void *data)
{
struct hl_debug_params *params = data;
int rc = 0;
@@ -691,7 +691,7 @@ int goya_debug_coresight(struct hl_device *hdev, void *data)
return rc;
}
-void goya_halt_coresight(struct hl_device *hdev)
+void goya_halt_coresight(struct hl_device *hdev, struct hl_ctx *ctx)
{
struct hl_debug_params params = {};
int i, rc;
--
2.25.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 7/9] habanalabs: add helper to get compute context
2021-12-05 15:42 [PATCH 1/9] habanalabs: pass reset flags to reset thread Oded Gabbay
` (4 preceding siblings ...)
2021-12-05 15:42 ` [PATCH 6/9] habanalabs: fix etr asid configuration Oded Gabbay
@ 2021-12-05 15:42 ` Oded Gabbay
2021-12-05 15:42 ` [PATCH 8/9] habanalabs: remove compute context pointer Oded Gabbay
2021-12-05 15:42 ` [PATCH 9/9] habanalabs: wait again for multi-CS if no CS completed Oded Gabbay
7 siblings, 0 replies; 9+ messages in thread
From: Oded Gabbay @ 2021-12-05 15:42 UTC (permalink / raw)
To: linux-kernel
There are multiple places where the code needs to get the context's
pointer and increment its ref cnt. This is the proper way instead
of using the compute context pointer in the device structure.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/context.c | 23 +++++++++++++++++++++
drivers/misc/habanalabs/common/debugfs.c | 14 ++++++-------
drivers/misc/habanalabs/common/device.c | 13 ++++++------
drivers/misc/habanalabs/common/habanalabs.h | 1 +
4 files changed, 36 insertions(+), 15 deletions(-)
diff --git a/drivers/misc/habanalabs/common/context.c b/drivers/misc/habanalabs/common/context.c
index 8de1217b2ed2..b2884107fa15 100644
--- a/drivers/misc/habanalabs/common/context.c
+++ b/drivers/misc/habanalabs/common/context.c
@@ -272,6 +272,29 @@ int hl_ctx_put(struct hl_ctx *ctx)
return kref_put(&ctx->refcount, hl_ctx_do_release);
}
+struct hl_ctx *hl_get_compute_ctx(struct hl_device *hdev)
+{
+ struct hl_ctx *ctx = NULL;
+ struct hl_fpriv *hpriv;
+
+ mutex_lock(&hdev->fpriv_list_lock);
+
+ list_for_each_entry(hpriv, &hdev->fpriv_list, dev_node) {
+ /* There can only be a single user which has opened the compute device, so exit
+ * immediately once we find him
+ */
+ if (!hpriv->is_control) {
+ ctx = hpriv->ctx;
+ hl_ctx_get(hdev, ctx);
+ break;
+ }
+ }
+
+ mutex_unlock(&hdev->fpriv_list_lock);
+
+ return ctx;
+}
+
/*
* hl_ctx_get_fence_locked - get CS fence under CS lock
*
diff --git a/drivers/misc/habanalabs/common/debugfs.c b/drivers/misc/habanalabs/common/debugfs.c
index 9727d82b121f..2e9c31d79d5e 100644
--- a/drivers/misc/habanalabs/common/debugfs.c
+++ b/drivers/misc/habanalabs/common/debugfs.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * Copyright 2016-2019 HabanaLabs, Ltd.
+ * Copyright 2016-2021 HabanaLabs, Ltd.
* All Rights Reserved.
*/
@@ -327,11 +327,7 @@ static int vm_show(struct seq_file *s, void *data)
spin_unlock(&dev_entry->ctx_mem_hash_spinlock);
- mutex_lock(&dev_entry->hdev->fpriv_list_lock);
- ctx = dev_entry->hdev->compute_ctx;
- if (ctx)
- hl_ctx_get(dev_entry->hdev, ctx);
- mutex_unlock(&dev_entry->hdev->fpriv_list_lock);
+ ctx = hl_get_compute_ctx(dev_entry->hdev);
if (ctx) {
seq_puts(s, "\nVA ranges:\n\n");
for (i = HL_VA_RANGE_TYPE_HOST ; i < HL_VA_RANGE_TYPE_MAX ; ++i) {
@@ -443,7 +439,7 @@ static int mmu_show(struct seq_file *s, void *data)
if (dev_entry->mmu_asid == HL_KERNEL_ASID_ID)
ctx = hdev->kernel_ctx;
else
- ctx = hdev->compute_ctx;
+ ctx = hl_get_compute_ctx(hdev);
if (!ctx) {
dev_err(hdev->dev, "no ctx available\n");
@@ -596,7 +592,7 @@ static int device_va_to_pa(struct hl_device *hdev, u64 virt_addr, u32 size,
u64 *phys_addr)
{
struct hl_vm_phys_pg_pack *phys_pg_pack;
- struct hl_ctx *ctx = hdev->compute_ctx;
+ struct hl_ctx *ctx;
struct hl_vm_hash_node *hnode;
u64 end_address, range_size;
struct hl_userptr *userptr;
@@ -604,6 +600,8 @@ static int device_va_to_pa(struct hl_device *hdev, u64 virt_addr, u32 size,
bool valid = false;
int i, rc = 0;
+ ctx = hl_get_compute_ctx(hdev);
+
if (!ctx) {
dev_err(hdev->dev, "no ctx available\n");
return -EINVAL;
diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index bc5736ae6b70..407f6c5020c7 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -961,6 +961,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
bool hard_reset, from_hard_reset_thread, fw_reset, hard_instead_soft = false,
reset_upon_device_release = false;
u64 idle_mask[HL_BUSY_ENGINES_MASK_EXT_SIZE] = {0};
+ struct hl_ctx *ctx;
int i, rc;
if (!hdev->init_done) {
@@ -1101,16 +1102,14 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++)
hl_cq_reset(hdev, &hdev->completion_queue[i]);
- mutex_lock(&hdev->fpriv_list_lock);
-
/* Make sure the context switch phase will run again */
- if (hdev->compute_ctx) {
- atomic_set(&hdev->compute_ctx->thread_ctx_switch_token, 1);
- hdev->compute_ctx->thread_ctx_switch_wait_token = 0;
+ ctx = hl_get_compute_ctx(hdev);
+ if (ctx) {
+ atomic_set(&ctx->thread_ctx_switch_token, 1);
+ ctx->thread_ctx_switch_wait_token = 0;
+ hl_ctx_put(ctx);
}
- mutex_unlock(&hdev->fpriv_list_lock);
-
/* Finished tear-down, starting to re-initialize */
if (hard_reset) {
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 670fad9b4ca0..eec96e506bb0 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -2906,6 +2906,7 @@ int hl_ctx_init(struct hl_device *hdev, struct hl_ctx *ctx, bool is_kernel_ctx);
void hl_ctx_do_release(struct kref *ref);
void hl_ctx_get(struct hl_device *hdev, struct hl_ctx *ctx);
int hl_ctx_put(struct hl_ctx *ctx);
+struct hl_ctx *hl_get_compute_ctx(struct hl_device *hdev);
struct hl_fence *hl_ctx_get_fence(struct hl_ctx *ctx, u64 seq);
int hl_ctx_get_fences(struct hl_ctx *ctx, u64 *seq_arr,
struct hl_fence **fence, u32 arr_len);
--
2.25.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 8/9] habanalabs: remove compute context pointer
2021-12-05 15:42 [PATCH 1/9] habanalabs: pass reset flags to reset thread Oded Gabbay
` (5 preceding siblings ...)
2021-12-05 15:42 ` [PATCH 7/9] habanalabs: add helper to get compute context Oded Gabbay
@ 2021-12-05 15:42 ` Oded Gabbay
2021-12-05 15:42 ` [PATCH 9/9] habanalabs: wait again for multi-CS if no CS completed Oded Gabbay
7 siblings, 0 replies; 9+ messages in thread
From: Oded Gabbay @ 2021-12-05 15:42 UTC (permalink / raw)
To: linux-kernel
It was an error to save the compute context's pointer in the device
structure, as it allowed its use without proper ref-cnt.
Change the variable to a flag that only indicates whether there is
an active compute context. Code that needs the pointer will now
be forced to use proper internal APIs to get the pointer.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/context.c | 2 +-
drivers/misc/habanalabs/common/device.c | 10 +++++-----
drivers/misc/habanalabs/common/habanalabs.h | 5 ++---
drivers/misc/habanalabs/common/habanalabs_drv.c | 2 +-
drivers/misc/habanalabs/goya/goya.c | 4 ++--
drivers/misc/habanalabs/goya/goya_hwmgr.c | 4 ++--
6 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/drivers/misc/habanalabs/common/context.c b/drivers/misc/habanalabs/common/context.c
index b2884107fa15..49e6f1172d18 100644
--- a/drivers/misc/habanalabs/common/context.c
+++ b/drivers/misc/habanalabs/common/context.c
@@ -165,7 +165,7 @@ int hl_ctx_create(struct hl_device *hdev, struct hl_fpriv *hpriv)
hpriv->ctx = ctx;
/* TODO: remove the following line for multiple process support */
- hdev->compute_ctx = ctx;
+ hdev->is_compute_ctx_active = true;
return 0;
diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index 407f6c5020c7..bea05a59425f 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -97,12 +97,12 @@ static void hpriv_release(struct kref *ref)
|| hdev->reset_upon_device_release)
hl_device_reset(hdev, HL_DRV_RESET_DEV_RELEASE);
- /* Now we can mark the compute_ctx as empty. Even if a reset is running in a different
+ /* Now we can mark the compute_ctx as not active. Even if a reset is running in a different
* thread, we don't care because the in_reset is marked so if a user will try to open
- * the device it will fail on that, even if compute_ctx is NULL.
+ * the device it will fail on that, even if compute_ctx is false.
*/
mutex_lock(&hdev->fpriv_list_lock);
- hdev->compute_ctx = NULL;
+ hdev->is_compute_ctx_active = false;
mutex_unlock(&hdev->fpriv_list_lock);
kfree(hpriv);
@@ -1150,7 +1150,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
goto out_err;
}
- hdev->compute_ctx = NULL;
+ hdev->is_compute_ctx_active = false;
rc = hl_ctx_init(hdev, hdev->kernel_ctx, true);
if (rc) {
@@ -1403,7 +1403,7 @@ int hl_device_init(struct hl_device *hdev, struct class *hclass)
goto mmu_fini;
}
- hdev->compute_ctx = NULL;
+ hdev->is_compute_ctx_active = false;
hdev->asic_funcs->state_dump_init(hdev);
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index eec96e506bb0..df1935952c28 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -2503,7 +2503,6 @@ struct last_error_session_info {
* @fpriv_list: list of file private data structures. Each structure is created
* when a user opens the device
* @fpriv_list_lock: protects the fpriv_list
- * @compute_ctx: current compute context executing.
* @aggregated_cs_counters: aggregated cs counters among all contexts
* @mmu_priv: device-specific MMU data.
* @mmu_func: device-related MMU functions.
@@ -2601,6 +2600,7 @@ struct last_error_session_info {
* cases where Linux was not loaded to device CPU
* @supports_wait_for_multi_cs: true if wait for multi CS is supported
* @is_in_soft_reset: Device is currently in soft reset process.
+ * @is_compute_ctx_active: Whether there is an active compute context executing.
*/
struct hl_device {
struct pci_dev *pdev;
@@ -2656,8 +2656,6 @@ struct hl_device {
struct list_head fpriv_list;
struct mutex fpriv_list_lock;
- struct hl_ctx *compute_ctx;
-
struct hl_cs_counters_atomic aggregated_cs_counters;
struct hl_mmu_priv mmu_priv;
@@ -2730,6 +2728,7 @@ struct hl_device {
u8 supports_wait_for_multi_cs;
u8 stream_master_qid_arr_size;
u8 is_in_soft_reset;
+ u8 is_compute_ctx_active;
/* Parameters for bring-up */
u64 nic_ports_mask;
diff --git a/drivers/misc/habanalabs/common/habanalabs_drv.c b/drivers/misc/habanalabs/common/habanalabs_drv.c
index d4ef99952d15..62a02ef43bb7 100644
--- a/drivers/misc/habanalabs/common/habanalabs_drv.c
+++ b/drivers/misc/habanalabs/common/habanalabs_drv.c
@@ -161,7 +161,7 @@ int hl_device_open(struct inode *inode, struct file *filp)
goto out_err;
}
- if (hdev->compute_ctx) {
+ if (hdev->is_compute_ctx_active) {
dev_dbg_ratelimited(hdev->dev,
"Can't open %s because another user is working on it\n",
dev_name(hdev->dev));
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index e54d60e75854..8d0f2cd608fc 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * Copyright 2016-2019 HabanaLabs, Ltd.
+ * Copyright 2016-2021 HabanaLabs, Ltd.
* All Rights Reserved.
*/
@@ -827,7 +827,7 @@ static void goya_set_freq_to_low_job(struct work_struct *work)
mutex_lock(&hdev->fpriv_list_lock);
- if (!hdev->compute_ctx)
+ if (!hdev->is_compute_ctx_active)
goya_set_frequency(hdev, PLL_LOW);
mutex_unlock(&hdev->fpriv_list_lock);
diff --git a/drivers/misc/habanalabs/goya/goya_hwmgr.c b/drivers/misc/habanalabs/goya/goya_hwmgr.c
index 42985a85b625..76b47749affe 100644
--- a/drivers/misc/habanalabs/goya/goya_hwmgr.c
+++ b/drivers/misc/habanalabs/goya/goya_hwmgr.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * Copyright 2016-2019 HabanaLabs, Ltd.
+ * Copyright 2016-2021 HabanaLabs, Ltd.
* All Rights Reserved.
*/
@@ -258,7 +258,7 @@ static ssize_t pm_mng_profile_store(struct device *dev,
mutex_lock(&hdev->fpriv_list_lock);
- if (hdev->compute_ctx) {
+ if (hdev->is_compute_ctx_active) {
dev_err(hdev->dev,
"Can't change PM profile while compute context is opened on the device\n");
count = -EPERM;
--
2.25.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 9/9] habanalabs: wait again for multi-CS if no CS completed
2021-12-05 15:42 [PATCH 1/9] habanalabs: pass reset flags to reset thread Oded Gabbay
` (6 preceding siblings ...)
2021-12-05 15:42 ` [PATCH 8/9] habanalabs: remove compute context pointer Oded Gabbay
@ 2021-12-05 15:42 ` Oded Gabbay
7 siblings, 0 replies; 9+ messages in thread
From: Oded Gabbay @ 2021-12-05 15:42 UTC (permalink / raw)
To: linux-kernel; +Cc: Ohad Sharabi, Dani Liberman
From: Ohad Sharabi <osharabi@habana.ai>
The original multi-CS design assumption that stream masters are used
exclusively (i.e. multi-CS with set of stream master QIDs will not get
completed by CS not from the multi-CS set) is inaccurate.
Thus multi-CS behavior is now modified not to treat such case as an
error.
Instead, if we have multi-CS completion but we detect that no CS from
the list is actually completed we will do another multi-CS wait (with
modified timeout).
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
.../habanalabs/common/command_submission.c | 97 +++++++++----------
drivers/misc/habanalabs/common/habanalabs.h | 4 +-
2 files changed, 50 insertions(+), 51 deletions(-)
diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index a63ebbc04787..f58fff3671d6 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -545,13 +545,6 @@ static void complete_multi_cs(struct hl_device *hdev, struct hl_cs *cs)
* mcs fences.
*/
fence->mcs_handling_done = true;
- /*
- * Since CS (and its related fence) can be associated with only one
- * multi CS context, once it triggered multi CS completion no need to
- * continue checking other multi CS contexts.
- */
- spin_unlock(&mcs_compl->lock);
- break;
}
spin_unlock(&mcs_compl->lock);
@@ -2498,6 +2491,21 @@ static int _hl_cs_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
return rc;
}
+static inline unsigned long hl_usecs64_to_jiffies(const u64 usecs)
+{
+ if (usecs <= U32_MAX)
+ return usecs_to_jiffies(usecs);
+
+ /*
+ * If the value in nanoseconds is larger than 64 bit, use the largest
+ * 64 bit value.
+ */
+ if (usecs >= ((u64)(U64_MAX / NSEC_PER_USEC)))
+ return nsecs_to_jiffies(U64_MAX);
+
+ return nsecs_to_jiffies(usecs * NSEC_PER_USEC);
+}
+
/*
* hl_wait_multi_cs_completion_init - init completion structure
*
@@ -2534,8 +2542,7 @@ static struct multi_cs_completion *hl_wait_multi_cs_completion_init(
}
if (i == MULTI_CS_MAX_USER_CTX) {
- dev_err(hdev->dev,
- "no available multi-CS completion structure\n");
+ dev_err(hdev->dev, "no available multi-CS completion structure\n");
return ERR_PTR(-ENOMEM);
}
return mcs_compl;
@@ -2566,27 +2573,18 @@ static void hl_wait_multi_cs_completion_fini(
*
* @return 0 on success, otherwise non 0 error code
*/
-static int hl_wait_multi_cs_completion(struct multi_cs_data *mcs_data)
+static int hl_wait_multi_cs_completion(struct multi_cs_data *mcs_data,
+ struct multi_cs_completion *mcs_compl)
{
- struct hl_device *hdev = mcs_data->ctx->hdev;
- struct multi_cs_completion *mcs_compl;
long completion_rc;
- mcs_compl = hl_wait_multi_cs_completion_init(hdev,
- mcs_data->stream_master_qid_map);
- if (IS_ERR(mcs_compl))
- return PTR_ERR(mcs_compl);
-
- completion_rc = wait_for_completion_interruptible_timeout(
- &mcs_compl->completion,
- usecs_to_jiffies(mcs_data->timeout_us));
+ completion_rc = wait_for_completion_interruptible_timeout(&mcs_compl->completion,
+ mcs_data->timeout_jiffies);
/* update timestamp */
if (completion_rc > 0)
mcs_data->timestamp = mcs_compl->timestamp;
- hl_wait_multi_cs_completion_fini(mcs_compl);
-
mcs_data->wait_status = completion_rc;
return 0;
@@ -2619,6 +2617,7 @@ void hl_multi_cs_completion_init(struct hl_device *hdev)
*/
static int hl_multi_cs_wait_ioctl(struct hl_fpriv *hpriv, void *data)
{
+ struct multi_cs_completion *mcs_compl;
struct hl_device *hdev = hpriv->hdev;
struct multi_cs_data mcs_data = {0};
union hl_wait_cs_args *args = data;
@@ -2686,12 +2685,19 @@ static int hl_multi_cs_wait_ioctl(struct hl_fpriv *hpriv, void *data)
goto put_ctx;
/* wait (with timeout) for the first CS to be completed */
- mcs_data.timeout_us = args->in.timeout_us;
- rc = hl_wait_multi_cs_completion(&mcs_data);
- if (rc)
+ mcs_data.timeout_jiffies = hl_usecs64_to_jiffies(args->in.timeout_us);
+
+ mcs_compl = hl_wait_multi_cs_completion_init(hdev, mcs_data.stream_master_qid_map);
+ if (IS_ERR(mcs_compl)) {
+ rc = PTR_ERR(mcs_compl);
goto put_ctx;
+ }
+
+ while (true) {
+ rc = hl_wait_multi_cs_completion(&mcs_data, mcs_compl);
+ if (rc || (mcs_data.wait_status == 0))
+ break;
- if (mcs_data.wait_status > 0) {
/*
* poll fences once again to update the CS map.
* no timestamp should be updated this time.
@@ -2699,18 +2705,26 @@ static int hl_multi_cs_wait_ioctl(struct hl_fpriv *hpriv, void *data)
mcs_data.update_ts = false;
rc = hl_cs_poll_fences(&mcs_data);
+ if (mcs_data.completion_bitmap)
+ break;
+
/*
* if hl_wait_multi_cs_completion returned before timeout (i.e.
- * it got a completion) we expect to see at least one CS
- * completed after the poll function.
+ * it got a completion) it either got completed by CS in the multi CS list
+ * (in which case the indication will be non empty completion_bitmap) or it
+ * got completed by CS submitted to one of the shared stream master but
+ * not in the multi CS list (in which case we should wait again but reinit
+ * the completion, modify the timeout and set timestamp as zero to let a CS
+ * related to the current multi-CS set a new, relevant, timestamp)
*/
- if (!mcs_data.completion_bitmap) {
- dev_warn_ratelimited(hdev->dev,
- "Multi-CS got completion on wait but no CS completed\n");
- rc = -EFAULT;
- }
+ /* wait again with modified timeout */
+ mcs_data.timeout_jiffies = mcs_data.wait_status;
+ reinit_completion(&mcs_compl->completion);
+ mcs_compl->timestamp = 0;
}
+ hl_wait_multi_cs_completion_fini(mcs_compl);
+
put_ctx:
hl_ctx_put(ctx);
kfree(fence_arr);
@@ -2741,7 +2755,7 @@ static int hl_multi_cs_wait_ioctl(struct hl_fpriv *hpriv, void *data)
}
/* update if some CS was gone */
- if (mcs_data.timestamp)
+ if (!mcs_data.timestamp)
args->out.flags |= HL_WAIT_CS_STATUS_FLAG_GONE;
} else {
args->out.status = HL_WAIT_CS_STATUS_BUSY;
@@ -2807,21 +2821,6 @@ static int hl_cs_wait_ioctl(struct hl_fpriv *hpriv, void *data)
return 0;
}
-static inline unsigned long hl_usecs64_to_jiffies(const u64 usecs)
-{
- if (usecs <= U32_MAX)
- return usecs_to_jiffies(usecs);
-
- /*
- * If the value in nanoseconds is larger than 64 bit, use the largest
- * 64 bit value.
- */
- if (usecs >= ((u64)(U64_MAX / NSEC_PER_USEC)))
- return nsecs_to_jiffies(U64_MAX);
-
- return nsecs_to_jiffies(usecs * NSEC_PER_USEC);
-}
-
static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
u64 timeout_us, u64 user_address,
u64 target_value, struct hl_user_interrupt *interrupt,
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index df1935952c28..eda1c70f6966 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -2362,7 +2362,7 @@ struct multi_cs_completion {
* @ctx: pointer to the context structure
* @fence_arr: array of fences of all CSs
* @seq_arr: array of CS sequence numbers
- * @timeout_us: timeout in usec for waiting for CS to complete
+ * @timeout_jiffies: timeout in jiffies for waiting for CS to complete
* @timestamp: timestamp of first completed CS
* @wait_status: wait for CS status
* @completion_bitmap: bitmap of completed CSs (1- completed, otherwise 0)
@@ -2376,7 +2376,7 @@ struct multi_cs_data {
struct hl_ctx *ctx;
struct hl_fence **fence_arr;
u64 *seq_arr;
- s64 timeout_us;
+ s64 timeout_jiffies;
s64 timestamp;
long wait_status;
u32 completion_bitmap;
--
2.25.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2021-12-05 15:43 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-12-05 15:42 [PATCH 1/9] habanalabs: pass reset flags to reset thread Oded Gabbay
2021-12-05 15:42 ` [PATCH 2/9] habanalabs: add missing kernel-doc comments for hl_device fields Oded Gabbay
2021-12-05 15:42 ` [PATCH 3/9] habanalabs: free signal handle on failure Oded Gabbay
2021-12-05 15:42 ` [PATCH 4/9] habanalabs: remove redundant check on ctx_fini Oded Gabbay
2021-12-05 15:42 ` [PATCH 5/9] habanalabs: save ctx inside encaps signal Oded Gabbay
2021-12-05 15:42 ` [PATCH 6/9] habanalabs: fix etr asid configuration Oded Gabbay
2021-12-05 15:42 ` [PATCH 7/9] habanalabs: add helper to get compute context Oded Gabbay
2021-12-05 15:42 ` [PATCH 8/9] habanalabs: remove compute context pointer Oded Gabbay
2021-12-05 15:42 ` [PATCH 9/9] habanalabs: wait again for multi-CS if no CS completed Oded Gabbay
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox