* [PATCH for-rc 0/9] RDMA/hns: Bugfixes
@ 2024-07-05 8:59 Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 1/9] RDMA/hns: Check atomic wr length Junxian Huang
` (8 more replies)
0 siblings, 9 replies; 33+ messages in thread
From: Junxian Huang @ 2024-07-05 8:59 UTC (permalink / raw)
To: jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel, huangjunxian6
Here are some bugfixes for hns driver.
Chengchang Tang (5):
RDMA/hns: Fix missing pagesize and alignment check in FRMR
RDMA/hns: Fix shift-out-bounds when max_inline_data is 0
RDMA/hns: Fix undifined behavior caused by invalid max_sge
RDMA/hns: Fix insufficient extend DB for VFs.
RDMA/hns: Fix mbx timing out before CMD execution is completed
Junxian Huang (3):
RDMA/hns: Check atomic wr length
RDMA/hns: Fix soft lockup under heavy CEQE load
RDMA/hns: Fix unmatch exception handling when init eq table fails
wenglianfa (1):
RDMA/hns: Fix a long wait for cmdq event during reset
drivers/infiniband/hw/hns/hns_roce_device.h | 7 +
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 191 +++++++++++++-------
drivers/infiniband/hw/hns/hns_roce_hw_v2.h | 6 +
drivers/infiniband/hw/hns/hns_roce_mr.c | 5 +
drivers/infiniband/hw/hns/hns_roce_qp.c | 4 +-
drivers/infiniband/hw/hns/hns_roce_srq.c | 2 +-
6 files changed, 152 insertions(+), 63 deletions(-)
--
2.33.0
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH for-rc 1/9] RDMA/hns: Check atomic wr length
2024-07-05 8:59 [PATCH for-rc 0/9] RDMA/hns: Bugfixes Junxian Huang
@ 2024-07-05 8:59 ` Junxian Huang
2024-07-07 8:24 ` Leon Romanovsky
2024-07-05 8:59 ` [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset Junxian Huang
` (7 subsequent siblings)
8 siblings, 1 reply; 33+ messages in thread
From: Junxian Huang @ 2024-07-05 8:59 UTC (permalink / raw)
To: jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel, huangjunxian6
8 bytes is the only supported length of atomic. Return an error if
it is not.
Fixes: 384f88185112 ("RDMA/hns: Add atomic support")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
---
drivers/infiniband/hw/hns/hns_roce_device.h | 2 ++
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 19 +++++++++++++++----
2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index ff0b3f68ee3a..05005079258c 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -91,6 +91,8 @@
/* Configure to HW for PAGE_SIZE larger than 4KB */
#define PG_SHIFT_OFFSET (PAGE_SHIFT - 12)
+#define ATOMIC_WR_LEN 8
+
#define HNS_ROCE_IDX_QUE_ENTRY_SZ 4
#define SRQ_DB_REG 0x230
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 4287818a737f..a5d746a5cc68 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -164,15 +164,23 @@ static void set_frmr_seg(struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
hr_reg_clear(fseg, FRMR_BLK_MODE);
}
-static void set_atomic_seg(const struct ib_send_wr *wr,
- struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
- unsigned int valid_num_sge)
+static int set_atomic_seg(struct hns_roce_dev *hr_dev,
+ const struct ib_send_wr *wr,
+ struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
+ unsigned int valid_num_sge, u32 msg_len)
{
struct hns_roce_v2_wqe_data_seg *dseg =
(void *)rc_sq_wqe + sizeof(struct hns_roce_v2_rc_send_wqe);
struct hns_roce_wqe_atomic_seg *aseg =
(void *)dseg + sizeof(struct hns_roce_v2_wqe_data_seg);
+ if (msg_len != ATOMIC_WR_LEN) {
+ ibdev_err_ratelimited(&hr_dev->ib_dev,
+ "invalid atomic wr len, len = %u.\n",
+ msg_len);
+ return -EINVAL;
+ }
+
set_data_seg_v2(dseg, wr->sg_list);
if (wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP) {
@@ -185,6 +193,8 @@ static void set_atomic_seg(const struct ib_send_wr *wr,
}
hr_reg_write(rc_sq_wqe, RC_SEND_WQE_SGE_NUM, valid_num_sge);
+
+ return 0;
}
static int fill_ext_sge_inl_data(struct hns_roce_qp *qp,
@@ -592,7 +602,8 @@ static inline int set_rc_wqe(struct hns_roce_qp *qp,
if (wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
wr->opcode == IB_WR_ATOMIC_FETCH_AND_ADD)
- set_atomic_seg(wr, rc_sq_wqe, valid_num_sge);
+ ret = set_atomic_seg(hr_dev, wr, rc_sq_wqe, valid_num_sge,
+ msg_len);
else if (wr->opcode != IB_WR_REG_MR)
ret = set_rwqe_data_seg(&qp->ibqp, wr, rc_sq_wqe,
&curr_idx, valid_num_sge);
--
2.33.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-05 8:59 [PATCH for-rc 0/9] RDMA/hns: Bugfixes Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 1/9] RDMA/hns: Check atomic wr length Junxian Huang
@ 2024-07-05 8:59 ` Junxian Huang
2024-07-07 8:30 ` Leon Romanovsky
2024-07-05 8:59 ` [PATCH for-rc 3/9] RDMA/hns: Fix soft lockup under heavy CEQE load Junxian Huang
` (6 subsequent siblings)
8 siblings, 1 reply; 33+ messages in thread
From: Junxian Huang @ 2024-07-05 8:59 UTC (permalink / raw)
To: jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel, huangjunxian6
From: wenglianfa <wenglianfa@huawei.com>
During reset, cmdq events won't be reported, leading to a long and
unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
of reset.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: wenglianfa <wenglianfa@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
---
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index a5d746a5cc68..ff135df1a761 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
}
+
+static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
+{
+ struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
+ int i;
+
+ if (!hr_dev->cmd_mod)
+ return;
+
+ for (i = 0; i < hr_cmd->max_cmds; i++) {
+ hr_cmd->context[i].result = -EBUSY;
+ complete(&hr_cmd->context[i].done);
+ }
+}
+
static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
{
struct hns_roce_dev *hr_dev;
@@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
hr_dev->dis_db = true;
hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
+ /* Complete the CMDQ event in advance during the reset. */
+ hns_roce_v2_reset_notify_cmd(hr_dev);
+
return 0;
}
--
2.33.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH for-rc 3/9] RDMA/hns: Fix soft lockup under heavy CEQE load
2024-07-05 8:59 [PATCH for-rc 0/9] RDMA/hns: Bugfixes Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 1/9] RDMA/hns: Check atomic wr length Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset Junxian Huang
@ 2024-07-05 8:59 ` Junxian Huang
2024-07-05 10:47 ` Zhu Yanjun
2024-07-05 8:59 ` [PATCH for-rc 4/9] RDMA/hns: Fix unmatch exception handling when init eq table fails Junxian Huang
` (5 subsequent siblings)
8 siblings, 1 reply; 33+ messages in thread
From: Junxian Huang @ 2024-07-05 8:59 UTC (permalink / raw)
To: jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel, huangjunxian6
CEQEs are handled in interrupt handler currently. This may cause the
CPU core staying in interrupt context too long and lead to soft lockup
under heavy load.
Handle CEQEs in tasklet and set an upper limit for the number of CEQE
handled by a single call of tasklet.
Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
---
drivers/infiniband/hw/hns/hns_roce_device.h | 1 +
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 88 ++++++++++++---------
2 files changed, 53 insertions(+), 36 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 05005079258c..5a2445f357ab 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -717,6 +717,7 @@ struct hns_roce_eq {
int shift;
int event_type;
int sub_type;
+ struct tasklet_struct tasklet;
};
struct hns_roce_eq_table {
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index ff135df1a761..f73de06a3ca5 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -6146,33 +6146,11 @@ static struct hns_roce_ceqe *next_ceqe_sw_v2(struct hns_roce_eq *eq)
!!(eq->cons_index & eq->entries)) ? ceqe : NULL;
}
-static irqreturn_t hns_roce_v2_ceq_int(struct hns_roce_dev *hr_dev,
- struct hns_roce_eq *eq)
+static irqreturn_t hns_roce_v2_ceq_int(struct hns_roce_eq *eq)
{
- struct hns_roce_ceqe *ceqe = next_ceqe_sw_v2(eq);
- irqreturn_t ceqe_found = IRQ_NONE;
- u32 cqn;
-
- while (ceqe) {
- /* Make sure we read CEQ entry after we have checked the
- * ownership bit
- */
- dma_rmb();
-
- cqn = hr_reg_read(ceqe, CEQE_CQN);
-
- hns_roce_cq_completion(hr_dev, cqn);
-
- ++eq->cons_index;
- ceqe_found = IRQ_HANDLED;
- atomic64_inc(&hr_dev->dfx_cnt[HNS_ROCE_DFX_CEQE_CNT]);
-
- ceqe = next_ceqe_sw_v2(eq);
- }
+ tasklet_schedule(&eq->tasklet);
- update_eq_db(eq);
-
- return IRQ_RETVAL(ceqe_found);
+ return IRQ_HANDLED;
}
static irqreturn_t hns_roce_v2_msix_interrupt_eq(int irq, void *eq_ptr)
@@ -6183,7 +6161,7 @@ static irqreturn_t hns_roce_v2_msix_interrupt_eq(int irq, void *eq_ptr)
if (eq->type_flag == HNS_ROCE_CEQ)
/* Completion event interrupt */
- int_work = hns_roce_v2_ceq_int(hr_dev, eq);
+ int_work = hns_roce_v2_ceq_int(eq);
else
/* Asynchronous event interrupt */
int_work = hns_roce_v2_aeq_int(hr_dev, eq);
@@ -6551,6 +6529,34 @@ static int hns_roce_v2_create_eq(struct hns_roce_dev *hr_dev,
return ret;
}
+static void hns_roce_ceq_task(struct tasklet_struct *task)
+{
+ struct hns_roce_eq *eq = from_tasklet(eq, task, tasklet);
+ struct hns_roce_ceqe *ceqe = next_ceqe_sw_v2(eq);
+ struct hns_roce_dev *hr_dev = eq->hr_dev;
+ int ceqe_num = 0;
+ u32 cqn;
+
+ while (ceqe && ceqe_num < hr_dev->caps.ceqe_depth) {
+ /* Make sure we read CEQ entry after we have checked the
+ * ownership bit
+ */
+ dma_rmb();
+
+ cqn = hr_reg_read(ceqe, CEQE_CQN);
+
+ hns_roce_cq_completion(hr_dev, cqn);
+
+ ++eq->cons_index;
+ ++ceqe_num;
+ atomic64_inc(&hr_dev->dfx_cnt[HNS_ROCE_DFX_CEQE_CNT]);
+
+ ceqe = next_ceqe_sw_v2(eq);
+ }
+
+ update_eq_db(eq);
+}
+
static int __hns_roce_request_irq(struct hns_roce_dev *hr_dev, int irq_num,
int comp_num, int aeq_num, int other_num)
{
@@ -6582,21 +6588,24 @@ static int __hns_roce_request_irq(struct hns_roce_dev *hr_dev, int irq_num,
j - other_num - aeq_num);
for (j = 0; j < irq_num; j++) {
- if (j < other_num)
+ if (j < other_num) {
ret = request_irq(hr_dev->irq[j],
hns_roce_v2_msix_interrupt_abn,
0, hr_dev->irq_names[j], hr_dev);
-
- else if (j < (other_num + comp_num))
+ } else if (j < (other_num + comp_num)) {
+ tasklet_setup(&eq_table->eq[j - other_num].tasklet,
+ hns_roce_ceq_task);
ret = request_irq(eq_table->eq[j - other_num].irq,
hns_roce_v2_msix_interrupt_eq,
0, hr_dev->irq_names[j + aeq_num],
&eq_table->eq[j - other_num]);
- else
+ } else {
ret = request_irq(eq_table->eq[j - other_num].irq,
hns_roce_v2_msix_interrupt_eq,
0, hr_dev->irq_names[j - comp_num],
&eq_table->eq[j - other_num]);
+ }
+
if (ret) {
dev_err(hr_dev->dev, "request irq error!\n");
goto err_request_failed;
@@ -6606,12 +6615,16 @@ static int __hns_roce_request_irq(struct hns_roce_dev *hr_dev, int irq_num,
return 0;
err_request_failed:
- for (j -= 1; j >= 0; j--)
- if (j < other_num)
+ for (j -= 1; j >= 0; j--) {
+ if (j < other_num) {
free_irq(hr_dev->irq[j], hr_dev);
- else
- free_irq(eq_table->eq[j - other_num].irq,
- &eq_table->eq[j - other_num]);
+ continue;
+ }
+ free_irq(eq_table->eq[j - other_num].irq,
+ &eq_table->eq[j - other_num]);
+ if (j < other_num + comp_num)
+ tasklet_kill(&eq_table->eq[j - other_num].tasklet);
+ }
err_kzalloc_failed:
for (i -= 1; i >= 0; i--)
@@ -6632,8 +6645,11 @@ static void __hns_roce_free_irq(struct hns_roce_dev *hr_dev)
for (i = 0; i < hr_dev->caps.num_other_vectors; i++)
free_irq(hr_dev->irq[i], hr_dev);
- for (i = 0; i < eq_num; i++)
+ for (i = 0; i < eq_num; i++) {
free_irq(hr_dev->eq_table.eq[i].irq, &hr_dev->eq_table.eq[i]);
+ if (i < hr_dev->caps.num_comp_vectors)
+ tasklet_kill(&hr_dev->eq_table.eq[i].tasklet);
+ }
for (i = 0; i < irq_num; i++)
kfree(hr_dev->irq_names[i]);
--
2.33.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH for-rc 4/9] RDMA/hns: Fix unmatch exception handling when init eq table fails
2024-07-05 8:59 [PATCH for-rc 0/9] RDMA/hns: Bugfixes Junxian Huang
` (2 preceding siblings ...)
2024-07-05 8:59 ` [PATCH for-rc 3/9] RDMA/hns: Fix soft lockup under heavy CEQE load Junxian Huang
@ 2024-07-05 8:59 ` Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 5/9] RDMA/hns: Fix missing pagesize and alignment check in FRMR Junxian Huang
` (4 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Junxian Huang @ 2024-07-05 8:59 UTC (permalink / raw)
To: jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel, huangjunxian6
The hw ctx should be destroyed when init eq table fails.
Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
---
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 25 +++++++++++-----------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index f73de06a3ca5..6cdeafef800a 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -6373,9 +6373,16 @@ static void hns_roce_v2_int_mask_enable(struct hns_roce_dev *hr_dev,
roce_write(hr_dev, ROCEE_VF_ABN_INT_CFG_REG, enable_flag);
}
-static void hns_roce_v2_destroy_eqc(struct hns_roce_dev *hr_dev, u32 eqn)
+static void free_eq_buf(struct hns_roce_dev *hr_dev, struct hns_roce_eq *eq)
+{
+ hns_roce_mtr_destroy(hr_dev, &eq->mtr);
+}
+
+static void hns_roce_v2_destroy_eqc(struct hns_roce_dev *hr_dev,
+ struct hns_roce_eq *eq)
{
struct device *dev = hr_dev->dev;
+ int eqn = eq->eqn;
int ret;
u8 cmd;
@@ -6386,12 +6393,9 @@ static void hns_roce_v2_destroy_eqc(struct hns_roce_dev *hr_dev, u32 eqn)
ret = hns_roce_destroy_hw_ctx(hr_dev, cmd, eqn & HNS_ROCE_V2_EQN_M);
if (ret)
- dev_err(dev, "[mailbox cmd] destroy eqc(%u) failed.\n", eqn);
-}
+ dev_err(dev, "[mailbox cmd] destroy eqc(%d) failed.\n", eqn);
-static void free_eq_buf(struct hns_roce_dev *hr_dev, struct hns_roce_eq *eq)
-{
- hns_roce_mtr_destroy(hr_dev, &eq->mtr);
+ free_eq_buf(hr_dev, eq);
}
static void init_eq_config(struct hns_roce_dev *hr_dev, struct hns_roce_eq *eq)
@@ -6738,7 +6742,7 @@ static int hns_roce_v2_init_eq_table(struct hns_roce_dev *hr_dev)
err_create_eq_fail:
for (i -= 1; i >= 0; i--)
- free_eq_buf(hr_dev, &eq_table->eq[i]);
+ hns_roce_v2_destroy_eqc(hr_dev, &eq_table->eq[i]);
kfree(eq_table->eq);
return ret;
@@ -6758,11 +6762,8 @@ static void hns_roce_v2_cleanup_eq_table(struct hns_roce_dev *hr_dev)
__hns_roce_free_irq(hr_dev);
destroy_workqueue(hr_dev->irq_workq);
- for (i = 0; i < eq_num; i++) {
- hns_roce_v2_destroy_eqc(hr_dev, i);
-
- free_eq_buf(hr_dev, &eq_table->eq[i]);
- }
+ for (i = 0; i < eq_num; i++)
+ hns_roce_v2_destroy_eqc(hr_dev, &eq_table->eq[i]);
kfree(eq_table->eq);
}
--
2.33.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH for-rc 5/9] RDMA/hns: Fix missing pagesize and alignment check in FRMR
2024-07-05 8:59 [PATCH for-rc 0/9] RDMA/hns: Bugfixes Junxian Huang
` (3 preceding siblings ...)
2024-07-05 8:59 ` [PATCH for-rc 4/9] RDMA/hns: Fix unmatch exception handling when init eq table fails Junxian Huang
@ 2024-07-05 8:59 ` Junxian Huang
2024-07-07 9:16 ` Zhu Yanjun
2024-07-05 8:59 ` [PATCH for-rc 6/9] RDMA/hns: Fix shift-out-bounds when max_inline_data is 0 Junxian Huang
` (3 subsequent siblings)
8 siblings, 1 reply; 33+ messages in thread
From: Junxian Huang @ 2024-07-05 8:59 UTC (permalink / raw)
To: jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel, huangjunxian6
From: Chengchang Tang <tangchengchang@huawei.com>
The offset requires 128B alignment and the page size ranges from
4K to 128M.
Fixes: 68a997c5d28c ("RDMA/hns: Add FRMR support for hip08")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
---
drivers/infiniband/hw/hns/hns_roce_device.h | 4 ++++
drivers/infiniband/hw/hns/hns_roce_mr.c | 5 +++++
2 files changed, 9 insertions(+)
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 5a2445f357ab..15b3b978a601 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -83,6 +83,7 @@
#define MR_TYPE_DMA 0x03
#define HNS_ROCE_FRMR_MAX_PA 512
+#define HNS_ROCE_FRMR_ALIGN_SIZE 128
#define PKEY_ID 0xffff
#define NODE_DESC_SIZE 64
@@ -189,6 +190,9 @@ enum {
#define HNS_HW_PAGE_SHIFT 12
#define HNS_HW_PAGE_SIZE (1 << HNS_HW_PAGE_SHIFT)
+#define HNS_HW_MAX_PAGE_SHIFT 27
+#define HNS_HW_MAX_PAGE_SIZE (1 << HNS_HW_MAX_PAGE_SHIFT)
+
struct hns_roce_uar {
u64 pfn;
unsigned long index;
diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
index 1a61dceb3319..846da8c78b8b 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -443,6 +443,11 @@ int hns_roce_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
struct hns_roce_mtr *mtr = &mr->pbl_mtr;
int ret, sg_num = 0;
+ if (!IS_ALIGNED(*sg_offset, HNS_ROCE_FRMR_ALIGN_SIZE) ||
+ ibmr->page_size < HNS_HW_PAGE_SIZE ||
+ ibmr->page_size > HNS_HW_MAX_PAGE_SIZE)
+ return sg_num;
+
mr->npages = 0;
mr->page_list = kvcalloc(mr->pbl_mtr.hem_cfg.buf_pg_count,
sizeof(dma_addr_t), GFP_KERNEL);
--
2.33.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH for-rc 6/9] RDMA/hns: Fix shift-out-bounds when max_inline_data is 0
2024-07-05 8:59 [PATCH for-rc 0/9] RDMA/hns: Bugfixes Junxian Huang
` (4 preceding siblings ...)
2024-07-05 8:59 ` [PATCH for-rc 5/9] RDMA/hns: Fix missing pagesize and alignment check in FRMR Junxian Huang
@ 2024-07-05 8:59 ` Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 7/9] RDMA/hns: Fix undifined behavior caused by invalid max_sge Junxian Huang
` (2 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Junxian Huang @ 2024-07-05 8:59 UTC (permalink / raw)
To: jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel, huangjunxian6
From: Chengchang Tang <tangchengchang@huawei.com>
A shift-out-bounds may occur, if the max_inline_data has not been set.
The related log:
UBSAN: shift-out-of-bounds in kernel/include/linux/log2.h:57:13
shift exponent 64 is too large for 64-bit type 'long unsigned int'
Call trace:
dump_backtrace+0xb0/0x118
show_stack+0x20/0x38
dump_stack_lvl+0xbc/0x120
dump_stack+0x1c/0x28
__ubsan_handle_shift_out_of_bounds+0x104/0x240
set_ext_sge_param+0x40c/0x420 [hns_roce_hw_v2]
hns_roce_create_qp+0xf48/0x1c40 [hns_roce_hw_v2]
create_qp.part.0+0x294/0x3c0
ib_create_qp_kernel+0x7c/0x150
create_mad_qp+0x11c/0x1e0
ib_mad_init_device+0x834/0xc88
add_client_context+0x248/0x318
enable_device_and_get+0x158/0x280
ib_register_device+0x4ac/0x610
hns_roce_init+0x890/0xf98 [hns_roce_hw_v2]
__hns_roce_hw_v2_init_instance+0x398/0x720 [hns_roce_hw_v2]
hns_roce_hw_v2_init_instance+0x108/0x1e0 [hns_roce_hw_v2]
hclge_init_roce_client_instance+0x1a0/0x358 [hclge]
hclge_init_client_instance+0xa0/0x508 [hclge]
hnae3_register_client+0x18c/0x210 [hnae3]
hns_roce_hw_v2_init+0x28/0xff8 [hns_roce_hw_v2]
do_one_initcall+0xe0/0x510
do_init_module+0x110/0x370
load_module+0x2c6c/0x2f20
init_module_from_file+0xe0/0x140
idempotent_init_module+0x24c/0x350
__arm64_sys_finit_module+0x88/0xf8
invoke_syscall+0x68/0x1a0
el0_svc_common.constprop.0+0x11c/0x150
do_el0_svc+0x38/0x50
el0_svc+0x50/0xa0
el0t_64_sync_handler+0xc0/0xc8
el0t_64_sync+0x1a4/0x1a8
Fixes: 0c5e259b06a8 ("RDMA/hns: Fix incorrect sge nums calculation")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
---
drivers/infiniband/hw/hns/hns_roce_qp.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
index db34665d1dfb..1de384ce4d0e 100644
--- a/drivers/infiniband/hw/hns/hns_roce_qp.c
+++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
@@ -532,13 +532,15 @@ static unsigned int get_sge_num_from_max_inl_data(bool is_ud_or_gsi,
{
unsigned int inline_sge;
- inline_sge = roundup_pow_of_two(max_inline_data) / HNS_ROCE_SGE_SIZE;
+ if (!max_inline_data)
+ return 0;
/*
* if max_inline_data less than
* HNS_ROCE_SGE_IN_WQE * HNS_ROCE_SGE_SIZE,
* In addition to ud's mode, no need to extend sge.
*/
+ inline_sge = roundup_pow_of_two(max_inline_data) / HNS_ROCE_SGE_SIZE;
if (!is_ud_or_gsi && inline_sge <= HNS_ROCE_SGE_IN_WQE)
inline_sge = 0;
--
2.33.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH for-rc 7/9] RDMA/hns: Fix undifined behavior caused by invalid max_sge
2024-07-05 8:59 [PATCH for-rc 0/9] RDMA/hns: Bugfixes Junxian Huang
` (5 preceding siblings ...)
2024-07-05 8:59 ` [PATCH for-rc 6/9] RDMA/hns: Fix shift-out-bounds when max_inline_data is 0 Junxian Huang
@ 2024-07-05 8:59 ` Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 8/9] RDMA/hns: Fix insufficient extend DB for VFs Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 9/9] RDMA/hns: Fix mbx timing out before CMD execution is completed Junxian Huang
8 siblings, 0 replies; 33+ messages in thread
From: Junxian Huang @ 2024-07-05 8:59 UTC (permalink / raw)
To: jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel, huangjunxian6
From: Chengchang Tang <tangchengchang@huawei.com>
If max_sge has been set to 0, roundup_pow_of_two() in
set_srq_basic_param() may have undefined behavior.
Fixes: 9dd052474a26 ("RDMA/hns: Allocate one more recv SGE for HIP08")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
---
drivers/infiniband/hw/hns/hns_roce_srq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_srq.c b/drivers/infiniband/hw/hns/hns_roce_srq.c
index f1997abc97ca..c9b8233f4b05 100644
--- a/drivers/infiniband/hw/hns/hns_roce_srq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_srq.c
@@ -297,7 +297,7 @@ static int set_srq_basic_param(struct hns_roce_srq *srq,
max_sge = proc_srq_sge(hr_dev, srq, !!udata);
if (attr->max_wr > hr_dev->caps.max_srq_wrs ||
- attr->max_sge > max_sge) {
+ attr->max_sge > max_sge || !attr->max_sge) {
ibdev_err(&hr_dev->ib_dev,
"invalid SRQ attr, depth = %u, sge = %u.\n",
attr->max_wr, attr->max_sge);
--
2.33.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH for-rc 8/9] RDMA/hns: Fix insufficient extend DB for VFs.
2024-07-05 8:59 [PATCH for-rc 0/9] RDMA/hns: Bugfixes Junxian Huang
` (6 preceding siblings ...)
2024-07-05 8:59 ` [PATCH for-rc 7/9] RDMA/hns: Fix undifined behavior caused by invalid max_sge Junxian Huang
@ 2024-07-05 8:59 ` Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 9/9] RDMA/hns: Fix mbx timing out before CMD execution is completed Junxian Huang
8 siblings, 0 replies; 33+ messages in thread
From: Junxian Huang @ 2024-07-05 8:59 UTC (permalink / raw)
To: jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel, huangjunxian6
From: Chengchang Tang <tangchengchang@huawei.com>
VFs and its PF will share the memory of the extend DB. Currently,
the number of extend DB allocated by driver is only enough for PF.
This leads to a probability of DB loss and some other problems in
scenarios where both PF and VFs use a large number of QPs.
Fixes: 6b63597d3540 ("RDMA/hns: Add TSQ link table support")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
---
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 6cdeafef800a..1d72c8d47ae7 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -2468,14 +2468,16 @@ static int set_llm_cfg_to_hw(struct hns_roce_dev *hr_dev,
static struct hns_roce_link_table *
alloc_link_table_buf(struct hns_roce_dev *hr_dev)
{
+ u16 total_sl = hr_dev->caps.sl_num * hr_dev->func_num;
struct hns_roce_v2_priv *priv = hr_dev->priv;
struct hns_roce_link_table *link_tbl;
u32 pg_shift, size, min_size;
link_tbl = &priv->ext_llm;
pg_shift = hr_dev->caps.llm_buf_pg_sz + PAGE_SHIFT;
- size = hr_dev->caps.num_qps * HNS_ROCE_V2_EXT_LLM_ENTRY_SZ;
- min_size = HNS_ROCE_EXT_LLM_MIN_PAGES(hr_dev->caps.sl_num) << pg_shift;
+ size = hr_dev->caps.num_qps * hr_dev->func_num *
+ HNS_ROCE_V2_EXT_LLM_ENTRY_SZ;
+ min_size = HNS_ROCE_EXT_LLM_MIN_PAGES(total_sl) << pg_shift;
/* Alloc data table */
size = max(size, min_size);
--
2.33.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH for-rc 9/9] RDMA/hns: Fix mbx timing out before CMD execution is completed
2024-07-05 8:59 [PATCH for-rc 0/9] RDMA/hns: Bugfixes Junxian Huang
` (7 preceding siblings ...)
2024-07-05 8:59 ` [PATCH for-rc 8/9] RDMA/hns: Fix insufficient extend DB for VFs Junxian Huang
@ 2024-07-05 8:59 ` Junxian Huang
8 siblings, 0 replies; 33+ messages in thread
From: Junxian Huang @ 2024-07-05 8:59 UTC (permalink / raw)
To: jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel, huangjunxian6
From: Chengchang Tang <tangchengchang@huawei.com>
When a large number of tasks are issued, the speed of HW processing
mbx will slow down. The standard for judging mbx timeout in the current
firmware is 30ms, and the current timeout standard for the driver is also
30ms.
Considering that firmware scheduling in multi-function scenarios takes a
certain amount of time, this will cause the driver to time out too early
and report a failure before mbx execution times out.
This patch introduces a new mechanism that can set different timeouts for
different cmds and extends the timeout of mbx to 35ms.
Fixes: a04ff739f2a9 ("RDMA/hns: Add command queue support for hip08 RoCE driver")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
---
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 35 +++++++++++++++++-----
drivers/infiniband/hw/hns/hns_roce_hw_v2.h | 6 ++++
2 files changed, 34 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 1d72c8d47ae7..0ed6ec2232dc 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -1280,12 +1280,38 @@ static int hns_roce_cmd_err_convert_errno(u16 desc_ret)
return -EIO;
}
+static u32 hns_roce_cmdq_tx_timeout(u16 opcode, u32 tx_timeout)
+{
+ static const struct hns_roce_cmdq_tx_timeout_map cmdq_tx_timeout[] = {
+ {HNS_ROCE_OPC_POST_MB, HNS_ROCE_OPC_POST_MB_TIMEOUT},
+ };
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(cmdq_tx_timeout); i++)
+ if (cmdq_tx_timeout[i].opcode == opcode)
+ return cmdq_tx_timeout[i].tx_timeout;
+
+ return tx_timeout;
+}
+
+static void hns_roce_wait_csq_done(struct hns_roce_dev *hr_dev, u16 opcode)
+{
+ struct hns_roce_v2_priv *priv = hr_dev->priv;
+ u32 tx_timeout = hns_roce_cmdq_tx_timeout(opcode, priv->cmq.tx_timeout);
+ u32 timeout = 0;
+
+ do {
+ if (hns_roce_cmq_csq_done(hr_dev))
+ break;
+ udelay(1);
+ } while (++timeout < tx_timeout);
+}
+
static int __hns_roce_cmq_send(struct hns_roce_dev *hr_dev,
struct hns_roce_cmq_desc *desc, int num)
{
struct hns_roce_v2_priv *priv = hr_dev->priv;
struct hns_roce_v2_cmq_ring *csq = &priv->cmq.csq;
- u32 timeout = 0;
u16 desc_ret;
u32 tail;
int ret;
@@ -1306,12 +1332,7 @@ static int __hns_roce_cmq_send(struct hns_roce_dev *hr_dev,
atomic64_inc(&hr_dev->dfx_cnt[HNS_ROCE_DFX_CMDS_CNT]);
- do {
- if (hns_roce_cmq_csq_done(hr_dev))
- break;
- udelay(1);
- } while (++timeout < priv->cmq.tx_timeout);
-
+ hns_roce_wait_csq_done(hr_dev, le16_to_cpu(desc->opcode));
if (hns_roce_cmq_csq_done(hr_dev)) {
ret = 0;
for (i = 0; i < num; i++) {
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
index def1d15a03c7..c65f68a14a26 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
@@ -224,6 +224,12 @@ enum hns_roce_opcode_type {
HNS_SWITCH_PARAMETER_CFG = 0x1033,
};
+#define HNS_ROCE_OPC_POST_MB_TIMEOUT 35000
+struct hns_roce_cmdq_tx_timeout_map {
+ u16 opcode;
+ u32 tx_timeout;
+};
+
enum {
TYPE_CRQ,
TYPE_CSQ,
--
2.33.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 3/9] RDMA/hns: Fix soft lockup under heavy CEQE load
2024-07-05 8:59 ` [PATCH for-rc 3/9] RDMA/hns: Fix soft lockup under heavy CEQE load Junxian Huang
@ 2024-07-05 10:47 ` Zhu Yanjun
2024-07-08 2:30 ` Junxian Huang
0 siblings, 1 reply; 33+ messages in thread
From: Zhu Yanjun @ 2024-07-05 10:47 UTC (permalink / raw)
To: Junxian Huang, jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel
在 2024/7/5 16:59, Junxian Huang 写道:
> CEQEs are handled in interrupt handler currently. This may cause the
> CPU core staying in interrupt context too long and lead to soft lockup
> under heavy load.
>
> Handle CEQEs in tasklet and set an upper limit for the number of CEQE
> handled by a single call of tasklet.
https://patchwork.kernel.org/project/linux-rdma/cover/20240621050525.3720069-1-allen.lkml@gmail.com/
In the above link, it seems that tasklet is not good enough. The tasklet
is marked deprecated and has some design flaws. It is being replace BH
workqueue.
So directly use workqueue instead of tasklet?
Zhu Yanjun
>
> Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08")
> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> ---
> drivers/infiniband/hw/hns/hns_roce_device.h | 1 +
> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 88 ++++++++++++---------
> 2 files changed, 53 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
> index 05005079258c..5a2445f357ab 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
> @@ -717,6 +717,7 @@ struct hns_roce_eq {
> int shift;
> int event_type;
> int sub_type;
> + struct tasklet_struct tasklet;
> };
>
> struct hns_roce_eq_table {
> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> index ff135df1a761..f73de06a3ca5 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> @@ -6146,33 +6146,11 @@ static struct hns_roce_ceqe *next_ceqe_sw_v2(struct hns_roce_eq *eq)
> !!(eq->cons_index & eq->entries)) ? ceqe : NULL;
> }
>
> -static irqreturn_t hns_roce_v2_ceq_int(struct hns_roce_dev *hr_dev,
> - struct hns_roce_eq *eq)
> +static irqreturn_t hns_roce_v2_ceq_int(struct hns_roce_eq *eq)
> {
> - struct hns_roce_ceqe *ceqe = next_ceqe_sw_v2(eq);
> - irqreturn_t ceqe_found = IRQ_NONE;
> - u32 cqn;
> -
> - while (ceqe) {
> - /* Make sure we read CEQ entry after we have checked the
> - * ownership bit
> - */
> - dma_rmb();
> -
> - cqn = hr_reg_read(ceqe, CEQE_CQN);
> -
> - hns_roce_cq_completion(hr_dev, cqn);
> -
> - ++eq->cons_index;
> - ceqe_found = IRQ_HANDLED;
> - atomic64_inc(&hr_dev->dfx_cnt[HNS_ROCE_DFX_CEQE_CNT]);
> -
> - ceqe = next_ceqe_sw_v2(eq);
> - }
> + tasklet_schedule(&eq->tasklet);
>
> - update_eq_db(eq);
> -
> - return IRQ_RETVAL(ceqe_found);
> + return IRQ_HANDLED;
> }
>
> static irqreturn_t hns_roce_v2_msix_interrupt_eq(int irq, void *eq_ptr)
> @@ -6183,7 +6161,7 @@ static irqreturn_t hns_roce_v2_msix_interrupt_eq(int irq, void *eq_ptr)
>
> if (eq->type_flag == HNS_ROCE_CEQ)
> /* Completion event interrupt */
> - int_work = hns_roce_v2_ceq_int(hr_dev, eq);
> + int_work = hns_roce_v2_ceq_int(eq);
> else
> /* Asynchronous event interrupt */
> int_work = hns_roce_v2_aeq_int(hr_dev, eq);
> @@ -6551,6 +6529,34 @@ static int hns_roce_v2_create_eq(struct hns_roce_dev *hr_dev,
> return ret;
> }
>
> +static void hns_roce_ceq_task(struct tasklet_struct *task)
> +{
> + struct hns_roce_eq *eq = from_tasklet(eq, task, tasklet);
> + struct hns_roce_ceqe *ceqe = next_ceqe_sw_v2(eq);
> + struct hns_roce_dev *hr_dev = eq->hr_dev;
> + int ceqe_num = 0;
> + u32 cqn;
> +
> + while (ceqe && ceqe_num < hr_dev->caps.ceqe_depth) {
> + /* Make sure we read CEQ entry after we have checked the
> + * ownership bit
> + */
> + dma_rmb();
> +
> + cqn = hr_reg_read(ceqe, CEQE_CQN);
> +
> + hns_roce_cq_completion(hr_dev, cqn);
> +
> + ++eq->cons_index;
> + ++ceqe_num;
> + atomic64_inc(&hr_dev->dfx_cnt[HNS_ROCE_DFX_CEQE_CNT]);
> +
> + ceqe = next_ceqe_sw_v2(eq);
> + }
> +
> + update_eq_db(eq);
> +}
> +
> static int __hns_roce_request_irq(struct hns_roce_dev *hr_dev, int irq_num,
> int comp_num, int aeq_num, int other_num)
> {
> @@ -6582,21 +6588,24 @@ static int __hns_roce_request_irq(struct hns_roce_dev *hr_dev, int irq_num,
> j - other_num - aeq_num);
>
> for (j = 0; j < irq_num; j++) {
> - if (j < other_num)
> + if (j < other_num) {
> ret = request_irq(hr_dev->irq[j],
> hns_roce_v2_msix_interrupt_abn,
> 0, hr_dev->irq_names[j], hr_dev);
> -
> - else if (j < (other_num + comp_num))
> + } else if (j < (other_num + comp_num)) {
> + tasklet_setup(&eq_table->eq[j - other_num].tasklet,
> + hns_roce_ceq_task);
> ret = request_irq(eq_table->eq[j - other_num].irq,
> hns_roce_v2_msix_interrupt_eq,
> 0, hr_dev->irq_names[j + aeq_num],
> &eq_table->eq[j - other_num]);
> - else
> + } else {
> ret = request_irq(eq_table->eq[j - other_num].irq,
> hns_roce_v2_msix_interrupt_eq,
> 0, hr_dev->irq_names[j - comp_num],
> &eq_table->eq[j - other_num]);
> + }
> +
> if (ret) {
> dev_err(hr_dev->dev, "request irq error!\n");
> goto err_request_failed;
> @@ -6606,12 +6615,16 @@ static int __hns_roce_request_irq(struct hns_roce_dev *hr_dev, int irq_num,
> return 0;
>
> err_request_failed:
> - for (j -= 1; j >= 0; j--)
> - if (j < other_num)
> + for (j -= 1; j >= 0; j--) {
> + if (j < other_num) {
> free_irq(hr_dev->irq[j], hr_dev);
> - else
> - free_irq(eq_table->eq[j - other_num].irq,
> - &eq_table->eq[j - other_num]);
> + continue;
> + }
> + free_irq(eq_table->eq[j - other_num].irq,
> + &eq_table->eq[j - other_num]);
> + if (j < other_num + comp_num)
> + tasklet_kill(&eq_table->eq[j - other_num].tasklet);
> + }
>
> err_kzalloc_failed:
> for (i -= 1; i >= 0; i--)
> @@ -6632,8 +6645,11 @@ static void __hns_roce_free_irq(struct hns_roce_dev *hr_dev)
> for (i = 0; i < hr_dev->caps.num_other_vectors; i++)
> free_irq(hr_dev->irq[i], hr_dev);
>
> - for (i = 0; i < eq_num; i++)
> + for (i = 0; i < eq_num; i++) {
> free_irq(hr_dev->eq_table.eq[i].irq, &hr_dev->eq_table.eq[i]);
> + if (i < hr_dev->caps.num_comp_vectors)
> + tasklet_kill(&hr_dev->eq_table.eq[i].tasklet);
> + }
>
> for (i = 0; i < irq_num; i++)
> kfree(hr_dev->irq_names[i]);
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 1/9] RDMA/hns: Check atomic wr length
2024-07-05 8:59 ` [PATCH for-rc 1/9] RDMA/hns: Check atomic wr length Junxian Huang
@ 2024-07-07 8:24 ` Leon Romanovsky
2024-07-08 2:27 ` Junxian Huang
0 siblings, 1 reply; 33+ messages in thread
From: Leon Romanovsky @ 2024-07-07 8:24 UTC (permalink / raw)
To: Junxian Huang; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On Fri, Jul 05, 2024 at 04:59:29PM +0800, Junxian Huang wrote:
> 8 bytes is the only supported length of atomic. Return an error if
> it is not.
>
> Fixes: 384f88185112 ("RDMA/hns: Add atomic support")
> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> ---
> drivers/infiniband/hw/hns/hns_roce_device.h | 2 ++
> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 19 +++++++++++++++----
> 2 files changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
> index ff0b3f68ee3a..05005079258c 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
> @@ -91,6 +91,8 @@
> /* Configure to HW for PAGE_SIZE larger than 4KB */
> #define PG_SHIFT_OFFSET (PAGE_SHIFT - 12)
>
> +#define ATOMIC_WR_LEN 8
> +
> #define HNS_ROCE_IDX_QUE_ENTRY_SZ 4
> #define SRQ_DB_REG 0x230
>
> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> index 4287818a737f..a5d746a5cc68 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> @@ -164,15 +164,23 @@ static void set_frmr_seg(struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
> hr_reg_clear(fseg, FRMR_BLK_MODE);
> }
>
> -static void set_atomic_seg(const struct ib_send_wr *wr,
> - struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
> - unsigned int valid_num_sge)
> +static int set_atomic_seg(struct hns_roce_dev *hr_dev,
> + const struct ib_send_wr *wr,
> + struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
> + unsigned int valid_num_sge, u32 msg_len)
> {
> struct hns_roce_v2_wqe_data_seg *dseg =
> (void *)rc_sq_wqe + sizeof(struct hns_roce_v2_rc_send_wqe);
> struct hns_roce_wqe_atomic_seg *aseg =
> (void *)dseg + sizeof(struct hns_roce_v2_wqe_data_seg);
>
> + if (msg_len != ATOMIC_WR_LEN) {
> + ibdev_err_ratelimited(&hr_dev->ib_dev,
> + "invalid atomic wr len, len = %u.\n",
> + msg_len);
> + return -EINVAL;
1. Please don't add prints in data-path.
2. You most likely need to add this check before calling to set_atomic_seg().
3. You shouldn't continue to process the WQE if the length is invalid.
Need to return from set_rc_wqe() and not continue.
4. I wonder if it is right place to put this limitation and can't be
enforced much earlier.
Thanks
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-05 8:59 ` [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset Junxian Huang
@ 2024-07-07 8:30 ` Leon Romanovsky
2024-07-08 2:29 ` Junxian Huang
0 siblings, 1 reply; 33+ messages in thread
From: Leon Romanovsky @ 2024-07-07 8:30 UTC (permalink / raw)
To: Junxian Huang; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
> From: wenglianfa <wenglianfa@huawei.com>
>
> During reset, cmdq events won't be reported, leading to a long and
> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
> of reset.
>
> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> ---
> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> index a5d746a5cc68..ff135df1a761 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>
> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
> }
> +
> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
> +{
> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
> + int i;
> +
> + if (!hr_dev->cmd_mod)
What prevents cmd_mod from being changed?
> + return;
> +
> + for (i = 0; i < hr_cmd->max_cmds; i++) {
> + hr_cmd->context[i].result = -EBUSY;
> + complete(&hr_cmd->context[i].done);
> + }
> +}
> +
> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> {
> struct hns_roce_dev *hr_dev;
> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> hr_dev->dis_db = true;
> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
>
> + /* Complete the CMDQ event in advance during the reset. */
> + hns_roce_v2_reset_notify_cmd(hr_dev);
> +
> return 0;
> }
>
> --
> 2.33.0
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 5/9] RDMA/hns: Fix missing pagesize and alignment check in FRMR
2024-07-05 8:59 ` [PATCH for-rc 5/9] RDMA/hns: Fix missing pagesize and alignment check in FRMR Junxian Huang
@ 2024-07-07 9:16 ` Zhu Yanjun
2024-07-08 2:44 ` Junxian Huang
0 siblings, 1 reply; 33+ messages in thread
From: Zhu Yanjun @ 2024-07-07 9:16 UTC (permalink / raw)
To: Junxian Huang, jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel
在 2024/7/5 16:59, Junxian Huang 写道:
> From: Chengchang Tang <tangchengchang@huawei.com>
>
> The offset requires 128B alignment and the page size ranges from
> 4K to 128M.
>
> Fixes: 68a997c5d28c ("RDMA/hns: Add FRMR support for hip08")
https://patchwork.kernel.org/project/linux-rdma/patch/2eee7e35-504e-4f2a-a364-527e90669108@CMEXHTCAS1.ad.emulex.com/
In the above link, from Bart, it seems that FRMR is renamed to FRWR.
"
There are already a few drivers upstream in which the fast register
memory region work request is abbreviated as FRWR. Please consider
renaming FRMR into FRWR in order to avoid confusion and in order to
make it easier to find related code with grep in the kernel tree.
"
So is it possible to rename FRMR to FRWR?
> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> ---
> drivers/infiniband/hw/hns/hns_roce_device.h | 4 ++++
> drivers/infiniband/hw/hns/hns_roce_mr.c | 5 +++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
> index 5a2445f357ab..15b3b978a601 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
> @@ -83,6 +83,7 @@
> #define MR_TYPE_DMA 0x03
>
> #define HNS_ROCE_FRMR_MAX_PA 512
> +#define HNS_ROCE_FRMR_ALIGN_SIZE 128
>
> #define PKEY_ID 0xffff
> #define NODE_DESC_SIZE 64
> @@ -189,6 +190,9 @@ enum {
> #define HNS_HW_PAGE_SHIFT 12
> #define HNS_HW_PAGE_SIZE (1 << HNS_HW_PAGE_SHIFT)
>
> +#define HNS_HW_MAX_PAGE_SHIFT 27
> +#define HNS_HW_MAX_PAGE_SIZE (1 << HNS_HW_MAX_PAGE_SHIFT)
> +
> struct hns_roce_uar {
> u64 pfn;
> unsigned long index;
> diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
> index 1a61dceb3319..846da8c78b8b 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> @@ -443,6 +443,11 @@ int hns_roce_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
> struct hns_roce_mtr *mtr = &mr->pbl_mtr;
> int ret, sg_num = 0;
>
> + if (!IS_ALIGNED(*sg_offset, HNS_ROCE_FRMR_ALIGN_SIZE) ||
> + ibmr->page_size < HNS_HW_PAGE_SIZE ||
> + ibmr->page_size > HNS_HW_MAX_PAGE_SIZE)
> + return sg_num;
> +
> mr->npages = 0;
> mr->page_list = kvcalloc(mr->pbl_mtr.hem_cfg.buf_pg_count,
> sizeof(dma_addr_t), GFP_KERNEL);
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 1/9] RDMA/hns: Check atomic wr length
2024-07-07 8:24 ` Leon Romanovsky
@ 2024-07-08 2:27 ` Junxian Huang
0 siblings, 0 replies; 33+ messages in thread
From: Junxian Huang @ 2024-07-08 2:27 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On 2024/7/7 16:24, Leon Romanovsky wrote:
> On Fri, Jul 05, 2024 at 04:59:29PM +0800, Junxian Huang wrote:
>> 8 bytes is the only supported length of atomic. Return an error if
>> it is not.
>>
>> Fixes: 384f88185112 ("RDMA/hns: Add atomic support")
>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
>> ---
>> drivers/infiniband/hw/hns/hns_roce_device.h | 2 ++
>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 19 +++++++++++++++----
>> 2 files changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
>> index ff0b3f68ee3a..05005079258c 100644
>> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
>> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
>> @@ -91,6 +91,8 @@
>> /* Configure to HW for PAGE_SIZE larger than 4KB */
>> #define PG_SHIFT_OFFSET (PAGE_SHIFT - 12)
>>
>> +#define ATOMIC_WR_LEN 8
>> +
>> #define HNS_ROCE_IDX_QUE_ENTRY_SZ 4
>> #define SRQ_DB_REG 0x230
>>
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> index 4287818a737f..a5d746a5cc68 100644
>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> @@ -164,15 +164,23 @@ static void set_frmr_seg(struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
>> hr_reg_clear(fseg, FRMR_BLK_MODE);
>> }
>>
>> -static void set_atomic_seg(const struct ib_send_wr *wr,
>> - struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
>> - unsigned int valid_num_sge)
>> +static int set_atomic_seg(struct hns_roce_dev *hr_dev,
>> + const struct ib_send_wr *wr,
>> + struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
>> + unsigned int valid_num_sge, u32 msg_len)
>> {
>> struct hns_roce_v2_wqe_data_seg *dseg =
>> (void *)rc_sq_wqe + sizeof(struct hns_roce_v2_rc_send_wqe);
>> struct hns_roce_wqe_atomic_seg *aseg =
>> (void *)dseg + sizeof(struct hns_roce_v2_wqe_data_seg);
>>
>> + if (msg_len != ATOMIC_WR_LEN) {
>> + ibdev_err_ratelimited(&hr_dev->ib_dev,
>> + "invalid atomic wr len, len = %u.\n",
>> + msg_len);
>> + return -EINVAL;
>
> 1. Please don't add prints in data-path.
> 2. You most likely need to add this check before calling to set_atomic_seg().
> 3. You shouldn't continue to process the WQE if the length is invalid.
> Need to return from set_rc_wqe() and not continue.
> 4. I wonder if it is right place to put this limitation and can't be
> enforced much earlier.
>
> Thanks
>
Thanks. 1 & 3 will be fixed. And for 2 & 4, I don't see any place more appropriate,
so I'll just add this check in set_rc_wqe().
Junxian
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-07 8:30 ` Leon Romanovsky
@ 2024-07-08 2:29 ` Junxian Huang
2024-07-08 5:38 ` Leon Romanovsky
0 siblings, 1 reply; 33+ messages in thread
From: Junxian Huang @ 2024-07-08 2:29 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On 2024/7/7 16:30, Leon Romanovsky wrote:
> On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
>> From: wenglianfa <wenglianfa@huawei.com>
>>
>> During reset, cmdq events won't be reported, leading to a long and
>> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
>> of reset.
>>
>> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
>> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
>> ---
>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>>
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> index a5d746a5cc68..ff135df1a761 100644
>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>>
>> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
>> }
>> +
>> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
>> +{
>> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
>> + int i;
>> +
>> + if (!hr_dev->cmd_mod)
>
> What prevents cmd_mod from being changed?
>
It's set when the device is being initialized, and won't be changed after that.
Junxian
>> + return;
>> +
>> + for (i = 0; i < hr_cmd->max_cmds; i++) {
>> + hr_cmd->context[i].result = -EBUSY;
>> + complete(&hr_cmd->context[i].done);
>> + }
>> +}
>> +
>> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>> {
>> struct hns_roce_dev *hr_dev;
>> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>> hr_dev->dis_db = true;
>> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
>>
>> + /* Complete the CMDQ event in advance during the reset. */
>> + hns_roce_v2_reset_notify_cmd(hr_dev);
>> +
>> return 0;
>> }
>>
>> --
>> 2.33.0
>>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 3/9] RDMA/hns: Fix soft lockup under heavy CEQE load
2024-07-05 10:47 ` Zhu Yanjun
@ 2024-07-08 2:30 ` Junxian Huang
0 siblings, 0 replies; 33+ messages in thread
From: Junxian Huang @ 2024-07-08 2:30 UTC (permalink / raw)
To: Zhu Yanjun, jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel
On 2024/7/5 18:47, Zhu Yanjun wrote:
> 在 2024/7/5 16:59, Junxian Huang 写道:
>> CEQEs are handled in interrupt handler currently. This may cause the
>> CPU core staying in interrupt context too long and lead to soft lockup
>> under heavy load.
>>
>> Handle CEQEs in tasklet and set an upper limit for the number of CEQE
>> handled by a single call of tasklet.
>
> https://patchwork.kernel.org/project/linux-rdma/cover/20240621050525.3720069-1-allen.lkml@gmail.com/
>
> In the above link, it seems that tasklet is not good enough. The tasklet is marked deprecated and has some design flaws. It is being replace BH workqueue.
>
> So directly use workqueue instead of tasklet?
>
> Zhu Yanjun
>
Thanks, I'll have a look at it.
Junxian
>>
>> Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08")
>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
>> ---
>> drivers/infiniband/hw/hns/hns_roce_device.h | 1 +
>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 88 ++++++++++++---------
>> 2 files changed, 53 insertions(+), 36 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
>> index 05005079258c..5a2445f357ab 100644
>> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
>> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
>> @@ -717,6 +717,7 @@ struct hns_roce_eq {
>> int shift;
>> int event_type;
>> int sub_type;
>> + struct tasklet_struct tasklet;
>> };
>> struct hns_roce_eq_table {
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> index ff135df1a761..f73de06a3ca5 100644
>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> @@ -6146,33 +6146,11 @@ static struct hns_roce_ceqe *next_ceqe_sw_v2(struct hns_roce_eq *eq)
>> !!(eq->cons_index & eq->entries)) ? ceqe : NULL;
>> }
>> -static irqreturn_t hns_roce_v2_ceq_int(struct hns_roce_dev *hr_dev,
>> - struct hns_roce_eq *eq)
>> +static irqreturn_t hns_roce_v2_ceq_int(struct hns_roce_eq *eq)
>> {
>> - struct hns_roce_ceqe *ceqe = next_ceqe_sw_v2(eq);
>> - irqreturn_t ceqe_found = IRQ_NONE;
>> - u32 cqn;
>> -
>> - while (ceqe) {
>> - /* Make sure we read CEQ entry after we have checked the
>> - * ownership bit
>> - */
>> - dma_rmb();
>> -
>> - cqn = hr_reg_read(ceqe, CEQE_CQN);
>> -
>> - hns_roce_cq_completion(hr_dev, cqn);
>> -
>> - ++eq->cons_index;
>> - ceqe_found = IRQ_HANDLED;
>> - atomic64_inc(&hr_dev->dfx_cnt[HNS_ROCE_DFX_CEQE_CNT]);
>> -
>> - ceqe = next_ceqe_sw_v2(eq);
>> - }
>> + tasklet_schedule(&eq->tasklet);
>> - update_eq_db(eq);
>> -
>> - return IRQ_RETVAL(ceqe_found);
>> + return IRQ_HANDLED;
>> }
>> static irqreturn_t hns_roce_v2_msix_interrupt_eq(int irq, void *eq_ptr)
>> @@ -6183,7 +6161,7 @@ static irqreturn_t hns_roce_v2_msix_interrupt_eq(int irq, void *eq_ptr)
>> if (eq->type_flag == HNS_ROCE_CEQ)
>> /* Completion event interrupt */
>> - int_work = hns_roce_v2_ceq_int(hr_dev, eq);
>> + int_work = hns_roce_v2_ceq_int(eq);
>> else
>> /* Asynchronous event interrupt */
>> int_work = hns_roce_v2_aeq_int(hr_dev, eq);
>> @@ -6551,6 +6529,34 @@ static int hns_roce_v2_create_eq(struct hns_roce_dev *hr_dev,
>> return ret;
>> }
>> +static void hns_roce_ceq_task(struct tasklet_struct *task)
>> +{
>> + struct hns_roce_eq *eq = from_tasklet(eq, task, tasklet);
>> + struct hns_roce_ceqe *ceqe = next_ceqe_sw_v2(eq);
>> + struct hns_roce_dev *hr_dev = eq->hr_dev;
>> + int ceqe_num = 0;
>> + u32 cqn;
>> +
>> + while (ceqe && ceqe_num < hr_dev->caps.ceqe_depth) {
>> + /* Make sure we read CEQ entry after we have checked the
>> + * ownership bit
>> + */
>> + dma_rmb();
>> +
>> + cqn = hr_reg_read(ceqe, CEQE_CQN);
>> +
>> + hns_roce_cq_completion(hr_dev, cqn);
>> +
>> + ++eq->cons_index;
>> + ++ceqe_num;
>> + atomic64_inc(&hr_dev->dfx_cnt[HNS_ROCE_DFX_CEQE_CNT]);
>> +
>> + ceqe = next_ceqe_sw_v2(eq);
>> + }
>> +
>> + update_eq_db(eq);
>> +}
>> +
>> static int __hns_roce_request_irq(struct hns_roce_dev *hr_dev, int irq_num,
>> int comp_num, int aeq_num, int other_num)
>> {
>> @@ -6582,21 +6588,24 @@ static int __hns_roce_request_irq(struct hns_roce_dev *hr_dev, int irq_num,
>> j - other_num - aeq_num);
>> for (j = 0; j < irq_num; j++) {
>> - if (j < other_num)
>> + if (j < other_num) {
>> ret = request_irq(hr_dev->irq[j],
>> hns_roce_v2_msix_interrupt_abn,
>> 0, hr_dev->irq_names[j], hr_dev);
>> -
>> - else if (j < (other_num + comp_num))
>> + } else if (j < (other_num + comp_num)) {
>> + tasklet_setup(&eq_table->eq[j - other_num].tasklet,
>> + hns_roce_ceq_task);
>> ret = request_irq(eq_table->eq[j - other_num].irq,
>> hns_roce_v2_msix_interrupt_eq,
>> 0, hr_dev->irq_names[j + aeq_num],
>> &eq_table->eq[j - other_num]);
>> - else
>> + } else {
>> ret = request_irq(eq_table->eq[j - other_num].irq,
>> hns_roce_v2_msix_interrupt_eq,
>> 0, hr_dev->irq_names[j - comp_num],
>> &eq_table->eq[j - other_num]);
>> + }
>> +
>> if (ret) {
>> dev_err(hr_dev->dev, "request irq error!\n");
>> goto err_request_failed;
>> @@ -6606,12 +6615,16 @@ static int __hns_roce_request_irq(struct hns_roce_dev *hr_dev, int irq_num,
>> return 0;
>> err_request_failed:
>> - for (j -= 1; j >= 0; j--)
>> - if (j < other_num)
>> + for (j -= 1; j >= 0; j--) {
>> + if (j < other_num) {
>> free_irq(hr_dev->irq[j], hr_dev);
>> - else
>> - free_irq(eq_table->eq[j - other_num].irq,
>> - &eq_table->eq[j - other_num]);
>> + continue;
>> + }
>> + free_irq(eq_table->eq[j - other_num].irq,
>> + &eq_table->eq[j - other_num]);
>> + if (j < other_num + comp_num)
>> + tasklet_kill(&eq_table->eq[j - other_num].tasklet);
>> + }
>> err_kzalloc_failed:
>> for (i -= 1; i >= 0; i--)
>> @@ -6632,8 +6645,11 @@ static void __hns_roce_free_irq(struct hns_roce_dev *hr_dev)
>> for (i = 0; i < hr_dev->caps.num_other_vectors; i++)
>> free_irq(hr_dev->irq[i], hr_dev);
>> - for (i = 0; i < eq_num; i++)
>> + for (i = 0; i < eq_num; i++) {
>> free_irq(hr_dev->eq_table.eq[i].irq, &hr_dev->eq_table.eq[i]);
>> + if (i < hr_dev->caps.num_comp_vectors)
>> + tasklet_kill(&hr_dev->eq_table.eq[i].tasklet);
>> + }
>> for (i = 0; i < irq_num; i++)
>> kfree(hr_dev->irq_names[i]);
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 5/9] RDMA/hns: Fix missing pagesize and alignment check in FRMR
2024-07-07 9:16 ` Zhu Yanjun
@ 2024-07-08 2:44 ` Junxian Huang
2024-07-08 5:41 ` Leon Romanovsky
2024-07-08 7:57 ` Zhu Yanjun
0 siblings, 2 replies; 33+ messages in thread
From: Junxian Huang @ 2024-07-08 2:44 UTC (permalink / raw)
To: Zhu Yanjun, jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel
On 2024/7/7 17:16, Zhu Yanjun wrote:
> 在 2024/7/5 16:59, Junxian Huang 写道:
>> From: Chengchang Tang <tangchengchang@huawei.com>
>>
>> The offset requires 128B alignment and the page size ranges from
>> 4K to 128M.
>>
>> Fixes: 68a997c5d28c ("RDMA/hns: Add FRMR support for hip08")
>
> https://patchwork.kernel.org/project/linux-rdma/patch/2eee7e35-504e-4f2a-a364-527e90669108@CMEXHTCAS1.ad.emulex.com/
> In the above link, from Bart, it seems that FRMR is renamed to FRWR.
> "
> There are already a few drivers upstream in which the fast register
> memory region work request is abbreviated as FRWR. Please consider
> renaming FRMR into FRWR in order to avoid confusion and in order to
> make it easier to find related code with grep in the kernel tree.
> "
>
> So is it possible to rename FRMR to FRWR?
>
I think the rename is irrelevant to this bugfix, and if it needs to be done,
we'll need a single patch to rename all existing 'FRMR' in hns driver.
So let's leave it as is for now.
Thanks,
Junxian
>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
>> ---
>> drivers/infiniband/hw/hns/hns_roce_device.h | 4 ++++
>> drivers/infiniband/hw/hns/hns_roce_mr.c | 5 +++++
>> 2 files changed, 9 insertions(+)
>>
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
>> index 5a2445f357ab..15b3b978a601 100644
>> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
>> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
>> @@ -83,6 +83,7 @@
>> #define MR_TYPE_DMA 0x03
>> #define HNS_ROCE_FRMR_MAX_PA 512
>> +#define HNS_ROCE_FRMR_ALIGN_SIZE 128
>> #define PKEY_ID 0xffff
>> #define NODE_DESC_SIZE 64
>> @@ -189,6 +190,9 @@ enum {
>> #define HNS_HW_PAGE_SHIFT 12
>> #define HNS_HW_PAGE_SIZE (1 << HNS_HW_PAGE_SHIFT)
>> +#define HNS_HW_MAX_PAGE_SHIFT 27
>> +#define HNS_HW_MAX_PAGE_SIZE (1 << HNS_HW_MAX_PAGE_SHIFT)
>> +
>> struct hns_roce_uar {
>> u64 pfn;
>> unsigned long index;
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
>> index 1a61dceb3319..846da8c78b8b 100644
>> --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
>> +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
>> @@ -443,6 +443,11 @@ int hns_roce_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
>> struct hns_roce_mtr *mtr = &mr->pbl_mtr;
>> int ret, sg_num = 0;
>> + if (!IS_ALIGNED(*sg_offset, HNS_ROCE_FRMR_ALIGN_SIZE) ||
>> + ibmr->page_size < HNS_HW_PAGE_SIZE ||
>> + ibmr->page_size > HNS_HW_MAX_PAGE_SIZE)
>> + return sg_num;
>> +
>> mr->npages = 0;
>> mr->page_list = kvcalloc(mr->pbl_mtr.hem_cfg.buf_pg_count,
>> sizeof(dma_addr_t), GFP_KERNEL);
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-08 2:29 ` Junxian Huang
@ 2024-07-08 5:38 ` Leon Romanovsky
2024-07-08 6:50 ` Junxian Huang
0 siblings, 1 reply; 33+ messages in thread
From: Leon Romanovsky @ 2024-07-08 5:38 UTC (permalink / raw)
To: Junxian Huang; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On Mon, Jul 08, 2024 at 10:29:54AM +0800, Junxian Huang wrote:
>
>
> On 2024/7/7 16:30, Leon Romanovsky wrote:
> > On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
> >> From: wenglianfa <wenglianfa@huawei.com>
> >>
> >> During reset, cmdq events won't be reported, leading to a long and
> >> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
> >> of reset.
> >>
> >> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
> >> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
> >> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> >> ---
> >> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
> >> 1 file changed, 18 insertions(+)
> >>
> >> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >> index a5d746a5cc68..ff135df1a761 100644
> >> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
> >>
> >> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
> >> }
> >> +
> >> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
> >> +{
> >> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
> >> + int i;
> >> +
> >> + if (!hr_dev->cmd_mod)
> >
> > What prevents cmd_mod from being changed?
> >
>
> It's set when the device is being initialized, and won't be changed after that.
This is exactly the point, you are assuming that the device is already
ininitialized or not initialized at all. What prevents hns_roce_v2_reset_notify_cmd()
from being called in the middle of initialization?
Thanks
>
> Junxian
>
> >> + return;
> >> +
> >> + for (i = 0; i < hr_cmd->max_cmds; i++) {
> >> + hr_cmd->context[i].result = -EBUSY;
> >> + complete(&hr_cmd->context[i].done);
> >> + }
> >> +}
> >> +
> >> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> >> {
> >> struct hns_roce_dev *hr_dev;
> >> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> >> hr_dev->dis_db = true;
> >> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
> >>
> >> + /* Complete the CMDQ event in advance during the reset. */
> >> + hns_roce_v2_reset_notify_cmd(hr_dev);
> >> +
> >> return 0;
> >> }
> >>
> >> --
> >> 2.33.0
> >>
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 5/9] RDMA/hns: Fix missing pagesize and alignment check in FRMR
2024-07-08 2:44 ` Junxian Huang
@ 2024-07-08 5:41 ` Leon Romanovsky
2024-07-08 7:57 ` Zhu Yanjun
1 sibling, 0 replies; 33+ messages in thread
From: Leon Romanovsky @ 2024-07-08 5:41 UTC (permalink / raw)
To: Junxian Huang; +Cc: Zhu Yanjun, jgg, linux-rdma, linuxarm, linux-kernel
On Mon, Jul 08, 2024 at 10:44:52AM +0800, Junxian Huang wrote:
>
>
> On 2024/7/7 17:16, Zhu Yanjun wrote:
> > 在 2024/7/5 16:59, Junxian Huang 写道:
> >> From: Chengchang Tang <tangchengchang@huawei.com>
> >>
> >> The offset requires 128B alignment and the page size ranges from
> >> 4K to 128M.
> >>
> >> Fixes: 68a997c5d28c ("RDMA/hns: Add FRMR support for hip08")
> >
> > https://patchwork.kernel.org/project/linux-rdma/patch/2eee7e35-504e-4f2a-a364-527e90669108@CMEXHTCAS1.ad.emulex.com/
> > In the above link, from Bart, it seems that FRMR is renamed to FRWR.
> > "
> > There are already a few drivers upstream in which the fast register
> > memory region work request is abbreviated as FRWR. Please consider
> > renaming FRMR into FRWR in order to avoid confusion and in order to
> > make it easier to find related code with grep in the kernel tree.
> > "
> >
> > So is it possible to rename FRMR to FRWR?
> >
>
> I think the rename is irrelevant to this bugfix, and if it needs to be done,
> we'll need a single patch to rename all existing 'FRMR' in hns driver.
>
> So let's leave it as is for now.
+1
>
> Thanks,
> Junxian
>
> >> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
> >> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> >> ---
> >> drivers/infiniband/hw/hns/hns_roce_device.h | 4 ++++
> >> drivers/infiniband/hw/hns/hns_roce_mr.c | 5 +++++
> >> 2 files changed, 9 insertions(+)
> >>
> >> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
> >> index 5a2445f357ab..15b3b978a601 100644
> >> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
> >> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
> >> @@ -83,6 +83,7 @@
> >> #define MR_TYPE_DMA 0x03
> >> #define HNS_ROCE_FRMR_MAX_PA 512
> >> +#define HNS_ROCE_FRMR_ALIGN_SIZE 128
> >> #define PKEY_ID 0xffff
> >> #define NODE_DESC_SIZE 64
> >> @@ -189,6 +190,9 @@ enum {
> >> #define HNS_HW_PAGE_SHIFT 12
> >> #define HNS_HW_PAGE_SIZE (1 << HNS_HW_PAGE_SHIFT)
> >> +#define HNS_HW_MAX_PAGE_SHIFT 27
> >> +#define HNS_HW_MAX_PAGE_SIZE (1 << HNS_HW_MAX_PAGE_SHIFT)
> >> +
> >> struct hns_roce_uar {
> >> u64 pfn;
> >> unsigned long index;
> >> diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
> >> index 1a61dceb3319..846da8c78b8b 100644
> >> --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> >> +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> >> @@ -443,6 +443,11 @@ int hns_roce_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
> >> struct hns_roce_mtr *mtr = &mr->pbl_mtr;
> >> int ret, sg_num = 0;
> >> + if (!IS_ALIGNED(*sg_offset, HNS_ROCE_FRMR_ALIGN_SIZE) ||
> >> + ibmr->page_size < HNS_HW_PAGE_SIZE ||
> >> + ibmr->page_size > HNS_HW_MAX_PAGE_SIZE)
> >> + return sg_num;
> >> +
> >> mr->npages = 0;
> >> mr->page_list = kvcalloc(mr->pbl_mtr.hem_cfg.buf_pg_count,
> >> sizeof(dma_addr_t), GFP_KERNEL);
> >
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-08 5:38 ` Leon Romanovsky
@ 2024-07-08 6:50 ` Junxian Huang
2024-07-08 7:33 ` Leon Romanovsky
0 siblings, 1 reply; 33+ messages in thread
From: Junxian Huang @ 2024-07-08 6:50 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On 2024/7/8 13:38, Leon Romanovsky wrote:
> On Mon, Jul 08, 2024 at 10:29:54AM +0800, Junxian Huang wrote:
>>
>>
>> On 2024/7/7 16:30, Leon Romanovsky wrote:
>>> On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
>>>> From: wenglianfa <wenglianfa@huawei.com>
>>>>
>>>> During reset, cmdq events won't be reported, leading to a long and
>>>> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
>>>> of reset.
>>>>
>>>> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
>>>> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
>>>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
>>>> ---
>>>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
>>>> 1 file changed, 18 insertions(+)
>>>>
>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>> index a5d746a5cc68..ff135df1a761 100644
>>>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>>>>
>>>> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
>>>> }
>>>> +
>>>> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
>>>> +{
>>>> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
>>>> + int i;
>>>> +
>>>> + if (!hr_dev->cmd_mod)
>>>
>>> What prevents cmd_mod from being changed?
>>>
>>
>> It's set when the device is being initialized, and won't be changed after that.
>
> This is exactly the point, you are assuming that the device is already
> ininitialized or not initialized at all. What prevents hns_roce_v2_reset_notify_cmd()
> from being called in the middle of initialization?
>
> Thanks
>
This is ensured by hns3 NIC driver.
Initialization and reset of hns RoCE are both called by hns3. It will check the state
of RoCE device (see line 3798), and notify RoCE device to reset (hns_roce_v2_reset_notify_cmd()
is called) only if the RoCE device has been already initialized:
3791 static int hclge_notify_roce_client(struct hclge_dev *hdev,
3792 enum hnae3_reset_notify_type type)
3793 {
3794 struct hnae3_handle *handle = &hdev->vport[0].roce;
3795 struct hnae3_client *client = hdev->roce_client;
3796 int ret;
3797
3798 if (!test_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state) || !client)
3799 return 0;
3800
3801 if (!client->ops->reset_notify)
3802 return -EOPNOTSUPP;
3803
3804 ret = client->ops->reset_notify(handle, type);
3805 if (ret)
3806 dev_err(&hdev->pdev->dev, "notify roce client failed %d(%d)",
3807 type, ret);
3808
3809 return ret;
3810 }
And the bit is set (see line 11246) after the initialization has been done (line 11242):
11224 static int hclge_init_roce_client_instance(struct hnae3_ae_dev *ae_dev,
11225 struct hclge_vport *vport)
11226 {
11227 struct hclge_dev *hdev = ae_dev->priv;
11228 struct hnae3_client *client;
11229 int rst_cnt;
11230 int ret;
11231
11232 if (!hnae3_dev_roce_supported(hdev) || !hdev->roce_client ||
11233 !hdev->nic_client)
11234 return 0;
11235
11236 client = hdev->roce_client;
11237 ret = hclge_init_roce_base_info(vport);
11238 if (ret)
11239 return ret;
11240
11241 rst_cnt = hdev->rst_stats.reset_cnt;
11242 ret = client->ops->init_instance(&vport->roce);
11243 if (ret)
11244 return ret;
11245
11246 set_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state);
11247 if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
11248 rst_cnt != hdev->rst_stats.reset_cnt) {
11249 ret = -EBUSY;
11250 goto init_roce_err;
11251 }
Junxian
>>
>> Junxian
>>
>>>> + return;
>>>> +
>>>> + for (i = 0; i < hr_cmd->max_cmds; i++) {
>>>> + hr_cmd->context[i].result = -EBUSY;
>>>> + complete(&hr_cmd->context[i].done);
>>>> + }
>>>> +}
>>>> +
>>>> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>>>> {
>>>> struct hns_roce_dev *hr_dev;
>>>> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>>>> hr_dev->dis_db = true;
>>>> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
>>>>
>>>> + /* Complete the CMDQ event in advance during the reset. */
>>>> + hns_roce_v2_reset_notify_cmd(hr_dev);
>>>> +
>>>> return 0;
>>>> }
>>>>
>>>> --
>>>> 2.33.0
>>>>
>>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-08 6:50 ` Junxian Huang
@ 2024-07-08 7:33 ` Leon Romanovsky
2024-07-08 7:46 ` Junxian Huang
0 siblings, 1 reply; 33+ messages in thread
From: Leon Romanovsky @ 2024-07-08 7:33 UTC (permalink / raw)
To: Junxian Huang; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On Mon, Jul 08, 2024 at 02:50:50PM +0800, Junxian Huang wrote:
>
>
> On 2024/7/8 13:38, Leon Romanovsky wrote:
> > On Mon, Jul 08, 2024 at 10:29:54AM +0800, Junxian Huang wrote:
> >>
> >>
> >> On 2024/7/7 16:30, Leon Romanovsky wrote:
> >>> On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
> >>>> From: wenglianfa <wenglianfa@huawei.com>
> >>>>
> >>>> During reset, cmdq events won't be reported, leading to a long and
> >>>> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
> >>>> of reset.
> >>>>
> >>>> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
> >>>> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
> >>>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> >>>> ---
> >>>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
> >>>> 1 file changed, 18 insertions(+)
> >>>>
> >>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>> index a5d746a5cc68..ff135df1a761 100644
> >>>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
> >>>>
> >>>> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
> >>>> }
> >>>> +
> >>>> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
> >>>> +{
> >>>> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
> >>>> + int i;
> >>>> +
> >>>> + if (!hr_dev->cmd_mod)
> >>>
> >>> What prevents cmd_mod from being changed?
> >>>
> >>
> >> It's set when the device is being initialized, and won't be changed after that.
> >
> > This is exactly the point, you are assuming that the device is already
> > ininitialized or not initialized at all. What prevents hns_roce_v2_reset_notify_cmd()
> > from being called in the middle of initialization?
> >
> > Thanks
> >
>
> This is ensured by hns3 NIC driver.
>
> Initialization and reset of hns RoCE are both called by hns3. It will check the state
> of RoCE device (see line 3798), and notify RoCE device to reset (hns_roce_v2_reset_notify_cmd()
> is called) only if the RoCE device has been already initialized:
So why do you have "if (!hr_dev->cmd_mod)" check in the code?
Thanks
>
> 3791 static int hclge_notify_roce_client(struct hclge_dev *hdev,
> 3792 enum hnae3_reset_notify_type type)
> 3793 {
> 3794 struct hnae3_handle *handle = &hdev->vport[0].roce;
> 3795 struct hnae3_client *client = hdev->roce_client;
> 3796 int ret;
> 3797
> 3798 if (!test_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state) || !client)
> 3799 return 0;
> 3800
> 3801 if (!client->ops->reset_notify)
> 3802 return -EOPNOTSUPP;
> 3803
> 3804 ret = client->ops->reset_notify(handle, type);
> 3805 if (ret)
> 3806 dev_err(&hdev->pdev->dev, "notify roce client failed %d(%d)",
> 3807 type, ret);
> 3808
> 3809 return ret;
> 3810 }
>
> And the bit is set (see line 11246) after the initialization has been done (line 11242):
>
> 11224 static int hclge_init_roce_client_instance(struct hnae3_ae_dev *ae_dev,
> 11225 struct hclge_vport *vport)
> 11226 {
> 11227 struct hclge_dev *hdev = ae_dev->priv;
> 11228 struct hnae3_client *client;
> 11229 int rst_cnt;
> 11230 int ret;
> 11231
> 11232 if (!hnae3_dev_roce_supported(hdev) || !hdev->roce_client ||
> 11233 !hdev->nic_client)
> 11234 return 0;
> 11235
> 11236 client = hdev->roce_client;
> 11237 ret = hclge_init_roce_base_info(vport);
> 11238 if (ret)
> 11239 return ret;
> 11240
> 11241 rst_cnt = hdev->rst_stats.reset_cnt;
> 11242 ret = client->ops->init_instance(&vport->roce);
> 11243 if (ret)
> 11244 return ret;
> 11245
> 11246 set_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state);
> 11247 if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
> 11248 rst_cnt != hdev->rst_stats.reset_cnt) {
> 11249 ret = -EBUSY;
> 11250 goto init_roce_err;
> 11251 }
>
> Junxian
>
> >>
> >> Junxian
> >>
> >>>> + return;
> >>>> +
> >>>> + for (i = 0; i < hr_cmd->max_cmds; i++) {
> >>>> + hr_cmd->context[i].result = -EBUSY;
> >>>> + complete(&hr_cmd->context[i].done);
> >>>> + }
> >>>> +}
> >>>> +
> >>>> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> >>>> {
> >>>> struct hns_roce_dev *hr_dev;
> >>>> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> >>>> hr_dev->dis_db = true;
> >>>> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
> >>>>
> >>>> + /* Complete the CMDQ event in advance during the reset. */
> >>>> + hns_roce_v2_reset_notify_cmd(hr_dev);
> >>>> +
> >>>> return 0;
> >>>> }
> >>>>
> >>>> --
> >>>> 2.33.0
> >>>>
> >>
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-08 7:33 ` Leon Romanovsky
@ 2024-07-08 7:46 ` Junxian Huang
2024-07-08 8:27 ` Leon Romanovsky
0 siblings, 1 reply; 33+ messages in thread
From: Junxian Huang @ 2024-07-08 7:46 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On 2024/7/8 15:33, Leon Romanovsky wrote:
> On Mon, Jul 08, 2024 at 02:50:50PM +0800, Junxian Huang wrote:
>>
>>
>> On 2024/7/8 13:38, Leon Romanovsky wrote:
>>> On Mon, Jul 08, 2024 at 10:29:54AM +0800, Junxian Huang wrote:
>>>>
>>>>
>>>> On 2024/7/7 16:30, Leon Romanovsky wrote:
>>>>> On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
>>>>>> From: wenglianfa <wenglianfa@huawei.com>
>>>>>>
>>>>>> During reset, cmdq events won't be reported, leading to a long and
>>>>>> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
>>>>>> of reset.
>>>>>>
>>>>>> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
>>>>>> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
>>>>>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
>>>>>> ---
>>>>>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
>>>>>> 1 file changed, 18 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>> index a5d746a5cc68..ff135df1a761 100644
>>>>>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>>>>>>
>>>>>> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
>>>>>> }
>>>>>> +
>>>>>> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
>>>>>> +{
>>>>>> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
>>>>>> + int i;
>>>>>> +
>>>>>> + if (!hr_dev->cmd_mod)
>>>>>
>>>>> What prevents cmd_mod from being changed?
>>>>>
>>>>
>>>> It's set when the device is being initialized, and won't be changed after that.
>>>
>>> This is exactly the point, you are assuming that the device is already
>>> ininitialized or not initialized at all. What prevents hns_roce_v2_reset_notify_cmd()
>>> from being called in the middle of initialization?
>>>
>>> Thanks
>>>
>>
>> This is ensured by hns3 NIC driver.
>>
>> Initialization and reset of hns RoCE are both called by hns3. It will check the state
>> of RoCE device (see line 3798), and notify RoCE device to reset (hns_roce_v2_reset_notify_cmd()
>> is called) only if the RoCE device has been already initialized:
>
> So why do you have "if (!hr_dev->cmd_mod)" check in the code?
>
> Thanks
>
cmd_mod indicates the mode of cmdq (0: poll mode, 1: event mode).
This patch only affects event mode because HW won't report events during reset.
Junxian
>>
>> 3791 static int hclge_notify_roce_client(struct hclge_dev *hdev,
>> 3792 enum hnae3_reset_notify_type type)
>> 3793 {
>> 3794 struct hnae3_handle *handle = &hdev->vport[0].roce;
>> 3795 struct hnae3_client *client = hdev->roce_client;
>> 3796 int ret;
>> 3797
>> 3798 if (!test_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state) || !client)
>> 3799 return 0;
>> 3800
>> 3801 if (!client->ops->reset_notify)
>> 3802 return -EOPNOTSUPP;
>> 3803
>> 3804 ret = client->ops->reset_notify(handle, type);
>> 3805 if (ret)
>> 3806 dev_err(&hdev->pdev->dev, "notify roce client failed %d(%d)",
>> 3807 type, ret);
>> 3808
>> 3809 return ret;
>> 3810 }
>>
>> And the bit is set (see line 11246) after the initialization has been done (line 11242):
>>
>> 11224 static int hclge_init_roce_client_instance(struct hnae3_ae_dev *ae_dev,
>> 11225 struct hclge_vport *vport)
>> 11226 {
>> 11227 struct hclge_dev *hdev = ae_dev->priv;
>> 11228 struct hnae3_client *client;
>> 11229 int rst_cnt;
>> 11230 int ret;
>> 11231
>> 11232 if (!hnae3_dev_roce_supported(hdev) || !hdev->roce_client ||
>> 11233 !hdev->nic_client)
>> 11234 return 0;
>> 11235
>> 11236 client = hdev->roce_client;
>> 11237 ret = hclge_init_roce_base_info(vport);
>> 11238 if (ret)
>> 11239 return ret;
>> 11240
>> 11241 rst_cnt = hdev->rst_stats.reset_cnt;
>> 11242 ret = client->ops->init_instance(&vport->roce);
>> 11243 if (ret)
>> 11244 return ret;
>> 11245
>> 11246 set_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state);
>> 11247 if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
>> 11248 rst_cnt != hdev->rst_stats.reset_cnt) {
>> 11249 ret = -EBUSY;
>> 11250 goto init_roce_err;
>> 11251 }
>>
>> Junxian
>>
>>>>
>>>> Junxian
>>>>
>>>>>> + return;
>>>>>> +
>>>>>> + for (i = 0; i < hr_cmd->max_cmds; i++) {
>>>>>> + hr_cmd->context[i].result = -EBUSY;
>>>>>> + complete(&hr_cmd->context[i].done);
>>>>>> + }
>>>>>> +}
>>>>>> +
>>>>>> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>>>>>> {
>>>>>> struct hns_roce_dev *hr_dev;
>>>>>> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>>>>>> hr_dev->dis_db = true;
>>>>>> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
>>>>>>
>>>>>> + /* Complete the CMDQ event in advance during the reset. */
>>>>>> + hns_roce_v2_reset_notify_cmd(hr_dev);
>>>>>> +
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> --
>>>>>> 2.33.0
>>>>>>
>>>>
>>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 5/9] RDMA/hns: Fix missing pagesize and alignment check in FRMR
2024-07-08 2:44 ` Junxian Huang
2024-07-08 5:41 ` Leon Romanovsky
@ 2024-07-08 7:57 ` Zhu Yanjun
2024-07-08 8:33 ` Leon Romanovsky
1 sibling, 1 reply; 33+ messages in thread
From: Zhu Yanjun @ 2024-07-08 7:57 UTC (permalink / raw)
To: Junxian Huang, jgg, leon; +Cc: linux-rdma, linuxarm, linux-kernel
在 2024/7/8 10:44, Junxian Huang 写道:
>
>
> On 2024/7/7 17:16, Zhu Yanjun wrote:
>> 在 2024/7/5 16:59, Junxian Huang 写道:
>>> From: Chengchang Tang <tangchengchang@huawei.com>
>>>
>>> The offset requires 128B alignment and the page size ranges from
>>> 4K to 128M.
>>>
>>> Fixes: 68a997c5d28c ("RDMA/hns: Add FRMR support for hip08")
>>
>> https://patchwork.kernel.org/project/linux-rdma/patch/2eee7e35-504e-4f2a-a364-527e90669108@CMEXHTCAS1.ad.emulex.com/
>> In the above link, from Bart, it seems that FRMR is renamed to FRWR.
>> "
>> There are already a few drivers upstream in which the fast register
>> memory region work request is abbreviated as FRWR. Please consider
>> renaming FRMR into FRWR in order to avoid confusion and in order to
>> make it easier to find related code with grep in the kernel tree.
>> "
>>
>> So is it possible to rename FRMR to FRWR?
>>
>
> I think the rename is irrelevant to this bugfix, and if it needs to be done,
> we'll need a single patch to rename all existing 'FRMR' in hns driver.
>
> So let's leave it as is for now.
Exactly. FRMR is obsolete. We do need a single patch to rename all
existing "FRMR" to "FRWR" in RDMA.
@Leon, please let me know your suggestions.
Thanks,
Zhu Yanjun
>
> Thanks,
> Junxian
>
>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
>>> ---
>>> drivers/infiniband/hw/hns/hns_roce_device.h | 4 ++++
>>> drivers/infiniband/hw/hns/hns_roce_mr.c | 5 +++++
>>> 2 files changed, 9 insertions(+)
>>>
>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
>>> index 5a2445f357ab..15b3b978a601 100644
>>> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
>>> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
>>> @@ -83,6 +83,7 @@
>>> #define MR_TYPE_DMA 0x03
>>> #define HNS_ROCE_FRMR_MAX_PA 512
>>> +#define HNS_ROCE_FRMR_ALIGN_SIZE 128
>>> #define PKEY_ID 0xffff
>>> #define NODE_DESC_SIZE 64
>>> @@ -189,6 +190,9 @@ enum {
>>> #define HNS_HW_PAGE_SHIFT 12
>>> #define HNS_HW_PAGE_SIZE (1 << HNS_HW_PAGE_SHIFT)
>>> +#define HNS_HW_MAX_PAGE_SHIFT 27
>>> +#define HNS_HW_MAX_PAGE_SIZE (1 << HNS_HW_MAX_PAGE_SHIFT)
>>> +
>>> struct hns_roce_uar {
>>> u64 pfn;
>>> unsigned long index;
>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
>>> index 1a61dceb3319..846da8c78b8b 100644
>>> --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
>>> +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
>>> @@ -443,6 +443,11 @@ int hns_roce_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
>>> struct hns_roce_mtr *mtr = &mr->pbl_mtr;
>>> int ret, sg_num = 0;
>>> + if (!IS_ALIGNED(*sg_offset, HNS_ROCE_FRMR_ALIGN_SIZE) ||
>>> + ibmr->page_size < HNS_HW_PAGE_SIZE ||
>>> + ibmr->page_size > HNS_HW_MAX_PAGE_SIZE)
>>> + return sg_num;
>>> +
>>> mr->npages = 0;
>>> mr->page_list = kvcalloc(mr->pbl_mtr.hem_cfg.buf_pg_count,
>>> sizeof(dma_addr_t), GFP_KERNEL);
>>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-08 7:46 ` Junxian Huang
@ 2024-07-08 8:27 ` Leon Romanovsky
2024-07-08 8:45 ` Junxian Huang
0 siblings, 1 reply; 33+ messages in thread
From: Leon Romanovsky @ 2024-07-08 8:27 UTC (permalink / raw)
To: Junxian Huang; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On Mon, Jul 08, 2024 at 03:46:26PM +0800, Junxian Huang wrote:
>
>
> On 2024/7/8 15:33, Leon Romanovsky wrote:
> > On Mon, Jul 08, 2024 at 02:50:50PM +0800, Junxian Huang wrote:
> >>
> >>
> >> On 2024/7/8 13:38, Leon Romanovsky wrote:
> >>> On Mon, Jul 08, 2024 at 10:29:54AM +0800, Junxian Huang wrote:
> >>>>
> >>>>
> >>>> On 2024/7/7 16:30, Leon Romanovsky wrote:
> >>>>> On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
> >>>>>> From: wenglianfa <wenglianfa@huawei.com>
> >>>>>>
> >>>>>> During reset, cmdq events won't be reported, leading to a long and
> >>>>>> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
> >>>>>> of reset.
> >>>>>>
> >>>>>> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
> >>>>>> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
> >>>>>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> >>>>>> ---
> >>>>>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
> >>>>>> 1 file changed, 18 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>>>> index a5d746a5cc68..ff135df1a761 100644
> >>>>>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>>>> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
> >>>>>>
> >>>>>> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
> >>>>>> }
> >>>>>> +
> >>>>>> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
> >>>>>> +{
> >>>>>> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
> >>>>>> + int i;
> >>>>>> +
> >>>>>> + if (!hr_dev->cmd_mod)
> >>>>>
> >>>>> What prevents cmd_mod from being changed?
> >>>>>
> >>>>
> >>>> It's set when the device is being initialized, and won't be changed after that.
> >>>
> >>> This is exactly the point, you are assuming that the device is already
> >>> ininitialized or not initialized at all. What prevents hns_roce_v2_reset_notify_cmd()
> >>> from being called in the middle of initialization?
> >>>
> >>> Thanks
> >>>
> >>
> >> This is ensured by hns3 NIC driver.
> >>
> >> Initialization and reset of hns RoCE are both called by hns3. It will check the state
> >> of RoCE device (see line 3798), and notify RoCE device to reset (hns_roce_v2_reset_notify_cmd()
> >> is called) only if the RoCE device has been already initialized:
> >
> > So why do you have "if (!hr_dev->cmd_mod)" check in the code?
> >
> > Thanks
> >
>
> cmd_mod indicates the mode of cmdq (0: poll mode, 1: event mode).
> This patch only affects event mode because HW won't report events during reset.
You set cmd_mod to 1 in hns_roce_hw_v2_init_instance() without any
condition, I don't see when hns v2 IB device is created and continue
to operate in polling mode.
Thanks
>
> Junxian
>
> >>
> >> 3791 static int hclge_notify_roce_client(struct hclge_dev *hdev,
> >> 3792 enum hnae3_reset_notify_type type)
> >> 3793 {
> >> 3794 struct hnae3_handle *handle = &hdev->vport[0].roce;
> >> 3795 struct hnae3_client *client = hdev->roce_client;
> >> 3796 int ret;
> >> 3797
> >> 3798 if (!test_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state) || !client)
> >> 3799 return 0;
> >> 3800
> >> 3801 if (!client->ops->reset_notify)
> >> 3802 return -EOPNOTSUPP;
> >> 3803
> >> 3804 ret = client->ops->reset_notify(handle, type);
> >> 3805 if (ret)
> >> 3806 dev_err(&hdev->pdev->dev, "notify roce client failed %d(%d)",
> >> 3807 type, ret);
> >> 3808
> >> 3809 return ret;
> >> 3810 }
> >>
> >> And the bit is set (see line 11246) after the initialization has been done (line 11242):
> >>
> >> 11224 static int hclge_init_roce_client_instance(struct hnae3_ae_dev *ae_dev,
> >> 11225 struct hclge_vport *vport)
> >> 11226 {
> >> 11227 struct hclge_dev *hdev = ae_dev->priv;
> >> 11228 struct hnae3_client *client;
> >> 11229 int rst_cnt;
> >> 11230 int ret;
> >> 11231
> >> 11232 if (!hnae3_dev_roce_supported(hdev) || !hdev->roce_client ||
> >> 11233 !hdev->nic_client)
> >> 11234 return 0;
> >> 11235
> >> 11236 client = hdev->roce_client;
> >> 11237 ret = hclge_init_roce_base_info(vport);
> >> 11238 if (ret)
> >> 11239 return ret;
> >> 11240
> >> 11241 rst_cnt = hdev->rst_stats.reset_cnt;
> >> 11242 ret = client->ops->init_instance(&vport->roce);
> >> 11243 if (ret)
> >> 11244 return ret;
> >> 11245
> >> 11246 set_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state);
> >> 11247 if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
> >> 11248 rst_cnt != hdev->rst_stats.reset_cnt) {
> >> 11249 ret = -EBUSY;
> >> 11250 goto init_roce_err;
> >> 11251 }
> >>
> >> Junxian
> >>
> >>>>
> >>>> Junxian
> >>>>
> >>>>>> + return;
> >>>>>> +
> >>>>>> + for (i = 0; i < hr_cmd->max_cmds; i++) {
> >>>>>> + hr_cmd->context[i].result = -EBUSY;
> >>>>>> + complete(&hr_cmd->context[i].done);
> >>>>>> + }
> >>>>>> +}
> >>>>>> +
> >>>>>> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> >>>>>> {
> >>>>>> struct hns_roce_dev *hr_dev;
> >>>>>> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> >>>>>> hr_dev->dis_db = true;
> >>>>>> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
> >>>>>>
> >>>>>> + /* Complete the CMDQ event in advance during the reset. */
> >>>>>> + hns_roce_v2_reset_notify_cmd(hr_dev);
> >>>>>> +
> >>>>>> return 0;
> >>>>>> }
> >>>>>>
> >>>>>> --
> >>>>>> 2.33.0
> >>>>>>
> >>>>
> >>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 5/9] RDMA/hns: Fix missing pagesize and alignment check in FRMR
2024-07-08 7:57 ` Zhu Yanjun
@ 2024-07-08 8:33 ` Leon Romanovsky
0 siblings, 0 replies; 33+ messages in thread
From: Leon Romanovsky @ 2024-07-08 8:33 UTC (permalink / raw)
To: Zhu Yanjun; +Cc: Junxian Huang, jgg, linux-rdma, linuxarm, linux-kernel
On Mon, Jul 08, 2024 at 03:57:49PM +0800, Zhu Yanjun wrote:
> 在 2024/7/8 10:44, Junxian Huang 写道:
> >
> >
> > On 2024/7/7 17:16, Zhu Yanjun wrote:
> > > 在 2024/7/5 16:59, Junxian Huang 写道:
> > > > From: Chengchang Tang <tangchengchang@huawei.com>
> > > >
> > > > The offset requires 128B alignment and the page size ranges from
> > > > 4K to 128M.
> > > >
> > > > Fixes: 68a997c5d28c ("RDMA/hns: Add FRMR support for hip08")
> > >
> > > https://patchwork.kernel.org/project/linux-rdma/patch/2eee7e35-504e-4f2a-a364-527e90669108@CMEXHTCAS1.ad.emulex.com/
> > > In the above link, from Bart, it seems that FRMR is renamed to FRWR.
> > > "
> > > There are already a few drivers upstream in which the fast register
> > > memory region work request is abbreviated as FRWR. Please consider
> > > renaming FRMR into FRWR in order to avoid confusion and in order to
> > > make it easier to find related code with grep in the kernel tree.
> > > "
> > >
> > > So is it possible to rename FRMR to FRWR?
> > >
> >
> > I think the rename is irrelevant to this bugfix, and if it needs to be done,
> > we'll need a single patch to rename all existing 'FRMR' in hns driver.
> >
> > So let's leave it as is for now.
>
> Exactly. FRMR is obsolete. We do need a single patch to rename all existing
> "FRMR" to "FRWR" in RDMA.
>
> @Leon, please let me know your suggestions.
Our preference is to avoid from churn patches which are not part of some
larger series. In this case, rename won't give us any benefit, but will
create a lot of noise for the backports.
There is no need to be cruel to the people who are doing these backports
just because "some word was removed from the dictionary".
Thanks
> Thanks,
>
> Zhu Yanjun
>
> >
> > Thanks,
> > Junxian
> >
> > > > Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
> > > > Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> > > > ---
> > > > drivers/infiniband/hw/hns/hns_roce_device.h | 4 ++++
> > > > drivers/infiniband/hw/hns/hns_roce_mr.c | 5 +++++
> > > > 2 files changed, 9 insertions(+)
> > > >
> > > > diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
> > > > index 5a2445f357ab..15b3b978a601 100644
> > > > --- a/drivers/infiniband/hw/hns/hns_roce_device.h
> > > > +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
> > > > @@ -83,6 +83,7 @@
> > > > #define MR_TYPE_DMA 0x03
> > > > #define HNS_ROCE_FRMR_MAX_PA 512
> > > > +#define HNS_ROCE_FRMR_ALIGN_SIZE 128
> > > > #define PKEY_ID 0xffff
> > > > #define NODE_DESC_SIZE 64
> > > > @@ -189,6 +190,9 @@ enum {
> > > > #define HNS_HW_PAGE_SHIFT 12
> > > > #define HNS_HW_PAGE_SIZE (1 << HNS_HW_PAGE_SHIFT)
> > > > +#define HNS_HW_MAX_PAGE_SHIFT 27
> > > > +#define HNS_HW_MAX_PAGE_SIZE (1 << HNS_HW_MAX_PAGE_SHIFT)
> > > > +
> > > > struct hns_roce_uar {
> > > > u64 pfn;
> > > > unsigned long index;
> > > > diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > index 1a61dceb3319..846da8c78b8b 100644
> > > > --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > @@ -443,6 +443,11 @@ int hns_roce_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
> > > > struct hns_roce_mtr *mtr = &mr->pbl_mtr;
> > > > int ret, sg_num = 0;
> > > > + if (!IS_ALIGNED(*sg_offset, HNS_ROCE_FRMR_ALIGN_SIZE) ||
> > > > + ibmr->page_size < HNS_HW_PAGE_SIZE ||
> > > > + ibmr->page_size > HNS_HW_MAX_PAGE_SIZE)
> > > > + return sg_num;
> > > > +
> > > > mr->npages = 0;
> > > > mr->page_list = kvcalloc(mr->pbl_mtr.hem_cfg.buf_pg_count,
> > > > sizeof(dma_addr_t), GFP_KERNEL);
> > >
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-08 8:27 ` Leon Romanovsky
@ 2024-07-08 8:45 ` Junxian Huang
2024-07-08 8:59 ` Leon Romanovsky
0 siblings, 1 reply; 33+ messages in thread
From: Junxian Huang @ 2024-07-08 8:45 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On 2024/7/8 16:27, Leon Romanovsky wrote:
> On Mon, Jul 08, 2024 at 03:46:26PM +0800, Junxian Huang wrote:
>>
>>
>> On 2024/7/8 15:33, Leon Romanovsky wrote:
>>> On Mon, Jul 08, 2024 at 02:50:50PM +0800, Junxian Huang wrote:
>>>>
>>>>
>>>> On 2024/7/8 13:38, Leon Romanovsky wrote:
>>>>> On Mon, Jul 08, 2024 at 10:29:54AM +0800, Junxian Huang wrote:
>>>>>>
>>>>>>
>>>>>> On 2024/7/7 16:30, Leon Romanovsky wrote:
>>>>>>> On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
>>>>>>>> From: wenglianfa <wenglianfa@huawei.com>
>>>>>>>>
>>>>>>>> During reset, cmdq events won't be reported, leading to a long and
>>>>>>>> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
>>>>>>>> of reset.
>>>>>>>>
>>>>>>>> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
>>>>>>>> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
>>>>>>>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
>>>>>>>> ---
>>>>>>>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
>>>>>>>> 1 file changed, 18 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>> index a5d746a5cc68..ff135df1a761 100644
>>>>>>>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>>>>>>>>
>>>>>>>> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
>>>>>>>> }
>>>>>>>> +
>>>>>>>> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
>>>>>>>> +{
>>>>>>>> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
>>>>>>>> + int i;
>>>>>>>> +
>>>>>>>> + if (!hr_dev->cmd_mod)
>>>>>>>
>>>>>>> What prevents cmd_mod from being changed?
>>>>>>>
>>>>>>
>>>>>> It's set when the device is being initialized, and won't be changed after that.
>>>>>
>>>>> This is exactly the point, you are assuming that the device is already
>>>>> ininitialized or not initialized at all. What prevents hns_roce_v2_reset_notify_cmd()
>>>>> from being called in the middle of initialization?
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>> This is ensured by hns3 NIC driver.
>>>>
>>>> Initialization and reset of hns RoCE are both called by hns3. It will check the state
>>>> of RoCE device (see line 3798), and notify RoCE device to reset (hns_roce_v2_reset_notify_cmd()
>>>> is called) only if the RoCE device has been already initialized:
>>>
>>> So why do you have "if (!hr_dev->cmd_mod)" check in the code?
>>>
>>> Thanks
>>>
>>
>> cmd_mod indicates the mode of cmdq (0: poll mode, 1: event mode).
>> This patch only affects event mode because HW won't report events during reset.
>
> You set cmd_mod to 1 in hns_roce_hw_v2_init_instance() without any
> condition, I don't see when hns v2 IB device is created and continue
> to operate in polling mode.
>
> Thanks
>
Event mode is the default. In hns_roce_cmd_use_events(), if kcalloc() fails
then it'll be set to polling mode instead.
Junxian
>>
>> Junxian
>>
>>>>
>>>> 3791 static int hclge_notify_roce_client(struct hclge_dev *hdev,
>>>> 3792 enum hnae3_reset_notify_type type)
>>>> 3793 {
>>>> 3794 struct hnae3_handle *handle = &hdev->vport[0].roce;
>>>> 3795 struct hnae3_client *client = hdev->roce_client;
>>>> 3796 int ret;
>>>> 3797
>>>> 3798 if (!test_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state) || !client)
>>>> 3799 return 0;
>>>> 3800
>>>> 3801 if (!client->ops->reset_notify)
>>>> 3802 return -EOPNOTSUPP;
>>>> 3803
>>>> 3804 ret = client->ops->reset_notify(handle, type);
>>>> 3805 if (ret)
>>>> 3806 dev_err(&hdev->pdev->dev, "notify roce client failed %d(%d)",
>>>> 3807 type, ret);
>>>> 3808
>>>> 3809 return ret;
>>>> 3810 }
>>>>
>>>> And the bit is set (see line 11246) after the initialization has been done (line 11242):
>>>>
>>>> 11224 static int hclge_init_roce_client_instance(struct hnae3_ae_dev *ae_dev,
>>>> 11225 struct hclge_vport *vport)
>>>> 11226 {
>>>> 11227 struct hclge_dev *hdev = ae_dev->priv;
>>>> 11228 struct hnae3_client *client;
>>>> 11229 int rst_cnt;
>>>> 11230 int ret;
>>>> 11231
>>>> 11232 if (!hnae3_dev_roce_supported(hdev) || !hdev->roce_client ||
>>>> 11233 !hdev->nic_client)
>>>> 11234 return 0;
>>>> 11235
>>>> 11236 client = hdev->roce_client;
>>>> 11237 ret = hclge_init_roce_base_info(vport);
>>>> 11238 if (ret)
>>>> 11239 return ret;
>>>> 11240
>>>> 11241 rst_cnt = hdev->rst_stats.reset_cnt;
>>>> 11242 ret = client->ops->init_instance(&vport->roce);
>>>> 11243 if (ret)
>>>> 11244 return ret;
>>>> 11245
>>>> 11246 set_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state);
>>>> 11247 if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
>>>> 11248 rst_cnt != hdev->rst_stats.reset_cnt) {
>>>> 11249 ret = -EBUSY;
>>>> 11250 goto init_roce_err;
>>>> 11251 }
>>>>
>>>> Junxian
>>>>
>>>>>>
>>>>>> Junxian
>>>>>>
>>>>>>>> + return;
>>>>>>>> +
>>>>>>>> + for (i = 0; i < hr_cmd->max_cmds; i++) {
>>>>>>>> + hr_cmd->context[i].result = -EBUSY;
>>>>>>>> + complete(&hr_cmd->context[i].done);
>>>>>>>> + }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>>>>>>>> {
>>>>>>>> struct hns_roce_dev *hr_dev;
>>>>>>>> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>>>>>>>> hr_dev->dis_db = true;
>>>>>>>> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
>>>>>>>>
>>>>>>>> + /* Complete the CMDQ event in advance during the reset. */
>>>>>>>> + hns_roce_v2_reset_notify_cmd(hr_dev);
>>>>>>>> +
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.33.0
>>>>>>>>
>>>>>>
>>>>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-08 8:45 ` Junxian Huang
@ 2024-07-08 8:59 ` Leon Romanovsky
2024-07-08 9:30 ` Junxian Huang
0 siblings, 1 reply; 33+ messages in thread
From: Leon Romanovsky @ 2024-07-08 8:59 UTC (permalink / raw)
To: Junxian Huang; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On Mon, Jul 08, 2024 at 04:45:34PM +0800, Junxian Huang wrote:
>
>
> On 2024/7/8 16:27, Leon Romanovsky wrote:
> > On Mon, Jul 08, 2024 at 03:46:26PM +0800, Junxian Huang wrote:
> >>
> >>
> >> On 2024/7/8 15:33, Leon Romanovsky wrote:
> >>> On Mon, Jul 08, 2024 at 02:50:50PM +0800, Junxian Huang wrote:
> >>>>
> >>>>
> >>>> On 2024/7/8 13:38, Leon Romanovsky wrote:
> >>>>> On Mon, Jul 08, 2024 at 10:29:54AM +0800, Junxian Huang wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 2024/7/7 16:30, Leon Romanovsky wrote:
> >>>>>>> On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
> >>>>>>>> From: wenglianfa <wenglianfa@huawei.com>
> >>>>>>>>
> >>>>>>>> During reset, cmdq events won't be reported, leading to a long and
> >>>>>>>> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
> >>>>>>>> of reset.
> >>>>>>>>
> >>>>>>>> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
> >>>>>>>> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
> >>>>>>>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> >>>>>>>> ---
> >>>>>>>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
> >>>>>>>> 1 file changed, 18 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>>>>>> index a5d746a5cc68..ff135df1a761 100644
> >>>>>>>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>>>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>>>>>> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
> >>>>>>>>
> >>>>>>>> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
> >>>>>>>> }
> >>>>>>>> +
> >>>>>>>> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
> >>>>>>>> +{
> >>>>>>>> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
> >>>>>>>> + int i;
> >>>>>>>> +
> >>>>>>>> + if (!hr_dev->cmd_mod)
> >>>>>>>
> >>>>>>> What prevents cmd_mod from being changed?
> >>>>>>>
> >>>>>>
> >>>>>> It's set when the device is being initialized, and won't be changed after that.
> >>>>>
> >>>>> This is exactly the point, you are assuming that the device is already
> >>>>> ininitialized or not initialized at all. What prevents hns_roce_v2_reset_notify_cmd()
> >>>>> from being called in the middle of initialization?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>
> >>>> This is ensured by hns3 NIC driver.
> >>>>
> >>>> Initialization and reset of hns RoCE are both called by hns3. It will check the state
> >>>> of RoCE device (see line 3798), and notify RoCE device to reset (hns_roce_v2_reset_notify_cmd()
> >>>> is called) only if the RoCE device has been already initialized:
> >>>
> >>> So why do you have "if (!hr_dev->cmd_mod)" check in the code?
> >>>
> >>> Thanks
> >>>
> >>
> >> cmd_mod indicates the mode of cmdq (0: poll mode, 1: event mode).
> >> This patch only affects event mode because HW won't report events during reset.
> >
> > You set cmd_mod to 1 in hns_roce_hw_v2_init_instance() without any
> > condition, I don't see when hns v2 IB device is created and continue
> > to operate in polling mode.
> >
> > Thanks
> >
>
> Event mode is the default. In hns_roce_cmd_use_events(), if kcalloc() fails
> then it'll be set to polling mode instead.
1. Awesome, and we are returning back to the question. What prevents
hns_roce_v2_reset_notify_cmd() from being called in the middle of
changing cmd_mod from 1 to 0 and from 0 to 1?
2. This cmd_mode swtich from 1 to 0 should be removed. Failure to
allocate memory is not a reason to switch to polling mode. The reason
can be HW limitation, but not OS limitation.
Thanks
>
> Junxian
>
> >>
> >> Junxian
> >>
> >>>>
> >>>> 3791 static int hclge_notify_roce_client(struct hclge_dev *hdev,
> >>>> 3792 enum hnae3_reset_notify_type type)
> >>>> 3793 {
> >>>> 3794 struct hnae3_handle *handle = &hdev->vport[0].roce;
> >>>> 3795 struct hnae3_client *client = hdev->roce_client;
> >>>> 3796 int ret;
> >>>> 3797
> >>>> 3798 if (!test_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state) || !client)
> >>>> 3799 return 0;
> >>>> 3800
> >>>> 3801 if (!client->ops->reset_notify)
> >>>> 3802 return -EOPNOTSUPP;
> >>>> 3803
> >>>> 3804 ret = client->ops->reset_notify(handle, type);
> >>>> 3805 if (ret)
> >>>> 3806 dev_err(&hdev->pdev->dev, "notify roce client failed %d(%d)",
> >>>> 3807 type, ret);
> >>>> 3808
> >>>> 3809 return ret;
> >>>> 3810 }
> >>>>
> >>>> And the bit is set (see line 11246) after the initialization has been done (line 11242):
> >>>>
> >>>> 11224 static int hclge_init_roce_client_instance(struct hnae3_ae_dev *ae_dev,
> >>>> 11225 struct hclge_vport *vport)
> >>>> 11226 {
> >>>> 11227 struct hclge_dev *hdev = ae_dev->priv;
> >>>> 11228 struct hnae3_client *client;
> >>>> 11229 int rst_cnt;
> >>>> 11230 int ret;
> >>>> 11231
> >>>> 11232 if (!hnae3_dev_roce_supported(hdev) || !hdev->roce_client ||
> >>>> 11233 !hdev->nic_client)
> >>>> 11234 return 0;
> >>>> 11235
> >>>> 11236 client = hdev->roce_client;
> >>>> 11237 ret = hclge_init_roce_base_info(vport);
> >>>> 11238 if (ret)
> >>>> 11239 return ret;
> >>>> 11240
> >>>> 11241 rst_cnt = hdev->rst_stats.reset_cnt;
> >>>> 11242 ret = client->ops->init_instance(&vport->roce);
> >>>> 11243 if (ret)
> >>>> 11244 return ret;
> >>>> 11245
> >>>> 11246 set_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state);
> >>>> 11247 if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
> >>>> 11248 rst_cnt != hdev->rst_stats.reset_cnt) {
> >>>> 11249 ret = -EBUSY;
> >>>> 11250 goto init_roce_err;
> >>>> 11251 }
> >>>>
> >>>> Junxian
> >>>>
> >>>>>>
> >>>>>> Junxian
> >>>>>>
> >>>>>>>> + return;
> >>>>>>>> +
> >>>>>>>> + for (i = 0; i < hr_cmd->max_cmds; i++) {
> >>>>>>>> + hr_cmd->context[i].result = -EBUSY;
> >>>>>>>> + complete(&hr_cmd->context[i].done);
> >>>>>>>> + }
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> >>>>>>>> {
> >>>>>>>> struct hns_roce_dev *hr_dev;
> >>>>>>>> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> >>>>>>>> hr_dev->dis_db = true;
> >>>>>>>> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
> >>>>>>>>
> >>>>>>>> + /* Complete the CMDQ event in advance during the reset. */
> >>>>>>>> + hns_roce_v2_reset_notify_cmd(hr_dev);
> >>>>>>>> +
> >>>>>>>> return 0;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> 2.33.0
> >>>>>>>>
> >>>>>>
> >>>>
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-08 8:59 ` Leon Romanovsky
@ 2024-07-08 9:30 ` Junxian Huang
2024-07-08 11:16 ` Leon Romanovsky
0 siblings, 1 reply; 33+ messages in thread
From: Junxian Huang @ 2024-07-08 9:30 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On 2024/7/8 16:59, Leon Romanovsky wrote:
> On Mon, Jul 08, 2024 at 04:45:34PM +0800, Junxian Huang wrote:
>>
>>
>> On 2024/7/8 16:27, Leon Romanovsky wrote:
>>> On Mon, Jul 08, 2024 at 03:46:26PM +0800, Junxian Huang wrote:
>>>>
>>>>
>>>> On 2024/7/8 15:33, Leon Romanovsky wrote:
>>>>> On Mon, Jul 08, 2024 at 02:50:50PM +0800, Junxian Huang wrote:
>>>>>>
>>>>>>
>>>>>> On 2024/7/8 13:38, Leon Romanovsky wrote:
>>>>>>> On Mon, Jul 08, 2024 at 10:29:54AM +0800, Junxian Huang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024/7/7 16:30, Leon Romanovsky wrote:
>>>>>>>>> On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
>>>>>>>>>> From: wenglianfa <wenglianfa@huawei.com>
>>>>>>>>>>
>>>>>>>>>> During reset, cmdq events won't be reported, leading to a long and
>>>>>>>>>> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
>>>>>>>>>> of reset.
>>>>>>>>>>
>>>>>>>>>> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
>>>>>>>>>> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
>>>>>>>>>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
>>>>>>>>>> ---
>>>>>>>>>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
>>>>>>>>>> 1 file changed, 18 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>>>> index a5d746a5cc68..ff135df1a761 100644
>>>>>>>>>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>>>> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>>>>>>>>>>
>>>>>>>>>> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
>>>>>>>>>> }
>>>>>>>>>> +
>>>>>>>>>> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
>>>>>>>>>> +{
>>>>>>>>>> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
>>>>>>>>>> + int i;
>>>>>>>>>> +
>>>>>>>>>> + if (!hr_dev->cmd_mod)
>>>>>>>>>
>>>>>>>>> What prevents cmd_mod from being changed?
>>>>>>>>>
>>>>>>>>
>>>>>>>> It's set when the device is being initialized, and won't be changed after that.
>>>>>>>
>>>>>>> This is exactly the point, you are assuming that the device is already
>>>>>>> ininitialized or not initialized at all. What prevents hns_roce_v2_reset_notify_cmd()
>>>>>>> from being called in the middle of initialization?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>
>>>>>> This is ensured by hns3 NIC driver.
>>>>>>
>>>>>> Initialization and reset of hns RoCE are both called by hns3. It will check the state
>>>>>> of RoCE device (see line 3798), and notify RoCE device to reset (hns_roce_v2_reset_notify_cmd()
>>>>>> is called) only if the RoCE device has been already initialized:
>>>>>
>>>>> So why do you have "if (!hr_dev->cmd_mod)" check in the code?
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>> cmd_mod indicates the mode of cmdq (0: poll mode, 1: event mode).
>>>> This patch only affects event mode because HW won't report events during reset.
>>>
>>> You set cmd_mod to 1 in hns_roce_hw_v2_init_instance() without any
>>> condition, I don't see when hns v2 IB device is created and continue
>>> to operate in polling mode.
>>>
>>> Thanks
>>>
>>
>> Event mode is the default. In hns_roce_cmd_use_events(), if kcalloc() fails
>> then it'll be set to polling mode instead.
>
> 1. Awesome, and we are returning back to the question. What prevents
> hns_roce_v2_reset_notify_cmd() from being called in the middle of
> changing cmd_mod from 1 to 0 and from 0 to 1?
The changing of cmd_mod is during the initialization of a device. The call
of hns_roce_v2_reset_notify_cmd() is during reset. As I said previously,
the hns3 NIC driver ensures that there will be no concurrency between
initialization and reset, and therefore hns_roce_v2_reset_notify_cmd() won't
be called in the middle of changing cmd_mod.
> 2. This cmd_mode swtich from 1 to 0 should be removed. Failure to
> allocate memory is not a reason to switch to polling mode. The reason
> can be HW limitation, but not OS limitation.
>
But event mode relies on the allocated resource. If the allocation fails and
we don't switch to polling mode, the driver won't work any more. Are you suggesting
we should return an error and fail the initialization in this case?
Junxian
> Thanks
>
>>
>> Junxian
>>
>>>>
>>>> Junxian
>>>>
>>>>>>
>>>>>> 3791 static int hclge_notify_roce_client(struct hclge_dev *hdev,
>>>>>> 3792 enum hnae3_reset_notify_type type)
>>>>>> 3793 {
>>>>>> 3794 struct hnae3_handle *handle = &hdev->vport[0].roce;
>>>>>> 3795 struct hnae3_client *client = hdev->roce_client;
>>>>>> 3796 int ret;
>>>>>> 3797
>>>>>> 3798 if (!test_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state) || !client)
>>>>>> 3799 return 0;
>>>>>> 3800
>>>>>> 3801 if (!client->ops->reset_notify)
>>>>>> 3802 return -EOPNOTSUPP;
>>>>>> 3803
>>>>>> 3804 ret = client->ops->reset_notify(handle, type);
>>>>>> 3805 if (ret)
>>>>>> 3806 dev_err(&hdev->pdev->dev, "notify roce client failed %d(%d)",
>>>>>> 3807 type, ret);
>>>>>> 3808
>>>>>> 3809 return ret;
>>>>>> 3810 }
>>>>>>
>>>>>> And the bit is set (see line 11246) after the initialization has been done (line 11242):
>>>>>>
>>>>>> 11224 static int hclge_init_roce_client_instance(struct hnae3_ae_dev *ae_dev,
>>>>>> 11225 struct hclge_vport *vport)
>>>>>> 11226 {
>>>>>> 11227 struct hclge_dev *hdev = ae_dev->priv;
>>>>>> 11228 struct hnae3_client *client;
>>>>>> 11229 int rst_cnt;
>>>>>> 11230 int ret;
>>>>>> 11231
>>>>>> 11232 if (!hnae3_dev_roce_supported(hdev) || !hdev->roce_client ||
>>>>>> 11233 !hdev->nic_client)
>>>>>> 11234 return 0;
>>>>>> 11235
>>>>>> 11236 client = hdev->roce_client;
>>>>>> 11237 ret = hclge_init_roce_base_info(vport);
>>>>>> 11238 if (ret)
>>>>>> 11239 return ret;
>>>>>> 11240
>>>>>> 11241 rst_cnt = hdev->rst_stats.reset_cnt;
>>>>>> 11242 ret = client->ops->init_instance(&vport->roce);
>>>>>> 11243 if (ret)
>>>>>> 11244 return ret;
>>>>>> 11245
>>>>>> 11246 set_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state);
>>>>>> 11247 if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
>>>>>> 11248 rst_cnt != hdev->rst_stats.reset_cnt) {
>>>>>> 11249 ret = -EBUSY;
>>>>>> 11250 goto init_roce_err;
>>>>>> 11251 }
>>>>>>
>>>>>> Junxian
>>>>>>
>>>>>>>>
>>>>>>>> Junxian
>>>>>>>>
>>>>>>>>>> + return;
>>>>>>>>>> +
>>>>>>>>>> + for (i = 0; i < hr_cmd->max_cmds; i++) {
>>>>>>>>>> + hr_cmd->context[i].result = -EBUSY;
>>>>>>>>>> + complete(&hr_cmd->context[i].done);
>>>>>>>>>> + }
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>>>>>>>>>> {
>>>>>>>>>> struct hns_roce_dev *hr_dev;
>>>>>>>>>> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>>>>>>>>>> hr_dev->dis_db = true;
>>>>>>>>>> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
>>>>>>>>>>
>>>>>>>>>> + /* Complete the CMDQ event in advance during the reset. */
>>>>>>>>>> + hns_roce_v2_reset_notify_cmd(hr_dev);
>>>>>>>>>> +
>>>>>>>>>> return 0;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> 2.33.0
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-08 9:30 ` Junxian Huang
@ 2024-07-08 11:16 ` Leon Romanovsky
2024-07-09 6:21 ` Junxian Huang
0 siblings, 1 reply; 33+ messages in thread
From: Leon Romanovsky @ 2024-07-08 11:16 UTC (permalink / raw)
To: Junxian Huang; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On Mon, Jul 08, 2024 at 05:30:58PM +0800, Junxian Huang wrote:
>
>
> On 2024/7/8 16:59, Leon Romanovsky wrote:
> > On Mon, Jul 08, 2024 at 04:45:34PM +0800, Junxian Huang wrote:
> >>
> >>
> >> On 2024/7/8 16:27, Leon Romanovsky wrote:
> >>> On Mon, Jul 08, 2024 at 03:46:26PM +0800, Junxian Huang wrote:
> >>>>
> >>>>
> >>>> On 2024/7/8 15:33, Leon Romanovsky wrote:
> >>>>> On Mon, Jul 08, 2024 at 02:50:50PM +0800, Junxian Huang wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 2024/7/8 13:38, Leon Romanovsky wrote:
> >>>>>>> On Mon, Jul 08, 2024 at 10:29:54AM +0800, Junxian Huang wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2024/7/7 16:30, Leon Romanovsky wrote:
> >>>>>>>>> On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
> >>>>>>>>>> From: wenglianfa <wenglianfa@huawei.com>
> >>>>>>>>>>
> >>>>>>>>>> During reset, cmdq events won't be reported, leading to a long and
> >>>>>>>>>> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
> >>>>>>>>>> of reset.
> >>>>>>>>>>
> >>>>>>>>>> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
> >>>>>>>>>> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
> >>>>>>>>>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> >>>>>>>>>> ---
> >>>>>>>>>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
> >>>>>>>>>> 1 file changed, 18 insertions(+)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>>>>>>>> index a5d746a5cc68..ff135df1a761 100644
> >>>>>>>>>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>>>>>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>>>>>>>> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
> >>>>>>>>>>
> >>>>>>>>>> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
> >>>>>>>>>> }
> >>>>>>>>>> +
> >>>>>>>>>> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
> >>>>>>>>>> +{
> >>>>>>>>>> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
> >>>>>>>>>> + int i;
> >>>>>>>>>> +
> >>>>>>>>>> + if (!hr_dev->cmd_mod)
> >>>>>>>>>
> >>>>>>>>> What prevents cmd_mod from being changed?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> It's set when the device is being initialized, and won't be changed after that.
> >>>>>>>
> >>>>>>> This is exactly the point, you are assuming that the device is already
> >>>>>>> ininitialized or not initialized at all. What prevents hns_roce_v2_reset_notify_cmd()
> >>>>>>> from being called in the middle of initialization?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>
> >>>>>> This is ensured by hns3 NIC driver.
> >>>>>>
> >>>>>> Initialization and reset of hns RoCE are both called by hns3. It will check the state
> >>>>>> of RoCE device (see line 3798), and notify RoCE device to reset (hns_roce_v2_reset_notify_cmd()
> >>>>>> is called) only if the RoCE device has been already initialized:
> >>>>>
> >>>>> So why do you have "if (!hr_dev->cmd_mod)" check in the code?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>
> >>>> cmd_mod indicates the mode of cmdq (0: poll mode, 1: event mode).
> >>>> This patch only affects event mode because HW won't report events during reset.
> >>>
> >>> You set cmd_mod to 1 in hns_roce_hw_v2_init_instance() without any
> >>> condition, I don't see when hns v2 IB device is created and continue
> >>> to operate in polling mode.
> >>>
> >>> Thanks
> >>>
> >>
> >> Event mode is the default. In hns_roce_cmd_use_events(), if kcalloc() fails
> >> then it'll be set to polling mode instead.
> >
> > 1. Awesome, and we are returning back to the question. What prevents
> > hns_roce_v2_reset_notify_cmd() from being called in the middle of
> > changing cmd_mod from 1 to 0 and from 0 to 1?
>
> The changing of cmd_mod is during the initialization of a device. The call
> of hns_roce_v2_reset_notify_cmd() is during reset. As I said previously,
> the hns3 NIC driver ensures that there will be no concurrency between
> initialization and reset, and therefore hns_roce_v2_reset_notify_cmd() won't
> be called in the middle of changing cmd_mod.
>
> > 2. This cmd_mode swtich from 1 to 0 should be removed. Failure to
> > allocate memory is not a reason to switch to polling mode. The reason
> > can be HW limitation, but not OS limitation.
> >
>
> But event mode relies on the allocated resource. If the allocation fails and
> we don't switch to polling mode, the driver won't work any more. Are you suggesting
> we should return an error and fail the initialization in this case?
Yes, please.
Thanks
>
> Junxian
>
> > Thanks
> >
> >>
> >> Junxian
> >>
> >>>>
> >>>> Junxian
> >>>>
> >>>>>>
> >>>>>> 3791 static int hclge_notify_roce_client(struct hclge_dev *hdev,
> >>>>>> 3792 enum hnae3_reset_notify_type type)
> >>>>>> 3793 {
> >>>>>> 3794 struct hnae3_handle *handle = &hdev->vport[0].roce;
> >>>>>> 3795 struct hnae3_client *client = hdev->roce_client;
> >>>>>> 3796 int ret;
> >>>>>> 3797
> >>>>>> 3798 if (!test_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state) || !client)
> >>>>>> 3799 return 0;
> >>>>>> 3800
> >>>>>> 3801 if (!client->ops->reset_notify)
> >>>>>> 3802 return -EOPNOTSUPP;
> >>>>>> 3803
> >>>>>> 3804 ret = client->ops->reset_notify(handle, type);
> >>>>>> 3805 if (ret)
> >>>>>> 3806 dev_err(&hdev->pdev->dev, "notify roce client failed %d(%d)",
> >>>>>> 3807 type, ret);
> >>>>>> 3808
> >>>>>> 3809 return ret;
> >>>>>> 3810 }
> >>>>>>
> >>>>>> And the bit is set (see line 11246) after the initialization has been done (line 11242):
> >>>>>>
> >>>>>> 11224 static int hclge_init_roce_client_instance(struct hnae3_ae_dev *ae_dev,
> >>>>>> 11225 struct hclge_vport *vport)
> >>>>>> 11226 {
> >>>>>> 11227 struct hclge_dev *hdev = ae_dev->priv;
> >>>>>> 11228 struct hnae3_client *client;
> >>>>>> 11229 int rst_cnt;
> >>>>>> 11230 int ret;
> >>>>>> 11231
> >>>>>> 11232 if (!hnae3_dev_roce_supported(hdev) || !hdev->roce_client ||
> >>>>>> 11233 !hdev->nic_client)
> >>>>>> 11234 return 0;
> >>>>>> 11235
> >>>>>> 11236 client = hdev->roce_client;
> >>>>>> 11237 ret = hclge_init_roce_base_info(vport);
> >>>>>> 11238 if (ret)
> >>>>>> 11239 return ret;
> >>>>>> 11240
> >>>>>> 11241 rst_cnt = hdev->rst_stats.reset_cnt;
> >>>>>> 11242 ret = client->ops->init_instance(&vport->roce);
> >>>>>> 11243 if (ret)
> >>>>>> 11244 return ret;
> >>>>>> 11245
> >>>>>> 11246 set_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state);
> >>>>>> 11247 if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
> >>>>>> 11248 rst_cnt != hdev->rst_stats.reset_cnt) {
> >>>>>> 11249 ret = -EBUSY;
> >>>>>> 11250 goto init_roce_err;
> >>>>>> 11251 }
> >>>>>>
> >>>>>> Junxian
> >>>>>>
> >>>>>>>>
> >>>>>>>> Junxian
> >>>>>>>>
> >>>>>>>>>> + return;
> >>>>>>>>>> +
> >>>>>>>>>> + for (i = 0; i < hr_cmd->max_cmds; i++) {
> >>>>>>>>>> + hr_cmd->context[i].result = -EBUSY;
> >>>>>>>>>> + complete(&hr_cmd->context[i].done);
> >>>>>>>>>> + }
> >>>>>>>>>> +}
> >>>>>>>>>> +
> >>>>>>>>>> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> >>>>>>>>>> {
> >>>>>>>>>> struct hns_roce_dev *hr_dev;
> >>>>>>>>>> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> >>>>>>>>>> hr_dev->dis_db = true;
> >>>>>>>>>> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
> >>>>>>>>>>
> >>>>>>>>>> + /* Complete the CMDQ event in advance during the reset. */
> >>>>>>>>>> + hns_roce_v2_reset_notify_cmd(hr_dev);
> >>>>>>>>>> +
> >>>>>>>>>> return 0;
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> 2.33.0
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>
> >
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-08 11:16 ` Leon Romanovsky
@ 2024-07-09 6:21 ` Junxian Huang
2024-07-09 7:22 ` Leon Romanovsky
0 siblings, 1 reply; 33+ messages in thread
From: Junxian Huang @ 2024-07-09 6:21 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On 2024/7/8 19:16, Leon Romanovsky wrote:
> On Mon, Jul 08, 2024 at 05:30:58PM +0800, Junxian Huang wrote:
>>
>>
>> On 2024/7/8 16:59, Leon Romanovsky wrote:
>>> On Mon, Jul 08, 2024 at 04:45:34PM +0800, Junxian Huang wrote:
>>>>
>>>>
>>>> On 2024/7/8 16:27, Leon Romanovsky wrote:
>>>>> On Mon, Jul 08, 2024 at 03:46:26PM +0800, Junxian Huang wrote:
>>>>>>
>>>>>>
>>>>>> On 2024/7/8 15:33, Leon Romanovsky wrote:
>>>>>>> On Mon, Jul 08, 2024 at 02:50:50PM +0800, Junxian Huang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024/7/8 13:38, Leon Romanovsky wrote:
>>>>>>>>> On Mon, Jul 08, 2024 at 10:29:54AM +0800, Junxian Huang wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2024/7/7 16:30, Leon Romanovsky wrote:
>>>>>>>>>>> On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
>>>>>>>>>>>> From: wenglianfa <wenglianfa@huawei.com>
>>>>>>>>>>>>
>>>>>>>>>>>> During reset, cmdq events won't be reported, leading to a long and
>>>>>>>>>>>> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
>>>>>>>>>>>> of reset.
>>>>>>>>>>>>
>>>>>>>>>>>> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
>>>>>>>>>>>> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
>>>>>>>>>>>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
>>>>>>>>>>>> 1 file changed, 18 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>>>>>> index a5d746a5cc68..ff135df1a761 100644
>>>>>>>>>>>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>>>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>>>>>> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>>>>>>>>>>>>
>>>>>>>>>>>> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
>>>>>>>>>>>> }
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
>>>>>>>>>>>> +{
>>>>>>>>>>>> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
>>>>>>>>>>>> + int i;
>>>>>>>>>>>> +
>>>>>>>>>>>> + if (!hr_dev->cmd_mod)
>>>>>>>>>>>
>>>>>>>>>>> What prevents cmd_mod from being changed?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It's set when the device is being initialized, and won't be changed after that.
>>>>>>>>>
>>>>>>>>> This is exactly the point, you are assuming that the device is already
>>>>>>>>> ininitialized or not initialized at all. What prevents hns_roce_v2_reset_notify_cmd()
>>>>>>>>> from being called in the middle of initialization?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is ensured by hns3 NIC driver.
>>>>>>>>
>>>>>>>> Initialization and reset of hns RoCE are both called by hns3. It will check the state
>>>>>>>> of RoCE device (see line 3798), and notify RoCE device to reset (hns_roce_v2_reset_notify_cmd()
>>>>>>>> is called) only if the RoCE device has been already initialized:
>>>>>>>
>>>>>>> So why do you have "if (!hr_dev->cmd_mod)" check in the code?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>
>>>>>> cmd_mod indicates the mode of cmdq (0: poll mode, 1: event mode).
>>>>>> This patch only affects event mode because HW won't report events during reset.
>>>>>
>>>>> You set cmd_mod to 1 in hns_roce_hw_v2_init_instance() without any
>>>>> condition, I don't see when hns v2 IB device is created and continue
>>>>> to operate in polling mode.
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>> Event mode is the default. In hns_roce_cmd_use_events(), if kcalloc() fails
>>>> then it'll be set to polling mode instead.
>>>
>>> 1. Awesome, and we are returning back to the question. What prevents
>>> hns_roce_v2_reset_notify_cmd() from being called in the middle of
>>> changing cmd_mod from 1 to 0 and from 0 to 1?
>>
>> The changing of cmd_mod is during the initialization of a device. The call
>> of hns_roce_v2_reset_notify_cmd() is during reset. As I said previously,
>> the hns3 NIC driver ensures that there will be no concurrency between
>> initialization and reset, and therefore hns_roce_v2_reset_notify_cmd() won't
>> be called in the middle of changing cmd_mod.
>>
>>> 2. This cmd_mode swtich from 1 to 0 should be removed. Failure to
>>> allocate memory is not a reason to switch to polling mode. The reason
>>> can be HW limitation, but not OS limitation.
>>>
>>
>> But event mode relies on the allocated resource. If the allocation fails and
>> we don't switch to polling mode, the driver won't work any more. Are you suggesting
>> we should return an error and fail the initialization in this case?
>
> Yes, please.
The reason of switching cmd_mod is that we try to keep the driver available,
even if the allocation of event mode resources fails. We don't consider this
as a critical error that should lead to an initialization failure. The driver
can still post mailbox and provide normal functionality in this case.
Our discussion seems to have strayed a bit away?This patch doesn't involve
polling mode.
Junxian
>
> Thanks
>
>>
>> Junxian
>>
>>> Thanks
>>>
>>>>
>>>> Junxian
>>>>
>>>>>>
>>>>>> Junxian
>>>>>>
>>>>>>>>
>>>>>>>> 3791 static int hclge_notify_roce_client(struct hclge_dev *hdev,
>>>>>>>> 3792 enum hnae3_reset_notify_type type)
>>>>>>>> 3793 {
>>>>>>>> 3794 struct hnae3_handle *handle = &hdev->vport[0].roce;
>>>>>>>> 3795 struct hnae3_client *client = hdev->roce_client;
>>>>>>>> 3796 int ret;
>>>>>>>> 3797
>>>>>>>> 3798 if (!test_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state) || !client)
>>>>>>>> 3799 return 0;
>>>>>>>> 3800
>>>>>>>> 3801 if (!client->ops->reset_notify)
>>>>>>>> 3802 return -EOPNOTSUPP;
>>>>>>>> 3803
>>>>>>>> 3804 ret = client->ops->reset_notify(handle, type);
>>>>>>>> 3805 if (ret)
>>>>>>>> 3806 dev_err(&hdev->pdev->dev, "notify roce client failed %d(%d)",
>>>>>>>> 3807 type, ret);
>>>>>>>> 3808
>>>>>>>> 3809 return ret;
>>>>>>>> 3810 }
>>>>>>>>
>>>>>>>> And the bit is set (see line 11246) after the initialization has been done (line 11242):
>>>>>>>>
>>>>>>>> 11224 static int hclge_init_roce_client_instance(struct hnae3_ae_dev *ae_dev,
>>>>>>>> 11225 struct hclge_vport *vport)
>>>>>>>> 11226 {
>>>>>>>> 11227 struct hclge_dev *hdev = ae_dev->priv;
>>>>>>>> 11228 struct hnae3_client *client;
>>>>>>>> 11229 int rst_cnt;
>>>>>>>> 11230 int ret;
>>>>>>>> 11231
>>>>>>>> 11232 if (!hnae3_dev_roce_supported(hdev) || !hdev->roce_client ||
>>>>>>>> 11233 !hdev->nic_client)
>>>>>>>> 11234 return 0;
>>>>>>>> 11235
>>>>>>>> 11236 client = hdev->roce_client;
>>>>>>>> 11237 ret = hclge_init_roce_base_info(vport);
>>>>>>>> 11238 if (ret)
>>>>>>>> 11239 return ret;
>>>>>>>> 11240
>>>>>>>> 11241 rst_cnt = hdev->rst_stats.reset_cnt;
>>>>>>>> 11242 ret = client->ops->init_instance(&vport->roce);
>>>>>>>> 11243 if (ret)
>>>>>>>> 11244 return ret;
>>>>>>>> 11245
>>>>>>>> 11246 set_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state);
>>>>>>>> 11247 if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
>>>>>>>> 11248 rst_cnt != hdev->rst_stats.reset_cnt) {
>>>>>>>> 11249 ret = -EBUSY;
>>>>>>>> 11250 goto init_roce_err;
>>>>>>>> 11251 }
>>>>>>>>
>>>>>>>> Junxian
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Junxian
>>>>>>>>>>
>>>>>>>>>>>> + return;
>>>>>>>>>>>> +
>>>>>>>>>>>> + for (i = 0; i < hr_cmd->max_cmds; i++) {
>>>>>>>>>>>> + hr_cmd->context[i].result = -EBUSY;
>>>>>>>>>>>> + complete(&hr_cmd->context[i].done);
>>>>>>>>>>>> + }
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>>>>>>>>>>>> {
>>>>>>>>>>>> struct hns_roce_dev *hr_dev;
>>>>>>>>>>>> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>>>>>>>>>>>> hr_dev->dis_db = true;
>>>>>>>>>>>> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
>>>>>>>>>>>>
>>>>>>>>>>>> + /* Complete the CMDQ event in advance during the reset. */
>>>>>>>>>>>> + hns_roce_v2_reset_notify_cmd(hr_dev);
>>>>>>>>>>>> +
>>>>>>>>>>>> return 0;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.33.0
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-09 6:21 ` Junxian Huang
@ 2024-07-09 7:22 ` Leon Romanovsky
2024-07-09 7:49 ` Junxian Huang
0 siblings, 1 reply; 33+ messages in thread
From: Leon Romanovsky @ 2024-07-09 7:22 UTC (permalink / raw)
To: Junxian Huang; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On Tue, Jul 09, 2024 at 02:21:31PM +0800, Junxian Huang wrote:
>
>
> On 2024/7/8 19:16, Leon Romanovsky wrote:
> > On Mon, Jul 08, 2024 at 05:30:58PM +0800, Junxian Huang wrote:
> >>
> >>
> >> On 2024/7/8 16:59, Leon Romanovsky wrote:
> >>> On Mon, Jul 08, 2024 at 04:45:34PM +0800, Junxian Huang wrote:
> >>>>
> >>>>
> >>>> On 2024/7/8 16:27, Leon Romanovsky wrote:
> >>>>> On Mon, Jul 08, 2024 at 03:46:26PM +0800, Junxian Huang wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 2024/7/8 15:33, Leon Romanovsky wrote:
> >>>>>>> On Mon, Jul 08, 2024 at 02:50:50PM +0800, Junxian Huang wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2024/7/8 13:38, Leon Romanovsky wrote:
> >>>>>>>>> On Mon, Jul 08, 2024 at 10:29:54AM +0800, Junxian Huang wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 2024/7/7 16:30, Leon Romanovsky wrote:
> >>>>>>>>>>> On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
> >>>>>>>>>>>> From: wenglianfa <wenglianfa@huawei.com>
> >>>>>>>>>>>>
> >>>>>>>>>>>> During reset, cmdq events won't be reported, leading to a long and
> >>>>>>>>>>>> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
> >>>>>>>>>>>> of reset.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
> >>>>>>>>>>>> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
> >>>>>>>>>>>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
> >>>>>>>>>>>> 1 file changed, 18 insertions(+)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>>>>>>>>>> index a5d746a5cc68..ff135df1a761 100644
> >>>>>>>>>>>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>>>>>>>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >>>>>>>>>>>> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
> >>>>>>>>>>>>
> >>>>>>>>>>>> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
> >>>>>>>>>>>> }
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
> >>>>>>>>>>>> + int i;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + if (!hr_dev->cmd_mod)
> >>>>>>>>>>>
> >>>>>>>>>>> What prevents cmd_mod from being changed?
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> It's set when the device is being initialized, and won't be changed after that.
> >>>>>>>>>
> >>>>>>>>> This is exactly the point, you are assuming that the device is already
> >>>>>>>>> ininitialized or not initialized at all. What prevents hns_roce_v2_reset_notify_cmd()
> >>>>>>>>> from being called in the middle of initialization?
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> This is ensured by hns3 NIC driver.
> >>>>>>>>
> >>>>>>>> Initialization and reset of hns RoCE are both called by hns3. It will check the state
> >>>>>>>> of RoCE device (see line 3798), and notify RoCE device to reset (hns_roce_v2_reset_notify_cmd()
> >>>>>>>> is called) only if the RoCE device has been already initialized:
> >>>>>>>
> >>>>>>> So why do you have "if (!hr_dev->cmd_mod)" check in the code?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>
> >>>>>> cmd_mod indicates the mode of cmdq (0: poll mode, 1: event mode).
> >>>>>> This patch only affects event mode because HW won't report events during reset.
> >>>>>
> >>>>> You set cmd_mod to 1 in hns_roce_hw_v2_init_instance() without any
> >>>>> condition, I don't see when hns v2 IB device is created and continue
> >>>>> to operate in polling mode.
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>
> >>>> Event mode is the default. In hns_roce_cmd_use_events(), if kcalloc() fails
> >>>> then it'll be set to polling mode instead.
> >>>
> >>> 1. Awesome, and we are returning back to the question. What prevents
> >>> hns_roce_v2_reset_notify_cmd() from being called in the middle of
> >>> changing cmd_mod from 1 to 0 and from 0 to 1?
> >>
> >> The changing of cmd_mod is during the initialization of a device. The call
> >> of hns_roce_v2_reset_notify_cmd() is during reset. As I said previously,
> >> the hns3 NIC driver ensures that there will be no concurrency between
> >> initialization and reset, and therefore hns_roce_v2_reset_notify_cmd() won't
> >> be called in the middle of changing cmd_mod.
> >>
> >>> 2. This cmd_mode swtich from 1 to 0 should be removed. Failure to
> >>> allocate memory is not a reason to switch to polling mode. The reason
> >>> can be HW limitation, but not OS limitation.
> >>>
> >>
> >> But event mode relies on the allocated resource. If the allocation fails and
> >> we don't switch to polling mode, the driver won't work any more. Are you suggesting
> >> we should return an error and fail the initialization in this case?
> >
> > Yes, please.
>
> The reason of switching cmd_mod is that we try to keep the driver available,
> even if the allocation of event mode resources fails. We don't consider this
> as a critical error that should lead to an initialization failure. The driver
> can still post mailbox and provide normal functionality in this case.
Driver that failed to allocate memory in the middle of initialization
can't be considered as "normal".
>
> Our discussion seems to have strayed a bit away?This patch doesn't involve
> polling mode.
As long as patch has this line "if (!hr_dev->cmd_mod)", this discussion
is related to polling mode.
Thanks
>
> Junxian
>
> >
> > Thanks
> >
> >>
> >> Junxian
> >>
> >>> Thanks
> >>>
> >>>>
> >>>> Junxian
> >>>>
> >>>>>>
> >>>>>> Junxian
> >>>>>>
> >>>>>>>>
> >>>>>>>> 3791 static int hclge_notify_roce_client(struct hclge_dev *hdev,
> >>>>>>>> 3792 enum hnae3_reset_notify_type type)
> >>>>>>>> 3793 {
> >>>>>>>> 3794 struct hnae3_handle *handle = &hdev->vport[0].roce;
> >>>>>>>> 3795 struct hnae3_client *client = hdev->roce_client;
> >>>>>>>> 3796 int ret;
> >>>>>>>> 3797
> >>>>>>>> 3798 if (!test_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state) || !client)
> >>>>>>>> 3799 return 0;
> >>>>>>>> 3800
> >>>>>>>> 3801 if (!client->ops->reset_notify)
> >>>>>>>> 3802 return -EOPNOTSUPP;
> >>>>>>>> 3803
> >>>>>>>> 3804 ret = client->ops->reset_notify(handle, type);
> >>>>>>>> 3805 if (ret)
> >>>>>>>> 3806 dev_err(&hdev->pdev->dev, "notify roce client failed %d(%d)",
> >>>>>>>> 3807 type, ret);
> >>>>>>>> 3808
> >>>>>>>> 3809 return ret;
> >>>>>>>> 3810 }
> >>>>>>>>
> >>>>>>>> And the bit is set (see line 11246) after the initialization has been done (line 11242):
> >>>>>>>>
> >>>>>>>> 11224 static int hclge_init_roce_client_instance(struct hnae3_ae_dev *ae_dev,
> >>>>>>>> 11225 struct hclge_vport *vport)
> >>>>>>>> 11226 {
> >>>>>>>> 11227 struct hclge_dev *hdev = ae_dev->priv;
> >>>>>>>> 11228 struct hnae3_client *client;
> >>>>>>>> 11229 int rst_cnt;
> >>>>>>>> 11230 int ret;
> >>>>>>>> 11231
> >>>>>>>> 11232 if (!hnae3_dev_roce_supported(hdev) || !hdev->roce_client ||
> >>>>>>>> 11233 !hdev->nic_client)
> >>>>>>>> 11234 return 0;
> >>>>>>>> 11235
> >>>>>>>> 11236 client = hdev->roce_client;
> >>>>>>>> 11237 ret = hclge_init_roce_base_info(vport);
> >>>>>>>> 11238 if (ret)
> >>>>>>>> 11239 return ret;
> >>>>>>>> 11240
> >>>>>>>> 11241 rst_cnt = hdev->rst_stats.reset_cnt;
> >>>>>>>> 11242 ret = client->ops->init_instance(&vport->roce);
> >>>>>>>> 11243 if (ret)
> >>>>>>>> 11244 return ret;
> >>>>>>>> 11245
> >>>>>>>> 11246 set_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state);
> >>>>>>>> 11247 if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
> >>>>>>>> 11248 rst_cnt != hdev->rst_stats.reset_cnt) {
> >>>>>>>> 11249 ret = -EBUSY;
> >>>>>>>> 11250 goto init_roce_err;
> >>>>>>>> 11251 }
> >>>>>>>>
> >>>>>>>> Junxian
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Junxian
> >>>>>>>>>>
> >>>>>>>>>>>> + return;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + for (i = 0; i < hr_cmd->max_cmds; i++) {
> >>>>>>>>>>>> + hr_cmd->context[i].result = -EBUSY;
> >>>>>>>>>>>> + complete(&hr_cmd->context[i].done);
> >>>>>>>>>>>> + }
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> >>>>>>>>>>>> {
> >>>>>>>>>>>> struct hns_roce_dev *hr_dev;
> >>>>>>>>>>>> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> >>>>>>>>>>>> hr_dev->dis_db = true;
> >>>>>>>>>>>> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
> >>>>>>>>>>>>
> >>>>>>>>>>>> + /* Complete the CMDQ event in advance during the reset. */
> >>>>>>>>>>>> + hns_roce_v2_reset_notify_cmd(hr_dev);
> >>>>>>>>>>>> +
> >>>>>>>>>>>> return 0;
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> 2.33.0
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>
> >>>
> >>
> >
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset
2024-07-09 7:22 ` Leon Romanovsky
@ 2024-07-09 7:49 ` Junxian Huang
0 siblings, 0 replies; 33+ messages in thread
From: Junxian Huang @ 2024-07-09 7:49 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: jgg, linux-rdma, linuxarm, linux-kernel
On 2024/7/9 15:22, Leon Romanovsky wrote:
> On Tue, Jul 09, 2024 at 02:21:31PM +0800, Junxian Huang wrote:
>>
>>
>> On 2024/7/8 19:16, Leon Romanovsky wrote:
>>> On Mon, Jul 08, 2024 at 05:30:58PM +0800, Junxian Huang wrote:
>>>>
>>>>
>>>> On 2024/7/8 16:59, Leon Romanovsky wrote:
>>>>> On Mon, Jul 08, 2024 at 04:45:34PM +0800, Junxian Huang wrote:
>>>>>>
>>>>>>
>>>>>> On 2024/7/8 16:27, Leon Romanovsky wrote:
>>>>>>> On Mon, Jul 08, 2024 at 03:46:26PM +0800, Junxian Huang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024/7/8 15:33, Leon Romanovsky wrote:
>>>>>>>>> On Mon, Jul 08, 2024 at 02:50:50PM +0800, Junxian Huang wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2024/7/8 13:38, Leon Romanovsky wrote:
>>>>>>>>>>> On Mon, Jul 08, 2024 at 10:29:54AM +0800, Junxian Huang wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2024/7/7 16:30, Leon Romanovsky wrote:
>>>>>>>>>>>>> On Fri, Jul 05, 2024 at 04:59:30PM +0800, Junxian Huang wrote:
>>>>>>>>>>>>>> From: wenglianfa <wenglianfa@huawei.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> During reset, cmdq events won't be reported, leading to a long and
>>>>>>>>>>>>>> unnecessary wait. Notify all the cmdqs to stop waiting at the beginning
>>>>>>>>>>>>>> of reset.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
>>>>>>>>>>>>>> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
>>>>>>>>>>>>>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 18 ++++++++++++++++++
>>>>>>>>>>>>>> 1 file changed, 18 insertions(+)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>>>>>>>> index a5d746a5cc68..ff135df1a761 100644
>>>>>>>>>>>>>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>>>>>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>>>>>>>> @@ -6977,6 +6977,21 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +static void hns_roce_v2_reset_notify_cmd(struct hns_roce_dev *hr_dev)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> + struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
>>>>>>>>>>>>>> + int i;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + if (!hr_dev->cmd_mod)
>>>>>>>>>>>>>
>>>>>>>>>>>>> What prevents cmd_mod from being changed?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> It's set when the device is being initialized, and won't be changed after that.
>>>>>>>>>>>
>>>>>>>>>>> This is exactly the point, you are assuming that the device is already
>>>>>>>>>>> ininitialized or not initialized at all. What prevents hns_roce_v2_reset_notify_cmd()
>>>>>>>>>>> from being called in the middle of initialization?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is ensured by hns3 NIC driver.
>>>>>>>>>>
>>>>>>>>>> Initialization and reset of hns RoCE are both called by hns3. It will check the state
>>>>>>>>>> of RoCE device (see line 3798), and notify RoCE device to reset (hns_roce_v2_reset_notify_cmd()
>>>>>>>>>> is called) only if the RoCE device has been already initialized:
>>>>>>>>>
>>>>>>>>> So why do you have "if (!hr_dev->cmd_mod)" check in the code?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>
>>>>>>>> cmd_mod indicates the mode of cmdq (0: poll mode, 1: event mode).
>>>>>>>> This patch only affects event mode because HW won't report events during reset.
>>>>>>>
>>>>>>> You set cmd_mod to 1 in hns_roce_hw_v2_init_instance() without any
>>>>>>> condition, I don't see when hns v2 IB device is created and continue
>>>>>>> to operate in polling mode.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>
>>>>>> Event mode is the default. In hns_roce_cmd_use_events(), if kcalloc() fails
>>>>>> then it'll be set to polling mode instead.
>>>>>
>>>>> 1. Awesome, and we are returning back to the question. What prevents
>>>>> hns_roce_v2_reset_notify_cmd() from being called in the middle of
>>>>> changing cmd_mod from 1 to 0 and from 0 to 1?
>>>>
>>>> The changing of cmd_mod is during the initialization of a device. The call
>>>> of hns_roce_v2_reset_notify_cmd() is during reset. As I said previously,
>>>> the hns3 NIC driver ensures that there will be no concurrency between
>>>> initialization and reset, and therefore hns_roce_v2_reset_notify_cmd() won't
>>>> be called in the middle of changing cmd_mod.
>>>>
>>>>> 2. This cmd_mode swtich from 1 to 0 should be removed. Failure to
>>>>> allocate memory is not a reason to switch to polling mode. The reason
>>>>> can be HW limitation, but not OS limitation.
>>>>>
>>>>
>>>> But event mode relies on the allocated resource. If the allocation fails and
>>>> we don't switch to polling mode, the driver won't work any more. Are you suggesting
>>>> we should return an error and fail the initialization in this case?
>>>
>>> Yes, please.
>>
>> The reason of switching cmd_mod is that we try to keep the driver available,
>> even if the allocation of event mode resources fails. We don't consider this
>> as a critical error that should lead to an initialization failure. The driver
>> can still post mailbox and provide normal functionality in this case.
>
> Driver that failed to allocate memory in the middle of initialization
> can't be considered as "normal".
>
>>
>> Our discussion seems to have strayed a bit away?This patch doesn't involve
>> polling mode.
>
> As long as patch has this line "if (!hr_dev->cmd_mod)", this discussion
> is related to polling mode.
>
> Thanks
>
Okay, I'll drop this patch from this patchset in v2 and fix the cmd_mod issues
in another patchset later.
Thanks,
Junxian
>>
>> Junxian
>>
>>>
>>> Thanks
>>>
>>>>
>>>> Junxian
>>>>
>>>>> Thanks
>>>>>
>>>>>>
>>>>>> Junxian
>>>>>>
>>>>>>>>
>>>>>>>> Junxian
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 3791 static int hclge_notify_roce_client(struct hclge_dev *hdev,
>>>>>>>>>> 3792 enum hnae3_reset_notify_type type)
>>>>>>>>>> 3793 {
>>>>>>>>>> 3794 struct hnae3_handle *handle = &hdev->vport[0].roce;
>>>>>>>>>> 3795 struct hnae3_client *client = hdev->roce_client;
>>>>>>>>>> 3796 int ret;
>>>>>>>>>> 3797
>>>>>>>>>> 3798 if (!test_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state) || !client)
>>>>>>>>>> 3799 return 0;
>>>>>>>>>> 3800
>>>>>>>>>> 3801 if (!client->ops->reset_notify)
>>>>>>>>>> 3802 return -EOPNOTSUPP;
>>>>>>>>>> 3803
>>>>>>>>>> 3804 ret = client->ops->reset_notify(handle, type);
>>>>>>>>>> 3805 if (ret)
>>>>>>>>>> 3806 dev_err(&hdev->pdev->dev, "notify roce client failed %d(%d)",
>>>>>>>>>> 3807 type, ret);
>>>>>>>>>> 3808
>>>>>>>>>> 3809 return ret;
>>>>>>>>>> 3810 }
>>>>>>>>>>
>>>>>>>>>> And the bit is set (see line 11246) after the initialization has been done (line 11242):
>>>>>>>>>>
>>>>>>>>>> 11224 static int hclge_init_roce_client_instance(struct hnae3_ae_dev *ae_dev,
>>>>>>>>>> 11225 struct hclge_vport *vport)
>>>>>>>>>> 11226 {
>>>>>>>>>> 11227 struct hclge_dev *hdev = ae_dev->priv;
>>>>>>>>>> 11228 struct hnae3_client *client;
>>>>>>>>>> 11229 int rst_cnt;
>>>>>>>>>> 11230 int ret;
>>>>>>>>>> 11231
>>>>>>>>>> 11232 if (!hnae3_dev_roce_supported(hdev) || !hdev->roce_client ||
>>>>>>>>>> 11233 !hdev->nic_client)
>>>>>>>>>> 11234 return 0;
>>>>>>>>>> 11235
>>>>>>>>>> 11236 client = hdev->roce_client;
>>>>>>>>>> 11237 ret = hclge_init_roce_base_info(vport);
>>>>>>>>>> 11238 if (ret)
>>>>>>>>>> 11239 return ret;
>>>>>>>>>> 11240
>>>>>>>>>> 11241 rst_cnt = hdev->rst_stats.reset_cnt;
>>>>>>>>>> 11242 ret = client->ops->init_instance(&vport->roce);
>>>>>>>>>> 11243 if (ret)
>>>>>>>>>> 11244 return ret;
>>>>>>>>>> 11245
>>>>>>>>>> 11246 set_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state);
>>>>>>>>>> 11247 if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
>>>>>>>>>> 11248 rst_cnt != hdev->rst_stats.reset_cnt) {
>>>>>>>>>> 11249 ret = -EBUSY;
>>>>>>>>>> 11250 goto init_roce_err;
>>>>>>>>>> 11251 }
>>>>>>>>>>
>>>>>>>>>> Junxian
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Junxian
>>>>>>>>>>>>
>>>>>>>>>>>>>> + return;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + for (i = 0; i < hr_cmd->max_cmds; i++) {
>>>>>>>>>>>>>> + hr_cmd->context[i].result = -EBUSY;
>>>>>>>>>>>>>> + complete(&hr_cmd->context[i].done);
>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> struct hns_roce_dev *hr_dev;
>>>>>>>>>>>>>> @@ -6997,6 +7012,9 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>>>>>>>>>>>>>> hr_dev->dis_db = true;
>>>>>>>>>>>>>> hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + /* Complete the CMDQ event in advance during the reset. */
>>>>>>>>>>>>>> + hns_roce_v2_reset_notify_cmd(hr_dev);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> 2.33.0
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2024-07-09 7:49 UTC | newest]
Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-05 8:59 [PATCH for-rc 0/9] RDMA/hns: Bugfixes Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 1/9] RDMA/hns: Check atomic wr length Junxian Huang
2024-07-07 8:24 ` Leon Romanovsky
2024-07-08 2:27 ` Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 2/9] RDMA/hns: Fix a long wait for cmdq event during reset Junxian Huang
2024-07-07 8:30 ` Leon Romanovsky
2024-07-08 2:29 ` Junxian Huang
2024-07-08 5:38 ` Leon Romanovsky
2024-07-08 6:50 ` Junxian Huang
2024-07-08 7:33 ` Leon Romanovsky
2024-07-08 7:46 ` Junxian Huang
2024-07-08 8:27 ` Leon Romanovsky
2024-07-08 8:45 ` Junxian Huang
2024-07-08 8:59 ` Leon Romanovsky
2024-07-08 9:30 ` Junxian Huang
2024-07-08 11:16 ` Leon Romanovsky
2024-07-09 6:21 ` Junxian Huang
2024-07-09 7:22 ` Leon Romanovsky
2024-07-09 7:49 ` Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 3/9] RDMA/hns: Fix soft lockup under heavy CEQE load Junxian Huang
2024-07-05 10:47 ` Zhu Yanjun
2024-07-08 2:30 ` Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 4/9] RDMA/hns: Fix unmatch exception handling when init eq table fails Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 5/9] RDMA/hns: Fix missing pagesize and alignment check in FRMR Junxian Huang
2024-07-07 9:16 ` Zhu Yanjun
2024-07-08 2:44 ` Junxian Huang
2024-07-08 5:41 ` Leon Romanovsky
2024-07-08 7:57 ` Zhu Yanjun
2024-07-08 8:33 ` Leon Romanovsky
2024-07-05 8:59 ` [PATCH for-rc 6/9] RDMA/hns: Fix shift-out-bounds when max_inline_data is 0 Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 7/9] RDMA/hns: Fix undifined behavior caused by invalid max_sge Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 8/9] RDMA/hns: Fix insufficient extend DB for VFs Junxian Huang
2024-07-05 8:59 ` [PATCH for-rc 9/9] RDMA/hns: Fix mbx timing out before CMD execution is completed Junxian Huang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).