linux-rdma.vger.kernel.org archive mirror
* [PATCH mlx5-next 00/10] Collection of ODP fixes
@ 2018-11-08 19:10 Leon Romanovsky
  2018-11-08 19:10 ` [PATCH mlx5-next 01/10] net/mlx5: Release resource on error flow Leon Romanovsky
                   ` (11 more replies)
  0 siblings, 12 replies; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-08 19:10 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev

From: Leon Romanovsky <leonro@mellanox.com>

Hi,

This is a collection of fixes to the mlx5_core and mlx5_ib ODP logic.
There are two main reasons why we decided to send it to mlx5-next
rather than to rdma-rc or net, as we do with our other fixes.

1. We have a large number of patches in exactly this area that are ready
for submission, and we don't want to create extra merge work for the
maintainers because of that. Among those future series (already written
and tested): an internal change to mlx5_core pagefault handling, moving
the ODP code to RDMA, a change in the EQ mechanism, simplification and
moving of SRQ QP to RDMA, extra fixes to mlx5_ib ODP, and ODP SRQ
support.

2. Most of these fixes are for old bugs, so there is no urgency to get
them in right now (in -rc1/-rc2).

Thanks

Moni Shoua (10):
  net/mlx5: Release resource on error flow
  IB/mlx5: Avoid hangs due to a missing completion
  net/mlx5: Add interface to hold and release core resources
  net/mlx5: Enumerate page fault types
  IB/mlx5: Lock QP during page fault handling
  net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state
  net/mlx5: Use multi threaded workqueue for page fault handling
  IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously
  net/mlx5: Remove unused function
  IB/mlx5: Improve ODP debugging messages

 drivers/infiniband/core/umem_odp.c            |  14 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |   1 +
 drivers/infiniband/hw/mlx5/mr.c               |  15 +-
 drivers/infiniband/hw/mlx5/odp.c              | 158 ++++++++++++++----
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  21 +--
 drivers/net/ethernet/mellanox/mlx5/core/qp.c  |  20 ++-
 include/linux/mlx5/device.h                   |   7 +
 include/linux/mlx5/driver.h                   |   7 +-
 include/linux/mlx5/qp.h                       |   5 +
 10 files changed, 191 insertions(+), 59 deletions(-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH mlx5-next 01/10] net/mlx5: Release resource on error flow
  2018-11-08 19:10 [PATCH mlx5-next 00/10] Collection of ODP fixes Leon Romanovsky
@ 2018-11-08 19:10 ` Leon Romanovsky
  2018-11-08 19:10 ` [PATCH mlx5-next 02/10] IB/mlx5: Avoid hangs due to a missing completion Leon Romanovsky
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-08 19:10 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev

From: Moni Shoua <monis@mellanox.com>

Fix a reference-count leak that occurs when the event handler aborts
because the event type is not supported for the resource.

Fixes: a14c2d4beee5 ("net/mlx5_core: Warn on unsupported events of QP/RQ/SQ")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index 91b8139a388d..690dc1dd9391 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -132,7 +132,7 @@ void mlx5_rsc_event(struct mlx5_core_dev *dev, u32 rsn, int event_type)
 	if (!is_event_type_allowed((rsn >> MLX5_USER_INDEX_LEN), event_type)) {
 		mlx5_core_warn(dev, "event 0x%.2x is not allowed on resource 0x%.8x\n",
 			       event_type, rsn);
-		return;
+		goto out;
 	}
 
 	switch (common->res) {
@@ -150,7 +150,7 @@ void mlx5_rsc_event(struct mlx5_core_dev *dev, u32 rsn, int event_type)
 	default:
 		mlx5_core_warn(dev, "invalid resource type for 0x%x\n", rsn);
 	}
-
+out:
 	mlx5_core_put_rsc(common);
 }
 
-- 
2.19.1


* [PATCH mlx5-next 02/10] IB/mlx5: Avoid hangs due to a missing completion
  2018-11-08 19:10 [PATCH mlx5-next 00/10] Collection of ODP fixes Leon Romanovsky
  2018-11-08 19:10 ` [PATCH mlx5-next 01/10] net/mlx5: Release resource on error flow Leon Romanovsky
@ 2018-11-08 19:10 ` Leon Romanovsky
  2018-11-08 19:44   ` Jason Gunthorpe
  2018-11-08 19:10 ` [PATCH mlx5-next 03/10] net/mlx5: Add interface to hold and release core resources Leon Romanovsky
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-08 19:10 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev

From: Moni Shoua <monis@mellanox.com>

Fix two flows that may cause a process to hang in wait_for_completion():

1. The callback for the create MKEY command returns with a bad status.
2. The callback for the create MKEY command runs before the issuer of
the command calls wait_for_completion().

The following call trace might be triggered in the above flows:

INFO: task echo_server:1655 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
echo_server     D ffff880813fb6898     0  1655      1 0x00000004
 ffff880423f5b880 0000000000000086 ffff880402290fd0 ffff880423f5bfd8
 ffff880423f5bfd8 ffff880423f5bfd8 ffff880402290fd0 ffff880813fb68a0
 7fffffffffffffff ffff880813fb6898 ffff880402290fd0 ffff880813fb6898
Call Trace:
 [<ffffffff816a94c9>] schedule+0x29/0x70
 [<ffffffff816a6fd9>] schedule_timeout+0x239/0x2c0
 [<ffffffffc07309e2>] ? mlx5_cmd_exec_cb+0x22/0x30 [mlx5_core]
 [<ffffffffc073e697>] ? mlx5_core_create_mkey_cb+0xb7/0x220 [mlx5_core]
 [<ffffffff811b94b7>] ? mm_drop_all_locks+0xd7/0x110
 [<ffffffff816a987d>] wait_for_completion+0xfd/0x140
 [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
 [<ffffffffc08fd308>] mlx5_mr_cache_alloc+0xa8/0x170 [mlx5_ib]
 [<ffffffffc0909626>] implicit_mr_alloc+0x36/0x190 [mlx5_ib]
 [<ffffffffc090a26e>] mlx5_ib_alloc_implicit_mr+0x4e/0xa0 [mlx5_ib]
 [<ffffffffc08ff2f3>] mlx5_ib_reg_user_mr+0x93/0x6a0 [mlx5_ib]
 [<ffffffffc0907410>] ? mlx5_ib_exp_query_device+0xab0/0xbc0 [mlx5_ib]
 [<ffffffffc04998be>] ib_uverbs_exp_reg_mr+0x2fe/0x550 [ib_uverbs]
 [<ffffffff811edaff>] ? do_huge_pmd_anonymous_page+0x2bf/0x530
 [<ffffffffc048f6cc>] ib_uverbs_write+0x3ec/0x490 [ib_uverbs]
 [<ffffffff81200d2d>] vfs_write+0xbd/0x1e0
 [<ffffffff81201b3f>] SyS_write+0x7f/0xe0
 [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b

Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
 drivers/infiniband/hw/mlx5/mr.c      | 15 ++++++++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index b651a7a6fde9..cd9335e368bd 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -644,6 +644,7 @@ struct mlx5_cache_ent {
 	struct delayed_work	dwork;
 	int			pending;
 	struct completion	compl;
+	atomic_t		do_complete;
 };

 struct mlx5_mr_cache {
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 9b195d65a13e..259dd49c6874 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -143,7 +143,7 @@ static void reg_mr_callback(int status, void *context)
 		kfree(mr);
 		dev->fill_delay = 1;
 		mod_timer(&dev->delay_timer, jiffies + HZ);
-		return;
+		goto do_complete;
 	}

 	mr->mmkey.type = MLX5_MKEY_MR;
@@ -167,8 +167,13 @@ static void reg_mr_callback(int status, void *context)
 		pr_err("Error inserting to mkey tree. 0x%x\n", -err);
 	write_unlock_irqrestore(&table->lock, flags);

-	if (!completion_done(&ent->compl))
+do_complete:
+	spin_lock_irqsave(&ent->lock, flags);
+	if (atomic_read(&ent->do_complete)) {
 		complete(&ent->compl);
+		atomic_dec(&ent->do_complete);
+	}
+	spin_unlock_irqrestore(&ent->lock, flags);
 }

 static int add_keys(struct mlx5_ib_dev *dev, int c, int num)
@@ -476,9 +481,12 @@ struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev, int entry)
 		if (list_empty(&ent->head)) {
 			spin_unlock_irq(&ent->lock);

+			atomic_inc(&ent->do_complete);
 			err = add_keys(dev, entry, 1);
-			if (err && err != -EAGAIN)
+			if (err && err != -EAGAIN) {
+				atomic_dec(&ent->do_complete);
 				return ERR_PTR(err);
+			}

 			wait_for_completion(&ent->compl);
 		} else {
@@ -687,6 +695,7 @@ int mlx5_mr_cache_init(struct mlx5_ib_dev *dev)
 		ent->order = i + 2;
 		ent->dev = dev;
 		ent->limit = 0;
+		atomic_set(&ent->do_complete, 0);

 		init_completion(&ent->compl);
 		INIT_WORK(&ent->work, cache_work_func);


* [PATCH mlx5-next 03/10] net/mlx5: Add interface to hold and release core resources
  2018-11-08 19:10 [PATCH mlx5-next 00/10] Collection of ODP fixes Leon Romanovsky
  2018-11-08 19:10 ` [PATCH mlx5-next 01/10] net/mlx5: Release resource on error flow Leon Romanovsky
  2018-11-08 19:10 ` [PATCH mlx5-next 02/10] IB/mlx5: Avoid hangs due to a missing completion Leon Romanovsky
@ 2018-11-08 19:10 ` Leon Romanovsky
  2018-11-08 19:10 ` [PATCH mlx5-next 04/10] net/mlx5: Enumerate page fault types Leon Romanovsky
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-08 19:10 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev

From: Moni Shoua <monis@mellanox.com>

Sometimes upper layers may want to prevent the destruction of a core
resource for a period of time while work on that resource is in
progress. Add an API to support this.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 16 ++++++++++++++++
 include/linux/mlx5/qp.h                      |  5 +++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index 690dc1dd9391..cba4a435043a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -670,3 +670,19 @@ int mlx5_core_query_q_counter(struct mlx5_core_dev *dev, u16 counter_id,
 	return mlx5_cmd_exec(dev, in, sizeof(in), out, out_size);
 }
 EXPORT_SYMBOL_GPL(mlx5_core_query_q_counter);
+
+struct mlx5_core_rsc_common *mlx5_core_res_hold(struct mlx5_core_dev *dev,
+						int res_num,
+						enum mlx5_res_type res_type)
+{
+	u32 rsn = res_num | (res_type << MLX5_USER_INDEX_LEN);
+
+	return mlx5_get_rsc(dev, rsn);
+}
+EXPORT_SYMBOL_GPL(mlx5_core_res_hold);
+
+void mlx5_core_res_put(struct mlx5_core_rsc_common *res)
+{
+	mlx5_core_put_rsc(res);
+}
+EXPORT_SYMBOL_GPL(mlx5_core_res_put);
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index fbe322c966bc..b26ea9077384 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -596,6 +596,11 @@ int mlx5_core_dealloc_q_counter(struct mlx5_core_dev *dev, u16 counter_id);
 int mlx5_core_query_q_counter(struct mlx5_core_dev *dev, u16 counter_id,
 			      int reset, void *out, int out_size);
 
+struct mlx5_core_rsc_common *mlx5_core_res_hold(struct mlx5_core_dev *dev,
+						int res_num,
+						enum mlx5_res_type res_type);
+void mlx5_core_res_put(struct mlx5_core_rsc_common *res);
+
 static inline const char *mlx5_qp_type_str(int type)
 {
 	switch (type) {
-- 
2.19.1


* [PATCH mlx5-next 04/10] net/mlx5: Enumerate page fault types
  2018-11-08 19:10 [PATCH mlx5-next 00/10] Collection of ODP fixes Leon Romanovsky
                   ` (2 preceding siblings ...)
  2018-11-08 19:10 ` [PATCH mlx5-next 03/10] net/mlx5: Add interface to hold and release core resources Leon Romanovsky
@ 2018-11-08 19:10 ` Leon Romanovsky
  2018-11-08 19:10 ` [PATCH mlx5-next 05/10] IB/mlx5: Lock QP during page fault handling Leon Romanovsky
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-08 19:10 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev

From: Moni Shoua <monis@mellanox.com>

Give meaningful names to the types of WQE page faults.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/linux/mlx5/device.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index b4c0457fbebd..e326524bafcc 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -212,6 +212,13 @@ enum {
 	MLX5_PFAULT_SUBTYPE_RDMA = 1,
 };
 
+enum wqe_page_fault_type {
+	MLX5_WQE_PF_TYPE_RMP = 0,
+	MLX5_WQE_PF_TYPE_REQ_SEND_OR_WRITE = 1,
+	MLX5_WQE_PF_TYPE_RESP = 2,
+	MLX5_WQE_PF_TYPE_REQ_READ_OR_ATOMIC = 3,
+};
+
 enum {
 	MLX5_PERM_LOCAL_READ	= 1 << 2,
 	MLX5_PERM_LOCAL_WRITE	= 1 << 3,
-- 
2.19.1


* [PATCH mlx5-next 05/10] IB/mlx5: Lock QP during page fault handling
  2018-11-08 19:10 [PATCH mlx5-next 00/10] Collection of ODP fixes Leon Romanovsky
                   ` (3 preceding siblings ...)
  2018-11-08 19:10 ` [PATCH mlx5-next 04/10] net/mlx5: Enumerate page fault types Leon Romanovsky
@ 2018-11-08 19:10 ` Leon Romanovsky
  2018-11-08 19:10 ` [PATCH mlx5-next 06/10] net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state Leon Romanovsky
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-08 19:10 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev

From: Moni Shoua <monis@mellanox.com>

When a page fault event for a WQE arrives, the event data contains the
number of the resource (e.g. a QP) that the page fault handler will
later use to retrieve the resource. Meanwhile, another context can
destroy the resource and cause a use-after-free. To avoid that, take a
reference on the resource when the handler starts and release it when
the handler ends.

Page fault events for RDMA operations don't need this protection
because the driver doesn't access the QP in that page fault handler.

Fixes: d9aaed838765 ("{net,IB}/mlx5: Refactor page fault handling")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/odp.c | 46 +++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index b04eb6775326..abce55b8b9ba 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -1016,16 +1016,31 @@ static int mlx5_ib_mr_responder_pfault_handler(
 	return 0;
 }

-static struct mlx5_ib_qp *mlx5_ib_odp_find_qp(struct mlx5_ib_dev *dev,
-					      u32 wq_num)
+static inline struct mlx5_core_rsc_common *odp_get_rsc(struct mlx5_ib_dev *dev,
+						       u32 wq_num, int pf_type)
 {
-	struct mlx5_core_qp *mqp = __mlx5_qp_lookup(dev->mdev, wq_num);
+	enum mlx5_res_type res_type;

-	if (!mqp) {
-		mlx5_ib_err(dev, "QPN 0x%6x not found\n", wq_num);
+	switch (pf_type) {
+	case MLX5_WQE_PF_TYPE_RMP:
+		res_type = MLX5_RES_SRQ;
+		break;
+	case MLX5_WQE_PF_TYPE_REQ_SEND_OR_WRITE:
+	case MLX5_WQE_PF_TYPE_RESP:
+	case MLX5_WQE_PF_TYPE_REQ_READ_OR_ATOMIC:
+		res_type = MLX5_RES_QP;
+		break;
+	default:
 		return NULL;
 	}

+	return mlx5_core_res_hold(dev->mdev, wq_num, res_type);
+}
+
+static inline struct mlx5_ib_qp *res_to_qp(struct mlx5_core_rsc_common *res)
+{
+	struct mlx5_core_qp *mqp = (struct mlx5_core_qp *)res;
+
 	return to_mibqp(mqp);
 }

@@ -1039,18 +1054,30 @@ static void mlx5_ib_mr_wqe_pfault_handler(struct mlx5_ib_dev *dev,
 	int resume_with_error = 1;
 	u16 wqe_index = pfault->wqe.wqe_index;
 	int requestor = pfault->type & MLX5_PFAULT_REQUESTOR;
+	struct mlx5_core_rsc_common *res;
 	struct mlx5_ib_qp *qp;

+	res = odp_get_rsc(dev, pfault->wqe.wq_num, pfault->type);
+	if (!res) {
+		mlx5_ib_dbg(dev, "wqe page fault for missing resource %d\n", pfault->wqe.wq_num);
+		return;
+	}
+
+	switch (res->res) {
+	case MLX5_RES_QP:
+		qp = res_to_qp(res);
+		break;
+	default:
+		mlx5_ib_err(dev, "wqe page fault for unsupported type %d\n", pfault->type);
+		goto resolve_page_fault;
+	}
+
 	buffer = (char *)__get_free_page(GFP_KERNEL);
 	if (!buffer) {
 		mlx5_ib_err(dev, "Error allocating memory for IO page fault handling.\n");
 		goto resolve_page_fault;
 	}

-	qp = mlx5_ib_odp_find_qp(dev, pfault->wqe.wq_num);
-	if (!qp)
-		goto resolve_page_fault;
-
 	ret = mlx5_ib_read_user_wqe(qp, requestor, wqe_index, buffer,
 				    PAGE_SIZE, &qp->trans_qp.base);
 	if (ret < 0) {
@@ -1090,6 +1117,7 @@ static void mlx5_ib_mr_wqe_pfault_handler(struct mlx5_ib_dev *dev,
 	mlx5_ib_dbg(dev, "PAGE FAULT completed. QP 0x%x resume_with_error=%d, type: 0x%x\n",
 		    pfault->wqe.wq_num, resume_with_error,
 		    pfault->type);
+	mlx5_core_res_put(res);
 	free_page((unsigned long)buffer);
 }


* [PATCH mlx5-next 06/10] net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state
  2018-11-08 19:10 [PATCH mlx5-next 00/10] Collection of ODP fixes Leon Romanovsky
                   ` (4 preceding siblings ...)
  2018-11-08 19:10 ` [PATCH mlx5-next 05/10] IB/mlx5: Lock QP during page fault handling Leon Romanovsky
@ 2018-11-08 19:10 ` Leon Romanovsky
  2018-11-08 19:10 ` [PATCH mlx5-next 07/10] net/mlx5: Use multi threaded workqueue for page fault handling Leon Romanovsky
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-08 19:10 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev

From: Moni Shoua <monis@mellanox.com>

When the device is in the internal error state, the command interface
isn't accessible and the driver decides which commands to fail and
which to pass.
Move the PAGE_FAULT_RESUME command to the pass list in order to avoid
redundant failure messages.

Fixes: 89d44f0a6c73 ("net/mlx5_core: Add pci error handlers to mlx5_core driver")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index a5a0823e5ada..7b18aff955f1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -313,6 +313,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_FPGA_DESTROY_QP:
 	case MLX5_CMD_OP_DESTROY_GENERAL_OBJECT:
 	case MLX5_CMD_OP_DEALLOC_MEMIC:
+	case MLX5_CMD_OP_PAGE_FAULT_RESUME:
 		return MLX5_CMD_STAT_OK;
 
 	case MLX5_CMD_OP_QUERY_HCA_CAP:
@@ -326,7 +327,6 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_CREATE_MKEY:
 	case MLX5_CMD_OP_QUERY_MKEY:
 	case MLX5_CMD_OP_QUERY_SPECIAL_CONTEXTS:
-	case MLX5_CMD_OP_PAGE_FAULT_RESUME:
 	case MLX5_CMD_OP_CREATE_EQ:
 	case MLX5_CMD_OP_QUERY_EQ:
 	case MLX5_CMD_OP_GEN_EQE:
-- 
2.19.1


* [PATCH mlx5-next 07/10] net/mlx5: Use multi threaded workqueue for page fault handling
  2018-11-08 19:10 [PATCH mlx5-next 00/10] Collection of ODP fixes Leon Romanovsky
                   ` (5 preceding siblings ...)
  2018-11-08 19:10 ` [PATCH mlx5-next 06/10] net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state Leon Romanovsky
@ 2018-11-08 19:10 ` Leon Romanovsky
  2018-11-08 19:10 ` [PATCH mlx5-next 08/10] IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously Leon Romanovsky
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-08 19:10 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev

From: Moni Shoua <monis@mellanox.com>

Page fault events are processed in a workqueue context. Since each QP
can have up to two concurrent unrelated page faults, one for the
requester and one for the responder, page fault handling can be done in
parallel. Achieve this by making the workqueue multi-threaded. The
number of threads is set to the number of command interface channels to
avoid command interface bottlenecks.

In addition to making it multi-threaded, change the workqueue flags to
give it high priority.

A stress benchmark shows that before this change 85% of page faults
waited in the queue for 8 seconds or more, while after it 98% of page
faults waited in the queue for 64 milliseconds or less.

Fixes: d9aaed838765 ("{net,IB}/mlx5: Refactor page fault handling")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index c1e1a16a9b07..aeab0c4f60f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -356,8 +356,9 @@ static int init_pf_ctx(struct mlx5_eq_pagefault *pf_ctx, const char *name)
 	spin_lock_init(&pf_ctx->lock);
 	INIT_WORK(&pf_ctx->work, eq_pf_action);
 
-	pf_ctx->wq = alloc_ordered_workqueue(name,
-					     WQ_MEM_RECLAIM);
+	pf_ctx->wq = alloc_workqueue(name,
+				     WQ_HIGHPRI | WQ_UNBOUND | WQ_MEM_RECLAIM,
+				     MLX5_NUM_CMD_EQE);
 	if (!pf_ctx->wq)
 		return -ENOMEM;
 
-- 
2.19.1


* [PATCH mlx5-next 08/10] IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously
  2018-11-08 19:10 [PATCH mlx5-next 00/10] Collection of ODP fixes Leon Romanovsky
                   ` (6 preceding siblings ...)
  2018-11-08 19:10 ` [PATCH mlx5-next 07/10] net/mlx5: Use multi threaded workqueue for page fault handling Leon Romanovsky
@ 2018-11-08 19:10 ` Leon Romanovsky
  2018-11-08 19:49   ` Jason Gunthorpe
  2018-11-08 19:10 ` [PATCH mlx5-next 09/10] net/mlx5: Remove unused function Leon Romanovsky
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-08 19:10 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev

From: Moni Shoua <monis@mellanox.com>

Telling the HCA that page fault handling is done and that the QP can
resume its flow is currently done in the context of the page fault
handler. This needlessly blocks the handling of the next work item in
the queue.
Call the PAGE_FAULT_RESUME command asynchronously and free the
workqueue to pick up the next work item. All tasks that used to execute
after PAGE_FAULT_RESUME must now be done in the callback of the
asynchronous command mechanism.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/odp.c | 110 +++++++++++++++++++++++++------
 include/linux/mlx5/driver.h      |   3 +
 2 files changed, 94 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index abce55b8b9ba..0c4f469cdd5b 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -298,20 +298,78 @@ void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev)
 	return;
 }
 
+struct pfault_resume_cb_ctx {
+	struct mlx5_ib_dev *dev;
+	struct mlx5_core_rsc_common *res;
+	struct mlx5_pagefault *pfault;
+};
+
+static void page_fault_resume_callback(int status, void *context)
+{
+	struct pfault_resume_cb_ctx *ctx = context;
+	struct mlx5_pagefault *pfault = ctx->pfault;
+
+	if (status)
+		mlx5_ib_err(ctx->dev, "Resolve the page fault failed with status %d\n",
+			    status);
+
+	if (ctx->res)
+		mlx5_core_res_put(ctx->res);
+	kfree(pfault);
+	kfree(ctx);
+}
+
 static void mlx5_ib_page_fault_resume(struct mlx5_ib_dev *dev,
+				      struct mlx5_core_rsc_common *res,
 				      struct mlx5_pagefault *pfault,
-				      int error)
+				      int error,
+				      bool async)
 {
+	int ret = 0;
+	u32 *out = pfault->out_pf_resume;
+	u32 *in = pfault->in_pf_resume;
+	u32 token = pfault->token;
 	int wq_num = pfault->event_subtype == MLX5_PFAULT_SUBTYPE_WQE ?
-		     pfault->wqe.wq_num : pfault->token;
-	int ret = mlx5_core_page_fault_resume(dev->mdev,
-					      pfault->token,
-					      wq_num,
-					      pfault->type,
-					      error);
-	if (ret)
-		mlx5_ib_err(dev, "Failed to resolve the page fault on WQ 0x%x\n",
-			    wq_num);
+		pfault->wqe.wq_num : pfault->token;
+	u8 type = pfault->type;
+	struct pfault_resume_cb_ctx *ctx = NULL;
+
+	if (async)
+		ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
+
+	if (ctx) {
+		ctx->dev = dev;
+		ctx->res = res;
+		ctx->pfault = pfault;
+	}
+
+	memset(out, 0, MLX5_ST_SZ_BYTES(page_fault_resume_out));
+	memset(in, 0, MLX5_ST_SZ_BYTES(page_fault_resume_in));
+
+	MLX5_SET(page_fault_resume_in, in, opcode, MLX5_CMD_OP_PAGE_FAULT_RESUME);
+	MLX5_SET(page_fault_resume_in, in, error, !!error);
+	MLX5_SET(page_fault_resume_in, in, page_fault_type, type);
+	MLX5_SET(page_fault_resume_in, in, wq_number, wq_num);
+	MLX5_SET(page_fault_resume_in, in, token, token);
+
+	if (ctx)
+		ret = mlx5_cmd_exec_cb(dev->mdev,
+				       in, MLX5_ST_SZ_BYTES(page_fault_resume_in),
+				       out, MLX5_ST_SZ_BYTES(page_fault_resume_out),
+				       page_fault_resume_callback, ctx);
+	else
+		ret = mlx5_cmd_exec(dev->mdev,
+				    in, MLX5_ST_SZ_BYTES(page_fault_resume_in),
+				    out, MLX5_ST_SZ_BYTES(page_fault_resume_out));
+	if (ret || !ctx) {
+		if (ret)
+			mlx5_ib_err(dev, "Failed to resume the page fault (%d)\n", ret);
+		if (res)
+			mlx5_core_res_put(res);
+		if (async)
+			kfree(pfault);
+		kfree(ctx);
+	}
 }
 
 static struct mlx5_ib_mr *implicit_mr_alloc(struct ib_pd *pd,
@@ -1045,7 +1103,8 @@ static inline struct mlx5_ib_qp *res_to_qp(struct mlx5_core_rsc_common *res)
 }
 
 static void mlx5_ib_mr_wqe_pfault_handler(struct mlx5_ib_dev *dev,
-					  struct mlx5_pagefault *pfault)
+					  struct mlx5_pagefault *pfault,
+					  bool async_resume)
 {
 	int ret;
 	void *wqe, *wqe_end;
@@ -1113,11 +1172,10 @@ static void mlx5_ib_mr_wqe_pfault_handler(struct mlx5_ib_dev *dev,
 
 	resume_with_error = 0;
 resolve_page_fault:
-	mlx5_ib_page_fault_resume(dev, pfault, resume_with_error);
+	mlx5_ib_page_fault_resume(dev, res, pfault, resume_with_error, async_resume);
 	mlx5_ib_dbg(dev, "PAGE FAULT completed. QP 0x%x resume_with_error=%d, type: 0x%x\n",
 		    pfault->wqe.wq_num, resume_with_error,
 		    pfault->type);
-	mlx5_core_res_put(res);
 	free_page((unsigned long)buffer);
 }
 
@@ -1128,7 +1186,8 @@ static int pages_in_range(u64 address, u32 length)
 }
 
 static void mlx5_ib_mr_rdma_pfault_handler(struct mlx5_ib_dev *dev,
-					   struct mlx5_pagefault *pfault)
+					   struct mlx5_pagefault *pfault,
+					   bool async_resume)
 {
 	u64 address;
 	u32 length;
@@ -1166,14 +1225,14 @@ static void mlx5_ib_mr_rdma_pfault_handler(struct mlx5_ib_dev *dev,
 		/* We're racing with an invalidation, don't prefetch */
 		prefetch_activated = 0;
 	} else if (ret < 0 || pages_in_range(address, length) > ret) {
-		mlx5_ib_page_fault_resume(dev, pfault, 1);
+		mlx5_ib_page_fault_resume(dev, NULL, pfault, 1, async_resume);
 		if (ret != -ENOENT)
 			mlx5_ib_dbg(dev, "PAGE FAULT error %d. QP 0x%x, type: 0x%x\n",
 				    ret, pfault->token, pfault->type);
 		return;
 	}
 
-	mlx5_ib_page_fault_resume(dev, pfault, 0);
+	mlx5_ib_page_fault_resume(dev, NULL, pfault, 0, async_resume);
 	mlx5_ib_dbg(dev, "PAGE FAULT completed. QP 0x%x, type: 0x%x, prefetch_activated: %d\n",
 		    pfault->token, pfault->type,
 		    prefetch_activated);
@@ -1201,18 +1260,31 @@ void mlx5_ib_pfault(struct mlx5_core_dev *mdev, void *context,
 {
 	struct mlx5_ib_dev *dev = context;
 	u8 event_subtype = pfault->event_subtype;
+	struct mlx5_pagefault *pfault_copy;
+	struct mlx5_pagefault *pfault_arg;
+	bool async_resume;
+
+	pfault_copy = kmalloc(sizeof(*pfault_copy), GFP_KERNEL);
+	if (pfault_copy) {
+		memcpy(pfault_copy, pfault, sizeof(*pfault_copy));
+		async_resume = true;
+		pfault_arg = pfault_copy;
+	} else {
+		async_resume = false;
+		pfault_arg = pfault;
+	}
 
 	switch (event_subtype) {
 	case MLX5_PFAULT_SUBTYPE_WQE:
-		mlx5_ib_mr_wqe_pfault_handler(dev, pfault);
+		mlx5_ib_mr_wqe_pfault_handler(dev, pfault_arg, async_resume);
 		break;
 	case MLX5_PFAULT_SUBTYPE_RDMA:
-		mlx5_ib_mr_rdma_pfault_handler(dev, pfault);
+		mlx5_ib_mr_rdma_pfault_handler(dev, pfault_arg, async_resume);
 		break;
 	default:
 		mlx5_ib_err(dev, "Invalid page fault event subtype: 0x%x\n",
 			    event_subtype);
-		mlx5_ib_page_fault_resume(dev, pfault, 1);
+		mlx5_ib_page_fault_resume(dev, NULL, pfault, 1, false);
 	}
 }
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index aa5963b5d38e..85275612a473 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -738,6 +738,9 @@ enum mlx5_pagefault_type_flags {
 
 /* Contains the details of a pagefault. */
 struct mlx5_pagefault {
+	u32 out_pf_resume[MLX5_ST_SZ_DW(page_fault_resume_out)];
+	u32 in_pf_resume[MLX5_ST_SZ_DW(page_fault_resume_in)];
+
 	u32			bytes_committed;
 	u32			token;
 	u8			event_subtype;
-- 
2.19.1


* [PATCH mlx5-next 09/10] net/mlx5: Remove unused function
  2018-11-08 19:10 [PATCH mlx5-next 00/10] Collection of ODP fixes Leon Romanovsky
                   ` (7 preceding siblings ...)
  2018-11-08 19:10 ` [PATCH mlx5-next 08/10] IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously Leon Romanovsky
@ 2018-11-08 19:10 ` Leon Romanovsky
  2018-11-08 19:10 ` [PATCH mlx5-next 10/10] IB/mlx5: Improve ODP debugging messages Leon Romanovsky
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-08 19:10 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev

From: Moni Shoua <monis@mellanox.com>

The function mlx5_core_page_fault_resume() has no callers anymore, so
it is safe to delete it.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 16 ----------------
 include/linux/mlx5/driver.h                  |  4 ----
 2 files changed, 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index aeab0c4f60f4..22eabb80e122 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -373,22 +373,6 @@ static int init_pf_ctx(struct mlx5_eq_pagefault *pf_ctx, const char *name)
 	return -ENOMEM;
 }
 
-int mlx5_core_page_fault_resume(struct mlx5_core_dev *dev, u32 token,
-				u32 wq_num, u8 type, int error)
-{
-	u32 out[MLX5_ST_SZ_DW(page_fault_resume_out)] = {0};
-	u32 in[MLX5_ST_SZ_DW(page_fault_resume_in)]   = {0};
-
-	MLX5_SET(page_fault_resume_in, in, opcode,
-		 MLX5_CMD_OP_PAGE_FAULT_RESUME);
-	MLX5_SET(page_fault_resume_in, in, error, !!error);
-	MLX5_SET(page_fault_resume_in, in, page_fault_type, type);
-	MLX5_SET(page_fault_resume_in, in, wq_number, wq_num);
-	MLX5_SET(page_fault_resume_in, in, token, token);
-
-	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
-}
-EXPORT_SYMBOL_GPL(mlx5_core_page_fault_resume);
 #endif
 
 static void general_event_handler(struct mlx5_core_dev *dev,
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 85275612a473..1de1e8e91382 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1141,10 +1141,6 @@ int mlx5_query_odp_caps(struct mlx5_core_dev *dev,
 			struct mlx5_odp_caps *odp_caps);
 int mlx5_core_query_ib_ppcnt(struct mlx5_core_dev *dev,
 			     u8 port_num, void *out, size_t sz);
-#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
-int mlx5_core_page_fault_resume(struct mlx5_core_dev *dev, u32 token,
-				u32 wq_num, u8 type, int error);
-#endif
 
 int mlx5_init_rl_table(struct mlx5_core_dev *dev);
 void mlx5_cleanup_rl_table(struct mlx5_core_dev *dev);
-- 
2.19.1


* [PATCH mlx5-next 10/10] IB/mlx5: Improve ODP debugging messages
  2018-11-08 19:10 [PATCH mlx5-next 00/10] Collection of ODP fixes Leon Romanovsky
                   ` (8 preceding siblings ...)
  2018-11-08 19:10 ` [PATCH mlx5-next 09/10] net/mlx5: Remove unused function Leon Romanovsky
@ 2018-11-08 19:10 ` Leon Romanovsky
  2018-11-08 19:45 ` [PATCH mlx5-next 00/10] Collection of ODP fixes Jason Gunthorpe
  2018-11-12 20:43 ` Leon Romanovsky
  11 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-08 19:10 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev

From: Moni Shoua <monis@mellanox.com>

Add and modify debug messages in ODP-related error flows.
In that context, the EAGAIN return code is considered less severe, so its
print level is set to debug instead of warn.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/umem_odp.c | 14 ++++++++++++--
 drivers/infiniband/hw/mlx5/odp.c   |  4 ++--
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 2b4c5e7dd5a1..e334480b56bf 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -655,8 +655,13 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 				flags, local_page_list, NULL, NULL);
 		up_read(&owning_mm->mmap_sem);

-		if (npages < 0)
+		if (npages < 0) {
+			if (npages != -EAGAIN)
+				pr_warn("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
+			else
+				pr_debug("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
 			break;
+		}

 		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
 		mutex_lock(&umem_odp->umem_mutex);
@@ -674,8 +679,13 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 			ret = ib_umem_odp_map_dma_single_page(
 					umem_odp, k, local_page_list[j],
 					access_mask, current_seq);
-			if (ret < 0)
+			if (ret < 0) {
+				if (ret != -EAGAIN)
+					pr_warn("ib_umem_odp_map_dma_single_page failed with error %d\n", ret);
+				else
+					pr_debug("ib_umem_odp_map_dma_single_page failed with error %d\n", ret);
 				break;
+			}

 			p = page_to_phys(local_page_list[j]);
 			k++;
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 0c4f469cdd5b..7bca1592d811 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -665,8 +665,8 @@ static int pagefault_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr,
 			if (!wait_for_completion_timeout(
 					&odp->notifier_completion,
 					timeout)) {
-				mlx5_ib_warn(dev, "timeout waiting for mmu notifier. seq %d against %d\n",
-					     current_seq, odp->notifiers_seq);
+				mlx5_ib_warn(dev, "timeout waiting for mmu notifier. seq %d against %d. notifiers_count=%d\n",
+					     current_seq, odp->notifiers_seq, odp->notifiers_count);
 			}
 		} else {
 			/* The MR is being killed, kill the QP as well. */


* Re: [PATCH mlx5-next 02/10] IB/mlx5: Avoid hangs due to a missing completion
  2018-11-08 19:10 ` [PATCH mlx5-next 02/10] IB/mlx5: Avoid hangs due to a missing completion Leon Romanovsky
@ 2018-11-08 19:44   ` Jason Gunthorpe
  2018-11-09 16:17     ` Leon Romanovsky
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2018-11-08 19:44 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Artemy Kovalyov,
	Majd Dibbiny, Moni Shoua, Saeed Mahameed, linux-netdev

On Thu, Nov 08, 2018 at 09:10:09PM +0200, Leon Romanovsky wrote:
> From: Moni Shoua <monis@mellanox.com>
> 
> Fix 2 flows that may cause a process to hang on wait_for_completion():
> 
> 1. When callback for create MKEY command returns with bad status
> 2. When callback for create MKEY command is called before the executor of
> command calls wait_for_completion()
> 
> The following call trace might be triggered in the above flows:
> 
> INFO: task echo_server:1655 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> echo_server     D ffff880813fb6898     0  1655      1 0x00000004
>  ffff880423f5b880 0000000000000086 ffff880402290fd0 ffff880423f5bfd8
>  ffff880423f5bfd8 ffff880423f5bfd8 ffff880402290fd0 ffff880813fb68a0
>  7fffffffffffffff ffff880813fb6898 ffff880402290fd0 ffff880813fb6898
> Call Trace:
>  [<ffffffff816a94c9>] schedule+0x29/0x70
>  [<ffffffff816a6fd9>] schedule_timeout+0x239/0x2c0
>  [<ffffffffc07309e2>] ? mlx5_cmd_exec_cb+0x22/0x30 [mlx5_core]
>  [<ffffffffc073e697>] ? mlx5_core_create_mkey_cb+0xb7/0x220 [mlx5_core]
>  [<ffffffff811b94b7>] ? mm_drop_all_locks+0xd7/0x110
>  [<ffffffff816a987d>] wait_for_completion+0xfd/0x140
>  [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
>  [<ffffffffc08fd308>] mlx5_mr_cache_alloc+0xa8/0x170 [mlx5_ib]
>  [<ffffffffc0909626>] implicit_mr_alloc+0x36/0x190 [mlx5_ib]
>  [<ffffffffc090a26e>] mlx5_ib_alloc_implicit_mr+0x4e/0xa0 [mlx5_ib]
>  [<ffffffffc08ff2f3>] mlx5_ib_reg_user_mr+0x93/0x6a0 [mlx5_ib]
>  [<ffffffffc0907410>] ? mlx5_ib_exp_query_device+0xab0/0xbc0 [mlx5_ib]
>  [<ffffffffc04998be>] ib_uverbs_exp_reg_mr+0x2fe/0x550 [ib_uverbs]
>  [<ffffffff811edaff>] ? do_huge_pmd_anonymous_page+0x2bf/0x530
>  [<ffffffffc048f6cc>] ib_uverbs_write+0x3ec/0x490 [ib_uverbs]
>  [<ffffffff81200d2d>] vfs_write+0xbd/0x1e0
>  [<ffffffff81201b3f>] SyS_write+0x7f/0xe0
>  [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
> 
> Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib")
> Signed-off-by: Moni Shoua <monis@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
>  drivers/infiniband/hw/mlx5/mr.c      | 15 ++++++++++++---
>  2 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index b651a7a6fde9..cd9335e368bd 100644
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -644,6 +644,7 @@ struct mlx5_cache_ent {
>  	struct delayed_work	dwork;
>  	int			pending;
>  	struct completion	compl;
> +	atomic_t		do_complete;
>  };
> 
>  struct mlx5_mr_cache {
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index 9b195d65a13e..259dd49c6874 100644
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -143,7 +143,7 @@ static void reg_mr_callback(int status, void *context)
>  		kfree(mr);
>  		dev->fill_delay = 1;
>  		mod_timer(&dev->delay_timer, jiffies + HZ);
> -		return;
> +		goto do_complete;
>  	}
> 
>  	mr->mmkey.type = MLX5_MKEY_MR;
> @@ -167,8 +167,13 @@ static void reg_mr_callback(int status, void *context)
>  		pr_err("Error inserting to mkey tree. 0x%x\n", -err);
>  	write_unlock_irqrestore(&table->lock, flags);
> 
> -	if (!completion_done(&ent->compl))
> +do_complete:
> +	spin_lock_irqsave(&ent->lock, flags);
> +	if (atomic_read(&ent->do_complete)) {
>  		complete(&ent->compl);
> +		atomic_dec(&ent->do_complete);
> +	}
> +	spin_unlock_irqrestore(&ent->lock, flags);

Oh, this is quite an ugly way to use completions, I think this has
veered into misusing completion territory.. The completion_done was
never right...

add_keys should accept a flag indicating that this MR has a completor
waiting and should trigger complete() on CB finishing... Can probably
store the flag someplace in the MR.

Jason


* Re: [PATCH mlx5-next 00/10] Collection of ODP fixes
  2018-11-08 19:10 [PATCH mlx5-next 00/10] Collection of ODP fixes Leon Romanovsky
                   ` (9 preceding siblings ...)
  2018-11-08 19:10 ` [PATCH mlx5-next 10/10] IB/mlx5: Improve ODP debugging messages Leon Romanovsky
@ 2018-11-08 19:45 ` Jason Gunthorpe
  2018-11-08 19:50   ` Saeed Mahameed
  2018-11-12 20:43 ` Leon Romanovsky
  11 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2018-11-08 19:45 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Artemy Kovalyov,
	Majd Dibbiny, Moni Shoua, Saeed Mahameed, linux-netdev

On Thu, Nov 08, 2018 at 09:10:07PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> Hi,
> 
> This is collection of fixes to mlx5_core and mlx5_ib ODP logic.
> There are two main reasons why we decided to forward it to mlx5-next
> and not to rdma-rc or net like our other fixes.
> 
> 1. We have large number of patches exactly in that area that are ready
> for submission and we don't want to create extra merge work for
> maintainers due to that. Among those future series (already ready and
> tested): internal change of mlx5_core pagefaults handling, moving ODP
> code to RDMA, change in EQ mechanism, simplification and moving SRQ QP
> to RDMA, extra fixes to mlx5_ib ODP and ODP SRQ support.
> 
> 2. Most of those fixes are for old bugs and there is no rush to fix them
> right now (in -rc1/-rc2).
> 
> Thanks
> 
> Moni Shoua (10):
>   net/mlx5: Release resource on error flow
>   IB/mlx5: Avoid hangs due to a missing completion
>   net/mlx5: Add interface to hold and release core resources
>   net/mlx5: Enumerate page fault types
>   IB/mlx5: Lock QP during page fault handling
>   net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state
>   net/mlx5: Use multi threaded workqueue for page fault handling
>   IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously
>   net/mlx5: Remove unused function
>   IB/mlx5: Improve ODP debugging messages
> 
>  drivers/infiniband/core/umem_odp.c            |  14 +-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h          |   1 +
>  drivers/infiniband/hw/mlx5/mr.c               |  15 +-
>  drivers/infiniband/hw/mlx5/odp.c              | 158 ++++++++++++++----
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |   2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  21 +--
>  drivers/net/ethernet/mellanox/mlx5/core/qp.c  |  20 ++-

there are a lot of ethernet files here; parts of this should probably go
through the shared branch?

Jason


* Re: [PATCH mlx5-next 08/10] IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously
  2018-11-08 19:10 ` [PATCH mlx5-next 08/10] IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously Leon Romanovsky
@ 2018-11-08 19:49   ` Jason Gunthorpe
  2018-11-09 16:26     ` Leon Romanovsky
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2018-11-08 19:49 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Artemy Kovalyov,
	Majd Dibbiny, Moni Shoua, Saeed Mahameed, linux-netdev

On Thu, Nov 08, 2018 at 09:10:15PM +0200, Leon Romanovsky wrote:
> From: Moni Shoua <monis@mellanox.com>
> 
> Telling the HCA that page fault handling is done and QP can resume
> its flow is done in the context of the page fault handler. This blocks
> the handling of the next work item in the queue needlessly.
> Call the PAGE_FAULT_RESUME command in an asynchronous manner and free
> the workqueue to pick the next work item for handling. All tasks that
> were executed after PAGE_FAULT_RESUME need to be done now
> in the callback of the asynchronous command mechanism.
> 
> Signed-off-by: Moni Shoua <monis@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/hw/mlx5/odp.c | 110 +++++++++++++++++++++++++------
>  include/linux/mlx5/driver.h      |   3 +
>  2 files changed, 94 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
> index abce55b8b9ba..0c4f469cdd5b 100644
> +++ b/drivers/infiniband/hw/mlx5/odp.c
> @@ -298,20 +298,78 @@ void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev)
>  	return;
>  }
>  
> +struct pfault_resume_cb_ctx {
> +	struct mlx5_ib_dev *dev;
> +	struct mlx5_core_rsc_common *res;
> +	struct mlx5_pagefault *pfault;
> +};
> +
> +static void page_fault_resume_callback(int status, void *context)
> +{
> +	struct pfault_resume_cb_ctx *ctx = context;
> +	struct mlx5_pagefault *pfault = ctx->pfault;
> +
> +	if (status)
> +		mlx5_ib_err(ctx->dev, "Resolve the page fault failed with status %d\n",
> +			    status);
> +
> +	if (ctx->res)
> +		mlx5_core_res_put(ctx->res);
> +	kfree(pfault);
> +	kfree(ctx);
> +}
> +
>  static void mlx5_ib_page_fault_resume(struct mlx5_ib_dev *dev,
> +				      struct mlx5_core_rsc_common *res,
>  				      struct mlx5_pagefault *pfault,
> -				      int error)
> +				      int error,
> +				      bool async)
>  {
> +	int ret = 0;
> +	u32 *out = pfault->out_pf_resume;
> +	u32 *in = pfault->in_pf_resume;
> +	u32 token = pfault->token;
>  	int wq_num = pfault->event_subtype == MLX5_PFAULT_SUBTYPE_WQE ?
> -		     pfault->wqe.wq_num : pfault->token;
> -	int ret = mlx5_core_page_fault_resume(dev->mdev,
> -					      pfault->token,
> -					      wq_num,
> -					      pfault->type,
> -					      error);
> -	if (ret)
> -		mlx5_ib_err(dev, "Failed to resolve the page fault on WQ 0x%x\n",
> -			    wq_num);
> +		pfault->wqe.wq_num : pfault->token;
> +	u8 type = pfault->type;
> +	struct pfault_resume_cb_ctx *ctx = NULL;
> +
> +	if (async)
> +		ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);

Why not allocate this ctx as part of the mlx5_pagefault and avoid
this allocation failure strategy?

Jason


* Re: [PATCH mlx5-next 00/10] Collection of ODP fixes
  2018-11-08 19:45 ` [PATCH mlx5-next 00/10] Collection of ODP fixes Jason Gunthorpe
@ 2018-11-08 19:50   ` Saeed Mahameed
  0 siblings, 0 replies; 21+ messages in thread
From: Saeed Mahameed @ 2018-11-08 19:50 UTC (permalink / raw)
  To: Jason Gunthorpe, leon@kernel.org
  Cc: netdev@vger.kernel.org, Majd Dibbiny, Moni Shoua, Leon Romanovsky,
	linux-rdma@vger.kernel.org, dledford@redhat.com, Artemy Kovalyov

On Thu, 2018-11-08 at 19:45 +0000, Jason Gunthorpe wrote:
> On Thu, Nov 08, 2018 at 09:10:07PM +0200, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> > 
> > Hi,
> > 
> > This is collection of fixes to mlx5_core and mlx5_ib ODP logic.
> > There are two main reasons why we decided to forward it to mlx5-
> > next
> > and not to rdma-rc or net like our other fixes.
> > 
> > 1. We have large number of patches exactly in that area that are
> > ready
> > for submission and we don't want to create extra merge work for
> > maintainers due to that. Among those future series (already ready
> > and
> > tested): internal change of mlx5_core pagefaults handling, moving
> > ODP
> > code to RDMA, change in EQ mechanism, simplification and moving SRQ
> > QP
> > to RDMA, extra fixes to mlx5_ib ODP and ODP SRQ support.
> > 
> > 2. Most of those fixes are for old bugs and there is no rush to fix
> > them
> > right now (in -rc1/-rc2).
> > 
> > Thanks
> > 
> > Moni Shoua (10):
> >   net/mlx5: Release resource on error flow
> >   IB/mlx5: Avoid hangs due to a missing completion
> >   net/mlx5: Add interface to hold and release core resources
> >   net/mlx5: Enumerate page fault types
> >   IB/mlx5: Lock QP during page fault handling
> >   net/mlx5: Return success for PAGE_FAULT_RESUME in internal error
> > state
> >   net/mlx5: Use multi threaded workqueue for page fault handling
> >   IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously
> >   net/mlx5: Remove unused function
> >   IB/mlx5: Improve ODP debugging messages
> > 
> >  drivers/infiniband/core/umem_odp.c            |  14 +-
> >  drivers/infiniband/hw/mlx5/mlx5_ib.h          |   1 +
> >  drivers/infiniband/hw/mlx5/mr.c               |  15 +-
> >  drivers/infiniband/hw/mlx5/odp.c              | 158
> > ++++++++++++++----
> >  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |   2 +-
> >  drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  21 +--
> >  drivers/net/ethernet/mellanox/mlx5/core/qp.c  |  20 ++-
> 
> there are a lot of ethernet files here; parts of this should probably
> go through the shared branch?
> 

All of this should go to the shared branch; after this ODP series we
have another series that moves all the ODP logic out of mlx5_core into
mlx5_ib.

> Jason


* Re: [PATCH mlx5-next 02/10] IB/mlx5: Avoid hangs due to a missing completion
  2018-11-08 19:44   ` Jason Gunthorpe
@ 2018-11-09 16:17     ` Leon Romanovsky
  0 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-09 16:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev


On Thu, Nov 08, 2018 at 07:44:44PM +0000, Jason Gunthorpe wrote:
> On Thu, Nov 08, 2018 at 09:10:09PM +0200, Leon Romanovsky wrote:
> > From: Moni Shoua <monis@mellanox.com>
> >
> > Fix 2 flows that may cause a process to hang on wait_for_completion():
> >
> > 1. When callback for create MKEY command returns with bad status
> > 2. When callback for create MKEY command is called before the executor of
> > command calls wait_for_completion()
> >
> > The following call trace might be triggered in the above flows:
> >
> > INFO: task echo_server:1655 blocked for more than 120 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > echo_server     D ffff880813fb6898     0  1655      1 0x00000004
> >  ffff880423f5b880 0000000000000086 ffff880402290fd0 ffff880423f5bfd8
> >  ffff880423f5bfd8 ffff880423f5bfd8 ffff880402290fd0 ffff880813fb68a0
> >  7fffffffffffffff ffff880813fb6898 ffff880402290fd0 ffff880813fb6898
> > Call Trace:
> >  [<ffffffff816a94c9>] schedule+0x29/0x70
> >  [<ffffffff816a6fd9>] schedule_timeout+0x239/0x2c0
> >  [<ffffffffc07309e2>] ? mlx5_cmd_exec_cb+0x22/0x30 [mlx5_core]
> >  [<ffffffffc073e697>] ? mlx5_core_create_mkey_cb+0xb7/0x220 [mlx5_core]
> >  [<ffffffff811b94b7>] ? mm_drop_all_locks+0xd7/0x110
> >  [<ffffffff816a987d>] wait_for_completion+0xfd/0x140
> >  [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
> >  [<ffffffffc08fd308>] mlx5_mr_cache_alloc+0xa8/0x170 [mlx5_ib]
> >  [<ffffffffc0909626>] implicit_mr_alloc+0x36/0x190 [mlx5_ib]
> >  [<ffffffffc090a26e>] mlx5_ib_alloc_implicit_mr+0x4e/0xa0 [mlx5_ib]
> >  [<ffffffffc08ff2f3>] mlx5_ib_reg_user_mr+0x93/0x6a0 [mlx5_ib]
> >  [<ffffffffc0907410>] ? mlx5_ib_exp_query_device+0xab0/0xbc0 [mlx5_ib]
> >  [<ffffffffc04998be>] ib_uverbs_exp_reg_mr+0x2fe/0x550 [ib_uverbs]
> >  [<ffffffff811edaff>] ? do_huge_pmd_anonymous_page+0x2bf/0x530
> >  [<ffffffffc048f6cc>] ib_uverbs_write+0x3ec/0x490 [ib_uverbs]
> >  [<ffffffff81200d2d>] vfs_write+0xbd/0x1e0
> >  [<ffffffff81201b3f>] SyS_write+0x7f/0xe0
> >  [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
> >
> > Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib")
> > Signed-off-by: Moni Shoua <monis@mellanox.com>
> > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >  drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
> >  drivers/infiniband/hw/mlx5/mr.c      | 15 ++++++++++++---
> >  2 files changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> > index b651a7a6fde9..cd9335e368bd 100644
> > +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> > @@ -644,6 +644,7 @@ struct mlx5_cache_ent {
> >  	struct delayed_work	dwork;
> >  	int			pending;
> >  	struct completion	compl;
> > +	atomic_t		do_complete;
> >  };
> >
> >  struct mlx5_mr_cache {
> > diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> > index 9b195d65a13e..259dd49c6874 100644
> > +++ b/drivers/infiniband/hw/mlx5/mr.c
> > @@ -143,7 +143,7 @@ static void reg_mr_callback(int status, void *context)
> >  		kfree(mr);
> >  		dev->fill_delay = 1;
> >  		mod_timer(&dev->delay_timer, jiffies + HZ);
> > -		return;
> > +		goto do_complete;
> >  	}
> >
> >  	mr->mmkey.type = MLX5_MKEY_MR;
> > @@ -167,8 +167,13 @@ static void reg_mr_callback(int status, void *context)
> >  		pr_err("Error inserting to mkey tree. 0x%x\n", -err);
> >  	write_unlock_irqrestore(&table->lock, flags);
> >
> > -	if (!completion_done(&ent->compl))
> > +do_complete:
> > +	spin_lock_irqsave(&ent->lock, flags);
> > +	if (atomic_read(&ent->do_complete)) {
> >  		complete(&ent->compl);
> > +		atomic_dec(&ent->do_complete);
> > +	}
> > +	spin_unlock_irqrestore(&ent->lock, flags);
>
> Oh, this is quite an ugly way to use completions, I think this has
> veered into misusing completion territory.. The completion_done was
> never right...
>
> add_keys should accept a flag indicating that this MR has a completor
> waiting and should trigger complete() on CB finishing... Can probably
> store the flag someplace in the MR.

I reread this patch and you are right.
Let's drop this patch.

Thanks

>
> Jason



* Re: [PATCH mlx5-next 08/10] IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously
  2018-11-08 19:49   ` Jason Gunthorpe
@ 2018-11-09 16:26     ` Leon Romanovsky
  2018-11-09 16:59       ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-09 16:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev


On Thu, Nov 08, 2018 at 07:49:03PM +0000, Jason Gunthorpe wrote:
> On Thu, Nov 08, 2018 at 09:10:15PM +0200, Leon Romanovsky wrote:
> > From: Moni Shoua <monis@mellanox.com>
> >
> > Telling the HCA that page fault handling is done and QP can resume
> > its flow is done in the context of the page fault handler. This blocks
> > the handling of the next work item in the queue needlessly.
> > Call the PAGE_FAULT_RESUME command in an asynchronous manner and free
> > the workqueue to pick the next work item for handling. All tasks that
> > were executed after PAGE_FAULT_RESUME need to be done now
> > in the callback of the asynchronous command mechanism.
> >
> > Signed-off-by: Moni Shoua <monis@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >  drivers/infiniband/hw/mlx5/odp.c | 110 +++++++++++++++++++++++++------
> >  include/linux/mlx5/driver.h      |   3 +
> >  2 files changed, 94 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
> > index abce55b8b9ba..0c4f469cdd5b 100644
> > +++ b/drivers/infiniband/hw/mlx5/odp.c
> > @@ -298,20 +298,78 @@ void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev)
> >  	return;
> >  }
> >
> > +struct pfault_resume_cb_ctx {
> > +	struct mlx5_ib_dev *dev;
> > +	struct mlx5_core_rsc_common *res;
> > +	struct mlx5_pagefault *pfault;
> > +};
> > +
> > +static void page_fault_resume_callback(int status, void *context)
> > +{
> > +	struct pfault_resume_cb_ctx *ctx = context;
> > +	struct mlx5_pagefault *pfault = ctx->pfault;
> > +
> > +	if (status)
> > +		mlx5_ib_err(ctx->dev, "Resolve the page fault failed with status %d\n",
> > +			    status);
> > +
> > +	if (ctx->res)
> > +		mlx5_core_res_put(ctx->res);
> > +	kfree(pfault);
> > +	kfree(ctx);
> > +}
> > +
> >  static void mlx5_ib_page_fault_resume(struct mlx5_ib_dev *dev,
> > +				      struct mlx5_core_rsc_common *res,
> >  				      struct mlx5_pagefault *pfault,
> > -				      int error)
> > +				      int error,
> > +				      bool async)
> >  {
> > +	int ret = 0;
> > +	u32 *out = pfault->out_pf_resume;
> > +	u32 *in = pfault->in_pf_resume;
> > +	u32 token = pfault->token;
> >  	int wq_num = pfault->event_subtype == MLX5_PFAULT_SUBTYPE_WQE ?
> > -		     pfault->wqe.wq_num : pfault->token;
> > -	int ret = mlx5_core_page_fault_resume(dev->mdev,
> > -					      pfault->token,
> > -					      wq_num,
> > -					      pfault->type,
> > -					      error);
> > -	if (ret)
> > -		mlx5_ib_err(dev, "Failed to resolve the page fault on WQ 0x%x\n",
> > -			    wq_num);
> > +		pfault->wqe.wq_num : pfault->token;
> > +	u8 type = pfault->type;
> > +	struct pfault_resume_cb_ctx *ctx = NULL;
> > +
> > +	if (async)
> > +		ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>
> Why not allocate this ctx as part of the mlx5_pagefault and avoid
> this allocation failure strategy?

It is another way to implement it; both of them are correct.

Can I assume that we can proceed with the patches except patch #2?

Thanks

>
> Jason



* Re: [PATCH mlx5-next 08/10] IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously
  2018-11-09 16:26     ` Leon Romanovsky
@ 2018-11-09 16:59       ` Jason Gunthorpe
  2018-11-10 15:47         ` Leon Romanovsky
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2018-11-09 16:59 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev

On Fri, Nov 09, 2018 at 06:26:22PM +0200, Leon Romanovsky wrote:
> On Thu, Nov 08, 2018 at 07:49:03PM +0000, Jason Gunthorpe wrote:
> > On Thu, Nov 08, 2018 at 09:10:15PM +0200, Leon Romanovsky wrote:
> > > From: Moni Shoua <monis@mellanox.com>
> > >
> > > Telling the HCA that page fault handling is done and QP can resume
> > > its flow is done in the context of the page fault handler. This blocks
> > > the handling of the next work item in the queue needlessly.
> > > Call the PAGE_FAULT_RESUME command in an asynchronous manner and free
> > > the workqueue to pick the next work item for handling. All tasks that
> > > were executed after PAGE_FAULT_RESUME need to be done now
> > > in the callback of the asynchronous command mechanism.
> > >
> > > Signed-off-by: Moni Shoua <monis@mellanox.com>
> > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > >  drivers/infiniband/hw/mlx5/odp.c | 110 +++++++++++++++++++++++++------
> > >  include/linux/mlx5/driver.h      |   3 +
> > >  2 files changed, 94 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
> > > index abce55b8b9ba..0c4f469cdd5b 100644
> > > +++ b/drivers/infiniband/hw/mlx5/odp.c
> > > @@ -298,20 +298,78 @@ void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev)
> > >  	return;
> > >  }
> > >
> > > +struct pfault_resume_cb_ctx {
> > > +	struct mlx5_ib_dev *dev;
> > > +	struct mlx5_core_rsc_common *res;
> > > +	struct mlx5_pagefault *pfault;
> > > +};
> > > +
> > > +static void page_fault_resume_callback(int status, void *context)
> > > +{
> > > +	struct pfault_resume_cb_ctx *ctx = context;
> > > +	struct mlx5_pagefault *pfault = ctx->pfault;
> > > +
> > > +	if (status)
> > > +		mlx5_ib_err(ctx->dev, "Resolve the page fault failed with status %d\n",
> > > +			    status);
> > > +
> > > +	if (ctx->res)
> > > +		mlx5_core_res_put(ctx->res);
> > > +	kfree(pfault);
> > > +	kfree(ctx);
> > > +}
> > > +
> > >  static void mlx5_ib_page_fault_resume(struct mlx5_ib_dev *dev,
> > > +				      struct mlx5_core_rsc_common *res,
> > >  				      struct mlx5_pagefault *pfault,
> > > -				      int error)
> > > +				      int error,
> > > +				      bool async)
> > >  {
> > > +	int ret = 0;
> > > +	u32 *out = pfault->out_pf_resume;
> > > +	u32 *in = pfault->in_pf_resume;
> > > +	u32 token = pfault->token;
> > >  	int wq_num = pfault->event_subtype == MLX5_PFAULT_SUBTYPE_WQE ?
> > > -		     pfault->wqe.wq_num : pfault->token;
> > > -	int ret = mlx5_core_page_fault_resume(dev->mdev,
> > > -					      pfault->token,
> > > -					      wq_num,
> > > -					      pfault->type,
> > > -					      error);
> > > -	if (ret)
> > > -		mlx5_ib_err(dev, "Failed to resolve the page fault on WQ 0x%x\n",
> > > -			    wq_num);
> > > +		pfault->wqe.wq_num : pfault->token;
> > > +	u8 type = pfault->type;
> > > +	struct pfault_resume_cb_ctx *ctx = NULL;
> > > +
> > > +	if (async)
> > > +		ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
> >
> > Why not allocate this ctx as part of the mlx5_pagefault and avoid
> > this allocation failure strategy?
> 
> It is another way to implement it; both of them are correct.

.. I think it is a lot better to move this allocation; it gets rid of
this ugly duplicated code

> Can I assume that we can proceed with the patches except patch #2?

Let's drop this one too.

Jason


* Re: [PATCH mlx5-next 08/10] IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously
  2018-11-09 16:59       ` Jason Gunthorpe
@ 2018-11-10 15:47         ` Leon Romanovsky
  0 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-10 15:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev


On Fri, Nov 09, 2018 at 09:59:53AM -0700, Jason Gunthorpe wrote:
> On Fri, Nov 09, 2018 at 06:26:22PM +0200, Leon Romanovsky wrote:
> > On Thu, Nov 08, 2018 at 07:49:03PM +0000, Jason Gunthorpe wrote:
> > > On Thu, Nov 08, 2018 at 09:10:15PM +0200, Leon Romanovsky wrote:
> > > > From: Moni Shoua <monis@mellanox.com>
> > > >
> > > > Telling the HCA that page fault handling is done and QP can resume
> > > > its flow is done in the context of the page fault handler. This blocks
> > > > the handling of the next work in queue without a need.
> > > > Call the PAGE_FAULT_RESUME command in an asynchronous manner and free
> > > > the workqueue to pick the next work item for handling. All tasks that
> > > > were executed after PAGE_FAULT_RESUME need to be done now
> > > > in the callback of the asynchronous command mechanism.
> > > >
> > > > Signed-off-by: Moni Shoua <monis@mellanox.com>
> > > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > > >  drivers/infiniband/hw/mlx5/odp.c | 110 +++++++++++++++++++++++++------
> > > >  include/linux/mlx5/driver.h      |   3 +
> > > >  2 files changed, 94 insertions(+), 19 deletions(-)
> > > >
> > > > diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
> > > > index abce55b8b9ba..0c4f469cdd5b 100644
> > > > +++ b/drivers/infiniband/hw/mlx5/odp.c
> > > > @@ -298,20 +298,78 @@ void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev)
> > > >  	return;
> > > >  }
> > > >
> > > > +struct pfault_resume_cb_ctx {
> > > > +	struct mlx5_ib_dev *dev;
> > > > +	struct mlx5_core_rsc_common *res;
> > > > +	struct mlx5_pagefault *pfault;
> > > > +};
> > > > +
> > > > +static void page_fault_resume_callback(int status, void *context)
> > > > +{
> > > > +	struct pfault_resume_cb_ctx *ctx = context;
> > > > +	struct mlx5_pagefault *pfault = ctx->pfault;
> > > > +
> > > > +	if (status)
> > > > +		mlx5_ib_err(ctx->dev, "Resolve the page fault failed with status %d\n",
> > > > +			    status);
> > > > +
> > > > +	if (ctx->res)
> > > > +		mlx5_core_res_put(ctx->res);
> > > > +	kfree(pfault);
> > > > +	kfree(ctx);
> > > > +}
> > > > +
> > > >  static void mlx5_ib_page_fault_resume(struct mlx5_ib_dev *dev,
> > > > +				      struct mlx5_core_rsc_common *res,
> > > >  				      struct mlx5_pagefault *pfault,
> > > > -				      int error)
> > > > +				      int error,
> > > > +				      bool async)
> > > >  {
> > > > +	int ret = 0;
> > > > +	u32 *out = pfault->out_pf_resume;
> > > > +	u32 *in = pfault->in_pf_resume;
> > > > +	u32 token = pfault->token;
> > > >  	int wq_num = pfault->event_subtype == MLX5_PFAULT_SUBTYPE_WQE ?
> > > > -		     pfault->wqe.wq_num : pfault->token;
> > > > -	int ret = mlx5_core_page_fault_resume(dev->mdev,
> > > > -					      pfault->token,
> > > > -					      wq_num,
> > > > -					      pfault->type,
> > > > -					      error);
> > > > -	if (ret)
> > > > -		mlx5_ib_err(dev, "Failed to resolve the page fault on WQ 0x%x\n",
> > > > -			    wq_num);
> > > > +		pfault->wqe.wq_num : pfault->token;
> > > > +	u8 type = pfault->type;
> > > > +	struct pfault_resume_cb_ctx *ctx = NULL;
> > > > +
> > > > +	if (async)
> > > > +		ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
> > >
> > > Why not allocate this ctx as part of the mlx5_pagefault and avoid
> > > this allocation failure strategy?
> >
> > It is another way to implement it; both are correct.
>
> .. I think it is a lot better to move this allocation; it gets rid of
> this ugly duplicated code.
>
> > Can I assume that we can progress with the patches, except patch #2?
>
> Let's drop this one too..

Sure, thanks

>
> Jason

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH mlx5-next 00/10] Collection of ODP fixes
  2018-11-08 19:10 [PATCH mlx5-next 00/10] Collection of ODP fixes Leon Romanovsky
                   ` (10 preceding siblings ...)
  2018-11-08 19:45 ` [PATCH mlx5-next 00/10] Collection of ODP fixes Jason Gunthorpe
@ 2018-11-12 20:43 ` Leon Romanovsky
  2018-11-19 22:25   ` Jason Gunthorpe
  11 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2018-11-12 20:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: RDMA mailing list, Artemy Kovalyov, Majd Dibbiny, Moni Shoua,
	Saeed Mahameed, linux-netdev

[-- Attachment #1: Type: text/plain, Size: 2622 bytes --]

On Thu, Nov 08, 2018 at 09:10:07PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
>
> Hi,
>
> This is a collection of fixes to the mlx5_core and mlx5_ib ODP logic.
> There are two main reasons why we decided to forward it to mlx5-next
> and not to rdma-rc or net like our other fixes.
>
> 1. We have a large number of patches in exactly that area that are ready
> for submission, and we don't want to create extra merge work for the
> maintainers. Among those future series (already ready and tested):
> an internal change to mlx5_core pagefault handling, moving ODP code to
> RDMA, a change in the EQ mechanism, simplification and moving of SRQ QP
> to RDMA, extra fixes to mlx5_ib ODP, and ODP SRQ support.
>
> 2. Most of these fixes are for old bugs, and there is no rush to get
> them in right now (in -rc1/-rc2).
>
> Thanks
>
> Moni Shoua (10):
>   net/mlx5: Release resource on error flow
>   net/mlx5: Add interface to hold and release core resources
>   net/mlx5: Enumerate page fault types
>   IB/mlx5: Lock QP during page fault handling
>   net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state
>   net/mlx5: Use multi threaded workqueue for page fault handling
>   IB/mlx5: Improve ODP debugging messages

Applied to mlx5-next
b02394aa75e3 IB/mlx5: Improve ODP debugging messages
90290db7669b net/mlx5: Use multi threaded workqueue for page fault handling
ef90c5e9757d net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state
032080ab43ac IB/mlx5: Lock QP during page fault handling
c99fefea2cc9 net/mlx5: Enumerate page fault types
27e95603f4df net/mlx5: Add interface to hold and release core resources
698114968a22 net/mlx5: Release resource on error flow

>   IB/mlx5: Avoid hangs due to a missing completion
>   IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously

Dropped to address feedback.

>   net/mlx5: Remove unused function

Dropped, because it depends on a dropped commit.

Thanks

>
>  drivers/infiniband/core/umem_odp.c            |  14 +-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h          |   1 +
>  drivers/infiniband/hw/mlx5/mr.c               |  15 +-
>  drivers/infiniband/hw/mlx5/odp.c              | 158 ++++++++++++++----
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |   2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  21 +--
>  drivers/net/ethernet/mellanox/mlx5/core/qp.c  |  20 ++-
>  include/linux/mlx5/device.h                   |   7 +
>  include/linux/mlx5/driver.h                   |   7 +-
>  include/linux/mlx5/qp.h                       |   5 +
>  10 files changed, 191 insertions(+), 59 deletions(-)
>
> --
> 2.19.1
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH mlx5-next 00/10] Collection of ODP fixes
  2018-11-12 20:43 ` Leon Romanovsky
@ 2018-11-19 22:25   ` Jason Gunthorpe
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2018-11-19 22:25 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Artemy Kovalyov, Majd Dibbiny,
	Moni Shoua, Saeed Mahameed, linux-netdev

On Mon, Nov 12, 2018 at 01:43:15PM -0700, Leon Romanovsky wrote:
> On Thu, Nov 08, 2018 at 09:10:07PM +0200, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > Hi,
> >
> > This is a collection of fixes to the mlx5_core and mlx5_ib ODP logic.
> > There are two main reasons why we decided to forward it to mlx5-next
> > and not to rdma-rc or net like our other fixes.
> >
> > 1. We have a large number of patches in exactly that area that are ready
> > for submission, and we don't want to create extra merge work for the
> > maintainers. Among those future series (already ready and tested):
> > an internal change to mlx5_core pagefault handling, moving ODP code to
> > RDMA, a change in the EQ mechanism, simplification and moving of SRQ QP
> > to RDMA, extra fixes to mlx5_ib ODP, and ODP SRQ support.
> >
> > 2. Most of these fixes are for old bugs, and there is no rush to get
> > them in right now (in -rc1/-rc2).
> >
> > Thanks
> >
> > Moni Shoua (10):
> >   net/mlx5: Release resource on error flow
> >   net/mlx5: Add interface to hold and release core resources
> >   net/mlx5: Enumerate page fault types
> >   IB/mlx5: Lock QP during page fault handling
> >   net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state
> >   net/mlx5: Use multi threaded workqueue for page fault handling
> >   IB/mlx5: Improve ODP debugging messages
> 
> Applied to mlx5-next
> b02394aa75e3 IB/mlx5: Improve ODP debugging messages
> 90290db7669b net/mlx5: Use multi threaded workqueue for page fault handling
> ef90c5e9757d net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state
> 032080ab43ac IB/mlx5: Lock QP during page fault handling
> c99fefea2cc9 net/mlx5: Enumerate page fault types
> 27e95603f4df net/mlx5: Add interface to hold and release core resources
> 698114968a22 net/mlx5: Release resource on error flow
> 
> >   IB/mlx5: Avoid hangs due to a missing completion
> >   IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously
> 
> Dropped to address feedback.
> 
> >   net/mlx5: Remove unused function
> 
> Dropped, because it depends on a dropped commit.

Okay, merged into for-next

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-11-19 22:25 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-08 19:10 [PATCH mlx5-next 00/10] Collection of ODP fixes Leon Romanovsky
2018-11-08 19:10 ` [PATCH mlx5-next 01/10] net/mlx5: Release resource on error flow Leon Romanovsky
2018-11-08 19:10 ` [PATCH mlx5-next 02/10] IB/mlx5: Avoid hangs due to a missing completion Leon Romanovsky
2018-11-08 19:44   ` Jason Gunthorpe
2018-11-09 16:17     ` Leon Romanovsky
2018-11-08 19:10 ` [PATCH mlx5-next 03/10] net/mlx5: Add interface to hold and release core resources Leon Romanovsky
2018-11-08 19:10 ` [PATCH mlx5-next 04/10] net/mlx5: Enumerate page fault types Leon Romanovsky
2018-11-08 19:10 ` [PATCH mlx5-next 05/10] IB/mlx5: Lock QP during page fault handling Leon Romanovsky
2018-11-08 19:10 ` [PATCH mlx5-next 06/10] net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state Leon Romanovsky
2018-11-08 19:10 ` [PATCH mlx5-next 07/10] net/mlx5: Use multi threaded workqueue for page fault handling Leon Romanovsky
2018-11-08 19:10 ` [PATCH mlx5-next 08/10] IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously Leon Romanovsky
2018-11-08 19:49   ` Jason Gunthorpe
2018-11-09 16:26     ` Leon Romanovsky
2018-11-09 16:59       ` Jason Gunthorpe
2018-11-10 15:47         ` Leon Romanovsky
2018-11-08 19:10 ` [PATCH mlx5-next 09/10] net/mlx5: Remove unused function Leon Romanovsky
2018-11-08 19:10 ` [PATCH mlx5-next 10/10] IB/mlx5: Improve ODP debugging messages Leon Romanovsky
2018-11-08 19:45 ` [PATCH mlx5-next 00/10] Collection of ODP fixes Jason Gunthorpe
2018-11-08 19:50   ` Saeed Mahameed
2018-11-12 20:43 ` Leon Romanovsky
2018-11-19 22:25   ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).