* [PATCH vhost 0/7] vdpa/mlx5: Optimize MKEY operations
@ 2024-08-21 11:40 Dragos Tatulea
2024-08-21 11:40 ` [PATCH vhost 1/7] vdpa/mlx5: Create direct MKEYs in parallel Dragos Tatulea
` (6 more replies)
0 siblings, 7 replies; 19+ messages in thread
From: Dragos Tatulea @ 2024-08-21 11:40 UTC (permalink / raw)
To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin, Si-Wei Liu,
virtualization, Gal Pressman
Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo
This series reduces the time spent in .set_map() operations by
parallelizing MKEY creation and deletion for direct MKEYs. Looking at
the top-level MKEY creation/deletion functions, the following
improvements can be seen:
|-------------------+-------------|
| operation | improvement |
|-------------------+-------------|
| create_user_mr() | 3-5x |
| destroy_user_mr() | 8x |
|-------------------+-------------|
The last part of the series introduces lazy MKEY deletion, which
postpones MKEY deletion to a later point, executed from a workqueue.
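The deferred-deletion idea can be sketched in plain userspace C. This is an illustrative analogue, not driver code: the names are invented, and a pending list with an explicit flush stands in for the driver's workqueue so the sketch stays deterministic.

```c
#include <stdlib.h>

/* Illustrative stand-in for an MKEY handle; the real driver tracks
 * mlx5_vdpa_mr objects. */
struct mkey {
	unsigned int id;
	struct mkey *next;
};

/* Pending-deletion list: destruction requests are queued immediately
 * and executed later, so the caller of .set_map() does not wait for
 * the (slow) destroy commands. */
static struct mkey *gc_head;

static void mkey_defer_destroy(struct mkey *mk)
{
	mk->next = gc_head;
	gc_head = mk;
}

/* Runs the postponed deletions; returns how many were destroyed. */
static int mkey_gc_flush(void)
{
	int n = 0;

	while (gc_head) {
		struct mkey *mk = gc_head;

		gc_head = mk->next;
		free(mk);	/* stands in for the DESTROY_MKEY command */
		n++;
	}
	return n;
}

/* Helper: defer destruction of n freshly allocated keys. */
static void mkey_defer_n(int n)
{
	while (n-- > 0)
		mkey_defer_destroy(calloc(1, sizeof(struct mkey)));
}
```

The key property mirrored here is that enqueueing is cheap and the expensive teardown happens later, off the caller's critical path.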
As this series and the previous ones target live migration,
improvements can also be observed on that front:
|-------------------+------------------+------------------|
| Stage | Downtime #1 (ms) | Downtime #2 (ms) |
|-------------------+------------------+------------------|
| Baseline | 3140 | 3630 |
| Parallel MKEY ops | 1200 | 2000 |
| Deferred deletion | 1014 | 1253 |
|-------------------+------------------+------------------|
Test configuration: 256 GB VM, 32 CPUs x 2 threads per core, 4 x mlx5
vDPA devices x 32 VQs (16 VQPs)
This series must be applied on top of the parallel VQ suspend/resume
series [0].
[0] https://lore.kernel.org/all/20240816090159.1967650-1-dtatulea@nvidia.com/
Dragos Tatulea (7):
vdpa/mlx5: Create direct MKEYs in parallel
vdpa/mlx5: Delete direct MKEYs in parallel
vdpa/mlx5: Rename function
vdpa/mlx5: Extract mr members in own resource struct
vdpa/mlx5: Rename mr_mtx -> lock
vdpa/mlx5: Introduce init/destroy for MR resources
vdpa/mlx5: Postpone MR deletion
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 25 ++-
drivers/vdpa/mlx5/core/mr.c | 284 ++++++++++++++++++++++++-----
drivers/vdpa/mlx5/core/resources.c | 3 -
drivers/vdpa/mlx5/net/mlx5_vnet.c | 53 +++---
4 files changed, 293 insertions(+), 72 deletions(-)
--
2.45.1
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH vhost 1/7] vdpa/mlx5: Create direct MKEYs in parallel
2024-08-21 11:40 [PATCH vhost 0/7] vdpa/mlx5: Optimize MKEY operations Dragos Tatulea
@ 2024-08-21 11:40 ` Dragos Tatulea
2024-08-29 13:10 ` Eugenio Perez Martin
2024-08-21 11:40 ` [PATCH vhost 2/7] vdpa/mlx5: Delete " Dragos Tatulea
` (5 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Dragos Tatulea @ 2024-08-21 11:40 UTC (permalink / raw)
To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin, Si-Wei Liu,
virtualization, Gal Pressman
Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
Use the async interface to issue MTT MKEY creation.
Extra care is taken when allocating the FW input commands
because the MTT tables have variable sizes depending on
the MR.
The indirect MKEY is still created synchronously at the
end as the direct MKEYs need to be filled in.
This makes create_user_mr() 3-5x faster, depending on
the size of the MR.
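The per-MR command buffers in this patch pair fixed-size in/out areas with a variable-length MTT tail, and the completion path recovers the containing allocation from the out pointer with container_of(). A compilable userspace analogue of that layout follows; the buffer sizes are invented for the sketch (the real ones come from MLX5_ST_SZ_BYTES()), and cmd_mem_alloc() is a simplified stand-in for the struct_size()-based kvcalloc.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace analogue of struct mlx5_create_mkey_mem: fixed out/in
 * buffers plus a flexible MTT tail sized per MR. */
struct mkey_cmd_mem {
	unsigned char out[16];		/* invented size */
	unsigned char in[32];		/* invented size */
	unsigned long long mtt[];	/* flexible array member */
};

#ifndef container_of
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#endif

/* struct_size()-style sizing, simplified (no overflow checking). */
static struct mkey_cmd_mem *cmd_mem_alloc(size_t nmtt)
{
	return calloc(1, sizeof(struct mkey_cmd_mem) +
			 nmtt * sizeof(unsigned long long));
}
```

Because out and in live inside the same allocation as the MTT tail, a pointer to either buffer is enough to find and free the whole command, which is exactly how the patch's completion loop gets back to its mlx5_create_mkey_mem.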
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/core/mr.c | 118 +++++++++++++++++++++++++++++-------
1 file changed, 96 insertions(+), 22 deletions(-)
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 4758914ccf86..66e6a15f823f 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -49,17 +49,18 @@ static void populate_mtts(struct mlx5_vdpa_direct_mr *mr, __be64 *mtt)
}
}
-static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
+struct mlx5_create_mkey_mem {
+ u8 out[MLX5_ST_SZ_BYTES(create_mkey_out)];
+ u8 in[MLX5_ST_SZ_BYTES(create_mkey_in)];
+ DECLARE_FLEX_ARRAY(__be64, mtt);
+};
+
+static void fill_create_direct_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_direct_mr *mr,
+ struct mlx5_create_mkey_mem *mem)
{
- int inlen;
+ void *in = &mem->in;
void *mkc;
- void *in;
- int err;
-
- inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + roundup(MLX5_ST_SZ_BYTES(mtt) * mr->nsg, 16);
- in = kvzalloc(inlen, GFP_KERNEL);
- if (!in)
- return -ENOMEM;
MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
@@ -76,18 +77,25 @@ static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct
MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
get_octo_len(mr->end - mr->start, mr->log_size));
populate_mtts(mr, MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt));
- err = mlx5_vdpa_create_mkey(mvdev, &mr->mr, in, inlen);
- kvfree(in);
- if (err) {
- mlx5_vdpa_warn(mvdev, "Failed to create direct MR\n");
- return err;
- }
- return 0;
+ MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+ MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
+}
+
+static void create_direct_mr_end(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_direct_mr *mr,
+ struct mlx5_create_mkey_mem *mem)
+{
+ u32 mkey_index = MLX5_GET(create_mkey_out, mem->out, mkey_index);
+
+ mr->mr = mlx5_idx_to_mkey(mkey_index);
}
static void destroy_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
{
+ if (!mr->mr)
+ return;
+
mlx5_vdpa_destroy_mkey(mvdev, mr->mr);
}
@@ -179,6 +187,74 @@ static int klm_byte_size(int nklms)
return 16 * ALIGN(nklms, 4);
}
+static int create_direct_keys(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
+{
+ struct mlx5_vdpa_async_cmd *cmds = NULL;
+ struct mlx5_vdpa_direct_mr *dmr;
+ int err = 0;
+ int i = 0;
+
+ cmds = kvcalloc(mr->num_directs, sizeof(*cmds), GFP_KERNEL);
+ if (!cmds)
+ return -ENOMEM;
+
+ list_for_each_entry(dmr, &mr->head, list) {
+ struct mlx5_create_mkey_mem *cmd_mem;
+ int mttlen, mttcount;
+
+ mttlen = roundup(MLX5_ST_SZ_BYTES(mtt) * dmr->nsg, 16);
+ mttcount = mttlen / sizeof(cmd_mem->mtt[0]);
+ cmd_mem = kvcalloc(1, struct_size(cmd_mem, mtt, mttcount), GFP_KERNEL);
+ if (!cmd_mem) {
+ err = -ENOMEM;
+ goto done;
+ }
+
+ cmds[i].out = cmd_mem->out;
+ cmds[i].outlen = sizeof(cmd_mem->out);
+ cmds[i].in = cmd_mem->in;
+ cmds[i].inlen = struct_size(cmd_mem, mtt, mttcount);
+
+ fill_create_direct_mr(mvdev, dmr, cmd_mem);
+
+ i++;
+ }
+
+ err = mlx5_vdpa_exec_async_cmds(mvdev, cmds, mr->num_directs);
+ if (err) {
+
+ mlx5_vdpa_err(mvdev, "error issuing MTT mkey creation for direct mrs: %d\n", err);
+ goto done;
+ }
+
+ i = 0;
+ list_for_each_entry(dmr, &mr->head, list) {
+ struct mlx5_vdpa_async_cmd *cmd = &cmds[i++];
+ struct mlx5_create_mkey_mem *cmd_mem;
+
+ cmd_mem = container_of(cmd->out, struct mlx5_create_mkey_mem, out);
+
+ if (!cmd->err) {
+ create_direct_mr_end(mvdev, dmr, cmd_mem);
+ } else {
+ err = err ? err : cmd->err;
+ mlx5_vdpa_err(mvdev, "error creating MTT mkey [0x%llx, 0x%llx]: %d\n",
+ dmr->start, dmr->end, cmd->err);
+ }
+ }
+
+done:
+ for (i = i-1; i >= 0; i--) {
+ struct mlx5_create_mkey_mem *cmd_mem;
+
+ cmd_mem = container_of(cmds[i].out, struct mlx5_create_mkey_mem, out);
+ kvfree(cmd_mem);
+ }
+
+ kvfree(cmds);
+ return err;
+}
+
static int create_indirect_key(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
{
int inlen;
@@ -279,14 +355,8 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr
goto err_map;
}
- err = create_direct_mr(mvdev, mr);
- if (err)
- goto err_direct;
-
return 0;
-err_direct:
- dma_unmap_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
err_map:
sg_free_table(&mr->sg_head);
return err;
@@ -401,6 +471,10 @@ static int create_user_mr(struct mlx5_vdpa_dev *mvdev,
if (err)
goto err_chain;
+ err = create_direct_keys(mvdev, mr);
+ if (err)
+ goto err_chain;
+
/* Create the memory key that defines the guests's address space. This
* memory key refers to the direct keys that contain the MTT
* translations
--
2.45.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH vhost 2/7] vdpa/mlx5: Delete direct MKEYs in parallel
2024-08-21 11:40 [PATCH vhost 0/7] vdpa/mlx5: Optimize MKEY operations Dragos Tatulea
2024-08-21 11:40 ` [PATCH vhost 1/7] vdpa/mlx5: Create direct MKEYs in parallel Dragos Tatulea
@ 2024-08-21 11:40 ` Dragos Tatulea
2024-08-29 13:42 ` Eugenio Perez Martin
2024-08-21 11:40 ` [PATCH vhost 3/7] vdpa/mlx5: Rename function Dragos Tatulea
` (4 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Dragos Tatulea @ 2024-08-21 11:40 UTC (permalink / raw)
To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin, Si-Wei Liu,
virtualization, Gal Pressman
Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
Use the async interface to issue MTT MKEY deletion.
This makes destroy_user_mr() on average 8x faster. The
exact speedup depends on the size of the MR being
deleted.
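The deletion path follows the same fill-all, execute-batch, collect-errors shape as creation, but with fixed-size commands, so one array allocation covers every command. A stubbed userspace sketch of that shape is below; exec_cmds() is an invented stand-in for mlx5_vdpa_exec_async_cmds(), and destroy_keys_batched() is hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* One entry per direct MR, loosely mirroring struct mlx5_vdpa_async_cmd. */
struct async_cmd {
	int err;
};

/* Stub executor: the driver hands the whole batch to the async command
 * interface; here every command "succeeds". */
static void exec_cmds(struct async_cmd *cmds, int n)
{
	for (int i = 0; i < n; i++)
		cmds[i].err = 0;
}

/* Fill, execute as one batch, then fold per-command errors into a
 * single return value (the first error wins, as in the patch). */
static int destroy_keys_batched(int num)
{
	struct async_cmd *cmds;
	int err = 0;

	cmds = calloc(num, sizeof(*cmds));
	if (!cmds)
		return -ENOMEM;

	exec_cmds(cmds, num);

	for (int i = 0; i < num; i++)
		if (cmds[i].err && !err)
			err = cmds[i].err;

	free(cmds);
	return err;
}
```

The speedup comes from issuing all destroy commands before waiting, rather than paying one firmware round trip per MKEY.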
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/core/mr.c | 66 +++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 66e6a15f823f..8cedf2969991 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -55,6 +55,11 @@ struct mlx5_create_mkey_mem {
DECLARE_FLEX_ARRAY(__be64, mtt);
};
+struct mlx5_destroy_mkey_mem {
+ u8 out[MLX5_ST_SZ_BYTES(destroy_mkey_out)];
+ u8 in[MLX5_ST_SZ_BYTES(destroy_mkey_in)];
+};
+
static void fill_create_direct_mr(struct mlx5_vdpa_dev *mvdev,
struct mlx5_vdpa_direct_mr *mr,
struct mlx5_create_mkey_mem *mem)
@@ -91,6 +96,17 @@ static void create_direct_mr_end(struct mlx5_vdpa_dev *mvdev,
mr->mr = mlx5_idx_to_mkey(mkey_index);
}
+static void fill_destroy_direct_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_direct_mr *mr,
+ struct mlx5_destroy_mkey_mem *mem)
+{
+ void *in = &mem->in;
+
+ MLX5_SET(destroy_mkey_in, in, uid, mvdev->res.uid);
+ MLX5_SET(destroy_mkey_in, in, opcode, MLX5_CMD_OP_DESTROY_MKEY);
+ MLX5_SET(destroy_mkey_in, in, mkey_index, mlx5_mkey_to_idx(mr->mr));
+}
+
static void destroy_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
{
if (!mr->mr)
@@ -255,6 +271,55 @@ static int create_direct_keys(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *
return err;
}
+static int destroy_direct_keys(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
+{
+ struct mlx5_destroy_mkey_mem *cmd_mem;
+ struct mlx5_vdpa_async_cmd *cmds;
+ struct mlx5_vdpa_direct_mr *dmr;
+ int err = 0;
+ int i = 0;
+
+ cmds = kvcalloc(mr->num_directs, sizeof(*cmds), GFP_KERNEL);
+ cmd_mem = kvcalloc(mr->num_directs, sizeof(*cmd_mem), GFP_KERNEL);
+ if (!cmds || !cmd_mem) {
+ err = -ENOMEM;
+ goto done;
+ }
+
+ list_for_each_entry(dmr, &mr->head, list) {
+ cmds[i].out = cmd_mem[i].out;
+ cmds[i].outlen = sizeof(cmd_mem[i].out);
+ cmds[i].in = cmd_mem[i].in;
+ cmds[i].inlen = sizeof(cmd_mem[i].in);
+ fill_destroy_direct_mr(mvdev, dmr, &cmd_mem[i]);
+ i++;
+ }
+
+ err = mlx5_vdpa_exec_async_cmds(mvdev, cmds, mr->num_directs);
+ if (err) {
+
+ mlx5_vdpa_err(mvdev, "error issuing MTT mkey deletion for direct mrs: %d\n", err);
+ goto done;
+ }
+
+ i = 0;
+ list_for_each_entry(dmr, &mr->head, list) {
+ struct mlx5_vdpa_async_cmd *cmd = &cmds[i++];
+
+ dmr->mr = 0;
+ if (cmd->err) {
+ err = err ? err : cmd->err;
+ mlx5_vdpa_err(mvdev, "error deleting MTT mkey [0x%llx, 0x%llx]: %d\n",
+ dmr->start, dmr->end, cmd->err);
+ }
+ }
+
+done:
+ kvfree(cmd_mem);
+ kvfree(cmds);
+ return err;
+}
+
static int create_indirect_key(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
{
int inlen;
@@ -563,6 +628,7 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr
struct mlx5_vdpa_direct_mr *n;
destroy_indirect_key(mvdev, mr);
+ destroy_direct_keys(mvdev, mr);
list_for_each_entry_safe_reverse(dmr, n, &mr->head, list) {
list_del_init(&dmr->list);
unmap_direct_mr(mvdev, dmr);
--
2.45.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH vhost 3/7] vdpa/mlx5: Rename function
2024-08-21 11:40 [PATCH vhost 0/7] vdpa/mlx5: Optimize MKEY operations Dragos Tatulea
2024-08-21 11:40 ` [PATCH vhost 1/7] vdpa/mlx5: Create direct MKEYs in parallel Dragos Tatulea
2024-08-21 11:40 ` [PATCH vhost 2/7] vdpa/mlx5: Delete " Dragos Tatulea
@ 2024-08-21 11:40 ` Dragos Tatulea
2024-08-21 11:40 ` [PATCH vhost 4/7] vdpa/mlx5: Extract mr members in own resource struct Dragos Tatulea
` (3 subsequent siblings)
6 siblings, 0 replies; 19+ messages in thread
From: Dragos Tatulea @ 2024-08-21 11:40 UTC (permalink / raw)
To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin, Si-Wei Liu,
virtualization, Gal Pressman
Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
A followup patch will use this name for something else.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 2 +-
drivers/vdpa/mlx5/core/mr.c | 2 +-
drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 ++++----
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 24fa00afb24f..4d217d18239c 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -135,7 +135,7 @@ int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, u32 *mkey, u32 *in,
int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, u32 mkey);
struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
struct vhost_iotlb *iotlb);
-void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev);
+void mlx5_vdpa_clean_mrs(struct mlx5_vdpa_dev *mvdev);
void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
struct mlx5_vdpa_mr *mr);
void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 8cedf2969991..149edea09c8f 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -719,7 +719,7 @@ static void mlx5_vdpa_show_mr_leaks(struct mlx5_vdpa_dev *mvdev)
}
-void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
+void mlx5_vdpa_clean_mrs(struct mlx5_vdpa_dev *mvdev)
{
for (int i = 0; i < MLX5_VDPA_NUM_AS; i++)
mlx5_vdpa_update_mr(mvdev, NULL, i);
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 822092eccb32..cf2b77ebc72b 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3223,7 +3223,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
err_driver:
unregister_link_notifier(ndev);
err_setup:
- mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
+ mlx5_vdpa_clean_mrs(&ndev->mvdev);
ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED;
err_clear:
up_write(&ndev->reslock);
@@ -3275,7 +3275,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
}
if (flags & VDPA_RESET_F_CLEAN_MAP)
- mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
+ mlx5_vdpa_clean_mrs(&ndev->mvdev);
ndev->mvdev.status = 0;
ndev->mvdev.suspended = false;
ndev->cur_num_vqs = MLX5V_DEFAULT_VQ_COUNT;
@@ -3433,7 +3433,7 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
ndev = to_mlx5_vdpa_ndev(mvdev);
free_fixed_resources(ndev);
- mlx5_vdpa_destroy_mr_resources(mvdev);
+ mlx5_vdpa_clean_mrs(mvdev);
if (!is_zero_ether_addr(ndev->config.mac)) {
pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev));
mlx5_mpfs_del_mac(pfmdev, ndev->config.mac);
@@ -4008,7 +4008,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
err_res2:
free_fixed_resources(ndev);
err_mr:
- mlx5_vdpa_destroy_mr_resources(mvdev);
+ mlx5_vdpa_clean_mrs(mvdev);
err_res:
mlx5_vdpa_free_resources(&ndev->mvdev);
err_mpfs:
--
2.45.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH vhost 4/7] vdpa/mlx5: Extract mr members in own resource struct
2024-08-21 11:40 [PATCH vhost 0/7] vdpa/mlx5: Optimize MKEY operations Dragos Tatulea
` (2 preceding siblings ...)
2024-08-21 11:40 ` [PATCH vhost 3/7] vdpa/mlx5: Rename function Dragos Tatulea
@ 2024-08-21 11:40 ` Dragos Tatulea
2024-08-29 14:16 ` Eugenio Perez Martin
2024-08-21 11:40 ` [PATCH vhost 5/7] vdpa/mlx5: Rename mr_mtx -> lock Dragos Tatulea
` (2 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Dragos Tatulea @ 2024-08-21 11:40 UTC (permalink / raw)
To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin, Si-Wei Liu,
virtualization, Gal Pressman
Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
Group all mapping-related resources into their own structure.
Upcoming patches will add more members to this new structure.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 13 ++++++-----
drivers/vdpa/mlx5/core/mr.c | 30 ++++++++++++-------------
drivers/vdpa/mlx5/core/resources.c | 6 ++---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 36 +++++++++++++++---------------
4 files changed, 44 insertions(+), 41 deletions(-)
diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 4d217d18239c..5ae6deea2a8a 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -83,10 +83,18 @@ enum {
MLX5_VDPA_NUM_AS = 2
};
+struct mlx5_vdpa_mr_resources {
+ struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
+ unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS];
+ struct list_head mr_list_head;
+ struct mutex mr_mtx;
+};
+
struct mlx5_vdpa_dev {
struct vdpa_device vdev;
struct mlx5_core_dev *mdev;
struct mlx5_vdpa_resources res;
+ struct mlx5_vdpa_mr_resources mres;
u64 mlx_features;
u64 actual_features;
@@ -95,13 +103,8 @@ struct mlx5_vdpa_dev {
u16 max_idx;
u32 generation;
- struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
- struct list_head mr_list_head;
- /* serialize mr access */
- struct mutex mr_mtx;
struct mlx5_control_vq cvq;
struct workqueue_struct *wq;
- unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS];
bool suspended;
struct mlx5_async_ctx async_ctx;
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 149edea09c8f..2c8660e5c0de 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -666,9 +666,9 @@ static void _mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
struct mlx5_vdpa_mr *mr)
{
- mutex_lock(&mvdev->mr_mtx);
+ mutex_lock(&mvdev->mres.mr_mtx);
_mlx5_vdpa_put_mr(mvdev, mr);
- mutex_unlock(&mvdev->mr_mtx);
+ mutex_unlock(&mvdev->mres.mr_mtx);
}
static void _mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
@@ -683,39 +683,39 @@ static void _mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
struct mlx5_vdpa_mr *mr)
{
- mutex_lock(&mvdev->mr_mtx);
+ mutex_lock(&mvdev->mres.mr_mtx);
_mlx5_vdpa_get_mr(mvdev, mr);
- mutex_unlock(&mvdev->mr_mtx);
+ mutex_unlock(&mvdev->mres.mr_mtx);
}
void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
struct mlx5_vdpa_mr *new_mr,
unsigned int asid)
{
- struct mlx5_vdpa_mr *old_mr = mvdev->mr[asid];
+ struct mlx5_vdpa_mr *old_mr = mvdev->mres.mr[asid];
- mutex_lock(&mvdev->mr_mtx);
+ mutex_lock(&mvdev->mres.mr_mtx);
_mlx5_vdpa_put_mr(mvdev, old_mr);
- mvdev->mr[asid] = new_mr;
+ mvdev->mres.mr[asid] = new_mr;
- mutex_unlock(&mvdev->mr_mtx);
+ mutex_unlock(&mvdev->mres.mr_mtx);
}
static void mlx5_vdpa_show_mr_leaks(struct mlx5_vdpa_dev *mvdev)
{
struct mlx5_vdpa_mr *mr;
- mutex_lock(&mvdev->mr_mtx);
+ mutex_lock(&mvdev->mres.mr_mtx);
- list_for_each_entry(mr, &mvdev->mr_list_head, mr_list) {
+ list_for_each_entry(mr, &mvdev->mres.mr_list_head, mr_list) {
mlx5_vdpa_warn(mvdev, "mkey still alive after resource delete: "
"mr: %p, mkey: 0x%x, refcount: %u\n",
mr, mr->mkey, refcount_read(&mr->refcount));
}
- mutex_unlock(&mvdev->mr_mtx);
+ mutex_unlock(&mvdev->mres.mr_mtx);
}
@@ -753,7 +753,7 @@ static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
if (err)
goto err_iotlb;
- list_add_tail(&mr->mr_list, &mvdev->mr_list_head);
+ list_add_tail(&mr->mr_list, &mvdev->mres.mr_list_head);
return 0;
@@ -779,9 +779,9 @@ struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
if (!mr)
return ERR_PTR(-ENOMEM);
- mutex_lock(&mvdev->mr_mtx);
+ mutex_lock(&mvdev->mres.mr_mtx);
err = _mlx5_vdpa_create_mr(mvdev, mr, iotlb);
- mutex_unlock(&mvdev->mr_mtx);
+ mutex_unlock(&mvdev->mres.mr_mtx);
if (err)
goto out_err;
@@ -801,7 +801,7 @@ int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
{
int err;
- if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
+ if (mvdev->mres.group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
return 0;
spin_lock(&mvdev->cvq.iommu_lock);
diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
index 22ea32fe007b..3e3b3049cb08 100644
--- a/drivers/vdpa/mlx5/core/resources.c
+++ b/drivers/vdpa/mlx5/core/resources.c
@@ -256,7 +256,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
mlx5_vdpa_warn(mvdev, "resources already allocated\n");
return -EINVAL;
}
- mutex_init(&mvdev->mr_mtx);
+ mutex_init(&mvdev->mres.mr_mtx);
res->uar = mlx5_get_uars_page(mdev);
if (IS_ERR(res->uar)) {
err = PTR_ERR(res->uar);
@@ -301,7 +301,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
err_uctx:
mlx5_put_uars_page(mdev, res->uar);
err_uars:
- mutex_destroy(&mvdev->mr_mtx);
+ mutex_destroy(&mvdev->mres.mr_mtx);
return err;
}
@@ -318,7 +318,7 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev)
dealloc_pd(mvdev, res->pdn, res->uid);
destroy_uctx(mvdev, res->uid);
mlx5_put_uars_page(mvdev->mdev, res->uar);
- mutex_destroy(&mvdev->mr_mtx);
+ mutex_destroy(&mvdev->mres.mr_mtx);
res->valid = false;
}
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index cf2b77ebc72b..3e55a7f1afcd 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -941,11 +941,11 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev,
MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
- vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+ vq_mr = mvdev->mres.mr[mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_GROUP]];
if (vq_mr)
MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey);
- vq_desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+ vq_desc_mr = mvdev->mres.mr[mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
if (vq_desc_mr &&
MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, vq_desc_mr->mkey);
@@ -953,11 +953,11 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev,
/* If there is no mr update, make sure that the existing ones are set
* modify to ready.
*/
- vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+ vq_mr = mvdev->mres.mr[mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_GROUP]];
if (vq_mr)
mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY;
- vq_desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+ vq_desc_mr = mvdev->mres.mr[mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
if (vq_desc_mr)
mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
}
@@ -1354,7 +1354,7 @@ static void fill_modify_virtqueue_cmd(struct mlx5_vdpa_net *ndev,
}
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
- vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+ vq_mr = mvdev->mres.mr[mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_GROUP]];
if (vq_mr)
MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey);
@@ -1363,7 +1363,7 @@ static void fill_modify_virtqueue_cmd(struct mlx5_vdpa_net *ndev,
}
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
- desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+ desc_mr = mvdev->mres.mr[mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
if (desc_mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, desc_mr->mkey);
@@ -1381,8 +1381,8 @@ static void modify_virtqueue_end(struct mlx5_vdpa_net *ndev,
struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
- unsigned int asid = mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP];
- struct mlx5_vdpa_mr *vq_mr = mvdev->mr[asid];
+ unsigned int asid = mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_GROUP];
+ struct mlx5_vdpa_mr *vq_mr = mvdev->mres.mr[asid];
mlx5_vdpa_put_mr(mvdev, mvq->vq_mr);
mlx5_vdpa_get_mr(mvdev, vq_mr);
@@ -1390,8 +1390,8 @@ static void modify_virtqueue_end(struct mlx5_vdpa_net *ndev,
}
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
- unsigned int asid = mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP];
- struct mlx5_vdpa_mr *desc_mr = mvdev->mr[asid];
+ unsigned int asid = mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP];
+ struct mlx5_vdpa_mr *desc_mr = mvdev->mres.mr[asid];
mlx5_vdpa_put_mr(mvdev, mvq->desc_mr);
mlx5_vdpa_get_mr(mvdev, desc_mr);
@@ -3235,7 +3235,7 @@ static void init_group_to_asid_map(struct mlx5_vdpa_dev *mvdev)
/* default mapping all groups are mapped to asid 0 */
for (i = 0; i < MLX5_VDPA_NUMVQ_GROUPS; i++)
- mvdev->group2asid[i] = 0;
+ mvdev->mres.group2asid[i] = 0;
}
static bool needs_vqs_reset(const struct mlx5_vdpa_dev *mvdev)
@@ -3353,7 +3353,7 @@ static int set_map_data(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
new_mr = NULL;
}
- if (!mvdev->mr[asid]) {
+ if (!mvdev->mres.mr[asid]) {
mlx5_vdpa_update_mr(mvdev, new_mr, asid);
} else {
err = mlx5_vdpa_change_map(mvdev, new_mr, asid);
@@ -3637,12 +3637,12 @@ static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
if (group >= MLX5_VDPA_NUMVQ_GROUPS)
return -EINVAL;
- mvdev->group2asid[group] = asid;
+ mvdev->mres.group2asid[group] = asid;
- mutex_lock(&mvdev->mr_mtx);
- if (group == MLX5_VDPA_CVQ_GROUP && mvdev->mr[asid])
- err = mlx5_vdpa_update_cvq_iotlb(mvdev, mvdev->mr[asid]->iotlb, asid);
- mutex_unlock(&mvdev->mr_mtx);
+ mutex_lock(&mvdev->mres.mr_mtx);
+ if (group == MLX5_VDPA_CVQ_GROUP && mvdev->mres.mr[asid])
+ err = mlx5_vdpa_update_cvq_iotlb(mvdev, mvdev->mres.mr[asid]->iotlb, asid);
+ mutex_unlock(&mvdev->mres.mr_mtx);
return err;
}
@@ -3962,7 +3962,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
if (err)
goto err_mpfs;
- INIT_LIST_HEAD(&mvdev->mr_list_head);
+ INIT_LIST_HEAD(&mvdev->mres.mr_list_head);
if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
err = mlx5_vdpa_create_dma_mr(mvdev);
--
2.45.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH vhost 5/7] vdpa/mlx5: Rename mr_mtx -> lock
2024-08-21 11:40 [PATCH vhost 0/7] vdpa/mlx5: Optimize MKEY operations Dragos Tatulea
` (3 preceding siblings ...)
2024-08-21 11:40 ` [PATCH vhost 4/7] vdpa/mlx5: Extract mr members in own resource struct Dragos Tatulea
@ 2024-08-21 11:40 ` Dragos Tatulea
2024-08-29 14:18 ` Eugenio Perez Martin
2024-08-21 11:41 ` [PATCH vhost 6/7] vdpa/mlx5: Introduce init/destroy for MR resources Dragos Tatulea
2024-08-21 11:41 ` [PATCH vhost 7/7] vdpa/mlx5: Postpone MR deletion Dragos Tatulea
6 siblings, 1 reply; 19+ messages in thread
From: Dragos Tatulea @ 2024-08-21 11:40 UTC (permalink / raw)
To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin, Si-Wei Liu,
virtualization, Gal Pressman
Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
Now that the MR resources have their own namespace in the
struct, give the lock a clearer name.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 2 +-
drivers/vdpa/mlx5/core/mr.c | 20 ++++++++++----------
drivers/vdpa/mlx5/core/resources.c | 6 +++---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
4 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 5ae6deea2a8a..89b564cecddf 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -87,7 +87,7 @@ struct mlx5_vdpa_mr_resources {
struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS];
struct list_head mr_list_head;
- struct mutex mr_mtx;
+ struct mutex lock;
};
struct mlx5_vdpa_dev {
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 2c8660e5c0de..f20f2a8a701d 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -666,9 +666,9 @@ static void _mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
struct mlx5_vdpa_mr *mr)
{
- mutex_lock(&mvdev->mres.mr_mtx);
+ mutex_lock(&mvdev->mres.lock);
_mlx5_vdpa_put_mr(mvdev, mr);
- mutex_unlock(&mvdev->mres.mr_mtx);
+ mutex_unlock(&mvdev->mres.lock);
}
static void _mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
@@ -683,9 +683,9 @@ static void _mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
struct mlx5_vdpa_mr *mr)
{
- mutex_lock(&mvdev->mres.mr_mtx);
+ mutex_lock(&mvdev->mres.lock);
_mlx5_vdpa_get_mr(mvdev, mr);
- mutex_unlock(&mvdev->mres.mr_mtx);
+ mutex_unlock(&mvdev->mres.lock);
}
void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
@@ -694,19 +694,19 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
{
struct mlx5_vdpa_mr *old_mr = mvdev->mres.mr[asid];
- mutex_lock(&mvdev->mres.mr_mtx);
+ mutex_lock(&mvdev->mres.lock);
_mlx5_vdpa_put_mr(mvdev, old_mr);
mvdev->mres.mr[asid] = new_mr;
- mutex_unlock(&mvdev->mres.mr_mtx);
+ mutex_unlock(&mvdev->mres.lock);
}
static void mlx5_vdpa_show_mr_leaks(struct mlx5_vdpa_dev *mvdev)
{
struct mlx5_vdpa_mr *mr;
- mutex_lock(&mvdev->mres.mr_mtx);
+ mutex_lock(&mvdev->mres.lock);
list_for_each_entry(mr, &mvdev->mres.mr_list_head, mr_list) {
@@ -715,7 +715,7 @@ static void mlx5_vdpa_show_mr_leaks(struct mlx5_vdpa_dev *mvdev)
mr, mr->mkey, refcount_read(&mr->refcount));
}
- mutex_unlock(&mvdev->mres.mr_mtx);
+ mutex_unlock(&mvdev->mres.lock);
}
@@ -779,9 +779,9 @@ struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
if (!mr)
return ERR_PTR(-ENOMEM);
- mutex_lock(&mvdev->mres.mr_mtx);
+ mutex_lock(&mvdev->mres.lock);
err = _mlx5_vdpa_create_mr(mvdev, mr, iotlb);
- mutex_unlock(&mvdev->mres.mr_mtx);
+ mutex_unlock(&mvdev->mres.lock);
if (err)
goto out_err;
diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
index 3e3b3049cb08..fe2ca3458f6c 100644
--- a/drivers/vdpa/mlx5/core/resources.c
+++ b/drivers/vdpa/mlx5/core/resources.c
@@ -256,7 +256,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
mlx5_vdpa_warn(mvdev, "resources already allocated\n");
return -EINVAL;
}
- mutex_init(&mvdev->mres.mr_mtx);
+ mutex_init(&mvdev->mres.lock);
res->uar = mlx5_get_uars_page(mdev);
if (IS_ERR(res->uar)) {
err = PTR_ERR(res->uar);
@@ -301,7 +301,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
err_uctx:
mlx5_put_uars_page(mdev, res->uar);
err_uars:
- mutex_destroy(&mvdev->mres.mr_mtx);
+ mutex_destroy(&mvdev->mres.lock);
return err;
}
@@ -318,7 +318,7 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev)
dealloc_pd(mvdev, res->pdn, res->uid);
destroy_uctx(mvdev, res->uid);
mlx5_put_uars_page(mvdev->mdev, res->uar);
- mutex_destroy(&mvdev->mres.mr_mtx);
+ mutex_destroy(&mvdev->mres.lock);
res->valid = false;
}
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 3e55a7f1afcd..8a51c492a62a 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3639,10 +3639,10 @@ static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
mvdev->mres.group2asid[group] = asid;
- mutex_lock(&mvdev->mres.mr_mtx);
+ mutex_lock(&mvdev->mres.lock);
if (group == MLX5_VDPA_CVQ_GROUP && mvdev->mres.mr[asid])
err = mlx5_vdpa_update_cvq_iotlb(mvdev, mvdev->mres.mr[asid]->iotlb, asid);
- mutex_unlock(&mvdev->mres.mr_mtx);
+ mutex_unlock(&mvdev->mres.lock);
return err;
}
--
2.45.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH vhost 6/7] vdpa/mlx5: Introduce init/destroy for MR resources
2024-08-21 11:40 [PATCH vhost 0/7] vdpa/mlx5: Optimize MKEY operations Dragos Tatulea
` (4 preceding siblings ...)
2024-08-21 11:40 ` [PATCH vhost 5/7] vdpa/mlx5: Rename mr_mtx -> lock Dragos Tatulea
@ 2024-08-21 11:41 ` Dragos Tatulea
2024-08-29 14:37 ` Eugenio Perez Martin
2024-08-21 11:41 ` [PATCH vhost 7/7] vdpa/mlx5: Postpone MR deletion Dragos Tatulea
6 siblings, 1 reply; 19+ messages in thread
From: Dragos Tatulea @ 2024-08-21 11:41 UTC (permalink / raw)
To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin, Si-Wei Liu,
virtualization, Gal Pressman
Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
There's currently not much happening during the
init/destroy of MR resources, but more will be added
in the upcoming patches.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 2 ++
drivers/vdpa/mlx5/core/mr.c | 17 +++++++++++++++++
drivers/vdpa/mlx5/core/resources.c | 3 ---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 ++++++++--
4 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 89b564cecddf..c3e17bc888e8 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -138,6 +138,8 @@ int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, u32 *mkey, u32 *in,
int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, u32 mkey);
struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
struct vhost_iotlb *iotlb);
+int mlx5_vdpa_init_mr_resources(struct mlx5_vdpa_dev *mvdev);
+void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev);
void mlx5_vdpa_clean_mrs(struct mlx5_vdpa_dev *mvdev);
void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
struct mlx5_vdpa_mr *mr);
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index f20f2a8a701d..ec75f165f832 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -843,3 +843,20 @@ int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
return 0;
}
+
+int mlx5_vdpa_init_mr_resources(struct mlx5_vdpa_dev *mvdev)
+{
+ struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
+
+ INIT_LIST_HEAD(&mres->mr_list_head);
+ mutex_init(&mres->lock);
+
+ return 0;
+}
+
+void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
+{
+ struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
+
+ mutex_destroy(&mres->lock);
+}
diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
index fe2ca3458f6c..aeae31d0cefa 100644
--- a/drivers/vdpa/mlx5/core/resources.c
+++ b/drivers/vdpa/mlx5/core/resources.c
@@ -256,7 +256,6 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
mlx5_vdpa_warn(mvdev, "resources already allocated\n");
return -EINVAL;
}
- mutex_init(&mvdev->mres.lock);
res->uar = mlx5_get_uars_page(mdev);
if (IS_ERR(res->uar)) {
err = PTR_ERR(res->uar);
@@ -301,7 +300,6 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
err_uctx:
mlx5_put_uars_page(mdev, res->uar);
err_uars:
- mutex_destroy(&mvdev->mres.lock);
return err;
}
@@ -318,7 +316,6 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev)
dealloc_pd(mvdev, res->pdn, res->uid);
destroy_uctx(mvdev, res->uid);
mlx5_put_uars_page(mvdev->mdev, res->uar);
- mutex_destroy(&mvdev->mres.lock);
res->valid = false;
}
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 8a51c492a62a..1cadcb05a5c7 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3434,6 +3434,7 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
free_fixed_resources(ndev);
mlx5_vdpa_clean_mrs(mvdev);
+ mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
if (!is_zero_ether_addr(ndev->config.mac)) {
pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev));
mlx5_mpfs_del_mac(pfmdev, ndev->config.mac);
@@ -3962,12 +3963,15 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
if (err)
goto err_mpfs;
- INIT_LIST_HEAD(&mvdev->mres.mr_list_head);
+ err = mlx5_vdpa_init_mr_resources(mvdev);
+ if (err)
+ goto err_res;
+
if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
err = mlx5_vdpa_create_dma_mr(mvdev);
if (err)
- goto err_res;
+ goto err_mr_res;
}
err = alloc_fixed_resources(ndev);
@@ -4009,6 +4013,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
free_fixed_resources(ndev);
err_mr:
mlx5_vdpa_clean_mrs(mvdev);
+err_mr_res:
+ mlx5_vdpa_destroy_mr_resources(mvdev);
err_res:
mlx5_vdpa_free_resources(&ndev->mvdev);
err_mpfs:
--
2.45.1
* [PATCH vhost 7/7] vdpa/mlx5: Postpone MR deletion
2024-08-21 11:40 [PATCH vhost 0/7] vdpa/mlx5: Optimize MKEY operations Dragos Tatulea
` (5 preceding siblings ...)
2024-08-21 11:41 ` [PATCH vhost 6/7] vdpa/mlx5: Introduce init/destroy for MR resources Dragos Tatulea
@ 2024-08-21 11:41 ` Dragos Tatulea
2024-08-29 15:07 ` Eugenio Perez Martin
6 siblings, 1 reply; 19+ messages in thread
From: Dragos Tatulea @ 2024-08-21 11:41 UTC (permalink / raw)
To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin, Si-Wei Liu,
virtualization, Gal Pressman
Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
Currently, when a new MR is set up, the old MR is deleted. MR deletion
takes about 30-40% of the time of MR creation. As deleting the old MR is
not important for the process of setting up the new MR, this operation
can be postponed.
This patch adds a workqueue that performs MR garbage collection at a
later point. If the MR lock is taken, the handler will back off and
reschedule. The exception is during shutdown, when the handler must
not postpone the work.
Note that this is only a speculative optimization: if a mapping
operation is triggered while the garbage collector handler holds the
lock, that operation will have to wait for the handler to finish.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 10 ++++++
drivers/vdpa/mlx5/core/mr.c | 51 ++++++++++++++++++++++++++++--
drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +-
3 files changed, 60 insertions(+), 4 deletions(-)
diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index c3e17bc888e8..2cedf7e2dbc4 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -86,8 +86,18 @@ enum {
struct mlx5_vdpa_mr_resources {
struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS];
+
+ /* Pre-deletion mr list */
struct list_head mr_list_head;
+
+ /* Deferred mr list */
+ struct list_head mr_gc_list_head;
+ struct workqueue_struct *wq_gc;
+ struct delayed_work gc_dwork_ent;
+
struct mutex lock;
+
+ atomic_t shutdown;
};
struct mlx5_vdpa_dev {
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index ec75f165f832..43fce6b39cf2 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -653,14 +653,46 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_
kfree(mr);
}
+#define MLX5_VDPA_MR_GC_TRIGGER_MS 2000
+
+static void mlx5_vdpa_mr_gc_handler(struct work_struct *work)
+{
+ struct mlx5_vdpa_mr_resources *mres;
+ struct mlx5_vdpa_mr *mr, *tmp;
+ struct mlx5_vdpa_dev *mvdev;
+
+ mres = container_of(work, struct mlx5_vdpa_mr_resources, gc_dwork_ent.work);
+
+ if (atomic_read(&mres->shutdown)) {
+ mutex_lock(&mres->lock);
+ } else if (!mutex_trylock(&mres->lock)) {
+ queue_delayed_work(mres->wq_gc, &mres->gc_dwork_ent,
+ msecs_to_jiffies(MLX5_VDPA_MR_GC_TRIGGER_MS));
+ return;
+ }
+
+ mvdev = container_of(mres, struct mlx5_vdpa_dev, mres);
+
+ list_for_each_entry_safe(mr, tmp, &mres->mr_gc_list_head, mr_list) {
+ _mlx5_vdpa_destroy_mr(mvdev, mr);
+ }
+
+ mutex_unlock(&mres->lock);
+}
+
static void _mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
struct mlx5_vdpa_mr *mr)
{
+ struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
+
if (!mr)
return;
- if (refcount_dec_and_test(&mr->refcount))
- _mlx5_vdpa_destroy_mr(mvdev, mr);
+ if (refcount_dec_and_test(&mr->refcount)) {
+ list_move_tail(&mr->mr_list, &mres->mr_gc_list_head);
+ queue_delayed_work(mres->wq_gc, &mres->gc_dwork_ent,
+ msecs_to_jiffies(MLX5_VDPA_MR_GC_TRIGGER_MS));
+ }
}
void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
@@ -848,9 +880,17 @@ int mlx5_vdpa_init_mr_resources(struct mlx5_vdpa_dev *mvdev)
{
struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
- INIT_LIST_HEAD(&mres->mr_list_head);
+ mres->wq_gc = create_singlethread_workqueue("mlx5_vdpa_mr_gc");
+ if (!mres->wq_gc)
+ return -ENOMEM;
+
+ INIT_DELAYED_WORK(&mres->gc_dwork_ent, mlx5_vdpa_mr_gc_handler);
+
mutex_init(&mres->lock);
+ INIT_LIST_HEAD(&mres->mr_list_head);
+ INIT_LIST_HEAD(&mres->mr_gc_list_head);
+
return 0;
}
@@ -858,5 +898,10 @@ void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
{
struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
+ atomic_set(&mres->shutdown, 1);
+
+ flush_delayed_work(&mres->gc_dwork_ent);
+ destroy_workqueue(mres->wq_gc);
+ mres->wq_gc = NULL;
mutex_destroy(&mres->lock);
}
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 1cadcb05a5c7..ee9482ef51e6 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3435,6 +3435,8 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
free_fixed_resources(ndev);
mlx5_vdpa_clean_mrs(mvdev);
mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
+ mlx5_cmd_cleanup_async_ctx(&mvdev->async_ctx);
+
if (!is_zero_ether_addr(ndev->config.mac)) {
pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev));
mlx5_mpfs_del_mac(pfmdev, ndev->config.mac);
@@ -4044,7 +4046,6 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
destroy_workqueue(wq);
mgtdev->ndev = NULL;
- mlx5_cmd_cleanup_async_ctx(&mvdev->async_ctx);
}
static const struct vdpa_mgmtdev_ops mdev_ops = {
--
2.45.1
* Re: [PATCH vhost 1/7] vdpa/mlx5: Create direct MKEYs in parallel
2024-08-21 11:40 ` [PATCH vhost 1/7] vdpa/mlx5: Create direct MKEYs in parallel Dragos Tatulea
@ 2024-08-29 13:10 ` Eugenio Perez Martin
[not found] ` <6935f3aa-9de5-4781-b823-30c17817cc86@nvidia.com>
0 siblings, 1 reply; 19+ messages in thread
From: Eugenio Perez Martin @ 2024-08-29 13:10 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, virtualization,
Gal Pressman, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
On Wed, Aug 21, 2024 at 1:41 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Use the async interface to issue MTT MKEY creation.
> Extra care is taken at the allocation of FW input commands
> due to the MTT tables having variable sizes depending on
> MR.
>
> The indirect MKEY is still created synchronously at the
> end as the direct MKEYs need to be filled in.
>
> This makes create_user_mr() 3-5x faster, depending on
> the size of the MR.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
> drivers/vdpa/mlx5/core/mr.c | 118 +++++++++++++++++++++++++++++-------
> 1 file changed, 96 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> index 4758914ccf86..66e6a15f823f 100644
> --- a/drivers/vdpa/mlx5/core/mr.c
> +++ b/drivers/vdpa/mlx5/core/mr.c
> @@ -49,17 +49,18 @@ static void populate_mtts(struct mlx5_vdpa_direct_mr *mr, __be64 *mtt)
> }
> }
>
> -static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
> +struct mlx5_create_mkey_mem {
> + u8 out[MLX5_ST_SZ_BYTES(create_mkey_out)];
> + u8 in[MLX5_ST_SZ_BYTES(create_mkey_in)];
> + DECLARE_FLEX_ARRAY(__be64, mtt);
I may be missing something obvious, but why do we need
DECLARE_FLEX_ARRAY here? My understanding is that it is only needed in
special cases like uapi headers and we can use "__be64 mtt[]" here.
> +};
> +
> +static void fill_create_direct_mr(struct mlx5_vdpa_dev *mvdev,
> + struct mlx5_vdpa_direct_mr *mr,
> + struct mlx5_create_mkey_mem *mem)
> {
> - int inlen;
> + void *in = &mem->in;
> void *mkc;
> - void *in;
> - int err;
> -
> - inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + roundup(MLX5_ST_SZ_BYTES(mtt) * mr->nsg, 16);
> - in = kvzalloc(inlen, GFP_KERNEL);
> - if (!in)
> - return -ENOMEM;
>
> MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
> mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
> @@ -76,18 +77,25 @@ static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct
> MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
> get_octo_len(mr->end - mr->start, mr->log_size));
> populate_mtts(mr, MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt));
> - err = mlx5_vdpa_create_mkey(mvdev, &mr->mr, in, inlen);
> - kvfree(in);
> - if (err) {
> - mlx5_vdpa_warn(mvdev, "Failed to create direct MR\n");
> - return err;
> - }
>
> - return 0;
> + MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
> + MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
> +}
> +
> +static void create_direct_mr_end(struct mlx5_vdpa_dev *mvdev,
> + struct mlx5_vdpa_direct_mr *mr,
> + struct mlx5_create_mkey_mem *mem)
> +{
> + u32 mkey_index = MLX5_GET(create_mkey_out, mem->out, mkey_index);
> +
> + mr->mr = mlx5_idx_to_mkey(mkey_index);
> }
>
> static void destroy_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
> {
> + if (!mr->mr)
> + return;
> +
> mlx5_vdpa_destroy_mkey(mvdev, mr->mr);
> }
>
> @@ -179,6 +187,74 @@ static int klm_byte_size(int nklms)
> return 16 * ALIGN(nklms, 4);
> }
>
> +static int create_direct_keys(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
> +{
> + struct mlx5_vdpa_async_cmd *cmds = NULL;
> + struct mlx5_vdpa_direct_mr *dmr;
> + int err = 0;
> + int i = 0;
> +
> + cmds = kvcalloc(mr->num_directs, sizeof(*cmds), GFP_KERNEL);
> + if (!cmds)
> + return -ENOMEM;
Nit: this could benefit from Scope-based Cleanup Helpers [1], as it
would make clear that memory is always removed at the end of the
function & could prevent errors if the function is modified in the
future. I'm a big fan of them so my opinion may be biased though :).
It would be great to be able to remove the array members with that
too, but I think the kernel doesn't have any facility for that.
> +
> + list_for_each_entry(dmr, &mr->head, list) {
> + struct mlx5_create_mkey_mem *cmd_mem;
> + int mttlen, mttcount;
> +
> + mttlen = roundup(MLX5_ST_SZ_BYTES(mtt) * dmr->nsg, 16);
I don't get the roundup here, I guess it is so the driver does not
send potentially uninitialized memory to the device? Maybe the 16
should be a macro?
> + mttcount = mttlen / sizeof(cmd_mem->mtt[0]);
> + cmd_mem = kvcalloc(1, struct_size(cmd_mem, mtt, mttcount), GFP_KERNEL);
> + if (!cmd_mem) {
> + err = -ENOMEM;
> + goto done;
> + }
> +
> + cmds[i].out = cmd_mem->out;
> + cmds[i].outlen = sizeof(cmd_mem->out);
> + cmds[i].in = cmd_mem->in;
> + cmds[i].inlen = struct_size(cmd_mem, mtt, mttcount);
> +
> + fill_create_direct_mr(mvdev, dmr, cmd_mem);
> +
> + i++;
> + }
> +
> + err = mlx5_vdpa_exec_async_cmds(mvdev, cmds, mr->num_directs);
> + if (err) {
> +
> + mlx5_vdpa_err(mvdev, "error issuing MTT mkey creation for direct mrs: %d\n", err);
> + goto done;
> + }
> +
> + i = 0;
> + list_for_each_entry(dmr, &mr->head, list) {
> + struct mlx5_vdpa_async_cmd *cmd = &cmds[i++];
> + struct mlx5_create_mkey_mem *cmd_mem;
> +
> + cmd_mem = container_of(cmd->out, struct mlx5_create_mkey_mem, out);
> +
> + if (!cmd->err) {
> + create_direct_mr_end(mvdev, dmr, cmd_mem);
The caller function doesn't trust the result if we return an error.
Why not use the previous loop to call create_direct_mr_end? Am I
missing any side effect?
> + } else {
> + err = err ? err : cmd->err;
> + mlx5_vdpa_err(mvdev, "error creating MTT mkey [0x%llx, 0x%llx]: %d\n",
> + dmr->start, dmr->end, cmd->err);
> + }
> + }
> +
> +done:
> + for (i = i-1; i >= 0; i--) {
> + struct mlx5_create_mkey_mem *cmd_mem;
> +
> + cmd_mem = container_of(cmds[i].out, struct mlx5_create_mkey_mem, out);
> + kvfree(cmd_mem);
> + }
> +
> + kvfree(cmds);
> + return err;
> +}
> +
> static int create_indirect_key(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
> {
> int inlen;
> @@ -279,14 +355,8 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr
> goto err_map;
> }
>
> - err = create_direct_mr(mvdev, mr);
> - if (err)
> - goto err_direct;
> -
> return 0;
>
> -err_direct:
> - dma_unmap_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
> err_map:
> sg_free_table(&mr->sg_head);
> return err;
> @@ -401,6 +471,10 @@ static int create_user_mr(struct mlx5_vdpa_dev *mvdev,
> if (err)
> goto err_chain;
>
> + err = create_direct_keys(mvdev, mr);
> + if (err)
> + goto err_chain;
> +
> /* Create the memory key that defines the guests's address space. This
> * memory key refers to the direct keys that contain the MTT
> * translations
> --
> 2.45.1
>
* Re: [PATCH vhost 2/7] vdpa/mlx5: Delete direct MKEYs in parallel
2024-08-21 11:40 ` [PATCH vhost 2/7] vdpa/mlx5: Delete " Dragos Tatulea
@ 2024-08-29 13:42 ` Eugenio Perez Martin
0 siblings, 0 replies; 19+ messages in thread
From: Eugenio Perez Martin @ 2024-08-29 13:42 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, virtualization,
Gal Pressman, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
On Wed, Aug 21, 2024 at 1:42 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Use the async interface to issue MTT MKEY deletion.
>
> This makes destroy_user_mr() on average 8x faster.
> This number is also dependent on the size of the MR being
> deleted.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
> drivers/vdpa/mlx5/core/mr.c | 66 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 66 insertions(+)
>
> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> index 66e6a15f823f..8cedf2969991 100644
> --- a/drivers/vdpa/mlx5/core/mr.c
> +++ b/drivers/vdpa/mlx5/core/mr.c
> @@ -55,6 +55,11 @@ struct mlx5_create_mkey_mem {
> DECLARE_FLEX_ARRAY(__be64, mtt);
> };
>
> +struct mlx5_destroy_mkey_mem {
> + u8 out[MLX5_ST_SZ_BYTES(destroy_mkey_out)];
> + u8 in[MLX5_ST_SZ_BYTES(destroy_mkey_in)];
> +};
> +
> static void fill_create_direct_mr(struct mlx5_vdpa_dev *mvdev,
> struct mlx5_vdpa_direct_mr *mr,
> struct mlx5_create_mkey_mem *mem)
> @@ -91,6 +96,17 @@ static void create_direct_mr_end(struct mlx5_vdpa_dev *mvdev,
> mr->mr = mlx5_idx_to_mkey(mkey_index);
> }
>
> +static void fill_destroy_direct_mr(struct mlx5_vdpa_dev *mvdev,
> + struct mlx5_vdpa_direct_mr *mr,
> + struct mlx5_destroy_mkey_mem *mem)
> +{
> + void *in = &mem->in;
> +
Nit, isn't this declaration redundant? Looking at the definition of
MLX5_SET, the second argument is casted to (__be32 *) anyway.
> + MLX5_SET(destroy_mkey_in, in, uid, mvdev->res.uid);
> + MLX5_SET(destroy_mkey_in, in, opcode, MLX5_CMD_OP_DESTROY_MKEY);
> + MLX5_SET(destroy_mkey_in, in, mkey_index, mlx5_mkey_to_idx(mr->mr));
> +}
> +
> static void destroy_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
> {
> if (!mr->mr)
> @@ -255,6 +271,55 @@ static int create_direct_keys(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *
> return err;
> }
>
> +static int destroy_direct_keys(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
> +{
> + struct mlx5_destroy_mkey_mem *cmd_mem;
> + struct mlx5_vdpa_async_cmd *cmds;
> + struct mlx5_vdpa_direct_mr *dmr;
> + int err = 0;
> + int i = 0;
> +
> + cmds = kvcalloc(mr->num_directs, sizeof(*cmds), GFP_KERNEL);
> + cmd_mem = kvcalloc(mr->num_directs, sizeof(*cmd_mem), GFP_KERNEL);
> + if (!cmds || !cmd_mem) {
> + err = -ENOMEM;
> + goto done;
> + }
> +
> + list_for_each_entry(dmr, &mr->head, list) {
> + cmds[i].out = cmd_mem[i].out;
> + cmds[i].outlen = sizeof(cmd_mem[i].out);
> + cmds[i].in = cmd_mem[i].in;
> + cmds[i].inlen = sizeof(cmd_mem[i].in);
> + fill_destroy_direct_mr(mvdev, dmr, &cmd_mem[i]);
> + i++;
> + }
> +
> + err = mlx5_vdpa_exec_async_cmds(mvdev, cmds, mr->num_directs);
> + if (err) {
> +
> + mlx5_vdpa_err(mvdev, "error issuing MTT mkey deletion for direct mrs: %d\n", err);
> + goto done;
> + }
> +
> + i = 0;
> + list_for_each_entry(dmr, &mr->head, list) {
> + struct mlx5_vdpa_async_cmd *cmd = &cmds[i++];
> +
> + dmr->mr = 0;
> + if (cmd->err) {
> + err = err ? err : cmd->err;
> + mlx5_vdpa_err(mvdev, "error deleting MTT mkey [0x%llx, 0x%llx]: %d\n",
> + dmr->start, dmr->end, cmd->err);
> + }
> + }
> +
> +done:
> + kvfree(cmd_mem);
> + kvfree(cmds);
Same nitpick here as in the previous patch about the Scope-based
Cleanup Helpers. Either way,
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> + return err;
> +}
> +
> static int create_indirect_key(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
> {
> int inlen;
> @@ -563,6 +628,7 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr
> struct mlx5_vdpa_direct_mr *n;
>
> destroy_indirect_key(mvdev, mr);
> + destroy_direct_keys(mvdev, mr);
> list_for_each_entry_safe_reverse(dmr, n, &mr->head, list) {
> list_del_init(&dmr->list);
> unmap_direct_mr(mvdev, dmr);
> --
> 2.45.1
>
* Re: [PATCH vhost 4/7] vdpa/mlx5: Extract mr members in own resource struct
2024-08-21 11:40 ` [PATCH vhost 4/7] vdpa/mlx5: Extract mr members in own resource struct Dragos Tatulea
@ 2024-08-29 14:16 ` Eugenio Perez Martin
0 siblings, 0 replies; 19+ messages in thread
From: Eugenio Perez Martin @ 2024-08-29 14:16 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, virtualization,
Gal Pressman, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
On Wed, Aug 21, 2024 at 1:42 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Group all mapping related resources into their own structure.
>
> Upcoming patches will add more members in this new structure.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/core/mlx5_vdpa.h | 13 ++++++-----
> drivers/vdpa/mlx5/core/mr.c | 30 ++++++++++++-------------
> drivers/vdpa/mlx5/core/resources.c | 6 ++---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 36 +++++++++++++++---------------
> 4 files changed, 44 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> index 4d217d18239c..5ae6deea2a8a 100644
> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> @@ -83,10 +83,18 @@ enum {
> MLX5_VDPA_NUM_AS = 2
> };
>
> +struct mlx5_vdpa_mr_resources {
> + struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
> + unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS];
> + struct list_head mr_list_head;
> + struct mutex mr_mtx;
> +};
> +
> struct mlx5_vdpa_dev {
> struct vdpa_device vdev;
> struct mlx5_core_dev *mdev;
> struct mlx5_vdpa_resources res;
> + struct mlx5_vdpa_mr_resources mres;
>
> u64 mlx_features;
> u64 actual_features;
> @@ -95,13 +103,8 @@ struct mlx5_vdpa_dev {
> u16 max_idx;
> u32 generation;
>
> - struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
> - struct list_head mr_list_head;
> - /* serialize mr access */
> - struct mutex mr_mtx;
> struct mlx5_control_vq cvq;
> struct workqueue_struct *wq;
> - unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS];
> bool suspended;
>
> struct mlx5_async_ctx async_ctx;
> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> index 149edea09c8f..2c8660e5c0de 100644
> --- a/drivers/vdpa/mlx5/core/mr.c
> +++ b/drivers/vdpa/mlx5/core/mr.c
> @@ -666,9 +666,9 @@ static void _mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
> void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
> struct mlx5_vdpa_mr *mr)
> {
> - mutex_lock(&mvdev->mr_mtx);
> + mutex_lock(&mvdev->mres.mr_mtx);
> _mlx5_vdpa_put_mr(mvdev, mr);
> - mutex_unlock(&mvdev->mr_mtx);
> + mutex_unlock(&mvdev->mres.mr_mtx);
> }
>
> static void _mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
> @@ -683,39 +683,39 @@ static void _mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
> void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
> struct mlx5_vdpa_mr *mr)
> {
> - mutex_lock(&mvdev->mr_mtx);
> + mutex_lock(&mvdev->mres.mr_mtx);
> _mlx5_vdpa_get_mr(mvdev, mr);
> - mutex_unlock(&mvdev->mr_mtx);
> + mutex_unlock(&mvdev->mres.mr_mtx);
> }
>
> void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
> struct mlx5_vdpa_mr *new_mr,
> unsigned int asid)
> {
> - struct mlx5_vdpa_mr *old_mr = mvdev->mr[asid];
> + struct mlx5_vdpa_mr *old_mr = mvdev->mres.mr[asid];
>
> - mutex_lock(&mvdev->mr_mtx);
> + mutex_lock(&mvdev->mres.mr_mtx);
>
> _mlx5_vdpa_put_mr(mvdev, old_mr);
> - mvdev->mr[asid] = new_mr;
> + mvdev->mres.mr[asid] = new_mr;
>
> - mutex_unlock(&mvdev->mr_mtx);
> + mutex_unlock(&mvdev->mres.mr_mtx);
> }
>
> static void mlx5_vdpa_show_mr_leaks(struct mlx5_vdpa_dev *mvdev)
> {
> struct mlx5_vdpa_mr *mr;
>
> - mutex_lock(&mvdev->mr_mtx);
> + mutex_lock(&mvdev->mres.mr_mtx);
>
> - list_for_each_entry(mr, &mvdev->mr_list_head, mr_list) {
> + list_for_each_entry(mr, &mvdev->mres.mr_list_head, mr_list) {
>
> mlx5_vdpa_warn(mvdev, "mkey still alive after resource delete: "
> "mr: %p, mkey: 0x%x, refcount: %u\n",
> mr, mr->mkey, refcount_read(&mr->refcount));
> }
>
> - mutex_unlock(&mvdev->mr_mtx);
> + mutex_unlock(&mvdev->mres.mr_mtx);
>
> }
>
> @@ -753,7 +753,7 @@ static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
> if (err)
> goto err_iotlb;
>
> - list_add_tail(&mr->mr_list, &mvdev->mr_list_head);
> + list_add_tail(&mr->mr_list, &mvdev->mres.mr_list_head);
>
> return 0;
>
> @@ -779,9 +779,9 @@ struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
> if (!mr)
> return ERR_PTR(-ENOMEM);
>
> - mutex_lock(&mvdev->mr_mtx);
> + mutex_lock(&mvdev->mres.mr_mtx);
> err = _mlx5_vdpa_create_mr(mvdev, mr, iotlb);
> - mutex_unlock(&mvdev->mr_mtx);
> + mutex_unlock(&mvdev->mres.mr_mtx);
>
> if (err)
> goto out_err;
> @@ -801,7 +801,7 @@ int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
> {
> int err;
>
> - if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
> + if (mvdev->mres.group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
> return 0;
>
> spin_lock(&mvdev->cvq.iommu_lock);
> diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
> index 22ea32fe007b..3e3b3049cb08 100644
> --- a/drivers/vdpa/mlx5/core/resources.c
> +++ b/drivers/vdpa/mlx5/core/resources.c
> @@ -256,7 +256,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
> mlx5_vdpa_warn(mvdev, "resources already allocated\n");
> return -EINVAL;
> }
> - mutex_init(&mvdev->mr_mtx);
> + mutex_init(&mvdev->mres.mr_mtx);
> res->uar = mlx5_get_uars_page(mdev);
> if (IS_ERR(res->uar)) {
> err = PTR_ERR(res->uar);
> @@ -301,7 +301,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
> err_uctx:
> mlx5_put_uars_page(mdev, res->uar);
> err_uars:
> - mutex_destroy(&mvdev->mr_mtx);
> + mutex_destroy(&mvdev->mres.mr_mtx);
> return err;
> }
>
> @@ -318,7 +318,7 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev)
> dealloc_pd(mvdev, res->pdn, res->uid);
> destroy_uctx(mvdev, res->uid);
> mlx5_put_uars_page(mvdev->mdev, res->uar);
> - mutex_destroy(&mvdev->mr_mtx);
> + mutex_destroy(&mvdev->mres.mr_mtx);
> res->valid = false;
> }
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index cf2b77ebc72b..3e55a7f1afcd 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -941,11 +941,11 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev,
> MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
> MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
>
> - vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
> + vq_mr = mvdev->mres.mr[mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_GROUP]];
> if (vq_mr)
> MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey);
>
> - vq_desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
> + vq_desc_mr = mvdev->mres.mr[mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
> if (vq_desc_mr &&
> MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
> MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, vq_desc_mr->mkey);
> @@ -953,11 +953,11 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev,
> /* If there is no mr update, make sure that the existing ones are set
> * modify to ready.
> */
> - vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
> + vq_mr = mvdev->mres.mr[mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_GROUP]];
> if (vq_mr)
> mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY;
>
> - vq_desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
> + vq_desc_mr = mvdev->mres.mr[mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
> if (vq_desc_mr)
> mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
> }
> @@ -1354,7 +1354,7 @@ static void fill_modify_virtqueue_cmd(struct mlx5_vdpa_net *ndev,
> }
>
> if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
> - vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
> + vq_mr = mvdev->mres.mr[mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_GROUP]];
>
> if (vq_mr)
> MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey);
> @@ -1363,7 +1363,7 @@ static void fill_modify_virtqueue_cmd(struct mlx5_vdpa_net *ndev,
> }
>
> if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
> - desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
> + desc_mr = mvdev->mres.mr[mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
>
> if (desc_mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
> MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, desc_mr->mkey);
> @@ -1381,8 +1381,8 @@ static void modify_virtqueue_end(struct mlx5_vdpa_net *ndev,
> struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
>
> if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
> - unsigned int asid = mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP];
> - struct mlx5_vdpa_mr *vq_mr = mvdev->mr[asid];
> + unsigned int asid = mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_GROUP];
> + struct mlx5_vdpa_mr *vq_mr = mvdev->mres.mr[asid];
>
> mlx5_vdpa_put_mr(mvdev, mvq->vq_mr);
> mlx5_vdpa_get_mr(mvdev, vq_mr);
> @@ -1390,8 +1390,8 @@ static void modify_virtqueue_end(struct mlx5_vdpa_net *ndev,
> }
>
> if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
> - unsigned int asid = mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP];
> - struct mlx5_vdpa_mr *desc_mr = mvdev->mr[asid];
> + unsigned int asid = mvdev->mres.group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP];
> + struct mlx5_vdpa_mr *desc_mr = mvdev->mres.mr[asid];
>
> mlx5_vdpa_put_mr(mvdev, mvq->desc_mr);
> mlx5_vdpa_get_mr(mvdev, desc_mr);
> @@ -3235,7 +3235,7 @@ static void init_group_to_asid_map(struct mlx5_vdpa_dev *mvdev)
>
> /* default mapping all groups are mapped to asid 0 */
> for (i = 0; i < MLX5_VDPA_NUMVQ_GROUPS; i++)
> - mvdev->group2asid[i] = 0;
> + mvdev->mres.group2asid[i] = 0;
> }
>
> static bool needs_vqs_reset(const struct mlx5_vdpa_dev *mvdev)
> @@ -3353,7 +3353,7 @@ static int set_map_data(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
> new_mr = NULL;
> }
>
> - if (!mvdev->mr[asid]) {
> + if (!mvdev->mres.mr[asid]) {
> mlx5_vdpa_update_mr(mvdev, new_mr, asid);
> } else {
> err = mlx5_vdpa_change_map(mvdev, new_mr, asid);
> @@ -3637,12 +3637,12 @@ static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
> if (group >= MLX5_VDPA_NUMVQ_GROUPS)
> return -EINVAL;
>
> - mvdev->group2asid[group] = asid;
> + mvdev->mres.group2asid[group] = asid;
>
> - mutex_lock(&mvdev->mr_mtx);
> - if (group == MLX5_VDPA_CVQ_GROUP && mvdev->mr[asid])
> - err = mlx5_vdpa_update_cvq_iotlb(mvdev, mvdev->mr[asid]->iotlb, asid);
> - mutex_unlock(&mvdev->mr_mtx);
> + mutex_lock(&mvdev->mres.mr_mtx);
> + if (group == MLX5_VDPA_CVQ_GROUP && mvdev->mres.mr[asid])
> + err = mlx5_vdpa_update_cvq_iotlb(mvdev, mvdev->mres.mr[asid]->iotlb, asid);
> + mutex_unlock(&mvdev->mres.mr_mtx);
>
> return err;
> }
> @@ -3962,7 +3962,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> if (err)
> goto err_mpfs;
>
> - INIT_LIST_HEAD(&mvdev->mr_list_head);
> + INIT_LIST_HEAD(&mvdev->mres.mr_list_head);
>
> if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
> err = mlx5_vdpa_create_dma_mr(mvdev);
> --
> 2.45.1
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH vhost 5/7] vdpa/mlx5: Rename mr_mtx -> lock
2024-08-21 11:40 ` [PATCH vhost 5/7] vdpa/mlx5: Rename mr_mtx -> lock Dragos Tatulea
@ 2024-08-29 14:18 ` Eugenio Perez Martin
0 siblings, 0 replies; 19+ messages in thread
From: Eugenio Perez Martin @ 2024-08-29 14:18 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, virtualization,
Gal Pressman, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
On Wed, Aug 21, 2024 at 1:42 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Now that the mr resources have their own namespace in the
> struct, give the lock a clearer name.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/core/mlx5_vdpa.h | 2 +-
> drivers/vdpa/mlx5/core/mr.c | 20 ++++++++++----------
> drivers/vdpa/mlx5/core/resources.c | 6 +++---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
> 4 files changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> index 5ae6deea2a8a..89b564cecddf 100644
> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> @@ -87,7 +87,7 @@ struct mlx5_vdpa_mr_resources {
> struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
> unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS];
> struct list_head mr_list_head;
> - struct mutex mr_mtx;
> + struct mutex lock;
> };
>
> struct mlx5_vdpa_dev {
> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> index 2c8660e5c0de..f20f2a8a701d 100644
> --- a/drivers/vdpa/mlx5/core/mr.c
> +++ b/drivers/vdpa/mlx5/core/mr.c
> @@ -666,9 +666,9 @@ static void _mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
> void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
> struct mlx5_vdpa_mr *mr)
> {
> - mutex_lock(&mvdev->mres.mr_mtx);
> + mutex_lock(&mvdev->mres.lock);
> _mlx5_vdpa_put_mr(mvdev, mr);
> - mutex_unlock(&mvdev->mres.mr_mtx);
> + mutex_unlock(&mvdev->mres.lock);
> }
>
> static void _mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
> @@ -683,9 +683,9 @@ static void _mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
> void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
> struct mlx5_vdpa_mr *mr)
> {
> - mutex_lock(&mvdev->mres.mr_mtx);
> + mutex_lock(&mvdev->mres.lock);
> _mlx5_vdpa_get_mr(mvdev, mr);
> - mutex_unlock(&mvdev->mres.mr_mtx);
> + mutex_unlock(&mvdev->mres.lock);
> }
>
> void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
> @@ -694,19 +694,19 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
> {
> struct mlx5_vdpa_mr *old_mr = mvdev->mres.mr[asid];
>
> - mutex_lock(&mvdev->mres.mr_mtx);
> + mutex_lock(&mvdev->mres.lock);
>
> _mlx5_vdpa_put_mr(mvdev, old_mr);
> mvdev->mres.mr[asid] = new_mr;
>
> - mutex_unlock(&mvdev->mres.mr_mtx);
> + mutex_unlock(&mvdev->mres.lock);
> }
>
> static void mlx5_vdpa_show_mr_leaks(struct mlx5_vdpa_dev *mvdev)
> {
> struct mlx5_vdpa_mr *mr;
>
> - mutex_lock(&mvdev->mres.mr_mtx);
> + mutex_lock(&mvdev->mres.lock);
>
> list_for_each_entry(mr, &mvdev->mres.mr_list_head, mr_list) {
>
> @@ -715,7 +715,7 @@ static void mlx5_vdpa_show_mr_leaks(struct mlx5_vdpa_dev *mvdev)
> mr, mr->mkey, refcount_read(&mr->refcount));
> }
>
> - mutex_unlock(&mvdev->mres.mr_mtx);
> + mutex_unlock(&mvdev->mres.lock);
>
> }
>
> @@ -779,9 +779,9 @@ struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
> if (!mr)
> return ERR_PTR(-ENOMEM);
>
> - mutex_lock(&mvdev->mres.mr_mtx);
> + mutex_lock(&mvdev->mres.lock);
> err = _mlx5_vdpa_create_mr(mvdev, mr, iotlb);
> - mutex_unlock(&mvdev->mres.mr_mtx);
> + mutex_unlock(&mvdev->mres.lock);
>
> if (err)
> goto out_err;
> diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
> index 3e3b3049cb08..fe2ca3458f6c 100644
> --- a/drivers/vdpa/mlx5/core/resources.c
> +++ b/drivers/vdpa/mlx5/core/resources.c
> @@ -256,7 +256,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
> mlx5_vdpa_warn(mvdev, "resources already allocated\n");
> return -EINVAL;
> }
> - mutex_init(&mvdev->mres.mr_mtx);
> + mutex_init(&mvdev->mres.lock);
> res->uar = mlx5_get_uars_page(mdev);
> if (IS_ERR(res->uar)) {
> err = PTR_ERR(res->uar);
> @@ -301,7 +301,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
> err_uctx:
> mlx5_put_uars_page(mdev, res->uar);
> err_uars:
> - mutex_destroy(&mvdev->mres.mr_mtx);
> + mutex_destroy(&mvdev->mres.lock);
> return err;
> }
>
> @@ -318,7 +318,7 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev)
> dealloc_pd(mvdev, res->pdn, res->uid);
> destroy_uctx(mvdev, res->uid);
> mlx5_put_uars_page(mvdev->mdev, res->uar);
> - mutex_destroy(&mvdev->mres.mr_mtx);
> + mutex_destroy(&mvdev->mres.lock);
> res->valid = false;
> }
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 3e55a7f1afcd..8a51c492a62a 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -3639,10 +3639,10 @@ static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
>
> mvdev->mres.group2asid[group] = asid;
>
> - mutex_lock(&mvdev->mres.mr_mtx);
> + mutex_lock(&mvdev->mres.lock);
> if (group == MLX5_VDPA_CVQ_GROUP && mvdev->mres.mr[asid])
> err = mlx5_vdpa_update_cvq_iotlb(mvdev, mvdev->mres.mr[asid]->iotlb, asid);
> - mutex_unlock(&mvdev->mres.mr_mtx);
> + mutex_unlock(&mvdev->mres.lock);
>
> return err;
> }
> --
> 2.45.1
>
* Re: [PATCH vhost 6/7] vdpa/mlx5: Introduce init/destroy for MR resources
2024-08-21 11:41 ` [PATCH vhost 6/7] vdpa/mlx5: Introduce init/destroy for MR resources Dragos Tatulea
@ 2024-08-29 14:37 ` Eugenio Perez Martin
2024-08-29 15:25 ` Dragos Tatulea
0 siblings, 1 reply; 19+ messages in thread
From: Eugenio Perez Martin @ 2024-08-29 14:37 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, virtualization,
Gal Pressman, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
On Wed, Aug 21, 2024 at 1:42 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> There's currently not a lot of action happening during
> the init/destroy of MR resources. But more will be added
> in the upcoming patches.
If the series doesn't receive new patches, it is just the next patch :).
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
> drivers/vdpa/mlx5/core/mlx5_vdpa.h | 2 ++
> drivers/vdpa/mlx5/core/mr.c | 17 +++++++++++++++++
> drivers/vdpa/mlx5/core/resources.c | 3 ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 ++++++++--
> 4 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> index 89b564cecddf..c3e17bc888e8 100644
> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> @@ -138,6 +138,8 @@ int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, u32 *mkey, u32 *in,
> int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, u32 mkey);
> struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
> struct vhost_iotlb *iotlb);
> +int mlx5_vdpa_init_mr_resources(struct mlx5_vdpa_dev *mvdev);
> +void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev);
> void mlx5_vdpa_clean_mrs(struct mlx5_vdpa_dev *mvdev);
> void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
> struct mlx5_vdpa_mr *mr);
> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> index f20f2a8a701d..ec75f165f832 100644
> --- a/drivers/vdpa/mlx5/core/mr.c
> +++ b/drivers/vdpa/mlx5/core/mr.c
> @@ -843,3 +843,20 @@ int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
>
> return 0;
> }
> +
> +int mlx5_vdpa_init_mr_resources(struct mlx5_vdpa_dev *mvdev)
> +{
> + struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
> +
> + INIT_LIST_HEAD(&mres->mr_list_head);
> + mutex_init(&mres->lock);
> +
> + return 0;
I'd leave this function return void here and remove the caller error
control path.
> +}
> +
> +void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
> +{
> + struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
> +
> + mutex_destroy(&mres->lock);
> +}
> diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
> index fe2ca3458f6c..aeae31d0cefa 100644
> --- a/drivers/vdpa/mlx5/core/resources.c
> +++ b/drivers/vdpa/mlx5/core/resources.c
> @@ -256,7 +256,6 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
> mlx5_vdpa_warn(mvdev, "resources already allocated\n");
> return -EINVAL;
> }
> - mutex_init(&mvdev->mres.lock);
> res->uar = mlx5_get_uars_page(mdev);
> if (IS_ERR(res->uar)) {
> err = PTR_ERR(res->uar);
> @@ -301,7 +300,6 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
> err_uctx:
> mlx5_put_uars_page(mdev, res->uar);
> err_uars:
> - mutex_destroy(&mvdev->mres.lock);
Maybe it is just me, but this patch is also moving the lock lifetime
from mlx5_vdpa_alloc_resources / mlx5_vdpa_free_resources to
mlx5_vdpa_dev_add / mlx5_vdpa_free. I guess it has a justification we
can either clarify in the patch message or split into its own patch.
> return err;
> }
>
> @@ -318,7 +316,6 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev)
> dealloc_pd(mvdev, res->pdn, res->uid);
> destroy_uctx(mvdev, res->uid);
> mlx5_put_uars_page(mvdev->mdev, res->uar);
> - mutex_destroy(&mvdev->mres.lock);
> res->valid = false;
> }
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 8a51c492a62a..1cadcb05a5c7 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -3434,6 +3434,7 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
>
> free_fixed_resources(ndev);
> mlx5_vdpa_clean_mrs(mvdev);
> + mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
> if (!is_zero_ether_addr(ndev->config.mac)) {
> pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev));
> mlx5_mpfs_del_mac(pfmdev, ndev->config.mac);
> @@ -3962,12 +3963,15 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> if (err)
> goto err_mpfs;
>
> - INIT_LIST_HEAD(&mvdev->mres.mr_list_head);
> + err = mlx5_vdpa_init_mr_resources(mvdev);
> + if (err)
> + goto err_res;
> +
Extra newline here.
>
> if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
> err = mlx5_vdpa_create_dma_mr(mvdev);
> if (err)
> - goto err_res;
> + goto err_mr_res;
> }
>
> err = alloc_fixed_resources(ndev);
> @@ -4009,6 +4013,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> free_fixed_resources(ndev);
> err_mr:
> mlx5_vdpa_clean_mrs(mvdev);
> +err_mr_res:
> + mlx5_vdpa_destroy_mr_resources(mvdev);
> err_res:
> mlx5_vdpa_free_resources(&ndev->mvdev);
> err_mpfs:
> --
> 2.45.1
>
* Re: [PATCH vhost 7/7] vdpa/mlx5: Postpone MR deletion
2024-08-21 11:41 ` [PATCH vhost 7/7] vdpa/mlx5: Postpone MR deletion Dragos Tatulea
@ 2024-08-29 15:07 ` Eugenio Perez Martin
2024-08-29 15:22 ` Dragos Tatulea
0 siblings, 1 reply; 19+ messages in thread
From: Eugenio Perez Martin @ 2024-08-29 15:07 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, virtualization,
Gal Pressman, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
On Wed, Aug 21, 2024 at 1:42 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Currently, when a new MR is set up, the old MR is deleted. MR deletion
> takes about 30-40% of the time of MR creation. As deleting the old MR is not
> important for the process of setting up the new MR, this operation
> can be postponed.
>
> This patch adds a workqueue that does MR garbage collection at a later
> point. If the MR lock is taken, the handler will back off and
> reschedule. The exception is shutdown, when the handler must not
> postpone the work.
>
> Note that this is only a speculative optimization: if there is some
> mapping operation that is triggered while the garbage collector handler
> has the lock taken, this operation will have to wait for the handler
> to finish.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
> drivers/vdpa/mlx5/core/mlx5_vdpa.h | 10 ++++++
> drivers/vdpa/mlx5/core/mr.c | 51 ++++++++++++++++++++++++++++--
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +-
> 3 files changed, 60 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> index c3e17bc888e8..2cedf7e2dbc4 100644
> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> @@ -86,8 +86,18 @@ enum {
> struct mlx5_vdpa_mr_resources {
> struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
> unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS];
> +
> + /* Pre-deletion mr list */
> struct list_head mr_list_head;
> +
> + /* Deferred mr list */
> + struct list_head mr_gc_list_head;
> + struct workqueue_struct *wq_gc;
> + struct delayed_work gc_dwork_ent;
> +
> struct mutex lock;
> +
> + atomic_t shutdown;
> };
>
> struct mlx5_vdpa_dev {
> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> index ec75f165f832..43fce6b39cf2 100644
> --- a/drivers/vdpa/mlx5/core/mr.c
> +++ b/drivers/vdpa/mlx5/core/mr.c
> @@ -653,14 +653,46 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_
> kfree(mr);
> }
>
> +#define MLX5_VDPA_MR_GC_TRIGGER_MS 2000
> +
> +static void mlx5_vdpa_mr_gc_handler(struct work_struct *work)
> +{
> + struct mlx5_vdpa_mr_resources *mres;
> + struct mlx5_vdpa_mr *mr, *tmp;
> + struct mlx5_vdpa_dev *mvdev;
> +
> + mres = container_of(work, struct mlx5_vdpa_mr_resources, gc_dwork_ent.work);
> +
> + if (atomic_read(&mres->shutdown)) {
> + mutex_lock(&mres->lock);
> + } else if (!mutex_trylock(&mres->lock)) {
Is the trylock worth it? My understanding is that mutex_lock will add
the kthread to the waitqueue anyway if it is not able to acquire the
lock.
> + queue_delayed_work(mres->wq_gc, &mres->gc_dwork_ent,
> + msecs_to_jiffies(MLX5_VDPA_MR_GC_TRIGGER_MS));
> + return;
> + }
> +
> + mvdev = container_of(mres, struct mlx5_vdpa_dev, mres);
> +
> + list_for_each_entry_safe(mr, tmp, &mres->mr_gc_list_head, mr_list) {
> + _mlx5_vdpa_destroy_mr(mvdev, mr);
> + }
> +
> + mutex_unlock(&mres->lock);
> +}
> +
> static void _mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
> struct mlx5_vdpa_mr *mr)
> {
> + struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
> +
> if (!mr)
> return;
>
> - if (refcount_dec_and_test(&mr->refcount))
> - _mlx5_vdpa_destroy_mr(mvdev, mr);
> + if (refcount_dec_and_test(&mr->refcount)) {
> + list_move_tail(&mr->mr_list, &mres->mr_gc_list_head);
> + queue_delayed_work(mres->wq_gc, &mres->gc_dwork_ent,
> + msecs_to_jiffies(MLX5_VDPA_MR_GC_TRIGGER_MS));
Why the delay?
> + }
> }
>
> void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
> @@ -848,9 +880,17 @@ int mlx5_vdpa_init_mr_resources(struct mlx5_vdpa_dev *mvdev)
> {
> struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
>
> - INIT_LIST_HEAD(&mres->mr_list_head);
> + mres->wq_gc = create_singlethread_workqueue("mlx5_vdpa_mr_gc");
> + if (!mres->wq_gc)
> + return -ENOMEM;
> +
> + INIT_DELAYED_WORK(&mres->gc_dwork_ent, mlx5_vdpa_mr_gc_handler);
> +
> mutex_init(&mres->lock);
>
> + INIT_LIST_HEAD(&mres->mr_list_head);
> + INIT_LIST_HEAD(&mres->mr_gc_list_head);
> +
> return 0;
> }
>
> @@ -858,5 +898,10 @@ void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
> {
> struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
>
> + atomic_set(&mres->shutdown, 1);
> +
> + flush_delayed_work(&mres->gc_dwork_ent);
> + destroy_workqueue(mres->wq_gc);
> + mres->wq_gc = NULL;
> mutex_destroy(&mres->lock);
> }
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 1cadcb05a5c7..ee9482ef51e6 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -3435,6 +3435,8 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
> free_fixed_resources(ndev);
> mlx5_vdpa_clean_mrs(mvdev);
> mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
> + mlx5_cmd_cleanup_async_ctx(&mvdev->async_ctx);
> +
> if (!is_zero_ether_addr(ndev->config.mac)) {
> pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev));
> mlx5_mpfs_del_mac(pfmdev, ndev->config.mac);
> @@ -4044,7 +4046,6 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
> destroy_workqueue(wq);
> mgtdev->ndev = NULL;
>
Extra newline here.
> - mlx5_cmd_cleanup_async_ctx(&mvdev->async_ctx);
> }
>
> static const struct vdpa_mgmtdev_ops mdev_ops = {
> --
> 2.45.1
>
* Re: [PATCH vhost 1/7] vdpa/mlx5: Create direct MKEYs in parallel
[not found] ` <6935f3aa-9de5-4781-b823-30c17817cc86@nvidia.com>
@ 2024-08-29 15:15 ` Eugenio Perez Martin
2024-08-30 11:09 ` Dragos Tatulea
0 siblings, 1 reply; 19+ messages in thread
From: Eugenio Perez Martin @ 2024-08-29 15:15 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Jason Wang, Si-Wei Liu, virtualization, Gal Pressman, kvm list,
linux-kernel, Parav Pandit, Xuan Zhuo, Cosmin Ratiu,
Michael Tsirkin
On Thu, Aug 29, 2024 at 3:54 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
>
>
> On 29.08.24 15:10, Eugenio Perez Martin wrote:
> > On Wed, Aug 21, 2024 at 1:41 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> >>
> >> Use the async interface to issue MTT MKEY creation.
> >> Extra care is taken at the allocation of FW input commands
> >> due to the MTT tables having variable sizes depending on
> >> MR.
> >>
> >> The indirect MKEY is still created synchronously at the
> >> end as the direct MKEYs need to be filled in.
> >>
> >> This makes create_user_mr() 3-5x faster, depending on
> >> the size of the MR.
> >>
> >> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> >> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> >> ---
> >> drivers/vdpa/mlx5/core/mr.c | 118 +++++++++++++++++++++++++++++-------
> >> 1 file changed, 96 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> >> index 4758914ccf86..66e6a15f823f 100644
> >> --- a/drivers/vdpa/mlx5/core/mr.c
> >> +++ b/drivers/vdpa/mlx5/core/mr.c
> >> @@ -49,17 +49,18 @@ static void populate_mtts(struct mlx5_vdpa_direct_mr *mr, __be64 *mtt)
> >> }
> >> }
> >>
> >> -static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
> >> +struct mlx5_create_mkey_mem {
> >> + u8 out[MLX5_ST_SZ_BYTES(create_mkey_out)];
> >> + u8 in[MLX5_ST_SZ_BYTES(create_mkey_in)];
> >> + DECLARE_FLEX_ARRAY(__be64, mtt);
> >
> > I may be missing something obvious, but why do we need
> > DECLARE_FLEX_ARRAY here? My understanding is that it is only needed in
> > special cases like uapi headers and we can use "__be64 mtt[]" here.
> >
> checkpatch.pl was complaining about it because in my initial version I
> used the "[0]" version of zero-length arrays.
>
> My impression was that DECLARE_FLEX_ARRAY is preferred option because it
> triggers a compiler error if the zero lenth array is not at the end of
> the struct. But on closer inspection I see that using the right C99
> empty brackets notation is enough to trigger this error.
> DECLARE_FLEX_ARRAY seems to be useful for the union case.
>
> I will change it in a v2.
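For readers following along, the distinction is easy to see in plain userspace C (names here are illustrative, not the driver's actual structs): the C99 empty-bracket flexible array member must be the last field, and the compiler rejects any other placement, a guarantee the GNU `[0]` notation does not give by itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative analogue of mlx5_create_mkey_mem: the flexible array
 * member "mtt[]" must be the last field; moving it earlier is a hard
 * compile error with the C99 empty-bracket notation. */
struct mkey_mem {
	uint8_t out[16];
	uint8_t in[32];
	uint64_t mtt[];		/* C99 flexible array member */
};

/* struct_size()-style sizing, userspace version: header plus the
 * variable-length MTT tail in one allocation. */
static struct mkey_mem *mkey_mem_alloc(size_t nmtt)
{
	return calloc(1, sizeof(struct mkey_mem) + nmtt * sizeof(uint64_t));
}
```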
>
> >> +};
> >> +
> >> +static void fill_create_direct_mr(struct mlx5_vdpa_dev *mvdev,
> >> + struct mlx5_vdpa_direct_mr *mr,
> >> + struct mlx5_create_mkey_mem *mem)
> >> {
> >> - int inlen;
> >> + void *in = &mem->in;
> >> void *mkc;
> >> - void *in;
> >> - int err;
> >> -
> >> - inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + roundup(MLX5_ST_SZ_BYTES(mtt) * mr->nsg, 16);
> >> - in = kvzalloc(inlen, GFP_KERNEL);
> >> - if (!in)
> >> - return -ENOMEM;
> >>
> >> MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
> >> mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
> >> @@ -76,18 +77,25 @@ static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct
> >> MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
> >> get_octo_len(mr->end - mr->start, mr->log_size));
> >> populate_mtts(mr, MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt));
> >> - err = mlx5_vdpa_create_mkey(mvdev, &mr->mr, in, inlen);
> >> - kvfree(in);
> >> - if (err) {
> >> - mlx5_vdpa_warn(mvdev, "Failed to create direct MR\n");
> >> - return err;
> >> - }
> >>
> >> - return 0;
> >> + MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
> >> + MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
> >> +}
> >> +
> >> +static void create_direct_mr_end(struct mlx5_vdpa_dev *mvdev,
> >> + struct mlx5_vdpa_direct_mr *mr,
> >> + struct mlx5_create_mkey_mem *mem)
> >> +{
> >> + u32 mkey_index = MLX5_GET(create_mkey_out, mem->out, mkey_index);
> >> +
> >> + mr->mr = mlx5_idx_to_mkey(mkey_index);
> >> }
> >>
> >> static void destroy_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
> >> {
> >> + if (!mr->mr)
> >> + return;
> >> +
> >> mlx5_vdpa_destroy_mkey(mvdev, mr->mr);
> >> }
> >>
> >> @@ -179,6 +187,74 @@ static int klm_byte_size(int nklms)
> >> return 16 * ALIGN(nklms, 4);
> >> }
> >>
> >> +static int create_direct_keys(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
> >> +{
> >> + struct mlx5_vdpa_async_cmd *cmds = NULL;
> >> + struct mlx5_vdpa_direct_mr *dmr;
> >> + int err = 0;
> >> + int i = 0;
> >> +
> >> + cmds = kvcalloc(mr->num_directs, sizeof(*cmds), GFP_KERNEL);
> >> + if (!cmds)
> >> + return -ENOMEM;
> >
> > Nit: this could benefit from Scope-based Cleanup Helpers [1], as it
> > would make clear that memory is always removed at the end of the
> > function & could prevent errors if the function is modified in the
> > future. I'm a big fan of them so my opinion may be biased though :).
> >
> > It would be great to be able to remove the array members with that
> > too, but I think the kernel doesn't have any facility for that.
> >
> I didn't know about those. It sounds like a good idea! I will try
> to use them in v2.
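For reference, a minimal userspace sketch of the scope-based cleanup idea, using the GCC/Clang `cleanup` attribute that the kernel's helpers in `linux/cleanup.h` build on (the names below are illustrative):

```c
#include <assert.h>
#include <stdlib.h>

static int cleanups_run;

/* Cleanup callback: receives a pointer to the annotated variable. */
static void free_buf(char **p)
{
	free(*p);
	cleanups_run++;
}

/* buf is released automatically when it goes out of scope, so every
 * return path frees it without an explicit goto/free ladder. */
static int use_scoped_buffer(void)
{
	__attribute__((cleanup(free_buf))) char *buf = malloc(64);

	if (!buf)
		return -1;
	buf[0] = 'x';
	return 0;	/* free_buf() runs here */
}
```

The appeal in error-heavy functions like create_direct_keys() is that new early returns cannot leak the allocation.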
>
> >> +
> >> + list_for_each_entry(dmr, &mr->head, list) {
> >> + struct mlx5_create_mkey_mem *cmd_mem;
> >> + int mttlen, mttcount;
> >> +
> >> + mttlen = roundup(MLX5_ST_SZ_BYTES(mtt) * dmr->nsg, 16);
> >
> > I don't get the roundup here, I guess it is so the driver does not
> > send potentially uninitialized memory to the device? Maybe the 16
> > should be a macro?
> >
> The roundup is a hw requirement. A define would be a good idea. Will add
> it.
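A userspace sketch of the sizing in question (the define name is illustrative, not from the driver):

```c
#include <assert.h>

/* Userspace copy of the kernel's roundup(); the 16-byte granularity
 * is the MTT table alignment the hardware requires, so naming it
 * (e.g. MLX5_VDPA_MTT_ALIGN, illustrative) makes the intent explicit. */
#define ROUNDUP(x, y)		((((x) + (y) - 1) / (y)) * (y))
#define MLX5_VDPA_MTT_ALIGN	16

/* Length of the MTT buffer for nsg scatter entries of the given size,
 * rounded up to the hardware alignment. */
static int mtt_buf_len(int mtt_entry_sz, int nsg)
{
	return ROUNDUP(mtt_entry_sz * nsg, MLX5_VDPA_MTT_ALIGN);
}
```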
>
> >> + mttcount = mttlen / sizeof(cmd_mem->mtt[0]);
> >> + cmd_mem = kvcalloc(1, struct_size(cmd_mem, mtt, mttcount), GFP_KERNEL);
> >> + if (!cmd_mem) {
> >> + err = -ENOMEM;
> >> + goto done;
> >> + }
> >> +
> >> + cmds[i].out = cmd_mem->out;
> >> + cmds[i].outlen = sizeof(cmd_mem->out);
> >> + cmds[i].in = cmd_mem->in;
> >> + cmds[i].inlen = struct_size(cmd_mem, mtt, mttcount);
> >> +
> >> + fill_create_direct_mr(mvdev, dmr, cmd_mem);
> >> +
> >> + i++;
> >> + }
> >> +
> >> + err = mlx5_vdpa_exec_async_cmds(mvdev, cmds, mr->num_directs);
> >> + if (err) {
> >> +
> >> + mlx5_vdpa_err(mvdev, "error issuing MTT mkey creation for direct mrs: %d\n", err);
> >> + goto done;
> >> + }
> >> +
> >> + i = 0;
> >> + list_for_each_entry(dmr, &mr->head, list) {
> >> + struct mlx5_vdpa_async_cmd *cmd = &cmds[i++];
> >> + struct mlx5_create_mkey_mem *cmd_mem;
> >> +
> >> + cmd_mem = container_of(cmd->out, struct mlx5_create_mkey_mem, out);
> >> +
> >> + if (!cmd->err) {
> >> + create_direct_mr_end(mvdev, dmr, cmd_mem);
> >
> > The caller function doesn't trust the result if we return an error.
> > Why not use the previous loop to call create_direct_mr_end? Am I
> > missing any side effect?
> >
> Which previous loop? We have the mkey value only after the command has
> been executed.
Ok, now I see what I proposed didn't make sense, thanks!
> I added the if here (instead of always calling
> create_direct_mr_end()) just to make things more explicit for the
> reader.
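The two-pass shape being discussed can be sketched in plain C (illustrative types standing in for the FW command buffers): the fill loop cannot consume results because the output, e.g. the mkey index, exists only after the whole batch executes; the second pass then collects results per command while preserving the first error seen.

```c
#include <assert.h>

/* Stand-in for mlx5_vdpa_async_cmd: input prepared up front, output
 * and per-command error filled in by execution. */
struct async_cmd {
	int in;
	int out;
	int err;
};

/* Pretend batch execution: in the driver this is the async FW call. */
static void exec_async_cmds(struct async_cmd *cmds, int n)
{
	for (int i = 0; i < n; i++)
		cmds[i].out = cmds[i].in * 2;	/* fake FW result */
}

/* Second pass: consume successful results, keep the first error. */
static int collect_results(struct async_cmd *cmds, int n, int *results)
{
	int err = 0;

	for (int i = 0; i < n; i++) {
		if (!cmds[i].err)
			results[i] = cmds[i].out;
		else
			err = err ? err : cmds[i].err;
	}
	return err;
}
```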
>
> >> + } else {
> >> + err = err ? err : cmd->err;
> >> + mlx5_vdpa_err(mvdev, "error creating MTT mkey [0x%llx, 0x%llx]: %d\n",
> >> + dmr->start, dmr->end, cmd->err);
> >> + }
> >> + }
> >> +
> >> +done:
> >> + for (i = i-1; i >= 0; i--) {
> >> + struct mlx5_create_mkey_mem *cmd_mem;
> >> +
> >> + cmd_mem = container_of(cmds[i].out, struct mlx5_create_mkey_mem, out);
> >> + kvfree(cmd_mem);
> >> + }
> >> +
> >> + kvfree(cmds);
> >> + return err;
> >> +}
> >> +
> >> static int create_indirect_key(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
> >> {
> >> int inlen;
> >> @@ -279,14 +355,8 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr
> >> goto err_map;
> >> }
> >>
> >> - err = create_direct_mr(mvdev, mr);
> >> - if (err)
> >> - goto err_direct;
> >> -
> >> return 0;
> >>
> >> -err_direct:
> >> - dma_unmap_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
> >> err_map:
> >> sg_free_table(&mr->sg_head);
> >> return err;
> >> @@ -401,6 +471,10 @@ static int create_user_mr(struct mlx5_vdpa_dev *mvdev,
> >> if (err)
> >> goto err_chain;
> >>
> >> + err = create_direct_keys(mvdev, mr);
> >> + if (err)
> >> + goto err_chain;
> >> +
> >> /* Create the memory key that defines the guests's address space. This
> >> * memory key refers to the direct keys that contain the MTT
> >> * translations
> >> --
> >> 2.45.1
> >>
> >
>
* Re: [PATCH vhost 7/7] vdpa/mlx5: Postpone MR deletion
2024-08-29 15:07 ` Eugenio Perez Martin
@ 2024-08-29 15:22 ` Dragos Tatulea
2024-08-29 17:12 ` Eugenio Perez Martin
0 siblings, 1 reply; 19+ messages in thread
From: Dragos Tatulea @ 2024-08-29 15:22 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, virtualization,
Gal Pressman, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
On 29.08.24 17:07, Eugenio Perez Martin wrote:
> On Wed, Aug 21, 2024 at 1:42 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>>
>> Currently, when a new MR is set up, the old MR is deleted. MR deletion
>> takes about 30-40% of the time of MR creation. As deleting the old MR is not
>> important for the process of setting up the new MR, this operation
>> can be postponed.
>>
>> This patch adds a workqueue that does MR garbage collection at a later
>> point. If the MR lock is taken, the handler will back off and
>> reschedule. The exception is shutdown, when the handler must not
>> postpone the work.
>>
>> Note that this is only a speculative optimization: if there is some
>> mapping operation that is triggered while the garbage collector handler
>> has the lock taken, this operation will have to wait for the handler
>> to finish.
>>
>> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
>> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
>> ---
>> drivers/vdpa/mlx5/core/mlx5_vdpa.h | 10 ++++++
>> drivers/vdpa/mlx5/core/mr.c | 51 ++++++++++++++++++++++++++++--
>> drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +-
>> 3 files changed, 60 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>> index c3e17bc888e8..2cedf7e2dbc4 100644
>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>> @@ -86,8 +86,18 @@ enum {
>> struct mlx5_vdpa_mr_resources {
>> struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
>> unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS];
>> +
>> + /* Pre-deletion mr list */
>> struct list_head mr_list_head;
>> +
>> + /* Deferred mr list */
>> + struct list_head mr_gc_list_head;
>> + struct workqueue_struct *wq_gc;
>> + struct delayed_work gc_dwork_ent;
>> +
>> struct mutex lock;
>> +
>> + atomic_t shutdown;
>> };
>>
>> struct mlx5_vdpa_dev {
>> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
>> index ec75f165f832..43fce6b39cf2 100644
>> --- a/drivers/vdpa/mlx5/core/mr.c
>> +++ b/drivers/vdpa/mlx5/core/mr.c
>> @@ -653,14 +653,46 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_
>> kfree(mr);
>> }
>>
>> +#define MLX5_VDPA_MR_GC_TRIGGER_MS 2000
>> +
>> +static void mlx5_vdpa_mr_gc_handler(struct work_struct *work)
>> +{
>> + struct mlx5_vdpa_mr_resources *mres;
>> + struct mlx5_vdpa_mr *mr, *tmp;
>> + struct mlx5_vdpa_dev *mvdev;
>> +
>> + mres = container_of(work, struct mlx5_vdpa_mr_resources, gc_dwork_ent.work);
>> +
>> + if (atomic_read(&mres->shutdown)) {
>> + mutex_lock(&mres->lock);
>> + } else if (!mutex_trylock(&mres->lock)) {
>
> Is the trylock worth it? My understanding is that mutex_lock will add
> the kthread to the waitqueue anyway if it is not able to acquire the
> lock.
>
I want to believe it is :). I noticed during testing that this can
interfere with the case where there are several .set_map() operations
in quick succession. That's why the work is delayed by such a long
time.
It's not a perfect heuristic but I found that it's better than not
having it.
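The back-off heuristic can be sketched in userspace with pthreads (illustrative names; in the driver the reschedule is queue_delayed_work() and the blocking path is taken only on shutdown):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Sketch of the GC handler's locking policy: a failed trylock means a
 * mapping operation is in flight, so outside of shutdown the handler
 * reschedules itself instead of contending for the lock. */
struct mr_gc {
	pthread_mutex_t lock;
	bool shutdown;
	int pending;		/* stand-in for the deferred-mr list */
	int rescheduled;	/* stand-in for queue_delayed_work() */
};

static void gc_handler(struct mr_gc *gc)
{
	if (gc->shutdown) {
		pthread_mutex_lock(&gc->lock);	/* must not postpone */
	} else if (pthread_mutex_trylock(&gc->lock) != 0) {
		gc->rescheduled++;		/* back off, try later */
		return;
	}

	gc->pending = 0;			/* destroy deferred MRs */
	pthread_mutex_unlock(&gc->lock);
}
```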
>> + queue_delayed_work(mres->wq_gc, &mres->gc_dwork_ent,
>> + msecs_to_jiffies(MLX5_VDPA_MR_GC_TRIGGER_MS));
>> + return;
>> + }
>> +
>> + mvdev = container_of(mres, struct mlx5_vdpa_dev, mres);
>> +
>> + list_for_each_entry_safe(mr, tmp, &mres->mr_gc_list_head, mr_list) {
>> + _mlx5_vdpa_destroy_mr(mvdev, mr);
>> + }
>> +
>> + mutex_unlock(&mres->lock);
>> +}
>> +
>> static void _mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
>> struct mlx5_vdpa_mr *mr)
>> {
>> + struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
>> +
>> if (!mr)
>> return;
>>
>> - if (refcount_dec_and_test(&mr->refcount))
>> - _mlx5_vdpa_destroy_mr(mvdev, mr);
>> + if (refcount_dec_and_test(&mr->refcount)) {
>> + list_move_tail(&mr->mr_list, &mres->mr_gc_list_head);
>> + queue_delayed_work(mres->wq_gc, &mres->gc_dwork_ent,
>> + msecs_to_jiffies(MLX5_VDPA_MR_GC_TRIGGER_MS));
>
> Why the delay?
>
See above.
>> + }
>> }
>>
>> void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
>> @@ -848,9 +880,17 @@ int mlx5_vdpa_init_mr_resources(struct mlx5_vdpa_dev *mvdev)
>> {
>> struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
>>
>> - INIT_LIST_HEAD(&mres->mr_list_head);
>> + mres->wq_gc = create_singlethread_workqueue("mlx5_vdpa_mr_gc");
>> + if (!mres->wq_gc)
>> + return -ENOMEM;
>> +
>> + INIT_DELAYED_WORK(&mres->gc_dwork_ent, mlx5_vdpa_mr_gc_handler);
>> +
>> mutex_init(&mres->lock);
>>
>> + INIT_LIST_HEAD(&mres->mr_list_head);
>> + INIT_LIST_HEAD(&mres->mr_gc_list_head);
>> +
>> return 0;
>> }
>>
>> @@ -858,5 +898,10 @@ void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
>> {
>> struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
>>
>> + atomic_set(&mres->shutdown, 1);
>> +
>> + flush_delayed_work(&mres->gc_dwork_ent);
>> + destroy_workqueue(mres->wq_gc);
>> + mres->wq_gc = NULL;
>> mutex_destroy(&mres->lock);
>> }
>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> index 1cadcb05a5c7..ee9482ef51e6 100644
>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> @@ -3435,6 +3435,8 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
>> free_fixed_resources(ndev);
>> mlx5_vdpa_clean_mrs(mvdev);
>> mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
>> + mlx5_cmd_cleanup_async_ctx(&mvdev->async_ctx);
>> +
>> if (!is_zero_ether_addr(ndev->config.mac)) {
>> pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev));
>> mlx5_mpfs_del_mac(pfmdev, ndev->config.mac);
>> @@ -4044,7 +4046,6 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
>> destroy_workqueue(wq);
>> mgtdev->ndev = NULL;
>>
>
> Extra newline here.
Ack.
>
>> - mlx5_cmd_cleanup_async_ctx(&mvdev->async_ctx);
>> }
>>
>> static const struct vdpa_mgmtdev_ops mdev_ops = {
>> --
>> 2.45.1
>>
>
* Re: [PATCH vhost 6/7] vdpa/mlx5: Introduce init/destroy for MR resources
2024-08-29 14:37 ` Eugenio Perez Martin
@ 2024-08-29 15:25 ` Dragos Tatulea
0 siblings, 0 replies; 19+ messages in thread
From: Dragos Tatulea @ 2024-08-29 15:25 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, virtualization,
Gal Pressman, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
On 29.08.24 16:37, Eugenio Perez Martin wrote:
> On Wed, Aug 21, 2024 at 1:42 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>>
>> There's currently not a lot of action happening during
>> the init/destroy of MR resources. But more will be added
>> in the upcoming patches.
>
> If the series doesn't receive new patches, it is just the next patch :).
>
>>
>> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
>> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
>> ---
>> drivers/vdpa/mlx5/core/mlx5_vdpa.h | 2 ++
>> drivers/vdpa/mlx5/core/mr.c | 17 +++++++++++++++++
>> drivers/vdpa/mlx5/core/resources.c | 3 ---
>> drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 ++++++++--
>> 4 files changed, 27 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>> index 89b564cecddf..c3e17bc888e8 100644
>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>> @@ -138,6 +138,8 @@ int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, u32 *mkey, u32 *in,
>> int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, u32 mkey);
>> struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
>> struct vhost_iotlb *iotlb);
>> +int mlx5_vdpa_init_mr_resources(struct mlx5_vdpa_dev *mvdev);
>> +void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev);
>> void mlx5_vdpa_clean_mrs(struct mlx5_vdpa_dev *mvdev);
>> void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
>> struct mlx5_vdpa_mr *mr);
>> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
>> index f20f2a8a701d..ec75f165f832 100644
>> --- a/drivers/vdpa/mlx5/core/mr.c
>> +++ b/drivers/vdpa/mlx5/core/mr.c
>> @@ -843,3 +843,20 @@ int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
>>
>> return 0;
>> }
>> +
>> +int mlx5_vdpa_init_mr_resources(struct mlx5_vdpa_dev *mvdev)
>> +{
>> + struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
>> +
>> + INIT_LIST_HEAD(&mres->mr_list_head);
>> + mutex_init(&mres->lock);
>> +
>> + return 0;
>
> I'd leave this function return void here and remove the caller error
> control path.
>
It is like this because the next patch adds an error path.
>> +}
>> +
>> +void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
>> +{
>> + struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
>> +
>> + mutex_destroy(&mres->lock);
>> +}
>> diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
>> index fe2ca3458f6c..aeae31d0cefa 100644
>> --- a/drivers/vdpa/mlx5/core/resources.c
>> +++ b/drivers/vdpa/mlx5/core/resources.c
>> @@ -256,7 +256,6 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
>> mlx5_vdpa_warn(mvdev, "resources already allocated\n");
>> return -EINVAL;
>> }
>> - mutex_init(&mvdev->mres.lock);
>> res->uar = mlx5_get_uars_page(mdev);
>> if (IS_ERR(res->uar)) {
>> err = PTR_ERR(res->uar);
>> @@ -301,7 +300,6 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
>> err_uctx:
>> mlx5_put_uars_page(mdev, res->uar);
>> err_uars:
>> - mutex_destroy(&mvdev->mres.lock);
>
> Maybe it is just me, but this patch is also moving the lock lifetime
> from mlx5_vdpa_alloc_resources / mlx5_vdpa_free_resources to
> mlx5_vdpa_dev_add / mlx5_vdpa_free. I guess it has a justification we
> can either clarify in the patch message or split in its own patch.
>
Good point. Will do.
>> return err;
>> }
>>
>> @@ -318,7 +316,6 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev)
>> dealloc_pd(mvdev, res->pdn, res->uid);
>> destroy_uctx(mvdev, res->uid);
>> mlx5_put_uars_page(mvdev->mdev, res->uar);
>> - mutex_destroy(&mvdev->mres.lock);
>> res->valid = false;
>> }
>>
>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> index 8a51c492a62a..1cadcb05a5c7 100644
>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> @@ -3434,6 +3434,7 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
>>
>> free_fixed_resources(ndev);
>> mlx5_vdpa_clean_mrs(mvdev);
>> + mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
>> if (!is_zero_ether_addr(ndev->config.mac)) {
>> pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev));
>> mlx5_mpfs_del_mac(pfmdev, ndev->config.mac);
>> @@ -3962,12 +3963,15 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
>> if (err)
>> goto err_mpfs;
>>
>> - INIT_LIST_HEAD(&mvdev->mres.mr_list_head);
>> + err = mlx5_vdpa_init_mr_resources(mvdev);
>> + if (err)
>> + goto err_res;
>> +
>
> Extra newline here.
>
Seems like I'm keen on these extra newlines. Will fix.
Thanks,
Dragos
>>
>> if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
>> err = mlx5_vdpa_create_dma_mr(mvdev);
>> if (err)
>> - goto err_res;
>> + goto err_mr_res;
>> }
>>
>> err = alloc_fixed_resources(ndev);
>> @@ -4009,6 +4013,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
>> free_fixed_resources(ndev);
>> err_mr:
>> mlx5_vdpa_clean_mrs(mvdev);
>> +err_mr_res:
>> + mlx5_vdpa_destroy_mr_resources(mvdev);
>> err_res:
>> mlx5_vdpa_free_resources(&ndev->mvdev);
>> err_mpfs:
>> --
>> 2.45.1
>>
>
* Re: [PATCH vhost 7/7] vdpa/mlx5: Postpone MR deletion
2024-08-29 15:22 ` Dragos Tatulea
@ 2024-08-29 17:12 ` Eugenio Perez Martin
0 siblings, 0 replies; 19+ messages in thread
From: Eugenio Perez Martin @ 2024-08-29 17:12 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, virtualization,
Gal Pressman, kvm, linux-kernel, Parav Pandit, Xuan Zhuo,
Cosmin Ratiu
On Thu, Aug 29, 2024 at 5:23 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
>
>
> On 29.08.24 17:07, Eugenio Perez Martin wrote:
> > On Wed, Aug 21, 2024 at 1:42 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> >>
> >> Currently, when a new MR is set up, the old MR is deleted. MR deletion
> >> is about 30-40% the time of MR creation. As deleting the old MR is not
> >> important for the process of setting up the new MR, this operation
> >> can be postponed.
> >>
> >> This patch adds a workqueue that does MR garbage collection at a later
> >> point. If the MR lock is taken, the handler will back off and
> >> reschedule. The exception is during shutdown: then the handler must
> >> not postpone the work.
> >>
> >> Note that this is only a speculative optimization: if a mapping
> >> operation is triggered while the garbage collector handler has the
> >> lock taken, that operation will have to wait for the handler
> >> to finish.
> >>
> >> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> >> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> >> ---
> >> drivers/vdpa/mlx5/core/mlx5_vdpa.h | 10 ++++++
> >> drivers/vdpa/mlx5/core/mr.c | 51 ++++++++++++++++++++++++++++--
> >> drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +-
> >> 3 files changed, 60 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> >> index c3e17bc888e8..2cedf7e2dbc4 100644
> >> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> >> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> >> @@ -86,8 +86,18 @@ enum {
> >> struct mlx5_vdpa_mr_resources {
> >> struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
> >> unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS];
> >> +
> >> + /* Pre-deletion mr list */
> >> struct list_head mr_list_head;
> >> +
> >> + /* Deferred mr list */
> >> + struct list_head mr_gc_list_head;
> >> + struct workqueue_struct *wq_gc;
> >> + struct delayed_work gc_dwork_ent;
> >> +
> >> struct mutex lock;
> >> +
> >> + atomic_t shutdown;
> >> };
> >>
> >> struct mlx5_vdpa_dev {
> >> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> >> index ec75f165f832..43fce6b39cf2 100644
> >> --- a/drivers/vdpa/mlx5/core/mr.c
> >> +++ b/drivers/vdpa/mlx5/core/mr.c
> >> @@ -653,14 +653,46 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_
> >> kfree(mr);
> >> }
> >>
> >> +#define MLX5_VDPA_MR_GC_TRIGGER_MS 2000
> >> +
> >> +static void mlx5_vdpa_mr_gc_handler(struct work_struct *work)
> >> +{
> >> + struct mlx5_vdpa_mr_resources *mres;
> >> + struct mlx5_vdpa_mr *mr, *tmp;
> >> + struct mlx5_vdpa_dev *mvdev;
> >> +
> >> + mres = container_of(work, struct mlx5_vdpa_mr_resources, gc_dwork_ent.work);
> >> +
> >> + if (atomic_read(&mres->shutdown)) {
> >> + mutex_lock(&mres->lock);
> >> + } else if (!mutex_trylock(&mres->lock)) {
> >
> > Is the trylock worth it? My understanding is that mutex_lock will add
> > the kthread to the waitqueue anyway if it is not able to acquire the
> > lock.
> >
> I want to believe it is :). I noticed during testing that this can
> interfere with the case where there are several .set_map() operations
> in quick succession. That's why the work is delayed by such a long
> time.
>
> It's not a perfect heuristic but I found that it's better than not
> having it.
>
Understood, thanks for explaining! Can you add the explanation to the
macro? It would be great to find a mechanism to queue the work in a
low-priority fashion, but I don't know of any.
> >> + queue_delayed_work(mres->wq_gc, &mres->gc_dwork_ent,
> >> + msecs_to_jiffies(MLX5_VDPA_MR_GC_TRIGGER_MS));
> >> + return;
> >> + }
> >> +
> >> + mvdev = container_of(mres, struct mlx5_vdpa_dev, mres);
> >> +
> >> + list_for_each_entry_safe(mr, tmp, &mres->mr_gc_list_head, mr_list) {
> >> + _mlx5_vdpa_destroy_mr(mvdev, mr);
> >> + }
> >> +
> >> + mutex_unlock(&mres->lock);
> >> +}
> >> +
> >> static void _mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
> >> struct mlx5_vdpa_mr *mr)
> >> {
> >> + struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
> >> +
> >> if (!mr)
> >> return;
> >>
> >> - if (refcount_dec_and_test(&mr->refcount))
> >> - _mlx5_vdpa_destroy_mr(mvdev, mr);
> >> + if (refcount_dec_and_test(&mr->refcount)) {
> >> + list_move_tail(&mr->mr_list, &mres->mr_gc_list_head);
> >> + queue_delayed_work(mres->wq_gc, &mres->gc_dwork_ent,
> >> + msecs_to_jiffies(MLX5_VDPA_MR_GC_TRIGGER_MS));
> >
> > Why the delay?
> >
> See above.
>
> >> + }
> >> }
> >>
> >> void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
> >> @@ -848,9 +880,17 @@ int mlx5_vdpa_init_mr_resources(struct mlx5_vdpa_dev *mvdev)
> >> {
> >> struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
> >>
> >> - INIT_LIST_HEAD(&mres->mr_list_head);
> >> + mres->wq_gc = create_singlethread_workqueue("mlx5_vdpa_mr_gc");
> >> + if (!mres->wq_gc)
> >> + return -ENOMEM;
> >> +
> >> + INIT_DELAYED_WORK(&mres->gc_dwork_ent, mlx5_vdpa_mr_gc_handler);
> >> +
> >> mutex_init(&mres->lock);
> >>
> >> + INIT_LIST_HEAD(&mres->mr_list_head);
> >> + INIT_LIST_HEAD(&mres->mr_gc_list_head);
> >> +
> >> return 0;
> >> }
> >>
> >> @@ -858,5 +898,10 @@ void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
> >> {
> >> struct mlx5_vdpa_mr_resources *mres = &mvdev->mres;
> >>
> >> + atomic_set(&mres->shutdown, 1);
> >> +
> >> + flush_delayed_work(&mres->gc_dwork_ent);
> >> + destroy_workqueue(mres->wq_gc);
> >> + mres->wq_gc = NULL;
> >> mutex_destroy(&mres->lock);
> >> }
> >> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >> index 1cadcb05a5c7..ee9482ef51e6 100644
> >> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >> @@ -3435,6 +3435,8 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
> >> free_fixed_resources(ndev);
> >> mlx5_vdpa_clean_mrs(mvdev);
> >> mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
> >> + mlx5_cmd_cleanup_async_ctx(&mvdev->async_ctx);
> >> +
> >> if (!is_zero_ether_addr(ndev->config.mac)) {
> >> pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev));
> >> mlx5_mpfs_del_mac(pfmdev, ndev->config.mac);
> >> @@ -4044,7 +4046,6 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
> >> destroy_workqueue(wq);
> >> mgtdev->ndev = NULL;
> >>
> >
> > Extra newline here.
> Ack.
> >
> >> - mlx5_cmd_cleanup_async_ctx(&mvdev->async_ctx);
> >> }
> >>
> >> static const struct vdpa_mgmtdev_ops mdev_ops = {
> >> --
> >> 2.45.1
> >>
> >
>
* Re: [PATCH vhost 1/7] vdpa/mlx5: Create direct MKEYs in parallel
2024-08-29 15:15 ` Eugenio Perez Martin
@ 2024-08-30 11:09 ` Dragos Tatulea
0 siblings, 0 replies; 19+ messages in thread
From: Dragos Tatulea @ 2024-08-30 11:09 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Jason Wang, Si-Wei Liu, virtualization, Gal Pressman, kvm list,
linux-kernel, Parav Pandit, Xuan Zhuo, Cosmin Ratiu,
Michael Tsirkin
On 29.08.24 17:15, Eugenio Perez Martin wrote:
> On Thu, Aug 29, 2024 at 3:54 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>>
>>
>>
>> On 29.08.24 15:10, Eugenio Perez Martin wrote:
>>> On Wed, Aug 21, 2024 at 1:41 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>>>>
>>>> Use the async interface to issue MTT MKEY creation.
>>>> Extra care is taken at the allocation of FW input commands
>>>> due to the MTT tables having variable sizes depending on
>>>> MR.
>>>>
>>>> The indirect MKEY is still created synchronously at the
>>>> end as the direct MKEYs need to be filled in.
>>>>
>>>> This makes create_user_mr() 3-5x faster, depending on
>>>> the size of the MR.
>>>>
>>>> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
>>>> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
>>>> ---
>>>> drivers/vdpa/mlx5/core/mr.c | 118 +++++++++++++++++++++++++++++-------
>>>> 1 file changed, 96 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
>>>> index 4758914ccf86..66e6a15f823f 100644
>>>> --- a/drivers/vdpa/mlx5/core/mr.c
>>>> +++ b/drivers/vdpa/mlx5/core/mr.c
>>>> @@ -49,17 +49,18 @@ static void populate_mtts(struct mlx5_vdpa_direct_mr *mr, __be64 *mtt)
>>>> }
>>>> }
>>>>
>>>> -static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
>>>> +struct mlx5_create_mkey_mem {
>>>> + u8 out[MLX5_ST_SZ_BYTES(create_mkey_out)];
>>>> + u8 in[MLX5_ST_SZ_BYTES(create_mkey_in)];
>>>> + DECLARE_FLEX_ARRAY(__be64, mtt);
>>>
>>> I may be missing something obvious, but why do we need
>>> DECLARE_FLEX_ARRAY here? My understanding is that it is only needed in
>>> special cases like uapi headers and we can use "__be64 mtt[]" here.
>>>
>> checkpatch.pl was complaining about it because in my initial version I
>> used the "[0]" form of zero-length arrays.
>>
>> My impression was that DECLARE_FLEX_ARRAY is the preferred option because
>> it triggers a compiler error if the flexible array member is not at the
>> end of the struct. But on closer inspection I see that the C99
>> empty-brackets notation is enough to trigger this error.
>> DECLARE_FLEX_ARRAY seems to be mainly useful for the union case.
>>
>> I will change it in a v2.
>>
>>>> +};
>>>> +
>>>> +static void fill_create_direct_mr(struct mlx5_vdpa_dev *mvdev,
>>>> + struct mlx5_vdpa_direct_mr *mr,
>>>> + struct mlx5_create_mkey_mem *mem)
>>>> {
>>>> - int inlen;
>>>> + void *in = &mem->in;
>>>> void *mkc;
>>>> - void *in;
>>>> - int err;
>>>> -
>>>> - inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + roundup(MLX5_ST_SZ_BYTES(mtt) * mr->nsg, 16);
>>>> - in = kvzalloc(inlen, GFP_KERNEL);
>>>> - if (!in)
>>>> - return -ENOMEM;
>>>>
>>>> MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
>>>> mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
>>>> @@ -76,18 +77,25 @@ static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct
>>>> MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
>>>> get_octo_len(mr->end - mr->start, mr->log_size));
>>>> populate_mtts(mr, MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt));
>>>> - err = mlx5_vdpa_create_mkey(mvdev, &mr->mr, in, inlen);
>>>> - kvfree(in);
>>>> - if (err) {
>>>> - mlx5_vdpa_warn(mvdev, "Failed to create direct MR\n");
>>>> - return err;
>>>> - }
>>>>
>>>> - return 0;
>>>> + MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
>>>> + MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
>>>> +}
>>>> +
>>>> +static void create_direct_mr_end(struct mlx5_vdpa_dev *mvdev,
>>>> + struct mlx5_vdpa_direct_mr *mr,
>>>> + struct mlx5_create_mkey_mem *mem)
>>>> +{
>>>> + u32 mkey_index = MLX5_GET(create_mkey_out, mem->out, mkey_index);
>>>> +
>>>> + mr->mr = mlx5_idx_to_mkey(mkey_index);
>>>> }
>>>>
>>>> static void destroy_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
>>>> {
>>>> + if (!mr->mr)
>>>> + return;
>>>> +
>>>> mlx5_vdpa_destroy_mkey(mvdev, mr->mr);
>>>> }
>>>>
>>>> @@ -179,6 +187,74 @@ static int klm_byte_size(int nklms)
>>>> return 16 * ALIGN(nklms, 4);
>>>> }
>>>>
>>>> +static int create_direct_keys(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
>>>> +{
>>>> + struct mlx5_vdpa_async_cmd *cmds = NULL;
>>>> + struct mlx5_vdpa_direct_mr *dmr;
>>>> + int err = 0;
>>>> + int i = 0;
>>>> +
>>>> + cmds = kvcalloc(mr->num_directs, sizeof(*cmds), GFP_KERNEL);
>>>> + if (!cmds)
>>>> + return -ENOMEM;
>>>
>>> Nit: this could benefit from Scope-based Cleanup Helpers [1], as it
>>> would make clear that memory is always removed at the end of the
>>> function & could prevent errors if the function is modified in the
>>> future. I'm a big fan of them so my opinion may be biased though :).
>>>
>>> It would be great to be able to remove the array members with that
>>> too, but I think the kernel doesn't have any facility for that.
>>>
>> I didn't know about those. It sounds like a good idea! I will try
>> to use them in v2.
>>
I tried it for this patch: using DEFINE_FREE only for the cmds
array just makes things more confusing, because the goto done still
has to happen, the cmd_mem elements being of different sizes. I
preferred not to mix and match.
I also tried adding a ctor/dtor with DEFINE_CLASS that wraps both
of these resources: the problem there is that the ctor can't fail,
so the allocation still has to happen in this function, and we
still need an extra struct to hold the cmds + size. Overall the
code introduced more confusion.
However, it works very well for the next patch, so I used it there
in v2.
In the future I will look at introducing more of these; they are
interesting helpers.
Thanks,
Dragos
>>>> +
>>>> + list_for_each_entry(dmr, &mr->head, list) {
>>>> + struct mlx5_create_mkey_mem *cmd_mem;
>>>> + int mttlen, mttcount;
>>>> +
>>>> + mttlen = roundup(MLX5_ST_SZ_BYTES(mtt) * dmr->nsg, 16);
>>>
>>> I don't get the roundup here, I guess it is so the driver does not
>>> send potentially uninitialized memory to the device? Maybe the 16
>>> should be a macro?
>>>
>> The roundup is a hw requirement. A define would be a good idea. Will add
>> it.
>>
>>>> + mttcount = mttlen / sizeof(cmd_mem->mtt[0]);
>>>> + cmd_mem = kvcalloc(1, struct_size(cmd_mem, mtt, mttcount), GFP_KERNEL);
>>>> + if (!cmd_mem) {
>>>> + err = -ENOMEM;
>>>> + goto done;
>>>> + }
>>>> +
>>>> + cmds[i].out = cmd_mem->out;
>>>> + cmds[i].outlen = sizeof(cmd_mem->out);
>>>> + cmds[i].in = cmd_mem->in;
>>>> + cmds[i].inlen = struct_size(cmd_mem, mtt, mttcount);
>>>> +
>>>> + fill_create_direct_mr(mvdev, dmr, cmd_mem);
>>>> +
>>>> + i++;
>>>> + }
>>>> +
>>>> + err = mlx5_vdpa_exec_async_cmds(mvdev, cmds, mr->num_directs);
>>>> + if (err) {
>>>> +
>>>> + mlx5_vdpa_err(mvdev, "error issuing MTT mkey creation for direct mrs: %d\n", err);
>>>> + goto done;
>>>> + }
>>>> +
>>>> + i = 0;
>>>> + list_for_each_entry(dmr, &mr->head, list) {
>>>> + struct mlx5_vdpa_async_cmd *cmd = &cmds[i++];
>>>> + struct mlx5_create_mkey_mem *cmd_mem;
>>>> +
>>>> + cmd_mem = container_of(cmd->out, struct mlx5_create_mkey_mem, out);
>>>> +
>>>> + if (!cmd->err) {
>>>> + create_direct_mr_end(mvdev, dmr, cmd_mem);
>>>
>>> The caller function doesn't trust the result if we return an error.
>>> Why not use the previous loop to call create_direct_mr_end? Am I
>>> missing any side effect?
>>>
>> Which previous loop? We have the mkey value only after the command has
>> been executed.
>
> Ok, now I see what I proposed didn't make sense, thanks!
>
>> I added the if here (instead of always calling
>> create_direct_mr_end()) just to make things more explicit for the
>> reader.
>>
>>>> + } else {
>>>> + err = err ? err : cmd->err;
>>>> + mlx5_vdpa_err(mvdev, "error creating MTT mkey [0x%llx, 0x%llx]: %d\n",
>>>> + dmr->start, dmr->end, cmd->err);
>>>> + }
>>>> + }
>>>> +
>>>> +done:
>>>> + for (i = i-1; i >= 0; i--) {
>>>> + struct mlx5_create_mkey_mem *cmd_mem;
>>>> +
>>>> + cmd_mem = container_of(cmds[i].out, struct mlx5_create_mkey_mem, out);
>>>> + kvfree(cmd_mem);
>>>> + }
>>>> +
>>>> + kvfree(cmds);
>>>> + return err;
>>>> +}
>>>> +
>>>> static int create_indirect_key(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
>>>> {
>>>> int inlen;
>>>> @@ -279,14 +355,8 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr
>>>> goto err_map;
>>>> }
>>>>
>>>> - err = create_direct_mr(mvdev, mr);
>>>> - if (err)
>>>> - goto err_direct;
>>>> -
>>>> return 0;
>>>>
>>>> -err_direct:
>>>> - dma_unmap_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
>>>> err_map:
>>>> sg_free_table(&mr->sg_head);
>>>> return err;
>>>> @@ -401,6 +471,10 @@ static int create_user_mr(struct mlx5_vdpa_dev *mvdev,
>>>> if (err)
>>>> goto err_chain;
>>>>
>>>> + err = create_direct_keys(mvdev, mr);
>>>> + if (err)
>>>> + goto err_chain;
>>>> +
>>>> /* Create the memory key that defines the guests's address space. This
>>>> * memory key refers to the direct keys that contain the MTT
>>>> * translations
>>>> --
>>>> 2.45.1
>>>>
>>>
>>
>
Thread overview: 19+ messages
2024-08-21 11:40 [PATCH vhost 0/7] vdpa/mlx5: Optimze MKEY operations Dragos Tatulea
2024-08-21 11:40 ` [PATCH vhost 1/7] vdpa/mlx5: Create direct MKEYs in parallel Dragos Tatulea
2024-08-29 13:10 ` Eugenio Perez Martin
[not found] ` <6935f3aa-9de5-4781-b823-30c17817cc86@nvidia.com>
2024-08-29 15:15 ` Eugenio Perez Martin
2024-08-30 11:09 ` Dragos Tatulea
2024-08-21 11:40 ` [PATCH vhost 2/7] vdpa/mlx5: Delete " Dragos Tatulea
2024-08-29 13:42 ` Eugenio Perez Martin
2024-08-21 11:40 ` [PATCH vhost 3/7] vdpa/mlx5: Rename function Dragos Tatulea
2024-08-21 11:40 ` [PATCH vhost 4/7] vdpa/mlx5: Extract mr members in own resource struct Dragos Tatulea
2024-08-29 14:16 ` Eugenio Perez Martin
2024-08-21 11:40 ` [PATCH vhost 5/7] vdpa/mlx5: Rename mr_mtx -> lock Dragos Tatulea
2024-08-29 14:18 ` Eugenio Perez Martin
2024-08-21 11:41 ` [PATCH vhost 6/7] vdpa/mlx5: Introduce init/destroy for MR resources Dragos Tatulea
2024-08-29 14:37 ` Eugenio Perez Martin
2024-08-29 15:25 ` Dragos Tatulea
2024-08-21 11:41 ` [PATCH vhost 7/7] vdpa/mlx5: Postpone MR deletion Dragos Tatulea
2024-08-29 15:07 ` Eugenio Perez Martin
2024-08-29 15:22 ` Dragos Tatulea
2024-08-29 17:12 ` Eugenio Perez Martin