virtualization.lists.linux-foundation.org archive mirror
* [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume
@ 2024-08-02  7:20 Dragos Tatulea
  2024-08-02  7:20 ` [PATCH vhost 2/7] vdpa/mlx5: Introduce error logging function Dragos Tatulea
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Dragos Tatulea @ 2024-08-02  7:20 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Dragos Tatulea,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez
  Cc: Si-Wei Liu, netdev, linux-rdma, linux-kernel, virtualization

This series parallelizes the mlx5_vdpa device suspend and resume
operations through the firmware async API. The purpose is to reduce live
migration downtime.

The series starts with changing the VQ suspend and resume commands
to the async API. After that, the switch is made to issue multiple
commands of the same type in parallel.
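
Schematically, each batched operation follows the same pattern (a
condensed sketch of the helpers introduced in patches 3 and 4; error
handling and command throttling are elided here):

	/* Fill one firmware command per vq, fire them all, reap the results. */
	for (i = 0; i < num_vqs; i++)
		fill_query_virtqueue_cmd(ndev, &ndev->vqs[start_vq + i], &cmds[i]);

	err = exec_virtqueue_async_cmds(ndev, cmds, num_vqs); /* issue + wait */

	for (i = 0; i < num_vqs; i++)
		query_virtqueue_end(ndev, &cmds[i], &attrs[i]);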

Finally, a bonus improvement is thrown in: keep the notifiers enabled
during suspend but make them NOPs. Upon resume, make sure that the link
state is forwarded. This shaves off around 30 ms of constant time per
device.

For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
x 2 threads per core), the improvements are:

+-------------------+--------+--------+-----------+
| operation         | Before | After  | Reduction |
|-------------------+--------+--------+-----------|
| mlx5_vdpa_suspend | 37 ms  | 2.5 ms |     14x   |
| mlx5_vdpa_resume  | 16 ms  | 5 ms   |      3x   |
+-------------------+--------+--------+-----------+

Note for the maintainers:
The first patch contains changes for mlx5_core. It must be applied to
the mlx5-vhost tree [0] first. Once this patch is applied on
mlx5-vhost, the change has to be pulled from mlx5-vhost into the vhost
tree, and only then can the remaining patches be applied.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost

Dragos Tatulea (7):
  net/mlx5: Support throttled commands from async API
  vdpa/mlx5: Introduce error logging function
  vdpa/mlx5: Use async API for vq query command
  vdpa/mlx5: Use async API for vq modify commands
  vdpa/mlx5: Parallelize device suspend
  vdpa/mlx5: Parallelize device resume
  vdpa/mlx5: Keep notifiers during suspend but ignore

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
 drivers/vdpa/mlx5/core/mlx5_vdpa.h            |   7 +
 drivers/vdpa/mlx5/net/mlx5_vnet.c             | 435 +++++++++++++-----
 3 files changed, 333 insertions(+), 130 deletions(-)

-- 
2.45.2



* [PATCH vhost 2/7] vdpa/mlx5: Introduce error logging function
  2024-08-02  7:20 [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume Dragos Tatulea
@ 2024-08-02  7:20 ` Dragos Tatulea
  2024-08-06 17:57   ` Eugenio Perez Martin
  2024-08-02  7:20 ` [PATCH vhost 3/7] vdpa/mlx5: Use async API for vq query command Dragos Tatulea
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Dragos Tatulea @ 2024-08-02  7:20 UTC (permalink / raw)
  To: Dragos Tatulea, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez
  Cc: Si-Wei Liu, Tariq Toukan, virtualization, linux-kernel

mlx5_vdpa_err() was missing. This patch adds it and uses it in the
necessary places.
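
For example, a failed vq query is now reported at error rather than
warning level (taken from the diff below):

	err = query_virtqueue(ndev, mvq, &attr);
	if (err) {
		mlx5_vdpa_err(&ndev->mvdev, "failed to query virtqueue, err: %d\n", err);
		return err;
	}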

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  5 +++++
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 24 ++++++++++++------------
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 50aac8fe57ef..424d445ebee4 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -135,6 +135,11 @@ int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
 int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev);
 int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
 
+#define mlx5_vdpa_err(__dev, format, ...)                                                          \
+	dev_err((__dev)->mdev->device, "%s:%d:(pid %d) error: " format, __func__, __LINE__,        \
+		 current->pid, ##__VA_ARGS__)
+
+
 #define mlx5_vdpa_warn(__dev, format, ...)                                                         \
 	dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, __func__, __LINE__,     \
 		 current->pid, ##__VA_ARGS__)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index fa78e8288ebb..12133e5d1285 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1538,13 +1538,13 @@ static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mv
 
 	err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
 	if (err) {
-		mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed, err: %d\n", err);
+		mlx5_vdpa_err(&ndev->mvdev, "modify to suspend failed, err: %d\n", err);
 		return err;
 	}
 
 	err = query_virtqueue(ndev, mvq, &attr);
 	if (err) {
-		mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue, err: %d\n", err);
+		mlx5_vdpa_err(&ndev->mvdev, "failed to query virtqueue, err: %d\n", err);
 		return err;
 	}
 
@@ -1585,7 +1585,7 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
 		 */
 		err = modify_virtqueue(ndev, mvq, 0);
 		if (err) {
-			mlx5_vdpa_warn(&ndev->mvdev,
+			mlx5_vdpa_err(&ndev->mvdev,
 				"modify vq properties failed for vq %u, err: %d\n",
 				mvq->index, err);
 			return err;
@@ -1600,15 +1600,15 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
 	case MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY:
 		return 0;
 	default:
-		mlx5_vdpa_warn(&ndev->mvdev, "resume vq %u called from bad state %d\n",
+		mlx5_vdpa_err(&ndev->mvdev, "resume vq %u called from bad state %d\n",
 			       mvq->index, mvq->fw_state);
 		return -EINVAL;
 	}
 
 	err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
 	if (err)
-		mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq %u, err: %d\n",
-			       mvq->index, err);
+		mlx5_vdpa_err(&ndev->mvdev, "modify to resume failed for vq %u, err: %d\n",
+			      mvq->index, err);
 
 	return err;
 }
@@ -2002,13 +2002,13 @@ static int setup_steering(struct mlx5_vdpa_net *ndev)
 
 	ns = mlx5_get_flow_namespace(ndev->mvdev.mdev, MLX5_FLOW_NAMESPACE_BYPASS);
 	if (!ns) {
-		mlx5_vdpa_warn(&ndev->mvdev, "failed to get flow namespace\n");
+		mlx5_vdpa_err(&ndev->mvdev, "failed to get flow namespace\n");
 		return -EOPNOTSUPP;
 	}
 
 	ndev->rxft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
 	if (IS_ERR(ndev->rxft)) {
-		mlx5_vdpa_warn(&ndev->mvdev, "failed to create flow table\n");
+		mlx5_vdpa_err(&ndev->mvdev, "failed to create flow table\n");
 		return PTR_ERR(ndev->rxft);
 	}
 	mlx5_vdpa_add_rx_flow_table(ndev);
@@ -2530,7 +2530,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa
 
 	err = query_virtqueue(ndev, mvq, &attr);
 	if (err) {
-		mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
+		mlx5_vdpa_err(mvdev, "failed to query virtqueue\n");
 		return err;
 	}
 	state->split.avail_index = attr.used_index;
@@ -3189,7 +3189,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
 	if ((flags & VDPA_RESET_F_CLEAN_MAP) &&
 	    MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
 		if (mlx5_vdpa_create_dma_mr(mvdev))
-			mlx5_vdpa_warn(mvdev, "create MR failed\n");
+			mlx5_vdpa_err(mvdev, "create MR failed\n");
 	}
 	if (vq_reset)
 		setup_vq_resources(ndev, false);
@@ -3244,7 +3244,7 @@ static int set_map_data(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
 		new_mr = mlx5_vdpa_create_mr(mvdev, iotlb);
 		if (IS_ERR(new_mr)) {
 			err = PTR_ERR(new_mr);
-			mlx5_vdpa_warn(mvdev, "create map failed(%d)\n", err);
+			mlx5_vdpa_err(mvdev, "create map failed(%d)\n", err);
 			return err;
 		}
 	} else {
@@ -3257,7 +3257,7 @@ static int set_map_data(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
 	} else {
 		err = mlx5_vdpa_change_map(mvdev, new_mr, asid);
 		if (err) {
-			mlx5_vdpa_warn(mvdev, "change map failed(%d)\n", err);
+			mlx5_vdpa_err(mvdev, "change map failed(%d)\n", err);
 			goto out_err;
 		}
 	}
-- 
2.45.2



* [PATCH vhost 3/7] vdpa/mlx5: Use async API for vq query command
  2024-08-02  7:20 [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume Dragos Tatulea
  2024-08-02  7:20 ` [PATCH vhost 2/7] vdpa/mlx5: Introduce error logging function Dragos Tatulea
@ 2024-08-02  7:20 ` Dragos Tatulea
  2024-08-02  7:20 ` [PATCH vhost 4/7] vdpa/mlx5: Use async API for vq modify commands Dragos Tatulea
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Dragos Tatulea @ 2024-08-02  7:20 UTC (permalink / raw)
  To: Dragos Tatulea, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez
  Cc: Si-Wei Liu, Tariq Toukan, virtualization, linux-kernel

Switch the firmware vq query command to be issued via the async API to
allow future parallelization.

exec_virtqueue_async_cmds() is a generic execution function that will be
used to issue other async operations as well. Handling of throttled
commands is built in.

For now the command is still issued serially, but the infrastructure
is in place to issue commands in parallel, including rate-limiting the
number of async commands in flight to the firmware.

A later patch will switch to issuing more commands at a time.
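
Condensed, the throttling handling in issue_async_cmd() below behaves
as follows when the firmware command interface returns -EBUSY:

	err = mlx5_cmd_exec_cb(&ndev->mvdev.async_ctx, cmd->in, cmd->inlen,
			       cmd->out, cmd->outlen,
			       virtqueue_cmd_callback, &cmd->cb_work);
	if (err == -EBUSY) {
		if (*completed < issued) {
			/* Throttled by own commands: reap the oldest completion. */
			wait_for_completion(&cmds[*completed].cmd_done);
			(*completed)++;
			/* ... then retry the async submission. */
		} else {
			/* Throttled by external commands: fall back to the sync API. */
			err = mlx5_cmd_exec(ndev->mvdev.mdev, cmd->in, cmd->inlen,
					    cmd->out, cmd->outlen);
		}
	}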

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |   2 +
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 181 +++++++++++++++++++++++++----
 2 files changed, 161 insertions(+), 22 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 424d445ebee4..12136163d8ad 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -103,6 +103,8 @@ struct mlx5_vdpa_dev {
 	struct workqueue_struct *wq;
 	unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS];
 	bool suspended;
+
+	struct mlx5_async_ctx async_ctx;
 };
 
 int mlx5_vdpa_create_tis(struct mlx5_vdpa_dev *mvdev, void *in, u32 *tisn);
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 12133e5d1285..be8df9d9f4df 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1184,40 +1184,173 @@ struct mlx5_virtq_attr {
 	u16 used_index;
 };
 
-static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
-			   struct mlx5_virtq_attr *attr)
-{
-	int outlen = MLX5_ST_SZ_BYTES(query_virtio_net_q_out);
-	u32 in[MLX5_ST_SZ_DW(query_virtio_net_q_in)] = {};
+struct mlx5_virtqueue_query_mem {
+	u8 in[MLX5_ST_SZ_BYTES(query_virtio_net_q_in)];
+	u8 out[MLX5_ST_SZ_BYTES(query_virtio_net_q_out)];
+};
+
+struct mlx5_vdpa_async_virtqueue_cmd {
+	int err;
+	struct mlx5_async_work cb_work;
+	struct completion cmd_done;
+
+	void *in;
+	size_t inlen;
+
 	void *out;
-	void *obj_context;
-	void *cmd_hdr;
+	size_t outlen;
+
+	union {
+		struct mlx5_virtqueue_query_mem query;
+	};
+};
+
+static void virtqueue_cmd_callback(int status, struct mlx5_async_work *context)
+{
+	struct mlx5_vdpa_async_virtqueue_cmd *cmd =
+		container_of(context, struct mlx5_vdpa_async_virtqueue_cmd, cb_work);
+
+	cmd->err = mlx5_cmd_check(context->ctx->dev, status, cmd->in, cmd->out);
+	complete(&cmd->cmd_done);
+}
+
+static int issue_async_cmd(struct mlx5_vdpa_net *ndev,
+			   struct mlx5_vdpa_async_virtqueue_cmd *cmds,
+			   int issued,
+			   int *completed)
+
+{
+	struct mlx5_vdpa_async_virtqueue_cmd *cmd = &cmds[issued];
 	int err;
 
-	out = kzalloc(outlen, GFP_KERNEL);
-	if (!out)
-		return -ENOMEM;
+retry:
+	err = mlx5_cmd_exec_cb(&ndev->mvdev.async_ctx,
+			       cmd->in, cmd->inlen,
+			       cmd->out, cmd->outlen,
+			       virtqueue_cmd_callback,
+			       &cmd->cb_work);
+	if (err == -EBUSY) {
+		if (*completed < issued) {
+			/* Throttled by own commands: wait for oldest completion. */
+			wait_for_completion(&cmds[*completed].cmd_done);
+			(*completed)++;
+
+			goto retry;
+		} else {
+			/* Throttled by external commands: switch to sync api. */
+			err = mlx5_cmd_exec(ndev->mvdev.mdev,
+					    cmd->in, cmd->inlen,
+					    cmd->out, cmd->outlen);
+			if (!err)
+				(*completed)++;
+		}
+	}
+
+	return err;
+}
+
+static int exec_virtqueue_async_cmds(struct mlx5_vdpa_net *ndev,
+				     struct mlx5_vdpa_async_virtqueue_cmd *cmds,
+				     int num_cmds)
+{
+	int completed = 0;
+	int issued = 0;
+	int err = 0;
+
+	for (int i = 0; i < num_cmds; i++)
+		init_completion(&cmds[i].cmd_done);
+
+	while (issued < num_cmds) {
+
+		err = issue_async_cmd(ndev, cmds, issued, &completed);
+		if (err) {
+			mlx5_vdpa_err(&ndev->mvdev, "error issuing command %d of %d: %d\n",
+				      issued, num_cmds, err);
+			break;
+		}
+
+		issued++;
+	}
+
+	while (completed < issued)
+		wait_for_completion(&cmds[completed++].cmd_done);
 
-	cmd_hdr = MLX5_ADDR_OF(query_virtio_net_q_in, in, general_obj_in_cmd_hdr);
+	return err;
+}
+
+static void fill_query_virtqueue_cmd(struct mlx5_vdpa_net *ndev,
+				     struct mlx5_vdpa_virtqueue *mvq,
+				     struct mlx5_vdpa_async_virtqueue_cmd *cmd)
+{
+	void *cmd_hdr;
+
+	cmd->in = &cmd->query.in;
+	cmd->inlen = sizeof(cmd->query.in);
+	cmd->out = &cmd->query.out;
+	cmd->outlen = sizeof(cmd->query.out);
+
+	cmd_hdr = MLX5_ADDR_OF(query_virtio_net_q_in, cmd->query.in, general_obj_in_cmd_hdr);
 
 	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_QUERY_GENERAL_OBJECT);
 	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
 	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, mvq->virtq_id);
 	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
-	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, outlen);
-	if (err)
-		goto err_cmd;
+}
+
+static void query_virtqueue_end(struct mlx5_vdpa_net *ndev,
+				struct mlx5_vdpa_async_virtqueue_cmd *cmd,
+				struct mlx5_virtq_attr *attr)
+{
+	void *obj_context = MLX5_ADDR_OF(query_virtio_net_q_out, cmd->query.out, obj_context);
 
-	obj_context = MLX5_ADDR_OF(query_virtio_net_q_out, out, obj_context);
 	memset(attr, 0, sizeof(*attr));
 	attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
 	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
 	attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index);
-	kfree(out);
-	return 0;
+}
 
-err_cmd:
-	kfree(out);
+static int query_virtqueues(struct mlx5_vdpa_net *ndev,
+			    int start_vq,
+			    int num_vqs,
+			    struct mlx5_virtq_attr *attrs)
+{
+	struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
+	struct mlx5_vdpa_async_virtqueue_cmd *cmds;
+	int err = 0;
+
+	WARN(start_vq + num_vqs > mvdev->max_vqs, "query vq range invalid [%d, %d), max_vqs: %u\n",
+	     start_vq, start_vq + num_vqs, mvdev->max_vqs);
+
+	cmds = kvcalloc(num_vqs, sizeof(*cmds), GFP_KERNEL);
+	if (!cmds)
+		return -ENOMEM;
+
+	for (int i = 0; i < num_vqs; i++)
+		fill_query_virtqueue_cmd(ndev, &ndev->vqs[start_vq + i], &cmds[i]);
+
+	err = exec_virtqueue_async_cmds(ndev, cmds, num_vqs);
+	if (err) {
+		mlx5_vdpa_err(mvdev, "error issuing query cmd for vq range [%d, %d): %d\n",
+			      start_vq, start_vq + num_vqs, err);
+		goto done;
+	}
+
+	for (int i = 0; i < num_vqs; i++) {
+		struct mlx5_vdpa_async_virtqueue_cmd *cmd = &cmds[i];
+		int vq_idx = start_vq + i;
+
+		if (cmd->err) {
+			mlx5_vdpa_err(mvdev, "query vq %d failed, err: %d\n", vq_idx, cmd->err);
+			if (!err)
+				err = cmd->err;
+			continue;
+		}
+
+		query_virtqueue_end(ndev, cmd, &attrs[i]);
+	}
+
+done:
+	kvfree(cmds);
 	return err;
 }
 
@@ -1542,7 +1675,7 @@ static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mv
 		return err;
 	}
 
-	err = query_virtqueue(ndev, mvq, &attr);
+	err = query_virtqueues(ndev, mvq->index, 1, &attr);
 	if (err) {
 		mlx5_vdpa_err(&ndev->mvdev, "failed to query virtqueue, err: %d\n", err);
 		return err;
@@ -2528,7 +2661,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa
 		return 0;
 	}
 
-	err = query_virtqueue(ndev, mvq, &attr);
+	err = query_virtqueues(ndev, mvq->index, 1, &attr);
 	if (err) {
 		mlx5_vdpa_err(mvdev, "failed to query virtqueue\n");
 		return err;
@@ -2879,7 +3012,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
 	int err;
 
 	if (mvq->initialized) {
-		err = query_virtqueue(ndev, mvq, &attr);
+		err = query_virtqueues(ndev, mvq->index, 1, &attr);
 		if (err)
 			return err;
 	}
@@ -3854,6 +3987,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
 		ndev->rqt_size = 1;
 	}
 
+	mlx5_cmd_init_async_ctx(mdev, &mvdev->async_ctx);
+
 	ndev->mvdev.mlx_features = device_features;
 	mvdev->vdev.dma_dev = &mdev->pdev->dev;
 	err = mlx5_vdpa_alloc_resources(&ndev->mvdev);
@@ -3935,6 +4070,8 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
 	mvdev->wq = NULL;
 	destroy_workqueue(wq);
 	mgtdev->ndev = NULL;
+
+	mlx5_cmd_cleanup_async_ctx(&mvdev->async_ctx);
 }
 
 static const struct vdpa_mgmtdev_ops mdev_ops = {
-- 
2.45.2



* [PATCH vhost 4/7] vdpa/mlx5: Use async API for vq modify commands
  2024-08-02  7:20 [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume Dragos Tatulea
  2024-08-02  7:20 ` [PATCH vhost 2/7] vdpa/mlx5: Introduce error logging function Dragos Tatulea
  2024-08-02  7:20 ` [PATCH vhost 3/7] vdpa/mlx5: Use async API for vq query command Dragos Tatulea
@ 2024-08-02  7:20 ` Dragos Tatulea
  2024-08-07 13:12   ` Eugenio Perez Martin
  2024-08-02  7:20 ` [PATCH vhost 5/7] vdpa/mlx5: Parallelize device suspend Dragos Tatulea
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Dragos Tatulea @ 2024-08-02  7:20 UTC (permalink / raw)
  To: Dragos Tatulea, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez
  Cc: Si-Wei Liu, Tariq Toukan, virtualization, linux-kernel

Switch the firmware vq modify command to be issued via the async API
to allow future parallelization. The new refactored function applies
the modification to a range of vqs and waits for the commands to
complete.

For now the command is still issued serially. A later patch will
switch to modifying multiple vqs in parallel.
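
After this patch, a single vq state change is expressed as a range of
one (taken from the diff below); the parallelization patches that
follow only need to widen the range:

	err = modify_virtqueues(ndev, mvq->index, 1, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);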

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 150 ++++++++++++++++++++----------
 1 file changed, 103 insertions(+), 47 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index be8df9d9f4df..e56a0ee1b725 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1189,6 +1189,12 @@ struct mlx5_virtqueue_query_mem {
 	u8 out[MLX5_ST_SZ_BYTES(query_virtio_net_q_out)];
 };
 
+struct mlx5_virtqueue_modify_mem {
+	u8 in[MLX5_ST_SZ_BYTES(modify_virtio_net_q_in)];
+	u8 out[MLX5_ST_SZ_BYTES(modify_virtio_net_q_out)];
+};
+
+
 struct mlx5_vdpa_async_virtqueue_cmd {
 	int err;
 	struct mlx5_async_work cb_work;
@@ -1202,6 +1208,7 @@ struct mlx5_vdpa_async_virtqueue_cmd {
 
 	union {
 		struct mlx5_virtqueue_query_mem query;
+		struct mlx5_virtqueue_modify_mem modify;
 	};
 };
 
@@ -1384,51 +1391,35 @@ static bool modifiable_virtqueue_fields(struct mlx5_vdpa_virtqueue *mvq)
 	return true;
 }
 
-static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
-			    struct mlx5_vdpa_virtqueue *mvq,
-			    int state)
+static void fill_modify_virtqueue_cmd(struct mlx5_vdpa_net *ndev,
+				      struct mlx5_vdpa_virtqueue *mvq,
+				      int state,
+				      struct mlx5_vdpa_async_virtqueue_cmd *cmd)
 {
-	int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
-	u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
 	struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
 	struct mlx5_vdpa_mr *desc_mr = NULL;
 	struct mlx5_vdpa_mr *vq_mr = NULL;
-	bool state_change = false;
 	void *obj_context;
 	void *cmd_hdr;
 	void *vq_ctx;
-	void *in;
-	int err;
 
-	if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_NONE)
-		return 0;
-
-	if (!modifiable_virtqueue_fields(mvq))
-		return -EINVAL;
-
-	in = kzalloc(inlen, GFP_KERNEL);
-	if (!in)
-		return -ENOMEM;
+	cmd->in = &cmd->modify.in;
+	cmd->inlen = sizeof(cmd->modify.in);
+	cmd->out = &cmd->modify.out;
+	cmd->outlen = sizeof(cmd->modify.out);
 
-	cmd_hdr = MLX5_ADDR_OF(modify_virtio_net_q_in, in, general_obj_in_cmd_hdr);
+	cmd_hdr = MLX5_ADDR_OF(modify_virtio_net_q_in, cmd->in, general_obj_in_cmd_hdr);
 
 	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
 	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
 	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, mvq->virtq_id);
 	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
 
-	obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
+	obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, cmd->in, obj_context);
 	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
 
-	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) {
-		if (!is_valid_state_change(mvq->fw_state, state, is_resumable(ndev))) {
-			err = -EINVAL;
-			goto done;
-		}
-
+	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
 		MLX5_SET(virtio_net_q_object, obj_context, state, state);
-		state_change = true;
-	}
 
 	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS) {
 		MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
@@ -1474,38 +1465,36 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 	}
 
 	MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
-	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
-	if (err)
-		goto done;
+}
 
-	if (state_change)
-		mvq->fw_state = state;
+static void modify_virtqueue_end(struct mlx5_vdpa_net *ndev,
+				 struct mlx5_vdpa_virtqueue *mvq,
+				 int state)
+{
+	struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
 
 	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
+		unsigned int asid = mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP];
+		struct mlx5_vdpa_mr *vq_mr = mvdev->mr[asid];
+
 		mlx5_vdpa_put_mr(mvdev, mvq->vq_mr);
 		mlx5_vdpa_get_mr(mvdev, vq_mr);
 		mvq->vq_mr = vq_mr;
 	}
 
 	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
+		unsigned int asid = mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP];
+		struct mlx5_vdpa_mr *desc_mr = mvdev->mr[asid];
+
 		mlx5_vdpa_put_mr(mvdev, mvq->desc_mr);
 		mlx5_vdpa_get_mr(mvdev, desc_mr);
 		mvq->desc_mr = desc_mr;
 	}
 
-	mvq->modified_fields = 0;
-
-done:
-	kfree(in);
-	return err;
-}
+	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
+		mvq->fw_state = state;
 
-static int modify_virtqueue_state(struct mlx5_vdpa_net *ndev,
-				  struct mlx5_vdpa_virtqueue *mvq,
-				  unsigned int state)
-{
-	mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_STATE;
-	return modify_virtqueue(ndev, mvq, state);
+	mvq->modified_fields = 0;
 }
 
 static int counter_set_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
@@ -1658,6 +1647,73 @@ static int setup_vq(struct mlx5_vdpa_net *ndev,
 	return err;
 }
 
+static int modify_virtqueues(struct mlx5_vdpa_net *ndev, int start_vq, int num_vqs, int state)
+{
+	struct mlx5_vdpa_async_virtqueue_cmd *cmds;
+	struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
+	int err = 0;
+
+	WARN(start_vq + num_vqs > mvdev->max_vqs, "modify vq range invalid [%d, %d), max_vqs: %u\n",
+	     start_vq, start_vq + num_vqs, mvdev->max_vqs);
+
+	cmds = kvcalloc(num_vqs, sizeof(*cmds), GFP_KERNEL);
+	if (!cmds)
+		return -ENOMEM;
+
+	for (int i = 0; i < num_vqs; i++) {
+		struct mlx5_vdpa_async_virtqueue_cmd *cmd = &cmds[i];
+		struct mlx5_vdpa_virtqueue *mvq;
+		int vq_idx = start_vq + i;
+
+		mvq = &ndev->vqs[vq_idx];
+
+		if (!modifiable_virtqueue_fields(mvq)) {
+			err = -EINVAL;
+			goto done;
+		}
+
+		if (mvq->fw_state != state) {
+			if (!is_valid_state_change(mvq->fw_state, state, is_resumable(ndev))) {
+				err = -EINVAL;
+				goto done;
+			}
+
+			mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_STATE;
+		}
+
+		fill_modify_virtqueue_cmd(ndev, mvq, state, cmd);
+	}
+
+	err = exec_virtqueue_async_cmds(ndev, cmds, num_vqs);
+	if (err) {
+		mlx5_vdpa_err(mvdev, "error issuing modify cmd for vq range [%d, %d)\n",
+			      start_vq, start_vq + num_vqs);
+		goto done;
+	}
+
+	for (int i = 0; i < num_vqs; i++) {
+		struct mlx5_vdpa_async_virtqueue_cmd *cmd = &cmds[i];
+		struct mlx5_vdpa_virtqueue *mvq;
+		int vq_idx = start_vq + i;
+
+		mvq = &ndev->vqs[vq_idx];
+
+		if (cmd->err) {
+			mlx5_vdpa_err(mvdev, "modify vq %d failed, state: %d -> %d, err: %d\n",
+				      vq_idx, mvq->fw_state, state, cmd->err);
+			if (!err)
+				err = cmd->err;
+			continue;
+		}
+
+		modify_virtqueue_end(ndev, mvq, state);
+	}
+
+done:
+	kvfree(cmds);
+	return err;
+}
+
 static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
 {
 	struct mlx5_virtq_attr attr;
@@ -1669,7 +1725,7 @@ static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mv
 	if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
 		return 0;
 
-	err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
+	err = modify_virtqueues(ndev, mvq->index, 1, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
 	if (err) {
 		mlx5_vdpa_err(&ndev->mvdev, "modify to suspend failed, err: %d\n", err);
 		return err;
@@ -1716,7 +1772,7 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
 		/* Due to a FW quirk we need to modify the VQ fields first then change state.
 		 * This should be fixed soon. After that, a single command can be used.
 		 */
-		err = modify_virtqueue(ndev, mvq, 0);
+		err = modify_virtqueues(ndev, mvq->index, 1, mvq->fw_state);
 		if (err) {
 			mlx5_vdpa_err(&ndev->mvdev,
 				"modify vq properties failed for vq %u, err: %d\n",
@@ -1738,7 +1794,7 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
 		return -EINVAL;
 	}
 
-	err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+	err = modify_virtqueues(ndev, mvq->index, 1, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
 	if (err)
 		mlx5_vdpa_err(&ndev->mvdev, "modify to resume failed for vq %u, err: %d\n",
 			      mvq->index, err);
-- 
2.45.2



* [PATCH vhost 5/7] vdpa/mlx5: Parallelize device suspend
  2024-08-02  7:20 [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume Dragos Tatulea
                   ` (2 preceding siblings ...)
  2024-08-02  7:20 ` [PATCH vhost 4/7] vdpa/mlx5: Use async API for vq modify commands Dragos Tatulea
@ 2024-08-02  7:20 ` Dragos Tatulea
  2024-08-02  7:20 ` [PATCH vhost 6/7] vdpa/mlx5: Parallelize device resume Dragos Tatulea
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Dragos Tatulea @ 2024-08-02  7:20 UTC (permalink / raw)
  To: Dragos Tatulea, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez
  Cc: Si-Wei Liu, Tariq Toukan, virtualization, linux-kernel

Currently device suspend works on vqs serially. Building on previous
changes that converted vq operations to the async API, this patch
parallelizes device suspend:
1) Suspend all active vqs in parallel.
2) Query the suspended vqs in parallel.

For 1 vDPA device x 32 VQs (16 VQPs) attached to a large VM (256 GB RAM,
32 CPUs x 2 threads per core), the device suspend time is reduced from
~37 ms to ~13 ms.

A later patch will remove the link notifier unregister operation,
which will make suspend even faster.
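
The resulting suspend_vqs() boils down to two batched firmware round
trips (condensed from the diff below):

	err = modify_virtqueues(ndev, start_vq, num_vqs,
				MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
	if (err)
		return err;

	err = query_virtqueues(ndev, start_vq, num_vqs, attrs);
	/* on success, copy each attrs[i] into the matching mvq */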

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 56 ++++++++++++++++---------------
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index e56a0ee1b725..1887939c5673 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1714,49 +1714,51 @@ static int modify_virtqueues(struct mlx5_vdpa_net *ndev, int start_vq, int num_v
 	return err;
 }
 
-static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+static int suspend_vqs(struct mlx5_vdpa_net *ndev, int start_vq, int num_vqs)
 {
-	struct mlx5_virtq_attr attr;
+	struct mlx5_vdpa_virtqueue *mvq;
+	struct mlx5_virtq_attr *attrs;
+	int vq_idx, i;
 	int err;
 
+	if (start_vq >= ndev->cur_num_vqs)
+		return -EINVAL;
+
+	mvq = &ndev->vqs[start_vq];
 	if (!mvq->initialized)
 		return 0;
 
 	if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
 		return 0;
 
-	err = modify_virtqueues(ndev, mvq->index, 1, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
-	if (err) {
-		mlx5_vdpa_err(&ndev->mvdev, "modify to suspend failed, err: %d\n", err);
-		return err;
-	}
-
-	err = query_virtqueues(ndev, mvq->index, 1, &attr);
-	if (err) {
-		mlx5_vdpa_err(&ndev->mvdev, "failed to query virtqueue, err: %d\n", err);
+	err = modify_virtqueues(ndev, start_vq, num_vqs, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
+	if (err)
 		return err;
-	}
-
-	mvq->avail_idx = attr.available_index;
-	mvq->used_idx = attr.used_index;
-
-	return 0;
-}
 
-static int suspend_vqs(struct mlx5_vdpa_net *ndev)
-{
-	int err = 0;
-	int i;
+	attrs = kcalloc(num_vqs, sizeof(struct mlx5_virtq_attr), GFP_KERNEL);
+	if (!attrs)
+		return -ENOMEM;
 
-	for (i = 0; i < ndev->cur_num_vqs; i++) {
-		int local_err = suspend_vq(ndev, &ndev->vqs[i]);
+	err = query_virtqueues(ndev, start_vq, num_vqs, attrs);
+	if (err)
+		goto done;
 
-		err = local_err ? local_err : err;
+	for (i = 0, vq_idx = start_vq; i < num_vqs; i++, vq_idx++) {
+		mvq = &ndev->vqs[vq_idx];
+		mvq->avail_idx = attrs[i].available_index;
+		mvq->used_idx = attrs[i].used_index;
 	}
 
+done:
+	kfree(attrs);
 	return err;
 }
 
+static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	return suspend_vqs(ndev, mvq->index, 1);
+}
+
 static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
 {
 	int err;
@@ -3137,7 +3139,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
 	bool teardown = !is_resumable(ndev);
 	int err;
 
-	suspend_vqs(ndev);
+	suspend_vqs(ndev, 0, ndev->cur_num_vqs);
 	if (teardown) {
 		err = save_channels_info(ndev);
 		if (err)
@@ -3690,7 +3692,7 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
 
 	down_write(&ndev->reslock);
 	unregister_link_notifier(ndev);
-	err = suspend_vqs(ndev);
+	err = suspend_vqs(ndev, 0, ndev->cur_num_vqs);
 	mlx5_vdpa_cvq_suspend(mvdev);
 	mvdev->suspended = true;
 	up_write(&ndev->reslock);
-- 
2.45.2



* [PATCH vhost 6/7] vdpa/mlx5: Parallelize device resume
  2024-08-02  7:20 [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume Dragos Tatulea
                   ` (3 preceding siblings ...)
  2024-08-02  7:20 ` [PATCH vhost 5/7] vdpa/mlx5: Parallelize device suspend Dragos Tatulea
@ 2024-08-02  7:20 ` Dragos Tatulea
  2024-08-02  7:20 ` [PATCH vhost 7/7] vdpa/mlx5: Keep notifiers during suspend but ignore Dragos Tatulea
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Dragos Tatulea @ 2024-08-02  7:20 UTC (permalink / raw)
  To: Dragos Tatulea, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez
  Cc: Si-Wei Liu, Tariq Toukan, virtualization, linux-kernel

Currently device resume works on vqs serially. Building on previous
changes that converted vq operations to the async API, this patch
parallelizes device resume.

For 1 vDPA device x 32 VQs (16 VQPs) attached to a large VM (256 GB RAM,
32 CPUs x 2 threads per core), the device resume time is reduced from
~16 ms to ~4.5 ms.
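
Due to the FW quirk noted in the code, some states need the vq fields
modified first and the state changed to RDY in a second call, so
resume is at most two batched modify calls over the whole range
(condensed from the diff below):

	err = modify_virtqueues(ndev, start_vq, num_vqs, mvq->fw_state);
	if (err)
		return err;
	...
	return modify_virtqueues(ndev, start_vq, num_vqs, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);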

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 40 +++++++++++--------------------
 1 file changed, 14 insertions(+), 26 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 1887939c5673..87d355aba380 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1759,10 +1759,15 @@ static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mv
 	return suspend_vqs(ndev, mvq->index, 1);
 }
 
-static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+static int resume_vqs(struct mlx5_vdpa_net *ndev, int start_vq, int num_vqs)
 {
+	struct mlx5_vdpa_virtqueue *mvq;
 	int err;
 
+	if (start_vq >= ndev->mvdev.max_vqs)
+		return -EINVAL;
+
+	mvq = &ndev->vqs[start_vq];
 	if (!mvq->initialized)
 		return 0;
 
@@ -1774,13 +1779,9 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
 		/* Due to a FW quirk we need to modify the VQ fields first then change state.
 		 * This should be fixed soon. After that, a single command can be used.
 		 */
-		err = modify_virtqueues(ndev, mvq->index, 1, mvq->fw_state);
-		if (err) {
-			mlx5_vdpa_err(&ndev->mvdev,
-				"modify vq properties failed for vq %u, err: %d\n",
-				mvq->index, err);
+		err = modify_virtqueues(ndev, start_vq, num_vqs, mvq->fw_state);
+		if (err)
 			return err;
-		}
 		break;
 	case MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND:
 		if (!is_resumable(ndev)) {
@@ -1796,25 +1797,12 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
 		return -EINVAL;
 	}
 
-	err = modify_virtqueues(ndev, mvq->index, 1, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
-	if (err)
-		mlx5_vdpa_err(&ndev->mvdev, "modify to resume failed for vq %u, err: %d\n",
-			      mvq->index, err);
-
-	return err;
+	return modify_virtqueues(ndev, start_vq, num_vqs, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
 }
 
-static int resume_vqs(struct mlx5_vdpa_net *ndev)
+static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
 {
-	int err = 0;
-
-	for (int i = 0; i < ndev->cur_num_vqs; i++) {
-		int local_err = resume_vq(ndev, &ndev->vqs[i]);
-
-		err = local_err ? local_err : err;
-	}
-
-	return err;
+	return resume_vqs(ndev, mvq->index, 1);
 }
 
 static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
@@ -3164,7 +3152,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
 			return err;
 	}
 
-	resume_vqs(ndev);
+	resume_vqs(ndev, 0, ndev->cur_num_vqs);
 
 	return 0;
 }
@@ -3288,7 +3276,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
 				teardown_vq_resources(ndev);
 
 			if (ndev->setup) {
-				err = resume_vqs(ndev);
+				err = resume_vqs(ndev, 0, ndev->cur_num_vqs);
 				if (err) {
 					mlx5_vdpa_warn(mvdev, "failed to resume VQs\n");
 					goto err_driver;
@@ -3712,7 +3700,7 @@ static int mlx5_vdpa_resume(struct vdpa_device *vdev)
 
 	down_write(&ndev->reslock);
 	mvdev->suspended = false;
-	err = resume_vqs(ndev);
+	err = resume_vqs(ndev, 0, ndev->cur_num_vqs);
 	register_link_notifier(ndev);
 	up_write(&ndev->reslock);
 
-- 
2.45.2



* [PATCH vhost 7/7] vdpa/mlx5: Keep notifiers during suspend but ignore
  2024-08-02  7:20 [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume Dragos Tatulea
                   ` (4 preceding siblings ...)
  2024-08-02  7:20 ` [PATCH vhost 6/7] vdpa/mlx5: Parallelize device resume Dragos Tatulea
@ 2024-08-02  7:20 ` Dragos Tatulea
  2024-08-02 13:14 ` [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume Michael S. Tsirkin
  2024-08-07 13:25 ` Eugenio Perez Martin
  7 siblings, 0 replies; 16+ messages in thread
From: Dragos Tatulea @ 2024-08-02  7:20 UTC (permalink / raw)
  To: Dragos Tatulea, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez
  Cc: Si-Wei Liu, Tariq Toukan, virtualization, linux-kernel

Unregistering notifiers is a costly operation. Instead of removing
the notifiers during device suspend and adding them back at resume,
simply ignore events while the device is suspended.

At resume time, call queue_link_work() to make sure that the link
state is propagated in case it changed while the device was suspended.
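
The suspend-time NOP amounts to an early return in the event handler
(taken from the diff below):

	if (ndev->mvdev.suspended)
		return NOTIFY_DONE;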

For 1 vDPA device x 32 VQs (16 VQPs) attached to a large VM (256 GB RAM,
32 CPUs x 2 threads per core), the device suspend time is reduced from
~13 ms to ~2.5 ms.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 87d355aba380..af96e49697d0 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2934,6 +2934,9 @@ static int event_handler(struct notifier_block *nb, unsigned long event, void *p
 	struct mlx5_eqe *eqe = param;
 	int ret = NOTIFY_DONE;
 
+	if (ndev->mvdev.suspended)
+		return NOTIFY_DONE;
+
 	if (event == MLX5_EVENT_TYPE_PORT_CHANGE) {
 		switch (eqe->sub_type) {
 		case MLX5_PORT_CHANGE_SUBTYPE_DOWN:
@@ -3679,7 +3682,6 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
 	mlx5_vdpa_info(mvdev, "suspending device\n");
 
 	down_write(&ndev->reslock);
-	unregister_link_notifier(ndev);
 	err = suspend_vqs(ndev, 0, ndev->cur_num_vqs);
 	mlx5_vdpa_cvq_suspend(mvdev);
 	mvdev->suspended = true;
@@ -3701,7 +3703,7 @@ static int mlx5_vdpa_resume(struct vdpa_device *vdev)
 	down_write(&ndev->reslock);
 	mvdev->suspended = false;
 	err = resume_vqs(ndev, 0, ndev->cur_num_vqs);
-	register_link_notifier(ndev);
+	queue_link_work(ndev);
 	up_write(&ndev->reslock);
 
 	return err;
-- 
2.45.2



* Re: [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume
  2024-08-02  7:20 [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume Dragos Tatulea
                   ` (5 preceding siblings ...)
  2024-08-02  7:20 ` [PATCH vhost 7/7] vdpa/mlx5: Keep notifiers during suspend but ignore Dragos Tatulea
@ 2024-08-02 13:14 ` Michael S. Tsirkin
  2024-08-04  8:48   ` Leon Romanovsky
  2024-08-16  9:13   ` Dragos Tatulea
  2024-08-07 13:25 ` Eugenio Perez Martin
  7 siblings, 2 replies; 16+ messages in thread
From: Michael S. Tsirkin @ 2024-08-02 13:14 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Si-Wei Liu, netdev, linux-rdma, linux-kernel,
	virtualization

On Fri, Aug 02, 2024 at 10:20:17AM +0300, Dragos Tatulea wrote:
> This series parallelizes the mlx5_vdpa device suspend and resume
> operations through the firmware async API. The purpose is to reduce live
> migration downtime.
> 
> The series starts with changing the VQ suspend and resume commands
> to the async API. After that, the switch is made to issue multiple
> commands of the same type in parallel.
> 
> Finally, a bonus improvement is thrown in: keep the notifiers enabled
> during suspend but make them NOPs. Upon resume, make sure that the link
> state is forwarded. This shaves off around 30 ms of constant time per
> device.
> 
> For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
> x 2 threads per core), the improvements are:
> 
> +-------------------+--------+--------+-----------+
> | operation         | Before | After  | Reduction |
> |-------------------+--------+--------+-----------|
> | mlx5_vdpa_suspend | 37 ms  | 2.5 ms |     14x   |
> | mlx5_vdpa_resume  | 16 ms  | 5 ms   |      3x   |
> +-------------------+--------+--------+-----------+
> 
> Note for the maintainers:
> The first patch contains changes for mlx5_core. It must be applied to
> the mlx5-vhost tree [0] first. Once this patch is applied on
> mlx5-vhost, the change has to be pulled from mlx5-vhost into the vhost
> tree, and only then can the remaining patches be applied.

Or maintainer just acks it and I apply directly.

Let me know when all this can happen.

> [0] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost
> 
> Dragos Tatulea (7):
>   net/mlx5: Support throttled commands from async API
>   vdpa/mlx5: Introduce error logging function
>   vdpa/mlx5: Use async API for vq query command
>   vdpa/mlx5: Use async API for vq modify commands
>   vdpa/mlx5: Parallelize device suspend
>   vdpa/mlx5: Parallelize device resume
>   vdpa/mlx5: Keep notifiers during suspend but ignore
> 
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
>  drivers/vdpa/mlx5/core/mlx5_vdpa.h            |   7 +
>  drivers/vdpa/mlx5/net/mlx5_vnet.c             | 435 +++++++++++++-----
>  3 files changed, 333 insertions(+), 130 deletions(-)
> 
> -- 
> 2.45.2



* Re: [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume
  2024-08-02 13:14 ` [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume Michael S. Tsirkin
@ 2024-08-04  8:48   ` Leon Romanovsky
  2024-08-04 13:39     ` Michael S. Tsirkin
  2024-08-16  9:13   ` Dragos Tatulea
  1 sibling, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2024-08-04  8:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Dragos Tatulea, Saeed Mahameed, Tariq Toukan, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Si-Wei Liu, netdev, linux-rdma, linux-kernel,
	virtualization

On Fri, Aug 02, 2024 at 09:14:28AM -0400, Michael S. Tsirkin wrote:
> On Fri, Aug 02, 2024 at 10:20:17AM +0300, Dragos Tatulea wrote:
> > This series parallelizes the mlx5_vdpa device suspend and resume
> > operations through the firmware async API. The purpose is to reduce live
> > migration downtime.
> > 
> > The series starts with changing the VQ suspend and resume commands
> > to the async API. After that, the switch is made to issue multiple
> > commands of the same type in parallel.
> > 
> > Finally, a bonus improvement is thrown in: keep the notifiers enabled
> > during suspend but make them NOPs. Upon resume, make sure that the link
> > state is forwarded. This shaves off around 30 ms of constant time per
> > device.
> > 
> > For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
> > x 2 threads per core), the improvements are:
> > 
> > +-------------------+--------+--------+-----------+
> > | operation         | Before | After  | Reduction |
> > |-------------------+--------+--------+-----------|
> > | mlx5_vdpa_suspend | 37 ms  | 2.5 ms |     14x   |
> > | mlx5_vdpa_resume  | 16 ms  | 5 ms   |      3x   |
> > +-------------------+--------+--------+-----------+
> > 
> > Note for the maintainers:
> > The first patch contains changes for mlx5_core. It must be applied to
> > the mlx5-vhost tree [0] first. Once this patch is applied on
> > mlx5-vhost, the change has to be pulled from mlx5-vhost into the vhost
> > tree, and only then can the remaining patches be applied.
> 
> Or maintainer just acks it and I apply directly.

We can do it, but there is a potential to create a conflict between your tree
and netdev for the whole cycle, which will be a bit annoying. The easiest way
to avoid this is to have a shared branch, but in August everyone is on
vacation, so it will probably be fine to apply such a patch directly.

Thanks

> 
> Let me know when all this can happen.
> 
> > [0] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost
> > 
> > Dragos Tatulea (7):
> >   net/mlx5: Support throttled commands from async API
> >   vdpa/mlx5: Introduce error logging function
> >   vdpa/mlx5: Use async API for vq query command
> >   vdpa/mlx5: Use async API for vq modify commands
> >   vdpa/mlx5: Parallelize device suspend
> >   vdpa/mlx5: Parallelize device resume
> >   vdpa/mlx5: Keep notifiers during suspend but ignore
> > 
> >  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
> >  drivers/vdpa/mlx5/core/mlx5_vdpa.h            |   7 +
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c             | 435 +++++++++++++-----
> >  3 files changed, 333 insertions(+), 130 deletions(-)
> > 
> > -- 
> > 2.45.2
> 
> 


* Re: [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume
  2024-08-04  8:48   ` Leon Romanovsky
@ 2024-08-04 13:39     ` Michael S. Tsirkin
  2024-08-04 14:52       ` Leon Romanovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2024-08-04 13:39 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Dragos Tatulea, Saeed Mahameed, Tariq Toukan, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Si-Wei Liu, netdev, linux-rdma, linux-kernel,
	virtualization

On Sun, Aug 04, 2024 at 11:48:39AM +0300, Leon Romanovsky wrote:
> On Fri, Aug 02, 2024 at 09:14:28AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Aug 02, 2024 at 10:20:17AM +0300, Dragos Tatulea wrote:
> > > This series parallelizes the mlx5_vdpa device suspend and resume
> > > operations through the firmware async API. The purpose is to reduce live
> > > migration downtime.
> > > 
> > > The series starts with changing the VQ suspend and resume commands
> > > to the async API. After that, the switch is made to issue multiple
> > > commands of the same type in parallel.
> > > 
> > > Finally, a bonus improvement is thrown in: keep the notifiers enabled
> > > during suspend but make them NOPs. Upon resume, make sure that the link
> > > state is forwarded. This shaves off around 30 ms of constant time per
> > > device.
> > > 
> > > For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
> > > x 2 threads per core), the improvements are:
> > > 
> > > +-------------------+--------+--------+-----------+
> > > | operation         | Before | After  | Reduction |
> > > |-------------------+--------+--------+-----------|
> > > | mlx5_vdpa_suspend | 37 ms  | 2.5 ms |     14x   |
> > > | mlx5_vdpa_resume  | 16 ms  | 5 ms   |      3x   |
> > > +-------------------+--------+--------+-----------+
> > > 
> > > Note for the maintainers:
> > > The first patch contains changes for mlx5_core. It must be applied to
> > > the mlx5-vhost tree [0] first. Once this patch is applied on
> > > mlx5-vhost, the change has to be pulled from mlx5-vhost into the vhost
> > > tree, and only then can the remaining patches be applied.
> > 
> > Or maintainer just acks it and I apply directly.
> 
> > We can do it, but there is a potential to create a conflict between your tree
> > and netdev for the whole cycle, which will be a bit annoying. The easiest way
> > to avoid this is to have a shared branch, but in August everyone is on
> > vacation, so it will probably be fine to apply such a patch directly.
> 
> Thanks

We can let Linus do something, it's ok ;)

> > 
> > Let me know when all this can happen.
> > 
> > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost
> > > 
> > > Dragos Tatulea (7):
> > >   net/mlx5: Support throttled commands from async API
> > >   vdpa/mlx5: Introduce error logging function
> > >   vdpa/mlx5: Use async API for vq query command
> > >   vdpa/mlx5: Use async API for vq modify commands
> > >   vdpa/mlx5: Parallelize device suspend
> > >   vdpa/mlx5: Parallelize device resume
> > >   vdpa/mlx5: Keep notifiers during suspend but ignore
> > > 
> > >  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
> > >  drivers/vdpa/mlx5/core/mlx5_vdpa.h            |   7 +
> > >  drivers/vdpa/mlx5/net/mlx5_vnet.c             | 435 +++++++++++++-----
> > >  3 files changed, 333 insertions(+), 130 deletions(-)
> > > 
> > > -- 
> > > 2.45.2
> > 
> > 



* Re: [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume
  2024-08-04 13:39     ` Michael S. Tsirkin
@ 2024-08-04 14:52       ` Leon Romanovsky
  0 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-08-04 14:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Dragos Tatulea, Saeed Mahameed, Tariq Toukan, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Si-Wei Liu, netdev, linux-rdma, linux-kernel,
	virtualization

On Sun, Aug 04, 2024 at 09:39:29AM -0400, Michael S. Tsirkin wrote:
> On Sun, Aug 04, 2024 at 11:48:39AM +0300, Leon Romanovsky wrote:
> > On Fri, Aug 02, 2024 at 09:14:28AM -0400, Michael S. Tsirkin wrote:
> > > On Fri, Aug 02, 2024 at 10:20:17AM +0300, Dragos Tatulea wrote:
> > > > This series parallelizes the mlx5_vdpa device suspend and resume
> > > > operations through the firmware async API. The purpose is to reduce live
> > > > migration downtime.
> > > > 
> > > > The series starts with changing the VQ suspend and resume commands
> > > > to the async API. After that, the switch is made to issue multiple
> > > > commands of the same type in parallel.
> > > > 
> > > > Finally, a bonus improvement is thrown in: keep the notifiers enabled
> > > > during suspend but make them NOPs. Upon resume, make sure that the link
> > > > state is forwarded. This shaves off around 30 ms of constant time per
> > > > device.
> > > > 
> > > > For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
> > > > x 2 threads per core), the improvements are:
> > > > 
> > > > +-------------------+--------+--------+-----------+
> > > > | operation         | Before | After  | Reduction |
> > > > |-------------------+--------+--------+-----------|
> > > > | mlx5_vdpa_suspend | 37 ms  | 2.5 ms |     14x   |
> > > > | mlx5_vdpa_resume  | 16 ms  | 5 ms   |      3x   |
> > > > +-------------------+--------+--------+-----------+
> > > > 
> > > > Note for the maintainers:
> > > > The first patch contains changes for mlx5_core. It must be applied to
> > > > the mlx5-vhost tree [0] first. Once this patch is applied on
> > > > mlx5-vhost, the change has to be pulled from mlx5-vhost into the vhost
> > > > tree, and only then can the remaining patches be applied.
> > > 
> > > Or maintainer just acks it and I apply directly.
> > 
> > We can do it, but there is a potential to create a conflict between your tree
> > and netdev for the whole cycle, which will be a bit annoying. The easiest way
> > to avoid this is to have a shared branch, but in August everyone is on
> > vacation, so it will probably be fine to apply such a patch directly.
> > 
> > Thanks
> 
> We can let Linus do something, it's ok ;)

Right and this is how it was for years - Linus dealt with the conflicts
between RDMA and netdev, until he pushed us to have a shared branch :).

However, in this specific cycle and for this specific change, we probably won't
get any conflicts between various trees.

Thanks

> 
> > > 
> > > Let me know when all this can happen.
> > > 
> > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost
> > > > 
> > > > Dragos Tatulea (7):
> > > >   net/mlx5: Support throttled commands from async API
> > > >   vdpa/mlx5: Introduce error logging function
> > > >   vdpa/mlx5: Use async API for vq query command
> > > >   vdpa/mlx5: Use async API for vq modify commands
> > > >   vdpa/mlx5: Parallelize device suspend
> > > >   vdpa/mlx5: Parallelize device resume
> > > >   vdpa/mlx5: Keep notifiers during suspend but ignore
> > > > 
> > > >  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
> > > >  drivers/vdpa/mlx5/core/mlx5_vdpa.h            |   7 +
> > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c             | 435 +++++++++++++-----
> > > >  3 files changed, 333 insertions(+), 130 deletions(-)
> > > > 
> > > > -- 
> > > > 2.45.2
> > > 
> > > 
> 


* Re: [PATCH vhost 2/7] vdpa/mlx5: Introduce error logging function
  2024-08-02  7:20 ` [PATCH vhost 2/7] vdpa/mlx5: Introduce error logging function Dragos Tatulea
@ 2024-08-06 17:57   ` Eugenio Perez Martin
  0 siblings, 0 replies; 16+ messages in thread
From: Eugenio Perez Martin @ 2024-08-06 17:57 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Si-Wei Liu,
	Tariq Toukan, virtualization, linux-kernel

On Fri, Aug 2, 2024 at 9:24 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> mlx5_vdpa_err() was missing. This patch adds it and uses it in the
> necessary places.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

Acked-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  drivers/vdpa/mlx5/core/mlx5_vdpa.h |  5 +++++
>  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 24 ++++++++++++------------
>  2 files changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> index 50aac8fe57ef..424d445ebee4 100644
> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> @@ -135,6 +135,11 @@ int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
>  int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev);
>  int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
>
> +#define mlx5_vdpa_err(__dev, format, ...)                                                          \
> +       dev_err((__dev)->mdev->device, "%s:%d:(pid %d) error: " format, __func__, __LINE__,        \
> +                current->pid, ##__VA_ARGS__)
> +
> +
>  #define mlx5_vdpa_warn(__dev, format, ...)                                                         \
>         dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, __func__, __LINE__,     \
>                  current->pid, ##__VA_ARGS__)
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index fa78e8288ebb..12133e5d1285 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1538,13 +1538,13 @@ static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mv
>
>         err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
>         if (err) {
> -               mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed, err: %d\n", err);
> +               mlx5_vdpa_err(&ndev->mvdev, "modify to suspend failed, err: %d\n", err);
>                 return err;
>         }
>
>         err = query_virtqueue(ndev, mvq, &attr);
>         if (err) {
> -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue, err: %d\n", err);
> +               mlx5_vdpa_err(&ndev->mvdev, "failed to query virtqueue, err: %d\n", err);
>                 return err;
>         }
>
> @@ -1585,7 +1585,7 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
>                  */
>                 err = modify_virtqueue(ndev, mvq, 0);
>                 if (err) {
> -                       mlx5_vdpa_warn(&ndev->mvdev,
> +                       mlx5_vdpa_err(&ndev->mvdev,
>                                 "modify vq properties failed for vq %u, err: %d\n",
>                                 mvq->index, err);
>                         return err;
> @@ -1600,15 +1600,15 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
>         case MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY:
>                 return 0;
>         default:
> -               mlx5_vdpa_warn(&ndev->mvdev, "resume vq %u called from bad state %d\n",
> +               mlx5_vdpa_err(&ndev->mvdev, "resume vq %u called from bad state %d\n",
>                                mvq->index, mvq->fw_state);
>                 return -EINVAL;
>         }
>
>         err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
>         if (err)
> -               mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq %u, err: %d\n",
> -                              mvq->index, err);
> +               mlx5_vdpa_err(&ndev->mvdev, "modify to resume failed for vq %u, err: %d\n",
> +                             mvq->index, err);
>
>         return err;
>  }
> @@ -2002,13 +2002,13 @@ static int setup_steering(struct mlx5_vdpa_net *ndev)
>
>         ns = mlx5_get_flow_namespace(ndev->mvdev.mdev, MLX5_FLOW_NAMESPACE_BYPASS);
>         if (!ns) {
> -               mlx5_vdpa_warn(&ndev->mvdev, "failed to get flow namespace\n");
> +               mlx5_vdpa_err(&ndev->mvdev, "failed to get flow namespace\n");
>                 return -EOPNOTSUPP;
>         }
>
>         ndev->rxft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
>         if (IS_ERR(ndev->rxft)) {
> -               mlx5_vdpa_warn(&ndev->mvdev, "failed to create flow table\n");
> +               mlx5_vdpa_err(&ndev->mvdev, "failed to create flow table\n");
>                 return PTR_ERR(ndev->rxft);
>         }
>         mlx5_vdpa_add_rx_flow_table(ndev);
> @@ -2530,7 +2530,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa
>
>         err = query_virtqueue(ndev, mvq, &attr);
>         if (err) {
> -               mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
> +               mlx5_vdpa_err(mvdev, "failed to query virtqueue\n");
>                 return err;
>         }
>         state->split.avail_index = attr.used_index;
> @@ -3189,7 +3189,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
>         if ((flags & VDPA_RESET_F_CLEAN_MAP) &&
>             MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
>                 if (mlx5_vdpa_create_dma_mr(mvdev))
> -                       mlx5_vdpa_warn(mvdev, "create MR failed\n");
> +                       mlx5_vdpa_err(mvdev, "create MR failed\n");
>         }
>         if (vq_reset)
>                 setup_vq_resources(ndev, false);
> @@ -3244,7 +3244,7 @@ static int set_map_data(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
>                 new_mr = mlx5_vdpa_create_mr(mvdev, iotlb);
>                 if (IS_ERR(new_mr)) {
>                         err = PTR_ERR(new_mr);
> -                       mlx5_vdpa_warn(mvdev, "create map failed(%d)\n", err);
> +                       mlx5_vdpa_err(mvdev, "create map failed(%d)\n", err);
>                         return err;
>                 }
>         } else {
> @@ -3257,7 +3257,7 @@ static int set_map_data(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
>         } else {
>                 err = mlx5_vdpa_change_map(mvdev, new_mr, asid);
>                 if (err) {
> -                       mlx5_vdpa_warn(mvdev, "change map failed(%d)\n", err);
> +                       mlx5_vdpa_err(mvdev, "change map failed(%d)\n", err);
>                         goto out_err;
>                 }
>         }
> --
> 2.45.2
>



* Re: [PATCH vhost 4/7] vdpa/mlx5: Use async API for vq modify commands
  2024-08-02  7:20 ` [PATCH vhost 4/7] vdpa/mlx5: Use async API for vq modify commands Dragos Tatulea
@ 2024-08-07 13:12   ` Eugenio Perez Martin
  0 siblings, 0 replies; 16+ messages in thread
From: Eugenio Perez Martin @ 2024-08-07 13:12 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Si-Wei Liu,
	Tariq Toukan, virtualization, linux-kernel

On Fri, Aug 2, 2024 at 9:24 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Switch firmware vq modify command to be issued via the async API to
> allow future parallelization. The new refactored function applies the
> modify on a range of vqs and waits for their execution to complete.
>
> For now the command is still used in a serial fashion. A later patch
> will switch to modifying multiple vqs in parallel.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 150 ++++++++++++++++++++----------
>  1 file changed, 103 insertions(+), 47 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index be8df9d9f4df..e56a0ee1b725 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1189,6 +1189,12 @@ struct mlx5_virtqueue_query_mem {
>         u8 out[MLX5_ST_SZ_BYTES(query_virtio_net_q_out)];
>  };
>
> +struct mlx5_virtqueue_modify_mem {
> +       u8 in[MLX5_ST_SZ_BYTES(modify_virtio_net_q_in)];
> +       u8 out[MLX5_ST_SZ_BYTES(modify_virtio_net_q_out)];
> +};
> +
> +

Extra newline here.

>  struct mlx5_vdpa_async_virtqueue_cmd {
>         int err;
>         struct mlx5_async_work cb_work;
> @@ -1202,6 +1208,7 @@ struct mlx5_vdpa_async_virtqueue_cmd {
>
>         union {
>                 struct mlx5_virtqueue_query_mem query;
> +               struct mlx5_virtqueue_modify_mem modify;
>         };
>  };
>
> @@ -1384,51 +1391,35 @@ static bool modifiable_virtqueue_fields(struct mlx5_vdpa_virtqueue *mvq)
>         return true;
>  }
>
> -static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
> -                           struct mlx5_vdpa_virtqueue *mvq,
> -                           int state)
> +static void fill_modify_virtqueue_cmd(struct mlx5_vdpa_net *ndev,
> +                                     struct mlx5_vdpa_virtqueue *mvq,
> +                                     int state,
> +                                     struct mlx5_vdpa_async_virtqueue_cmd *cmd)
>  {
> -       int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
> -       u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
>         struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
>         struct mlx5_vdpa_mr *desc_mr = NULL;
>         struct mlx5_vdpa_mr *vq_mr = NULL;
> -       bool state_change = false;
>         void *obj_context;
>         void *cmd_hdr;
>         void *vq_ctx;
> -       void *in;
> -       int err;
>
> -       if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_NONE)
> -               return 0;
> -
> -       if (!modifiable_virtqueue_fields(mvq))
> -               return -EINVAL;
> -
> -       in = kzalloc(inlen, GFP_KERNEL);
> -       if (!in)
> -               return -ENOMEM;
> +       cmd->in = &cmd->modify.in;
> +       cmd->inlen = sizeof(cmd->modify.in);
> +       cmd->out = &cmd->modify.out;
> +       cmd->outlen = sizeof(cmd->modify.out);
>
> -       cmd_hdr = MLX5_ADDR_OF(modify_virtio_net_q_in, in, general_obj_in_cmd_hdr);
> +       cmd_hdr = MLX5_ADDR_OF(modify_virtio_net_q_in, cmd->in, general_obj_in_cmd_hdr);
>
>         MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
>         MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
>         MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, mvq->virtq_id);
>         MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
>
> -       obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
> +       obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, cmd->in, obj_context);
>         vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
>
> -       if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) {
> -               if (!is_valid_state_change(mvq->fw_state, state, is_resumable(ndev))) {
> -                       err = -EINVAL;
> -                       goto done;
> -               }
> -
> +       if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
>                 MLX5_SET(virtio_net_q_object, obj_context, state, state);
> -               state_change = true;
> -       }
>
>         if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS) {
>                 MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
> @@ -1474,38 +1465,36 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
>         }
>
>         MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
> -       err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
> -       if (err)
> -               goto done;
> +}
>
> -       if (state_change)
> -               mvq->fw_state = state;
> +static void modify_virtqueue_end(struct mlx5_vdpa_net *ndev,
> +                                struct mlx5_vdpa_virtqueue *mvq,
> +                                int state)
> +{
> +       struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
>
>         if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
> +               unsigned int asid = mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP];
> +               struct mlx5_vdpa_mr *vq_mr = mvdev->mr[asid];
> +
>                 mlx5_vdpa_put_mr(mvdev, mvq->vq_mr);
>                 mlx5_vdpa_get_mr(mvdev, vq_mr);
>                 mvq->vq_mr = vq_mr;
>         }
>
>         if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
> +               unsigned int asid = mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP];
> +               struct mlx5_vdpa_mr *desc_mr = mvdev->mr[asid];
> +
>                 mlx5_vdpa_put_mr(mvdev, mvq->desc_mr);
>                 mlx5_vdpa_get_mr(mvdev, desc_mr);
>                 mvq->desc_mr = desc_mr;
>         }
>
> -       mvq->modified_fields = 0;
> -
> -done:
> -       kfree(in);
> -       return err;
> -}
> +       if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
> +               mvq->fw_state = state;
>
> -static int modify_virtqueue_state(struct mlx5_vdpa_net *ndev,
> -                                 struct mlx5_vdpa_virtqueue *mvq,
> -                                 unsigned int state)
> -{
> -       mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_STATE;
> -       return modify_virtqueue(ndev, mvq, state);
> +       mvq->modified_fields = 0;
>  }
>
>  static int counter_set_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> @@ -1658,6 +1647,73 @@ static int setup_vq(struct mlx5_vdpa_net *ndev,
>         return err;
>  }
>
> +static int modify_virtqueues(struct mlx5_vdpa_net *ndev, int start_vq, int num_vqs, int state)
> +{
> +       struct mlx5_vdpa_async_virtqueue_cmd *cmds;
> +       struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
> +       int err = 0;
> +
> +       WARN(start_vq + num_vqs > mvdev->max_vqs, "modify vq range invalid [%d, %d), max_vqs: %u\n",
> +            start_vq, start_vq + num_vqs, mvdev->max_vqs);
> +
> +       cmds = kvcalloc(num_vqs, sizeof(*cmds), GFP_KERNEL);
> +       if (!cmds)
> +               return -ENOMEM;
> +
> +       for (int i = 0; i < num_vqs; i++) {
> +               struct mlx5_vdpa_async_virtqueue_cmd *cmd = &cmds[i];
> +               struct mlx5_vdpa_virtqueue *mvq;
> +               int vq_idx = start_vq + i;
> +
> +               mvq = &ndev->vqs[vq_idx];
> +
> +               if (!modifiable_virtqueue_fields(mvq)) {
> +                       err = -EINVAL;
> +                       goto done;
> +               }
> +
> +               if (mvq->fw_state != state) {
> +                       if (!is_valid_state_change(mvq->fw_state, state, is_resumable(ndev))) {
> +                               err = -EINVAL;
> +                               goto done;
> +                       }
> +
> +                       mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_STATE;
> +               }
> +
> +               fill_modify_virtqueue_cmd(ndev, mvq, state, cmd);
> +       }
> +
> +       err = exec_virtqueue_async_cmds(ndev, cmds, num_vqs);
> +       if (err) {
> +               mlx5_vdpa_err(mvdev, "error issuing modify cmd for vq range [%d, %d)\n",
> +                             start_vq, start_vq + num_vqs);
> +               goto done;
> +       }
> +
> +       for (int i = 0; i < num_vqs; i++) {
> +               struct mlx5_vdpa_async_virtqueue_cmd *cmd = &cmds[i];
> +               struct mlx5_vdpa_virtqueue *mvq;
> +               int vq_idx = start_vq + i;
> +
> +               mvq = &ndev->vqs[vq_idx];
> +
> +               if (cmd->err) {
> +                       mlx5_vdpa_err(mvdev, "modify vq %d failed, state: %d -> %d, err: %d\n",
> +                                     vq_idx, mvq->fw_state, state, err);
> +                       if (!err)
> +                               err = cmd->err;
> +                       continue;
> +               }
> +
> +               modify_virtqueue_end(ndev, mvq, state);
> +       }
> +
> +done:
> +       kfree(cmds);
> +       return err;
> +}
> +
>  static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>  {
>         struct mlx5_virtq_attr attr;
> @@ -1669,7 +1725,7 @@ static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mv
>         if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
>                 return 0;
>
> -       err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
> +       err = modify_virtqueues(ndev, mvq->index, 1, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
>         if (err) {
>                 mlx5_vdpa_err(&ndev->mvdev, "modify to suspend failed, err: %d\n", err);
>                 return err;
> @@ -1716,7 +1772,7 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
>                 /* Due to a FW quirk we need to modify the VQ fields first then change state.
>                  * This should be fixed soon. After that, a single command can be used.
>                  */
> -               err = modify_virtqueue(ndev, mvq, 0);
> +               err = modify_virtqueues(ndev, mvq->index, 1, mvq->fw_state);
>                 if (err) {
>                         mlx5_vdpa_err(&ndev->mvdev,
>                                 "modify vq properties failed for vq %u, err: %d\n",
> @@ -1738,7 +1794,7 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
>                 return -EINVAL;
>         }
>
> -       err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
> +       err = modify_virtqueues(ndev, mvq->index, 1, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
>         if (err)
>                 mlx5_vdpa_err(&ndev->mvdev, "modify to resume failed for vq %u, err: %d\n",
>                               mvq->index, err);
> --
> 2.45.2
>



* Re: [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume
  2024-08-02  7:20 [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume Dragos Tatulea
                   ` (6 preceding siblings ...)
  2024-08-02 13:14 ` [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume Michael S. Tsirkin
@ 2024-08-07 13:25 ` Eugenio Perez Martin
  2024-08-07 14:54   ` Dragos Tatulea
  7 siblings, 1 reply; 16+ messages in thread
From: Eugenio Perez Martin @ 2024-08-07 13:25 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Si-Wei Liu, netdev, linux-rdma,
	linux-kernel, virtualization

On Fri, Aug 2, 2024 at 9:24 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> This series parallelizes the mlx5_vdpa device suspend and resume
> operations through the firmware async API. The purpose is to reduce live
> migration downtime.
>
> The series starts with changing the VQ suspend and resume commands
> to the async API. After that, the switch is made to issue multiple
> commands of the same type in parallel.
>

There is a missed opportunity here in processing the CVQ MQ command,
isn't there? It can be applied on top in another series, for sure.
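
Roughly something like the sketch below (untested, and it leaves out
the VQ resource setup/teardown that the real change_num_qps() also has
to handle); it just reuses the modify_virtqueues() helper from patch
4/7 to batch the state changes behind the MQ command:

static int change_num_qps(struct mlx5_vdpa_dev *mvdev, int newqps)
{
        struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
        int cur_vqs = ndev->cur_num_vqs;
        int new_vqs = newqps * 2;
        int err;

        if (new_vqs == cur_vqs)
                return 0;

        if (cur_vqs > new_vqs) {
                /* Suspend every VQ being deactivated in one async batch. */
                err = modify_virtqueues(ndev, new_vqs, cur_vqs - new_vqs,
                                        MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
        } else {
                /* Ready every VQ being activated in one async batch. */
                err = modify_virtqueues(ndev, cur_vqs, new_vqs - cur_vqs,
                                        MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
        }
        if (err)
                return err;

        ndev->cur_num_vqs = new_vqs;
        return 0;
}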

> Finally, a bonus improvement is thrown in: keep the notifiers enabled
> during suspend but make it a NOP. Upon resume make sure that the link
> state is forwarded. This shaves around 30ms per device constant time.
>
> For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
> x 2 threads per core), the improvements are:
>
> +-------------------+--------+--------+-----------+
> | operation         | Before | After  | Reduction |
> |-------------------+--------+--------+-----------|
> | mlx5_vdpa_suspend | 37 ms  | 2.5 ms |     14x   |
> | mlx5_vdpa_resume  | 16 ms  | 5 ms   |      3x   |
> +-------------------+--------+--------+-----------+
>

Looks great :).

Apart from the nitpick,

Acked-by: Eugenio Pérez <eperezma@redhat.com>

For the vhost part.

Thanks!

> Note for the maintainers:
> The first patch contains changes for mlx5_core. This must be applied
> into the mlx5-vhost tree [0] first. Once this patch is applied on
> mlx5-vhost, the change has to be pulled from mlx5-vdpa into the vhost
> tree and only then the remaining patches can be applied.
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost
>
> Dragos Tatulea (7):
>   net/mlx5: Support throttled commands from async API
>   vdpa/mlx5: Introduce error logging function
>   vdpa/mlx5: Use async API for vq query command
>   vdpa/mlx5: Use async API for vq modify commands
>   vdpa/mlx5: Parallelize device suspend
>   vdpa/mlx5: Parallelize device resume
>   vdpa/mlx5: Keep notifiers during suspend but ignore
>
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
>  drivers/vdpa/mlx5/core/mlx5_vdpa.h            |   7 +
>  drivers/vdpa/mlx5/net/mlx5_vnet.c             | 435 +++++++++++++-----
>  3 files changed, 333 insertions(+), 130 deletions(-)
>
> --
> 2.45.2
>



* Re: [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume
  2024-08-07 13:25 ` Eugenio Perez Martin
@ 2024-08-07 14:54   ` Dragos Tatulea
  0 siblings, 0 replies; 16+ messages in thread
From: Dragos Tatulea @ 2024-08-07 14:54 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Si-Wei Liu, netdev, linux-rdma,
	linux-kernel, virtualization



On 07.08.24 15:25, Eugenio Perez Martin wrote:
> On Fri, Aug 2, 2024 at 9:24 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>>
>> This series parallelizes the mlx5_vdpa device suspend and resume
>> operations through the firmware async API. The purpose is to reduce live
>> migration downtime.
>>
>> The series starts with changing the VQ suspend and resume commands
>> to the async API. After that, the switch is made to issue multiple
>> commands of the same type in parallel.
>>
> 
> There is a missed opportunity processing the CVQ MQ command here,
> isn't it? It can be applied on top in another series for sure.
> 
Initially I thought it would complicate the code in change_num_qps() too
much, but in the current state of the patches it's doable.

I will send a v2 with an extra patch for this.

>> Finally, a bonus improvement is thrown in: keep the notifiers enabled
>> during suspend but make it a NOP. Upon resume make sure that the link
>> state is forwarded. This shaves around 30ms per device constant time.
>>
>> For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
>> x 2 threads per core), the improvements are:
>>
>> +-------------------+--------+--------+-----------+
>> | operation         | Before | After  | Reduction |
>> |-------------------+--------+--------+-----------|
>> | mlx5_vdpa_suspend | 37 ms  | 2.5 ms |     14x   |
>> | mlx5_vdpa_resume  | 16 ms  | 5 ms   |      3x   |
>> +-------------------+--------+--------+-----------+
>>
> 
> Looks great :).
> 
> Apart from the nitpick,
>
> Acked-by: Eugenio Pérez <eperezma@redhat.com>
> 
> For the vhost part.
Thanks!

> 
> Thanks!
> 
>> Note for the maintainers:
>> The first patch contains changes for mlx5_core. This must be applied
>> into the mlx5-vhost tree [0] first. Once this patch is applied on
>> mlx5-vhost, the change has to be pulled from mlx5-vdpa into the vhost
>> tree and only then the remaining patches can be applied.
>>
>> [0] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost
>>
>> Dragos Tatulea (7):
>>   net/mlx5: Support throttled commands from async API
>>   vdpa/mlx5: Introduce error logging function
>>   vdpa/mlx5: Use async API for vq query command
>>   vdpa/mlx5: Use async API for vq modify commands
>>   vdpa/mlx5: Parallelize device suspend
>>   vdpa/mlx5: Parallelize device resume
>>   vdpa/mlx5: Keep notifiers during suspend but ignore
>>
>>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
>>  drivers/vdpa/mlx5/core/mlx5_vdpa.h            |   7 +
>>  drivers/vdpa/mlx5/net/mlx5_vnet.c             | 435 +++++++++++++-----
>>  3 files changed, 333 insertions(+), 130 deletions(-)
>>
>> --
>> 2.45.2
>>
> 



* Re: [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume
  2024-08-02 13:14 ` [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume Michael S. Tsirkin
  2024-08-04  8:48   ` Leon Romanovsky
@ 2024-08-16  9:13   ` Dragos Tatulea
  1 sibling, 0 replies; 16+ messages in thread
From: Dragos Tatulea @ 2024-08-16  9:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Si-Wei Liu, netdev, linux-rdma, linux-kernel,
	virtualization



On 02.08.24 15:14, Michael S. Tsirkin wrote:
> On Fri, Aug 02, 2024 at 10:20:17AM +0300, Dragos Tatulea wrote:
>> This series parallelizes the mlx5_vdpa device suspend and resume
>> operations through the firmware async API. The purpose is to reduce live
>> migration downtime.
>>
>> The series starts with changing the VQ suspend and resume commands
>> to the async API. After that, the switch is made to issue multiple
>> commands of the same type in parallel.
>>
>> Finally, a bonus improvement is thrown in: keep the notifiers enabled
>> during suspend but make it a NOP. Upon resume make sure that the link
>> state is forwarded. This shaves around 30ms per device constant time.
>>
>> For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
>> x 2 threads per core), the improvements are:
>>
>> +-------------------+--------+--------+-----------+
>> | operation         | Before | After  | Reduction |
>> |-------------------+--------+--------+-----------|
>> | mlx5_vdpa_suspend | 37 ms  | 2.5 ms |     14x   |
>> | mlx5_vdpa_resume  | 16 ms  | 5 ms   |      3x   |
>> +-------------------+--------+--------+-----------+
>>
>> Note for the maintainers:
>> The first patch contains changes for mlx5_core. This must be applied
>> into the mlx5-vhost tree [0] first. Once this patch is applied on
>> mlx5-vhost, the change has to be pulled from mlx5-vdpa into the vhost
>> tree and only then the remaining patches can be applied.
> 
> Or maintainer just acks it and I apply directly.
> 
Tariq reviewed the patch, and he is a mlx5_core maintainer, so consider
it acked. I just sent the v2 with the same note in the cover letter.

Thanks,
Dragos

> Let me know when all this can happen.
> 
>> [0] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost
>>
>> Dragos Tatulea (7):
>>   net/mlx5: Support throttled commands from async API
>>   vdpa/mlx5: Introduce error logging function
>>   vdpa/mlx5: Use async API for vq query command
>>   vdpa/mlx5: Use async API for vq modify commands
>>   vdpa/mlx5: Parallelize device suspend
>>   vdpa/mlx5: Parallelize device resume
>>   vdpa/mlx5: Keep notifiers during suspend but ignore
>>
>>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
>>  drivers/vdpa/mlx5/core/mlx5_vdpa.h            |   7 +
>>  drivers/vdpa/mlx5/net/mlx5_vnet.c             | 435 +++++++++++++-----
>>  3 files changed, 333 insertions(+), 130 deletions(-)
>>
>> -- 
>> 2.45.2
> 


